Economics, Law and Individual Rights
This is the first book to examine individual rights from an economic perspective, collecting together leading articles in this emerging area of interest and showing the vibrant and expanding scholarship that relates them. Areas covered include:
• The implications of constitutional protections of individual rights and freedoms, including freedom of speech and of the press
• The right to bear arms
• The right against unreasonable searches
• The right against self-incrimination
• The right to trial by jury
• The right against cruel and unusual punishment, including capital punishment.
The focus of these papers is both theoretical and empirical, examining how economics can illuminate the entire sequence of crime and punishment, from the decision to commit a crime, to police methods for apprehending and arresting criminals, to the rules used in trials, to the scope of punishment for the convicted. This book will be of interest to lawyers and legal scholars engaged in analyses of Constitutional Law, as well as students and researchers engaged with the economics of law.

Hugo M. Mialon is Assistant Professor of Economics at Emory University.

Paul H. Rubin is Samuel Candler Dobbs Professor of Economics and Law at Emory University and Editor in Chief of Managerial and Decision Economics.
THE ECONOMICS OF LEGAL RELATIONSHIPS
Sponsored by the Michigan State University College of Law

STATEMENT OF SCOPE
The Economics of Legal Relationships is a book series dedicated to publishing original scholarly contributions that systematically analyze legal-economic issues. Each book can take a variety of forms: (1) It may be comprised of a collection of original articles devoted to a single theme, edited by a guest volume editor or co-editors. (2) An individual may wish to (co)author an entire volume. (3) It may be a collection of refereed articles derived from the Series Editors’ “call for papers” on a particular legal-economic topic.

Each book in the series is published in hardback, approximately 250–300 pages in length and is dedicated to:
• Formulate and/or critique alternative theories of law and economics, including Chicago law and economics, the economics of property rights, institutionalist law and economics, neoinstitutionalist economics, public choice theory, Austrian law and economics, or social norms & law and economics.
• Analyze a variety of public policy issues related to the interface between (1) judicial decisions and/or statutory law and (2) the economy.
• Explore the economic impact of political and legal changes brought on by new technologies and/or environmental and natural resource concerns.
• Examine the broad array of legal/economic issues surrounding the deregulation/re-regulation phenomena.
• Analyze the systematic effects of legal change on incentives and economic performance.
CALL FOR AUTHORS/VOLUME EDITORS/TOPICS
An individual who is interested in either authoring an entire volume, or editing a future volume of The Economics of Legal Relationships should submit a 3–5-page prospectus to one of the series editors. Each prospectus must include: (1) the prospective title of the volume; (2) a brief description of the organizing theme of the volume whether single authored or edited; (3) an identification of the line of literature from which the proposed topic emanates; and (4) either a table of contents or, if edited, a list of potential contributors along with tentative titles of their contributions. Please note that the series editors only accept individual manuscripts for publication consideration in response to a specific “Call for Papers.”

Send prospectus directly to either series editor:

Professor Nicholas Mercuro
Michigan State University College of Law
East Lansing, MI 48824
phone: (517) 432–6897
e-mail: [email protected]

Professor Michael D. Kaplowitz
Michigan State University
Department of Community Agriculture, Recreation, and Resource Studies
East Lansing, MI 48824
phone: (517) 355–0101
e-mail: [email protected]

Compensation for Regulatory Takings Thomas J. Miceli and Kathleen Segerson
The End of Natural Monopoly Deregulation and competition in the electric power industry Edited by Peter Z. Grossman and Daniel H. Cole
Dispute Resolution Bridging the settlement gap Edited by David A. Anderson
The Law and Economics of Development Edited by Edgardo Buscaglia, William Ratliff and Robert Cooter
Just Exchange: a Theory of Contract F. H. Buckley
Network Access, Regulation and Antitrust Edited by Diana L. Moss
Fundamental Interrelationships Between Government and Property Edited by Nicholas Mercuro and Warren J. Samuels
Property Rights Dynamics A law and economics perspective Edited by Donatella Porrini and Giovanni Ramello
Property Rights, Economics, and the Environment Edited by Michael D. Kaplowitz
The Firm as an Entity Implications for economics, accounting and the law Edited by Yuri Biondi, Arnaldo Canziani and Thierry Kirat
Law and Economics in Civil Law Countries Edited by Thierry Kirat and Bruno Deffains
The Legal–Economic Nexus Warren J. Samuels
Economics, Law and Individual Rights Edited by Hugo M. Mialon and Paul H. Rubin
*The first three volumes listed above are published by and available from Elsevier
Economics, Law and Individual Rights
Edited by Hugo M. Mialon and Paul H. Rubin
First published 2008 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Simultaneously published in the USA and Canada by Routledge
270 Madison Ave, New York, NY 10016

This edition published in the Taylor & Francis e-Library, 2008.

“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.”

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2008 selection and editorial matter, Hugo M. Mialon and Paul H. Rubin; individual chapters, the contributors.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Economics, law, and individual rights / edited by Hugo M. Mialon and Paul H. Rubin.
p. cm. – (The economics of legal relationships ; v. 14)
Includes bibliographical references and index.
1. Civil rights–Economic aspects–United States. I. Mialon, Hugo M. II. Rubin, Paul H.
KF4749.E26 2008
342.7308′5–dc22
2007034549

ISBN 0-203-93088-6 Master e-book ISBN
ISBN10: 0–415–77281–8 (hbk)
ISBN10: 0–203–93088–6 (ebk)
ISBN13: 978–0–415–77281–5 (hbk)
ISBN13: 978–0–203–93088–5 (ebk)
Contents
List of figures
List of tables
Acknowledgements

1 Introduction
HUGO M. MIALON AND PAUL H. RUBIN

2 The economics of the First Amendment: the market for goods and the market for ideas
R. H. COASE

3 An economic analysis of the law of false advertising
ELLEN R. JORDAN AND PAUL H. RUBIN

4 Freedom of speech vs. efficient regulation in markets for ideas
ALBERT BRETON AND RONALD WINTROBE

5 A free press is bad news for corruption
AYMO BRUNETTI AND BEATRICE WEDER

6 The market for news
SENDHIL MULLAINATHAN AND ANDREI SHLEIFER

7 The impact of gun laws: a model of crime and self-defense
HUGO M. MIALON AND THOMAS WISEMAN

8 Crime, deterrence, and right-to-carry concealed handguns
JOHN R. LOTT, JR., AND DAVID B. MUSTARD

9 The effect of concealed handgun laws on crime: beyond the dummy variables
HASHEM DEZHBAKHSH AND PAUL H. RUBIN

10 Effects of criminal procedure on crime rates: mapping out the consequences of the exclusionary rule
RAYMOND A. ATKINS AND PAUL H. RUBIN

11 An economic theory of the Fifth Amendment
HUGO M. MIALON

12 The effects of a right to silence
DANIEL J. SEIDMANN

13 Noisy juries and the choice of trial mode in a sequential signalling game: theory and evidence
GERALD D. GAY, MARTIN F. GRACE, JAYANT R. KALE AND THOMAS H. NOE

14 Runaway judges? Selection effects and the jury
ERIC HELLAND AND ALEXANDER TABARROK

15 Reasonable doubt and the optimal magnitude of fines: should the penalty fit the crime?
JAMES ANDREONI

16 The deterrent effect of capital punishment: a question of life and death
ISAAC EHRLICH

17 Does capital punishment have a deterrent effect? New evidence from postmoratorium panel data
HASHEM DEZHBAKHSH, PAUL H. RUBIN AND JOANNA M. SHEPHERD

Index
Figures
5.1 Corruption and press freedom
8.1 The effect of concealed handguns on violent crimes
10.1 Dynamic impact of Mapp
14.1 Kernel estimation of all cases: judge and jury
14.2 Estimation procedure
15.1 Standard result of optimal deterrence
15.2 Expected utility of a potential criminal
15.3 Wealthy criminals and jail sentence
17.1 Murder rates in executing and nonexecuting states
Tables
5.1 Dependent variable: average corruption from 1994 to 1998
5.2 Ordered probit estimation; dependent variable corruption in 1995
5.3 Sample tests and two-stage least-square estimates using various instruments
5.4 Testing the effects of press freedom for alternative corruption measures
5.5 Testing alternative measures of press freedom (dependent variable: average corruption from 1994 to 1998)
5.6 Panel data evidence
5A.1 Descriptive statistics
5A.2 Correlation matrix
7.1 Normal form of the game of crime and self-defense
7.2 The game of crime and self-defense with full gun control
7.3 The game of crime and self-defense with a severe gun-crime penalty
8.1 Comparing the deviation in crime rates between states and by counties within states from 1977 to 1992
8.2 National sample means and standard deviations
8.3 The effect of “shall issue” right-to-carry firearms laws on the crime rate
8.4 Questions of aggregating the data: national state-level cross-sectional time-series evidence
8.5 The effect of concealed handguns on victim costs: what if all states adopted “shall issue” laws?
8.6 Questions of aggregating the data: do law enforcement and “shall issue” laws have the same effect in high and low crime areas?
8.7 Controlling for the fact that larger changes in crime rates are expected in the more populous counties where the change in the law constituted a bigger break with past policies
8.8 Using other crime rates that are relatively unrelated to changes in “shall issue” rules as a method of controlling for other changes in the legal environment: controlling for robbery and burglary rates
8.9 Rerunning the regressions on differences
8.10 Controlling for other laws regulating gun use
8.11 Regression estimates of the causes and effects of the adoption of concealed handgun laws
8.12 Changes in murder methods for counties over 100,000, 1982–91
8.13 Changes in composition of murder victims using annual state-level data from the uniform crime reports supplementary homicide reports, 1977–92
8.14 Oregon, Pennsylvania, and Arizona sample means and standard deviations
8.15 Using Pennsylvania data on the number of permits issued to measure the differential impact of Pennsylvania’s 1989 “shall issue” law on different counties: data for counties with populations over 200,000
8.16 Oregon data on the number of permits issued, the conviction rate, and prison sentence lengths
8.17 Arizona data on the number of permits issued, the conviction rate, and prison sentence lengths, 1990–95
8.18 Did carrying concealed handguns increase the number of accidental deaths? County-level data, 1982–91
9.1 The predicted effect of adopting concealed handgun laws on crimes in states without such laws in 1992
9.2 Determinants of the magnitude of the change in crime induced by concealed handgun laws
10.1 Mean percentage change in total per capita crime rates
10.2 Changes in crime rates by type: state data
10.3 Changes in crime rates by type: city data
10.4 Effect of the Mapp and Gideon rulings on crime
10.5 Effect of the Wolf ruling on crime
10.A1 Percentage change in total per capita crime rates
11.1 Likelihood matrix representing evidence quality
11.2 Information structures without mandatory disclosure
11.3 Information structures with mandatory disclosure
13.1 Characterizations and empirical restrictions of equilibria in the choice between trial by bench or jury
13.2 Proportion of defendants selecting trial by bench, P̂(B), in the Florida Circuit Courts during 1982–1985
13.3 Proportion of defendants selecting trial by bench, P̂(B), in the Texas District Courts during 1981–1986
13.4 Conviction rates for trials by judges and juries in the Florida Circuit Courts during 1982–1985
13.5 Conviction rates for trials by judges and juries in the Texas District Courts during 1981–1986
13.6 Comparison of judge and jury verdicts in identical trials
14.1 Judge/jury differences (all trials)
14.2 Award regressions
14.3 Forum choice, trial, and win sequential probit results
14.4 Award equation results for Heckit estimation
14.5 Marginal effects, judge and jury win equations
14.A1 Descriptive statistics
14.B1 Time to trial results
16.1 Behavioral implications
16.2 Variables used in the regression analysis, annual observations 1933–69
16.3 Modified first differences of murder rates (in natural logarithms) regressed against corresponding modified first differences of selected variables set I (1933–69)
16.4 Modified first differences of murder rates (in natural logarithms) regressed against corresponding modified first differences of selected variables set II: alternative time periods and other tests
17.1 Executions and executing states
17.2 Status of the death penalty
17.3 Two-stage least squares regression results for murder rate (Models 1–3)
17.4 Two-stage least squares regression results for murder rate (Models 4–6)
17.5 Estimates of the execution probability coefficient under various specifications (robustness check)
Acknowledgements
We are very grateful to Bogdana Georgieva and Chang Liu for research assistance.

Professor Ronald H. Coase and American Economic Association for permission to reproduce extracts from Coase, Ronald H. ‘The Market for Goods and the Market for Ideas’ (1974). 64 (2) American Economic Review 384–391. Copyright © 1974 American Economic Review.

Professor Paul H. Rubin and The Journal of Legal Studies for permission to reproduce extracts from Rubin, Paul H. and Jordan, Ellen R. ‘An Economic Analysis of the Law of False Advertising’ (1979). 8 (3) Journal of Legal Studies 527–553. Copyright © 1979 The University of Chicago. All rights reserved.

Professor Albert Breton, Professor Ronald Wintrobe and Elsevier Science for permission to reproduce extracts from Breton, Albert and Wintrobe, Ronald ‘Freedom of Speech vs. Efficient Regulation in Markets for Ideas’ (1992). 17 (2) Journal of Economic Behavior & Organization 217–239. Copyright © 1992 Elsevier.

Professor Aymo Brunetti and Elsevier for permission to reproduce extracts from Brunetti, Aymo and Weder, Beatrice ‘A free press is bad news for corruption’ (2003). 87 (7–8) Journal of Public Economics 1801–1824. Copyright © 2003 Elsevier.

Professor Sendhil Mullainathan, Professor Andrei Shleifer and American Economic Association for permission to reproduce extracts from Mullainathan, Sendhil and Shleifer, Andrei ‘The Market for News’ (2005). 95 (4) American Economic Review 1031–1053. Copyright © 2005 American Economic Review.

Professor Hugo M. Mialon, Professor Tom Wiseman and Elsevier for permission to reproduce extracts from Mialon, Hugo M. and Wiseman, Tom ‘The Impact of Gun Laws: A Model of Crime and Self-Defense’ (2005). 88 (2) Economics Letters 170–175. Copyright © 2005 Elsevier.

Professor John R. Lott, Jr., Professor David B. Mustard and The Journal of Legal Studies for permission to reproduce extracts from Lott, John R. Jr. and Mustard, David B. ‘Crime, Deterrence, and Right-to-Carry Concealed Handguns’ (1997). 26 (1) Journal of Legal Studies 1–68. Copyright © 1997 The University of Chicago. All rights reserved.

Professor Paul H. Rubin and Elsevier for permission to reproduce extracts from Dezhbakhsh, Hashem and Rubin, Paul H. ‘The Effect of Concealed Handgun Laws on Crime: Beyond the Dummy Variables’ (2003). 23 (2) International Review of Law and Economics 199–216. Copyright © 2003 Elsevier.

Professor Paul H. Rubin and The Journal of Law and Economics for permission to reproduce extracts from Atkins, Raymond A. and Rubin, Paul H. ‘Effects of Criminal Procedure on Crime Rates: Mapping Out the Consequences of The Exclusionary Rule’ (2003). 46 (1) Journal of Law and Economics 157–180. Copyright © 2003 The University of Chicago. All rights reserved.

Professor Hugo M. Mialon and RAND Corporation for permission to reproduce extracts from Mialon, Hugo M. ‘An Economic Theory of the Fifth Amendment’ (2005). 36 (4) Rand Journal of Economics 833–848. Copyright © 2005 RAND Corporation.

Professor Daniel Seidmann and Blackwell Publishing for permission to reproduce extracts from Seidmann, Daniel ‘The Effects of a Right to Silence’ (2005). 72 (2) Review of Economic Studies 593–614. Copyright © 2005 Blackwell Publishing.

Professor Gerald D. Gay, Professor Martin F. Grace, Professor Jayant R. Kale, Professor Thomas H. Noe and RAND Corporation for permission to reproduce extracts from Gay, Gerald D. et al. ‘Noisy Juries and the Choice of Trial Mode in a Sequential Signalling Game: Theory and Evidence’ (1989). 20 (2) Rand Journal of Economics 196–213. Copyright © 1989 RAND Corporation.

Professor Eric Helland, Professor Alexander Tabarrok and Oxford University Press for permission to reproduce extracts from Helland, Eric and Tabarrok, Alexander ‘Runaway Judges? Selection Effects and the Jury’ (2000). 16 (2) The Journal of Law, Economics, & Organization 305–333. Copyright © 2000 Oxford University Press.

Professor James Andreoni and RAND Corporation for permission to reproduce extracts from Andreoni, James ‘Reasonable Doubt and the Optimal Magnitude of Fines: Should the Penalty Fit the Crime?’ (1991). 22 (3) Rand Journal of Economics 385–395. Copyright © 1991 RAND Corporation.

Professor Isaac Ehrlich and American Economic Association for permission to reproduce extracts from Ehrlich, Isaac ‘The Deterrent Effect of Capital Punishment: A Question of Life and Death’ (1975). 65 (3) American Economic Review 397–417. Copyright © 1975 American Economic Review.

Professor Paul H. Rubin and Oxford University Press for permission to reproduce extracts from Dezhbakhsh, Hashem, Shepherd, Joanna M. and Rubin, Paul H. ‘Does Capital Punishment Have a Deterrent Effect?’ (2003). 5 (2) American Law and Economics Review 344–376. Copyright © 2003 Oxford University Press.

Every effort has been made to contact copyright holders for their permission to reprint selections in this book. The publishers would be grateful to hear from any copyright holder who is not here acknowledged and will undertake to rectify any errors or omissions in future editions of this book.
1 Introduction
Hugo M. Mialon and Paul H. Rubin
This book brings together for the first time the emerging but still small and fragmented literature that employs economics to analyze the implications of constitutional protections of individual rights and freedoms, including freedom of speech and of the press, the right to bear arms, the right against unreasonable searches, the right against self-incrimination, the right to trial by jury, and the right against cruel and unusual punishment.

Most of these constitutional protections involve tradeoffs, including the balance between preserving freedom of speech and allowing harmful ideas to be communicated; between arming victims and arming criminals; between protecting against crime and invading individual privacy; and between punishing the guilty and punishing the innocent. Therefore, issues related to constitutional safeguards of individual rights are actually in the realm of economics, a science uniquely suited to the study of decisions involving tradeoffs between benefits and costs.

Several of the papers included in the book employ economic theory to analyze the social efficiency of policies related to these constitutional protections, and others formulate empirical models to estimate the effects of these policies on measurable outcomes of concern to society. Many of the results are directly relevant to current debate and policy-making. These studies are brought together here for the first time, resulting in a new, rigorous, and unified understanding of how best to protect important individual rights against encroachment by government.

Studies of freedom of speech and of the press are presented first because these freedoms are essential for creating and enforcing all other individual rights. Economic analysis of freedom of speech starts with the notion of a marketplace for ideas, formally introduced by Coase (1974). Free speech is free competition in an ideas market. If an ideas market fails to produce a socially optimal outcome, intervention to restrict speech in that market might be justified. However, as Breton and Wintrobe (1992) argue, restrictions on freedom of speech in an ideas market can lead to a monopoly on ideas, which can also lead to social inefficiency. Under monopoly restrictions on speech, ideas could be unduly repressed.
The danger and costs of monopoly restrictions on freedom of speech are particularly high in the case of political speech. Political parties in power have a natural tendency to restrict the speech of their political rivals to retain their power. Moreover, speech is the principal means by which politicians compete for power. Restrictions on political speech could therefore directly paralyze political competition and result in a monopoly over governmental power. Abuse of such a monopoly would entail costs that could easily outweigh all other considerations.

Restrictions on speech can also limit economic freedom. Rubin and Jordan (1979) show that there are costs to regulation of commercial speech, and that markets for advertising and other forms of commercial speech work well without regulation.

Like restrictions on the freedom of political speech, restrictions on the freedom of the press can also limit political competition, and could even subvert democracy. If voters are not well informed about the actions of elected politicians, then the elected politicians are free to be corrupt. A free press might reduce the informational asymmetry, thereby curbing corruption. Brunetti and Weder (2003) empirically investigate the effects of freedom of the press on aggregate measures of corruption across countries. Their results indicate that freedom of the press does indeed significantly reduce corruption. Mullainathan and Shleifer (2005) show that with reader heterogeneity a free press will lead to a variety of views such that a reader could become accurately informed.

Freedom of speech and freedom of the press are essential for creating and enforcing other rights. If a government can violate a right and nobody can hear of the violation, then the government incurs no cost from the violation. The freedoms of speech and of the press increase political competition and government accountability, which in turn determine the extent of all other individual rights.
The rest of the book is devoted primarily to individual rights related to crime and punishment. Economic studies of the right to bear arms, the right against unreasonable searches, the right against self-incrimination, the right to trial by jury, and the right against cruel and unusual punishment are presented in that order. The studies are ordered by topic in a way that follows the history of a crime and its punishment: from decisions about methods used to commit or defend against a crime, to the behavior of police in seeking a criminal, to rules of trial, to punishment.

In the first stage of a crime, citizens choose whether to obtain a gun to commit or defend against a crime. Here, the individual’s right to bear arms comes into play. Mialon and Wiseman (2005) develop an economic model of crime and self-defense that provides a rationale both for having a basic right to bear arms and for regulating this right. An absolute prohibition on gun carrying would completely disarm both criminals and potential victims. Under such a rule, a criminal would not likely be severely hurt if he attacked a victim, since the victim would be unarmed. Moreover, the criminal’s chance of winning an unarmed confrontation could be quite high given the criminal’s first-mover advantage. If the values of victims’ assets are sufficiently high, criminals would then attack even though they do not have guns, leading victims to lie low to avoid losing their assets. Therefore, full gun control would lead to a situation in which potential victims usually lie low and suffer a large loss of freedom. Thus, a basic right to bear arms might be fundamental to individual freedom.

However, once a basic right to bear arms is established, any further strengthening of the right involves an important social tradeoff. Guns may have a “deterrent effect” on crime, because criminals are less likely to attack if they fear an armed response. However, guns may also have a “facilitating effect” on crime, because they may end up in the hands of criminals. Given this tradeoff, it may be efficient to regulate or limit gun carrying. If the deterrent effect of guns is greater than the facilitating effect of guns, then allowing guns will reduce crime. However, if the facilitating effect is greater, then allowing guns will increase crime.

Lott and Mustard (1997) examine the effects of “shall-issue” laws on crime rates. These laws mandate the issuance of permits to carry concealed weapons for most adult non-felons requesting such licenses. The authors use a comprehensive panel data set of all counties in the United States from 1977 to 1992. They find that allowing easy issuance of gun licenses led to very large reductions in crime. In other words, they find that the “deterrent effect” of guns greatly outweighed the “facilitating effect.” Many studies in the literature have examined these results, and the paper remains controversial.
Dezhbakhsh and Rubin (1998) find that when the Lott–Mustard coefficients for states that have shall-issue laws are plugged into the data for states that do not have such laws, the effects on crime are small and ambiguous; some crimes in some states increase and some crimes in other states decrease. As of now, the weight of evidence is against the Lott–Mustard hypothesis, but more testing is still necessary.

The Supreme Court will soon hear a challenge to the ruling that the District of Columbia law outlawing most guns possessed by civilians violates the Second Amendment. If the Court upholds this ruling, then the laws of many states will be overturned, and this will present an excellent opportunity to empirically test the effect of gun control on crime rates.

In the second stage of a crime, after each citizen has chosen whether and how to commit or defend against crime, evidence about whether a crime has been committed is generated, and the police choose whether to search a citizen based on the available evidence against him. Here, the individual’s right against unreasonable searches comes into play. In practice, the right against unreasonable searches is enforced through the exclusionary rule: if the police search a citizen when the evidence does not provide probable cause against the citizen, then any additional evidence that they uncover through the search is thrown out or excluded from court.
The exclusionary rule might reduce police searches and increase individual privacy, but it might also increase crime. If the rule is enforced, we would expect the police not to violate the rule because they know that its violation will lead to losing the case. Instead, they will adapt by using methods other than searches to find criminals. However, if searches are in some circumstances more efficient, having to turn to alternative methods will lead to lower arrest and conviction rates, and in turn to higher crime rates.

Atkins and Rubin (2003) examine empirically the effect of the exclusionary rule on crime rates in the United States. Precedent for the exclusionary rule in state crimes was set by the 1961 Supreme Court case of Mapp v. Ohio. Even before Mapp, exactly one half of the states had adopted a version of the exclusionary rule on their own. Atkins and Rubin find that the 1961 ruling significantly increased crime rates only in those states that did not previously have the exclusionary rule, controlling for other factors that might also affect crime. The 1961 ruling caused an increase in most types of crime, including larceny, auto theft, burglary, robbery, and assault. These results suggest that rules enforcing the right against unreasonable searches have a significant cost in terms of increased crime. However, these rules also protect against privacy invasion and police errors. From a social standpoint, the right against unreasonable searches should be strengthened up to the point where the marginal social benefits in terms of reduced privacy invasion and police errors are exactly equal to the marginal social costs in terms of reduced security.

In the third stage of a crime, after police have chosen whether to search and arrest a citizen, a citizen who is arrested must choose whether to disclose possibly incriminating evidence to police and prosecutors before and at trial. Here, the individual’s right to remain silent comes into play.
Seidmann (2005) and Mialon (2005) analyze the efficiency of a right to silence, which attempts to block any adverse inference from a criminal suspect’s or defendant’s silence in the face of accusation. Seidmann shows that given the possibility of perjury, a right to silence can benefit the innocent even if the right is exercised only by the guilty. Without a right to silence, the guilty might be forced to lie and make false exculpatory statements. Then the innocent would not be able to signal their innocence to the jury by making exculpatory statements since the guilty would also make such statements. As a result, the jury would not be able to distinguish the innocent from the guilty from their statements, and the innocent might be convicted. However, with a right to silence, the guilty would exercise the right and remain silent. Then, the innocent could signal their innocence through exculpatory statements, and would be less likely to be convicted.

As Mialon argues, the innocent might also benefit from a right to silence more directly. The accuracy of the evidence regarding a defendant’s culpability is rarely perfect. Even if the defendant is innocent, the evidence might indicate that he is guilty, in which case he would choose to remain silent if lying is not an option. Moreover, the defendant might not know the evidence, in which case he would have to remain silent. In either case, the innocent defendant might be wrongfully convicted if the adverse inference from silence is not blocked, but rightfully acquitted if it is blocked. Thus, even without the possibility of perjury, a right to silence can reduce wrongful convictions and help the innocent.

Of course, a right to silence can also help the guilty. If a guilty defendant does not know the evidence, or knows the evidence and it correctly indicates that he is guilty, he might be rightfully convicted without the right to silence, but wrongfully acquitted with it. By reducing convictions, a right to silence might reduce wrongful convictions, but only at the expense of increasing wrongful acquittals.

A right to silence might directly improve social welfare if society prefers a wrongful acquittal sufficiently more than a wrongful conviction. However, a right to silence can only directly increase social welfare if the jury’s preferences are biased relative to society’s preferences. If the jury’s and society’s preferences over court outcomes coincide, and the jury rationally takes into account all available evidence, then the jury’s decision problem is the same as society’s. Thus, preventing the jury from making a rational inference from silence cannot improve social welfare. If some juries are unduly biased against a defendant, a right to silence can, but still need not, improve welfare. A right to silence is more likely to enhance welfare if the police are more corrupt or biased. Innocent suspects are more likely to end up in court if the police are prejudiced or they disregard rights of suspects in their investigations. In such circumstances, more innocent defendants end up in court and stand to benefit from a right to silence. Thus, if jury and police discrimination is more of a problem, a right to silence is more likely to improve welfare.
In the fourth stage of a crime, after the defendant and prosecutor have chosen what evidence to present, the judge or jury evaluates the evidence and chooses whether to convict the defendant. Here, the right to trial by jury comes into play. A jury might evaluate the evidence differently than a judge, and therefore might arrive at a different verdict. The right of defendants to have their criminal culpability determined by a jury of their peers is a safeguard against overzealous prosecutors and biased judges. Imposing an all-judge system in criminal cases might be inconsistent with the objective of trial fairness. But then, why not impose an all-jury system? Gay et al. (1989) argue that a right to unilaterally waive a jury trial increases the accuracy of trial outcomes relative to a system in which jury trials are mandated. In general, juries might not process evidence as efficiently as judges; that is, juries might be “noisier” than judges. As long as this is the case, the right to waive a jury trial strictly benefits the innocent, because waiving the right allows them access to a more accurate fact-finder. Therefore, innocent defendants are always better off in a system with the right to waive a jury trial than in an all-jury system. Moreover, conviction rates for the guilty would be the same under both systems, because the guilty would
rationally choose jury trials even when they had the right to waive a jury trial, since juries are more likely to make mistakes than judges. Therefore, the right to waive a jury trial enhances the objective of trial accuracy. Helland and Tabarrok (2000) empirically investigate the effects of the right to trial by jury in civil cases. Awards granted by juries are significantly higher on average than awards granted by judges. Moreover, plaintiffs are more likely to win cases before juries than before judges. However, the authors find that much of this difference can be attributed to the types of cases that are brought before judges and juries. Juries tend to see more high-award cases, such as medical malpractice and product liability cases, whereas judges tend to see more low-award cases, such as premises liability cases. This is an example of the economic significance of a non-criminal constitutional protection. In the fifth stage of a crime, if the judge or jury has decided to convict the defendant, the nature and severity of the punishment are determined. Here, the right against cruel and unusual punishment comes into play. Punishment that is excessive relative to the severity of the offense is deemed cruel and unusual. Capital punishment might also be considered cruel and unusual. Andreoni (1991) shows that scaling the punishment according to the severity of the offense is efficient if maximal deterrence is the objective. As the punishment increases, the cost of convicting an innocent person increases, while the cost of acquitting a guilty person remains the same, so rational juries are less likely to convict. There is a maximal punishment beyond which jurors are not willing to convict at all. However, the cost of acquitting a guilty person is higher if the severity of the offense is higher. Therefore, if the offense is more severe, the maximal punishment at which jurors are still willing to convict is higher.
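The juror's reasoning described above can be written as a simple threshold condition. What follows is only an illustrative sketch, not Andreoni's own notation or model. The symbols are assumptions introduced here: $p$ is the juror's assessed probability that the defendant is guilty, $f$ the severity of the punishment, $s$ the severity of the offense, $C(f)$ the cost of convicting an innocent person (assumed increasing in $f$), and $A(s)$ the cost of acquitting a guilty person (assumed increasing in $s$).

```latex
% Illustrative sketch only; every symbol is an assumption made for this
% example and is not taken from Andreoni (1991).
% The juror convicts when the expected cost of a wrongful acquittal is at
% least the expected cost of a wrongful conviction:
\[
  p\,A(s) \;\ge\; (1-p)\,C(f)
  \quad\Longleftrightarrow\quad
  p \;\ge\; p^{*}(f,s) = \frac{C(f)}{C(f)+A(s)} .
\]
% With C'(f) > 0 and A'(s) > 0, the conviction threshold rises with the
% punishment and falls with the severity of the offense:
\[
  \frac{\partial p^{*}}{\partial f}
    = \frac{C'(f)\,A(s)}{\bigl(C(f)+A(s)\bigr)^{2}} > 0,
  \qquad
  \frac{\partial p^{*}}{\partial s}
    = -\,\frac{C(f)\,A'(s)}{\bigl(C(f)+A(s)\bigr)^{2}} < 0 .
\]
% For evidence of given strength p < 1, the largest punishment at which the
% juror is still willing to convict solves
\[
  C\bigl(f_{\max}\bigr) = \frac{p}{1-p}\,A(s) .
\]
```

Under these assumptions $f_{\max}$ increases with $s$, since the right-hand side of the last condition grows with $A(s)$ and $C$ is increasing.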
Hence, the punishment scheme that ensures maximal deterrence is one in which the severity of punishment grows with the severity of the offense. An increase in the severity of punishment could deter more potential criminals from committing crime as long as it does not reduce the probability of conviction too much. However, it would also result in a harsher punishment for wrongly convicted innocent people, which will be inevitable since the justice system is not perfectly accurate. Therefore, there is a potential tradeoff between deterrence and wrongful punishment. It is difficult to estimate the frequency of wrongful convictions, because in many cases, culpability can never be ascertained completely. However, it is possible to measure the deterrent effects of punishments. Ehrlich (1975) provides the first empirical analysis by an economist of the deterrent effects of capital punishment. This work was highly controversial and led to extensive additional analysis, most of it a reanalysis of Ehrlich’s data. Results have been mixed: several authors found a deterrent effect, while several did not. Recently, several more authors have reexamined the possibility of a deterrent effect using more advanced statistical techniques than those used by Ehrlich, and analyzing more complete data.
Dezhbakhsh, Rubin and Shepherd (2003) examine the deterrent effect of capital punishment using county-level panel data from 3,054 U.S. counties over the period 1977–1996. This study finds a substantial deterrent effect; both death row sentences and executions result in decreases in the murder rate. A conservative estimate is that each execution results in, on average, 18 fewer murders. The main finding, that capital punishment has a deterrent effect, is robust to several different ways of performing the statistical analysis and to several ways of measuring the probability of an execution. Several other studies find similar results, although there has also been heavy criticism of these results in the literature. The papers collected in this book demonstrate how economics can be fruitfully applied to analyze individual rights and freedoms. We believe that this is a very fruitful area for further research. For example, game-theoretic analysis can be applied to additional legal protections, since the adversarial legal system exactly lends itself to this form of analysis. One area of research that could be fruitfully explored is the interaction between various rights. For example, the right to avoid illegal searches is much more valuable because of the right to legal representation at trial, and conversely. Additional exploration of such interactions would be very productive. It should also be possible to empirically analyze the effects of various Supreme Court rulings. This would be especially useful in cases where the Court ruling affects different states differently because this would create a natural experiment that can be exploited for additional insights.
References
Andreoni, James (1991). “Reasonable Doubt and the Optimal Magnitude of Fines: Should the Penalty Fit the Crime?,” RAND Journal of Economics, Vol. 22, 385–395.
Atkins, Raymond A. and Rubin, Paul H. (2003). “Effects of Criminal Procedure on Crime Rates: Mapping Out the Consequences of The Exclusionary Rule,” Journal of Law and Economics, Vol. 46, 157–180.
Breton, Albert and Wintrobe, Ronald (1992). “Freedom of Speech vs. Efficient Regulation in Markets for Ideas,” Journal of Economic Behavior and Organization, Vol. 17, 217–239.
Brunetti, Aymo and Weder, Beatrice (2003). “A Free Press is Bad News for Corruption,” Journal of Public Economics, Vol. 87, 1801–1824.
Coase, Ronald H. (1974). “The Market for Goods and the Market for Ideas,” American Economic Review, Vol. 64, 384–391.
Dezhbakhsh, Hashem and Rubin, Paul H. (1998). “Lives Saved or Lives Lost: The Effect of Concealed Handgun Laws on Crime,” American Economic Review, Vol. 88, 468–474.
Dezhbakhsh, Hashem, Rubin, Paul H. and Shepherd, Joanna M. (2003). “Does Capital Punishment Have a Deterrent Effect?,” American Law and Economics Review, Vol. 5, 344–376.
Ehrlich, Isaac (1975). “The Deterrent Effect of Capital Punishment: A Question of Life and Death,” American Economic Review, Vol. 65, 397–417.
Gay, Gerald D., Grace, Martin F., Kale, Jayant R. and Noe, Thomas H. (1989). “Noisy Juries and the Choice of Trial Mode in a Sequential Signalling Game: Theory and Evidence,” RAND Journal of Economics, Vol. 20, 196–213.
Helland, Eric and Tabarrok, Alexander (2000). “Runaway Judges? Selection Effects and the Jury,” Journal of Law, Economics, and Organization, Vol. 16, 306–333.
Lott, John R. Jr. and Mustard, David B. (1997). “Crime, Deterrence, and Right-to-Carry Concealed Handguns,” Journal of Legal Studies, Vol. 26, 1–68.
Mialon, Hugo M. (2005). “An Economic Theory of the Fifth Amendment,” RAND Journal of Economics, Vol. 36, 833–848.
Mialon, Hugo M. and Wiseman, Tom (2005). “The Impact of Gun Laws: A Model of Crime and Self-Defense,” Economics Letters, Vol. 88, 170–175.
Mullainathan, Sendhil and Shleifer, Andrei (2005). “The Market for News,” American Economic Review, Vol. 95, 1031–1053.
Rubin, Paul H. and Jordan, Ellen R. (1979). “An Economic Analysis of the Law of False Advertising,” Journal of Legal Studies, Vol. 8, 527–553.
Seidmann, Daniel (2005). “The Effects of a Right to Silence,” Review of Economic Studies, Vol. 72, 593–614.
2
The economics of the First Amendment
The market for goods and the market for ideas
R. H. Coase
The normal treatment of governmental regulation of markets makes a sharp distinction between the ordinary market for goods and services and the activities covered by the First Amendment—speech, writing, and the exercise of religious beliefs—which I call, for brevity, “the market for ideas.” The phrase, “the market for ideas,” does not describe the boundaries of the area to which the First Amendment has been applied very exactly. Indeed, these boundaries do not seem to have been very clearly drawn. But there can be little doubt that the market for ideas, the expression of opinion in speech and writing and similar activities, is at the center of the activities protected by the First Amendment, and it is with these activities that discussion of the First Amendment has been largely concerned. The arguments that I will be considering long antedate the passage of the First Amendment (which obviously incorporated views already held) and there is some danger for economists, although not necessarily for American lawyers, in confining our discussion to the First Amendment rather than considering the general problem of which it is a part. The danger is that our discussion will tend to concentrate on American court opinions, and particularly those of the Supreme Court, and that, as a result, we will be led to adopt the approach to the regulation of markets found congenial by the courts rather than one developed by economists, a procedure which already has gone a long way to ruin public utility economics and has done much harm to economic discussion of monopoly problems generally. This approach is confining in another way, since, by concentrating on issues within the context of the American Constitution, it is made more difficult to draw on the experience and thought of the rest of the world. What is the general view that I will be examining? 
It is that, in the market for goods, government regulation is desirable whereas, in the market for ideas, government regulation is undesirable and should be strictly limited. In the market for goods, the government is commonly regarded as competent to regulate and properly motivated. Consumers lack the ability to make the appropriate choices. Producers often exercise monopolistic power and, in any case, without some form of government intervention, would not act in a way which promotes the public interest. In the market for ideas, the position
is very different. The government, if it attempted to regulate, would be inefficient and its motives would, in general, be bad, so that, even if it were successful in achieving what it wanted to accomplish, the results would be undesirable. Consumers, on the other hand, if left free, exercise a fine discrimination in choosing between the alternative views placed before them, while producers, whether economically powerful or weak, who are found to be so unscrupulous in their behavior in other markets, can be trusted to act in the public interest, whether they publish or work for the New York Times, the Chicago Tribune or the Columbia Broadcasting System. Politicians, whose actions sometimes pain us, are in their utterances beyond reproach. It is an odd feature of this attitude that commercial advertising, which is often merely an expression of opinion and might, therefore, be thought to be protected by the First Amendment, is considered to be part of the market for goods. The result is that government action is regarded as desirable to regulate (or even suppress) the expression of an opinion in an advertisement which, if expressed in a book or article, would be completely beyond the reach of government regulation. This ambivalence toward the role of government in the market for goods and the market for ideas has not usually been attacked except by those on the extreme right or left, that is, by fascists or communists. The Western world, by and large, accepts the distinction and the policy recommendations that go with it. The peculiarity of the situation has not, however, gone unnoticed, and I would like to draw your attention to a powerful article by Aaron Director. Director quotes a very strong statement by Justice William O. Douglas in a Supreme Court opinion, a statement which is no doubt intended as an interpretation of the First Amendment, but which obviously embodies a point of view not dependent on constitutional considerations. 
Justice Douglas said: “free speech, free press, free exercise of religion are placed separate and apart; they are above and beyond the police power; they are not subject to regulation in the manner of factories, slums, apartment houses, production of oil and the like” (Beauharnais v. Illinois). Director remarks of the attachment to free speech that it is “the only area where laissez-faire is still respectable.” Why should this be so? In part, this may be due to the fact that belief in a free market in ideas does not have the same roots as belief in the value of free trade in goods. To quote Director again: “The free market as a desirable method of organizing the intellectual life of the community was urged long before it was advocated as a desirable method of organizing its economic life. The advantage of free exchange of ideas was recognized before that of the voluntary exchange of goods and services in competitive markets.” In recent years, particularly, I think, in America (that is, North America), this view of the peculiar status of the market for ideas has been nourished by a commitment to democracy as exemplified in the political institutions of the United States, for whose efficient working a market in ideas not subject to government regulation is considered essential. This opens a large subject on
which I will avoid comment. Suffice it to say that, in practice, the results actually achieved by this particular political system suggest that there is a good deal of “market failure.” Because of the view that a free market in ideas is necessary to the maintenance of democratic institutions and, I believe, for other reasons also, intellectuals have shown a tendency to exalt the market for ideas and to depreciate the market for goods. Such an attitude seems to me unjustified. As Director said: “the bulk of mankind will for the foreseeable future have to devote a considerable fraction of their active lives to economic activity. For these people, freedom of choice as owners of resources in choosing within available and continually changing opportunities, areas of employment, investment, and consumption is fully as important as freedom of discussion and participation in government.” I have no doubt that this is right. For most people in most countries (and perhaps in all countries), the provision of food, clothing, and shelter is a good deal more important than the provision of the “right ideas,” even if it is assumed that we know what they are. But leave aside the question of the relative importance of the two markets; the difference in view about the role of government in these two markets is really quite extraordinary and demands an explanation. It is not enough merely to say that the government should be excluded from a sphere of activity because it is vital to the functioning of our society. Even in markets which are mainly of concern to the lower orders, it would not seem desirable to reduce the efficiency with which they work. The paradox is that government intervention which is so harmful in the one sphere becomes beneficial in the other. 
The paradox is made even more striking when we note that at the present time it is usually those who press most strongly for an extension of government regulation in other markets who are most anxious for a vigorous enforcement of the First Amendment prohibitions on government regulation in the market for ideas. What is the explanation for the paradox? Director’s gentle nature does not allow him to do more than hint at it: “A superficial explanation for the preference for free speech among intellectuals runs in terms of vertical interests. Everyone tends to magnify the importance of his own occupation and to minimize that of his neighbor. Intellectuals are engaged in the pursuit of truth, while others are merely engaged in earning a livelihood. One follows a profession, usually a learned one, while the other follows a trade or a business.” I would put the point more bluntly. The market for ideas is the market in which the intellectual conducts his trade. The explanation of the paradox is self-interest and self-esteem. Self-esteem leads the intellectuals to magnify the importance of their own market. That others should be regulated seems natural, particularly as many of the intellectuals see themselves as doing the regulating. But self-interest combines with self-esteem to ensure that, while others are regulated, regulation should not apply to them. And so it is possible to live with these contradictory views about the role of
government in these two markets. It is the conclusion that matters. It may not be a nice explanation, but I can think of no other for this strange situation. That this is the main explanation for the dominance of the view that the market for ideas is sacrosanct is certainly supported if we examine the actions of the press. The press is, of course, the most stalwart defender of the doctrine of freedom of the press, an act of public service to the performance of which it has been led, as it were, by an invisible hand. If we examine the actions and views of the press, they are consistent in only one respect: they are always consistent with the self-interest of the press. Consider their argument that the press should not be forced to reveal the sources of its published material. This is termed a defense of the public’s right to know— which is interpreted to mean that the public has no right to know the source of material published by the press. To desire to know the source of a story is not idle curiosity. It is difficult to know how much credence to give to information or to check on its accuracy if one is ignorant of the source. The academic tradition, in which one discloses to the greatest extent possible the sources on which one relies and thus exposes them to the scrutiny of one’s colleagues, seems to me to be sound and an essential element in the search for truth. Of course, the counterargument of the press is not without validity. It is argued that some people would not express their opinions honestly if it became known that they really held these opinions. But this argument applies equally to all expressions of views, whether in government, business, or private life, where confidentiality is necessary for frankness. However, this consideration has commonly not deterred the press from revealing such confidences when it was in their interest to do so. 
Of course, it would also impede the flow of information to reveal the sources of the material published in cases in which the transmission of the information involved a breach of trust or even the stealing of documents. To accept material in such circumstances is not consistent with the high moral standards and scrupulous observance of the law which the press expects of others. It is hard for me to believe that the main thing wrong with the Watergate affair was that it was not organized by the New York Times. I would not wish to argue that there are not conflicting considerations in all these cases which are difficult to evaluate. My point is that the press does not find them difficult to evaluate. Consider another example which is in many ways more striking: the attitude of the press to government regulation of broadcasting. Broadcasting is an important source of news and information; it comes within the purview of the First Amendment. Yet the program content of a broadcasting station is subject to government regulation. One might have thought that the press, devoted to the strict enforcement of the First Amendment, would have been constantly attacking this abridgment of the right of free speech and expression. But, in fact, they have not. In the forty-five years which have passed since the formation of the Federal Radio Commission (now transformed into the Federal Communications Commission), very few doubts about the policy have been expressed in the press. The press, which is so
anxious to remain unshackled by government regulation, has never exerted itself to secure a similar freedom for the broadcasting industry. Lest you think that I manifest a hostility to the American press, I would like to point out that the British press has acted in a similar fashion. In this case the contrast between actions and proclaimed beliefs is even stronger since what was established in Britain was a government-controlled monopoly of a source of news and information. It might have been thought that this affront to the doctrine of freedom of the press would have appalled the British press. It did not. They supported the broadcasting monopoly, mainly, as far as I can see, because they saw the alternative to the British Broadcasting Corporation (BBC) as commercial broadcasting and, therefore, as involving increased competition for advertising revenue. But if the press did not want competition for advertising revenue, they also did not want increased competition in the supply of news. And so they did their best to throttle the BBC, at least as a purveyor of news and information. When the monopoly was originally established (when it was still the British Broadcasting Company), the BBC was prohibited from broadcasting news and information unless obtained from certain named news agencies. No news could be broadcast before 7 p.m. and broadcasts likely to affect adversely the sale of newspapers faced other restrictions as well. Gradually, over the years, these restrictions were relaxed as a result of negotiations between the press and the BBC. But it was not until after the outbreak of World War II that the BBC broadcast a regular news bulletin before 6 p.m.1 But, it may be argued, the fact that businessmen are mainly influenced by pecuniary considerations is no great discovery. What else would one expect from the money-grubbers of the newspaper world? 
Furthermore, it may be objected, because a doctrine is propagated by those who benefit from it does not mean that the doctrine is unsound. After all, have not free speech and a free press also been advocated by high-minded scholars whose beliefs are determined by what is true rather than by more sordid considerations? There has surely never been a more high-minded scholar than John Milton. As his Areopagitica “for the liberty of unlicensed printing” is probably the most celebrated defense of the doctrine of freedom of the press ever written, it seemed to me that it would be worthwhile to examine the nature of his argument for a free press. Milton’s work has another advantage for my purpose. Written in 1644, that is, long before 1776, we can see the character of the argument before there was any general understanding of how competitive markets worked and before the emergence of modern views on democracy. It would be idle for me to pretend that I could act as a guide to Milton’s thought. I know too little of seventeenth century England and there is much in Milton’s pamphlet the meaning of which I cannot discern. Yet, there are passages which leap across the centuries and for whose interpretation no scholarship is needed. As one would expect, Milton asserts the primacy of the market for ideas: “Give me the liberty to know, to utter, and to argue freely according to
conscience, above all liberties” (p. 44). It is different from the market for goods and should not be treated in the same way: “Truth and understanding are not such wares as to be monopolised and traded in by tickets and statutes and standards. We must not think to make a staple commodity of all the knowledge in the land, to mark and license it like our broadcloth and our woolpacks” (p. 29). The licensing of printed material is an affront to learned men and to learning: “When a man writes to the world, he summons up all his reason and deliberation to assist him; he searches, meditates, is industrious, and likely consults and confers with his judicious friends; after all which done he takes himself to be informed in what he writes, as well as any that writ before him. If in this the most consummate act of his fidelity and ripeness no years, no industry, no former proof of his abilities can bring him to that state of maturity as not to be still mistrusted and suspected, unless he carry his considerate diligence, all his midnight watchings . . . to the hasty view of an unleisured licenser, perhaps much his younger, perhaps far his inferior in judgment, perhaps one who never knew the labour of book-writing, and, if he be not repulsed or slighted, must appear in print like a puny with his guardian and his censor’s hand on the back of his title to be his bail and surety, that he is no idiot or seducer, it cannot be but a dishonour and derogation to the author, to the book, to the privilege and dignity of learning” (p. 27). Licensing is also an affront to the common people: “Nor is it to the common people less than a reproach; for if we be so jealous over them, as that we dare not trust them with an English pamphlet, what do we but censure them for a giddy, vicious, and ungrounded people, in such a sick and weak state of faith and discretion, as to be able to take nothing down but through the pipe of a licenser” (p. 30). 
In the market for ideas, the right choices are made: “Let [truth] and falsehood grapple; who ever knew Truth put to the worse in a free and open encounter?” (p. 45). Those who undertake the job of licensing will be incompetent. A licenser should be, according to Milton, “studious, learned, and judicious.” But this is not what we are likely to get: “we may easily foresee what kind of licensers we are to expect hereafter: either ignorant, imperious, and remiss, or basely pecuniary” (p. 25). The licensers are more likely to suppress truth than falsehood: “if it come to prohibiting, there is not aught more likely to be prohibited than truth itself; whose first appearance to our eyes bleared and dimmed with prejudice and custom is more unsightly and unplausible than many errors . . .” (p. 47). Nor does Milton fail to tell us that the licensing scheme against which he was writing came about as a result of industry pressure: “And how it got the upper hand . . . there was in it the fraud of some old patentees and monopolisers in the trade of bookselling” (p. 50). In the formation of Milton’s views, self-interest may perhaps have played a
part, but there can be little doubt that his argument embodies a good deal of intellectual pride of the kind to which Director refers. The writer is a learned man, diligent and trustworthy. The licenser would be ignorant, incompetent, and basely motivated, perhaps “younger” and “inferior in judgment.” The common man always chooses truth as against falsehood. The picture is a little too one-sided to be wholly convincing. And if it has been convincing to the intellectual community (and apparently it often has), it is surely because people are easily persuaded that what is good for them is good for the country. I do not believe that this distinction between the market for goods and the market for ideas is valid. There is no fundamental difference between these two markets and, in deciding on public policy with regard to them, we need to take into account the same considerations. In all markets, producers have some reasons for being honest and some for being dishonest; consumers have some information but are not fully informed or even able to digest the information they have; regulators commonly wish to do a good job, and though often incompetent and subject to the influence of special interests, they act like this because, like all of us, they are human beings whose strongest motives are not the highest. When I say that the same considerations should be taken into account, I do not mean that public policy should be the same in all markets. The special characteristics of each market lead to the same factors having different weights, and the appropriate social arrangements will vary accordingly. It may not be sensible to have the same legal arrangements governing the supply of soap, housing, automobiles, oil, and books. My argument is that we should use the same approach for all markets when deciding on public policy. 
In fact, if we do this and use for the market for ideas the same approach which has commended itself to economists for the market for goods, it is apparent that the case for government intervention in the market for ideas is much stronger than it is, in general, in the market for goods. For example, economists usually call for government intervention, which may include direct government regulation, when the market does not operate properly— when, that is, there exist what are commonly referred to as neighborhood or spillover effects, or, to use that unfortunate word, “externalities.” If we try to imagine the property rights system that would be required and the transactions that would have to be carried out to assure that anyone who propagated an idea or a proposal for reform received the value of the good it produced or had to pay compensation for the harm that resulted, it is easy to see that in practice there is likely to be a good deal of “market failure.” Situations of this kind usually lead economists to call for extensive government intervention. Or consider the question of consumer ignorance which is commonly thought to be a justification for government intervention. It is hard to believe that the general public is in a better position to evaluate competing views on economic and social policy than to choose between different kinds of food.
Yet there is support for regulation in the one case but not in the other. Or consider the question of preventing fraud, for which government intervention is commonly advocated. It would be difficult to deny that newspaper articles and the speeches of politicians contain a large number of false and misleading statements—indeed, sometimes they seem to consist of little else. Government action to control false and misleading advertising is considered highly desirable. Yet a proposal to set up a Federal Press Commission or a Federal Political Commission modeled on the Federal Trade Commission would be dismissed out of hand. The strong support enjoyed by the First Amendment should not hide from us that there is, in fact, a good deal of government intervention in the market for ideas. I have mentioned broadcasting. But there is also the case of education, which, although it plays a crucial role in the market for ideas, is subject to considerable regulation. One might have thought that those who were so anxious to obstruct government regulation of books and other printed material would also find such regulation in the field of education obnoxious. But, of course, there is a difference. Government regulation of education commonly accompanies government financing and other measures (such as compulsory school attendance) which increase the demand for the services of intellectuals and, therefore, their incomes. (See E. G. West, p. 101.) So self-interest, which, in general, would lead to support for a free market in ideas, suggests a different attitude in education. Nor do I doubt that detailed study would reveal other cases in which groups of practitioners in the market for ideas have supported government regulation and the restriction of competition when it would increase their incomes, just as we find similar behavior in the market for goods. But interest in monopolizing is likely to be less in the market for ideas. 
A general policy of regulation, by restricting the market, would have the effect of reducing the demand for the services of intellectuals. But more important, perhaps, is that the public is commonly more interested in the struggle between truth and falsehood than it is in the truth itself. Demand for the services of the writer and speechmaker depends, to a considerable extent, on the existence of controversy—and for controversy to exist, it is necessary that truth should not stand triumphant and alone. Whatever one may think of the motives which have led to the general acceptance of the present position, there remains the question of which policies would be, in fact, the most appropriate. This requires us to come to some conclusion about how the government will perform whatever functions are assigned to it. I do not believe that we will be able to form a judgment in which we can have any confidence unless we abandon the present ambivalence about the performance of government in the two markets and adopt a more consistent view. We have to decide whether the government is as incompetent as is generally assumed in the market for ideas, in which case we would want to decrease government intervention in the market for goods, or whether it is as efficient as it is generally assumed to be in the market for
goods, in which case we would want to increase government regulation in the market for ideas. Of course, one could adopt an intermediate position—a government neither as incompetent and base as assumed in the one market nor as efficient and virtuous as assumed in the other. In this case, we ought to reduce the amount of government regulation in the market for goods and might want to increase government intervention in the market for ideas. I look forward to learning which of these alternative views will be espoused by my colleagues in the economics profession.
Note
1 For a discussion of the attitude of the press to the monopoly of British broadcasting, see Coase, pp. 103–10 and 192–93.
References
Coase, R. H. British Broadcasting, A Study in Monopoly, Cambridge, Mass. 1950.
Director, A. “The Parity of the Economic Market Place,” J. of Law and Econ., Oct. 1964.
Milton, J. Areopagitica, A Speech for the Liberty of Unlicensed Printing, with introduction and notes by H. B. Cotterill, New York 1959.
West, E. G. “The Political Economy of American Public School Legislation,” J. of Law and Econ., Oct. 1967.
Beauharnais v. Illinois, 343 U.S. 250, 286, 1952.
3
An economic analysis of the law of false advertising
Ellen R. Jordan and Paul H. Rubin *
The past twenty years have produced a substantial literature suggesting the relevance of economic principles to the analysis of legal processes.1 A seminal line of inquiry has evaluated the proposition that the common law, despite its diversity, is founded upon a unifying principle: courts behave rationally as if to achieve economic efficiency.2 To explain why these efficient outcomes are likely, Rubin,3 Landes and Posner,4 and Goodman5 have proposed evolutionary models where efficiency is most likely to result if the opposing parties have symmetric, ongoing interests in cases of the sort being disputed.6 Disputes between competitors, characterized by the law as “unfair competition,” would appear to be an area where the litigants would have the kind of ongoing interest which provides economic incentive to litigate to an efficient outcome. But in apparent contradiction to this expectation, the common law relating to false advertising has been criticized as being inefficient. Assume, for example, that firm A makes false claims about its product and, due to the deception, some consumers purchase the product. Although each consumer may well have a legal action against the seller, no one consumer is likely to have lost enough to make litigation worthwhile.7 Competitors of firm A may also have lost sales because of the deception and might have enough incentive to seek redress in the courts. To the surprise of commentators,8 however, the common law has generally discouraged such suits. 
Even Posner has argued that statutory law may have been necessary to correct the common law’s mistake: “[I]n section 43(a) of the Lanham Trade-Mark Act Congress created a new right of action for competitors injured by misrepresentation that, although little utilized, may well have repaired any deficiencies of the common law in this area.”9 If the common law treatment of false advertising is inefficient, even though the litigants fit the evolutionary models, then those models are thrown into question. Our contention in this paper is that, contrary to appearances, the common law outcome with regard to false advertising is indeed consistent with efficiency. A straightforward application of the economics of information and advertising explains the common law position. Furthermore, a study of the effect of the “correction” by statute sheds further light on the question. If
the common law was in fact inefficient, the Lanham Act and other statutory interventions should have led to increased efficiency in resource allocation. Conversely, if the common law had arrived at optimal rules, the statutes should not have improved upon common law outcomes. In short, a reversal of the common-law rule by statute provides a good test of common law efficiency. In Part I we will consider the economics of advertising, with particular reference to the models of Phillip Nelson. Part II discusses the legal treatment of consumer and competitor suits against misrepresentation. Part III provides some evidence about effects of changing the legal environment. Part IV summarizes our conclusions.
I. Economics of advertising
Until recently advertising presented something of a puzzle to economic theory. Much advertising is patently uninformative: rational consumers should not care what sort of breakfast cereal is eaten by famous baseball players, nor should they expect any relationship between the cleanliness of their clothes and the catchiness of the tune used to advertise a washpowder. Nonetheless, advertisers spend large sums of money on these sorts of messages as well as many others of equal value as information. Rational consumers should not be influenced by such messages, and rational advertisers should not spend money on messages without influence. Economists, believers in rationality by both consumers and producers, were puzzled. Phillip Nelson has offered an explanation for this behavior.10 Nelson argues that there are two types of goods: search goods and experience goods. A search good is one whose salient characteristics can be ascertained by presale inspection (e.g., the comfort of a pair of shoes); experience goods are those which must be consumed to be evaluated (e.g., the taste of a candy bar). Nelson has shown that this distinction can account for many otherwise puzzling aspects of market behavior. The role of advertising differs depending on which type of good is involved. In the case of search goods, where the consumer can and will easily determine for himself whether the goods are what he wants, advertisers have little incentive to misrepresent the quality of their goods. Thus, advertisers simply urge the consumer to make the inspection, and their message should be largely informative and truthful. Note, however, that the time and effort rational purchasers devote to search are directly related to the magnitude of the purchase.11 If a mistake would not be very expensive, the cost of acquiring information is likely to exceed the cost of purchase error.
If, on the other hand, a mistake would be very costly, it pays to invest in reducing the probability of error. In homely terms, one might throw away the wrong color of a $1.00 lipstick, but most people cannot view with equanimity buying another car if they don’t like the first one. Consequently, sellers should provide considerable information on high-risk, big-ticket purchases, since consumers should be willing to pay for it.12
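The search rule just described can be illustrated with a short sketch. It is not part of Nelson's model as published: the function names, the probability of a mistaken purchase, and the cost of search are hypothetical values introduced here, and we assume for simplicity that a mistaken purchase wastes the full price.

```python
def expected_error_cost(price: float, p_mistake: float) -> float:
    """Expected loss from buying without search.

    Assumption: a mistaken purchase wastes the full purchase price.
    """
    return price * p_mistake


def worth_searching(price: float, p_mistake: float, search_cost: float) -> bool:
    """Search pays only when the expected loss from error exceeds the cost of search."""
    return expected_error_cost(price, p_mistake) > search_cost


# Hypothetical numbers: a $1.00 lipstick and a $5,000 car, each with a
# 20 percent chance of disappointing the buyer and a $2.00 cost of search.
print(worth_searching(1.00, 0.2, 2.00))     # lipstick: search does not pay
print(worth_searching(5000.00, 0.2, 2.00))  # car: search pays
```

Under these assumed figures the buyer rationally skips search for the lipstick and invests in it for the car, which is the pattern the text attributes to low-risk and big-ticket purchases.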
In the case of the experience good, the consumer can determine quality only by purchasing and using the good. The function of advertising, therefore, is to get the consumer to try the product. Here, advertisers might have an incentive to mislead and make false claims. For low-priced, often-purchased goods,13 however, it would not pay to make such claims, since the consumer can be fooled by them only once. Hence, a seller of such goods (e.g., soaps, which are very heavily advertised) would appear to be wasting his money to mislead. Nelson argues that advertising of such products does serve an informative purpose: by his outlay, the advertiser demonstrates his confidence that the consumer will be satisfied with the good and will purchase it more than once. In this instance the truth of any information is irrelevant, since the message conveys confidence in the quality of the product and consumers come away with little else. At the other end of the scale, false claims for higher-priced experience goods may result in enough damage to raise a realistic threat of consumer lawsuits, which should adequately deter those falsehoods. Market checks on the efficacy of false advertising break down in that vast range of purchases which produce losses too great to shrug off but too small to sue about. In this area, however, the market does provide some protection for the consumer: reputation in and of itself is valuable as an indicator of responsibility and honesty, qualities likely to produce substantial future benefits to the firm. Conversely, false advertising is likely to provoke negative consumer response. Hence, advertising expenditures can be capitalized much the same as other investment expenditures. 
For example, positive brand recognition contributes to the present value of the firm, but misrepresentation is likely to diminish it.14 Where the individual item sought is not heavily advertised, the consumer may choose to trust an intermediary, such as a department store or travel agent, to insure the quality of what he buys. The department store, in particular, deals in diverse products, not only in the particular merchandise being advertised; thus it stands to lose a great deal by misrepresentation. In short, no matter what kind of goods or services are involved, a decision to advertise conveys information: that the advertiser is willing to invest in his reputation and stands to lose if the customer is unhappy. This argument has especial force in another situation, identified by Darby and Karni.15 Some goods or, more often, services have “credence” qualities, defined as characteristics whose quality cannot be monitored by the nonexpert, even after consumption. For instance, if an automobile repair shop sells a consumer a part which he does not need, he will probably not detect any fraud if his car now runs better, even if it would have run equally well without the part. (Similar examples may be drawn from repair to humans, for example, unnecessary surgery.) Thus, the nonexpert consumer relies on the reputation of the seller, and the seller who has invested in his reputation has more to lose by practicing deception than one which has not.16
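The reputation argument above can be put in rough numerical terms. The sketch below is only an illustration under assumptions of our own: reputation is valued as a perpetual profit stream discounted at a constant rate, and deception, if detected, destroys that stream entirely. The function names and all figures are hypothetical.

```python
def reputation_value(profit_per_period: float, discount_rate: float) -> float:
    """Present value of a perpetual profit stream attributable to reputation."""
    return profit_per_period / discount_rate


def deception_pays(one_time_gain: float, profit_per_period: float,
                   discount_rate: float, p_detect: float) -> bool:
    """Deception pays when the one-time gain exceeds the expected loss of reputation.

    Assumption: detection wipes out the whole future profit stream.
    """
    expected_loss = p_detect * reputation_value(profit_per_period, discount_rate)
    return one_time_gain > expected_loss


# Hypothetical seller: $1,000 gain from one deception, $500 per period of
# reputation-based profit, 10 percent discount rate.
print(deception_pays(1000.0, 500.0, 0.10, p_detect=0.9))  # easily detected claim
print(deception_pays(1000.0, 500.0, 0.10, p_detect=0.1))  # credence quality
print(deception_pays(1000.0, 0.0, 0.10, p_detect=0.9))    # seller with no reputation
```

With these assumed values, deception does not pay for the established seller when detection is likely, but it does pay when detection is unlikely or when the seller has no reputation at stake.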
With respect to advertising, then, the characteristics of goods and services form a continuum, from those in which it is very easy to detect the truth or falsity of advertising claims (search goods: the truth of the claim can be ascertained before purchase) through experience goods (where the truth of the claim can be detected only after purchase and use) through credence goods (where the validity of advertisements may never be determined). As we move along this continuum from search to credence characteristics, misrepresentation becomes relatively more profitable, since detection by consumers becomes more expensive. Conversely, if consumers are aware of this problem, as goods and services acquire more credence characteristics, consumers should begin to rely more heavily on specialized buyers and on brand names as assurances of quality17 in order to avoid being defrauded. Nonetheless, it is in the case of credence characteristics that self-protection becomes most difficult and in which some legal remedy would seem most important. Even if we assume that there is in fact much advertising which is misleading, how much policing of false advertising would be likely if competitors were allowed to perform such policing? For theoretical reasons, we would not expect much. First, in a competitive industry no one firm would lose much from false advertising by competitors and therefore no firm would have much of an incentive to take legal action. Similarly, if a monopolist falsely advertises, there would be no firm which would lose much since, by definition, monopolists have no close competitors. It would only be in the case of oligopolistic industries that there would be any incentive for competitors to sue for false advertising, provided the advertising be about one particular firm’s product. 
If there were false advertising about the merits of the good made by all of the oligopolists, there would again be no incentive for legal action.18 Thus, even in an oligopolistic environment, there would be little gain from allowing policing of misleading advertising since firms have little incentive to undertake policing activities. Moreover, because it would be harder for new entrants to document all claims and because such firms advertise more heavily than established firms,19 we might expect that much of the policing which would occur would be by established brands against new entrants. If threat of suit is used as a barrier to entry, there might be costs of allowing such litigation. Although the data are not extensive, we will provide some evidence on this point in Part III. To summarize, this section has considered the costs and benefits of allowing more private policing of advertising. First, the analysis indicates that all advertising is informative; and even though its informational content may be nominal, the fact that the advertising exists is information in its own right. If advertisers were subject to suit, they might restrict their advertising, which would serve both to reduce information and to decrease the incentive to acquire a good reputation. Second, producers of new products rely heavily on advertising in order to create a market share. Claims of new entrants are particularly difficult to prove (or easy to challenge as being misleading) since such firms have no past history on which to base their claims. Thus, we might
expect that, if competitors’ remedies for misrepresentation became easy, such remedies would be used disproportionately against new entrants into the market, thereby imposing substantial costs on consumers in the form of forgone opportunities for new and innovative products. In short, there appear to be few benefits and sizable costs from policing advertising, especially at the instance of competitors.
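The market-structure argument of this section can be sketched numerically. The following is a simplified illustration, not a model taken from the literature cited: we assume that the profit diverted by a false advertisement is spread equally over the advertiser's rivals and that a rival sues only if its own share exceeds a fixed cost of litigation. The function names and figures are hypothetical.

```python
def loss_per_rival(diverted_profit: float, n_rivals: int) -> float:
    """Share of the diverted profit borne by each rival (equal split assumed)."""
    if n_rivals == 0:
        return 0.0  # a monopolist has no close competitors to injure
    return diverted_profit / n_rivals


def rival_would_sue(diverted_profit: float, n_rivals: int,
                    litigation_cost: float) -> bool:
    """A rival sues only if its own loss exceeds its cost of litigation."""
    return loss_per_rival(diverted_profit, n_rivals) > litigation_cost


# Hypothetical case: $90,000 of diverted profit, $10,000 cost of suit.
for label, n in [("monopoly", 0), ("oligopoly", 3), ("competitive", 1000)]:
    print(label, rival_would_sue(90_000.0, n, 10_000.0))
```

Only the oligopoly case yields a suit under these assumptions. The sketch leaves out the further case noted in the text, in which all oligopolists make the same false claim and none has reason to sue.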
II. The common law’s response to false advertising
Purchasers’ remedies
If consumers are misled by advertising and choose a product because of a falsehood, they have suffered an injury and resources have been misallocated. Accordingly, the common law20 afforded the consumer redress for his injury, predicated on somewhat overlapping theories of breach of contract and misrepresentation. Some recently enacted consumer protection statutes provide additional causes of action, both state and federal. On the theory that the seller has failed to deliver what was promised, the buyer may assert a claim based on the contract of sale. Part of the seller’s obligation is to deliver “conforming” goods,21 and the buyer may reject goods which are not as promised.22 If he has already accepted the goods, a substantial nonconformity entitles him to revoke his acceptance if his failure to reject was either induced by the seller’s assurances or caused by a latent defect.23 The buyer in either case has an action for damages against the seller, computed by determining the amount it will cost him to obtain conforming goods, as well as so-called incidental and consequential damages, including personal injuries.24 If the buyer elects to keep the goods or if he cannot meet the standards for revocation, he may notify the seller that the goods do not conform to what he was promised,25 and may claim the difference in value between the goods as promised and the goods actually delivered.26 He may be barred from any remedy if he fails to notify the seller of his objection within a reasonable time.27 This scheme obviously hinges on determining what was in fact promised or warranted by the seller.
The sales article of the Uniform Commercial Code recognizes that sellers often make express representations about the goods,28 but in addition protects buyer expectations by providing certain implied terms in sales of goods.29 In the case of false advertising, however, the complaint will normally be a breach of an express warranty if the goods do not correspond to the advertising claims made for them. Several roadblocks may confront a plaintiff here. First, he must show that the falsehood in the advertising rose to the level of a warranty or promise. The law requires that the representation in question must become “part of the basis of the bargain,”30 and hence seems to require some showing that the information was material in the purchase decision. Second, the seller is given
considerable latitude to “puff” or extol the virtues of his goods in a general way without incurring warranty obligations.31 Third, if the advertising is sponsored by the manufacturer, as it often is, the buyer may have to overcome the objection that he is not in a contractual relationship with anyone but his immediate seller, who made no false representation. But this privity-of-contract defense seems to be crumbling, particularly when personal injury results, but also where the only loss is monetary.32 Fourth, the consumer must give proper notice or be barred from any remedy.33 A final and very important issue is whether the seller has attempted to limit his liability or restrict the buyer’s remedies by their “agreement,” often the closely printed form presented for the buyer’s signature. The code recognizes that parties may wish to bargain over the risks of product defects and to reflect those risk allocations in the price term. Thus, the seller is permitted to sell without warranties, provided certain requirements are met.34 In the false advertising situation, the seller may attempt to cut back on the claims made by his advertising by offering a much more limited undertaking in his forms. The code insists that such limiting language must be consistent with any express warranty the seller has created,35 and refuses to give effect to attempts to mislead the buyer by promising much in ads but cutting enforceable rights through the forms. One important loophole in such buyer protection is the code’s parol evidence rule,36 which allows seller and buyer to deny effect to representations made before the final agreement is signed. If buyer and seller in fact agree that the sum total of their understanding is what is on the paper they sign, the parol evidence rule causes no great injustice since, if a term is omitted, the signer should have objected before signifying assent.
But where there is no negotiation at all and the buyer is unaware of the necessity to double-check the long, densely printed form, sellers may be able to exclude responsibility for very relevant point-of-sale representations. The seller may also undertake to limit the buyer’s remedies if a warranty has been breached, by offering a so-called “repair or replacement” guarantee.37 The code does seek to protect the buyer, too, providing that if such a limited remedy “fails of its essential purpose,”38 the buyer can assert all the remedies the code otherwise affords. If the buyer can establish that the advertising claim did amount to a warranty that was part of the enforceable bargain, his road to recovery is an easy one. The seller’s liability is absolute and the buyer’s fault, if any, is irrelevant, unless his own conduct and not the misrepresentation is the cause of his injury. If, however, he stumbles on any of the contractual defenses outlined above, he may elect to sue for breach of duty imposed by law, the duty not to deceive. The law of torts has long offered a remedy for misrepresentation on which the plaintiff has relied to his economic detriment. To avoid a too-easy upsetting of transactions, however, this remedy is difficult to obtain and requires proof of the defendant’s fault. In most situations, the purchaser must show the defendant knew or should have known the falsity of his statement and
that he intended to deceive. Furthermore, the plaintiff must demonstrate that he justifiably relied on the misrepresentation and establish the causal connection between it and his damages.39 If the complaint is that the advertisement in question was only a half-truth, the problem becomes more complex since the seller’s duty to disclose has traditionally been very limited.40 If deceit can be proved, the buyer may choose to disaffirm the sale and get his money back, or sue for damages. In egregious cases, punitive damages may also be awarded. Any or all of these remedies could be asserted by a consumer who thinks he has been “taken” by false or deceptive advertising.41 All of them, however, share a common, major disadvantage: the cost of resorting to the courts in many cases far outstrips the relief available, even if the consumer has an airtight case. Hence, excluding those cases where the falsity of the advertised claim results in personal injury, or the magnitude of the loss is large,42 it is not surprising that there are very few cases where consumers have sued. Attempts to lower the barrier of the cost of suit by legislative action are evaluated in Part III.
Competitors’ remedies
False claims
In addition to those consumers who are misled, competitors of the misrepresenting firm are also harmed: some sales which they would have made have been diverted to the other firm. Moreover, we would expect such competitors to lose substantially more than individual consumers. Thus, efficiency considerations suggest that competitors should be permitted to sue for damages or injunctions when false advertising has occurred. This is precisely the point made elsewhere by Posner.43 But the common law did not allow such suits. In American Washboard v. Saginaw Manufacturing Co.,44 the court held that misrepresentation did not in fact provide an action for competitors. This same opinion was upheld by the Supreme Court in Mosler Safe Co. v. Ely-Norris Safe Co.45 In an earlier decision in the same dispute, Judge Hand explicitly stated that “The Law does not allow him [the damaged competitor] to sue as a vicarious avenger of the defendant’s customers.”46 This would seem a curious result, since it is unlikely that the customers would sue in their own behalf; thus, the decisions seemed to indicate that no legal penalty would be imposed on misleading advertisers. What would be the effect of these decisions on consumers? Consumers could not assume that there was any presumption of truth in advertising; rather, they would be forced to rely on their own devices to determine which products to purchase. In the case of search goods, inspection would be relatively more intensive than otherwise. In the case of experience goods, perhaps more sampling would occur than if consumers could believe the accuracy of advertising. In the case of credence goods consumers would be forced to rely
on reputation and intermediaries.47 Thus there would be some efficiency loss if advertising could not be believed. In an optimal world there would be no misleading advertising. As we have argued above, however, suits by competitors would probably have little real impact, except as a new barrier to entry. In disallowing such suits, then, the common law probably had little effect on whether or not advertising is truthful.
Claims about competitors’ product
One form of misrepresentation consists of making unjustified claims about one’s own product. An advertiser may also make unfavorable statements about competitors or their products. This latter claim may be characterized as either disparagement or defamation. Disparagement occurs when the claims refer to goods made by rivals; defamation refers to making claims about the personality or other characteristics of the competitor rather than his goods. This distinction is important:
In most states if a statement is defamatory per se, no actual financial damage need be proved, injury being conclusively presumed; in an action for disparagement, on the other hand, only special damages can be recovered, and usually they must be alleged and proved with considerable specificity. This requirement has been particularly troublesome since injunctions classically have been unavailable against either disparagement or defamation. Other differences also tend to make disparagement the more difficult path to recovery: in disparagement the plaintiff must prove the defendant’s statement false, whereas in defamation the defendant bears the burden of proving truth; in disparagement, unlike defamation, “malice” is a requisite to recovery.48
These distinctions have been puzzling to commentators on the law.49 Economic analysis, however, can explain the common law’s greater solicitude for competitors who claim defamation rather than disparagement.
If firm A claims that firm B is owned by a devil worshipper, this will cost firm B money (in forgone sales) if customers believe the claim and prefer not to do business with devil worshippers.50 There may be no efficient way for consumers to ascertain the truth of the devil-worshipper claim, for it is a credence characteristic. In contrast, if firm A claims that the product of firm B will not work well, then consumers can presumably determine the truth of this claim for themselves. Thus the relative ease with which a plaintiff may make out a case for defamation as compared with making out a case of disparagement, which appears mysterious when considered in terms used by courts, may be explicable in terms of the ability of consumers to determine truth and, hence, discount disparagement more than defamation. It is at least arguable that this distinction makes sense in economic terms. Specifically, false claims about product quality are claims about search or experience
characteristics where consumers will not be misled. Claims about the personal characteristics of manufacturers, on the other hand, are claims about credence qualities, which cannot be verified and where the legal process may be the most efficient way to lay false claims to rest. In cases where an action for disparagement has been permitted, there is some evidence consistent with this contention. In at least some of these cases, the disparaging statements were not of the sort that consumers could verify for themselves; rather, such statements often dealt with actions of some third party. In Testing Systems, Inc. v. Magnaflux Corp.,51 part of the disparaging statement dealt with the product of Testing Systems; but in addition, it was alleged that the government had decided not to continue to use the product. While it may be argued that consumers should be able to determine for themselves the quality of the product, it seems less plausible that they would be able to determine what decisions the government had made about the product. Similarly, in Black & Yates, Inc. v. Mahogany Association, Inc.,52 at least one of the disparaging statements made about “Philippine mahogany” by the association was that the Federal Trade Commission was about to rule that Philippine mahogany could no longer be called mahogany; again, consumers could probably not check the validity of this sort of statement.
Producer identity
The argument thus far has been that misleading advertising will not have much of an effect, either because it can be checked by purchasers (in the case of search goods) or because the message contained in advertising is basically irrelevant (in the case of experience goods). Therefore the common law’s reluctance to entertain actions based upon such advertising is explicable in economic terms. To further strengthen this argument, consider an area in which the common law has generally provided injunctions against misrepresentation.
This is the area of “passing off”—of misrepresenting the manufacturer of goods. In cases where B sells goods and claims that they were made by A, then the courts have not hesitated to enjoin this action.53 The original common law of trademark was aimed precisely at this practice. The economic value of identification of the manufacturer is obvious. A producer will not invest in gaining a reputation by producing high-quality goods unless he can be sure of capturing the value of the brand name.54 Trademark identification, moreover, imparts useful information to the consumer—information whose falsity would be difficult to detect. The name of the producer is perhaps the most important credence characteristic of a good. For example, when a consumer buys a pair of blue jeans marked “Levis” he must rely totally on the truth of the label; it is completely impossible for him to trace back the movement of the goods from retailer to wholesaler to the manufacturer and check this claim. The firm should be permitted to establish a property right in the brand name of the product inasmuch as that right will lead to efficient investment in quality and
reliability. Presumably if firm A spends money in advertising its product, it is to tell consumers that products made by this firm are worth buying— information which is useless unless consumers are able to tell which products are in fact made by firm A. Therefore efficiency considerations dictate that the law should protect trademarks as the common law has done. Another interesting area is the protection of geographic designation. In Grand Rapids Furniture Co. v. Grand Rapids Furniture Co.,55 the courts held that a furniture store could not advertise that its furniture was made in Grand Rapids when in fact it was not; this is consistent with other cases, such as Pillsbury-Washburn Flour Mills v. Eagle Co.56 But in California Apparel Creators v. Wieder of California,57 the court did not enjoin New York clothing manufacturers from using the name “California” in their business. The court distinguished these cases on the basis of the loss involved—in the Grand Rapids case the Grand Rapids manufacturers had a tight trade association, and virtually all of them were involved in the case; thus the court reasoned that any business diverted by the misrepresentation would have otherwise gone to these manufacturers and that therefore they had a substantial interest in the injunction. In the California case there were approximately 4,500 manufacturers in California, only a few of whom were represented in the suit. Here the courts believed that losses were not significant enough to the plaintiffs to grant relief. This reasoning does not make economic sense. If consumers are misled about the place of origin of goods and if the place of origin is relevant, then it should not matter whether there are 5 or 5,000 manufacturers who lose by the misrepresentation; efficiency would require an injunction in either case. However, though the reasons provided by the court are not useful, the decision is nonetheless probably correct. 
Consumers presumably care, not about the actual location of the manufacturing plant, but about the reputation of the firm or manufacturer. If there are few enough firms so that they can form a tight trade association and sue as a class, then there are probably few enough manufacturers so that they can self-police the use of the geographic name and thus insure the quality of the goods bearing this name. Conversely, when there are so many firms that they cannot all join in a class for the purposes of legal action, there are probably too many to guarantee that any level of quality can be maintained by the firms bearing the geographic name. Hence, when there are many firms suffering the supposed loss, consumers probably do not rely on the name as a proof of quality, and thus there is not any loss in not protecting the mark. This seems to be a situation where the law has reached the economically correct decision, though the judicial reasoning is couched in other terms. There are other areas of the law of producer identity which are subject to economic analysis, if not to economic solution. For example, consider the law of “trade names.” A trade name at common law was a name or descriptive word with some meaning in addition to its use to identify a particular brand. An example is given in American Aloe Corp. v. Aloe Creme Laboratories,
Ellen R. Jordan and Paul H. Rubin
Inc.;58 here, a company that produced products derived from the aloe vera plant with names such as Aloe-Creme and Aloe-Ointment brought suit against a competitor producing products with names such as Aloe Essence. The issue was whether the name Aloe was subject to trademark-type protection; the courts ruled that it was not. The economic analysis is reasonably straightforward: the cost of maintaining “aloe” as a trademark or trade name would have been the information lost to consumers about ingredients in competing products; the benefit would have been the relatively greater incentive of the first company to protect its reputation. Economic analysis indicates that protection should be given to the point where marginal cost equals marginal benefit, that is, to the point where the incremental value of the greater incentive to protect reputation would just equal the incremental value of the additional information possessed about ingredients by consumers. Economic theory alone merely states the question; much as proof of facts is necessary to decide legal issues, concrete empirical data are required to estimate where this point lies. Therefore, it is not surprising that courts must consider each case on its facts, and have not derived clear-cut rules in this area. Similar issues arise in determining what symbols associated with a product should be protected as trademarks (e.g., the picture of a shredded wheat biscuit in Kellogg Co. v. National Biscuit Co.).59 Again the courts seem to have recognized the economic issue involved and attempted to deal with it. It is impossible to determine how successful they have been.
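The marginal condition for trade-name protection can be written out as a small optimization sketch. The notation is ours and does not appear in the original discussion: s is a hypothetical index of how broadly a descriptive name is protected, B(s) is the value of the stronger incentive to protect reputation, and C(s) is the value of the ingredient information lost to consumers. We assume both functions are differentiable, with B concave and C convex, so that the first-order condition identifies the optimum.

```latex
\[
\max_{s \ge 0} \; W(s) = B(s) - C(s),
\qquad B'(s) > 0,\; B''(s) \le 0,\; C'(s) > 0,\; C''(s) \ge 0
\]
\[
W'(s^{*}) = 0 \;\Longleftrightarrow\; B'(s^{*}) = C'(s^{*})
\quad \text{(interior optimum, } s^{*} > 0\text{)}
\]
\[
B'(0) \le C'(0) \;\Longrightarrow\; s^{*} = 0
\quad \text{(corner case: no protection)}
\]
```

Under these assumptions the theory fixes only the form of the answer. Locating s* requires estimates of B' and C', which is the empirical question the text says courts face case by case.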
III. Some evidence
Our argument has been that allowing competitors to sue for misrepresentation would have been inefficient, except in the case of producer identity. If suits were permitted, gains would have been minimal, inasmuch as market forces substantially limit the likelihood that firms engage in false advertising. In addition, the increased danger of suit would increase the risk, and thus the cost, of truthful advertising. Finally, such suits may have been used to restrict entry by new competitors, thus creating a net loss in social welfare. To bolster our conclusion that the common law perhaps struck the best balance, we have examined the results of various steps taken to remedy the “deficiencies” of the common law in this area. First, government has assumed responsibility for consumer protection from false and misleading advertising. Second, legislation has been passed to encourage purchasers and competitors to undertake policing of false advertising. Finally, evidence is available from the German experience, where competitors are permitted to sue rather freely.
The Federal Trade Commission
Richard Posner has analyzed in detail Federal Trade Commission cases dealing with misrepresentation and false advertising for 1963, 1968, and
The law of false advertising
1973. His conclusion is “. . . that only a small fraction of the Federal Trade Commission’s activities in the false-advertising area is consistent with a proper allocation of commission resources, considering the character of the false-advertising problem and the limitations of the commission’s sanctions.”60 Posner blames this poor effort on the lack of a theory by the commission as to where intervention would be likely to prove valuable. However, an alternative theory which is also consistent with his evidence has been developed here: there is simply not much harmful false advertising which occurs; therefore, commission intervention would be unproductive, no matter what rule it might apply. (In fact, this argument is consistent with Posner’s other policy suggestion, which is that the commission be given powers to enforce a federal antifraud law and that it confine its efforts to deliberate attempts to defraud.) The uselessness in economic terms of most FTC proceedings may be due to ineptness on the part of the commission; but it may also be due simply to the small population of costly deceptions which occur. Both the FTC and commentators have explained the FTC’s poor performance by pointing out that the only sanction formerly available to the FTC, the cease-and-desist order, was not effective against consumer deception.61 That problem has now been remedied; the FTC has been granted vastly greater enforcement powers by the Magnuson-Moss Act of 1975.62 In addition to validating FTC power63 to promulgate trade regulation rules which have the force of law,64 the act provides stiff penalties for violations. 
The FTC may now sue and obtain an injunction,65 sue to exact a penalty for the violation of an existing cease-and-desist order by anyone (not merely the named respondent) who does so with actual knowledge that the act was unfair and unlawful,66 or sue for violation of an FTC legislative (as distinguished from interpretative) rule.67 Fines of up to $10,000 per day may be imposed for each violation. The act also empowers the FTC to seek redress for competitors and consumers if it can show that the act or practice “. . . is one which a reasonable man would have known under the circumstances was dishonest or fraudulent.”68 The FTC may seek rescission, reformation, return of property, and damages as relief for competitors or consumers.69 Although no private suit may be brought for violation of the FTC act,70 several states have passed little FTC acts71 or other laws against deceptive trade practices, which provide private actions for consumers or competitors, often offering incentives to encourage private enforcement.72 It remains to be seen whether a more potent FTC will “clean up” a marketplace in which false advertising has run rampant and unchecked.
New remedies for purchasers
Recognition that a rational person will not spend money for a lawyer, court fees, and whatever his own time is worth to collect minuscule amounts has led to a number of proposals to make it easier to assert rights. Historically the first efforts were to lower the cost of suit by establishing informal small claims
courts.73 Although many have noted that these courts serve more as a collection agency against the consumer than as a forum for him to present his grievances,74 consumers who do go to these courts to seek justice often do very well.75 But even consumers who make use of these courts do not seek redress for misrepresentation, except in such small numbers that they are not separately classified in a major empirical study of small claims courts.76 Moreover, it is impossible to tell whether those consumers who alleged misrepresentation were complaining about advertising or about point-of-sale inducements by sellers. Another response to the problem of the high cost of dispute resolution has been to set up alternatives to courts, such as governmental arbitration and mediation services.77 A recent study78 of the workings of one very active Bureau of Consumer Protection, located in Illinois, remarked that only 18 percent of complainants mentioned misrepresentation, deception, or undue influence by the seller as the primary grievance.79 As might be expected, larger transactions (in dollar amounts) generated more complaints of this sort.80 Perhaps if false advertising is in fact a problem, buyers still prefer to absorb their small losses rather than to invest in these proceedings. 
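The calculus behind the rational person who will not sue over a small loss can be sketched in a few lines. This is only an illustration under assumptions of ours: the function names, the risk-neutral decision rule, and every dollar figure are hypothetical and are not drawn from the studies cited.

```python
def expected_net_gain(loss, win_probability, legal_costs, time_costs):
    """Expected payoff from suing to recover `loss` (hypothetical model).

    Assumes the buyer recovers exactly his loss if he wins, nothing if he
    loses, and bears his own legal and time costs either way.
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must lie between 0 and 1")
    return win_probability * loss - (legal_costs + time_costs)


def would_sue(loss, win_probability, legal_costs, time_costs):
    """A risk-neutral buyer sues only if the expected net gain is positive."""
    return expected_net_gain(loss, win_probability, legal_costs, time_costs) > 0


# Illustrative figures only: a $20 loss against $50 of fees and time.
small_claim = would_sue(loss=20, win_probability=0.9, legal_costs=30, time_costs=20)
# The same costs set against a $500 loss.
large_claim = would_sue(loss=500, win_probability=0.9, legal_costs=30, time_costs=20)
print(small_claim, large_claim)
```

With these made-up numbers the small claim is dropped and the large one is pursued, which is at least consistent with the observation that larger transactions generated more complaints of misrepresentation.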
Finally, some consumer advocates have argued for a system which provides incentives for private enforcement of consumer claims, such as multiple damage recovery, exemplary damages, and award of attorneys’ fees to a victorious consumer.81 Along these same lines, others have urged that consumers (or their legal representatives) be able to sue in a class action as representatives of all consumers who have been victimized.82 But the dynamics of litigation and its costs may shift the balance too far: groundless suits become too easy to bring and too expensive to defend against, offering the unscrupulous a kind of blackmail against legitimate businesses that may prefer to settle for some sum rather than to defend.83 Some state consumer legislation has attempted to build in safeguards against so-called “strike suits,” while still encouraging private enforcement of meritorious claims. In Oregon, for instance, the defrauded consumer who proves a willful violation can collect his actual damages or $200, whichever is greater, as well as punitive damages and attorneys’ fees and costs.84 Even where generous bounties such as these are available, however, research discloses only two cases in which consumers complaining about false advertising availed themselves of the provisions of the act. Interestingly, in both cases, the goods would seem to be search goods. In one85 the consumer wanted a tent for use in the winter, which requires special characteristics. Had the purchaser taken the precaution of inspecting a tent on display or opening the package in the store, he would have seen that the tent in question did not match the picture on the package and would not serve his purposes. In the other86 the consumer bought an automobile engine at a gas station. The size of an automobile engine seems to be a matter that would be of sufficient importance to the purchaser to expect him to inspect before he buys. Regardless of whether these cases should have been brought at all, two reported
cases in seven years indicate that even where consumers have been offered every inducement to come forward, no great flood of litigation seems to have engulfed the courts.
The Lanham Act
The Lanham Act, passed in 1946, greatly broadened rights of competitors to sue for misleading advertising.87 A test of the hypotheses in this paper would be to examine cases under this act in order to determine the efficiency effects of this broadened right of action. We have undertaken such an examination. All 182 cases citing 15 U.S.C. § 1125(a) reported in Shepard’s 1970 edition and 1972 and 1975 supplements were read. The vast majority of these cases were simply standard trademark infringement actions, coupled with a claim of unfair competition under the Lanham Act. Of the 182 cases cited, 16, or fewer than 10 percent, could be classified as complaints about a competitor’s false advertising. (Excluded from this total were a number of complaints that a competitor’s advertising used a picture of the plaintiff’s goods, since such situations are merely variations of the tort of “passing off.”) Those 16 cases can be further subclassified. Perhaps most important, from an economic viewpoint, are those claims which consumers cannot easily verify by inspection or inexpensive experience (credence qualities), and where the amount of injury is so small that no individual purchaser will resort to the courts. In such instances, deceptive advertising may be profitable and result in harm to honest competitors, as well as misallocation of resources. Six of the sixteen cases at least arguably fall into this category. Perhaps the clearest case for a competitor’s remedy is John Wright, Inc. v. Casper Corp.88 Two rivals in the sale of mechanical penny banks were the litigants. The plaintiff was the successor in interest to a firm which had gone to trouble and expense to produce authentic detailed reproductions of nineteenth-century banks.
The defendant produced inexpensive imitations of the plaintiff’s banks on Taiwan yet advertised them as “authentic reproductions.” Those consumers to whom authenticity and faithfulness to the original were important were being deceived, the true source of authentic reproductions was being deprived of sales, and yet no consumer would find it worth his while to sue. Equally compelling is Skil Corp. v. Rockwell International Corp.,89 where the plaintiff complained that the defendant falsely and extensively advertised the results of tests conducted by an independent testing concern which compared the defendant’s power tools with the plaintiff’s and those of two other manufacturers. The credence quality here was the impartiality of the investigator; consumers might well place more faith in the result of a third party’s testing than they would in a claim originating with the seller. If those test results were misrepresented, both the consumer and other manufacturers would suffer injury. Actual injury to any one purchaser would be too small
to justify litigation, although the consumer would not be getting what he wanted. A competitor’s remedy also seems sensible in Bohsei Enterprise Co. v. Porteous Fastener Co.90 In that case an importer of industrial fasteners complained that its rivals repackaged imported fasteners and either omitted the true country of origin or on occasion marked them “United States,” thereby creating the false impression that these were domestic rather than imported goods. The complaint also alleged that consumers associate domestic manufacture with higher-quality goods.91 If the allegations were true, the defendants were violating federal law92 as well as misrepresenting their goods; in addition, they were diverting sales from other importers and those domestic manufacturers who would otherwise be preferred. Again, individual purchasers would have too small a stake to assert any claim. A fourth case presents a slightly different situation. In Cutler Hammer, Inc. v. Universal Relay Corp.,93 the defendant had allegedly bought surplus relays manufactured by the plaintiff and changed the part numbers to make it appear that they met current military specifications for aircraft use. In this instance, however, the court noted that reliance on the misrepresentation could cause death or personal injury. In a sense the seriousness of the consequences militates against the need for a competitor’s remedy, since the large potential liability to injured consumers should itself discourage representations of this sort. A case which provides shakier support for a competitor’s remedy is Natcontainer Corp. v. Continental Can Co., Inc.94 The misrepresentation alleged was that Natcontainer’s boxes conformed to minimum standards established by the Interstate Commerce Commission when in fact they did not. Here the boxes were sold to businesses, rather than consumers, in such quantities as to make a breach-of-contract action feasible, although no such actions appear in the record.
Indeed, the Lanham Act claim was asserted as a counterclaim by the defendant in an antitrust action brought by Natcontainer against one of its rivals. In this case, the Lanham Act complaint served as a weapon to discourage antitrust enforcement. Perhaps the weakest case for a competitor’s remedy, even where credence qualities were involved, was presented by Ames Publishing Co. v. Walker-Davis Publications, Inc.95 There, one magazine publisher complained about another’s attempts to launch a rival magazine. In the defendant’s presentation to potential advertisers, circulation projections were asserted as facts, and verification of subscriber lists was claimed but never done. Although one would think that a competitor’s remedy would be the only viable one here, the defendant was able to show that the plaintiff had engaged in exactly the same conduct when promoting its magazine. The court refused to recognize an “unclean hands” defense and granted an injunction. Ubiquity of the practice, however, may indicate that potential advertisers presented with these claims by all competitors would not be misled by one any more than by any other, and in fact might discount them all.
On the other end of the scale, several cases involve claims that would surely be verified by consumers before purchase. For instance, Saxony Products, Inc. v. Guerlain, Inc.,96 was part of a long-running battle between name-brand perfume manufacturers and those who market much cheaper perfumes as indistinguishable from them. When defeated in their claim that the copyists were impermissibly using their trademarks,97 the name-brand manufacturers countered by alleging that the claim that the scents were similar was false and hence a Lanham Act violation. Consumers would clearly rely on their own olfactory sense and not on an advertising claim in this instance, since they were invited to make the comparison at the point of sale. Any falsehood would be immediately apparent, making proof of damage difficult and injunctive relief superfluous. The only possible explanation for such an action would be harassment of a cut-rate competitor. In similar fashion Bose Corp. v. Linear Design Labs, Inc.98 involved complaints about statements that the defendant’s stereo system produced the “most life-like reproductions” and the “most exacting reproductions ever heard.” A potential purchaser of a stereo system will surely rely on his own ears, not on advertising copy, in such judgments. Any such statements, which can only be classified as seller’s puff, can hardly produce harm to anyone, and litigation about them must be viewed as harassment of a newcomer. What can we conclude from these cases? First, there have been extremely few of them—sixteen cases over a ten-year period. Part of the reason may be that competitors, rather than suing directly, would prefer to use the indirect, but free, remedy provided by complaint to the Federal Trade Commission. 
In the period from October 1977 to July 1978 there were 306 complaints filed with the FTC alleging deceptive advertising.99 The FTC will not divulge the identity of complainants, so we are unable to determine whether these were filed by consumers or by competitors. Nonetheless, it is at least possible that this method is used. But for whatever reason, the small number of complaints filed with the commission and the much smaller number of Lanham Act cases indicate that there was not in fact a huge demand for legal remedies for misrepresentation which the common law was restraining. Moreover, we have been charitable in deciding which cases might arguably have been efficient; none of the cases mentioned above seems to create large inefficiencies. The cases which we have not discussed (dealing with matters such as whether a hair rinse really rinses out easily100 or whether a cigarette can advertise that it is “lowest” in tar and nicotine when in fact it is only tied for lowest)101 also do not seem to be of significance. Those Lanham Act cases which do seem to involve substantial inefficiencies are generally passing-off cases, and remedies would have been available under common law. Moreover there is at least some evidence that firms have used the Lanham Act to harass new entrants. For example, in Smith-Victor Corp. v. Sylvania Electric Products, Inc.,102 the allegation was that a new type of photographic light would not in fact provide as much light as claimed. Since the two lights could be compared before purchase, this was clearly an attempt to harass a
new competitor, who was at somewhat of a disadvantage since he did not have a long history of success with light. Defendants were clearly new entrants in seven of the sixteen cases selected for study. For instance, in Saxony Products, Inc. v. Guerlain, Inc.103 it was the newcomer’s frank and open imitation of plaintiff’s “Shalimar” fragrance which led to the litigation. Cheap imitations were also the issue in John Wright, Inc. v. Casper Corp.104 In Ames Publishing Co. v. Walker-Davis Publications, Inc. 105 the advertising attendant on launching a new publication led to the suit. Bose Corporation, an established manufacturer of stereo equipment, challenged the claims of newcomer Linear Design Labs, Inc.106 Likewise, H. A. Friend & Co. v. Friend & Co. 107 involved a son who set up an unauthorized “branch” of the family business in California and began competing with it. The Potato Chip Institute took offense when General Mills began marketing its “Chipos,” and tried to stop any advertising of the new product as potato chips.108 One of the most interesting of the cases involved Honeywell, Inc., a giant in the field of control systems, which was being sued by Electronics Corporation of America. Honeywell, in attempting to enter the market for replacement parts for Electronics Corporation equipment, was charged with misrepresenting the ease with which Honeywell parts could be substituted. Here Honeywell was trying to enter a new, although very limited, market and was advertising a “search” characteristic.109 Other cases, however, indicate that two already established brands may extend their competitive battle into the courts.110 In American Brands, Inc. v. R. J. Reynolds Tobacco Co.,111 instead of being used against the new entrant, the lawsuit seemed to be part of the challenger’s campaign to launch a new low-tar cigarette. 
The new entrant charged in its Lanham Act complaint that advertising for the defendant’s established Now brand falsely claimed that Now was “lowest” in tar when in fact the plaintiff’s new cigarettes were as low. Defendant promptly counterclaimed for inaccuracies in the plaintiff’s advertising (e.g., that plaintiff’s Carlton brand was the “fastest growing”). As the judge remarked, actual tar content of cigarettes must be disclosed and is easily discovered by purchasers, making it unlikely that anyone had been misled. Nonetheless, an injunction was granted directing Reynolds to stop claiming its cigarettes were “lowest” in tar without some qualifying information. In the remaining cases, it is difficult to ascertain the relative market positions of the plaintiff and defendant.112 The foregoing suggests that although Lanham Act claims may be asserted to block new entrants, they are used at other stages in the competitive struggle as well. In any case, claims of misrepresentation seem to be used as a method of harassing rivals rather than as a method of encouraging truthful statements.
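The counts reported in this section can be collected in one place to check the shares they imply. The snippet below only restates figures given in the text; the variable names are ours, and treating the sixteen false-advertising cases as the base for the two subgroups is our reading of the discussion.

```python
# Counts as reported in the text for cases citing the Lanham Act provision.
total_cases = 182            # all cases read
false_advertising = 16       # complaints about a competitor's false advertising
credence_cases = 6           # arguably credence qualities with small injuries
new_entrant_defendants = 7   # defendant was clearly a new entrant

share_false_advertising = false_advertising / total_cases
share_credence = credence_cases / false_advertising
share_new_entrants = new_entrant_defendants / false_advertising

print(f"False-advertising share of all cases: {share_false_advertising:.1%}")
print(f"Credence cases among the 16: {share_credence:.1%}")
print(f"New-entrant defendants among the 16: {share_new_entrants:.1%}")
```

The first share comes to just under 9 percent, matching the statement that these complaints were fewer than 10 percent of the cases read. The other two shares rest on only sixteen cases, so they describe the sample and carry little statistical weight.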
The German experience
Rudolf Callmann113 and Walter Derenberg,114 two leading authorities on the American law of unfair competition, were strong advocates of strengthening the rights of competitors to sue for false advertising. Both had experience with the German system, which depends on competitor action, not government regulation, to prevent deception in the marketplace. More recently Grimes115 has studied the German system and holds it up as a model. Although we have not independently assessed the data summarized by Grimes, some observations based upon it follow. In Germany the Law Against Unfair Competition has governed the regulation of misrepresentation and false advertising since 1909. The important section of this law, section 3, “prescribes injunctive relief, [which is freely granted]116 against those who, in the course of competitive business activities, make deceptive assertions concerning business matters, specifically concerning the ‘nature, origin, manner of manufacture or the value of goods or services,’ ” or “concerning ‘price lists, the manner or source of acquisition of goods, . . . the receipt of awards, . . . the motive or purpose of the sale, or the amount of available supplies.’ ”117 This law may be enforced by competing firms, by trade associations, or (since 1965) by consumer groups. In fact consumer enforcement seems to be a small part of total enforcement. The most influential group in enforcing this law is the Zentrale zur Bekämpfung unlauteren Wettbewerbs e.V., an organization consisting of more than 1,100 members, mostly chambers of commerce, trade organizations, and larger business entities. In 1967 this organization handled about 3,000 cases. Almost 93 percent of the complaints brought by this organization are directed against nonmembers. This is consistent with our argument that the larger, better-established firms who have invested in their reputations are less likely to practice deception.
Indeed, in the United States, the worst offenders in consumer frauds are said to be the fly-by-night door-to-door operators who strike once and disappear.118 Nevertheless an economist might question whether all the organization’s energies are directed against such truly fraudulent practices. In fact “[t]he Zentrale has been accused of being a tool of the large and established firms.”119 Presumably, one use for such a tool might be to stifle competition by nonmembers, particularly smaller firms or new entrants. We have no direct evidence on this point, but in general the value of competition is less important in European jurisprudence. European law countenances anticompetitive collusion by business rivals which would be per se violations of American antitrust laws.120 Hence an adverse effect on competition would raise fewer problems for German courts than in this country.
IV. Summary
The major issue discussed in this paper has been the common law treatment of suits by competitors for false advertising consisting either of untrue claims about one’s own goods or about those of a competitor. In general the common law discouraged such suits, except in the special case of passing off. Although purchasers may have a right of action, in the general case consumers will not lose enough in any one transaction to bring legal action for misrepresentation. Competitors, however, may suffer enough loss to make legal action worthwhile, and commentators have urged that suits by competitors may be a better remedy for misrepresentation than government action. Thus the common law ban on such suits has been considered puzzling and potentially in conflict with the view that the common law is generally explicable in terms of economic efficiency. Except in the case of producer identity, our analysis casts doubt upon a need for any legal action for competitors premised on false advertising. We have shown that the economics of advertising, primarily as analyzed by Nelson, indicates that there generally will be little incentive to mislead in advertising.121 Moreover in only a few cases would there be competitors who were sufficiently damaged by misleading advertising to find it worthwhile to sue. The one situation where such suits are favored is in the case of the actual name of the manufacturer; this is a credence characteristic, whose truth consumers could not ascertain for themselves. If there is in fact little misleading advertising and if competitors would not usually sue, then there would be little loss if such private policing were discouraged. Moreover it is at least possible that allowing competitor suits for misrepresentation would give established firms a competitive weapon to use against new entrants.
This is especially likely when we consider that new entrants advertise more heavily than established firms and that it may be more difficult for new firms to establish the truth of advertising claims than for established firms to do so. Thus if we did allow firms to sue easily for misrepresentation, there might be substantial costs in terms of reduced competition in markets. The economic argument indicates that there would be little benefit and substantial costs from allowing competitor suits. In support of these arguments we have relied on several pieces of evidence. First, Posner has found that the Federal Trade Commission in its attempts to police false advertising has in fact accomplished little. Second, the evidence from Germany indicates that most suits for misrepresentation are undertaken by established firms. While there are other arguments which might explain this pattern, it is consistent with the hypothesis that such suits would be used to discourage competition, especially given the much lesser importance of competition as a value in German law. Third, and most persuasive, is the evidence showing the effects of statutory reversal of this position. Although the Lanham Act now allows competitors to sue for false advertising which damages them, there have been a trivial number of such suits. Of those which
have been brought, moreover, only an extremely small number have been economically efficient, even with a broad construction of such efficiency. Many have been aimed at new entrants. Thus, this natural experiment with a statutory reversal of a common law position has served to buttress our theoretical arguments and to demonstrate that the common law was probably efficient in this area.122 Another consideration is the burden on advertisers if all advertising is scrutinized: even accurate advertising must run the risk of a charge of being “misleading.” If such burdens exist, the law may hinder efficient consumer choice since more information for consumers is better than less. The First Amendment to the United States Constitution clearly reflects that judgment when the information in question is political. On the other hand the Supreme Court only recently decided that commercial speech has any claim to First Amendment protection,123 and then the Court took pains to note that “untruthful speech, commercial or otherwise, has never been protected.”124 In the terms developed here, however, competing political claims seem to exhibit more credence characteristics than most claims about goods or services in that their truth or falsehood is much harder for the consumer to determine. Yet even if the need to guard against falsehood is greater, the government is prohibited from intervening because of the dangers and distortions such intervention might cause. As both Coase125 and Director126 have pointed out, it is difficult to understand why governmental intervention in the marketplace for goods is regarded by many as a blessing, while its role as a regulator of ideas is so vehemently opposed. Market, not government, regulation in the realm of commerce as well as ideas may be a better idea.127
Notes *
1 2 3 4 5 6 7
The authors would like to thank Richard Posner, Walter Hellerstein, Gregory Alexander, James Kau, and Carl Jordan for helpful comments. An earlier version of this paper was presented at the Law and Economics Workshop at the University of Chicago, and thanks are also due to the participants, especially Edmund Kitch. That literature is summarized in Richard A. Posner, Economic Analysis of Law (2nd ed. 1977). Different dynamics operate to shape the law created by legislatures. No economist has suggested that statutory law is economically efficient. Paul H. Rubin, Why Is the Common Law Efficient? 6 J. Legal Stud. 51 (1977). William M. Landes & Richard A. Posner, Adjudication as a Private Good, 8 J. Legal Stud. 235 (1979). John C. Goodman, An Economic Theory of the Evolution of the Common Law, 7 J. Legal Stud. 393 (1978). Similar evidence has been marshaled by Marc Galanter. See Marc Galanter, Why the “Haves” Come Out Ahead: Speculations on the Limits of Legal Change, 9 Law & Soc. Rev. 95, 100–102 (1974). “In its Final Report the National Commission on Product Safety stated that it was impractical for consumers to press claims . . . unless the claim is at least
38
8 9 10 11 12 13 14
15 16 17
18
19
Ellen R. Jordan and Paul H. Rubin in the $5,000 to $10,000 range.” D. Gould, Staff Report on the Small Claims Court submitted to the National Institute for Consumer Justice 16 (1972) [hereinafter cited as Small Claims Report]. See also Arthur Allen Leff, Injury, Ignorance, and Spite: The Dynamics of Coercive Collection, 80 Yale L.J. 1, 21 (1970). See, e.g., Milton Handler, False and Misleading Advertising, 39 Yale L.J. 22 (1929); Rudolf Callmann, False Advertising as a Competitive Tort, 48 Colum. L. Rev. 876 (1948). See Richard A. Posner, The Federal Trade Commission, 37 U. Chi. L. Rev. 47, 66 (1969). Phillip Nelson, Information and Consumer Behavior, 78 J. Pol. Econ. 311 (1970); and also his Advertising as Information, 82 J. Pol. Econ. 729 (1974). See Richard H. Holton, Consumer Behavior, Market Imperfections, and Public Policy, reprinted in David Rice, Consumer Transactions 57 (1975). George J. Stigler, The Economics of Information 69 J. Pol. Econ. 213 (1961), reprinted in George Stigler, The Organization of Industry (1968). See Holton, supra note 11, at 58. Rubin makes this point in another context. See Paul H. Rubin, The Theory of the Firm and the Structure of the Franchise Contract, 21 J. Law & Econ. 223 (1978). See also Keith Leffler, The Role of Price in Guaranteeing Quality (1977) (mimeographed paper), for an argument that advertising can be used to guarantee quality to consumers. See generally Kristian S. Palda, The Measurement of Cumulative Advertising Effects (1964), who demonstrates that advertising should be treated as an investment. Michael R. Darby & Edi Karni, Free Competition and the Optimal Amount of Fraud, 16 J. Law & Econ. 67 (1973). This may be another reason for allowing advertising by professionals such as doctors. The advertising will create an additional value in the brand name and hence provide additional incentives for quality service. 
For any one purchase where credence qualities are involved, the consumer cannot be sure that he is getting a desirable good; i.e., there is a low probability of his finding out whether claims about any one good are true or false. However, if the consumer buys many goods from the same source, the probability of ascertaining that claims about one of those goods are false would be increased. Thus, if there are ten goods sold by a store and if there is only a 1 percent chance of finding out that any claim is false, the larger number of claims increases the chance of finding out that one or more of them are false. In this situation, claims about individual goods have credence characteristics; but the reputation of the seller of all of the goods is an experience characteristic. It may be that consumer trust in the reputation of the intermediary is misplaced in the situation of mail order advertisements carried by magazines. Although the consumer may rely on the publisher of the magazine to police their advertisers, Consumers Union claims that in fact such policing is minimal or nonexistent. Delusions of Vigor: Better Health by Mail, Consumer Reports, January 1979, at 53. Striking evidence is provided by a case brought by the state of Arizona against five manufacturers of rigid polyurethane foam insulation materials, charging a conspiracy to falsely advertise the flammability characteristics of the product. State of Arizona v. Cook Paint & Varnish Co., 391 F. Supp. 962 (D. Ariz. 1975). If the allegations of the complaint are true, far from trying to keep one another honest, all the oligopolists banded together to deceive. See Yale Brozen, Entry Barriers: Advertising and Product Differentiation, in Industrial Concentration: The New Learning 115–16 (Harvey J. Goldschmidt, H. Michael Mann, & J. Fred. Weston, eds 1974).
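The ten-goods illustration in the note on credence qualities above can be worked through numerically. The sketch below is ours, not the authors': it assumes that all ten claims are false and that the chance of discovering each falsehood is independent of the others, neither of which the note states. The function name is hypothetical.

```python
def chance_of_catching_a_false_claim(p_single: float, n_goods: int) -> float:
    """Probability that at least one of n false claims is found out.

    Assumes (our assumption) that each claim is false and that detection
    of one claim is independent of detection of the others.
    """
    return 1.0 - (1.0 - p_single) ** n_goods


one_good = chance_of_catching_a_false_claim(0.01, 1)
ten_goods = chance_of_catching_a_false_claim(0.01, 10)
print(f"one good: {one_good:.4f}, ten goods: {ten_goods:.4f}")
# one good: 0.0100, ten goods: 0.0956
```

Under these assumptions the buyer who deals with the store across ten goods is nearly ten times as likely to catch a falsehood as the buyer of a single good, which is the sense in which the seller's reputation behaves like an experience characteristic.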
The law of false advertising
20 The common law of sales has been largely replaced by statute, the Uniform Commercial Code. Given the unique drafting history of this statute, however, the fact that it is a “case-law code” makes it appropriate to include UCC provisions in this discussion. See Soia Mentschikoff, The Uniform Commercial Code, An Experiment in Democracy in Drafting, 36 A.B.A.J. 419 (1950); Grant Gilmore, In Memoriam: Karl Llewellyn, 71 Yale L.J. 813, 814 (1962). 21 U.C.C. § 2–503(1) 22 U.C.C. § 2–601. 23 U.C.C. § 2–608. 24 U.C.C. §§ 2–711–2-717. 25 U.C.C. § 2–607. 26 U.C.C. § 2–714. 27 U.C.C. § 2–607(3)(a). 28 U.C.C. § 2–313. 29 U.C.C. §§ 2–312–2–315. 30 U.C.C. § 2–313. 31 U.C.C. § 2–313(2). 32 See generally William L. Prosser, The Fall of the Citadel (Strict Liability to the Consumer), 50 Minn. L. Rev. 791 (1966). 33 U.C.C. § 2–607(3)(a). 34 U.C.C. § 2–316. 35 U.C.C. § 2–316(1). 36 U.C.C. § 2–202. 37 U.C.C. § 2–719. 38 U.C.C. § 2–719(2). 39 See William L. Prosser, The Law of Torts § 105 (4th ed. 1971). 40 See id. 41 We make the same distinction between the consumer who complains he has been “taken,” and the one who has received a defective product or “lemon,” as do Page Keeton & Marshall S. Shapo in Products and the Consumer: Deceptive Practices 3 (1972). 42 Jacobson v. Art Storage & Moving Co., 16 N.Y.S.2d 906 (N.Y. City Ct. 1939), appeal denied, 260 App. Div. 809, 22 N.Y.S.2d 928 (1940). 43 See Posner, supra note 9. 44 103 F. 281 (6th Cir. 1900). 45 273 U.S. 132 (1927). 46 7 F.2d 603, 604 (2d Cir. 1925). 47 See text at note 17 supra. 48 Developments in the Law: Competitive Torts, 77 Harv. L. Rev. 888, 893 (1964). 49 Id. at 893–95. 50 This very rumor bedeviled McDonald’s. See Wall Street Journal, Nov. 16, 1978, at 14, col. 2. 51 251 F.Supp. 286 (E.D. Pa. 1966). 52 129 F.2d 227 (3d Cir. 1941). 53 See Developments in the Law, supra note 48, at 908. 
54 From time to time, it has been reported that dealers in illegal drugs have tried to use a trademark to guarantee the quality of their goods. Since such manufacturers cannot rely on the legal system to protect their trademarks, these attempts invariably fail. 55 127 F.2d 245 (7th Cir. 1942). 56 86 F. 608 (7th Cir. 1898), cert. denied, 173 U.S. 703 (1899). 57 162 F.2d 893 (2d Cir. 1947). 58 420 F.2d 1248 (7th Cir.), cert. denied, 400 U.S. 820 (1970). 59 305 U.S. 111 (1938).
Ellen R. Jordan and Paul H. Rubin
60 Richard A. Posner, Regulation of Advertising by the Federal Trade Commission 31 (Am. Enterprise Inst. 1973). 61 See, e.g., Developments in the Law—Deceptive Advertising, 80 Harv. L. Rev. 1005, 1082–83 (1967). 62 See Note, 29 Baylor L. Rev. 559, 563–67 (1977). 63 Pub. L. 93–637, 99 Stat. 2183 (1975). Such power had been upheld as “implied,” even without explicit legislative authority, in National Petroleum Refiners Ass’n v. FTC, 482 F.2d 672 (D.C. Cir. 1973). 64 15 USC § 57a(a)(1)(B)(1976). 65 15 USC § 53e(b) (1976). 66 15 USC § 45(m)(1)(B) (1976). 67 15 USC § 45(m)(1)(A) (1976). 68 15 USC § 57b(a)(2)(1976). 69 15 USC § 57b(b) (1976). 70 Holloway v. Bristol-Myers Corp., 485 F.2d 986 (1973). 71 See John A. Sebert, Jr., Enforcement of State Deceptive Trade Practices Statutes, 42 Tenn. L. Rev. 689, 698–704 (1975). 72 William A. Lovett, State Deceptive Trade Practice Legislation, 46 Tulane L. Rev. 724, 743–49 (1972). These statutes are considered in Part III-B infra. 73 For the history of the small-claims-court movement, which dates back to 1605 in England, see Small Claims Report, supra note 7, at 3 (1972). Efforts to reduce the cost of consumer suits continue: a bill to provide federal funds to states for establishing and promoting procedures for resolving minor consumer disputes passed the Senate but was defeated in the House. Cong. Q. Weekly Rept., Oct. 21, 1978, at 3082. 74 See, e.g., Note, 4 Stan. L. Rev. 237 (1952); Note, 21 Stan. L. Rev. 1657 (1969); Judge Tim Murphy, Small Claims Court—The Forgotten Court, 34 D.C. Bar J., February 1967, at 14. 75 See Small Claims Report, supra note 7, at 11–12. 76 John Montague Steadman & Richard S. Rosenstein, “Small Claims” Consumer Plaintiffs in the Philadelphia Municipal Court: An Empirical Study, 121 U. Pa. L. Rev. 1309 (1973). Their survey found that 18 percent of the 614 cases studied fell into a “miscellaneous” category, which included fraudulent advertising (id. at 1327, n. 133). 
Fraudulent advertising was not separately mentioned as one of “the more prominent miscellaneous claims categories.” (Table B, id. at 1347.) 77 See generally, David A. Rice, Remedies, Enforcement Procedures, and the Duality of Consumer Transaction Problems, 48 Boston U. L. Rev. 559, 588–95 (1968). 78 Eric H. Steele, Fraud, Dispute, and the Consumer: Responding to Consumer Complaints, 123 U. Pa. L. Rev. 1107 (1975). 79 Id. at 1131. 80 Id. at 1135–36. 81 See Rice, supra note 77, at 570–76. 82 The literature on consumer class actions is extensive. See the exhaustive report of Gould, in Small Claims Report, supra note 7. 83 See, e.g., Note, Consumers, Class Actions, and Costs: An Economic Perspective on Deceptive Advertising, 18 U. C. L.A. L. Rev. 592, 603–05 (1971). See also, Jonathan M. Landers, Of Legalized Blackmail and Legalized Theft: Consumer Class Actions and the Substance-Procedure Dilemma, 47 S. Cal. L. Rev. 842 (1974). Landers also points out that the cost of resort to the courts means that businesses, too, cannot obtain legal redress for small injuries. Landers suggests that the legal system may reflect a social judgment that such small losses should lie where they fall, instead of using resources to reallocate the loss through the legal system. In other words, court time should be reserved for more weighty problems.
See also Hacket v. General Host Corp., 455 F. 2d 618, 626 (3d Cir.), cert. denied, 407 U.S. 925 (1972). Or. Rev. Stat. § 646.638 (1971). Scott v. Western International Surplus Sales, Inc., 267 Ore. 512, 517 P.2d 661 (1973). The real bone of contention appeared to be the store’s refusal to give a cash refund. The retailer offered the consumer a credit, which he claimed was worthless to him. Wolverton v. Stanwood, 278 Ore. 341, 563 P. 2d 1203 (1978). Some courts have questioned whether the Lanham Act was intended to do any more than give a federal right of action “false descriptions of substantially the same economic nature as those which involve infringement or other improper use of trademarks.” Samson Crane Co. v. Union National Sales, Inc., 87 F. Supp. 218 (D. Mass. 1949), aff’d per curiam, 180 F. 2d 896 (1st Cir. 1950). The Third Circuit gave the section a more expansive reading in L’Aiglon Apparel v. Lana Lobell, Inc., 214 F.2d 649 (3d Cir. 1954). The controversy is reviewed in Universal Athletic Sales Co. v. American Gym, Recreational & Athletic Equipment Corp., 397 F. Supp. 1063, 1071–73 (W.D. Pa. 1975), aff’d mem., 566 F. 2d 1171 (3d Cir. 1977). 419 F. Supp. 292 (E.D. Pa. 1976), aff’d sub nom., Donsco, Inc. v. Casper Corp., 587 F. 2d 602 (3d Cir. 1978). 375 F. Supp. 777 (N.D. Ill. 1974). 441 F. Supp. 162 (C.D. Calif. 1977). If “Made in USA” is perceived as a proxy for better quality by consumers, one may question the part of Posner’s critique of the FTC which condemns devoting resources to combatting such deceptions. Consumers may not be xenophobic, as he points out, but they may prefer better-made goods. See Richard A. Posner, supra note 9, at 73. 19 U.S.C. § 1304 (1976). 285 F. Supp. 636 (S.D.N.Y. 1968). This case also points up the necessity of permitting a firm to guard its trademark.
If the misnumbered part in fact caused a crash and blame could be traced to that part, injured parties would look to the original manufacturer, identified on the part, for redress. Although current law places the burden on the plaintiff to prove that the defect existed at the time the product left the manufacturer (Restatement (Second) of Torts § 402A (1965)), in fact the original manufacturer would have to put the intervening alteration of the part into evidence in order to put the plaintiff to his proof. On misuse of trademark grounds alone, the plaintiff may have had a good cause of action. 362 F. Supp. 1094 (S.D.N.Y. 1973). 372 F. Supp. 1 (E.D. Pa. 1974). 513 F.2d 716 (9th Cir. 1975). R. G. Smith v. Chanel, Inc., 402 F. 2d 562 (9th Cir. 1968). 467 F. 2d 304 (2d Cir. 1972). Letter from Carol M. Thomas, Secretary, Federal Trade Commission to authors (Nov. 22, 1978), in response to Freedom of Information Act request. One widely publicized instance of competitor complaint to the FTC occurred in the hotly contested beer marketing battle between Miller and Anheuser-Busch. Miller complained that Anheuser-Busch’s advertising its beers as “natural” was “misleading.” See Wall Street Journal, Mar. 14, 1979, at 48, col. 2. Alberto-Culver Co. v. Gillette Co., 408 F. Supp. 1160 (N.D. Ill. 1976). American Brands, Inc. v. R. J. Reynolds Tobacco Co., 413 F. Supp. 1352 (S.D.N.Y. 1976). 242 F. Supp. 302 (N.D. Ill. 1965). 513 F.2d 716 (9th Cir. 1975). 419 F. Supp. 212 (E.D. Pa. 1976), aff’d sub nom., Donsco, Inc. v. Casper Corp., 587 F.2d 602 (3d Cir. 1978).
372 F. Supp. 1 (E.D. Pa. 1974). Bose Corp. v. Linear Design Labs Inc., 467 F.2d 304 (2d Cir. 1972). 276 F. Supp. 707 (C.D. Cal. 1967). Potato Chip Inst. v. General Mills, Inc., 461 F.2d 1088 (8th Cir. 1972). Electronics Corp. of Am. v. Honeywell, Inc., 358 F. Supp. 1230 (D. Mass. 1973). Skil Corp. v. Rockwell Int’l Corp., 375 F. Supp. 777 (N.D. Ill. 1974); Alberto-Culver Co. v. Gillette Co., 408 F. Supp. 1160 (N.D. Ill. 1976) (Tame v. Earth Born Creme Rinses); American Consumer, Inc. v. Kroger Co., 416 F. Supp. 1210 (E.D. Tenn. 1976) (Kroger’s “Price Patrol” report); American Home Products Corp. v. Johnson & Johnson, 436 F. Supp. 785 (S.D.N.Y. 1977) (Anacin v. Tylenol). 413 F. Supp. 1352 (S.D.N.Y. 1976). Universal Athletic Sales Co. v. American Gym, Recreational & Athletic Equipment Corp., 397 F. Supp. 1063 (W.D. Pa. 1975); Bohsei Enterprises Co. v. Porteous Fastener Co., 441 F. Supp. 162 (C.D. Calif. 1977). 1 Rudolf Callmann, The Law of Unfair Competition, Trademarks, and Monopolies § 8.2(c) (3d ed. 1965). See, e.g., Walter J. Derenberg, Federal Unfair Competition Law at the End of the First Decade of the Lanham Act: Prologue or Epilogue? 32 N.Y.U. L. Rev. 1029 (1957). Warren S. Grimes, Control of Advertising in the United States and Germany: Volkswagen Has a Better Idea, 84 Harv. L. Rev. 1769 (1971). Id. at 1789–90. Id. at 1781–82. See, e.g., David Caplovitz, The Poor Pay More 153–54 (1967). Grimes, supra note 115, at 1784. F. M. Scherer, Industrial Market Structure and Economic Performance 158 (1970). What incentive there is will be strongest in the case of credence qualities. Consistently with its congruence with economic rationality, the common law most freely permitted private actions when such claims (as, for example, manufacturer identity) were at stake. The law of false advertising discussed here is part of the general body of law called “unfair competition,” which deals in general with torts committed by one business firm against another.
We have found economic analysis fruitful in examining one part of this law, and would predict that it should also be useful in understanding other aspects of the legal treatment of relationships between competitors. As one example, a preliminary analysis by Paul H. Rubin (unpublished paper at Univ. of Ga.) of the rights of employees to compete with former employers indicates that many aspects of the employment contract can be understood in terms of the concepts of human capital developed by Gary Becker. See Gary S. Becker, Human Capital (2nd ed. 1975). Another example is the common law treatment of price competition. In fact, the common law said virtually nothing about such competition. See Edmund W. Kitch & Harvey S. Perlman, Legal Regulation of the Competitive Process 193 (1972). Strictures on such behavior, which ban unfairly low prices, are clearly inefficient; such restrictions are creatures of statute, such as the Robinson-Patman Act. In this area the failure of the common law to interfere is a clear sign of its efficiency. Virginia State Board of Pharmacy v. Virginia Citizens Consumer Council, Inc., 425 U.S. 748 (1976). Id. at 771. R. H. Coase, Advertising and Free Speech, 6 J. Legal Stud. 1 (1977). Aaron Director, The Parity of the Economic Market Place, 7 J. Law & Econ. 1 (1964). But see the contrary view expressed by Vern Countryman, who argues that
we must trust government to protect the consumer against defective goods “which he has neither the information nor the skill to identify for himself.” Vern Countryman, Advertising Is Speech, in Allen Hyman & M. Bruce Johnson, Advertising and Free Speech 35, 37 (1977). Our argument is that the consumer has considerable information and sources of advice without governmental intervention, most of it provided by advertising.
4
Freedom of speech vs. efficient regulation in markets for ideas* Albert Breton and Ronald Wintrobe
Introduction
This paper is concerned with the operation of markets for ideas. For concreteness and to focus the discussion we shall consider four specific markets: the market for judicial decisions (the courts, viewed strictly as a marketplace for ascertaining the truth), those for scientific ideas, for advertising messages and the market for political ideas. The problem to be investigated was first posed by Coase (1974, 1977), and can be understood by looking briefly at the market for political ideas. In the United States free competition in that market is guaranteed by the First Amendment which prohibits Congress from making any law ‘abridging the freedom of speech, or of the press; . . .’. Similar legislation or implicit rules exist in all democratic countries; it would, indeed, seem to be an essential requirement of democracy. Free speech is easy to justify on ethical grounds; to many, it is desirable for its own sake. Can it, however, also be justified on efficiency grounds? Does freedom of speech result in the election to public office of the best candidates and in the adoption by governments of the best policies? John Milton must have thought so when he wrote: ‘Let (truth) and falsehood grapple; who ever knew Truth put to the worse in a free and open encounter?’1 If, under freedom of speech, truth always triumphs over falsehood, one would have a powerful justification for its enactment. One would not have to assume that it is desirable for its own sake: it would be desirable on efficiency grounds because it would sort out good from bad ideas. But this view leads to a number of paradoxes, as Coase (1974, 1977) pointed out. Let us consider them in turn. First, if free competition is always desirable in the market for ideas, why is it not so in markets for goods? One can argue that the presence of externalities or of public goods makes regulation desirable in certain goods markets.
But the case for regulation on these grounds would appear to be, if anything, considerably stronger for ideas, since these possess all of the properties of public goods. It follows that, as
Coase put it, liberal intellectuals are inconsistent in favoring government regulation in goods markets while at the same time fiercely defending freedom of speech as a basic right [Coase (1974, pp. 384–385)]. A second paradox concerns commercial speech or advertising. If competition in the market for ideas leads to efficiency, how can the regulation of advertising be justified? The United States Supreme Court, for example, draws a sharp distinction between commercial and political advertising and holds that the second is protected by the First Amendment, while the first is not. If free speech distinguishes truth from error in the market for political ideas, presumably it does so in the market for advertising also. Yet the U.S. Supreme Court consistently draws the distinction, an attitude which Coase found incomprehensible [Coase (1977, pp. 13–32)]. Finally, there is the puzzle of the courts, viewed as a marketplace for sorting truth from error. Competition within the courtroom is heavily regulated. Who can speak, when, the order of speaking and the kinds of things which can be said are all subject to numerous rules and regulations and to interpretation of them by the presiding judge. Many kinds of evidence cannot be introduced at all. It must be presumed that in courtrooms the sorting of truth from error is of paramount importance; why then is that market so heavily regulated if unregulated competition can be counted on to sort truth from error? In this paper, we contend that the paradoxes can be clarified by reference to a basic proposition (elaborated below) which states that under unregulated competition error tends to drive out truth.
Our analysis provides a consistent explanation for the broad pattern of observable regulation in many markets for ideas: specifically, it allows us to explain the extensive regulation of the courts, the form of regulation in markets for scientific ideas, the regulation of advertising messages and finally, it tells us why the political marketplace is not and should not be regulated. The last argument provides the basis of our case for free speech. Essentially, we are suggesting, contrary to the argument of Coase, that the broad observable pattern of regulation in markets for ideas can be rationalized on efficiency grounds. This should not be taken to imply that we do not believe that improvements in regulation are possible. However, a more fully developed model than that provided here would be necessary before useful recommendations in that direction could be made. Our purpose is the relatively modest one of trying to make sense of existing regulatory structures. As indicated, we shall develop our argument by considering each of the four markets listed above. The structure of the argument is, however, the same in each case, so that a brief preliminary summary of it may be useful. It consists of five propositions:
Proposition 1. Competition distinguishes truth from error if experimentation (direct testing of competing hypotheses) is possible. Where this is not possible, or where it is too costly, unregulated (perfect) competition does not necessarily distinguish truth from error. Instead, adverse selection may occur; when it does error tends to drive out truth in the same way that bad products drive out good ones in Akerlof’s (1970) model of the market for ‘lemons’. The necessary conditions for adverse selection are: (i) buyers cannot distinguish ‘good’ products from ‘bad’ ones (because the cost of experimentation is too high), and (ii) information is distributed asymmetrically – sellers may, for example, have more information than buyers about the quality of their products. In the ideas markets we shall be considering, both of these conditions are fulfilled.2 Proposition 2. If the two necessary conditions just listed hold, efficient regulation – restrictions on freedom of speech – can improve the working of markets for ideas. These restrictions may take any form, including self-regulation, namely, regulation imposed by the sector on its own members. Leland (1979) showed that where the conditions for adverse selection ((i) and (ii) above) obtain, regulation, in the form of licensing, which successfully (efficiently) removes (filters or screens) low-quality products from the market can improve (in the Pareto sense) the functioning of goods markets. The same proposition holds in ideas markets where ‘good’ ideas can be defined as those which contain a high proportion of the truth, ‘bad’ ideas as those which possess a high proportion of error and efficient regulation as that which, on balance, successfully (though not necessarily perfectly) removes bad ideas from the market. 
Some ideas are obviously bad – condition (i) pertains to the kind of information which needs to be screened out because it is deceptive: false information which appears to be true or which, for one reason or another, would command greater credence than it deserves.3
Proposition 3. It does not pay (even if it were possible) to use a perfect screen.
Consequently, some truth is always screened out in the process of regulating ideas markets. We define the level of repression in an ideas market as the extent to which true information is kept out of the marketplace.4 This notion is, of course, difficult to quantify precisely and we make no attempt to do so here. We shall assume, as does everyone else who discusses the subject, that rough and ready judgements are possible.
Proposition 4. The set of regulations governing speech (the screening device) in any ideas market is what we shall term an ‘enforced monopoly’. The reasons are best explained in the context of each of the specific ideas markets considered. The rationale for it, however, may be briefly stated: it follows from the objective which, as we shall see, holds in all the ideas markets considered, that the same set of regulations be consistently applied to all statements made in that market. We shall refer to the set of regulations governing speech in any ideas market as a collective screening device. To illustrate, consider the rules of evidence used in the judicial system. In order that the objective of justice be met, the same set of regulations must be enforced across defendants in similar cases: to the extent that the rules are weakened for some and not for others, justice is not obtained. Moreover, there are many possible sets of rules of evidence which could be used, and there is, in fact, considerable variation across judicial systems in the rules that are used. The requirement that in each jurisdiction, only a single system can prevail implies the existence of monopoly power which can be captured by certain groups who are favored by a particular system. Although the argument that the set of regulations governing speech in an ideas market constitutes an ‘enforced monopoly’ applies to ideas markets such as science or justice, the category of enforced monopoly is broader than that of ideas markets. Consider for example, phenomena such as a country’s national flag, its monetary system, system of weights and measures, or its legal system. 
In each case, the social objective (national identity in the case of flags, the reduction in transaction costs in the case of money or weights and measures, and justice in the case of the legal system) requires that only a single ordering device (the flag, the monetary unit, the units of weights and measures, and the set of laws) be used. In each case, the monopoly is not necessarily ‘natural’ and private incentives to deviate from it (e.g. to use the flags of ethnic groups, foreign currencies, other systems of weights and measures, or to ‘buy’ justice) must be policed and the monopoly ‘enforced’. Proposition 5. Because the collective screening device is a monopoly, ideas markets are not perfectly self-regulating. In none of the markets considered are the ideas ‘sold’; consequently the social problem caused by monopoly in ideas markets is not the usual one of price exceeding marginal cost. Instead, the problem is to guard against undue or excessive repression of information. As a consequence, a second kind of regulation is needed for efficiency, namely, regulation by the state or by some other authority to protect minority viewpoints and to guard against inefficient entry. In what follows, we hope to show that this framework is an illuminating way to explain the existing pattern of regulation in ideas markets. We
shall discuss each of the four markets in turn, beginning with the courts, then going on to consider scientific ideas, advertising and finally political ideas.
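The adverse-selection mechanism invoked in Proposition 1 can be illustrated with a minimal sketch of the unraveling logic in Akerlof's goods-market setting. Everything specific here is our assumption rather than the authors' model: quality scores run from 0 to 100, a seller withdraws once the going price falls below the quality of what he offers, and buyers, unable to tell products apart, pay a fixed multiple (`buyer_premium`, a hypothetical parameter) of the average quality still on offer.

```python
from fractions import Fraction


def unravel(qualities, buyer_premium, max_rounds=1000):
    """Iterate a lemons market until the set of offered products stops changing.

    qualities: quality scores, known only to sellers. A seller keeps a product
        on the market only while the going price is at least its quality.
    buyer_premium: how much more buyers value a given quality than sellers do.
    Buyers cannot distinguish products, so the price tracks the AVERAGE
    quality on offer (condition (i)), while sellers know their own quality
    (condition (ii)).
    """
    offered = sorted(qualities)
    for _ in range(max_rounds):
        price = buyer_premium * Fraction(sum(offered), len(offered))
        still_offered = [q for q in offered if q <= price]
        if not still_offered:
            return []
        if still_offered == offered:
            break
        offered = still_offered
    return offered


# Buyers value quality 1.5 times as much as sellers: the market unravels.
print(unravel(range(0, 101), Fraction(3, 2)))     # [0]
# Buyers value quality twice as much: every product stays on offer.
print(len(unravel(range(0, 101), Fraction(2))))   # 101
```

In the first case each round of withdrawals lowers the average, which lowers the price, which drives out the next-best sellers, until only the worst product remains. Read loosely as an ideas market, with credence in place of price, this is the sense in which error can drive out truth when testing is too costly; the mapping from goods to ideas is an analogy, not something this sketch establishes.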
The courts
The aspect of the courts with which we are concerned is their role as fact finding bodies, that is, as markets for ideas in which attorneys on each side of a case compete to convince an impartial tribunal of the truth or justice of their own side.5 Direct experimentation or testing is obviously impossible here; indeed, court hearings may be thought of as substitutes for this. As Coase emphasized, participants in these markets are not free to say all they desire or to introduce any evidence they wish. For brevity, we shall refer to the set of regulations prevailing in any jurisdiction as the rules of evidence. One function of those rules is to act as a screen to remove deceptive information from the market. For example, in most and possibly in all (democratic) jurisdictions, the evidence obtained from lie detector tests cannot be used in the courts. This restriction is surely based on the notion that lie detector tests appear to have a validity that they do not in fact possess. In that sense the information they convey is deceptive. However, by screening these tests out, some true information is, at the same time, rejected. The validity of the information obtained from the lie detector tests is on a par with the validity of information derived from (say) econometric tests; that information is probably also deceptive in the same sense, but the consequences of accepting it as true if it is not are very different from the consequences of accepting as true the results of lie detector tests when they are not. Proposition 3 states that screening devices are never perfect. That seems to be clearly true for judicial rules of evidence. It follows that some good information will be screened out and some bad information let in so that errors in the administration of justice will occur. Moreover, since direct experimentation is ruled out, it will seldom be known whether justice is done in any particular instance.
Usually, all that can be known is whether justice is done in terms of whether the procedures followed were those prescribed by the judicial code. As long as it is believed that a correspondence between justice in this sense and true justice exists, procedural justice can serve as an indicator of the latter. [This is, of course, the standard argument for the efficiency of screening, see Stiglitz (1975).] The enforced monopoly argument (Proposition 4) follows directly from this. For, if establishing whether true justice is done is decided on the basis of whether procedural justice is observed, it follows that the same rules of evidence must be applied to everyone. The requirement of consistency of treatment implies that everyone must be dealt with according to the same rules. This requirement is easy to rationalize: since direct comparison of the results with ‘true’ justice is difficult, if not impossible, it can never be known whether deviations from the rules result in a closer or further approximation to true justice. From this follows the requirement of uniformity of treatment
according to a particular set of rules and that, within any jurisdiction, the existing set of rules is a monopoly. Finally, from Propositions 3 and 4 (to the effect that the rules of evidence are a monopoly, and that the monopoly, as a screen, is imperfect), it is easy to infer Proposition 5: the case for regulation to protect citizens against the biases in the rules. In practice, such regulations are embodied in constitutional provisions guaranteeing the rights of defendants, in the appeals process, in the provision of state-financed lawyers to the indigent, and so on. In these and other ways, the state acts as a regulator to protect citizens against undue exercise of the monopoly power contained in the rules of evidence. But such protection is also embodied in the basic right of citizens to make changes in or overturn the existing rules via the political process. The issue of how changes in the rules are effected is discussed in section 5 (on political ideas). We hope that this brief discussion has shown, however, that the notion of restrictions on free speech in the market for judicial decisions is quite sensible. We now turn (in somewhat greater detail) to our second case: the market for scientific ideas.
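The trade-off that Proposition 3 describes for rules of evidence, where a stricter screen removes more deceptive information but also more true information, can be sketched numerically. The model below is purely illustrative and entirely our assumption: each item of evidence carries an observable reliability score, scores for true and for deceptive items are normally distributed around different means, and the rule admits any item scoring at or above a threshold. The function and parameter names are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def screen_outcomes(threshold, mean_true=1.0, mean_false=-1.0):
    """Return (share of true items excluded, share of deceptive items admitted).

    The first number corresponds to the paper's 'level of repression';
    the second is deception that slips through the screen.
    """
    repression = normal_cdf(threshold, mean_true)
    deception_admitted = 1.0 - normal_cdf(threshold, mean_false)
    return repression, deception_admitted


for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    r, d = screen_outcomes(t)
    print(f"threshold {t:+.1f}: true excluded {r:.3f}, false admitted {d:.3f}")
```

Under these assumptions no finite threshold drives both numbers to zero: tightening the rule lowers the share of deceptive items admitted only by raising the share of true items excluded. Which threshold is efficient depends on the relative costs of the two errors, which this sketch does not model.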
Scientific ideas
Models of science
There appear to be two main competing views of how scientific ideas markets work.6 According to the first, which Newton-Smith (1981) calls the ‘rational’ model – we will call it the ‘rationalist’ approach – the objective of science is to get at the truth. The chief exponents of this view are Popper (1968) and Lakatos (1978). Popper states this goal precisely as ‘verisimilitude’, namely, the process of continually increasing the degree to which theories approximate the truth. Popper suggests that the method of scientists in achieving this objective is that of ‘conjecture and refutation’: hypotheses are conjectured and scientists are led nearer to (but never reach) the truth through experimentation and the refutation of false hypotheses. Accordingly, the great leaps forward by men like Darwin, Einstein, Bohr or Keynes are just an extension to a higher level of a procedure the average scientist engages in every day. Lakatos (1978) differs from Popper in his emphasis on scientific research programs: sequences of theories in which each theory is generated by modifying its predecessor. This concept enables Lakatos to explain (as Popper cannot) the apparent fact that scientific theories have typically not in practice been rejected just because they have led to false predictions. Lakatos, however, is a rationalist. Competition among rival scientific research programs sorts out the good theories from the bad: in his language, scientific research programs either progress or degenerate, and whether they do one or the other depends only on their relative success in predicting empirical phenomena.
Albert Breton and Ronald Wintrobe
The difficulty with the rationalist approach in any of its versions is that it does not appear to describe what scientists do at all accurately. Rather, it pronounces on what they ought to do, in general disregard for what they typically do and have historically done. In economics, for example, the theory of utility-maximization is never tested against some other hypothesis. There is no alternative hypothesis. As Stigler has put it: ‘It is essentially inconceivable (but not impossible) that the theory of utility-maximizing is wrong, and the purpose of empirical investigation is not to test this assumption (pace Friedman). . . . Indeed there is no alternative hypothesis’ [Stigler (1975, pp. 139–140)]. This seems to be an accurate reflection of the view of most economists. As a result, there is a basic corpus of economic theory which is simply never subjected to test. Kuhn (1962), the major ‘non-rationalist’ philosopher of science, explains this and other phenomena with his well-known concept of a paradigm.7 ‘Normal science’ – what ordinary scientists do on a day-to-day basis – is conducted within a paradigm – the central body of accepted theory of a discipline.8 The paradigm is expected to yield further knowledge, but exactly how it does so is not known. Normal science is like puzzle-solving. As puzzles get solved knowledge grows and cumulates within the paradigm. Kuhn suggests that a science is only a ‘mature science’ if it operates within a paradigm. In the absence of a paradigm, knowledge is not cumulative. As a consequence, what distinguishes scientific knowledge from other forms of knowledge is the property of cumulation. However, with the growth of knowledge, normal science also yields a growing stock of anomalies: puzzles which the paradigm should, and was expected to, explain but cannot. According to Kuhn, the growth of anomalies and the growing recognition of them eventually result in a scientific ‘revolution’: the creation of a new paradigm.
This is how new hypotheses are created. An obvious example is the replacement of Newtonian by Einsteinian mechanics. If the revolution is successful, scientific activity reverts to normal puzzle-solving activities within the new paradigm. The so-called non-rationalist element in Kuhn’s approach enters at the level of paradigmatic choice. Kuhn puts this in a number of different ways. For example: ‘[t]he choice [between paradigms] is not and cannot be determined merely by the evaluative procedures characteristic of normal science for these depend in part upon a particular paradigm, and that paradigm is at issue’; ‘[the] issue of paradigm choice can never be unequivocally settled by logic and experiment alone’ [Kuhn (1962, p. 94)]. Scientific revolutions are like ‘changes of world view’9 or changes in gestalt perception. His fundamental point is that the choice among paradigms does not, and cannot, depend on logic and experimentation alone, but upon other ‘non-rational’ (in Newton-Smith’s sense) criteria, such as the relative influence or power of different groups within the scientific community.
Freedom of speech vs. efficient regulation in markets for ideas
An economic model
Kuhn’s view of how markets for scientific ideas operate is based on the history of science and not on a logical theory of scientific method. That view can, however, be given a logical or efficiency rationale using the model proposed in this paper. In that model, a paradigm is simply a screening device which sorts hypotheses into two categories: those which are consistent with (may be derived from) the paradigm and those which are not. The function of paradigms, like that of any screen, is to economize on the costs of testing or experimentation (Propositions 1 and 2). While paradigms themselves are never tested, hypotheses may be derived from them which can be. This permits the accumulation of knowledge about propositions which are themselves difficult or impossible to test directly, such as the proposition in economic theory that people are rational. All of the propositions or empirical hypotheses which may be derived from the paradigm are necessarily consistent with one another. Empirical documentation of any one tends to raise the confidence of the scientific community in all of them and in the paradigm itself as an ‘engine of truth’, as Marshall described supply and demand analysis, the basic paradigm of economic science. From our point of view the central difference between the rationalists and the non-rationalists is not between rationality and irrationality, but between competition and monopoly. According to the rationalist view, scientific truth is the outcome of free competition among different hypotheses as decided by the impartial test of logic and experimentation. For the ‘non-rationalist’, on the other hand, the creation of alternative hypotheses and, therefore, of scientific truth is an entirely different activity from that of normal science; the ruling paradigm is a monopoly, displacement of which requires a scientific revolution.
Competition among paradigms is competition ‘for the field’, as in Demsetz’ (1968) discussion of the bidding for the rights to provide the services of public utilities. In contrast to Demsetz and to more recent theories of this process which emphasize the contestability of natural monopoly markets,10 Kuhn may be interpreted as suggesting that the market for scientific paradigms is not perfectly contestable: the scientific community has some genuine monopoly power which can be exploited in the choice of paradigm. This explains why the goals of scientific communities and even the use of techniques of persuasion can be important in determining the choice of paradigms. To the extent that economists have attempted to model the market for scientific ideas, they have, usually implicitly, opted for the rationalist approach. This is the case for the older models among which one can note those of Marshall (1890), Keynes (1891) and Wicksell (1901, 1934), but it is also the case for the more recent models used in the debate launched by Friedman’s celebrated essay (1953) and joined, among others, by Machlup (1955), Koopmans (1957), Rotwein (1959), Samuelson (1963, 1966), Melitz
(1965) and Bear and Orr (1967). All of these deal with the nature of the tests that can falsify hypotheses and theories and implicitly with the competitive process among economic scholars which is assumed to lead to truth.11 In our model, on the other hand, there is competition among scientific hypotheses, but this competition is conducted within a paradigm which is itself a monopoly. What is the source of this monopoly power? Costs of experimenting and the demand for screening do not imply this result, for the screening process could be competitively organized, as it is, for example, for many consumer goods. Indeed, for some products, a considerable selection of screens is available such as critics for movies, plays or books; expert appraisals from hi-fi magazines, automobile magazines, and periodicals such as Consumer Reports; advice from agencies such as Better Business Bureaus, home service counselling; and many others. By sampling screens and comparing the rankings they yield with their own direct experience, consumers can choose those whose rankings are most consonant with their own. Competitive screening is feasible for consumer goods because of the twin postulates of consumer rationality and consumer sovereignty: rationality means that each consumer is assumed able to rank bundles of goods and services in a consistent order of preference; acceptance of consumer sovereignty implies that the ranking by each consumer is the best possible for him or her. Different consumers may assess the qualities of goods in very different ways. Since there is no consistency requirement across consumers’ rankings, there is no consistency requirement across screens. Accordingly, screening of consumer goods can be competitively organized, and consumers can choose among screens. Some aspects of scientific screening are indeed organized competitively. For example, journals, as screens, are organized on a competitive basis. 
The quality of an article is difficult to assess and many scientists find it convenient to rely on the reputations of journals and of their referees to screen out good articles from bad. Competitive screening is possible here for the same reason it is for consumers: by sampling the articles published by different journals, each scientist can compare the standards used by the journals with his or her own. Could individual scientists choose among competing paradigms in the same way as they do among journals? Kuhn suggested that, as a factual matter, they do not behave this way, and that in general, there are no purely logical or factual grounds on which paradigms can be compared, and on which one paradigm may be said to be better than another: paradigms are ‘incommensurable’. Of course, judgements among paradigms are made; indeed, such judgements are implicit in the process whereby one paradigm displaces another – Kuhn’s ‘scientific revolution’. In what sense, then, can one paradigm be said to be better than another? Basically, in the sense that it is the consensus of the relevant scientific community that the one paradigm is better than another. Hence the consistency requirement (Proposition 4) and hence the difference
between a goods market and a pure ideas market. In the first, the ultimate judgement or ranking is individual; in the second, it is collective. To be sure, there have been and will undoubtedly be further attempts by philosophers of science to provide a logical standard with which paradigms can be compared.12 The present model suggests, however, that this problem is inherently insoluble, so that there can be no substitute for the collective judgement of the scientific community. Paradigms could be compared either in terms of their assumptions or their implications. Our model supports Friedman’s (1953) contention that the ‘realism’ of assumptions cannot be the basis for comparison. Indeed, it provides a simple economic rationale for it: in our model, the assumptions of a paradigm cannot by definition be easily compared with reality, because the cost of direct experimentation to determine their truth or falsehood is high. So the ‘realism’ of paradigms can only be compared in terms of their implications. But how can paradigms be ranked on that basis? A paradigm will be clearly superior to another if it can predict or explain all that the other paradigm does and more. That may indeed be the case in some comparisons – for example, of Einstein’s and Newton’s mechanics, or in the comparison of the neoclassical and labour theories of value. Beyond this one cannot go: consider any two paradigms, each of which predicts or explains some phenomena that the other does not. No matter how much more one of them explains as compared to the other, so long as one can explain some phenomena that the other cannot, choosing among them is a judgement of values, not of logic. On this count, there can never be any absolute standard for comparing paradigms other than the collective judgement of the scientific community.
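The ranking rule just described is, in effect, a partial order: one paradigm is clearly superior only when it explains everything the other does and more, and otherwise the two cannot be ranked by logic alone. A minimal sketch of that rule, assuming each paradigm can be summarised by the set of phenomena it explains (the function name `compare_paradigms` and the phenomenon labels are hypothetical, introduced only for illustration):

```python
def compare_paradigms(explained_a, explained_b):
    """Rank two paradigms by the sets of phenomena each explains.

    Returns one of: "equivalent", "A dominates", "B dominates", "incomparable".
    """
    a, b = set(explained_a), set(explained_b)
    if a == b:
        return "equivalent"
    if a > b:   # A explains everything B does, and more
        return "A dominates"
    if b > a:   # B explains everything A does, and more
        return "B dominates"
    # Each explains something the other does not: no logical ranking exists,
    # so the choice is left to the collective judgement of the community.
    return "incomparable"


# Hypothetical phenomenon labels, for illustration only.
broad = {"p1", "p2", "p3"}
narrow = {"p1", "p2"}
rival = {"p2", "p4"}

print(compare_paradigms(broad, narrow))  # A dominates
print(compare_paradigms(broad, rival))   # incomparable
```

Note that the second comparison is "incomparable" even though one paradigm explains more phenomena than the other, which is the point of the argument: counting explained phenomena would itself be a value judgement.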
The same line of argument explains why collective screening is used in the courts: there is no ultimate standard of justice other than a collective value judgement, which can be used to compare rules of evidence. The same relationship exists between scientific truth and truth in an abstract sense as that between justice according to a particular set of rules and abstract justice. In both cases, the abstract notions are the objective: ‘Justice is the first virtue of social institutions, as truth is of systems of thought’ [Rawls (1971, p. 3)]. In science, as in justice, however, it can never be known if that objective is met: no scientific proposition can ever be demonstrated to be absolutely true and no judicial decision can ever be shown to be truly just. According to Proposition 3, the use of a paradigm – a screening device – implies the presence of some degree of repression. The use of paradigms may be justifiable on efficiency grounds and if the paradigm in use is in fact the best available, the repression that takes place is efficient in some sense, but it is repression nonetheless. The proposition can be illustrated by reference to the neoclassical paradigm in economics. Within the confines of that paradigm, it is very difficult to explain such phenomena as wage and price stickiness, involuntary unemployment, why people vote, why they sometimes behave altruistically, why discrimination persists under competitive circumstances, and why apparent gains from trade are not exploited and give rise to strikes,
divorces, torture, and other similar refusals to trade. These phenomena must be classed as ‘anomalies’ which, despite considerable effort and ingenuity, have remained puzzles which have defied solution. Moreover, if one steps outside the neoclassical paradigm, solutions to some of these problems can be found and some are even ‘obvious’ to the non-economist. Attempts are therefore made to ‘explain them away’, to deny the facts, or to suggest that their explanation is properly the task of some other discipline.13 There are other forms of scientific repression besides the attempted suppression of anomalies. All points of view other than that embodied in the ruling paradigm are repressed. One cannot use the propositions of Marxian economics or the views of (say) Galbraith, for the simple reason that the ideas embodied in these models cannot be derived from the neoclassical paradigm. Consequently, some knowledge of ‘reality’ is lost (repressed) by adhesion to the neoclassical paradigm. The problem is that the use of a paradigm is an either-or choice: one cannot use the neoclassical paradigm to analyze some problems and the Marxist one to analyze others. If one does this, one may gain insight, but not scientific knowledge, the cumulative character of which follows from its being consistently based on a single paradigm. Finally, Proposition 5 follows from these considerations: because of the monopoly character of the ruling paradigm, regulation is needed, both to protect ‘dissidents’ – minority viewpoints within the scientific community – and to ensure that the ruling paradigm will not survive beyond the point at which it should be replaced. Of course, actual regulation might be inefficient and serve to protect an outmoded paradigm against dissidents. This is discussed further in section 5. But, in addition, there is the theoretical possibility of inefficient entry, to which recent developments in the theory of contestable markets have drawn our attention. 
Even in perfectly contestable markets, such entry is possible, as Baumol, Panzar and Willig (1982) have shown. The resulting inefficiency can be eliminated by regulation. Let us briefly consider one form of inefficient entry. Suppose that a monopolist produces a line of outputs which, if it were produced by two or more firms, would entail higher average costs. This ‘subadditivity’ derives from the presence of one or more sharable inputs in the production process which give rise to economies of scope. An entrant who, through innovation, successfully undercuts the monopolist and captures the market for even one single product would, by so doing, raise the average cost of producing the remaining outputs. It is possible that the increase in average production costs of the monopolist would be large enough to cancel any gain from the lower price of the product captured by the successful entrant, thus making the entry inefficient. To see how this applies to scientific paradigms, consider economics and suppose that the various fields which comprise the discipline – industrial organization, public finance, labor economics, international trade, microeconomics, monetary economics, etc. – reflect genuine economies of scope which in turn derive from joint use of the paradigm – a
sharable input. The flow of analytical propositions would then be produced at minimum ray average cost [Baumol (1982)], that is, at costs which were a minimum for each bundle of propositions. Suppose now that a successful entrant – call it mathematical economics – captures the field of microeconomics. Imagine further that this successful capture reduces the extent to which researchers can practice (and publish in) both microeconomics and applied fields. As a consequence, research in microeconomic theory and in applied fields would tend to be increasingly performed by different individuals (firms), and theory and application become increasingly estranged from one another. Would such a successful capture represent inefficient entry? It would if the increased cost of obtaining propositions in applied fields and the costs of the lost progress in microeconomics due to reduced contact with the fields were greater than the benefits of the progress which mathematics made possible in microeconomic analysis.14 Granted that for these reasons, regulation is essential for an efficient accumulation of knowledge, might it not be that science, unlike justice, is ‘self regulating’, i.e., that the members of the scientific community could be counted on to regulate themselves in an optimal fashion? There are two reasons why this could be so. First, it would seem that in their role as producers of knowledge, scientists have an incentive to seek to replace a paradigm which is relatively unproductive. The matter is, however, not that simple. Competition between scientists for the rewards of professional activity will mean that younger members of the profession will have an incentive to overthrow the paradigm even when the proposed replacement is less productive than the old (the problem of inefficient entry). 
Elders, on the other hand, will seek to maintain the existing paradigm even when it has outlived its usefulness; the value of what they themselves have accomplished will depreciate rapidly if the paradigm is replaced, so that their incentive to invest in a potential replacement is smaller than that of younger scientists. Elders will therefore tend to overprotect the paradigm. Kuhn has noted that if a revolution occurs in their lifetime, most never accept it: the way that the new paradigm replaces the old is by older scholars dying out. Samuelson (1964, pp. 315–316) has argued that something like this took place at the time of the Keynesian revolution. A second argument to the effect that science could be self-regulating considers scientists in their role of demanders rather than producers of truth. Suppose, for purposes of illustration, that one important reason scientists choose their line of work is love of truth. Assuming that it is truth they love (and not merely, say, the aesthetic beauty of theorems) and that the members of a scientific discipline are the only consumers of its output, the scientific community would indeed be perfectly self-regulating. That community could then be counted on to ‘police’ its members to prevent over- or underexploitation of a paradigm, so that external regulation would not be required.15 The problem with this line of reasoning is that scientists are also supported by, and ‘sell’ their output to, the rest of society. To the extent that they do, the
interests of scientists, as producers intent on selling their output, will also enter into the way the scientific professions regulate themselves. The smaller the fraction of the total demand for scientific output made up by the scientists themselves, the more their interests as producers will dominate. Hence the need for outside (state) regulation to protect minority viewpoints is larger, the larger the fraction of output consumed by the outside world. This may explain the presence of outsiders on governing boards of universities and of granting agencies. The institution of academic tenure, which might be thought of as a device to protect minority viewpoints, is in fact mixed: academic tenure does protect the views of those who have it. In order to be given tenure, however, one must either publish – and virtually the only way to publish in sufficient volume to merit tenure is to work at puzzle-solving within the existing paradigm – or kow-tow to elders, who, we have just seen, have a vested interest in the existing paradigm.16 Consequently, the institution of tenure protects the minority viewpoints of the relatively old and unproductive, while ensuring that young, would-be ‘revolutionaries’ must, for the most part, stay within the boundaries of the paradigm handed down to them by the elders of their profession.
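The inefficient-entry argument made earlier in this section can be illustrated with simple arithmetic. The sketch below is not the authors' model: it assumes a single sharable input with a fixed cost, one unit of each output, and an incumbent that breaks even by spreading the shared cost equally across its products. The function name `evaluate_entry` and all of the numbers are hypothetical.

```python
def evaluate_entry(shared_cost, product_costs, entrant_cost, captured):
    """Compare total production cost before and after an entrant captures one product.

    shared_cost   -- cost of the sharable input, paid once by the multiproduct incumbent
    product_costs -- dict: product name -> incumbent's product-specific cost
    entrant_cost  -- entrant's stand-alone cost of producing the captured product
    captured      -- name of the product the entrant takes over
    """
    n = len(product_costs)
    # Assumption: break-even pricing with the shared cost split equally.
    price_before = {p: c + shared_cost / n for p, c in product_costs.items()}
    cost_before = shared_cost + sum(product_costs.values())

    # After entry the incumbent still bears the full shared cost
    # in order to keep producing the remaining products.
    remaining = {p: c for p, c in product_costs.items() if p != captured}
    incumbent_after = shared_cost + sum(remaining.values()) if remaining else 0.0
    cost_after = entrant_cost + incumbent_after

    undercuts = entrant_cost < price_before[captured]
    return {
        "entrant_undercuts": undercuts,
        "cost_before": cost_before,
        "cost_after": cost_after,
        "inefficient_entry": undercuts and cost_after > cost_before,
    }


# Hypothetical numbers: the paradigm is the shared input, the two "products"
# are theoretical and applied research.
result = evaluate_entry(100.0, {"theory": 40.0, "applied": 40.0}, 70.0, "theory")
print(result)
```

With these numbers the entrant can supply "theory" at 70 against a break-even price of 90, so entry succeeds, yet total cost rises from 180 to 210 because the remaining field must now carry the whole shared cost. Lowering the entrant's cost to 30 in the same sketch makes entry efficient, which is why the text treats the question as an empirical one.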
Commercial advertising and goods markets
The market for commercial advertising is different from the other ideas markets considered so far in that it is not a pure ideas market. The function of commercial advertising is to induce people to purchase goods. In other words, direct contact with the real world is available in principle. Indeed, for many goods, the cost of experimenting is low. For one class of goods, which Nelson (1970, 1974) calls search goods, qualities can be discovered prior to purchase by acquiring information on them. For others (experience goods) experimentation is only possible by purchasing, but for many goods, such as those which are bought repeatedly, the costs of experimenting may still be quite low. For these classes of goods, the problem of adverse selection cannot be very large. In addition there is always the possibility of regulating the qualities of the goods themselves, rather than regulating the advertising of them. On both these counts, the demand for screening of advertising messages is low relative to the demand for it in pure ideas markets. The second difference between advertising and pure ideas markets has already been pointed out. It rests on the fact that even where screening is desired, it is often possible for the screening process to be competitively organized. On the other hand, many countries regulate advertising. In Canada and the United States, the authorities screen commercial advertising for deceptive messages and issue cease and desist orders in those cases where a message is deemed deceptive; in the U.S., sales promotion devices (such as trading stamps) are subject to the FTC’s ‘fairness’ doctrine. Governments also regulate some goods and prohibit others from being sold entirely, which is in part a restriction on advertising, for if the prohibited goods (e.g. some
drugs) are sold on the black market, sellers are prevented from advertising in the media, or are restricted in the ways they can advertise. We can refer to the whole body of regulations that is in one way or another aimed at controlling commercial advertising as the advertising code. That code, like rules of evidence in the courts and the scientific paradigm, is a screening or filtering device which has the character of an enforced monopoly. How can the regulation of advertising be explained if the postulates of consumer rationality and of consumer sovereignty are accepted? The answer is that they are not always accepted, and they are least acceptable precisely in those areas where the basic axioms of consumer choice are least plausible. For example, it is well known that consumer choices do not satisfy the axioms of rationality (comparability and transitivity) if consumers are experimenting with goods, rather than choosing among known sources of satisfaction.17 Our model implies that the greatest demand for the regulation of advertising – for the replacement of individual screens by a collective one – will be revealed in at least two cases: (i) when the cost to individual consumers of experimenting and, therefore, the cost of generating an efficient ranking of goods and services are much higher than the cost of such experimentation to consumers collectively, as, for example, in the case of prescription drugs; (ii) when the cost of collective screening is low, because individual variation in what is demanded is unimportant, as is the case with weights and measures or with traffic signals to mention only two examples. Even if there is a demand for a collective screening device, this does not imply that the actual regulation or advertising code is necessarily efficient. The reason is that, as with justice and science, the collective advertising code is a monopoly. Such a code necessarily entails a ‘consistent’ treatment across goods. 
In some instances, the degree of consistency may be greater than what is optimally efficient. But the use of any advertising code is repressive, which, as the reader will recall, means that information about goods and services which consumers would find useful does not get to them. This last point brings to mind the question raised by Coase of the relationship between commercial and political advertising. Some information is repressed in the marketplace for commercial ideas, while information very much like it in the political marketplace is not repressed. To analyze why this is so, we must turn our attention to the functioning of this second marketplace.
Politics
The political marketplace is, in some respects, like the market for goods. The primary function of political competition and political debate is not to sort out truth from error, but to sort out ‘good’ public policies from ‘bad’ ones, where the standard of what is good is the preferences of citizens. In one respect, however, political markets are more like pure ideas markets: the cost of experimenting is typically high. The reason is that many public policies
possess characteristics of public goods. Consequently, experimentation can only be performed by a whole political jurisdiction at once.18 Moreover, because it is difficult, if not impossible, to control the other variables which affect the outcome, the results of experiments are muddy. It is difficult, in other words, to judge whether the outcome in any particular instance is due to the variable whose influence it is sought to measure, or to the effects of other changes which occur during the period in question. Has the U.K.’s experiment in ‘monetarist’ economics correctly isolated the effects of that policy? Did the U.S. flirtation with supply-side economics conclusively show that those policies do not have the effects claimed for them by their proponents? Did the 1964 tax cut in the U.S. prove that Keynesian economics works? The same question can be asked of any innovation in public policy, and the answer is always the same: only when policies have been tried a large number of times by many different jurisdictions under different conditions is it possible to hold a view with some confidence. Yet the costs of these experiments are very large. Because these costs are so high, it would seem to be of paramount importance that the market in political ideas function effectively and sort out good political ideas from bad ones. Yet observation of the level of debate in this market – the speeches and utterances of politicians, the coverage of politics by the media, and the contributions of social critics and individual citizens – suggests that this is not the case. This, of course, is one way in which economists explain the fate of their own policy prescriptions in the political marketplace. Consider only those (relatively few) issues on which the profession is virtually in agreement, such as the benefits of free international trade, or the harm done by rent controls, wage and price controls, marketing boards, and minimum wage laws. 
Have not economists conclusively shown that these policies are bad? Yet they are continually being implemented by ruling politicians in democratic countries, and the frequency and duration of their use hardly appears to be diminishing. How can the implementation of these policies be rationalized? One explanation suggests that, though inefficient, they are implemented because of their income-redistribution features [e.g., Breton (1964), Stigler (1971), Peltzman (1976), Buchanan, Tollison and Tullock (1980) and Becker (1983)]. But such an explanation is itself incomplete. The number of those who gain by such policies is usually trivially small compared to the majority required to elect the government which promises them. It must therefore be explained how it happens that the majority of voters who do not gain, but lose, from these policies are persuaded to vote for them. To put it differently, an election in a democratic country is a paragon example of a contest in ideas, and if the victorious policies are typically inefficient, this must be explained as the result of a process of competition for the minds (‘and hearts’ as the saying goes) of voters.19 Alternatively, it may be that these policies are in fact good ones, and that economists have been unable to see this. But this simply raises the same
difficulty in a different way. For it must then be explained why the set of policies which are successful in the political marketplace (even if rightly so) are unsuccessful in the scientific marketplace for economic ideas. One reason for the difference is that the restrictions on speech imposed in the journals of economics are vastly greater than those in the political marketplace. The speech of economists in professional journals is heavily regulated: one cannot publish an article which makes an assertion unless that assertion is backed up with facts or references, one cannot explain events without articulating a specific model, and so on. If the restrictions on speech in the economists’ marketplace are productive, so that the economists are correct (the policies are bad), then it must be that the triumph of bad ideas in the political marketplace is an instance of adverse selection, that is, of bad ideas driving out good ones. Put differently, bad ideas triumph because the political marketplace is unregulated. There are virtually no rules which regulate the content of political debate.20 There is nothing to prevent a politician from claiming, for example, that his or her policies have had the opposite of their actual effects (e.g., that they have reduced inflation when they have in fact increased it), or that some proposed policies would, if implemented, have the opposite of their likely effects. Assertions may be made without proof or backed up with false or misleading evidence. Virtually anyone can speak and do so at any time. Not only is this generally true in practice in democratic countries, but it is held to characterize an ideal democracy. How can this ideal be justified if it results in a low level of debate and in the implementation of bad policies? Alternatively, why not regulate speech in the political marketplace, nesting the regulatory powers in the constitution? 
Could it not be that the benefit derived from improvements in the efficiency of the political process would outweigh the costs due to the restrictions on liberty? Even if people value liberty for its own sake, they surely also value the goods and services provided by the political process.21 Finally, as we have seen, other major ideas markets are regulated. How can a ‘corner’ solution – zero restrictions on liberty of speech – be justified in politics? We suggest that freedom of speech in politics can be justified in three ways. The first is that it would only be possible to regulate political speech through constitutional provisions if the regulators could be placed à la Buchanan and Tullock (1962) or à la Rawls (1971) behind a ‘veil of ignorance’ and thus be deprived of all information pertaining to their own interests. Real world constitutions are seldom, if ever, written behind veils of ignorance, if only because they have to evolve in response to changing historical realities [see, for example, Gordon (1976)]. Those who write and amend constitutions can, in such circumstances, hardly be expected to resist the temptation of altering the constitution and its interpretation in ways favorable to their own interests. Furthermore, constitutions must be enforced; what guarantees can one have that those doing the enforcing will not favor one party to the detriment of the others?
Albert Breton and Ronald Wintrobe
The second reason for unrestricted freedom of speech in politics is that in any political jurisdiction, of all the political parties competing for office, only one (or one coalition) can be in office at any one time and only that one, of all those competing, can supply goods and services to citizens [Tullock (1965)]. It follows that competition in political ideas is, in any jurisdiction, the main and in many instances the only form of competition between political parties. Indeed virtually all substitutes to competition in ideas that come readily to mind – stuffing ballot boxes, strong arm tactics and gerrymandering – are not the type of competitive behaviors that characterize a well-functioning political marketplace. To put it differently, regulation of commercial advertising still allows business firms to compete with each other on a more or less equal footing using the great variety of instruments at their disposal (style of products, durability, price, etc). In politics, regulation of speech impedes the use of the only instrument of competition available to non-governing political parties. It goes to the heart of the competitive process itself and works to destroy it. On this ground alone, the case for freedom of speech in politics is stronger than it is in the market for goods. The third reason pertains to the particular relationship that exists between the political marketplace and the other markets for ideas. Judicial rules of evidence, scientific paradigms and advertising codes are or can be controlled by what goes on in the political marketplace. Changes to the judicial rules and to the advertising code are usually the outcome of political decisions made in the political marketplace. Changes in scientific paradigms are not, in democracies, decided directly by the political process, but the allocation of public funds to scientific activity surely is. 
In addition, the instances in which governments22 have chosen to directly control the evolution of scientific ideas are sufficiently numerous not to require documentation. The outcome of competition ‘for the field’ in science, justice, and advertising is, therefore, influenced or controlled by what goes on in the marketplace for political ideas. Restrictions on free speech, in other words, restrict competition ‘for the field’ in all ideas markets: the danger that comes from restrictions on free speech in politics is simply that they weaken competition in ideas generally. Now, it is well known that the social loss from monopoly varies with the extent to which the substitutes for a good are monopolized. A monopoly in the form of the only restaurant at the corner of Main and Third may give the firm a less than perfectly elastic demand curve, but the social loss from this is likely to be minute. But if all sources of food are monopolized by one firm, the loss from monopoly can be colossal. And the danger is obviously much greater than this when government can control all markets in ideas. That was, of course, Orwell’s vision of 1984 and more or less the same spectre supported Hayek’s (1944) and Arendt’s (1951) analysis of the political process in what Arendt called ‘totalitarian’ regimes. In those admittedly polar representations of autocracy, all ideas are governed by a single ordering device which is ‘the’ ideology of the state: the paradigms of science, the action of the courts, the educational
Freedom of speech vs. efficient regulation in markets for ideas
system and all media must conform to that ideology. All others are suppressed. Real world dictatorships never achieve this objective because they can never completely suppress competition in the marketplace for ideas.23 Still, there is an element of danger in the polar model and it is that danger, we suggest, which explains why democracies protect the right of their citizens to free speech. This protection is necessary to ensure competition in all ideas markets.
Conclusion
In the foregoing discussion, we have presented a simple model of the regulation of speech in ideas markets. The questions we have addressed pertain to the extent of regulation and to what should be the appropriate regulatory body under different conditions in different ideas markets. Four ideas markets were considered to illustrate how these conditions change: justice, science, advertising and politics. The first two are ‘pure’ ideas markets: only ideas are exchanged (by a process which we have left unspecified) in them. In the second two, ideas are merely the vehicles of goods. Science, justice and politics are all alike, however, in that if information on the buyer’s side is poor (the cost of experimenting is high), adverse selection problems result. This is a prominent feature of some goods markets as well, but by no means all. Where it is important, adverse selection implies that participants in the markets could, with profit, jointly agree to some restrictions on their freedom of speech in order to improve the working of the market. In the courts, these restrictions are called the judicial rules of evidence; in science, the ruling paradigm; and in advertising, the advertising code. In politics, restrictions could be nested in the constitution; the constitution of democratic countries, however, typically forbids that laws be passed which restrict freedom of speech. None of these markets is self-regulating. In each market where restrictions are agreed to, the restrictions themselves are an enforced monopoly. So there must also be regulation to ensure competition ‘for the field’. This competition takes two forms: competition from other ideas markets (as Galileo’s ideas competed with those of the Church, or as economic theory nowadays competes with the modes of thought in sociology or law); this competition is, however, inherently weak.
The second form of competition, and the most effective, is competition ‘for the field’ from other practitioners within the same ideas market. The last point leads to our case for freedom of speech. Changes in the judicial rules of evidence, in scientific paradigms, and in advertising codes are collective decisions which are either made in, influenced by, or implemented by the political process. The political marketplace is therefore unique in that regulation or restrictions on speech in that market would reduce competition in all ideas markets.
Notes
* The ordering of the authors’ names is random. We would like to thank Ronald Coase, Allan Hynes, Stan Liebowitz, John Palmer, Paul Rubin, George Stigler and the seminar participants at the universities of Perugia, Roma (La Sapienza), Torino, Venezia and Western Ontario for helpful comments and suggestions. Any remaining errors are our responsibility.
1 Quoted in Coase (1974, p. 388).
2 The conditions are necessary but, of course, not sufficient; the latter are rather difficult to specify.
3 See Leland (1979).
4 On the difficulties of precisely defining ‘deceptive’ information, see Beales, Carswell, and Salop (1981). Kuran (1987, 1991) assumes that all individuals engage in preference or belief falsification, which he defines as favoring a policy in public that differs from what the individual favors in private. This is appropriate for his purposes, but does not get at the degree or extent of deceptiveness of an idea or statement, central to the problem examined here.
5 An alternative definition of repression is the extent to which any information, including bad or false information, is screened out: in that case the level of repression and the level of regulation are the same. The definition given in the text seems to us to accord better with common usage.
6 Posner (1977, pp. 399ff) takes a similar viewpoint.
7 Both authors are amateurs in the philosophy of science. The account given here has had to rely heavily on that of experts such as Newton-Smith (1981) and Kuhn (1962).
8 Of course, there is nothing ‘non-rational’ about this in the economist’s sense of that term; the term (in Newton-Smith’s nomenclature) pertains to the purely scientific goal of pursuing the truth.
9 Many different definitions of the term ‘paradigm’ are possible. Masterman (1970), in a study of Kuhn’s (1962) work, has counted twenty-one different definitions. The definition we use in the text is sufficient for the purposes of this paper.
10 Kuhn (1962); the title of Chapter 10.
11 For a definition of perfectly contestable markets, see Baumol (1982), especially p. 3.
12 Among economists, Coase (1982) is the only one known to us to have adopted a ‘non-rationalist’ approach. For recent examples, see Kordig (1971) or Newton-Smith (1981).
13 Attempts to suppress anomalies are similar to what psychologists call ‘cognitive dissonance’ – the often observed fact that people are incapable of holding to two seemingly contradictory ideas. Akerlof and Dickens (1982) have discussed the economic consequences of this phenomenon. They do not, however, propose a rational explanation for it, suggesting simply that people have preferences over their beliefs. The present model shows that cognitive dissonance need not be irrational: just as the efficient accumulation of knowledge by the scientific community requires the use of a paradigm, and hence the repression of anomalous facts or observations, so does the efficient processing of information by the human mind.
14 A related issue is whether competition among scientific disciplines results in the optimum division of labour. Coase (1978) suggests that it does. The difficulties of paradigm comparison would, however, seem to be exacerbated when the comparisons are made across disciplines. In addition, there is an obvious barrier to entry – knowledge of a discipline cannot be effected by acquiring the purely logical structure of its central paradigm, if there is one; it is also necessary to master factual knowledge and to hone one’s empirical imagination and intuition. This explains why mathematicians who could learn the mathematics of economics in a matter of hours do not, by so doing, become economists. For the same reason, it is inconceivable that law could be taken over by neoclassical economics, in the way Ptolemaic astronomy was taken over by the Copernican paradigm. Coase also suggests that the recent ‘invasion’ by economics of the ‘contiguous disciplines’ of political science and law is likely to be only partial and limited, and for essentially this reason.
15 Ignoring, of course, the free rider problem, which originates in the fact that even though each scientist has an interest in seeing all scientists self-policing themselves, he or she has no incentive to do so.
16 Carmichael (1988) provides a different theory of academic tenure.
17 For a simple exposition of this point, see Hirshleifer (1980, p. 64).
18 This point is also made by Olson (1973). He argues that, because of the higher cost of experimenting in the public sector – whether to find optimum production techniques or optimum quantities of public goods and services – the public sector is technically less efficient than the private sector.
19 Kuran (1987) provides an explicit model of an ideas market in which agents may engage in deception, and in which the key element is that the relative influence of a particular belief depends on the share of society that asserts it. Multiple equilibria are possible, and there is no presumption that this market works efficiently. He does not, however, consider the question of the effects of regulation on the market.
20 It is true that the Question Period which is the occasion of important competition between governing and non-governing parties in Parliamentary governments is subject to certain rules and procedures, but these relate more to the allocation of time, courtesy and general deportment than to the content of what is said and to the evidence that is admissible.
21 Hayek (1960) suggests that liberty is the only ultimate value. For an effective rebuttal, see Stigler (1978). For some evidence that citizens do not, in fact, appear to value basic political rights very highly, see Noam (1981).
22 When there is no separation between church and state, it is a matter of convention to whom one imputes the responsibility for political control of ideas.
23 Wintrobe (1990) develops a simple model of a totalitarian dictatorship, which suggests that there is a conflict between the dictator’s maximization of power and the perfect repression of the population (even if the latter were possible).
References
Akerlof, George A., 1970, The market for ‘lemons’: Qualitative uncertainty and the market mechanism, Quarterly Journal of Economics 84, 488–500.
Akerlof, George A. and William T. Dickens, 1982, The economic consequences of cognitive dissonance, American Economic Review 72, 307–319.
Arendt, Hannah, 1951, The origins of totalitarianism (Harcourt, Brace, Jovanovich, New York).
Baumol, William J., 1982, Contestable markets: An uprising in the theory of industry structure, American Economic Review 72, 1–15.
Baumol, William J., John Panzar and Robert Willig, 1982, Contestable markets and the theory of industry structure (Harcourt, Brace, Jovanovich, San Diego).
Beales, Howard, Richard Carswell and Steven Salop, 1981, The efficient regulation of consumer information, Journal of Law and Economics 24, 491–540.
Bear, D.V.T. and Daniel Orr, 1967, Logic and expediency in economic theorizing, Journal of Political Economy 75, 188–196.
Becker, Gary S., 1983, A theory of competition among pressure groups for political influence, Quarterly Journal of Economics 98, 371–400.
Breton, Albert, 1964, The economics of nationalism, Journal of Political Economy 72, 376–386.
Breton, Albert, 1974, The economic theory of representative government (Aldine Publishing, Chicago).
Breton, Albert and Ronald Wintrobe, 1986, The bureaucracy of murder revisited, Journal of Political Economy 94, 905–926.
Buchanan, James M. and Gordon Tullock, 1962, The calculus of consent (University of Michigan Press, Ann Arbor, MI).
Buchanan, James M., Robert Tollison and Gordon Tullock, eds., 1980, Towards a theory of the rent-seeking society (Texas A&M University Press, College Station, TX).
Carmichael, H. Lorne, 1988, Incentives in academics: Why is there tenure?, Journal of Political Economy 96, 453–472.
Coase, Ronald H., 1974, The economics of the first amendment: The market for goods and the market for ideas, American Economic Review 64, 384–391.
Coase, Ronald H., 1977, Advertising and free speech, Journal of Legal Studies 6, 1–34.
Coase, Ronald H., 1978, Economics and contiguous disciplines, Journal of Legal Studies 6, 201–212.
Coase, Ronald H., 1982, How should economists choose? (American Enterprise Institute for Public Policy Research, Washington, DC).
Demsetz, Harold, 1968, Why regulate utilities?, Journal of Law and Economics 11, 55–66.
Friedman, Milton, 1953, The methodology of positive economics, in: Essays in positive economics (University of Chicago Press, Chicago, IL).
Gordon, H. Scott, 1976, The new contractarians, Journal of Political Economy 84, 573–590.
Hayek, Friedrich A., 1944, The road to serfdom (University of Chicago Press, Chicago, IL).
Hayek, Friedrich A., 1960, The constitution of liberty (University of Chicago Press, Chicago, IL).
Hirshleifer, Jack, 1980, Price theory and applications, 2nd edition (Prentice-Hall, Englewood Cliffs, NJ).
Keynes, John Neville, 1891, The scope and method of political economy (Macmillan, London).
Koopmans, Tjalling C., 1957, Three essays on the state of economic science (McGraw Hill, New York).
Kordig, R., 1971, The justification of scientific change (D. Reidel Publishing Co., Dordrecht).
Kuhn, Thomas S., 1962, The structure of scientific revolutions, 2nd edition (University of Chicago Press, Chicago, IL).
Kuran, Timur, 1987, Preference falsification, policy continuity, and collective conservatism, Economic Journal 97, 642–665.
Kuran, Timur, 1991, The role of deception in political competition, in: Albert Breton, Gianluigi Galeotti, Pierre Salmon and Ronald Wintrobe (eds.), The competitive state: Colombella papers on competitive politics (Kluwer Academic Press).
Lakatos, Imre, 1978, The methodology of scientific research programmes (Cambridge University Press, Cambridge).
Leland, Hayne E., 1979, Quacks, lemons, and licensing: A theory of minimum quality standards, Journal of Political Economy 87, 1328–1346.
Machlup, Fritz, 1955, The problem of verification in economics, Southern Economic Journal 22, 1–21.
Marshall, Alfred, 1890, 1952, Principles of economics, 8th edition (Macmillan, New York).
Masterman, M., 1970, The nature of a paradigm, in: Imre Lakatos and A. Musgrave (eds.), Criticism and the growth of knowledge (Cambridge University Press, Cambridge).
Melitz, Jack, 1965, Friedman and Machlup on the significance of testing economic assumptions, Journal of Political Economy 73, 37–60.
Nelson, Phillip, 1970, Information and consumer behaviour, Journal of Political Economy 78, 311–329.
Nelson, Phillip, 1974, Advertising as information, Journal of Political Economy 82, 729–754.
Newton-Smith, W.H., 1981, The rationality of science (Routledge and Kegan Paul, Boston, MA).
Noam, Eli, 1981, The valuation of legal rights, Quarterly Journal of Economics 96, 465–476.
Olson, Mancur, Jr., 1973, Evaluating performance in the public sector, and reply, in: M. Moss (ed.), The measurement of economic and social performance (National Bureau of Economic Research, New York: Columbia University Press, 1973).
Peltzman, Sam, 1976, Toward a more general theory of regulation, The Journal of Law and Economics 14, 211–240.
Popper, Karl, 1968, The logic of scientific discovery (Hutchinson, London).
Posner, Richard, 1977, Economic analysis of law, 2nd edition (Little, Brown, Boston, MA).
Rawls, John, 1971, A theory of justice (Harvard University Press, Cambridge, MA).
Rotwein, Eugene, 1959, On ‘the methodology of positive economics’, Quarterly Journal of Economics 73, 554–575.
Samuelson, Paul A., 1964, The general theory, in: R. Lekachman (ed.), Keynes’ general theory: Reports of three decades (St. Martin’s Press, New York).
Samuelson, Paul A., 1963, 1966, Comment on Ernest Nagel’s ‘assumptions in economic theory’, American Economic Review 53, reprinted in J.E. Stiglitz (ed.), The collected scientific papers of Paul A. Samuelson (MIT Press, Cambridge, MA).
Stigler, George J., 1971, The theory of economic regulation, The Bell Journal of Economics and Management Science 2, 3–21.
Stigler, George J., 1972, Economic competition and political competition, Public Choice 13, 91–106.
Stigler, George J., 1975, The citizen and the state (University of Chicago Press, Chicago, IL).
Stigler, George J., 1978, Wealth, and possibly liberty, The Journal of Legal Studies 2, 213–217.
Stiglitz, Joseph E., 1975, The theory of ‘screening’, education, and the distribution of income, American Economic Review 65, 283–300.
Tullock, Gordon, 1965, Entry barriers in politics, American Economic Review 60, 458–466.
Wicksell, Knut, 1901, 1934, Lectures on political economy (George Routledge and Sons, London).
Wintrobe, Ronald, 1987, The efficiency of the Soviet system of production, European Journal of Political Economy, extra issue.
Wintrobe, Ronald, 1990, The totalitarian and the tinpot: A simple economic theory of dictatorship, American Political Science Review 84, 849–872.
5
A free press is bad news for corruption
Aymo Brunetti and Beatrice Weder
1. Introduction
Freedom of speech and a free press are generally considered important human rights and powerful controls against government malfeasance. An independent press is probably one of the most effective institutions to uncover trespassing by government officials. The reason is that any independent journalist has a strong incentive to investigate and uncover stories on wrongdoing. Countries with a free press should, therefore, ceteris paribus, have less corruption than countries where the press is controlled and censored. This paper presents an empirical evaluation of this proposition.
The paper is motivated by recent research which has shown that higher corruption is detrimental to economic performance and which has led to a general acceptance that corruption is one of the central issues in development policy.1 This has given rise to the question of how differences in corruption across countries can be explained and a few recent studies started to explore possible determinants of corruption in small country samples. This paper argues that of the probable controls on bureaucratic corruption a free press is likely to be among the most effective ones – a proposition that is supported in an empirical analysis of a large cross-section of countries as well as by evidence from time series.
The paper is organized as follows. Section 2 reviews possible determinants of corruption discussed in the literature and explains how press freedom could affect corruption. Section 3 describes the main data, the specifications estimated and the econometric approach. Section 4 shows cross-section empirical results for a range of specifications. Section 5 tests the sensitivity of results to alternative measures of corruption and press freedom. Section 6 presents some panel data evidence and Section 7 concludes.
2. Determinants of corruption and the role of press freedom
In theory the incidence of corruption can be explained by three types of determinants, which also reflect the different approaches in the literature. The
first focuses on the role of internal mechanisms and incentives within the bureaucracy in controlling corruption. A second branch stresses the role of external mechanisms in checking corruption, such as an independent judiciary or watch body. Finally, the third branch argues that corruption can be explained by more indirect factors, such as culture or the level of rents that can be appropriated. In this categorization press freedom is an external control mechanism on corruption. Before discussing the role of press freedom we briefly summarize the potential other determinants of corruption in each of the three categories. This motivates our choice of control variables in the specifications used in the empirical analysis.
2.1. Determinants of corruption
Internal controls include all systems and incentives that control corruption within the bureaucracy. Corruption tends to be high in an administrative environment where there is a lack of explicit standards of performance which are strictly enforced and in an environment where the individual bureaucrat is poorly supervised. Rauch and Evans (2000) argue that an important aspect of internal control is whether the recruitment and promotion process in the bureaucracy is based on meritocracy or on nepotism. Less nepotism tends to reduce the probability that internal control is eliminated by collusion among bureaucrats. Rauch and Evans (2000) test this point by constructing an index of meritocratic recruitment and promotion and showing that it is significantly associated with corruption in a sample of less developed countries. Van Rijckeghem and Weder (2001) argue that a low level of public sector wages compared to wages in the private sector increases the incentives for bureaucrats to accept bribes. In an empirical test of this hypothesis they find a negative relationship between the level of public sector wages and corruption for a sample of less developed countries.
External control of corruption is exercised by individuals or organizations outside the administration. In a working system of checks and balances this is mainly performed by the judiciary power. A court system where corrupt bureaucrats can be easily and effectively sued sharply reduces the potential rewards of corruption. In countries with less-developed checks and balances other parts of the society can play the role of external controller. Rahman (1986) describes such a mechanism in the case of Singapore where citizens committees were established which enable citizens to vent their grievances and seek redress. An empirical analysis of the effects of external control on corruption is difficult since there are few convincing empirical measures for cross-country differences of the power of external control mechanisms. Empirical studies such as Ades and Di Tella (1999) therefore use rather indirect measures such as the general level of development and education to capture the ability of the civil society to control government performance. We argue that a free press is another potentially powerful external control on corruption.
Finally, indirect determinants of corruption identified in the literature include culture and the level of distortions in the economy. Lee (1986) for instance suggests that a culture of bureaucratic elitism may lead to a disassociation of civil servants from the rest of society and breed corruption. Tanzi (1994) argues that the absence of a culture of arms-length relationships may lead to corruption becoming systemic. Shleifer and Vishny (1993) suggest that more ethnically diverse countries are prone to particularly harmful forms of corruption. In an empirical study Mauro (1995) indeed finds evidence of a positive relationship between ethnolinguistic fractionalization and corruption. Distortive policies are a second indirect determinant of corruption. Tanzi (1994) for example suggests that government interventions in free markets create rents and lead to a sharp rise in corruption payments. Kaufman (1997) tests the relationship between an indicator of regulatory discretion and corruption and finds a strong correlation in a small sample of developing countries. Ades and Di Tella (1999) generalize this argument by pointing out that monopolistic powers of bureaucrats are an important precondition for the occurrence of corruption. They also provide empirical cross-country evidence that more competition is associated with less corruption.
2.2. Press freedom and corruption
This paper focuses on a particular mechanism of external control, namely, press freedom. A free press is potentially a highly effective mechanism of external control on corruption because it works not only against extortive but also against collusive corruption. Extortive corruption means that the government official has the discretionary power to refuse or delay a service (say a business license or the approval of a new construction project) in order to extract a rent from the private agent in the form of a bribe. Hindricks et al.
(1999) describe extortion in a model analyzing mechanisms against tax evasion and they argue that this is a particularly serious form of corruption. Klitgaard (1988) provides drastic examples in his case study of the corruption structures in the Philippine tax system. He describes how taxpayers were extorted simply to receive the treatment legally due to them. Different strategies were applied by dishonest bureaucrats to extort taxpayers. For example, the tax inspector would assess an unrealistically high payment on the taxpayer. In the Philippine legal framework it was very costly and time-consuming to appeal and in addition the taxpayer in many cases was not sure what the correct amount due really was. The tax assessor could take advantage of this situation by extorting a payment in exchange for the correct assessment. Faced with such blackmailing the taxpayer has the option of either paying the bribe or complaining to a higher official or the judiciary, i.e., using channels of internal or external control. By fighting extortion the private agents help in limiting corruption. This is rather likely to occur as the private agent has a
strong incentive to do something against this kind of corruption. However, if the costs of appealing are very high, which means that the formal mechanisms of internal and external control are not working well, the taxpayer might in fact be better off by surrendering to the extortion. A free and active press constitutes an additional channel of external control which can substantially reduce the costs of fighting extortive corruption. A firm can reveal (or credibly threaten to reveal) the bureaucrat’s behavior to a journalist and the (potential) media reports will raise the costs for the bureaucrat as the probability of being detected and punished is increased. In particular it will be much harder for an administration to keep up ineffective internal control mechanisms such as low penalties or ineffective external control mechanisms such as a weak judiciary if this fact is likely to be regularly reported in the media. For extortive corruption the press, therefore, reinforces the reaction possibilities of the private sector by providing a platform for voicing complaints. The second form, collusive corruption, has been rather extensively treated in the theoretical literature on tax evasion (see, e.g., Besley and McLaren, 1993 or Flatters and MacLeod, 1995). With collusive corruption the incentives are somewhat different than with extortion. The official again has some discretionary power in her application of rules. Take for instance a customs inspector who has information about the value of a firm’s imports. For a ‘fee’ she could make a deal with the firm’s management to reduce the overall tariff liability of the company. This is unlikely to be detected by other officials – unless there are very good internal control systems in place – and the firm also has no incentive to report the corruption. Again, Klitgaard (1988) describes an example of this form of corruption in the Philippine tax system; there this kind of corruption is called arreglo (arrangement).
In a typical case the taxpayer would submit a return with understated income or too many deductions. If the tax collector discovered these ‘errors’, arreglo frequently occurred. The taxpayer would for example pay half of the correct taxes; of the other half, two-thirds would be paid as a bribe to the tax collector whereas the taxpayer could keep the rest. In this form of corruption the private agent cooperates in the corrupt act and always pays the bribe. Obviously, this is much more difficult to fight than extortive corruption as the arrangement is beneficial for both the bureaucrat and the firm, and they will do everything to hide it. In contrast to extortive corruption the private agent can, therefore, not be trusted to help fight this illegal action. A free press is probably the most effective institution to control collusive corruption. Independent journalists have incentives to actively investigate any wrongdoing. Other outside bodies, such as the judiciary or even watchdog bodies such as anti-corruption commissions, may be less effective, unless their internal incentive structures are closely aligned with the goal of discovering and prosecuting corruption. There is a substantial danger that internal control agencies will be included in the arrangements and get a share of the pie. It may be much harder for the architects of corrupt
arrangements to apply the same strategy to journalists. If the press is free and competitive it might be possible to buy some journalists but this only increases the incentives for other journalists to detect such arrangements and publicize them. The more involved a corrupt arrangement, the more fame an investigative journalist can earn by uncovering it. As long as there is free entry in journalism and in publishing – which is one of the defining features of a free press – it will be difficult to form an effective cartel which encompasses all journalists. Therefore, in theory press freedom can help to fight extortive corruption and may be a particularly effective institution to fight collusive corruption where client and bureaucrat have a mutual interest in the corrupt act.
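The arreglo example from Klitgaard (1988) described above involves a simple three-way split of the tax liability, which a short worked calculation may make easier to follow. The sketch below is purely illustrative: the function name `arreglo_split` and the liability of 600 are hypothetical choices, and the shares (one half paid, two-thirds of the remainder as bribe) are taken from the single example given in the text, not from any general rule.

```python
from fractions import Fraction

def arreglo_split(correct_tax):
    """Illustrative split of a tax liability under the arreglo example in the text."""
    correct_tax = Fraction(correct_tax)
    paid_to_treasury = correct_tax / 2       # half of the correct taxes is actually paid
    evaded = correct_tax - paid_to_treasury  # the other half is withheld from the treasury
    bribe = evaded * Fraction(2, 3)          # two-thirds of the withheld half goes to the collector
    kept_by_taxpayer = evaded - bribe        # the taxpayer keeps the rest
    return paid_to_treasury, bribe, kept_by_taxpayer

# Hypothetical liability of 600 currency units.
treasury, bribe, kept = arreglo_split(600)
print(treasury, bribe, kept)  # 300 200 100
```

With these shares the treasury receives one half of the correct tax, the collector one third and the taxpayer one sixth, so both parties to the arrangement gain relative to honest payment, which is why neither has an incentive to report it.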
3. Data and empirical strategy

This section presents the data on press freedom and corruption, derives the specifications tested in the empirical analysis, and explains the econometric methodology used.2

3.1. Measure of press freedom

Our main measure of press freedom is assembled by Freedom House, a think tank that is known for having published widely used indexes for political rights and civil liberties for the last 25 years. Since 1996 Freedom House in addition has compiled expanded indices of press freedom for 145 countries based on experts’ opinions, findings of international human rights groups and press organizations, analysis of publications and news services and reports of governments on related subjects. The motivation for collecting this kind of information derives from Article 19 of the Universal Declaration of Human Rights, which postulates that “everyone should have the right to freedom of opinion and expression.” The idea is to gain a comprehensive assessment of press freedom by not focusing exclusively on actual incidents of censorship like for instance arrests or assassinations of journalists but on the overall structure of the news delivery system. Therefore, several dimensions of potential violations of press freedom are evaluated.3

1 Laws and regulations that influence media content: reflects “our judgment of the degree of actual impact on press freedom (. . .), not simply the ceremonial commitment to press freedom.” For instance, “if private broadcast media are owned by government with no dissent allowed, the rating will be 15 (i.e., the worst score)” but if “a government that owns all broadcast media may permit widely pluralist ideas, even active dissent from government positions” then the rating will be more favorable.
2 Political influence over media content: captures “political pressure on the content of both privately owned and government media and takes into account the day-to-day conditions in which journalist work.” It also includes “threats from organized crime” which may lead to self-censorship.
3 Economic influence over media content: reflects “competitive pressures in the private sector that distort reportage as well as economic favoritism or reprisals by government for unwanted press coverage.”
4 Repressive actions: measures actual acts which constitute violations of press freedom. For instance, arrests, murders or suspensions of journalists, physical violence against journalists or facilities, self-censorship, arrests, harassment, expulsion, etc.
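A minimal sketch of how the four categories combine into the overall Freedom House index: each category is rated separately for print and broadcast media, the first three on a 0 to 15 scale and the fourth on a 0 to 5 scale, and the eight sub-scores are summed. The dictionary keys and the function name are hypothetical labels chosen for this illustration.

```python
# Maximum rating per category (0 = no violation of press freedom).
# Each category is rated once for print and once for broadcast media.
CATEGORY_MAX = {
    "laws_and_regulations": 15,
    "political_influence": 15,
    "economic_influence": 15,
    "repressive_actions": 5,
}
MEDIA = ("print", "broadcast")


def overall_press_index(scores):
    """Sum the eight sub-scores (0 = most free, 100 = least free).

    `scores` maps (category, medium) -> rating; key names are hypothetical.
    """
    total = 0
    for category, max_score in CATEGORY_MAX.items():
        for medium in MEDIA:
            rating = scores[(category, medium)]
            if not 0 <= rating <= max_score:
                raise ValueError(f"{category}/{medium}: {rating} outside 0..{max_score}")
            total += rating
    return total


worst = {(c, m): CATEGORY_MAX[c] for c in CATEGORY_MAX for m in MEDIA}
best = {(c, m): 0 for c in CATEGORY_MAX for m in MEDIA}
print(overall_press_index(best), overall_press_index(worst))
```

The two extreme cases confirm that the index is bounded by 0 and 2 × (15 + 15 + 15 + 5) = 100.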
All four categories are rated for both the freedom of print media and the freedom of broadcast media. In the first three categories countries are rated on a scale from 0 to 15 (0 meaning absence of violation of press freedom and 15 meaning the highest degree of such violation). The fourth category is rated from 0 to 5. The overall measure of press freedom is the sum of these eight subcomponents, and ranges from 0 (total freedom of the press) to 100 (the highest degree of violation of press freedom). For instance, in 1997 Norway, with a rating of 5, was the country with the freest press. Most OECD countries are in the quintile with the best ratings. At the bottom end we find North Korea, Iraq, and Burma with a rating of 100 each. Many of the countries of the Former Soviet Union fall in the fourth quintile together with a few of the East Asian tigers. Overall the worldwide picture is one of rather high degrees of violations of press freedom; the worldwide average of the index is 46 points.4

3.2. Measures of corruption

Measuring corruption is obviously tricky because of its illegal nature. It is further complicated by a wide range of definitions of a corrupt act. And finally, there seem to be many different expressions of corruption, which range from routine “tips” and “speed money” to complicated schemes of favors between businessmen and civil servants. Data on corruption levels across countries are available from various sources, which are all based on surveys of experts or entrepreneurs. Our main measure of corruption is an indicator collected by the International Country Risk Guide (ICRG). This firm produces annual ratings of corruption levels by using surveys among country experts. The indicator ranges from 0 to 6.
A low score means that “high government officials are likely to demand special payments” and “illegal payments are generally expected throughout lower levels” in the form of “bribes connected with import and export licenses, exchange controls, tax assessment, policy protection, or loans.”5 Of all such country risk services, ICRG covers by far the largest number of countries. We use the average for 1994–1998, which is available for 128 countries. In addition we use corruption measures from three alternative
sources (World Bank, Institute for Management Development and Transparency International) which are discussed below.

3.3. Specification

As noted above, the theoretical and empirical literature has identified a number of determinants of corruption. On the one hand there are direct internal and external control mechanisms. On the other hand there are more indirect determinants such as distortions and sociological determinants of higher corruption. This suggests that estimates of corruption should at least include proxies for the direct control mechanisms, which leads to the following preferred specification:

\[ \mathrm{CORR}_i = \beta_0 + \beta_1\,\mathrm{PRESS}_i + \beta_2\,\mathrm{BUREAU}_i + \beta_3\,\mathrm{RULE}_i + \varepsilon_i. \tag{1} \]
We are mainly interested in the sign and the significance of β1 which we expect to be significantly negative since a higher value of CORR means less corruption and a higher value of PRESS means less press freedom. The indicator of internal control BUREAU is a measure of the quality of the bureaucracy. The variable is provided annually for a large number of countries by ICRG based on evaluations from country experts.6 It indicates the degree of “autonomy [of the bureaucracy] from political pressure,” the “strength and expertise to govern without drastic changes in policy or interruptions in government services” as well as the existence of “established mechanisms for recruiting and training.” A higher value of this indicator means better quality of the bureaucracy so that we expect β2 to be positive. RULE is our measure of external control and is also provided by ICRG. The indicator marks the presence of “sound political institutions, a strong court system, and provisions for an orderly succession of power” and reflects the degree to which “citizens of a country are willing to accept the established institutions to make and implement laws and adjudicate disputes.” This broad measure covers not only external control through checks and balances but also the degree to which citizens exercise control power. The higher this indicator the better established is the rule of law so that we expect β3 to be positive. These variables are available for a large number of countries, which allows us to estimate this specification even for corruption measures where we have relatively small country samples. In addition, we test a second, broad specification which includes a number of the other potentially relevant determinants of corruption discussed above:

\[ \mathrm{CORR}_i = \beta_0 + \beta_1\,\mathrm{PRESS}_i + \beta_2\,\mathrm{BUREAU}_i + \beta_3\,\mathrm{RULE}_i + \beta_4\,\mathrm{GDP}_i + \beta_5\,\mathrm{HUMCAP}_i + \beta_6\,\mathrm{TRADE}_i + \beta_7\,\mathrm{BLACK}_i + \beta_8\,\mathrm{ETHNIC}_i + \varepsilon_i. \tag{2} \]
Aymo Brunetti and Beatrice Weder
This specification includes two more proxies for external control, two proxies for distortions and one proxy for cultural factors. GDP and HUMCAP measure the level of per capita GDP in 1995 (calculated at purchasing power parity) and educational attainment, respectively. Both of these variables proxy for external controls since the ability of civil society to judge government performance and act as an external control on corruption in the administration tends to increase with the level of development (see, e.g., Ades and Di Tella, 1999). We therefore expect β4 and β5 to be positive. TRADE and BLACK are proxies for distortions and restrictions of competition in an economy. TRADE measures the exposure of an economy to foreign trade and is defined as the sum of exports and imports as a percentage of GDP. For instance, Ades and Di Tella (1999) argue that open countries are subject to larger competitive pressure that in turn reduces monopolistic rents and thus corruption. β6 is therefore expected to be positive. BLACK measures the black market premium on foreign exchange and is a broad indicator of the degree of government-created distortions in an economy; we expect β7 to be negative. Finally, ETHNIC measures the degree of ethnolinguistic diversity, which is a proxy for the cultural background of a country. Mauro (1995) has found a positive correlation between this variable and corruption and we expect β8 to be negative. This second specification captures almost all relevant variables discussed in Section 2.

3.4. Econometric methodology

The dependent variable is the corruption measure from ICRG. We use a short average of this index (1994–1998) in order to match the timing with our main measure of press freedom while avoiding shocks which would be particular to one year (e.g., a financial crisis) and which might affect corruption ratings that are based on subjective evaluations.
The dependent variable is therefore a continuous variable and this allows estimates using ordinary least squares. An alternative strategy would be to use the last available value of corruption as the dependent variable. In this case the dependent variable can only take on integer values and the ordered probit model is the appropriate method. In the econometric analysis we also show estimates for an ordered probit model for corruption values in a single year. We conduct tests to determine whether the relationship between press freedom and corruption is driven by outliers or by the difference between developed and less developed countries. We test extensively for robustness to measurement of both the corruption and the press freedom variables. White’s test for heteroskedasticity in the residuals of the basic specification rejects the null of no heteroskedasticity; thus all standard errors of coefficients are calculated using the White (1980) correction. A potential criticism is that press freedom may be endogenous, since corrupt regimes may tend to limit press freedom. In theory this effect should only be relevant where corruption is systemic, i.e., where all journalists
participate in the revenues from corruption or where there is a political machine and extreme repression. In all other cases the causality is likely to run from press freedom to corruption rather than the other way around. The reason is that journalists have incentives to uncover corruption and it is unlikely that all journalists can be brought to cooperate in corrupt arrangements. Such a cartel would be difficult to sustain in an environment with many independent journalists and high reputational profits from uncovering corrupt arrangements. On the empirical level we address this issue of endogeneity in three ways. First, we exclude the countries with highly repressive regimes from the sample. Second, we use several instruments for press freedom. Finally, we exploit the time series dimension of an alternative data set on press freedom. It should be said from the beginning that we cannot fully resolve the issue of causality since we have only imperfect instruments and little time series variation to work with.
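To make the White (1980) correction mentioned in this section concrete, the sketch below estimates a simple regression of a corruption index on a press freedom index and computes a heteroskedasticity-robust (HC0) standard error for the slope. It is a deliberately reduced, bivariate illustration: the chapter's specifications include further controls, the data are hypothetical, and the function name is an assumption.

```python
import math


def ols_with_white_se(x, y):
    """Simple regression y = a + b*x with an HC0 (White) standard error for b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # HC0 sandwich variance of the slope: sum(dx_i^2 * e_i^2) / sxx^2
    var_b = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return a, b, math.sqrt(var_b)


# Hypothetical data: PRESS (0 = free, 100 = unfree), CORR (0 = corrupt, 6 = clean)
press = [5, 20, 46, 70, 100]
corr = [5.5, 4.5, 3.5, 2.5, 2.0]
a, b, se = ols_with_white_se(press, corr)
print(f"slope={b:.4f}, robust t={b / se:.2f}")
```

Unlike the conventional formula, the robust variance weights each squared residual by its own squared deviation of the regressor, so it remains valid when the error variance differs across countries.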
4. The cross-section results

Figure 5.1 shows a scatterplot that illustrates the close relationship between corruption and press freedom. Recall that the negative relationship observed
Figure 5.1 Corruption and press freedom. Note: corruption index ranges from 0 (highest corruption) to 6 (lowest corruption), index of press freedom ranges from 0 (highest press freedom) to 100 (lowest press freedom).
in the figure implies that less corruption is associated with more press freedom. Inspection of the raw data suggests that there exist no important outliers. Table 5.1 presents the cross-section results. The dependent variable is the average corruption level from 1994 to 1998. Column (1) shows the result for the base specification which includes 125 countries. The coefficient of the indicator of press freedom has the expected negative sign and is significant at conventional confidence levels. The coefficients of both control variables have a positive sign and are significant at the 1 percent level. This indicates that, as expected, corruption is lower in countries with a well-working bureaucracy and in countries with a well-established rule of law. The base specification explains two-thirds of the variation in the corruption levels between countries. Column (2) documents that the significant relationship between press freedom and corruption is not driven by the differences between developed and less developed countries alone. This regression estimates the base specification for a sample containing only less developed (non-OECD) countries. The coefficient of PRESS is significant and of the expected sign. BUREAU remains significantly related to corruption whereas the coefficient of RULE is not significant in this regression. Compared to regression (1) the adjusted R² drops sharply from 0.67 to 0.38, indicating that including the developed countries in the sample improves the fit of the regression. To exclude possible outliers we restricted the sample to observations with residuals within plus or minus two standard deviations (results not reported). The results are not affected (the coefficient of press freedom is 0.017 with a t statistic of 5.4).
Column (3) shows the results of a two-stage least-square estimation instrumenting with the level of political rights.7 The results remain largely unchanged compared with the previous estimates, although the size of the coefficient on press freedom is larger.8 Column (4) includes other variables that could impact on corruption. The coefficient of PRESS is again significant at conventional confidence levels and the coefficients of BUREAU and RULE remain positive and significant. The coefficient of the logarithm of GDP per capita has the expected positive sign whereas the measure of human capital is negatively related with corruption. Both coefficients are, however, insignificant. The coefficients of the two variables that proxy for distortions in the economy are both insignificant. Whereas TRADE has the expected positive sign, the positive coefficient of BLACK is unexpected. Finally, the proxy for the degree of cultural fractionalization ETHNIC has the expected negative sign but is insignificant. Adding these explanatory variables only marginally improves the fit of the regression (the adjusted R² increases from 0.67 to 0.74) but comes at the cost of reducing the sample size significantly. Column (5) reduces the sample to the less developed countries and shows that the relationship between press freedom and corruption also holds within this set of countries in the extended specification.
Table 5.1 Dependent variable: average corruption from 1994 to 1998

              (1)         (2)         (3)         (4)         (5)         (6)         (7)
              OLS         OLS         TSLS        OLS         OLS         OLS         TSLS
                          (LDCs only)                         (LDCs only)
Constant      2.560       2.614       3.392       1.945       1.506       2.946       4.139
              (10.508)    (10.516)    (5.003)     (1.721)     (1.260)     (2.180)     (1.867)
PRESS         −0.017      −0.015      −0.028      −0.017      −0.015      −0.020      −0.037
              (−6.350)    (−4.789)    (−3.266)    (−4.023)    (−3.501)    (−4.439)    (−1.926)
BUREAU        0.220       0.254       0.221       0.200       0.128       0.089       0.073
              (2.893)     (2.708)     (2.310)     (2.058)     (1.220)     (0.942)     (0.663)
RULE          0.265       0.146       0.143       0.259       0.068       0.154       0.044
              (3.482)     (1.624)     (1.527)     (2.583)     (0.607)     (1.530)     (0.0251)
log(GDP)                                          0.104       0.226       0.107       0.127
                                                  (0.681)     (1.358)     (0.538)     (0.523)
HUMCAP                                            −0.043      −0.085      −0.052      −0.064
                                                  (−1.007)    (−1.562)    (−1.058)    (−1.088)
TRADE                                             0.002       0.004       0.003       0.003
                                                  (1.103)     (2.091)     (1.358)     (1.367)
BLACK                                             0.001       0.001       0.001       0.001
                                                  (1.882)     (1.288)     (1.350)     (0.730)
ETHNIC                                            −0.246      −0.053      −0.457      −0.410
                                                  (−0.690)    (−0.154)    (−1.170)    (−1.021)
AFRICA                                                                    −0.142      −0.102
                                                                          (−0.521)    (−0.252)
LATIN                                                                     −0.563      −0.857
                                                                          (−2.298)    (−2.530)
OECD                                                                      0.419       0.075
                                                                          (0.983)     (0.150)
Observations  125         93          104         68          47          68          68
Adjusted R²   0.67        0.38        0.67        0.74        0.38        0.77        0.72

Note: t Statistics in parentheses; White-corrected standard errors; political rights as instrument in Columns (3) and (7).
Column (6) shows the results of another sample test; it checks whether there are differences among continents that are not captured in the explanatory variables used in the broad specification. The result on press freedom is unchanged and only the Latin American continent dummy is significant. Finally, Column (7) shows the two-stage least-square estimates for this specification. Again the results are largely unchanged, although the significance of the indicator of press freedom drops somewhat. Given that all other control variables are insignificant, and that sample size is a consideration in the following sections, we will estimate the base specification in the remainder of this paper. Overall the results presented in Table 5.1 indicate that there is a close relationship between press freedom and corruption which is robust to specification and sample variation. Furthermore, the effect of press freedom on corruption is sizable. The absolute value of the coefficient varies between 0.017 and 0.028 for the full sample of countries. This means that an improvement of 46 points in the press freedom indicator (that is, a move of the average country to full press freedom) could reduce corruption by about 1 point (46 × 0.017 ≈ 0.8; 46 × 0.028 ≈ 1.3).9 With the mean corruption at 3.4 (on a scale from 0 (highest) to 6 (lowest) corruption) this implies that a complete move to press freedom would lead to a dramatic reduction of corruption in the average country. This does not include any indirect effects that higher press freedom might have through improving other bureaucratic controls and other external controls. We conducted additional tests of sensitivity to specification not reported in the tables. We included in the base specification, one by one, inflation, government consumption, a different measure of openness (provided by Sachs and Warner, 1995) and a different measure of human capital (secondary school enrollment) as further determinants of corruption.
However, these variables were insignificant and the result on press freedom was unchanged.10 We substituted BUREAU with more direct measures of internal controls, namely an index of meritocracy and of relative salaries between civil servants and private sector employees (both provided by Rauch and Evans (2000)). Both indicators are only available for 35 less developed countries and the results are similar to those obtained in Table 5.1, Column (2), for the sample of developing countries. We now turn to an alternative estimation model. The corruption measure used in Table 5.1 is a continuous variable by virtue of averaging 4 years of an ordered variable that can only take on discrete values between 0 and 6. However, one could argue that averaging does not really change the discrete nature of the measure, which could be interpreted simply as an ordering. This interpretation would be pertinent if the indicators were based on country rankings rather than numerical ratings, but most expert surveys provide explicit numerical ratings with the understanding that they are linear. Presumably it is for this reason that the literature on determinants of corruption has so far treated the corruption indicators as continuous variables and has used OLS estimates. However, under the interpretation of an ordered
variable, OLS would not be appropriate since this method treats the difference from one value to the next equally, although it is only a ranking. To address this issue, and test its influence on the results, in Table 5.2 we show results of ordered probit estimates of the basic specification using the corruption value for one single year. The equations show that PRESS has the expected negative sign and the z value indicates that it is significant at conventional levels. The same applies to the other two control variables. The interpretation of coefficients in the ordered probit model is not straightforward (see Greene, 1997); therefore in the bottom panel we show the marginal effects of the changes in the regressors. Since in all classes the marginal effects are evaluated at the mean of the respective explanatory variable, the sign must change when moving from lower to higher classes. Take for instance a large value of PRESS. If our reasoning is correct this should reduce the probability for a country to have high corruption ratings (e.g., classes 1 or 2) and vice versa for small values of PRESS. The signs of the marginal effects displayed in Table 5.2 show that this is indeed the case as these effects are negative for the low classes of corruption and switch to positive in the high classes. The signs of the marginal effects are as expected for the other two control variables as well. In the remainder of the paper we only report the OLS results because they can be interpreted more readily. Using ordered probit estimations does not alter any results. Next we address possible concerns about endogeneity. Table 5.3 presents the results from a sample test and estimates with alternative instruments for press freedom. Recall from above that the danger of reverse causality should be largest in authoritarian and repressive regimes (that also happen to be corrupt) where the press would be stifled.
Column (1) demonstrates that the relationship between press freedom and corruption exists also for the sample of countries that cannot be classified as repressive. The estimate includes the controls of the base specification (not shown) but excludes all countries that are classified as authoritarian, according to Gastil (1989). The number of observations drops substantially but the results on PRESS are not affected.11 Alternatively, by restricting the sample to medium and low corruption countries we obtain the same results. Columns (2) and (3) are variants of the TSLS estimates in Table 5.1. Column (2) shows the results using the Jaggers and Gurr (1996) democracy measure as an instrument. In Column (3) press freedom is instrumented with proxies for European influence: the fraction of Protestants in the population and the fraction of the population that speaks a European language as their first language.12 The results are similar to those obtained earlier.13
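The logic of the two-stage least-squares estimates referred to above can be sketched for the simplest case of one regressor and one instrument. The sketch runs the two stages explicitly and compares the result with the equivalent ratio of covariances. The data and function names are hypothetical, the chapter's estimates additionally include BUREAU and RULE as controls, and standard errors taken naively from the second stage would not be valid.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def tsls_slope(z, x, y):
    """Two-stage least squares with one instrument z for one regressor x."""
    a1, b1 = ols_line(z, x)                 # first stage: regress x on z
    x_hat = [a1 + b1 * zi for zi in z]      # fitted values of the regressor
    _, b2 = ols_line(x_hat, y)              # second stage: regress y on fitted x
    return b2


def iv_ratio(z, x, y):
    """Equivalent just-identified form: cov(z, y) / cov(z, x)."""
    n = len(z)
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    czy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    czx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return czy / czx


# Hypothetical data: instrument (e.g. a democracy score), PRESS and CORR
instrument = [9, 8, 5, 3, 1]
press = [10, 25, 50, 70, 90]
corr = [5.0, 4.5, 3.0, 2.5, 1.5]
print(tsls_slope(instrument, press, corr), iv_ratio(instrument, press, corr))
```

Only the part of PRESS that moves with the instrument is used to identify the slope, which is why the validity of the estimate rests on the instrument affecting corruption solely through press freedom.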
5. Sensitivity of the results to measurement

The last section has shown results for specific measures of corruption and press freedom. This raises the question whether these results are characteristics
Table 5.2 Ordered probit estimation; dependent variable corruption in 1995

Variable    Coefficient   Std. error   z statistic
PRESS       −0.0169       0.0050       −3.3824
BUREAU      0.3504        0.1162       3.0162
RULE        0.3712        0.1177       3.1546

Observations: 127
Likelihood ratio index: 0.28

Marginal effects (evaluated at the mean of each variable)

Variable    Class 1     Class 2     Class 3     Class 4     Class 5     Class 6     Class 7
PRESS       −0.000012   −0.00023    −0.00181    −0.00467    0.00348     0.00295     0.00030
BUREAU      0.00024     0.00485     0.03750     0.09691     −0.07220    −0.06118    −0.00613
RULE        0.00025     0.00514     0.03973     0.10267     −0.07650    −0.06481    −0.00649
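A quick consistency check on the marginal effects in Table 5.2: because the class probabilities sum to one, the marginal effects of any regressor must sum to zero across the seven classes (up to rounding), and in an ordered probit they change sign exactly once when moving from the lowest to the highest class. The sketch below verifies both properties for the reported values; the helper name is hypothetical.

```python
# Marginal effects for corruption classes 1..7, copied from Table 5.2.
marginal_effects = {
    "PRESS": [-0.000012, -0.00023, -0.00181, -0.00467, 0.00348, 0.00295, 0.00030],
    "BUREAU": [0.00024, 0.00485, 0.03750, 0.09691, -0.07220, -0.06118, -0.00613],
    "RULE": [0.00025, 0.00514, 0.03973, 0.10267, -0.07650, -0.06481, -0.00649],
}


def sign_changes(values):
    """Count how often consecutive non-zero values switch sign."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


for name, effects in marginal_effects.items():
    print(name, f"sum={sum(effects):+.6f}", "sign changes:", sign_changes(effects))
```

The sums differ from zero only in the fifth or sixth decimal place, which is consistent with rounding in the published table.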
Table 5.3 Sample tests and two-stage least-square estimates using various instruments (a)

                         (1) OLS excluding        (2) TSLS             (3) TSLS instr: fraction of
                         repressive regimes (b)   instr: democracy     protestants, fraction of
                                                                       European language
PRESS                    −0.017 (−3.072)          −0.016 (−3.109)      −0.012 (−2.135)
Adjusted R²              0.72                     0.69                 0.46
Number of observations   62                       113                  66

Notes: t Statistics in parentheses; White-corrected standard errors.
(a) All estimates include a constant as well as BUREAU and RULE as additional control variables (not shown).
(b) Defined as an average value of the Gastil political rights index higher than 5 (on the scale from 1 to 7 = most authoritarian).
of these data, since neither corruption nor press freedom is easily observable. In this section we use alternative corruption measures and alternative measures of press freedom in order to address this question.

5.1. Alternative corruption measures

We test the robustness of the results by using three other measures of corruption. All of them are based on surveys and they cover a smaller number of countries than the measure by ICRG. The first is a measure of corruption for 1997 based on a firm-level survey done for the World Bank’s World Development Report 1997.14 The second corruption indicator is from the annual business executive’s survey done by the Institute for Management Development for the World Competitiveness Report. The last indicator is from Transparency International and is based on a “poll of polls,” an average of about five different corruption indices. We use the 1998 value of this indicator. Table 5.4 shows regression results for each of these corruption measures. In all cases the indicator of press freedom has the expected sign, and it is significant at the 10 percent level in two of the three regressions. It is not significant in the regression using the corruption indicator provided by IMD. This seems partly to be due to sample composition, since IMD includes mostly industrialized countries with high ratings in BUREAU and RULE but little variation among themselves.

5.2. Alternative measures of press freedom

As a next step we check whether the results are confirmed by an alternative measure of press independence collected by Humana (1992) for the World
Table 5.4 Testing the effects of press freedom for alternative corruption measures (a)

Dep. variable:   Corr-IMD          Corr-WB            Corr-TI
PRESS            0.008 (0.364)     −0.010 (−1.738)    −0.028 (−2.667)
Observations     46                55                 78
Adj. R²          0.67              0.42               0.71

Notes: t statistics in parentheses; White-corrected standard errors.
(a) All estimates include a constant as well as BUREAU and RULE as additional control variables (not shown).
Table 5.5 Testing alternative measures of press freedom (dependent variable: average corruption from 1994 to 1998) (a)

Measure for press freedom:   Overall press    Censorship       Independence     Independence      Independence
                             freedom          (lack of)        newspapers       book publishing   broadcasting
Press freedom measure        0.052 (2.915)    0.257 (2.970)    0.118 (1.650)    0.228 (2.348)     0.154 (2.087)
Observations                 97               98               98               98                98
Adj. R²                      0.63             0.63             0.60             0.62              0.61

Notes: t statistics in parentheses; White-corrected standard deviations.
(a) All estimates include a constant as well as BUREAU and RULE as additional control variables (not shown).
Human Rights Guide. This guide contains five indicators related to media freedom, all ranging from 0 (least freedom) to 3 (most freedom). The indicators measure political censorship of the press, independence of newspapers, independence of book publishing, independence of radio and television networks and the possibility to teach ideas and receive information, all for the year 1991. Table 5.5 shows the results of the basic specification for the overall indicator of press freedom as well as for individual subcomponents. The indicator of overall press freedom is the sum of the sub-indicators; it is scored from 0 (no press freedom) to 15 (highest press freedom); therefore we expect a positive relation with corruption. The overall indicator has the expected sign and is highly significant. The results for the individual subcomponents give more detailed information on the relationship. They are all significant; however, the sizes of the coefficients indicate that lack of censorship may be the strongest curb on corruption. Interestingly, the relationship between independent newspapers and broadcasting with low corruption is less strong. Independence of newspapers is just significant at the 10 percent level. In economic terms, the effect of press
freedom on corruption is somewhat smaller than the one found earlier: here a one standard deviation improvement in overall press freedom reduces corruption by 0.25 points. Overall the results indicate that the relationship between press freedom and corruption is robust to alternative measurements.
6. Panel data evidence

This section presents some panel data evidence using a 5-year panel with the Humana press indicator (which is available for three individual years) and a yearly panel for the Freedom House press indicator (1996–1999). Both panels use the ICRG corruption index, the only one that offers time series. It is worth noting that by and large the corruption data vary more across countries than over time, which is probably due to the fact that changes in corruption levels within a country are difficult to detect and may take a long time. The within-country variation is only 13 percent of the total variation. Therefore, much of the research on determinants of corruption has focused on the cross-section. In Table 5.6 we present results from both panels. Columns (1) and (2) use the Humana index and Columns (3) and (4) the Freedom House press indicator.

Table 5.6 Panel data evidence (a)

Dep. variable:            (1) Corr-ICRG       (2) Corr-ICRG       (3) Corr-ICRG       (4) Difference in log
                          5-year averages     5-year averages     yearly panel        Corr-ICRG, yearly
                          1982–1995           1982–1995           1996–1998           panel 1996–1998
PRESS (b) (Humana)        0.028 (2.333)       0.027 (2.421)
PRESS (c)                                                         −0.004 (−3.906)     −0.092 (−1.909)
LOG (GDP)                                     −0.945 (−2.747)
Lagged LOG (Corr-ICRG)                                            0.861 (38.029)
Fixed Effects             Yes                 Yes                 No                  No
Observations              155                 150                 497                 491
Number of countries       70                  70                  125                 125
Adj. R²                   0.84                0.86                0.85                0.005

Notes: t statistics in parentheses; White-corrected standard errors.
(a) Columns (1) and (2) include BUREAU and RULE and Columns (3) and (4) a constant (not shown).
(b) Measured at the beginning of the period (i.e., 82, 86 and 92). Expected sign for press indicator from Humana is positive.
(c) Using the 1 year lagged value in Column (3) and the log difference in Column (4). Expected sign for press indicator from Freedom House is negative.
Column (1) includes country fixed effects in the base specification. We find that the coefficients of press freedom are significant and of the expected sign (positive for the Humana Press index and negative for the Freedom House indicator) though smaller than in the cross-section estimates. The Humana indicator is measured (approximately) at the beginning of each 5-year period while the corruption and the control variables are 5-year averages. This helps to mitigate the concerns about causality. Column (2) includes GDP per capita as a further control variable. The results are not altered. Columns (3) and (4) relate to the yearly panel for the mid to end 1990s. For this period our standard control variables (BUREAU and RULE) were not available, thus we run more parsimonious regressions. Regression (3) finds that the 1-year lagged panel indicator of press freedom is significantly associated with lower corruption (expected sign is negative for this indicator). The coefficient is much smaller, though, than the one found in the cross-section estimates. The coefficient of the 1-year lagged endogenous variable shows that corruption is highly persistent. We also tested fixed and random effects with the lagged press indicators only (i.e., excluding the lagged endogenous variable since it would lead to biased results): press was significant in the random effects regression but not significant in the fixed effects regression. One way of addressing the problem of omitted variables that are country specific and vary little over time is to look at first differences. In this short period of time many of the fundamental determinants of corruption can be assumed to be time invariant. Thus, the last regression (4) uses first (log) differences and again finds that an increase in press freedom is associated with a decrease in corruption.
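The reasoning behind first differences can be illustrated with a small sketch: if corruption in country c and year t equals a time-invariant country effect plus a slope times press freedom, differencing consecutive years removes the country effect and leaves the slope identifiable from within-country changes alone. The panel below is hypothetical and noise-free, the sketch differences levels rather than logs, and it fits the differenced regression without a constant, so it is a simplification of regression (4) rather than a replication.

```python
def first_difference_slope(panel):
    """Pooled OLS slope (no constant) of delta-y on delta-x.

    `panel` maps country -> list of (x, y) pairs ordered by year.
    """
    num = den = 0.0
    for series in panel.values():
        for (x0, y0), (x1, y1) in zip(series, series[1:]):
            dx, dy = x1 - x0, y1 - y0
            num += dx * dy
            den += dx * dx
    return num / den


# Hypothetical noise-free panel: y = alpha_country + beta * x
beta = -0.02
alphas = {"A": 5.0, "B": 3.0}                # unobserved country effects
press_by_country = {"A": [10, 12, 15], "B": [60, 55, 58]}
panel = {
    c: [(x, alphas[c] + beta * x) for x in press_by_country[c]]
    for c in alphas
}
print(first_difference_slope(panel))
```

The country effects differ widely between A and B, yet the differenced estimate recovers the common slope, because each country is only compared with itself.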
7. Conclusions
The empirical evidence presented in this paper shows a strong association between the level of press freedom and the level of corruption across countries. The results suggest that an independent press may represent an important check against corruption. This result is robust to the specifications estimated and to alternative measures of corruption and press freedom. Theoretical considerations, estimations with various instruments, and panel data evidence all suggest that the causation runs from more press freedom to less corruption. All in all, the results indicate that press freedom might be an important check on corruption. How much improvement in corruption can countries expect from more press freedom? The estimated coefficients range from −0.015 to −0.037. This suggests that an improvement of one standard deviation in press freedom could reduce corruption by between 0.4 and 0.9 points (on the scale from 0 to 6). Alternatively, one could ask how much an improvement in press freedom to the level of Norway (the country with the freest press) would affect the corruption index for countries with particularly repressive practices. Even using the lower bound of the estimates we find that the effect might be substantial. By way of illustration, in the case of Indonesia it would mean a
A free press is bad news for corruption
reduction in corruption to the level of Singapore, for the Russian Federation it would imply reaching the corruption level of the Slovak Republic, and for Nigeria the level of Belgium.
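The one-standard-deviation calculation in the conclusions can be retraced with a few lines of arithmetic. The sketch below assumes that 24.51 is the standard deviation of the Freedom House PRESS index reported in Table 5A.1; the function name is ours and is purely illustrative.

```python
# Range of estimated press-freedom coefficients quoted in the conclusions.
COEF_LOW, COEF_HIGH = -0.015, -0.037
# Assumed: standard deviation of the Freedom House PRESS index (Table 5A.1).
PRESS_SD = 24.51

def corruption_change(coef, delta_press):
    """Predicted change in the corruption index for a given change in PRESS."""
    return coef * delta_press

# On the Freedom House scale 0 is the freest press, so an improvement in
# press freedom is a *decrease* of one standard deviation in PRESS.
low = corruption_change(COEF_LOW, -PRESS_SD)
high = corruption_change(COEF_HIGH, -PRESS_SD)
print(round(low, 2), round(high, 2))  # prints 0.37 0.91
```

Rounded to one decimal, these are the 0.4 and 0.9 points cited in the text. The change is positive because a higher ICRG score means less corruption.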
Acknowledgements We thank two referees, Boris Zürcher, Peter Kugler, George Sheldon, and seminar participants at the University of Basel, the University of Saarland, the University of St. Gallen and the University of Fribourg for helpful comments. Financial support from the WWZ Förderverein is gratefully acknowledged.
Appendix A. Data definitions and sources

PRESS: Press freedom in 1997 for the cross-section estimates and yearly data from 1996–1999 in the panel estimates (Freedom House) (0 = highest, 100 = lowest level of press freedom)
PRESS (Humana): Press freedom in 1982, 1986 and 1991 (Humana, 1992) (0 = lowest, 15 = highest level of press freedom)
Corr-ICRG: Average corruption 1994–1998 (International Country Risk Guide) (1 = highest, 6 = lowest level of corruption)
BUREAU: Average quality of the bureaucracy 1982–1995 (International Country Risk Guide) (1 = lowest, 6 = highest level of bureaucratic quality)
RULE: Average rule of law 1982–1995 (International Country Risk Guide) (1 = lowest, 6 = highest level of rule of law)
HUMCAP: Average educational attainment (Nehru et al., 1995)
TRADE: Average trade ((Export + Import)/GDP) 1970–1992 (The World Bank, World Development Indicators)
BLACK: Black market premium on foreign exchange 1974–1989 (Pick’s Currency Yearbook)
ETHNIC: Ethnolinguistic fractionalization (Easterly and Levine, 1997)
POLRIGHTS: Average political rights 1974–1989 (Gastil, 1989)
GDP: GDP per capita in PPP terms, 1995 (The World Bank, World Development Indicators)
DEMOCRACY: Level of democracy (Jaggers and Gurr, 1996) (0 = lowest, 10 = highest level of democracy)
Frac Prot: Fraction of Protestants in a country (La Porta et al., 1998)
Frac Eur: Fraction of European languages spoken in a country (Hall and Jones, 1999)
Corr-IMD: Corruption indicator in 1996 (International Institute for Management Development)
Corr-WB: Corruption indicator in 1997 (Brunetti et al., 1998) (1 = highest, 6 = lowest level of corruption)
Corr-TI: Aggregate corruption indicator for 1998 (Transparency International) (0 = highest, 10 = lowest level of corruption)

Table 5A.1 Descriptive statistics

Variable          Mean    Median  Maximum  Minimum  S.D.     Observations
PRESS             46.23   47.00   100.00   5.00     24.51    145
BUREAU            3.20    3.00    6.00     0.89     1.52     129
RULE              3.33    3.04    6.00     0.90     1.49     129
Log(GDP)          8.39    8.34    10.54    6.11     1.12     122
HUMCAP            5.82    5.73    12.58    0.57     2.71     82
TRADE             71.98   61.90   328.12   12.86    42.31    137
BLACK             57.42   9.21    732.44   −0.27    121.75   96
ETHNIC            0.42    0.43    0.92     0.00     0.30     99
Corr-ICRG         3.37    3.00    6.00     0.80     1.22     128
PRESS (Humana)    8.92    10.00   15.00    0.00     4.89     99

Table 5A.2 Correlation matrix

            PRESS   BUREAU  RULE    Log(GDP)  HUMCAP  TRADE   BLACK   ETHNIC  Corr-ICRG
PRESS       1.00
BUREAU      −0.63   1.00
RULE        −0.73   0.87    1.00
Log(GDP)    −0.69   0.80    0.83    1.00
HUMCAP      −0.60   0.69    0.64    0.79      1.00
TRADE       −0.01   0.20    0.20    0.22      0.14    1.00
BLACK       0.34    −0.32   −0.39   −0.45     −0.41   −0.11   1.00
ETHNIC      0.47    −0.36   −0.41   −0.60     −0.47   −0.11   0.41    1.00
Corr-ICRG   −0.74   0.79    0.83    0.75      0.58    0.20    −0.28   −0.43   1.00
Notes
1 The costs of corruption have been a topic of academic debate for a long time (see, e.g., Rose-Ackerman, 1975). Case studies such as Klitgaard (1988) and De Soto (1989) helped to popularize the notion that corruption is a major obstacle to growth in developing countries. This notion has been confirmed in a series of recent empirical cross-country studies. Mauro (1995) showed that corruption negatively affects rates of investment. Knack and Keefer (1995) and Brunetti et al. (1998) find that corrupt institutions lower growth through lower accumulation of resources as well as misallocation of resources. Mauro (1996) shows that corruption distorts the allocation of public expenditures and Johnson et al. (1998) find that countries with more corruption have a larger informal sector. For surveys of research on the economic effects of corruption see, e.g., Bardhan (1997) or Tanzi (1994).
2 Data sources, descriptive statistics and correlations are in Appendix A.
3 Source of quotes: Freedom House (1997).
4 Restrictions of press freedom come in many guises. In many countries, the press is regulated through an array of laws which claim to protect national security, personal privacy or even ‘the truth’. To take a few examples: In 13 Latin American countries journalists can be penalized for insulting or violating the privacy of officials. Exposing illegal actions by a government official may thus be charged against the journalist rather than the official. In Uganda, a bill could impose 5 years of imprisonment and large fines for a journalist publishing (unspecified) ‘false’ or ‘aggravating’ information. In Colombia a regulatory commission has the authority to take television news programs off the air to protect the nation’s ‘honor’. In Bolivia and Botswana laws are planned that impose ethical standards for the press to regulate alleged ‘abuse’. In Malawi a law denies journalists the right to protect their news sources. The press in Gambia is hit with high start-up fees and extreme penalties for libel are considered in Brazil and Congo. In 1996, 46 journalists are known to have been murdered on the job and 372 arrested. See Freedom House (1997) for a more detailed discussion.
5 See Knack and Keefer (1995).
6 Sources and precise description of data for control variables are in Appendix A. To reduce endogeneity problems the timing of independent variables is chosen such that they are long averages for a period (10–15 years) previous to the corruption measure.
7 In OLS a good instrument is highly correlated with the instrumented variable but orthogonal to the residual. On this account the political rights instruments could be problematic since there may be multiple interactions between various features of a political system and bureaucratic outcomes. However, it is not obvious that more autocratic countries should a priori be more corrupt and vice versa. Corruption has been known to flourish in democracies as well as in autocracies. Ades and Di Tella (1999), for instance, fail to find a significant positive relationship between political rights and low corruption and explain this finding with the fact that a number of countries such as Iraq, Hong Kong and Singapore combine low corruption with low levels of political rights. By the same token, democracy does not automatically reduce corruption but only through its effect on improving the level of external and internal controls on the bureaucracy, e.g., through the control exercised through a free press. In this view it seems reasonable to assume that our instrument is uncorrelated with the error term.
8 Tests for nonlinear relationships did not produce better fits of the data.
9 Alternatively, an improvement of press freedom by one standard deviation decreases corruption by about 0.5 points.
10 The finding on inflation is consistent with Braun and Di Tella (2000).
11 This result is robust to a higher or lower cut off for the definition of an authoritarian regime as well as using the democracy measure by Jaggers and Gurr (1996) to define this cut off.
12 A key feature of Western European expansion around the world was the value attributed to freedom of speech, which of course implies press freedom. Since the expansion of European values can be taken as exogenous, indicators of European influence can serve as instruments. We follow Hall and Jones (1999) and use the fraction of Western European languages (English, French, German, Portuguese, and Spanish) spoken as a first language as a proxy for European influence. A second measure of a particular kind of European values is given by the fraction of Protestants in the population. The other instruments used in Hall and Jones (1999) are the distance from the equator and the predicted trade share of the economy, based on a gravity model. However, these variables are not necessarily natural measures of European influence and indeed their correlation with press freedom is low. The same is true for an indicator of French legal origin.
13 Hausman’s test of overidentifying restrictions could not reject the exogeneity of the instruments at conventional levels (P value > 0.4).
14 See Brunetti et al. (1998) for a more detailed presentation of the survey.
References
Ades, A., Di Tella, R., 1999. Rents, competition and corruption. American Economic Review 89, 982–993.
Bardhan, P., 1997. Corruption and development. Journal of Economic Literature 25, 1320–1346.
Besley, T., McLaren, J., 1993. Taxes and bribery: the role of wage incentives. Economic Journal 103, 119–141.
Braun, M., Di Tella, R., 2000. Inflation and Corruption, mimeo, Harvard University.
Brunetti, A., Kisunko, G., Weder, B., 1998. Credibility of rules and economic growth: evidence from a worldwide survey of the private sector. World Bank Economic Review 12, 353–384.
De Soto, H., 1989. The Other Path. Harper & Row, New York.
Easterly, W., Levine, R., 1997. Africa’s growth tragedy: policies and ethnic divisions. Quarterly Journal of Economics 112, 1203–1250.
Flatters, F., MacLeod, B., 1995. Administrative corruption and taxation. International Tax and Public Finance 2, 397–417.
Freedom House, 1997. Press Freedom World Wide, www.freedomhouse.org/Press/Press97/ratings97.
Gastil, R., 1989. Freedom in the World: Political Rights and Civil Liberties 1988–1989. Freedom House, Lanham.
Greene, W., 1997. Econometric Analysis, 3rd Edition. Prentice-Hall International, London.
Hall, R., Jones, C., 1999. Why do some countries produce so much more output per worker than others? Quarterly Journal of Economics 110, 495–525.
Hindriks, J., Keen, M., Muthoo, A., 1999. Corruption, extortion and evasion. Journal of Public Economics 74, 395–430.
Humana, C., 1992. World Human Rights Guide, 3rd Edition. Oxford University Press, New York, Oxford.
Jaggers, K., Gurr, T., 1996. Polity III: Regime Type and Political Authority, 1800–1994, mimeo, Inter-University Consortium for Political and Social Research, Ann Arbor.
Johnson, S., Kaufmann, D., Zoido-Lobaton, P., 1998. Regulatory discretion and the unofficial economy. American Economic Review Papers and Proceedings 88, 387–392.
Kaufmann, D., 1997. Corruption: the facts. Foreign Policy 107, 114–131.
Klitgaard, R., 1988. Controlling Corruption. University of California Press, Berkeley, CA.
Knack, S., Keefer, P., 1995. Institutions and economic performance: cross-country tests using alternative institutional measures. Economics and Politics 7, 207–227.
La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R., 1998. The Quality of Government. NBER Working Paper 6727. National Bureau of Economic Research, Cambridge, MA.
Lee, R., 1986. Bureaucratic corruption in Asia: the problem of incongruence between legal norms and folk norms. In: Carino, A. (Ed.), Bureaucratic Corruption in Asia: Causes, Consequences, and Controls. NMC Press, Quezon City.
Mauro, P., 1995. Corruption and Growth. Quarterly Journal of Economics 110, 681–712.
Mauro, P., 1996. The effects of corruption on growth, investment and government expenditure. Journal of Public Economics 69, 263–279.
Nehru, V., Swanson, E., Dubey, A., 1995. A new database on human capital stock: sources, methodology and results. Journal of Development Economics 46, 379–401.
Rahman, R., 1986. Legal and administrative measures against bureaucratic corruption in Asia. In: Carino, A. (Ed.), Bureaucratic Corruption in Asia: Causes, Consequences, and Controls. NMC Press, Quezon City.
Rauch, J., Evans, P., 2000. Bureaucratic structures and economic performance in less developed countries. Journal of Public Economics 75, 49–71.
Rose-Ackerman, S., 1975. The economics of corruption. Journal of Political Economy 83, 187–203.
Sachs, J., Warner, A., 1995. Economic reform and the process of global integration. Brookings Papers on Economic Activity 1, 1–118.
Shleifer, A., Vishny, R., 1993. Corruption. Quarterly Journal of Economics 108, 599–617.
Tanzi, V., 1994. Corruption, Governmental Activities and Markets. IMF Working Paper 94/99. International Monetary Fund, Washington, DC.
Van Rijckeghem, C., Weder, B., 2001. Corruption and the rate of temptation: do low wages in the civil service cause corruption. Journal of Development Economics 65, 307–331.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
6 The market for news
Sendhil Mullainathan and Andrei Shleifer*
Several recent books have accused mainline media outlets of reporting news with a heavy political bias. Bernard Goldberg (2002) and Ann Coulter (2003) argue that the bias is on the left, and provide numerous illustrations of their argument, while Eric Alterman (2003) and Al Franken (2003) argue that the bias is on the right, with equally numerous illustrations. In principle, media bias can come from the supply side, and reflect the preferences of journalists (David Baron, 2004), editors, or owners (Besley and Andrea Prat, 2004; Simeon Djankov et al., 2003). Alternatively, it can come from the demand side, and reflect the news providers’ profit-maximizing choice to cater to the preferences of the consumers. We examine, theoretically, the determinants of media accuracy in such a demand-side model, focusing specifically on the effects of reader beliefs, reader heterogeneity, and competition on media bias. We argue that the analysis of media accuracy relies crucially on how one conceptualizes the demand for news. In the traditional conception of the demand for news, consumers read, watch, and listen to the news in order to get information. The quality of this information is its accuracy. The more accurate the news, the more valuable is its source to the consumer. Pressure from audiences and rivals forces news outlets to seek and deliver more accurate information, just as market forces motivate auto-makers to produce better cars.1 This conception of the news as a source of pure information is dramatically different from that of noneconomists studying the media. According to these scholars, private media want to sell newspapers and television programs, as well as advertising space. To do that, they provide a great deal of pure entertainment. But even with news, audiences want their sources not only to inform but also to explain, interpret, persuade, and entertain. 
To meet this demand, media outlets do not provide unadulterated information, but rather tell stories that hang together and have a point of view, what is referred to in the business as “the narrative imperative.”2 In this view, news provision can be analyzed in the same way as entertainment broadcasting.3 In this paper, we examine these two conceptions of what the consumers want and what the media deliver, and evaluate media accuracy under different scenarios. We show, in particular, that these two conceptions have
radically different implications for the accuracy of news in the competitive media, and more specifically on the question of which news issues will be reported more accurately. Our model of rational readers seeking information shows that, indeed, consistent with economists’ priors, media reporting is unbiased. We compare this to a specific behavioral model (of which the rational consumers are a special case), which relies on two assumptions, one about reader preferences and one about the technology of delivering news.4
We assume that readers hold biased beliefs, which might come from their general knowledge and education, from previous news, from prejudices and stereotypes, or from the views of politicians or political parties they trust. With respect to preferences, we assume that readers prefer to hear or read news that is more consistent with their beliefs. Such biased readers might believe, for example, that corporate executives are cheats and crooks, and these readers prefer news about their indictments to news about their accomplishments. They might think that China is up to no good with respect to the United States, and appreciate stories about Chinese spies. Some readers might like President Bill Clinton and prefer to read about partisan Republicans persecuting the hard-working president; others might dislike Clinton and look for stories explaining, in salacious detail, the impeachability of his offenses.
The idea that people appreciate, find credible, enjoy, and remember stories consistent with their beliefs is standard in the communications literature (Graber, 1984; Severin and Tankard, 1992). Basic research in psychology strongly supports it. Research on memory suggests that people tend to remember information consistent with their beliefs better than information inconsistent with their beliefs (Frederic Bartlett, 1932).
Research on information processing shows people find data inconsistent with their beliefs to be less credible and update less as a result (Charles Lord et al., 1979; John Zaller, 1992; Matthew Rabin and Joel Schrag, 1999). According to Graber (1984, p. 130), “stories about economic failures in third world countries were processed more readily than stories about economic successes.” People seek information that confirms their beliefs (Josh Klayman, 1995). When people categorize, they tend to ignore category-inconsistent information unless it is large enough to induce category change (Susan Fiske, 1995; Mullainathan, 2002). Severin and Tankard (1992) see the demand for cognitive consistency as crucially shaping which news people listen to, and which they ignore. Our second assumption is that newspapers can slant the presentation of the news to cater to the preferences of their audiences. The term “slanting” was introduced by Hayakawa (1940), and defined as “the process of selecting details that are favorable or unfavorable to the subject being described.” Slanting is easily illustrated in a simple example. Suppose that the Bureau of Labor Statistics (BLS) releases data that show the rate of unemployment rising from 6.1 percent to 6.3 percent. What are the different ways a paper can report this number? One is a single sentence report that simply presents the above fact. But there are alternatives. Consider just two.
(a) Headline: Recession Fears Grow. New data suggest the economy is slipping into a recession. The BLS reports that the number of unemployed grew by 200,000 in the last quarter, reaching 6.3 percent. John Kenneth Galbraith, the distinguished Harvard economist, sees this as an ominous sign of the failure of the administration’s policies. “Not since Herbert Hoover has a president ignored economic realities so blatantly. This news is only the beginning of more to come,” he said. (Accompanying picture: a long line for unemployment benefits in Detroit, Michigan.)
(b) Headline: Turnaround in Sight. Is the economy poised for an imminent turnaround? Data from the BLS suggest that it might be. Newly released figures show unemployment inching up just 0.2 percent last quarter. Abbie Joseph Cohen, the chief stock market strategist at Goldman Sachs, sees the news as highly encouraging. “This is a good time to increase exposure to stocks,” she says, “both because of the strong underlying fundamentals and because the softness in the labor market bodes well for corporate profitability.” (Accompanying picture: smiling Abbie Joseph Cohen.)
Each of these stories could easily have been written by a major U.S. newspaper. In fact, stories like these, in light of public disclosure of identical facts, are written every day. Neither story says anything false, yet they give radically different impressions. Each cites an authority, without acknowledging that a comparably respectable authority might have exactly the opposite interpretation of the news. Each omits some aspect of the data: the first by neglecting to mention the starting point of the unemployment rate, the second by ignoring unemployment levels. Each uses a headline, and a picture, to persuade readers who do not focus on the details. Each, in other words, slants the news by not telling the whole truth, but the articles are slanted in opposite directions.5 Our model of the market for news combines the assumption of readers preferring stories consistent with their beliefs with the assumption that newspapers can slant stories toward specific beliefs. We examine two crucial aspects of this environment. First, we consider two alternative assumptions about the nature of competition: monopoly versus duopoly. Our model of media competition is analogous to a Hotelling model of product placement (Jean Tirole, 1988, ch. 7). Newspapers locate themselves in the product space through their reporting strategies (i.e., how they slant). Readers’ beliefs determine their “transportation” costs, since they face psychic costs of reading papers whose reporting does not cater to their beliefs. We ask whether competition by itself eliminates or reduces the slanting of news, as economists often argue. We show that the answer for biased readers is clearly no. Competition generally reduces newspaper prices, but does not reduce, and might even exaggerate, media bias. Second, we study heterogeneity of reader beliefs. What effect does such heterogeneity have on the nature of slanting and the overall accuracy in
media? What is the impact of competition on media accuracy when reader beliefs are heterogeneous, as in the case of beliefs about President Clinton? To answer this question, it is crucial to distinguish between an average reader, who reads one source of news, and a hypothetical conscientious reader, who reads multiple sources. In general, competition with heterogeneous readers increases the slanting by individual media sources. But with heterogeneous readers, the biases of individual media sources tend to offset each other, so the beliefs of the conscientious reader become more accurate than they are with homogeneous readers. Our central finding is that reader heterogeneity plays a more important role for accuracy in media than does competition.
At a broader level, this paper contributes to one of the central issues in economics, namely whether the presence of rational, profit-maximizing firms eliminates any effect of irrational participants on market “efficiency.” In the context of financial markets, Milton Friedman (1953) argued long ago that it does, and that rational arbitrageurs keep financial markets efficient. Subsequent research, however, has proved him wrong, both theoretically and empirically (Shleifer, 2000; Markus Brunnermeier and Stefan Nagel, 2004). One finding of this research is that, in some situations, such as stock market bubbles, it might pay profit-maximizing firms to pump up the tulips rather than eliminate irrationality (Brad DeLong et al., 1990). Subsequent research has considered the interaction between biased individuals and rational entrepreneurs in other contexts, such as the incitement of hatred (Glaeser, 2005), political competition (Kevin Murphy and Shleifer, 2004), and product design (Xavier Gabaix and Laibson, 2004). Here we ask a closely related question for the market for news: does competition among profit-maximizing news providers eliminate media bias? We find that the answer, as in both financial and political markets, is no.
Powerful forces motivate news providers to slant and increase bias rather than clear up confusion. The crucial determinant of accuracy is not competition, per se, but consumer heterogeneity.
Model setup
Readers are interested in some underlying variable $t$, such as the state of the economy, which is distributed $N(0, v_t)$. Let $p = 1/v_t$ denote the precision. Readers hold a belief about $t$ that may be biased; beliefs are distributed $N(b, v_t)$. Thus, readers are potentially biased about the expected value of $t$, but have the correct variance. Newspapers are in the business of reporting news about $t$. They receive some data $d = t + \varepsilon$, where $\varepsilon \sim N(0, v_\varepsilon)$. In the example from the introduction, these data might be an unemployment rate release. We assume that the papers then report the data with a slant $s$, so the reported news is $n = d + s$. For most of the paper, the exact technology of slanting is not important, but in Section V we study a specific one.
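The setup can be made concrete with a short simulation. This is a minimal sketch, not part of the model itself: the variances, the constant slant and the function name `draw_news` are hypothetical choices for illustration. It draws the state, the data and the reported news, and confirms that the data have variance equal to the sum of the two variances.

```python
import numpy as np

def draw_news(v_t, v_eps, slant, size, seed=0):
    """Draw (t, d, n): state t ~ N(0, v_t), data d = t + eps, news n = d + slant(d)."""
    rng = np.random.default_rng(seed)
    t = rng.normal(0.0, np.sqrt(v_t), size)       # underlying state, e.g. the economy
    eps = rng.normal(0.0, np.sqrt(v_eps), size)   # noise in the data the paper receives
    d = t + eps
    n = d + slant(d)                              # reported news = data plus slant
    return t, d, n

# Hypothetical slanting rule: shift every report upward by a constant 0.5.
t, d, n = draw_news(v_t=1.0, v_eps=0.25,
                    slant=lambda data: np.full_like(data, 0.5),
                    size=100_000)
print("variance of d:", d.var())
print("average slant:", (n - d).mean())
```

Passing the slant as a function of the data mirrors the strategy $s(d)$ that the newspaper announces, so other rules can be tried without changing the rest of the sketch.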
Reader utility
Suppose readers are rational and unbiased. All they want is information. They dislike slanting because it is costly both in effort and the time it takes to read slanted news and figure out the “truth.” In the BLS example, the report of the first newspaper does not tell the reader how much the unemployment rate changed, while that of the second newspaper does not contain the unemployment rate. To get a full picture, the reader needs more information. We assume that a rational reader’s utility is decreasing in the amount of slanting. So, if he reads a newspaper, his utility is:
$$U_r = \bar{u} - \chi s^2 - P \qquad (1)$$
where $P$ is the paper’s price. If he does not read the newspaper, he receives utility 0. Biased readers, on the other hand, get disutility from reading news inconsistent with their beliefs. We model consistency as the distance between the news and the reader’s beliefs, $b$, measured as $(n - b)^2$. In the BLS example, a reader optimistic about the economy experiences disutility when reading stories that suggest a recession. At the same time, even biased readers dislike blatant and extreme slanting, at least in the long run. Holding constant the consistency with beliefs, they prefer less slanted news.6 So, if he reads the newspaper, the overall utility of a biased reader is:
$$U_b = \bar{u} - \chi s^2 - \phi (n - b)^2 - P \qquad (2)$$
where $\phi > 0$ calibrates his preference for hearing confirming news.
Newspaper strategy
Before seeing the data $d$, a newspaper announces its slanting strategy $s(d)$ and the price $P$ it charges. Potential readers buy the paper if the price $P$ is lower than the expected utility associated with reading the paper, $E_d[U(s(d))]$. To form expected utility, expectations are taken over $d$ and are assumed to be the true expectations ($d \sim N(t, v_d)$) rather than the biased ones. This approach crudely captures the idea that this is a long-run game. Readers get a general sense of how much pleasure the paper provides them and make their purchasing decisions accordingly. It then makes more sense to think of expected utility using the empirical distributions. Practically, in the model both assumptions about expectations produce the same results. Once readers decide whether to buy the paper, the paper observes its signal $d$ and reports $n = d + s(d)$. Readers read the news and receive their utility. Timing of the full game is as follows:
(a) The newspaper announces a strategy $s(d)$ for how to report the news. When there are two papers, both announce strategies simultaneously.
(b) Price $P$ is announced. When there are two papers, both announce prices simultaneously, after the other paper has revealed its strategy.
(c) Individuals decide whether to buy the paper based on average utility associated with its strategy $s(d)$ and price $P$.
(d) Newspaper receives data $d$ and reports news $d + s(d)$. If there are two papers, they receive the same data $d$ and report $d + s_j(d)$ where $j = 1, 2$.
(e) If individuals buy the paper, they read the news and receive utility.
Cases considered
We consider two different distributions of reader beliefs: homogeneous and heterogeneous. Homogeneity means that all readers hold the same beliefs $b$ with precision $p$. For example, all or nearly all readers in the United States might believe that the Russians are corrupt or that the French are anti-American. Heterogeneity means that there is a distribution of reader beliefs. Such heterogeneity could come from political ideology. For example, opinions about U.S. presidents often divide along party lines. We assume that heterogeneous beliefs are distributed uniformly between $b_1$ and $b_2$ where $b_1 < b_2$ and $b_2 > 0$. Readers in this uniform distribution are indexed by $i \in [1, 2]$ so that reader $i$ holds belief $b_i$. All readers hold their beliefs with precision $p$. We denote by $\bar{b}$ the average of $b_1$ and $b_2$. We also denote reader $i$’s utility function as $u_i(d)$ or $u_{b_i}(d)$, depending on context. The homogeneous and heterogeneous cases are designed to capture two different types of issues: ones on which there is consensus in the population and ones where there is substantial disagreement.
We also examine two cases of industry structure. In the first case, there is a single monopolistic newspaper. In the second, there are two newspapers, indexed by $j = 1, 2$, each seeing the same data $d$. For a monopolist, $s^*_{hom}$ and $s^*_{het}$ denote the optimal slanting strategy for the homogeneous and heterogeneous case. Similarly, $P^*_{hom}$ and $P^*_{het}$ denote optimal price in these cases. For duopolists, $s^*_{j,hom}$ and $s^*_{j,het}$ denote the optimal strategy of paper $j = 1, 2$ in the homogeneous and heterogeneous cases, respectively. Similarly, $P^*_{j,hom}$ and $P^*_{j,het}$ denote each duopolist’s optimal price in these two cases.
This formalism of industry structure is similar in spirit to a Hotelling model. Readers’ beliefs resemble consumers’ preferred locations. Their dislike of inconsistent news resembles transportation costs. Firms’ choice of a slanting rule resembles their choice of location. In this context, our utility function implies quadratic transportation costs and our distribution of reader beliefs in the heterogeneous case corresponds to a uniform distribution of consumers. Consequently, many of our proofs resemble the proofs for the Hotelling models in this case (Claude d’Aspremont et al., 1979).7
Defining bias
We are interested in the extent of newspaper bias in the market. We measure this by the average bias of the newspapers in the market, weighted by their market share. In the homogeneous case, where there is only one kind of reader, we simply define bias as
$$ARB_{hom} = E_d[(n - d)^2] \qquad (3)$$
where $n$ is the news read by these readers. So bias is defined as the average amount by which the news read deviates from the data for the average reader. In the heterogeneous case, let $n_i$ be the news read by reader $i \in [1, 2]$. Bias is then defined as:
$$ARB_{het} = \int_i E_d[(n_i - d)^2]. \qquad (4)$$
This measures the average bias that readers encounter.
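Both bias measures can be approximated by simulation. The sketch below rests on assumptions that are ours: data are drawn as d ~ N(0, v_d); the slanting rule moves each report a fraction w of the way from the data toward a belief b; and, purely for illustration, each heterogeneous reader reads news slanted toward his own belief. The function names are hypothetical.

```python
import numpy as np

def arb_hom(slant, v_d, size=200_000, seed=0):
    """Monte Carlo estimate of E_d[(n - d)^2] when n = d + slant(d)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(0.0, np.sqrt(v_d), size)   # assumed distribution of the data
    return float(np.mean(slant(d) ** 2))      # (n - d)^2 is the squared slant

def arb_het(slant_for, beliefs, v_d, size=200_000, seed=0):
    """Average the homogeneous measure over a grid of reader beliefs b_i."""
    return float(np.mean([arb_hom(slant_for(b), v_d, size, seed) for b in beliefs]))

# Hypothetical rule: move the report a fraction w of the way from d toward b.
w = 0.5
def slant_toward(b):
    return lambda d: w * (b - d)

beliefs = np.linspace(-1.0, 1.0, 21)          # grid standing in for uniform beliefs
bias_hom = arb_hom(slant_toward(1.0), v_d=1.0)
bias_het = arb_het(slant_toward, beliefs, v_d=1.0)
print(bias_hom, bias_het)
```

Under these assumptions the homogeneous measure has the closed form w²(b² + v_d), which equals 0.5 for the values used here and gives a quick check on the simulation.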
Rational readers
When readers are rational, newspapers face only a disincentive to slant. The following proposition summarizes the outcomes for different cases.
PROPOSITION 1: Suppose readers are rational. Then, whether readers are homogeneous or heterogeneous, the monopolist does not slant and charges the same price:
$$s^*_{hom} = s^*_{het} = 0 \qquad (5)$$
and
$$P^*_{hom} = P^*_{het} = \bar{u}. \qquad (6)$$
In the duopolist case as well, papers do not slant and once again charge the same price:
$$s^*_{j,hom} = s^*_{j,het} = 0 \qquad (7)$$
and
$$P^*_{j,hom} = P^*_{j,het} = 0 \qquad (8)$$
for all $j$ on the equilibrium path. The only effect of competition is to lower prices.
The market for news
PROOF: See Appendix for all proofs.
Proposition 1 illustrates the normal logic of economists’ thinking about the media. When readers seek accuracy in news, newspapers pass on, without slant, the information they receive. Since perfect quality is achieved even without competition, the effect of competition is to reduce the price that readers pay. With both monopoly and duopoly, consumers get what they want and there is no media bias.8 In the rest of the paper, we focus on the case of biased readers.
Homogeneous biased readers
The following proposition summarizes the monopolist’s behavior with homogeneous readers. PROPOSITION 2: A monopolist facing a homogeneous audience chooses:

$$s^*_{hom}(d) = \frac{\phi}{\chi + \phi}(b - d) \tag{9}$$

$$P^*_{hom} = \bar{u} - \frac{\chi\phi}{\chi + \phi}[b^2 + \upsilon_d] \tag{10}$$

if $\bar{u} > [\chi\phi/(\chi + \phi)][b^2 + \upsilon_d]$. If not, there exists no slanting strategy that results in the news being read. Because the monopolist can capture all surplus through the price he charges, to maximize profits he merely maximizes expected utility. The news he reports is:

$$n = \frac{\phi}{\chi + \phi}\,b + \frac{\chi}{\chi + \phi}\,d. \tag{11}$$

The reported news is a convex combination of bias and data, with weights given by utility parameters. In this case, we say the monopolist “slants toward b.” Since this linear slanting strategy will reappear throughout the paper, we define:

$$s_B(d) \equiv \frac{\phi}{\chi + \phi}(B - d). \tag{12}$$

With this notation, the proposition above can be rewritten as $s^*_{hom}(d) = s_b(d)$. The monopolist chooses this linear form because expected utility functions are separable in the value of d. The monopolist maximizes utility for every given value of d, which leads him to slant toward a biased reader’s beliefs.9
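The pointwise maximization behind Proposition 2 can be illustrated numerically. The sketch below assumes the biased reader's gross utility takes the form ū − χs² − φ(d + s − b)², as in the appendix, and compares the closed-form slant with a brute-force grid search for one illustrative choice of parameters. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def reader_utility(s, d, b, chi, phi, u_bar=0.0):
    """Gross utility of a biased reader from news n = d + s (price excluded)."""
    n = d + s
    return u_bar - chi * s**2 - phi * (n - b)**2


def slant(B, d, chi, phi):
    """Linear slanting rule s_B(d) = phi / (chi + phi) * (B - d)."""
    return phi / (chi + phi) * (B - d)


def grid_argmax(d, b, chi, phi, lo=-5.0, hi=5.0, steps=100001):
    """Brute-force search for the utility-maximizing slant on an even grid."""
    best_s, best_u = lo, float("-inf")
    for k in range(steps):
        s = lo + (hi - lo) * k / (steps - 1)
        u = reader_utility(s, d, b, chi, phi)
        if u > best_u:
            best_s, best_u = s, u
    return best_s


# Illustrative (assumed) parameter values.
chi, phi, b, d = 2.0, 1.0, 1.5, -0.6

s_closed = slant(b, d, chi, phi)        # closed form from equation (9)
s_grid = grid_argmax(d, b, chi, phi)    # numerical maximizer
n = d + s_closed                        # reported news, equation (11)

print(s_closed, s_grid, n)
```

For these values the reported news coincides with the convex combination φ/(χ + φ)·b + χ/(χ + φ)·d, which is what equation (11) states.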
The following corollary derives comparative statics for the magnitude of slanting. COROLLARY 1: In the homogeneous reader case, slanting increases with the reader preference for hearing confirmatory news and declines with the cost of slanting:

$$\frac{\partial |s^*_{hom}(d)|}{\partial \phi} > 0 \tag{13}$$

$$\frac{\partial |s^*_{hom}(d)|}{\partial \chi} < 0. \tag{14}$$
Proposition 2 suggests a theory of spin. Suppose that a politician, or some other figure of authority, has a first mover advantage, i.e., can choose which data d gets presented to the media first. The papers slant the data toward reader beliefs, but by Proposition 2, d will have significant influence on what papers report as compared to their getting data from an unbiased source. For example, by preemptively disclosing that a Chinese spy has been found in Los Alamos, a politician can focus the discussion on the risk to U.S. security from Chinese espionage, rather than on the administrative incompetence in the Department of Energy. This effect becomes even more powerful in a more general model of sequential reporting. In this case, the initial spin may shape reader priors, which future papers face and consequently slant news toward. The initial spin would then be reinforced even by ideologically neutral papers. The condition $\bar{u} > [\chi\phi/(\chi + \phi)][b^2 + \upsilon_d]$ guarantees that this reader’s reservation utility ū is high enough that he prefers reading the optimally biased news to no news. From now on, we assume that this condition holds. ASSUMPTION 1: Reader utility from news is high enough that readers prefer the equilibrium news to no news:

$$\bar{u} > \frac{\chi\phi}{\chi + \phi}[b^2 + \upsilon_d]. \tag{15}$$
With this assumption in place, we now turn to competition. How does competition between two newspapers affect the results above? PROPOSITION 3: Suppose duopolists face a homogeneous audience. Then there is an equilibrium in which duopolists choose on the equilibrium path:

$$s^*_{j,hom}(d) = \frac{\phi}{\chi + \phi}(b - d) \tag{16}$$

and prices

$$P^*_{j,hom} = 0 \tag{17}$$

for both j = 1, 2. Readers are indifferent between the two papers. With a homogeneous audience, competition is Bertrand-like: it simply drives prices down to zero.10 Each duopolist’s slant is exactly equal to the monopolist’s slant, and they split the readers between them. The following corollary summarizes the impact of competition on bias in the homogeneous case.11 COROLLARY 2: For a homogeneous audience, both monopoly and duopoly produce the same amount of average reader bias:

$$ARB_{mon}(\upsilon_d) = ARB_{duo}(\upsilon_d). \tag{18}$$
Propositions 2 and 3 are the first critical results of the paper. They show that when readers have homogeneous biases, competition does not eliminate them—it only leads to price reductions. Both monopolists and duopolists cater to reader prejudices. These propositions basically say that one cannot expect accuracy—even in the competitive media—on issues where the readers share beliefs. One example of such uniformity might be foreign affairs, where there may be a great deal of commonality of views toward a particular foreign country, such as Russia, China, or France. Another example is law enforcement, where most readers might sympathize with efforts by the government to prosecute members of a disliked group (e.g., the Arabs or the rich).
Heterogeneous biased readers
What happens when readers differ in their beliefs? Newspapers must now decide which one of the heterogeneous reader groups is its target audience. PROPOSITION 4: Suppose a monopolist faces a heterogeneous audience with b̄ = 0. There exists a Cm, which depends on the parameters of the model, that determines the monopolist’s strategy. If b2 − b1 < Cm, the monopolist maximizes profits by choosing:

$$s^*_{het} = s_{\bar{b}}(d) = \frac{\phi}{\chi + \phi}(\bar{b} - d) = -\frac{\phi\, d}{\chi + \phi} \tag{19}$$

$$P^*_{het} = \bar{u} - \frac{\chi\phi}{\chi + \phi}\,\upsilon_d - \phi\, b_2^2. \tag{20}$$

If b2 − b1 > Cm the monopolist chooses not to cover the market, i.e., not all readers read the paper.
According to Proposition 4, the monopolist covers the market if the dispersion of reader beliefs is small enough. If beliefs are too far apart, readers on either extreme will not read the paper.12 Duopolists, in contrast, respond completely differently to heterogeneity. For tractability, we now consider only the situation where duopolists choose linear strategies. PROPOSITION 5: Suppose duopolists choose linear strategies of the form $s_B(d) = [\phi/(\chi + \phi)](B - d)$ and that b̄ = 0. Then there exists a constant

$$C_d = \sqrt{\frac{4(\phi + \chi)}{33\phi^2}\left[\bar{u} - \frac{\chi\phi}{\chi + \phi}\,\upsilon_d\right]} \tag{21}$$

such that if b2 < Cd duopolists choose:

$$s^*_{1,het}(d) = \frac{\phi}{\chi + \phi}\left(\frac{3}{2}\,b_1 - d\right) \tag{22}$$

$$s^*_{2,het}(d) = \frac{\phi}{\chi + \phi}\left(\frac{3}{2}\,b_2 - d\right) \tag{23}$$

$$P^*_{j,het} = \frac{6\phi^2}{\chi + \phi}\,b_2^2 \tag{24}$$

where we assume, without loss of generality, that firm 1 slants toward the left and firm 2 slants toward the right. All readers read the newspaper. Each duopolist positions himself as far away from the other as possible. The reported news in this case equals

$$n_j = d + s^*_{j,het}(d) = \frac{\phi}{\chi + \phi}\,\frac{3}{2}\,b_j + \frac{\chi}{\chi + \phi}\,d. \tag{25}$$
The reported news is a weighted average of the actual data d and 3/2 bj, where bj is the endpoint of the reader bias distribution. So duopolists are slanting news toward 3/2 bj, points that are more extreme than the most extreme readers in the population. This is analogous to the standard Hotelling result with uniform distributions and quadratic transportation costs (Tirole, 1988; d’Aspremont et al., 1979). As in the standard Hotelling model, the monopolist caters to both audiences unless they are too far apart, while duopolists maximally differentiate. But in the standard Hotelling model, firms are constrained to choose within the preference distribution. In our model, they can choose positions
outside the distribution of reader bias, and in equilibrium choose very extreme positions.13 To see why this occurs, consider a simple case where φ = 1, χ = 1, b2 = 1 and b1 = −1. With these parameters, suppose the firms locate at z1 ≤ z2.14 Equilibrium prices then equal (see the proof of Proposition 5):

$$P^*_1(z_1, z_2) = \Delta z\left(1 + \frac{\bar{z}}{3}\right) \tag{26}$$

$$P^*_2(z_1, z_2) = \Delta z\left(1 - \frac{\bar{z}}{3}\right) \tag{27}$$
where Δz = z2 − z1 and z̄ = (z1 + z2)/2. The more differentiated the duopolists (the greater is Δz), the higher the prices they can charge. Differentiation softens price competition because the temptation to undercut each other diminishes as the firms move farther away from the marginal consumer (who is located between them). Now consider firm 1’s choice of where to locate. When biasing toward z1, firm 1 captures all readers between −1 and x*(z1, z2) = z̄/3. Hence its profits equal $P^*_1(1 + \bar{z}/3)$. Differentiating with respect to z1 gives the first-order condition

$$\frac{\partial P^*_1}{\partial z_1}\,\bigl(1 + x^*(z_1, z_2)\bigr) + P^*_1\,\frac{\partial x^*}{\partial z_1} = 0 \tag{28}$$

$$\frac{\partial P^*_1}{\partial z_1}\left(1 + \frac{\bar{z}}{3}\right) + P^*_1\left(\frac{1}{6}\right) = 0. \tag{29}$$

Increasing z1 (that is, moving closer to the origin) has two effects on profits. The first is a price effect; there is a change in profits because changing position affects the equilibrium prices. The second is a market share effect; there is a change in profits because moving closer to the origin raises market share. Papers slant toward positions well beyond the extreme consumers because the price effect dominates the market share effect until firms are very far apart. Focusing on the symmetric case with z̄ = 0, the price effect is ∂P*1/∂z1 = Δz/6 − 1. The price effect is negative as long as Δz < 6, in other words, until the difference in firm locations is three times as high as the difference in most extreme readers (3(b2 − b1) = 6). The market share effect, on the other hand, is P*1 · ∂x*/∂z1 = P*1/6 = Δz/6, since x* = z̄/3 implies ∂x*/∂z1 = 1/6 and P*1 = Δz when z̄ = 0. These two effects offset each other to produce an optimum when Δz/6 − 1 + Δz/6 = 0 or Δz = 3. At the symmetric equilibrium, the optimum is reached at Δz = −2z1 = 3 or z1 = −3/2. The distance between the newspapers (z2 − z1 = 3) is greater than the distance between the most extreme readers (b2 − b1 = 2).
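The location choice in this example can be checked numerically. The following sketch takes the price formulas (26) and (27) and the marginal reader x* = z̄/3 as given for φ = χ = 1, b1 = −1, b2 = 1, fixes firm 2 at z2 = 3/2, and searches a grid for firm 1's profit-maximizing location. The function names and the grid bounds are assumptions made for illustration; the grid is restricted to locations for which the marginal reader stays inside [−1, 1].

```python
def prices(z1, z2):
    """Equilibrium prices (26)-(27) for phi = chi = 1, b1 = -1, b2 = 1."""
    dz = z2 - z1
    zbar = (z1 + z2) / 2
    return dz * (1 + zbar / 3), dz * (1 - zbar / 3)


def profit1(z1, z2):
    """Firm 1 profit: its price times the mass of readers on [-1, x*]."""
    p1, _ = prices(z1, z2)
    x_star = (z1 + z2) / 6          # marginal reader, x* = zbar / 3
    return p1 * (1 + x_star)


def best_response1(z2, lo=-6.0, hi=0.0, steps=60001):
    """Grid search for firm 1's best location given firm 2's location."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda z1: profit1(z1, z2))


z2 = 1.5
z1_star = best_response1(z2)
p1_star, p2_star = prices(-1.5, 1.5)

print(z1_star, z2 - z1_star, p1_star, p2_star)
```

The search returns a location close to z1 = −3/2, so the distance between the papers is about 3, wider than the spread of reader beliefs (b2 − b1 = 2), in line with the discussion above.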
In short, when choosing how to slant, duopolists maximally differentiate themselves.15 Practically, this means that news sources can be even more extreme than their most biased readers. One cannot, therefore, infer reader beliefs directly from media bias. Another point is worth noting:

$$E[|s^*_{j,het}(d)|] \geq E[|s^*_{het}(d)|]. \tag{30}$$
Duopolists always slant more than the monopolist when readers are heterogeneous. In this sense, competition tends to polarize the news. The following corollary summarizes the impact of competition on bias. COROLLARY 3: Suppose b2 − b1 < Cm. In the heterogeneous reader case, competition increases the bias of the average reader:

$$ARB_{mon,het}(\upsilon_d) < ARB_{duo,het}(\upsilon_d). \tag{31}$$
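A small numerical illustration of Corollary 3 follows. It is a sketch under stated assumptions: b̄ = 0, all readers buy a paper, each duopolist serves half the readers, and the data d takes the two values −1 and +1 with equal probability (a toy distribution with mean 0 and variance 1, not one used in the paper). Since n − d = s, average reader bias is the mean of s² across readers and data draws. The function names are hypothetical.

```python
def slant(B, d, chi, phi):
    """Linear slanting rule s_B(d) = phi / (chi + phi) * (B - d)."""
    return phi / (chi + phi) * (B - d)


def avg_reader_bias(B_values, d_values, chi, phi):
    """Mean of (n - d)^2 = s^2 over equally weighted targets B and draws d."""
    total = sum(slant(B, d, chi, phi) ** 2 for B in B_values for d in d_values)
    return total / (len(B_values) * len(d_values))


chi, phi, b2 = 1.0, 1.0, 1.0      # illustrative (assumed) parameters
d_values = [-1.0, 1.0]            # toy data: mean 0, variance 1

# Monopolist slants toward the mean belief, B = 0 (Proposition 4).
arb_mon = avg_reader_bias([0.0], d_values, chi, phi)

# Duopolists slant toward -3/2 * b2 and +3/2 * b2 (Proposition 5).
arb_duo = avg_reader_bias([-1.5 * b2, 1.5 * b2], d_values, chi, phi)

print(arb_mon, arb_duo)
```

With these values the monopoly figure is (φ/(χ + φ))²·υd = 0.25, while the duopoly figure adds the term (φ/(χ + φ))²·(9/4)·b2², giving 0.8125.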
Corollary 3 shows that, with heterogeneous readers, competition by itself polarizes readership and, if anything, raises the average reader bias. Entry of a left-wing newspaper or a TV station into a local market previously dominated by a moderate or slightly right-wing monopolist might cause this monopolist to shift his reporting to the right. Corollary 3 might shed light on the growing controversy in the United States about media bias. Several recent books have angrily attacked media outlets for having a left-wing bias (e.g., Goldberg, 2002; Coulter, 2003). Several equally angry books have responded that other media outlets have an even stronger right-wing bias (Alterman, 2003; Franken, 2003). We suspect that there is a grain of truth in all these books, and that the growing partisanship of alternative media sources is a response to the growth in competition, and market segmentation, in the media. Changes in media technology have led to significant entry, especially in television. If these media sources divide the market along ideological lines, we expect them to become more biased than they were in the regime of moderate competition. This is perhaps what the various commentators are recognizing. Corollary 3 may also have implications for the effects of entry of new media outlets on the nature of reporting. In a provocative recent study, Gentzkow and Shapiro (2004) examine the responses to a Gallup poll by residents of nine Muslim countries about such topics as the United States, terrorism, responsibility for 9/11, and so on. The authors document a striking pattern of factually inaccurate beliefs, but also suggest that the media have a strong effect on these beliefs. In particular, those who watch al-Jazeera (Arab television) are much more likely to hold factually false beliefs (as well as antiAmerican ones) than those watching CNN.16 In concluding their paper, Gentzkow and Shapiro appear to endorse recent proposals favoring an
expansion of Western news in the Arab world, because such news is likely to moderate opinions and beliefs. Our model suggests that caution is appropriate. The people who watch or listen to Western news are already sympathetic to its perspective and might already watch CNN, so they are unlikely to be strongly affected. Additional entry might cause al-Jazeera and similar networks to further differentiate their product by advancing yet more extreme views. The effect might be to radicalize, rather than moderate, their audience.
Reader heterogeneity and accuracy in media
Our results so far focus on how an average reader in the population is affected. We can also look at the impact of reporting on a conscientious reader, a hypothetical reader who reads all the news available but is too small to affect what is reported. The interesting insights arise in the duopoly case where the hypothetical conscientious reader reads both papers. Since both papers are reporting on the same event, the conscientious reader might in principle be able to use the two to undo the slanting. To understand this process we need a precise model of slanting.

Technology of slanting
Following Hayakawa (1940), we assume that newspapers slant by selectively omitting specific bits of news, i.e., not reporting the whole truth.17 To formalize this idea, suppose that, rather than simply receiving a composite d = t + ε, the newspaper receives a sequence of positive and negative “bits” or facts. In the example from the introduction, these facts could be the unemployment rate, the unemployment rate in the past, expert opinions, other relevant economic indicators, and so on. These bits or facts are modeled as a length L string f consisting of positive (+1), negative (−1), or nonexistent (∅) pieces of news. At each position, the probability of each of these values is a function of d, so now instead of simply seeing the composite d, the paper sees all the bits of facts that constitute it. The probability that the piece of news in position i, denoted $f_i$, is positive, negative, or nonexistent is given by the distribution function:

$$\Pr(f_i) = \begin{cases} +1 & \text{with probability } q\,g(d) \\ -1 & \text{with probability } q\,(1 - g(d)) \\ \emptyset & \text{with probability } (1 - q) \end{cases} \tag{32}$$
where g(·) is a continuous and increasing function that is bounded between 0 and 1, and 0 < q ≤ 1. With probability 1 − q, there is no news at position i. If there is news, it is positive with probability g(d ) and negative otherwise. Conditional on d, these probabilities are iid across different bits on a string. With multiple papers, we assume that they all see the same string f.
A newspaper that does not slant at all would simply report the string f without alteration. A reader who sees the string f can draw inferences from the number of +1’s and −1’s, which we define as $N_+(f)$ and $N_-(f)$, respectively. By the Law of Large Numbers:

$$\frac{N_+(f)}{N_-(f) + N_+(f)} = g(d) + \eta \to g(d) \tag{33}$$

where η is a noise term that converges to zero as the length of the string L → ∞. Consequently, for large L, the information the reader receives is well approximated by the case in which he simply observes d since $g^{-1}[N_+(f)/(N_-(f) + N_+(f))] \to d$. In this formalism, a newspaper slants the signal by selectively omitting positive or negative bits of information. To slant upward, for example, a newspaper drops negative bits. Instead of reporting +1, −1, −1, ∅, +1, −1, . . . it reports +1, ∅, ∅, ∅, +1, −1 . . ., for example. A paper that wishes to slant upward by s > 0 produces a string f′ by dropping enough negative bits to guarantee

$$g^{-1}\left(\frac{N_+(f')}{N_-(f') + N_+(f')}\right) \approx d + s. \tag{34}$$
Likewise, a paper that wishes to slant negatively by s < 0 simply drops enough positive bits. As L → ∞, the paper can choose to drop bits to approximate better and better any given slant s. For simplicity, assume that newspapers omit facts in fixed ways. To slant positively, a paper omits the lowest indexed negative bits until it approximates the desired fraction. To slant negatively, a paper omits the lowest indexed positive bits until it reaches the desired fraction. This assumption is simply one way of formalizing the idea that two papers wishing to slant in a particular direction do so similarly. Cross-checking By cross-checking the facts in the two newspapers, a conscientious reader may be able to reduce the effect of slanting. Suppose each paper receives string f, which can be thought of as implying data d = t + ε, and paper j reports string fj. There are now several cases. If the implied slants for both papers are positive and s1 > s2 > 0, then every fact that paper 1 reports, paper 2 also reports. Moreover, because paper 2 is slanting less, it reports some facts that paper 1 does not. Consequently, a conscientious reader would interpret the news as if she had read only paper 2. The case where 0 > s2 > s1 is similar. On the other hand, if the two papers are on opposite sides of the issue so that s1 > 0 > s2, paper 1 omits some negative details to slant upward and paper 2 omits some positive details to slant downward. The conscientious reader,
however, can cross-check both papers. Paper 1 reports the positive facts, which paper 2 omits, and paper 2 reports the negative facts, which paper 1 omits. By cross-checking, the conscientious reader gets all the facts, as if she were able to read an unslanted newspaper. Define xc(·) to be the cross-checking function:

$$xc(s_1, s_2) = \begin{cases} \min\{s_1, s_2\} & \text{if } s_1 > 0,\ s_2 > 0 \\ \max\{s_1, s_2\} & \text{if } s_1 < 0,\ s_2 < 0 \\ 0 & \text{otherwise.} \end{cases} \tag{35}$$

This function summarizes how the conscientious reader can cross-check the two papers.18 Define $n_c$ to be the news the conscientious reader is effectively exposed to:

$$n_c = \begin{cases} n & \text{if one newspaper} \\ d + xc(s_1, s_2) & \text{if two newspapers.} \end{cases} \tag{36}$$

We then define conscientious reader bias analogously to the average reader bias:

$$CRB = E_d[(n_c - d)^2]. \tag{37}$$
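The omission technology and the cross-checking function (35) can be sketched in a few lines. The example below is illustrative only: facts are stored as +1, −1 or None (for ∅), a paper slants by blanking the lowest-indexed bits of one sign, and the conscientious reader keeps any fact that at least one paper reports. The helper names and the use of an integer count of omitted bits in place of a slant magnitude are assumptions made for the sketch.

```python
def omit(bits, sign, k):
    """Replace the k lowest-indexed bits equal to `sign` with None (omitted)."""
    out, dropped = [], 0
    for bit in bits:
        if bit == sign and dropped < k:
            out.append(None)
            dropped += 1
        else:
            out.append(bit)
    return out


def cross_check(report1, report2):
    """Position by position, keep any fact that at least one paper reports."""
    return [a if a is not None else b for a, b in zip(report1, report2)]


def xc(s1, s2):
    """Cross-checking function from equation (35)."""
    if s1 > 0 and s2 > 0:
        return min(s1, s2)
    if s1 < 0 and s2 < 0:
        return max(s1, s2)
    return 0.0


f = [+1, -1, -1, None, +1, -1, +1, -1]   # string seen by every paper

paper1 = omit(f, -1, 3)   # slants upward strongly
paper2 = omit(f, -1, 1)   # slants upward mildly
paper3 = omit(f, +1, 2)   # slants downward

same_side = cross_check(paper1, paper2)
opposite_sides = cross_check(paper1, paper3)

print(same_side == paper2, opposite_sides == f)
```

When both papers slant in the same direction, the combined reading equals the less slanted paper, which matches the min and max branches of xc; when they slant in opposite directions, the full string f is recovered, which matches the zero branch.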
This definition of conscientious reader bias is independent of heterogeneity of reader beliefs. However, CRB does depend on the equilibrium news reporting, which in turn may depend on the heterogeneity of reader beliefs. As the discussion on cross-checking suggests, reader heterogeneity can help the conscientious reader quite a bit. To formalize this, let us compare the case of homogeneous readers with bias b to the case of heterogeneous readers with beliefs distributed uniformly on [b − δ, b + δ]. The following corollary summarizes our principal finding: COROLLARY 4: The interaction of reader heterogeneity and duopoly lowers conscientious reader bias. When readers are heterogeneous, conscientious reader bias is lower under duopoly than monopoly:

$$CRB_{het,duo} < CRB_{het,mon}. \tag{38}$$

Under duopoly, conscientious reader bias is lower under heterogeneity than homogeneity:

$$CRB_{het,duo} < CRB_{hom,duo}. \tag{39}$$
Corollary 4 is the final result of our paper and its bottom line. It points to the absolutely central role that heterogeneity of reader beliefs plays in assuring accuracy in media. We have shown that when readers are homogeneous,
competition results in lower prices, but not in accurate news reporting. When readers are heterogeneous, the news received by the average reader might become even more biased as competitive media outlets segment the market. Such market segmentation, however, benefits a conscientious reader, who can then aggregate the news from different sources to synthesize a more accurate picture of reality. When newspapers are at different sides of the political spectrum, the conscientious reader gets all the facts. While individual news sources slant even more when faced with a heterogeneous public, the aggregate picture becomes more clear. In this respect, reader heterogeneity is the crucial antidote to media bias. This analysis indicates which issues are more likely to receive accurate media coverage, at least for the conscientious reader. Almost surely, the most likely domain of reader heterogeneity is domestic politics, where readers have diverse beliefs and media coverage is correspondingly diverse. Such dispersion of reader beliefs could come from their self-interested economic and social preferences, what used to be called “class differences.” But, as Glaeser (2005) argues, such differences are reinforced by political entrepreneurs, who have an incentive to create particular beliefs that would bring them support, especially if these beliefs distinguish them from the incumbent. Newspapers would then follow these entrepreneurs in mirroring and reinforcing the beliefs of their supporters. In fact, in many countries today, and in the United States 100 years ago, newspapers were affiliated with political parties (Hamilton, 2003). Reader diversity, and newspaper diversity, are partly a reflection of underlying political competition. In other areas of competition, such as sports, we likewise expect local papers to support local teams, thereby creating diversity of reporting across cities reflecting the diversity of reader beliefs. 
Perhaps the clearest illustration of this corollary is the coverage of the Monica Lewinsky affair during the Clinton presidency. The left-wing press presented an enormous amount of information designed to expiate the president’s sins, while the right-wing press dug out as many details pointing to his culpability. In the end, however, as Posner (1999) remarks in his book, much of the truth has come out and a conscientious reader could get a fairly complete picture of reality.
Conclusion We have examined the roles of two forces in promoting accuracy in media: competition and reader diversity. We have found that competition by itself is not a powerful force toward accuracy. Competition forces newspapers to cater to the prejudices of their readers, and greater competition typically results in more aggressive catering to such prejudices as competitors strive to divide the market. On the other hand, we found that reader diversity is a powerful force toward accuracy, as long as accuracy is interpreted as some aggregate measure of revelation of information to a reader who takes
in all the news. Greater partisanship and bias of individual media outlets may result in a more accurate picture being presented to a conscientious reader. Reader heterogeneity comes in part from underlying political competition, whereby political parties, movements, and individual entrepreneurs attempt to generate support by presenting their points of view. If they can generate enough interest, media outlets will try to cater to the very same audiences that the political entrepreneurs attract, and diversity in media coverage will arise endogenously. In contrast, when potential audiences share similar beliefs, and when there is no advantage from political entry, such as the coverage of foreign countries or crime, we do not expect to see diversity of media reports or accuracy in media. Political competition is only one source of underlying reader diversity. We can also imagine entrepreneurs starting newspapers on their own and, as long as they have deep enough pockets, creating enough demand for unorthodox views to broaden the range of opinions (and slants) that are being covered. Ideological diversity of entrepreneurs themselves may be the source of diversity of media coverage. We have studied competitive persuasion in the market for news. Our principal finding is that, when competitors can create or reinforce differences of opinion, they will do so in order to divide the market and reap higher profits. There will be no convergence in reporting to the median reader (as in a Downsian median voter framework). We believe that this consequence of competitive persuasion is more general, and that attempts to differentiate competitively by moving toward extreme positions will arise in both political (Murphy and Shleifer, 2004) and product (Gabaix and Laibson, 2004) markets. In these and other domains, the influence of audience heterogeneity and competition on the content of persuasive messages remains to be fully explored.
Appendix A: Lemmas
LEMMA A1: Define

$$s_B(d) = \frac{\phi}{\chi + \phi}(B - d) \tag{A1}$$

to be the strategy where a newspaper biases around point B. The reader’s expected utility (gross of price) of reading such a newspaper is:

$$E_d[U(s_B(d))] = \bar{u} - \frac{\chi\phi}{\chi + \phi}[\upsilon_d + b^2] - \frac{\phi^2}{\chi + \phi}[B - b]^2. \tag{A2}$$

Consequently when B = 0:

$$E_d[U(s_0(d))] = \bar{u} - \frac{\chi\phi}{\chi + \phi}\,\upsilon_d - \phi\, b^2. \tag{A3}$$

And when B = b:

$$E_d[U(s_b(d))] = \bar{u} - \frac{\chi\phi}{\chi + \phi}(\upsilon_d + b^2). \tag{A4}$$
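The closed form (A2) can be sanity-checked by direct averaging. The sketch below assumes the reader's gross utility is ū − χs² − φ(d + s − b)² and uses a toy two-point distribution for d (values −2 and +2 with equal probability, so E[d] = 0 and E[d²] = 4); the distribution, parameter values and function names are all assumptions chosen so that the expectation is exact.

```python
def expected_utility(B, b, chi, phi, u_bar, d_values):
    """Average of u_bar - chi*s^2 - phi*(d + s - b)^2 over equally likely d."""
    total = 0.0
    for d in d_values:
        s = phi / (chi + phi) * (B - d)      # linear slant s_B(d)
        total += u_bar - chi * s**2 - phi * (d + s - b)**2
    return total / len(d_values)


def lemma_a1(B, b, chi, phi, u_bar, var_d):
    """Closed form (A2) for the same expected utility."""
    k = chi + phi
    return u_bar - chi * phi / k * (var_d + b**2) - phi**2 / k * (B - b)**2


chi, phi, u_bar = 2.0, 1.0, 10.0   # illustrative (assumed) parameters
d_values = [-2.0, 2.0]             # toy data: mean 0, variance 4

direct = expected_utility(0.5, 1.5, chi, phi, u_bar, d_values)
closed = lemma_a1(0.5, 1.5, chi, phi, u_bar, 4.0)

print(direct, closed)
```

Both computations give 5.5 for these values, which is consistent with (A2) depending on the data only through its mean and variance.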
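PROOF: Expected utility for $s_B(d)$ is:

$$\bar{u} - \chi\int_d\left(\frac{\phi}{\chi + \phi}(B - d)\right)^2 - \phi\int_d\left(d + \frac{\phi}{\chi + \phi}(B - d) - b\right)^2. \tag{A5}$$

The first integral is:

$$-\chi\left(\frac{\phi}{\chi + \phi}\right)^2[B^2 + \upsilon_d] \tag{A6}$$

because E[d] = 0 and E[d²] = υd. The second integral is:

$$-\phi\left[\left(\frac{\chi}{\chi + \phi}\right)^2\upsilon_d + \left(\frac{\phi}{\chi + \phi}\right)^2 B^2 + b^2 - 2\,\frac{\phi}{\chi + \phi}\,Bb\right] \tag{A7}$$

again because E[d] = 0 and E[d²] = υd. Collecting terms produces

$$\bar{u} - \frac{\chi\phi}{\chi + \phi}\,\upsilon_d - \phi\, b^2 - \frac{\phi^2}{\chi + \phi}\,B^2 + 2\,\frac{\phi^2}{\chi + \phi}\,Bb = \tag{A8}$$

$$\bar{u} - \frac{\chi\phi}{\chi + \phi}[\upsilon_d + b^2] - \phi\, b^2 + \frac{\chi\phi}{\chi + \phi}\,b^2 - \frac{\phi^2}{\chi + \phi}\,B^2 + 2\,\frac{\phi^2}{\chi + \phi}\,Bb = \tag{A9}$$

$$\bar{u} - \frac{\chi\phi}{\chi + \phi}[\upsilon_d + b^2] - \frac{\phi^2}{\chi + \phi}\,b^2 - \frac{\phi^2}{\chi + \phi}\,B^2 + 2\,\frac{\phi^2}{\chi + \phi}\,Bb = \tag{A10}$$

$$\bar{u} - \frac{\chi\phi}{\chi + \phi}[\upsilon_d + b^2] - \frac{\phi^2}{\chi + \phi}[b^2 + B^2 - 2Bb] \tag{A11}$$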
and hence the result.

LEMMA A2: Let x1 ≤ x2 be the biases of two readers. For any 1 ≥ c ≥ 0, the strategy

$$s_{\bar{x}}(d) = \frac{\phi}{\chi + \phi}(\bar{x} - d) \tag{A12}$$

maximizes weighted average reader utility $cE_d u_{x_1}(s(d)) + (1 - c)E_d u_{x_2}(s(d))$, where x̄ = cx1 + (1 − c)x2. Moreover, for some x1 ≤ z ≤ x2, the strategy $s_z(d) = [\phi/(\chi + \phi)](z - d)$ maximizes $\min\{E u_{x_1}(s(d)), E u_{x_2}(s(d))\}$.
PROOF: Consider total utility $cE_d u_{x_1}(s(d)) + (1 - c)E_d u_{x_2}(s(d))$, which equals

$$\int_d\left[\bar{u} - \chi s(d)^2 - c\,\phi\,(d + s(d) - x_1)^2 - (1 - c)\,\phi\,(d + s(d) - x_2)^2\right]. \tag{A13}$$

Since the right-hand side shows no interdependency in d, maximizing this integral is equivalent to maximizing for every single d, the term

$$\bar{u} - \chi s(d)^2 - c\,\phi\,(d + s(d) - x_1)^2 - (1 - c)\,\phi\,(d + s(d) - x_2)^2. \tag{A14}$$

Taking derivatives with respect to s then produces the first-order condition

$$-2\chi s - 2\phi(d + s - \bar{x}) = 0, \tag{A15}$$

which implies that the optimal slanting is:

$$s(d) = \frac{\phi}{\chi + \phi}(\bar{x} - d). \tag{A16}$$

For the second part, let s(d) be a candidate slanting strategy that maximizes $\min\{E u_{x_1}(s(d)), E u_{x_2}(s(d))\}$. Define u1 and u2 to be the expected utilities for s(d). Note that $s_{x_1}$ and $s_{x_2}$ maximize reader 1 and reader 2 utilities, respectively. Consequently, there must be a c such that for x̄ = cx1 + (1 − c)x2 the strategy $s_{\bar{x}}$ yields the same ratio of reader 1 and 2 utilities as the candidate strategy does: u1/u2. But by the first part of the Lemma, $E u_i(s_{\bar{x}}(d)) \geq u_i$ for i = 1, 2. Otherwise, the candidate strategy s(d) would yield higher weighted average utility. But this shows $s_{\bar{x}}$ maximizes the min and hence $s = s_{\bar{x}}$.
Appendix B: Proofs of propositions
PROOF OF PROPOSITION 1: Consider the monopolist’s maximization problem. Reader utility is

$$U_r = \max\{\bar{u} - \chi s^2 - P,\ 0\}. \tag{B1}$$
Since readers only dislike slanting, a newspaper gets no benefit from slanting and only pays costs. The optimal strategy for both the homogeneous and heterogeneous case is therefore s*(d) = 0. Since the reader’s gross utility in this case is ū, the monopolist can extract all surplus and charge P = ū, so that the reader’s net utility is 0. Consider now the duopoly case. Begin with the homogeneous reader case and proceed by backward induction. Consider the price-setting stage. Define Vj to be the utility the reader associates with reading newspaper j. There are two cases here: equal and unequal utilities. For the case of unequal utilities, suppose without loss of generality that V1 > V2. The price equilibrium is for paper 1 to charge V1 − V2 and capture the full market. If V1 = V2, then both papers charge zero. In the strategy-setting stage, holding constant the other’s strategy, both papers’ profit functions are increasing in the reader utility from the strategies they choose. Consequently, it is a weakly dominant strategy for each paper to maximize reader utility. From the monopoly case, we know these strategies are s(d) = 0. It is an equilibrium, therefore, to have both prices and slanting equal to zero. In the heterogeneous reader case, the logic remains the same because reader utility functions are the same as in the homogeneous case, since utility is independent of beliefs for rational readers. The homogeneous and heterogeneous cases produce the same incentives for the firm.

PROOF OF PROPOSITION 2: Since the monopolist can extract all surplus, he maximizes expected utility,

$$\max_{s^*(d)}\ \bar{u} - \chi\int_d (s^*(d))^2 - \phi\int_d (d + s^*(d) - b)^2. \tag{B2}$$

There are no interdependencies in this utility maximization across d’s. Because the maximand is separable in d, choosing the optimal s*(d) is equivalent to choosing the optimal s* for each d or

$$s^*(d) = \arg\max_s\ \bar{u} - \chi s^2 - \phi(d + s - b)^2. \tag{B3}$$

For a given d, differentiating with respect to s produces the first-order condition

$$\chi s + \phi(d + s - b) = 0, \tag{B4}$$

which implies

$$s^*_{hom}(d) = \frac{\phi}{\chi + \phi}(b - d). \tag{B5}$$
Prices then are equal to the expected utility under this strategy. From Lemma (A1), we know the expected utility and hence price is

$$P^*_{hom} = \bar{u} - \frac{\chi\phi}{\chi + \phi}[b^2 + \upsilon_d]. \tag{B6}$$
PROOF OF PROPOSITION 3: We proceed by backward induction. Consider the price-setting stage. Let Vj be the reader’s utility associated with reading paper j. There are two cases here: equal and unequal utilities. For the case of unequal utilities, suppose without loss of generality that V1 > V2. The price equilibrium is for paper 1 to charge V1 − V2 and capture the full market. If V1 = V2, then both papers charge zero. In the stage where the slanting strategy is set, maximizing reader utility is, as before, a weakly dominant strategy. Holding constant the other firm’s strategy, each firm’s profit is increasing in the reader utility associated with its strategy. We know from Proposition 2 that the utility-maximizing strategy is to slant $[\phi/(\chi + \phi)](b - d)$. Therefore, it is an equilibrium for duopolists to choose this strategy. Since this means both papers provide equal utility, prices equal zero. This shows that this is an equilibrium. Moreover, this logic directly implies that the only equilibrium involves both papers choosing a slanting strategy that maximizes utility and prices equal to zero on the equilibrium path.19

PROOF OF PROPOSITION 4: We proceed in three steps:
1 First, we show that a linear strategy is an optimal one.
2 Second, we show that of the linear strategies, the strategy with zero bias produces maximum profit.
3 Third, we compute the prices the monopolists would charge.
Step 1: Linearity of monopolist’s strategy. The first step is to show that the linear strategy of the type $s_B(d) = [\phi/(\chi + \phi)](B - d)$ is optimal. To show this, suppose s(d) and P form an optimal strategy for the monopolist. Let $X = \{b_i \mid E u_i(s(d)) - P \geq 0\}$ be the biases of the readers who read the paper in this case.20 Since X ⊂ [b1, b2] is non-empty, it must have a well-defined inf and sup. Let x1 and x2 be the inf and sup of this set and u1 and u2 be the utility of these readers. Lemma (A2) shows that a linear strategy of the form $s_z(d) = [\phi/(\chi + \phi)](z - d)$, where z = cx1 + (1 − c)x2, maximizes min{u1, u2}. So, $s_z$ yields the maximum payoffs for x1 and x2. But by Lemma (A1), all readers with bias between x1 and x2 have even greater utility from this strategy. Define the price Pz to be min{u1, u2}. Since x1 and x2 are the inf and sup of the set X, by the formula in Lemma (A1), it is easy to see that the strategy $s_z$, with price Pz, satisfies the participation constraint of all readers in X.
Let us now contrast the supposed optimum strategy (s, P) with this strategy ($s_z$, Pz). By construction, $s_z$ has at least as large a market share as s since it spans all readers between the inf and sup of the set X. Moreover, since s satisfies the participation constraint of x1 and x2 we know that it cannot yield higher gross utility for readers at x1 and x2 than $s_z$ does. Hence, we know that P ≤ Pz. Thus, the linear strategy $s_z(d)$ yields at least as much profit as the supposed optimum. This shows that we can work with a linear strategy as an optimum.

Step 2: Optimal bias is zero. The second step is to show that a monopolist would choose a linear strategy of $s_B(d)$ with B = 0. To do this, we proceed by contradiction. Let (B, P) be a linear strategy B and price P set by a monopolist that we suppose is an optimum strategy. Lemma (A1) shows that a reader with bias b receives utility

$$\bar{u} - \frac{\chi\phi}{\chi + \phi}[\upsilon_d + b^2] - \frac{\phi^2}{\chi + \phi}[B - b]^2 - P. \tag{B7}$$
All readers for whom this term is positive will read the paper. Since this expression is quadratic in b, we can define the indifferent readers as its zeros. By the quadratic formula, the zeros are at

$$z_{\pm}(P, B) = \frac{2\phi B \pm \sqrt{-4\chi\phi B^2 + 4(K/\phi)(\chi+\phi)^2}}{2(\chi+\phi)},$$

where K is defined to be ū − [χφ/(φ + χ)]υd − P. Of course, these zero points may lie outside the range of reader biases, so define b−(P) = max(z−(P, B), b1) and b+(P) = min(z+(P, B), b2). By definition, therefore, all individuals within the interval [b−, b+] have weakly positive utility and therefore will purchase the paper. With these definitions in hand, suppose now the monopolist chooses a B ≠ 0. We will consider three different cases: (a) the case where b+ and b− are both interior (i.e., equal to z+ and z−); (b) the case where they are both at the boundary (i.e., equal to b2 and b1); and (c) the case where one is at the boundary and the other is at the interior. First, consider the case where b+(P, B) = z+(P, B) and b−(P, B) = z−(P, B) so that the endpoints are defined by the quadratic equation and not by the boundaries of the reader bias distribution. The size of the interval in this case then equals
$$z_{+}(P) - z_{-}(P) = \frac{\sqrt{-4\chi\phi B^2 + 4(K/\phi)(\chi+\phi)^2}}{\chi+\phi}. \tag{B8}$$
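As a numerical illustration of (B8), the following Python sketch evaluates the interval length for several values of B and confirms that the endpoints are zeros of the utility expression in (B7). The function names and the parameter values (chi, phi, K) are illustrative assumptions and do not come from the text.

```python
import math

def interval_width(B, K, chi, phi):
    """Length z_plus - z_minus of the set of readers with nonnegative utility, as in (B8)."""
    disc = -4.0 * chi * phi * B**2 + 4.0 * (K / phi) * (chi + phi) ** 2
    if disc < 0:
        return 0.0  # no reader obtains nonnegative utility
    return math.sqrt(disc) / (chi + phi)

def upper_root(B, K, chi, phi):
    """The larger zero z_plus of the quadratic utility expression."""
    disc = -4.0 * chi * phi * B**2 + 4.0 * (K / phi) * (chi + phi) ** 2
    return (2.0 * phi * B + math.sqrt(disc)) / (2.0 * (chi + phi))

def net_utility(b, B, K, chi, phi):
    """Utility in (B7), written with K = u_bar - chi*phi/(chi+phi)*v_d - P."""
    return K - chi * phi / (chi + phi) * b**2 - phi**2 / (chi + phi) * (B - b) ** 2

# Illustrative parameter values only (assumed, not from the paper).
chi, phi, K = 1.0, 2.0, 0.5
biases = (0.0, 0.2, 0.4, 0.6)
widths = [interval_width(B, K, chi, phi) for B in biases]
residual = net_utility(upper_root(0.2, K, chi, phi), 0.2, K, chi, phi)

for B, width in zip(biases, widths):
    print(f"B = {B:.1f}: interval length = {width:.4f}")
```

With K held fixed, the printed lengths shrink as |B| grows, which is the property the argument relies on.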
Since the constant K does not depend on B, the interval length in (B8) is strictly decreasing in B². Hence, a B ≠ 0 strategy cannot be optimal. If B > 0, reducing it and keeping prices the same would increase profits, and similarly for B < 0.

Second, consider the case where both endpoints are defined by the boundary so that b+ = b2 < z+ and b− = b1 > z−. Let U1 be the gross utility of the reader at the left boundary (i.e., with bias b1) and U2 be the corresponding utility for the reader at the right boundary. Prices in this case are defined by P = min{U1, U2}. A price smaller than this could be increased marginally without violating the participation constraint, thereby raising profits. A price higher than this would violate the participation constraint of the boundary readers and would be inconsistent with the definition of the boundary. Yet this price implies a violation of optimality. Lemma (A2) shows that for some c, choosing B = cb1 + (1 − c)b2 would maximize min{U1, U2}. Moreover, by the symmetry of the formula in Lemma (A1) it is clear that c = ½. So a strategy of B = b̄ = 0 would still satisfy the participation constraint since it is maximizing the minimum utility. Moreover, by switching to this strategy, the monopolist could increase the price he could charge since min{U1, U2} rises. Profits also rise because he continues to cover the whole market. Hence, by switching to this strategy, the monopolist could raise profits and this contradicts B ≠ 0 as an optimal strategy.

Third, consider the case where (without loss of generality) b− = z− > b1 but b+ = b2 < z+. By definition of the roots z− and z+, the reader at b2 earns greater utility than the reader at z− who is indifferent between buying the paper and not. But in this case, consider a deviation that leaves prices fixed but changes strategies to B′ = B − ε. For small enough ε > 0, this continues to give strictly positive utility to the reader at b2 and hence he will continue to read. This will increase market share, however, because some readers with b < z− now earn positive utility from reading. Since this deviation increases market share without decreasing price, the original B could not be an optimum.

As this includes all the cases, we have now shown that profits are maximized by a linear strategy with B = 0. What should optimal prices look like?
For B = 0, the zeros of (B7) are at ±√(K/φ), so the monopolist’s profits equal P · 2√(K/φ), where K = ū − [χφ/(φ + χ)]υd − P. Let Pm be the global maximizer of this function. At this maximum b+ − b− = 2√([ū − (χφ/(χ + φ))υd − Pm]/φ). Define this to be Cm. So if b2 − b1 < Cm, the monopolist will cover the whole market. He can then set a price equal to the utility of the boundary reader, which by Lemma (A1) equals ū − [χφ/(φ + χ)]υd − φb2².

PROOF OF PROPOSITION 5: We proceed by backward induction in several steps:
(1) We calculate x(P1, P2; z1, z2), the bias of the reader who is indifferent between reading the two papers if paper j charges Pj and has bias zj (chosen in the first stage of the game and taken as given in this stage). This allows us to determine the market share of each firm for that location and price pair.
(2) We then calculate P^R_1(P2; z1, z2) and P^R_2(P1; z1, z2), the best response functions for firms 1 and 2, respectively. These are the best price responses of each firm to the other’s price (given the biases zj which are chosen in the first stage and taken as given in the second stage).
(3) Using these prices, we calculate the equilibrium prices P*1(z1, z2) and P*2(z1, z2) and market share x*(z1, z2) that result from the choice of bias in the first stage.
(4) We then use these equilibrium prices to show that in the first stage, firms will want to differentiate as long as z2 ≤ (3/2)b2. We show that at z2 = (3/2)b2 and z1 = (3/2)b1 = −(3/2)b2, the firms are indifferent between lowering and raising zj and thus in equilibrium.
(5) Finally, we show that all participation constraints for the consumer are satisfied at the equilibrium.

Step 1: Calculating x(P1, P2; z1, z2). A reader with bias x receives utility

$$\bar{u} - \frac{\chi\phi}{\chi+\phi}\,[\upsilon_d + x^2] - \frac{\phi^2}{\chi+\phi}\,[z_j - x]^2 - P_j \tag{B9}$$
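from reading paper j (Lemma (A1)). If the reader with bias x is indifferent between these two papers, then the utilities from reading the two papers are equal:

$$\bar{u} - \frac{\chi\phi}{\chi+\phi}\,[\upsilon_d + x^2] - \frac{\phi^2}{\chi+\phi}\,[z_2 - x]^2 - P_2 = \bar{u} - \frac{\chi\phi}{\chi+\phi}\,[\upsilon_d + x^2] - \frac{\phi^2}{\chi+\phi}\,[z_1 - x]^2 - P_1. \tag{B10}$$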
This equality can in turn be simplified to

$$\frac{\phi^2}{\chi+\phi}\left[(z_1 - x)^2 - (z_2 - x)^2\right] = P_2 - P_1 \tag{B11}$$

$$\frac{\phi^2}{\chi+\phi}\,(z_2 - z_1)\left[2x - (z_2 + z_1)\right] = P_2 - P_1 \tag{B12}$$

$$\frac{2\phi^2}{\chi+\phi}\,\Delta z\,[x - \bar{z}] = \Delta P \tag{B13}$$

$$x(P_1, P_2; z_1, z_2) = \bar{z} + \Delta P\,\frac{\chi+\phi}{2\phi^2\,\Delta z} \tag{B14}$$

where z̄ = (z1 + z2)/2, ∆P = P2 − P1, and ∆z = z2 − z1.

Step 2: Calculating price best response functions P^R_j(P−j). Since the indifferent reader is located at x, firm profits are given by

$$\Pi_1(P_1, P_2; z_1, z_2) = \frac{P_1}{b_2 - b_1}\,(x - b_1) \tag{B15}$$

$$\Pi_2(P_1, P_2; z_1, z_2) = \frac{P_2}{b_2 - b_1}\,(b_2 - x). \tag{B16}$$
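The firm’s best price response can be derived by differentiating profits with respect to own price. For firm 1, this first-order condition is

$$\frac{1}{b_2 - b_1}\left[x - b_1 + P_1\,\frac{\partial x}{\partial P_1}\right] = 0 \tag{B17}$$

$$x - b_1 + P_1\,\frac{\partial x}{\partial P_1} = 0 \tag{B18}$$

$$\bar{z} + \Delta P\,\frac{\chi+\phi}{2\phi^2\,\Delta z} - b_1 + P_1\left(-\frac{\chi+\phi}{2\phi^2\,\Delta z}\right) = 0 \tag{B19}$$

$$\bar{z} + \frac{P_2}{2}\,\frac{\chi+\phi}{\phi^2\,\Delta z} - b_1 = P_1\,\frac{\chi+\phi}{\phi^2\,\Delta z} \tag{B20}$$

$$(\bar{z} - b_1)\,\frac{\Delta z\,\phi^2}{\chi+\phi} + \frac{P_2}{2} = P_1. \tag{B21}$$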
So the best response function is

$$P_1^R(P_2; z_1, z_2) = \frac{P_2}{2} + (b_2 + \bar{z})\,\frac{\Delta z\,\phi^2}{\chi+\phi} \tag{B22}$$

where we’ve used the fact that b2 = −b1 by assumption. Similarly, the best response function for firm 2 is

$$P_2^R(P_1; z_1, z_2) = \frac{P_1}{2} + (b_2 - \bar{z})\,\frac{\Delta z\,\phi^2}{\chi+\phi}. \tag{B23}$$
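Step 3: Calculating equilibrium prices and market share. The Nash equilibrium of prices can be calculated from the best response functions by solving

$$P_1^* = P_1^R\big(P_2^R(P_1^*; z_1, z_2)\big) \tag{B24}$$

$$P_2^* = P_2^R\big(P_1^R(P_2^*; z_1, z_2)\big). \tag{B25}$$

The first equation can be calculated as follows:

$$P_1^* = P_1^R\big(P_2^R(P_1^*; z_1, z_2)\big) \tag{B26}$$

$$P_1^* = \frac{P_2^R(P_1^*; z_1, z_2)}{2} + (b_2 + \bar{z})\,\frac{\Delta z\,\phi^2}{\chi+\phi} \tag{B27}$$

$$P_1^* = \frac{1}{2}\left(\frac{P_1^*}{2} + (b_2 - \bar{z})\,\frac{\Delta z\,\phi^2}{\chi+\phi}\right) + (b_2 + \bar{z})\,\frac{\Delta z\,\phi^2}{\chi+\phi} \tag{B28}$$

$$\frac{3}{4}\,P_1^* = \frac{\phi^2\,\Delta z}{\chi+\phi}\left[\frac{b_2}{2} - \frac{\bar{z}}{2} + \bar{z} + b_2\right] \tag{B29}$$

$$P_1^* = \frac{\phi^2\,\Delta z}{\chi+\phi}\left[2b_2 + \frac{2\bar{z}}{3}\right]. \tag{B30}$$

By a similar calculation

$$P_2^* = \frac{\phi^2\,\Delta z}{\chi+\phi}\left[2b_2 - \frac{2\bar{z}}{3}\right]. \tag{B31}$$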
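Using these equilibrium prices, we can also calculate equilibrium market share as

$$x^*(z_1, z_2) = \bar{z} + \Delta P\,\frac{\chi+\phi}{2\phi^2\,\Delta z} \tag{B32}$$

$$x^*(z_1, z_2) = \bar{z} + \left[-\frac{4\bar{z}}{3}\,\frac{\phi^2\,\Delta z}{\chi+\phi}\right]\frac{\chi+\phi}{2\phi^2\,\Delta z} \tag{B33}$$

$$x^*(z_1, z_2) = \bar{z} - \frac{2}{3}\,\bar{z} \tag{B34}$$

$$x^*(z_1, z_2) = \frac{\bar{z}}{3}. \tag{B35}$$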
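Steps 1 to 3 can be checked numerically. The Python sketch below iterates the best responses (B22) and (B23) to their fixed point and compares the result with the closed forms (B30), (B31), and (B35). The function names and the parameter values are assumptions chosen for illustration; only the formulas come from the derivation above.

```python
def scale(z1, z2, chi, phi):
    """Common factor dz * phi^2 / (chi + phi) appearing in (B22), (B23), (B30), (B31)."""
    return (z2 - z1) * phi**2 / (chi + phi)

def best_response_1(p2, z1, z2, b2, chi, phi):
    """Firm 1's price best response (B22), using b1 = -b2."""
    zbar = (z1 + z2) / 2
    return p2 / 2 + (b2 + zbar) * scale(z1, z2, chi, phi)

def best_response_2(p1, z1, z2, b2, chi, phi):
    """Firm 2's price best response (B23)."""
    zbar = (z1 + z2) / 2
    return p1 / 2 + (b2 - zbar) * scale(z1, z2, chi, phi)

def indifferent_reader(p1, p2, z1, z2, chi, phi):
    """Bias of the reader indifferent between the two papers (B14)."""
    zbar, dz = (z1 + z2) / 2, z2 - z1
    return zbar + (p2 - p1) * (chi + phi) / (2 * phi**2 * dz)

def solve_prices(z1, z2, b2, chi, phi, iterations=200):
    """Iterate both best responses; each step halves the distance to the fixed point."""
    p1 = p2 = 0.0
    for _ in range(iterations):
        p1, p2 = (best_response_1(p2, z1, z2, b2, chi, phi),
                  best_response_2(p1, z1, z2, b2, chi, phi))
    return p1, p2

# Illustrative values only (assumed): asymmetric locations so that zbar != 0.
z1, z2, b2, chi, phi = -1.0, 2.0, 1.0, 1.0, 2.0
zbar = (z1 + z2) / 2
p1, p2 = solve_prices(z1, z2, b2, chi, phi)
closed_p1 = scale(z1, z2, chi, phi) * (2 * b2 + 2 * zbar / 3)  # (B30)
closed_p2 = scale(z1, z2, chi, phi) * (2 * b2 - 2 * zbar / 3)  # (B31)
x_star = indifferent_reader(p1, p2, z1, z2, chi, phi)          # should equal zbar / 3
```

Because each best response has slope ½ in the rival's price, the iteration is a contraction, so the starting prices do not matter.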
Step 4: Differentiation in choosing bias (the first stage). These prices and market share allow us to backward induct and examine the firm’s decision in stage 1. Taking the other firm’s bias as given, they can be used to calculate each firm’s profits for each bias chosen. Specifically profits in stage 1 are

$$\Pi_1(z_1, z_2) = P_1^*(z_1, z_2)\,[x^*(z_1, z_2) - b_1] \tag{B36}$$

$$\Pi_2(z_1, z_2) = P_2^*(z_1, z_2)\,[b_2 - x^*(z_1, z_2)]. \tag{B37}$$

The first-order condition for this problem is instructive. Focusing on firm 1, we can write profits as

$$P_1^*(z_1, z_2)\left[\frac{\bar{z}}{3} - b_1\right]. \tag{B38}$$

Differentiating with respect to z1 gives

$$\frac{\partial \Pi_1}{\partial z_1} = \frac{P_1^*(z_1, z_2)}{6} + \frac{\partial P_1^*(z_1, z_2)}{\partial z_1}\left[\frac{\bar{z}}{3} - b_1\right] \tag{B39}$$

$$\frac{\partial \Pi_1}{\partial z_1} = \left(b_2 + \frac{\bar{z}}{3}\right)\frac{2\phi^2\,\Delta z}{6(\chi+\phi)} + \left(b_2 + \frac{\bar{z}}{3}\right)\frac{2\phi^2}{\chi+\phi}\left[\frac{\Delta z}{6} - b_2 - \frac{\bar{z}}{3}\right]. \tag{B40}$$
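Now we are interested in the sign of this derivative. Define sign(x) to be the function that equals +1 if x > 0 and −1 if x < 0. We can then write

$$\operatorname{sign}\left(\frac{\partial \Pi_1}{\partial z_1}\right) = \operatorname{sign}\left(\left(b_2 + \frac{\bar{z}}{3}\right)\Delta z + \left(b_2 + \frac{\bar{z}}{3}\right)[\Delta z - 6b_2 - 2\bar{z}]\right) \tag{B41}$$

$$\operatorname{sign}\left(\frac{\partial \Pi_1}{\partial z_1}\right) = \operatorname{sign}\left(\left[b_2 + \frac{\bar{z}}{3}\right](\Delta z + \Delta z - 6b_2 - 2\bar{z})\right). \tag{B42}$$

Now suppose that we are in a symmetric case where z1 = −z2 so that z̄ = 0. In this case,

$$\operatorname{sign}\left(\left.\frac{\partial \Pi_1}{\partial z_1}\right|_{\bar{z}=0}\right) = \operatorname{sign}\big(b_2\,(2\Delta z - 6b_2)\big) \tag{B43}$$

$$\operatorname{sign}\left(\left.\frac{\partial \Pi_1}{\partial z_1}\right|_{\bar{z}=0}\right) = \operatorname{sign}(2\Delta z - 6b_2) \tag{B44}$$

$$\operatorname{sign}\left(\left.\frac{\partial \Pi_1}{\partial z_1}\right|_{\bar{z}=0}\right) = \operatorname{sign}(-4z_1 - 6b_2). \tag{B45}$$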
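The sign condition in (B45) can be illustrated with a finite-difference check. The Python sketch below builds firm 1's first-stage profit from (B30) and (B38) and estimates its slope in z1 at three symmetric location pairs. The function names, the step size h, and the parameter values are illustrative assumptions.

```python
def equilibrium_price_1(z1, z2, b2, chi, phi):
    """Firm 1's second-stage equilibrium price (B30)."""
    zbar, dz = (z1 + z2) / 2, z2 - z1
    return phi**2 * dz / (chi + phi) * (2 * b2 + 2 * zbar / 3)

def profit_1(z1, z2, b2, chi, phi):
    """Firm 1's first-stage profit (B38), using b1 = -b2."""
    zbar = (z1 + z2) / 2
    return equilibrium_price_1(z1, z2, b2, chi, phi) * (zbar / 3 + b2)

def profit_slope_1(z1, z2, b2, chi, phi, h=1e-6):
    """Central-difference estimate of dPi_1/dz_1, holding z2 fixed."""
    up = profit_1(z1 + h, z2, b2, chi, phi)
    down = profit_1(z1 - h, z2, b2, chi, phi)
    return (up - down) / (2 * h)

# Illustrative values only (assumed). Locations are symmetric: z1 = -k*b2, z2 = k*b2.
b2, chi, phi = 1.0, 1.0, 2.0
slopes = {k: profit_slope_1(-k * b2, k * b2, b2, chi, phi) for k in (1.0, 1.5, 2.0)}

for k, slope in slopes.items():
    print(f"z1 = {-k * b2:+.1f}, z2 = {k * b2:+.1f}: dPi_1/dz_1 = {slope:+.4f}")
```

The slope is negative when the papers sit closer together than ±(3/2)b2, approximately zero at ±(3/2)b2, and positive beyond that, matching the sign of −4z1 − 6b2.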
By (B45), (∂Π1/∂z1)|z̄ = 0 < 0 if and only if −2z1 < 3b2. In other words, if −z1 < (3/2)b2, firm 1 always has an incentive to further lower z1. If −z1 > (3/2)b2, firm 1 has an incentive to raise z1. A similar derivation shows that (∂Π2/∂z2)|z̄ = 0 > 0 only if z2 < (3/2)b2. This, therefore, shows that at z*2 = (3/2)b2 and z*1 = −(3/2)b2, the firms are at a Nash equilibrium for the first stage game. Substitution shows that for this choice of z*j, prices must be equal to [φ²/(χ + φ)]6b2².

Step 5: Boundary conditions. Finally, we must verify that in equilibrium, the participation constraints of the consumer are satisfied. It suffices to show that the consumer located at zero receives positive utility from buying either paper. That is, we must show (by Lemma (A1)) that:

$$\bar{u} - \frac{\chi\phi}{\phi+\chi}\,\upsilon_d - \frac{\phi^2}{\chi+\phi}\,\frac{9}{4}\,b_2^2 - 6\,\frac{\phi^2}{\chi+\phi}\,b_2^2 > 0 \tag{B46}$$

where the first three terms are the gross utility of reading the paper and the last term is the price. This is equivalent to:

$$\frac{33}{4}\,\frac{\phi^2}{\chi+\phi}\,b_2^2 < \bar{u} - \frac{\chi\phi}{\chi+\phi}\,\upsilon_d, \tag{B47}$$
which is equivalent to
b2
CRBhet,mon. From Propositions 5 and 4, we know that duopolists report more diverse news than the monopolist when readers are heterogeneous. But from the functional form of xc(· , ·), we know that this diversity allows the conscientious reader to cross-check and thus produces less bias for her overall. Consider the second comparison CRBhet,duo < CRBhom,duo. By Propositions 5 and 3, we know that reporting in the heterogeneous case is more diverse. So, once again, the increased diversity means lower conscientious reader bias.
Notes
* We are extremely grateful to Alberto Alesina, Daniel Benjamin, Tim Besley, Filipe Campante, Gene D’Avolio, Glenn Ellison, Josh Fischman, Edward Glaeser, Matthew Gentzkow, Simon Johnson, Emir Kamenica, Lawrence Katz, David Laibson, Dominique Olie Lauga, Emily Oster, Richard Posner, Jesse Shapiro, Jeremy Stein, Lawrence Summers, and three anonymous referees for comments. This paper is a substantially revised version of an earlier paper entitled “Media Bias.”
1 Ronald Coase (1974), Besley and Robin Burgess (2002), Besley and Prat (2002), Djankov et al. (2003), David Stromberg (2001), and Alexander Dyck and Luigi Zingales (2002) all advance this view of competition in the media as delivering greater accuracy.
2 H. L. Mencken (1920), Walter Lippmann (1922), Samuel Hayakawa (1940), Michael Jensen (1979), Doris Graber (1984), James Hamilton (2003), and the standard communications textbook (Werner Severin and James Tankard, Jr., 1992) all advance this view of news.
3 Entertainment broadcasting is analyzed by Peter Steiner (1952), Michael Spence and Bruce Owen (1977), Ronald Goettler and Ron Shachar (2001), and Esther Gal-Or and Anthony Dukes (2003). Jean Gabszewicz et al. (2001) take the approach closest to ours by conceptualizing news provision in a Hotelling framework. They examine how advertisers have an impact on content, whereas we focus on media accuracy.
4 For concreteness, we talk about newspapers, although our argument applies equally well to television and radio.
5 Persuasion can also work through outright fabrication of news, as was done routinely by the Communist press, and occasionally even in Western newspapers (e.g., Jayson Blair’s reporting for the New York Times).
6 This assumption is immaterial to our results. All we require is that newspapers face some quadratic cost of slanting. This cost could just as easily arise on the supply side, with firms facing a technological or private reputational cost of slanting, and the results would be the same. The necessary feature is that firms cannot slant freely.
7 As with all Hotelling models, the assumptions on transportation costs matter. With linear transportation costs, an equilibrium does not exist. But while the results depend on nonlinear transportation costs, they are not specific to the quadratic. Other convex functions produce similar results (Nicholas Economides, 1986). See Steffen Brenner (2001) for a survey. Similarly, as with all Hotelling models, the assumption of Bertrand competition is key to our results.
8 As is clear from the proof of the proposition, this result generalizes trivially to J > 2 newspapers.
9 Even when b = 0, there is slanting. This is because even a reader who has zero bias ex ante does not want to change his mind ex post. Consequently, the monopolist slants news toward the reader’s bias, 0.
10 For this same reason, and as is clear from the proof of the proposition, this result holds for any number of newspapers J ≥ 2.
11 The stated equilibrium for the duopolists is not unique because any strategy profile that differs on a set of measure zero would also be an equilibrium.
12 If b̄ = 0, but b2 − b1 > Cm, the monopolist would use the same slanting strategy as in Proposition 4, but would charge a high enough price that not all people read the paper. The case where b̄ ≠ 0 is more complicated. The monopolist would not slant toward b̄ anymore. Instead, he would slant toward a point between b̄ and 0. This is because readers closer to the origin enjoy higher overall surplus from reading the paper (see Lemma (A1)). Consequently, the monopolist would prefer a distribution of readers closer to the origin so as to be able to charge higher prices.
13 If b̄ = 0 but b2 > Cd, the duopolists differentiate less than stated in Proposition 5. The participation constraint of the reader with bias 0 begins to bind and the duopolists locate closer together than in the proposition. If b2 is sufficiently large, the duopolists would even end up inside the distribution of reader beliefs so that |zj| < |bj|.
14 Recall that “located at z” means the paper biases according to the rule sz(d) = [φ/(χ + φ)](z − d).
15 This analysis also illustrates why Proposition 5 is about competition, per se, and not about variety alone. A monopolist who could start two newspapers does not need to differentiate to increase market power. He would differentiate simply to cater to reader tastes, but would not go beyond the most extreme readers as duopolists would.
16 These results are not unique to the Muslim world. Steven Kull et al. (2003) document significant confusion among large percentages of U.S. respondents on such questions as Saddam Hussein’s culpability in 9/11 and the discovery of weapons of mass destruction in Iraq. The study also finds that those who get their news from Fox News are less well informed about these issues than those who get their news from PBS and NPR.
17 Importantly, newspapers do not slant by simply manufacturing evidence.
18 The extreme cross-checking depends on the two papers slanting stories using the same rule. It is necessary for our results only that the papers use similar rules. Suppose that when one paper omits a fact, it appears in an oppositely slanted paper only with probability z. In this case, the cross-checking function becomes (1 − z)s1 + (1 − z)s2 + zxc(s1, s2). Thus, the qualitative statements we make are preserved.
19 Any slanting strategy that deviates on measure zero from the optimal one also forms an equilibrium since expected utility is the same.
20 This set must be non-empty since the strategy stated in the proposition earns positive profits and an empty readership would earn zero profits.
References Alterman, Eric. What liberal media? The truth about bias and the news. New York: Basic Books, 2003. Baron, David. “Persistent Media Bias.” Stanford University, Graduate School of Business Research Papers: No. 1845, 2004. Bartlett, Frederic. Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press, 1932. Besley, Timothy and Burgess, Robin. “The Political Economy of Government Responsiveness: Theory and Evidence from India.” Quarterly Journal of Economics, 2002, 117(4), pp. 1415–51. Besley, Timothy and Prat, Andrea. “Handcuffs for the Grabbing Hand? Media Capture and Government Accountability.” Center for Economic Policy Research, CEPR Discussion Papers: No. 3132, 2002. Brenner, Steffen. “Determinants of Product Differentiation: A Survey.” Unpublished Paper, 2001. Brunnermeier, Markus K. and Nagel, Stefan. “Hedge Funds and the Technology Bubble.” Journal of Finance, 2004, 59(5), pp. 2013–40. Coase, Ronald H. “The Market for Goods and the Market for Ideas.” American Economic Review, 1974, 64(2), pp. 384–91. Coulter, Ann. Slander: Liberal lies about the American Right. New York: Three Rivers Press, 2003. d’Aspremont, Claude; Gabszewicz, Jean J. and Thisse, Jacque-Francois. “On Hotelling’s ‘Stability in Competition’.” Econometrica, 1979, 47(5), pp. 1145–50. De Long, J. Bradford; Shleifer, Andrei; Summers, Lawrence H. and Waldmann, Robert J. “Positive Feedback Investment Strategies and Destabilizing Rational Speculation.” Journal of Finance, 1990, 45(2), pp. 379–95. Djankov, Simeon; McLiesh, Carlee; Nenova, Tatiana and Shleifer, Andrei. “Who Owns the Media?” Journal of Law and Economics, 2003, 46(2), pp. 341–81. Dyck, Alexander and Zingales, Luigi. “The Corporate Governance Role of the Media,” in Roumeen Islam, ed., The right to tell: The role of media in development. Washington, DC: 2002, pp. 107–37. Economides, Nicholas. 
“Minimal and Maximal Product Differentiation in Hotelling’s Duopoly.” Economics Letters, 1986, 21(1), pp. 67–71. Fiske, Susan. “Social Cognition,” in Abraham Tesser, ed., Advanced social psychology. New York: McGraw-Hill, 1995. Franken, Al. Lies and the lying liars who tell them: A fair and balanced look at the Right. New York: E.P. Dutton & Company, 2003. Friedman, Milton. “The Case for Flexible Exchange Rates,” in Milton Friedman, ed., Essays in positive economics. Chicago: University of Chicago Press, 1953, pp. 157–203. Gabaix, Xavier and Laibson, David. “Shrouded Attributes and Information Suppression in Competitive Markets.” Unpublished Paper, 2004. Gabszewicz, Jean J.; Laussel, Dider and Sonnac, Nathalie. “Press Advertising and the Ascent of the ‘Pensee Unique’.” European Economic Review, 2001, 45(4–6), pp. 641–51. Gal-Or, Esther and Dukes, Anthony. “Minimum Differentiation in Commercial Media Markets.” Journal of Economics and Management Strategy, 2003, 12(3), pp. 291–325.
Gentzkow, Matthew A. and Shapiro, Jesse M. “Media, Education and AntiAmericanism in the Muslim World.” Journal of Economic Perspectives, 2004, 18(3), pp. 117–33. Glaeser, Edward L. “The Political Economy of Hatred.” Quarterly Journal of Economics, 2005, 120(1), pp. 45–86. Goettler, Ronald L. and Shachar, Ron. “Spatial Competition in the Network Television Industry.” RAND Journal of Economics, 2001, 32(4), pp. 624–56. Goldberg, Bernard. Bias: A CBS insider exposes how the media distort the news. Washington, DC: Regency Publishing, Inc., 2002. Graber, Doris. Processing the news: How people tame the information tide. New York: Longman Press, 1984. Hamilton, James T. All the news that’s fit to sell: How the market transforms information into news. Princeton: Princeton University Press, 2003. Hayakawa, Samuel I. Language in thought and action, 5th ed. New York: Harcourt, Brace & Company, 1990. First published in 1940 by Harcourt, Brace & Company. Jensen, Michael. “Toward a Theory of the Press,” in Karl Brunner, ed., Economics and social institutions: Insights from the conferences on analysis and ideology. Boston: Martinus Nijhoff Publishing, 1979. Klayman, Josh. “Varieties of Confirmation Bias,” in Jerome Busemeyer, Reud Hastie and Douglas Medin, eds., Decision making from a cognitive perspective. The psychology of learning and motivation. Vol. 32. San Diego: Academic Press, 1995, pp. 365–418. Kull, Steven, et al. “Misperceptions, the Media and the Iraq War,” Program on International Policy Attitudes, http://www.pipa.org, 2003. Lippmann, Walter. Public opinion. 1922. Reprint, New York: Free Press, 1965. Lord, Charles G.; Ross, Lee and Lepper, Mark R. “Biased Assimilation and Attitude Polarization: The Effect of Theories on Subsequently Considered Evidence.” Journal of Personality and Social Psychology, 1979, 37(11), pp. 2098–109. Mencken, Henry L. A gang of pecksniffs. 1920. Reprint, New Rochelle, NY: Arlington House Publishers, 1975. Mullainathan, Sendhil. 
“Thinking through Categories.” Unpublished Paper, 2002. Murphy, Kevin M. and Shleifer, Andrei. “Persuasion in Politics.” American Economic Review, 2004 (Papers and Proceedings), 94(2), pp. 435–39. Posner, Richard A. An affair of state: The investigation, impeachment, and trial of President Clinton. Cambridge, MA: Harvard University Press, 1999. Rabin, Matthew and Schrag, Joel L. “First Impressions Matter: A Model of Confirmatory Bias.” Quarterly Journal of Economics, 1999, 114(1), pp. 37–82. Severin, Werner J. and Tankard, James W., Jr. Communication theories: Origins, methods and uses in the mass media, 3rd ed. New York: Longman Group, Ltd., 1992. Shleifer, Andrei. Inefficient markets: An introduction to behavioral finance. Oxford: Oxford University Press, 2000. Spence, A. Michael and Owen, Bruce. “Television Programming, Monopolistic Competition, and Welfare.” Quarterly Journal of Economics, 1977, 91(1), pp. 103–26. Steiner, Peter. “Program Patterns and Preferences, and the Workability of Competition in Radio Broadcasting.” Quarterly Journal of Economics, 1952, 66(2), pp. 194–223.
Stromberg, David. “Mass Media and Public Policy.” European Economic Review, 2001, 45(4–6), pp. 652–63. Tirole, Jean. The theory of industrial organization. Cambridge, MA: MIT Press, 1988. Zaller, John R. The nature and origins of mass opinion. Cambridge: Cambridge University Press, 1992.
7
The impact of gun laws
A model of crime and self-defense
Hugo M. Mialon and Thomas Wiseman
Introduction
Gun-control advocates argue that guns have a facilitating effect on crime, because they inevitably end up through loss or theft in the hands of criminals. Gun-rights advocates argue that guns have a deterrent effect on crime, because victims may be carrying a concealed weapon. We propose a simple, strategic model of crime and self-defense. We analyze the impact of gun control on gun use by criminals, gun carrying and non-gun prevention or lying low by potential victims, and completed gun and non-gun crime. A marginal increase in gun control from a moderate level increases completed non-gun crime, but reduces completed gun crime, and so may on net be desirable. However, with total gun control potential victims always seek to avoid crime by lying low, suffering a substantial loss of freedom. The model thus provides a rationale both for a right to bear arms and for regulating this right. In contrast to full gun control, we show that a severe punishment for committing a crime with a gun leads to no gun crime and no lying low by victims. The gun-crime penalty eliminates gun crime but preserves the threat of armed response, and so may best ensure safety while preserving freedom.

In the large literature on the economics of crime, stemming from the work of Becker (1968) and Ehrlich (1973), few studies model theoretically the connection between guns and crime.1 There is, though, a growing empirical literature on gun control and crime. Starting with Lott and Mustard (1997), several studies (including Bronars and Lott, 1998, Benson and Mast, 2001, Plassmann and Tideman, 2001, and Lott, 2003) find that allowing concealed handguns has significantly reduced crime across America. Other studies (including Dezhbakhsh and Rubin, 1998, 2003, Black and Nagin, 1998, Ludwig, 1998, Duggan, 2001, and Ayres and Donohue, 2003) do not support that finding. The empirical debate lacks a strong theoretical basis to guide it, as evidenced by the failure to distinguish between guns used for self-defense and guns used in crime, the latter of which is the real problem. Our result that a harsh penalty for gun crime could eliminate such crime without losing the deterrent effect suggests that empirical research might productively focus on the impact of gun-crime sentencing laws.
Model
The population is divided into two groups: potential criminals and potential victims. Victims have an endowment w > 0. Criminals have no endowment, but can take it from victims, which is a crime. Potential criminals choose not to commit crime (¬C), to commit crime without a gun (¬GC), or to carry a gun, at a cost g > 0, and commit crime with it (GC). Potential victims choose not to carry a gun (¬G), to carry a gun (G) for protection, also at a cost g, or to take alternative preventive measures, such as lying low by staying home at night or avoiding certain places or situations (L), at a cost l > 0, which represents the loss of freedom. Agents from each population are paired at random to play the game in Table 7.1. Let α1 and α2 be the probabilities that the criminal chooses GC and ¬GC, respectively, and let β1 and β2 be the probabilities that the victim chooses ¬G and G.

By lying low, potential victims avoid meeting a criminal. If an unarmed potential victim meets an armed criminal, the criminal takes the victim’s endowment and possibly shoots the victim in the process, creating an expected cost, d > 0. If an armed victim meets an armed criminal, then with equal probability the victim either defends his endowment and possibly shoots the criminal, or loses his endowment and is possibly shot. If an unarmed victim meets an unarmed criminal, then again equal chance determines the outcome, but neither party is shot. An armed potential victim meeting an unarmed criminal defends his endowment and possibly shoots the criminal.

We make the following assumptions about payoffs: (1) l > g, (2) 2g > l, (3) w > 2l, and (4) d > w − 2g. Lying low is costlier than carrying a gun, but there is always the possibility of an accident, so the cost of carrying a gun cannot be too low. The endowment is valuable relative to the cost of lying low, and the disutility from being shot is also high. Given Assumptions 1–4, the game has a unique, fully-mixed equilibrium:

$$\alpha_1 = \frac{2(l - g)}{d + w}, \quad \alpha_2 = \frac{2(2g - l)}{w}, \quad \beta_1 = \frac{4gd}{w(3d + w)}, \quad \beta_2 = \frac{2gw}{w(3d + w)}. \tag{1}$$

The assumptions ensure that these are well-defined probabilities.

Table 7.1 Normal form of the game of crime and self-defense

          ¬G              G                             L
¬C        0, w            0, w − g                      0, w − l
¬GC       ½(w), ½(w)      −d, w − g                     0, w − l
GC        w − g, −d       ½(w − d) − g, ½(w − d) − g    −g, w − l

Rows are the criminal’s strategies and columns are the victim’s; each cell lists the criminal’s payoff first.
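The equilibrium in (1) can be verified by checking that every pure strategy earns the same expected payoff against the opponent's mix. The Python sketch below does this with exact rational arithmetic. The string labels for strategies, the function names, and the parameter values are illustrative assumptions; the values are chosen only so that Assumptions 1–4 hold.

```python
from fractions import Fraction as F

def payoff_table(w, g, l, d):
    """Table 7.1 as {(criminal, victim): (criminal payoff, victim payoff)}."""
    half = F(1, 2)
    return {
        ("nC", "nG"): (0, w), ("nC", "G"): (0, w - g), ("nC", "L"): (0, w - l),
        ("nGC", "nG"): (half * w, half * w), ("nGC", "G"): (-d, w - g),
        ("nGC", "L"): (0, w - l),
        ("GC", "nG"): (w - g, -d),
        ("GC", "G"): (half * (w - d) - g, half * (w - d) - g),
        ("GC", "L"): (-g, w - l),
    }

def equilibrium(w, g, l, d):
    """Mixed strategies from equation (1)."""
    a1 = 2 * (l - g) / (d + w)           # gun crime
    a2 = 2 * (2 * g - l) / w             # non-gun crime
    b1 = 4 * g * d / (w * (3 * d + w))   # victim unarmed
    b2 = 2 * g * w / (w * (3 * d + w))   # victim armed
    criminal = {"GC": a1, "nGC": a2, "nC": 1 - a1 - a2}
    victim = {"nG": b1, "G": b2, "L": 1 - b1 - b2}
    return criminal, victim

def expected_payoffs(table, criminal, victim):
    """Expected payoff of each pure strategy against the opponent's mix."""
    crim_u = {c: sum(victim[v] * table[c, v][0] for v in victim) for c in criminal}
    vict_u = {v: sum(criminal[c] * table[c, v][1] for c in criminal) for v in victim}
    return crim_u, vict_u

# Illustrative values (assumed): l > g, 2g > l, w > 2l, d > w - 2g all hold.
w, g, l, d = F(10), F(3), F(4), F(6)
criminal, victim = equilibrium(w, g, l, d)
crim_u, vict_u = expected_payoffs(payoff_table(w, g, l, d), criminal, victim)
```

In this sketch every criminal strategy earns 0 and every victim strategy earns w − l, so neither side can gain by shifting probability between strategies.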
The impact of marginal gun control
Gun control increases the cost of carrying a gun, g, by increasing the risk of legal punishment. It cannot increase g only for potential criminals because victims can lose their guns or have them stolen, and criminals can then obtain them underground. We are interested in the effect of gun control on the following outcomes: gun carrying by potential victims, β2, lying low by potential victims, 1 − β1 − β2, attempted non-gun crime, α2, attempted gun crime, α1, completed non-gun crime, α2(½β1), and completed gun crime, α1(β1 + ½β2), which is possibly the most important element. Let us look at the effect of a marginal increase in g on equilibrium strategies:

$$\frac{\partial \alpha_1}{\partial g} = \frac{-2}{d + w} < 0, \quad \frac{\partial \alpha_2}{\partial g} = \frac{4}{w} > 0, \quad \frac{\partial \beta_1}{\partial g} = \frac{4d}{w(3d + w)} > 0, \quad \frac{\partial \beta_2}{\partial g} = \frac{2w}{w(3d + w)} > 0. \tag{2}$$
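Marginal gun control reduces the incentives of potential criminals to commit gun crime and increases their incentives to commit non-gun crime. Thus potential victims reduce their lying low and increase their gun carrying, by the exact magnitudes that restore criminals’ indifference. Raising g reduces the incentives of victims to carry a gun for protection and increases their incentives to lie low. Thus potential criminals commit less gun crime and more non-gun crime, so as to make victims indifferent again. The effects of marginal gun control on completed non-gun and gun crime are:

$$\frac{\partial\left[\alpha_2\left(\tfrac{1}{2}\beta_1\right)\right]}{\partial g} = \alpha_2\left[\frac{1}{2}\frac{\partial \beta_1}{\partial g}\right] + \frac{\partial \alpha_2}{\partial g}\left[\frac{1}{2}\beta_1\right] = \frac{4d(4g - l)}{w^2(3d + w)} > 0 \tag{3}$$

$$\frac{\partial\left[\alpha_1\left(\beta_1 + \tfrac{1}{2}\beta_2\right)\right]}{\partial g} = \alpha_1\left[\frac{\partial \beta_1}{\partial g} + \frac{1}{2}\frac{\partial \beta_2}{\partial g}\right] + \frac{\partial \alpha_1}{\partial g}\left[\beta_1 + \frac{1}{2}\beta_2\right] = \frac{2(4d + w)}{w(d + w)(3d + w)}\,[l - 2g] < 0. \tag{4}$$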
Gun control increases attempted non-gun crime and reduces lying low by victims, so it increases completed non-gun crime. Since it reduces attempted gun crime and increases gun carrying by victims, it reduces completed gun crime. Comparing (3) and (4) reveals that the effect on completed non-gun crime is larger, so total completed crime rises. The expected utilities of criminals and victims are unchanged. However, since gun crime may generate more negative externalities (e.g., on family members of shooting victims), and because potential victims who do not lie low have more freedom, which may create positive externalities for society, marginal gun control may be beneficial.
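The closed forms in (3) and (4) can be checked against the equilibrium strategies directly. In the Python sketch below, completed crime is computed from (1) and differentiated by a central difference; because both completed-crime expressions are quadratic in g, the central difference is exact under rational arithmetic. Function names and parameter values are illustrative assumptions satisfying Assumptions 1–4.

```python
from fractions import Fraction as F

def strategies(g, w, l, d):
    """Equilibrium probabilities from equation (1)."""
    a1 = 2 * (l - g) / (d + w)
    a2 = 2 * (2 * g - l) / w
    b1 = 4 * g * d / (w * (3 * d + w))
    b2 = 2 * g * w / (w * (3 * d + w))
    return a1, a2, b1, b2

def completed_crime(g, w, l, d):
    """Return (completed non-gun crime, completed gun crime)."""
    a1, a2, b1, b2 = strategies(g, w, l, d)
    return a2 * b1 / 2, a1 * (b1 + b2 / 2)

def central_difference(f, x, h):
    """Symmetric difference quotient; exact for quadratic f."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Illustrative values (assumed): l > g, 2g > l, w > 2l, d > w - 2g all hold.
w, g, l, d = F(10), F(3), F(4), F(6)
h = F(1, 100)

slope_nongun = central_difference(lambda x: completed_crime(x, w, l, d)[0], g, h)
slope_gun = central_difference(lambda x: completed_crime(x, w, l, d)[1], g, h)

formula_nongun = 4 * d * (4 * g - l) / (w**2 * (3 * d + w))              # (3)
formula_gun = 2 * (4 * d + w) * (l - 2 * g) / (w * (d + w) * (3 * d + w))  # (4)

print("non-gun:", slope_nongun, formula_nongun)
print("gun:    ", slope_gun, formula_gun)
```

At these assumed values the numerical slopes coincide with the formulas, with completed non-gun crime rising and completed gun crime falling in g.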
The impact of full gun control
To examine the effect of full gun control, implemented through a large g, we relax Assumption 1 (that g < l). If g > l, then G is strictly dominated for potential victims. Table 7.2 shows the resulting game. Now no crime is weakly dominated for the potential criminal. There is a continuum of equilibria, in all of which victims lie low with probability one, and criminals randomize between no crime and non-gun crime, with probability 2l/w or more on the latter. Expected utilities are unchanged. There is no completed crime, but the threat of crime leads potential victims to always lie low and suffer the loss of freedom. The result suggests that some right to bear arms is fundamental to the freedom of potential victims.
The impact of severe gun-crime punishment
So far, government only plays a role through gun control. In reality, it can also set the punishment for committing crimes. In particular, it could punish gun crime more severely than non-gun crime. We now introduce a cost s > 0 to the criminal’s payoff for the strategy pairs (GC, ¬G) and (GC, G), representing the expected additional punishment for committing a gun crime. (For simplicity, we set the sentence for non-gun crime to zero.) If s is high enough, then GC is dominated for the criminal, so L is conditionally dominated for victims. The resulting game, shown in Table 7.3, has a unique equilibrium: criminals commit non-gun crime with probability 2g/w (and no crime with probability 1 − 2g/w), and victims carry guns with probability ½w/(d + ½w) (and go unarmed with probability d/(d + ½w)). Criminals’ utility is unchanged, but victims’ utility increases. A harsh punishment for gun crime keeps criminals from using guns, but preserves the threat of an armed response by victims, so that potential victims never lie low. In contrast, under the no-gun regime they always lie low. The gun-crime penalty is focused on the real problem, which is not guns, but using guns to commit crime.2

Table 7.2 The game of crime and self-defense with full gun control

          ¬G              L
¬C        0, w            0, w − l
¬GC       ½(w), ½(w)      0, w − l
GC        w − g, −d       −g, w − l

Table 7.3 The game of crime and self-defense with a severe gun-crime penalty

          ¬G              G
¬C        0, w            0, w − g
¬GC       ½(w), ½(w)      −d, w − g
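The mixed equilibrium of the reduced game in Table 7.3 follows from two indifference conditions, which the Python sketch below solves and verifies with exact rational arithmetic. The function name and parameter values are illustrative assumptions; the payoffs are those of Table 7.3.

```python
from fractions import Fraction as F

def reduced_game_equilibrium(w, g, d):
    """Solve the 2x2 game of Table 7.3 from the two indifference conditions."""
    # Victim: unarmed yields w - p*w/2, armed yields w - g, so p*w/2 = g.
    p_nongun_crime = 2 * g / w
    # Criminal: no crime yields 0, non-gun crime yields (1 - q)*w/2 - q*d.
    q_carry_gun = (w / 2) / (d + w / 2)
    return p_nongun_crime, q_carry_gun

# Illustrative values (assumed), satisfying Assumptions 1-4 with l = 4.
w, g, l, d = F(10), F(3), F(4), F(6)
p, q = reduced_game_equilibrium(w, g, d)

victim_unarmed = (1 - p) * w + p * w / 2   # expected payoff of not carrying
victim_armed = w - g                       # payoff of carrying
criminal_nongun = (1 - q) * w / 2 - q * d  # expected payoff of non-gun crime

print("P(non-gun crime) =", p, " P(carry gun) =", q)
print("victim payoff =", victim_armed, " vs. lying-low payoff =", w - l)
```

At these assumed values the victim's equilibrium payoff w − g exceeds the payoff w − l obtained in the original game, while the criminal's payoff stays at zero.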
Conclusion
Our model of crime and self-defense distinguishes between gun crime and non-gun crime, between lying low and carrying guns for self-defense, and between gun crime and gun carrying in self-defense. Our results are that (1) marginal gun control increases non-gun crime but reduces gun crime, (2) full gun control eliminates gun crime but also reduces the freedom of potential victims, and (3) severe gun-crime punishment eliminates gun crime and preserves freedom. The second result suggests that the Second Amendment’s right to bear arms cannot be revoked without severely threatening individual freedom, while the first result shows that some gun control may also benefit society, by reducing gun crime. The third result suggests that gun-crime punishment could curb gun crime while preserving the gun rights of potential victims. A severe penalty for committing gun crime may be the best way to guarantee both security and freedom.
Acknowledgements
We thank Preston McAfee, Paul Rubin, and an anonymous referee for helpful comments and suggestions.
Notes
1 One notable exception is Donohue and Levitt (1998), who model the impact of guns on criminals fighting against other criminals for the possession of an external prize. In contrast, our model is about criminals attacking victims for their endowments. In this context, guns also play a role in self-defense.
2 Helsley and O’Sullivan (2001) consider a refundable deposit policy that would encourage law-abiding gun owners to protect their guns from being stolen by criminals. In terms of our model, that policy would correspond to increasing g only for potential criminals, and so would have a similar effect to independently increasing the punishment for committing crime with a gun.
References
Ayres, I. and J.J. Donohue, 2003, Shooting Down the More Guns, Less Crime Hypothesis, Stanford Law Review 55, 1193–1312.
Becker, G.S., 1968, Crime and Punishment: An Economic Approach, Journal of Political Economy 76, 169–217.
Benson, B.L. and B.M. Mast, 2001, Privately Produced General Deterrence, Journal of Law and Economics 44, 725–746.
Black, D.A. and D.S. Nagin, 1998, Do Right-to-Carry Laws Deter Violent Crime?, Journal of Legal Studies 27, 209–219.
Bronars, S.G. and J.R. Lott Jr., 1999, Criminal Deterrence, Geographic Spillovers, and Right-to-Carry Concealed Handguns, American Economic Review Papers and Proceedings 88, 475–479.
Dezhbakhsh, H. and P.H. Rubin, 1998, Lives Saved or Lives Lost: The Effect of Concealed Handgun Laws on Crime, American Economic Review Papers and Proceedings 88, 468–474.
Dezhbakhsh, H. and P.H. Rubin, 2003, The Effect of Concealed Handgun Laws on Crime: Beyond the Dummy Variables, International Review of Law and Economics 23, 199–216.
Donohue III, J.J. and S.D. Levitt, 1998, Guns, Violence, and the Efficiency of Illegal Markets, American Economic Review Papers and Proceedings 88, 463–467.
Duggan, M., 2001, More Guns, More Crime, Journal of Political Economy 109, 1086–1114.
Ehrlich, I., 1973, Participation in Illegitimate Activities: A Theoretical and Empirical Investigation, Journal of Political Economy 81(3), 521–565.
Helsley, R.W. and A. O’Sullivan, 2001, Stolen Gun Control, Journal of Urban Economics 50, 436–447.
Lott, J.R. Jr., 2003, The Bias Against Guns: Why Almost Everything You’ve Heard About Gun Control Is Wrong (Washington, DC, Regnery Publishing).
Lott, J.R. Jr. and D.B. Mustard, 1997, Crime, Deterrence, and Right-to-Carry Concealed Handguns, Journal of Legal Studies 26, 1–68.
Ludwig, J., 1998, Concealed Gun-Carrying Laws and Violent Crime: Evidence from State Panel Data, International Review of Law and Economics 18, 239–254.
Plassmann, F. and T.N. Tideman, 2001, Does the Right to Carry Concealed Handguns Deter Countable Crimes? Only a Count Analysis Can Say, Journal of Law and Economics 44, 771–798.
8
Crime, deterrence, and right-to-carry concealed handguns
John R. Lott, Jr., and David B. Mustard*
I. Introduction
Will allowing concealed handguns make it likely that otherwise law-abiding citizens will harm each other? Or will the threat of citizens carrying weapons primarily deter criminals? To some, the logic is fairly straightforward. Philip Cook argues that “[i]f you introduce a gun into a violent encounter, it increases the chance that someone will die.”1 A large number of murders may arise from unintentional fits of rage that are quickly regretted, and simply keeping guns out of people’s reach would prevent deaths.2 Using the National Crime Victimization Survey, Cook further states that each year there are “only” 80,000–82,000 defensive uses of guns during assaults, robberies, and household burglaries.3 By contrast, other surveys imply that private firearms may be used in self-defense up to two and a half million times each year, with 400,000 of these defenders believing that using the gun “almost certainly” saved a life.4 With total firearm deaths from homicides and accidents equaling 19,187 in 1991,5 the Kleck and Gertz numbers, even if wrong by a very large factor, suggest that defensive gun use on net saved lives. While cases like the 1992 incident where a Japanese student was shot on his way to a Halloween party in Louisiana make international headlines,6 they are rare.
In another highly publicized case, a Dallas resident recently became the only Texas resident so far charged with using a permitted concealed weapon in a fatal shooting.7 Yet, in neither case was the shooting found to be unlawful.8 The rarity of these incidents is reflected in Florida statistics: 221,443 licenses were issued between October 1, 1987, and April 30, 1994, but only 18 crimes involving firearms were committed by those with licenses.9 While a statewide breakdown on the nature of those crimes is not available, Dade County records indicate that four crimes involving a permitted handgun took place there between September 1987 and August 1992, and none of those cases resulted in injury.10 The potential defensive nature of guns is indicated by the different rates of so-called hot burglaries, where residents are at home when the criminals strike.11 Almost half the burglaries in Canada and Britain, which have tough gun control laws, are “hot burglaries.” By contrast, the United States, with
laxer restrictions, has a “hot burglary” rate of only 13 percent. Consistent with this, surveys of convicted felons in America reveal that they are much more worried about armed victims than they are about running into the police. This fear of potentially armed victims causes American burglars to spend more time than their foreign counterparts “casing” a house to ensure that nobody is home. Felons frequently comment in these interviews that they avoid late-night burglaries because “that’s the way to get shot.”12

The case for concealed handgun use is similar. The use of concealed handguns by some law-abiding citizens may create a positive externality for others. Because these guns are concealed, criminals are unable to tell whether the victim is armed before they strike, which raises criminals’ expected costs for committing many types of crimes. Stories of individuals using guns to defend themselves have helped motivate 31 states to adopt laws requiring authorities to issue, without discretion, concealed-weapons permits to qualified applicants.13 This constitutes a dramatic increase from the nine states that allowed concealed weapons in 1986.14

While many studies examine the effects of gun control,15 and a smaller number of papers specifically address the right-to-carry concealed firearms,16 these papers involve little more than either time-series or cross-sectional evidence comparing mean crime rates, and none controls for variables that normally concern economists (for example, the probability of arrest and conviction and the length of prison sentences or even variables like personal income).17 These papers also fail to recognize that “shall issue” concealed handgun permit laws, which require that permit requests be granted unless the individual has a criminal record or a history of significant mental illness,18 will not alter the number of permits being issued in all counties: where local authorities have been given discretion in granting concealed handgun permits, it is frequently only the most populous counties that are very restrictive.

Other papers suffer from additional weaknesses. The paper by McDowall et al.,19 which evaluates right-to-carry provisions, was widely cited in the popular press. Yet their study suffers from many major methodological flaws: for instance, without explanation, they pick only three cities in Florida and one city each in Mississippi and Oregon (despite the provisions involving statewide laws), and they use neither the same sample period nor the same method of picking geographical areas for each of those cities.20

Our paper hopes to overcome these problems by using annual cross-sectional time-series county-level crime data for the entire United States from 1977 to 1992 to investigate the effect of “shall issue” right-to-carry concealed handgun laws. It is also the first paper to study the questions of deterrence using these data. While many recent studies employ proxies for deterrence, such as police expenditures or general levels of imprisonment, we are able to use arrest rates by type of crime and, for a subset of our data, also conviction rates and sentence lengths by type of crime.21 We also attempt to analyze a question noted but not empirically addressed in this literature: the concern
over causality between increases in handgun usage and crime rates. Is it higher crime that leads to increased handgun ownership, or the reverse? The issue is more complicated than simply whether carrying concealed firearms reduces murders because there are questions over whether criminals might substitute between different types of crimes as well as the extent to which accidental handgun deaths might increase.
II. Problems testing the effect of “shall issue” concealed handgun provisions on crime
Following Becker (1968), many economists have found evidence broadly consistent with the deterrent effect of punishment.22 The notion is that the expected penalty affects the prospective criminal’s desire to commit a crime. This penalty consists of the probabilities of arrest and conviction and the length of the prison sentence. It is reasonable to disentangle the probability of arrest from the probability of conviction since accused individuals appear to suffer large reputational penalties simply from being arrested.23 Likewise, conviction also imposes many different penalties (for example, lost licenses, lost voting rights, further reductions in earnings, and so on) even if the criminal is never sentenced to prison.24 While this discussion is well understood, the net effect of “shall issue” right-to-carry concealed handguns is ambiguous and remains to be tested when other factors influencing the returns to crime are controlled for.

The first difficulty involves the availability of detailed county-level data on a variety of crimes over 3,054 counties during the period from 1977 to 1992. Unfortunately, for the time period we study, the Federal Bureau of Investigation’s (FBI) Uniform Crime Report includes only arrest rate data rather than conviction rates or prison sentences. While we make use of the arrest rate information, we will also use county-level dummies, which admittedly constitute a rather imperfect way to control for cross-county differences such as differences in expected penalties. Fortunately, however, alternative variables are available to help us proxy for changes in legal regimes that affect the crime rate. One such method is to use another crime category as an exogenous variable that is correlated with the crimes that we are studying but at the same time is unrelated to the changes in right-to-carry firearm laws.
Finally, after telephoning law enforcement officials in all 50 states, we were able to collect time-series county-level conviction rates and mean prison sentence lengths for three states (Arizona, Oregon, and Washington). The FBI crime reports include seven categories of crime: murder, rape, aggravated assault, robbery, auto theft, burglary, and larceny.25 Two additional summary categories were included: violent crimes (including murder, rape, aggravated assault, and robbery) and property crimes (including auto theft, burglary, and larceny). Despite being widely reported measures in the press, these broader categories are somewhat problematic in that all crimes are given the same weight (for example, one murder equals one
aggravated assault). Even the narrower categories are somewhat broad for our purposes. For example, robbery includes not only street robberies, which seem the most likely to be affected by “shall issue” laws, but also bank robberies, where, because of the presence of armed guards, the additional return to having armed citizens would appear to be small.26 Likewise, larceny involves crimes of “stealth,” but these range from pickpockets, where “shall issue” laws could be important, to coin machine theft.27 This aggregation of crime categories makes it difficult to separate out which crimes might be deterred by increased handgun ownership and which crimes might be increased as a result of a substitution effect.

Generally, we expect that the crimes most likely to be deterred by concealed handgun laws are those involving direct contact between the victim and the criminal, especially those occurring in a place where victims otherwise would not be allowed to carry firearms. For example, aggravated assault, murder, robbery, and rape seem most likely to fit both conditions, though obviously some of all these crimes can occur in places like residences where the victims could already possess firearms to protect themselves. By contrast, crimes like auto theft seem unlikely to be deterred by gun ownership. While larceny is more debatable, in general, to the extent that these crimes actually involve “stealth,” the probability that victims will notice the crime being committed seems low, and thus the opportunities to use a gun are relatively rare. The effect on burglary is ambiguous from a theoretical standpoint. It is true that if “shall issue” laws cause more people to own a gun, the chance of a burglar breaking into a house with an armed resident goes up.
However, if some of those who already owned guns now obtain right-to-carry permits, the cost of crimes like armed street robbery and certain other types of robberies (where an armed patron may be present) should rise relative to that for burglary.

Previous concealed handgun studies that rely on state-level data suffer from an important potential problem: they ignore the heterogeneity within states.28 Our telephone conversations with many law enforcement officials have made it very clear that there was a large variation across counties within a state in terms of how freely gun permits were granted to residents prior to the adoption of “shall issue” right-to-carry laws.29 All those we talked to strongly indicated that the most populous counties had previously adopted by far the most restrictive practices on issuing permits. The implication for existing studies is that simply using state-level data rather than county data will bias the results against finding any effect from passing right-to-carry provisions. Those counties that were unaffected by the law must be separated out from those counties where the change could be quite dramatic. Even cross-sectional city data30 will not solve this problem, because without time-series data it is impossible to know what effect a change in the law had for a particular city.

There are two ways of handling this problem. First, for the national sample, we can see whether the passage of “shall issue” right-to-carry laws
produces systematically different effects between the high and low population counties. Second, for three states, Arizona, Oregon, and Pennsylvania, we have acquired time-series data on the number of right-to-carry permits for each county. The normal difficulty with using data on the number of permits involves the question of causality: do more permits make crimes more costly, or do higher crime rates lead to more permits? The change in the number of permits before and after the change in the state laws allows us to rank the counties on the basis of how restrictive they had actually been in issuing permits prior to the change in the law. Of course, there is still the question of why the state concealed handgun law changed, but since we are dealing with county-level rather than state-level data, we benefit from the fact that those counties which had the most restrictive permitting policies were also the most likely to have the new laws exogenously imposed on them by the rest of their state.

Using county-level data also has another important advantage in that both crime and arrest rates vary widely within states. In fact, as Table 8.1 indicates, the standard deviation of both crime and arrest rates across states is almost always smaller than the average within-state standard deviation across counties. With the exception of robbery, the standard deviation across states for crime rates ranges from 61 to 83 percent of the average of the standard deviation within states. (The difference between these two columns with respect to violent crimes arises because robberies make up such a large fraction of the total crimes in this category.) For arrest rates, the numbers are much more dramatic, with the standard deviation across states as small as 15 percent of the average of the standard deviation within states.
These results imply that it is no more accurate to view all the counties in the typical state as a homogenous unit than it is to view all the states in the United States as one homogenous unit. For example, when a state’s arrest rate rises, it may make a big difference whether that increase is taking place in the most or least crime-prone counties. Depending on which types of counties the changes in arrest rates are occurring in and depending on how sensitive the crime rates are to changes in those particular counties, widely differing estimates of how increasing a state’s average arrest rate will deter crime could result. Aggregating these data may thus make it more difficult to discern the true relationship that exists between deterrence and crime.

Perhaps the relatively small across-state variation as compared to within-state variations is not so surprising given that states tend to average out differences as they encompass both rural and urban areas. Yet, when coupled with the preceding discussion on how concealed handgun provisions affected different counties in the same state differently, these numbers strongly imply that it is risky to assume that states are homogenous units with respect to either how crimes are punished or how the laws which affect gun usage are changed. Unfortunately, this reliance on aggregated data is pervasive in the entire crime literature, which focuses on state- or city-level data and fails to recognize the differences between rural and urban counties.
Table 8.1 Comparing the deviation in crime rates between states and by counties within states from 1977 to 1992: does it make sense to view states as relatively homogenous units?

                                  Standard deviation    Mean of within-state
                                  of state means        standard deviations
Crime rates per 100,000 population:
  Violent crimes                  284.77                255.57
  Murder                          6.12                  8.18
  Murder with guns (1982–91)      3.9211                6.4756
  Rape                            16.33                 23.55
  Aggravated assault              143.35                172.66
  Robbery                         153.62                92.74
  Property crime                  1,404.15              2,120.28
  Auto theft                      162.02                219.74
  Burglary                        527.70                760.22
  Larceny                         819.08                1,332.52
Arrest rates defined as the number of arrests divided by the number of offenses:*
  Violent crimes                  23.89                 112.97
  Murder                          18.58                 88.41
  Rape                            19.83                 113.86
  Robbery                         21.97                 104.40
  Aggravated assault              25.30                 78.53
  Property crimes                 7.907                 44.49
  Burglary                        5.87                  25.20
  Larceny                         11.11                 71.73
  Auto theft                      17.37                 118.94
Truncating arrest rates to be no greater than one:
  Violent crimes                  11.11                 25.40
  Murder                          10.78                 36.40
  Rape                            10.60                 31.59
  Robbery                         8.06                  32.67
  Aggravated assault              11.14                 27.08
  Property crimes                 5.115                 11.99
  Burglary                        4.63                  14.17
  Larceny                         5.91                  12.97
  Auto theft                      8.36                  26.66

* Because of multiple arrests for a crime and because of the lags between when a crime occurs and an arrest takes place, the arrest rate for counties and states can be greater than one. This is much more likely to occur for counties than for states.
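The percentages quoted in the text for Table 8.1 can be reproduced directly from its first two panels. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the figures are transcribed from the table as printed. It computes the across-state standard deviation as a percentage of the mean within-state standard deviation for each category.

```python
# Values transcribed from Table 8.1 as
# (standard deviation of state means, mean of within-state standard deviations).
CRIME_RATES = {
    "Violent crimes": (284.77, 255.57),
    "Murder": (6.12, 8.18),
    "Murder with guns (1982-91)": (3.9211, 6.4756),
    "Rape": (16.33, 23.55),
    "Aggravated assault": (143.35, 172.66),
    "Robbery": (153.62, 92.74),
    "Property crime": (1404.15, 2120.28),
    "Auto theft": (162.02, 219.74),
    "Burglary": (527.70, 760.22),
    "Larceny": (819.08, 1332.52),
}
ARREST_RATES = {
    "Violent crimes": (23.89, 112.97),
    "Murder": (18.58, 88.41),
    "Rape": (19.83, 113.86),
    "Robbery": (21.97, 104.40),
    "Aggravated assault": (25.30, 78.53),
    "Property crimes": (7.907, 44.49),
    "Burglary": (5.87, 25.20),
    "Larceny": (11.11, 71.73),
    "Auto theft": (17.37, 118.94),
}

def ratio_percent(table):
    """Across-state SD as a percentage of the mean within-state SD."""
    return {name: 100.0 * across / within for name, (across, within) in table.items()}

crime_ratio = ratio_percent(CRIME_RATES)
arrest_ratio = ratio_percent(ARREST_RATES)

# Crime categories whose across-state SD is below the within-state average.
below_100 = {name: pct for name, pct in crime_ratio.items() if pct < 100.0}

for name, pct in crime_ratio.items():
    print(f"{name:28s} {pct:6.1f}%")
print("smallest arrest-rate ratio (%):", round(min(arrest_ratio.values()), 1))
```

On these figures, only robbery and the violent crime aggregate exceed 100 percent. The remaining crime categories span the range the text reports, and the smallest arrest-rate ratio belongs to auto theft.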
However, using county-level data has some drawbacks. Because crime rates are low in many low-population counties, it is quite common to find huge variations in the arrest and conviction rates between years. In addition, our sample indicates that annual conviction rates for some counties are as high as 13 times the offense rate. This anomaly arises for a couple of reasons. First, the year in which the offense occurs frequently differs from
the year in which the arrests and/or convictions occur. Second, an offense may involve more than one offender. Unfortunately, the FBI data set allows us neither to link the years in which offenses and arrests occurred nor to link offenders with a particular crime. When dealing with counties where only a few murders occur annually, arrests or convictions can be multiples higher than the number of offenses in a year. This data problem appears especially noticeable for murder and rape.

One partial solution is to limit the sample to only counties with large populations. For counties with a large number of crimes, the flow of arrests and convictions relative to offenses is significantly smoother. An alternative solution is to take a moving average of the arrest or conviction rates over several years, though this reduces the length of the usable sample period, depending on how many years are used to compute this average. Furthermore, the moving average solution does nothing to alleviate the effect of multiple suspects being arrested for a single crime.

Another concern is that otherwise law-abiding citizens may have carried concealed handguns even before it was legal to do so. If “shall issue” laws do not alter the total number of concealed handguns carried by otherwise law-abiding citizens but merely legalize their previous actions, passing these laws seems unlikely to affect crime rates. The only real effect from making concealed handguns legal could arise from people being more willing to use handguns to defend themselves, though this might also imply that they will be more likely to make mistakes using these handguns.

It is also possible that concealed firearm laws both make individuals safer and increase crime rates at the same time.
As Peltzman has pointed out in the context of automobile safety regulations, increasing safety can result in drivers offsetting these gains by taking more risks in how they drive.31 The same thing is possible with regard to crime. For example, allowing citizens to carry concealed firearms may encourage people to risk entering more dangerous neighborhoods or to begin traveling during times they previously avoided. Thus, since the decision to engage in these riskier activities is a voluntary one, it is possible that society still could be better off even if crime rates were to rise as a result of concealed handgun laws. Finally, there are also the issues of why certain states adopted concealed handgun laws and whether higher offense rates result in lower arrest rates. To the extent that states adopted the law because crime was rising, ordinary least squares (OLS) estimates would underpredict the drop in crime. Likewise, if the rules were adopted when crime rates were falling, the bias would be in the opposite direction. None of the previous studies deal with this last type of potential bias. At least since Ehrlich,32 economists have also realized that potential biases exist from having the offense rate as both the endogenous variable and the denominator in determining the arrest rate and because increasing crime rates may lower the arrest rate if the same resources are being asked to do more work. Fortunately, both these sets of potential biases can be dealt with using two-stage least squares (2SLS).
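The direction of the OLS bias described above, and the way two-stage least squares can remove it, can be illustrated with simulated data. The sketch below is not the authors' specification: the variable names, the single instrument z, the coefficients, and the true effect of −1 are all hypothetical. It assumes a policy variable x that is adopted more readily where an unobserved crime shock u is high, and an instrument z that shifts x but is unrelated to u.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def two_stage_least_squares(z, x, y):
    """First stage: regress x on z. Second stage: regress y on fitted x."""
    a1, b1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    return ols(x_hat, y)

random.seed(0)
n = 20000
TRUE_EFFECT = -1.0  # hypothetical effect of the policy on crime

z = [random.gauss(0, 1) for _ in range(n)]   # instrument (hypothetical)
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved crime shock
# Policy adoption responds to the instrument and to the crime shock.
x = [0.8 * zi + 0.6 * ui + random.gauss(0, 0.5) for zi, ui in zip(z, u)]
y = [TRUE_EFFECT * xi + ui for xi, ui in zip(x, u)]

_, b_ols = ols(x, y)
_, b_2sls = two_stage_least_squares(z, x, y)
print("OLS slope: ", round(b_ols, 3))
print("2SLS slope:", round(b_2sls, 3))
```

Because adoption is positively correlated with the crime shock in this setup, the OLS slope is pulled toward zero and understates the drop in crime, while the 2SLS slope stays close to the assumed true effect. Real applications depend on finding instruments that are plausibly unrelated to the unobserved shock.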
III. The data
Between 1977 and 1992, 10 states (Florida (1987), Georgia (1989), Idaho (1990), Maine (1985),33 Mississippi (1990), Montana (1991), Oregon (1990), Pennsylvania (1989), Virginia (1988),34 and West Virginia (1989)) adopted “shall issue” right-to-carry firearm laws. However, Pennsylvania is a special case because Philadelphia was exempted from the state law during our sample period. Eight other states (Alabama, Connecticut, Indiana, New Hampshire, North Dakota, South Dakota, Vermont, and Washington) effectively had these laws on the books prior to the period being studied.35

Since the data are at the county level, a dummy variable is set equal to one for each county operating under “shall issue” right-to-carry laws. A Nexis search was conducted to determine the exact date on which these laws took effect. For the states that adopted the law during the year, the dummy variable for that year is scaled to equal that portion of the year for which the law was in effect. Because of delays in implementing the laws even after they go into effect, we also used a dummy variable that equals one starting during the first full year that the law is in effect. The following tables report this second measure, though both measures produced similar results.

While the number of arrests and offenses for each type of crime in every county from 1977 to 1992 were provided by the Uniform Crime Report, we also contacted the state departments of corrections, state attorneys general, state secretaries of state, and state police offices in every state to try to compile data on conviction rates, sentence lengths, and right-to-carry concealed weapons permits by county. The Bureau of Justice Statistics also released a list of contacts in every state that might have available state-level criminal justice data.
Unfortunately, county data on the total number of outstanding right-to-carry pistol permits were available for only Arizona, California, Florida, Oregon, Pennsylvania, and Washington, though time-series county data before and after a change in the permitting law were available only for Arizona (1994–96), Oregon (1990–92) and Pennsylvania (1986–92). Since the Oregon “shall issue” law passed in 1990, we attempted to get data on the number of permits in 1989 by calling up every county sheriff in Oregon, with 25 of the 36 counties providing us with this information. (The remaining counties claimed that records had not been kept.)36 For Oregon, data on the county-level conviction rate and prison sentence length were also available from 1977 to 1992. One difficulty with the sentence length data is that Oregon passed a sentencing reform act that went into effect in November 1989 causing criminals to serve 85 percent of their sentence, and thus judges may have correspondingly altered their rulings. Even then, this change was phased in over time because the law applied only to crimes that took place after it went into effect in 1989. In addition, the Oregon system did not keep complete records prior to 1987, and the completeness of these records decreased the further into the past one went. One solution to both of these problems is to
interact the prison sentence length with year dummy variables. A similar problem exists for Arizona, which adopted a truth-in-sentencing reform during the fall of 1994. Finally, Arizona is different from Oregon and Pennsylvania in that it already allowed handguns to be carried openly before passing its concealed handgun law, thus one might expect to find a somewhat smaller response to adopting a concealed handgun law. In addition to using county dummy variables, other data were collected from the Bureau of the Census to try controlling for other demographic characteristics that might determine the crime rate. These data included information on the population density per square mile, total county population, and detailed information on the racial and age breakdown of the county (percentage of population by each racial group and by sex between 10 and 19 years of age, between 20 and 29, between 30 and 39, between 40 and 49, between 50 and 64, and 65 and over).37 While a large literature discusses the likelihood of younger males engaging in crime,38 controlling for these other categories allows us to also attempt to measure the size of the groups considered most vulnerable (for example, females in the case of rape).39 Recent evidence by Glaeser and Sacerdote confirms the higher crime rates experienced in cities and examines to what extent this arises due to social and family influences as well as the changing pecuniary benefits from crime,40 though this is the first paper to explicitly control for population density. The Data Appendix provides a more complete discussion of the data. An additional set of income data was also used. 
These included real per capita personal income, real per capita unemployment insurance payments, real per capita income maintenance payments, and real per capita retirement payments per person over 65 years of age.41 Including unemployment insurance and income maintenance payments from the Commerce Department’s Regional Economic Information System data set was an attempt to provide annual county-level measures of unemployment and the distribution of income. Finally, we recognize that other legal changes in how guns are used and when they can be obtained can alter the levels of crime. For example, penalties involving improper gun use might also have been changing simultaneously with changes in the permitting requirements for concealed handguns. In order to see whether this might confound our ability to infer what was responsible for any observed changes in crime rates we read through various editions of the Bureau of Alcohol, Tobacco, and Firearms’ State Laws and Published Ordinances—Firearms (1976, 1986, 1989, and 1994). Excluding the laws regarding machine guns and sawed-off shotguns, there is no evidence that the laws involving the use of guns changed significantly when concealed permit rules were changed.42 Another survey which addresses the somewhat broader question of sentencing enhancement laws for felonies committed with deadly weapons (firearms, explosives, and knives) from 1970 to 1992 also confirms this general finding, with all but four of the legal changes clustered from 1970 to 1981.43 Yet, controlling for the dates
Table 8.2 National sample means and standard deviations

| Variable | N | Mean | S.D. |
|---|---|---|---|
| Gun ownership information: | | | |
| Shall issue dummy | 50,056 | .164704 | .368089 |
| Arrest rates (ratio of arrests to offenses) for a particular crime category: | | | |
| Index crimes | 45,108 | 27.43394 | 126.7298 |
| Violent crimes | 43,479 | 71.30733 | 327.2456 |
| Property crimes | 45,978 | 24.02564 | 120.8654 |
| Murder | 26,472 | 98.04648 | 109.7777 |
| Rape | 33,887 | 57.8318 | 132.8028 |
| Aggravated assault | 43,472 | 71.36647 | 187.354 |
| Robbery | 34,966 | 61.62276 | 189.5007 |
| Burglary | 45,801 | 21.51446 | 47.28603 |
| Larceny | 45,776 | 25.57141 | 263.706 |
| Auto theft | 43,616 | 44.8199 | 307.5356 |
| Crime rates are defined per 100,000 people: | | | |
| Index crimes | 46,999 | 2,984.99 | 3,368.85 |
| Violent crimes | 47,001 | 249.0774 | 388.7211 |
| Property crimes | 46,999 | 2,736.59 | 3,178.41 |
| Murder | 47,001 | 5.651217 | 10.63025 |
| Murder with guns (1982–91 in counties over 100,000) | 12,759 | 3.9211 | 6.4756 |
| Rape | 47,001 | 18.7845 | 32.39292 |
| Robbery | 47,001 | 44.6861 | 149.2124 |
| Aggravated assault | 47,001 | 180.0518 | 243.2615 |
| Burglary | 47,001 | 811.8642 | 1,190.23 |
| Larceny | 47,000 | 1,764.37 | 2,036.03 |
| Auto theft | 47,000 | 160.4165 | 284.5969 |
| Causes of accidental deaths and murders per 100,000 people: | | | |
| Rate of accidental deaths from guns | 23,278 | .151278 | 1.216175 |
| Rate of accidental deaths from sources other than guns | 23,278 | 1.165152 | 4.342401 |
| Rate of total accidental deaths | 23,278 | 51.95058 | 32.13482 |
| Rate of murders using handgun | 23,278 | .444301 | 1.930975 |
| Rate of murders using other guns | 23,278 | 3.477088 | 6.115275 |
| Real per capita income data (in real 1983 dollars): | | | |
| Personal income | 50,011 | 10,554.21 | 2,498.07 |
| Unemployment insurance | 50,011 | 67.57505 | 53.10043 |
| Income maintenance | 50,011 | 157.2265 | 97.61466 |
| Retirement payments per person over 65 | 49,998 | 12,328.5 | 4,397.49 |
| Population characteristics: | | | |
| County population | 50,023 | 75,772.78 | 250,350.4 |
| County population per square mile | 50,023 | 214.3291 | 1,421.25 |
| State population | 50,056 | 6,199,949 | 5,342,068 |
| State NRA membership per 100,000 state population | 50,056 | 1,098.11 | 516.0701 |
| % of votes Republican in presidential election | 50,056 | 52.89235 | 8.410228 |
| Race and age data (% of population): | | | |
| Black male 10–19 | 50,023 | .920866 | 1.556054 |
| Black female 10–19 | 50,023 | .892649 | 1.545335 |
| White male 10–19 | 50,023 | 7.262491 | 1.747557 |
| White female 10–19 | 50,023 | 6.820146 | 1.673272 |
| Other male 10–19 | 50,023 | .228785 | .769633 |
| Other female 10–19 | 50,023 | .218348 | .742927 |
| Black male 20–29 | 50,023 | .751636 | 1.214317 |
| Black female 20–29 | 50,023 | .762416 | 1.2783 |
| White male 20–29 | 50,023 | 6.792357 | 1.991303 |
| White female 20–29 | 50,023 | 6.577894 | 1.796134 |
| Other male 20–29 | 50,023 | .185308 | .557494 |
| Other female 20–29 | 50,023 | .186327 | .559599 |
| Black male 30–39 | 50,023 | .539637 | .879286 |
| Black female 30–39 | 50,023 | .584164 | .986009 |
| White male 30–39 | 50,023 | 6.397395 | 1.460204 |
| White female 30–39 | 50,023 | 6.318641 | 1.422831 |
| Other male 30–39 | 50,023 | .151869 | .456388 |
| Other female 30–39 | 50,023 | .167945 | .454721 |
| Black male 40–49 | 50,023 | .358191 | .571475 |
| Black female 40–49 | 50,023 | .415372 | .690749 |
| White male 40–49 | 50,023 | 4.932917 | 1.086635 |
| White female 40–49 | 50,023 | 4.947299 | 1.038738 |
| Other male 40–49 | 50,023 | .105475 | .302059 |
| Other female 40–49 | 50,023 | .115959 | .304423 |
| Black male 50–64 | 50,023 | .43193 | .708241 |
| Black female 50–64 | 50,023 | .54293 | .921819 |
| White male 50–64 | 50,023 | 6.459038 | 1.410181 |
| White female 50–64 | 50,023 | 6.911502 | 1.54784 |
| Other male 50–64 | 50,023 | .101593 | .367467 |
| Other female 50–64 | 50,023 | .11485 | .374837 |
| Black male over 65 | 50,023 | .384049 | .671189 |
| Black female over 65 | 50,023 | .552889 | .980266 |
| White male over 65 | 50,023 | 5.443062 | 2.082804 |
| White female over 65 | 50,023 | 7.490128 | 2.69476 |
| Other male over 65 | 50,023 | .065265 | .286597 |
| Other female over 65 | 50,023 | .077395 | .264319 |

supplied by Marvell and Moody still allows us to examine the deterrence effect of criminal penalties specifically targeted at the use of deadly weapons during this earlier period.44 States also differ in terms of their required waiting periods for handgun purchases. Again using the Bureau of Alcohol, Tobacco, and Firearms’ State Laws and Published Ordinances – Firearms, we identified states with waiting periods and did a Lexis search on those ordinances to determine exactly when
those laws went into effect. Thirteen of the 19 states with waiting periods had them prior to the beginning of our sample period.45
IV. The empirical evidence
Using county data for the United States
The first group of regressions reported in Table 8.3 attempts to explain the natural log of the crime rate for nine different categories of crime. The regressions are run using weighted ordinary least squares. While we are primarily interested in a dummy variable to represent whether a state has a “shall issue” law, we also control for each type of crime’s arrest rate, demographic differences, and dummies for the fixed effects for years and counties. The results imply that “shall issue” laws coincide with fewer murders, rapes, aggravated assaults, and robberies.46 On the other hand, auto theft and larceny rates rise. Both changes are consistent with our discussion on the direct and substitution effects produced by concealed weapons.47 Rerunning these specifications with only the “shall issue” dummy, the “shall issue” dummy and the arrest rates, or simply the “shall issue” dummy and the fixed year effects produces even more significant effects for the “shall issue” dummy.48 The results are large empirically. When state concealed handgun laws went into effect in a county, murders fell by 7.65 percent, and rapes and aggravated assaults fell by 5 and 7 percent, respectively.49 In 1992, there were 18,469 murders, 79,272 rapes, 538,368 robberies, and 861,103 aggravated assaults in counties without “shall issue” laws. The coefficients imply that if these counties had been subject to state concealed handgun laws, murders in the United States would have declined by 1,414. Given the concern that has been raised about increased accidental deaths from concealed weapons, it is interesting to note that, for the most recent year for which such a breakdown is available, the entire number of accidental handgun deaths in the United States in 1988 was 200. Of this total, 22 accidental deaths were in states with concealed handgun laws and 178 were in those without these laws. 
The reduction in murders is as much as eight times greater than the total number of accidental deaths in concealed handgun states. Thus, if our results are accurate, the net effect of allowing concealed handguns is clearly to save lives. Similarly, the results indicate that the number of rapes in states without “shall issue” laws would have declined by 4,177, aggravated assaults by 60,363, and robberies by 11,898.50 On the other hand, property crime rates definitely increased after “shall issue” laws were implemented. The results are equally dramatic. If states without concealed handgun laws had passed such laws, there would have been 247,165 more property crimes in 1992 (a 2.7 percent increase). Thus, criminals respond substantially to the threat of being shot by instead substituting into less risky crimes.51
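The counterfactual counts quoted above follow from multiplying each “shall issue” coefficient by the 1992 number of offenses in counties without such laws. The Python sketch below reproduces that arithmetic as an illustration only. It assumes the Table 8.3 coefficients (−.0765, −.0527, −.0701, and −.0221) can be read directly as proportional changes in the crime rate, and the function and variable names are ours rather than the authors'.

```python
# "Shall issue" dummy coefficients from the ln(crime rate) regressions (Table 8.3).
COEFFICIENTS = {
    "murder": -0.0765,
    "rape": -0.0527,
    "aggravated assault": -0.0701,
    "robbery": -0.0221,
}

# 1992 offenses in counties without "shall issue" laws, as quoted in the text.
OFFENSES_1992 = {
    "murder": 18_469,
    "rape": 79_272,
    "aggravated assault": 861_103,
    "robbery": 538_368,
}


def implied_change(coefficient, offenses):
    """Approximate change in offenses, treating the coefficient as a proportional change."""
    return coefficient * offenses


changes = {
    crime: implied_change(COEFFICIENTS[crime], OFFENSES_1992[crime])
    for crime in COEFFICIENTS
}

for crime, change in changes.items():
    print(f"{crime}: {change:,.0f}")
```

Run as written, the products land within about one offense of the figures in the text (1,414 murders, 4,177 rapes, 60,363 aggravated assaults, 11,898 robberies); the small gaps are what one would expect from rounding in the published coefficients.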
−.00022 (3.970) .07% −.0000699 (.841) .3%
7.92E–06 (2.883) 1%
−.00048 (77.257) 9% .00006 (3.684) 5%
−.0490 (5.017) 1%
ln (Violent crime rate)
Retirement payments per person over 65 −1.97E–06 (.895) .5%
Income maintenance
Unemployment insurance
Real per capita income data: Personal income
Population per square mile
Arrest rate for the crime category appropriate endogenous variable
Shall issue law adopted dummy
Exogenous variables
−.000013 (3.713) 3%
−.00046 (5.260) 1% .00025 (1.928) 1%
0.0000163 (3.623) 2%
−.00139 (37.139) 7% −.00002 (.942) 1%
−.0765 (4.660) 2%
ln (Murder rate)
−.000896 (69.742) 9% 5.76E–06 (.320) .4%
−.0701 (6.137) 1%
ln (Aggravated assault rate)
−2.37E–06 (.861) .4%
−.00047 (6.731) 1% −.00017 (1.634) .7% −6.81E–06 (2.651) 2%
−.00019 (2.904) .05% .000139 (1.438) .7%
−.5.85E–06 4.71E–06 (1.669) (1.467) 1% 1%
−.00081 (47.551) 4% −.00002 (1.022) 1%
−.0527 (4.305) 1%
ln (Rape rate)
−5.50E–06 (1.835) 1%
.00007 (.898) .01% −.00032 (2.840) 1%
4.73E–06 (1.244) 1%
−.00057 (88.984) 4% .000316 (15.117) 17%
−.0221 (1.661) .3%
ln (Robbery rate)
Endogenous variables (natural logs of the crime rate per 100,000 people)
−8.65E–06 (5.371) 4%
.00038 (9.468) 2% .00019 (3.107) 2%
−.0000102 (5.118) 3%
−.000759 (96.996) 10% 4.83E–06 (.428) 1%
.0269 (3.745) 1%
ln (Property crime rate)
−.0000106 (6.273) 7%
.00060 (14.003) 3% .00039 (6.219) 4%
−.0000184 (8.729) 4%
−.0024 (90.189) 11% −.00007 (5.605) 9%
.00048 (.063) .02%
ln (Burglary rate)
−6.34E–06 (3.186) 2%
−.00019 (3.706) .08% .00002 (.320) .1%
−.0000123 (4.981) 2%
−.00018 (77.616) 4% .000037 (2.651) 4%
.03342 (3.763) 1%
ln (Larceny rate)
−9.27E–06 (3.613) 2% (Continued )
.00021 (3.316) .06% .00033 (3.452) 2%
.000015 (4.689) 2%
−.00018 (74.972) 3% .00048 (26.722) 36%
.0714 (6.251) 1%
ln (Auto theft rate)
Table 8.3 The effect of “shall issue” right-to-carry firearms laws on the crime rate: national county-level cross-sectional time-series evidence
Black female 50–64
Black female 40–49
Black female 30–39
Black female 20–29
Black female 10–19
Black male over 65
Black male 50–64
Black male 40–49
Black male 30–39
Black male 20–29
Race and age data (% of population): Black male 10–19
Population
Exogenous variables
Table 8.3—continued
.05637 (1.293) 5% .0009 (.035) .0419 (1.063) −.0243 (.300) .1816 (2.159) .12165 (1.337) −.00394 (.088) −.0993 (3.094) .1218 (3.383) .0107 (.158) −.2105 (2.826)
8.59E–08 (4.283) 1%
ln (Violent crime rate)
.04108 (.722) 3% .0794 (2.366) −.0832 (1.617) .9029 (8.562) −.1509 (1.381) .4373 (3.742) .0368 (.630) .1751 (4.280) .1489 (3.228) −.7396 (8.431) .1044 (1.076)
−2.94E–07 (11.884) 3%
−3.44E–08 (1.109) .4%
.1134 (1.515) 8% .0663 (1.514) .1085 (1.640) −.33549 (2.498) −.34753 (2.518) −.14275 (.971) .0374 (.490) −.2247 (4.312) −.0828 (1.409) .59197 (5.321) .20188 (1.648)
ln (Rape rate)
ln (Murder rate)
.0900695 (1.767) 7% −.0528 (1.749) .2024 (4.424) −.3654 (3.860) .2861 (2.889) .1053 (1.014) −.0692 (1.321) −.1938 (5.219) .0947 (2.265) .26946 (3.387) −.0532 (.612)
4.54E–08 (1.947) .06%
ln (Aggravated assault rate) ln (Property crime rate)
.10548 (1.752) 5% −.0060 (.168) .0061 (.111) −.00867 (.077) −.00706 (.060) .17053 (1.379) −.18307 (2.957) −.2167 (4.986) .3808 (7.691) −.06891 (.738) .07078 (.684)
.1287 (4.068) 22% −.0143 (.759) .04126 (1.445) −.02391 (.406) −.0519 (.843) −.0367 (.567) .0836 (2.570) −.0996 (4.307) .13409 (5.137) .05958 (1.213) −.0241 (.443)
−6.10E–08 −2.18E–07 (2.271) (15.063) .06% 6%
ln (Robbery rate)
Endogenous variables (natural logs of the crime rate per 100,000 people)
.074 (2.214) 11% −.0203 (1.022) −.0074 (.246) −.03132 (.506) .09135 (1.409) .06132 (.900) .0217 (.631) −.1688 (6.936) .2721 (9.909) −.05022 (.970) −.21799 (3.817)
−2.14E–07 (14.060) 5%
ln (Burglary rate)
.1710 (4.366) 22% −.0057 (.245) .0044 (.124) .18939 (2.601) −.1318 (1.730)(.569) −.0965 (1.204) .1564 (3.883) −.0075 (.264) .0944 (2.923) −.0342 (.562) .0100 (.149)
−3.10E–07 (17.328) 6%
ln (Larceny rate)
−.3384 (3.254) −.1766 (3.372) −.2481 (6.711) .1701 (4.072) .4816 (6.093) .1153 (1.321)
.0513 (1.007) 4% .00665 (.220) .14955 (3.254) −.6846 (7.235) .05626
−4.06E–09 (.177) .05%
ln (Auto theft rate)
Other male 40–49
Other male 30–39
Other male 20–29
Other male 10–19
White female over 65
White female 50–64
White female 40–49
White female 30–39
White female 20–29
White female 10–19
White male over 65
White male 50–64
White male 40–49
White male 30–39
White male 20–29
White male 10–19
Black female over 65
−.2035 (3.229) −.0060 (.382) .00842 (.729) −.006 (.322) −.0095 (.375) −.00575 (.236) −.1291 (6.065) .02346 (1.410) .0128 (.896) .01878 (.890) −.0901 (3.553) .00332 (.163) .0558 (3.719) .2501 (2.179) −.1229 (1.966) .23126 (1.866) .12678 (.824)
.3071 (2.969) −.0271 (.935) .0598 (3.023) −.01289 (.371) −.02078 (.462) −.0458 (1.074) .02336 (.618) .0452 (1.473) −.0405 (1.673) .0447 (1.209) −.00077 (.017) .0119 (.335) −.0681 (2.588) .6624 (3.022) .14495 (1.367) −.2958 (1.370) −.35775 (1.341)
−.5164 (6.278) .0056 (.265) .03779 (2.528) −.0376 (1.444) .0898 (2.685) .0397 (1.237) .0441 (1.547) .0741 (3.307) .0551 (2.999) .14127 (5.092) −.0689 (2.061) .0213 (.794) .0578 (2.904) .5572 (3.546) −.1656 (2.065) −.1907 (1.161) −.2406 (1.180)
−.1557 (2.104) .03998 (2.208) .0219 (1.623) .0739 (3.206) −.0406 (1.369) −.0904 (3.184) −.1651 (6.627) −.00863 (.448) .03926 (2.348) .0299 (1.215) −.0031 (.106) .07882 (3.313) .0836 (4.761) .1872 (1.389) −.0573 (.794) .4015 (2.777) −.1903 (1.060)
−.36915 (4.212) .00219 (.098) .0426 (2.636) −.0706 (2.507) −.11188 (3.099) −.14195 (4.104) .0421 (1.370) .0561 (2.359) .01327 (.669) −.0079 (.265) −.02258 (.626) .03094 (1.072) −.0870 (4.046) .5360 (3.124) .0129 (.149) −.1021 (.572) .77753 (3.538)
−.2035 (4.406) −.0066 (.593) .00456 (.542) −.0520 (3.633) −.14626 (7.981) −.1282 (7.309) −.1442 (7.635) .0824 (6.907) −.0086 (.828) .0388 (2.545) .0584 (3.193) .1044 (7.103) .02027 (1.867) .1587 (1.917) .0786 (1.748) −.1779 (1.996) .0287 (.261)
−.3877 (7.968) −.0062 (.523) .01738 (1.958) −.0268 (1.779) −.0995 (5.147) .0729 (3.942) −.1194 (8.887) .0816 (6.474) −.0421 (3.832) .0171 (1.065) −.354 (1.833) .06396 (4.126) .0483 (4.218) .2708 (3.100) .0007 (.015) −.4257 (4.532) .2356 (2.027)
−.1234 (2.160) .00027 (.020) .00377 (.362) −.0579 (3.268) −.1271 (5.600) −.1071 (4.929) −.13975 (6.264) .0865 (5.863) .02928 (2.272) .06611 (3.502) .0741 (3.270) .1100 (6.042) .03631 (2.701) .1487 (1.451) .2037 (3.661) −.0415 (.376) −.2320 (1.700) .2433 (3.283) −.0568 (3.152) −.0200 (1.487) −.0592 (2.583) −.0962 (3.265) −.2749 (9.771) −.1104 (5.651) .0866 (4.513) −.0289 (1.739) −.1017 (4.165) −.0172 (.585) .10687 (4.534) −.0459 (2.636) .6039 (4.532) −.4066 (5.667) .64667 (4.525) .4640 (2.620) (Continued )
Table 8.3 (continued)

Endogenous variables (natural logs of the crime rate per 100,000 people)

| Exogenous variables | ln (Violent crime rate) | ln (Murder rate) | ln (Rape rate) | ln (Aggravated assault rate) | ln (Robbery rate) | ln (Property crime rate) | ln (Burglary rate) | ln (Larceny rate) | ln (Auto theft rate) |
|---|---|---|---|---|---|---|---|---|---|
| Other male 50–64 | −.0904 (.605) | −.1572 (.623) | .2403 (1.240) | −.2829 (1.612) | −.39616 (1.869) | −.0211 (.194) | .2676 (2.330) | −.1952 (1.449) | −.4198 (2.411) |
| Other male over 65 | .3469 (2.222) | −.2585 (1.019) | .8709 (4.389) | 1.0193 (5.566) | −.267 (1.237) | −.0785 (.688) | .1863 (1.549) | −.2342 (1.659) | −.1792 (.985) |
| Other female 10–19 | −.0303 (.253) | −.7299 (3.185) | −.1095 (.670) | .1207 (.857) | −.3461 (1.936) | −.1769 (2.049) | −.2861 (3.140) | −.2304 (2.155) | −.2739 (1.971) |
| Other female 20–29 | −.1323 (1.253) | −.3293 (2.145) | .2093 (1.670) | .0933 (.557) | −.3033 (1.535) | −.1464 (1.849) | −.3243 (3.366) | −.3334 (2.435) | −.5646 (4.768) |
| Other female 30–39 | −.2187 (1.823) | −.1103 (.531) | .1556 (.988) | −.1674 (1.189) | −.2158 (1.253) | −.0874 (1.005) | .2703 (2.949) | −.2838 (2.638) | −.7516 (5.395) |
| Other female 40–49 | −.1413 (1.011) | .56562 (2.343) | .07877 (.429) | .1831 (1.116) | −.48132 (2.407) | .2452 (2.432) | −.2767 (2.600) | .6971 (5.574) | −.1461 (.901) |
| Other female 50–64 | −.0972 (.607) | .4354 (1.612) | −.6588 (3.184) | −.2700 (1.439) | .36585 (1.620) | −.0491 (.424) | −.4901 (4.006) | .1615 (1.125) | .3078 (1.659) |
| Other female over 65 | −.4376 (3.489) | .0569 (.277) | −.3715 (2.324) | −.4428 (3.012) | −.3596 (2.058) | −.1052 (1.148) | −.1408 (1.458) | −.0478 (.422) | −.587 (4.020) |
| Intercept | 5.8905 (15.930) | 2.0247 (3.326) | .4189 (.890) | 4.2648 (9.857) | 5.4254 (10.623) | 9.1613 (33.945) | 8.7058 (30.614) | 7.596 (22.751) | 8.332 (19.372) |
| N | 43,451 | 26,458 | 33,865 | 43,445 | 34,949 | 45,940 | 45,769 | 45,743 | 43,589 |
| F-statistic | 115.11 | 37.95 | 44.93 | 70.47 | 131.75 | 87.22 | 82.16 | 59.33 | 116.35 |
| Adjusted R2 | .8925 | .8060 | .8004 | .8345 | .9196 | .8561 | .8490 | .8016 | .8931 |

Note: The absolute t-statistics are in parentheses, and the percentage reported below that for some of the numbers is the percent of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable. Year and county dummies are not shown. All regressions use weighted least squares where the weighting is each county’s population.
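The note to Table 8.3 states that every regression is weighted by county population. As a rough illustration of what that weighting does, the sketch below fits a one-regressor weighted least squares line in plain Python. The four counties and their values are made up, and the single regressor merely stands in for the “shall issue” dummy; the specifications in the table also include arrest rates, demographic controls, and year and county dummies, none of which are modelled here.

```python
def weighted_least_squares(x, y, weights):
    """Fit y = intercept + slope * x by minimizing sum(w * residual**2)."""
    total_weight = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total_weight
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total_weight
    s_xy = sum(w * (xi - x_mean) * (yi - y_mean) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data: law dummy, ln(crime rate), and county population as the weight.
shall_issue = [0, 0, 1, 1]
ln_crime_rate = [5.0, 5.4, 5.1, 4.7]
population = [10_000, 30_000, 20_000, 60_000]

intercept, slope = weighted_least_squares(shall_issue, ln_crime_rate, population)
unweighted = weighted_least_squares(shall_issue, ln_crime_rate, [1, 1, 1, 1])

print(f"population-weighted slope: {slope:.3f}")
print(f"unweighted slope: {unweighted[1]:.3f}")
```

With a dummy regressor the slope is simply the difference between the weighted mean outcomes of the two groups, so large counties dominate the estimate. In this toy data the weighted slope is larger in magnitude than the unweighted one because the most populous county has the lowest crime rate.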
A recent National Institute of Justice study52 estimates the costs of different types of crime based on lost productivity; out-of-pocket expenses such as medical bills and property losses; and losses for fear, pain, suffering, and lost quality of life. While there are questions about using jury awards to measure losses such as fear, pain, suffering, and lost quality of life, the estimates provide us one method of comparing the reduction in violent crimes with the increase in property crimes. Using the numbers from Table 8.3, the estimated gain from allowing concealed handguns is over $5.74 billion in 1992 dollars. The reduction in violent crimes represents a gain of $6.2 billion ($4.28 billion from murder, $1.4 billion from aggravated assault, $374 million from rape, and $98 million from robbery), while the increase in property crimes represents a loss of $417 million ($343 million from auto theft, $73 million from larceny, and $1.5 million from burglary). However, while $5.7 billion is substantial, to put it into perspective, it equals only about 1.23 percent of the total aggregate losses from these crime categories. These estimates are probably most sensitive to the value of life used (in the Miller et al. study this was set at about $3 million in 1992 dollars). Higher estimated values of life will increase the net gains from concealed handgun use, while lower values of life will reduce the gains.53 To the extent that people are taking greater risks toward crime because of any increased safety produced by concealed handgun laws,54 these numbers will underestimate the total savings from concealed handguns. The arrest rate produces the most consistent effect on crime. Higher arrest rates imply lower crime rates for all categories of crime. A 1 standard deviation change in the probability of arrest accounts for 3–17 percent of a 1 standard deviation change in the various crime rates. 
The crime most responsive to arrest rates is burglary (11 percent), followed by property crimes (10 percent); aggravated assault and violent crimes more generally (9 percent); murder (7 percent); rape, robbery, and larceny (4 percent); and auto theft (3 percent). For property crimes, a 1 standard deviation change in the percentage of the population that is black, male, and between 10 and 19 years of age explains 22 percent of these crime rates. For violent crimes, the same number is 5 percent. Other patterns also show up in the data. For example, more black females between the ages of 20 and 39, more white females between the ages of 10 and 39 and those over 65, and other race females between 20 and 29 are positively and significantly associated with a greater number of rapes occurring. Population density appears to be most important in explaining robbery, burglary, and auto theft rates, with a 1 standard deviation change in population density being able to explain 36 percent of a 1 standard deviation change in auto theft. Perhaps most surprising is the relatively small, even if frequently significant, effect of income on crime rates. A 1 standard deviation change in real per capita income explains no more than 4 percent of a 1 standard deviation change in crime, and in seven of the specifications it explains 2 percent or less of the change. If the race, sex, and age variables are
replaced with variables showing the percentage of the population that is black and the percent that is white, 50 percent of a standard deviation in the murder rate is explained by the percentage of the population that is black. Given the high rates at which blacks are arrested and incarcerated or are victims of crimes, this is not unexpected. Given the wide use of state-level crime data by economists and the large within-state heterogeneity shown in Table 8.1, Table 8.4 provides a comparison by reestimating the specifications reported in Table 8.3 using state-level rather than county-level data. The only other difference in the specification is the replacement of county dummies with state dummies. While the results in these two tables are generally similar, two differences immediately manifest themselves: (1) all the specifications now imply a negative and almost always significant relationship between allowing concealed handguns and the level of crime and (2) concealed handgun laws explain much more of the variation in crime rates while arrest rates (with the exception of robbery) explain much less of the variation.55 Despite the fact that concealed handgun laws appear to lower both violent and property crime rates, the results still imply that violent crimes are much more sensitive to the introduction of concealed handguns, with violent crimes falling three times more than property crimes. These results imply that if all states had adopted concealed handgun laws in 1992, 1,592 fewer murders and 4,811 fewer rapes would have taken place.56 Overall, Table 8.4 implies that the estimated gain from the lower crime produced by handguns was $8.3 billion in 1992 dollars (see Table 8.5). Yet, at least in the case of property crimes, the concealed handgun law coefficients’ sensitivity to whether these regressions are run at the state or county level suggests caution in aggregating these data into such large units as states. 
Table 8.6 examines whether changes in concealed handgun laws and arrest rates have differential effects in high- or low-crime counties. To test this, the regressions shown in Table 8.3 were reestimated first using the sample above the median crime rate by type of crime and then separately using the sample below the median. High crime rates may also breed more crime because the stigma from arrest may be less when crime is rampant.57 If so, any change in apprehension rates should produce a greater reputational effect and thus greater deterrence in low-crime than high-crime counties. The results indicate that the concealed handgun law’s coefficient signs are consistently the same for both low- and high-crime counties, though for two of the crime categories (rape and aggravated assault) concealed handgun laws have statistically significant effects only in the relatively high-crime counties. For most violent crimes such as murder, rape, and aggravated assault, concealed weapons laws have a much greater deterrent effect in high-crime counties, while for robbery, property crimes, auto theft, burglary, and larceny the effect appears to be greatest in low-crime counties. The table also shows that the deterrent effect of arrests is significantly different at least at the 5 percent level between high- and low-crime counties for eight of the nine
804 76.44 .9103
811 132.60 .9461
1.4156 (.728)
(4.230) 3.9%
−.00153
−.1090 (3.365) 6.5%
ln (Aggravated assault rate)
811 126.64 .9437
−1.4719 (.531)
(21.030) 14.4%
−.0105
−.1421 (3.071) 5.7%
ln (Robbery rate)
811 80.25 .9135
8.5370 (6.502)
(4.591) 8.1%
−.00599
−.0419 (1.907) 4.8%
ln (Property crime rate)
811 174.63 .9586
8.5195 (4.687)
(3.727) 6.5%
−.00145
−.0088 (.206) .43%
ln (Auto theft rate)
811 85.06 .9181
7.6149 (4.847)
(3.772) 7.6%
−.00715
−.0825 (3.146) 7.6%
ln (Burglary rate)
811 76.83 .9100
7.7438 (5.985)
(6.257) 10.4%
−.00657
−.0314 (1.452) 3.8%
ln (Larceny rate)
Note: Except for the use of state dummies in place of county dummies, the control variables are the same as those used in Table 8.3 including year dummies, although they are not all reported. Absolute t-statistics are in parentheses, and the percentage reported below that for some of the numbers is the percentage of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable. All regressions use weighted least squares where the weighting is each state’s population.
809 103.83 .9322
804 139.45 .9490
N F-statistic Adjusted R2
−1.2892 (.686)
−.2715 (.121)
2.093 (1.089)
(3.979) 5.3%
(2.920) 1.5%
Intercept
(1.823) .69%
−.00073
−.000205
−.0607 (1.955) 4.7%
ln (Rape rate)
−.000802
−.0862 (2.297) 5.0%
−.1011 (3.181) 5.8%
Shall issue law adopted dummy
Arrest rate for the crime category corresponding to the appropriate endogenous variable
ln (Murder rate)
ln (Violent crime rate)
Exogenous variables
Table 8.4 Questions of aggregating the data: national state-level cross-sectional time-series evidence
Table 8.5 The effect of concealed handguns on victim costs: what if all states had adopted “shall issue” laws?

| Crime category | Change in number of crimes if the states without “shall issue” laws in 1992 had adopted the law: estimates using county-level data | Change in number of crimes: estimates using state-level data | Change in victim costs if the states without “shall issue” laws in 1992 had adopted the law (in 1992 dollars): estimates using county-level data | Change in victim costs (in 1992 dollars): estimates using state-level data |
|---|---|---|---|---|
| Murder | −1,414 | −1,592 | −4,281,608,125 | −4,820,594,155 |
| Rape | −4,177 | −4,811 | −374,277,659 | −431,086,861 |
| Aggravated assault | −60,363 | −93,860 | −1,405,042,403 | −2,184,737,007 |
| Robbery | −11,898 | −62,852 | −98,033,414 | −517,868,225 |
| Burglary | 1,052 | −180,813 | 1,516,890 | −260,716,190 |
| Larceny | 191,743 | −180,261 | 73,068,706 | −68,693,188 |
| Auto theft | 89,928 | −11,084 | 342,694,264 | −42,236,828 |
| Total change in annual victim costs | | | −5,741,681,741 | −8,325,932,454 |

Note: The table uses 1996 estimates of the costs of crime in 1992 dollars from Ted R. Miller, Mark A. Cohen, & Brian Wiersema, Victim Costs and Consequences: A New Look (February 1996).
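The totals in Table 8.5 are simple sums of the per-category changes in victim costs. The short Python check below, offered only as an illustration with variable names of our own choosing, adds up the county-level and state-level columns and separates the violent crime gain from the property crime loss discussed in the text.

```python
# Change in annual victim costs (1992 dollars) from Table 8.5; negative values are savings.
COUNTY_COSTS = {
    "murder": -4_281_608_125,
    "rape": -374_277_659,
    "aggravated assault": -1_405_042_403,
    "robbery": -98_033_414,
    "burglary": 1_516_890,
    "larceny": 73_068_706,
    "auto theft": 342_694_264,
}
STATE_COSTS = {
    "murder": -4_820_594_155,
    "rape": -431_086_861,
    "aggravated assault": -2_184_737_007,
    "robbery": -517_868_225,
    "burglary": -260_716_190,
    "larceny": -68_693_188,
    "auto theft": -42_236_828,
}
VIOLENT = ("murder", "rape", "aggravated assault", "robbery")
PROPERTY = ("burglary", "larceny", "auto theft")

county_total = sum(COUNTY_COSTS.values())
state_total = sum(STATE_COSTS.values())
county_violent = sum(COUNTY_COSTS[crime] for crime in VIOLENT)
county_property = sum(COUNTY_COSTS[crime] for crime in PROPERTY)

print(f"county-level total: {county_total:,}")
print(f"state-level total: {state_total:,}")
print(f"county-level violent crime change: {county_violent:,}")
print(f"county-level property crime change: {county_property:,}")
```

The sums match the table's totals exactly, and the county-level split corresponds to the roughly $6.2 billion gain from fewer violent crimes and the $417 million loss from additional property crimes.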
−.0005242 (30.302)
−.0369 (1.934)
−.000523 (−17.661)
−.0597 (7.007)
ln (Violent crime rate)
−.04468 (4.411)
ln (Aggravated assault rate)
−.0304 (.978)
−.0025 (.013)
−.000326 −.00063 (3.8130) (18.456)
−.0719 (7.415)
ln (Rape rate)
−.00123 −.000656 −.00068 (25.43) (31.542) (37.306)
−.0436 (1.938)
−.00049 (11.472)
−.0988 (7.173)
ln (Murder rate)
−.0003699 (9.018)
−.0787 (2.978)
−.00294 (9.381)
−.0342 (3.012)
ln (Robbery rate)
−.001354 (39.101)
.0881 (5.801)
−.005354 (33.669)
.0161 (2.943)
ln (Property crime rate)
.0874 (5.246)
−.00596 (41.585)
.0296 (5.474)
ln (Larceny rate)
.07226 (3.276)
−.00133 (11.907)
.0524 (5.612)
ln (Auto theft rate)
−.0027135 −.000998 −.0001412 (41.603) (37.559) (62.596)
.0297 (2.110)
−.00565 (27.390)
.0036 (.533)
ln (Burglary rate)
Note: The control variables are the same as those used in Table 8.3 including year and county dummies, although they are not all reported. Absolute t-statistics are in parentheses. All regressions use weighted least squares where the weighting is each county’s population.
Arrest rate for the crime category corresponding to the appropriate endogenous variable
B. Sample where county crime rates are below the median: Shall issue law adopted dummy
Arrest rate for the crime category corresponding to the appropriate endogenous variable
A. Sample where county crime rates are above the median: Shall issue law adopted dummy
Exogenous variables
Table 8.6 Questions of aggregating the data: do law enforcement and “shall issue” laws have the same effect in high and low crime areas?
crime categories (the one exception being violent crimes). The results do not support the claim that arrests produce a greater reputational penalty in low-crime areas. While additional arrests in low- and high-crime counties produce virtually identical changes in violent crime rates, the arrest rate coefficient for high-crime counties is almost four times larger than it is for low-crime counties. One relationship in these first three sets of regressions deserves a special comment. Despite the relatively small number of women using concealed handgun permits, the concealed handgun coefficient for explaining rapes is consistently comparable in size to the effect that this variable has on other violent crime rates. In the states of Washington and Oregon in January 1996, women constituted 18.6 and 22.9 percent of those with concealed handgun permits for a total of 118,728 and 51,859 permits, respectively.58 The time-series data that are available for Oregon during our sample period even indicate that only 17.6 percent of permit holders were women in 1991. While it is possible that the set of women who are particularly likely to be raped might already carry concealed handguns at much higher rates than the general population of women, the results are at least suggestive that rapists are particularly susceptible to this form of deterrence. Possibly this arises since providing a woman with a gun has a much bigger effect on her ability to defend herself against a crime than providing a handgun to a man. Thus even if relatively few women carry handguns, the expected change in the cost of attacking women could still be nearly as great. To phrase this differently, the external benefits to other women from a woman carrying a concealed handgun appear to be large relative to the gain produced by an additional man carrying a concealed handgun. 
If concealed handgun use were to be subsidized to capture these positive externalities, these results are consistent with efficiency requiring that women receive the largest subsidies.59 As mentioned in Section II, an important concern with these data is that passing a concealed handgun law should not affect all counties equally. In particular, we expect that it was the most populous counties that most restricted people’s ability to carry concealed weapons. To test this, Table 8.7 repeats all the regressions in Table 8.3 but instead interacts the shall issue law adopted dummy with county population. While all the other coefficients remain virtually unchanged, this new interaction retains the same signs as those for the original shall issue dummy, and in all but one case the coefficients are more significant. The coefficients are consistent with the hypothesis that the new laws produced the greatest change in the largest counties. The larger counties have a much greater response in both directions to changes in the laws. Violent crimes fall more and property crimes rise more in the largest counties. The bottom of the table indicates how these effects vary for different size counties. For example, passing a concealed handgun law lowers the murder rate in counties 2 standard deviations above the mean population by 12 percent, 7.4 times more than a shall issue law lowers murders for the mean population county. While the law enforcement officers we
Table 8.7 Controlling for the fact that larger changes in crime rates are expected in the more populous counties where the change in the law constituted a bigger break with past policies

| Exogenous variables | ln (Violent crime rate) | ln (Murder rate) | ln (Rape rate) | ln (Aggravated assault rate) | ln (Robbery rate) | ln (Property crime rate) | ln (Burglary rate) | ln (Larceny rate) | ln (Auto theft rate) |
|---|---|---|---|---|---|---|---|---|---|
| Shall issue law adopted dummy * county population | −9.41E–08 (6.001) | −2.07E–07 (7.388) | −7.83E–08 (4.043) | −1.06E–07 (5.784) | −2.29E–08 (1.295) | 5.18E–08 (4.492) | 6.96E–09 (.572) | 4.90E–08 (3.432) | 1.40E–07 (7.651) |
| Arrest rate for the crime category corresponding to the appropriate endogenous variable | −.000475 (77.222) | −.00139 (37.135) | −.000807 (47.535) | −.000895 (69.663) | −.000575 (88.980) | −.000759 (97.027) | −.002429 (90.185) | −.000177 (77.620) | −.0001754 (75.013) |
| N | 43,451 | 26,458 | 33,865 | 43,445 | 34,949 | 45,940 | 45,769 | 45,743 | 43,589 |
| F-statistic | 115.15 | 38.02 | 44.92 | 70.46 | 131.74 | 87.23 | 82.16 | 59.33 | 116.41 |
| Adjusted R2 | .8925 | .8062 | .8004 | .8345 | .9196 | .8561 | .8490 | .8016 | .8931 |
| Implied percent change in crime rate: the effect of the “shall issue” interaction coefficient evaluated at different levels of county populations: | | | | | | | | | |
| 1/2 Mean = 37,887 | −.36 | −.78 | −.3 | −.4 | −.1 | .2 | .03 | .2 | .5 |
| Mean = 75,773 | −.71 | −1.6 | −.6 | −.8 | −.2 | .4 | .05 | .4 | 1.1 |
| Plus 1 SD = 326,123 | −3.1 | −6.8 | −2.6 | −3.5 | −.7 | 1.7 | .23 | 1.6 | 4.6 |
| Plus 2 SD = 576,474 | −5.4 | −11.9 | −4.5 | −6.1 | −1.3 | 2.99 | .4 | 2.8 | 8.1 |
| % of a 1 standard deviation change in corresponding crime rate that can be explained by a 1 standard deviation change in the arrest rate for that crime | 9 | 7 | 4 | 9 | 4 | 10 | 11 | 4 | 3 |

Note: The control variables are the same as those used in Table 8.3 including year and county dummies, although they are not reported since the coefficient estimates are very similar to those reported earlier. Absolute t-statistics are in parentheses. All regressions use weighted least squares where the weighting is each county’s population.
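The implied percent changes in the lower panel of Table 8.7 can be recovered by multiplying each interaction coefficient by the county population at which it is evaluated. The Python sketch below illustrates this under the assumption that the coefficient times population is read directly as a proportional change in the crime rate; the dictionary and function names are illustrative and do not come from the original study.

```python
# Coefficients on (shall issue dummy * county population) from Table 8.7.
INTERACTION = {
    "violent crimes": -9.41e-08,
    "murder": -2.07e-07,
    "rape": -7.83e-08,
    "aggravated assault": -1.06e-07,
    "robbery": -2.29e-08,
    "property crimes": 5.18e-08,
    "burglary": 6.96e-09,
    "larceny": 4.90e-08,
    "auto theft": 1.40e-07,
}

# County population levels used in the table.
POPULATIONS = {
    "1/2 mean": 37_887,
    "mean": 75_773,
    "plus 1 SD": 326_123,
    "plus 2 SD": 576_474,
}


def implied_percent_change(coefficient, population):
    """Percent change in the crime rate implied at a given county population."""
    return 100 * coefficient * population


implied = {
    crime: {
        label: implied_percent_change(coefficient, population)
        for label, population in POPULATIONS.items()
    }
    for crime, coefficient in INTERACTION.items()
}

for crime, by_level in implied.items():
    row = ", ".join(f"{label}: {value:.2f}" for label, value in by_level.items())
    print(f"{crime}: {row}")
```

For murder at 2 standard deviations above the mean population this gives roughly −11.9 percent, in line with the 12 percent decline cited in the text, and the other cells agree with the table up to rounding.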
talked to continually mentioned population as being the key variable, we also reran these regressions using population density as the variable that we interacted with the shall issue dummy. The results remain very similar to those reported. Admittedly, although arrest rates and county fixed effects are controlled for, these regressions have thus far controlled for expected penalties in a limited way. Table 8.8 reruns the regressions in Table 8.7 but includes either the burglary or robbery rates to proxy for other changes in the criminal justice system. Robbery and burglary are the violent and property crime categories that are the least related to changes in concealed handgun laws, but they are still positively correlated with all the other types of crimes. One additional minor change is made in two of the earlier specifications. In order to avoid any artificial collinearity either between violent crime and robbery or between property crimes and burglary, violent crimes net of robbery and property crimes net of burglary are used as the endogenous variables when robbery or burglary are controlled for. Some evidence that burglary or robbery rates will proxy for other changes in the criminal justice system can be seen in their correlations with other crime categories. The Pearson correlation coefficient between robbery and the other crime categories ranges between .49 and .80, and all are statistically significant at least at the .0001 level. For burglary the correlations range from .45 to .68, and they are also equally statistically significant. The two sets of specifications reported in Table 8.8 closely bound our earlier estimates, and the estimates continue to imply that the introduction of concealed handgun laws coincided with similarly large drops in violent crimes and increases in property crimes. The only difference with the preceding results is that they now imply that the effect on robberies is statistically significant. 
The estimates on the other control variables also essentially remain unchanged. We also reestimated the regressions in Table 8.3 using first differences on all the control variables (see Table 8.9). These regressions were run using a dummy variable for the presence of “shall issue” concealed handgun laws and differencing that variable, and the results consistently indicate a negative and statistically significant effect from the legal change for violent crimes, rape, and aggravated assault. Shall issue laws negatively affect murder rates in both specifications, but the effect is statistically significant only when the shall issue variable is also differenced. The property crime results are also consistent with those shown in the previous tables, showing a positive effect of shall issue laws on crime rates. Perhaps not surprisingly, the results imply that the gun laws immediately altered crime rates, but that an additional change was spread out over time, possibly because concealed handgun use did not instantly move to its new steady-state level (for example, in 1994, Oregon permits increased by 50 percent and Pennsylvania’s by 16 percent even though both ordinances had been in effect for at least 4 years). The annual decrease in violent crimes averaged about 2 percent, while the annual increase in property crimes averaged about 5 percent.
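A small sketch may help make the first-difference specifications concrete. It assumes a panel stored as a mapping from county to a year-indexed series; the helper name and all values are hypothetical. It also shows why differencing the law dummy changes its meaning: the differenced dummy equals 1 only in the year of adoption, whereas the dummy in levels stays at 1 in every later year.

```python
def first_differences(panel):
    """Within-county first differences of one variable.

    `panel` maps county id -> {year: value}. A difference is produced only
    when the previous calendar year is also observed for that county.
    """
    diffs = {}
    for county, series in panel.items():
        for year in sorted(series):
            if year - 1 in series:
                diffs[(county, year)] = series[year] - series[year - 1]
    return diffs


# Hypothetical ln(crime rate) values; county B has no 1991 observation.
ln_rate = {
    "A": {1990: 6.0, 1991: 6.5, 1992: 6.25},
    "B": {1990: 5.0, 1992: 5.5},
}
# Hypothetical law dummy: county A's state adopts the law in 1991.
law = {"A": {1990: 0, 1991: 1, 1992: 1}}

d_ln_rate = first_differences(ln_rate)
d_law = first_differences(law)
```

Here `d_law` is 1 in 1991 and 0 in 1992, so a regression on the differenced dummy picks up only the change in the adoption year, while a regression on the undifferenced dummy allows the law to shift the growth rate in every year it is in force.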
N F-statistic Adjusted R2
ln (Robbery rate)
Arrest rate for the crime category corresponding to the appropriate endogenous variable
Controlling for robbery rates: Shall issue law adopted dummy *county population
Exogenous variables
26,458 39.19 .8111
33,865 46.55 .8062
43,445 75.09 .8433
−1.03E–07 (5.777)
ln (Aggravated assault rate)
43,197 81.93 .8555
−7.73E–08 (4.049)
ln (Rape rate)
−.000776 (60.834) .1196466 (47.469)
−1.72E–07 (7.253)
ln (Murder rate)
−.0003792 −.0013449 −.00073 (57.644) (36.240) (42.672) .1083118 .116406 .0983088 (46.370) (24.616) (30.363)
−1.03E–07 (6.318)
ln (Net violent crime rate)
Endogenous variables
... ... ...
...
...
...
ln (Robbery rate)
45,940 101.83 .8744
−.0006448 (86.517) .1176149 (78.825)
5.61E–08 (5.206)
ln (Property crime rate)
ln (Larceny rate)
45,769 93.39 .8649
−.0020339 (77.992) .1135451 (70.826)
45,743 65.82 .8179
−.0001547 (69.968) .1164045 (61.762)
−3.50E–09 5.35E–08 (.304) (3.911)
ln (Burglary rate)
43,589 143.54 .9117
−.0001382 (63.888) .2173908 (92.212)
1.47E–07 (8.844)
ln (Auto theft rate)
Table 8.8 Using other crime rates that are relatively unrelated to changes in “shall issue” rules as a method of controlling for other changes in the legal environment: controlling for robbery and burglary rates
26,458 40.78 .8173
33,865 50.59 .8191
43,445 84.97 .8591
−1.03E–07 (6.072)
43,451 154.04 .9176
−8.03E–08 (4.356)
−.00054 (42.883) .5302516 (83.889)
−1.73E–07 (7.434)
−.00026 −.00128 −.00051 (44.982) (35.139) (30.010) .5667123 .4459916 .4916113 (110.768) (37.661) (56.461)
−9.52E–08 (6.937) 7.23E–08 (6.854)
34,949 159.18 .9327
45,813 123.99 .8949
−.000429 −.000469 (69.190) (61.478) .6719892 .5773792 (78.531) (155.849)
−1.47E–08 (.759)
... ... ...
...
...
...
45,743 98.08 .8706
−.000102 (53.545) .6009071 (150.635)
5.50E–08 (4.769)
43,589 152.82 .9167
−.000116 (53.961) .6416852 (106.815)
1.45E–07 (8.943)
Note: While not all coefficient estimates are reported, all the control variables are the same as those used in Table 8.3 including year and county dummies. Absolute t-statistics are in parentheses. All regressions use weighted least squares where the weighting is each county’s population. Net violent and property crime rates are respectively net of robbery and burglary crime rates to avoid producing any artificial collinearity. Likewise, the arrest rates for those values subtract out that portion of the corresponding arrest rates due to arrests for robbery and burglary.
N F-statistic Adjusted R2
ln (Burglary rate)
Arrest rate for the crime category corresponding to the appropriate endogenous variable
Controlling for burglary rates: Shall issue law adopted dummy *county population
N F-statistic Adjusted R2
Intercept
First differences in the arrest rate for the crime category corresponding to the appropriate endogenous variable
All variables except for the “shall issue” dummy differenced: Shall issue law adopted dummy
Exogenous variables
−.0331607 (1.593) .0526532 (4.982)
∆ln (Property crime rate)
.0352582 (3.16)
∆ln (Burglary rate)
.0522435 (4.049)
∆ln (Larceny rate)
.128475 (5.324)
∆ln (Auto theft rate)
−.0522417 (3.68) 37,694 4.03 .1972
−.1203331 −.0952347 −.0770997 −.1062443 −.2604944 (6.925) (10.8) (8.312) (9.872) (13.009) 27,999 40,901 40,686 40,671 37,581 4.05 4.36 6.62 3.1 10.34 .2283 .2047 .3018 .1386 .4338
−.0456251 (2.693)
∆ln (Robbery rate)
−.073928 −.0402018 −.014342 (6.049) (1.554) (.904) 37,611 20,420 26,269 3.80 .69 2.56 .1867 −.0379 .1389
−.052034 (2.761)
∆ln (Aggravated assault rate)
−.0005725 −.0007599 −.0024482 −.0001748 −.0001831 (82.38) (91.259) (88.38) (75.969) (53.432)
−.025933 (.841)
∆ln (Rape rate)
−.0004919 −.0015482 −.0008641 −.0009272 (75.713) (25.967) (46.509) (67.782)
−.021589 (1.689)
∆ln (Violent crime rate)   ∆ln (Murder rate)
Endogenous variables (in terms of first differences)
Table 8.9 Rerunning the regressions on differences
(25.968)
−.0015481
−.0363798 (1.826)
−.0758797 −.042305 (6.241) (1.642) 37,611 20,420 3.8 .69 .1868 −.0378
(75.728)
−.0004919
−.026959 (2.57)
(67.819)
−.0009275
−.0540946 (4.414)
−.0188927 −.056264 (1.196) (3.983) 26,269 37,694 2.56 4.04 .1389 .1975
(46.519)
−.0008642
−.0394318 (2.887)
(91.266)
−.0007598
.0481937 (6.303)
(88.362)
−.002448
.0072487 (.898)
(75.978)
−.0001748
(53.495)
−.0001829
.0623146 .2419118 (6.676) (13.884)
−.1176478 −.0907433 −.0742121 −.1016434 −.248623 (6.801) (10.341) (8.038) (9.494) (12.506) 27,999 40,901 40,686 40,671 37,581 4.05 4.37 6.62 3.11 10.45 .2282 .205 .3016 .1393 .4365
(82.371)
−.0005724
.0071132 (.471)
Note: The variables for income, population, racial, sex, and age compositions of the population and density are all in terms of first differences. While not all the coefficient estimates are reported, all the control variables used in Table 8.3 are used here, including year and county dummies. Absolute t-statistics are in parentheses. All regressions use weighting where the weighting is each county’s population.
N F-statistic Adjusted R2
Intercept
First differences in the arrest rate for the crime category corresponding to the appropriate endogenous variable
All variables differenced: First differences in the shall issue law adopted dummy
John R. Lott, Jr., and David B. Mustard
The short- and long-term effects of these legal changes were further examined by reestimating the regressions in Tables 8.3 and 8.7 with a time trend for the number of years after the law has been in effect and that time trend squared. A similar set of time trends were also added for before the law went into effect to test whether there were systematic changes in crime leading up to the passage of the law. While not shown, these regression results provide consistent strong evidence that the deterrent impact of concealed handguns increases with time. For most violent crimes, the time trend leading up to the adoption of the laws indicates that crime was rising prior to the laws being enacted. Figure 8.1 shows how the violent crime rate varies before and after the implementation of these nondiscretionary permit laws. Using restricted least squares to compare whether the crime rate trends before and after the enactment of the laws were the same, F-tests reject that hypothesis at least at the 10 percent level for all the crime categories except aggravated assault and larceny, where the F-tests are only significant at the 20 percent level. All the results in Tables 8.3, 8.6, and 8.7 were reestimated to deal with the concerns raised in Section II over the “noise” in arrest rates arising from the timing of offenses and arrests and the possibility of multiple offenders. We reran all the regressions in this section first by limiting the sample to those counties over 10,000, 100,000, and then 200,000 people. Consistent with the evidence reported in Table 8.7, the more the sample was limited to larger population counties the stronger and more statistically significant was the
Figure 8.1 The effect of concealed handguns on violent crimes.
relationship between concealed handgun laws and the previously reported effects on crime. The arrest rate results also tended to be stronger and more significant. We also tried rerunning all the regressions by redefining the arrest rate as the number of arrests over the last 3 years divided by the total number of offenses over the last 3 years. Despite the reduced sample size, the results remained similar to those already reported. Two of the most common laws affecting the use of handguns are increased sentencing penalties when crimes are committed using a gun and waiting periods before a citizen can obtain a gun. To test what role these two types of laws may have played in changing crime rates, we reran the regressions in Tables 8.3 and 8.4 by adding a dummy variable to control for state laws that increase sentencing penalties when deadly weapons are used and variables to measure the impact of waiting periods.60 Because we have no strong prior beliefs about whether the effect of waiting periods on crime is linear with respect to the length of the waiting period, we included not only a dummy variable for when the waiting period is in effect but also variables for the length of the waiting period in days and the length in days squared. In both sets of regressions, the dummy variable for the presence of “shall issue” concealed handgun laws remains generally consistent with the results reported earlier, though the “shall issue” coefficients for robbery in the county-level regressions and for property crimes using the state levels are no longer statistically significant. While the coefficients for arrest rates are not reported, they remain very similar to those shown previously. With respect to the other gun laws, the pattern shown in Table 8.10 is less clear. 
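The waiting-period specification described above enters three terms: a dummy for whether any waiting period is in effect, its length in days, and the length squared. The sketch below shows how those terms combine into a single implied effect. The coefficient values are hypothetical placeholders chosen only to display a "first lowering, then raising" shape; they are not estimates from Table 8.10.

```python
def waiting_period_terms(days):
    """Return (dummy, days, days_squared) for a waiting period of `days` days.

    A state with no waiting period has days == 0, so all three terms are 0.
    """
    dummy = 1 if days > 0 else 0
    return dummy, days, days ** 2


def waiting_period_effect(days, b_dummy, b_days, b_days_sq):
    """Combined contribution of the three waiting-period terms to ln(crime rate)."""
    dummy, d, d_sq = waiting_period_terms(days)
    return b_dummy * dummy + b_days * d + b_days_sq * d_sq


# Hypothetical coefficients (assumptions for illustration only).
B_DUMMY, B_DAYS, B_DAYS_SQ = 0.2, -0.1, 0.005

effects = {d: waiting_period_effect(d, B_DUMMY, B_DAYS, B_DAYS_SQ) for d in (0, 2, 10, 20)}
```

With these placeholder values the implied effect falls until a 10-day waiting period and rises afterward, which is the kind of nonlinearity the squared term is included to detect.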
The county-level data imply that increased sentencing penalties when deadly weapons are used reduce violent crimes (particularly aggravated assault and robbery), but this effect is not statistically significant for violent crimes using state-level data. The state-level data also indicate neither a statistically significant nor an economically consistent relationship between either the presence of waiting periods or their length and crime. While the county-level data frequently imply a relationship between waiting periods and murder, rape, aggravated assault, and robbery, the coefficients imply quite inconsistent effects for these different crimes. For example, simply passing the law appears to raise murder and rape rates but lower aggravated assaults and robbery. These differential effects also apply to the length of the waiting periods, with longer periods at first lowering and then raising the murder and rape rates; the reverse is true for aggravated assaults. However, these results make it very difficult to argue that waiting periods (particularly long ones) have an overall beneficial effect on crime.

In concluding this section, not only does this initial empirical work provide strong evidence that concealed handgun laws reduce violent crime and that higher arrest rates deter all types of crime, but the work also allows us to evaluate some of the broader empirical issues concerning criminal deterrence discussed in Section II. The results confirm some of our earlier discussions on potential aggregation problems with state-level data. County-level data imply
N F-statistic Adjusted R2
Waiting period in days squared
Waiting period in days
Waiting law dummy
Enhanced sentencing law dummy
A. County-level regressions: Shall issue law adopted dummy
Exogenous variables
(.230) .23386 (3.663) −.0943 (5.112)
(3.976) .02297 (.601) −.000829 (.075)
43,451 115.06 .8926
26,458 37.96 .8062
−.0008046 .00546 (1.182) (4.864)
−.08747 (5.173) −.00284
−.04171 (3.976) −.04171
ln (Violent crime rate)   ln (Murder rate)
33,865 45.24 .8018
.00802 (9.363)
(1.165) .2534 (5.213) −.1363 (9.726)
−.06113 (4.660) .01128
ln (Rape rate)
Table 8.10 Controlling for other laws regulating gun use
43,445 70.51 .8348
−.00498 (6.248)
(1.680) −.0937 (2.071) .06447 (4.966)
−.05462 (4.452) −.01528
ln (Aggravated assault rate)
34,949 132.58 .9202
.00731 (7.836)
(2.694) −.09307 (1.704) −.1121 (7.349)
−.01817 (1.272) −.028832
ln (Robbery rate)
45,940 87.30 .8564
.0001884 (.376)
(.003) .02023 (.718) −.01477 (1.812)
.03633 (4.717) −.0000151
ln (Property crime rate)
45,769 84.99 .8499
.002268 (4.297)
(3.340) .02012 (.679) −.04533 (5.279)
.0133 (1.636) −.01992
ln (Burglary rate)
45,743 59.34 .8018
(2.751)
(1.733) −.003398 (.098) −.011885 (1.175) −.001706
(4.723) .01219
.045018
ln (Larceny rate)
43,589 116.32 .8932
(1.237)
(2.021) −.08302 (1.853) −.0100 (.772) .0009851
(6.695) −.0182
.08206
ln (Auto theft rate)
(1.103) .0684 (.464) −.03066 (.744)
(1.491) .1010 (.809) −.02988 (.854)
804 134.75 .9491
809 100.20 .9322
−.00132 (.553)
(2.068) .0303
(3.030) .0347
.00117 (.576)
−.0810
−.1005
804 76.15 .9129
.0059 (3.004)
(1.209) .2173 (1.805) −.1049 (3.109)
(1.799) .02725
−.05746
811 127.93 .9461
−.00041 (.200)
(1.192) .02613 (.205) −.0065 (.183)
(3.013) −.0283
−.10189
811 123.66 .9443
.0059 (2.017)
(.217) .1524 (.842) −.1000 (1.978)
(2.770) .0073
−.1332
811 78.29 .9144
−.000207 (.149)
(1.798) .0325 (.378) −.0095 (.397)
(1.499) .0287
−.0342
811 82.33 .9183
.0005 (.302)
(.282) .0647 (.628) −.0220 (.765)
(2.785) .0054
−.0761
811 75.57 .9116
(.435)
(2.354) .0233 (.276) −.0053 (.223) −.00059
(.976) .0369
−.0219
811 168.47 .9586
(.921)
(.564) −.0307 (.184) −.0238 (.509) −.00248
(.178) .0175
−.0079
Note: The control variables are the same as those used in Table 8.3 including year and county dummies. Absolute t-statistics are in parentheses. All regressions use weighting where the weighting is each county’s population.
N F-statistic Adjusted R2
Waiting period in days squared
Waiting period in days
Waiting law dummy
Enhanced sentencing law dummy
B. State-level regressions: Shall issue law adopted dummy
that arrest rates explain about six times the variation in violent crime rates and eight times the variation in property crime rates that arrest rates explain when we use state-level data. Breaking the data down by whether a county is a high- or a low-crime county indicates that arrest rates do not affect crime rates equally in all counties. The evidence also confirms the claims of law enforcement officials that “shall issue” laws represented more of a change in how the most populous counties permitted concealed handguns. One concern that was not borne out was whether state-level regressions could bias the coefficients on the concealed handgun laws toward zero. In fact, while state- and county-level regressions produce widely different coefficients for property crimes, seven of the nine crime categories imply that the effect of concealed handgun laws was much larger when state-level data were used. However, one conclusion is clear: the very different results between state- and county-level data should make us very cautious in aggregating crime data and would imply that the data should remain as disaggregated as possible.

The endogeneity of arrest rates and the passage of concealed handgun laws

The previous specifications have assumed that both the arrest rate and the passage of concealed handgun laws are exogenous.
Following Ehrlich,61 we allow for the arrest rate to be a function of the lagged crime rates; per capita and per violent and property crimes measures of police employment and payroll at the state level (these three different measures of employment are also broken down by whether police officers have the power to make arrests); the measures of income, unemployment insurance payments, and the percentages of county population by age, sex, and race used in Table 8.3; and county and year dummies.62 In an attempt to control for political influences, we also included the percentage of a state’s population that are members of the National Rifle Association and the percentage of the vote received by the Republican presidential candidate at the state level. Because presidential candidates and issues vary between elections, the percentage voting Republican is undoubtedly not directly comparable across years. To account for these differences across elections, we interacted the percentage voting Republican with dummy variables for the years immediately next to the relevant elections. Thus, the percentage of the vote obtained in 1980 is multiplied by a year dummy for the years 1979–82, the percentage of the vote obtained in 1984 is multiplied by a year dummy for the years 1983–86, and so on, through the 1992 election. A second set of regressions explaining the arrest rate also includes the change in the natural log of the crime rates to proxy for the difficulty police forces face in adjusting to changing circumstances.63 However, the time period studied in all these regressions is more limited than in our previous tables because state-level data on police employment and payroll are only available from the U.S. Department of Justice’s Expenditure and Employment data for the Criminal Justice System from 1982 to 1992.
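The interaction of the Republican vote share with election-window year dummies can be sketched as below. The window rule (the year before an election through two years after it) is inferred from the 1979–82 and 1983–86 examples in the text; the function names and vote shares are hypothetical.

```python
ELECTION_YEARS = (1980, 1984, 1988, 1992)


def election_for_year(year):
    """Map a calendar year to the election whose window contains it.

    Windows follow the pattern in the text: 1979-82 -> 1980, 1983-86 -> 1984,
    1987-90 -> 1988. The 1992 window starts in 1991; the chapter's sample ends
    in 1992, so later years never occur in the data.
    """
    for election in ELECTION_YEARS:
        if election - 1 <= year <= election + 2:
            return election
    return None


def vote_share_interactions(year, share_by_election):
    """One regressor per election: the vote share times that window's year dummy."""
    active = election_for_year(year)
    return {
        election: (share_by_election[election] if election == active else 0.0)
        for election in ELECTION_YEARS
    }


# Hypothetical statewide Republican vote shares, in percent.
shares = {1980: 55.0, 1984: 60.0, 1988: 54.0, 1992: 40.0}
row_1985 = vote_share_interactions(1985, shares)
```

Each state-year therefore has exactly one nonzero vote-share regressor, which lets the coefficient on the vote share differ from election to election.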
There is also the question of why some states adopted concealed handgun laws while others did not. As noted earlier, to the extent that states adopted the law because crime was either rising or was expected to increase, OLS estimates underpredict the drop in crime. Similarly, if these rules were adopted when crime rates were falling, the bias is in the opposite direction. Thus, in order to predict whether a county would be in a state with concealed handgun laws we used both the natural logs of the violent and property crime rates and the first differences of those crime rates. To control for general political differences that might affect the chances of these laws being adopted, we also included National Rifle Association membership as a percentage of a state’s population; the Republican presidential candidate’s percentage of the statewide vote; the percentage of a state’s population that is black and the percentage white; the total population in the state; regional dummy variables for whether the state is in the South, Northeast, or Midwest; and year dummy variables. While the 2SLS estimates shown in the top half of Table 8.11 again use the same set of control variables employed in the preceding tables, the results differ from all our previous estimates in one important respect: concealed handgun laws are associated with large significant drops in the levels of all nine crime categories. For the estimates most similar to Ehrlich’s study, five of the estimates imply that a 1 standard deviation change in the predicted value of the shall issue law dummy variable explains at least 10 percent of a standard deviation change in the corresponding crime rates. In fact, concealed handgun laws explain a greater percentage of the change in murder rates than do arrest rates. 
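The direction of the bias discussed above, and the way two-stage least squares addresses it, can be illustrated with a deliberately tiny example. This is a sketch under strong simplifying assumptions, not the chapter's estimator: it uses one endogenous regressor, one instrument, no control variables, and four constructed observations in which the true effect is set to −2. All names and numbers are hypothetical. The second-stage standard errors from a manual two-step procedure like this one are not valid and are not computed here.

```python
def ols(xs, ys):
    """One-regressor ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(zs, xs, ys):
    """2SLS with a single instrument z for a single endogenous regressor x."""
    a, b = ols(zs, xs)                  # first stage: predict x from z
    x_hat = [a + b * z for z in zs]
    return ols(x_hat, ys)               # second stage: regress y on predicted x


# Constructed data. u is an unobserved factor (think "rising crime") that
# pushes up both adoption (x) and the outcome (y); z is uncorrelated with u.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]
x = [zi + ui for zi, ui in zip(z, u)]
TRUE_EFFECT = -2.0
y = [TRUE_EFFECT * xi + 3.0 * ui for xi, ui in zip(x, u)]

ols_slope = ols(x, y)[1]
tsls_slope = two_stage_least_squares(z, x, y)[1]
```

In this constructed case OLS recovers −0.5 while 2SLS recovers the true −2, mirroring the argument that OLS underpredicts the drop in crime when adoption is prompted by rising crime.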
With the exception of robbery, the set of estimates using the change in crime rates to explain arrest rates indicates a usually more statistically significant but economically smaller effect from concealed handgun laws. For example, concealed handgun laws now explain 3.9 percent of the variation in murder rates compared to 7.5 percent in the preceding results. While these results imply that even crimes with relatively little contact between victims and criminals experienced declines, the coefficients for violent crimes are still relatively more negative than the coefficients for property crimes. For the first-stage regressions explaining which states adopt concealed handgun laws (shown in the bottom half of Table 8.11), both the least squares and logit estimates imply that the states adopting these laws are relatively Republican with large National Rifle Association memberships and low but rising violent and property crime rates. The other set of regressions used to explain the arrest rate shows that arrest rates are lower in high-income, sparsely populated, Republican areas where crime rates are increasing. We also reestimated the state-level data using similar 2SLS specifications. The coefficients on both the arrest rates and concealed handgun law variables remained consistently negative and statistically significant, with the state-level data again implying a much stronger effect from concealed handguns and a much weaker effect from higher arrest rates. Finally, in order to use
Arrest rate for the crime category corresponding to the appropriate endogenous variable
1. Using the predicted values of arrest rates similar to Ehrlich’s 1973 study: Shall issue law adopted dummy
Exogenous variables
−.002324 (9.6892) 60.7%
−1.262 (21.731) 10.5%
ln (Violent crime rate)
−.00094 (1.8436) 5.2%
−1.1063 (5.7598) 7.5%
ln (Murder rate)
−.0359 (9.667) 60.1%
−1.059 (−4.4884) 6.4%
ln (Rape rate)
−.002176 (7.1883) 44.6%
−1.3192 (18.5277) 10.1%
−.00241 (4.481) 36.9%
−.8744 (7.4979) 4.9%
ln (Aggravated assault rate)   ln (Robbery rate)
Endogenous variables (in crimes per 100,000 population)
−.01599 (33.26) 80.1%
−1.1182 (15.3716) 7.67%
−.002759 (2.989) 21.3%
−.7668 (11.435) 11.4%
ln (Property crime rate)   ln (Auto theft rate)
A. Allowing the change in the “shall issue” law and the arrest rate to be endogenous using two-stage least squares (2SLS)*
Table 8.11 Regression estimates of the causes and effects of the adoption of concealed handgun laws
−.01783 (14.36) 79.6%
−.7603 (19.328) 10.6%
ln (Burglary rate)
−.0124 (31.814) 80.6%
−1.122 (25.479) 13.5%
ln (Larceny rate)
N F-statistic Adjusted R2
31,129 1,723 .9942
31,129 1,260.9 .9921
N 31,129 31,129 F-statistic 61.97 19.07 .8592 .644 Adjusted R2 2. Including the change in crime rates when estimating the predicted values of the arrest rates: Shall issue law adopted dummy −.26104 −.5732 (20.12) (18.21) 2.2% 3.9% Arrest rate for the crime category corresponding to the appropriate endogenous variable −.007827 −.024 (746.74) (687.7) 104% 95% −.29881 (15.4465) 2.3%
31,129 39.81 .7953
31,129 4,909.6 .9980
31,129 797.5 .9876
−.02626 −.01028 (1.047) (582) 117% 88%
−.1992 (9.6317) 1.2%
31,129 22.3 .6807
−.20994 (29.4242) 3.3%
−.2774 (32.5051) 2.1%
31,129 31,129 60.78 84.21 .8568 .8893
31,129 3,614.86 .9972
31,129 31,129 1,671.49 6,424 .9941 .9984
−.00716 −.00933 −.01233 (901.8) (820.7) (1,242.7) 109% 95% 95.1%
−.0054 (.2935) 0.3%
31,129 63.71 .8626
−.2623 (32.4253) 3.2%
31,129 38.37 .7891
31,129 1,389 .9929
31,129 1,625.8 .9939
−.03839 −.0101 (796.8) (956.14) 71% 101%
−.1153 (13.397) 1.6%
31,129 46.48 .8199
ln (Violent crime rate)
N χ2 Pseudo-R2
Shall issue law
−.0797 (6.003)
Least squares estimate: Shall issue law −.01817 (9.710) N F statistic Adjusted R2 Logit:
Endogenous variables
.038249 (3.294)
.00825 (5.031)
∆ ln (Violent crime rate)
Exogenous variables
B. First-stage estimates of shall issue law†
−.2095 (8.657)
−.02889 (8.748)
ln (Property crime rate)
.08119 (3.121)
.0094 (2.577)
∆ ln (Property crime rate)
.0004344 (10.329)
.000107 (19.383)
NRA membership as % of state population
.0034 (4.986)
% Rep. pres. in state vote 84 * year dummy 83–86
.0567 .01456 (6.227) (2.437) 31,137 5,007.44 .1687
.0061 (5.485) 31,137 209.85 .1436
% Rep. pres. in state vote 80 * year dummy 79–82
.09976 (16.203)
.01702 (22.844)
% Rep. pres. in state vote 88 * year dummy 87–90
.12249 (16.273)
.0299 (27.317)
% Rep. pres. in state vote 92 * year dummy 91–92
.0409 (10.090)
.00518 (13.06)
% Population black for state
.0364 (9.131)
.0031 (8.470)
% Population white for state
N F-statistic Adjusted R2
N F-statistic Adjusted R2 Arrest rate for property crimes
1. The predicted values of arrest rates that most closely correspond to Ehrlich’s 1973 2SLS estimates: Arrest rate for violent crimes
Endogenous variables
...
−2.224 (1.441)
ln (Violent crime rate lagged)
.90203 (.738)
...
ln (Property crime rate lagged)
Exogenous variables
−2,805.2 (1.173)
−14,093.61 (3.065)
No. of police in state employed with power of arrest/state population
−1.3057 (.059)
95.085 (2.206)
No. of police in state employed without power of arrest/state population
.01045 (1.305) 30,814
.01463 (1.940) 28,954
1.08 .0084
.00415 (.697)
1.83 .0814
.0739 (6.418)
NRA membership as % of state population   Population density per square mile
C. First-stage estimates of the probability of arrest: violent and property crime rates‡
−1.5931 (4.434)
−6.936 (9.975)
% Rep. pres. in state vote 80 * year dummy 79–82
−.9155 (3.420)
−4.293 (8.270)
% Rep. pres. in state vote 84 * year dummy 83–86
−1.1778 (4.004)
−3.3467 (5.865)
−1.2009 (3.416)
−3.4316 (4.967)
% Rep. pres. in state vote 92 * year dummy 91–92
% Rep. pres. in state vote 88 * year dummy 87–90
2. Including the change in crime rates in addition to those already noted when estimating the predicted values of arrest rates (the coefficients on the percentage of the state voting Republican in presidential elections is similar to those reported above): Arrest rate for violent crimes −128.4 (39.86)
ln (Violent crime rate lagged)
−123.64 (44.17)
∆ ln (Violent crime rate)
Exogenous variables
...
ln (Property crime rate lagged)
...
∆ ln (Property crime rate)
−12,194 (2.750)
No. of police in state employed with power of arrest/state population
96.3244 (2.317)
No. of police in state employed without power of arrest/state population
.0009 (.060)
.0646 (5.284)
NRA membership as % of state population   Density per square mile
−.0000726 (4.877)
County population
...
...
−109.69 (49.342)
−106.92 (58.21)(.618) 30,814 2.30 .1165
28,954 2.59 .1458 −1,394 (.095)
−1.9891 (.949)
−.0072
(1.473)
.0083
(1.522)
−.0000111
Source: Isaac Ehrlich, Participation in Illegitimate Activities: A Theoretical and Empirical Investigation, 81 J. Pol. Econ. 521–65 (1973).

Notes:
* While not all coefficient estimates are reported, all the control variables are the same as those used in Table 8.3, including year and county dummies. Absolute t-statistics are in parentheses, and the percentage reported below that for some of the numbers is the percent of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable.
† Absolute t-statistics are in parentheses. The sample is limited because the data on police employment used in producing the predicted arrest rates were available only for 1982–92. While the estimates from the first specification were used in the above regressions, the logit estimates are provided for comparison. Not all the variables that were controlled for are shown. These additional variables included year and regional dummies (South, Northeast, and Midwest) and the state’s population. NRA = National Rifle Association. % Rep. pres. = percentage of the vote received by the Republican presidential candidate.
‡ Absolute t-statistics are in parentheses. The sample is limited because the data on police employment were available only for 1982–92. Not all the variables that were controlled for are shown. These additional variables included the number of police with arrest powers divided by the number of violent crimes; the number of police with arrest powers divided by the number of property crimes; the number of police without arrest powers divided by the number of violent crimes; the number of police without arrest powers divided by the number of property crimes; these preceding variables using payrolls; the breakdown of the county’s population by age, sex, and race used in Table 8.3; year and county dummies; the measures of income reported in Table 8.3; and the state’s population. The estimates also using the change in crime rates are available from the authors. NRA = National Rifle Association. % Rep. pres. = percentage of the vote received by the Republican presidential candidate.
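The percentage reported beneath some coefficients in Table 8.11 is described in the note as the share of a standard deviation of the endogenous variable explained by a 1 standard deviation change in the exogenous variable. One plausible reading of that definition is sketched below; the function name and the sample values are hypothetical, and whether the authors used sample or population standard deviations is not stated, so the use of the sample version here is an assumption.

```python
from statistics import stdev


def pct_of_sd_explained(coef, x_values, y_values):
    """Percent of one SD of y associated with a one-SD change in x.

    Computed as 100 * |coef| * sd(x) / sd(y), using sample standard
    deviations (an assumption; the source does not specify).
    """
    return 100.0 * abs(coef) * stdev(x_values) / stdev(y_values)


# Hypothetical values: a 0/1 regressor and an outcome with four times its spread.
x_values = [0.0, 1.0, 0.0, 1.0]
y_values = [0.0, 0.0, 4.0, 4.0]

share = pct_of_sd_explained(-1.0, x_values, y_values)
```

Because the measure scales the coefficient by the spread of both variables, it allows effects of regressors measured in different units, such as a law dummy and an arrest rate, to be compared on a common footing.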
N F-statistic Adjusted R2
N F-statistic Adjusted R2 Arrest rate for property crimes
the longer data series available for the nonpolice employment and payroll variables, we reran the regressions without those variables and produced similar results.

Ehrlich also raises the concern that the types of 2SLS estimates shown in Table 8.11, part A, might still be affected by spurious correlation if the measurement errors for the crime rate are serially correlated over time. (The potential difficulties for part B are much more serious.) To account for this, we reestimated the first-stage regressions predicting the arrest rate without the lagged crime rate. Doing this makes the estimated results for the shall issue law dummy even more negative and statistically significant than those already shown.

Finally, using the predicted values for the arrest rates allows us to investigate the significance of another weakness with the data. The arrest rate data not only have some missing observations but also include instances where the arrest rate is undefined because the crime rate in a county equals zero. This last issue is really only a concern for murders and rapes in low population counties. In these cases both the numerator and denominator in the arrest rate are equal to zero, and it is not clear whether we should count this as an arrest rate equal to 100 or 0 percent, neither of which seems very plausible. The previously reported evidence where regressions were run only on the larger counties sheds some light on this question since these counties do not exhibit this problem. In addition, if, as the earlier reported evidence suggests, the movement to nondiscretionary permits largely confirmed the preexisting practice in the lower population counties, one would expect relatively little change in these counties with the missing observations. However, the analysis presented in this section also allowed us to try another approach to deal with this issue.
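The undefined-arrest-rate problem can be made concrete with a short sketch. It treats a county-year with zero offenses as having no arrest rate at all, rather than forcing it to 0 or 100 percent, and then shows how such gaps could be filled with predicted values of the kind the authors go on to describe. The function names, counts, and predicted rates are all hypothetical.

```python
def arrest_rate_pct(arrests, offenses):
    """Arrests per 100 offenses, or None when there were no offenses (0/0)."""
    if offenses == 0:
        return None
    return 100.0 * arrests / offenses


def fill_undefined(rates, predicted):
    """Replace undefined rates with predicted ones, leaving observed rates as-is."""
    return [p if r is None else r for r, p in zip(rates, predicted)]


# Hypothetical (arrests, offenses) counts for three county-years.
counts = [(5, 10), (0, 0), (3, 4)]
rates = [arrest_rate_pct(a, o) for a, o in counts]

# Hypothetical first-stage predictions for the same county-years.
predicted = [48.0, 40.0, 70.0]
filled = fill_undefined(rates, predicted)
```

Keeping the undefined case as `None` makes the sample loss explicit: any regression that uses `rates` directly must drop the second observation, while one that uses `filled` retains it.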
We created predicted arrest rates for these observations using the regressions that explain the arrest rate in Table 8.11, and then we reestimated the second-stage relationships shown there for murder and rape with the new larger samples. While the coefficient on murder declines, implying a 5 percent drop when “shall issue” laws are adopted, the coefficient for rape increases, now implying over a 10 percent drop. Both coefficients are statistically significant. The effect of arrest rates also remains negative and statistically significant.

Concealed handgun laws, the method of murder, and the choice of murder victims

Do concealed handgun laws cause a substitution in the methods of committing murders? For example, it is possible that the number of gun murders rises after these laws are passed even though the total number of murders falls. While concealed handgun laws raise the cost of committing murders, murderers may also find it relatively more dangerous to kill people using nongun methods once people start carrying concealed handguns and substitute into guns to put themselves on a more even basis with their potential
prey. Using data on the method of murder from the Mortality Detail Records provided by the United States Department of Health and Human Services, we reran the murder rate regression from Table 8.3 on counties with populations over 100,000 during the period from 1982 to 1991. We then separated out murders committed with guns from all other murders. Table 8.12 shows that carrying concealed handguns appears to have been associated with approximately equal drops in both categories of murders. Carrying concealed handguns appears to make all types of murders relatively less attractive.

There is also the question of what effect concealed handgun laws have on determining which types of people are more likely to be murdered. Using the Uniform Crime Reports Supplementary Homicide Reports we were able to obtain annual state-level data from 1977 to 1992 on the percentage of victims by sex, age, and race as well as information on whether the victim and the offender knew each other (whether they were members of the same family, knew each other but were not members of the same family, were strangers, or the relationship is unknown).64 Table 8.13 implies no statistically significant relationship between the concealed handgun dummy and the victim’s sex, race, or relationships with offenders. However, while they are not quite statistically significant at the .10 level for a two-tailed t-test, two of the point estimates appear economically important and imply that in states with concealed handgun laws the percentage of victims who know their non-family offenders rose by 2.6 percentage points and that the percentage of victims where it was not possible to determine whether a relationship existed declined

Table 8.12 Changes in murder methods for counties over 100,000, 1982–91

Endogenous variables (in murders per 100,000 population)

Exogenous variables              ln (Total murders)   ln (Murder with guns)   ln (Murders by nongun methods)
Shall issue law adopted dummy    −.09074 (3.183)      −.09045 (1.707)         −.08854 (1.689)
Arrest rate for murder           −.00151 (26.15)      −.00102 (6.806)         −.00138 (7.931)
Intercept                        .63988 (.436)        −8.7993 (2.136)         −7.51556 (2.444)
N                                12,740               12,759                  8,712
F-statistic                      21.40                6.60                    4.70
Adjusted R2                      .8127                .5432                   .5065
Note: While not all the coefficient estimates are reported, all the control variables are the same as those used in Table 8.3, including year and county dummies. Absolute t-statistics are in parentheses. All regressions use weighting where the weighting is each county’s population. The first column uses the Uniform Crime Reports numbers for counties over 100,000, while the second column uses the numbers on total gun deaths available from the Mortality Detail Records, and the third column takes the difference between the Uniform Crime Report’s numbers for total murders and Mortality Detail Records of gun deaths.
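Because the dependent variables in Table 8.12 are in logs, the shall issue coefficients can be read as approximate percentage changes. The short sketch below converts each reported coefficient into both the usual approximation (coefficient times 100) and the exact change implied by a log specification, exp(b) − 1. The function name is illustrative and not part of the original analysis; only the three coefficients come from the table.

```python
import math


def pct_change_from_log_coef(beta: float) -> float:
    """Exact percent change in y implied by a dummy coefficient when the
    dependent variable is ln(y): 100 * (exp(beta) - 1)."""
    return (math.exp(beta) - 1.0) * 100.0


# Shall issue dummy coefficients as reported in Table 8.12.
shall_issue_coefs = {
    "total murders": -0.09074,
    "murders with guns": -0.09045,
    "murders by nongun methods": -0.08854,
}

for label, beta in shall_issue_coefs.items():
    approx = beta * 100.0                    # small-coefficient approximation
    exact = pct_change_from_log_coef(beta)   # exact log-to-level conversion
    print(f"{label}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For coefficients this small the two readings differ by well under half a percentage point, so the "approximately equal drops" description holds under either one.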
Table 8.13 Changes in composition of murder victims using annual state-level data from the Uniform Crime Reports Supplementary Homicide Reports, 1977–92

Endogenous variables (in percentage points), one regression per row

                                                    Exogenous variables
Endogenous variable                                 Shall issue law adopted dummy   Arrest rate for murder   Intercept          N     F-statistic   Adjusted R2
By victim’s sex
  Male                                              .3910 (.388)                    .00068 (.141)            102.20 (1.718)     804   14.27         .6409
  Female                                            −.4381 (.439)                   −.001385 (.289)          −3.2763 (.056)     804   14.51         .6450
  Unidentified                                      .0476 (.399)                    .000703 (1.227)          1.0558 (.150)      804   1.06          .0077
By victim’s race
  White                                             .0137 (.017)                    −.0202 (2.316)           152.19 (1.418)     804   45.47         .8568
  Black                                             .7031 (.575)                    .0132 (2.244)            −30.948 (.428)     804   125.09        .9435
  Hispanic                                          −.8659 (.609)                   .00327 (.478)            −7.7863 (.093)     804   35.94         .8245
By victim’s relationship with offender
  Offender is known to victim but is not in family  2.5824 (1.567)                  .0174 (2.198)            −73.4677 (.755)    804   14.96         .6525
  Offender is in the family                         −.2503 (.210)                   −.0145 (2.541)           89.843 (165.17)    804   7.84          .4790
  Offender is a stranger                            .5438 (.459)                    .0079 (1.394)            165.1719 (2.345)   804   12.87         .6150
  Relationship is unknown                           −2.8755 (1.464)                 −.0108 (1.141)           −81.55 (.703)      804   26.06         .7712

Note: While not all the coefficient estimates are reported, all the control variables are the same as those used in Table 8.4, including year and state dummies. Absolute t-statistics are in parentheses. All regressions use weighting where the weighting is each state’s population.
by 2.9 percentage points. This raises the question of whether concealed handguns cause criminals to substitute into crimes against people whom they know, and about whom they are presumably also more likely to know whether they carry concealed handguns. While the effect of age (not shown in Table 8.13) is negative (consistent with the notion that concealed handguns deter crime against adults more than young people because only adults can legally carry concealed handguns), the effect is statistically insignificant. Possibly some of the benefits from adults carrying concealed handguns are conferred to younger people who may be protected by these adults.

The arrest rate for murder variable produces more interesting results. The percentage of white victims and the percentage of victims killed by family members both declined as arrest rates rose, while the percentage of black victims and the percentage of victims killed by nonfamily members that they know both increased. The results imply that higher arrest rates have a much greater deterrence effect on murders involving whites and family members. One explanation is that whites with higher incomes face a greater increase in expected penalties for any given increase in the probability of arrest.

Arizona, Pennsylvania, and Oregon county data

One problem with the preceding results was the use of county population as a proxy for how restrictive counties were in allowing concealed handgun permits before the passage of “shall issue” laws. Since we are still going to control for county-specific levels of crime with county dummies, a better measure would have been to use the actual change in gun permits before and after the adoption of a concealed handgun law. Fortunately, we were able to get that information for three states: Arizona, Oregon, and Pennsylvania (see Table 8.14). Arizona and Oregon also provided additional information on the conviction rate and the mean prison sentence length.
However, for Oregon, because the sentence length variable is not directly comparable over time, it is interacted with all the year dummies so that we can still retain any cross-sectional information in the data. One difficulty with the Arizona prison sentence and conviction data is that they are available only from 1990 to 1995. Since the shall issue handgun law did not take effect until July 1994, it is not possible for us to control for all the other variables that we control for in the other regressions. Unlike Oregon and Pennsylvania, Arizona did not allow private citizens to carry concealed handguns prior to July 1994, so the value of concealed handgun permits equals zero for this earlier period.

The results in Table 8.15 for Pennsylvania and Table 8.16 for Oregon provide a couple of consistent patterns. The most economically and statistically important relationship involves the arrest rate: higher arrest rates consistently
Gun ownership information: Shall issue dummy Change in the (number of right-to-carry pistol permits/population 21 and over) between 1988 and each year since the law was implemented, otherwise zero Arrest rates are the ratio of arrests to offenses for a particular crime category: Violent crimes Murder Rape Aggravated assault Robbery Property crimes Auto theft Burglary Larceny Conviction rates are the ratio of convictions to arrests for a particular crime category (for Arizona it is the ratio of convictions to offenses): Violent crimes Murder Rape Aggravated assault
Variable
.1875 .02567
66.17437 100.8344 37.80920 76.37541 50.98248 21.95107 57.17941 18.99394 21.71564
25.93325 94.42969 161.7508 2.505037
576 368 507 558 490 576 566 576 576
542 358 444 536
Mean
576 576
N
Oregon
40.5691 107.128 215.635 5.61042
49.2031 97.2253 37.8298 62.5568 53.2559 7.90548 99.6343 11.0296 8.21388
.39065 .13706
S.D.
1,072 801 1,031 1,070 999 1,072 1,069 1,072 1,072
1,072 1,072
N
55.0738 92.2899 52.5967 57.4422 53.5970 21.0539 36.6929 18.8899 22.0378
.24627 .46508
Mean
Pennsylvania
Table 8.14 Oregon, Pennsylvania, and Arizona sample means and standard deviations
21.1293 64.0169 32.8287 25.6491 49.3320 7.12458 63.9266 8.50639 7.47778
.4310 1.2365
S.D.
90 90 90 90
90 90
N
16.0757 111.8722 47.4365 9.204778
.33333 2.1393
Mean
Arizona
33.85482 107.9311 81.42314 13.66225
.47404 15.02066
S.D.
Robbery Property crime Auto theft Burglary Larceny Prison sentence in months (Oregon) or years (Arizona): Murder Rape Aggravated assault Robbery Auto theft Burglary Larceny Crime rates defined per 100,000 people: Violent crimes Murder Rape Aggravated assault Robbery Property crimes Auto theft Burglary Larceny Real per capita income data (in real 1983 dollars): Personal income Unemployment insurance Income maintenance Retirement payments per person over 65
38.51352 6.530883 10.1805 15.56064 2.577337
301.6697 103.2212 154.4647 106.8709 43.40494 65.17791 46.42925 4079.07 4.52861 31.4474 196.192 50.5625 282.666 228.403 1,089.5 2,761.17
11,389.39 108.8037 131.4323 12,335.17
420 555 539 544 552
327 443 241 364 405 489 424 576 576 576 576 576 576 576 576 576
576 576 576 576
1,630.47 45.9864 40.3703 1,278.18
1621.53 6.67245 25.4623 152.965 89.5707 230.421 157.204 495.926 1,098.06
164.55 50.4662 79.7893 55.4847 20.7846 32.2003 19.0075
49.9308 13.8484 14.3673 17.7937 11.3266
1,072 1,072 1,072 1,072
1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072
11,525 130.560 149.652 13,398.9
2281.56 3.01319 15.9726 107.332 45.2030 171.485 160.831 753.668 1,367.06
2,099.44 64.0694 69.5516 2,253.29
967.430 4.12252 11.6156 78.5966 86.7830 156.683 162.572 535.022 569.563
90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90
90 90 90 90 90
429.2972 5.7787778 23.5 339.2977 60.72056 4,147.692 351.3749 950.7187 2,845.597
16.0557 8.761905 4.28876 6.852239 1.415 3.937647 66.64444
17.09185 1.370787 1.175114 2.534157 1.070667
(Continued )
254.1692 4.413259 18.90888 200.0264 71.75822 2,282.633 339.0281 563.3711 1,569.837
7.31179 5.974623 1.874496 3.108169 .3308054 1.03187 145.6599
39.17454 1.432515 3.671085 3.4627 1.308081
Population characteristics: County population County population per square mile Race and age data (% of population): Black male under 10 Black female under 10 White male under 10 White female under 10 Other male under 10 Other female under 10 Black male 10–19 Black female 10–19 White male 10–19 White female 10–19 Other male 10–19 Other female 10–19 Black male 20–29 Black female 20–29 White male 20–29 White female 20–29 Other male 20–29 Other female 20–29 Black male 30–39 Black female 30–39 White male 30–39
Variable
Table 8.14—continued
576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576 576
576 576
N
.051847 .049275 7.367641 7.012212 .322532 .307242 .052283 .047129 7.603376 7.140808 .308009 .295728 .064034 .042044 6.918945 6.767993 .280987 .273254 .048262 .032534 7.363739
74,954.98 77.46861
Mean
Oregon
.092695 .089665 .683587 .649409 .437321 .402487 .084658 .088479 .952584 .895257 .348147 .286703 .087570 .082821 1.613700 1.485155 .322992 .287497 .073100 .071081 .883651
112,573.3 219.7100
S.D.
1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072
1,072 1,072
N
.2089 .2018 6.7258 6.3567 .0525 .0536 .2515 .2276 7.7274 7.37287 .05396 .05141 .24866 .22014 7.53233 7.56037 .05412 .05431 .19163 .17443 6.81373
177,039 453.549
Mean
Pennsylvania
.439286 .434456 .808574 .761709 .040573 .039637 .468536 .473586 1.155154 1.158130 .040844 .038375 .439191 .497373 1.416936 1.094322 .078002 .060281 .354741 .419096 .850949
274,289.9 1,516.16
S.D.
N
Mean
Arizona S.D.
White female 30–39 Other male 30–39 Other female 30–39 Black male 40–49 Black female 40–49 White male 40–49 White female 40–49 Other male 40–49 Other female 40–49 Black male 50–64 Black female 50–64 White male 50–64 White female 50–64 Other male 50–64 Other female 50–64
576 576 576 576 576 576 576 576 576 576 576 576 576 576 576
7.333140 .227610 .248852 .030101 .022872 5.506716 5.456938 .148190 .157778 .028558 .024530 7.123300 7.396392 .135419 .158164
.845647 .215892 .221020 .044355 .043869 .817220 .760387 .127731 .121413 .045301 .050093 1.164997 1.084129 .115337 .126546
1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072 1,072
6.87622 .04737 .05518 .12300 .12520 5.27656 5.43223 .03571 .03901 .13316 .15634 7.27097 8.08559 .02496 .03093
.837649 .050606 .045324 .244123 .311716 .727481 .650546 .030029 .030711 .305455 .404990 .814601 1.031230 .021059 .021638
Table 8.15 Using Pennsylvania data on the number of permits issued to measure the differential impact of Pennsylvania’s 1989 “shall issue” law on different counties: data for counties with populations over 200,000

Endogenous variables (in crimes per 100,000 population), one regression per row

Exogenous variables:
(1) Change in the (number right-to-carry pistol permits/population over 21) between 1988 and each year since the law was implemented
(2) Arrest rate for the crime category corresponding to the appropriate endogenous variable
(3) Population per square mile
(4) Real per capita personal income

Endogenous variable             (1)                    (2)
ln (Violent crime rate)         −.0527 (1.653) 10%     −.00785 (7.371) 25%
ln (Murder rate)                −.267 (2.759) 21%      −.00365 (6.364) 15%
ln (Rape rate)                  −.0567 (1.725) 6%      −.000804 (.668) 2%
ln (Aggravated assault rate)    −.0481 (1.656) 9%      −.00763 (6.413) 28%
ln (Robbery rate)               .0124 (.265) 2%        −.000836 (7.031) 24%
ln (Property crime rate)        −.00116 (.060) 1%      −.0041 (2.057) 8%
ln (Auto theft rate)            .0146 (.337) 2%        −.00065 (1.185) 4%
ln (Burglary rate)              −.0140 (.562) 4%       −.0112 (5.138) 25%
ln (Larceny rate)               .0073 (.37) 2%         .00126 (.641) 2%

Endogenous variable             (3)                 (4)                   Intercept         N     F-statistic   Adjusted R2
ln (Violent crime rate)         −.000386 (.832)     .0000376 (1.074)      −15.352 (.348)    264   219.4         .9841
ln (Murder rate)                .00262 (1.991)      −.000016 (.156)       118.93 (1.069)    264   38.70         .9150
ln (Rape rate)                  .000987 (1.087)     .000066 (1.071)       −67.015 (.889)    264   42.49         .9221
ln (Aggravated assault rate)    −.00039 (.600)      .0000197 (.452)       34.752 (.671)     264   75.00         .9549
ln (Robbery rate)               .0005395 (.835)     .000047 (1.055)       −52.529 (.993)    264   227.51        .9848
ln (Property crime rate)        .00037 (1.283)      −.0000485 (2.611)     −10.31 (.467)     264   111.04        .9691
ln (Auto theft rate)            −.000171 (.275)     −.000067 (1.599)      27.816 (.557)     264   225.8         .9846
ln (Burglary rate)              .000518 (1.442)     −.000034 (1.396)      −29.40 (1.016)    264   87.43         .9609
ln (Larceny rate)               .00077 (2.601)      −.00004 (2.025)       6.2484 (.269)     264   83.19         .9591

Note: Absolute t-statistics are in parentheses, and the percentage reported below is the percent of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable. While not all the coefficient estimates are reported, all the control variables are the same as those used in Table 8.3, including year and county dummies. All regressions use weighted least squares where the weighting is each county’s population. The use of SHALL*POPULATION variable that was used in the earlier regressions instead of the change in right-to-carry permits variable was tried here and produced very similar results. We also tried controlling for either the robbery or burglary rates, but we obtained very similar results.
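The percentages reported alongside the coefficients in these tables are standardized effects: the share of a standard deviation in the (log) crime rate accounted for by a 1 standard deviation change in the regressor. A minimal sketch of that calculation follows. The function name and the two standard deviations in the example call are hypothetical placeholders chosen for illustration; only the coefficient is taken from Table 8.15.

```python
def pct_of_sd_explained(coef: float, sd_x: float, sd_y: float) -> float:
    """Percent of one standard deviation of the endogenous variable that is
    accounted for by a one standard deviation change in the exogenous
    variable: 100 * |coef| * sd_x / sd_y."""
    if sd_y <= 0:
        raise ValueError("sd_y must be positive")
    return abs(coef) * sd_x / sd_y * 100.0


# Coefficient on the arrest rate in the ln(violent crime rate) regression
# (Table 8.15). The standard deviations below are HYPOTHETICAL inputs used
# only to show the arithmetic; they are not the sample values.
arrest_coef = -0.00785
sd_arrest_rate = 20.0      # hypothetical SD of the arrest rate
sd_ln_crime_rate = 0.6     # hypothetical SD of ln(violent crime rate)

share = pct_of_sd_explained(arrest_coef, sd_arrest_rate, sd_ln_crime_rate)
print(f"{share:.1f}% of a standard deviation")
```

Reading the tables this way makes coefficients on variables with very different units, such as permits per capita and arrest rates, directly comparable.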
Table 8.16 Oregon data on the number of permits issued, the conviction rate, and prison sentence lengths

Endogenous variables (in crimes per 100,000 population), one regression per row

Exogenous variables:
(1) Change in the (number right-to-carry pistol permits/population over 21) between 1988 and each year since the law was implemented
(2) Arrest rate for the crime category corresponding to the appropriate endogenous variable
(3) Conviction rate conditional on arrest for the crime category corresponding to the appropriate endogenous variable
(4) Population per square mile
(5) Real per capita personal income

Endogenous variable             (1)                     (2)                    (3)
ln (Murder rate)                −.3747 (1.598) 3%       −.00338 (6.785) 17%    −.00208 (6.026) 11%
ln (Rape rate)                  −.0674 (.486) 1%        −.00976 (9.284) 19%    −.00093 (7.668) 10%
ln (Aggravated assault rate)    −.0475 (.272) .5%       −.00442 (7.279) 19%    −.01511 (2.150) 6%
ln (Robbery rate)               −.04664 (.385) .28%     −.00363 (4.806) 9%     −.00190 (4.465) 4%
ln (Auto theft rate)            .1172 (1.533) 1%        −.00036 (1.481) 3%     −.00373 (3.031) 4%
ln (Burglary rate)              .02655 (.536) 1%        −.00679 (4.458) 16%    −.00274 (4.297) 10%
ln (Larceny rate)               −.0936 (2.328) 3%       −.00936 (6.764) 16%    −.00859 (3.140) 20%

Endogenous variable             (4)                (5)                   Intercept           N     F-statistic   Adjusted R2
ln (Murder rate)                −.00333 (.415)     −.000138 (.769)       6.1725 (.342)       250   5.74          .6620
ln (Rape rate)                  .0063 (.059)       −.000038 (.463)       8.2432 (.496)       393   16.61         .8113
ln (Aggravated assault rate)    .01177 (2.430)     −.000162 (1.301)      84.464 (3.131)      239   38.79         .9439
ln (Robbery rate)               .0079 (2.551)      −.000108 (1.542)      −16.303 (1.114)     337   97.94         .9677
ln (Auto theft rate)            .00062 (.367)      .000037 (.965)        2.6213 (.326)       403   156.02        .9766
ln (Burglary rate)              .00425 (3.937)     .000021 (.816)        −11.2489 (2.169)    487   89.90         .9522
ln (Larceny rate)               −.0003 (.319)      8.29E–6 (.407)        20.047 (4.748)      422   86.81         .9569

Note: Absolute t-statistics are in parentheses, and the percentage reported below that is the percent of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable. We also controlled for prison sentence length, but the different reporting practices used by Oregon over this period makes its use somewhat problematic. To deal with this problem the prison sentence length variable was interacted with year dummy variables. Thus while the variable is not consistent over time it is still valuable in distinguishing penalties across counties at a particular point in time. While not all the coefficient estimates are reported, all the remaining control variables are the same as those used in Table 8.3, including year and county dummies. The categories for violent and property crimes are eliminated because the mean prison sentence data supplied by Oregon did not allow us to use these two categories. All regressions use weighted least squares where the weighting is each county’s population.
imply lower crime rates, and in 12 of the 16 regressions the effect is statistically significant. Five cases for Pennsylvania (violent crime, murder, aggravated assault, robbery, and burglary) show that arrest rates explain at least 15 percent of a standard deviation change in crime rates. Automobile theft is the only crime for which the arrest rate is insignificant in both tables. For Pennsylvania, murder and rape are the only crimes where a 1 standard deviation change in per capita concealed handgun permits explains a greater percentage of a standard deviation in crime rates than it does for the arrest rate. However, increased concealed handgun usage explains more than 10 percent of a standard deviation change in murder, rape, aggravated assault, and burglary rates. For six of the nine regressions, the concealed handgun variable for Pennsylvania exhibits the same coefficient signs that were shown for the national data. Violent crimes, with the exception of robbery, show that higher concealed handgun use lowers crime rates, while property crimes exhibit very little relationship. Concealed handgun use only explains about one-tenth the variation for property crimes that it explains for violent ones.65

The regressions for Oregon weakly imply a similar relationship between concealed handgun use and crime, but the effect is only statistically significant in one case: larceny, which is also the only crime category where the negative concealed handgun coefficient differs from our previous findings. The Oregon data also show that higher conviction rates consistently result in significantly lower crime rates. A 1 standard deviation change in conviction rates explains 4–20 percent of a 1 standard deviation change in the corresponding crime rates.
However, increases in conviction rates appear to produce a smaller deterrent effect than increases in arrest rates for five of the seven crime categories.66 The biggest differences between the deterrent effects of arrest and conviction rates produce an interesting pattern. For rape, increasing the arrest rate by 1 percentage point produces more than 10 times the deterrent effect of increasing the conviction rate conditional on arrest by 1 percentage point. The reverse is true for auto theft, where a 1 percentage point increase in convictions reduces crime by about 10 times more than the same increase in arrests. These results are consistent with arrests producing large shaming or reputational penalties.67 In fact, the existing evidence shows that the reputational penalties from arrest and conviction can dwarf the other legally imposed penalties.68 However, while the literature has not separated out whether these drops are occurring because of arrest or conviction, these results are consistent with the reputational penalties for arrests alone being significant for at least some crimes.

One possible explanation for these results is that Oregon simultaneously passed both the “shall issue” concealed handgun law and a waiting period. Given the very long waiting period imposed by the Oregon law (15 days), the regressions in Table 8.10 imply that such a waiting period increases murder by 4.8 percent, rape by 2 percent, and robbery by 5.9 percent. At least in the case of murder, which is almost statistically significant in any case, combining the two sets of regressions implies that the larger drop in murder that would have
John R. Lott, Jr., and David B. Mustard
been observed in the absence of the Oregon waiting period would have produced a t-statistic for murder of 1.8. The results for the prison sentences are not shown, but the t-statistics are frequently near zero and the coefficients indicate no clear pattern. One possible explanation for this result is that all the changes in sentencing rules produced a great deal of noise in this variable not only over time but also across counties. For example, after 1989 whether a crime was prosecuted under the pre- or post-1989 rules depended on when the crime took place. If the average time between when the offense occurred and when the prosecution took place differs across counties, the recorded prison sentence length could vary even if the actual time served was the same. Finally, the much more limited data set for Arizona used in Table 8.17 produces no significant relationship between the change in concealed handgun permits and the various measures of crime rates. In fact, the coefficient signs themselves indicate no consistent pattern, with the 14 coefficients being equally divided between negative and positive signs, though six of the specifications imply that a 1 standard deviation change in the concealed handgun permits explains at least 8 percent of a 1 standard deviation change in the corresponding crime rates. The results involving either the mean prison sentence length for those sentenced in a particular year or the actual time served for those ending their sentences also imply no consistent relationship between prison and crime rates. While the coefficients are negative in 11 of the 14 specifications, they provide weak evidence of the deterrent effect of longer prison terms: only two coefficients are negative and statistically significant. Since the Brady Law also went into effect during this sample period, we reran Table 8.17 using a dummy variable for the Brady Law. 
Both the coefficients for the change in permits and the Brady Law dummy variable are almost always insignificant, except for the case of aggravated assault, where the Brady Law is both positive and significant, implying that it increased the number of aggravated assaults by 24 percent.

Overall, the Pennsylvania results provide more evidence that concealed handgun ownership reduces violent crime, murder, rape, and aggravated assault, and in the case of Oregon larceny decreases as well. While the Oregon data imply that the effect of the change in handgun permits on murder is statistically significant only at the 11 percent level for a two-tailed t-test, the point estimate is extremely large economically, implying that a doubling of permits reduces murder rates by 37 percent. The other coefficients for Pennsylvania and Oregon imply no significant relationship between the change in concealed handgun ownership and crime rates. The evidence from the small sample for Arizona implies no relationship between crime and concealed handgun ownership. All the results also support the claim that higher arrest and conviction rates deter crime, though, possibly in part due to the relatively poor quality of the data, no systematic effect appears to occur from longer prison sentences.

Combining these individual state estimates with the National Institute of Justice’s measures of the losses that victims bear from crime allows us to
−.0803 −.0095 (1.397) (.334) 8% 2%
(6)
(7)
.0039 (.551) 3%
(8)
−.0019 (.222) 2%
(9)
−.0076 (.940) 9%
(10)
.0007 (.225)
(12)
(14)
(Continued )
−.3298 (13.80) 60%
−.0003 −.0005 (.094) (.185) 1% 1%
(13)
9% −.10032 −.1037 −.325 (14.44) (14.62) (12.1) 28% 29% 60%
.0006 (.210) 8%
(11)
ln(Robbery rate) ln(Auto theft rate) ln(Burglary rate) ln(Larceny rate)
.0051 −.00516 .0037 (1.265) (1.291) (.574) 9% 9% 3%
(5)
ln(Aggravated)
−.00399 −.0055 −.0053 −.0453 −.0429 −.0111 −.0110 −.1373 −.1605 (6.798) (7.558) (7.014) (13.51) (12.18) (9.553) (9.391) (1.678) (1.879) 30% 27% 26% 72% 67% 21% 20% 37% 43%
.0025 (.311) 2.7%
(4)
(3)
(1)
(2)
ln(Rape rate)
ln(Murder rate)
Change in the (number right-tocarry pistol permits/population) from the zero allowed before the law and each year since the law was implemented; the numbers for 1994 were multiplied by .5 .0016 (.209) 1.7% Conviction rate for the crime category corresponding to the appropriate endogenous variable −.0039 (7.677) 29%
Exogenous variables
Endogenous variables (in crimes per 100,000 population)
Table 8.17 Arizona data on the number of permits issued, the conviction rate, and prison sentence lengths, 1990–95
−.0178 (.602) 2% −.4459 (3.274) 1.477 (5.262) 75 24.86 .8856
...
−.4748 (3.595) 1.4750 (5.095) 78 27.64 .8925
(6)
−.1424 (2.164) 4.341 (28.46) 89 56.48 .9380
... −.1361 (1.942) 4.365 (26.30) 86 38.79 .9439
−.0170 (.464) 2%
−.0261 . . . (1.155) 6%
(5)
ln(Aggravated)
−.1411 (1.288) 1.838 (5.157) 64 81.33 .9656
...
−.0095 (.629) 1%
(7)
−.1514 (1.477) 1.753 (4.203) 68 76.67 .9629
−.0221 (.871) 2%
...
(8)
−.413 (2.603) 3.432 (5.061) 60 32.12 .9239
...
−.0087 (.055) .2%
(9)
−.4019 (2.433) 2.5099 (7.094) 89 39.60 .9330
.0317 (.463) 2%
...
(10)
−.0835 (1.759) 5.467 (38.66) 84 109.61 .9691
...
−.0084 (1.759) .7%
(11)
(.319) 6.873 (57.475) 84 118.24 .9713
(.631) 6.621 (53.03) 85 99.75 .9658
(1.670) 5.4296 (5.430) 84 101.18 .9666
...
(14)
−.0952 (3.479) 11% −.0313 −.00030 ...
−.018 (.936) 3%
(13)
−.0119 (.405) .8% −.0798
...
(12)
ln(Robbery rate) ln(Auto theft rate) ln(Burglary rate) ln(Larceny rate)
Note: Absolute t-statistics are in parentheses, and the percentage reported below that is the percent of a standard deviation change in the endogenous variable that can be explained by a 1 standard deviation change in the exogenous variable. All variables, except for the county’s population and the year and county dummies, have been reported. The categories for violent and property crimes are eliminated because the mean prison sentence data supplied by Oregon did not allow us to use these two categories. All regressions use weighting where the weighting is each county’s population. Odd-numbered columns control for mean prison sentence, while even-numbered columns control for time actually served for those leaving prison.
...
.0052 (.364) 2%
(4)
(3)
(1)
(2)
ln(Rape rate)
ln(Murder rate)
Mean prison sentence length for those sentenced to prison in that year −.01033 . . . (1.457) 5% Time served for those ending their prison terms in that year ... .0041 (.18) 4% Population per square mile −.1014 −.0791 (.826) (.569) Intercept 1.208 .926 (3.594) (1.765) N 74 70 F-statistic 17.26 14.50 .8367 .8182 Adjusted R2
Exogenous variables
Endogenous variables (in crimes per 100,000 population)
Table 8.17—continued
attach a monetary value to the marginal social benefit from an additional concealed handgun permit and to compare this with the private costs of gun ownership. While the results for Arizona imply no real savings from reduced crime, the estimates for Pennsylvania indicate that potential victims’ costs are reduced by $5,079 for each additional concealed handgun permit, and for Oregon the savings are $3,439 per permit. As with the discussion of Table 8.5, the results are largely driven by the effect that concealed handguns have in lowering the murder rate (with savings of $4,986 for Pennsylvania and $3,202 for Oregon).

These estimated gains appear to far exceed the private costs of owning a concealed handgun. The purchase price of concealed handguns ranges from $25 for the least expensive .25-caliber pistols to $719 for the newest ultracompact 9 millimeter models; the permit filing fees can range from $19 every 5 years in Pennsylvania to a first-time $65 fee with subsequent 5-year renewals at $50 in Oregon; and several hours of supervised safety training are required in Oregon. Assuming a 5 percent real interest rate and the ability to amortize payments over 10 years, purchasing a $300 handgun and paying the licensing fees every 5 years in Pennsylvania implies a yearly cost of only $43, excluding the time costs incurred. The estimated expenses for Oregon are undoubtedly higher because of both the higher fees and the time costs and fees involved in obtaining certified safety instruction, but even if these annual costs double, they are still quite small compared to the social benefits. While any ammunition purchases and additional annual training would increase annualized costs, the very long life span of guns and the ability to resell them work to reduce the above estimate. The results imply that permitted handguns are being obtained at much lower than optimal rates, perhaps because of the important externalities not directly captured by the handgun owners themselves.
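The $43 yearly cost for Pennsylvania can be reproduced with a standard level-payment annuity calculation. The sketch below is one way to arrive at that figure, under assumptions that are not spelled out in the text: the $19 fee is paid at purchase and again at year 5, all outlays are discounted to the present at 5 percent, and the total is spread over 10 equal end-of-year payments. The function names are illustrative.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Level end-of-year payment per dollar of present value."""
    return rate / (1.0 - (1.0 + rate) ** -years)


def annualized_cost(gun_price: float, fee: float, fee_interval: int,
                    rate: float = 0.05, years: int = 10) -> float:
    """Annualized cost of a handgun plus recurring permit fees.

    Assumption (not stated in the source): fees are paid at year 0 and
    then every `fee_interval` years within the amortization horizon.
    """
    pv_fees = sum(fee / (1.0 + rate) ** t
                  for t in range(0, years, fee_interval))
    return (gun_price + pv_fees) * annuity_factor(rate, years)


# Pennsylvania figures from the text: $300 handgun, $19 fee every 5 years.
pennsylvania = annualized_cost(gun_price=300.0, fee=19.0, fee_interval=5)
print(f"Annualized cost: ${pennsylvania:.2f}")
```

Under these assumptions the result rounds to $43 per year, roughly two orders of magnitude below the estimated $5,079 reduction in victim costs per additional permit.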
V. Accidental deaths from handguns

Even if “shall issue” handgun permits lower murder rates, the question of what happens to accidental deaths still remains. Possibly, with more people carrying handguns, accidents may be more likely to happen. Earlier we saw that the number of murders prevented exceeded the entire number of accidental deaths. In the case of suicide, carrying concealed handguns increases the probability that a gun will be available to commit suicide with when an individual feels particularly depressed, and thus it could conceivably increase the number of suicides. As Table 8.2 showed, while only a small portion of accidental deaths are attributable to handgun laws, there is still the question whether concealed handgun laws affected the total number of deaths through their effect on accidental deaths.

To get a more precise answer to this question, Table 8.18 uses county-level data from 1982 to 1991 to test whether allowing concealed handguns increased accidental deaths. Data are available from the Mortality Detail
Table 8.18 Did carrying concealed handguns increase the number of accidental deaths? County-level data, 1982–91

Endogenous variables (in deaths per 100,000 population):
(1) Ordinary least squares: ln (Accidental deaths from handguns)
(2) Ordinary least squares: ln (Accidental deaths from nonhandgun sources)
(3) Tobit: Accidental deaths from handguns
(4) Tobit: Accidental deaths from nonhandgun sources

Exogenous variables                (1)                  (2)                   (3)                  (4)
Shall issue law adopted dummy      .00478 (.096)        .0980 (1.706)         .574 (.743)          1.331 (.840)
Population per square mile         −.0007 (6.701)       .000856 (7.063)       −.0000436 (.723)     −.0001635 (1.083)
Real per capita personal income    .0000267 (1.559)     −.000057 (2.882)      .0000436 (1.464)     −.009046 (6.412)
Intercept or ancillary parameter   −3.376 (1.114)       −8.7655 (2.506)       7.360841 (44.12)     29.36 (201.7)
N                                  23,271               23,271                23,271               23,271
F-statistic                        3.98                 3.91
Adjusted R2                        .2896                .2846
Log likelihood                                                                −7,424.6             −109,310.6
Left-censored observations                                                    21,897               680

Note: While not all the coefficient estimates are reported, all the control variables are the same as those used in Table 8.3, including year and county dummies. Absolute t-statistics are in parentheses. All regressions weight the data by each county’s population.
Records (provided by the United States Department of Health and Human Services) for all counties from 1982 to 1988 and for counties over 100,000 population from 1989 to 1991. The specifications are identical to those shown in all the previous tables with the exceptions that we no longer include variables related to arrest or conviction rates and that the endogenous variables are replaced with a measure of the number of either accidental deaths from handguns or accidental deaths from all other nonhandgun sources. While there is some evidence that the racial composition of the population and the level of income maintenance payments affect accident rates, the coefficient of the shall issue dummy is both quite small economically and insignificant. The point estimates for the first specification imply that accidental handgun deaths rose by about .5 percent when concealed handgun laws were passed. With only 156 accidental handgun deaths during 1988 (22 accidental handgun deaths occurred in states with “shall issue” laws), this point estimate implies that implementing a concealed handgun law in those states which currently do not have it would produce less than one more death (.851 deaths). Given the very small number of accidental handgun deaths in the United States, the vast majority of counties have an accidental handgun death rate of zero, and thus using ordinary least squares is not the appropriate method of estimating these relationships. To deal with this, the last two columns in Table 8.18 reestimate these specifications using Tobit procedures. However, because of limitations in statistical packages we were no longer able to control for all the county dummies and opted to rerun these regressions with only state dummy variables. 
While the coefficient for the concealed handgun law dummy variable is not statistically significant, with 186 million people living in states without these laws in 1992,69 the third specification implies that implementing the law across those remaining states would have resulted in about 9 more accidental handgun deaths. Combining this finding with the earlier estimates from Tables 8.3 and 8.4, if the rest of the country had adopted concealed handgun laws in 1992, the net reduction in total deaths would have been between approximately 1,405 and 1,583.
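The reason for moving from ordinary least squares to Tobit when most counties record a rate of zero can be illustrated with the censored-normal log-likelihood that a Tobit estimator maximizes. The sketch below is only an illustration under stated assumptions: the function names, the single regressor, and the censoring point `lower` are hypothetical and are not taken from the authors' estimation code or statistical package.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, x, intercept, slope, sigma, lower=0.0):
    """Log-likelihood of a Tobit model censored from below at `lower`.

    Observations at or below `lower` contribute the probability that the
    latent outcome falls at or below the censoring point; all other
    observations contribute the usual normal density. The one-regressor
    form is an assumption made to keep the sketch short.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for yi, xi in zip(y, x):
        mean = intercept + slope * xi
        if yi <= lower:
            # Censored observation: P(latent outcome <= lower).
            total += math.log(normal_cdf((lower - mean) / sigma))
        else:
            # Uncensored observation: normal density of the residual.
            total += math.log(normal_pdf((yi - mean) / sigma) / sigma)
    return total


# Hypothetical data: three counties with a zero rate, two with positive rates.
rates = [0.0, 0.0, 0.0, 0.4, 1.1]
law_dummy = [0.0, 1.0, 0.0, 1.0, 0.0]
value = tobit_loglik(rates, law_dummy, intercept=-0.2, slope=0.1, sigma=1.0)
```

A Tobit estimate would be obtained by maximizing this function over the intercept, slope, and sigma. Ordinary least squares, by contrast, treats every zero as an exact observation of the outcome, which is why it is a poor fit when zeros dominate the sample.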
VI. Conclusion
Allowing citizens without criminal records or histories of significant mental illness to carry concealed handguns deters violent crimes and appears to produce an extremely small and statistically insignificant change in accidental deaths. If the rest of the country had adopted right-to-carry concealed handgun provisions in 1992, at least 1,414 murders and over 4,177 rapes would have been avoided. On the other hand, consistent with the notion that criminals respond to incentives, county-level data provide evidence that concealed handgun laws are associated with increases in property crimes involving stealth and where the probability of contact between the criminal
and the victim is minimal. The largest population counties where the deterrence effect from concealed handguns on violent crimes is the greatest also experienced the greatest substitution into property crimes. The estimated annual gain in 1992 from allowing concealed handguns was over $5.74 billion. The study provides the first estimates of the annual social benefit from private expenditures on crime reduction, with an additional concealed handgun permit reducing total victim losses by up to $5,000. The results imply that permitted handguns are being obtained at much lower than optimal rates in two of the three states for which we had the relevant data, perhaps because of the important externalities that are not captured by the individual handgun owners. Our evidence implies that concealed handguns are the most cost-effective method of reducing crime thus far analyzed by economists, providing a higher return than increased law enforcement or incarceration, other private security devices, or social programs like early educational intervention.70 The data also supply dramatic evidence supporting the economic notion of deterrence. Higher arrest and conviction rates consistently and dramatically reduce the crime rate. Consistent with other recent work,71 the results imply that increasing the arrest rate, independent of the probability of eventual conviction, imposes a significant penalty on criminals. Perhaps the most surprising result is that the deterrent effect of a 1 percentage point increase in arrest rates is much larger than the same increase in the probability of conviction. Also surprising is that while longer prison lengths usually implied lower crime rates, the results were normally not statistically significant. This study incorporates a number of improvements over previous studies on deterrence, and it represents a very large change in how gun studies have been done. 
This is the first study to use cross-sectional time-series evidence for counties at both the national level and for individual states. Instead of simply using cross-sectional state- or city-level data, our study has made use of the much bigger variations in arrest rates and crime rates between rural and urban areas, and it has been possible to control for whether the lower crime rates resulted from the gun laws themselves or other differences in these areas (for example, low crime rates) which led to the adoption of these laws. Equally important, our study has allowed us to examine what effect concealed handgun laws have on different counties even within the same state. The evidence indicates that the effect varies both with a county’s level of crime and with its population.
Data appendix
The numbers of arrests and offenses for each crime in every county from 1977 to 1992 were provided by the Uniform Crime Report (UCR). The UCR program is a nationwide, cooperative statistical effort of over 16,000 city, county, and state law enforcement agencies to compile data on crimes that are
reported to them. During 1993, law enforcement agencies active in the UCR Program represented over 245 million U.S. inhabitants, or 95 percent of the total population. The coverage amounted to 97 percent of the U.S. population living in metropolitan statistical areas (MSAs) and 86 percent of the population in non-MSA cities and in rural counties.72 The Uniform Crime Reports Supplementary Homicide Reports supplied the data on the victim’s sex and race and whatever relationship might have existed between the victim and the offender.73 The regressions report results from a subset of the UCR data set, though we also ran the regressions with the entire data set. The main differences were that the effects of concealed handgun laws on murder were greater than what is shown in this paper and the effects on rape and aggravated assault were smaller. Observations were eliminated because of changes in reporting practices or definitions of crimes (see Crime in the United States (1977–92)). For example, from 1985 to 1994 Illinois adopted a unique “gender-neutral” definition of sex offenses. Another example involves Cook County, Illinois, from 1981 to 1984 where there was a large jump in reported crime because there was a change in the way officers were trained to report crime. The additional observations that either were never provided or were dropped from the data set include Arizona (1980), Florida (1988), Georgia (1980), Kentucky (1988), and Iowa (1991). 
The counties with the following cities were also eliminated: violent crime and aggravated assault for Steubenville, Ohio (1977–89); violent crime and aggravated assault for Youngstown, Ohio (1977–87); violent crime, property crime, aggravated assault, and burglary for Mobile, Alabama (1977–85); violent crime and aggravated assault for Oakland, California (1977–90); violent crime and aggravated assault for Milwaukee, Wisconsin (1977–85); all crime categories for Glendale, Arizona (1977–84); violent crime and aggravated assault for Jackson, Mississippi (1977–83); violent crime and aggravated assault for Aurora, Colorado (1977–82); violent crime and aggravated assault for Beaumont, Texas (1977–82); violent crime and aggravated assault for Corpus Christi, Texas (1977–82); violent crime and rape for Macon, Georgia (1977–81); violent crime, property crime, robbery, and larceny for Cleveland, Ohio (1977–81); violent crime and aggravated assault for Omaha, Nebraska (1977–81); all crime categories for Little Rock, Arkansas (1977–79); all crime categories for Eau Claire, Wisconsin (1977–78); all crime categories for Green Bay, Wisconsin (1977). For all of the different crime rates, except for the Supplementary Homicide Data, if the true rate equals zero, we added .1 before we took the natural log of those values. For the accident rates and the Supplementary Homicide Data, if the true rate equals zero, we added .01 before we took the natural log of those values.74 The original Uniform Crime Report data set did not have arrest data for Hawaii in 1982. These missing observations were supplied to us by the Hawaii Uniform Crime Report program. In the original data set, a few observations
also had two listings for the same county and year identifiers. The incorrect observations were deleted from the data. The number of police in a state, which of those police have the power to make arrests, and police payrolls for a state by type of police officer are available for 1982–92 from the U.S. Department of Justice’s Expenditure and Employment Data for the Criminal Justice System. The data on age, sex, and racial distributions estimate the population in each county on July 1 of the respective years. The population is divided into 5-year segments, and race is categorized as white, black, and neither white nor black. The population data, with the exception of 1990 and 1992, were obtained from the Bureau of the Census.75 The estimates use modified census data as anchor points and then employ an iterative proportional fitting technique to estimate intercensal populations. The process ensures that the county-level estimates are consistent with estimates of July 1 national and state populations by age, sex, and race. The age distributions of large military installations, colleges, and institutions were estimated by a separate procedure. The counties for which special adjustments were made are listed in the report.76 The 1990 and 1992 estimates have not yet been completed by the Bureau of the Census and made available for distribution. We estimated the 1990 data by taking an average of the 1989 and 1991 data. We estimated the 1992 data by multiplying the 1991 populations by the 1990–91 growth rate of each county’s populations. Data on income, unemployment, income maintenance, and retirement were obtained from the Regional Economic Information System. Income maintenance includes Supplemental Security Income, Aid to Families with Dependent Children, and food stamps. Unemployment benefits include state unemployment insurance compensation, Unemployment for Federal Employees, unemployment for railroad employees, and unemployment for veterans. 
Retirement payments include Old Age, Survivors, and Disability Insurance, federal civil employee retirement payments, military retirement payments, state and local government employee retirement payments, and workers compensation payments (both federal and state). Nominal values were converted to real values by using the consumer price index.77 The index uses the average consumer price index for July 1983 as the base period. There were 25 observations whose county codes did not match any counties listed in the ICPSR code book. Those observations were deleted from the sample. Data concerning the number of concealed weapons permits for each county were obtained from a variety of sources. The Pennsylvania data were obtained from Alan Krug. Mike Woodward of the Oregon Law Enforcement and Data System provided the Oregon data for 1991 and after. The number of permits available for Oregon by county in 1989 was provided by the sheriffs’ departments of the individual counties. Cari Gerchick, deputy county attorney for Maricopa County in Arizona, provided us with the Arizona county-level conviction rates, prison sentence lengths, and concealed handgun permits from 1990 to 1995. The National Rifle Association
provided data on their membership by state from 1977 to 1992. Information on the dates at which states enacted enhanced sentencing provisions for crimes committed with deadly weapons was obtained from Marvell and Moody.78 In the first year that a law’s dummy variable switches on, the dummy is weighted by the portion of that year in which the law was in effect. For the Arizona regressions, the Brady Law dummy for 1994 is weighted by the percentage (83 percent) of the year that it was in effect. The Bureau of the Census provided data on the area in square miles for each county. The number of total and firearm unintentional injury deaths was obtained from annual issues of Accident Facts and The Vital Statistics of the United States. The classification of types of weapons is in International Statistical Classification of Diseases and Related Health Problems, Tenth Edition, Volume 1. The handgun category includes guns for single-hand use, pistols, and revolvers. The total includes all other types of firearms. Finally, while our regressions use the ICPSR’s estimates of arrest rates, after this paper was accepted we discovered that the ICPSR may have accidentally recorded some missing data on the number of arrests as zero. Working with the ICPSR and the FBI we attempted to correct this problem, and doing so usually increases the significance and size of the shall issue dummies.
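Three of the data-preparation rules described in this appendix are mechanical enough to sketch in code: the zero adjustment applied before taking logs, the estimates of the 1990 and 1992 populations, and the partial weighting of a law dummy in its first year. The functions below are a minimal illustration with hypothetical names, not the authors' code. In particular, reading the "1990–91 growth rate" as the ratio of the 1991 population to the estimated 1990 population is an assumption.

```python
import math


def log_rate(rate, zero_offset):
    """Natural log of a rate, replacing a true zero with `zero_offset`.

    The appendix uses .1 for the crime rates and .01 for the accident
    rates and the Supplementary Homicide Data.
    """
    if rate < 0:
        raise ValueError("rates cannot be negative")
    return math.log(rate if rate > 0 else zero_offset)


def estimate_missing_populations(pop_1989, pop_1991):
    """Estimate the 1990 and 1992 populations of one county.

    1990 is the average of 1989 and 1991. 1992 applies the 1990-91 growth
    factor to the 1991 population (assumed here to be pop_1991 / pop_1990).
    """
    pop_1990 = (pop_1989 + pop_1991) / 2.0
    growth_factor = pop_1991 / pop_1990
    pop_1992 = pop_1991 * growth_factor
    return pop_1990, pop_1992


def phased_dummy(year, first_year, first_year_fraction):
    """Law dummy equal to 0 before the law, the in-effect share of the
    first year during that year, and 1 in every later year."""
    if year < first_year:
        return 0.0
    if year == first_year:
        return first_year_fraction
    return 1.0


# Hypothetical county with 100,000 residents in 1989 and 104,000 in 1991.
pop_1990, pop_1992 = estimate_missing_populations(100000.0, 104000.0)

# The 83 percent weight is the one reported above for the Brady Law in 1994.
brady_1994 = phased_dummy(1994, 1994, 0.83)
```

Keeping the zero offset as an argument rather than a constant reflects the two different values used for different outcome series.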
Notes
* The authors would like to thank Gary Becker, Phil Cook, Clayton Cramer, Gertrud Fremling, Ed Glaeser, Hide Ichimura, Don Kates, Gary Kleck, David Kopel, William Landes, David McDowall, Derek Neal, Bob Reed, and Dan Polsby and the seminar participants at the Cato Institute, University of Chicago, Emory University, Fordham University, Harvard University, Northwestern University, Stanford University, Valparaiso University, American Law and Economics Association meetings, American Society of Criminology, and the Western Economic Association meetings for their unusually helpful comments.
1 Editorial, Cincinnati Enquirer, January 23, 1996, at A8.
2 See P. J. Cook, The Role of Firearms in Violent Crime, in Criminal Violence 236–91 (M. E. Wolfgang & N. A. Werner eds. 1982); and Franklin Zimring, The Medium Is the Message: Firearm Caliber as a Determinant of Death from Assault, 1 J. Legal Stud. 97 (1972), for these arguments.
3 P. J. Cook, The Technology of Personal Violence, 14 Crime and Justice: Annual Review of Research 57, 56 n.4 (1991). It is very easy to find people arguing that concealed handguns will have no deterrence effect. H. Richard Uviller, Virtual Justice 95 (1996), writes that “[m]ore handguns lawfully in civilian hands will not reduce deaths from bullets and cannot stop the predators from enforcing their criminal demands and expressing their lethal purposes with the most effective tool they can get their hands on.”
4 Gary Kleck & Marc Gertz, Armed Resistance to Crime: The Prevalence and Nature of Self-Defense with a Gun, 86 J. Crim. L. & Criminology 150, 153, 180, 180–82 (Fall 1995). Kleck and Gertz’s survey of 10 other nationwide polls implies a range of 764,036–3,609,682 defensive uses of guns per year. Recent evidence confirms other numbers from Kleck and Gertz’s study. For example, Annest et al. estimate that 99,025 people sought medical treatment for nonfatal firearm
woundings. When one considers that many criminals will not seek treatment for wounds and that not all wounds require medical treatment, Kleck and Gertz’s estimate of 200,000 woundings seems somewhat plausible, though even Kleck and Gertz believe that this is undoubtedly too high given the very high level of marksmanship that this implies by those shooting the guns. Yet, even if the true number of times that criminals are wounded is much smaller, it still implies that criminals face a very real expected cost from attacking armed civilians. See J. L. Annest, J. A. Mercy, D. R. Gibson, & G. W. Ryan, National Estimates of Nonfatal Firearm-Related Injuries: Beyond the Tip of the Iceberg, J. A.M.A. 1749–54 (June 14, 1995); and also Lawrence Southwick, Jr., Self-Defense with Guns: The Consequences (working paper, SUNY Buffalo 1996), for a discussion on the defensive uses of guns.
5 U.S. Bureau of the Census, Statistical Abstract of the United States (115th ed. 1995).
6 Japan Economic Newswire, U.S. Jury Clears Man Who Shot Japanese Student, Kyodo News Service, May 24, 1993; and Lori Sharn, Violence Shoots Holes in USA’s Tourist Image, USA TODAY, September 9, 1993, at 2A.
7 Dawn Lewis of Texans against Gun Violence provided a typical reaction from gun control advocates to the grand jury decision not to charge Gordon Hale. She said, “We are appalled. This law is doing what we expected, causing senseless death.” Mark Potok, Texan says the concealed gun law saved his life: “I did what I thought I had to do,” USA TODAY, March 22, 1996, at 3A. For a more recent evaluation of the Texas experience, see Few Problems Reported after Allowing Concealed Handguns, Officers Say, Fort Worth Star-Telegram, July 16, 1996. By the end of June 1996, more than 82,000 permits had been issued in Texas.
8 In fact, police accidentally killed 330 innocent individuals in 1993, compared to the mere 30 innocent people accidentally killed by private citizens who mistakenly believed the victim was an intruder. John R. Lott, Jr., Now That the Brady Law Is Law, You Are Not Any Safer than Before, Philadelphia Inquirer, February 1, 1994, at A9.
9 Clayton E. Cramer & David B. Kopel, “Shall Issue”: The New Wave of Concealed Handgun Permit Laws, 62 Tenn. L. Rev. 679, 691 (Spring 1995). An expanded version of this paper dated 1994 is available from the Independence Institute, Golden, Colorado. Similarly, Multnomah County, Oregon, issued 11,140 permits over the period January 1990 to October 1994 and experienced five permit holders being involved in shootings, three of which were considered justified by grand juries. Out of the other two cases, one was fired in a domestic dispute and the other was an accident that occurred while an assault rifle was being unloaded. Bob Barnhart, Concealed Handgun Licensing in Multnomah County (photocopy, Intelligence/Concealed Handgun Unit, Multnomah County, October 1994).
10 Cramer & Kopel, supra note 9, at 691–92.
11 For example, David B. Kopel, The Samurai, the Mountie, and the Cowboy 155 (1992); and Lott, supra note 8.
12 Wright and Rossi (p. 151) interviewed felony prisoners in 10 state correctional systems and found that 56 percent said that criminals would not attack a potential victim that was known to be armed. They also found evidence that criminals in those states with the highest levels of civilian gun ownership worried the most about armed victims. James D. Wright & Peter Rossi, Armed and Considered Dangerous: A Survey of Felons and Their Firearms (1986). Examples of stories where people successfully defend themselves from burglaries with guns are quite common. For example, see Burglar Puts 92-Year-Old in the Gun Closet and Is Shot, New York Times, September 7, 1995, at A16. George F. Will, Are We “a Nation of Cowards”? Newsweek, November 15, 1993, discusses more generally the benefits produced from an armed citizenry.
In his paper on airplane hijacking, William M. Landes, An Economic Study of U.S. Aircraft Hijacking, 1961–1976, 21 J. Law & Econ. 1 (April 1978), references a quote by Archie Bunker from the television show “All in the Family” that is quite relevant to the current discussion. Landes quotes Archie Bunker as saying “Well, I could stop hi-jacking tomorrow . . . if everyone was allowed to carry guns them hi-jackers wouldn’t have no superiority. All you gotta do is arm all the passengers, then no hi-jacker would risk pullin’ a rod.”
13 These states were Alabama, Alaska, Arizona, Arkansas, Connecticut, Florida, Georgia, Idaho, Indiana, Kentucky, Louisiana, Maine, Mississippi, Montana, Nevada, New Hampshire, North Carolina, North Dakota, Oklahoma, Oregon, Pennsylvania, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, and Wyoming.
14 These states were Alabama, Connecticut, Indiana, Maine, New Hampshire, North Dakota, South Dakota, Vermont, and Washington. Fourteen other states provided local discretion on whether to issue permits: California, Colorado, Delaware, Hawaii, Iowa, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, New Jersey, New York, Rhode Island, and South Carolina.
15 See Gary Kleck, Guns and Violence: An Interpretive Review of the Field, 1 Soc. Pathology 12–47 (January 1995), for a survey.
16 For example, P. J. Cook, Stephanie Molliconi, & Thomas B. Cole, Regulating Gun Markets, 86 J. Crim. L. & Criminology, 59–92 (Fall 1995); Cramer & Kopel, supra note 9; David McDowall, Colin Loftin, & Brian Wiersema, Easing Concealed Firearm Laws: Effects on Homicide in Three States, 86 J. Crim. L. & Criminology 193–206 (Fall 1995); and Gary Kleck & E. Britt Patterson, The Impact of Gun Control and Gun Ownership Levels on Violence Rates, 9 J. Quantitative Criminology 249–87 (1993).
17 All 22 gun control papers studied by Kleck, supra note 15, use either cross-sectional state or city data or use time-series data for the entire United States or a particular city.
18 Cramer & Kopel, supra note 9, at 680–707.
19 McDowall et al., supra note 16.
20 Equally damaging, the authors appear to concede in a discussion that follows their piece that their results are highly sensitive to how they define the crimes that they study. Even with their strange sample selection techniques, total murders appear to fall after the passage of concealed weapon laws. Because the authors only examine murders committed with guns, there is no attempt to control for any substitution effects that may occur between different methods of murder. For an excellent discussion of the McDowall et al. paper, see Daniel D. Polsby, Firearms Costs, Firearms Benefits and the Limits of Knowledge, 86 J. Crim. L. & Criminology 207–20 (Fall 1995).
21 Recent attempts to relate the crime rate to the prison population concern us (see, for example, Levitt). Besides difficulties in relating the total prison population with any particular type of crime, we are also troubled by the ability to compare a stock (the prison population) with a flow (the crime rate). Steven Levitt, The Effect of Prison Population Size on Crime Rates: Evidence from Prison Overcrowding Litigation, 144 Q. J. Econ. (1996).
22 Gary S. Becker, Crime and Punishment: An Economic Approach, 76 J. Pol. Econ. 169–217 (March/April 1968). For example, Isaac Ehrlich, Participation in Illegitimate Activities: A Theoretical and Empirical Investigation, 81 J. Pol. Econ. 521–65 (1973); Michael K. Block & John Heineke, A Labor Theoretical Analysis of Criminal Choice, 65 Am. Econ. Rev. 314–25 (June 1975); Landes, supra note 12; John R. Lott, Jr., Juvenile Delinquency and Education: A Comparison of Public and Private Provision, 7 Int’l Rev. L. & Econ. 163–75 (December 1987); James Andreoni, Criminal Deterrence in the Reduced Form: A New Perspective on
Ehrlich’s Seminal Study, 33 Econ. Inquiry 476–83 (July 1995); Morgan O. Reynolds, Crime and Punishment in America (Policy Report 193, National Center for Policy Analysis, June 1995); and Levitt, supra note 21.
23 John R. Lott, Jr., Do We Punish High Income Criminals Too Heavily? 30 Econ. Inquiry 583–608 (October 1992).
24 John R. Lott, Jr., The Effect of Conviction on the Legitimate Income of Criminals, 34 Econ. Letters 381–85 (December 1990); John R. Lott, Jr., An Attempt at Measuring the Total Monetary Penalty from Drug Convictions: The Importance of an Individual’s Reputation, 21 J. Legal Stud. 159–87 (January 1992); and Lott, supra note 23.
25 Arson was excluded because of a large number of inconsistencies in the data and the small number of counties reporting this measure. Murder is defined as murder and nonnegligent manslaughter.
26 Robbery includes street robbery, commercial robbery, service station robbery, convenience store robbery, residence robbery, and bank robbery. (See also the discussion of burglary for why the inclusion of residence robbery creates difficulty with this broad measure.) After we wrote this paper, two different commentators have attempted to argue that “[i]f ‘shall issue’ concealed carrying laws really deter criminals from undertaking street crimes, then it is only reasonable to expect the laws to have an impact on robberies. Robbery takes place between strangers on the street. A high percentage of homicide and rape, on the other hand, occurs inside a home—where concealed weapons laws should have no impact. These findings strongly suggest that something else—not new concealed carry laws—is responsible for the reduction in crime observed by the authors.” (Doug Weil, Response to John Lott’s Study on the Impact of “Carry Concealed” Laws on Crime Rates, U.S. Newswire, August 8, 1996.) The curious aspect about the emphasis on robbery over other crimes like murder and rape is that if robbery is the most obvious crime to be affected by gun control laws, why have virtually no gun control studies examined robberies? In fact, Kleck’s literature survey only notes one previous gun control study that examined the issue of robberies (see Kleck, supra note 15). Yet, more importantly, given that the FBI includes many categories of robberies besides robberies that “take place between strangers on the street,” it is not obvious why this should exhibit the greatest sensitivity to concealed handgun laws.
27 Larceny includes pickpockets, purse snatching, shoplifting, bike theft, theft from buildings, theft from coin machines, and theft from motor vehicles.
28 For example, Arnold S. Linsky, Murray A. Strauss, & Ronet Bachman-Prehn, Social Stress, Legitimate Violence, and Gun Availability (paper presented at the annual meeting of the Society for the Study of Social Problems, 1988); and Cramer & Kopel, supra note 9.
29 Among those who made this comment to us were Bob Barnhardt, manager of the Intelligence/Concealed Handgun Unit of Multnomah County, Oregon; Mike Woodward with the Oregon Law Enforcement Data System; Joe Vincent with the Washington Department of Licensing Firearms Unit; Alan Krug, who provided us with the Pennsylvania Permit data; and Susan Harrell with the Florida Department of State Concealed Weapons Division. Evidence for this point with respect to Virginia is obtained from Eric Lipton, Virginians Get Ready to Conceal Arms; State’s New Weapon Law Brings a Flood of Inquiries, Washington Post, June 28, 1995, at A1, where it is noted that “[a]nalysts say the new law, which drops the requirement that prospective gun carriers show a ‘demonstrated need’ to be armed, likely won’t make much of a difference in rural areas, where judges have long issued permits to most people who applied for them. But in urban areas such as Northern Virginia—where judges granted few permits because few residents could justify a need for them—the number of concealed weapon permits issued is expected to soar. In Fairfax, for example, a county of more than 850,000 people,
only 10 now have permits.” Cramer & Kopel, supra note 9. An expanded version of this paper dated 1994, available from the Independence Institute, Golden, Colorado, also raises this point with respect to California.
30 For example, Kleck & Patterson, supra note 16.
31 Sam Peltzman, The Effects of Automobile Safety Regulation, 83 J. Pol. Econ. 677–725 (August 1975).
32 Ehrlich, supra note 22, at 548–53.
33 While we will follow Cramer and Kopel’s definition of what constitutes a “shall issue” or a “do issue” state, one commentator has suggested that it is not appropriate to include Maine in these categories (Stephen P. Teret, Critical Comments on a Paper by Lott and Mustard (photocopy, Johns Hopkins University, School of Hygiene and Public Health, August 7, 1996)). Either defining Maine so that the “shall issue” dummy equals zero for it or removing Maine from the data set does not alter the findings shown in this paper. Please see note 49 infra for a further discussion.
34 While the intent of the 1988 legislation in Virginia was clearly to institute a “shall issue” law, the law was not equally implemented in all counties in the state. To deal with this problem, we reran the regressions reported in this paper with the “shall issue” dummy both equal to 1 and 0 for Virginia. The results as reported later in footnote 49 are very similar in the two cases.
35 We rely on Cramer & Kopel, supra note 9, for this list of states. Some states known as “do issue” states are also included in Cramer and Kopel’s list of “shall issue” states though these authors argue that for all practical purposes these two groups of states are identical.
36 The Oregon counties providing permit data were Benton, Clackamas, Coos, Curry, Deschutes, Douglas, Gilliam, Hood River, Jackson, Jefferson, Josephine, Klamath, Lane, Linn, Malheur, Marion, Morrow, Multnomah, Polk, Tillamook, Washington, and Yamhill.
37 See Table 8.2 for the list and summary statistics.
38 For example, James Q. Wilson & Richard J. Herrnstein, Crime and Human Nature 126–47 (1985).
39 However, the effect of an unusually large percentage of young males in the population may be mitigated because those most vulnerable to crime may be more likely to take actions to protect themselves. Depending on how responsive victims are to these threats, it is possible that the coefficient for a variable like the percentage of young males in the population could be zero even when the group in question poses a large criminal threat.
40 Edward L. Glaeser & Bruce Sacerdote, Why Is There More Crime in Cities? (working paper, Harvard Univ., November 14, 1995).
41 For a discussion of the relationship between income and crime see John R. Lott, Jr., A Transaction-Costs Explanation for Why the Poor Are More Likely to Commit Crime, 19 J. Legal Stud. 243–45 (January 1990).
42 A more detailed survey of the state laws is available from the authors. The findings of a brief survey of the laws excluding the permitting changes are as follows: Alabama: No significant changes in these laws during period. Connecticut: Law gradually changed in wording from criminal use to criminal possession from 1986 to 1994. Florida: Has the most extensive description of penalties. The same basic law (790.161) is found throughout the years. An additional law (790.07) is found only in 1986. Georgia: A law (16–11–106) that does not appear in the 1986 edition appears in the 1989 and 1994 issues. The law involves possession of a firearm during commission of a crime and specifies the penalties associated with it. Because of the possibility that this legal change might have occurred at the same time as the 1989 changes in permitting rules, we used a Lexis search to check the legislative history of 16–11–106 and found that the laws were last changed in 1987,
2 years before the change in permitting rules (O.C.G.A. 16–11–106 (1996)). Idaho: There are no significant changes in Idaho over time. Indiana: No significant changes in these laws during the period. Maine: No significant changes in these laws during the period. Mississippi: Law 97–37–1 talks explicitly about penalties. It appears in the 1986 version, but not in the 1989 or the 1994 versions. Montana: Some changes in punishments related to unauthorized carrying of concealed weapons laws, but no changes in the punishment for using a weapon in a crime. New Hampshire: No significant changes in these laws during the period. North Dakota: No significant changes in these laws during the period. Oregon: No significant changes in these laws during the period. Pennsylvania: No significant changes in these laws during the period. South Dakota: Law 22–14–13, which specifies penalties for commission of a felony while armed appears in 1986, but not 1989. Vermont: Section 4005, which outlines the penalties for carrying a gun when committing a felony, appears in 1986, but not in 1989 or 1994. Virginia: No significant changes in these laws during the period. Washington: No significant changes in these laws during the period. West Virginia: Law 67–7–12 is on the books in 1994, but not the earlier versions. It involves punishment for endangerment with firearms. Georgia was the only state whose gun laws changed near the year that its “shall issue” law went into effect. Removing it from the sample, so that there is no chance that these other changes in gun laws might affect our results, does not appreciably alter our findings.
43 Thomas B. Marvell & Carlisle E. Moody, The Impact of Enhanced Prison Terms for Felonies Committed with Guns, 33 Criminology 247, 258–61 (May 1995).
44 Using Marvell and Moody’s findings shows that the closest time period between these sentencing enhancements and changes in concealed weapon laws is 7 years (Pennsylvania). Twenty-six states passed their enhancement laws prior to the beginning of our sample period, and only four states passed these types of laws after 1981. Maine, which implemented its concealed handgun law in 1985, passed its sentencing enhancement laws in 1971.
45 The states with a waiting period prior to the beginning of our sample include Alabama, California, Connecticut, Illinois, Maryland, Minnesota, New Jersey, North Carolina, Pennsylvania, Rhode Island, South Dakota, Washington, and Wisconsin. The District of Columbia also had a waiting period prior to the beginning of our sample. The states which adopted this rule during the sample include Hawaii, Indiana, Iowa, Missouri, Oregon, and Virginia.
46 One possible concern with these initial results arises from our use of an aggregate public policy variable (state right-to-carry laws) on county-level data. See Bruce C. Greenwald, A General Analysis of the Bias in the Estimated Standard Errors of Least Squares Coefficients, 22 J. Econometrics 323–38 (August 1983); and Brent R. Moulton, An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units, 72 Rev. Econ. & Stat. 334 (1990). As Moulton writes: “If disturbances are correlated within the groupings that are used to merge aggregate with micro data, however, then even small levels of correlation can cause the standard errors from the ordinary least squares (OLS) to be seriously biased downward.” Yet, this should not really be a concern here because of our use of dummy variables for all the counties, which is equivalent to using state dummies as well as county dummies for all but one of the counties within each state. Using these dummy variables thus allows us to control for any disturbances that are correlated within any individual state. The regressions discussed in footnote 53 rerun the specifications shown in Table 3 but also include state dummies that are interacted with a time trend. This should thus not only control for any disturbances that are correlated with the states, but also for any disturbances that are correlated within a state over time. Finally, while right-to-carry laws are almost always statewide laws, there is one exception. Pennsylvania exempted its largest
Crime, deterrence, and right-to-carry concealed handguns
county (Philadelphia) from the law when it was passed in 1989, and it remained exempt from the law during the rest of the sample period. 47 However, the increase in the number of property crimes is larger than the drop in the number of robberies. 48 While we believe that such variables as the arrest rate should be included in any regressions on crime, one concern with the results reported in the tables is whether the relationship between the “shall issue” dummy and the crime rates still occurs even when all the other variables are not controlled for. Using weighted least squares and reporting only the “shall issue” coefficients, we estimated the following regression coefficients (absolute t-statistics are shown in parentheses):

Endogenous variables    Shall issue dummy only    Shall issue dummy and year effects only
Violent crimes          −.335 (22.849)            −.449 (30.092)
Murder                  −.394 (19.095)            −.419 (19.829)
Rape                    −.147 (8.030)             −.248 (13.34)
Aggravated assault      −.322 (21.932)            −.448 (30.356)
Robbery                 −.485 (19.522)            −.561 (22.110)
Property crime          −.1603 (18.030)           −.186 (20.605)
Auto theft              −.268 (7.793)             −.358 (23.407)
Burglary                −.247 (26.381)            −.217 (22.739)
Larceny                 −.101 (10.288)            −.136 (13.640)

Regressing the crime rates on only the “shall issue” dummy and the year and county dummies produces a “shall issue” coefficient that equals −.021 (t-statistic = 1.66) for violent crimes and .051 (t-statistic = 6.52) for property crimes. The other estimates discussed in the text produce similar results and are available on request from the authors. 49 While we adopt the classifications used by Cramer and Kopel (supra note 9), some are more convinced by other classifications of the states (for example, Weil, supra note 26; and Teret, supra note 33). Setting the “shall issue” dummy for Maine to zero and rerunning the regressions shown in Table 3 results in the following “shall issue” coefficients (t-statistics in parentheses): −.0295 (2.955) for violent crimes, −0.813 (5.071) for murder, −.0578 (4.622) for rape, −.0449 (3.838) for aggravated assault, −.0097 (0.714) for robbery, .029 (3.939) for property crimes, .081 (6.942) for automobile theft, .0036 (0.466) for burglary, and .0344 (3.790) for larceny. Similarly, setting the “shall issue” dummy for Virginia to zero results in the following “shall issue” coefficients (t-statistics in parentheses): −.0397 (3.775) for violent crimes, −0.868 (5.138) for murder, −.0527 (4.007) for rape, −.05426 (4.410) for aggravated assault, −.0011 (0.076) for robbery, .0334 (4.326) for property crimes, .091 (7.373) for automobile theft, .0211 (2.591) for burglary, and .0348 (3.646) for larceny. As a final test, dropping both Maine and Virginia from the data set results in the following “shall issue” coefficients (t-statistics in parentheses): −.0233 (2.117) for violent crimes, −0.9698 (5.519) for murder, −.0629 (4.589) for rape, −.0313 (2.436) for aggravated assault, 0.006 (0.400) for robbery, .0361 (4.436) for property crimes, .0977 (7.607) for automobile theft, .0216 (2.526) for burglary, and .03709 (3.707) for larceny. 
50 Given the possible relationship between drug prices and crime, we reran the regressions in Table 3 by including an additional variable for cocaine prices. One argument linking drug prices and crime is that if the demand for drugs is inelastic and if people commit crimes in order to finance their habits, higher drug prices
51
52 53
54 55
56
John R. Lott, Jr., and David B. Mustard
might lead to increased levels of crime. Using the Drug Enforcement Administration’s STRIDE data set from 1977 to 1992 (with the exceptions of 1988 and 1989), Michael Grossman, Frank J. Chaloupka, & Charles C. Brown, The Demand for Cocaine by Young Adults: A Rational Addiction Approach (working paper, National Bureau of Economic Research, July 1996), estimate the price of cocaine as a function of its purity, weight, year dummies, year dummies interacted with eight regional dummies, and individual city dummies. There are two problems with this measure of predicted prices: (1) it removes observations during a couple of important years during which changes were occurring in concealed handgun laws and (2) the predicted values that we obtained from this ignored the city-level observations. The reduced number of observations provides an important reason why we do not include this variable in the regressions shown in Table 3. However, the primary impact of including this new variable is to make the “shall issue” coefficients in the violent crime regressions even more negative and more significant (for example, the coefficient for the violent crime regression is now −.075, −.10 for the murder regression, −.077 for rape, and −.11 for aggravated assault, with all of them significant at more than the .01 level). Only for the burglary regression does the “shall issue” coefficient change appreciably: it is now negative and insignificant. The variable for drug prices itself is negatively related to murders and rapes and positively and significantly related to all the other categories of crime at least at the .01 level for a one-tailed t-test. We would like to thank Michael Grossman for providing us with the original regressions on drug prices from his paper. 
By contrast, if the question had instead been what would the difference in crime rates have been between either all states or no states adopting right-to-carry handgun laws, the case of all states adopting concealed handgun laws would have produced 2,020 fewer murders, 5,747 fewer rapes, 79,001 fewer aggravated assaults, and 14,862 fewer robberies. By contrast, property crimes would have risen by 336,409. Ted R. Miller, Mark A. Cohen, & Brian Wiersema, Victim Costs and Consequences: A New Look (February 1996). We reran the specifications shown in Table 8.3 by also including state dummies which were each interacted with a time trend variable. In this case, all of the concealed handgun dummies were negative, though the coefficients were not statistically significant for aggravated assault and larceny. Under this specification, adopting concealed handgun laws in those states currently without them would have reduced 1992 murders by 1,839, rapes by 3,727, aggravated assaults by 10,990, robberies by 61,064, burglaries by 112,665, larcenies by 93,274, and auto thefts by 41,512. The total value of this reduction in crime in 1992 dollars would have been $7.02 billion. With the exceptions of aggravated assault and burglary, violent crimes still experienced larger drops from the adoption of concealed handgun laws than did property crimes. Rerunning the specifications in Table 8.3 without either the percentage of the populations that fall into the different sex, race, and age categories or without the measures of income tended to produce similar though somewhat more significant results with respect to concealed handgun laws. The estimated gains from passing concealed handgun laws were also larger. Again see Peltzman, supra note 31. Other differences also arise in the other control variables such as those relating the percentage of the population of a certain race, sex, and age. 
For example, the percentage of black males in the population between 10 and 19 is no longer statistically significant. By contrast, if the question had instead been what would the difference in crime rates have been between either all states or no states adopting right-to-carry
57 58
59
60 61 62 63
64
65
66 67 68
handgun laws, the case of all states adopting concealed handgun laws would have produced 2,286 fewer murders, 9,630 fewer rapes, 50,353 fewer aggravated assaults, and 92,264 fewer robberies. Property crimes would also have fallen by 659,061. Eric Rasmusen, Stigma and Self-Fulfilling Expectations of Criminality, 39 J. Law & Econ. 519 (1996). The Washington State data were obtained from Joe Vincent of the State Department of Licensing Firearms Unit in Olympia, Washington. The Oregon state data were obtained from Mike Woodward with the Law Enforcement Data System, Department of State Police, Salem, Oregon. Unpublished information obtained by Kleck and Gertz, supra note 4, in their 1995 National Self-Defense Survey implies that women were as likely as men to use handguns in self-defense in or near their home (defined as in their yard, carport, apartment hall, street adjacent to home, detached garage, and so on), but that women were less than half as likely to use a gun in self-defense away from home. Marvell & Moody, supra note 43, at 259–60. With the exception of only one state, the adoption of waiting periods corresponds to the adoption of background checks. Ehrlich, supra note 22, at 548–51. See also Robert E. McCormick & Robert Tollison, Crime on the Court, 92 J. Pol. Econ. 223–35 (April 1984), for a novel article testing the endogeneity of the “arrest rate” in the context of basketball fouls. We would like to thank Phil Cook for suggesting this addition to us. In a sense, this is similar to Ehrlich’s specification, supra note 22, at 557, except that the current crime rate is broken down into its lagged value and the change between the current and previous periods. While county-level data were provided in the Supplementary Homicide Report, matching these county observations with those used in the Uniform Crime Report (UCR) proved unusually difficult. A unique county identifier was used in the Supplementary Homicide Report, and it was not consistent across years. 
In addition, some caution is suggested in using both the Mortality Detail Records and the Supplementary Homicide Report since the murder rates reported in both sources have relatively low correlations of less than .7 with the murder rates reported in Uniform Crime Reports. This is especially surprising for the Supplementary Report, which is derived from the UCR. Running the regressions for all Pennsylvania counties (and not just those over 200,000 population) produced similar coefficients and signs for the change in concealed handgun permits coefficient, though the coefficients were no longer statistically significant for violent crimes, rape, and aggravated assault. Alan Krug, who provided us with the Pennsylvania handgun permit data, told us that one reason for the large increase in concealed handgun permits in some rural counties was because people used the guns for hunting. He told us that these low population rural counties tended to have their biggest increase in people obtaining permits in the fall around hunting season. If people were in fact getting a large number of permits in low population counties which already have extremely low crime rates for some reason other than crime, it will make it more difficult to pick up the deterrent effect on crime from concealed handguns that was occurring in the large counties. We reran these regressions taking the natural logs of the arrest and conviction rates, and they continued to produce statistically larger and even economically more important effects for the arrest rates than they did for the conviction rates. For example, see Dan M. Kahan, What Do Alternative Sanctions Mean? 63 U. Chi. L. Rev. 591–653 (1996). Lott, supra note 23; Lott, The Effect of Conviction; and An Attempt at Measuring the Total Monetary Penalty from Drug Convictions, both supra note 24.
69 In 1991, 182 million people lived in states without these laws, so the Tobit regressions would have also implied nine more accidental handgun deaths in that year. 70 For a comparison with the efficiency of other methods to reduce crime, see John Donohue and Peter Siegelman, Is the United States at the Optimal Rate of Crime? Stanford University School of Law (1996); and Ian Ayres and Steven Levitt, Measuring Positive Externalities from Unobservable Victim Precaution: An Empirical Analysis of Lojack (Yale University working paper, October 1996). For a discussion of what constitutes true externalities (both benefits and costs) from crime, see Kermit Daniel and John R. Lott, Jr., Should Criminal Penalties Include Third-Party Avoidance Costs? 24 J. Legal Stud. 523–34 (June 1995). 71 Kahan, supra note 67; and Lott, The effect of Conviction; and An Attempt at Measuring the Total Monetary Penalty from Drug Convictions, both supra note 24. 72 Federal Bureau of Investigation, Crime in the United States (Uniform Crime Reports 1994). We also wish to thank Tom Bailey at the FBI and Jeff Maurer at the U.S. Department of Health and Human Services for answering questions concerning the data used in this article. 73 The Intercensal Estimates of the Population of Counties by Age, Sex and Race (ICPSR) number for this data set was 6,387, and the principal investigator was James Alan Fox of Northeastern University College of Criminal Justice. 74 Dropping the zero crime values from the sample made the shall issue coefficients larger and more significant, but doing the same thing for the accident rate regressions did not alter those shall issue coefficients. (See also the discussion at the end of Section IVB.) 75 For further descriptions of the procedures for calculating intercensus estimates of population, see U.S. Department of Commerce, Bureau of the Census, Intercensal Estimates of the Population of Counties by Age, Sex, and Race (United States): 1970–1980 (ICPSR No. 
08384, ICPSR, Ann Arbor, Mich., Winter 1985); also see U.S. Department of Commerce, Bureau of the Census, Intercensal Estimates of the Population of Counties by Age, Sex and Race: 1970–1980 Tape Technical Documentation. U.S. Bureau of the Census, Current Population Reports, Series P-23, No. 103, Methodology for Experimental Estimates of the Population of Counties by Age and Sex: July 1, 1975. U.S. Bureau of the Census, Census of Population, 1980: County Population by Age, Sex, Race and Spanish Origin (Preliminary OMB-Consistent Modified Race). 76 U.S. Bureau of the Census, Current Population Reports, Series P-23, No. 103, Methodology for Experimental Estimates of the Population of Counties by Age and Sex: July 1, 1975. U.S. Bureau of the Census, Census of Population, 1980: County Population by Age, Sex, Race and Spanish Origin (Preliminary OMB-Consistent Modified Race), at 19–23. 77 U.S. Bureau of the Census, Statistical Abstract of the United States, Table No. 746, at 487 (114th ed. 1994). 78 Marvell & Moody, supra note 43, at 259–60.
9
The effect of concealed handgun laws on crime
Beyond the dummy variables
Hashem Dezhbakhsh and Paul H. Rubin
I. Introduction
The right-to-carry concealed handgun laws (“shall issue” laws) and their possible effects on crime have been the subject of extensive policy and academic debate as more states adopt such laws.1 From 1977 to 1992, ten states passed such laws, making it much easier to obtain licenses to carry concealed handguns, and thirteen states adopted this law between 1992 and 1996. These laws are at odds with the recently passed Federal Brady Bill, which is restrictive in terms of gun ownership, reflecting the conflict among various levels of government regarding the role of handguns in violence. Such conflict also extends to academic circles. Some argue the concealed handgun laws increase criminals’ access to guns through theft, overpowering victims, or the black market, thus leading to a civil arms race which can only increase crime (Cook (1991), Kellermann et al. (1995), McDowall, Loftin, and Wiersema (1995), Cook and Ludwig (1996), Cook and Leitzel (1996), Hemenway (1997), and Ludwig (1998)). We call this outcome the “facilitating effect” of concealed handgun laws. The supporters of these laws dispute the facilitating effect, maintaining that the effect is the opposite. They argue that allowing citizens to carry firearms will increase criminals’ uncertainty regarding an armed response, thus leading to less crime, the “deterrence effect” (Kleck and Patterson (1993), Polsby (1994, 1995), Lott and Mustard (1997) and Lott (1998)). No study has formalized the above arguments theoretically. Such a theoretical basis is necessary for any empirical investigation of the issue. In this paper we formalize these arguments in the context of the economic model of crime. We demonstrate that the direction and magnitude of any resulting change would depend on the parameters of the criminal’s optimization problem and the characteristics of the individual and his social and economic setting. 
This means that any change in crime rate induced by concealed handgun laws will depend on demographic, social, and economic specificities of the observation units (in this case counties). Thus, these laws might lead to increases in crime in some jurisdictions and decreases in others. For example, one would expect the effect of the law on crime to be more pronounced in
more populated counties, because authorities who have discretion over issuing handgun-carrying permits in the absence of a concealed handgun law are the most restrictive in these counties. The largest changes in handgun density as the result of such laws are therefore expected in populated counties. Moreover, since the law excludes juveniles from receiving gun-carrying permits, the deterrent effect is expected to be smaller in counties with a younger population. Other demographic determinants of propensity to carry a concealed weapon may lead to similar differential effects. We also empirically examine the effect of concealed handgun laws on crime. We base our empirical analysis on the aforementioned theoretical considerations by allowing the effect of the law on crime to be a function of characteristics of the population in a given jurisdiction. We can then infer how various factors influence the magnitude of the change in crime resulting from these laws. More specifically, we project what the 1992 crime rate for counties without such a law would have been if the county had adopted such a law by 1992. We then compare these projections, which are a function of county characteristics, with actual crime data for each county in 1992 to infer how the absence of the law has affected crime in these counties. We also examine the relationship between these projected changes and county characteristics. We use Lott and Mustard’s data which covers 3,054 counties for the period 1977–1992 and includes series on various categories of crime and arrest rates and economic, demographic, and political variables. The rich data set allows us to exploit cross-county heterogeneities, while our theory-based empirical procedure allows us to make state-level inference about the potential effect of the law. Ignoring specific population characteristics when modeling the effect of the law leads to model misspecification and invalid inference. 
For example, in a highly publicized study, which covers more than 3,000 U.S. counties over a decade, Lott and Mustard (1997) use a dummy variable to model the effect of the law as a shift in the intercept of the linear crime equation they estimate.2 The method is predicated on two assumptions: (1) all behavioral (response) parameters of this equation (slope coefficients) are fixed—unaffected by the law and (2) the effect of the law on crime is identical across counties. We demonstrate that these assumptions can be rejected both on theoretical and empirical grounds. Our procedure is intended to overcome such shortcomings. The remaining sections are organized as follows: Section II elaborates on the stated effects of the concealed handgun laws and extends the economic model of crime to examine such effects. Section III discusses the estimation issues involved in measuring the effect of these laws and the problems with using dummy variables for this purpose. This section also presents an alternative estimation procedure that draws on the theoretical considerations discussed in Section II. Section IV describes the data and presents and discusses the results. Section V contains concluding remarks.
II. Concealed handgun laws and the economic model of crime
Thirty-one states have so far adopted concealed handgun laws.3 These laws require that permits to carry concealed handguns be granted to any adult applicant unless the individual has a criminal record or a history of serious mental illness. Prior to adopting these laws, local authorities had discretion in granting such permits on a case-by-case basis, and the most populated counties were the most restrictive in issuing such permits (Lott and Mustard (1997)). The supporters of concealed handgun laws argue that allowing law-abiding citizens to carry concealed handguns increases overall security by deterring attackers (Kleck and Patterson (1993), Polsby (1995), and Lott and Mustard (1997)). Since the firearms are concealed, predators do not know a priori which potential victims or bystanders might be armed. The armed citizens, therefore, not only enhance their own security but also provide a positive externality for unarmed citizens. The resulting uncertainty increases the criminal’s perceived failure probability, leading to a lower expected benefit from (or a higher expected cost of) a criminal act and therefore to a lower crime rate. The opponents of these laws are skeptical of such implications (Cook (1991), Kellermann et al. (1993, 1995), Cook, Molliconi, and Cole (1995), McDowall, Loftin, and Wiersema (1995), Cook and Ludwig (1996), Hemenway (1997), and Ludwig (1998)). They argue, to the contrary, that these laws are likely to increase the crime rate. For example, Cook and Leitzel (1996) note that only a small percentage of felons and youths use the primary market to acquire their handguns; the rest rely on friends, theft, or on street transactions to acquire handguns. Through these channels, concealed handgun laws may increase the number of guns available to criminals. 
Criminals can also use their victims’ guns against them, as many individuals who suddenly find themselves involved in a violent confrontation may not be able to use their guns effectively (Kellermann et al. (1995)). Overall, these authors believe that increased gun availability lowers the criminals’ cost of illegally obtaining firearms, thus fueling a civilian arms race. The enhanced prevalence of guns, in turn, prompts their substitution for less lethal weapons in hostile confrontations, thus leading to an increase in crime rates (Ludwig (1998)). The arguments on both sides imply that the net change in expected benefit from committing crime is the causal link between concealed handgun laws and crime rates. The direction and magnitude of such change depend on the relative strength of the hypothesized forces. In the rest of this section we use the economic model of crime to formalize these arguments, to examine theoretically the effect of these laws on crime, and to provide a basis for empirical examination of the issue.
Modeling the effect of the law
The economic models of criminal behavior (Fleisher (1966), Becker (1968), Sjoquist (1973), Ehrlich (1975), and Block and Heineke (1975)) are formulated within the framework of the theory of choice under uncertainty. These models derive a supply, or production offense, function assuming an optimizing agent who allocates time between legal and/or illegal activities in such a way as to maximize expected utility. Given the empirical focus of this paper, we do not develop a new crime model; rather, we extend an existing model to incorporate the effect of the gun laws. The basic model we consider assumes that an individual must allocate a given amount of time $\bar{T}$ to two time-consuming activities, one of which is legal, i.e., work, and the other illegal, i.e., any criminal behavior. The times allocated to these activities are denoted by $T_l$ and $T_i$, respectively. Also, following other studies, we assume that $\bar{T}$ excludes the time devoted to non-market activities, $T_{nm}$, such as leisure; so $T_l + T_i = \bar{T} = T_0 - T_{nm}$, where $T_0$ is total time. We assume the individual’s preference ordering is a von Neumann-Morgenstern utility function, $U(T_l, T_i, W(x))$, where $W$ represents pecuniary as well as non-pecuniary (psychic) wealth and $x$ represents the stochastic component of the model. The individual’s optimal supply of illegal (and legal) activities is determined by maximizing the following expected utility function:

\[ \max_{T_i} \int U[T_l, T_i, W_0 + R T_l + (B - xP)\, C(T_i)]\, dF(x), \tag{1} \]

subject to $T_l \ge 0$, $T_i \ge 0$, and $T_l + T_i = \bar{T}$. The third argument in the utility function is wealth, which includes the individual’s assets (net of expected current earnings) $W_0$, return on legal activities $R$ (i.e., wage rate), number of criminal offenses $C(T_i)$, benefit per offense $B$, punishment (if arrested) per offense $P$, and a random variable, $x$, representing the stochastic failure rate. The function $F(x)$ denotes the individual’s subjective probability distribution of $x$. Following Block and Heineke (1975), we assume that random variable $x$ can take any value in the interval [0, 1].4 Also, note that both $B$ and $P$ incorporate pecuniary as well as non-pecuniary (psychic) values and that the number of criminal offenses is assumed to increase with the amount of time devoted to illegal activities, that is, $C'(T_i) > 0$. We introduce concealed handgun laws through an index variable $H$ defined on the [0, 1] interval, where $H = 0$ means no law (no concealed handgun carrying) and a larger $H$ value indicates a more permissive handgun law. To incorporate the arguments for or against such laws, we extend the basic model by allowing several of its variables to change with $H$. Some changes capture the deterrent effect and others capture the facilitating effect of these laws. The deterrent effect is rooted in the offender’s concern about an armed response, which alters his perceived probability distribution of the stochastic failure rate $x$ and also increases the possible punishment for an unsuccessful crime $P$. We
model the probability change by introducing a parameter, $\alpha$, that shifts the mean of the perceived distribution, that is, the expected failure rate. This changes the failure rate to $x + \alpha H$, where the added term $\alpha H$ is zero when $H$ is zero (no law) and increases with $H$ in such a way that $x + \alpha H$ remains within the [0, 1] range. So, the expected failure rate in the presence of a concealed handgun law is $E(x) + \alpha$. The prospect of an armed response also increases the possible punishment. The perpetrator now, besides fearing financial sanctions and prison term, must worry about being shot by the victim or a bystander. We model this by changing $P$ to $P(H)$, where $P'(H) > 0$. On the other hand, the facilitating effect of the law manifests itself through an increase in the net benefit per offense $B$ and an increase in the number of offenses, $C$, given the amount of time devoted to illegal activities, $T_i$. Note that $B$ is net of any expense associated with implementing a crime, so the reduction in the cost of acquiring handguns, which is argued to be the result of permissive gun laws, increases $B$. So we change $B$ to $B(H)$, where $B'(H) > 0$. Moreover, the substitution of handguns for less lethal weapons increases the efficiency of offenses committed for any given amount of time devoted to illegal activities. So the offense function is now $C(T_i, H)$, where $C_H = \partial C(\cdot)/\partial H > 0$ according to the opponents of concealed handgun laws.5 Therefore, in the context of the economic model of crime, concealed handgun laws affect the individual’s decision to commit crime by changing the parameters of his expected earnings function. The resulting wealth function is

\[ W(x, H) = W_0 + R T_l + (B(H) - (x + \alpha H) P(H))\, C(T_i, H). \tag{2} \]

Given any time allocation scheme $(T_l, T_i)$, the expected change in wealth resulting from a more permissive handgun law is

\[ E(D) = -E[(\alpha P + (x + \alpha) P')\, C(\cdot)] + E[B' C(\cdot) + (B - (x + \alpha) P)\, C_H]. \tag{3} \]
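The trade-off between the two bracketed terms in equation (3) can be explored numerically. The following is a minimal sketch, not part of the original analysis: the function name `expected_wealth_change`, the grid of failure rates, and every parameter value are hypothetical, chosen only to show that the sign of E(D) can flip with the relative size of the two terms.

```python
import numpy as np

def expected_wealth_change(alpha, P, dP, B, dB, C, C_H, x_draws):
    """Evaluate the two bracketed terms of equation (3) for a fixed time allocation.

    alpha   : shift in the expected failure rate
    P, dP   : punishment per offense and its derivative P'(H)
    B, dB   : net benefit per offense and its derivative B'(H)
    C, C_H  : number of offenses and its derivative with respect to H
    x_draws : values of the stochastic failure rate x (assumed equally likely)
    """
    x = np.asarray(x_draws, dtype=float)
    # First bracketed term: deterrent effect (enters with a minus sign).
    deterrent = -np.mean((alpha * P + (x + alpha) * dP) * C)
    # Second bracketed term: facilitating effect.
    facilitating = np.mean(dB * C + (B - (x + alpha) * P) * C_H)
    return deterrent, facilitating, deterrent + facilitating

# Hypothetical perceived failure rates, equally weighted (mean 0.3).
x_grid = np.linspace(0.1, 0.5, 5)

# Illustrative case 1: armed response weighs heavily on the offender.
deterrence_case = expected_wealth_change(
    alpha=0.2, P=10.0, dP=5.0, B=8.0, dB=0.5, C=4.0, C_H=0.2, x_draws=x_grid)

# Illustrative case 2: cheaper guns and more efficient offenses dominate.
facilitation_case = expected_wealth_change(
    alpha=0.01, P=10.0, dP=0.1, B=8.0, dB=3.0, C=4.0, C_H=1.0, x_draws=x_grid)

for label, result in [("deterrence dominates", deterrence_case),
                      ("facilitation dominates", facilitation_case)]:
    print(label, "deterrent=%.3f facilitating=%.3f E(D)=%.3f" % result)
```

In both invented cases the deterrent term is negative and the facilitating term is positive; only their relative magnitudes differ, which is why the net effect has to be estimated rather than assumed.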
The expression consists of two bracketed terms: The first term captures the deterrent effect and is negative. The second term captures the facilitating effect and is positive.6 Obviously, the combined effect cannot be signed without some knowledge of the relative magnitude of the parameters. The individual’s optimization problem in the extended framework is given by

\[ \max_{T_i} \int U[T_l, T_i, W_0 + R T_l + (B(H) - (x + \alpha H) P(H))\, C(T_i, H)]\, dF(x), \]

subject to $T_l \ge 0$, $T_i \ge 0$, and $T_l + T_i = \bar{T}$. The first-order condition for maximization requires that

\[ A = E[U_i - U_l + U_W((B(H) - (x + \alpha H) P(H))\, C_i(T_i, H) - R)] = 0, \tag{4} \]
where the first three terms denote the derivatives of $U$ with respect to $T_i$, $T_l$, and $W$, respectively, $C_i$ denotes the derivative of $C$ with respect to $T_i$, and $U_i - U_l$ is referred to as the individual’s preference for honesty. The second-order condition requires that $\Delta = \partial A/\partial T_i < 0$. Equation (4) provides the basis for evaluating the effect of various policy changes. We are mainly interested in the effect of concealed handgun laws on the time allocated to criminal activities as well as on the number of crimes committed. These effects are analytically derived by differentiating equation (4). The effect of a more permissive handgun law on $T_i$ is

\[ E\left[\frac{\partial T_i}{\partial H}\right] = -\frac{1}{\Delta}\, E\left[(U_{iW} - U_{lW} + U_{WW} G)\, D + U_W \frac{\partial G}{\partial H}\right], \tag{5} \]
where $D$ is defined in equation (3), $\Delta$ is from the second-order condition above, and $G$ is the expression which is multiplied by $U_W$ in equation (4). Since the ratio outside the bracket is positive, the sign of the derivative depends on the expectation of the bracketed term. This term cannot be signed without a detailed knowledge of the individual’s preference structure and the magnitudes of $D$ and $G$. However, equation (5) clearly indicates that the effect of concealed handgun laws on criminal activities does indeed depend on several variables, some individual specific and others more general. For example, such an effect depends on the individual’s attitude toward risk $U_{WW}$, the effect of increased wealth on his preference for honesty $U_{iW} - U_{lW}$, his perceived failure rate, the perceived benefits and costs associated with concealed handgun laws, and return on legal market activities. Also, since the number of criminal offenses increases with $T_i$, the law has a similar effect on the number of crimes. More specifically,

\[ E\left[\frac{\partial C(\cdot)}{\partial H}\right] = E\left[C_i \frac{\partial T_i}{\partial H}\right] + E(C_H), \tag{6} \]
where the second term on the right-hand side is zero in the absence of a facilitating effect. This effect cannot be signed either. Given the sign ambiguity, the issue has to be settled empirically. The above results, however, should influence the empirical examination of the effect of the law, as will be discussed below.
Empirical implications
Equations (5) and (6) and the above arguments suggest that the effect of concealed handgun laws on the crime rate is not fixed, because it depends on behavioral parameters as well as the exogenous variables of the underlying model. This theoretical finding is also consistent with other observations
reported in the literature. Ludwig (1998), for example, argues that because juveniles are not eligible to carry concealed weapons, any deterrent benefit from such laws will be limited to the non-juvenile population. In the present context, such asymmetry affects the behavior of the criminals to the extent that their potential victims are juveniles. Therefore, counties with a younger population may not experience the full deterrent effect of these laws. Other demographic determinants of the propensity to carry a concealed handgun, e.g., age or gender, may also lead to a similar differential effect. Moreover, the effect of the law on crime should be more pronounced in the more populated counties, because authorities who have discretion over issuing handgun-carrying permits in the absence of a concealed handgun law are the most restrictive in using such discretion in populated counties. The largest changes in handgun density as the result of such laws are therefore expected in densely populated counties. Finally, Black and Nagin’s (1998) time-specific dummies also point to the variability of the effect of these laws. Using the county as the basis for aggregation, behavioral equations (5) and (6) can be written in the following general form:

\[ E\left[\frac{\partial C(\cdot)}{\partial H}\right]_{jt} = E\,K[W^{0}_{jt}, R_{jt}, B_{jt}, P_{jt}, \alpha_{jt}, C_i(T), x_{jt}, g_{jt}(U), \eta_{jt}], \tag{7} \]
where j and t denote county and time, K[·] is a general function, g(U) is a function denoting higher derivatives of the utility function, and η is a portmanteau variable capturing influences which are unaccounted for as well as higher derivatives of the terms included in the above expression. The heterogeneity indicated by the above equations has important implications for testing the effect of concealed handgun laws on crime. In particular, the effect is not fixed and should be allowed to vary across observation units, e.g., counties. Moreover, if the law affects the behavior of criminals or of citizens, then the testing procedure should allow the behavioral (response) parameters of the model to change. It seems highly unlikely that the magnitude of the effects such laws may have on crime rates in a county would be independent of its economic and demographic characteristics. In fact, the effect may vary with the age and gender composition of the population, population density, characteristics of police, and economic conditions of the counties, among other things. Finally, variations across counties within a state in terms of how easily permits were issued prior to adoption of concealed handgun laws make it necessary to allow the effect of these laws to be heterogeneous across counties. For example, the most pronounced changes are expected in counties with the most restrictive licensing practice prior to the enactment of the law. Ignoring such heterogeneity and assuming that $E[\partial C(\cdot)/\partial H_{jt}]$ is a fixed quantity leads to estimation bias due to imposing an incorrect restriction. We later report empirical evidence to support this point.
Hashem Dezhbakhsh and Paul H. Rubin
Also, note that a crime equation in implicit form can be derived from the first-order condition, equation (4). The right-hand side variables and parameters of this equation are the same as the variables and parameters that appear on the right-hand side of equation (7) which captures the effect of the concealed handgun law on crime. We maintain the same parallel between our crime equation and the equation we propose for measuring the effect of the law.
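The bias that comes from forcing the effect of the law to be one fixed number can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: six invented counties, a single characteristic z, and a law effect assumed to equal g0 + g1·z. None of the numbers come from the chapter's data or estimates.

```python
# Illustrative only: invented numbers, not the chapter's data or estimates.

def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        # Partial pivoting, then clear this column in every other row.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Assumed data-generating process: C = a + b*z + H*(g0 + g1*z), no noise.
a, b, g0, g1 = 10.0, 1.0, -4.0, 1.0

def true_effect(z):
    """Assumed effect of the law in a county with characteristic z."""
    return g0 + g1 * z

law_z = [0.0, 1.0, 2.0]      # hypothetical counties with the law (H = 1)
no_law_z = [4.0, 5.0, 6.0]   # hypothetical counties without it (H = 0)

rows, crime = [], []
for z in law_z:
    rows.append([1.0, 1.0, z])
    crime.append(a + b * z + true_effect(z))
for z in no_law_z:
    rows.append([1.0, 0.0, z])
    crime.append(a + b * z)

# Pooled regression of crime on a constant, the law dummy H, and z.
alpha_hat, gamma_hat, delta_hat = ols(rows, crime)
effects_no_law = [true_effect(z) for z in no_law_z]

print("dummy coefficient:", round(gamma_hat, 6))
print("assumed effects in counties without the law:", effects_no_law)
```

With these invented numbers the dummy coefficient is about −1, although the assumed effect in the three counties without the law is 0, 1 and 2. A single fixed coefficient would therefore predict a drop in crime exactly where the assumed process produces none.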
III. Estimation methods and issues

In regression analysis an intercept-shifting dummy variable is often used to estimate the effect of an institutional change. The statistical and conceptual ramifications of this practice are seldom examined, particularly when the empirical analysis is not predicated on economic theory. To better motivate our procedure, which is intended to overcome the shortcomings of this approach, we elaborate on this issue using Lott and Mustard’s (1997) highly publicized study on the effect of concealed handgun laws on crime.7 Lott and Mustard use county-level panel data to estimate several linear crime equations. The dependent variable in each equation is one of several crime rates: murder, rape, aggravated assault, robbery, burglary, larceny, and auto theft. The regressors include the arrest rate corresponding to that crime category, a host of economic and socio-demographic factors, and a binary variable measuring the status of the concealed handgun law. This variable equals 1 if a county has such a law in place in a given period and 0 otherwise. The other regressors serve as control variables. The model they estimate is therefore

$$C_{jt} = \alpha + \gamma H_{jt} + \beta A_{jt} + \delta X_{jt} + \varepsilon_{jt}, \tag{8}$$
where H is the binary variable, A is the arrest rate, X includes the economic and demographic variables and a set of time and county dummies (one for each sampling year or county), ε is the regression error, and j and t denote counties and time periods, respectively. Lott and Mustard’s inference about the effect of concealed handgun laws on various categories of crime is based on the sign and statistical significance of the estimated coefficient of the binary variable, that is, the estimate of γ. A positive and significant estimate suggests that concealed handgun provisions would increase the crime rate, while a negative and significant estimate points to the contrary conclusion. Note that they use γ in place of the expression on the right-hand side of equation (7). This expression clearly depends on county-specific exogenous variables as well as the behavioral parameters of the model. Ignoring the heterogeneity of the effect of the law on various counties and parameterizing the effect as a fixed parameter leads to biased estimation. The 2SLS estimate of γ reported by Lott and Mustard is negative, substantially large, and significant for all crime categories, further supporting their deterrence hypothesis.8 The aforementioned bias can perhaps explain these unusually large negative estimates.9

Following our theoretical results, we allow all behavioral parameters of the regression model to change with the law, and thus the effect of the law on crime rates to be heterogeneous across counties. The data will then show which of these parameters the law indeed affects. We implement this parameter flexibility by first estimating two separate crime equations, one for counties in states with a concealed handgun law and the other for the remaining counties:

$$C_{l,jt} = \alpha_l + \beta_l A_{l,jt} + \delta_l X_{l,jt} + \varepsilon_{l,jt}, \tag{9a}$$

$$C_{nl,jt} = \alpha_{nl} + \beta_{nl} A_{nl,jt} + \delta_{nl} X_{nl,jt} + \varepsilon_{nl,jt}, \tag{9b}$$
where l and nl indicate the presence or the absence of the concealed handgun law, respectively. Then, we examine whether the law affects the response parameters by using an asymptotic Wald test of the null hypothesis $H_0: \Theta_l = \Theta_{nl}$ against the alternative $H_1: \Theta_l \neq \Theta_{nl}$, where Θ denotes (β, δ).10 The null hypothesis implies that the effect of the law on crime is a constant parameter γ (or $\alpha_l - \alpha_{nl}$) which does not change across counties or over time. This of course is at odds with equation (7). A rejection of the null implies that the law affects the response (slope) parameters of the model, thus rejecting a simple intercept change formulation. As we report in the next section, the null of a fixed parameter effect of the law is rejected strongly in all cases, making the use of a less restrictive procedure necessary.

We estimate for each county the direction and extent of the change in crime rate that may result from introducing the concealed handgun law. We determine how different the crime rate would have been during 1992 in the counties that did not have the concealed handgun law in place, had they adopted the law by 1992. We obtain these estimates, which are useful for policy purposes, simply by switching the estimates of the behavioral parameters in equations (9a) and (9b) and computing the resulting predicted values for the dependent variable (the crime rate) over the relevant year. The estimates are obtained from $\hat{C}_{j92} = \hat{\alpha}_l + \hat{\Theta}_l Z_{nl,j92}$, where $Z_{nl}$ denotes the regressors in equation (9b), Θ denotes (β, δ), 92 is the year, and j is restricted to the aforementioned group of counties. These are simply predicted crime rates conditional on adopting the concealed handgun law. The difference between the predicted and actual crime rates measures the effect of concealed handgun laws on crime. We emphasize that our interest is in making an inference about the expected 1992 crime rates conditional on the law being in place in a county that did not have it in 1992.
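The logic of the slope-equality test described above can be sketched for the simplest case of a single regressor, where Θ is a scalar. This is a simplified illustration with invented data, not the chapter's multi-regressor, 2SLS-based test. The function names are hypothetical, and the two samples are assumed to be independent so that the variances of the two slope estimates simply add. With one restriction the Wald statistic is asymptotically chi-square with one degree of freedom, so its p-value follows from the normal tail.

```python
import math

def fit_line(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, variance of slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (ssr / (n - 2)) / sxx

def wald_slope_equality(x_l, y_l, x_nl, y_nl):
    """Wald statistic and p-value for H0: slope with law == slope without law."""
    _, b_l, v_l = fit_line(x_l, y_l)
    _, b_nl, v_nl = fit_line(x_nl, y_nl)
    wald = (b_l - b_nl) ** 2 / (v_l + v_nl)
    # Upper tail of a chi-square(1) variable: P(|N(0,1)| > sqrt(wald)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Invented data: the slope is 2.0 in the "law" sample and 0.5 in the other.
x = [float(i) for i in range(20)]
y_law = [1.0 + 2.0 * xi + 0.5 * math.sin(i) for i, xi in enumerate(x)]
y_no_law = [3.0 + 0.5 * xi + 0.5 * math.cos(i) for i, xi in enumerate(x)]

wald, p_value = wald_slope_equality(x, y_law, x, y_no_law)
print("Wald statistic:", round(wald, 1), "p-value:", p_value)
```

A large statistic (small p-value) rejects the restriction that the law shifts only the intercept. With several slope parameters the same idea applies, using the difference of the coefficient vectors and the sum of their covariance matrices.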
This estimate is then compared with the county’s actual 1992 crime rates to estimate the expected change. It is important to note that in the above comparison, one should not use the county’s predicted crime rate without the law in 1992, $\hat{\alpha}_{nl} + \hat{\Theta}_{nl} Z_{nl,j92}$, instead of the observed (actual) crime rate $C_{nl}$. This is because the former does not have any information that is useful for our inference but is not contained in the county’s observed 1992 crime rate. Therefore, if we used the predicted crime rate instead of the actual crime rate, we would just be adding extra noise (residual), thus reducing the accuracy of the inference. Also, note that all the information relevant to adopting the law is incorporated in $\hat{\Theta}_l$, which is estimated using counties with the law.

To see how our procedure relates to the theoretical equation (7) and also to formally contrast this procedure with the intercept-shifting dummy variable approach, consider the following. The latter parameterizes the law-induced crime change as $\alpha_l - \alpha_{nl}$, which is γ in equation (8), the shift in the intercept. This parameter is fixed across all counties and independent of county characteristics. It also implies that the law does not affect any of the behavioral parameters of the model. We, on the other hand, parameterize the change as $(\alpha_l + \Theta_l Z_{nl,j92}) - C_{nl,j92}$, which after substitution from (9b) and setting the random error $\varepsilon_{nl}$ to zero (its expected value) yields $(\alpha_l - \alpha_{nl}) + (\Theta_l - \Theta_{nl}) Z_{nl,j92}$. Note that the first term in the above expression is the intercept change, used, e.g., by Lott and Mustard, while the second term is our addition, which varies with county characteristics and is a function of model parameters. This expression is the empirical counterpart of the law-induced crime change as given by equation (7). Also, note the similarities between the parameters and variables in this expression and those in the crime equations (9a,b). We documented a similar parallel between the theoretical counterparts of crime and the change in crime. We summarize the predictions we so obtain to generate an inference about the potential influence of the law in each state which did not have a concealed handgun law in 1992. The predictions are further analyzed to determine factors that influence their direction or magnitude.
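The parameter-switching calculation and its split into an intercept shift and a characteristic-dependent part can be sketched as follows. The coefficient values, the two regressors and the two counties are hypothetical and chosen only to make the arithmetic easy to follow. They are not estimates from the chapter.

```python
# Hypothetical coefficient estimates for one crime category (not from the chapter).
alpha_l, theta_l = 2.0, [0.5, -1.0]     # regime with the law, as in equation (9a)
alpha_nl, theta_nl = 3.0, [0.2, -0.4]   # regime without the law, as in equation (9b)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def predicted_change(z, actual):
    """Predicted crime rate with the law minus the county's actual rate."""
    return alpha_l + dot(theta_l, z) - actual

def decompose(z):
    """Intercept shift and the part of the change that varies with z."""
    intercept_shift = alpha_l - alpha_nl
    slope_part = dot([a - b for a, b in zip(theta_l, theta_nl)], z)
    return intercept_shift, slope_part

# Hypothetical counties without the law: (regressors z, actual crime rate).
counties = {
    "county A": ([1.0, 0.5], 2.9),
    "county B": ([6.0, 0.5], 4.1),
}

for name, (z, actual) in counties.items():
    shift, slope = decompose(z)
    # Residual of the no-law equation; it has expected value zero.
    resid_nl = actual - (alpha_nl + dot(theta_nl, z))
    change = predicted_change(z, actual)
    print(name, "change:", round(change, 3),
          "intercept shift:", shift,
          "slope part:", round(slope, 3),
          "no-law residual:", round(resid_nl, 3))
```

In this sketch the intercept shift is the same (−1) for both counties, but the slope part differs, so county A shows a predicted decrease and county B a predicted increase. The predicted change equals the intercept shift plus the slope part minus the no-law residual, which is the substitution from (9b) described in the text.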
This approach allows the effect of permissive handgun laws to vary with population density, racial and gender characteristics, income, and so forth. At the same time, it exploits the variation in the timing of these state laws to investigate their impact. Following Ehrlich (1973), in all our estimations we treat the arrest rate, A, as an endogenous variable that is affected by such variables as lagged crime rate, economic and demographic variables in the crime equation, police employment and payroll, and a set of variables to control for political influences. These latter variables include percentage of votes received by the Republican presidential candidate, and the percentage of a state’s population that are members of the National Rifle Association.11 We estimate equations (9a) and (9b) along with the corresponding arrest equations via 2SLS, allowing the concealed handgun law to also shift the coefficients of the arrest equation in the first stage of estimation; such shifts are incorporated in cases where the Wald test applied to an arrest equation suggested such a change is warranted. This ensures the consistency of the second-stage estimates. In all our estimations, we correct the residuals from the second-stage least squares to account for using the predicted arrest rate rather than the actual arrest rate in estimating the crime equation; see, e.g., Davidson and MacKinnon (1993, ch. 7).
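The mechanics of the two stages, and of the residual correction, can be sketched in a stripped-down setting with one endogenous regressor and one instrument. The data are invented, the variable names are hypothetical, and a single police variable stands in for the chapter's much larger set of first-stage regressors.

```python
def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Invented data. "shock" is an unobserved factor that moves both the arrest
# rate and the crime rate, which is what makes the arrest rate endogenous.
police = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical instrument
shock = [0.3, -0.1, 0.2, -0.4, 0.1, -0.2, 0.4, -0.3]
arrest = [0.10 + 0.05 * p + 0.10 * s for p, s in zip(police, shock)]
crime = [8.0 - 6.0 * a + 2.0 * s for a, s in zip(arrest, shock)]

# Stage 1: regress the arrest rate on the instrument and form fitted values.
pi0, pi1 = simple_ols(police, arrest)
arrest_hat = [pi0 + pi1 * p for p in police]

# Stage 2: regress the crime rate on the fitted arrest rate.
alpha_hat, beta_hat = simple_ols(arrest_hat, crime)

# Residuals for inference use the ACTUAL arrest rate with the 2SLS coefficients;
# residuals built from the fitted arrest rate would misstate the error variance.
resid_corrected = [c - alpha_hat - beta_hat * a for c, a in zip(crime, arrest)]
resid_naive = [c - alpha_hat - beta_hat * ah for c, ah in zip(crime, arrest_hat)]

print("2SLS slope on arrest rate:", round(beta_hat, 4))
print("SSR with actual arrest rate:", round(sum(e * e for e in resid_corrected), 4))
print("SSR with fitted arrest rate:", round(sum(e * e for e in resid_naive), 4))
```

In this just-identified case the second-stage slope equals the ratio of the instrument's covariance with crime to its covariance with the arrest rate, which gives a quick check on the two-step computation.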
IV. Data and results12

We use the data provided to us by Lott and Mustard. The data set covers 3,054 counties for the period 1977–1992. However, since several series are only reported for 1982 through 1992, the effective time span is shorter. The data set includes the FBI’s crime data for murder, rape, aggravated assault, and robbery which comprise “violent crime” and auto theft, burglary, and larceny which comprise “property crime.” The series also include the corresponding arrest rate for these nine crime categories, population, population density, real per capita personal income, real per capita unemployment insurance payments, real per capita income maintenance payments, real per capita retirement payments per person over 65 years of age, and population characteristics for thirty-six age and race segments (black, white and other; male and female; and age divisions). The data set also includes state-level data on police employment and payroll, percentage of votes received by the Republican presidential candidate, and the percentage of each state’s population that are members of the National Rifle Association. This is the most exhaustive panel data set available for research in this area.

As indicated earlier, our empirical strategy starts with testing whether the data support modeling the effect of the law through an intercept-shifting dummy variable, that is, the hypothesis of no slope change due to the law. Using an asymptotic Wald test for all nine categories of crimes, we find that this hypothesis is rejected strongly for all categories of crime. The statistics for various crime equations are 131.2 (murder), 152.5 (rape), 395.3 (aggravated assault), 194.2 (robbery), 451.2 (burglary), 323.7 (larceny), 479.3 (auto theft); all statistics have p-values which are close to zero. That is, in all cases, there are significant changes in the slope coefficients, so that assuming all changes to be embedded in the intercept is incorrect.
This suggests that the Lott-Mustard results are biased by misspecification. Similar results for the arrest equation, used in the first stage of the 2SLS estimation, indicate the coefficients of these equations also change with the law. In fact, we incorporate these changes when obtaining the predicted arrest rates. A comparison of our predicted arrest rates with those of Lott and Mustard reveals the inaccuracy introduced by limiting the change to the intercept term. For example, depending on the crime category, the mean square error of Lott and Mustard’s predicted arrest rates is from 1.5 to 5.2 times larger than ours. Their predicted arrest rates also include a large number of negative values (for example, over 19,000 of about 33,000 observations for auto theft are negative; the numbers of negative arrest rates for aggravated assault and property crimes are, respectively, 9,900 and 13,500).

We use the two-stage procedure described earlier to estimate the hypothetical effect on crime in each county in states that did not have a concealed handgun law in place if such a law had been in effect in 1992. We examine these effects in two ways, both on a county-by-county basis. First, we examine for each crime and for each county the predicted effect of changing the law. Table 9.1 contains summary statistics derived from these county-level conditional predictions. Second, we examine the effect of county characteristics on predicted change in crime rates for each aggregated crime category (violent, property). Table 9.2 reports results of regressing these predictions on various county characteristics.

The interpretation of Table 9.1 is as follows: There were thirty-three states without such laws in 1992, excluding Pennsylvania where Philadelphia county was given exemption from the law passed in 1989. Consider, for example, murder in Texas. Since Texas is in our sample, this indicates that in 1992 this state did not have a concealed handgun law in place, although the * indicates that it had adopted such a law by 1996. There are 254 counties in Texas, as shown in column 1. Had the concealed handgun law been in effect in Texas in 1992, then in seven of those counties, which include 0.4 percent of the population in the state and account for 9.4 percent of the state murders, murder rates would have decreased by a statistically significant amount.13 Thus, for counties in six states a concealed handgun law would have reduced murder rates and for all counties in the other twenty-seven states it would have been ineffective. Overall, the results indicate a relatively small, and crime-reducing, effect of concealed handgun laws on murder rates.
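How a single Table 9.1 entry of the form "7 (0.4%, 9.4%)" is assembled from county-level predictions can be sketched as below. The county records are invented, the function name is hypothetical, and the 5 percent two-sided significance threshold is an assumption, since the significance level used for the table is not stated in this passage.

```python
def table_entry(counties, direction, z_crit=1.96):
    """Format one state/crime cell: count of counties with a significant change
    in the given direction, then their share of state population and crime.

    z_crit=1.96 (5% two-sided, normal approximation) is an assumed threshold.
    """
    total_pop = sum(c["population"] for c in counties)
    total_crimes = sum(c["crimes"] for c in counties)
    sign = -1.0 if direction == "decrease" else 1.0
    hits = [c for c in counties if sign * c["change"] / c["std_err"] > z_crit]
    pop_share = 100.0 * sum(c["population"] for c in hits) / total_pop
    crime_share = 100.0 * sum(c["crimes"] for c in hits) / total_crimes
    return f"{len(hits)} ({pop_share:.1f}%, {crime_share:.1f}%)"

# Invented counties of one hypothetical state. "change" is the predicted change
# in the crime rate from adopting the law; "std_err" is its standard error.
state = [
    {"population": 100, "crimes": 10, "change": -3.0, "std_err": 1.0},
    {"population": 200, "crimes": 20, "change": -0.5, "std_err": 1.0},
    {"population": 300, "crimes": 30, "change": 2.5, "std_err": 1.0},
    {"population": 400, "crimes": 40, "change": -4.0, "std_err": 1.0},
]

print("Crime decrease:", table_entry(state, "decrease"))  # 2 (50.0%, 50.0%)
print("Crime increase:", table_entry(state, "increase"))  # 1 (30.0%, 30.0%)
```

Counties whose predicted change is not statistically significant are counted in neither column, which is why a state can show zeros in both.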
It appears that there would have been little effect on rape, with twenty-one states unaffected, four states with unambiguous increases, and two states with unambiguous decreases. The effect on robbery would have been an increase in crime for many states. For counties in thirteen states, there would have been an unambiguous increase in robbery; there would have been a mixed effect (increase in some counties and decrease in some) in counties in only three states. The overall increase in robbery is not surprising given that concealed handguns add little deterrence in this case. Many potential robbery targets such as banks and various shops already have armed protection; therefore, concealed firearm laws do not provide them with deterrence benefits, but these laws apparently have a large crime-facilitating effect for robbers. For aggravated assault eleven states would have been unaffected, seven states adversely affected, and four states would have observed a drop in crime. The result for the remaining states is mixed. For the three categories of property crime (only two reported in the table) the effect would have been more mixed. Altogether there were 33 states containing 2074 counties that did not have shall issue laws in 1992, so the largest percentage of counties predicted to be affected in one direction by changing the law would have been
Alaska* (26) Arizona* (15) Arkansas* (75) California (58) Colorado (63) Delaware (3) Dist. of Col. (1) Hawaii (5) Illinois (102) Iowa (99) Kansas (105) Kentucky* (120) Louisiana* (64) Maryland (24) Mass. (14) Michigan (83) Minnesota (87) Missouri (115) Nebraska (93) Nevada* (17) New Jersey (21) New Mexico (33) New York (62)
States without such laws in 1992 (no. of counties)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 3 (2.1%, 15.8%) 0 (0.0%, 0.0%) 1 (0.2%, 7.5%) 3 (0.8%, 7.1%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.3%, 9.6%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 1 (3.0%, 0.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 5 (14.2%, 3.3%) 0 (0.0%, 0.0%) 1 (0.6%, 0.5%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (1.6%, 1.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 3 (1.7%, 8.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.7%, 8.6%) 1 (0.1%, 2.2%) 9 (3.7%, 32.9%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (3.7%, 4.3%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
Crime decrease 0 (0.0%, 0.0%) 1 (3.0%, 0.1%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.1%, 0.1%) 8 (18.1%, 5.7%) 1 (0.6%, 0.2%) 8 (4.8%, 4.2%) 1 (1.1%, 0.1%) 2 (14.7%, 8.6%) 0 (0.0%, 0.0%) 2 (1.0%, 0.2%) 1 (0.5%, 0.3%) 3 (1.8%, 1.4%) 0 (0.0%, 0.0%) 1 (3.2%, 6.3%) 0 (0.0%, 0.0%) 1 (3.2%, 0.4%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (1.5%, 7.3%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.6%, 1.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) (Continued )
Crime decrease
Crime increase
Crime increase
Crime increase
Crime decrease
Robbery No. of counties (population & crime as % of state pop. & crime) to experience:
Murder Rape No. of counties (population & crime as No. of counties (population & crime as % of state pop. & crime) to experience: % of state pop. & crime) to experience:
Table 9.1 The predicted effect of adopting concealed handgun laws on crimes in states without such laws in 1992
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 1 (3.0%, 0.6%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 3 (5.7%, 2.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 8 (5.3%, 2.8%) 8 (13.2%, 1.7%) 1 (0.3%, 0.3%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
Alaska* (26) Arizona* (15) Arkansas* (75) California (58) Colorado (63) Delaware (3) Dist. of Col. (1) Hawaii (5) Illinois (102) Iowa (99) Kansas (105) Kentucky* (120) Louisiana* (64)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 5 (7.5%, 6.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (0.1%, 0.9%) 1 (0.4%, 2.0%) 0 (0.0%, 0.0%) 12 (6.4%, 8.2%) 1 (0.2%, 2.5%)
0 (0.0%, 0.0%) 1 (1.3%, 3.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 7 (0.4%, 9.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (3.0%, 1.0%) 2 (0.9%, 1.4%) 0 (0.0%, 0.0%) 9 (6.6%, 6.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 9 (1.8%, 2.6%) 23 (36.6%, 13.8%) 10 (2.0%, 3.1%) 50 (36.3%, 41.4%) 2 (1.3%, 0.9%)
2 (1.9%, 0.3%) 2 (2.6%, 0.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (1.6%, 0.3%) 2 (10.9%, 8.1%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.7%, 0.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 8 (4.9%, 7.8%) 0 (0.0%, 0.0%) 3 (1.3%, 2.7%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 13 (5.1%, 11.8%) 4 (1.8%, 4.7%) 19 (24.6%, 16.8%) 6 (1.2%, 2.9%) 5 (6.4%, 8.5%)
0 (0.0%, 0.0%) 1 (1.2%, 1.8%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 5 (5.2%, 13.8%) 8 (0.6%, 6.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
Crime decrease
0 (0.0%, 0.0%) 2 (3.2%, 1.9%) 5 (2.3%, 0.8%) 0 (0.0%, 0.0%) 1 (0.1%, 0.8%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 15 (2.7%, 3.9%) 19 (33.8%, 12.4%) 16 (5.1%, 10.4%) 23 (9.0%, 13.2%) 4 (3.2%, 2.3%)
0 (0.0%, 0.0%) 5 (3.9%, 1.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 6 (12.7%, 17.3%) 2 (0.3%, 0.1%) 0 (0.0%, 0.0%) 2 (1.6%, 0.3%) 0 (0.0%, 0.0%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 4 (2.2%, 4.0%) 1 (0.1%, 0.2%) 4 (5.4%, 4.9%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 6 (6.2%, 12.7%) 5 (2.4%, 6.9%) 15 (4.0%, 4.9%) 14 (4.7%, 6.9%) 4 (2.3%, 5.4%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.01%, 2.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
Crime decrease
Crime increase
Crime increase
Crime increase
Crime decrease
Robbery No. of counties (population & crime as % of state pop. & crime) to experience:
Murder Rape No. of counties (population & crime as No. of counties (population & crime as % of state pop. & crime) to experience: % of state pop. & crime) to experience:
N. Carol.* (100) Ohio (88) Oklahoma* (77) Rhode Island (5) S. Carolina* (46) Tennessee* (95) Texas* (254) Utah* (29) Wisconsin (72) Wyoming* (23)
States without such laws in 1992 (no. of counties)
Table 9.1—continued
1 (14.4%, 7.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 1 (0.2%, 0.2%) 4 (2.0%, 1.2%) 2 (0.6%, 0.8%) 0 (0.0%, 0.0%) 1 (3.2%, 5.0%) 1 (3.2%, 0.5%) 1 (3.9%, 0.8%) 2 (0.3%, 0.1%) 8 (7.3%, 4.4%) 2 (0.6%, 0.3%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 7 (13.2%, 7.8%) 3 (0.1%, 0.1%) 1 (0.6%, 0.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%)
1 (1.5%, 4.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (0.6%, 7.9%) 6 (4.7%, 11.2%) 3 (1.9%, 3.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (0.4%, 4.2%) 0 (0.0%, 0.0%) 3 (2.0%, 9.6%) 1 (0.3%, 7.1%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 6 (9.4%, 13.8%) 17 (1.6%, 8.1%) 0 (0.0%, 0.0%) 2 (1.1%, 2.9%) 0 (0.0%, 0.0%)
3 (15.3%, 11.3%) 1 (0.1%, 6.6%) 0 (0.0%, 0.0%) 8 (2.1%, 3.3%) 18 (6.9%, 16.6%) 12 (3.6%, 11.6%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 2 (3.5%, 5.0%) 0 (0.0%, 0.0%) 2 (0.5%, 0.4%) 14 (9.0%, 10.1%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 10 (39.1%, 21.7%) 16 (16.1%, 12.6%) 13 (0.5%, 2.4%) 2 (0.9%, 3.2%) 0 (0.0%, 0.0%) 2 (2.7%, 3.2%)
2 (2.6%, 5.2%) 0 (0.0%, 0.0%) 1 (0.2%, 1.4%) 5 (1.2%, 4.6%) 8 (6.4%, 9.4%) 8 (5.2%, 8.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 3 (0.5%, 3.0%) 8 (4.0%, 5.0%) 2 (0.4%, 1.5%) 4 (0.6%, 3.2%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 3 (1.2%, 3.9%) 20 (1.0%, 5.9%) 4 (4.1%, 14.5%) 14 (9.0%, 17.6%) 1 (1.1%, 2.1%)
3 (15.3%, 12.3%) 1 (0.1%, 1.2%) 0 (0.0%, 0.0%) 3 (0.6%, 0.9%) 18 (6.1%, 7.4%) 12 (4.2%, 8.1%) 2 (0.7%, 5.4%) 0 (0.0%, 0.0%) 2 (6.5%, 4.4%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 9 (5.2%, 3.5%) 3 (0.9%, 1.6%) 0 (0.0%, 0.0%) 3 (9.6%, 7.6%) 15 (16.1%, 15.6%) 37 (2.2%, 7.2%) 2 (0.5%, 3.7%) 2 (0.9%, 0.1%) 4 (5.9%, 4.6%)
0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 6 (1.0%, 4.2%) 12 (4.5%, 9.2%) 5 (4.8%, 3.5%) 6 (4.6%, 10.9%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 4 (3.4%, 2.2%) 5 (3.0%, 3.0%) 3 (1.7%, 2.9%) 1 (0.1%, 0.3%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 0 (0.0%, 0.0%) 24 (2.4%, 7.3%) 5 (5.6%, 12.8%) 16 (16.9%, 23.8%) 1 (1.1%, 3.5%)
Notes: The entries in each crime category are the number of counties in each state that would have experienced a statistically significant change in their 1992 crime rates, had they adopted a concealed handgun law by 1992. The numbers in parentheses are the respective population of these counties as a percent of the state population and their crime rates as a percent of the state total crimes in that category. In 1992 Philadelphia was the only county in Pennsylvania that was exempt from Pennsylvania’s 1989 concealed handgun law. Entries for Philadelphia, which are not reported, are all zero. * indicates that the state adopted a handgun law between 1992 and 1996.
Maryland (24) Mass (14) Michigan (83) Minnesota (87) Missouri (115) Nebraska (93) Nevada* (17) New Jersey (21) New Mexico (33) New York (62) N. Carol.* (100) Ohio (88) Oklahoma* (77) Rhode Island (5) S. Carolina* (46) Tennessee* (95) Texas* (254) Utah* (29) Wisconsin (72) Wyoming* (23)
Table 9.2 Determinants of the magnitude of the change in crime induced by concealed handgun laws Characteristics
Violent crimes
Property crimes
Arrest Rate Police Payroll Population Density NRA Membership Income Retirement Payment Black Males (10–29) Black Fem. (10–29) Non-Black Males (10–29) Non-Black Fem. (10–29) Population Over 65
+ −
+ − + −
+ + − − + + − −
+
− −
Notes: A + sign indicates that characteristic is associated with a significant (at the 10% level) increase in the type of crime for counties if a handgun law had been in effect in 1992; a − sign indicates that the characteristic is associated with a significant decrease.
the 15 percent of counties predicted to experience an increase in larceny; all other predicted percentage changes in any direction are less than 10 percent. We can also derive policy implications from these ex post predictions for particular states which had not adopted the law by 1996 (states without an identifying *). Maryland would expect increases in robbery, assault, burglary, and auto theft, and so probably should not adopt the law. Similarly New Mexico would expect small increases in robbery and all three categories of property crime, and so also should not adopt the law. In Iowa, rape, robbery, assault, burglary and auto theft would increase, if the law is adopted. On the other hand, were Illinois to adopt a handgun law, then we would expect decreases in murder, robbery, burglary, and auto theft, but an increase in assault. Kansas could expect reductions in murder, rape, and burglary, and increases in auto theft and a small increase in assault. Minnesota might also benefit from the law. For most other states that had not adopted the law by 1996, effects would be small and mixed. We next determine which characteristics of counties are associated with increases or decreases in each aggregate type of crime (violent and property crime). We do this by regressing the predicted change in crime rates for each of the counties without the law in 1992 on a set of demographic and economic variables for the county. The economic variables, all measured per capita, are personal income, unemployment insurance, and retirement payments per person over 65. We also include (predicted) arrest rates, population density, and demographic variables. Since most crime is committed by young males, we include number of black and non-black males 10–29 years old, and similarly for females. We include persons 65 and over, who are perhaps more likely to be victims than perpetrators of crimes. 
Finally, we include per capita measures of the number of NRA members in the state, and police payroll. In all cases, we measure the effect of the relevant variable on the predicted change in crime, by category, resulting from the existence of a concealed handgun law in the county. Regression results are summarized in Table 9.2. For example, the + marks for arrest rate suggest that for counties with higher arrest rates, passage of shall issue laws leads to increased crime, perhaps because residents of such counties have a higher propensity to commit crime which leads to a larger facilitating effect of handguns. On the other hand, for counties that spend relatively more on police the laws lead to crime reductions. This is plausible: Higher spending on police will not affect the deterrent benefit of handguns, but will reduce the facilitating effect of handguns that benefits criminal activity. It may also be that higher police expenditures enable more effective screening of would-be gun carriers. This implies that states contemplating passage of handgun laws should increase expenditure on enforcement; police and private guns seem to be complements in crime reduction, not substitutes. The other consistent results are for likely victims: more elderly people and more young (10–29-year-old) non-black females are associated with reduced crime as a result of passage of gun laws. This may represent evidence of the deterrent effect in some cases, with these variables contributing to such an effect. Experiments with other specifications indicate that this specification provides most of the useful information in the data, and is sufficiently aggregated so that the results are easily interpreted.
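The way a regression of predicted changes on county characteristics is condensed into the + and − marks of Table 9.2 can be sketched as follows. The coefficients and standard errors are invented for illustration, the third row is a placeholder rather than a variable from the table, and the 1.645 cutoff is the two-sided 10 percent critical value under a normal approximation.

```python
Z_CRIT_10 = 1.645  # two-sided 10% critical value, normal approximation

def sign_mark(coef, std_err, z_crit=Z_CRIT_10):
    """Return '+' or '-' for a significant coefficient, '' otherwise."""
    t_stat = coef / std_err
    if t_stat > z_crit:
        return "+"
    if t_stat < -z_crit:
        return "-"
    return ""

# Invented (coefficient, standard error) pairs: (violent crimes, property crimes).
results = {
    "Arrest rate": ((0.80, 0.30), (0.50, 0.20)),
    "Police payroll": ((-0.60, 0.25), (-0.40, 0.15)),
    "Hypothetical variable": ((0.10, 0.20), (-0.05, 0.10)),
}

print(f"{'Characteristics':<24}{'Violent':^9}{'Property':^9}")
for name, (violent, prop) in results.items():
    print(f"{name:<24}{sign_mark(*violent):^9}{sign_mark(*prop):^9}")
```

A blank cell therefore means only that the coefficient is not significant at the chosen level, not that the characteristic has no effect.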
V. Concluding remarks

The role of handguns in violent crimes is a hotly debated public policy issue. Recently many states have adopted right-to-carry concealed handgun laws. The advocates argue these permissive laws have a deterrent effect on crime, while the opponents point to their potential crime-facilitating effects through increased gun availability. These arguments imply that concealed handgun laws may cause a change in the behavior of criminals either directly or as the result of a change in the behavior of their potential victims. No attempt has yet been made to model the effect of such laws in the context of the economic theory of crime. The ensuing empirical work, therefore, uses methods that are not based on the behavioral implications of such laws, although such a theoretical basis is necessary for any credible examination of the issue. For example, the highly publicized Lott and Mustard (1997) study that suggests these laws have a strong crime-reducing effect estimates such effects through a regression dummy variable. This method assumes that the effect of the law on crime is identical across all counties and independent of any county characteristics, an assumption flatly contradicted by “conventional wisdom” and by the theory presented in this paper. Research purporting to demonstrate statistically that handgun-related laws have important impacts on crime rates is of direct relevance to policy debates as well as to legislation. It is imperative that such debates and subsequent legislation rely on solid empirical findings.
In this paper we extend the economic model of crime to incorporate the effect of concealed handgun laws. We demonstrate that the direction and magnitude of any resulting change would depend on the parameters of the criminal’s optimization problem and thus the characteristics of the individual and his or her social and economic setting. This means that any change in crime rate induced by concealed handgun laws will depend on demographic, social, and economic specificities of the observation units. So these laws might lead to increases in crime in some jurisdictions and decreases in others. We then propose an empirical procedure to examine the effect of concealed handgun laws on crime rates. Our procedure draws on the theoretical considerations resulting from the extended crime model, therefore allowing us to assess the full implications of the right-to-carry gun provisions. We find that the effects of concealed weapons laws are much smaller than suggested by Lott and Mustard and by no means crime-reducing across all categories. For murder, for example, there is only a small reduction. For robbery, many states experience increases in crime. For other crimes, results are ambiguous, with some states showing predicted increases and some predicted decreases. We identify states (Illinois, Kansas, Minnesota) that might benefit from passage of these laws, and states (Maryland, New Mexico, and Iowa) that probably would not. We also examine demographic and other influences on the likely effect of passage of laws on crime rates. We find that there are predictable patterns in the effect of shall issue laws on crime. For example, counties spending more on police could expect a decrease in crime from the passage of a law, or smaller increases where the law leads to an increase in crime; police and private guns seem to be complements in crime reduction.
Our theoretical and subsequent empirical work points to the inadequacy of testing the effect of concealed handgun laws without considering their theoretical implication. The sort of analysis developed here could be used to enable policy makers to more carefully tailor laws to particular conditions in a jurisdiction.
Notes
1 Henceforth, we refer to such provisions as “concealed handgun” laws. These laws are also referred to as “shall issue” laws (laws mandating that authorities “shall issue” permits to carry concealed handguns).
2 Lott and Mustard (this volume) report that passage of concealed handgun laws by a state causes a significant reduction in violent as well as property crime rates (Lott and Mustard, Table 8.11). They attribute their results to a deterrent effect.
3 States are adopting these laws at an increasing rate. Only 8 states had adopted such laws by 1986. By 1992, another 10 states had adopted them, and since then 13 more states have joined the group.
4 This approach is more general than Becker’s, Ehrlich’s and Sjoquist’s approaches that assume x is either 1 or 0 and F(x) follows a Bernoulli distribution. The present model, therefore, encompasses models developed in those studies. Moreover, the binary formulation assumes that the individual makes his allocative decision believing that he would either succeed in all offenses he plans or fail them all. This is unrealistic because the individual may fail on all, none, or a fraction of the attempted offenses; the formulation we adopt allows the individual to be confronted with a continuum of failure possibilities.
5 In the rest of this analysis we include both effects, although the effects can be isolated by setting either CH and B′ or α and P′ equal to zero.
6 Note that B must exceed the expected punishment [E(x)+α]P for any offense to take place.
7 Lott and Mustard use the most comprehensive data set to examine this issue. There are several other useful but smaller studies that examine the effect of gun availability on crime; see Kleck (1995) and Lott and Mustard (1997) for a review.
8 The 2SLS that treats the arrest rate as an endogenous variable which is itself affected by the crime rate is the appropriate method for estimating equation (8). In addition to 2SLS, Lott and Mustard use the OLS method, which ignores the simultaneity between crime and arrest, to project the expected reduction in the number of murders, rapes, robberies, and aggravated assaults for 1992 through 1995 if those states without right-to-carry concealed handgun provisions had adopted them in 1992. Much of the public attention that Lott and Mustard have received centers on these OLS-based projections and not the more appropriate 2SLS results; see, e.g., the article by Richard Morin in The Washington Post, Sunday, March 23, 1997, page 5; also, see Ludwig (1998) and Black and Nagin (1998), who criticize Lott and Mustard on methodological grounds. These authors all focus on the inappropriate OLS results rather than the 2SLS results.
9 Such specification bias also makes the coefficient estimates fragile with respect to small changes in the model such as inclusion or exclusion of various control variables. Bartley, Cohen and Froeb (1998) use the method suggested by Leamer (1983) to examine the range of estimates of the coefficient of the binary variable in the Lott-Mustard specification and find it to be quite wide in many cases.
10 The Wald statistic is the quadratic form constructed on the estimate of the difference (Θ−Θnl). The statistic is asymptotically distributed as a χ² variate with degrees of freedom equal to the number of parameters tested (Godfrey (1988, ch. 4) and Lo and Newey (1985); see also Pesaran et al. (1985)).
11 See, also, Lott and Mustard (1997), page 42.
12 We reported some of the empirical results in Dezhbakhsh and Rubin (1998).
13 If the actual 1992 crime rate for a county falls short of (exceeds) the confidence interval for the projected crime rate conditional on the law being in place, then we infer that the law would have increased (decreased) the crime rate for that county.
References
Bartley, William Alan and Mark A. Cohen, “The Effect of Concealed Weapon Laws: An Extreme Bounds Analysis,” Economic Inquiry, 36, April 1998, pp. 258–265.
Becker, Gary S., “Crime and Punishment: An Economic Approach,” Journal of Political Economy, 76(2), 1968, pp. 169–217.
Black, Dan A. and Daniel S. Nagin, “Do Right-to-Carry Laws Deter Violent Crime?” Journal of Legal Studies, 27, January 1998, pp. 209–219.
Block, M.K. and J.M. Heineke, “A Labor Theoretic Analysis of the Criminal Choice,” American Economic Review, 65(3), June 1975, pp. 314–325.
Cook, Philip J., “The Technology of Personal Violence,” in Tonry, M., ed., Crime and Justice: A Review of Research, University of Chicago Press, 14, 1991, pp. 1–70.
Cook, Philip J. and James A. Leitzel, “Perversity, Futility, Jeopardy: An Economic Analysis of the Attack on Gun Control,” Law and Contemporary Problems, 59(1), Winter 1996, pp. 1–28.
Cook, Philip J. and Jens Ludwig, “Guns in America: Results of a Comprehensive National Survey on Firearms Ownership and Uses,” Report Prepared for National Institute of Justice, 1996.
Cook, Philip J., Stephanie Molliconi and Thomas B. Cole, “Regulating Gun Markets,” Journal of Criminal Law and Criminology, 86(1), Fall 1995, pp. 59–92.
Davidson, Russell, and James G. MacKinnon, Estimation and Inference in Econometrics, New York: Oxford University Press, 1993.
Dezhbakhsh, Hashem and Paul H. Rubin, “Lives Saved or Lives Lost: The Effects of Concealed Handgun Laws on Crime,” American Economic Review, 88(2), May 1998, pp. 468–474.
Ehrlich, Isaac, “Participation in Illegitimate Activities: A Theoretical and Empirical Investigation,” Journal of Political Economy, 81(3), May–June 1973, pp. 521–565.
Fleisher, Belton, The Economics of Delinquency, Chicago: University of Chicago Press, 1966.
Godfrey, L.G., Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches, Cambridge: Cambridge University Press, 1988.
Hemenway, David, “Survey Research and Self-defense Gun Use: An Explanation of Extreme Overestimates,” Journal of Criminal Law and Criminology, 87, August 1997, pp. 1430–1445.
Kellermann, Arthur L., Lori Westphal, Lauri Fischer, and Beverly Harvard, “Weapon Involvement in Home Invasion Crime,” The Journal of the American Medical Association, 273(22), 1995, pp. 1759–1762.
Kellermann, Arthur L., Fredrick P. Rivara, Norman B. Rushforth, Joyce G. Banton, Donald T. Reay, Jerry T. Francisco, Ana B. Locci, Janice Prodzinski, Bella B. Hackman, and Grant Somes, “Gun Ownership as a Risk Factor for Homicide in the Home,” The New England Journal of Medicine, 329(15), 1993, pp. 1084–1091.
Kleck, Gary, “Guns and Violence: An Interpretive Review of the Field,” Social Pathology, 1, January 1995, pp. 12–47.
Kleck, Gary and E. Britt Patterson, “The Impact of Gun Control and Gun Ownership Levels on Violence Rates,” Journal of Quantitative Criminology, 9, 1993, pp. 249–287.
Leamer, Edward E. and Leonard Herman, “Reporting the Fragility of Regression Estimates,” The Review of Economics and Statistics, 65(2), 1983, pp. 306–317.
Lo, Andrew W. and Whitney K. Newey, “A Large-Sample Chow Test for the Linear Simultaneous Equation,” Economics Letters, 18(4), 1985, pp. 351–353.
Lott, John R., More Guns, Less Crime: Understanding Crime and Gun-Control Laws, Chicago: University of Chicago Press, 1998.
Lott, John R. and David B. Mustard, “Crime, Deterrence and Right-to-Carry Concealed Handguns,” Journal of Legal Studies, 26(1), January 1997, pp. 1–68.
Ludwig, Jens, “Concealed-Gun-Carrying Laws and Violent Crime: Evidence from State Panel Data,” International Review of Law and Economics, Forthcoming 1998.
McDowall, David, Colin Loftin, and Brian Wiersema, “Easing Concealed Firearms Laws: Effects on Homicide in Three States,” Journal of Criminal Law and Criminology, 86(1), Fall 1995, pp. 193–206.
Pesaran, M.H., R.P. Smith, and J.S. Yeo, “Testing for Structural Stability and Predictive Failure: A Review,” Manchester Review, 53, 1985, pp. 280–295.
Polsby, Daniel D., “The False Promise of Gun Control,” Atlantic Monthly, March 1994, pp. 57–70.
Polsby, Daniel D., “Firearm Costs, Firearm Benefits and the Limits of Knowledge,” Journal of Criminal Law and Criminology, 86(1), Fall 1995, pp. 207–220.
Sjoquist, David L., “Property Crime and Economic Behavior: Some Empirical Results,” American Economic Review, 63(3), 1973, pp. 439–446.
10 Effects of criminal procedure on crime rates
Mapping out the consequences of the exclusionary rule*
Raymond A. Atkins and Paul H. Rubin

I. Introduction
Since the work of Gary Becker, economists have studied the impact of various aspects of the criminal justice system on crime rates. There has been, however, little analysis of the effects of criminal procedure on criminal behavior.1 This is surprising, because in the 1960s, the Supreme Court revolutionized criminal procedure in the United States, and its rulings were and remain controversial. The relevant decisions include the Miranda v. Arizona ruling of 1966, requiring the well-known warnings to suspects; the Gideon v. Wainwright ruling of 1963, providing all citizens the right to a lawyer at trial; and the Mapp v. Ohio ruling of 1961, excluding from trial evidence obtained in violation of the Fourth Amendment.2 Each of these changes was a radical transformation of the criminal justice system. Consequently, there was great interest in observing the effects of these new procedural rules. At the time of these decisions, however, the empirical techniques and theoretical models available to analyze the criminal justice system were unable to predict or detect the effects of the Supreme Court’s rulings. Although techniques have now been developed, this gap in the literature persists. Economists and other students of crime who embrace the rational models of criminal behavior have largely ignored procedural aspects of the system, instead focusing upon the detection and sentencing phases of the process. To determine whether changes in criminal procedure can have a predictable and detectable impact on criminal behavior, we focus on the Mapp v. Ohio ruling of 1961. Mapp is best suited for empirical analysis for several reasons.
When the Supreme Court decided Mapp, exactly half of the continental states had already adopted a similar rule, creating a control group to be used in the statistical analysis (see Appendix Section A1). It is a perfect example of a “natural experiment.”3 Further, the exclusionary rule established in Mapp has been debated vigorously, both in the past, at the time of the controversial decision, and more recently, with Congress considering altering the rule.4 Changing this rule was also part of the “Contract with America.”5 Finally, much of the importance of the other procedural rules stems from Mapp. For example, the possibility of excluding evidence from trial greatly augments the
value and impact of the Fifth Amendment’s right to counsel.6 For these reasons, we test the general proposition that changes in criminal procedure can have a significant impact on the criminal justice system by focusing primarily on the exclusionary rule. The economic model of criminal behavior predicts that if the Mapp ruling did affect the behavior of police—altering the probability of either conviction or detection—then criminals should respond by increasing their level of unlawful activity. The exclusionary rule increases the costs of police investigations since the police will respond by substituting away from those activities that require a warrant toward those that do not. Criminals in turn would alter their behavior by committing more crimes. The empirical evidence reveals a significant increase in crime rates following the involuntary adoption of an exclusionary rule as the penalty for an unlawful search and seizure. Before the Supreme Court’s 1961 decision, some states had already adopted an exclusionary rule. We would expect the imposition of the rule to have little or no effect on those states, but to lead to an increase in crime in the states that had not adopted the rule voluntarily before the Supreme Court’s decision. This is what we find. See Table 10.1 (comparing average per capita crime rates for 1958 and 1959 with crime rates for 1960 and 1961). Before Mapp, there was no discernible difference in the rate of change in per capita crime rates between those states that admitted unlawfully obtained evidence and those that excluded the evidence. But the table shows that crime rates rose significantly more rapidly after 1961 in states that previously admitted tainted evidence (see also Appendix Table 10.A1).7 Regression analysis of crime rates in the years surrounding the Mapp ruling supports the comparisons in Table 10.1. Further analysis of crime rates shows that the Mapp ruling was not a one-time shock to the criminal justice system. 
The disparity in crime rates between these two groups of states continued to widen over time. This empirical finding, while consistent with an economic theory of the criminal justice system, is dramatically at odds with current academic and judicial beliefs regarding the impact of the exclusionary rule. Our finding is also in conflict with numerous older studies of the exclusionary rule that failed to detect any significant adverse effect; these studies are discussed below.

Table 10.1 Mean percentage change in total per capita crime rates

States that prior to 1961     1960–61 versus 1958–59     1962–64 versus 1958–60
Admitted the evidence (%)     +12                        +31
Excluded the evidence (%)     +13                        +23
t-Statistic                   −.44                       2.31

Note. FBI Uniform Crime Reports data.
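
The t-statistics in Table 10.1 compare the mean percentage change in crime across the two groups of states. The sketch below shows one common way such a statistic can be computed, a pooled-variance two-sample t-test. The chapter does not say which variant of the test was used, so the pooled form is an assumption, and the state-level percentage changes in the example are hypothetical placeholders rather than FBI data.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t-statistic with pooled variance (an assumed test variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical percentage changes in per capita crime, one value per state.
admitted_states = [30.0, 36.0, 27.0, 33.0]   # admitted the evidence before 1961
excluded_states = [22.0, 25.0, 19.0, 24.0]   # excluded the evidence before 1961

t_stat = pooled_t_statistic(admitted_states, excluded_states)
print(f"t = {t_stat:.2f}")
```

A positive statistic here means the first group's mean change exceeds the second's, which is the direction of the 1962–64 comparison in the table.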
In Section II we set forth the history of the exclusionary rule leading up to Mapp v. Ohio and examine several previous studies of the Mapp ruling. Our empirical work is presented in two sections, dictated by data availability. Section III analyzes state crime data for 1958–67. Section IV investigates city crime data for 1948–69 and also examines the effects of other important Supreme Court decisions.
II. Background
The exclusionary rule: Mapp v. Ohio
The Fourth Amendment was established to protect the rights of U.S. citizens to be free from personal invasion by protecting them from the intrusive investigative techniques of the federal government.8 The amendment, as interpreted by the Supreme Court, requires that searches be performed with a valid warrant.9 The amendment does not indicate how it is to be enforced and is silent about the punishment for illegal searches. The details of enforcement were left for the judicial system to determine. The Supreme Court first considered excluding evidence as a method of enforcing the Fourth Amendment in 1914. In Weeks v. United States,10 the federal district court was presented at trial with private papers confiscated without the use of a warrant. In this landmark case, the Supreme Court established the United States as the only country that excluded valid evidence from trial to protect its citizens from the government.11 The Weeks decision and several subsequent decisions forced the federal district courts to exclude evidence gathered illegally by federal officers. But this ruling did not extend the protections of the Fourth Amendment and the exclusionary rule to state action. The Court considered this issue in Wolf v. Colorado12 and determined that the Fourth Amendment was “implicit in ‘the concept of ordered liberty’ and as such enforceable against the states through the Due Process Clause.”13 Although the Court determined that all of the states were subject to the Fourth Amendment, it did not mandate the exclusionary rule as the only acceptable method of enforcement. The Court recognized that the exclusionary rule was only one possible enforcement mechanism available to the states. Other remedies included civil and criminal liability for the officers who performed the searches and for the police departments that permitted the searches.
The Court concluded that as long as those individual states without an exclusionary rule had viable alternatives, it would be unnecessary to impose a universal exclusionary rule. The Wolf ruling survived for 12 years. During this time, several states voluntarily moved to an exclusionary rule, supporting the argument that the states would self-select the most efficient rule that fitted their particular circumstances. In 1961, however, the Supreme Court decided, in Mapp v. Ohio,14 to overturn Wolf and to force all states to adopt an exclusionary rule. The Court argued that all other alternatives had failed and that the exclusionary
rule was the only viable course. The Supreme Court argued further that judicial integrity mandated the universal adoption of the exclusionary rule because courts would otherwise passively support the illegal behavior of the government by allowing the use of the ill-gotten gains at trial.

Previous empirical studies
In the 1960s and 1970s, there were several attempts to measure the effect of Mapp.15 The results of these empirical studies were conflicting, making it difficult for the legal community to determine what impact, if any, Mapp actually had. After examining the studies as of 1976, Justice Blackmun indicated, “The final conclusion is clear. No empirical researcher, proponent or opponent of the rule, has yet been able to establish with any assurance whether the rule has a deterrent effect.”16 This lack of a concrete empirical result, coupled with increasing political and judicial concerns, prompted further analysis into the cost of the exclusionary rule. But these works marked the last attempt to understand Mapp’s impact on the criminal justice system in 1961. The host of studies performed in the late 1970s examined the contemporary system, looking at the number of suspects released or the time spent on evidentiary issues to determine the cost of the new procedural rule. The studies of the late 1970s and early 1980s are responsible for the widely held belief that the exclusionary rule has little effect on crime rates in the United States.17 These studies focused upon the number of cases lost at trial and generally discovered that the percentage of cases lost because of an exclusionary issue was small. From this research, the judicial and political community concluded that there were few repercussions on crime rates because few criminals were released.
Although the more recent studies focus on different costs of the exclusionary rule, their general findings demonstrate clearly that one of the costs of the exclusionary rule, the number of cases lost at trial, is not outrageous. The American Bar Association in 1988 summarized the results of these contemporary studies:18 (1) prosecutors screen out between .2 and .8 percent of all adult felony cases because of illegal searches; (2) adding together data from all stages of the felony process, the cumulative loss from illegal searches ranges from .6 to 2.35 percent; (3) in felony arrests for offenses other than drugs and guns, the number of illegal searches is lower; and (4) 2.3 percent of drug cases are screened out, and the total cumulative loss ranges from 2.8 to 7.1 percent. From these and other findings, the special committee wrote, “[T]he conclusion that the exclusionary rule neither causes serious malfunction of the criminal justice system nor promotes crime is strongly supported by practically all of our . . . witnesses and by our telephone survey results.”19 This characterizes the mainstream academic opinion of the exclusionary rule: proponents agree that the exclusionary rule deters the police but consider a possible effect on criminals unlikely and irrelevant.
But this conclusion overlooks the significant secondary effects of the exclusionary rule. If the rule changes the behavior of police and thereby reduces the probability of apprehension, then theory predicts an increase in the number of crimes committed. Thus, the police may adhere to the exclusionary rule and commit fewer illegal searches, but they might investigate or solve fewer crimes, as the police weigh the benefits of investigating a crime against the costs involved. If the Supreme Court induces police to use alternatives that are less preferred than a search, we would expect a decrease in the effectiveness of police officers, holding all other factors constant. This would lessen the expected punishment from undertaking criminal activities and, accordingly, would eventually increase crime rates.20 This is true even if few or even no cases are ultimately lost at trial because police investigators had fully adapted to the more restrictive procedural rules. The courts would actually exclude evidence only in the rare instances where the police misjudged the new requirements or where the courts modified the existing rules.21 While economic models can predict a general increase in crime rates, they cannot predict the magnitude of the effect across the various categories of crime. Following the Mapp ruling, police may shift resources toward alternative investigation techniques and away from investigations where the burden is now relatively heavier. One might expect that the exclusionary rule would have a greater impact on the investigation of robberies, as compared to assaults, because the critical piece of evidence, the ill-gotten loot, is potentially excluded from trial. But the police will be aware of this disparity and may reallocate additional police resources toward robbery investigations.
In the end, the magnitude of the impact on crime rates may vary depending on the category of crime and will depend on many factors, including the marginal cost to the police for failing to solve the crime, the response by criminals (the elasticity of supply), and the relative cost of alternative investigation techniques. In any case, contrary to the assumptions that underlie the previous empirical research of the Mapp ruling, imposing an exclusionary rule on the states may have increased the crime rates without causing a significant number of cases to be dismissed at trial. Employing two panel data sets, we investigate the impact of the Mapp ruling.
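
The deterrence argument in this section can be illustrated with a deliberately simple decision rule: an offense is committed when its benefit exceeds the expected punishment, taken here as the probability of punishment times the sanction. This is only a sketch under that assumption. The function names, benefit values, sanction, and probabilities are all hypothetical and are not taken from the chapter's data or model.

```python
def commits_offense(benefit, p_punish, sanction):
    """Assumed rule: offend when the benefit exceeds the expected punishment."""
    return benefit > p_punish * sanction


def offense_count(benefits, p_punish, sanction):
    """Number of potential offenders for whom the offense pays in expectation."""
    return sum(commits_offense(b, p_punish, sanction) for b in benefits)


# Hypothetical benefits of an offense for six potential offenders.
benefits = [100, 250, 400, 550, 700, 850]
sanction = 2000  # hypothetical sanction, in the same units as the benefits

# A procedural rule that makes investigation costlier is modeled as a
# lower probability of punishment (0.30 falling to 0.25, both made up).
offenses_before = offense_count(benefits, 0.30, sanction)
offenses_after = offense_count(benefits, 0.25, sanction)

print(offenses_before, offenses_after)
```

No case needs to be lost at trial for the count to rise in this sketch; the change works entirely through the lower probability of punishment, which is the secondary effect described above.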
III. Empirical analysis: 48 states, 1958–67 data
The data
The crime data were gathered from the Uniform Crime Reports, which are compiled by the Federal Bureau of Investigation (FBI). These data were used in the earlier studies, primarily because they are the only complete crime data set available before 1970.22 This data set has several well-recognized problems. First, it consists only of reported crimes. Victims must report the crime to local police, and the police must report the crime to the FBI. Second, the data
are not really uniform, as the number of cities reporting to the FBI increased over the 1957–67 period. And finally, the data are compiled by a political organization with its own agenda. Nonetheless, since the exclusionary rule was mandated in 1961 and the Uniform Crime Reports are the only state crime data available,23 we rely on these crime figures for our empirical analysis. Gary Becker’s rational choice model predicts that citizens will weigh the expected costs of committing a crime against the expected benefits.24 Criminals will participate in all crimes with a positive expected outcome. Empirical analysis that attempts to estimate a supply function for criminal behavior must, therefore, find adequate proxies for the expected costs and benefits of committing crimes. In this study, the variables that explain a state’s crime rates are similar to those used by Isaac Ehrlich.25 The variables (see Appendix Section A4) include employment rates, personal incomes, education levels, percentage of the population living in an urban setting, population age, and racial distributions. These variables have long been recognized as being as important as the more obvious explicit cost of committing a crime: the sentence received, adjusted for the probability of being caught and convicted.26 Unfortunately, we do not have the average sentence received and clearance rates by crime type, which proxy the expected sentence and the probability of being captured by the police. The omission of these variables, however, probably implies that our estimate of the effect of Mapp is conservative. If Mapp increased crime rates, judges and police should respond by substituting toward other types of punishments or deterrence. Sentences, for example, should rise with the growing crime rates. These rising sentences would counteract some of the adverse impact of Mapp on crime rates. 
Our estimates of Mapp include some of this offsetting effect.27

Model specifications
The primary specification for this data set will be a model with state and year fixed effects. The principal model takes the form28

$$\log(\mathrm{Crime})_{it} = a_i + b' x_{it} + g \times \mathrm{Mapp} + \text{state fixed effects} + \text{year fixed effects} + e_{it}.$$

The Mapp variable is a dummy variable that will capture the effect of the Supreme Court’s Mapp v. Ohio decision. This dummy variable equals one if the year is 1962 or later and the state had not previously adopted the exclusionary rule. The explanatory variables in x_it are used to hold constant other factors that could account for changes in crime rates. The criteria for using the data were, principally, the availability of the state data and, secondarily, variables found relevant in previous criminal empirical and theoretical works.
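
A minimal sketch of how a coefficient like g can be estimated under state and year fixed effects is given below. It uses the two-way within transformation on a balanced panel and, to stay short, omits the control variables in x, which is a simplification of the chapter's model rather than a reproduction of it. The four states, their intercepts, the year trend, and the "true" effect of 0.05 are synthetic values invented for the illustration.

```python
from statistics import mean


def two_way_demean(panel, states, years):
    """Subtract state and year means (adding back the grand mean); balanced panel."""
    grand = mean(panel[(s, t)] for s in states for t in years)
    state_mean = {s: mean(panel[(s, t)] for t in years) for s in states}
    year_mean = {t: mean(panel[(s, t)] for s in states) for t in years}
    return {(s, t): panel[(s, t)] - state_mean[s] - year_mean[t] + grand
            for s in states for t in years}


def estimate_mapp_effect(log_crime, mapp, states, years):
    """OLS slope of demeaned log crime on the demeaned Mapp dummy (no controls)."""
    y = two_way_demean(log_crime, states, years)
    x = two_way_demean(mapp, states, years)
    return sum(x[k] * y[k] for k in x) / sum(v * v for v in x.values())


# Hypothetical states: True means the state already had an exclusionary rule.
had_rule = {"A": False, "B": False, "C": True, "D": True}
state_effect = {"A": 6.0, "B": 6.5, "C": 5.8, "D": 6.2}  # made-up intercepts
states = sorted(had_rule)
years = list(range(1958, 1968))
true_g = 0.05  # made-up effect used only to build the synthetic data

# Mapp dummy as defined in the text: 1962 or later, and no prior rule.
mapp = {(s, t): float(t >= 1962 and not had_rule[s])
        for s in states for t in years}
log_crime = {(s, t): state_effect[s] + 0.03 * (t - 1958) + true_g * mapp[(s, t)]
             for s in states for t in years}

g_hat = estimate_mapp_effect(log_crime, mapp, states, years)
print(f"estimated g = {g_hat:.4f}")
```

Because the synthetic data contain no noise, the estimate recovers the effect that was built in; with real data the controls and standard errors would matter, and a regression package would be the more natural tool.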
Empirical results
We begin our analysis with the state data gathered for 1958–67 to test the impact of exogenous changes in criminal procedure on state crime levels. With this data set, only the impact of Mapp v. Ohio is investigated.29 Crime rates were broken down by the type of offense, which allows the analysis to consider the effect of Mapp upon different types of criminal activity. (Appendix Section A3 provides the FBI definitions of the six crime types.) Several variables were included to explain the changes in crime rates for the decade surrounding the introduction of the exclusionary rule in 1961. Becker’s rational choice model predicts the following signs for each variable: Per18–20, positive; Employ, negative; PerWhite, negative; UpperEd, negative; Urban, no prediction; Rincome, positive.30 The results, set forth in Table 10.2, imply increases in crime in jurisdictions forced by the Supreme Court’s ruling to exclude evidence of 3.9 percent for larceny, 4.4 percent for auto theft, 6.3 percent for burglary, 7.7 percent for robbery, and 18 percent for assault.31 For murder, there was a small, statistically insignificant, increase. Our analysis of Mapp is conservative, as we used the Supreme Court’s tabulation to identify those states that adopted an exclusionary rule prior to Mapp. It has been suggested that the Supreme Court may have overstated the number of states that favored the exclusionary rule. The Supreme Court had the incentive to create the appearance that the exclusionary rule it was dictating was already accepted in the majority of states as the appropriate remedy for a Fourth Amendment violation.
We recomputed the regressions reported above assuming that Alabama, Maryland, South Dakota, and Oregon did not exclude evidence obtained in violation of the Fourth Amendment, as suggested by Paul Cassell.32 The Mapp dummy variable became slightly larger and noticeably more significant for all crime types (except auto theft, where the significance level remained constant). For example, under the alternative tabulation, we detected a 4.4 percent rise in larceny crimes attributable to Mapp with a t-ratio of 2.13. For the remainder of this article, we continue to use the Supreme Court’s conservative tabulation of states that had adopted an exclusionary rule before the Mapp decision.
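
Because the dependent variable is the logarithm of the crime rate, a dummy coefficient g translates into a percentage change of 100 × (exp(g) − 1). The percentages quoted in the text appear consistent with this conversion applied to the Mapp coefficients in Table 10.2, although the chapter does not spell the formula out, so treat the conversion as an inferred assumption. The short sketch below applies it to the reported coefficients.

```python
import math

# Mapp coefficients as reported in Table 10.2 (state data, 1958-67).
mapp_coefficients = {
    "Murder": 0.0012,
    "Assault": 0.166,
    "Robbery": 0.0738,
    "Burglary": 0.0608,
    "Larceny": 0.0387,
    "Auto theft": 0.0429,
}


def percent_effect(g):
    """Percentage change implied by a dummy coefficient in a log-linear model."""
    return 100.0 * (math.exp(g) - 1.0)


effects = {crime: percent_effect(g) for crime, g in mapp_coefficients.items()}
for crime, pct in effects.items():
    print(f"{crime:<10} {pct:5.1f}%")
```

For small coefficients the result is close to 100 × g, which is why the larceny coefficient of .0387 reads as roughly 3.9 percent, while the gap widens for larger coefficients such as the one for assault.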
Table 10.2 Changes in crime rates by type: state data

Variable   Murder            Assault           Robbery           Burglary          Larceny           Auto
Mapp       .0012 (.026)      .166 (3.21)       .0738 (1.75)      .0608 (3.20)      .0387 (1.91)      .0429 (1.65)
Per18–20   −.731 (−.058)     1.247 (.08)       −8.321 (−.72)     8.163 (1.58)      −7.637 (−1.39)    −12.025 (−1.70)
Employ     −.795 (−.650)     .427 (.31)        .668 (.60)        −.772 (−1.54)     .682 (1.28)       1.175 (1.72)
PerWhite   .539 (.850)       −1.60 (−.22)      .190 (.33)        −.145 (−.563)     .141 (.51)        .194 (.54)
UpperEd    .021 (.061)       −.0712 (−.18)     .0468 (.14)       .155 (1.09)       −.286 (−1.88)     −.319 (−1.64)
Urban      −.023 (−1.72)     −.00987 (.411)    −.0212 (−1.73)    −.0198 (−3.60)    .0263 (4.483)     −.0202 (−2.68)
Rincome    .26E–03 (3.38)    .348E–04 (.41)    .455E–03 (6.59)   .116E–03 (3.74)   .145E–03 (4.38)   .399E–03 (9.37)
R2         .91               .92               .94               .95               .96               .94
N          479               480               480               480               480               480

Note: State data, 1958–67; all regressions also include state and year fixed effects, not shown. Values in parentheses are t-ratios.

IV. Empirical analysis: 396 cities, 1948–69 data
The data and specification
The data were gathered by Herbert Jacob beginning in 1978 as part of the Governmental Responses to Crime project: “The project investigated the way in which urban governments, citizens, newspapers and state governments responded to the growth and increasing complexity of crime during the period from 1948 to 1978.” The data were made available by the
Inter-university Consortium for Political and Social Research.33 Jacob gathered crime data from 396 cities. (The list of cities is available from the authors.) The variables (see Appendix Section A5) available for some of the years in this data set are crime rates for murder, robbery, assault, burglary, theft, and auto theft; violent crimes and property crimes; population estimates; median family incomes; percentage of the adult population with a fifth-grade education; percentage of the population that is nonwhite; civilian labor force and number of persons employed who are over 16 years old; number of persons aged 15–24 years; and percentage of families with income of less than $3,000 per year.34 The city data set offers a major advantage over the state panel data set: the varying sizes of the different cities allow us to examine the effect of Mapp on urban and suburban cities. We also use this city data set to investigate the impact of two other Supreme Court rulings in criminal procedure on crime rates: Wolf v. Colorado in 1949 and Gideon v. Wainwright 35 in 1963. In Wolf, the Supreme Court incorporated the Fourth Amendment, thereby imposing its commands on state as well as federal actors. We test to see whether the incorporation of the Fourth Amendment, without imposing any constitutional remedy, had any effect on crime rates in those states that had not adopted an exclusionary rule. In Gideon, the Supreme Court provided indigent citizens the right to a lawyer at trial. To capture the effect of that decision, a new dummy variable, Gideon, was created and equals one if the year is 1963 or later and the state did not provide indigent defendants with counsel as of 1962; it equals zero otherwise.36 Several states voluntarily adopted an exclusionary rule between 1948 and 1961. The Exclude dummy variable is designed to detect the effect of the voluntary adoption of an exclusionary rule, in contrast with the imposition of the rule on the states by Mapp. 
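
The three policy dummies used with the city data can be sketched as simple functions of the year and the state's prior legal position. The Gideon rule below follows the definition in the text. The 1962 threshold for Mapp is carried over from the state-data definition, and the Exclude rule (one from the year of voluntary adoption onward) is an assumption, since the chapter does not give its exact coding. The example state and its attributes are hypothetical.

```python
def mapp_dummy(year, had_rule_before_mapp):
    """1 if 1962 or later and the state had not already adopted the rule."""
    return int(year >= 1962 and not had_rule_before_mapp)


def gideon_dummy(year, provided_counsel_in_1962):
    """1 if 1963 or later and the state gave indigents no counsel as of 1962."""
    return int(year >= 1963 and not provided_counsel_in_1962)


def exclude_dummy(year, voluntary_adoption_year):
    """ASSUMED coding: 1 from the year a state voluntarily adopted the rule."""
    return int(voluntary_adoption_year is not None
               and year >= voluntary_adoption_year)


# Hypothetical state: no exclusionary rule before Mapp, no indigent counsel
# as of 1962, and therefore no voluntary adoption year.
rows = [
    (year,
     mapp_dummy(year, had_rule_before_mapp=False),
     exclude_dummy(year, voluntary_adoption_year=None),
     gideon_dummy(year, provided_counsel_in_1962=False))
    for year in range(1960, 1965)
]
for year, mapp, exclude, gideon in rows:
    print(year, mapp, exclude, gideon)
```

Every city in a state would share that state's dummy values in a given year, so the variation that identifies each effect comes from comparing states over time.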
The econometric specification for this data set is the fixed-effects model with city and year effects. Empirical results The general results from the first data set carry through to this larger data set, with one interesting new observation: the exclusionary rule had a dramatically different effect on urban cities than on suburban cities. These results are somewhat counterintuitive and unexpected. The effect of Mapp on suburban cities was larger for almost every offense type. One might expect that the smaller cities with lower crime rates would not be as burdened by the heightened warrant requirement. But the data, set forth in Table 10.3, clearly say otherwise. Aggregation of the data by state would mask this effect on account of the dominance of the urban cities in any aggregation.37 The possible omission of relevant variables in this fixed-effects model may create serial correlation between the error terms for individual cities. The fixed-effects estimators would, therefore, be inefficient, and our regressions
Table 10.3 Changes in crime rates by type: city data

Suburban cities (a)
Variable  Murder            Assault           Robbery           Burglary          Larceny           Auto
Mapp      −.61 (−3.10)      .15 (1.32)        .20 (3.73)        .24 (6.73)        .12 (3.67)        .33 (8.16)
Exclude   .39 (1.58)        .31 (2.25)        −.31 (−4.64)      .04 (.92)         −.02 (−.49)       −.07 (−1.33)
Gideon    .18 (.71)         .35 (2.50)        .15 (2.23)        −.12 (−2.60)      .03 (.78)         −.08 (−1.49)
PoorEst   .10 (1.15)        −.02 (−.40)       .07 (2.86)        .06 (3.92)        .02 (1.16)        .06 (3.13)
Employ    −2.69 (−.86)      3.19 (1.76)       1.74 (2.00)       −.44 (−.78)       −1.93 (−3.55)     3.32 (5.07)
MedInc    −2.4E–04 (−3.14)  1.5E–04 (3.49)    1.5E–04 (7.40)    1.5E–04 (11.50)   1.9E–04 (14.90)   2.0E–04 (13.10)
EduEst    −.14 (−1.90)      .03 (.68)         .02 (1.28)        −.02 (−1.63)      −.02 (−2.03)      −.02 (−1.43)
NonWhite  .06 (4.26)        .03 (3.57)        .04 (9.39)        .03 (9.36)        .02 (8.56)        .03 (10.10)
Per14–25  −11.80 (−2.41)    12.70 (4.60)      8.90 (6.73)       7.18 (8.30)       2.03 (2.45)       3.54 (3.56)
R2        .31               .64               .85               .87               .88               .86
N         1,205             1,300             1,259             1,263             1,263             1,263

Central cities (b)
Variable  Murder            Assault           Robbery           Burglary          Larceny           Auto
Mapp      −.10 (−1.63)      .12 (3.06)        .06 (2.06)        .05 (2.85)        −.04 (−2.58)      .17 (8.44)
Exclude   .09 (.77)         −.15 (−2.03)      −.24 (−4.75)      −.08 (−2.58)      −.05 (−1.66)      4.5E–03 (.12)
Gideon    −.05 (−.65)       .00 (−.09)        .15 (4.64)        −.01 (−.29)       .01 (.63)         .04 (1.67)
PoorEst   −.02 (−.76)       −.02 (−1.16)      .08 (7.93)        .02 (2.33)        .01 (1.76)        4.0E–03 (.52)
Employ    .32 (.53)         −.95 (−2.54)      −.41 (−1.56)      .17 (1.01)        .41 (2.98)        .54 (2.85)
MedInc    −9.6E–05 (−2.49)  1.9E–04 (8.08)    1.1E–04 (6.22)    1.5E–04 (13.44)   1.4E–04 (16.06)   1.5E–04 (12.19)
EduEst    −.05 (−2.76)      .03 (2.42)        −.01 (−1.60)      −.06 (−9.82)      −.03 (−6.67)      −.03 (−5.47)
NonWhite  .05 (7.98)        .00 (.68)         .04 (13.90)       .03 (14.76)       .02 (10.98)       .04 (20.01)
Per14–25  1.96 (1.12)       8.85 (8.11)       6.32 (8.19)       4.08 (8.24)       2.47 (6.07)       3.17 (5.66)
R2        .31               .75               .85               .82               .86               .80
N         4,313             4,423             4,348             4,370             4,338             4,369

Notes: Values in parentheses are t-ratios.
(a) Data are for 309 suburban cities, 1948–69; all regressions also include city and year fixed effects, not shown.
(b) Data are for 87 central cities, 1948–69; all regressions also include city and year fixed effects, not shown.
Effects of criminal procedure on crime rates
may overstate the degrees of freedom and significance level of the estimates.38 To curtail the degree of potential serial correlation, we examined these city data by taking a snapshot of the cities in 1948, 1953, 1958, 1963, and 1968. We then examined this sample with a fixed-effects model. Naturally, this changed the estimates, and as expected, it also reduced the significance level of nearly all estimates. The Exclude dummy variable was statistically significant only for robbery, and Gideon lost statistical significance for all crimes in both urban and suburban cities. But the impact of Mapp remained detectable. The regression analysis continued to detect a larger impact of the Mapp decision on suburban cities than on larger ones. Following Mapp, the regression analysis detected a statistically significant increase in burglary, larceny, and auto theft in urban cities. Similarly, it detected a statistically significant increase in robbery, burglary, larceny, and auto theft in suburban cities. And the model continued to do a poor job of explaining the variations in murder rates among these 396 cities.
Gideon v. Wainwright, the landmark decision that gave every indigent defendant the right to counsel at trial, may have increased crime rates in those 15 states that chose not to provide counsel to these defendants. The regression results reported in Table 10.3 indicate that robbery and assaults dramatically increased following the Gideon ruling, with a larger and more significant increase in suburban cities than in urban cities. One might question why Gideon would have had such a dramatic impact on crime rates. The answer is simple: in some states, Gideon was applied retroactively. This meant that prisoners who had been convicted without counsel and were still incarcerated were eligible for a new trial.
In Florida, the state that convicted Clarence Earl Gideon for breaking and entering, 1,976 prisoners were released outright and another 500 were back in court by January 1, 1964.39 This mass release of indigent men (men who could not afford an attorney) with a disposition for committing crimes could alone be responsible for the observed increase in crime following the Gideon ruling. The aggregate effect of the Mapp and Gideon rulings on total crimes, violent crimes, and property crimes is presented in Table 10.4.
The voluntary decision by states to adopt an exclusionary rule did not have the same effect on crime rates as did the imposition of the rule by the Supreme Court. Between 1948 and 1961, four states voluntarily adopted an exclusionary rule: California, Delaware, North Carolina, and Rhode Island. While the Exclude dummy variable was not statistically significant for every crime category, it did detect a negative correlation between the decision to adopt the exclusionary rule and crime rates in those states. We anticipated that those states with lower crime rates might be more willing to adopt an exclusionary rule. The result is also consistent with the argument that states free to choose the method of enforcing the Fourth Amendment would voluntarily adopt the most efficient deterrent remedy from the menu of possible remedies.
Table 10.4 Effect of the Mapp and Gideon rulings on crime

          Suburban cities only                                  Central cities only
Variable  Total crime       Violent crime     Property crime    Total crime       Violent crime     Property crime
Mapp      .19 (7.40)        .27 (5.10)        .19 (7.20)        .04 (3.04)        .14 (5.27)        .03 (2.63)
Exclude   −.03 (−1.04)      −.22 (−3.35)      −.02 (−.65)       −.07 (−3.26)      −.23 (−4.98)      −.05 (−2.18)
Gideon    −.01 (−.26)       .10 (1.43)        −.02 (−.59)       .01 (.95)         .08 (2.61)        3.2E–03 (.23)
PoorEst   .05 (4.40)        .06 (2.68)        .04 (3.93)        .02 (4.07)        .03 (2.80)        .02 (3.35)
Employ    −.61 (−1.45)      2.84 (3.32)       −.95 (−2.24)      .38 (3.36)        −.62 (−2.56)      .42 (3.67)
MedInc    1.8E–04 (18.2)    2.2E–04 (10.62)   1.8E–04 (17.92)   1.5E–04 (20.04)   1.7E–04 (10.93)   1.5E–04 (19.65)
EduEst    −.02 (−1.72)      .09 (4.84)        −.03 (−3.15)      −.03 (−8.57)      .02 (2.84)        −.04 (−9.85)
NonWhite  .02 (12.1)        .03 (7.43)        .02 (11.89)       .02 (20.10)       .02 (8.90)        .02 (19.22)
Per14–25  3.23 (4.93)       8.72 (6.53)       3.48 (5.42)       2.88 (8.48)       7.32 (10.22)      2.70 (7.95)
R2        .92               .88               .92               .89               .86               .89
N         1,201             1,201             1,263             4,269             4,303             4,338

Note: Data are for 396 cities, 1948–69; all regressions also include state and year fixed effects, not shown. Values in parentheses are t-ratios.
In Wolf, the Supreme Court held that the directives of the Fourth Amendment bound the states. But the Court went on to declare that states could choose the method of enforcing Fourth Amendment rights. It may have appeared settled, following Wolf, that state court judges need not worry about continuing federal supervision and review of their decisions not to exclude evidence obtained in violation of the Fourth Amendment. To detect whether Wolf had an impact on state judges, police, and criminals, we truncate our sample to look only at the years 1948–52. The dummy variable of interest is a Wolf dummy variable, which equals one after 1949 if the state did not exclude evidence and zero otherwise. The results are presented in Table 10.5. The results from the fixed-effects model with city and year effects reveal that Wolf may have had a small but statistically significant negative impact on crime. Crime rates fell in those states that were freed from the uncertainty regarding federal oversight of their decisions to use unlawfully obtained evidence at trial. We suggest that following Wolf, state courts felt more comfortable using unlawfully obtained evidence at trial, and the apparent sanctioning of that behavior by the Supreme Court resulted in lower crime rates.
V. The impact of Mapp over time
In addition to exploring the possibility that Mapp had a different effect on suburban cities than on urban cities, we use the city data set to explore the long-term versus short-term effect of Mapp. The specification takes the form $\log(\mathrm{Crime})_{it} = \alpha_i + \tau_t + \beta' x_{it} + \gamma'(\tau_t \times \mathrm{Mapp}) + \varepsilon_{it}$. The vector of estimates, $\gamma'$, will reveal the short-run and long-run impact of the Supreme Court’s Mapp ruling. In effect, each year was given its own Mapp dummy variable. These Mapp variables for 1951–68 were introduced (four years were dropped to allow the use of a fixed-effects model) to reveal the year-by-year difference in crime rates between those states that excluded unlawfully obtained evidence from trial and those that admitted it. Focusing only on those offense types for which a statistically significant impact was detected, we present the dynamic effect of the Mapp ruling in Figure 10.1. This graph illustrates the year-by-year impact of Mapp. Not surprisingly, for 1951–62, those states that did not exclude tainted evidence had lower crime rates. Following 1962, crime rates rose sharply as compared with the crime rates of those cities that already excluded unlawfully obtained evidence, an effect that continued to increase over time.
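To make concrete what the year-specific Mapp coefficients measure, here is a minimal Python sketch on made-up numbers. It assumes a balanced panel and leaves out the covariates; under those assumptions, the fixed-effects estimate for each year reduces to a difference-in-differences of group means relative to an omitted base year. The function name, the base year, the four units, and every value in the data are hypothetical and are not taken from the study.

```python
from statistics import mean


def yearly_gaps(rows, base_year):
    """Difference-in-differences of mean log crime, by year, against base_year.

    rows: iterable of (unit, year, treated, log_crime) from a balanced panel.
    Returns {year: gap} for every year other than base_year.
    """
    cells = {}
    for _unit, year, treated, log_crime in rows:
        cells.setdefault((bool(treated), year), []).append(log_crime)
    means = {key: mean(values) for key, values in cells.items()}
    years = sorted({year for (_treated, year) in means})
    gaps = {}
    for year in years:
        if year == base_year:
            continue  # the base year is the omitted category
        treated_change = means[(True, year)] - means[(True, base_year)]
        control_change = means[(False, year)] - means[(False, base_year)]
        gaps[year] = treated_change - control_change
    return gaps


# Made-up panel: unit effects, a common time trend, and a gap that opens in 1962.
unit_effect = {"A": 6.1, "B": 6.4, "C": 5.9, "D": 6.0}
treated_units = {"A", "B"}  # stand-ins for cities in states newly bound by the rule
true_gap = {1960: 0.0, 1961: 0.0, 1962: 0.1, 1963: 0.2}

rows = []
for unit, alpha in unit_effect.items():
    for year in range(1959, 1964):
        tau = 0.05 * (year - 1959)  # common year effect
        gamma = true_gap.get(year, 0.0) if unit in treated_units else 0.0
        rows.append((unit, year, unit in treated_units, alpha + tau + gamma))

gaps = yearly_gaps(rows, base_year=1959)
for year, gap in gaps.items():
    print(year, round(gap, 3))
```

Because the unit effects and the common year effects cancel in the double difference, the recovered gaps match the ones built into the synthetic data. With covariates or an unbalanced panel, a full regression would be needed instead.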
.0297 (.19) −59.778 (−.58) −5.530 (−1.09) .581E–03 (1.24) .247 (1.56) −.0618 (−.99) 16.487 (.96) .52 1,029
Robbery −.0783 (−1.49) 32.351 (.90) .887 (.50) .931E–04 (.57) −.0170 (−.31) .0562 (2.60) 5.812 (.98) .87 1,027
Assault
−.0186 (−.20) 2.791 (.46) .957 (.322) −.163E–03 (−.55) −.116 (−1.20) .116 (2.73) −1.648 (−.16) .80 1,029
−.0649 (−1.99) 70.019 (3.14) 2.086 (1.91) .190E–03 (1.89) −.134 (−3.94) .0373 (2.79) 4.000 (1.08) .83 1,028
Burglary −.0537 (−2.29) −9.577 (−.57) 1.979 (2.57) .201E–04 (.28) −.0195 (−.79) .0379 (3.69) .667 (.25) .93 1,010
Larceny
Note: Data are for 396 cities, 1948–52; all regressions also include city and year fixed effects, not shown. Values in parentheses are t-ratios.
R2 N
Per14–25
NonWhite
EduEst
MedInc
PerEmploy
PoorEst
Wolf
Murder
Table 10.5 Effect of the Wolf ruling on crime
−.0799 (−2.16) −16.746 (−.66) .529 (.42) .180E–04 (.15) −.111 (−2.89) .0643 (4.22) 9.705 (2.31) .83 1,029
Auto
Figure 10.1 Dynamic impact of Mapp.
VI. Conclusion
The goal of our research was to test the application of the prevalent economic theories of criminal behavior in criminal procedure, which might alter any subsequent normative analysis of criminal procedure. We accomplished this goal. We observed substantial but predictable results from changing criminal procedure. We found a positive and significant effect of the Supreme Court’s alteration of criminal procedure on the crime rates of those states affected. In the aggregated state data, Mapp increased larceny by 3.9 percent, auto theft by 4.4 percent, burglary by 6.3 percent, robbery by 7.7 percent, and assault by 18 percent. Moreover, these results mask larger impacts in suburban cities, where the imposition of the exclusionary rule increased violent crimes by 27 percent and property crimes by 20 percent. Our results imply that part of the increase in crime rates experienced in the 1960s was due to the tougher criminal procedure rules that favored defendants.40 These increases in crime rates are a weighty cost attached to each of the Supreme Court’s decisions to change criminal procedure. Society may decide the benefits of our new protections are worth these costs, but an informed debate requires that these costs be known and considered.
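The notes give the conversion from a dummy-variable coefficient in a logarithmic specification to a percentage effect as 100 × [exp(coefficient) − 1]. The short Python sketch below applies that formula. The function name is hypothetical and the coefficients are illustrative round numbers, not estimates from the tables.

```python
import math


def percent_effect(dummy_coefficient: float) -> float:
    """Percentage change implied by a dummy coefficient in a log-linear model."""
    return 100.0 * (math.exp(dummy_coefficient) - 1.0)


# Illustrative coefficients only; none of these is an estimate from the chapter.
for coefficient in (0.05, 0.10, 0.25):
    print(f"coefficient {coefficient:.2f} -> {percent_effect(coefficient):.1f} percent")
```

For small coefficients the result is close to 100 times the coefficient itself, and the difference between the two widens as the coefficient grows.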
Raymond A. Atkins and Paul H. Rubin
Appendix A
A1. STATE EXCLUSIONARY RULES, 1961
Admissible (24 total): Arizona, Arkansas, Colorado, Connecticut, Georgia, Iowa, Kansas, Louisiana, Maine, Massachusetts, Minnesota, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Dakota, Ohio, Pennsylvania, South Carolina, Utah, Vermont, and Virginia.
Excludable (24 total): Alabama, California, Delaware, Florida, Idaho, Illinois, Indiana, Kentucky, Maryland, Michigan, Mississippi, Missouri, Montana, North Carolina, Oklahoma, Oregon, Rhode Island, South Dakota, Tennessee, Texas, Washington, West Virginia, Wisconsin, and Wyoming.
Source: Elkins v. United States, 364 U.S. 206 (1960) (appendix) (Alaska and Hawaii excluded).
A2. RIGHT TO COUNSEL AT TRIAL, 1962
Granted: Arizona, Arkansas, California, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Dakota, Ohio, Oklahoma, Oregon, South Dakota, Tennessee, Texas, Utah, Washington, West Virginia, Wisconsin, and Wyoming.
Not automatically granted: Alabama, Colorado,41 Connecticut, Delaware, Florida, Maine, Maryland, Mississippi, North Carolina, Pennsylvania, Rhode Island, South Carolina, Vermont, and Virginia.
A3. DEFINITIONS OF CRIME CLASSIFICATIONS
Murder (Criminal Homicide): Murder and Nonnegligent Manslaughter. All willful felonious homicides as distinguished from deaths caused by negligence. Excludes attempts to kill, assaults to kill, suicides, accidental deaths, and justifiable homicides. Justifiable homicides are limited to (a) the killing of a person by a peace officer and (b) the killing by a private citizen of a person in the act of committing a felony.
Robbery. Stealing or taking anything of value from the care, custody, or control of a person by force or violence or by putting that person in fear, such as strong-arm robbery, stickups, armed robbery, assault to rob, and attempts to rob.
Assault (Aggravated Assault). Assault with intent to kill or for the purpose of inflicting severe bodily injury by shooting, cutting, stabbing, maiming, poisoning or scalding by the use of acids, explosives, or other means. Excludes simple assault, assault and battery, fighting, and the like.
Burglary (Breaking or Entering). Burglary, housebreaking, safecracking, or any breaking into or unlawful entry of a structure with the intent to commit a felony or a theft. Includes attempts.
Larceny Theft (except Auto Theft). Stealing property $50 and over in value. Thefts of bicycles, automobile accessories, shoplifting, pocket picking, or any stealing of property or article of value that is not taken by force and violence or fraud. Excludes embezzlement, con games, forgery, worthless checks, and so on.
Auto Theft. Stealing or driving away and abandoning a motor vehicle. Excludes taking for temporary or unauthorized use by those having lawful access to the vehicle.
Source: U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States: Uniform Crime Reports 61 (1970).

Table 10.A1 Percentage change in total per capita crime rates
Rate = average per capita total crime rate over the years shown.

State           Rate, 1958–60  Rate, 1962–64  Percentage change  Admit evidence prior to 1961?
Massachusetts   703            1,176          67.2               Y
Nevada          1,826          2,908          59.2               Y
Connecticut     652            958            46.9               Y
New Jersey      847            1,228          45.1               Y
Illinois        1,157          1,665          43.9               N
Minnesota       639            916            43.4               Y
Nebraska        464            659            42.0               Y
Oregon          836            1,158          38.6               N
South Carolina  776            1,075          38.4               Y
North Dakota    343            474            38.2               Y
Maryland        911            1,235          35.6               N
Delaware        876            1,187          35.5               N
Utah            834            1,124          34.7               Y
Georgia         829            1,114          34.4               Y
New York        976            1,304          33.6               Y
Indiana         805            1,063          32.0               N
Wisconsin       462            606            31.2               N
New Hampshire   404            529            31.2               Y
Missouri        1,066          1,379          29.3               N
Kansas          629            811            28.9               Y
Colorado        1,195          1,535          28.4               Y
Vermont         454            581            28.1               Y
Kentucky        741            939            26.8               N
Louisiana       804            1,006          25.1               Y
Iowa            470            586            24.9               Y
Michigan        1,106          1,379          24.8               N
Tennessee       806            1,005          24.6               N
Ohio            700            867            23.9               Y
Mississippi     393            486            23.5               N
Montana         862            1,059          22.9               N
California      1,781          2,186          22.7               N
Idaho           672            820            22.1               N
North Carolina  662            806            21.6               N
Virginia        797            969            21.5               Y
Alabama         732            886            21.1               N
Arizona         1,600          1,924          20.3               Y
Pennsylvania    646            766            18.6               Y
Washington      986            1,166          18.3               N
Texas           1,059          1,237          16.8               N
Oklahoma        944            1,100          16.6               N
Florida         1,430          1,640          14.7               N
Maine           511            576            12.8               Y
New Mexico      1,181          1,304          10.4               Y
Arkansas        582            642            10.4               Y
West Virginia   444            489            10.1               N
South Dakota    547            599            9.4                N
Wyoming         818            855            4.5                N
Rhode Island    1,264          1,258          −.41               N

Source: U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States: Uniform Crime Reports (1958–64).

A4. VARIABLE DESCRIPTIONS: STATE DATA
ln(Crime): Log of per capita crime rates. Source: Vital Statistics of the United States (homicide data); U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States: Uniform Crime Reports (1970) (all other crime types).
Per18–20: Percentage of the population between ages 18 and 20. Source: U.S. Bureau of the Census, Current Population Reports (various years).
Employ: Percentage of the population employed. Source: Social Security Bulletin (Annual Statistical Supp.) (various years).
PerWhite: Percentage of the population that is white, extrapolated from census data. Source: Statistical Abstract of the United States (various years).
UpperEd: Percentage of the population that is in postsecondary schools divided by Per18–20. Source: Statistical Abstract of the United States (various years).
Urban: Percentage of the population living in metropolitan areas. Source: Statistical Abstract of the United States (various years).
Rincome: Real per capita income. Source: U.S. Department of Commerce, Bureau of Economic Analysis, Survey of Current Business (various years).
Mapp: The exclusionary rule dummy variable. This dummy variable equals one if the year is 1962 or later and the state had not previously adopted the exclusionary rule, and zero otherwise.
A5. VARIABLE DESCRIPTIONS: CITY DATA
ln(Crime): The natural log of the per capita crime rate. Source: U.S. Department of Justice, Federal Bureau of Investigation, Uniform Crime Reports (various years).
PoorEst: Percentage of families with income less than $3,000. Source: Census data for 1950, 1960, 1970.
PerEmploy: Percentage of the cities’ population employed and over 16 years old. Source: Census data for 1950, 1960, 1970.
MedInc: Median family income divided by the Consumer Price Index. Source: City County Data Book (various years).
EduEst: Percentage of the population 25 years and older with less than a fifth-grade education. Source: City County Data Book (various years).
NonWhite: Percentage of nonwhite population. Source: Census data for 1950, 1960, and 1970.
Per14–25: Percentage of the population aged 15–24 years. Source: Herbert Jacob, Governmental Responses to Crime in the United States, 1948–1978, Part I: Baseline (Study No. 8076, Inter-university Consortium for Political and Social Research 1985).
PopEst: Intercensus population estimate.
Mapp: A dummy variable that equals one if the state did not exclude evidence as of 1961 and the year is 1961 or later, and zero otherwise.
Gideon: A dummy variable that equals one if the state did not provide indigent defendants with counsel at trial and the year is 1962 or later, and zero otherwise.
Wolf: A dummy variable that equals one if the state did not exclude evidence as of 1949 and the year is 1949 or later, and zero otherwise.
Exclude: A dummy variable that equals one the year a state voluntarily adopts the exclusionary rule. For all but four states, this dummy variable is always zero.
Notes
* This paper has benefited from the suggestions and observations of the following people: Peter H. Aranson, Martin J. Bailey, Robert E. Carpenter, Paul G. Cassell, Christopher Curran, Andrew Dougherty, Isaac Ehrlich, Barry Hirsch, D. Bruce Johnsen, John Lott, Richard Posner, Hugh Spall, and Richard Murphy. This research was sponsored by a grant from the Donner Foundation. Data for the empirical section was made available, in part, by the Inter-university Consortium for Political and Social Research (ICPSR Study No. 8076 (Governmental Responses to Crime in the United States, 1948–1978) and Study No. 7716 (Deterrent Effects of Punishment on Crime Rates, 1959–1960)).
1 An exception is Isaac Ehrlich & George D. Brower, On the Issue of Causality in the Economic Model of Crime and Law Enforcement: Some Theoretical Considerations and Experimental Evidence, 77 Am. Econ. Rev. Papers & Proc. 99 (1987). This paper incorporated all changes in criminal procedure into a weighted proxy, which was introduced as an explanatory variable in a multiple-equation system to explain movement in national crime rates over three decades. The analysis supports the proposition that criminal procedure is a significant determinant of criminal behavior, but, as we show, a much more detailed analysis is possible. In contrast, economics journals are filled with extensive empirical analyses into the effect of changes in punishment, conviction rates, clearance rates, unionization, gun control, and various socioeconomic factors on criminal behavior.
2 Miranda v. Arizona, 384 U.S. 436 (1966); Gideon v. Wainwright, 372 U.S. 335 (1963); Mapp v. Ohio, 367 U.S. 643 (1961).
3 See Bruce D. Meyer, Natural and Quasi-Experiments in Economics, 13 J. Bus. & Econ. Stat. 151 (1995). We examined the states’ decisions to admit or exclude evidence obtained unlawfully with a probit model. Using 1960 state data, we detected no statistically significant relationship between various socioeconomic characteristics of the states and the decision to admit unlawfully obtained evidence. We did observe, however, a weak correlation between a few state characteristics and the decision to admit. In general, those states that admitted the evidence had lower crime rates, more time served, higher family incomes, and smaller populations. Any state characteristics that influenced the decision to voluntarily adopt an exclusionary rule were swept from our subsequent analysis by using a fixed-effects regression model, as discussed in Section III.
4 Jarett B. Decker, The 1995 Crime Bill: Is the GOP the Party of Liberty and Limited Government? 20 (Cato Pol’y Analysis No. 229, Cato Inst. 1995).
5 Contract with America: The Bold Plan by Rep. Newt Gingrich, Rep. Dick Armey, and the House Republicans to Change the Nation 52–53 (Ed Gillespie & Bob Schellhas eds. 1994).
6 For some nonobvious and interesting implications of alternative procedural rules, see William J. Stuntz, The Uneasy Relationship between Criminal Procedure and Criminal Justice, 107 Yale L. J. 1 (1997). For example, Stuntz argues that it is cheaper to contest procedural than factual claims, so the system will be biased toward disputing procedural issues and away from examining factual matters.
7 The difference in magnitude, 12 to 31 percent, is due to the expanded period of comparison. By comparing years that are further apart, a larger percentage increase in crime rates is expected, even if the year-by-year increase in crime rates stays constant.
8 Appendix A, Table 10.A1, provides the state-by-state breakdown of the percentage change in crime rates following 1961.
9 “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated; and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.” U.S. Const. amend. IV. The Fourth Amendment does not in fact require a search warrant before all searches and seizures. Rather, the language suggests that “reasonable” searches and seizures are permitted with or without a search warrant issued with probable cause. The Supreme Court, in a series of decisions, declared the general, bright-line rule that a search without a search warrant is per se unreasonable. While there has been some movement by the Court back to a reasonableness standard, a warrant is still generally required.
10 Weeks v. United States, 232 U.S. 383 (1914).
11 See generally Craig M. Bradley, The Emerging International Consensus as to Criminal Procedure Rules, 14 Mich. J. Int’l L. 171 (1993). As was stated by Malcolm Wilkey, “[O]ne proof of the irrationality of the exclusionary rule is that no other civilized nation in the world has adopted it.” Malcolm R. Wilkey, The Exclusionary Rule: Why Suppress Valid Evidence? 62 Judicature 214, 216 (1978). However, several countries, including Canada, Germany, Australia, and Italy, have a discretionary exclusionary rule. See Robert Harvie & Hamar Foster, When the Constable Blunders: A Comparison of the Law of Police Interrogation in Canada and the United States, 19 Seattle U. L. Rev. 497 (1996); Craig M. Bradley, The Exclusionary Rule in Germany, 96 Harv. L. Rev. 1032, 1032–35 (1983).
12 Wolf v. Colorado, 338 U.S. 25 (1949).
13 To impose an exclusionary rule on all of the states, the Supreme Court needed to resolve two important constitutional questions. First, did the Due Process Clause of the Fourteenth Amendment dictate the inclusion of the Fourth Amendment, therefore allowing the Supreme Court to regulate the criminal proceedings of the individual states? And second, was the exclusionary rule a constitutionally mandated remedy for a breach of the Fourth Amendment by state officials? Wolf, 338 U.S. at 27–28.
14 Mapp, 367 U.S. at 643.
15 Note, Effect of Mapp v. Ohio on Police Search and Seizure Practices in Narcotics Cases, 4 Colum. J. L. & Soc. Problem 87 (1968); S. S. Nagel, Testing the Effects of Excluding Illegally Seized Evidence, Wisc. L. Rev. 275 (1965); Dallin H. Oaks, Studying the Exclusionary Rule in Search and Seizure, 37 U. Chi. L. Rev. 665 (1970); J. E. Spiotto, Search and Seizure: An Empirical Study of the Exclusionary Rule and Its Alternatives, 1 J. Legal Stud. 243 (1972); Bradley Canon, The Exclusionary Rule: Have Critics Proven That It Doesn’t Deter Police? 62 Judicature 398 (1979).
16 United States v. Janis, 428 U.S. 443, 450 n.22 (1976).
17 For example, see Evan Osborne, Is the Exclusionary Rule Worthwhile? 17 Contemp. Econ. Pol’y 381 (1999); Comptroller General, Impact of the Exclusionary Rule on Federal Criminal Prosecutions (B-171019, 1979); U.S. Department of Justice, The Effects of the Exclusionary Rule: A Study in California (1983); Canon, supra note 15; Bradley Canon, Is the Exclusionary Rule in Failing Health? Some New Data and a Plea against a Precipitous Conclusion, 62 Ky. L. J. 681 (1974); Lawrence Crocker, Can the Exclusionary Rule be Saved? 84 J. Crim. L. & Criminology 310 (1993); Thomas Y. Davies, A Hard Look at What We Know (and Still Need to Learn) about the “Costs” of the Exclusionary Rule: The NIJ Study and Other Studies of “Lost” Arrests, 3 Am. B. Found. Res. J. 611 (1983); Peter F. Nardulli, The Social Cost of the Exclusionary Rule: An Empirical Assessment, Am. B. Found. Res. J. 585 (1983); S. R. Schlesinger, The Exclusionary Rule: Have Proponents Proven That It is a Deterrent to Police? 62 Judicature 404 (1979); Wilkey, supra note 11.
18 American Bar Association, Special Committee on Criminal Justice in a Free Society, Criminal Justice in Crisis: A Report to the American People and the American Bar on Criminal Justice in the United States: Some Myths, Some Realities, and Some Questions for the Future 21 (1988).
19 Id.
20 See generally Gary S. Becker, Crime and Punishment: An Economic Approach, 76 J. Pol. Econ. 169 (1968).
21 For a more detailed economic model of the search warrant process, see Raymond A. Atkins, The Warrant Process (unpublished Ph.D. dissertation, Emory Univ., Dep’t Econ. 1998).
22 U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States: Uniform Crime Reports (various years).
23 Except for homicide data compiled in the Vital Statistics Report, which are also used in this study.
24 Becker, supra note 20.
25 Isaac Ehrlich, Participation in Illegitimate Activities: A Theoretical and Empirical Investigation, 81 J. Pol. Econ. 521 (1973).
26 See, for example, Jonathan M. Karpoff & John R. Lott, Jr., The Reputational Penalty Firms Bear from Committing Criminal Fraud, 36 J. Law & Econ. 757 (1993).
27 Police expenditures were available for both the state and city data. This variable might present a proxy for the probability of capture. The inclusion of this variable did not change materially any of the estimated coefficients. The estimated coefficients were, in most instances, statistically insignificant. But as police expenditures may be a function of crime rates, and thus endogenous to the model, the estimated ordinary least squares coefficients are inconsistent. See William H. Greene, Econometric Analysis 579 (2nd ed. 1993). To account for any potential simultaneous-equation bias, we recomputed our estimates using an instrumental variables regression (with state and year effects) where the lag of judicial expenditures was used as an instrument for police expenditures. The significance of the Mapp coefficients dropped slightly for some of the crime types, but the analysis and results were not significantly different from the results from the ordinary least squares specification reported herein.
28 The variable Crime includes the crime rates for murder, assault, robbery, burglary, larceny, and auto theft. Thus, for each specification developed, we will perform six separate regressions.
29 In Section IV, we use the larger city data set to investigate the impact of Gideon and Wolf.
30 To test the robustness of these results, several specifications were examined. These included the ordinary least squares results with no state or year effects, a fixed-effects model with only state effects, a random-effects model with only state effects, and a random-effects model with both state and year effects. The random-effects model produced nearly identical estimates as its fixed-effects counterpart. Ordinary least squares without state effects detected dramatically larger effects from the Mapp ruling, but this may be due to endogeneity bias; states with lower crime rates may have voluntarily chosen to adopt an exclusionary rule. This potential endogeneity bias is corrected with the fixed-effects specification.
31 Specification tests rejected the nonlogarithmic specification for the logarithmic specification. The percentage effect of the Mapp dummy is 100 × [exp(Estimated Mapp Coefficient) – 1].
32 Paul G. Cassell, The Mysterious Creations of Search and Seizure Exclusionary Rules under State Constitutions: The Utah Example, 1993 Utah L. Rev. 751, 793 n.272 (questioning whether Alabama, Maryland, South Dakota, and Oregon had embraced the exclusionary rule as the Supreme Court suggested).
33 Herbert Jacob, Governmental Responses to Crime in the United States, 1948–1978, Part I: Baseline (Study No. 8076, Inter-university Consortium for Political and Social Research 1985) (http://www.icpsr.umich.edu/cgi/archive.prl?study=8076).
34 Some of the socioeconomic variables like median income, age, education, and so on, are provided only for the census years. Estimates were constructed on the basis of a linear logarithmic function. Observations where crime rates were missing are dropped from the analysis.
35 Wolf, 338 U.S. at 25; Gideon, 372 U.S. at 335.
36 See Appendix A, Section A2, for a list of those states that provided indigent defendants counsel at trial. These tabulations are not as reliable as the Mapp split, because there is considerable uncertainty regarding which states did not, in fact, provide counsel to indigent defendants. While many states did not require trial judges to provide counsel to indigent defendants, letters from prosecuting attorneys and attorneys general indicate that it was the practice in many states to do so. See Yale Kamisar, The Right to Counsel and the Fourteenth Amendment: A Dialogue on “the Most Pervasive Right” of an Accused, 30 U. Chi. L. Rev. 1, 67 (1962). For the Gideon dummy variable, we presume that the Supreme Court’s ruling will have some effect on those states in which indigent defendants had no guaranteed right to counsel.
37 As with the state data set, multiple specifications were examined with the city data set. We examined a random-effects model with city and year effects, and fixed- and random-effects models with only city effects, all computed with the single-equation and multiple-equation specifications (when police expenditure was introduced to the regressions). The results presented in this paper are representative of the results from all of these different specifications; the Mapp ruling increased crime rates in those states that did not already exclude tainted evidence, and the effect was more pronounced in suburban cities.
38 See Alok Bhargava, L. Franzini, & W. Narendranathan, Serial Correlation and the Fixed Effects Model, 49 Rev. Econ. Stud. 533 (1982).
Effects of criminal procedure on crime rates
39 Anthony Lewis, Gideon’s Trumpet 215 (1964).
40 Miranda, 384 U.S. at 436, is another change in procedural safeguards that may have increased crime rates. The literature supports the conclusion that Miranda seriously hampered police investigation techniques. See Paul G. Cassell, Miranda’s Social Costs: An Empirical Reassessment, 90 Nw. U. L. Rev. 391 (1996); Paul G. Cassell, All Benefits, No Costs: The Grand Illusion of Miranda’s Defenders, 90 Nw. U. L. Rev. 1084, 1090 (1996); and Paul G. Cassell & Richard Fowles, Handcuffing the Cops? A Thirty-Year Perspective on Miranda’s Harmful Effects on Law Enforcement, 50 Stan. L. Rev. 1055 (1998). Cassell found a dramatic decrease in confessions following Miranda. Compare Stephen J. Schulhofer, Miranda’s Practical Effect: Substantial Benefits and Vanishingly Small Social Costs, 90 Nw. U. L. Rev. 500 (1996); John J. Donohue III, Did Miranda Diminish Police Effectiveness? 50 Stan. L. Rev. 1147 (1998). Our research implies that Miranda would increase crime rates. We could not, however, test this hypothesis by comparing one set of states against another. In Miranda, unlike Gideon, Mapp, and Wolf, the Supreme Court created a new procedural rule with no counterpart in any state. We therefore have no control group of unaffected states.
41 Colorado had chosen to provide indigent defendants the right to counsel just the year prior to the 1962 Supreme Court ruling. We grouped Colorado with those states that did not automatically grant the right to counsel, as Gideon could still affect Colorado because of the possible retroactive effect of the decision. See Kamisar, supra note 36.
References
American Bar Association. Special Committee on Criminal Justice in a Free Society. Criminal Justice in Crisis: A Report to the American People and the American Bar on Criminal Justice in the United States: Some Myths, Some Realities, and Some Questions for the Future. Washington, D.C.: American Bar Association, Criminal Justice Section, 1988.
Becker, Gary S. “Crime and Punishment: An Economic Approach.” Journal of Political Economy 76 (1968): 169–217.
Bhargava, Alok; Franzini, L.; and Narendranathan, W. “Serial Correlation and the Fixed Effects Model.” Review of Economic Studies 49 (1982): 533–49.
Bradley, Craig M. “The Exclusionary Rule in Germany.” Harvard Law Review 96 (1983): 1032–66.
Bradley, Craig M. “The Emerging International Consensus as to Criminal Procedure Rules.” Michigan Journal International Law Review 14 (1993): 171–221.
Canon, Bradley. “Is the Exclusionary Rule in Failing Health? Some New Data and a Plea against a Precipitous Conclusion.” Kentucky Law Journal 62 (1974): 681–730.
Canon, Bradley. “The Exclusionary Rule: Have Critics Proven That It Doesn’t Deter Police?” Judicature 62 (1979): 398–403.
Cassell, Paul G. “The Mysterious Creations of Search and Seizure Exclusionary Rules under State Constitutions: The Utah Example.” Utah Law Review (1993): 751–873.
Cassell, Paul G. “All Benefits, No Costs: The Grand Illusion of Miranda’s Defenders.” Northwestern University Law Review 90 (1996): 1084–1124.
Cassell, Paul G. “Miranda’s Social Costs: An Empirical Reassessment.” Northwestern University Law Review 90 (1996): 387–499.
Cassell, Paul G., and Fowles, Richard. “Handcuffing the Cops? A Thirty-Year Perspective on Miranda’s Harmful Effects on Law Enforcement.” Stanford Law Review 50 (1998): 1055–1145.
Raymond A. Atkins and Paul H. Rubin
Comptroller General of the United States. “Impact of the Exclusionary Rule on Federal Criminal Prosecutions.” B-171019. Washington, D.C.: U.S. General Accounting Office, 1979.
Crocker, Lawrence. “Can the Exclusionary Rule Be Saved?” Journal of Criminal Law and Criminology 84 (1993): 310–51.
Davies, Thomas Y. “A Hard Look at What We Know (and Still Need to Learn) about the ‘Costs’ of The Exclusionary Rule: The NIJ Study and Other Studies of ‘Lost’ Arrests.” American Bar Foundation Research Journal (1983): 611–90.
Decker, Jarett B. “The 1995 Crime Bill: Is the GOP the Party of Liberty and Limited Government?” Cato Policy Analysis No. 229. Washington, D.C.: Cato Institute, June 1, 1995.
Donohue, John J., III. “Did Miranda Diminish Police Effectiveness?” Stanford Law Review 50 (1998): 1147–80.
Ehrlich, Isaac. “Participation in Illegitimate Activities: A Theoretical and Empirical Investigation.” Journal of Political Economy 81 (1973): 521–65.
Ehrlich, Isaac, and Brower, George D. “On the Issue of Causality in the Economic Model of Crime and Law Enforcement: Some Theoretical Considerations and Experimental Evidence.” American Economic Review: Papers and Proceedings 77 (1987): 99–106.
Gillespie, Ed, and Schellhas, Bob, eds. Contract with America: The Bold Plan by Rep. Newt Gingrich, Rep. Dick Armey, and the House Republicans to Change the Nation. New York: Times Books, 1994.
Greene, William H. Econometric Analysis. 2d ed. New York: Macmillan, 1993.
Harve, Robert, and Foster, Hamar. “When the Constable Blunders: A Comparison of the Law of Police Interrogation in Canada and the United States.” Seattle University Law Review 19 (1996): 497–537.
Kamisar, Yale. “The Right to Counsel and the Fourteenth Amendment: A Dialogue on ‘the Most Pervasive Right’ of an Accused.” University of Chicago Law Review 30 (1962): 1–77.
Karpoff, Jonathan M., and Lott, John R., Jr. “The Reputational Penalty Firms Bear from Committing Criminal Fraud.” Journal of Law and Economics 36 (1993): 757–802.
Lewis, Anthony. Gideon’s Trumpet. New York: Random House, 1964.
Meyer, Bruce D. “Natural and Quasi-Experiments in Economics.” Journal of Business and Economic Statistics 13 (1995): 151–61.
Nagel, S. S. “Testing the Effects of Excluding Illegally Seized Evidence.” Wisconsin Law Review (1965): 275–310.
Nardulli, Peter F. “The Social Cost of the Exclusionary Rule: An Empirical Assessment.” American Bar Foundation Research Journal (1983): 585–609.
Note. “Effect of Mapp v. Ohio on Police Search and Seizure Practices in Narcotics Cases.” Columbia Journal of Law and Social Problems 4 (1968): 87.
Oaks, Dallin H. “Studying the Exclusionary Rule in Search and Seizure.” University of Chicago Law Review 37 (1970): 665–757.
Osborne, Evan. “Is the Exclusionary Rule Worthwhile?” Contemporary Economic Policy 17 (1999): 381–89.
Schlesinger, S. R. “The Exclusionary Rule: Have Proponents Proven That It Is a Deterrent to Police?” Judicature 62 (1979): 404–9.
Schulhofer, Stephen J. “Miranda’s Practical Effect: Substantial Benefits and Vanishingly Small Social Costs.” Northwestern University Law Review 90 (1996): 500–563.
Spiotto, J. E. “Search and Seizure: An Empirical Study of the Exclusionary Rule and Its Alternatives.” Journal of Legal Studies 2 (1972): 243–78.
Stuntz, William J. “The Uneasy Relationship between Criminal Procedure and Criminal Justice.” Yale Law Journal 107 (1997): 1–76.
U.S. Department of Justice. Federal Bureau of Investigation. Crime in the United States: Uniform Crime Reports. Washington, D.C.: U.S. Department of Justice, Federal Bureau of Investigation, various years.
U.S. Department of Justice. National Institute of Justice. The Effects of the Exclusionary Rule: A Study in California. Washington, D.C.: U.S. Department of Justice, National Institute of Justice, 1983.
Wilkey, Malcolm R. “The Exclusionary Rule: Why Suppress Valid Evidence?” Judicature 62 (1978): 214–32.
11 An economic theory of the Fifth Amendment Hugo M. Mialon *
The Fifth Amendment is an old friend and a good friend. It is one of the great landmarks in men’s struggle to be free of tyranny, to be decent and civilized. William O. Douglas (1898–1980), U.S. Supreme Court Justice
1. Introduction
The Fifth Amendment to the Constitution of the United States guarantees that:
No person (. . .) shall be compelled in any Criminal Case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law.1, 2
This paper studies important aspects of the Fifth Amendment’s right to silence and right to due process of law.
Right to silence. The clause against self-incrimination implies that a criminal defendant has a right to remain silent at his own trial. But if the defendant’s silence were to lead the jury to draw an inference against the defendant, the right to silence would be a limited one. In Griffin v. California (April 5, 1965), the Supreme Court held that a comment by the prosecution on the defendant’s failure to testify violated the Fifth Amendment. In Carter v. Kentucky (January 14, 1981), the Court held that the Fifth Amendment required an instruction to the jury that no inference be drawn from the defendant’s silence:
No judge can prevent jurors from speculating about why a defendant stands mute in the face of a criminal accusation, but a judge can, and must, if requested to do so, use the unique power of the jury instruction to reduce that speculation to a minimum. (450 U.S. at 303)
Northern Ireland abolished the right to silence in 1988, and England followed in 1994, mainly to facilitate the conviction of suspected terrorists.
England’s Criminal Justice and Public Order Act 1994 (Part 3, Section 35.2) specifies that “it will be permissible for the court or jury to draw such inferences as appear proper from the failure of the accused to give evidence or his refusal, without good cause, to answer any question.”3
What is the effect of the right to silence on social welfare? To answer this question, a model of criminal trials is developed, and the right to silence is given formal expression in terms of the parameters. In the model, before the defense makes its case, the jury has a posterior over the defendant’s innocence, given all information available to date. The right to silence is the requirement that this posterior continue to govern the jury’s decision-making upon observing silence from the defendant.
One might speculate that in reality juries draw adverse inferences from silence despite the judge’s instruction to the contrary. But the judge’s principal role in a criminal trial is to instruct the jury on matters of law. The law explicitly requires jurors to ignore the defendant’s silence. To the extent that jurors obey the law, they obey the judge’s instruction. Moreover, the jury is more likely to understand and obey the instruction when the judge provides an explanation for it. In the trial of O.J. Simpson, Judge Lance Ito formulated and explained his instruction to the jury as follows:
Ladies and gentlemen of the jury, you have heard all the evidence, and it is now my duty to instruct you on the law that applies to this case. (. . .) A defendant in a criminal trial has a constitutional right not to be compelled to testify. You must not draw any inference from the fact that a defendant does not testify. Further, you must neither discuss this matter, nor permit it to enter into your deliberations.
In deciding whether or not to testify, the defendant may choose to rely upon the state of the evidence and upon the failure, if any, of the prosecution to prove beyond a reasonable doubt every essential element of the crime charged against him. (O.J. Simpson Trial Transcripts and Documents, 1995, p. 47124) As a final means of enforcement, judges can set aside a verdict if they believe that it was unjustified given the evidence and their instructions to the jury. In the model, the instruction that no inference be drawn from silence is assumed to be successful, and is shown to reduce the conviction rate. If the prosecution fails to present incriminating evidence, the defense can remain silent and be acquitted. This helps the guilty by increasing wrongful acquittals, and also helps the innocent by reducing wrongful convictions.4 If social preferences are measured in terms of the two types of court error, the right to silence cannot improve social welfare unless the jury’s preferences are biased relative to those of society; if jury preferences coincide with social preferences, the first best outcome is achieved without the right to silence. The preferences of juries may be biased relative to those of society in cases where juries discriminate against defendants based on race or social class, or in cases
where defendants cannot afford the lawyers who are best able to select the juries to minimize the biases against their clients. In cases where juries are biased, the welfare effect of the right to silence depends on the model’s other parameters, several of which are affected by the Fifth Amendment’s due process requirement. Due process of law. The second Fifth Amendment right analyzed in the paper is the right to due process of law. Among other things, the due process clause requires prosecutors to share evidence with the defense. In most of the U.S., discovery power is much broader for the defense than for the prosecution. And all states require the prosecution to share exculpatory evidence. Mandatory disclosure by the prosecution ensures that the defense is no less informed, and in most cases, better informed than the prosecution. In the model without the right to silence, the disclosure requirement has an ambiguous effect on the conviction rate. It increases the likelihood that the defense has exculpatory evidence, which can decrease the conviction rate; but it also makes the jury’s adverse inference from the defendant’s silence more adverse, which increases the conviction rate. However, with the right to silence, mandatory disclosure cannot make the adverse inference from silence more adverse since the right to silence blocks the adverse inference in the first place. Thus, mandatory disclosure always reduces the conviction rate with the right to silence. If efficiency is measured in terms of the two types of court error, mandatory disclosure always reduces the efficiency of the right to silence. Society always feels worse about reversing the jury’s verdict given silence from guilty to innocent if it can rule out the possibility that the defendant was silent for lack of being informed, which it can with mandatory disclosure. On the other hand, mandatory disclosure always improves efficiency in the presence of the right to silence. 
With the right to silence, mandatory disclosure only increases the chances that exculpatory evidence comes out at trial, which cannot harm, and may benefit, society. This implies that the right to silence combined with mandatory disclosure (the Fifth Amendment) is more likely to increase welfare than is the right to silence alone. But among the four mechanisms, no Fifth Amendment, the right to silence alone, mandatory disclosure alone, and the Fifth Amendment, which is the most efficient? It turns out that the most efficient is either the Fifth Amendment or mandatory disclosure alone. Thus, mandatory disclosure is always part of the optimal mechanism in this constrained set. Whether the right to silence is also part of it depends on the extent of jury bias and on the model’s other parameters. For example, the prior probability that the defendant is truly guilty in the model is linked to the reputations of the police and the judiciary. All else constant, the right to silence is part of the optimal mechanism for a larger range of parameters the worse are the reputations of the police and the judiciary. In this case, more innocent defendants, and fewer guilty defendants, stand to benefit from the right to silence, so that it is more likely to improve welfare.
Section 2 relates the contribution to existing literature. Section 3 develops the economic model of criminal trials with and without mandatory disclosure and with and without the right to silence. Section 4 derives equilibrium in each of these four models, and analyzes the effects of the right to silence and mandatory disclosure on equilibrium conviction rates. Section 5 derives equilibrium welfare (measured in terms of court errors at trial) in each model, and identifies the most efficient of the four mechanisms analyzed. Section 6 discusses the relationship between the constitutional structures and the possibility of plea bargaining. Section 7 summarizes and proposes avenues for further research.
2. Relation to existing literature
In the economics literature on pre-trial bargaining (see, for example, P’ng, 1983, Reinganum, 1988, and Spier, 1992), the trial process is usually modeled with a single parameter, the exogenous probability that the defendant is ultimately convicted. Relatively few papers model the trial process more explicitly.
Gay et al. (1989) develop an inquisitorial model of trials, in which defendants choose whether to be tried by a judge or jury. The right to a jury trial is a Sixth Amendment right, whereas due process and the right to silence are Fifth Amendment rights. Also, the model that is developed here to analyze the efficiency of Fifth Amendment rights is adversarial rather than inquisitorial.
Shin (1998) compares the adversarial and inquisitorial systems in terms of social efficiency, concluding that the adversarial model is superior. In the author’s models, social utility coincides with the jury’s utility, and weighs wrongful acquittals and wrongful convictions equally. Yet jury preferences may be biased relative to social preferences, and wrongful convictions may be socially costlier than wrongful acquittals, especially in criminal cases. Thus, Shin’s conclusions do not easily apply to criminal cases. In contrast, the model developed here allows the jury’s utility to differ from social utility, and allows each of them to weigh wrongful convictions more heavily than wrongful acquittals. As such, the conclusions derived from the model apply to criminal cases as well.
Cooter and Rubinfeld (1994) analyze legal discovery. They note that discovery, which compels information sharing, can affect the settlement probability and trial accuracy. They focus their analysis more on the former effect, mentioning only briefly that mandatory disclosure can improve the accuracy of trials by increasing the information available to the jury.
The present paper focuses more on the effect of mandatory disclosure on trial outcomes, and demonstrates that this effect is potentially ambiguous. Moreover, Cooter and Rubinfeld do not consider the combination of mandatory disclosure and the right to silence. Seidmann and Stein (2000) analyze the right to silence, but do not consider its combination with mandatory disclosure. They argue that the right to
silence helps the innocent, that is, reduces wrongful convictions, only because it allows criminals to avoid conviction without lying, so that statements by innocent defendants are more credible. In the model developed here, the right to silence reduces wrongful convictions without the possibility of perjury, by tending to block unraveling, the phenomenon that no news is bad news, which was first described by Milgrom (1981). Baird, Gertner, and Picker (1998, p. 91) mention that the right to silence at trial may prevent unraveling, but do not pursue this line of inquiry formally.
3. Model of criminal trials
The set of players is {P,D,J}, where P is the prosecution, D is the defense, and J is the jury. D and the defendant are treated as the same person, so there are no agency problems. Although J comprises several jurors, their individual verdicts are inputs into a group decision-rule, usually unanimity, which outputs a unified and final verdict. In this sense, J is treated as one player with one voice.5
Let Ω denote the state space. The model’s intrinsic uncertainty is summarized by the vector (τ, ε, κ) ∈ Ω, where Ω = {Iτ, Gτ} × {Iε, Gε} × 2^{P,D}, and 2^{P,D} is the set of all subsets of {P,D}. The random variable τ, called the “truth,” can be in one of two states, Iτ, interpreted as “The truth is that D is innocent,” or Gτ, interpreted as “The truth is that D is guilty.” The random variable ε, called the “evidence,” can be in one of two states, Iε, interpreted as “The evidence is in favor of D’s innocence,” or Gε, interpreted as “The evidence is in favor of D’s guilt.” The random variable κ, called the “knowledge,” can be in one of four states: both P and D know the evidence, only D knows the evidence, only P knows the evidence, or neither D nor P knows the evidence. J knows none of ε, τ, and κ.
P, D, and J have a common prior over the realization of the truth, denoted by P[Gτ]. This prior exogenously embodies everything leading up to the trial, including police investigations and preliminary hearings involving motions to suppress evidence. If participants in the trial have come to trust that the police and judges are fair and unbiased, in the sense that they usually conduct their investigations and render their decisions without regard for race or social class, then they may be more likely to believe ex ante that the person who stands before them in court is guilty, which means that their prior belief that D is guilty will tend to be higher. But if the participants perceive that the police and judges are corrupt or biased, their prior will tend to be lower.
They naturally form an assessment that D is likely to be guilty of what is charged given that D has already made it this far into the system. The quality of the evidence is represented by the likelihood matrix given in Table 11.1, where P[Gε | Iτ], for example, is the probability that the evidence indicates that D is guilty given that D is innocent. The evidence is more often right than wrong: P[Iε | Iτ] > P[Gε | Iτ] and P[Gε | Gτ] > P[Iε | Gτ]. In cases where the evidence is more accurate, P[Gε | Iτ] and P[Iε | Gτ] are closer
Table 11.1 Likelihood matrix representing evidence quality
Truth (prior)      Iε               Gε
Iτ (P[Iτ])         P[Iε | Iτ]       P[Gε | Iτ]
Gτ (P[Gτ])         P[Iε | Gτ]       P[Gε | Gτ]
Table 11.2 Information structures without mandatory disclosure
                        P knows ε      P does not know ε
D knows ε               δπ             δ(1 − π)
D does not know ε       (1 − δ)π       (1 − δ)(1 − π)
to 0, while in cases where the evidence is less accurate, they are closer to 1/2. Lawyers are no more likely to know the evidence if it is in their favor than if it is not, and no more likely to know it if the defendant is guilty than if he is not. More precisely, κ is stochastically independent from (τ, ε). Denote the unconditional probability that D knows the evidence by δ, and the unconditional probability that P knows the evidence by π. Initially, the event that D is informed and the event that P is informed of the evidence are independent. Then Table 11.2 represents the lawyers’ information structures. The independence of D and P’s information structures will be relaxed when the Fifth Amendment’s disclosure requirement on prosecutors is introduced at the end of this section. In the model, action unfolds in four time periods. At time 1, the truth, τ, the evidence, ε, and the lawyers’ knowledge of the evidence, κ, are realized. At time 2, P makes its case. At time 3, D makes its case. At time 4, J renders its verdict having heard D and P. It is assumed that D and P cannot say what they know to be false, and cannot say that for which they lack any evidence. In the U.S., lawyers are disbarred if they are found to have presented false evidence or to have knowingly allowed their clients to commit perjury. With this assumption, if D or P know the evidence ε, their action set is {“ε”, “s”}, where the quotation marks indicate vocalization of the evidence, and s stands for silence. If they do not know the evidence, they must take action “s.” J’s action set is {Iν, Gν }, where Iν is the not guilty verdict, and Gν is the guilty verdict. The outcome space is {Gν & Gτ, Gν & Iτ, Iν & Gτ, Iν & Iτ }. The second outcome is a wrongful conviction, the third is a wrongful acquittal. Assume D wants the verdict to be Iν, and P wants the verdict to be Gν, regardless of the state. The utilities of the lawyers are, after normalization,
\[
U_P(G_\nu, G_\tau) = 1,\quad U_P(G_\nu, I_\tau) = 1,\quad U_P(I_\nu, G_\tau) = 0,\quad U_P(I_\nu, I_\tau) = 0,\qquad U_D = 1 - U_P. \tag{1}
\]
One might object that even in the U.S., trials are not strictly adversarial. In principle, perhaps, state and federal prosecutors are supposed to maximize social welfare. If this were the case, it would be more realistic to assume that P wants the verdict to be Gν if and only if the state is Gτ. In practice, however, prosecutors may not simply maximize social welfare.6, 7 Either to climb the ranks in U.S. Attorney offices or to have better outside options, prosecutors need trial experience and a reputation for winning cases. Moreover, even if reputation and human capital incentives would lead prosecutors to withhold evidence when it is in the defendant’s favor, the Fifth Amendment’s disclosure requirement on prosecutors would not allow them to do so. Below, we formulate this requirement in terms of the parameters, and analyze its effect on equilibrium when prosecutors only care to win at trial.
J’s attitude toward risk is normalized such that its utility from a rightful conviction or a rightful acquittal is 1, its utility from a wrongful acquittal is 0, and its utility from a wrongful conviction is UJ = UJ (Gν, Iτ) ≤ 0. In a parallel fashion, society’s attitude toward risk is normalized such that its utility from a rightful conviction or a rightful acquittal is 1, its utility from a wrongful acquittal is 0, and its utility from a wrongful conviction is US = US (Gν, Iτ) ≤ 0. J’s utility may differ from society’s utility in two broad ways: UJ < US or UJ > US. The first case corresponds to a jury that overly discriminates in favor of the defendant. The second case corresponds to a jury that is too biased against the defendant. If the jury is prejudiced against the defendant, then it may not attach enough (from society’s point of view) relative disutility to the outcome in which the defendant is convicted though actually innocent of the crime charged.
During the jury selection stage, experienced lawyers can detect and eliminate candidate jurors who are unfavorably biased against their clients. Poorer defendants are not able to afford the best lawyers, and are instead randomly appointed one by the State. Thus, indigent defendants may be more likely to face a jury that is overly biased against them.8 The Fifth Amendment’s mandatory disclosure requirement and right to silence are now expressed in terms of the adversarial model developed above. Mandatory disclosure. Let MD denote mandatory disclosure by the prosecution. Without MD, the event that D is informed and the event that P is informed of the evidence are independent. Then Table 11.2 represents the correct information structures. With MD, if D is not informed, P cannot be informed of the evidence. Then Table 11.3 represents the correct information structures. Without MD, the probability that D is informed is δ, but with MD, the probability that D is informed is π + δ(1 − π) > δ. MD alters the time structure of the model: between time 1 and time 2 (that is, right before P makes its case), D learns any evidence that P knows.
Table 11.3 Information structures with mandatory disclosure
                        P knows ε      P does not know ε
D knows ε               π              δ(1 − π)
D does not know ε       0              (1 − δ)(1 − π)
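The two information structures can be checked numerically. The sketch below is only an illustration: the function names and the values of δ and π are assumptions introduced here, not part of the model. It rebuilds Tables 11.2 and 11.3 as joint distributions and confirms that the probability that D is informed rises from δ to π + δ(1 − π) under MD.

```python
from fractions import Fraction


def info_structure(delta, pi, mandatory_disclosure=False):
    """Return the joint distribution over who knows the evidence.

    Keys are (d_knows, p_knows). delta and pi are the probabilities that
    D and P are informed before any disclosure takes place.
    """
    if mandatory_disclosure:
        # Table 11.3: anything P knows is passed on to D.
        return {
            (True, True): pi,
            (False, True): 0,
            (True, False): delta * (1 - pi),
            (False, False): (1 - delta) * (1 - pi),
        }
    # Table 11.2: D's and P's information are independent.
    return {
        (True, True): delta * pi,
        (False, True): (1 - delta) * pi,
        (True, False): delta * (1 - pi),
        (False, False): (1 - delta) * (1 - pi),
    }


def prob_d_informed(table):
    """Marginal probability that D knows the evidence."""
    return sum(p for (d_knows, _), p in table.items() if d_knows)


# Illustrative values only (assumed, not taken from the text).
delta, pi = Fraction(2, 5), Fraction(1, 2)
without_md = info_structure(delta, pi)
with_md = info_structure(delta, pi, mandatory_disclosure=True)

print(prob_d_informed(without_md))  # 2/5, i.e. delta
print(prob_d_informed(with_md))     # 7/10, i.e. pi + delta * (1 - pi)
```

Exact fractions are used so that the comparison with the closed-form expression is not blurred by floating-point rounding.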
Right to silence. Let RTS denote the right to silence. RTS alters the time structure of the model too: if D remained silent at time 3, the judge instructs J of D’s RTS at the start of time 4 (that is, right before J renders its verdict). At the start of time 3, just before D makes its case, J has a posterior over D’s innocence, given all information available to date (including P’s case). RTS means that this posterior should continue to govern J’s decision-making upon observing silence from D, that is, no updating should occur upon observing D’s silence. In the model without MD, for example,
\[
P[I_\tau \mid \text{P and D chose ``s''}] = \frac{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)\}}{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)\} + P[G_\tau]\{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau](1-\pi)\}} \tag{2}
\]
and
\[
P[I_\tau \mid \text{P chose ``s''}] = \frac{P[I_\tau]\{1-\pi + P[I_\varepsilon \mid I_\tau]\pi\}}{P[I_\tau]\{1-\pi + P[I_\varepsilon \mid I_\tau]\pi\} + P[G_\tau]\{1-\pi + P[I_\varepsilon \mid G_\tau]\pi\}}. \tag{3}
\]
Without RTS, J’s decision-making given silence is governed by P[Iτ | P and D chose “s”], but with RTS, it should be governed by P[Iτ | P chose “s”].
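To see how the two posteriors differ, the following sketch evaluates equations (2) and (3) for one set of parameter values. The function name, argument names, and numbers are assumptions chosen for illustration; only the formulas come from the text.

```python
from fractions import Fraction


def posterior_innocent_given_silence(prior_g, p_ie_i, p_ie_g, delta, pi, rts):
    """P[I_tau | silence] in the model without mandatory disclosure.

    prior_g : P[G_tau], the common prior that D is guilty
    p_ie_i  : P[I_eps | I_tau]
    p_ie_g  : P[I_eps | G_tau]
    rts     : True conditions only on P's silence, as in equation (3);
              False conditions on silence from both P and D, equation (2).
    """
    prior_i = 1 - prior_g
    p_ge_i, p_ge_g = 1 - p_ie_i, 1 - p_ie_g
    if rts:
        # P is silent if uninformed, or informed of exculpatory evidence.
        like_i = 1 - pi + p_ie_i * pi
        like_g = 1 - pi + p_ie_g * pi
    else:
        # Both silent: D uninformed of I_eps, or P uninformed of G_eps.
        like_i = p_ie_i * (1 - delta) + p_ge_i * (1 - pi)
        like_g = p_ie_g * (1 - delta) + p_ge_g * (1 - pi)
    return prior_i * like_i / (prior_i * like_i + prior_g * like_g)


# Illustrative parameter values (assumed, not from the text).
params = dict(prior_g=Fraction(1, 2), p_ie_i=Fraction(4, 5),
              p_ie_g=Fraction(1, 5), delta=Fraction(3, 4), pi=Fraction(1, 2))

no_rts = posterior_innocent_given_silence(rts=False, **params)
with_rts = posterior_innocent_given_silence(rts=True, **params)
print(no_rts)    # 2/5
print(with_rts)  # 3/5
```

With these values, silence from D lowers J's belief in innocence from 3/5 to 2/5 when the inference is allowed, which is the adverse inference that RTS blocks.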
4. Equilibrium
Let Γ00 denote the model without MD and without RTS, Γ10 the model with MD and without RTS, Γ01 the model without MD and with RTS, and Γ11 the model with MD and with RTS. Let us characterize equilibrium behavior in each of these models. If J is so biased that it attaches a utility level of − ∞ to a wrongful conviction, then the evidence against D could not lead J to convict, no matter how accurate it is. In this case, the evidence is not pivotal to J’s verdict.
Definition 1. Evidence is pivotal if J’s optimal action is to convict if it knows Gε and to acquit if it knows Iε.
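Lemma 1. J’s optimal action is to convict if it knows Gε and acquit if it knows Iε if and only if UJ ∈ [η̲, η̄], where
\[
\underline{\eta} = 1 - \frac{P[G_\tau]\,P[G_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[G_\varepsilon \mid I_\tau]} \quad\text{and}\quad \bar{\eta} = 1 - \frac{P[G_\tau]\,P[I_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[I_\varepsilon \mid I_\tau]}.
\]
Proof. Proofs of the main results are in the Appendix.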
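The bounds in Lemma 1 follow from comparing J's expected utility from convicting with its expected utility from acquitting. The sketch below is an illustration under assumed parameter values, with hypothetical function names. It computes the two bounds and checks J's best verdict for one jury inside the pivotal range and one below it.

```python
from fractions import Fraction


def pivotal_bounds(prior_g, p_ie_i, p_ie_g):
    """Lower and upper bounds on U_J for which evidence is pivotal."""
    prior_i = 1 - prior_g
    p_ge_i, p_ge_g = 1 - p_ie_i, 1 - p_ie_g
    lower = 1 - (prior_g * p_ge_g) / (prior_i * p_ge_i)
    upper = 1 - (prior_g * p_ie_g) / (prior_i * p_ie_i)
    return lower, upper


def posterior_guilty(prior_g, like_g, like_i):
    """Bayes' rule: P[G_tau | signal] from the signal's likelihoods."""
    prior_i = 1 - prior_g
    return prior_g * like_g / (prior_g * like_g + prior_i * like_i)


def verdict(u_j, p_guilty):
    """J's best verdict; a tie is resolved in favor of acquittal."""
    convict = p_guilty * 1 + (1 - p_guilty) * u_j  # right: 1, wrongful: u_j
    acquit = (1 - p_guilty) * 1 + p_guilty * 0     # right: 1, wrongful: 0
    return "convict" if convict > acquit else "acquit"


# Illustrative parameter values (assumed, not from the text).
prior_g, p_ie_i, p_ie_g = Fraction(1, 2), Fraction(4, 5), Fraction(1, 5)
lower, upper = pivotal_bounds(prior_g, p_ie_i, p_ie_g)
print(lower, upper)  # -3 3/4

p_g_given_ge = posterior_guilty(prior_g, 1 - p_ie_g, 1 - p_ie_i)
p_g_given_ie = posterior_guilty(prior_g, p_ie_g, p_ie_i)

print(verdict(Fraction(-1), p_g_given_ge))  # convict
print(verdict(Fraction(-1), p_g_given_ie))  # acquit
print(verdict(Fraction(-4), p_g_given_ge))  # acquit
```

A jury with UJ = −4 lies below the lower bound, so even incriminating evidence does not lead it to convict, which is the non-pivotal case described before Definition 1.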
The first proposition identifies a perfect Bayesian equilibrium of each of the four adversarial models, assuming that evidence is pivotal, that is, UJ ∈ [η̲, η̄].
Proposition 1. If UJ ∈ [η̲, η̄], the following strategy vector is a perfect Bayesian equilibrium and the only one surviving iterative deletion of weakly dominated strategies: P and D reveal the evidence if and only if it is in their favor (otherwise they remain silent), J acquits if it hears evidence in favor of D, convicts if it hears evidence against D, and if it hears nothing but silence, it acquits if and only if UJ ≤ ηij in Γij for all i, j ∈ {0,1}, where
\[
\eta_{00} = 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau](1-\pi)}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)},
\]
\[
\eta_{10} = 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]},
\]
\[
\eta_{01} = \eta_{11} = 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{1-\pi + P[I_\varepsilon \mid G_\tau]\pi}{1-\pi + P[I_\varepsilon \mid I_\tau]\pi},
\]
and η10 < η00 < η01 = η11.
The parameter ηij is the cutoff, below which D is acquitted, and above which D is convicted, given silence. For example, in Γ00, if δ = 1, the cutoff reduces to
\[
1 - \frac{P[G_\tau]\,P[G_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[G_\varepsilon \mid I_\tau]}, \tag{4}
\]
which is η̲, the lower bound of the set of juries for whom evidence is pivotal. Thus, if D is completely informed, J always convicts given silence. If J knows that D is perfectly informed and D has remained silent, J knows that the evidence is against D, and since evidence is pivotal, J convicts. If π = 1, the cutoff reduces to
\[
1 - \frac{P[G_\tau]\,P[I_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[I_\varepsilon \mid I_\tau]}, \tag{5}
\]
which is $\bar{\eta}$, the upper bound of the set of juries for whom evidence is pivotal. If P is perfectly informed, J always acquits given silence. If J knows that P is perfectly informed and P has remained silent, J knows the evidence is against P, and since evidence is pivotal, J acquits.

In the adversarial context, J draws an inference against D if D is silent. RTS blocks J’s adverse inference from D’s silence. Naturally, this raises the cutoff, above which J convicts, and below which J acquits, given silence. Thus, η00 < η01. In contrast, MD makes J’s adverse inference from D’s silence more adverse. With MD, J is more likely to believe that D is silent because D knows the evidence and it is not in D’s favor, than because D does not know the evidence, since MD increases the probability that D knows the evidence. This lowers the cutoff. Thus, η10 < η00. But with RTS, J’s cutoff, above which it convicts, and below which it acquits, given silence, is the same with or without MD. Intuitively, MD cannot make J’s adverse inference from silence more adverse with RTS, because RTS blocks the adverse inference in the first place. Thus, η01 = η11.

The next proposition identifies the conviction probability in each of the four models.

Proposition 2. For all j ∈ {0,1}, the conviction probability in Γ0j is

$$P[G_\nu]^{0j} = \begin{cases} \pi P[G_\varepsilon] & \text{for all } U_J \in [\underline{\eta}, \eta^{0j}], \\ P[G_\varepsilon] + (1-\delta)P[I_\varepsilon] & \text{for all } U_J \in (\eta^{0j}, \bar{\eta}], \end{cases}$$

and the conviction probability in Γ1j is

$$P[G_\nu]^{1j} = \begin{cases} \pi P[G_\varepsilon] & \text{for all } U_J \in [\underline{\eta}, \eta^{1j}], \\ P[G_\varepsilon] + (1-\pi)(1-\delta)P[I_\varepsilon] & \text{for all } U_J \in (\eta^{1j}, \bar{\eta}]. \end{cases}$$
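As a numerical illustration of the cutoffs in Proposition 1, the following minimal sketch evaluates them at one parameter point. The function name, argument names, and parameter values are hypothetical; δ and π are read as the probabilities that D and P, respectively, know the evidence, and the formulas are taken from the appendix expressions (A7), (A10), (A12), and (A14).

```python
import math


def silence_cutoffs(p_guilty, acc_guilty, acc_innocent, delta, pi):
    """Return the pivotal bounds and the four silence cutoffs as a dict.

    p_guilty     -- P[G_tau], prior probability that D is guilty
    acc_guilty   -- P[G_eps | G_tau]
    acc_innocent -- P[I_eps | I_tau]
    delta        -- probability that D knows the evidence
    pi           -- probability that P knows the evidence
    """
    odds = p_guilty / (1.0 - p_guilty)  # P[G_tau] / P[I_tau]
    ie_g, ge_g = 1.0 - acc_guilty, acc_guilty      # P[I_eps|G_tau], P[G_eps|G_tau]
    ie_i, ge_i = acc_innocent, 1.0 - acc_innocent  # P[I_eps|I_tau], P[G_eps|I_tau]

    eta00 = 1 - odds * (ie_g * (1 - delta) + ge_g * (1 - pi)) / (
        ie_i * (1 - delta) + ge_i * (1 - pi))
    eta10 = 1 - odds * (ie_g * (1 - delta) + ge_g) / (ie_i * (1 - delta) + ge_i)
    eta01 = 1 - odds * (1 - pi + ie_g * pi) / (1 - pi + ie_i * pi)
    return {
        "lower": 1 - odds * ge_g / ge_i,  # lower bound of the pivotal set
        "upper": 1 - odds * ie_g / ie_i,  # upper bound of the pivotal set
        "eta00": eta00,
        "eta10": eta10,
        "eta01": eta01,
        "eta11": eta01,  # with RTS the cutoff does not depend on MD
    }


cutoffs = silence_cutoffs(0.5, 0.8, 0.8, delta=0.5, pi=0.5)
for name, value in cutoffs.items():
    print(f"{name}: {value:.4f}")
```

At these illustrative values the cutoffs come out ordered as the proposition states, and setting `delta=1` or `pi=1` collapses the Γ00 cutoff to the lower or upper pivotal bound, matching (4) and (5).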
From Proposition 2, we can analyze the separate, interactive, and combined effects of MD and RTS on the conviction probability. Let us consider these effects in turn.

Corollary 1. Without RTS, MD increases the conviction probability by (1 − π){(1 − δ)P[Iε] + P[Gε]} for all $U_J \in [\eta^{10}, \eta^{00}]$, leaves it unchanged for all $U_J \in [\underline{\eta}, \eta^{10})$, and reduces it by π(1 − δ)P[Iε] for all $U_J \in (\eta^{00}, \bar{\eta}]$.

Without RTS, the effect of MD on the conviction probability is ambiguous. From Proposition 1, we know that MD makes J’s adverse inference from D’s silence more adverse, which tends to increase the conviction rate (a strategic effect). But MD also increases the probability that D has exculpatory evidence to present, which tends to decrease the conviction rate (a direct effect). A J drawn from $(\eta^{00}, \bar{\eta}]$ dislikes a wrongful conviction so little that its equilibrium action given silence is to convict with or without MD (no strategic effect); but MD nevertheless reduces the probability that such a J convicts because it increases the probability that D has exculpatory evidence to present (direct effect).
A J drawn from $[\underline{\eta}, \eta^{10})$ so dislikes a wrongful conviction that its equilibrium action given silence is to acquit with or without MD (no strategic effect); and although MD increases the probability that D has exculpatory evidence to present, this cannot reduce the probability that J convicts either, because J is already acquitting given silence (no direct effect). In this case, MD does not affect the conviction rate. Only if J is drawn from $[\eta^{10}, \eta^{00}]$ does MD affect the verdict given silence. This type of J would acquit without MD, but convict with MD. Intuitively, without MD, if J hears only silence, it might infer that the evidence is in D’s favor but only P knows it. With MD, if J hears only silence, it is less likely to draw such an inference: now J knows that D is better informed, and hence J’s adverse inference in the face of silence becomes more adverse (strategic effect). And although MD also tends to reduce the conviction rate because it increases the probability that D has exculpatory evidence to present (direct effect), the strategic effect dominates, so that MD results in a higher conviction rate.

While the effect of MD on the conviction probability is ambiguous without RTS, it is unambiguous with RTS.

Corollary 2. With RTS, MD leaves the conviction probability unchanged for all $U_J \in [\underline{\eta}, \eta^{11}]$ and reduces it by π(1 − δ)P[Iε] for all $U_J \in (\eta^{11}, \bar{\eta}]$.

Corollary 2 reveals that MD generally reduces the conviction probability in the presence of RTS. Intuitively, MD cannot make J’s adverse inference from silence more adverse if RTS does not allow an adverse inference from silence in the first place. Thus, with RTS, MD only increases the probability that D has exculpatory evidence to present, and hence either reduces the conviction probability or leaves it unchanged.

While the effect of MD is ambiguous without RTS but unambiguous with RTS, the effect of RTS is unambiguous with or without MD.

Corollary 3.
Without MD, RTS reduces the conviction probability by (1 − δ)P[Iε] + (1 − π)P[Gε] for all $U_J \in [\eta^{00}, \eta^{11}]$ and leaves it unchanged for all $U_J \in [\underline{\eta}, \eta^{00}) \cup (\eta^{11}, \bar{\eta}]$.

RTS reduces the conviction probability because it blocks J’s inference from D’s silence, an inference that is adverse to D in the model’s adversarial context.

Corollary 4. With MD, RTS reduces the conviction probability by (1 − π){(1 − δ)P[Iε] + P[Gε]} for all $U_J \in [\eta^{10}, \eta^{11}]$ and leaves it unchanged for all $U_J \in [\underline{\eta}, \eta^{10}) \cup (\eta^{11}, \bar{\eta}]$.

Corollaries 3 and 4 indicate that RTS generally reduces the conviction probability with or without MD. By reducing the conviction probability, RTS
helps innocent defendants by reducing wrongful convictions, but also helps guilty defendants by increasing wrongful acquittals. RTS protects D if D (and hence P, without MD) does not know the evidence, or if D knows the evidence, it is not in D’s favor, but P does not know it. In the former case, D must remain silent; in the latter case, D chooses to remain silent. In either case, if D is guilty, D is rightfully convicted without RTS, but wrongfully acquitted with it. If D is innocent, D is wrongfully convicted without RTS, but rightfully acquitted with it. Thus, RTS protects the innocent by reducing the probability of wrongful conviction, but also protects the guilty by increasing the probability of wrongful acquittal.

Comparing Corollaries 3 and 4 reveals that RTS reduces the conviction probability (and reduces the probability of wrongful conviction, and increases the probability of wrongful acquittal) for a larger range of juries (since $[\eta^{00}, \eta^{11}] \subset [\eta^{10}, \eta^{11}]$), but by a smaller measure for any jury in that range (since (1 − δ)P[Iε] > (1 − π)(1 − δ)P[Iε]), with MD than without it. On one hand, MD directly reduces the conviction probability by increasing the probability that D knows exculpatory evidence. Thus, RTS cannot reduce the conviction probability by as great a measure if MD has already directly made it low. But on the other hand, MD also increases the conviction probability by making J’s inference from D’s silence more adverse. Thus, RTS can reduce the conviction probability by a greater measure if it can cancel out this strategic effect of MD.

Corollary 5. MD combined with RTS (the Fifth Amendment) reduces the conviction probability by (1 − δ)P[Iε] + (1 − π)P[Gε] for all $U_J \in [\eta^{00}, \eta^{11}]$, leaves it unchanged for all $U_J \in [\underline{\eta}, \eta^{00})$, and reduces it by π(1 − δ)P[Iε] for all $U_J \in (\eta^{11}, \bar{\eta}]$.
Comparing Corollaries 3 and 5 reveals that the Fifth Amendment reduces the conviction probability for a larger range of juries (since $[\eta^{00}, \eta^{11}] \subset [\eta^{10}, \bar{\eta}]$), and by a greater or equal measure for any jury in this range, than RTS alone. By blocking J’s adverse inference, RTS reduces the conviction probability. Adding MD to RTS reduces the conviction probability even more, since MD also directly reduces the conviction probability by increasing the probability that D knows exculpatory evidence. And although MD also indirectly tends to increase the conviction probability by making J’s adverse inference from silence more adverse, this effect is neutralized by the presence of RTS. Thus, the Fifth Amendment generally reduces the conviction probability more than RTS alone.
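The magnitudes quoted in Corollaries 1 to 5 can be checked by differencing the two branches of Proposition 2. The sketch below is only an illustration: the function name and the numerical values are hypothetical, and whether J convicts on silence is passed in as a flag instead of being derived from U_J and the cutoffs.

```python
import math


def conviction_probability(md, convicts_on_silence, p_ge, delta, pi):
    """Conviction probability from Proposition 2.

    md                  -- True if mandatory disclosure is in force
    convicts_on_silence -- True if J's type lies above the relevant silence cutoff
    p_ge                -- P[G_eps], probability that the evidence indicates guilt
    """
    p_ie = 1.0 - p_ge
    if not convicts_on_silence:
        # J convicts only when P knows, and so presents, inculpatory evidence.
        return pi * p_ge
    # Share of exculpatory-evidence cases in which D cannot present the evidence.
    unpresented = (1 - delta) * ((1 - pi) if md else 1.0)
    return p_ge + unpresented * p_ie


P_GE, DELTA, PI = 0.5, 0.5, 0.5  # illustrative values only
P_IE = 1 - P_GE
args = (P_GE, DELTA, PI)

# Corollary 1: MD without RTS, for juries between eta^10 and eta^00 (an increase).
cor1_mid = conviction_probability(True, True, *args) - conviction_probability(False, False, *args)
# Corollary 1: MD without RTS, for juries above eta^00 (a reduction).
cor1_top = conviction_probability(False, True, *args) - conviction_probability(True, True, *args)
# Corollary 3: RTS without MD, for juries between eta^00 and eta^11 (a reduction).
cor3 = conviction_probability(False, True, *args) - conviction_probability(False, False, *args)
# Corollary 4: RTS with MD, for juries between eta^10 and eta^11 (a reduction).
cor4 = conviction_probability(True, True, *args) - conviction_probability(True, False, *args)

print(cor1_mid, cor1_top, cor3, cor4)
```

Each difference agrees with the closed-form expression stated in the corresponding corollary at these values.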
5. Social welfare

To study the effect of RTS on social welfare, society’s preferences are now measured in terms of the two types of court error. Social preference, US, can differ from jury preferences, UJ, but evidence is pivotal for society and juries
alike, that is, $U_J, U_S \in [\underline{\eta}, \bar{\eta}]$. Denote equilibrium welfare in Γij by Wij for all i, j ∈ {0, 1}. The following proposition identifies equilibrium welfare in each of the four adversarial models.

Proposition 3. For all j ∈ {0, 1}, social welfare in Γ0j is

$$W^{0j} = \begin{cases} P[I_\tau]\{P[I_\varepsilon \mid I_\tau] + P[G_\varepsilon \mid I_\tau](\pi U_S + 1 - \pi)\} + P[G_\tau]\,P[G_\varepsilon \mid G_\tau]\,\pi & \text{for all } U_J \in [\underline{\eta}, \eta^{0j}], \\ P[I_\tau]\{P[I_\varepsilon \mid I_\tau](\delta + (1-\delta)U_S) + P[G_\varepsilon \mid I_\tau]U_S\} + P[G_\tau]\{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]\} & \text{for all } U_J \in (\eta^{0j}, \bar{\eta}], \end{cases}$$

and welfare in Γ1j is

$$W^{1j} = \begin{cases} P[I_\tau]\{P[I_\varepsilon \mid I_\tau] + P[G_\varepsilon \mid I_\tau](\pi U_S + 1 - \pi)\} + P[G_\tau]\,P[G_\varepsilon \mid G_\tau]\,\pi & \text{for all } U_J \in [\underline{\eta}, \eta^{1j}], \\ P[I_\tau]\{P[I_\varepsilon \mid I_\tau](\pi + (1-\pi)(\delta + (1-\delta)U_S)) + P[G_\varepsilon \mid I_\tau]U_S\} + P[G_\tau]\{P[I_\varepsilon \mid G_\tau](1-\delta)(1-\pi) + P[G_\varepsilon \mid G_\tau]\} & \text{for all } U_J \in (\eta^{1j}, \bar{\eta}]. \end{cases}$$

From Proposition 3, we can analyze the separate, interactive, and combined effects of MD and RTS on welfare. Let us also consider these effects in turn.

Corollary 6. Without RTS, MD increases welfare if and only if US > η10 for all $U_J \in [\eta^{10}, \eta^{00}]$, leaves it unchanged for all $U_J \in [\underline{\eta}, \eta^{10})$, and increases it if and only if $U_S \le \bar{\eta}$ for all $U_J \in (\eta^{00}, \bar{\eta}]$.

Since evidence is pivotal for society too, without RTS, MD always increases welfare for all $U_J \in (\eta^{00}, \bar{\eta}]$. Therefore, MD’s overall effect on welfare is ambiguous without RTS. But it is unambiguous with RTS, much like its effect on the conviction probability.

Corollary 7. With RTS, MD does not affect welfare for all $U_J \in [\underline{\eta}, \eta^{11})$, and increases it if and only if $U_S \le \bar{\eta}$, that is, always increases it, for all $U_J \in (\eta^{11}, \bar{\eta}]$.

According to Corollary 7, MD generally increases welfare with RTS. With RTS, MD only increases the probability that D is informed of the evidence. If this evidence is exculpatory, D will present it and J will acquit since the evidence is pivotal. With MD, P cannot suppress the exculpatory evidence. Thus, MD only makes exculpatory evidence more likely to come out, which cannot harm society as long as evidence is pivotal for society as well. RTS prevents some juries from convicting given silence. With these juries, increasing the likelihood that D knows exculpatory evidence does not affect
welfare. Even if P suppresses exculpatory evidence, and D does not learn about it and hence must be silent, D is nevertheless acquitted, as long as RTS is in effect. But other juries would convict given silence even with RTS. For these juries, increasing the likelihood that D knows exculpatory evidence actually does society good. Without MD, if P suppresses exculpatory evidence, and D does not learn about it and hence must be silent, then D is convicted, even with RTS. Because evidence is more often right than wrong, many of these convictions are wrongful. But with MD, if P knows exculpatory evidence, D learns and presents it, and hence is acquitted since evidence is pivotal. With these juries, MD strictly increases welfare with RTS.

Without RTS, MD also makes J’s adverse inference from D’s silence more adverse, which may harm a society that sufficiently dislikes wrongful convictions. Thus, without RTS, MD need not increase welfare. But this does not imply that RTS necessarily increases the welfare efficiency of MD. For example, if society tolerates wrongful convictions sufficiently that MD increases welfare for all $U_J \in [\eta^{10}, \eta^{00}]$ without RTS, then MD increases welfare less with RTS than without it.

Let us now turn to the welfare-effects of RTS with and without MD.

Corollary 8. Without MD, RTS does not affect welfare for all $U_J \in [\underline{\eta}, \eta^{00}) \cup (\eta^{01}, \bar{\eta}]$, and increases it if and only if US ≤ η00 for all $U_J \in [\eta^{00}, \eta^{01}]$.

Corollary 8 reveals that a necessary condition for RTS to improve welfare is that US < UJ. That is, RTS can only improve welfare if J’s preferences are biased against D relative to social preferences. If US = UJ, society’s problem (whether or not to implement RTS) and J’s problem (whether or not to convict given silence) exactly coincide, so the first best outcome is achieved without RTS. If US > UJ, J incurs more disutility from a wrongful conviction than society does.
In this case, RTS cannot improve welfare either since RTS reduces the probability of wrongful conviction. Hence, there must be jury discrimination against defendants for the right to silence to improve welfare.

Corollary 9. With MD, RTS does not affect welfare for all $U_J \in [\underline{\eta}, \eta^{10}) \cup (\eta^{11}, \bar{\eta}]$, and increases it if and only if US ≤ η10 for all $U_J \in [\eta^{10}, \eta^{11}]$.

Corollaries 8 and 9 imply that MD reduces the welfare efficiency of RTS. Recall that η10 < η00 from Proposition 1. Consider the three regions of parameter space, (1) US < η10 < η00, (2) η10 < US < η00, and (3) η10 < η00 < US. The extent to which US is smaller (greater) than η10 (η00) determines the extent to which RTS increases (decreases) welfare in the model with MD (without MD). Thus, in region (1), RTS increases welfare, but less with MD than without it. RTS changes J’s verdict given silence from Gν to Iν for all juries in $[\eta^{10}, \eta^{11}]$, and this change is beneficial to society with or without MD, because in this region society’s disutility from wrongful conviction is relatively large. Without MD, if D remains silent, society cannot rule out the
possibility that D does not know the evidence, nor the possibility that the evidence misleadingly indicates that D is guilty. But with MD, if D remains silent, society can rule out the former possibility, leaving only the latter. Thus, society benefits more (in expected terms) from the verdict change given silence from Gν to Iν without MD than with it. In region (2), RTS reduces welfare with MD, but increases welfare without it. In this region, society’s disutility from wrongful conviction is sufficiently high that, if it cannot rule out the possibility that D does not know the evidence, it benefits from the verdict change from Gν to Iν, but its disutility from wrongful conviction is also sufficiently low that, if it can rule out this possibility, it is harmed by the change in verdict occasioned by RTS. In region (3), RTS reduces welfare, but more with MD than without it. The verdict change due to RTS is harmful to society with or without MD, because in this region society’s disutility from wrongful conviction is relatively small. But the verdict change is more harmful to society with MD, because with MD, society can eliminate the possibility that D does not know the evidence if D remained silent, leaving only the possibility that the evidence wrongfully indicates that D is guilty. Thus RTS either increases welfare less or reduces welfare more with MD than without it. In other words, MD reduces the welfare efficiency of RTS.

Corollary 10. MD combined with RTS (the Fifth Amendment) does not affect welfare for all $U_J \in [\underline{\eta}, \eta^{00})$, increases it if and only if US ≤ η00 for all $U_J \in [\eta^{00}, \eta^{01}]$, and increases it if and only if $U_S \le \bar{\eta}$, that is, always increases it, for all $U_J \in (\eta^{11}, \bar{\eta}]$.

Comparing Corollaries 8 and 10, we find that the Fifth Amendment improves welfare for a larger range of parameters than does RTS on its own. Recall that MD always increases welfare in the presence of RTS.
Therefore, adding MD and RTS naturally has a better effect on welfare than adding RTS alone.

Having derived the interactive and combined welfare-effects of RTS and MD, let us now search for the most efficient mechanism among Γ00, Γ10, Γ01, and Γ11. The following proposition identifies the necessary and sufficient condition for Γ11 (the Fifth Amendment) to be the most efficient mechanism in this constrained set.

Proposition 4. If US ≤ η10, then Γ11 is the most efficient mechanism. If US > η10, then Γ10 is the most efficient mechanism.

Some juries would acquit given silence regardless of the legal regime. With these juries, all regimes yield equal welfare. Even MD does not affect welfare. MD does not reduce the conviction probability even though it increases the probability that D presents exculpatory evidence, because these juries acquit even if D remains silent rather than presenting the exculpatory evidence.
Other juries would convict given silence regardless of the legal regime. With these juries, only MD affects welfare. In particular, MD increases welfare with or without RTS, since with these juries, MD only increases the probability that exculpatory evidence comes out, which improves welfare as long as evidence is pivotal for society. With these juries, any regime with MD is optimal.

With yet other juries, the verdict given silence can only be affected by MD alone, that is, MD without RTS (recall that with RTS, MD cannot affect the verdict given silence). With these juries, acquittal would follow silence under any legal regime other than MD alone. But with MD alone, conviction would follow silence. This harms society if and only if society sufficiently dislikes wrongful convictions. Thus, with these juries, all regimes other than MD alone yield equal welfare, and MD alone yields lower welfare if and only if society sufficiently dislikes wrongful convictions.

The verdict given silence of the remaining jury types is only affected by either RTS alone or RTS with MD. With these juries, conviction follows silence under any other regime. But acquittal follows silence with RTS alone or RTS with MD. Welfare is always higher with MD alone than without MD or RTS, since with these juries, MD has no strategic effect (the juries always convict given silence with or without MD), and hence only increases the probability that exculpatory evidence comes out, which increases welfare. Moreover, welfare is higher with RTS alone or RTS with MD than it is with MD alone if and only if society sufficiently dislikes wrongful convictions.

Therefore, only MD alone or MD with RTS can be optimal across all parameter ranges. Hence, in general, either the Fifth Amendment or MD on its own is the most efficient mechanism studied. If society sufficiently dislikes wrongful convictions, then the most efficient mechanism is the Fifth Amendment, otherwise it is MD alone.
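The regime comparison described above can be illustrated by evaluating the welfare expressions of Proposition 3 for one jury type. This is a minimal sketch under stated assumptions, not the paper's own computation: the function names and numbers are hypothetical, and the acquittal branch is taken to weight the term (πU_S + 1 − π) by P[G_ε | I_τ], which is the reading under which the welfare comparisons in the corollaries go through.

```python
import math


def silence_cutoff(md, rts, pg, acc_g, acc_i, delta, pi):
    """Cutoff on U_J below which J acquits given silence (Proposition 1)."""
    odds = pg / (1 - pg)
    ie_g, ge_g, ie_i, ge_i = 1 - acc_g, acc_g, acc_i, 1 - acc_i
    if rts:    # one cutoff for both regimes with RTS
        ratio = (1 - pi + ie_g * pi) / (1 - pi + ie_i * pi)
    elif md:   # MD alone
        ratio = (ie_g * (1 - delta) + ge_g) / (ie_i * (1 - delta) + ge_i)
    else:      # neither MD nor RTS
        ratio = (ie_g * (1 - delta) + ge_g * (1 - pi)) / (
            ie_i * (1 - delta) + ge_i * (1 - pi))
    return 1 - odds * ratio


def welfare(md, rts, u_j, u_s, pg, acc_g, acc_i, delta, pi):
    """Equilibrium welfare from Proposition 3 (see the assumption in the lead-in)."""
    ie_g, ge_g, ie_i, ge_i = 1 - acc_g, acc_g, acc_i, 1 - acc_i
    if u_j <= silence_cutoff(md, rts, pg, acc_g, acc_i, delta, pi):
        # J acquits on silence: conviction only if P presents inculpatory evidence.
        return (1 - pg) * (ie_i + ge_i * (pi * u_s + 1 - pi)) + pg * ge_g * pi
    # J convicts on silence: `miss` is the chance exculpatory evidence stays hidden.
    miss = (1 - delta) * ((1 - pi) if md else 1.0)
    innocent = ie_i * ((1 - miss) + miss * u_s) + ge_i * u_s
    guilty = ie_g * miss + ge_g
    return (1 - pg) * innocent + pg * guilty


PARAMS = dict(pg=0.5, acc_g=0.8, acc_i=0.8, delta=0.5, pi=0.5)  # illustrative
U_J = 0.2  # a jury that convicts on silence without RTS and acquits with it
threshold = silence_cutoff(True, False, **PARAMS)  # right-hand side of (6)


def best_regimes(u_s):
    """Return the set of (md, rts) pairs that maximize welfare for this jury."""
    scores = {(md, rts): welfare(md, rts, U_J, u_s, **PARAMS)
              for md in (False, True) for rts in (False, True)}
    top = max(scores.values())
    return {regime for regime, w in scores.items() if math.isclose(w, top)}


print(threshold, best_regimes(-1.0), best_regimes(0.5))
```

For this jury type, a social preference below the threshold puts the regime with both MD and RTS among the welfare maximizers, while a social preference above it selects MD alone.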
Whether RTS is part of the efficient mechanism depends on the extent of jury discrimination (that is, the extent to which US is smaller than UJ), and on the model’s other parameters. Proposition 4 states that RTS is part of the efficient mechanism if and only if

$$U_S \le \eta^{10} = 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]}. \tag{6}$$
The prior probability that D is truly guilty (P[Gτ ]) is connected to the reputations of the police and the judiciary. If the police and the judiciary have a reputation for being corrupt or biased, then J is more likely to believe ex ante that D is innocent, so that P[Gτ ] will tend to be lower. On the other hand, the prior probabilities that correlate the evidence with the truth (P[Gε | Gτ ] and P[Iε | Iτ ]) are linked to the strength or accuracy of the evidence, which may vary from case to case. RTS is part of the efficient mechanism for a larger
parameter range the weaker is the evidence and the worse are the reputations of the police and the judiciary. In weaker cases, the evidence might well indicate guilt even though D is innocent, and might well indicate innocence even though D is guilty. Moreover, D is more likely to be innocent the worse are the reputations of the police and the judiciary. Thus, more innocent defendants and fewer guilty defendants stand to benefit from RTS, and therefore it is more likely to improve welfare. This suggests that the prevalence of police discrimination (which may reduce police reputation) and jury discrimination (without which the right to silence cannot improve welfare) would be arguments to preserve or implement RTS, at least in cases for which evidence is less accurate.

King (1993) reviews the empirical evidence concerning the effect of race on jury decisions in the United States. The author reports the results of four independent studies in different parts of the country that indicate that the racial composition of juries does affect verdicts. For example, a 1984 study in Dade County, Florida found that juries with at least one black juror were less likely than all-white juries to convict black defendants. Donohue and Levitt (2001) present evidence that the racial composition of police forces in different U.S. cities affects the racial patterns of arrests in these cities. They find that increases in the number of white police officers increase the number of nonwhites arrested, but do not affect the number of whites arrested. The model suggests that RTS is more likely to improve social welfare in places where jury and police discrimination of this kind are widespread.
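The comparative statics of condition (6) can be illustrated numerically. The sketch below makes a simplifying assumption that is not in the text: evidence accuracy is symmetric, so that P[G_ε | G_τ] = P[I_ε | I_τ] = q. The function name and the values used are hypothetical.

```python
def rts_threshold(p_guilty, accuracy, delta):
    """Right-hand side of (6) under symmetric accuracy q (an assumption).

    RTS is part of the efficient mechanism when U_S is at or below this value,
    so a higher threshold means a larger parameter range in favor of RTS.
    """
    odds = p_guilty / (1 - p_guilty)  # P[G_tau] / P[I_tau]
    numerator = (1 - accuracy) * (1 - delta) + accuracy
    denominator = accuracy * (1 - delta) + (1 - accuracy)
    return 1 - odds * numerator / denominator


# Weaker evidence (q closer to 1/2) raises the threshold.
for q in (0.9, 0.75, 0.6):
    print(f"accuracy={q:.2f}  threshold={rts_threshold(0.5, q, 0.5):.3f}")

# A lower prior probability of guilt also raises the threshold.
for prior in (0.6, 0.5, 0.4):
    print(f"P[G]={prior:.1f}  threshold={rts_threshold(prior, 0.8, 0.5):.3f}")
```

Under these assumptions the threshold rises as accuracy falls toward 1/2 and as the prior probability of guilt falls, which is the direction of the argument in the text.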
6. Plea bargaining

The possibility of plea bargaining affects the composition of cases that go to trial. The cases that do not settle are the ones where there is asymmetric information, and this is more likely in cases for which evidence is less accurate (P[Gε | Gτ] and P[Iε | Iτ] are closer to 1/2). With MD, any informational asymmetry favors D. In this case, a reduction in evidence accuracy increases the parameter range for which RTS is efficient, as is apparent from inequality (6). Intuitively, MD rules out the possibility that P knows the evidence, the evidence is in D’s favor, but D does not know it. Then there is more weight on the possibility that D knows the evidence but it is not in D’s favor. But the less accurate is the evidence, the less this implies that D is actually guilty. So RTS, by blocking J’s adverse inference from silence, is expected to help fewer guilty defendants the less accurate is the evidence. Therefore, given MD, the possibility of plea bargaining, by selecting for trial cases for which evidence is less accurate, makes RTS efficient for a larger parameter range. This may be an argument to preserve RTS in legal systems that strongly encourage plea bargaining to reduce court costs.

Given the possibility of plea bargaining, RTS and MD can also affect the probability that cases go to trial. Reinganum (1988) develops a basic model of plea bargaining with prosecutorial discretion to dismiss cases and
negotiate guilty pleas to lesser crimes in exchange for reduced sentences. Her model assumes P has private information about case strength, which is only possible without MD at the plea-stage. The main results are that P dismisses cases that are sufficiently weak and offers D a sentence in exchange for a guilty plea in cases that are sufficiently strong, the sentence offered increases with the conviction probability at trial, and defendants are more likely to reject higher sentence offers. Thus, the trial probability increases with the strength of P’s case. RTS reduces the strength of P’s case. With RTS, P cannot count on J’s negative inference from D’s silence. This reduces the conviction probability at trial, as is shown in Corollary 3. Therefore, RTS should reduce the trial probability, at least in the absence of MD at the pre-trial stage. Moreover, the combination of RTS and MD was shown to reduce the conviction probability by an even greater extent than RTS alone. Thus, RTS combined with MD only at the trial stage should reduce the trial probability even more than RTS alone.
7. Conclusion

The paper’s main findings are the following. Mandatory disclosure by prosecutors has an ambiguous effect on the conviction probability without the right to silence, but reduces it with the right to silence. With or without mandatory disclosure, the right to silence reduces the conviction probability, increases the probability of wrongful acquittal, and reduces the probability of wrongful conviction. If social welfare is measured only in terms of the court errors, the right to silence can only improve welfare in cases where jury preferences are biased against defendants relative to social preferences. With the right to silence, mandatory disclosure always increases welfare. Mandatory disclosure reduces the welfare-efficiency of the right to silence. The right to silence combined with mandatory disclosure is more likely to increase welfare than is the right to silence alone. The most efficient of the mechanisms studied is either mandatory disclosure alone or combined with the right to silence. The latter is more likely to be the most efficient mechanism the worse is the reputation of the police and the weaker is the evidence.

These results may shed light on important policy questions, such as whether to preserve the right to silence in the U.S. now that it has been eliminated in England and Ireland, and whether and how to implement mandatory disclosure or the right to silence in developing countries. At the very least, the model demonstrates that the efficiency of the right to silence is intimately connected to the prevalence of jury and police discrimination, and that the efficiency of mandatory disclosure and that of the right to silence are intimately connected to each other.

Several of the theoretical predictions may also be empirically testable. For example, one could test whether the right to silence at trial has reduced plea
rates and conviction rates in the U.S. by examining the effect of the Supreme Court ruling in Carter v. Kentucky (January 14, 1981), which gave defendants the right to have the judge instruct the jury not to draw an adverse inference from a failure to testify. It might also be possible to test the prediction that the right to silence reduces wrongful convictions by examining cases for which guilty verdicts have been successfully appealed. If one found that the fraction of verdicts reversed on appeal decreased after Carter v. Kentucky, one would have support for the prediction that the right to silence reduces wrongful convictions.
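Appendix

Proofs of Lemma 1 and Propositions 1 and 4 follow.

Proof of Lemma 1. After learning Iε, if J chooses Gν, then J’s expected utility is

$$P[I_\tau \mid I_\varepsilon]\,U_J(G_\nu, I_\tau) + P[G_\tau \mid I_\varepsilon]\,U_J(G_\nu, G_\tau) = P[I_\tau \mid I_\varepsilon]\,U_J + P[G_\tau \mid I_\varepsilon]. \tag{A1}$$

If J chooses Iν, then J’s expected utility is

$$P[I_\tau \mid I_\varepsilon]\,U_J(I_\nu, I_\tau) + P[G_\tau \mid I_\varepsilon]\,U_J(I_\nu, G_\tau) = P[I_\tau \mid I_\varepsilon]. \tag{A2}$$

Therefore, J renders the verdict Iν (that is, the evidence is pivotal) if and only if

$$U_J \le 1 - \frac{P[G_\tau \mid I_\varepsilon]}{P[I_\tau \mid I_\varepsilon]} = 1 - \frac{P[G_\tau]\,P[I_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[I_\varepsilon \mid I_\tau]}. \tag{A3}$$

After learning Gε, if J chooses Gν, then J’s expected utility is

$$P[I_\tau \mid G_\varepsilon]\,U_J(G_\nu, I_\tau) + P[G_\tau \mid G_\varepsilon]\,U_J(G_\nu, G_\tau) = P[I_\tau \mid G_\varepsilon]\,U_J + P[G_\tau \mid G_\varepsilon]. \tag{A4}$$

If J chooses Iν, then J’s expected utility is

$$P[I_\tau \mid G_\varepsilon]\,U_J(I_\nu, I_\tau) + P[G_\tau \mid G_\varepsilon]\,U_J(I_\nu, G_\tau) = P[I_\tau \mid G_\varepsilon]. \tag{A5}$$

Therefore, J renders the verdict Gν (that is, the evidence is pivotal) if and only if

$$U_J \ge 1 - \frac{P[G_\tau \mid G_\varepsilon]}{P[I_\tau \mid G_\varepsilon]} = 1 - \frac{P[G_\tau]\,P[G_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[G_\varepsilon \mid I_\tau]}. \tag{A6}$$

In general, the evidence is pivotal if and only if $U_J \in [\underline{\eta}, \bar{\eta}]$, where

$$\underline{\eta} = 1 - \frac{P[G_\tau]\,P[G_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[G_\varepsilon \mid I_\tau]} \quad \text{and} \quad \bar{\eta} = 1 - \frac{P[G_\tau]\,P[I_\varepsilon \mid G_\tau]}{P[I_\tau]\,P[I_\varepsilon \mid I_\tau]}. \tag{A7}$$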
Q.E.D.

Proof of Proposition 1. Consider first Γ00. Given the candidate strategies for P and D, three of J’s information sets are reached with positive probability: “Iε,” “Gε,” “s.” Since evidence is pivotal, J renders verdict Gν if it hears “Gε” and verdict Iν if it hears “Iε”. Upon hearing nothing, J’s posterior is

$$P[I_\tau \mid P \text{ and } D \text{ chose “s”}] = \frac{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)\}}{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)\} + P[G_\tau]\{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau](1-\pi)\}}. \tag{A8}$$

J renders verdict Iν if and only if

$$P[G_\tau \mid \text{“s”}]\,U_J(I_\nu, G_\tau) + P[I_\tau \mid \text{“s”}]\,U_J(I_\nu, I_\tau) \ge P[G_\tau \mid \text{“s”}]\,U_J(G_\nu, G_\tau) + P[I_\tau \mid \text{“s”}]\,U_J(G_\nu, I_\tau). \tag{A9}$$

Substituting (A8) and the values of J’s utility function into (A9), we get that J renders verdict Iν when it hears silence if and only if

$$U_J \le 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau](1-\pi)}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)} = \eta^{00}. \tag{A10}$$

Whether or not (A10) is satisfied, P and D at least weakly prefer to present the evidence if and only if they know that it is in their favor. Thus, the candidate strategies for D and P are best responses to J’s persuasion rule. It is straightforward but tedious to show that iterative deletion of weakly dominated strategies eliminates all equilibria other than this one.

Next consider Γ10. In this game, J’s posterior given silence is

$$P[I_\tau \mid P \text{ and } D \text{ chose “s”}] = \frac{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]\}}{P[I_\tau]\{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]\} + P[G_\tau]\{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]\}}. \tag{A11}$$

Therefore, J renders verdict Iν when it hears silence if and only if

$$U_J \le 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]} = \eta^{10}. \tag{A12}$$
Now consider Γ01. In this game, J’s posterior following silence by both P and D is given by (A8). However, J’s posterior following only silence by P is given by

$$P[I_\tau \mid P \text{ chose “s”}] = \frac{P[I_\tau]\{1-\pi + P[I_\varepsilon \mid I_\tau]\pi\}}{P[I_\tau]\{1-\pi + P[I_\varepsilon \mid I_\tau]\pi\} + P[G_\tau]\{1-\pi + P[I_\varepsilon \mid G_\tau]\pi\}}. \tag{A13}$$

Without RTS, J’s decision-making is governed by P[Iτ | P and D chose “s”], but with RTS, it should be governed by P[Iτ | P chose “s”]. Therefore, in Γ01, J’s posterior following silence is (A13). Hence, J renders verdict Iν when it hears silence if and only if

$$U_J \le 1 - \frac{P[G_\tau]}{P[I_\tau]} \cdot \frac{1-\pi + P[I_\varepsilon \mid G_\tau]\pi}{1-\pi + P[I_\varepsilon \mid I_\tau]\pi} = \eta^{01}. \tag{A14}$$
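A similar calculation reveals that η01 = η11. Moreover,

$$\frac{1-\pi + P[I_\varepsilon \mid G_\tau]\pi}{1-\pi + P[I_\varepsilon \mid I_\tau]\pi} < \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau](1-\pi)}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau](1-\pi)} < \frac{P[I_\varepsilon \mid G_\tau](1-\delta) + P[G_\varepsilon \mid G_\tau]}{P[I_\varepsilon \mid I_\tau](1-\delta) + P[G_\varepsilon \mid I_\tau]},$$

since P[Iε | Iτ] > P[Gε | Iτ] and P[Iε | Gτ] < P[Gε | Gτ], because evidence is assumed to be more often right than wrong. Therefore, η10 < η00 < η01 = η11. Q.E.D.

Proof of Proposition 4. From Proposition 3, we see that for all $U_J \in [\underline{\eta}, \eta^{10})$, W00 = W01 = W10 = W11. For all $U_J \in [\eta^{11}, \bar{\eta}]$, W00 = W01 and W10 = W11. Moreover, W11 > W00 if and only if $U_S \le \bar{\eta}$, which is true by assumption. Therefore, for all $U_J \in [\eta^{11}, \bar{\eta}]$, W10 = W11 > W00 = W01. For all $U_J \in [\eta^{10}, \eta^{00}]$, W11 = W01 = W00. Moreover, W11 > W10 if and only if US ≤ η10. Lastly, for all $U_J \in [\eta^{00}, \eta^{11}]$, W11 = W01 and W10 > W00. Moreover, W11 > W10 > W00 if US ≤ η10, W10 > W11 > W00 if US ∈ [η10, η00], and W10 > W00 > W11 if US > η00. Therefore, if US ≤ η10, Γ11 is the most efficient mechanism, and if US > η10, Γ10 is the most efficient mechanism. Q.E.D.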
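The posterior in (A8) can be cross-checked by brute-force enumeration of the states of Γ00. The sketch below is illustrative only and rests on labeled assumptions: D and P learn the evidence independently with probabilities δ and π, each presents the evidence exactly when it knows it and it is favorable, and all names and numbers are hypothetical.

```python
import math
from itertools import product


def silence_posterior_enumerated(pg, acc_g, acc_i, delta, pi):
    """P[I_tau | both parties silent] in the game without MD or RTS, by enumeration."""
    silent_mass = {"I": 0.0, "G": 0.0}
    for truth, evidence, d_knows, p_knows in product("IG", "IG", (True, False), (True, False)):
        prior = pg if truth == "G" else 1 - pg
        accuracy = acc_g if truth == "G" else acc_i
        p_evidence = accuracy if evidence == truth else 1 - accuracy
        # Assumption: D's and P's information are independent draws.
        p_info = (delta if d_knows else 1 - delta) * (pi if p_knows else 1 - pi)
        d_presents = d_knows and evidence == "I"  # D shows only exculpatory evidence
        p_presents = p_knows and evidence == "G"  # P shows only inculpatory evidence
        if not d_presents and not p_presents:
            silent_mass[truth] += prior * p_evidence * p_info
    return silent_mass["I"] / (silent_mass["I"] + silent_mass["G"])


def silence_posterior_closed_form(pg, acc_g, acc_i, delta, pi):
    """Closed form (A8)."""
    innocent = (1 - pg) * (acc_i * (1 - delta) + (1 - acc_i) * (1 - pi))
    guilty = pg * ((1 - acc_g) * (1 - delta) + acc_g * (1 - pi))
    return innocent / (innocent + guilty)


args = dict(pg=0.3, acc_g=0.85, acc_i=0.7, delta=0.4, pi=0.6)  # illustrative
posterior = silence_posterior_enumerated(**args)
# J acquits on silence iff U_J <= 1 - P[G_tau | "s"] / P[I_tau | "s"], as in (A9)-(A10).
cutoff_from_posterior = 1 - (1 - posterior) / posterior
print(posterior, cutoff_from_posterior)
```

The enumerated posterior agrees with the closed form, and the implied cutoff matches the expression in (A10) at these values.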
Notes

* I thank Preston McAfee, Max Stinchcombe, Tom Wiseman, the Editor, Jennifer Reinganum, and two anonymous referees for extensive feedback. For comments, I also thank Don Fullerton, Dan Hamermesh, Ken Hendricks, Steve Levitt, Jonathan Montag, Cesar Martinelli, Sue Mialon, Gerald Oettinger, Paul Rubin, Chris Sanchirico, Daniel Seidmann, David Sibley, Steve Trejo, Joel Waldfogel, and seminar participants at Emory University, ITAM, the University of Texas, the University of Iowa, and the Wharton School of the University of Pennsylvania.
1 See, for example, Gunther and Sullivan, 1997, p. A-9.
2 For a Pulitzer prize-winning account of the history of the Fifth Amendment, from its origins in the Israel of biblical times, see Leonard Levy (1968).
3 See Bucke, Street, and Brown (2000) and Jackson, Wolfe, and Quinn (2000) for reports on the new legislation against the right to silence in England and Northern Ireland, respectively.
4 This seems to contradict a claim by Jeremy Bentham: “Not only are the guilty served but it is they alone that are served (by the right to silence), without any mixture of the innocent” (Bentham, 1825, p. 161). But Bentham was writing at a time when the accused was prohibited from testifying at trial. In the mid-1800s, the rule was changed to give defendants the right to testify. Bentham was referring to the more restrictive rule of the time, not to the modern rule by which defendants are free to testify, and if they do not, the jury is prohibited from drawing an adverse inference.
5 Several papers have investigated the welfare implications of different group decision rules, for example, Feddersen and Pesendorfer (1998) and Duggan and Martinelli (2001).
6 Glaeser, Kessler, and Piehl (2000) examine data on prisoners incarcerated for drug crimes. Their empirical analysis reveals that Assistant U.S. Attorneys tend to prosecute wealthier and more educated individuals, perhaps in part because these cases increase their reputation, and therefore their salary in the private sector.
7 Boylan and Long (2000) analyze federal drug trafficking cases from 1993 to 1996 and find that the plea rate is smaller in U.S. Attorney districts where private salaries are higher. Moreover the plea rate is smaller in districts where there are either few or many, but not an average number of, prosecutors. The authors’ explanation is that prosecutors take cases to trial to acquire human capital, unless they are closely monitored.
8 For factual evidence of the disadvantages of having court-appointed counsel in capital cases, see Bright (1990, 1994) and Vick (1995).
References Baird, D.G., Gertner, R.H. and Picker, R.C. Game Theory and the Law. Cambridge, MA: Harvard University Press, 1998. Bentham, J. A Treatise on Judicial Evidence. London: Quality Court, 1825. Boylan, R.T. and Long, C.X. “Size, Monitoring, and Plea Rate: An Examination of United States Attorneys.” Political Economy Working Paper, Washington University in St. Louis, 2000. Bright, S.B. “Death by Lottery—Procedural Bar of Constitutional Claims in Capital Cases Due to the Inadequate Representation of Indigent Defendants.” West Virginia Law Review, Vol. 92 (1990), pp. 679–692. —— . “Counsel for the Poor: the Death Sentence Not for the Worst Crime But for the Worst Lawyer.” Yale Law Journal, Vol. 103 (1994), pp. 1835–1883. Bucke, T., Sreet, R. and Brown, D. The Right to Silence: The Impact of the Criminal Justice and Public Order Act 1994. A Research, Development and Statistics Directorate Report, Home Office Research Study 199, London: Home Office, 2000.
Carter v. Kentucky, 450 U.S. 288, 1981 U.S. LEXIS 77 (January 14, 1981). Cooter, R. and Rubinfeld, D. “An Economic Model of Legal Discovery.” Journal of Legal Studies, Vol. 23 (1994), pp. 435–463. Criminal Justice and Public Order Act 1994: Elizabeth II. Chapter 33. London: The Stationery Office Books, 1994. Donohue, J.J. III and Levitt, S.D. “The Impact of Race on Policing and Arrests.” Journal of Law and Economics, Vol. 44 (2001), pp. 367–394. Duggan, J. and Martinelli, C. “A Bayesian Model of Voting in Juries.” Games and Economic Behavior, Vol. 37 (2001), pp. 259–294. Feddersen, T. and Pesendorfer, W. “Convicting the Innocent: The Inferiority of Unanimous Jury Verdicts under Strategic Voting.” American Political Science Review, Vol. 92 (1998), pp. 23–35. Gay, G.D., Grace, M.F., Kale, J.R. and Noe, T.H. “Noisy Juries and the Choice of Trial Mode in a Sequential Signaling Game: Theory and Evidence.” RAND Journal of Economics, Vol. 20 (1989), pp. 196–213. Glaeser, E.L., Kessler, D.P. and Piehl, A.M. “What Do Prosecutors Maximize? An Analysis of the Federalization of Drug Crimes.” American Law and Economics Review, Vol. 2 (2000), pp. 259–290. Griffin v. California, 380 U.S. 960, 1965 U.S. LEXIS 1490 (April 5, 1965). Gunther, G. and Sullivan, K.M. Constitutional Law. Thirteenth Edition, New York: The Foundation Press, 1997. Jackson, J., Wolfe, M. and Quinn, K. Legislating Against Silence: The Northern Ireland Experience. NIO Research and Statistical Series No. 1, 2000. King, N.J. “Postconviction Review of Jury Discrimination: Measuring the Effects of Juror Race on Jury Decisions.” Michigan Law Review, Vol. 92 (1993), pp. 63–130. Levy, L. Origins of the Fifth Amendment. New York: Oxford University Press, 1968. Milgrom, P.R. “Good News and Bad News: Representation Theorems and Applications.” Bell Journal of Economics, Vol. 12 (1981), pp. 380–391. O.J. Simpson Trial Transcripts and Documents. WESTLAW, Notable Trials Library, 1995. P’ng, I.P.L. 
“Strategic Behavior in Suit, Settlement, and Trial.” Bell Journal of Economics, Vol. 14 (1983), pp. 539–550. Reinganum, J.F. “Plea Bargaining and Prosecutorial Discretion.” American Economic Review, Vol. 78 (1988), pp. 713–728. Seidmann, D.J. and Stein, A. “The Right to Silence Helps the Innocent: A Game-Theoretic Analysis of the Fifth Amendment Privilege.” Harvard Law Review, Vol. 114 (2000), pp. 431–510. Shin, H.S. “Adversarial and Inquisitorial Procedures in Arbitration.” RAND Journal of Economics, Vol. 29 (1998), pp. 378–405. Spier, K.E. “The Dynamics of Pretrial Negotiation.” Review of Economic Studies, Vol. 59 (1992), pp. 93–108. Vick, D.W. “Poorhouse Justice: Underfunded Indigent Defense Services and Arbitrary Death Sentences.” Buffalo Law Review, Vol. 43 (1995), pp. 329–460.
12 The effects of a right to silence Daniel J. Seidmann
1. Introduction

Surely fewer persons will confess if police must warn them of their right to silence.1

Can it be supposed that the rule in question has been established with the intention of protecting [the innocent]? They are the only persons to whom it can never be useful.2

Common law criminal systems have typically given suspects in criminal trials a “right to silence”: if the suspect does not answer police questions then the jury must condition its verdict on the other evidence alone, deciding as if the suspect had not in fact been interrogated. The Supreme Court decided in Miranda v. Arizona (1966) that the Fifth Amendment privilege against self-incrimination would be violated were the jury to draw an adverse inference from silence after the suspect received a Miranda warning.3 The right to silence was first formalized in England in 1897, and was attenuated by the 1994 Criminal Justice Act, which allowed prosecutors to invite adverse inferences from a refusal to answer certain material questions.4 The Miranda decision has been regarded as the apotheosis of the Warren Court’s jurisprudence, but the right to silence was controversial before then.5 There seem to be two reasons why it has attracted such fierce criticism.
(1) It prevents the jury from using potentially informative evidence, as do hearsay and other exclusion rules. Furthermore, Bentham (1825) argued that the right seems particularly egregious as it is only advantageous to suspects who are factually guilty. On closer inspection, Bentham’s argument has two parts: a factual claim that innocent suspects do not exercise the right, and a theoretical claim that a suspect can only gain from the right if she exercises it.
(2) Critics have argued that a significant proportion of those who currently evade conviction by exercising the right would confess and then be convicted if the right were abolished.6 This argument rests on a prior claim that the right affects outcomes by operating on the margin between confession and silence.
The right’s many defenders have not challenged the theoretical basis of these critiques. Miranda’s supporters accept both parts of Bentham’s argument, but argue that it protects social interests other than accurate adjudication, such as rights to privacy (cf. Gerstein, 1970) and not to be subject to oppressive interrogation (cf. Stuntz, 1993). They also accept the claim that Miranda operates on the confession/silence margin; but they argue that Miranda has not reduced convictions significantly because the police have developed non-coercive methods to induce criminals to confess.7 English supporters of the right have typically argued that it provides a safe haven for some innocent suspects, who would otherwise make false confessions:8 accepting Bentham’s theoretical argument and the claim about the margin on which the right operates. Despite its apparent plausibility, Bentham’s argument does not work: for if exercising the right imposes positive externalities then a suspect could gain from existence of the right without exercising it herself. We present a model of the criminal process in which a criminal who exercises the right imposes the requisite positive externalities by separating from innocent suspects who tell the truth, but are unable to prove it. We analyse games played by a fact-finder (“jury”) and a suspect. The suspect has met a witness on one of a number of occasions, and is factually guilty if she met the witness at the crime scene. The suspect alone knows exactly when the meeting took place (her type); the jury only knows the prior probability of each possible meeting; and the witness has imperfect recall: he cannot distinguish the criminal from one of the innocent types of the suspect. The jury and the suspect are both ex ante uncertain of which innocent type the witness confuses with the criminal. The game starts with some non-strategic moves. 
Nature first chooses the occasion which the witness cannot distinguish from the crime and the suspect’s type. The witness then reports the suspect’s type as accurately as he can to the jury (but not to the suspect). The suspect then provides a statement (a claim of her type) to the jury, which chooses a verdict conditional on the witness’s report and the suspect’s statement.9 The game ends with the suspect receiving a sentence which depends on the jury’s verdict and on whether she confessed to the crime. Note that the witness behaves non-strategically: his report can be interpreted as (possibly physical) corroboratory evidence. We suppose that the jury seeks to avoid miscarriages of justice (convicting an innocent suspect or acquitting a criminal); that a suspect who confesses is always convicted, but receives a lower sentence than one convicted at trial; and that every suspect type seeks to minimize the sentence imposed. We analyze two versions of this game: in the (post-1994) “English game”, the jury can draw an adverse inference from silence; in the “American game”, the jury must use the witness report alone to reach its verdict if the suspect is silent. We analyse both games by characterizing those Bayes perfect equilibria in which every innocent type chooses to make a true statement: namely, equilibria which satisfy the factual premise of Bentham’s argument.
An innocent suspect’s statement is therefore always consistent with the report, even though the witness may not exonerate her; so lying is risky because the jury always draws an adverse inference if the statement and report are inconsistent. We describe the effect of a right to silence by comparing equilibrium play in American and English games. We provide two main results. Before doing so, it is useful to define a condition on innocent types. We will say that such a type is “plausible” if her prior probability is sufficiently high that the jury would acquit if it only observed an indecisive report (which does not distinguish this type from the criminal).
• The probability that an innocent suspect is convicted is at least as great in the English as in the American game. To see this, note that the criminal is never silent in an English game: for innocent suspects always make exculpatory statements, so the jury would draw an adverse inference if the criminal were silent. Instead, the criminal makes a false statement if the premium for confession is not too high. If the criminal lies then she replicates the true statements of suspect types whom the witness is least likely to distinguish from the criminal, because the jury would convict a suspect whose statement is contradicted by the witness. If one of these types is implausible then the jury would convict with positive probability if the suspect claimed to be of this type and the report did not contradict this statement. Implausible innocent suspects may therefore be convicted in the English game if they are not exonerated. By contrast, the criminal may exercise an available right to silence in the American game because she thereby avoids the risk that a false statement is contradicted by the report. If the criminal separates from all innocent types by exercising her right then the jury draws a favourable inference if the suspect waives her right by making an exculpatory statement; so implausible types would be rightfully acquitted in the American game. In sum, a criminal who is silent in the American game may impose a positive externality on some innocent suspects; so Bentham’s claim that the right cannot benefit innocent suspects who tell the truth is wrong.
• If confession secures a low premium (as lawyers typically claim) then the American and English games have the same confession rates, but the conviction rate is higher in English games and the silence rate is higher in American games. This result is easy to see when some types are plausible, as the criminal would not confess in either game, exercising her right in the American game and lying in the English game. The criminal exercises her right so as to reduce the probability of conviction; so both the criminal and innocent types are more likely to be convicted in the English game. The confession rate is also equal if no types are plausible, as a silent suspect would then be convicted for sure in both games. Consequently, the two games possess the same equilibrium path, on which the criminal mixes between confessing and making false statements.
Our model therefore has different testable implications from the various theories mentioned above, which do not consider the possibility that criminals may lie. We will argue that the best available evidence is consistent with the implications of our model: the 1994 English legislation significantly reduced the silence rate without significantly changing the confession rate; and Miranda significantly reduced the clearance rate (a proxy for the conviction rate).
This chapter is part of the literature on evidence and game theory, in which Seidmann and Stein (2000) and Sanchirico (2001) are the closest relations.10 Seidmann and Stein provide an informal account of the model in this paper, and discuss its power as an explanation of Fifth Amendment jurisprudence. Sanchirico analyzes a model in which the jury only observes the suspect’s statement, whose cost is (suspect-)type dependent, and the jury commits to a decision rule in order to affect the suspect’s incentives to commit a crime. Sanchirico shows that the jury’s optimal decision rule may not induce truthtelling. We adopt a more primitive model of interrogation by explicitly incorporating corroboratory evidence; and we suppose that the jury’s verdict is a best response in each legal regime. On the other hand, we suppress deterrence arguments, treating the jury’s disutility from miscarriages of justice as primitive. Our model, like Schrag and Scotchmer (1994) and Daughety and Reinganum (1995), focuses on the exclusionary rules which dominate Anglo-American evidence law; but we consider an exclusionary rule which can be waived at the suspect’s discretion, so that a waiver is itself informative. Our model is also part of the literature on cheap talk and verifiable message games, initiated respectively by Milgrom (1981) and Crawford and Sobel (1982). If the witness report were entirely uninformative then our model would boil down to a cheap talk game with an outside option (confession), where suspect types could not separate in any Bayes perfect equilibrium. Interrogation is informative in our model because different types hold different beliefs about the report, allowing suspects to separate in some Bayes perfect equilibria. If the witness report were fully informative then every innocent suspect could “prove” her true statement, as in conventional verifiable message games. 
A silent criminal would be convicted even if the jury could not draw an adverse inference from silence; so the criminal would confess, whatever the legal regime. Like Milgrom and Roberts (1986) and Lipman and Seppi (1995), we consider games in which some statements are provable; but unlike these papers, we derive provability from our interrogation model. Our model can also be viewed as exploring the effects of jury naivety (cf. Milgrom and Roberts (1986) and Froeb and Kobayashi (1996)) in the sense that the jury acts as if it had non-partitional beliefs. However, we derive the jury’s possible failure to convict after silence as the consequence of an evidentiary rule; so the jury would not change its beliefs, for fixed evidence, if it reflected on its informational structure (cf. Squintani, 2002).
We present our model in Section 2. We present our results and discuss them in relation to the data in Section 3, summarizing in Section 4. We relegate proofs to an Appendix, which contains a complete characterization of play in generic games.
2. Model In this section, we model interrogation and sentencing using variants on a signalling game. The games are each played by a privately informed suspect, who can signal her type by responding to interrogation, and a fact-finder (“jury”), which chooses a verdict after observing all of the admissible evidence. In contrast to conventional signalling games, we also incorporate a non-strategic witness, who privately provides the police with further evidence. We distinguish between games in which the suspect has and does not have a right to silence. Our model is motivated by the following story. A known crime has been committed. The police arrest a suspect on the basis of some commonly known prior evidence, and present her to a witness who is known to have met the suspect once: either at the scene of the crime or on some innocent occasion. The police ask the suspect when she met the witness. The witness, whose memory is imprecise, then honestly reports when he met the suspect. If the suspect confesses then she is automatically convicted; otherwise, she is sent to trial, where the jury either acquit or convict after observing the evidence. If the suspect confesses then she receives a lesser punishment than if she had been convicted at trial. It is convenient to present some common features of the two games before distinguishing between them. We start our formal presentation by describing Nature’s moves, which select the suspect’s type and the witness report. Let T > 2 denote the finite set of occasions on which the suspect and witness may have met, with generic element t. We distinguish one of these occasions as the crime (denoted c), and we write the other (innocent) occasions as the set I. It will prove convenient to identify I with the integers {1, . . ., T − 1}.11 Nature draws the true occasion from T according to a commonly known prior distribution p: where pt > 0 is the probability that the witness met the suspect on occasion t ∈ T. 
We interpret pc as a measure of the strength of evidence prior to interrogation (the “prior evidence”). We assume that the suspect privately observes the realization of t: namely, she knows when she met the witness, but the jury does not. Accordingly, it is convenient to use t to denote both the suspect’s “type” and a particular occasion. We now describe the distribution of the witness’s report, conditional on the suspect’s type. We model the witness’s imprecise recall by assuming that he cannot distinguish between the criminal and exactly one other suspect type, say t. Specifically, if the police ask the witness to identify a suspect of type c or of type t then the witness can only say that the suspect’s type is in
the pair {c, t}; and if the suspect is of any other type then the witness can identify her type exactly. In the former case, we say that type t is “indistinguishable from the criminal”. We assume that the witness reports the suspect’s identity as accurately as he can; so the witness report (denoted w) is either a pair {c, t} or a singleton innocent type. We will say that (innocent) type t is “exonerated” in the latter case, and we refer to the report {c, t} as “t-indecisive”. We model the suspect’s uncertainty by supposing that the witness cannot distinguish the criminal from innocent type t with commonly known probability qt > 0, where $\sum_{t \in I} q_t = 1$. We order the innocent types such that t < u implies that qt ≥ qu. We will say that type (resp. statement) t is “more suspicious” than type (resp. statement) u if qt > qu. It will prove convenient to define qT ≡ 0. Our model implies that an innocent suspect of type t knows that the witness cannot distinguish her from the criminal with probability qt and otherwise exonerates her, while the criminal knows that the witness cannot distinguish her from each innocent suspect of type t with probability qt. The suspect’s beliefs about the report are therefore type-dependent: a feature which interrogation exploits. We now describe the choices which players can make. The suspect chooses some “statement” (denoted s), which is either silence or an element of T. We will refer to statement c as a “confession”, and will describe any statement s ∈ I as “exculpatory”. We will say that statement s ∈ T is “true” if it is made by type s, and that it is otherwise “false”. We interpret confession as equivalent to a guilty plea. If the suspect does not confess then the case proceeds to trial, where the jury chooses a verdict: either acquitting or convicting at trial. We will refer to a pair <s, w>, consisting of the witness’s report and the suspect’s statement as the “evidence”.
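Nature's moves and the witness report described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `draw_case` and `contradicts`, the label `"c"` for the crime, and the use of dictionaries for the priors p and the confusion probabilities q are choices made here, not part of the chapter.

```python
import random

CRIME = "c"  # hypothetical label for the crime occasion (type c in the text)

def draw_case(p, q, rng):
    """Draw the suspect's type and the witness report.

    p: dict mapping each occasion (CRIME or an innocent type 1..T-1) to its prior.
    q: dict mapping each innocent type t to q_t, the probability that t is the
       type the witness cannot distinguish from the criminal (the q_t sum to 1).
    Returns (suspect_type, report); the report is a frozenset.
    """
    types = list(p)
    suspect = rng.choices(types, weights=[p[t] for t in types])[0]
    innocents = list(q)
    confused = rng.choices(innocents, weights=[q[t] for t in innocents])[0]
    if suspect == CRIME or suspect == confused:
        report = frozenset({CRIME, confused})  # t-indecisive report {c, t}
    else:
        report = frozenset({suspect})          # innocent type is exonerated
    return suspect, report

def contradicts(statement, report):
    """A report contradicts statement t when t is not an element of the report."""
    return statement not in report
```

Under this sketch a true statement is never contradicted, and an innocent type t is exonerated with probability 1 − qt, which is the type-dependence of beliefs that the text emphasises.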
We will describe evidence as “t-indecisive”. If the suspect makes statement t and if t ∉ w then we will say that the report “contradicts” the suspect’s statement. We now define pay-offs in the games. We assume that the suspect’s payoff only depends on (the punishment associated with) the jury’s verdict: preferring to be acquitted than to be convicted after a confession, and the latter over being convicted without a confession. We represent this preference ordering by a utility function which we normalize such that a suspect earns 0 if she is convicted without confessing and 1 if she is acquitted. Given this normalization, we write U for the suspect’s pay-off if she confesses and is convicted. U then measures the “premium for confession”. Our assumption that a suspect-type’s pay-off only depends on the verdict excludes various arguments in the legal literature on the right: on the one hand, we assume that a criminal neither obtains direct utility from confessing nor direct disutility from breaking her oath;12 on the other hand, we assume that innocent suspects neither remain silent because they distrust the police nor confess to protect loved ones. We will focus on strategy combinations in which innocent suspects make
true statements. Accordingly, we require that the premium for confession is low enough that innocent suspects do not have an incentive to confess falsely. Specifically, we suppose that U < 1 − q1.13 Following Feddersen and Pesendorfer (1998), we suppose that there is D ∈ (0, 1) such that the jury earns a pay-off of
• 0 whenever it acquits an innocent suspect or convicts a criminal who has not confessed;
• −D whenever it convicts an innocent suspect who has not confessed; and
• −(1 − D) whenever it acquits the criminal.14
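A short derivation, offered as a sketch, shows why these pay-offs make D a threshold on the probability of guilt. The symbol μ is introduced here for illustration only: it denotes the jury's posterior probability that the suspect is the criminal, given the admissible evidence, and does not appear in the original text.

```latex
\begin{align*}
\text{expected pay-off from convicting:}\quad
  & \mu \cdot 0 + (1-\mu)(-D) = -D(1-\mu), \\
\text{expected pay-off from acquitting:}\quad
  & \mu \bigl(-(1-D)\bigr) + (1-\mu)\cdot 0 = -(1-D)\mu, \\
\text{convicting is optimal}
  & \iff -D(1-\mu) \ge -(1-D)\mu \iff \mu \ge D .
\end{align*}
```

The jury is therefore indifferent exactly when μ = D, and convicts (acquits) when the posterior probability of guilt is above (below) D.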
In this model, D is the standard of proof: the critical probability of guilt at which the jury is indifferent between acquitting and convicting at trial. The jury is faced with a binary decision, so its behaviour is fully characterized by D. We will say that the suspect is “caught red-handed” whenever D < pc. If the jury did not condition its verdict on the suspect’s statement then it would always convict a criminal who was caught red-handed: for the posterior probability that such a suspect is the criminal exceeds the prior probability, and therefore the requisite standard of proof, for every possible report. We now use D to describe a notion of plausibility, which only depends on the prior beliefs p and the standard of proof.15 We define the set of innocent types P as
\[ \left\{ t \in I : \frac{p_c}{p_c + p_t} < D \right\}. \]
If t ∈ P then the prior evidence is weak enough that the jury would maximize its pay-off by acquitting the suspect on the basis of the prior evidence and report {c, t} alone. If t ∈ P then we will say that statement t and type t are “plausible”; and we will otherwise say that they are “implausible”. We now define strategy spaces and games. A suspect of type t chooses any statement s ∈ T after observing her own type (but not the witness report). We identify an information set for the jury with any pair <s, w> such that the suspect has not confessed. We define the two games by restricting the verdicts which the jury can reach at particular information sets. We say that an “English game” is played if
• A suspect who confesses earns a pay-off of U; and
• The jury chooses between acquittal and conviction at every information set where the suspect has not confessed.
We represent the Miranda regime by an “American game” which only differs from an English game by restricting the jury’s available verdicts if the suspect is silent. Specifically, we suppose that
• The jury has the same choice set in American and English games unless the suspect is silent;
• The jury must acquit a silent suspect in an American game if the report is t-indecisive and type t is plausible; and
• The jury must convict a silent suspect in an American game if the report is t-indecisive and type t is implausible.
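The plausibility set P and the verdict imposed after silence in the American game can be sketched as follows. The function names, the `"c"` label, and the dictionary representation of the prior are assumptions made for illustration. The text only specifies the verdict after silence for t-indecisive reports; treating an exonerating report as leading to acquittal is an additional assumption of this sketch, based on the fact that such a report rules out guilt.

```python
def plausible_types(p, D, crime="c"):
    """Return the innocent types t with p_c / (p_c + p_t) < D (the set P)."""
    pc = p[crime]
    return {t for t, pt in p.items() if t != crime and pc / (pc + pt) < D}

def american_verdict_after_silence(report, p, D, crime="c"):
    """Verdict the jury must reach in the American game when the suspect is silent.

    report: a set that is either {c, t} (t-indecisive) or a singleton {t}.
    """
    if crime not in report:
        # Assumption of this sketch: an exonerating report leads to acquittal.
        return "acquit"
    (t,) = [u for u in report if u != crime]
    # The jury decides as if only the prior evidence and the report were observed.
    return "acquit" if t in plausible_types(p, D, crime) else "convict"
```

For example, with priors p_c = 0.2, p_1 = 0.5, p_2 = 0.3 and D = 0.35, only type 1 is plausible, so silence followed by the report {c, 1} yields acquittal while silence followed by {c, 2} yields conviction.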
Note well that a criminal can only guarantee acquittal by exercising her right if every exculpatory statement is plausible. Judges in American and English trials instruct the jury to reach its verdict solely on the basis of the evidence presented in court. A suspect’s response to interrogation is admissible in English trials; so the prosecution can invite the jury to draw an adverse inference from silence, as in an English game. By contrast, a suspect’s silence is inadmissible in American trials, and the prosecution cannot invite an adverse inference from silence. Our model of American trials captures this feature by restricting the jury to the verdict which it would rationally reach if it ignored inadmissible evidence.16 A priori, it is possible that American juries always become aware of a suspect’s silence, and ignore the judge’s instruction. If so, then the same strategy combinations would constitute an equilibrium under English and American rules. In Section 3, we will refer to evidence that a switch of regime does affect play. This evidence seems to be inconsistent with the hypothesis that American juries always draw an adverse inference from silence. Viewed in this light, our extreme hypothesis that American juries never draw an adverse inference from silence seems at least approximately realistic.17 We now motivate and then define our solution concept. All statements other than confession are cheap talk, so both games may possess Bayes perfect equilibria in which the suspect types pool. This outcome can be supported by many combinations of suspect strategies, as in cheap talk games. In particular, the probability that the suspect is silent takes every value in [0, 1] in the pooling equilibrium correspondence. Accordingly, we need a refinement of Bayes perfection in order to generate testable implications. We analyse the two games by characterizing those Bayes perfect equilibria in which every innocent type makes a true statement. 
We refer to such strategy combinations as “equilibria”, and to the associated distribution across terminal nodes as “outcomes”. From an empirical standpoint, our solution concept is consistent with a widespread view among American lawyers that innocent suspects typically tell the truth.18 From a theoretical standpoint, the solution concept allows us to analyse Bentham’s argument by imposing his premise.
Each game is characterized by several parameters: the premium for confession U; the prior probabilities {pt}; the suspiciousness of types {qt}; and the standard of proof D. We will refer to a vector of these parameters as a “case”, and will compare equilibria across the two games for a given case. Each game may have multiple equilibrium outcomes in some non-generic cases. For example, the criminal might make either of statements t and u if the witness is equally likely to confuse types t and u with the criminal; while the confession rate might differ across equilibria if there is some set of innocent types (say, S) such that $U = \sum_{u \in S} q_u$. Finally, the exposition is unnecessarily complicated if the standard of proof equals either $p_c$ or $p_c/(p_c + p_t)$ (cf. our arguments in Section 3). On reflection, these cases seem unlikely because the determinants of the various parameters are approximately independent. Accordingly, we will say that a case is “generic” if
• $q_t \neq \sum_{u \in S} q_u$ for every $S \subset I$ and $t \notin S$; and
• $D \notin \left\{ p_c, \frac{p_c}{p_c + p_t} \right\}$ for any $t \in I$; and
• $U \neq \sum_{u \in S} q_u$ for any $S \subseteq I$.
We focus on generic cases in order to simplify exposition. The first condition allows us to order types such that qt > qu whenever t < u. Type 1 is therefore the type most likely to be indistinguishable from the criminal; so we will sometimes refer to her as the “most suspicious” type, and analogously for statement 1.
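A genericity check can be sketched by brute force over subsets of innocent types. The sketch reads the three conditions as: no qt equals a sum of other q's; D differs from pc and from every pc/(pc + pt); and U differs from every subset sum of the q's. The function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an implementation choice made here so that equality tests are meaningful; the enumeration is exponential in the number of types and is intended for small examples only.

```python
from fractions import Fraction
from itertools import combinations

def subset_sums(q, exclude=None):
    """All sums of q_u over subsets S of the innocent types, optionally omitting one type."""
    keys = [t for t in q if t != exclude]
    sums = set()
    for r in range(len(keys) + 1):
        for S in combinations(keys, r):
            sums.add(sum((q[u] for u in S), Fraction(0)))
    return sums

def is_generic(p, q, D, U, crime="c"):
    """Check the three genericity conditions for a case (all inputs as Fractions)."""
    pc = p[crime]
    # Condition 1: q_t is not a sum of q_u over any set S that excludes t.
    cond1 = all(q[t] not in subset_sums(q, exclude=t) for t in q)
    # Condition 2: D differs from p_c and from p_c / (p_c + p_t) for every t.
    thresholds = {pc} | {pc / (pc + p[t]) for t in q}
    cond2 = D not in thresholds
    # Condition 3: U is not a sum of q_u over any set S of innocent types.
    cond3 = U not in subset_sums(q)
    return cond1 and cond2 and cond3
```

With S a singleton, the first condition rules out ties qt = qu, which is what permits the strict ordering of types used below.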
3. Results In this section, we describe equilibrium play in English and American games, and then assess the effects of a right on the confession rate; on the probability of wrongfully convicting an innocent suspect; and on the jury’s equilibrium pay-off. We start by describing and then interpreting the unique outcome of the English game. We will need some further notation before stating our results.
• For every innocent type t, we define πt as
\[ \pi_t \equiv \frac{D}{1 - D} \cdot \frac{p_t}{p_c}. \]
Note that πt > 1 if and only if statement t is plausible. If the criminal makes implausible statement t with probability πt then the jury is indifferent between acquitting and convicting after observing t-indecisive evidence. Rearranging terms, it is easy to confirm that $\sum_{t \in I} \pi_t < 1$ if and only if the suspect is caught red-handed.
• We define type y by
\[ y \equiv \begin{cases} \max\left\{ t \in I : \sum_{u < t} \pi_u < 1 \right\} & \text{if } D > p_c \\ T & \text{otherwise} \end{cases} \]
where π0 ≡ 0.
We will demonstrate that any innocent types t > y cannot be wrongfully convicted in equilibrium because the criminal never makes statements t > y in either game.
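The two properties of πt stated above can be checked with a short calculation. The sketch below assumes, as the equilibrium definition requires, that innocent type t makes statement t for sure, and writes π for the probability with which the criminal makes statement t; it also uses the fact that the prior probabilities of the innocent types sum to 1 − p_c.

```latex
\begin{align*}
\Pr\bigl(c \mid s = t,\; w = \{c,t\}\bigr)
  &= \frac{p_c\, q_t\, \pi}{p_c\, q_t\, \pi + p_t\, q_t}
   = \frac{p_c\, \pi}{p_c\, \pi + p_t}, \\
\frac{p_c\, \pi}{p_c\, \pi + p_t} = D
  &\iff \pi = \frac{D}{1-D}\cdot\frac{p_t}{p_c} = \pi_t, \\
\sum_{t \in I} \pi_t
  = \frac{D}{1-D}\cdot\frac{1-p_c}{p_c} < 1
  &\iff D\,(1-p_c) < (1-D)\,p_c \iff D < p_c .
\end{align*}
```

The first two lines give the jury's indifference after t-indecisive evidence; the third is the equivalence with being caught red-handed.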
Our main results on the effects of a right to silence concern the confession rate and the probability of wrongful convictions; so Proposition 1 describes the pertinent features of equilibrium play in English games. (Recall that we restrict attention to Bayes perfect equilibria in which all innocent types tell the truth.)
Proposition 1. Every generic English game has a unique outcome. In every equilibrium:
(a) The criminal is never silent;
(b) The criminal
• Confesses for sure if and only if U > q1;
• Mixes between confession and some false statement(s) if and only if qy < U < q1;
(c) Innocent types are convicted with positive probability if and only if U < q1 and statement 1 is implausible. ||
Proposition 1(b) implies that the suspect types may separate, even though every statement other than confession is cheap talk, and all types share a common preference ordering over the jury’s action. By contrast, every Bayes perfect equilibrium of a cheap talk game with a type-independent preference ordering is pooling. Suspect types can partially separate in our model because the witness report is partially informative, and every type holds different beliefs about the realized report. We prove Proposition 1 in the Appendix as a direct corollary of Lemma A.1, which fully describes equilibrium play in generic English games. We now sketch the arguments underlying the Lemma. Every innocent suspect is acquitted when she is exonerated. An innocent suspect can never gain from making a false exculpatory statement: for she
would then be convicted unless exonerated, as her statement would be contradicted. Furthermore, she does not falsely confess because she is likely enough to be exonerated.19 The criminal is never silent in an equilibrium of the English game: all innocent types make true statements (by definition of an equilibrium); so the jury would infer that a silent suspect is the criminal were the latter to be silent with positive probability in equilibrium. As the jury would convict a silent suspect, the criminal could profitably deviate to confessing, which secures some premium. Three possibilities remain: the criminal may confess for sure, mix between confession and some statement(s), or not confess for sure. We start by considering the set of statements which the criminal might make in equilibrium. If the criminal were to make some plausible statement for sure then she would be convicted if and only if the report contradicted her statement. Thus, a criminal who makes any statement for sure must make the most suspicious plausible statement, as the report is least likely to contradict this statement. Conversely, the criminal cannot make any implausible statement for sure as she would then be convicted, irrespective of whether her statement contradicts the report. On the other hand, suppose that the criminal did not make some statement t in equilibrium. The jury would then acquit the suspect after observing statement t and any report which did not contradict it. Hence, a criminal who makes statement u in equilibrium must also make every statement t < u with positive probability in equilibrium, and no statement t < u can be plausible. The criminal can only make every statement t < u with positive probability in equilibrium if she is indifferent between making these statements. 
The jury must convict after observing a contradicted statement; so the criminal can only be indifferent between statements t and u with t < u if the jury is more likely to convict after observing an uncontradicted statement t than an uncontradicted statement u. Conversely, the jury only convicts with positive probability after observing an uncontradicted statement if the probability that the suspect is the criminal conditional on this evidence equals the standard of proof: the “jury incentive condition”. This condition, in turn, determines the probability with which the criminal makes statement t < u in equilibrium. For a fixed premium, the criminal has two possible motives for confession: either
• The witness is likely enough (relative to the premium for confession) to contradict any exculpatory statement that the criminal might make. Specifically: U > q1; or
• The witness is likely enough (relative to the premium for confession) to contradict any plausible exculpatory statement that the criminal might make. Specifically: U > qy.
A criminal who confesses for sure earns U, which must exceed q1, else she could profitably deviate to making statement 1. The converse is obvious. If the criminal mixes between confession and some statement(s) then the jury incentive condition above requires that U > qy. Furthermore, the criminal must make every statement t such that U < qt with positive probability, else she could profitably deviate to that statement. In sum, the criminal can only mix between confession and some statement(s) if qy < U < q1. The converse is again obvious. We now consider the conditions under which the suspect may be wrongfully convicted. A plausible type (say, t) cannot be convicted in equilibrium, even if the criminal makes statement t for sure. Indeed, the criminal would make statement 1 if type 1 were plausible and U < q1. Under these circumstances, the criminal may evade rightful conviction; but there would be no wrongful convictions, as an implausible type is only convicted in equilibrium if the criminal replicates her true statement. Conversely, an implausible type 1 is (wrongfully) convicted with positive probability whenever U < q1 as the criminal replicates her statement with positive probability, and the jury must convict with positive probability if the suspect makes statement 1 and is not exonerated to ensure that the criminal confesses in equilibrium. Consequently, the criminal’s strategy imposes a negative externality on an implausible type 1: reducing the credibility of her true statement by lowering the posterior probability that she is innocent if she is not exonerated. We now turn to the American game, in which the jury is forced to acquit (resp. convict) a silent suspect whenever the witness reports {c, t}, and type t is plausible (resp. implausible). We will say that the criminal “exercises her right (to silence)” if she is silent with positive probability (possibly equal to 1) in an American game.
If the criminal exercises her right then she earns Σ_{u ∈ P} qu.20 A suspect can therefore exercise two alternative outside options (silence and confession) in an American game.
Proposition 2. Every generic American game has a unique outcome. In every equilibrium:
(a) The criminal exercises her right to silence if max{U, qy} < Σ_{u ∈ P} qu.
(b) The criminal
• Confesses for sure if and only if U > max{q1, Σ_{u ∈ P} qu}; and
• Mixes between confession and some false statement(s) if and only if max{qy, Σ_{u ∈ P} qu} < U < q1.
(c) Innocent types are (wrongfully) convicted with positive probability if and only if either
max{U, qy} ≤ Σ_{u ∈ P} qu < q1, or
max{U, Σ_{u ∈ P} qu} < qy < q1, or
max{qy, Σ_{u ∈ P} qu} < U < q1.
||
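The conditions in Proposition 2(a) and (b) can be read as a simple case split on three numbers: the premium U, the silence pay-off Σ_{u ∈ P} qu, and the pay-offs q1 and qy from statements 1 and y. The following is a minimal sketch of that case split, not part of the original analysis. The function name and the argument S (standing for Σ_{u ∈ P} qu) are hypothetical, and the final branch labels the remaining generic case using Lemma A.2(b) in the Appendix. Ties between the quantities are non-generic and are not handled.

```python
def american_regime(U, q1, qy, S):
    """Classify the criminal's behaviour in a generic American game.

    U  : premium earned by confessing
    q1 : probability of a 1-indecisive report (pay-off from statement 1)
    qy : probability of a y-indecisive report (pay-off from statement y)
    S  : sum of q_u over plausible types u (pay-off from silence);
         the name S is an assumption made for this sketch
    """
    if max(U, qy) < S:
        # Proposition 2(a): silence beats both confession and statement y.
        return "exercises right to silence"
    if U > max(q1, S):
        # Proposition 2(b), first bullet.
        return "confesses for sure"
    if max(qy, S) < U < q1:
        # Proposition 2(b), second bullet.
        return "mixes confession and false statements"
    # Remaining generic case, max{U, S} < qy (cf. Lemma A.2(b)).
    return "makes false statements only"


print(american_regime(U=0.1, q1=0.6, qy=0.4, S=0.5))
print(american_regime(U=0.7, q1=0.6, qy=0.4, S=0.5))
```

The numerical values are purely illustrative and are not taken from any data in the chapter.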
We prove Proposition 2 in the Appendix as a direct corollary of Lemma A.2, which fully describes equilibrium play in generic American games. We now sketch the arguments underlying the Lemma. An innocent suspect never has an incentive to exercise her right. To see this, note that she is acquitted for sure if she is exonerated; so she could only profitably deviate if she improved her chances of acquittal when she is not exonerated. Any plausible type is always acquitted, and can therefore not profitably deviate. Any implausible type who is not exonerated is convicted with probability less than one in equilibrium, but would be convicted for sure if she were silent; so she strictly prefers to make a true statement. Proposition 2 is therefore consistent with Bentham’s factual claim that innocent suspects do not exercise the right (cf. Section 1). The criminal is never silent in an English game, as the jury would draw an adverse inference in equilibrium (cf. Proposition 1(a)). The English game can therefore only possess a separating equilibrium if the criminal confesses for sure: namely, if the premium for confession is large enough (cf. Proposition 1(b)). By contrast, the jury cannot draw an adverse inference from silence in an American game. Consequently, the criminal can avoid the risk of a false statement being contradicted by exercising her right. The conditions in Proposition 2(a) ensure that she neither has an incentive to confess nor to risk making statement 1. The American game may therefore possess separating equilibria in cases where innocent suspects are wrongfully convicted in the English game. Conversely, availability of the right is irrelevant if all relatively suspicious (namely, low-t) types are implausible, as a silent criminal is likely to be convicted; so the American and English games possess the same outcome, which is unique in generic cases. 
The criminal confesses in an American game if the premium is high enough or if all relatively suspicious types are implausible. In the former set of cases, the criminal confesses for sure in an English game; in an American game, the premium must also be high enough that the criminal has no incentive to exercise her right. In the latter set of cases, the right is irrelevant; so the criminal confesses with the same probability in the two games. We will now compare equilibrium play in the two games for every fixed generic case.
Thus far, we have used the term “conviction” as short hand for conviction at trial; whereas the conviction rate reported in empirical studies aggregates the conviction rate at trial and after a guilty plea, which we identify with confession. Accordingly, our next results focus on aggregate convictions. Proposition 3.
For every generic case:
(a) The criminal is at least as likely to be silent in American as in English games, and in some cases is more likely;
(b) The suspect is at least as likely to be convicted (either after confessing or at trial) in English as in American games, and in some cases is more likely;
(c) The criminal is at least as likely to confess in English as in American games, and in some cases is more likely. ||
We prove Proposition 3 in the Appendix. Part (a) follows from the observation that the criminal exercises her right in some American games, but is never silent in a generic English game. If play differs across the two games then the criminal must exercise her right in the American game. In any such (generic) case, she is convicted or confesses with lower probability in the American than in the English game. If she would confess for sure in the English game then there are no wrongful convictions in either game. If she would make some false statement(s) in the English game then exercising the right raises the credibility of some innocent types’ statements, reducing the probability that they are convicted. In sum, every suspect type is weakly less likely to confess or be convicted at trial in American games. Proposition 3(c) is consistent with the claim that instructing the jury not to draw an adverse inference from silence reduces the confession rate by inducing criminals to exercise their right to silence. However, inspection of the proof reveals that this effect occurs if and only if the premium for confession is high enough and suspicious enough types are plausible.21 This condition is important because of the widespread view among lawyers in the U.S. and in the U.K. that confessions secure a very small remission of sentence.22 We formalize the consequences of this observation via the following definition. We will say that “confession secures a low premium” if U < qT − 1. We prove our next result in the Appendix.
Proposition 3′. If confession secures a low premium then, for every generic case:
(a) The criminal is at least as likely to be silent in American as in English games, and in some cases is more likely;
(b) The suspect is at least as likely to be convicted (either after confessing or at trial) in English as in American games, and in some cases is more likely;
(c) The criminal is equally likely to confess in American and in English games. ||
Proposition 3′ implies that parts (a) and (b) of Proposition 3 continue to hold when confession secures a low premium, but that the criminal only confesses with different probabilities in American and English games if this condition fails. The criminal may, of course, exercise her right in the American game when confession secures a low premium. In such cases, she reduces the probability that any suspect type is convicted at trial as compared to equilibrium play in the English game; so the suspect is again less likely to either confess or be convicted at trial in American games. If confession secures a low premium then the criminal only confesses in either game if no type is plausible. Confession is a better outside option than silence in such American games; so the confession rate is equal in the two games. We now use Proposition 3′ to define some testable implications. Suppose that a sample of empirical cases is drawn from the population according to some distribution function whose support is the set of generic cases in which confession secures a low premium. We use such a distribution function to define some statistics. Lemmas A.1 and A.2 (in the Appendix) state the probability with which the criminal (and therefore any suspect type) confesses in the English and American games, for any fixed generic case. We define the “confession rate” by integrating the probability that the criminal confesses in each generic game with respect to the distribution function. We define the “silence rate” and the “aggregate conviction rate” (at trial or after confession) analogously.23 Proposition 3′ immediately implies
Corollary 3′. If confession secures a low premium then
(a) The silence rate is higher in American than in English games;
(b) The aggregate conviction rate is higher in English than in American games;
(c) The confession rate is equal in American and English games.
|| Corollary 3′ presents a collection of testable predictions of our model on the maintained hypothesis that confession secures a low premium. The collection of predictions distinguishes our model from other theories of the effects of a right to silence, which ignore the possibility of lying, and focus on the trade-off between confessing and exercising the right:
• American supporters claim that the right has had little effect on the silence rate or the confession rate, perhaps because police have found means to circumvent the effect of Miranda warnings.24 This account is consistent with part (c) of the Corollary, but inconsistent with the other parts.
• American opponents (resp. English supporters) claim that criminals (resp. innocent suspects) who would otherwise confess and be convicted, exercise the right and are acquitted. Both accounts are consistent with parts (a) and (b) of the Corollary, but inconsistent with part (c).
The predictions of a differential silence rate and an equal confession rate can only be tested using reliable data on the outcome of police interrogations before and after a change in legal regime.25 Several careful studies have generated data on interrogation in the legal regime prevailing since the Supreme Court decided Miranda in 1966; but the only data on interrogation before 1966 come from a sample of police files which were not originally constructed for scientific purposes. Accordingly, it is difficult to use American data to test these predictions.26 Fortunately, the British Home Office collected data on interrogation before and after the 1994 Criminal Justice Act, which allowed English juries to draw an adverse inference from a suspect’s refusal to answer some material questions.27 Bucke, Street and Brown (2000) report that the silence rate in the Home Office sample was significantly lower after 1994, but that the confession rate did not significantly change. This evidence is consistent with our model, but inconsistent with the alternative theories.28 The statistically significant effect of the legislation on the silence rate is also consistent with our modelling assumption that some American juries obey the injunction not to draw an adverse inference if they believe that the suspect exercised her right to silence. While the Home Office did not collect evidence on the judicial fate of the suspects whose interrogation they recorded, the FBI have collected annual data on clearance rates for various categories of crime.29 Clearance rates dropped precipitately in the years immediately following the Supreme Court’s Miranda decision (in 1966), and have not subsequently recovered.
Cassell and Fowles (1998) demonstrate that a dummy for Miranda is a statistically significant explanation of clearance rates after controlling for such plausible alternatives as the age distribution and police resources. We interpret this evidence as consistent with Corollary 3′(b), which asserts that the aggregate conviction rate is higher in English than in American games.30 Our model has some additional implications, which can be tested using cross-sectional data, such as the relationship between confessions and the strength of evidence. Recall our interpretation of pc as a measure of the strength of evidence against the suspect prior to interrogation (cf. Section 2). If pc is large enough then no statement can be plausible; and Proposition 2 then implies that the criminal confesses with positive probability in an American game. Our model, like others, is therefore consistent with evidence that suspects are more likely to confess, the stronger is the prior evidence against them.31 Our model has further implications which can only be tested using observations on the factual innocence of suspects. In the remainder of this section,
we compare the incidence of wrongful convictions and the jury’s pay-off in American and English games. If confession secures a large enough premium and some statements are plausible then the criminal confesses in the English game; whereas she is silent in the American game, and is convicted at trial whenever the witness cannot distinguish her from an implausible type. Any ranking of the jury’s pay-offs in the two games then depends on a comparison of its pay-offs when the criminal is convicted at trial on the one hand, and when she confesses on the other. The social cost of guilty pleas is controversial, so we focus on cases in which the ranking of jury pay-offs is independent of this trade-off.32 In Proposition 4(b), we show that pay-offs can be ranked whenever confession secures a low premium: the premise of Proposition 3′. Proposition 4. (a) For every generic case: the suspect is at least as likely to be wrongfully convicted in English as in American games, and in some cases more likely; (b) For every generic case in which confession secures a low premium: the jury earns at least as much in English as in American games, and in some cases is strictly better off. || We prove Proposition 4 in the Appendix. Our solution concept imposes the factual premise of Bentham’s argument (cf. Section 1): that criminals alone can exercise a right to silence. Proposition 4(a) demonstrates that his conclusion is false: innocent suspects may gain from existence of a right which they do not exercise because a criminal’s statement imposes externalities on suspects who cannot prove their factual innocence. Specifically, a silent criminal separates from innocent suspects who, by construction, make true statements. If the criminal is silent for sure then every innocent suspect is always acquitted; so there are fewer wrongful convictions in the American game. 
If the criminal mixes between silence and some exculpatory statements then the criminal pools with fewer innocent types in the American than the English game. Furthermore, if the criminal makes statement t in both games then the jury acquits with higher probability after observing t-indecisive evidence in the American game.33 Hence, wrongful convictions are again less likely in the American game. Proposition 4(a) then follows from the observation that the two games possess the same outcome whenever the criminal does not exercise her right. The lower incidence of wrongful convictions in American games is a byproduct of the criminal’s attempt to evade rightful conviction. Consequently, if wrongful convictions are less likely in the American game then so are rightful convictions. The jury’s equilibrium pay-off, which is decreasing in either miscarriage of justice, captures this trade-off. Proposition 4(b) asserts that the right’s direct effect (reducing rightful convictions) dominates its indirect effect (reducing wrongful convictions). This result is not itself
surprising, as it states that a decision maker (the jury) would not gain by committing to ignore some ex post informative evidence;34 but the fact that Proposition 4(b) holds unambiguously is rather striking. If the jury’s preferences over verdict-type pairs represented social preferences then Proposition 4(b) would assert that the social costs of a right outweigh its benefits in our model.35 The related literature suggests a number of reasons why the right to silence might nevertheless be justified on consequentialist grounds, such as: the jury’s prior beliefs may be biased; or its preferences may entail a standard of proof which is too lax for the purposes of marginal deterrence; or it may draw an unduly adverse inference from silence.36 We conjecture that the right to silence could be justified were our model amended to incorporate such effects. More surprisingly, the right can be socially justified, without appeal to such effects, if the nature of the crime committed is itself uncertain. In such a model, a criminal who commits the most serious feasible crime may separate from other criminal types if and only if she can exercise a right to silence, thereby allowing those who have committed lesser crimes to confess credibly in equilibrium. The jury may then be better off ex ante with a right to silence.37 It would therefore be interesting to know how widely Proposition 4(b) generalizes?
4. Summary Critics of the right to silence have argued that it impedes truth-seeking to the exclusive benefit of criminals; and that, in practice, it reduces the aggregate conviction rate by offering criminals a better alternative than confession. Defenders of the right have accepted Bentham’s claim that a suspect can only gain from the right if she exercises it herself; and have argued that, in practice, the right is ineffectual or that it is a safe haven against false confessions. There are two problems with this consensus. Bentham’s claim is wrong in theory: innocent suspects could, in principle, gain from criminals’ exercise of the right; and the best available evidence suggests that the right reduces the aggregate conviction rate without affecting the confession rate. We have presented a model which addresses both of these problems. According to our model, the right can protect innocent suspects who are unlikely to be exonerated and whose true statements are implausible from wrongful conviction. Absent a right, the criminal may mimic such an innocent suspect’s true statements, and the jury would then convict her if she is not exonerated; but a criminal who exercises her right is less likely to pool with such an innocent suspect, who is therefore less likely to be wrongfully convicted. Our model also implies that the right reduces the aggregate conviction rate without affecting the confession rate if the premium for confession is relatively low. If no types are plausible then the criminal would not exercise an available right, which would therefore be empirically irrelevant; and if
some types are plausible then a criminal who exercised her right would make false statements, absent a right. Abolition of the right would therefore raise the probability that the criminal is convicted after being caught in a false alibi without affecting the confession rate.
Appendix
It will prove convenient to use some additional notation. Our first expression depends on the premium for confession and on the probabilities of t-indecisive reports:
• We define a critical type x as
x ≡ min{t ∈ I : U > qt} if U > qT − 1;
x ≡ T otherwise.
We will demonstrate that the criminal only makes statements t < min{x, y + 1} in an equilibrium of an English game. Accordingly, it is useful to note that x > y if and only if U < qy, and that x = 1 if and only if U > q1.
• We define z as
z ≡ 1 if q1 ≤ Σ_{u ∈ P} qu;
z ≡ min{t ∈ I : qt < Σ_{u ∈ P} qu} if q1 > Σ_{u ∈ P} qu > 0;
z ≡ T if no statement is plausible.
We will demonstrate that the criminal only makes statements t < min{x, y + 1, z} in an equilibrium of an American game. Accordingly, it is useful to note that y = z = 1 if statement 1 is plausible, but that y > z = 1 is possible if statement 1 is implausible. If the suspect is caught red-handed then no statement is plausible, so y = z = T. However, no statement need be plausible even if the suspect is not caught red-handed; so we could also have y ≤ T − 1 < z. We write  for silence.
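The two critical types can be computed mechanically from the primitives. The sketch below is illustrative only. It assumes that the innocent types are I = {1, ..., T − 1}, that q is supplied as a dictionary from each type t to qt, and that qt is strictly decreasing in t. The function names are hypothetical, and the fallback value returned when no type satisfies qt < Σ_{u ∈ P} qu is an assumption of the sketch rather than part of the definition of z.

```python
def critical_x(U, q):
    """x = min{t in I : U > q_t} if U > q_{T-1}, and x = T otherwise."""
    T = len(q) + 1                      # types in I are 1, ..., T-1
    if U > q[T - 1]:
        return min(t for t in q if U > q[t])
    return T


def critical_z(q, plausible):
    """z as defined in the Appendix; `plausible` is the set P of plausible types."""
    T = len(q) + 1
    if not plausible:
        return T                        # no statement is plausible
    S = sum(q[u] for u in plausible)    # pay-off from silence
    if q[1] <= S:
        return 1
    below = [t for t in q if q[t] < S]
    # Assumption of this sketch: fall back to T if the set is empty.
    return min(below) if below else T


q = {1: 0.6, 2: 0.4, 3: 0.2}            # illustrative values, T = 4
print(critical_x(0.7, q), critical_x(0.5, q), critical_x(0.1, q))
print(critical_z(q, {1}), critical_z(q, {2}), critical_z(q, set()))
```

With these values the first call illustrates the remark that x = 1 if and only if U > q1, and the last call illustrates z = T when no statement is plausible.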
Proposition A.1. Every generic English game has a unique outcome. In every equilibrium:
(a) The criminal is never silent;
(b) The criminal
• Confesses for sure if and only if U > q1;
• Mixes between confession and some false statement(s) if and only if qy < U < q1;
(c) Innocent types are (wrongfully) convicted with positive probability if and only if U < q1 and statement 1 is implausible.
Proof. Proposition A.1 follows from Lemma A.1 below, which fully characterizes equilibrium play in generic English games:
Lemma A.1. (a) Every generic English game in which U < qy possesses a unique outcome. On every equilibrium path:
• The criminal makes exculpatory statement t < y with probability πt and otherwise makes statement y;
• The jury acquits the suspect for sure if she is exonerated or if the evidence is t-indecisive and t ≥ y; acquits the suspect with probability qy/qt if the evidence is t-indecisive and t < y; and convicts the suspect if she makes any statement t < y and the report contradicts this statement;
• Every innocent type t < y is acquitted with probability 1 − qt + qy and every other innocent type is acquitted for sure;
• The criminal is acquitted with probability qy.
(b) Every generic English game in which U > qy possesses a unique outcome. On every equilibrium path:
• The criminal makes exculpatory statement t < x with probability πt and otherwise confesses;
• The jury acquits the suspect for sure if she is exonerated or if the evidence is t-indecisive and t ≥ x; acquits the suspect with probability U/qt if the evidence is t-indecisive and t < x; and convicts the suspect if she makes any statement t < x and the report contradicts this statement;
• Every innocent type t < x is acquitted with probability 1 − qt + U and every other innocent type is acquitted for sure;
• The criminal is acquitted with probability U Σ_{t < x} πt.
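The acquittal probabilities in Lemma A.1 follow a single pattern: an innocent type t below the relevant cutoff is acquitted when she is exonerated (probability 1 − qt) or when the evidence is t-indecisive (probability qt) and the jury acquits (probability qy/qt or U/qt). A minimal numerical sketch of this bookkeeping follows. It assumes that q is a dictionary of qt for t in I = {1, ..., T − 1}, that qt is decreasing in t, and that y ≤ T − 1. The function name is hypothetical, and the probabilities πt are not needed for these quantities.

```python
def english_acquittal(U, q, y):
    """Jury and innocent-type acquittal probabilities from Lemma A.1.

    Returns (jury, innocent): jury[t] is the probability of acquittal after
    t-indecisive evidence; innocent[t] is the overall probability that
    innocent type t is acquitted.
    """
    T = len(q) + 1
    if U < q[y]:
        # Lemma A.1(a): cutoff is y and the criminal earns q_y.
        cutoff, payoff = y, q[y]
    else:
        # Lemma A.1(b): cutoff is x = min{t : U > q_t} and the criminal's
        # statements each yield U.
        cutoff = min((t for t in q if U > q[t]), default=T)
        payoff = U
    jury = {t: (payoff / q[t] if t < cutoff else 1.0) for t in q}
    innocent = {t: (1 - q[t] + payoff if t < cutoff else 1.0) for t in q}
    return jury, innocent


q = {1: 0.6, 2: 0.4, 3: 0.2}            # illustrative values only
jury, innocent = english_acquittal(U=0.3, q=q, y=2)
print(jury[1], innocent[1], innocent[2])
```

In this illustration innocent type 1 is wrongfully convicted with probability q1 − qy, which is the gap between the probability that her statement is not contradicted and the criminal's equilibrium pay-off.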
Proof. Recall that U < qy if and only if x > y. We first prove that there are strategy combinations which satisfy the conditions above and which form an equilibrium, and then prove uniqueness. We construct the requisite strategy combinations by supposing that the jury convicts for sure unless it observes evidence which occurs with positive probability when the suspect is innocent.38 The jury’s strategy is then a best response as the posterior probability that the suspect is the criminal, conditional on observing t-indecisive evidence, equals D for every t < y if U < qy, and for every t < x if U > qy.
No innocent suspect can profitably deviate from making a true statement: for the jury acquits her whenever she is exonerated in the putative equilibrium; and the jury would convict her if she were not exonerated and were either silent or made a false statement. Finally, she cannot profitably deviate to confessing because U < 1 − q1. If U < qy then the criminal is indifferent between all the statements that she makes with positive probability; and these choices form a mixed strategy because Σ_{t < y} πt < 1. She earns qy after making every statement t ≤ y, and cannot profitably deviate to
• silence because she would always be convicted;
• confession because qx < U < qx − 1 ≤ qy;
• any statement t > y because she would earn no more than qy + 1 < qy.
If U > qy then the criminal is indifferent between all the statements that she makes with positive probability; and these choices form a mixed strategy because Σ_{t < x} πt ≤ Σ_{t […] qy then the criminal must confess with probability 1 − Σ_{t < x} πt. The jury must therefore acquit with probability U/qt after observing t-indecisive evidence (t < x) to ensure that the criminal is indifferent between making all statements t < x. Furthermore, the jury must then acquit for sure after observing t-indecisive evidence (t ≥ x). The criminal is therefore acquitted with probability U Σ_{t < x} πt, while every type t < x (resp. t ≥ x) is acquitted with probability 1 − qt + U (resp. for sure). In sum, the English game possesses a unique outcome. ||
(a) Follows immediately from Lemma A.1.
(b) Lemma A.1 implies that the criminal only confesses if U > qy, and that she confesses for sure if and only if U > q1.
(c) If U < qy (resp. U > qy) then Lemma A.1(a) (resp. Lemma A.1(b)) implies that some innocent type is wrongfully convicted with positive probability if and only if statement 1 is implausible (resp. U < q1). ||
Proposition A.2. Every generic American game has a unique outcome. In every equilibrium:
(a) The criminal exercises her right to silence if max{U, qy} < Σ_{u ∈ P} qu.
(b) The criminal
• Confesses for sure if and only if U > max{q1, Σ_{u ∈ P} qu}; and
• Mixes between confession and some false statement(s) if and only if max{qy, Σ_{u ∈ P} qu} < U < q1.
(c) Innocent types are (wrongfully) convicted with positive probability if and only if either
max{U, qy} ≤ Σ_{u ∈ P} qu < q1, or
max{U, Σ_{u ∈ P} qu} < qy < q1, or
max{qy, Σ_{u ∈ P} qu} < U < q1.
Proof. Proposition A.2 follows from Lemma A.2 below, which fully characterizes equilibrium play in generic American games.
Lemma A.2. (a) Every generic American game in which max{U, qy} ≤ Σ_{u ∈ P} qu possesses a unique outcome. In every equilibrium:
• The criminal makes each exculpatory statement t < z with probability πt and otherwise mixes between statement y and silence, choosing statement y with positive probability only if it is the sole plausible statement;
• The jury
(1) Acquits the suspect for sure if she is exonerated or if the evidence is t-indecisive (t ≥ z) or if it observes (, {c, t}) for some plausible innocent type t;
(2) Acquits the suspect with probability Σ_{u ∈ P} qu/qt if the evidence is t-indecisive (t < z);
(3) Convicts the suspect if she makes any statement t < z and the report contradicts this statement or it observes (, {c, t}) for some implausible innocent type t;
• Every innocent type t < z is acquitted with probability 1 − qt + Σ_{u ∈ P} qu and every other innocent type is acquitted for sure;
• The criminal is acquitted with probability Σ_{u ∈ P} qu.
(b) Every generic American game in which max{U, Σ_{u ∈ P} qu} < qy possesses a unique outcome. In every equilibrium:
• The criminal makes exculpatory statement t < y with probability πt and otherwise makes statement y;
• The jury
(1) Acquits the suspect for sure if she is exonerated or if the evidence is t-indecisive (t ≥ y);
(2) Acquits the suspect with probability qy/qt if the evidence is t-indecisive (t < y);
(3) Convicts the suspect if she makes any statement t < y and the report contradicts this statement;
• Every innocent type t < y is acquitted with probability 1 − qt + qy and every other innocent type is acquitted for sure;
• The criminal is acquitted with probability qy.
(c) Every generic American game in which max{qy, Σ_{u ∈ P} qu} < U possesses a unique outcome. In every equilibrium:
• The criminal makes exculpatory statement t < x with probability πt and otherwise confesses;
• The jury
(1) Acquits the suspect for sure if she is exonerated or if the evidence is t-indecisive (t ≥ x);
(2) Acquits the suspect with probability U/qt if the evidence is t-indecisive (t < x);
(3) Convicts the suspect if she makes any statement t < x and the report contradicts this statement;
• Every innocent type t < x is acquitted with probability 1 − qt + U and every other innocent type is acquitted for sure;
• The criminal is acquitted with probability U Σ_{t < x} πt.
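The three parts of Lemma A.2 share one structure: a cutoff type (z, y or x) and a criminal pay-off (Σ_{u ∈ P} qu, qy or U), with every innocent type below the cutoff acquitted with probability 1 − qt plus that pay-off. The sketch below tabulates this for a numerical case. It is an illustration under stated assumptions, not the author's procedure: q is a dictionary of qt for t in I = {1, ..., T − 1} with qt decreasing in t, y ≤ T − 1 is supplied by the caller, the function name is hypothetical, and the fallback to T when no type satisfies qt < Σ_{u ∈ P} qu is an assumption of the sketch.

```python
def american_acquittal(U, q, y, plausible):
    """Return (part, innocent) for a generic American game, following Lemma A.2.

    part     : "a", "b" or "c", the part of Lemma A.2 that applies
    innocent : innocent[t] is the probability that innocent type t is acquitted
    """
    T = len(q) + 1
    S = sum(q[u] for u in plausible)          # pay-off from silence
    if max(U, q[y]) <= S:
        # Part (a): cutoff z, criminal earns S.
        if q[1] <= S:
            cutoff = 1
        else:
            cutoff = min((t for t in q if q[t] < S), default=T)
        part, payoff = "a", S
    elif max(U, S) < q[y]:
        # Part (b): same play as Lemma A.1(a); cutoff y, criminal earns q_y.
        part, cutoff, payoff = "b", y, q[y]
    else:
        # Part (c): same play as Lemma A.1(b); cutoff x = min{t : U > q_t}.
        part, payoff = "c", U
        cutoff = min((t for t in q if U > q[t]), default=T)
    innocent = {t: (1 - q[t] + payoff if t < cutoff else 1.0) for t in q}
    return part, innocent


q = {1: 0.7, 2: 0.4, 3: 0.2}                  # illustrative values only
part, innocent = american_acquittal(U=0.3, q=q, y=2, plausible={2, 3})
print(part, innocent[1], innocent[2])
```

In this illustration the criminal's silence pay-off exceeds qy, so type 1 is acquitted with a higher probability than the 1 − q1 + qy she would obtain if the criminal instead earned qy, which is the mechanism behind the reduction in wrongful convictions.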
Proof. We first prove that there are strategy combinations which satisfy the conditions above and which form an equilibrium, and then prove uniqueness. We again construct the requisite strategy combinations by supposing that the jury convicts for sure unless it observes evidence which occurs with positive probability when the suspect is innocent. Once again, it is easy to confirm that the jury’s strategy is then a best response. The argument used in the proof of Lemma A.1 again implies that an innocent suspect cannot profitably deviate to confessing or to making a false statement. The jury acquits a plausible innocent type, irrespective of the report; so she cannot profitably deviate to silence. The jury acquits an exonerated suspect, and would convict a silent innocent suspect who was not exonerated; so she can also not profitably deviate to silence. In sum, we can prove that the behaviour specified in the Lemma forms part of an equilibrium by proving that the criminal cannot profitably deviate:
(a) The criminal is indifferent between all of the statements which she makes, and can therefore not profitably deviate by making any such statement with a different probability. Furthermore, y ≥ z implies that her choices form a mixed strategy. She earns Σ_{u ∈ P} qu in the putative equilibrium, and can therefore not profitably deviate to confessing or to making any statement t ≥ z (by definition of z). If qy < Σ_{u ∈ P} qu then y ≥ z; so the criminal cannot profitably deviate to making statement y. If qy = Σ_{u ∈ P} qu then, generically, y is the only plausible statement, and the jury acquits both a silent criminal and one who makes statement y if and only if the report is y-indecisive, whatever the probability that the criminal exercises her right.
(b) The criminal earns qy > Σ_{u ∈ P} qu in the putative equilibrium, and can therefore not profitably deviate to silence. Lemma A.1(a) implies that the strategy combination is an equilibrium in the English game; so it is also an equilibrium in the American game.
(c) The criminal earns U > Σ_{u ∈ P} qu in the putative equilibrium, and can therefore not profitably deviate to silence. Lemma A.1(b) implies that the
strategy combination is an equilibrium in the English game; so it is also an equilibrium in the American game. We complete the proof by demonstrating uniqueness for each set of cases. Notice that silence and confession are both outside options in American games; so the criminal cannot mix between silence and confession in any equilibrium of a generic American game. (a) The criminal cannot confess in generic games, as she would then earn U, and could profitably deviate to silence. The criminal must be silent with positive probability: for if not, then Lemma A.1(a) implies that she would earn qy, and could profitably deviate to silence because qy < 冱u ∈ P qu. An argument analogous to that used to prove Lemma A.1 implies that • •
The criminal must make every statement t < z with probability πt, and cannot make any statement t ≥ z; and that The jury must acquit with probability 冱u ∈ P qu/qt if the evidence is t-indecisive (t < z), and must acquit for sure if the evidence is t-indecisive (t > z).
The remaining clauses in this part follow by substitution.
(b) If the criminal exercised her right then she would earn Σu ∈ P qu, and would have to make every statement t < z with probability πt. However, y ≤ z, so such a choice would not constitute a mixed strategy. Consequently, the criminal does not exercise her right; and uniqueness then follows from the proof of Lemma A.1(a). The remaining clauses in this part follow by substitution.
(c) The criminal cannot exercise her right, else she could profitably deviate to confessing. Uniqueness then follows from the proof of Lemma A.1(b). The remaining clauses in this part follow by substitution. ||
Proposition A.2 follows directly from Lemma A.2.
||
Proposition A.3. For every generic case: (a) The criminal is at least as likely to be silent in American as in English games, and in some cases is more likely; (b) The suspect is at least as likely to be convicted (either after confessing or at trial) in English as in American games, and in some cases is more likely; (c) The criminal is at least as likely to confess in English as in American games, and in some cases is more likely. Proof. (a) Lemma A.1 implies that the suspect is never silent in an English game; so the result follows from Lemma A.2.
(b) Lemmas A.1, A.2(b) and A.2(c) imply that the probability of the suspect confessing or being convicted at trial is equal in the two games for every generic case in which max{U, qy} > Σu ∈ P qu. Lemma A.1 implies that the probability of the suspect confessing or being convicted at trial in a generic English game (CE) equals
冱
t qy; while Lemma A.2(a) implies that the probability of the suspect confessing or being convicted at trial in a generic American game (CA ) equals
冱
t qy then CE − CA ≥
冤冱
冥冢冱
t 0.
In sum, the suspect is more likely to confess or be convicted at trial in an English than in an American game for every generic case satisfying max {U, qy} ≤ Σu ∈ P qu, and is otherwise equally likely to confess or be convicted at trial in the two games.
(c) Propositions A.1(b) and A.2(b) imply that
• The criminal confesses for sure in an English game but not in an American game if and only if q1 < U < Σu ∈ P qu.
• The criminal confesses with probability in (0,1) in an English game but not in an American game if and only if qy < U < min{q1, Σu ∈ P qu}.
Lemmas A.1 and A.2 imply that the criminal confesses with equal probability in the two games for those cases in which she confesses, proving part (c). ||
Proposition A.3′. If confession secures a low premium then, for every generic case: (a) The criminal is at least as likely to be silent in American as in English games, and in some cases is more likely; (b) The suspect is at least as likely to be convicted (either after confessing or at trial) in English as in American games, and in some cases is more likely; (c) The criminal is equally likely to confess in American and in English games.
Proof. Parts (a) and (b) follow immediately from the proof of Propositions A.2(a) and A.2(b). (c) As confession secures a low premium, Propositions A.1(a) and A.2(a) imply that the criminal does not confess for sure in either game. Furthermore, Lemmas A.1 and A.2 then imply that the criminal only confesses in either game if no statement is plausible. In such cases, the criminal confesses with equal probability in the two games because being silent is never a best response in either game. ||
Proposition A.4. (a) For every generic case: the suspect is at least as likely to be wrongfully convicted in English as in American games, and in some cases more likely; (b) For every generic case in which confession secures a low premium: the jury earns at least as much in English as in American games, and in some cases is strictly better off.
Proof. (a) Lemmas A.1 and A.2 imply that the two games possess the same equilibria unless max {U, qy} ≤ Σu ∈ P qu. In such cases, the suspect is wrongfully convicted with probability Σt < z Pt (qt − Σu ∈ P qu) in an American game.
If U < qy then z ≤ y and Lemma A.1(a) imply that the suspect is wrongfully convicted with a probability of at least Σt < z Pt (qt − qy) in the American game; so the suspect is at least as likely to be wrongfully convicted in the English game. If U > qy then z ≤ x and Lemma A.1(b) imply that the suspect is wrongfully convicted with a probability of at least Σt < z Pt[qt − U] in the American game; so the suspect is at least as likely to be wrongfully convicted in the English game.
(b) Write JA (resp. JE) as the jury’s pay-off in the American (resp. English) game. If confession secures a low premium then the two games only possess different outcomes when U < qy ≤ Σu ∈ P qu, which implies z ≤ y. Substituting from the jury’s pay-off function (cf. Section 2), Lemmas A.1(a) and A.2(a) then imply that
冤冱
JA − JE = D (1 − D) pc
冢冱
t) µ(s) and is indifferent if β = µ. Judges can also play mixed strategies, and we let (s) represent the probability that the judge will play acquittal conditional on the signal generated by the bench trial. Since juries act naively, i.e., make their decisions only on the basis of the evidence presented at the trial, if the generated evidence is favorable, the jury always acquits, and if the evidence is unfavorable, the jury always convicts. Therefore, the jury acquits defendants with probability Πt and convicts them with probability 1 − Πt. The above assumptions then imply that the expected payoff to a defendant of type t who chooses a bench trial is given by Pt(F) + (1 − Pt )(U). If, on the other hand, the defendant selects a jury trial, then his expected payoff is Πt. Hence, the expected payoff to a defendant of type t from playing mixed strategy σ is given by U(t, σ, ) = [Pt(F) + (1 − Pt )(U)] σ + (1 − σ)Πt.
(4)
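As an illustration of equation (4), the following Python sketch evaluates a defendant's expected payoff, read as a probability of acquittal. The function and argument names are ours: `acquit_f` and `acquit_u` stand for the judge's acquittal probabilities after favorable and unfavorable evidence, and the numerical values are hypothetical, chosen only so that the jury is the noisier forum for a guilty defendant.

```python
def defendant_payoff(p_t, pi_t, acquit_f, acquit_u, sigma):
    """Expected payoff in equation (4) for a defendant of type t.

    p_t      -- probability that a bench trial generates favorable evidence
    pi_t     -- probability that a (naive) jury acquits, i.e. Pi_t
    acquit_f -- judge's acquittal probability after favorable evidence
    acquit_u -- judge's acquittal probability after unfavorable evidence
    sigma    -- probability with which the defendant chooses a bench trial
    """
    bench = p_t * acquit_f + (1.0 - p_t) * acquit_u
    jury = pi_t
    return sigma * bench + (1.0 - sigma) * jury


# Hypothetical guilty defendant facing a judge who simply follows the evidence.
p_g, pi_g = 0.33, 0.45
payoff_bench = defendant_payoff(p_g, pi_g, acquit_f=1.0, acquit_u=0.0, sigma=1.0)
payoff_jury = defendant_payoff(p_g, pi_g, acquit_f=1.0, acquit_u=0.0, sigma=0.0)
print(payoff_bench, payoff_jury)
```

With these illustrative numbers the guilty defendant does better before the jury, which is the selection effect the model builds on.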
In order to analyze this game, we employ the concept of a Nash sequential equilibrium (NSE) originally proposed by Kreps and Wilson (1982).
Gerald D. Gay, Martin F. Grace, Jayant R. Kale and Thomas H. Noe
Definition 1. An NSE in this game is an ordered triple (σ(t), (s), µ(s)) with the following properties: (i) σ(t) ∈ argmax { U(t, σ, ) s.t. σ ∈ [0, 1]}, (ii) (s) ∈ argmax {V(A, µ(s)) + (1 − ) V (C, µ(s)) s.t. ∈ [0, 1]}, and (iii) if selecting a bench trial is an equilibrium action for either type of defendant, then µ(s) is determined by Bayes’ rule. If, on the other hand, σ(G) = σ(I) = 0, then a bench trial is an off-the-equilibrium action, and we require that (σ, µ) = limn→∞ (σn, µn), where σn is a sequence of purely mixed strategies and µn is determined by Bayes’ rule.
Lemma 1. The set of sequentially rational posterior beliefs, denoted by ⺠, is given by
⺠ = {(µ(U), µ(F)) s.t. µ(F) ∈ [0, 1] and µ(F) = (1 + c)µ(U)/(1 + cµ(U))}, c ≡ (PI − PG)/(PG(1 − PI)).
Proof. See the Appendix.
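The relation in Lemma 1 can be checked numerically. The sketch below is ours, not part of the original proof: it assumes that both posteriors are obtained by Bayes' rule from a common belief `prior` that a defendant appearing before the judge is innocent, with PI and PG the probabilities of favorable evidence for innocent and guilty defendants. The values .66 and .33 are illustrative.

```python
def posterior(prior, like_innocent, like_guilty):
    """Bayes' rule: probability of innocence given a signal."""
    num = prior * like_innocent
    return num / (num + (1.0 - prior) * like_guilty)


def lemma1_mu_f(mu_u, p_i, p_g):
    """mu(F) implied by mu(U) according to Lemma 1."""
    c = (p_i - p_g) / (p_g * (1.0 - p_i))
    return (1.0 + c) * mu_u / (1.0 + c * mu_u)


p_i, p_g = 0.66, 0.33  # illustrative signal probabilities
gaps = []
for prior in (0.1, 0.5, 0.9):
    mu_f = posterior(prior, p_i, p_g)          # favorable evidence
    mu_u = posterior(prior, 1 - p_i, 1 - p_g)  # unfavorable evidence
    gaps.append(abs(mu_f - lemma1_mu_f(mu_u, p_i, p_g)))
    print(prior, round(mu_u, 4), round(mu_f, 4))
```

For every interior belief the two computations of µ(F) agree, and µ(F) exceeds µ(U), as the lemma requires.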
Note that Lemma 1 implies that if µ(F) and µ(U) are sequentially rational, then µ(F) ≥ µ(U), and µ(F) > µ(U) unless µ(F) = 0 or µ(F) = 1. In other words, the judge is more likely to believe that a defendant is innocent given favorable evidence rather than unfavorable evidence. There is a multiplicity of equilibria in the above game. For example, suppose that ρ is close enough to one to ensure that the judge’s posterior probability that a defendant is innocent is high enough to induce acquittal, regardless of the evidence generated at the trial. In this case, an equilibrium will exist in which all defendants choose bench trials and therefore, all defendants are acquitted. In order to rule out such “trivial” equilibria, we assume that, in the absence of any self-selection through trial mode, the judge’s priors are such that he will convict on unfavorable evidence and acquit on favorable evidence. This assumption is formally expressed by Assumption 3. Assumption 3. Let c(s, B) represent the judge’s posterior that a defendant is innocent, given signal s, in the absence of any self-selection through trial mode. Then, c(U, B) ≡
ρ(1 − PI)/[ρ(1 − PI) + (1 − ρ)(1 − PG)] and c(F, B) ≡ ρPI/[ρPI + (1 − ρ)PG], where c(U, B) < β and c(F, B) > β.
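A short sketch of how Assumption 3 restricts the prior may help. The helper names are ours, and the parameter values (β = .5, PI = .66, PG = .33, and the two priors) are purely illustrative.

```python
def c_unfavorable(rho, p_i, p_g):
    """c(U, B): posterior of innocence after unfavorable evidence."""
    return rho * (1 - p_i) / (rho * (1 - p_i) + (1 - rho) * (1 - p_g))


def c_favorable(rho, p_i, p_g):
    """c(F, B): posterior of innocence after favorable evidence."""
    return rho * p_i / (rho * p_i + (1 - rho) * p_g)


def assumption3_holds(rho, beta, p_i, p_g):
    """True when the judge convicts on U and acquits on F absent self-selection."""
    return c_unfavorable(rho, p_i, p_g) < beta < c_favorable(rho, p_i, p_g)


print(assumption3_holds(0.4, 0.5, 0.66, 0.33))  # moderate prior
print(assumption3_holds(0.9, 0.5, 0.66, 0.33))  # prior close to one
```

With a prior close to one the check fails because even unfavorable evidence leaves the posterior above β, which is the "trivial" case the assumption is meant to exclude.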
Noisy juries and the choice of trial mode
Assumptions 1, 2, and 3 greatly restrict the set of possible NSE outcomes. In fact, they ensure that in all sequential equilibria in which choosing a bench trial is an on-the-equilibrium-path action, G randomizes between a bench and a jury trial and I chooses a bench trial with probability one. To see this, first note that Assumption 2 implies that juries are noisier than judges. Therefore, any sequentially rational strategy of the judge ((s)) which would lead a guilty defendant to choose a bench trial with positive probability, would also lead an innocent defendant to choose a bench trial. Thus, either I chooses a bench trial and G does not, or both I and G choose bench trials. But there cannot exist any equilibrium outcome in which the judge is chosen by I only, for if there were, the judge would know, by Bayes’ rule, that all defendants were innocent, and his best response would be to acquit all defendants. But, given this response, G could gain by switching to a bench trial. Hence, a bench trial must be chosen by both I and G. It cannot be the case, however, that both I and G choose the bench trial with probability one. If they did, the judge’s posterior probability that the defendant is innocent conditional on favorable evidence and unfavorable evidence would be given by c(F, B) and c(U, B), respectively. Assumption 3 implies that given these beliefs, the judge’s best response would be to acquit given favorable evidence and convict given unfavorable evidence. But, given this response, G could gain by deviating to a jury trial. Hence, it must be the case that either I or G randomizes. However, I will strictly prefer a bench trial in all cases in which G is indifferent between the two trial modes; therefore, it must be the case that G randomizes between a bench and a jury trial, and I chooses a bench trial with probability one. Thus, the only NSE outcome in which a bench trial is chosen is one in which I selects a bench trial and G randomizes. 
In addition, the outcome in which both I and G choose a jury trial is also an NSE outcome for the above game. Because the judicial strategies required to induce innocent defendants to choose a jury trial are rather harsh, we call these equilibria “hanging judge equilibria.” Although the hanging judge outcome can be supported by many Nash sequential equilibria, none of them satisfies the divinity criteria of Banks and Sobel (1987). The reason for this is straightforward. For the judge’s beliefs to support a hanging judge equilibrium outcome, defecting from the equilibrium by choosing a bench trial must signal a defendant’s guilt to the judge. However, because judges are better able than jurors to determine the guilt or innocence of defendants, innocent defendants will defect to a bench trial over a much larger range of possible responses by the judge. Thus, the off-equilibrium beliefs which support the hanging judge equilibrium fail the divinity refinement.12 The following proposition verifies that the remaining outcome in which the guilty defendants randomize between a bench and a jury trial, and the innocent defendants select a bench trial, is an NSE outcome. It is obviously divine because there are no off-equilibrium actions under the equilibrium strategies.
Proposition 2. Suppose that µ(F) and µ(U) are given by Bayes’ rule, i.e., that

(F) = 1, (U) = (ΠG − PG)/(1 − PG), σ(I) = 1, and σ(G) = ((1 − β)/β)(ρ/(1 − ρ))((1 − PI)/(1 − PG)). (5)

Then (σ, , µ) is the unique divine equilibrium when judges are strategic and juries are naive. Furthermore, in this equilibrium, the probability that a judge will convict is lower than the probability that a jury will convict. Also, the unconditional probability that a defendant chooses a bench (jury) trial, which we denote by P(B) (P(J)), is given by

P(B) = ρ[1 + ((1 − β)/β)((1 − PI)/(1 − PG))], P(J) = 1 − P(B). (5a)

Proof. See the Appendix.
We call the above outcome the “lenient judge equilibrium” because the bench conviction rates will always be lower than jury conviction rates. There are two results from this analysis that are noteworthy. First, equation (5) implies that the distribution of guilty defendants between the bench and the jury in the lenient judge equilibrium will depend on: a lenience factor, given by (1 − β)/β; the judge’s prior belief in the defendant’s innocence, given by ρ; and a judicial error factor measured by (1 − PI )/ (1 − PG ). The larger any of these factors is, the higher the probability that a guilty defendant will select a bench trial. Thus, when the judge is lenient (the prior probability that a defendant is innocent is high) and the judge believes that his own evaluation of the evidence is subject to a great deal of error, a large fraction of guilty defendants will select a bench trial. The reason for this is straightforward. When these factors are large, the judge is prone to acquit defendants who select bench trials. Thus, guilty defendants must select a bench trial with high probability in order to prevent the judge from becoming so lenient that he always acquits regardless of the evidence. The second interesting characteristic of the analysis is that the judge’s strategies, given by , are independent of the judge’s preferences—in all cases the judge will favor the defendant to a larger degree than is warranted by the evidence. The reason for this is that judges cannot push the probability of the guilty being convicted above ΠG, since guilty defendants can always opt out of a bench trial. Thus, even the toughest judge will realize that he can only improve the performance of the court system by reducing the conviction rate of the innocent. As long as the judge places some weight on this objective, he will follow a “lenient” judicial strategy of acquitting in some cases when the evidence points to the defendant’s guilt.
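The quantities in equations (5) and (5a) are easy to compute, and doing so makes the two indifference conditions behind the lenient judge equilibrium visible. The sketch below is ours. It uses β = .5, PI = .66, PG = .33 and ρ = .25 as illustrative values; ΠG = .45 is a hypothetical jury acquittal probability for the guilty, and the function names are not from the chapter.

```python
def sigma_guilty(rho, beta, p_i, p_g):
    """Equation (5): probability that a guilty defendant picks a bench trial."""
    return ((1 - beta) / beta) * (rho / (1 - rho)) * ((1 - p_i) / (1 - p_g))


def bench_share(rho, beta, p_i, p_g):
    """Equation (5a): unconditional probability of a bench trial, P(B)."""
    return rho * (1 + ((1 - beta) / beta) * ((1 - p_i) / (1 - p_g)))


def acquit_on_unfavorable(pi_g, p_g):
    """Equation (5): judge's acquittal probability after unfavorable evidence."""
    return (pi_g - p_g) / (1 - p_g)


rho, beta, p_i, p_g = 0.25, 0.5, 0.66, 0.33
pi_g = 0.45  # hypothetical value, chosen with pi_g > p_g

s_g = sigma_guilty(rho, beta, p_i, p_g)
p_b = bench_share(rho, beta, p_i, p_g)

# Judge's Bayesian posterior after unfavorable evidence, given sigma(I) = 1.
mu_u = rho * (1 - p_i) / (rho * (1 - p_i) + (1 - rho) * (1 - p_g) * s_g)

# Guilty defendant's acquittal probability at a bench trial.
a_u = acquit_on_unfavorable(pi_g, p_g)
guilty_bench_payoff = p_g * 1.0 + (1 - p_g) * a_u

print(s_g, p_b, mu_u, guilty_bench_payoff)
```

Under these assumptions the posterior after unfavorable evidence equals β, so the judge is willing to randomize, and the guilty defendant's bench payoff equals ΠG, so the guilty defendant is willing to randomize as well.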
Judicial equilibria when juries act strategically
When judges are naive and juries are strategic, there is only one possible equilibrium outcome, namely a “hanging jury equilibrium,” in which all defendants select bench trials. The intuition behind this result follows from the fact that here the strategic player is also the noisy player. The jury, being strategic, recognizes that it is noisy and, therefore, knows it will attract guilty defendants disproportionately. Hence, the jury will never find it in its interests to acquit with a positive probability when the evidence is unfavorable.13 This makes any equilibrium in which innocent defendants select jury trials impossible. If only guilty defendants choose jury trials, however, conviction of all defendants is always the optimal strategy for the jury.14
Judicial equilibria when both judges and juries act strategically
We now consider the more interesting case in which both fact finders act strategically. As with the judge, we assume that the jury’s objective is to maximize the probability of acquitting the innocent and of convicting the guilty. Defining the jury’s payoff function (w) analogously to the judge’s, we obtain
w(t, r) = δ      if defendant is guilty and jury convicts,
          1 − δ  if defendant is innocent and jury acquits,      (6)
          0      otherwise,
where δ measures the jury’s toughness. We assume that both the judge and the jury have the same prior probability (ρ). Let ν(s) denote the jury’s posterior that the defendant is innocent given signal s. Let W(r, ν(s)) represent the jury’s expected utility if it plays r after observing s. Then, W(r, ν(s)) is given by

W(r, ν(s)) = w(I, r)ν(s) + w(G, r)(1 − ν(s)). (7)

Given equation (7), we can express W(r, ν(s)) as follows:

W(r, ν(s)) = δ(1 − ν(s))  if r = C,
             (1 − δ)ν(s)  if r = A.      (8)
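Equation (8) implies a simple cutoff rule: comparing (1 − δ)ν with δ(1 − ν) shows that the jury prefers acquittal exactly when its posterior ν exceeds δ. The following sketch is ours (names and numerical values are illustrative) and makes that comparison explicit.

```python
def jury_expected_utility(response, nu, delta):
    """Equation (8): expected utility of convicting ("C") or acquitting ("A")."""
    if response == "C":
        return delta * (1.0 - nu)
    if response == "A":
        return (1.0 - delta) * nu
    raise ValueError("response must be 'C' or 'A'")


def jury_best_response(nu, delta):
    """Pure best response, or 'indifferent' when both actions pay the same."""
    convict = jury_expected_utility("C", nu, delta)
    acquit = jury_expected_utility("A", nu, delta)
    if acquit > convict:
        return "A"
    if convict > acquit:
        return "C"
    return "indifferent"


print(jury_best_response(0.7, 0.5))
print(jury_best_response(0.3, 0.5))
```

A tougher jury (larger δ) therefore needs a higher posterior of innocence before it acquits.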
Juries can also play mixed strategies, and we let χ(s) represent the probability that the jury plays acquittal conditional on the signal generated by the jury trial. Note that the payoff to a defendant of type t from selecting a judge trial is Pt when the judge acts naively, while the expected payoff from selecting a jury trial when the jury acts strategically is Πtχ(F) + (1 − Πt )χ(U). Hence, the expected payoff to the defendant, expressed as a function of the defendant’s type and strategy, and the jury’s strategy, is given by
U(t, σ, χ) = (1 − σ)[Πtχ(F) + (1 − Πt)χ(U)] + σPt. (9)
We also impose a restriction on the jury’s priors analogous to that imposed on the judge’s priors in Assumption 3. This restriction is provided by Assumption 4. Assumption 4. Let c(s, J) represent the jury’s posterior that a defendant is innocent, given signal s, in the absence of any self-selection through trial mode. Then, c(U, J) < δ and c(F, J) > δ. The expected payoffs to a defendant choosing a bench or a jury trial will be given by Pt(F) + (1 − Pt )(U) and Πtχ(F) + (1 − Πt )χ(U), respectively. Therefore, the expected payoff to the defendant of type t who plays σ, given that the jury is playing χ and that the bench is playing , is given by U(t, σ, χ, ) = σ[Pt(F) + (1 − Pt ) (U)] + (1 − σ)[Πtχ(F) + (1 − Πt )χ(U)]. We define an NSE when both the judge and the jury act strategically as follows. An NSE is a quintuple (σ(t), (s), χ(s), µ(s), ν(s)) that satisfies the following conditions: (i″) σ(t) ∈ argmax {U(t, σ, χ, ) s.t. σ ∈ [0, 1]}, (ii″) satisfies Condition (ii), and χ(s) ∈ argmax {χW (A, ν(s)) + (1 − χ)W(C, ν(s)) s.t. χ ∈ [0, 1]}, and (iii″) µ satisfies Condition (iii), and ν is determined by Bayes’ rule if 1 − σ(G) or 1 − σ(I) is greater than zero. If, on the other hand, 1 − σ(G) = 1 − σ(I) = 0, then a jury trial is an off-the-equilibrium action, and we require that (σ, ν) = limn→∞ (σn, νn ), where σn is a sequence of purely mixed strategies and νn is determined by Bayes’ rule. Using these definitions and assumptions, we establish that in any equilibrium either a jury trial or a judge trial is an off-the-equilibrium-path action. The reason for this is that there does not exist any randomization strategy that is consistent with the fact finder’s objective, in which both types randomize. This implies that there cannot be a mix of guilty and innocent defendants before both the bench and the jury. Hence, the optimal strategy for one of these forums will be either to convict or to acquit all defendants who face trial before it. 
Therefore, for the defendant’s choices to be rational, it must be the case that all defendants choose the same forum. This leaves two equilibrium outcomes—a hanging judge outcome and a hanging jury outcome. In the hanging judge (jury) outcome, the judge (jury) follows a conviction policy harsh enough to ensure that no defendants choose a judge (jury) trial. This harsh conviction policy is supported by the off-equilibrium belief that defendants who select a judge (jury) trial are more likely to be guilty. However, only the off-equilibrium beliefs that support the hanging jury outcome satisfy the divinity refinement. This is stated in Proposition 3.
Proposition 3. If both judges and juries are strategic, and Assumptions 1, 2, 3 and 4 hold, there will not exist any equilibria in which both a bench and a jury trial are equilibrium actions. Thus, there will be only two types of NSEs, hanging jury and hanging judge, of which only the hanging jury equilibria are divine. The hanging judge equilibria fail the divinity refinement for the same reasons they failed in the earlier case of a naive jury and strategic judge.
3. The relevant evidence
In the previous section we derived equilibria under four sets of assumptions about the actions of the three players (judge, jury, and defendant) in a game with asymmetric information. The defendant was assumed always to act strategically, whereas the judge and the jury could act either naively or strategically. In each of the cases in which either the judge or the jury acted strategically, we demonstrated the existence of a unique divine equilibrium outcome. In this section, we restrict our attention to the empirical implications of the divine equilibrium outcomes.
Testable implications
Recall that the actions of a defendant in the court system depend on the unobservable conditional probabilities that, in turn, depend on the defendant’s guilt or innocence. Therefore, in testing the model, we are able to test only the effects that these conditional probabilities have on the unconditional probabilities of acquittal or conviction, and on the unconditional probabilities that the defendant will select either a bench or jury trial. Table 13.1 summarizes the testable implications of each set of assumptions for these unconditional probabilities. As shown in the table, when both the judge and jury are naive players, we obtain the formalistic equilibrium in which the innocent will choose a bench trial and the guilty will choose a jury trial. This fact yields two testable predictions regarding the behavior of the court system. First, it implies that the unconditional probability that a defendant will choose a bench trial (P(B)) equals the proportion of defendants who are innocent (ρ) and that the proportion choosing a jury trial (P(J)) equals the proportion of defendants who are guilty (1 − ρ). 
Thus, if we make the rather reasonable assumption that, under the prevailing law enforcement system, significantly more guilty than innocent defendants are brought to trial, we should also observe that jury trials constitute a significantly larger proportion of all trials, i.e., that P(B) < P(J). Second, because innocent defendants are more likely to be acquitted than guilty defendants, the conviction rate of bench trials (C(B)) should be lower than the conviction rate of jury trials (C(J)), i.e., C(B) < C(J).
Table 13.1 Characterizations and empirical restrictions of equilibria in the choice between trial by bench or jury^a

Jury behavior   Judge behavior: Naive            Judge behavior: Strategic

Naive           FORMALISTIC EQUILIBRIUM          LENIENT JUDGE EQUILIBRIUM
                Defendants’ Choices:             Defendants’ Choices:
                  Innocent to Bench                Innocent to Bench
                  Guilty to Jury                   Guilty Randomize
                Empirical Restrictions:          Empirical Restrictions:
                  P(B) < P(J)                      P(B) < P(J)^b
                  C(B) < C(J)                      C(B) < C(J)

Strategic       HANGING JURY EQUILIBRIUM         HANGING JURY EQUILIBRIUM
                Defendants’ Choices:             Defendants’ Choices:
                  Defendants to Bench              Defendants to Bench
                Empirical Restrictions:          Empirical Restrictions:
                  P(B) = 1, P(J) = 0               P(B) = 1, P(J) = 0

Notes: a In the table, P(B) is the unconditional probability that a defendant will choose a bench trial; P(J) is the unconditional probability that a defendant will choose a jury trial; C(B) is the unconditional probability of conviction by the bench; and C(J) is the unconditional probability of conviction by a jury. b For sufficiently small ρ, where ρ is the prior that the defendant is innocent.
If, instead, only the judge acts strategically, we obtain the lenient judge equilibrium in which the judge acquits the defendants in cases with favorable evidence, and convicts only a fraction of the defendants when the evidence is unfavorable. In this equilibrium P(B) > ρ. However, if ρ is sufficiently small, it will still be the case that P(B) < P(J). For example, using equation (5a) we see that when β = .5, PI = .66, and PG = .33, then P(B) < .5 as long as ρ < .25. Hence, assuming that significantly more defendants are guilty, the lenient judge equilibrium yields the same qualitative prediction as the naive judge equilibrium regarding a defendant’s choice of trial mode, i.e., that P(B) < P(J). Similarly, Proposition 2 shows that the probability that the judge will convict is lower than that of the jury in the lenient judge equilibrium. Hence, the restrictions on the values of the observable empirical variables implied by this equilibrium are qualitatively the same as those implied by both the judge and jury acting naively. Therefore, these two equilibria cannot be distinguished at the empirical level. This, of course, implies that our tests cannot shed light on whether judges act strategically. The lower half of Table 13.1 considers those equilibria in which the jury acts strategically and the judge is either naive or strategic. In both cases, we obtain the hanging jury equilibrium in which all defendants, guilty or
innocent, choose bench trials. These equilibria have the extreme empirical implication that the proportion of jury trials is zero. This hypothesis is easily rejected empirically.
Data and test results
In order to test the above hypotheses, we obtained data on bench and jury criminal trials for the states of Florida and Texas. These two states were chosen because, apparently, these are the only states which compile data in a form necessary for this study.15 Additionally, it so happens that these states have sentencing laws which fit our theoretical construct very well. In our theory we did not account for different sentencing habits of judges. Both Florida and Texas, however, have certain guidelines which take away judicial discretion in sentencing, and thus, we should expect uniform penalties for crimes regardless of the personality of the trial judge. Therefore, the decision of the defendant should be entirely based on the probability of acquittal. Recall that our theory applies only to criminal trials because only the criminal defendant generally has the option to unilaterally waive the jury trial.16 Of all the persons who are charged with crimes, some never go to trial because they plead guilty or because the cases are dismissed for procedural reasons, such as insufficient evidence or failure to prosecute speedily. We considered only those cases that eventually went to trial in which a plea of “not guilty” was entered.17 For the state of Texas we obtained issues of the Texas Judicial System Annual Report from the Office of Court Administration for the years 1981 to 1986. These reports contain the numbers of bench and jury trials, as well as the outcome of each trial for various crime categories. Similarly, we obtained data from the Report of the Florida State Court System prepared by the State Courts Administrator for the years 1982 to 1985.18 We first examine the conjecture that the guilty prefer jury trials and the innocent prefer bench trials. 
Note that the model predicts that guilty defendants would have a propensity for jury trials in naive jury equilibria. If, in fact, most defendants are guilty, then jury trials should account for more than half of all trials. We test the null hypothesis that the proportion of defendants that selects a bench trial is equal to the proportion that selects a jury trial, i.e., P(B) = P(J) = P0 = .5, against the alternative hypothesis that P(B) < P(J). The results from this test are reported in Tables 13.2 and 13.3 for Florida and for Texas, respectively.19 The results strongly support the implications of the two naive jury equilibria described in Table 13.1. With the exception of drug-related crimes in Texas for 1984, all crime categories show a significantly lower proportion of defendants choosing bench trials.20 We next test the null hypothesis that the bench conviction rate is equal to the jury conviction rate, that is, C(B) = C(J). To test this null hypothesis, we conduct a Chi-squared test for homogeneity for an unknown probability;
Table 13.2 Proportion of defendants selecting trial by bench, P̂(B), in the Florida Circuit Courts during 1982–1985^a (standard errors in parentheses)

                                  1982                   1983                   1984                   1985
Crime description            N    P̂(B)             N    P̂(B)             N    P̂(B)             N    P̂(B)
Capital crimes             213    .080* (.034)    187    .053* (.037)    160    .088* (.040)    192    .083* (.036)
Crimes against persons    1649    .136* (.012)   1838    .141* (.012)   1408    .124* (.013)   1513    .132* (.013)
Crimes against property   1132    .172* (.015)   1172    .211* (.015)    794    .152* (.018)    757    .173* (.018)
Drugs                      465    .118* (.081)    461    .154* (.023)    496    .127* (.022)    510    .124* (.022)
Other felonies             373    .299* (.026)    500    .206* (.022)    273    .260* (.030)    263    .297* (.031)

Notes: a N is the total number of defendants. * Significantly less than .5 at the .01 level.
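The test of P(B) = .5 against P(B) < .5 behind Tables 13.2 and 13.3 can be sketched as a one-sided test of a single proportion. This is our illustration rather than the authors' code: it assumes a normal approximation with the standard error evaluated under the null hypothesis, and it uses the 1982 Florida capital-crimes entry (N = 213, P̂(B) = .080) as input. The one-sided .01 critical value of −2.326 is the standard normal quantile.

```python
import math


def null_standard_error(n, p0=0.5):
    """Standard error of a sample proportion when the true proportion is p0."""
    return math.sqrt(p0 * (1.0 - p0) / n)


def z_statistic(p_hat, n, p0=0.5):
    """z-statistic for H0: P(B) = p0 against H1: P(B) < p0."""
    return (p_hat - p0) / null_standard_error(n, p0)


# Florida, capital crimes, 1982: N = 213 defendants, bench share .080.
se = null_standard_error(213)
z = z_statistic(0.080, 213)
reject_at_01 = z < -2.326  # one-sided test at the .01 level
print(round(se, 3), round(z, 2), reject_at_01)
```

Under this assumption the computed standard error rounds to the value reported in parentheses for that entry, and the null hypothesis is rejected by a wide margin.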
the alternative hypothesis is C(B) < C(J).21 The actual conviction rates of judges and juries were used as estimates of the respective unconditional probabilities. The results of these tests are presented in Tables 13.4 and 13.5 for the states of Florida and Texas, respectively.22 The results strongly support the implications of the two naive jury equilibria described in Table 13.1. We observe that, in most cases, the conviction rate of judges is significantly lower than the conviction rate of juries. In those cases in which we fail to find a significant difference, the bench conviction rate is still generally less than the jury conviction rate.
Evidence on model assumptions
Our entire theoretical construct, as well as the validity of the empirical tests that follow, rests on two crucial assumptions: (1) that the jury’s decision-making process is noisier than a judge’s, implying that a jury is more likely to acquit the guilty and convict the innocent; and (2) that defendants are rational in that they attempt to minimize the probability of conviction. The evidence provided so far, which demonstrates that most defendants choose jury trials even though bench trials have lower conviction rates, provides strong support for the validity of our assumption regarding noisier juries. To see this, consider the opposite scenario in which the judge’s decision making is noisier than the jury’s. If this were the case, the conviction rates of judges would always be higher than those of juries. To see this, note that if both the judge and the jury act naively, the guilty will choose a bench trial and the innocent will prefer a jury trial, which implies that conviction rates
Table 13.3 Proportion of defendants selecting trial by bench, P̂(B), in the Texas District Courts during 1981–1986^a (standard errors in parentheses)

                                  1981                   1982                   1983
Crime description            N    P̂(B)             N    P̂(B)             N    P̂(B)
Capital crimes              59    .068* (.065)     80    .013* (.056)     71    .042* (.059)
Crimes against persons    1135    .193* (.015)   1866    .171* (.012)   2007    .192* (.011)
Crimes against property   1700    .290* (.012)   1457    .372* (.013)   1389    .422* (.013)
Drugs                      321    .408* (.028)    365    .386* (.026)    438    .441* (.024)
Other felonies             387    .328* (.025)    445    .321* (.024)    428    .322* (.024)

                                  1984                   1985                   1986
Crime description            N    P̂(B)             N    P̂(B)             N    P̂(B)
Capital crimes              64    .047* (.064)     65    .046* (.062)     76    .026* (.057)
Crimes against persons    1232    .249* (.014)   2015    .197* (.011)   2099    .194* (.011)
Crimes against property   1680    .352* (.012)   1246    .390* (.014)   1400    .405* (.013)
Drugs                      452    .500  (.024)    501    .405* (.022)    608    .377* (.020)
Other felonies             456    .338* (.023)    562    .315* (.021)    706    .346* (.019)

Notes: a N is the total number of defendants. * Significantly less than .5 at the .01 level.
Table 13.4 Conviction rates for trials by judges and juries in the Florida Circuit Courts during 1982–1985

                                   1982                        1983
                            Conviction rate             Conviction rate
Crime description           Judge   Jury    χ²          Judge   Jury    χ²
Capital crimes              .588    .801     4.18*      1.000   .808     –
Crimes against persons      .347    .636    67.70**     .373    .651    72.86**
Crimes against property     .379    .591    29.30**     .328    .626    70.38**
Drugs                       .436    .727    19.21**     .380    .707    28.58**
Other felonies              .384    .698    32.46**     .561    .730    49.81**

                                   1984                        1985
                            Conviction rate             Conviction rate
Crime description           Judge   Jury    χ²          Judge   Jury    χ²
Capital crimes              .571    .749     3.65*      .625    .727     0.76
Crimes against persons      .408    .641    35.13**     .360    .618    47.72**
Crimes against property     .347    .582    22.92**     .405    .543     8.33**
Drugs                       .412    .681    17.37**     .285    .658    32.17**
Other felonies              .493    .648     5.33*      .436    .627     8.18**

Notes: * Judge conviction rate is significantly less than jury conviction rate at the .05 level. ** Judge conviction rate is significantly less than jury conviction rate at the .01 level.
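The homogeneity test behind the χ² columns can be sketched as a Pearson chi-squared statistic on a 2 × 2 table of trial mode by verdict. The code below is our illustration and rests on two assumptions: no continuity correction is applied, and the underlying counts, which the tables do not report, can be approximated by rounding the published totals, shares and rates. The example uses the 1982 Florida capital-crimes figures (N = 213, P̂(B) = .080, judge rate .588, jury rate .801).

```python
def chi_squared_2x2(bench_conv, bench_total, jury_conv, jury_total):
    """Pearson chi-squared statistic for equal conviction rates in two forums."""
    observed = [
        [bench_conv, bench_total - bench_conv],
        [jury_conv, jury_total - jury_conv],
    ]
    row_totals = [bench_total, jury_total]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = bench_total + jury_total
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Counts reconstructed by rounding (an approximation, not reported data).
n_total = 213
bench_total = round(n_total * 0.080)        # defendants tried by the bench
jury_total = n_total - bench_total          # defendants tried by a jury
bench_conv = round(bench_total * 0.588)     # bench convictions
jury_conv = round(jury_total * 0.801)       # jury convictions

chi2 = chi_squared_2x2(bench_conv, bench_total, jury_conv, jury_total)
print(bench_total, jury_total, bench_conv, jury_conv, round(chi2, 2))
```

With these reconstructed counts the statistic comes out close to the value reported for that entry, which suggests the rounding assumption is reasonable for this cell; other cells may differ.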
Table 13.5 Conviction rates for trials by judges and juries by crime in the Texas District Courts during 1981–1986

                                   1981                        1982                        1983
                            Conviction rate             Conviction rate             Conviction rate
Crime description           Judge   Jury    χ²          Judge   Jury    χ²          Judge   Jury    χ²
Capital crimes              .500    .945     9.54**     1.000   .962     –          .667    .971     6.56**
Crimes against persons      .122    .745    13.30**     .702    .827    26.19**     .673    .842    57.97**
Crimes against property     .710    .824    27.10**     .685    .809    29.02**     .695    .753     5.94**
Drugs                       .771    .858     4.01**     .695    .920    31.36**     .699    .882    22.50**
Other felonies              .740    .754     0.09       .601    .775    14.45**     .746    .838     5.06*

                                   1984                        1985                        1986
                            Conviction rate             Conviction rate             Conviction rate
Crime description           Judge   Jury    χ²          Judge   Jury    χ²          Judge   Jury    χ²
Capital crimes              1.000   .951     0.16       .667    .952     4.02*      .500    .986    17.9**
Crimes against persons      .593    .866   216.25**     .607    .837   102.02**     .583    .815   100.15**
Crimes against property     .668    .853   159.79**     .644    .799    36.70**     .549    .795    96.48**
Drugs                       .757    .887    27.05**     .749    .893    18.08**     .655    .863    36.76**
Other felonies              .604    .877    45.25**     .695    .873    25.61**     .516    .848    90.13**

Notes: * Judge conviction rate is significantly less than jury conviction rate at the .05 level. ** Judge conviction rate is significantly less than jury conviction rate at the .01 level.
for the bench would be higher. If the jury acts strategically and the judge is naive, we theoretically obtain the “lenient jury” equilibria, which again imply higher conviction rates for the bench. If the judge acts strategically, regardless of how the jury acts, we obtain a “hanging judge” equilibrium in which none of the defendants chooses a bench trial. Since the actual bench conviction rates are lower than the actual jury conviction rates, the possibility of a noisier judge is ruled out. Our evidence does not, however, rule out the possibility that defendants make their decisions suboptimally, choosing a jury trial despite the higher observed probability of conviction. Economic models generally assume that agents are rational, which implies that the observed actions of agents are optimal. Therefore, under the maintained assumptions of these models, the consequences of supposedly suboptimal decisions cannot be observed.
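The homogeneity test behind Tables 13.4 and 13.5 can be sketched in a few lines of Python. The function below follows the statistic given in note 21. The function name and the counts are hypothetical, chosen only for illustration, and the one-sided critical values (2.706 and 5.412, the squares of the standard normal quantiles 1.645 and 2.326) are an assumption about how the .05 and .01 levels were applied.

```python
def chi2_homogeneity(x_b, n_b, x_j, n_j):
    """Chi-squared statistic (1 d.f.) for equal bench and jury conviction rates.

    x_b, x_j: number convicted by judge and by jury.
    n_b, n_j: number of defendants tried by judge and by jury.
    """
    p = (x_b + x_j) / (n_b + n_j)  # pooled conviction rate
    q = 1.0 - p
    return ((x_b - n_b * p) ** 2 / (n_b * p * q)
            + (x_j - n_j * p) ** 2 / (n_j * p * q))


# Hypothetical counts, not taken from the tables.
n_bench, x_bench = 200, 120   # bench conviction rate 0.60
n_jury, x_jury = 800, 640     # jury conviction rate 0.80

stat = chi2_homogeneity(x_bench, n_bench, x_jury, n_jury)
bench_lower = x_bench / n_bench < x_jury / n_jury

# Assumed one-sided critical values for chi-squared with 1 d.f.
significant_05 = bench_lower and stat > 2.706
significant_01 = bench_lower and stat > 5.412
print(round(stat, 2), significant_05, significant_01)
```

Because the alternative hypothesis is directional, the sketch reports significance only when the bench rate is actually below the jury rate.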
We are able to present evidence, however, which bears directly on the consequences which would have been borne by defendants who in reality selected jury trials, if they had chosen bench trials instead. This evidence is provided by the analysis of data taken from an experiment conducted by Kalven and Zeisel (1971). Kalven and Zeisel report on a sample of 3,576 cases which were actually tried by jury. In each of these cases, the authors asked the presiding judge how he would have ruled. The fact that Kalven and Zeisel report the decisions of the jury and the presiding judge for the same trial, implying that all other factors are held constant, allows us to test for defendant rationality. From their experimental data, we first compute the conviction rates for judges and juries, respectively, by crime. We next test the null hypothesis that the computed bench conviction rate for each crime, C(B), is equal to the jury conviction rate, C(J). To test this, we conduct a Chi-squared test for homogeneity with an unknown probability.23 The alternative hypothesis, supporting defendant rationality, is that the judges would have convicted a higher proportion than the jury, i.e., C(B) > C(J). The results of these tests are presented in Table 13.6. The column in Table 13.6 labeled “Judge conviction rate” is the would-be conviction rate of the presiding judge had he tried the case without the jury. The next column, “Excess over jury rate,” indicates the difference between the judge’s conviction rate and the actual conviction rate of the jury. The results in Table 13.6 strongly support our assumptions in that they indicate that the presumed conviction rate for the judges is higher than that for the juries. Specifically, the bench conviction rate was higher than the jury’s in each of the 42 crime categories, and the difference was statistically significant in 25 of them.
These results clearly indicate that the guilty defendants would have done worse had they selected a bench trial; thus, the decisions made by the defendants were indeed optimal. In addition, these results lend further support to the assumption that the jury is noisier. If we interpret the judge’s hypothetical decision in the Kalven and Zeisel study as a judge’s evaluation of the evidence at the trial, and if the jury is acting naively, then the study implies that the judge is more likely than the jury to receive the unfavorable signal when the defendant is guilty. The higher hypothetical conviction rate for judges in the Kalven and Zeisel study cannot be attributed to judges simply being tougher than juries (that is, judges convict with uniformly higher probabilities than juries) because our previous tests found that the conviction rates are significantly lower in bench trials. It is possible, however, that a judge may have used information from pretrial hearings, in which all evidence, including irrelevant and illegally obtained evidence, is available, in making his decision. Such evidence is not available to a jury. Hence, a judge may be prone to convicting more defendants. It could also be that the judges had information regarding repeat offenders and were stricter with them.
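The statistic in note 23 can be sketched in Python as follows. The function name is hypothetical, and the inputs are the rounded entries for the murder row of Table 13.6 (210 cases, judge conviction rate 0.88, excess over the jury rate 0.08), so the result is only approximate. Under these assumptions the denominator comes out close to the standard error printed in the table.

```python
import math


def paired_rate_statistic(c_bench, c_jury, n):
    """Return (t, standard_error) using the formula in note 23.

    c_bench, c_jury: conviction rates of judge and jury on the same n trials.
    """
    s2_bench = n * c_bench * (1.0 - c_bench) / (n - 1)
    s2_jury = n * c_jury * (1.0 - c_jury) / (n - 1)
    standard_error = math.sqrt(
        2 * (n - 1) * (s2_bench + s2_jury) / (n * (2 * n - 2))
    )
    return (c_bench - c_jury) / standard_error, standard_error


# Rounded inputs read from the murder row of Table 13.6.
judge_rate = 0.88
jury_rate = judge_rate - 0.08
t_stat, se = paired_rate_statistic(judge_rate, jury_rate, 210)
print(round(se, 3), round(t_stat, 2))
```

A positive statistic points in the direction of the alternative hypothesis C(B) > C(J). Because the table entries are rounded to two decimals, the computed statistic need not match the significance stars exactly.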
Table 13.6 Comparison of judge and jury verdicts in identical trials [a,b]

| Crime description | No. of cases | Judge conviction rate | Excess over jury rate | Standard error |
|---|---|---|---|---|
| Crimes against person | | | | |
| Murder | 210 | 0.88 | 0.08** | 0.036 |
| Manslaughter | 80 | 0.81 | 0.27** | 0.071 |
| Negligent Homicide | 94 | 0.75 | 0.21** | 0.068 |
| Aggravated Assault | 292 | 0.81 | 0.17** | 0.036 |
| Simple Assault | 78 | 0.77 | 0.20** | 0.074 |
| Kidnapping | 13 | 1.00 | 0.08 | 0.078 |
| Forcible Rape | 106 | 0.71 | 0.14* | 0.066 |
| Statutory Rape | 70 | 0.88 | 0.26** | 0.070 |
| Incest | 48 | 0.90 | 0.18** | 0.079 |
| Sodomy | 46 | 0.84 | 0.06 | 0.082 |
| Molestation of a Minor | 38 | 0.74 | 0.21* | 0.109 |
| Indecent Exposure | 31 | 0.84 | 0.41** | 0.112 |
| Commercial Vice | 42 | 0.80 | 0.08 | 0.094 |
| Other Sex Offenses | 31 | 0.87 | 0.13 | 0.101 |
| Robbery | 229 | 0.89 | 0.14** | 0.035 |
| Extortions | 11 | 0.91 | 0.09 | 0.151 |
| Crimes against property | | | | |
| Burglary | 298 | 0.91 | 0.17** | 0.030 |
| Auto Theft | 111 | 0.80 | 0.13* | 0.059 |
| Mail Theft | 12 | 0.75 | 0.04 | 0.189 |
| Other Grand Larceny | 128 | 0.77 | 0.15** | 0.057 |
| Petty Larceny | 42 | 0.93 | 0.25** | 0.083 |
| Receiving Stolen Goods | 51 | 0.76 | 0.24** | 0.093 |
| Embezzlement | 42 | 0.71 | 0.06 | 0.103 |
| Fraud | 89 | 0.83 | 0.22** | 0.066 |
| Forgery | 112 | 0.87 | 0.11* | 0.052 |
| Arson | 31 | 0.68 | 0.05 | 0.123 |
| Drugs | 192 | 0.93 | 0.05* | 0.030 |
| Other felonies | | | | |
| Gambling | 49 | 0.84 | 0.27** | 0.089 |
| Game Laws | 21 | 0.86 | 0.33** | 0.136 |
| Liquor | 51 | 0.88 | 0.16* | 0.078 |
| Other Liquor | 82 | 0.81 | 0.20** | 0.070 |
| Other Regulatory Offenses | 25 | 0.80 | 0.14 | 0.127 |
| Concealed Weapons | 23 | 0.74 | 0.13 | 0.140 |
| Perjury | 19 | 0.63 | 0.05 | 0.163 |
| Tax Evasion | 22 | 0.95 | 0.16 | 0.101 |
| Escape | 14 | 0.86 | 0.14 | 0.157 |
| Bribery/Official Misconduct | 39 | 0.72 | 0.13 | 0.108 |
| Minor crimes | | | | |
| Traffic Offenses | 105 | 0.81 | 0.19** | 0.061 |
| Misc. Public Disorder | 49 | 0.76 | 0.02 | 0.088 |
| Malicious Mischief | 18 | 0.52 | 0.11 | 0.170 |
| Non-Support | 77 | 0.76 | 0.05 | 0.071 |
| DWI | 455 | 0.85 | 0.24** | 0.028 |

Notes: [a] Data source: Kalven and Zeisel (1971, pp. 69–72). [b] The “judge conviction rate” is the conviction rate of the presiding judge had he tried the case without the jury, while “excess over jury rate” is the difference between the judge conviction rate and the actual jury conviction rate. * Difference is significantly greater than zero at the .05 level. ** Difference is significantly greater than zero at the .01 level.
4. Concluding remarks

In this article, we have analyzed the strategic implications of allowing defendants to choose between a jury and bench trial under the assumption that juries are noisier processors of information. When both judges and juries make their decisions only on the basis of trial evidence, we demonstrated that innocent defendants will exercise their right to waive a trial by jury in order to avoid the noisier jury trial mechanism. Therefore, bench trials will generate evidence that favors acquittal, both because defendants who select bench trials are more likely to be innocent, and because judges have superior abilities to detect this innocence. If, in addition, judges act strategically by taking into account the innocent defendants’ preferences for bench trials, then judges will convict at an even lower rate than the trial evidence warrants. Under the assumption that most defendants are guilty, our theory, therefore, makes the seemingly anomalous prediction that the conviction rate at judge trials will be lower than at jury trials and that most defendants will select jury trials. On the other hand, if juries act strategically, they will realize that they have inferior information-processing abilities. Thus, they will also realize that defendants who select jury trials are likely to be guilty defendants attempting to exploit this lack of information-processing ability. As a result, their conviction rates will be sufficiently high to deter any defendant from selecting a jury trial. However, the fact that we do observe some defendants selecting jury trials rules out these types of equilibria. We found that, in general, the conviction rates of juries are greater than those of judges, thus supporting the restrictions on the data suggested by the naive jury hypotheses. Our analysis implicitly assumed that a presiding judge could not overturn the jury’s verdict. Because of double jeopardy, a judge can only overturn convictions by the jury.
Thus, he cannot increase the conviction rate for guilty defendants who opt for jury trials. Hence, guilty defendants will remain biased towards jury trials, and selecting a bench trial will remain a signal of innocence. So, the conclusions from our analysis still hold. The other implicit assumptions in our analysis are that (i) fact finders are unbiased, and (ii) the judge does not use his sentencing power strategically. We expect that if the jury is sufficiently biased against the defendant, then defendants liable to be affected by this bias (even guilty ones) might opt for a bench trial. Similarly, a judge who uses his sentencing power strategically may lighten the sentence of a defendant who he thinks was erroneously convicted by the jury.24 Finally, our results shed light on the efficacy of the jury system in the context of maximizing the accuracy of trial outcomes. Our two naive jury models suggest that the innocent always choose a bench trial. Therefore, the right to waive a jury trial benefits the innocent both because they can signal their innocence by waiving this right (if judges act strategically) and because waiving the right may allow them access to a more accurate fact finder.25 Thus, innocent defendants are always better off under the current system
than under a system in which jury trials are mandated. Conviction rates for guilty defendants however, would be the same under both systems. Mandating judge trials, on the other hand, would increase the conviction rate for guilty defendants. Yet, imposing an all-judge system would eliminate the signalling benefits to innocent defendants of choosing a bench trial. Therefore, moving from the current system to a system in which judge trials are mandated would decrease the probability of acquitting the innocent, but, at the same time, let fewer guilty defendants go free.
Appendix

The proofs of Lemma 1 and Proposition 2 follow.

Proof of Lemma 1. Bayes’ Rule implies that

$$\mu(F) = \frac{\rho\,\sigma(I)P_I}{\rho\,\sigma(I)P_I + (1-\rho)\,\sigma(G)P_G} \tag{A1}$$

and

$$\mu(U) = \frac{\rho\,\sigma(I)(1-P_I)}{\rho\,\sigma(I)(1-P_I) + (1-\rho)\,\sigma(G)(1-P_G)}. \tag{A2}$$

Equations (A1) and (A2) imply that if σ(I) ∈ (0, 1) and σ(G) ∈ (0, 1), then

$$\frac{\mu(F)}{1-\mu(F)} = \frac{P_I(1-P_G)}{P_G(1-P_I)}\left(\frac{\mu(U)}{1-\mu(U)}\right).$$

This implies, after some algebraic manipulation, that

$$\mu(F) = \mu(U)\,[1 + c(1-\mu(F))], \quad \text{where } c = \frac{P_I - P_G}{P_G(1-P_I)}. \tag{A3}$$

This relationship will hold for the limit of any sequence of µn that is generated by equations (A1) and (A2). Also, it is clear that there exists a sequence of purely mixed strategies, σn → 0, such that every µ satisfies (A3). Therefore, the set of sequentially rational posterior beliefs, denoted by $\mathcal{P}$, is given by

$$\mathcal{P} = \left\{ (\mu(U), \mu(F)) \ \text{s.t.}\ \mu(F) \in [0, 1] \ \text{and}\ \mu(F) = \frac{(1+c)\,\mu(U)}{1 + c\,\mu(U)} \right\}.$$

Q.E.D.
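The algebra in the proof of Lemma 1 can be checked numerically. The sketch below evaluates (A1) and (A2) at hypothetical parameter values (the numbers and the function name are illustrative assumptions, not values from the chapter) and confirms that the resulting posteriors satisfy (A3) and the closed form that defines the set of sequentially rational beliefs.

```python
def posteriors(rho, sigma_i, sigma_g, p_i, p_g):
    """Posterior probability of innocence after signals F and U, per (A1) and (A2)."""
    mu_f = (rho * sigma_i * p_i
            / (rho * sigma_i * p_i + (1 - rho) * sigma_g * p_g))
    mu_u = (rho * sigma_i * (1 - p_i)
            / (rho * sigma_i * (1 - p_i) + (1 - rho) * sigma_g * (1 - p_g)))
    return mu_f, mu_u


# Hypothetical values with sigma(I), sigma(G) in (0, 1) and P_I > P_G.
rho, sigma_i, sigma_g, p_i, p_g = 0.3, 0.9, 0.4, 0.8, 0.35

mu_f, mu_u = posteriors(rho, sigma_i, sigma_g, p_i, p_g)
c = (p_i - p_g) / (p_g * (1 - p_i))

rhs_a3 = mu_u * (1 + c * (1 - mu_f))         # right-hand side of (A3)
closed_form = (1 + c) * mu_u / (1 + c * mu_u)  # (A3) solved for mu(F)
print(mu_f, rhs_a3, closed_form)
```

The three printed numbers agree up to floating-point error, and the mixing probabilities cancel out of the relation, which is why (A3) survives in the limit as the strategies converge.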
Proof of Proposition 2. First, note that Assumption 3 implies that σ(G) < 1. Simple manipulations demonstrate that, in this equilibrium, µ(U) = β. This implies that the judge is indifferent between C and A when s = U. Thus, the judge’s mixed strategy at s = U satisfies Condition (ii) of the definition of the NSE. Since µ(F) > µ(U), the judge acquits with probability one when s = F. From Assumption 2, it is clear that σ(I) = 1 satisfies Condition (i) of the definition of the NSE. On the other hand, the judge’s acquittal probability at s = U renders G indifferent between B and J and, hence, σ(G) as defined in the theorem satisfies Condition (i) in the definition of an NSE. Finally, to show that the judge’s unconditional probability of acquittal is higher than the jury’s, note that, because only guilty defendants select a jury trial, the probability that a jury will acquit is ΠG. In the equilibrium, G randomizes; hence, the probability that a guilty defendant will be acquitted by the bench is ΠG. Writing Pr(A | s) for the judge’s probability of acquittal after signal s, the probability that an innocent defendant will be acquitted is

$$P_I \Pr(A \mid F) + (1 - P_I)\Pr(A \mid U) = P_I + (1 - P_I)\left(\frac{\Pi_G - P_G}{1 - P_G}\right) > P_I > \Pi_I > \Pi_G.$$

The unconditional probability of acquittal by the judge is a weighted average of the probability of acquittal given the defendant’s innocence and the probability of acquittal given that the defendant is guilty. Therefore, the probability of acquittal must be higher from the judge than from the jury. To prove (5a), simply note that

$$P(B) = \rho + \sigma(G)(1 - \rho) \tag{A4}$$

and substitute (5) into (A4). Q.E.D.
Notes

* We have benefitted significantly from the comments of Sankar De, Curt Hunter, Larry Wall, the editor James Poterba, and two anonymous referees. The usual disclaimer applies.

1 The Magna Carta, the first English Bill of Rights, contains the right, as does Article III, the Sixth Amendment of the United States Constitution, and all state constitutions.

2 See also Klevorick and Rothschild (1979) and Gelfand and Solomon (1977a, 1977b) for earlier work on this issue.

3 Such applications of economic theory to aspects of the judicial system have grown in recent years. Rubinfeld and Sappington (1987) present a model of the judicial system in which defendants are able to signal their guilt or innocence, and the court is then able to minimize type one or type two errors based upon the defendant’s expenditures. Schulhofer (1988) examines the role efficiency plays in the discretionary decisions available to agents in the criminal justice system. Posner (1973) examines another part of the system by looking specifically at the efficiency of procedural rules. Others examine the various incentives to bring a civil lawsuit (Shavell, 1982) and an agent’s strategic behavior in a lawsuit (P’ng, 1983 and Bebchuk, 1984). These results help clarify reasons for certain procedural rules and basic justifications for allowing agents to behave in a particular manner.

4 There have been suggestions and in some states there exist laws (for example, in Texas and Florida) to make jury trials compulsory and do away with bench trials altogether for certain classes of crimes. These actions appear to stem from the belief that a jury trial somehow best protects the right of a defendant. Legal folklore suggests that the jury is an imperfect mechanism to determine guilt or innocence, but as Justice White observed, the jury’s function is to provide a safeguard “against the corrupt or overzealous prosecutor and against the compliant, biased or eccentric judge.” [Apodaca v. Oregon, 406 U.S. 404, 92 S. Ct. 1628, 32 L. Ed. 2d 184 (1972)].

5 In our analysis we assume that the defendant always has this choice. In some states, and in the federal courts, waiving of a jury trial is not an absolute right. However, according to Kamisar et al. (1980), even in such cases the permission to waive a jury trial is almost always granted.

6 For discussions on this issue, see, among others, Scheflin and Van Dyke (1980) and Visher (1987).

7 See Banks and Sobel (1987) for a discussion of the divinity refinement of the Nash sequential equilibrium.

8 Gelfand and Solomon (1977a, 1977b) consider a different aspect of “noise.” According to them, allowing the jury to deliberate adds noise to their decision making.

9 We assume that defendants are not concerned with the cost of the trial. This, we believe, is a reasonable assumption for most defendants facing serious criminal charges, many of whom are defended by public defenders.

10 The “relevant evidence doctrine” is a rule of evidence law which states that only relevant evidence can be admitted before the fact finder. According to this law, relevant evidence means evidence “having any tendency to make the existence of any fact of consequence to the determination of guilt more probable or less probable than it would be without the evidence.” See Federal Rules of Evidence 401 (1987).

11 It can be shown that as long as v is a function of only t and r, this formulation is completely general in that any such utility function can be expressed in the form given above through a simple linear transformation. Therefore, for positive β, a fact finder who maximizes expected utility must maximize the probability of acquittal of the innocent and conviction of the guilty.

12 Formal proofs of these assertions are available from the authors upon request.

13 This result requires the assumption, analogous to Assumption 3, that the jury’s prior probability that the defendant is innocent is not so high that the jury would acquit regardless of the evidence, when all defendants are selecting a jury trial.

14 The proofs of this and subsequent assertions/propositions are available from the authors upon request.

15 California data were also available, but in a highly aggregated state. In results not reported, we find that the California data support our basic hypotheses.

16 In a civil lawsuit, both parties must agree to waive a jury trial.

17 This excludes pleas of nolo contendere or guilty since a jury trial is then automatically waived. Directed verdicts, in which a judge directs the jury to acquit the defendant because, as a matter of law, no reasonable jury could convict on the basis of the presented evidence, were included in the analysis. Our results are therefore on the conservative side because they lower the estimated jury conviction rates. The existence of directed verdicts is consistent with the assumption that a jury’s decision making is more noisy because, if it were not, the judge would not need to tell it that the evidence was insufficient.

18 These data, from the Florida Circuit Courts and Texas District Courts, are from the trial court of general jurisdiction. Generally, there is also a trial court of limited jurisdiction which hears relatively minor criminal/regulatory matters, such as traffic offenses, violations of county or municipal ordinances, and similar misdemeanor offenses. The general trial level court, however, hears felony cases, which generally have more severe penalties.

19 Our test statistics, not reported in the tables, were computed according to $(P(B) - P_0)/\sigma_{P(B)}$, where P(B) is the observed proportion of defendants selecting a bench trial, and $\sigma_{P(B)}$ is computed as $\sqrt{P_0(1 - P_0)/N}$, where N is the total number of defendants.

20 The results for the category of capital crimes for the state of Texas need to be interpreted with caution because of a Texas law which makes a jury trial mandatory in those cases where the death sentence is a possibility.

21 The test statistic, distributed Chi-squared, is given by

$$\chi^2(1) = \frac{(x_B - N_B p)^2}{N_B\,pq} + \frac{(x_J - N_J p)^2}{N_J\,pq},$$

where $x_B$ and $x_J$ are the number of persons convicted by the judge and jury, respectively; $N_B$ and $N_J$ are the number of persons coming before the bench and the jury, respectively; $p = (x_B + x_J)/(N_B + N_J)$ is the pooled conviction rate; and $q = 1 - p$.

22 This test is robust against any misspecification except for a lack of independence between successive court decisions. In the case of jury trials, independence between successive decisions is to be expected since different juries are involved. In the case of bench trials, however, the assumption of independence may be violated. It is not possible to test for independence due to a lack of time-series data on court decisions.

23 The test statistic is given by

$$t_{2N-2} = \frac{C(B) - C(J)}{\sqrt{\dfrac{2(N-1)\,(S_B^2 + S_J^2)}{N(2N-2)}}}, \quad \text{where } S_i^2 = \frac{N\,C(i)\,(1 - C(i))}{N-1} \ \text{for } i = B, J.$$

24 We are grateful to an anonymous referee for this observation.

25 The reduction in the conviction rate for innocent defendants induced by the right to waive a jury trial will have the added benefit of reducing jail-overcrowding.
References

Banks, J.S. and Sobel, J. “Equilibrium Selection in Signalling Games.” Econometrica, Vol. 55 (1987), pp. 647–661.

Bebchuk, L.A. “Litigation and Settlement Under Imperfect Information.” RAND Journal of Economics, Vol. 15 (1984), pp. 404–415.

Cho, I.K. and Kreps, D.M. “Signalling Games and Stable Equilibria.” Quarterly Journal of Economics, Vol. 102 (1987), pp. 179–221.

Gelfand, A.E. and Solomon, H. “An Argument in Favor of 12-Member Juries.” In S. Nagel, ed., Modelling the Criminal Justice System, Vol. 7 of Justice Systems Annuals. Beverly Hills: Sage Publications, 1977.

Gelfand, A.E. and Solomon, H. “Considerations in Building Jury Behavior Models and in Comparing Jury Schemes: An Argument in Favor of 12-Member Juries.” Jurimetrics Journal, Vol. 17 (1977), pp. 292–313.

Kalven, H. and Zeisel, H. The American Jury. Chicago: The University of Chicago Press, 1971.

Kamisar, Y., LaFave, W.R., and Israel, J.H. Basic Criminal Procedure: Cases, Comments and Questions, Fifth Edition. St. Paul: West Publishing Company, 1980.

Klevorick, A.K. and Rothschild, M. “A Model of the Jury Decision Process.” Journal of Legal Studies, Vol. 8 (1979), pp. 141–164.

Klevorick, A.K. and Winship, C. “Information Processing and Jury Decision Making.” Journal of Public Economics, Vol. 23 (1984), pp. 245–278.

Kreps, D. and Wilson, R. “Sequential Equilibria.” Econometrica, Vol. 50 (1982), pp. 863–894.

P’ng, I.P.L. “Strategic Behavior in Suit, Settlement and Trial.” Bell Journal of Economics, Vol. 14 (1983), pp. 539–550.

Posner, R.A. “An Economic Approach to Legal Procedure and Judicial Administration.” Journal of Legal Studies, Vol. 2 (1973), pp. 399–458.

Rubinfeld, D.L. and Sappington, D.E.M. “Efficient Awards and Standards of Proof in Judicial Proceedings.” RAND Journal of Economics, Vol. 18 (1987), pp. 308–315.

Schulhofer, S.J. “Criminal Justice Discretion as a Regulatory System.” Journal of Legal Studies, Vol. 17 (1988), pp. 43–82.

Shavell, S. “The Social versus Private Incentive to Bring Suit in a Costly Legal System.” Journal of Legal Studies, Vol. 11 (1982), pp. 334–339.

Scheflin, A. and Van Dyke, J. “Jury Nullification: The Contours of a Controversy.” Law and Contemporary Problems, Vol. 43 (1980), pp. 51–115.

Visher, C.A. “Juror Decision Making: The Importance of Evidence.” Law and Human Behavior, Vol. 11 (1987), pp. 1–17.
14 Runaway judges? Selection effects and the jury

Eric Helland and Alexander Tabarrok
1. Introduction

The American civil jury is on trial. It has been charged with being biased in favor of the plaintiff, subject to emotion rather than reason, inaccurate in its understanding of law, and wildly unpredictable. Evidence of jury bias, in the form of the anecdote, is found regularly on the pages of the Wall Street Journal and in the popular press [see e.g., Adler (1994)]. Anecdotes, however, almost invariably focus attention on the atypical rather than the typical, and are thus misleading.1 Furthermore, anecdotes, even if accurate, miss the point if judges would have made the same decisions in the same circumstances. Realistic reform requires that we compare alternative institutions, all of which may be imperfect. If judges and juries decide cases similarly, then the charges leveled against the jury are moot since the judge is the primary alternative decision maker. Only if judges decide cases differently do restrictions on civil juries have any hope of achieving their aims. It is therefore important to bring the available evidence to bear on this fundamental question: do trial judges reach systematically different decisions than juries? In Section 2 we survey the literature on judge versus jury trials. In Section 3 we discuss our dataset and present data on mean awards and win rates across judge and jury trials. The fact that the average jury award is much larger than the average judge award is the point of departure for the remainder of the article. How much of this difference may be explained by differences in the sample of cases coming before judges and juries? We answer this question first by asking how juries would have decided the sample of cases going to judges and then by asking how judges would have decided the sample of cases going to juries. We ask both questions in three stages, progressively controlling for larger sets of independent variables and more sophisticated error structures across selection and award equations.
We find that we can explain three-quarters to two-thirds of the difference in mean awards across judges and juries, thus demonstrating that most of the difference in mean awards is due to sample differences and not to different attitudes or decision processes across judges and juries. Nevertheless, 25–33 percent of the differences in mean awards is unexplained. In Section 6 we examine the corollary question,
“Holding the sample of cases constant, in what respects do judges decide cases differently than juries?” A direct comparison of judge and jury award equations reveals small but significant differences in judge and jury decision processes. Although our focus is on awards, we also examine differences in win rates and ask how much of the judge/jury difference is explained by sample selection and how much by differing decision processes. In Section 7 we offer some concluding remarks.
2. Judges versus juries

Are juries out of control? Compared to whom? The usual answer has been “yes,” at least compared to judges. In particular, juries in personal injury torts are often accused of compensating sympathetic accident victims even when the defendant has not committed a tort. The North Carolina Hospital Association, for example, claimed that

Often awards have little relationship to the seriousness of injury. There is no way to predict how a jury will rule on a particular set of facts . . . . Often awards bear no relationship to economic losses . . . today juries often make awards regardless of the “fault” of anyone—out of sympathy for an injured person . . . too often juries appear to award on [the] basis of emotion as opposed to facts and/or realistic evaluation of case circumstances.2

Bernstein (1996) agrees, calling juries “a disaster for the civil justice system” because they “undermine certainty, are incompetent to decide complex cases, and often base their decisions on illegitimate factors.” In England, Canada, and Australia, Bernstein notes pointedly, “judges alone handle personal injury cases.” Bernstein and others argue that judges are better than juries at evaluating complicated evidence (a factor in many medical malpractice and product liability trials), they are less likely to be swayed by emotion, and are more likely to closely follow the principles of tort law. Tort reforms therefore typically try to limit the jury’s discretion by imposing limits on the amounts that juries may award for pain and suffering, to give one example. More generally, opponents of the current tort system point out that compared with the rest of the world the American reliance on the jury is anachronistic and should be curtailed.3 Perceived differences between juries and judges are not limited to critics of the tort system.
Practitioner’s handbooks on trial law, for example, often suggest that, “As a general rule, most plaintiffs with highly charged cases want a jury in the hope that the jury will be swept away in a tide of emotion and award large damages” (Izard, 1998). Juries are also said to be preferable when the case does not rest on complex facts or legal technicalities and when the plaintiff is a “little guy” relative to the defendant [see, e.g., Haydock and Sonsteng (1991) and Izard (1998)].
One would expect lawyer perceptions of the trial process to be reasonably accurate, so it’s quite surprising that the academic literature on judges versus juries does not find a large difference in decision making. In their classic study, The American Jury, Kalven and Zeisel (1966) surveyed the judges who presided over some 4000 civil jury trials. In 78 percent of the trials, the presiding judges would have ruled the same as the juries had it been up to them. This rate of agreement is comparable to the rate of agreement among different experts of all kinds (e.g., scientists doing peer review, physicians diagnosing patients, etc.) and, of importance, it is comparable to the rate of agreement among different judges (Diamond, 1983).4 When Kalven and Zeisel found disagreement between judge and jury, it was just about as likely that the judge found liability and the jury did not as the reverse.5 Most of the studies of judge/jury differences rely on hypothetical questions (judges are asked what they would have done if they had been responsible for deciding the case) or on artificial experiments. Almost no research has been done using nonsurvey data on judge and jury outcomes. The first systematic effort to look at this question using litigation data was by Clermont and Eisenberg (1992). Clermont and Eisenberg compare win rates and awards in a sample of federal civil trials. They find that win rates often differ significantly across the trial forum and not always in ways predicted by the critics of the jury system: in some types of cases plaintiff win rates are higher in judge trials than in jury trials. Clermont and Eisenberg are primarily interested in explaining why judge trials are more prevalent in some areas of litigation than in other areas. In particular, they focus on the puzzle of why plaintiffs predominantly choose jury trials even in case categories where judge win rates are significantly higher than jury win rates.
They suggest that a combination of selection effects and misperceptions might explain the data. We also offer some comments on this issue below.
3. Mean awards and win rates

To test whether judges and juries decide cases similarly we use a large dataset that includes data on settlements as well as trial outcomes. The data were extracted from Jury Verdict Research’s (JVR’s) Personal Injury Verdicts and Settlements on CD-ROM.6 Data from trials are drawn directly from court records. Using an extensive survey of lawyers, JVR also collects data on settlements. Our dataset contains information on 59,304 trials, and 27,429 settled cases.7 The dataset spans all 50 states. The earliest cases were tried in 1988 and the most recent cases date from 1996. All award amounts are corrected for inflation by conversion into 1996 dollars. Table 14.1 presents data on win rates, mean and median awards, the log standard deviation of awards, and the number of trials in each category (Table 14.1 does not include data on settlements). At first glance, the table appears to support the claims of jury reformers that the jury is biased toward the plaintiff. The mean award in a case before a jury is more than twice as
Table 14.1 Judge/jury differences (all trials)

| | Juries | Judges | Two-sided p-value on difference [c] |
|---|---|---|---|
| Win rate | 56.67% | 67.73% | 0.000 |
| Mean award [a] | $696,149 | $218,629 | 0.000 |
| Median award [a] | $74,879 | $17,279 | 0.000 |
| Mean of log awards | 11.24 | 10.02 | 0.000 |
| Standard deviation of log awards (dollar equivalent) [a,b] | 2.188 ($603,156) | 1.853 ($121,885) | 0.000 |
| Number of trials | 53,335 | 5969 | |

Source: JVR.
Notes: [a] Conditional on a plaintiff win. [b] Since dollar awards are not normally distributed, the standard deviation of dollar awards is not informative. The standard deviation of log awards has meaning, however, because log awards are well approximated by a normal distribution. To convert a standard deviation in logs back to a dollar figure we evaluate at the mean of the log awards. [c] The p values for the difference in win rates, means, and standard deviations are two-sided and were computed using standard tests available in any text (e.g., Aczel 1996). The difference in medians test was computed using a Monte Carlo method with 5000 replications.
large as the mean award in a case before a judge. Contrary to the conventional wisdom, however, the win rate before judges is significantly higher than the win rate before juries. The higher judge win rate, however, does not fully offset the higher awards before juries—the expected award is higher before a jury than a judge. The median award before a jury is significantly higher than the judge median. The higher jury mean is thus not simply an artifact of the occasional astronomical award before a jury. In both judge and jury cases the mean award is well above the median award, suggesting a strongly right-skewed distribution. Figure 14.1 is a kernel density estimate of log awards in judge and jury trials.8 Since the kernel density estimate for log awards is approximately normal, the distribution of dollar awards is approximately lognormal. The density function for jury awards clearly has a larger mean and standard deviation than that for judge awards. Aside from the higher win rate in judge trials, the raw data appear to support the case for jury reform.
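As a rough check on Table 14.1, the sketch below recomputes the expected award in each forum as the win rate times the mean award conditional on a win. It also tries one plausible reading of the "dollar equivalent" of the log standard deviation, exp(mean + sd) − exp(mean). That conversion is an assumption inferred from note b, not a formula the table states; with the rounded table inputs it lands within about 1 percent of the reported figures.

```python
import math

# Figures copied from Table 14.1; awards are conditional on a plaintiff win.
TABLE_14_1 = {
    "jury": {"win_rate": 0.5667, "mean_award": 696_149,
             "mean_log": 11.24, "sd_log": 2.188, "reported_equiv": 603_156},
    "judge": {"win_rate": 0.6773, "mean_award": 218_629,
              "mean_log": 10.02, "sd_log": 1.853, "reported_equiv": 121_885},
}


def expected_award(win_rate: float, mean_award: float) -> float:
    """Unconditional expected award: P(win) times the mean award given a win."""
    return win_rate * mean_award


def dollar_equivalent(mean_log: float, sd_log: float) -> float:
    """ASSUMED conversion (not stated in the source): the dollar gap between
    the mean log award and one log standard deviation above it."""
    return math.exp(mean_log + sd_log) - math.exp(mean_log)


for forum, row in TABLE_14_1.items():
    ev = expected_award(row["win_rate"], row["mean_award"])
    equiv = dollar_equivalent(row["mean_log"], row["sd_log"])
    gap = equiv / row["reported_equiv"] - 1.0
    print(f"{forum}: expected award ${ev:,.0f}; "
          f"dollar equivalent ${equiv:,.0f} ({gap:+.1%} vs. table)")
```

On these inputs the expected jury award is roughly 2.7 times the expected judge award, which is the sense in which the higher judge win rate does not offset the higher jury awards.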
4. The importance of sample selection Sample selection and awards The average judge award is 31 percent of the average jury award. How much of this difference can be explained by differences in the sample of cases appearing before judges and juries? To answer this question we examine potential sources of different judge/jury samples and ask how much of the difference in average awards can be explained by sample variation if the null
Figure 14.1 Kernel estimation of all cases: judge and jury.
hypothesis of no difference in judge and jury decision processes is true. We examine three sources of potential sample variation: 1) case categories, 2) injuries and other variables, and 3) unobserved sample selection effects. It is well known that awards in product liability and medical malpractice cases are much larger than in premises liability and auto injury cases.9 Column one of Table 14.2, for example, shows a regression of log awards (in jury trials) on these four case categories. Evaluated at the mean log award, awards are approximately $180,000 and $187,000 larger than average in product liability and medical malpractice cases and $25,000 and $100,000 lower than average in premises liability and auto cases, respectively. Product liability cases and medical malpractice cases are comparatively rare; they make up 4.7 percent and 7.3 percent of jury trials, respectively. Premises liability and auto injury cases are much more common; these case categories account for 15.2 percent and 47.5 percent of our sample of jury trials, respectively (the remainder are miscellaneous torts). (Descriptive statistics on all variables can be found in Table 14.A1 in Appendix A.) If product liability and medical malpractice cases are proportionately a larger part of the judge sample than the jury sample, this could explain why the average jury award is so much larger than the average judge award. In fact this is the case, the high-award case types of product liability and medical malpractice make up only 1.51 percent and 1.58 percent of judge trials, respectively, while the low-award case types of premises liability and auto injury account for 9 percent and 64.9 percent of the judge sample, respectively. To establish the importance of this source of variation we ask, “If juries had decided the cases
Table 14.2 Award regressions

Variable                        OLS jury              OLS jury
Constant                        11.747*** (.023)      11.61*** (.069)
Number of defendants                                  .423*** (.051)
Expected years of life left                           .272*** (.02)
Major injury                                          .633*** (.071)
Minor injury                                          −.995*** (.065)
Emotional distress                                    −1.16*** (.0769)
Bad faith                                             −.134 (.113)
Male                                                  .327*** (.218)
Premises liability              −.386*** (.038)       −.101*** (.037)
Medical malpractice             1.311*** (0.49)       .815*** (.47)
Product liability               1.255*** (.0579)      .652*** (.055)
Auto                            −1.27*** (.0285)      −.915*** (.0299)
Poverty                                               2.22*** (.197)
Joint and several liability                           .091* (.049)
Noneconomic cap                                       −.528*** (.028)
Collateral sources                                    .322*** (.218)
No punitive                                           −.081 (.129)
Punitive cap                                          −.106 (.022)
Evidence standard                                     .18*** (.024)
Number of cases                 30,226                30,226

Notes: *, **, *** Significant at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors in parentheses.
that actually went to judges how much lower would the average award have been?”10 Using the coefficients from Table 14.2, we find that if juries had decided the sample of cases going to judges, the average award would have been 63 percent of the average jury award. Thus just over half of the
difference in average judge and jury awards can be explained solely by differences in the sample of four case categories going to judge and jury trial.11 We now add injuries, differences in tort law across the states, the number of defendants, and local poverty rates to the list of variables that may lead to different judge/jury samples. Our dataset has descriptive information on the victim’s injury. We code this information into six variables. Five of the variables—major injury, minor injury, emotional distress, bad faith, and wrongful termination—are dummy variables. Major injury is set equal to one if the victim suffered a permanent injury such as loss of a limb, brain damage, or blindness. Minor injuries are those that are (potentially) temporary, for example, broken arms, broken legs, concussions, or wounds. A pianist might consider a broken finger a major injury if recovery was not 100 percent complete. We do not know all of the specifics of a case so we cannot control for potential miscodings of this type; nevertheless any coding errors will be uncorrelated with our other independent variables. Emotional distress indicates cases in which the victim suffered emotional or psychological injuries. Bad faith cases are those in which an insurance company is sued for refusing to pay a claim. Wrongful termination is set equal to one when the plaintiff claims a wrongful termination of employment. To prevent perfect collinearity with the intercept term we suppress wrongful termination. We also include a sixth variable, the expected years of life left in a case in which the victim died. We calculated the expected years of life left using the age at death and actuarial tables which control for age and sex. We do not have data on lost wages, but we do include a dummy variable set to one when the victim was a male, on the theory that average wage losses are higher for males than females. Together these variables control for the severity of the plaintiff’s injury. 
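The case-category counterfactual described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' exact procedure (their note 10 is not reproduced here). It uses the case-type coefficients from the first column of Table 14.2 and the case-mix shares quoted in the text, treats miscellaneous torts as the omitted category, and assumes the residual variance of log awards is the same across case types, so that it cancels in the ratio along with the constant.

```python
import math

# Case-type coefficients from the first OLS column of Table 14.2.
# "misc" (miscellaneous torts) is ASSUMED to be the omitted category.
COEF = {"premises": -0.386, "medmal": 1.311, "product": 1.255,
        "auto": -1.27, "misc": 0.0}

# Case-mix shares quoted in the text for each forum.
JURY_SHARES = {"product": 0.047, "medmal": 0.073,
               "premises": 0.152, "auto": 0.475}
JUDGE_SHARES = {"product": 0.0151, "medmal": 0.0158,
                "premises": 0.09, "auto": 0.649}


def with_misc(shares: dict) -> dict:
    """Add the residual 'misc' share so the shares sum to one."""
    mix = dict(shares)
    mix["misc"] = 1.0 - sum(shares.values())
    return mix


def relative_mean_award(shares: dict) -> float:
    """Share-weighted mean of exp(case-type effect) for a given case mix.

    The constant and the (assumed common) lognormal variance term multiply
    every case type equally, so they drop out of any ratio of two mixes.
    """
    return sum(s * math.exp(COEF[k]) for k, s in with_misc(shares).items())


# Jury decision rule applied to the judge case mix, relative to the jury mix.
ratio = relative_mean_award(JUDGE_SHARES) / relative_mean_award(JURY_SHARES)
print(f"Predicted judge-sample award as a share of the jury average: {ratio:.3f}")
```

Under these assumptions the ratio comes out close to the 63 percent figure reported in the text.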
In addition to injuries, we include a number of legal variables that may affect liability. Under the joint and several rule, any defendant can be liable for a plaintiff’s entire injury regardless of the relative contribution of that defendant to the injury. Some states have modified the joint and several rule to limit the liability of some defendants (e.g., a defendant responsible for less than 50 percent of the injury may not be assessed more than his relative contribution). Joint and several is set equal to one if the state has modified the rule and if there is more than one defendant. Noneconomic cap is set equal to one if state law puts a cap on damages due to pain and suffering or other noneconomic losses. Punitive cap and no punitive control for states that cap punitive awards or prohibit them altogether.12 Evidence standard is set equal to one if the state requires that “malicious intent” be proven for punitive damages to be recoverable. Under the collateral sources rule, payments to the plaintiff from a third party (i.e., insurance) are not deducted from damages due from the defendant. If collateral sources is set equal to one the collateral sources rule is weakened so that some offset is allowed.13 The variable poverty measures the percentage of the population in poverty in the county in which the trial occurs. Helland and Tabarrok (1999) find that
higher rates of poverty in the trial county are significantly associated with larger awards. The number of defendants is included as another control variable that may affect the size of awards. The descriptive statistics for the independent variables are given in Appendix A. Column 3 of Table 14.2 shows the impact of these variables on awards. As before, awards are higher than average in product liability and medical malpractice trials and lower than average in premises liability and auto trials. Also, as expected, awards are higher than average in cases involving deaths and major injuries and lower than average in cases involving minor injuries, emotional distress, or bad faith contracting. Limitations on joint and several liability tend to raise awards, the opposite of the expected result, but the effect is small and not statistically significant at the 5 percent level. Caps on noneconomic awards and punitive awards appear to reduce awards as intended; in both cases the effect is highly statistically significant. Evidence standards, however, do not appear to lower awards. Awards also tend to be larger in states where the collateral sources rule is weakened, perhaps because juries increase awards if they think insurance payments will later be deducted.14 Trials with multiple defendants appear to generate larger awards than otherwise similar trials.15 Finally, the higher the poverty rate in the county in which the trial occurs (the jury pool), the greater the award.16 (We look at marginal effects in more detail further below.) If trials before judges tend to involve fewer deaths or major injuries than trials before juries or if they tend to occur in richer counties or in states which cap pain and suffering awards, then differences in the sample could explain differences in the average award. 
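Because the dependent variable is the log of the award, a coefficient b on a dummy variable corresponds to a proportional change of exp(b) − 1 in the award, holding the other variables fixed. The sketch below applies that standard conversion to a few coefficients from the full OLS column of Table 14.2. The choice of variables is only illustrative, and the conversion ignores sampling error in the estimates.

```python
import math

# A few dummy-variable coefficients from the full OLS column of Table 14.2.
COEFS = {
    "Major injury": 0.633,
    "Minor injury": -0.995,
    "Noneconomic cap": -0.528,
    "Joint and several liability": 0.091,
}


def pct_effect(b: float) -> float:
    """Percentage change in the award implied by a log-award coefficient."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in COEFS.items():
    print(f"{name}: coefficient {b:+.3f} -> {pct_effect(b):+.1f}% change in award")
```

Large coefficients are where the distinction matters: a coefficient of −0.995 implies an award roughly 63 percent lower, not 99.5 percent lower.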
Taking into account all of these possible sources of variation, we find that if the judge sample had been tried before a jury, the average award in the judge sample would have been 56 percent of the average jury award. Case type variables alone already suggested that the average award in the judge sample would be 63 percent of the jury average. Injuries, differences in tort law, the number of defendants, and local poverty rates therefore do not greatly increase our ability to explain the difference in judge and jury average awards.17 Using case types and all of the additional variables we are able to explain approximately 62 percent of the difference in average awards, (100 − 56)/(100 − 31) = 0.63.

The Heckit model

As noted above, one potential problem with estimating the effect of the independent variables on trial awards is that awards do not represent a random sample. To be awarded damages before a jury, for example, at least one of the defendants must have requested a jury trial, the case must not have been settled, and the plaintiff must have won at trial. Unless the sample selection is controlled for, the parameter estimates may be biased because unobserved sources of variation in the forum, settlement, and win decisions could be correlated with unobserved sources of variation in the award
equation. To account for this sample selection we estimate probit models for the forum, settlement, and win decisions and then use Heckman's (1979) procedure to control for any correlation of errors between each of these decision equations and the award equation. For tractability we assume that the errors of the probit equations are uncorrelated with each other. The award at trial is thus estimated by

$$\log(\text{award}) = X_V \beta_V + \lambda_J \beta_{\lambda_J} + \lambda_W \beta_{\lambda_W} + \lambda_T \beta_{\lambda_T} + \varepsilon,$$

where log(award) is the log of the trial award, $X_V$ are the variables described above, $\lambda_i$, $i = J, T, W$, are inverse Mills ratios, $\beta_{\lambda_i}$, $i = J, T, W$, are the vector of coefficient estimates from the decision equations (see below), and $\varepsilon$ is the error term. The coefficients are estimated by ordinary least squares. The least squares covariance matrix will, however, be biased because the disturbance term in the award equation is, by construction, heteroscedastic. The correct asymptotic covariance matrix is

$$\operatorname{var}[\beta_1, \beta_2, \ldots, \beta_k] = [X_V^{*\prime} X_V^{*}]^{-1} \left[ X_V^{*\prime} (\sigma^2 I - \Pi) X_V^{*} + \sum_{j=1}^{3} Q_j \right] [X_V^{*\prime} X_V^{*}]^{-1},$$

where

$$X_V^{*} = [X_V \mid \lambda_J \mid \lambda_W \mid \lambda_T], \qquad \Pi = \operatorname{diag}(\pi_1, \ldots, \pi_n),$$

$$\pi_i = \beta_{\lambda_J}^2 \delta_J + \beta_{\lambda_T}^2 \delta_T + \beta_{\lambda_W}^2 \delta_W,$$

$$\delta_j = \lambda_j (\lambda_j + \gamma_j X_j),$$

$$\Sigma_j = \text{asymptotic covariance matrix for estimates of } [\beta_i],$$

$$\Delta_j = \operatorname{diag}(\delta_1, \ldots, \delta_n),$$

$$Q_j = \beta_{\lambda_j}^2 \, (X_V^{*\prime} \Delta_j X_j) \, \Sigma_j \, (X_j^{\prime} \Delta_j X_V^{*}),$$

$j = J, T,$ and $W$, and $\sigma^2 = (1/n)\, e^{\prime} e - (1/n) \sum_i \pi_i$.

There is one remaining complication even after selection effects have been controlled for; plaintiffs and defendants decide whether or not to settle a dispute based in part upon expectations of the trial award. Decisions about whether to pursue a judge or jury trial are also likely to be based in part on expectations of future outcomes. To account for these considerations we use a two-stage procedure. In the first stage we run through each of the equations to create for each case a shadow trial award, a shadow probability of winning, and a shadow probability of going to jury trial. We then reestimate the model
in the second stage using the shadow variables as estimates of plaintiff and defendant expectations. In effect, the first-stage estimates use all of the independent variables in a given equation as instruments for the shadow variables (structural variables) in the second stage.18 The estimation procedure is depicted in Figure 14.2. The selection effects Forum selection The first decision equation to estimate is the choice of judge or jury trial. We model the decision to choose a jury rather than a judge as a function of the default forum, the relative costs of each forum, expected differences in the judge and jury award, and certain case characteristics. In every state, both the defendant and the plaintiff have a right to a jury trial in just about any case involving money damages.19 In many states, however, a bench trial is the default. In these states, if the plaintiff or defendant want a jury trial it must be requested, often in writing within a short period of time after filing or responding to a complaint. The default rule will determine the forum if both the plaintiff and defendant are indifferent to forum or if one of the parties wants a jury trial but doesn’t realize that it must be specifically requested. We define the dummy variable, default, to be one if the default forum is a judge trial and zero otherwise. We expect that default will reduce the probability of a jury trial and thus will have a negative sign. The expected cost of each forum is proxied by the expected time from filing to decision before a judge and jury. Cases scheduled to be decided by a judge typically reach court and are tried faster than cases before juries. We modeled
Figure 14.2 Estimation procedure.
the duration of time to decision using a sample of 36,896 cases tried before a jury and 5496 cases tried before a judge. Included within our model are injury variables (death, major, minor, etc.), case types (product liability, medical malpractice, etc.), the number of defendants, and as a measure of the state court queue, the number of filings per judge by state. We found that a model of duration based on the logistic hazard function fit the data well. The results of the model are presented in Table 14.B1 in Appendix B. Subtracting the expected time to a judge decision from the expected time to a jury decision creates the time difference variable. We expect that as the costs (time to decision) of a jury trial increase the probability of selecting a jury trial will diminish.20 If the expected award in a judge trial exceeds the expected award in a jury trial, the defendant will request a jury trial. On the other hand, if the expected award in a judge trial is less than the expected award in a jury trial, the plaintiff will request a jury trial. To account for this symmetry we define defense request to be equal to the expected judge award minus the expected jury award if the difference is positive, and if the difference is negative we define plaintiff request to be the expected jury award minus the expected judge award. We expect both defense request and plaintiff request to be positive. The number of defendants is included because any defendant can request a jury trial. Thus we might expect that the probability of a jury trial will increase with the number of defendants. Alternatively, cases with a large number of defendants may be especially complex and potentially time consuming, thus plaintiffs and defendants may agree to a judge trial to save court costs. Finally, a dummy variable for cases involving auto accidents is included, as these cases tend to be more symmetrical on a number of important dimensions than other cases. 
Both defendants and plaintiffs are typically individuals in auto cases, for example, and this may make jury bias against defendants less likely than when the defendant is a business (other auto case symmetries are discussed in the results section). The decision to proceed to trial We model the decision to settle using a model based on Gould (1973), Posner (1973), Priest and Klein (1984), and others.21 The settlement model suggests that the settlement decision is a function of the variance of plaintiff and defendant’s prediction errors, the expected award, risk, court and settlement costs, and stake asymmetry. We proxy for each of these factors using the following variables. As noted above, we create for each case a shadow probability and a shadow award. We proxy for prediction error by the variance of the shadow probability, p(1 − p). The shadow award proxies for the expected judgment amount, and we measure risk as the variance of the expected award, p(1 − p)X2, where X is the shadow award and p is the shadow probability. Court costs are again proxied by the expected time to trial
weighted by the probability of a judge or jury trial.22 We expect that the longer the expected time to trial the greater the expected court costs and thus the greater the incentive to settle. We include the number of defendants as a proxy for settlement costs. If holdout and bargaining problems when defendants must allocate damages among themselves increase the difficulty of reaching a settlement, trials will become more likely the greater the number of defendants. Alternatively, the cost per defendant falls for any given compensatory claim and thus, if the defendants can agree on an allocation, settlement costs may fall with more defendants. In product liability and medical malpractice cases the award to the plaintiff in the event the plaintiff wins may underestimate the cost to the defendant. A loss in one product liability case may generate further lawsuits, and a loss in a medical malpractice case might mean further scrutiny of the defendant doctor from, say, a hospital board, and may even cause a loss of operating rights. We include product liability and medical malpractice dummies to account for these effects. In addition to the factors suggested directly by the model, we include several other variables. Nonpecuniary elements may enter into a plaintiff’s bargaining efforts if a death, particularly a child’s death, is involved in the dispute. Defendants may also be more likely to settle these types of cases if a trial would generate negative publicity. To control for possible nonpecuniary elements in bargaining, we include two variables, a dummy variable labeled child (set equal to one if a child died), and the expected number of years of life left in cases involving an adult death. Kornhauser and Revesz (1994; see also Donohue 1994) show that the joint and several liability rule, under which any one defendant is liable for the damages of all, can change the probability of settlement. 
Whether the probability of settlement increases or decreases, however, depends on the correlation of the defendants' probabilities of winning at trial. As the correlation between the defendants' probabilities of success at trial increases, the probability of settlement decreases. We include a variable called joint and several which is equal to one in states which have weakened the joint and several rule so that some defendants (e.g., a defendant responsible for less than 50 percent of the injury) may not be assessed more than their relative contribution. Joint and several could be either positively or negatively signed. Lawyers paid on a contingency fee basis are willing to settle for lower amounts than their clients because the lawyers, not the clients, bear most of the costs of a trial (Miller 1987; Thomason 1991). When information is imperfect, lawyers may convince plaintiffs to settle even when a better-informed plaintiff would prefer to go to trial. In some states contingency fees are capped, limited, or court reviewed, while in others any fee agreed upon by the plaintiff is acceptable. No limit is a dummy variable set equal to one in states with no limits on contingency fees. We expect that settlements will be more likely in states that have no limits on contingency fees.
Our sample of cases underrepresents settlements and overrepresents trials as compared to population proportions. To rebalance our sample the settlement equation is estimated using the weighted exogenous sample maximum likelihood estimator (WESML) of Manski and Lerman (1977).23 In our application, the WESML is essentially a weighted probit model where the weights are equal to population proportions divided by sample proportions. A number of studies have found that approximately 10 percent of tort cases go to trial; we therefore use 10 percent as our estimate of the population proportion of trials to settlements.24, 25 Plaintiff win equation The probability that the plaintiff wins is estimated separately for judge and jury trials using a probit model. To account for different decision standards we include dummy variables for the case types, medical malpractice, product liability, auto, and premises liability. Life expectancy is included to account for any differences in the probability of winning a case in which a death was involved. Some states allow a “products defense” in product liability cases. A typical products defense might allow a defendant to claim that the product, say a knife, was “inherently dangerous” and thus injuries from ordinary use do not impose liability on the defendant. Products defense is a dummy variable set equal to one in product liability cases in states allowing a products defense. Heckit results The results of the selection equations are presented in Table 14.3. Since we have discussed the results from a similar settlement equation at length elsewhere (see Helland and Tabarrok, 1999) we will mention only a few variables briefly. All of the variables in the model are highly statistically significant and their signs are as expected. The sign on var P is positive, indicating, as the Priest–Klein model predicts, that the more uncertain the trial outcome (win/lose) the greater the probability of going to trial. 
Settlements are more likely (trials less likely) in states with no limits on contingency fees. Cases involving a death are more likely to settle than other cases. Higher expected court costs (as measured by expected time to trial) result in more settlements (a lower probability of going to trial). We now examine in more detail the choice of forum equation, the win equations, and the award equations. The descriptive statistics are given in Table 14.A1 in Appendix A. Forum choice results Juries are selected in 90 percent of the personal injury cases in our sample. If judges and juries grant similar awards—as our results indicate—why are most
Table 14.3 Forum choice, trial, and win sequential probit results Variable Constant
1.873*** (.051)
Expected years of life left if defendant died Product liability Medical malpractice Auto
Jury win probit
Judge win LR test χ probit
−5.968*** (.704) −.033***
.227*** (.011) −.0025
.283*** (.033) −.089***
(.008) .346*** (.0418) 1.332*** (.107)
(.006) −.231*** (.034) −.595*** (.02) .19*** (.014) −.265*** (.017)
(.019) −.5** (.128) −.936*** (.093) .66*** (.042) −.399*** (.054)
−.125*** (.046)
−.045 (.207)
Forum choice Trial probit probit
−1.02*** (.045)
Premise liability Joint and several liability
(.012) −.251***
−.195***
(.0181)
(.0269) −.547*** (.0137)
Child
Default Plaintiff request Defendant request
−.575*** (.053) −.271*** (.0164) .181*** (.0154) .773*** (.025)
Expected time to triala
13.26*** 114.55*** 5.656**
.141
−.628*** (.106) .21*** (.023) 34.69***
Expected award Variance of expected award (risk) Var p Number of cases
4.18*
(.026) −.059***
Products defense
Time difference
18.879***
.144***
No limit on contingency fees
Number of defendants
2.54
59,304
(2.48) −.0238*** (.003) 86,733 53,335
5969
Notes: *, **, *** Significant at the 0.1, 0.05, and 0.01 levels, respectively. Asymptotic standard errors in parentheses. a The weighted average of expected time until a jury trial and expected time until a trial before a judge.
personal injury trials held before juries? Two reasons help to explain this. First, both the plaintiff and the defendant have the right to a trial by jury, so bench trials occur only when both the plaintiff and the defendant prefer a judge. Assume that there are no differences in the costs of trying a case before a judge or jury, and assume that judges and juries are unbiased in the sense that if every case were to be tried before both a jury and a judge, then half of the time the jury would grant a larger award to the plaintiff and half of the time the judge would grant a larger award. Let both plaintiffs and defendants have rational expectations about judge and jury bias. If the errors in plaintiff and defendant expectations are independently distributed, then each side prefers a judge half of the time, both prefer a judge one quarter of the time, and so 75 percent of trials will be jury trials. As the correlation of defendant and plaintiff errors increases, the probability of a jury trial increases. When errors are perfectly correlated so that whenever the plaintiff believes the jury to be in his favor the defendant agrees (i.e., thinks a judge would be in the defendant's favor) and vice versa, then 100 percent of trials will be jury trials. Thus it is not difficult to reconcile an observation of 90 percent jury trials, even if there are no differences in awards between judges and juries. If juries are slightly biased toward plaintiffs (or defendants, although most observers suggest this is not the case) then the reconciliation holds a fortiori. A second reason for the predominance of jury trials is that if plaintiff lawyers think that juries are biased, then jury trials will predominate even if judges and juries grant identical awards on average. Note that if judges and juries are unbiased, then rationality does not require lawyer perceptions to match reality since there is no cost to false perceptions. 
Furthermore, since win rates and mean awards differ significantly by case type, false lawyer perceptions can easily drive average awards and win rates in judge and jury trials far apart. It’s possible that a “perceptions equilibrium” may arise in which average awards and win rates are driven in just such a way that the false perceptions of lawyers appear to be verified in the data. Although we find some differences in judge and jury decisions, false lawyer perceptions seemingly verified by, on average, low judge awards and high jury awards may be responsible for some of the predominance of jury trials in personal injury cases. If false and self-reinforcing perceptions are responsible for forum choice decisions, then we would expect to see quite different and arbitrary forum choices across different classes of cases. Clermont and Eisenberg (1992) examine forum choice across contract, personal property torts, fraud, personal injury, and other case types and find that a model of false and sometimes self-reinforcing lawyer perceptions is the best explanation for the data. Given a set of perhaps false baseline perceptions, forum choice decisions made on the margin will still be rationally responsive to other considerations. We find, for example, that in states where the default trial is a judge trial, jury trials are 4.9 percent less likely than in other states. The coefficient on time difference is also significant and negative. A one standard deviation increase in the time to jury trial relative to a judge trial decreases the probability of a jury trial by 2.4 percent. Both plaintiff request and defense
request are positive, which indicates that the plaintiff chooses a jury trial when the expected jury award rises above the expected judge award, and the defendant chooses a jury trial when the expected jury award falls below the expected judge award. Defendants appear to be slightly more sensitive to differences in the expected award across forums than are plaintiffs. If the expected jury award rises above the expected judge award by $1,000, the probability of a jury trial increases by 1.6 percent, but if the expected jury award falls below the expected judge award by $1,000, the probability of a trial increases by 2.7 percent. Adding a second defendant reduces the probability of a jury trial by 3 percent. This is consistent with the theory that both plaintiffs and defendants prefer judges in more complex cases. Finally and importantly, auto cases are 18 percent less likely to be tried before a jury than nonauto cases.26 An important aspect of auto cases is that they are more symmetrical than most other cases. Auto cases often occur between two individuals (rather than between an individual and a business), both of whom are injured and neither of whom has much deeper pockets than the other. Each of these aspects of symmetry makes jury bias less likely. Thus in auto cases both defendant and plaintiff should be more likely to take advantage of faster judicial decision making by accepting a bench rather than a jury trial. Jury win equation Results from the jury win equation are presented in column 4 of Table 14.3. Win rates are significantly lower than average in product liability, medical malpractice, and premises liability cases, lower in states and cases in which a products defense is applicable, and higher than average in auto cases. Win rates do not appear to differ from average in death cases. The results for the judge equation are given in column 5 of Table 14.3. 
Results are similar in sign in judge trials, except that death cases before judges significantly reduce the chances of winning and products defense rules have no impact. We discuss differences between judge and jury win rates at greater length in Section 5.

Award equation

The award equation found in column 1 of Table 14.4 is similar, although not identical, to the OLS equation. Of importance, the inclusion of the inverse Mills ratios, which control for unobserved correlation of the error terms across the selection and award equations, significantly increases the fraction of the difference in judge and jury means that can be explained by differences in the sample. If the sample of cases actually decided by a judge had instead been decided by a jury, the average award in that sample would have been 47.1 percent of the average award in the jury sample. Thus 77 percent of the difference in the average judge and jury award
Eric Helland and Alexander Tabarrok
Table 14.4 Award equation results for Heckit estimation
Variable | 3-level Heckit jury | 3-level Heckit judge | F test
Constant | 10.96*** (1.15) | 15.72*** (3.19) | 1.96
Number of defendants | .66*** (.0606) | .666*** (.162) | .001
Expected years of life left | .35*** (.026) | .322 (.275) | .0009
Major injury | .862*** (.0792) | −.58* (.34) | 16.44***
Minor injury | −.926*** (.0715) | −1.65*** (.324) | 4.8**
Emotional distress | −1.06*** (.0843) | −1.92*** (.354) | 5.49**
Bad faith | −.013 (.123) | −.966* (.518) | 3.37*
Male | .325*** (.024) | .157** (.08) | 4.05**
Premises liability | −.726** (.289) | −.079 (1.3) | .361
Medical malpractice | −.693 (.681) | 1.89 (3.38) | .563
Product liability | −.112 (.327) | 1.03 (1.78) | .398
Auto | −.336* (.193) | −.555 (1.71) | .016
Poverty | 3.03*** (.233) | −2.64*** (.86) | 40.15***
Joint and several liability | −.055 (.0568) | −.02 (.16) | .198
Noneconomic cap | −.479*** (.032) | −.189* (.113) | 6.04**
Collateral sources | .250*** (.024) | .0042 (.118) | 4.14**
No punitive | −.152 (.144) | .186 (.586) | .315
Punitive cap | −.146*** (.024) | −.07 (.084) | .749
Evidence standard | .106*** (.027) | .173 (.116) | .313
IMR trial mode | −1.00*** (.172) | .399 (.258) | 20.3***
IMR settle | −.866*** (.0623) | −1.73*** (.228) | 13.5***
IMR win | 3.41** (1.75) | −1.53 (5.33) | .779
Number of cases | 30,226 | 4043 | 34,269
Notes: *, **, *** Significant at the 0.1, 0.05, and 0.01 levels, respectively. Correct standard errors in parentheses; see text.
Selection effects and the jury
can be explained by differences in the sample of cases appearing before judges and juries (100 − 47.1)/(100 − 31) = 0.766.
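The selection terms in Table 14.4 (IMR trial mode, IMR settle, IMR win) are inverse Mills ratios. As a rough illustration only, the sketch below computes the textbook ratio for a single probit selection equation in the spirit of Heckman (1979); it is not the three-level procedure used in the chapter, and the function names and index values are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z: float) -> float:
    """phi(z) / Phi(z): the mean of a standard normal selection error,
    conditional on the observation being selected (index z + error > 0)."""
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical first-stage probit index values (assumed, not from the chapter).
for z in (-1.0, 0.0, 1.0):
    print(f"index {z:+.1f} -> inverse Mills ratio {inverse_mills_ratio(z):.4f}")
```

The ratio is larger for cases that were unlikely to be selected, which is why adding it as a regressor in the award equation can absorb the part of the error that is correlated with selection.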
5. What would judges have done with trials that went to juries? We have far more observations on jury trials than on judge trials, so the jury equation is better estimated. Since the jury equation is better estimated we can get better estimates of what juries would have done with the judge sample than what judges would have done with the jury sample. Nevertheless, the latter question is also interesting and is not equivalent to the former. Suppose that juries receive cases of type A and judges receive cases of type B. It’s possible that juries would treat every type B case just as judges would, but that judges would treat type A cases quite differently than juries. We use the same three sets of variables as above. Recall that the actual jury mean is 3.18 times as high as the actual judge mean. Using only case type variables, we find that if judges had tried the cases that were actually tried by juries, the mean award would have been 1.58 times as high as the actual judge mean.27 Case type variables alone therefore explain 26.6 percent of the difference in mean awards [(158–100)/(318–100) = 26.6]. Using our second set of variables, which adds injury and law variables, we find that if judges had decided the sample of cases going to juries, awards in that sample would have been 2.08 times as high as the actual judge mean. Thus our second set of variables increases the explanatory power to approximately 50 percent [(208–100)/(318–100) = 49.5]. Adding the sample selection effects, we find that if judges had decided cases which actually went to juries, awards in that sample would have been 2.36 times as high as in the actual judge sample. Thus our most comprehensive set of variables is able to explain 62.5 percent of the difference in mean awards. Although we find that three-quarters to two-thirds of the differences in mean awards is due to sample differences, there is still a significant unexplained difference in mean awards.
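The bracketed calculations in this section all follow one formula: the counterfactual ratio minus one, divided by the actual ratio minus one. A minimal sketch, using only the ratios quoted in the text (the function and dictionary names are illustrative, not the authors'):

```python
def share_explained(counterfactual_ratio: float, actual_ratio: float) -> float:
    """Fraction of the jury/judge gap in mean awards that is closed when judges
    are assigned the jury sample. Both ratios are relative to the actual judge
    mean, which is normalized to 1."""
    return (counterfactual_ratio - 1.0) / (actual_ratio - 1.0)


ACTUAL_JURY_TO_JUDGE = 3.18  # actual jury mean relative to actual judge mean

# Predicted judge mean on the jury sample, by variable set (from the text).
counterfactuals = {
    "case type only": 1.58,
    "plus injury and law variables": 2.08,
    "plus sample selection terms": 2.36,
}

shares = {
    label: share_explained(ratio, ACTUAL_JURY_TO_JUDGE)
    for label, ratio in counterfactuals.items()
}

for label, share in shares.items():
    print(f"{label}: {100 * share:.1f} percent of the gap explained")
```

Computed from the rounded ratios, the third share comes out at about 62.4 percent rather than the 62.5 percent reported, a gap that is plausibly due to rounding in the published ratios.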
6. Comparing the decision process of judges and juries Awards A more detailed examination of the judge and jury equations sheds light on where differences in decision making occur.28 Column 2 of Table 14.4 contains a judge award equation comparable to the Heckit model for juries in column 1. In column 3 we give F tests of the difference in coefficient values across the two equations. The F tests indicate that there are systematic differences between judges and juries in the impact that various factors have on awards. Bearing in mind that the judge equation is not as well estimated as the jury equation and that some of the coefficient values in the judge equation
appear implausible, we can gain some insights by comparing the judge and jury coefficients. Juries appear to be more sympathetic to injured plaintiffs than are judges. Holding the sample constant, juries give larger awards than judges for every injury category, with the exception of expected years of life left, for which no significant judge/jury differences are found. Caps on damages for pain and suffering (noneconomic caps) cause a greater decline in awards when the case is decided by a jury than when the case is decided by a judge. The greater effectiveness of caps on juries is also consistent with the evidence on injuries discussed above. If juries grant larger awards than judges for pain and suffering when they are allowed to do so, it follows that juries rather than judges will be constrained by caps. Since judges grant fewer large pain and suffering awards to begin with, we find that caps on judges are “less effective” (because they are less necessary). The collateral sources rule also has a different impact on juries than judges; it increases the award in jury trials but has no effect on trials before a judge. Again, this effect is consistent with a story in which juries neutralize a weakening of the collateral sources rule by topping awards up, while judges, perhaps out of greater respect for the law, do not try to offset the law’s intended effect. The most robust difference between judges and juries arises in the impact of local poverty. A one standard deviation increase in the local poverty rate raises jury awards by $22,913, but causes a slight reduction in judge awards of $3,394 (evaluated at the means). Poverty was included in the awards regression under the hypothesis that less affluent juries might be more responsive to income redistribution via the courts. Under this reasoning we would expect poverty to affect jury awards but not judge awards.
Although statistically significant, the negative effect of poverty on judge awards is relatively small, and thus our results are consistent with the theory that less affluent juries are more sympathetic to plaintiffs. The influence of local poverty on juries is the most important explanation for the “unexplained” difference in average awards. If poverty had no effect on juries, that is, if the coefficient on poverty in the jury equation were zero, then we could have explained 100 percent of the difference in average awards on the basis of sample differences. In other words, if poverty had no influence on jury awards, juries would have given the same average award to the judge sample of cases as judges actually gave. Two of the selection terms differ across judges and juries. Not surprisingly, the coefficient on the inverse Mills ratio generated by the trial forum equation is different, which suggests that the sample of cases going to juries and judges is different. More interesting is the fact that the inverse Mills ratio for settlement has a different effect for jury trials than for judge trials. The data suggest therefore that settlement behavior is different depending on whether the case is scheduled to be decided by a judge or a jury. Unfortunately we are unable to investigate this effect in detail since we do not
have data on whether settled cases were scheduled to be decided by a judge or jury.29 Win rates – sample selection or differences in decision processes? We turn now to a more complete discussion of win rates across judges and juries. The average win rate in jury cases is 56.67 percent and in judge cases is 67.73 percent. Using the coefficients for the jury win equation in column 4 of Table 14.3, we can estimate what the win rate would have been if the sample of cases going to judges had instead been decided by juries: 60.04 percent. Sample selection can thus explain about 30 percent of the difference in judge and jury win rates [(60.04–56.67)/(67.73–56.67) = 0.3]. Since most of the difference in win rates appears not to be caused by sample selection, there may be significant differences in win decision processes across judges and juries. Using a likelihood ratio (LR) test, given in column 6 of Table 14.3, we can compare the coefficients from the jury and judge win equations given in columns 4 and 5 of Table 14.3. The test rejects at the 10 percent level or greater the null hypothesis of identical judge and jury win coefficients for every variable except products defense and the constant. Marginal effects from the jury and judge win equations are presented in Table 14.5. Significantly, almost all of the marginal effects run in the opposite direction to that of the average win rate. The average win rate is higher for judges than for juries, but this is almost entirely due to the higher win rate of auto cases before judges than before juries. Consistent with the anecdotal evidence, plaintiffs with product liability and medical malpractice cases are more likely to win before juries than before judges (although these cases are harder to win than the average in both forums).
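Note 26 describes how marginal effects are computed for the forum choice equation: an exact probability difference for dummy variables, and a derivative at the means for continuous variables. The sketch below illustrates that calculation under the assumption of a probit link; the link function, coefficients, and means are all hypothetical placeholders and are not the estimates behind Table 14.3 or Table 14.5.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dummy_effect(beta: dict, means: dict, name: str) -> float:
    """Exact change in the predicted probability when dummy `name` moves
    from 0 to 1, with every other variable held at its mean."""
    base = sum(beta[k] * means[k] for k in beta if k != name)
    return normal_cdf(base + beta[name]) - normal_cdf(base)


def continuous_effect(beta: dict, means: dict, name: str) -> float:
    """Derivative of the predicted probability with respect to `name`,
    evaluated at the means of all variables."""
    index = sum(beta[k] * means[k] for k in beta)
    return normal_pdf(index) * beta[name]


# Hypothetical probit coefficients and sample means (assumed for illustration).
beta = {"constant": 0.2, "auto": 0.3, "years_left": 0.01}
means = {"constant": 1.0, "auto": 0.4, "years_left": 0.27}

print(f"auto (dummy): {dummy_effect(beta, means, 'auto'):+.4f}")
print(f"years_left (continuous): {continuous_effect(beta, means, 'years_left'):+.4f}")
```

Because the dummy effect is a difference of two probabilities rather than a slope, it can differ noticeably from the derivative-based approximation when the coefficient is large.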
7. Discussion Bernstein (1996) argues that in an ideal world juries would be “eliminated” for civil trials. He continues that, unfortunately, this would be unconstitutional in most states. As a result, the most “important measure that legislatures can take to eliminate the pernicious effects of civil juries is to remove the issue of damages from the jury and put it in the hands of judges.”
Table 14.5 Marginal effects, judge and jury win equations
Variable | Jury | Judge
Expected years of life left | 0.09% | −3.1%
Product liability | −9.1% | −19.7%
Medical malpractice | −23.3% | −35.5%
Auto | 7.4% | 23.1%
Premises liability | −10.5% | −14.8%
Product defense | −4.9% | −1.5%
Our results show that such a reform would have a smaller effect on awards than Bernstein and other tort reformers imagine. There is some truth, however, to the views of the tort reformers. Juries do grant systematically larger awards to injured plaintiffs than judges. Juries also appear to be more receptive to “redistribute the wealth” arguments than judges. In particular, juries drawn from pools with high poverty rates grant systematically larger awards than judges and than juries drawn from more affluent regions. Win rates in product liability and medical malpractice cases are higher before juries than judges. The differences in judge and jury decision making we have discovered, however, explain only one-quarter to one-third of the difference in average awards across judges and juries. Three-quarters to two-thirds of the difference in average awards is due not to differences in decision making but to differences in the sample of cases appearing before judges and juries. The difference in average awards across judges and juries gives a very misleading picture of what would happen if the United States followed the rest of the world and shifted decision making from the jury to the judge. Tort reformers often pin the blame for high awards on juries, but the revolution in product liability and medical malpractice law which has occurred over the past 40 years has been a product not of juries but of judges (Epstein 1980; Priest 1991). If juries have granted large awards in class action suits, it is the judges who have rewritten the law to enable those suits to be brought, often on the flimsiest of evidence. From this perspective, it’s not surprising that judges grant similar awards to juries: the judges are leading the charge.
Appendix A: Descriptive statistics
Table 14.A1 Descriptive statistics
Variable | Mean | Std. dev.
Jury trial awards
Log(jury award) | 11.24 | 2.187
Number of defendants | .2336 | .4055
Expected years of life left | .2433 | .9215
Major injury | .1132 | .3169
Minor injury | .7268 | .4456
Emotional distress | .05102 | .22
Bad faith | .0126 | .1117
Male | .51 | .5
Premises liability | .152 | .3591
Medical malpractice | .0729 | .26
Product liability | .04744 | .2126
Auto | .4752 | .4994
Poverty | .1281 | .05512
Joint and several liability | .2366 | .42499
Noneconomic cap | .1908 | .3929
Collateral sources | .4983 | .5
No punitive | .0711 | .08404
Punitive cap | .52 | .4996
Evidence standard | .322 | .4674
Judge trial awards
Log(judge award) | 10.027 | 1.853
Number of defendants | .245 | .41094
Expected years of life left | .156 | .77
Major injury | .121 | .326
Minor injury | .766 | .4237
Emotional distress | .0493 | .2164
Bad faith | .0866 | .09268
Male | .48 | .5
Premises liability | .0933 | .2909
Medical malpractice | .0151 | .122
Product liability | .0158 | .125
Auto | .649 | .4773
Poverty | .138 | .0583
Joint and several liability | .203 | .4022
Noneconomic cap | .231 | .421
Collateral sources | .273 | .446
No punitive | .0047 | .0684
Punitive cap | .606 | .489
Evidence standard | .262 | .4396
Forum choice
Forum choice (jury = 1) | .899 | .3008
Auto | .4197 | .4935
Number of defendants | .2332 | .4092
Time difference | 1.009 | .2321
Default (judge = 1) | .229 | .4204
Plaintiff request | .455 | .659
Defendant request | .662 | .792
Trial equation
Does the case go to trial (yes = 1) | .683 | .465
Product liability | .0474 | .212
Medical malpractice | .0955 | .294
Expected time to trial | 6.663 | .157
Number of defendants | .239 | .4153
Child | .434 | .496
Expected years of life left | .288 | 1.003
Joint and several liability | .225 | .417
No limit on contingency fees | .47 | .499
Jury win equation
Plaintiff win at a jury trial (yes = 1) | .567 | .4955
Product liability | .057 | .2315
Auto | .407 | .4913
Medical malpractice | .116 | .3203
Premises liability | .178 | .3822
Expected years of life left | .2705 | .9663
Products defense | .0287 | .1671
Judge win equation
Plaintiff win at a judge trial (yes = 1) | .677 | .468
Product liability | .028 | .165
Auto | .534 | .499
Medical malpractice | .0426 | .202
Premises liability | .141 | .348
Expected years of life left | .2171 | .906
Products defense | .01 | .0998
Appendix B: Duration results
Table 14.B1 Time to trial results
Variable | Logistic hazard model jury | Logistic hazard model judge
Constant | 7.15*** (.419) | 5.109*** (.273)
Death | .133*** (.023) | .073 (.154)
Major injury | .143 (.213) | −.245* (.135)
Minor injury | .03 (.02) | −.14 (.126)
Emotional distress | −.097 (.023) | −.153 (.146)
Premises liability | −.005 (.104) | .204*** (.077)
Medical malpractice | .211*** (.119) | .292** (.124)
Product liability | .212*** (.016) | .327** (.14)
Log (number of defendants) | .046*** (.004) | .173*** (.026)
Auto | −.164*** (.009) | −.42*** (.054)
Number of cases filed per judge in the state | −.067*** (.005) | .1** (.034)
Number of cases | 36,896 | 5496
Notes: *, **, *** Significant at the 0.1, 0.05, and 0.01 levels, respectively. Asymptotic standard errors in parentheses.
Notes
1 In other cases the anecdotal evidence is just wrong. For example, Vidmar (1997) points out that the woman in the well-known case, who was supposedly awarded damages because a CAT scan destroyed her psychic abilities, had in fact suffered permanent brain damage due to an allergic reaction to a contrast dye and collected damages because of her inability to work. Her job merely happened to be that of a psychic.
2 Reported in United States General Accounting Office (U.S. GAO), Report to Congressional Requesters, Medical Malpractice: Case Study in North Carolina (Dec. 1986). This and many other similar quotations can also be found in Vidmar (1997).
3 Schuck (1993) reviews a number of jury reform proposals. Bernstein (1996) is particularly antagonistic toward juries.
4 Of interest, this rate of agreement is almost identical to the appeals courts’ affirmation rate of trial verdicts (81 percent). See Clermont and Eisenberg (1999).
5 The Kalven and Zeisel results are supported by other research showing that judges and juries reach similar decisions in similar cases and that juries appear to respond to information in reasonable ways. A number of articles in Litan (1993) make this point; see especially Lempert (1993:235), who writes, “The weight of evidence indicates that juries can reach rationally defensible verdicts in complex cases [and] that we cannot assume that judges in complex cases will perform better than juries. . . .” The literature on the quality of jury decision making is reviewed in Hans and Vidmar (1986); see also Clermont and Eisenberg (1992).
6 JVR market their data to lawyers who are seeking to ascertain the value of their cases by comparing them with similar cases. In other words, lawyers use JVR data to create rational expectations of case outcomes. The JVR dataset is the largest and most extensive dataset on state court records currently extant. In our estimation the dataset is of much higher quality (in terms of accuracy, missing records, size, and extent of coverage) than most government-generated datasets.
7 The dataset originally contained two extreme outliers, awards of $4.25 and $5 billion. We eliminated these outliers from all computations. We also eliminated all class action suits. Thus the injured party in every case in our sample is an individual. Some cases have multiple defendants.
8 We use a biweight kernel with smoothing parameter optimized on the assumption that the underlying data are normally distributed [see Silverman (1986) for more information on kernel estimation]. The use of other kernels and/or smoothing parameters does not materially affect the results.
9 Tabarrok and Helland (1999) and Helland and Tabarrok (1999) show that awards are higher in product liability and medical malpractice cases than in other cases even after controlling for injuries.
10 Later we report what judges would have done had they decided the cases that actually went to juries.
11 Under the null hypothesis we initially expect judge and jury awards to be the same. The “unexplained” difference is thus 100% − 31% = 69%. If we can explain 100 − x of this difference, then the fraction of the difference explained is (100 − x)/(100 − 31). Note that when x = 31, 100 percent of the difference is explained and when x = 100, as would have been the case if the sample of case types going to judge and jury trial were the same, then none of the difference is explained. Letting x = 63, we have that 53 percent of the difference in averages is explained by differences in case types.
12 No state prohibits punitive damages absolutely and completely. Punitive damages are prohibited in New Hampshire, for example, except where explicitly allowed for by statute.
13 The American Tort Reform Association (ATRA) home page (http://www.atra.org) contains information on tort reform legislation by state.
14 Some of these results may be subject to endogeneity problems (perhaps states with above average punitive damage awards are more likely to pass evidence standards than other states), so we cannot make definitive conclusions about the effect of various laws. “Reduced form estimates,” however, are all we need in order to examine the role of sample effects in explaining differences in average judge and jury awards.
15 There are no class action suits in our sample.
16 We also included a specification with poverty and poverty squared. The high correlation of these variables made interpretation more difficult than with the simpler specification used in the text, but the comparison across judges and juries was similar.
17 The case category variables explain a larger fraction of the judge/jury difference than the “injury” set of variables, regardless of the order in which variables are added. Since the marginal explanatory power does vary, however, with the order in which variables are added, the total explanatory power is the more important result.
18 An estimation procedure similar to that described here was first used in Danzon and Lillard (1982).
19 In most states the right to a jury trial is protected by the state constitution, but in some it is based only on statute.
20 Surveys indicate that both plaintiffs and defendants prefer shorter time to trials (see, e.g., Miller, 1992). It is sometimes argued that defendants want longer times to trial to avoid paying damages. We find this theory dubious as longer times to trial increase everyone’s costs and plaintiff lawyers are sure to correct damage measures for inflation and interest. Since either the defendant or plaintiff can request a jury trial, however, this possibility does not change the expected sign or interpretation of our results.
21 Cooter and Rubinfeld (1989) review the literature.
22 That is, Pr(judge trial) * expected time until trial before a judge + Pr(jury trial) * expected time until trial before a jury.
23 The WESML is applied in a problem similar to ours by Boyes, Hoffman, and Low (1989).
24 In their survey of the literature Cooter and Rubinfeld (1989:1070) note, “A typical finding is that 10 disputes settle out of court for every one that is tried.” Using one month of data from 33 courts, the National Center for State Courts (1994) finds that approximately 5 percent of tort cases go to trial. Using data on 2996 torts in federal court, Waldfogel (1995) finds an average trial rate of 18.7 percent. Danzon and Lillard (1983) find that 12 percent of medical malpractice cases go to trial.
25 Our results are robust with respect to varying the weights in the WESML estimator.
26 For dummy variables (d) we calculate the exact difference between the probability of jury trial when d = 0 and when d = 1 when all other variables are at their means. For continuous variables we calculate marginal effects using the derivative at the mean of all variables.
27 We only report equation results for the most inclusive judge equation, given in column 2 of Table 14.4. Other results are available from the authors upon request.
28 The lower bounds of what can be explained by differences in samples are 77 percent and 62.3 percent, since the inclusion of more variables, even random ones, would allow more of the difference to be explained. We are confident, however, that the additional explanatory power of any further variables is low. This specification includes two variables added to an earlier specification at the request of referees. The two additional variables raised the explanatory power by less than 2 percent.
29 Put differently, we are forced by data limitations to assume that cases are settled before the trial forum is decided upon.
References
Aczel, Amir. 1996. Complete Business Statistics, 3rd ed. Homewood, IL: Richard Irwin.
Adler, Stephen J. 1994. The Jury: Trial and Error in the American Courtroom. New York: Random House.
Bernstein, D. E. 1996. “Procedural Tort Reform: Lessons from Other Nations,” 19 Regulation. Available at http://www.cato.org/pubs/regulation/reg19n1.html.
Boyes, W. J., D. L. Hoffman, and S. A. Low. 1989. “An Econometric Analysis of the Bank Credit Scoring Problem,” 40 Journal of Econometrics 3–14.
Clermont, Kevin M., and Theodore Eisenberg. 1992. “Trial by Jury or Judge: Transcending Empiricism,” 77 Cornell Law Review 1124–1177.
Clermont, Kevin M., and Theodore Eisenberg. 1999. “Appeal from Jury or Judge Trial: Defendants’ Advantage.” Presented at the annual meeting of the American Law and Economics Association, Yale Law School, New Haven, CT.
Cooter, R. D., and D. L. Rubinfeld. 1989. “Economic Analysis of Legal Disputes and Their Resolution,” XXVII Journal of Economic Literature 1067–1097.
Danzon, P. M., and L. A. Lillard. 1982. The Resolution of Medical Malpractice Claims: Modeling the Bargaining Process. R-2792-ICJ Institute for Civil Justice (Rand Corporation, Santa Monica, CA).
Danzon, P. M., and L. A. Lillard. 1983. “Settlement Out of Court: The Disposition of Medical Malpractice Claims,” 12 Journal of Legal Studies 345–377.
Diamond, S. S. 1983. “Order in the Court: Consistency in Criminal Court Decisions,” in C. J. Scheirer and B. L. Hammonds, eds. The Master Lecture Series: Psychology and the Law, vol. 2. Washington, D.C.: American Psychological Association, pp. 123–146.
Donohue III, J. J. 1994. “The Effect of Joint and Several Liability On Settlement: Comment on Kornhauser and Revesz,” XXIII (1 pt. 2) Journal of Legal Studies 517–558.
Epstein, R. A. 1980. Modern Product Liability. Westport, Conn.: Quorum Books.
Gould, J. P. 1973. “The Economics of Legal Conflicts,” 2 Journal of Legal Studies 279–300.
Hans, V. P., and N. Vidmar. 1986. Judging the Jury. New York: Plenum Press.
Haydock, R., and J. Sonsteng. 1991. Trial: Theories, Tactics, Techniques (American Casebook Series). St. Paul, Minn.: West.
Heckman, J. 1979. “Sample Selection Bias As a Specification Error,” 47 Econometrica 153–161.
Helland, Eric, and Alex Tabarrok. 1999a. “The Effect of Electoral Institutions on Tort Awards,” working paper. Available at http://www.independent.org.
Izard, R. A. 1998. Lawyers and Lawsuits: A Guide to Litigation. New York: MacMillan Spectrum.
Kalven, Harry Jr., and Hans Zeisel. 1966. The American Jury. Boston: Little, Brown.
Kornhauser, L. A., and R. L. Revesz. 1994. “Multidefendant Settlements: The Impact of Joint and Several Liability,” XXIII (1 pt. 1) Journal of Legal Studies 41–76.
Lempert, R. 1993. “Civil Juries and Complex Cases,” in R. E. Litan, ed. Verdict: Assessing the Civil Jury System. Washington, D.C.: Brookings Institution, pp. 181–247.
Litan, R. E. 1993. Verdict: Assessing the Civil Jury System. Washington, D.C.: Brookings Institution.
Manski, C. F., and S. R. Lerman. 1977. “The Estimation of Choice Probabilities from Choice-Based Samples,” 45 Econometrica 1977–1988.
Miller, G. P. 1987. “Some Agency Problems in Settlement,” XVI Journal of Legal Studies 189–215.
Miller, Neal. 1992. “An Empirical Study of Forum Choices in Removal Cases under Diversity and Federal Question Jurisdiction,” 41 American University Law Review 369–452.
National Center for State Courts. 1994. State Court Caseload Statistics Annual Report 1992. Williamsburg, VA: NCSC.
Posner, R. A. 1973. “An Economic Approach to Legal Procedure and Judicial Administration,” 2 Journal of Legal Studies 399–459.
Priest, G. 1991. “The Modern Expansion of Tort Liability: Its Sources, Its Effects, and Its Reform,” 5(3) Journal of Economic Perspectives 31–50.
Priest, G., and B. Klein. 1984. “The Selection of Disputes for Litigation,” 13 Journal of Legal Studies 1–55.
Schuck, P. H. 1993. “Mapping the Debate on Jury Reform,” in R. E. Litan, ed. Verdict: Assessing the Civil Jury System. Washington, D.C.: Brookings Institution, pp. 306–340.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Tabarrok, A., and E. Helland. 1999. “Court Politics: The Political Economy of Tort Awards,” XLII (1 pt. 1) Journal of Law and Economics 157–188.
Thomason, T. 1991. “Are Attorneys Paid What They’re Worth? Contingent Fees and the Settlement Process,” XX Journal of Legal Studies 187–233.
Vidmar, Neil. 1997. Medical Malpractice and the American Jury. Ann Arbor: University of Michigan Press.
Waldfogel, J. 1995. “The Selection Hypothesis and the Relationship Between Trial and Plaintiff Victory,” 103 Journal of Political Economy 229–260.
15 Reasonable doubt and the optimal magnitude of fines Should the penalty fit the crime? James Andreoni *
1. Introduction Models of the enforcement–compliance relationship generally treat both the probability and magnitude of fines as independent choice variables of the government. This approach has led Becker (1968) and others1 to conclude that it is optimal to set uniformly maximal penalties for all crimes, and to set the probability of conviction at the minimum level necessary to enforce compliance with the law. While this normative prescription is natural and intuitive, there still remains a larger positive question of why in most nations of the world penalties are not uniformly high, but rather rise with the severity of the crime. One possible resolution of the positive and normative questions may lie in the fact that penalties and probabilities of conviction are not generally independent. Convictions are typically determined by judges or jurors who are instructed to convict only if the evidence convinces them “beyond a reasonable doubt” that the accused is guilty. Psychologists, however, have found that a juror’s willingness to convict may be influenced by more than just the evidence. Jurors are very sensitive to the potential penalties that defendants may pay, with higher penalties leading to lower probabilities of conviction (Vidmar, 1972). This effect is evident in recent econometric studies on the deterrent effect of penalties that show that higher penalties reduce the number of convictions (Snyder, 1990; Andreoni, 1991), and it is consistent with a great deal of anecdotal evidence relating to the sometimes counterproductive effects of minimum sentence requirements (Lachman, 1981). This suggests that probabilities of conviction may be inversely related to the magnitudes of the penalties. If increasing penalties causes convictions to fall fast enough, then higher penalties could potentially encourage rather than deter crimes. This institutional feature could cause us to revise the normative conclusions about the optimal magnitudes of fines. 
This article presents a simple theory of how a judge or juror determines whether he has been convinced “beyond a reasonable doubt” of a defendant’s guilt. I show that the determination of how much doubt is “reasonable” is likely to depend on the level of punishment perceived to befall the accused,
with higher penalties reducing the range of reasonable doubts. As a result, any increase in penalties is likely to result in reduced probabilities of conviction.2 I then examine the conditions that are necessary for an increase in penalties to reduce this probability so much that crimes are actually encouraged. Surprisingly, the requirement is very weak and is likely to be met in practice. Extending the model to the question of maximal deterrence, I find that the institution of the “reasonable doubt test” leads to optimal penalties that rise with the severity of the crime, in contrast to Becker’s (1968) famous finding. This result is important in that it does not rely on any notions of marginal deterrence, or on subjective evaluations of the social cost of incarcerating innocent people, but relies only on the positive model of jury decision making. This model can also be generalized to include the fact that many cases are settled with plea bargains or dismissals rather than jury trials. This is because the outcome of pretrial bargaining is directly related to the penalty attainable in court, and the probability of attaining it (see, e.g., Reinganum (1988)). The model can also be applied to regulation, to labor contracts, and to enforcement–compliance relationships in general. This article is organized as follows. Section 2 presents a model of the juror’s decision problem and of criminal behavior. Section 3 considers the conditions under which increasing penalties may increase crime. Section 4 looks at optimal deterrence. Section 5 discusses other applications and extensions of the reasonable doubt model.
2. A theoretical model of reasonable doubt There is a large literature on the optimal probability and magnitude of fines that explores Becker’s (1968) provocative finding that uniformly maximal penalties will always yield maximal deterrence. Stigler (1970) argues that more severe crimes should receive more severe penalties in order to provide “marginal deterrence.” However, Posner (1985) points out that there is a tradeoff between marginal deterrence and total deterrence and, moreover, that society can still maintain marginal deterrence with uniform penalties by changing the probability of capture. Polinsky and Shavell (1979) show that less than maximal penalties are efficient when there are crimes in which the private benefit to the criminal exceeds the social cost of the criminal activity. In this case, it would not be efficient to deter all crimes. If the private benefit of the crime would never exceed the social cost, maximal penalties would still be optimal. More recent discussions include Kaplow (1989), who shows that maximal penalties may be undesirable because they increase the social cost of risk bearing by those who are not deterred. Rubinfeld and Sappington (1987) show that lower penalties will reduce the social cost of legal expenditures by defendants and prosecutors, while Malik (1990) shows that lower penalties decrease the social cost of apprehending offenders. Neither Kaplow (1989), Rubinfeld and Sappington (1987), nor Malik (1990) offers explanations for
Reasonable doubt and the magnitude of fines
graduated penalties. Two recent studies do find that graduated penalties may be optimal for some range of crimes. Shavell (1989) and Mookherjee and P’ng (1989) demonstrate that when there is joint production in surveillance for some crimes, then optimal deterrence is maintained when penalties rise with the severity of the crime. However, when there is no joint production, as with specific investigation of crimes, then uniformly maximal penalties may again be optimal. Next let us examine a simple model of the decision process of the judge or juror. I shall show that, in general, maximal penalties will not be desirable and that optimal penalties should rise with the severity of the offense, that is, penalties should “fit the crime.”

Juror behavior

For simplicity, I shall refer to the trier-of-fact as the juror, although the model could apply equally well to the decision process of the judge in a nonjury trial. When considering a juror’s motivation for voting to convict or acquit, we must assume that the juror takes his role seriously. That is, we must assume that he weighs, in some manner, the consequences of his vote for himself, for those involved, and for society at large.3 The consequences of correct verdicts are generally good: a guilty person gets a punishment, or an innocent person is freed. The consequences of incorrect verdicts are generally bad: a guilty person is released into society, or an innocent person pays a penalty. I assume in this model that the juror assigns a cost to each of these four possibilities, which one could think of as social costs, and then weighs the value of these costs to himself. When a juror incorrectly convicts an innocent person, then there are costs to the accused, such as the penalty paid, loss of permanent income, and stigma, as well as losses to society from a “miscarriage of justice.” Let the juror’s evaluation of these costs be given by c1.
Let $f$ be the full cost of conviction to the defendant, including penalties paid, loss of income, and stigma. Since $c_1$ must include $f$, write $c_1 = \bar{c} + f$, where $\bar{c} \ge 0$. When a juror incorrectly acquits a guilty person, there may be consequences for society in “turning loose” a potentially dangerous criminal, as well as other social costs from a miscarriage of justice. Let $c_2$ indicate the juror’s evaluation of the costs of incorrect acquittal. Since acquitting perpetrators of more serious crimes is both more “dangerous” and more “unjust,” $c_2$ should be higher the more severe the crime.4 Turning to correct verdicts, assume for simplicity that jurors evaluate correct verdicts the same, regardless of the crime or the potential penalties. That is, the cost of a correct conviction is the same as that of a correct acquittal. This assumes that there is no vengeance or sympathy on the part of jurors that would lead them to take special pleasure in, for instance, “a good hanging” for a notorious criminal. Later I shall consider such effects, but they will only underscore the results of the article.
James Andreoni
Next, assume that the juror weighs these costs for himself. For instance, he must consider how he would feel if he mistakenly convicted an innocent person. We will assume that this disutility can be represented by the function $v(c)$. Again for simplicity, we will normalize the utility from a correct decision to be zero. Hence, $v(c_1), v(c_2) < 0$. Finally, assume that jurors are risk averse. Since we are dealing with losses, this implies $v' < 0$ and $v'' > 0$. Suppose that after hearing the evidence, the juror feels there is a probability $p$ that the defendant is guilty. Hence, the amount of doubt in the mind of the juror is reflected in the extent to which $p$ is less than one. If the juror votes to convict, then the expected utility is $EV_c = p \cdot 0 + (1 - p) \cdot v(c_1)$. If the vote is to acquit, then the expected utility is $EV_a = p \cdot v(c_2) + (1 - p) \cdot 0$. Hence, the juror will vote to convict if $EV_c \ge EV_a$, that is, if

$$(1 - p) \cdot v(c_1) \ge p \cdot v(c_2). \tag{1}$$
Looking at (1) we see that, for any given $c_1$ and $c_2$, there is some critical level of $p$, say $p^*$, such that the juror will convict if and only if $p \ge p^*$. Solving from (1) we get

$$p^* = \frac{v(c_1)}{v(c_1) + v(c_2)}. \tag{2}$$
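The step from (1) to (2) involves a sign change that is easy to miss, because both disutilities are negative. A short sketch of the algebra, using only the definitions above:

```latex
\begin{align*}
(1-p)\,v(c_1) \ge p\,v(c_2)
  &\iff v(c_1) \ge p\,[v(c_1) + v(c_2)] \\
  &\iff p \ge \frac{v(c_1)}{v(c_1) + v(c_2)} = p^*
  \qquad \text{since } v(c_1) + v(c_2) < 0 .
\end{align*}
```

Because the numerator and the denominator are both negative, $p^*$ lies strictly between zero and one.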
This now gives us a model of how a juror will determine reasonable doubt. If the $p$ generated by the evidence satisfies $p \ge p^*$, then the evidence is convincing beyond a reasonable doubt and the vote will be to convict. However, if $p < p^*$, then there is too much doubt, and the juror will vote to acquit. Equation (2) illustrates how this model differs from previous models of jury decision making. Prior studies have not explored the possibility that the utility of jurors in any circumstance may depend on the potential penalties, and that $p^*$ may be a function of those penalties.5 Taking the derivative of (2) with respect to the penalty, $f$, and rearranging, we see that

$$\frac{dp^*}{df} = \frac{v'(c_1)}{v(c_1)}\, p^*(1 - p^*) > 0.$$
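The sign of this derivative can be checked directly with the quotient rule. The sketch below writes $f$ for the penalty and assumes, as in the text, that $c_1 = \bar{c} + f$ (so $dc_1/df = 1$) while $c_2$ does not depend on $f$:

```latex
\begin{align*}
\frac{dp^*}{df}
  &= \frac{v'(c_1)\,[v(c_1)+v(c_2)] - v(c_1)\,v'(c_1)}{[v(c_1)+v(c_2)]^2}
   = \frac{v'(c_1)\,v(c_2)}{[v(c_1)+v(c_2)]^2} \\
  &= \frac{v'(c_1)}{v(c_1)} \cdot \frac{v(c_1)}{v(c_1)+v(c_2)} \cdot \frac{v(c_2)}{v(c_1)+v(c_2)}
   = \frac{v'(c_1)}{v(c_1)}\, p^*(1-p^*) .
\end{align*}
```

Both $v'(c_1)$ and $v(c_1)$ are negative, so their ratio is positive, and $0 < p^* < 1$ makes the whole expression positive.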
Hence, as the penalty upon conviction becomes more strict, so do the juror’s demands on the evidence. Intuitively, increasing the penalty will increase the possible loss from voting for conviction, while the loss from voting for acquittal stays the same. As a result, the juror will demand a higher probability of guilt before voting for conviction. For simplicity, let the probability density function that ex ante describes the possible values for p be described by a uniform distribution. Then, conditional on being apprehended, the probability of being convicted is simply
$1 - p^*$. Hence, the probability of conviction falls as the penalty rises. We see now that if convictions are determined through the reasonable doubt test, then the probability and magnitude of fines are not independent. Penalties cannot be increased without diminishing the chances of gaining a conviction.

Criminal behavior

Suppose that if a criminal commits a crime and is not convicted, then he gets a benefit $b_1$. If he is caught and convicted, then he must pay the penalty $f$ discussed above. In this case he will get a net benefit $b_2 = \bar{b} - f$. Let the criminal’s utility be represented by the function $u(b)$. Furthermore, assume that the criminal is risk averse, so $u' > 0$ and $u'' < 0$. First, consider the probability that a person will be apprehended and convicted, given that the person has committed the crime. Let $\alpha$ be the probability that a person is apprehended. Then the probability that the person will be apprehended and convicted is $q = \alpha(1 - p^*)$. Next, consider the expected utility of not committing the crime. Utility in this contingency includes the state in which the person is untouched by the law, as well as the state in which the person is wrongfully accused and convicted. For simplicity, normalize the expected utility of not committing a crime to be zero. For any given level of penalties, there is no loss in generality from this normalization. However, each time we change the penalty it will affect the utility and the probability of being wrongfully convicted. As a result, we must renormalize utility. But since for most individuals the probability of being wrongly apprehended and convicted is extremely low, the effects of this renormalization would be very slight. Hence we assume that the effect is sufficiently small to be safely ignored. Note that this is not inconsistent with the presumption that there can be a significant probability that a defendant is innocent.
While the individual risk of being wrongfully convicted may be low, the aggregate risk of apprehending the wrong person may still be nonnegligible. Finally, we must consider the expected utility of committing the crime. Since we are using law-abiding behavior as the origin, we assume that u(b1) > 0 > u(b2). Hence, a potential criminal will commit a crime if and only if the expected utility from the crime is greater than zero, that is, if EU = (1 − q) · u(b1) + q · u(b2) > 0.
(3)
There will be some critical value of $q$, say $q^*$, such that the criminal will only commit the crime if $q < q^*$. Solving from (3) we see

$$q^* = \frac{u(b_1)}{u(b_1) - u(b_2)}. \tag{4}$$

Taking the derivative of (4) with respect to $f$, and rearranging, we find

$$\frac{dq^*}{df} = \frac{u'(b_2)}{u(b_2)}\, q^*(1 - q^*) < 0. \tag{5}$$
Hence, for any given $q$, raising the penalty will reduce the number of crimes committed. This is the standard result of optimal deterrence. For any given $q$, the higher the $f$, the lower the expected utility of criminal activity, as shown in Figure 15.1. So if $f$ is sufficiently high, then the government can deter all crimes. Notice, too, that a reduction in $q$ will shift the curve in Figure 15.1 to the right. Hence, if apprehending and prosecuting criminals is expensive, while administering penalties is relatively inexpensive, then we get the result indicated in the introduction: it is optimal to set $q$ very low and penalties as high as possible. But, as the last subsection indicates, it is not in general feasible to separate $q$ and $f$; the deterrent effect of a high penalty is counteracted by the reduction in the probability of being convicted. As I shall show in the next section, it is possible that the latter effect may dominate. In such a case, the expected utility of a potential criminal will look like that in Figure 15.2. Here, for sufficiently high $f$, utility again becomes positive, so that increasing penalties will actually encourage criminal activity rather than deter it.
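The juror's threshold in (2), the conviction probability $q = \alpha(1 - p^*)$, and the criminal's threshold in (4) can be combined numerically. The following is only a minimal sketch: the functional forms $v(c) = -c$ (a risk-neutral limiting case, chosen for simplicity) and $u(b) = 1 - e^{-b}$, the parameter values, and all function names are assumptions made for illustration and do not come from the article. The penalty is written `f`.

```python
import math

# Illustrative functional forms (assumptions, not taken from the article).
def v(c):
    """Juror disutility of an incorrect verdict with cost c: negative and decreasing."""
    return -c

def u(b):
    """Criminal utility: increasing and concave, with u(0) = 0."""
    return 1.0 - math.exp(-b)

def p_star(f, c_bar=1.0, c2=2.0):
    """Reasonable-doubt threshold from equation (2), with c1 = c_bar + f."""
    c1 = c_bar + f
    return v(c1) / (v(c1) + v(c2))

def conviction_prob(f, alpha=0.5):
    """q = alpha * (1 - p*), which assumes p is uniform on [0, 1]."""
    return alpha * (1.0 - p_star(f))

def q_star(f, b1=1.0, b_bar=1.0):
    """Criminal's critical conviction probability from equation (4)."""
    b2 = b_bar - f
    return u(b1) / (u(b1) - u(b2))

def expected_utility(f, alpha=0.5, b1=1.0, b_bar=1.0):
    """Expected utility of committing the crime, equation (3)."""
    q = conviction_prob(f, alpha)
    return (1.0 - q) * u(b1) + q * u(b_bar - f)

# Penalties above b_bar, so that u(b2) < 0 as the model requires.
for f in (1.5, 2.0, 3.0, 4.0):
    print(f"f={f:.1f}  p*={p_star(f):.3f}  q={conviction_prob(f):.3f}  "
          f"q*={q_star(f):.3f}  EU={expected_utility(f):+.3f}")
```

With these particular forms the threshold $p^*$ rises and the conviction probability falls as the penalty grows, yet the deterrent effect still dominates over the range shown. Other choices of $v$ and $u$ can reverse that balance, which is the case the text associates with Figure 15.2.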
3. When can higher penalties encourage criminal behavior?

The situation drawn in Figure 15.2 is, of course, not universally true for all configurations of preferences among criminals and jurors. This section identifies and discusses some necessary conditions for the situation depicted in Figure 15.2 to occur.
Figure 15.1 Standard result of optimal deterrence.
Figure 15.2 Expected utility of a potential criminal.
If an increase in the penalty is to encourage criminal activity, then there must be an $f$ such that $dEU/df > 0$ for some $q \le q^*$. As can be seen from (3), this implies

$$\frac{dEU}{df} = \frac{dq}{df}\,[u(b_2) - u(b_1)] - q\,u'(b_2) > 0. \tag{6}$$

Rearranging (6), substituting $dq^*/df$ from (5), and using the definition of $q^*$ in (4), one can easily show that inequality (6) will hold if and only if

$$-\frac{dq/df}{q} > -\frac{dq^*/df}{q^*}.$$
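The equivalence can be verified in two lines. This is a sketch that uses only (4), (5), and (6), with $f$ standing for the penalty; the first line divides (6) by $q\,[u(b_1) - u(b_2)] > 0$:

```latex
\begin{align*}
\frac{dEU}{df} > 0
  &\iff \frac{dq}{df}\,[u(b_2) - u(b_1)] > q\,u'(b_2)
   \iff -\frac{dq/df}{q} > \frac{u'(b_2)}{u(b_1) - u(b_2)} , \\
-\frac{dq^*/df}{q^*}
  &= -\frac{u'(b_2)}{u(b_2)}\,(1 - q^*)
   = \frac{u'(b_2)}{u(b_1) - u(b_2)}
  \qquad \text{because } 1 - q^* = \frac{-u(b_2)}{u(b_1) - u(b_2)} .
\end{align*}
```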
The left-hand side of this expression reflects the rate of reduction in the probability of being convicted. This is a “permissiveness elasticity.” The right-hand side reflects the rate of reduction in the criminal’s threshold. This is a “deterrence elasticity.” Thus we get a very natural condition that increasing the penalty will increase criminal activity if at some $q \le q^*$ the permissiveness elasticity exceeds the deterrence elasticity, that is, if $q$ falls at a faster rate than $q^*$ falls. A second and more interesting implication of Figure 15.2 is that the expected utility of the criminal has a (local) minimum in $f$ at some finite level of $f$. The necessary and sufficient condition for this to hold is that at an extremum the second derivative of $EU$ is positive. Assuming that an extremum exists, then we can check the second-order condition by
differentiating (6) again. After doing so and rearranging, I show in the Appendix that the second-order condition will hold if and only if

$$r_c < r_j + 2\,\frac{dp^*}{df}\,\frac{1}{p^*(1 - p^*)} \tag{7}$$
where $r_c = -u''(b_2)/u'(b_2)$ is the (absolute) risk aversion of the criminal, and $r_j = -v''(c_1)/v'(c_1)$ is the (absolute) risk aversion of the juror. Recall that $dp^*/df > 0$, hence the last term in (7) is positive. Moreover, this term is higher the greater is $dp^*/df$ and the closer $p^*$ is to one. Therefore, condition (7) is more likely to be met if jurors are highly responsive to penalties, or if penalties are already so high that $p^*$ is close to one. Notice, however, that a stronger sufficient condition is simply that $r_c \le r_j$; that is, criminals are less risk averse than jurors. Intuitively, if jurors are highly risk averse, then increases in penalties will sharply reduce the likelihood of conviction. If criminals are not very risk averse, then increases in the penalties alone will not generate much loss in utility. However, when such increases are accompanied by a decline in the probability of conviction, it is possible that the net effect could be to increase expected utility, and hence encourage crime. It was first noted by Becker (1968) that selection into criminal behavior would result in criminals who are less risk averse than most people, and recent research of Block and Gerety (1988) indicates that criminals actually are less risk averse than the general public. Hence, condition (7) seems likely to be met in practice.
4. Maximal deterrence: should the penalty fit the crime?

Suppose that evidence in the case leads a juror to believe that the chance is $p$ that the defendant is guilty. Define $f^*$ as the solution to $(1 - p) \cdot v(\bar{c} + f^*) \equiv p \cdot v(c_2)$. Then $f^*$ is the maximum penalty that a juror would tolerate and still vote to convict, and it is the maximum possible penalty that a convicted criminal would ever expect to receive under a trial by jury. Hence, if we let $f = f^*$, then the expected utility of the criminal will be lower than for any other penalty. Under ideal circumstances, therefore, $f^*$ will be the optimal penalty for deterring crimes. Recall that the cost of an incorrect acquittal, $c_2$, was assumed to increase with the severity of the crime. By implicit differentiation, we find that $df^*/dc_2 > 0$. Hence, for any given $p$, the more serious the crime, the higher the $c_2$, and the higher the optimal punishment $f^*$. Intuitively, the greater the severity of the crime, that is the higher the $c_2$, the more willing the juror is to convict, all else equal. As $c_2$ increases there is a margin for the government to increase deterrence by also increasing the penalty. Hence, this analysis implies that penalties that provide the maximum deterrence will be finite and will increase with the severity of the crime.
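The implicit differentiation can be written out as follows. The sketch holds $p$ and $\bar{c}$ fixed and writes $f^*$ for the maximum penalty the juror will tolerate:

```latex
\begin{align*}
(1-p)\,v(\bar{c} + f^*) &= p\,v(c_2) \\
(1-p)\,v'(\bar{c} + f^*)\,\frac{df^*}{dc_2} &= p\,v'(c_2) \\
\frac{df^*}{dc_2} &= \frac{p}{1-p}\cdot\frac{v'(c_2)}{v'(\bar{c} + f^*)} > 0 .
\end{align*}
```

The sign follows because $v' < 0$ at both arguments and $0 < p < 1$.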
5. Extensions and applications

While this article has focused on setting penalties solely as deterrents, the two factors of equity and retribution have historically served as the primary bases for determining sentences, as Posner (1981) indicates in his discussion of “just punishment.” This may lead us to question the simplifying assumption made earlier in this article that the utility a juror gets from making a correct decision is independent of the penalty. If a juror feels the penalty is just retribution, then this would raise the utility the juror may get from voting to convict a guilty person. In turn, this may lower the threshold p* relative to that discussed in Section 2 above. It is also possible, however, that retribution could be excessive or unjust; the juror may feel the penalty is too strict to be deserved.6 If penalties enter this domain, the utility of convicting a guilty person may fall. Hence, as penalties rise they may raise the threshold p* relative to that discussed above. Considering these effects would enrich the model of jury behavior and may allow it to be used to discuss such phenomena as “jury nullification” or the “lynch mob” mentality surrounding notorious crimes. However, considering these will not alter the basic finding that increasing penalties will reduce the probability of conviction. This model can also give insights into why penalties may vary across income classes. Landes (1971) indicates that those living in higher-income areas have lower probabilities of conviction, while Hagen’s (1974) literature review suggests that those with higher socio-economic status receive milder sentences. In standard models of fines and imprisonment, such as Polinsky and Shavell (1984), the question of whether it is optimal for wealthier people to receive lighter prison sentences has an ambiguous answer.
It is easily shown with the reasonable doubt model7 that increasing the wealth of criminals shifts the expected utility curve up and to the left, as shown in Figure 15.3. As found by Polinsky and Shavell (1984), Figure 15.3 illustrates that the minimum $f$ such that the crime is just deterred may be either higher or lower for the wealthy. However, Figure 15.3 also illustrates that it is unambiguous that high penalties are more likely to encourage crime as the wealth of criminals rises. This suggests that the harshest jail terms given to the more wealthy should, as a matter of maximum deterrence, be lower than the harshest jail terms given to the less wealthy. If the range of deterring penalties shrinks, as in Figure 15.3, then the observed probability of a penalty may also decline. Surprisingly, this implies that a bias of the court toward the well-to-do may actually be an efficient aspect of the legal institution.8 There may also be applications of this analysis to uniform sentencing laws. Two laws have taken effect recently that increase the severity of penalties and reduce the discretion of judges in assigning them. The 1984 Sentence Reform Act established the United States Sentencing Commission to add consistency and uniformity to sentencing practices. The 1989 Speedy Trial Act requires swift and harsh “automatic” sentences for drug offenses. There are several ways of theorizing about the effects of such laws, only one of which is offered
Figure 15.3 Wealthy criminals and jail sentence.
in this article. First we must stipulate the reason for the variance in sentences for similar crimes. If, as suggested above, the variance allows judges to establish norms that penalties will fit the strength of the evidence, then the variance is good rather than bad. To reduce the variance via uniform sentencing standards would, according to this model, diminish the freedom of judges to set penalties that “fit the crime.” Hence, it may reduce the fraction of guilty pleas and convictions, and it may increase the incentives for criminal behavior. As indicated recently in the press, there is growing concern that this may be the case.9 The model developed here may also have applications to auditing and regulation. In the literature on tax evasion, theorists generally consider the probability of detection and the fine conditional upon detection as two independent choice variables. Kolm (1973) and Christiansen (1980), for instance, have indicated that the minimum detection effort combined with the maximum penalty (“hang evaders with zero probability”) may be optimal. If we consider that there may be ambiguity in the minds of auditors, or that evaders can contest the findings of a tax auditor in court, then such a policy may become inefficient. Again, optimal compliance may be attained with penalties that fit the crime. Similarly, these results may also apply more generally to the literatures on regulation and torts. Another area of economics that requires costly enforcement of rules is the labor market. Efficiency wage models indicate that firms can enforce work rules by paying workers at a rate exceeding the market clearing wage, and then threaten them with dismissal if they are caught shirking (e.g., Shapiro and Stiglitz (1984)). However, it has been noted that bonds are a lower-cost substitute for efficiency wages (e.g., Lazear (1979)). Dickens, Katz, Lang, and Summers (1989) have argued that the bonding approach is flawed because
bonds are not available, perhaps because of credit market constraints. They use as evidence that firms engage in large amounts of monitoring. Another approach to the bonding question is to ask how the determination is made that an infraction merits forfeiture of the bond. One would expect that it is not at the full discretion of the firm. There are often grievance procedures, for instance, and there are possibilities of court challenges, especially if infractions are not easily documented or externally verified. Hence, the larger the penalty to shirking, the more difficult it may be to punish a recalcitrant worker. As a result, bonds are unlikely to be a complete substitute for efficiency wages. This model could also be generalized to assume that jurors are sophisticated game players, that is, jurors may make inferences about the guilt of a defendant from the fact that he is on trial. Such an extension would preserve the basic finding of the simpler model presented here, but could lead to other more general results.
6. Conclusion

Standard economic models of law enforcement have taken the penalty and the probability of conviction as independent choice variables for policy makers. This article has shown that in a judicial system built on the “reasonable doubt test,” the penalty and the probability of conviction are not independent. As the penalty increases, the probability of conviction falls. If criminals are less risk averse than jurors, then it is possible that the increased permissiveness of the court may dominate the harshness of the penalty. Therefore, increasing penalties may actually increase crime rates. These results are consistent with several empirical studies of juries and of criminal deterrence. The model also indicates that, all else equal, jurors are more willing to convict when the offense is more serious. This implies that maximal deterrence will be obtained with fines that rise with the severity of the crime, that is, the penalty should “fit the crime.”
Appendix

The objective function is $EU = q\,u(b_2) + (1 - q)\,u(b_1)$. The first-order condition is

$$\frac{dEU}{df} = -q\,u'(b_2) + \frac{dq}{df}\,[u(b_2) - u(b_1)] = 0.$$

Differentiating this again and rearranging, we have

$$\frac{d^2EU}{df^2} = q\,u''(b_2) - 2\,\frac{dq}{df}\,u'(b_2) + \frac{d^2q}{df^2}\,[u(b_2) - u(b_1)].$$
From the first-order condition $q = (dq/df)[u(b_2) - u(b_1)]/u'(b_2)$, and by definition, $q^* = u(b_1)/[u(b_1) - u(b_2)]$. Finally, recall that $dq^*/df = u'(b_2)\,q^*(1 - q^*)/u(b_2)$. Substituting this into the above and rearranging, we find

$$\frac{d^2EU}{df^2} = \frac{dq}{df}\,\frac{u(b_1)}{q^*}\left(-\frac{u''(b_2)}{u'(b_2)} + 2\,\frac{dq^*}{df}\,\frac{1}{q^*}\right) - \frac{d^2q}{df^2}\,\frac{u(b_1)}{q^*}.$$
To derive equation (7), evaluate the inequality

$$\frac{dq}{df}\,\frac{u(b_1)}{q^*}\left(-\frac{u''(b_2)}{u'(b_2)} + 2\,\frac{dq^*}{df}\,\frac{1}{q^*}\right) - \frac{d^2q}{df^2}\,\frac{u(b_1)}{q^*} > 0. \tag{A1}$$
Recall that $q = \alpha(1 - p^*)$, with $p^* = v(c_1)/[v(c_1) + v(c_2)]$, and $dp^*/df = [v'(c_1)/v(c_1)]\,p^*(1 - p^*) > 0$. Differentiating $q$ twice and simplifying, we see

$$\frac{d^2q}{df^2} = -\frac{v''(c_1)}{v'(c_1)}\,\frac{dq}{df} - 2\left(\frac{dq}{df}\right)^2\frac{1}{\alpha p^*}.$$
Substituting this into (A1) and rearranging, the condition becomes −
u″(b2) u′(b2)
U* ol (C o | cv = c v),
(2)
s=a
where s = a, . . ., S denote a set of mutually exclusive and jointly exhaustive states of the world including all the possible outcomes of murder; cos denote the offender’s consumption levels, net of potential punishments and other losses, that are contingent upon these states; πs denote his subjective evaluation of the probabilities of these states; and C mo and C ol denote, respectively, his consumption prospect in the event he commits murder or takes an alternative action. To illustrate the behavioral implications of the model via a simple yet sufficiently general example, assume the existence of just four states of the world associated with the prospect of murder as summarized in Table 16.1. In Table 16.1, Pa denotes the probability of the event of apprehension and 1 − Pa denotes its complement—the probability of escaping apprehension; Pc | a denotes the conditional probability of conviction of murder given apprehension, and 1 − Pc | a denotes its complement—the probability of
Table 16.1 Behavioral implications

Event                                                          State s                     Probability πs                   Consumption prospect Cs
Apprehension, conviction of murder                             execution                   (Pa)(Pc | a)(Pe | c)             Cd: (co = 0; cv = 0)
Apprehension, conviction of murder                             imprisonment for murder     (Pa)(Pc | a)(1 − Pe | c)         Cc: (co = c; cv = 0)
Apprehension, conviction of a lesser offense or acquittal      other punishment            Pa(1 − Pc | a)                   Cb: (co = b; cv = 0)
No apprehension                                                no punishment               1 − Pa                           Ca: (co = a; cv = 0)
Isaac Ehrlich
conviction of a lesser offense (including acquittal); finally, Pe | c and 1 − Pe | c denote, respectively, the conditional probabilities of execution and of other punishments given conviction of murder. The (subjective) probabilities of the set of states introduced in Table 16.1 are equal by definition to the relevant products of conditional probabilities of sequential events that lead to this more final set of states. The last column in Table 16.1 lists the consumption levels that are contingent upon the occurrence of these states. Economic intuition suggests that the relevant consumption levels can be ranked according to the severity of punishment imposed on the offender; that is, Ca > Cb > Cc > Cd. In the preceding discussion the incidence of murder has been viewed to be motivated by hate. As hinted earlier in the discussion, however, murder could also be a by-product, or more generally, a complement of other crimes against persons and property. Since the set of states of the world underlying the outcomes of these other crimes also includes punishment for murder, the decision to commit these would also be influenced by factors determining the probability distribution of outcomes considered in Table 16.1. In turn, the incidence of murder would be influenced by factors directly responsible for related crimes. In general, behavioral implications concerning the effect of various opportunities on the incidence of murder ought to be analyzed within a framework that includes related crimes as well. For methodological simplicity and because data exigencies rule out a comprehensive empirical implementation of such a framework, the following discussion emphasizes the effect of factors directly related to murder and the direct effect on murder of general economic factors like income and unemployment. In practice, however, the effect of these latter factors on murder may be due largely to their systematic effects on particular crimes against property. 1. 
The effects of probability and severity of punishment An immediate implication of the model that is independent of the specific motives and circumstances leading to an act of murder is that an increase in the probability or severity of various punishments for murder decreases, relative to the expected utility from an alternative independent activity, the expected utility from murder or from activities that may result in murder. These implications have been discussed at length elsewhere (see the author (1970, 1973a)) but the somewhat more detailed formulation of the model adopted in this paper makes it possible to derive more specific predictions concerning the relative magnitudes of the deterrent effects of apprehension, conviction, and execution that expose the theory to a sharper empirical test. Specifically, given the ranking of the consumption levels in states of the world involving execution, imprisonment, other punishment, and no punishment for murder, as assumed in the preceding illustration, and given the level of the probabilities of apprehension and the conditional probabilities of conviction and execution, it can be shown that the partial elasticities of the expected
The deterrent effect of capital punishment
utility from crime with respect to these probabilities can be ranked in a descending order as follows:

$$\epsilon_{P_a} > \epsilon_{P_{c|a}} > \epsilon_{P_{e|c}} \tag{3}$$
where ∈P = − ∂ ln U*/∂ ln P for P = Pa, Pc | a, Pe | c.4 The interesting implication of condition (3) is that the more general the event leading to the undesirable consequences of crime, the greater the deterrent effect associated with its probability: a 1 percent increase in the (subjective) probability of apprehension Pa, given the values of the conditional probabilities Pc | a and Pe | c, reduces the expected utility from murder more than a 1 percent increase in the conditional (subjective) probability of conviction of murder Pc | a (as long as Pc | a < 1), essentially because an increase in Pa increases the overall, i.e., unconditional, probabilities of three undesirable states of the world: execution, other punishment for murder, and punishment for a lesser offense, whereas an increase in Pc | a raises the unconditional probability of the former two states only. A fortiori, a 1 percent increase in Pc | a is expected to have a greater deterrent effect than a 1 percent increase in Pe | c as long as Pe | c is less than unity. If there exists a positive monotonic relation between an average person’s subjective evaluations of Pa, Pc | a, and Pe | c and the objective values of these variables, and between an average person’s expected utility from crime and the actual crime rate in the population, equation (3) would then amount to a testable theorem regarding the partial elasticities of the murder rate in a given period with respect to objective measures of Pa, Pc | a, and Pe | c. On the basis of this analysis, it can be predicted that while the execution of guilty murderers deters acts of murder, ceteris paribus, the apprehension and conviction of guilty murderers is likely to have an even larger deterrent effect. 
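The ranking in (3) can be sketched under the assumption that expected utility from murder is the probability-weighted sum of state utilities, using the four states and probabilities of Table 16.1. The shorthand $U_a, \dots, U_d$ for the utility of each consumption prospect, and the requirement $U^* > 0$, are assumptions made here for illustration:

```latex
\begin{align*}
U^* &= (1-P_a)\,U_a + P_a(1-P_{c|a})\,U_b
      + P_a P_{c|a}(1-P_{e|c})\,U_c + P_a P_{c|a} P_{e|c}\,U_d , \\
\epsilon_{P_a} &= \frac{P_a}{U^*}\Bigl[U_a - (1-P_{c|a})\,U_b
      - P_{c|a}(1-P_{e|c})\,U_c - P_{c|a} P_{e|c}\,U_d\Bigr] , \\
\epsilon_{P_{c|a}} &= \frac{P_a P_{c|a}}{U^*}\Bigl[U_b - (1-P_{e|c})\,U_c - P_{e|c}\,U_d\Bigr] , \\
\epsilon_{P_{e|c}} &= \frac{P_a P_{c|a} P_{e|c}}{U^*}\,\bigl[U_c - U_d\bigr] , \\
\epsilon_{P_a} - \epsilon_{P_{c|a}} &= \frac{P_a}{U^*}\,(U_a - U_b) > 0 , \qquad
\epsilon_{P_{c|a}} - \epsilon_{P_{e|c}} = \frac{P_a P_{c|a}}{U^*}\,(U_b - U_c) > 0 .
\end{align*}
```

Both differences are positive whenever $U_a > U_b > U_c$, which follows from the ranking of consumption levels if utility is increasing in consumption.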
Analogous to the effects of the probabilities of various punishments for murder, an increase in the severity of these punishments, their probabilities held constant, is generally expected to decrease the expected utility from murder and so to discourage its commission. Due to lack of space, other implications concerning the effect of severity as well as probability of punishment on the elasticities ∈Pa, ∈Pc | a, and ∈Pe | c are omitted here. For a more complete analysis, see the author (1973b). 2. Effects of employment opportunities, income, and demographic variables The model developed in this section suggests that the incentive to commit murder or other crimes that may result in murder in general would depend on permanent income (or wealth), the relevant opportunities to extract related material gains as well as on direct opportunities for malevolent actions, including the direct costs involved in effecting the production of malevolent transfers. The means for a direct implementation of the effect of these latter
opportunities are not readily available (see, however, the discussion in fn. 14). In contrast, variations in legitimate and illegitimate earning and income opportunities may be approximated by movements in the rate of unemployment and of labor force participation, U and L, respectively, and in the level and distribution of permanent income Yp in the population. The relevance of the latter set of variables has been discussed in detail elsewhere (see the author (1973a)), particularly in connection with crimes against property, some of which involve murder. However, the level and distribution of income within a community may also exert a direct influence on the incentive to commit murder because of their impact on the individual demand for malevolent actions. In addition, although the decision to commit murder is presumably derived from considerations related to lifetime utility maximization, the timing of murder may be affected by variations in the opportunity cost of time throughout the life cycle, because the typical punishment for murder involves a finite imprisonment term. Thus, to the extent that earning opportunities are imperfectly controlled in an empirical investigation, it may be important to investigate the independent effects of variations in demographic variables, such as the age and racial composition of the population, A and NW, respectively. Controlling for variations in age composition may also be important because of the differential treatment of young offenders under the law. B. Defense against murder 1. Factors determining optimal law enforcement activity Following the approach used by Becker (1968), I shall attempt to derive implications concerning law enforcement activity against murder on the assumption that law enforcement agencies behave as if they seek to maximize a social welfare function by minimization of the per capita loss from murder.
Losses accrue from three main elements: harm to victims net of gains to offenders; the direct costs of law enforcement by police and courts; and the net social costs associated with penalties. The behavior of enforcement agencies is assumed to be in accordance with the general implications of the deterrent theory of law enforcement. The main elements of the social loss function can be summarized by:

$$L = D(q) + C(q, P_c) + \gamma_1 P_c P_{e|c}\, q d + \gamma_2 P_c (1 - P_{e|c})\, q m \tag{4}$$
The term D(q) represents the net social damage resulting from the death of victims and other related losses, where q ≡ Q/N denotes the rate of murder in the population. The term C(q, Pc) represents the total cost of apprehending, indicting, prosecuting, and convicting offenders. The aggregate output of these law enforcement activities can be summarized by the fraction of all
murders that are “cleared” by the conviction of their alleged perpetrators (assuming a fixed proportional relation between the number of murders and their perpetrators). This fraction θ may be viewed as an objective indicator of the probability that a perpetrator of murder will be convicted of his crime, Pc = Pa(Pc | a), with one qualification: since the overall probability of error of justice, π (that of apprehending and convicting an innocent person), is greater than nil, the true probability of conviction 0 < Pc < 1 will be systematically lower than θ. However, to abstract the analysis from a separate determination of the optimal value of π, it is henceforth assumed that Pc and θ are proportionally related, so that C can be defined as a direct function of Pc.5 The rate of murder q is introduced as a separate determinant of C because of the argument and evidence that the costs of producing a given value of θ are higher for higher levels of q. The larger is q, the larger the number of suspects that must be apprehended, charged, and convicted in order to achieve a given value of θ. Both D and C are assumed to be monotonically increasing, continuously differentiable, and concave functions in each of their respective arguments. The third and fourth terms in equation (4) represent the per capita social costs of punishing guilty and innocent convicts through execution and imprisonment (or other penalties), respectively.
The variables d and m denote the private costs to convicts and their families from execution and imprisonment, and the multipliers γ1 and γ2 indicate the presence of additional costs or gains to the rest of society from administering and otherwise bearing the respective penalties of execution and imprisonment that are imposed on guilty and innocent convicts.6 For methodological convenience, the costs of execution and imprisonment can be combined, and equation (4) can be rewritten as:

\[ L = D(q) + C(q, P_c) + \gamma_1 P_c f q \tag{5} \]
where f = (Pe | c)d + γ2(1 − Pe | c)m/γ1 is a measure of the average social cost of punishment for murder. Equation (5) identifies the unconditional probability of conviction Pc, and the expected social cost of punishment f, as the main control variables underlying law enforcement activity. Given the harshness of the method of execution, the length of imprisonment terms, and other factors determining d and m (changes in these factors occur slowly in practice) the magnitude of f is largely a function of the conditional probability of execution Pe | c. The values of 0 < Pc < 1 and 0 < Pe | c < 1 that locally minimize equation (5) must then satisfy the following pair of equilibrium conditions:
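\[ \left[ D_q + C_q + C_p \frac{1}{q_p} + \gamma_1 P_c f (1 - E_p) \right] q_p = 0 \tag{6} \]

\[ \left[ D_q + C_q + \gamma_1 P_c f (1 - E_f) \right] q_f f_e = 0 \tag{7} \]

where

\[ E_p \equiv -\frac{\partial P_c}{\partial q} \frac{q}{P_c} \equiv \frac{1}{\varepsilon_{P_c}}, \qquad E_f \equiv -\frac{\partial f}{\partial q} \frac{q}{f} \equiv \frac{1}{\varepsilon_f}, \qquad f_e \equiv \frac{\partial f}{\partial P_{e|c}} = \left( d - \frac{\gamma_2}{\gamma_1} m \right) \]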
and the subscripts p, f, and e associated with the variables C and q denote the partial derivatives of the latter with respect to Pc, f, and Pe | c, respectively. The product γ1 fe indicates the difference between the social costs of execution and imprisonment. In equation (7) the term −(Dq + Cq)qf fe represents part of the marginal revenue from execution: the value of the lives of potential victims saved, and the reduced costs of apprehending and convicting offenders due to the differential deterrent effect of execution on the frequency of murder. The term γ1Pc f (1 − Ef)qf fe represents the net marginal social cost of execution: the value to society of the life of a person executed at a given probability of legal error, plus all the various costs of effecting his execution (including mandatory appeals) net of imprisonment costs thereby “saved.” Because in equilibrium the two must be equated, the optimal value of Pe | c need not be unity (capital punishment may not always be imposed even when it is legal) and would depend on the relative magnitude of the relevant costs and gains. A similar interpretation applies to equation (6). Inspection of the equilibrium conditions given by equations (6) and (7) reveals a number of interesting implications. First, it may be noted that if an increase in Pe | c unambiguously raises the social cost of punishment for murder, that is, if γ1 fe = γ1d − γ2m > 0, then in equilibrium, the deterrent effect associated with capital punishment must be less than unity, or εPe | c < εf < 1 (fn. 7). Put differently, executions must only decrease the rate of murders in the population but not the rate of persons executed, for otherwise the marginal cost of execution would be negative and a corner solution would be achieved at Pe | c = 1. However, equation (6) does not have the same implications regarding the value of εPc.
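The step from the loss function in equation (5) to the condition in equation (7) can be written out explicitly. The following is only a sketch of the differentiation: it holds Pc fixed, lets Pe | c act on q solely through f, and uses ∂f/∂q = 1/q_f, all of which are assumptions of the sketch rather than statements taken from the text.

```latex
\begin{align*}
\frac{\partial L}{\partial P_{e|c}}
  &= D_q\, q_f f_e + C_q\, q_f f_e + \gamma_1 P_c \left( f_e\, q + f\, q_f f_e \right) \\
  &= \left[ D_q + C_q + \gamma_1 P_c f \left( 1 + \frac{q}{f\, q_f} \right) \right] q_f f_e \\
  &= \left[ D_q + C_q + \gamma_1 P_c f \left( 1 - E_f \right) \right] q_f f_e = 0,
  \qquad \text{since } E_f \equiv -\frac{\partial f}{\partial q}\frac{q}{f} = -\frac{q}{f\, q_f}.
\end{align*}
% With D_q > 0, C_q > 0 and \gamma_1 P_c f > 0, the bracket can vanish only if
% 1 - E_f < 0, that is E_f > 1, or equivalently \varepsilon_f = 1/E_f < 1.
```

Read this way, the requirement εf < 1 is simply the condition under which the marginal cost term γ1Pc f (1 − Ef) is negative enough to offset the positive terms Dq and Cq inside the bracket.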
More specifically, equation (6) shows that the marginal costs of conviction include the marginal costs of apprehending and convicting offenders, in addition to the marginal costs of punishing those convicted. Therefore, the overall marginal revenue from convictions must also be higher than that from executions. Indeed, by combining equations (6) and (7), it can readily be shown that in equilibrium, εPc > εf > εPe | c;8 that is, the deterrent effect associated with Pc must exceed the differential deterrent effect associated with Pe | c. This proposition is essentially the same as that derived regarding the response of offenders to changes in Pc and Pe | c (see
equation (3)). The compatibility of the implications of optimal offense and defense under the assumption that both offenders and law enforcement agencies regard execution to be more costly than imprisonment insures the stability of equilibrium with respect to both activities. It also provides the basis for a sharp empirical test of the theory.

2. The Interdependencies Among the Murder Rate and the Probabilities of Conviction and Execution9

Any exogenous factor causing a decrease in the severity of punishment for murder via a decrease in Pe | c can be shown to increase the value of Pc because it tends to decrease the marginal costs of conviction and increase its marginal revenue. More specifically, given the values of d and m, an increase in social aversion toward capital punishment or in the costs of the related due process, measured by γ1, can be shown to produce a decline in the optimal value of Pe | c and a simultaneous increase in the optimal value of Pc. This analysis is consistent with an argument often made regarding the greater reluctance of courts or juries to convict defendants charged with murder when the risk of their subsequent execution is perceived to be undesirably high. Conviction and execution thus can be considered substitutes in response to changes in the shadow price of each. Indeed, the empirical investigation reveals that at least over the period between 1933 and 1969, in which the estimated annual fraction of convicts executed for murder in the United States, denoted by PXQ1, fell from roughly 8 percent to nil, the national clearance ratios of reported murders, denoted by P0a, and the fraction of persons charged with murder who were convicted of murder, denoted by P0c | a, on the whole, moved in an opposite direction. The zero-order correlation coefficient between PXQ1 and P0a is found to be −0.028, while that between PXQ1 and P0c | a is found to be −0.19.
(In principle, the product P0aP0c | a approximates the value of Pc.) The general implication of this analysis is that the simple correlation between estimates of the murder rate and the conditional probabilities of execution cannot be accepted as an indicator of the true differential deterrent effect of capital punishment, because the simple correlation is likely to confound the offsetting effects of opposite changes in Pc and possibly also in the probability and severity of alternative punishments for murder. Just as convictions and executions are expected to be substitutes with respect to changes in the shadow cost of each activity, they can be expected to be complementary with respect to changes in the severity of damages from crime, essentially because such changes increase the marginal revenues from both activities. Since an exogenous increase in the rate of murder is expected to increase the marginal social damage Dq, and, indirectly, the marginal costs of apprehension and conviction Cq, it is expected to induce an increase in the optimal values of both Pc and Pe | c. This analysis demonstrates the simultaneous relations between offense and defense and suggests that the deterrent
effects of conviction and execution must be identified empirically through appropriate simultaneous equation estimation techniques.
II. New evidence on the deterrent effect of capital punishment

A. The econometric model

In the empirical investigation an attempt is made to test the main behavioral implications of the theoretical model. The econometric model of crime and law enforcement activity devised by the author (1973a) is applied to aggregate crime statistics relating to the United States for the period 1933–69. The model treats estimates of the murder rate and the conditional probabilities of apprehension, conviction, and execution as jointly determined by a system of simultaneous equations. Since data limitations rule out an efficient estimation of structural equations relating to law enforcement activities or private defense against murder, the following discussion focuses on a supply-of-murders function actually estimated in this study.

1. The murder supply function

It is assumed that the structural equations explaining the endogenous variables of the model are of a Cobb-Douglas variety in the arithmetic means of all the relevant variables. The murder supply function is specified as follows.
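\[ \left( \frac{Q}{N} \right) = C\, P_a^{\alpha_1} P_{c|a}^{\alpha_2} P_{e|c}^{\alpha_3} U^{\beta_1} L^{\beta_2} Y_p^{\beta_3} A^{\beta_4} \exp(\upsilon_1) \tag{8} \]

where C is a constant term and υ1 is a disturbance term assumed to be subject to a first-order serial correlation. The regression equation thus can be written as:

\[ y_1 = Y_1 A_1' + X_1 B_1' + \upsilon_1 \tag{9} \]

where

\[ \upsilon_1 = \rho\, \upsilon_{1,-1} + e_1 \tag{10} \]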
The variables y1, Y1, and X1 denote, respectively, the natural logarithms of the dependent variable, other endogenous variables, and all the exogenous variables entering equation (8); ρ denotes the coefficient of serial correlation, and the subscript –1 denotes one-period lagged values of a variable. The coefficient vectors A1′ and B1′ have been estimated jointly with ρ and the standard error of e1, σe, via a non-linear three-round estimation procedure proposed by Ray Fair.
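The idea of estimating the coefficients jointly with ρ can be illustrated with a plain Cochrane-Orcutt iteration. The sketch below is a deliberate simplification and not the three-round procedure referred to above: it uses ordinary least squares on a single exogenous regressor, ignores the endogeneity of Y1, and runs on synthetic data. The function name `cochrane_orcutt` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=200):
    """Alternate between estimating rho from residuals and OLS on quasi-differences.

    y: (T,) dependent variable; X: (T, k) regressors including a constant column.
    Returns (beta, rho, sigma_e).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from plain OLS (rho = 0)
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta
        # AR(1) coefficient of the residuals: regress u_t on u_{t-1}
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Modified first differences: x_t - rho * x_{t-1}
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    e = (y[1:] - rho * y[:-1]) - (X[1:] - rho * X[:-1]) @ beta
    sigma_e = np.sqrt(e @ e / (len(e) - X.shape[1]))
    return beta, rho, sigma_e

# Synthetic illustration (hypothetical numbers, not the paper's data):
# y = 1.0 - 0.5 x + u, with u following an AR(1) process with rho = 0.5.
rng = np.random.default_rng(0)
n_obs = 400
x = rng.normal(size=n_obs)
e_true = rng.normal(scale=0.1, size=n_obs)
u = np.zeros(n_obs)
for t in range(1, n_obs):
    u[t] = 0.5 * u[t - 1] + e_true[t]
y = 1.0 - 0.5 * x + u
X = np.column_stack([np.ones(n_obs), x])

beta, rho, sigma_e = cochrane_orcutt(y, X)
print(beta, rho, sigma_e)
```

Because the constant column is transformed along with the other regressors, the first element of `beta` remains the intercept of the untransformed equation. One observation is lost to the lag, which is consistent with effective estimation periods that start later than the raw sample.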
2. Variables used

The dependent variable of interest (Q/N) is the true rate of capital murders in the population in a given year. The statistic actually used, (Q/N)0, is the number of murders and nonnegligent manslaughters reported by the police per 1,000 civilian population as computed from data reported by the FBI Uniform Crime Report (UCR)10 and the Bureau of the Census. This statistic can serve as an efficient estimator of the true Q/N if the two were related by:

\[ \left( \frac{Q}{N} \right) = k \left( \frac{Q}{N} \right)^{0} \exp \mu \tag{11} \]
where k indicates the ratio of the true number of capital murders committed in a given year relative to all murders reported to the police, and µ denotes random errors of reporting or identifying murders. It should be noted, however, that the fraction of capital murders among all murders may have been subject to a systematic trend over time. Indeed, the theory developed in Section Ia suggests that the decrease in the tendency to apply the death penalty in the United States over time may have led to an increase in the fraction of capital murders among all murders. More important, the number of reported murders may have decreased systematically over time because of the decrease in the fraction of all attempted murders resulting in the death of the victims due to the continuous improvement in medical technology. To account for such possible trends, the term k in equation (11) can be defined as k = δ exp(λT), where δ and λ are constant terms and T denotes chronological time. Upon substitution of (Q/N)0 for (Q/N) in equation (8), the inverse values of δ and µ would be subsumed under the constant term C and the stochastic variable υ, respectively, and exp( − λT) would emerge as an additional explanatory variable. Thus, the natural value of T is introduced in equation (9) as an independent exogenous variable.11 The matrix of endogenous variables associated with Y1 in equation (9) includes the conditional probabilities that guilty offenders be apprehended, convicted, and executed for murder. These probabilities have been approximated by computing objective measures of the relevant fractions of offenders who are apprehended, convicted, and executed. The following paragraphs contain a brief discussion of these measures. Pa is measured by the national “clearance rates” as reported by the FBI UCR, which are estimates of the percentage of all murders cleared by the arrest of a suspect. It is denoted by P 0a. 
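The substitution of equation (11) into equation (8) described above can be spelled out as a short derivation. It is a sketch that uses only the definitions k = δ exp(λT) and the functional form of equation (8); the grouping of terms is ours.

```latex
\begin{align*}
\delta\, e^{\lambda T} \left( \frac{Q}{N} \right)^{0} e^{\mu}
  &= C\, P_a^{\alpha_1} P_{c|a}^{\alpha_2} P_{e|c}^{\alpha_3}
     U^{\beta_1} L^{\beta_2} Y_p^{\beta_3} A^{\beta_4} e^{\upsilon_1} \\
\ln \left( \frac{Q}{N} \right)^{0}
  &= \ln \frac{C}{\delta} + \alpha_1 \ln P_a + \alpha_2 \ln P_{c|a} + \alpha_3 \ln P_{e|c} \\
  &\quad + \beta_1 \ln U + \beta_2 \ln L + \beta_3 \ln Y_p + \beta_4 \ln A
     - \lambda T + (\upsilon_1 - \mu).
\end{align*}
```

The time trend therefore enters the logarithmic regression in its natural value with coefficient −λ, while 1/δ is absorbed into the constant and −µ into the disturbance.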
The conditional probability Pc | a is identically equal to Pch | a · Pc | ch—the product of the conditional probabilities that a person who committed murder be charged once arrested, and that he be convicted once charged. Statistical exigencies preclude the estimation of a complete series of Pch | a, but Pc | ch is estimated by the fraction of all persons charged with murder who were convicted of murder in a given year as reported by the FBI UCR. This fraction, denoted by P0c | a,
may serve as an efficient estimator of the overall true probability Pc | a, provided that Pch | a were either constant over time, or proportionally related to the probability of arrest Pa. The actual measures of Pe | c consist of alternative estimates of the expected fractions of persons convicted of murder in a given year who were subsequently executed, P0e | c. Because no complete statistics on the disposition of murder convicts by type of punishment are available, however, P0e | c has been estimated indirectly by matching annual time-series data on convictions and executions. Over most of the period considered in this investigation (up to 1962), executions appear to lag convictions by 12 to 16 months on the average. An objective measure of P0e | c in year t, therefore, may be the ratio of the number of persons executed in year t + 1 to the number convicted in year t, or PXQ1 = Et+1/Ct.12 It must be pointed out, however, that the number of persons executed in year t + 1, and hence PXQ1, is, of course, unknown in year t and must be forecast by potential offenders. Even if expectations with respect to PXQ1 were unbiased on the average, the actual magnitude of PXQ1 is likely to deviate randomly from its expected value in year t. The effect of such random noise would be to bias the regression coefficient associated with PXQ1 toward zero. I have therefore constructed four alternative forecasts of the desired variable: PXQ1−1 = Et/Ct−1 (executions in year t relative to convictions in year t − 1); PXQ2 = Et/Ct; TXQ1 = the systematic part of PXQ1 computed via a linear distributed lag regression of PXQ1 on three of its consecutively lagged values; and PDL1 = the systematic part of PXQ1 computed via a second degree polynomial distributed lag function relating PXQ1 and four of its consecutively lagged values. The advantage of using these alternative estimates of the expected P0e | c is that, all being based on past data, they may be treated largely as predetermined rather than as endogenous variables.
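The construction of the ratio measures from annual counts can be sketched as follows. The execution and conviction series below are invented for illustration, the function names are hypothetical, and `systematic_part` is only a simple unrestricted lag regression in the spirit of TXQ1, not a reproduction of the distributed lag specifications actually estimated.

```python
import numpy as np

def execution_risk_measures(executions, convictions):
    """Ratio measures of the risk of execution from annual counts.

    executions[i] and convictions[i] refer to the same year t0 + i.
    Returns (PXQ1, PXQ1 lagged one year, PXQ2), with NaN where undefined.
    """
    E = np.asarray(executions, dtype=float)
    C = np.asarray(convictions, dtype=float)
    n = len(E)
    pxq1 = np.full(n, np.nan)      # E_{t+1} / C_t: not yet known in the last year
    pxq1[:-1] = E[1:] / C[:-1]
    pxq1_lag = np.full(n, np.nan)  # E_t / C_{t-1}: undefined in the first year
    pxq1_lag[1:] = E[1:] / C[:-1]
    pxq2 = E / C                   # E_t / C_t
    return pxq1, pxq1_lag, pxq2

def systematic_part(series, n_lags=3):
    """Fitted values from an OLS regression of a series on n_lags of its own lags."""
    s = np.asarray(series, dtype=float)
    s = s[~np.isnan(s)]
    y = s[n_lags:]
    lags = [s[n_lags - k:len(s) - k] for k in range(1, n_lags + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return X @ coef

# Hypothetical counts for ten consecutive years (not the paper's data).
executions = [10, 8, 9, 6, 5, 4, 4, 2, 1, 1]
convictions = [100, 100, 120, 120, 100, 100, 80, 100, 100, 100]

pxq1, pxq1_lag, pxq2 = execution_risk_measures(executions, convictions)
forecast = systematic_part(pxq1, n_lags=3)
print(pxq1[:3], pxq1_lag[:3], pxq2[:3], len(forecast))
```

The lagged ratio in year t equals the unlagged ratio in year t − 1, so it is known when year t begins, which is the sense in which such measures can be treated as predetermined.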
Alternatively, PXQ1 is treated as an endogenous variable along with P0a and P0c | a, and its systematic part is computed via the reduced form regression equation (see Table 16.3). Two difficulties associated with the use of the proposed estimates of P0e | c as measures of the true conditional probability of execution warrant special attention. First, it may be argued that the fraction of convicts executed for murder may represent only the fraction of those convicted of capital murders among all murder convicts. Variations in PXQ1 or in other related estimates might then be entirely unrelated to the probability that a convict liable to be punished by the death penalty will be actually executed, and the expected elasticity of the murder rate with respect to these estimates might be nil. However, the significant downward trend in PXQ1 between 1933 and 1967, especially during the 1960s, suggests that it may serve as a useful indicator of the relative variations in the true Pe | c. Second, it should be noted that the relative variation in the reported national murder rate relates to the United States as a whole, whereas the measures of P0e | c relate to only a subset of states which retained and actually enforced capital punishment throughout the period considered. Thus, the empirical estimates of the elasticities of the national murder rate with respect to P0e | c may, on this ground, be expected to understate the true elasticities of the murder rate in retentionist states only. The matrix of exogenous variables associated with X1 in equation (9) includes annual census estimates of the labor force participation rate of the civilian population 16 years and over (calculated by excluding the armed forces from the total noninstitutional population) L; the unemployment rate of the civilian labor force U; Milton Friedman’s estimate of real per capita permanent income (extended through 1969)13 Yp; the percentage of residential population in the age group 14–24, A; and chronological time T. Other exogenous variables assumed to be associated with the complete simultaneous equation model of murder and law enforcement X2 are one-year lagged estimates of real expenditure on police per capita XPOL−1 and annual estimates of real expenditure by local, state, and federal governments per capita XGOV. Real expenditures are computed by deflating Survey of Current Business estimates of current expenditures by the implicit price deflator for all governments. In addition, X2 includes the size of the total residential population in the United States N, and the percent of nonwhites in residential population NW. The reason for including NW in the list of variables subsumed under X2 is discussed below in Section IIb. A list of all the variables used in the regression analysis is given in Table 16.2.

B. The empirical findings

An interesting finding which poses a challenge to the validity of the analysis in Section I is that over the period 1933–69, the simple correlation between the reported murder rate and estimates of the objective risk of execution given conviction of murder is positive in sign. For example, the simple correlation coefficients between (Q/N)0 and PXQ1, PXQ1−1, and PXQ2 are found to be 0.140, 0.096, and 0.083, respectively.
However, the results change substantively and are found to be in accordance with the theoretical predictions and statistically meaningful when the full econometric framework developed in the preceding section is implemented against the relevant data. Despite the numerous limitations inherent in the empirical counterparts of the desired theoretical constructs, the regression results reported in Tables 16.3 and 16.4 uniformly exhibit a significant negative elasticity of the murder rate with respect to each alternative measure of the probability of execution. More importantly, the regression results also corroborate the specific theoretical predictions regarding the effects of apprehension, conviction, unemployment, and labor force participation. Table 16.3 shows that the estimated elasticity of the murder rate with respect to the conditional probability of execution is lowest in absolute magnitude when the objective measure of Pe | c, PXQ1, is treated in the regression analysis as if it were a perfectly forecast and strictly exogenous variable. The algebraic value of the elasticity associated with PXQ1 is −0.039 with upper and lower 95 percent confidence limits (calculated from the normal
Table 16.2 Variables used in the regression analysis, annual observations 1933–69

| Set | Variable | Mean (natural logarithms) | Standard deviation (natural logarithms) | Arithmetic mean |
|---|---|---|---|---|
| y1 | (Q/N)0 = Crime rate: offenses known per 1,000 civilian population. | −2.857 | 0.156 | 0.058 |
| Y1 | P0a = Probability of arrest: percent of offenses cleared. | 4.497 | 0.038 | 89.835 |
| Y1 | P0c | a = Conditional probability of conviction: percent of those charged who were convicted of murder.a | 3.741 | 0.175 | 42.733 |
| Y1 | P0e | c = Conditional probability of execution; PXQ1, the number of executions for murder in the year t+1 as a percent of the total number of convictions in year t.b | 0.176 | 1.749 | 2.590 |
| X1 | L = Labor force participation: fraction of the civilian population in the labor force. | −0.546 | 0.030 | 0.579 |
| X1 | U = Unemployment rate: percent of the civilian labor force unemployed. | 1.743 | 0.728 | 7.532 |
| X1 | A = Fraction of residential population in the age group 14–24. | −1.740 | 0.118 | 0.177 |
| X1 | Yp = Friedman’s estimate of (real) permanent income per capita in dollars. | 6.868 | 0.338 | 1012.35 |
| X1 | T = Chronological time (years): 1–37. | 2.685 | 0.867 | 19.00 |
| X2 | NW = Fraction of nonwhites in residential population. | −2.212 | 0.063 | 0.110 |
| X2 | N = Civilian population in 1,000s. | 11.944 | 0.161 | 155,853 |
| X2 | XGOV = Per capita (real) expenditures (excluding national defense) of all governments in million dollars. | −7.661 | 0.501 | .000532 |
| X2 | XPOL−1 = Per capita (real) expenditures on police in dollars lagged one year.a | 2.114 | 0.306 | 8.638 |

Notes: a The figures for P0c | a (1933–35) and XPOL (all the odd years 1933–51) were interpolated via an auxiliary regression analysis. b The actual number of executions in 1968, 1969, and 1970 was zero. However, the numbers were assumed equal to 1 in each of these years in constructing the value of PXQ1 in 1967–69.
Table 16.3 Modified first differences of murder rates (in natural logarithms) regressed against corresponding modified first differences of selected variables set I (1933–69) (βˆ/sβˆ in parentheses)

| Equation | Effective period | ρˆ (CORC) | σˆe | D.W. statistic | C (Constant) | ∆*Pˆ0a | ∆*Pˆ0c | a | Alternative ∆*P0e | c | ∆*L | ∆*A | ∆*Yp | ∆*U | ∆*T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1935–69 | 0.257 | 0.052 | 1.84 | −3.176 (−0.78) | −1.553 (−1.99) | −0.455 (−3.58) | ∆*PXQ1: −0.039 (−1.59) | −1.336 (−1.36) | 0.630 (2.10) | 1.481 (4.23) | 0.067 (2.00) | −0.047 (−4.60) |
| 2 | 1935–69 | 0.135 | 0.042 | 1.82 | −4.190 (−1.25) | −1.182 (−1.83) | −0.386 (−3.85) | ∆*PXQ2: −0.068 (−3.69) | −1.277 (−1.59) | 0.481 (2.19) | 1.318 (4.86) | 0.062 (2.38) | −0.047 (−6.61) |
| 3 | 1935–69 | 0.077 | 0.045 | 1.81 | −4.419 (−1.25) | −1.203 (−1.78) | −0.374 (−3.59) | ∆*PXQ1−1: −0.065 (−3.29) | −1.405 (−1.63) | 0.512 (2.26) | 1.355 (4.88) | 0.068 (2.55) | −0.047 (−6.54) |
| 4 | 1937–69 | 0.291 | 0.046 | 2.00 | −2.447 (−0.61) | −1.461 (−2.03) | −0.487 (−3.38) | ∆*TXQ1: −0.049 (−2.26) | −1.393 (−1.58) | 0.524 (1.94) | 1.295 (3.90) | 0.063 (2.09) | −0.044 (−4.93) |
| 5 | 1939–69 | −0.207 | 0.050 | 2.15 | 6.868 (1.39) | −2.225 (−3.04) | −0.850 (−4.124) | ∆*PDL1: −0.062 (−3.82) | −0.457 (−0.50) | 0.059 (0.23) | 0.580 (1.70) | 0.014 (0.43) | −0.032 (−4.09) |
| 6 | 1935–69 | 0.208 | 0.051 | 1.86 | −3.503 (−0.85) | −1.512 (−1.94) | −0.424 (−3.38) | ∆*PˆXQ1: −0.059 (−1.73) | −1.368 (−1.38) | 0.485 (1.42) | 1.455 (4.25) | 0.064 (1.93) | −0.050 (−4.87) |

Note: All variables except T are in natural logarithms. The definitions of these variables are given in Table 16.2. The term ∆*X denotes the linear operation X − ρˆX−1. The value of ρˆ is estimated via the Cochrane-Orcutt iterative procedure (CORC). The term σˆe is defined in Section IIA1. The terms ∆*Pˆa and ∆*Pˆc | a in equations 1–5 are computed via a reduced form regression equation including: C (constant), Q/N−1, Pa−1, Pc | a−1, P0e | c, L, A, Yp, U, T, P0e | c−1, L−1, A−1, Yp−1, U−1, XPOL−1, XGOV, NW, N. The terms ∆*Pˆa, ∆*Pˆc | a, and ∆*PˆXQ1 in equation 6 are computed via the same reduced form with PXQ1 (P0e | c) excluded.
Table 16.4 Modified first differences of murder rates (in natural logarithms) regressed against corresponding modified first differences of selected variables set II: alternative time periods and other tests (βˆ/sβˆ in parentheses)

| Equation | Effective period | ρˆ (CORC) | σˆe | D.W. statistic | C (Constant) | ∆*Pˆ0a | ∆*Pˆ0c | a | ∆*L | ∆*A | ∆*Yp | ∆*U | ∆*T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1935–69a | 0.059 | 0.044 | 1.80 | −4.060 (−1.00) | −1.247 (−1.56) | −0.345 (−3.07) | −1.314 (−1.49) | 0.450 (2.20) | 1.318 (4.81) | 0.068 (2.60) | −0.046 (−6.54) |
| 2 | 1937–69a | 0.287 | 0.046 | 1.99 | −2.568 (−0.61) | −1.435 (−1.87) | −0.474 (−3.22) | −1.388 (−1.57) | 0.526 (1.94) | 1.289 (3.91) | 0.063 (2.10) | −0.044 (−4.96) |
| 3 | 1936–69b | — | 0.046 | 1.49 | −3.608 (−1.03) | −1.385 (−2.12) | −0.345 (−3.25) | −1.218 (−1.40) | 0.482 (2.13) | 1.348 (4.94) | 0.068 (2.59) | −0.047 (−6.69) |
| 4 | 1935–69 | 0.061 | 0.046 | 1.84 | −4.882 (−1.32) | −1.172 (−1.73) | −0.383 (−3.20) | −1.487 (−1.61) | 0.477 (1.89) | 1.393 (4.30) | 0.077 (1.95) | −0.048 (−5.76) |
| 5 | 1937–69 | 0.250 | 0.048 | 2.08 | −2.086 (−0.51) | −1.634 (−2.16) | −0.508 (−2.83) | −1.444 (−1.51) | 0.406 (1.23) | 1.334 (3.73) | 0.077 (1.80) | −0.045 (−4.72) |
| 6 | 1941–69 | −0.164 | 0.048 | 2.21 | 3.025 (0.57) | −1.744 (−2.21) | −0.714 (−3.70) | −1.008 (−1.04) | 0.141 (0.56) | 0.734 (2.06) | 0.028 (0.91) | −0.036 (−4.40) |
| 7 | 1941–69 | −0.029 | 0.048 | 2.13 | 3.752 (0.68) | −1.947 (−2.38) | −0.723 (−3.69) | −0.962 (−0.99) | 0.152 (0.55) | 0.771 (2.00) | 0.0311 (0.96) | −0.036 (−4.13) |
| 8 | 1933–66 | −0.001 | 0.033 | 1.90 | −5.678 (−2.21) | −0.564 (−1.10) | −0.265 (−3.49) | −2.111 (−3.18) | 0.283 (1.65) | 0.922 (4.16) | 0.036 (1.74) | −0.036 (−6.30) |
| 9 | 1939–66 | 0.016 | 0.037 | 1.96 | −2.601 (−0.598) | −0.946 (−1.38) | −0.360 (−1.984) | −1.766 (−2.254) | 0.212 (1.03) | 0.780 (2.920) | 0.027 (1.11) | −0.033 (−4.99) |

Coefficients on ∆*P0e | c (one measure per equation, listed by measure):
∆*PXQ1−1: −0.055 (−3.72); −0.074 (−3.70); −0.069 (−3.22); −0.066 (−3.33)
∆*TXQ1: −0.051 (−3.23); −0.066 (−3.34); −0.055 (−2.36); −0.049 (−2.31); −0.064 (−3.52)

War years dummy (1942–45), two equations: 0.018 (0.31); 0.035 (0.50)

Notes: same references as in Table 16.3 but the reduced form used to compute ∆*Pˆ0a and ∆*Pˆ0c | a does not include N. a Same as equations 3 and 4 in Table 16.3 with the missing data pertaining to XPOL−1 interpolated via a smoothing procedure. b Same as equation 4 in Table 16.3 with ρˆ assumed to be zero (level regression).
distribution) of 0.008 and −0.086. The corresponding elasticities associated with the alternative measures of Pe | c, PXQ1−1, PXQ2, TˆXQ1, PˆXQ1, and PDL1 vary between −0.049 and −0.068 with upper and lower 90 percent confidence limits ranging between −0.01 and −0.10. These results have been anticipated by the analysis of Section IIa2 where it was suggested that the regression coefficient associated with PXQ1 is likely to be biased toward zero due to the effect of random forecasting errors. In addition, since the analysis of optimal social defense against murder suggests that an exogenous change in (Q/N) may change the socially optimal value of Pe | c in the same direction, the coefficient associated with PXQ1 may be biased toward a positive value because of a potentially positive correlation between (Q/N) and the unsystematic part of PXQ1. This simultaneous equation bias is expected to be eliminated when the systematic part of PXQ1 is estimated via the reduced form regression equation (PˆXQ1). It is noteworthy that the estimated elasticities of (Q/N)0 with respect to alternative measures of Pe | c are found generally low in absolute magnitude. This, perhaps, is the principal reason why previous studies into the effect of capital punishment on murder using simple correlation techniques and rough measures of the conditional risk of execution have failed to identify a systematic association between murder and the risk of execution. The regression results regarding the effects of P0a, P0c | a, and P0e | c constitute perhaps the strongest findings of the empirical investigation. Not only do the signs of the elasticities associated with these variables conform to the general theoretical expectations, but their ranking, too, is consistent with the predictions in Section I.
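The confidence limits quoted for the elasticity associated with PXQ1 can be approximately recovered from the coefficient and its ratio to its standard error as printed in Table 16.3. The sketch below assumes a normal critical value of 1.96 for a 95 percent interval; the function name is hypothetical, and small discrepancies from the published limits are to be expected because the printed coefficient and ratio are rounded.

```python
def normal_confidence_limits(coef, t_ratio, z=1.96):
    """Return (upper, lower) limits from a coefficient and its coefficient/standard-error ratio."""
    se = abs(coef / t_ratio)  # implied standard error
    return coef + z * se, coef - z * se

# Elasticity on PXQ1 and its ratio as reported in Table 16.3.
upper, lower = normal_confidence_limits(-0.039, -1.59)
print(round(upper, 3), round(lower, 3))  # close to the reported 0.008 and -0.086
```

The upper limit is positive, which is the sense in which the coefficient on the unforecast measure PXQ1 is the weakest of the alternatives.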
Table 16.3 shows that the elasticities associated with P0a range between −1.0 and −1.5, whereas the elasticities associated with P0c | a in the various regression equations range between −0.4 and −0.5. And, as indicated in the preceding paragraph, the elasticities associated with P0e | c are lowest in absolute magnitude. Consistent with predictions and evidence presented in Section Ib regarding a negative association between P0e | c on the one hand and P0a and P0c | a on the other, introduction of the latter variables in the regression equation is found to be particularly useful in isolating the (negative) deterrent impact of estimates of P0e | c. Of similar importance is the introduction of the time trend T. The estimated values of the elasticities associated with the unemployment rate U, labor force participation L, and permanent income Yp in Table 16.3 are not inconsistent with the theoretical expectations discussed in Section Ia. Of particular interest is that the effects of equal percentage changes in P0e | c and U are found to be nearly alike in absolute magnitude. Because murder is often a by-product of crimes involving material gains, the positive effect of U on (Q/N)0 may be attributed in part to the effect of the reduction in legitimate earning opportunities on the incentive to commit such crimes. Indeed, preliminary time-series regression results show that the elasticities of robbery and burglary rates with respect to the unemployment rate are even larger in magnitude than the corresponding elasticities of the murder rate.
These results conform more closely to theoretical expectations than do the results in a cross-state regression analysis (see the author (1973a)). The reason, presumably, is that due to their higher correlation with cyclical variations in the demand for labor, changes in U over time measure the variations in both involuntary unemployment and the duration of such unemployment more effectively than do variations in U across states at a given point in time. The estimated negative effect of variations in the labor force participation rate on the murder rate can be explained along similar lines. Theoretically, variations in L are likely to reflect opposing income and substitution effects of changes in market earning opportunities. However, with measures of both permanent income Yp, and the rate of unemployment introduced in the regression equation as independent explanatory variables, changes in L may reflect a pure substitution effect of changes in legitimate earning opportunities on the incentive to commit crimes both against persons and property.14 Finally, the positive association between Yp and (Q/N)0 need not imply a positive income elasticity of demand for hate and malice since changes in the level of the personal distribution of income may be strongly correlated with payoffs on crimes against property. If legitimate employment opportunities are effectively accounted for by U and by L, changes in Yp may be highly correlated with similar changes in the incidence of crimes against property. Such a partial correlation is indeed observed across states and in a time-series regression analysis of crimes against property now in progress. The positive effect of variations in the percentage of the population in the age group 14–24, A, on the murder rate is consistent with the cross-state evidence concerning the correlation between these variables. A possible explanation for this finding was already offered in Section Ia.
Additional analysis, not reported herein, indicated that the effect of the percentage of nonwhites in the population NW becomes statistically insignificant when the time trend T is introduced as an independent explanatory variable in the regression equation. Consequently, this variable is excluded from the regressions estimating the supply of murders function. This result stands in sharp contrast to the ostensibly positive effect of NW on the murder rate across states. I have argued elsewhere in this context that the apparently higher participation rate of nonwhites in all criminal activities may result largely from the relatively poor legitimate employment opportunities available to them (see the author (1973a)). Since, over time, variations in these opportunities may be effectively accounted for by the variations in U and L, the estimated independent effect of NW may indeed be nil. The negative partial effect of T on (Q/N)0 reported in Tables 16.3 and 16.4 is not inconsistent with the predictions advanced in Section IIa2. The regression results are found to be robust with respect to the functional form of the regression equation. In addition, estimating the regression equations by introducing the levels of the relevant variables rather than their modified first differences (that is, assuming no serial correlation in the error terms) artificially reduces the standard errors of the regression coefficients as
would be expected on purely statistical grounds (see Table 16.4, equation (3)). The results are further insensitive as to the specific estimates of expenditures on police used in the reduced form regression equation. The data for this variable are not available for all the odd years between 1933 and 1951 and the missing statistics were interpolated either via a reduced form regression analysis (XPOL) or a simple smoothing procedure. The results are virtually identical (compare equations (1) and (2) in Table 16.4 with equations (3) and (4) in Table 16.3). The introduction of a dummy variable distinguishing the World War II years (1942–45) from other years in the sample has no discernible effect on the regression results, while the effect of the dummy variable itself appears to be statistically insignificant. Of more importance, the qualitative results reported in Table 16.3 are for the most part insensitive to changes in the specific interval of time investigated in the regression analysis, as indicated by the results reported in Table 16.4. However, the absolute magnitudes of some of the estimated elasticities, especially those associated with P0a, P0c | a, P0e | c, U, and L do change when estimated from different subperiods. Finally, the time-series estimates of the supply-of-murders function appear quite consistent with independent estimates derived through a cross-state regression analysis using data from 1960. A detailed discussion of related issues is included in the author (1973b).
III. Some implications

A. The apparent effect of capital punishment: deterrence or incapacitation?

It has already been hinted in the introduction to this paper that an apparent negative effect of execution on the murder rate may merely reflect the relative preventive or incapacitating impact of the death penalty, which eliminates the possibility of recidivism on the part of those executed. An estimation of the differential preventive effect of execution relative to imprisonment for capital murder has been attempted in this study through an application of a general model of the preventive effect of imprisonment developed in the author (1973a). In this application of the model, execution is identified with an imprisonment term Te, which is equal to the life expectancy of an average offender imprisoned for murder. The differential preventive impact of execution is estimated by taking account of the alternative average sentence served by those imprisoned for capital murder Tm, the fractions of potential murderers executed and imprisoned, and the rate of population growth. Derivation of the expected partial elasticity of the murder rate with respect to the fraction of convicts executed, σPe | c, is omitted here for lack of space. I shall point out only that estimates of σPe | c derived on the basis of the extremely unrealistic assumption that any potential murderer at large (outside prison) commits one murder each and every year, and for values of Te and Tm estimated at 40 and between 10 and 16 years, respectively, vary between 0.020 and 0.037 (see the author (1973b)). These estimates, therefore, do not account for the full magnitudes of the absolute values of the elasticities of the murder rate with respect to estimates of the fraction of convicts executed that are reported in Tables 16.3 and 16.4. Moreover, according to the model of law enforcement involving only preventive effects, the partial elasticity of the murder rate with respect to the fraction of those apprehended for murder P0a is expected to be identical to the corresponding elasticity with respect to the fraction of those apprehended and charged with murder who were convicted of this crime, P0c | a. The reason, essentially, is that equal percentage changes in either P0a or P0c | a have the same effect on the fractions of offenders who are incapacitated through incarceration or execution, and thus should have virtually equivalent preventive effects on the murder rate. This prediction is ostensibly at odds with the significant positive difference between empirical estimates of the murder rate with respect to P0a and P0c | a. In contrast, the latter findings are consistent with implications of the deterrent theory of law enforcement (see equation (3)). In light of these observations one cannot reject the hypothesis that punishment in general, and execution in particular, exert a unique deterrent effect on potential murderers.

B. Tentative estimates of the tradeoff between executions and murders

The regression results concerning the partial elasticities of the reported murder rate with respect to various measures of the expected risk of execution given conviction in different subperiods, α̃3, can be restated in terms of expected tradeoffs between the execution of an offender and the lives of potential victims that might thereby be saved.
For illustration, consider the regression coefficients associated with P̂XQ1 and PXQ1 in equations (6) and (3) of Table 16.3. These coefficients, −0.06 and −0.065, respectively, may be considered consistent estimates of the average elasticity of the national murder rate, (Q/N)0, with respect to the objective conditional risk of execution, P0e | c = (E/C)0, over the period 1935–69. Evaluated at the mean values of murders and executions over that period, Q = 8,965 and E = 75, the marginal tradeoffs, ∆Q/∆E = α̂3Q/E, are found to be 7 and 8, respectively. Put differently, an additional execution per year over the period in question may have resulted, on average, in 7 or 8 fewer murders. The weakness inherent in these predicted magnitudes is that they may be subject to relatively large prediction errors. More reliable point estimates of the expected tradeoffs should be computed at the mean values of all the explanatory variables entering the regression equation (hence, also the mean value of the dependent variable) because the confidence interval of the predicted value of the dependent variable is there minimized. The mean values of the dependent variable and the explanatory variable used to calculate the value of α̃3 in equation (3) of Table 16.3 are found to be nearly identical with the actual values of these two variables in 1966 and 1959, respectively. The corresponding values of murders and executions in these two years were Q(1966) = 10,920 and E(1959) = 41; the marginal tradeoffs between executions and murders based on the latter magnitudes and the elasticity α̂3 = −0.065 are found to be 1 to 17. It should be emphasized that the expected tradeoffs computed in the preceding illustration mainly serve a methodological purpose since their validity is conditional upon that of the entire set of assumptions underlying the econometric investigation. In addition, it should be pointed out that the 90 percent confidence intervals of the elasticities used in the preceding illustrations vary approximately between 0 and −0.10 implying that the corresponding confidence intervals of the expected tradeoffs in the last illustration range between limits of 0 and 24. As the above illustrations indicate, however, although the estimated elasticities α̂3 reported in Tables 16.3 and 16.4 are low in absolute magnitude, the tradeoffs between executions and murders implied by these elasticities are not negligible, especially when evaluated at relatively low levels of executions and relatively high levels of murder.15 Finally, it should be emphasized that the tradeoffs discussed in the preceding illustrations were based upon the partial elasticity of (Q/N)0 with respect to measures of P0e | c and thus, implicitly, on the assumption that the values of all other variables affecting the murder rate are held constant as the probability of execution varies. In practice, however, the values of the endogenous variables Pa and Pc | a may not be perfectly controllable. The theoretical analysis in Section Ib suggests that exogenous shifts in the optimal values of Pe | c may generate offsetting changes in the optimal values of Pa and Pc | a.
Indeed, consistent estimates of the elasticities of the reported murder rates with respect to alternative measures of P0e | c that were derived through a reduced form regression analysis using as explanatory variables only the exogenous and predetermined variables included in the supply of offenses function and other structural equations (X1 and X2 in Table 16.2) are found to be generally lower than the elasticities reported in Table 16.3.16 The actual tradeoffs between executions and murders thus depend partly upon the ability of law enforcement agencies to control simultaneously the values of all the parameters characterizing law enforcement activity.
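The tradeoff arithmetic in Section III.B, and the counterfactual described in note 15, can be reproduced with a few lines of code. The following is only an illustrative sketch: the function names are hypothetical, the inputs are the figures quoted in the text, and the note 15 calculation assumes that convictions are held fixed, so that the conditional risk of execution changes in proportion to the number of executions and the elasticity is applied linearly.

```python
def marginal_tradeoff(elasticity, murders, executions):
    """Change in murders per additional execution: dQ/dE = elasticity * Q / E."""
    return elasticity * murders / executions


def counterfactual_murders(elasticity, murders, exec_actual, exec_counterfactual):
    """Murders implied by a different number of executions.

    Assumption (not from the text): convictions are unchanged, so the risk
    E/C moves in proportion to E, and the elasticity is applied linearly
    to that percentage change.
    """
    pct_change_risk = (exec_counterfactual - exec_actual) / exec_actual
    return murders * (1 + elasticity * pct_change_risk)


# Mean values over 1935-69 quoted in the text: Q = 8,965 and E = 75.
at_means = [marginal_tradeoff(a, 8965, 75) for a in (-0.06, -0.065)]

# Murders in 1966 and executions in 1959, with elasticity -0.065.
at_1966_1959 = marginal_tradeoff(-0.065, 10920, 41)

# Note 15: executions in 1960 reduced from 44 to 2, starting from 9,000 murders.
murders_1960 = counterfactual_murders(-0.065, 9000, 44, 2)

# Negative tradeoffs mean fewer murders per additional execution.
print([round(t, 2) for t in at_means])
print(round(at_1966_1959, 2))
print(round(murders_1960))
```

In absolute value the tradeoffs come out at roughly 7.2, 7.8, and 17.3 murders per execution, and the note 15 counterfactual at about 9,558 murders, in line with the rounded figures reported in the text. The sketch says nothing about the reliability of the underlying elasticity estimates.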
IV. Conclusions

This paper has attempted to present a systematic analysis of the relation between capital punishment and the crime of murder. The analysis rests on the presumption that offenders respond to incentives. Not all those who commit murder may respond to incentives. But for the theory to be useful in explaining aggregate behavior, it is sufficient that at least some so behave. Previous investigations, notably those by Sellin, have developed evidence used to unequivocally deny the existence of any deterrent or preventive effects of capital punishment. This evidence stems by and large from what amounts
to informal tests of the sign of the simple correlation between the legal status of the death penalty and the murder rate across states and over time in a few states. Studies performing these tests have not considered systematically the actual enforcement of the death penalty, which may be a far more important factor affecting offenders’ behavior than the legal status of the penalty. Moreover, these studies have generally ignored other parameters characterizing law enforcement activity against murder, such as the probability of apprehension and the conditional probability of conviction, which appear to be systematically related to the probability of punishment by execution. In addition, the direction of the causal relationship between the rate of murder and the probabilities of apprehension, conviction, and execution is not obvious, since a high murder rate may generate an upward adjustment in the levels of these probabilities in accordance with optimal law enforcement. Thus the sign of the simple correlation between the murder rate and the legal status, or even the effective use of capital punishment, cannot provide conclusive evidence for or against the existence of a deterrent effect. The basic strategy I have attempted to follow in formulating an adequate analytic procedure has been to develop a simple economic model of murder and defense against murder, to derive on the basis of this model a set of specific behavioral implications that could be tested against available data, and, accordingly, to test those implications statistically. The theoretical analysis provided sharp predictions concerning the signs and the relative magnitudes of the elasticities of the murder rate with respect to the probability of apprehension and the conditional probabilities of conviction and execution for murder. It suggested also the existence of a systematic relation between employment and earning opportunities and the frequency of murder and other related crimes. 
Although in principle the negative effect of capital punishment on the incentive to commit murder may be partly offset, for example, by an added incentive to eliminate witnesses, the results of the empirical investigation are not inconsistent with the hypothesis that, on balance, capital punishment reduces the murder rate. But even more significant is the finding that the ranking of the elasticities of the murder rate with respect to Pa, Pc | a, and Pe | c conforms to the specific theoretical predictions. The murder rate is also found negatively related to the labor force participation rate and positively to the rate of unemployment. None of these results is compatible with a hypothesis that offenders do not respond to incentives. In particular, the results concerning the effects of the estimates of the probabilities of apprehension, conviction, and execution are not consistent with the hypothesis that execution or imprisonment decrease the rate of murder only by incapacitating or preventing apprehended offenders from committing further crimes. These observations do not imply that the empirical investigation has proved the existence of the deterrent or preventive effect of capital punishment. The results may be biased by the absence of data on the severity of alternative punishments for murder, by the use of national rather than state
statistics, and by other imperfections. At the same time it is not obvious whether the net effect of all these shortcomings necessarily exaggerates the regression results in favor of the theorized results. In view of the new evidence presented here, one cannot reject the hypothesis that law enforcement activities in general and executions in particular do exert a deterrent effect on acts of murder. Strong inferences to the contrary drawn from earlier investigations appear to have been premature. Even if one accepts the results concerning the partial effect of the conditional probability of execution on the murder rate as valid, these results do not imply that capital punishment is necessarily a desirable form of punishment. Specifically, whether the current level of application of capital punishment is optimal cannot be determined independently of the question of whether the levels of alternative punishments for murder are optimal. For example, one could argue on the basis of the model developed in Section Ia that if the severity of punishments by means other than execution had been greater in recent years, the apparent elasticity of the murder rate with respect to the conditional probability of punishment by execution would have been lower, thereby making capital punishment ostensibly less efficient in deterring or preventing murders. Again, this observation need not imply that the effective period of incarceration imposed on convicted capital offenders should be raised. Given the validity of the analysis pursued above, incarceration or execution are not exhaustive alternatives for effectively defending against murders.17 Indeed, these conventional punishments may be considered imperfect means of deterrence relative to monetary fines and other related compensations because the high “price” they exact from convicted offenders is not transferrable to the rest of society.
Moreover, the results of the empirical investigation indicate that the rate of murder and other related crimes may also be reduced through increased employment and earning opportunities. The range of effective methods for defense against murder thus extends beyond conventional means of law enforcement and crime prevention. There is no unambiguous method for determining whether capital punishment should be utilized as a legal means of punishment without considering at the same time the optimal values of all other choice variables that can affect the level of capital crimes.
Notes

* I have benefitted from comments and suggestions from Gary Becker, Harold Demsetz, Lawrence Fisher, John Gould, Richard Posner, George Stigler, and Arnold Zellner. I am particularly indebted to Randall Mark for useful assistance and suggestions and to Walter Vandaele and Dan Galai for helpful computational assistance and suggestions. This paper is a reduced version of a more complete and detailed draft (see the author 1973b). Financial support for this study was provided by a grant to the NBER from the National Science Foundation, but the paper is not an official NBER publication since it has not been reviewed by the board of directors.
1 For a more complete discussion of this model, see Harold Hochman and James Rodgers, and Gary Becker (1974).

2 It might be argued that although the wish to harm other persons cannot be rejected on economic grounds, nonetheless the execution of such desires (as opposed to benevolent actions) must be considered irrational in the sense of violation of Pareto optimality conditions. If there were no bargaining, transfer, or enforcement costs associated with mutually acceptable and enforceable contracts between a potential offender o and his potential victim v, and if v’s wealth constraint were not binding, then it would always be optimal for v to offer compensation to o for not committing a crime against him and for o to seek such compensation or extortion. The reason is that a reduction in v’s consumption level is thus achieved by o without incurring the direct costs of committing a crime and the prospective cost of legal sanctions. Indeed, there exists some range of compensations that would increase both o’s and v’s utilities relative to their expected utilities if crime is committed by o against v. Many crimes against persons, and some cases of property crimes as well, may occasionally be avoided by such arrangements; successful extortions involving kidnapping or hijacking constitute obvious examples. Yet in many situations compensations may be too costly to pursue or to enforce, just as fully effective private or public protection against murder may be too costly to provide. The incidence of murder must then be expected on purely economic grounds.

3 The case in which crime is committed in pursuit of material gains has been analyzed explicitly by the author (1973a). Note that the victim’s level of consumption need not directly enter the offender’s utility function in this case.

4 Differentiating equation (2) with respect to Pa, Pc | a, and Pe | c, using the contingent outcomes of murder as illustrated in Table 16.1, it can easily be demonstrated that:

$$\epsilon_{P_a} = -\frac{\partial U^*_{om}}{\partial P_a}\,\frac{P_a}{U^*_o} = \frac{1}{U^*_o}\Big\{P_a(1 - P_{c|a})\,[U(C_a) - U(C_b)] + P_a P_{c|a}(1 - P_{e|c})\,[U(C_a) - U(C_c)] + P_a P_{c|a} P_{e|c}\,[U(C_a) - U(C_d)]\Big\} > 0$$

$$\epsilon_{P_{c|a}} = -\frac{\partial U^*_{om}}{\partial P_{c|a}}\,\frac{P_{c|a}}{U^*_o} = \frac{1}{U^*_o}\Big\{P_a P_{c|a}(1 - P_{e|c})\,[U(C_b) - U(C_c)] + P_a P_{c|a} P_{e|c}\,[U(C_b) - U(C_d)]\Big\} > 0$$

$$\epsilon_{P_{e|c}} = -\frac{\partial U^*_{om}}{\partial P_{e|c}}\,\frac{P_{e|c}}{U^*_o} = \frac{1}{U^*_o}\Big\{P_a P_{c|a} P_{e|c}\,[U(C_c) - U(C_d)]\Big\} > 0$$

Clearly, $\epsilon_{P_a} > \epsilon_{P_{c|a}} > \epsilon_{P_{e|c}} > 0$.

5 Pc and θ would be proportionally related if the number of arrests of innocent and guilty persons were proportionally related and if the probability of legal error remained constant as more resources were spent on enforcement activity through arrests and prosecutions. Alternatively, it might be argued that Pc and θ are highly (positively) correlated because of the well-known proposition that at any given level of evidence presented in court in reference to the defendant’s guilt or innocence,
the probability of legal or type I error, α (that of convicting the innocent), is negatively related to the probability of type II error, β (that of acquitting the guilty). Hence α might be negatively correlated with Pc | ch ≡ 1 − β where Pc | ch denotes the conditional probability that a guilty offender will be convicted once he is charged. However, the assumption that Pc and θ, or Pc | ch and α, are mutually dependent is made mainly for methodological convenience without affecting the basic implications of the following analysis. More generally, the direct costs of law enforcement activity C may be specified as a function including Pc and the unconditional probability of legal error ε as independent arguments so that optimal values of these probabilities may be determined separately via appropriate expenditures.

6 More specifically, γ1 = b1 + λβ1 and γ2 = b2 + λβ2, where λ is a coefficient relating Pc to the fraction of murders cleared by convicting innocent persons π and b and β indicate the respective net social costs from punishing guilty and innocent convicts through execution, denoted by the subscript 1, or imprisonment, denoted by the subscript 2. The conditional probability of execution given conviction is implicitly assumed to be equal for all convicts.

7 By definition,

$$\varepsilon_{P_{e|c}} \equiv -\frac{\partial q}{\partial P_{e|c}}\,\frac{P_{e|c}}{q} \equiv \varepsilon_f\,\frac{\partial f}{\partial P_{e|c}}\,\frac{P_{e|c}}{f} \equiv \varepsilon_f\,\varepsilon_{fc}.$$

Clearly,

$$\varepsilon_{fc} = \frac{P_{e|c}\,[d - (\gamma_2/\gamma_1)m]}{(\gamma_2/\gamma_1)m + P_{e|c}\,[d - (\gamma_2/\gamma_1)m]}$$

is lower than unity if [d − (γ2/γ1)m] > 0. Under this condition, and the assumption that γ1 > 0, εPe | c < εf < 1. By like reasoning and some simplifying assumptions, it can also be shown that in equilibrium, εPa > εPc | a > εPe | c.

8 Proofs to the theorems discussed in this section can be developed through an appropriate differentiation of equations (6) and (7) with respect to the relevant variables.

9 I am indebted to the Uniform Crime Reporting Section of the FBI for making available their revised annual estimates of the total number of murders and other index crimes in the United States during the period 1933–65.

10 Another important reason for introducing chronological time as an exogenous variable in equation (8) is to account for a possible time trend in missing variables, in particular, the average length of imprisonment for both capital and noncapital murders for which no complete time-series is available. Scattered evidence shows rising trends in the median value of prison terms served by all murder convicts over a large part of the period considered in this investigation, but this increase may have been largely technical. With executions being imposed less frequently over time, the frequency of life imprisonment sentences for murder convicts may have risen accordingly, thus increasing the mean or median time spent in prisons by these convicts.

11 Execution figures are based on National Prisons Statistics Bulletin (NPS) statistics. Conviction figures are derived by Ct = Q0t P0at P0c | at.

12 Statistics on the time elapsed between sentencing and execution can be found in NPS numbers 20 and 45.

13 I am indebted to Edi Karni for making available to me his updated calculations of the permanent income variable.

14 A possible explanation for the significant negative association between labor force participation and particularly crimes against the person is that interpersonal
frictions and social interactions leading to acts of malice occur mostly in the nonmarket or home sector rather than at work. An increase in the total time spent in the nonmarket sector (a reduction in L) might then generate a positive scale effect on the incidence of murder. This ad hoc hypothesis is nevertheless supported by FBI UCR evidence on the seasonal pattern of murder. This crime rate peaks twice a year: around the holiday season (December) and around the summer vacation season (July–August) in which relatively more time is spent out of work. It is also supported by evidence that the frequency of murders on weekends is significantly higher than on weekdays (see William Graves, p. 327).

15 A decrease in the number of executions in 1960 from 44 to 2 (the actual number of executions in 1967), which implies a decline of 95 percent in the value of Pe | c in that year, would have increased the murder rate that same year by about 6.2 percent from 0.05 to 0.053 per 1,000 population if the true value of α2 were equal to 0.065. The implied increase in the actual number of murders in 1960 would have been from 9,000 to 9,558. For comparison, note that the actual murder rate in 1967 was 0.06 per 1,000 population and the number of murders was 12,100. The values of other explanatory variables associated with the supply of murders function were, of course, quite different in these two years. By this tentative and rough calculation, the decline in Pe | c alone might have accounted for about 25 percent of the increase in the murder rate between 1960 and 1967.

16 The elasticities associated with PXQ1, PXQ1, TXQ1, and PDL1 in this modified reduced form regression analysis relating to the period 1934–69 are found equal to −0.0269 (−0.83), −0.0672 (−2.29), −0.0414 (−1.99), and −0.052 (−5.81), respectively, where the numbers in parentheses are the ratios of the coefficients to their standard errors.
17 Ironically, the argument that capital punishment should be abolished because it has no deterrent effect on offenders might serve to justify the use of capital punishment as an ultimate means of prevention of crime, since the risk of recidivism that cannot be deterred by the threat of punishment is not eliminated entirely even inside prison walls. In contrast, since the results of this investigation support the notion that execution exerts a pure deterrent effect on offenders, they can be used to suggest that other punishments, even those which do not have any preventive effect, can in principle serve as substitutes.
References

Beccaria, C. B. An Essay on Crimes and Punishments, London 1767, originally published in 1764.
Becker, G. S. “Crime and Punishment: An Economic Approach,” J. Polit. Econ., Mar./Apr. 1968, 76, 169–217.
——, “A Theory of Social Interactions,” J. Polit. Econ., Nov./Dec. 1974, 82, 1063–93.
Bedau, H. A. The Death Penalty in America, Garden City 1967.
Ehrlich, I. “Participation in Illegitimate Activities: An Economic Analysis,” unpublished doctoral dissertation, Columbia Univ. 1970.
——, (1973a) “Participation in Illegitimate Activities: A Theoretical and Empirical Investigation,” J. Polit. Econ., May/June 1973, 81, 521–65.
——, (1973b) “The Deterrent Effect of Capital Punishment: A Question of Life and Death,” Nat. Bur. Econ. Res. working pap. series no. 18, 1973.
Fair, R. C. “The Estimation of Simultaneous Equation Models with Lagged Endogenous Variables and First Order Serially Correlated Errors,” Econometrica, May 1970, 38, 507–16.
Graves, W. F. “The Deterrent Effect of Capital Punishment in California,” in H. A. Bedau, ed., The Death Penalty in America, Garden City 1967.
Hochman, H. M. and Rodgers, J. D. “Pareto Optimal Redistribution,” Amer. Econ. Rev., Sept. 1969, 59, 542–57.
President’s Commission on Law Enforcement and Administration of Justice (PCL), Crime and Its Impact—An Assessment, Task Force Reports, Washington 1967.
Sellin, T. The Death Penalty, Philadelphia 1959.
——, Capital Punishment, New York 1967.
U.S. Department of Justice, Bureau of Prisons, National Prisons Statistics Bulletin (NPS), various numbers; published prior to 1950 by Bureau of Census, Washington.
——, Federal Bureau of Investigation, Uniform Crime Report (UCR), Washington, various years.
U.S. Office of Business Economics, The National Income and Product Accounts of the United States, 1929–65, Statistical Tables, Surv. Curr. Bus., supp., Washington 1966.
——, Business Statistics, Surv. Curr. Bus., Washington, various years.
17 Does capital punishment have a deterrent effect? New evidence from postmoratorium panel data

Hashem Dezhbakhsh, Paul H. Rubin and Joanna M. Shepherd

1. Introduction

The acrimonious debate over capital punishment has continued for centuries (Beccaria, 1764; Stephen, 1864). In recent decades the debate has heated up in the United States following the Supreme Court-imposed moratorium on capital punishment.1 Currently, several states are considering a change in their policies regarding the status of the death penalty. Nebraska’s legislature, for example, recently passed a two-year moratorium on executions, which was, however, vetoed by the state’s governor. Ten other states have at least considered a moratorium last year (“Execution Reconsidered,” 1999, p. 27). The group includes Oklahoma, whose legislature will soon consider a bill imposing a two-year moratorium on executions and establishing a task force to research the effectiveness of capital punishment. The legislatures in Nebraska and Illinois have also called for similar research. In Massachusetts, however, the House of Representatives voted down a bill supported by the governor to reinstate the death penalty. An important issue in this debate is whether capital punishment deters murders. Psychologists and criminologists who examined the issue initially reported no deterrent effect (see, e.g., Cameron, 1994; Eysenck, 1970; Sellin, 1959). Economists joined the debate with the pioneering work of Ehrlich (1975, 1977). Ehrlich’s regression results, using U.S. aggregate time-series for 1933–69 and state-level cross-sectional data for 1940 and 1950, suggest a significant deterrent effect, which sharply contrasts with earlier findings. The policy importance of the research in this area is borne out by the considerable public attention that Ehrlich’s work has received.
The Solicitor General of the United States, for example, introduced Ehrlich’s findings to the Supreme Court in support of capital punishment (Fowler v. North Carolina). Coinciding with the Supreme Court’s deliberation on the issue, Ehrlich’s finding inspired an interest in econometric analysis of deterrence, leading to many studies that use his data but different regression specifications— different regressors or different choice of endogenous versus exogenous variables.2 The mixed findings prompted a series of sensitivity analyses on Ehrlich’s equations, reflecting a further emphasis on specification.3
Data issues, on the other hand, have received far less attention. Most of the existing studies use either time-series or cross section data. The studies that use national time-series data are affected by an aggregation problem. Any deterrence from an execution should affect the crime rate only in the executing state. Aggregation dilutes such distinct effects.4 Cross-sectional studies are less sensitive to this problem, but their static formulation precludes any consideration of the dynamics of crime, law enforcement, and judicial processes. Moreover, cross-sectional studies are affected by unobserved heterogeneity, which cannot be controlled for in the absence of time variation. The heterogeneity is due to jurisdiction-specific characteristics that may correlate with other variables of the model, rendering estimates biased. Several authors have expressed similar data concerns or called for new research based on panel data (see, e.g., Avio, 1998; Cameron, 1994; Hoenack and Weiler, 1980). Such research will be timely and useful for policy making. We examine the deterrent effect of capital punishment by using a system of simultaneous equations and county-level panel data that cover the postmoratorium period. This is the most disaggregate and detailed data used in this literature. Our analysis overcomes data and econometric limitations in several ways. First, the disaggregate data allow us to capture the demographic, economic, and jurisdictional differences among U.S. counties, while avoiding aggregation bias. Second, by using panel data, we can control for some unobserved heterogeneity across counties, therefore avoiding the bias that arises from the correlation between county-specific effects and judicial and law enforcement variables. Third, the large number of county-level observations extends our degrees of freedom, thus broadening the scope of our empirical investigation.
The large data set also increases variability and reduces colinearity among variables. Finally, using recent data makes our inference more relevant for the current crime situation and more useful for the ongoing policy debate on capital punishment. Moreover, we address two issues that appear to have remained in the periphery of the specification debate in this literature. The first issue relates to the functional form of the estimated equations. We bridge the gap between theoretical propositions concerning an individual’s behavior and the empirical equation typically estimated at some level of aggregation. An equation that holds true for an individual can also be applied to a county, state, or nation only if the functional form is invariant to aggregation. This point is important when similar equations are estimated at various levels of aggregation. The second issue relates to murders that may not be deterrable— nonnegligent manslaughter and nonpremeditated crimes of passion—and that are included in commonly used murder data. We examine whether such inclusion has an adverse effect on the deterrence inference. We draw on our discussions of these issues and the specification debate in this literature to formulate our econometric model. The article is organized as follows: Section 2 reviews the literature on the deterrent effect of capital punishment and outlines the theoretical foundation
Hashem Dezhbakhsh, Paul H. Rubin and Joanna M. Shepherd
of our econometric model. Section 3 describes data and measurement issues, presents the econometric specification, and highlights important statistical issues. Section 4 reports the empirical results and the corresponding analysis, including an estimate of the number of murders avoided as the result of each execution. This section also examines the robustness of our findings. Section 5 concludes.
2. Capital punishment and deterrence Historically, religious and civil authorities imposed capital punishment for many different crimes. Opposition to capital punishment intensified during the European Enlightenment as reformers such as Beccaria and Bentham called for abolition of the death penalty. Most Western industrialized nations have since abolished capital punishment (for a list see Zimring and Hawkins, 1986, chap. 1). The United States is an exception. In 1972, in Furman v. Georgia, the Supreme Court outlawed capital punishment, arguing that execution was cruel and unusual punishment, but in 1976, in Gregg v. Georgia, it changed its position by allowing executions under certain carefully specified circumstances. There were no executions in the U.S. between 1968 and 1977. Executions resumed in 1977 and have increased steadily since then, as seen in Table 17.1. As Table 17.2 illustrates, from 1977 through 2000 there have been 683 executions in thirty-one states. Seven other states have adopted death penalty laws but have not executed anyone. Tennessee had its first execution in April 2000, and twelve states do not have death penalty laws. Several of the executing states are currently considering a moratorium on executions, while a few nonexecuting states are debating whether to reinstate capital punishment. The contemporary debate over capital punishment involves a number of important arguments, drawing either on moral principles or social welfare considerations. Unlike morally based arguments, which are inherently theoretical, welfare-based arguments tend to build on empirical evidence. The critical issue with welfare implications is whether capital punishment deters capital crimes; an affirmative answer would imply that the death penalty can potentially reduce such crimes. In fact, this issue is described as “the most important single consideration for both sides in the death penalty controversy” (Zimring and Hawkins, 1986, p. 167). 
As Figure 17.1 demonstrates, looking at the raw data does not give a clear answer to the deterrence question. Although executing states had much higher murder rates than nonexecuting states in 1977, the rates have since converged. Hence, more sophisticated empirical techniques are required to determine if there is a deterrent effect from capital punishment. Ehrlich (1975, 1977) introduced regression analysis as a tool for examining the deterrent issue. A plethora of economic studies followed Ehrlich’s. Some of these studies verbally criticize or commend Ehrlich’s work, whereas others
Table 17.1 Executions and executing states

Year    No. of executions    No. of states with death penalty
1977          1                    31
1978          0                    32
1979          2                    34
1980          0                    34
1981          1                    34
1982          2                    35
1983          5                    35
1984         21                    35
1985         18                    35
1986         18                    35
1987         25                    35
1988         11                    35
1989         16                    35
1990         23                    35
1991         14                    36
1992         31                    36
1993         38                    36
1994         31                    34
1995         56                    38
1996         45                    38
1997         74                    38
1998         68                    38
1999         98                    38
2000         85                    38

Source: Snell, Tracy L. 2001. Capital Punishment 2000. Washington, D.C.: U.S. Bureau of Justice Statistics (NCJ 190598).
offer alternative analyses. Most analyses use a variant of Ehrlich’s econometric model and his data (1933–69 national time-series or 1940 and 1950 state-level cross-section). For example, Yunker (1976) finds a deterrent effect much stronger than Ehrlich’s. Cloninger (1977) and Ehrlich and Gibbons (1977) lend further support to Ehrlich’s finding. Bowers and Pierce (1975), Passel and Taylor (1977) and Hoenack and Weiler (1980), on the other hand, find no deterrence when they use an alternative (linear) functional form.5 Black and Orsagh (1978) find mixed results, depending on the cross-section year they use. There are also studies that extend Ehrlich’s time-series data or use more recent cross-sectional studies. Layson (1985) and Cover and Thistle (1988), for example, use an extension of Ehrlich’s time-series data, covering up to 1977. Layson finds a significant deterrent effect of executions, but Cover and Thistle, who correct for data nonstationarity, find no support for the deterrent effect in general. Chressanthis (1989) uses time-series data covering 1966–85 and finds a deterrent effect. Grogger (1990) uses daily data for California during 1960–63 and finds no significant short-term correlation between execution and daily homicide rates.
Table 17.2 Status of the death penalty

Jurisdictions without a death penalty on December 31, 2000:
Alaska, District of Columbia, Hawaii, Iowa, Maine, Massachusetts, Michigan, Minnesota, North Dakota, Rhode Island, Vermont, West Virginia, Wisconsin

Jurisdictions with a death penalty on December 31, 2000 (No. of executions 1977–2000):
Texas (239), Virginia (81), Florida (50), Missouri (46), Oklahoma (30), Louisiana (26), South Carolina (25), Alabama (23), Arkansas (23), Georgia (23), Arizona (22), North Carolina (16), Illinois (12), Delaware (11), California (8), Nevada (8), Indiana (7), Utah (6), Mississippi (4), Maryland (3), Nebraska (3), Pennsylvania (3), Washington (3), Kentucky (2), Montana (2), Oregon (2), Colorado (1), Idaho (1), Ohio (1), Tennessee (1), Wyoming (1), Connecticut (0), Kansas (0), New Hampshire (0), New Jersey (0), New Mexico (0), New York (0), South Dakota (0)

Source: Snell, Tracy L. 2001. Capital Punishment 2000. Washington, D.C.: U.S. Bureau of Justice Statistics (NCJ 190598).
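The counts quoted in the text (683 executions in thirty-one states, with seven death penalty states that have not executed anyone) can be cross-checked against the two tables. The following is a minimal sketch in Python; the figures are transcribed from Tables 17.1 and 17.2, and the variable names are illustrative only.

```python
# Yearly execution counts transcribed from Table 17.1 (1977 through 2000).
executions_by_year = [
    1, 0, 2, 0, 1, 2, 5, 21, 18, 18, 25, 11,
    16, 23, 14, 31, 38, 31, 56, 45, 74, 68, 98, 85,
]

# State execution counts (1977-2000) transcribed from Table 17.2.
executions_by_state = {
    "Texas": 239, "Virginia": 81, "Florida": 50, "Missouri": 46,
    "Oklahoma": 30, "Louisiana": 26, "South Carolina": 25, "Alabama": 23,
    "Arkansas": 23, "Georgia": 23, "Arizona": 22, "North Carolina": 16,
    "Illinois": 12, "Delaware": 11, "California": 8, "Nevada": 8,
    "Indiana": 7, "Utah": 6, "Mississippi": 4, "Maryland": 3,
    "Nebraska": 3, "Pennsylvania": 3, "Washington": 3, "Kentucky": 2,
    "Montana": 2, "Oregon": 2, "Colorado": 1, "Idaho": 1, "Ohio": 1,
    "Tennessee": 1, "Wyoming": 1, "Connecticut": 0, "Kansas": 0,
    "New Hampshire": 0, "New Jersey": 0, "New Mexico": 0, "New York": 0,
    "South Dakota": 0,
}

total_by_year = sum(executions_by_year)
total_by_state = sum(executions_by_state.values())

# Split death penalty states into those that executed someone and those that did not.
executing = [s for s, n in executions_by_state.items() if n > 0]
non_executing = [s for s, n in executions_by_state.items() if n == 0]

print(total_by_year, total_by_state)       # the two tables should agree
print(len(executing), len(non_executing))  # executing vs. nonexecuting death penalty states
```

Both totals come to 683, with 31 executing states and 7 death penalty states without an execution, which matches the figures cited in the text.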
There are also a few recent studies. Brumm and Cloninger (1996), for example, who use cross-sectional data covering fifty-eight cities in 1985 report that the perceived risk of punishment is negatively and significantly correlated with homicide commission rate. Studying the effect of concealed
Figure 17.1 Murder rates in executing and nonexecuting states.
handgun laws on public shootings, Lott and Landes (2000) report a negative association between capital punishment and murder on a concurrent basis. Cloninger and Marchesini (2001) report that the Texas unofficial moratorium on executions during most of 1996 appears to have contributed to additional homicides. Mocan and Gittings (unpublished data) find that pardons may increase the homicide rate while executions reduce the rate. Zimmerman (2001) also reports that executions have a deterrent effect.6 None of the existing studies, however, uses county-level postmoratorium panel data. Becker’s (1968) economic model of crime provides the theoretical foundation for much of the regression analysis in this area. The model derives the supply, or production, of offenses for an expected utility maximizing agent. Ehrlich (1975) extends the model to murders that he argues are committed either as a byproduct of other violent crimes or as a result of interpersonal conflicts involving pecuniary or nonpecuniary motives. Ehrlich derives several theoretical propositions predicting that an increase in perceived probabilities of apprehension, conviction given apprehension, or execution given conviction will reduce an individual’s incentive to commit murder. An increase in legitimate or a decrease in illegitimate earning or income opportunities will have a similar crime-reducing effect. Unfortunately, variables that can measure legitimate and illegitimate opportunities are not readily available. Ehrlich and authors who test his propositions, therefore, use several economic and demographic variables as proxies. Demographic characteristics such as population density, age, gender, and
race enter the analysis because earning opportunities (legitimate or illegitimate) cannot be perfectly controlled for in an empirical investigation. Such characteristics may influence earning opportunities and can therefore serve as reasonable proxies. The following individual decision rule, therefore, provides the basis for empirical investigation of the deterrent effect of capital punishment:

$$\psi_t = f(Pa_t,\; Pc|a_t,\; Pe|c_t,\; Z_t,\; u_t), \tag{1}$$

where ψ is a binary variable that equals 1 if the individual commits murder during period t, and 0 otherwise; P denotes the individual’s subjective probability; a, c, and e denote apprehension, conviction, and execution, respectively; Z contains individual-specific economic and demographic characteristics, as well as any other observable variable that may affect the individual’s choice; and u is a stochastic term that includes any other relevant variable unobserved by the investigator.7 Variables included in Z also capture the legitimate earning opportunities. The individual’s preferences affect the function f (·). Most studies of the deterrent hypothesis use either time-series or cross-sectional data to estimate the murder supply, based on equation (1). The data, however, are aggregated to state or national levels, so ψ is the murder rate for the chosen jurisdiction. The deterrent effect of capital punishment is then the partial derivative of ψ with respect to Pe | c. The debate in this literature revolves around the choice of the regressors in (1), endogeneity of one or more of these regressors, and to a lesser extent the choice of f(·).
3. Model specification and data In this section we first address two data-related specification issues that have not received due attention in the capital punishment literature. The first involves the functional form of the econometric equations, and the second concerns the allegedly adverse effect of including the nondeterrable murders in the analysis. These discussions shape the formulation of our model. Functional form Most econometric models that examine the deterrent effect of capital punishment derive the murder supply from equation (1). The first step involves choosing a functional form for the equation. Ideally, the functional form of the murder supply equation should be derived from the optimizing individual’s objective function. Since this ideal requirement cannot be met in practice, convenient alternatives are used instead. Despite all the emphasis that this literature places on specification issues such as variable selection and endogeneity, studies often choose the functional form of murder supply
rather haphazardly.8 Common choices are double-log, semilog, or linear functions. Rather than arbitrarily choosing one of these functional forms, we use the form that is consistent with aggregation rules. More specifically, note that equation (1) purports to describe the behavior of a representative individual. In practice, however, we rarely have individual-level data, and, in fact, the available data are usually substantially aggregated. Applying such data to an equation derived for a single individual implies that the equation is invariant under aggregation, and its extension to a group of individuals requires aggregation. For example, to obtain an equation describing the collective behavior of the members of a group (for instance, residents of a county, city, state, or country) one needs to add up the equations characterizing the behavior of each member. If the group has n members, then n equations, each with the same set of parameters and the same functional form but different variables, should be added up to obtain a single aggregate equation. This aggregate equation has the same functional form as the individual-level equation (that is, it is invariant under aggregation) only in the linear case. Because not every form has this invariance property, the choice of the functional form of the equation is important. For example, deterrence studies have applied the same double-log (or semilog) murder supply equation to city, state, and national level data, assuming implicitly that a double-log (or semilog) equation is invariant under aggregation. But this is not true, because the sum of n double-log equations would not be another double-log equation. A similar argument rules out the semilog specification. The linear form, however, remains invariant under aggregation. Assume that the individual’s murder supply equation (1) is linear in its variables,

$$\psi_{j,t} = a_i + \beta_1 Pa_{i,t} + \beta_2 Pc|a_{i,t} + \beta_3 Pe|c_{i,t} + g_1 Z_{j,t} + \gamma_2 TD_t + u_{j,t}, \tag{1$'$}$$

where j denotes the individual, i denotes county, ai is the county-specific fixed effect, TD is a set of time trend dummies that captures national trends, such as violent TV programming or movies that have similar cross-county effects, and the u’s are stochastic error terms with a zero mean and variance σ2. Assume there are ni individuals in county i, indexed by j = 1, 2, . . ., ni, with i = 1, 2, . . ., N, where N is the total number of counties in the U.S. Note that probabilities have an i rather than a j subscript because only individuals in the same county face the same probability of arrest, conviction, or execution. Summing equation (1′) over all ni individuals in county i and dividing by the number of these individuals (county population) results in an aggregate equation at the county level for period t. For example,

$$m_{i,t} = \sum_{j=1}^{n_i} \frac{\psi_{j,t}}{n_i} = a_i + \beta_1 Pa_{i,t} + \beta_2 Pc|a_{i,t} + \beta_3 Pe|c_{i,t} + g_1 Z_{i,t} + \gamma_2 TD_t + u_{i,t}, \tag{2}$$

where mi is the murder rate for county i (number of capital murders divided by county population). The above averaging does not change the Pi, but it alters the qualitative elements of Z into percentages and the level elements into per capita measures.9 The subscript i obviously indicates that these values are for county i. Also, note that the new error term, $u_{i,t} = \sum_{j=1}^{n_i} u_{j,t}/n_i$, is heteroskedastic, because its variance σ2/ni is inversely proportional to county population. The standard correction for the resulting heteroskedasticity is to use weighted estimation, where the weights are the square roots of county population, ni. Such linear correction for heteroskedasticity is routinely used by practitioners even in double-log or semilog equations. Given the above discussion, we use a linear model.10 Ehrlich (1996) and Cameron (1994) indicate that research using a linear specification is less likely than a logarithmic specification to find a deterrent effect. This makes our results more conservative in rejecting the “no deterrence” hypothesis.
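The two points of this subsection, aggregation invariance of the linear form and the population-dependent variance of the county-level error, can be illustrated numerically. The sketch below is hypothetical throughout: the parameter values, the individual characteristics `z`, and the function names are made up for illustration and are not taken from the chapter’s data.

```python
import math
import random
import statistics

def linear(z, a=1.0, b=2.0):
    """Hypothetical individual-level linear rule: psi = a + b*z."""
    return a + b * z

def double_log(z, a=1.0, b=2.0):
    """Hypothetical individual-level double-log rule: log(psi) = a + b*log(z)."""
    return math.exp(a + b * math.log(z))

z = [1.0, 2.0, 3.0, 6.0]       # made-up characteristics of four individuals
z_bar = statistics.fmean(z)    # the county-level average of z

# Linear case: averaging individual outcomes gives the same function of the average.
linear_gap = statistics.fmean([linear(v) for v in z]) - linear(z_bar)

# Double-log case: the averaged outcomes do not follow the same functional form.
loglog_gap = statistics.fmean([double_log(v) for v in z]) - double_log(z_bar)

def county_error_variance(n_i, sigma=1.0, draws=2000, seed=0):
    """Simulated variance of the county error, i.e. the mean of n_i individual errors."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n_i)])
        for _ in range(draws)
    ]
    return statistics.pvariance(means)

var_small = county_error_variance(10)   # roughly sigma^2 / 10
var_large = county_error_variance(40)   # roughly sigma^2 / 40

# Scaling the error by sqrt(n_i) multiplies its variance by n_i,
# which brings both counties back to roughly sigma^2.
weighted_small = 10 * var_small
weighted_large = 40 * var_large

print(linear_gap, loglog_gap)
print(var_small, var_large, weighted_small, weighted_large)
```

The linear gap is zero while the double-log gap is not, and the simulated error variance shrinks as county population grows until the square-root weighting is applied.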
Nondeterrable murders Critics of the economic model of murder have argued that, because the model cannot explain the nonpremeditated murders, its application to overall murder rate is inappropriate. For example, Glaser (1977) claims that murders committed during interpersonal disputes or noncontemplated crimes of passion are not intentionally committed and are therefore nondeterrable and should be subtracted out. Because the crime data include all murders, without a detailed classification, any attempt to exclude the allegedly nondeterrable crimes requires a detailed examination of each reported murder and a judgment as to whether that murder can be labeled deterrable or nondeterrable. Such expansive data scrutiny is virtually impossible. Moreover, it would require an investigator to use subjective judgment, which would then raise concerns about the objectivity of the analysis. We examine this seemingly problematic issue and offer an econometric response to the criticisms. The response applies equally to the concerns about including nonnegligent manslaughter (another possible nondeterrable crime) in the murder rate.11 Assume equation (2) specifies the variables that affect the rate of the deterrable capital murders, m. Some of the nondeterrable murders would be related to economic and demographic factors or other variables in Z. For example, family disputes leading to a nonpremeditated murder may be more likely to occur at times of economic hardship. We denote the rate of such murders by m′ and accordingly specify the related equation

$$m'_{i,t} = \alpha'_i + \gamma'_1 Z_{i,t} + u'_{i,t}, \tag{2$'$}$$

where u′ is a stochastic term and α′ and γ′ are unknown parameters. Other nondeterrable murders are not related to any of the explanatory variables in equation (2). From the econometricians’ viewpoint, therefore, such murders appear as merely random acts. They include accidental murders and murders committed by the mentally ill. We denote these by m″ and accordingly specify the related equation

$$m''_{i,t} = \alpha''_i + u''_{i,t}, \tag{2$''$}$$

where u″ is a stochastic term and α″ is an unknown parameter. The overall murder rate is then M = m + m′ + m″, which upon substitution for m′ and m″ yields

$$M_{i,t} = \alpha_i + \beta_1 Pa_{i,t} + \beta_2 Pc|a_{i,t} + \beta_3 Pe|c_{i,t} + \gamma_1 Z_{i,t} + \gamma_2 TD_t + \varepsilon_{i,t}, \tag{3}$$

where αi = ai + α′i + α″i, γ1 = g1 + γ′1, and εi,t = ui,t + u′i,t + u″i,t is the compound stochastic term.12 Note that we cannot estimate g1, in equation (2), or γ′1, in equation (2′), separately, because data on separate murder categories are not readily available. This, however, does not prevent us from estimating the combined effect γ1, nor does it affect our main inference, which is about the βs.13 Therefore, any inference about the deterrent effect is unaffected by the inclusion of the nondeterrable murders in the murder rate. Econometric model The murder supply equation (3) provides the basis for our inference. The three subjective probabilities in this equation are endogenous and must be estimated through separate equations. Endogeneity in this literature is often dealt with through the use of an arbitrarily chosen set of instrumental variables. Hoenack and Weiler (1980) criticize earlier studies both for this practice and for not treating the estimated equations as part of a theory-based system of simultaneous equations. We draw on the economic model of crime and the existing capital punishment literature to identify a system of simultaneous equations. We specify three equations to characterize the subjective probabilities in equation (3). These equations capture the activities of the law enforcement agencies and the criminal justice system in apprehending, convicting, and punishing perpetrators. Resources allocated to the respective agencies for this purpose affect their effectiveness and thus enter these equations:

$$Pa_{i,t} = \phi_{1,i} + \phi_2 M_{i,t} + \phi_3 PE_{i,t} + \phi_4 TD_t + \varsigma_{i,t}, \tag{4}$$

$$Pc|a_{i,t} = \theta_{1,i} + \theta_2 M_{i,t} + \theta_3 JE_{i,t} + \theta_4 PI_{i,t} + \theta_5 PA_{i,t} + \theta_6 TD_t + \xi_{i,t}, \tag{5}$$

and

$$Pe|c_{i,t} = \psi_{1,i} + \psi_2 M_{i,t} + \psi_3 JE_{i,t} + \psi_4 PI_{i,t} + \psi_5 TD_t + \zeta_{i,t}, \tag{6}$$

where PE is police payroll expenditure, JE is expenditure on the judicial and legal system, PI is partisan influence as measured by the Republican presidential candidate’s percentage of the statewide vote in the most recent election, PA is prison admission, TD is a set of time dummies that capture national trends in these perceived probabilities, and ς, ξ, and ζ are error terms. If police and prosecutors attempt to minimize the social costs of crime, they must balance the marginal costs of enforcement with the marginal benefits of crime prevention. Police and judicial-legal expenditure, PE and JE, represent marginal costs of enforcement. More expenditure should increase the productivity of law enforcement or increase the probabilities of arrest, and of conviction, given arrest. Partisan influence is used to capture any political pressure to “get tough” with criminals, a message popular with Republican candidates. The influence is exerted by changing the makeup of the court system, such as the appointment of new judges or prosecutors who are “tough on crime.” This affects the justice system and is, therefore, included in equations (5) and (6). Prison admission is a proxy for the existing burden on the justice system; the burden may affect judicial outcomes. This variable is defined as the number of new court commitments admitted during each year.14 Also, note that all three equations include county fixed effects to capture the unobservable heterogeneity across counties. We use two other crime categories besides murder in our system of equations. These are aggravated assault and robbery, which are among the control variables in Z. Given that some murders are the byproducts of violent activities, such as aggravated assault and robbery, we include these two crime rates in Z when estimating equation (3). 
Forst, Filatov, and Klein (1978) and McKee and Sesnowitz (1977) find that the deterrent effect vanishes when other crime rates are added to the murder supply equation. They attribute this to a shift in the propensity to commit crime, which in turn shifts the supply function. We include aggravated assault and robbery to examine this substitution effect. The other control variables that we include in Z measure economic and demographic influences. We include economic and demographic variables, which are all available at the county level, following other studies based on the economic model of crime.15 Economic variables are used as proxy for legitimate and illegitimate earning opportunities. An increase in legitimate earning opportunities increases the opportunity cost of committing crime and should result in a decrease in the crime rate. An increase in illegitimate earning opportunities increases the expected benefits of committing crime and should result in an increase in the crime rate. Economic variables are real per capita personal income, real per capita unemployment insurance payments, and real per capita income maintenance payments. The income variable measures both the labor market prospects of potential criminals and the amount of wealth available to steal. The unemployment payments variable is a proxy for overall labor market conditions and the availability of legitimate jobs for potential criminals. The transfer payments variable
represents other nonmarket income earned by poor or unemployed people. Other studies have found that crime responds to measures of both income and unemployment but that the effect of income on crime is stronger. Demographic variables include population density and six gender and race segments of the population ages 10–29 (male or female; black, white or other). Population density is included to capture any relationship between drug activities in inner cities and the murder rate. The age, gender, and race variables represent the possible differential treatment of certain segments of the population by the justice system, changes in the opportunity cost of time through the life cycle, and gender- or race-based differences in earning opportunities. The control variables also include the state level National Rifle Association (NRA) membership rate. NRA membership is included in response to a criticism of earlier studies. Forst, Filatov, and Klein (1978) and Kleck (1979) criticize both Ehrlich and Layson for not including a gun-ownership variable. Kleck reports that including the gun variable eliminates the significance of the execution rate. Also, all equations include a set of time dummies that capture national trends and influences affecting all counties but varying over time. Data and estimation method We use a panel data set that covers 3,054 counties for the 1977–96 period.16 More current data are not available on some of our variables, because of the lag in posting data on law enforcement and judicial expenditures by the Bureau of Justice Statistics. The county level data allow us to include county-specific characteristics in our analysis and therefore reduce the aggregation problem from which much of the literature suffers. By controlling for these characteristics, we can better isolate the effect of punishment policy. 
Moreover, panel data allow us to overcome the unobservable-heterogeneity problem that affects cross-sectional studies. Neglecting heterogeneity can lead to biased estimates. We use the time dimension of the data to estimate county fixed effects and condition our two-stage estimation on these effects. This is equivalent to using county dummies to control for unobservable variables that differ among counties. This way we control for the unobservable heterogeneity that arises from county-specific attributes, such as attitudes towards crime, or crime reporting practices. These attributes may be correlated with the justice system variables (or other exogenous variables of the model) giving rise to endogeneity and biased estimation. An advantage of the data set is its resilience to common panel problems, such as self-selectivity, nonresponse, attrition, or sampling design shortfalls. We have county level data for murder arrests, which we use to estimate Pa. Conviction data are not available, however, because the Bureau of Justice Statistics stopped collecting them years ago. In the absence of conviction data, sentencing is a viable alternative that covers the intervening stage
between arrest and execution. This variable has not been used in previous studies, although authors have suggested its use in deterrence studies (see, e.g., Cameron, 1994, p. 210). We have obtained data from the Bureau of Justice Statistics on the number of persons sentenced to be executed by state for each year. We use these data and arrest data to estimate Pc | a. We also use sentencing and execution data to estimate Pe | c. Execution data are at the state level because execution is a state decision. Expenditure variables in equations (4)–(6) are also at the state level. The crime and arrest rates are from the Federal Bureau of Investigation’s (FBI) Uniform Crime Reports.17 The data on age, sex, and racial distributions, percentage of state population voting Republican in the most recent presidential election, and the area in square miles for each county are from the U.S. Bureau of the Census. Data on income, unemployment, income maintenance, and retirement payments are obtained from the Regional Economic Information System. Data on expenditure on police and judicial-legal systems, number of executions, number of death row sentences, prison populations, and prison admissions are obtained from the U.S. Department of Justice’s Bureau of Justice Statistics. NRA membership rates are obtained from the National Rifle Association. The model we estimate consists of the simultaneous system of equations (3)–(6). We use the method of two-stage least squares, weighted to correct for the heteroskedasticity discussed earlier. We choose two-stage over three-stage least squares because, though the latter has an efficiency advantage, it produces inconsistent estimates if an incorrect exclusionary restriction is placed on any of the system equations. Since we are mainly interested in one equation, the murder supply equation (3), using the three-stage least squares method seems risky. 
Moreover, the two-stage least squares estimators are shown to be more robust to various specification problems (see, e.g., Kennedy, 1992, chap. 10). Other variables and data are discussed next.
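The logic of two-stage least squares can be illustrated with a deliberately small example. The sketch below is not the chapter’s estimator: it uses one endogenous regressor, one instrument, no weights, and no fixed effects, and all numbers are made up. Here `x` plays the role of an endogenous probability, `z` an excluded instrument (for instance an expenditure variable), `c` an unobserved confounder, and `y` the murder rate, with a true effect of x on y set to −2.

```python
def mean(values):
    return sum(values) / len(values)

def ols_fit(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def two_stage_least_squares(z, x, y):
    """2SLS slope of y on x, using z as the single instrument for x."""
    # Stage 1: predict the endogenous regressor from the instrument.
    a1, b1 = ols_fit(z, x)
    x_hat = [a1 + b1 * v for v in z]
    # Stage 2: regress the outcome on the predicted regressor.
    _, slope = ols_fit(x_hat, y)
    return slope

# Hypothetical data: the instrument z is uncorrelated with the confounder c.
z = [-1, 1, -1, 1]
c = [-1, -1, 1, 1]
x = [zi + ci for zi, ci in zip(z, c)]            # x depends on both z and c
y = [-2 * xi + 3 * ci for xi, ci in zip(x, c)]   # true effect of x is -2

_, ols_slope = ols_fit(x, y)
tsls_slope = two_stage_least_squares(z, x, y)

print(ols_slope)    # biased toward zero because c is omitted
print(tsls_slope)   # recovers the true effect in this constructed example
```

In this constructed case ordinary least squares gives −0.5, while the two-stage estimate equals the true value of −2, because the instrument isolates the variation in x that is unrelated to the omitted factor.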
4. Empirical results Regression results The coefficient estimates for the murder supply equation (3) obtained with the two-stage least squares method and controlling for county level fixed effects are reported in Tables 17.3 and 17.4. The models reported in Tables 17.3 and 17.4 differ in the way the perceived probabilities of arrest, sentencing, and execution are measured. These three probabilities are endogenous to the murder supply equation (3); the tables present the coefficients on the predicted values of these probabilities. We first describe Table 17.3. For Model 1 in Table 17.3 the conditional execution probability is measured by executions at t divided by number of death sentences at t − 6. For Model 2 this probability is measured by number of executions at t + 6
Table 17.3 Two-stage least squares regression results for murder rate (Models 1–3)

Estimated coefficients (absolute t-statistics in parentheses)

Regressor                                            Model 1              Model 2              Model 3
Deterrent Variable
  Probability of arrest                              −4.037 (6.941)**     −10.096 (17.331)**   −3.334 (6.418)**
  Conditional probability of death sentence          −21.841 (1.167)      −42.411 (3.022)**    −32.115 (1.974)**
  Conditional probability of execution               −5.170 (6.324)**     −2.888 (6.094)**     −7.396 (10.285)**
Other Crime
  Aggravated assault rate                            0.0040 (18.038)**    0.0059 (23.665)**    0.0049 (22.571)**
  Robbery rate                                       0.0170 (39.099)**    0.0202 (51.712)**    0.0188 (49.506)**
Economic Variable
  Real per capita personal income                    0.0005 (14.686)**    0.0007 (17.134)**    0.0006 (16.276)**
  Real per capita unemployment insurance payments    −0.0064 (6.798)**    −0.0077 (8.513)**    −0.0033 (3.736)**
  Real per capita income maintenance payments        0.0011 (1.042)       −0.0020 (1.689)*     0.0024 (2.330)**
Demographic Variable
  African American (%)                               0.0854 (2.996)**     −0.1114 (4.085)**    0.1852 (6.081)**
  Minority other than African American (%)           −0.0382 (7.356)**    0.0255 (0.7627)      −0.0224 (4.609)**
  Male (%)                                           0.3929 (7.195)**     0.2971 (3.463)**     0.2934 (5.328)**
  Age 10–19 (%)                                      −0.2717 (4.841)**    −0.4849 (8.021)**    0.0259 (0.4451)
  Age 20–29 (%)                                      −0.1549 (3.280)**    −0.6045 (12.315)**   −0.0489 (0.9958)
  Population density                                 −0.0048 (22.036)**   −0.0066 (24.382)**   −0.0036 (17.543)**
NRA membership rate (% state pop. in NRA)            0.0003 (1.052)       0.0004 (1.326)       −0.0002 (0.6955)
Intercept                                            6.393 (0.4919)       23.639 (6.933)**     −12.564 (0.9944)
F-statistic                                          217.90               496.29               276.46
Adjusted R²                                          0.8476               0.8428               0.8624

Notes: Dependent variable is the murder rate (murders/100,000 population). In Model 1 the execution probability is (number of executions at t)/(number of death row sentences at t – 6). In Model 2 the execution probability is (number of executions at t + 6)/(number of death row sentences at t). In Model 3 the execution probability is (sum of executions at t + 2 + t + 1 + t + t − 1 + t − 2 + t − 3)/(sum of death row sentences at t − 4 + t − 5 + t − 6 + t − 7 + t − 8 + t − 9). Sentencing probabilities are computed accordingly, but with a two-year displacement lag and a two-year averaging rule. Absolute value of t-statistics are in parentheses. The estimated coefficients for year and county dummies are not shown. * Significant at the 90% confidence level, two-tailed test. ** Significant at the 95% confidence level, two-tailed test.
Table 17.4 Two-stage least squares regression results for murder rate (Models 4–6)

Estimated coefficients (absolute t-statistics in parentheses)

Regressor                                            Model 4              Model 5              Model 6
Deterrent Variable
  Probability of arrest                              −2.264 (4.482)**     −4.417 (9.830)**     −2.184 (4.568)**
  Conditional probability of death sentence          −3.597 (0.2475)      −47.661 (4.564)**    −10.747 (0.8184)
  Conditional probability of execution               −2.715 (4.389)**     −5.201 (19.495)**    −4.781 (8.546)**
Other Crime
  Aggravated assault rate                            0.0053 (29.961)**    0.0086 (47.284)**    0.0064 (35.403)**
  Robbery rate                                       0.0110 (35.048)**    0.0150 (54.714)**    0.0116 (41.162)**
Economic Variable
  Real per capita personal income                    0.005 (20.220)**     0.0004 (14.784)**    0.0005 (19.190)**
  Real per capita unemployment insurance payments    −0.0043 (5.739)**    −0.0054 (7.317)**    −0.0038 (5.080)**
  Real per capita income maintenance payments        0.0043 (5.743)**     0.0002 (0.2798)      0.0027 (3.479)**
Demographic Variable
  African American (%)                               0.1945 (9.261)**     0.0959 (4.956)**     0.1867 (7.840)**
  Minority other than African American (%)           −0.0338 (7.864)**    −0.0422 (9.163)**    −0.0237 (5.536)**
  Male (%)                                           0.2652 (6.301)**     0.3808 (8.600)**     0.2199 (4.976)**
  Age 10–19 (%)                                      −0.2096 (5.215)**    −0.6516 (15.665)**   −0.1629 (3.676)**
  Age 20–29 (%)                                      −0.1315 (3.741)**    −0.5476 (15.633)**   −0.1486 (3.971)**
  Population density                                 −0.0044 (30.187)**   −0.0041 (27.395)**   −0.0046 (30.587)**
NRA membership rate (% state pop. in NRA)            0.0008 (3.423)**     0.0006 (3.308)**     0.0008 (3.379)**
Intercept                                            10.327 (0.8757)      17.035 (8.706)**     10.224 (1.431)
F-statistic                                          280.88               561.93               323.89
Adjusted R²                                          0.8256               0.8062               0.8269

Notes: Dependent variable is the murder rate (murders/100,000 population). In Model 4 the execution probability is (number of executions at t)/(number of death row sentences at t − 6). In Model 5 the execution probability is (number of executions at t + 6)/(number of death row sentences at t). In Model 6 the execution probability is (sum of executions in years t + 2, t + 1, t, t − 1, t − 2, and t − 3)/(sum of death row sentences in years t − 4, t − 5, t − 6, t − 7, t − 8, and t − 9). Sentencing probabilities are computed accordingly, but with a two-year displacement lag and a two-year averaging rule. Absolute values of t-statistics are in parentheses. The estimated coefficients for year and county dummies are not shown. * Significant at the 90% confidence level, two-tailed test. ** Significant at the 95% confidence level, two-tailed test.
divided by number of death sentences at t. The two ratios reflect forward-looking and backward-looking expectations, respectively. The displacement lag of six years reflects the lengthy waiting time between sentencing and execution, which averages six years for the period we study (see Bedau, 1997, chap. 1). For probability of sentencing, given arrest, we use a two-year lag displacement, reflecting an estimated two-year lag between arrest and sentencing. Therefore, the conditional sentencing probability for Model 1 is measured by the number of death sentences at t divided by the number of arrests for murder at t − 2. For Model 2 this probability is measured by the number of death sentences at t + 2 divided by the number of arrests for murder at t. Given the absence of an arrest lag, no lag displacement is used to measure the arrest probability. It is simply the number of murder-related arrests at t divided by the number of murders at t.

For Model 3 in Table 17.3 we use an averaging rule. We use a six-year moving average to measure the conditional probability of execution, given a death sentence. Specifically, this probability at time t is defined as the sum of executions during (t + 2, t + 1, t, t − 1, t − 2, and t − 3) divided by the sum of death sentences issued during (t − 4, t − 5, t − 6, t − 7, t − 8, and t − 9). The six-year window length and the six-year displacement lag capture the average time from sentence to execution for our sample. Similarly, a two-year lag and a two-year window length are used to measure the conditional death sentencing probabilities. Given the absence of an arrest lag, no averaging or lag displacement is used when arrest probabilities are computed.18 Strictly speaking, these measures are not the true probabilities. However, they are closer to the probabilities as viewed by potential murderers than would be the “correct” measures.
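The lag and averaging rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation code used for the tables: the function names, the year-to-count dictionary layout, and the numbers in the usage lines are hypothetical, and a year missing from a series is treated as a zero count purely for convenience. A ratio with a zero denominator is returned as None, that is, as undefined.

```python
from typing import Dict, Optional

Counts = Dict[int, int]  # hypothetical layout: year -> count for one state


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def total(series: Counts, first_year: int, last_year: int) -> int:
    """Sum counts over first_year..last_year inclusive (missing years count as 0)."""
    return sum(series.get(year, 0) for year in range(first_year, last_year + 1))


def execution_prob_model1(executions: Counts, sentences: Counts, t: int) -> Optional[float]:
    """Executions at t divided by death sentences at t - 6."""
    return ratio(executions.get(t, 0), sentences.get(t - 6, 0))


def execution_prob_model2(executions: Counts, sentences: Counts, t: int) -> Optional[float]:
    """Executions at t + 6 divided by death sentences at t."""
    return ratio(executions.get(t + 6, 0), sentences.get(t, 0))


def execution_prob_model3(executions: Counts, sentences: Counts, t: int) -> Optional[float]:
    """Executions in t-3..t+2 divided by death sentences in t-9..t-4."""
    return ratio(total(executions, t - 3, t + 2), total(sentences, t - 9, t - 4))


def sentencing_prob_model1(sentences: Counts, murder_arrests: Counts, t: int) -> Optional[float]:
    """Death sentences at t divided by murder arrests at t - 2."""
    return ratio(sentences.get(t, 0), murder_arrests.get(t - 2, 0))


def arrest_prob(murder_arrests: Counts, murders: Counts, t: int) -> Optional[float]:
    """Murder-related arrests at t divided by murders at t (no lag)."""
    return ratio(murder_arrests.get(t, 0), murders.get(t, 0))


if __name__ == "__main__":
    # Made-up series for one state, 1980-2000 (not data from the chapter).
    executions = {year: year - 1980 for year in range(1980, 2001)}
    sentences = {year: 20 for year in range(1980, 2001)}
    print(execution_prob_model1(executions, sentences, 1990))
    print(execution_prob_model2(executions, sentences, 1990))
    print(execution_prob_model3(executions, sentences, 1990))
```

Because the numerator and denominator come from different years, such a ratio can exceed one, which is one concrete sense in which these measures are not true probabilities.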
Our formulation is consistent with Sah’s (1991) argument that criminals form perceptions based on observations of friends and acquaintances. We draw on the capital punishment literature to parameterize these perceived probabilities.

Models 4, 5, and 6 in Table 17.4 are, respectively, similar to Models 1, 2, and 3 in Table 17.3, except for the way we treat undefined probabilities. When estimating the models reported in Table 17.3, we observed that in several years some counties had no murders and some states had no death sentences. This rendered some probabilities undefined because of a zero denominator. Estimates in Table 17.3 are obtained excluding these observations. Alternatively, and to avoid losing data points, for any observation (county/year) in which the probabilities of arrest or execution are undefined because of this problem, we substituted the relevant probability from the most recent year in which the probability was defined. We look back up to four years, because in most cases this eradicates the problem of undefined probabilities. The assumption underlying such substitution is that criminals will use the most recent information available in forming their expectations. So a person contemplating committing a crime at time t will not assume that he will not be arrested if no crime has been committed, and hence no arrest has been made, during this period. Rather, he will form an impression of the arrest
odds, an impression based on arrests in recent years. This is consistent with Sah’s (1991) argument. Table 17.4 uses this substitution rule to compute probabilities when they are undefined.19

Results in Tables 17.3 and 17.4 suggest the presence of a strong deterrent effect.20 The estimated coefficient of the execution probability is negative and highly significant in all six models. This suggests that an increase in perceived probability of execution, given that one is sentenced to death, will lead to a lower murder rate.21 The estimated coefficient of the arrest probability is also negative and highly significant in all six models. This finding is consistent with the proposition set forth by the economic models of crime, which suggests that an increase in the perceived probability of apprehension leads to a lower crime rate.

For the sentencing probability the estimated coefficients are negative in all models and significant in three of the six models. It is not surprising that sentencing has a weaker deterrent effect, given that we are estimating the effect of sentencing, holding the execution probability constant. What we capture here is a measure of the “weakness” or “porosity” of the state’s criminal justice system. The coefficient of the sentencing probability picks up not only the ordinary deterrent effect, but also the porosity signal. The latter effect may, indeed, be stronger. For example, if criminals know that the justice system issues many death sentences but the executions are not carried out, then they may not be deterred by an increase in probability of a death sentence. In fact, an unpublished study by Leibman, Fagan, and West reports that nearly 70 percent of all death sentences issued between 1973 and 1995 were reversed on appeal at the state or federal level. Also, six states sentence offenders to death but have performed no executions.
This reveals the indeterminacy of a death sentence and its ineffectiveness when it is not carried out. Such indeterminacy affects the deterrence of a death sentence. The murder rate appears to increase with aggravated assault and robbery, as the estimated coefficients for these two variables are positive and highly significant in all cases. This is in part because these crimes are caused by the same factors that lead to murder, so measures of these crimes serve as additional controls. In addition, this reflects the fact that some murders are the byproduct of robbery or aggravated assault. In fact, several studies have documented that increasing proportions of homicides are the outcome of robbery (see, e.g., Zimring, 1977). Additional demographic variables are included primarily as controls, and we have no strong theoretical predictions about their signs. Estimated coefficients for per capita income are positive and significant in all cases. This may reflect the role of illegal drugs in homicides during this time period. Drug consumption is expensive and may increase with income. Those in the drug business are disproportionately involved in homicides because the business generates large amounts of cash, which can lead to robberies, and because normal methods of dispute resolution are not available. An increase
in per capita unemployment insurance payments is generally associated with a lower murder rate. Other demographic variables are often significant. A larger number of males in a county is associated with a higher murder rate, as is generally found (e.g., Daly and Wilson, 1988). An increase in percentage of the teenage population, on the other hand, appears to lower the murder rate. The fraction of the population that is African American is generally associated with higher murder rates, and the percentage that is minority other than African American is generally associated with a lower rate.

The estimated coefficient of population density has a negative sign. One might have expected a positive coefficient for this variable; murder rates are higher in large cities. However, this may not be a consistent relationship: the murder rate can be lower in suburbs than it is in rural areas, although rural areas are less densely populated than suburbs. But the murder rate may be higher in inner cities where the density is higher than in the suburbs.22 Glaeser and Sacerdote (1999) also report that crime rates are higher for cities with 25,000 to 99,000 persons than for cities with 100,000 to 999,999 persons and then higher for cities over one million, although not as high as for the smaller cities (Glaeser and Sacerdote, 1999, figure 3). Because there are relatively few counties containing cities of over one million, our measure of density may be picking up this nonlinear relationship. Glaeser and Sacerdote explain the generally higher crime rate in cities as a function of higher returns, lower probabilities of arrest and conviction, and the presence of more female-headed households.

Finally, the estimates of the coefficient of the NRA membership variable are positive in five of the six models and significant in half of the cases.
A possible justification is that in counties with a large NRA membership guns are more accessible and can therefore serve as the weapon of choice in violent confrontations. The resulting increase in gun use, in turn, may lead to a higher murder rate.23 The most robust findings in these tables are as follows: The arrest, sentencing, and execution measures all have a negative effect on murder rate, suggesting a strong deterrent effect as the theory predicts. Other violent crimes tend to increase murder. The demographic variables have mixed effects; murder seems to increase with the proportion of the male population. Finally, the NRA membership variable has positive and significant estimated coefficients in all cases, suggesting a higher murder rate in counties with a strong NRA presence. We do not report estimates of the coefficients of the other equations in the system (equations [4]–[6]), because we are mainly interested in equation (3), which allows direct inference about the deterrent effect. Nevertheless, the first-stage regressions do produce some interesting results. Expenditure on the police and judicial-legal system appears to increase the productivity of law enforcement. Police expenditure has a consistently positive effect on the probability of arrest (equation [4]); expenditure on the judicial-legal system has a positive and significant effect on the conditional probability of
receiving a death penalty sentence in all six models of equation (5). The partisan-influence variable also has a consistently positive and significant impact on the probability of receiving a death sentence (equation [5]). This result indicates that the more Republican the state, the more common the death row sentences. The partisan-influence variable has a consistently positive and significant impact on the conditional probability of execution in equation (6). This suggests that the more Republican the state, the more likely the executions. The expenditure on the judicial-legal system has a negative and significant effect on the conditional probability of execution in all six models (equation [6]). This result implies that more spending on appeals and public defenders results in fewer executions.

Effect of tough sentencing laws

One may argue that the documented deterrent effect reflects the overall toughness of the judicial practices in the executing states. For example, these states may have tougher sentencing laws that serve as a deterrent to various crimes, including murder. To examine this argument, we constructed a new variable measuring “judicial toughness” for each state, and estimated the correlation between this variable and the execution variable.24 The estimated correlation coefficient ranges from −.06 to .26 for the six measures of the conditional probability of execution that we have used in our regression analysis. The estimated correlation between the toughness variable and the binary variable that indicates whether or not a state has a capital punishment law in any given year is .28. We also added the toughness variable to equation (3), our main regression equation, to see whether its inclusion alters our results. The inclusion of the toughness variable did not change the significance or sign of the estimated execution coefficient. Moreover, the toughness variable has an insignificant coefficient estimate in four of the six regressions.
The low correlation between execution probability and the toughness variable, along with the observed robustness of our results to inclusion of the toughness variable, suggests that the deterrent finding is driven by executions and not by tougher sentencing laws.

Magnitude of the deterrent effect

The statistical significance of the deterrent coefficients suggests that executions reduce the murder rate. But how strong is the expected tradeoff between executions and murders? In other words, how many potential victims can be saved by executing an offender?25 Neither aggregate time-series nor cross-sectional analyses can provide a meaningful answer to this question. Aggregate time-series data, for example, cannot impose the restriction that execution laws be state specific, and any deterrent effect should be restricted to the executing state. Cross-sectional studies, on the other hand, capture the
effect of capital punishment through a binary dummy variable that measures an overall effect of the capital punishment laws instead of a marginal effect. Panel data econometrics provides the appropriate framework for a meaningful inference about the tradeoff. Here an execution in one state is modeled to affect the murders in the same state only. Moreover, the panel allows estimation of a marginal effect rather than an overall effect.

To estimate the expected tradeoff between executions and murder, we can use estimates of the execution deterrent coefficient $\hat{\beta}_3$ as reported in Tables 17.3 and 17.4. We focus on Model 4 in Table 17.4, which offers the most conservative (smallest) estimate of this coefficient. The coefficient $\beta_3$ is the partial derivative of murder per 100,000 population with respect to the conditional probability of execution, given sentencing (e.g., the number of executions at time t divided by the number of death sentences issued at time t − 6). Given the measurement of these variables, the number of potential lives saved as the result of one execution can be estimated by the quantity $\beta_3 \, (\mathrm{POPULATION}_t / 100{,}000) \, (1 / S_{t-6})$, where S is the number of individuals sentenced to death. We evaluate this quantity for the United States, using the $\beta_3$ estimate from Model 4 and t = 1996, the most recent period that our sample covers. The resulting estimate is 18, with a margin of error of 10 and therefore a corresponding 95 percent confidence interval (8–28).26 This implies that each additional execution has resulted, on average, in eighteen fewer murders, or in at least eight fewer murders. Also, note that population appears in the above expression because the murder data used to estimate $\beta_3$ are on a per capita basis. In calculating the tradeoff estimate, therefore, we use the population of the states with a death penalty law, since only residents of these states can be deterred by executions.
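The arithmetic behind the tradeoff can be sketched as follows. This is a minimal illustration under stated assumptions rather than a reproduction of the reported figure: the function name and every input value in the usage lines are hypothetical, and the standard error is passed in directly because the corrected standard errors behind the reported margin are not given in the text. Since the estimated coefficient is negative, the sketch reports its magnitude as lives saved; the interval follows the form described in note 26.

```python
from typing import Tuple


def lives_saved_per_execution(
    beta3: float,
    se_beta3: float,
    population: float,
    lagged_death_sentences: float,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) of lives saved per execution.

    beta3                  -- coefficient on the conditional execution probability
                              (negative when executions deter murder)
    se_beta3               -- standard error of beta3
    population             -- population of the death penalty states in year t
    lagged_death_sentences -- death sentences issued in year t - 6
    z                      -- normal critical value (1.96 for a 95 percent interval)
    """
    # One more execution raises the execution probability by 1/S_{t-6};
    # the murder rate is per 100,000, so rescale by population/100,000.
    scale = (population / 100_000) / lagged_death_sentences
    point = -beta3 * scale          # a negative coefficient means murders avoided
    margin = z * se_beta3 * scale   # half-width of the confidence interval
    return point, point - margin, point + margin


if __name__ == "__main__":
    # Purely hypothetical inputs, chosen for easy arithmetic.
    estimate, lower, upper = lives_saved_per_execution(
        beta3=-2.0, se_beta3=0.5, population=200_000_000, lagged_death_sentences=400
    )
    print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

The sketch makes the dependence on the denominator visible: for a fixed coefficient, the implied tradeoff is larger when fewer death sentences were issued six years earlier.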
Robustness of results

Although we believe that our econometric model is appropriate for estimating the deterrent effect of capital punishment, the reader may want to know how robust our results are. To provide such information, we examine the sensitivity of our main finding, that capital punishment has a deterrent effect on capital crimes, to the econometric choices we have made. In particular, we evaluate the robustness of our deterrence estimates to changes in aggregation level, functional form, sampling period, modeling death penalty laws, and endogenous treatment of the execution probability. For each specification, we estimate the same six models as described above. The results are reported in Table 17.5. Each row includes the estimated coefficient of the execution probability (and the corresponding t-statistics) for the six models.27 Results are in general quite similar to those reported for the main specification. For example, where we use state level data the estimated coefficient of the execution probability is negative and significant in five of the six models, suggesting a strong deterrent effect for executions. In the remaining case, Model 4, the coefficient estimate is insignificant.
Table 17.5 Estimates of the execution probability coefficient under various specifications (robustness check)

Specification                     Model 1              Model 2              Model 3              Model 4              Model 5              Model 6
State level data                  −5.343 (2.774)**     −2.257 (2.151)**     −6.271 (4.013)**     −1.717 (0.945)       −4.046 (6.486)**     −2.895 (1.867)*
Semilog                           −0.145 (1.449)       −0.191 (3.329)**     −0.218 (2.372)**     −0.142 (0.878)       −0.420 (6.518)**     −0.419 (2.902)**
Double-log                        −0.155 (3.242)**     −0.078 (2.987)**     −0.144 (6.283)**     −0.150 (1.871)*      −0.181 (3.903)**     −0.158 (3.818)**
1990s data                        −3.021 (3.250)**     0.204 (0.301)        −3.251 (3.733)**     −1.681 (2.182)**     −4.079 (4.200)**     −2.791 (3.633)**
Execution dummy added             −7.431 (9.821)**     −3.074 (6.426)**     −7.631 (11.269)**    −4.442 (7.143)**     −5.109 (19.564)**    −5.669 (9.922)**
Other crimes dropped              −0.088 (0.090)       −7.085 (11.471)**    −4.936 (5.686)**     −1.688 (2.394)**     −7.070 (22.282)**    −1.599 (2.531)**
Exogenous execution probability   −0.494 (2.888)**     −0.428 (3.236)**     −2.515 (8.284)**     −0.309 (2.464)**     −0.377 (5.102)**     −1.761 (7.562)**

Notes: Absolute values of t-statistics are in parentheses. The estimated coefficients for the other variables are available upon request. * Significant at the 90% confidence level, two-tailed test. ** Significant at the 95% confidence level, two-tailed test.
We also estimate our econometric model in double-log and semilog forms. These, along with the linear model, are the commonly used functional forms in this literature. For the semilog form, the estimated coefficient of the execution probability is negative in all six models and significant in four of the models. For the double-log form, it is negative and significant in all six models. These results suggest that our deterrence finding is not sensitive to the functional form of the model.

Given that the executions have accelerated in the 1990s, we think it worthwhile to examine the deterrent effect of capital punishment, using only the 1990s data. This will also get at a possible nonlinearity in the execution parameter. We, therefore, estimate Models 1–6, using only the 1990s data. The coefficient estimate for the execution probability is negative and significant for all models but Model 2, which has a positive but insignificant coefficient.

As an additional robustness check, we added to our linear model a dummy variable that identifies the states with capital punishment. This variable takes a value of 1 if the state has a death penalty law on the books in a given year, and 0 otherwise. This variable allows us to make a distinction between having a death penalty law and using it. The addition of this variable did not change the sign or the significance of the estimated coefficient of the execution probability. The estimated coefficient remains negative and significant in all six models. The estimated coefficient of the dummy variable, on the other hand, does not show any additional deterrence. This suggests that having a death penalty law on the books does not deter criminals when the law is not applied.

In addition, we estimate the models after dropping the crime rates of aggravated assault and robbery.
The coefficient for the conditional probability of execution is negative and significant in four of the models. In Model 1 the coefficient is negative and insignificant, and in Model 4 the coefficient is positive and significant. We also estimated all six models reported in Tables 17.3 and 17.4, assuming that the execution probability is exogenous. In all six cases the estimated coefficient of this variable turned out to be negative and significant, suggesting a strong deterrent effect.

The numerator of the murder rate, our dependent variable, is the number of murders, which also appears as the denominator of the arrest rate, one of the regressors, and is perhaps proportional to other probabilities that we use as regressors. To make certain that we are not observing a spurious negative correlation between these variables, we estimate the primary system of equations (3)–(6), using variables that are in levels. We use the number of murders in year t as the dependent variable and the number of executions, the number of death row sentences, and the number of arrests in year t as the deterrent variables. The estimated coefficient on the number of executions in this specification is −16.008 with a t-statistic of 25.440 (significant at the 95 percent confidence level), indicating deterrence and suggesting that our results are not artifacts of variable construction.
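The concern that motivates the levels specification can be illustrated with a small simulation. The sketch below is not part of the chapter's analysis: the data are synthetic, the function names and ranges are hypothetical, and murders and arrests are drawn independently so that any correlation between the constructed ratios is purely an artifact of the shared term.

```python
import random
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ratio_versus_levels(n: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Return (correlation of the ratio variables, correlation of the level variables)."""
    rng = random.Random(seed)
    population = 100_000  # hypothetical, identical for every synthetic county
    murders = [rng.randint(5, 50) for _ in range(n)]
    arrests = [rng.randint(1, 40) for _ in range(n)]  # independent of murders by construction
    # Murders enter the numerator of the murder rate ...
    murder_rate = [m / population * 100_000 for m in murders]
    # ... and the denominator of the arrest ratio.
    arrest_ratio = [a / m for a, m in zip(arrests, murders)]
    return pearson(murder_rate, arrest_ratio), pearson(murders, arrests)


if __name__ == "__main__":
    r_ratio, r_levels = ratio_versus_levels()
    print(f"ratio variables: {r_ratio:.2f}   level variables: {r_levels:.2f}")
```

In such synthetic data the ratio variables are negatively correlated even though the underlying counts are unrelated, while the level variables are not. That pattern is the kind of artifact a levels specification is meant to rule out.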
Overall, we estimate fifty-five models. Six models are reported in Tables 17.3 and 17.4, and forty-two models (seven specifications of six models each) in Table 17.5. One model is discussed in the previous paragraph, and six models are discussed in the section examining the effect of tough sentencing laws. The estimated coefficient of the execution probability is negative and significant in forty-nine of these models and negative but insignificant in four (see note 27). The above robustness checks suggest that our main finding that executions deter murders is not sensitive to various specification choices.
5. Concluding remarks

Does capital punishment deter capital crimes? The question remains of considerable interest. Both presidential candidates in the fall 2000 election were asked this question, and they both responded vigorously in the affirmative. In his pioneering work, Ehrlich (1975, 1977) applied a theory-based regression equation to test for the deterrent effect of capital punishment and reported a significant effect. Much of the econometric emphasis in the literature following Ehrlich’s work has been the specification of the murder supply equation. Important data limitations, however, have been acknowledged.

In this study we use a panel data set covering 3,054 counties over the period 1977–96 to examine the deterrent effect of capital punishment. The relatively low level of aggregation allows us to control for county-specific effects and also avoid problems of aggregate time-series studies. Using comprehensive postmoratorium evidence, our study offers results that are relevant for analyzing current crime levels and useful for policy purposes. Our study is timely because several states are currently considering either a moratorium on executions or new laws allowing execution of criminals. In fact, the absence of recent evidence on the effectiveness of capital punishment has prompted state legislatures in, for example, Nebraska to call for new studies on this issue.

We estimate a system of simultaneous equations in response to the criticism leveled at studies that use ad hoc instrumental variables. We use an aggregation rule to choose the functional form of the equations we estimate: linear models are invariant to aggregation and are therefore the most suited for our study. We also demonstrate that the inclusion of nondeterrable murders in the murder rate does not bias the deterrence inference. Our results suggest that the legal change allowing executions beginning in 1977 has been associated with significant reductions in homicide.
An increase in any of the three probabilities of arrest, sentencing, or execution tends to reduce the crime rate. Results are robust to specification of such probabilities. In particular, our most conservative estimate is that the execution of each offender seems to save, on average, the lives of eighteen potential victims. (This estimate has a margin of error of plus or minus ten.) Moreover, we find robbery and aggravated assault associated with increased murder rates. A higher NRA presence, measured by NRA membership rate, seems to have
a similar murder-increasing effect. Tests show that results are not driven by “tough” sentencing laws and are robust to various specification choices. Our main finding, that capital punishment has a deterrent effect, is robust to choice of functional form (double-log, semilog, or linear), state level versus county level analysis, sampling period, endogenous versus exogenous probabilities, and level versus ratio specification of the main variables. Overall, we estimate fifty-five models; the estimated coefficient of the execution probability is negative and significant in forty-nine of these models and negative but insignificant in four models.

Finally, a cautionary note is in order: deterrence reflects social benefits associated with the death penalty, but one should also weigh the corresponding social costs. These include the regret associated with the irreversible decision to execute an innocent person. Moreover, issues such as the possible unfairness of the justice system and discrimination must be considered when society makes a social decision regarding capital punishment. Nonetheless, our results indicate that there are substantial costs in deciding not to use capital punishment as a deterrent.
Acknowledgements

We gratefully acknowledge helpful discussions with Isaac Ehrlich and comments by Badi Baltagi, Robert Chirinko, Keith Hylton, David Mustard, George Shepherd, and participants in the 1999 Law and Economics Association Meetings, 2000 American Economics Association Meetings, and workshops at Emory University, Georgia State University, Northwestern University, and Purdue University. We are also indebted to an anonymous referee for valuable suggestions. The usual disclaimer applies.
Notes

1 In 1972 the Supreme Court imposed a moratorium on capital punishment, but in 1976 it ruled that executions under certain carefully specified circumstances are constitutional.
2 See Cameron (1994) and Avio (1998) for literature summaries.
3 Sensitivity analysis involves dividing the variables of the model into essential and doubtful and generating many estimates for the coefficient of each essential variable. The estimates are obtained from alternative specifications, each including some combination of the doubtful variables. See, e.g., Ehrlich and Liu (1999), Leamer (1983, 1985), McAleer and Veall (1989), and McManus (1985).
4 For example, an increase in nonexecuting states’ murder rates aggregated with a drop in executing states’ murder rates may incorrectly lead to an inference of no deterrence, because the aggregate data would show an increase in executions leading to no change in the murder rate.
5 Ehrlich’s regression equations are in double-log form.
6 These studies have not gone through the peer review process.
7 Note that engaging in violent activities such as robbery may lead an individual to murder. We account for this possibility in our econometric specification by including violent crime rates such as robbery in Z.
8 The only exceptions to this general observation are Hoenack and Weiler (1980), who criticize the use of a double-log formulation, suggesting a semilog form instead, and Layson (1985), who uses Box-Cox transformation as the basis for choosing functional form. Box-Cox transformation, however, is not appropriate for the simultaneous equations model estimated here with panel data.
9 For example, for the gender variable, an individual value is either 1 or 0. Adding the ones and dividing by county population gives us the percentage of residents who are male. Also, for the income variable, summing across individuals and dividing by county population simply yields per capita income for the county.
10 To examine the robustness of our results, we will also estimate the double-log and semilog forms of our model. These results will be discussed in section 4.
11 Ehrlich (1975) discusses the nonnegligent manslaughter issue.
12 Note that the equation describing $m'_{i,t}$ may also include a national trend term ($\gamma_2 TD_t$). The term will be absorbed into the coefficient of TD in equation (3).
13 The added noise due to compounding of errors may reduce the precision of estimation, but it does not affect the statistical consistency of the estimated parameters.
14 This does not include returns of parole violators, escapees, failed appeals, or transfers.
15 Inclusion of the unemployment rate, which is available only at the state level, does not affect the results appreciably.
16 We are thankful to John Lott and David Mustard for providing us with some of these data (from their 1997 study) to be used initially for a different study (Dezhbakhsh and Rubin, 1998). We also note that the data on murder-related arrests for Arizona in 1980 are missing. As a result, we have to exclude from our analysis Arizona in 1980 (or 1982 and 1983 in cases where lags were involved). This will be explained further when we discuss model estimation.
17 The FBI Uniform Crime Report Data are the best county-level crime data currently available, in spite of criticisms about potential measurement issues due to underreporting. These criticisms are generally not so strong for murder data that are central to our study. Nonetheless, there are safeguards in our econometric analysis to deal with the issue. The inclusion of county fixed effects eliminates the effects of time-invariant differences in reporting methods across counties, and estimates of trends in crime should be accurate so long as reporting methods are not correlated across counties or time. Moreover, one way to address the problem of underreporting is to use the logarithms of crime rates, which are usually proportional to true crime rates. Our general finding is robust to introduction of logs as discussed in section 4.
18 The absence of arrest data for Arizona in 1980, mentioned earlier, results in the exclusion of Arizona 1980 from estimation of all three models, Arizona 1982 from estimation of Models 2 and 3, and Arizona 1983 from estimation of Model 3.
19 For the states that have never had an execution, the conditional probability of execution takes a value of 0. For the states that have never sentenced anyone to death row, the conditional probability of a death row sentence takes a value of 0.
20 In all of our estimations we correct the residuals from the second-stage least squares to account for using predicted values rather than the actual arrest rates, death row sentencing rates, and execution rates in the estimation of the murder equation (Davidson and MacKinnon, 1993, chap. 7).
21 We also repeat the analysis, using as our dependent variable six other crimes: aggravated assault, robbery, rape, burglary, larceny, and auto theft. If executions were found to deter other crimes besides murder, it may be the case that some other omitted variable that is correlated with the number of executions is causing crime to drop across the board.
However, we find no evidence of this. Of the thirty-six models that we estimate (six crimes and six models per crime), only six exhibit a negative correlation between crime and the number of executions. These cases are spread across crimes with no consistency as to which crime decreases with executions.
22 To examine the possibility of a piecewise relationship, we used two interactive (0 or 1) dummy variables identifying the low and the high range for the density variable. The dummies were then interacted with the density variable. The estimated coefficients for Models 1–3 were negative for the low density range and positive for the high density range, suggesting that murder rate declines with an increase in population density for counties that are not too densely populated, but increases with density for denser areas. This exercise did not alter the sign or significance of other estimated coefficients. For Models 4–6, however, the interactive dummies both have a negative sign.
23 If the NRA membership variable is a good proxy for gun ownership, our results appear to contradict the finding that allowing concealed weapons deters violent crime (Lott and Mustard, 1997). However, the results may be consistent with theirs if the carrying of concealed weapons is negatively related to NRA membership. See also Dezhbakhsh and Rubin (1998), who find results much weaker than those of Lott and Mustard.
24 This variable takes values 0, 1, or 2, depending on whether a state has zero, one, or two tough sentencing laws at a given year. The tough sentencing laws we consider are (1) truth-in-sentencing laws, which mandate that a violent offender must serve at least 85 percent of the maximum sentence and (2) “strikes” laws, which significantly increase the prison sentences of repeat offenders. See also Shepherd (2002a, 2002b).
25 Ehrlich (1975) and Yunker (1976) report estimates of such tradeoffs, using time-series aggregate data.
26 The 95 percent confidence interval is given by $\pm 1.96 \, [\mathrm{SE}(\hat{\beta}_3)] \, (\mathrm{POPULATION}_t / 100{,}000) \, (1 / S_{t-6})$ around the point estimate.
27 For brevity, we do not report full results, which are available upon request.
References
Avio, Kenneth L. 1998. “Capital Punishment,” in Peter Newman, ed., The New Palgrave Dictionary of Economics and the Law. London: Macmillan.
Beccaria, Cesare. 1764. On Crimes and Punishments, H. Paolucci, trans. Indianapolis, IN: Bobbs-Merrill.
Becker, Gary S. 1968. “Crime and Punishment: An Economic Approach,” 76 Journal of Political Economy 169–217.
Bedau, Hugo A., ed. 1997. Death Penalty in America, Current Controversies. New York: Oxford University Press.
Black, Theodore, and Thomas Orsagh. 1978. “New Evidence on the Efficacy of Sanctions as a Deterrent to Homicide,” 58 Social Science Quarterly 616–31.
Bowers, William J., and Glenn L. Pierce. 1975. “The Illusion of Deterrence in Isaac Ehrlich’s work on Capital Punishment,” 85 Yale Law Journal 187–208.
Brumm, Harold J., and Dale O. Cloninger. 1996. “Perceived Risk of Punishment and the Commission of Homicides: A Covariance Structure Analysis,” 31 Journal of Economic Behavior and Organization 1–11.
Cameron, Samuel. 1994. “A Review of the Econometric Evidence on the Effects of Capital Punishment,” 23 Journal of Socio-Economics 197–214.
Chressanthis, George A. 1989. “Capital Punishment and the Deterrent Effect Revisited: Recent Time-Series Econometric Evidence,” 18 Journal of Behavioral Economics 81–97.
Hashem Dezhbakhsh, Paul H. Rubin and Joanna M. Shepherd
Cloninger, Dale O. 1977. “Deterrence and the Death Penalty: A Cross-Sectional Analysis,” 6 Journal of Behavioral Economics 87–107.
Cloninger, Dale O., and Roberto Marchesini. 2001. “Execution and Deterrence: A Quasi-Controlled Group Experiment,” 35 Applied Economics 569–76.
Cover, James Peery, and Paul D. Thistle. 1988. “Time Series, Homicide, and the Deterrent Effect of Capital Punishment,” 54 Southern Economic Journal 615–22.
Daly, Martin, and Margo Wilson. 1988. Homicide. New York: De Gruyter.
Davidson, Russell, and James G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Dezhbakhsh, Hashem, and Paul H. Rubin. 1998. “Lives Saved or Lives Lost? The Effect of Concealed-Handgun Laws on Crime,” 88 American Economic Review 468–74.
Ehrlich, Isaac. 1975. “The Deterrent Effect of Capital Punishment: A Question of Life and Death,” 65 American Economic Review 397–417.
——. 1977. “Capital Punishment and Deterrence: Some Further Thoughts and Additional Evidence,” 85 Journal of Political Economy 741–88.
——. 1996. “Crime, Punishment, and the Market for Offenses,” 10 Journal of Economic Perspectives 43–67.
Ehrlich, Isaac, and Joel Gibbons. 1977. “On the Measurement of the Deterrent Effect of Capital Punishment and the Theory of Deterrence,” 6 Journal of Legal Studies 35–50.
Ehrlich, Isaac, and Zhiqiang Liu. 1999. “Sensitivity Analysis of the Deterrence Hypothesis: Let’s Keep the Econ in Econometrics,” 42 Journal of Law and Economics 455–88.
“Execution Reconsidered.” 1999. Economist, July 24, 27.
Eysenck, Hans. 1970. Crime and Personality. London: Paladin.
Forst, Brian, Victor Filatov, and Lawrence R. Klein. 1978. “The Deterrent Effect of Capital Punishment: An Assessment of the Estimates,” in A. Blumstein, D. Nagin, and J. Cohen, eds., Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates. Washington, DC: National Academy of Sciences.
Fowler v. North Carolina, 428 U.S. 904 (1976).
Furman v. Georgia, 408 U.S. 238 (1972).
Glaeser, Edward L., and Bruce Sacerdote. 1999. “Why Is There More Crime in Cities?” 107 Journal of Political Economy 225–58.
Glaser, Daniel. 1977. “The Realities of Homicide versus the Assumptions of Economists in Assessing Capital Punishment,” 6 Journal of Behavioral Economics 243–68.
Gregg v. Georgia, 428 U.S. 153 (1976).
Grogger, Jeffrey. 1990. “The Deterrent Effect of Capital Punishment: An Analysis of Daily Homicide Counts,” 85 Journal of the American Statistical Association 295–303.
Hoenack, Stephen A., and William C. Weiler. 1980. “A Structural Model of Murder Behavior and the Criminal Justice System,” 70 American Economic Review 327–41.
Kennedy, Peter. 1992. A Guide to Econometrics. Cambridge, MA: MIT Press.
Kleck, Gary. 1979. “Capital Punishment, Gun Ownership and Homicide,” 84 American Journal of Sociology 882–910.
Layson, Stephen. 1985. “Homicide and Deterrence: A Reexamination of the United States Time-Series Evidence,” 52 Southern Economic Journal 68–89.
Leamer, Edward. 1983. “Let’s Take the Con out of Econometrics,” 73 American Economic Review 31–43.
——. 1985. “Sensitivity Analysis Would Help,” 75 American Economic Review 308–13.
Leibman, James, Jeffrey Fagan, and Valerie West. 2000. “Capital Attrition: Error Rates in Capital Cases, 1973–1995,” 78 Texas Law Review 1839–61.
Lott, John R., Jr., and William M. Landes. 2000. “Multiple Victim Public Shootings,” University of Chicago Law and Economics Working Paper.
Lott, John R., Jr., and David B. Mustard. 1997. “Crime, Deterrence and Right-to-Carry Concealed Handguns,” 26 Journal of Legal Studies 1–69.
McAleer, Michael, and Michael R. Veall. 1989. “How Fragile are Fragile Inferences? A Reevaluation of the Deterrent Effect of Capital Punishment,” 71 Review of Economics and Statistics 99–106.
McKee, David L., and Michael L. Sesnowitz. 1977. “On the Deterrent Effect of Capital Punishment,” 6 Journal of Behavioral Economics 217–24.
McManus, William. 1985. “Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher’s Prior Beliefs,” 93 Journal of Political Economy 417–25.
Mocan, H. Naci, and R. Kaj Gittings. Unpublished. “Pardons, Executions, and Homicides,” University of Colorado.
National Rifle Association.
Passell, Peter, and John B. Taylor. 1977. “The Deterrent Effect of Capital Punishment: Another View,” 67 American Economic Review 445–51.
Sah, Raaj K. 1991. “Social Osmosis and Patterns of Crime,” 99 Journal of Political Economy 1272–95.
Sellin, Johan T. 1959. The Death Penalty. Philadelphia, PA: American Law Institute.
Shepherd, Joanna M. 2002a. “Fear of the First Strike: The Full Deterrent Effect of California’s Two- and Three-Strike Legislation,” 31 Journal of Legal Studies 159–201.
——. 2002b. “Police, Prosecutors, Criminals, and Determinate Sentencing: The Truth about Truth-in-Sentencing Laws,” 45 The Journal of Law and Economics 509–34.
Snell, Tracy L. 2001. Capital Punishment 2000. Washington, DC: U.S. Bureau of Justice Statistics (NCJ 190598).
Stephen, James. 1864. “Capital Punishment,” 69 Fraser’s Magazine 734–53.
U.S. Department of Commerce, U.S. Bureau of the Census, Current Population Reports (1977–1996).
U.S. Department of Commerce, Bureau of Economic Analysis, Regional Economic Information System (1977–1996).
U.S. Department of Justice, Bureau of Justice Statistics, Capital Punishment (1977–1996).
U.S. Department of Justice, Bureau of Justice Statistics, Expenditure and Employment Data for the Criminal Justice System (1977–1996).
U.S. Department of Justice, Bureau of Justice Statistics, National Prisoner Statistics Data Series (1977–1996).
U.S. Department of Justice, Federal Bureau of Investigation, Uniform Crime Reports for the United States (1977–1996).
Yunker, James A. 1976. “Is the Death Penalty a Deterrent to Homicide? Some Time Series Evidence,” 5 Journal of Behavioral Economics 45–81.
Zimmerman, Paul R. 2001. “Estimating the Deterrent Effect of Capital Punishment in the Presence of Endogeneity Bias,” Federal Communications Commission Manuscript.
Zimring, Franklin E. 1977. “Determinants of the Death Rate from Robbery: A Detroit Time Study,” 6 Journal of Legal Studies 317–32.
Zimring, Franklin E., and Gordon Hawkins. 1986. Capital Punishment and the American Agenda. Cambridge, MA: Cambridge University Press.
Index
accidental deaths: concealed weapons and 140, 185–7
accountability: government and the state 2
Ades, A. 68, 69
advertising 2, 18–19, 45; economics of 19–22; false see false advertising; markets for ideas and 56–7
Akerlof, George A. 46
Altermann, Eric 90
Andreoni, James 6
arbitration 30
Arendt, Hannah 60
arms see weapons
Atkins, Raymond A. 4
auditing 364
automatic penalties 363–4
awards 328–9, 347–8; comparison of decision process of judges and juries 345–7; comparison of judge versus jury trials 329–30; descriptive statistics 348–50; duration results 350; Heckit model 335–7, 340; mean awards and win rates 330–1; sample selection and 331–5; selection effects 337–45
bargaining: pre-trial 251
Baron, David 90
Baumol, William J. 55
Beccaria, C. B. 370
Becker, Gary S. 123, 131, 222, 227, 355, 356, 376, 403
beliefs: markets for news and 91, 92, 95
Bentham, Jeremy 271, 272, 273, 288
Bernstein, D. E. 329, 347
Besley, Timothy 90
bias 91, 96; heterogeneous reader beliefs 92–3, 95, 99–106; homogeneous reader beliefs 95, 97–9
bonds 364–5
Breton, Albert 1
broadcasting: regulation of 12–13
Brown, D. 286
Brumm, Harold J. 402
Brunetti, Aymo 2
Bucke, T. 286
Callmann, Rudolf 35
Cameron, Samuel 406
capital punishment: deterrence and 6–7, 370–2, 380–93, 398–404, 420–1; incapacitative effect 389–90; model 380–9, 404–20; trade-off between executions and murders 390–1
Cassell, Paul 228, 286
Christiansen, V. 364
class actions 30, 348
Clermont, Kevin M. 330
Cloninger, Dale O. 402, 403
Coase, Ronald H. 1, 44–5, 48, 57
collusive corruption 70–1
commercial speech, freedom of see advertising
compensation see awards
competition 45, 46; markets for news and 92, 106; political 60; scientific ideas 49, 51, 52; unfair 18
concealed weapons 3, 129–31, 187–8, 201–2, 217–18; accidental deaths and 140, 185–7; data for studies 136–40, 188–91, 211–17; economic model of crime and 203–8; empirical evidence on effect of ‘shall issue’ provisions 140–85, 206–8; endogeneity of arrest rates and passage of concealed handgun laws 162–70; estimation methods and issues in model 208–11; method of murder and choice of murder victims 170–3; modeling effect of ‘shall issue’ law 204–6; problems testing effect of ‘shall issue’ provisions on crime 131–5
confessions 271, 272, 273, 280, 283–8
constitutional protections 1
contingency fees 339
Cook, Philip J. 129, 203
Cooter, R. 251
corruption 2; collusive 70–1; determinants 67, 68–9, 73–4; extortive 69–70; measures of 72–3, 81; methodology of study 74–5; panel data evidence 83–4; press freedom and 67–85; results of study 75–9
costs of litigation 30
Coulter, Ann 90
counsel: right to 230, 233
courts 45; market for ideas and 48–9; see also trials
Crawford, V. 274
credence 20
criminal procedure 222–4, 237; confessions 271, 272, 273, 280, 283–8; due process see due process; exclusion of evidence see exclusionary rule; fines see fines; juries see juries; model of criminal trials 252–9; plea bargaining 264–5; pre-trial bargaining 251; right to counsel 230, 233; right to silence see silence right; wrongful convictions 272, 273, 280, 282, 287–8
damages awards see awards
Darby, Michael R. 20
death penalty see capital punishment
defamation 25
Demsetz, Harold 51
Derenberg, Walter 35
deterrence 131, 356; capital punishment 6–7, 370–2, 380–93, 398–404, 420–1; carrying weapons 3, 123, 201; magnitude of deterrent effect 416–17; maximal 356, 362; model 380–9, 404–20; nondeterrable murder 406–7
Dezhbakhsh, Hashem 3, 7
Di Tella, R. 68, 69
Director, Aaron 10, 11, 14
disclosure of evidence 4, 250, 251, 265–6; model of 254, 255–9; plea bargaining and 264–5; social welfare and 260–4
disparagement 25, 26
Djankow, Simeon 90
Donohue, J. J. 264
Douglas, William O. 248
due process 248, 250
economic freedom: freedom of speech and 2
efficiency wage models 364
Ehrlich, Isaac 6, 123, 135, 162, 170, 210, 227, 398, 400, 401, 403, 406, 420
Eisenberg, Theodore 330
Evans, P. 68
evidence 5; disclosure of see disclosure of evidence; exclusion of see exclusionary rule; right to silence see silence right
exclusionary rule 222, 223, 224–5, 237; data for studies 226–7, 228–30, 238–41; empirical results of studies 228, 230–5; impact of Mapp case over time 235–7; model specification 227; previous empirical studies 225–6
experience goods 19, 20, 56
extortive corruption 69–70
Fagan, Jeffrey 414
false advertising 18–19, 20, 21, 36–7; claims about competitors’ product 25–6; competitors’ remedies 24–6, 31–4; false claims 24–5; Federal Trade Commission cases 28–9; German experience 35; Lanham Act 31–4; producer identity 26–8; purchasers’ remedies 22–4, 29–31
fines 355–6, 357, 363–5; encouragement of criminal behavior and 360–2
Fowles, R. 286
Franken, Al 90
Friedman, Milton 51, 53, 93
game theory 7, 274
Gay, Gerald D. 5, 251
Gentzkow, Matthew A. 102
Gittings, R. Kaj 403
Glaeser, Edward L. 106, 137, 415
Glaser, Daniel 406
Goldberg, Bernard 90
Goodman, John C. 18
government and the state: accountability 2; regulation by see regulation of markets
Graber, Doris 91
grievance procedures 365
gun control see weapons
Hagan, J. 363
Hayakawa, Samuel I. 91
Hayek, Friedrich A. 60
Helland, Eric 6, 332
Hindricks, J. 69
Hoenack, Stephen A. 407
ideas: markets for see markets for ideas
interactions between rights 7
Jacob, Herbert 228, 230
joint and several liability 339
Jordan, Ellen R. 2
judges 5, 303–4, 322–3; data and test result of trial mode choice model 315–16; evidence on trial mode choice model assumptions 316–21; implications of trial mode choice model 313–15; lenient 310; mean awards and win rates 330–1; model for choice of trial mode 305–13; naive 306, 311; proofs of trial mode choice model 323–4; sample selection and awards 331–5; strategic actions 311–13
juries 5–6, 251, 303–5, 322–3; awards by see awards; data and test result of trial mode choice model 315–16; evidence on trial mode choice model assumptions 316–21; implications of trial mode choice model 313–15; model for choice of trial mode 305–13; naive 306–10; proofs of trial mode choice model 323–4; racial discrimination by 264; reasonable doubt and 357–9; strategic actions 311–13
Kalven, H. 305, 320, 330
Kaplow, L. 356
Karni, Edi 20
Kaufman, D. 69
Keynes, John Neville 51
King, N. J. 264
Klevorick, A. K. 303
Klitgaard, R. 69, 70
Kolm, S.-C. 364
Kornhauser, L. A. 339
Kuhn, Thomas S. 50, 51, 52, 55
labor market: efficiency wage models 364
Lakatos, Imre 49
Landes, William M. 18, 363, 403
Lanham Act 31–4
lawyers: contingency fees 339; right to counsel 230, 233
Lee, R. 69
Leibman, James 414
Leitzel, James A. 203
Leland, Hayne E. 46
Levitt, S. D. 264
Lipman, B. 274
Lott, John R. 3, 123, 202, 208, 210, 211, 217, 218, 403
Ludwig, Jens 207
McDowall, David 130
Malik, A. 356
Marchesini, Roberto 403
markets: failure 15; ideas see markets for ideas; news see markets for news; regulation see regulation of markets
markets for ideas 1, 9–17, 44–8, 61; commercial advertising and goods markets 56–7; courts and 48–9; monopoly in ideas 1–2; politics 2, 45, 57–61; scientific ideas 49–56
markets for news 90–3, 106–7; accuracy and reader heterogeneity 103–6; crosschecking 104–6; defining bias 96; heterogeneous reader beliefs 92–3, 95, 99–106; homogeneous reader beliefs 95, 97–9; lemmas 107–9; model setup 93–6; newspaper strategy 94–5; proofs of propositions 109–18; rational readers 96–7; reader utility 94; slanting 91–2, 103–4
Marshall, Alfred 51
Mauro, P. 69
mediation 30
medical malpractice 329, 332, 348
Mialon, Hugo M. 2, 4
Milgrom, P. 274
Milton, John 13–14, 44
misrepresentation: passing off 26–7
Mocan, H. 403
monopolies 9, 52; in ideas 1–2
Mookherjee, D. 357
Mullainathan, Sendhil 2
murder 372–4; defense against 376–80; effects of employment opportunities, income and demographic variables 375–6; effects of probability and severity of punishment 374–5, 379–80; nondeterrable 406–7; supply function 380; trade-off between executions and murders 390–1; see also capital punishment
Mustard, David B. 3, 123, 202, 208, 210, 211, 217, 218
Nelson, Philip 19, 20, 56
news media see markets for news; press/media freedom
Newton-Smith, W. H. 49
non-rationalism 50
origin of goods 27
Orwell, George 60
paradigms 50, 52–4
passing off 26–7
perjury 4
personal injuries cases 329; awards 329, 332, 334
plea bargaining 264–5
P’ng, I. 357
police: factors determining optimal law enforcement activity 376–9; racial discrimination by 264; searches by see searches by police; silence in detention see silence right
Polinsky, A. M. 356, 363
politics: influence over media 71–2; markets for ideas and 2, 45, 57–61; markets for news and 106, 107
Popper, Karl 49
Posner, Richard A. 18, 24, 28–9, 36, 356, 363
poverty: awards and 334–5
Prat, Andrea 90
premises liability 332
press/media freedom 1, 2, 12; corruption and 67–85; measure of 71–2; measures of 81–3; panel data evidence 83–4; political influence 71–2; protection of sources 12; see also markets for news
pre-trial bargaining 251
product liability 329, 332, 348
punishment 6; death penalty see capital punishment; deterrent effect 131; effects of probability and severity of punishment 374–5, 379–80; fines see fines; scaling 6; severe gun-crime punishment 126–7
racial discrimination 264
Rahman, R. 68
rationality: markets for news and rational readers 96–7; rationalism 49, 50, 51
Rauch, J. 68
Rawls, John 53
reasonable doubt 355–6, 365; criminal behavior and 359–60; jurors’ behavior and 357–9; theoretical model 356–60
regulation of markets 9–10, 15–16, 44–5; broadcasting 12–13; freedom of speech and 46–8, 71
Reinganum, J. F. 264
representation: right to 230, 233
Revesz, R. L. 339
Roberts, J. 274
Rothschild, M. 303
Rubin, Paul H. 2, 3, 4, 7, 18
Rubinfield, D. 251, 356
Sacerdote, Bruce 137, 415
Sah, Raaj K. 413, 414
Samuelson, Paul A. 55
Sanchirico, C. 274
Sappington, D. E. M. 356
science: economic model 51–6; markets for scientific ideas 49–56; models 49–50
search goods 19, 30, 56
searches by police 3–4, 224, 225–6, 237; data for studies 226–7, 228–30, 238–41; empirical results of studies 228, 230–5; impact of Mapp case over time 235–7; model specification 227; previous empirical studies 225–6
Seidmann, Daniel 4, 251, 274
self-defense 2–3
self-esteem 11
self-interest 11, 14, 16
Sellin, T. 370, 371, 391
Seppi, D. 274
Severin, Werner J. 91
shall-issue laws see concealed weapons
Shapiro, Jesse M. 102
Shavell, S. 356, 357, 363
Shepherd, Joanna M. 7
Shin, H. S. 251
Shleifer, Andrei 2, 69
silence right 4–5, 248–51, 251–2, 265–6, 271–5, 288–9; model of 255–9, 275–9, 289–98; plea bargaining and 264–5; results of model 279–88; social welfare and 259–64
small claims courts 29–30
Sobel, J. 274
speech, freedom of 1; economic freedom and 2; market for ideas see markets for ideas; press see press/media freedom; regulation 46–8, 71; see also advertising
Stein, A. 251, 274
Stigler, George J. 50, 356
Street, R. 286
strike suits 30
Tabarrok, Alexander 6, 332
Tankard, James W. 91
Tanzi, V. 69
tax evasion 364
totalitarianism 60–1
trade names 27–8
trademarks 26–7, 28
trials: model of criminal trials 252–9; plea bargaining 264–5; pre-trial bargaining 251; right to counsel 230, 233; see also evidence; judges; juries; silence right
Tullock, Gordon 60
unfair competition 18
unraveling 252
Van Rijckeghem, C. 68
Vishny, R. 69
weapons 2–3; concealed see concealed weapons; deterrent effect 3, 123, 201; facilitation effect 3, 123, 201; full gun control 126–7; gun control 123–4; marginal gun control 125; model of crime and gun control 124–7; severe gun-crime punishment 126–7
Weder, Beatrice 2, 68
Weiler, William C. 407
West, Valerie 414
Wicksell, Knut 51
Winship, C. 303
Wintrobe, Ronald 1
Wiseman, Tom 2
wrongful convictions 272, 273, 280, 282, 287–8
Zeisel, H. 305, 320, 330
Zimmerman, Paul R. 403