INTRODUCTION TO THE SERIES The aim of the Handbooks in Economics series is to produce Handbooks for various branches of economics, each of which is a definitive source, reference, and teaching supplement for use by professional researchers and advanced graduate students. Each Handbook provides self-contained surveys of the current state of a branch of economics in the form of chapters prepared by leading specialists on various aspects of this branch of economics. These surveys summarize not only received results but also newer developments, from recent journal articles and discussion papers. Some original material is also included, but the main goal is to provide comprehensive and accessible surveys. The Handbooks are intended to provide not only useful reference volumes for professional collections but also possible supplementary readings for advanced courses for graduate students in economics.
EDITORS' INTRODUCTION

The field of Public Economics has been changing rapidly in recent years, and the sixteen chapters contained in this Handbook survey many of the new developments. As a field, Public Economics is defined by its objectives rather than its techniques, and much of what is new is the application of modern methods of economic theory and econometrics to problems that have been addressed by economists for over two hundred years. More generally, the discussion of public finance issues also involves elements of political science, finance and philosophy. These connections are evident in several of the chapters that follow. Public Economics is the positive and normative study of government's effect on the economy. We attempt to explain why government behaves as it does, how its behavior influences the behavior of private firms and households, and what the welfare effects of such changes in behavior are. Following Musgrave (1959), one may imagine three purposes for government intervention in the economy: allocation, when market failure causes the private outcome to be Pareto inefficient; distribution, when the private market outcome leaves some individuals with unacceptably low shares in the fruits of the economy; and stabilization, when the private market outcome leaves some of the economy's resources underutilized. The recent trend in economic research has tended to emphasize the character of stabilization problems as problems of allocation in the labor market. The effects that government intervention can have on the allocation and distribution of an economy's resources are described in terms of efficiency and incidence effects. These are the primary measures used to evaluate the welfare effects of government policy.
The first chapter in this volume, by Richard Musgrave, presents an historical development of these and other concepts in Public Finance, dating from Adam Smith's discussion in The Wealth of Nations of the role of government and the principles by which taxes should be set. The remaining chapters in the Handbook examine different areas of current research in Public Economics. Analyses of the efficiency and incidence of taxation, developed in Musgrave's chapter, are treated separately in Alan Auerbach's chapter in the first volume and Laurence Kotlikoff's and Lawrence Summers' chapter in the second volume, respectively. Auerbach surveys the literature on excess burden and optimal taxation, while Kotlikoff and Summers discuss various theoretical and empirical approaches that have been used to measure the distributional effects of government tax and expenditure policies. These general analyses of the effects of taxation form a basis for the consideration of tax policies in particular markets or environments, as is contained in the chapters by Jerry Hausman, Agnar Sandmo, Avinash Dixit, Harvey Rosen, John Helliwell and Terry Heaps, and Joseph Stiglitz.
Hausman discusses the effects of taxation on labor supply, including a treatment of how one empirically estimates such effects in the presence of tax and transfer programs. He also considers the incentive effects of social welfare programs such as unemployment compensation and social security. Sandmo focuses on the other major factor in production, capital, dealing with theory and evidence about the effects of taxation on private and social saving and risk-taking. Dixit shows how the basic results about the effects of taxation may be extended to the trade sector of the economy, casting results from the parallel trade literature in terms more familiar to students of Public Finance. Rosen's chapter brings out the characteristics of housing that make it worthy of special consideration. He considers the special econometric problems involved in estimating the response of housing demand and supply to government incentives. Because of its importance in most family budgets and its relatively low income elasticity of demand, housing has been seen as a suitable vehicle for government programs to help the poor, and Rosen discusses the efficiency and incidence effects of such programs. Helliwell and Heaps consider the effects of taxation on output paths and factor mixes in a number of natural resource industries. By comparing their results for different industries, they expose the effects that technological differences have on the impact of government policies. Stiglitz treats the literature on income and wealth taxation. The remaining chapters in the Handbook may be classified as being on the "expenditure" side rather than the "tax" side of Public Finance, though this distinction is probably too sharp to be accurate. In Volume I, Dieter Bös surveys the literature on public sector pricing, which is closely related both to the optimal taxation discussion in Auerbach's chapter and to Robert Inman's consideration, in Volume II, of models of voting and government behavior.
The question of voting and, more generally, public choice mechanisms, is treated by Jean-Jacques Laffont in his chapter. The chapters by William Oakland and Daniel Rubinfeld focus on the provision of "public" goods, i.e., goods with sufficiently increasing returns to scale or lack of excludability that government provision is the normal mode. Oakland considers the optimality conditions for the provision of goods that fall between Samuelson's (1954) "pure" public goods and the private goods provided efficiently by private markets. Rubinfeld surveys the literature on a special class of such goods: local public goods. Since the work of Tiebout (1956), much research has been devoted to the question of whether localities can provide efficient levels of public goods. The other two chapters in Volume II also deal with problems of public expenditures. Anthony Atkinson considers the effects of the range of social welfare programs common in Western societies aimed at improving the economic standing of the poor. Some of these policies are touched on in the chapters by Hausman and Rosen, but the coexistence of many different programs itself leads
to effects that cannot be recognized by examining such programs seriatim. Jean Dreze and Nicholas Stern present a unified treatment of the techniques of cost-benefit analysis, with applications to the problems of developing countries.

References

Musgrave, R.A., 1959, The theory of public finance (McGraw-Hill, New York).
Samuelson, P.A., 1954, The pure theory of public expenditure, Review of Economics and Statistics 36, 387-389.
Tiebout, C.M., 1956, A pure theory of local expenditures, Journal of Political Economy 64, 416-424.
CONTENTS OF THE HANDBOOK

VOLUME I

Editors' Introduction

Chapter 1
A Brief History of Fiscal Doctrine
R.A. MUSGRAVE

Chapter 2
The Theory of Excess Burden and Optimal Taxation
ALAN J. AUERBACH

Chapter 3
Public Sector Pricing
DIETER BÖS

Chapter 4
Taxes and Labor Supply
JERRY A. HAUSMAN

Chapter 5
The Effects of Taxation on Savings and Risk Taking
AGNAR SANDMO

Chapter 6
Tax Policy in Open Economies
AVINASH DIXIT

Chapter 7
Housing Subsidies: Effects on Housing Decisions, Efficiency, and Equity
HARVEY S. ROSEN

Chapter 8
The Taxation of Natural Resources
TERRY HEAPS and JOHN F. HELLIWELL

VOLUME II

Chapter 9
Theory of Public Goods
WILLIAM H. OAKLAND

Chapter 10
Incentives and the Allocation of Public Goods
JEAN-JACQUES LAFFONT

Chapter 11
The Economics of the Local Public Sector
DANIEL RUBINFELD

Chapter 12
Markets, Government, and the "New" Political Economy
ROBERT INMAN

Chapter 13
Income Maintenance and Social Insurance
ANTHONY B. ATKINSON

Chapter 14
The Theory of Cost-Benefit Analysis
JEAN DREZE and NICHOLAS STERN

Chapter 15
Pareto Efficient and Optimal Taxation and the New New Welfare Economics
JOSEPH STIGLITZ

Chapter 16
Tax Incidence
LAURENCE KOTLIKOFF and LAWRENCE SUMMERS
Chapter 9
THEORY OF PUBLIC GOODS

WILLIAM H. OAKLAND
Tulane University
1. Introduction

This chapter is concerned with the class of goods which has come to be known as public goods or collective goods. A distinctive characteristic of such goods is that they are not "used up" in the process of being consumed or utilized as an input in a production process. As Samuelson (1954, 1955) has so succinctly stated, public goods have the property that "one man's consumption does not reduce some other man's consumption". This contrasts sharply with private goods which, once consumed or utilized as an input, can no longer be of service to others. To be considered public, a good must also be of interest to more than one consumer or firm. Otherwise, the fact that the consumption possibilities of others are undiminished is irrelevant. A fireworks display in the middle of an empty desert is a private good even though the same display in a city park would be a public good. A public good may also create disutility or reduced profits. A cigarette smoked in a crowded classroom is an example. The disutility suffered by one non-smoker does not reduce the disutility suffered by other non-smokers. Similarly, the reduced profits suffered by one fisherman because of water pollution do not reduce the loss to other fishermen. Public goods are of particular relevance to public policy because they tend to be inefficiently provided by private arrangements such as the market mechanism. Consider first the category of public "goods" as opposed to public "bads". Because they are not used up in the act of consumption or production, the marginal cost of extending service to additional users is zero. Private provision of the good, however, will necessitate revenues from users in order to defray the cost of producing the good. Such charges will usually lead some potential users to forgo consumption, creating a deadweight efficiency loss.

Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B.V. (North-Holland)
A further source of difficulty is encountered if the public good has the property that potential users cannot be costlessly excluded from its services. In the event that exclusion costs are prohibitive, private sellers will be unable to use prices to finance production costs. Instead, the latter will have to be covered by voluntary contributions of users, a mechanism unlikely to provide an efficient outcome because one can enjoy the good whether or not he/she contributes. If exclusion is feasible, but costly, any outlays for exclusion are themselves a source of inefficiency. While the inability to exclude costlessly exacerbates the efficiency problems of private provision of public goods, it is not essential for market failure. The fact that the marginal cost of additional users is zero is itself sufficient to insure market failure. This is not true for public "bads", however. If exclusion costs are zero, no one will avail themselves of the potential harm caused by the public bad. If exclusion costs are non-zero, private markets will generally provide a super-optimal supply of public bads. The supplier of the public bad will often have little incentive to consider the damages imposed on others, whether the damages arise from exclusion outlays or actual harm from the public bad. Nevertheless, the question of whether such public bads will be overprovided by private arrangements has been a source of contention in the literature. A full treatment of this issue is deferred until later in the chapter. We will begin our analysis with the derivation and interpretation of the efficiency conditions for pure public goods. These conditions are then generalized to cover cases which are intermediate between pure public goods and pure private goods, such as externalities. The positive issues of how private market arrangements will provide for public goods are treated next.
The analysis concludes with a discussion of alternative mechanisms for making public goods decisions within the public sector.
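The deadweight loss from charging for a non-rival good, described above, can be made concrete with a small numerical sketch; the linear demand curve and all parameter values here are illustrative assumptions, not taken from the chapter:

```python
# Hypothetical illustration: charging for a non-rival (zero marginal cost) good
# excludes users whose valuations are positive but below the price, creating
# a deadweight loss. Demand curve and numbers are invented for illustration.

def deadweight_loss(price: float, choke: float = 10.0, slope: float = 1.0) -> float:
    """Deadweight loss of charging `price` for a zero-marginal-cost good with
    linear inverse demand p(q) = choke - slope*q. The efficient price is 0
    (q* = choke/slope); at `price`, only q = (choke - price)/slope units are
    taken, and the lost surplus is the triangle under the demand curve over
    the excluded units."""
    q_at_price = max(choke - price, 0.0) / slope
    q_efficient = choke / slope
    # surplus lost on the (q_efficient - q_at_price) excluded units
    return 0.5 * (q_efficient - q_at_price) * price

print(deadweight_loss(0.0))   # no charge: no exclusion, zero deadweight loss
print(deadweight_loss(4.0))   # a positive charge excludes users with value below 4
```

Any positive price creates some loss, which is the sense in which user charges for a good with zero marginal cost are inherently inefficient.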
2. Pure public goods

Consider an economy consisting of N individuals. Let X be the quantity of a good which is produced and x^i be individual i's consumption of the good. Then a good is purely public if

x^i ≤ X,   i = 1, ..., N,   (1)

with the possibility that the equality can hold for all i simultaneously. The weak inequality is used to allow for the possibility of exclusion; if exclusion costs are prohibitive, (1) will take the form of an equality.
On the other hand, a good is purely private if

Σ_{i=1}^{N} x^i ≤ X.   (1a)

That is, the sum of the individual consumptions cannot exceed total production. Given production, and assuming N = 2, the consumption possibilities for pure public and pure private goods are shown in Figure 2.1. For private goods, consumption possibilities are located within the triangle XOX, while for public goods they must fall within the rectangle AXOX if exclusion is costless. If exclusion is not possible the consumption possibilities for the public good collapse to the point A. Inspection of Figure 2.1 or comparison of equations (1) and (1a) quickly reveals the fundamental difference between pure private and pure public goods: the total absence of conflict between the consumption of different individuals when the good is public. In Figure 2.1, for example, person 1's consumption is irrelevant to person 2's consumption possibilities when the good is public. If the good is private, and total production is allocated to consumers, an increase in person 1's consumption must be accompanied by an equal decrease in person 2's possibilities. As Musgrave (1969) has aptly stated, public goods involve no "rivalryness" among consumers while private goods involve perfect "rivalryness".
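The definitional contrast between (1) and (1a) can be expressed as a short feasibility check; the function names and numbers are illustrative assumptions:

```python
# Sketch of the contrast in (1) and (1a): with a pure public good each
# person's consumption is bounded by total production X individually (no
# rivalry); with a pure private good consumptions are bounded in the aggregate.

def feasible_public(consumptions, X):
    """Eq. (1): x_i <= X for every i."""
    return all(x <= X for x in consumptions)

def feasible_private(consumptions, X):
    """Eq. (1a): sum of x_i <= X (perfect rivalry)."""
    return sum(consumptions) <= X

# Two people each consuming the full output X = 1:
print(feasible_public([1.0, 1.0], 1.0))   # feasible: point A in Figure 2.1
print(feasible_private([1.0, 1.0], 1.0))  # infeasible: outside the triangle XOX
```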
Figure 2.1. Consumption possibilities for a pure private good (triangle XOX) and a pure public good (rectangle AXOX), N = 2.
Let us turn to the question of optimal resource allocation in a world with public goods. Without loss of generality we can consider a world with two goods, one purely public and the other purely private. Efficiency is defined by the following problem:

max u^i(y^i, x^i)

subject to

B^j − u^j(y^j, x^j) ≤ 0,   j = 1, ..., N,

Σ_{k=1}^{N} y^k ≤ Y.
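The first-order conditions of this class of problems yield the Samuelson condition that the sum of individual marginal rates of substitution equal the marginal rate of transformation. A sketch of the derivation, assuming no exclusion (so x^i = X for all i) and a transformation frontier F(X, Y) = 0:

```latex
\mathcal{L} = u^1(y^1, X) + \sum_{j=2}^{N}\lambda_j\left[u^j(y^j, X) - \bar{B}^j\right]
            - \gamma\, F\!\left(X, \textstyle\sum_{k} y^k\right).

% First-order conditions:
\frac{\partial\mathcal{L}}{\partial y^1} = u^1_y - \gamma F_Y = 0, \qquad
\frac{\partial\mathcal{L}}{\partial y^j} = \lambda_j u^j_y - \gamma F_Y = 0, \qquad
\frac{\partial\mathcal{L}}{\partial X} = u^1_X + \sum_{j=2}^{N}\lambda_j u^j_X - \gamma F_X = 0.

% Substituting \lambda_j = \gamma F_Y / u^j_y (with \lambda_1 \equiv 1):
\sum_{i=1}^{N}\frac{u^i_X}{u^i_y} = \frac{F_X}{F_Y}.
```

The marginal rates of substitution are summed because the same unit of X serves every consumer simultaneously; this is precisely where non-rivalry alters the familiar first-order conditions for private goods.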
3.4. Externalities

As mentioned above, public goods are sometimes provided as a by-product of the provision of some private good. The public goods so provided may generate either positive or negative benefits. Cigarette smoke serves as an example of the latter, whereas inoculation against communicable diseases is an example of the former. What appears to differentiate these cases from pure public goods is
that rivalry exists among consumers for the good which gives rise to the external benefits or costs. That is, an increase in one person's allocation of cigarettes must be at the expense of some other person's allocation. Nevertheless, such a distinction is more apparent than real. Goods with external effects fit the mold of pure public goods precisely.7 Consider a two-good world where one of the goods is strictly private, whereas the other has the property that, in addition to the benefits it provides the direct consumer, it confers positive benefits on other agents in the economy.8 Moreover, let the external benefits have the property of pure public goods in that one person's consumption of the externality does not reduce the consumption possibilities of others. The utility function of each individual can be written

U^i = u^i(y^i, x^i, z^i_1, ..., z^i_{i−1}, z^i_{i+1}, ..., z^i_N),   (30)
with the constraints on production and distribution being

Σ_{i=1}^{N} y^i ≤ Y,   (31)

Σ_{i=1}^{N} x^i ≤ X,   (32)

z^i_j ≤ x^j,   i = 1, ..., N,  j = 1, ..., N,   (33)

F(X, Y) = 0.   (34)
As before, y represents the private good. x^i now corresponds to an individual's allocation of the good, X, which gives rise to the external benefits z^i_j (j ≠ i). Note that from the standpoint of individual j, x^i (i ≠ j) is a pure public good. On the other hand, from the vantage point of individual i, x^i is a pure private good, for i's allocation of x can only be increased by reducing someone else's allocation. The good x, therefore, appears to have properties of both private and public goods. To consider the optimal resource allocation for such an economy let us recognize that exclusion from external benefits can never be optimal. Also assume that individuals are not sated by their direct consumption of X. Hence, (32) and (33) will hold as equalities. Our problem, then, is to maximize

U^1(y^1, x^1, ..., x^N)

7 See Samuelson (1969b) and Mishan (1969, 1976).
8 As with our discussion of pure public goods we will focus on the case of positive benefits. Negative benefits create no new issues that have not already been discussed.
subject to

B^j = u^j(y^j, x^1, ..., x^N),   j = 2, ..., N,

Σ_{k=1}^{N} y^k ≤ Y.
Consider a region whose output, f(n), depends on its population n, with f′ > 0 and f″ < 0. Output can be divided among expenditures on a public good, X, and private goods, Y. If individuals are identical, ny = Y, and the region's budget constraint is

ny + C(X) = f(n).   (65)
Consider the problem of choosing the optimal level of public good for a fixed population and where the utility function is u(y, X). The allocation must satisfy the Samuelson rule,

n u_X / u_y = C′(X),   (66)

and the constraint (65). Together, these define the indirect utility function, V(n), where

V′(n) = (u_y / n)(f′ − y).   (67)

As in the case of clubs, there is insufficient structure to assure that V(n) has a maximum or, if it does, that it is unique. As before, we will assume that (67) has a unique zero and that V″ < 0 at this point. It follows that the optimal population requires that workers be paid, after taxes, a wage equal to their marginal product. Moreover, it will be the case that

C(X) = f(n) − n f′(n).   (68)
That is, aggregate expenditure on public services is equal to aggregate rents. Moreover, if the underlying production function exhibits constant returns to scale in labor and land, public expenditures will equal land rents. Since the structure of this problem is so similar to the club model, there is little merit in pursuing a parallel discussion. Instead, we can conclude with a statement of the welfare optimum condition for the division of an identical population among two regions:

y¹ − y² = f′(n¹) − f′(n²) + Δ,   (69)

where y^r and n^r denote private consumption and population in region r, and Δ is a term involving the social welfare function that vanishes when utility is equal in the two regions,
which is equivalent to (59'). Thus, differences in private income should reflect wage differences only if utility is the same in the two regions.23 If the high wage region is also the region with greater utility, it must be the case that differences in private consumption are less than the wage differentials. Here, a tax on the wages of the workers in the high wage region would be necessary to achieve an optimum. On the other hand, if the high wage region is the one with lower utility, we would need a subsidy to wages in that region. For efficiency, we only require that the interests of a potential migrant be in opposition to the interests of his community. Thus, a region which would prosper by a reduction in population would not contain any members who would voluntarily move out.
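The result in (68), that at the optimal population public expenditure is financed exactly by aggregate rents f(n) − nf′(n), can be checked numerically; the Cobb-Douglas production function and all parameter values below are illustrative assumptions:

```python
# Numerical check of (68): expenditure on the public good equals aggregate
# rents f(n) - n f'(n). Functional form and parameters are illustrative
# assumptions, not the chapter's.

def f(n, A=10.0, alpha=0.5):
    """Regional production function with f' > 0, f'' < 0 (Cobb-Douglas in labor)."""
    return A * n ** alpha

def fprime(n, h=1e-6, **kw):
    """Central-difference numerical derivative of f."""
    return (f(n + h, **kw) - f(n - h, **kw)) / (2 * h)

n = 25.0
rents = f(n) - n * fprime(n)   # aggregate rents at population n
wage_bill = n * fprime(n)      # workers paid their marginal product
print(round(rents, 4), round(wage_bill, 4))
# With alpha = 0.5, rents equal (1 - alpha)*f(n): the share of output not paid
# to labor accrues as rent, exactly covering C(X) = f(n) - n f'(n) at the optimum.
```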
4. Private provision of public goods

We turn our attention now to the question of how, in the absence of governmental intervention, private markets would tend to provide for public goods. Without such information it is not possible to decide whether public intervention is warranted and, if it is, what form it could best take. The market's performance will depend critically upon the characteristics of the public good, such as the possibilities for exclusion, the number of individuals who derive utility from it, the presence of large benefits to direct consumers, and the scale economies of public good production. Moreover, it may also depend upon the legal-institutional framework within which market transactions occur. Because of these complexities, few generalizations can be made beyond the observation that private markets will tend to underprovide public goods and to overprovide public bads. Whether market performance is sufficiently inadequate to warrant public intervention will depend upon the particular circumstances. Indeed, there may be cases where private markets will perform efficiently. Owing to this diversity of possible outcomes we will only discuss some of the more important special cases.
23 Flatters et al. (1974) derive an expression similar to (69) but without the term involving the welfare function. This difference reflects their requirement that utility in the two regions must be equal.

4.1. No exclusion, small number of participants

4.1.1. "Public goods"

Consider a situation where the number of consumers deriving utility from a public good is small and where exclusion is impossible. An example of such a case would be a small island inhabited by two consumers, A and B. Each is
interested in the control of mosquitoes.24 If A eliminates mosquitoes it benefits B and vice versa. To keep the argument simple, we assume that the resources available to each consumer are fixed and that it costs one unit of private good to obtain one unit of mosquito control. That is, for each consumer we have

x^i + y^i = Y^i,   i = A, B,   (70)

where x^i is mosquito control purchased by i, and y^i and Y^i are private good consumption and income, respectively. Each consumer has a utility function

U^i = u^i(x^A + x^B, y^i),   (71)
where we assume that each individual derives equal utility from his own and the other party's mosquito control. To begin with, let us assume that the individuals live at opposite ends of the island so that each is unaware of the other's mosquito control activity except as it is reflected in the size of the mosquito population. Hence, each will take the x of the other as given when making his own choices. The first-order conditions for utility maximization are then (70) and

u^i_x / u^i_y = 1,   i = A, B.   (72)

Solving equations (70) and (72) for each party yields the reaction functions:

x^A = x^A(x^B);   x^B = x^B(x^A).
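The equilibrium these reaction functions produce, and its sub-optimality, can be sketched numerically. The quasi-linear utility u^i = α ln(x^A + x^B) + y^i and the parameter values are illustrative assumptions, not the chapter's specification:

```python
# Best-response sketch of the two-person mosquito-control game. Each agent
# takes the other's x as given, as in (72). Utility form and parameters are
# illustrative assumptions.

ALPHA, INCOME = 4.0, 100.0

def best_response(x_other):
    """Maximize alpha*ln(x_i + x_other) + (INCOME - x_i) over x_i >= 0.
    Interior first-order condition: alpha/(x_i + x_other) = 1, so
    x_i = alpha - x_other (truncated at zero)."""
    return max(ALPHA - x_other, 0.0)

# Iterate best responses to a Nash fixed point:
xa = xb = 1.0
for _ in range(50):
    xa = best_response(xb)
    xb = best_response(xa)

nash_total = xa + xb          # total control at equilibrium: alpha
efficient_total = 2 * ALPHA   # Samuelson: 2*alpha/X = 1  =>  X = 2*alpha
print(nash_total, efficient_total)
```

At the Nash point total control equals α while efficiency calls for 2α, so the sum of marginal benefits at equilibrium is 2 against a marginal cost of 1, matching the text's observation.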
Assuming that an increase in the other's control activities leads to a reduction in one's own, these reaction functions will appear as in Figure 4.1. Equilibrium levels of x^A and x^B are given by point E in the diagram. Clearly, at E the level of mosquito control is sub-optimal. For, by (72), the total benefit of an additional unit of mosquito control equals 2, while the cost of the unit is only 1. It would appear that the extent of market failure in this case is quite substantial, marginal benefits being twice marginal cost. But this result stems from the assumption that each party is indifferent between his direct consumption of mosquito control and that of the other consumer. This would be the case for a pure public good. If, however, we were considering a consumption externality, the inefficiency might be quite modest. This would be so if each consumer derives considerably more satisfaction from his own consumption than from that of

24 This example is drawn from Buchanan (1968).
Figure 4.1. Reaction functions x^A(x^B) and x^B(x^A); equilibrium at point E.
the other party. Whether or not significant resource misallocation develops in this case turns upon the relative marginal utility of direct and indirect consumption of the good. Some authors, however, would deny that any misallocation would result from market provision in this particular case.25 At points such as E in Figure 4.1, both parties will be aware of the mutual benefits of expanding mosquito control. For example, if B were to pay A for expanding his consumption, A would have the incentive to do so. As long as the aggregate benefits of expansion exceed production costs, it will be possible for the two parties to work out a cost-sharing scheme for additional output that will benefit both. Such possibilities are shown in Figure 4.2, where the indifference curves of A and B are superimposed on the reaction functions.26 Note that the reaction functions pass through the peaks of the indifference curves. Consequently, at the point of intersection, E, the indifference curves also intersect. By sharing costs of additional units of x, the parties could move to a point between P and P' on the "contract curve", the locus of efficient outcomes. Since both parties are interested

25 See, for example, Buchanan (1968, pp. 45-46).
26 The indifference curves are arrived at by substituting for y from the budget constraint into the utility functions.
Figure 4.2. Indifference curves of A and B superimposed on the reaction functions.
in higher levels of utility, the argument goes, the necessary agreement will be forthcoming. While this logic is appealing, it flies in the face of reality. Consider defense expenditure. Clearly, all nations would be better off if such expenditure were eliminated. Nevertheless, such expenditures continue and have tended to escalate. A moment's reflection suggests that defense outlay is exactly analogous to the mosquito control example; one nation's outlay creates a negative externality for the other. The failure of bargaining to achieve an efficient outcome, in large measure, owes to the multiplicity of efficient outcomes.27 In Figure 4.2, all points along ZA and ZB are efficient and have the property that no person is worse off than he/she would be in isolation. Since the points ZA and ZB carry substantially different utility to the parties, it is not implausible that each should attempt to capture as large a share of the gains as possible, even if it means not achieving an efficient outcome. Indeed, the point E, itself an inefficient outcome, may not be achievable via bargaining. While E corresponds to the so-called "Nash" bargaining solution, the subjective nature of the benefits gives both parties the incentive to under-represent their true preferences. Thus, the reaction curves

27 For a similar view about the limitations of bargaining, see Lipnowski and Maital (1983).
revealed during the bargaining process may be considerably below those shown in Figure 4.2. The preceding difficulties caused by bargaining are referred to as transactions costs by those who believe market outcomes tend towards efficiency.28 It is alleged that, when the number of parties is small, transactions costs will also be small. The remarks in the preceding two paragraphs suggest this view may be an act of faith instead of a scientifically derived hypothesis.

4.1.2. "Public bads"
The analysis of the previous subsection carries over with little modification to the instance where the externality generates negative benefits. Nevertheless, since this case has commanded wide attention in the literature it is useful to spend a little time on it.
In a celebrated article, Coase (1960) argued that as long as property rights were defined unambiguously, private markets would tend to efficiently allocate goods with negative externalities when the number of parties involved is small. A corollary to this hypothesis is that it makes little difference whether the property right is assigned to the person generating the externality or to the party affected by it. When the externality confers positive benefits the issue of property rights does not arise: it is difficult to conceive of the benefactor having the legal right to force an individual to increase his/her consumption of the externality generating activity.29 In the case of a negative externality, however, it is not unreasonable to confer on the party being damaged the right to enjoin the damaging activity. Consider the case of air pollution. If the pollutee has the right to clean air, the polluter will have to bribe him for permission to emit pollutants. The level of the bribe will reflect the benefits to the polluter from using the air to eliminate wastes. If damage is a continuous increasing function of emissions, the pollutee will allow emissions up to the point where the marginal damage to him is equal to the payment for the marginal unit of pollutant. The latter, in turn, will be equal to the marginal benefit to the polluter. Hence, pollution will be carried out at the optimal level. If the polluter could not be enjoined by the pollutee, it would be the pollutee who would have to do the bribing. By similar reasoning efficiency would be reached. The reader will recognize the parallelism between this reasoning and that used to demonstrate efficiency for the case of positive externalities. Hence, it shares the same limitations. Namely, there are incentives for the parties to misrepresent their preferences so as to gain a disproportionate share of the gains from trade.

28 See, for example, Coase (1960).
29 A possible exception may be compulsory elementary education.
For example, if the polluter has the property right, he will have an incentive to overstate the level of pollution that he would generate in the absence of any monetary payment. This will have the effect of raising the willingness of the pollutee to pay for any level of reduction in emission levels. While such a strategy could still yield an optimal level of pollution, it would not do so if the pollutee misrepresented his true preferences. The latter has strong incentives to do so in view of the fact that the total payment he will be asked to make will depend upon his revealed willingness to pay for inframarginal units.30 In other words, the polluter will attempt to extract the pollutee's entire consumer surplus from emission reduction. He will do so by making an all or nothing offer based upon his estimate of the pollutee's demand curve for pollution reduction. Recognizing that some consumer surplus is better than none, the pollutee will understate his demand for reduction. This tendency, together with the polluter's incentive to overstate base levels of pollution, will lead to super-optimal levels of pollution.31 The major distinction between external "bads" and external "goods" has to do with distribution outcomes rather than with efficiency properties. By varying the locus of property rights, society can affect the relative bargaining strength of the parties. But, in itself, this will not insure efficient behavior through the private market.
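The Coasean claim that, absent misrepresentation, the allocation is invariant to the assignment of the property right can be sketched numerically; the quadratic benefit and damage functions are illustrative assumptions:

```python
# Sketch of the Coase argument: with truthful bargaining, the emissions level
# that maximizes joint surplus is the same whichever party holds the property
# right; only the direction of payment changes. Functional forms are
# illustrative assumptions.

def polluter_benefit(e):
    return 10 * e - e ** 2          # marginal benefit of emitting: 10 - 2e

def pollutee_damage(e):
    return 2 * e + 0.5 * e ** 2     # marginal damage suffered: 2 + e

def joint_surplus(e):
    return polluter_benefit(e) - pollutee_damage(e)

# Search for the surplus-maximizing emissions level on a fine grid:
levels = [i / 1000 for i in range(0, 10001)]
e_star = max(levels, key=joint_surplus)

# Marginal benefit = marginal damage: 10 - 2e = 2 + e, so e* = 8/3,
# independent of who must bribe whom.
print(round(e_star, 3))
```

The assignment of the right determines only the transfer: with the right held by the pollutee, the polluter pays to emit up to e*; with it held by the polluter, the pollutee pays for reductions down to e*.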
4.2. No exclusion, large number of participants

The inefficiency in the small number of participants case arises out of the bilateral monopoly nature of the problem. It is well known that similar problems would arise in bilateral bargaining over private goods. As such, the problem has more to do with game theoretic considerations than with the public nature of the good. When the number of participants is large, however, this similarity disappears. It can be demonstrated that by sufficiently increasing the number of actors in a market for private goods, the outcome would tend towards efficiency via the mechanism of perfect competition. With a public good, however, the outcome will diverge further from the efficient outcome. In markets for private goods, when the number of participants is large, individual actors become price-takers. In public good markets, on the other hand, people become quantity-takers. Consequently, no agent feels that he/she can influence the amount of public good which is made available. The rational agent, therefore, will attempt to "free ride" on the public good supplies of others. The consequence will be gross underprovision of public goods and gross overprovision of public bads.

30 This incentive to free ride at the expense of others is discussed more thoroughly below.
31 If the externality is between two firms, a merger could eliminate the problem. Mergers, however, have costs of their own. See Coase (1937) and Williamson (1975).
Ch. 9: Theory of Public Goods
515
This result can be shown more rigorously by expanding the number of agents in the mosquito control example to N, a large number. Unlike the small numbers case, however, it is now rational for each agent to take the mosquito control activities of others as independent of his/her own choices. The result will be equation (70) and

\[
\frac{u_x^i}{u_y^i} \le 1, \qquad i = 1,\dots,N. \tag{72a}
\]
The inequality is written in recognition of the fact that some agents may choose to devote no resources to mosquito control. This would be the case if the cost of the public good is large relative to any individual's direct benefit. Assume that (72a) holds as an equality for K < N agents. Then we have, in equilibrium,

\[
\sum_{i=1}^{N} \frac{u_x^i}{u_y^i} \ge K, \tag{73}
\]
with the strict inequality holding if at least one of the remaining (N - K) individuals places positive value on the last unit of mosquito control. Just as it was inappropriate to measure the extent of market failure by an expression such as (73) in the small numbers case, it is inappropriate to use it for that purpose here. Even if K were zero there could be cases of serious underprovision. Thus, while no individual could afford a nuclear submarine, aggregate benefits could exceed aggregate costs by a large factor. The same would be true in a case of externality where one's direct marginal benefit from a good vastly exceeds the external marginal benefit of direct consumption by others. The important point is that the opportunity for bargaining which is present in the small numbers case virtually disappears when the number of actors is large. If bargaining costs are proportional to the number of agreements, they would increase with the square of the size of the bargaining group. It is unlikely, therefore, that we would obtain a level of public good which exceeds that indicated by (70) and (72a).
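The free-rider logic behind (72a) and (73) can be illustrated with a deliberately simple numerical sketch. The quasilinear utility u_i = a_i·ln(1 + G) + y_i, the unit cost of the public good, and all parameter values below are assumptions chosen for illustration, not from the text:

```python
# Illustrative sketch (hypothetical preferences, not from the chapter):
# quasilinear utility u_i = a_i*ln(1 + G) + y_i, unit cost of G equal to 1.

def nash_provision(a):
    # Each agent takes others' contributions as given, so contributing is
    # worthwhile only while a_i/(1 + G) >= 1; in equilibrium only the
    # keenest agent's first-order condition binds: G* = max(a_i) - 1.
    return max(max(a) - 1.0, 0.0)

def efficient_provision(a):
    # Samuelson condition: sum_i a_i/(1 + G) = 1  =>  G = sum(a_i) - 1.
    return max(sum(a) - 1.0, 0.0)

agents = [1.5] * 100                # 100 identical agents
print(nash_provision(agents))       # 0.5
print(efficient_provision(agents))  # 149.0
```

The gap between the two numbers grows linearly with the group size, matching the claim that underprovision becomes gross when the number of actors is large.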
4.3. Exclusion possible, large number of participants

We now consider the class of public goods for which exclusion is costless. This opens the possibility of market provision by private firms. If economies of scale are reached at low levels of output relative to the total output of public good, the number of firms who participate in the market will be large. For analytic convenience we assume that each firm produces one unit of the public good at
W.H. Oakland
cost c.32 The total revenues of a firm will be given by p · n, where p is the price charged each customer and n is the number of customers served. Assuming free entry, it follows that for every firm

\[
p = c/n. \tag{74}
\]
Thus, the price of a particular unit will vary inversely with the number of individuals consuming it. Now suppose that all consumers have the same demand function, D(p). Then D(c/N) units would be produced and each sold at the uniform price c/N, where N is the number of individuals in the economy. Lower priced units could not be sustained because they could not cover production costs. Higher priced units would not be available because people can obtain all they want at the prevailing price. Clearly, such an equilibrium is Pareto optimal, for we have N · D^{-1} = c, where D^{-1} is the inverse demand function and corresponds to the consumer's marginal valuation of the public good.

In general, however, consumers will not have identical demand functions. At the lowest possible price, c/N, firms will provide sufficient output so as to satisfy the individual with the lowest demand. Thus, individuals with excess demand at the price c/N will have to pay higher prices to obtain additional public good consumption. If individual demands at each price are distinct, then, at the price c/(N - 1), firms will produce sufficient output so as to satisfy the individual with the second lowest demand, etc. Under these circumstances, the market will provide for public goods at a continuum of prices between c/N and c. At c, only the person with the largest demand will be purchasing that firm's output. Moreover, it will only be this individual who consumes the entire industry output.33

The equilibrium described above is characterized by variation in the prices at which individual units of public good are sold. Note, however, that all consumers of a given unit pay the same price. Also, arbitrage cannot be used to eliminate these price differentials as it would do if the good were a private good. Those who purchase the higher priced units are consuming all lower priced units. Hence, purchase for resale would not be possible.
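The identical-consumers benchmark can be verified in a few lines. The linear demand D(p) = b - p and the numbers below are assumptions for illustration only:

```python
# Identical consumers, demand D(p) = b - p, unit cost c, N consumers.
# Each unit is shared by all N buyers, so free entry drives the price
# per consumer per unit down to c/N, as in equation (74).
c, b, N = 10.0, 2.0, 100

p = c / N          # sustainable price per consumer per unit
X = b - p          # output demanded at that price

# Samuelson check: the sum of marginal valuations, N*(b - X), equals c.
assert abs(N * (b - X) - c) < 1e-9
print(p, X)        # 0.1 1.9
```

Raising the price above c/N is unsustainable because entrants can serve all N consumers at c/N; lowering it fails to cover cost, which is exactly the zero-profit logic of (74).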
It is obvious that the equilibrium described above is not Pareto optimal. Efficiency requires that all consumers have access to all units which are produced. But some individuals, facing prices above c/N, choose not to consume the higher priced units. It can also be seen that the market produces sub-optimal levels of

32 In essence, the minimum average cost of production is reached at a level which we normalize to be equal to one unit.
33 For a more detailed discussion of this equilibrium, see Oakland (1974).
public good. For the last unit produced we have

\[
MRS^N = c, \tag{75}
\]

where MRS^N corresponds to the largest demander's marginal valuation. If, after allowing all individuals to expand, at no charge, their consumption to the level of industry output, some individual other than N places any positive benefit on additional consumption, we have, by virtue of (75),

\[
\sum_{i=1}^{N} MRS^i > c. \tag{76}
\]
Thus, unless all but the Nth individual are satiated by the equilibrium output of public good, it would be worthwhile to expand production. While the preceding characterization of market equilibrium has won some acceptance in the literature, there have been other formulations of the problem. One notable example has been set forth by Thompson (1968), who concludes that markets may actually overprovide for public goods. In his scheme, the preferences of individual consumers are perfectly known to firms and to other consumers. Each firm, therefore, will attempt to extract from each consumer his/her maximum willingness to pay for that unit. But since each firm also sees itself as the marginal supplier to any particular individual, it will recognize that charging a person his/her marginal valuation will lead to consumer surplus on units already purchased. It will, therefore, attempt to capture the latter source of benefit as well. The result is that each firm charges the individual his/her average valuation for its unit of public good. This leaves the consumer with no surplus whatever. In effect, the economy is no better off with public goods than without. In terms of the Samuelson condition we have

\[
\sum_{N} MRS^i < \sum_{N} AV^i = c, \tag{77}
\]
where AV^i is individual i's average valuation of the public good and is greater than marginal valuation because the latter is decreasing. While clearly a novel approach, its informational requirements strain the imagination. Indeed, it is precisely the problem of determining consumer preferences which gives rise to the efficiency problems associated with public goods. To assume it away is to render the question of private versus public provision moot. Efficiency could always be achieved by providing the public good in the government sector.
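The inequality in (77) rests on a one-line observation: if individual i's marginal valuation v^i(s) of the s-th unit is strictly decreasing, then his average valuation over the X units sold exceeds his marginal valuation at X,

```latex
AV^i \;=\; \frac{1}{X}\int_0^X v^i(s)\,ds \;>\; v^i(X) \;=\; MRS^i .
```

Summing over consumers, charges that extract the full average valuations (so that the AV^i sum to c) necessarily leave the sum of marginal valuations below c, which is the sense in which Thompson's firms extinguish the incentive at the margin.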
4.4. Exclusion possible, single seller

4.4.1. Monopoly seller - uniform price

Because of economies of scale or considerations of space, private provision of public goods will often involve monopolistic elements.34 Examples would be highways, airports, waterways, parks, zoos and museums. As long as the facility remains uncongested each meets the criteria of an excludable pure public good. Consider the case of a single seller of the public good. Depending upon the firm's information regarding the tastes of its clients it will choose a pricing strategy to maximize profits. To begin with, assume that the firm is constrained to charge a single, uniform price to each of its customers and, for simplicity, it produces under constant costs. The firm's problem is to maximize
\[
p \sum_{i=1}^{N} x_i - cX \tag{78}
\]

subject to

\[
x_i \le D_i(p), \tag{79}
\]

\[
x_i \le X, \tag{80}
\]

where x_i is the ith individual's consumption, D_i(p) is his demand function and X is the total production of the public good. Defining λ^i and μ^i to be the shadow prices of constraints (79) and (80), respectively, the first-order conditions are

\[
\sum_{i=1}^{N} x_i + \sum_{i=1}^{N} \lambda^i D_i'(p) = 0, \tag{81}
\]

\[
\sum_{i=1}^{N} \mu^i = c, \tag{82}
\]

\[
p = \lambda^i + \mu^i, \tag{83}
\]

where λ^i ≥ 0 and μ^i ≥ 0. Consumers can be classified into two mutually exclusive sets: those for whom x_i < X and hence μ^i = 0, and those for whom x_i < D_i(p) and hence λ^i = 0. Label the sets R̄ and R, respectively. If the sets were not mutually exclusive, we would
34 This case is discussed in Burns and Walsh (1981), Brennan and Walsh (1981) and Brito and Oakland (1980).
have, for some i, x_i < D_i(p) and x_i < X. But for such individuals, the firm could let x_i rise to the smaller of D_i(p) and X without changing its costs - thereby increasing profits.35 Thus we can rewrite our first-order conditions as

\[
\sum_{i=1}^{N} x_i + p \sum_{\bar{R}} D_i'(p) = 0, \tag{84}
\]

\[
n(R) p = c, \tag{85}
\]
where n(R) is the number of people in R. Equation (85) is the profit maximizing condition for X. Since only members of R utilize the entire output, only they are relevant to the output decision. Moreover, since they have excess demand at X, price does not have to be lowered to induce them to consume marginal increases in output. Similarly, (84) is the profit maximizing condition for price. For any given X, (84) determines the revenue maximizing price. Since only those in set R̄ are not capacity constrained, only they are sensitive to changes in price. Let us examine the efficiency implications of this solution. First, as long as
consumers are not identical, some are excluded from consuming total output. Second, we have by (85) and the nature of those belonging to R,
\[
\sum_{R} MRS^i > n(R) p = c. \tag{86}
\]
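Conditions (85) and (86) can be checked numerically. The sketch below grid-searches the uniform-price problem (78)-(80); the linear demands D_i(p) = b_i - p and all parameter values are assumptions chosen for illustration:

```python
import numpy as np

b = np.linspace(1.0, 2.0, 50)   # demand intercepts for 50 consumers
c = 5.0                          # cost per unit of the public good

def profit(p, X):
    # Each consumer buys min(D_i(p), X); revenue is p times total sales.
    x = np.minimum(np.maximum(b - p, 0.0), X)
    return p * x.sum() - c * X

grid_p = np.linspace(0.05, 2.0, 200)
grid_X = np.linspace(0.0, 2.0, 201)
best = max(((profit(p, X), p, X) for p in grid_p for X in grid_X))
pi_star, p_star, X_star = best

n_R = int((b - p_star > X_star).sum())   # capacity-constrained set R
print(p_star, X_star, n_R * p_star)      # n(R)*p should land near c, per (85)
```

At the computed optimum a minority of consumers are capacity constrained and n(R)·p sits close to c, while the remaining consumers stay on their demand curves, mirroring the R/R̄ split above.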
Thus, the monopolist produces too little public good. While the same deficiencies are present under competition, they are less severe. That is, there is less exclusion and more production under competition than under monopoly. While this proposition can be demonstrated for the general case, we confine our proof to the case where the public good is not an inferior good and where individuals can be uniquely ranked in terms of their intensity of demand for the public good.36 Clearly, those in set R consume more at lower prices under competition. For under monopoly they pay the uniform price c/n(R), whereas under competition the prices they pay range from c/N up to c/(N - k) = c/n(R), where k is the individual with the largest demand in the set R̄. Now consider those in R̄. Under competition they were not capacity constrained at the price c/n(R). If they had

35 It is also clear that neither set could be empty. For if R̄ were empty, (81) would be violated. If R were empty it would imply that all people were on their demand curve at X, which is inconsistent.
36 This means that for any price list p = f(x), individuals' notional demands maintain the same ranking.
excess demand at c/n(R) they were able to call forth additional production. Thus, they are better off. Moreover, since they demanded at least as much under competition at prices up to c/n(R), output is greater under competition.

4.4.2. Monopoly provision - price discrimination
If the firm has perfect information regarding consumer preferences, Thompson (1968) has shown that a price discriminating monopolist would provide public goods in an efficient manner. Each consumer would be allowed to consume the entire output at a charge which equals his total willingness to pay for it. Thus, there is no inefficiency arising from exclusion. Also, at the margin, each person would be charged his/her marginal valuation, thereby insuring that the Samuelson condition is met. As pointed out above, the assumption of perfect knowledge of preferences will rarely be encountered in reality. Consumers have considerable incentives to disguise their preferences because each is in a bilateral bargaining position with the firm. This caveat notwithstanding, the firm may often have information regarding the nature of the demand for its product which it can translate into additional profits. For example, if the firm has knowledge about the distribution of preferences for the public good but not those of any particular customer, it would find it worthwhile to charge average prices which depend upon the quantity purchased. This situation would be similar to the competitive solution where different units of the public good command different prices.37

To investigate this possibility we consider a world in which consumers have identical preferences but differ by income according to the continuous density function f(I). We further assume that the firm charges an average price schedule, R(x)/x, where R(x) is the total charge for x units and R'(x) is the price of the xth unit. One special case of R(x) would be px, the case considered earlier. The maximand of the monopolist would thus be

\[
\int_{I^*}^{I^{**}} R(x(I)) f(I)\, dI - cX, \tag{87}
\]

where I* and I** are the lower and upper bounds of income, respectively.
Confronted with this price schedule, each consumer maximizes utility, u(x, y), subject to the budget constraint R(x) + y = I, where the price of the private good is normalized to be unity. The consumer will behave such that
\[
R'(x) = \frac{u_x}{u_y}. \tag{88}
\]

37 The discussion here draws heavily on Brito and Oakland (1980).

That is, consumption of the public good is carried out to the point where the
consumer's marginal valuation is equal to marginal price. If we substitute for y from the budget constraint into the utility function and totally differentiate with respect to income, we find, by virtue of (88),

\[
\frac{du}{dI} = u_y(x(I), y(I)), \tag{89}
\]
where x(I) and y(I) are the Engel curves obtained by solving (88) together with the budget constraint. If aggregate income in the economy, Y = ∫ I f(I) dI, is taken to be fixed, the firm's maximand can be re-expressed as

\[
Y - \int_{I^*}^{I^{**}} y(I) f(I)\, dI - cX. \tag{90}
\]
In effect, the maximization of the firm's revenue is equivalent to minimizing aggregate consumption of the private good. The last step is to substitute for y in terms of u and x from the definition of the utility function, y = y(u, x). Thus the firm's problem is to maximize

\[
-\int_{I^*}^{I^{**}} y(x(I), u(I)) f(I)\, dI - cX \tag{91}
\]

subject to

\[
x(I) \le X \tag{92}
\]
and (89). Now this is a standard control problem with u as the state variable and x as the control variable. Its solution will yield the trajectories u(I) and x(I), which together with the utility function and the budget constraint can be used to construct R(x(I)), the profit maximizing price schedule. The mechanics of the solution will not be developed here.38 Instead, we shall summarize the results. First, little, in general, can be said about the shape of R(x) other than that it will not be equal to px. However, if the utility function is separable and if f(I) is non-decreasing, R'(x) will be everywhere decreasing - exactly the opposite of the competitive case. Second, at the optimum, we must have

\[
R'(X) n(X) = c, \tag{93}
\]
where n(X) is the number of people who consume the entire industry output. This is the same result as in the uniform pricing case.

38 A formal derivation can be found in Brito and Oakland (1980).

Since u_x/u_y > R'(X) for
members of n(X), it follows that the monopolist produces too little public good. If the public good is non-inferior, the monopolist will produce less than in the competitive case. Most striking, however, is the result that the monopolist's marginal prices will be somewhere above and nowhere below those obtained under competition. Thus, not only will production be higher but exclusion will be less under competition than for the monopolist. As with private goods, therefore, monopoly creates excess welfare losses.

5. Public sector provision of public goods

Since the economics of public sector decision-making is covered extensively in other chapters in this Handbook, only the highlights will be discussed here.39 Basically, the task confronting policy-makers is to insure that the Samuelson condition holds. This requires knowledge of how the private sector itself provides for public goods, how it will react to policy parameters, and information on costs and tastes associated with the public good. Obtaining this information is a formidable task - perhaps even unachievable. Nevertheless, even if policy-makers have complete information there are strong reasons to suspect that they will not always implement an efficient allocation. Public sector decision-makers may frequently have personal interests which conflict with an efficient allocation. Our discussion in this section treats these issues separately: (1) how to obtain the
information necessary to achieve efficiency; and (2) how public sector decision-makers behave.
5.1. Pigouvian taxes and subsidies

Before proceeding to these questions it is convenient to discuss an issue which arises in connection with externalities - the possibility of using corrective taxes or subsidies to remove inefficiencies arising from private behavior. Such an approach, if appropriate, is attractive because it minimizes the need for bureaucratic intervention into the resource allocation process, thus avoiding some of the potential wastes identified below. Moreover, this approach harnesses individual self-interest to achieve the desired outcome. Consider the conditions for Pareto efficiency when there is an external effect
from the consumption of commodity i by individual j:

\[
MRS_{ij}^{j} + \sum_{k \ne j} MRS_{ij}^{k} = MC_i, \tag{94}
\]

39 See, for example, Chapter 12 by Inman, Chapter 11 by Rubinfeld, Chapter 10 by Laffont, and Chapter 14 by Dreze and Stern in this Handbook.
Ch. 9: Theory of Public Goods
523
where MRS_{ij}^{k} is the marginal rate of substitution between the numeraire commodity and j's consumption of the ith good for the kth individual; MC_i is the marginal cost, in terms of the numeraire, of producing another unit of commodity i. According to Pigou (1947) economic efficiency can be achieved for such goods by placing a subsidy (tax) on the purchase of commodity i by individual j equal to the external benefits (costs) represented by the second term on the left-hand side of (94). This has the effect of lowering (increasing) the cost of an additional unit of commodity i by an amount equal to the marginal benefit (cost) conferred upon others. If the individual behaves so as to equate his/her direct marginal benefit of commodity i (i.e. the first term on the left-hand side of (94)) with his/her net marginal cost of procuring the unit, it follows that such a subsidy (tax) will result in satisfaction of the Samuelson condition (94).

There have been several objections to this Pigouvian prescription. First, it ignores the possibility that individuals may engage in private transactions to account for external benefits and costs (cf. the discussion of mosquito control above). That is, individuals who enjoy external benefits may offer their own bribes to the person who generates the benefits. If this is indeed the case, the Pigouvian subsidy would lead to oversupply of the public good; for the individual would be doubly compensated for the external benefits.40 This problem is unlikely to arise if the number of participants in the external benefits of an individual's consumption is large; for beneficiaries will attempt to free ride on one another. But if the number of participants is small, the Pigouvian approach is likely to be inappropriate - though there is also no guarantee that the private solution will prove optimal. An equally serious shortcoming of the Pigouvian prescription arises if externalities introduce non-convexities into the problem.
For then the solution to condition (94) may not be unique.41 If this is the case, the marginal adjustments introduced by the Pigouvian tax or subsidy may drive the economy away from the global optimum toward a local optimum. While some improvement is necessarily achieved by the adjustment, the entire approach diverts attention from the fact that a fundamentally different approach is necessary to achieve global efficiency. Finally, even if these problems do not arise, the reader should not be lulled into believing that the Pigouvian approach offers a "solution" to the underlying problem of efficiently providing public goods. For its implementation requires knowledge of the marginal external benefits or costs generated by each person's

40 This criticism was first made by Buchanan and Kafogolis (1963) although the argument is implicit in Coase's (1960) critique of the Pigouvian approach.
41 See Baumol (1972) and Starrett (1972) for a discussion of the problems caused by non-convexities. Typically the problem is treated within the context of production externalities. Baumol (1964), however, argues that the problem can also arise because of consumption externalities.
consumption. But if such information were available to policy-makers they could allocate the good directly - without having to resort to correcting market prices.42 Moreover, the determination of individual preferences is the major obstacle to the achievement of efficiency. There is nothing in the Pigouvian approach which reduces the scope of this information problem.
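The mechanics of the Pigouvian correction in (94) can be sketched numerically. The functional forms and numbers below are hypothetical, chosen only to make the first-order conditions solvable in closed form: individual j's direct benefit from x units of good i is a·ln(1 + x), a bystander receives e·ln(1 + x), and marginal cost is constant at mc.

```python
a, e, mc = 4.0, 2.0, 1.0       # illustrative benefit and cost parameters

x_private = a / mc - 1.0             # solves a/(1+x) = mc
x_efficient = (a + e) / mc - 1.0     # solves (a+e)/(1+x) = mc

# Pigouvian subsidy: marginal external benefit evaluated at the optimum.
s = e / (1.0 + x_efficient)
# Facing net price mc - s, the individual now chooses the efficient level.
x_subsidized = a / (mc - s) - 1.0
assert abs(x_subsidized - x_efficient) < 1e-9
print(x_private, x_efficient, s)     # 3.0 5.0 0.333...
```

Note that computing s already required knowing the external marginal benefit at the optimum, which is precisely the informational objection raised above.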
5.2. Information

Consider the simplest case of a pure public good for which exclusion is impossible and where costs of the public good are prohibitive relative to the resources available to any single agent. Thus, if the public good is to be provided at all it must be done through the public sector. Assuming that full information is available on costs of production, it remains to determine individuals' marginal valuations. Inspection of the efficiency conditions (2)-(6) suggests that the latter depend upon the distribution of welfare in the society. Hence, the efficient output of the public good is not unique, in general. This is not surprising since it also characterizes the efficient provision of private goods. Since the underlying income distribution can be modified by taxes and transfers we can take it as given for purposes of this discussion. Nevertheless, even with a given distribution of income the efficient output of the public good is not generally unique.43 The problem here is one of distributing the inframarginal surplus arising from the public good. This issue does not arise with respect to private goods because market institutions have a unique method of distributing the gains from trade. It involves confronting each individual with constant prices for goods he/she consumes or sells on factor markets. Since public goods are not sold, this convenient device is not available. An explicit decision must be made as to how to distribute the surplus. This will, in turn, affect individual marginal valuations for the public good and hence the efficient level of provision. It appears, therefore, that considerations of income distribution are hopelessly entwined with the problem of efficiency.

5.2.1. Voluntary exchange

Some economists have attempted to cope with this difficulty by adopting a convention on how inframarginal gains should be distributed. Specifically, they require that individuals be charged (taxed) an amount equal to their marginal
42 If individual consumers generate different marginal external benefits, the corrective tax or subsidy would vary across individuals, negating much of the simplicity of the Pigouvian approach.
43 See Aaron and McGuire (1969).
valuation times the level of public good supplied.44 This is equivalent to the method used for allocating surpluses arising from private goods, and is usually referred to as benefit taxation. Since any desired distribution of welfare can be achieved by lump-sum taxes and transfers, this convention separates income distribution issues from those of efficiency.45 This procedure implies that the determination of efficient public good levels simultaneously carries with it a method for financing the public goods. As we shall demonstrate momentarily, it is this connection that renders the approach non-operational. But first note that the convention requires that the policy-maker discover the Marshallian demand curve of each individual for the public good. This demand curve shows the amount of public good desired at each constant tax price. Each point on this demand curve corresponds to the individual's marginal valuation for that level of public good supply. To determine the efficient level of public good, the policy-maker needs only to sum these demand curves vertically and find where the resultant curve intersects the marginal cost schedule for the public good.46 The vertical summation represents the aggregate marginal valuation for each particular level of public good. Since individual tastes and income vary, the marginal valuations of individuals will generally differ at the optimum. This means that each individual's tax price will vary - the highest price charged to the individual with the greatest marginal valuation. It must be emphasized that the preceding calculations cannot be considered to be a conceptual exercise with actual taxes determined by some other method. For the demand curves are valid indicators of marginal valuations only if taxes are indeed collected according to marginal benefits received.47 Thus, individuals must actually be confronted with such tax prices when the policy-maker obtains the data on individual demand curves.
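A minimal sketch of the vertical-summation calculation, using hypothetical preferences u_i = a_i·ln(1 + G) + y_i and a constant marginal cost c chosen purely for illustration:

```python
a = [1.0, 2.0, 3.0]   # taste parameters for three individuals
c = 2.0               # constant marginal cost of the public good

# Vertical summation: marginal valuations a_i/(1+G) must sum to c.
G_star = sum(a) / c - 1.0
tax_prices = [ai / (1.0 + G_star) for ai in a]   # benefit-tax (Lindahl) prices

assert abs(sum(tax_prices) - c) < 1e-9   # prices exhaust marginal cost
print(G_star, tax_prices)
```

Each individual faces a personalized price equal to his own marginal valuation at G*, which is exactly why truthful reporting cannot be expected once taxes are tied to the reported demand curves.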
We can now see why this approach breaks down. Because the policy-maker is not omniscient the demand curve will have to be supplied by the individual. The threat of exclusion, however, cannot be used to obtain the demand data; hence, the individual will have to supply it voluntarily. If the number of individuals is large, an individual's tax payment will be an insignificant fraction of total taxes. But since total taxes determine the level of the public good supplied, it follows that the individual's response will have little or no impact on output. In this circumstance the rational individual will try to convince the government that his demand curve is identically zero, so as to minimize his tax payments. Thus,

44 This is the so-called Lindahl approach. See Johansen (1963).
45 However, if lump-sum taxes are not feasible and only distortionary taxes are open to the policy-maker, the dichotomization of distribution and efficiency is invalid. See Diamond (1974).
46 Strictly speaking, this procedure is correct only if the public good is produced under conditions of constant costs. Otherwise, other taxes or subsidies will be required which will affect the underlying demand curves.
47 An exception would arise if individual demands were independent of income.
individuals will not voluntarily reveal their true preferences to decision-makers. They will instead try to become free riders at the expense of others.

5.2.2. Incentive compatible mechanisms
The problem of inducing people to reveal their preferences for public goods stems from the association of their stated demand curves with the taxes they are asked to pay.48 Some economists have suggested that honest revelation can be induced by making a person's tax payment approximately independent of his stated preferences. Basically, an individual would be asked to pay a two-part tax.49 The first part of the tax is used to cover the cost of the public good and is set in advance. Specifically, each person would be preassigned some share of the cost of the public good. The second part of the tax would depend upon how much an individual's revealed preferences would change the outcome. If the individual reports a demand curve which is identically zero, the second tax would be zero. If, on the other hand, the individual reported preferences which caused an increase in the supply of the public good, he would be liable for the additional cost, less the willingness of others to pay for the additional units.50 The latter is measured by the area under the reported demand curves of others. In such circumstances, individuals would be induced to reveal their true demand curves irrespective of whether they believed others are telling the truth. Hence, all individuals are induced to truthfully reveal their demand curves and efficiency would appear to be in reach. There have been a number of criticisms of this and related mechanisms.51 Some have argued that the proceeds of the second tax must be wasted if the mechanism is to work. Others have suggested that the mechanisms are vulnerable to the formation of coalitions which could exploit them. Moreover, the mechanisms produce valid measures of marginal valuations only if individual demand curves are independent of income.52 The most serious shortcoming arises in the large number case. Here the chosen level of public good is approximately

48 The late Leif Johansen (1977) argues that the importance of this problem is overstated. This seems, however, to be an act of faith.
49 See Tideman and Tullock (1976), Groves and Ledyard (1977), and Green and Laffont (1979) for examples of such mechanisms. Only simple mechanisms are discussed here. For a thorough discussion of this literature see Chapter 10 by Laffont in this Handbook.
50 This is the so-called "Clarke" tax, as discussed in Tideman and Tullock (1976). The Groves-Ledyard (1977) variant is somewhat different although it carries a similar content. See Fane and Sieper (1983).
51 See, for example, the Special Supplement to the Spring 1977 issue of Public Choice, 27-2.
52 The Clarke mechanism is particularly sensitive to this point. The individual demand schedules are based upon the premise that the agent actually pays the tax indicated by his tax share. However, under the mechanism, tax shares are preassigned and fixed. Consequently, the revealed demand curves only indicate the agent's marginal valuation if the demand for the public good is independent of income - a highly dubious condition.
independent of the individual's response, i.e. the aggregate outcome is approximately the same whether or not the individual refuses to participate by reporting no demand whatever.53 Since the determination of one's own demand curve requires considerable thought, many people would simply not participate or would seriously understate their demand.54 Consequently, this mechanism is of little practical use when the number of taxpayers is large. And if the number is small it is subject to the waste, coalition and income effect criticisms mentioned above.

5.2.3. Voting
The ballot box is the principal method employed by democratic societies to elicit citizens' demands for public services.55 Wicksell (1896) has shown that, under certain idealized circumstances, the voting mechanism can generate an efficient supply of public goods. He demonstrated that if public good expenditures and tax shares were decided simultaneously, a unanimity voting rule would lead to efficiency. Assuming voters vote in favor of any program which increases their benefits by no less than their taxes, only programs which offer a Pareto improvement can be passed. As long as public expenditures are inefficient it will be possible to formulate a new program which could command unanimous consent. If voting is costless, this process would continue until Pareto optimality is achieved. Unfortunately, the conditions necessary to make a unanimity rule viable are not met in practice. For one thing, voting is not costless. The probability that an alternative tax-spending proposal does not harm anyone is virtually zero once some minimal level of public service is provided. Thus, the unanimity rule would favor the status quo, at sub-optimal levels of provision. Also, it is likely that some individuals will behave strategically, vetoing proposals that benefit them in the hope that some superior alternative will be proposed. Thus, the Wicksellian unanimity rule has little practical value. It does, however, offer insights into why alternative voting rules tend to be ineffective. For with any plurality short of unanimity, proposals could be adopted which redistribute income. Since voters cannot distinguish efficiency gains from redistribution gains, non-unanimous voting rules hopelessly entangle efficiency and redistributional considerations. It is not surprising, therefore, that voting has serious shortcomings as a preference revelation mechanism.
Indeed, analysts have

53 In the context of the Groves-Ledyard mechanism, failure to participate would be manifested by asking for no change from what others demand.
54 This point has not received the attention in the literature it deserves. The mechanisms require sophisticated behavior on the part of the agents when formulating their own choices, but assume that taxpayers are naive concerning how their response will actually affect the outcome.
55 For an excellent survey of the voting mechanism, see Chapter 12 by Inman in this Handbook. Also see Mueller (1979).
528
W.H. Oakland
generally concluded that most voting processes are incapable of determining an equilibrium outcome - i.e. one that cannot be defeated by an alternative. Some analysts have attempted to overcome this non-existence problem by assuming that tax shares are fixed - say, by the constitution. In such cases, voters' preferences for a public good will usually be single peaked, i.e. there is a unique optimum from the viewpoint of each taxpayer.56 If the choice of public good supply is made by majority rule, then the outcome will be the personal optimum of the individual with the median preferences - i.e. the individual whose preferred outcome is considered "too high" by half the voters and "too low" by the other half. In general, it is not possible to determine whether such an outcome yields an oversupply or undersupply of the public good from an efficiency perspective. It will also only be by accident, with almost zero probability, that the outcome is efficient. For example, given the tax rule, we may have 51 percent of the voters enjoying a net benefit of one penny with the other 49 percent suffering negative benefits of fifty dollars apiece - an obvious case of oversupply. Or the figures could be reversed, with undersupply being the case. These examples underscore the fundamental difficulty of the voting rule - people are unable to express the intensity of their preferences. Consequently, the outcome may be dominated by distributional considerations. Intensity of preference can be introduced into the voting mechanism by explicit or implicit vote trading. The former is often referred to as log-rolling, while the latter occurs with a multi-issue ballot. Here people can trade their support for public goods in which they have only a marginal interest for support for goods in which they have a strong interest. Unfortunately, such mechanisms reintroduce the likelihood that an equilibrium will not exist.
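The intensity problem in this paragraph can be made concrete with a short illustration (the code and helper names are ours; the dollar figures are the ones in the text):

```python
# Illustrative sketch: majority voting selects the median voter's peak but
# cannot register the intensity of preferences.

def majority_outcome(peaks):
    """Under single-peaked preferences, pairwise majority voting over
    levels of the public good selects the median voter's peak."""
    s = sorted(peaks)
    return s[len(s) // 2]  # assumes an odd number of voters

# The text's example: 51 voters gain one penny, 49 voters lose $50 apiece.
net_benefits = [0.01] * 51 + [-50.0] * 49
votes_for = sum(1 for b in net_benefits if b >= 0)
passes = votes_for > len(net_benefits) / 2
total = round(sum(net_benefits), 2)

print(passes)  # the program commands a majority...
print(total)   # ...even though aggregate net benefit is large and negative
```

The assertion-free vote count passes while the aggregate net benefit is roughly -2449 dollars: a majority rule cannot distinguish a penny's worth of support from fifty dollars' worth of opposition.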
To a large measure this reflects their inability to eliminate distributional considerations from the decision mechanism. We can only conclude that voting is a highly suspect preference revelation mechanism, which under plausible conditions may be counter-productive.

5.2.4. Cost-benefit analysis57

Although people do not have incentives to voluntarily reveal their preferences, they may occasionally do so unconsciously through their behavior in markets for private goods. This would be the case if the public good is a close substitute in consumption for some private good traded in the market. For example, the benefit people derive from fire protection might be gleaned from their behavior with respect to fire insurance and fire protection materials. Or the benefits of pollution control may be estimated by comparing land values of plots subject to

56 All that is required is that consumers' preference orderings be quasi-concave.
57 For an excellent survey of this technique see Chapter 14 by Drèze and Stern in this Handbook. Also see Layard (1972) and Mishan (1976).
Ch. 9: Theory of Public Goods
529
different degrees of pollution. Another important example has to do with public services which affect the time at the disposal of an individual. Thus, a highway improvement which reduces travel time may be valued by observing how an individual chooses among transportation modes or routes which offer different combinations of time and out-of-pocket costs.58 The most important application of this technique, however, has to do with public intermediate goods which are used in the production of private goods. Here the relevant preference information resides in a production function rather than a utility function. As such, the relevant information is objective rather than subjective. The government can obtain the services of the appropriate technical personnel to determine the productivity of public good inputs. The value of the service is obtained by valuing the additional output at the observed market prices of the private goods. Cost-benefit analysis, then, can be a useful technique for ascertaining preferences when the public good has a close substitute in consumption or is an input into the production of private goods. For such goods as defense, however, it can be of little help, meaning we must rely on the highly imperfect voting mechanism for obtaining preferences.

5.2.5. The Tiebout mechanism
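A minimal sketch of this valuation logic, under an assumed Cobb-Douglas technology and made-up numbers (nothing below comes from the chapter itself):

```python
# Illustrative sketch: value a public intermediate good by its marginal
# product in private production, weighted by the market price of output.

def private_output(public_input, labor):
    # Hypothetical Cobb-Douglas technology using a public input
    # (e.g. road services) alongside labor.
    return 10.0 * (public_input ** 0.3) * (labor ** 0.7)

def marginal_value(public_input, labor, price, eps=1e-6):
    # Value of one more unit of the public input: its marginal product
    # (estimated by a finite difference) times the output price.
    mp = (private_output(public_input + eps, labor)
          - private_output(public_input, labor)) / eps
    return price * mp

v = marginal_value(public_input=8.0, labor=100.0, price=2.0)
```

Because the information sits in a production function, no agent's subjective report is needed: the numbers are, in principle, verifiable by technical personnel.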
If the public good has a spatial dimension it may be possible to use taxpayer mobility as a means of revealing preferences.59 For if communities differ in their supply of public goods and taxes, individuals can locate in the community which most closely matches their tastes. Under certain idealized conditions, such mobility can lead to efficient levels of spatial public goods. These conditions include those developed earlier for local public goods; that is, there must be a large number of communities for each taste class. In addition, it may be necessary to restrict individual mobility so that individuals do not base their locational choice upon distributional considerations. This would be the case if taxes were collected on some base which is positively correlated with income, such as the property tax. For then it might be attractive for low income individuals to migrate to communities with high average income: for a given level of services, property tax rates would be lower in richer communities. Hence, some sort of mobility restriction, such as zoning, may be necessary to achieve efficiency.60

58 Bradford and Hildebrandt (1977) rigorously developed conditions under which it is possible to determine preferences for public goods from observed market demand functions for complementary private goods.
59 See Chapter 11 by Rubinfeld in this Handbook.
60 Some analysts have argued that richer communities would provide a higher level of services. Thus, taxes on the poor may not be lower in the rich communities. The issue, then, is whether the poor are willing to pay the higher taxes for the greater services. However, if the tastes of the rich are diverse, there would be rich communities which offered lower taxes and higher services to the poor resident. See Westhoff (1979) and Epple, Filimon and Romer (1984).
While the Tiebout mechanism has attracted wide attention among economists its assumptions are so restrictive as to cast doubt on its validity. Recall it has not yet been possible to determine conditions for efficiency when the number of individuals of given tastes is not large relative to optimal community size. Without such conditions it is impossible to judge the efficiency of mobility models. Moreover, in reality the structure of local governments does not fit well with the Tiebout assumptions. The existence of large central cities and restrictions on mobility due to discrimination raise serious questions about the relevance of the Tiebout mechanism.
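The sorting logic behind the Tiebout mechanism can be sketched as follows (community bundles and tastes are hypothetical; distance stands in for any single-peaked utility):

```python
# Illustrative sketch of Tiebout sorting: with enough communities on offer,
# each household locates where the public-good level is closest to its own
# preferred level, so communities become internally homogeneous.

communities = {"A": 2.0, "B": 5.0, "C": 9.0}   # public-good levels on offer

def choose(taste):
    # Single-peaked preferences: utility falls with the distance between
    # the community's service level and the household's preferred level.
    return min(communities, key=lambda name: abs(communities[name] - taste))

tastes = [1.8, 2.1, 4.9, 5.3, 8.7, 9.2]
assignment = {t: choose(t) for t in tastes}
```

With the three hypothetical communities above, each taste class ends up in the community nearest its ideal point; the chapter's caveats (few communities, mobility restrictions, fiscal incentives to chase rich neighbors) are exactly what this sketch abstracts from.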
5.3. Behavior of public officials

Efficient public goods supply may not be achieved even if decision-makers have full access to consumer tastes. Public officials cannot be supposed to ignore their own economic self-interests when formulating policy. They have strong incentives not only to retain their positions but to upgrade them, thereby augmenting their salaries and authority. In part this reflects the quasi-rent characteristics of an occupation - once selected, occupations are difficult to change without a loss of earnings. It also reflects the opportunity open to a policy-maker to augment his/her income prospects once out of office by using the power of the office to confer rents upon particular constituents. For many officials this augmentation occurs directly through expanding the size of their organization, because salary or prestige and organization size often go hand in hand. Two examples will illustrate the problems which can arise here. The first is Niskanen's (1971) model of the monopoly bureau. The second is Rosenthal and Romer's (1978) agenda-setting model. We will treat them in turn.61 Niskanen's model begins with the observation that a particular government bureau is often the sole provider of a public good. Because it is in the public sector, however, the bureau's objective is not to maximize profits but to maximize output, thereby giving the bureau's managers greater economic opportunity, power and prestige. Niskanen assumes that the sponsoring agency (e.g. the legislature) does not have knowledge of the cost function for the public good. Hence, it can only monitor the bureau's output and its budget. On the other hand, the bureau is assumed to know perfectly the benefits that the sponsor derives from the public good. Under the best circumstances these benefits would be an accurate assessment of the taxpayers' true benefits from the public good. It is assumed that the bureau makes an all-or-nothing proposal to the sponsor. This

61 These models are also discussed by Inman in Chapter 12 of this Handbook.
offer will be that which maximizes the output of the public good, X, subject to the constraint that the sponsor's payment cover the cost of production, C(X).

∫ u^i(g(m*1(θ^1), ..., m*(i-1)(θ^(i-1)), m*i(θ^i), m*(i+1)(θ^(i+1)), ..., m*I(θ^I)), θ^i) φ(θ^(-i) | θ^i) dθ^(-i)
≥ ∫ u^i(g(m*1(θ^1), ..., m*(i-1)(θ^(i-1)), m^i, m*(i+1)(θ^(i+1)), ..., m*I(θ^I)), θ^i) φ(θ^(-i) | θ^i) dθ^(-i),   ∀ m^i ∈ M^i.
541
Ch. 10: Allocation of Public Goods
A Bayesian-Nash equilibrium is a Nash equilibrium in response functions m*1(·), ..., m*I(·) such that, if each agent i believes that the others are using these response functions, he should himself use m*i(·): his expected utility from the message m*i(θ^i) is at least as great as his expected utility from any other message.

2.3. Maximin equilibrium: c = m

An I-tuple of messages (m*1, ..., m*I) is a maximin equilibrium at θ iff ∀ i ∈ I,

min_{m^(-i) ∈ M^(-i)} u^i(g(m*i, m^(-i)), θ^i) ≥ min_{m^(-i) ∈ M^(-i)} u^i(g(m^i, m^(-i)), θ^i),   ∀ m^i ∈ M^i.
For each agent i, m*i is the best strategy if he envisions the worst strategies, from his point of view, by the others.

2.4. Nash equilibrium: c = n

An I-tuple of messages (m*1, ..., m*I) is a Nash equilibrium at θ iff ∀ i ∈ I,

g(m*i, m*(-i)) R^i(θ^i) g(m^i, m*(-i)),   ∀ m^i ∈ M^i.
Implementation in Nash equilibria is a very different notion of implementation than the other three concepts. For any of the other concepts, despite incomplete information, each agent is able to obtain his best strategy by pure thinking, in a completely decentralized way, even though each notion corresponds to a different attitude. On the other hand, a Nash equilibrium can be arrived at only with a dynamic procedure, because of incomplete information.3 Then, the question of manipulation along the procedure arises. Implementation in Nash equilibria can be viewed as a weakening of incentives requirements as follows. Assume that there is a dynamic procedure; at date t of the procedure agent i is told the answer of date t - 1 and believes that this is the final step and that the others will not change their answers. Under this myopic behavior, if the procedure converges, the outcome corresponds to Nash implementation.

3 An alternative interpretation is that when the game is played all agents have all the information. This interpretation is not, in general, appropriate in public economics.
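The myopic dynamic just described can be sketched with hypothetical quadratic payoffs (the best-response functions below are ours, not the chapter's):

```python
# Illustrative sketch: a dynamic procedure in which each agent, at each
# date, myopically best-responds to the previous date's answers. If the
# procedure converges, its rest point is a Nash equilibrium.

def br1(b):  # agent 1's best response to agent 2's message
    return 0.5 * b + 2.0

def br2(a):  # agent 2's best response to agent 1's message
    return 0.5 * a + 1.0

a, b = 0.0, 0.0
for _ in range(100):            # date t: respond to the date t-1 answers
    a, b = br1(b), br2(a)

# At the rest point each message is a best response to the other's message.
assert abs(a - br1(b)) < 1e-6 and abs(b - br2(a)) < 1e-6
```

Convergence here is guaranteed by the contraction in the best-response slopes (0.5); in general a procedure need not converge, which is exactly the caveat in the text.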
542
J.-J. Laffont
The above discussion stresses the need for, among other things, myopic behavior. Why not adopt myopic behavior as an appropriate weakening of incentives requirements, but keep one of the other decentralized solution concepts discussed above? To formalize this idea we consider economies where agents have C¹ utility functions defined on R^L, u^i(x^i), where L is the dimensionality of the commodity space. Let U^i be the space of admissible utility functions for agent i. Mount and Hurwicz's diagram is here

[Figure: the commutative diagram - the profile space U^1 × ··· × U^I is mapped to the outcome space A directly by the social choice rule, and indirectly through the message space M^1 × ··· × M^I and the outcome function g.]

We consider continuous-time procedures; agent i's message is a function m^i(t), t ≥ 0, and M^i(·) is an admissible space of such functions. Myopic behavior consists in choosing ṁ^i(t) at instant t in a way which maximizes the increase of utility
dU^i/dt = Σ_{l=1}^{L} (∂u^i/∂x_l^i)(x^i(t)) · (dx_l^i/dt)(m^i(t), m^(-i)(t)).
The outcome function associated with each I-tuple of messages (m^1(t), ..., m^I(t)), t ≥ 0, is the solution of the differential system

dx_l^i/dt = G_l^i(m^i(t), m^(-i)(t)),   i = 1, ..., I,  l = 1, ..., L.

A procedure is an I-tuple of message spaces M^i(t), t ≥ 0, and a set of outcome revision functions G_l^i(·), i = 1, ..., I, l = 1, ..., L. Note that at each instant the objective function of an agent is characterized by his marginal rates of substitution, for example all expressed with respect to commodity one. The local utility function of the agent is linear in these characteristics. So, at each instant the procedure defines a mechanism for linear utility functions. We refer to the induced local mechanism.
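In the spirit of these outcome revision rules, a minimal Euler-discretized sketch (the quasi-linear utilities, cost and numbers are assumptions of ours, not the chapter's):

```python
# Illustrative sketch of a continuous-time procedure, Euler-discretized:
# each agent reports his marginal rate of substitution for a public good y,
# and the planner revises y in the direction of the sum of reported MRS
# minus marginal cost.

thetas = [3.0, 2.0, 1.0]       # hypothetical taste parameters
c = 2.0                         # constant marginal cost of y

def mrs(theta, y):
    # With quasi-linear utility theta*ln(y) + x, the MRS is theta / y.
    return theta / y

y, dt = 0.5, 0.01
for _ in range(5000):
    messages = [mrs(th, y) for th in thetas]   # truthful local messages
    y += dt * (sum(messages) - c)              # outcome revision rule

# Rest point: the Samuelson condition sum of MRS = marginal cost,
# here y* = (3 + 2 + 1) / 2 = 3.
```

Along the path each truthful agent's utility rises, which is the sense in which the induced local mechanism works with linear (MRS-parameterized) local utilities.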
2.5. Local dominant equilibrium

A set of messages (m*1(t), ..., m*I(t)), t ≥ 0, is a local dominant equilibrium iff ∀ i ∈ I, ∀ t ≥ 0,

Σ_{l=1}^{L} (∂u^i/∂x_l^i)(x*^i(t)) · (dx_l^i/dt)(m*^i(t), m^(-i)(t)) ≥ Σ_{l=1}^{L} (∂u^i/∂x_l^i)(x*^i(t)) · (dx_l^i/dt)(m^i(t), m^(-i)(t)),
∀ m^i(t) ∈ M^i(t),  ∀ m^(-i)(t) ∈ M^(-i)(t),

where

(dx_l^i/dt)(m^i(t), m^(-i)(t)) = G_l^i(m^i(t), m^(-i)(t)),   l = 1, ..., L.
A local Bayesian-Nash equilibrium and a local maximin equilibrium can be defined in a similar way. In this Chapter we will use the equilibrium concepts defined above and their local versions. Incentives theory is greatly simplified by the revelation principle. A mechanism is direct if agent i's message space M^i coincides with his space of characteristics Θ^i, i = 1, ..., I. A direct mechanism is revealing if

θ ∈ E_g^c(θ),   ∀ θ ∈ Θ,

i.e. if the true characteristics are equilibrium messages. A procedure is direct if, at each instant, the induced local mechanism is direct. A direct procedure is revealing if at each instant the induced local mechanism is revealing. We state only the revelation principle for implementation in dominant strategies. It extends basically to all decentralized solution concepts [see Barbera (1982) and Dasgupta, Hammond and Maskin (1979)].
Theorem 2.1. Gibbard's Revelation Principle (1973)

Let (M, g) be a mechanism which implements the social choice function f(·) in dominant equilibrium. Then there exists a direct revealing mechanism (Θ, ĝ) which implements f(·) in dominant equilibrium.

Proof

Let (m*1, ..., m*I) be an I-tuple of equilibrium messages for (M, g) when the agents' characteristics are respectively θ^1, ..., θ^I. Define, for every θ ∈ Θ,

ĝ(θ) = g(E_g^d(θ)).

By definition, ĝ(θ) = f(θ); (Θ, ĝ) is a direct mechanism. Suppose it is not revealing. Then there exist an agent i and θ̂^i ∈ Θ^i such that

ĝ(θ̂^i, θ^(-i)) P^i(θ^i) ĝ(θ^i, θ^(-i)),   (2.1)

where P^i(θ^i) is the relation of strict preference associated with R^i(θ^i). From the definitions, (2.1) can be rewritten as

g(E_g^d(θ̂^i, θ^(-i))) P^i(θ^i) g(E_g^d(θ^i, θ^(-i))).   (2.2)

By assumption,

(m*1, ..., m*i, ..., m*I) ∈ E_g^d(θ^i, θ^(-i)).   (2.3)

Expression (2.2) tells us that there exists

(m*1, ..., m*(i-1), m̂^i, m*(i+1), ..., m*I) ∈ E_g^d(θ̂^i, θ^(-i))

such that

g(m*1, ..., m*(i-1), m̂^i, m*(i+1), ..., m*I) P^i(θ^i) g(m*1, ..., m*i, ..., m*I),

which contradicts (2.3).   Q.E.D.
Therefore, all the social choice functions which can be implemented in dominant equilibria by complex mechanisms can also be implemented by direct revealing mechanisms; such functions are said to be self-implementable. We restrict the analysis to these mechanisms.
Finally, let us note that the above definitions can be extended to a social choice correspondence F: Θ → A. Implementation then means that there exist (M, g) and c such that, ∀ θ ∈ Θ,

g(E_g^c(θ)) ≠ ∅

and

g(E_g^c(θ)) ⊂ F(θ).
To summarize, in a world of decentralized information a decision-maker is constrained to use social choice functions which can be implemented, i.e. which are compatible with individual incentives. A good appraisal of the induced limitations of government intervention is fundamental for public economics.
3. The Gibbard-Satterthwaite theorem

The first point to understand is that if no restriction at all on agents' characteristics is available, implementation in dominant strategies is essentially unreachable. This is formalized by the Gibbard-Satterthwaite theorem. Let Σ(A) be (to simplify the proof) the set of all total orderings (indifference is not allowed) P^i admissible for agent i, i = 1, ..., I; we say that the property of universal domain (UD) is satisfied. A social choice function (SCF) here is a mapping from [Σ(A)]^I into A. If a social choice function f has a range A' ⊂ A, then f is called a SCF with range A'. A social choice function with range A' is dictatorial if there exists an agent whose favorite announced alternative in A' is always the social choice: there exists i ∈ {1, ..., I} such that for any profile P = (P^1, ..., P^I) ∈ [Σ(A)]^I and any a ∈ A' such that a ≠ f(P), f(P) P^i a. We have:

Theorem 3.1. The Gibbard (1973)-Satterthwaite (1975) impossibility theorem

If A' has at least three elements, a social choice function with range A' which satisfies universal domain and is implementable in dominant equilibria is dictatorial.

Proof

From Theorem 2.1 we can restrict the analysis to direct revealing mechanisms. The first lemma shows that if an agent can by himself change the social choice, truth cannot be a dominant strategy.4

4 This proof is due to Schmeidler and Sonnenschein.
Lemma 3.1

If there exists i ∈ {1, ..., I} such that

f(P^1, ..., P^i, ..., P^I) = a_1,
f(P^1, ..., P'^i, ..., P^I) = a_2,   a_1 ≠ a_2,

and if a_2 P^i a_1 or a_1 P'^i a_2, then the truth is not a dominant strategy at either (P^1, ..., P^i, ..., P^I) or (P^1, ..., P'^i, ..., P^I).

Proof

If a_2 P^i a_1, then f(P^1, ..., P'^i, ..., P^I) P^i f(P^1, ..., P^i, ..., P^I); so if agent i has preferences P^i, it is in his interest to announce P'^i. If a_1 P'^i a_2, then f(P^1, ..., P^i, ..., P^I) P'^i f(P^1, ..., P'^i, ..., P^I); so if agent i has preferences P'^i, it is in his interest to announce P^i.   Q.E.D.

Lemma 3.2

If f has range A' and if B ⊂ A', and if, for a profile P = (P^1, ..., P^I), a_1 P^i a_2 for every i, every a_1 ∈ B and every a_2 ∈ A with a_2 ∉ B, then f(P^1, ..., P^I) ∈ B.

Proof

Suppose, on the contrary, that f(P^1, ..., P^I) = a_2 ∉ B. Since B ⊂ A', there exists (P'^1, ..., P'^I) ∈ [Σ(A)]^I such that f(P'^1, ..., P'^I) = a_1 ∈ B. Consider then the sequence of social states a^i, i = 0, ..., I, defined by

a^i = f(P'^1, ..., P'^i, P^(i+1), ..., P^I),

and let j be the first integer such that a^j ∈ B. Then we have

f(P'^1, ..., P'^j, P^(j+1), ..., P^I) = a^j ∈ B,
f(P'^1, ..., P'^(j-1), P^j, ..., P^I) = a^(j-1) ∉ B,
and a^j P^j a^(j-1), since a^j ∈ B and a^(j-1) ∉ B. Then, from Lemma 3.1, the truth is not a dominant strategy for agent j if he has preferences P^j.   Q.E.D.

This lemma shows in particular that f(·) must select in A' a Pareto optimal outcome. The idea of the proof is then to construct, from the social choice function f, a social welfare function F which satisfies the assumptions of Arrow's impossibility theorem. First, let us see how an ordering on A' is constructed from f for each profile. For any pair (a_1, a_2) a modified profile (P̄^1, ..., P̄^I) is constructed as follows: P̄^i is constructed from P^i by moving a_1 and a_2 up to the top of the ordering, keeping the ranking defined by P^i on {a_1, a_2} and on {a ∈ A : a ≠ a_1, a ≠ a_2}. Let us denote by t_{a_1 a_2}(P^i) the ordering P̄^i obtained in this way. From Lemma 3.2, we know that either a_1 = f(P̄^1, ..., P̄^I) or a_2 = f(P̄^1, ..., P̄^I). Then, we define

a_1 F(P) a_2, if a_1 = f(P̄^1, ..., P̄^I),
a_2 F(P) a_1, if a_2 = f(P̄^1, ..., P̄^I).
This construction can be repeated for any pair of alternatives in A'; it yields, for the profile P, a binary relation on A'. Repeated for any profile P in [Σ(A)]^I, it gives a potential social welfare function F which associates to any profile a binary relation. It remains to show that this binary relation is indeed an ordering, and that F satisfies all the assumptions of Arrow's theorem: universal domain, Pareto optimality and independence of irrelevant alternatives. By definition, F satisfies universal domain. Pareto optimality on A' follows immediately from Lemma 3.2. If F did not satisfy independence of irrelevant alternatives, there would exist two profiles, P and P', and two alternatives, a_1 ∈ A' and a_2 ∈ A', such that

a_1 P^i a_2  ⟺  a_1 P'^i a_2,   ∀ i ∈ I,

and (for example)

a_1 F(P) a_2  and  a_2 F(P') a_1,
which means

a_1 = f(t_{a_1 a_2}(P^1), ..., t_{a_1 a_2}(P^I)),
a_2 = f(t_{a_1 a_2}(P'^1), ..., t_{a_1 a_2}(P'^I)).

As in Lemma 3.2, consider the sequence of alternatives

a^i = f(t_{a_1 a_2}(P'^1), ..., t_{a_1 a_2}(P'^i), t_{a_1 a_2}(P^(i+1)), ..., t_{a_1 a_2}(P^I)),   i = 0, ..., I,

with a^0 = a_1, a^I = a_2 and, from Lemma 3.2, a^i ∈ {a_1, a_2} for every i.
The proof ends as in Lemma 3.2. Clearly, for any P, F(P) is total on A' and anti-symmetric (since f is single valued). It remains to show that F(P) is transitive. Suppose it is not. Then there exist (P^1, ..., P^I) and a_1, a_2, a_3 in A' such that

a_1 = f(t_{a_1 a_2}(P^1), ..., t_{a_1 a_2}(P^I)),   i.e.  a_1 F(P) a_2,
a_2 = f(t_{a_2 a_3}(P^1), ..., t_{a_2 a_3}(P^I)),   i.e.  a_2 F(P) a_3,
a_3 = f(t_{a_3 a_1}(P^1), ..., t_{a_3 a_1}(P^I)),   i.e.  a_3 F(P) a_1.
Let (P̄^1, ..., P̄^I) be the profile obtained by moving (a_1, a_2, a_3) up to the top of each ordering, keeping the ranking defined by P^i on {a_1, a_2, a_3} and on {a ∈ A : a ≠ a_1, a ≠ a_2, a ≠ a_3}. Assume then (without loss of generality) that a_1 = f(P̄^1, ..., P̄^I).
Let (P̄'^1, ..., P̄'^I) be the profile obtained from (P̄^1, ..., P̄^I) by moving a_2 down to the third position in each ordering P̄^i (to ensure by Lemma 3.2 that a_2
will not be chosen). Then

a_1 P̄^i a_3  ⟺  a_1 P̄'^i a_3,   ∀ i ∈ I.

Since a_3 F(P) a_1 by assumption, the property of independence of irrelevant alternatives implies

a_3 = f(P̄'^1, ..., P̄'^I).

Let a^i = f(P̄'^1, ..., P̄'^i, P̄^(i+1), ..., P̄^I), i = 0, ..., I, and let j be the first integer such that a^j ≠ a_1. If a^j = a_3, then the truth is not a dominant strategy for agent j (Lemma 3.1). If a^j = a_2, then the truth is not a dominant strategy for agent j either, since

f(P̄'^1, ..., P̄'^j, P̄^(j+1), ..., P̄^I) = a_2  and  a_1 P̄'^j a_2.
Consequently, F(P) is transitive. From Arrow's theorem, F is dictatorial. The dictator for F is a dictator for f.   Q.E.D.

To obtain positive results it is then clear that we must either introduce a priori restrictions on agents' characteristics or weaken the notion of incentive compatibility. In economies with only public goods, the simple restriction to "economic environments", i.e. to an economy where agents have well-behaved utility functions on an n-dimensional space of continuous public projects, still yields an impossibility theorem. When private goods exist in the economy and production is possible, some mechanisms incentive compatible in dominant strategies (and even Pareto optimal) can be exhibited. Satterthwaite and Sonnenschein (1981) give the following example. Consider an economy with three consumers and two pure private goods. Commodity 2 is produced from commodity 1 according to the linear technology y_2 = y_1, and initial resources are three units of good 1, one for each agent. Agents two and three share the first unit of good 1 in proportions which depend on the mean curvature of agent one's utility function: if it is "very curved", agent two gets the unit of good 1; if it is "very flat", agent three gets the unit of good 1. Similarly, the second and third units are shared in the same way by
permutating agents circularly. Then, each consumer receives his maximal point (from the point of view of his announced utility function) on the budget line x_1 + x_2 = w, where w is the amount of good 1 he has received by the above procedure. This mechanism is clearly incentive compatible in dominant strategy since each agent gets his best point in an exogenously given set.5

4. Public goods only; positional dictators

Positive results can be obtained at the cost of an extremely restrictive assumption on preferences which is famous in social choice theory, namely single-peakedness, and which is heavily used in the median voter paradigm. Consider an economy with an odd (for simplicity) number I of agents such that social states can be ordered on a line in such a way that all agents have single-peaked preferences (see Figure 4.1).

[Figure 4.1: single-peaked utility profiles of agents 1, 2 and 3 over alternatives a_1, ..., a_7 ordered on a line.]
As noted by Dummett and Farquharson (1961), the social choice function which selects the "peak" alternative of the median agent (which would be selected by majority voting) is truthfully self-implementable in dominant strategies. Clearly, if an agent is the median agent he should tell the truth. If an agent has his peak to the right of the median agent's peak (agent 3 in Figure 4.1), he gains nothing by lying with an announced preference which still has its peak to the right of the median agent's peak (since nothing changes), and he moves the selected peak in the wrong direction if he lies by announcing a peak to the left of

5 As shown by Hammond (1979), at the individual level a dominant strategy mechanism is characterized by an exogenously given set on which the agent picks his best point. The difficulty is then to ensure social feasibility of the allocation. When Pareto optimality is not required, this feasibility condition takes the form of a set of inequalities which can be easily satisfied, at least when individual rationality is not required. If Pareto optimality is imposed, the mechanism must satisfy in addition some equalities and this is often generically impossible. In the above example, a good mileage is obtained from the production technology which basically suppresses one of the two equilibrium conditions. In a pure exchange economy, the conjecture is that "generically" there exists no Pareto optimal dominant strategy equilibrium. See below for the context of one private good and one public good.
the median agent's peak. (A similar reasoning applies to agents whose peaks are to the left of the median agent's peak.) Moulin (1980) has shown that, when messages are restricted to announcing peaks only, the family of efficient, anonymous, incentive compatible social choice functions can easily be described. It is obtained by adding fewer than I fixed ballots to the I agents' ballots and then choosing the median of this set of ballots. When transfers across agents are not feasible, this is a good justification of majority voting if preferences can be assumed single peaked.6
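The rule described in this paragraph can be sketched as follows (the peaks and phantom ballots are illustrative):

```python
# Illustrative sketch of a Moulin (1980)-style rule: add at most I-1 fixed
# "phantom" ballots to the I announced peaks and select the median of the
# enlarged set. With single-peaked preferences, announcing one's true peak
# is a dominant strategy.

def generalized_median(peaks, phantoms):
    ballots = sorted(list(peaks) + list(phantoms))
    return ballots[len(ballots) // 2]   # assumes an odd total number of ballots

peaks = [2.0, 6.0, 9.0]       # true peaks of I = 3 agents
phantoms = [4.0, 4.0]         # I - 1 = 2 fixed ballots chosen in advance

truthful = generalized_median(peaks, phantoms)

# No agent can move the outcome strictly closer to his own peak by lying.
for i, peak in enumerate(peaks):
    for lie in [0.0, 3.0, 5.0, 8.0, 12.0]:
        misreport = peaks[:i] + [lie] + peaks[i + 1:]
        outcome = generalized_median(misreport, phantoms)
        assert abs(outcome - peak) >= abs(truthful - peak)
```

With no phantoms at all, the rule reduces to the median voter outcome of the previous paragraph; the phantom ballots parameterize the whole anonymous, strategy-proof family.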
5. The differential approach

In the sequel, where parameterizations of utility functions will be frequently used, we take the so-called differentiable approach to construct incentive compatible mechanisms. Consider, in a two-commodity exchange economy, agent i's utility function (linear in the unknown parameter):

θ^i x_2^i − (1/2)(x_2^i)² + x_1^i,

where θ^i is the true characteristic of agent i, known only to agent i; θ^i is known by the Center to belong to an open interval Θ^i of R containing zero. Let Θ = Π_{i=1}^I Θ^i. Let θ̂^i denote agent i's answer to the mechanism, which is (at the individual level) a pair of functions, x_1^i(θ̂^i, θ̂^(-i)), x_2^i(θ̂^i, θ̂^(-i)), where θ̂^(-i) = (θ̂^1, ..., θ̂^(i-1), θ̂^(i+1), ..., θ̂^I). Finally, let (w_1^i, w_2^i), i = 1, ..., I, be initial endowments. Let us restrict the analysis to differentiable7 mechanisms. Agent i will announce a θ̂^i which satisfies the first-order condition of his maximization problem:
max_{θ̂^i}  θ^i x_2^i(θ̂^i, θ̂^(-i)) − (1/2)(x_2^i(θ̂^i, θ̂^(-i)))² + x_1^i(θ̂^i, θ̂^(-i)).

6 See also Laffond (1980).
7 The restriction to differentiable mechanisms is in fact not needed. A direct argument, without any assumptions, shows that the function x_2^i(·) must be non-decreasing. Since a non-decreasing function is differentiable almost everywhere, the above characterization holds almost everywhere. Mechanisms which are infinitely non-differentiable are not interesting from an economic point of view. The right family to consider is the set of functions which are differentiable except at a finite number of points, where they may even be discontinuous. The characterization is then analogous to the one given above in each interval where the mechanism is differentiable. Then we must add conditions to connect the pieces, which are simply conditions making the utility function of the agent continuous across the pieces.
The first-order condition is

(θ^i − x_2^i(θ̂^i, θ̂^(-i))) (∂x_2^i/∂θ̂^i)(θ̂^i, θ̂^(-i)) + (∂x_1^i/∂θ̂^i)(θ̂^i, θ̂^(-i)) = 0.
A necessary condition for the truth to be a dominant strategy is therefore

(θ^i − x_2^i(θ^i, θ^(-i))) (∂x_2^i/∂θ^i)(θ^i, θ^(-i)) + (∂x_1^i/∂θ^i)(θ^i, θ^(-i)) = 0,   (5.1)
for any θ^(-i). Since we wish to obtain the truth for any θ^i ∈ Θ^i, (5.1) must hold for every θ^i ∈ Θ^i. It is therefore an identity, which we rewrite as

(θ^i − x_2^i(θ^i, θ^(-i))) (∂x_2^i/∂θ^i)(θ^i, θ^(-i)) + (∂x_1^i/∂θ^i)(θ^i, θ^(-i)) ≡ 0,   (5.2)

or

x_1^i(θ^i, θ^(-i)) = −θ^i x_2^i(θ^i, θ^(-i)) + (1/2)(x_2^i(θ^i, θ^(-i)))² + ∫_0^(θ^i) x_2^i(s^i, θ^(-i)) ds^i + h^i(θ^(-i)),   ∀ θ^i ∈ Θ^i.   (5.3)
A second necessary condition is obtained from the local second-order condition: the above maximization problem must be concave around the truth. Let

Φ(θ̂^i, θ^i) = θ^i x_2^i(θ̂^i, θ^(-i)) − (1/2)(x_2^i(θ̂^i, θ^(-i)))² + x_1^i(θ̂^i, θ^(-i)).

The first-order condition can be written as

(∂Φ/∂θ̂^i)(θ^i, θ^i) = 0.

The second-order condition is

(∂²Φ/∂θ̂^i²)(θ^i, θ^i) ≤ 0.

But, from identity (5.2),

(∂Φ/∂θ̂^i)(θ^i, θ^i) ≡ 0.
We therefore have

(∂²Φ/∂θ̂^i²)(θ^i, θ^i) + (∂²Φ/∂θ̂^i ∂θ^i)(θ^i, θ^i) = 0.

The second-order condition can then be rewritten

(∂²Φ/∂θ̂^i ∂θ^i)(θ^i, θ^i) ≥ 0,

i.e. here

(∂x_2^i/∂θ^i)(θ^i, θ^(-i)) ≥ 0,   ∀ θ^i ∈ Θ^i.
These necessary conditions are here sufficient, as can be checked:

θ^i x_2^i(θ^i, θ^(-i)) − (1/2)(x_2^i(θ^i, θ^(-i)))² + x_1^i(θ^i, θ^(-i)) ≥ θ^i x_2^i(θ̂^i, θ^(-i)) − (1/2)(x_2^i(θ̂^i, θ^(-i)))² + x_1^i(θ̂^i, θ^(-i)).   (5.4)

Using (5.3), expression (5.4) becomes

(θ̂^i − θ^i) x_2^i(θ̂^i, θ^(-i)) ≥ ∫_(θ^i)^(θ̂^i) x_2^i(s^i, θ^(-i)) ds^i,

which is true from the fact that x_2^i is non-decreasing in θ^i. The characterization of differentiable dominant strategy mechanisms at the individual level is then completed. We can choose a function x_2^i(·) non-decreasing in θ^i, and then the function x_1^i(·) is determined by integration, up to an arbitrary function h^i(θ^(-i)). In a private goods economy the feasibility conditions are
Σ_{i=1}^{I} x_2^i(θ^i, θ^(-i)) ≤ Σ_{i=1}^{I} w_2^i

and

Σ_{i=1}^{I} h^i(θ^(-i)) ≤ Σ_{i=1}^{I} w_1^i + Σ_{i=1}^{I} [ θ^i x_2^i(θ^i, θ^(-i)) − (1/2)(x_2^i(θ^i, θ^(-i)))² − ∫_0^(θ^i) x_2^i(s^i, θ^(-i)) ds^i ].
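An illustrative numerical check of this characterization (the choice x_2(θ̂) = θ̂ and h^i = 0 are ours, made only for the illustration):

```python
# Illustrative numerical check: pick x2 non-decreasing in the announcement,
# recover x1 from (5.3) with h^i = 0, and verify that announcing the truth
# maximizes the induced utility theta*x2 - x2**2/2 + x1.

def x2(th_hat):
    return th_hat                      # any non-decreasing function works

def x1(th_hat, n=1000):
    # x1 = -th*x2(th) + x2(th)**2/2 + integral_0^th x2(s) ds   (h^i = 0)
    step = th_hat / n
    integral = sum(x2(k * step) * step for k in range(n))
    return -th_hat * x2(th_hat) + 0.5 * x2(th_hat) ** 2 + integral

def utility(theta, th_hat):
    return theta * x2(th_hat) - 0.5 * x2(th_hat) ** 2 + x1(th_hat)

theta = 0.7                            # true characteristic
grid = [k / 100 for k in range(-100, 101)]
best = max(grid, key=lambda th_hat: utility(theta, th_hat))
assert abs(best - theta) < 1e-9        # truth-telling is optimal on the grid
```

Any other non-decreasing x_2 would do; the point of the characterization is precisely that x_2 non-decreasing plus (5.3) delivers dominant-strategy truth-telling.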
Suppose now that good 2 is a public good (x_2^i = x_2 for every i) and that its cost in units of good 1 is C(x_2). The feasibility condition is then

C(x_2(θ)) + Σ_{i=1}^{I} h^i(θ^(-i)) − Σ_{i=1}^{I} [ θ^i x_2(θ) − (1/2)(x_2(θ))² − ∫_0^(θ^i) x_2(s^i, θ^(-i)) ds^i ] ≤ Σ_{i=1}^{I} w_1^i.
6. The Clarke-Groves-Vickrey mechanisms8

A major intellectual event of the last decade in public economics has been the irruption of the Clarke-Groves-Vickrey (CGV) mechanisms, claiming to be a solution for the long-standing free-rider problem.9 In this section we show a possible intellectual origin of these mechanisms, discuss some of their properties and show that they are a special case of a more general class of incentive compatible mechanisms. The intuition of a CGV mechanism can be given from the Vickrey auction. Consider an indivisible good that a seller wants to sell to a group of I buyers. The Vickrey auction enables the seller to elicit the true willingnesses to pay of all buyers as follows. Let θ^i denote buyer i's true willingness to pay and θ̂^i denote his (sealed) bid. The Vickrey auction tells him that the buyer with the highest bid will receive the object and will pay the second highest bid. It is easy to see that truthful bidding is a dominant strategy. Consider, without loss of generality, buyer 1 and let us distinguish between two cases. In case 1, truthful bidding leads buyer 1 to be the highest bidder. He cannot gain by announcing θ̂^1 ≠ θ^1. Indeed, suppose he tries to lie. If he announces θ̂^1 ≠ θ^1 but still larger than the second highest bid, nothing changes. If he announces θ̂^1 below the second highest bid, he loses the object and pays nothing; he has lost the gain (θ^1 minus the second highest bid) he would have had by being truthful. In case 2, he is not the highest bidder if he is truthful. He does not receive the object and pays nothing. Suppose he lies with a bid θ̂^1 still below the highest bid. Nothing changes for him. If his announced bid becomes the highest bid, he receives the object but pays the second highest bid which, by assumption, is here higher than his true willingness to pay, making a loss.
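The argument above can be exercised in a few lines (bids and values are made up):

```python
# Illustrative sketch of the Vickrey (second-price) auction: the highest
# bidder wins and pays the second highest bid, so bidding one's true
# willingness to pay is a dominant strategy.

def vickrey(bids):
    # Return (winner index, price paid): highest bid wins, pays 2nd highest.
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]

def payoff(i, value, bids):
    winner, price = vickrey(bids)
    return value - price if winner == i else 0.0

values = [10.0, 7.0, 4.0]                 # true willingnesses to pay
truthful = payoff(0, values[0], values)   # buyer 1 bids his value

# No deviation by buyer 1 improves on truthful bidding.
for lie in [0.5, 6.0, 8.0, 20.0]:
    assert payoff(0, values[0], [lie, 7.0, 4.0]) <= truthful
```

The two cases of the text show up directly: underbidding either changes nothing or forfeits a positive gain, and overbidding either changes nothing or wins the object at a price above its value.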
Note that this mechanism yields a Pareto optimal outcome (since the agent with the highest bid gets the object) and that the payment made is the loss

8 The early papers are Clarke (1971), Smets (1972), Groves and Loeb (1975) and Vickrey (1961).
9 A thorough discussion is not feasible here [see, for example, Green and Laffont (1979) and Laffont and Maskin (1982)].
Ch. 10: Allocation of Public Goods
555
inflicted by the agent who gets the object on the rest of society since he pays the second highest valuation. Consider now an economy with one private good and one public good in which each agent has a quasi-linear utility function, .,
i(xi, y)=xi + i(y),
I,
where xi is agent i's consumption of the private good and y is the quantity of the public good (which, for simplicity, has a zero cost). Finally, let i' be the initial resource of agent i in private good with x = Y=lxi. Consider first the simple case of a (0, 1} public project with the normalization: V(0) = 0;
i= 1 ... 1.
V (1) = ,
Excluding corner solutions where some agents are bankrupted, a Pareto optimal public decision (under perfect information) is defined by

y = 1   iff   Σ_{i=1}^I θ_i > 0.

Let θ̂_i be the announced willingness to pay for the realization of the public good (y = 1), i = 1, ..., I, and let

θ̂ = (θ̂_1, ..., θ̂_I),   θ̂_{-i} = (θ̂_1, ..., θ̂_{i-1}, θ̂_{i+1}, ..., θ̂_I),   θ̂ = (θ̂_i, θ̂_{-i}).
The Clarke mechanism can be defined as follows:

y(θ̂) = 1   if Σ_{i=1}^I θ̂_i > 0,
     = 0   otherwise;                                             (6.1)

t_i(θ̂) = -| Σ_{j≠i} θ̂_j |   if (Σ_{j≠i} θ̂_j)(Σ_{i=1}^I θ̂_i) < 0,
       = 0                  otherwise.                            (6.2)

The decision is taken according to (6.1). Moreover, agent i receives a (non-positive) transfer t_i. If his answer changes the collective decision, i.e. if Σ_{j≠i} θ̂_j has a sign different from the sign of Σ_{i=1}^I θ̂_i, he must pay the cost that he imposes on the rest of society, |Σ_{j≠i} θ̂_j|. Otherwise, he pays nothing. It is easy to check that θ̂_i = θ_i is a dominant strategy for agent i, i = 1, ..., I. Let Δ_i denote the difference between agent i's utility if he announces θ̂_i and agent i's utility if he announces θ_i. Different cases must be distinguished.
Case 1: Σ_{j≠i} θ̂_j + θ_i > 0, so that truthful announcement leads to y = 1.

(a) θ̂_i such that Σ_{j≠i} θ̂_j + θ̂_i > 0: the decision is unchanged, and so is the transfer, which depends on agent i's announcement only through the decision; hence Δ_i = 0.

(b) θ̂_i such that Σ_{j≠i} θ̂_j + θ̂_i ≤ 0, with Σ_{j≠i} θ̂_j ≥ 0: the decision becomes y = 0 and agent i becomes pivotal; he gives up θ_i and pays Σ_{j≠i} θ̂_j, so that Δ_i = -θ_i - Σ_{j≠i} θ̂_j < 0.

(c) θ̂_i such that Σ_{j≠i} θ̂_j + θ̂_i ≤ 0, with Σ_{j≠i} θ̂_j < 0: under truthful announcement agent i was pivotal and obtained θ_i + Σ_{j≠i} θ̂_j > 0; under the lie the decision is y = 0 and he pays nothing, so that Δ_i = -(θ_i + Σ_{j≠i} θ̂_j) < 0.

Case 2: Σ_{j≠i} θ̂_j + θ_i ≤ 0 is treated symmetrically. In every case Δ_i ≤ 0, so that no misreport is ever profitable.
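The case analysis can also be confirmed numerically. A small sketch of the decision rule (6.1) and the pivotal transfers (6.2), with illustrative valuations:

```python
import itertools

def clarke(announcements):
    """Clarke mechanism for a {0,1} project: decision rule (6.1) and
    pivotal transfers (6.2).  Returns (y, list of transfers)."""
    total = sum(announcements)
    y = 1 if total > 0 else 0
    transfers = []
    for a in announcements:
        rest = total - a
        # An agent is pivotal when the rest of society would decide differently.
        pivotal = (rest > 0) != (total > 0)
        transfers.append(-abs(rest) if pivotal else 0.0)
    return y, transfers

theta = [4.0, -1.0, -2.0]          # true net willingnesses to pay (illustrative)
grid = [-5.0, -2.0, -1.0, 0.0, 1.0, 4.0, 6.0]

# Truth is a dominant strategy: for every profile of the others' reports,
# no misreport by agent 0 improves theta[0]*y + t[0].
for others in itertools.product(grid, repeat=2):
    def payoff(report):
        y, t = clarke([report, *others])
        return theta[0] * y + t[0]
    assert all(payoff(theta[0]) >= payoff(r) for r in grid)
print("no profitable misreport found for agent 0")
```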
The Clarke mechanism does not exhaust the class of incentive compatible mechanisms; in particular, dominant strategy implementation does not force the Pareto optimal decision rule (6.1). Two examples:

(a) The unanimity rule:

y(θ̂) = 1   if {i : θ̂_i ≥ 0} has I elements,
     = 0   otherwise,
t_i(θ̂) = h_i(θ̂_{-i}),   ∀i.

(b) The random dictatorship:

y(θ̂) = (1/I) Σ_{k=1}^I y_k(θ̂),

with

y_k(θ̂) = 0   if θ̂_k < 0,
       = 1   if θ̂_k ≥ 0,

t_i(θ̂) = h_i(θ̂_{-i}),   ∀i,

where y(θ̂) should now be interpreted as the probability of realizing the project. Laffont and Maskin (1982) give an example where, for a given optimization problem, the Clarke mechanism is dominated by the random dictatorship, which clearly does not always take the Pareto optimal public good decision.

Finally, we show how, in the case of divisible public goods, the differentiable approach yields constructively the CGV mechanisms. Suppose that the utility function of agent i is now

x_i + v_i(y, θ_i),   θ_i ∈ Θ_i,   i = 1, ..., I.
We look for transfers which implement the Pareto optimal public decision, i.e. the function y*(θ) which is implicitly defined by

Σ_{i=1}^I (∂v_i/∂y)(y, θ_i) = 0.

Agent i solves the program

max_{θ̂_i ∈ Θ_i}   v_i(y*(θ̂_i, θ_{-i}), θ_i) + t_i(θ̂_i, θ_{-i}),   (6.3)
yielding, as a necessary condition:

(∂v_i/∂y)(y*(θ̂_i, θ_{-i}), θ_i) (∂y*/∂θ_i)(θ̂_i, θ_{-i}) + (∂t_i/∂θ_i)(θ̂_i, θ_{-i}) = 0,

which, by the argument of Section 5, is transformed into a differential equation:

(∂t_i/∂θ_i)(θ) = -(∂v_i/∂y)(y*(θ), θ_i) (∂y*/∂θ_i)(θ),

which can be integrated, yielding:

t_i(θ) = -∫^{θ_i} (∂v_i/∂y)(y*(s_i, θ_{-i}), s_i) (∂y*/∂θ_i)(s_i, θ_{-i}) ds_i + h_i(θ_{-i}),   i = 1, ..., I.   (6.4)

From the definition of y*(θ):

(∂v_i/∂y)(y*(θ), θ_i) = -Σ_{j≠i} (∂v_j/∂y)(y*(θ), θ_j).

Substituting into (6.4) we get

t_i(θ) = Σ_{j≠i} v_j(y*(θ), θ_j) + h_i(θ_{-i}),
which is the Groves class for continuous projects.

7. Weakening incentives requirements

Let us remain with the case of a divisible public project, as at the end of the previous section. The major drawback of the class of transfers exhibited there, namely

t_i(θ) = Σ_{j≠i} v_j(y*(θ), θ_j) + h_i(θ_{-i}),

is that they do not balance in general, or equivalently that the dominant strategy mechanisms are not Pareto optimal.
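The failure of budget balance is easy to exhibit numerically. An illustrative sketch (the quadratic valuations v_i(y, θ_i) = θ_i·y - y²/(2I), for which y*(θ) = Σ_i θ_i, and the choice h_i ≡ 0 are assumptions of mine, not from the chapter):

```python
I = 3
theta = [2.0, -0.5, 1.5]             # illustrative types

def v(y, th):
    # quadratic valuation: v_i(y, theta_i) = theta_i*y - y^2/(2I)
    return th * y - y**2 / (2 * I)

def y_star(types):
    # Samuelson condition sum_i (theta_i - y/I) = 0  =>  y* = sum_i theta_i
    return sum(types)

y = y_star(theta)
# Groves transfers with h_i = 0:  t_i = sum_{j != i} v_j(y*, theta_j)
t = [sum(v(y, th) for j, th in enumerate(theta) if j != i) for i in range(I)]

print("y* =", y, "transfers =", t, "sum =", sum(t))
assert abs(sum(t)) > 1e-9            # the budget does not balance
```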
D'Aspremont and Gérard-Varet (1979) have shown that weakening incentives requirements to the notion of truthful Bayesian-Nash implementation enables the construction of Pareto optimal mechanisms, i.e. here of mechanisms which balance the budget. Consider first the functions r_i(θ_i) which induce agent i to tell the truth if he maximizes his expected utility, assuming that the others tell the truth. Agent i solves the problem

max_{θ̂_i}   ∫_{Θ_{-i}} [ v_i(y*(θ̂_i, θ_{-i}), θ_i) + r_i(θ̂_i) ] ψ_i(θ_{-i}) dθ_{-i},

where agent i's expectation ψ_i(θ_{-i}) is taken to be independent of his type θ_i, and where y*(θ) is as in Section 6 the first-best choice of the public good. The differentiable approach yields the identity:

∫_{Θ_{-i}} (∂v_i/∂y)(y*(θ), θ_i) (∂y*/∂θ_i)(θ) ψ_i(θ_{-i}) dθ_{-i} + dr_i/dθ_i = 0,

and integrating:

r_i(θ_i) = -∫^{θ_i} ∫_{Θ_{-i}} (∂v_i/∂y)(y*(s_i, θ_{-i}), s_i) (∂y*/∂θ_i)(s_i, θ_{-i}) ψ_i(θ_{-i}) dθ_{-i} ds_i + constant.   (7.1)

From Section 6 we know that

∫^{θ_i} (∂v_i/∂y)(y*(s_i, θ_{-i}), s_i) (∂y*/∂θ_i)(s_i, θ_{-i}) ds_i = -Σ_{j≠i} v_j(y*(θ), θ_j) + h_i(θ_{-i}),
so that reversing the order of integration in (7.1) gives

r_i(θ_i) = ∫_{Θ_{-i}} [ Σ_{j≠i} v_j(y*(θ_i, θ_{-i}), θ_j) ] ψ_i(θ_{-i}) dθ_{-i} + constant.   (7.2)

To obtain transfers which balance, it is enough to consider then

t_i(θ) = r_i(θ_i) - (1/(I-1)) Σ_{j≠i} r_j(θ_j).

The addition to r_i(θ_i) of a function of θ_{-i} does not perturb the above argument but, in addition, clearly

Σ_{i=1}^I t_i(θ) = 0.
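A sketch of this construction with discrete, independent type distributions (all numbers are illustrative; the constant in (7.2) and the h_i are set to zero): each r_i is computed as the expected externality, and the resulting transfers balance identically while preserving truthful Bayesian-Nash behavior.

```python
import itertools

# Two agents with independent, uniform discrete type distributions (illustrative).
types = [(-1.0, 0.5, 2.0), (-0.5, 1.0, 1.5)]
I = len(types)

def v(y, th):                        # quadratic valuation, as in the sketch above
    return th * y - y**2 / (2 * I)

def y_star(profile):                 # first-best level: here y* = sum of types
    return sum(profile)

def r(i, th_i):
    """Expected externality r_i = E_{theta_-i} sum_{j != i} v_j(y*, theta_j)."""
    other = types[1 - i]             # two-agent shortcut
    total = 0.0
    for th_j in other:
        profile = (th_i, th_j) if i == 0 else (th_j, th_i)
        total += v(y_star(profile), th_j)
    return total / len(other)

def transfers(profile):
    """AGV transfers t_i = r_i(theta_i) - (1/(I-1)) sum_{j != i} r_j(theta_j)."""
    rs = [r(i, profile[i]) for i in range(I)]
    return [rs[i] - sum(rs[j] for j in range(I) if j != i) / (I - 1)
            for i in range(I)]

# (1) Budget balance at every type profile.
for profile in itertools.product(*types):
    assert abs(sum(transfers(profile))) < 1e-12

# (2) Truth telling is a Bayesian-Nash equilibrium: reporting the truth
# maximizes agent 0's interim expected utility when agent 1 is truthful.
def expected_payoff(report, true_th):
    return sum(v(y_star((report, th1)), true_th) + transfers((report, th1))[0]
               for th1 in types[1]) / len(types[1])

for true_th in types[0]:
    assert all(expected_payoff(true_th, true_th)
               >= expected_payoff(m, true_th) - 1e-12 for m in types[0])
print("balanced budget and truthful Bayesian behavior confirmed")
```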
The possibility of taking into account dependent beliefs, i.e. densities ψ_i(θ_{-i} | θ_i) which depend on θ_i, has been the subject of further research by D'Aspremont and Gérard-Varet (1982) and Laffont and Maskin (1979). Recently, Cremer and Riordan (1982) have observed that it is possible to construct balanced mechanisms in which one agent is attributed a Bayesian-Nash behavior, all the others having a dominant strategy. Let us conclude by observing that, as formula (7.2) makes clear, the design of r_i(θ_i) requires knowledge of agent i's expectations, ψ_i(θ_{-i}). Another way of weakening incentives has been to look for implementation in maximin equilibria [see Dubins (1979) and Thomson (1979)]. The introduction of random schemes is also helpful [Gibbard (1977, 1978)]. Finally, starting with the work of Groves and Ledyard (1977) it became clear that implementation in Nash equilibria opened great possibilities, such as the achievement of Pareto optimality. Adding the requirement of individual rationality restricts essentially the mechanisms to the implementation of Lindahl equilibria [see Hurwicz's (1979) theorem]. Maskin (1977) provided a characterization of social choice functions implementable in Nash equilibria. However, as explained in Section 2, this is a very weak and unclear notion of implementation in games of incomplete information.
8. Planning procedures

The idea of using planning procedures to provide public goods is an old one. The first rigorous formalizations appear in Tideman (1972), Drèze and de la Vallée Poussin (1971) and Malinvaud (1971, 1972). We adopt here a presentation based on the logic of our theme: planning procedures are presented by solution concept, from the point of view of implementation.
8.1. Local dominant equilibrium
For simplicity, we consider an economy with two consumers and two commodities, a public good and a private good. The utility function of agent i, v^i(x^i, y), is smooth and strictly concave. Agent i starts the procedure with an endowment w^i > 0 in the private good. For simplicity, again, we ignore boundary problems and assume that the public good is costless.10

10 It is straightforward to define imputed shares of the cost of the public good and to define net willingnesses to pay.
Myopic behavior implies that agent i attempts to maximize at each instant t the rate of change of his utility, dv^i/dt. Let θ^i denote agent i's true marginal rate of substitution between the public and the private good. The revelation principle applied to the local game implies that we can restrict the analysis to direct planning procedures where agents announce marginal rates of substitution θ̂^i:

ẏ = Y(θ̂^1, θ̂^2),
ẋ^i = X^i(θ̂^1, θ̂^2),   i = 1, 2.   (8.1)

A planning procedure is strongly locally individually incentive compatible (SLIIC) if revelation of the true marginal rate of substitution is a dominant strategy at each instant for every consumer. It is balanced if

X^1(θ̂^1, θ̂^2) + X^2(θ̂^1, θ̂^2) = 0,   ∀θ̂.
A Pareto optimal allocation (x^1, x^2, y) is characterized here by

θ^1(x^1, y) + θ^2(x^2, y) = 0   (Samuelson condition),
x^1 + x^2 = w^1 + w^2.

We say that a planning procedure is Pareto optimal iff its stationary points are Pareto optimal allocations:

Y(θ^1(x^1, y), θ^2(x^2, y)) = 0
X^1(θ^1(x^1, y), θ^2(x^2, y)) = 0    if and only if    θ^1(x^1, y) + θ^2(x^2, y) = 0.
X^2(θ^1(x^1, y), θ^2(x^2, y)) = 0

It is convergent iff the solutions of the dynamic system (8.1) converge to its stationary points; it is dynamically efficient iff it is balanced, Pareto optimal and convergent. We say that it is individually rational iff

dv^i/dt ≥ 0,   i = 1, 2.
A class of planning procedures is neutral or unbiased iff each individually rational Pareto optimal allocation can be reached by a member of the class. We restrict the analysis to C² planning procedures [i.e. where Y(·), X^1(·) and X^2(·) are C²].

At each instant, the objective function of agent i is linear in θ̂^i. The method of Section 5 can be used to show that incentive compatibility requires

X^1(θ̂^1, θ̂^2) = -∫_0^{θ̂^1} s^1 (∂Y/∂θ^1)(s^1, θ̂^2) ds^1 + h^1(θ̂^2),

X^2(θ̂^1, θ̂^2) = -∫_0^{θ̂^2} s^2 (∂Y/∂θ^2)(θ̂^1, s^2) ds^2 + h^2(θ̂^1).   (8.2)
We can then prove:

Theorem 8.1

Under the above assumptions a two-person C² planning procedure is SLIIC, Pareto optimal, balanced and individually rational iff

(i) Y(θ̂) = A(θ̂^1) - A(-θ̂^2), with A strictly increasing and C²;

(ii) X^1(θ̂) = -∫_0^{θ̂^1} s^1 A'(s^1) ds^1 + ∫_0^{θ̂^2} s^2 A'(-s^2) ds^2,
     X^2(θ̂) = -X^1(θ̂).

Proof
As a first step we show that balancedness requires

Y(θ̂) = Y^1(θ̂^1) + Y^2(θ̂^2).

From incentive compatibility,

X^1 + X^2 = -∫_0^{θ̂^1} s^1 (∂Y/∂θ^1)(s^1, θ̂^2) ds^1 + h^1(θ̂^2) - ∫_0^{θ̂^2} s^2 (∂Y/∂θ^2)(θ̂^1, s^2) ds^2 + h^2(θ̂^1).
Differentiating with respect to θ̂^1 and θ̂^2, we obtain:

(θ̂^1 + θ̂^2) ∂²Y/∂θ^1∂θ^2 = 0.

Therefore, balancedness implies

∂²Y/∂θ^1∂θ^2 ≡ 0,

since Y is C². Reciprocally, integrating back we show that ∂²Y/∂θ^1∂θ^2 ≡ 0 implies the possibility of constructing a balanced procedure, with Y(θ̂) = Y^1(θ̂^1) + Y^2(θ̂^2). Pareto optimality requires:

Y^1(θ^1) + Y^2(-θ^1) = 0   for any θ^1 ∈ Θ.

Denoting A(·) ≡ Y^1(·), we have:

Y(θ̂) = A(θ̂^1) - A(-θ̂^2).

To avoid the existence of stationary points different from the Pareto optima, A(·) must be strictly increasing. From the above results, transfers can be rewritten as

X^1 = -∫_0^{θ̂^1} s^1 A'(s^1) ds^1 + ∫_0^{θ̂^2} s^2 A'(-s^2) ds^2 + K,

X^2 = -∫_0^{θ̂^2} s^2 A'(-s^2) ds^2 + ∫_0^{θ̂^1} s^1 A'(s^1) ds^1 - K.

It remains to show the implication of individual rationality,

dv^1/dt ∝ θ^1 Y + X^1 ≥ 0.

Integrating by parts θ^1 Y + X^1, we have:

∫_{-θ^2}^{θ^1} A(s) ds - (θ^1 + θ^2) A(-θ^2) + K ≥ 0.
From the fact that A(·) is increasing, the first two terms are always non-negative. Since they can become zero (if θ^1 = -θ^2), K must be non-negative. Reasoning symmetrically for agent 2 we show that K ≤ 0, implying K = 0. Q.E.D.

Thanks to individual rationality, the dynamic efficiency of the SLIIC procedures defined above is easily proved: v^1 + v^2 can be taken as a Lyapunov function. It is increasing and easily bounded above by the boundedness of the feasible set. Consequently, the procedures are quasi-stable in the Uzawa sense: any limit point of the trajectory is a stationary point. Stability then follows since, from the strict convexity of preferences and technology, stationary points are Pareto optimal and there is a unique Pareto optimum that dominates all points of the trajectory.

Since the property of truthful revelation is a closed property, we can take non-differentiable limits of increasing functions. Moreover, the above analysis is easily extended to piecewise-differentiable procedures. For example,

Y(θ̂) = +1   if θ̂^1 ≥ 0 and θ̂^2 ≥ 0 (one strictly at least);
     = 0    if θ̂^1 > 0 and θ̂^2 < 0, or θ̂^1 < 0 and θ̂^2 > 0, or θ̂^1 = 0 and θ̂^2 = 0;
     = -1   if θ̂^1 ≤ 0 and θ̂^2 ≤ 0 (one strictly at least),

defines an incentive compatible procedure. This procedure,11 which can be called the Bowen procedure, does not require any transfers since Y is piecewise constant. However, for more than two agents it does not have the property of Pareto optimality, a usual feature associated with majority voting. Finally, the neutrality of the class of SLIIC procedures, when all increasing functions Y(·) are considered, is easily shown. However, this property does not extend to the case of more than two agents.12 Moreover, the property of neutrality as defined above is not very interesting when agents do not behave truthfully. A planning procedure should be able to use at date t only the information available at date t. The question of the ability of the Center to affect the distribution of resources when agents behave strategically still remains quite open.

11 It corresponds to majority voting.
12 For more see Fujigaki and Sato (1981), Laffont and Maskin (1983), and Champsaur and Rochet (1983).
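Theorem 8.1 can also be checked numerically for a particular strictly increasing A; the choice A(s) = s + s³ below is an illustrative assumption, and the closed form of ∫_0^m s·A'(s) ds is computed by hand (note that A'(-s) = A'(s) for this choice):

```python
def A(s):
    # illustrative strictly increasing C^2 choice; any such A works in Theorem 8.1
    return s + s**3

def phi(m):
    # closed form of the integral of s*A'(s) over [0, m], with A'(s) = 1 + 3 s^2
    return m**2 / 2 + 0.75 * m**4

def Y(a1, a2):
    # speed of public good adjustment: Y = A(report 1) - A(-report 2)
    return A(a1) - A(-a2)

def X1(a1, a2):
    # agent 1's transfer from Theorem 8.1 (with K = 0)
    return -phi(a1) + phi(a2)

def X2(a1, a2):
    # balancedness: X2 = -X1
    return -X1(a1, a2)

theta1 = 0.7                               # agent 1's true marginal rate of substitution
grid = [k / 50 - 1 for k in range(101)]    # candidate reports in [-1, 1]

# Truthful reporting maximizes agent 1's instantaneous gain theta1*Y + X1,
# whatever agent 2 announces (the SLIIC property).
for a2 in (-0.8, -0.3, 0.0, 0.5):
    best = max(grid, key=lambda m: theta1 * Y(m, a2) + X1(m, a2))
    assert abs(best - theta1) < 1e-9

# Individual rationality at the truth (agent 2's true MRS taken to be -0.3).
assert theta1 * Y(theta1, -0.3) + X1(theta1, -0.3) >= -1e-12
print("truth is a local dominant strategy; instantaneous utility change is non-negative")
```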
Since little is available on planning procedures based on local Bayesian equilibrium, we turn now to local maximin equilibrium.
8.2. Local maximin equilibrium

The planning procedures introduced by Drèze and de la Vallée Poussin and by Malinvaud (the MDP procedures) turned out to be a special class of direct revealing procedures when the solution concept is the local maximin equilibrium and the change of public good is linear in the marginal willingnesses to pay:

ẏ = θ̂^1 + θ̂^2,
ẋ^1 = -θ̂^1 ẏ + δ^1 ẏ²,
ẋ^2 = -θ̂^2 ẏ + δ^2 ẏ²,    δ^1 > 0,  δ^2 > 0,  δ^1 + δ^2 = 1.

This procedure is balanced, since ẋ^1 + ẋ^2 = -(θ̂^1 + θ̂^2)ẏ + (δ^1 + δ^2)ẏ² = 0. It is clearly individually rational since

dv^i/dt ∝ δ^i ẏ² ≥ 0.

When varying the weights (δ^i) one generates a neutral class [Champsaur (1976)]. Finally, it is easy to check that truthful behavior is a local maximin strategy. Indeed, truthful behavior ensures that utility does not decrease (individual rationality); any other strategy opens the possibility of net losses.

One may wonder if there is a SLIIC procedure associated with ẏ = θ̂^1 + θ̂^2. There is one, as can be checked from formula (8.2). It corresponds to δ^1 = δ^2 = 1/2, i.e. to an equal sharing of the surplus ẏ² remaining after the compensations -θ̂^i ẏ provided for the change in public good. (For I > 2 the two classes do not intersect.)

There is no good reason to limit the analysis to linear Y(·) functions. A characterization of all procedures which are incentive compatible in the local maximin sense includes of course the MDP class and the SLIIC class [see Laffont (1985)]. The widespread interest in the MDP class has led to more work and results about this procedure, using the concept of local Nash equilibrium [see Roberts (1979), Henry (1979)], as well as results on the properties of the global game (i.e. when agents are not myopic) [Champsaur and Laroque (1982)].
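The MDP dynamics are easy to simulate. A discrete-time (Euler) sketch under truthful reporting, with illustrative quadratic utilities, checking monotonicity of utilities along the path and convergence to the Samuelson level:

```python
# True utilities v_i = x_i + a_i*y - y^2/2, so agent i's MRS is theta_i = a_i - y
# (illustrative functional forms, not from the chapter).
a = (3.0, 1.0)                      # taste parameters
delta = (0.5, 0.5)                  # surplus shares, delta1 + delta2 = 1
y, x = 0.0, [10.0, 10.0]            # initial public good level and endowments

def mrs(i, level):
    return a[i] - level

def utility(i, x_i, level):
    return x_i + a[i] * level - level**2 / 2

dt = 0.01
u_prev = [utility(i, x[i], y) for i in range(2)]
for _ in range(5000):
    theta = [mrs(0, y), mrs(1, y)]              # truthful announcements
    ydot = theta[0] + theta[1]                  # MDP: dy/dt = theta1 + theta2
    for i in range(2):
        x[i] += (-theta[i] * ydot + delta[i] * ydot**2) * dt
    y += ydot * dt
    u = [utility(i, x[i], y) for i in range(2)]
    # individual rationality: dv_i/dt is proportional to delta_i*(dy/dt)^2 >= 0
    assert all(u[i] >= u_prev[i] - 1e-9 for i in range(2))
    u_prev = u

# Convergence to the Samuelson level theta1 + theta2 = 0, i.e. y = (a1 + a2)/2.
assert abs(y - (a[0] + a[1]) / 2) < 1e-3
print("limit level of public good:", round(y, 3))
```

The budget balances at every step by construction, and both utilities rise monotonically along the trajectory, as the text asserts.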
9. Conclusion

Many results described in this chapter appear fairly abstract for any type of application. However, some real progress has been achieved in the understanding of the difficulties raised by the free-rider problem and, more generally, by incentive constraints; the basic ideas leading to the construction of incentive compatible mechanisms have been laid out. In my view, two major issues remain.

First, any real application will be made with methods which are crude approximations to the mechanisms obtained here. The rationale for these simplified schemes remains to be given. Clearly, transaction costs will play a major role in this explanation, but other considerations, such as simplicity and stability to encourage trust, goodwill and cooperation, will have to be taken into account.

Second, the above analysis has taken the view that agents know their willingnesses to pay for public goods. However, it is often the case that transaction costs must be incurred by agents to discover this information. Two different views can be held here. The optimistic view is that the practice of direct democracy should, in the same way as the practice of market activities for private goods, lead agents to become well informed about their tastes for public goods. The pessimistic view is that transaction costs will still have to be incurred by agents to provide accurate information; the incentive mechanisms must therefore provide incentives strong enough to overcome these costs. This more subtle version of the free-rider problem cannot be considered as solved.13 It can even be conjectured that it cannot be solved in a satisfactory way within the homo economicus paradigm.
However, if moral considerations can be relied upon for a positive attitude towards becoming an informed citizen, it will be important to be able to use social decision mechanisms which, if they offer to agents only a weak incentive to be truthful, at least do not give any incentive to misrepresent their information.
References

D'Aspremont C. and L.A. Gérard-Varet, 1979, Incentives and incomplete information, Journal of Public Economics 11, 25-45.
D'Aspremont C. and L.A. Gérard-Varet, 1982, Bayesian incentive compatible beliefs, Journal of Mathematical Economics 10, 83-103.
Barbera S., 1982, Truthfulness and decentralized information, Mimeo (Bilbao).
Border K. and J. Jordan, 1983, Straightforward elections, unanimity and phantom voters, Review of Economic Studies 50, 153-170.
Champsaur P., 1976, Neutrality of planning procedures in an economy with public goods, Review of Economic Studies 43, 293-300.

13 See Green and Laffont (1979, ch. 13) and Muench and Walker (1979).
Champsaur P. and G. Laroque, 1982, Strategic behavior in decentralized planning procedures, Econometrica 50, 325-344.
Champsaur P. and J.C. Rochet, 1983, On planning procedures which are locally strategy proof, Journal of Economic Theory 30, 353-369.
Clarke E., 1971, Multipart pricing of public goods, Public Choice 8, 19-33.
Cremer J. and M. Riordan, 1982, A Stackelberg solution to the public goods problem, Mimeo (University of Pennsylvania).
Dasgupta P., P. Hammond and E. Maskin, 1979, The implementation of social choice rules: Some general results on incentive compatibility, Review of Economic Studies 46, 185-216.
Drèze J. and D. de la Vallée Poussin, 1971, A tâtonnement process for public goods, Review of Economic Studies 38, 133-150.
Dubins L., 1979, Group decision devices, Mimeo (Berkeley).
Dummett M. and R. Farquharson, 1961, Stability in voting, Econometrica 29, 33-44.
Fujigaki Y. and K. Sato, 1981, Incentives in the generalized MDP procedure for the provision of public goods, Review of Economic Studies 48, 473-486.
Gibbard A., 1973, Manipulation of voting schemes: A general result, Econometrica 41, 587-601.
Gibbard A., 1977, Manipulation of schemes that mix voting with chance, Econometrica 45, 665-681.
Gibbard A., 1978, Straightforwardness of game forms with lotteries as outcomes, Econometrica 46, 595-614.
Green J. and J.J. Laffont, 1977, Characterization of satisfactory mechanisms for the revelation of preferences for public goods, Econometrica 45, 427-438.
Green J. and J.J. Laffont, 1979, Incentives in public decision making, Studies in Public Economics, vol. 1 (North-Holland, Amsterdam).
Groves T. and J. Ledyard, 1977, Optimal allocation of public goods: A solution to the free rider problem, Econometrica 45, 783-810.
Groves T. and M. Loeb, 1975, Incentives and public inputs, Journal of Public Economics 4, 311-326.
Hammond P., 1979, Straightforward individual incentive compatibility in large economies, Review of Economic Studies 46, 63-82.
Henry C., 1979, On the free rider problem in the MDP procedure, Review of Economic Studies 46, 293-303.
Holmström B., 1979, Groves schemes on restricted domains, Econometrica 47, 1137-1144.
Hurwicz L., 1979, On allocations attainable through Nash equilibria, Journal of Economic Theory 21, 140-165.
Laffond G., 1980, Révélation des préférences et utilités unimodales, Working Paper (CNAM).
Laffont J.J., 1985, Incitations dans les procédures de planification, Annales de l'INSEE 58, 3-36.
Laffont J.J. and E. Maskin, 1979, A differential approach to expected utility maximizing mechanisms, in: J.J. Laffont, ed., Aggregation and revelation of preferences, Studies in Public Economics, vol. 2 (North-Holland, Amsterdam).
Laffont J.J. and E. Maskin, 1982, The theory of incentives: An overview, in: W. Hildenbrand, ed., Advances in economic theory (Cambridge University Press, Cambridge) ch. 2.
Laffont J.J. and E. Maskin, 1983, A characterization of SLIIC planning procedures with public goods, Review of Economic Studies 50, 171-186.
Malinvaud E., 1969, Leçons de théorie microéconomique (Dunod, Paris).
Malinvaud E., 1971, A planning approach to the public good problem, Swedish Journal of Economics 73, 96-112.
Malinvaud E., 1972, Prices for individual consumption, quantity indicators for collective consumption, Review of Economic Studies 39, 385-405.
Maskin E., 1977, Nash equilibrium and welfare optimality, Mimeo.
Moulin H., 1980, On strategy-proofness and single peakedness, Public Choice 35, 437-455.
Muench T. and M. Walker, 1979, Identifying the free rider problem, in: J.J. Laffont, ed., Aggregation and revelation of preferences (North-Holland, Amsterdam).
Rob R., 1982, Asymptotic efficiency of the demand revealing mechanism, Journal of Economic Theory 28, 207-220.
Roberts J., 1979, Incentives in planning procedures for the provision of public goods, Review of Economic Studies 46, 283-292.
Satterthwaite M., 1975, Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions, Journal of Economic Theory 10, 187-217.
Satterthwaite M. and H. Sonnenschein, 1981, Strategy proof allocation mechanisms at differentiable points, Review of Economic Studies 48, 587-598.
Smets H., 1972, Le principe de la compensation réciproque, Paris, OCDE, Direction de l'Environnement.
Thomson W., 1979, Maximin strategies and the elicitation of preferences, in: J.J. Laffont, ed., Aggregation and revelation of preferences, Studies in Public Economics, vol. 2 (North-Holland, Amsterdam).
Tideman N., 1972, The efficient provision of public goods, in: S. Mushkin, ed., Public prices for public products (The Urban Institute, Washington, D.C.).
Vickrey W., 1961, Counterspeculation, auctions and competitive sealed tenders, Journal of Finance 16, 1-17.
Chapter 11
THE ECONOMICS OF THE LOCAL PUBLIC SECTOR DANIEL L. RUBINFELD* University of California, Berkeley
1. Introduction
One of the dominant areas of study in public finance during the past several decades has been the provision of public goods. Public finance economists spend a good deal of time and effort attempting to determine what individuals' preferences for public goods are, to decide what the "efficient" level of provision of goods is, and to analyze the actual production process that provides these goods. While the provision of public goods may occur at any of several levels of government, local aspects of public economics have only recently come to be an area of substantial academic interest. What distinguishes the study of the local public sector from the general theory of public goods is the possibility of migration between local jurisdictions. My analysis begins and ends with a focus on the Tiebout (1956) model. Tiebout suggested that it might be useful to view the provision of local public goods in a system of numerous jurisdictions as being analogous to a competitive market for private goods. Competition among jurisdictions would allow for a variety of bundles of public goods to be produced, and individuals would reveal their preferences for those public goods by moving ("voting with your feet"). Tiebout also argued that such a process, at least in the abstract, would lead to an efficient outcome in the sense that no one would be able to make themselves better off without making someone else worse off. The Tiebout view is an extreme, and admittedly narrow, view of the world, but it does allow us to concentrate on what makes the local public sector of unusual interest. However, at present there really is no fully satisfactory model of the local public sector, so that the
Henry Aaron, Alan Auerbach, Harvey Brazer, Jan Brueckner, Paul Courant, Gerald Goldstein, Edward Gramlich, Robert Inman, Helen Ladd, Mark Pauly, Thomas Romer, Susan Rose-Ackerman, Harvey Rosen, Suzanne Scotchmer and Richard Tresch were all kind enough to read and comment on an earlier draft of the paper. Edith Brashares provided valuable research assistance. Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B. V. (North-Holland)
appropriate question is whether and under what assumptions the Tiebout model (or an alternative model with fixed population which focuses on the choice of public goods within a jurisdiction) provides a good "approximation" to the local public economy. I will begin with a simple and hopefully intelligible form of the model and then add sequentially a number of complicated but relevant assumptions which can be crucial in understanding both normative and positive aspects of local public service provision. One of the concerns raised by Tiebout (often confused in the literature) is the determination of optimal public service provision. Optimality or efficiency (used interchangeably) in local public economics has two distinct meanings, which I will have occasion to discuss throughout the chapter. The more common use of the term applies to the provision of public services within a single jurisdiction. An optimal provision, which leads to intrajurisdictional efficiency, is one which maximizes the sum over all individuals within the jurisdiction of the net-of-costs willingness to pay for the public good, where the set of individuals within the jurisdiction is taken to be fixed. Optimality within a system of jurisdictions, i.e. interjurisdictional efficiency, applies to the provision of public services among jurisdictions when migration is possible. Efficiency is achieved when the existing number of jurisdictions results in the provision of a level of public goods which is sufficient to satisfy individuals' demands, and is produced at minimum cost. The notion of interjurisdictional efficiency, while clearly unimportant to the discussion of public goods at the federal level, is central to local public economics and will serve as a focal point of this chapter. As a first step in discussing the question of jurisdictional optimality one might ask whether a given jurisdiction ought to be increased or decreased in size.
In other words, is it efficient to have a small number of large jurisdictions or a large number of small jurisdictions? One immediate thought is that a small jurisdiction has an advantage from a political perspective. The smaller and more homogeneous is each of the communities in a system of local governments, the more likely it is that services provided will be consistent with desires of each and every member of the population. Thus, a small jurisdiction is likely to minimize the "political externalities" that individuals create when selecting a level of public goods provision that does not satisfy everyone's tastes. On the other hand, local redistributive goals that society might have are likely to be thwarted if communities are small and homogeneous. In addition, most public goods involve benefits that extend beyond jurisdictional boundaries. The obvious way to solve this externality problem is to internalize it by expanding the jurisdictional scope. Finally, jurisdictional scale may also be affected by the supply side of the local public sector. If there are economies of scale in the provision of local public services, it makes sense to have larger jurisdictions, for production purposes at least. The problem of balancing these objectives is a complex problem which will appear and reappear throughout the discussion.
The chapter is primarily a methodological one. I sketch examples of some of the kinds of models that local public finance economists have used, illustrating some of their advantages and difficulties, but without attempting to achieve complete coverage. A general discussion of public goods provision has been omitted here since it is treated in the opening chapter of this Handbook, while the discussion of public choice and political economy appears in a later chapter. Relatively few references are given in the text, but the reader is urged to read the footnotes and suggested references there when further detail is desired. Section 2 begins the discussion of optimality among jurisdictions with an analysis of the club model of local public provision of public services. This model allows one to see in a very simple way the conditions under which the competitive nature of the provision of local public goods is similar to the competitive market. It illustrates the trade-off between scale economies in the provision of public goods and the externality created when communities become congested or crowded. Section 3 continues with a detailed description of the Tiebout model. It expands the earlier discussion to look at the allocation of individuals among communities when individuals have different incomes, and when a property tax is used to finance public goods. The primary concern is whether an equilibrium exists and if so whether it will be efficient. Local public economics is concerned with the fact that both public spending and taxes can have a direct impact on land and property values. This additional consideration is added to the Tiebout framework in Section 3. The possibility that public sector variables can affect land values alters the analysis of the efficiency of the Tiebout mechanism. How one might test the Tiebout model in such a framework is sketched out at the end of the section. 
With the broad normative analysis in the background, the review switches briefly to a more technical positive analysis of the local public sector. I ask whether information about individual demand functions for public goods can be obtained within a world of numerous jurisdictions, and if so how one might estimate those demand functions. To some extent the issues are similar to those in the general public finance literature, but information about interjurisdictional variation in the provision of public services adds greatly to the set of possible techniques for estimating demand functions. However, the possibility of migration in response to fiscal variables in a Tiebout-type world adds substantially to the difficulty of obtaining consistent demand parameter estimators. Section 4 illustrates several traditional approaches which can yield demand functions. Section 5 focuses on demand functions also, but looks at some non-traditional methods for estimating demand functions. The first involves analysis of the use of survey data, and the second an examination of differentials in property values, the so-called "hedonic" approach. The discussion then continues to ask: If one knows about demand functions, is it possible to test whether the provision of local public services within jurisdictions is efficient? Finally, Section 6 presents a
broader overview of the question of the allocation of responsibilities among levels of jurisdictions. The so-called "fiscal federalism" question asks whether the local government ought to be balancing its budget, what local public goods should be provided locally, and whether and how the revenues ought to be shared between levels of government.
2. Community formation and group choice The large number of jurisdictions providing public goods distinguishes local public economics from the rest of public economics. Because individuals are mobile and vary in tastes, the determination of the optimal level of provision of public goods among a series of jurisdictions is complex, as is the meaning of "optimality" itself. Tiebout provided a start by describing a model in which there is a mobile population and a sufficient set of either potential or existing communities offering varying bundles of public goods so that, with costless mobility, individuals can choose the community with the best package of public goods and taxes and in the process reveal their true preferences for public goods. There is no private sector production in the Tiebout model. Individuals simply allocate some portion of their income (obtained outside the model as "dividend income") towards the consumption of the public good and consume the rest. The result is an equilibrium with individuals distributed among communities on the basis of public service demands, with each individual obtaining his own desired public service-tax bundle. The Tiebout model is a demand side model, within which individuals choose locations on the basis of their preferences for public goods. Tiebout's paper says little if anything about the supply side and, in particular, nothing about the technology of production of public goods or the underlying political mechanism by which levels of public goods might be chosen. Average cost curves for public services are U-shaped as a function of population size, and all jurisdictions are assumed to provide services at the point of minimum average cost. As a consequence of all of these assumptions, Tiebout suggests that the outcome of a process by which individuals select jurisdictions will be optimal or Pareto efficient in the sense that no one can be made better off without making someone worse off. 
Efficiency arises both because public goods are provided at minimum average cost and because each individual resides in a jurisdiction in which his demand is exactly satisfied. By revealed preference individuals who could have moved chose not to, and thus cannot make themselves better off. The Tiebout model has appealed to a large number of writers in public finance, in part because of its analytical convenience, and in part because of the analogy
Ch. 11: Economics of the Public Sector
between the Tiebout framework and the competitive model with which economists are so familiar. The number of variants of the Tiebout model is substantial. Here, as in previous analyses, I focus on the analogy between the many jurisdictions and the competitive market. The fundamental question to be answered is whether a competitive analog in the form of an equilibrium distribution of households among jurisdictions exists, and if so whether that equilibrium is efficient. In the simple version of the Tiebout model that arises out of the theory of "clubs",1 the Tiebout equilibrium exists and yields an efficient outcome. As the model is generalized to incorporate more realistic assumptions, however, an existence problem arises. Perhaps more disturbing is the fact that when an equilibrium does exist, it may not be efficient.

Since my objective is to progress from the simple Tiebout model towards more complex and realistic models of the local public sector, a brief summary of the assumptions of the Tiebout model will serve as a good starting point. The reader should refer to this list in the following sections as many of the assumptions are relaxed. The Tiebout assumptions are:
(1) Individuals have perfect information.
(2) Mobility is costless and is responsive only to fiscal conditions.
(3) Public goods are provided at minimum average cost within each jurisdiction. Each new migrant to a community pays an access cost equal to the cost of providing public services to that migrant.
(4) There are no interjurisdictional externalities.
(5) There are a sufficient number of jurisdictions and a sufficient number of households of each type (in terms of tastes and incomes) so that each jurisdiction can contain identical individuals. Thus, new communities can be developed costlessly.
(6) There is no public choice mechanism other than the utility-maximizing decision of an identical set of individuals.
(7) All income is dividend income, not generated by private production. There is no labor market.
(8) Public goods are financed by lump sum taxes.
(9) There is no land and no housing, and therefore no capitalization.

Clearly, these restrictions are extremely strong. The hard question is whether the Tiebout model is sufficiently like the local public economy so as to make it useful for both normative and positive purposes. This is a question for which there is not an easy answer, but which motivates a good deal of the methodological discussion which follows.2

1 See, for example, Buchanan (1965). See also Sandler and Tschirhart (1980), Berglas (1982), Berglas and Pines (1984), and Scotchmer (1985a).
2 Of course, this is in part an empirical question, as is discussed in Oates (1981).
D.L. Rubinfeld
2.1. A starting point: the theory of clubs

Assume initially that all individuals are identical in tastes and in incomes. In such a world the level of public services provided will be the utility-maximizing level selected by any individual within the community. This most elementary "public choice" mechanism will be relaxed later in the chapter, in which case the competitive analogy is substantially weakened. Assume also that a pure public good is provided to all identical residents of a community or region which is isolated from other communities.3 The "pure" public good has the property that its consumption is unaffected by the number of people in the community. To produce the public good, the production of the private good, and thus private consumption, must be forgone. As a result, a trade-off exists, in both production and consumption, between public and private goods.

Following Stiglitz (1977), let G represent the level of provision of the public good, X private consumption per capita, W total income available to all members of the community, and N the number of individuals. Private and public goods are assumed to be measured in identical units and produced by a single production process. The single production function, f, for both public and private goods, is given in equation (1):

W = f(N),    f' positive, f'' negative.    (1)

With this functional form there are first increasing and then decreasing returns to scale (in population). Total income W is measured in units of output, produced within the jurisdiction. The constraint

W = XN + G    (2)
reflects the fact that W must be fully allocated among the purely private good and the purely public good. As a result of these assumptions, the Stiglitz model is quite restrictive. It requires, in effect, that individuals work where they live and consume public goods on islands.

If N is fixed, and all individuals in a given jurisdiction have identical tastes and income, then each resident's problem is to choose X and G to maximize his utility function, U(X, G). The Lagrangian is (λ is a Lagrange multiplier):

maximize U(X, G) - λ(XN + G - f(N)),    (3)

and the first-order conditions are

∂U/∂X - λN = 0    (4)
3 This model is a good starting point, but is somewhat limited in its application to the local public sector, since it assumes an isolated island economy. The assumption that all income is generated from local production is a very restrictive one.
[Figure 2.1: production possibilities between the public good G and private consumption per capita X, with intercepts f(N) on the G axis and f(N)/N on the X axis; an indifference curve U determines the interior optimum E.]
and

∂U/∂G - λ = 0.    (5)
Solving equations (4) and (5) yields:

N[(∂U/∂G)/(∂U/∂X)] = 1.    (6)
Equation (6) is the Samuelson condition for efficient provision of the public good. The expression in brackets is the marginal rate of substitution of public for private goods, and N is the population. The left-hand side of the equation is the sum of the individual marginal rates of substitution between public and private goods. The right-hand side is the marginal rate of transformation, i.e. the marginal cost in terms of forgone units of X of producing an additional unit of output G (since inputs and outputs are all measured in equivalent units). In Figure 2.1, f(N) represents maximal public good provision, while f(N)/N is the maximum possible per capita consumption of the private good. Since the technology is assumed to be linear, the production possibility frontier is linear and an interior optimum is attained at point E. Thus, for a given population, the optimum provision of the public good is easily determined, since complications associated with the public choice process in a world of heterogeneous individuals are not present here. Now consider the problem of allocating population among jurisdictions. If the level of public goods, G, is fixed in each community, how many people should reside in each? The "club" analogy applies here, since this is the kind of decision that would face an entrepreneur attempting to decide how many individuals to let into an exclusive private club. The entrepreneur's objective is to obtain the
highest entry fee possible (with identical individuals), which can be accomplished by choosing N to maximize private good consumption given a fixed G. From (1) and (2) private good consumption is given by

X = (f(N) - G)/N.    (7)

Maximizing X with respect to N yields:

X = f'(N).    (8)
This condition states that population is added to the point at which the added income is equal to the per capita consumption of the private good. If the additional income were lower than X, for example, the newest member of the club would not have sufficient private income to assure that the current standard of living (private consumption) would be maintained.

With the level of G as well as N allowed to vary, the full club model can be analyzed. As population increases, aggregate income and thus the maximum level of public good obtainable increases, given that there is a positive marginal product of labor. However, the maximal level of private consumption decreases, since the marginal productivity of labor diminishes. In other words, as N increases, f(N) rises, but f(N)/N falls. An example of how the "opportunity sets" vary with N is given in Figure 2.2. Clearly, a high population allows for increased public good consumption, but at the cost of having less of the private good available (and greater congestion and other externalities in a more realistic model). In terms of the entrepreneur's decision to set the size of the club optimally, the locus of all possible opportunities is relevant, since any allocation of resources

[Figure 2.2: opportunity loci in (G, X) space for three different population sizes.]

[Figure 2.3: indifference curves and the opportunity locus in (G, X) space; possible optima include corner solutions and an interior point such as A.]
not on the locus cannot be utility maximizing. Figure 2.2 illustrates the situation in which there are only three sizes of jurisdictions, but the general point should be clear. Although the production possibilities set within each community is convex for a given N, the opportunity set may well be non-convex. As a consequence, a number of possibilities can arise, three of which are illustrated in Figure 2.3.4 Figure 2.3 shows that the optimal size of the community may be either zero, the entire population, or somewhere in between, depending upon the shape of the opportunity locus and the shape of the indifference curves of the identical individuals. The model is an extremely simple one, but the theoretical possibility of "corner solutions" because of supply or demand conditions holds quite generally.5

The corner-solution possibility should not be particularly troubling to students concerned with policy, because the Tiebout model assumes away a number of "frictions" such as imperfect information, moving costs, and land-use restrictions which make a corner outcome unlikely. However, to public finance theorists the prospect is more serious because it suggests that models without "interior" equilibria may occur in local public economics. If interior optima are to be studied (using the calculus) some further assumptions, about the nature of utility and production functions or about institutions such as land-use controls, must be made.6 To avoid this problem I assume in the next section that any solution to the optimization problem does not involve a corner solution.

4 See Stiglitz (1977) for additional details. See also Wooders (1980).
5 Multiple equilibria are also possible, but not illustrated in the figure.
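The entrepreneur's calculation in condition (8) is easy to check numerically. In the sketch below the concave production function f(N) = 40√N and the fixed public good level G = 100 are my own illustrative choices, not taken from the text; a grid search recovers the club size at which X = f'(N).

```python
import numpy as np

# Hypothetical island economy: f and the level of G are illustrative
# assumptions, not the chapter's.
def f(N):                       # aggregate output, f' > 0, f'' < 0
    return 40.0 * np.sqrt(N)

G_fixed = 100.0                 # fixed level of the public good

# Condition (8): choose N to maximize X = (f(N) - G)/N, i.e. X = f'(N).
N_grid = np.linspace(1.0, 100.0, 100000)
X_grid = (f(N_grid) - G_fixed) / N_grid
N_star = N_grid[np.argmax(X_grid)]

# Analytically f'(N) = 20/sqrt(N), and solving f'(N) = (f(N) - G)/N
# gives N* = (G/20)^2 = 25, with X* = f'(25) = 4.
print(round(N_star, 2))         # close to 25.0
print(round(float(X_grid.max()), 3))   # close to 4.0
```

The closed-form solution confirms the grid search: at N* = 25 the marginal product of a new member, f'(25) = 4, exactly equals per capita private consumption.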
2.2. Optimum club size with congestion

The club model becomes more interesting and more realistic if one allows for congestion in the consumption of the public good. There are a number of ways in which congestion might be modeled, but one simple alternative is to account for congestion in the cost of providing the public good. To do so, I borrow from Henderson (1979) and denote the cost of providing a unit of output of the public good (e.g. a year of public education) by C(N). If the public good is a pure public good, then C'(N) = 0. If the public good has either private good characteristics or can have congestion in consumption, then C'(N) is positive.

I assume that consumers are able to purchase units of housing, H, at an exogenous price, p, and for pedagogic reasons that they are identical in both taste and income, but will relax this assumption in a moment. The individuals solve the following optimization problem, where Y represents individual exogenous income and the price of X is one:

maximize U(X, H, G)    (9)

subject to X = Y - pH - [C(N)/N]G.    (10)
The first-order conditions are similar to those in (4) and (5), yielding:

N[(∂U/∂G)/(∂U/∂X)] = C(N)    (11a)

and

(∂U/∂H)/(∂U/∂X) = p.    (11b)
In (11a) the left-hand side represents the sum of the marginal rates of substitution of individuals' preferences for public versus private goods, while the right-hand side represents the marginal rate of transformation, or the cost of producing an additional unit of the public good when population is fixed. Equation (11b) represents the equivalent condition for the MRS between housing and the private good.

6 Courant and Rubinfeld (1981, pp. 297-298) describe a set of conditions sufficient to guarantee an interior optimum.
Now, return to the entrepreneurial decision about optimum club size. Maximizing private consumption with respect to N, with G fixed, yields the necessary condition given in (12):

C'(N) = C(N)/N.    (12)
Equation (12) states that the optimum size of the community7 is the point at which the average cost of the provision of public services is equal to the marginal cost of adding another individual to the club. For the standard U-shaped cost curve this is the point of minimum average cost. Thus, an entrepreneur wishing to maximize consumers' utility, and implicitly entrepreneurial rents, would choose a community size at which the public good is produced at minimum average cost. Models of this type as applied to the study of local jurisdictions assume that communities can control the entry of population as well as the provision of public goods, so that by choosing N to minimize the cost of providing public goods, they can maximize the price they can obtain for entry into the jurisdictions.8 Any entry charge is taken to be the same for everyone.

Assuming that the total population to be allocated among jurisdictions is a multiple of the optimum club size, the equilibrium solution would involve a host of identical clubs, all of equivalent optimal size. As shown by Henderson (1979), such an outcome would be efficient and stable. When consumers are identical, if any jurisdiction is above or below the optimum jurisdiction size, then the cost of public good provision in that jurisdiction would be higher than average, and hence a new club would be formed to provide the same public good bundle at lower cost. If the population were not a multiple of club size the analysis would be greatly complicated because public goods would not be produced at minimum average cost in each community. Pauly (1970) argues that in general an equilibrium will not exist in such a case.

The club model provides a natural introduction to the Tiebout model because it describes optimal public good provision within communities as well as optimal jurisdiction (or club) size.
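A concrete cost function makes condition (12) transparent. The quadratic C(N) below is an illustrative choice of mine, chosen only because it produces the standard U-shaped average cost curve:

```python
# Illustrative U-shaped cost structure (hypothetical numbers, not the
# chapter's): C(N) = 100 + 2N + N^2, so average cost C(N)/N = 100/N + 2 + N.
def C(N):
    return 100.0 + 2.0 * N + N ** 2

def avg_cost(N):
    return C(N) / N

def marg_cost(N):               # C'(N)
    return 2.0 + 2.0 * N

# Condition (12): the optimum club size equates marginal and average cost,
# i.e. it sits at the bottom of the U-shaped average cost curve.
N_star = min(range(1, 101), key=avg_cost)
print(N_star)                                  # 10
print(marg_cost(N_star) == avg_cost(N_star))   # True: both equal 22.0
```

At any smaller N the entrepreneur can lower per-member cost by admitting another resident; at any larger N the marginal entrant costs more than the average member pays.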
The Tiebout model focuses primarily on interjurisdictional optimality in a world of varying tastes and incomes, and can be viewed as an analysis of the optimal provision of public goods in a series of clubs or jurisdictions.

7 Assuming that the optimum exists and is not a corner solution.
8 See, for example, McGuire (1974).

3. The Tiebout model

3.1. Allocating individuals with different incomes

Models of optimal community size are most interesting when the tastes and/or incomes of consumers and their public good demands vary. To get a sense of the
added complexity which results I will treat the simplest case in which there are two types of individuals with identical tastes, but different incomes. This turns out to be slightly easier to analyze than the case of identical incomes and differing tastes, which is what the Tiebout model is about. I will also assume that each of these groups is sufficiently large to support jurisdictions of the optimal size and that individuals within jurisdictions pay equal tax shares.

A necessary condition for optimality within each jurisdiction is that the population must be divided into different types of clubs, each of which is homogeneous in income and satisfies the Samuelson condition for efficiency in the provision of public goods. A mixed club, on the other hand, would necessarily be inefficient. A typical possibility in a mixed club would be that the actual level of public provision would be an average of the demands of the two types of individuals. Neither the low nor the high income group would then be consuming their most preferred level of public services given the taxes that they pay. In this case both groups could improve their well-being if they moved to homogeneous communities.9 Heterogeneity is inefficient when all individuals pay equal taxes within a jurisdiction.

Of course, heterogeneity may make sense in a model in which individuals of differing skills are complementary factors in production or if individuals simply have a taste for living among others of different tastes and/or incomes. However, within the relatively simple club model we ask a harder question. Does the heterogeneity result change when taxes paid are a function of income or of the benefits received from the consumption of public goods? To answer this question, consider a Lindahl pricing or tax scheme in which the tax paid per unit of public good is equal to the marginal benefit that the individual receives from the consumption of the last unit of the public good provided.10 By construction, the Lindahl taxation scheme has the property that both high and low income individuals will choose to consume the identical public service bundle given the "price" that they face. Although appealing in terms of the marginal conditions of utility maximization, this solution is nonetheless inefficient; both groups of individuals can increase their welfare by moving to jurisdictions with equal tax shares. To achieve the income redistribution implicit in the diversity of tax shares it is better simply to redistribute income rather than redistribute through induced overconsumption of public goods.11 Thus, heterogeneity is not efficient in this simple Tiebout model.

The argument developed here could be extended to the case in which there are a large number of types of individuals differing both in incomes and in tastes. Clearly, the optimal number of jurisdictions which provide public goods would increase. With costless mobility each individual could move to the community of his or her choice. In such a world the tax system would not be distorting since public goods would be paid for with a head tax. The market analogy suggested by Tiebout would be complete. An equilibrium that existed would be efficient. While Tiebout did not describe how such an equilibrium might arise, one can easily imagine an entrepreneurial process that would lead to the creation of new communities.

It is at this point that the extreme limitation of the Tiebout model becomes clear. For the Tiebout model to generate an efficient outcome each community would have to produce at minimum average cost, yet there may not be enough people of each type to provide the appropriate scale for production efficiency. In addition, the assumption that new communities can be developed costlessly is crucial for the market analogy to be complete.12 Without free entry individuals cannot be assured that opportunities will exist to consume the utility-maximizing level of public good in a homogeneous community. Free entry is a very strong assumption, however. In urban areas and in urban models with labor markets and limited accessibility, the number of possible communities is restricted. A new community would either have to evolve at the urban fringe with workers facing high costs of commuting, or new urban areas would have to develop, an unlikely prospect. Finally, the Tiebout competitive model does not allow for the property tax as a means of financing public goods, a complication to which I now turn.

9 As Henderson (1979) argues, in a mixed club the Samuelson condition will provide for provision of public goods above that of the low demanders and below that of the high demanders. (There is a positive income elasticity of demand.) Such an arrangement lowers the potential utility obtained by all members in the community. The low demanders are forced to consume on the same production possibilities curve as in a homogeneous community, but at a point that is not a tangency with their indifference curves. The same is true for the high demanders since they underconsume rather than overconsume.
10 Lindahl prices here are set equal to each individual's marginal benefit from the public good at a given level of provision (not necessarily the efficient level).
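The mechanics of Lindahl pricing can be sketched with a hypothetical quasilinear example (the functional form and all numbers are my own illustrative assumptions): when each group's per-unit tax-price equals its marginal benefit, both groups demand the same level of G even though their tax shares differ.

```python
# Two groups with quasilinear utility U_i = a_i*ln(G) + X_i and a constant
# marginal cost c of the public good. Hypothetical parameters throughout.
a_low, a_high = 1.0, 3.0    # marginal-benefit parameters of the two groups
c = 1.0                     # marginal cost per unit of G

# Samuelson condition: sum of marginal benefits a_low/G + a_high/G = c.
G_star = (a_low + a_high) / c

# Lindahl tax-prices: each group's price equals its marginal benefit at G*.
tau_low = a_low / G_star
tau_high = a_high / G_star

# Facing tau_i per unit, group i demands G with a_i/G = tau_i, i.e. G*.
print(G_star)                               # 4.0
print(tau_low + tau_high == c)              # True: shares exhaust the cost
print(a_low / tau_low, a_high / tau_high)   # both groups demand 4.0
```

Both groups agree on G at these personalized prices, which is exactly the unanimity property the text describes; the inefficiency arises elsewhere, through the migration incentives the differing tax shares create.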
11 See Henderson (1979) as well as Eberts and Gronberg (1981).
12 This point is developed in Courant and Rubinfeld (1982) and Epple and Zelenitz (1981).

3.2. The property tax

The property tax is the primary source of revenue for local governments in the United States. While its role as a revenue source has diminished substantially since the turn of the twentieth century, it still represents over 50 percent of total local revenues (54.7 percent in 1980). This figure includes only revenues that are raised locally. Intergovernmental aid from both the states and the federal government today represents an additional major revenue source, a subject to which we turn in Section 6 of the chapter.

When the property tax is used to finance the local public good and when tastes for housing are not perfectly correlated with tastes for public goods (i.e. when housing and public goods are not perfect complements or substitutes), the Tiebout model becomes substantially more complex. In this case the entrepreneurs in each community provide combinations of public goods and housing to suit all individuals. As a result, there must be a sufficient number of communities to provide the desired levels of public goods and of housing for all groups of individuals. The necessary number of such communities is likely to be extremely large.

To see exactly how the presence of a property tax alters the efficiency properties of the Tiebout model, I add a property tax on housing (assumed identical to land here to simplify matters) to the previous model. Again, to simplify the discussion I choose the property tax rate, t, to be defined as the rate of taxation on annual housing expenditures. This rate is equal to the usual tax rate on property value divided by an appropriate discount rate (assuming an infinite stream of housing services). If p is the annual price of producing housing, and t is the tax rate, then p(1 + t) is the annual price of housing, gross of taxes, that the consumer faces when making his consumption decision. Utility maximization is now carried out subject to the budget constraint given by

X = Y - p(1 + t)H,    (13)
where ptH = C(N)(G/N), the per capita cost of providing G units of the public good. The first-order condition for optimal public service provision remains as before: the Samuelson condition that the sum of the marginal rates of substitution equals the marginal rate of transformation. However, consider the first-order condition given in (14), which provides the necessary condition for the optimum consumption of housing:

(∂U/∂H)/(∂U/∂X) = p(1 + t).    (14)

If there were no property tax, the efficient housing consumption condition would be given by (11b), reproduced here as (15), rather than (14):

(∂U/∂H)/(∂U/∂X) = p.    (15)
Clearly, the property tax alters the housing consumption choice, leading to underconsumption of housing (other things equal). This suboptimality in housing consumption will also lead to inefficiency in public good consumption, except when public good and housing choices are independent. In general, I would expect either complementarity or substitutability between housing and public service demand. To the extent that public goods and housing are complements, I would expect the underconsumption of housing caused by the property tax to lead to underconsumption of the public good. On the other hand, if public goods
and housing are substitutes, I would expect public goods consumption to be greater than would be efficient.13

The inefficiency of housing consumption is a crucial problem for the Tiebout models that follow. As long as the property tax distorts housing consumption, any conclusions about efficiency or inefficiency in the provision of public goods will be conditional on current housing consumption. Such analyses might be reasonable in a short-run model where housing consumption is fixed, but can lead to unreasonable results in a long-run model in which housing consumption is allowed to vary.
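The wedge between conditions (14) and (15) can be seen in a small numerical sketch; the log utility function and all parameter values below are illustrative assumptions of mine, not the chapter's:

```python
# With utility U = ln(X) + ln(H) and budget X = Y - p*(1+t)*H, the FOC
# (dU/dH)/(dU/dX) = X/H = p*(1+t) gives the demand H = Y / (2*p*(1+t)).
# Y, p, and t are hypothetical numbers.
Y, p = 100.0, 2.0

def housing_demand(t):
    return Y / (2.0 * p * (1.0 + t))

print(housing_demand(0.0))    # 25.0 units with no tax, the condition in (15)
print(housing_demand(0.25))   # 20.0 units with t = 0.25: underconsumption
```

Raising the gross-of-tax price from p to p(1 + t) reduces housing demand for any t > 0, which is the underconsumption of housing (other things equal) described in the text.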
3.3. Production inefficiencies in the Tiebout model

Inefficiencies can also arise in Tiebout models when there are economies of scale in the provision of public goods and individuals assume that their migration will not alter the provision of public goods or taxes in either of the jurisdictions. Assume that land is a fixed factor in the production of the public good and that individuals cannot contract for public goods outside of their jurisdiction. Migration of a low income individual into the high income community reduces the per capita costs of providing the public good in the high income community, but it increases the per capita costs in the low income community. As a result, it is possible to have an equilibrium in which consumers live in a variety of small jurisdictions, each consumer optimizing by selecting the bundle of housing and public good that maximizes his utility, and each content, believing that there is no move that could make him better off. However, it may be possible to combine communities, taking advantage of the scale economies in the provision of public goods, and make everyone better off.14

Bewley (1981) illustrates this production inefficiency in a very special case where individuals have identical tastes for public goods and utility is separable

13 This is a standard result in the theory of the second best, whereby a failure in one market, the housing market, can create inefficiencies in other markets. For a discussion of the property tax in a general equilibrium setting, see Courant (1977). See also Dusansky et al. (1981). For additional studies which evaluate the question of efficiency of resource allocation in models with public goods and with various kinds of distortionary taxes, see, for example, Wildasin (1977), Hochman (1981) and Rufolo (1979). The discussion of efficient taxation has usually focused on the land tax, or other taxes that serve as head taxes. See, for example, Hochman (1981, 1982) and Bucovetsky (1981). The discussion of land taxation has occasionally focused on the "Henry George Theorem", which states in effect that with an optimal jurisdictional population the aggregate land rents will be identically equal to the optimal spending on the local public good. The obvious appeal of such a proposal is that one need not resort to distortionary taxation to finance the public sector. The result, unfortunately, holds only under rather extreme assumptions, and is therefore more of a curiosity than a fundamental result. For some discussion of the Henry George Theorem, see, for example, Flatters, Henderson and Mieszkowski (1974), Stiglitz (1977, 1983a, 1983b), Arnott (1979), Arnott and Stiglitz (1979), Berglas (1982), Hartwick (1980), Schweizer (1983) and Wildasin (1984).
14 In the language of game theory the model would involve a game without a core.
between private and public goods. Consider an equilibrium in which there are two communities, each providing identical levels of the pure public good. Such an equilibrium exists because no one individual will have any motivation to switch communities, the two being identical. However, if the two communities were to combine, the new community could provide twice the level of public goods at no additional per capita cost, making everyone better off. Clearly, it is the inability of individuals to form coalitions and the absence of land developers that allows for an inefficient equilibrium.15

15 A more complex kind of inefficiency can arise when individuals have complementary tastes and are mismatched in their current jurisdictions, as discussed in Bewley (1981). To take a simple arithmetic example, assume that four individuals supply one unit of labor and are each independently able to provide one unit of a public good with that labor. There are four distinct public goods labeled GA, GB, GC, and GD, and two communities, 1 and 2. Each individual's utility function is given as:

U(A) = 2GA + GB,
U(B) = GA + 2GB,
U(C) = 2GC + GD,
U(D) = GC + 2GD,

where GA + GB + GC + GD = 4. Thus, individuals vary in the extent to which they care about each of the four public goods that might be provided, and total production of public goods must be limited to the number of individuals living in each community. Now consider the following equilibria. In equilibrium I, A and C live in community 1, while B and D live in community 2. In community 1, GA = 1 = GC, while in community 2, GB = 1 = GD. Then

U(A) = U(B) = U(C) = U(D) = 2.

Here, A and C both live in one community, but each is poorly matched, and each is able therefore to obtain a level of utility equal to only 2 units. The same is true for individuals B and D in community 2. There are no complementarities in terms of consumption here, but the equilibrium is stable because any attempt by any individual to move to the alternative community will result in a level of utility which falls from 2 to 1. The problem is the strong assumption that individuals do not take into account any shift in the production of public goods which might arise when they enter the community. Were the entrepreneurs running a community to encourage migration, aware of the option to change the level of public service provision, then this equilibrium, which is clearly inefficient, would not be stable. With proper entrepreneurial ability, equilibrium II would arise instead. In equilibrium II, A and B live in community 1, while C and D live in community 2. Then, in community 1, GA = 1 = GB, while in community 2, GC = 1 = GD, and

U(A) = U(B) = U(C) = U(D) = 3.

Here A and B live together in community 1, taking advantage of their complementarities, as do individuals C and D in community 2. In this case the utility level obtained by all is 3 and the equilibrium is efficient. These examples are simple, but they apply quite broadly. Even if the model allowed for the ability of governments to alter public service provision, the possibility of inefficiency would still exist.
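The footnote's arithmetic can be verified by brute force. The sketch below encodes the four utility functions and assumes, as in the example, that each resident devotes his unit of labor to the public good he values most within his community; it then evaluates every two-and-two assignment of residents.

```python
from itertools import combinations

# Four residents; prefs[x] = (favorite good, second good), so that
# U(x) = 2*G(favorite) + 1*G(second), as in footnote 15.
people = "ABCD"
prefs = {"A": ("GA", "GB"), "B": ("GB", "GA"),
         "C": ("GC", "GD"), "D": ("GD", "GC")}

def utilities(community1):
    """Utilities when community1 holds two residents and the rest live apart."""
    community2 = [x for x in people if x not in community1]
    util = {}
    for comm in (community1, community2):
        # Each member produces one unit of his favorite good locally.
        provided = {prefs[x][0] for x in comm}
        for x in comm:
            fav, second = prefs[x]
            util[x] = 2 * (fav in provided) + (second in provided)
    return util

for comm1 in combinations(people, 2):
    print(comm1, utilities(comm1))
```

The matched assignment {A, B} and {C, D} gives every resident utility 3, while every mismatched assignment, including the stable equilibrium {A, C} and {B, D}, gives every resident utility 2, exactly as in the footnote.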
3.4. Will an equilibrium exist?

Before proceeding much further it is important to ask whether there is an equilibrium consistent with any version of the Tiebout model. In fact, without additional institutional restrictions on mobility, one cannot be assured of this result.16 The non-existence problem can be most easily seen in a model with two income classes all having identical utility functions.17 A graphical demonstration is easier if I assume that the utility function is separable between the private good and the combined bundle of housing and the public good, as given in equation (16):

U(X, H, G) = V(X) + W(H, G).    (16)
Both low and high income individuals live in homogeneous communities, with no commercial and industrial tax base. Is such an outcome an equilibrium? Under property tax financing of the public good, the "tax-price" that any individual pays for a unit of public good is equal to the cost per unit of producing the public good times the individual's tax share, in this case the ratio of the individual's house value to the total house value (total tax base) in the community. With a head tax everyone's tax share is 1/N, the reciprocal of the number of community residents. If the homogeneous community outcome is to be an equilibrium, no individual should have an opportunity to improve his welfare by moving to another jurisdiction.

As an example of why this might be a problem [following Henderson (1979)], consider the utility-maximizing calculus of one low income household. In both panels (a) and (b) of Figure 3.1 the low income individual is initially at point D, facing budget line AB in the low income community. However, by buying a small house in the high (rather than low) income jurisdiction, the low income individual effectively faces a lower tax-price for public goods than do the high income individuals with larger homes. The tax-price in dollars per unit of public good for the low income household can be represented as

pG = (t · p · HL)/G,    (17a)
where t = expenditures/tax base = cG/[p(HL + NHHH)] (recall that I am assuming only one low income entrant to the high income community). Here PG '6 A complete argument would also discuss stability and would necessitate a description of the process by which public goods provision is determined, as well as a discussion of the adjustment of housing prices to the potential or actual movement of each of the income classes. 17 For a more complete discussion of this issue, see Buchanan and Goetz (1972) and Henderson (1979) for a theoretical perspective, and Inman and Rubinfeld (1979) for an empirical discussion. More general analyses of equilibria in models with public goods appear in Greenberg (1977, 1983), Richter (1978, 1982), and Foley (1967). See also Ellickson (1973, 1977), and Stahl and Varaiya (1983).
D.L. Rubinfeld
Figure 3.1. (a) Low income household does not migrate; (b) low income household migrates.
Here P_G represents the tax-price, c the marginal cost of providing public goods (assumed equal to the average cost), and H_L and H_H the housing consumption of the low income and high income individuals (of which there are N_H), respectively.18 It then follows directly that

P_G = c[p·H_L/p(H_L + N_H·H_H)] = c[H_L/(H_L + N_H·H_H)].  (17b)
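A small numerical sketch may help fix ideas. All numbers below are hypothetical and the helper `tax_price` is mine, not the chapter's; the point is only that, under property tax financing, the tax-price in (17b) rises with the household's share of the community housing stock:

```python
# Tax-price of the public good under property tax financing, eq. (17b):
# P_G = c * H_own / (H_own + H_others), i.e. marginal cost times tax share.
# All parameter values are illustrative, not taken from the chapter.

def tax_price(c, h_own, h_others):
    """Tax-price per unit of G for a household with housing h_own."""
    return c * h_own / (h_own + h_others)

c = 100.0               # marginal (= average) cost of a unit of G
H_L, H_H = 1.0, 4.0     # housing of low and high income households
N_H = 99                # high income residents in the community

# A low income entrant buying a small house faces a lower tax-price...
p_low = tax_price(c, H_L, N_H * H_H)
# ...than the high income residents with larger homes:
p_high = tax_price(c, H_H, (N_H - 1) * H_H + H_L)

# Minimum-lot zoning that forces H = H_H converts the property tax into
# a head tax: every tax share collapses to 1/N (here N = 100).
p_zoned = tax_price(c, H_H, N_H * H_H)

print(p_low < p_high, abs(p_zoned - c / 100) < 1e-12)  # -> True True
```

The last line previews the zoning argument made below: with equal housing forced on everyone, each resident's tax share is exactly 1/N.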
Since the tax-price is a function of how much housing the low income person consumes, the budget constraint faced by the low income individual in the high income community varies with the level of housing consumption selected. In the high income community, the budget constraint faced by the low income individual is given by line AC (in both panels), with housing consumption at H_1 and public good provision at G_1. The indifference curve, U_0, in Figure 3.1, panels (a) and (b), shows the maximum level of utility obtainable given the prices that the consumer faces in the low income community. An equilibrium arises if the low income household would not find it advantageous to move to the high income community. In panel (a), the consumer is initially at point D in the low income community, facing budget constraint AB. If the low income individual were to move to the high income community, he would be required to buy the public services, G, offered there. G involves a low tax-price per unit of public services, but a high total expenditure on public goods. As a result, if G were purchased, the low income individual would have little income to spend on housing and would consume housing H_1, associated with point E on budget line AC. The benefits from the additional public good consumption are outweighed by the lost utility from diminished housing consumption. In this case the outcome is an equilibrium, since low income individuals have no desire to migrate elsewhere. Incidentally, it is here that the separability assumption is helpful. With separability the graphical analysis can focus on the trade-off between housing and the public good. Private good expenditures are omitted since they are unaffected by changes in the price of the public good relative to housing. In panel (b), however, the non-existence problem can arise. Here, the level of public good provided in the low income community is not as high as in the previous case, so that the low income individual faces a lower cost of the public good in terms of housing, even though the tax-price has fallen somewhat less than in the previous case. The move from point D to point F represents an increase in utility for the low income individual choosing to live in the high income community (even though the public good remains at G).

18 The price of housing is assumed to be exogenous and thus does not affect the tax-price.
If the result of calculations of this type is for low income individuals to continually move into high income communities, and for high income individuals to move to avoid the low income individuals, an equilibrium will not exist. Since there is nothing within the Tiebout framework which restricts individual mobility, it is perfectly reasonable, and even likely, to expect a non-existence problem of this kind. How can existence be assured in such a Tiebout framework? One solution involves the use of various zoning and land use devices to limit the mobility of low income families. For the residents of the high income community, a given spending level per household can be obtained with a lower tax rate than in the low income community, since the high income tax base is larger. Admitting low income individuals, who purchase houses valued below the community average, lowers the tax base per family and raises the tax rate. This increased tax rate makes the high income community less attractive to future residents and lowers the value of remaining properties, causing tax rates to increase again. As a result, the incentive of current residents to keep out lower
income families may differ from the incentive of private developers, who may make greater profits by building high density housing units for lower income families than by constructing large homes.19 A low income entry barrier can be created if the high income community increases the level of public spending beyond the level preferred by high income individuals. This adds to the cost of entry for low income individuals but also lowers the utility of current residents. A more direct and effective approach is for the high income community to regulate the minimum size of houses. By fixing a minimum consumption level for housing greater than the housing demand of low income individuals, the cost of entry into the high income community is increased. If the minimum is set equal to the high income individuals' consumption levels, then the low income migrant will no longer face a lower tax-price for public services. By forcing equal housing consumption, the exclusionary zoning device changes the property tax into a "head" tax, with everyone in the community paying an equal share of the cost of the public good. Faced with a head tax, the low income individuals are unwilling to move to the high income community, since they would have to buy the same level of housing and more of the public good. The argument just made was developed by Hamilton (1975). In an attempt to rescue the Tiebout model, Hamilton shows that with perfectly elastic housing supply and zoning, the Tiebout equilibrium will exist and be efficient, despite property tax financing of the public good. If property taxes paid equal benefits received, there is no migration incentive (on that account); if they do not, there is. Of course, this result occurs only because the property tax has been effectively turned into a head tax and because the distortion in housing consumption caused by the property tax has been eliminated.
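The tax-base dilution that motivates exclusionary zoning can also be illustrated with hypothetical numbers (the figures and the helper below are mine, chosen only for illustration):

```python
# Admitting a house valued below the community average lowers the tax
# base per family, so the rate needed to fund a fixed per-family budget
# rises. All dollar figures are hypothetical.

def required_tax_rate(house_values, spending_per_family):
    """t = expenditures / tax base, for a community-wide budget."""
    budget = spending_per_family * len(house_values)
    return budget / sum(house_values)

values = [200_000.0] * 50            # 50 identical high income houses
t_before = required_tax_rate(values, spending_per_family=3_000.0)

values_after = values + [80_000.0]   # one below-average house enters
t_after = required_tax_rate(values_after, spending_per_family=3_000.0)

print(round(t_before, 4), round(t_after, 4))  # -> 0.015 0.0152
```

The rate rise is small with one entrant, but it compounds as falling property values and further entry feed back on the tax base, which is the text's point.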
In the usual non-zoning case low income individuals desire to move into the high income community because the benefits from the increased consumption of public services outweigh the costs or taxes paid for those public services. Thus, the purchase of a small house generates a fiscal surplus or residuum for the low income individual. From the perspective of residents in the high income community, a "fiscal externality" has been created, since the new entrant does not fully pay for the additional public service costs. A fiscal externality is created in the low income community as well if the public good is provided originally at minimum average cost; in this case the departure of the resident could force the average cost of provision of public services to be higher for all remaining residents. Thus, the zoning device can simultaneously improve the welfare of high income households and eliminate the fiscal externality. Hamilton's argument shows that

19 The arguments concerning the motivation of the various parties to use zoning as a restrictive device are given in White (1975) and Mills (1979). See also Rubinfeld (1979), Epple et al. (1978), Hamilton (1983a) and Zodrow and Mieszkowski (1983).
to the extent that one can make the property tax a head tax, an efficient outcome can be ensured. However, without a head tax (even with Lindahl taxes, which equate marginal benefit to marginal cost), migration incentives arise, generating externalities that are likely to lead to an inefficient outcome.20 And there is reason to believe that actual zoning policies deviate substantially from the one which transforms a property tax into a head tax. The analysis suggests that the pure Tiebout equilibrium will exist and be efficient if there are sufficient communities to match the tastes of all individuals for both housing and public service consumption, if the public good is produced at minimum average cost with respect to population, if profit-maximizing entrepreneurs are free to produce new communities to satisfy the demands of residents, and if a host of additional assumptions mentioned previously also hold.21 Even though the Tiebout model involves a series of strong and unrealistic assumptions, the insights that it gives about the manner in which individuals respond to fiscal variables are relevant for both theory and practice. The Tiebout framework is useful from a normative perspective because it defines a clear set of conditions which are necessary for optimality, while clarifying the intrajurisdictional and interjurisdictional aspects of efficiency. The "competitive" Tiebout world highlights the efficiency gains to be made when individuals are allowed to choose by "voting with their feet". It can be, and has been, useful from a positive perspective because it emphasizes the effect that variation in the supply of public goods can have on locational choice, as opposed to more traditional empirical analyses which do not take migration into account. However, both normative and positive aspects of the Tiebout model are changed somewhat when the land market is modeled explicitly. This additional complication is the subject of the discussion which follows.
3.5. Capitalization of public spending and taxes in a Tiebout framework

I have assumed that the pre-tax price of housing in any of the communities is fixed. To the extent that housing prices respond to differences in public sector variables, the "capitalization" of public sector variables into housing prices may provide an adjustment mechanism that will improve the efficiency of the Tiebout process. But even with capitalization, inefficiency, both in terms of the distribution of population among communities and in terms of the appropriate level of public services within jurisdictions, is still possible.22

Consider what capitalization means and how it fits into the Tiebout framework. To begin, one might ask what the effects of property taxes or public services on the value of land are. For this discussion assume that when one purchases a house one buys simply the plot of land with a fixed structure on it, so there is no distinction between land and the structures on land. Allowance for the substitution of capital for land in the production of housing changes the quantitative, but not the qualitative, nature of the analysis. The individual's willingness to pay for a plot of land will depend in part upon the bundle of public services that can be consumed once the land has been purchased and residence in the community is determined. However, the market price that is paid for the land will reflect not only the willingness to pay of the individual who eventually purchases the land, but also the willingness to pay of all potential purchasers of the land. If all individuals have identical tastes for public goods - one interesting extreme - then the price of land will be bid up by competition to reflect the value of public goods to all, and complete capitalization will have occurred. At the other extreme, if the individual's demand for the public good is different from all others', then the price of land will not change and no capitalization will occur. The individual will be fortunate in being able to enjoy the consumer surplus associated with his unique tastes. Thus, with "sufficient" competition for land among individuals with "similar" tastes, one would expect that higher taxes would lead to lower land values, while greater public services would lead to increased land values.

20 The previous results are overstated somewhat since they assume a positive correlation between tastes for public goods and income. If tastes and income are uncorrelated, efficiency and income mixing are not incompatible.
21 These conditions are sufficient, but not necessary, for an efficient equilibrium and thus appear overly stringent. For example, Mark Pauly points out that one could have an efficient equilibrium at other than minimum average cost if there were economies of scale up to a point, but all communities were at the optimal size. See Pauly (1976), Pestieau (1983) and Zodrow (1983).
Capitalization is the term which refers to the effect of fiscal variables on the capitalized or present value of the asset (land), rather than on its annual rental price. Because competition is a crucial determinant of the presence of capitalization, it is important to distinguish between capitalization in the short run, the intermediate run, and the long run. In the short run, with land (housing) in fixed supply, one would expect capitalization to occur with some frequency. In the intermediate run, with housing in variable supply, an increased supply of housing could diminish or conceivably eliminate capitalization. Finally, in the long run, in which the number of communities, and perhaps the number of metropolitan areas, is variable, capitalization is likely not to occur. The reason is that with an increased demand for public goods individuals would always have the option to move to a newly developing jurisdiction rather than paying a higher price for housing in a developed jurisdiction. This distinction between the short, intermediate and long run is crucial for understanding the more technical analysis which follows.

The Tiebout model can easily be modified to account for capitalization, beginning with the short run. Let H be the units of housing consumed, p the price of a stream of services available from that housing, and i the discount rate.23 Then the value of a house in the community, V, is as follows:

V = (pH - tV)/i.  (18)

22 Capitalization has extremely important implications for policy since it can thwart or even counteract attempts to achieve equity through judicial reform. For an extensive discussion of this issue, see Inman and Rubinfeld (1979). Capitalization adds a complication because individuals moving into a community must pay what is in effect a two-part tariff. One part of the price of public services is contained in the portion of the price of housing which is associated with the capitalization of fiscal differences. The second part is the direct payment made in terms of property taxes. Any policy change which increases the costs of buying public services in an attempt to obtain equity will simultaneously lower land prices, compensating partially or wholly for the change in property taxes.
Here the property tax rate, t, is defined as a tax on the value of housing. Equation (18) assumes complete or total capitalization of taxes because total tax payments are deducted from the pre-tax value of the house to obtain the final after-tax value. Other things equal, an increased tax rate will lower the value of the house. More explicitly, the house value, V, in terms of the remaining variables is as follows:

V = pH/(i + t).  (19)
Equation (19) shows that higher tax rates lead to lower house values, assuming the level of public services provided is constant. The intermediate-run story would be similar, except that land price adjustments would differ from housing price adjustments due to the supply response. When there is imperfect capitalization, equation (19) becomes:

V = pH/(i + at),  (20)
where a is less than one.24 Imperfect capitalization can occur in a short-run situation because individuals differ in their tastes and because the fixed number of communities limits the amount of competition for the variety of public services available. Of course, partial capitalization may also arise if there are informational market imperfections as well.

23 This abstracts from the usual problems with selecting the appropriate discount rate.
24 There has been a substantial empirical literature dealing with the extent to which property taxes as well as public sector variables are capitalized. The most widely cited piece is by Oates (1969), but there has been an extended discussion, both empirical and conceptual, about the Oates work. For a review of this entire literature, see Bloom, Ladd and Yinger (forthcoming). See also Church (1973), Cushing (1984), Edelstein (1974), Heinberg and Oates (1970), King (1977), Levin (1982), McDougall (1976), Oates (1972), Polinsky and Rubinfeld (1978), Rosen and Fullerton (1977), Smith (1970), Wales and Weins (1974), Wicks, Little, and Beck (1968), Woodard and Brady (1965), Rosen (1982), and Yinger (1982).
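Equations (18)-(20) are easy to check numerically; the parameter values below are illustrative and the helper `house_value` is my own, not the chapter's:

```python
# House value with capitalized property taxes. Full capitalization gives
# V = pH/(i + t), eq. (19); imperfect capitalization gives
# V = pH/(i + a*t), eq. (20), with a < 1. Values are illustrative.

def house_value(p, H, i, t, a=1.0):
    return p * H / (i + a * t)

p, H, i, t = 10.0, 1000.0, 0.05, 0.02

v_no_tax = house_value(p, H, i, t=0.0)      # pH/i
v_full = house_value(p, H, i, t)            # full capitalization, eq. (19)
v_partial = house_value(p, H, i, t, a=0.5)  # imperfect, eq. (20)

# Eq. (19) is just eq. (18), V = (pH - tV)/i, solved for V:
assert abs(v_full - (p * H - t * v_full) / i) < 1e-8

# The more fully taxes are capitalized, the lower the house value:
assert v_no_tax > v_partial > v_full
print(round(v_no_tax), round(v_partial), round(v_full))  # -> 200000 166667 142857
```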
However, as a general rule, short-run and intermediate-run models are associated with capitalization, which can arise for a variety of reasons. Assume, for example, that public services are provided more efficiently in some communities than in others, in the sense that the tax cost of an equivalent bundle of services is lower. Then we would expect competition to lead some individuals to move into the more efficiently producing community, raising the price of land there. Hamilton (1976), on the other hand, has argued that zoning restrictions on low income housing within a mixed community can lead to intrajurisdictional capitalization.25 In Hamilton's case capitalization occurs because the supply of housing of each type is not perfectly elastic even in the intermediate run. In the long run, however, the prospect of new communities will be sufficient to eliminate capitalization, subject to the proviso that the tastes of individuals are sufficiently similar to allow all surpluses to disappear.26 To see this, imagine that some capitalization has occurred, i.e. that an additional unit of public service leads to a positive increment in house value, net of tax cost. Then a new community can be created costlessly that provides the higher level of service without increasing the price of land. The "rents" originally earned by the owner of the land, whose value was high due to capitalization, are dissipated when the new community is developed. However, since the assumptions associated with the Tiebout model - long-run assumptions - are not satisfied in the local public sector, the empirical presence of capitalization is not and should not be a surprise to public finance economists.27 A complete analysis of how capitalization relates to the Tiebout model requires a framework which models both the decision to move among communities and the effect of the move on the tax base and the demand for public services.
In addition, variations in individuals' preferences may influence the decision-making process within communities. Attempts to model such complex behavioral systems
25 In Hamilton's (1976) discussion of intrajurisdictional capitalization of public services, the possibility of efficiency with mixed communities is considered. Hamilton shows, under a set of strong conditions concerning mobility and the use of zoning to regulate housing consumption, that an efficient heterogeneous community can exist. The efficient solution occurs because everyone satisfies their public service demands, since their tastes are assumed to be identical. Housing demands vary, but these demands are also satisfied by a restrictive zoning ordinance. Those individuals with high incomes consume more housing but pay a higher price, while those with low incomes consume less housing and pay a lower price. The net effect of this full capitalization of the benefits and costs of public services is that no individual wishes to move to make himself better off. The efficiency of this solution arises only because we have assumed that everyone has identical public service demands. The main point, however, does hold: that stratification and homogeneity of individuals within jurisdictions is not a necessary condition for efficiency. See Bucovetsky (1981) for a more recent discussion of this and similar issues. See also Starrett (1981) and Scotchmer (1985b).
26 This is essentially the argument made by Edel and Sclar (1974).
27 See also Hamilton (1976), Epple, Zelenitz and Visscher (1978), Sonstelie and Portney (1980), Stiglitz (1983b), Topham (1982), and Wildasin (1983), for some methodological discussion. Brueckner's (1979) paper argues for capitalization, but does not consider long-run supply responses.
have been made by a number of authors, yet none is completely satisfactory.28 Nevertheless, the discussion of capitalization by Yinger (1982), on which the following analysis relies, is illuminating.29 Capitalization depends on the willingness of potential or actual movers to pay for housing in a given community. For example, let P represent the amount of money per year per unit of housing services that a household is willing to pay to live in a house in a given community (not necessarily equal to p, the market price of housing), which is a function of the level of public services, G, and the property tax rate, t, in the community. The optimization decision of a household choosing among communities, with public services fixed, is given by

maximize U(X, H, G) subject to Y = X + P(G, t)H + tP(G, t)H/i.  (21)
The expression tPH/i represents the tax rate times total house value, V. Thus, this model is essentially equivalent to the previous one except that P is now a function of both G and t. Households' bids for housing vary with public service and taxes, as can be seen by differentiating this system not only with respect to X and H, but also with respect to G and t. The complete set of first-order conditions is given below:
(i) ∂U/∂X = λ,
(ii) ∂U/∂H = λP(1 + t/i),
(iii) ∂U/∂G = λ(∂P/∂G)H(1 + t/i),
(iv) (∂P/∂t)(i + t) + P = 0.30  (22)
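The differentiation step behind condition (iv), which the text notes in footnote 30 but does not display, can be written out; this is my reconstruction of the algebra, differentiating the constraint in (21) with respect to t while holding X and H fixed:

```latex
\frac{\partial}{\partial t}\left[P(G,t)\,H + \frac{t\,P(G,t)\,H}{i}\right] = 0
\;\Longrightarrow\;
\frac{\partial P}{\partial t}\left(1+\frac{t}{i}\right)H + \frac{P H}{i} = 0
\;\Longrightarrow\;
\frac{\partial P}{\partial t}(i+t) + P = 0,
```

where the last implication follows from multiplying through by i/H.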
First, assume that all households have identical tastes and income. In this case competition among households will eliminate all consumer surpluses, so that the function P(G, t) represents an equilibrium pattern of housing prices. To see whether capitalization occurs, fix G at some arbitrary level, say G*, and then solve equation (22, iv) for the bid price function P. The solution to this differential equation is

P = χ(G)/(i + t),  (23)
where χ(G) is a constant of integration whose value varies with G.

28 See, for example, Rose-Ackerman (1979, 1983), Portney and Sonstelie (1978), as well as Epple, Filimon and Romer (1983, 1984).
29 Yinger's model is limited in part because there is no explicit public choice mechanism. Voting models are important in Epple et al. (1983, 1984), Martinez-Vasquez (1981), Wildasin (1984), and Rose-Ackerman (1979).
30 (iv) is obtained by differentiating the constraint with respect to t, and solving.
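A quick symbolic check (using sympy, which is of course my tool and not the chapter's; `chi` stands in for the constant of integration) confirms that (23) solves condition (22, iv):

```python
# Verify that P = chi(G)/(i + t) satisfies (dP/dt)(i + t) + P = 0.
import sympy as sp

i, t, G = sp.symbols("i t G", positive=True)
chi = sp.Function("chi")   # "constant" of integration: free of t, varies with G
P = chi(G) / (i + t)

residual = sp.diff(P, t) * (i + t) + P
assert sp.simplify(residual) == 0
print(sp.simplify(residual))  # -> 0
```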
It is easy to see that the solution satisfies (22, iv) by differentiating (23) with respect to t and choosing the appropriate constant of integration. Note that if the tax rate is zero, then the constant of integration is equal to the bid that would be made for housing in a no-tax world. Since a change in the level of public services, G* (through additional federally financed local goods), alters the constant of integration, equation (23) shows that the extent of capitalization depends upon the level of public services.31 More generally, the pattern of housing prices will depend upon the choice of housing and the public good, when all four equations in (22) are solved simultaneously for H, G, and X in terms of the remaining variables. Yinger illustrates this solution in the case of a Cobb-Douglas utility function. The utility function is given in (24) and the solution for the bid price function in (25):

U(X, H, G) = ln X + α ln H + β ln G,  (24)

P(G, t) = γG^(β/α)/(i + t),  (25)
where γ is a constant of integration. With identical households there will be no differences in willingness to pay for the same public service bundle, and P will equal the equilibrium price of housing. Within a jurisdiction where G is fixed, any variation in tax rates is capitalized immediately into the price of housing. The dollar amount of capitalization depends upon the function, P, and, implicitly, the tastes of the individuals. For example, the extent of capitalization will vary directly with β, the strength of preference for the public good, and inversely with α, the strength of preference for housing. Equation (25) also describes the variation in housing prices reflecting different levels of G when t is fixed. Specifically, the change in housing prices is given by

dP/dG = (β/α)P(G, t)/G.  (26)
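For the Cobb-Douglas case, (25) and (26) are easy to verify numerically; the parameter values below are mine, chosen only for illustration:

```python
# Bid price for the Cobb-Douglas case, eq. (25):
# P(G, t) = gamma * G**(beta/alpha) / (i + t).
# Illustrative parameters; alpha weights housing, beta the public good.

def bid_price(G, t, alpha, beta, gamma=1.0, i=0.05):
    return gamma * G ** (beta / alpha) / (i + t)

G, t, alpha, beta = 10.0, 0.02, 0.5, 0.3
base = bid_price(G, t, alpha, beta)

# Finite-difference check of eq. (26): dP/dG = (beta/alpha) * P / G.
dG = 1e-6
deriv = (bid_price(G + dG, t, alpha, beta) - base) / dG
assert abs(deriv - (beta / alpha) * base / G) < 1e-4

# A stronger taste for the public good (higher beta) raises the premium
# bid for a high-G community relative to a low-G one:
premium = lambda b: bid_price(G, t, alpha, b) / bid_price(1.0, t, alpha, b)
assert premium(0.4) > premium(0.3)
```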
Here the percentage change in housing prices is a positive function of the strength of preference for the public good.

Now consider what happens if either the tastes or incomes of individuals differ. Recall that in a model with two income classes, low income and high income households are likely to segregate.32 In such a case the capitalization question becomes more complicated, since the willingness to pay of a low income household to live (in a given size house) in a high income community depends on the trade-off between the value of additional public services and the cost reflected in additional taxes. If individuals choose not to move between jurisdictions, then there may be two different tax and service bundles in two different communities with the same housing prices. This follows because [as implied by (26)] differences in the dollar expenditure on public services may not be reflected dollar for dollar in differences in house values. How much individuals value public services affects capitalization, not the dollar expenditures. Thus, there may be little capitalization if differences in services are not valued by one of the income groups.33

To sum up, the analysis has shown that as a general rule the no-capitalization case can arise only in the very long run, in which entry of new communities is free, a necessary condition for the Tiebout model to mimic the usual competitive model. However, even in the long run, such a possibility is quite unrealistic, not only because of the cost of such development, but because of the spatial nature of urban areas, which depend upon one or more centralized workplaces for employment. Despite some theoretical suggestions to the contrary, capitalization is likely to be an important phenomenon. Fortunately, this outcome is useful empirically. I will show in a later section, for example, how the presence of capitalization can be used to estimate empirically the demand for local public goods.

31 And, in general, on the level of all commodities.
32 This segregation is not a necessary condition for Tiebout equilibrium. See Yinger (1982) for a discussion of this issue. See also Henderson (1985).

3.6. The efficiency of the Tiebout model with capitalization
How is our previous discussion about Tiebout modified in a model in which capitalization may occur? In particular, is the outcome efficient in the intrajurisdictional sense, conditional on the property-tax-distorted housing choice? Allowing for capitalization complicates the problem. Imagine, for example, two individuals, A and B, both with identical income and tastes. The individuals reside initially in separate jurisdictions providing different public goods, but with housing prices equated. In the first jurisdiction individual A is consuming his desired level of public services, given the price of housing. Individual B, however, is underconsuming public services and is initially worse off than A. Since an equilibrium in such a model occurs only when utility levels are equal for both individuals, one possible outcome is for capitalization to occur, causing the price of housing in B's jurisdiction to fall. In the resulting equilibrium, B will face a lower price of housing, and will consume more housing and fewer public services than A. Both, however, will achieve the same level of utility and will not be motivated to move. Thus, capitalization can compensate individuals, eliminating their incentive to move. However, this equilibrium will, in general, not be efficient, for the same kinds of reasons that were given previously. For example, it may be efficient from a production point of view to have one, rather than two, communities. However, individuals will not have the appropriate incentive to take advantage of economies of scale. With or without capitalization, the efficiency of the outcome in a Tiebout model depends upon the provision of public services and individual mobility. And capitalization cannot rule out either production inefficiencies or suboptimality in consumption within communities.34

33 Finally, consider the decision among residents of the community to select a public service level. Because changes in the public budget affect the location decision of potential community entrants, the prospect of capital gains and losses associated with capitalization can influence the public sector decision. In the simplest model, in which all individuals have identical tastes and incomes, the possibility of capital gains does not affect the public service bundle choice. By choosing public services that set marginal benefit equal to marginal cost (implicit in the Samuelson condition), the individual is paying the maximum amount for the house that he would be willing to pay if he moved into the community. Even if non-residents have different tastes, the individual in the community is still maximizing property values. However, if the resident is planning to leave the community, a different calculation might occur. As long as the political process is dominated by current residents, an equilibrium with property value maximization in a simple model (and with no business property) yields an efficient outcome. If movers do dominate or have a substantial impact, the analysis becomes more complex. The property value maximization argument has been utilized by a number of public finance researchers to build a simple structure for analyzing the local capitalization question. See, for example, Brueckner (1979). Note that the analysis becomes substantially more complex if we have heterogeneous communities, but we leave this analysis for a later section.
3.7. Testing the Tiebout model

Tests of the Tiebout model can be divided into two distinct types. First are the abstract tests of whether the Tiebout model yields an efficient outcome, with each individual obtaining his most preferred level of public services in a homogeneous community. The second type of test attempts to evaluate whether fiscal variables affect migration, so that there is at least a partial sorting of individuals into communities based on public sector variables. Some of these tests look at mobility explicitly, while others look indirectly by focusing on the capitalization of public sector variables into land values. One conceptual test of the Tiebout model follows from our previous discussion about capitalization. In a model in which all residents of the community are homeowners, property value maximization can lead to efficiency in the Tiebout world. The test of Tiebout efficiency is then to see whether aggregate property values are maximized for the current distribution of individuals among jurisdictions. The argument is most fully developed in Brueckner (1979, 1982, 1983), and is illustrated in Figure 3.2. Brueckner suggests that the graph of the relationship between aggregate total property values in a community, APV, and public output, G, is likely to be an inverted U-shape. The most important assumption in his analysis is that the local public choice mechanism is one which selects public

34 For a model involving strong assumptions in which the efficient outcome will be obtained, see Brueckner (1979). See also Sonstelie and Portney (1980) and Westhoff (1979).
Ch. 11: Economics of the Public Sector
[Figure 3.2: aggregate property values (APV) as an inverted-U-shaped function of public output G, reaching their maximum APV* at G*.]
goods to maximize property values. In this case, the point G*, where public service provision is optimal, is the point at which property values are maximized. A test of Tiebout efficiency involves looking across communities to see whether they are near the peak of property values. Such a test can in principle be made by estimating a multiple regression of the form given in equation (27):

APV = α + βG + γZ + ε.  (27)
Here Z is a vector of variables (other than public sector variables) that determine property values, and ε is a random disturbance term. Note that this multiple regression model is similar to what has been called a "hedonic" regression model, where Z includes a list of housing, accessibility, and neighborhood characteristics. The model accounts not only for G but also for other variables that affect property values, and thus is a kind of "reduced form" explanation of why property values vary across communities. If the model is correct and public goods are efficiently produced, then near the optimum small changes in public spending should have no effect on property values. Whether the coefficient β in equation (27) is significantly different from zero can be tested to evaluate whether property values respond to changes in public goods provision. Insignificance is at least consistent with property value maximization and efficiency. The test works because property value maximization implies that public goods will be provided up to the level at which a tax-financed increase in public goods yields a competitive return in terms of increased house value. The tax increase used to finance the public good takes into account the opportunity cost of the
D.L. Rubinfeld
public funds in terms of what they could earn in the private sector (all funds are assumed to come from investment). A coefficient of zero in the equation in which t is allowed to vary is the proper efficiency test. A more general test, applicable in a more general model than the one just described, has been suggested by Inman (1978). Inman suggests that we analyze a structural equation such as (27a) in which the tax variable is included:

APV = α + βG + γZ + φt + ε.  (27a)
In this case dAPV/dG = β + φ(dt/dG) = R, where R measures the return on the public investment. If the annual value of this return is R = 1 + i, where i is the competitive rate of return, then the level of public investment is efficient. Inman's empirical work supports this conclusion for a group of jurisdictions on Long Island, New York. There are, however, substantial methodological and empirical problems associated with the narrow property value maximization test. First, the specification of the hedonic regression affects the interpretation of the results; in particular, the regression is not likely to be linear. Second, it is difficult to measure all the appropriate control variables that ought to appear in the vector Z. Third, β may be insignificant even though there is no Tiebout efficiency. As Brueckner notes, one cannot be assured that some communities do not overprovide public services while others underprovide them, so that the net effect appears to be a zero coefficient on the G variable. In addition, any empirical test which relies on an insignificant result is a weak one, since we cannot rule out the possibility that the regression model is improperly specified.35 Thus, this test of Tiebout efficiency is not likely to be very useful from an empirical perspective. Do public sector variables affect the decisions of individual households who move between jurisdictions? The best of the relatively scanty evidence, given in Reschovsky (1979), suggests that high income households are influenced by variations in public services, especially education. However, Reschovsky finds no support for this effect among low income movers. His study is particularly appealing because it focuses solely on the behavior of movers, not on all individuals, and is a micro rather than an aggregate study.
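The logic of the Brueckner-style property-value test can be illustrated with a small simulation. Everything below is hypothetical (synthetic communities, made-up parameters), not a replication of any of the studies discussed: if aggregate property values are an inverted U in G and communities sit near the peak G*, the coefficient β in a linear regression of the form of equation (27) should be close to zero.

```python
import numpy as np

# Sketch of the property-value-maximization test, with hypothetical numbers.
# APV follows an inverted U in public output G; communities cluster near the
# maximizing level G*, so the linear coefficient on G in equation (27),
#   APV = alpha + beta*G + gamma*Z + eps,
# should be near zero, while the control Z keeps its true effect.
rng = np.random.default_rng(0)
n = 200
G_star = 10.0                        # hypothetical efficient output level
G = rng.normal(G_star, 0.5, n)       # communities clustered near the peak
Z = rng.normal(5.0, 1.0, n)          # a single control variable, for simplicity
APV = 100.0 - 2.0 * (G - G_star) ** 2 + 3.0 * Z + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), G, Z])
coef, *_ = np.linalg.lstsq(X, APV, rcond=None)
print(f"estimated beta on G:  {coef[1]:.3f}")   # near zero at the peak
print(f"estimated gamma on Z: {coef[2]:.3f}")   # near the assumed value 3.0
```

As the text notes, an insignificant β is only weak evidence: the same near-zero estimate would arise if some communities overprovided and others underprovided services.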
Most other Tiebout studies focus on the implications that such movement may have on the distribution of spending and on the spending demands within communities, usually as proxied by income. For example, Hamilton, Mills and 35 Tests of the Tiebout hypothesis using estimating equations like (27) do have some additional deficiencies. As Portney and Sonstelie (1978) point out, a linear form assumes that differences in property tax rates between communities have the same effect on the value of homes independent of the size of those homes. This suggests that it is tax payments rather than tax rates that ought to be capitalized. Sonstelie and Portney suggest an alternative Tiebout test which they claim has the advantage of not depending on the presence of capitalization. There are additional methodological problems with this approach, but it is a conceptually appealing one.
Puryear's (1975) crude empirical work suggests that there is more homogeneity of income and public good provision within suburban jurisdictions than there is within the central city. Given the vast set of choices that exist in suburban areas, one expects such homogeneity in a world in which public sector variables matter. However, we are clearly a long way from having homogeneous suburbs, as is illustrated in the empirical analysis by Pack and Pack (1977). Perhaps the most widely discussed test of the Tiebout hypothesis was one described by Oates (1969). Oates suggested, along the lines of the argument spelled out by Brueckner, that if the Tiebout model were an accurate depiction of the local public sector, the effect (on the margin) of increased public services on property values, holding taxes constant, would be positive. Likewise, the tax variable, holding services constant, would have a negative sign, and the net effect of the two changing jointly would be zero, at least for the median or decisive voter. A good deal of discussion of the theory underlying the Oates paper has led to the view that the empirical work does not provide a direct test of Tiebout efficiency. This should not come as a surprise, since Tiebout equilibrium can occur without capitalization, and capitalization can arise without Tiebout efficiency. The Oates and other follow-up papers do suggest, however, that capitalization of both public spending and taxes is prevalent. Finally, an alternative test of the Tiebout mechanism uses survey data. Gramlich and Rubinfeld (1982) analyzed responses by a random sample of Michigan voters to questions about the voters' desired levels of public expenditures. Responses were used to estimate demand functions, which in turn were used to evaluate statistically the variation in public sector demands within various kinds of communities across the state.
If the Tiebout model is appropriate, then one would expect greater homogeneity of demands in suburban communities where there is substantial locational and public sector choice. The data do support this view. Within the suburbs of the city of Detroit and several other large metropolitan areas, public service demands are relatively homogeneous, while the demands are less homogeneous in cities with smaller suburbs or rural areas.
4. Empirical determination of the demand for local public goods
4.1. Introduction

Empirical estimation of the demand for public goods is a major focus of public finance. Throughout the 1950s and 1960s many researchers attempted to explain variations in local public expenditures as reflecting socioeconomic characteristics of the population as well as the level of intergovernmental aid. In the past decade
or so, researchers advanced this literature substantially by concentrating on the optimizing behavior of the parties demanding public goods.36 The natural reaction was to borrow directly from the optimization literature of consumer theory, an approach which has a number of advantages over the previous methods. Specifically, it may be possible to distinguish between demand and supply characteristics, to specify the appropriate price and income variables, and to model the effect of grants-in-aid on local spending. However, a number of difficulties not present in standard consumer theory arise in modeling local public goods. I will treat both empirical and theoretical problems in this section, beginning with a discussion of the conceptual problems which underlie demand estimation. The first conceptual problem involves the aggregation of preferences. Individual demands for the public good are translated into a community demand through some kind of political process. As a result, estimation of the demand for a publicly provided good must either explicitly or implicitly model this process. The usual resolution of this problem involves the use of the median voter model.37 The median voter model is a demand-side model which takes the community demand for a single-dimensional public good to be the median of the individual demands for the good. Of course, the actual public good provided will depend upon the supply side as well - whether and under what conditions public officials will provide the public good desired by the median voter. More about this later. For the moment, however, I assume that the public provision comes about by a majority rule voting process. Under a set of reasonably strong assumptions (including single-peaked preferences, a single-dimensional public good, and no agenda setting), the median voter's demand will be the public good actually provided.
No alternative proposal can win a majority of the votes in an election against the level demanded by the median person. Under some political processes there is no guarantee that the level of community provision of the public good will represent the quantity demanded by any one individual in the community. If this possibility occurs, demand estimation using aggregate data becomes very difficult. The median voter model is appealing among other reasons because it avoids this problem. The paper by Bergstrom and Goodman (1973) serves a valuable function by clearly specifying a set of very restrictive assumptions sufficient to make the use of aggregate data to estimate demand functions possible. Whether and to what extent the model is empirically validated is a subject for further discussion. First, however, it will be useful to discuss some additional conceptual problems

36 For two examples of this, see Borcherding and Deacon (1972) and Deacon (1977).
37 The median voter model was proposed by Bowen (1943), and extended by Barr and Davis (1966) and Bergstrom and Goodman (1973). See also Mueller (1976) and Martinez-Vasquez (1981).
peculiar to public goods: the measurement of output and the determination of price. Individual demand functions are derived as follows: each individual is assumed to have a quasi-concave utility function, U(X, H, G). To allow for variation in tastes among individuals with different socioeconomic characteristics, these utility functions are conditioned on a set of characteristics, denoted by Z. The individual's problem is then:

maximize U(X, H, G; Z) subject to Y = X + pH + PG·G.  (28)
Here the price of the private good has been normalized to equal 1, and the price of the public good is denoted by PG. The public good need not be a pure public good. In fact, I will allow for "congestion" in the sense that the level of the public good will depend upon the population of the jurisdiction. Nevertheless, it will be useful at this point to assume that all individuals within the jurisdiction consume the same level of G, so that G need not be subscripted by individual household.38 The model as described has no clearly specified supply side describing the process by which the public good is produced. One relatively straightforward specification of the supply, given by Inman (1979), is to describe G as representing the flow of public services obtained from a production function whose inputs include the public capital stock and the population of the community. The capital stock itself is obtained from another production process which uses both labor and materials as inputs. This supply-side specification allows for the treatment of a number of interesting local public finance issues which will not be taken up in this chapter. First, it introduces the local public labor market (as well as the private market) explicitly into the model. Second, it raises the question of what the appropriate level of the public capital stock ought to be, as well as questions concerning the rate of investment in the capital stock and the source of financing. Unfortunately, these issues cannot all be covered in this chapter and are left for further reading and thought.39 Individual demand functions depend not only on individual budget constraints, but also on the budget constraint facing the public sector, which in turn determines the individual tax-price, PG. To incorporate this into the analysis I let

38 This could be accounted for by subscripting the public good, allowing the within-jurisdiction benefits to vary among individuals.
39 The estimation of production functions and, more generally, the study of the supply of local public goods has been undertaken in a substantial list of articles. A sample of those would include Carr-Hill and Stern (1973) and Fare et al. (1982). See Inman (1979) for a longer list.
c equal the marginal and average cost of production of G (at least near the optimum). The municipal budget constraint is then

cG = T + A,  (29)
where T is the level of locally raised taxes and A is the total amount of aid given to the municipality by higher levels of government. This aid is assumed to be "non-matching", i.e. independent of the level of local tax effort. If matching aid were also included the budget constraint would become

cG = (1 + m)T + A,  (29a)
where m is the matching rate as a fraction of local taxes T. If local revenues are raised by a property tax, then the tax-price of a dollar of local public spending to a homeowner is given by

PG = cpH/B,  (30)
where pH is the value of the individual house, and B is the total tax base in the community (the sum of the piHi).40 Here tax-price represents the individual's cost of purchasing an additional dollar of local public services, with no commercial and industrial tax base present.41 Solving the maximization problem given by equation (28) subject to the individual budget constraint and the budget constraint of the local government (29) yields (in general) the individual's demand function for the public good. This demand function is given in its most general form in equation (31) below, where a superscript asterisk denotes the desired public good:

G* = g(Y, PG; Z).  (31)
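To fix ideas, here is a small numerical illustration of the tax-price in equation (30). All values are hypothetical (a made-up community of 300 houses; the unit cost c is set to 1):

```python
# Hypothetical community illustrating equation (30): under property-tax
# finance, a homeowner's price for a dollar of public spending is
# P_G = c * pH / B, his house value's share of the community tax base.
c = 1.0                                               # unit cost of G
house_values = [60_000.0, 80_000.0, 120_000.0] * 100  # the p_i * H_i (300 houses)
B = sum(house_values)                                 # total residential tax base

for pH in (60_000.0, 80_000.0, 120_000.0):
    P_G = c * pH / B
    print(f"house value {pH:>9,.0f}: tax-price {P_G:.6f}")
```

Owners of more valuable houses face a higher tax-price for the same level of G, which is how house value enters the demand function (31) through PG.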
Following Bergstrom and Goodman (1973), most theories of the demand for public goods assume that individuals demand a flow of services of uniform quality, G0, from the publicly produced good. Because of "congestion" or crowding in the consumption of that good, G0 is a function of the population of the community, N. For example, one might postulate that

G0 = G/N^g,

40 This avoids all issues surrounding the distinction between assessed and market values for housing, as well as the questions of rental as opposed to owner-occupied housing, and the shifting of the tax on non-residential property.
41 A more complex calculation of price would account for the deductibility of local taxes under the federal personal income tax and the possibility of local tax credits associated with the payment of property taxes.
where G is the observed level of the public good and g is thought to lie between 0 and 1. Note that when g is equal to 0, G is essentially a pure public good. However, when g equals 1, the consumption of the publicly provided good is equal to the individual's per capita share of the public good produced, so that G is a pure private good. Values of g between 0 and 1 allow for the possibility of goods that are mixed, having partly private and partly public characteristics. If the tax-price associated with G is given by (30), then the tax-price per unit of G0 is PG0 = PG·N^g. Note that when G is a pure public good (g = 0), PG0 represents the price of one unit of public good which, when consumed, is available to all individuals. Thus, with the simplifying assumption that all houses are uniformly valued, the price would equal 1/Nth of the cost of producing a unit of the pure public good. When G is a pure private good (g = 1), PG0 represents the cost of producing one unit of the publicly provided good, which is N times the cost per individual of providing it. The most popular approach to estimating demand functions, in part because of its tractability, uses aggregate data on spending at the jurisdiction, county, or perhaps state level. Were everyone in the community to receive the level of public services that he or she demanded, the demand function might appear as follows:42
log G0 = β1 + β2 log Y + β3 log PG0 + β5 log Z + ε.  (32)
However, empirically one explains variation in actual output, G, not variation in G0. If the following substitutions are made:

log G0 = log G − g log N and log PG0 = log PG + g log N,

then the actual output demand equation is

log G = β1 + β2 log Y + β3 log PG + β4 log N + β5 log Z + ε,  (33)

where β4 = g(1 + β3). The observations for each variable differ by jurisdiction, but the subscript has been suppressed.

42 The log-log form is chosen for simplicity because parameter estimates are elasticities. However, one should note that the only utility function globally consistent with such a demand function is the Cobb-Douglas with unitary income and price elasticities.
With the log-log specification, N appears as a separate variable in the demand equation. Thus, an estimate of g can be obtained from the estimate of the coefficient β4 and the estimate of the price elasticity of demand β3.43
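A small simulation makes the recovery of g concrete. The elasticities and data below are entirely hypothetical; the point is only that ordinary least squares on equation (33) yields g = β4/(1 + β3):

```python
import numpy as np

# Sketch, with assumed "true" parameters, of recovering the congestion
# parameter g from the aggregate demand regression (33):
#   log G = b1 + b2*log Y + b3*log P_G + b4*log N + b5*log Z + e,
# where b4 = g*(1 + b3), so g_hat = b4_hat / (1 + b3_hat).
rng = np.random.default_rng(1)
n = 400
b1, b2, b3, b5 = 1.0, 0.6, -0.3, 0.2    # hypothetical elasticities
g_true = 0.8                            # hypothetical congestion parameter
b4 = g_true * (1 + b3)                  # implied coefficient on log N

logY = rng.normal(9.5, 0.3, n)
logP = rng.normal(-6.0, 0.5, n)
logN = rng.normal(8.0, 0.7, n)
logZ = rng.normal(0.0, 1.0, n)
logG = b1 + b2*logY + b3*logP + b4*logN + b5*logZ + rng.normal(0, 0.05, n)

X = np.column_stack([np.ones(n), logY, logP, logN, logZ])
coef, *_ = np.linalg.lstsq(X, logG, rcond=None)
g_hat = coef[3] / (1 + coef[2])
print(f"estimated g: {g_hat:.3f}")      # close to the assumed 0.8
```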
4.2. Empirical issues

Strong as the median voter assumptions are, they are not sufficient to allow demand estimation using aggregate data unless we know the characteristics of the median voter. Bergstrom and Goodman (1973) showed that under a certain set of rather strong conditions, an observation on expenditures and public goods in a community can be treated as if it were the amount demanded by a homeowner with the median income and the median house value of that community. Theirs is one possible set of sufficient conditions for demand estimation. A number of the Bergstrom-Goodman assumptions are worthy of particular attention. First, all jurisdictions are assumed to have income distributions that are simple proportional shifts of each other. For example, if the distribution of income in one community is normal with mean $10,000 and variance $100,000, a second city with mean income $20,000 would have a normal income distribution with the same $100,000 variance. Second, each individual's tax-price is a constant elasticity function of individual family income. This is equivalent to assuming that all individuals have identical constant income elasticities of demand for housing.44 Finally, the income and price elasticities are assumed to be related so that the distribution of desired demands is a monotonic function of income. To see why the monotonicity assumption is crucial, consider the distributions of desired demands given in Figure 4.1. Panel (a) illustrates the monotonic case, in which the quantity demanded by the median-income person is the median quantity demanded. However, if the distribution is as in panel (b), it is clear that the median desired level of spending is no longer the level desired by the median-income person. Instead, the individual with the median income demands Gmed, the highest demand, while individuals with incomes Y1 and Y2 demand G*, the median quantity demanded.
What constraints on the parameters of the model insure that the median voter result works? The answer can be derived by differentiating the demand function with respect to income and recognizing that the gross effect on demand of a change in income must be positive (or negative) everywhere, so that differentiating

43 Crowding can occur with respect to dimensions other than population. For example, Gramlich and Rubinfeld (1982) specify crowding in terms of both the income and the property tax bases. This allows for the possibility that the provision of public goods will vary among individuals of different income classes living in different size houses.
44 In this case log H = C0 + η log Y.
[Figure 4.1: distributions of desired public good levels G against income Y. Panel (a): the monotonic case, in which the median-income individual (at Ymed) demands the median quantity Gmed. Panel (b): a non-monotonic case, in which the median-income individual demands Gmed, the highest level, while individuals with incomes Y1 and Y2 demand G*, the median quantity demanded.]
(33) with respect to log Y yields:

d log G/d log Y = β2 + β3 d log PG/d log Y.  (34)

Now assume (as do Bergstrom and Goodman) that the public good is property tax financed and that there is a constant income elasticity of demand for housing, η (i.e. log H = C0 + η log Y). Since log PG = log(cp/B) + log H from (30), it follows that log PG is a constant plus η log Y. Then, substituting into (34), it follows that

d log G/d log Y = β2 + β3η.  (35)
This suggests one useful empirical test of the median voter model: the income and price elasticities of demand for public goods, β2 and β3, together with the income elasticity of demand for housing, η, must make β2 + β3η positive. If this condition does not hold, then all empirical results will be suspect.45

45 To the extent that empirical estimates are available, they do tend to support this necessary condition for the median voter model.
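As a back-of-the-envelope check of condition (35), using values in the range the literature reports (the specific numbers here are only illustrative, not estimates from any particular study):

```python
# Illustrative check of the monotonicity condition (35): the gross income
# effect on public good demand, b2 + b3*eta, must be positive.
income_elasticity = 0.6           # b2: income elasticity of demand for G
price_elasticity = -0.3           # b3: price elasticity of demand for G
housing_income_elasticity = 0.7   # eta: income elasticity of housing demand

gross_effect = income_elasticity + price_elasticity * housing_income_elasticity
print(f"d log G / d log Y = {gross_effect:.2f}")  # 0.39 > 0: condition holds
```

With elasticities of these typical magnitudes, the condition is comfortably satisfied; it would fail only if the price elasticity were large relative to the income elasticity.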
The "median voter" methodology has been used by a large number of authors to estimate demand functions for public goods. Results vary by type of database and expenditure item studied.46 However, a few general conclusions for education may serve to give the reader a flavor of the results. Income elasticities vary from about 0.4 to 1.3, but most are substantially less than 1. Price elasticity estimates are generally quite low, in the range of -0.2 to -0.4. While income elasticity estimates are reasonably comparable, price elasticity estimates are not, since empirical studies differ substantially in their definition of tax-price.47 Still, the estimates of the elasticities are consistent with the use of the median voter model.48 Additional support comes from the fact that most empirical evidence indicates that β4 is reasonably close to 1, a result consistent with the fact that education involves congestion (given a low price elasticity of demand). Nevertheless, the primary criticism of the median voter model is that other political theories are better able to explain public expenditure levels. Since Inman treats these issues in greater detail elsewhere in this volume, only a few issues are mentioned here. In many states a jurisdiction's expenditures are not determined by referenda. As a result, the roles of political parties, information, and voting become extremely important. Even in the context of referenda-determined expenditures, the median voter model requires strong assumptions. For example, it assumes that all eligible voters vote and that the suppliers of the public services, the boards of education or city councils, play an entirely passive role in determining the referenda agendas. Both Romer and Rosenthal (1978) and Denzau, Mackay and Weaver (1981) have suggested models in which agendas may not allow voters to choose between the median desired level and other alternatives.
Romer and Rosenthal argue that if the alternative to passage of the proposal is a reversion to a substantially lower level of spending, then voters may opt for a higher than median desired level. This reversion agenda setting argument is troublesome, since many communities offer a sequence of elections. In such a case even if one or two referenda fail, a final passage can ensure continued expenditures. In addition, the literature on agenda setting does not fully resolve the question of whether electorates would, 46
Reviews are given in Inman (1978) and Denzau (1975). See also the review of the literature on demand for education in Bergstrom, Rubinfeld, and Shapiro (1982). While most of the literature deals with demand in the United States, there has been similar evidence outside the United States. See Pommerehne and Frey (1976), Pommerehne (1978), and Pommerehne and Schneider (1978).
47 A useful summary of these comparative results is given in Bergstrom, Rubinfeld and Shapiro (1982).
48 Another troublesome problem with the median voter approach is that preferences must not only be single peaked, but also one dimensional. Indeed, recent work suggests that majority rule outcomes may be unstable in a more general setting. See McKelvey (1976, 1979) for a discussion. Deviations and variations from median voter outcomes which might arise are discussed elsewhere by Comanor (1976) and Hinich (1977). The best review of the median voter model is given by Romer and Rosenthal (1979b). See also Filimon, Romer and Rosenthal (1982), Romer and Rosenthal (1979a), and Ott (1980).
when aware of the agenda-setting activity of the local governments, rise up in opposition and either move to another community or vote out the people in power. Further development is needed of the reasons why limited entry to the political process and/or the high costs of mobility among jurisdictions may result in agenda setting. Romer and Rosenthal point out a number of empirical problems which suggest that demand function estimates may not provide strong support for the median voter model. One critique, which they call the "multiple fallacy", arises if, for example, all individuals desire twice the actual level of spending provided in the community. Given the model of equation (33), the only change in empirical estimates of demand parameters would be a change in the intercept. Since log 2G = log 2 + log G, the intercept in the equation with doubled demands would become log 2 + β1, i.e., log(2e^β1), but all of the other coefficients would remain unchanged. Why and under what conditions voters may be coerced into proportionally under- or overconsuming public services is an unanswered, but nonetheless troubling, question. A second empirical issue surrounds the use of median income as an explanatory variable. To the extent that income distributions are similar among communities, one cannot be sure that the significance of the median income variable in an estimated demand equation supports the median voter model any more than it would a mean voter model or, for that matter, a model associated with individuals in any "fractile" of the income distribution. Current results are consistent with the use of median income, but no one has provided (to my knowledge) a robust test which distinguishes between the median and the mean, for example. If the distribution of demands (or, more correctly, marginal rates of substitution) for public goods is nearly symmetric, then the mean-median distinction becomes a trivial one.
For asymmetric distributions, however, the difference can be important. As I discuss in Section 5.3, the median public good demanded would be the predicted outcome of majority rule voting. However, the mean demand would relate directly to the interjurisdictionally efficient provision of the good. With the mean substantially different from the median, majority rule is a political rule which leads to economically inefficient outcomes.
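The intercept-only character of the "multiple fallacy" discussed above is easy to verify in a simulation (synthetic data; the demand parameters are hypothetical):

```python
import numpy as np

# If every community provides exactly twice the demanded level, only the
# intercept of the log-linear demand regression changes (by log 2); the
# slope estimates are untouched, so the fallacy is invisible in elasticities.
rng = np.random.default_rng(2)
n = 300
logY = rng.normal(9.5, 0.4, n)
logG_demanded = 1.0 + 0.6 * logY + rng.normal(0, 0.05, n)
logG_provided = np.log(2) + logG_demanded   # log(2G) = log 2 + log G

X = np.column_stack([np.ones(n), logY])
coef_d, *_ = np.linalg.lstsq(X, logG_demanded, rcond=None)
coef_p, *_ = np.linalg.lstsq(X, logG_provided, rcond=None)
print(f"slope: {coef_d[1]:.3f} vs {coef_p[1]:.3f} (identical)")
print(f"intercept shift: {coef_p[0] - coef_d[0]:.3f}")  # = log 2 = 0.693
```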
4.3. Some additional empirical issues

The demand function given in (33) specifies a level of real services provided. However, data on public goods are usually given in terms of dollars of expenditure. In order to account empirically for the difference between expenditures and output, G must be multiplied by an index of the price of public services. With the log-log demand form, this is equivalent to including price as a separate right-hand
variable. Since price is unavailable, most researchers use the average wage rate of public employees as a proxy. However, this proxy may not be a very good one, because it does not include non-labor inputs to production and because the average wage may not account for the mix of labor quality. Of course, attempts to control for quality are often made. For example, in studies of education, quality can be proxied, although imperfectly, by the number of years of teacher experience or by the educational training of the teacher. Demand functions should be estimated for renters as well as homeowners. However, in the discussion to this point tax-price is calculated under the assumption that the median voter lives in the median-value home. In the unlikely case that the distribution of tax-prices facing renters is identical to that of homeowners, the median voter assumption may be satisfactory from an empirical point of view. Even if tax-prices differ, differences in other variables may be sufficient to compensate, so that demands are similar. For example, renters may face relatively low prices causing (other things equal) them to demand more of the public good. However, they may have fewer children or lower incomes, causing (other things equal) them to demand less. On net, it would be possible, although fortuitous, for renter demands and homeowner demands to be the same. However, most evidence suggests that renters' demands are different from homeowners' in a systematic way and that voter turnout in local referenda is substantially lower for renters than for homeowners.49 As a result, demand estimation that does not control for renters may lead to biased estimates of demand functions. Note that this problem is really due to aggregation, since with individual data one could estimate different demand functions for renters and homeowners.
In any case, the usual solution for the renter problem is simply to include as an explanatory variable the percent of renters in the community. This solution is suspect, since it implies that renters have the same price and income elasticities as homeowners and that their demands are multiples of one another. By far the greatest econometric attention has been focused on the specification of the price term in the demand equation. One issue arises because the inclusion of tax base in the denominator of the price term implicitly assumes that an increase in the tax base of the jurisdiction will lower the price of public services. Whether this is true for commercial and industrial property depends upon the assumptions one makes about both the nature of public service demands by industrial property and the ability of commercial and industrial property to shift the property tax, either forward in terms of higher prices, backward in terms of lower wages, or to consumers living outside the jurisdiction who purchase the products.
49 See Rubinfeld (1977), for example.
In response, Ladd (1975) has attempted to allow for different impacts of changes in the tax base on spending. To see how her procedure might work, recall that PG = cpH/B, where B is the residential tax base. Now let C represent the commercial and industrial tax base. We can account for various treatments of commercial and industrial base by writing

PG = [cpH/B][1 − a(C/(B + C))].  (36)
If a equals 0, then commercial base does not alter tax-price (there is no shifting), while if a is greater than 0, commercial base lowers tax-price. For example, when a = 1, PG = cpH/(B + C). Ladd's empirical estimates of a are in the range of 0.6, suggesting that there is some forward shifting of the commercial and industrial base. The specification of the price variable also depends on the nature of grants from higher levels of government. If all grants are matching and open-ended (no budgetary limitation), then the price term becomes cpH/[(1 + m)B]. The price variable is simply recalculated and the demand equation estimated as before. However, some aid is non-matching, while a number of matching grants are closed-ended, with communities opting to receive all available aid. In either case the grant does not alter the price of the public good on the margin; rather, it serves to increase community wealth. As a result, the magnitude of closed-ended matching grants and non-matching grants should be viewed as additions to income, since they increase the consumption possibilities of the median voter. To the extent that the presence of these grants is capitalized into land values, however, both individual and community tax base may change, as may the tax-price facing the median voter.50 While the specification just suggested is theoretically correct, empirical evidence suggests that grants of a non-matching nature may have a greater impact on spending than would ordinary income. To account for this, many authors include a separate grants variable in the demand equation, so that the effect of an incremental dollar of income need not equal the effect of an additional dollar of non-matching grant. Most empirical estimates of demand functions assume that the distribution of tastes of voters is not different from that of non-voters. Evidence on this issue is quite limited, despite the fact that a number of socioeconomic variables are known to affect turnout. However, one must control simultaneously for all voter attributes when looking at differences in demands. One attempt to do this [Rubinfeld (1980)] provided a weak test that supports the median voter hypothesis. Rubinfeld considered three alternative models of non-voter behavior. The
However, one must control simultaneously for all voter attributes when looking at differences in demands. One attempt to do this [Rubinfeld (1980)] provided a weak test that supports the median voter hypothesis. Rubinfeld considered three alternative models of non-voter behavior. The
50 This statement is not quite correct, as discussed in Gramlich (1977). See Section 6 for more detail about the grants variables.
612
D.L. Rubinfeld
first hypothesis was that non-voters are alienated because their demands are either substantially lower or substantially higher than the level actually provided. A second hypothesis was that non-voters are indifferent in the sense of being content with current spending levels and thus do not find voting worthwhile. The third is that the distributions of voter and non-voter demands are no different, so that the choice to vote is essentially random and independent of tastes. The study concluded that the random hypothesis could not be rejected, while the others could. But, because the study was limited to one jurisdiction in the State of Michigan, the results may not generalize. In studying public employee turnout and voting behavior in Michigan, Gramlich, Rubinfeld and Swift (1981) found the contrary: their work suggested that non-voters are opposed to tax limits and generally support public spending relative to those who voted. In addition, Gramlich and Rubinfeld (1982) found that public employees with greater tastes for public goods voted at a much higher rate than private employees. Whether the turnout rates reflect differences in underlying tastes or the possibility of earning rents is difficult to say.
4.4. Estimating demand functions in a Tiebout world

The approach to estimating demand functions just described is based on a median voter approach to the within-jurisdiction determination of the provision of local public goods. No recognition has been given in the previous discussion, and in much of the literature, to the fact that Tiebout-related migration might be occurring at the same time. Goldstein and Pauly (1981) raise this important issue, suggesting that the presence of interjurisdictional migration can lead to biased parameter estimation if demand function estimation is carried out as suggested. The following relatively brief discussion of the issue borrows directly from their paper. In Section 4.1 I suggested that it would be appropriate to estimate a demand function of the form given in equation (33), repeated here:

log G = β1 + β2 log Y + β3 log PG + β4 log N + β5 log Z + e.   (33)
In the classic normal regression model the right-hand explanatory variables which determine public spending G are assumed either to be fixed (exogenous) or, more realistically, to follow a probability distribution that is independent of (or uncorrelated with) the disturbance term e. The dependent variable, log G, and the disturbance, e, are assumed to be normally distributed random variables. The Tiebout process which brought about the distribution of households and public spending demands is one in which individuals search and move on the basis of the level of public spending. In general one would expect that migration
based on public sector demands would lead to changes in the distribution of income in the communities from which individuals leave and to which individuals move. If that change in distribution alters the income of the median voter, then demand estimation will be biased. Broadly speaking, this bias results from the fact that the demand and supply of public goods are being simultaneously determined. Consistent estimation should properly take into account the fact that while changes in income cause changes in public spending, changes in public spending (through migration) cause changes in income. The same argument can be made with respect to tax-price as well, but the main point should be clear. Except in unusual circumstances, if individuals sort themselves on the basis of public goods provision, least-squares single-equation estimation of demand functions will yield biased and inconsistent results.

The obvious solution to the problem of bias is to correctly specify a two (or more) equation system which determines income and public spending jointly and, one hopes, allows the demand equation to be identified. However, this is easier said than done, since identification requires information about the supply or migration relationship either in the form of a variable that affects migration but not public spending determination (i.e. affects Y, but not G) or in the form of a restriction on the cross-equation error covariance, a restriction that may have little or no economic interpretation. Research in this area, both theoretical and empirical, is currently underway, as in Rubinfeld, Shapiro and Roberts (forthcoming).

If one is willing to make further assumptions about the nature of the Tiebout process, then it is possible to determine the direction of the bias which results when least-squares estimation is used, and in the process to get some intuition about how an unbiased or at least consistent estimator can be determined.
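The direction-of-bias argument can be made concrete with a small simulation. This is my own construction in the spirit of Goldstein and Pauly, not code from the literature: individuals with demand G = a + bY + e sort perfectly across communities on the basis of their demand, and the econometrician then regresses community spending on community median income, overstating the income coefficient b because the taste disturbance e is positively correlated with the regressor after sorting.

```python
import numpy as np

rng = np.random.default_rng(0)

a, b = 1.0, 1.0                          # "true" demand parameters (invented)
n, J = 200_000, 100                      # individuals, communities
Y = rng.normal(0.0, 1.0, n)              # income
e = rng.normal(0.0, 1.0, n)              # taste disturbance
D = a + b * Y + e                        # desired spending

# Perfect Tiebout sorting: community j holds the j-th quantile of demands.
order = np.argsort(D)
groups = np.array_split(order, J)
G_med = np.array([np.median(D[idx]) for idx in groups])  # community spending
Y_med = np.array([np.median(Y[idx]) for idx in groups])  # community median income

# Least-squares slope of community spending on community median income.
slope = np.polyfit(Y_med, G_med, 1)[0]
print(f"true b = {b}, estimated slope = {slope:.2f}")    # well above the true b
```

With equal income and taste variances, the estimated slope is roughly b + var(e)/(b·var(Y)) = 2, twice the true coefficient: the sorting-induced truncation described below is what a joint model of migration and spending would have to undo.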
Assume (following the spirit of Goldstein-Pauly) that there are a number of communities containing individuals sorted on the basis of their demands for public goods. To simplify the analysis, assume also that income is the only determinant of the demand for public goods. The income elasticity of demand is also assumed to be positive. Suppose that the demand for public goods is given by G = α + βY + e, and concentrate on the lowest income community. If all low income individuals resided in the community, then we would have reason to believe that omitted variables would cause the error term e to be symmetrically and perhaps normally distributed. However, consider what happens when individuals sort themselves on the basis of their demands for the public good. All those with relatively low demands, i.e. with disturbance e less than zero, will choose to locate in the low income, low spending community, since there is no lower spending alternative. However, the choice is different for those with a positive disturbance. Some of them will choose (if the disturbance is large enough) to locate in the higher spending jurisdiction in which residents with somewhat higher incomes live. This suggests
that the error distribution associated with the low spending jurisdiction will become truncated, with the truncation being determined on the basis of individual demands for public goods. Such a truncation or grouping of data leads to bias when the grouping is based on the values of the dependent variable, spending on public goods. In this particular case the bias arises because the error term is positively correlated with income - high error term individuals are more likely to live in higher income, higher spending jurisdictions than those with negative error terms. Of course, the opposite story would hold were we to discuss the sorting of individuals at the high end of the income distribution. Here negative errors would be associated with lower incomes. On balance, least-squares estimation in this truncated model will lead to overestimates of the income elasticity of demand for public goods. The cause of the misestimation is the fact that the individual with the median income will not be correctly identified when demand estimation is done using aggregated data. For example, one is likely to incorrectly think that the individual whose demand determines the level of spending in the low income community is the individual with the median income in that community. If the Tiebout process works along the lines just described, what then is wrong with the estimation procedure suggested by Bergstrom and Goodman? One answer is that the assumption of proportionality of income no longer holds. In the community providing the lowest level of G, the income of the individual with a zero disturbance value will be less than the median income of those in the community. This follows because the community will contain a number of individuals with relatively high incomes but low demands for the public good (since those with low incomes and relatively high demands moved out). The reverse will be true for a community providing a high level of the public good.
The distribution of income in that community will be skewed to the left, rather than to the right as in the low income community. The distribution of income in the high income community will as a result not be a multiple of the distribution in the low income community - the third as well as the first moment of the distribution of income will have changed.

5. Applications of the demand approach

5.1. Alternative methods for estimating the demand for local public goods
The median voter model is the most popular, but not the only approach to estimating demand functions for public goods. While a number of alternatives have been tried, we will focus on two that seem particularly intriguing. The first approach uses micro (individual) survey data as opposed to macro or aggregated
data. It should be noted that voter surveys are subject to the criticism that the answers do not reflect true preferences for public goods because of the hypothetical nature of the questions asked.51 The ability to estimate demand functions using survey data depends upon the nature of the community or communities being studied. If there is no expenditure variation among communities being studied, demand functions cannot be estimated. With expenditure variation, the estimation process becomes feasible. To see how this approach might work, consider a referenda voting example.52 The alternatives in a local millage election may be to increase local spending or to keep it at the current level of spending. Assuming that the millage increase is very small, a "yes" vote is a revealed preference for greater spending and a "no" vote for lesser spending (it is assumed that everyone votes). Obviously, public spending, to which voters respond, is the same for all individuals in a jurisdiction. In Figure 5.1, panels (a) and (b) illustrate two distributions of desired demands for public goods within the same community. For the distributions of both panels (a) and (b), survey studies of voter behavior in a millage election should show similar results. Roughly half of the voters would state that they voted yes and half would state that they voted no, whether the distribution of demands had a high variance or a low variance. However, the two distributions are likely to be associated with different demand functions for public goods. For example, if demand is monotonically related to income and both communities have identical income distributions, then the demand function in panel (b) would have a higher income elasticity of demand than the demand function in panel (a) (given identical tastes). Unique income elasticities and other demand parameter estimates cannot be obtained when actual expenditures do not vary.
To see how one might estimate a demand function when expenditures do vary, examine Figure 5.2, in which panels (a) and (b) describe two different distributions of demands for public goods, panel (b)'s distribution having a greater variance. Suppose that survey data are available for two communities, one with per capita spending of $1000 per pupil and the other with $1500 per pupil. In the first community 50 percent of the voters say that they want more spending and 50 percent say they want less. At this point either the distribution in panel (a) or panel (b) could represent the underlying demand for public goods. Now include information from the voting in the second community. Assume that 20 percent of the population say they want more spending and that the underlying characteristics of the population (income, price, etc.) are identical in the two jurisdictions. Then, given a normal distribution of desired demands, only the distribution in panel (b) is consistent with the survey responses in both communities.
51 A great deal of attention in local public finance has been placed on the notion of designing schemes to get people to reveal their preferences correctly. For some examples of attempts to derive demand-revealing preference techniques, see Tideman and Tullock (1976).
52 See Rubinfeld (1977) as well as Akin and Lea (1982), Fischel (1979), and Citrin (1979).
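The identification logic of the two-community example can be verified with a short calculation of mine (the numbers follow the example in the text; normality of desired demands is the assumption stated there, and the threshold parameter introduced later is ignored for simplicity): the share wanting "more" at each observed spending level pins down both the mean and the standard deviation of the distribution of desired demands.

```python
from statistics import NormalDist

# At G = $1000 per pupil, 50% want more spending; at G = $1500, 20% want more.
# If desired demands G* ~ Normal(mu, sigma), then P("more" at G) = P(G* > G).
mu = 1000.0                         # P(G* > 1000) = 0.5  =>  mu = 1000
z80 = NormalDist().inv_cdf(0.80)    # (1500 - mu)/sigma must equal the 80th-percentile z
sigma = (1500.0 - mu) / z80
print(round(sigma, 1))              # about 594: both moments are identified
```

With data from only one community the first line alone is available, so sigma, and hence the demand parameters that depend on it, would remain unidentified, which is the point made in the text.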
Figure 5.1. Panels (a) and (b): two distributions of desired demands G* within the same community, with the same median but different variances.
Obtaining survey information from more than one community enables one to estimate a demand equation, because it gives information not only about the mean of the distribution of desired demands but also about the variance. One expects that the greater the variance in spending levels among communities surveyed, the more accurately we can estimate the demand for public goods. Using survey data collected by Courant, Gramlich and Rubinfeld (1979), Bergstrom, Rubinfeld and Shapiro (1982) analyze school expenditure demands. Respondents were asked to state whether they preferred more, less, or the same spending on public schools, after being given information about the characteristics of schools, public spending on schools, and the tax costs of voting for higher spending. The qualitative responses are used to identify demand functions in the following manner.53
53 This approach is similar to an approach taken earlier by Rubinfeld (1977) and by Shapiro (1975), and also relies heavily on the qualitative choice literature of McFadden (1973).
Figure 5.2. Panels (a) and (b): two distributions of desired demands for public goods, with panel (b)'s distribution having the greater variance; spending levels of $1000 and $1500 per pupil are marked on the horizontal (G) axis.
Assume that the demand for public goods is given by54

log G*i = log Di - log ei.   (37)

Here Di summarizes the non-random portion of the individual's demand function, G*i is the desired level of the public good, and ei is a random error term. (The negative sign is solely for convenience.) Let Gi equal the public output in the individual's jurisdiction, and let d, a number greater than 1, be a threshold parameter that accounts for substantial differences between desired and actual output and triggers an individual response of dissatisfaction.55 Responses for more spending or a vote for a millage, less spending or a no vote, or the "same" spending or a
54 The fact that responses are made with error is not a problem as long as the random error is not systematically related to some of the explanatory variables.
55 d can vary by individual without substantially complicating the analysis.
decision not to vote occur when:

(i) (more) G*i > dGi,
(ii) (less) G*i < Gi/d,
(iii) (same) Gi/d < G*i < dGi.   (38)
Taking logarithms of (38) and substituting from (37) yields the following:

(i) (more) log ei < log Di - log d - log Gi,
(ii) (less) log ei > log Di + log d - log Gi,
(iii) (same) log Di + log d - log Gi > log ei > log Di - log d - log Gi.   (39)
The key to econometric estimation of the micro demand model is to realize that the probability that (39, i) holds is given by the cumulative distribution function of the density of log ei, evaluated at (log Di - log d - log Gi). Similarly, the probability that (39, ii) holds is equal to one minus the cumulative distribution function, evaluated at (log Di + log d - log Gi), and finally, the probability that (39, iii) holds is equal to one minus the sum of the two previous values. The three numbers represent the probability or likelihood that an individual with a given demand function will respond more, less, and the same. Demand function parameters are estimated by maximizing the value of the joint likelihood function obtained by multiplying together the probability of an individual making a particular response over all individuals in the sample. A number of alternative statistical assumptions are possible, but Bergstrom, Rubinfeld and Shapiro chose the multiple logit or logistic model (the ordered logit, to be specific) for estimation purposes.56 To see how the approach works, assume that the systematic portion of the demand function is given by log Di = β0 + β1 log Yi + β2 log PGi, and that the variance of the error term is σ². Then each of the equations in (39) can be written in terms of the underlying demand parameters. For example, (39, i) now becomes

log ei < (β0 - log d) + β1 log Yi + β2 log PGi - log Gi.   (40)
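The maximum-likelihood recipe just described can be sketched as a small simulation. This is my own illustrative code, not that of Bergstrom, Rubinfeld and Shapiro: the data and parameter values are invented, the error is logistic with scale σ, and the response probabilities follow (39)-(40). Note that σ is identified only because log Gi enters with a known coefficient of -1 and varies across jurisdictions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic CDF

rng = np.random.default_rng(0)
n = 5000
logY = rng.normal(10.0, 0.5, n)     # log income
logP = rng.normal(0.0, 0.3, n)      # log tax-price
logG = rng.normal(7.0, 0.4, n)      # log actual spending, varies by jurisdiction
b0, b1, b2, logd, sigma = 1.5, 0.6, -0.4, 0.2, 0.5   # "true" parameters (invented)

logD = b0 + b1 * logY + b2 * logP
loge = sigma * rng.logistic(size=n)                  # scaled logistic error
y = np.where(loge < logD - logd - logG, 2,           # "more", eq. (39, i)
             np.where(loge > logD + logd - logG, 0, 1))   # "less" / "same"

def negll(theta):
    b0, b1, b2, logd, sigma = theta
    z = (b0 + b1 * logY + b2 * logP - logG) / sigma
    d = logd / sigma
    p_more = expit(z - d)                 # P(more), CDF at (40)
    p_less = 1.0 - expit(z + d)           # P(less)
    p_same = 1.0 - p_more - p_less        # P(same)
    p = np.select([y == 2, y == 0], [p_more, p_less], p_same)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize(negll, x0=[1.0, 0.5, -0.5, 0.1, 1.0], method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-10, "xatol": 1e-8})
print(res.x)   # should be close to the true (b0, b1, b2, log d, sigma)
```

Had the likelihood instead been parameterized in the scaled form (41), one would recover the elasticities as minus the ratios of the log Y and log PG coefficients to the log G coefficient.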
Because of the normalization of the logistic distribution, the statistical output will usually include coefficients that are deflated by the standard deviation of the
56 For a detailed discussion of the logit model, see Pindyck and Rubinfeld (1981, ch. 8).
error term.57 Since d is constant but G varies across jurisdictions, the right-hand side of the estimated equation, excluding the constant term, is as follows:

(β1/σ) log Yi + (β2/σ) log PGi - (1/σ) log Gi.   (41)
The dependent variable (not written) is the logarithm of the likelihood of being in a given group (more, the same, or less). The constant term will be a somewhat complex function of β0, σ, and d, but is not of immediate concern here. The price elasticity of demand, β2, is simply the negative of the ratio of the coefficient of log PGi to the coefficient of log Gi, while the income elasticity is the negative of the ratio of the coefficients of log Yi and log Gi. Note that the coefficient of log Gi provides an estimate of the reciprocal of the standard deviation of the error term and thus tells how accurately one can distinguish between alternative demand functions. If Gi does not vary, such as when data are from a single jurisdiction, one cannot estimate the error variance, and thus the parameters of the demand function cannot be determined.58 When this technique was applied to the Michigan survey data, the results were generally comparable to those obtained using aggregate data. Individual price elasticities were similar to those from median voter models, lending support to both approaches. Of course, survey data are more expensive to obtain than the data for cross-sectional median voter studies, but survey data have more detail about voter characteristics than aggregate sources. In addition, the micro results were informative about the behavior and attitudes of individuals according to items such as race, income, occupation, religion, age, and family composition.

A very different approach to demand estimation uses the fact that with capitalization the price of housing and land reflects individuals' valuations of all of the attributes that are associated with housing, including the level of public goods. Assume that the supply of land and/or new communities is not perfectly elastic, so that capitalization is maintained even after supply responds.
Then, under reasonable conditions one can obtain information about preferences for public goods from variations in property values across communities. This hedonic approach has been emphasized by Rosen (1974) and evaluated empirically by a long list of authors, most of whom have focused on environmental attributes rather than on public goods.59 To develop the hedonic method I alter somewhat the utility maximization approach used earlier in this chapter. Rather than viewing housing and the public
57 Log ei/σ is unit logistic, since it has mean zero and unit scale.
58 The constant term is a bit more difficult and depends on the nature of the computer program. In general, however, we can obtain separate estimates of the constant β0 and the threshold parameter d. See Bergstrom, Rubinfeld and Shapiro (1982) for details.
59 See, for example, Harrison and Rubinfeld (1978) and Nelson (1978).
good as distinct commodities, I assume that housing is a commodity which itself consists of a series of attributes, one of which is the level of the public good available in the community. Other attributes might include structural characteristics of the house itself (house size, number of bathrooms, etc.), neighborhood characteristics (crime rate, socioeconomic status of the community, etc.) and accessibility characteristics (distance to work, distance to shopping areas, etc.). To distinguish this view of housing from the previous one, I let h(G, a) represent the housing bundle, where G is the level of the public good, and a represents the vector of all other housing attributes. As before, p represents the price of housing, but now p(h) is a function of the housing attribute vector. The p(h) function might be linear as has been assumed previously, but it might be non-linear as well, depending upon one's assumptions about the extent to which housing attributes can be combined into housing bundles. The utility-maximization problem is now:

maximize U(X, h(G, a)) subject to Y = X + p(h).   (42)

The first-order conditions with respect to G and to X are:

(i) (∂U/∂h)(∂h/∂G) = λ(∂p/∂h)(∂h/∂G),
(ii) ∂U/∂X = λ.   (43)
Combining (43, i) and (43, ii) yields:

W = (∂U/∂G)/(∂U/∂X) = ∂p(h)/∂G.   (44)
The left-hand side of (44) is the marginal rate of substitution between the public and private good, while the right-hand side is the incremental change in house value associated with an increase in the level of the public good. In other words, (44) tells us that willingness-to-pay for the public good, W, can be measured by differences in house values, assuming that all other differences in housing attributes are held constant. Once information about the willingness-to-pay is known empirically, the demand function will also be known. The direct link occurs because the marginal rate of substitution is itself the compensated (equal utility) demand curve for the public good, which differs from the usual uncompensated demand function solely because of income effects. The hedonic approach depends crucially upon the presence of capitalization. As a result it cannot be applied in the case of a long-run Tiebout model in which capitalization is not present. In the short run or intermediate run, however, with
housing and public goods choices somewhat limited, housing price differentials should reflect differences in the provision of public goods. There are a number of possible ways to proceed, but the most direct is to first estimate the hedonic housing price function p(h). In practice the capitalized value of the property (PV) will often be measured rather than the annual price p, so that one must be careful to interpret the results as total willingness to pay, rather than an annual figure. Using PV as the dependent variable, a regression of the following form might be estimated:

log(PV) = β0 + β1 log(G) + β2 log(a) + ε,   (45)
where ε is a random error term. With this particular formulation it follows directly that

∂(log PV)/∂(log G) = β1.   (46)

Now, using the fact that ∂(log X) = (1/X)∂X, (46) becomes

W = ∂(PV)/∂G = β1(PV/G).   (47)
Thus, the marginal increment to house value from higher public service provision is equal to β1 times the ratio of the property value to public spending. If the market is in equilibrium, then this number represents individuals' marginal willingness to pay for incremental amounts of the public good. Furthermore, if all consumers are identical, W is the marginal rate of substitution of public for private goods and characterizes the demand function for public goods. The hedonic approach has its difficulties. In general one must worry about the confounding effects of supply and demand. For example, assume that the level of the public good in a community is determined in part by the income of the median voter. In this case income serves as a demand variable. However, assume also that the median voter's income arises from his or her wage rate as a public employee. Then income also represents a supply variable, the cost of producing the public good. Unless the supply and demand explanations can be sorted out, both theoretically and econometrically, the income elasticity of demand for the public good cannot be determined. To illustrate how the approach is applied, consider the easiest case in which supply is fixed and not determined by income or other attributes. One can estimate equation (48):60

log W = α0 + α1 log G + α2 log Y + ε,   (48)
60 The more general analysis that worries about supply and demand is suggested in Rosen (1974) and attempted by several other authors, such as Harrison and Rubinfeld (1978).
which represents an inverse demand function, since W is the marginal valuation or price associated with the public good and G is the level of the public good. The estimate of the parameter α1 tells how individual valuations of the public good vary with the level of that good, while the parameter α2 tells us how valuations vary with income. From these parameters, the standard price and income elasticities of demand can be calculated (assuming sufficient variation in G). For example, the income elasticity of demand would be

(∂G/G)/(∂Y/Y) = ∂(log G)/∂(log Y) = -[∂(log W)/∂(log Y)]/[∂(log W)/∂(log G)] = -α2/α1.   (49)
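The mechanics of equations (45)-(48) can be sketched numerically. This is an illustrative construction of mine, not an empirical claim: the hedonic surface, all parameter values, and the independence of income from the other variables are invented, so the second stage here simply reproduces the mechanical relation log W = log β1 + log PV - log G rather than a behavioral income effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# Synthetic cross-section (all numbers invented for illustration).
logG = rng.normal(np.log(1200.0), 0.25, n)    # log public spending
logY = rng.normal(np.log(40000.0), 0.30, n)   # log income (independent here)
loga = rng.normal(0.0, 0.20, n)               # composite "other attributes" index

# Assumed hedonic surface, eq. (45): log PV = b0 + b1 log G + b2 log a + eps.
b0, b1, b2 = 8.0, 0.35, 0.50
logPV = b0 + b1 * logG + b2 * loga + rng.normal(0.0, 0.05, n)

# Stage 1: estimate the hedonic price function by least squares.
X1 = np.column_stack([np.ones(n), logG, loga])
beta = np.linalg.lstsq(X1, logPV, rcond=None)[0]

# Eq. (47): marginal willingness to pay W = beta1 * PV / G for each household.
W = beta[1] * np.exp(logPV) / np.exp(logG)

# Stage 2, eq. (48): regress log W on log G and log Y (inverse demand).
X2 = np.column_stack([np.ones(n), logG, logY])
alpha = np.linalg.lstsq(X2, np.log(W), rcond=None)[0]

print(beta[1])    # close to the assumed b1 = 0.35
print(alpha[1])   # close to b1 - 1 = -0.65 in this mechanical example
```

In a genuine application the income coefficient α2 would reflect how valuations shift with Y, and the simultaneity caveats in the text (income as both demand and supply variable) would apply before interpreting -α2/α1 as an elasticity.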
The hedonic approach to valuing non-market attributes is questionable if people have limited information, if supply is somewhat elastic, or if markets do not equilibrate. In addition, proper specification of the functional form is crucial for empirical work.61 However, most of these problems are common to any demand estimation approach. And the hedonic method provides an interesting alternative approach to the estimation of demand functions which can be carried out and evaluated without unreasonable data collection needs. How this approach, as well as the previous ones, can be of value as a prescriptive as opposed to descriptive device is the subject of the following section.

The micro approach can provide one solution to the Tiebout bias problem discussed earlier. With micro data one no longer needs the assumption of Bergstrom and Goodman concerning proportionality of the income distribution. Or, more generally, one need not worry about the problem of finding the income and other attributes of the individual whose demand for public spending is the median demand of those within the community. However, the Tiebout bias problem does not completely go away, even with micro data. If individuals sort themselves on G, then the simultaneity problem still exists, and a multiple equation, rather than single equation, approach may be appropriate.
5.2. Will the median level of services provided be efficient?

Demand functions for public goods have a number of uses, both positive (predictive) and normative. In this section one of the normative uses of the
61 Note also that I am not arguing that changes in property values themselves measure peoples' willingness to pay, but solely that information from property values can be used to infer demand functions. For a discussion of this issue, see Polinsky and Rubinfeld (1977). See also Brown and Rosen (1982) and Scotchmer (1986).
estimated demand functions is pursued - the question of the efficiency of the median-voter-determined level of public good provision. Whether the empirical approach is micro or macro, and whether one works with demand or utility functions, is not a central concern here. Of course the median voter determination of public output is a positive, not a normative result. And, not surprisingly, the median voter demand is generally inefficient for a number of reasons. First, if individual tastes and incomes vary, a Lindahl equilibrium is possible in which each individual's desired level of the public good is equal to the level provided. However, even under these conditions, sorting of individuals among jurisdictions can lead to interjurisdictional efficiency gains. Second, given the current distribution of individuals, the willingness-to-pay for public output, net of costs, is not necessarily maximized. It is this second, intrajurisdictional inefficiency that is of concern here. For an example of how this second type of inefficiency arises, recall that the marginal rate of substitution is a diminishing function of the level of public goods demanded (other things equal). If the utility function U is Cobb-Douglas, i.e. U = ln X + α ln H + β ln G, then MRS = (∂U/∂G)/(∂U/∂X) = βX/G, which is monotonically decreasing in G. Now consider the distribution of marginal rates of substitution of public for private goods at the current level of spending. Individuals with an MRS less than that of the individual who desires the current level of spending will desire less spending, while individuals with MRS greater than the current MRS of the median voter desire more spending. If the distribution of MRS is not symmetric, then the median voter outcome is inefficient. Consider a distribution of MRS which is highly skewed to the right, as in Figure 5.3. The mean of such a distribution is greater than the median.
However, the median represents the level of public services demanded by the median voter
Figure 5.3. A distribution of MRSi skewed to the right: the mean lies above the median.
and thus the outcome of a majority rule referendum, while the mean is the more efficient level of provision, since it weights preferences by willingness to pay. Of course, this formulation assumes that there are no side payments made, no logrolling, and no package deals. This efficiency requirement is derived from Samuelson's condition that ΣMRSi = MRT. Dividing both sides of the equality by the number of individuals in the community, N, yields ΣMRSi/N = mean MRS = MRT/N. The left-hand side of the equation is the mean level of the MRS, and the right-hand side is the cost of an additional unit of public goods divided by the community population. If public goods are paid for with a head tax, i.e. an equal tax on all individuals, then the cost to an individual is MRT/N. For the individual with the mean marginal rate of substitution, his or her valuation of public goods equals his or her cost. Thus, to attain an efficient solution the preferences of the "mean" voter, not the median, should determine the referenda outcome. The "median" and "mean" outcomes will be equivalent when the distribution of MRS is symmetric.62 When a property tax is substituted for a head tax, the efficiency conditions are more complex. However, the concept remains the same: efficient provision of public goods focuses on the mean of the marginal rates of substitution, not the median.63 Empirically, means and medians are difficult to distinguish, but there is a viable test. One fits a distribution to the marginal rates of substitution across the population and tests whether the mean equals the median, using a non-parametric test.64 This test can also be used to see whether government is too large. If the sum of the marginal rates of substitution is less than the marginal rate of transformation, then overspending must have occurred. This discussion is based, however, on a rather simplistic view of government.
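The mean-versus-median comparison can be implemented with a simple resampling check. This sketch is mine, not from the chapter: the MRS values are drawn from an invented right-skewed (lognormal) distribution, and a bootstrap confidence interval for the mean-minus-median difference stands in for the non-parametric test mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented right-skewed distribution of individual MRS values.
mrs = rng.lognormal(mean=0.0, sigma=0.6, size=500)

diff = mrs.mean() - np.median(mrs)   # positive under right skew

# Bootstrap the mean-median difference.
boot = []
for _ in range(2000):
    s = rng.choice(mrs, size=mrs.size, replace=True)
    boot.append(s.mean() - np.median(s))
lo, hi = np.percentile(boot, [2.5, 97.5])

# If the interval excludes zero, the mean and median MRS differ, and the
# median-voter outcome is inefficient in the sense described in the text.
print(f"mean - median = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Under the head-tax efficiency condition, a mean MRS above the median implies that the referendum outcome, which reflects the median, underprovides the public good relative to the willingness-to-pay optimum.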
A more complete analysis distinguishes at least three different behavioral models that result in non-optimal public spending levels. First is the model of Niskanen (1971) in which the level of public spending is greater than that desired by the median voter (and presumably greater than the efficient level) because budget decision-makers have bureaucratic power. In a sense this argument is a generali62 Note that a symmetric distribution of MRS is not the same as a symmetric distribution of demands, and that there will be a distribution of MRS for each level of spending possible. Consider the following example from Bergstrom (1979). If U = In X i +a In G,, c is the marginal (= average) cost of producing the public good, and a head tax is used, then G* = a/(1 + a)(C/N)Y. The median demand is then G*ed = a(1 + a)(c/N)Ymed. The MRS of each individual can be calculated to be: MRSi(Gea) = (Y, - (c/N)Gmed)/Gm*ed. Note, however, that the person with income Ymed and median MRS gets to consume his or her most preferred level of public good, Gm*,ed Since he or she is optimizing, his or her MRS is equal to the relative price of public goods, which is (c/N) under the head-tay scheme. It follows that the median MRS (evaluated at G,,d) = c/N. Finally, the mean MRS = (1/N) MRSi(Gd) = [(1 + O)(Y/Ymed) - a]. If Ymed = Y, the median outcome is efficient, otherwise it is not. 63For a well-written extension of the head tax case, see Barlow (1970). For a detailed discussion of these efficiency conditions including the case of an income or wealth tax and their implications, see Bergstrom (1979). See also Akim and Youngday (1976). 64Such a test was used in a different situation to test whether voter and non-voter demands were equivalent. See Rubinfeld (1980).
Ch. 11: Economics of the Public Sector
625
zation of the Romer-Rosenthal agenda-setting view. Bureaucrats want larger government output but are not particularly concerned about higher wages for themselves or their employees. The second model assumes that the wages of public employees are higher than they would be if they were determined competitively. This results in a tax-price which is above a competitively determined tax-price: public budgets are higher than efficient budgets because public employees are earning rents. Presumably, these rents are the result of collective bargaining or of the political power of public sector unions. This is the model considered in Courant, Gramlich and Rubinfeld (1979) and allowed for in the budgeting model of Inman (1981). In the papers by Courant, Gramlich and Rubinfeld, and Inman, there is limited mobility. Public employee unions have some power to set both wages and output, but must trade off the advantages and disadvantages of each. Higher output, for example, may mean higher employment and greater bargaining power, while higher public wages may mean less output and less bargaining power. Courant and Rubinfeld (1982) show that even in the competitive world of the Tiebout model with unlimited mobility, the threat of public unions extracting rents is a real one.65 A third way in which government may be too large and inefficient is that it may not be producing on the production frontier. Except for Fiorina and Noll (1978), there has been essentially no literature on the mechanisms by which such inefficiency may arise. One possibility is that government is perceived to be too large because there is waste due to the use of an inferior technology. What are the uses of these alternative models? They have important positive implications for our ability to predict the effects of policy changes on the local public sector, and normative implications about the appropriate use of constitutional and statutory limitations on government.
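The Bergstrom (1979) example in footnote 62 can be verified numerically. The sketch below is illustrative only: the utility function and head-tax financing follow the footnote, but the parameter values (α, c, N) and the lognormal income draw are assumptions.

```python
import numpy as np

# Numerical check of the Bergstrom (1979) example in footnote 62, with
# U_i = ln X_i + alpha*ln G and head-tax financing as in the footnote;
# parameter values and the income distribution are illustrative.
alpha, c, N = 0.5, 2.0, 1000
rng = np.random.default_rng(1)
incomes = rng.lognormal(mean=10.0, sigma=0.3, size=N)  # skewed: mean > median
y_med, y_bar = float(np.median(incomes)), float(incomes.mean())

# Preferred level for income Y: G* = [alpha/(1+alpha)] * (N/c) * Y, so the
# median-income voter's demand wins a majority-rule vote.
g_med = alpha / (1 + alpha) * (N / c) * y_med
tax_price = c / N

# MRS_i at the voted level: alpha * X_i / G with X_i = Y_i - (c/N)*G.
mrs = alpha * (incomes - tax_price * g_med) / g_med

# The median MRS equals the head-tax price c/N, while the mean MRS equals
# (c/N)*[(1+alpha)*(Ybar/Ymed) - alpha], which exceeds c/N when mean
# income exceeds median income.
assert np.isclose(np.median(mrs), tax_price)
assert np.isclose(float(mrs.mean()), tax_price * ((1 + alpha) * (y_bar / y_med) - alpha))
```

Because lognormal incomes are right-skewed, the mean MRS exceeds the tax price and the median-voter outcome under-provides the public good, exactly as the footnote's algebra implies.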
As an example, Courant and Rubinfeld (1982) suggest that a limitation on the total spending level of the public budget may or may not be an effective policy, depending upon what causes the inefficiency in the public sector. They suggest that the "Niskanen-like" inefficiency can be removed, at least in part, by an expenditure limitation, but that the rent earning ability of public employees is likely to be enhanced rather than diminished under such a reform.

6. Fiscal federalism

6.1. The responsibilities of local government

The analysis of the provision of local public goods has shown us that with fixed jurisdictional boundaries and interjurisdictional externalities an efficient outcome65
65 The same general result is given in Epple and Zelenitz (1981).
626
D.L. Rubinfeld
cannot be assured. However, it is possible to eliminate the inefficiencies associated with these interjurisdictional externalities in a model which is generalized to allow for multiple levels of government. Thus, it is natural to complete our overview of local public economics by briefly reviewing some of the problems and concerns associated with a federalist system. The analysis of federalism includes both normative and positive issues, but our focus will be primarily on the normative. This choice is made solely because of space limitations; the positive analysis of federalism is one of the more interesting areas of recent research in public finance. The central issue of federalism is whether certain goods and services or government responsibilities can best be provided and financed at the federal, state, or local level. This three-way breakdown greatly simplifies reality, since there are numerous intermediate jurisdictions such as special districts, counties, and so on, and since many of these jurisdictions overlap spatially. In 1977, for example, there were over 3000 county governments, 18000 municipalities, 16000 townships, 15000 school districts, and 25000 special districts in the United States. This actually represents a decrease in the number of jurisdictions from the 1950s. Since space is limited and this chapter is primarily methodological, it will be useful to simplify the discussion of the important normative issues by assuming only two levels of government: local and national. The question then becomes: Which level of government is most appropriate for handling which public function?66 The usual taxonomy for analyzing optimal fiscal federalism is due to Musgrave (1959) and has been elaborated by Oates (1968, 1972), among others. Since this taxonomy is analytically valuable, it is followed reasonably closely here. Musgrave distinguished between three "branches" of government: the stabilization, distribution, and allocation branches.
Although not analytically independent, these branches are usually treated separately, an approach which I follow here.

6.1.1. Stabilization

One of the roles of any government is to use macroeconomic policy to stabilize prices and employment in an inherently cyclical national economy. Stabilization is not an important function of local governments, however, because local governments can use only fiscal rather than monetary tools and because they are relatively open to trade and migration. As a result, a good argument can be made that the federal government is in the best position to handle stabilization policy. One argument against stabilization being a local function is that local governments are required by law to balance their budgets. The budget balance constraint eliminates the major source of fiscal stabilization: deficits in times of

66 For a broader and more detailed discussion of these issues, see Oates (1972).
recession and surpluses when the economy is at or near full employment. Whether or not stabilization is the primary goal of assigning a government program to the federal or local level, that assignment can have stabilizing or destabilizing effects. Consider, for example, a recent (1982) Reagan Administration proposal to have state governments take over primary responsibility for managing welfare programs. Since welfare benefits are highest in times of recession, and state and local budgets must be balanced, the reassignment of functions can in itself be destabilizing, even though the goals of the program are primarily redistributive. One possible role for stabilization at the local level involves situations in which there are substantial regional differences in income and unemployment.67 For instance, one might argue that the Northeast and Midwest public sectors should engage in policies to counter local unemployment problems, since they have first-hand knowledge about the nature of these problems. There are substantial problems with regional stabilization policies, however. First, vast leakages to imports and exports make attempts to stabilize local economies ineffective. Second, these policies may increase competition among regions and jurisdictions to attract business and other revenue-generating facilities. The net effect of such competition may leave the regional economy relatively unchanged and make all local governments worse off. This might arise because of the increased taxation needed to finance the tax breaks, with the resulting deadweight loss. Third, even if competition does alter outcomes, it may be inefficient from a social point of view. If the Northeast economy is "moving" to a new long-run equilibrium in which real incomes are substantially lower than they were previously, attempts by the local regional government to counter this equilibrating move may be socially unproductive.
This controversial point is one of substantial current interest which has not been analyzed thoroughly.68

6.1.2. Distribution
To the extent that government has a role in redistributing income, the redistribution branch will be seriously limited if it operates at the local level. For example, if the Tiebout model is correct, then there is likely to be substantial income homogeneity within jurisdictions, and with no disparities in income within jurisdictions redistribution is simply not possible. Local redistribution is also limited by the prospect of migration. If one jurisdiction redistributes more income than another (which is otherwise the same), then the poor have an incentive to move to the community with the more redistributive policy. This migration would

67 Tresch (1981) discusses these issues much more thoroughly than space allows here.
68 See Courant (1983) for an interesting analysis of the tax and expenditure subsidy question among regions.
add to the first jurisdiction's cost of achieving its distributive goal. As a consequence of incentives such as these, local jurisdictions are likely to undertake "too little" redistribution. All in all, these arguments suggest that a larger level of government is appropriate for redistribution, especially if redistributive goals are the result of a national consensus. While the capacity for successful redistribution is greatest at the national level, there are some arguments for redistribution to be done by the lower levels of government.69 These arguments focus on the limitations of the market model as well as on the differences between municipal and federal decision-making processes. For example, a number of authors argue that at the metropolitan level the transactions costs of running government, including the cost of correctly evaluating distributional needs (who is poor, etc.), are lower than they are at the federal level. Within limits, redistribution can be somewhat successful, since both individuals and businesses are not perfectly mobile. Whether a metropolitan government would choose to substantially redistribute income is problematic, however, since such a government may well be controlled by middle and upper income suburban residents.70 A related, but somewhat different, argument has been made by Pauly (1973). Pauly assumes that income redistribution from rich to poor increases the utility of both parties. According to this view, income redistribution is a pure public good and its "provision" makes both parties better off. However, the public good is local, since individuals care more about helping the poor in their own community than elsewhere. As a result, a first-best solution requires differing redistribution programs in each jurisdiction. Such a goal clearly cannot be achieved with a uniform national program. And, except under very strong conditions, a decentralized solution in which each jurisdiction sets its own program is suboptimal.
A more interesting income redistribution issue focuses on the distributional consequences of the local provision of taxes and services. If all local taxes and public goods provision are the same irrespective of household income, then the local public sector will be distributionally "neutral" and will not conflict with federal policies. However, if the outcome of the local public sector spending and taxing pattern is not neutral, and differs from the norms of the federal level, efficiency problems can arise. As an example, assume that the outcome of a Tiebout-like process is for community tax bases to vary substantially. Then some communities will be able to finance a given level of public services at a lower tax-rate than others. Should the federal, state, or even metropolitan government involve itself in redistributing to equalize the "fiscal capacity" of communities? A number of authors have argued that the equalization of tax base is an

69 See Tresch (1981, ch. 30) for details of this argument.
70 For an argument for metropolitan government and redistribution, see Burstein (1980). See also Rose-Ackerman (1981).
appropriate policy from the point of view of equity, at least as far as educational finance is concerned. However, equalization of tax bases can be inefficient, since it conflicts with what might have been the efficient outcome of a Tiebout sorting process. The matter becomes even more complex when we recall that differences in fiscal capacity are likely to be capitalized into land values as individuals from low tax base communities migrate into high tax base communities. As a result, capitalization will compensate, at least partially, for differences in fiscal capacity. In this case individuals who consume public services at a low tax-price are paying for part of that benefit in higher housing prices. Despite the effects of capitalization, some equalization may generate desirable fiscal and income distributions. One can argue, for example, that differences in fiscal capacity per se violate a principle of horizontal equity, since individuals in different communities will not be treated equally. However, why one should worry about such a limited equity goal is not clear. Nevertheless, equality or near equality of fiscal capacity may be a means, albeit a costly one, of partially achieving a goal of equalizing expenditures on merit goods such as education. This issue is discussed in greater detail in the section on grants-in-aid which follows. Empirical evidence on the fiscal equity aspects of the local public sector is quite limited and of questionable reliability. At this stage in the development of the subject area of local public finance, both the underlying methodology and the actual empirical estimates of fiscal impacts on the distribution of income are in need of substantial improvement. Space, however, limits my analysis to a brief and necessarily somewhat cryptic description of the state of the art. For my purposes it is the qualitative nature of the distributional differences among levels of government that is important.
Inman and Rubinfeld (1979) summarize previous empirical work by focusing on two measures of equity, one on the tax side and the other on the spending side. As shown in Figure 6.1, s, the tax-equity index, is the percentage increase in after-tax income that occurs in response to a 1-percent rise in before-tax income. A value of s equal to 1 indicates a proportional income tax, a value less than 1 a progressive tax, and a value greater than 1 a regressive tax. The other equity variable, v, is the elasticity of local service expenditures, which measures the percentage change in local expenditures given a percentage increase in family income. If service outlays are independent of income, then v is 0; the larger is v, the more service expenditures rise with family income, making the distribution of service expenditures pro-rich. Based on a detailed study of a number of incidence papers, Inman and Rubinfeld conclude that the federal level of government is the most progressive in terms of its tax and spending program, since spending is essentially proportional while taxes are mildly progressive. (Welfare is viewed here as a federal and not a local program.) The state level relies on an essentially proportional tax and a mildly pro-rich spending system. However, evidence from the local level suggests
[Figure 6.1. Tax equity (s, vertical axis, ranging from 1.00 to 1.07) plotted against spending equity (v, horizontal axis, ranging from 0.0 to 0.6) for the private sector and for the local, state, and federal levels of government.]
that the tax system is somewhat regressive and that spending levels are somewhat pro-rich. In light of the analysis of the Tiebout model, this outcome is not surprising, since any attempt to be substantially progressive at the local level is likely to be thwarted by the migration of high income families. Legal attempts to alter distribution have concentrated on zoning laws as well as on the distribution of tax liabilities and expenditure benefits within jurisdictions. Inman and Rubinfeld (1979) suggest that, while a number of legal decisions appear to be equity-improving, the economic impact of those legal decisions has been relatively small. Likewise, legislative attempts to achieve equity by equalizing tax bases among jurisdictions have also had relatively small effects. These failures are due in part to the fact that individuals can migrate to avoid what they deem to be adverse distributional effects, and in part to the ability of jurisdictions to exercise their political power to thwart changes in spending and tax patterns in the first place. The inability to alter distribution follows from the fact that both legislative and judicial attempts to achieve equity do not confront the fundamental observation that there are substantial income differences among families within metropolitan areas. Attempts to achieve income redistribution through the local public sector are much less likely to be economically successful than a federal program of income redistribution through either the tax or the spending side of the budget.

6.1.3. Allocation
The third branch of government, the allocation branch, provides the best motivation for a multi-level federalist system. As the Tiebout model suggests, local
governments are likely to be most responsive to the variations in demands for public goods. What makes the federalism question interesting is the spatial nature of the public goods produced. For example, the theory of clubs suggests that there is an allocatively efficient size for government, where the marginal benefit from consumption of the public good equals the marginal cost of congestion. When jurisdictional boundaries are fixed, however, the benefits of certain public goods produced in one jurisdiction are likely to spill over to the residents of neighboring jurisdictions. In general, the externality results in an inefficient outcome if public spending and taxing decisions are entirely in the hands of the local governments. For example, if suburban residents use some of the services of the central city but pay neither user fees nor taxes on wages to compensate for the benefits received, then the city may underproduce those goods. Economic externalities can be dealt with in two ways. First, the externality can be internalized by shifting the decision-making process to the metropolitan government level. Second, a system of taxes and subsidies can be designed that charges those generating the externalities for the costs and benefits they confer. For example, if the externality is generated by the community as a whole, a system of state and federal grants or taxes would work. Empirically, whether such externalities are important and in which direction they flow is largely unknown. The usual argument, suggested by Neenan (1981) and others, is that suburbanites are subsidized by the central city, but Bradford and Oates (1974) have argued that the suburban labor force generates benefits to central city residents in its productive role. Fiscal, as opposed to economic, externalities occur with respect to the industrial base of the central city and suburbs.
Fiscal externalities alter the relative prices that jurisdictions face without necessarily changing the underlying production functions, and thus do not necessarily involve allocative inefficiencies. Assume, for example, that heavily taxed commercial and industrial properties can pass on or export some of these taxes by raising prices. If the products are sold in a national, or at least a non-local, market, then the property tax costs can be exported to those outside the jurisdiction. As a result, households residing in the jurisdiction may decide to increase public service output because of the low price they face, while imposing external monetary costs on non-residents of the jurisdiction. These fiscal effects may or may not lead to inefficient outcomes, depending upon whether underlying productive relationships (e.g. scale economies and congestion) are altered. When economic externalities do arise, they can be remedied if revenues generated by a larger jurisdiction, say a metropolitan or statewide area, are shared with the smaller units of government.71 Obviously, the allocation branch has an important role in the assignment of taxes to the different levels of government. To minimize the inefficiencies which arise because people move to avoid the effects of taxes, one ought to tax mobile

71 This argument has been developed in extensive detail in Brazer (1961).
sources of revenue at the highest level of government and immobile sources at the local level. Thus, the decision to tax both personal and corporate income at the federal level in the United States is rational, particularly since the personal income tax is progressive. Similarly, since the property and sales taxes are perceived not to fall heavily on the mobile factors of production, these state and local taxes are regarded as those that minimize inefficiencies. However, the problem is much more complex than this brief summary suggests. For example, whether the property tax is an appropriate local tax depends upon one's view of the incidence of the tax. If the property tax is borne by owners of housing or land, then it is a desirable local tax from an efficiency perspective. However, if the tax is borne by all owners of capital or by businesses that pass it on by raising prices, then it is less desirable. In fact, the possibility of "tax-exporting" to non-residents is important at both the state and the local level. In an early study, McLure (1967) estimated that roughly 25 percent of the taxes levied at the state level are shifted outside the state. Similarly, Roberts (1975) suggested that exporting (exclusive of deductibility under the Federal personal income tax) ranged from under 20 percent for the property tax to over 25 percent for the sales tax in Michigan. The spillover and tax exporting issues have been analyzed theoretically by a number of authors.72 The following model, based on Oates (1972), should help to clarify some of the issues. It illustrates why the larger level of government must intercede in some manner when there are interjurisdictional externalities. Assume that there are two communities, A and B, each with N identical individuals. Individual utility functions are U_A(X_A, G_A + aG_B) in the first community and U_B(X_B, G_B + aG_A) in the second.
X is a private good, and G is a public good whose benefits spill over to the neighboring community; a ranges from 0 to 1 and represents the extent of the externality. The externality here is real, not just financial. In addition, the local public good is produced by a production function, F(X, G) = 0, within each community and financed by a property tax on housing, of which k percent is exported. The inefficiency of the decentralized model can be seen by comparing the necessary conditions for a welfare optimum to the necessary conditions for utility maximization within each community. Given that the two utility functions are weighted equally, the social optimum is obtained by maximizing

N U_A(X_A/N, G_A + aG_B) + N U_B(X_B/N, G_B + aG_A)    (50)

subject to

F(X_A + X_B, G_A + G_B) = 0.    (51)
72 See, for example, Arnott and Grieson (1981), Gordon (1984), Bird and Slack (1983), and Zimmerman (1983).
The first-order necessary conditions for optimality are

N(aMRS_A + MRS_B) = MRT

and

N(MRS_A + aMRS_B) = MRT,

where

MRS_A = (∂U_A/∂G_A)/(∂U_A/∂(X_A/N)),
MRS_B = (∂U_B/∂G_B)/(∂U_B/∂(X_B/N)),

and

MRT = (∂F/∂G)/(∂F/∂X).

If there are no spillovers (a = 0), then the standard Samuelson condition applies in each community, so that ΣMRS_A = N(MRS_A) = MRT = N(MRS_B) = ΣMRS_B. When a = 1, there is a single aggregate Samuelson efficiency condition, ΣMRS_A + ΣMRS_B = MRT. In the remaining cases, the efficiency conditions state that the joint benefit to individuals in both communities from an increase in G must equal the marginal cost in terms of foregone private consumption (the MRT). What happens in the decentralized case when each community maximizes its own utility? For community A the level of G is determined by maximizing

N U_A(X_A/N, G_A + aG_B)    (52)

subject to

F(kX_A, G_A) = 0.    (53)
Here, the input necessary to produce a given level of the public good has been reduced by the factor k to reflect the lower cost to the community of using a partially exported tax to finance the public good. The first-order conditions for maximization yield:

N(∂U_A/∂G_A)/(∂U_A/∂(X_A/N)) = N(MRS_A) = (∂F/∂G)/(k ∂F/∂X) = MRT/k,
or kN(MRS_A) = MRT. If k = 1 and the production function is constant returns to scale, then community A would increase G_A until MRS_A = MRT/N. However, if there are spillovers, the communities are underproducing the public good; community A should instead produce more G_A, until MRS_A = MRT/N − aMRS_B. When k is greater than one and a is less than one, the bias is in the other direction: local governments tend to oversupply the public good, for reasons analogous to the ones just given. Finally, in the general case, the direction of bias depends upon the magnitudes of the parameters. It is conceivable, but unlikely, that decentralized decision-making would result in A producing efficiently (MRT/N − aMRS_B = MRT/kN) and B producing efficiently (MRT/N − aMRS_A = MRT/kN). In fact, in a more general version of the model, decentralization never leads to a first-best outcome. Instead, state and/or federal grants must be used to remove the inefficiency. One resolution to this externality problem is to have all production decisions made at a higher level of government. However, an alternative decentralized approach is to allow the higher level of government to use grants-in-aid to stimulate the appropriate spending levels by the local governments. It is this approach to the externality problem to which I now turn.
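The direction of the spillover bias can be illustrated with a symmetric special case of the Oates-style model above. The functional forms below (log utility and a linear transformation frontier) and all parameter values are assumptions introduced for illustration, with k = 1 (no tax exporting):

```python
# Symmetric two-community version of the spillover model, assuming
# U = ln(X/N) + beta*ln(G_own + a*G_other) and a linear frontier
# X + c*G = Y in each community, with k = 1.  These functional forms
# and parameters are illustrative, not the chapter's.

def g_decentralized(beta, a, c, Y):
    # Nash behavior: each community equates its own N*MRS to MRT and
    # ignores the spillover benefit a*MRS it confers on its neighbor.
    return beta * Y / (c * (1 + a + beta))

def g_efficient(beta, a, c, Y):
    # Samuelson condition with spillovers: N*(MRS_own + a*MRS_other) = MRT.
    # With symmetric provision the solution is independent of a.
    return beta * Y / (c * (1 + beta))

beta, a, c, Y = 0.5, 0.4, 2.0, 100.0
g_nash = g_decentralized(beta, a, c, Y)   # about 13.16
g_opt = g_efficient(beta, a, c, Y)        # about 16.67
```

For any a > 0 the decentralized level falls short of the efficient one, which is the underprovision result derived in the text; a matching grant that lowers the community's tax-price could close the gap, as Section 6.2 discusses.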
6.2. The role of grants in a federalist system

The various roles that federal and state grants might play in a federalist system are depicted in the four panels of Figure 6.2.73 The horizontal axis measures the level of government spending or output, G. The price of public output is initially chosen to equal 1, so that G can be represented in either real or money terms. The vertical axis represents total post-tax income in the community, and thus the budget line, AB, represents the constraint that the community decision-maker faces in trying to allocate goods between the public and private sectors. If there are no constraints, then the optimum is point E, where the "community indifference curve" is tangent to the budget line. This indifference curve reflects the preferences of the city manager or "decisive voter" and is assumed to be a function of the preferences of the citizens in the community. While this is a plausible assumption if everyone in the community is identical, it is clearly unrealistic and causes serious problems if there is heterogeneity.

73 This analysis builds on the early work of Wilde (1968). See also Thurow (1966) and Inman (1975).
[Figure 6.2. (a) Non-matching bloc grant; (b) matching grant; (c) specific non-matching grant; (d) specific matching grant.]
Panel (a) represents the case of the non-matching unconditional grant, or federal revenue sharing bloc grant. The non-matching grant adds to income and allows the community to expand both its public spending and its after-tax income, moving from point E to point F. Such a grant is logically used to redistribute income among jurisdictions, since the lack of restriction on spending allows each community to maximize its own welfare. What if there are externalities, in which some of the benefits of local spending spill over into neighboring jurisdictions? In this case a matching grant, as shown in panel (b), is more effective, since the matching rate is chosen to induce the local government to spend more on the public good. If the correct rate is chosen, then decision-makers will equate the social marginal benefit to all individuals from the public good to its marginal cost. If one assumes that the amount of money spent on the two grants is sufficient to allow the community in cases (a) and (b) to reach the same level of utility (not
the same amount of money), then the effects of matching and non-matching grants can be compared. Since the non-matching grant affects income without changing price, while the matching grant changes price and thus has both income and substitution effects, point H must be associated with a higher level of public spending than point F. This result follows from the fact that the substitution effect is always negative. Thus, the matching grant is more stimulative, and hence more cost efficient, than a non-matching grant. Panel (c) describes a conditional non-matching grant, where money is received only if a minimum amount is spent on a particular public service. The grant acts like a pure non-matching grant, with only an income effect, if the community would spend more than the minimum amount required by the grant. Thus, the conditional non-matching grant is effective in guaranteeing a minimum level of spending by a local government on a particular budgetary item; it does not, however, have an effect different from a pure income grant unless the community is forced to spend more than it would have done otherwise. Finally, consider a matching grant that is limited, or "closed-ended". As long as the community operates within the budget constraint set by the matching program, the grant has a price effect. However, if the community spends all the money available from the grant, then the analysis is the same as in panel (a).74 A closed-ended matching grant makes sense if there are some limited externalities and only a certain amount of additional stimulus is desired.75 This rather simplistic view of the effect of grants is flawed, since it assumes that the community indifference curve reflects interests of the decisive voter or manager which are identical to the interests of the constituents. For example, the manager's political goals may not coincide with the long-run interests of the members of the jurisdiction, since his period of election or office is likely to be relatively short.
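The comparison of panels (a) and (b) can be illustrated numerically. The sketch below assumes Cobb-Douglas community preferences and, for simplicity, compares the two grants at equal grantor outlay rather than at equal community utility as in the text; all parameter values are hypothetical.

```python
# Matching versus non-matching grants under assumed Cobb-Douglas community
# preferences U = (1 - t)*ln(C) + t*ln(G), with the price of G normalized
# to 1.  Grants are compared at equal grantor outlay; numbers are
# hypothetical.

def g_lump_sum(t, income, grant):
    # A non-matching grant works like extra income: an income effect only.
    return t * (income + grant)

def g_matching(t, income, match_rate):
    # A matching grant at rate m cuts the community's tax-price to (1 - m):
    # income plus substitution effects.
    return t * income / (1 - match_rate)

t, income, m = 0.25, 100.0, 0.2
g_match = g_matching(t, income, m)            # 31.25
grantor_cost = m * g_match                    # grantor pays m per unit of G
g_lump = g_lump_sum(t, income, grantor_cost)  # smaller response to same outlay
```

For the same grantor outlay the matching grant induces more public spending, mirroring the text's conclusion that point H lies to the right of point F.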
Thus, the model just described is likely to be a poor predictor of government spending behavior, at least in the short run. After all, the simple model predicts that a dollar increase in grants has the same effect as a dollar increase in community income. However, grants often lead to greater spending increases than would result from an equivalent increase in community income. This phenomenon is sometimes called the "flypaper effect", since it describes the situation in which money tends to stick where it initially hits. A number of authors have attempted to model the effect of grants on local spending and to explain the flypaper effect by focusing on the various kinds of budgetary and fiscal illusions that may arise in a federal system. Courant,

74 This has implications for the specification of the price and income terms in regression analyses where grants can have both price and income effects. With a non-matching closed-ended grant, the closed-ended form of the grant acts simply as an addition to income, not as a price increment.
75 For an extended and substantial analysis of the effect of grants on local spending, both theoretical and empirical, see Gramlich (1977). See also Hamilton (1983b).
Gramlich and Rubinfeld (1979), as well as Romer and Rosenthal (1980) and Hettich and Winer (1984), have suggested models of fiscal illusion that focus on individual utility-maximizing behavior. The economic rationale for the flypaper effect in many of these models hinges on the inability of voters to perceive the true marginal price of public expenditures when non-matching grants are present. Of course, these models do not preclude the more popular political rationale that I discussed previously. The analysis of how the political process might work to explain the flypaper effect is one area which should provide room for fruitful research. One reason for the limited predictive ability of the spending models is that each assumes that the grant program and the level of grants are exogenous to the model. In fact, the form, nature, and extent of grants are often the result of a complicated bargaining process between politicians and employees at each level of government involved. Empirically, these models may not fully reflect the specific income and price effects of the program. A number of authors have shown recently that better measures of the effects of grant programs can be obtained from models that account for these bureaucratic elements.76 The analysis of grants within a federal system is in many ways typical of the state of local public economics today. Normative analyses of the role of grants have been well developed within models that make little or no assumption about the supply side - the politics and economics of the provision of local public goods. However, empirical work on the positive aspects of grants is somewhat less well developed. And the area of greatest need combines the normative and the positive: we need to utilize our empirical base of knowledge of the local public sector to build better normative and positive models.
To be effective, both in terms of prediction and prescription, these models must include reasonable assumptions about politics as well as economics.
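The dollar-for-dollar equivalence that the simple model predicts, and the flypaper departure from it, can be illustrated with a small numerical sketch. All magnitudes below (a 10 percent marginal propensity to spend, a 40 percent spending response to grants) are hypothetical round numbers for illustration, not estimates from this chapter:

```python
# Hypothetical illustration of the flypaper effect. In the simple model, a
# non-matching lump-sum grant enters the community budget constraint exactly
# like private income, so it should raise public spending only by the
# community's marginal propensity to spend on public goods.

def predicted_spending(income: int, grant: int, mps_pct: int = 10) -> int:
    """Spending predicted by the simple model: mps * (income + grant).
    mps_pct is a hypothetical marginal propensity to spend, in percent."""
    return (income + grant) * mps_pct // 100

base = predicted_spending(income=50_000, grant=0)
with_grant = predicted_spending(income=50_000, grant=1_000)
with_income = predicted_spending(income=51_000, grant=0)

# The model's prediction: a $1,000 grant and a $1,000 income gain are
# identical, each adding only $100 (the marginal propensity) to spending.
assert with_grant == with_income == base + 100

# The stylized empirical pattern (the flypaper effect): spending responses
# to grants are several times larger -- money sticks where it hits. The 40
# percent response below is an illustrative round number, not an estimate.
flypaper_response = 40 * 1_000 // 100  # $400 of spending per $1,000 of grants
assert flypaper_response > with_grant - base
print(base, with_grant - base, flypaper_response)
```

The gap between the model's $100 response and responses several times that size is what the fiscal-illusion and bureaucratic models discussed above attempt to explain.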
7. Conclusions

This chapter has covered a substantial amount of ground, and yet the subject matter of local public economics has barely been scratched. Emphasis has been placed on the demand side of the public sector, and on normative rather than positive models. Hopefully, the models and issues sketched out here will whet the student's appetite to look further into the interesting and varied problems in local public finance. A detailed review of where the analysis has taken us is likely to be tedious at this point. In its place I will be somewhat more presumptuous in sketching out what I perceive to be some of the interesting areas of ongoing research, and areas for future research as well. My apologies and thanks in advance to my colleagues and peers who have done so much to shape my thinking.

76 See especially Johnson (1979) and Chernick (1979). See also Winer (1983), Fisher (1979, 1982), Filimon, Romer and Rosenthal (1982), and McGuire (1978, 1979).

(1) The political economy of the local public sector has been studied by both political scientists and economists, but efforts to combine the two have had limited success. At this point the addition of supply-side political assumptions to Tiebout models has led largely to negative conclusions about the existence and efficiency of the local public competitive analogy. Positive models and a more careful specification of the political process by which public goods are provided may get us somewhere, however. My sketch of the arguments for inefficiency of public good provision and the need for tax limitation is one of many possible examples. Without a well-thought-out and careful description of how public goods are provided, one cannot hope to accurately assess either the normative or the positive aspects of the recent move towards state and local limitations on tax revenues and public spending.

(2) The Tiebout model has provided one useful, albeit somewhat artificial, model of a more or less competitive local public sector. However, the value and usefulness of the Tiebout model is likely to diminish in the future, and an alternative or alternatives are needed. Several monopoly models of local government have been suggested and do provide a useful counterpart to the Tiebout approach. However, these models have not been nearly as well developed, in the sense of taking adequate account of the multiplicity of local jurisdictions and the general equilibrium nature of public spending and locational choices. There may be a good deal of mileage in pursuing the market structure analogy further. Some of the recent advances (as well as some of the older work) in the field of industrial organization may help us to understand this aspect of the local public sector.
One possibility might be to view local governments as providing public goods which vary spatially in the attributes that they provide to the consuming public. Public goods could be viewed as brands which differ in one or more of the attributes. The extent to which local governments compete among themselves in the supply of the public good becomes an issue that can be studied using modern techniques of industrial organization.

(3) Local public economics is a subject area that is rich with empirical work, but a good deal of high-quality work remains to be undertaken. The relationship between the mobility of households and the provision of local public services is one area that typifies work in local public finance. We know a good deal about why and how often households move, but relatively little about the extent to which their moves influence and are influenced by public sector variables. Better empirical information about this subject is likely to suggest better lines of theoretical research. At the same time, better theory is likely to substantially improve upon the empirical work that is undertaken and to make hypothesis testing possible. A well-specified and correctly estimated model may help us to determine the appropriate method by which the demand for local public goods can be measured.
(4) There has been a great deal of attention given to equilibrium models of the local public sector and to the question of whether or not equilibrium exists. This is natural, given the tools of microeconomics currently available. However, many of the problems of the local public sector may be best explained by modeling an economy which is in disequilibrium. Modern tools, both theoretical and empirical, have only recently begun to be applied to these problems. For example, consider the question of whether the capital infrastructure in our cities is being efficiently provided. Are the cities of the Northeast "undercapitalized", and if so, what should the federal government do about it? Is it efficient for adjustments of population and the capital stock to respond solely to market forces, or do substantial externalities warrant government intervention?

(5) The federalism question was only lightly treated in this chapter, but is an extremely important one. Recent proposals for a major restructuring of the federal system by President Reagan have encouraged economists and others to rethink the basic Musgravian view of federalism. Should redistribution be moved in part to the state and/or the local level? Can individuals and individual local governments internalize interjurisdictional externalities, or is a move to consolidation of governments a necessary step?

These are only a few of the areas in which interesting work is being and is likely to be done in local public economics. Future research should only serve to prove how limited and how narrow this list of suggestions has been.
References

Aaron, H.J., 1975, Who pays the property tax? (Brookings, Washington, D.C.). Akin, J. and M. Lea, 1982, Microdata estimation of school expenditure levels: An alternative to the median voter approach, Public Choice 38, 113-128. Akin, J. and D. Youngday, 1976, The efficiency of local school finance, Review of Economics and Statistics 58. Arnott, R.J., 1979, Optimal taxation in a spatial economy with transport costs, Journal of Public Economics 11, 307-334. Arnott, R. and R.E. Grieson, 1981, Optimal fiscal policy for state or local government, Journal of Urban Economics 9, 23-48. Arnott, R.J. and J.E. Stiglitz, 1979, Aggregate land rents, expenditure on public goods, and optimal city size, Quarterly Journal of Economics 93, 471-500. Barlow, R., 1970, Efficiency aspects of local school finance, Journal of Political Economy 78, 1028-1039. Barr, J. and O.A. Davis, 1966, An elementary political and economic theory of expenditures of local governments, Southern Economic Journal 33, 149-165. Beck, J.H., 1983, Tax competition, uniform assessment, and the benefit principle, Journal of Urban Economics 13, 127-146. Berglas, E., 1982, User charges, local public services, and taxation of land rents, Public Finance 37, 178-188. Berglas, E. and D. Pines, 1984, Resource constraint, replicability and mixed clubs: A reply, Journal of Public Economics 23, 391-397. Bergstrom, T.C., 1979, When does majority rule supply public goods efficiently?, Scandinavian Journal of Economics 81, 216-226.
Bergstrom, T.C. and R.P. Goodman, 1973, Private demands for public goods, American Economic Review 63, 280-296. Bergstrom, T.C., D.L. Rubinfeld and P. Shapiro, 1982, Micro-based estimates of demand function for local school expenditures, Econometrica 50, 1183-1205. Bewley, T.F., 1981, A critique of Tiebout's theory of local public expenditures, Econometrica 49, 713-739. Bird, R. and E. Slack, 1983, Urban public finance in Canada (Butterworth's, Toronto). Bloom, H.S., H.F. Ladd and J. Yinger, 1983, Are property taxes capitalized into house values?, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic Press, New York) 145-164. Bloom, H.S., H.F. Ladd and J. Yinger, Property taxes and house values (Academic Press, New York) forthcoming. Borcherding, T.E. and R.T. Deacon, 1972, The demand for services of non-federal governments, American Economic Review 62, 891-901. Bowen, H.R., 1943, The interpretation of voting in the allocation of economic resources, Quarterly Journal of Economics 58, 27-48. Bradford, D. and W. Oates, 1974, Suburban exploitation of central cities and government structure, in: H. Hochman and G. Peterson, eds., Redistribution through public choice (Columbia University Press, New York). Brazer, H., 1961, The value of industrial property as a subject of taxation, Canadian Public Administration 4, 137-147. Brown, J.N. and H.S. Rosen, 1982, On the estimation of structural hedonic price models, Econometrica 50, 765-768. Brueckner, J.K., 1979, Property values, local public expenditure, and economic efficiency, Journal of Public Economics 11, 223-245. Brueckner, J.K., 1982, A test for allocative efficiency in the local public sector, Journal of Public Economics 14, 311-332. Brueckner, J.K., 1983, Property value maximization and public sector efficiency, Journal of Urban Economics 14, 1-16. Buchanan, J., 1965, An economic theory of clubs, Economica 32, 1-14. Buchanan, J.M. and C.J.
Goetz, 1972, Efficiency limits of fiscal mobility: An assessment of the Tiebout model, Journal of Public Economics 14, 25-44. Bucovetsky, S., 1981, Optimal jurisdictional fragmentation and mobility, Journal of Public Economics 16, 177-192. Burstein, N.R., 1980, Voluntary income clustering and the demand for housing and local public goods, Journal of Urban Economics 7, 176-185. Carr-Hill, R.A. and N.H. Stern, 1973, An econometric model of the supply and control of recorded offenses in England and Wales, Journal of Public Economics 2, 289-318. Chernick, H.A., 1979, An economic model of the distribution of project grants, in: P. Mieszkowski and W. Oakland, eds., Fiscal federalism and grants-in-aid (The Urban Institute, Washington, D.C.) 81-103. Church, A.M., 1973, Capitalization of the effective property tax rate on single family homes, National Tax Journal 28, 113-122. Citrin, J., 1979, Do people want something for nothing: Public opinion on taxes and government spending, National Tax Journal 32, 113-130. Comanor, W.S., 1976, The median voter rule and the theory of political choice, Journal of Public Economics 5, 169-177. Courant, P.N., 1977, A general equilibrium model of heterogeneous local property taxes, Journal of Public Economics 8, 313-328. Courant, P.N., 1983, On the effects of federal capital taxation on growing and declining areas, Journal of Urban Economics 14, 242-261. Courant, P.N. and D.L. Rubinfeld, 1981, On the welfare effects of tax limitation, Journal of Public Economics 15, 289-316. Courant, P. and D.L. Rubinfeld, 1982, On the measure of benefits in an urban context: Some general equilibrium issues, Journal of Urban Economics 13, 346-356.
Courant, P.N., E.M. Gramlich and D.L. Rubinfeld, 1979, Public employee market power and the level of government spending, American Economic Review 69, 806-817. Cushing, B.J., 1984, Capitalization of interjurisdictional fiscal differentials: An alternative approach, Journal of Urban Economics 15, 317-326. Deacon, R., 1977, Private choice and collective outcomes: Evidence from public sector demand analysis, National Tax Journal 30, 371-386. Denzau, A.T., 1975, An empirical survey of studies on public school spending, National Tax Journal 28, 241-249. Denzau, A.T., R.J. MacKay and C.L. Weaver, 1981, On the initiative-referendum option and the control of monopoly government, in: H.F. Ladd and T.N. Tideman, eds., Tax and expenditure limitations, Committee on Urban Public Economics, Papers on Public Economics (The Urban Institute, Washington, D.C.) 5, 191-222. Dusansky, R., M. Ingber and N. Karatjas, 1981, The impact of property taxation on housing values and rents, Journal of Urban Economics 10, 240-255. Eberts, R.W. and T.J. Gronberg, 1981, Jurisdictional homogeneity and the Tiebout hypothesis, Journal of Urban Economics 10, 227-239. Edel, M. and E. Sclar, 1974, Taxes, spending and property values: Supply adjustment in a Tiebout-Oates model, Journal of Political Economy 82, 941-954. Edelstein, R., 1974, The determinants of value in the Philadelphia housing market: A case study of the main line 1967-1969, The Review of Economics and Statistics 56, 319-327. Ellickson, B., 1973, A generalization of the pure theory of public goods, American Economic Review 63, 417-432. Ellickson, B., 1977, The politics and economics of decentralization, Journal of Urban Economics 4, 135-149. Epple, D. and A. Zelenitz, 1981, The implications of competition among jurisdictions: Does Tiebout need politics?, Journal of Political Economy 89, 1197-1217. Epple, D., R. Filimon and T.
Romer, 1983, Housing, voting and moving: Equilibrium in a model of local public goods with multiple jurisdictions, in: J.V. Henderson, ed., Research in urban economics 3, 59-90. Epple, D., R. Filimon and T. Romer, 1984, Equilibrium among local jurisdictions: Toward an integrated treatment of voting and residential choice, Journal of Public Economics 17, 281-308. Epple, D., A. Zelenitz and M. Visscher, 1978, A search for testable implications of the Tiebout hypothesis, Journal of Political Economy 86, 405-425. Fare, R., S. Grosskopf and F.J. Yoon, 1982, A theoretical and empirical analysis of the highway speed-volume relationship, Journal of Urban Economics 12, 115-121. Filimon, R., T. Romer and H. Rosenthal, 1982, Asymmetric information and agenda control: The bases of monopoly power in public spending, Journal of Public Economics 17, 51-70. Fiorina, M.P. and R.G. Noll, 1978, Voters, bureaucrats and legislators: A rational choice perspective on the growth of bureaucracy, Journal of Public Economics 9, 239-254. Fischel, W., 1979, Determinants of voting on environmental quality: A study of a New Hampshire pulp mill referendum, Journal of Environmental Economics and Management 6, 107-118. Fisher, R.C., 1979, Theoretical view of revenue sharing grants, National Tax Journal 32, 173-184. Fisher, R.C., 1982, Income and grant effects on local expenditure: The flypaper effect and other difficulties, Journal of Urban Economics 12, 324-345. Flatters, F., V. Henderson and P. Mieszkowski, 1974, Public goods, efficiency and regional fiscal equalization, Journal of Public Economics 3, 99-112. Foley, D., 1967, Resource allocation and the public sector, Yale Economic Essays 7, 43-98. Goldstein, G.S. and M.V. Pauly, 1981, Tiebout bias on the demand for local public goods, Journal of Public Economics 16, 131-144. Gordon, R.H., 1984, Taxation in federal systems: An optimal taxation approach, in: R. Matthews and C.
McLure, eds., Taxation in federal systems (Australian National University, Canberra) 26-42. Gramlich, E.M., 1977, Intergovernmental grants: A review of the empirical literature, in: W.E. Oates, ed., The political economy of fiscal federalism (D.C. Heath, Lexington, Mass.) 219-240. Gramlich, E.M. and D.L. Rubinfeld, 1982, Micro estimates of public spending demand functions and
tests of the Tiebout and median-voter hypotheses, Journal of Political Economy 90, 536-560. Gramlich, E.M., D.L. Rubinfeld and D.A. Swift, 1981, Why voters turn out for tax limitation votes, National Tax Journal 34, 115-124. Greenberg, J., 1977, Existence of an equilibrium with arbitrary tax schemes for financing local public goods, Journal of Economic Theory 16, 137-150. Greenberg, J., 1983, Local public goods with mobility: Existence and optimality of a general equilibrium, Journal of Economic Theory 30, 17-33. Hamilton, A.D., E.S. Mills and D. Puryear, 1975, The Tiebout hypothesis and residential income segregation, in: E.S. Mills and W.E. Oates, eds., Fiscal zoning and land use controls (Lexington Books, Lexington, Mass.) 101-118. Hamilton, B.W., 1975, Zoning and property taxation in a system of local governments, Urban Studies 12, 205-211. Hamilton, B.W., 1976, Capitalization of intrajurisdictional differences in local tax prices, American Economic Review 66, 743-753. Hamilton, B.W., 1983a, A review: Is the property tax a benefit tax?, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic Press, New York). Hamilton, B.W., 1983b, The flypaper effect and other anomalies, Journal of Public Economics 22, 347-362. Harrison, D.E. and D.L. Rubinfeld, 1978, Hedonic housing prices and the demand for clean air, Journal of Environmental Economics and Management 5, 81-102. Hartwick, J.M., 1980, The Henry George rule, optimal population, and interregional equity, Canadian Journal of Economics 18, 695-700. Heinberg, J.D. and W.E. Oates, 1970, The incidence of differential property taxes and urban housing: A comment and some further evidence, National Tax Journal 23, 92-98. Henderson, J.V., 1979, Theories of group, jurisdiction, and city size, in: P. Mieszkowski and M. Straszheim, eds., Current issues in urban economics (The Johns Hopkins University Press, Baltimore) 235-269.
Henderson, J.V., 1985, The Tiebout model: Bring back the entrepreneurs, Journal of Political Economy 93, 248-264. Hettich, W. and S. Winer, 1984, A positive model of tax structure, Journal of Public Economics 23, 67-87. Hinich, M.J., 1977, Equilibrium in spatial voting: The median voter result is an artifact, Journal of Economic Theory 16, 208-219. Hochman, O., 1981, Land rents, optimal taxation and local fiscal independence in an economy with local public goods, Journal of Public Economics 15, 59-85. Hochman, O., 1982, Congestable local public goods in an urban setting, Journal of Urban Economics 11, 290-310. Inman, R.P., 1975, Grants in a metropolitan economy: A framework for policy, in: W. Oates, ed., Financing the new federalism (The Johns Hopkins University Press, Baltimore) 88-114. Inman, R.P., 1978, Testing political economy's "as if" proposition: Is the median income voter really decisive?, Public Choice 33, 45-65. Inman, R.P., 1979, The fiscal performances of local governments: An interpretive review, in: P. Mieszkowski and M. Straszheim, eds., Current issues in urban economics (The Johns Hopkins University Press, Baltimore) 270-321. Inman, R.P., 1981, Wages, pensions, and employment in the local public sector, in: P. Mieszkowski and G.E. Peterson, eds., Public sector labor markets, Committee on Urban Public Economics, Papers on Public Economics (The Urban Institute, Washington, D.C.) 4, 11-58. Inman, R.P. and D.L. Rubinfeld, 1979, The judicial pursuit of local fiscal equity, Harvard Law Review 92, 1662-1750. Johnson, M.B., 1979, Community income, intergovernmental grants, and local school district fiscal behavior, in: P. Mieszkowski and W.H. Oakland, eds., Fiscal federalism and grants-in-aid, Committee on Urban Economics, Papers on Public Economics (The Urban Institute, Washington, D.C.) 1, 51-77. King, T.A., 1977, Estimating property tax capitalization: A critical comment, Journal of Political Economy 85, 425-431.
Ladd, H.F., 1975, Local education expenditures, fiscal capacity, and the composition of the property tax base, National Tax Journal 28, 145-158. Levin, S.G., 1982, Capitalization of local fiscal differentials: Inferences from a disequilibrium market model, Journal of Urban Economics 11, 180-189. Martinez-Vazquez, J., 1981, Selfishness versus public "regardingness" in voting behavior, Journal of Public Economics 15, 349-361. McDougall, G.S., 1976, Local public goods and residential property values: Some insights and extensions, National Tax Journal 29, 436-437. McFadden, D., 1973, Conditional logit analysis of qualitative choice behavior, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York). McGuire, M., 1974, Group segregation and optimal jurisdictions, Journal of Political Economy 82, 112-132. McGuire, M., 1978, A method for estimating the effects of a subsidy on the receiver's resource constraint, Journal of Public Economics 10, 25-44. McGuire, M., 1979, The analysis of federal grants into price and income components, in: P. Mieszkowski and W.H. Oakland, eds., Fiscal federalism and grants-in-aid (The Urban Institute, Washington, D.C.) 31-50. McKelvey, R.D., 1976, Intransitivities in multidimensional voting models and some implications for agenda control, Journal of Economic Theory 12, 472-482. McKelvey, R.D., 1979, General conditions for global intransitivities in formal voting models, Econometrica 47, 1085-1111. McLure, C.E., Jr., 1967, The interstate exporting of state and local taxes: Estimates for 1962, National Tax Journal 20, 49-77. Mills, E.S., 1979, Economic analysis of urban land-use controls, in: P. Mieszkowski and M. Straszheim, eds., Current issues in urban economics (The Johns Hopkins University Press, Baltimore) 511-541. Mueller, D.C., 1976, Public choice: A survey, Journal of Economic Literature 14, 395-433. Musgrave, R.A., 1959, The theory of public finance (McGraw-Hill, New York).
Neenan, W.B., 1981, Local public economics (Markham, New York). Nelson, J.P., 1978, Residential choice, hedonic prices, and the demand for urban air quality, Journal of Urban Economics 5, 357-369. Niskanen, W., 1971, Bureaucracy and representative government (Aldine-Atherton, Chicago). Oates, W., 1968, The theory of public finance in a federal system, Canadian Journal of Economics 1, 37-54. Oates, W., 1969, The effects of property taxes and local spending on property values: An empirical study of tax capitalization and the Tiebout hypothesis, Journal of Political Economy 77, 957-971. Oates, W., 1972, Fiscal federalism (Harcourt Brace, New York). Oates, W., 1981, On local finance and the Tiebout model, American Economic Review 71, 93-97. Ott, M., 1980, Bureaucracy, monopoly, and the demand for municipal services, Journal of Urban Economics 8, 362-383. Pack, H. and J.R. Pack, 1977, Metropolitan fragmentation and suburban homogeneity, Urban Studies 14, 191-201. Pauly, M.V., 1970, Cores and clubs, Public Choice 9, 53-65. Pauly, M.V., 1973, Income distribution as a local public good, Journal of Public Economics 2, 35-58. Pauly, M.V., 1976, A model of local government expenditure and tax capitalization, Journal of Public Economics 6, 231-242. Pestieau, P., 1983, Fiscal mobility and local public goods: A survey of the empirical and theoretical studies of the Tiebout model, in: J.F. Thisse and H.G. Zoller, eds., Locational analysis of public facilities (North-Holland, Amsterdam) 11-41. Pindyck, R.S. and D.L. Rubinfeld, 1981, Econometric models and economic forecasts (McGraw-Hill, New York). Polinsky, A.M. and D.L. Rubinfeld, 1977, Property values and the benefits of environmental improvements: Theory and measurement, in: L. Wingo and A. Evans, eds., Public economics and the quality of life (The Johns Hopkins University Press, Baltimore). Polinsky, A.M. and D.L.
Rubinfeld, 1978, The long run effects of a residential property tax and local public services, Journal of Urban Economics 5, 241-262.
Pommerehne, W.W., 1978, Institutional approaches to public expenditures: Empirical evidence from Swiss municipalities, Journal of Public Economics 7, 255-280. Pommerehne, W.W. and B. Frey, 1976, Two approaches to estimating public expenditures, Public Finance Quarterly 4, 395-407. Pommerehne, W.W. and F. Schneider, 1978, Fiscal illusion, political institutions, and local public spending: Some neglected relationships, Kyklos 31, 381-408. Portney, P. and J. Sonstelie, 1978, Profit maximizing communities and the theory of local public expenditures, Journal of Urban Economics 5, 263-277. Reschovsky, A., 1979, Residential choice and the local public sector: An alternative test of the Tiebout hypothesis, Journal of Urban Economics 6, 501-521. Richter, D.K., 1978, Existence and computation of a Tiebout general equilibrium, Econometrica 46, 779-805. Richter, D.K., 1982, Weakly democratic regular tax equilibria in a local public goods economy with perfect consumer mobility, Journal of Economic Theory 27, 137-162. Roberts, D., 1975, Incidence of state and local taxes in Michigan, Ph.D. dissertation (Michigan State University). Romer, T. and H. Rosenthal, 1978, Political resource allocation, controlled agendas, and the status quo, Public Choice 33, 27-43. Romer, T. and H. Rosenthal, 1979a, Bureaucrats vs. voters: On the political economy of resource allocation by direct democracy, Quarterly Journal of Economics 93, 563-587. Romer, T. and H. Rosenthal, 1979b, The elusive median voter, Journal of Public Economics 12, 143-170. Romer, T. and H. Rosenthal, 1980, An institutional theory of the effect of intergovernmental grants, National Tax Journal 33, 451-458. Rose-Ackerman, S., 1983, Beyond Tiebout: Modeling the political economy of local government, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York) 55-84.
Rose-Ackerman, S., 1979, Market models of local government: Exit, voting, and the land market, Journal of Urban Economics 7, 319-337. Rose-Ackerman, S., 1981, Does federalism matter: Political choice in a federal republic, Journal of Political Economy 89, 152-165. Rosen, H.S. and D.J. Fullerton, 1977, A note on local tax rates, public benefit levels and property values, Journal of Political Economy 85, 433-440. Rosen, K.T., 1982, The impact of Proposition 13 on house prices in northern California: A test of the interjurisdictional capitalization hypothesis, Journal of Political Economy 90, 191-200. Rosen, S., 1974, Hedonic prices and implicit markets, Journal of Political Economy 82, 34-55. Rubinfeld, D., 1977, Voting in a local school election: A micro analysis, Review of Economics and Statistics 59, 30-42. Rubinfeld, D., 1979, Judicial approaches to local public-sector equity: An economic analysis, in: P. Mieszkowski and M. Straszheim, eds., Current issues in urban economics (The Johns Hopkins University Press, Baltimore) 542-576. Rubinfeld, D., 1980, On the economics of voter turnout in local school elections, Public Choice 35, 315-331. Rubinfeld, D., P. Shapiro and J. Roberts, Tiebout bias and the demand for local public schooling, Review of Economics and Statistics, forthcoming. Rufolo, A.M., 1979, Efficient local taxation and local public goods, Journal of Public Economics 12, 351-376. Sandler, T. and J.T. Tschirhart, 1980, Mixed clubs: Further observations, Journal of Economic Literature 18, 1481-1521. Schweizer, U., 1983, Edgeworth and the Henry George theorem: How to finance local public projects, in: J.F. Thisse and H.G. Zoller, eds., Locational analysis of public facilities (North-Holland, Amsterdam) 79-93. Scotchmer, S., 1985, Profit-maximizing clubs, Journal of Public Economics 27, 25-85. Scotchmer, S., 1985, Hedonic prices and cost/benefit analysis, Journal of Economic Theory, 37, 55-75. 
Scotchmer, S., 1986, Local public goods in an equilibrium: How pecuniary externalities matter, Regional Science and Urban Economics, 16, 463-481. Shapiro, P., 1975, A model of voting and the incidence of environmental policy, in: H.W. Gottinger,
ed., Systems approaches and environmental problems (Vandenhoeck and Ruprecht, Göttingen). Smith, A.S., 1970, Property tax capitalization in San Francisco, National Tax Journal 23, 177-193. Sonstelie, J. and P. Portney, 1980, Gross rents and market values: Testing the implications of Tiebout's hypothesis, Journal of Urban Economics 9, 102-108. Stahl, K. and P. Varaiya, 1983, Local collective goods: A critical re-examination of the Tiebout model, in: J.F. Thisse and H.G. Zoller, eds., Locational analysis of public facilities (North-Holland, Amsterdam) 45-53. Starrett, D.A., 1980, On the method of taxation and the provision of local public goods, American Economic Review 70, 380-392. Starrett, D.A., 1981, Land value capitalization in local public finance, Journal of Political Economy 89, 306-327. Stiglitz, J.E., 1977, The theory of local public goods, in: M. Feldstein and R. Inman, eds., The economics of public services (MacMillan, New York). Stiglitz, J.E., 1983a, Public goods in open economies with heterogeneous individuals, in: J.F. Thisse and H.G. Zoller, eds., Locational analysis of public facilities (North-Holland, Amsterdam) 55-78. Stiglitz, J.E., 1983b, The theory of local public goods twenty-five years after Tiebout: A perspective, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York) 17-54. Thurow, L., 1966, The theory of grants-in-aid, National Tax Journal 19, 373-377. Tideman, T.N. and G. Tullock, 1976, A new and superior process for making social choices, Journal of Political Economy 84, 1145-1159. Tiebout, C., 1956, A pure theory of local expenditures, Journal of Political Economy 64, 416-424. Topham, H., 1982, A note on testable implications of the Tiebout hypothesis, Public Finance 120-128. Tresch, R., 1981, Public finance: A normative theory (Business Publications, Inc., Plano, Texas). Wales, T.J. and E.G.
Wiens, 1974, Capitalization of residential property taxes: An empirical study, Review of Economics and Statistics 56, 329-333. Westhoff, F., 1979, Policy implications from community choice models: A caution, Journal of Urban Economics 6, 535-549. White, M.J., 1975, Fiscal zoning in fragmented metropolitan areas, in: E. Mills and W. Oates, eds., Fiscal zoning and land use control (Lexington Books, D.C. Heath Company, Lexington, Mass.) 31-100. Wicks, J.J., R.A. Little and R.A. Beck, 1968, A note on capitalization of property tax changes, National Tax Journal 21, 263-265. Wildasin, D.E., 1977, Public expenditures determined by voting with one's feet and public choice, Scandinavian Journal of Economics 79, 889-898. Wildasin, D.E., 1983, The welfare effects of intergovernmental grants in an economy with independent jurisdictions, Journal of Urban Economics 13, 147-164. Wildasin, D.E., 1984, Urban public finance, draft. Wilde, J., 1968, The expenditure effects of grant-in-aid programs, National Tax Journal 21, 340-348. Winer, S.L., 1983, Some evidence on the effect of the separation of spending and taxing decisions, Journal of Political Economy 91, 126-140. Woodard, F.O. and R.W. Brady, 1965, Inductive evidence of tax capitalization, National Tax Journal 18, 193-201. Wooders, M., 1980, The Tiebout hypothesis: Near optimality in local public good economies, Econometrica 48, 1467-1485. Yinger, J., 1982, Capitalization and the theory of local public finance, Journal of Political Economy 90, 917-943. Zimmerman, D., 1983, Resource misallocation from interstate tax exportation: Estimates of excess spending and welfare loss in a median voter framework, National Tax Journal 36, 193-202. Zodrow, G.R., 1983, The Tiebout model after twenty-five years: An overview, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York) 1-16. Zodrow, G.R. and P.
Mieszkowski, 1983, The incidence of the property tax: The benefit view versus the new view, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York) 109-130.
Chapter 12
MARKETS, GOVERNMENTS, AND THE "NEW" POLITICAL ECONOMY

ROBERT P. INMAN*
University of Pennsylvania and NBER
1. Introduction

[The] very real features of government behavior - and the wider political structure - must be taken into account in any realistic assessment of the prospects for (economic) reform. In this return to "political economy," a great deal remains to be done. [A.B. Atkinson and J.E. Stiglitz (1980, p. 576), as the conclusion to Lectures in Public Economics]
If there has been a prevailing trend in the affairs of nations in the twentieth century, it has been the increasing role of governments in the allocation of society's resources. It is a trend evident not only in the major industrial nations of the world, but in developing economies as well. Governments provide goods and services directly, redistribute income, and regulate economic relations. Today, none of the major industrial nations, apart from Switzerland, has less than 20 percent of its gross domestic product (GDP) allocated through governments. Denmark, Sweden, Norway, Great Britain, the Netherlands, and Italy all allocate more than a third of GDP through governments. There is little doubt that government is a - if not the - central institution in the allocation of social resources. It is imperative that we understand the growth and performance of so important an economic institution. This is not a new task. Indeed, Adam Smith sought first and foremost to understand the role and limitations of governments

*I wish to thank the National Science Foundation, grant SES-81-12001, for financial support for this project, and the many friends and colleagues who read and commented on an earlier version of this chapter. Particular thanks must be given to A. Auerbach, W. Baumol, D. Blair, J. Farnbach, E. Mills, R. Nelson, M. Olson, S. Peltzman, H. Rosenthal, M. Riordan, D. Rubinfeld, D. Sappington, K. Shepsle, and D. Yao, who read all or major portions of a first draft with great care and insight. I hope this revised version has done their comments justice.
Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B. V. (North-Holland)
in the economic affairs of nations. Hobbes and Rousseau before him and Mill, Keynes, and Schumpeter after him wrote to the same agenda.¹ Yet in the last three decades we have witnessed perhaps the greatest advances in our understanding of the role, the strengths, and the weaknesses of government as an economic institution. It is the intention of this chapter to survey these advances. The survey will bring together five separate literatures, each of which addresses one aspect of our understanding of the role for, and capabilities of, governments as resource allocators. The market failures literature, and in particular the challenge of Coase, Tiebout, and Stigler to the Pigou-Samuelson tradition of governmental remedies, will help us to understand when markets "work" and when they do not (Section 2). It is here we shall find the potential role for governmental - i.e. collective - action. Whether this potential will be realized in actual performance is another matter, however. The voluntary-exchange theory of Wicksell and Lindahl and the democratic social choice literature from Condorcet to Arrow and his contemporaries address just this question. When will a collective decision-making process satisfy a set of minimal normative criteria - for example, Pareto efficiency and non-dictatorial choice - and do so with a minimal expenditure of social resources? The voluntary exchange process of Wicksell and Lindahl and its contemporary offspring, the demand-revealing mechanism, hold the promise of being efficient and the threat of becoming dictatorial. The voting process can protect us against dictatorship, but generally at the cost of social efficiency. I examine these alternatives in detail. In Western societies the decision has been made to protect democracy; the task then is to search for the most efficient collective choice process consistent with this norm.
The public choice literature developed under the early guidance of Bowen, Downs, Buchanan, and Niskanen has focused on just this question. It is in the spirit of their early work that we have begun to examine the performance of democratic governments as resource allocators. The results of this examination emerge in the new quantitative political economy and the empirical examination of government spending, taxation, and regulation (Section 3). Yet the political-economic perspective not only addresses important questions of fact - why do governments do what they do? It also challenges in a most fundamental way our view of economic policy analysis. It is not enough to offer a policy recommendation and expect its adoption into law. Governments are endogenous social institutions with their own agendas. The fact that governments will not automatically adopt Pareto-improving policies and reject Pareto-inferior ones does not mean we should stop looking for good policies. It does mean, however, that an integral part of economic policy analysis must be to search for institutional reforms which will facilitate the adoption of good policies and which will block the adoption of bad ones. I shall review recent recommendations for the institutional reform of the budgetary process (Section 4).

The "new" political economy is concerned with that most basic question of economic policy: Under what circumstances are markets or governments the preferred institution for allocating societal resources? The problem is an old one and was the central issue in the work of Hobbes, Smith, Mill, and Sidgwick before us. What is "new" in the new political economy is the sophistication of analytic technique which we can now bring to bear to answer this important question. The work of the last thirty years has given us new insight into, and understanding of, the relative strengths and weaknesses of markets and governments as institutions for allocating the economic resources of society (Section 5).

¹For an introduction to the writings of the first political economists, from Aristotle onward, see Schumpeter (1954).
2. Market failures and the role of government

According to the system of natural liberty, the sovereign has only three duties to attend to; three duties of great importance, indeed, but plain and intelligible to common understandings: first, the duty of protecting the society from the violence and invasion of other independent societies; secondly, the duty of protecting, as far as possible, every member of the society from the injustice or oppression of every other member of it, or the duty of establishing an exact administration of justice; and thirdly, the duty of erecting and maintaining certain public works and certain public institutions, which it can never be for the interest of any individual, or small number of individuals, to erect and maintain; because the profit could never repay the expense to any individual or small number of individuals, though it may frequently do much more than repay it to a great society (Adam Smith, The Wealth of Nations, Book IV, chapter IX).

At least in theory, markets and governments are compatible institutions. The case for markets as the mechanism for the allocation of social resources is well known. Utility-maximizing households and profit-maximizing firms, if joined by markets covering all desired (now and future) goods and services, will be able to achieve a Pareto-efficient allocation of social resources - an allocation from which no single economic agent can be made better off without at the same time harming another agent.² Furthermore, if agents are small compared to the economy as a whole, then the competitive market equilibrium is as good as any other allocation which

²The details of the argument are developed in Arrow and Hahn (1971). The strict convexity assumptions for households and firms can be relaxed without upsetting the central results; see Starr (1969).
a society might achieve for an initial distribution of resources.³ If, in fact, competitive markets arise to cover all desired commodities for trade, then the need for government is minimal. However, if the needed competitive trades do not occur to produce an efficient allocation, then we can speak of a market failure. It is in these instances that we need to consider a potentially broader role for governmental activities. Indeed, I shall argue that there is a common problem which underlies all market failures, and that that common problem is uniquely handled by an institution which can enforce cooperative - that is, collective - behavior in a world where non-cooperative behavior is the preferred individual strategy. Government is one such institution.

2.1. The minimal role of government

2.1.1. Property rights

The establishment of individual rights to initial endowments and to the gains from trading those endowments is now understood as a necessary condition for the use of markets as an institution to achieve a Pareto-efficient allocation. Without rights to property, market trades cannot be sustained. For Buchanan (1975, p. 9) property rights are necessary if we are to escape the Hobbesian world of constant conflict and achieve the benefits of trade. Furthermore, one might well argue that the Hobbesian state of nature in fact induces individuals to coordinate their activities so as to ensure the existence and maintenance of property right institutions. Nozick (1974) advances the idea of an "invisible hand process" from which evolves a cooperative institution to enforce individual property rights. If the protection of an individual's property and human capital endowment is costly but is less costly than stealing another's endowment, the natural state of affairs will be to attack one's neighbors and to defend oneself. The expected outcome of such a world will equal the initial endowment (M) less protection costs (d) less the net costs of attack (c). This expected outcome is less than what could be achieved if the initial endowments of property and skills were protected through a system of property rights, provided the cost per person of such a system (θ) were less than the combined costs c + d (so that M − θ > M − c − d). If there are economies of scale in the protection of endowments [that is, it is cheaper per person to protect n endowments than n − 1 endowments], Bush and Mayer (1974) and Schotter (1981, pp. 47-51) have each shown that there will arise a coalition of all agents who agree not to rob each other, and who grant the coalition judicial and enforcement powers to ensure cooperation. One outcome that may emerge from such a "game" is that every player retains his original endowments. Each player agrees not to rob another, so no resources need be spent on protection or on attack. This is not the only outcome, however. Players may agree to a property rights coalition which gives some members less than their initial endowment of resources and others more than their initial endowments. This unequal distribution of endowments through the property rights system will still be a stable outcome if being outside the protection system makes each individual or subset of individuals worse off.⁴ Nozick calls this property rights coalition the "minimal state". All members of society will agree to participate within the rules established by the property rights system, with some members compensated for the costs, if any, they may incur in the role of enforcers.⁵ To enjoy the gains of markets, a minimal degree of cooperative activity is required. Government as the protector of property rights is necessary.

³Again see Arrow and Hahn (1971) for details.

2.1.2. Distribution of endowments
With the establishment of property rights to initial endowments and the gains from trade, an economy in which markets span all current and future goods can then achieve a Pareto-efficient allocation. That final allocation is tied directly to initial endowments, however. Those who start with more may end up with more in Nozick's minimal-state, market economy.

⁴Schotter (1981, pp. 45-51) develops these arguments in detail, but the basic idea is simple enough. In a game where each player is endowed with M of resources, where it costs c to attack and d to defend, and where there is sufficient time to attack only one other player and to be attacked once, players will find it advantageous to form a grand protection coalition - that is, a government. The pay-off matrix for this game is:

                          Attacked
                     Defended        Defeated
  Attack   Win       2M − c − d      M − c − d
           Lose      M − c − d       −(c + d)

If a player attacks and is successful, then he captures M from the defeated opponent. If he is attacked and defeated, he loses M. The best outcome is to "attack and win" while successfully "defending against attack": 2M − c − d. Schotter shows that if the probability of a successful attack against a defending opponent is 1/2, then the best strategy for each player is to attack and to defend. The expected value for each player in this game is therefore:

  V = (1/4)(2M − c − d) + (1/4)(M − c − d) + (1/4)(M − c − d) + (1/4)(−(c + d)) = M − c − d.

Note that this expected value of the "state of nature" game, M − c − d, is less than what would be earned if property rights were jointly protected at a cost θ, when c + d > θ: M − θ > M − c − d. Note too that the charge θᵢ can be varied across individuals i and unanimity maintained as long as M − θᵢ > M − c − d. If the charge θᵢ is greater than the true average cost per person (θ̄), then person i contributes more than the average share. If θᵢ < θ̄, then person i contributes less than the average share. An unequal distribution of final endowments, M − θᵢ, can therefore result.

⁵Becker (1968) describes such an efficient, though inequitable, system of enforcement.

For many social philosophers,
however, this fact is a troublesome consequence of allocations by markets only. While libertarians such as Nozick view the market as a fair process and hence sufficient for a just society,⁶ those who embrace an outcome or end-state criterion of social justice will find Nozick's minimal-state market economy unacceptable on grounds of equity. If we embrace an end-state view of justice, government must do more than just ensure property rights. End-state theories of justice founded on utilitarianism, the Rawlsian maximin principle, or other notions of fairness⁷ judge the outcomes of a social process by what people finally receive. Given such principles, there is no guarantee that markets will achieve a just allocation of resources. Markets alone cannot ensure fair outcomes; markets combined with an appropriately structured initial distribution of endowments are required.⁸ Yet the appropriate distribution of endowments will generally not happen by chance; it must be managed by some institution within society. Markets, designed to ensure that no individual is made worse off than what he or she could achieve from the initial endowment, are clearly not the appropriate means of redistribution. Reallocation of initial endowments will require a coercive mechanism. A government with the power to tax and transfer is such an institution. Those who embrace an end-state philosophy of social justice therefore grant to governments an additional function: the power to tax and transfer the initial endowments of societal resources.⁹

⁶Nozick (1974) argues that extensions of state power beyond the prevention of force or fraud - for example, to redistribute goods and services - are a violation of our basic rights to liberty, that is, our rights not to be interfered with. Nozick does sketch a theory of a just distribution of holdings. There are three parts to any such theory: (i) a description of how people legitimately acquire their endowments, (ii) a description of how people legitimately transfer their endowments, and (iii) a description of how past injustices are to be rectified. Nozick's theory of just distributions is a procedural theory: a distribution is just if it arises from another just distribution by legitimate means. "Legitimate means" are those means which do not violate an agent's rights not to be interfered with. A market process based on free trade is legitimate in Nozick's view. Lump-sum taxation and transfer of endowments by the state is not. Such a redistribution of endowments can well include slavery (e.g. debtors' prison), yet even more modest taxes will make the state, in Nozick's words, "a part owner in you". Your rights to liberty are thereby violated. For an extended treatment of Nozick's view that the "minimal state" will not include government-enforced redistributions of endowments, see Levin (1982).

⁷See, for example, Varian (1975), who criticizes Nozick's minimal-state economy from the perspective of various end-state theories of justice.

⁸This conclusion is the familiar second theorem of welfare economics: if an allocation is a point of maximum social welfare for some particular social welfare function, it can be achieved by a suitable reallocation of initial endowments followed by trading to a competitive market equilibrium.

⁹Redistribution via lump-sum taxes and transfers of initial endowments is not straightforward. If initial endowments are meant to include labor power as well as property, then a lump-sum tax and transfer scheme requires knowledge of worker abilities. Abilities are not easily observed, however. We all have an incentive to understate our skills to avoid taxes. Thus, taxes and transfers which sort workers by ability are needed. Generally, we cannot find a perfect scheme [see Stiglitz (1982)]. End-state theories of justice necessarily must use "second-best" tools of redistribution, that is, tools which cost economic resources. The gains in justice must compensate for the lost efficiency according to the end-state welfare criterion.
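Before moving on, the arithmetic of the state-of-nature game in footnote 4 above is easy to check numerically. The sketch below is illustrative only; the values of M, c, d, and θ (here `theta`) are assumptions, not figures from the text.

```python
# Expected value of the "state of nature" game from footnote 4.
# Parameter values are illustrative assumptions, not from the text.
M, c, d = 100.0, 10.0, 15.0   # endowment, cost of attack, cost of defense
theta = 20.0                  # per-person cost of a property-rights coalition

# The four equally likely outcomes (attack wins/loses x defense holds/fails):
payoffs = [
    2 * M - c - d,   # attack wins, defense holds
    M - c - d,       # attack wins, defense fails
    M - c - d,       # attack loses, defense holds
    -(c + d),        # attack loses, defense fails
]
expected_anarchy = sum(payoffs) / 4   # reduces to M - c - d
coalition_payoff = M - theta          # join the protection coalition instead

print(expected_anarchy, coalition_payoff)   # 75.0 80.0
# The coalition dominates anarchy exactly when theta < c + d:
assert coalition_payoff > expected_anarchy
```

With these numbers the coalition fee θ = 20 is below c + d = 25, so every player prefers the grand protection coalition, as Schotter's argument requires.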
2.2. Beyond the minimal state: When markets fail
Proponents of the minimal state, whether libertarian or utilitarian, assume that markets do arise to allocate all valued goods and services efficiently. In reality we observe many valued commodities which are not efficiently allocated by markets: clean air and water, basic scientific research, charity, and insurance against nuclear attack are examples.¹⁰ Where markets fail to achieve efficiency, governments may succeed. But it is important in such analyses to give markets a full hearing. Is government really necessary, or have we simply failed to use the market institution to its full potential? I shall argue below that in many important instances governments are necessary for economic efficiency, and that the central feature of those instances is the need for the coercive enforcement of cooperative behavior among self-seeking agents.

2.2.1. Public goods
Samuelson's now classic 1954 paper on the theory of public goods explains, perhaps more clearly than any other single source, the central issue in the failure of markets to achieve allocative efficiency when cooperation is required. Samuelson argues that the technology of certain goods and services is such that increased consumption by one member of society in no way diminishes the consumption of the other members of society: "more for you means no less for me". Such commodities are called "pure public goods" and include as examples national defense, a lighthouse in a harbor, or basic medical research. Samuelson proves that a Pareto-efficient allocation requires that the sum of the marginal private benefits [agents' marginal rates of substitution (MRS) of income for the good] just equal the marginal cost (MC) of an additional unit of the public good: ΣMRSᵢ = MC. Will a private market process, where all individual agents seek to maximize only their own welfare, satisfy this efficiency requirement? Generally, no. If purchased privately, each consumer will provide no more than the level where his or her own marginal benefit (MRSᵢ) equals marginal cost (MC). If others in society also benefit - and for these goods, their consumption will not detract from the benefits of the original consumer - then too little (ΣMRSᵢ > MC) of the good will be provided. One can even imagine instances in which none of the good is purchased privately, even though all benefit from its provision. When exclusion is not possible, the original consumer will soon realize that he or she has purchased the good while others benefit at no cost. But why not let others provide the good privately? Such "free-riding" behavior may dominate private purchase. If so, then all consumers hope to free ride on their neighbors and no one provides the good. Clearly, the market process with self-seeking consumers has failed to achieve a Pareto-efficient allocation of these goods and services.

But has it? Samuelson's original conclusion was challenged from two directions, both arguing, fundamentally, that we have not been clever enough in our design of a competitive market process for the allocation of such commodities. The first challenge, offered by Tiebout (1956), argued that most of the goods and services provided by governments could be efficiently provided, if only governments could be made competitive with each other for customers. The natural setting for such "market" competition is the local public economy, where many suburban governments within a single metropolitan area compete for residents. In Tiebout's view, such competition among residential suburbs (private firms are ignored by Tiebout) would force each government to be technically (i.e. production process) efficient as well as allocatively efficient. If a local government provided too much or too little of a public good, or were too expensive, residents would exit to another community which did meet their demands at the lowest cost.

¹⁰The absence of a market for a valued good is not necessarily a sign of a market failure, however. It is possible that a good is not produced and sold in markets simply because the highest price anyone would pay for the good is less than the lowest cost at which the good can be produced. We do not have a private market in space travel today; we may tomorrow.
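The Samuelson condition and the free-rider logic above can be made concrete with a small numerical sketch. The linear MRS schedule and all parameter values below are illustrative assumptions, not from the text.

```python
# Private vs. efficient provision of a pure public good.
# Assumed linear schedule: each of n identical consumers has
# MRS_i(G) = a - b*G; the good has constant marginal cost mc.
a, b, mc, n = 10.0, 1.0, 5.0, 4

# Private purchase: each consumer stops where his own MRS_i = MC,
# free riding on everyone else's contribution.
g_private = (a - mc) / b          # 5.0

# Samuelson optimum: the sum of the MRS_i equals MC, i.e. n*(a - b*G) = mc.
g_efficient = (a - mc / n) / b    # 8.75

print(g_private, g_efficient)     # 5.0 8.75
assert g_private < g_efficient    # markets underprovide the public good
# At the private level the Samuelson sum still exceeds marginal cost:
assert n * (a - b * g_private) > mc
```

The gap between the two quantities widens as n grows: with more neighbors to free ride on, private provision falls ever further below the efficient level.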
Tiebout's insight is a powerful one; it is not, however, a solution to Samuelson's "free-rider" critique of the market provision of pure public goods. Crucial to the existence of a Pareto-optimal equilibrium in the Tiebout system of competitive governments is the assumption that the publicly provided good displays congestion - "more for you does mean less for me". As new residents enter a community, each additional resident reduces the consumption of the public good enjoyed by existing citizens. To restore consumed output of the service to its original chosen level, more of the public facility (labor, capital, materials) must be provided. When the marginal cost of the added facilities just equals the average cost of providing the chosen level of the good to all residents, then the community has achieved the optimal (efficient) population size for that level of output. This efficient population size must be small enough to permit a sufficient number of competitive communities. However, if the good or service is a pure public good with no congestion, then the marginal cost of new residents is always zero and the efficient population size is infinite. In this case, a Tiebout competitive solution will not be efficient, and an efficient solution cannot be Tiebout competitive.¹¹

¹¹This is not to argue that the Tiebout strategy does not solve the allocation problem for congestible public services. It well may. See Chapter 11 by Rubinfeld in this Handbook, and the review by Bewley (1981).

Demsetz (1970) offered a competitive market process which he argued would efficiently allocate even a pure public good. As long as consumers could be easily excluded from consumption of the pure public good by the producer, then, Demsetz argued, competitive private firms would be compelled to provide the efficient (ΣMRSᵢ = MC) level of the pure public good. If everyone in society has identical demands for the public good (i.e. the same MRS schedule), then Demsetz's process will work. A firm provides a given level of the public facility and charges each of the N customers a price of at least 1/N of the cost. If that facility is too small by the Samuelson criterion for efficient provision (ΣMRSᵢ > MC), then there will be a profitable margin for an alternative firm to offer a larger facility and attract customers away from the initial producer. The competitive process continues until potential profits are bid to zero. At that point an efficient level of the pure public good is provided.¹²

The Demsetz argument breaks down, however, if customers do not have identical demands for the public good and if those different demands are not known by the competitive firms. Oakland (1974) provides the counter-example in this, the more relevant, case. Allow three equal-sized groups to have different levels of demand (MRS schedules) for the public good: low (MRS₁), middle (MRS₂), and high (MRS₃). Everyone will wish to purchase the level of the public good demanded by the low group, and a Demsetz process will satisfy the low demanders through a firm providing a facility level up to the point where MRS₁ = MC/N. The middle (MRS₂ > MC/N) and high demanders (MRS₃ > MC/N) will clearly desire more. The Demsetz process will meet the additional demands of these two groups by creating a second firm which provides additional output until MRS₂ = MC/(2N/3). (Note the low demanders do not buy this additional output; hence only 2N/3 consumers are available to share costs.) The middle and high demanders buy the public good from both firm 1 and firm 2. Yet the high demanders want still more output [MRS₃ > MC/(2N/3)].
A third firm will arise in the Demsetz process to satisfy this additional demand; output will be defined by MRS₃ = MC/(N/3), where only the high demanders (N/3 of the population) buy from the third firm. We need to ask whether this competitive allocation is Pareto efficient. The answer is no. Reallocations can occur which will make some citizens better off without harming others. Simply, the low demanders will pay some small fee to join the middle and high demanders in consuming from firms 2 and 3, and the middle demanders will pay a smaller fee to join the high demanders in consuming from firm 3. Those additional payments can be used to expand output, and all citizens can be made better off. The Demsetz competitive process will provide too little of the public good.

¹²A firm must charge a price per person (p) which at least covers the marginal cost per person of producing the facility: p ≥ MC/N. The price cannot exceed the maximum willingness to pay of a typical (identical) customer: MRS ≥ p. Hence MRS ≥ p ≥ MC/N. If the existing firm provides a level of the good such that ΣMRS = N·MRS > MC, or MRS > MC/N, then a new firm could provide a larger output, still make a profit by pricing the new output between MRS > p > MC/N, and attract all the customers from the initial firm. This profit potential exists as long as MRS > MC/N. Output expansion continues until MRS = MC/N. But at that point N·MRS = ΣMRS = MC, which implies efficient provision.
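Oakland's counter-example can be traced numerically. The linear MRS schedules and parameter values below are illustrative assumptions (they are not Oakland's own numbers); the sketch simply verifies that the three-firm Demsetz outcome violates the Samuelson criterion.

```python
# The three-firm Demsetz outcome in Oakland's counter-example.
# Assumed schedules: MRS_k(G) = a_k - G for the low, middle, and high
# groups; the facility has constant marginal cost mc and N = 3 consumers.
a_low, a_mid, a_high = 10.0, 12.0, 16.0
mc = 6.0

g1 = a_low - mc / 3        # firm 1: MRS_low = MC/3, all three share cost
g2_total = a_mid - mc / 2  # firms 1+2: MRS_mid = MC/2, two share firm 2
g3_total = a_high - mc     # firms 1+2+3: MRS_high = MC, high pays alone
assert g1 < g2_total < g3_total   # each successive firm adds output

# The good is non-rival, so all three could consume the total output.
# The sum of marginal valuations there still exceeds marginal cost,
# violating the Samuelson criterion: output is inefficiently low.
sum_mrs = max(a_low - g3_total, 0.0) + (a_mid - g3_total) + (a_high - g3_total)
print(g1, g2_total, g3_total, sum_mrs)   # 8.0 9.0 10.0 8.0
assert sum_mrs > mc
```

The side payments described in the text (low demanders buying access to firms 2 and 3, middle demanders to firm 3) are exactly what would close the gap between `sum_mrs` and `mc`.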
Demsetz (1970) is aware of this problem, and of its solution. The private firms must be able to price discriminate between the low, middle, and high demanders. It is here, however, that the competitive market fails us. There is no incentive in the Demsetz process for consumers to reveal their true demands. If low demanders are given a price break by firms 2 and 3, what is to prevent the middle and high demanders from claiming to be low demanders too? The Demsetz process is incapable of price discrimination by tastes, yet that is exactly what is required for a market process to ensure a Pareto-efficient allocation. We are back then to Samuelson's original dilemma: each consumer has a private incentive to free ride on the public good provisions of others. The only way out of this problem is to design a mechanism which successfully reveals consumer preferences. For then, free riding is impossible. Unfortunately, no such competitive market mechanism generally exists when consumers are self-seeking utility maximizers. Those allocation mechanisms which can extract true preferences require a single, non-competitive organization for their implementation. One candidate institution is government.¹³

2.2.2. Externalities
Like pure public goods, externalities create a conflict between self-seeking, maximizing behavior and the attainment of a Pareto-efficient allocation of social resources, and for fundamentally the same reason. If a Pareto-efficient allocation is to result when an activity creates benefit or harm for individuals other than those performing the activity, then we require the performing agent to be rewarded or penalized according to the marginal benefits and costs created. Dirty or clean air affects all consumers equally; more for you means no less for me. The criterion for the efficient allocation of such activities is identical to that for pure public goods [see Oakland (1969)]. The sum of the marginal benefits of the external activity (which may be negative in the case of "bads" such as soot) must equal the marginal cost of providing (a "good") or curtailing (a "bad") the activity [see, for example, Baumol and Oates (1975, ch. 5)]. When there are many beneficiaries from changing the level of an externality-producing activity, we confront the same difficulty here in using a market process for allocation as we did with pure public goods. Unless all consumers have identical demands for the externality and the externality can be denied to non-payers, a market process will not achieve a Pareto-efficient allocation. The failure of the market to reveal correctly true consumer or producer preferences for changes in the external activity is the problem; all agents have an incentive to "free ride" against other similarly situated consumers or producers. What is required is a single organization encompassing producers and consumers which can extract from each agent their true costs and benefits and then tax or reward the externality-producing activity accordingly.¹⁴ Again, that coercive institution called government is a possibility.

There is a well-known exception to this conclusion, however. In the case of two agents linked to each other through an external (non-market) activity in a world of complete information and costless bargaining, the two agents will strike a mutually advantageous (Pareto-efficient) bargain for the level of the external activity. First articulated by Ronald Coase (1960), this argument has spawned an enormous literature by economists and legal scholars. The Coase Theorem challenges in a fundamental way the notion that governments are always necessary to achieve an efficient allocation of non-market, external activities. But Coase's world is the very special one of two-person bargaining and full information, and Coase's conclusion - that private bargaining will efficiently allocate what markets cannot - is limited thereto.¹⁵ The Coase Theorem does not generalize to the many-agent case without assuming costless coalition formation so as to collapse the problem back into a two-party (e.g. consumer vs. producer) bargain. But costless coalition formation in essence assumes costless preference revelation and no "free riding" - hardly a helpful assumption when analyzing the relative roles of markets and governments.¹⁶ Relaxing the assumption of full information raises the possibility that bargaining in a world with non-convexities will result in non-optimal allocations. Bargainers may reach and then remain at locally efficient (second-best), not globally efficient (first-best), allocations [Polinsky (1979)]. Only an organization capable of overcoming barriers to information can correct these errors. Government is again a candidate organization.

¹³Thompson (1968) offers an alternative institution: a discriminating monopolist who confronts each consumer with all-or-nothing offers. Thompson's process, while valid, is an extremely expensive one-on-one bargaining game played with each citizen. Groves (1973) and Groves and Ledyard (1977) have devised more manageable demand-revealing processes. I will have more to say about their processes below (see Section 3.3). For a general review, see Chapter 10 by Laffont in this Handbook.

¹⁴For examples of how this might be done, see Dasgupta, Hammond and Maskin (1980).

¹⁵It must be noted that Coase's Theorem still requires a "minimal state" for property right enforcement and income redistribution. In Coase's analysis, either lump-sum taxes and transfers can be used to transfer income, or the initial property rights can be adjusted to protect a preferred distribution of post-bargain resources. On the use of property rights for distributional targets, see Polinsky (1979).

¹⁶On what may happen when costless coalitions cannot be formed, see Schultz and d'Arge (1974).

While one must recognize the challenge of the Coase Theorem to the usual conclusion that collective activity is required for the efficient allocation of externalities, the Theorem is not a sufficient argument for no government at all. At least the Nozick "minimal state" is required to enforce property rights
658
R.P. Inman
agreements and in most real world cases, a more activist governmental position towards externalities must be considered too.17

2.2.3. Increasing returns to scale
It has long been recognized that significant increasing returns to scale in the production of a commodity is a potential source of market failure [Dupuit (1844), Hotelling (1938)]. In instances in which production requires a large fixed cost for capital, "know-how", or natural resource development, average cost may exceed marginal cost over all relevant (i.e. demanded) levels of output. If so, Pareto-efficient marginal cost pricing cannot be sustained by the private market process. A private firm which prices at marginal cost will not cover average costs, and in the long run will be driven from the market. A sustainable market process will produce either of two Pareto-inefficient outcomes in this case. In the case where no threat of market entry exists, the established firm will price at the long-run profit-maximizing level where marginal revenue equals marginal cost. Price will exceed marginal cost, and output will be below the Pareto-efficient level. A social welfare loss results [Harberger (1954)]. In the case where a significant threat of entry exists and the established firm's monopoly position is "contestable", the existing firm will be forced to lower its price and its profits to deter entry. In the limit, the firm will be driven to price at average cost where monopoly profits are zero, the so-called "Ramsey price" in this simple case [Baumol, Bailey and Willig (1977)].18 Yet, price equal to average cost is not a Pareto-efficient outcome either; again output is less than when price equals marginal cost.

17 Cooter (1982) offers an even stronger critique of the applicability of the Coase Theorem. Coase assumes that agents who can benefit from a bargain will in fact bargain until all the gains are exhausted. That is one view of human nature, but not the only one. Cooter offers the Hobbesian view as an alternative.
Bargainers may adopt their worst threats - the threat not to bargain at all - unless there is a third party called government which forces them to negotiate. "The Coase Theorem represents extreme optimism about private cooperation and the Hobbes Theorem represents extreme pessimism." The two theorems have contradictory implications for the role of governments. "According to the Coase Theorem, there is no continuing need for government... According to the Hobbes Theorem, the coercive threats of government... are needed to achieve efficiency when externalities create bargaining situations... (T)he government continuously monitors private bargaining to insure its success" (p. 19). For some experimental evidence on the Coasian vs. Hobbesian view of bargaining, see Hoffman and Spitzer (1982). The evidence seems to favor the Coasian view of bargaining at least in the two-person, full information situation first analyzed by Coase.
18 The original argument of Baumol, Bailey and Willig (1977) considered the more interesting case of a natural monopoly involving both increasing returns to scale ("economies of scale") and economies from multi-product production ("economies of scope"). In the simple case above, only economies of scale are considered. In the more general case of a multi-product firm, cross-product subsidies are possible and the resulting Ramsey prices may be above average cost for some commodities and below average cost for others. Nonetheless, these prices are still second best against alternatives which permit lump-sum (non-distortionary) taxes. See the argument in the following paragraph which generalizes to the case of economies of scale/economies of scope as well.
Ch. 12: The "New" Political Economy
659
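The three pricing regimes described above (marginal-cost, average-cost "Ramsey", and unconstrained monopoly pricing) can be compared in a small worked example. The linear demand and cost numbers below are hypothetical illustrations, not taken from the chapter.

```python
# Hypothetical example of pricing under increasing returns to scale.
# Demand: P = 10 - Q.  Cost: C(Q) = F + cQ with fixed cost F = 12 and c = 2,
# so average cost exceeds marginal cost at every output level.

F, c = 12.0, 2.0

def price(q):              # inverse demand
    return 10.0 - q

def welfare(q):            # consumer surplus + producer profit at output q
    cs = 0.5 * q * (10.0 - price(q))          # triangle under the demand curve
    profit = price(q) * q - (F + c * q)
    return cs + profit

q_mc = 10.0 - c            # P = MC = 2        ->  Q = 8 (first best)
q_mono = (10.0 - c) / 2    # MR = 10 - 2Q = c  ->  Q = 4 (no entry threat)
q_ac = 6.0                 # larger root of P(Q) = AC(Q): 10 - Q = 12/Q + 2

# Average-cost pricing earns zero profit, as the contestability story requires.
assert abs(price(q_ac) * q_ac - (F + c * q_ac)) < 1e-9

for label, q in [("marginal cost", q_mc), ("average cost", q_ac), ("monopoly", q_mono)]:
    print(f"{label:>13} pricing: P = {price(q):.1f}, Q = {q:.1f}, welfare = {welfare(q):.1f}")
```

The welfare ordering matches the text: marginal-cost pricing is best (but loses the fixed cost, hence needs outside revenue), average-cost pricing is second best, and monopoly pricing is worst.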
The first-best solution to this allocation problem is to price at marginal cost and then to raise the required revenue to cover the long-run losses from pricing below average cost through a system of lump-sum, non-distortionary charges on all consumers. Can a private firm adopt this first-best pricing strategy? Not unless it knows who its customers will be before it builds its facilities.19 If customers are known only after the facility is in place, a subtle game of preference revelation will emerge between consumers and the firm which may prevent the attainment of long-run efficiency. The firm needs to charge the full lump-sum fee, a fee which each consumer has an incentive to avoid. Consumers will agree to purchase the good for a small mark-up above marginal cost (hence contributing a token amount towards fixed cost) but can threaten not to consume if charged the full lump-sum fee. An existing firm may view a small contribution as better than none at all, but in the long run it suffers losses and is driven from the market. New entrants may fear the same fate. The market fails. First-best, Pareto efficiency can only result from the market process if the demands for the product are known before the production facilities are put in place. This requires an organization encompassing all consumers and producers which can extract from each agent their true costs and benefits of providing the commodity. Governments are a possible solution.20

2.2.4. Incomplete information

The competitive market process as a mechanism for achieving a first-best, Pareto-optimal allocation of resources requires that each consumer be fully informed about all attributes of the commodity being purchased. If this is not true, particularly if the seller knows more about the commodity (whether output or input) than the buyer, market transactions will be characterized by incomplete or asymmetric information. Either of two problems may arise in this instance.
The first, called moral hazard, arises when the buyer cannot observe controllable

19 For example, the firm may write long-term contracts with potential customers and then provide the service if a sufficient number of customers can be found to contribute to the cost of the overhead. Note, however, that all customers still have an incentive to free ride on the first-users. Once the facility is in place, the firm has an incentive to write second contracts with new customers and to charge fees which cover marginal costs and make only small contributions to overhead. Everyone will want to avoid the first-user contracts and sign a "new" customer agreement. Again the free-rider problem may undo a clever, private-market contracting scheme.
20 On how governments might extract this needed information, see Baron and Myerson (1982). There is considerable discussion in the industrial organization literature on whether this informational argument for government provision of natural monopoly commodities means government should run monopolies or just license them. Posner (1971) and Demsetz (1968) argue in favor of licensing after bidding for the rights to provide the commodity. Bids should be awarded on the basis of lowest per unit price [Stigler (1968)]. Williamson (1976) argues that the information argument may still require government regulation or management of natural monopolies. Loeb and Magat (1979) offer another approach to natural monopoly regulation which conserves information about firm costs, but still requires information on consumer demand.
actions taken by the seller. The typical example is the level of care or effort taken by a seller in performing a service. If care cannot be monitored, the incentive for sellers is to underprovide quality, even though consumers would pay for a better product [Holmstrom (1984)]. The second, called adverse selection, arises when the buyer cannot observe important exogenous characteristics or contingencies affecting the seller. When health-care consumers cannot distinguish between competent and incompetent physicians, car buyers cannot tell a "cream puff" from a "lemon", and insurance companies cannot separate good and bad risks, then adverse selection may drive good physicians, good cars, and good risks from the marketplace. Product quality declines, or the market may disappear altogether [Akerlof (1970)]. Market institutions may arise to help overcome informational asymmetries. The more able sellers may be willing to offer contingent contracts whereby payment is made only if product quality meets a given standard. Warranties are an example of contingent contracts. Seller reputation is another potential device for overcoming the problems of informational asymmetries. Sellers planning to remain in the market (this fact must be known!) have an incentive to protect product quality over the long run [Klein and Leffler (1981)]. Furthermore, contracts may act as signals of seller characteristics or effort. Only the most able or those willing to work hard will risk a contingent contract for their services. Investment in education may signal the more able in the work force [Spence (1973)]. Finally, sellers may band together to form professional associations to certify the product quality of their members [Shaked and Sutton (1981)]. The issue here is whether these market responses are sufficient to overcome the difficulties originally created by incomplete information. The tentative answer appears to be no [Holmstrom (1984)].
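The adverse-selection unraveling described by Akerlof can be sketched numerically. The uniform-quality assumption and the valuation factor 1.5 below are hypothetical choices for illustration, not parameters from the chapter.

```python
# Akerlof-style "lemons" unraveling sketch.  Used-car quality q is uniform on
# [0, 1]; a seller values a car at q, a buyer at 1.5q.  Buyers cannot observe
# q, so they will pay at most 1.5 times the average quality of the cars
# actually offered at the going price.  Iterating this logic drives the price,
# and with it the market, toward zero.

p = 1.0                                        # start with the most optimistic price
for step in range(60):
    avg_quality = min(p, 1.0) / 2              # only sellers with q <= p offer cars
    p = 1.5 * avg_quality                      # buyers' willingness to pay next round

print(f"price after 60 rounds: {p:.8f}")       # converges toward 0: market disappears
```

Each round the price falls to 0.75 of its previous value, so only the worst cars remain and trade vanishes, even though every car is worth more to a buyer than to its seller.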
While they are ingenious institutional responses which expand the range of beneficial trades, they do not exhaust all possibilities for improvement. Further trades would be possible if additional information about sellers could be provided to buyers. Is government necessary for the provision of this additional information? Several arguments, particularly favoring government certification, suggest that it is. If certification of sellers is costless, i.e. information is free, all but the least qualified sellers would allow themselves to be certified and the resulting market would be efficient. When certification is costly, however, markets will not be efficient and government intervention may be desirable. The preferred allocation will set the marginal benefits of additional certification (measured by the gains from now-available trades) equal to the marginal costs of providing certification. There are likely to be significant externalities in such a process, however, and herein lies the case for possible government intervention. First, the value to a single seller of being certified may depend upon who else is certified; information about quality is often a statement of relative (better than, worse than) value. In such a case no one may signal because going first is not worthwhile. Only
Ch. 12: The "New" Political Economy
661
cooperative action by all sellers can overcome this "network externality" [Wilson (1982)]. Second, while the process of certification confers benefits on the most able sellers, it also imposes an externality on the least able, who earn less with certification. This external cost of certification is ignored by the more able. Excessive information therefore may be provided by a market process [Jovanovic (1982)]. Again, a cooperative action by all sellers - here, to limit certification - is necessary. Might we then rely upon private seller cooperatives to certify quality and ensure efficiency? No, for with the right to control certification goes the right to control entry. Shaked and Sutton (1981) show how professional groups which license entry will give out too few licenses, inflate prices, and earn monopoly rents. In summary, certification is a commodity - information about sellers - which has all the characteristics of a Samuelson public good. Individual consumers generally have no incentive to provide such a public service. Individual sellers may provide certification but because of either positive or negative spill-overs the level is likely to be non-optimal. Government regulation of certification is one solution to these problems.21

2.2.5. Unemployment

A major, if not now the most prominent, role of government in a market economy is the management of cyclical fluctuations in output and employment. But not all observed unemployment requires government intervention. Some unemployment can be socially beneficial, allowing workers and firms to match skills and needs through market search. This so-called search or "natural" rate of unemployment is a structural feature of any economy with less than perfect information about possible labor market transactions. By itself it is not an imperfection in market economy performance which macro-economic fiscal or monetary policy should correct; indeed, such policies may only make matters worse [Friedman (1968)].
Market performance can be improved through a better matching of workers and firms, but it is not obvious that a government is necessary to provide that service. A deeper argument is needed. It is now generally appreciated that the case for government intervention to increase employment - whether through monetary and fiscal policies or through policies to improve labor market performance - must be grounded in a well specified micro-economic argument of market failure. Fundamentally, the problem of unemployment is a failure of self-seeking workers and firms to coordinate their actions to achieve mutually beneficial trades. In this sense unemployment is

21 Yet again the government must have a mechanism which will induce suppliers to reveal their true qualities. Various signalling equilibria have been proposed and analyzed in the new literature on the economics of information; see Riley (1979).
"involuntary" (not "natural") and its correction therefore requires intervention of an extra-market coordinating agent such as government. But why might labor markets fail to exhaust all beneficial trades? 2 2 Either of two arguments seem potentially persuasive. The first builds from the theory of asymmetric information as applied to labor services. The labor market is marked by random shocks to firms' demands for labor creating alternate periods of low and high wages and employment. Riskaverse workers will want to insure against such swings in their annual incomes. In the case of symmetric (i.e. identical) information regarding demand shocks between insured workers and the insurers (whether the firm or a third party), an optimal insurance contract can be written which will efficiently allocate the economic risk.2 3 However, if the information regarding demand is asymmetrically distributed to the advantage of either party (e.g. the firm knows demand and the worker does not), an inefficient insurance contract may result. The reasons have been outlined above - moral hazard or adverse selection - and have been carefully examined by Azariadis (1975), Baily (1974) and others [see Azariadis and Stiglitz (1983, particularly sections IV and V)]. The conclusions here parallel those of the asymmetric information literature; less than a full employment Pareto-optimal allocation may result which can only be corrected through the truthful revelation of private information by workers or firms. Government intervention may be necessary in this case either (1) to collect and reveal relevant information to all parties, or (2) to provide unemployment insurance directly to workers if such insurance does not arise privately. 2 4 The second market failure argument for government intervention to improve employment rests fundamentally on economy-wide increasing returns to scale in the production of private goods and services [Weitzman (1982)].25 Strict constant
22 In essence the failure of the labor market is a failure of wages to adjust to clear the labor market, a point appreciated since the work of Keynes. If wages did adjust, a Walrasian equilibrium would result with no involuntary unemployment. As Drazen (1980) has emphasized in his review of the recent disequilibrium literature, the task before us now is not to explain what happens when wages are sticky, but rather why they are sticky in the first place. That is, why does the market fail?
23 This efficient insurance contract may well involve lay-off unemployment. See Azariadis and Stiglitz (1983).
24 It should be noted that any government policies which follow from the asymmetric information argument are micro-economic policies directed at improving the insurance of worker income against exogenous demand shocks. For an interesting asymmetric information argument which argues for macro-economic policies, see Woglom (1982). In Woglom's work the asymmetric information problem occurs in the product markets, not the labor markets. Note too that minimizing exogenous demand shock risks is also a policy alternative. But to argue for government as a necessary mechanism to reduce such shocks first requires a theory of how unanticipated demand shocks arise. I know of no theory which argues such shocks are a logical outcome of a market economy.
25 Weitzman emphasizes that increasing returns to scale is a general description for all the economic advantages of size: physical economies, internalization of externalities, economizing on information, specialization of roles, access to capital, and so on.
Ch. 12: The "New" PoliticalEconomy
663
returns to scale (and symmetric information) must imply full employment. With full divisibility of production provided by constant returns to scale, any unemployed factor can "hire itself and any other factors it needs and sell the resulting output directly.... The operational requirement is that the efficient minimum-cost scale of production be sufficiently small, relative to the size of the market, that any one firm or plant cannot affect prices appreciably" [Weitzman (1982, pp. 791, 792)]. Individuals can sell all that they produce at the market price. Full employment is therefore a logical consequence of the assumptions of constant returns to scale and price competition. Allowing increasing returns to scale upsets this happy balance of supply and demand. With increasing returns to scale, firms are large relative to their markets and thus face a downward sloping demand curve for their products. It is this fact which creates the possibility of a less than full employment equilibrium. Each existing firm, or new firm created by unemployed factors, is afraid of glutting its own market by increasing its output. New supply "spoils" its own market, before it has a chance to create new demand. Only the simultaneous expansion by all firms in the economy can overcome this barrier to full employment. Yet there is no market mechanism that will insure such coordinated expansion of economic activity. An extra-market institution - e.g. government - will be needed to move the economy to the preferred full employment equilibrium. In this case the appropriate policies are those which can expand aggregate demand, for example, monetary [Woglom (1982)] and fiscal [Hart (1982)] policy. The essence of each of the five market failures detailed above is the need for some institution to overcome the proclivity of individual, self-seeking agents to act uncooperatively when cooperation among agents is needed for a mutually beneficial trade. 
Markets as institutions for organizing economic activity often cannot insure the required cooperative behavior. When markets fail, they fail because they cannot be designed to prevent individual cheating against a mutually preferred cooperative outcome. In such cases, a cooperative agreement among all parties is required, and an extra-market institution - possibly a government - will be needed to enforce it. Understanding why an enforceable cooperative agreement is necessary, and hence why governments may sometimes be preferred to markets for allocating economic resources, is our next task.

2.3. Market failures as a Prisoner's Dilemma: The potential case for government
The essence of the problem of the failure of two individuals to cooperate, even when cooperation is in their joint interest, is captured by a game called the Prisoner's Dilemma.26 Imagine two agents deciding whether to provide a public

26 The formal structure of the game was originally proposed by Merrill Flood (1958), though the issue of non-cooperative behavior when cooperation is jointly preferred has been discussed by
664
R.P. Inman

                           Player 1

                        C           N

               C     $2, $2     $6, -$2
    Player 2
               N    -$2, $6      $0, $0

Figure 2.1. The Prisoner's Dilemma. In each cell, player 1's pay-off is listed first, player 2's second.
good which yields each a benefit of $6 but costs a total of $8 to produce. The two agents can cooperate (strategy C) or not (strategy N) to provide the good. Once the good is provided the benefits are enjoyed by both parties. No communication or pre-contracting outside the game is allowed. If both agents cooperate, they share the costs equally and enjoy a net benefit of $2 per person [= $6 - ($8/2)]. If one agent produces the good but does not receive a contribution from the other, then the producing agent's net benefit is -$2 (= $6 - $8). In this case, the non-cooperative, free-riding agent enjoys all the benefits at no cost and receives $6 (= $6 - $0). If the good is not produced by either agent - that is, both act non-cooperatively - then each agent receives $0 net benefits. The best outcome for either player occurs when the other provides the good and he or she free rides, but if both parties attempt to free ride the good will not be produced. Unfortunately, the non-cooperative, "free-riding" strategy is the dominant (i.e. always best) strategy for both agents. That this is so can be seen from the matrix form of this game in Figure 2.1 above. Pay-offs to each player for each combination of strategies are shown in each cell, player 1's pay-off first and player 2's second. The joint cooperation strategy (C, C) yields players 1 and 2 $2 each. If player 1 cooperates but player 2 does not, the pay-offs are -$2 and $6, respectively. If player 2 cooperates but 1 does not, the pay-offs are reversed. If both players adopt the non-cooperative strategy, the pay-offs are $0 to each. Note that the dominant strategy for both agents is not to cooperate. If agent 2 cooperates (C), the best strategy for agent 1 is not to cooperate (N) as $6 > $2. If agent 2 does not cooperate (N), then again the best strategy for agent 1 is not to cooperate (N) as $0 > -$2. Either way, non-cooperation is preferred. The same logic applies to agent 2 deciding how to play against agent 1. Since both players find non-cooperation the preferred individual strategy, the outcome of this game is in the (N, N) cell where no public good is provided. Yet both agents would prefer the cooperative outcome to this result. Cooperation is preferred, but no agent will play cooperatively for fear of suffering large losses if their neighbor cheats. Just as the market may not provide a contract to ensure cooperative behavior when there is an incentive to act independently, so too do the players in the Prisoner's Dilemma game lack the means to enforce the cooperative solution. The pay-off matrix of the game illustrates the advantages of cooperation. The fact that the players cannot communicate or write binding contracts parallels our market failures. Without a cooperative contract and an enforcement mechanism, non-cooperation (N, N) is the inevitable result.27 Without too much effort one can construct a Prisoner's Dilemma story which captures the essential features of each of the market failures described above.28 In each instance there are gains to cooperation but a failure of private contracting to ensure the cooperative outcome.

26 (continued) political economists as the basis of collective action at least since David Hume's example of the flooded meadow:

   Two neighbours may agree to drain a meadow, which they possess in common; because 'tis easy for them to know each others mind; and each must perceive, that the immediate consequence of his failing in his part, is, the abandoning the whole project. But 'tis very difficult, and indeed impossible, that a thousand persons shou'd agree in any such action; it being difficult for them to concert so complicated a design, and still more difficult for them to execute it; while each seeks a pretext to free himself of the trouble and expense, and wou'd lay the whole burden on others.

   David Hume, A Treatise of Human Nature, book 3, part 2, sec. 7, p. 538.
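The dominance argument in the text can be checked mechanically against the pay-offs of Figure 2.1:

```python
# Pay-offs of Figure 2.1: payoff[(s1, s2)] = (player 1's pay-off, player 2's),
# where each strategy is C (cooperate) or N (not cooperate).

payoff = {
    ("C", "C"): (2, 2),
    ("C", "N"): (-2, 6),
    ("N", "C"): (6, -2),
    ("N", "N"): (0, 0),
}

# N strictly dominates C for player 1: whatever player 2 does, N pays more.
for s2 in ("C", "N"):
    assert payoff[("N", s2)][0] > payoff[("C", s2)][0]

# By symmetry the same holds for player 2, ...
for s1 in ("C", "N"):
    assert payoff[(s1, "N")][1] > payoff[(s1, "C")][1]

# ... so (N, N) is the equilibrium, even though both prefer (C, C) to (N, N).
assert payoff[("C", "C")] > payoff[("N", "N")]
print("dominant strategy for both players: N; equilibrium outcome: (N, N)")
```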
If government is to have a role to play as a mechanism for the allocation of social resources beyond its functions in a "minimal state", then it is in its ability to enforce a cooperative outcome in these market failure, Prisoner's Dilemma games. Before we rush to embrace government, however, perhaps we should step back for a moment and ask if there might not be reasonable instances in which agents will voluntarily select the cooperative allocation within the Prisoner's Dilemma setting. In an analogy to market failure, markets failed because explicit and enforceable contracts could not be written, but perhaps "implicit" (i.e. legally unenforceable) contracts for cooperation will be chosen voluntarily. If the answer is yes, then the need for government - coercive and expensive - is significantly reduced. Are there voluntary, cooperative solutions to the Prisoner's Dilemma game?29

The Prisoner's Dilemma allocation problem described by the matrix in Figure 2.1 was a game played once between two agents. There was little to be gained by trusting your opponent and playing the cooperative strategy since you were sure to lose. Furthermore, since the game was played only once there was no way to penalize a cheating opponent. But our economic lives consist of many market failures, or Prisoner's Dilemma problems, and some - e.g. do we, or do we not, litter in the park - are repeated almost daily. Might not the better characterization of market failures be a repeated Prisoner's Dilemma game, and, if so, is the non-cooperative strategy dominant in this case as well? In the case in which the Prisoner's Dilemma game is to be played a known and finite number of times, the non-cooperative strategy does remain the dominant strategy for each player. Consider a game which is to be played 100 times. For the 100th play, both agents know this is the last play of the game. Therefore, this iteration is equivalent to a one-time Prisoner's Dilemma game, where the non-cooperative strategy is dominant. Thus, the outcome of the 100th play is known; both agents play non-cooperatively and receive $0 pay-offs. But if the outcome of the 100th iteration of the repeated game is known, so too will be the outcome of the 99th iteration. For now when players consider the 99th play of the game, they will realize the 100th play is already determined. What is the best strategy for the 99th iteration? Since nothing can be gained at the 100th iteration by playing cooperatively at the 99th, the dominant strategy for both players is again to play non-cooperatively. Thus, the outcome of the 99th play is known; it is the non-cooperative result with $0 pay-offs for both agents.

27 Both parties have adopted their Nash strategy in this non-cooperative, finite sum game and the outcome is a Nash equilibrium. The game is non-cooperative because no communication or extra-game interaction between the players is allowed. The game is finite sum because there is a finite gain to cooperation, if cooperation can be achieved.
28 For public goods and externalities see Riker and Ordeshook (1973, chapter 9). Klein and Leffler (1981, p. 624) describe their market game with asymmetric information as a Prisoner's Dilemma. Woglom (1982) and Weitzman (1982) both mention the Prisoner's Dilemma character of their description of the unemployment problem. Sen (1967) and Warr and Wright (1981) stress the Prisoner's Dilemma character of non-optimal intertemporal market allocations.
This logic can be repeated for each preceding iteration of the game, until it is clear that the non-cooperative strategy is the dominant strategy for each iteration of the repeated Prisoner's Dilemma game. It appears that allowing for the fact that market failure-Prisoner's Dilemma situations are pervasive and repetitive will not, by itself, diminish the potential role for government to insure cooperative allocations.

29 In the discussion which follows, I consider only those approaches which keep the assumption of self-seeking, utility-maximizing agents. We could of course obtain cooperative outcomes by assuming agents wish to behave cooperatively for reasons of friendship or the ethical importance of cooperative behavior. Such moral motivations are real enough. Indeed, there is a good deal of experimental evidence which indicates that friends, or parties told to be cooperative, do choose the cooperative allocation [e.g. Deutsch (1973)]. But that strategy significantly changes our problem; a high penalty is now attached to the non-cooperative strategy such that the problem is no longer a Prisoner's Dilemma. For example, subtract $5 from the pay-off for each non-cooperative play in Figure 2.1; it should be clear that cooperation is now the dominant strategy in this new game. [Though unselfishness will not always solve our problems of non-cooperation, and such motivations may even create non-cooperative behavior! See Weymark (1978).] Since my focus is on the market failure-Prisoner's Dilemma problem, I will not discuss these alternative approaches to cooperative behavior. Hardin (1982, chs. 6 and 7) provides a useful survey. See also Kurz (1977) for a more formal analysis of resource allocations with altruistic agents.
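The backward-induction argument for the known, finite-length game can be sketched in a few lines. This is an illustration of the logic, not a full game-tree solution: the induction hypothesis (future play is fixed whatever happens now) is encoded as a comment rather than derived.

```python
# Backward induction in a Prisoner's Dilemma repeated a known, finite number
# of times, with the Figure 2.1 pay-offs.  Working back from the last round,
# continuation play is already determined, so each round collapses to the
# one-shot game, where N pays more than C against either opponent action.

def best_action(rounds_left, payoff={"CC": 2, "CN": -2, "NC": 6, "NN": 0}):
    # In the final round nothing follows; in earlier rounds the continuation
    # is a constant (all-N) by the induction hypothesis.  Either way only the
    # current-round pay-off matters: compare N vs. C against each opponent move.
    n_dominates = payoff["NC"] > payoff["CC"] and payoff["NN"] > payoff["CN"]
    return "N" if n_dominates else "C"

plan = [best_action(k) for k in range(100, 0, -1)]
print("equilibrium play over all 100 rounds:", set(plan))   # defection throughout
```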
We can relax the structure of our analysis a bit further, however, and voluntary cooperative behavior will emerge, at least for some of the plays of a finite length, repeated Prisoner's Dilemma game. Axelrod (1981) drops the restriction that the finite Prisoner's Dilemma game has a known number of iterations and allows each player a degree of uncertainty that the next play will be the last. Axelrod asks whether in this uncertain setting cooperative behavior might emerge. The answer is yes, provided there is a sufficiently large probability that the game will continue after each iteration. The argument proceeds in two steps. First, it is shown that if the probability that the game will continue is high enough, then there is no best strategy against all strategies your opponent might play.30 Specifically, the non-cooperative strategy is no longer the dominant strategy.

30 The matrix of the Prisoner's Dilemma game can be generalized to:

                           Agent 1

                        C           N

               C      R, R        T, S
    Agent 2
               N      S, T        P, P

(in each cell, agent 1's pay-off is listed first, agent 2's second), where the pay-offs are related as T > R > P > S and R ≥ (S + T)/2, so that the alternating strategy of N then C is not preferred to the outcome from mutual cooperation. If the pay-off in any period is uncertain because the future of the game is uncertain, then this uncertainty should be reflected in players' strategy selection. Assume players are risk neutral and discounting of future pay-offs is not relevant. (The conclusions generalize to risk aversion and discounting.) A player who receives the pay-off P in each period the game is played receives an expected pay-off over the life of the game of

   P + wP + w^2 P + w^3 P + ... = P/(1 - w),

where w is the probability the game will continue after each play. In this setting there is no one best strategy independent of the strategy used by an opponent. To see this, suppose strategy A is best no matter what your opponent does. This means the value of playing A against any strategy B your opponent plays must dominate the value of any other strategy A' against B, that is, V(A|B) > V(A'|B). Consider the two general forms for strategy A: (1) cooperate until your opponent does not cooperate and then stop cooperating (the "nice" strategy), and (2) stop cooperating before your opponent does (the "not nice" strategy). First, if A is "nice", let A' be always play N and let the opponent's strategy (B) be always play C. Then V(A|B) = R/(1 - w). The value of A' is V(A'|B) = T/(1 - w), which is greater than V(A|B). Therefore when A is nice it is not always the best strategy. Second, let A be a "not nice" strategy - say, defect on the first play - define A' to be always play C, and define the opponent's strategy B to cooperate until the first player no longer cooperates and then to play N. Then V(A|B) = T + wP/(1 - w) and V(A'|B) = R/(1 - w). In this case V(A'|B) > V(A|B) if w > (T - R)/(T - P). If the probability of the game continuing is high enough then no one strategy is always best [Axelrod (1981, theorem 1)]. Axelrod's theorem 1 is a formalization of the insight of Hardin's (1982, p. 146) critique of the non-cooperative outcome of the finite length Prisoner's Dilemma game.
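The comparison in footnote 30 can be verified numerically. The pay-off values below are the Figure 2.1 numbers (T = 6, R = 2, P = 0, S = -2), which satisfy T > R > P > S and R ≥ (S + T)/2.

```python
# Against a "grim" opponent (cooperate until crossed, then defect forever),
# defecting immediately is worth T + wP/(1 - w), while always cooperating is
# worth R/(1 - w); cooperation wins exactly when w > (T - R)/(T - P).

T, R, P, S = 6, 2, 0, -2

def v_defect_now(w):        # grab T once, then earn P every round thereafter
    return T + w * P / (1 - w)

def v_always_cooperate(w):  # earn R every round against a grim opponent
    return R / (1 - w)

w_threshold = (T - R) / (T - P)    # = 2/3 with these pay-offs

for w in (0.5, 0.8):
    coop_better = v_always_cooperate(w) > v_defect_now(w)
    print(f"w = {w}: cooperation better? {coop_better} (threshold {w_threshold:.3f})")
```

Below the threshold (w = 0.5) immediate defection pays; above it (w = 0.8) the prospect of continued play makes cooperation the more valuable opening, which is the sense in which no single strategy is best for all w.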
R.P. Inman
Second, Axelrod proves that a cooperative strategy which punishes non-cooperators, called the "tit-for-tat" strategy, is a potentially stable strategy in the Prisoner's Dilemma game with uncertain repetition. Furthermore, if this strategy is adopted by some specified fraction of the players participating in a society of repeated Prisoner's Dilemma interactions, the "tit-for-tat" strategy will eventually dominate the pure non-cooperation strategy.31 With the tit-for-tat strategy each player plays cooperatively until his opponent acts non-cooperatively, and then he plays one round non-cooperatively in retaliation. With everyone playing tit-for-tat, cooperative allocations result. Has uncertainty about the number of iterations of a Prisoner's Dilemma game therefore given us the "out" we need? Surely, uncertainty is a reasonable relaxation and by itself does little violence to our view of market failures. The more relevant question is whether most market failure situations will meet the precise conditions of the Axelrod theorems. That is an empirical matter. Yet there remains an important logical gap in applying the Axelrod results to our question of market failures and the potential need for government. For the cooperative solution to emerge voluntarily, there must first be at least a small set
31Suppose a strategy B is being used by all members of society. We then introduce a new, small group into society who use a new strategy A. Those who play A interact with probability p with another player of strategy A and with probability (1 - p) with someone who plays B. If the value of playing A against A is V(A|A) and the value of playing A against B is V(A|B), then the expected pay-off of playing A is pV(A|A) + (1 - p)V(A|B). Since there are so many B's, we can assume B's interact almost always with B's. The value of playing B is therefore V(B|B). Playing strategy A will be preferred to playing B if pV(A|A) + (1 - p)V(A|B) > V(B|B), or if

p > [V(B|B) - V(A|B)] / [V(A|A) - V(A|B)].
Let strategy A be the "tit-for-tat" strategy, and let strategy B be to always play non-cooperatively. Then for the generalized pay-off matrix in footnote 30, V(B|B) = P/(1 - w), V(A|B) = S + wP/(1 - w), and V(A|A) = R/(1 - w). Upon substitution, playing tit-for-tat will be preferred to always being non-cooperative if

p > [(P - S) + (S - P)w] / [(R - S) + (S - P)w].

If we let P = 0, S = -2, and R = 2 (the pay-offs in Figure 2.1) and w = 0.5 (a 0.5 probability of continuing), then p > 0.33 will insure tit-for-tat will dominate always behaving non-cooperatively. If w = 0.9, then p > 0.091. As long as tit-for-tat players can be sure of interacting with enough other tit-for-tat players, the tit-for-tat strategy will dominate a pure non-cooperative strategy [Axelrod (1981, p. 315)]. In a similar vein, Kreps et al. (1982) argue that cooperative behavior can emerge from a Prisoner's Dilemma game if each player believes the other will play tit-for-tat some fraction δ of the time. The fraction is not motivated from either individual self-seeking behavior or the structure of the pay-offs; in a sense, players are simply willing to try cooperation now and then. Kreps et al. show that for δ > 0, there is an upper limit to the total number of plays which involve non-cooperation.
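Footnote 31's invasion threshold can be verified directly. A minimal sketch, assuming the chapter's pay-offs P = 0, S = -2, R = 2; the helper names are mine:

```python
# Invasion threshold for tit-for-tat entering a population of
# non-cooperators: p > [(P - S) + (S - P)w] / [(R - S) + (S - P)w].
# Pay-offs from the chapter's Figure 2.1 example:
P, S, R = 0, -2, 2

def v(pay_first, pay_rest, w):
    """Pay-off this period plus the discounted stream of later pay-offs."""
    return pay_first + w * pay_rest / (1.0 - w)

def invasion_threshold(w):
    V_BB = v(P, P, w)  # non-cooperator vs. non-cooperator: P every period
    V_AB = v(S, P, w)  # tit-for-tat vs. non-cooperator: S once, then P
    V_AA = v(R, R, w)  # tit-for-tat vs. tit-for-tat: R every period
    return (V_BB - V_AB) / (V_AA - V_AB)

print(round(invasion_threshold(0.5), 3))  # -> 0.333, the chapter's p > 0.33
print(round(invasion_threshold(0.9), 3))  # -> 0.091, the chapter's p > 0.091
```

Both of the chapter's numerical examples fall out of the same formula: raising w from 0.5 to 0.9 shrinks the required fraction of tit-for-tat encounters from one-third to under a tenth.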
Ch. 12: The "New" Political Economy
of players willing to play cooperatively at the start. What motivates their behavior? Certainly no self-seeking, utility-maximizing player will adopt the tit-for-tat strategy on his own. If all others play non-cooperatively, a tit-for-tat strategy will give you a negative pay-off in the first period (you cooperate and your opponent does not) and 0 pay-offs (both non-cooperate) after that. A non-cooperative strategy at each play gives 0 pay-offs in each iteration and is preferred. Cooperative behavior can emerge, but only if a significant subset of agents are willing to risk a loss and act cooperatively first. Rather than an argument against the need for government to handle market failures, Axelrod's analysis may be seen as offering a case for government. Government protects and encourages those first cooperative agents. It is enforced cooperation by government in the early stages of the repeated Prisoner's Dilemma problem which permits the tit-for-tat strategy to take hold. Over time, voluntary cooperative behavior may replace the need for enforced cooperative behavior. But enforced cooperation via governments is still necessary initially to solve the problems posed by market failures. Radner (1980) has adopted a different approach to motivating voluntarily cooperative behavior in Prisoner's Dilemma games. In Radner's analysis players are greedy and self-seeking, but only up to some margin, epsilon. Radner proves that finite Prisoner's Dilemma games have equilibria with the following characteristic: if each player adopts a strategy within epsilon in pay-off of his best strategy, then some maximal fraction of iterations in the repeated Prisoner's Dilemma game will be played cooperatively. This fraction of cooperative plays increases as epsilon increases.32 Radner suggests this epsilon-equilibria approach to cooperative behavior can be motivated from either of two perspectives.
First, decision-making is costly, and epsilon measures the percent of maximal returns beyond which further searching for an optimal strategy is no longer worthwhile. Alternatively, epsilon may be interpreted as that fraction of maximal returns which each player will willingly forgo to protect cooperative behavior. Neither motivation is persuasive by itself; theories of bounded rationality or charitable behavior are needed. Furthermore, Radner's results establish only an upper bound to the percent of iterations which are played cooperatively;

32In the repeated Prisoner's Dilemma game considered by Radner, the total number of iterations is denoted T and the number played by a "nice", or cooperative, strategy is denoted k. If N is the number of players in Radner's game and ε is the epsilon margin tolerated by each player, then Radner shows for the most compelling form of player behavior that the upper limit to the fraction of cooperative plays (k/T) is given by

k/T ≤ 1 - [1 + (1/ε){(N - 1)/4N}] / T

[Radner (1980, p. 153)]. As ε rises, the upper bound to the fraction of cooperative plays increases.
non-cooperative behavior is still likely and may be pervasive.33 The need for government to enforce cooperative behavior thus remains as an institutional response to market failures. Smale (1980) explores yet a third approach to the motivation of voluntary cooperative behavior by self-seeking agents in Prisoner's Dilemma games. The analysis is of an infinitely repeated, two-person Prisoner's Dilemma game. Smale shows that cooperative behavior by both agents will eventually (i.e. asymptotically) emerge as the preferred strategies and that once mutual cooperation is in place the outcome is globally stable. Smale's results are based upon two assumptions. First, players have limited memories and therefore store information about past outcomes of the game through summary statistics which are updated after each play. For example, the average of past pay-offs to each player might be used. Decisions regarding how to play the next iteration are based on the value of these summary statistics. Smale's second assumption is that the agents are "nice" but not foolish or charitable. If my opponent's past pay-offs as measured by the summary statistic are greater than my past pay-offs so measured, then I play non-cooperatively. If the summary measure of my opponent's past pay-offs is less than mine, then I play cooperatively. Finally, if our summary measures are equal, then I again play cooperatively. Smale calls this decision rule a "good" strategy, and he proves that a two-person, infinite Prisoner's Dilemma game will voluntarily converge to the Pareto-optimal cooperative solution if both players adopt the "good" strategy. While of potential interest in other settings (e.g. duopoly market analysis), Smale's result does little to upset the conclusion that market failure problems will require enforced cooperation for their solution.
First, the analysis is restricted to two-person games, while most market failures involve at least several players; as the number of players increases, cooperation generally becomes more difficult.34 Second, the infinite repetition structure of the game is not appropriate when
33Using Radner's formula for the upper bound in his Prisoner's Dilemma game, we see that for a game with 50 repeated plays with the same 100 players (small town America) and ε equal to 5 percent error, the upper bound on the fraction of cooperative plays is 0.996. For a game with five repeated plays with the same 100 or more players and ε equal to 2 percent error (big city America with sophisticated mobile residents), the upper bound on the fraction of cooperative plays falls to a bit less than 0.6. With ε equal to 1 percent error, the upper bound on the fraction of cooperative plays falls to zero. In a mobile society of sophisticated, self-seeking agents, non-cooperative behavior is the norm, not the exception.
34Smale's analysis may generalize to more than two players, but to date no results have been obtained. The analysis of N-person, finite Prisoner's Dilemma games suggests cooperation becomes more difficult as the number of players increases. See Radner (1980) and more generally the discussion in Olson (1965, ch. 2) and Hardin (1971). For some preliminary work in this direction, see Rubinstein (1979).
players have finite lives.35 In the same spirit, the result insures only asymptotic convergence to cooperative behavior, but finite-lived players are surely interested in the speed of convergence too. Enforced cooperation today at a cost may be preferable to voluntary cooperation at some very distant future date. Finally, the emergence of the cooperative result in Smale's analysis is extremely sensitive to the precise formulation of the "good" strategy. Specifically, suppose that the good strategy is followed except when equal values for the players' summary statistics of past pay-offs arise. Then one player may decide not to cooperate rather than to cooperate. If so, the game converges to the joint non-cooperative outcome rather than the joint cooperative outcome, and the joint non-cooperative outcome is globally stable. Hence, a willingness to play "nicely" is a key assumption behind Smale's result. While the analyses of Axelrod, Radner, and Smale suggest ways through the Prisoner's Dilemma to voluntary cooperation and implicit market contracts, no panacea is at hand. In those market failure "games" in which players are not well known to each other, the number of iterations is few, and the number of players is large, non-cooperation rather than cooperation is likely to be the equilibrium behavior. In such cases only enforced cooperation through an institution such as government can lead us from the non-cooperative outcome.36
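Smale's "good" strategy is concrete enough to simulate. The sketch below is illustrative, not Smale's own construction: pay-offs P = 0, S = -2, R = 2 follow Figure 2.1, T = 3 is an assumed value satisfying the chapter's inequalities, the summary statistic is the running average of past pay-offs, and the opening moves are set by hand to show recovery from an uncooperative start.

```python
# Two-person repeated Prisoner's Dilemma with Smale's "good" strategy:
# defect only if the opponent's average past pay-off strictly exceeds
# your own; cooperate on less-than or equal comparisons.
T, R, P, S = 3, 2, 0, -2  # T = 3 is an illustrative assumption

def payoff(my_move, other_move):
    """My per-round pay-off; 'C' = cooperate, 'N' = not cooperate."""
    table = {('C', 'C'): R, ('C', 'N'): S, ('N', 'C'): T, ('N', 'N'): P}
    return table[(my_move, other_move)]

def good_move(my_avg, other_avg):
    """Smale's rule applied to the average-pay-off summary statistic."""
    return 'N' if other_avg > my_avg else 'C'

def play(first_moves, rounds=50):
    """Play the game; both players use the 'good' strategy after round 1."""
    totals, history = [0.0, 0.0], []
    moves = list(first_moves)
    for t in range(1, rounds + 1):
        if t > 1:  # from round 2 on, decide from the averages so far
            avg = [tot / (t - 1) for tot in totals]
            moves = [good_move(avg[0], avg[1]), good_move(avg[1], avg[0])]
        totals[0] += payoff(moves[0], moves[1])
        totals[1] += payoff(moves[1], moves[0])
        history.append(tuple(moves))
    return history

hist = play(('C', 'N'))
print(hist[:4])  # rounds go ('C','N'), ('N','C'), then ('C','C') forever
```

After the retaliation round the two averages equalize, the equal-values branch of the rule selects cooperation for both players, and mutual cooperation then sustains itself, which is the asymptotic convergence Smale proves. Changing the equal-values branch to 'N' instead locks in joint non-cooperation, illustrating the sensitivity noted above.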
35Smale appeals to Aumann (1959, p. 295) for justification of the infinite horizon structure of his game. What Aumann really seemed to have in mind was an argument similar to that which motivated Axelrod's analysis:
In the notion of supergame that will be used in this paper, each superplay consists of an infinite number of plays of the original game G. On the face of it, this would seem to be unrealistic, but actually it is more realistic than the notion in which each superplay consists of a fixed finite (large) number of plays of G... This is unnatural, because in the usual case, the players can always anticipate that there will still take place in the future an indefinite number of plays. This is especially true in political and economic applications, and even holds true for the neighborhood poker "club". Of course when looked at in the large, nobody really expects an infinite number of plays to take place; on the other hand, after each play we do expect that there will be more. [Italics added.]

36Baumol (1952) emphasized the point in his now classic analysis of market failure and the role of the state:

In those cases where the welfare of the various members of the economy is partly dependent on each others' activity, it is possible that persons in pursuit of their own immediate interests will be led to act in a manner contrary to the interests of others... It may be that the disadvantage to each of them of such activity is so great and of such a sort that arrangements can be made on a purely voluntary basis whereby it can be avoided altogether. Where such arrangements cannot be relied on, it becomes advantageous to the members of the economy to have activities restricted by coercive measures... The fundamental importance of this point to the theory of government should be noted, for in the very concept of government there is an element of coercion... (pp. 180-181).
2.4. Summary

While competitive markets are extremely attractive institutions for allocating societal resources among economic agents, they are not by themselves sufficient to insure best allocations against the minimal ethical criterion of Pareto optimality: that all trades be undertaken which can make at least one agent better off without at the same time harming another. At a minimum, government is required to protect and validate the trading process through the maintenance of a system of property rights. Those who embrace a stronger ethical criterion than Pareto optimality may also ask government to redistribute the initial endowments of agents to insure a more equitable allocation of final goods and services. In fact, governments may be asked to do a good deal more than protect property rights and transfer endowments. The motivation for an extended agenda lies in the important fact that competitive markets may not be capable of facilitating all beneficial trades between agents. Markets fail. They fail for the fundamental reason that the institution of market trading cannot enforce cooperative behavior on self-seeking, utility-maximizing agents, and cooperative behavior between agents is often required for beneficial trading. In each instance of market failure (public goods, externalities, increasing returns, asymmetric information, unemployment) agents were asked to reveal information about their benefits or costs from trades with no guarantee that that information would not be used against them. Without that guarantee, information is concealed, trades collapse, and the market institution fails. Only through cooperatively sharing valued information can the preferred, Pareto-optimal allocation be achieved. A careful analysis of agent behavior in market failure problems reveals non-cooperation to be the dominant strategy of each self-seeking, utility-maximizing player.
The generic form of market failures is the Prisoner's Dilemma game repeated a finite number of times between many strangers. The equilibrium strategy in this game is to play non-cooperatively. Only by externally enforcing cooperative behavior upon individuals can the non-cooperative, market failure outcome be avoided. Government is one institution which has the potential to enforce the preferred cooperative outcome. Whether it can do so in fact, and at a reasonable cost, are our next questions.

3. The search for efficient government

In every case where such cooperation is indispensable and salutary it is necessary for society; and while the government exacts from the members of society part of their liberty and wealth, the well-being thus obtained for all may make them support without regret the sacrifice imposed on them by the establishment of a government [J.B. Say, as quoted in Baumol (1967a, p. 187)].
As an allocator of social resources, the task of government is to find and enforce a cooperative, Pareto-improving allocation. The difficulty is that each member of society has an incentive to conceal his true benefits and costs from cooperation in hopes of exploiting those who do reveal their true gains or losses. If we all cheat, the outcome will generally be inefficient. We must search then for a way of running government which minimizes such non-cooperative behavior and which does so with the expenditure of as little of society's scarce resources as possible. The first to contend that government might be organized efficiently was Wicksell (1896). Wicksell's idea was simple: if net benefits are created by a cooperative allocation, then a means of redistributing those gains through taxation could be found which would ensure that all citizens preferred the new allocation to the non-cooperative status quo. The new allocation would be unanimously approved; no re-allocations would be allowed unless all citizens agreed to the change. Wicksell reasoned that unanimous approval would offer sufficient protection to encourage voluntary, cooperative participation on the part of all agents. Wicksell's ideas were developed further by Erik Lindahl and are now called the voluntary exchange theory of government. In Section 3.1 below I review this view of government and, using the work of Olson (1965) and Malinvaud (1971), argue that the voluntary exchange theory is fundamentally flawed. Government must be coercive if it is to guarantee the participation of all citizens. Once we admit to the necessity of coercion we must confront an additional concern: how can we protect the citizenry from oppression, discrimination, or the violation of basic rights? At a minimum we would want a government which is non-dictatorial. 
Can we design a governmental (coercive) decision-making process which is (1) non-dictatorial, (2) capable of finding and enforcing Pareto-efficient resource allocations, and (3) does so in a way which uses as few resources as possible in reaching its decisions? The answer, sadly, is no. Arrow (1963), Gibbard (1973) and Satterthwaite (1975) have provided us with two related theorems which show that there is no collective choice process which is non-dictatorial, always capable of finding Pareto-optimal allocations, and decision-making efficient. Section 3.2 presents and discusses these important results. We must conclude that while markets are imperfect, so too are governments. Rather than searching for an ideal institution, we now see we must compare imperfect ones. Decentralized markets perform well against the standards of democratic choice and economy in decision-making, yet often fail against the standard of allocative efficiency. Since we are looking to government to improve upon the overall performance of markets, it is most useful at this stage to search for governments which are allocatively efficient. In light of the Arrow-Gibbard-Satterthwaite theorems, then, we must sacrifice either democracy or decision-making efficiency. In Section 3.3 I review the recent research on those
social choice processes, called planning or demand-revealing processes, which seek efficiency but are dictatorial. In Section 3.4, I review the recent work on those collective choice mechanisms, such as majority rule, which are democratic but often inefficient. Only through the careful delineation of the performance properties of specific governmental structures can we hope to make an informed choice on the important matter of markets vs. governments.
3.1. Collective action as voluntary exchange

The first to consider analytically the notion that collective (i.e. cooperative) activities might be provided voluntarily was Olson (1965) in his book, The Logic of Collective Action. Olson was particularly interested in the relationship between group size and the provision of the collective good. Voluntary groups form because individuals realize a congruence of interests in a particular activity, but individuals are always free to leave the group at any time. How might such groups behave in the provision of collective goods? Olson considered the archetypical case of a pure public good from which no individual can be denied its benefits. He hypothesized that individuals are more likely to act cooperatively when the group of potential beneficiaries is small than when it is large. Figures 3.1(a) and 3.1(b) outline Olson's arguments.37 Figure 3.1(a) examines the reaction of a typical agent to the provision of a non-excludable public service by other members of society. In Figure 3.1(a), X represents the provision by all others, while x is the individual's own contribution. The total level of services provided is therefore X + x. Initially, when X = 0, the individual will select that level of x which maximizes his or her own utility subject to an individual budget constraint. The budget constraint is line AA in Figure 3.1(a); the curved lines are the individual's indifference curves. The utility-maximizing outcome given AA is point 0, and the level x0 is the individual's own provision of the non-excludable public good. If all other agents behave identically, then an aggregate provision of (n - 1)x0 = X0 will be provided by all others in the group. Since exclusion is not possible, our typical individual will be able to consume X0 without charge. Like any free good, X0 shifts the individual's budget constraint outward, in this case to A'A'.
With the new budget constraint the individual prefers a new level of the public good equal to x1, but only the amount (x1 - X0) will be provided personally. Note, however, (x1 - X0) may be

37The discussion which follows is based on Chamberlain's (1974) development of Olson's analysis. The analysis assumes each agent acts as if his behavior will have no effect on the behavior of other agents. Cornes and Sandler (1984) relax this assumption and allow for agents to anticipate the behavior of others in response to their behavior. Our central conclusion, that voluntary behavior underprovides public goods, is unaffected by this extension.
[Figure 3.1. Voluntary provision of public goods. Panel (a): private goods (y) against collective goods, showing the budget line AA, the income expansion path, the levels x0, X0, X1 and Xmax, and provision by others (X). Panel (b): individual provision (x) against provision by others, with the ray X = (n - 1)x.]
less than x0. As group consumption rises, individual contributions may decline.38 When X moves beyond Xmax in Figure 3.1(a), individual contributions fall to zero. What is the equilibrium outcome of this adjustment process? Figure 3.1(b) presents the analysis for n identical individuals, where n is allowed to increase

38The case illustrated in Figure 3.1(a) is for a "normal" good, where an increase in income or resources is allocated to all commodities. If the free level of X means an increase in the private good y as well as x, then the original expenditure on x must be reduced so the expenditure on y can increase. Thus for normal goods, as X increases, own contributions fall. This seems the most likely case.
from 1 to ∞. The curve RR represents a typical agent's reaction to varying levels of the aggregate provision (X) by the other n - 1 agents. For the case in Figure 3.1(a), as X increases, x declines. This is true for each agent. In the case of identical agents, an equilibrium also requires that X = (n - 1)x. A line describing this relationship is drawn in Figure 3.1(b) as the ray from the origin with slope (n - 1). Equilibrium must be consistent with both individual utility-maximizing behavior, represented by the reaction curve RR, and aggregate group provision, defined by X = (n - 1)x. This is point E, where x* is provided individually and nx* (= (n - 1)x* + x*) is provided by the group. As the number of individuals in the group increases, (n - 1) increases and the equilibrium allocation shows declining individual contributions but increasing group provision. In this case, the increase in n more than compensates for the fall in x*, so nx* rises.
What is interesting from Olson's analysis is that at least some of the public good is provided voluntarily by members of the group, even when agents can "free ride" on their neighbors' provision. What is particularly important for this chapter, however, is the next question: Is the level of the collective good provided by voluntary provision Pareto optimal? The answer is no. For each individual acting voluntarily, x* (< x0) is defined by the tangency of his budget line, whose slope is the marginal cost of x, to his indifference curve, whose slope is his MRS evaluated at nx*: MRS(nx*) = MC. When agents are identical we can multiply both sides of the individual equilibrium condition by n to give: ΣMRS = nMRS(nx*) = nMC > MC. Olson's voluntary exchange theory of public good provision implies marginal social benefits (ΣMRS) exceed marginal social cost (MC). Too little of the public good is voluntarily provided for any finite n.39
Voluntary confederations of beneficiaries generally will not provide the efficient level of collective activities. What is required is a collective decision enforced across all beneficiaries. Yet, since everyone can benefit from a collective action, can we not design a collective choice outcome which everyone will approve and thus voluntarily support? Wicksell (1896) thought so and proposed governmental decision-making by unanimous consent. Wicksell argued that it ought to be possible to ensure that any Pareto-improving government decision had the unanimous approval of all citizens. The trick is to adjust taxes to benefits from a given public activity: "Provided the expenditure in question holds out any prospect at all of creating utility exceeding costs, it will always be theoretically possible and approximately so in practice, to find a distribution of costs such that

39Pauly (1970) reached the same conclusion for the more general case in which consumers are not identical. There have been numerous experiments to test individual behavior in voluntary associations, and the results generally confirm Olson's predictions; see Marwell and Ames (1981). Palfrey and Rosenthal (1983) provide a review of the evidence and a more general analysis of individual strategic behavior in voluntary associations.
all parties regard the expenditure as beneficial and may, therefore, approve it unanimously" [Wicksell (1897, pp. 89-90)]. Unanimity protects each citizen from exploitation and therefore, Wicksell reasoned, should encourage agents to reveal their true costs and benefits of collective activity. Furthermore, once a cooperative allocation is obtained, unanimity should minimize any enforcement costs required to maintain the cooperative outcome. It remained for Erik Lindahl (1919) to make Wicksell's insight precise. The adjustment process proposed by Lindahl was meant to describe how a government might work in practice to set the level of public spending.40 Two parties, A and B, are to determine the level of public expenditure (G) and the division of the required taxes between A and B. The tax share of expenditures paid by A is denoted h while the remaining share (1 - h) is paid by B. Hoping to ensure that each party to the bargain will cooperate, Lindahl followed Wicksell's lead and required unanimous agreement to any final allocation. Lindahl viewed government as an auction process, not unlike a market place. A tax share is proposed to parties A and B who, in turn, respond by quoting their preferred levels of public spending. If the spending levels differ, new tax shares are proposed which are higher for the high demander and lower for the low demander. The new tax shares should increase the preferred public spending of the low demand agent and reduce the preferred spending of the high demand agent. The process continues until the preferred spending levels coincide and unanimity prevails. Johansen (1963) formalized the Lindahl process in a simple general equilibrium model. The Johansen framework, presented in Figure 3.2(a), allows us to see how the Wicksell-Lindahl process was designed to achieve the Pareto-efficient (i.e. cooperative) allocation of public goods. It will also help us to understand just why the process is likely to fail.
The curve marked AA is the demand curve of person A and represents A's preferred level of public spending for each tax share, h, A could be asked to pay. More G is demanded by A as h falls. The dashed lines denoted a'a', a''a'' and a'''a''', which peak along AA, are three indifference curves for party A and indicate the various combinations of h and G along each dashed line which are equally valued. For low levels of G, an increase in G is highly valued, so A will pay a higher tax share to remain indifferent (the indifference curve rises toward AA). After a point, however, private income is valued more highly than G, and person A must be compensated by a lower tax share as G expands (the indifference curve falls away from AA). The peak of each indifference curve is on AA. Since player A is better off receiving more G and paying a lower share, A wishes to move as far down the AA demand curve as possible. A similar analysis

40While the Lindahl process is generally described for public goods, in principle it can also be applied to other communal or shared activities, including externalities, lumpy capital investments, or information. See Bergstrom (1971), Weitzman (1978), and Dasgupta, Hammond and Maskin (1980).
[Figure 3.2. The Lindahl process. Panels (a) and (b): A's tax share (h) and B's tax share (1 - h) plotted against public spending (G), showing the demand curves AA and BB and the equilibrium spending level Gw.]
describes player B's demand curve, BB, and the corresponding indifference curves: b'b', b''b'' and b'''b'''. Note, however, that B's tax share is 1 - h, so B prefers to move as high up the BB schedule as possible and have A pay a large share (h) of public expenditures. The Wicksell-Lindahl process entails the use of a government auctioneer who announces a tax share h and asks each player to respond with his preferred level of G. If a high h is announced, the G demanded by
B exceeds the level of G demanded by A. The auctioneer realizes that h must be lowered to increase A's demand and to lower B's. The auctioneer continues to lower h until A's demand rises and B's falls sufficiently so as to equalize their preferred G's. This occurs at point 1 in Figure 3.2(a), where both demand Gw and pay the respective tax shares hw and (1 - hw). This is Wicksell's point of unanimity, also known as a "Lindahl equilibrium." Note too that the two indifference curves a''a'' and b''b'' are tangent to each other at point 1; thus, there is no realignment of G and h which can make both parties better off. Point 1 is therefore Pareto optimal.41 Wicksell-Lindahl's government auctioneer has therefore succeeded in leading our two non-cooperative agents to a cooperative, Pareto-optimal outcome. The Wicksell-Lindahl process has apparently succeeded in finding and enforcing a Pareto-optimal cooperative allocation at the cost of a neutral government auctioneer cum tax collector. Matters, however, are not so easy. By requiring unanimity, the Wicksell-Lindahl process in effect grants a veto to the low demand agent. This agent can stop the process unless compensated by a lower tax share. It is therefore in the interest of both parties to position themselves so as to appear to be the low demand party. The low demand veto, rather than encouraging the truthful revelation of preferences, becomes a weapon for manipulating the auctioneer. For example, if A
41Any move from point 1 will necessarily push one party to a lower indifference curve, and hence such a move cannot be Pareto optimal. We can also show that point 1 satisfies Samuelson's efficiency condition for pure public goods, as it should. At point 1 we know party A pays the share hw of government taxes and receives Gw in services. Further, this combination (hw, Gw) is on A's demand curve AA, and therefore Gw must be A's preferred spending level given a tax share of hw. That is, Gw is a solution to A's allocation problem: max Ua(ya, G) subject to the budget constraint Ia = hwG + ya, where ya is the private good and Ia is pre-tax income. Gw therefore satisfies the requirement that A's marginal rate of substitution of ya for G equals A's "price" of another unit of G, which is hw: (∂Ua/∂G)/(∂Ua/∂ya) = hw. Point 1 is also on B's demand curve, and therefore B's marginal rate of substitution of yb for G equals B's "price" of another unit of G, which is (1 - hw): (∂Ub/∂G)/(∂Ub/∂yb) = (1 - hw). Finally, in the Lindahl framework, government spending equals a constant marginal (= average) cost per unit of public output times the level of output: G = (MC)X. Thus, ∂U/∂G = (1/MC)(∂U/∂X). Substituting this expression into each party's condition gives: {(∂Ua/∂X)/(∂Ua/∂ya)}(1/MC) = hw and {(∂Ub/∂X)/(∂Ub/∂yb)}(1/MC) = (1 - hw). Adding the two equalities together and noting that each expression in curly brackets is that agent's marginal rate of substitution of y for X (denoted earlier as MRS) implies: MRSa + MRSb = MC, which is Samuelson's condition for efficiency in the provision of public goods.
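The auctioneer's tatonnement can be sketched as a simple bisection on the tax share h. The demand curves are an assumed quasi-linear example, not Johansen's: G_A(h) = α/h and G_B(h) = β/(1 - h) with unit marginal cost, for which the Lindahl equilibrium is h* = α/(α + β) and G* = α + β.

```python
# Sketch of the Wicksell-Lindahl auction of Figure 3.2 with truthful
# reporting. Assumed demand curves (quasi-linear preferences, MC = 1):
# A's preferred G at share h is ALPHA/h; B's at share (1-h) is BETA/(1-h).
ALPHA, BETA = 3.0, 2.0

def demand_A(h):
    return ALPHA / h          # falls as A's tax share h rises

def demand_B(h):
    return BETA / (1.0 - h)   # rises as h rises (B's share 1-h falls)

def lindahl_auction(tol=1e-10):
    """Bisect on h: when B demands more G than A, the share h is too high."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        h = (lo + hi) / 2.0
        if demand_B(h) > demand_A(h):
            hi = h            # lower h: raises A's demand, lowers B's
        else:
            lo = h
    return h, demand_A(h)

h_star, G_star = lindahl_auction()
print(round(h_star, 4), round(G_star, 4))  # -> 0.6 5.0
```

At the computed equilibrium the Lindahl prices sum to marginal cost, ALPHA/G + BETA/G = 1 = MC, which is the Samuelson condition derived in footnote 41. The manipulation problem discussed next arises precisely because each party gains by reporting a flatter demand curve than demand_A or demand_B above.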
R.P. Inman

                         Player B
                      N           C
   Player A   N    Point 3     Point 4
              C    Point 2     Point 1

Figure 3.3.
knows B's demand curve and assumes B will tell the truth, then A can improve on his pay-off over that available at point 1 by understating his demand curve to be A′A′; see Figure 3.2(b). The resulting Wicksell-Lindahl equilibrium will then be point 4, where A achieves the pay-off on indifference curve a‴a‴. The false demand curve A′A′ is just that curve which will maximize A's well-being given that B tells the truth; a‴a‴ is tangent to BB at point 4 and is the best (lowest) indifference curve A can obtain given that player B responds according to his true demand curve BB. Since A and B are equal players, however, B can play the same strategy and give an understated demand curve B′B′ in hopes of exploiting A's true demand curve, AA. When B lies and A tells the truth, the resulting Wicksell-Lindahl equilibrium will be at point 2, giving B a pay-off on the preferred indifference curve b‴b‴. Yet when both players adopt the lying, non-cooperative strategy the outcome will be a point such as 3, an outcome inferior to the cooperative outcome at point 1. Malinvaud (1971) first emphasized the inadequacies of the Wicksell-Lindahl process as a mechanism for discovering a Pareto-optimal cooperative outcome. The process suffers from just the same difficulties which plagued the market system as an allocator of public goods. The allocation game in Figure 3.2(b) can be described by the pay-off matrix of outcomes shown in Figure 3.3, where N is the non-cooperative lying strategy and C is the cooperative truth-telling strategy. From the players' indifference curves, we observe the ordering of outcomes for player A as 4 > 1 > 3 > 2 (where ">" indicates "preferred to") while that for player B is 2 > 1 > 3 > 4. With these pay-offs, both players find non-cooperation to be the dominant strategy within the institutional structure of the Wicksell-Lindahl process. The game is a Prisoner's Dilemma.
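The dominant-strategy logic can be verified mechanically. The following sketch is not from the chapter itself; the ordinal payoffs are simply read off the players' orderings of points 1-4 above (4 = best, 1 = worst), and it checks that lying (N) dominates truth-telling (C) for both players:

```python
# The Figure 3.3 allocation game as a Prisoner's Dilemma.
# Cells map strategy pairs (A's strategy, B's strategy) to equilibrium points;
# ordinal payoffs come from A's ordering 4 > 1 > 3 > 2 and B's 2 > 1 > 3 > 4.

outcome = {('N', 'N'): 3, ('N', 'C'): 4,
           ('C', 'N'): 2, ('C', 'C'): 1}
rank_A = {4: 4, 1: 3, 3: 2, 2: 1}   # point -> ordinal payoff for A
rank_B = {2: 4, 1: 3, 3: 2, 4: 1}   # point -> ordinal payoff for B

def best_reply(ranks, other_strategy, mover='A'):
    """Strategy maximizing the mover's payoff, given the other player's choice."""
    if mover == 'A':
        return max('NC', key=lambda s: ranks[outcome[(s, other_strategy)]])
    return max('NC', key=lambda s: ranks[outcome[(other_strategy, s)]])

# N is a dominant strategy for both players...
assert all(best_reply(rank_A, s, 'A') == 'N' for s in 'NC')
assert all(best_reply(rank_B, s, 'B') == 'N' for s in 'NC')

# ...yet both prefer the cooperative point 1 to the mutual-lying point 3.
assert rank_A[1] > rank_A[3] and rank_B[1] > rank_B[3]
```

Both dominance checks pass, confirming the Prisoner's Dilemma structure described in the text.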
Rather than ensuring the cooperative allocation, the Wicksell-Lindahl process has simply moved non-cooperative behavior from the marketplace into the house of government.42 If
42 These orderings of outcomes in Figure 3.2(b) (points 1, 2, 3, 4) are only one of several possible orderings. While point 4 is always best for A and point 2 is always best for B, and while 1 is preferred to 2 for A and 1 is preferred to 4 for B, point 3 may be overall worst (e.g. 4 > 1 > 2 > 3 for A and 2 > 1 > 4 > 3 for B) or 3 may be second best (4 > 3 > 1 > 2 for A and 2 > 3 > 1 > 4 for B). These alternative orderings turn the allocation game into a game of "chicken", which Lipnowski and Maital (1983) analyze and show will not have a stable, Pareto-optimal allocation. Our main result, that the Wicksell-Lindahl process cannot guarantee an efficient outcome, is valid generally.
Ch. 12: The "New" Political Economy
government is to find and enforce a Pareto-superior cooperative allocation, we must provide the auctioneer-tax collector with an allocation process, other than the Wicksell-Lindahl scheme, which encourages participants to reveal their true benefits and costs from public activity.43
3.2. Coercive mechanisms of collective choice: Democracy vs. efficiency

With the inability of the Olson or the Wicksell-Lindahl voluntary exchange mechanisms to guarantee an efficient allocation of social resources, we need to ask whether a coercive mechanism of collective choice might achieve efficiency. A coercive collective choice mechanism is one which requires individual agents to provide information about their preferences and/or their economic endowments and, when asked, to contribute resources to facilitate the collectively decided allocations. We make no assumption about whether agents will act honestly and truthfully when participating in collective decision-making - indeed, that will turn out to be the central question - but we do require their participation. A coercive collective choice mechanism may be either democratic or dictatorial. A democratic mechanism respects the preferences of each individual, and the expressed preferences of all participants have bearing on the final resource allocation chosen by the mechanism. While democracy does not guarantee each agent will receive his favorite outcome, it does guarantee his preferences will be heard when reaching a collective choice. In contrast, a dictatorial collective choice mechanism is responsive to the preferences of only one individual, the dictator. All other participants can be ignored. As our task is to find a collective choice process which we can compare to a market process as a vehicle for achieving efficient resource allocations, it seems reasonable to ask that the mechanism be democratic rather than dictatorial. Markets clearly respect individual preferences, given initial endowments, and at least at this stage of our inquiry we should ask no less of a coercive collective choice mechanism. Our problem becomes this: Is there a democratic collective choice process capable of allocating societal resources efficiently, even when markets cannot?
Arrow (1963) was the first to pose this question precisely, and within his framework the answer is no: democratic processes are not generally efficient. More recent work by Gibbard (1973) and Satterthwaite (1975) has extended Arrow's analysis to a more general behavioral structure, and they reach essentially
43 These comments are not meant as a criticism of the notion of the Lindahl equilibrium as a normative concept. A Lindahl equilibrium, if reached, is Pareto efficient, and as each party pays a marginal tax equal to their marginal benefit the concept may have some appeal as a "fair" equilibrium as well [Varian (1978, p. 200)]. The concept of the Lindahl equilibrium is used extensively in models of public choice; see Roberts (1974). The criticism here is only of the Wicksell-Lindahl process, which generally fails to achieve its stated (and normatively attractive) objective, the Lindahl equilibrium.
the same conclusion. We stand, therefore, in the face of a dilemma. Any collective choice mechanism which we might design must be imperfect: either efficient but dictatorial, or democratic but inefficient. And we must choose. We shall examine the performance properties of specific dictatorial and democratic mechanisms in Sections 3.3 and 3.4 below. First it is important to understand just why we face this difficult choice.

3.2.1. Arrow's Possibility Theorem

The Arrow Possibility Theorem clarifies in a most fundamental way what democracies can, and cannot, do. Arrow proposed five axioms which he argued might be reasonably required of any collective choice process; the theorem proves the inconsistency of the five axioms.44 They are:
(i) Pareto optimality (P). If everyone prefers some alternative x to another alternative y (denoted xPiy, for all individuals i), the collective choice process should prefer x to y (denoted xPy). If we wish a collective choice process to find a Pareto-optimal allocation of social resources - in particular, to solve the problems posed by market failures - then we must impose the axiom of Pareto optimality.
(ii) Non-dictatorship (ND). No individual has full control over the collective choice process such that that individual's preferences over alternatives are always decisive, even when everyone else prefers just the opposite. For example, there is no person i such that when i prefers x to y (denoted xPiy) but everyone else prefers y to x (yPjx, for all j ≠ i) the collective choice process prefers x to y (denoted xPy). Imposing non-dictatorship seems an important starting point for any analysis of collective choice processes which we hope will respect the preferences of all individuals in society.
The last three axioms all have convenient interpretations as means of ensuring an efficient decision-making process.
They seek to ensure that societal decisions will in fact be made when individuals can provide the required information for collective choice, and that that information will be used as parsimoniously as possible.
(iii) Unrestricted domain (U). The collective choice process is capable of reaching collective decisions for all possible combinations of individual preference orderings of the alternatives. To exclude certain combinations of orderings from consideration by the collective choice mechanism clearly limits the mechanism's usefulness. Either no decisions can be reached when an excluded combination of orderings appears, or, if we ignore those excluded combinations and do make a collective decision, we have denied all individuals with those rankings a chance to participate in society's decisions.
44 Arrow originally proposed six axioms and proved their inconsistency. The axiom missing from the list here is the axiom of positive association. Positive association requires that if individual preferences over alternatives are changed for some reason so that an alternative x moves up in some individual's ordering and does not fall in others' orderings, then if the collective choice process originally preferred x, then x must remain preferred after the change in preferences. This axiom has been shown to be unnecessary to the proof with a slight strengthening of Arrow's other axioms. See Sen (1970a) and Vickrey (1960). Positive association still has a role to play in our analysis of collective choice, however. See the discussion below of the Gibbard-Satterthwaite theorem on strategy-proof voting processes and of May's theorem for majority rule.
(iv) Rationality (R). The collective choice process is required to be rational: first, in the sense that it can completely rank all alternatives presented for consideration by stating x is preferred to y (xPy), or y is preferred to x (yPx), or x and y are equally desirable so we are indifferent (denoted xIy); and, second, that all rankings display the property of transitivity. The requirement that the collective choice process give a complete ranking of all alternatives simply means we want our mechanism to be able to choose between alternatives. Completeness seems innocent enough. Requiring transitivity, however, is more controversial. Arrow required the collective choice mechanism be transitive with respect to both the strict preference relation (called P-transitivity) - if x is preferred to y (xPy), and y preferred to z (yPz), then x is preferred to z (xPz) - and with respect to the indifference relation (called I-transitivity) - if x is indifferent to y (xIy) and y is indifferent to z (yIz), then x is indifferent to z (xIz). As we shall see, the Arrow transitivity requirement is most demanding, but it has obvious decision-making advantages. Requiring P- and I-transitivity along with completeness ensures that the collective choice mechanism will give a complete ordering of all alternatives [Sen (1970, ch. 4)].
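The force of the transitivity requirement is easiest to see from the classic three-voter profile under which pairwise majority rule, though complete, cycles. A minimal illustrative sketch (the profile is the standard textbook one, not taken from this chapter):

```python
from itertools import combinations

# Condorcet's paradox: pairwise majority rule can violate P-transitivity.
voters = [['x', 'y', 'z'],   # each list is one voter's ordering, best first
          ['y', 'z', 'x'],
          ['z', 'x', 'y']]

def majority_prefers(a, b):
    """True if a strict majority of voters ranks a above b."""
    wins = sum(v.index(a) < v.index(b) for v in voters)
    return wins > len(voters) / 2

for a, b in combinations('xyz', 2):
    winner, loser = (a, b) if majority_prefers(a, b) else (b, a)
    print(f"{winner} beats {loser}")
# xPy and yPz, yet zPx: a voting cycle, so no alternative beats all others.
```

With transitivity imposed on the collective ranking, cycles of this sort are ruled out by assumption; the question is whether any mechanism can deliver that.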
(This translation of individual preference orderings into a complete social ordering of alternatives Arrow called a "social welfare function.") Knowing alternatives are part of a complete ordering can save on decision-making costs. For example, with a complete social ordering, to rank any three alternatives we need to make only two comparisons. If we discover xPy and yPz or yIz, then we know without additional use of the collective choice mechanism that xPz. There is an additional advantage to requiring P- and I-transitivity. We avoid cycles of indecision, often called voting cycles. If xPy and yPz, then when P-transitivity is violated we may have zPx. In this instance, no one of the three alternatives clearly dominates the others. Which alternative is to be preferred is not at all clear. Arrow's transitivity requirement embodied in the axiom of rationality asks us to find a collective choice mechanism immune to the indecision of voting cycles.
(v) Independence of irrelevant alternatives (I). Requirement I states that the selection of either of two alternatives by the social choice process must depend on the individuals' orderings over only those two alternatives, and not on individual orderings over other alternatives. The central advantage of this requirement is again efficiency in decision-making: when axiom I applies, the comparison of alternatives x and y does not require us to consider the availability or the characteristics of some third alternative, z. While this axiom is appealing as a
source of economy in collective choice, it has its price. The axiom excludes the use of cardinal utility orderings in deciding social choices. Cardinal utility indicators permit us to introduce measures of preference intensity as relevant information in deciding collective choices. If one voter prefers x to y, while another prefers y to x, perhaps the voter who favors his first choice more intensely ought to have his way? Such arguments worried Arrow (1963, pp. 9-11), however. The choice of the cardinal utility indicator can be manipulated to affect the collectively chosen outcome. Yet there may be no compelling ethical or economic reason to prefer one cardinal utility indicator over another.45 It seems reasonable, at least at the start, to see if we might find a collective choice process whose decisions are not sensitive to the arbitrary choice of such an indicator. Requiring the axiom of independence of irrelevant alternatives removes this possibility.
Each of the five axioms above seems to be a plausible restriction as we search for a coercive, collective choice mechanism. We want the mechanism to be non-dictatorial (axiom ND), to find and enforce efficient allocations particularly when markets cannot (axiom P), to work for all possible combinations of individual preferences (axiom U), and to rank alternatives and to do so with an economy of process (axioms R and I). Does such a collective choice mechanism, capable of ranking all alternatives, exist? The answer is no. In his famous Possibility Theorem, Arrow showed that there is no collective choice process which can satisfy the five axioms P, ND, U, R, and I. The proof of Arrow's theorem has been examined, dissected, and refined since its original presentation in 1953, and one particularly straightforward presentation is given in Figure 3.4.46 The strategy of the proof is simple.
In the first stage, a set of individuals, denoted as D, is defined as a "decisive set" over two alternatives x and y under a particular collective choice process when all members of D express a preference for x over y, all individuals not in D express the opposite preference, and the social choice process prefers x over y. It is then shown [steps (3)-(7) in Figure 3.4] that this group which is decisive for x vs. y is also decisive over any other pair of alternatives. In the second stage of the proof, the original decisive group is split into two subsets, A and B, and it is shown that one of these two groups is a decisive set [steps (9)-(14)]. The process of splitting the decisive group continues until we reach a decisive set of one individual whose preferences control the collective choice process [step (15)]. That person is a
45 Suppose person 1 prefers x to y and assigns a cardinal utility value of 10 to x and a value of 5 to y. Person 2 prefers y to x and assigns a value of 6 to y and a value of 2 to x. Person 1 values x "5 utils" more than y, while person 2 values y "4 utils" more than x. Do we therefore conclude x is collectively preferred to y? What if person 2 adopted a different cardinal indicator and assigned a value of 12 to y and a value of 4 to x? Then person 2 prefers y by 8 "utils" over x, and should we then conclude y is collectively preferred to x? Who is to decide which cardinal indicator of utility is "correct", and must all individuals have the same cardinal indicator? Violating axiom I means we must answer these difficult questions.
46 The proof here follows the outline in Vickrey (1960). See Sen (1970a) or Fishburn (1973) for a review of the literature on the proof of the Arrow theorem.
Theorem. The five axioms - the Pareto principle (P), non-dictatorship (ND), unrestricted domain (U), rationality (R), and independence of irrelevant alternatives (I) - cannot be simultaneously satisfied by a collective choice process.
(1) Definition: A set of individuals, denoted D, is called decisive over alternatives x and y if the collective choice process yields a collective preference for x over y whenever all members of D prefer x to y and everyone else prefers y to x.
(2) Assume a set of individuals D is decisive for x over y.
(3) Assume all members of D prefer xPDyPDz and all others not in D (denoted N) prefer yPNzPNx (use of axiom U).
(4) The collective choice process yields xPy (definition of decisive).
(5) The collective choice process yields yPz (use of axiom P).
(6) The collective choice process therefore yields xPz (use of axiom R).
(7) Only members of D prefer x to z. Furthermore, society's preference of x over z must hold no matter the ranking of y (use of axiom I). Therefore the set of individuals D is decisive for x over z (definition of decisive). By repeating (2)-(6), D can be shown to be decisive over all pairs of alternatives.
(8) D must contain two or more individuals (use of axiom ND).
(9) Divide D into two non-empty sets, A and B, and assume for all members of A, xPAyPAz, and for all members of B, zPBxPBy. The set of citizens not in D, set N, continues to prefer yPNzPNx (use of axiom U).
(10) There are three possible outcomes for society's choice between z and y: either zPy, or yPz, or z and y are indifferent (denoted zIy) (use of axiom R).
(11) If zPy, then as members of set N and of set A prefer y to z and only members of set B prefer z to y, set B must be decisive (definition of decisive).
(12) If yPz or yIz, then as xPy [step (4) above] the social choice process gives xPz by transitivity (use of axiom R).
(13) However, only members of set A prefer x to z. Thus set A must be a decisive set (definition of decisive).
(14) The original decisive set D can therefore be reduced to a smaller decisive set, either set B [step (11)] or set A [step (13)].
(15) Repeat steps (9)-(14) for the new decisive set. Repeat the argument again until the decisive set consists of only one member and cannot be subdivided. A decisive set of one member is a dictatorship. Thus, axiom ND is violated.
Figure 3.4. The Arrow Possibility Theorem.
dictator. The consequence is that the four Arrow axioms - P, U, R, and I - are inconsistent with the fifth, ND.

3.2.2. Can we escape from Arrow's Theorem?

The Arrow Possibility Theorem is true; there is no escaping that fact. The five axioms above cannot all be satisfied simultaneously by a collective choice process. There is only one way out, and that is to discard one or more of Arrow's original requirements. Perhaps one of the axioms asks too much. On closer examination we might wish to relax one or more of the requirements. Perhaps then we can prove there does exist a collective choice process which satisfies this new, less demanding, set of axioms. This is the strategy which has occupied social choice theorists for the past twenty years. The results are generally discouraging.47
47 For a more detailed review, see Sen (1983).
(i) Relaxing Pareto optimality. This axiom hardly seems like one we would want to sacrifice, but for completeness we should explore the implications if we do relax it. Wilson (1972) reconsidered Arrow's theorem without enforcing the Pareto principle and shows that either the resulting collective choice mechanism is always indifferent between alternatives regardless of individual rankings, or the mechanism produces a "negative" dictator, someone whose preferences over alternatives are reversed to form the social ordering. Neither alternative is very appealing. Since the Pareto principle is such a crucial component of our comparison of markets and governments, and since virtually nothing is gained by dropping it, it seems wise to look elsewhere for a way around Arrow's challenge.
(ii) Relaxing non-dictatorship. The theorem itself shows that if we will tolerate a dictator we can satisfy the remaining four requirements. There are problems for the dictator, however. As Gibbard and Satterthwaite have shown, there is nothing in the remaining four requirements stated above - P, U, R, and I - that guarantees the other members of society will truthfully reveal their preferences to the dictator. An "optimal" allocation based on false preferences is a hollow victory. Yet obtaining true information cooperatively is surely a central problem for most dictatorships, even benevolent ones seeking to maximize citizen welfare. How a dictator might overcome the problem of preference revelation is examined in detail in Section 3.3 below. For now we retain the axiom of non-dictatorship.
(iii) Relaxing unrestricted domain. The axiom of unrestricted domain requires that all possible combinations of preference orderings be allowed for consideration by the collective choice process. To do otherwise is to deny those individuals who happen to have excluded preferences a say in the collective decision. Either we ignore them, which is discriminatory, or we say that no decision can be made when those combinations of orderings are expressed. Neither discrimination nor indecision is attractive. There is one happy circumstance in which all our problems disappear, however. This is a case which we will consider in detail in Section 3.4 below and which involves the use of pair-wise voting by majority rule (satisfying ND and I) to search for efficient budgetary allocations (satisfying P) by comparison of alternatives which yields orderings which are both complete and transitive (satisfying R). Only axiom U is violated, through the imposition of a preference restriction on voters. Acceptable orderings are called "value restricted" and satisfy an axiom due to Sen (1966) in which voter preferences over every combination of three alternatives have the property that all agree some alternative is never worst, or some alternative is never best, or some alternative is never medium. Sen's condition of value-restricted preferences sets a minimum degree of voter compatibility to ensure the majority rule process is transitive. A prominent example of preferences which satisfy Sen's condition are preferences which, when considering alternatives that can be ordered along one dimension (e.g. the size of the budget), yield orderings which are "single-peaked". (See Figure 3.9 below.) We shall consider this case in detail in Section 3.4. When
voting, say, on public spending for a single public good, a majority rule process has the ability to find Pareto-optimal allocations. As we shall see in Section 3.4, however, this result is of only limited usefulness. The plausible restrictions on preferences which gave unique majority rule outcomes in one dimension are no longer sufficient for unique majority outcomes when policy assumes two or more dimensions - for example, taxes and spending. What is sufficient when policy alternatives have more than one dimension are limitations on preferences which are so restrictive as to make their realization extremely unlikely. (See Section 3.4 below.) Again what originally seemed a promising escape route has proven to be a dead end. If there is to be a generally satisfying way out of Arrow's paradox it will have to be in relaxing one of the last two axioms.
(iv) Relaxing rationality. Perhaps the least defensible of Arrow's requirements is transitivity, the cornerstone of the rationality axiom. Arrow had sought a collective choice mechanism which would provide a complete ordering of all social alternatives. A complete ordering requires completeness and both P-transitivity (denoted xPy & yPz → xPz) and I-transitivity (denoted xIy & yIz → xIz).
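The single-peaked case invoked under axiom U above can be checked mechanically. A small sketch (the three budget levels L < M < H and the voter profile are hypothetical illustrations, not from the chapter) confirming that with single-peaked preferences the majority relation is transitive and a Condorcet winner - the median voter's peak - exists:

```python
# Single-peaked preferences over a one-dimensional budget (L < M < H):
# each voter's ranking falls off on either side of a peak, so the profile
# satisfies Sen's value-restriction condition.
voters = [['L', 'M', 'H'],   # peak at L
          ['M', 'L', 'H'],   # peak at M (the median voter here)
          ['H', 'M', 'L']]   # peak at H

def majority_prefers(a, b):
    """True if a strict majority of voters ranks a above b."""
    return sum(v.index(a) < v.index(b) for v in voters) > len(voters) / 2

# The majority relation is a strict transitive ordering: M beats L and H, L beats H.
assert majority_prefers('M', 'L') and majority_prefers('M', 'H')
assert majority_prefers('L', 'H')

# Equivalently, a Condorcet winner exists: the median voter's peak.
condorcet = [x for x in 'LMH'
             if all(majority_prefers(x, y) for y in 'LMH' if y != x)]
print(condorcet)  # ['M']
```

Contrast this with the cycling profile shown earlier: restricting the domain of admissible orderings is exactly what restores transitivity.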
While knowledge of a complete ordering of alternatives has its advantages for collective decision-making, we do not often need all the information such orderings provide. In most instances, we seek only a best alternative from a set of options, not a complete ranking of all the options. (Who cares what policy was tenth best?) It is important to ask what might happen to Arrow's Possibility Theorem if we no longer require our collective choice mechanism to provide a complete ordering of all alternatives but rather only require that it find a "best" alternative. Relaxing the Arrow requirement that the collective choice mechanism be both P- and I-transitive drops the need for the mechanism to give a complete ordering, but retains the possibility that the mechanism can find a best alternative. Such mechanisms are often called "social decision functions" or "social choice functions" (as distinct from Arrow's "social welfare" or "social ordering" functions). What happens, then, if we drop I-transitivity and require only P-transitivity? Plott (1976) has shown that one of Arrow's original arguments for requiring both P- and I-transitivity in fact needs only P-transitivity. Arrow wanted the collective choice mechanism to produce preferred outcomes which were independent of the "path" or agenda of alternatives used to decide the outcomes. Such path independence can be obtained when P- and I-transitivity hold, but only P-transitivity is really necessary. What happens if we permit intransitive indifference and require only strict preferences to be transitive? Gibbard (1969) provided the answer: we will have collective decisions by oligarchies. Gibbard proved that every collective choice process satisfying P-transitivity and the other Arrow axioms of P, U, and I will generate a select group of individuals with the following powers. Any time members within the group agree over a pair of alternatives (say, xPiy for all i in the group), then the society chooses the group's
preferred alternative (xPy) no matter what others prefer. Furthermore, when the members of the group disagree, each member still has the power to veto any alternative he dislikes (if xPiy for one i in the group, then yPx cannot occur). No collective decision will be made in such instances. We can conclude that relaxing Arrow's original rationality axiom by asking for just P-transitivity, or equivalently path independence, buys us very little. We go from a dictatorship by one to either a dictatorship by committee or, if the committee is large and cannot agree, to instances where no collective choice is made. Blair and Pollak (1981, 1983) push the relaxation of rationality to its limit. They ask the collective choice process to satisfy what might be viewed as a minimal rationality restriction: acyclicity. Acyclicity is weaker still than P-transitivity; their hope is to escape rule by oligarchies. Acyclicity requires only that the social choice process produce a clear winner which is preferred to all others in pairwise comparisons. Whether the losing alternatives satisfy P- or I-transitivity is not crucial. We are looking only for a winning alternative. The results are discouraging. The Blair-Pollak theorem shows that collective choice rules which satisfy the usual Arrow axioms P, U, and I plus the minimal rationality requirement of acyclicity will always produce an individual who has veto power (i.e. can ensure no choice is made except his choice) over a possibly sizable fraction of all pairwise comparisons. When the number of alternatives (a) is at least four, and the number of alternatives exceeds the number of voters (a > n), then the fraction of all pairwise comparisons which can be vetoed by at least one individual is (a - n + 1)/a. As a increases, this fraction approaches 1.
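The Blair-Pollak bound is easy to compute. A trivial sketch (the function name is ours) of the fraction (a - n + 1)/a of pairwise comparisons subject to individual veto:

```python
# Blair-Pollak bound: with a alternatives and n voters (a >= 4, a > n),
# at least one individual can veto the fraction (a - n + 1)/a of all
# pairwise comparisons; the fraction tends to 1 as a grows.
def veto_fraction(a, n):
    assert a >= 4 and a > n
    return (a - n + 1) / a

print(veto_fraction(6, 3))     # 4/6: two-thirds of comparisons vetoable
print(veto_fraction(100, 3))   # 0.98: veto power becomes nearly universal
```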
The force of individual veto power is either to give one individual unilateral power to block an alternative from adoption or, when two individuals who disagree have veto power over the same pair of alternatives, to force social indecision in the ranking of those alternatives. Unfortunately, the Blair-Pollak theorem is likely to apply to most budget allocation problems when we ask the collective choice rule to set the level and the distribution of many public goods and private income (via taxation). In the very simple case of three voters and three distinct bundles of a public service, there are six different distributions of services, each corresponding to a social alternative.48 With a = 6 and n = 3, one individual voter can veto at least two-thirds (= (6 - 3 + 1)/6) of all pairwise comparisons. Expanding the list of alternatives to include several public goods and private income only increases the reach of individual veto power. Of course, we could limit the number of alternatives to be less than the number of voters - Ferejohn and Grether (1974) have shown acyclic collective choice rules can avoid the veto result in this case - but then who is to decide which alternatives are to fall within
48 Suppose the three levels of the public good are low (L), middle (M) and high (H), and the three amounts can be allocated across the three voters. The six alternatives, ordered by person 1's allocation, then 2's allocation, and then 3's allocation, are: LMH, LHM, HML, HLM, MLH, and MHL.
                           Borda points for alternatives:
                                 w    x    y    z
        Person 1: wPxPyPz        4    3    2    1
        Person 2: yPzPwPx        2    1    4    3
        Person 3: xPyPzPw        1    4    3    2
        Total points             7    8    9    6
        Borda count winner: option y

Figure 3.5. Borda count voting.
the restricted set? Such agenda-setting power carries a veto or dictatorial implication of its own. We must conclude that the strategy of relaxing rationality, even down to the very weak condition of acyclicity, does not provide a very satisfying escape from the dilemma posed by Arrow's theorem.
(v) Relaxing independence of irrelevant alternatives. Relaxing axiom I therefore seems to be our last chance for finding a compelling way out of Arrow's paradox. Indeed, when the theorem was first published, axiom I received the most criticism. The central complaint was that the axiom ruled out the introduction of potentially useful information on individual preference intensities. In fact, many commonly used voting schemes, such as point-voting in which outcomes are chosen by the sum of scores from a ranking of alternatives, will violate axiom I.49 If we can drop axiom I, the whole problem, as Samuelson (1967, p. 41) says, "vanishes into thin air". The remaining four axioms - P, U, ND, and R - can be shown to be consistent. A typical collective choice mechanism used to illustrate this point is the so-called Borda count, named after its eighteenth-century inventor Jean-Charles de Borda. In the case of a alternatives and n voters, each voter assigns a points to his first-place alternative, a - 1 to his second-place alternative, down to 1 point for his last-place alternative. The Borda score for each alternative is the sum of points given to it by the n voters. The alternative with the highest score is the Borda winner. Figure 3.5 illustrates the use of the Borda count for 3 voters and 4 alternatives; alternative y is preferred. The Borda count is non-dictatorial; for example, person 2, who prefers alternative y, is not decisive in all pairwise comparisons. The Borda count also satisfies the axiom of unrestricted domain, works for all preference orderings, and satisfies Arrow's rationality axiom (i.e. is complete and is P- and I-transitive).
Finally, the Borda count can find Pareto-efficient allocations; a switch from a selected alternative (such as y) lowers the welfare of at least one person (such as person 2). The procedure does not satisfy axiom I, however. Axiom I requires that the ranking of any two
49 For a very interesting application of voting methods which violate axiom I, see the study by Dyer and Miles (1976) of the decision to determine the trajectory of the Mariner space probes.
alternatives by the collective choice mechanism be independent of the ranking of any third alternative. In Figure 3.5, option y is preferred to option x. If all individuals retain their original preferences between x and y, but person 1 changes his ordering from w > x > y > z to x > w > z > y (1 still prefers x to y), then under the Borda count alternative x will receive 9 points, y will receive 8 points, z will receive 7, and w will receive 6. Now the collective choice mechanism prefers x to y, only because the rankings of some other options (w and z) have been changed. Clearly, the rankings with the Borda procedure are not independent of irrelevant alternatives. Furthermore, the Borda count is also sensitive to which options are actually included in the decision set. If we simply remove the worst option (z) in Figure 3.5 from consideration, then suddenly all remaining options (w, x, y) are viewed as equally valued by the Borda procedure; all alternatives receive a score of 6 (where 3 points are awarded to the first choice, 2 points to the second choice, and 1 point to the last choice). How individuals report their rankings for all considered options, and which options are actually considered, can have a significant effect on the final outcome when axiom I is violated. Procedures which violate axiom I must gather preference information on all alternatives to rank any two and, even more importantly, they are open to manipulation by the choice of the agenda and the stated preference orderings of the voters. Borda himself appreciated these problems and is said to have commented that "my scheme is intended only for honest men".50 Dropping axiom I, therefore, does not appear to be a satisfactory way out of the dilemma of Arrow's theorem. Again we face a hard choice between decision-making efficiency and democracy. Processes which violate axiom I are likely to be expensive to run, as all preference information must be considered. But matters are really worse than that.
As we relaxed axiom I, a new problem emerged, one which significantly complicates our search for an attractive collective choice mechanism. It is the problem of preference manipulation or strategic voting. The central issue is well illustrated by our Borda count example. When all voters voted truthfully, the outcome (in Figure 3.5) was option y. Person 1, however, ranks y third and would prefer either w or x to y. Suppose voter 1 misrepresents his preferences and reports not his true preferences (w > x > y > z) but his "strategic" preferences (x > w > z > y). Using a Borda count, the chosen outcome will become option x (9 points) while option y (8 points) falls to second place. Voter 1 can strategically manipulate his stated preferences to achieve a preferred allocation. Similar strategies exist for the other voters as well. Suddenly, our search for an efficient and democratic voting process has been complicated by a new order of difficulty - what do we do about voters who lie?

50 Quoted in Black (1958, p. 182).
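Both failures of axiom I can be reproduced in a few lines of code. The three-voter profile below is an illustrative reconstruction consistent with the point totals quoted above (Figure 3.5 itself is not reproduced here), and the helper name `borda` is hypothetical:

```python
def borda(profile, options):
    """Borda count: with k options, a voter's first choice earns k points,
    the second earns k-1, and so on down to 1 for the last choice."""
    k = len(options)
    scores = {o: 0 for o in options}
    for ranking in profile:
        for place, option in enumerate(ranking):
            scores[option] += k - place
    return scores

# Illustrative three-voter profile consistent with the scores in the text.
truthful = [list("wxyz"),   # voter 1's true ordering: w > x > y > z
            list("xyzw"),
            list("yzwx")]
print(borda(truthful, "wxyz"))   # y wins: {'w': 7, 'x': 8, 'y': 9, 'z': 6}

# Voter 1 misreports x > w > z > y (still ranking x above y): x now wins.
print(borda([list("xwzy")] + truthful[1:], "wxyz"))   # x 9, y 8, z 7, w 6

# Dropping the "irrelevant" worst option z: every remaining option ties at 6.
no_z = [[o for o in r if o != "z"] for r in truthful]
print(borda(no_z, "wxy"))   # {'w': 6, 'x': 6, 'y': 6}
```

The same profile thus exhibits both the dependence on irrelevant alternatives and the payoff to voter 1's strategic misreport.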
Ch. 12: The "New" Political Economy
Arrow's answer was to assume voters did not lie [Arrow (1963, p. 7)], and almost all of the subsequent discussion of the Arrow theorem followed Arrow's lead. The manipulation of voter preferences was simply not an issue in the early evaluation and analysis of Arrow's theorem. It remained for Farquharson (1969) to bring strategic voting to our attention. Gibbard (1973) and Satterthwaite (1975) proved how all-pervasive the problem of preference manipulation really is in voting processes. The Gibbard-Satterthwaite theorem proved that every voting scheme violates at least one of three conditions: (1) the scheme is non-dictatorial; (2) truth (non-manipulation) is a dominant (always best) strategy for voters; or (3) the scheme considers three or more alternative policies. Simply put, democratic processes considering three or more options are always open to manipulation. There are collective choice processes which are immune to the strategic manipulation of preferences (called "strategy-proof" processes), but such processes must satisfy yet another axiom of decision-making - the axiom of positive association.51 The axiom of positive association (denoted PA) says that if a collective choice process initially prefers an alternative x, and if some citizens then change their preference orderings so that x moves up relative to other alternatives, then x must remain the chosen alternative. When PA is joined with the Arrow axioms of rationality (R) and independence of irrelevant alternatives (I), then the collective choice process will be immune to manipulation by voters.52 Combining the results of the Arrow theorem with those of the Gibbard-Satterthwaite theorem yields the following, rather discouraging, conclusion: there is no collective choice process which is democratic (axiom ND), decision-making efficient (axioms R, I, U), allocatively efficient (axiom P), and immune to strategic manipulation (axioms PA, R, I).
The following axioms of collective choice - ND, R, I, U, P and PA - cannot be satisfied simultaneously.

3.2.3. Conclusion
We have now exhausted all avenues of escape from the grip of Arrow's Possibility Theorem. Dropping any one of Arrow's requirements allows us to protect the others, but only at a price. We must sacrifice either efficient decision-making (including the truthful revelation of preferences), or allocative efficiency, or democracy. We stand, therefore, at a crossroads. The force of Arrow's theorem

51 It is interesting to note that Arrow included the axiom of positive association within his original set of axioms for describing an attractive social welfare function. Since he and subsequent analysts also assumed truthful voting, however, positive association was superfluous to establishing a conflict among the remaining axioms (U, I, P, R, and ND). See Sen (1970a, theorem 3.1).
52 See Blin and Satterthwaite (1978, theorem 2). Blin and Satterthwaite show the tight logical connection of the Arrow theorem and the Gibbard-Satterthwaite theorem, deriving one from the other.
and the Gibbard-Satterthwaite theorem has been to help clarify what lies down each path we might choose. Figure 3.6 is the roadmap. If we choose the democratic pathways which protect axiom ND (routes 1 and 2), we must realize that our collective choice mechanism will either be allocatively inefficient if we give up axiom P (route 1) or indecisive, informationally expensive, and manipulable if we give up axioms PA, R, U, or I (route 2). If we choose the third path of allocative and decision-making efficiency (retain P, PA, U, I, and R), then our collective choice mechanism is necessarily dictatorial. Sadly, there is no collective choice process which is democratic, decision-making efficient, and always capable of finding Pareto-optimal allocations. As markets are imperfect, so too are governments. Which path shall we follow? Our central concern here is to compare markets to governments as mechanisms for resource allocation. Since markets have failed us on the dimension of allocative efficiency it seems most fruitful to explore those avenues where government can potentially succeed against at least this criterion. If governments can allocate economic resources efficiently when markets cannot, then perhaps the price we pay in lost democracy (route 3) or in decision-making inefficiency (route 2) is low enough to justify a coercive collective choice process for that allocation. The theorems of Arrow, Gibbard, and Satterthwaite tell us that down either route 2 or route 3 there lie collective choice processes which satisfy the required "signpost" axioms. Our next task is to find such processes and to examine their relative performance in finding allocatively efficient outcomes. We begin the search by first looking for dictatorial processes which satisfy allocative and decision-making efficiency. We then turn to an examination of inefficient, but democratic, decision-making processes.

3.3. Dictatorial processes: The dictator as a benevolent planner
A variety of dictatorial regimes are possible, but only one - the benevolent regime in which the dictator seeks to maximize some non-zero index of citizen wellbeing - seems worth discussing in detail. Repressive regimes may be allocatively efficient in the sense that the repressed cannot be made better off without making the dictator worse off, and they may be decision-making efficient since the dictator need know only his own preferences to make collective choices (thus escaping Arrow's dilemma), but such societies are likely to sacrifice so much in other basic human values as to be unacceptable (see the discussion in Section 5 below). A benevolent dictator, on the other hand, respects citizen preferences and makes a serious effort to allocate societal resources so as to maximize some weighted index of those preferences. The Lange-Lerner socialist planner is such a benevolent dictator; see, for example, the survey in Heal (1973). Unfortunately, the same fundamental problem which plagues markets or democratic decision-
[Figure 3.6 diagrams three routes out of the Arrow Possibility Theorem and the Gibbard-Satterthwaite theorem: route (1) satisfies ND, U, I, R, and PA but loses P (allocative inefficiency); route (2) satisfies ND and P but loses one of U, I, R, or PA (decision-making inefficiency); route (3) satisfies P, PA, U, I, and R but loses ND (dictatorial choice).]

P = Pareto Optimality; ND = Non-dictatorship; U = Unrestricted Domain; R = Rationality (completeness, P- and I-transitivity); I = Independence of Irrelevant Alternatives; PA = Positive Association.

Figure 3.6. The performance of alternative collective choice processes.
making processes - the incentive to conceal and deceive - plagues the benevolent planner as well. The dictator's mantle does not carry with it the power to read the minds of men. If the efficient level of a public good is to be provided, the planner must know the preferences (the MRSs) of the individual consumers of the good. If efficient taxes are to be used to pay for public services or to redistribute income, then the planner must know individual resource endowments and how individuals will behave in the face of taxation. What one gains in the use of a dictatorial planning process of collective choice is a well-defined social welfare function - that is, a clearly articulated set of objectives. What we do not have, yet, is the information needed to put those objectives into action. The task here is to see if we can design a planning mechanism for collecting the crucial information on tastes, endowments, or behavior.

3.3.1. The MDP process

Malinvaud (1971) and Drèze and de la Vallée Poussin (1971) were the first to suggest an allocation process which had attractive incentive properties for the participants. Called the MDP mechanism, it is a planning process run by the benevolent dictator (central government) to determine the allocatively efficient level of a public good. The incentives of the citizens in such allocation problems are to act as free riders and to understate their true willingness-to-pay for the public activity. The MDP process encourages the truthful revelation of preferences by ensuring that everyone who reveals their true willingness-to-pay cannot be hurt by such cooperative behavior. The worst that can happen is that they remain at their initial, pre-public good level of well-being. If everyone does cooperate, the MDP process converges to a Pareto-efficient allocation. How does the MDP process work?
In the simple two-person version, public goods are provided at a constant marginal (equal to average) cost which can be normalized to $1 per unit of the public activity (X). Players are asked their willingness-to-pay for an increase in X, which equals their marginal rate of substitution of income for X, denoted π, evaluated at the current level of public activity (which may be zero). In the simple case of two players, A and B, they each respond with answers, π_a and π_b, which may or may not be their true marginal rates of substitution (MRS). The dictator-planner increases the provision of X if stated total marginal benefits, π_a + π_b, exceed marginal cost, $1. The increase in X, ΔX, is given by the spending rule:

ΔX = δ(π_a + π_b − 1),

where δ is a positive constant. To pay for the increase in X, players A and B must contribute taxes, which they do according to the tax rules:

Δy_a = (1/2)(π_b − π_a − 1)ΔX  and  Δy_b = (1/2)(π_a − π_b − 1)ΔX,

where Δy_a and Δy_b are the taxes paid (Δy_a ≤ 0, Δy_b ≤ 0). When services are reduced because π_a + π_b < 1, then subsidies are paid to each party (Δy_a > 0, Δy_b > 0) from the saved expenditure due to reduced public output (ΔX < 0). It should also be noted that these tax rules raise just enough money to pay for the increased public output, or distribute back to the voters no more than the resources released when public output is reduced. The MDP process is therefore economically feasible and always has a balanced public budget.53

We must ask two questions about the MDP process. First, if agents reveal their true marginal rates of substitution will the process find a Pareto-efficient allocation? Second, what are the incentives to tell the truth? The answer to the first is easy. Since the process is always economically feasible at each step - neither wasting nor borrowing resources - and since it adjusts the public activity (X) until δ(π_a + π_b − 1) = ΔX = 0, we know that in equilibrium π_a + π_b = 1, which is the requirement for allocative efficiency (ΣMRS = MC) for public goods when all agents tell the truth.54 But will agents reveal their true marginal benefits at each step?

First, under the MDP process, agents are never made worse off by revealing their true marginal rates of substitution (π = MRS). Suppose party A tells the truth (π_a = MRS_a), while party B understates his or her true benefits (π_b < MRS_b) at each step of the MDP process. Then if ΔX > 0, π_a + π_b − 1 > 0, or π_b − π_a − 1 > −2π_a, or (1/2)(π_b − π_a − 1)ΔX > (−π_a)ΔX. The expression (1/2)(π_b − π_a − 1)ΔX is nothing more than the actual taxes paid (Δy_a < 0) by person A. The expression (−π_a)ΔX is person A's true willingness to pay for the public activity (−π_a ΔX < 0). Thus, when the public good is increased, person A pays less in taxes than his or her maximum willingness to pay; person A is better off. What if ΔX < 0, perhaps because player B lied and stated a very low π_b? Then π_a + π_b − 1 < 0, or by the same reasoning as before (noting ΔX < 0), (1/2)(π_b − π_a − 1)ΔX > (−π_a)ΔX. Now (1/2)(π_b − π_a − 1)ΔX is person A's compensation because of the cutback in X (Δy_a > 0), while (−π_a)ΔX is the true amount A would demand as compensation (−MRS_a ΔX) for the cutback. In this case, A is paid more than the amount required to keep him indifferent; again player A is better off. Finally, when π_a + π_b = 1, ΔX = 0, and both persons A and B remain at the status quo.
We conclude that with the MDP process, telling the truth will never hurt a player and it may make him better off. But is telling the truth always the best strategy to play in the MDP process? Sadly, no; there are advantages to lying. Consider the pay-off to player B in the above example where player A tells the truth and ΔX > 0. In this case, player B's

53 This is easily shown. Just add Δy_a + Δy_b and note that the total taxes raised just equal expenditures ($1 · ΔX) when ΔX > 0, while the total subsidy paid just equals released public resources ($1 · ΔX) when ΔX < 0.
54 Whether the process converges to an equilibrium is also an issue. Drèze and de la Vallée Poussin (1971, theorem 1) show that when both parties tell the truth convergence is assured.
taxes (Δy_b = (1/2)(π_a − π_b − 1)ΔX) are reduced as π_b is made smaller; indeed, B's taxes can become a subsidy if π_a is large enough and π_b is small enough (Δy_b = (1/2)(π_a − π_b − 1)ΔX > 0 if π_a > 1 and π_b ≈ 0). Why not lie all the time? Because lying can make you worse off too. Cheating is a two-edged sword in the MDP process. If π_b is too low and π_a + π_b < 1 so that ΔX < 0, we can show, following the argument in the paragraph above, that (1/2)(π_a − π_b − 1)ΔX > (−π_b)ΔX. The left-hand side of this expression is the actual compensation paid to B (Δy_b > 0). In contrast to the case where B tells the truth (so π_b = MRS_b), the right-hand side (−π_b ΔX) is not B's true required compensation; in fact, it may be much less than B's true required compensation (−π_b ΔX < −MRS_b ΔX, when π_b < MRS_b). The MDP process only promises that actual compensation will exceed stated compensation, not true compensation. If π_b is too low, it is possible for player B's actual compensation to fall short of true compensation when ΔX < 0; player B can be made worse off. We conclude, then, that the MDP process protects you from actually being harmed if you tell the truth, but does not guarantee that cooperation is always the best, or dominant, strategy. If you lie you can often do better than if you tell the truth, but at the risk (when ΔX < 0) of being made worse off relative to the status quo. In the MDP process, cooperative behavior is a minimax strategy; telling the truth minimizes your maximum loss. An extremely risk-averse agent will play cooperatively to avoid harm, but less risk-averse players may wish to take a chance on the cheating strategy. Thus, telling the truth is not a dominant strategy for citizens. There is no guarantee, therefore, that we will find the Pareto-optimal, cooperative solution we seek with the MDP process.55
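A short simulation illustrates the process under truthful reporting. The logarithmic utilities, endowments, and adjustment speed below are illustrative assumptions, not taken from the chapter; they imply MRS_i = theta_i/(1+X), so efficiency (MRS_a + MRS_b = 1) requires X* = theta_a + theta_b − 1:

```python
# Illustrative two-agent MDP simulation. Assumed utility u_i = theta_i*ln(1+X) + y_i,
# so agent i's true MRS is theta_i/(1+X) and the efficient level is
# X* = theta_a + theta_b - 1 (here, 4.0).
theta_a, theta_b = 3.0, 2.0
X, y_a, y_b = 0.0, 10.0, 10.0     # initial public good and incomes (assumed)
delta = 0.5                       # positive speed-of-adjustment constant

for _ in range(2000):
    pi_a = theta_a / (1 + X)      # truthful willingness-to-pay reports
    pi_b = theta_b / (1 + X)
    dX = delta * (pi_a + pi_b - 1)          # spending rule
    y_a += 0.5 * (pi_b - pi_a - 1) * dX     # tax (dX > 0) or subsidy (dX < 0)
    y_b += 0.5 * (pi_a - pi_b - 1) * dX
    X += dX

print(round(X, 2))                # -> 4.0, the efficient level
print(round(y_a + y_b + X, 2))    # -> 20.0: the public budget stays balanced
```

The second print confirms footnote 53's balanced-budget property: at every step the two tax payments sum to exactly −ΔX.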
3.3.2. The Vickrey-Groves-Clark mechanism

The demand revealing (DR) process first proposed by Vickrey (1961) and independently discovered by Groves (1973) and Clark (1971) is a planning mechanism which fully discourages cheating; telling the truth is a dominant strategy.56 In its basic form the DR process requires agents to submit a statement to the planner-dictator of their total benefits from the provision of all levels of the public activity, X. The stated total benefits from X for agent i are represented by the function P_i(X), which may or may not correspond to the

55 Roberts (1979) and Henry (1979) show that the central difficulty noncooperative behavior creates for the MDP process is not in finding an efficient allocation but in the speed of convergence to that equilibrium. Lying increases the time before convergence by a factor of N − 1, where N is the number of agents in the economy [Roberts (1979, p. 287)]. In economies with many agents who have finite lives, lying will effectively defeat the MDP search for a Pareto-efficient outcome. For an excellent overview of the MDP literature, see Tulkens (1978).
56 Green and Laffont (1977b) show all truth-dominant planning mechanisms must be a variant of Groves' mechanism.
individual's true benefit function from the provision of X.57 The planner then collects all the submitted benefit schedules, aggregates them (Σ_{i=1}^N P_i(X)), subtracts the social cost of providing X (Σ_{i=1}^N P_i(X) − C(X)), and finally provides that level of X which maximizes the difference between stated social benefits and total social costs. The chosen X (= X*) is defined by an expression which looks a good deal like Samuelson's original condition for the optimal provision of public goods, Σ_{i=1}^N π_i(X*) = MC(X*), except π_i(X) (= dP_i/dX) may not equal agent i's true marginal benefit schedule [denoted, as before, as MRS_i(X)]. Agent i may have lied. To prevent cheating the DR process makes clever use of the tax system needed to pay for the public activity. While the DR tax structure seems complicated, it has a compelling intuitive interpretation. Each agent i pays a tax (T_i) equal to
T_i = C(X*)/N + S_i − Σ_{j≠i} [P_j(X*) − C(X*)/N],

where X* is the planner's chosen level of the public activity, N is the number of citizens, C(X*) is the total cost of the public activity, P_j(X*) is the total benefit submitted by agent j, and S_i is defined by

S_i = Σ_{j≠i} [P_j(X') − C(X')/N],

where X' is that X which maximizes the expression Σ_{j≠i} [P_j(X) − C(X)/N]. Note that X' is the chosen level of X when agent i fails to submit his or her total benefit schedule. Thus, S_i is the social surplus earned by all citizens other than i when i does not submit a total benefit schedule. Finally, the expression Σ_{j≠i} [P_j(X*) − C(X*)/N] is the social surplus earned by all citizens other than i when i does submit a total benefit schedule. Thus, T_i consists of two components: (1) i's equal share of the total cost of the public activity (= C(X*)/N), and (2) the difference between the social benefits earned by everyone but i when i does not

57 The informational requirements of the DR process are to be contrasted with those of the Wicksell-Lindahl process and the MDP process. In the Wicksell-Lindahl process we needed just the preferred level of spending given tax shares, and in the MDP process just the marginal benefit from a given level of X. Here we need the total benefit schedule, that is, a function which relates total benefits to all possible levels of X. Yet the DR process asks for information from the agents only once. The Wicksell-Lindahl and MDP schemes are iterative processes which require new information at each step. The question of which scheme manages the required information more efficiently is an important one, but one which this literature in particular, and economists generally, have not considered with much success. Is it better to ask for a lot of information once, as in DR processes (yes, if collection is cheap and transmission expensive), or a little information often, as in the MDP scheme (yes, if collection is difficult and transmission is cheap)? Such considerations need to be made much more precise.
submit a benefit schedule for consideration by the planner and the social benefits earned by everyone but i when i does submit a benefit schedule. This second term measures the "externality" which i's statement about benefits imposes on the rest of society, and the DR tax scheme taxes this externality accordingly. How does this tax scheme ensure the truthful revelation of benefits? Agent i knows the planner-dictator will choose the level of the public activity to maximize net social benefit, defined as

Σ_{i=1}^N P_i(X) − C(X),

which can also be written as

[P_i(X) − C(X)/N] + Σ_{j≠i} [P_j(X) − C(X)/N].
Agent i will manipulate his message, P_i(X), so that the planner provides i's preferred level of X. That level is defined by maximizing i's own net benefits given i's true benefit, U_i(X), less i's tax payment:

U_i(X) − T_i = U_i(X) − C(X*)/N − S_i + Σ_{j≠i} [P_j(X*) − C(X*)/N],

which can be rewritten as

[U_i(X) − C(X)/N] + Σ_{j≠i} [P_j(X) − C(X)/N] − S_i.
This specification of the individual's objective function corresponds to the rewritten version of the planner's objective function given above except that (i) P_i(X) may not equal U_i(X), (ii) X need not equal X*, and (iii) the addition of the term S_i to the individual's maximand. But note that from the individual's point of view S_i is fixed - it depends only on the messages of the other agents - and is therefore irrelevant to agent i's decision about what message to send to the planner. What is relevant is the possible difference between U_i(X) and P_i(X) and the subsequently chosen level of X by the planner. How can an individual manipulate the planner to do his or her bidding? That is to say, what message should person i send to the planner so that the planner selects that level of X which maximizes the individual's net benefits? The answer is P_i(X) = U_i(X);
person i should always tell the truth no matter what the other members of society do. When you tell the truth, the planner's objective becomes identical to the individual's net benefit function: U_i(X) replaces P_i(X), and the planner's chosen level X* is then just that X preferred by agent i. It is the tax scheme of the DR process - specifically the "externality tax" on i's message - which ensures that cooperative behavior will be the dominant strategy. With all agents revealing their true total benefit schedules from the public activity the planner-dictator can achieve a cooperative, Pareto-efficient outcome. The DR process therefore seems to have solved our problem of how to structure government decision-making, at least for dictatorial processes. Unfortunately, there are operational problems, and they are serious.58 First, a very specific form of the true benefit function is required for the DR process to ensure that truthfulness is the dominant strategy; the benefit function must depend on the level of X only - that is, U_i(X). This is a very strong requirement. It says, in effect, that variations in how we value X cannot depend on how much of the private good we own. Wealth or income cannot determine our demand for public activities; the evidence is against such an assumption.59 Second, the DR process is not immune to manipulation by coalitions. Bennett and Conn (1976) show that it is not optimal for coalitions of agents to reveal their joint valuations for the public activity. If such coalitions form, the DR process cannot guarantee a cooperative outcome. Third, the tax scheme may take all the private income of some members of society. More generally, there is nothing in the DR mechanism which guarantees that all members of society will in fact be made better off in the new efficient outcome.
The level of X is efficient, but the tax scheme may extract so much from some members of society that they are worse off than they were in the original, non-cooperative allocation. When this threat of bankruptcy is significant, we can expect possibly large enforcement costs to running government through a DR process. Finally, and perhaps most importantly, the DR mechanism runs a budget surplus which must be wasted. Tax revenues from the DR process exceed the cost of providing the public activity. The DR tax raises revenue to cover production costs and also imposes a tax on the "externality" effect of each agent's message to the dictator-planner. It is this "externality" tax

58 A much more detailed discussion of these issues is in Chapter 10 by Laffont in this Handbook.
59 The intuition behind this requirement is straightforward. If the level of our private goods determines our benefit from X, and the level of private goods depends on taxation, and in the DR process taxation depends on the messages P_j(X) of others in society, we must then conclude that our utility from X, and hence our message P_i(X) to the planner, now depends upon the messages of all others. We no longer have the simple correspondence of the agent's message to the planner's objective function which made truthfulness the dominant strategy. A review of the evidence on the effects of income or wealth on the demand for public services is found in Inman (1979).
on messages which raises the surplus.60 Dropping the externality tax or returning these revenues to citizens will undo just what the tax has created: the incentive to truthfully reveal the benefits of the public activity.61 In effect, the resulting budget surplus is the decision-making cost of a government which is capable of ensuring that we will discover the allocatively efficient level of public activities.

3.3.3. Extensions
Is there any planning mechanism which cannot be manipulated by the citizenry, does not generate a budget surplus, and thus always finds the allocatively efficient level of a public good? That is, can we find a dictator-planner mechanism which dominates the Groves mechanism? Green and Laffont (1977b) and Walker (1980) provide an answer for a very general class of economic environments, and that answer is no. Walker's important analysis parallels and extends the Gibbard-Satterthwaite conclusions for manipulable voting systems to the more demanding setting of dictator-planner processes. If we have any chance of finding an efficient government it must surely be amongst dictatorial processes. Walker shows that even here we will necessarily fall short. Either the mechanism is decision-making efficient (i.e. no budget surplus) but open to strategic manipulation and thus unable to ensure the cooperative, allocatively efficient allocation, or (as with the Groves DR mechanism) it is capable of ensuring cooperative behavior but at the price of a wasted budget surplus. We can go either way.

60 The sum of tax revenues is given by

Σ_{i=1}^N T_i = Σ_{i=1}^N C(X*)/N + Σ_{i=1}^N { S_i − Σ_{j≠i} [P_j(X*) − C(X*)/N] }.

The term Σ_{i=1}^N C(X*)/N = C(X*) is the production cost of the public activity. If the second term on the right-hand side is positive, then total tax revenue is greater than production costs, and a surplus is earned. The second term is positive if

S_i ≥ Σ_{j≠i} [P_j(X*) − C(X*)/N].

But remember S_i is evaluated at X', which is the value of X that maximizes Σ_{j≠i} [P_j(X) − C(X)/N]. Therefore S_i is at least as great as Σ_{j≠i} [P_j(X*) − C(X*)/N], and may be larger. When S_i is larger, the expression in curly brackets - called here the "externality tax" on messages - will be positive, and a surplus above production costs results.
61 Since tax revenues depend on all agents' messages, including each agent's own message, returning any surplus revenues suddenly makes each agent's tax net of returned surplus equal to some function T_i = f(X, P_1, ..., P_N). But now an added term appears in the agent's objective function which breaks down the correspondence of the societal and individual maximands. Telling the truth will no longer be a dominant individual strategy.
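The two Groves properties just described - truthful reporting as a dominant strategy and a wasted budget surplus - can be checked numerically. The benefit and cost functions below are illustrative assumptions, not taken from the chapter; the tax is the "externality tax" formula given above, applied on a discrete grid of public good levels:

```python
import numpy as np

def dr_outcome(reported, cost):
    """Groves demand-revealing mechanism on a discrete grid of levels X.
    `reported` is a list of arrays P_i(X); `cost` is the array C(X)."""
    N = len(reported)
    x_hat = int(np.argmax(sum(reported) - cost))    # maximize stated net surplus
    taxes = []
    for i in range(N):
        others = sum(p for j, p in enumerate(reported) if j != i)
        surplus_others = others - (N - 1) / N * cost
        S_i = surplus_others.max()                  # others' surplus when i abstains
        # T_i = C(X*)/N + S_i - sum_{j != i} [P_j(X*) - C(X*)/N]
        taxes.append(cost[x_hat] / N + S_i - surplus_others[x_hat])
    return x_hat, taxes

# Illustrative economy (assumed forms): U_i = theta_i*sqrt(X), C(X) = X.
X = np.arange(0, 201, dtype=float)
true_benefits = [t * np.sqrt(X) for t in (6.0, 4.0, 2.0)]

x_hat, taxes = dr_outcome(true_benefits, X)
print(x_hat)                       # 36, where the sum of MRS_i equals MC
print(sum(taxes) >= X[x_hat])      # True: revenues cover, and here exceed, C(X*)

# Misreporting never helps: agent 0's net benefit is maximized by truth,
# whatever scaling of the true schedule he tries.
def net_benefit(report0):
    x, t = dr_outcome([report0] + true_benefits[1:], X)
    return true_benefits[0][x] - t[0]

print(all(net_benefit(s * true_benefits[0]) <= net_benefit(true_benefits[0]) + 1e-9
          for s in (0.5, 0.8, 1.2, 2.0)))   # True
```

The surplus arises exactly as in footnote 60: each S_i is a maximum over all levels, so it weakly exceeds the others' surplus at X*, leaving total revenue at or above production cost.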
Groves and Ledyard (1977) advance a mechanism which has a balanced budget and is therefore decision-making efficient, but which is manipulable if citizens make allowances for how their messages to the dictator-planner might influence the messages given by others. If, however, everyone behaves as if their reported benefits or costs have no influence on the messages of others (called Cournot-Nash behavior), then Groves and Ledyard show that their mechanism will achieve the cooperative, Pareto-efficient allocation. Recent experimental work by Smith (1977) and Harstad and Marrese (1982) with a Groves-Ledyard process shows that the mechanism does succeed in finding a Pareto-efficient allocation in most cases. These experimental results are limited to the case of a few participants, however, and recent work by Muench and Walker (1983) shows that the Groves-Ledyard process may fail to achieve an optimal allocation when there are many individuals reporting benefits. Allocation with many individuals is surely an important, if not the prominent, case. However, d'Aspremont and Gerard-Varet (1979) have extended the Groves-Ledyard results by devising mechanisms where truthful reporting is not simply the preferred Nash strategy (preferred when each player assumes all others ignore his message) but the preferred strategy in a more behaviorally realistic structure in which citizens allow for the effects of their messages on others and do so conditional upon a common expectation as to the distribution of preferences (and hence message behavior) in society. Recently, Cremer and Riordan (1984) have moved this line of research to its most promising limit by devising a mechanism where only one citizen, not everyone, need have truthful behavior as the expectationally rational strategy, that is, rational conditional on expectations of others' preferences. For all other citizens truthful behavior is the dominant (always preferred) strategy.
Furthermore, both the d'Aspremont-Gerard-Varet and the Cremer-Riordan mechanisms are non-wasteful and thus achieve balanced public budgets. Unfortunately, the mechanisms are limited to the special case of separable, quasi-linear preferences - u_i(x) + y_i - where wealth or income (y_i) has no effect on public
good demands. Rob (1982) has pursued the second line of inquiry and has examined more carefully the size of the budget surplus likely to arise from use of the Groves mechanism. How large are the decision-making costs needed to guarantee a cooperative allocation? Rob's analysis shows that as the number of agents in the economy increases, the budget surplus per agent from the externality tax of the Groves mechanism tends to zero. In fact, the surplus per agent shrinks so fast that even the total budget surplus declines as the number of agents increases. What is not included in Rob's analysis, however, but what must be a central component of any analysis of decision-making costs, are the costs of sending and processing each message from the citizens to the planner. Transmission of messages will involve a fixed cost per agent. Thus, increasing the number of
agents may save on the budget surplus, but at the expense of an increase in transmission costs. A balance between the budget surplus and transmission costs can be achieved by selecting a cost-minimizing number of agents to sample from the national population. Green and Laffont (1977a) have explored the properties of such a sampling problem, and they conclude that for at least the Groves mechanism the optimal sample size (n*) is less than the full population (N), with N^(1/3) an upper bound to n*. When N equals one million potential beneficiaries, for example, the optimal sample size will be no larger than 100. We should emphasize, however, that such sampling procedures do not guarantee the true allocatively efficient outcome, for we have not collected information from all citizens. What we do have is the best allocation possible given the fact that collecting information is costly. Yet in a world with decision-making costs, that is exactly what we require.

3.3.4. Conclusion

We must conclude that in a world with sophisticated, rational agents, benevolent planner-dictator processes cannot guarantee economic efficiency. Processes which are capable of finding the preferred cooperative allocation are expensive to implement, while those processes which are relatively inexpensive to run are open to manipulation and thus may produce non-cooperative, allocatively inefficient outcomes. Planner-dictator processes suffer from a further defect: they are dictatorial and thus potentially unfair. In each of the broad classes of mechanisms considered - MDP processes and Groves processes - the planner retains significant control over the distribution of the final economic surpluses which the mechanisms provide. How these surpluses are to be distributed will have an important effect upon the final well-being of all citizens and thus upon one's sense of the society as a just society.
All the net benefits of cooperative activity may well be distributed to the dictator and his friends.62 Yet if the distribution becomes too unfair, members of society may well decide not to play the planning game at all. Enforcement costs with dictatorships can be high and rebellion is a real possibility. MDP processes are more attractive than Groves processes in this regard because the MDP process itself guarantees that you can never be harmed if you tell the truth; utilities of all cooperating participants are increased (or not reduced) at each step in the search for the cooperative allocation. The Groves mechanism, however, cannot offer that guarantee. In the extreme, the Groves "tax" may even drive some respondents into bankruptcy. Again we need to choose between those planning processes which are likely to have high decision-making costs (now the cost of enforcement) but can guarantee cooperative allocations (Groves mechanisms) and those processes with low decision-making costs but which cannot ensure a cooperative allocation (MDP processes).
62 Who, then, shall pick the dictator-planner and what institutions are to ensure his benevolence? Perhaps a democratic collective choice process can be found which is preferred?
Ch. 12: The "New" Political Economy
703
3.4. Democratic processes
Numerous democratic decision-making processes have been proposed for selecting cooperative allocations, and almost all fall within one of the three strategies outlined under path (2) in Figure 3.6. Figure 3.7 provides a more detailed description of the alternatives. Majority rule processes (MRP) will protect axiom I - independence of irrelevant alternatives - and sacrifice either the axiom of
[Figure 3.7 diagrams democratic choice processes satisfying ND, P, and PA, with three routes: (i) satisfy I, lose U - majority rule processes for single-peaked preferences; (ii) satisfy I, lose R - majority rule processes with structure-induced equilibrium; (iii) satisfy U and R, lose I - preference intensity processes. Examples include simple majority rule, agenda-setters, vote-trading, and logrolling with interest groups.]
Figure 3.7. Democratic choice processes.
universal domain (U, route 2, i) or the rationality axiom (R, route 2, ii). Alternatively, point voting, logrolling, and various vote-trading procedures - called "preference-intensity" processes (PIP) - will sacrifice axiom I (route 2, iii) but protect axioms R and U. The structure and performance of various majority rule and preference intensity processes are considered below for both one-dimensional and many-dimensional public policy decisions.

3.4.1. Majority rule processes

The intuitive case for a majority rule process (hereafter MRP) is its apparent administrative simplicity and the respect MRPs show for all individuals and their proposals. May (1952) was the first to formalize these intuitions. First, May required that the social decision process reach a decision for all possible preference orderings of alternatives. This axiom is Arrow's axiom of unrestricted domain (U). Second, May asked that the decision process consider only the actual orderings of alternatives and not who holds those orderings. If two voters, David and Larsen, swap ballots, then May requires the voting outcome to be unchanged. This requirement is called the axiom of anonymity (A). Third, May required that only the attributes of two alternatives be relevant when comparing those alternatives; if x beats y and everyone who prefers x over y also prefers w over z, then w must beat z. This requirement is called the axiom of neutrality (N). Neutrality requires that no option - for example, the status quo - play a special role in the voting process. Finally, May imposed the axiom of positive association (PA); if x is tied with, or beats, option y and if one person who was indifferent or preferred y now switches to prefer x, then x must become, or remain, the winner.
Together, the axioms require that the democratic choice process allow all individual preference profiles (U), be responsive to those preference orderings (PA), and ignore any information not related to those orderings and the comparisons at hand (A and N). What social choice processes meet these requirements? May proves that only MRPs will qualify.63 The axioms of May are arguably attractive properties to impose on a democratic choice process,64 and Sen (1970a) shows that they are closely related to
63 Rae (1969) and Taylor (1969) motivate the MRP from a different perspective. Rae and Taylor show that in situations of conflict, where issues are unidimensional and randomly or impartially selected, and where individuals gain an amount if the issue passes in their favor and lose an equal amount if the issue loses (an equal intensity assumption), then a voter who wishes to maximize his or her expected utility will prefer majority rule as the voting rule. This specification of the social choice problem has some appeal as a pre-constitution decision problem when voters will not know what issues will be considered or how they might react to the issues when they are considered. Voters select a voting rule behind a "veil of ignorance". See Buchanan and Tullock's (1962) discussion of constitutional design from this perspective.
64 But they are not the only axioms we might wish to impose on a voting system, and other axioms may give other voting systems. See the discussion of preference-intensity processes below, but particularly see Plott (1976, pp. 554-566).
Individuals:        1    2    3
Most preferred:     x    z    y
                    y    x    z
Least preferred:    z    y    x

MRP winner: xPy by 2 votes to 1 vote; yPz by 2 votes to 1 vote; zPx by 2 votes to 1 vote.
Figure 3.8. Voting cycles with MRP.
Arrow's requirements as well. An MRP which satisfies axioms U, A, N, and PA will also satisfy Arrow's axioms of U, P, ND, and I (the converse is not true). MRPs are not, however, rational (Arrow's axiom R). MRPs are prone to intransitive orderings of social allocations. Figure 3.8 illustrates the problem. In a pairwise comparison between alternatives x and y, x receives two votes, y one vote, and thus x beats y. In a comparison between y and z, y receives two votes, z one vote, and y beats z. If the majority rule process were rational in Arrow's sense, then xPy and yPz should imply xPz. In fact, when z opposes x, z receives two votes, x receives one vote, and z beats x. The MRP yields an intransitive social ordering of alternatives for the preference profile in Figure 3.8. When such intransitivities result, we have a voting cycle. No clear winner emerges in a sequence of pairwise comparisons and we simply "cycle" from one alternative to another. Voting cycles pose a fundamental problem for MRPs. Either MRPs cycle forever and never reach a decision or, more likely, a stopping rule is imposed which ends the cycle but which exposes the MRP to strategic manipulation. For example, suppose we allow each alternative to be considered only as long as it continues to win in pairwise comparisons. If the order of voting is first x vs. y and then the winner vs. z, it is clear that with the preference profile of Figure 3.8 alternative z is the chosen alternative. But what if the agenda is instead x vs. z with the winner vs. y? Now the winner will be y. Finally, if the agenda is z vs. y with the winner vs. x, then the chosen option will be x. When voting cycles occur, the voting rules and the agenda will determine the outcome. Whoever is chosen to set the agenda can determine the outcome.65 MRPs are not problem-free.
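The cycle and its agenda dependence can be checked mechanically. A minimal sketch (an illustration, not from the chapter) using the Figure 3.8 profile:

```python
from itertools import combinations

# Ballots from Figure 3.8, listed best to worst.
profile = [
    ("x", "y", "z"),  # individual 1
    ("z", "x", "y"),  # individual 2
    ("y", "z", "x"),  # individual 3
]

def pairwise_winner(a, b, profile):
    """Majority rule on the pair (a, b): a wins if most ballots rank a above b."""
    a_votes = sum(1 for ballot in profile if ballot.index(a) < ballot.index(b))
    return a if a_votes * 2 > len(profile) else b

# Every pairwise contest: x beats y, y beats z, yet z beats x - a cycle.
for a, b in combinations(("x", "y", "z"), 2):
    print(a, "vs", b, "->", pairwise_winner(a, b, profile))

# Agenda dependence: the first pair votes, then the winner meets the third option.
def run_agenda(first, second, third, profile):
    return pairwise_winner(pairwise_winner(first, second, profile), third, profile)

print(run_agenda("x", "y", "z", profile))  # z wins this agenda
print(run_agenda("x", "z", "y", profile))  # y wins this agenda
print(run_agenda("z", "y", "x", profile))  # x wins this agenda
```

Each of the three alternatives can be made the "winner" simply by reordering the agenda, which is exactly the agenda-setter's power described in the text.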
Because Arrow's axiom of rationality is necessarily violated (because U, P, I, and ND are met, R must go), MRPs are open to voting cycles and are either manipulable or indecisive. There are only two ways out of this problem. The first is to drop one of the other Arrow axioms and impose axiom R. The axiom generally dropped is U (see Figure 3.7). We are asking if there are certain preference configurations for which the MRP will
65 The example above assumes truthful voting, but this argument, which illustrates the unique powers of an agenda-setter, also holds if voters vote strategically; see Shepsle and Weingast (1981).
satisfy R. If so, we then need to examine how plausible these preference configurations are in fact. Perhaps intransitivity and voting cycles are only a theoretical difficulty and not a serious problem in real societies. The second strategy retains the original four axioms (U, A, N, PA or their Arrow counterparts U, P, I, ND) and seeks to impose an additional structure on a MRP to prevent voting cycles or, if there are cycles, to minimize the adverse consequences of manipulation (see Figure 3.7). Such processes are called majority rule processes with a structure-induced equilibrium (MRP-SE). Both strategies are reviewed here. Relaxing the axiom of universal domain (U), which allows for all possible profiles of voter preferences, seemed, at least initially, an attractive way out of Arrow's dilemma. In real societies, common cultural backgrounds may simply mean that some preference profiles will never occur. How likely are voting cycles (the violation of axiom R) when voters have "similar preferences"? Sen (1966) has made the notion of "similar preferences" precise and has defined conditions where no cycles occur at all. Sen advances the idea of value-restricted preferences and shows that if voter preferences over every combination of three alternatives have the property that all agree some alternative is not worst, or some alternative is not best, or some alternative is never medium, then a MRP will be transitive. Sen's condition of value restriction establishes the minimum degree of voter compatibility needed to avoid voting cycles in MRP. On its face, Sen's condition does not appear overly restrictive.66 Voters need only agree that some alternative is always "pretty good" (i.e. never worst), or always "pretty bad" (i.e. never best), or never a reasonable compromise (i.e. never medium). There is one prominent case where the condition of value restriction holds and a MRP yields a clear winner.
It is the case where alternatives can be ordered along one dimension (e.g. from conservative to radical, or from less to more public spending) and where preferences are "single-peaked" along this dimension. If, in Figure 3.8, x equals a "low budget", y a "middle budget", and z a "high budget", then the alternatives can be ordered along the one dimension, dollars, and we can plot the rank ordering of alternatives as solid lines in Figure 3.9. The rank orderings for persons I and III are called "single peaked" since there is only one "peak" to the plot of their preferences against the dimension of budget size. Person I might be called "conservative", while person III might be called "liberal". Person II has "double-peaked" preferences, however, and is the source of the voting cycle. If we change the preferences of person II to the ordering z > y > x, then Sen's value-restriction condition is satisfied - y is the one alternative which is never
66 Sen's theorem (1966) does require the number of voters to be always odd, which is "disturbing and unattractive" [Sen (1970a, p. 169)]. Sen (1970a, pp. 169-171) discusses the various ways around this problem, which require slightly stronger restrictions on allowable preferences. In the end, Sen concludes, as we shall too, that the usefulness of a pure MRP is a "question for empirical investigation". Will preference structures actually satisfy the needed restrictions to ensure an equilibrium?
[Figure 3.9 plots preference rankings (most, middle, least) for persons I, II, and III against the budget level, with x = "low", y = "middle", and z = "high"; person II's revised single-peaked ordering appears as a dashed line.]
Figure 3.9. Single-peaked preferences.
worst. Furthermore, person II's preferences now qualify as "single peaked" (the dashed line in Figure 3.9). Now a MRP of pairwise comparisons yields alternative y as the clear winner: y beats x, y beats z, and z beats x. This majority winner in pairwise comparisons is called the Condorcet winner. How reasonable is the requirement that voter preferences be single peaked? In simple, one-dimensional budgetary allocations the requirement is satisfied when voters have convex preferences and linear or concave budget constraints [see Denzau and Parks (1979)]. These are the usual restrictions in demand analysis and not likely to be often violated.67 Figure 3.10 illustrates this case for voter III in the example above; similar diagrams could be drawn for voter I and for voter II's new (single-peaked) ordering. The fact that majority rule processes are transitive (produce a Condorcet winner) when preferences are single peaked, and single-peaked preferences are reasonable in one-dimensional budget problems, explains the popularity of majority rule models of budgetary politics. Unfortunately, the general applicability of these models is severely limited; they are only valid for problems where alternatives can be ordered along one dimension.
67 But they may be, with the most plausible violation being increasing returns to scale, and hence convex budget constraints. If increasing returns are strong enough we may well have the case where voters prefer a large amount of spending to a small amount because of economies of scale, and yet both small and large spending levels are preferred to some medium amount which does not yet afford all the cost savings of increasing returns to scale. An interesting analysis of voting outcomes in such cases is developed by Kramer and Klevorick (1974), where they develop conditions for local equilibria. These results are applied to actual allocation problems in pollution control in Kramer and Klevorick (1973).
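Black's logic - that with single-peaked preferences the median ideal point is the Condorcet winner - can be sketched computationally. The ideal budgets below are hypothetical, chosen only for illustration:

```python
# Each voter has an ideal budget and, preferences being single peaked,
# prefers whichever of two alternatives is closer to that ideal.
ideal_points = [10, 25, 30, 45, 80]  # hypothetical ideal budgets, one per voter

def beats(a, b, ideals):
    """True if alternative a defeats b under pairwise majority rule."""
    return sum(1 for t in ideals if abs(a - t) < abs(b - t)) * 2 > len(ideals)

median = sorted(ideal_points)[len(ideal_points) // 2]  # the median ideal: 30

# The median ideal point beats every other proposed budget pairwise:
# voters on its side of any challenger always form a majority.
challengers = [0, 10, 25, 45, 80, 100]
print(all(beats(median, g, ideal_points) for g in challengers))  # True
```

Any challenger to the left of the median loses the votes of the median voter and everyone to her right, and symmetrically for challengers on the right, which is why the median ideal point is the Condorcet winner here.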
[Figure 3.10 plots voter III's budget constraint and indifference curves over private goods and the budget level, with alternatives x, y, and z marked on the budget axis.]
Figure 3.10. Budget preferences for voter III.
When we move to collective choice problems where alternatives are ordered on two or more dimensions - e.g. tax rates and spending, or two spending levels - single peakedness is no longer adequate. New, stronger restrictions on preferences are required. Plott (1967) was the first to clarify sufficient limitations on voter preferences to ensure MRP would produce a Condorcet winner when the policy alternatives have two or more attributes. The Plott conditions require a finite number of voters whose preferences can be represented by a quasi-concave, continuously differentiable utility function. Furthermore, some voter is satiated at a point e in the policy space and the remaining voters can be partitioned into pairs so that each voter who wants to change from point e is exactly offset by another voter who prefers an equal but opposite change from e. The Plott conditions, which amount to a symmetry condition on the distribution of voter preferences, are illustrated for the case of five voters in Figure 3.11. To simplify the presentation, I assume circular indifference curves around each voter's most preferred ("bliss") point or satiation point. The voter is worse off as we move in any direction from his or her point, and utility decreases monotonically with distance. [More formally, U(x) = f(-||x - θ||) for any point x, where θ defines the voter's bliss point.] In Figure 3.11, the bliss points for voters A-E are indicated by points a-e, with circular indifference curves for each voter drawn about his bliss point. The solid lines connecting the various bliss points are "contract lines" marking all points of tangency between the indifference curves. Note that point e beats any alternative in a majority rule pairwise comparison, and that point e satisfies
Figure 3.11. Majority rule equilibrium with two policies.
Plott's conditions. Voters A and C are always opposed to each other, as are voters B and D. Consider point δ vs. point e. Voting for e are voters E, A, and B; voting for δ are voters C and D. Consider point β vs. point e. Voting for e are E, D, and C, while A and B vote for β. Similarly, e beats α (E, B, and C for e, while A and D want α) and e beats γ (E, A, and D for e, while B and C want γ). A majority rule comparison of any point in the policy space to point e will show e to be the winner. When the Plott conditions do not hold, however, voting cycles may result. This is shown in Figure 3.12, where for simplicity I draw only the case of three voters with circular preferences. The arguments generalize easily. The solid lines connecting the bliss points of voters A, B, and C are the "contract lines" marking the tangencies of the indifference curves between the various pairs of voters. As drawn, the Plott conditions do not hold; if they did, the bliss point of one of the voters would have to lie upon the contract line for the other two. Figure 3.12 illustrates two important points. First, the area within the three contract lines defines the set of Pareto points for this allocation problem. A move from an allocation within the Pareto set (e.g. point e) to a point outside the set (e.g. point δ) will make at least one voter worse off (e.g. voter B). Conversely, there is always a point within the Pareto set (e.g. point γ) which will make all voters better off compared to its alternative outside the Pareto set (e.g. point δ). Second, there may be no stable majority rule winner when the Plott conditions are violated. As any point outside the Pareto set will be dominated by some point within the set, we can focus our analysis on alternatives such as α, β, γ, and e. In pairwise comparisons by majority rule, β (favored by B and C) beats α (favored by A), α (favored by A and B) beats γ (favored by C), but (contrary to transitivity, which
predicts β beats γ) γ (favored by A and C) beats β (favored by B). Point e, which is inside the Pareto set, is also caught in a voting cycle. Point e wins over point α as voters B and C prefer e; point α defeats γ as A and B prefer α; but (contrary to transitivity) γ beats e as A and C prefer γ. There is no Condorcet winner amongst the alternatives in Figure 3.12. How likely is it that we will violate the Plott conditions and have voting cycles? The fact that a small deviation from perfect symmetry of voter preferences about the equilibrium point (e, in Figure 3.11) upsets the equilibrium and allows a voting cycle suggests that the Plott conditions are quite restrictive. Kramer (1973) has shown that when voters' preferences are convex and continuous and issues are to be decided on more than one dimension, then voting cycles are the general case. Only when preferences are essentially identical can MRPs escape intransitivity [Kramer (1973)]. We must conclude that the emergence of a Condorcet winner is an unlikely occurrence in multi-dimensional, many-issue voting.68 Tullock (1967) conjectured that voting cycles are less likely as the number of voters is increased. The cycle itself would be limited to a smaller subset of alternatives in the policy space (called the "cycle set"), and this small subset would be within the Pareto set. The intuition of Tullock's conjecture was that as more and more voters covered the policy space, the chances of finding the offsetting pairs of voters about this small cycle set increases - the cycle set being defined by those voters who do not offset each other. MRP should therefore tend to a decision within the cycle set (which decision we cannot say) and that decision will be Pareto optimal. In an important paper, McKelvey (1975) has shown this conjecture to be wrong. In multi-dimensional voting problems there will generally be majority rule voting cycles which cover the entire policy space.
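A two-dimensional cycle of the kind drawn in Figure 3.12 is easy to construct with circular (Euclidean) preferences. The bliss points and alternatives below are hypothetical coordinates chosen to reproduce the pattern in the text (β favored by B and C beats α; γ favored by A and C beats β; α favored by A and B beats γ):

```python
import math

# Hypothetical bliss points for three voters with circular (Euclidean) preferences.
bliss = {"A": (0.2, 0.25), "B": (0.8, -0.1), "C": (0.7, 0.75)}

# Three policy alternatives in the two-dimensional policy space.
points = {"alpha": (0.0, 0.0), "beta": (1.0, 0.0), "gamma": (0.5, 0.866)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def majority_prefers(a, b):
    """True if a majority of voters is strictly closer to alternative a than to b."""
    closer = sum(1 for v in bliss.values() if dist(v, points[a]) < dist(v, points[b]))
    return closer * 2 > len(bliss)

print(majority_prefers("beta", "alpha"))   # True: B and C prefer beta
print(majority_prefers("gamma", "beta"))   # True: A and C prefer gamma
print(majority_prefers("alpha", "gamma"))  # True: A and B prefer alpha - a cycle
```

No alternative here survives every pairwise contest, so pairwise majority rule has no Condorcet winner over these three points.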
McKelvey showed that if voters have circular preferences (Tullock's assumption too),69 then for any two alternatives in the policy space, a voting sequence of pairwise comparisons can be found which begins with one alternative and ends
68 We might well ask now the question posed by Sargent and Williamson (1967): How likely are voting cycles if some, but not all, voters have identical preferences? Using simulation techniques, they conclude that even a modest proportion of identical preferences significantly reduces the probability of a voting cycle, at least for the case of many voters and three alternatives. The probability of a cycle falls from 0.0877 to less than 1 percent in this case. It is likely that the probability of a cycle rises significantly, however, as the number of alternatives increases, approaching 1 as the alternatives cover the whole policy space [Kramer's (1973) result]. Fishburn and Gehrlein (1980) take another tack and examine the effect of voter indifference on the probability of voting cycles. They find that for three alternatives the probability of a cycle falls from 0.0877 without indifference to zero when everyone has at least one indifferent ranking in their ordering; for four and five alternatives the probability of a cycle falls from 0.1755 and 0.2513 to approximately 0.11 and 0.20, respectively, with at least one indifferent ranking in each ordering [from Fig. 1, Fishburn and Gehrlein (1980)]. Preference compatibility - whether defined by Sen's value restriction, by identical preferences for some voters, or by indifference - does seem to help, but only when the number of alternatives is few.
69 McKelvey's results have since been generalized to more general preference structures by Schofield (1978), Cohen (1979), McKelvey (1979), and Cohen and Matthews (1980).
Figure 3.12. Majority rule disequilibrium.
Figure 3.13. Majority rule voting paths.
with the other alternative such that each point along the way is preferred to its predecessor. Figure 3.13 illustrates this result for a point e within the Pareto set and two points δ and θ outside the Pareto set. In a comparison of e vs. δ, δ receives two votes (voters A and C) and e one vote (voter B). If we then compare points δ and θ, both outside the Pareto set, alternative θ (voters B and C) beats δ. Finally, compare θ to e. Alternative e is preferred by voters A and B and therefore beats θ. We have found a voting cycle which travels from an arbitrary point inside the Pareto set to points outside the set and then back into the Pareto set. McKelvey's result proves this is no fluke; we can travel anywhere in the policy space by an appropriately chosen sequence of alternatives. Rather than unique occurrences, voting cycles are endemic to MRP. The analysis of Plott, Kramer, and McKelvey makes clear that dropping axiom U is not a particularly promising avenue to travel in our search for an attractive democratic social decision process. Only with truly one-dimensional policy choices does the relaxation of axiom U - to single-peaked preferences - buy us very much at all. We must seek, then, another route. If we wish to continue down the path of the MRP, yet retain axiom U, then the rationality axiom R must go. The loss of axiom R raises serious questions, however. They center fundamentally upon the susceptibility of MRPs to voting cycles. We know already from the work of Blair and Pollak (1981, 1983) that no democratic choice process is immune to voting cycles. If we impose the restriction that our constitution satisfy an axiom of acyclicity (no voting cycles), then some individual or group will have extensive veto power. Since the Blair-Pollak analysis applies to all constitutions, it applies to MRPs as well. Nonetheless, acyclic constitutions may be worth examining.
Shepsle (1979), in an important paper on "structure-induced equilibrium" for MRPs, has begun this examination.70 To escape the voting cycles endemic to MRPs, legislatures often adopt rules on how policy proposals are to be presented, discussed, and finally approved. Shepsle (1979) has begun the formal analysis of such rules, which he calls institutional structures. Such structures have three dimensions: (1) who is allowed to vote on a proposal (the "committee structure"); (2) which proposals may be compared to the status quo (the "jurisdiction structure"); and (3) how proposals may be modified once they have been formally introduced for voter consideration (the "amendment structure"). The now classic example of a structure-induced equilibrium for a majority rule process is the use of local budget referenda to set educational spending. The "committee" includes all residents within a defined geographic boundary. The committee's "jurisdiction" is to set the aggregate level of the school budget.
70 In their review of the recent literature on equilibrium in MRPs, Fiorina and Shepsle (1982) conclude: "Indeed, the central constructive feature of the Cohen-McKelvey-Schofield theorems is precisely that 'other features,' not the majority rule mechanism, are decisive in democratic institutions."
[Figure 3.14 plots the density of voters' ideal points (percent of voters) against the budget level G, marking the alternatives G°, G*, and G1.]
Figure 3.14. Median voter budget.
The "amendment" process may allow all spending levels to be considered - an "open" agenda - or it may allow no amendments and only two budgets to be considered - a "closed" agenda. This structure reduces a complicated multi-dimensional policy issue - the level, the content, and the financing of education - to a one-dimensional issue, the size of the education budget. Along this one dimension, voter preferences for spending levels are likely to be single peaked [see Figure 3.10, and Denzau and Parks (1979)]. In the case of an "open" agenda structure, the outcome will be the budget preferred by the voter with the median (50th percentile) position in the distribution of most preferred (or "ideal") points [Black (1958)]. Figure 3.14 illustrates this result. The solid curve is the density function of the distribution of voters' ideal points along the single dimension (G) of allowed alternatives. G represents the level of the school budget. The median position in the distribution, denoted G*, will be the winning alternative. Since preferences are single peaked, voters always prefer alternatives closest to their ideal points. If G1 were proposed, all the voters whose ideal points are to the left of G* will favor G*, and a few voters to the right of, but close to, G* will also favor G*. Since half the voters lie to the left of G*, G* defeats G1. If G° were proposed, it too would lose to G*. Half the voters, those whose ideal points lie to the right of G*, will favor G* over G°, and a few voters positioned just to the left of G* will also prefer G* to G°. It is clear that G* is the MRP-SE equilibrium outcome. In the case of a closed agenda MRP, voters face a one-time choice between only two alternatives. How are those alternatives chosen? They arise from an agenda-setting process. One alternative is a "reversion", or fall-back,
Figure 3.15. Bureau behavior with alternative reversion budgets.
alternative - for example, last year's budget or the status quo. The other alternative must be specified from an agenda-setting process. In an important line of new research, Romer and Rosenthal (1979b, 1982a, 1982b, 1982c) have developed and tested a model of agenda-setting behavior in one-dimensional policy problems. The analysis extends the early work of Niskanen (1971, 1975) on bureaucratic agenda-setters. In both Niskanen's and Romer-Rosenthal's models it is assumed that the agenda-setter is the bureaucracy and that the professional bureaucrat's objective is to maximize the size of the budget. Figure 3.15, which shows the budget line and indifference curves for the decisive median voter, illustrates how the bureaucrat will select the agenda, given that 50 percent of the voters must approve one alternative or the other. The trick is to select the agenda so that the approved budget is as large as possible. Note that if G*, the median voter's preferred budget, is one alternative, then it will receive voter approval. Thus, the worst the bureaucrat can do is G*. But the bureaucrat can capture a higher budget. Consider the agenda pair G (the "reversion" level) and a budget slightly smaller than Ĝ, the largest budget the median voter still prefers to G. Which will the median voter prefer? The answer is the budget near Ĝ, for it places him on a higher indifference curve (more satisfaction) than that available with the reversion budget G. The bureaucrat can increase the budget from G* to nearly Ĝ with an appropriate agenda pair. Yet the approved budget can be made larger still. If the reversion level is set at G = 0, then an even larger alternative budget will win voter approval in the pairwise comparison against the zero reversion. Romer and Rosenthal have shown how agenda manipulation can influence the final outcome in a MRP-SE process. Note too that such manipulation generally makes the agenda-setter better off (G increases) while making those outside the select circle (the median voter and other low demanders) worse off.
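The setter's leverage can be sketched numerically. Assuming, for illustration only, that the median voter's utility is symmetric and single peaked around the ideal budget G*, the voter approves any proposal no farther from G* than the reversion is, so the largest passable budget is G* plus the distance from G* to the reversion:

```python
def max_passable_budget(g_star, reversion):
    """Largest budget the median voter (weakly) prefers to the reversion,
    assuming utility symmetric and single peaked around the ideal point g_star."""
    return g_star + abs(g_star - reversion)

g_star = 100  # hypothetical median voter's ideal school budget

# The lower the reversion, the larger the budget the setter can extract.
for reversion in (90, 50, 0):
    print(reversion, "->", max_passable_budget(g_star, reversion))
# 90 -> 110, 50 -> 150, 0 -> 200: a zero reversion lets the setter pass
# nearly twice the median voter's preferred budget.
```

This is the comparative static Romer and Rosenthal test empirically: under a closed agenda, approved spending should rise as the reversion level falls, while under an open agenda the reversion should not matter.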
An interesting series of empirical papers have in fact tested MRP-SE models for this special case of one-dimensional policy choices. Howard Bowen, in a
remarkable paper published over forty years ago [Bowen (1943)], first proposed the empirical test of such MRP-SE models,71 but it remained for Bergstrom and Goodman (1973) and Inman (1978) to perform the test. Those results were generally supportive of the MRP-SE process as a description of local government budgetary choice. But Bergstrom and Goodman, and Inman, both assumed the amendment process was an open one - that is, any proposal could be considered. Romer and Rosenthal's work extends this analysis by allowing for no amendments and closed agendas. Within their more general structure they could formally test the proposition that the amendment structure of these MRP-SEs was "open" or "closed". The key variable added to the Bergstrom-Goodman and Inman open agenda model was the "reversion" level. The lower the reversion level, the higher should be government spending under the closed agenda model (see Figure 3.15). With an open agenda, the median voter is decisive, G* is chosen, and the reversion level should be statistically insignificant. In their study of Oregon school districts, Romer and Rosenthal find the reversion level is significant and has the expected impact on local spending. As the reversion level passes below a critical lower bound, Romer and Rosenthal find that local school spending is 11-18 percent higher than that predicted by the open agenda median voter model. In Oregon school districts, at least, the monopoly agenda-setter has an important role to play in setting the level of government spending. Shepsle (1979) has extended the analysis of MRP-SE to two or more policy dimensions. The structure involves committees of all voters (committees-of-the-whole), but each committee's jurisdiction is restricted to considering only one policy dimension. If there are two policy dimensions, then there must be two committees.
71 In this one paper Bowen (1943) addressed the entire research agenda of the new political economy. First, he described the market failure of pure public goods and derived the conditions which must be satisfied if an efficient allocation is to result. Second, he proposed a behavioral model of government budgetary politics and derived the essential results of the median voter theorem. Third, he proposed an empirical test of whether governments, if behaving according to his model, would in fact provide the efficient level of the public good. His analysis precedes Samuelson's (1954) normative analysis of public goods by eleven years, Black's (1958) median voter results by fifteen years, and the new empirical political economy by at least twenty!
Finally, the amendment process allows an open agenda subject to the restriction that only amendments pertaining to each committee's policy dimension can be considered (called the germaneness rule). Furthermore, truthful voting is assumed.72 Figures 3.16(a)-(c) illustrate an equilibrium in the simple case of two policy dimensions (x1 and x2 spending) and three voters (A, B, and C). The jurisdiction and amendment rules restrict voters to consider one policy dimension at a time. Figure 3.16(a) plots the preferred spending on x1 for each voter, given an exogenously set level of x2. The curves are drawn as the dashed lines AA', BB', and CC' and are found by drawing lines parallel to the x1 axis
72 Extensions of the analysis to the case of strategic voting are found in Miller (1980) and Shepsle and Weingast (1984). The essential point - the structure-induced equilibrium result - remains intact.
R.P. Inman
[Figure 3.16(a) and (b). Structure-induced equilibrium.]
Ch. 12: The "New" Political Economy
[Figure 3.16(c). Continued.]
at given levels of x2 and noting the tangency of those lines to a voter indifference curve at a point (e.g. β, α, and γ) closest to the voters' ideal points (a, b, and c). Given the curves AA', BB', and CC' we can calculate the majority winning level of x1 for each level of x2. For example, at x2 = x2°, voter B prefers β, voter A prefers α, and voter C prefers γ. Since preferences for x1 are single peaked along the ray at x2 = x2°, proposal α is the median preferred alternative and the majority rule winner. We can repeat the calculation for each x2 level, and the result is the set of majority preferred x1 levels, drawn as the heavy line I in Figure 3.16(a). Figure 3.16(b) repeats this procedure, except here we determine the majority winning x2 levels, given a level of x1. For example, when x1 = x1°, the three preferred proposals are the tangency points along the ray at x1 = x1°. Since preferences for x2 are single peaked along that ray, the median of the three tangency points is the majority winner. The heavy line II is the collection of the majority preferred x2 levels, given x1. The intersection of lines I and II in Figure 3.16(c) is the Shepsle structure-induced equilibrium for this simple case, denoted as point E. In the example drawn here, E is a stable equilibrium as well. If moved away from E, the voting process will eventually return to point E.73 In general, the structure-induced equilibrium is not a single
73 One needs to impose a dynamic structure on the Shepsle model to discuss stability. A "cobweb" process seems reasonable. Given x2, select a majority preferred x1. Then given that x1, select a new majority preferred x2. Does this sequence converge to E? In Figure 3.16(c), it does. However, it is not hard to construct an example with lines I and II (and underlying tastes) which does not converge to E. In that case, the equilibrium point where lines I and II intersect is unstable. For a more detailed discussion of stability with structure-induced equilibria, see Denzau and Mackay (1982).
point but a set of points [Shepsle (1979, p. 50)]. Nonetheless, the equilibrium set is likely to be a small subset of the whole policy space. Shepsle's two-dimension, MRP-SE analysis assumes an open agenda; Mackay and Weaver (1981, 1983) have extended the model to allow for closed agendas set by budget-maximizing bureaucrats. The structure of their model parallels Shepsle's special case of a committee-of-the-whole restricted to considering one policy dimension at a time [Figures 3.16(a) and 3.16(b)]. Mackay and Weaver (1981) examine both the case of a monopoly (closed) agenda-setter for one policy dimension but open agenda politics on the other dimensions and the case of monopoly (but non-collusive) agenda-setters on all dimensions. Results are then compared to a Shepsle open-agenda equilibrium, such as point E in Figure 3.16(c). They show that for two-dimensional policies (e.g. x1 and x2), monopoly power will increase the size of the budget allocated to the monopolized activity (from point E) and may reduce (if substitutes) or increase (if complements) the size of the budget of the openly decided public activity. If both activities are monopoly controlled and the monopoly bureaus are non-cooperating, it is possible for one bureau's budget to actually fall below that of the open-agenda outcome. The more likely case, however, is for both budgets to increase from the open-agenda equilibrium. Finally, if monopoly bureaus cooperate - specifically, if they can agree to transfer funds between each other - then it is possible to expand their individual budgets still further [Mackay and Weaver (1983)].
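The cobweb adjustment described in footnote 73 can be simulated directly. A minimal sketch with assumed preferences (none of the numbers are from the chapter): three voters whose preferred x1, given x2, lies on the line a_i - (x2 - c_i)/2 (the analogue of curves AA', BB', CC'), and symmetrically for x2 given x1; the committee on each dimension adopts the median preference, and the two committees alternate.

```python
from statistics import median

# Assumed ideal points (a_i, c_i) for voters A, B, C.
IDEALS = [(2.0, 8.0), (5.0, 4.0), (9.0, 6.0)]

def majority_x1(x2):
    # Preferences along the ray at fixed x2 are single peaked: median wins.
    return median(a - 0.5 * (x2 - c) for a, c in IDEALS)

def majority_x2(x1):
    return median(c - 0.5 * (x1 - a) for a, c in IDEALS)

def cobweb(x2=0.0, rounds=50):
    """Alternate committee votes on x1 and x2 (footnote 73's dynamic)."""
    for _ in range(rounds):
        x1 = majority_x1(x2)
        x2 = majority_x2(x1)
    return x1, x2

x1, x2 = cobweb()
# The process settles at a fixed point E where each committee's majority
# choice reproduces itself - a structure-induced equilibrium.
```

Because each majority-response line here has slope -1/2, the alternating process contracts toward E; steeper response lines would produce the divergent (unstable) case the footnote mentions.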
The basic theoretical results from Mackay and Weaver parallel those of Niskanen, and Romer and Rosenthal, for the single-dimension case, but as yet no empirical analyses have confirmed these predictions for the case of multi-dimensional policies.74 Although the effect of institutional structure on policy choice is important, and real, it is not by itself an answer to the question of how MRPs do, or should, select outcomes. For if a structure-induced equilibrium such as point E in Figure 3.16 is not a Condorcet equilibrium (a policy capable of beating all others in pairwise comparisons), then there is always a majority with an incentive, and the potential power, to change the institutional structure and hence the chosen outcome. When, why, and how do such institutional changes occur? It is this question which now stands as the central challenge to our understanding of majority rule processes. In an important sense we have moved the dilemma of
74 An alternative to the Mackay-Weaver models is presented in Inman (1981). Inman hypothesizes a bureau - called a labor union - seeking to maximize an objective function over current and future labor income and the level of employment. Employment proxies for public good output, while labor income translates directly into taxpayer income through the government's budget constraint. The union negotiates with a coalition of taxpayers. The tendency is for negotiation to move to outcomes along a "contract curve" defined by mutually agreeable trades between the bargaining parties. Where on the contract curve the final outcome will lie depends on the relative bargaining strengths of the two parties. The analysis predicts, and the empirical work confirms, a larger public budget with two-party bargaining than when workers are available at the competitive wage.
Net benefits from roadway:

            Roadway I    Roadway II
Farmer A     -$2000       -$2000
Farmer B      $5000       -$2000
Farmer C     -$2000        $5000

Figure 3.17. Vote-trading.
disequilibrium and cycling from the space of policies to the space of institutions. As Riker (1982) concludes in his own recent survey of majority rule processes: ... rules or institutions are just more alternatives in the policy space and the status quo of one set of rules can be supplanted with another set of rules. (T)he only difference... is that the revelation of institutional disequilibrium is probably a longer process. ... If institutions are congealed tastes and if tastes lack equilibria, then so also do institutions, except for short-run events.
3.4.2. Preference intensity processes
Having now examined various dictatorial processes and then democratic decision processes which sacrifice Arrow's axioms of universal domain or rationality (MRPs), it remains to describe possible democratic choice processes which violate the last Arrow axiom of independence from irrelevant alternatives (I). We have seen that democratic mechanisms such as the Borda count which violate axiom I have one advantage - they allow voters to express the relative intensity of preference or value they place on alternatives - and two disadvantages - all alternatives must be considered jointly, and such mechanisms are open to strategic manipulation by voters. I will call such democratic mechanisms preference intensity processes (PIPs). While the Borda count is the common textbook example of a PIP, it is not prominently used by real legislatures. PIPs do exist, however. Any democratic mechanism which permits votes to be traded, bought, or sold will violate axiom I, permit voters to express the intensity of their preferences for particular policy positions, require voters to consider all feasible policies, and be susceptible to strategic manipulation.75 Vote-trading, logrolling, and interest group politics in which votes are for sale are all democratic preference intensity processes. How do they work? Figure 3.17 illustrates the potential and hazards of vote-trading processes.76 Persons A, B, and C are three farmers. Farmers B and C each need a roadway to their farms; each roadway will cost $6000, to be shared equally among
75 See Wilson (1969) for a proof of these propositions.
76 The analysis is based on an example in Mueller (1979).
the three farmers. The benefits of the access roads to farmers B and C will be $7000 each. If roadway I is built, farmer B gains a net increase of $5000 (= $7000 - $2000), while farmers A and C lose $2000. If roadway II is built, farmer C gains $5000 ($7000 - $2000), while farmers A and B lose $2000. The construction of the roadways is to be decided by a democratic process. If simple majority rule is used, both roadways I and II are defeated by votes of 2-1. Note, however, that both B and C gain by trading votes; B agrees to vote for roadway II if C agrees to vote for roadway I, and both projects are then approved by 2-1. Furthermore, joint passage is potentially Pareto optimal in this case since social benefits ($7000) exceed social costs ($6000) for each roadway. Two facts should be noted regarding this example. First, the existence of beneficial trades requires a non-uniform distribution of net benefits over the alternatives. If net benefits from the roadways for farmers B and C were only $2000, they would gain nothing by trading votes. For vote-trading to be worthwhile to individual participants, they must feel intensely toward one issue over another. Second, vote-trading need not always lead to a Pareto-optimal allocation. Suppose the gross benefits of each roadway were only $5000; now social benefits ($5000) are less than social costs ($6000). But a vote-trading equilibrium with $3000 in place of $5000 in Figure 3.17 (B and C still have an incentive to trade) will approve both roadways. The vote-trading outcome overprovides the public good. There is no assurance that vote-trading will achieve the Pareto-efficient allocation we seek; see Riker and Brams (1973). More fundamentally, however, there is no guarantee that a vote-trading equilibrium will even exist. Both farmers B and C have an incentive to cheat on the agreement if voting occurs sequentially. 
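The arithmetic of Figure 3.17 is easy to verify mechanically; the following sketch simply encodes the example's numbers and compares sincere majority voting with the vote trade.

```python
# Net benefits to farmers A, B, C from roadways I and II (Figure 3.17).
NET = {"I":  {"A": -2000, "B": 5000, "C": -2000},
       "II": {"A": -2000, "B": -2000, "C": 5000}}

def majority_passes(road):
    """Sincere voting: a farmer votes yes only if his net benefit is positive."""
    yes = [f for f in "ABC" if NET[road][f] > 0]
    return len(yes) >= 2

def payoffs(approved):
    return {f: sum(NET[r][f] for r in approved) for f in "ABC"}

# Under sincere majority rule each roadway fails 1-2 and nothing is built.
built_sincere = [r for r in NET if majority_passes(r)]        # -> []

# With the vote trade (B supports II, C supports I) both pass 2-1:
traded = payoffs(["I", "II"])
# B and C each net $3000, A loses $4000, and the total social gain is
# $2000 - the Pareto-potential case, since each road's benefits ($7000)
# exceed its costs ($6000).
```

Rerunning the same script with gross benefits of $5000 (net gains of $3000 for B and C) still triggers the trade but makes the total payoff negative, which is the over-provision case discussed above.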
Once one farmer has achieved his roadway approval, he has an incentive to cheat on the agreement and vote to reject the roadway of his neighbor. Since the first farmer knows he will be cheated, he refuses to trade. The "game" they are playing is, of course, a Prisoner's Dilemma [Mueller (1979, p. 54)], and just as non-cooperative behavior in such games can undo trades in the market place, so too can it undo trades in the halls of government. We need a way to ensure that cheating cannot occur. The vehicle to foster cooperation in this case is the omnibus legislative package, which requires that all bills (roadway I and roadway II) which have been vote-traded be decided simultaneously. The creation of such omnibus legislative packages is called "logrolling". Logrolling permits voters to achieve a stable vote-trading equilibrium. The logrolling game has been analyzed by Becker (1983) and Weingast, Shepsle and Johnsen (1981) under the assumptions that each voter represents a particular interest group (e.g. roadway I or II) and that all voters adopt a Cournot-Nash strategy when bargaining - that is, all voters behave as if their votes do not influence the benefits and costs, and thus the behavior, of other voters. Becker
and Weingast-Shepsle-Johnsen show that such games will have an equilibrium allocation even over multi-dimensional policy choices. What has not yet been established is the existence of a logrolling/interest group equilibrium under more sophisticated, rational expectations behavior by voters. Yet sophisticated, strategic voter behavior is generally viewed as a central element of all preference intensity processes. The analysis of logrolling/interest group models must be considered incomplete until sophisticated, strategic voter behavior is allowed [Kramer (1972)].
3.4.3. The performance of democratic processes - allocative efficiency
Having established that MRPs and PIPs may, in the right behavioral and institutional settings, lead to well-defined and predictable equilibria, we can now turn to an evaluation of those equilibria against the normative criterion of allocative efficiency. When will a democratic process satisfy Samuelson's condition (ΣMRS = MC) for a cooperative, allocatively efficient provision of a collective activity? We know from Arrow's theorem that democratic processes cannot give stable, allocatively efficient outcomes all the time, but perhaps for a certain configuration of alternatives and preferences they will succeed. Bowen (1943) was the first to consider systematically the efficiency properties of a democratic process. Analysis was limited to a MRP in the case of single-dimension policies with single-peaked preferences.
Bowen showed that if each of N voters pays an equal share of the marginal (= average) costs of the public good x [= MC(x)/N] and if preferences for the public good are distributed symmetrically [so the median MRS(x) corresponds to the mean MRS(x)], then from the MRP's selection of the median voter's preferred alternative [defined by MRS(x)_median = MC(x)/N] and from the symmetry of preferences [ΣMRS(x)/N = MRS(x)_mean = MRS(x)_median] we can conclude that majority rule will satisfy Samuelson's condition for allocative efficiency [MC(x)/N = MRS(x)_median = ΣMRS(x)/N, so MC(x) = ΣMRS(x)]. These conditions are obviously very special. Bergstrom (1979) has generalized the Bowen results for MRP to a more plausible tax system and structure of voter preferences. Specifically, Bergstrom proves that: if
(i) voter preferences are log-linear in the private good (Y_i for voter i) and the public good (x) (U_i = ln Y_i + a_i ln x), and if,
(ii) the public good is produced at a constant cost per unit of x (= c), and if,
(iii) each voter's share of total taxes (τ_i) is proportional to voter income (τ_i = I_i/ΣI_i), and if,
(iv) the "taste parameter" which determines a voter's willingness to pay for the public good (parameter a_i) is distributed symmetrically across all voters and is independent of income,
then a Bowen median voter equilibrium will be Pareto optimal as well.77 The assumptions (i)-(iv) of the Bowen-Bergstrom theorem are not implausible,
77 The proof of the Bergstrom-Bowen theorem is straightforward. From assumption (i) and the voter's linear budget constraint, I_i = Y_i + τ_i c x, we know the issue is single-dimensioned (what level of x?) and that voter preferences for alternative levels of x will be single peaked. The public sector equilibrium will therefore be determined by the preferences of the median voter. The maximization of U_m subject to the budget constraint defines the median voter's preferred level of x implicitly as

    MRS_m = a_m(Y_m/x) = τ_m c,    (1)

where a_m is the preference parameter for the median voter. Rearranging gives a_m Y_m = τ_m c x, and substituting into the budget constraint yields I_m = Y_m(1 + a_m), or Y_m = I_m/(1 + a_m). Substituting this specification for Y_m into the above equilibrium condition and again rearranging terms allows us to specify the median voter's preferred level of x = x̂ as

    x̂ = (a_m/(1 + a_m))(I_m/τ_m c) = (a_m/(1 + a_m))(ΣI_i/c),    (2)

using τ_m = I_m/ΣI_i. x̂ is the level approved by the MRP. But does it satisfy Samuelson's condition for efficiency? The typical voter's MRS evaluated at x = x̂ is equal to

    MRS_i = a_i(Y_i/x̂) = a_i(I_i/(1 + a_m))/x̂,    (3)

which upon the substitution of x̂ as defined in (2) equals:

    MRS_i = (a_i/a_m)c τ_i.    (4)

Summing over all i voters including the median voter gives:

    Σ_i MRS_i = (c/(a_m ΣI_i)) Σ_i a_i I_i.    (5)

If a_i is independently distributed from I_i within the community, then Σa_i I_i = ā ΣI_i, where ā is the mean value of the preference parameters. Upon substitution,

    ΣMRS_i = c(ā/a_m),    (6)

which, if the a_i's are symmetrically distributed so ā = a_m, reduces to

    ΣMRS_i = c.    (7)

Equation (7) is Samuelson's condition for the allocatively efficient provision of x.
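The algebra of footnote 77 can be checked numerically. The incomes and taste parameters below are illustrative assumptions, chosen so that the a_i's have median equal to mean and are uncorrelated with income (assumptions (iii) and (iv)); the check then verifies that the median voter's choice x̂ from equation (2) satisfies ΣMRS = c.

```python
# Illustrative data (assumed): five voters, tax shares tau_i = I_i / sum(I).
incomes = [20.0, 30.0, 50.0, 80.0, 120.0]
a       = [0.6, 0.25, 0.25, 0.5, 0.4]     # mean = median = 0.4; uncorrelated
c = 2.0                                    # constant unit cost of x
total_income = sum(incomes)

# Preferred x rises with a_i, so the voter with the median a is decisive.
a_m = sorted(a)[len(a) // 2]
x_hat = (a_m / (1 + a_m)) * total_income / c          # equation (2)

# At x_hat each voter's private consumption is Y_i = I_i / (1 + a_m),
# so MRS_i = a_i * Y_i / x_hat  (equation (3)).
mrs = [a_i * (I_i / (1 + a_m)) / x_hat for a_i, I_i in zip(a, incomes)]

print(round(sum(mrs), 10))    # 2.0: the sum of MRS equals marginal cost,
                              # Samuelson's condition (7)
```

Breaking either assumption - say, making the a_i's rise systematically with income - makes ΣMRS diverge from c, which is exactly why assumption (iv) needs empirical testing.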
but of course they must be tested. Assumption (i) is a common specification in empirical public good demand analysis, where a_i is often specified to depend upon age, religion (for educational spending), or race. Assumption (ii) is likely to hold for most public goods, while assumption (iii) is valid for any tax system where taxes are proportional to income.78 Assumption (iv) is yet to be examined: are age, religion, or race or their combined effects on demand symmetrically distributed? Barlow (1970) has actually performed a variant of this test for a sample of Michigan school districts. At issue is the level of school spending. The median voter is assumed decisive (a Bowen equilibrium) and local spending is set by the condition: MRS(x̂)_median = τ̂c, where τ̂ is the median voter's share of total local taxes, c is the constant cost per unit of x, and x̂ is the median voter's preferred level of x. The median voter's willingness to pay for schooling is related to the average willingness to pay by the relationship MRS(x̂)_median = φMRS(x̂)_mean = φ[ΣMRS(x̂)/N], where φ is a constant relating median to mean benefit. Hence, τ̂c = MRS(x̂)_median = φ[ΣMRS(x̂)/N], or

    c/ΣMRS(x̂) = φ/(τ̂N).

Note that if the ratio φ/(τ̂N) equals one, then c = ΣMRS(x̂) and Samuelson's efficiency condition is satisfied. If φ/(τ̂N) > 1 (< 1), then c > ΣMRS(x̂) [c < ΣMRS(x̂)] and local school services are over-provided (under-provided) relative to the allocatively efficient level. The key ratios are the ratio of median to mean benefits [φ = MRS(x̂)_median/MRS(x̂)_mean] and of median to mean tax payments [τ̂N = τ̂/(1/N)]. If we assume benefits are
symmetrically distributed and everyone pays equal taxes, then mean and median benefits are equal and mean and median tax payments are equal, so φ/(τ̂N) = 1, and Samuelson's efficiency condition holds. This is Bowen's (1943) special case analyzed earlier. Barlow has actually estimated φ/(τ̂N), however, and he concludes that for the majority of local school districts in his sample φ/(τ̂N) < 1. Thus, local school spending is too low relative to the efficient level. For this single-dimension choice, the MRP is allocatively inefficient.79 Kramer (1977) has provided the basis for evaluating the efficiency performance of a MRP in the case of multi-dimensional policies. A dynamic sequence of elections between an incumbent who must defend the status quo and a challenger who selects a vote-maximizing alternative to the existing policy is shown to converge to the set of Pareto-efficient policies. The incumbent must run on his or
78 A tax system based on a proportional income tax will define each voter's tax payment as T_i = tI_i, where t is the proportional tax rate. Total taxes equal ΣT_i = tΣI_i. The voter's share of taxes (τ_i) will therefore equal τ_i = T_i/ΣT_i = tI_i/tΣI_i = I_i/ΣI_i. Note, too, that any proportional tax system which taxes a commodity which is proportional to income also satisfies assumption (iii). For example, a proportional property tax, where housing value is a constant times income, will satisfy assumption (iii).
79 See Edelson (1973) and Bergstrom (1973) for critical comments and extensions of Barlow's work.
her performance during office, which is assumed to be identical to the proposal the incumbent offered when elected. However, the challenger can offer any proposal he or she thinks can beat the incumbent's platform. The challenger seeks to maximize the number of votes received against the incumbent's position. Since in general there is no single policy which beats all others in majority rule comparisons, the challenger, if he knows voter preferences, can always find an alternative policy which beats the incumbent's. Thus, the "out" candidate can always beat the incumbent, and we should observe a series of one-term candidates. Where does this process lead? Perhaps not too surprisingly, it takes us into the set of Pareto-optimal policies. When politicians seek to please the maximal number of voters with any policy change (the vote-maximization hypothesis), and when an incumbent's policy is outside the Pareto set, then there will always be a policy within the Pareto set which is unanimously preferred to the incumbent's policy and which the vote-maximizing challenger will select. Unfortunately, once inside the Pareto set, there is no guarantee that the two-candidate competitive process will keep us there. Figure 3.13 above illustrated a case where a proposal in the Pareto set could be defeated by a challenger's proposal outside the Pareto set, which in turn could lose to another proposal outside or inside the Pareto set. The equilibria of Kramer's dynamical MRP are not stable. However, vote-maximization behavior, which ensures challengers always seize upon unanimously preferred alternatives when they exist, keeps pushing outcomes back into the Pareto set. Kramer's analysis predicts that policies will eventually be in or near the Pareto set, but exactly where is not known.80 What drives this MRP to the Pareto set is the assumption of vote-maximizing behavior.
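The claim that any platform outside the Pareto set is unanimously beaten by some platform inside it is easy to illustrate. A hypothetical sketch (not Kramer's construction): three voters with circular indifference curves, so the Pareto set is the triangle of their ideal points; a platform outside the triangle loses 3-0 to its projection onto the triangle.

```python
# Three voters who evaluate platforms by -squared-distance to their ideals.
IDEALS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # Pareto set = their convex hull

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def votes_for(challenger, incumbent):
    """Number of voters strictly preferring the challenger's platform."""
    return sum(sq_dist(challenger, i) < sq_dist(incumbent, i) for i in IDEALS)

incumbent = (6.0, 6.0)     # a platform outside the Pareto set
projection = (2.0, 2.0)    # its nearest point on the triangle's boundary

print(votes_for(projection, incumbent))   # 3: unanimously preferred
print(votes_for(incumbent, projection))   # 0
```

Inside the triangle no such unanimous improvement exists; a vote-maximizing challenger can still find majority (2-1) winners there, which is why the process cycles within the Pareto set rather than converging to a single point.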
Just as the Bowen-Bergstrom structure required empirical testing, so too must we test the validity of the Kramer structure. While there is evidence that elected officials are responsive to constituent preferences [Jackson (1974)], there is also ample evidence that non-voters - industry associations and other special interest groups - can also have a significant impact on policy selection. See Estey and Caves (1983), Kau, Keenan and Rubin (1982), Darden and Silberman (1976), Chappell (1982) and, most recently, Kalt and Zupan (1984). Non-voters can only influence policy if voters are imperfectly informed and/or politicians have objectives other than simply maximizing their election pluralities. Both alternative hypotheses seem plausible
80 Actually, Kramer's analysis gives more precise predictions in certain cases. The Kramer two-candidate, dynamic model drives outcomes into a subset of the Pareto set, called the minmax set. The minmax set is composed of all proposals having the property that the maximum number of voters wishing to move in any policy direction is a minimum. Proposals outside the Pareto set cannot be minmax points because there is always some proposal in the Pareto set which will be unanimously preferred to it. Kramer (1977, p. 321) gives examples of minmax sets which are significantly smaller than the Pareto set. However, whenever the policy issue is purely redistributive (e.g. the division of a fixed sum of money) and voters are selfish, the minmax set will correspond to the entire Pareto set [Kramer (1977, p. 330)].
and sufficient to drive final policy selection away from the Pareto set based on voter preferences. If, therefore, there is no strong evidence that MRPs have achieved allocatively efficient allocations of public goods, then how about preference intensity processes? Buchanan and Tullock (1962, p. 186) and Becker (1983) have argued that logrolling will succeed in finding Pareto-efficient packages of public goods and taxes, but both studies ignore the difficulties of finding cooperative allocations when vote-traders have incentives to conceal their true preferences through strategic behavior. Creating a pseudo-market in the legislature generally will not solve a market failure. The recent analysis of Weingast, Shepsle and Johnsen (1981) illustrates this important point. In their legislative model of logrolling, politicians are elected to represent local voter interests. If local voter welfare is defined to include not only direct project benefits but also the pecuniary benefits of project outlays within the district paid for by taxpayers living outside the district, then Weingast et al. show that each individual legislator will prefer a project size larger than the allocatively efficient size and, furthermore, that all legislators together have an incentive to approve an omnibus public works package which gives each legislator his favored project.81 The result is "pork-
81 If x is the scale of the public project in an individual district, b(x) is the aggregate benefits to members of the district, and c(x) is the total cost of the project, then efficiency requires marginal benefits to equal marginal costs: b'(x^E) = c'(x^E), where x^E is the efficient level of the project. Total project costs can be divided into production costs spent within the district, denoted c1(x), production costs spent outside the district, denoted c2(x), and non-expenditure resource costs (externalities, for example) within the district, denoted c3(x): c(x) = c1(x) + c2(x) + c3(x).
Benefits are limited to district residents without loss of generality. Politicians are assumed to receive political credit for all project benefits and for all public expenditures which occur within their district. They lose political credit for all tax payments by residents and for the external costs which residents bear from a public project. The politician's maximand is specified as

    P_j(·) = [b(x_j) + c1(x_j) + Σ_{i≠j} c2(x_i)] − [c3(x_j) + τ_j Σ_i (c1(x_i) + c2(x_i))],

where the first bracketed term measures political benefits and the second political costs, P_j(·) is the net reward to the elected representative from district j (re-election prospects, bribery income, etc.), b(x_j) is direct project benefits, c1(x_j) + Σ_{i≠j} c2(x_i) is all public expenditures in district j, c3(x_j) is external project costs (no spillovers across districts are assumed), τ_j is district j's share of total taxes, and Σ_i(c1(x_i) + c2(x_i)) is total budget costs for all projects across all districts. The politician's reward depends upon the difference between political benefits and political costs. In deciding on the level of public spending, the politician maximizes P_j(·). P_j(·) can be re-written as

    P_j(·) = {b(x_j) + c1(x_j) − c3(x_j) − τ_j(c1(x_j) + c2(x_j))} + Σ_{i≠j} [c2(x_i) − τ_j(c1(x_i) + c2(x_i))].
barrel" politics and inefficiently large government spending; see Ferejohn (1974), Plott (1968) and Mayhew (1974) for some recent evidence.82 Stigler (1971) and Peltzman (1976) have offered a variant of a vote-selling model to explain the apparent over-provision of government regulation; see Joskow and Noll (1981).83 Regulatory inefficiencies are the outcome of a game of redistribution in which the "owner" of the rule-setting process - either bureaucrat or legislator - can "sell" favorable regulations to the highest bidder.84 Such regulations create favored, monopoly positions within the private economy which produce allocative inefficiencies as an unwanted by-product. Recent papers by Krueger (1974), Brock and Magee (1978) and Bhagwati and Srinivasan (1980)

The expression in curly brackets depends only upon x_j, while the second term is a summation over all x_i's except x_j. The politician's preferred level of own district spending (which is independent of spending elsewhere) is therefore defined by the maximization of the expression in curly brackets over the argument x_j. The first-order condition is given by

    b'(x_j^N) + c1'(x_j^N) = τ_j(c1'(x_j^N) + c2'(x_j^N)) + c3'(x_j^N),

where x_j^N is the politically preferred level of district j's project. How does x_j^N compare to x_j^E? Note that, since τ_j < 1,

    c1'(x) + c2'(x) + c3'(x) > τ_j(c1'(x) + c2'(x)) + c3'(x) − c1'(x)  for all x,

or alternatively,

    b'(x_j^E) + c1'(x_j^E) > τ_j(c1'(x_j^E) + c2'(x_j^E)) + c3'(x_j^E),

while the first-order condition sets τ_j(c1'(x_j^N) + c2'(x_j^N)) + c3'(x_j^N) = b'(x_j^N) + c1'(x_j^N).
If we assume b''(x) < 0 and c''(x) > 0, then the efficient outcome, x_j^E, must be less than the politically preferred outcome, x_j^N: x_j^N > x_j^E. Because projects in district j do not interact with projects elsewhere to influence net benefits in j, each politician can set his or her preferred x_j^N without coordinating with politicians elsewhere. Each politician comes into the budget process with a preferred x_j^N, and the sum of those project levels over all districts sets the aggregate budget. The resulting equilibrium budget is a Nash equilibrium.
82 It is instructive to compare the experiences of Presidents Carter and Reagan in their separate attempts to control government spending. Carter attacked selected individual district projects as inefficient - primarily water resource projects - and made no progress at all in limiting "pork-barrel" spending. Reagan, on the other hand, gave Congress a reduced budget as a package and emphasized that everyone's projects were reduced to everyone's benefit. This is exactly the strategy required by the analysis of Weingast, Shepsle and Johnsen (1981, particularly pp. 656-658) to promote efficiency.
83 The recent book edited by Fromm (1981) in which the Joskow-Noll paper appears contains several of the better studies of the causes and consequences of economic regulation. Other recent studies on the consequences of regulation are Sloan's (1981) review and study of hospital cost inflation and Masson and DeBrock's (1980) study of milk price regulation.
84 For a critique of these models using the recent history of regulatory reform, see Weingast (1981) and Wilson (1980).
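Footnote 81's comparison of x_j^E and x_j^N can be illustrated with assumed functional forms (none of these numbers are from the chapter): benefits b(x) = 20x − x², in-district production cost c1(x) = x, exported production cost c2(x) = x, external cost c3(x) = 0, and tax share τ_j = 0.1.

```python
# Marginal benefit b'(x) = 20 - 2x; marginal costs c1' = c2' = 1, c3' = 0.
tau_j = 0.1   # district j's small share of nationally financed costs

# Efficient scale: b'(x) = c1' + c2' + c3', i.e. 20 - 2x = 2.
x_eff = (20.0 - 2.0) / 2.0                      # 9.0

# Politically preferred scale: b'(x) + c1' = tau_j*(c1' + c2') + c3',
# i.e. 20 - 2x + 1 = 0.2.
x_pol = (20.0 + 1.0 - tau_j * 2.0) / 2.0        # 10.4

print(x_pol > x_eff)    # True: the district over-builds, because in-district
                        # outlays count as political benefits and most of the
                        # tax bill is exported to other districts
```

Raising τ_j toward 1 (the district bears its full cost share) shrinks the gap between x_pol and x_eff, which is the comparative static behind the pork-barrel interpretation.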
have argued that a similar process is at work in setting quota or tariff barriers to international trade. Not only is the politically-created economic monopoly wasteful, but the process of logrolling or "rent-seeking" is itself inefficient. Since an explicit market in political favors is often illegal, favorable rulings must appear to have been openly considered by the policy process. Hearings must be held, studies completed, and rules written, all of which consume scarce time and resources.85 The sum of the economic losses from the misallocation of private resources and the resources consumed by rent-seeking behavior may be considerable; Krueger (1974), for example, has estimated the inefficiencies from rent-seeking in import licenses at 7 percent of GNP for India and as much as 15 percent of GNP for Turkey!86 While it is safe to conclude from this brief review of the efficiency performance of known MRPs and PIPs that democratic processes do not generally guarantee an efficient allocation of social resources, we cannot go the next step and conclude that collectively-decided allocations using MRPs or PIPs are inferior to individually-decided market allocations. At the moment we know too little about the economic performance of democratic policy choice to clearly favor either markets or democratic governments as the more efficient resource allocator.
3.4.4. The performance of democratic processes - equity
The second and, it is often argued, the most important function of government is setting the distribution of economic resources across members of society. From the perspective of those who argue the market process may generate an ethically unacceptable distribution of final goods and services, an extra-market institution called government, with the coercive powers to tax and transfer, is required to correct this market "failure"; see, for example, Rawls (1971). It is important to examine the distributional performance of governments. Indeed, such leading scholars as Stigler (1971) and Riker (1982) have argued that understanding the distributional performance of government is the central task of political economy. The essence of the redistribution activity is its zero-sum nature: what is given to one agent is necessarily taken from another. There are no net gains from the activity, only transfers. When this redistribution game is played amongst equally
85 But Bhagwati (1980) makes the familiar point that one distortion - lobbying - may be welfare improving in a world with other market distortions. It is a valid point and cautions us to do empirical analysis before judging our political institutions as always inefficient.
86 Bhagwati and Srinivasan (1980) emphasize that Krueger's estimates are an outer estimate of these inefficiencies, as she assumes rents are sought competitively, thereby drawing many resources into the rent-seeking process. If quotas are set as a bilateral bargaining game between the monopoly bureau and the to-be-favored private party, then resources consumed by rent-seeking itself will be much less. This is not to say that the resulting market inefficiencies will be insignificant, however.
powerful agents, it generally has no stable, or undominated, outcome.87 Whenever one group forms with sufficient power to take resources from another group, the weaker group has an incentive to bribe a few members of the stronger group to abandon that group, join with the weaker to make it strong, and then defeat the remaining members of the once strong group. Around it goes. Stability is achieved only through the outside imposition of rules of how groups may form and how groups can take and allocate resources. Understanding the distributional performance of governments therefore requires us to understand how these rules influence the outcome of the distribution game. Democratic solutions to the distribution game may take either of two forms: (1) as a majority rule solution to a non-cooperative game in which players do not communicate, collude, or form coalitions (MRP solutions), or (2) as a solution to a cooperative game in which coalitions, collusion and bargains are allowed (PIP solutions). The contemporary political economy literature contains examples of both. The use of a majority rule process (MRP) to describe distribution policy faces a serious, initial hurdle. If tax and transfer policy is multi-dimensional, then the MRP will not produce a stable policy outcome unless the very restrictive Plott-Kramer conditions on voter preferences - essentially, that preferences are identical - are satisfied. The redistribution game is hardly interesting if everyone agrees! Thus, if a simple MRP is to be used as a basis for an analysis with stable outcomes, the redistribution game must be reduced to a single-dimension policy problem.88 Romer (1975), Roberts (1977), and Meltzer and Richard (1981) - the major studies of redistribution through a MRP - do just that. Voters are assumed to have similar preferences over the primary goods of after-tax consumption (c) and work effort (e) but to differ in their basic ability (n) to earn income.
It will be this difference in ability or wages which generates the conflict over tax and transfer policies. The final, and crucial, restriction is to limit government tax-transfer policy to a linear income tax of the form: T(y) = ty - a, where y is earned income, t is the marginal tax rate, and a is a lump-sum subsidy (a > 0) or
87 More formally, we can say that there are no undominated allocations in zero-sum games, or that the "core" of the game - i.e. the set of undominated allocations - is empty. The proof of this statement is straightforward. Let the game involve N players, indexed by i = 1,..., n, who own total resources valued at v(N). For an allocation to be in the core it must satisfy three requirements: (1) no individual's allocation, x_i, is less than his worst performance on his own, v(i): x_i ≥ v(i); (2) the total of all individual allocations must equal the value of the grand coalition, v(N): Σ_N x_i = v(N); and (3) the value of the allocations to members of some coalition smaller than the grand coalition (S < N), denoted Σ_S x_i, must be at least as great as the worst that coalition could do on its own, v(S): Σ_S x_i ≥ v(S). Assume an allocation x (= x_1,..., x_n) is in the core. Then by condition (3) all members in S earn Σ_S x_i ≥ v(S) and those not in S earn Σ_{-S} x_i ≥ v(-S). Since S and -S together equal the grand coalition N, Σ_N x_i > 0. But this result violates requirement (2), which says Σ_N x_i = v(N) = 0 for zero-sum games. Thus, core requirements cannot be satisfied for zero-sum games; the core is empty.
88 Craig and Inman (1986) have recently analyzed a redistribution game in two dimensions - an income guarantee and a marginal tax rate - using a MRP-SE model.
Ch. 12: The "New" Political Economy
tax (a < 0). Earned income equals effort times ability - y = ne - while after-tax consumption equals earned income minus net taxes - c = (1 - t)y + a. Individuals behave in the labor market so as to maximize their utility function u(c, e), which can also be written as u{(1 - t)ne + a, e}. Households select work effort, e, to maximize u{·}. The resulting labor supply schedule is a function of the "net wage", (1 - t)n, and the lump-sum tax or subsidy, a: e = e[(1 - t)n, a]. Once e has been chosen, y and c are also determined. The linear tax-transfer schedule has two policy variables: the marginal tax rate (t) and the lump-sum subsidy (a). The government's budget constraint, which requires tax revenues to cover all (exogenously set) public good spending (G) and all lump-sum subsidies, reduces this two-dimensional policy problem in a and t to a decision in one dimension, t. Total net taxes raised over all N agents must equal total spending: Σ_N T(y) = G, or N(tȳ - a) = G, where ȳ is average before-tax income. The budget constraint can also be written as a = tȳ - G/N. If leisure is a normal good (the usual assumption) so that work effort and earned income decline as the lump-sum subsidy a rises, then for any value of t (given G) there will be a unique value of a which satisfies the government's budget constraint. This is shown in Figure 3.18, where ȳ(a) shows the fall in income as a rises, and where tȳ - G/N is the government's budget constraint for a chosen t. When
[Figure: the schedule ȳ(a) declines as a rises; its intersection with the budget line tȳ - G/N determines a, and t2 > t1 implies a2 > a1.]
Figure 3.18. Determination of lump-sum subsidy.
t = t1, the required a which balances the budget constraint is a1. When t rises to t2, we can offer a higher subsidy for each value of y and a rises to a2. When the tax-transfer schedule is linear (two parameters, t and a) and the government must balance its budget, the redistribution game is reduced to a single-dimension policy problem: the selection of t. How does a MRP select a winning tax rate, t*? Since the selection of a value of t defines a corresponding value of a, and t and a together determine each agent's preferred labor supply (e), earned income (y), and final consumption (c), each household's welfare will ultimately depend upon the chosen tax rate, t*. Each voter will have a preferred tax rate, t*, set as that rate which maximizes u_i{·}. If voters can be ordered by their preferred tax rates and if an individual's preferred rate is independent of the preferred rates of the other agents so there are no advantages to collusion, then the MRP can select a unique t*. Roberts (1977, p. 334) has made these requirements precise. The key assumption is called "hierarchical adherence" and requires that there exist an ordering of individuals such that pre-tax income is increasing along that ordering irrespective of which tax schedule is in operation. Meltzer and Richard (1981) show that if consumption (c) is a normal good, then in this simple model incomes can be ordered by worker productivity (n) and the assumption of hierarchical adherence will be satisfied; the more able workers earn more income no matter what value of t is finally chosen [see also Romer (1977)]. Hierarchical adherence seems to be a mild additional restriction in the context of this simple model. If hierarchical adherence holds, Roberts (1977, lemma 1) has shown that higher income individuals prefer lower tax rates and, hence, lower lump-sum subsidies. They favor less redistribution.
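The fixed point in Figure 3.18 can be computed directly. The linear response schedule ȳ(a) = y0 - b·a below is an assumed functional form (the parameters t, G/N, y0, b are hypothetical), chosen only so that the budget-balancing subsidy is easy to find by iteration and to check in closed form.

```python
# assumed parameters: tax rate, spending per head, and the income schedule
t, G_over_N = 0.3, 0.1
y0, b = 1.0, 0.5          # ybar(a) = y0 - b*a, with b > 0 (leisure normal)

a = 0.0
for _ in range(100):      # iterate a -> t*ybar(a) - G/N to the fixed point
    a = t * (y0 - b * a) - G_over_N

# closed form: a = t*(y0 - b*a) - G/N  =>  a* = (t*y0 - G/N)/(1 + t*b)
a_star = (t * y0 - G_over_N) / (1 + t * b)
print(round(a, 6), round(a_star, 6))
```

Because ȳ falls as a rises, the mapping is a contraction and the budget-balancing a is unique for each t, exactly the property the figure illustrates.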
Since individually preferred tax rates can be ordered along a single dimension, the collectively preferred tax rate (t*) under MRP will be the median tax rate (t_m*) preferred by that voter with the median pre-tax income or productive ability. Voters with economic abilities less than the median will prefer higher rates and subsidies, while the more economically able will prefer lower rates and lower subsidies. This general analysis allows us to make only weak predictions about t*, however - namely 1 > t* = t_m*. We know t* will be less than 1 since (i) the least able individual prefers to make a, the subsidy, as large as possible; (ii) a will be zero if the chosen rate is 1 or larger, since no one works when t ≥ 1; and (iii) the more able median voter prefers a lower rate than the least able person, who clearly prefers a rate less than 1. Additional structure gives us additional predictions. If we further assume that the distribution of abilities and income is positively skewed so that average income (ȳ) exceeds median income (y_m), then the median voter will always prefer a positive marginal tax, t* > 0. There is always some advantage in taxing income as long as your income is not higher than average.89

89 The proof of this result follows an analysis of the median voter's preferred tax rate. A typical individual chooses effort (e) so as to maximize the preference relationship u{(1 - t)ne + a, e}. The
Unfortunately, while we can predict 1 > t* = t_m* > 0, we cannot say anything about the value of the lump-sum tax or subsidy (a) without knowing the value of non-transfer government spending (G). And knowledge of a is crucial for judging the progressivity of a society's tax-transfer system. A progressive system has marginal tax rates which exceed the average tax rate, so the average tax rate rises as income rises, i.e. the rich pay proportionally more. A regressive system has marginal rates below the average tax rate, so the average rate falls as income rises. A proportional tax system has a constant average (equal to marginal) tax rate. In this model the average tax rate equals T(y)/y = t - a/y, which will be less than (greater than) the marginal tax rate (∂T(y)/∂y = t) as a > 0 (a < 0). A progressive tax-transfer system has a > 0. If G = 0, then as 1 > t* > 0 we know from the government's budget constraint (a* = t*ȳ - G/N) that a* > 0. However, if G > 0, we can no longer predict the sign of a* a priori. Once again, further structure is required. Romer (1975) and Meltzer and Richard (1981) have examined changes in the MRP preferred t* and a* under alternative, more fully specified, models of the economy. They conclude that as leisure time becomes relatively more important in voter preferences - or, alternatively, as the elasticity of labor supply
first-order conditions for a maximum are given by

(1 - t)n·u_c + u_e ≤ 0,   (i)

where u_c = ∂u/∂c and u_e = ∂u/∂e, and the condition holds as an equality when e > 0. Small changes in the tax schedule, denoted by dt and da, will affect household utility as

du = u_c[da - ne·dt] + de·[(1 - t)n·u_c + u_e],   (ii)

which reduces to

du = u_c[da - ne·dt], or du/dt = u_c[da/dt - ne],   (iii)

when households respond to the changes in a and t so as to maximize utility [à la (i)]. Any changes in a and t must satisfy the government's budget constraint, however, where a = tȳ - G/N. Thus,

da/dt = ȳ + t(dȳ/dt).   (iv)

Substituting (iv) into (iii) for the median ability voter (y_m = n_m·e_m) gives:

du_m/dt = u_c[ȳ + t(dȳ/dt) - y_m].   (v)

Since the median voter sets t_m* = t*, we know du_m/dt = 0, or (ȳ - y_m) + t(dȳ/dt) = 0. As (ȳ - y_m) > 0 and as (dȳ/dt) < 0 (see Figure 3.20), t* must be positive. If (ȳ - y_m) < 0, then t* must be negative.
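The one-dimensional median-voter game can be sketched numerically. The functional forms below are assumed for illustration (they are not from the chapter): quasi-linear utility u = c - e^(1+1/eps)/(1+1/eps) with c = (1-t)ne + a, which yields the labor supply e = ((1-t)n)^eps, a skewed ability distribution, and G = 0.

```python
eps = 0.5                              # assumed labor supply elasticity
abilities = [0.5, 0.8, 1.0, 1.3, 3.0]  # skewed: mean income > median income

def subsidy(t):
    """Budget-balancing lump-sum subsidy a = t*ybar (G = 0 here)."""
    incomes = [n * ((1 - t) * n) ** eps for n in abilities]
    return t * sum(incomes) / len(incomes)

def utility(t, n):
    """Indirect utility of an ability-n voter at tax rate t."""
    e = ((1 - t) * n) ** eps
    c = (1 - t) * n * e + subsidy(t)
    return c - e ** (1 + 1 / eps) / (1 + 1 / eps)

# median voter's preferred rate by grid search
n_med = sorted(abilities)[len(abilities) // 2]
ts = [i / 1000 for i in range(990)]
t_star = max(ts, key=lambda t: utility(t, n_med))

# closed form from the condition (ybar - ym) + t*(dybar/dt) = 0: with
# y = n**(1+eps)*(1-t)**eps this gives t* = (A - M)/(A - M + eps*A),
# where A = mean(n**(1+eps)) and M = the median voter's n**(1+eps)
A = sum(n ** (1 + eps) for n in abilities) / len(abilities)
M = n_med ** (1 + eps)
t_closed = (A - M) / (A - M + eps * A)
print(round(t_star, 3), round(t_closed, 3))
```

The grid search and the first-order condition agree, and t* > 0 follows from the positive skew of the assumed ability distribution, as in the text.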
increases - the marginal tax rate t* and the level of the subsidy a* decline. Furthermore, when non-transfer government expenditures as a fraction of ȳ are less than 10 percent, the preferred tax structure is generally progressive (a* > 0). However, as G approaches 25 percent of ȳ the MRP preferred a* is negative and the tax-transfer system becomes regressive. The logic behind these results is straightforward. Large government spending for services means that t* must be very high (generally > 0.4 in Romer's simulations) before a becomes positive. At high marginal tax rates we suffer large losses in average income as workers reduce labor effort (see Figure 3.18). These efficiency losses dominate the median voter's desire for redistribution. The final value of t* is too small to both pay for G and offer positive subsidies: thus, a* < 0. These results highlight an important tension in democratic societies between the twin goals of efficiency and equity. As government spending rises to correct market failures, the democratically chosen level of lump-sum subsidies declines and may become zero or negative. Seater (1983) finds evidence for this trade-off in the historical U.S. data. The present struggle in the U.S. Congress between a desire to increase defense spending and the political necessity to simultaneously reduce transfers is a concrete example of this tension. These simple MRP models have made clear the potentially important role that agents' abilities and preferences play in setting a government's tax and transfer policies. Despite this important contribution, however, the MRP models are extremely limited in what they can tell us about redistribution. To ensure analytically stable outcomes, we have been forced to reduce our discussion of distribution policy to one dimension. In fact, in most societies redistribution policies are multi-dimensional.
Nations assist their poor who are elderly, sick, permanently disabled, unemployed, homeless, or fatherless, and each group is helped at a different level of subsidy and taxed at a different rate on earned income. If we are to capture the essential features of such policies we must move beyond one policy dimension to an analysis of multi-dimensional policies. Recent cooperative bargaining models of the redistribution game give us some results. Aumann and Kurz (1977, 1978) and Gardner (1981) have applied a new approach to the income redistribution game based upon the theory of n-person cooperative games. The implicit political process is a preference intensity process (PIP) where agents are allowed to communicate, to make side-payments, and to form coalitions. In their analysis the n-person game is first reduced to a two-party bargaining game between a winning or "taking" coalition (denoted as S) and a losing or "paying" coalition (-S). The two coalitions then resolve their differences in the redistribution game according to Nash's (1953) bargaining theory of two-player cooperative games. What makes the two-coalition game interesting is the fact that the "paying" coalition can destroy some or all of its initial resource endowments. It is this threat which induces the "taking" coalition to bargain and not just "take". Nash's theory predicts a unique outcome to the redistribution
game.90 To apply Nash's bargaining theory, however, the analysis must first specify the value to the two coalitions and their members of playing the redistribution game. The value concept chosen is the Shapley value, based on the work of Shapley (1953) and applied in numerous studies of political power [see Riker and Ordeshook (1973, pp. 154-175) and Nagel (1975)]. The Shapley value measures the value to an individual agent as the expectation of his or her contribution to the worth of the agents preceding them in a random order of all the players in a coalition.91 The Aumann-Kurz analysis proves that the resulting solution to the redistribution game is that allocation of social resources which ("as if") maximizes the Shapley-Nash value of the game for all agents in society.92 The key assumptions which define the structure of the Aumann-Kurz redistribution game are: (i) majority rule determines redistribution policy - any coalition with more than half the agents controls taxes and transfers; (ii) agents are individually unimportant and can influence policy only through coalitions; (iii)
90 Nash's (1953) model of two-party bargaining with threats specifies that an acceptable solution must be (i) feasible, (ii) Pareto optimal, (iii) invariant with respect to the choice of utility scaling, (iv) independent of the names of the players, and (v) independent of irrelevant alternatives. Furthermore, (vi) when a player's choice of threat strategies is restricted while his opponent's choices remain unchanged, the restricted player cannot obtain a higher utility than he achieved when playing with the unrestricted set. Finally, (vii) the value to a player of his available threats should not be dependent upon the collective, or sequentially reinforcing, property of threats as he plays with his opponent. Nash then shows that the only solution to the bargaining game which satisfies these seven axioms is one which divides the pay-offs equally beyond the players' chosen threat points.
91 The Shapley value splits the value of a coalition amongst the individual members of the coalition according to
φ_i(v) = Σ_S γ_n(S)[v(S) - v(S - {i})],

where φ_i(v) is the Shapley value for player i, v(S) is the aggregate value of coalition S with i as a member, while v(S - {i}) is the value of coalition S when i is not a member. Thus the term in square brackets measures player i's marginal contribution from joining coalition S. The term γ_n(S) is the probability that player i will in fact be the marginal (last) member to join coalition S when there are n players in the game: γ_n(S) = (s - 1)!(n - s)!/n!. Thus, the expression φ_i(v) is the expected value of the marginal contributions of player i to all coalitions of which he is a member. Shapley (1953) proves that the value φ_i(v) is the only value expression which satisfies the three requirements that (i) the value of a player should not depend on the label used to describe the player; (ii) the individual values shall partition the value of the whole game [Σ_i φ_i(v) = v(T)]; and (iii) the value of a game composed of two games shall be the sum of the separate values in the two component games.
92 There is also reason to believe that a stable, winning coalition in these zero-sum redistribution games will be of the minimum size necessary to remain a winning coalition; see Riker (1962), Riker and Ordeshook (1973, ch. 7) and Shepsle (1974).
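The weighting formula in footnote 91 is equivalent to averaging a player's marginal contribution over all orders in which the players might join, which is easy to check numerically. The three-player majority game below is a toy characteristic function assumed for illustration (it is not from the chapter).

```python
from itertools import permutations
from math import factorial

players = [1, 2, 3]

def v(S):
    """Characteristic function: any majority (2+ players) is worth 1."""
    return 1.0 if len(S) >= 2 else 0.0

def shapley(i):
    """phi_i(v): i's marginal contribution v(S) - v(S - {i}), averaged
    over all join orders - the same weighting as (s-1)!(n-s)!/n!."""
    total = 0.0
    for order in permutations(players):
        before = set(order[:order.index(i)])
        total += v(before | {i}) - v(before)
    return total / factorial(len(players))

phis = [shapley(i) for i in players]
print(phis)  # symmetric players each get 1/3, and the values sum to v(T) = 1
```

The output respects Shapley's axioms quoted in the footnote: symmetric players receive equal values, and the individual values partition v(T).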
agents can, if they wish, destroy part or all of their initial resource endowments (denoted e); (iv) agents are assumed to have von Neumann-Morgenstern utility for final resources (x), u(x), with the properties u(x) > 0, u''(x) < 0, and ∞ > u'(x) > 0 for x > 0, but u(0) = 0; and (v) total social resources are positive but finite and equal the sum of all the initial endowments of the many infinitesimally small agents (where dt denotes one small agent and T the sum of all agents): ∞ > ∫_T e(t)dt > 0. The game is played in two steps. First, agents maneuver to form majority (S) and minority coalitions (-S = T - S). Only two coalitions are likely in such games since a majority coalition is needed for redistribution, and the minority coalition will not subdivide since anything a smaller coalition can achieve can also be reached by the larger minority coalition, and the larger minority coalition will often be able to do more. Once the coalitions have formed, they play a two-party cooperative bargaining game in which threats are allowed. The majority threatens to take all of the minority's endowments, while the minority threatens to destroy some or all of that endowment. The two coalitions adopt strategically preferred threat points which guarantee the winning coalition at least f*(S) in aggregate utility and the losing coalition at least g*(-S) in aggregate utility, where f*(S) > g*(-S). The Nash (1953) solution to a bargaining game with threats yields a compromise in which the aggregate utility which remains after the threat points are guaranteed is divided evenly between the two coalitions. The Shapley value is then computed for each player in each coalition to determine the individual agent's share of his coalition's aggregate utility. The major conceptual difficulty in the analysis is defining utility values for coalitions in such a way that the aggregate utility can be redistributed amongst the players. Only dollars, not utility, can be re-allocated.
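With transferable utility, the Nash (1953) solution with threats described above reduces to a simple rule: each coalition receives its threat point plus half of the surplus beyond the two threat points. The numbers below are assumed for illustration only.

```python
def nash_split(total, f_threat, g_threat):
    """Each coalition gets its guaranteed threat value plus an equal share
    of whatever aggregate utility remains."""
    surplus = total - f_threat - g_threat
    return f_threat + surplus / 2, g_threat + surplus / 2

# hypothetical values: the majority S can guarantee f*(S) = 5; the minority
# -S guarantees g*(-S) = 1 by threatening to destroy endowments; 10 at stake
pay_s, pay_minus_s = nash_split(10.0, 5.0, 1.0)
print(pay_s, pay_minus_s)  # 7.0 3.0
```

The minority's destruction threat raises g*(-S) and hence its final payoff, which is why the "taking" coalition bargains rather than simply takes.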
Following re-allocations, it may be that the sum of the individual values exceeds the total value allocated to the coalition. The "adding-up" requirement is satisfied if each individual's utility is weighted by λ(t) = 1/u'(x(t)); then utility is additively transferable through transfers of income.93 The result of this income

93 See Aumann and Kurz (1977, pp. 1144-1145). The intuition behind this requirement is as follows. If the notion of aggregate value for a coalition is to have meaning for any coalition it must be valid for a coalition of all players as well. The aggregate utility for the all-player coalition is given by

v_λ(T) = ∫_T λ(t)u_t[x(t)]dt,

where λ(t) is the "weight" on player t's utility which allows us to sum utilities. The maximum value of v_λ(T) is defined by a value allocation x(t), where the weighted marginal utilities are equal across all agents. Thus, λ(t)u'_t[x(t)] = k, where k is some constant. With no loss of generality, we can set k = 1, which implies λ(t) = 1/u'_t[x(t)]. The weighted value lost or gained with the transfer of dollars is therefore λ(t)u'_t[x(t)]dx = 1·(dx), where dx is the change in income. Transferring dollars is therefore equivalent to transferring utility, when λ(t) = 1/u'_t[x(t)].
redistribution game is a final, unique, value allocation which satisfies the formula [Aumann and Kurz (1977, p. 1145)]:

x(t) = e(t) - {λ(t)u_t[x(t)] - V̄},

where x(t) is the income finally received by player t, e(t) is the player's initial endowment, and the expression in curly brackets is player t's tax (if positive) or transfer (if negative). V̄ is a constant and equals the average value of the weighted utilities from the chosen allocation: V̄ = ∫_T λ(t)u_t[x(t)]dt. Agents whose weighted utility value of money - V_t = λ(t)u_t[x(t)] - is greater than average (V̄) pay a tax. Taxpayers will be agents whose total utility is large and whose marginal utility, u'_t[x(t)], is small - that is, the richer people in society. Agents who receive a subsidy have a utility value of income (V_t) less than V̄ - that is, the poorer people in society. Redistribution is pro-poor in the Aumann-Kurz model. What is most striking about the Aumann-Kurz result is the high marginal tax rate implied by the final value allocation. The rate exceeds 0.5 for all players. Total taxes are defined by e(t) - x(t). The marginal tax rate is therefore given by94
d[e(t) - x(t)]/de(t) = m(t) = 1 - 1/[2 + (-u''_t/u'_t)/(u'_t/u_t)],

from which we must conclude, since u''_t < 0, that 1 > m(t) > 0.5. The definition

94 Calculation of the marginal tax rate follows from

d[e(t) - x(t)]/de(t) = 1 - dx(t)/de(t),   (i)

where dx(t)/de(t) is evaluated at the equilibrium value allocation, defined by

x(t) = e(t) - u_t[x(t)]/u'_t[x(t)] + V̄.   (ii)

As V̄ is a constant:

dx(t) = de(t) - dx(t) + {u_t[x(t)]u''_t[x(t)]/(u'_t[x(t)])²}dx(t),   (iii)

which implies:

dx(t)/de(t) = 1/[2 - u_t[x(t)]u''_t[x(t)]/(u'_t[x(t)])²].   (iv)

Substituting (iv) into (i) gives the definition of marginal tax rates in the text.
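The bound 1 > m(t) > 0.5 is easy to verify for a concrete family of utilities. The iso-elastic family u(x) = x^g with 0 < g < 1 below is assumed only for illustration; it satisfies the properties u > 0, u' > 0, u'' < 0, u(0) = 0 required in assumption (iv).

```python
def m_rate(g, x):
    """Aumann-Kurz marginal tax rate m = 1 - 1/(2 + delta), where
    delta = (-u''/u')/(u'/u) is risk aversion divided by "boldness"."""
    u = x ** g
    u1 = g * x ** (g - 1)             # u'(x)
    u2 = g * (g - 1) * x ** (g - 2)   # u''(x) < 0 for 0 < g < 1
    delta = (-u2 / u1) / (u1 / u)
    return 1 - 1 / (2 + delta)

# for this family delta = (1 - g)/g, so m = 1/(1 + g), independent of x
rates = [m_rate(g, x) for g in (0.2, 0.5, 0.9) for x in (0.5, 1.0, 4.0)]
print([round(r, 4) for r in rates])  # every rate lies strictly in (0.5, 1)
```

Less curvature (g near 1) means a bolder, less risk-averse player and a marginal rate approaching 0.5 from above, matching the discussion of δ_t in the text.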
of m(t) contains a further insight. The marginal tax rate rises as the value of the term (-u''_t/u'_t)/(u'_t/u_t) (= δ_t) in m(t) rises. The numerator of δ_t is the Arrow-Pratt measure of absolute risk aversion, while Aumann and Kurz (1977, p. 114) label the denominator as player t's "boldness" in risking total ruin. Players who are averse to risk even on small bets and who are less "bold" when faced with total ruin will have higher values of δ_t and therefore larger values of m(t). The aggressive agent, one willing to risk everything, does better in the Aumann-Kurz redistribution game than one who plays timidly. This result marks an important extension over the non-cooperative, median voter results reviewed above. Not only is redistribution tied to differences in agents' incomes (or productive abilities) but now it is also related to differences in voter preferences and/or skills in playing the game. As a descriptive model of real tax and transfer policies, however, the Aumann-Kurz framework fails; few nations have marginal tax rates which exceed 0.5 for all players. While marginal rates greater than 0.5 are observed in many welfare programs [see Hausman (1981)], most taxpayers are taxed at significantly lower rates. Aumann and Kurz (1977, p. 1559) conjecture that if voting power in their model is related to the initial wealth of the agent, then marginal tax rates tend to fall. In the extreme case where votes are distributed in direct proportion to initial wealth, there are no taxes or transfers at all. These results suggest the need for another dimension to the model of the income distribution game: the need to specify correctly the initial distribution of political power. Gardner (1981) has extended the Aumann-Kurz analysis in just this direction. He introduces two new political elements which move the analysis away from Aumann-Kurz's simple majority rule model. First, Gardner postulates the existence of an agent with veto power over all redistributive schemes.
This veto agent cannot ensure his preferred outcome (i.e. is not a dictator) but he must be included in any winning coalition. Second, Gardner specifies that the winning coalition must attract not only the veto agent(s) but also some variable fraction (a) of the remaining agents. If we normalize the sum of all agents to be μ(T) = 2 and the sum of the veto agent(s) to be μ(0) = 1 (0 = veto agent), then a coalition S is winning if it includes enough members so that μ(S) > 1 + a, is indecisive if 1 + a > μ(S) > 1 - a, and is losing if 1 - a > μ(S). As a → 0, the veto agent becomes closer and closer to being a dictator. As a → 1, the social choice process tends to unanimity. Gardner's analysis follows Aumann-Kurz from this point except that Gardner, having generalized the political structure, must place an additional restriction on agent preferences if he is to get results. The restriction is not trivial; the veto agent and some subset of other voters must be risk neutral (i.e. u_t[x(t)] = x(t) for t = 0 and some subset t ∈ S'). In terms of our earlier discussion of the Aumann-Kurz results, we require the veto agent and some other voters to have no fear of ruin (δ_t → 0) when playing the redistribution game.
This is not an implausible restriction on the veto agent as nothing happens without his approval; whether other risk-neutral agents exist is an empirical question best tested by examining the plausibility of Gardner's results. Gardner (1981, pp. 360-362) shows that the equilibrium value allocation of this re-specified redistribution game with the power parameter a is given by

x(0) = e(0) + [...]/(1 + 2a - a²)

for the veto agent, and by

x(t) = e(t) - (1 - a²){λ(t)u_t[x(t)] - [...]V̄}/(1 + 2a - a²)
for the remaining members of society. Endowments are again measured as e(t) and V̄ is again the average value of the weighted utilities from the allocation. Note, first, that the veto agent is always a net recipient: x(0) > e(0). But as in Aumann and Kurz, poor agents who do not have veto power may also receive a net subsidy if the term within the curly brackets is negative. Second, as a → 1 and the political process tends to unanimity, x(0) → e(0) and x(t) → e(t). Not surprisingly, redistribution disappears as we require unanimous consent. Third, as a → 0 and we move to a dictatorship by the veto agent, the veto agent receives his or her maximum transfer and the remaining agents pay a tax almost equal to the utility value of their final allocation [λ(t)u_t(x(t))]. Redistribution is maximized in favor of the ruling coalition in this case. Finally, as with Aumann and Kurz, we can calculate the schedule of marginal tax rates for the agents outside the veto group:

d[e(t) - x(t)]/de(t) = m(t) = 1 - (1 + 2a - a²)/[(1 + 2a - a²) + (1 - a²)(1 + (-u''_t/u'_t)/(u'_t/u_t))].
Now m(t) can be less than 0.5 depending on the value of a. Specifically, as a → 1, m(t) → 0. Conversely, as a → 0, the marginal tax rates tend to the Aumann-Kurz values. As the political power of the agents outside the veto group increases (a → 1), those agents are better able to withstand the redistributive pressure from the veto group itself. The work of Aumann, Kurz, and Gardner on the income redistribution game is promising. Of course, one can be critical. Extending Gardner's analysis to allow all agents to be risk averse would be an important extension. There are important difficulties with the Shapley-Nash solution concept itself, but what
one might propose for an alternative is not obvious.95 Exactly how coalitions form is never specified, yet this is a central issue for the cooperative game approach to collective choice [Kramer (1972)]. Finally, and perhaps most importantly, both studies assume when specifying the final value allocation that redistribution involves no waste of social resources; total resources after redistribution equal total endowments before redistribution: ∫_T x(t)dt = ∫_T e(t)dt. While players threaten to destroy endowments when the game is played, the final contract is such that no resources are in fact sacrificed. In effect, all preferences and abilities are assumed to be revealed truthfully during the game so that a first-best tax and transfer policy can be specified and enforced. This is a very strong assumption. The extension of these PIP models to accommodate the incentive effects of taxation remains to be done.96 Conceptual and empirical analyses as to why societies redistribute resources have just begun. While the early work is promising, we are far from a compelling model of collective resource redistribution. The effects of agent preferences, abilities, and, most importantly, political institutions must be considered. Until such analysis is completed we must reserve final judgement as to the redistributive performance of alternative collective choice and market institutions.
3.5. Summary

The failure of market institutions to achieve an economically efficient or fair allocation of social resources has raised the possibility that a government might somehow do it better. Voluntary collective organizations where individuals are free to stay or go were seen to suffer from the same fundamental problems which plagued markets (Section 3.1). Individuals will not cooperate unless it is in their private interest to do so. A coercive government with the power to tax and spend is necessary if we are to achieve cooperative allocations. Yet we do not wish our

95 A review and critique of the Nash bargaining theory is found in Luce and Raiffa (1957, pp. 128-134, 140-142) or Cross (1969). A review and critique of the Shapley value can be found in Luce and Raiffa (1957, pp. 245-252) or Riker and Ordeshook (1973, pp. 154-163). Alternative solution concepts for n-person, cooperative games are reviewed in Shubik (1982, pp. 336-367).
96 A beginning on this question can be found in Ordover and Schotter (1981). Their analysis does not address transfer policy but simply concentrates on the determination of tax rates. They show that in a model where the politician is effectively a monopoly agenda-setter and seeks to maximize his side-payments ("campaign contributions"), the offered tax structure will be the "best" tax structure available after allowing for the disincentives of taxation. The voters and the agenda-setter write an implicit contract - enforced by reputation - in which the politician promises the "best" tax structure in exchange for the dollar value to the voters of this "best" structure over the dollar value of the structure which an opponent might propose (in effect, the reversion level). The opposing proposal may be the status quo. If political competition is allowed, however, no voting equilibrium is generally possible and the "best" tax structure is no longer a stable outcome.
governments to be oppressive. Nor do we want collective decision-making to be too administratively cumbersome and expensive. Can we find, therefore, a government which respects individual preferences for collective activities, is administratively efficient, and, finally, is capable of finding an economically efficient allocation of social resources? The answer, as shown by Arrow, is no (Section 3.2). We therefore stand at a crossroads, and we must choose. One route takes us to dictatorial, often unfair, but potentially efficient, allocation processes (Section 3.3). The other roads lead to democratic, potentially more equitable, but generally inefficient mechanisms (Section 3.4). The decision as to how to organize the social allocation of social resources is a difficult one; there are no clear answers. Setting a preferred mix of market and governmental - voluntary and coercive - institutions requires a delicate balancing of pluses and minuses, for neither institutional form clearly dominates the other. Before beginning the task of weighing the relative virtues and vices of markets and governments, however, we need to examine some recent attempts to improve the efficiency performance of governments.
4. Public policies for the public sector

The very principle of constitutional government requires it to be assumed that political power will be abused to promote the particular interests of the holder; ... such is the natural tendency of things, to guard against which is the especial use of free institutions (J.S. Mill, Considerations on Representative Government, 1861).

Accepting the view that government performance can be analyzed systematically and that successful predictions as to future public choices can be made raises the next questions: Can we alter that performance and, if so, how? To political economists, governments are self-contained organizations making choices and allocating resources. Governments have their own structure and their own dynamics; one cannot simply tell a government to change. To alter performance, we must alter the structure of government. The central elements in that structure of public choice are the underlying preferences and endowments of the participating agents (voters, bureaucrats, elected officials), the technologies of service production, and the rules (including the rules to change the rules) of government decision-making. If tastes and endowments of agents are assumed fixed (the usual assumption), then the only structural elements which can be changed are production technologies and the rules of choice. It is not unfair, I think, to characterize the task of expanding the set of feasible outcomes, given choice rules, as the central mission of the policy economist/analyst. The task of the political economist/analyst, on the other hand, is to understand the effects of, and to
R.P. Inman
recommend changes in, the rules of public choice given the production technology. To do so we must understand how public organizations behave. That has been the central aim of the studies reviewed in Section 3 above. We must also think about taking the next step: changing the rules which govern public sector behavior. Section 4.1 discusses how we might address the problem of recommending changes in the rules of public choice. Section 4.2 reviews some recent studies which have proposed changes in the rules of fiscal choice to improve the efficiency performance of governments. Section 4.3 offers concluding remarks.
4.1. On designing policies for government reform

The design and implementation of public policies for public organizations must move through each of three steps: (1) the evaluation of the current performance of the organization against a set of normatively compelling criteria; (2) the behavioral description of the present public organization which allows us to predict organization performance and propose a remedy; and (3) the creation of a legitimate, extra-organization institution which can enforce a change in the public organization's environment and thereby alter its performance. The evaluation of government performance must proceed within the same normative system which leads us to propose its existence in the first place. For example, within the framework of utilitarianism, governments as coercive organizations are desired because they permit transactions which increase the aggregate well-being of society as reflected in a social welfare function of the general form, W{U_1,..., U_N}. Such a normative structure allows us to justify, and criticize, government performance on grounds of economic efficiency or distributive justice. Normative systems as different as those of Bentham or Rawls can be included within this framework; see Arrow (1973) or Alexander (1974). Alternatively, one might well deny the ethical validity of such "end-state", utilitarian evaluations of political performance and concentrate instead upon the "process" of collective actions. Nozick (1974) and Buchanan (1975) develop such normative positions giving primary moral emphasis to notions of liberty. The social contract, including the appropriate role of government, is based upon the central notion that individuals should be allowed to develop their initial endowments to their fullest potential. Any activity, institution, or social process which discourages or prevents that development is to be prohibited.
It is not necessary here to argue for or against any particular evaluative structure - utilitarian, libertarian, or an alternative - but only to understand that such structures have been proposed and that government performance can, and should, be judged against such criteria. Actual governments can be inefficient, unfair, or in violation of some basic human right. What if the performance of government is found wanting? The next task is to find out why government has "failed" and to seek a remedy. At this step, we turn
Ch. 12: The "New" Political Economy
to the positive analyses of government behavior as reviewed in Section 3. Particularly important in the context of policy design is to understand how the technological and institutional environments influence the behavior of the government organization. Technology defines the aggregate volume of resources potentially available for government allocation as well as the marginal costs (including any excess burden costs) associated with turning private dollars into public activities. The institutional environment is the rules - the constitution - under which government allocations are decided. Included here are the voting rules which determine how choices will be made as well as any prohibitions against, or requirements to perform, explicit activities. Changes in the technological environment through grants-in-aid (or foreign aid) from one, "higher" level of government to another, "lower" level of government will influence the tax and spending decisions of the lower levels.97 New production technologies or management techniques will alter the relative prices of public activities and thereby influence government allocations.98 The institutional environment can also be changed. Voting rules can be altered, or governments may be prohibited from using certain taxes, from exceeding a spending limit, or from enforcing laws which discriminate by age, sex, race, or religious preference. Having (i) discovered a government "failure", and (ii) knowing the effects of changes in the technological or institutional environments on government performance, we can then (iii) alter those environments and possibly correct or reduce the observed failure. Crucial to this view of policy, however, is the assumption that we have a legitimate - i.e. accepted - extra-governmental institution through which these technological and institutional reforms can be effected.
In a democratic society, two such extra-government institutions suggest themselves: the courts and citizen-initiated constitutional amendments. In most Western democracies the judicial system stands, at least in constitutional theory, as a body largely independent of the legislative and the administrative-executive branches of government. Whether this independence holds in fact is a disputed question. Citing as evidence the recent advances by the U.S. courts into social policy-making, Shapiro (1964), and more recently Horowitz (1977), have argued that the judiciary does provide an independent review of legislative activity. Law professor Alexander Bickel (1970, p. 134) wrote: "All too many federal judges have been induced to view themselves as holding roving commissions as problem-solvers, and as charged with a duty to act when majoritarian institutions do not." In these analyses the courts are seen to respond to a potentially different set of criteria than their legislative counterparts when evaluating and, in many instances, requiring policy initiatives. At least recent U.S. legal history suggests an emphasis on equity over the goal of efficiency
97 See, for example, Inman (1978) and, for a general review of the grants-in-aid literature, Inman (1979).
98 For example, Baumol (1967b) argues that technological improvements in the provision of private goods will eventually bankrupt the less technologically advanced public sector.
[Shapiro (1964), Horowitz (1977), Inman and Rubinfeld (1979)]. The fact that legal counsel for welfare and civil rights groups and for environmental and consumer groups prefer the judicial to the legislative route to reform is evidence on this point [Brill (1973)]. What is lacking in all of these studies, however, is a compelling institutional theory of how judicial decisions are made. Without such a theory one could well assert the alternative view and claim the courts do not act as an independent check on legislative performance. Landes and Posner (1975) take just such a position when they argue that the judiciary enforces rather than challenges the decisions of the legislature. The courts' enforcement of past legislative decisions is just what is required to give legislation the stability needed to ensure its passage in the first place. In the legislative market place only stable laws enforced by the courts promise sufficient long-term value to the various special interest groups to make the fight for passage worthwhile. In this role, the judiciary must be free of the wishes of the current legislature, which may wish to overturn those laws, yet subservient to the will of past legislatures. The use of lifetime appointments for judges, appointments generally made by prior legislatures, is seen by Landes and Posner as one means to ensure this historical obligation. Whether in fact the courts play a role independent of the legislative process therefore remains an open issue.99 For those who do observe an independent judiciary there are the further questions: How well do the courts perform their review of legislative activities, and do we want them to perform that function at all?100 Horowitz (1977, p. 298) for one is skeptical: "Over the long run, augmenting judicial capacity (for review) may erode the distinctive contribution the courts make to the social order.
The danger is that courts, in developing the capacity to improve on the work of (legislative or administrative) institutions, may become altogether too much like them." If not the judicial system as an overseer of legislative action, then what? Citizen-initiated referenda have emerged in the United States as the most prominent alternative. The states of California, Massachusetts, and Michigan have all passed tax or spending limitations to restrict state budget growth. A constitutional amendment to require a balanced federal budget has also been discussed. Mueller (1973) has argued for just such a review process of legislative action. The legislature is viewed as responding to short-run vested interests, while

99 The empirical evidence of Landes and Posner (1975, appendix) on this question is not conclusive. Looking only at judicial judgments as to the constitutionality of a legislative act, Landes and Posner find that longer tenured judges are more likely to nullify old laws (contrary to expectations) and that older judges (holding judicial tenure fixed) are more likely to nullify more laws (consistent with expectations). The number of nullifications is very few, however; only 97 of the 38,000 laws passed by Congress from 1789 to 1972 have been overturned by the courts. What is needed is a careful look at how the courts monitor administrative agencies - how laws are enforced - for it is here that the judiciary is most often influential. See, for example, MacIntyre (1983).
100 For an analysis of the logic and consequences of judicially-imposed reforms on government fiscal choice, see Inman and Rubinfeld (1979).
citizens voting at the "constitutional stage" are assumed to consider the long-run implications of choices which may have no direct bearing on their own lives. The existing legislature responds to individual gains and losses, given present circumstances and tastes. Citizens and their representatives vote in their self-interest. Constitutional restrictions, however, are institutional reforms which will influence current and future legislative choices. Voters must necessarily take a long-term view of such decisions, but in the long run the position of each citizen is uncertain. Mueller argues (1973, p. 62) that in the face of such uncertainty voters may well adopt as a working hypothesis that they or their heirs can well be anyone: rich or poor, liberal or conservative. Individuals may then make their constitutional choices from a wider perspective than their own current self-interests - for example, by maximizing expected welfare assuming there is an equal probability of occupying any position in the future distribution of outcomes [à la Harsanyi (1955)]. The point is not that Mueller's precise construction of citizen oversight is correct [see Pattanaik (1968) or Sen (1970) for a critique of the Harsanyi logic], but that it is plausible to view citizens as stepping outside the current rush of political activity to evaluate and then to alter the flow of those events. The logic of such evaluation is prospective, forward-looking, and necessarily more inclusive of others' interests than would be a decision on current legislative activity; today's smokers can vote to ban smoking in public places, the rich can favor income redistribution, men can favor the extension of the franchise to women. To be effective, the institutional reforms must carry a measure of permanence, however; today's reform cannot be easily overruled tomorrow.
And it may be necessary to postpone the enactment of reform, or to allow for "grandfathering", to ensure a more socially inclusive view of policy and a sense of legitimacy. Thus constitutional amendments or their repeal often require two-thirds majorities rather than simple majorities, and enactment may be postponed from two to five years after passage. In recent years there have been several efforts (notably in the United States) to improve the process of government decision-making. Each has used the constitutional or the judicial route to reform. In each case there has been an evaluation of government performance, a diagnosis of the problem based upon a behavioral model of that performance, and finally, a proposed remedy established and enforced by an extra-legislative institution. In the next section I review the logic behind one of these reforms: the use of constitutional amendments to improve the efficiency performance of government allocations.
4.2. An example: Constitutional limitations for government efficiency

The past ten years have seen the rise of the citizen-initiated referendum to impose constitutional limitations upon government budgetary activities. In June 1978 the voters of California in a statewide referendum approved Proposition 13, restricting the rate of local property taxation to no more than 1 percent of 1975 market values. In effect, the voters elected to cap local government spending. Similar referenda with similar effects have been approved in Michigan (a limitation on state income taxation) and Massachusetts (a limitation on local taxation). Surveys of voters in each state following the approval of these constitutional limits to government spending indicated that the voters felt government was inefficiently too big. Reforms of bureaucratic and legislative behavior were required. The chosen means of limiting what was felt to be inefficient government performance was to constrain total spending through a constitutional amendment. While the initial reaction of public finance economists to such limitations was one of distrust - the reforms seemed excessively restrictive - there are in fact good economic arguments for their use. The economic problem which lies behind the perceived desirability of spending limitations is government inefficiency. The source of this potential inefficiency is the self-seeking, utility-maximizing behavior of the imperfectly controlled political agent. This politician-bureaucrat is assumed to have a well-defined preference relationship over personal income, y, and work effort, e, specified here as V(y, e) = u(y) + v(e); the marginal utility of income is non-negative and declining [u'(y) > 0, u''(y) < 0] and the marginal utility of effort is non-positive and declining [v'(e) ≤ 0, v''(e) < 0].

... Y*, but where we express poverty negatively, in that social welfare is decreasing in the poverty index, so that minus the headcount is represented by the integral of -1 from 0 to Y*, with the discontinuity at Y*. This is indicated by the heavy line in Figure 2.1. [See Atkinson (1985b) for further details.] There are several possible ways in which one could arrive at a replacement for the head count. One can simply propose measures which look better.
Watts himself suggested two, and these are shown in Figure 2.1. First, there is the normalised poverty deficit, which replaces the discontinuous function in the head count by the continuous function (Y - Y*)/Y*, or minus the shortfall from the poverty line divided by the poverty line. The second is more sophisticated, and is designed to take account of the fact that "poverty becomes more severe at an increasing rate" [Watts (1968, p. 326)], taking the function log(Y/Y*) in place of the poverty deficit, where Y < Y*, and zero otherwise. This in turn may be seen as a subcase of the class of measures suggested by Clark et al. (1981), governed by the parameter c, varying between 1 and minus infinity, with the Watts case being
Ch. 13: Income Maintenance and Social Insurance
obtained as c tends to zero. There are other possibilities, such as the class of measures suggested by Foster et al. (1984) and shown in Figure 2.1, based on some power of the (normalised) poverty gap. Sen, on the other hand, took an axiomatic approach, and this has been followed in a number of subsequent articles. In the construction of the actual index that he proposes, the key axiom is that which rejects the equal weighting of poverty gaps implied by the poverty deficit, on the grounds that this is insensitive to the distribution of income amongst the poor, and proposes that the poverty gap be weighted by the person's rank in the ordering of the poor. The Sen assumptions lead to an index for the measurement of the extent of poverty which includes as one of its elements the Gini coefficient of the distribution of income among the poor. At the same time, he emphasises the "arbitrary" nature of the rank order assumption; and it is not surprising that the subsequent literature has sought to develop alternatives. Reference should be made to the work of Anand (1977, 1983), Blackorby and Donaldson (1980), Chakravarty (1981), Clark et al. (1981), Hamada and Takayama (1977), Kakwani (1981), Takayama (1979), and Thon (1979, 1981, 1983). A valuable survey is provided by Foster (1984). There is yet another approach that has not been much followed in the literature. This considers a class of poverty measures satisfying certain general properties and seeks conditions under which all members of that class will give the same ranking. It tries to establish common ground, so that we may be able to say that poverty has increased, or decreased, for all poverty measures in a class. In Atkinson (1985b) it is shown that a sufficient condition for poverty to have decreased for all poverty measures which are additively separable, non-decreasing and (weakly) concave functions of income is that the poverty gap be no greater, taking all poverty lines from zero up to the maximum in the range considered.
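A minimal computational sketch of these measures, and of the poverty-gap curve used in the dominance condition just stated, might look as follows (the function names and example figures are mine, not the chapter's):

```python
import math

def head_count(incomes, z):
    """Head count ratio: fraction of the population with income below z."""
    return sum(y < z for y in incomes) / len(incomes)

def poverty_deficit(incomes, z):
    """Normalised poverty deficit: mean of (z - y)/z over the poor."""
    return sum((z - y) / z for y in incomes if y < z) / len(incomes)

def watts(incomes, z):
    """Watts index: mean of log(z/y) over the poor (incomes assumed positive)."""
    return sum(math.log(z / y) for y in incomes if y < z) / len(incomes)

def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke class: mean of ((z - y)/z)**alpha over the poor."""
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / len(incomes)

def poverty_gap_curve(incomes, z_max, steps=100):
    """Mean poverty gap at each poverty line from 0 up to z_max.

    If one distribution's curve lies everywhere below another's, poverty is
    lower for every additively separable, non-decreasing, concave measure.
    """
    lines = [z_max * k / steps for k in range(1, steps + 1)]
    return [sum(max(z - y, 0.0) for y in incomes) / len(incomes) for z in lines]
```

For incomes of 50, 80, 120 and 200 with a poverty line of 100, the head count is 0.5 and the normalised deficit 0.175; the FGT measure with alpha = 1 coincides with the deficit, while alpha = 2 penalises deeper poverty more heavily, in the spirit of Watts's remark.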
The head count does not satisfy these conditions, but all the other measures shown in Figure 2.1 do satisfy them. So if for 1950 the poverty gap curve, drawn taking each value of Y as a possible poverty line up to Y2*, lies everywhere below that for 1980, then we can conclude that poverty has increased. [This is in effect a second-degree stochastic dominance result, related to that given in portfolio theory by Fishburn (1977).] The implications of the concavity assumption, which embodies the Dalton transfer principle, are discussed in Atkinson (1985b). It may be noted that the result can be strengthened to include measures which are s-concave but not necessarily additively separable, such as the variation on the Sen index proposed by Thon (1979), where the weight attached to the poverty gaps is the ranking in the whole population and not just that among the poor.

2.3. Welfare economics and social welfare functions
The study of poverty over the last half century has proceeded largely independently from the main body of theoretical welfare economics. Whereas Pigou's
A.B. Atkinson
The Economics of Welfare contained seven chapters dealing with the poor and measures for the alleviation of poverty, such considerations received short shrift from the "new welfare economics" [see, for example, Little (1950) and Graaff (1957)]. In the present context, it is clearly necessary to bring together the two fields, since the public economics treatment of social insurance lies firmly in the welfare economic domain. In making this link, we can distinguish two main approaches. The first sees concern for poverty as stemming from individual interests or what Harsanyi (1955) calls a person's "subjective" preferences; that is, a person supports transfers to the poor because this makes him better off in some way. The second sees concern for poverty stemming from considerations of social welfare, or what Harsanyi calls "ethical" preferences; that is, the person supports transfers because that accords with his notion as to what constitutes a good society. The relation with individual interests has received attention in recent years in terms of "Pareto-improving" transfers, or voluntary redistribution, following the article by Hochman and Rodgers (1969). At the heart of the approach is the assumption that individual welfares are interdependent. The individual's utility is a function of his own consumption and that of others, where we allow both for the possibility that it is the consumption of particular goods by others with which he is concerned and for the possibility that it is their utility level which enters his judgement. The first of these possibilities is illustrated by the case where he is concerned with how much others have to eat; the second by the case where he is concerned with their sense of well-being. These have rather different implications, the former leading to concern with "specific poverty" and the latter to assessment in terms of standards of living or even income.
The utility interdependence justification for redistribution has been extensively explored [see, for instance, Hochman and Peterson (1974)]. Two points should, however, be noted. First, it has been taken for granted that the consumption of others does not enter negatively in the form of envy. The latter many people would want to ignore; as it was put by Sir Dennis Robertson, we should "call in the Archbishop of Canterbury to smack people over the head if they are stupid enough to allow the increased happiness... to be eroded by the gnawings of the green-eyed monster" [Robertson (1954, p. 677)]. Secondly, we have to consider whether the interdependence is symmetric. In the original example of Hochman and Rodgers (1969), Mutt on $10,000 a year is concerned about Jeff on $3,000, but for much of the analysis Jeff is not, at the margin, concerned with Mutt's income. The asymmetric case where the rich are concerned about the welfare of the poor, whereas the poor are not concerned about the welfare of the rich, is open to the objection that we are replacing the subjective evaluations of the poor by those of the rich. A number of treatments of interdependence, such as that of Arrow (1981), have, however, assumed symmetric utility functions.
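The Hochman-Rodgers logic can be made concrete with a small sketch, assuming (purely for illustration, not as their specification) a logarithmic donor utility log(y_d - T) + a*log(y_r + T) with one-sided concern:

```python
import math

def voluntary_transfer(y_donor, y_recipient, a):
    """Transfer freely chosen by a donor with utility log(y_d - T) + a*log(y_r + T).

    The first-order condition a*(y_d - T) = y_r + T gives
    T = (a*y_d - y_r) / (1 + a); no transfer is made if this is negative.
    """
    return max((a * y_donor - y_recipient) / (1 + a), 0.0)
```

With Mutt on $10,000, Jeff on $3,000 and a = 0.5, the chosen transfer is (5,000 - 3,000)/1.5, about $1,333. Because the donor selects the transfer to maximise his own utility, he is weakly better off and the recipient strictly so - hence "Pareto-improving" redistribution.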
The approach to redistribution via interdependent preferences takes on particular significance when we consider the relation between private and social transfers in Section 3. The same applies to the interpretation in terms of individual subjective preferences based on the "insurance" motive, where individuals favour redistribution on the grounds that they may themselves later be in need of assistance. This motive has been described by Buchanan and Tullock (1962, p. 193) as follows: "to prevent the possibility of his falling into dire poverty in some unpredictable periods in the future, the individual may consider collective organisation which will, effectively, force him to contribute real income during periods of relative affluence. Such collective redistribution of real income among individuals, viewed as the working out of this sort of "income insurance" plan, may appear rational to the utility-maximizing individual." The notion of insurance is closely allied to the contractarian justification which has been advanced by several authors for the adoption of particular sets of ethical preferences. Harsanyi (1953, 1955) suggested that one's ethical preferences should be governed by the condition of "impersonality", such that "they indicate what social situation he would choose if he did not know what his personal position would be - but rather had an equal chance of obtaining any of the social positions" (1955, p. 315). The idea has been developed in a somewhat similar way by Buchanan and his colleagues [see, for example, Buchanan (1974) and Hochman (1983)]. They distinguish between the "constitutional choice" assumed to be made by people in the absence of knowledge about their own position in society, which corresponds to the ethical preferences, and the "post-constitutional choice" made by people in the light of full knowledge of their rights and resources. According to Buchanan (1974, p.
36): "at the stage of genuine "constitutional" choice, it is not appropriate to include the standard arguments in individual utility functions, nor is it at all appropriate to introduce explicit utility interdependence. [Individuals] remain uncertain about their own income-wealth positions in subsequent periods during which the rules to be adopted will remain in force." Harsanyi [and Vickrey (1960)] argues that the condition of impersonality would lead to the resulting social welfare function taking the form of the sum of utilities, in this way providing a rationale for a utilitarian objective function. A quite different conclusion is reached by Rawls (1971) arguing on the basis of a similar hypothetical situation, which he describes as the "original position". He argues that in this original position rational individuals will agree to two principles. The first is that liberty has priority over other considerations, no quantity of material goods being sufficient to compensate for loss of liberty. The second is the principle that everything should be distributed equally unless
unequal distribution is to the benefit of the least advantaged. It is this second principle, that of "maxi-min", and its lexicographic extension, that has had most influence on economics. The mental experiment suggested by Rawls is the same as that used by Harsanyi, Vickrey, and others to justify utilitarianism, but Rawls reaches an apparently quite different conclusion. Indeed, he is concerned to emphasise the differences. Following Arrow (1973), we may view both approaches as special cases of the social welfare function:
W = Σ_i [U_i^(1-q) - 1]/(1 - q),    q ≥ 0,    (2.2)
where social welfare is assumed to be an iso-elastic function of individual welfares, the latter being denoted by Ui. If, as in the Vickrey-Harsanyi story, people assume that they may be any member of society with equal probability, then expected utility maximisation would lead them to adopt this social welfare function with q = 0. The concavity of the function Ui then takes account of risk aversion. On the other hand, the Rawlsian social welfare function may be obtained either by allowing U to express infinite risk aversion or by letting q tend to infinity. Rawls himself suggests that "the parties will surely be very considerably risk-averse" (1974, p. 143). But his argument goes wider than this, and he adduces several further reasons for adopting the maxi-min objective. The attractions of this objective include "sharpness", or the fact that "citizens generally should be able to understand it and have some confidence that it is realised" (1974, p. 143). The social welfare function (2.2) is formulated in terms of individual welfares. If the function is formulated in terms of individual incomes, Y, then we can derive an index of income inequality, like that described in Atkinson (1970). In empirical studies, such as those of the redistributive impact of the welfare state discussed in Section 4, it is commonly income which is the variable studied, and measures of income inequality are widely used. There is of course little agreement about the appropriate value of q [see Okun (1975) for the "leaky bucket" experiment which helps people think about the implications of different values]. For this reason, the less demanding requirement of non-intersecting Lorenz curves is often used. 
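The welfare function (2.2), the income-based inequality index, and the Lorenz ordinates can be sketched in a short computation (the function names and parameter choices are mine, not the chapter's):

```python
import math

def social_welfare(utilities, q):
    """Iso-elastic social welfare function (2.2).

    q = 0 gives the sum of utilities (the Vickrey-Harsanyi case); as q grows,
    welfare is increasingly dominated by the worst-off (the Rawlsian limit).
    """
    if q == 1.0:                      # limiting case: sum of logarithms
        return sum(math.log(u) for u in utilities)
    return sum((u ** (1 - q) - 1) / (1 - q) for u in utilities)

def atkinson_index(incomes, eps):
    """Atkinson (1970) inequality index for inequality-aversion eps > 0."""
    n = len(incomes)
    mean = sum(incomes) / n
    if eps == 1.0:
        ede = math.exp(sum(math.log(y) for y in incomes) / n)  # geometric mean
    else:
        ede = (sum(y ** (1 - eps) for y in incomes) / n) ** (1 / (1 - eps))
    return 1 - ede / mean             # zero under complete equality

def lorenz_ordinates(incomes):
    """Cumulative income shares at each rank (the ordinary Lorenz curve)."""
    ys = sorted(incomes)
    total, cum, out = sum(ys), 0.0, []
    for y in ys:
        cum += y
        out.append(cum / total)
    return out
```

Note that atkinson_index is invariant to proportional shifts in income: doubling or halving all incomes leaves it unchanged, which is precisely the implicit assumption questioned by Kolm (1976).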
If one is unhappy [see Kolm (1976)] about the implicit assumption that the measure of inequality is invariant with respect to proportional shifts in income (the Lorenz curve remains the same if all incomes are doubled or halved), then it may be better to use the "generalised" Lorenz curve, where it is the cumulative total income, not the share in income, which is shown on the vertical axis [Shorrocks (1983)]. [For results that allow some conclusions to be drawn in cases where the Lorenz curves cut, see Atkinson (1973b) and Shorrocks (1984)]. Ethical preferences have been discussed in terms of contractarian theories of the type advanced by Harsanyi or Rawls, but they may arise more directly
because the distribution of income, or resources, enters individual utility functions. This notion has been described by Thurow (1971, p. 327): "individuals may simply want to live in societies with particular distributions of income and economic power. There may be no externalities; the individual is simply exercising an aesthetic taste for equality or inequality similar in nature to a taste for paintings." The same idea is discussed by Breit (1974), who suggests that individual utility functions may be treated as embodying an index of inequality such as the Gini coefficient.
2.4. Concluding comment

In several of the explanations of redistributional objectives, a key role is played by uncertainty. The contractarian theories posit a state of uncertainty as a mental experiment. The insurance model sees uncertainty as a real possibility that motivates individual choices. As it has been put by Dryzek and Goodin, people may act impartially "not because they are acting as if they were uncertain, but rather because they really are uncertain" [Dryzek and Goodin (forthcoming, p. 8)]. In more applied writing on the welfare state, too, the notion of uncertainty is prominent. Haveman (1985, p. 8) has recently argued, for example, that "the primary economic gain from the welfare state is the universal reduction in the uncertainty faced by individuals". In this context it is, however, important to distinguish between "risk" and "uncertainty". Industrial accidents may be a risk, whose probability and cost may be assessed by people with reasonable accuracy, and for which insurance may be provided on an actuarial basis. On the other hand, fears that there will be widespread recession, or a breakdown of family support for the aged, are not easily translated into the expected utility calculus. They are sources of uncertainty for which insurance in a narrow sense may be inapplicable. What is required is mutual support of a kind described by the French term "solidarity" or by Beveridge's phrase that "men should stand together with their fellows" [Beveridge (1942, p. 13)].
3. The design of state income maintenance
3.1. A simple model

What is the role of state income redistribution? How does it relate to private philanthropy and to private insurance possibilities? What form should it take? To explore these questions, I use a highly simplified model, based on strong assumptions, particularly those about behaviour. It should however be stressed that the model does include behavioural responses. One of the major advances in
recent years in the analysis of the design of income maintenance, and public policy more generally, has been the explicit treatment of the effect of taxes and benefits on family behaviour. The "fixed cake" model which featured prominently in many discussions of distributional justice has been replaced by one in which labour supply, or other decisions, may affect the size of the cake. Indeed, this happened before supply side considerations became fashionable elsewhere in economics. The model used is similar in key respects to that investigated in the literature on the design of income taxation, notably following the article by Mirrlees (1971), which considered the optimum income tax structure when people varied their labour supply in response to changes in the tax schedule. His treatment concerned the working, or potentially working, population, but allowed for the possibility that the after tax income would exceed before tax income (a negative income tax) and that at low levels of wages people might choose zero hours of work, living solely on the guaranteed minimum income. The income transfer aspect was dealt with independently by Zeckhauser (1968, 1971). He distinguished two main groups, characterised as the representative citizen and the representative poor. The latter were assumed to vary their labour supply in response to the transfer. The simplified model combines elements of both those outlined above. In contrast to the Mirrlees formulation, two different groups are distinguished, but unlike the Zeckhauser model the second group is assumed to consist of people unable to work. [A two group example is also used in numerical calculations by Cooter and Helpman (1974)]. The assumptions made concerning the first group are similar to those of Mirrlees. 
They work a fraction L of the day, consume C of a consumption good which is taken as the numeraire, and are identical in all respects except their wage rate per hour, w, which is distributed with density f(w) over the range from W to infinity, such that the integral of f(w) over that range is equal to unity. Their utility functions are identical and of the form: U(C, L) = log(C+ C*) + b log(1 - L),
(3.1)
where b and C* are constants and are assumed strictly positive. The assumption about the sign of C* has implications for the behavioural responses, as discussed below, and it should be stressed that it is made purely for purposes of illustration, not for its empirical relevance. The first term in (3.1) is the utility derived from consumption and the second is that from leisure [a fraction (1 - L) of the day being available for leisure]. The budget constraint in the absence of taxes is given by C = wL, there being no non-labour (lump-sum) income. The indirect utility function, denoted by V(w, 0), gives the level of utility attained as a function of the wage rate and lump-sum income (in this case zero).
Ch. 13: Income Maintenance and Social Insurance
The second group consists of those who are unable to work and who have zero income in the absence of transfers. This group is assumed to have the same utility function (3.1), where the positive value of C* implies that their welfare level has a finite lower bound. The level of indirect utility is given by v(R), where R is the transfer received. They are referred to as group A (for aged); the working population is referred to as group B. The ratio of the number in group A to that in group B - the dependency ratio - is denoted by D. Suppose that we consider first the traditional role of income maintenance: that of providing relief for those who would otherwise have no income. The government levies a proportional tax on the earnings of group B at rate t, with the proceeds being transferred as an equal cash payment to group A. This means that group B is faced with a budget constraint C = w(1 - t)L, and the utility-maximising labour supply function may be calculated to be

L = (1 - bC*/(w(1 - t)))/(1 + b),
(3.2)
where t < 1 - bC*/W
(3.3)
(ensuring that labour supply remains strictly positive even for the person with the lowest wage, this being W). Where (3.3) holds, the revenue collected per head of group A is given by
R = t(w - bC*/(1 - t))/[D(1 + b)],
(3.4)
where w denotes the average level of the wage rate. We can trace out a transfer possibility locus, as in Figure 3.1. The extent of the transfer will be subject to several conditions, including (3.3). If these constraints do not bind, then the attainable transfer reaches a maximum [from (3.4)] at the tax rate given by

(1 - t)^2 = bC*/w.
(3.5)
From the expression for the labour supply function [equation (3.2)], we can see that the supply of labour in this case is a declining function of the tax rate where C* is positive. Whether this is so in the real world is of course an important empirical question (see Chapter 4 by Hausman in Volume I of this Handbook). As the literature on optimum taxation has shown, one of the ingredients in the determination of the level of taxation and transfer is the elasticity of labour supply. [See, for example, Atkinson and Stiglitz (1980, lecture 13). The model used here differs in that the tax is proportional without an exemption level, with the transfer only being made to those not in the labour force.]
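As a check on this algebra, the labour supply function (3.2), the revenue relation (3.4) and the revenue-maximising condition (3.5) can be verified numerically. The sketch below uses a small discrete set of wages in place of the density f(w); all parameter values are hypothetical and chosen purely for illustration.

```python
# Hypothetical parameter values, chosen purely for illustration.
b, Cstar = 0.5, 0.2
W = 1.0                       # lowest wage
wages = [1.0, 1.5, 2.0, 3.0]  # a small discrete stand-in for f(w)
wbar = sum(wages) / len(wages)
D = 0.25                      # dependency ratio

def labour_supply(w, t):
    """Equation (3.2)."""
    return (1 - b * Cstar / (w * (1 - t))) / (1 + b)

def transfer(t):
    """Average tax revenue per head of group A, computed wage by wage;
    it coincides with the closed form (3.4)."""
    return t * sum(w * labour_supply(w, t) for w in wages) / len(wages) / D

# Condition (3.3) keeps labour supply positive at the lowest wage W.
t_max_allowed = 1 - b * Cstar / W

# Grid search for the revenue-maximising tax rate.
grid = [k / 100000 for k in range(1, int(t_max_allowed * 100000))]
t_star = max(grid, key=transfer)

# Equation (3.5): at the maximum, (1 - t)^2 = b*Cstar/wbar.
print(round((1 - t_star) ** 2, 3), round(b * Cstar / wbar, 3))
```

With these numbers the revenue-maximising rate is interior to the admissible range, so condition (3.3) does not bind at the peak of the locus.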
[Figure 3.1. Transfer possibility locus. Horizontal axis: tax rate; vertical axis: feasible transfer. Two loci are shown: the basic model, and the case where people in group B can opt to join group A.]
The limiting form of disincentive is when a person withdraws totally from the labour force. The assumption (3.3) precludes labour supply from falling to zero, but we need to consider more generally the relation between the two groups A and B. If L were to fall to zero, then the resulting level of utility for group B would be lower than that achieved by group A, given that the latter receive a positive transfer. This raises issues of fairness and incentives. If it were open to the low paid worker to opt out of the labour force and join group A (for example, by retiring early), then we should have to allow for the possibility that increases in the transfer and in t would lead to an increase in the size of the dependent population. (In this respect, the model would be closer to that discussed in the optimum income tax literature.) The working population would then become those with wage rates in excess of w*, where w* is defined by

log(R + C*) = log[(w*(1 - t) + C*)/(1 + b)] + b log[(1 + C*/(w*(1 - t)))/(1 + 1/b)].

(3.6)
The left-hand side is the level of utility achieved in group A; the right-hand side is that achieved in group B with a wage w * [calculated by substituting from (3.2) and the budget constraint]. This condition is illustrated in Figure 3.2, where the indifference curve indicating the choice of hours on the budget line AB is such as to pass through the point where L = 0 and the transfer is R. When w * > W, then the possible transfer is reduced, as illustrated by the dashed line in Figure 3.1. The extent to which social security leads to withdrawal from the labour force will be discussed in Section 5.
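The marginal wage w* defined by (3.6) has no closed form, but it is easily found numerically. The following sketch solves (3.6) by bisection, using the closed forms for consumption and leisure implied by (3.2); the parameter values are entirely hypothetical.

```python
import math

# Hypothetical parameters: preference weight b, shift Cstar, tax rate t
# and categorical transfer R (none taken from the chapter's calibration).
b, Cstar, t, R = 0.5, 0.2, 0.4, 0.3

def utility_gap(w):
    """LHS minus RHS of equation (3.6): positive if the transfer R is
    preferred to working at wage w."""
    lhs = math.log(R + Cstar)
    rhs = (math.log((w * (1 - t) + Cstar) / (1 + b))
           + b * math.log((1 + Cstar / (w * (1 - t))) / (1 + 1 / b)))
    return lhs - rhs

# Below w* a worker would rather opt into group A; above it, work.
lo, hi = 0.5, 10.0            # bracket chosen so the gap changes sign
assert utility_gap(lo) > 0 > utility_gap(hi)
for _ in range(60):           # bisection
    mid = (lo + hi) / 2
    if utility_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
w_star = (lo + hi) / 2
print(round(w_star, 3))
```

Raising R or t shifts w* upwards, which is the mechanism by which the feasible transfer locus is pulled down in Figure 3.1.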
[Figure 3.2. Person indifferent between work and categorical benefit. Vertical axis: net income.]
It is assumed above that the option of withdrawing from the labour force is one freely open to the worker, and this is indeed the assumption made in the optimum tax literature. In practice, group A is typically defined as satisfying certain conditions; that is, the transfer is a categorical one. In the case of old age, the test is a relatively straightforward one (and presumably only a minority of the working population would qualify). But in the case of disability or unemployment, the application of the criteria is less straightforward, and it is possible that some degree of discretion exists. Nonetheless, it is important to remember that the receipt of transfers such as unemployment insurance is in practice subject to constraints: benefit claims may be rejected on the grounds that the administrators do not consider the person "genuinely" qualified. We have set out above a menu of choices open to the government, in the context of a highly simplified model. In this respect, it is similar to the analysis by Diamond (1968) of the negative income tax. Using a function like (3.1), with b = 1 and C* = 0, he calculated how the possible level of transfer depended on the rate of withdrawal of benefit (the marginal tax rate), showing how the feasible transfer fell below that estimated ignoring labour supply responses (in this case the responses of the recipients).
3.2. Income support - individualistic preferences

The choice from the menu described above will evidently depend on the objectives pursued and these may take different forms, as discussed in Section 2. First,
we consider the case where income maintenance is justified on an individualistic basis, according to the preferences of the working population for redistribution. As it was put by Zeckhauser (1971, p. 324): poor and non-poor alike seek to institute mechanisms that give something to the poor beyond what they get from the natural workings of the "competitive" system. That the non-poor share in this desire indicates that an externality is generated when the economic welfare of the poor is improved. This leads us directly to ask why state redistribution is required. If the interdependence of individual utilities is such that individuals would like to see transfers carried out, why is private charity not sufficient? In order to examine this question, let us suppose that, associated with each N members of the dependent group A, are N/D workers who regard the support of the dependent as their shared responsibility. Their preference for redistribution may be represented by extending the utility function of group B to include the level of welfare per person in group A, "discounted" by a factor r (where this is a constant assumed to lie between zero and unity):

log(C + C*) + b log(1 - L) + r log(R + C*).
(3.7)
Within this framework, we may identify two rather different concerns. The first is for what we may call "family" redistribution, where a member of the working population is, say, concerned with the welfare of his or her parents. Such altruism within the family has been studied by Becker (1974, 1981), and the intertemporal aspects, particularly as they affect bequests, have been emphasised by Barro (1974, 1978). In the Becker analysis of the family, there is in general one donor, whose transfer then determines the living standard of the recipient. For example, with one donor and one recipient, the level of the utility-maximising private transfer, denoted by S, is given by [where (3.3) is assumed to hold]

S = [rw - (1 + b - r)C*]/(1 + r + b),
(3.8)
where this is non-negative; otherwise S = 0. From examination of the corner condition for S = 0, it may be seen that, holding b and C* constant, it is those with too low wages, or insufficient concern (too low r) who do not make transfers. As Barro has stressed, where such private transfers are being made, the introduction of state transfers will simply lead to a replacement of private transfers (this is leaving out of account any distortionary effect from the financing of the transfer). If the donor is at a corner, so that S = 0, then the introduction of state transfers will have an effect, but it will not be preferred by the donor (although it evidently will be preferred by the recipient). The family model takes no account of the real-world fact that some people have no descendants, so that no provision would be made for them. This brings
us to the second interpretation of the model, which is where people have a more general concern for the aged or other deserving groups. In this "large group" model, assumptions have to be made about the perceived relation between the individual transfer and the level of income received by group A. The most common assumption is that each individual takes the contributions of others as fixed, the "Nash assumption":

R = (S + (N/D - 1)S̄)/N,
(3.9)
where S̄ denotes the average transfer by the others. In other words, the individual assumes that the total divided among the N dependents will be his transfer S plus S̄ from each of the remaining (N/D - 1) workers who are contributing to their support. An equilibrium is characterised by a set of transfers such that each person is happy with the amount of his donation given the transfers of others. This model of private redistribution is similar to that treated by Arrow (1981), Collard (1978), and Sugden (1982, 1984). The main conclusion drawn from this model is that the private level of transfers will often be inefficient in a Pareto sense, the reason being that such transfers have a public good aspect: the reason why the charity game so rarely leads to a Pareto-optimal allocation is that the income of others is a public good. If there are two or more givers, then each would benefit by the other's giving; but the charity game does not permit mutually advantageous arrangements [Arrow (1981, p. 287)]. These large group models of income transfers have provided formal underpinning for the conclusion reached long ago that private charity on its own is unlikely to be adequate. As was noted by Friedman in Capitalism and Freedom: it can be argued that private charity is insufficient because the benefits from it accrue to people other than those who make the gifts ... we might all of us be willing to contribute to the relief of poverty, provided everyone else did. We might not be willing to contribute the same amount without such assurance [Friedman (1962, p. 191)]. The scope for state intervention to overcome this problem has been extensively discussed in the series of papers on "Pareto-optimal redistribution", including Hochman and Rodgers (1969), Goldfarb (1970), and von Furstenberg and Mueller (1971). The form of such intervention is however important.
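The under-provision result can be illustrated with a small numerical sketch of the charity game. Two identical donors each choose a gift taking the other's gift as fixed (the Nash assumption), and the equilibrium is compared with the symmetric gift that maximises a representative donor's utility when the gifts move together. The functional form follows (3.7) with labour supply suppressed, and every number is purely illustrative.

```python
# Two donors (m = 2) with fixed income y share support of one dependent.
# Donor utility: log(y - S + Cstar) + r*log(R + Cstar), R = total gifts/n_dep.
m, n_dep = 2, 1
y, Cstar, r = 2.0, 0.2, 0.5

def best_response(other_total):
    """Utility-maximising gift S given the gifts of others.
    First-order condition: (S + other)/n_dep + Cstar
                           = (r/n_dep)*(y - S + Cstar)."""
    s = (r * (y + Cstar) - n_dep * Cstar - other_total) / (1 + r)
    return max(s, 0.0)

# Iterate best responses to the symmetric Nash equilibrium.
S = 0.0
for _ in range(200):
    S = best_response((m - 1) * S)

# Cooperative benchmark: both gifts move together, so the representative
# donor maximises log(y - S + Cstar) + r*log(m*S/n_dep + Cstar), giving:
S_coop = (r * (y + Cstar) - (n_dep / m) * Cstar) / (1 + r)

print(round(S, 3), round(S_coop, 3))   # Nash gift < cooperative gift
```

Because each giver treats the other's gift as fixed, part of the benefit of his own gift accrues to the other donor and is ignored, so the Nash gift falls short of the cooperative one: the public good aspect stressed by Arrow.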
As noted by Warr (1982), the lump-sum taxes on donors discussed by these authors could simply lead to the replacement of private transfers by public transfers dollar for dollar (where donors are not at a corner solution), just as in the case of family redistribution. He argues that an additional net transfer may be achieved by measures which provide an increased incentive to donate at the margin. This could take the form of a subsidy to charitable giving, and this is indeed provided
in many countries through income tax deductions. The theoretical arguments for such deductions are examined in Feldstein (1980b) and Atkinson (1976). (It should be noted that it is not just the aggregate level of transfers which is of concern, but also its distribution. Tax relief may be an indiscriminate policy, not distinguishing, for example, between different types of charitable body.) The provision of state income maintenance goes beyond the subsidy of private charity in that it provides direct income support. As noted above, this may have the effect of reducing the extent of private transfers, but there are several reasons why it may be adopted, including the possibility that state income support has lower administrative costs, particularly where there are economies of scale, and that state income support involves redistribution between different groups. As we have seen in Section 2, such redistribution may be desired even on an individualistic basis for reasons of insurance. Suppose that we consider first the position of a people uncertain whether they are going to be workers with wage w or in group A with zero wage, attaching a probability P to the first and (1 - P) to the second event. In such a situation they may be concerned with the expected utility, which may be written in the same form as (3.7), where r = (1 - P)/P. Even without the workers attaching any weight ex post to the welfare of the disadvantaged group A, the ex ante objective function would be the same as before, and transfers would raise the expected utility of all. Why cannot private insurance provide for such contingencies? Recent theoretical work on insurance markets has revealed the problems which may arise even in the absence of considerations such as the fixed costs of administration (favouring a single state insurance scheme covering multiple contingencies): for example, Rothschild and Stiglitz (1976), and C. Wilson (1977). 
The first difficulty is that the risk of being in group A typically varies across the population, and while the individual may know P, this information may not be available to (may be concealed from) the insurer. If such differences are not observed, then there will be a problem of "adverse selection". If the private company offers insurance at a premium based on the expected claims for the whole population, then it will find that those with low risks will tend not to take out insurance and that their clientele will be weighted towards the higher risks, rendering it unprofitable. Secondly, and related, is the fact that people may place different weight on the value of insurance. We would therefore expect to see a menu of policies, but as Spence (1978, p. 428) has described: a competitive market is not able to provide the efficient menu of policies. ... The reason is that when policies designed for low risk people are priced at the expected cost for low risk people, the high risk people will tend to buy these policies because of the lower cost. [There are] informationally-based negative externalities running from high to low risk people, that damage the latter and impair the performance of the market
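The unravelling behind these first two difficulties can be made concrete with a toy two-type calculation (all numbers hypothetical): if full insurance is offered at a premium that is actuarially fair for the pooled population, the low-risk type prefers to remain uninsured, and the pool drifts toward the high risks.

```python
import math

# Two risk types facing a possible loss; log utility of final wealth.
wealth, loss = 1.0, 0.5
p_low, p_high = 0.1, 0.4          # loss probabilities (hypothetical)
premium_pooled = 0.5 * (p_low + p_high) * loss   # fair for a 50/50 pool

def eu_uninsured(p):
    return (1 - p) * math.log(wealth) + p * math.log(wealth - loss)

def eu_full_cover(premium):
    return math.log(wealth - premium)  # full insurance: certain wealth

# The low-risk type is better off rejecting the pooled contract...
assert eu_uninsured(p_low) > eu_full_cover(premium_pooled)
# ...while the high-risk type accepts it,
assert eu_uninsured(p_high) < eu_full_cover(premium_pooled)
# so the remaining pool is all high risk and the fair premium must rise:
premium_high_only = p_high * loss
assert premium_high_only > premium_pooled
```

With only the high risks left, the premium consistent with zero profit rises further, which is precisely the sense in which pooled private insurance is rendered unprofitable.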
Thirdly, there is the "moral hazard" problem that the probability P may be influenced by decisions by the individual which cannot be monitored by the insurance company. The most obvious example would be voluntary unemployment. Social insurance may be able to provide options which cannot be sustained by private markets, and improve on the private outcome. However, as noted by Spence (1978), there are two important qualifications: the scope for improvement may depend on the prohibition of private insurance, since otherwise the latter would cream off all but the highest cost groups, and the social insurance scheme may provide insufficient diversity of options. Social insurance also redistributes between people with different wages, and here we are entering territory which private insurance is unlikely to cover. In its extreme form, such insurance would have to provide for the person completely uncertain about earnings possibilities. This would be analogous to the hypothetical situations envisaged in the contractarian theories of ethical preference, to which I now turn.
3.3. Ethical preferences

The utilitarian function represents the longest-standing approach to the assessment of social welfare, having influenced, explicitly or implicitly, much of the thinking about social policy for the past century and a half. As we have seen, it may be represented in terms of a person having an equal chance of being anyone in the population, so that expected welfare is

[1/(1 + D)] ∫ V(w(1 - t), 0) dF(w) + [D/(1 + D)] v(R),

(3.10)
where D is the ratio of the number in group A to that in group B, and R denotes the transfer received per head by those in group A. The implications of this objective for the design of income support in our simple model may be seen by differentiating equation (3.10) with respect to t and obtaining the first-order condition [on the assumption that neither the constraint (3.3) nor that embodied in (3.6) is binding]:

-∫ V_M[w(1 - t), 0] wL dF(w) + D v_M[R] dR/dt = 0,

(3.11)
where we have multiplied by (1 + D) and used the fact that the derivative of the indirect utility function with respect to the tax rate is equal to (-wL) times the marginal utility of income, which is denoted by V_M, with a corresponding expression v_M for group A. This condition may be re-written as

D dR/dt = ∫ {V_M[w(1 - t), 0]/v_M[R]} wL dF(w).

(3.12)
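Condition (3.12) can be illustrated by direct numerical maximisation of (3.10). The sketch below uses a small discrete wage distribution in place of f(w), with purely illustrative parameter values; the utilitarian optimum falls strictly short of the revenue-maximising tax rate, reflecting the positive right-hand side of (3.12).

```python
import math

# Hypothetical parameters for the simple model of Section 3.1.
b, Cstar, D = 0.5, 0.2, 0.25
wages = [1.0, 1.5, 2.0, 3.0]
wbar = sum(wages) / len(wages)

def labour_supply(w, t):
    """Equation (3.2)."""
    return (1 - b * Cstar / (w * (1 - t))) / (1 + b)

def transfer(t):
    """Equation (3.4): revenue per head of group A."""
    return t * (wbar - b * Cstar / (1 - t)) / (D * (1 + b))

def V(w, t):
    """Indirect utility of a worker: substitute (3.2) into (3.1)."""
    L = labour_supply(w, t)
    return math.log(w * (1 - t) * L + Cstar) + b * math.log(1 - L)

def welfare(t):
    """Equation (3.10): utilitarian expected welfare."""
    worker = sum(V(w, t) for w in wages) / len(wages)
    return (worker + D * math.log(transfer(t) + Cstar)) / (1 + D)

# Grid search over tax rates satisfying (3.3) for these parameters.
grid = [k / 10000 for k in range(1, 8000)]
t_util = max(grid, key=welfare)
t_revmax = max(grid, key=transfer)
assert 0 < t_util < t_revmax   # transfer stops short of the revenue maximum
```

At the revenue maximum dR/dt = 0 while the left-hand side of (3.11) is still strictly negative, so the utilitarian optimum must lie below it, as the assertion confirms.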
The transfer is not therefore carried to the point where the marginal increase, dR/dt, falls to zero, but stops short to an extent which depends on the relative marginal utilities of income of group A and of the working population, the latter weighted by their earnings. The magnitude of the right-hand side depends on the form of the marginal utility of income function and on the distribution of earnings. The way in which the distribution enters the calculation is illustrated by the examples given in Atkinson (1973c) and Deaton (1983). When we move from the strictly Benthamite sum of utilities case to consider more general social welfare functions, we have to allow for the effect of a greater degree of aversion to inequality, i.e. that the social marginal valuation of income declines more rapidly with income. From (3.12) we can see that the weights in the curly brackets are subject to two influences. To the extent that the working population receive less weight relative to group A, this reduces the right-hand side, and, other things equal, leads the transfer to be higher. To the extent that those with high earnings (wL) within the working population receive less weight, it will further reduce the right-hand side. The limiting case of greater inequality aversion is the Rawlsian objective function, and the solution in this case is given by setting dR/dt = 0, since the working population, being better off than group A, receive no weight in this case. The model considered so far has focused on transfers to those out of the labour force. A number of authors have taken the case where there are two groups both in the labour force, in order to explore the extent to which such transfers should be income related. Let us now assume that group A are at work, with a wage rate w_A, and that group B are all identical, with wage rate w_B, where w_A < w_B.
The simplest form of income support would be that provided by a social dividend, or credit income tax, scheme, where everyone receives a payment R and all income is taxed at rate t. This is identical to the linear income tax which has been treated in the optimum income tax literature, for example, by Sheshinski (1972), Atkinson (1973c), Feldstein (1973), and Stern (1976). The social dividend has been contrasted with more complex negative income tax schemes which have two, or more, marginal tax rates. The tax may be levied first at rate t1 and then at t2, and typically it has been suggested that t1 > t2, with the aim of allowing the guaranteed income to be kept up while at the same time preventing the benefit from spilling over too far to those higher up the income scale. The version illustrated in Figure 3.3 by a dashed line has been described as a "fully integrated" negative income tax, which means that the kink occurs on the 45° line. In other words, the break-even point is also that at which the
[Figure 3.3. Negative income tax schedules. Horizontal axis: gross income; vertical axis: net income.]
marginal tax rate changes. Alternatively, we may have an overlapping system, where the kink occurs after the break-even point, as shown in Figure 3.3 by the dotted line. The different types of negative income tax schemes are described in Green (1967). The choice between a social dividend and a two-tier negative income tax is examined by Kesselman and Garfinkel (1978) in the context of the two-group model. They point out that an unconstrained choice of the two tax rates can always improve on a single rate, making use of the result obtained by Sadka (1976) and others that the marginal tax rate at the top can be set to zero, with the kink at the point chosen under the flat-rate tax, and hence raise the welfare of the better-off group without reducing the revenue available to finance the transfer to the poorer group. Kesselman and Garfinkel restrict their attention to cases where the kink occurs above (or at) the break-even point and strictly below the point chosen by the better-off group. They compare different structures, holding the utility level of the poor constant, and examining the implications for the better-off group. Figure 3.4 shows the comparison of a fully integrated negative income tax (line GXR) with a social dividend (line SD).

[Figure 3.4. Comparison of social dividend and integrated negative income tax.]

The indifference curves are obtained from those between consumption (which is equal to net income) and leisure, gross income being a decreasing linear function of leisure for a given wage rate. The latter means that the indifference curves of the rich and poor have different slopes. The fact that the tax rate t1 on the segment GX is higher
than t2 on the segment XR is taken by the authors to indicate that there is income testing (this is discussed further below). The poor group are in both cases on the same indifference curve; but the better off are on a higher indifference curve with the social dividend. This does not necessarily happen, but is possible, as Kesselman and Garfinkel point out, because the social dividend requires a smaller transfer to achieve that specified level of utility. The authors consider how the comparison depends on labour supply elasticities, examining the implications of empirical estimates, and introduce administrative factors. They conclude that "contrary to the conventional wisdom, the universal payment, flat-marginal-rate form may be more efficient than the income-tested form" [Kesselman and Garfinkel (1978, p. 211)]. The comparison described above did not consider the optimal two-tier structure; rather, the aim was to show that a specified negative income tax need not necessarily be preferable to a single rate structure. The analysis has been taken further by Sadka et al. (1982), who introduce an explicit social welfare function, with differing degrees of inequality aversion, and calculate the optimum two-tier structure with the restriction that it be of the fully integrated form. They also extend the model to five groups of workers. They conclude that the results "are broadly consistent" [Sadka et al. (1982, p. 309)] with those of Kesselman and Garfinkel. They note that higher elasticities are less favourable to the social dividend, but point out that the welfare losses are not very great if one fails to adopt a non-income-tested regime when it would be optimal.
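The schedules being compared can be written down explicitly. The sketch below codes the net-income schedules of a social dividend and of a fully integrated two-tier negative income tax, checking that in the integrated case the kink lies on the 45° line (net income equals gross income at the break-even point). The guarantee level and tax rates are hypothetical.

```python
def social_dividend(y, G, t):
    """Everyone receives G and all income is taxed at a single rate t."""
    return G + (1 - t) * y

def integrated_nit(y, G, t1, t2):
    """Fully integrated negative income tax: withdrawal rate t1 up to the
    break-even point y* = G/t1 (where the schedule meets the 45 degree
    line), rate t2 on income above it."""
    y_star = G / t1
    if y <= y_star:
        return G + (1 - t1) * y
    return y_star + (1 - t2) * (y - y_star)

G, t1, t2 = 0.5, 0.5, 0.25          # illustrative guarantee and rates
y_star = G / t1                     # break-even point

# The kink is on the 45 degree line: net income equals gross income there.
assert abs(integrated_nit(y_star, G, t1, t2) - y_star) < 1e-12

# Below the break-even point the two schedules coincide when t = t1;
# above it, the integrated NIT leaves higher net income (t2 < t1).
assert integrated_nit(2.0, G, t1, t2) > social_dividend(2.0, G, t1)
```

The Kesselman-Garfinkel comparison holds the utility of the poor constant rather than the schedule parameters, but the functions above make the kink structure under discussion explicit.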
The choice between income-tested and other programmes raises a number of issues not encompassed in the model we have been considering. These include the stigma associated with the income-tested benefits, to which I return later, and the administrative cost.
3.4. Social insurance and intertemporal allocation

To this juncture the analysis has been essentially static. The group A have been referred to as the aged, but no account has been taken of the fact that they were alive at an earlier period. We have not therefore treated the - evidently important - intertemporal aspects of social insurance. In order to investigate this, let us go back to the formulation of Section 3.2, where the individual's concern for the income of group A now becomes concern for his own income in the second period of his life. (We work throughout with a two-period model, which is clearly a gross simplification, but the main theoretical points may be seen.) The difference is now that the transfer S is one which the person himself makes from an earlier period. If the interest rate is denoted by i, and it is assumed that the person can borrow or lend as much as desired at that interest rate, then we have the intertemporal budget constraint, giving the consumption in the second period, R:

[w(1 - t)L - C](1 + i) + T = R,
(3.13)
where T denotes the state transfer in old age and t is the rate of contributions levied on wage-earners. The square bracket on the left-hand side denotes savings. Unlike the earlier formulation, there are no public good aspects, since each person is making provision for his or her old age. This again raises the question as to why state intervention is necessary. We must, however, start by observing that the transition to a dynamic economy has affected the welfare-economic basis of the argument. In a static economy, with a full set of markets, and full information, the private market allocation is Pareto efficient. In an economy with an infinite sequence of overlapping generations, there is no guarantee that the no-intervention allocation will necessarily be efficient. As illustrated by the examples of Diamond (1965) and Starrett (1972), a competitive economy may be pursuing a path on which the capital accumulation is "excessive" and all generations could be made better off by intervention. We may have "Phelps-Koopmans inefficiency" [see Phelps (1962) and Koopmans (1965)], where the rate of interest is below the rate of growth. In this situation, it is possible that state intervention, for example through creation of a national debt, may be unanimously preferred.
The introduction of a state pension scheme is one such instrument for the securing of a desired intertemporal allocation. The state, unlike private pension schemes, can operate on a different basis from conventional borrowing and lending. It can operate a pay-as-you-go scheme. Suppose that the population is growing at rate n. Then a contribution of $1 each by the working population generates a pension payment $(1 + n) per pensioner. Comparing this with the return i on private savings, it appears that the state scheme can offer a "better deal" where i < n, which is precisely the situation of dynamic inefficiency. This aspect was analysed by Aaron (1966). (Where there is labour-augmenting technical progress, then the term n includes the rate of increase in labour productivity.) The welfare consequences of social security can, however, only be assessed in a general equilibrium context. The fact that a generation is offered a pension scheme with a positive lifetime present value, or what is typically referred to as positive "net social security wealth", does not necessarily imply a net welfare gain [Tesfatsion (1984)]. Samuelson (1975) discussed the optimum social security system in the two-period overlapping generations model. Comparing steady states, he observes that a fully-funded state scheme will simply displace private savings, but that a pay-as-you-go scheme can achieve a "golden-rule" steady state. As he notes, a full analysis must take account of the approach to the steady state; and the inefficiency argument only applies where there is initially excess capital. Moreover, consideration has to be paid to the paths of public debt and public capital, which are not always considered explicitly. This aspect is discussed by Burbidge (1983), who distinguishes between the Samuelson-Gale and Diamond versions of the overlapping generations model.
The introduction, or expansion, of social security may lead to changes in public indebtedness or public ownership of productive assets, which will be relevant to the general equilibrium consequences, and hence to the welfare impact. This analysis does not allow for the existence of bequests, which provide a link between generations; and Barro (1974) argues that where there are bequests then any state intervention would be offset by a family re-adjustment. (The presence of a bequest motive may in addition affect the kinds of steady state which may be obtained.) This does, however, assume that no one is constrained in their bequests. It is possible that people would like to pass on less wealth, but are limited in the extent to which they can leave negative bequests. It also depends on the absence of uncertainty. Sheshinski and Weiss (1981) set up a model with overlapping generations where the duration of life is uncertain. They show that, starting at the optimum level, an increase in the level of a fully funded social security system is only partially compensated by a decrease in private savings. The second reason why a state pension scheme may be justified is that people may exhibit myopia and not save enough for their old age if it is left to voluntary decisions. As is noted by Diamond (1977), there are several possible reasons for this: (i) people may lack the information necessary to judge their needs in
retirement; (ii) people may be unable to make effective decisions about such long-term issues, not being willing to confront the fact that one day they will be old and grey; and (iii) they may simply fail to give sufficient weight to the future when making decisions, that is, they may act "myopically" (a fact of which they may themselves be aware but not be able to withstand). As a result, it may be desirable for the government to act paternalistically or for people to pre-commit themselves (tie themselves to the mast to prevent over-spending). The myopia argument could be used to justify compulsory saving in either private or state form, or indeed the kind of contracting-out scheme operated in Britain, where people with at least a prescribed minimum of private cover can opt out of part of the state pension. In assessing the impact of a state pension scheme we have to take into account its interaction with other elements. First, there is the possible effect on other savings. As is noted by Kotlikoff et al. (1982), the introduction of social security pensions may lead to an offsetting reduction in private saving if people foresee the state pensions and if they are not constrained in the capital market. This latter condition is an important one: a person may wish to borrow on the strength of the future pension but be unable to do so in an imperfect capital market. Secondly, in the absence of a state pension there is likely to be some form of income support for the elderly, the need for which would be reduced by the state pension scheme. Forced savings through social insurance may be a method of reducing dependence on public assistance. Or it may be the children of the elderly who would otherwise have had to support them, so that the younger generation are in fact the beneficiaries. The third set of arguments concerns uncertainty. In few other personal decisions can uncertainty play a greater role.
Those entering the labour force in the 1980s are having to plan savings to yield fruit in the second quarter of the twenty-first century. The uncertainties surround the rate of return, the age-income path, the number of children, the length of life, the length of working life, the need for medical and other expenses, and the degree of support from children and other relatives. The existence of such uncertainties does not imply that state intervention is essential. The capital market may offer appropriate instruments: for example, indexed bonds to guarantee purchasing power and annuities to insure against the risk of longevity. The contribution of such markets is, however, limited in reality. One reason is the adverse selection problem discussed earlier; another is moral hazard with regard to decisions such as early retirement. The design of an optimum social insurance scheme taking account of the problem of moral hazard is examined by Diamond and Mirrlees (1978). Their analysis is extended by Whinston (1983) to allow for a diversity of types, and he notes that this may enhance the case for public provision. Alternatively, similar functions may be performed by family arrangements. Kotlikoff and Spivak (1981) have examined the extent to which families can self-insure against uncertain death
810
A.B. Atkinson
through implicit or explicit family agreements with respect to transfers between family members, arguing that even small families can substitute to a large degree for a complete annuity market. One important difference between the family and market arrangements is that the family may be better placed to cope with uncertainty rather than risk. The same may also be true of social, in contrast to private, insurance. The "solidarity", which is an essential part of social insurance, implies a willingness to allow flexibility in contracts where there is agreement that conditions have changed in some way which could not have been foreseen initially. These arguments are all concerned with the role of state pensions in raising individual welfare, seen on a lifetime basis. The redistribution of lifetime income between individuals may be a major objective. Suppose, for example, that we take the simple two-period model. The present value of pre-transfer lifetime income measured in terms of first-period consumption is w. The effect of the pay-as-you-go pension scheme, collecting tw and then paying a pension of tw(1 + n), is to change the present value of lifetime income by tw(n - i)/(1 + i). This generates a uniform gain or loss for everyone as a percentage of pre-transfer lifetime income. On the other hand, a scheme which collected proportional contributions and paid equal cash pensions of tw̄(1 + n), where w̄ denotes the average wage, would change the present value by t[w̄(1 + n)/(1 + i) - w], and this would be progressive relative to the proportional system. The desirability of such a "tilt" in the benefit formula would depend on the distributional objectives pursued. State pension schemes may be redistributive between people in the same cohort in two major ways. The first is where the benefit formula is progressive with respect to past earnings, as just discussed; the second is where the pension is tailored to current income, through an income test.
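The two-period arithmetic above can be checked with a short numerical sketch. All numbers below are illustrative assumptions (the contribution rate t, growth rate n, interest rate i, and the two wage levels), and the equal cash pension is taken to be t·w̄(1 + n) with w̄ the average wage:

```python
# Stylized two-period pension arithmetic (illustrative parameter values).
def payg_pv_change(w, t, n, i):
    """Pay-as-you-go: contribute t*w when young, receive t*w*(1+n) when old.
    Change in present value of lifetime income, in first-period units."""
    return -t * w + t * w * (1 + n) / (1 + i)   # = t*w*(n - i)/(1 + i)

def flat_pension_pv_change(w, w_bar, t, n, i):
    """Proportional contributions t*w, equal cash pension t*w_bar*(1+n)."""
    return -t * w + t * w_bar * (1 + n) / (1 + i)

t, n, i = 0.10, 0.01, 0.03          # contribution rate, growth, interest
wages = [100.0, 300.0]              # a low-wage and a high-wage worker
w_bar = sum(wages) / len(wages)

for w in wages:
    payg = payg_pv_change(w, t, n, i)
    flat = flat_pension_pv_change(w, w_bar, t, n, i)
    print(f"w={w:5.0f}: PAYG change = {payg:7.3f} "
          f"({payg / w:+.2%} of w), flat pension change = {flat:7.3f}")
```

With n < i the pay-as-you-go change is a loss, and the same proportion of w for everyone; the flat-pension variant gains the low-wage worker and costs the high-wage worker, which is the "progressive tilt" described in the text.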
We have referred earlier to income-testing as part of a negative income tax, but there are specifically intertemporal aspects. Suppose that there are no differences in earnings records. Should the state pension be uniform, or related negatively to current income as it would be if the pension were income tested, or if it were supplemented by income-related assistance? The desirable extent of income testing depends on the degree of variation in returns to saving, on the responsiveness of savings to the means test, and on the extent of dispersion of past earnings. It depends also on the extent of stigma associated with the income testing, which may deter claims being made, and on the administrative costs of its operation. The possibility of greater income testing of income support for the aged in the United States is discussed by Berry et al. (1981). Redistribution may be desired on account of the different earnings which people have received while working, but a further important consideration is the difference in access to the capital market. The rates of interest at which people can borrow or lend differ both randomly and systematically across the population. There are limits to lending and borrowing, with people subject to borrowing
constraints to differing degrees. One of the aims of social insurance is to provide an asset which generates a guaranteed rate of return and which is typically rationed in favour of lower income groups.
3.5. The form of income support

There are several dimensions to income support that have not so far been examined in this chapter. I take up two of these: the choice between categorical and non-categorical programmes, and benefits in cash versus benefits in kind. There are others - such as the roles of different levels of government in a federal system [see, for example, Pauly (1973), Buchanan (1974), Oates (1977), and Ladd and Doolittle (1982)] and the choice between income and wage subsidies [see, for example, Zeckhauser (1971), Kesselman (1971, 1973), Garfinkel (1973a) and, in the context of uncertainty, Cowell (1981)] - which should be considered, but space does not permit. In the simple models used in Sections 3.1-3.3, we had in effect a categorical programme providing support to members of group A, entitled to receive it by virtue of their membership in that category, and a non-categorical programme in the form of the social dividend or negative income tax, where the benefit was governed solely by income. The latter appeared in the model where the only difference between people was in their wage rates. In the model of Section 3.4, it was taken for granted that there should be a categorical programme for pensions, in the sense of conditions for eligibility other than income, such as reaching a minimum age. The gains from use of categorical programmes are that one can tailor the benefits more closely to categories of need, such as old age, and that one can take account of differences in labour supply (or other) responses. This may be illustrated here in the context of the static model of the earlier sections, with the modification that we are now concerned both with the position of the group A who are out of the labour force and with that of low paid workers.
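The distinction between the two programme types can be sketched in a few lines. The notation follows the section (a guarantee G and tax rate t for the negative income tax, and a categorical payment R to members of group A), but the numerical values are illustrative assumptions:

```python
# Stylized comparison of non-categorical and categorical support
# (parameter values are illustrative, not taken from the text).
def nit_net_income(wL, G, t):
    """Non-categorical negative income tax: everyone receives the
    guarantee G and pays tax at rate t on earnings wL."""
    return G + (1 - t) * wL

def categorical_net_income(wL, in_group_A, R, G, t):
    """Categorical variant: members of group A (out of the labour force)
    receive R; everyone else is on the negative income tax."""
    return R if in_group_A else nit_net_income(wL, G, t)

G, t, R = 30.0, 0.3, 45.0   # R > G: payments "targeted" on group A
print(categorical_net_income(0.0, True, R, G, t))    # group A member: R
print(categorical_net_income(0.0, False, R, G, t))   # worker with zero earnings: G
print(categorical_net_income(100.0, False, R, G, t)) # low-paid worker
```

Setting R above G, as here, is the categorical case discussed below: someone classified as in group A receives more than a group B member with zero earnings.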
The relation between the two policies, categorical and non-categorical, may be seen if we move in stages from the simple redistributive scheme with which we began, where the uniform payment to group A was financed by a proportional tax on group B. Suppose first that the income tax is converted to a progressive linear tax, with an exemption level, and that the exemption level is then cashed out to yield a negative tax payment below the exemption level. In other words, the after-tax income of those in work becomes G + (1 - t)wL, where G represents the effect of the negative income tax (G is equal to t times the exemption level). The fact that the transfer to group A is now financed in a progressive fashion affects the determination of the optimum level of transfer. It also raises the question as to whether G and R should be set equal. If they are equal, then we have a pure
system of negative income tax; if R is set above G, then the system would be a categorical one, in that those classified as in group A would receive more than if they were in group B with zero earnings. The choice in this simple model is relatively straightforward. If one asks whether it would be desirable to raise R above G, holding revenue constant (so R rises and G falls), then it would appear obvious that this would be preferred on distributional grounds, since R is better "targeted" on those with low incomes. The fall in G would hit low paid workers, but it would also affect the higher paid. We have, however, also to bear in mind the impact on labour supply. In the present case, this is likely to reinforce the argument. A fall in G will have an income effect on the working population, and on the assumption that leisure is a normal good will lead to an increase in revenue, allowing R to be increased still further. The model just described is similar to Akerlof's (1978) discussion of "tagging". Akerlof considers a situation where there are two classes of workers, one skilled and the other unskilled, each with a specified wage rate. There is no decision about hours of work, but the government's scope to redistribute from the better paid skilled workers to the less well paid unskilled workers is limited by the requirement that skilled workers do not switch to take unskilled jobs. This latter constraint may, however, be eased if it is possible to identify unskilled workers and present them with a different tax/benefit schedule. Akerlof supposes that a proportion β of the unskilled workers can be "tagged" and shows that such tagging allows a higher transfer to be made: "total welfare could be raised by giving increased subsidies to the tagged poor" [Akerlof (1978, p. 17)]. It should be noted that where, as in this model, the recipient group have other sources of income, the categorical programme may involve a higher level of payment and a different rate of tax.
(This differs from a two-tier negative income tax in that a person must meet the qualifying condition in order to benefit from this branch of the budget constraint.) These examples show the potential gains from categorical programmes, but there are also disadvantages. Akerlof notes (i) the administrative costs of the categorical system, including those to the recipients, (ii) the inequity which arises where there are errors in classifying the eligible (the 1 - β who do not receive the preferential treatment in Akerlof's model), and (iii) the incentive for people to misrepresent their circumstances so as to qualify. The implications of errors in classification have been examined in greater depth by Stern (1982), who compares the benefits from the use of categorical information and the costs of erroneous classification, including that of horizontal inequity. The second question considered here is that of benefits in cash versus benefits in kind. The models employed in the analysis have not allowed for differentiated goods, but may readily be extended. Let X1 be the commodity in question (for example, housing or food). The effect of provision in kind may be represented by
the addition of an element T[X1, p1, Y] to the individual budget constraint, where p1 is the price of the good and Y denotes income. We may distinguish several forms of provision: (i) provision of a specified quantity in kind, G1, so that T = p1 min{X1, G1}; (ii) a subsidy to the purchase of the good, which may be specific or ad valorem, so that T = tX1 or tp1X1; and (iii) income-related provision of the good, or subsidy, so that G1 and/or t are functions of Y. In practice, these schemes are of considerable complexity. In Britain, the housing benefit scheme (in 1985) involves a rebate on rent for tenants:
T = min{p1X1, max[h1p1X1 - h2(Y - h3), 0]},     (3.14)
where 0 < h1, h2 < 1, and h3 depends on family size. The reader may well find this confusing, but in fact the representation (3.14) is a simplification of the way in which the scheme operates in practice. The provision of support in kind, that is, support tied to the purchase of particular commodities and assumed not to be resaleable, runs counter to the typical liberal economist's recommendation that redistribution take the form of money income. The welfare economic basis for such a recommendation in a perfectly competitive economy without external effects or non-convexities and with full information has been set out by Arrow (1963, p. 943): "social policy can confine itself to steps taken to alter the distribution of purchasing power. For any given distribution of purchasing power, the market will, under the assumptions made, achieve a competitive equilibrium which is necessarily optimal; and any optimal state is a competitive equilibrium corresponding to some distribution of purchasing power." There are several reasons why this may not apply. First, as observed by Foldes (1967a, 1967b, 1968), there may be difficulties when the competitive equilibrium is not unique. As he shows in a two-good example, if there are multiple equilibria, it may not be possible to attain a desired allocation simply by redistributing one good (money). Redistribution of all goods may be necessary [see also Samuelson (1974)]. This argument is valid within the confines of the assumptions made in the traditional model. The second argument introduces considerations not present in that model. Foldes supposes that the government is uncertain as to individual preferences and argues that again the government can attain the highest level of social welfare only by redistributing all goods. For comments on Foldes' argument, including the possibility of a multi-stage process, see Buchanan (1968).
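The housing-benefit formula (3.14) can be implemented directly. The parameter values below are illustrative assumptions, not the actual 1985 rates:

```python
# A direct implementation of the housing-benefit formula (3.14).
def housing_benefit(rent, Y, h1, h2, h3):
    """Rebate on rent p1*X1 for a tenant with income Y:
    T = min{p1*X1, max[h1*p1*X1 - h2*(Y - h3), 0]}."""
    return min(rent, max(h1 * rent - h2 * (Y - h3), 0.0))

h1, h2, h3 = 0.6, 0.25, 50.0    # assumed proportions and family-size term
print(housing_benefit(40.0, 50.0, h1, h2, h3))   # income at h3: rebate h1*rent = 24.0
print(housing_benefit(40.0, 100.0, h1, h2, h3))  # higher income: tapered to 11.5
print(housing_benefit(40.0, 400.0, h1, h2, h3))  # benefit extinguished: 0.0
```

The three cases trace out the scheme's shape: a rebate of a fraction h1 of rent at the income threshold h3, withdrawal at rate h2 above it, and a floor of zero once the taper exhausts the rebate.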
A third reason why in-kind provision may be necessary is that there may be interdependent preferences, where these preferences cover the form in which
income is spent and not just the total level of utility. The goal of eradicating specific poverty discussed in Section 2 is an example. Suppose that the consumption of one good by the recipient group enters the utility functions of others. In this situation, lump-sum transfers of purchasing power cannot ensure a Pareto-efficient allocation [Daly and Giertz (1972)]. Moreover, voluntary transfers in kind leave scope for Pareto improvements [Daly and Giertz (1976)]. The same will in general be true if the level of state transfer in kind is determined by regard only to the objectives of the taxpayers, or what Browning refers to as "paternalistic" transfers. Garfinkel (1973b) points to the need to take account of the preferences of the beneficiaries. As is noted by Brennan and Walsh (1977), the situation described above differs from that where the welfare of the poor enters the donors' utility functions, in that the change in the consumption pattern of the poor could be achieved without making the poor any better off. This analysis has largely been based on the assumption that the economy is otherwise distortion-free and [with the exception of Foldes (1967a)] that the government is fully informed. Relaxing both of these assumptions has featured prominently in the agenda of the optimum taxation literature. Nichols and Zeckhauser (1982) draw attention to the way in which in-kind transfers may be used to discriminate between those genuinely eligible for support and those who are "impostors". Suppose that there are two groups, one eligible and one declaring the same income but in fact better off. If there is a good with a negative income elasticity of demand (Nichols and Zeckhauser refer to "lower quality housing"), then provision in kind of the amount chosen by the eligible group would act as a deterrent to the impostors.
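A minimal sketch of this screening logic can be given under assumed functional forms. The quadratic valuation, the incomes, the price, and the grant size below are all invented for illustration; the point is only that a bundle of the inferior good sized to the eligible group's own choice is valued much less (here, negatively) by a better-off impostor, whereas a cash grant of the same cost would be valued equally by both:

```python
# Sketch of the Nichols-Zeckhauser screening idea (assumed functional form).
def value_of_good(x, Y, a=10.0, b=0.02):
    """Consumption value of x units of the inferior good for someone with
    true income Y; the marginal value a - b*Y falls with income (negative
    income elasticity), so richer "impostors" value the good less."""
    return (a - b * Y) * x - 0.5 * x ** 2

Y_eligible, Y_impostor = 100.0, 400.0    # same declared income, different truth
p = 1.0                                  # market price of the good
G = (10.0 - 0.02 * Y_eligible) - p       # in-kind grant = eligible group's choice

print(value_of_good(G, Y_eligible))      # eligible household's valuation
print(value_of_good(G, Y_impostor))      # impostor values the same bundle less
```

Because the non-resaleable bundle is worth less than its cash cost to the impostor (and here is actively disliked), the in-kind form deters the false claim that an equal-cost cash transfer would attract.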
The interrelation of a general subsidy with other instruments of policy has been investigated as part of the literature on the design of indirect taxation [for example, Diamond and Mirrlees (1971) and Atkinson and Stiglitz (1976)] and on public sector pricing [for example, Feldstein (1972) and L. S. Wilson (1977)]. The argument for a subsidy depends on the use which can be made of other tools for redistribution and the extent to which people differ in the prices they face for the good and in their tastes for its consumption. In an analysis of housing subsidies, Atkinson (1977) shows that where people differ in their wage rates, but not in the prices they face, then there is no case for a subsidy that varies with housing outlay if the income tax schedule may be freely varied and there is separability between housing and leisure in the individual utility function. Where these conditions do not hold, then there may be a case for a subsidy. (For further discussion of housing subsidies, see Chapter 7 by Rosen in Volume I of this Handbook.) The general question of public price discrimination where the price can differ between households is discussed by Friedman and Hausman (1974) and Le Grand (1975). The interrelation between in-kind provision of goods and other instruments is examined in papers by Guesnerie (1981), Roberts (1978), and Guesnerie and
Roberts (1984). These have brought out the interrelation between the desirability of in-kind redistribution and the existence of distortionary taxes. In some cases, the results are counter-intuitive: for example, that it may be "luxuries" which should be provided in kind [Guesnerie and Roberts (1984, p. 79)]. Such results are a warning that, once we enter the territory of the second best, intuition may be highly misleading; but they may also lead us to question whether important features have been excluded from the analysis.
3.6. Concluding comment

In this section I have examined some of the theoretical issues underlying the design of income support, and in the next section I go on to consider some of the ways in which it works out in practice. In both cases, it is important to remember that it is the way in which the system is perceived to work that may be crucial to its effects and effectiveness. Behavioural responses may be based on perceptions which bear little relation to reality. Support for a programme of income maintenance may derive from a social mythology in which notions such as that of a contributory principle play a potent role in providing "the cement which holds together a social system" [Hollister (1974, p. 39)].
4. The impact in practice

The theoretical literature reviewed in the previous section is intended to illuminate policy decisions, and in this respect it has made valuable contributions, but there remains a large gulf between the problem as posed in that literature and that viewed by those actually concerned with the development of policy. An example is the treatment of income-tested benefits. Most of the models described above ignore the aspect of means testing which has generated most concern: the stigmatising effect of receipt, which deters people from claiming. This is one of the features of reality which is considered in this section. I begin, however, by considering in more detail the actual form of benefit and tax schedules, and the complex budget constraints to which they give rise.
4.1. Complex budget constraints

The budget constraint considered in the theoretical section is typically of a straightforward form, with one or two linear segments. In reality, the constraint is likely to be considerably more complicated, as illustrated by the account of the U.S. federal income tax given by Hausman in Chapter 4, Volume I, of this
Handbook. On the face of it, the tax schedule has a steady progression of marginal rates, from 14 percent on the first band of taxable income, rising through 16, 18, 21, up to 50 percent at the top (all figures relate to 1980). But for families with dependent children, the earned income tax credit introduced in 1975 modifies the picture. There is first a subsidy of 10 percent at the margin on earnings and then an additional marginal tax, via the withdrawal of the credit, until the benefit from the credit is extinguished. At this point, the marginal tax rate becomes lower than on the earlier tranche of earnings. As a result, there is a region where the marginal tax rate falls with earnings, creating a non-convexity in the budget constraint, as is illustrated by Figures 2.5 and 2.6 in Chapter 4 of this Handbook. This is in contrast to the situation typically shown in public finance textbooks where the marginal rate of tax rises with gross income (usually referred to as a "progressive tax system"), leading to a convex budget constraint, as illustrated in Figure 2.2 in Chapter 4 of this Handbook. The earned income tax credit applies only to those with dependents and to the relatively low paid, but a non-convexity is also generated by the form of the contributions to finance social security under the Federal Insurance Contributions Act (FICA), which are levied on earnings up to an upper limit, causing the marginal tax rate to fall at that point. Further sources of non-convexities in the budget constraint are the income transfer programmes designed to help families on low incomes. This is illustrated by the operation of the Aid to Families with Dependent Children (AFDC) programme in the United States [see Williams et al. (1982, pp. 482-497) for a description]. 
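The non-convexity created by the earned income tax credit can be traced with a stylized schedule. The 10 percent subsidy and the 14 percent bottom statutory rate follow the text; the bracket amounts and the 12.5 percent withdrawal rate below are illustrative assumptions, not the statutory parameters:

```python
# Stylized earned income credit in the spirit of the 1975 credit described
# in the text (bracket amounts and phase-out rate are assumed values).
def credit(E, rate_in=0.10, cap=5000.0, phase_start=6000.0, rate_out=0.125):
    """10% subsidy on earnings up to a cap, then withdrawn at 12.5%
    of earnings above the phase-out threshold, floored at zero."""
    c = rate_in * min(E, cap)
    c -= rate_out * max(E - phase_start, 0.0)
    return max(c, 0.0)

def marginal_rate(E, base_tax=0.14, dE=1.0):
    """Effective marginal tax rate on the bottom statutory band:
    statutory rate minus the change in the credit per extra dollar."""
    return base_tax - (credit(E + dE) - credit(E)) / dE

for E in (2000.0, 7000.0, 12000.0):
    print(f"E={E:7.0f}: credit={credit(E):6.2f}, MTR={marginal_rate(E):.3f}")
```

The effective marginal rate is low in the subsidy range, jumps up in the phase-out range, and then falls once the credit is extinguished; that fall with rising earnings is exactly the non-convexity described in the text.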
This scheme is administered at the state level and the provisions vary widely, but as shown in Chapter 4 of this Handbook, recipients may be faced with a higher marginal tax rate on earnings than that levied on any taxpayer under the income tax. In Britain, the income-tested programme designed to help working families with children involves (in 1985) a 50 percent rate of withdrawal of benefit, which is substantially higher than the 30 percent basic rate of income tax (and not far short of the top rate of income tax of 60 percent). The complexity is considerably increased by the interaction and overlap between different elements in the system. Receipt of one benefit may preclude receipt of another; alternatively, beneficiaries may be able to cumulate entitlements just as French politicians cumulate elected offices. Where benefits are cumulative, there are complications introduced by the fact that one benefit may take account of the amounts received under another scheme. The procedure by which the benefits from one programme are counted as income for the purpose of eligibility and benefit determination for a second programme is referred to as "sequencing" by Barth et al. (1974, p. 116). They give the example in the United States of AFDC payments being counted as income for food stamps and public housing benefits. In Britain, the decline in the income tax threshold, relative to average earnings, and the rise in the limits of eligibility for income-tested
benefits, has meant that there is quite a wide range of potential overlap between paying income tax and receiving income-tested benefits. Since income tax (in 1985) is payable at the full basic rate of 30 percent from the first £1 of taxable income, this leads to a marginal tax rate for recipients of 80 percent plus National Insurance contributions of at least 6.85 percent. If, in addition, the family is eligible for and claims housing benefit, then the marginal rate can (in 1985) exceed 100 percent (the system is being reviewed). The situation in 1985 is illustrated in Figure 4.1, which shows the relation between gross earnings and net income (including benefits) for a hypothetical family consisting of a man with a wife, who is not in paid work, and three children, and living in a rented house. There is a range of gross earnings where there is little increase in net income as gross income increases, and indeed net income falls at certain points. (In considering the marginal tax rates, one has to take account of the lagged adjustment of benefits to changes in income, so that benefit is not withdrawn immediately.) The possibility of such a "poverty trap" or "poverty plateau" was noted in Britain by Lynes (1971), Piachaud (1972), and Prest (1970). In the United States it was analysed by Hausman (1971), Aaron (1973), and Lampman (1975). Its actual importance depends on the number of people in the relevant income ranges (and those who would choose to be in that range if it were not for the trap). This number may not in practice be large, since the benefits are often confined to the lowest income levels, but the poverty trap has played an important part in debate about income support and its reform. Aaron (1973) describes how the high marginal rates implicit in different reform proposals advanced in the early 1970s, including Nixon's Family Assistance Plan, were one of the main reasons why the proposals failed to pass Congress. 
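The cumulative withdrawal behind such a "poverty trap" is simple arithmetic. The income tax, National Insurance, and benefit-withdrawal rates below are those quoted in the text for 1985; the housing-benefit taper is an illustrative assumed value:

```python
# Cumulative marginal rate in the 1985-style British example: each extra
# pound of gross earnings faces income tax, National Insurance, and the
# independent tapers of two income-tested benefits at once.
income_tax = 0.30           # basic rate, payable from the first pound of taxable income
national_insurance = 0.0685 # minimum contribution rate quoted in the text
fis_taper = 0.50            # withdrawal rate of the family income-tested benefit
housing_taper = 0.21        # housing-benefit taper (assumed, for illustration)

combined = income_tax + national_insurance + fis_taper + housing_taper
print(f"Combined marginal rate: {combined:.2%}")
```

Because the tapers operate independently rather than sequentially, the combined marginal rate here exceeds 100 percent: an extra pound of gross earnings reduces net income.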
As Aaron brings out, it is not just the actual but also the apparent disincentive which has to be avoided. The effect of such complex budget constraints on individual work decisions is discussed by, among others, Diamond (1968), Hall (1973), Burtless and Hausman (1978), Hanoch and Honig (1978), Heckman and MaCurdy (1981), and Brown (1980). Where the budget constraint is convex, then the choice may be analysed in terms of the slope (which depends on the marginal tax rate) and the intercept with the vertical axis (the "virtual income"). This is illustrated in Figure 2.2 in Chapter 4 of this Handbook, where the individual's choice of hours is that which would be generated by a linear budget constraint with slope w2 and virtual income Y2. It is, however, important to note that the marginal tax rate faced by the person is endogenous. This means that in econometric work one cannot treat the marginal tax rate as a parameter; one needs to take the full budget set (which is exogenous) into account. Where the budget constraint is non-convex, then the resulting labour supply curves may be discontinuous. A person may be indifferent between two distinct levels of work, as illustrated in Figure 2.8 in Chapter 4
[Figure 4.1 appears here: the upper panel plots net family income (£ per week) against gross earnings of the husband, showing the schedules with and without FIS and housing rebates; the lower panel plots net family income against gross earnings of the wife.]
Figure 4.1. Gross and net income for family with three children in Britain at different earnings of husband (upper diagram) and wife (lower diagram). Note: FIS denotes Family Income Supplement; the difference between the solid and dashed lines in the upper diagram shows the effect of this and housing rebates (also a means-tested benefit).
of this Handbook, where it is noted that "small changes in ... any parameter of the tax system can then lead to large changes in desired hours of work" (p. 221). Attention has tended to focus on the implications of the marginal tax rate on additional earnings by the family head. This is not, however, the only, nor indeed the most important, margin at which labour supply decisions may be made. Quantitatively more important are likely to be decisions by other family members, particularly those with regard to participation. [For studies of labour force participation, a subject not covered in Section 5, see Chapter 4 of this Handbook, and Hausman (1980), Heckman and MaCurdy (1980), Zabalza (1983) and Blundell et al. (forthcoming).] The tax and social security system typically treats earnings by other workers in a different fashion. In Britain, there is a separate wife's earned income allowance for income tax, and a separate earnings exemption for housing benefit. This means that £1 of earnings by the wife may bear a significantly lower marginal rate. This is illustrated in the lower part of Figure 4.1, which shows the implications for net family income if it is the wife, rather than the husband, who receives the additional earnings. There is a "notch", but in general the curve rises more steeply for the wife, indicating a lower marginal tax rate. This means that we have to consider the family labour supply decision as a whole [see Smith (1979) and Hausman and Ruud (1984)]. The work decisions made by families have many dimensions: for example, hours of work, choice of job, education and training, and retirement. Of particular significance here are decisions which may lead to an entitlement to categorical benefits. A person may retire and hence become eligible for a pension; a person may claim the status of disability and become eligible for disability benefits.
These possibilities may generate a further non-convexity in the budget constraint, as illustrated in Figure 3.2, where a person may be either on the segment AB, or, if they satisfy the retirement condition, at the point R. The retirement condition may be enforced less sharply. It is implemented in Britain and the United States by an earnings test, which reduces the pension if earnings rise above a certain level, until the pension is extinguished. In other words, there is a segment joining R to the line AB with a slope steeper than that of AB. In considering the date of retirement, we need to bear in mind the intertemporal aspects. To continue after the minimum retirement age may provide an increment to the pension and, with an earnings-related pension scheme, staying at work may allow a person to improve his or her earnings record. These aspects are discussed further in Section 5, when we come to the empirical evidence about retirement decisions. The intertemporal aspects may be important for other work decisions. A person who is unemployed and considering accepting a job offer (or, conversely, who is employed and is considering quitting) may consider the implications not just for current income but also for the income stream over several periods. A decision to turn down a job offer may jeopardise future benefit receipt, and continued dependence on benefit may reduce future entitlement (as where there is a maximum duration of benefit).
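The earnings-tested segment just described can be sketched as follows; the pension level, disregard, and taper rate are illustrative assumptions rather than actual British or U.S. parameters:

```python
# Sketch of an earnings-tested pension (illustrative parameter values).
def net_income(E, P=50.0, D=30.0, taper=0.5):
    """Earnings E plus pension P, with the pension withdrawn at rate
    `taper` once earnings exceed the disregard D, until extinguished."""
    pension = max(P - taper * max(E - D, 0.0), 0.0)
    return E + pension

for E in (0.0, 30.0, 80.0, 130.0, 200.0):
    print(f"E={E:5.0f}: net income = {net_income(E):6.1f}")
```

Below the disregard the full pension is received; over the taper range each extra unit of earnings raises net income by only 1 - taper; and at E = D + P/taper the pension is extinguished and the schedule rejoins the no-pension line, reproducing the segment joining R to AB described above.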
Finally, eligibility for benefits may depend not just on income but also on wealth. Public assistance typically has an upper limit on assets or imputes an above-market return. Asset tests in the United States are described by Moon (1977b) for the aged and disabled, by Lurie (1977) for cash benefits for the non-elderly, and by Hausman (1977) for housing, medical care, and higher education. A further important factor is that hypothetical calculations of the poverty trap do not always take full account of the detailed provisions of the benefits, including the exemptions for certain types and amounts of income and the exercise of local discretion. The implications for the effective marginal tax rates under public assistance in the United States are discussed by Lurie (1974), Rowlatt (1972), Barr and Hall (1975), Hutchens (1978), and Moffitt (1979).
4.2. Non-receipt of benefits

For taxes, there is an incentive for the taxpayer to try to opt out and for the tax authorities to make sure that he does not succeed in evading their net. Applying the same argument in reverse to benefits, we may expect the potential recipients to have every incentive to opt in, and the responsibility of the authorities to be the detection of ineligible claimants. Experience in several countries, however, has shown that non-receipt by eligible families is a serious problem, at least for income-tested benefits. Families may be unaware of the existence of the benefit, a factor which may be especially important with new programmes. Families may be aware of the benefit but believe that they are not eligible, as may happen if they have previously applied in different circumstances and been turned down. They may be aware that they are eligible but not claim because of the costs of doing so, such as the time spent queuing in offices and in completing forms, as may particularly be the case where the amount of benefit involved is small. They may not claim because they do not wish to be identified as receiving a benefit which they feel to be stigmatising. And they may have claimed but been rejected as a result of administrative error. Eligibility for a means-tested benefit typically depends on a range of conditions in addition to income. In some cases these are easy to check, such as age, but others may involve a substantial element of judgement, such as whether two people are deemed to be co-habiting. This means that it may not be easy to identify those people who are eligible but not claiming. One possible approach is to screen out potential recipients from a sample survey and then invite them to apply for the benefit, in order to confirm their eligibility according to the administrative procedures.
However, this is unlikely to yield a correct estimate of eligible non-recipients since, unless the main reason for non-claiming was lack of information, many are still going to refuse. Studies of non-receipt have therefore usually been based on survey information supplied directly by respondents, and the need to rely on this source must qualify
Ch. 13: Income Maintenance and Social Insurance
the conclusions drawn. The studies also typically involve secondary analysis of survey data not explicitly collected for this purpose and hence may not contain all the details required. In Britain, much of the evidence is derived from the Family Expenditure Survey (FES), which is carried out primarily to collect information for the annual adjustment of the weights in the Retail Prices Index. The problems which arise in the use of the survey to estimate the extent of take-up are discussed in Atkinson (1984). Concern about non-take-up in Britain has been particularly directed at the means-tested Supplementary Benefit (SB), which replaced National Assistance in 1966. One of the reasons for the change was the evidence, confirmed in an official enquiry [Ministry of Pensions and National Insurance (1966)], that a sizeable number of pensioners were not claiming the assistance to which they were entitled. The changes in administration accompanying the introduction of SB were intended to improve the take-up rate. However, the analysis in Atkinson (1969) showed that more than half of the ensuing increase in the number of recipients could be attributed to a more generous scale rate rather than to improvement in take-up. Later analyses based on the FES for the 1970s confirmed that a sizeable minority were not claiming. The estimated percentage of non-claimants varied between 24 and 35 percent for families over pension age, and between 21 and 30 percent for those below pension age over the period 1973-1981 [Atkinson (1984, table 1)]. In terms of the proportion of benefit not claimed, the record is better, but still not impressive: in 1981 the estimated amount of benefit not claimed was 15 percent of the total [Atkinson (1984)]. Evidence about non-take-up of public assistance is found in other countries. For example, there are studies for West Germany by Hauser et al. (1981), Kortmann (1982) and, in German, Hartmann (1981). 
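The two measures of take-up used in these studies — the proportion of eligible units claiming (caseload take-up) and the proportion of total entitlement actually paid out (expenditure take-up) — may be sketched as follows. The survey records below are hypothetical, not the FES data; the sketch only illustrates why the two rates diverge.

```python
# Illustrative sketch: caseload versus expenditure take-up rates computed
# from survey records.  Each tuple is (weekly entitlement, claimed?); the
# records are hypothetical, not drawn from any actual survey.

survey_units = [
    (12.0, True), (3.5, False), (20.0, True), (1.0, False),
    (15.0, True), (8.0, True), (2.5, False),
]

eligible = len(survey_units)
claimants = sum(1 for _, claimed in survey_units if claimed)
caseload_takeup = claimants / eligible

total_entitlement = sum(e for e, _ in survey_units)
claimed_entitlement = sum(e for e, claimed in survey_units if claimed)
expenditure_takeup = claimed_entitlement / total_entitlement

print(f"caseload take-up:    {caseload_takeup:.0%}")
print(f"expenditure take-up: {expenditure_takeup:.0%}")
# Small entitlements tend to go unclaimed, so the expenditure-based rate
# typically exceeds the caseload-based rate, as in the FES estimates.
```

This is consistent with the pattern reported above, where the proportion of benefit not claimed (15 percent in 1981) is smaller than the proportion of non-claimants (21–35 percent).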
Take-up of FIS in Britain provides an interesting case study, since it was a new benefit, having no direct association with the old poor law, and the government attempted to secure a high take-up (the target was 85 percent). From the outset, it was clear that this was not being achieved; and a special survey of low income families in 1978-79 [Knight (1981)] estimated that the take-up was 51 percent in Great Britain. The time series of the number of recipients has been analysed in Atkinson and Champion (1981), who find a significant relationship with the variation in the generosity of the scale relative to the earnings of the lowest decile of male manual workers, but no evidence of the separate upward trend over time which might indicate improved take-up. In the United States, attention has focused on the take-up of AFDC, of Supplemental Security Income (SSI) for the aged, blind and disabled, and of food stamps. In the case of AFDC, the evidence shows the participation rate in the late 1960s as being low [45 percent for the basic family portion and 27 percent for the unemployed parent (UP) portion in 1967], but as having increased in the early 1970s, reflecting the outreach efforts of administrators and the activities of rights workers. By 1977 the estimated take-up rates were 94 and 72 percent,
A.B. Atkinson
respectively. These figures are taken from Warlick (1981a), who gives an assessment of their accuracy and draws attention to the sensitivity of the estimates to the methods applied [see also Moffitt (1983, p. 1028, fn. 7)]. She notes for instance that adjustment for ineligible participants (those estimated by the researchers not to be eligible but who were in receipt) had the effect of reducing the estimated participation rate by 7 percentage points. The official study by the United States Department of Health, Education and Welfare [Projector and Murray (1978)] confirmed that take-up was low in 1970. They concluded that only one in three of the apparently eligible families reported receipt of public assistance in the Current Population Survey, although this was qualified by the fact that receipt may be incompletely reported and that there was only limited information to assess eligibility. Warlick (1981a) quotes estimates for the take-up of other income-tested benefits, including the figure of 37.5 percent for participation in the food stamp programme in 1974 given by MacDonald (1977) and her own estimate of 47 percent for SSI in 1975. In the latter case, she notes the wide range of estimates which can be obtained by applying different methods, but in all cases the estimated participation is well below 100 percent. Incomplete take-up of income-tested benefits has two important consequences. The first is that the poverty trap described in Section 4.1 may be less serious in fact than it appears in theory. Calculations based on hypothetical families are misleading in that they assume that benefit entitlement is taken up. The second important consequence of non-take-up is that the means-tested benefits fail to provide a fully satisfactory safety net. We examine next the evidence about the effectiveness of income support programmes in preventing poverty.
4.3. Benefits, taxes, and poverty

In their assessment of the first decade of the War on Poverty in the United States, Plotnick and Skidmore estimate that before transfers 24.8 percent of households were below the official SSA poverty line in 1972, with a total pre-transfer income gap of $34 billion per year [Plotnick and Skidmore (1975, table 6.1)]. The transfers studied included OASDI, public assistance, unemployment insurance, workmen's compensation, veterans' pensions, and government employee pensions; and the total cost was $62 billion in 1972. The effect of these transfers was estimated to reduce the poverty figure to 14.0 percent and the income gap to $12.5 billion (Plotnick and Skidmore, table 6.2). In other words, the transfers reduced the poverty population by 44 percent and the poverty gap by some 64 percent. These estimates have to be qualified, since transfer payments are believed to be under-reported and since in-kind transfers are excluded. Plotnick and Skidmore quote the estimates of Smeeding (1975), which allow also for the impact of taxes. These adjusted figures show the reduction in the poverty
population to be 72 percent in 1972 and the fall in the poverty gap to be 83 percent. More recent estimates by Danziger et al. (1984) show 24.2 percent of people (as opposed to households in the figures quoted above) being below the official poverty line in 1983 before transfers, with the post-transfer figure being 15.2 percent, or between 10 and 13 percent if in-kind transfers are included. It should be stressed that these, and the calculations that follow, assume that neither taxes nor transfers affect the pre-transfer income of the poor. The possible impact on a range of decisions is the subject of Section 5. In Britain, corresponding estimates have been made by Beckerman (1979a) for 1975 using data from the FES. He takes as the poverty line the Supplementary Benefit standard and calculates that before benefits 22.7 percent of the population were below this level, with an income gap of £5.9 billion per year. The benefits covered are cash transfers via National Insurance, child benefit, SB and other programmes, with a total cost of £8.0 billion. The proportion below the poverty line after benefits is estimated to be 3.3 percent and the total income gap £0.25 billion. The percentage reduction in the number in poverty is therefore 85 percent and the percentage reduction in the poverty gap is over 95 percent. Calculations of this kind have been used in a variety of ways to assess the effectiveness of income support. The simplest is the "success rate" in terms of reducing the measure of poverty, such as the headcount and the poverty gap used above. The shortcomings of the former measure, discussed in Section 2.2, seem particularly important here, since a reduction in the number in poverty could well be achieved at the expense of deepening poverty for others. 
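The two "success rate" measures just described are simple proportional reductions, which may be sketched as follows using the Plotnick and Skidmore (1975) figures for the United States in 1972 quoted above.

```python
# The two "success rate" measures discussed above: the proportional fall in
# the poverty headcount and in the poverty gap.  Inputs are the Plotnick and
# Skidmore (1975) estimates for the United States in 1972 quoted in the text.

def reduction(pre, post):
    """Proportional fall from the pre-transfer to the post-transfer value."""
    return 1 - post / pre

# Headcount: 24.8 percent of households poor before transfers, 14.0 after.
headcount_fall = reduction(pre=24.8, post=14.0)
# Poverty gap: $34 billion before transfers, $12.5 billion after.
gap_fall = reduction(pre=34.0, post=12.5)

print(f"headcount reduced by {headcount_fall:.0%}")   # about 44 percent
print(f"poverty gap reduced by {gap_fall:.0%}")       # about 63 percent
```

The slight discrepancy between the computed 63 percent and the "some 64 percent" quoted in the text presumably reflects rounding in the underlying gap figures.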
For example, the conversion of a universal benefit to a means-tested basis could allow the level to be raised, thus benefiting those low income families who claim but making considerably worse off those low income families who do not claim. Why do the present systems not have 100 percent success? Here we must distinguish between the British and U.S. calculations. In the British case, the poverty line is the same as the income floor which SB seeks to provide. People therefore fall below only insofar as they do not claim or they are not fully covered by SB. The latter applies to those families where the head is in full-time work, and one of the main discoveries of the 1960s, notably by Abel-Smith and Townsend (1965), was the extent of poverty amongst working families. Non-take-up applies, for instance, to the elderly, where the basic National Insurance pension has typically been insufficient on its own to raise people above the SB scale and those without occupational pensions or substantial savings have been dependent on means-tested assistance. For discussion of the composition of the low income group and the proximate causes of poverty in Britain, see Atkinson (1969), Fiegehen et al. (1977), Townsend (1979), Berthoud et al. (1981), and Beckerman and Clark (1982). In the United States, the poverty line is not related to social security scales, and there is no presumption that benefits will raise all families above it. It was not
therefore surprising that the studies in the 1960s laid considerable stress on groups such as the aged who were in receipt of transfers. While the Council of Economic Advisers (1964) correctly drew attention to the pervasiveness of poverty ("the poor are found among all major groups in the population", p. 62), the incidence of poverty was considerably higher for the aged, for families with a female head, and for the disabled. The position has however changed over the past twenty years. In particular, the transfers to the elderly appear to have become more effective in preventing poverty, but transfers for certain other groups have become less effective. Danziger et al. (1986) show that the proportion of the aged removed from poverty has risen over the period 1965-1982 from 57 to 78 percent in the case of whites and from 26 to 50 percent for blacks. But the corresponding percentages for families headed by a woman with children have fallen from 30 to 14 percent for whites and from 11 to 8 percent for blacks. A second criterion in assessing income support programmes has been target efficiency, where it should be stressed that the word "efficiency" is being used in a specific sense which does not take account of the aspects more usually included under this heading in welfare economics. Target efficiency was introduced by Weisbrod (1969), who noted that there are two different elements to efficiency in this context: the accuracy of a programme in assisting only those in the target group and the comprehensiveness of the programme in assisting all of the target group. The latter, which he refers to as "horizontal" efficiency, is related to the criterion of reducing the poverty count that we have already discussed. The former, or "vertical" efficiency, concerns the proportion of the total expenditure which goes to the pre-transfer poor and its relation to the amount necessary to raise them to the poverty line.
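Weisbrod's two notions reduce to simple ratios. The sketch below applies them to the aggregate figures quoted in Section 4.3 (poverty gaps and transfer costs for the United States in 1972 and Britain in 1975); "vertical" efficiency is taken here as the share of total transfer spending that went to closing the poverty gap, and "horizontal" efficiency as the share of the pre-transfer gap that was closed.

```python
# Sketch of Weisbrod-style target efficiency, applied to the aggregate
# poverty-gap and transfer-cost figures quoted in Section 4.3.

def target_efficiency(gap_pre, gap_post, total_spending):
    gap_closed = gap_pre - gap_post
    return {
        "vertical": gap_closed / total_spending,  # accuracy of targeting
        "horizontal": gap_closed / gap_pre,       # comprehensiveness
    }

# United States, 1972: gap $34bn -> $12.5bn, transfers $62bn.
us_1972 = target_efficiency(gap_pre=34.0, gap_post=12.5, total_spending=62.0)
# Britain, 1975: gap £5.9bn -> £0.25bn, transfers £8.0bn.
uk_1975 = target_efficiency(gap_pre=5.9, gap_post=0.25, total_spending=8.0)

print(us_1972)  # vertical roughly 0.35, as quoted in the text
print(uk_1975)  # vertical roughly 0.71 ("some 70 percent")
```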
A programme may be less than 100 percent efficient on this basis if part of the benefit goes to those above the poverty line or if the size of the transfer exceeds the amount required to fill the poverty gap. From the calculations given above, it may be seen that the reduction in the poverty gap was some 35 percent of the total transfer spending in 1972 in the United States and some 70 percent in 1975 in Britain. [For further analysis of the target efficiency of different programmes in the United States, see Garfinkel and Haveman (1977, ch. 5).] In the case of Britain, Beckerman notes that the high level of efficiency is surprising in view of the fact that most benefits are not income tested, but he points out that the subtraction of state transfers would leave the vast majority of pensioners with incomes below the poverty line. This of course takes no account of the incidence of such transfers. If, for example, in the absence of state pensions the children of the elderly would provide them with income support, then the reduction in poverty achieved by the state transfers may be quite a lot less. The calculations of pre-transfer incomes assume that earned income and savings would be the same if there were no income maintenance provisions. Given the scale of the transfers, this seems an heroic assumption; and there are good reasons to expect for example that the
incomes of the elderly would be different in the absence of state pensions, as would the number of elderly people living on their own. The prediction of such changes is, however, a difficult matter. They will depend on the behavioural changes by the recipients, and some of the most important are considered in Section 5. The contribution to the abolition of poverty depends also on the valuation placed on the transfers. Reference has been made to in-kind transfers, and in the United States there has been considerable controversy surrounding their treatment, not least because they have increased faster in amount than cash assistance. Some authors [for example, Browning (1976) and Paglin (1980)] have argued that in-kind transfers should be valued at cost, so that $1 of Medicare is valued at $1, and Paglin has even suggested that their value should exceed $1 since they are tax free and automatically indexed. Others argue that the provision of the transfer in kind restricts its value to the recipient. Smolensky et al. (1977), Smeeding (1977), Murray (1980), and Schwab (1985) have examined the general question of valuation; Kraft and Olsen (1977) and Murray (1975) deal with housing, and Clarkson (1976) and MacDonald (1977) deal with food stamps. The implications for the measurement of poverty of different approaches to the valuation of medical care benefits are investigated by Smeeding and Moon, who conclude that in aggregate there is little difference, but that for the measurement of poverty among the elderly the choice is quite crucial. The valuation of cash benefits has been less discussed, but the earlier analysis of non-take-up suggests that there may be grounds for differential treatment of means-tested and universal benefits. To the extent that non-receipt arises on account of stigma, then this may reduce the value of the benefit (at the margin the extra income would be exactly counter-balanced by the cost in terms of loss of dignity, etc.). 
This is of particular importance if one adopts a "rights" approach to poverty. More generally, there is the question as to whether "income" is the correct basis for a welfare measure; for groups such as the elderly or the disabled, it may be particularly inappropriate. Finally, we have not considered the unit of receipt and the distribution of income within that unit. In Britain, Fiegehen and Lansley (1976) compared the measures of poverty obtained when the household was taken as the unit with those where the tax unit was the basis. The tax unit is narrower, consisting of single adults or married couples, together with any dependent children, and the assessment of poverty on this basis has to make assumptions about the extent of income transfers between tax units in the same household (for example, help by sons and daughters to their elderly parents). Taking only the external income, Fiegehen and Lansley calculate from the 1971 Family Expenditure Survey that on a household basis there are 3.4 percent of persons below the Supplementary Benefit line and on a tax unit basis there are 6.4 percent (in both cases those receiving Supplementary Benefit were excluded). They conclude that "the choice
of income-receiving unit is crucial in estimating the numbers of people in poverty" [Fiegehen and Lansley (1976, p. 517)]. A closely related question is that of the distribution of income within the unit. In both the household and the tax unit calculations, it is assumed that all family members share the same living standard, being all above or all below the poverty line, but this may not be the case. The intrafamily distribution is a subject about which we know relatively little. Pahl (1983) quotes evidence in Britain that a sizeable proportion of women, up to a third, whose marriages have broken up report being "better off" on Supplementary Benefit than living with their husbands, but this can be interpreted in several ways. In the United States, evidence from the Michigan Panel Study of Income Dynamics has been used, for example, by Morgan (1984, and the references given there) to calculate intrafamily transfers, including the time that is contributed. Lazear and Michael (1984) have attempted to apportion consumption, taking account of the public good aspects within the family, and conclude that allowance for inequality within the family can make a significant difference to measured poverty. Given the limited information, a natural approach is to examine the effect of different assumptions about the intrafamily distribution. This is the approach of Moon (1977a). Her central assumption is that the highest priority is to ensure a subsistence minimum for all members but that once this is achieved transfers rise less than proportionately with family income. This is compared with alternatives under which transfers are limited to providing the subsistence minimum or where equivalent income is equalised. This subject warrants further investigation; and in the present context it should be noted that income transfer policy itself may have an important impact (for example, if child benefits are paid to the mother rather than to the father).

4.4. Overall redistributive effect

Measures intended to relieve poverty may well have effects which spill over to those above the poverty line. Categorical benefits, for example, may go to those who qualify under the conditions but whose income is well above the poverty cut-off. While this may be considered "vertically inefficient" in the sense discussed in the previous section, the income support measures may nonetheless contribute to a broader goal of reducing income inequality. As noted by Danziger et al. (1984), the impact on the non-poor of the War on Poverty has received little attention; but there have been a large number of studies of the overall redistributive impact of the government budget, from which we can draw conclusions about the role of the income transfer component. In the United States, the study by Danziger and Plotnick (1977) of the distribution among families and unrelated individuals found that in 1974 the
Gini coefficient for the distribution of pre-transfer income was 47.65 percent, whereas that for post-transfer income was 40.77 percent. These figures relate only to cash transfers. When we consider transfers in kind, then the same problems of valuation arise as were discussed above. The importance of this aspect in assessing the degree of inequality, and its trend over time, has been brought out by Browning (1976) and the subsequent discussion by Smeeding (1979). In a recent study, Smeeding (1984) has shown that there may be a large difference in the distributional consequences of medical care and housing subsidies between the conventional market value/government cost approach and that based on valuing the benefit to recipients and non-recipients. This issue is also considered at length by Smolensky et al. (1977). Their estimates for 1970 show cash transfers as reducing the Gini coefficient from 44.4 to 39.8 percent, and that adding in-kind transfers at taxpayer cost reduces it further to 37.1 percent, but that the reduction would be smaller if the transfers were valued at their cash equivalent to the beneficiaries and if account is taken of the benefit derived by the donors (via utility interdependence). More recently, the position may have changed with the advent of the Reagan Administration. For discussion of the changes in the 1980s, see Burtless (1986), Danziger and Haveman (1981), Danziger (1982), Danziger et al. (1984) and Meyer (1986). In Britain, the impact of cash and in-kind transfers is included in the official Economic Trends studies of the effects of taxes and benefits on household incomes [Central Statistical Office (1983)]. The estimates for 1982 show the share of the bottom 20 percent rising from 0.4 percent to 5.7 percent as a result of cash transfers, and the authors noted that "the reduction in the Gini values from 48.2% to 36.2% ...
clearly confirm that cash benefits produce the largest reduction in income inequality [compared to other elements in the budget]" [Central Statistical Office (1983, p. 87)]. The treatment of in-kind transfers in these studies is based on taxpayer cost, and shows their value as falling from 35 percent of final income for the bottom 20 percent to 12 percent for the top 20 percent. For discussions of these studies, and criticisms of the methods employed, see Nicholson (1974), O'Higgins (1980), Stephenson (1980), and Le Grand (1982, forthcoming). The extent of redistribution between different groups in the population is also important. Particular interest has attached to the position of the elderly. In its 1985 report, the U.S. Council of Economic Advisers devoted a chapter to the economic status of the elderly, arguing that "thirty years ago the elderly were a relatively disadvantaged group in the population. That is no longer the case" (1985, p. 160). The report recognises that the elderly are not a homogeneous group, but notes that the average per capita income for families (but not for unrelated individuals) is virtually the same for the elderly and non-elderly. In the United Kingdom, the Department of Health and Social Security (1984) show the
disposable income per head of pensioners having increased from 41 percent of that of non-pensioners in 1951 to 69 percent in 1984/85. The assessment of the position of the elderly calls into question the use of current income as a measure of economic status. Moon (1977a) develops an indicator of potential consumption which adds to current income adjustments for net worth, income in kind and family transfers (as discussed above). Using data from the Survey of Economic Opportunity in 1966 and 1967, Moon shows that such calculations can make a noticeable difference to the measured inequality among the aged and to the ranking of different families. Danziger et al. (1984) use information for 1973 and pay particular attention to the adjustments made for differing family size. The conclusion regarding the relative status of the elderly appears to be much more sensitive to this latter aspect than to the choice of consumption or income as the measure of economic status. In monitoring the economic circumstances of the aged, panel data are of especial value, and Hurd and Shoven (1983) have used the U.S. Longitudinal Retirement History Survey to track changes in status over six years. The analysis of the distributional impact of transfers raises in a particularly acute form the problem of treating the intertemporal aspects of redistribution. In the case of pensions, we observe a transfer to people in lower income ranges from those in the working population, but this does not necessarily imply redistribution in a lifetime sense. One method of allowing for this is to estimate the amount of the state pension which represents an actuarial return to past contributions. In the United States, Burkhauser and Warlick (1981) use the 1973 Current Population Survey merged with OASI earnings and benefit records (the Social Security Exact Match File) to compare the "actuarially fair" OASI pension with that actually received, and conclude that it would only be some 27 percent of the total.
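The kind of decomposition underlying such estimates can be sketched as follows. The annuity formulas are standard, but the interest rate, contribution history, survival horizon, and benefit level below are purely illustrative assumptions, not the values used by Burkhauser and Warlick.

```python
# Minimal sketch of the decomposition underlying estimates like Burkhauser
# and Warlick (1981): the observed pension is split into an actuarially
# fair annuity on past contributions plus a transfer component.
# All parameter values are hypothetical.

def accumulated_value(contributions, r):
    """Value at retirement of a stream of annual contributions at rate r."""
    value = 0.0
    for c in contributions:
        value = value * (1 + r) + c
    return value

def fair_annuity(capital, r, years):
    """Level annual payment exhausting `capital` over `years` at rate r."""
    return capital * r / (1 - (1 + r) ** -years)

contributions = [500.0] * 30        # 30 years of contributions (assumed)
r = 0.03                            # real interest rate (assumed)
expected_retirement_years = 15      # survival horizon (assumed)

capital = accumulated_value(contributions, r)
fair_benefit = fair_annuity(capital, r, expected_retirement_years)

actual_benefit = 6000.0             # observed annual pension (hypothetical)
transfer_share = 1 - fair_benefit / actual_benefit

print(f"actuarially fair benefit: {fair_benefit:.0f}")
print(f"transfer share of pension: {transfer_share:.0%}")
```

The residual share, whatever its size under more realistic assumptions, is the intergenerational transfer component discussed in the text.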
The major part represented an intergenerational transfer. This reflects the tendency of past legislators to "blanket in" groups who have not previously been covered or who could only attain a limited period of coverage before retirement. The same approach is used by Burkhauser (1979) to compare the return to single persons and married couples, showing that women married to men with larger pensions are making "redundant" (i.e. less than actuarially fair) contributions. A calculation which is forward rather than backward looking is that of the internal rate of return which prospective pensioners may expect, as was calculated for the (never introduced) National Superannuation scheme in Britain in Atkinson (1971). In the United States, Aaron (1977) has calculated benefit-cost ratios under the OASI system, bringing out the different treatment of different groups, such as two-earner and one-earner families, and that the scheme may be regressive rather than progressive, reflecting such factors as differential survival rates. This may explain some of the political support for the social security scheme. [See also Brittain (1972, ch. VI) and Ozawa (1974).] In contrast, Creedy (1982) concludes in his study of the earnings-related pension scheme in Britain
that there is relatively little systematic redistribution of lifetime income for a cohort. Calculations of lifetime redistribution under Canadian pension plans are given by Wolfson (1983). Typically such calculations of lifetime redistribution have been based on simulated earnings patterns, but in some cases actual earnings histories are available. These may cover the entire lifespan, as with the data on workers in Germany born between 1908 and 1913 used by Schmähl (1983). They may cover a substantial fraction of the lifespan, as with the U.S. Continuous Work History Sample used in the Report of the Consultant Panel on Social Security (1976), which discusses the problems of extrapolation to lifetime earnings.
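The pre- versus post-transfer Gini comparisons reported earlier in this section are mechanical once a valuation of transfers has been settled. A sketch with a hypothetical income distribution and hypothetical cash transfers (the Gini formula is the standard mean-absolute-difference form):

```python
# Sketch of a pre- versus post-transfer Gini comparison of the kind cited
# in this section.  Incomes and transfers below are hypothetical.

def gini(incomes):
    """Gini coefficient via the mean absolute difference."""
    n = len(incomes)
    mean = sum(incomes) / n
    mad = sum(abs(x - y) for x in incomes for y in incomes) / (n * n)
    return mad / (2 * mean)

pre_transfer = [0, 2000, 5000, 9000, 14000, 30000]
transfers = [4000, 3000, 1500, 500, 0, 0]      # cash benefits (hypothetical)
post_transfer = [y + t for y, t in zip(pre_transfer, transfers)]

print(f"Gini before transfers: {gini(pre_transfer):.3f}")
print(f"Gini after transfers:  {gini(post_transfer):.3f}")
# Transfers concentrated at the bottom of the distribution reduce the
# coefficient, as in the estimates quoted for the United States and Britain.
```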
4.5. Political economy of income support
The reasons why income maintenance policies obtain political support have been the subject of considerable debate, not least in view of the stated intention of governments in the United States and Britain, and other countries, to push back the frontier of the welfare state. [The cutting back of the welfare state is discussed in, among others, Aaron (1983), Danziger (1982), Danziger and Haveman (1981), Le Grand and Robinson (1984), Meyer (1986), and, from a historical perspective, Johnson (1985).] In this debate, one can discern several types of contribution, but the three most important are those which may be labelled - somewhat imprecisely - the "public choice", "institutional", and "Marxist" approaches. Public choice has been a major research area in recent years. It is surveyed by Inman in Chapter 12 of this Handbook, and I make no attempt to cover the ground comprehensively, simply referring to studies that deal explicitly with income maintenance. The essence of the public choice approach is that government decisions are influenced by voter preferences, by the goals of politicians, and by the bureaucratic machinery, in varying proportions. Many of the applications of this approach in the field of explaining government expenditure decisions have been based on the median voter model, where the preferences of the median voter are assumed to be decisive in a direct election. In a representative democracy, too, the platform chosen by the parties would be designed to occupy the decisive middle ground. This line of explanation takes us back to the reasons why state income support may be desired, for example, because voters' utility functions exhibit interdependencies or because they are guided by a contractarian notion of justice. The median voter model has been applied by Orr (1976) to the explanation of redistribution at the level of local governments in the United States. Under the AFDC programme, the levels of assistance payments are set by the various states
who administer the programme; and the cost is shared between the states and the federal government according to a formula. He sets out a simple two-group model, similar to that used in Section 3 with recipients and taxpayers, with the incomes of the recipients being an argument in the utility function of the taxpayers [it is an asymmetric model of the type developed by Hochman and Rodgers (1969)]. The level of transfers is determined by majority vote in each jurisdiction, with the cost being shared with the federal authority via a matching grant. The theoretical argument suggests that the generosity of the transfer per AFDC recipient will depend on the level of taxpayer income, the recipient/taxpayer ratio, the absolute number of recipients, the federal matching rate, taxpayer preferences, and the composition of the recipient group. Using proxies for the last two of these variables, Orr estimates equations which account for a sizeable part of the variation in AFDC spending across states. There are difficulties with the analysis, including the fact that the number of recipients is treated as exogenous, whereas we may expect its size to depend on the level of assistance, both via migration and via transfer from the taxpayer group. But Orr correctly recognises that the position of the local authority is similar to that of an individual taxpayer in that it faces a budget constraint, determined by the federal matching formula. As discussed earlier, the slope of this constraint may be endogenous, depending on the local decision, and this is allowed for in the estimation. Moffitt (1984) gives a fuller treatment of the intergovernmental budget constraint, as applied to AFDC, and draws conclusions about the design of the matching grant formula. The median voter approach does not require that redistribution be based on the notion of interdependent utilities. 
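A minimal version of Orr's interdependent-utility median voter model can be sketched as follows. The Cobb-Douglas functional form and all parameter values are assumptions for illustration, not Orr's specification; the point is only that a higher federal matching rate lowers the local price of a dollar of transfer and so raises the benefit the decisive voter supports.

```python
# Sketch of the median-voter choice of transfer generosity in a model in
# the spirit of Orr (1976).  The taxpayer cares about own consumption and
# the recipient's benefit; a federal matching rate lowers the local price
# of the transfer.  Functional form and parameters are assumed.

def chosen_benefit(income, altruism, match_rate, recipients_per_taxpayer):
    """Benefit b maximising U = ln(c) + altruism * ln(b), where
    c = income - (1 - match_rate) * recipients_per_taxpayer * b.
    With these Cobb-Douglas preferences the optimum has a closed form:
    the voter devotes the share altruism/(1+altruism) of income to it."""
    price = (1 - match_rate) * recipients_per_taxpayer
    return altruism / (1 + altruism) * income / price

b_low_match = chosen_benefit(income=20000, altruism=0.005,
                             match_rate=0.5, recipients_per_taxpayer=0.1)
b_high_match = chosen_benefit(income=20000, altruism=0.005,
                              match_rate=0.8, recipients_per_taxpayer=0.1)

print(f"benefit with 50% federal match: {b_low_match:.0f}")
print(f"benefit with 80% federal match: {b_high_match:.0f}")
# A more generous matching formula raises the transfer chosen locally,
# which is the mechanism Orr's cross-state estimates exploit.
```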
Peltzman (1980) bases his theory of the growth of government on voters being concerned solely with their own welfare, the critical features being the degree of inequality and the ability to perceive and articulate self-interest. Dryzek and Goodin (forthcoming) in their analysis of the foundations of the post-1945 welfare state in Britain argue that people will support major redistribution at a time when there is major uncertainty. They suggest that the pervasive uncertainty generated by the Second World War led to a growth of risk-sharing arrangements. The government took over the task of providing universal war risks and war damage insurance in response to public desire for risk-spreading, and this in turn provided a major impetus to the growth of the post-war welfare state. The median voter explanation implies that political parties as such are irrelevant, in that no variation in policy would typically be observed according to the political party in office. The political ideology of the government would only become relevant when it had some degree of monopoly power. This led Kasper (1971) to argue that the dispersion of policies will be greater in states where there is little or no political competition than where there is competition and the politicians are constrained by the preferences of the median voter. He tests this
Ch. 13: Income Maintenance and Social Insurance
empirically in a study of welfare programmes in the United States and their variation by state. He divides states into four categories: one-party Democratic, modified one-party Democratic, two-party competitive, and modified one-party Republican. The results indicate that the variation in welfare payments is generally less in the two-party competitive group, although he recognises that there are alternative explanations. The variation across states in the United States was also used in the study by Davis and Jackson (1974) of Senate voting on the Family Assistance Plan proposed by Nixon but defeated by the Senate in 1970. They note that the plan was supported by equal proportions of Republicans and non-Southern Democrats but that all Southern Democrats voted against. Using survey information on the views of families, they show that the degree of support for the Plan varied significantly with income and region (the latter not in line with apparent self-interest, in that the South could be expected to obtain more benefit but gave less support). This variation in constituency support could in turn explain a substantial part of the Senate voting patterns. The failure of the Family Assistance Plan has been taken by several authors as a case study of the political process. Marmor and Rein (1973) note that when the plan was announced most commentators were in favour, but the "euphoria" of 1969 gave way to a congressional stalemate. One account of the events is that by an active participant, Moynihan (1973). This book, together with those of Bowler (1974) and Burke and Burke (1974), is critically reviewed by Williams (1975), who discusses among other aspects the role of economic analysis in the political process. He expresses the view that "the long struggle for [the negative income tax] is surely one of the finest hours of social policy analysis. Never have policy analysts been so prominent for so long in a social program area" [Williams (1975, p. 434)].
Politicians are one set of actors in the public choice model; pressure groups [whose role is analysed by Becker (1983)] are another, as are special interest groups. Lindbeck (1985) has argued that the more recent expansion of the redistributive activities of the public sector has taken the form of "fragmented horizontal redistributions", in contrast to the broader redistribution which had dominated earlier policy. These reflect politicians "buying votes" from various minority groups. He suggests that one by-product is that the size of gross transfers is much larger than that of net transfers. The public choice approach is explicitly grounded in the theory of the political process. The second set of explanations, which I have grouped together under the broad heading of "institutional", is more concerned with the development of the welfare state as a social and historical process. In Britain, this is typified by the writing of Titmuss (1958, 1968), in which he sees the evolution of income support as part of a wider process of changing social structure. He attaches particular significance to the role of occupational and fiscal welfare, an aspect
A.B. Atkinson
which has been developed by Sinfield (1978). An important example is provided by the relation between private pension schemes and state pensions. Thus, Abel-Smith (1984, p. 173) has recently argued for the retention of state earnings-related pensions on the grounds that "if the social security system does not incorporate earnings-related benefits, these are bound to develop independently. And it is these occupational provisions, both public and private, that have created the most glaring inequalities and inequities." The development of "fiscal welfare" was stressed by Titmuss long before the concept of tax expenditures was widely discussed: "since the introduction of progressive taxation in 1907 there has been a remarkable development of social policy operating through the medium of the fiscal system" [Titmuss (1958, p. 45)]. Recognition of the parallel function of cash transfers and tax allowances has however failed to influence the political debate. As noted by Marmor (1971, p. 85n), "it is extraordinary how difficult it is to convince the skeptics that tax exemptions are functional equivalents of direct government expenditures". He quotes the example given by Aaron of how absurd it would be if the double exemption for the aged under income tax were proposed as a cash programme: "yesterday on the floor of Congress, Senator Blimp introduced legislation to provide cash allowances for most of the aged. Senator Blimp's plan is unique, however, in that it excludes the poor. The largest benefits are payable to aged couples whose real income exceeds $200,000 per year" [Aaron (1969, pp. 4-6)]. The institutional approach provides one way of rationalising some of the differences between countries in their income support programmes.
Aaron (1967) examined the per capita social security expenditure of 22 countries in 1957, and found a significant relationship with the age of the system (in addition to per capita income), concluding in fact that this is the single most important determinant. This has been more fully developed by Wilensky (1975). Using data for 60 countries in 1966, he concludes that the ratio of social security spending to GNP is related significantly to the age of the system, to GNP per capita, and to the proportion of the population aged 65 or older; and that the type of political system had little independent influence. Wilensky discusses in addition the evidence from time series; on this, see also Heidenheimer and Layson (1982), who consider the balance between universal and selective programmes. The third type of research, that labelled "Marxist", is also historical in its approach, and it may be seen as offering one explanation of the political and other forces which have been observed in the determination of welfare policy. At the same time, a central insight of this approach is that the welfare state has a contradictory position in a modern capitalist economy. As it is put by Gintis and Bowles (1982, p. 341), "the contradictory nature of the welfare state stems from its location at the interface of two distinct sets of 'rules of the game' ". Whereas the welfare state is founded in a liberal-democratic manner on claims on resources being related to citizenship, capitalism recognises claims on resources based on the ownership of property. "Where the games are overlapping, as in all
modern capitalist welfare states, there is [increasingly] conflict between the regime of citizen rights and the regime of property rights" [Gintis and Bowles (1982, p. 341)]. On the other hand, the welfare state is seen by some Marxists as an instrument of capital. Gough (1979, p. 11) describes the analysis by radicals and Marxists of "the welfare state as a repressive mechanism of social control: social work, the schools, housing departments, the probation service, social security agencies - all were seen as means of controlling and/or adapting rebellious and non-conforming groups in society to the needs of capitalism".

4.6. Concluding comment

In seeking to explain why income maintenance programmes have enjoyed less than complete success, two aspects should be stressed. The first is the problem of sheer complexity. The complex provisions and interactions pose serious barriers for recipients, administrators, and legislators. Economists and others have an important role to play in ensuring that the implications of present policies, and proposed reforms, are fully understood. The second is the failure of political will which prevented proposals for income support being enacted in their entirety (such as the Beveridge plan to do away largely with means-tested assistance) and which has led in recent years to the cutting back of the social insurance system. Account has to be taken of the background of shifting political objectives and support, and programmes for reform have to be designed with a view to the political processes necessary for their introduction.

5. Effects on economic behaviour

Much of the recent growth of interest in social security amongst economists has stemmed from an anxiety that the existence of social security has had significant effects on behaviour. The supposed impact of social security on labour supply, on welfare dependence, on retirement, and on savings, has therefore come to play a central role in discussion of reforms in income support.
In this survey, I concentrate on the influences on family decisions. The analysis is therefore a partial one, and the effects on, for example, the employment decisions of firms are not considered. (Thus, when discussing unemployment insurance, the incentive to lay off workers is not addressed.) The theoretical models employed in the analysis of the effects of social security, and the econometric techniques used in the estimation of empirical relationships, are in many respects similar to those used in the analysis of taxation. Indeed, concern for the possible disincentive effects of a negative income tax was one of the main reasons for the resurgence of empirical work on taxation and labour supply. There is therefore considerable overlap with Chapter 4 in this Handbook by Hausman on labour supply and, to a lesser extent,
Chapter 5 in this Handbook by Sandmo on savings. In view of this, I have chosen to concentrate on those aspects less fully covered by these authors. In particular, I have not considered the influence of social security on the hours of work by family heads or on the participation by married women in the labour force. Nor will I devote space to the negative income tax experiments. In addition to the survey in Chapter 4 of this Handbook, there is extensive coverage of these topics in Keeley (1981), Killingsworth (1983), and Pencavel (1984). At the same time, it has to be remembered that there are important interrelationships between different family decisions. The labour supply decisions with which I begin, dealing with retirement in Section 5.1, disability in Section 5.2, and duration of unemployment in Section 5.3, all are closely related to the decision to claim benefits. This is treated explicitly in Sections 5.2 and 5.3, where some studies focus on the probability of claiming disability insurance or receiving unemployment benefit, and it forms the subject of Section 5.4, which is concerned with the probability of claiming welfare/public assistance. Receipt of welfare may in turn have an impact on decisions about family formation, and the evidence about this is considered in Section 5.5. Decisions about participation during the prime working years have an influence on pension entitlements and hence on retirement decisions. This is affected too by the level of savings, which is the subject of Section 5.6, and over the life-cycle savings and retirement decisions are likely to be made simultaneously.
5.1. Income support and its effect on retirement decisions

In the United States there has been a steady decline since 1900 in the participation rate of men aged 65 or over, with an acceleration after the Second World War [Aaron (1982, fig. 4)]. This trend has been seen by some as coinciding with the expansion of social security, and the conclusion has been drawn that there must be a causal connection. However, such a conclusion cannot be drawn so easily. The trend to earlier retirement appears to pre-date widespread eligibility for state pensions, and it is far from evident that the cuts in social security in the 1980s will reverse the trends. In this context, it should be noted that this review is concerned with state pensions, not covering occupational, or other private, pensions, and that disentangling the effects of the two systems is no easy task. Moreover, it is evident that any correlation must be interpreted with considerable care, a care which is not always exhibited by those anxious to draw empirical conclusions. We need throughout to bear in mind that both the labour supply responses and the expansion of social security may be the product of other economic changes. In Section 3, the simple theoretical model portrayed the retirement decision in a static manner, in which the introduction of a categorical benefit conditional on retirement would tend to reduce participation. This remains true if the benefit is
tapered with earnings, through an earnings test, rather than removed completely if the person works (as shown in Figure 3.2), in the sense that the existence of the pension reduces the probability of participation. It is of course possible that variations in the size of the pension about its present level have no effect because all those qualifying (over the minimum pension age) have decided to retire, i.e. there is a corner maximum. Variations in other parameters of the pension may also have no effect on the participation decision. As noted by Gordon and Blinder (1980), the tax rate implicit in the earnings test in the United States has no effect on the corner condition since the first tranche of earnings does not affect the pension. If people are free to vary their hours of work, and there are no fixed costs of working or other sources of non-convexity in the budget constraint, then the decision to increase hours above zero will be unaffected by the earnings test. If working 1 hour a week is not attractive, then no amount of work will be, and the earnings test is not relevant. A static model is of course unsatisfactory for a phenomenon such as retirement. Models of life-cycle choice of retirement date are described by, among others, Feldstein (1976a), Hemming (1977), Sheshinski (1978), and Hu (1979). In this context, the explicit treatment of uncertainty appears especially important: people experience considerable uncertainty about wealth accumulation, financial needs, health, and job opportunities. These uncertainties imply that people are continuously reconsidering their plans for retirement and wealth accumulation as their economic and health positions develop [Diamond and Hausman (1984b, p. 97)]. The importance of the intertemporal aspects may arise on account of the rules of the pension scheme and the complex interdependencies that are induced.
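The kink created by an earnings test of this kind is easy to see in a stylised budget calculation. The pension level, disregard, and taper rate below are illustrative numbers, not those of any actual scheme:

```python
def net_income(earnings, pension=100.0, disregard=50.0, taper=0.5):
    """Net income under a stylised earnings test (illustrative parameters).

    The pension is paid in full while earnings are below the disregard,
    is withdrawn at the taper rate on earnings above it, and is exhausted
    once earnings exceed disregard + pension / taper.
    """
    excess = max(0.0, earnings - disregard)
    pension_kept = max(0.0, pension - taper * excess)
    return earnings + pension_kept

# Below the disregard the pension is untouched, so the return to the
# first tranche of work is the full wage: this is why the implicit tax
# rate of the earnings test leaves the corner (participation) condition
# unaffected.
gain_first_tranche = net_income(10.0) - net_income(0.0)   # full wage
# Above the disregard, each extra unit of earnings yields only 1 - taper.
gain_above = net_income(60.0) - net_income(50.0)
```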
A person in Britain who reaches the minimum retirement age may choose to defer retirement, in which case the pension ultimately received is increased, acting as an incentive to defer retirement in certain cases. In the United States, Blinder et al. (1980) have argued that the increase in pension through deferred retirement may be more than actuarially fair for those aged 62-64, and that this may be enhanced through the formula that relates benefits to the earnings of a subset of years in the person's career. Through the automatic benefit recomputation, earnings at age 62 may raise the average on which pensions are calculated. The details of the calculations have been the subject of debate [see the comment by Burkhauser and Turner (1981) and the reply by Blinder et al. (1981)], but Blinder et al. (1980) have made an important contribution in drawing attention to the complexities of the trade-offs with which people may be faced under the social security system. How far people are in fact well informed about the provisions is a matter about which little is known. The perceptions which govern behaviour may be quite different from the reality. As in all areas of applied economics, the extent of the link between theoretical models and econometric specification varies considerably. Table 5.1 summarises
the main features of studies of the retirement decision, beginning in the first column with the underlying theoretical model. The table, like those that follow, draws heavily on the valuable survey by Danziger et al. (1981), but since this survey deals almost exclusively with the United States, I have supplemented it with studies from Britain and certain other countries (as well as including U.S. studies that have appeared since 1981). In this, and subsequent tables, I have not attempted to provide exhaustive coverage of all relevant studies, rather taking a selection that is broadly representative of recent work. The evidence about retirement, like that about the other decisions discussed, may come from one of four different types of source: (1) cross-country comparisons, (2) time series, (3) cross-section data, and (4) panel data. The last of these is where information is collected from individuals, or from their records, at a series of dates. As may be seen from Table 5.1, most of the studies are based on cross-section data, but a feature of the recent research has been the increased availability and use of panel data. The different types of evidence pose different problems of interpretation. In the case of cross-country studies, there may be economic, social or cultural factors which lead to differences in retirement ages and which also lead to differences in the provision for that retirement, without there necessarily being any direct causal connection between benefits and retirement. In the case of time series, there is the same problem in a different dimension. We observe an upward trend in both retirement and in pensions, but find it difficult to isolate that element which is a causal relationship from that part associated with other trends in the society. The use of cross-section data allows us to explore the relation with individual benefit, which may be less highly correlated with other factors, but it is the relative behaviour which is being modelled.
We may find that people with high pensions tend to retire earlier than those with low pensions but this is still consistent with there being no causal effect. The existence of pensions may influence who retires but not the average retirement age. The different types of evidence have different strengths and weaknesses. In the case of the cross-country studies, one obvious shortcoming, recognised by the authors, is that there is a small number of observations. Pechman et al. (1968) caution the reader that for this reason, and on account of the difficulty of obtaining comparable data, their results need to be treated with care. The same applies to the time-series studies, where annual observations for the post-war period are unlikely to offer a firm basis for separating the different influences in operation. For these reasons, most weight has been given to the cross-section investigations.
The cross-section studies have a number of features in common. With one exception, they relate to the United States. In several cases, they make use of the same data set: the Longitudinal Retirement History Survey, referred to as RHS. [This data source is a panel, but the studies do not typically make explicit use of this aspect; where more than one year is taken, as in Hanoch and Honig (1983) for example, the observations are treated as independent.] The sample size is typically at least a thousand, and this is one respect in which research has altered: the early study by Boskin (1977), for example, used only 131 observations. The estimation methods employed are typically fairly sophisticated, with allowance for the limited range of dependent variables and for sample selection [see Maddala (1983) for a useful review of the estimation problems in general]. At the same time, despite these similarities, the results reached in the different studies vary considerably. If we consider both the cross-section and panel studies for the United States listed in Table 5.1, then we can see that the following conclude that pensions have a significant influence on retirement: Burkhauser (1980), Hall and Johnson (1980), Hanoch and Honig (1983), Pellechio (1979), Quinn (1977), Boskin and Hurd (1978), and Diamond and Hausman (1984a and 1984b). See also Hurd and Boskin (1984), not shown in Table 5.1. On the other hand, the following conclude that the effect of pensions is either statistically insignificant or economically unimportant: Gordon and Blinder (1980), Hamermesh (1984), Kotlikoff (1979b), Burtless and Moffitt (1984), and Mitchell and Fields (1984). This is rather confusing. The first lesson to be drawn is that it is important for researchers to explain the relationship between their results and those of other investigators; and best practice applied econometric work now includes such a comparative analysis of findings. There are several reasons for these differences. 
Here three are considered: model specification, treatment of benefits and taxes, and choice of sample.

5.1.1. Model specification

The first point to be made about the specification is that the variables to be explained differ considerably between the various studies. Some treat retirement as an either/or situation (not always with the same empirical definition). This applies to Burkhauser (1980), Pellechio (1979), and Quinn (1977), for example, in whose studies the variable to be explained is whether or not the person is retired. The same discrete approach may be modelled in terms of the planned or actual retirement age, as in the studies by Hall and Johnson (1980), Kotlikoff (1979b), and Mitchell and Fields (1984). Such a treatment does not allow for partial retirement, and Boskin and Hurd (1978) [and Zabalza et al. (1980) in the United Kingdom] widen the choice to three states, including semi-retirement where the
person continues to work part time. The hours of work may however be treated as a continuous variable, as in the theoretical formulation of Gordon and Blinder (1980), in which case they may be combined with participation to yield a total hours variable, as in Hamermesh (1984). There are, however, good reasons for treating the participation and post-retirement hours separately, and this is the practice of Hanoch and Honig (1983) and Burtless and Moffitt (1984). The second aspect of the specification concerns the underlying model. The earlier studies tended to write down directly the equation to be estimated rather than derive it from any more basic model. This raises the problem of relating the results of different studies. Thus we might model the probability, f(t)dt, of a person retiring in the interval [t, t + dt] conditional on not having previously retired, and treat f(t) as a function of, for example, the benefit rate. Or we might model the probability that the person has not retired by date t, giving G(t), the "survivor" function, which is related to f(t) by

G(t) = exp[−∫₀ᵗ f(v) dv].    (5.1)
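The implication of equation (5.1) — that a level shift in the hazard f has a non-linear, duration-dependent effect on the survivor function — can be checked numerically. The constant-hazard example below is purely illustrative:

```python
import numpy as np

def survivor(hazard, t_grid):
    """G(t) = exp(-integral of f(v) dv from 0 to t), as in eq. (5.1),
    computed by trapezoidal integration of the hazard on t_grid."""
    f = np.array([hazard(t) for t in t_grid])
    increments = (f[1:] + f[:-1]) / 2.0 * np.diff(t_grid)
    cumulative = np.concatenate(([0.0], np.cumsum(increments)))
    return np.exp(-cumulative)

t = np.linspace(0.0, 10.0, 1001)

# Raising the hazard by a constant 0.05 at every date (say, a benefit
# rise adding a fixed amount to the retirement hazard)...
G_base = survivor(lambda v: 0.10, t)
G_high = survivor(lambda v: 0.15, t)

# ...multiplies G(t) by exp(-0.05 t): the proportional effect on the
# probability of not yet having retired grows with t rather than
# being constant.
ratio = G_high / G_base
```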
It is clear that a linear influence of benefits on f(v) at time v will have a (non-linear) influence on G(t) for all t > v. (The same kind of model is used in a number of studies of unemployment duration - see below.) The choice-theoretic basis for the retirement decision is considered by several authors. One approach is to represent this choice in terms of a person's reservation wage, whereby he participates when the offered wage exceeds this reservation wage. This provides a pair of equations, with the market wage being observed where the person participates, and this is the route followed by Gordon and Blinder (1980). A second, and more basic, approach is in terms of explicit utility maximisation, as is exemplified by the work of Zabalza et al. (1980). They posit a constant elasticity utility function defined over income and leisure, and use this to evaluate the utility levels at three points on the budget constraint. This technique is the same as that used for labour supply by Burtless and Hausman (1978). As these latter authors argued, one of the virtues of the approach is that it makes explicit the source of variation across individuals. In the Zabalza et al. (1980) study, people differ in their marginal rate of substitution of income for leisure in a way that reflects both observed characteristics and a stochastic term.

5.1.2. Treatment of benefits and taxes

The treatment in terms of explicit utility maximisation allows one to handle the quite complex budget constraints which may be generated by benefit systems and tax systems and their interaction. It is this complexity which casts serious doubt on any attempt to use an aggregate index, as is necessary when using time-series
or cross-country data. It is not like studying the demand for cigarettes, where the price moves over time in a broadly similar manner for all consumers and we can therefore reasonably take an overall index. In the case of retirement, the "price" in terms of the income difference, varies in different ways for different people, depending for example on their past earnings histories. The indicator of pensions relevant to those at the margin of retirement may move quite differently over time, or vary quite differently over countries, from the average levels of benefits typically employed. This is an argument for the use of individual data, but the complexity of the budget constraint raises problems here too. The relevant parameters of the benefit and tax system may not be immediately apparent. This is illustrated by the ingenious argument of Gordon and Blinder (1980), referred to above, that the relevant consideration for participation is the return to the first hour of work, leading them to suggest that the earnings test is irrelevant. It is, however, not just a question of determining the relevant marginal incentive or disincentive but also a question of how the system is perceived by the people concerned. Evidence from the OPCS Older Workers and Retirement survey in Britain shows limited understanding of the provisions for earning additional pension by deferring retirement or of the earnings rule. In the latter case, it is noted that "even among those subject to the earnings rule, only a minority knew how it operated" [Parker (1980, p. 45)]. As a result, it may be better to try and model the perceived budget constraint than to refine the treatment of the theoretical trade-offs. The treatment of the amount of pension entitlement differs across the studies both because of the differences in the available data and because authors take different views as to the appropriate choice of independent variable. 
It is in the nature of the exercise that the actual pension is not observed for those who have not retired, so that a calculation must be made of the potential pension. In order to calculate the potential pension, one needs access to data on earnings records similar to that collected by the social security administration. Where these data are not available, then it may be necessary to proxy the entitlement, perhaps the simplest such proxy being the 0/1 eligibility variable used by Quinn (1977). Most of the studies have, however, used sources which contain past earnings records for at least part of the career. There are two issues which arise in the use of such records to construct potential pensions. First, some authors adopt the practice of using the actual earnings where available and otherwise the calculated earnings. To the extent that the calculations systematically under- or over-state the pension, this is a potential source of bias in the estimated coefficient on the pension variable. Second, several authors have drawn attention to the fact that the actual earnings may be related to unobserved variables which also enter into the determination of retirement behaviour. A person who has a taste for work may both have acquired a higher pension entitlement, through having worked longer hours and earned more, and
for the same reason be less likely to retire early. Thus, Boskin and Hurd (1978) use a two-stage estimation procedure to allow for this endogeneity of benefits, and Gordon and Blinder (1980) use an instrument constructed by assuming specified annual hours and using the estimated individual wage rate to calculate earnings and hence pension entitlement.

5.1.3. Choice of sample

No fewer than seven of the studies listed in Table 5.1 make use of the same data set: the U.S. Longitudinal Retirement History Survey. Yet the samples chosen for analysis differ materially. The survey is a continuing survey and the first point of difference is in the wave or waves taken: Quinn (1977) and Hall and Johnson (1980) take the data for 1969; Boskin and Hurd (1978) use those for 1969 and 1971; Gordon and Blinder (1980) those for 1969, 1971, and 1973; Hamermesh (1984) and Hanoch and Honig (1983) take the story up to 1975; and Burtless and Moffitt (1984) are the most recent, using information up to 1977. Secondly, there is the difference between those studies which make use of the panel nature of the data, as with those shown towards the end of Table 5.1, and those which do not. The value of panel data is well illustrated by the work of Diamond and Hausman (1984a, 1984b). We have, however, to bear in mind the problems which arise with such data from attrition. This may arise because some people did not respond to all waves of the survey, or it may reflect mortality of the respondent or spouse (where attention is confined to intact couples). Diamond and Hausman assume that attrition is independent of the causes of retirement, but in earlier work Hausman [with Wise (1979)] has examined the consequences of attrition being correlated with behavioural responses. As is noted by Diamond and Hausman (1984b), non-independent attrition could be allowed for via a "competing risks" model [see Lawless (1982, ch. 10)]. The bias which may be introduced through sample attrition is an example of a wider problem.
In the use of any subsample, there is the danger that the sample inclusion criteria may generate "selection bias", the implications of which have been extensively studied by Heckman (1976, 1979) and others. The simple procedure proposed by Heckman for the case of normal distributions is to include for each observation the inverse Mills ratio calculated from an equation explaining sample inclusion, and this procedure is followed in some of the studies listed in Table 5.1. For example, when Kotlikoff (1979b) takes the subsample with expected retirement age greater than 62 he includes the inverse Mills ratio in the equation for the expected retirement age. The same procedure is used where wage equations are estimated (to calculate wages for those already retired) using data on those who are working. For example, Hanoch and Honig (1983) estimate wage equations for men and women aged 58-69 finding that the selectivity effect (inverse Mills ratio) is significant for men but not for women.
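Heckman's two-step procedure can be sketched on synthetic data. Everything below is an illustration under simplifying assumptions — in particular, the first-stage selection index is taken as known rather than estimated by probit, to keep the sketch short:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)          # determinant of the outcome (a wage)
z = rng.normal(size=n)          # determinant of sample inclusion
# Correlated errors: the selection error e is correlated with the wage
# error u, which is what biases OLS on the observed subsample.
u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
wage = 1.0 + 2.0 * x + u        # true slope on x is 2.0
index = x + 0.5 * z             # selection (participation) index
observed = (index + e) > 0      # wage is seen only when this holds

# Step 1: inverse Mills ratio lambda = phi(index) / Phi(index) for the
# included observations (in practice, index comes from an estimated probit).
lam = norm.pdf(index[observed]) / norm.cdf(index[observed])

# Step 2: OLS of wage on x and lambda over the included subsample.
ones = np.ones(observed.sum())
X_corrected = np.column_stack([ones, x[observed], lam])
beta = np.linalg.lstsq(X_corrected, wage[observed], rcond=None)[0]

# Naive OLS on the subsample, omitting lambda, for comparison.
X_naive = np.column_stack([ones, x[observed]])
beta_naive = np.linalg.lstsq(X_naive, wage[observed], rcond=None)[0]
```

The coefficient on x in the corrected regression recovers the true slope, while the naive subsample regression understates it; the coefficient on lambda estimates the covariance between the two errors.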
Ch. 13: Income Maintenance and Social Insurance
In fact, from Table 5.1 it may be seen that most investigators limit their attention to a subsample of the survey. Broadly, if you are a white employee, married and male, and responded to all questions, then you stand a good chance of being included. But if you were black, female or self-employed, then your retirement decisions are unlikely to be modelled. This of course raises questions about the extent to which the conclusions may be extrapolated to the population as a whole. It also raises questions about the validity of the resulting estimates. The issues raised by the deletion of observations - that is, the deliberate limitation of the sample by the investigator - are discussed by MacDonald and Robinson (1985). The elements described above are some of those which may lead to the differences in results. In some cases, authors have noted the differences between their findings and those of others and sought to explain them [an example is Diamond and Hausman (1984b)]. There is, however, need for further work reconciling the different findings.

5.2. Disability benefits and participation
Concern for the position of the disabled has undoubtedly grown in the past 20 years, not least because people have increasingly seen that there are common problems shared by war veterans, those born with handicaps, and those with industrial injuries. The problems may be common, but the financial and other provisions have differed considerably, and there have been moves to ensure a more coherent and effective system of support, or disability benefit, for those whose earning power is diminished. At the same time, the expansion of financial support has attracted criticism from those who have argued that it has reduced the incentive to work. Parsons, for example, documents the changing participation rate among men under 55, and argues that "the sharp decline in adult male labour force participation among both blacks and whites has coincided with a rapid expansion of the Social Security disability program" [Parsons (1980b, p. 912)]. The provision of disability benefits raises many issues, and these have been discussed in a U.S. context by Burkhauser and Haveman (1982) and in Britain in Walker and Townsend (1981). Haveman et al. (1984) provide a recent account of policy towards disabled workers, covering West Germany, France, Israel, Italy, the Netherlands, Sweden, and the United Kingdom. Here I concentrate on the incentive issue. Table 5.2 summarises six studies from the United States, all based on cross-section or longitudinal data. [See Wolfe et al. (1984) for a study which covers both the United States and the Netherlands.] The research in this area has been much less extensive than that on retirement but the same range of findings is evident.

A.B. Atkinson

[Table 5.2, summarising six U.S. studies of disability benefits and participation, is not legible in this copy.]

On the one hand, Parsons, in an analysis which goes beyond the simple time-series point of the previous paragraph, concludes that there is a significant relationship between non-participation at the individual level and disability income, with an estimated elasticity of 0.63. On this basis, much of the observed increase in non-participation can be explained. In the subsequent paper [Parsons (1980b)] he reports an estimated elasticity nearly three times as high (1.8), although the figures are corrected to 0.49 and 0.93, respectively, in Parsons (1984, p. 544n). On the other hand, Haveman and Wolfe (1984a) find a significant but small coefficient, and conclude that more generous disability benefits account for only a small part of the increase. In the middle comes the study by Leonard (1979), who attributes about half of the increase to the growth in transfers. These findings illustrate again that one can draw quite different conclusions even when applying similar methods and using similar types of data. As noted, Parsons himself obtains estimates that differ substantially, and this difference appears to arise from (i) consideration of labour force status in 1966 rather than 1969, (ii) augmenting the sample by including those without wage information for 1966, and (iii) use of an estimated wage for those not reporting this variable [Parsons (1980b, p. 919)]. In their critique of the study by Parsons, Haveman and Wolfe (1984a, 1984b) provide a useful table showing how they arrive at different results. Their replication of the model of Parsons on the data set employed, the Michigan Panel Study of Income Dynamics, yields a similar coefficient for the social security variable. They then vary the specification, showing that the introduction of benefits and wages separately, rather than in ratio form, leads to neither being significant, and that a correction for the selectivity bias associated with using only data on those with reported wages also reduces the coefficient on the social security variable.
This lack of robustness of Parsons' results may be due to the small sample size used by Haveman and Wolfe, as suggested in Parsons (1984), but this kind of systematic comparison of results is clearly necessary if we are to understand the source of the discrepancies. In the case of disability insurance, there are similar problems to those in the study of retirement. First, the specification of the model needs elaboration. The recent work by Halpern and Hausman (1986) provides an example of how the participation decision can be given a more explicit theoretical structure. This applies both to the underlying decision framework, based on expected utility maximisation, and to the institutional context, where Halpern and Hausman take account of the probability of acceptance, assuming that people form an unbiased estimate of that probability. Consideration of the decision made by applicants has also to allow for the other possibilities which may be open. Haveman et al. (1984) examine the position of older workers who can choose between applying for disability insurance, working, or taking early retirement. Secondly, the treatment of benefits raises several issues. In the present case, the problems of predicting benefit entitlement seem especially acute. The method
used [for example, Parsons (1980b)] is to take recorded wages, to extrapolate these to a career earnings variable, and then to apply the benefit rules. Not only is this a rough approximation to the benefit which would be received but also it takes no account of the probability of acceptance. Eligibility depends on evidence of "total and permanent" disability and that no substantial gainful activity is possible; as a result, only between a third and a half of applications are successful [Lando et al. (1982)]. Modelling the probability of acceptance is one of the features of Halpern and Hausman (1986). They find that a variable based on nine questions about functional limitations is highly significant, as are certain health conditions. Thirdly, there is the choice of sample. As emphasised by Parsons (1984), the quality of the information on health status is of central importance, and this limits the data sets which can be used. At the same time, relatively little is known about the incidence of disability in the national population, and the use of special surveys of the disabled raises the problem of extrapolating the findings to the population as a whole. Finally, it should be noted that this discussion has been confined to disability; we should also consider the relation between benefits and short-term sickness absence. This has been examined in the United Kingdom by Fenn (1981).

5.3. Unemployment insurance and work decisions
The rise in unemployment in the 1970s and 1980s has led to renewed interest in unemployment insurance. This might seem scarcely surprising, in that the massive increase in unemployment has put the insurance system to a test which it had never previously faced in the post-war period, and there might be reasons to doubt that it had been fully effective in providing income support. It is not, however, these aspects which have received prominence in recent economic debate. It is the incentive rather than the income support aspects that have been the focus. How far has the existence of unemployment insurance reduced the incentive to work? Are improvements in unemployment insurance themselves in part responsible for the rise in unemployment? The studies which have been made to throw light on this question - summarised in Table 5.3 - illustrate the strengths and weaknesses of different types of evidence about incentives. First, there are time-series studies, favoured by those who have argued that unemployment benefits have led to higher unemployment [a recent example in the United Kingdom being the work of Minford et al. (1983)]. Time series has the merit that it allows one to follow the consequences of changes in unemployment insurance. In the United Kingdom, one of the best-known of such "experiments" was the introduction in 1966 of the earnings-related supplement (ERS), which added substantially to the benefits which certain workers could receive.

[Table 5.3, summarising time-series and cross-section studies of unemployment insurance and work decisions, is not legible in this copy.]

The initial impetus for studying the consequences of this change came from the observation that it had been followed by a significant rise in registered unemployment. The problem with the use of time-series evidence is that it is difficult to isolate the effect of policy changes, such as the introduction of ERS, from other influences. It is for this reason that the word "experiment" is in inverted commas. The early attempt to examine the relationship in the United Kingdom, by Maki and Spindler (1975), introduced into the equation for unemployment a set of variables to control for the structural and cyclical factors affecting unemployment. Allowing for these, they conclude that increases in unemployment benefit had a "substantial" effect on unemployment. The same conclusion is reached by Grubel and Maki (1976) for the United States, although their estimate of the elasticity is around ten times that in the United Kingdom - see Table 5.3. Grubel and Maki (1976, p. 293) themselves suggest that the time series may lead to an over-estimate of the response, and feel that the elasticity estimated in a cross-state analysis is a more accurate reflection of the unemployment induced by benefits. The argument given as to why the time series may over-state the elasticity is that there may have been other variables changing over time in the same direction [the examples given by Grubel and Maki (1976) include changes in public attitudes towards work]; and there are reasons to doubt whether the control variables are sufficient to represent what would have happened in the absence of changes in benefits. In the United Kingdom, it has been shown that the estimated effect of benefits is sensitive to the introduction of a time trend [Taylor (1977)] or of a permanent income variable [Cubbin and Foley (1977)]. The study by Junankar (1981) shows that the findings are not robust with respect to the choice of estimation period.
Consideration of the specification of the equation to be estimated leads one to ask how the model relates to the treatment of the labour market in general. In the studies by Maki and Spindler (1975) and others, the precise status of the equation is not made clear; presumably it is a reduced form, obtained from a fuller set of underlying relationships. The time-series approach needs in effect a fully-specified labour market model. The work of Minford et al. (1983) provides one version of such a model, although its structure is not easy to understand. A more explicit model is that of Nickell (1982), who models separately the outflow from unemployment and the inflow from employment, finding from time-series evidence in the United Kingdom that benefits have no significant impact on inflows, but that the coefficient in the outflow equation is significantly negative. Rather different results are given by Junankar and Price (1984). In the development of a full labour market model, it is necessary to distinguish the different labour market states (this aspect has been stressed in the work of Hamermesh and of Clark and Summers). A person may be employed (including here self-employment), unemployed, or not in the labour force. Transitions are
possible between all three states. If unemployment is defined in terms of registration for benefit (this is discussed below), then a rise in benefits may induce higher unemployment through transitions both from employment and from the "not in the labour force" category, in addition to its effect on those who remain unemployed. In other words, part of the measured increase may come about because people choose to register who previously were not recorded. The operation of the benefit system is more complex than simply signing-on, and this is a further source of problems with the time-series approach. The typical practice is to take the benefits which are received by some "representative" worker: for example, Maki and Spindler take those for a worker with average earnings in the relevant contribution year and no spells of unemployment in that year. The problem is that benefit entitlements vary a great deal across the population, and they may have changed at different rates. As is shown in Atkinson and Micklewright (1985), different indicators of benefits followed quite different time paths over the period 1970-1983. Evidence about replacement rates in the United States, and their variation across individuals, is discussed by Hamermesh (1977). The advantage of studies based on individual data is that they can take account of the diversity of unemployment benefits; and this type of evidence has been extensively analysed. We should, however, note at the outset that the extrapolation from individual experience to the population as a whole poses a serious problem: individual decisions may be interdependent, but the analysis usually treats each decision as independent. Suppose, for example, that we observe that people with higher benefits delay returning to work. This does not necessarily mean that aggregate unemployment will be higher, since the refusal of jobs by one group may mean that the jobs are offered to others, increasing their re-employment probability.
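The three-state accounting can be sketched as a Markov chain over employment (E), registered unemployment (U), and non-participation (N). The monthly transition probabilities below are invented for illustration; the point is only that a benefit rise which induces registration from N raises steady-state measured unemployment even when job-finding rates are unchanged.

```python
# Steady state of a three-state labour market: E (employed),
# U (registered unemployed), N (not in the labour force).
# Monthly transition probabilities are invented for illustration.
P = {
    'E': {'E': 0.97, 'U': 0.01, 'N': 0.02},
    'U': {'E': 0.20, 'U': 0.70, 'N': 0.10},
    'N': {'E': 0.03, 'U': 0.01, 'N': 0.96},
}

def steady_state(P, iters=5000):
    """Iterate the chain until the distribution settles at its fixed
    point: the long-run share of the population in each state."""
    dist = {'E': 1.0, 'U': 0.0, 'N': 0.0}
    for _ in range(iters):
        dist = {s: sum(dist[r] * P[r][s] for r in P) for s in P}
    return dist

base = steady_state(P)

# Suppose a benefit rise induces some non-participants to register
# (N -> U up, N -> N down), with job-finding rates unchanged.
P_higher_benefit = {state: dict(row) for state, row in P.items()}
P_higher_benefit['N']['U'] = 0.03
P_higher_benefit['N']['N'] = 0.94
higher = steady_state(P_higher_benefit)
```

Here measured unemployment rises purely through registration behaviour, which is exactly why a benefit-induced rise in recorded unemployment need not reflect any change in the exit rate from unemployment.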
The pattern of unemployment benefits may determine who is unemployed but not affect the aggregate. As it is put by Albrecht and Axell (1984, p. 839): "the typical cross-section or panel analysis addresses the question of whether individuals who receive relatively generous unemployment compensation search more than others who receive relatively less generous compensation, holding other differences between individuals constant. But the relevant policy question is whether economies characterised by relatively more generous unemployment compensation have more search unemployment than economies with less generous compensation. Such a question requires an equilibrium answer." There is therefore a basic problem of interpretation once the problem is set in a general equilibrium context. Table 5.3 shows a wide range of cross-section studies, differing methodologically in several respects. There is the distinction between reduced form models (the majority of those listed) and structural models [such as those of Kiefer and
Neumann (1979a, 1981) and Narendranathan and Nickell (1984)]. The studies may also be classified according to the aspect of the employment decision which is studied. I described earlier three possible labour market states. In some cases, the transition probabilities between these states are modelled directly [Barron and Mellow (1981), and Clark and Summers (1982)]. Other studies have been concerned with one part of the transition matrix. For example, in the United Kingdom, the probability of leaving unemployment (which could include exit from the labour force) has been the subject of work by Lancaster (1979), Nickell (1979a, 1979b), Atkinson et al. (1984), Lynch (1983), and Narendranathan et al. (1984). They have estimated this exit probability using a relationship with survival among the unemployed similar to that shown in equation (5.1) for retirement. The probability of moving from employment to unemployment has been less studied, but see Marston (1980) and Nickell (1980), as well as Stern (forthcoming) on repeat spells of unemployment. In other cases, the variable studied is the result of the transition process: for example, the length of time spent in a particular state. The length of an unemployment spell is examined in Classen (1979), Ehrenberg and Oaxaca (1976), Kiefer and Neumann (1979a, 1981), Newton and Rosen (1979), and Hills (1982). Recognising that there is the alternative of leaving the labour force, other studies have taken the length of employment, as in Hamermesh (1979), Solon (1979), and Moffitt and Nicholson (1982). The emphasis in some studies is on probing the factors underlying the decision, and the job search model has been commonly chosen as a vehicle for this purpose. People are seen as determining their reservation wage, any job offering this or more being acceptable, in order to maximise the expected present value of the income stream.
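The reservation-wage rule of equation (5.2) below, w* = b + P(W - w*)/r, can be given a small numerical illustration. Assuming a uniform wage-offer distribution on [0, 1], so that P = lam*(1 - w*) and W = (1 + w*)/2, the condition becomes w* = b + lam*(1 - w*)^2/(2r) and its fixed point can be found by simple iteration. All parameter values are illustrative only, chosen so that the iteration is a contraction.

```python
def reservation_wage(b, lam=0.1, r=0.5, tol=1e-12):
    """Fixed point of w* = b + lam * (1 - w*)**2 / (2 * r), the
    reservation-wage condition under uniform offers on [0, 1]:
    lam is the offer arrival rate, r the discount rate, b the benefit.
    Illustrative parameters; with lam/(2r) small the map contracts."""
    w = b  # start from the benefit level and iterate
    while True:
        w_new = b + lam * (1.0 - w) ** 2 / (2.0 * r)
        if abs(w_new - w) < tol:
            return w_new
        w = w_new

# A higher benefit raises the reservation wage, the channel through
# which benefits lengthen search in this class of models.
w_lo = reservation_wage(b=0.2)
w_hi = reservation_wage(b=0.4)
```

Note that w* exceeds b in both cases: the option of continued search is itself valuable, so the unemployed hold out for more than the benefit level.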
In a simple formulation, where there is an infinite horizon, a stationary wage offer distribution, a constant rate of arrival of job offers, and constant wages and benefits, the choice of reservation wage, w*, may be shown to satisfy

w* = b + P(W - w*)/r,    (5.2)

where b is the benefit, r the rate of discount, P the probability of receiving an acceptable job offer, and W the expected wage associated with such an acceptable offer. This is not of course an explicit expression for w*, since both P and W depend on w*. If one is willing to posit a functional form for the distribution of wage offers, then it may be possible to solve for w*, and this is exploited ingeniously by Lancaster and Chesher (1983), who use information on reservation wages and expected wages in accepted jobs to fit the model exactly for each group of observations in their sample. Other authors have used the theoretical framework less tightly, not relying on the strong assumptions necessary to reach explicit results, and have treated as functions of benefits and other variables the reservation wage [Feldstein and Poterba (1984) and Fishe (198?)] or
the post-unemployment wage [Classen (1979), Ehrenberg and Oaxaca (1976), Kiefer and Neumann (1979a, 1981)]. Or they have treated the exit probability [f(v) in equation (5.1)] as a function of the benefit level [Lancaster (1979), Nickell (1979a,b), Atkinson et al. (1984), and Narendranathan et al. (1985)]. Despite the differences in approaches, the findings of the cross-section studies appear at first sight to be relatively close. In a number of cases, it is reported that a 1 percent rise in benefits tends to be associated with rather less than a 1 percent increase in unemployment duration or decrease in the probability of leaving unemployment: see, for example, Lancaster and Nickell's (1980) summary of their evidence. In 1977, Hamermesh summarised his review of twelve U.S. studies by saying that "the best estimate - if one chooses a single figure - is that a 10-percentage point increase in the gross replacement rate leads to an increase in the duration of insured unemployment of about half a week when labour markets are tight" [Hamermesh (1977, p. 37)]. If all that one wishes to conclude is that the effect of unemployment benefits is rather modest, then that is probably sufficient. However, if one investigates more closely the different studies in the hope of reaching a more precise conclusion, then the picture becomes more blurred. We should also note the rather different finding with regard to flows into unemployment: for example, the study by Stern (forthcoming) of repeat unemployment in the United Kingdom finds no significant effect of benefits. The first source of doubt is that the reported statistical significance of the findings is not in many cases particularly impressive [an exception is Narendranathan et al. (1985)]. Taking, for example, nine estimates from the work of Ehrenberg and Oaxaca (1976), Kiefer and Neumann (1979a), and Lancaster (1979), using samples of around 500 cases, we find estimated t-statistics ranging from 1.7 to 3.8 with a mode of 2.3.
Although t-statistics of 2.0 are often treated as having attained respectability, the resulting estimates are clearly not very precise; and their lack of robustness has been brought out in studies which have varied the specification, the treatment of benefits and the sample employed. The specification in terms of the search model has already been discussed. Not only the detailed assumptions necessary to arrive at the form to be estimated, but also the basic presumptions of the model may be questioned. The role given to demand considerations is limited, whereas for many people this may be the crucial factor. People may be willing to accept any job which is offered. The shortage of microdata covering both supply and demand sides of the market is perhaps one reason why this has not been further explored. The aspect of specification which has been most fully investigated is the treatment of individual differences ("heterogeneity"). If people differ in their willingness to return to work, for example, then any cohort of unemployed will contain an increasing proportion of those reluctant to return; if this is not allowed for, then it may contaminate the estimates of the relationship with other variables [see Heckman and Singer (1984a, 1984b, 1984c)].
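The heterogeneity point can be made with a two-type example: if each individual's exit hazard is constant over the spell but "fast" and "slow" leavers are mixed in unknown proportions, the aggregate hazard among survivors falls with duration, mimicking genuine duration dependence. The type shares and hazard rates below are illustrative only.

```python
import math

# Two unobserved types with constant weekly exit hazards (illustrative
# values). Each individual's hazard is flat, but the surviving pool is
# increasingly dominated by slow leavers, so the aggregate hazard
# declines with duration -- spurious "duration dependence" if the
# heterogeneity is ignored.
p_fast, h_fast, h_slow = 0.5, 0.30, 0.05

def aggregate_hazard(t):
    """Exit hazard among those still unemployed at duration t weeks:
    the survivor-weighted average of the two type-specific hazards."""
    s_fast = math.exp(-h_fast * t)   # survivor share, fast type
    s_slow = math.exp(-h_slow * t)   # survivor share, slow type
    num = p_fast * h_fast * s_fast + (1 - p_fast) * h_slow * s_slow
    den = p_fast * s_fast + (1 - p_fast) * s_slow
    return num / den

h_start = aggregate_hazard(0.0)   # population-average hazard at entry
h_later = aggregate_hazard(26.0)  # hazard among half-year survivors
```

If benefits are correlated with type, a regression of exit rates on benefits that ignores this mixing will attribute the composition effect to the benefit variable, which is the contamination that the Heckman and Singer work addresses.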
The treatment of benefits raises similar questions to those discussed in the context of retirement decisions. The complexity of the benefit formulae and their interaction with other benefits and the income tax system mean that the calculation of the true incentive or disincentive to work may be difficult. The interdependence of decisions over time is a crucial factor not captured by the stationary job search model. The decision to reject a job offer today may have implications for future benefit entitlement, either because it may lead to future disqualification or because entitlement depends on contribution conditions which can only be satisfied if the person returns to work. The positive incentive to enter employment which arises from such contribution conditions has been emphasised in the work of Hamermesh (1979, 1980). The interrelation with the tax system was discussed in Atkinson and Flemming (1978), where it was noted that with an annual tax assessment and tax exemption for benefits, the marginal cost of not working for a week was represented by earnings net of the marginal tax rate, not the average tax rate as commonly assumed. Furthermore, we have to allow for the fact that part of the income support may be provided by income-tested schemes for which the take-up rate is less than 100 percent, as is discussed further in Section 5.4. It is therefore difficult for the investigator to calculate the benefit to which a person is entitled, and the consequences for net income of the decision to refuse a job offer. It may be difficult too for the unemployed themselves, and the complexity of the system leads one to suspect that actual behaviour may be governed not by an exact calculation of the trade-off, but by an approximate perception of the replacement rate. In practice, two main approaches have been used. First, studies have calculated for each person the benefit entitlement by applying the rules of the schemes in question.
In the United Kingdom, this approach is illustrated by the work of Nickell (1979a, 1979b); in the United States, examples are Ehrenberg and Oaxaca (1976) and Clark and Summers (1982), who provide a useful account of their UISIM programme. The problem with this approach is that it takes no account of the factors which cause actual receipt of benefit to differ from that calculated by applying the rules. In the United Kingdom, there are several reasons, including disqualification (for example, for quitting employment "without just cause"), incomplete contribution records, and the linking of spells. These are discussed in Micklewright (1984) and Atkinson and Micklewright (1985). From the evidence contained in the records of the unemployment insurance system, it appears that calculated benefit is likely to over-state the proportion in receipt and the average amount paid. The modelling of benefit receipt is discussed in the United Kingdom by Micklewright (1985) and in the United States by Barron and Mellow (1981). The second approach is to make use of information on actual benefit receipt and on the individual's past career. The objection to this approach is that discussed in the case of pensions: that benefits may be influenced by unobservable individual characteristics which also enter into the determination of the re-employment probability. A person with a bad previous employment record (not observed) would both get a lower benefit and be less keen to get a new job. The use of the calculated benefit as an instrument would avoid such contamination. For this reason, results based on actual benefits need to be interpreted with caution. What is clear, however, is that variation in the benefit variable can make a sizeable difference to the conclusions drawn. Atkinson et al. show that in the United Kingdom the use of the calculated benefit variable leads to results similar to those of Lancaster and Nickell (1980), but if benefit is related more closely to actual receipt then its effect ceases to be significantly different from zero. The conclusion is reached that the results are far from robust. The results of Narendranathan et al., using information from the benefit records, are different in that they find a well-defined elasticity, but it is quite small (0.28-0.36). The lack of robustness is also illustrated by the variation of results with the choice of sample and dependent variable. In the United States, Hills (1982, p. 462) has shown the sensitivity of the findings to the choice of sample: "we find that changes in sample definition produce quite different estimates of the relation between UI and the duration of unemployment". Moreover, the choice of sample is important when we seek to extrapolate the findings to the whole population. As in other fields, the initial simplicity of findings with regard to the disincentive effects tends to disappear when the issue is investigated in more detail. The flow of new studies has tended less to confirm the earlier findings than to raise new problems and to point to subtleties not previously taken into account. One cannot, moreover, forget that the provision of unemployment insurance is an issue charged with emotion.
The academic debate has been successful in excluding the more obvious bar-room prejudices, but there is considerable scope for the conclusions drawn to be influenced by prior beliefs.

5.4. The take-up of welfare/public assistance
A number of major income-tested benefits are characterised by take-up rates apparently well below 100 percent. The reasons for non-take-up are various, but considerable importance has been attached to the stigmatising effect of receipt. The role of stigma in general is discussed by Cowell (1985), Deacon and Bradshaw (1983), and Spicker (1984). Some writers are sceptical. Klein (1975, p. 5), for example, describes stigma as "the phlogiston of social theory", and argues that low take-up can be explained just as well by other factors. That content can be given to the role of stigma is, however, illustrated by the theoretical analysis of Weisbrod (1970), who concentrated on what he called "external" stigma, which is associated with the number of people who identify the recipient as poor. He argued that take-up would be an increasing function of
the expected duration of benefits, of the level of benefits, and a decreasing function of the proportion of participants who are poor. Similarly, participation would be greater if the aid were provided in money than if it were in goods and services which can be identified as being for the poor. If we accept that stigma may influence participation, then it is only one of the factors at work. People may not claim on account of the transaction costs involved. People may not claim because they do not know of their eligibility. Evidence about the relative importance of different reasons may be sought from interviews with claimants and non-claimants. Non-take-up of Supplementary Benefit in Britain by pensioners is examined by Townsend (1979, p. 845), who concludes that "among the elderly, the predominant impression about their failure to obtain supplementary benefit was one of ignorance and inability to comprehend complex rules and pride in such independence as was left to them". [See also Altmann (1981).] Evidence about attitudes towards welfare in the United States is examined by Handler and Hollingsworth (1969). The Michigan Panel Study of Income Dynamics in 1976 asked those not participating in the food stamp programme whether they thought they were eligible and, if so, why they did not take up the stamps. In 60 percent of cases, eligible non-participants did not think or did not know that they were eligible [Duncan (1984, p. 85)]. Interview evidence is suggestive, but does not readily lead to quantification. A second type of evidence is that which seeks to relate participation rates to observed variables. For example, Strauss (1977) set out to measure the contribution made by the availability of information to take-up rates for Supplemental Security Income (SSI) in the United States, using for this purpose observations on receipt in different North Carolina counties, distinguishing between those with and without federal eligibility determination offices.
He concludes that the presence of an office was indeed associated with increased participation. Participation in SSI has been examined by Warlick (1981b), using evidence from the Current Population Survey. The limitations of the available data mean, however, that she has to proxy factors such as "awareness of availability of SSI" by variables like educational attainment or receipt of OASI. The only variable which is statistically significant for all types of family unit is the amount of entitlement; and the results as a whole are inconclusive. A second programme which has been examined in the United States is food stamps. MacDonald (1977) uses data for 1971 from the Michigan Panel Study of Income Dynamics to examine the determinants of participation. He finds that households in receipt of welfare are more likely to claim, reasoning that they are more likely to be informed about the existence of stamps and that they have already overcome the stigma costs associated with claiming welfare. Secondly, the amount of the food stamp bonus was found by MacDonald to be a significant factor. This conclusion does, however, run counter to that reached by Coe (1979) using data for 1974 from the same source, although this study agrees about the importance of welfare recipiency.
Ch. 13: Income Maintenance and Social Insurance
In the case of welfare payments, the participation decision has been seen in conjunction with that concerning work. Brehm and Saving (1964) use an income-leisure trade-off in explaining variation in general assistance payments by state in the US; see in this context the comment by Stein and Albin (1967). More recently, the income-leisure framework has been combined with the explicit treatment of stigma. Ashenfelter (1983) derives a relationship between participation in an income-tested programme and the break-even level of the programme, the level of unearned income, the tax rate [as in Rea (1974)] plus an unobserved variable which represents the non-pecuniary costs of participation. Using data from the Seattle/Denver Income Maintenance Experiment, and assuming that the unobserved variable is normally distributed, Ashenfelter estimates the mean and variance of this distribution. The presence of a control group allows him to identify the parameters, and he concludes that receipt of benefits was not significantly affected by welfare stigma or other non-pecuniary programme participation costs. As was clearly identified in the original analysis by Weisbrod (1970), we need to distinguish different ways in which stigma may operate. The cost may be a fixed cost associated with any claim or there may be a marginal cost arising from more extensive participation or the claiming of more than one benefit. This has been investigated in the context of AFDC in the United States by Moffitt (1983), who allows for the possibility that there is both a fixed and a variable component, reducing the value of income. Using data for female family heads in 1976 from the Michigan Panel Study, he estimates jointly the participation decision and the choice of hours of work. The determinants of participation and hours are obtained using a specific form for the utility function and assumptions about the distribution of the fixed stigma component. 
Moffitt concludes that there is a significant influence of stigma, but it "appears to arise mainly from the act of welfare recipiency per se, and not to vary with the amount of the benefit once on welfare" [Moffitt (1983, pp. 1033-1034)]. The intertemporal aspects of the cost may also be important. There may be a set-up cost to claiming. Or, as modelled by Plant (1984), there may be a cost to transition between states, inhibiting claiming by those not previously in receipt and inducing welfare dependence by those on the benefit register. Analyses of the kind just described are a valuable addition to our knowledge, but they concentrate exclusively on the demand side of the relationship. The behavior of those administering income-tested benefits is however crucial to their operation; and much of the concern about stigma derives from the way in which the benefit is administered. The choice of participation is not solely that of the applicant. The potential recipient faces a variety of hurdles designed to ensure that benefits are confined to those who are truly eligible. Receipt may depend on availability for work or on demonstration that one is actively searching for work; it may require proof that the person is not co-habiting. The demand for assistance may be "constrained", as emphasised by Albin and Stein (1968).
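The fixed-versus-variable decomposition of stigma costs discussed above can be sketched in a few lines. This is an illustrative money-metric toy, not Moffitt's estimated model; the benefit level and the cost parameters phi0 and phi1 are hypothetical:

```python
# Sketch of the fixed-vs-variable stigma decomposition (illustrative
# numbers only; Moffitt's actual model estimates these parameters
# jointly with the choice of hours of work).

def claims_benefit(benefit, phi0, phi1):
    """Participate iff the money value of the benefit exceeds the
    fixed stigma cost phi0 plus the variable cost phi1 per unit."""
    net_gain = benefit - phi0 - phi1 * benefit
    return net_gain > 0

# Pure fixed-cost stigma (phi1 = 0): small entitlements go unclaimed,
# large ones are claimed -- in the spirit of Moffitt's finding that
# stigma attaches to recipiency per se, not to the amount received.
print(claims_benefit(50, phi0=100, phi1=0.0))   # small benefit: not claimed
print(claims_benefit(500, phi0=100, phi1=0.0))  # large benefit: claimed
```

With a purely variable component instead (phi0 = 0, phi1 close to one), participation would be insensitive to the size of the entitlement, which is the alternative the data reject.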
A.B. Atkinson
5.5. Income support and family formation

The reference to co-habitation brings us to an aspect of income support which has caused considerable controversy - its possible effect on family formation. This has been particularly discussed in the United States with regard to the welfare system. In that receipt of AFDC is conditional on the absence of one parent (until 1979 the father), the payment of benefits may influence decisions to marry or to separate. It has often been argued that the welfare system is one important factor in leading to the growth of female-headed families. The percentage of female-headed families (with no husband present) rose in the United States from 10.0 percent in 1960 to 15.4 percent in 1983, with the growth being especially marked for black families, increasing from 21.7 percent to 41.9 percent [Wilson and Neckerman (1986, table 2)]. The effects of income support on family formation may go much wider. The payment of pensions to the elderly may affect their ability to continue living on their own rather than joining the household of a son or daughter, or entering a residential home. The age of marriage may be influenced by the level of support provided to families, as may be the completed family size. It is, however, the effect of welfare on single parenthood which has received most notice in the United States, and it is on this that attention is focussed here. The changes over time in the aggregate level of household formation in the United States are examined by Hendershott and Smith (1984), who show that a significant part of the change is due to changes in age-specific headship rates, being proportionately greatest for those under 35 and those over 75. Their estimated equations show a significant relation with the number of families on welfare, and they attribute part of the increased headship amongst the young to increased real AFDC benefits and lower eligibility standards.
Such evidence, however, needs to be considered alongside the finding of Darity and Myers (1983), using annual data from 1955 to 1980, that the proportion of female-headed families amongst black families is not significantly related to AFDC payments. Cross-section data on receipt of AFDC by area (SMSAs) in the United States are employed by Honig (1974) - see also Minarik and Goldfarb (1976) and Honig (1976). Using data from 1960, she finds a positive relationship between average AFDC payments and the proportion of adult females who headed families with children. The relationship is less strong for whites than for non-whites, and is found to be smaller when data for 1970 are used. Consideration of individual data on household formation has been accompanied by a more explicit treatment of the individual decision. Hutchens (1979) has drawn on the work of Becker (1973) on the theory of marriage to deduce a number of possible determinants of the rate of re-marriage. Using a subsample of data for 1970 from the Michigan Panel Study of Income Dynamics, covering 438 women with children, he estimates a logit model of the probability of re-marriage
within two years. (He notes in this context the problems caused by sample attrition, which may be particularly serious for this purpose.) He concludes that the AFDC transfers do have a significant negative influence on the probability of re-marriage. The model of Danziger et al. (1982) compares the income when married with that when single, the decision regarding household formation being taken by the woman in the light of this comparison. (The decision of the man is treated as irrelevant. They note that abandonment by the husband "creates a short-run disequilibrium, but we assume that in the long run she will have found a new husband" [Danziger et al. (1982, p. 533)]. Some people might think that Keynes' view of the long run is nearer the mark.) The findings of their model, estimated using data from the 1975 Current Population Survey, are that AFDC increases the headship proportion by only a small amount. The effect of income support on family formation has also been studied as part of the negative income tax experiments. The evidence is reviewed by Bishop (1981a, 1980b); see also MacDonald and Sawhill (1978) and Groeneveld et al. (1980). The subject of family formation is clearly one of considerable complexity, where the role of economic factors should not be exaggerated. The mechanisms at work are more subtle than any simple relation between income support and marriage or separation.
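As an illustration of the kind of specification Hutchens estimates, a logit of re-marriage within two years on the AFDC entitlement can be sketched as follows; the coefficients are hypothetical and serve only to show how a negative benefit coefficient maps into a lower predicted probability:

```python
import math

def remarriage_prob(afdc_benefit, b0=-0.5, b_afdc=-0.004):
    """Logit model of the probability of re-marriage within two years
    as a function of the monthly AFDC entitlement. The coefficients
    are illustrative, not Hutchens' estimates."""
    z = b0 + b_afdc * afdc_benefit
    return 1.0 / (1.0 + math.exp(-z))

# A higher AFDC entitlement lowers the predicted re-marriage probability.
print(remarriage_prob(0.0))    # baseline probability
print(remarriage_prob(300.0))  # with a higher monthly entitlement
```

In an actual estimation the coefficients would be obtained by maximum likelihood on individual data, as in Hutchens' subsample of 438 women from the Michigan Panel Study.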
5.6. Pensions and savings Since the early days of state pensions it has been argued that they discourage private saving. In 1907, a leading member of the Charity Organisation Society said that if the poor law could be swept away, "the condition of the masses would in every way be vastly improved, a further impetus would be given to the growth of the Friendly Societies, ... the necessity for providence and thrift would bring about less expenditure on drink" [quoted by Johnson (1984, p. 332)]. In recent years, the debate about the effect of pensions on savings has re-opened largely because of the research in the United States of Feldstein, starting with the article in which he reached the striking conclusion that "the social security program approximately halves the personal savings rate [which] implies that it substantially reduces the stock of capital and the level of national income" [Feldstein (1974, p. 922)]. This article stimulated a great deal of research, including attempts to replicate his approach in other countries [see references in the survey by Cartwright (1984)]. Not surprisingly, the principal conclusion has been challenged, and other authors have come to different conclusions. It has been pointed out that any reduction in personal savings does not necessarily imply a fall in capital stock; as we saw in Section 3, account has to be taken of the consequences for public debt and public capital. The impact of a rise in social security might, for instance, be offset by a decline in government
v dx^h < 0.  (2.38)

Consider z^0 + ε dx^h:

V*(z^0 + ε dx^h) ≥ W(U^1(x^{10}), ..., U^{h−1}(x^{h−1,0}), U^h(x^{h0} + ε dx^h), ..., U^H(x^{H0}))  (2.39)

= W(U^1(x^{10}), ..., U^h(x^{h0}) + ε (∂U^h/∂x^h) dx^h + o(ε²), ..., U^H(x^{H0}))  (2.40)

> V*(z^0),  for small ε > 0.  (2.41)

On the other hand,

V*(z_ε) = V*(z^0) + v (z_ε − z^0) + o(ε²)  (2.42)

= V*(z^0) + ε v dx^h + O(ε²)  (2.43)

< V*(z^0),  for small ε > 0.  (2.44)

This contradiction establishes the result.
Ch. 14: Theory of Cost-Benefit Analysis
Notice that this result is not concerned with how first-best allocations are actually achieved; in particular, it says nothing about market prices. However, we know that under certain conditions, a first-best allocation may be achieved as a competitive equilibrium with optimal lump-sum transfers; in such a case, (relative) shadow prices coincide with (relative) market prices, both being equal to the common marginal rates of substitution. The reader may verify that, where producer prices and lump-sum transfers are unrestricted control variables in problem (P) (2.10), while taxes and quotas are absent, and the economy is closed, a first-best allocation is achieved, shadow prices coincide with market prices, and welfare weights are equal for all consumers.
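The claim that at a first-best competitive equilibrium (relative) market prices coincide with every consumer's marginal rate of substitution, and hence can serve as (relative) shadow prices, can be checked numerically. The Cobb-Douglas utility, prices and income below are all hypothetical:

```python
# Numerical check: at a competitive consumption choice, the MRS between
# two goods equals the market price ratio, which under the conditions in
# the text also equals the (relative) shadow price.

def mrs(alpha, x1, x2):
    """MRS between goods 1 and 2 for U = alpha*ln(x1) + (1-alpha)*ln(x2):
    (dU/dx1)/(dU/dx2) = alpha*x2 / ((1-alpha)*x1)."""
    return (alpha * x2) / ((1 - alpha) * x1)

p1, p2 = 2.0, 1.0        # market prices (illustrative)
alpha, m = 0.4, 100.0    # preference parameter and income (illustrative)

# Cobb-Douglas demands: spend share alpha on good 1, (1-alpha) on good 2.
x1 = alpha * m / p1
x2 = (1 - alpha) * m / p2

print(mrs(alpha, x1, x2), p1 / p2)  # the two ratios coincide
```

Running the same check for a second consumer with a different alpha and income would give the same common ratio, which is the sense in which first-best shadow prices are "the" marginal rates of substitution.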
2.3. Some important second-best theorems
Some of the early contributions to second-best theory were pessimistic regarding the possibility of defining operational rules for public decision-making outside the perfectly competitive model; they stressed, for instance, that distortions occurring anywhere in the economy would lead to a disruption of the standard conditions for Pareto efficiency everywhere else [Lipsey and Lancaster (1956-57)]. Since then much work has been directed at establishing more positive results, and exhibiting the conditions under which reasonably simple and operational rules for decision-making (e.g. the desirability of aggregate production efficiency) exist in spite of the failure to achieve a first-best allocation. We shall present some of the more important results in our framework. A more formal synthesis can be found in Guesnerie (1979).

2.3.1. Production efficiency of the public sector
So far we have considered the "public sector" to cover all firms whose production plan is directly under the control of the planner, and whose profits accrue entirely to the government. It may happen, however, that the production plan of a private firm can effectively be manipulated by the planner (e.g. via fiscal instruments or quantity constraints), without affecting other private agents (except, of course, indirectly, via the scarcity constraints). If, in addition, the profits of such a firm either accrue entirely to the government, or can be taxed optimally, we call it fully controlled; and it is intuitively clear that the planner should evaluate the production plan of such a firm according to the same criteria as those applying to other public firms. As a corollary, weak production efficiency (in the sense of Section 1.4) is clearly desirable for the subset of fully controlled firms.
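The intuition for why production efficiency across fully controlled firms is desirable can be checked numerically. The two square-root technologies and the input stock below are hypothetical; the point is only that total net output from a fixed input stock is maximised where marginal products (the firms' marginal rates of transformation with respect to the shared input) are equalised:

```python
# Sketch: two fully controlled firms turn one input into one output.
# Total output is maximised only where marginal products are equal;
# any other allocation of the input stock is dominated.

def total_output(l1, L=10.0):
    f = 2.0 * l1 ** 0.5            # firm 1: f(l) = 2*sqrt(l)  (illustrative)
    g = 4.0 * (L - l1) ** 0.5      # firm 2: g(l) = 4*sqrt(l)  (illustrative)
    return f + g

# Equal marginal products: 1/sqrt(l1) = 2/sqrt(L - l1)  =>  l1 = L/5 = 2.
efficient = total_output(2.0)
inefficient = total_output(5.0)    # naive equal split of the input
print(efficient, inefficient)      # the efficient split produces more
```
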
J. Drèze and N. Stern
As a simple example, consider a firm, say the gth, for which each commodity is rationed - except one, say the first, whose net output will therefore be chosen to satisfy the technological constraint F^g(y^g) = 0. Suppose, furthermore, that the profits of this firm are all taxed: τ^g = 1 (an optimal profit tax for this firm would be an equally adequate assumption). If (ȳ_i^g)_{i≠1} are unrestricted signals, the first-order conditions (2.19) imply

v_i = −v_1 (∂y_1^g/∂ȳ_i^g) = v_1 (F_i^g/F_1^g).  (2.45)

Hence, when v_1 ≠ 0:

v_i/v_j = F_i^g/F_j^g,  for all i, j,  (2.46)
which illustrates the desirability of joint production efficiency for the set of fully-controlled firms. Notice that when g's production set is strictly convex, "full control" can equally be achieved by a set of (unrestricted) firm-specific taxes - which would bring the relative prices faced by the firm into equality with relative shadow prices. Examples of firms which are not fully controlled in the above sense are: (i) firms whose profits are not fully taxed (these are dealt with in Section 3.6); (ii) public firms facing a budget constraint of the Boiteux type, e.g. py^g = Π^g (also in Section 3.6); and (iii) firms whose net output of some commodities only can be controlled by the planner (see Section 2.4.3 below). We hope that these examples help to indicate which type of firm can legitimately be regarded as belonging to the "public sector". For our purposes, no distinction is needed between the public sector and the set of fully controlled firms.

2.3.2. Aggregate production efficiency

The reasoning we have applied for a single firm can be applied to the private production sector as a whole. Thus, if the planner's opportunity set is such as to give him an effective ability to induce any (feasible) aggregate private production plan, without directly affecting consumers' demands and utilities, aggregate production efficiency (defined as weak production efficiency for all firms taken together) will clearly be desirable; and, more generally, private marginal rates of transformation will provide a suitable shadow price vector for evaluating marginal changes in the public production plan. Once again this result says nothing about producer prices as such. With convex production sets, however, it implies that the private production plan y* associated with the solution of (P) (2.10) can be supported by the price vector v:

v y* = max_{y∈Y} v y,  (2.50)
where Y is the aggregate production set for the private sector. As a corollary, if such a support vector is unique and there are no quotas, shadow prices coincide with producer prices. However, the general relationship between the desirability of aggregate production efficiency on the one hand, and the equality of shadow prices and producer prices on the other is quite complicated when quotas apply and we shall not investigate it further here. Several sets of assumptions can be invoked to ensure that the tools available to the planner are powerful enough to enable him to "control" the private sector. Most of them are restrictive from a practical point of view. The best known of these sets of assumptions are those of the Diamond-Mirrlees model of optimal taxation and public production [see Diamond and Mirrlees (1971)]: competitive private producers, unrestricted commodity taxation and full taxation of private profits. When these conditions are satisfied: The government can induce private firms to produce any efficient net output bundle by suitable choice of producer prices p.... The choice of p does not affect consumer demands or welfare, since pure profit arising from decreasing returns to scale go to the government, and since, any commodity taxes being possible, q can be chosen independently of p [Diamond and Mirrlees (1971, p. 17)]. It is interesting to compare these statements with the following (optimistic) quote from the Technical Note to the Sixth Five-Year Plan of India [Government of India (1981, p. 4)]: As for the private sector, the relevant macro plans are formulated by the Planning Commission, in complete tune with the general development strategy of the country and of the public sector. This part of the plan is regarded as indicative and subsequently appropriate measures are undertaken by different Ministries through fiscal, monetary and income policies to ensure their fulfilment.
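The support-price property (2.50) can be illustrated with a deliberately small, hypothetical production set: if producer prices are set proportional to shadow prices, profit-maximising choice from Y picks out precisely the plan with maximal shadow value, which is the sense in which the planner "controls" private production.

```python
# Illustration of the support-price property: with a convex (here:
# finite, purely illustrative) aggregate production set Y, the plan
# chosen by profit maximisation at prices p = v is exactly the plan
# that maximises shadow profit v*y.

Y = [(-5.0, 9.0), (-3.0, 6.0), (-1.0, 2.5), (0.0, 0.0)]  # (input, output) plans
v = (1.0, 1.0)                                            # shadow prices (illustrative)

def value(prices, y):
    return prices[0] * y[0] + prices[1] * y[1]

y_star = max(Y, key=lambda y: value(v, y))
print(y_star)            # the shadow-profit-maximising plan
print(value(v, y_star))  # v*y* = max over Y of v*y
```
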
We can now provide a formal proof of the Diamond-Mirrlees result for the case where all private firms have strictly decreasing returns to scale (for the case of constant returns see Section 2.3.5). Let τ^g = 1 for all g, and (p_i), (t_i) all be unrestricted signals. Then the first-order conditions (2.19) imply

v (∂y/∂p) = 0.  (2.51)
But also, by homogeneity of degree zero of private net supply functions:

(∂y/∂p) p = 0.  (2.52)
If the matrix ∂y/∂p has rank (I − 1), then v must be collinear with p. As a corollary, if private firms face no quantity constraints, aggregate production efficiency is desirable. It is clear that these results extend to the case where the profits of each firm are optimally taxed since then the social value of any effect of price changes operating through profits is zero and (2.51) applies [see Dasgupta and Stiglitz (1972)]. The required rank condition is mild when producers face no quantity constraints. For a producer facing quantity constraints the matrix ∂y^g/∂p has at least as many zero columns as there are binding quotas; but the above rank condition will remain weak provided that for each good there is at least one unrationed producer. For further references on aggregate production efficiency see Stiglitz and Dasgupta (1971), Mirrlees (1972) and Roberts (1978).

2.3.3. Efficiency of public production and consumption

By an analysis similar to that of the preceding sections, one can examine the conditions of efficiency between the public sector and private consumers. The details are left to the reader, who is also referred to Guesnerie (1979). The following results should be intuitively clear from our previous discussion. (1) If the planner can directly control the consumption plan of a consumer, he should choose it so as to equalise the ratios of the marginal utilities of different commodities for that consumer to shadow price ratios (U_i^h/U_j^h = v_i/v_j, for all i, j). (2) When the planner has an effective ability (e.g. using taxes and lump-sum transfers) to manipulate private consumption plans without directly affecting producers' behaviour, shadow price ratios will coincide with marginal rates of substitution for each consumer; as a corollary, optimal public production will involve equating marginal rates of transformation in the public sector with the common marginal rates of substitution for consumers.

The latter property has been called "C-C efficiency" by Guesnerie (1979) - see also Hammond (1980). As a simple example, let (r^h), (t_i) all be unrestricted signals. Equation (2.19) then implies:

β^h − v (∂x^h/∂m^h) = 0,  for all h,  (2.53)

−Σ_h β^h x_i^h − v (∂x/∂q_i) = 0,  for all i.  (2.54)
Substituting the Slutsky identity in (2.54) and using (2.53) we obtain:

v (∂x̃/∂q) = 0,  (2.55)

where x̃ = Σ_h x̃^h and x̃^h(q, x̄^h, U^h) is h's compensated demand vector. Moreover, the homogeneity of compensated demands implies:

(∂x̃/∂q) q = 0.  (2.56)
Under a mild rank condition (see the discussion in Section 2.3.2) and a suitable normalisation rule, (2.55) and (2.56) imply v = q. As a corollary, if consumers face no quantity constraints, C-C efficiency is desirable. Notice, incidentally, that when v = q, v (∂x^h/∂m^h) = q (∂x^h/∂m^h) = 1 for all h, and so from (2.53), β^h = 1: all consumers have identical welfare weights at s*. In this illustration the optimality of individual lump-sum transfers plays a crucial role. However, when restrictions are placed on the structure of preferences the same results can be derived under much weaker assumptions concerning the set of available transfers. This principle is familiar from the literature on optimal taxation. For example if all households have the same linear Engel curves, and differ only in their earning power (and labour is separable from other goods) then the optimality of a poll tax or transfer will imply here that consumer prices are proportional to shadow prices. The proof follows from (2.53) and (2.54) using an argument analogous to Deaton (1979). Similarly if Engel curves for different households are linear and parallel but their intercepts depend only on observable household characteristics, or unobservable characteristics uncorrelated with welfare weights, then optimal transfers based on observable household characteristics will imply that consumer prices are proportional to shadow prices [this is an extension of Deaton and Stern (1986)]. It should be noted that all the results of this section hold irrespective of non-convexities in private production, and this considerably enhances their usefulness.

2.3.4. Efficiency of public production and trade
If the conditions under which aggregate production efficiency is desirable are in general rather restrictive, it will often be much less far-fetched to assume that the planner has substantial control over foreign trade. When n_i is unrestricted [implying that y^f enters the Lagrangian (2.18) only through y^f(·)], the first-order conditions (2.19) include [recalling (2.4)]:

v_i + v_f (∂y^f/∂n_i) = 0  (2.57)

or

v_i = −v_f (∂y^f/∂n_i),  (2.58)
i.e. the shadow price of the ith commodity is simply its world price (or marginal cost) multiplied by the marginal social value of foreign exchange (v_f); we shall call this the "border marginal cost rule". Notice that this result holds irrespective of the form of the side constraints, provided these do not involve n_i so that this control variable remains unrestricted. It is also easy to see that v_i will remain proportional to p_i^f even if the side constraints involve n, provided that the effects they capture operate only through the balance of payments y^f. There are two important ways of interpreting the appearance of n_i as a control variable of the planner's problem (in the language of Section 2.1.3 we can think of n as being either directly controlled or implicitly controlled). The first is where the ith commodity is traded under a quota, and this quota can be chosen optimally. In this case, (2.58) is perhaps more naturally interpreted as an optimality rule for choosing the appropriate quota (see Section 2.1.3). Secondly, when the ith market clears by trade in the sense that the amount traded is endogenously determined from the scarcity constraints, the variable n_i can again be regarded as a control variable. The border marginal cost rule can then be interpreted as follows: since the effect of a small increase in the net public supply of commodity i is a corresponding reduction in its net imports, the social value of that commodity is its marginal cost (or revenue) in world trade, multiplied by the marginal social value of foreign exchange. In both these examples, the result requires, in addition, that n_i should be unrestricted, i.e. it should not interfere with the side constraints. A simple counterexample would occur when an exogenous tariff t_i^f applies and the world price of commodity i depends on the quantity traded, implying a side constraint linking p_i, t_i^f and n_i.

2.3.5. Constant returns to scale

Constant returns to scale is a characteristic of the technology, and by itself it has no immediate implication for the values of shadow prices, which involve not only the fixed characteristics of the economy, but also policies and the behaviour of agents. It has been suggested, however, that under some simple assumptions (e.g. competition among private producers), the existence of constant returns in the private sector implies a close relationship between shadow prices and producer prices [see Diamond and Mirrlees (1976)]. We shall examine this idea in our model. Consider a private firm, say g, operating under constant returns to scale (i.e. Y^g is a cone). If this firm faces no quantity constraint, its profit-maximising
production plan will be either zero or indeterminate. When strictly positive profits are possible it would want to expand production without limit. If profits are negative at any positive scale it would want zero output. If maximum profits are zero, as they are in the standard characterisation of competitive equilibrium with constant returns to scale, any scalar multiple of a profit-maximising production plan is also profit maximising. The possible indeterminacy of the production plan will be resolved, however, if the firm faces a binding quantity constraint, say ȳ_1^g, which may be called its scale factor. Now suppose that this scale factor is a control variable. This would be the case, for instance, if firm g is the sole producer of good 1, and adjusts its output to the demand it faces (i.e. ȳ_1^g is an endogenous variable); or if the planner can directly manipulate the gth firm's output of good 1. If, in addition, ȳ_1^g is unrestricted, then the first-order conditions (2.19) imply:
v (∂y^g/∂ȳ_1^g) = −b^g [p (∂y^g/∂ȳ_1^g)],

where b^g = Σ_h θ^{gh} b^h and b^h is as in (2.23). If firm g faces no quantity constraint other than the scale factor then this can be rewritten as

v y^g = −b^g (p y^g).  (2.59)
Moreover, if firm g makes zero profit then

v y^g = 0.  (2.60)
This expresses the well-known Diamond and Mirrlees (1976) result according to which the production plan of a firm operating under constant returns breaks even at shadow prices. This result also holds if firm g makes positive profits provided that these profits are fully or optimally taxed; this follows immediately from (2.59) with b^g = 0.
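A one-line numerical check of the break-even property (2.60), under the simplest sufficient condition (shadow prices proportional to producer prices); the production plan and prices below are illustrative:

```python
# Check: a constant-returns firm earning zero profit at producer
# prices p also breaks even at shadow prices v whenever v is
# proportional to p. Plan and prices are illustrative.

y_g = [-2.0, -1.0, 4.0]   # net output plan (inputs negative)
p = [1.0, 2.0, 1.0]       # producer prices: p*y_g = -2 - 2 + 4 = 0
v = [2.0, 4.0, 2.0]       # shadow prices proportional to p

profit_at_p = sum(pi * yi for pi, yi in zip(p, y_g))
profit_at_v = sum(vi * yi for vi, yi in zip(v, y_g))
print(profit_at_p, profit_at_v)  # both zero: the plan breaks even
```
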
As we have seen it is common in models where some firms have constant returns to scale to impose the condition that profits of such firms are zero (the "no pure-profit condition"). It is important to note that side constraints of the kind Π^g(·) = 0 do not prevent the scale factors of the corresponding firms from being unrestricted since these constraints can always be stated in terms of unit profits. The Diamond-Mirrlees result is trivial when the firm under consideration operates at zero scale. In its non-trivial form, the result appears quite general, but its validity relies on the following conditions being satisfied. First, the firm under consideration must operate at a non-zero scale at the equilibrium s*. Second, the
scale factor for this firm should be a control variable; this may require the planner being able to control the distribution of output between firms within industries, particularly if (2.60) is considered to hold for many firms. Finally, the firm should not face any quantity constraints other than that imposed by its scale factor. As a corollary of the above result, if (2.60) holds for (I − 1) firms with linearly independent production plans and zero profits, v is proportional to p [Diamond and Mirrlees (1976)]. The circumstances under which this assumption is likely to be satisfied, however, have not been thoroughly elucidated, and the discussion above should make clear that they may be quite restrictive. The simplest example of an economy where the required conditions are likely to be met is one where the assumptions of the non-substitution theorem (constant returns, no joint production, and a single scarce factor) are satisfied, and the private sector is competitive. The details are left to the reader. As a final remark, notice that the equality of producer prices and shadow prices can also be brought about by a suitable collection of optimal scale factors for constant returns industries and optimal taxes. If firm g operates under constant returns with zero profits and its scale factor is unrestricted, we have, as before (2.60): v y^g = 0. On the other hand, if both p_i and t_i are unrestricted, and profits are fully (or optimally) taxed, we obtain, as in (2.51):

v (∂y/∂p) = 0.  (2.61)
Both these equations are also satisfied by the producer price vector. Thus if (I − 1) of them hold (and are linearly independent), v will be proportional to p. It should be noted that the assumption of unrestricted p_i is somewhat implausible in this context, since the "no pure-profit" condition for constant returns firms will usually amount to side constraints involving prices. It can be shown, however, that the result holds if there are no further side constraints and, in addition, each p_i and t_i is a control variable. To see this, rewrite (2.61) as

v (∂y/∂p) + Σ_{g∈C} α^g y^g = 0,

where C is the set of constant returns firms and α^g is the Lagrange multiplier associated with the no pure-profit condition applying to firm g. Post-multiplying
945
Ch. 14: Theory of Cost-Benefit Analysis
by v and using vyg = 0 (g
C) we obtain
E
ay = 0 . ap When the matrix ay/ap has rank (I- 1), the latter implies that v is collinear to p since ay/ap is a positive semi-definite matrix. 2.3.6. Conclusion Two elementary lessons seem to emerge from the results of this sub-section. First, there is little point in arguing in a vacuum about the relative weights one should give to producer prices, consumer prices or world prices in shadow pricing formulae; a useful discussion of such issues should involve an understanding of the functioning of the economy and an explicit recognition of the constraints and objectives relevant to the decisions of the planner. Second, the conditions under which shadow pricing rules of uniform simplicity are valid appear to be rather restrictive for practical purposes. In the next subsection we shall study the more general shadow pricing rules applicable in our model; their implications will be pursued in Section 3. 2.4. Shadow prices in distorted economies In this section we study more explicitly the features of shadow prices and optimal policies in our model under alternative specifications of the vector of control variables. Our first task is to find the marginal social value of each of these variables (see Section 2.2.2), by differentiation of the Lagrangian of (P) (2.10): MSVh = h V MSV =
$$ \mathrm{MSV}_{m^h} \equiv b^h = \beta^h - \nu\,\frac{\partial x^h}{\partial m^h}, \qquad (2.62) $$

$$ \mathrm{MSV}_{p_i} = -\sum_h \beta^h x_i^h - \nu\Big(\frac{\partial x}{\partial p_i} - \frac{\partial y}{\partial p_i}\Big) + \sum_g b^g y_i^g, \qquad (2.63) $$

$$ \mathrm{MSV}_{t_i} = -\sum_h \beta^h x_i^h - \nu\,\frac{\partial x}{\partial q_i}, \qquad (2.64) $$

$$ \mathrm{MSV}_{\bar y_i^g} = \big(\nu + b^g p\big)\,\frac{\partial y^g}{\partial \bar y_i^g}, \qquad (2.65) $$

$$ \mathrm{MSV}_{n_i} = \nu_i - \varphi\, p_i^f, \qquad (2.66) $$

$$ \mathrm{MSV}_{\bar x_i^h} = \beta^h\Big(\frac{\partial u^h/\partial x_i^h}{\partial v^h/\partial m^h} - q_i\Big) - \nu\,\frac{\partial x^h}{\partial \bar x_i^h}, \qquad (2.67) $$

where b^g ≡ Σ_h θ^{gh} b^h; in (2.66), n_i denotes a quota on net imports of good i, p_i^f its foreign price, and φ the marginal social value of foreign exchange.
J. Drèze and N. Stern
Each of these expressions holds with arbitrary side constraints, as long as the control variable (s_k) being considered is unrestricted. The same expressions hold for the marginal social value of each predetermined variable (w_k) which does not (locally) affect the side constraints. By setting the marginal social value of control variables to zero, we can study shadow pricing rules as well as optimal policy rules. By studying the marginal social value of predetermined variables, we could also study directions of welfare-improving reforms (see Section 1.5) - this, however, is not our purpose here. In the remainder of Section 2.4 we investigate optimal policies and shadow prices by examining the first-order conditions that marginal social values for control variables should be zero.

2.4.1. Unrestricted producer prices: The Ramsey-Boiteux rule

If the ith producer price is an unrestricted control variable its marginal social value (2.63) should be zero. As explained in Section 2.1.3, the notion that a price is a control variable can be seen either as an assumption that it can be set optimally or that the price adjusts to clear the market (for example we may consider a traded good with an optimal tariff or a non-traded good with a market-clearing price). Accordingly, the first-order conditions may be interpreted either as optimality rules or as expressions to be satisfied by shadow prices. If the marginal social value of p_i in (2.63) is zero we have
$$ -\sum_h \beta^h x_i^h - \nu\Big(\frac{\partial x}{\partial p_i} - \frac{\partial y}{\partial p_i}\Big) + \sum_g b^g y_i^g = 0, \qquad (2.68) $$
where, as before, b^g = Σ_h θ^{gh} b^h, and b^h is given in (2.62). Note that b^g can be interpreted as the marginal social value of a lump-sum tax on the profits of the gth firm. The first term on the l.h.s. of equation (2.68) embodies the direct effect on consumers of the price increase, the second the social cost of meeting the extra net demands induced and the third the social value of the extra profits generated. Using the decomposition of consumer demands into income and substitution effects one can easily derive
$$ \nu\Big(\frac{\partial y}{\partial p_i} - \frac{\partial x^c}{\partial p_i}\Big) = d_i, \qquad (2.69) $$
where d_i = Σ_h b^h x_i^h − Σ_g b^g y_i^g may be viewed as a "net distributional characteristic" of the ith commodity. This net distributional characteristic captures the marginal social value of all the income effects associated with a marginal change in the ith price. With optimal lump-sum transfers for consumers, each b^h is zero so that d_i is zero for each i. Note that, if a commodity is rationed for every private agent, the left-hand side of (2.69) vanishes; in other words, the rule for optimal pricing of a rationed commodity is simply d_i = 0 (this is because the price charged on a rationed commodity affects its users in the same manner as a lump-sum tax). More generally, d_i will be higher, the more it is consumed by "transfer-deserving" households (with b^h > 0) and produced by firms whose profits should be more heavily taxed (b^g < 0).

Viewed as a shadow pricing rule, (2.69) suggests (intuitively speaking) that if the ith market clears by price adjustment, then, other things being equal, the shadow price of the ith good will be greater: (i) the greater is the distributional term d_i, and (ii) the more (less) socially valuable are its substitutes (complements). More precisely, (2.69) provides a formula for evaluating the marginal social value of the general equilibrium effects of a small change in the ith price; these effects are "broken down" into compensated changes in net commodity demands and changes in real income. Equation (2.69) is sometimes written as

$$ T^c\,\frac{\partial x^c}{\partial q_i} + T^p\,\frac{\partial y}{\partial p_i} = d_i, \qquad (2.70) $$
using the symmetry of the Slutsky matrix and the net supply derivatives, the homogeneity of x^c(·) and y(·), and the definition of T^c and T^p as shadow taxes (q - ν) and (ν - p), respectively. A number of variants of this formula have been investigated in the second-best literature on optimal production and pricing and synthesised under the name of Ramsey-Boiteux rules by Guesnerie (1980).

To bring out further the implications of Ramsey-Boiteux rules viewed as shadow pricing rules, let us first ignore the distributional effects by assuming that d_i = 0. Equation (2.69) can then be re-written as

$$ \nu_i = \alpha_i\,\mathrm{MSC}_i + (1 - \alpha_i)\,\widetilde{\mathrm{MSC}}_i, \qquad (2.71) $$

with

$$ \mathrm{MSC}_i \equiv -\sum_{k \neq i} \nu_k\,\frac{\partial y_k}{\partial p_i} \Big/ \frac{\partial y_i}{\partial p_i}, \qquad (2.71a) $$

$$ \widetilde{\mathrm{MSC}}_i \equiv -\sum_{k \neq i} \nu_k\,\frac{\partial x_k^c}{\partial p_i} \Big/ \frac{\partial x_i^c}{\partial p_i}, \qquad (2.71b) $$

and

$$ \alpha_i \equiv \frac{\partial y_i}{\partial p_i} \Big/ \Big(\frac{\partial y_i}{\partial p_i} - \frac{\partial x_i^c}{\partial p_i}\Big) \qquad (2.71c) $$
(note that α_i lies between 0 and 1). We call MSC_i the marginal social cost of the ith commodity: it is the cost at shadow prices of the inputs involved in the production of an extra unit of the ith commodity brought about by a change in price. Symmetrically, if the ith good is an input we can think of MSC_i as its marginal social product. The definition of $\widetilde{\mathrm{MSC}}_i$ is analogous: it measures the social value of the commodities involved in compensating consumers for a unit loss of the ith good. Equation (2.71) then expresses the ith shadow price as a weighted average of MSC_i and $\widetilde{\mathrm{MSC}}_i$, the weights being given by the proportion of the extra unit coming from (or going to) each source (destination) as determined by the price elasticities.

Whilst (2.71) provides an interesting interpretation of ν_i, taken by itself it does not allow us to calculate ν since the right-hand side includes other shadow prices - as usual all shadow prices are determined simultaneously. The greater the number of goods for which (2.71) applies the more we can deduce from this formula about the vector of shadow prices. Thus, consider the case where (2.70) holds for each i: this would apply for instance in a competitive economy where prices clear markets but taxes and transfers are exogenous. Let us assume for simplicity that at least one commodity - say the first - is untaxed (q_1 = p_1) at the optimum, and use the normalisation ν_1 = p_1. A few manipulations of (2.70) then give
$$ \tilde\nu = \tilde p\,\Gamma + \tilde q\,(I - \Gamma), \qquad (2.72) $$

with

$$ \Gamma \equiv Y_p\,\big(Y_p - X_q\big)^{-1}, \qquad I - \Gamma = -X_q\,\big(Y_p - X_q\big)^{-1}, $$

(price vectors being treated as row vectors)
where a tilde above a vector indicates the deletion of its first element, and X_q and Y_p denote, respectively, the matrices ∂x^c/∂q and ∂y/∂p deprived of their first row and column (the invertibility of (Y_p - X_q) involves a mild rank condition; without the truncation the corresponding I × I matrix could not be inverted - this follows from (2.69) with d_i = 0 for each i). In equation (2.72) the shadow price vector appears as a generalised weighted average of producer and consumer prices, where the weights are computed simultaneously from all price elasticities. Notice that Γ is the product of two positive semi-definite matrices, and so is I - Γ (e.g. weights are non-negative in the two-commodity case).

We have seen in (2.71) and (2.72) two weighted average rules, the former averaging social costs on the production and consumption side for a single commodity and the latter producer and consumer price vectors. Weighted average rules have been popular in both the theoretical and applied literature and we have shown that some such rules can be given a general equilibrium foundation. The general equilibrium rules are not simple, however, and care has to be exercised in deriving short cuts or "approximations" for empirical applications. Moreover the weighted average rules explicitly omit the distributional terms d_i. The importance of paying attention to the distributional aspects can be seen from the following example. Where p_i adjusts to clear each market it is immediate from (2.69) that if p = q and d_i is different from zero for some i then ν ≠ p. Thus even in a perfectly competitive economy without indirect taxes market prices will generally be inadequate as shadow prices because they ignore distributional considerations.

In deriving the second "generalised weighted average" rule, we have assumed producer prices to be "unrestricted" for each commodity. This may appear as a stringent assumption when some commodities are traded and their domestic price depends on their world price. In Section 3.3 we shall see that under certain conditions the Ramsey-Boiteux rule, applied to non-traded goods exclusively, has a particular solution of the form

$$ \nu_i = p_i + \frac{dT}{dz_i} \qquad \text{(each } i \text{ non-traded)} $$

where dT/dz_i is the general equilibrium effect on tax and tariff revenue of a small change in the net supply of good i in the public sector. Whilst Ramsey-Boiteux rules have attracted a great deal of attention in the second-best literature, project evaluation manuals, on the other hand, often lack full rigour in their treatment of prices, e.g. by assuming fixed producer prices or relying on partial equilibrium analyses of the effects of price changes. A better integration between these two lines of research is an important step towards the understanding of shadow pricing rules in economies where some important markets (such as food grains) are fairly competitive and may be considered to clear by price adjustment.

2.4.2. Unrestricted commodity taxes: The generalised Ramsey rule

When the ith tax is unrestricted, the equation MSV_{t_i} = 0 holds, implying, using the same derivation as for (2.70):
$$ T^c\,\frac{\partial x^c}{\partial q_i} = \sum_h b^h x_i^h, \qquad (2.73) $$
which may be interpreted along the same lines. As an illustration, if shadow prices are proportional to producer prices (as in the Diamond-Mirrlees model mentioned in Section 2.3.2), then (2.73) immediately reduces to the well-known
"many-person Ramsey rule" for optimal taxation [see Diamond (1975)], namely

$$ \sum_k t_k\,\frac{\partial x_k^c}{\partial q_i} = \sum_h b^h x_i^h. \qquad (2.74) $$
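As a purely numerical sketch of this rule, the following toy computation (the numbers and the Python code are our own illustration, not material from the chapter) solves the linear system Σ_k t_k ∂x_k^c/∂q_i = Σ_h b^h x_i^h for the tax vector, taking an assumed aggregate compensated-demand derivative matrix S, household consumptions x_i^h and transfer values b^h as given:

```python
import numpy as np

# Toy illustration of the many-person Ramsey rule (2.74):
#   sum_k t_k dx^c_k/dq_i = sum_h b^h x_i^h   for each taxed good i.
# S, x and b below are invented numbers, not data from the chapter.

# S[k, i] = aggregate compensated demand derivative dx^c_k/dq_i
# (symmetric and negative semi-definite in theory), after dropping
# the untaxed numeraire commodity.
S = np.array([[-2.0, 0.5],
              [0.5, -1.0]])

# x[h, i]: consumption of good i by household h;
# b[h]: net social marginal value of a transfer to h, as in (2.62).
x = np.array([[1.0, 0.2],
              [0.4, 1.5]])
b = np.array([1.2, 0.8])

d = x.T @ b                    # right-hand side: sum_h b^h x_i^h
t = np.linalg.solve(S.T, d)    # taxes satisfying the first-order conditions

assert np.allclose(S.T @ t, d)
print("taxes satisfying the rule:", t)
```

With these invented numbers the solved taxes happen to be negative (subsidies); the sign pattern depends entirely on the assumed S, x and b.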
Its generalisation in (2.73) is of striking simplicity, in spite of the much greater variety of models in which it is applicable. This illustrates the power of shadow prices in summarising much of the information contained in an economic model: when markets clear according to rules different from those applying in the Diamond-Mirrlees economy, shadow prices will deviate from producer prices, but, in a large class of models, the optimal taxation formula (2.73) will remain valid. Accordingly, (2.73) perhaps deserves to be called the generalised Ramsey rule. Since the generalised Ramsey rule is in effect a special case of the Ramsey-Boiteux rule (the two are equivalent when all private firms are fully controlled), we shall not analyse it separately. To conclude our outline of optimal pricing rules, however, let us consider briefly the case where both p_i and t_i are unrestricted. Letting MSV_{p_i} = MSV_{t_i} = 0 gives (2.73) and, in addition:

$$ \nu\,\frac{\partial y}{\partial p_i} = -\sum_g b^g y_i^g. \qquad (2.75) $$
To interpret this, define the marginal cost (or marginal product, if y_i^g < 0) of commodity i in firm g as

$$ \mathrm{MC}_i^g \equiv -\sum_{j \neq i} p_j\,\frac{\partial y_j^g}{\partial \bar y_i^g}, \qquad \text{if } i \text{ is rationed for } g, \qquad (2.76) $$

$$ \mathrm{MC}_i^g \equiv -\sum_{j \neq i} p_j\,\frac{\partial y_j^g}{\partial p_i} \Big/ \frac{\partial y_i^g}{\partial p_i}, \qquad \text{otherwise.} \qquad (2.77) $$
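For concreteness, here is a small numerical sketch of definition (2.77) and of its shadow-price analogue defined next (all numbers invented; Python used only as a calculator):

```python
# Toy illustration of (2.77): the marginal (social) cost of good i in
# firm g is the value, at producer (shadow) prices, of the other goods
# used up per extra unit of y_i^g. All numbers are invented.

p  = [1.0, 2.0, 3.0]     # producer prices
nu = [1.0, 1.5, 2.5]     # shadow prices (assumed, not derived here)
dy = [0.8, -0.2, -0.1]   # dy_j^g/dp_i: net supply responses of firm g
i  = 0                   # the good whose marginal cost we measure

others = [j for j in range(len(p)) if j != i]

MC  = -sum(p[j]  * dy[j] for j in others) / dy[i]   # eq. (2.77)
MSC = -sum(nu[j] * dy[j] for j in others) / dy[i]   # same weights, shadow prices

print(f"MC_i^g = {MC:.4f}, MSC_i^g = {MSC:.4f}")
```

Because the inputs (the negative dy entries) are valued below their producer prices in the assumed shadow price vector, the marginal social cost here comes out below the marginal cost.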
The marginal social cost (or marginal social product) of good i in firm g, denoted by MSC_i^g (or MSP_i^g), is defined similarly, by weighting the derivatives on the r.h.s. of (2.76) and (2.77) by shadow prices rather than by producer prices. Using (2.77), (2.75) becomes

$$ \sum_g \big(\nu_i - \mathrm{MSC}_i^g\big)\,\frac{\partial y_i^g}{\partial p_i} = -\sum_g b^g y_i^g, \qquad (2.78) $$

or

$$ \nu_i = \sum_g \mathrm{MSC}_i^g\,\frac{\partial y_i^g}{\partial p_i} \Big/ \frac{\partial y_i}{\partial p_i} \; - \; \sum_g b^g y_i^g \Big/ \frac{\partial y_i}{\partial p_i}, \qquad (2.79) $$
where the first term on the r.h.s. of (2.79) is precisely the marginal social cost of good i as defined in (2.71a). Thus, in this model, when both p_i and t_i are control variables (e.g. the ith commodity is optimally taxed, and exchanged at a market-clearing price) and unrestricted, the shadow price of the ith good reduces to its marginal social cost (suitably averaged over firms), corrected by a distributional term; the latter follows a simple inverse-elasticity rule, and vanishes if profits are fully or optimally taxed. A similar marginal social cost rule applies when a producer ration is unrestricted. This brings us to (2.65).

2.4.3. Unrestricted producer rations: The marginal social cost rule

When ȳ_i^g is unrestricted, MSV_{ȳ_i^g} = 0, and so, from (2.65):

$$ \nu_i - \mathrm{MSC}_i^g + b^g\big(p_i - \mathrm{MC}_i^g\big) = 0, \qquad (2.80) $$

or

$$ \nu_i = \mathrm{MSC}_i^g - b^g\big(p_i - \mathrm{MC}_i^g\big), \qquad (2.81) $$
which is another variety of marginal social cost rule. The shadow price of good i is its marginal social cost less the social value of the extra profits. Recall that b^g will be negative where the government would like to tax profits more heavily. When ȳ_i^g is considered as a policy instrument, (2.81) is perhaps best interpreted as a decision rule for choosing an optimal quota applying to firm g. On the other hand, if ȳ_i^g is regarded as an endogenous variable (e.g. in a model with price rigidities and quantity rationing), (2.81), viewed as a shadow pricing rule, reflects the fact that a small increase in the net public supply of good i leads to a corresponding reduction of its net output in firm g. An important application of this arises with firms in "Keynesian equilibrium"; for such a firm output adjusts to demand and ȳ_i^g (where i indexes the output of the firm) is an endogenous variable in the model - hence (2.81) applies. In such an equilibrium, is it correct to say that a commodity in excess supply (p_i > MC_i^g) should be devalued in the shadow price system relative to the market price system? This notion is intuitively appealing and not completely devoid of content, but as usual general equilibrium effects can invalidate it. Thus when b^g = 0 we can write (2.81) as

$$ \frac{\nu_i}{p_i} = \frac{\mathrm{MC}_i^g}{p_i}\,\sum_{j \neq i} \alpha_j\,\frac{\nu_j}{p_j}, $$
where the α_j's (j ≠ i) add to one and represent the shares of each input in the marginal cost of good i. Thus the "accounting ratio" (ν_i/p_i) of good i is lower than a weighted average of the accounting ratios of its inputs by a fraction measuring the discrepancy between marginal cost and price. In particular if the production of good i involves a single input (say k), the price of i relative to k will be lower in the shadow price system than in the producer price system. Further results of this kind have been explored in the literature on cost-benefit analysis in "general disequilibrium" (see references below). The analogy between (2.79) and (2.81) brings out the potential equivalence between price and quantity instruments (here taxes and quotas) as well as its limits: on the one hand, taxes and quotas have different distributional implications; on the other hand, they may have differing degrees of effectiveness, e.g. according to the extent to which they can be individualised.

2.4.4. Unrestricted consumer rations and the marginal willingness-to-pay

Quantity constraints in consumption are not quite so easy to analyse as quantity constraints in production. To interpret (2.67), let us define the "pure substitution effect" of a small change in h's ration of good i on his net demand for good j as

$$ \sigma_{ij}^h \equiv \frac{\partial x_j^h}{\partial \bar x_i^h} + q_i\,\frac{\partial x_j^h}{\partial m^h}. \qquad (2.82) $$
This measures the effect on h's demand for good j of a small gift of the ith good, and it can be shown that, if h's utility is weakly separable between good i and other goods, then σ_{ij}^h = 0, for all j ≠ i. Using (2.82), the equation MSV_{x̄_i^h} = 0 becomes

$$ \beta^h\big(\rho_i^h - q_i\big) - \nu_i - \sum_{j \neq i} \nu_j\Big(\sigma_{ij}^h - q_i\,\frac{\partial x_j^h}{\partial m^h}\Big) = 0, \qquad (2.83) $$

where ρ_i^h is h's marginal willingness-to-pay for good i:

$$ \rho_i^h \equiv \frac{\partial u^h/\partial x_i^h}{\partial v^h/\partial m^h}, \qquad (2.84) $$

or (using Σ_{j≠i} q_j σ_{ij}^h = 0)

$$ \nu_i = \beta^h \rho_i^h - b^h q_i + \sum_{j \neq i} T_j^c\,\sigma_{ij}^h. \qquad (2.85) $$
In (2.85) the ith shadow price appears as the sum of three terms which clearly reflect the various effects of a marginal change in h's quota. The first is the hth consumer's welfare-weighted marginal willingness-to-pay for good i; the second is the net social value of the payment made by consumer h in return for an extra "ration ticket" (notice that this payment is precisely equivalent to a lump-sum tax); the third measures the net social cost of the change in h's consumption pattern through pure substitution effects, and vanishes when good i is weakly separable from other goods in h's utility.

An interesting and important application of (2.85) concerns non-marketed goods. Consider, for instance, a pure public good, produced exclusively in the public sector, and available uniformly to all consumers. This situation may be formalised by putting q_i = 0 and x̄_i^h = z_i, for all h, for that commodity. Setting MSV_{x̄_i} = 0, we obtain, much as in (2.85):

$$ \nu_i = \sum_h \beta^h \rho_i^h + \sum_{j \neq i} T_j^c\,\sigma_{ij}, \qquad (2.86) $$

where σ_{ij} ≡ Σ_h σ_{ij}^h. In this model, then, the shadow price of a pure public good simply consists of a welfare-weighted sum of consumers' marginal willingness-to-pay, plus the induced contribution to shadow tax revenue through pure substitution effects; the latter term is zero if separability holds between public and private goods - see also Diamond and Mirrlees (1971, p. 271). An application of the theory of cost-benefit analysis with consumer and producer rationing which is of particular significance is that developed in recent models of macro-economic disequilibrium. Some examples include Dixit (1976b), Drèze (1984), Johansson (1982), Marchand, Mintz and Pestieau (1983), and Roberts (1982).
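As a toy numerical reading of the public-good formula (2.86), the sketch below (invented numbers; Python purely illustrative) adds the welfare-weighted willingness-to-pay of three households to the shadow-tax substitution term:

```python
# Toy illustration of (2.86): shadow price of a pure public good =
# welfare-weighted sum of marginal willingness-to-pay, plus the shadow-tax
# revenue induced through pure substitution effects. Invented numbers.

beta  = [1.3, 1.0, 0.7]      # welfare weights beta^h of three households
rho   = [0.4, 0.6, 0.5]      # marginal willingness-to-pay rho_i^h
Tc    = {1: 0.10, 2: -0.05}  # shadow taxes T_j^c = q_j - nu_j, goods j != i
sigma = {1: 0.30, 2: 0.80}   # aggregate substitution effects sigma_ij

v_i = sum(bh * rh for bh, rh in zip(beta, rho)) \
      + sum(Tc[j] * sigma[j] for j in Tc)
print(f"shadow price of the public good: {v_i:.4f}")
```

If separability held between the public good and private goods (all sigma_ij = 0), the second sum would vanish and v_i would reduce to the welfare-weighted sum of willingness-to-pay alone.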
2.5. Conclusion

The shadow pricing rules discussed in this section have no special claim to universality. In particular situations there is no alternative, in principle, to identifying carefully the functioning of the economy, the planner's objectives, his instruments and the constraints circumscribing their use, and then deriving the appropriate shadow pricing formulae. It is useful, however, to investigate the extent to which one can derive reasonable rules-of-thumb or guidelines based on characteristics shown by the economies in which cost-benefit techniques are to be applied. This enquiry has motivated some influential writings in the project evaluation literature [e.g. the Little-Mirrlees method (1974), and the UNIDO Guidelines - Dasgupta, Marglin and Sen (1972)]; the remainder of this chapter is devoted to a more informal discussion of some selected aspects of this area of research in the light of the theoretical results obtained above.
3. Selected applications of the theory

3.1. Introduction

In the previous section we have set out the theory of shadow prices in a fairly general model. The purpose of the present section is to show how this theory can throw light on a number of issues facing the cost-benefit analyst in practice.

In Section 3.2 we discuss the role of the distribution of welfare or income. We argue that its explicit recognition should not be avoided and further illustrate how one could carry out a discussion of the value judgements necessary to evaluate changes in welfare for different households.

The treatment of shadow prices for traded and non-traded goods has been a central issue in discussions of cost-benefit analysis and planning in developing countries in the 1970s [e.g. Little and Mirrlees (1974)]. In Section 3.3 we apply the theory of Section 2 to characterise the circumstances which could justify using the widely debated rule that shadow prices for traded goods should be world or border prices. We examine also the common suggestion that shadow prices for non-traded goods should be equal to their marginal cost of production at shadow prices. And we discuss briefly the distinction between traded and non-traded goods.

Another major topic in the literature is shadow wages. Much of that discussion has used very aggregated models and we show in Section 3.4 that many of the basic ideas can fit into a general equilibrium framework of the kind we have been using.

In Section 3.5 we examine numeraires, discount rates, and foreign exchange. These topics are treated together and we shall emphasise their interrelatedness. Thus, as we shall see, the discount rate cannot be defined without the specification of the numeraire. And the value of particular premia (e.g. on foreign exchange) will depend on the numeraire chosen for the evaluation. Furthermore, the choice of discount rate is closely related to the issue of how such premia move over time.
We also see how simple rules for deciding on the discount rate (for example linking it to a particular interest rate or rate of return) correspond to different assumptions about the opportunities and constraints faced by the planner. We show in Section 3.6 how the analysis of shadow prices for public sector firms may be extended in a direct and natural way to private sector firms or public firms which are not fully controlled. We examine in Section 3.7 how time and uncertainty can be treated more explicitly in our model. From the formal point of view we can use much the same general framework although the choice of the particular structure of the model
for any application would often, particularly in the case of uncertainty, raise quite severe difficulties. The final subsection of Section 3 groups together a number of topics which are important but which have not received special attention here such as public goods, externalities, and large projects. The notation in this section is consistent with Section 2 but we shall also introduce further symbols as necessary for particular models.

3.2. The distribution of welfare

We argued in Section 2 that the social welfare function, interpreted as the objective function of the planner, should play a central role in the theory of shadow prices. In the expressions involving shadow prices the particular aspect of the welfare function which was prominent was the collection of social marginal utilities of income or welfare weights (β^h). In this subsection we first examine (Section 3.2.1) ways which have been suggested of avoiding the use of the social welfare function, and then see (Section 3.2.2) how one might discuss and specify a social welfare function or set of weights.

In practice the analyst should present results involving a selection of possible social welfare functions. This would be necessary not only because individuals or policy-makers will differ in, or be unsure of, their values but also because one would wish to know the sensitivity of shadow prices with respect to values. This sensitivity would indicate how important it would be to specify values precisely or to resolve differences between those who do not seem to agree. For some shadow prices welfare weights will not matter, for example where shadow prices are world prices; but for others the welfare weights may be very important indeed - e.g. see below on the shadow wage.

3.2.1. "Avoiding" value judgements

One can imagine four ways of apparently avoiding the use in cost-benefit analysis of judgements concerning distribution across households. First, one can attempt to identify actual Pareto improvements.
Secondly, one can seek potential Pareto improvements. Thirdly, one can use criteria which are not based on household welfare. Fourthly, one can argue that distribution can be ignored in cost-benefit analysis because, if it is a problem, it should be dealt with using policy instruments other than public projects. An alternative and rather more fierce version of this fourth case is that the distribution of income is not an appropriate concern for the government. In practice this fourth view is translated as the assumption that money benefits or costs accruing to households can simply be added without weights. The third and fourth positions do not really dispense
with the objective function but involve particular assumptions on its form. We examine these four viewpoints in turn.

The first approach is to seek to identify projects which yield Pareto improvements. Accordingly, for a given policy P, a project evaluation criterion of the following kind might be proposed: accept dz if, for each h,

$$ \frac{\partial v^h}{\partial s}\,ds > 0, \qquad \text{where } ds = \frac{ds}{dz}\,dz. \qquad (A) $$
This in effect amounts to requiring a project to pass a series of cost-benefit tests of the kind we have described in Section 1. We can think of a social welfare function which consists simply of the utility of household h and consider the associated shadow prices ν^h. The project increases the welfare of the hth household if ν^h dz > 0. Then a project yields a (strict) Pareto improvement if ν^h dz > 0 for each h. An immediate problem with this approach is that the collection of shadow prices which is needed to guide production decisions now contains as many elements as there are consumers, which is inconvenient for decentralisation. Furthermore, it could be that very few projects pass the test, thus effectively paralysing investment decisions.

The second approach consists of seeking potential Pareto improvements from an arbitrary initial situation (s°, z°). Thus, a project dz would be deemed acceptable if there exists some compatible change ds for which (A) is satisfied. Compensation criteria of the Hicks-Kaldor type are familiar examples of this kind of criterion, which have also been explored by a number of other authors, e.g. Allais (1943, 1977), Debreu (1951), Diewert (1984) and Farrell (1957). On compensation criteria see, for example, the discussions in Graaff (1957), Boadway (1974), Bruce and Harris (1982) and Chipman and Moore (1979).

A few remarks may help to clarify the nature of this criterion. First we should note that if the initial environment s° is not Pareto efficient [given z° and the constraints in problem (P)], then all sufficiently small projects will pass the test. This should be obvious since the zero project (dz = 0) passes the test and thus, with continuity, so will all small enough projects. Secondly, provided the initial environment is Pareto efficient no problems of circularity arise (i.e.
if the move from z° to z¹ is compatible with a Pareto improvement, then the move from z¹ to z° cannot be); hence, the objections to the Hicks-Kaldor compensation criteria on this score are avoided. Thirdly, designing cost-benefit tests for this
criterion raises similar difficulties to those indicated for actual Pareto improvements - except in rather special circumstances a single shadow price vector cannot provide such a test. An example of a model where a single cost-benefit test provides a sufficient condition for a potential Pareto improvement is Diewert (1983) - see also Dixit and Norman (1980, ch. 3). An important and obvious ambiguity with the proposed criterion is that it remains vague on whether Pareto improvements will be actually implemented. If no such guarantee exists, then the criterion is certainly unacceptable. In any case, a fundamental shortcoming of evaluation criteria based on Pareto improvements, whether actual or potential, is that, unless they are taken to imply that Pareto-improving changes are the only acceptable ones (a view which we regard as extremely unappealing and which attaches undue weight to the status quo), they provide no decision criterion for projects which cannot lead to Pareto improvements. It is difficult to overcome this problem without accepting the need to specify a social welfare function which embodies more definite judgements.

A third way of seemingly avoiding value judgements in project evaluation is to rely on objective functions which do not involve interpersonal comparisons of welfare. As we saw in Section 2.2.7, criteria based on statistics such as national income, indices of inequality, price indices, and unemployment rates are often most sensibly interpreted as deriving from the social welfare function. It is likely that other objectives which are often proposed, such as self-sufficiency, could also be derived in a similar framework but with a richer structure. For example, self-sufficiency may be seen as a derived objective when the external world subjects the economy to random shocks which are costly or difficult to control.
Where the objectives are derived then they simply represent convenient means for summarising the criteria embodied in the social welfare function. Or the derived objectives may provide a useful means of discourse with those who are uncomfortable or unfamiliar with the notion of a social welfare function. Where the objectives are not derived they must be justified in some other way and we suspect that often this would not be easy, at least in an ethically appealing manner. There may well be reasons for arguing, for example, that a society with high levels of unemployment is unsatisfactory. One would no doubt include the effects of unemployment on income distribution, the self-respect of the unemployed, or the availability of future skills; but most or all of these considerations operate through the level of welfare of current or future households. Nevertheless, it is quite possible that even after considerable reflection one would maintain that an objective was not derived in the sense we have described. For instance, one may wish to assert that every person has a right to gainful employment; a measure of the extent to which this right is satisfied, say the employment rate, may then be proposed as an objective. Even in this example interpersonal comparisons would seem to be involved since a measure such as the employment rate gives equal weight to the employment of different people. To
summarise, whilst the idea of a social welfare function which dispenses with interpersonal comparisons of welfare is not, a priori, indefensible, in practice the range of situations in which it would be appealing is likely to be limited.

According to the fourth approach, we should ignore distribution either because it is not an appropriate concern for government or because distributional matters should be dealt with using other policy tools than public projects. These views are translated in practice into simply adding net money benefits, or willingness-to-pay, across households. The first version is, we hope, seen as clearly unattractive and irresponsible - at a minimum one would want to see governments accepting some obligation to the weak or disabled. The second version represents a mistaken understanding of second-best welfare economics. It may be true that policy instruments exist - e.g. income taxes - which allow a more direct influence on the distribution of income than public projects; and the planner's model should include them. However, in general, redistributing income using these instruments will have social costs, and therefore the implications of projects for income distribution should not be ignored. For example the optimal non-linear income tax [see Mirrlees (1971), Stern (1976), or Christiansen (1981)] does not imply the equality of the social marginal utilities of income, β^h, across households: the disincentive aspects of the non-lump-sum tax provide a reason for not fully equating them. Moreover, outside the fully competitive economy, even optimal lump-sum taxes (b^h = 0) do not equate β^h across households since the marginal social values of a transfer take into account the different social costs of different consumption patterns. It is noteworthy that whilst some non-economists, e.g.
the lawyer Lord Roskill in chairing the enquiry into London's proposed third airport, refuse to accept weighting [see Roskill Commission (1971)], it has come to be accepted by some of those economists who had been its strongest opponents - see, for example, Harberger (1978) and Nwaneri (1970) and Layard, Squire and Harberger (1980).

3.2.2. The specification of value judgements

We are seeking here methods which can structure a discussion of value judgements so that we can clarify our ethical positions and translate them into a form which can be used directly in practical cost-benefit analysis. Value judgements, of course, have a subjective nature; but we can still, and should, discuss these judgements rationally and systematically, examine the use of evidence on the circumstances and feelings of individuals, and try to specify functions and parameters which capture our views.

In this subsection we examine methods which involve inferring objectives from decisions. These decisions may be hypothetical or actual. In the former case we pose simple questions of the kind: "What would you do if asked to decide on policy in this simple problem?" An example would be whether or not to take an amount x from individual A in order to give an amount y to individual B. We can then use the values inferred in
Ch. 14: Theory of Cost-Benefit Analysis
the simple case to characterise the social welfare function, or welfare weights, which will guide policy in more complicated cost-benefit models. In the case where actual decisions are used we have to model the problem perceived by the decision-maker so that we can infer his values from the decision actually taken. For example, the savings rate selected in a planned economy could be used to make inferences about judgements concerning distribution across generations; similarly, one might use the parameters of an income tax schedule to provide information on judgements across individuals. In the savings case one will have to model returns on investment, and in the income tax case responses to incentives, before one could use government decisions to infer the underlying values. The procedure is familiar from demand analysis, where individual preferences are inferred from consumption choices given a model of the budget constraint. The type of exercise we are proposing here is similar in spirit but involves the policy-maker rather than the individual. It is often called the "inverse optimum" problem. It might be argued that if the policy-maker is prepared to specify policies directly when given the structure of a model then it is unnecessary to introduce the optimising apparatus. This is to miss the point, however. We are treating simple cases where not only can the formal problem be solved easily but also one might expect to be able to make a direct judgement. This allows the inference of specific values which can be used in more complex problems. The most straightforward way of thinking about welfare weights is to consider small imaginary transfers between different individuals. For instance, if a marginal transfer of a units to household h is judged to compensate exactly (from the point of view of social welfare, and all other signals being held constant) a loss of one unit of lump-sum income for household h', one may infer that

aβ^h − β^{h'} = 0.    (3.1)
Simple thought experiments of this kind can be used to generate the whole profile of welfare weights when the latter can be characterised by a relatively small number of parameters. Thus, consider a simple case where individuals are considered to differ significantly only in their lump-sum incomes (m^h) and supplies of a homogeneous type of labour indexed by l. If the SWF is anonymous, the welfare weights themselves would vary across individuals only according to these two factors, and we may write

β^h = β(m^h, x_l^h),    (3.2)

for some function β. Furthermore, if β can be approximated by a function of a simple form such as

β(m, x_l) = m^{−η} x_l^{−ε},    (3.3)
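As a concrete illustration of how such thought experiments pin down parameters, the following sketch (our own, not from the chapter) assumes a purely income-based isoelastic weight β(m) = m^{−η}, ignoring the labour term, and inverts the compensation condition (3.1); the function names are illustrative:

```python
import math

def eta_from_transfer(a, m_h, m_h_prime):
    """Infer eta from the thought experiment behind (3.1): a marginal
    transfer of `a` units to household h (income m_h) exactly compensates
    a loss of one unit of lump-sum income to h' (income m_h_prime).
    With beta(m) = m**(-eta), the condition a * beta(m_h) = beta(m_h_prime)
    gives eta = ln(a) / ln(m_h / m_h_prime)."""
    return math.log(a) / math.log(m_h / m_h_prime)

def relative_weight(m, m_ref, eta):
    """Welfare weight of a household with income m relative to one with income m_ref."""
    return (m_ref / m) ** eta

# If giving 0.5 units to a household on 50 just compensates taking 1 unit
# from a household on 100, the implied eta is 1: weights inversely
# proportional to income.
eta = eta_from_transfer(a=0.5, m_h=50.0, m_h_prime=100.0)
print(eta, relative_weight(50.0, 100.0, eta))
```

Two such experiments, at different income levels, would over-identify η and so also provide a check on the isoelastic specification.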
J. Dreze and N. Stern
then only two thought experiments of the above kind would yield all the relative welfare weights. Notice that in (3.3) the parameter η is the familiar income elasticity of the social marginal utility of income - see Stern (1977). Generally, of course, welfare weights will depend on the values taken by such signals as prices and rations - see, for example, Roberts (1980). As long as all consumers face the same signals, the above procedure remains valid for marginal changes around some initial environment. When discrete changes are considered, or when different individuals face different signals (e.g. different quantity constraints) or possess different characteristics (e.g. family structure), it will be necessary to take these additional factors into account as far as possible. It is often helpful, in thinking about the specification of value judgements, to construct very simple models as "laboratories" for thought experiments. For example, one could pose a hypothetical optimal savings model where the government sets out to maximise

∫₀^∞ (C^{1−η}/(1−η)) e^{−ρt} dt    (3.4)
subject to a given K(0) and K̇ = rK − C, where C is consumption, K is capital, r is a constant output-capital ratio and ρ a pure time discount rate. It is easy to show that the optimal savings ratio (K̇/rK) is (1/η)(1 − ρ/r). Thus, if the government is prepared to judge the optimal savings rate given ρ and r, then we can immediately infer η. Depending on the interpretation of the model one can see η as reflecting distributional judgements across generations, or over time for given individuals. Under either view one can argue that distribution at a point in time embodies similar issues and regard η here in a manner analogous to η in (3.3). The inverse optimum method can be applied to more complex problems, allowing, for instance, a greater number of parameters to be estimated. Its application to problems of optimal taxation and tax reform, in particular, has recently received close attention - see, for example, Guesnerie (1975), Ahmad and Stern (1984), Christiansen and Jansen (1978), and Stern (1977). If the inverse optimum is to be used as a method for finding welfare weights it must be applied with considerable care. First, the calculated welfare weights may be sensitive to the model of the economy and to which tools are assumed optimally chosen. Secondly, the assumption that the government has optimised must be examined critically. One way of doing this would be to ask directly whether the calculated welfare weights correspond to plausible value judgements. One could go further and use inverse optimum calculations as part of a dialogue with the government concerning both its values and whether the current policies are optimal. Interpreted in this way, rather than as a mechanical device for deriving welfare weights, the inverse optimum exercise can be instructive. In this section we have argued first that methods which avoid distributional value judgements are unsatisfactory in important respects, secondly, that such
judgements can often be usefully embodied in welfare weights, and thirdly, that these welfare weights can be discussed and made explicit in rational and open ways.
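The inverse optimum calculation in the savings "laboratory" of (3.4) lends itself to the same kind of inversion. A minimal sketch (our own; function names are illustrative) uses the closed form for the optimal savings ratio, s = (1/η)(1 − ρ/r):

```python
def optimal_savings_ratio(eta, rho, r):
    """Optimal savings ratio K'/(rK) = (1/eta) * (1 - rho/r) in the simple
    model of (3.4) with isoelastic utility (curvature parameter eta)."""
    return (1.0 / eta) * (1.0 - rho / r)

def eta_from_savings(s, rho, r):
    """Inverse optimum: infer eta from a judged/observed savings ratio s."""
    return (1.0 - rho / r) / s

# With an output-capital ratio r = 10% and pure time discount rho = 2%,
# a government content with a 20% savings ratio implicitly reveals
# eta = (1 - 0.02/0.10) / 0.20 = 4.0.
print(eta_from_savings(0.20, 0.02, 0.10))  # 4.0
```

The round trip (η of 4 implies a savings ratio of 20%) makes the sensitivity of the inferred η to the assumed r and ρ easy to explore, which is exactly the kind of care the text calls for.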
3.3. Traded and non-traded goods

We examine in this subsection the implications of the results in Section 2 for the social valuation of traded and non-traded goods and for the classification between the two. These issues have been central in the discussion of cost-benefit methods, particularly those for developing countries - see, for example, Little and Mirrlees (1974), Dasgupta, Marglin and Sen (1972), and the Bulletin of the Oxford University Institute of Economics and Statistics, February 1972. Our main result concerning the shadow price of traded goods was the border marginal cost rule (2.58), which says that the shadow price of the ith commodity is its world price or marginal cost multiplied by the marginal social value of foreign exchange:

ν_i = ν_f p_i^f.    (2.58)
This result holds whenever the variable y_i^f is an unrestricted control variable, i.e. the planner can (directly or implicitly) control the level of imports of the ith good, and further y_i^f does not interfere with the side constraints. As we have argued in Section 2, there are two kinds of circumstances where y_i^f will appear as a control variable in the planner's problem. In the first, y_i^f is interpreted as a market-clearing signal, i.e. trade adjusts endogenously to clear the ith market. In the second, the ith good is traded under a quota, but the planner can set the quota optimally. From a formal point of view there is no distinction between these two situations (see Section 2.1.3). In both situations the validity of the result also depends, of course, on the variable y_i^f being "unrestricted". This does not necessarily preclude the existence of such things as tariffs and consumer or firm-specific quotas. What matters is that the rules by which these quotas and tariffs are set should not depend on y_i^f. As a counter-example, suppose that an exogenous tariff t_i^f applies to good i, and that the world price of this good depends upon the quantity traded through the function p_i^f(·). The planner's problem would then involve the side constraint

p_i = p_i^f(y_i^f) + t_i^f.    (3.5)
The first-order condition for y_i^f becomes, where μ_i is the Lagrange multiplier on (3.5):

ν_i = ν_f ( p_i^f + y_i^f ∂p_i^f/∂y_i^f ) − μ_i ∂p_i^f/∂y_i^f.    (3.6)
We can readily interpret μ_i as the marginal social value of the tariff, since μ_i = −∂V/∂t_i^f. Equation (3.6) may be written as

ν_i = ν_f p_i^f − [μ_i − ν_f y_i^f] ∂p_i^f/∂y_i^f.    (3.7)
This differs from (2.58) unless either μ_i = 0 (the tariff is optimal) or ∂p_i^f/∂y_i^f = 0 (the world price is fixed). Intuitively, we can understand the border marginal cost rule as follows. The effect of an increase in the availability of an extra unit of the good in the economy can be treated simply as a reduction in net imports and an increase in foreign exchange earnings. The conditions under which the border marginal cost rule is valid do not appear very restrictive where trade is market-clearing and world prices are fixed. Where they are not fixed one requires in addition an optimal tariff, or a flexible tariff which offsets domestically any change in the world price. This rule has two great advantages. First, where world prices are fixed, it tells us quite a lot about shadow prices without needing to know a great deal about the structure of the economy, which may be very distorted. In this sense the rule is very robust. It is in part for this reason, and because it is relatively easy to implement, that this rule has been so popular as a basis for applied cost-benefit analysis. Secondly, the rule holds commodity by commodity. When it holds for each traded commodity it can be seen as an example of the efficiency of the set of fully controlled firms (applied to the STC). The border marginal cost rule obviously does not apply to goods which are traded under exogenous quotas, or to goods which are not traded at all. It is often suggested, for non-traded goods, that the shadow price should be based on the marginal social cost of production. We can examine this suggestion using the results of Section 2.4. The analysis of Section 2.4 suggests two possible justifications for marginal social cost rules. The first relies on condition (2.79) for an optimal (or endogenous) producer price:

ν_i = Σ_g MSC_i^g (∂y_i^g/∂p_i)/(∂y_i/∂p_i) − Σ_g b^g y_i^g/(∂y_i/∂p_i).    (2.79)
This result holds when p_i can be manipulated independently of q_i - e.g. if the tax t_i on commodity i can be chosen optimally, or if q_i is fixed exogenously. We may think of this rule intuitively as follows. Suppose an extra unit of commodity i is required and its price is increased to bring about its production. The first part of the cost is the marginal social cost of production, averaged appropriately, where the averaging is based on how much comes from each firm. The second
part is an adjustment to take account of the social value of the change in profits brought about by the price increase - the price increase required to generate the extra unit is (∂y_i/∂p_i)^{−1}, the effect on profits of the gth firm of a unit price increase is y_i^g, and the net social value of a unit increase in profits in firm g is b^g, which when combined and summed give the second term on the r.h.s. of (2.79). The second potential justification for the marginal social cost rule arises when the gth producer's ration of good i is optimal (or endogenous, as when a firm produces according to demand). In that case, (2.81) applies:

ν_i = MSC_i^g − b^g( p_i − MC_i^g ).    (2.81)
The ith shadow price is then the marginal social cost of the extra resources required by firm g to increase the output of good i, less the social benefit from increasing profits in firm g (by the excess of price over marginal cost). The rule that shadow price equals marginal social cost for the ith good holds when the conditions for either (2.79) or (2.81) are satisfied and, in addition, the adjustment for changes in profits vanishes. This profit term is zero when profits are optimally or fully taxed (b^g = 0), or, in the case of (2.81), when price is equal to marginal cost. The application of (2.81) to the case of constant returns to scale has been discussed in Section 2. These results suggest that the marginal social cost rule is not likely to be plausible for all non-traded goods. It is restrictive in its suggestion that the source of an extra unit of the ith good lies exclusively in extra production. When producer and consumer prices cannot be manipulated independently and price adjustments are an important element in market clearing, one should consider explicitly the effect on household demands and utilities of consumer price changes. The Ramsey-Boiteux rule (2.70) provides a means of weighting social costs arising on the production and consumption sides. It deserves to be given greater prominence in applied work on cost-benefit analysis. An important alternative to the marginal social cost rule for non-traded goods is the following [an extension of Dinwiddy and Teal (1987)]. Let us assume that (i) all consumers have identical welfare weights (though not necessarily identical preferences), normalised for convenience to unity; (ii) producers and consumers face no binding quantity constraints; (iii) taxes, tariffs and world prices are exogenous. Then we can write the total effect on social welfare of a project dz as

dV = q·dx = t·dx + p·(dy + dz)
   = p·dz + t·dx + p·dy^f + Σ_{g≠f} p·dy^g
   = p·dz + t·dx + p·dy^f,

using p·dy^g = 0 for g ≠ f, from profit-maximisation. Hence, using y^f·dp = 0 (since the prices of traded goods are fixed):

ν_i = p_i + dT/dz_i,
where T ≡ t·x + p·y^f is the sum of tax and tariff revenue (see (2.5)). Interestingly, the above rule seems applicable to traded goods as well as to commodities produced under constant returns, in which case it becomes equivalent to the border marginal cost rule (for traded goods) and the marginal social cost rule (for goods produced under constant returns) - whenever the latter rules are applicable. This equivalence, however, would not be easy to guess from the formula as it stands, and this reminds us that the simplicity of this formula, and its intuitive appeal, are to some extent only apparent. In fact, the above rule embodies complicated general equilibrium effects, and these are important. The classification of commodities into traded and non-traded is a problem of substantial practical importance, especially in an intertemporal context; but we hope that our general discussion of the planner's model, as well as the more specific comments on the border marginal cost rule, makes clear the theoretical principles which should underlie this classification. For instance, the implication of quotas for this classification depends crucially upon the control which the planner can exercise over them: if he can choose quotas unrestrictedly he should set them optimally along with adopting the border marginal cost rule; at the other extreme, when quotas are exogenous from his point of view, he should consider the corresponding commodities as non-traded for the purposes of project evaluation. The classification can, of course, change over time, and in an intertemporal framework special difficulties will arise in specifying correctly future constraints and opportunities (e.g. movements in transport costs and quota restrictions); such difficulties, however, are a general feature of the planner's problem in its intertemporal interpretation.
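Under assumptions (i)-(iii), the rule that the shadow price equals the producer price plus the marginal effect on tax-and-tariff revenue reduces to straightforward accounting once the general equilibrium responses dx/dz_i and dy^f/dz_i are known. A stylised numerical sketch (all numbers and responses hypothetical, purely for illustration):

```python
import numpy as np

p = np.array([1.0, 2.0, 5.0])        # producer prices (world prices for traded goods)
t = np.array([0.1, 0.4, 0.0])        # exogenous taxes and tariffs

# Hypothetical general equilibrium responses to one extra unit of public
# supply of good 1 (in practice these come from a model of the economy):
dx_dz1 = np.array([0.3, -0.1, 0.0])   # changes in household demands x
dyf_dz1 = np.array([-0.5, 0.0, 0.1])  # changes in net imports y^f

# dT/dz_1 = t . dx/dz_1 + p . dy^f/dz_1, and hence v_1 = p_1 + dT/dz_1
dT_dz1 = t @ dx_dz1 + p @ dyf_dz1
v1 = p[0] + dT_dz1
print(v1)  # 0.99: shadow price just below producer price in this example
```

The apparent simplicity is deceptive in exactly the way the text warns: all the economics is hidden in the response vectors, which embody the general equilibrium adjustment to the project.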
For further reading on the issues of this section see Little and Mirrlees (1974), Dasgupta, Marglin and Sen (1972), the symposium in the Bulletin of the Oxford University Institute of Economics and Statistics, February 1972, Bell and Devarajan (1983), and Kuyvenhoven (1978).
3.4. Shadow wages The shadow price of a given type of labour is usually called a shadow wage. The theory of shadow wages therefore forms part of the general theory of shadow prices. However, shadow wages are often treated in either a partial equilibrium or a highly aggregated framework. In this subsection we provide some examples drawn from the model of Section 2 which will help indicate the important
determinants of the shadow wage and relate our general results to some of the common notions appearing in simple models of shadow wage determination. We shall treat in turn the cases of a competitive labour market, involuntary unemployment with forced leisure, and finally involuntary unemployment where there is residual absorption of labour in household activities. Let l be the index for a particular type of labour service. If the market for this service is competitive, its price (wage), p_l, is a market-clearing variable, and (if p_l is also unrestricted) the Ramsey-Boiteux rule studied in Section 2.4.1 applies. For instance, if labour supplies are fixed (so that compensated consumer demands are inelastic with respect to q_l), we obtain from (2.69):

ν_l = MSP_l + d_l/(∂y_l/∂p_l),    (3.8)
where MSP_l is the marginal social product of labour in private enterprises [averaged over firms as in (2.79)], and, as before, d_l = Σ_h b^h x_l^h − Σ_g b^g y_l^g. The shadow wage here reduces to its marginal social product, corrected by a distributional term which follows a simple inverse-elasticity rule, and whose sign depends upon whether shareholders or labourers are relatively more "transfer-deserving". This is natural, since an increase in public sector employment resulting in an increase in wages will have both an allocative effect [reflected in the first term of (3.8)] and a distributional one (captured by the second term). As a second example, consider a labour market with involuntary unemployment, which clears by rationing of supplies (the money wage being fixed, or in any case related to signals other than the quantity of employment). For simplicity, assume that at the margin a single consumer, say h, is rationed in his supply, and that his utility is separable between leisure and consumption. Taking x_l^h as a market-clearing variable, we obtain at once from (2.85) (when x_l^h is unrestricted):

ν_l = β^h ρ_l^h − b^h q_l,    (3.9)
where ρ_l^h is h's reservation wage, or disutility of labour in money terms (with ρ_l^h < q_l since x_l^h is binding). This expression has a straightforward interpretation: in an economy of this type, where increased public employment affects leisure rather than employment elsewhere (by contrast with the first example), the shadow wage is equal to the (welfare-weighted) reservation wage, corrected by a distributional term which measures the marginal social value of the increase in income enjoyed by the extra labourer(s) hired (which here amounts to the wage rate). This simple shadow wage formula seems particularly relevant to some Western economies, where significant wage rigidities exist and a reduction in wage employment implies greater (forced) leisure rather than self-employment. The form of the enforced leisure postulated here is more relevant to an economy
where labour is shared than to one where unemployment is a discrete phenomenon. Notice also that, with unemployment benefits, the second term in (3.9) is modified, since the income increase from employment then amounts only to the difference between the wage and unemployment benefits. As a final example, consider another type of unemployment situation where residual labour is absorbed in self-employment (e.g. on peasant farms), labour supplies being fixed. For the present purposes, we may consider a peasant farm absorbing residual labour as a particular firm, say g, owned by a single individual h (so that θ^{gh} = 1), and facing a quota ȳ_l^g on its employment of labour, determined endogenously from the solution of the lth scarcity constraint. When ȳ_l^g is unrestricted, (2.81) holds and so

ν_l = MSP_l − b^h( p_l − MP_l^g ),    (3.10)
where MSP_l and MP_l^g denote the marginal social product and marginal product of labour on the farm, respectively (defined as in Section 2). The r.h.s. of (3.10) is precisely the Little-Mirrlees shadow wage [see Little and Mirrlees (1974, pp. 270-271)]; and, once again, it has a straightforward interpretation: the first term represents the social value of the net loss of output caused by a withdrawal of one unit of labour from the peasant farm to the public sector; the second measures the marginal social value of the increase in income accruing to the peasant household. Notice the importance played, in each of these examples, by the marginal social values of transfers b^h = β^h − ν·(∂x^h/∂m^h). The values of b^h obviously depend crucially on the welfare weights; they are also closely related to what has become known in the literature as "savings premia". A plausible interpretation of the notion of a savings premium emerges in the intertemporal version of our model, where the marginal social value of a (first-period) transfer to consumer h becomes:

b^h = β^h − Σ_i ν_i (∂x_i^h/∂m^h) = β^h − Σ_i (ν_i/q_i) ∂(q_i x_i^h)/∂m^h.
If first-period commodities are relatively more valuable socially than future-period commodities (perhaps because the planner has the opportunity to invest first-period commodities in very productive activities), then, other things being equal, marginal social values of transfers will be positively related to marginal propensities to save (i.e. to spend on future commodities). Consider, for instance, a simple two-period, one-commodity case. Dropping the commodity index, b^h becomes:

b^h = β^h − (ν_0/q_0)(1 − MPS^h) − (ν_1/q_1) MPS^h,    (3.11)
where MPS^h is h's marginal propensity to save in period 0. The marginal social value of a transfer from individual k to individual h then reduces to

b^h − b^k = (β^h − β^k) + (ν_0/q_0 − ν_1/q_1)(MPS^h − MPS^k).    (3.12)
When a savings premium in the above sense exists (ν_0/q_0 > ν_1/q_1), then, the planner's inclination to redistribute on the grounds of unequal welfare weights may be modified by a recognition of differences in marginal propensities to save. As will be made clear in Section 3.5, the existence of a savings premium defined in this way is exactly equivalent to the shadow discount rate being higher than the consumer's rate of interest. Each of the examples we have discussed relates specifically to a particular kind of situation, but we hope that together they provide a feel for the issues most commonly involved in calculating shadow wages. Particularly important elements of the planner's modelling exercise in this context are labour market institutions, migration behaviour, the organisation of peasant farms and the factors underlying the marginal social values of transfers to different individuals. For further reading on the shadow wage, see, for example, Little (1961), Marglin (1976), Mazumdar (1976), Lal (1973), and Stiglitz (1981); on the savings premium see also Galenson and Leibenstein (1955), Diamond (1968) and Sen (1967).

3.5. Numeraires, discount rates, and foreign exchange

The discount rate allows us to compare the shadow values of goods which are used or produced in the future with those which are used or produced today. More precisely, the discount rate is defined as the rate of fall in the value of the numeraire against which goods are valued each year - this rate is used to convert shadow values in different years into common units this year, or present values. One cannot discuss the value of the discount rate unless a numeraire has been defined, and thus we have to treat the numeraire and the discount rate together. This is set out formally when we specify the definitions in Section 3.5.1. The determination and calculation of the discount rate are examined in Section 3.5.2.
Finally, in Section 3.5.3 we discuss the shadow value of foreign exchange, and in particular the notion of a premium on foreign exchange, analogous to the premium on savings of the previous section.

3.5.1. Definitions

A good i which appears at time τ is obviously different from one which appears at time τ′. It is often useful, however, to recognise that these goods are physically the same and distinguished only by their date. In order to do this we shall use double subscripts, so that z_{iτ} is the level of public supply of good i at time τ and
ν_{iτ} is the shadow value of an extra unit of z_{iτ}. The shadow value Π of a project can then be written as

Π = ν·dz = Σ_τ ν_τ·dz_τ = Σ_τ ( Σ_i ν_{iτ} dz_{iτ} ),    (3.13)
where ν_τ is the vector (…, ν_{iτ}, …) and similarly for dz_τ. We can now normalise the vector ν_τ in year τ by multiplying through by any scalar of our choice, for example to set one of its components to unity, forming the normalised vector ν̃_τ. This gives us

ν_{iτ} = α_τ ν̃_{iτ},    (3.14)
where α_τ is called the shadow discount factor. If, for example, good i is the numeraire (with ν_{iτ} > 0 for all τ) then ν̃_{iτ} = 1 for all τ, and ν_{iτ} = α_τ. The shadow discount factor can then be seen as the marginal social value of a unit of numeraire accruing in year τ. From (3.13) and (3.14) we have

Π = Σ_τ α_τ Π̃_τ,    (3.15)

where

Π̃_τ = ν̃_τ·dz_τ.    (3.16)
Thus, we can interpret the social value of a project as a discounted stream of social benefits {Π̃_τ} expressed in terms of a common numeraire. Hence the term "discount factor". It should be emphasised that Π̃_τ is not the marginal contribution of the project to "social welfare accruing in year τ" (supposing it were possible to identify such a contribution), but rather the contribution to intertemporal welfare of the net outputs in year τ of the project. For example, a capital good supplied by the public sector this year may generate benefits over a substantial future period and this will be embodied in its shadow value now. The shadow discount rate is defined as the rate of fall of the discount factor:

ρ_{τ+1} = (α_τ − α_{τ+1})/α_{τ+1},    (3.17)

i.e. it is the rate of fall in the marginal social value of the numeraire. Where good i is the numeraire,

ρ_{τ+1} = (ν_{iτ} − ν_{i,τ+1})/ν_{i,τ+1}.    (3.18)
From (3.17) we have, if α_0 = 1,

α_{τ+1} = 1 / [(1 + ρ_1)(1 + ρ_2) ⋯ (1 + ρ_{τ+1})].    (3.19)
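Equations (3.15) and (3.19) together give the familiar present-value recipe, except that the discount rate may differ year by year. A minimal sketch (our own; function names are illustrative):

```python
def discount_factors(rhos):
    """Shadow discount factors alpha_0, ..., alpha_T from year-specific
    shadow discount rates rho_1, ..., rho_T, with alpha_0 = 1, as in (3.19)."""
    alphas = [1.0]
    for rho in rhos:
        alphas.append(alphas[-1] / (1.0 + rho))
    return alphas

def project_value(benefits, rhos):
    """Present value sum_tau alpha_tau * Pi_tau of a benefit stream, as in (3.15)."""
    alphas = discount_factors(rhos)
    return sum(a * b for a, b in zip(alphas, benefits))

# A project costing 100 units of numeraire now and yielding 60 in each of
# the next two years, with non-constant shadow discount rates of 5% then 8%:
print(project_value([-100.0, 60.0, 60.0], [0.05, 0.08]))  # about 10.05
```

With a constant ρ this reduces to the textbook 1/(1 + ρ)^{τ+1} discounting noted below.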
If the shadow discount rate were constant at ρ, then α_{τ+1} would be 1/(1 + ρ)^{τ+1}, a familiar expression. We should emphasise, however, that there is no reason to expect ρ_τ to be constant, as should be clear from the next subsection. Our definition obviously implies that the value of the shadow discount rate depends on the choice of numeraire. Suppose, for example, that α_τ and ρ_τ apply when good 1 is the numeraire (i.e. ν̃_{1τ} = 1 for all τ) and α′_τ and ρ′_τ when good 2 is the numeraire (ν̃_{2τ} = 1 for all τ). Then α_τ = (ν_{1τ}/ν_{2τ}) α′_τ. Thus, using (3.17) and (3.18):

(1 + ρ_{τ+1})/(1 + ρ′_{τ+1}) = (ν_{1τ}/ν_{2τ}) / (ν_{1,τ+1}/ν_{2,τ+1}).
Hence, ρ_{τ+1} = ρ′_{τ+1} if and only if ν_{1τ}/ν_{2τ} = ν_{1,τ+1}/ν_{2,τ+1}, i.e. the relative shadow price of good 1 and good 2 does not change from τ to τ + 1. From now on we write the shadow discount rate when i is the numeraire as ρ^{(i)} and adopt a corresponding superscript notation when we wish to emphasise the dependence of a concept on the numeraire. Furthermore, for simplicity we shall assume in the remainder of Section 3.5 that there are only two periods (0 and 1).

3.5.2. The determination of the discount rate

Having defined the discount rate we now turn to its calculation. In particular we shall attempt to relate it to various rates of return in the economy. To start with, let us consider a marginal project in the public sector as a small movement dy along the production frontier of a fully controlled firm g. Thus,

∇F^g·dy = 0,    (3.20)
where ∇F^g is the vector of partial derivatives of g's production function. But we know from the result concerning efficiency among fully controlled firms that ∇F^g is proportional to ν [see, for example, (2.46)] and hence

ν·dy = 0,    (3.21)
970
J. Dree and N. Stern
or, using time subscripts:

ν_0·dy_0 + ν_1·dy_1 = 0.    (3.22)
This allows us to define the social rate of return (SRR) in this firm, when good i is the numeraire, as

1 + SRR^{(i)} = −(ν̃_1·dy_1)/(ν̃_0·dy_0).    (3.23)
Intuitively, the r.h.s. of (3.23) indicates the number of units of the numeraire that can be earned next year by investing a unit of numeraire this year in firm g. Note that (3.23) holds for any dy satisfying (3.20). Indeed from (3.14), (3.17), (3.22) and (3.23) we have

SRR^{(i)} = (α_0 − α_1)/α_1 = ρ_1^{(i)}.    (3.24)
Thus, the shadow discount rate is equal to the social rate of return in every fully controlled firm. With our definition this is true irrespective of the choice of numeraire, though, of course, the value of SRR (or ρ) in (3.24) depends crucially on this choice. It is often suggested that the shadow discount rate is equal to the social rate of return in a "marginal project", in the sense of a project which breaks even at shadow prices. The above result confirms the validity of this assertion. However, while the proposition is correct it is not directly useful, since it really amounts to a restatement of the definition of "marginal project". A useful application of (3.24) arises when we take foreign exchange as the numeraire in each year (ν̃_{fτ} = 1 for each τ) and regard borrowing or lending abroad as a marginal project. If the STC can borrow and lend at fixed interest rates, then the social rate of return from having less foreign exchange this year (and more next year) is simply the interest rate it faces on the world capital market (the borrowing rate if it is borrowing at the margin and the lending rate if it is lending). If the STC can legitimately be regarded as a fully controlled firm, then the shadow discount rate is equal to the world interest rate. It should be emphasised that the shadow discount rate here is the social rate of return only in a fully controlled firm. There is no reason, from what has been said, to suppose that this is the rate of return in the private sector. If private producers are unconstrained then, under the conditions where relative producer prices are equal to relative shadow prices (see Section 2.3.2), the marginal rates of transformation in private and fully controlled firms will be the same. Then, using similar conventions for the numeraire for both shadow and producer prices, we see that the shadow discount rate will be the private rate of return, i.e.
the rate of return in terms of market prices (note that no distinction needs to be drawn here between pre and post-tax profits as long as profits taxes are proportional).
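The equality between the social rate of return and the shadow discount rate for a fully controlled firm can be verified mechanically: pick any dy that breaks even at shadow prices and compute the ratio in (3.23). A sketch with invented numbers (two goods, two periods, our own construction):

```python
import numpy as np

alpha = np.array([1.0, 1.0 / 1.07])    # discount factors, so rho_1 = 7%
v_tilde0 = np.array([1.0, 3.0])        # normalised shadow prices, period 0
v_tilde1 = np.array([1.0, 2.5])        # normalised shadow prices, period 1
v0, v1 = alpha[0] * v_tilde0, alpha[1] * v_tilde1

dy0 = np.array([-1.0, -0.5])           # inputs this year
dy1 = np.array([0.0, 0.0])             # outputs next year, chosen so that
dy1[0] = -(v0 @ dy0) / v1[0]           # v . dy = 0 (break-even at shadow prices)

# Social rate of return as in (3.23), using normalised prices:
srr = -(v_tilde1 @ dy1) / (v_tilde0 @ dy0) - 1.0
print(srr)  # 0.07, i.e. equal to rho_1, as (3.24) asserts
```

The choice of dy0 is arbitrary: any break-even move on the frontier yields the same 7%, which is the content of the remark that (3.23) holds for any dy satisfying (3.20).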
Intuitively, under these conditions to each marginal public project there corresponds a marginal private project which is identical in all relevant respects, so that the rates of return on the former and latter are the same. Alternatively, one could see the introduction of a public project as simply being accommodated in the general equilibrium by a displacement of an equivalent private project, so that this private project can be seen as the opportunity cost of the public one. Now consider a firm which is not fully controlled (for simplicity we shall omit the superscript g identifying this firm). As it stands, (3.23) does not unambiguously define the social rate of return in this firm, since the expression on the r.h.s. will no longer have the same value for any move dy on g's production frontier. Therefore we now define the social rate of return of good j in firm g as follows:

1 + SRR_j^{(i)} = −(ν̃_1·∂y_1/∂ȳ_{j0})/(ν̃_0·∂y_0/∂ȳ_{j0})   if j is rationed in firm g,    (3.25)

1 + SRR_j^{(i)} = −(ν̃_1·∂y_1/∂p_{j0})/(ν̃_0·∂y_0/∂p_{j0})   otherwise,    (3.26)

where all derivatives are taken along the firm's supply function. The interpretation of these definitions is analogous to that of (3.23). Using the same definition of "marginal social product" (MSP) as before (see Sections 2.4.2 or 3.4) it is then easy to show that

SRR_j^{(i)} = ρ^{(i)}  if and only if  ν_{j0} = MSP_{j0}.    (3.27)
In other words, the shadow discount rate is equal to the social rate of return of good j in firm g if and only if the shadow price of this good (period 0) is equal to its marginal social product in firm g. As we have seen in Sections 2.4 and 3.3, however, the latter is just the first-order condition for an optimal production level of good j (period 0) in firm g, provided that either price equals marginal cost for this good, or g's profits are fully (or optimally) taxed. In summary, the equality between the shadow discount rate and social rates of return in the private sector holds under conditions which appear neither straightforward nor general. Let us now examine more closely the relationship between the shadow discount rate, social rates of return and private rates of return. The private rate of return (PRR) of good j in firm g is defined exactly as in (3.25) and (3.26), with
J. Drèze and N. Stern
producer prices replacing shadow prices (and using the numeraire of the producer price system). It is easy to verify that if firm g faces no binding constraint on its input of good j (period 0), then

PRR_j = p_{i0}/p_{i1} − 1 = r^(i),   (3.28)
where r^(i) may be called the producer rate of interest using good i as numeraire (i.e. by selling one unit of good i this year, a producer can buy 1 + r^(i) units of it next year). Private rates of return are obviously unlikely to coincide with social rates of return unless the vector of intertemporal shadow prices is itself collinear with the producer price vector. To see more directly the relationship between private rates of return and the shadow discount rate, observe that

1 + r^(i) = p_{i0}/p_{i1} = (1 + ρ^(i)) · (v_{i1}/p_{i1}) / (v_{i0}/p_{i0})   (3.29)
(this follows immediately from the definitions). Thus, for a given numeraire, the shadow discount rate is equal to the producer rate of interest if and only if the ratio of shadow price to producer price for this numeraire remains constant over time. One may define the consumer rate of interest for any commodity exactly as in (3.28) with producer prices replaced by consumer prices; (3.29) then has a straightforward counterpart. More generally one can analyse the relationship between the consumer rate of interest and the shadow rate of discount using the results of Section 2.4 on the general relationship between consumer prices and shadow prices, much as we have done on the production side. Rather than doing this systematically we shall use these results to examine briefly some of the suggestions on the shadow discount rate found in the literature. In Diamond and Mirrlees (1971), relative producer prices are equal to relative shadow prices (see Section 2.3.2). Hence for a given numeraire the shadow discount rate, the producer rate of interest, and all private and social rates of return coincide, though they differ from the consumer rate of interest. Arrow (1966) and Kay (1972) argue that it would often be reasonable to suppose that the ratio between shadow price and consumer price will be constant over time. They work at an aggregated level so the shadow price for each period is the Lagrange multiplier on the single scarcity constraint. Crudely speaking, this would change only if the problems of financing public expenditure were expected to become more or less severe over time (recall the equivalence between the scarcity constraint and the government's budget constraint - see Section 2.2.3). As we have seen, if the ratio between shadow price and consumer price is
Ch. 14: Theory of Cost-Benefit Analysis
constant for good i, then the shadow discount rate is equal to the consumer rate of interest for that good. Little and Mirrlees (1974, pp. 291-305) argue explicitly that one would expect relative shadow prices and consumer prices to change over time as the government overcomes various constraints on prices and quantities (see also the characterisation of dynamic shadow prices in optimal growth models such as Stern, 1972). Harberger (1973) claims that the shadow discount rate should be equal to the private rate of return on the grounds that private investment is a straightforward alternative to public investment. The validity of this argument, however, rests on the assumption that public projects are displacing private projects, which certainly involves a particular model of the economy [Bradford (1975)]. Moreover, as we have seen, in such a model it is the social rate of return in the private sector which would emerge as the shadow discount rate (and even this result would require some additional assumptions such as the full or optimal taxation of profits). Sandmo and Drèze (1971) suggest a weighted average formula (averaging interest rates for consumers and producers) based on the degree to which funds for public investment come from private investment and private consumers - see also Baumol (1986) and Drèze (1974). This is precisely a special case of the Ramsey-Boiteux formula derived in Section 2.4.1, where shadow prices appear as a weighted average of consumer and producer prices. Finally, let us mention a suggestion for finding the shadow discount rate which occurs quite commonly in the literature: namely, that the shadow discount rate should be set so that the total public investment budget is just exhausted. Thus, it is suggested that if we find ourselves accepting more projects than can be "afforded" with the existing discount rate then we should raise it, and conversely if too few projects pass the test of positive net present value.
The shadow discount rate is seen as the price which clears the investment budget. Whilst this has some intuitive appeal, it is unsatisfactory in important respects. First, the argument runs in terms of "the" discount rate. There would, in general, be as many discount rates as periods and one could in principle adjust any or many of the discount rates to bring the amount of public investment to the desired level. In any case it is misleading to think of one single shadow price, or subset of shadow prices, as requiring adjustment. At shadow prices a marginal project should just break even and this is a condition on all the prices, not just on one or two of them. Thus, one can change the number of accepted projects by, for example, adjusting the shadow wage. Secondly, the nature and rationality of such a budget constraint should be carefully assessed in the first place (for the analysis of budget constraints in the public sector, the reader is referred to Sections 2.2.3 and 3.6). Thus, we would not recommend this approach to measuring the shadow discount rate.
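The relationships discussed above can be illustrated numerically. The following is a minimal sketch (all prices are hypothetical), checking that the producer rate of interest coincides with the shadow discount rate exactly when the ratio of shadow price to producer price for the numeraire is constant over time:

```python
# Hypothetical two-period prices for a single numeraire good i.
# p0, p1: producer prices; v0, v1: shadow prices (all numbers illustrative).

def producer_rate(p0, p1):
    # 1 + r = p0 / p1: selling a unit this year buys 1 + r units next year
    return p0 / p1 - 1.0

def shadow_discount_rate(v0, v1):
    # 1 + rho = v0 / v1, the analogous definition at shadow prices
    return v0 / v1 - 1.0

# Case 1: ratio of shadow to producer price constant over time -> rates equal
p0, p1 = 1.0, 0.9
v0, v1 = 0.8 * p0, 0.8 * p1
assert abs(producer_rate(p0, p1) - shadow_discount_rate(v0, v1)) < 1e-12

# Case 2: the ratio drifts (e.g. financing problems ease over time) -> rates differ
v0, v1 = 0.8 * p0, 0.9 * p1
r, rho = producer_rate(p0, p1), shadow_discount_rate(v0, v1)
assert r != rho
# The two remain linked by (1 + r) = (1 + rho) * (v1/p1) / (v0/p0)
assert abs((1 + r) - (1 + rho) * (v1 / p1) / (v0 / p0)) < 1e-12
```

The final identity holds for any positive prices, which is why the equality of the two rates turns entirely on the constancy of the shadow-to-producer price ratio.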
Some further references on the topics of this subsection include Arrow and Kurz (1970), Marglin (1963a, 1963b, 1976), Pestieau (1975), and the collection of papers in Lind (1983).

3.5.3. Premia on foreign exchange

Foreign exchange is often seen as a particular problem in the planning of investment and growth. Some speak of "the foreign exchange constraint", others of "the premium on foreign exchange", "the shadow exchange rate" and so on. One might ask how such considerations fit into the framework we have set out. The shadow value of foreign exchange has already appeared explicitly in our model: v_f is the marginal social value of foreign exchange, since it measures the increase in social welfare resulting from a "gift" of one unit of foreign exchange. Since p_f is the exchange rate (the price of a unit of foreign exchange) it is tempting to call v_f "a shadow exchange rate". Given that alternative interpretations have been placed on this concept it is not a terminology we shall adopt. Since we are free to normalise shadow prices we can set v_f equal to p_f so that the market and shadow exchange rates are equal. This does not, however, allow us to avoid enquiring about the relative social value of foreign exchange and other commodities. Intuitively the marginal social value of foreign exchange will depend on its source or destination at the margin or, in other words, on how the balance of payments is ensured. There are in principle many possibilities, such as changes in tariffs or quotas, adjustment of the exchange rate, or fiscal adjustments; which of them are relevant will depend on the opportunities and constraints facing the planner. To give an explicit example which provides an illustration of the considerations involved, suppose that there is a quota system for the import of some commodities. The quota n_i of each of these commodities depends on an index (say a) of the amount of foreign exchange available.
The rationing system is designed by a licensing authority beyond the control of the planner. The variable a adjusts endogenously to clear the balance of payments, and hence it is a control variable. We shall also assume it is unrestricted. We can then examine the first-order condition that the marginal social value of a should be zero. This gives [remembering (2.4) and (2.66)]:

Σ_i (∂n_i/∂a)(v_i − v_f p_i^f) = 0,   (3.30)

where the summation is over the goods involved in the rationing scheme and p_i^f is the marginal cost of good i on the world market in terms of foreign exchange.
Rearranging we have:

v_f = (Σ_i v_i ∂n_i/∂a) / (Σ_i p_i^f ∂n_i/∂a),   (3.31)

where again, and in what follows, the summations are over the goods involved in the rationing scheme. Thus, we can see the shadow value of foreign exchange as the shadow value of extra rations divided by the cost in terms of foreign exchange of obtaining them. The premium on foreign exchange depends on the definition of the term and the opportunities and constraints for the use of foreign exchange. As an illustration suppose we decide to say that there is a premium on foreign exchange if the relative shadow price of foreign exchange and good 1 is higher than the relative market price for foreign exchange and good 1. Thus there is a premium on foreign exchange if

v_f/p_f − v_1/p_1 > 0.   (3.32)
Assume that world prices are fixed for the goods involved in the rationing scheme (so that each p_i^f is constant), that good 1 is amongst these goods, and that positive tariffs apply (p_i = p_f p_i^f + t_i, where t_i is the ith tariff). Suppose, further, that relative shadow prices for commodities do not differ from their relative market prices within the set of goods subject to rationing, so that v_i = (v_1/p_1)p_i in this group (note that the border marginal cost rule would not normally apply to these goods). Then (3.31) becomes:

v_f/p_f = (v_1/p_1)[1 + (Σ_i t_i ∂n_i/∂a) / (p_f Σ_i p_i^f ∂n_i/∂a)].   (3.33)
If ∂n_i/∂a ≥ 0 and t_i > 0 (with strict inequalities for at least one i), then the term in square brackets in (3.33) is greater than one, so that (3.32) is satisfied and there is indeed a premium on foreign exchange. If the premium on foreign exchange is defined as the l.h.s. of (3.32) divided by v_1/p_1, then it may be calculated from the value of a marginal bundle of
commodities at domestic prices divided by its value at world prices. The content of the marginal bundle reflects the way in which an extra unit of foreign exchange will be allocated across commodities. Thus, in this example, and under this definition, the premium on foreign exchange is a measure of the extent to which domestic prices exceed world prices (more precisely, it can be expressed as a weighted average of tariff rates). Formulae such as (3.33) are familiar in the cost-benefit literature. For instance, in terms of the above example, the Little-Mirrlees Standard Conversion Factor (SCF) for the group of goods involved (with the normalisation rule v_f = p_f) corresponds to v_1/p_1 [see Little and Mirrlees (1974, p. 218)] while the UNIDO Shadow Exchange Rate is analogous to one plus the premium on foreign exchange [see Dasgupta, Marglin and Sen (1972, ch. 16)]. As with the discount rate, many of the ideas on the marginal social value of foreign exchange found in the literature can be examined in terms of the general principles developed in Section 2. We shall not attempt this here but we hope that the above example illustrates convincingly the general idea that the marginal social value of foreign exchange depends upon the uses and sources of foreign exchange which are available to the planner. Some examples from the literature on the role of foreign exchange in cost-benefit analysis are: Bacha and Taylor (1971), Balassa (1974a and b), Bhagwati and Srinivasan (1981), Blitzer, Dasgupta and Stiglitz (1981), Scott (1974) and a collection of papers in Oxford Economic Papers, July 1974.
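The premium computation of this subsection can be sketched with a small numerical example; the world prices, tariffs and quota responses below are purely hypothetical:

```python
# Hypothetical rationed import group: world prices (in foreign exchange),
# per-unit tariffs, and quota responses dn_i/da to extra foreign exchange.
pf = 1.0                       # market exchange rate (price of foreign exchange)
world = [2.0, 5.0, 1.0]        # p_i^f: world price of good i
tariff = [0.4, 1.0, 0.1]       # t_i: tariff on good i (all positive)
dn_da = [1.0, 0.5, 2.0]        # dn_i/da: quota responses (all non-negative)

# Domestic price: p_i = pf * p_i^f + t_i
domestic = [pf * w + t for w, t in zip(world, tariff)]

# Value of the marginal bundle at domestic vs world prices:
value_domestic = sum(p * dn for p, dn in zip(domestic, dn_da))
value_world = pf * sum(w * dn for w, dn in zip(world, dn_da))

ratio = value_domestic / value_world   # the term in square brackets
premium = ratio - 1.0                  # premium on foreign exchange

# With positive tariffs and non-negative quota responses, the marginal
# bundle is worth more at domestic than at world prices: a premium exists.
assert ratio > 1.0
```

The resulting premium is exactly the weighted average of tariff rates described in the text, with weights given by the foreign-exchange cost of each quota response.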
3.6. Private projects and Boiteux firms

To see how private projects should be evaluated, let us introduce, in the model of Section 2, a firm (indexed by 0) whose production plan y^0 is regarded as a vector of predetermined variables. A private project dy^0 then induces a total change in social welfare dV, where

dV = (∂V*/∂y^0) dy^0 = v·dy^0 + b(p·dy^0)   (3.34)

or

dV = (v + bp)·dy^0.   (3.35)
In other words, the price vector appropriate for the social evaluation of private projects is simply a linear combination of the shadow price vector and the producer price vector. This is not surprising, since in this context public and private projects differ only in respect of the distribution of profits, and these vary with net outputs at a rate indicated by producer prices.
A very similar result holds concerning the shadow prices appropriate for the evaluation of projects taking place in "Boiteux firms", or public firms facing a budget constraint [Boiteux (1956), or (1971)]. Indeed, it is easy to repeat the above analysis in the case where the profit shares θ^{h0} = 0 for all h and the side constraint p̄·y^0 = Π applies for some price vector p̄; one then finds that the price vector appropriate for the evaluation of projects in Boiteux firms is a linear combination of the shadow price vector and the price vector defining the budget constraint:

dV = (v + b^0 p̄)·dy^0,   (3.36)

where now b^0 represents the Lagrange multiplier associated with the side constraint p̄·y^0 = Π. An instructive way of restating this result is the following: Boiteux firms should choose the production plan which has maximum profits at shadow prices (v) among all those which satisfy its budget constraint. Guesnerie (1980) has shown that these decision rules for Boiteux firms are very general.
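The private-project test can be sketched in a few lines; the shadow prices, producer prices and profit weight below are all hypothetical numbers chosen for illustration:

```python
# Evaluate a small private project dy0 via dV = (v + b*p) . dy0,
# a linear combination of shadow and producer price vectors.
v = [1.0, 0.7, 1.2]    # shadow prices (hypothetical)
p = [1.0, 1.0, 1.0]    # producer prices (hypothetical)
b = 0.3                # weight on the profit effect (hypothetical)

def dV_private(dy, v, p, b):
    # dV = sum over goods of (v_i + b * p_i) * dy_i
    return sum((vi + b * pi) * d for vi, pi, d in zip(v, p, dy))

dy = [1.0, -0.5, -0.3]   # project: one unit of output, two inputs
change = dV_private(dy, v, p, b)
accept = change > 0      # accept the project if dV is positive
```

Here `change` works out to 0.35, so this particular project would be accepted; replacing b by the multiplier on a Boiteux firm's budget constraint (and p by the price vector defining that constraint) gives the corresponding test for Boiteux firms.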
3.7. Uncertainty and time Uncertainty and time have been relatively neglected in this paper and we comment very briefly in this subsection on their treatment in the models of Sections 1 and 2. Projects are often undertaken in situations of considerable uncertainty and it is therefore important to consider how uncertainty should be treated in cost-benefit analysis. Formally uncertainty can be incorporated in the model of Section 1 by distinguishing commodities and signals according to the state of the world in which they occur [Debreu (1959)]. To proceed in this way is not to suppose that well-functioning markets for contingent commodities exist; it is only to recognise that, in every state of the world, every commodity has a source and destination and that private agents' demands for these commodities in each state of the world respond in a well-defined way to some collection of signals. Pointing out that the general framework of Section 1 can encompass many models of economies with uncertainty is not to deny, of course, that its application to specific problems will often be greatly complicated by the presence of uncertainty. In particular, the specification of an objective function may involve difficult judgements, e.g. if private agents are thought to have mistaken expectations. Besides, the planner's problem (P) is now one of dynamic stochastic optimisation. It is important to recognise, however, that in spite of the intricate problems of modelling and application which it precipitates, cost-benefit analysis under uncertainty rests on similar theoretical foundations to the basic theory of shadow prices developed in Section 1. We shall here confine ourselves to illustrating this point with two simple examples.
As a first example, consider a very simple two-period economy with a single consumption good and labour. Private agents consist of a single peasant who consumes x_0 in period 0 (out of stocks), and supplies a fixed total amount of labour L, which is devoted both to employment on public works (−z_ℓ) and agricultural labour on the farm (L + z_ℓ). We assume that the peasant is always willing to take up extra public employment. The planner's objective function is simply the peasant's expected utility,

V = E_α U(x_0, x_{1α}) = Σ_α π_α U(x_0, x_{1α}),   (3.37)

where α is an index of the state of the world (e.g. a measure of rainfall), and π_α is the perceived probability of occurrence of event α. Since public profits (losses) ultimately accrue to (are borne by) the peasant, we write, after appropriate substitutions:

V*(z) = E_α U(x_0, F(L + z_ℓ; α) + z_{1α}),   (3.38)

where z_{1α} is public production in period 1 and event α, and F is the farm's production function. Letting v ≡ ∂V*/∂z as usual, we find

v_{1α} = π_α U_{1α},   where U_{1α} ≡ ∂U(x_0, x_{1α})/∂x_{1α},   (3.39)

v_ℓ = Σ_α π_α U_{1α} F'_α = E(U_1 F').   (3.40)

The marginal social value of a sure delivery of a unit of the commodity in period 1 (dz_{1α} = 1, for all α), say v̄_1, is then

v̄_1 = Σ_α v_{1α} = E(U_1).   (3.41)

Combining (3.40) and (3.41) gives

v_ℓ/v̄_1 = E(U_1 F')/E(U_1) = E(F')(1 + a),   (3.42)

where a = cov(F', U_1)/[E(F')E(U_1)] reflects the correlation between F' and U_1. The r.h.s. of (3.42) is the direct analogue in this model of the Little-Mirrlees shadow wage introduced in Section 3.4, and it differs from the latter in two respects. First, the distribution term here vanishes - this is natural since we now have a single consumer and no investment, and it has nothing to do with uncertainty as such. Secondly, and
more importantly, the marginal product of labour is now replaced by its expected value, and is further corrected to take into account the correlation between marginal product on the farm and marginal utility of consumption. One would often expect this correlation to be negative (e.g. when bad weather means low marginal product and high marginal utility of income). In that case, the more risky is private production, the lower is the social cost of labour (in terms of v̄_1); this makes intuitive sense. Consider now a more general two-period model, where private agents' decisions in period 1 are taken with full knowledge of past and current signals, while those in period 0 have to be taken with imperfect knowledge of future signals. Considering a consumer's decision problem first, let x_{1α}(s_{1α}; x_0) be his preferred consumption bundle in period 1 and event α when x_0 has been chosen in period 0, and first-period signals take the value s_{1α}. One can then appeal to the theory of decision-making under uncertainty to model the consumer's choice of x_0. Generally, this choice will depend upon the consumer's expectations about the values of future signals. These expectations in turn depend on the vector of current signals s_0. They may also depend, of course, on the quantities chosen or decisions taken by other agents in period 0, but since these decisions themselves are functions of current signals, ultimately the profile of demands in period 0 should depend upon s_0 only; and therefore, after fully taking into account the expectations process, one could still write x_0 = x_0(s_0). After proceeding analogously for private producers, the planner's model becomes

max_s V(s)

subject to:

Σ_h x_0^h(s_0) − Σ_g y_0^g(s_0) − z_0 ≤ 0,

Σ_h x_{1α}^h(s_{1α}; x_0^h(s_0)) − Σ_g y_{1α}^g(s_{1α}; y_0^g(s_0)) − z_{1α} ≤ 0,   for all α,

s ∈ S,

and the Lagrange multipliers associated with the scarcity constraints yield both a vector v_0 of shadow prices for (certain) period 0 commodities and a collection of shadow price vectors v_{1α} for contingent commodities in period 1. An uncertain project dz = (dz_0, ..., dz_{1α}, ...) would be evaluated as before using the criterion: accept dz if v·dz is positive. Since an explicit analysis of uncertainty can quickly become quite involved, we shall not enter here into any substantial comment on the main issues of the literature. A major early contribution was that of Arrow and Lind (1970), who argued that under certain conditions uncertainty can be "ignored" in cost-benefit analysis in the sense that appropriate random variables can be replaced
by their expected values. Roughly speaking, these conditions involve, first, that risks in public projects should be uncorrelated with private risks, and, secondly, that public risks should be borne by a large number of individuals. The latter condition can be regarded as a particular aspect of the planner's "policy" in the uncertainty context. The limited validity of the Arrow-Lind result has prompted a good deal of subsequent research on the implications of uncertainty. Some valuable contributions include: Graham (1981), Henry (1974a, 1974b, 1981), Hirshleifer (1965), Sandmo (1972), Schmalensee (1976) and a discussion of the Arrow-Lind paper in the American Economic Review of March 1972. The issue of the correlation between project risks and the private sector is analogous to that examined using β-coefficients in the theory of finance (see the Journal of Finance for many contributions). Very similar comments to the above apply to the idea of treating time in the general framework of Section 1 by dating commodities; indeed, the examples we have just given are intertemporal. Under appropriate intertemporal separability assumptions, however, the problem (P) can be reduced to a one-period format where private agents transact only in current commodities and the future is taken into account by imputing an appropriate social value to changes in the components of a suitably specified vector of stock variables (e.g. the capital stock of each firm, and the wealth of each individual). The correct social values to be assigned to these "investments" can of course ultimately be derived only by an explicit consideration of (P) in its full intertemporal version. A rigorous treatment of these issues is by no means trivial, and lies much beyond the scope of this chapter. In the model of Section 2, we have effectively allowed for time by assuming perfect future information and either a complete set of forward markets, or perfect capital markets.
These assumptions of course sound very stringent and should be relaxed whenever the simplifications involved are seriously misleading; we would argue, however, that the approach we have adopted permits an instructive discussion of a number of practical issues involving time, and we hope to have illustrated the point in Section 3, e.g. in the analysis of the discount rate.
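The correlation correction in the peasant example of this subsection can be made concrete with a two-state computation (probabilities, marginal products and marginal utilities are all invented for illustration): when the farm's marginal product is low exactly in the states where the marginal utility of consumption is high, the social cost of labour falls below the expected marginal product, as in (3.42).

```python
# Two rainfall states with perceived probabilities pi_a.
pi = [0.5, 0.5]
Fprime = [0.4, 1.6]   # marginal product of farm labour F' in each state
U1 = [2.0, 1.0]       # marginal utility of period-1 consumption in each state
                      # (bad weather: low F', high U1 -> negative correlation)

E = lambda xs: sum(p * x for p, x in zip(pi, xs))

# Shadow cost of labour relative to a sure unit of the good: E(U1 F')/E(U1)
ratio = E([f * u for f, u in zip(Fprime, U1)]) / E(U1)

# Decomposition E(U1 F')/E(U1) = E(F') * (1 + a),
# with a = cov(F', U1) / (E(F') * E(U1))
cov = E([f * u for f, u in zip(Fprime, U1)]) - E(Fprime) * E(U1)
a = cov / (E(Fprime) * E(U1))
assert abs(ratio - E(Fprime) * (1 + a)) < 1e-12

# Negative correlation lowers the social cost of labour below E(F')
assert cov < 0 and ratio < E(Fprime)
```

With these numbers the ratio is 0.8 against an expected marginal product of 1.0: riskier private production, negatively correlated with marginal utility, reduces the social cost of labour.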
3.8. Some neglected issues We conclude this section by suggesting briefly how the framework we have developed can be applied or extended to deal with some issues which have not received here the attention they deserve, such as public goods, externalities, exhaustible resources and large projects. From the formal point of view, public goods pose no acute problem. A public good is distinguished mainly by the special rules according to which it is
allocated. For instance, in Section 2.4 we considered a public good for which x_i^h = x_i for all h and q_i = 0, and found its shadow price v_i to satisfy (2.86):

v_i = Σ_h b^h π_i^h + Σ_{j≠i} τ_j a_{ji}.   (3.43)
The first term on the r.h.s. is the welfare weighted marginal willingness-to-pay and the second is the induced effect on shadow tax revenue through pure substitution effects. The optimal level of public good provision is found, as for any other commodity, by equating its marginal rate of transformation with other goods in the public sector to the corresponding shadow price ratios. There may, of course, be important measurement problems in implementing this kind of rule, e.g. when the marginal-willingness-to-pay for public goods is not directly observable; but, as we have seen, analogous problems can arise for private goods as well (e.g. when the calculation of the shadow wage involves the reservation wage). The measurement of the benefits from public activities has been a substantial topic in the literature on practical cost-benefit analysis. There are obvious examples arising in most of the major departments of government: a ministry of transport has to evaluate time saved by new roads; a ministry of health will be concerned with the appraisal of benefits from immunisation programmes; a water department will want to measure the benefits of more reliable water supply; educational allocations will require a view on the benefits of different expenditures and so on. Only a few general comments are possible here. First, in many cases the willingness-to-pay of the consumer will be an important element and one can try to discover from questions, experiment and estimation how large this might be. An obvious example is the implicit valuation of time which might be revealed by an individual's choice of transport mode. The reader is referred to Chapter 9 in this Handbook on public goods for further discussion of these issues. Secondly, one must consider carefully whether it is appropriate to follow individual preferences. 
There are important problems of externalities (in, for example, immunisation), of information (in, for example, water supply), of identifying the consumer or his preferences (e.g. in education - both parent and child are involved). Thirdly, one must be humble and avoid giving the impression that very precise measurement is possible. Public goods are discussed in more detail in Chapter 10 by Laffont, Chapter 9 by Oakland, and Chapter 11 by Rubinfeld in this Handbook. The literature on the economics of health, education or transport defies summary - the reader may consult journals specialising in such areas (e.g. Journal of Health Economics, Journal of Human Resources, and Journal of Transport Economics and Policy). Externalities are well within the framework of Section 1, and can easily be introduced into the model of Section 2. For instance, the presence of externalities
in private production no longer allows us to write the gth firm's supply function as y^g(p, ȳ^g), since the production plans of other firms would become relevant signals for this firm's decisions. However, even after taking into account all the relevant interdependencies, the aggregate net supply function of the private sector should still be a well-defined function of (p, ..., ȳ^g, ...), say y*(p, ..., ȳ^g, ...). Though this function would no longer satisfy all the usual properties of competitive supply functions, many of our results did not use these properties, while those which did can be restated appropriately by replacing the derivatives of y(·) by those of y*(·). For instance, with full taxation of profits, the marginal social value of an unrestricted producer ration y_i^g now becomes:

MSV_{y_i^g} = v · ∂y*/∂y_i^g.   (3.44)
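This marginal social value computation can be sketched numerically; the linear aggregate supply function below is invented purely to stand in for the true interdependent private sector response, and the shadow prices are hypothetical:

```python
# Hypothetical aggregate private supply y*(ration), where firm g's ration
# on a good enters other firms' behaviour through an externality.
v = [1.0, 0.8]   # shadow prices (hypothetical)

def y_star(ration):
    # aggregate net supply as an (invented, linear) function of the ration
    return [2.0 + 0.5 * ration, -1.0 - 0.3 * ration]

# Marginal social value of the ration: v . d(y*)/d(ration)
eps = 1e-6
y0, y1 = y_star(0.0), y_star(eps)
dy = [(b - a) / eps for a, b in zip(y0, y1)]
msv = sum(vi * d for vi, d in zip(v, dy))
# analytic derivative is [0.5, -0.3], so msv = 1.0*0.5 + 0.8*(-0.3) = 0.26
```

In practice the derivatives of y*(·) would come from estimation rather than a known functional form, which is the informational issue discussed next.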
The estimation of the functions y*(·) may or may not be more difficult than that of supply functions involving no externalities. For example, if firms are unconstrained the informational requirements would be similar (a time series of prices and quantities) and in other circumstances one may be able to observe or find proxies for the rations. Externalities and public goods are particularly relevant in the context of environmental problems. These involve important and difficult applications of the theory. For an introduction see Johansson (1985) and Dasgupta (1982). The latter volume also includes an extensive discussion of cost-benefit analysis with exhaustible and renewable resources. "Large projects" require a major extension of the framework of Section 1. For a large project Δz ≡ z¹ − z⁰, one wishes to evaluate V*(z¹) − V*(z⁰). For this, simple cost-benefit tests based on a single shadow price vector are no longer available - though separate necessity and sufficiency tests can still be designed if V* is quasi-concave [see, for example, Hammond (1980)]. The calculation of V*(z¹) − V*(z⁰) will usually involve much more demanding econometric exercises than that of shadow prices, e.g. the estimation of complete demand systems rather than of local elasticities. An interesting step forward in the analysis of large projects is provided by Hammond (1983), who develops second-order approximations of their welfare effects. Unfortunately, the theory of cost-benefit analysis for large projects is yet to make significant headway, though the extensive literature on consumer surplus is relevant in this context - for example Chipman and Moore (1980), Hammond (1983), Harris (1978), Hausman (1981), MacKenzie and Pearce (1982), and Willig (1976).
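Why single-vector shadow-price tests fail for large projects can be seen in a toy computation (the welfare function below is invented purely for illustration): the local test v·Δz approximates V*(z¹) − V*(z⁰) well only when Δz is small.

```python
import math

# Toy social welfare function of public supplies (invented for illustration).
def V(z):
    return math.log(1 + z[0]) + 2 * math.sqrt(z[1])

def grad_V(z):
    # shadow prices at z: the gradient of V*
    return [1 / (1 + z[0]), 1 / math.sqrt(z[1])]

z0 = [1.0, 1.0]
v = grad_V(z0)

def first_order(dz):
    # the shadow-price test: v . dz
    return sum(vi * d for vi, d in zip(v, dz))

def exact(dz):
    z1 = [a + b for a, b in zip(z0, dz)]
    return V(z1) - V(z0)

small = [0.01, 0.01]
large = [2.0, 3.0]

# The linear approximation is accurate for the small project only.
assert abs(first_order(small) - exact(small)) < 1e-3
assert abs(first_order(large) - exact(large)) > 0.1
```

For the large project the linear term overstates the true welfare gain substantially, which is why second-order approximations (or full evaluation of V*) become necessary.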
4. Summary and conclusions We have tried to present a fairly general theory of cost-benefit analysis within which specific rules and procedures for setting shadow prices can be examined.
Throughout we have worked with one definition of the shadow price of a good: it is the effect on social welfare of a marginal increase in the supply of that good from the public sector. This definition is the natural one when we wish to ask of any (small) project whether it increases welfare: this will be the case if the value at shadow prices of the change in net supplies represented by the project is positive. The definition also reflects two central features of our approach. First, we are working in terms of social welfare; in other words we assume that there is a well-defined criterion against which the planner can evaluate outcomes. Secondly, there is just one environment corresponding to each public production plan so that the planner is able to work out the consequences of any particular change in public supplies. The welfare function can take many forms according to the values of the planner and we showed how it could be related to intermediate or derived objectives such as output, employment and income distribution. Similarly, whilst the appraisal of a project requires us to be specific about its consequences, the theory of cost-benefit analysis as we have presented it does not tie us to a particular model of the economy. Where the planner has a genuine choice over the policies which will determine the consequences of a project we have argued that he should make these choices optimally. This essentially involves consistency in the sense that open choices will be settled using the same criterion as that applied in the appraisal of projects. Nevertheless, one can include in the model as many constraints on government policies as one wishes. And we retain as a special case throughout the "fully determined" model where constraints are so restrictive that the planner has no real degree of freedom in his choice, i.e. given a public production plan there is only one compatible environment.
The issue of how policies are determined is important because the values of shadow prices can be very sensitive to alternative specifications of policy. When the public production set is convex, shadow prices decentralise an optimal public production plan. The shadow prices associated with an optimal production plan may take different values from those which would be appropriate for evaluating local changes from an arbitrary production plan; but the underlying notion of shadow prices is the same. We have also seen that, whether fully determined or not, the model can generally be written in such a way that shadow prices are equal to the Lagrange multipliers on the scarcity constraints in the planner's problem. This does not mean that unemployment of a resource implies zero shadow prices. For example, as shown in Section 3, extra public employment at some given wage would normally affect social welfare in an economy with unemployment of labour and thus the shadow wage would not be zero. In Section 2 we have analysed a particular example of the planner's model. Firms and households trade as price-takers subject to quotas on their net trades. The model can incorporate an arbitrary degree of control by the planner over
prices, indirect taxes, lump-sum transfers, quotas on households and firms, profits taxes and income taxes. On the other hand, such problems as oligopoly, migration or imperfect information would require extension of the model, although in principle they can be analysed by an appropriate specification of "side constraints". We shall not attempt to summarise the various rules we have derived in Section 2 and discussed in Section 3. We merely recall some of the prominent issues. First, we have examined production efficiency and the relation between shadow prices and producer prices. Public production should take place on the boundary of the public production set (provided not all shadow prices are zero). Shadow prices should, generally, be the same throughout the public sector. Except in rather special circumstances, however, shadow prices will not be proportional to producer prices and aggregate production efficiency of public and private firms together will not usually be desirable given the opportunities available to the planner. Secondly, we saw that, under simple conditions, relative shadow prices for traded commodities will coincide with relative world prices. This rule applies (commodity-wise) when either trade clears the market or an optimal quota can be set (in addition the amount traded does not affect producers and consumers except through the scarcity constraints). When world demand or supply curves are not perfectly elastic, one has to replace world prices by marginal costs or revenues. Thirdly, for a non-traded good, when an optimal ration can be imposed on producers, or the producer price can be chosen independently of the consumer price, the shadow price is equal to the marginal social cost of production plus the shadow value of extra profits generated by extra production. As a special case of this rule we saw that, under certain conditions, competitive private sector firms operating under constant returns to scale will break even at shadow prices. 
These rules are amongst the simplest and clearest of those we examined. Before they can be legitimately applied, however, the assumptions involved should be understood and assessed in context. In many situations it will be necessary to have recourse to the richer though more complicated set of rules investigated in Section 2.4 and Section 3. In conclusion we should like to indicate some directions for further research which are suggested by the difficulties we have encountered in this chapter. First, the potentialities of the model of Section 2 have not been fully exploited. In our judgement, the general equilibrium approach which it illustrates should be pursued further to clarify many of the issues and methods appearing in the literature. Secondly, more vigorous efforts should be made to capture such phenomena as uncertainty and imperfect competition in a form which lends itself to similar comparative static methods. Thirdly, a more thorough integration between cost-benefit analysis and the theory of reform should prove particularly
Ch. 14: Theory of Cost-Benefit Analysis
fruitful. Finally, an explicit treatment of informational problems would be of great value in bringing the difficulties of the practitioner to greater prominence in the theory. In the meantime governments have to appraise projects. It is our judgement that the theory currently available is sufficiently rich and flexible to provide useful guidance for those decisions. We hope that the principles we have set out here will help to understand, assess and improve the theoretical foundations of the practical methods which have been advocated, and to provide guidance for their proper application.
References*

Abadie, J., 1967, Non-linear programming (North-Holland, Amsterdam).
Ahmad, E. and N. Stern, 1984, The theory of reform and Indian indirect taxes, Journal of Public Economics 25, 259-298.
Allais, M., 1943, A la recherche d'une discipline économique (Paris).
Allais, M., 1977, Theories of general economic equilibrium and maximum efficiency, pp. 129-201 in: E. Schwoediauer, ed., Equilibrium and disequilibrium in economic theory (D. Reidel, Dordrecht).
Anand, S. and V. Joshi, 1979, Domestic distortions, income distribution and the theory of optimum subsidy, Economic Journal 89, 336-352.
Apostol, T.M., 1957, Mathematical analysis: A modern approach to advanced calculus (Addison-Wesley, Reading, Mass.).
Arrow, K.J., 1966, Discounting and public investment, in: A.V. Kneese and S.C. Smith, eds., Water research (Johns Hopkins University Press).
Arrow, K.J. and M. Kurz, 1970, Public investment, the rate of return and optimal fiscal policy (Johns Hopkins University Press).
Arrow, K.J. and R.C. Lind, 1970, Uncertainty and the evaluation of public investment decisions, American Economic Review 60, 364-378.
Arrow, K.J. and T. Scitovsky, eds., 1969, Readings in welfare economics (Allen and Unwin, London).
Atkinson, A.B. and J.E. Stiglitz, 1980, Lectures in public economics (McGraw-Hill, Maidenhead).
Bacha, E. and L. Taylor, 1971, Foreign exchange shadow price: Critical review of current theory, Quarterly Journal of Economics 85, 197-224.
Balassa, B., 1974a, Estimating the shadow price of foreign exchange in project appraisal, Oxford Economic Papers 26, 147-168.
Balassa, B., 1974b, New approaches to estimation of shadow exchange rate: Note, Oxford Economic Papers 26, 208-211.
Balassa, B., 1976, The "Effects Methods" of project evaluation, Bulletin of Oxford University Institute of Economics and Statistics 38, 219-231.
Balassa, B. and M. Chervel, 1977, The "Effects Methods" of project evaluation once again: Plus reply, Bulletin of Oxford University Institute of Economics and Statistics 39, 345-353.
Balassa, B. and D.M. Schydlowsky, 1968, Effective tariffs, domestic cost of foreign exchange and the equilibrium exchange rate, Journal of Political Economy 76, 348-360.
Batra, R. and S. Guisinger, 1974, New approach to estimation of shadow exchange rates in evaluating development projects in LDCs, Oxford Economic Papers 26, 192-207.
Baumol, W.J., 1968, On the social rate of discount, American Economic Review 58, 788-802.

*Bibliographical assistance was provided in the summer of 1982 by Graham Andrew and references to subsequent literature are less systematic. The list below refers essentially to works mentioned in the text plus a little supplementary material which may be useful. It is in no sense exhaustive.
Bell, C. and S. Devarajan, 1983, Shadow prices for project evaluation under alternative macroeconomic specifications, Quarterly Journal of Economics 97, 454-477.
Bell, C., S. Devarajan and A. Kuyvenhoven, 1980, Semi-input-output and shadow prices: Critical note and reply, Bulletin of Oxford University Institute of Economics and Statistics 42, 257-262.
Bell, C., P. Hazell and R. Slade, 1982, Project evaluation in regional perspective (Johns Hopkins University Press).
Bertrand, T.J., 1974, Shadow exchange rates in an economy with trade restrictions, Oxford Economic Papers 26, 185-191.
Bhagwati, J.N. and T.N. Srinivasan, 1981, The evaluation of projects at world prices under trade restrictions, International Economic Review 22, 385-399.
Blitzer, C., P. Clark and L. Taylor, 1977, Economy-wide models and development planning, 2nd edition (Oxford University Press).
Blitzer, C., P. Dasgupta and J. Stiglitz, 1981, Project appraisal and foreign exchange constraints, Economic Journal 91, 58-74.
Boadway, R.W., 1974, The welfare foundations of cost-benefit analysis, Economic Journal 84, 926-939.
Boadway, R.W., 1975, Cost-benefit rules in general equilibrium, Review of Economic Studies 42, 361-373.
Boiteux, M., 1956, Sur la gestion des monopoles publics astreints à l'équilibre budgétaire, Econometrica 24, 22-40.
Boiteux, M., 1971, On the management of public monopolies subject to budgetary constraints (translation of Boiteux, 1956), Journal of Economic Theory 3, 219-240.
Bradford, D.F., 1975, Constraints on government investment opportunities and the choice of discount rate, American Economic Review 65, 887-889.
Bruce, N. and R. Harris, 1982, Cost-benefit criteria and the compensation principle in evaluating small projects, Journal of Political Economy 90, 755-776.
Bruno, M., 1975, Planning models, shadow prices and project evaluation, in: C. Blitzer, P. Clark and L. Taylor, eds., Economy-wide models and development planning, 2nd edition (1977) (Oxford University Press) 197-211.
Chipman, J.S. and J.C. Moore, 1978, The new welfare economics, 1939-1974, International Economic Review 19, 547-584.
Chipman, J.S. and J.C. Moore, 1980, Compensating variation, consumers' surplus and welfare, American Economic Review 70, 933-949.
Christiansen, V., 1981, Evaluation of public projects under optimal taxation, Review of Economic Studies 48, 447-457.
Christiansen, V. and E.S. Jansen, 1978, Implicit social preferences in the Norwegian system of indirect taxation, Journal of Public Economics 10, 217-245.
Dasgupta, A. and D.W. Pearce, 1972, Cost-benefit analysis: Theory and practice (Macmillan, London).
Dasgupta, P., 1982, The control of resources (Basil Blackwell, Oxford).
Dasgupta, P. and J.E. Stiglitz, 1972, On optimal taxation and public production, Review of Economic Studies 39, 87-103.
Dasgupta, P. and J.E. Stiglitz, 1974, Benefit-cost analysis and trade policies, Journal of Political Economy 82, 1-33.
Dasgupta, P., S. Marglin and A.K. Sen, 1972, Guidelines for project evaluation (UNIDO, New York).
Deaton, A. and J. Muellbauer, 1980, Economics and consumer behaviour (Cambridge University Press, New York).
Deaton, A.S. and N.H. Stern, 1986, Optimally uniform commodity taxes, taste differences and lump-sum grants, Economics Letters 20, 263-266.
Debreu, G., 1951, The coefficient of resource utilization, Econometrica 19, 273-292.
Debreu, G., 1959, Theory of value (Cowles Foundation, Yale University Press, New Haven).
Diamond, P.A., 1968, Opportunity costs of public investment: Comment, Quarterly Journal of Economics 82, 682-688.
Diamond, P., 1975, A many-person Ramsey tax rule, Journal of Public Economics 4, 335-342.
Diamond, P. and J.A. Mirrlees, 1971, Optimal taxation and public production I: Production efficiency and II: Tax rules, American Economic Review 61, 8-27 and 261-278.
Diamond, P. and J.A. Mirrlees, 1976, Private constant returns and public shadow prices, Review of Economic Studies 43, 41-47.
Diewert, W.E., 1983, Cost-benefit analysis and project evaluation: A comparison of alternative approaches, Journal of Public Economics 22, 265-302.
Diewert, W.E., 1984, The measurement of deadweight loss in an open economy, Economica 51, 23-42.
Dinwiddy, C. and F. Teal, 1987, Shadow price formulae for non-traded goods in a tax-distorted economy, Journal of Public Economics, forthcoming.
Dixit, A.K., 1975, Welfare effects of tax and price changes, Journal of Public Economics 4, 103-123.
Dixit, A.K., 1976a, Optimization in economic theory (Oxford University Press, London).
Dixit, A.K., 1976b, Public finance in a Keynesian temporary equilibrium, Journal of Economic Theory 12, 242-258.
Dixit, A.K., 1979, Price changes and optimum taxation in a many-consumer economy, Journal of Public Economics 11, 143-157.
Dixit, A. and V. Norman, 1980, Theory of international trade (Nisbets, Welwyn).
Drèze, J.H., 1974, Discount rates and public investment: A post scriptum, Economica 41, 52-61.
Drèze, J.P., 1982, On the choice of shadow prices for project evaluation, Development Economics Research Centre Discussion Paper No. 16 (University of Warwick).
Drèze, J.H., 1984, Second-best analysis with markets in disequilibrium: Public sector pricing in a Keynesian regime, in: M. Marchand, P. Pestieau and H. Tulkens, eds., The performance of public enterprises (North-Holland, Amsterdam).
Farrell, M.J., 1957, The measurement of productive efficiency, Journal of the Royal Statistical Society, Series A 120, 253-281.
Findlay, R., 1971, Comparative advantage, effective protection and the domestic resource cost of foreign exchange, Journal of International Economics 1, 189-204.
Fuss, M. and D. McFadden, eds., 1978, Production economics: A dual approach to theory and applications (North-Holland, Amsterdam).
Gale, D., 1967, A geometric duality theorem with economic applications, Review of Economic Studies 34, 19-24.
Galenson, W. and H. Leibenstein, 1955, Investment criteria, productivity and economic development, Quarterly Journal of Economics 69, 343-370.
Government of India, 1981, A technical note on the sixth plan of India (Planning Commission, Government of India, New Delhi).
Graaff, J. de V., 1957, Theoretical welfare economics (Cambridge University Press, Cambridge).
Graham, D.A., 1981, Cost-benefit analysis under uncertainty, American Economic Review 71, 715-725.
Guesnerie, R., 1977, On the direction of tax reform, Journal of Public Economics 7, 179-202.
Guesnerie, R., 1979, General statements on second-best Pareto optimality, Journal of Mathematical Economics 6, 169-194.
Guesnerie, R., 1980, Second-best pricing rules in the Boiteux tradition, Journal of Public Economics 13, 51-80.
Hammond, P.J., 1980, Cost-benefit as a planning procedure, in: D.A. Currie and W. Peters, eds., Contemporary economic analysis (Croom-Helm, London) 2, 221-250.
Hammond, P.J., 1983, Approximate measures of the social welfare benefits of large projects, Institute for Mathematical Studies in the Social Sciences, Technical Report No. 410 (Stanford University, California).
Hammond, P.J., 1985, Project evaluation by potential tax reform, Institute for Mathematical Studies in the Social Sciences, Technical Report No. 459 (Stanford University, California).
Harberger, A.C., 1973, Project evaluation: Collected papers (Macmillan).
Harberger, A.C., 1978, On the use of distributional weights in social cost-benefit analysis, Journal of Political Economy 86, S87-S120.
Harris, R.G., 1978, On the choice of large projects, Canadian Journal of Economics 11, 404-423.
Hausman, J.A., 1981, Exact consumer's surplus and deadweight loss, American Economic Review 71, 662-676.
Henry, C., 1974a, Option values in the economics of irreplaceable assets, Review of Economic Studies Symposium 41, 89-104.
Henry, C., 1974b, Investment decisions under uncertainty: The "irreversibility effect", American Economic Review 64, 1006-1012.
Henry, C., 1981, Critères simples pour la prise en compte du risque, Econometrica 49, 153-170.
Hirshleifer, J., 1965, Investment decisions under uncertainty: Choice-theoretic approaches, Quarterly Journal of Economics 79, 509-536.
Johansson, P.O., 1982, Cost-benefit rules in general equilibrium, Journal of Public Economics 18, 121-137.
Johansson, P.O., 1985, Environmental benefits: Theory and measurement, Working Paper No. 45 (Swedish University of Agricultural Sciences, Umeå).
Kay, J.A., 1972, Social discount rates, Journal of Public Economics 1, 359-378.
Kuyvenhoven, A., 1978, Planning with the semi-input-output method; with empirical applications to Nigeria, Studies in Development and Planning 10 (Nijhoff).
Lal, D., 1973, Disutility of effort, migration and the shadow wage rate, Oxford Economic Papers 25, 112-126.
Layard, R., ed., 1972, Cost-benefit analysis: Selected readings (Penguin Books).
Layard, R., L. Squire and A.C. Harberger, 1980, On the use of distributional weights in social cost-benefit analysis: Comments, Journal of Political Economy 88, 1041-1052.
Lesourne, J., 1975, Cost-benefit analysis and economic theory (North-Holland, Amsterdam).
Lind, R.C., ed., 1983, Discounting for time and risk in energy policy (Johns Hopkins University Press).
Lipsey, R.G. and K. Lancaster, 1956-57, The general theory of the second-best, Review of Economic Studies 24, 11-32.
Little, I.M.D., 1961, The real cost of labour and the choice between consumption and investment, Quarterly Journal of Economics 75, 1-15.
Little, I.M.D. and J.A. Mirrlees, 1974, Project appraisal and planning for developing countries (Heinemann, London).
Little, I.M.D. and M.F.G. Scott, 1976, Using shadow prices (Heinemann, London).
Marchand, M., J. Mintz and P. Pestieau, 1983, Public production and shadow pricing in a model of disequilibrium in labour and capital markets, Discussion Paper No. 8315 (CORE, Louvain-la-Neuve).
Marglin, S.A., 1963a, The social rate of discount and the optimal rate of investment, Quarterly Journal of Economics 77, 95-112.
Marglin, S.A., 1963b, The opportunity cost of public investment, Quarterly Journal of Economics 77, 274-289.
Marglin, S.A., 1967, Public investment criteria (Allen and Unwin, London).
Marglin, S.A., 1976, Value and price in the labour-surplus economy (Oxford University Press).
Mazumdar, D., 1976, Rural-urban wage gap, migration and the shadow wage, Oxford Economic Papers 28, 406-425.
McArthur, J.D. and G.A. Amin, eds., 1978, Cost-benefit analysis and income distribution in developing countries: A symposium, World Development, Vol. 6, No. 2 (Pergamon Press Ltd., New York).
McKenzie, G.S. and I.F. Pearce, 1976, Exact measures of welfare and the cost of living, Review of Economic Studies 43, 465-468.
Meade, J.E., 1955, Trade and welfare: A mathematical supplement (Oxford University Press).
Mirrlees, J.A., 1971, An exploration in the theory of optimal income taxation, Review of Economic Studies 38, 175-208.
Mirrlees, J.A., 1972, On producer taxation, Review of Economic Studies 39, 105-111.
Mishan, E.J., 1975, Cost-benefit analysis: An informal introduction (Allen and Unwin, London).
Neary, J.P. and K.W.S. Roberts, 1980, The theory of household behaviour under rationing, European Economic Review 13, 25-42.
Newbery, D.M.G. and N.H. Stern, 1987, The theory of taxation for developing countries (The World Bank and Oxford University Press).
Novozhilov, V.V., 1976, Problems of cost-benefit analysis in optimal planning (International Arts and Sciences Press).
Nwaneri, V.C., 1970, Equity in cost-benefit analysis: A case study of the third London airport, Journal of Transport Economics and Policy 4, 234-254.
Pearce, D.W. and C.A. Nash, 1981, Social appraisal of projects: A text in cost-benefit analysis (Macmillan).
Pestieau, P.M., 1975, The role of taxation in the determination of the social rate of discount, International Economic Review 16, 362-368.
Prest, A.R. and R. Turvey, 1965, Cost-benefit analysis: A survey, Economic Journal 75, 683-735.
Roberts, K.W.S., 1978, On producer prices as shadow prices, Mimeo (M.I.T.).
Roberts, K.W.S., 1980, Price independent welfare prescriptions, Journal of Public Economics 13, 277-297.
Roberts, K.W.S., 1982, Desirable fiscal policies under Keynesian unemployment, Oxford Economic Papers 34, 1-22.
Roskill, Lord, 1971, Commission on the Third London Airport (HMSO).
Sah, R.K. and J.E. Stiglitz, 1985, The social cost of labour and project evaluation: A general approach, Journal of Public Economics 28, 135-163.
Sandmo, A., 1972, Discount rates for public investment under uncertainty, International Economic Review 13, 287-302.
Sandmo, A. and J.H. Drèze, 1971, Discount rates for public investment in closed and open economies, Economica 38, 396-412.
Schmalensee, R., 1976, Public investment criteria, insurance markets and income taxation, Journal of Public Economics 5, 425-445.
Scott, M.F.G., 1974, How to use and estimate shadow exchange rates, Oxford Economic Papers 26, 169-184.
Sen, A.K., 1967, Isolation, assurance and the social rate of discount, Quarterly Journal of Economics 81, 112-124.
Sen, A.K., 1972, Control areas and accounting prices: An approach to economic evaluation, Economic Journal 82, 486-501.
Squire, L. and H.G. van der Tak, 1975, Economic analysis of projects (World Bank, Johns Hopkins University).
Srinivasan, T.N., 1980, General equilibrium theory, project evaluation and economic development, in: M. Gersovitz, C. Diaz-Alejandro, G. Ranis and M.R. Rosenzweig, eds., Development: Essays in honour of W. Arthur Lewis (Allen and Unwin, London) Ch. 14, 229-251.
Stern, N.H., 1972, Optimum development in a dual economy, Review of Economic Studies 39, 171-184.
Stern, N.H., 1976, On the specification of models of optimum income taxation, Journal of Public Economics 6, 123-162.
Stern, N.H., 1977, The marginal valuation of income, in: Artis and Nobay, eds., Studies in modern economic analysis (Blackwell).
Stiglitz, J.E., 1982, Structure of labour markets and shadow prices in LDC's, in: R.H. Sabot, ed., Essays in migration and the labour market in LDC's (Westview).
Stiglitz, J.E., 1983, Rate of discount for cost-benefit analysis and the theory of the second best, in: R.C. Lind, ed., Discounting for time and risk in energy policy (Johns Hopkins University Press).
Stiglitz, J.E. and P. Dasgupta, 1971, Differential taxation, public goods and economic efficiency, Review of Economic Studies 38, 151-174.
Willig, R.D., 1976, Consumers' surplus without apology, American Economic Review 66, 589-597.
Chapter 15
PARETO EFFICIENT AND OPTIMAL TAXATION AND THE NEW NEW WELFARE ECONOMICS JOSEPH E. STIGLITZ* Princeton University
1. Introduction¹
For more than a hundred years, economists have attempted to show that progressive taxation can be justified on more fundamental principles. Among the earliest of such attempts was that of Edgeworth (1868, 1897), who tried to show that utilitarianism (combined with two other assumptions) implied progressivity. His argument was simple: he postulated that all individuals had the same utility of income function, and that they exhibited diminishing marginal utility. Since social welfare was simply the sum of the utility functions of all individuals, it immediately followed that the decrease in social welfare from taking a dollar (or a pound) away from a rich person was less than the decrease in social welfare from taking a dollar away from a poor person.

With the advent of the New Welfare Economics in the 1930s, interest in optimal taxation waned: interpersonal utility comparisons of the kind employed in utilitarianism were viewed as inadmissible. Whether taxes should be progressive might be an appropriate question for a moral philosopher, but not for

*All workers in this area owe an intellectual debt to James Mirrlees. I should like to acknowledge this debt as well as my indebtedness to my several collaborators who have greatly influenced the formulation of my ideas: Partha Dasgupta, Richard Arnott, Raaj Sah, Bob Britto, Steve Slutsky, John Hamilton, and especially A.B. Atkinson. I am also indebted to Alan Auerbach, Yungoll Yun and B. Salanie who provided comments on an earlier draft. Financial support from the National Science Foundation is gratefully acknowledged.

¹Although this chapter is primarily a survey of existing literature, several sections are based on hitherto unpublished work, and several sections contain new proofs of previously published results.
New results are reported in Section 8, on optimal taxation where those who are more productive in "outside" employment are also more productive in activities at home; Section 13.1, on the taxation of capital income in imperfect capital markets; Section 14, on altruism and inheritance taxation; and Section 15, on commitment. New proofs are contained in Section 5, on random taxation; in Section 3, on partial pooling; in Section 9, on Pareto efficient taxation with general equilibrium effects; and in Section 2, on characterizing the Pareto efficient tax structure.

Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B.V. (North-Holland)
an economist. Economists, in their role as economists, should limit themselves to identifying Pareto efficient allocations, and to finding Pareto inefficiencies and showing how these could be eliminated. Many policy economists found this too circumscribed a view of their role. In almost all of the choices which they faced, some individuals were better off, others worse off. If they could not make some interpersonal comparisons, they would not be in a position to contribute much to policy debates.

This provided the setting for the re-examination of the design of tax structures in the 1970s, at first making use of utilitarian social welfare functions, but then broadening the analysis to investigate the consequences of a wider class of social welfare functions. The major advance over Edgeworth's earlier work [due to Mirrlees (1971)] was the recognition of the incentive effects associated with taxation;² there was a trade-off between equity and efficiency considerations, a trade-off which Edgeworth had ignored. Mirrlees' calculations (though only an example) provided less support, however, for the advocates of progressivity than they had hoped. With what seemed reasonable hypotheses concerning individual utility functions, the optimal tax schedule looked close to linear. Those brought up within the doctrines of the New Welfare Economics felt distinctly uneasy about these developments; the analysis seemed so dependent on the implicit or explicit interpersonal utility comparisons. This chapter is concerned with what the New New Welfare Economics has to say about the design of tax structures.
The New New Welfare Economics is distinguished by two features. First, unlike much of the "old" New Welfare Economics and unlike the earlier Utilitarian analysis, it does not assume that the government has at its disposal the information required to make lump-sum redistributions; it identifies who is able to pay higher taxes at least partly by using endogenous variables, like income or consumption.³ It is this limitation on the information of the government which results in taxation being distortionary, and which gives rise to the trade-off between equity and efficiency. Granted that the government has imperfect information, on the basis of which it can redistribute income, what is the best that can be done? The New New Welfare Economics attempts to answer this question and to describe the Pareto efficient tax structures, i.e. the tax structures which get the economy to the utilities possibilities schedule, given the limitations on the government's information and other limitations on the government's ability to impose taxes. Just as the earlier New Welfare Economics argued that it is essential first to identify the set of Pareto efficient allocations, so that one could separate out efficiency considerations from the value judgements associated with choices

²The presence of trade-offs between equity and efficiency considerations had, of course, long been recognized, and there had even been some formal modelling [see Fair (1971)].
³It may, of course, also use exogenous variables, like age.
among Pareto efficient points, so too does the New New Welfare Economics. There are some properties that all Pareto efficient tax structures have, whereas other properties may be specific to particular Pareto efficient tax structures, e.g. those which would be chosen with a utilitarian social welfare function. Economists may make an important contribution by identifying Pareto efficient tax structures, but it still remains the case that many of the critical choices necessitate interpersonal trade-offs, choices among alternative Pareto efficient allocations. The use of Social Welfare Functions can provide a useful way of thinking systematically about these trade-offs; one of the objectives of this chapter is to show how this has been done, and some of the dangers and pitfalls that arise in doing so.

The topics which are the subject of this chapter include some of the areas in which there has been the most active research during the past fifteen years. We cannot and do not provide a complete survey; what we attempt to do is to put this literature in perspective, to provide some of the key parts of the analysis, and to refer the reader to more detailed surveys or articles dealing with particular topics. The chapter is divided into three parts. Part I discusses Pareto efficient taxation in a one-commodity world. The central question is: How progressive should the tax structure be? Part II is concerned with the circumstances under which it is desirable to impose commodity taxation at different rates on different commodities and to tax the return to capital. We conclude in Part III with some general observations concerning extensions and interpretations of the general theory.
Part I: One-commodity neoclassical models
2. The basic model

The basic issues can be illustrated by a simple model of an economy in which there are two types of individuals and a single produced commodity; we shall assume that type 2 individuals are more productive than type 1 individuals, i.e. their output per unit of time spent working is larger. Assume that all individuals have the same utility function,

$$U(C_i, L_i),$$

where $C_i$ is the consumption of an individual of type i (this may be a whole vector of consumption goods, possibly dated), and $L_i$ is his labor supply.⁴ An

⁴Throughout, we assume that utility functions are "well behaved".
Figure 2.1. Utility possibilities schedule with and without lump-sum transfers.
individual of type i has a productivity (output per hour) of $w_i$. Thus, in the absence of taxation, the individual's budget constraint would simply be

$$C_i = Y_i = w_i L_i, \qquad (1)$$

where $Y_i$ is the individual's income. By imposing lump-sum taxes (or subsidies), $M_i$, on individuals of type i (so the individual's budget constraint is now

$$C_i = Y_i - M_i = w_i L_i - M_i), \qquad (1')$$

so that the revenues raised by taxes on one group are redistributed as lump-sum payments to the other group:

$$M_1 N_1 + M_2 N_2 = 0, \qquad (2)$$

where $N_i$ is the number of individuals of type i, we can trace out the whole utilities possibilities schedule (see Figure 2.1).
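The mechanics of tracing out the utilities possibilities schedule can be sketched numerically. The parameterization below — $U = \ln C - L^2/2$, $w_1 = 1$, $w_2 = 2$, $N_1 = N_2 = 1$ — is an illustrative assumption of ours, not taken from the chapter, and the crude grid search stands in for a proper optimizer:

```python
import math

def best_response(w, M):
    """Individual with productivity w and lump-sum tax M chooses labor L
    to maximize U = ln(C) - L**2/2 subject to C = w*L - M (eq. (1'))."""
    best = None
    for i in range(1, 5001):
        L = i * 0.001                  # crude grid search over L in (0, 5]
        C = w * L - M
        if C <= 0:
            continue
        U = math.log(C) - L * L / 2
        if best is None or U > best[0]:
            best = (U, L, C)
    return best                        # (utility, labor, consumption)

def utility_possibilities(w1=1.0, w2=2.0, N1=1, N2=1):
    """Trace the utilities possibilities schedule by varying the lump-sum
    tax M2 on type 2 and rebating M1 = -M2*N2/N1, so that eq. (2) holds."""
    points = []
    for x in range(-5, 11):
        M2 = x * 0.1
        M1 = -M2 * N2 / N1             # budget balance: M1*N1 + M2*N2 = 0
        r1, r2 = best_response(w1, M1), best_response(w2, M2)
        if r1 and r2:
            points.append((M2, r1[0], r2[0]))   # (tax on type 2, U1, U2)
    return points
```

Each balanced-budget transfer pair $(M_1, M_2)$ yields one point $(U^1, U^2)$ of the schedule: taxing the more productive type ($M_2 > 0$) raises $U^1$ and lowers $U^2$, moving the economy along the frontier of Figure 2.1.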
2.1. Optimal lump-sum taxes

The optimal redistributive tax depends on the social welfare function. Two social welfare functions which have been extensively employed are the utilitarian,

$$W = \sum_i U^i, \qquad (3a)$$
Figure 2.2. With different social welfare functions, different points on the utilities possibilities schedule will be chosen.
and the Rawlsian,

$$W = \min_i \{U^i\}, \qquad (3b)$$
where $U^i$ is the utility of the ith type of individual. The former is maximized at the point on the utilities possibilities schedule where the slope is $-1$; the latter by the intersection of the utilities possibilities schedule with a 45° line (Figure 2.2). Analytically, the utilitarian solution with lump-sum taxation is characterized by the equality of the marginal utility of consumption:⁵
$$\partial U^1/\partial C_1 = \partial U^2/\partial C_2; \qquad (4)$$
while the marginal rate of substitution between goods and leisure is equal to the individual's marginal product (wage):
$$w_i\, \partial U^i/\partial C_i + \partial U^i/\partial L_i = 0. \qquad (5)$$
This, in turn, means that so long as leisure is a normal good, the more able individual is actually worse off. (This assumes, of course, that we can make interpersonal comparisons. The utilitarian solution has the property that the high

⁵Assume output $Q = F(N_1 L_1, N_2 L_2)$, a function of the inputs of the two types of labor. We form the Lagrangian:
$$\mathcal{L} = N_1 U^1 + N_2 U^2 + \mu \left[ F(N_1 L_1, N_2 L_2) - (C_1 N_1 + C_2 N_2) \right].$$
Equations (4) and (5) follow from the first-order conditions.
Figure 2.3. Utilitarian optimum with lump-sum taxes: the more able work more and are worse off than the less able.
ability individuals would prefer the $\{C, L\}$ allocation of the low ability individual.) This can be seen most easily in the case of a separable utility function

$$U^i = u(C_i) - v(L_i). \qquad (6)$$
For then, (4) ensures that all individuals, regardless of their ability, receive the same level of consumption; but then the efficiency condition (5) implies that those with a higher productivity have a higher marginal disutility of work, i.e. that they provide a greater labor supply (see Figure 2.3). (This seems in accord with the Marxian dictum, "from each according to his abilities, to each according to his needs"). Notice that the more able individuals are actually better off in the Rawlsian solution than they are under utilitarianism.
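Conditions (4) and (5) can be checked concretely for the separable form (6). The sketch below assumes $u(C) = \ln C$ and $v(L) = L^2/2$ — a parameterization of ours, not the chapter's — for which the first-order conditions admit a closed form; it confirms that both types consume the same amount, while the more able work more and end up with lower utility:

```python
import math

def utilitarian_optimum(w1=1.0, w2=2.0):
    """First-best utilitarian optimum for U^i = ln(C_i) - L_i**2/2.
    Condition (4) gives 1/C1 = 1/C2 = mu, so consumption is equal (C = 1/mu);
    condition (5) gives L_i = mu*w_i; the resource constraint
    C1 + C2 = w1*L1 + w2*L2 then pins down mu = sqrt(2/(w1**2 + w2**2))."""
    mu = math.sqrt(2.0 / (w1**2 + w2**2))
    C = 1.0 / mu                       # common consumption level
    L1, L2 = mu * w1, mu * w2          # the more able work more
    U1 = math.log(C) - L1**2 / 2
    U2 = math.log(C) - L2**2 / 2
    return C, (L1, L2), (U1, U2)
```

With $w_1 = 1$ and $w_2 = 2$ this gives equal consumption but $L_2 = 2L_1$ and $U^2 < U^1$: the more able individual is worse off, exactly as Figure 2.3 indicates.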
2.2. Imperfect information The previous analysis assumed that the government had perfect information about who was of type 2, and who was of type 1. Clearly, if the government could not tell, the more able individuals would (under the utilitarian solution) have no incentive to come forward to claim their greater ability; they would know that, if they were to do so, they would actually be worse off. The government, in its choice of tax structures, must recognize these limitations on its information. In our example with only two groups, the government has a choice of two kinds of tax structures: (a) those in which the government is unable to distinguish (ex post
or ex ante) among individuals; this is called a pooling equilibrium;⁶ and (b) those in which the more able and the less able can (ex post) be identified as a result of the actions undertaken by the different groups; this is called a self-selection equilibrium.⁷ To attain a self-selection equilibrium, the two groups have to be given choices, such that the first group prefers one choice, the second group the other choice. (These are called the self-selection constraints.)⁸ We assume, however, that the individual's labor input is not observable, just as his productivity is not observable; his before-tax income, $Y_i = w_i L_i$, is, however, observable. (Obviously, if both before-tax income and labor input were observable, then one could immediately infer the individual's productivity, $w_i$; more generally, even if hours on the job are observable, effort is not.)⁹ Thus, it may be possible to differentiate among individuals on the basis of their choices of the observable variables $\{C_i, Y_i\}$. We seek to characterize the Pareto efficient pairs of $\{C_i, Y_i\}$. To do this, we rewrite the ith individual's utility function in terms of the observable variables $\{C_i, Y_i\}$:

$$V^i(C_i, Y_i) \equiv U(C_i, Y_i/w_i). \qquad (7)$$
It is useful to note three properties of this derived utility function. First, increases in Y (income) decrease utility (keeping C, consumption, fixed), because to attain the increases in income, more labor must be expended. Second, the more productive individual has a flatter indifference curve. This follows from observing that the marginal rate of substitution,

$$\left(\frac{\partial C}{\partial Y}\right)_{V^i} = -\left[\frac{\partial V^i}{\partial Y}\right]\bigg/\left[\frac{\partial V^i}{\partial C}\right] = -\left[\frac{\partial U^i/\partial L_i}{w_i}\right]\bigg/\left[\frac{\partial U^i}{\partial C_i}\right].$$
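The flatter-slope property can be checked numerically. The sketch below uses an assumed parametric utility function, $U(C, L) = \ln C - L^2/2$ (an illustrative choice, not from the text), for which the slope $-V_Y/V_C$ works out to $(Y/w_i^2)\,C$:

```python
# Single-crossing check for V(C, Y) = U(C, Y/w) with the assumed
# utility function U(C, L) = ln C - L^2/2 (illustrative only).
def slope(C, Y, w):
    # -V_Y / V_C = (Y / w**2) * C : slope of the indifference curve in (Y, C).
    return (Y / w**2) * C

C, Y = 1.5, 1.0
s_less_able = slope(C, Y, w=1.0)   # 1.5
s_more_able = slope(C, Y, w=2.0)   # 0.375: flatter for the more able
print(s_less_able, s_more_able)
```

At any common point in (Y, C) space the higher-wage individual's indifference curve is flatter, which is the single-crossing property used repeatedly below.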
6 This vocabulary was first introduced, in the context of the analysis of how competitive markets distinguish among individuals with different characteristics, by Rothschild and Stiglitz (1976).
7 It is also referred to as a separating equilibrium, because the observed actions of the individuals separate them according to types; or as a revealing equilibrium, because their actions reveal who they are.
8 Since, in self-selection equilibria, individuals' types can be identified, one could, alternatively, have asked the individual what type he was, with the understanding that the consequences for the individual are precisely those associated with the self-selection equilibrium. Similarly, in the pooling equilibrium, if the individual were told that nothing would depend on what statement he made concerning his abilities, he would have no incentive to lie. Thus, the set of possible equilibria corresponds to those which could be attained if individuals are simply asked what their ability is, and are provided with incentives not to lie. This (loosely speaking) is referred to as the revelation principle. See R. Myerson (1983). The self-selection constraints are sometimes referred to as the incentive compatibility constraints.
9 In jobs where individuals punch time clocks, presumably hours worked are observable. But note that were the government to base its tax on the individual's wage rate, firms would have an incentive to collude with the workers in "cheating" on the official hours worked. Such practices are already common in circumstances where bureaucratic regulations impose a maximum on compensation per hour.
J.E. Stiglitz
If leisure is a normal good, then at any given set of values of $\{C, Y\}$, labor will be lower for the more productive individual, and hence the marginal rate of substitution between consumption and labor will be lower, i.e.

$$\left(\frac{\partial C^i}{\partial L^i}\right)_{U^i} = -\left[\frac{\partial U^i}{\partial L^i}\right]\bigg/\left[\frac{\partial U^i}{\partial C^i}\right]$$

is smaller. Higher ability individuals have flatter indifference curves (in $\{C, Y\}$). (See Figure 2.4.) Finally, we note that when individuals maximize their utility in the absence of taxation, they set their marginal rate of substitution between consumption and labor equal to the wage, i.e.

$$\left(\frac{\partial C^i}{\partial L^i}\right)_{U^i} = -\left[\frac{\partial U^i}{\partial L^i}\right]\bigg/\left[\frac{\partial U^i}{\partial C^i}\right] = w_i,$$

from which it immediately follows that in the absence of taxation

$$\left(\frac{\partial C^i}{\partial Y^i}\right)_{V^i} = -\left[\frac{\partial V^i}{\partial Y_i}\right]\bigg/\left[\frac{\partial V^i}{\partial C_i}\right] = -\left[\frac{\partial U^i/\partial L^i}{w_i}\right]\bigg/\left[\frac{\partial U^i}{\partial C^i}\right] = 1;$$

the slope of the indifference curve in $\{C, Y\}$ space is unity. The first basic result on Pareto efficient tax structures is this: with two groups, Pareto efficiency requires a self-selection equilibrium; the government will, in fact, wish to differentiate among individuals on the basis of their observed incomes [Stiglitz (1982b)].10 This result is not general: with many groups, though a complete pooling equilibrium is not consistent with Pareto efficiency, a partial pooling equilibrium (one in which several different types take the same action) may be.11 The set of Pareto efficient self-selection equilibria is easy to characterize: though the government in general sets a whole tax function (which in turn determines the relationship between after-tax and before-tax income), when there are only two groups, only two points on this schedule will be chosen. We simply

10 This result is easy to see. Assume there were a pooling equilibrium, E, as depicted in Figure 2.4. Any point lying between the two indifference curves will, together with E, separate. For instance, if the government were to offer H and E, the more able would choose H and the less able E. Assume that the slope of the more able's indifference curve through E is flatter than 1. Then there exist points (such as H) slightly above the more able individual's indifference curve which will be chosen only by the more able, and which increase government revenue (since consumption increases less than income). The pair {H, E} Pareto dominates E. (If the slope of the more able's indifference curve through E is steeper than 1, then there exists a point below E, in the shaded area, which will separate and which will generate more government revenue.)
11 The essential assumption for this result is that the indifference curve between consumption and pre-tax income is unambiguously steeper for the less able.
Figure 2.4. More able individuals have flatter indifference curves.
need to characterize those two points, or equivalently, to characterize the {consumption, income} decisions of the two groups (Figure 2.5). Formally, then, the set of Pareto efficient allocations is described by the solution to

$$\max_{\{C_1, Y_1, C_2, Y_2\}} V^2(C_2, Y_2)$$ (8)

Figure 2.5. Although the government may set a whole tax function, which in turn determines the relationship between after-tax and before-tax income, when there are only two groups, only (at most) two points will be chosen.
s.t. $V^1(C_1, Y_1) \ge \bar{U}^1,$
(9)
$$V^2(C_2, Y_2) \ge V^2(C_1, Y_1),$$ (10)

$$V^1(C_1, Y_1) \ge V^1(C_2, Y_2),$$ (11)

the self-selection constraints, and

$$R \equiv (Y_1 - C_1)N_1 + (Y_2 - C_2)N_2 \ge \bar{R},$$ (12)

the revenue constraint
(where R is government revenue and $\bar{R}$ is the revenue requirement).12 The Lagrangian for this maximization problem may be written as

$$\mathscr{L} = V^2(C_2, Y_2) + \mu V^1(C_1, Y_1) + \lambda_2\big(V^2(C_2, Y_2) - V^2(C_1, Y_1)\big) + \lambda_1\big(V^1(C_1, Y_1) - V^1(C_2, Y_2)\big) + \gamma\big[(Y_1 - C_1)N_1 + (Y_2 - C_2)N_2 - \bar{R}\big].$$ (13)

The first-order conditions for this problem are straightforward:

$$(\mu + \lambda_1)\frac{\partial V^1}{\partial C_1} - \lambda_2\frac{\partial V^2}{\partial C_1} - \gamma N_1 = 0,$$ (14a)

$$(\mu + \lambda_1)\frac{\partial V^1}{\partial Y_1} - \lambda_2\frac{\partial V^2}{\partial Y_1} + \gamma N_1 = 0,$$ (14b)

$$(1 + \lambda_2)\frac{\partial V^2}{\partial C_2} - \lambda_1\frac{\partial V^1}{\partial C_2} - \gamma N_2 = 0,$$ (14c)

$$(1 + \lambda_2)\frac{\partial V^2}{\partial Y_2} - \lambda_1\frac{\partial V^1}{\partial Y_2} + \gamma N_2 = 0.$$ (14d)
12 Notice that this problem is just the dual to the standard problem of a monopolist attempting to differentiate among his customers [Stiglitz (1977) and (forthcoming)]. There, the problem was to maximize profits (corresponding to R here), subject to utility constraints on each of the two types of individuals (that he be willing to purchase the good, referred to as the reservation utility or individual rationality constraint) and subject to the self-selection constraints. The Lagrangian which we form to analyze the two problems is identical. Here the constraint on the individual's utility arises because we wish to ensure him a certain level of utility; in the standard monopoly problem, the constraint arises because the transaction is voluntary, and if the monopolist does not offer him the level of utility that he could have obtained without trading with the monopolist, he will choose not to deal with him.
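The separation argument of footnote 10 is easy to verify numerically. The sketch below uses an assumed utility function $U(C, L) = \ln C - L^2/2$ with $w_1 = 1$, $w_2 = 2$ and a hypothetical pooling bundle E; all numbers are illustrative, not from the text:

```python
import math

# Numerical version of the separation argument in footnote 10, with an
# assumed utility function U(C, L) = ln C - L^2/2 and wages w1 = 1, w2 = 2.
def V(C, Y, w):                  # utility over the observables (C, Y)
    return math.log(C) - 0.5 * (Y / w) ** 2

E = (1.0, 1.0)                   # a candidate pooling bundle (C, Y)

# Build H slightly above the more able's indifference curve through E, at Y = 2.
Y_H = 2.0
C_on_curve = math.exp(V(*E, 2.0) + 0.5 * (Y_H / 2.0) ** 2)
H = (C_on_curve + 0.005, Y_H)

more_able_separates = V(*H, 2.0) > V(*E, 2.0)   # type 2 strictly prefers H
less_able_stays     = V(*E, 1.0) > V(*H, 1.0)   # type 1 strictly prefers E
revenue_gain        = (H[1] - H[0]) - (E[1] - E[0])
print(more_able_separates, less_able_stays, round(revenue_gain, 3))
```

The pair {E, H} leaves the less able exactly as well off, makes the more able strictly better off, and raises government revenue, so the pooling allocation E is Pareto dominated.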
Figure 2.6. (a) First-best taxation fully revealing. (b) First-best taxation not fully revealing: Pareto optimal taxation entails a positive marginal tax rate on low ability and a zero marginal tax rate on high ability. (c) First-best taxation not fully revealing: Pareto optimal taxation involves a zero marginal tax rate on low ability and a negative marginal tax rate on high ability.
It is easy to see that, under our assumptions concerning the relative slopes of the indifference curves, there are three possible regimes:

(i) $\lambda_1 = 0$, $\lambda_2 = 0$ [Figure 2.6(a)];
(ii) $\lambda_1 = 0$, $\lambda_2 > 0$ [Figure 2.6(b)]; and
(iii) $\lambda_2 = 0$, $\lambda_1 > 0$ [Figure 2.6(c)].

That is, at most one of the two self-selection constraints is binding. Moreover, it is also easy to show that $\mu > 0$, i.e. the constraint on the utility level of the low ability individuals is binding. The case where $\lambda_1 = \lambda_2 = 0$ is illustrated in Figure 2.6(a). With first-best taxation, the equilibrium is fully revealing. This is always the case at the competitive equilibrium. Hence, if the social welfare function chooses the competitive allocation (point C in Figure 2.1) or a point near it, neither self-selection constraint will be binding.13

13 The same result obviously holds if a uniform lump-sum tax is imposed. (See Figure 2.7.)
The "normal" case, on which most of the literature has focused, is that where $\lambda_1 = 0$ and $\lambda_2 > 0$. This arises whenever the government wishes to improve the welfare of the poor significantly relative to what they would have obtained at the competitive allocation (with a uniform lump-sum tax to raise the requisite revenue). With a utilitarian objective function ($\mu = 1$) (or indeed any concave social welfare function) and separable utility functions it can be shown that this is the only possibility.14 But, more generally, the possibility that $\lambda_1 > 0$ and $\lambda_2 = 0$ cannot be ruled out. The case with $\{\lambda_1 > 0$ and $\lambda_2 = 0\}$ has the property that if lump-sum taxation were feasible, the lump-sum tax imposed on the low ability individual would exceed that on the high ability [Figure 2.6(c)].

We can again characterize the utilities possibilities schedule, this time under the constraint that lump-sum taxes are not feasible. The new utilities possibilities schedule coincides with the old over a range, but elsewhere is interior. The range over which the two coincide corresponds to the situations where $\lambda_1 = \lambda_2 = 0$. (See Figure 2.1.)

2.3. Implementing the optimal allocations by means of a tax structure

In the preceding subsection we described the Pareto efficient resource allocations. We posed the problem as if the government confronted the individual with a choice of two "bundles". These allocations can easily be implemented by means of a tax schedule, as illustrated in Figure 2.8. The tax schedule must pass through the points $\{C_i, Y_i\}$, $i = 1, 2$, and elsewhere must lie below the indifference curves through $\{C_i, Y_i\}$. Given such a tax schedule, individual i will clearly choose the point $\{C_i, Y_i\}$. If the tax schedule were differentiable, $T = T(Y)$ (so that $C = Y - T(Y)$), the individual's first-order condition would be

$$-\frac{\partial U^i/\partial L_i}{\partial U^i/\partial C_i} = (1 - T')w_i,$$ (15)

or

$$-\frac{\partial V^i/\partial Y_i}{\partial V^i/\partial C_i} = 1 - T'.$$
14 The issue of which of the self-selection constraints is binding has played a major role in the debate over the relevance of the theory of implicit contracts with asymmetric information; with separable utility functions and risk-neutral firms, one obtains over-employment (all firms wish to pretend to be in the good state). [See Azariadis and Stiglitz (1983) or Stiglitz (1986b); the analysis is exactly parallel to that presented here.]
Figure 2.7. At the competitive allocation, the self-selection constraints are never binding (with no taxation, C = Y).
The left-hand side is the individual's marginal rate of substitution; the right-hand side is the after-tax marginal return to working an extra hour. It is clear from Figure 2.8 that the implementing tax schedules will not, in general, be differentiable. Still, we shall refer to

$$1 + \frac{1}{w_i}\,\frac{\partial U^i/\partial L_i}{\partial U^i/\partial C_i}$$ (16)

as the marginal tax rate.15

2.4. The optimal tax structure with $\lambda_2 > 0$, $\lambda_1 = 0$

Dividing (14d) by (14c) we immediately see that

$$-\frac{\partial V^2/\partial Y_2}{\partial V^2/\partial C_2} = -\frac{\partial U^2/\partial L_2}{\partial U^2/\partial C_2}\,\frac{1}{w_2} = 1,$$ (17a)

15 There exist optimal tax structures for which $1 + \frac{1}{w_i}\,\frac{\partial U^i/\partial L_i}{\partial U^i/\partial C_i}$ is the left-hand derivative of the tax function at $Y_i = w_i L_i$.
Figure 2.8. Implementing the allocation $[\{C_1, Y_1\}, \{C_2, Y_2\}]$ by means of a tax system.
i.e. the marginal tax rate faced by the more able individual is zero (although the average tax rate is positive). [This corresponds to the result noted earlier by Sadka (1976) and Phelps (1973).] The argument for this can easily be seen diagrammatically. Assume that the government offered a pair of separating contracts, $\{E_1, E_2\}$, as in Figure 2.9, but in which the slope of the more able's indifference curve (in terms of $\{C, Y\}$) at $E_2$ differed from unity. Then, keeping $E_1$ fixed, and moving along the more able's indifference curve, one obtains alternative pairs of separating contracts; if the slope at $E_2$ was less than one, by increasing $Y_2$ one increases individual 2's consumption by less than income, and hence government revenue is increased; the original allocation could not have been Pareto efficient. Similarly, if the slope at $E_2$ is greater than unity, one can increase government revenue by reducing $Y_2$.
Figure 2.9. Pareto efficiency requires a zero marginal tax rate on the more able: moving from $E_2$ along the more able's indifference curve increases revenue, but leaves the utility of all individuals unchanged.
Dividing (14b) by (14a) (with $\lambda_1 = 0$):

$$-\frac{\partial V^1/\partial Y_1}{\partial V^1/\partial C_1} = \frac{\gamma N_1 - \lambda_2\,\partial V^2/\partial Y_1}{\gamma N_1 + \lambda_2\,\partial V^2/\partial C_1}.$$ (17b)

Since (17b) expresses $a_1 \equiv -(\partial V^1/\partial Y_1)/(\partial V^1/\partial C_1)$ as a weighted average of unity and $a_2 \equiv -(\partial V^2/\partial Y_1)/(\partial V^2/\partial C_1)$, and since single crossing implies $a_2 < a_1$, it therefore follows that

$$a_2 < a_1 < 1.$$
We immediately see that the marginal tax rate faced by the less able individual will be positive.16
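Both marginal-rate results can be checked by brute force on the programme (8)-(12). The sketch below uses assumed functional forms and parameters (none of which come from the text): $U(C, L) = \ln C - L^2/2$, $w_1 = 1$, $w_2 = 2$, $N_1 = N_2 = 1$, $\bar R = 0$, and a utility floor $\bar U^1 = -0.2$ for the less able, chosen so that we are in the "normal" case:

```python
import math

# Grid search over the two-type Pareto efficient taxation problem (8)-(12),
# under assumed forms: U(C, L) = ln C - L^2/2, w1 = 1, w2 = 2, N1 = N2 = 1,
# revenue requirement 0, utility floor Ubar for the less able.
W1, W2, UBAR = 1.0, 2.0, -0.2

def V(C, Y, w):                        # V^i(C, Y) = U(C, Y/w_i)
    return math.log(C) - 0.5 * (Y / w) ** 2

def mrs(C, Y, w):                      # -V_Y/V_C = (Y/w**2) * C
    return Y * C / (w * w)

best = None
for i in range(451):                   # grid over Y1 in [0.30, 1.20]
    y1 = 0.30 + 0.002 * i
    c1 = math.exp(UBAR + 0.5 * y1 ** 2)      # V^1(C1, Y1) = Ubar pins down C1
    for j in range(751):               # grid over Y2 in [1.50, 3.00]
        y2 = 1.50 + 0.002 * j
        c2 = y2 + y1 - c1              # binding revenue constraint
        if c2 <= 0:
            continue
        v2 = V(c2, y2, W2)
        if v2 < V(c1, y1, W2):         # type 2 must not envy bundle 1
            continue
        if UBAR < V(c2, y2, W1):       # type 1 must not envy bundle 2
            continue
        if best is None or v2 > best[0]:
            best = (v2, c1, y1, c2, y2)

_, C1, Y1, C2, Y2 = best
print("MRS of more able :", round(mrs(C2, Y2, W2), 2))   # ~1.0: zero marginal tax
print("MRS of less able :", round(mrs(C1, Y1, W1), 2))   # < 1: positive marginal tax
```

At the computed optimum the more able's indifference curve has unit slope (a zero marginal tax rate, as in (17a)) while the less able faces a slope below unity (a positive marginal tax rate), matching the results just derived.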
2.5. The optimal tax structure with $\lambda_1 > 0$, $\lambda_2 = 0$

Exactly the same kinds of arguments as used in the previous section can be employed to establish that if $\lambda_1 > 0$ and $\lambda_2 = 0$, the marginal tax rate faced by the more able individual is negative; self-selection requires that they work more than they would in a non-distortionary situation. For the rest of this chapter, we focus our attention on the "normal" case with $\lambda_1 = 0$ and $\lambda_2 > 0$. Our analysis has thus revealed a second property of Pareto efficient tax structures: at least one of the groups must face no distortionary taxation. The result holds regardless of the number of groups in the population.

16 This corresponds to the result noted earlier by Mirrlees for the case of a continuum of types.
2.6. Equity-efficiency trade-offs
This analysis makes clear the nature of the equity-efficiency trade-offs that are inherent in redistributive tax policies. There is a "distortion" associated with redistributing any significant amount of resources from the more able to the less able. The distortion arises from the self-selection constraint. The magnitude of the distortion depends in part on the amount of revenue that the government is attempting to raise for public goods. Note that one feasible policy, a uniform lump-sum tax on all individuals, is among the set of Pareto efficient tax structures. It is, however, one which does not seem acceptable to most modern governments. [It is sometimes assumed that lump-sum taxes are not feasible. Uniform lump-sum taxes are clearly feasible (provided they are not too large). What is not feasible are lump-sum taxes at different rates for individuals of different abilities; this is not feasible because the government does not have the requisite information to implement such a tax.]

3. Continuum of individuals

Though most of the central insights in the theory of Pareto efficient (and optimal)
taxation can be gleaned from the simple model presented above, the earliest analyses of optimal income tax structures employed a model in which there was a continuum of individuals [Mirrlees (1971)]. A rigorous analysis of that problem entails a large number of mathematical niceties, which are discussed by Mirrlees (1976, 1985). Here, we ignore these and follow the approach of Atkinson and Stiglitz (1976) using Pontryagin's technique. It is assumed that the distribution of individuals by ability is given by F(w). The problem of the government is to maximize
$$\int U(C(w), Y(w)/w)\,dF(w)$$
(18)
subject to the national income constraint,
$$\int [C(w) - Y(w)]\,dF(w) = 0,$$
(19)
and subject to the self-selection constraints, which here take on the simple form that utility must increase with w, which we write (making use of the first-order condition for utility maximization)

$$\frac{dU}{dw} = -\frac{U_L Y}{w^2}.$$
(20)
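Condition (20) is an envelope condition: along the individual's optimal choice of Y, only the direct effect of w on utility survives. Spelling out the step:

```latex
U(w) \;=\; \max_{Y}\; U\!\big(C(Y),\, Y/w\big)
\quad\Longrightarrow\quad
\frac{dU}{dw} \;=\; \left.\frac{\partial}{\partial w}\,U\!\big(C,\, Y/w\big)\right|_{C,\,Y}
\;=\; -\,U_L\,\frac{Y}{w^{2}}.
```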
Here, we follow Atkinson and Stiglitz (1980) in analyzing a slightly more general form of the problem, employing a social welfare function

$$\int G(U)\,dF(w).$$ (21)

Utilitarianism involves the special case $G(U) = U$. The Rawlsian social welfare function is $G'(U) = 0$ for $U > U_{\min}$; $G(U) = U$ for $U = U_{\min}$. Egalitarian social welfare functions entail $G'' < 0$: the social marginal utility of income is nonincreasing in utility. Form the Hamiltonian,

$$H = [G(U) - \gamma(C - Y)]\,\frac{dF}{dw} - \lambda\,\frac{dU}{dw},$$ (22)
and let U be our "state variable" (like capital in an intertemporal problem), and L(w) be the control variable (like investment in the typical optimal accumulation problem): knowing L and U determines $Y(w) = wL(w)$ and C(w) [from the equation $U(w) = U(C(w), L(w))$]. Now take the derivative of H with respect to L:

$$\partial H/\partial L = \gamma\{w - (dC/dL)_U\}f + \lambda e^* U_L/w = 0,$$ (23)
where

$$e^* = \frac{\partial \ln(dU/dw)}{\partial \ln L} = \frac{\partial \ln(-LU_L/w)}{\partial \ln L},$$ (24)

where $f = dF/dw$, the density function, and where $(dC/dL)_U = -U_L/U_C$. In the case of a separable utility function, we thus obtain:

$$e^* = 1 + \frac{LU_{LL}}{U_L}.$$ (24a)
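In the separable case ($U_{LC} = 0$), (24a) follows from (24) by direct differentiation:

```latex
e^{*} \;=\; \frac{\partial \ln\!\big(-L U_L / w\big)}{\partial \ln L}
\;=\; L\,\frac{\partial}{\partial L}\Big[\ln L + \ln(-U_L) - \ln w\Big]
\;=\; L\left(\frac{1}{L} + \frac{U_{LL}}{U_L}\right)
\;=\; 1 + \frac{L\,U_{LL}}{U_L}.
```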
Again making use of the individual's first-order condition, we can rewrite (23) as

$$\frac{T'}{1 - T'} = \frac{U_C\,\lambda(w)\,e^*}{\gamma f w},$$ (25)
where T' is the marginal tax rate. Thus (25) implies that the marginal tax rate depends on four factors: (1) The marginal tax rate is lower, the larger the fraction of the population f at the particular income level that pays that marginal tax rate; one does not want to impose large distortions where there are many people. (2) The marginal tax rate is lower, the smaller is $\lambda(w)$, the shadow price on the self-selection constraint. Analyzing $\lambda(w)$ is a rather complicated matter. For the
Rawlsian social welfare function, it is monotonically decreasing. For the utilitarian social welfare function, it increases and then decreases. The one general result that holds (for ability distributions with a finite range) is17 that at the extreme, the marginal tax rate on the highest income (ability) individual should be zero. This result corresponds to the result noted earlier in the two-group model. It is often cited as one of the few general, qualitative propositions to arise from the theory of optimal taxation, but in fact, as we shall see, it is not a very general result. The intuition behind the result is simple: one of the advantages of increasing the marginal tax rate at a particular income level is that the average tax rate of those at higher income levels is increased (keeping their marginal tax rates unchanged). The distortion (deadweight loss) arising from any tax is associated with the marginal tax rate; thus increasing the average tax rate on an individual while keeping the marginal tax rate the same increases revenues without increasing the deadweight loss. (3) The marginal tax rate is lower (other things being equal) the higher the productivity of the group; that is, the loss in output from distortionary taxation is more important for groups whose output per unit of labor is higher. (4) The marginal tax rate is lower the higher is the labor supply response. To see this, we consider the special case of a separable utility function (so $U_{CL} = 0$). In that case, it is easy to show that the compensated elasticity of labor (in the absence of taxation) is just e*.

4. Some mathematical problems

The maximization problems with which we have been concerned here are not well behaved. Even though the individual utility functions are, the self-selection constraints are not (in general). This has several consequences.
First, of course, there may be allocations which satisfy the first-order conditions but are not the global optimum (and indeed may be a local minimum). Secondly, if the government can implement random tax policies, it may wish to do so (see below). But even if the government cannot, or does not, wish to

17 Note that the derivative of the Hamiltonian with respect to the state variable is equal to minus the derivative of the associated shadow price with respect to the running variable:

$$-\frac{\partial\lambda}{\partial w} = \frac{\partial H}{\partial U} = \big(G' - \gamma(\partial C/\partial U)_L\big)f + \lambda(L/w)U_{LC}(\partial C/\partial U)_L.$$

For simplicity, we assume a separable utility function. This enables us to integrate to obtain

$$\frac{\lambda(w)}{\gamma} = \int_w^{w_{\max}} \left\{\frac{1}{U_C} - \frac{G'}{\gamma}\right\} dF,$$

using the fact that at $w = w_{\max}$, $\lambda(w) = 0$.
For a general discussion of the conditions under which this limiting result obtains, see Sadka (1976) and Seade (1977).
Figure 4.1. There must be a convex region to the function relating consumption to income. This induces at least a small randomization.
implement random tax policies, it must be concerned lest individuals randomize, unless, of course, the government can prohibit such gambling (which seems unlikely, since that would require that it differentiate between "naturally" occurring risks and artificially created risks).18 Recall our earlier result that the marginal tax rate on the highest income individual is zero. This, combined with the fact that his average tax rate is positive, and, if the tax system is at all redistributive, the fact that the average tax rate on poor individuals must be negative, implies that there must be a region in which the function, giving after-tax income as a function of before-tax income, must be convex (see Figure 4.1). If individuals are not too risk averse, they will, accordingly, attempt to randomize; indeed, an employer who offered any individual whose before-tax income was in the convex region a random pay could increase his profits. Unfortunately, there does not exist a correct analysis of Pareto efficient (or optimal) tax structure taking into account this class of responses to the tax structure. Note that this problem does not arise in the analysis of linear tax structures (Section 6 below).
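The gambling incentive created by a convex segment is just Jensen's inequality. A tiny sketch with a hypothetical convex consumption schedule (the quadratic form and all numbers are assumed for illustration only):

```python
# Hypothetical convex segment of the net-of-tax schedule: C(Y) = 0.4 + 0.6*Y^2.
def C(Y):
    return 0.4 + 0.6 * Y * Y

Y0 = 0.5                      # sure before-tax income inside the convex region
lottery = (0.3, 0.7)          # mean-preserving spread around Y0
expected_C = 0.5 * (C(lottery[0]) + C(lottery[1]))
print(C(Y0), expected_C)      # the lottery yields higher expected consumption
```

For a worker whose effort is unchanged, an employer substituting this income lottery for the sure income raises the worker's expected consumption at no extra cost, which is the profit opportunity noted above.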
4.1. Non-differentiability

In the analysis of a continuum of types we have implicitly assumed that the optimal tax function will be differentiable. It may well not be. There is an

18 Even if it could prohibit artificially created gambles, it would simply encourage individuals to undertake naturally occurring gambles, even if they were not quite actuarially fair.
Figure 4.2. Points of non-differentiability in the tax schedule induce bunching (partial pooling).
interesting interpretation to these points of non-differentiability. Note in Figure 4.2 that if the consumption function is not differentiable, there will be a range of types of individuals who will not be distinguishable, i.e. who will choose precisely the same point (have the same income). Thus, it may be optimal to have a partially pooling equilibrium, and this corresponds precisely to the circumstances under which the tax function is not differentiable. To see why, with three or more groups, partial pooling may be desirable, turn to Figure 4.3, where there are just three groups. Assume we fix the utility of the high ability (type 2) and the low ability (type 0). As we raise $U^1$, the revenue we obtain from that group decreases (since we are moving along $U^2$'s indifference curve, and below $E_2$ its slope is less than unity). But the revenue we raise from group 0
Figure 4.3. Desirability of partial pooling.
may be increased, if the slope of $U^0$ is less than unity at $E_0$. Clearly, if there are relatively few type 1's, Pareto efficiency requires a pooling equilibrium in which types 0 and 1 cannot be differentiated. We can use Pontryagin's technique to analyze Pareto efficient tax structures with non-differentiabilities, and derive conditions under which partial pooling is desirable.19
4.2. Discontinuities

Even if we preclude randomization (gambling), since the tax function is at least partially convex, and the indifference curves are convex, it is possible that there be multiple tangencies between the indifference curve and the function relating after-tax to before-tax income. Under these circumstances, small changes in the tax (consumption) function could cause seemingly large changes in revenues. However, if we assume a continuum of individuals, and if we give the government the right to assign a given proportion of any group which is indifferent between two allocations to each of the allocations, then it is clear that this presents no serious problem to the analysis.20

5. Generalizations: Random taxation21

This chapter is concerned with the analysis of Pareto efficient tax structures. Whether a given tax structure can or cannot be improved upon depends crucially on the set of admissible tax structures. Indeed, this is perhaps one of the central lessons to emerge from the literature during the past two decades. [See also Dasgupta and Stiglitz (1971).] Unfortunately, there is no agreement about how to define the set of admissible tax structures. Presumably, this should be related to the technology of tax collection; to the costs to the government of monitoring, or to the private sector of compliance (or of non-compliance); and to the accuracy with which the government can monitor the relevant variables. And these costs are likely to differ across countries, and to change over time. Most of the analyses, however, simply begin by postulating the admissible tax structure. There are those who contend that the tax structures analyzed so far are overly restrictive, and those who contend that they are not restrictive enough. We take up each of these allegations in the next two sections.

19 See Stiglitz (1977) for an analysis in the context of the monopoly insurance problem.
20 See Mirrlees (1971, 1976, 1985) for a detailed discussion of the mathematical problems posed by the optimal tax problem.
21 This section draws heavily upon Arnott and Stiglitz (1986) and Stiglitz (1982b).
Figure 5.1. Welfare is a non-concave function of R. For revenues between $R_1$ and $R_2$, ex ante randomization is desirable.
Those who contend that we have been overly restrictive in our analysis point out that we should at least consider the possibility of random taxes.22 It turns out, in fact, that under a variety of situations, random taxation is desirable. This can take on two forms. In ex ante randomization the government randomly assigns individuals to one of two tax schedules; if the tax is purely redistributive, the net revenue raised by one, S, is equal to the net loss of the other. More generally, the desirability of randomization can be seen most simply in the case where the government maximizes a utilitarian social welfare function. We can then plot average social welfare as a function of the revenue raised; if there is a convex region, then for some values of R (in the diagram, between $R_1$ and $R_2$) ex ante randomization is desirable. Arnott and Stiglitz derive conditions under which this can occur (see Figure 5.1). In ex post randomization, the individual is not told what his tax schedule is until after he has announced his ability type. This kind of randomization may be a way of loosening the self-selection constraints. If, for instance, the more able individuals were much more risk averse than the less able, by randomizing the taxes paid by the less able the government makes it less attractive for the more able to attempt to pretend to be less able. This in turn implies that the distortionary tax imposed on the poor may be reduced. This can be seen most simply by considering Figure 5.2, where we choose two points on the poor individual's indifference curve yielding the same average government revenue as the original point E. This randomization has no effect on the poor or on government revenue. But as risk aversion increases, the utility of the rich
22 Other more general tax structures include those which make taxes a function of wages. If wages are a function of effort, one could not obtain a first-best outcome, even were wages observable. See also Part II below.
Figure 5.2. With sufficiently risk-averse individuals, ex post randomization weakens the self-selection constraint.
approaches the utility associated with the worst of the outcomes (from his perspective); hence we can effect a Pareto improvement.23 What are we to make of these results? Though they might indeed provide a rationale for the seeming capriciousness with which taxes are sometimes imposed, we doubt whether such a policy would ever be openly argued - and voted for - in a democratic society. Our results show that the principle of horizontal equity - that otherwise identical individuals should be treated the same - may not only not be derivable from a utilitarian social welfare function, but that that principle may in fact be inconsistent with the principle of Pareto efficiency (at least on an ex ante expected utility basis). I suspect that it is because of the possibilities of abuse - the belief that a random tax would not in fact be random, that the die would be loaded in favor of those in political power - that a proposal for tax randomization would meet such opposition.24 Still, the analysis of random taxation is of interest for several reasons. First, it serves to remind us of the difficulties of defining the set of admissible taxes, and that our models may be - indeed probably are - leaving out some important considerations in determining tax structures. Secondly, some of the properties of optimal tax structures may reflect an attempt to introduce randomization surreptitiously. Assume, for instance, that there were two types of labor, denoted M and W, which in all production and consumption characteristics were identical. In those circumstances in which

23 I am indebted to Steve Slutsky for this argument.
24 One important exception to the general presumption against randomization is in law enforcement. It is often argued that efficient law enforcement entails large fines imposed rarely. Another exception is provided by the draft lottery.
ex ante randomization is desirable, the optimal tax will "tell" us to impose different tax structures on the two groups. Most of us would say that that should not be allowed. But what if the two groups are slightly different? How different should they be to make differential taxation admissible? The theory gives us no guidance on this issue. This issue arises even more forcefully in Part II, where we consider commodity taxation. Consider two groups of individuals who are identical except that one prefers chocolate ice cream, the other vanilla ice cream. In those cases where tax randomization is desirable, we will be told to tax vanilla and chocolate at different rates. But is this any more acceptable than randomizing taxes?
6. Limitations: Restricted taxation

So far we have assumed that there are no restrictions on the type of consumption (wage) tax which the government can impose. The tax schedules may be highly non-linear, with marginal tax rates increasing over some intervals and decreasing over others. The tax schedules which typically emerge from these analyses are far more complex than those currently employed in the United States. Curiously enough, a major focus of tax reform discussions in the United States has been simplifying the tax structure, by having a single marginal rate (the flat tax, or linear income tax). Non-linear tax structures present problems with income averaging. The unit of taxation becomes important - under the current U.S. tax system there are strong incentives for income shifting. Moreover, decreasing marginal rates25 provide incentives for gambling (for firms to pay random wages).26 Non-linear tax schedules open up large opportunities for tax arbitrage, with associated inequities and inefficiencies. One of the major advantages of a flat rate tax is that it enables income to be taxed at source, thus reducing considerably the administrative and compliance costs of the income tax.27 We noted earlier that the underlying problem in the analysis of income taxation is an information problem: if the government could perfectly tell who is of what ability, it could impose lump-sum redistributive taxes. But the government can only make inferences about who is of what ability by the income they receive. The problems we have just been discussing can also be viewed as information problems. The government cannot perfectly and costlessly observe the income or consumption of each individual; if it could, there would be no compliance or administrative costs associated with running a non-linear tax system.
²⁵ Recall the standard result that the marginal rate for the highest income individual should be zero; hence, the optimal tax schedule (as usually defined) entails decreasing marginal rates over at least some interval.
²⁶ See the discussion above in Section 4.
²⁷ For an excellent discussion of the advantages of a flat tax, see Hall and Rabushka (1983).
Ch. 15: New New Welfare Economics
Opposing these objections to a highly non-linear tax are the possible increases in social welfare from the greater redistribution that can be achieved through a non-linear tax. If there were no administrative costs, etc., it is easy to show that a non-linear tax Pareto dominates a linear tax. But there is a growing consensus in the United States that the problems associated with very non-linear tax schedules outweigh these gains in redistribution.²⁸ Whether there would be significant gains in going from a flat rate tax to a two- or three-tier schedule remains a question of on-going research.
A number of economists have analyzed the optimal linear tax.²⁹ This problem is a fairly simple one: a tax schedule is described by an equation of the form

$$C = I + (1 - t)Y,$$  (26)

where I is the lump-sum payment to each individual, and t is the marginal tax rate. The government looks for that feasible tax structure which maximizes social welfare, that is, it solves

$$\max \int G\bigl(U(C(w), L(w))\bigr)\,dF(w) = \int G\bigl(v(w(1-t), I)\bigr)\,dF(w)$$  (27)

subject to

$$R + \int \{C(w) - L(w)w\}\,dF(w) = R + I - \int tL(w)w\,dF(w) \le 0,$$  (28)

where R is the revenue which the government is required to raise for financing public goods (it may be zero) and where we have expressed individual utilities in terms of the indirect utility function, giving utility as a function of the after-tax wage and the lump-sum transfer. The condition simply says that the aggregate value of expenditures on public goods plus private consumption cannot exceed the aggregate value of output. Note that the objective function is the same as that employed in our earlier analysis; the only difference is that whereas before, any tax function was admissible, now we are restricted to linear tax functions.
Three general results can be obtained:
(a) If R < 0, then the optimal linear tax entails t > 0 (and hence I > 0): the marginal deadweight loss from a very small tax is negligible, but the gain in redistribution is positive.
(b) If R is large enough, then I < 0; indeed, since in general an increase in a proportional tax beyond a certain point will actually decrease revenues, if R exceeds the maximal revenue that can be raised through a proportional tax, the government will have to supplement it with a uniform lump-sum levy. More generally, the greater is R, the smaller is I.
²⁸ These views are reinforced by Mirrlees' (1971) calculations suggesting that the optimal tax structure is close to linear.
²⁹ See, in particular, Dixit and Sandmo (1977), Sheshinski (1972), and Stiglitz (1976b). The discussion here follows closely along the lines of the last of these papers.
J.E. Stiglitz
(c) The optimal tax formula can be written in a remarkably simple form:

$$\frac{t}{1 - t} = -\operatorname{cov}[b, Y]\Big/ \int Y \tilde\varepsilon_{LL}\,dF,$$  (29)

where
b = G′α/γ + tw(∂L/∂I), the net social marginal valuation of income,³⁰
α = ∂U/∂C, the marginal utility of income,
γ = the marginal value of a dollar to the government (the Lagrange multiplier associated with the revenue constraint),
cov(b, Y) = E(b − b̄)(Y − Ȳ), the covariance between b (defined above) and Y, and
ε̃_LL = the compensated elasticity of labor supply.³¹
³⁰ The first term is the direct effect of giving an extra dollar to an individual (normalized by the value of a dollar to the government), while the second term is the change in revenue to the government resulting from the fact that the individual will alter his labor supply as a result.
³¹ To obtain this result, we form the Lagrangian:

$$\mathscr{L} = \int \bigl[G + \gamma(twL - I - R)\bigr]\,dF,$$  (30)

and differentiate with respect to I and t to obtain:

$$\int \bigl[G'\alpha + \gamma\bigl(tw(\partial L/\partial I) - 1\bigr)\bigr]\,dF = 0$$  (31)

and

$$\int \bigl[G'(\partial U/\partial t) + \gamma\bigl(wL + tw(\partial L/\partial t)\bigr)\bigr]\,dF = 0.$$  (32)

The first condition can be written simply as Eb = 1. Equation (29) follows from (32), making use of the Slutsky relation, ∂L/∂t = −w(∂L/∂ŵ)_U − wL(∂L/∂I) (where ŵ = w(1 − t) denotes the after-tax wage), the fact that ∂U/∂t = −αwL, and the definition of ε̃_LL = [w(1 − t)/L](∂L/∂ŵ)_U, to obtain:

$$\int wL\bigl[G'\alpha - \gamma + \gamma tw(\partial L/\partial \hat w)_U/L + \gamma tw(\partial L/\partial I)\bigr]\,dF = 0,$$

or, dividing through by γ and using the fact that Eb = 1,

$$\int Y\Bigl(b - Eb + \frac{t\tilde\varepsilon_{LL}}{1 - t}\Bigr)dF = 0.$$
The covariance may be seen as a marginal measure of inequality. Consider, in particular, the case of a small tax, so that b ≈ G′α/γ. Suppose we gave a lump-sum payment to everyone, financed by a tax at rate t, so that I = tȲ, where Ȳ is mean income. The increment in social welfare is then ∫ tG′α(Ȳ − Y)dF = −tγ cov(b, Y). Thus, our basic formula says that the tax rate should be greater, the greater this measure of marginal inequality, and the smaller the (weighted average) compensated elasticity of labor supply. The marginal measure of inequality will be greater, the more egalitarian the social welfare function, i.e. the larger is −G″U/G′.
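The mechanics of formula (29) can be illustrated with a hypothetical two-type population (all figures invented). The sketch simply evaluates the formula for a given, constant compensated elasticity; it ignores the fact that in equilibrium b and Y themselves depend on t:

```python
# Evaluate the optimal linear tax formula (29),
#   t/(1 - t) = -cov(b, Y) / (eps * mean(Y)),
# for a constant compensated labor-supply elasticity eps.
# All numbers below are invented for illustration.

def optimal_linear_tax(b, Y, eps):
    n = len(Y)
    b_bar = sum(b) / n          # equals 1 at the optimum (Eb = 1)
    Y_bar = sum(Y) / n
    cov_bY = sum((bi - b_bar) * (yi - Y_bar) for bi, yi in zip(b, Y)) / n
    x = -cov_bY / (eps * Y_bar)  # x stands for t/(1 - t)
    return x / (1 + x)

b = [1.4, 0.6]    # net social weights: the poor worker valued more highly
Y = [10.0, 30.0]  # pre-tax incomes
print(round(optimal_linear_tax(b, Y, eps=0.5), 3))  # → 0.286
```

The more unequal the social weights (the larger |cov(b, Y)|), or the smaller the elasticity, the higher the computed rate, as the text's reading of (29) indicates.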
7. Limitations: The push-pin versus poetry controversy

While the New Welfare Economics emphasized the impossibility of interpersonal utility comparisons, much of the recent work in optimal tax theory has simply ignored these concerns. By positing that all individuals have the same indifference curves between leisure and consumption, they could simply assert that all had the same utility function; there seemed a "natural" way of making interpersonal utility comparisons. Three problems were thus ignored. First, there are many alternative ways of representing the same family of indifference maps. Each representation would yield a different optimal income tax. The literature has developed no persuasive way of choosing among these alternative representations. Secondly, when indifference curves cross, as in Figure 7.1, then one must decide which of individual 2's indifference curves corresponds to the indifference
[Figure 7.1. When indifference curves cross, there is no obvious way of ascertaining which of individual 2's indifference curves (U²) corresponds to U₁ = a.]
curve of individual 1 that we have labeled a. This choice too will have a critical effect on the desired income tax rate. Thirdly, conventional models have assumed that individuals who are more productive in the workplace have the same capacity for enjoyment of goods. The relationship between these two capacities troubled the early utilitarians. The fact that there was no resolution of the ensuing controversy does not mean that the issues should be ignored.
In particular, the consequences can be seen most forcefully within the strand of modern economic theory which has emphasized the role of home production. Assume that individuals do not consume the goods they purchase directly, but rather transform purchased goods into true consumption goods. Thus, for simplicity, assume that there is a single ultimate good which the consumer is consuming (e.g. book-reading), with U = U(Q). The output of this good, Q, is a function of the input of time (leisure) and of goods (C); the functional relationship will differ from individual to individual, depending on the abilities of individuals in household production. Such a model has important - and probably not completely acceptable - implications within a utilitarian framework. Assume that Q is Cobb-Douglas, and normalize the available time for market plus home production at unity:

$$Q = C^{\alpha}(1 - L)^{1-\alpha}h_i = (w_i L)^{\alpha}(1 - L)^{1-\alpha}h_i.$$

Then, in a free market economy, all individuals will supply labor L = α. The marginal utility of income of an individual of market productivity wᵢ and home productivity hᵢ is equal to

$$MU_i = \alpha U'(Q)\,C^{\alpha-1}(1 - L_i)^{1-\alpha}h_i = \alpha^{\alpha}(1 - \alpha)^{1-\alpha}\,U'(Q)\,h_i\,w_i^{\alpha-1}.$$

Letting home productivity depend on market productivity,

$$h_i = h(w_i), \qquad \frac{d\ln h}{d\ln w} = x,$$

and differentiating (using d ln Q/d ln w = α + x), we obtain:

$$\frac{d\ln MU_i}{d\ln w} = x + (\alpha - 1) - \rho(\alpha + x) = (\alpha + x)(1 - \rho) - 1 \;\gtrless\; 0 \quad\text{as}\quad \rho \;\lessgtr\; \frac{\alpha + x - 1}{\alpha + x},$$
where ρ is the elasticity of marginal utility. Assume, for instance, that h is proportional to w, so x = 1. Then, so long as ρ < α/(α + 1), there should be redistribution towards individuals with higher wages. Although hᵢ is in principle observable (by detailed observation of the individual within the home), there is, in this example, no way we can infer an individual's home productivity from observing his behavior in the market place (i.e. his labor supply). It would appear, then, that one cannot derive, on the basis of the utilitarian ethic, a general case in favor of redistribution from those who have high wages in the market to those with low wages. Such an argument requires specific assumptions concerning the rate at which marginal utility diminishes, the relationship between home and market productivity, and the characteristics of the home production process. Yet there is more widespread agreement that there ought to be redistribution from the more able to the less able than this utilitarian analysis would suggest. These considerations - as well as the practical considerations entailed in deciding upon the appropriate cardinalization within the utilitarian framework - make the utilitarian approach questionable as a guide to policy.
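The sign condition just derived can be checked numerically. The functional forms below, U(Q) = Q^(1−ρ)/(1−ρ) (so that U′(Q) = Q^(−ρ)) and h(w) = w^x, and all parameter values are invented for illustration; d ln MU/d ln w should carry the sign of (α + x)(1 − ρ) − 1:

```python
import math

# Marginal utility of income in the home-production model:
#   MU(w) = A * U'(Q) * h * w**(alpha - 1),  with  Q = A * h * w**alpha,
# where A = alpha**alpha * (1 - alpha)**(1 - alpha) and h = w**x.

def dlnMU_dlnw(alpha, rho, x, w=2.0, eps=1e-6):
    A = alpha**alpha * (1 - alpha)**(1 - alpha)
    def ln_mu(wage):
        h = wage**x
        Q = A * h * wage**alpha
        return math.log(A * Q**(-rho) * h * wage**(alpha - 1))
    # numerical log-derivative (exact here up to float error, since
    # ln MU is linear in ln w for these functional forms)
    return (ln_mu(w * (1 + eps)) - ln_mu(w)) / math.log(1 + eps)

alpha, x = 0.5, 1.0   # h proportional to w, as in the text's example
print(dlnMU_dlnw(alpha, rho=0.2, x=x) > 0)  # rho < alpha/(alpha+1): True
print(dlnMU_dlnw(alpha, rho=0.5, x=x) < 0)  # rho > alpha/(alpha+1): True
```

With ρ below the threshold α/(α + 1), marginal utility of income rises with the wage, so a utilitarian would redistribute toward the high-wage individuals, as the text notes.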
8. Limitations: General equilibrium effects of taxation

Traditional tax theory has been much concerned with the incidence of taxation, with the extent to which taxes are shifted. There has been concern, for instance, that taxes on doctors and lawyers simply result in higher fees. This central issue of the incidence of taxation was completely ignored in the early analyses of optimal income taxation; it was simply assumed that the before-tax wage of an individual would be unaffected by the imposition of a tax. Several recent papers have called attention to the potentially important consequences of these general equilibrium effects.³²
Consider, for instance, a modified version of the two-group model presented in the first section. There, we assumed that the productivity (output per man hour) of each group was fixed. Now, assume that the two groups provide different kinds of labor services (skilled versus unskilled labor). The relative wage will then depend on the relative supplies of the two types of labor. If the labor supply of the skilled is relatively elastic, and that of the unskilled relatively inelastic, then a uniform marginal tax rate on both might increase the before-tax relative wage of the skilled; to offset this, one might wish to have a lower marginal tax rate on the skilled. Indeed, Stiglitz (1982b) has shown that (in the standard case where it is the high ability self-selection constraint which is binding) Pareto efficient taxation requires that the marginal tax rate on the most able individual should be negative,
³² See Allen (1982), Stern (1982), Feldstein (1973), and Stiglitz (1982b, 1985b).
except in the limiting case where the two types of labor are perfect substitutes, in which case it is zero. The intuition behind this result is simple. Since there is a deadweight loss from redistributing income, it is always desirable to incur some small deadweight loss to change the before-tax distribution of income in a desirable manner. Stiglitz also shows that while the marginal tax rate on the less able is always positive, its magnitude depends on the elasticity of substitution; the smaller the elasticity of substitution, the larger the marginal tax rate. The government increasingly relies on the general equilibrium incidence of the tax, the change in before-tax relative wages, to redistribute income.
Formal proof. These results can be seen more formally as follows. Assume that output is a function of the supply of hours by each of the two types:

$$Q = F(N_1L_1,\, N_2L_2) = N_1L_1\, f\!\left(\frac{N_2L_2}{N_1L_1}\right),$$  (33)

where F exhibits constant returns to scale. If each factor receives its marginal product,

$$w_2 = \frac{\partial F}{\partial (N_2L_2)} = f'(n); \qquad w_1 = \frac{\partial F}{\partial (N_1L_1)} = f(n) - nf'(n) = g(n),$$  (34)

where n = N₂L₂/N₁L₁ = N₂Y₂w₁/N₁Y₁w₂. We can now write the Lagrangian:³³

$$\mathscr{L} = V^2 + \lambda_1 V^1 + \gamma\bigl(F(N_1Y_1/w_1,\, N_2Y_2/w_2) - C_1N_1 - C_2N_2 - R\bigr) \quad \text{(resource constraint)}$$
$$\qquad\quad + \lambda_2\bigl(V^2(C_2, Y_2; w_2) - V^2(C_1, Y_1; w_2)\bigr) \quad \text{(self-selection constraint)}$$
$$\qquad\quad + \mu\bigl(w_1 - g(n)\bigr),$$  (35)
³³ The proof presented here is rather different from that presented in Stiglitz (1982b), which re-expressed the problem in terms of the control variables Lᵢ and Cᵢ. The proof here uses the fact that μ > 0, i.e. welfare would be raised if unskilled workers were paid a wage in excess of their marginal product.
where we have expressed the resource constraint making use of the aggregate production function; we have appended the constraint on the value of w₁; and we focus on the case where the upper self-selection constraint is binding. Then straightforward differentiation yields:

$$(1 + \lambda_2)\frac{\partial V^2}{\partial C_2} - \gamma N_2 = 0,$$  (36a)

$$(1 + \lambda_2)\frac{\partial V^2}{\partial Y_2} + \gamma N_2 - \mu g'(n)\,n/Y_2 = 0.$$  (36b)

Dividing (36b) by (36a),

$$-\frac{\partial V^2/\partial Y_2}{\partial V^2/\partial C_2} = 1 - \frac{\mu g'(n)\,n}{\gamma N_2 Y_2};$$  (37)

that is, unless f″ = 0 (unless the elasticity of substitution is infinite), the more productive individual should work beyond the point where the slope of his indifference curve (in (C, Y) space) is unity; that is, he should face a negative marginal tax rate.
9. Other limitations on the general model

Two other limitations on the general model should be noted. First, we have assumed that income is a non-stochastic function of effort. Views about the desirability of progressive taxation are often related to views concerning the extent to which differences in income are due to differences in effort, to differences in ability, or to differences in luck.³⁴ Secondly, we have assumed that income is monitored perfectly. In fact, measured income can be viewed as a noisy signal of true income. Using the techniques developed by Radner and Stiglitz,³⁵ it should be possible to show that if it is a noisy enough signal, then it will not be desirable to differentiate among individuals on that basis (using, say, a utilitarian social welfare function). The gains in redistribution will be more than offset by the costs.³⁶ [See also Stern (1982).]

³⁴ For a brief discussion of optimal taxation with stochastic income, see Stiglitz (1982b), Varian (1980), Eaton and Rosen (1980), and Diamond, Helms and Mirrlees (1980).
³⁵ See Radner and Stiglitz (1984).
³⁶ Another line of research is concerned with ascertaining conditions for the optimal allocation of resources to monitoring.
Table 10.1
Utilitarian and maxi-min solutions of optimal income tax.ᵃ

                           Utilitarian              Maxi-min
                        Average  Marginal       Average  Marginal
Level of income           tax rates               tax rates
Case Iᵇ
  Median                    6       21             10       52
  Top decile               14       20             28       34
  Top percentile           16       17             28       26
Case IIᵇ
  Median                   13       19             -6       50
  Top decile                4       19             22       33
  Top percentile            7       17             25       25

ᵃ Source: Atkinson (1972, Table 3).
ᵇ In Case I the revenue requirement is positive; in Case II it is negative (the government disposing of the profits of public sector production).
10. Numerical calculations

We have seen that only limited qualitative results concerning the nature of the optimal tax schedule have been derived. Several economists have attempted to calculate the optimal tax schedule, making particular assumptions concerning the distribution of abilities, the social welfare function, and individuals' utility functions. Not surprisingly, the results seem to vary greatly with the particular assumptions employed. The elasticity of labor supply is obviously critical in determining the extent of our concern with incentive effects, and unfortunately there is no agreement among economists about its magnitude. The more egalitarian the social welfare function, the larger the role to be played by redistributive taxation.
As an example, Mirrlees calculated the optimal income tax for a utilitarian social welfare function, assuming the special utility function u = log C + log(1 − L) and a lognormal distribution for w. The schedule he obtained was remarkably close to linear. By contrast, Atkinson analyzed the optimal tax schedule for the same problem, assuming a Rawlsian social welfare function. He obtained increasing average but decreasing marginal tax rates.³⁷
³⁷ See Atkinson (1972). For other numerical calculations, see Stern (1976).
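The role of the utility specification is easy to see in Mirrlees' case. For u = log C + log(1 − L) facing the linear budget C = I + (1 − t)wL, the first-order condition gives the closed form L = 1/2 − I/(2(1 − t)w) (the numerical values below are arbitrary). With I = 0, income and substitution effects exactly cancel, so hours do not respond to the marginal rate at all; it is the lump-sum grant that reduces labor supply:

```python
# Labor supply for u = log C + log(1 - L) under C = I + (1 - t) * w * L.
# From the first-order condition: L = 1/2 - I / (2 * (1 - t) * w).

def labor_supply(w, t, I):
    return max(0.0, 0.5 - I / (2.0 * (1.0 - t) * w))

print(labor_supply(w=10.0, t=0.1, I=0.0))        # 0.5
print(labor_supply(w=10.0, t=0.6, I=0.0))        # still 0.5: t alone has no effect
print(labor_supply(w=10.0, t=0.3, I=2.0) < 0.5)  # the grant reduces hours: True
```

This is one illustration of why the assumed labor-supply elasticity drives the numerical results so strongly.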
Welfare gains

One of the major thrusts of the recent research in optimal taxation has been its concern with the incentive effects of redistributive taxation. Given an egalitarian social welfare function, there are gains from a more equal distribution of income, but there are losses from the distortions associated with the income tax. The question is: How significant are the gains relative to the costs? The answer clearly depends on at least three critical assumptions: how elastic the (compensated) labor supply is; how averse to inequality society is; and how large a share of national income government expenditure is. For a utilitarian social welfare function, with "reasonable" estimates of the compensated labor supply function, Stiglitz (1976b) suggested that the welfare gains to be achieved by the optimal linear income tax were minimal, even without counting the administrative costs associated with the tax.
Part II: Pareto efficient taxation with many commodities and many periods
11. Alternative tax bases

So far we have focused our attention on the analysis of an optimal wage income tax. Other tax and expenditure policies have often been employed as redistributive devices. Most governments impose taxes on such luxuries as perfumes, and many governments provide subsidies for such necessities as food. There was a widespread feeling that the government should concentrate its redistributive policies within the income tax: using excise taxes to redistribute simply added additional distortions. Clearly, if labor were inelastically supplied, then an optimally designed wage tax would be all that is required. But labor is not inelastically supplied, and, as Ramsey (1927) forcefully established more than fifty years ago, counting the number of distortions is no way to do welfare analysis. If we have an optimally designed income tax, should we supplement it with excise taxes? The heuristic argument that by doing so we impose less of a burden on the income tax seems persuasive, but unfortunately is incorrect. The answer turns out to depend simply on whether the utility function is separable between the vector of consumption goods and work, i.e. on how changes in leisure affect individual marginal rates of substitution between different consumption goods. If the utility function is separable between consumption and leisure (U = u(C) − v(L), where C is a vector), then all Pareto efficient tax structures entail no taxation of commodities.
11.1. Formal analysis

The result can be seen most simply in the context of our two-group model. If we now interpret C as a vector, the Lagrangian for this problem is identical to that formulated earlier (Section 2.2), except that the government budget constraint is now written as

$$R \le N_1 Y_1 + N_2 Y_2 - \sum_j \bigl(C_{1j} N_1 + C_{2j} N_2\bigr).$$  (38)

If we now differentiate the Lagrangian with respect to the C_{ij}, we obtain [see the analogous equations (12) and (14)]:

$$(1 + \lambda_1)\frac{\partial V^1}{\partial C_{1j}} - \lambda_2\frac{\partial V^2}{\partial C_{1j}} - \gamma N_1 = 0,$$  (39a)

$$(1 + \lambda_1)\frac{\partial V^1}{\partial Y_1} - \lambda_2\frac{\partial V^2}{\partial Y_1} + \gamma N_1 = 0,$$  (39b)

$$(1 + \lambda_2)\frac{\partial V^2}{\partial C_{2j}} - \lambda_1\frac{\partial V^1}{\partial C_{2j}} - \gamma N_2 = 0,$$  (39c)

$$(1 + \lambda_2)\frac{\partial V^2}{\partial Y_2} - \lambda_1\frac{\partial V^1}{\partial Y_2} + \gamma N_2 = 0,$$  (39d)

where in (39a) and (39b) the derivatives of V² are evaluated at the less able individual's bundle (as in the self-selection constraint), and in (39c) and (39d) the derivatives of V¹ are evaluated at the more able individual's bundle.
We again focus on the case where λ₁ = 0 and λ₂ > 0: only the second self-selection constraint is binding. From (39a)-(39d) we obtain:

$$\frac{\partial V^2/\partial C_{2j}}{\partial V^2/\partial C_{2k}} = 1,$$  (40a)

$$\frac{\partial V^1/\partial C_{1j}}{\partial V^1/\partial C_{1k}} = \frac{\gamma N_1 + \lambda_2\,\partial V^2/\partial C_{1j}}{\gamma N_1 + \lambda_2\,\partial V^2/\partial C_{1k}}.$$  (40b)
Equation (40a) yields the familiar result that there should be no distortionary taxation of the individual with the highest ability. The interpretation of (40b) is, however, somewhat more subtle. Consider first the case where individuals have utility functions separable between leisure and goods, i.e.

$$\frac{\partial^2 U_i}{\partial C_j\,\partial L} = 0, \quad \text{all } i, j.$$  (41)
Since we assume that individuals have the same indifference curves (in (C, L) space), at the same consumption vector

$$\partial V^2/\partial C_{1j} = \partial V^1/\partial C_{1j},$$  (42a)

$$\partial V^2/\partial C_{1k} = \partial V^1/\partial C_{1k},$$  (42b)

and equation (40b) becomes:

$$\frac{\partial V^1/\partial C_{1j}}{\partial V^1/\partial C_{1k}} = 1.$$  (43)
We have thus established our basic result: If leisure and goods are separable, there should be no commodity taxation. It should be noted that in this analysis we allow tax functions which are not only non-linear functions of consumption, but are also not separable, i.e. the marginal rate imposed on the consumption of commodity j may depend not only on the consumption of commodity j but on the consumption of other commodities as well.
If the utility function is not separable, we obtain:

$$\frac{\partial V^1/\partial C_{1j}}{\partial V^1/\partial C_{1k}} = 1 + \frac{\lambda_2\,\partial V^2/\partial C_{1k}}{\gamma N_1}\left[\frac{\partial V^2/\partial C_{1j}}{\partial V^2/\partial C_{1k}} - \frac{\partial V^1/\partial C_{1j}}{\partial V^1/\partial C_{1k}}\right].$$  (44)
Thus, whether commodity j should be taxed or subsidized relative to k depends on whether the more able individual's marginal rate of substitution of j for k exceeds that of the low ability person, or conversely. In the case we have focused on, where the low ability and high ability individuals have the same utility function, the relative values of the marginal rates of substitution depend simply on how the amount of leisure affects the marginal rate of substitution. The critical parameter that has to be estimated, then, for determining the structure of commodity taxes is

$$\frac{d\ln (U_j/U_k)}{d\ln L}.$$

The result that, with separability, only an income tax is needed, which seemed so surprising at first, becomes entirely understandable within this framework: if the two groups of individuals have the same indifference curves (locally) between two commodities, we cannot use differential taxation as a basis of separation; if they differ, we can. By taxing the commodity which the more able individual values more highly in the lower ability individual's package, we make the lower ability individual's "package" less attractive to him. (Since in this model both groups have identical utility functions, the only difference in the evaluation of a given consumption bundle arises from the differences in leisure which they enjoy at any given level of income.) We thus can tax the higher ability individual more heavily without having him try to "disguise" himself as a low ability person.
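This separation logic can be sketched numerically with an invented non-separable utility, U = log C₁ + (1 + b(1 − L)) log C₂ − v(L), in which good 2 is a complement to leisure (b > 0 is an arbitrary strength parameter). At the same consumption bundle, the mimicking high-ability individual enjoys more leisure and therefore values good 2 relatively more, so taxing good 2 in the low-ability bundle makes mimicry less attractive:

```python
# Marginal rate of substitution of good 1 for good 2 under the invented
# utility U = log(C1) + (1 + b * (1 - L)) * log(C2) - v(L).

def mrs_12(C1, C2, L, b=0.5):
    u1 = 1.0 / C1                      # dU/dC1
    u2 = (1.0 + b * (1.0 - L)) / C2    # dU/dC2, rising with leisure 1 - L
    return u1 / u2

bundle = dict(C1=1.0, C2=1.0)
low_ability = mrs_12(**bundle, L=0.6)  # must work long hours to earn Y1
mimicker = mrs_12(**bundle, L=0.3)     # same bundle, fewer hours needed
print(mimicker < low_ability)          # mimicker values good 2 more: True
```

With separability (b = 0), the two marginal rates of substitution coincide and, as in (43), no commodity tax can help separate the types.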
11.2. Ramsey versus Atkinson-Stiglitz Pareto efficient taxation

The results we have just obtained stand in marked contrast to standard results on the design of commodity tax structures.³⁸ In the simple form (where the compensated demand for each commodity depends only on its own price, and where producer prices are fixed), optimal taxation entails tax rates which are inversely proportional to the elasticity of demand; in the more general case, optimal taxation calls for an equal percentage reduction of consumption (again along the compensated demand curves) for all commodities.³⁹ The factors which are relevant for optimal Atkinson-Stiglitz pricing are completely different; there, all

³⁸ The list of contributors to this literature is enormous. For a comprehensive survey, see Chapter 2 by Auerbach in this Handbook. Besides the classic papers by Ramsey (1927) and Boiteux (1956), important contributions by Diamond and Mirrlees (1971), Baumol and Bradford (1970), Dasgupta and Stiglitz (1971), Atkinson and Stiglitz (1972, 1976), Diamond (1975), Meade (1955), and Corlett and Hague (1953) may be noted.
³⁹ Both results assume all individuals are identical, so that distributional considerations can be ignored. See below.
that is relevant is how the marginal rate of substitution between two commodities changes as the amount of leisure changes. This poses a problem: Which theory provides the appropriate basis for the design of tax policy? How can these alternative approaches be reconciled? Ramsey's analysis (and much of the subsequent work which developed from it) was based on an artificial problem: he assumed that all individuals were identical, but that the government could not impose a lump-sum tax. As we noted earlier, there is really no reason the government cannot impose a uniform lump-sum tax; what the government cannot do is levy lump-sum taxes which vary according to the individual's ability to pay. But if all individuals are identical, then a uniform lump-sum tax is the optimal tax. Our analysis has assumed not only that the government can impose a uniform lump-sum tax, but that it can impose a non-linear consumption (wage) tax. Giving the government this additional instrument is what changes the nature of the optimal commodity tax structure.
Virtually all developed countries impose some income tax, which makes Ramsey's analysis inapplicable to those countries. On the other hand, for a variety of reasons, many less developed countries do not impose a very effective income tax. For these, Ramsey's analysis might seem appropriate; but Ramsey (and the subsequent work cited in footnote 38) assumed a neoclassical model of the economy, a model which is particularly ill-suited for less developed countries. In a later section, we describe briefly recent work attempting to extend this kind of analysis to less developed countries.
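The inverse-elasticity form of Ramsey's rule referred to above can be sketched in one step (the goods and numbers are invented, and the calculation is a first-order approximation for small taxes): to shrink every compensated demand by the same proportion, the ad valorem rate on each good must vary inversely with its demand elasticity:

```python
# First-order Ramsey sketch: demand for good k falls by about e_k * tau_k,
# so equal proportional reductions require tau_k proportional to 1 / e_k.
# Elasticity values are invented for illustration.

target_reduction = 0.05                       # shrink each demand by 5%
elasticities = {"bread": 0.2, "perfume": 1.5}
taxes = {k: target_reduction / e for k, e in elasticities.items()}

print(taxes["bread"] > taxes["perfume"])      # inelastic good taxed more: True
```

Note how different this is from the Atkinson-Stiglitz prescription, where only the interaction of consumption with leisure matters; the reconciliation is the subject of this subsection.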
11.3. Redistributive commodity taxes with optimal linear income taxes

In the previous section we noted the limited role to be played by redistributive commodity taxation when there is an optimal consumption tax. Is that still true if there is an optimal linear consumption (or income) tax? The answer is provided by the generalized Ramsey rule [Atkinson and Stiglitz (1976)]:

$$\sum_i t_i \sum_h S^h_{ik} \Big/ H\bar C_k = \theta_k,$$  (45)

where Sᵢₖʰ is the hth household's compensated derivative of the demand for the ith commodity with respect to the kth price, and

$$\theta_k = \sum_h \bigl(b^h - \bar b\bigr)\bigl(C^h_k - \bar C_k\bigr)\Big/ H\bar C_k\bar b,$$  (46)

the normalized covariance between Cₖʰ and bʰ. Generally, goods which are consumed by the rich will have a high (absolute) value of θₖ; it is a measure of the extent to which the good is consumed by those with low values of the social
marginal utility of income. H is the total number of households; C̄ₖ is the average consumption of commodity k, and HC̄ₖ is the aggregate consumption of commodity k. bʰ is the net marginal social utility of income, as defined earlier. Equation (45) says that the percentage reduction in consumption of each commodity (along the compensated demand curve) should be proportional to its distributional characteristic θₖ. This stands in contrast to the standard Ramsey result that the percentage reduction should be the same for all commodities.⁴⁰
Equation (45) has some simple and remarkable implications. With linear demand curves derived from a utility function of the form

$$U = -\tfrac{1}{2}\sum_i d_i\bigl(\bar Q_i - Q_i\bigr)^2 - aL,$$

it is easy to show that (45) implies that there should be no commodity taxation [Atkinson and Stiglitz (1976, 1980)]. Deaton (1979) has shown that this holds for more general classes of utility functions, including the frequently estimated linear expenditure system.
Another special case is that originally investigated by Ramsey, where there is constant marginal utility of income and separable demand functions. When wage differences are the only source of inequality, this implies that we can write the demand curves as Cₖʰ = Cₖ(qₖ/wʰ), where wʰ is the wage of the hth individual. Moreover,

$$b^h = 1/\gamma w^h,$$

where, it will be recalled, γ is the marginal utility of income to the government. Now, the optimality condition (45) becomes:

$$t_k \sum_h C_k'\bigl(q_k/w^h\bigr)\frac{1}{w^h} = \sum_h \bigl(b^h - \bar b\bigr)\bigl(C^h_k - \bar C_k\bigr).$$  (48)
Let νʰ = 1/wʰ. Taking a Taylor series approximation to the right-hand side, we obtain (setting, without loss of generality, ν̄ = 1, and expressing all aggregate

⁴⁰ In fact, both results are special cases of the more general result of Atkinson and Stiglitz (1976):

$$\sum_i t_i \sum_h S^h_{ik} \Big/ H\bar C_k = \bar b\,\theta_k - (1 - \bar b).$$  (47)

With an optimal linear income tax, b̄ = 1. If all individuals are identical (as in Ramsey's original analysis), θₖ = 0.
results in per capita terms):⁴¹

$$\sum_h \bigl(b^h - \bar b\bigr)\bigl(C^h_k - \bar C_k\bigr)\Big/H \approx C_k'\,\sigma_\nu^2\,q_k/\gamma,$$

while taking a Taylor series approximation to the left-hand side yields:

$$t_k \sum_h C_k'\bigl(q_k/w^h\bigr)\frac{1}{w^h}\Big/H \approx t_k C_k'\bigl\{1 + \bigl[-\varepsilon_k - m_k\varepsilon_k/2\bigr]\sigma_\nu^2\bigr\},$$

where εₖ = −Cₖ″qₖ/Cₖ′, the elasticity of demand, and mₖ = Cₖ‴qₖ/Cₖ″. Thus,

$$t_k/q_k \approx \sigma_\nu^2\Big/\gamma\bigl\{1 + \bigl[-\varepsilon_k - m_k\varepsilon_k/2\bigr]\sigma_\nu^2\bigr\}.$$
Notice that for the quadratic utility function, εₖ = mₖ = 0, so uniform taxation is again optimal. More generally, even to a first-order approximation, the relative tax rates will depend not only on the elasticity of demand, but also on mₖ, a parameter which is hard to estimate. Assuming that the demand curves exhibit constant elasticity has, in particular, strong implications for the structure of optimal commodity taxes. Finally, note that tₖ does not vary inversely with the elasticity of demand, even for constant elasticity demand functions.
In short, if there is an income tax, the role for indirect taxation is greatly changed. If commodity taxes are imposed, the optimal structure of these taxes may depend on parameters other than those it depended on in the absence of an income tax (parameters for which it seems intrinsically hard to obtain reliable estimates); and even in those special cases where the only demand parameter on which the structure of taxes depends is the elasticity of demand, the tax rate does not vary simply inversely with the demand elasticity.
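The distributional characteristic (46) is straightforward to compute; the following sketch uses an invented three-household economy. The luxury, consumed mainly by households with low net social weight b, has the larger absolute θₖ and so, under (45), bears the larger adjustment:

```python
# Distributional characteristic, equation (46):
#   theta_k = sum_h (b_h - bbar)(C_kh - Cbar_k) / (H * Cbar_k * bbar).
# Social weights and consumption figures are invented.

def theta(b, C):
    H = len(b)
    b_bar = sum(b) / H
    C_bar = sum(C) / H
    cov = sum((bh - b_bar) * (ch - C_bar) for bh, ch in zip(b, C)) / H
    return cov / (C_bar * b_bar)

b = [1.5, 1.0, 0.5]          # poor, middle, rich
luxury = [0.0, 1.0, 5.0]     # consumed mainly by the rich
necessity = [1.0, 1.1, 1.2]  # consumed almost equally by all
print(abs(theta(b, luxury)) > abs(theta(b, necessity)))  # → True
```

Unlike an elasticity, θₖ is built purely from cross-section consumption data and the social weights, which is why the informational requirements of the two approaches differ so sharply.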
⁴¹ bʰ = νʰ/γ, so (bʰ − b̄) = (νʰ − 1)/γ; Cₖʰ ≈ Cₖ(qₖ) + Cₖ′(qₖ)(νʰ − 1)qₖ; σ²ᵥ ≡ E(ν − ν̄)².

11.4. Other forms of restricted taxation

In the previous subsection we noted the important consequences of restrictions on taxation: the optimal commodity tax structure depended critically on whether there was or was not an income tax, and whether, if there was an income tax, it was restricted to being linear. Another important example where restrictions on taxation have an important implication for tax structure arises when the government cannot impose 100 percent profits taxes. No government does this, perhaps because of the difficulty of differentiating between true profits (in the sense that economists use that term) and returns to capital or to entrepreneurship. With 100 percent profits taxes, the
commodity tax structure depends simply on properties of the demand curves, and there should be no taxation of producers (which would interfere with the production efficiency of the economy).⁴² The former result conflicts with Ramsey's original prescription, where optimal tax rates depended on both the demand and supply elasticities. The reconciliation was provided by Stiglitz and Dasgupta (1971), who showed that if there were not 100 percent profits taxes, producer taxation may be desirable, and optimal commodity taxes would depend on supply elasticities. Ramsey had implicitly assumed no profits taxes.
12. Implications for capital taxation

Consumption at different dates is, of course, like consumption of different commodities. Thus, the Atkinson-Stiglitz results provide the suggestion of an answer to the long-standing controversy over whether interest income should be taxed. Early advocates of a consumption tax (taxing consumption only is equivalent to exempting interest) emphasized that it was both equitable (individuals should be taxed on what they take out of society - their consumption - and not what they contribute - their income⁴³) and more efficient (since it did not introduce the additional intertemporal distortion).⁴⁴ Ramsey's analysis of optimal taxation should have made the latter argument suspect even before Kaldor put it forward: fewer distortions are not necessarily better than more. Indeed, Ramsey's analysis suggested that whether consumption at later dates should be taxed at a higher or lower rate should depend on the elasticity of demand for consumption at later dates relative to the elasticity at earlier dates.
The Atkinson-Stiglitz analysis provided a framework within which both the equity and efficiency issues could be addressed. They showed that if there was separability between consumption (at all dates) and work, then a consumption tax was Pareto efficient. Still, that did not close the matter, for theirs was a partial equilibrium model. We noted earlier (Section 8) that general equilibrium effects of tax policy must be taken into account when tax policy can affect factor prices. Partial equilibrium models cannot address the effects of taxation on savings and investment, and hence on the intertemporal distribution of income. These questions have been addressed by two sets of models.
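The equivalence noted in passing (taxing consumption only is equivalent to exempting interest) is simple two-period arithmetic; the 30 percent rates below are arbitrary. A uniform consumption tax cancels out of the intertemporal price, leaving the gross return 1 + r undistorted, while an income tax that reaches interest drives a wedge into it:

```python
r = 0.10  # illustrative interest rate

def future_per_present_consumption_tax(tc=0.3):
    # giving up (1 + tc) of spending today frees one unit to save at r,
    # buying (1 + r) units of goods costing (1 + tc)(1 + r) tomorrow:
    # the consumption tax cancels from the relative price
    return (1 + tc) * (1 + r) / (1 + tc)

def future_per_present_income_tax(ty=0.3):
    # interest income taxed at rate ty: net return is r * (1 - ty)
    return 1 + r * (1 - ty)

print(abs(future_per_present_consumption_tax() - (1 + r)) < 1e-12)  # True
print(future_per_present_income_tax() < 1 + r)                      # True
```

Whether that wedge is undesirable is precisely the question Ramsey's logic reopens: fewer distortions are not necessarily better than more.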
This result was first established by Diamond and Mirrlees (1971). For a simple interpretation of that result see Stiglitz and Dasgupta (1971). 43 This is the position most associated with Fisher (1937). 44See, for instance, Kaldor (1955).
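The parenthetical equivalence - that taxing consumption is equivalent to exempting interest from a wage tax, while an income tax that also taxes interest distorts the intertemporal margin - can be checked in a minimal two-period example. The log-utility setup and the parameter values below are our own illustration, not the chapter's:

```python
# Two-period consumer: max ln(C1) + beta*ln(C2)
# s.t. C1 + C2/(1+rho) = M, where rho is the after-tax interest rate.
def choose(lifetime_income, net_interest, beta=0.96):
    c1 = lifetime_income / (1 + beta)          # first-order condition + budget
    c2 = beta * (1 + net_interest) * c1        # Euler equation: C2/C1 = beta*(1+rho)
    return c1, c2

w_l, r, tau = 100.0, 0.05, 0.25   # hypothetical wage income, interest rate, tax rate
t = tau / (1 + tau)               # equivalent wage-tax rate: 1 - t = 1/(1 + tau)

cons_tax = choose(w_l / (1 + tau), r)            # flat consumption tax
wage_tax = choose(w_l * (1 - t), r)              # wage tax, interest exempt
income_tax = choose(w_l * (1 - t), r * (1 - t))  # income tax, interest taxed too

# Consumption tax and interest-exempt wage tax yield the same allocation.
print(abs(cons_tax[0] - wage_tax[0]) < 1e-9 and
      abs(cons_tax[1] - wage_tax[1]) < 1e-9)     # prints True
# Taxing interest lowers C2/C1 from beta*(1+r) to beta*(1+r*(1-t)).
print(income_tax[1] / income_tax[0] < cons_tax[1] / cons_tax[0])  # prints True
```

The intertemporal price is undistorted under the first two regimes, which is the sense in which "taxing consumption only is equivalent to exempting interest".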
1031
Ch. 15: New New Welfare Economics
12.1. Overlapping generations models

Stiglitz (1985b) has extended the Atkinson-Stiglitz analysis to an overlapping generations economy. He asked: What can we say in that context about the set of Pareto efficient tax structures? Three results stand out: (1) If the relative wage of skilled and unskilled workers is fixed (independent of the capital stock), then the partial equilibrium results remain valid: provided there is separability in the utility functions, Pareto efficiency requires that there be a consumption tax.45 (2) If the relative wage of skilled and unskilled workers changes with the level of capital accumulation, the government will, in general, wish to have either an interest income tax or a subsidy. The intuitive reason is analogous to that presented earlier concerning why, when there are these general equilibrium effects, it would be desirable for the government to impose a negative marginal tax rate on the more productive individuals. Here, if increasing the level of capital increases the income of the more able relative to the less able (for
45 The Lagrangian for this problem can be written

\[
\Psi = \sum_t \Big\{ \sum_i \delta_i U^i(C_{it}, L_{it})
+ \lambda_t \big[ U^2(C_{2t}, L_{2t}) - U^2(C_{1t}, \hat L_t) \big]
+ \gamma_t \big[ F(K_t, E_t) + K_t - K_{t+1} - \textstyle\sum_i N_i (C_{i,t,t} + C_{i,t-1,t}) \big] \Big\},
\]

where Q = F(K, E) is the aggregate production function, E the number of efficiency units of labor (E = vN_1L_1 + N_2L_2, where v is the ratio of the productivity of the two types of laborers); where γ_t is the Lagrange multiplier for the resource constraint at time t; where C_{i,t,t} is the consumption of the ith type in the tth generation in the first year of its life; C_{i,t,t+1} is its consumption in the second year of its life; and, as in the standard overlapping generations model, we assume individuals live for only two periods, working only in the first. C_{it} ≡ {C_{i,t,t}, C_{i,t,t+1}}. Note that we have written the self-selection constraint in a slightly different form than in our earlier analysis. We define L̂_t as the labor that the more able would have to supply in order to imitate the less able (recall that we are focusing on the case when it is the more able's self-selection constraint that is binding). Clearly, L̂_t = vL_{1t}. We obtain from the first-order conditions for C_{it} and L_{it} that the marginal rate of substitution must equal the ratio of the shadow prices of resources; but differentiating the Lagrangian with respect to K_{t+1} we obtain

\[
\gamma_{t+1}\big(1 + F_K(K_{t+1}, E_{t+1})\big) = \gamma_t,
\]

from which the indicated result immediately follows.
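Spelling out that last step (our reconstruction of the argument, in the notation of the footnote): the first-order conditions for the two consumption dates of a given generation, combined with the condition on K, give

\[
\frac{U^i_{C_{t,t}}}{U^i_{C_{t,t+1}}} = \frac{\gamma_t}{\gamma_{t+1}} = 1 + F_K,
\]

so each consumer's marginal rate of substitution between consumption at the two dates equals the marginal rate of transformation. The intertemporal margin is undistorted, i.e. interest income goes untaxed, which is the consumption-tax result stated in the text.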
1032
J.E. Stiglitz
instance, if unskilled labor and capital are substitutes, while skilled labor and capital are complements), then the government may wish to discourage capital accumulation, and it will use tax policy to do this.46 (3) If the government cannot completely control the level of the capital stock through debt or social security policy,47 as seems to be the case, then the government will, in general, wish to use tax policy: e.g., interest income subsidies if it wishes to encourage savings, taxes if it wishes to discourage accumulation.
12.2. Infinitely lived individuals

The other model which has been intensively investigated is that of infinitely lived individuals [Judd (1985), Chamley (1980), Stiglitz (1985b)]. The first two of these studies are extensions, in this context, of the standard Ramsey analysis, and the same criticisms can be raised: if individuals are identical, there is no reason not simply to impose a uniform lump-sum tax. Still, their result that, asymptotically, there should be no interest income tax is of interest.48
46 The analysis follows by writing the constant returns to scale production function Q = F(K, E_1, E_2), where E_i = L_iN_i, with the two types of laborers being imperfect substitutes. With this modification, the Lagrangian remains the same as that presented in footnote 45 except that we now note that v, the relative wage, is a function of K. This means that the first-order condition for K now becomes

\[
\gamma_{t+1}(1 + F_K) = \gamma_t + \lambda_{t+1}\,
\frac{\partial U^2\big(C_{1,t+1,t+1},\, C_{1,t+1,t+2},\, \hat L_{t+1}\big)}{\partial K}.
\]
The other first-order conditions remain unchanged. It immediately follows that the marginal rate of substitution is now not equal to the marginal rate of transformation.
47 See Atkinson and Stiglitz (1980) for a discussion of how this could be done.
48 The result is in one sense stronger than the earlier results for the overlapping generations model, and in one sense weaker. It is stronger, in that it does not require separability; it is weaker, in that it is only valid asymptotically. (The earlier results characterized all Pareto efficient tax structures, and held at each moment of time.) The simplest way to see these results is to write the Lagrangian

\[
\Psi = \sum_t U_t\,\delta^t
+ \sum_t \beta_t\big[ U_{C_t} - \delta(1 + r_t)\,U_{C_{t+1}} \big] \qquad \text{(f.o.c.)}
\]
\[
\quad + \sum_t \gamma_t\big[ F(K_t, L_t) - (C_t + K_{t+1} - K_t) - R_t \big] \qquad \text{(resource constraint)}
\]
\[
\quad + \mu\Big[ w_0L_0 - C_0 + \sum_t (w_tL_t - C_t)\Big/\prod_{\tau \le t}(1 + r_\tau) \Big] \qquad \text{(budget constraint)},
\]

where we have embedded the individual's first-order condition for allocating income among different periods; δ is the pure rate of time preference (discount factor) and μ is the Lagrange multiplier associated with
But, as in the Atkinson-Stiglitz analysis, as soon as there are general equilibrium effects of taxation, this result no longer holds; that is to say, if there are two types of individuals, and the government cannot offset the changes in relative wages induced by tax policy, then it will wish to take these into consideration; even asymptotically, an interest income tax or subsidy will, in general, be desirable.49
the individual's budget constraint. The first-order conditions are thus

\[
\frac{\partial \Psi}{\partial C_t} = \delta^t U_{C_t}
+ \big[\beta_t - \beta_{t-1}\,\delta(1 + r_{t-1})\big]U_{C_tC_t}
- \gamma_t - \mu\Big/\prod_{\tau \le t}(1 + r_\tau) = 0,
\]
\[
\frac{\partial \Psi}{\partial K_{t+1}} = \gamma_{t+1}(1 + F_K) - \gamma_t = 0.
\]

In the steady state, δ(1 + r_t) = 1 and U_{C_t} = U_{C_{t+1}}, so

\[
\frac{\gamma_{t+1}}{\gamma_t} = \frac{1}{1 + r_t} = \frac{1}{1 + F_K}.
\]
The consumer rate of interest converges to the producer rate of interest (the marginal rate of transformation): there is no interest income taxation.
49 This result is, in some sense, more surprising than the earlier result, because it implies that, viewed as of time 0, the ratio of consumer and producer prices approaches (asymptotically) either zero or infinity, although each price itself is also approaching zero. To derive this result, we assume the government must impose uniform taxation on wages and interest. This imposes two additional sets of constraints:

\[
1 + \hat r_{1t} = 1 + \hat r_{2t}
\qquad \text{and} \qquad
\frac{\hat w_{1t}}{\hat w_{2t}} = \frac{F_{E_1}}{F_{E_2}} = v(K, E_1, E_2),
\]

where hats denote after-tax returns. We reformulate the Lagrangian, appending these two additional constraints. Letting m_t and μ_t be the Lagrange multipliers associated with the two constraints, the derivative of the Lagrangian with respect to K is now

\[
\gamma_{t+1}(1 + F_K) = \gamma_t + \mu_t \frac{\partial v}{\partial K},
\]

and it immediately follows that the government will wish to take into account the effect of a change in capital accumulation on the distribution of before-tax income. If capital and unskilled labor are substitutes, then the government will wish to discourage the accumulation of capital, and will thus impose an interest income tax.
13. Imperfect capital markets

Throughout this discussion (as in most of the rest of the literature) we have assumed a perfect capital market, i.e. that all investors obtain the same return. There is considerable evidence that this is not true. The reasons for this are not hard to find: there is imperfect information concerning who the good investors are. In an imperfect capital market, then, individuals will be characterized not only by their labor productivity, but also by their capital productivity.50 We will then wish to use self-selection mechanisms not only to determine who among the population are more productive laborers but also who are more productive investors. Assume, for simplicity, that individuals differ only in their investment productivity. The results [Stiglitz (1985b)] are parallel to those obtained earlier: the government will wish to impose an interest income tax on the less able investors, but no interest income tax on the more able.
14. Altruism and inheritance taxation

In the life-cycle model, it is assumed that individuals do not care at all for their children. One interpretation of the model with infinitely lived individuals is that there is a dynastic utility function: the parent cares about his child, the child about his child, and so on. In the simplest form, we can assume that the utility of the tth generation is just

\[
U_t = U(C_t, L_t) + aU_{t+1},
\]

where a is the weight assigned to the utility of the children. Then, successive substitution yields the utility function facing the current generation:

\[
U_t = \sum_{T=0}^{\infty} a^T U(C_{t+T}, L_{t+T}).
\]
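The successive substitution can be checked numerically; the per-generation felicity numbers below are arbitrary illustrations, and for |a| < 1 the truncated tail is negligible:

```python
# Iterating the dynastic recursion U_t = u_t + a*U_{t+1} backward from a
# distant horizon reproduces the discounted sum sum_T a^T u_{t+T}.
a = 0.9
T = 400                                   # truncation horizon
u = [1.0 + 0.01 * t for t in range(T)]    # arbitrary felicities U(C_t, L_t)

# Backward recursion (Horner evaluation), starting from U_T = 0
U = 0.0
for t in reversed(range(T)):
    U = u[t] + a * U

# Direct discounted sum
direct = sum(a**s * u[s] for s in range(T))
print(abs(U - direct) < 1e-9)  # prints True
```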
In this interpretation the only savings are bequests, and hence a tax on savings is equivalent to a tax on bequests. The analysis of Pareto efficient taxation is little affected by the introduction of these interdependencies in utilities: we simply hold the utility levels of all but one type of individual fixed. Thus, if we fix the utility of all future
50 In a perfect capital market, the most productive investors will invest all of society's resources, and thus the return to capital will be the same for all.
generations, then maximizing the utility of the current generation is equivalent to maximizing the utility he derives from his direct consumption, which is precisely the problem we have previously solved. To answer, however, the question of whether we wish to impose an inheritance tax, we must first answer some difficult questions concerning the formulation of the social welfare function. If each generation's utility - including its valuation of future generations' utility - enters the social welfare function, we obtain

\[
W = \sum_t \delta^t U_t = \sum_t \frac{\delta^{t+1} - a^{t+1}}{\delta - a}\, U(C_t, L_t),
\]

where we have weighted each future generation's total utility by a factor δ. Notice now that there is a discrepancy between the way the government values future utility and the way individuals do. Since giving increases the utility both of the giver and the receiver, it is doubly blessed in our social welfare function, and the government will seek to encourage it through a bequest subsidy.
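The coefficient on each generation's felicity follows from collecting terms: generation s enters W directly through δ^s and once more through each earlier generation's altruism, giving Σ_{t=0}^{s} δ^t a^{s-t} = (δ^{s+1} - a^{s+1})/(δ - a). A numerical check with illustrative parameter values of our own choosing:

```python
# W computed two ways: (i) directly as sum_t delta^t U_t with
# U_t = sum_T a^T u_{t+T}, and (ii) via the closed-form coefficients
# (delta^{s+1} - a^{s+1})/(delta - a) on each generation's felicity u_s.
delta, a = 0.95, 0.80
S = 300                                   # truncation horizon
u = [1.0 / (1 + s) for s in range(S)]     # arbitrary felicities

U = [sum(a**T * u[t + T] for T in range(S - t)) for t in range(S)]
W_direct = sum(delta**t * U[t] for t in range(S))
W_closed = sum((delta**(s + 1) - a**(s + 1)) / (delta - a) * u[s]
               for s in range(S))
print(abs(W_direct - W_closed) < 1e-6)  # prints True
```

Because a < δ here, later generations' felicities carry more weight in W than in any single dynasty's utility, which is the "double blessing" of giving noted in the text.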
15. Commitment

In the intertemporal problems we have analyzed so far, we have assumed that the government can commit itself to a tax structure. The government will, however, in general be able to identify who the more able are from actions that they take early in life. Thus, if it were not committed to a given tax structure, the government would use this information to impose a lump-sum tax on such individuals later on in their life. Recently, Brito, Hamilton, Slutsky and Stiglitz (1986) have investigated the consequences of assuming that the government cannot commit itself. Then individuals know that if they reveal who they are early in their life, they will be confronted with a lump-sum tax. The government must take this into account in designing its tax structure. Brito et al. show that in general, the tax structure is characterized by three phases: in the first, there is a pooling equilibrium, i.e. all individuals are treated identically; then there is a period in which separation occurs, i.e. there is distortionary taxation; after that, the government imposes a lump-sum tax. Because individuals know that there will be a lump-sum tax later, it is harder to induce the more able to reveal (through their actions) who they are; the effective distortion in the one period in which the identification occurs is greater. Indeed, in some cases, where discount rates are low enough, with infinitely lived individuals, the only feasible equilibria
may entail pooling. (When there are many groups, there will be partially separating/partially pooling equilibria.)
Part III: Concluding comments
16. Optimal taxation in non-neoclassical economies

The analysis of optimal tax structures prior to 1980 focused on models of competitive (neoclassical) economies. Pigou (1947) had earlier emphasized the use of taxation to correct market distortions (such as externalities). The optimal tax structure thus might be markedly different in an economy with monopolies, externalities, and other imperfections. One of the important lessons to emerge from the literature on optimal taxation was the importance of interdependencies in demand; corrective taxation needs to take these interdependencies into account as well. Arnott and Stiglitz (1986) provide an example of how this may be done: economies in which there are moral hazard problems, i.e. in which the provision of insurance affects the probability of the occurrence of the insured-against event, are almost always constrained Pareto inefficient;51 there always exists a set of taxes and subsidies which can make everyone better off. They analyze the structure of the optimal set of corrective taxes. Similarly, when there are monopolies, expressions for the optimal tax can be derived which can be thought of as consisting of two parts: in addition to the standard terms, there are those correcting the monopoly distortions.
Less developed economies

We have also emphasized throughout our analysis the importance of identifying the set of admissible tax instruments. This is likely to depend on the structure of the economy; it is much easier to monitor many transactions in developed economies than in less developed countries. Thus, few LDCs have an effective income tax, and the taxation of wage income in the rural sector appears to be a virtual impossibility. Accordingly, the analysis of the design of tax structures in LDCs differs in two important respects from the analysis we have presented earlier: (a) the set of tax instruments is different; and (b) there may be important distortions in the economy, e.g. wages may be set at above market-clearing levels
51 The result on the constrained Pareto inefficiency of the economy with imperfect information and incomplete risk markets is in fact more general. See Greenwald and Stiglitz (1986).
in the urban sector.52 In a series of papers, Braverman, Sah, and Stiglitz53 have investigated the structure of optimal taxes and prices under a variety of assumptions concerning the structure of the LDC economy. In their analyses they have attempted to characterize Pareto improvements - policies which would make everyone better off - as well as policies which would increase social welfare under weak assumptions concerning the structure of the social welfare function.
17. Concluding remarks

The literature which we have surveyed in this chapter has, I believe, cast problems of the design of tax structure into a new perspective. It has emphasized the informational limitations on the government. And it has attempted to ascertain what economic theory can say about the design of tax structures without imposing welfare judgments, i.e. to identify Pareto efficient tax structures. As in so many other areas of economic analysis, there are perhaps more negative results than positive ones. Which taxes are not admissible has been shown to be a crucial determinant of the rates at which the remaining, feasible taxes should be set. Thus, a critical difference between Ramsey's earlier analysis and more recent analyses, such as that of Diamond and Mirrlees, is that Ramsey assumed that there were no profits taxes, while Diamond and Mirrlees assumed that there were no profits (or, equivalently, profits were taxed at 100 percent). Neither is a good assumption for most countries. And both analyses assumed the absence of an income tax. Though traditional arguments against commodity and interest income taxation, based on counting the number of distortions, are not persuasive, the theory of self-selection shows that the nature of commodity taxation should depend simply on how the marginal rate of substitution between two commodities is affected by the amount of leisure; in the central case of separability, no commodity taxes should be imposed. Although this analysis has shown that the widely held view that in the second-best world nothing can be said is incorrect - we have obtained some quite general qualitative propositions - we have also shown that those qualitative results are, in turn, quite dependent on the precise assumptions concerning the set of admissible taxes. The theory itself has, however, had little to say about the determinants of the set of admissible taxes.
52 This may not, however, be simply because of unions or other institutional considerations. It may be because the productivity of workers depends on the wage they receive, and thus firms find it profit maximizing to set wages at above market-clearing levels. [See Stiglitz (1974, 1982a).]
53 See in particular, Braverman, Sah and Stiglitz (1982), Sah and Stiglitz (1984, 1986, 1987), and Braverman, Ahn and Hammer (1983). The latter study entails an empirical investigation of the consequences of different tax structures for the Korean economy. Other studies investigating the consequences of different tax structures for other countries, using the same basic framework, are presently underway at the World Bank.
Though optimal consumption tax structures are, in general, non-linear, the presence of non-linearities increases the administrative problems associated with the tax system. Whether the gains are worth the cost remains unproven. The suggestion contained in some of the numerical calculations, namely that the overall welfare gains from optimal taxation (using a utilitarian welfare criterion) are small and that the optimal tax structure may be close to linear, indicates that it may not be unreasonable to focus attention on linear tax structures. Moreover, Pareto efficient non-linear structures have some properties that may make them politically unacceptable: the marginal tax rate on the highest income individuals is negative, except in the limiting case where the elasticity of substitution between laborers of different skills is infinite, in which case the marginal tax rate on the highest income individual should be zero. The focus on the incentive effects of taxation has, however, provided key insights: it has enabled us to identify the parameters which determine the magnitude of the deadweight losses associated with any tax system. The trade-offs on which this literature has focused in the design of the tax structure are at the core of many of the recent policy debates. Negative income tax proposals typically increase marginal tax rates for some individuals while reducing them for others, and an assessment of these proposals requires a balancing of the efficiency and equity effects on these different groups. On the other hand, many of the central policy issues transcend these models, and the disparity between the two may suggest directions for future research. We have already referred to one set of such issues: those related to administrative problems. The information problems associated with monitoring and assessing the income accruing to capital may be no less important than those with which we have been concerned here.
The view that encouraging savings and investment will enhance growth, and thus make all individuals, including the poor, better off, is partly reflected in our analysis of the general equilibrium effects of taxation, but only partly so. In particular, we have paid little attention to R&D, to entrepreneurship and risk-taking, or to the consequences of imperfect capital markets. Though the work discussed here suggests that there are distinct advantages to imposing differential tax rates on different groups whose labor supply characteristics differ (and that ex ante expected welfare may even be increased by imposing differential tax rates on groups whose characteristics do not differ), popular views of fairness seem inconsistent with these results produced by the utilitarian calculus.54 Though utilitarianism has often been thought to provide a basis for choosing among alternative Pareto efficient tax structures, to do so requires choosing a
54 We have, moreover, consistently ignored any political economy considerations. These might suggest that constitutional restrictions on the set of admissible taxes may be welfare enhancing, for instance when the rates and terms at which those taxes are imposed are determined by majority voting.
particular cardinalization, and no persuasive basis for choosing among alternative cardinalizations has been provided.55 Moreover, most analyses, while assuming that individuals differ in their productivity in the market place, ignore differences in home productivity. When these are taken into account, the case for redistributive taxation becomes even weaker. Thus, while the original goal of this line of research - which was to provide a "scientific" basis for arguing for a progressive tax structure and for analyzing the design of tax systems - has not been achieved, and does not seem achievable, important insights have been gleaned, which should enable governments to make better choices of tax policies in the future.
References

Allen, F., 1982, Optimal linear income taxation with general equilibrium effects on wages, Journal of Public Economics 17, 135-143.
Arnott, R. and J.E. Stiglitz, 1985, Randomization with asymmetric information: A simplified exposition, Institute for Economics Research Discussion Paper No. 594 (Queen's University).
Arnott, R. and J.E. Stiglitz, 1986, Moral hazard and optimal commodity taxation, Journal of Public Economics.
Atkinson, A., 1972, Maximin and optimal income taxation, paper presented at the Budapest Meeting of the Econometric Society.
Atkinson, A., 1973, How progressive should income tax be?, in: M. Parkin and A.R. Nobay, eds., Essays in modern economics (Longman, London).
Atkinson, A., 1977, Optimal taxation and the direct versus indirect tax controversy, Canadian Journal of Economics 10, 590-606.
Atkinson, A. and A. Sandmo, 1977, The welfare implications of personal income and consumption taxes, The Economic Journal.
Atkinson, A. and J.E. Stiglitz, 1972, The structure of indirect taxation and economic efficiency, Journal of Public Economics 1, 97-119.
Atkinson, A. and J.E. Stiglitz, 1976, The design of tax structure: Direct versus indirect taxation, Journal of Public Economics 6, 55-75.
Atkinson, A. and J.E. Stiglitz, 1980, Lectures in public economics (McGraw-Hill, New York).
Azariadis, C. and J.E. Stiglitz, 1983, Implicit contracts and fixed price equilibria, Quarterly Journal of Economics (Supplement), 1-22.
Baumol, W.J. and D.F. Bradford, 1970, Optimal departures from marginal cost pricing, American Economic Review 60, 265-283.
Berry, B., 1973, The liberal theory of justice (Oxford).
Blum, W. and H. Kalven, 1963, The uneasy case for progressive taxation (University of Chicago Press, Chicago).
Boiteux, M., 1956, On the management of public monopolies subject to budgetary constraints, Journal of Economic Theory 3, 219-240.
55We have also not entered into the debate concerning the appropriateness of the utilitarian approach on more fundamental philosophical grounds. We have noted Rawls' views suggesting that utilitarianism is insufficiently equalitarian. Brian Berry, while disagreeing with much of Rawls' work, remarks, "... Rawls, in taking utilitarianism as the main rival to his own account, is flogging an almost dead stalking-horse." One can perhaps best view the utilitarian calculus as simply providing a convenient way of summarizing the effects of a policy change [see Stiglitz (1986a)]. Notice that our results on Pareto efficient tax structures are not liable to most of the criticisms leveled against the utilitarian approach.
Boskin, M. and E. Sheshinski, 1978, Optimal income redistribution when individual welfare depends on relative income, Quarterly Journal of Economics 92, 589-602.
Braverman, A., C.Y. Ahn and J.S. Hammer, 1983, Government deficit reduction and alternative agricultural pricing policies in Korea, unprocessed mimeo (The World Bank, Washington, D.C.).
Braverman, A., R. Sah and J.E. Stiglitz, 1982, The town-versus-country problem: Optimal pricing in an agrarian economy, presented to a conference at the World Bank.
Brito, D. and W.H. Oakland, 1977, Some properties of the optimal linear tax, International Economic Review, 407-424.
Brito, D., J. Hamilton, S. Slutsky and J.E. Stiglitz, 1986, Information and multi-period optimal taxation with self-selection, Mimeo (Princeton University).
Carruth, A.A., 1982, On the role of the production and consumption assumptions for optimum taxation, Journal of Public Economics 17, 145-156.
Chamley, C., 1980, Optimal intertemporal taxation and the public debt, Cowles Foundation Discussion Paper 554.
Cooter, R. and E. Helpman, 1974, Optimal income taxation for transfer payments under different social welfare criteria, Quarterly Journal of Economics 88, 656-670.
Corlett, W.J. and D.C. Hague, 1953, Complementarity and the excess burden of taxation, Review of Economic Studies 10, 295-336.
Dasgupta, P. and J.E. Stiglitz, 1974, Benefit-cost analysis and trade policies, Journal of Political Economy 82, 1-33.
Deaton, A.S., 1979, Optimal taxes and the structure of preferences, Economics Letters.
Deaton, A. and N. Stern, 1986, Optimally uniform commodity taxes, taste differences, and lump-sum grants, Economics Letters 20, 263-266.
Diamond, P.A., 1975, A many-person Ramsey tax rule, Journal of Public Economics 4, 283-299.
Diamond, P., L. Helms and J. Mirrlees, 1980, Optimal taxation in a stochastic economy: A Cobb-Douglas example, Journal of Public Economics 14, 1-29.
Diamond, P.A. and J.A.
Mirrlees, 1971, Optimal taxation and public production I-II, American Economic Review 61, 8-27, 261-278.
Dixit, A.K. and A. Sandmo, 1977, Some simplified formulae for optimal income taxation, Scandinavian Journal of Economics 79, 417-423.
Eaton, J. and H. Rosen, 1980, Labor supply, uncertainty, and efficient taxation, Journal of Public Economics 14, 365-374.
Edgeworth, F.Y., 1881, Mathematical psychics.
Edgeworth, F.Y., 1897, The pure theory of taxation, Economic Journal 7, 46-70, 226-238 and 550-571 (reprinted in Edgeworth, 1925).
Fair, R., 1971, The optimal distribution of income, Quarterly Journal of Economics 85, 551-579.
Feldstein, M.S., 1973, On the optimal progressivity of the income tax, Journal of Public Economics 2, 357-376.
Fisher, I., 1937, Income in theory and income taxation in practice, Econometrica 5, 1-55.
Greenwald, B. and J.E. Stiglitz, 1986, Externalities in economies with imperfect information and incomplete markets, Quarterly Journal of Economics, May, 229-264.
Guesnerie, R., 1975, Production of the public sector and taxation in a simple second best model, Journal of Economic Theory 10, 127-156.
Guesnerie, R. and J. Seade, 1982, Nonlinear pricing in a finite economy, Journal of Public Economics 17, 157-180.
Hall, R.E. and A. Rabushka, 1983, Low tax, simple tax, flat tax (McGraw-Hill, New York).
Helpman, E. and E. Sadka, 1978, The optimal income tax: Some comparative statics results, Journal of Public Economics 9, 383-394.
Judd, K.L., 1985, Redistributive taxation in a simple perfect foresight model, Journal of Public Economics 28, 59-83.
Kaldor, N., 1955, An expenditure tax (Allen and Unwin, London).
Lerner, A.P., 1944, The economics of control (Macmillan, New York).
Meade, J.E., 1955, Trade and welfare: Mathematical supplement (Oxford University Press, Oxford).
Mirrlees, J., 1971, An exploration in the theory of optimum income taxation, Review of Economic Studies 38, 175-208.
Mirrlees, J., 1976, Optimal tax theory: A synthesis, Journal of Public Economics 6, 327-358.
Mirrlees, J., 1985, The theory of optimal taxation, in: K.J. Arrow and M.D. Intriligator, eds., Handbook of mathematical economics, vol. 3 (Elsevier Science Publishers B.V.).
Myerson, R.B., 1979, Incentive compatibility and the bargaining problem, Econometrica 47, 61-73.
Ordover, J. and E. Phelps, 1979, The concept of optimal taxation in an overlapping generations model of capital and wealth, Journal of Public Economics 12, 1-26.
Phelps, E.S., 1973, The taxation of wage income for economic justice, Quarterly Journal of Economics 87, 331-354.
Pigou, A.C., 1947, A study in public finance, 3rd ed. (Macmillan, London).
Radner, R. and J.E. Stiglitz, 1984, A nonconcavity in the value of innovation, in: M. Boyer and R. Kihlstrom, eds., Bayesian models in economic theory (Elsevier Science Publications).
Ramsey, F.P., 1927, A contribution to the theory of taxation, Economic Journal 37, 47-61.
Rawls, J., 1971, A theory of justice (Harvard University Press, Cambridge, Mass.).
Rawls, J., 1974, Concepts of distributional equity: Some reasons for the maximin criterion, American Economic Review, Papers and Proceedings 64, 141-146.
Rothschild, M. and J.E. Stiglitz, 1976, Equilibrium in competitive insurance markets, Quarterly Journal of Economics 90, 629-650.
Sadka, E., 1976, On income distribution, incentive effects and optimal income taxation, Review of Economic Studies 43, 261-268.
Sah, R. and J.E. Stiglitz, 1984, The economics of price scissors, American Economic Review 74, 125-138.
Sah, R. and J.E. Stiglitz, 1986, Taxation and pricing of agricultural and industrial goods in developing economies, forthcoming in: D. Newbery and N. Stern, eds., Modern tax theory for developing countries.
Sah, R. and J.E. Stiglitz, 1987, Price scissors and the structure of the economy, Quarterly Journal of Economics.
Seade, J.K., 1977, On the shape of optimal tax schedules, Journal of Public Economics 7, 203-236.
Sheshinski, E., 1972, The optimal linear income tax, Review of Economic Studies 39, 297-302.
Slemrod, J. and S. Yitzhaki, 1983, On choosing a flat-rate income tax system, National Tax Journal 36, 31-44.
Slemrod, J., 1983, Do we know how progressive the income tax should be?, National Tax Journal 36, 361-369.
Stern, N., 1976, On the specification of models of optimum income taxation, Journal of Public Economics 6, 123-162.
Stern, N., 1982, Optimum taxation with errors in administration, Journal of Public Economics 17, 181-211.
Stiglitz, J.E., 1974, Alternative theories of wage determination and unemployment in L.D.C.'s: The labor turnover model, Quarterly Journal of Economics 88, 194-227.
Stiglitz, J.E., 1975, Information and economic analysis, in: Parkin and Nobay, eds., Current economic problems (Cambridge University Press, Cambridge) 27-52.
Stiglitz, J.E., 1976a, Prices and queues as screening devices in competitive markets, IMSSS Technical Report No. 212 (Stanford University).
Stiglitz, J.E., 1976b, Simple formulae for the measurement of inequality and the optimal linear income tax, IMSSS Technical Report No. 215 (Stanford University).
Stiglitz, J.E., 1977, Monopoly, non-linear pricing and imperfect information: The insurance market, Review of Economic Studies 44, 407-430.
Stiglitz, J.E., 1982a, Alternative theories of wage determination and unemployment: The efficiency wage model, in: M. Gersovitz, C.F. Diaz-Alejandro, G. Ranis and M. Rosenzweig, eds., The theory and experience of economic development: Essays in honor of Sir Arthur W. Lewis (George Allen & Unwin, London) 78-106.
Stiglitz, J.E., 1982b, Self-selection and Pareto efficient taxation, Journal of Public Economics 17, 213-240.
Stiglitz, J.E., 1982c, Utilitarianism and horizontal equity: The case for random taxation, Journal of Public Economics 18, 1-33.
Stiglitz, J.E., 1983, Some aspects of the taxation of capital gains, Journal of Public Economics 21, 257-294.
Stiglitz, J.E., 1985a, The general theory of tax avoidance, National Tax Journal 38, 325-338.
Stiglitz, J.E., 1985b, Inequality and capital taxation, IMSSS Technical Report No. 457 (Stanford University).
Stiglitz, J.E., 1986a, Economics of the public sector (W.W. Norton, New York).
Stiglitz, J.E., 1986b, Theories of wage rigidities, in: J.L. Butlerewicz, K.J. Koford and J.B. Hiller, eds., Keynes' economic legacy (Praeger Publishers, New York).
Stiglitz, J.E. [forthcoming], The wage-productivity hypothesis: Its economic consequences and policy implications, paper presented to the American Economic Association, December 1982, and forthcoming in: M. Boskin, ed., Essays in honor of A. Harberger.
Stiglitz, J.E. and P. Dasgupta, 1971, Differential taxation, public goods, and economic efficiency, Review of Economic Studies 38, 151-197.
Varian, H., 1980, Redistributive taxation as social insurance, Journal of Public Economics 14, 49-68.
Chapter 16
TAX INCIDENCE

LAURENCE J. KOTLIKOFF AND LAWRENCE H. SUMMERS

National Bureau of Economic Research
Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B.V. (North-Holland)

0. Introduction

The incidence of taxes is a fundamental question in public economics. The study of tax incidence is, broadly defined, the study of the effects of tax policies on the distribution of economic welfare. It bridges both the positive and normative aspects of public economics. Studying tax incidence requires characterizing the effects of alternative tax measures on economic equilibria. Tax policy decisions are based, at least in part, on their effects on the distribution of economic welfare. It is, therefore, little wonder that the study of tax incidence has attracted the attention of economic theorists at least since Ricardo's discussion of taxes on rent. Certainly the study of the incidence of different types of tax policies in various economic environments continues to be an area of active research.

The distinctive contribution of economic analysis to the study of tax incidence has been the recognition that the burden of taxes is not necessarily borne by those upon whom they are levied. In general, the introduction of taxes, or changes in the mix of taxes, changes the economy's equilibrium. Prices of goods and rewards to factors are altered by taxes. In assessing the incidence of tax policies, it is necessary to take account of these effects. Changes in prices can lead to the shifting of taxes. Thus, for example, a tax on the hiring of labor by business may be shifted backwards to laborers in the form of lower wages or forward to consumers in the form of higher prices.

The measurement of tax incidence is not an accounting exercise; rather it is an analytical characterization of economic equilibria under alternative assumptions about taxation. Tax incidence is a part of the very broad study of how exogenous interventions affect the economy and is necessarily predicated on a theory of economic equilibrium. As such, tax incidence conclusions are critically dependent on which
theory of economic equilibrium is chosen. We follow the main thrust of the literature in studying the effects of taxes in competitive economies where markets clear. This assumption has been adopted extensively not so much for its realism as because of the absence of widely accepted, fully articulated alternatives to the competitive paradigm.

Even maintaining the assumption of perfect competition and market clearing, the question of tax incidence has been approached in many ways. This is a consequence of both the richness of the problem and the generality of models of competitive equilibrium. The incidence of a wide variety of tax instruments, ranging from estate taxes, to excise taxes, to the corporate tax, is of interest. Incidence has many dimensions that have attracted attention. These include the effects of taxes on the distribution of factor incomes, the degree of income inequality, the welfare of members of different generations, and the consumers of different products. Given the choice of model and the issue of concern, tax incidence results are generally ambiguous without additional restrictions on the precise nature of preferences and technology. As Arrow and Hahn (1971) emphasize, without further assumptions, virtually all comparative static experiments have ambiguous effects in the standard model of competitive equilibrium.

Our survey of the tax incidence literature is organized as follows. Section 1 reviews traditional approaches to the study of tax incidence. These include partial equilibrium analyses and studies based on the judgmental allocation of tax burdens to different population groups. A number of issues in the tax incidence literature, including the proper alternative to be considered in defining the incidence of a tax and the dimensions along which incidence should be measured, are also discussed. Section 2 introduces the general equilibrium analysis of tax incidence.
A general equilibrium approach is necessary to treat the incidence of taxes which impact on large parts of the economy. The sensitivity of judgments about the incidence of taxation to a large number of elasticities is stressed. Section 3 takes up the incidence of taxes in open economies. These may be thought of either as regions within a single country, or as different nations. We show that the incidence of alternative taxes hinges on what is assumed mobile. We also discuss the controversial question of the incidence of local property taxes. Section 4 places the analysis of tax incidence in a dynamic context. This step is important for several reasons. The long-run incidence of any tax change will depend critically on its effects on capital accumulation and the resulting marginal productivities of capital and labor. The short-run burdens borne by the owners of productive assets will depend, in large part, on the instantaneous revaluation of these assets arising from both current and anticipated future tax changes. Perhaps most importantly, studying the intergenerational incidence of tax changes (the tax burden on successive generations) requires a dynamic model.
1. Preliminaries
1.1. The partial equilibrium analysis of tax incidence
Many of the fundamental principles of tax incidence may be illustrated in the simplest partial equilibrium setting. We therefore begin by considering the partial equilibrium analysis of an excise tax on a product. As discussed below, for partial equilibrium analysis to be appropriate, it is necessary that the product in question have a market that is small relative to the entire economy. The analysis is depicted in Figure 1.1. In the absence of taxation, equilibrium is attained where supply equals demand and the equation

D(p) = S(p),  (1.1)

is satisfied. Now consider the introduction of an excise tax at rate τ. If the tax is collected from buyers, the new equilibrium will satisfy the condition that

D(p′ + τ) = S(p′),  (1.2)

while if the tax is collected from sellers, the new equilibrium will satisfy the condition that

D(p″) = S(p″ − τ).  (1.3)
Comparing equations (1.2) and (1.3), it is clear that the determination of the
Figure 1.1. Incidence of a commodity tax.
equilibrium quantity, the price paid by consumers, and the net of tax receipts of producers does not depend on which side of the market the tax is levied, i.e. p″ = p′ + τ. Diagrammatically, the imposition of a tax can be analyzed as a shift in the demand curve facing suppliers, as in Figure 1.1, or as a shift in the supply curve facing consumers. The resulting equilibrium is the same in both instances. This principle, that the incidence of a tax does not depend on which side of the market it is levied, carries over to much more general contexts and underlies the general equilibrium tax equivalence results that we discuss below.

It follows immediately from this tax equivalence principle that the ultimate incidence of a tax cannot be assessed simply by looking at where the tax is proximately levied. For, as we have seen, shifting the tax assessment between consumers and producers has no real effects. The real equilibrium is invariant to which party the government requires to mail in the tax payment.

In order to examine the incidence of an excise tax we begin by characterizing the change in equilibrium that results from the imposition of the tax. For convenience we think of the tax as being collected from consumers. Differentiating (1.2) yields:

dp/dτ = D′/(S′ − D′),  (1.4)

where D′ = dD(p)/dp, and S′ = dS(p)/dp. More conveniently, we can write:

dp/dτ = ηD/(ηS − ηD),  (1.5)

where ηD represents the elasticity of demand, and ηS represents the elasticity of supply. At this point we are ready to assess the incidence of an excise tax. Consider the changes in consumers' and producers' surplus arising from introducing a small tax. For a small tax the change in consumer surplus equals minus the change in the consumer price times the initial quantity demanded, and the change in producer surplus equals the change in the producer price times the initial quantity supplied. Hence, we have

dCs/dτ = −D(p)·ηS/(ηS − ηD),  (1.6)

and

dPs/dτ = S(p)·ηD/(ηS − ηD),  (1.7)

where Cs and Ps represent, respectively, consumers' and producers' surplus.
Note that for a small tax change starting with no tax, minus the sum of the changes in consumers' and producers' surplus equals the tax revenue, since D(p) = S(p) and the change in tax revenue equals S(p)dτ. In terms of Figure 1.1 the change in consumer surplus is the area abcd, and the change in producer surplus is dcef. For very small taxes the sum of these areas is essentially abef, the tax revenue.

Equations (1.6) and (1.7) illustrate a fundamental principle that will recur frequently in our discussion of tax incidence: taxes tend to be borne by inelastic suppliers or demanders. It is instructive to consider the limiting cases of (1.6) and (1.7). If demand is completely inelastic or supply perfectly elastic, consumers will bear the entire burden of an excise tax. Conversely, if supply is perfectly inelastic or demand is perfectly elastic, the entire excise tax will be borne by suppliers. More generally, taxes are borne by those who cannot easily adjust. The greater buyers' abilities to substitute other commodities for the taxed commodity, the greater their ability to shift taxes. Likewise, if producers have no fixed factors and can leave an industry where taxes are being levied, their supply curve is perfectly elastic and the tax must be borne by consumers. For if sellers were forced to bear the tax, they would earn a sub-normal rate of return, leading them to cease production. Hence, in the new equilibrium producers receive the same price for producing as in the old equilibrium, while the price paid by consumers rises by the full amount of the tax.

While this analysis accurately describes the effects of introducing an excise tax in a small market where there are no pre-existing distortions, it is difficult to extend to other cases. In general, shifts in the demand curve such as would be caused by a tax change will be associated with changes in the demand for other products.
This will alter their prices leading to changes in factor prices which will affect the position of both the supply and demand curves in Figure 1.1. In considering taxes which affect a large part of the economy, it is therefore necessary to adopt a general equilibrium perspective rather than the partial equilibrium view taken above. Two principles which emerge from this partial equilibrium analysis will remain valid. First, tax incidence does not depend on which side of a market the tax is assessed. Second, taxes will be shifted by those agents and factors that are more elastic in supply or demand.
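The division of the burden implied by (1.5)–(1.7) is easy to check numerically. The sketch below is purely illustrative: the function name and the elasticity values are ours, not the chapter's.

```python
# Numerical check of the incidence formulas (1.5)-(1.7) for a small excise
# tax collected from consumers. Elasticity values are hypothetical.

def incidence_shares(eta_d, eta_s):
    """Return (consumer share, producer share) of a small excise tax,
    where eta_d < 0 is the demand elasticity and eta_s > 0 is the supply
    elasticity; shares follow from (1.6) and (1.7)."""
    consumer = eta_s / (eta_s - eta_d)   # -dCs/dtau as a fraction of revenue
    producer = -eta_d / (eta_s - eta_d)  # -dPs/dtau as a fraction of revenue
    return consumer, producer

# Relatively inelastic demand (eta_d = -0.2) facing elastic supply
# (eta_s = 2.0): consumers bear most of the burden.
c, p = incidence_shares(-0.2, 2.0)
print(c, p)  # the two shares always sum to one
```

In the limiting cases the formulas reproduce the text's conclusions: with eta_d = 0 the consumer share is 1, and with eta_s = 0 the producer share is 1.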
1.2. Methodological issues

Before turning to the general equilibrium analysis of the effects of alternative tax instruments, it is useful to comment on several important methodological issues which arise in the study of tax incidence. The first question that arises is how incidence should be measured. A natural, but difficult to implement, answer is that the incidence of a tax change can be assessed by looking at the compensating
variation associated with the tax change for each participant in the economy. As discussed in detail in Alan Auerbach's contribution to this Handbook (Chapter 2), the compensating variation provides a dollar measure of the impact of a given tax change on individual economic welfare. It equals the additional income needed to restore the consumer to his or her initial utility level given a change in consumer prices. The compensating variation may be computed as follows:

CV = e(q¹, u₀) − e(q⁰, u₀),  (1.8)

where e(·) is the minimum expenditure function, which depends on the consumer price vector q, and the initial utility, u₀. In (1.8) q¹ is the post-tax price vector, and q⁰ is the pre-tax price vector. Alternatively, similar measures discussed by Auerbach, such as the equivalent variation that replaces u₀ in (1.8) with the post-tax utility level, may be used in assessing tax reforms.

While the computation of the compensating variation associated with a tax change for each affected person reveals its incidence, actual compensating variations are not explicitly calculated in most theoretical or empirical work on tax incidence. In part, this is because of the difficulty involved in specifying individual expenditure functions. It is also due to the fact that most studies of tax incidence focus on the effects of taxes on different classes of individuals. Studies concerned with vertical equity focus on the differential effects of taxes on after-tax incomes of persons with varying levels of pre-tax income. Much of the literature studies the effects of alternative tax policies on the after-tax payments to different factors. This is thought to indicate how different tax changes affect workers, capitalists, land owners, etc. Other dimensions along which incidence is sometimes measured include region and date of birth. Intergenerational issues have been a particularly active subject of research in recent years. Studies of incidence along all these dimensions are best thought of as providing indications of salient features in the distribution of compensating variations associated with alternative tax policies.

In the partial equilibrium incidence calculation described above no attention was devoted to the question of what is done with the revenue raised by the excise tax. Specifying the use of the revenue was unnecessary given the partial equilibrium character of the analysis.
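For simple preferences the expenditure function in (1.8) has a closed form, so the compensating variation can be computed directly. The sketch below assumes Cobb–Douglas utility; the prices, income, and tax rate are hypothetical values chosen for illustration.

```python
# Compensating variation (1.8) under Cobb-Douglas utility u = x**a * y**(1-a).
# For these preferences e(p, u) = u * (px/a)**a * (py/(1-a))**(1-a).

def expenditure(px, py, u, a):
    """Minimum expenditure e(p, u) for Cobb-Douglas preferences."""
    return u * (px / a) ** a * (py / (1 - a)) ** (1 - a)

def indirect_utility(px, py, m, a):
    """Utility attained with income m at prices (px, py)."""
    return m * (a / px) ** a * ((1 - a) / py) ** (1 - a)

a, m = 0.5, 100.0        # expenditure share on x; income
q0 = (1.0, 1.0)          # pre-tax price vector
q1 = (1.2, 1.0)          # post-tax prices: a 20% excise tax on good x
u0 = indirect_utility(q0[0], q0[1], m, a)

# CV = e(q1, u0) - e(q0, u0); note that e(q0, u0) equals initial income m.
cv = expenditure(q1[0], q1[1], u0, a) - expenditure(q0[0], q0[1], u0, a)
print(cv)
```

Here the consumer needs roughly 9.5 extra units of income to be restored to the pre-tax utility level; replacing u₀ with the post-tax utility would give the equivalent variation instead.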
Implicitly it was assumed that the revenue was not spent on the taxed good, but no more needed to be specified. In general equilibrium the effects of a tax change will depend on whether the tax is used to finance changes in government spending, rebates to consumers, or changes in other taxes. Thus, the incidence of a tax cannot be considered in isolation. Its incidence must be assessed along with a feasible disposition of tax revenue. This has led to the development of several different incidence concepts in the literature. The term "differential incidence" is used to describe comparisons of the incidence of different tax instruments that raise the same amount of revenue. The
term "balanced budget incidence" is used to describe the effects of increased government spending financed by increases in the tax under consideration. No single concept is uniformly appropriate in assessing the incidence of a tax. The appropriate assumption about the use of tax revenue will depend on the nature of the tax reform being studied. In discussing tax incidence, however, it is important to be clear about the nature of the experiment being considered.
1.3. Empirical incidence evaluations

Analytical work on tax incidence in various economic models of the type surveyed in this chapter has been complemented by empirical studies that have sought to evaluate the overall incidence of the tax system. Major studies of this type include Pechman and Okner (1974), Musgrave and Musgrave (1976), Browning and Johnson (1979), and Pechman (1985). Their approach, and that of other authors working in this tradition, is to postulate an incidence for each type of tax, and then use microdata on individuals to calculate the distribution of total tax burdens by income class. The finding of many of these studies is that despite the apparent progressivity of the income tax, the share of income paid in taxes does not increase rapidly with income. According to these studies the total tax system appears to be only slightly progressive over much of the income range. If one considers taxes net of transfers, the system appears highly progressive at the lower income range.

A number of problems have been raised with empirical incidence studies of this sort. Perhaps most serious is the need to make assumptions about the ultimate incidence of various types of taxes. As our subsequent analysis will make clear, this is very difficult to determine. And conclusions about the effects of taxation on the distribution of income can be no better than the judgments about the incidence of individual taxes on which they are based. Devarajan, Fullerton, and Musgrave (1982) do, however, provide some evidence suggesting that the results of judgmental studies are fairly close to those of full-scale general equilibrium simulation exercises of the type described below. A related problem with these incidence evaluations is that they measure taxes collected by the government from various income classes, but these tax collections may differ substantially from the tax burdens that are imposed on them.
A good example is provided by the municipal interest exclusion in the United States. Taxpayers in high tax brackets tend to hold municipal bonds which are tax-free. But these bonds have lower yields than taxable bonds. Hence, high bracket taxpayers bear a burden imposed by the tax system even without transferring revenue to the government. Similar reasoning applies to investments in any tax-favored activity.
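The burden borne through the lower yield can be made explicit. In the sketch below, the yields and the marginal tax rate are invented for illustration.

```python
# Implicit tax on a tax-exempt municipal bond: the holder pays nothing to
# the government but sacrifices yield relative to a taxable bond.
# All rates are hypothetical.

def implicit_tax_rate(taxable_yield, exempt_yield):
    """Yield given up, as a fraction of the taxable yield."""
    return (taxable_yield - exempt_yield) / taxable_yield

r_taxable, r_muni = 0.10, 0.07
implicit = implicit_tax_rate(r_taxable, r_muni)   # a 30% implicit tax

# A 40%-bracket investor still prefers the muni (7% beats the 6% after-tax
# taxable return) yet bears a burden that a collections-based incidence
# study would record as zero.
after_tax_taxable = r_taxable * (1 - 0.40)
print(implicit, after_tax_taxable)
```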
An understanding of the distributional impact of the tax system must be predicated on an understanding of its general equilibrium effects. We now turn to this question.
2. Static general equilibrium models of tax incidence

Static models of tax incidence take the economy's aggregate supplies of productive factors, such as physical capital, as given, and consider changes in equilibrium prices arising from commodity and factor taxes. While ignoring the intertemporal issue of human and nonhuman capital formation, static models can provide considerable insight into the incidence of taxation in the short run, i.e. before capital stocks have adjusted to changes in after-tax prices. In addition, many of the conclusions of static tax analysis can be directly applied to the case of long-run dynamic incidence.

In contrast to the implicit model underlying Figure 1.1, in which producers and consumers of goods are distinct agents, this section examines models in which all agents have identical consumption preferences, i.e. the producers (actually the owners of productive factors) are also the consumers. Hence, interest shifts from the question of whether producers or consumers bear the burden of a tax to the question of the division of the tax burden among the owners of productive factors. As one would expect, the aggregate supply elasticities of and demand elasticities for particular factors play key roles in the analysis of the incidence of uniform commodity and factor taxation. In the case of differential commodity taxes or industry-specific factor taxes the structure of industrial demands for factors is of great importance for incidence; the ultimate incidence of industry-specific factor taxes depends not only on the demand conditions in taxed industries, which will typically attempt to reduce their demand for the taxed factor, but also on the demand conditions for that factor in untaxed industries, which will absorb factors released from the taxed sector.
2.1. Tax incidence in a one-sector general equilibrium model

2.1.1. Factor taxes

A one-sector general equilibrium model is useful for highlighting several of the main results in static tax incidence analysis. Let X be the economy's single commodity, which is produced using labor, L, and capital, K. The linear
homogeneous production function is given by X= F(K, L),
(2.1)
where FL > 0 and FK > 0. The supply of capital, K, is totally inelastic in the short run, but the supply of labor is positively related to the real wage, W/P, where W is the wage rate, and P is the price of good X: L = L ( W/P),
(2.2)
where L′(W/P) > 0. In competitive equilibrium, marginal revenue products are equated to factor costs, giving:

PFL(K, L) = W  (2.3)

and

PFK(K, L) = r,  (2.4)

where r is the rental rate on capital. Equating labor supply and labor demand in (2.3) determines the equilibrium real wage and the equilibrium level of labor, i.e.

FL(L(W/P), K) = W/P.  (2.5)
Given the equilibrium level of L, r/P, the real rental on capital, is determined by (2.4).

Consider first the incidence of a tax at rate τ on the rental of capital. Equation (2.4) becomes:

PFK = r(1 + τ).  (2.4′)

Since neither the supply of nor the demand for labor is affected by this tax, the equilibrium values of W/P, L, and FK are unaffected. Capital, in this case, bears the full burden of the tax, since its real rental, r/P, falls from FK to FK/(1 + τ). The change in the real rents received by owners of capital is [FK − FK/(1 + τ)]K, which equals τ(rK/P), the real taxes paid on the rental of capital. The entire incidence of the tax falls on capital since it is perfectly inelastic in supply to the economy and since it is taxed at the same rate in all its uses, which in this case is a single use, namely the production of X.

The results are different in the case of taxing elastically supplied labor at rate τ. Producers now equate the marginal revenue product of labor to the after-tax cost of hiring labor, i.e.

PFL = W(1 + τ).  (2.3′)
The demand and supply relations, equations (2.3′) and (2.2), respectively, determine the new equilibrium wage. The percentage change in W/P arising from an increase in τ, evaluated at τ = 0, is given by

[∂(W/P)/∂τ]/(W/P) = ηD/(ηS − ηD),  (2.6)

where ηS is the (positive) elasticity of labor supply, and ηD is the (negative, since FLL < 0) elasticity of labor demand. Evaluated at τ = 0, the change in real tax revenue (τ(W/P)L) is simply (W/P)L. The marginal losses in rents to labor, [∂(W/P)/∂τ]L, and to capital, [∂(r/P)/∂τ]K, expressed as ratios of the marginal tax revenue, (W/P)L, are given by

[∂(W/P)/∂τ]L/[(W/P)L] = ηD/(ηS − ηD)  (2.7)

and

[∂(r/P)/∂τ]K/[(W/P)L] = −ηS/(ηS − ηD).  (2.8)
Note that the two ratios sum to −1, indicating that the full incidence of the tax falls on either capital or labor. If labor supply is perfectly inelastic, ηS = 0, labor bears the full burden of the tax, i.e. the right-hand sides of (2.7) and (2.8) are −1 and 0, respectively. This result also holds if the demand for labor is perfectly elastic (ηD = −∞). At the opposite extreme labor may be perfectly elastic in supply (ηS = ∞), or the demand for labor may be perfectly inelastic (ηD = 0), in which case capital bears the full burden of the tax. Note that capital always bears some burden of the tax provided ηS ≠ 0 and ηD ≠ −∞. In general, the larger the supply elasticity and the smaller the demand elasticity, the larger will be the share of the tax burden shifted to capital.

The elasticity of labor demand is related to the degree of substitutability between capital and labor. Equation (2.9) expresses this relationship for the case of a linear homogeneous production function, where σ is the elasticity of substitution of capital for labor with respect to increases in the wage-rental ratio, and θK is capital's share of output (FKK/F):

ηD = −σ/θK.  (2.9)
As σ → ∞, i.e. as capital and labor become closer substitutes in production, ηD → −∞. Alternatively, as capital and labor approach perfect complementarity, σ → 0 and ηD → 0. Intuitively, when σ = ∞, capital and labor are perfect substitutes; hence, their after-tax user costs to competitive firms must be identical, and W(1 + τ) = r. But, assuming constant returns to scale, the production function in this case is of the form F(K, L) = a(K + L), where the constant a equals both FK and FL. Since FK = a = r/P = (W/P)(1 + τ), W/P falls by the full amount of the tax. If, on the other hand, σ = 0, firms view their input as a fixed combination of capital and labor. From their perspective a tax on labor is identical to a tax on capital as long as it raises the cost of the fixed capital and labor input bundle by the same percentage. Since any reduction in the real wage, W/P, received by labor will reduce the supply of labor [assuming L′(W/P) > 0] and make part of the capital stock redundant, the equilibrium involves no change in W/P and a fall in r/P such that the loss to capitalists in real rents equals the tax revenue.

2.1.2. Commodity taxation

A proportional tax at rate τ on the consumption of good X creates a divergence between the price, P, paid by the consumer and the price P/(1 + τ) received by producers. With a consumption tax the factor demand conditions are written as

(P/(1 + τ))FK(L(W/P), K) = r  (2.4″)

and

(P/(1 + τ))FL(L(W/P), K) = W.  (2.5″)
Multiplying (2.4″) and (2.5″) by (1 + τ) reveals that a proportional consumption tax is structurally equivalent to a uniform proportional tax on factor inputs. A further equivalence result can be demonstrated by rewriting (2.4″) and (2.5″) in terms of the producer price q, where q = P/(1 + τ), and the tax rate τ*, where τ* = τ/(1 + τ):

qFK(L(W(1 − τ*)/q), K) = r  (2.4‴)

and

qFL(L(W(1 − τ*)/q), K) = W.  (2.5‴)
Equations (2.4‴) and (2.5‴) describe the identical tax structure as a uniform proportional income tax at rate τ* on wages and capital rents; here r is the pre-tax return to capital paid by firms, while r(1 − τ*) is the post-(income) tax return to capital. The equivalence in a static model between a uniform consumption tax, a uniform tax on factor returns (a value added tax), and an income tax is a general result that applies independent of the number of sectors and factors [Break (1974), Musgrave (1959), and McLure (1975)].¹

Using (2.4″) and (2.5″) the incidence of the commodity tax expressed as a fraction of real marginal tax revenue (X) is

[∂(W/P)/∂τ]L/X = θL·ηD/(ηS − ηD)

and

[∂(r/P)/∂τ]K/X = −θK − θL·ηS/(ηS − ηD),  (2.10)

where θL is labor's share of output. If ηS = 0 or ηD = −∞, capital and labor bear the burden of the tax in proportion to their respective shares of output. If ηD = 0 or ηS = ∞, capital bears the entire burden of the tax.
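A small computation ties together (2.7), (2.8) and (2.9). The elasticity and share values below are hypothetical, chosen only to illustrate the mechanics.

```python
# Division of the burden of a labor tax in the one-sector model, using
# (2.7)-(2.8) with the demand elasticity eta_D = -sigma/theta_K from (2.9).
# Parameter values are hypothetical.

def labor_tax_shares(eta_s, sigma, theta_k):
    """Return (labor, capital) losses per unit of marginal tax revenue;
    by (2.7)-(2.8) the two entries sum to -1."""
    eta_d = -sigma / theta_k            # equation (2.9)
    labor = eta_d / (eta_s - eta_d)     # equation (2.7)
    capital = -eta_s / (eta_s - eta_d)  # equation (2.8)
    return labor, capital

# Cobb-Douglas technology (sigma = 1), capital share 0.25, and a labor
# supply elasticity of 0.2: labor bears most, but not all, of the burden.
lab, cap = labor_tax_shares(eta_s=0.2, sigma=1.0, theta_k=0.25)
print(lab, cap)
```

With a perfectly inelastic labor supply (eta_s = 0) the first entry is −1 and the second is 0, reproducing the limiting case discussed above.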
2.2. Tax incidence in a two-sector general equilibrium model

The static two-sector general equilibrium model permits the analysis of sector-specific factor taxes. Harberger's (1962) seminal analysis of the incidence of the corporate income tax focused attention on the elasticity of supplies of domestically mobile factors to particular industries, with such mobile factors able to switch industries in an attempt to avoid taxation. The research also identifies industry-specific factor demand conditions as critical for determining the incidence of a sector-specific factor tax. As indicated in the discussion of one-sector factor tax incidence, factor demand elasticities are closely related to elasticities of substitution in production. A variety of taxes can be analyzed in this model. A thorough review is presented in McLure (1975), who also discusses a number of tax equivalence

¹The equivalence also holds in a dynamic model provided final output in the investment goods industries is treated as consumption under the consumption tax and that purchases of newly produced investment goods are not deductible under the value added tax.
propositions. Since both capital and labor are assumed to be inelastically supplied, the incidence of a general factor tax is trivial: it is borne entirely by the taxed factor. The incidence of excise taxes on one of the commodities is more complex and is considered below. We focus on the case of a sector-specific tax such as the corporate tax. The two-sector general equilibrium model is the simplest framework in which such a tax can be investigated. It highlights the wide range of theoretically possible effects of a tax even in a very simple setting. A tax may actually benefit the taxed factor, or the taxed factor may lose more than the amount of tax revenue collected. Intermediate outcomes are also possible.

2.2.1. Assumptions of the model

The two-factor, two-sector model examined by Harberger assumes that economy-wide supplies of the two factors, capital and labor, are perfectly inelastic, but that capital and labor are perfectly mobile between sectors. The two industries are perfectly competitive, and the production functions of both industries exhibit constant returns to scale. The assumption that workers, owners of capital, and the government have identical homothetic preferences ensures that the redistribution between the three demanders of goods arising from the sector-specific factor tax has no impact on the aggregate demands for the two goods. The analysis thus abstracts from changes in the relative demands for the two goods that could arise if workers', capitalists', and the government's preferences differ.

2.2.2. The model

Denote the two industries as X and Y. The constant returns assumption permits the production functions to be written in the intensive form:

X = Lx f(kx) = Kx f(kx)/kx,
Y = Ly g(ky) = Ky g(ky)/ky,  (2.11)

where Lx and Ly are, respectively, the quantities of labor used to produce goods X and Y, and kx and ky are the respective capital-labor ratios in industries X and Y. The function f(·) expresses the ratio of output of X to Lx, and g(·) is, similarly, output of Y per worker in industry Y. The respective economy-wide supplies of labor and capital, L and K, must equal their respective demands:

alx·X + aly·Y = L,
akx·X + aky·Y = K,  (2.12)
where alx, akx, aly, and aky are input-output coefficients and are implicitly defined in (2.11). The equalities, required by competition in factor markets, between marginal revenue products and post-tax factor costs may be written as

Px = [w + (r + τkx)kx]/f,
Px·f′ = r + τkx,
Py = (w + r·ky)/g,
Py·g′ = r,  (2.13)
where Px and Py are the respective prices of goods X and Y, w and r are the respective net rentals to labor and capital, and τkx is the tax on the rental of capital in the X industry. The assumption of identical homothetic preferences of workers, capitalists and the government implies that aggregate demands can be written as

X = m(Px/Py)·I,
Y = n(Px/Py)·I,  (2.14)

where I is total disposable income of the private sector plus the government's tax revenues, i.e.

I = wL + rK + τkx·Kx.  (2.15)
The expressions (2.12), (2.13), and (2.14) provide eight equations in eight unknowns, X, Y, w, r, kx, ky, Px, and Py. One of these equations is, however, redundant, implying that one can only solve for relative prices. Specifically, the expressions in (2.14) satisfy

PxX + PyY = I = wL + rK + τkx·Kx.  (2.16)
The first equality in (2.13) implies PxX = wLx + (r + τkx)Kx. This expression and (2.16) imply Py = (w + r·ky)/g, the third equation in (2.13).

2.2.3. The incidence of a tax on capital in industry X

Following the derivation in Atkinson and Stiglitz (1980), which employs Jones' (1965) technique of considering "equations of change", one can reduce the eight equations in (2.12), (2.13), and (2.14) to three equations in the percentage changes in the three ratios X/Y, Px/Py, and w/r. Let ẑ denote the proportional change in a variable z. From (2.13) we have Px·f′ = Py·g′ + τkx. Totally differentiating
this expression yields:

P̂x − P̂y = (g″ky/g′)k̂y − (f″kx/f′)k̂x + dτkx/r.  (2.17)

Expression (2.13) also implies:

w/(r + τkx) = (f − f′kx)/f′

and

w/r = (g − g′ky)/g′.

Total differentiation of these expressions gives:

(f″kx/f′)k̂x = −θwx(ŵ − r̂ − dτkx/r),

(g″ky/g′)k̂y = −θwy(ŵ − r̂),  (2.18)

where θwx and θwy are the respective labor shares in producing X and Y [e.g. θwx = (f − f′kx)/f]. Equations (2.17) and (2.18) imply:

P̂x − P̂y = (θwx − θwy)(ŵ − r̂) + θrx(dτkx/r).  (2.19)

In (2.19) θrx is capital's cost share in X. The equation indicates that the relative price of X rises with increases in the wage-rental ratio if labor's cost share in X exceeds its cost share in Y.

We next consider percentage changes in factor demands. Total differentiation of (2.12) yields:

(âlx + X̂)λlx + (âly + Ŷ)λly = 0,

(âkx + X̂)λkx + (âky + Ŷ)λky = 0,  (2.20)
where λ_lx = 1 − λ_ly is the share of total labor supply used in producing X, and λ_kx = 1 − λ_ky is X's corresponding share of total capital. From the definitions of a_lx, a_ly, a_kx, and a_ky implied by (2.11) and (2.12), it is easy to show that

â_lx = −θ_rx σ_x(ŵ − r̂ − dτ_kX/r),
â_kx = θ_wx σ_x(ŵ − r̂ − dτ_kX/r),
â_ly = −θ_ry σ_y(ŵ − r̂),
â_ky = θ_wy σ_y(ŵ − r̂),   (2.21)

where σ_x and σ_y are the respective elasticities of substitution of capital for labor in the two industries [e.g. σ_x = k̂_x/(ŵ − r̂ − dτ_kX/r)]. Since λ_lx − λ_kx = (1 − λ_ly) − (1 − λ_ky) = λ_ky − λ_ly, subtracting the bottom equation in (2.20) from the top equation gives
(λ_lx − λ_kx)(X̂ − Ŷ) = λ_kx â_kx − λ_lx â_lx + λ_ky â_ky − λ_ly â_ly.   (2.22)
Substituting equation (2.21) into (2.22) yields

(λ_lx − λ_kx)(X̂ − Ŷ) = [(θ_wx λ_kx + θ_rx λ_lx)σ_x + (θ_wy λ_ky + θ_ry λ_ly)σ_y](ŵ − r̂)
   − [(θ_wx λ_kx + θ_rx λ_lx)σ_x](dτ_kX/r).   (2.23)
If X is more labor intensive than Y, k_y > k_x, which implies λ_lx > λ_kx, and increases in the wage-rental ratio are associated with increases in the ratio of X to Y.2 The corresponding equation for the change in the relative demand for X and Y is obtained by differentiating the ratio of the demand for X to the demand for Y determined by (2.14):

X̂ − Ŷ = −η(P̂_x − P̂_y).   (2.24)
In (2.24), η is the elasticity of the demand for X relative to the demand for Y with respect to the relative price ratio; η is positive since both X and Y are assumed to be normal goods. Equations (2.19), (2.23), and (2.24) are three equations in the unknowns X̂ − Ŷ, ŵ − r̂, and P̂_x − P̂_y. The solution for ŵ − r̂ is given by

(ŵ − r̂)[(λ_lx − λ_kx)η(θ_wx − θ_wy) + (θ_wx λ_kx + θ_rx λ_lx)σ_x + (θ_wy λ_ky + θ_ry λ_ly)σ_y]
   = [(θ_wx λ_kx + θ_rx λ_lx)σ_x − η(λ_lx − λ_kx)θ_rx](dτ_kX/r),   (2.25)

where the bracketed term on the left-hand side of (2.25) multiplying (ŵ − r̂) is positive, since (λ_lx − λ_kx) and (θ_wx − θ_wy) have the same sign.3

2 To prove that k_y > k_x implies λ_lx > λ_kx, use the two equations λ_lx k_x + λ_ly k_y = k and λ_kx/k_x + λ_ky/k_y = 1/k, where k = K/L. Substituting out for k and some algebra yields the result.
3 Footnote 2 indicates that λ_lx − λ_kx > 0 if k_y > k_x. But θ_wx − θ_wy = (wrL_xL_y/P_xXP_yY)(k_y − k_x), so (λ_lx − λ_kx) and (θ_wx − θ_wy) have the same sign.
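As a check on the algebra, the system (2.19), (2.23), (2.24) can be solved numerically and compared with the closed form (2.25). The following sketch is not part of the original chapter; all parameter values are hypothetical, chosen only to be internally consistent (cost shares are computed from assumed factor prices and intensities).

```python
# Solve (2.19), (2.23), (2.24) as a linear system in (w^-r^, Px^-Py^, X^-Y^)
# and compare with the closed-form solution (2.25). Hypothetical parameters.

def det3(m):
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))

def cramer3(M, b):
    d = det3(M)
    sol = []
    for j in range(3):
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = b[i]
        sol.append(det3(Mj) / d)
    return sol

# A consistent no-tax equilibrium: w = r = 1, k_x = 0.5, k_y = 2, L_x = 0.4.
w, r, kx, ky, Lx = 1.0, 1.0, 0.5, 2.0, 0.4
Ly = 1.0 - Lx
Kx, Ky = kx * Lx, ky * Ly
theta_wx, theta_wy = w / (w + r * kx), w / (w + r * ky)  # labor cost shares
theta_rx = 1.0 - theta_wx
lam_lx, lam_kx = Lx, Kx / (Kx + Ky)                      # factor allocations

sig_x, sig_y, eta = 0.8, 0.6, 1.2     # substitution and demand elasticities
t = 0.01                              # d tau_kX / r

Sx = (theta_wx * lam_kx + theta_rx * lam_lx) * sig_x
Sy = (theta_wy * (1 - lam_kx) + (1 - theta_wy) * (1 - lam_lx)) * sig_y
a = theta_wx - theta_wy
delta = lam_lx - lam_kx

# Unknowns: u = w^ - r^, q = Px^ - Py^, x = X^ - Y^.
M = [[-a, 1.0, 0.0],            # (2.19):  q = a*u + theta_rx*t
     [-(Sx + Sy), 0.0, delta],  # (2.23):  delta*x = (Sx + Sy)*u - Sx*t
     [0.0, eta, 1.0]]           # (2.24):  x = -eta*q
b = [theta_rx * t, -Sx * t, 0.0]
u, q, x = cramer3(M, b)

D = delta * eta * a + Sx + Sy                     # LHS bracket in (2.25)
u_closed = (Sx - eta * delta * theta_rx) * t / D  # (2.25) solved for u

assert abs(u - u_closed) < 1e-12
```

The direct linear-system solution and the closed form agree to machine precision, as they must, since (2.25) is obtained from the same three equations by elimination.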
In evaluating the incidence of the tax, we choose I, the level of total nominal income, as the numeraire. With I fixed, we have, from (2.16),

dwL + drK = −K_x dτ_kX.   (2.26)
The right-hand side of (2.26) is the negative of the marginal tax revenue, evaluated at τ_kX = 0. The equation indicates that the burden of the tax equals the burden on labor, dwL, plus the burden on capital, drK. This equation can be rewritten as

θ_l ŵ + θ_k r̂ = −θ_k (K_x/K)(dτ_kX/r),   (2.26′)
where θ_l and θ_k are labor's and capital's respective shares of total income. Obviously, if ŵ = 0, the tax is fully borne by capital, and vice versa if r̂ = 0. Substituting for ŵ from (2.26′) into (2.25) and expressing capital's burden as a share of the total burden gives

−drK/(dτ_kX K_x) = θ_k + (θ_l/λ_kx D)[(θ_wx λ_kx + θ_rx λ_lx)σ_x − η(λ_lx − λ_kx)θ_rx],   (2.27)
where D is the bracketed term on the left-hand side of (2.25) multiplying (ŵ − r̂).

Consider, first, conditions under which the right-hand side of (2.27) is unity, i.e. capital bears the entire burden of the tax. A sufficient condition for this result is that both industries have the same elasticities of substitution and the same initial factor proportions. In this case λ_lx = λ_kx and λ_ly = λ_ky (see footnote 2), and the bracketed term in (2.27) equals λ_kx D, implying that the right-hand side of (2.27) equals θ_k + θ_l = 1. An alternative condition sufficient for capital to bear the entire tax burden is that all three substitution elasticities, σ_x, σ_y, and η, are equal. In this case the bracketed term in (2.27) again equals λ_kx D.

If factor intensities are initially equal (implying λ_lx = λ_kx) and if σ_x = 0, capital's share of the tax burden equals its share of total income, θ_k. Intuitively, if σ_x = 0, capital and labor are used in fixed proportions in X, and there is effectively only a single factor. The tax on capital in X, which in this case is equivalent to simply taxing X, raises the relative price of X, leading to a reduction in its demand. To accommodate this reduced demand the X industry releases both capital and labor to the Y industry in the proportion in which these factors are used in producing X. If the Y industry is using these factors in the same ratio, then it can expand production of Y to meet its increased demand with no change in factor proportions. Since Y's capital-labor ratio, k_y, is unchanged, the ratio w/r upon which k_y depends [equation (2.13)] is unchanged. Hence, w and r both fall (relative to P_x and P_y) by the same percentage. This same incidence outcome arises in the case that capital and labor are perfect substitutes in producing Y (σ_y = ∞). From industry Y's perspective, capital and labor are identical factors, and it will use both factors only if their net costs are equal. This implies w = r, where we measure capital in units such that one unit of capital is equivalent in production to one unit of labor.

Capital's share of the tax burden can also be less than its income share. A necessary condition for this outcome is that the X industry be more labor intensive than the Y industry. Suppose, for example, σ_x = 0 and λ_lx > λ_kx; then the second term in (2.27) is negative, indicating that capital's share of the tax burden is less than θ_k. In this case there is more labor relative to capital released from X to Y than industry Y is initially using to produce. To accommodate the initial relative excess supply of labor, the wage must fall relative to the rental on capital. It is indeed possible that more than 100 percent of the burden of the tax on capital is shifted onto labor; i.e. the right-hand side of (2.27) can be negative, and capital can be made absolutely better off by the capital tax levied on industry X.

At the other extreme, capital may bear more than 100 percent of the tax burden, i.e. labor may be made better off by the tax. Consider the outcome when capital and labor are perfect substitutes in X. In this case σ_x = ∞, and the demand for capital in X is perfectly elastic [see equation (2.7)]. From (2.27), when σ_x = ∞, capital's share of the tax burden is θ_k + θ_l/λ_kx, which exceeds 1. From the perspective of industry X, capital and labor are identical factors, and the net costs to the industry of hiring labor and capital must be equal (w = r + τ_kX); otherwise there would be an incentive to substitute one factor for the other. Capital bears more than 100 percent of the tax because, relative to the wage, the rental rate of capital declines in both industries by the full amount of the tax.
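The special cases just discussed can be verified by evaluating (2.27) directly. The following sketch is illustrative only; the parameter values are hypothetical and not mutually calibrated to a single economy.

```python
# Evaluate formula (2.27) for capital's share of the burden of a tax on
# capital in industry X, and check the limiting cases discussed in the text.

def capital_share(theta_k, lam_lx, lam_kx, theta_wx, theta_wy,
                  sig_x, sig_y, eta):
    theta_l = 1 - theta_k
    theta_rx = 1 - theta_wx
    Sx = (theta_wx * lam_kx + theta_rx * lam_lx) * sig_x
    Sy = (theta_wy * (1 - lam_kx) + (1 - theta_wy) * (1 - lam_lx)) * sig_y
    D = (lam_lx - lam_kx) * eta * (theta_wx - theta_wy) + Sx + Sy
    N = Sx - eta * (lam_lx - lam_kx) * theta_rx
    return theta_k + theta_l * N / (lam_kx * D)

# Equal initial factor proportions and sigma_x = sigma_y:
# capital bears exactly 100 percent of the burden.
assert abs(capital_share(0.3, 0.4, 0.4, 0.6, 0.6, 0.9, 0.9, 1.5) - 1.0) < 1e-12

# sigma_x = sigma_y = eta: again a share of one, whatever the factor shares.
assert abs(capital_share(0.3, 0.5, 0.2, 0.7, 0.4, 0.8, 0.8, 0.8) - 1.0) < 1e-12

# sigma_x -> infinity: the share approaches theta_k + theta_l/lam_kx > 1.
s_inf = capital_share(0.3, 0.5, 0.2, 0.7, 0.4, 1e8, 0.8, 0.8)
assert abs(s_inf - (0.3 + 0.7 / 0.2)) < 1e-3
```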
2.2.4. The incidence of a commodity tax on good X

Additional insight into formula (2.27) can be obtained by considering Mieszkowski's (1967) observation that capital's share of the tax burden can be divided into output and substitution effects. Consider the term

−θ_l η(λ_lx − λ_kx)θ_rx/(λ_kx D),

which is the second bracketed term in (2.27) multiplied by θ_l/λ_kx D. This expression is precisely capital's share of the burden of a tax on the consumption of good X. To see this, note that in the case of a tax on good X, the relative changes in producer prices and outputs are given by equations (2.19) and (2.23), respectively, with dτ_kX set to zero, and equation (2.24) is modified to

X̂ − Ŷ = −η(P̂_x − P̂_y + dτ_x/P_x),   (2.24′)
where τ_x is the per unit excise tax on good X. In addition, in (2.26) dτ_kX is set equal to zero. These revised versions of (2.19), (2.23), and (2.26), plus (2.24′), can be solved together to derive the formula given above for −drK/(dτ_x X), capital's share of the incidence of an excise tax on good X. Note that if λ_lx = λ_kx, capital's and labor's shares of the tax burden are both zero, i.e. there is no change in the factor incomes received by capital and labor measured at producer prices, although capitalists and workers bear the tax burden as consumers. While income measured at producer prices is the numeraire and, therefore, does not change, real income of consumers falls because of the change in consumer prices. Consider the change in the cost, measured in units of good X, of consuming the initial bundle, X_0, Y_0. Let C denote this cost; then

C = X_0 + (τ_x/P_x)X_0 + (P_y/P_x)Y_0,   (2.28)

and

∂C/∂τ_x = X_0/P_x,   (2.29)
since P_y/P_x is unchanged by τ_x when ŵ − r̂ = 0 [see equation (2.17) with dτ_kX = 0]. Hence, measured at producer prices, the loss in real income to consumers from a marginal increase in τ_x equals the marginal tax revenue. Since capitalists and workers are assumed here to have identical tastes, they bear the burden of the tax in proportion to their shares of total income.

Returning to equation (2.27), note that when σ_x = 0, the second term in (2.27) reduces to the output effect. This is what one would expect, since when σ_x = 0, a tax on capital in industry X is effectively equivalent to an excise tax on good X. If σ_x = 0 and λ_lx = λ_kx, capitalists and workers bear the tax through a decline in their factor incomes rather than through an increase in the consumer price level.

2.2.5. The relationship of welfare changes to the incidence of a sector-specific factor tax

While the preceding analysis described changes in incomes of workers and owners of capital arising from a sector-specific factor tax, a full analysis of the associated welfare changes requires taking account of changes in the prices of the two commodities, P_x and P_y. The assumption that workers and capitalists have identical homothetic preferences implies that they both view the price changes as equivalent to the same specific percentage change in their incomes, which can be positive, negative, or zero depending on the precise changes in prices. For example, the change in the welfare of workers can be calculated from their indirect utility function, V, as

dV_w = λ_w wL[dwL/(wL) − (θ_x P̂_x + θ_y P̂_y)],   (2.30)
where λ_w is the worker's marginal utility of income, and θ_x and θ_y are the respective (and identical for workers and capitalists) expenditure shares on X and Y. A similar expression, involving the terms θ_xP̂_x + θ_yP̂_y, reflecting price changes, and drK/(rK), reflecting income changes, holds for capitalists. Hence, the price-related change in welfare, θ_xP̂_x + θ_yP̂_y, is the same fraction of income for both workers and capitalists, and the incidence terms, dwL and drK, describe differential changes in the welfare of workers and capitalists. Stated differently, workers and capitalists bear the tax both in terms of possible changes in their incomes and, in their roles as consumers, in terms of changes in consumer prices. However, the price changes are common to both and have the same welfare impact as a θ_xP̂_x + θ_yP̂_y percentage change in income, while the direct percentage changes in incomes due to changes in factor payments can be quite different for the two groups. In the special case that the direct percentage changes in incomes of workers and capitalists are equal, one could describe the tax as simply "falling on consumers" through changes in prices with no changes in factor incomes; i.e. since the model's numeraire can be freely chosen, we could choose w or r as numeraire. In the case that both factors share the burden in proportion to their income shares, we have with the new numeraire convention ŵ = r̂ = 0. While nominal factor incomes remain constant, P_x and P_y possibly rise and generate the same reduction in real factor incomes and welfare that arises if I is chosen as the numeraire.

2.2.6. Extension to the case of differences in preferences

Mieszkowski (1967) and Atkinson and Stiglitz (1980) extend the analysis of the incidence of sector-specific factor taxes to the case in which workers' and capitalists' preferences differ. The results can be illustrated with the following example. Let workers and the government consume only X, and let capitalists consume only Y.
Then the new demand equations are

P_x X = wL + τ_kX K_x,   P_y Y = rK,   (2.14′)
and the change in relative demands associated with the tax is given by

X̂ − Ŷ = −(P̂_x − P̂_y) + (ŵ − r̂) + (K_x/wL) dτ_kX.   (2.24′)
Using (2.24′) rather than (2.24), capital's share of the tax burden is

−drK/(dτ_kX K_x) = θ_k + θ_l[(θ_wx λ_kx + θ_rx λ_lx)σ_x − (λ_lx − λ_kx)θ_rx + (λ_lx − λ_kx)(rK_x/wL)] / {λ_kx[D − (λ_lx − λ_kx)]}.   (2.27′)

Equation (2.27′) is identical to (2.27) for η = 1 except for the additional term (λ_lx − λ_kx)(rK_x/wL) in the numerator and the term −(λ_lx − λ_kx) in the denominator. When factor intensities are equal in X and Y we have the same result as above, namely that the incidence of the tax is independent of demand parameters. When λ_lx ≠ λ_kx, demand elements enter, and the demand structure (2.14′) can either reduce or increase the tax burden on capital. When the X industry is capital intensive, the fact that workers and the government spend all of their expanded income on X, while capitalists adjust to their lower incomes simply by reduced consumption of Y, implies a smaller ultimate share of the tax burden falling on capital. On the other hand, if X is labor intensive, the smaller increase in the incomes of workers relative to the λ_lx = λ_kx case feeds back to lower r by more than would occur if demands were identical and homothetic.
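Reading (2.27′) as (2.27) at η = 1 with the extra numerator term (λ_lx − λ_kx)(rK_x/wL) and the extra denominator term −(λ_lx − λ_kx), its implications can be illustrated numerically. All parameter values below, including the ratio rK_x/wL, are hypothetical.

```python
# Compare capital's burden share under identical homothetic demands,
# formula (2.27) with eta = 1, against the demand structure (2.14'),
# formula (2.27'). Hypothetical, internally consistent parameters.

def shares(theta_k, lam_lx, lam_kx, theta_wx, theta_wy, sig_x, sig_y, rKx_wL):
    theta_l = 1 - theta_k
    theta_rx = 1 - theta_wx
    Sx = (theta_wx * lam_kx + theta_rx * lam_lx) * sig_x
    Sy = (theta_wy * (1 - lam_kx) + (1 - theta_wy) * (1 - lam_lx)) * sig_y
    delta = lam_lx - lam_kx
    D = delta * (theta_wx - theta_wy) + Sx + Sy          # D at eta = 1
    s_227 = theta_k + theta_l * (Sx - delta * theta_rx) / (lam_kx * D)
    s_227p = (theta_k + theta_l * (Sx - delta * theta_rx + delta * rKx_wL)
              / (lam_kx * (D - delta)))
    return s_227, s_227p

# X capital intensive: lam_lx < lam_kx (and hence theta_wx < theta_wy).
s, sp = shares(0.3, 0.2, 0.5, 0.4, 0.7, 0.8, 0.8, 0.25)
assert sp < s        # demand structure (2.14') lowers capital's share
```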
2.3. Estimates of tax incidence in static general equilibrium models

Tax incidence formulae, like that given in equation (2.25), are appropriate only for small changes around an initial no-tax equilibrium. To examine the incidence of large tax changes, as well as to consider many more sectors and types of consumers, Shoven and Whalley (1972) constructed a computable general equilibrium model. Their method of calculating an equilibrium is based on Scarf's (1967, 1973) algorithm and related techniques.4 The analysis of large perturbations of the equilibrium requires specifying explicit functional forms for preferences and production technologies. The parameters of these functions are then selected such that initial equilibrium values of the model are in rough accord with those actually observed in the economy. This calibration of the model is not an econometric procedure, although econometric findings, such as the estimated values of particular elasticities of substitution, are often used as part of the calibration.

4 See Ballard, Fullerton, Shoven and Whalley (1984) for a description of recently developed computable general equilibrium models and their methods of solution.

Table 2.1, which is based on Shoven's (1976) reported results, shows calculations of corporate tax incidence for the United States for two-sector and twelve-sector variants of the Shoven-Whalley model. The table reports the share of the tax burden borne by capital for alternative elasticities of substitution of the assumed CES production and demand functions.

Table 2.1
Calculations of corporate tax incidence from the Shoven-Whalley model: capital's share of the tax burden

Capital-labor elasticities          η = 1.0                   η = 0.5
of substitution               2 sectors  12 sectors     2 sectors  12 sectors
                                 (%)        (%)            (%)        (%)
σ_x = 1,    σ_y = 1              100        100            118        117
σ_x = 1,    σ_y = 0.5            117         NA            145         NA
σ_x = 1,    σ_y = 0.25           128         NA            162         NA
σ_x = 0.75, σ_y = 0.25           105        104            141        137
σ_x = 0.5,  σ_y = 0.25            75         NA            110         NA
σ_x = 0.25, σ_y = 0.25            33         39             62         62

Source: Shoven (1976, tables 4 and 6). NA = not available. Note: σ_x and σ_y are, respectively, the elasticities of substitution in the taxed (X) and untaxed (Y) industries; η is the elasticity of demand for X relative to Y.

For this particular experiment the two- and twelve-sector models yield quite similar results. The incidence on capital is, however, very sensitive to the assumed elasticities of substitution. A low relative demand elasticity, a low elasticity of substitution in the noncorporate sector, and a high elasticity of substitution in the corporate sector imply reductions in the net income of capital well in excess of 100 percent of the corporate tax revenue. Alternatively, if capital and labor are both highly complementary in producing the two goods (σ_x = 0.25, σ_y = 0.25), capital's share of the tax burden is only a third.

While Shoven and Whalley's results on U.S. corporate tax incidence, like Harberger's (1962) original calculations based on equation (2.25), are instructive for understanding tax incidence in static general equilibrium models, their assumed values for corporate income tax rates are subject to question. They estimate the effective marginal tax rate in different sectors by calculating taxes collected as a fraction of reported profits. There are a number of problems with this procedure. Stiglitz (1973), for example, points out that since interest is deductible from the corporate tax, the effective marginal tax on debt-financed corporate capital is zero. Other research [e.g. Jorgenson (1963), Auerbach (1979a, 1979c), King and Fullerton (1984), Bradford (1981), Gordon (1985), Bulow and Summers (1984)] stresses the importance of depreciation and other investment incentives, the tax treatment of dividends and capital gains, and the risk-sharing attributes of capital income taxation as critical elements for determining effective marginal taxes on capital income.
3. Tax incidence in open economies

Our analysis has so far maintained the assumption of immobile factors. This assumption is clearly inappropriate in considering taxes levied by a single locality from or towards which capital and labor can migrate. Increasingly, as capital becomes more mobile internationally, it is also necessary to recognize the effects of international factor mobility in considering national tax policies. In order to focus on the effects of factor mobility, we return to the one-good general equilibrium model of Section 2.1. We further simplify this model by assuming that the factor complementary to capital, here labelled land, is supplied inelastically and is immobile.
3.1. A simple model of factor mobility

Analysis of tax incidence in a simple one-good, two-factor, two-country model provides important insights into the differences in incidence results in open and closed economies. Following Bradford (1978), assume that the two factors of production are capital and land and that capital is internationally mobile. In contrast to the analysis in Section 2.1 of a country-wide tax on capital in a closed economy, a tax on capital imposed by one country on income earned by capital in that country is not fully borne by the capital initially in the country imposing the tax. Since capital is internationally mobile, reductions in capital rentals in one country imply reductions in the other. Hence, the incidence of the tax borne by capital is spread evenly across all capital, regardless of the country in which it is ultimately used. In contrast to capital owners, landowners in the two countries are differentially affected by the tax; there is a loss in rental income to landowners in the country imposing the tax on capital and a gain to landlords in the country with no tax on capital. To see this, let F(K) and G(K) be the respective production functions in countries A and B. K_A is the capital in country A, and K − K_A = K_B is the capital in country B, where K is total worldwide capital. If r is the rental on capital, and τ is the tax on capital in country A, we have

r + τ = F′(K_A);   r = G′(K_B).   (3.1)
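These results can be illustrated with a numerical sketch of (3.1). The functional forms (identical Cobb-Douglas technologies in the two countries) and all parameter values are hypothetical.

```python
# Two-country capital mobility, equation (3.1): country A levies a small
# unit tax tau on capital used at home; capital reallocates until net
# returns are equalized. F(K) = G(K) = K**alpha; hypothetical parameters.

alpha, K, tau = 0.5, 2.0, 0.01

def fprime(k):
    return alpha * k ** (alpha - 1)

def allocate(tau, tol=1e-12):
    """Find K_A with F'(K_A) - tau = G'(K - K_A), by bisection."""
    lo, hi = 1e-9, K - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fprime(mid) - tau > fprime(K - mid):
            lo = mid          # net return higher in A: shift capital to A
        else:
            hi = mid
    return 0.5 * (lo + hi)

KA = allocate(tau)
r = fprime(K - KA)            # net rental, equalized worldwide
r0 = fprime(K / 2)            # rental in the no-tax equilibrium

assert KA < K / 2             # capital leaves the taxing country
assert r < r0                 # all capital, everywhere, earns less

def land_rents(k):            # land income in a country using capital k
    return k ** alpha - k * fprime(k)

# Landowners in A lose; landowners in B gain.
assert land_rents(KA) < land_rents(K / 2) < land_rents(K - KA)
```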
From these equations and the constraint K_A + K_B = K it is easy to show that the ratio of the change in worldwide capital income, drK, to the change in tax revenue, dτK_A, calculated at the no-tax equilibrium, is

(dr/dτ)(K/K_A) = −[G″(K_B)/(F″(K_A) + G″(K_B))](K/K_A) < 0.

[...]

K̇ = I(P_K/(1 − s))K,   I′(·) > 0,   I(1) = 0,   (4.22)

where K is the firm's capital stock, P_K is the price of its capital goods relative to consumption goods, s is the subsidy paid to the purchase of new investment goods, and I(·) is the firm's net rate of investment function. Note that K̇ can be negative. Assume further that the capital good K is used in a production process where it earns a total return F′(K)K and that F″(K) is negative. Finally, assume that all returns are paid out and that investors require some fixed rate of return, ρ, to induce them to hold the capital assets. The returns to holding a unit of capital come in the form of rents F′(K) and capital gains, so that

ρP_K = F′(K)(1 − τ) + Ṗ_K.   (4.23)

Figure 4.1. The dynamics of capital accumulation.
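The long-run implications of (4.22) and (4.23) can be computed under explicit functional forms. This sketch assumes F′(K) = αK^(α−1) and an investment function I(P_K/(1 − s)) with I(1) = 0; both are illustrative assumptions, not the chapter's.

```python
# Steady states implied by (4.22)-(4.23). With I(1) = 0, Kdot = 0 forces
# P_K = 1 - s, while Pdot_K = 0 forces rho*P_K = F'(K)*(1 - tau).
# Hypothetical parameter values throughout.

alpha, rho = 0.5, 0.05

def steady_state(tau=0.0, s=0.0):
    p_k = 1.0 - s
    fprime_target = rho * p_k / (1.0 - tau)
    k = (fprime_target / alpha) ** (1.0 / (alpha - 1.0))
    return k, p_k

k0, p0 = steady_state()
k_tax, p_tax = steady_state(tau=0.2)
k_sub, p_sub = steady_state(s=0.1)

# A tax on the marginal product lowers long-run capital; P_K returns to
# its pre-tax steady-state level after the transition.
assert k_tax < k0 and p_tax == p0
# An investment subsidy raises long-run capital but lowers the market
# value of existing capital.
assert k_sub > k0 and p_sub < p0
```

The second pair of assertions is the comparative-static counterpart of the phase-diagram discussion below: the subsidy shifts the K̇ = 0 schedule while leaving the Ṗ_K = 0 schedule unchanged.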
Equations (4.22) and (4.23) describe the dynamics of the adjustment of the quantity and price of capital. The phase diagram is depicted in Figure 4.1 for s = τ = 0. Equilibrium occurs at the intersection of the two schedules at the point P_K = 1, where F′(K) = ρ. Note that the system displays saddle point stability. Except along a unique path, marked by the dark arrows, the system will not converge. Only along this path does the supply of investment exactly validate the future returns capitalized into the market price of capital goods. Such saddle point stability is characteristic of asset price models. It implies that at any point in time, the stock of capital and the assumption of saddle point stability uniquely determine the asset price of capital.

The phase diagram in Figure 4.1 can be used to examine the effects of various types of tax changes. In Figure 4.2 the effect of a tax on the asset's marginal product is considered. Such a tax does not affect its supply curve, so the K̇ = 0 locus does not shift. The reduction in after-tax returns due to the increase in τ leads to a leftward shift in the Ṗ_K = 0 locus. Such an increase in the tax rate has no immediate effect on the capital stock, but the market price of capital drops from E₁ to B. As capital is decumulated, the marginal product of capital rises, and the system converges from B to E₂, where P_K again equals its equilibrium value. Note that after the first instant (i.e. after the capital loss)
investors always receive a fixed return ρ, as reduced rents are made up for by capital gains as equilibrium is restored.

Figure 4.2. The effects of a corporate tax increase.

The position of the adjustment path depends on the elasticity of supply of the capital good. If the elasticity is substantial (the line with arrows is flatter), adjustment is rapid, so that the tax change has little effect on the asset price of capital. If the supply of capital is relatively inelastic, there is a larger movement in the price of capital. In the limiting case where the supply of capital is completely inelastic, the relative price of capital declines to point A along the Ṗ_K = 0 locus.

The effect of a subsidy (raising s) to new capital investment which does not apply to existing capital, such as accelerated depreciation or the investment tax credit, is depicted in Figure 4.3. This shifts the K̇ = 0 schedule, but has no effect on the return from owning capital and so does not affect the Ṗ_K = 0 locus. Such a subsidy leads to an increase in long-run capital intensity, but also reduces the market value of existing capital goods. This illustrates that tax measures that encourage investment may hurt existing asset holders. The magnitude of the loss will depend upon the elasticity of the supply of capital. If it is high, owners of
existing capital will suffer a loss close to the subsidy rate. If not, they will continue to earn rents during the period of transition, so the loss will be smaller. That a subsidy to capital may hurt capitalists may at first seem counterintuitive. It occurs because the subsidy is available only to new capital, which is a perfect substitute for existing capital in this model. The adverse effect of a reduction in new car prices on used car prices illustrates the effect considered here.

Figure 4.3. The effects of an investment subsidy.

4.4.2. Tax capitalization in a general equilibrium model
This section illustrates how capitalization and changes in factor prices interact to determine tax incidence in general equilibrium. Consider a tax on land rents in the simple two-period life-cycle model presented above, altered to include two assets, land and capital, and to exclude population growth. Such a model is considered by Feldstein (1977) in his seminal contribution on the relationship of capitalization to tax incidence. The lifetime budget constraint (4.1) is unaltered and is duplicated here for reference:

C_{2,t+1} = (w_t − C_{1,t})(1 + r_{t+1}).   (4.1)
The capital stock in period t + 1 corresponds to the period t savings of young workers that is not invested in land. Letting P, stand for the price of land in period t, T for the inelastic stock of land, and K,, as above, for the period t stock of capital, we have that capital per worker in t + 1 equals: K+
=
wt - C1 ,w,
,rt+
(4.24)
) - PT.
The production function determining output per worker, F(K,, T, L), is assumed to be homogeneous of degree one in capital, and inelastically supplied land and labor. Pre-tax land rents, corresponding to the marginal product of land, FT,, are taxed at rate 0. Since land and capital are perfect substitutes, the return to holding land must equal the return to holding capital: FT,(1 -
(4.25)
) + Pt+ l-P
In the steady state we have, from (4.25) and (4.24),

k + [F_T(1 − θ)/r]T = w − C_1(w, r),   (4.26)

and

dk/dθ = (F_T T/r) / [1 + (1 − θ)T d(F_T/r)/dk − (1 − ∂C_1/∂w)F_LK + (∂C_1/∂r)F_KK],   (4.27)
where r = F_K. The above expression is positive if the denominator is positive, a requirement for steady-state stability. Hence, a tax on land rents raises capital intensity and, therefore, the real wage in the long run. The intuition behind this result is that the land rent tax involves a redistribution from initial elderly landowners to subsequent generations. As described above, such redistribution leads to increased savings. The initial generation of elderly suffer a windfall loss in their resources because of the tax on current land rents and, generally, because of a reduction in the initial-period price of land. Chamley and Wright (1986) provide an extensive analysis of the dynamic incidence of the land tax in this type of model. They show that the initial generation of elderly landowners is always worse off because of the imposition of the tax. In addition, the initial price of land can fall by more or less than the present value of future tax revenues. Consider, as an illustration of the dynamic incidence of a land rent tax, the special case in which consumption in periods 1 and 2 are perfect substitutes at rate 1/(1 + ρ); that is,

U(C_{1t}, C_{2t}) = U(C_{1t} + C_{2t}/(1 + ρ)).   (4.28)
The first-order conditions for utility maximization imply

r_t = ρ.   (4.29)

With this preference function, first-period consumption is perfectly elastic with respect to the interest rate. Since r_t is pegged by ρ, K_t is also pegged, since ρ = r_t = F_K(K_t, T, L) and T and L are given. But pegging K_t means that w_t is also pegged. Hence, in (4.24), K_{t+1} and w_t are unaffected by the land rent tax, and C_1 rises by the amount of the decline in P_t T. The price of land can be expressed via (4.25) as the present discounted value of the after-tax marginal product of land. Since the marginal products of land and capital are fixed, we have

P_t = F_T(1 − θ)/ρ,   ∀t.   (4.30)
Hence, in response to the introduction of a land rent tax the initial price of land falls by the present discounted value of all future government tax receipts. In this case the burden of future as well as current land rent taxes falls entirely on the initial generation of elderly landowners. The initial young and future generations are unaffected by this tax because the wage and interest rate they face do not change.
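The capitalization result can be checked with a small numerical sketch; the constant rent F_T and the rates below are hypothetical.

```python
# A discrete-time sketch of (4.30): with the interest rate pegged at rho,
# land's price is the present value of after-tax rents, so introducing the
# tax lowers the price by the present value of all future tax receipts.

rho, F_T, theta = 0.05, 1.0, 0.3

def land_price(th):
    return F_T * (1 - th) / rho          # equation (4.30)

pdv_tax_revenue = theta * F_T / rho

# The price falls by exactly the present value of future tax receipts.
assert abs(land_price(0.0) - land_price(theta) - pdv_tax_revenue) < 1e-12

# Check (4.30) against an explicit (truncated) present-value sum.
pv = sum(F_T * (1 - theta) / (1 + rho) ** t for t in range(1, 5000))
assert abs(pv - land_price(theta)) < 1e-6
```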
Obviously, this strong result arises because of assumption (4.28). In what may be the more likely event, in which the price of land falls by less than the present value of future taxes and capital increases over time, the initial generation of elderly will bear most of the burden of the tax and future generations will be better off due to the tax. The initial generation of young workers will, however, bear some fraction of the tax burden, because the increased capital accumulation will post-date their appearance in the labor market. Hence, they will not enjoy a higher wage and will not be better off; on the contrary, when this initial young generation is old, it will receive a lower return on its saving, because the marginal product of capital will be lower due to the increase in capital formation. Through this channel the initial young generation will share some fraction of the burden of the tax.

In this model, as in others presented in this chapter, the equilibrium may not be unique. More specifically, for certain combinations of preferences and production functions there may be an infinity of transition paths for the capital stock and the price of land, all of which ultimately converge to the same steady state [Chamley and Wright (1986)]. The possibility of multiple asset price equilibria in models of this kind is discussed more generally by Calvo (1978). In such settings intertemporal incidence, like other variables in the economy, is not uniquely determined.
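The long-run effect in (4.26) and (4.27) can also be illustrated by assuming explicit functional forms. This is a sketch under hypothetical assumptions: Cobb-Douglas production and log utility, which yields young-age saving of w/(2 + d), where d is the rate of time preference.

```python
# An explicit-functional-form illustration of (4.26)-(4.27): a land rent
# tax raises steady-state capital intensity. Hypothetical assumptions:
# F = k**a * T**b * L**c with T = L = 1 and a + b + c = 1, and log
# utility, so the young save w/(2 + d).

a, b, c, d = 0.3, 0.2, 0.5, 1.0

def steady_k(theta):
    # (4.26) becomes k*(1 + (1 - theta)*b/a) = c*k**a/(2 + d),
    # which solves in closed form for k.
    return (c / ((2 + d) * (1 + (1 - theta) * b / a))) ** (1 / (1 - a))

assert steady_k(0.3) > steady_k(0.0)   # the tax raises capital intensity
assert steady_k(0.6) > steady_k(0.3)   # and monotonically so
```

Under these forms the land tax lowers the land price term in (4.26), freeing more of the young's saving for capital accumulation, which is the redistribution channel described in the text.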
5. Conclusion

Economics is at its best when it offers important insights that contradict initial, casual impressions. The theory of tax incidence provides a rich assortment of such insights. Tax incidence's basic lesson, that real and nominal tax burdens are not necessarily related, means that taxes on capital may be borne by workers, that investment incentives may be injurious to capitalists, that taxation of foreigners may simply represent indirect domestic taxation, and that generations alive many decades in the future may be supporting those currently alive. The study of tax incidence is both fun, because it offers such surprising findings, and very important, because of its implications about the impacts of government policies. Much of the current tax incidence literature considers settings of certainty, perfect information, and market clearing. As more sophisticated models relax these assumptions, the theory of tax incidence will be enriched and, in all probability, provide even more surprising and exciting economic insights.
References

Aaron, H.J., 1975, Who pays the property tax? (Brookings Institution, Washington, D.C.).
Abel, A.B., 1979, Investment and the value of capital (Garland Publishing, New York).
Ando, A. and F. Modigliani, 1963, The "life cycle" hypothesis of saving: Aggregate implications and tests, American Economic Review 53, 55-84.
Arrow, K.J. and F.H. Hahn, 1971, General competitive analysis (Oliver and Boyd, Edinburgh).
Atkinson, A.B. and N.H. Stern, 1974, Pigou, taxation and public goods, Review of Economic Studies 41, 119-128.
Atkinson, A.B. and J.E. Stiglitz, 1980, Lectures on public economics (McGraw-Hill, New York).
Auerbach, A.J., 1979a, Share valuation and corporate equity policy, Journal of Public Economics 11, 291-305.
Auerbach, A.J., 1979b, Wealth maximization and the cost of capital, Quarterly Journal of Economics.
Auerbach, A.J., 1979c, Inflation and the choice of asset life, Journal of Political Economy.
Auerbach, A.J. and J.R. Hines, Jr., 1986, Tax reform, investment, and the value of the firm, NBER Working Paper No. 1803.
Auerbach, A.J. and L.J. Kotlikoff, 1983a, National savings, economic welfare, and the structure of taxation, in: M. Feldstein, ed., Behavioral simulation methods in tax policy analysis (University of Chicago Press, Chicago) 459-493.
Auerbach, A.J. and L.J. Kotlikoff, 1983b, Investment versus savings incentives: The size of the bang for the buck and the potential for self-financing business tax cuts, in: L.H. Meyer, ed., The economic consequences of government deficits (Kluwer-Nijhoff Publishing, Boston) 121-154.
Auerbach, A.J. and L.J. Kotlikoff, 1987, Dynamic fiscal policy (Cambridge University Press, Cambridge).
Auerbach, A.J., L.J. Kotlikoff and J. Skinner, 1983, The efficiency gains from dynamic tax reform, International Economic Review.
Ballard, C.L., D. Fullerton, J.B. Shoven and J. Whalley, 1985, A general equilibrium model for tax policy evaluation, NBER Volume (University of Chicago Press, Chicago).
Ballentine, J.G. and I. Eris, 1975, On the general equilibrium analysis of tax incidence, Journal of Political Economy 83, 633-644.
Ballentine, J.G., 1977, Non-profit-maximizing behavior and the short run incidence of the corporation income tax, Journal of Public Economics.
Barro, R.J., 1974, Are government bonds net wealth?, Journal of Political Economy 82, 1095-1118.
Bernheim, B.D., 1981, A note on dynamic tax incidence, Quarterly Journal of Economics, 705-723.
Bernheim, B.D. and K. Bagwell, 1985, Is everything neutral?, Mimeo.
Boskin, M.J., 1975a, Efficiency aspects of the differential tax treatment of market and household economic activity, Journal of Public Economics 4, 1-25.
Boskin, M.J., 1975b, Notes on the tax treatment of human capital, Conference on Tax Research (NBER, Stanford, Calif. and Cambridge, Mass.) 185-195.
Boskin, M.J., 1978, Taxation, saving, and the rate of interest, Journal of Political Economy 86, 3-27.
Bradford, D.F., 1978, Factor prices may be constant, but factor returns are not, Economics Letters.
Bradford, D.F., 1980, The economics of tax policy towards saving, in: G. von Furstenberg, ed., The government and capital formation (Ballinger Publishing Company, Cambridge) 11-71.
Bradford, D.F., 1981, The incidence and allocation effects of a tax on corporate distributions, Journal of Public Economics (February).
Break, G.F., 1974, The incidence and economic effects of taxation, in: A. Blinder et al., eds., The economics of public finance (Brookings Institution, Washington, D.C.).
Browning, E.K. and W.R. Johnson, 1979, The distribution of the tax burden (American Enterprise Institute, Washington, D.C.).
Brueckner, J.K., Labor mobility and the incidence of the residential property tax, Journal of Urban Economics 10, 173-182.
Bulow, J.I. and L.H. Summers, 1984, The taxation of risky assets, Journal of Political Economy (February).
Calvo, G., 1978, On the indeterminacy of interest rates and wages with perfect foresight, Journal of Economic Theory (December).
Calvo, G., L.J. Kotlikoff and C.A. Rodriguez, 1979, The incidence of a tax on pure rent: A new (?) reason for an old answer, Journal of Political Economy 87, 869-874.
Chamley, C., 1981, The welfare cost of capital income taxation in a growing economy, Journal of Political Economy (June).
Chamley, C. and B. Wright, 1986, Fiscal incidence in overlapping generations models with a fixed asset, Mimeo.
1090
L.J. Kotlikoff and L.H. Summers
Courant, P.N., 1977, A general equilibrium model of heterogeneous property taxes, Journal of Public Economics 8, 313-327.
Devarajan, S., D. Fullerton and R. Musgrave, 1982, Estimating the distribution of tax burdens: A comparison of different approaches, Journal of Public Economics.
Diamond, P.A., 1965, National debt in a neoclassical growth model, American Economic Review 55, 1125-1150.
Diamond, P.A., 1970, Incidence of an interest income tax, Journal of Economic Theory 2, 211-224.
Diamond, P.A., 1978, Tax incidence in a two good model, Journal of Public Economics 9, 283-299.
Eaton, J. and H. Rosen, 1980, Taxation, human capital, and uncertainty, American Economic Review (September).
Edgeworth, F.Y., 1897, The pure theory of taxation, Economic Journal 7, 46-70, 226-238 and 550-571 (reprinted in Edgeworth, 1925).
Feldstein, M., 1974a, Social security, induced retirement and aggregate capital accumulation, Journal of Political Economy 82, 905-926.
Feldstein, M., 1974b, Tax incidence in a growing economy with variable factor supply, Quarterly Journal of Economics 88, 551-573.
Feldstein, M., 1974c, Incidence of a capital income tax in a growing economy with variable savings rates, Review of Economic Studies 41, 505-513.
Feldstein, M., 1977, The surprising incidence of a tax on pure rent: A new answer to an old question, Journal of Political Economy 85, 349-360.
Feldstein, M., 1983, Domestic saving and international capital movements in the long run and the short run, European Economic Review 21, 129-151.
Feldstein, M. and C. Horioka, 1980, Domestic savings and international capital flows, The Economic Journal 90, 314-329.
Fischel, W., 1975, Fiscal and environmental considerations in the location of firms in suburban communities, in: E.S. Mills and W.E. Oates, eds., Fiscal zoning and land use (D.C. Heath, Lexington, Mass.).
Fullerton, D. and R. Gordon, 1983, A reexamination of tax distortions in general equilibrium models, in: M. Feldstein, ed., Behavioral simulation methods in tax policy analysis (University of Chicago Press, Chicago).
Fullerton, D., J.B. Shoven and J. Whalley, 1978, General equilibrium analysis of U.S. taxation policy, in: Compendium of tax research (Department of the Treasury, Washington, D.C.).
Fullerton, D., J.B. Shoven and J. Whalley, 1983, Replacing the U.S. income tax with a progressive consumption tax, Journal of Public Economics 20, 3-23.
Gordon, R.H., 1985, Taxation of corporate capital income: Tax revenues vs. tax distortions, Quarterly Journal of Economics (February).
Goulder, L.H., J.B. Shoven and J. Whalley, 1983, Domestic tax policy and the foreign sector: The importance of alternative formulations to results from a general equilibrium tax analysis model, in: M. Feldstein, ed., Behavioral simulation methods in tax policy analysis (University of Chicago Press, Chicago) 333-364.
Hall, R.E., 1968, Consumption taxes versus income taxes: Implications for economic growth, Proceedings of the 61st National Tax Conference (National Tax Association, Columbus, OH) 125-145.
Hall, R.E., 1980, Vignettes on the world capital market, American Economic Review 70, 331-337.
Hamilton, B.W., 1975, Zoning and property taxation in a system of local governments, Urban Studies 12, 205-211.
Hamilton, B.W., 1976, Capitalization of intrajurisdictional differences in local tax prices, American Economic Review 66, 743-753.
Hamilton, B.W., 1983, A review: Is the property tax a benefit tax?, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York).
Harberger, A.C., 1962, The incidence of the corporation income tax, Journal of Political Economy 70, 215-240.
Harberger, A.C., 1980, Vignettes in the world capital market, American Economic Review (May).
Hines, J.R. Jr., 1985, What is benefit taxation?, Mimeo.
Jones, R.W., 1965, The structure of simple general equilibrium models, Journal of Political Economy 73, 557-572.
Jorgenson, D.W., 1963, Capital theory and investment behavior, American Economic Review (May).
Kaldor, N., 1957, An expenditure tax (Allen and Unwin, London).
King, M.A., 1977, Public policy and the corporation (Chapman and Hall, London).
King, M.A., 1980, Savings and taxation, in: G.A. Hughes and G.M. Heal, eds., Essays in public policy (Chapman and Hall, London).
King, M.A. and D. Fullerton, 1984, The taxation of income from capital, NBER Volume (University of Chicago Press, Chicago).
Kotlikoff, L., 1979, Social security and equilibrium capital intensity, Quarterly Journal of Economics (May).
Kotlikoff, L.J., 1979, Tax incidence in a life cycle model with variable labor supply, Quarterly Journal of Economics 93, 705-718.
Kotlikoff, L.J., 1984, Taxation and savings: A neoclassical perspective, Journal of Economic Literature, 1576-1629.
Kotlikoff, L.J. and L.H. Summers, 1981, The role of intergenerational transfers in aggregate capital accumulation, Journal of Political Economy 89, 706-732.
Krzyzaniak, M. and R.A. Musgrave, 1963, The shifting of the corporation income tax (Johns Hopkins Press, Baltimore).
McLure, C.E. Jr., 1969, The inter-regional incidence of general regional taxes, Public Finance 24, 457-483.
McLure, C.E. Jr., 1971, The theory of tax incidence with imperfect factor mobility, Finanzarchiv 30, 27-48.
McLure, C.E. Jr., 1974, A diagrammatic exposition of the Harberger model with one immobile factor, Journal of Political Economy 82, 56-82.
McLure, C.E. Jr., 1975, General equilibrium incidence analysis: The Harberger model after ten years, Journal of Public Economics 4, 125-161.
Mieszkowski, P.M., 1967, On the theory of tax incidence, Journal of Political Economy 75, 250-262.
Mieszkowski, P.M., 1969, Tax incidence theory: The effects of taxes on the distribution of income, Journal of Economic Literature 7, 1103-1124.
Mieszkowski, P.M., 1972, The property tax: An excise tax or a profits tax?, Journal of Public Economics 1, 73-96.
Mieszkowski, P.M., 1984, The incidence of the local property tax, a reformulation, NBER Working Paper No. 1485.
Mieszkowski, P.M. and G.R. Zodrow, 1984, The new view of the property tax: A reformulation, NBER Working Paper No. 1481.
Modigliani, F. and R. Brumberg, 1954, Utility analysis and the consumption function: An interpretation of cross-section data, in: K.K. Kurihara, ed., Post-Keynesian economics (Rutgers University Press, New Brunswick, N.J.).
Modigliani, F., 1983, The life cycle hypothesis and national wealth: A rehabilitation, M.I.T. Discussion Paper.
Musgrave, R.A., 1951, Distribution of tax payments by income groups: A case study for 1948, National Tax Journal (March).
Musgrave, R.A., 1953a, General equilibrium aspects of incidence theory, American Economic Review 43, 504-517.
Musgrave, R.A., 1959, The theory of public finance (McGraw-Hill, New York).
Musgrave, R.A. and P.B. Musgrave, 1976, Public finance in theory and practice, 2nd ed. (McGraw-Hill, New York).
Pechman, J.A., 1985, Who paid the taxes, 1966-1985 (Brookings Institution, Washington, D.C.).
Pechman, J.A. and B. Okner, 1974, Who bears the tax burden (Brookings Institution, Washington, D.C.).
Phelps, E.S. and K. Shell, 1969, Public debt, taxation and capital intensiveness, Journal of Economic Theory 1, 330-346.
Samuelson, P.A., 1975, Optimum social security in a life-cycle growth model, International Economic Review 16, 539-544.
Sato, R. and R.F. Hoffman, 1974, Tax incidence in a growing economy, in: W.L. Smith and J.M. Culbertson, eds., Public finance and stabilisation policy (North-Holland, Amsterdam).
Scarf, H.E., 1967, The approximation of fixed points of a continuous mapping, SIAM Journal of Applied Mathematics.
Scarf, H.E., 1973, The computation of economic equilibria (Yale University Press, New Haven).
Shoven, J.B., 1974, A proof of the existence of a general equilibrium with ad valorem commodity taxes, Journal of Economic Theory 8, 1-25.
Shoven, J.B., 1976, The incidence and efficiency effects of taxes on income from capital, Journal of Political Economy 84, 1261-1284.
Shoven, J.B. and J. Whalley, 1972, A general equilibrium calculation of the effects of differential taxation of income from capital in the U.S., Journal of Public Economics 1, 281-321.
Starrett, D.A., 1980, On the method of taxation and the provision of local public goods, American Economic Review 70, 380-392.
Stiglitz, J.E., 1973, Taxation, corporate financial policy, and the cost of capital, Journal of Public Economics, 1-34.
Stiglitz, J.E., 1976, The corporation tax, Journal of Public Economics 5, 303-311.
Stiglitz, J.E., 1977, The theory of local public goods, in: M.S. Feldstein and R.P. Inman, eds., The economics of public services (Macmillan, London).
Stiglitz, J.E., 1978, Notes on estate taxes, redistribution, and the concept of balanced growth path incidence, Journal of Political Economy.
Stiglitz, J.E., 1982, The theory of local public goods twenty-five years after Tiebout: A perspective, NBER Working Paper 954.
Summers, L.H., 1981a, Taxation and corporate investment: A q theory approach, Brookings Papers on Economic Activity 1, 67-127.
Summers, L.H., 1981b, Capital taxation and capital accumulation in a life cycle growth model, American Economic Review 71, 533-544.
Thompson, E.A., 1967, Debt instruments in macroeconomic and capital theory, American Economic Review 57, 1196-1210.
Tiebout, C.M., 1956, A pure theory of local expenditures, Journal of Political Economy 64, 416-424.
Tobin, J., 1967, Life cycle saving and balanced growth, in: W. Fellner et al., eds., Ten economic studies in the tradition of Irving Fisher (Wiley, New York) 231-256.
Vandendorpe, A.L. and A.F. Friedlaender, 1975, Differential incidence in the presence of initial distorting taxes, Journal of Public Economics 6, 205-229.
White, M.J., 1975, Firm location in a zoned metropolitan area, in: E.S. Mills and W.E. Oates, eds., Fiscal zoning and land use controls (Lexington Books, Lexington, Mass.).
Zodrow, G.R. and P.M. Mieszkowski, 1983, The incidence of the property tax: The benefit view vs. the new view, in: G.R. Zodrow, ed., Local provision of public services: The Tiebout model after twenty-five years (Academic, New York).
Zodrow, G.R. and P.M. Mieszkowski, 1984, Pigou, Tiebout, property taxation and the under-provision of local public goods, Rice University Working Paper.
CORRIGENDUM

Handbook of Public Economics, Volume I
Chapter 3, "Public Sector Pricing" by Dieter Bös
Please note that the results presented on page 171 of the above-mentioned volume are incorrect in their present form. The correct version of page 171 appears overleaf.
Handbook of Public Economics, vol. II, edited by A.J. Auerbach and M. Feldstein © 1987, Elsevier Science Publishers B. V. (North-Holland)
Corrigendum 171
Ch. 3: Public Sector Pricing
We obtain the following result:38

[Equations (2.65) and (2.66) of the corrected page survive only as scattered symbol fragments in this reproduction; the equation numbers are the only recoverable information.]
Surprisingly, the structure of prices (2.65) is identical with the conditions (2.55). Hence, rationing individual labor supply does not influence the qualitative results on the structure of prices we have derived in section 2.3.1. Quantitatively, of course, prices will differ depending on the rationing scheme. The results we have obtained depend decisively on the form of individual rationing. As labor is allocated according to a binding scheme xh(L), this labor allocation cannot be influenced by changes in the public prices. Hence there is no rationale for changing the structure of public prices in order to influence the labor market. Such a rationale exists, however, if we deal with an economy where overall employment is constrained, but where the individuals are given the option to choose the utility-maximizing labor supply. The marginal condition (2.66) shows that at the rationed optimum the marginal welfare gains must be equal to the marginal costs in the public sector which result from changes of total employment plus the impact on private profit weighted by the shadow price γ.39
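Since (2.65) is said to share the Ramsey structure of conditions (2.55), a generic sketch of that structure may help orient the reader. The following is an illustrative reconstruction in standard Boiteux-Ramsey form, not the chapter's exact equation; here c_i, p_i and x_i denote marginal costs, public prices and demands, and γ the shadow price of the revenue-cost constraint:

```latex
% Illustrative Ramsey-type pricing structure (generic form only; the
% chapter's exact conditions (2.55)/(2.65) are not reproduced here).
\sum_{i} \,(c_i - p_i)\,\frac{\partial x_i}{\partial p_e}
  \;=\; \gamma\, x_e , \qquad e \in E .
% gamma = 0 corresponds to welfare maximization and gamma = 1 to
% profit maximization, consistent with footnote 39 below.
```

Read literally, the demand-weighted sum of price-cost margins is proportional to the quantity x_e for each regulated price p_e, so prices deviate more from marginal cost where demand responds less to price.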
2.4. Time dependent pricing

2.4.1. Pricing through time, adjustment clauses

The rapid inflation of the Seventies has increased the interest in pricing rules through time. The more rapidly input prices increase, the more quickly must

38 The transformations use the following properties: (a) ∂Π/∂z₀ = 0, which holds, for instance, for a fixed revenue-cost constraint and for a fair profit per unit of output. (b) Σᵢ pᵢ ∂xᵢ/∂pₑ = 0, e ∈ E, which holds because the labor income p₀x₀ʰ is exogenously given for the consumer.
39 As γ = 0 corresponds to welfare maximization and γ = 1 to profit maximization, the influence of private profits will typically be lower the lower the public profit Π.
INDEX
Acyclicity, 688, 689, 712 Adverse selection, 802 AFDC (Aid to Families with Dependent Children), 681, 816, 821, 829-830, 867-868 After-tax consumption, 729, see also Consumption cost, 1051, 1053 income, 635, 745n, 811, 998, 999, 1009, 1011, 1048 interest rates, 1073 marginal product of land, 1087 marginal return, 1002 payments, 1048 prices, 1050, 1080 rate of return, 1082 return to capital, 1074, 1084 Agenda-setting process, 713, 714 Agent preferences, 736, 738, see also Preferences Aggregate benefits, 515 costs, 515 outcome, 527 Air pollution, 513 Allocation, 626, 630, 631, 647, 649, 650, 652, 656, 657, 659, 660, 662, 665n, 666, 673, 679-681, 684, 687, 688n, 689, 692, 694, 695, 699, 701, 702, 707n, 725, 727, 735, 738, 739, 748, 750, 751, 753, 757-761, 801, 807, 814, 937, 992, 1001-1004, 1011, 1021n, 1044 of public goods, 537-567, see also Public goods Allocative effect, 965 efficiency, 693, 758, see also Efficiency performance, 753 Altruism, 768, 1034, 1078 Altruist, 768 Amendment process, 713, 715 structure, 715 Annuities, 709, 710 Anonymity, 704 Arbitrage; 516 Arrow, K. -Gibbard-Satterthwaite theorems, 673, 691-693
impossibility theorem, 538, 547, 549 possibility theorem, 682, 684-692, 718 transitivity, 683 Auction, 554, 677 Auctioneer, 678, 679, 681 Aumann-Kurtz redistribution, 732-737, 759, 760n, 764n Axelrod, R., 667 theorem, 668, 669 Balance of payments, 934, 974, see also Foreign exchange Ballot box, 527 Bankruptcy, 699, 702n Bargaining, 512, 515, 658n costs, 515 power, 625 process, 513 strength, 514 theory, 732, 733 Bayesian-Nash behavior, 561 equilibrium, 540-543, 560, 561, 566, see also Equilibrium Before-tax income, 997-999, 1011, 1019, 1033n Benefit taxes, 1070, see also Taxes Bequests, 808, 874, 877, 1034, 1071 Bilateral bargaining, 514, 520, see also Bargaining monopoly, 514 Blair-Pollak theorem, 778 Bliss points, 708, 709 Borda count, 689, 690, 719 Bowen equilibrium, 723, see also Equilibrium procedure, 565 Bowen-Bergstrom theorem, 720, 721, 724 Budget constraints, 508, 674, 729, 731n, 784, 796-798. 807, 812, 813, 815-819, 830, 835, 845, 882, 885, 930, 931, 938, 473, 977, 1024, 1032n, 1072-1079, 1086 preferences, 708, see also Preferences Budget-maximizing bureaucrat, 718 Bureaucracy, 714, 751, 766 Bureaucrat, 739, 744-747, 749n, 751, see also Politician-bureaucrat
1096 Bureaucratic behavior, 744 machinery, 829 performance, 747 Capacity expansion, 502, 503 Capital accumulation, 807, 1032, 1070, 1084, 1088 formation, 1071, 1078, 1088 gains, 1085 goods, 967-969, 1070, 1084, 1088 infrastructure, 639 intensity, 1073, 1074 investment, 1085 market, 809, 810, 980, 1034 productivity, 1034, see also Productivity stock, 639, 980, 1032 taxation, 1030, 1065, see also Taxation Capital-labor ratio, 1072, 1080, 1081 Capital-output ratio, 1080 Capitalist, 1053, 1062, 1063, 1082 Capitalization, 591-598, 601, 629, 1070, 1086 Cardinal utility indicator, 684 Center (public decision-maker), 539, 565, see also Decision-makers Certification, 660, 661 Civil rights groups, 742 Clarke-Groves-Vickrey mechanisms, 538, 554-558 Closed-ended matching grant, 636, see also Grants Club, 631 goods, 502-504, 508, 573, 576, 577 model, 573, 576, 577, 580, 581 Coase, R. H., 513 theorem, 657, 658n Coercive mechanisms, 681 Cobb-Douglas utility function, 596, 605n, 888 Collective action, 674 bargaining, 909, 924, see also Bargaining choice, 681-694, 702, 708 decision, 537, 555, 682, 683, 688 processes, 537, 538 democratic choice, 681, 682 dictatorial choice, 681, 682 goods, 1, 485, 674 Commercial taxes, 587, 611 Commodities, 786, 912, 915, 921-923, 925, 936, 938, 940, 942, 948, 949, 962-964, 974-981, 993, 1023, 1025, 1047 Commodity taxation, 993, 1014, 1023, 1025-1027, 1037, see also Taxation taxes, 951, 1025, 1026, 1029, 1054, see also Taxes
Index Community decision-maker, 634, see also Decision-makers formation, 574 Compensation, 745n, 749, 751, 956, 1074-1076 Compensatory transfers, 538 Competition, 520, 627, 663 Competitive markets, 504 private markets, 504 Condorcet equilibrium, 718, see also Equilibrium winner, 707-710 Congested goods, 502, 504 public goods, 500-502, 504, 508 Congestion, 500-503, 505, 508 costs, 501, 506 damage, 501 externalities, 500 function, 503 goods, 502, 504 Constraints on production, 497 on distribution, 497 Consumer, 485-490, 494, 498, 499, 504, 509, 510, 516-521, 549, 581, 589, 592, 595, 602, 653-661, 694, 756, 870-872, 889,920-923, 935-938, 940, 946-948, 953, 956, 963, 966, 968, 979, 981, 1018, 1043, 1047, 1050, 1061, 1062, 1073 demand system, 922 expenditures, 870, 872, 873, 876 groups, 742 preferences, 520, 981, see also Preferences prices, 941, 945, 948, 963, 972, 984, 1033n, 1048, 1062 rate of interest, 972, 1033n rations, 952, 953 surplus, 514, 592, 982, 1046, 1047 utility, 581 Consumption, 485, 487, 490, 494, 497-502, 506, 509-511, 515-518, 521-524, 576-581, 584, 585, 588, 590, 594n, 598, 611, 653, 654, 730, 763, 787, 788, 792, 805, 807, 810, 814, 828, 871, 873, 876, 940, 948, 953, 960, 965, 978, 979, 995-998, 1011, 1013, 1017, 1018, 1023, 1025, 1028-1033n, 1035,1063, 1069, 1071, 1075-1082 bundles, 538, 979 constraints, 952 efficiency, 941, see also Efficiency events, 786 externality, 510, see also Externality function, 871, 1072 goods, 493, 494, 993, 1018, of externality, 497, 513, see also Externality
Index of goods, 486 of private goods, 494, 555 of production, 485, see also Production of public goods, 574 possibilities, 485, 487, 489, 497 targets, 786 tax, 1030, 1037, 1038, 1053, 1077 Continuous preferences, 745, see also Preferences Contract curve, 511 lines, 709 Cooperative bargaining, 734 Cost-benefit analysis, 528, 529, 909-913, 918-955, 963, 976-985 analyst, 954 literature, 931, 976 methods, 961 models, 959 techniques, 918, 919, 953 test, 911, 956, 957, 982 Cost-sharing scheme, 511 Costless mobility, 574-583 Council of Economic Advisers, 824 Cournot-Nash strategy, 720 Covariate taxes, 1044, 1054, 1064, 1083 Cremer-Riordan mechanism, 701 Current Population Survey, 822, 828, 837, 839, 858, 866, 869, 886
D'Aspremont-Gerard-Varet mechanism, 701 Deadweight efficiency loss, 485, see also Efficiency Decentralized decision-making, 634 information, 545 Decision-makers, 522, 526, 537, 539, 545, 624, 634, 635, 959, see also Center. Decision-making, 634, 669, 681, 683, 692, 699-702, 743, 753, 766, 783, 784, 909, 937 process, 594, 628, 648, 673, 682, 691-694, 703 Decisive voter, 634 Defense expenditure, 512 Demand analysis, 723 approach, 614 curve, 514, 519n, 525-527, 620, 1028, 1029, 1046 equation, 610, 611, 1068 function, 516, 518, 573, 601-606, 609-623, 925, 1064 revealing (DR) process, 696-700 parameters, 609, 615, 618 for public goods, 617
1097 side, 574, 602, 637, 880 variable, 621 Democracy, 567, 673, 681, 799 Democratic decision-making processes, 692, 694, 703, 712, 719 processes, 691, 692, 704, 712 societies, 527 Demsetz, 654, 656, 659 process, 655, 656 Depreciation, 1065, 1085 Dictator, 549, 686, 692, 694, 696, 698, 736, 747, 760 Dictatorial choice, 693 government, 762 processes, 692, 699, 719 regimes, 692 Dictatorship, 558, 688, 692, 694, 698, 736, 747, 760 Differential approach, 551 Direct consumption, 497 democracy, 567 Disability, 851, 880 benefits, 847-851 income, 848, 850 insurance, 780, 834, 849, 850 Disbenefits, 492 Diseconomies of size, 503 Disequilibrium, 711, 719, 869, 953 Disincentives, 757, 798 Disposable income, 828, 1056 Distortionary taxes, 495, 525n, 585n, 815, 992, 1005, 1012, 1024, 1035 Distribution, 626-630, 650, 652, 702, 713, 728, 730, 754n, 759, 766, 787, 795, 802, 804, 813, 825-827, 862, 888, 947, 955-958, 1008, 1044, 1071 of income, 1020, 1023, 1069 outcomes, 514 of preferences, 520, see also Preferences, of wage offers, 862 Distributional effect, 965 needs, 628 Distributive goal, 628 Disutility, 485, 744, 745 Dominant equilibrium, 540, 544, 551 see also Equilibrium Dominant strategy, 550-552, 554-557, 562 equilibrium, 540, 543, 544, 550n, see also Equilibrium implementation, 538, 545-547 mechanisms, 538, 550n, 553, 557, 559 Dynamic efficiency, 565, see also Efficiency
1098 Earnings-related pensions, 832 Economic agents, 537 behavior, 833 efficiency, 490, 523, 532, 533, 653, 754, 762, see also Efficiency environments, 549, see also Environment externalities, 631 justice, 764 policy analysis, 763 welfare, 800, 1048 Economies of scale, 518, 572, 573, 585, 598, 658n, 707n, 802 of scope, 658n Efficiency, 488, 506, 517, 574, 591, 597, 726n, 727, 739, 743, 755-758, 763, 764, 824, 887, 969, 992, 1006, 1031, 1038 benefits, 758 conditions, 486, 491-495, 498, 499, 524, 633 costs, 758 gain, 671, 756 loss, 485, 652n, 732 problems, 486 properties, 514 Employment, 661, 663, 784, 833, 838, 854-857, 861, 862, 865, 888, 957, 964-977, 983 decision, 862 equilibrium, 663, see also Equilibrium rate, 957 Endowments, 737-739, 755-759, 761 Engel curve, 521 Entrepreneur, 578, 584, 586n Entrepreneurial ability, 586n decision, 581 process, 583 rents, 581 wishing, 581 Environment, 912, 960, 983 Environmental groups, 742 Epsilon equilibrium, 669 Equality, 486, 497 constraints, 922 Equalization, 629 Equilibrium, 574, 575, 581, 586, 587, 591n, 595, 598, 637-639, 649, 669, 672, 675-677, 695n, 696n, 715, 721, 724, 735n, 737, 738n, 751, 760n, 785, 801, 808, 813, 880, 881, 888, 889, 915, 919, 920, 937, 949, 954, 985, 997n, 1001, 1019, 1020, 1031, 1044-1055, 1063-1071, 1079, 1080, 1083-1088 levels, 510 outcome, 528 messages, 540
Index Equity, 727, 731, 754, 755, 758, 763, 992, 1006, 1030, 1038 Ethical preferences, 793 Estate taxes, 1044, see also Taxes European Community, 783 Excess demand, 519, 520 Excise taxes, 1044, 1047, 1055, 1061, see also Taxes Exclusion, 486, 490, 491, 493, 524 costs, 486, 491, 492 External benefits, 497, 498, 523 Externality, 498, 499, 515, 523, 634, 656-658, 661, 698, 699, 725n, 795, 800, 918, 981, 982 good, 498 problem, 634 tax, 700, see also Taxes Factor mobility. 1065 Family Assistance Plan, 817, 831, 881, 888 Expenditure Survey (FES), 821, 823, 825, 884 formation, 868, 869 Income Supplement (FIS), 783, 818, 821 Federal Insurance Contributions Act (FICA), 816, see also Social Security Federalism, 626, 631, 639 Federalist system, 626, 630, 634 Final consumption good, 493, see also Consumption First-best taxation, 1001 Fiscal capacity, 639 equity, 639 externality, 590, 631, see also Externality federalism, 574, 625, 626, 639 impacts, 629 policy, 1071 stabilization, 626 welfare, 831, 832, see also Welfare Fixed population, 572 Flat-rate taxes, 1014, 1015 Flypaper effect, 636, 637 Food Stamps, 816, 822, 825, 866, 881 Foreign aid, 740 exchange, 922, 923, 933, 942, 954, 961, 967, 970, 974-976 trade, 923, 941 Free rider, 694 critique, 654 problem, 554, 567, 659n Free riding, 653, 654, 656 agent, 664 Free ridership, 757
Game theoretic consideration, 514 Generalized Ramsey rule, 950 Gibbard-Satterthwaite theorem, 538, 545 Global efficiency, 523, see also Efficiency Government(s), 492 intervention, 509, 545 sector, 537 services, 492 Grandfathering, 743 Grant programs, 637 Grants, 611, 631, 634-637, 880 Grants-in-aid, 602, 629, 634, 741 Gross domestic product (GDP), 647, 870 Gross national product (GNP), 727, 832 Group choice, 574 Groves-Ledyard process, 701 Groves mechanism, 556, 702 Head count, 789, 790 Head taxes, 583, 590, 591, 624 Health insurance, 780 Hedonic approach, 573, 620-622 housing price function, 621 method, 619, 622 regression model, 599, 600 Henry George theorem, 585n High income community, 585-590, 614 households, 590, 606 individuals, 587, 589, 590 Hobbes theorem, 658n Homogeneity of income, 601 of public good provision, 601 Housing, 575, 584-598, 604n, 606, 607, 619-621, 814-819 benefit, 819 consumption, 584, 585, 588-590 rebates, 818 subsidies, 814 Human capital, 650 Imperfect capitalization, 593 Incentive(s), 537-567 compatibility, 537, 540, 549 constraints, 567 requirements, 538, 541, 542 theory, 539 Income constraints, 1006 differences, 630 distribution, 524, 525, 614, 957, 983 elasticities, 620, 622 function, 991 homogeneity, 627
inequality, 794, 1044 maintenance, 779-784, 795, 797, 800, 802, 815, 824, 829 redistribution, 628, 630 support, 785, 811, 815, 868 taxation, 1080 taxes, 589, 604n, 624n, 632, 723n, 780-782, 796, 798, 804, 811, 814-817, 832, 883, 884, 958, 959, 984, 1006, 1013, 1018, 1023-1027, 1029-1033, 1037, 1038, 1054, 1072-1074, 1080, 1081, see also Taxes Indifference curves, 674-677, 680, 709, 714, 745-750, 798, 805, 806, 995, 999, 1004, 1011, 1012, 1017, 1025, 1026 Indifferent ranking, 710n Indirect taxes, 930, 984, see also Taxes utility function, 504, 508 Industrial taxes, 587, 611, see also Taxes Industry output, 517, see also Output Inequality holding, 515 Inflation, 726n Inframarginal gains, 524 Inheritance taxation, 1034 Input, 529 Insurance, 802 Interdependent preferences, 793, 813, see also Preferences Intergenerational altruism, 1078 redistribution, 1071, 1075, 1076, see also Redistribution transfers, 872, 1078, see also Transfers Interior equilibrium, 580, see also Equilibrium Interjurisdictional efficiency, 572, see also Efficiency externalities, 626, 632, see also Externalities optimality, 581 Intermediate goods, 493, 529 Internal Revenue Service (IRS), 883 Intervention, 801, 807 Intrajurisdictional capitalization, 594 efficiency, 572, 591, see also Efficiency inefficiency, 623 Intransitivity, 706 Investment productivity, 1034, see also Productivity tax credit, 1085 Irrelevant alternatives, 549, 683-685, 693 Jurisdictional optimality, 572 Labor, 610 demands, 494 elasticity of, 496
force participation, 834-839, 844-850, 853, 866, 870, 871, 878 marginal productivity of, 494, see also Productivity market, 661, 662 market transactions, 661 productivity, 808, see also Productivity supply, 494-496, 833 supply function, 495, 797 Lagrange multipliers, 576, 915, 927-931, 961, 972, 977, 979, 983, 1016, 1031n, 1032n, 1033n Lagrangian, 494, 499, 576, 928, 930, 941, 945, 995n, 1000, 1016n, 1020, 1024, 1031n, 1032n, 1033n Land taxes, 585, 1087, see also Taxes Leisure, 995, 1017 Less developed countries (LDCs), 1036 Leviathan bureaucracy, 746 Liberty, 740, 754, 755, 761-764, 793 Life-cycle model, 870, 871, 874-876, 1034, 1071, 1076, 1078, 1079, 1086 policy simulation, 1079 savings, 876 Lifetime income, 810, 829 Limitation taxes, 638 Lindahl equilibrium, 561, 582, 623, 679, 680, 681n, see also Equilibrium pricing, 582 taxation, 582, 590, see also Taxation Linear taxes, 1015 Local dominant equilibrium, 543, see also Equilibrium public economics, 637, 638 public goods, 503, 504, 529, 574, 614, 637, 638 public sector, 628, 638, 639 Logrolling, 624, 720, 725, 727 Long run equilibrium, 627, see also Equilibrium Longitudinal Retirement History Survey, 828, 838-840, 843, 846, 874, 875, 878 Lorenz curves, 794, 884 Low income community, 585, 587, 589, 614 entry barrier, 590 household, 589, 596 individuals, 587-590 migrant, 590 Lump-sum charges, 659 income, 959 payments, 994, 1015, 1017 redistributions, 992
Index subsidy, 728-731 taxation, 507, see also Taxation taxes, 495, 525, 575, 652n, 657n, 658n, 659, 729, 731, 757, 930, 958, 994-996, 1002, 1005, 1006, 1014, 1015, 1026, 1032, 1035, 1067, see also Taxes Lyapunov function, 565
Majority disequilibrium, 711 rule, 708, 709, 720, 760, 766 equilibrium, 709, see also Equilibrium referendum, 624 voting, 550, 909 voting paths, 711 voting requirements, 753 winner, 717 Marginal benefits, 489-491, 500, 503, 523, 525, 591, 631, 751 change, 953 congestion costs, 503 congestion damage, 504 cost, 485, 486, 489, 493, 495, 523, 691, 751, 950, 954, 961-964 schedule, 525 external benefit, 523, 524n increases in output, 519 inequality, 1017 pricing, 521, 522 product, 979, 995, 1070 production cost, 489, 500 productivity, 578, see also Productivity project, 970, 973 rate of substitution, 542, 562, 577, 621, 623, 624, 1026, 1032n rate of transformation, 624 social benefit, 496 social cost, 950, 962 social product (MSP), 950, 966, 971 social value, 929, 930, 942, 945, 946, 961, 966-968, 974, 976, 978, 982 tax rate, 1001-1009, 1015, 1019-1023, 1031, 1038, 1064 tax revenue, 1052, 1054, 1059, 1061, 1067 taxes, 730-732, 735-737, 799, 804, 805, 816, 819, 820, 885, 886, see also Taxes transfer, 959, see also Transfers utility, 491, 506 valuations, 494, 517, 520, 521, 524, 525, 622 willingness-to-pay, 981 Market economy, 661 equilibrium, 517, see also Equilibrium failures, 648-650, 663, 665, 671n, 672, 738,
Index 753, 756, 757, 765, 787 imperfections, 787 inefficiencies, 499n mechanism, 663 outcomes, 513 performance, 509, 661 prices, 524, 529, 663 productivity, 1018, 1019, see also Productivity response, 660 structure, 638 transactions, 509, 661 Matching grants, 611, 636, see also Grants Maximin equilibrium, 541, 543, 561, 566, see also Equilibrium Mean voter, 624 Median agent, 550, 551 demands, 606 income, 609, 614 level of services, 622 person, 606 preferences, 528, see also Preferences public goods, 609 quantity, 606 taxes, 723 value home, 610 voter, 533, 602, 606-612, 619, 621, 623, 624, 714, 715, 721-723, 730n, 829, 830 equilibrium, 722, see also Equilibrium MDP (Malinvaud-Dreze-Poussin) mechanism, 694 procedures, 566 process, 695, 696, 697n, 702 Means-tested assistance, 781 Medicare, 825 MERGE file, 883, 884 Michigan Panel Study of Income Dynamics, 848, 850, 854, 866, 868, 886 Migration incentives, 591 relationship, 613 Minimax strategy, 696 Minimum cost scale of production, 663 income, 784 Misallocation of resources, 485 Mixed club, 582, see also Club Monopolist, 519-521, 656, 1000n Monopoly, 519-521, 658n, 718, 727, 1011n, 1036 model, 638 MRP (majority rule processes), 703-712, 718-721, 725, 727, 730, 732 MRP-SE equilibrium outcome, 713, see also Equilibrium Multiple public goods, 498 Myopia, 809 Myopic behavior, 538, 541, 542
1101 Nash assumption, 801 bargaining solution, 512, 738n, see also Bargaining equilibrium, 541, 561, 566, 665, see also Equilibrium National Assistance, 821 debt, 407 Food Survey, 786 Insurance, 780-783, 817, 821, 823 Insurance Act, 779 Insurance pensions, 823 Longitudinal Survey, 839, 841, 848, 853, 856, 874, 875, 879 Negative benefits, 496 externality, 499, 513, see also Externality income tax, 782, 799, 804-806, 810-812, 833, 834, 869, 884-886, 1038, see also Taxes marginal benefits, 493 Neoclassical model, 783, 993 Network externality, 661, see also Externality Neutrality, 704 New New Welfare Economics, 991-1041 New Welfare Economics, 991, 992, 1017 Non-compliance, 1011 Non-convexity, 505 Non-cooperative allocation, 699, see also Allocation behavior, 670, 671, 720 strategy, 668, 669, 680 Non-dictatorial Borda count, 689 mechanism, 684 scheme, 691 Non-dictatorship, 682, 685, 686, 693 Non-differentiability, 1009-1011 Non-excludability, 489n Non-existence problem, 587, 589 Non-linear tax, 922, 958, 1014, 1015 Non-matching aid, 604 grant, 635-637, see also Grant unconditional grant, 635 Non-satiation assumption, 498 Non-voters, 611, 612 Numeraire, 523, 967-972, 1059-1062 commodity, 523 OECD (Organization for Economic Cooperation and Development), 836, 871 Occupational pensions, 823, see also Pensions Old age insurance, 779 pensions, 779, see also Pensions
1102 Open-agenda equilibrium, 718, see also Equilibrium Optimal allocations, 1002, see also Allocation condition, 508 level of capacity, 502 level of public good, 518 linear tax, 1015, 1027 population, 508 tax schedule, 1022 tax structure, 1003, 1005, 1013-1015, 1038 taxation, 991, 992, 1006, 1009, 1017, 1022-1030, 1036-1038, see also Taxation Optimality, 492. 582, 1028 conditions, 502 Optimization, 558 Output, 505, 516, 517, 519, 520, 531 Overconsumption of public services, 609 Overemployment, 1002n, see also Employment Overlapping generations, 807, 808 models, 1031, 1031n, 1070, 1071 Oversupply of the public good, 528, 531 Pareto conditions, 507 criterion, 505, 507 efficiency, 522, 574, 648-659, 673, 689, 694, 695, 696n, 699, 701, 723, 725, 753, 761n, 807, 814, 936, 937, 956, 991-993, 997, 999, 1001, 1004, 1011, 1013, 1031, see also Efficiency efficient tax structures, 998, 1011, 1030, 1032n, 1037-1039, see also Taxes efficient taxation, 991-993, 1006, 1019, 1023, 1030, 1034, see also Taxation improvement, 527, 814, 956, 957, 1013 improving allocation, 673, see also Allocation improving changes, 957 improving transfers, 792, see also Transfers inefficiency, 658, 992, 1036 optimal mechanisms, 560 outcome, 547 taxation, 991, 1001, 1009, see also Taxation optimality, 506, 516, 522, 527, 537, 549, 554, 555,557-560, 562-565,574, 653--659, 662, 670, 673, 676, 679, 680, 686, 692, 693, 696, 720, 722, 724, 733n, 801 points, 709 preference, 763, see also Preferences principle, 685, 686 set, 709, 710, 712, 724, 725 superiority, 681 Partial capitalization, 593 Pay-as-you-go pensions, 410 Peak alternative, 544
Index Pensions, 779, 808, 819, 828, 835, 839, 842-845, 864, 868, 869, 880, 881, 885 and savings, 869, 878 Perfect competition, 514, see also Competition Pigou, 496, 523 -Samuelson tradition, 648 Pigouvian, 523, 524 approach, 523, 524 congestion, 506, see also Congestion prescription, 523 subsidies, 522, 523 taxation, 522, see also Taxation taxes, 523, see also Taxes Planner, 692, 694, 697, 702, 910-912, 925, 931, 933, 937-942, 953, 955, 961, 964, 966, 976-980, 983 -dictator, 694-702 Planning mechanism, 694 procedures, 538, 561 process, 694 Plott conditions, 708-710 Policy changes, 881 decisions, 815 economists, 992 parameters, 522 Policy-makers, 490, 501, 502, 522, 524, 525, 889, 955, 959 Political economy, 573, 638, 754, 780, 784, 829 externalities, 569 mechanism, 574 Politician-bureaucrat, 744-752 Pooling equilibrium, 997, 997n, 998, 1010, 1011, see also Equilibrium Population reduction, 509 Positional dictators, 550, see also Dictator Positional dictatorship, 558 Positive association, 682n, 693 benefits, 496 extemalities, 513, see also Externality transfer, 505, see also Transfers Post-tax factor costs, 1056 return to capital, 1054, 1073 Poverty, 781-792, 814, 817, 823-825, 833 gap, 823, 833 line, 787-790, 823, 826, 887 pervasiveness of, 824 plateau, 817 scale, 788 trap, 817, 820 War on, 822, 826 Preference intensity process (PIP), 704, 719, 721,
Index 727, 738, 759 Preferences, 706, 738, 739, 744, 748, 750, 753, 763, 800, 813, 1062, 1079, 1088 for private goods, 580 for public goods, 526, 574, 580 revelation mechanism, 528 Pressure groups, 831 Pre-tax income, 730, 1048 return to capital, 1054, 1069, 1073, 1076 wages, 1076, 1080, 1081 Price(s), 486, 520, 609-611 discrimination, 520 elasticity, 606-608 exclusion, 490, 491, 494 Prisoner's Dilemma, 663-672, 680, 720, 756 Private charity, 779, 801, 802 consumption, 506, 509, 576, see also Consumption contracts, 761 good consumption, 510, 588, see also Consumption goods, 485, 489-492, 494, 495, 497-499, 501, 503, 504, 506, 510, 514, 516, 524, 525, 528, 529, 532, 549, 550n, 555, 561, 576, 577, 586, 603, 605, 620, 675n, 699n, 721, 760n, 953, 981, 1069 income, 509 insurance, 780, 795, 802, 803 markets, 486. 498, 513 pensions, 808, 832, 840, see also Pensions producers, 979 production, 979, 982, see also Production projects, 976 provision of public goods, 486, 509 rates of return, 971-973 risks, 980 saving, 877 sector, 600, 634, 750, 910, 924, 963, 942, 944, 954, 970, 971, 973, 980, 982 sector firms, 954 transfers, 793, 800, 877, see also Transfers Problem-solvers, 741 Producer(s), 1050 prices, 936, 944-950, 970, 972, 976, 984, 1034, 1053, 1061 rate of interest, 972, 103 3 n rations, 951-953, 963 surplus, 1046, 1047 taxation, 1030, see also Taxation Production, 487-489, 1018, 1068, 1079, 1084 characteristics, 1013 constraints, 952 costs, 486 decisions, 917, 919
1103 divisibility, 663 efficiency, 937-942, 984, 1030, see also Efficiency function, 493, 508, 529, 576, 632, 634, 969, 978, 1021, 1031n, 1051-1053, 1064, 1066, 1072, 1080, 1081, 1086, 1088 inefficiency, 585 plan, 917, 921 prices, 963 process, 493, 576 of public goods, 574 Productivity, 994, 997, 1018, 1034, 1037n Professional bureaucrat, 714 Profit(s), 485 maximization, 923 maximizing, 591, 658 production plan, 922, 943 tax, 922, 1030, 1037, see also Taxes Progressive linear tax, 811, see also Taxes taxation, 1021, see also Taxation Progressivity, 991, 992 Project evaluation, 911, 957 Property rights, 513, 650, 651, 761 tax, 529, 583, 585n, 587-590, 591, 593, 604, 610, 624, 632, 693n, 1068, 1070, see also Taxes Proportional tax, 723n, 811, 1015, 1053, see also Taxes Public assistance, 781, 820, 822, 834, 865 bad(s), 485-487, 509, 513, 514 choice, 595, 598, 648, 739, 740, 784, 829, 831 consumption goods, 494, see also Consumption decision-maker, see also Center and Decision-maker economics, 545, 571-574, 580, 626, 637-639, 792, 1043 economy, 572, 575 employment, 965, 978 expenditures, 508, 972 finance, 571-575, 580, 603, 744 goods, 485-569, 571-592, 596-625, 628, 631-635, 638, 653-656, 664, 674, 676, 679n, 679, 688, 694, 697, 715, 721-725, 747, 758, 761n, 801, 807, 826, 980, 981, 1015, 1069, 1070 allocation of, 537-567 bundle, 581 concept, 491 consumption, 578, see also Consumption elements, 492 explicit cost of, 496 inputs, 494
    output, 491, see also Output
    oversupply of, 523
    private provision of, 509
    social cost of, 496
    supply, 496
  intermediate goods, 493
  intervention, 509
  policy, 485, 739, 767
  production, 916, 918, 978, 984, see also Production
  profits, 978
  relief, 779
  sector, 486, 575, 576n, 591, 601, 613, 625, 628, 629, 634, 638, 639, 739, 740, 750n, 753, 767, 910-914, 917, 918, 930, 937-941, 953, 954, 966, 969
    employment, 965, see also Employment
    firms, 954
    projects, 910
  services, 529, 572, 573, 576, 582, 584, 589, 600, 603-605, 608-610, 629, 631, 674, 688, 699n
  works, 978
Purchasing power, 813, 814
Pure
  income grant, 636, see also Grant
  private goods, 486, 487, 491, 493, 497, 499, 500, 605
  public goods, 486, 487, 491, 492, 496-500, 503, 508, 510, 524, 538, 576, 586, 605, 653-656, 674, 715n, 953
Qualitative political economy, 648
R&D (research and development), 1038
Random taxation, 1011-1013, see also Taxation
Randomization, 1011-1014
Rational expectations, 721
Rationality, 557, 564-566, 683, 685, 687-689, 695, 712, 719, 973
Rationing, 490, 491, 498, 502, 924, 975
Reagan Administration, 627
Real wage rate, 496
Reallocation, 652, 655, 734
Redistribution, 617, 628, 630, 652n, 726, 728, 732-738, 754, 757, 759, 764, 784, 792, 793, 795, 800-802, 810-815, 827-831, 887, 958, 1015, 1019, 1021, 1071, 1076, 1078, 1087
Redistributive
  democracy, 762
  goals, 628
  government, 760, 762
  policies, 627, 1023
  programs, 628
  tax, 994, 1006, 1014, see also Taxes
Re-employment, 857, 858, 861, 865
Reform, 918-920, 984
Rent-seeking, 727
Reservation wages, 844, 856, 862, 965, 981
Resource(s), 647, 766
  allocation, 488, 497, 522, 648, 666n, 728, 1002, see also Allocation
  constraint, 1033n
  endowments, 734
  misallocation, 495, 511
Restricted taxation, 1014, 1029, see also Taxation
Retail Prices Index, 821
Retirement, 819, 833, 835, 837, 839-847, 850, 874, 876, 880, 1079, see also Pensions
  age, 819, 836, 838-843, 846
  benefits, 836, 840, 876
  decisions, 819, 834, 842, 844, 847, 864, 885
  effect, 870
  plans, 839
  rates, 836
  test, 836, 870
Returns to scale, 662, 663, 942, 1020
Revenue
  sharing block grant, 635, see also Grant
  from users, 485
Reversion level, 715
Risk averse
  agent, 696, 737
  individuals, 1009, 1012, 1013
  player, 696
  worker, 662
Risk aversion, 746, 1080
"Rivalryness", 487, 490
Sales tax, 632, see also Taxes
Samuelson, P. A., 653, 689, 808
  condition (rule), 489, 493, 494, 496, 498, 500, 502, 503, 508, 517, 523, 533, 562, 577, 582, 584, 597n, 624, 679n, 697, 721, 722n, 743
  criterion, 655
  dilemma, 656
  public good, 661
Scale factor, 943, 944
Second World War, 834
Second-best
  literature, 947, 949
  theory, 920, 924, 937
  welfare economics, 958
Self-selection, 1005, 1037
  constraints, 997-1001, 1002n, 1003, 1006-1008, 1012, 1013, 1021, 1024, 1031n
  equilibrium, 998, 1001, see also Equilibrium
  mechanisms, 1034
Shadow
  discount rate, 967-973
  exchange rate, 974
  prices, 518, 910-920, 925-929, 931, 933, 937-952, 954, 956, 961-964, 970-977, 981, 983-984, 1007, 1008n, 1031n
  profits, 914-918, 931, 933
  revenue, 930, 932, 981
  values, 967, 968, 975
  wages, 954, 964-967, 981
Side constraints, 924, 931, 977
Single-peaked preferences, 707, 712, 713, 717, 721, 722n, see also Preferences
Social
  benefit, 697, 968
  choice, 537-539, 545, 547, 550, 551, 648, 684, 704, 763n, 766
  control, 768
  cost, 697, 950, 979
    of public goods, 496
  decision, 567
    mechanism, 567
  dividend, 782, 804-806, 811
  judgments, 788, 789
  optimum, 496, 633
  planner, 505, see also Planner
  rate of return, 970-972, 973
  resources, 649, 734, 739, 761n
  state, 538-540, 546
  system, 768
  transfers, 793, see also Transfers
  value, 507, 915, 918, 968, 980
  welfare, 506, 507, 652n, 658, 683, 691n, 694, 793, 803, 804, 806, 813, 910, 911, 913, 918, 920, 924, 929, 932-935, 955, 958, 959, 968, 983, 1002, 1017
    function (SWF), 933, 934, 957, 991-995, 1007, 1012, 1013, 1017, 1021, 1023, 1035, 1037
Social Security, 780, 781, 783, 808, 809, 816, 819, 828, 829, 832-841, 847-849, 869-873, 876-880, 884, 1077
  Act, 779
  Administration (SSA), 822, 845
  benefits, 870, 875, 887
  disability program, 847
  pensions, 809, see also Pensions
  policy, 1032
  and savings, 870
  System, 781, 835
  wealth, 871-878
Socialization, 768
Spatial public goods, 529
Special interest groups, 831
Stabilization, 626, 627, 754n
Standard of living, 787, 788, 792, 800
State
  insurance, 802
  intervention, 801, 808
  pensions, 779, 780, 808-810, 832, 837, 840, 876, see also Pensions
  Trading Corporation (STC), 923
Structure-induced equilibrium, 706, 712, 715n, 716-718, see also Equilibrium
Subscription equilibrium, 537, see also Equilibrium
Substitution, 636, 940, 946, 953, 1020, 1021, 1026, 1034, 1052, 1060, 1064, 1074, 1079, 1080
Supplemental Security Income (SSI), 781, 821, 822, 866
Supplementary Benefit (SB), 781, 783, 821, 823, 825, 826
Supply side, 574, 602, 603, 637, 638, 880
Survey of Economic Opportunity, 828
Taxable income, 816
Taxation, 991-993, 1043, 1073
Tax(es), 495, 524, 526, 582, 601, 604, 611, 616, 631, 633, 687, 694, 695n, 697, 699, 702, 721, 725, 730, 733, 738, 745, 753, 758, 784, 823, 843, 844, 875, 940, 947, 951
  arbitrage, 1014
  base, 587, 589, 628-630, 747
  burden, 1044, 1049, 1052, 1059-1064, 1082, 1088
  collection, 1011
  cost, 594
  incidence, 1043-1045, 1048-1054, 1063-1065, 1070, 1074, 1082
  -price, 587-590, 604, 605, 610, 611, 625, 629
  rate, 746, 747
  reform, 1014, 1048, 1049, 1053
  relief, 802
  rules, 695
  structures, 992
Tiebout, C. M., 504, 654
  assumption, 530, 575
  bias problem, 622
  efficiency, 598-601, see also Efficiency
  equilibrium, 575, 590, 591, 596n, 601, 1069, see also Equilibrium
  framework, 573, 575, 589, 591
  hypothesis, 600n, 601
  model, 571-575, 579-587, 590-594, 597, 598, 601, 603, 620, 625, 627, 630, 638, 1069
  process, 612-614
  sorting process, 629
  world, 612
Tit-for-tat strategy, 668, 669
Trade, gains from, 513
Transactions costs, 513, 567, 628
Transfers, 728, 731, 732, 734, 736, 738, 757, 784, 797-804, 807, 810, 823-830, 877, 887, 888, 922, 948, 959, 967, 1049
Transformation function, 488, 496
Transitivity, 683, 687, 688, 693
Two-good world, 497
Unanimity, 677
Uncertainty, 977-980, 984
  and time, 977
Underconsumption
  of housing, 494, see also Housing
  of public services, 609
Undersupply of the public good, 528
Unemployed, 840, 858, 862, 957, 966, 983
Unemployment, 627, 661, 785, 799, 803, 821, 834, 839, 851-863, 872, 876, 877, 920, 957
  benefit, 834, 851-856, 859, 861, 863, 966
  compensation, 839
  duration, 856-858, 863, 865
  insurance (UI), 779, 799, 833, 851-856, 865
  parent (UP), 821
  rate, 852
  spell, 862
Uniform price, 516
Universal domain, 547, 706
Unrestricted domain, 682, 685, 686, 693, 704
User benefits, 490
Utilitarian, 765, 793, 992, 994, 1018
  solution, 996
Utilitarianism, 740, 763, 764n, 794, 991, 1007, 1038
Utility, 1004
  constraint, 488, 489
  function, 488, 491-494, 497, 499, 502, 504, 508, 521, 529, 532n, 542, 549-551, 555, 558, 561, 576, 587, 596, 603, 632, 729, 792-797, 803, 814, 829, 844, 915, 966, 1002, 1008, 1021-1026, 1029, 1034, 1062, 1072, 1079
  gain, 744
  level, 1001
  maximization, 510, 582, 584, 587, 844, 1007, 1035
  maximizing
    agents, 666n, 672
    behavior, 637, 676, 744
    individual, 793
    labor supply function, 797
    level, 576, 583
    outcome, 674
    private transfer, 800, see also Transfers
  value, 735
Utilization and capacity, aggregate, 501
Value
  -added taxes, 1054n, see also Taxes
  judgments, 955-961
  -restricted preferences, 706
Voluntary exchange, 524, 648
Vote-trading equilibrium, 720, see also Equilibrium
Voter(s), 601, 602, 606-610, 615, 619, 636, 684, 686, 688-690, 706, 708, 710, 713-715, 719, 738n, 739, 743, 745-748, 751, 760n, 762, 767, 768
  behavior, 721
  compatibility, 686
  indifference, 717
  preferences, 686, 706, 708, 710, 721, 725, 745n, 748, 829, see also Preferences
  utility, 745
  welfare, 745, 748-750, see also Welfare
Voting
  agenda, 533
  cycle, 705, 706, 709, 712
  mechanism, 527
Wage
  differences, 509
  tax, 1077, 1080, see also Taxes
Walrasian equilibrium, 662, see also Equilibrium
War on Poverty, 822, 826
Wealth tax, 1067
Welfare, 629, 635, 671n, 689, 744, 745, 805, 807, 865, 866, 883, 886, 888, 955-958, 1038, 1062, 1071, 1079
  benefits, 627
  change, 1061
  consequences, 808
  economics, 652n, 763, 785, 791, 792, 824
  function, 505, 506, 791, 983
  gain, 532n, 934, 1023, 1038
  group, 742
  losses, 751
  maximum, 507
  optimum, 508, 632
  payments, 831, 867
  programs, 627, 831
  state, 829-833, 888
  weight, 955, 959-961, 966
Wicksell, K., 527, 673, 676, 677
  -Lindahl process, 697n
Wicksellian unanimity, 527
Widows' pensions, 779, see also Pensions
Willingness-to-pay, 620
Workhouse, 779, see also Poverty
Zoning, 779