THE
QUARTERLY JOURNAL OF ECONOMICS Vol. CXXV
February 2010
Issue 1



FREE DISTRIBUTION OR COST-SHARING? EVIDENCE FROM A RANDOMIZED MALARIA PREVENTION EXPERIMENT∗

JESSICA COHEN AND PASCALINE DUPAS

It is often argued that cost-sharing (charging a subsidized, positive price) for a health product is necessary to avoid wasting resources on those who will not use or do not need the product. We explore this argument through a field experiment in Kenya, in which we randomized the price at which prenatal clinics could sell long-lasting antimalarial insecticide-treated bed nets (ITNs) to pregnant women. We find no evidence that cost-sharing reduces wastage on those who will not use the product: women who received free ITNs are not less likely to use them than those who paid subsidized positive prices. We also find no evidence that cost-sharing induces selection of women who need the net more: those who pay higher prices appear no sicker than the average prenatal client in the area in terms of measured anemia (an important indicator of malaria). Cost-sharing does, however, considerably dampen demand. We find that uptake drops by sixty percentage points when the price of ITNs increases from zero to $0.60 (i.e., from 100% to 90% subsidy), a price still $0.15 below the price at which ITNs are currently sold to pregnant women in Kenya. We combine our estimates in a cost-effectiveness analysis of the impact of ITN prices on child mortality that incorporates both private and social returns to ITN usage. Overall, our results suggest that free distribution of ITNs could save many more lives than cost-sharing programs have achieved so far, and, given the large positive externality associated with widespread usage of ITNs, would likely do so at a lesser cost per life saved.

∗ We thank Larry Katz, the editor, and four anonymous referees for comments that significantly improved the paper. We also thank David Autor, Moshe Bushinsky, Esther Duflo, William Easterly, Greg Fischer, Raymond Guiteras, Sendhil Mullainathan, Mead Over, Dani Rodrik, and numerous seminar participants for helpful comments and suggestions. We thank the Mulago Foundation for its financial support, and the donors to TAMTAM Africa for providing the free nets distributed in this study. Jessica Cohen was funded by a National Science Foundation Graduate Research Fellowship. We are very grateful to the Kenya Ministry of Health and its staff for their collaboration. We thank Eva Kaplan, Nejla Liias, and especially Katharine Conn, Carolyne Nekesa, and Moses Baraza for the smooth implementation of the project and the excellent data collection. All errors are our own.

© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010


I. INTRODUCTION

Standard public finance analysis implies that health goods generating positive externalities should be publicly funded, or even subsidized at more than 100% if the private nonmonetary costs (such as side effects) are high. Although this analysis applies to goods whose effectiveness is independent of the behavior of the recipients (e.g., vaccines, deworming pills administered to schoolchildren), it does not necessarily apply to goods that require active usage (adherence) by their owner for the public health benefits to be realized (e.g., bed nets for reduced malaria transmission, pit latrines for reduced water contamination). For such goods, charging nonzero prices (“cost-sharing”) could improve the efficacy of public subsidies by reducing wastage from giving products to those who will not use them.

There are three possible effects of positive prices on the likelihood that people who acquire the product use it appropriately. First, a selection effect: charging a positive price could select out those who do not value the good and place it only in the hands of those who are likely to use it (Oster 1995; Population Services International [PSI] 2003; Ashraf, Berry, and Shapiro forthcoming). Second, a psychological effect: paying a positive price for a good could induce people to use it more if they exhibited “sunk cost” effects (Thaler 1980; Arkes and Blumer 1985). Third, higher prices may encourage usage if they are interpreted as a signal of higher quality (Bagwell and Riordan 1991; Riley 2001).

Although cost-sharing may lead to higher usage intensity than free distribution, it may also reduce program coverage by dampening demand. A number of experimental and field studies indicate that there may be special psychological properties to zero financial price and that demand may drop precipitously when the price is raised slightly above zero (Ariely and Shampan’er 2007; Kremer and Miguel 2007).
Beyond reducing demand, selection effects are not straightforward in the context of credit and cash constraints: if people who cannot afford to pay a positive price are more likely to be sick and need the good, then charging a positive price would screen out the neediest and could significantly reduce the health benefits of the partial subsidy. In the end, the relative benefits of various levels of subsidization of health products depend on a few key factors: (1) the elasticity of demand with respect to price, (2) the elasticity of usage with respect to price (which potentially includes selection, psychological, and signaling effects), (3) the impact of price variation on the


vulnerability (i.e., need) of the marginal consumer, and, finally, (4) the presence of nonlinearities or externalities in the health production function.¹

1. There are other potential channels from the price of a health product to its health impact. For example, the price could influence how the product is cared for (e.g., a more expensive bed net could be washed too frequently, losing the efficacy of its insecticide) or could have spillover effects to other health behaviors. We focus on the four channels described because these are most commonly cited in the debate over pricing of public health products and likely to have first-order impacts on the relationship between prices and health outcomes.

This paper estimates the first three parameters and explores the trade-offs between free distribution and cost-sharing for a health product with a proven positive externality: insecticide-treated bed nets (ITNs). ITNs are used to prevent malaria infection and have proven highly effective in reducing maternal anemia and infant mortality, both directly for users and indirectly for nonusers with a large enough share of users in their vicinity. The manufacture of ITNs is expensive, and the question of how much to subsidize them is at the center of a very vivid debate in the international community, opposing proponents of free distribution (Sachs 2005; World Health Organization [WHO] 2007) to advocates of cost-sharing (PSI 2003; Easterly 2006).

In a field experiment in Kenya, we randomized the price at which 20 prenatal clinics could sell long-lasting ITNs to pregnant women. Four clinics served as a control group and four price levels were used among the other 16 clinics, ranging from 0 (free distribution) to 40 Kenyan shillings (Ksh) ($0.60). ITNs were thus heavily subsidized, with the highest price corresponding to a 90% subsidy, comparable to the subsidies offered by the major cost-sharing interventions operating in the area and in many other malaria-endemic African countries. To check whether women who need the ITN most are willing to pay more for it, we measured hemoglobin levels (a measure of anemia and an important indicator of malaria in pregnancy) at the time of the prenatal visit. To estimate the impact of price variation on usage, we visited a subsample of women at home a few months later to check whether they still had the nets and whether they were using them.

The relationship between prices and usage that we estimate based on follow-up home visits is the combined effect of selection and sunk cost effects.² To isolate these separate channels, we follow Karlan and Zinman (forthcoming) and Ashraf, Berry, and Shapiro (forthcoming) and implement a randomized two-stage pricing design. In clinics charging a positive price, a subsample of women who decided to buy the net at the posted price were surprised with a lottery for an additional discount; for the women sampled for this second-stage lottery, the actual price ranged from 0 to the posted price. Among these women, any variation in usage with the actual price paid should be the result of psychological sunk cost effects. Taken together, both stages of this experimental design enable us to estimate the relative merits of free distribution and varying degrees of cost-sharing on uptake, selection and usage intensity.

2. The correlation between prices and usage is also potentially the product of signaling effects of prices, but this is unlikely in our context. Qualitative evidence suggests that the great majority of households in Kenya know that ITNs are subsidized heavily for pregnant women and young children and that the “true” price of ITNs (i.e., the signal of their value) is in the $4–$6 range. This is likely due to the fact that retail shops sell unsubsidized ITNs at these prices.

We find that uptake of ITNs drops significantly at modest cost-sharing prices. Demand drops by 60% when the price is increased from zero to 40 Ksh ($0.60). This latter price is still 10 Ksh ($0.15) below the prevailing cost-sharing price offered to pregnant women through prenatal clinics in this region. Our estimates suggest that of 100 pregnant women receiving an ITN under full subsidy, 25 of them would purchase an ITN at the prevailing cost-sharing price.

Given the very low uptake at higher prices, the sample of women for which usage could be measured is much smaller than the initial sample of women included in the experiment, limiting the precision of the estimates of the effect of price on usage. Keeping this caveat in mind, we find no evidence that usage intensity is increasing with the offer price of ITNs. Women who paid the highest price were slightly more likely (though without statistical significance) to be using the net than women who received the net for free, but at intermediate prices the opposite was true, showing no clear relationship between the price paid and probability of usage, as well as no discontinuity in usage rates between zero and positive prices.
Further, when we look only at women coming for their first prenatal care visits (the relevant long-run group to consider), usage is highest among women receiving the fully subsidized net. Women who received a net free were also no more likely to have resold it than women paying higher prices. Finally, we did not observe a second-hand market develop. Among both buyers of ITNs and recipients of free ITNs, the retention rate was above 90%. The finding that there is no overall effect of ITN prices on usage suggests that potential psychological effects of prices on usage are minor in this context, unless they are counteracted by opposite selection effects, which is unlikely. The second-stage randomization enables us to formally test for the presence of sunk-cost


effects (without potentially confounding selection effects) and yields no significant effect of the actual price paid (holding the posted price constant) on usage. This result is consistent with a recent test of the sunk-cost fallacy for usage of a water purification product in Zambia (Ashraf, Berry, and Shapiro forthcoming). In order to explore whether higher prices induce selection of women who need the net more, we measured baseline hemoglobin levels (anemia rates) for women buying/receiving nets at each price. Anemia is an important indicator of malaria, reflecting repeated infection with malaria parasites, and is a common symptom of the disease in pregnant women in particular. We find that prenatal clients who pay positive prices for an ITN are no sicker, at baseline, than the clients at the control clinics. On the other hand, we find that recipients of free nets are healthier at baseline than the average prenatal population observed at control clinics. We suspect this is driven by the incentive effect the free net had on returning for follow-up prenatal care before the benefits of the previous visit (e.g., iron supplementation) had worn off. Taken together, our results suggest that cost-sharing ITN programs may have difficulty reaching a large fraction of the populations most vulnerable to malaria. Although our estimates of usage rates among buyers suffer from small-sample imprecision, effective coverage (i.e., the fraction of the population using a program net) can be precisely estimated and appears significantly (and considerably) higher under free distribution than under a 90% subsidy. In other words, we can confidently reject the possibility that the drop in demand induced by higher prices is offset by an increase in usage. Because effective coverage declines with price increases, the level of coverage under cost-sharing is likely to be too low to achieve the strong social benefits that ITNs can confer. 
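The effective-coverage argument above is simple arithmetic: coverage is the share of clients who acquire a program net times the share of acquirers who use it. The sketch below illustrates that logic only. The function names and the numbers in the usage note are hypothetical and are not the paper's estimates.

```python
def effective_coverage(uptake: float, usage_among_acquirers: float) -> float:
    """Share of all clients who both acquired a program net and use it."""
    for share in (uptake, usage_among_acquirers):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie in [0, 1]")
    return uptake * usage_among_acquirers


def usage_needed_to_match(free_uptake: float, free_usage: float,
                          paid_uptake: float) -> float:
    """Usage rate among buyers that a positive price would need in order
    to reach the same effective coverage as free distribution."""
    if paid_uptake <= 0.0:
        raise ValueError("paid_uptake must be positive")
    return effective_coverage(free_uptake, free_usage) / paid_uptake


if __name__ == "__main__":
    # Hypothetical illustration only (not the paper's estimates).
    needed = usage_needed_to_match(free_uptake=0.9, free_usage=0.5,
                                   paid_uptake=0.3)
    print(f"usage rate needed among buyers: {needed:.2f}")
```

With these made-up inputs the required usage rate among buyers exceeds 1, which is impossible. This is the sense in which a large enough drop in demand cannot be offset by higher usage among buyers.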
When we combine our estimates of demand elasticity and usage elasticity in a model of cost-effectiveness that incorporates both private and social benefits of ITNs on child mortality, we find that for reasonable parameters, free distribution is at least as cost-effective as partially but still highly subsidized distribution, such as the cost-sharing program for ITNs that was under way in Kenya at the time of this study. We also find that, for the full range of parameter values, the number of child lives saved is highest when ITNs are distributed free. Our results have to be considered in their context: ITNs have been advertised heavily for the past few years in Kenya, both by the Ministry of Health and by the social-marketing


nongovernmental organization Population Services International (PSI); pregnant women and parents of young children have been particularly targeted by the malaria prevention messages; and most people (even in rural areas) are aware that the unsubsidized price of ITNs is high, thus reducing the risk that low prices through large subsidies are taken as a signal of bad quality. Our results thus do not speak to the debate on optimal pricing for health products that are unknown to the public. But if widespread awareness about ITNs explains why price does not seem to affect usage among owners, it makes the price sensitivity we observe all the more puzzling. Although large effects of prices on uptake have been observed in other contexts, they were found for less well-known products, such as deworming medication (Kremer and Miguel 2007) and contraceptives (Harvey 1994). Given the high private returns to ITN use and the absence of a detected effect of price on usage, the price sensitivity of demand we observe suggests that pregnant women in rural Kenya are credit- or saving-constrained.

The remainder of the paper proceeds as follows. Section II presents the conceptual framework. Section III provides background information on ITNs and describes the experiment and the data. Section IV describes the results on price elasticity of demand, price elasticity of usage, and selection effects on health. Section V presents a cost-effectiveness analysis, and Section VI concludes.

II. A SIMPLE MODEL OF PIGOUVIAN SUBSIDIES

This section develops a simple model to highlight the parameters that must be identified by the experiment to determine the optimal subsidy level. Assume that ITNs have two uses: a health use, when the net is hung, and a nonhealth use, for which the net is not hung.³ Nonhealth uses could be using the net for fishing, or simply leaving it in its bag for later use, for example, when a previous net wears out. Health use of the ITNs generates positive health externalities but nonhealth uses do not. Purchasing a net for health or nonhealth purposes costs the same to the household. The price of a net to a household is the marginal cost C minus a subsidy T. We call h the number of nets used for health purposes and n the number of nets used for nonhealth purposes. The household utility is

\[ U = u(h) + v(n) - (C - T)(h + n) + kH, \]

where u(h) is the utility from having hanging nets, with \(u' \ge 0\) and \(u'' \le 0\); v(n) is the utility from nonhanging nets, with \(v' \ge 0\) and \(v'' \le 0\); H is the average number of nets used for health purposes per household; and the constant k represents the positive health externality.⁴ When choosing how many nets to invest in, the household ignores the health externality and chooses h and n such that \(u'(h) = v'(n) = C - T\).

3. We thank an anonymous referee for suggesting this formalization.

4. For simplicity we assume that the positive health externality is linear in the share of the population that is covered with a net. In reality the health externality for malaria seems to be S-shaped.

Increasing the size of the subsidy T increases households’ investment in nets for health use, and thus the health externality. Because the subsidy is common for all nets, however, increasing T might also affect households’ investment in nets for nonhealth use. Call N the average number of nets used for nonhealth purposes per household. The marginal cost of increasing the health externality is \(T \times [d(H + N)/dT]\), whereas the marginal benefit is only \(k \times (dH/dT)\). The efficient subsidy level is the level that equates the marginal cost of increasing the externality to the marginal benefit of increasing it:

\[ T = \frac{k \times (dH/dT)}{d(H + N)/dT}. \]

If N does not respond to the subsidy (\(dN/dT = 0\)), the optimal subsidy is k, the level of the externality, as in Pigou’s standard theory. But if subsidizing H distorts the amount of N consumed upward, the optimal subsidy is lower than the level of the externality. The gap between the level of the externality and the optimal subsidy level will depend on how sensitive the hanging of nets is to price, relative to total ownership of nets. In other words, what we need to learn from the experiment is the following: when we increase the price, by how much do we reduce the number of hanging nets (nets put to health use), and how does it compare to the reduction in the total number of nets acquired?

This simple model could be augmented to incorporate imperfect information (for the household) on the true returns to hanging nets, especially on the relative curvature of u(.) and v(.). The lack of information could be on the effectiveness or the quality of ITNs. In this context, households could use the subsidy level as a signal of effectiveness or quality (i.e., if households interpret the size of the subsidy as the government’s willingness to pay to increase coverage and thus as a measure of the net’s likely effectiveness).


In such a case, subsidizing H would distort the amount of N consumed downward, and the optimal subsidy would be greater than the level of the externality. Alternatively, households could lack information on the nonmonetary transaction cost of hanging the net and underestimate this cost when they invest in nets for health use. Once households realize how much effort is required to hang the net (hanging it every evening and dehanging it every morning can be cumbersome for households that sleep in their living rooms), they might decide to reallocate a net from health use to nonhealth use. Households that suffer from the sunk-cost fallacy, however, would be less likely to reallocate a net from health use to nonhealth use if they had to pay a greater price for the net. This could be formalized, for example, by adding an effort cost in the function u(.), and assuming that the disutility of the effort needed to hang the net is weighted by the relative importance of the nonmonetary cost (effort) in the total cost of the net (nonmonetary cost + monetary cost). Increasing the subsidy level (decreasing the price) would then increase the disutility of putting forth effort to hang the net and increase the likelihood that households do not use the net. This sunk cost effect would lead to an upward distortion of N, and imply a subsidy level lower than the level of the externality. For a quick preview of our findings, Figure I plots the demand curve and the “hanging curve” observed in our experiment. The slope of the top curve is an estimate of −d(H + N)/dT and the slope of the bottom curve estimates −dH/dT. We find no systematic effect of the price on the ratio of these two slopes. When the price decreases from 10 Ksh to 0, the ratio of hanging nets to acquired nets actually increases, suggesting that the full subsidy (a price of zero) distorts the demand for nonhanging nets downward. However, at higher price levels, the effect of changing the subsidy is different. 
The ratio increases when the price decreases from 40 to 20 Ksh and from 20 to 10 Ksh. Overall, however, the ratio remains quite close to 1 over the price range we study.
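The efficient-subsidy condition of this section, T = k × (dH/dT) / [d(H + N)/dT], can be evaluated directly once the two slopes are known. The following is a minimal sketch of that formula only. The function name and the numeric inputs are hypothetical, chosen to show the two cases discussed in the text (N unresponsive to the subsidy, and N distorted upward).

```python
def optimal_subsidy(k: float, dH_dT: float, dN_dT: float) -> float:
    """Efficient subsidy T = k * (dH/dT) / (d(H + N)/dT).

    k      : size of the health externality per hanging net
    dH_dT  : response of hanging (health-use) nets to the subsidy
    dN_dT  : response of nonhanging (nonhealth-use) nets to the subsidy
    """
    d_total = dH_dT + dN_dT  # d(H + N)/dT
    if d_total == 0:
        raise ValueError("total net ownership must respond to the subsidy")
    return k * dH_dT / d_total


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(optimal_subsidy(k=2.0, dH_dT=0.5, dN_dT=0.0))  # Pigouvian case: equals k
    print(optimal_subsidy(k=2.0, dH_dT=0.3, dN_dT=0.1))  # upward distortion of N: below k
```

When the ratio of hanging nets to acquired nets stays close to 1 across prices, dN/dT is close to zero and the formula returns a subsidy close to k.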

FIGURE I
Ownership vs. Effective Coverage
[Figure: two series plotted against the price of the ITN (Free, 10 Ksh, 20 Ksh, 40 Ksh) on a vertical axis running from 0 to 1: "Acquired ITN" and "Acquired ITN and using it", each with a 95% CI.]
Sample includes women sampled for baseline survey during clinic visit, and who either did not acquire an ITN or acquired one and were later randomly sampled for the home follow-up. Usage of program ITN is zero for those who did not acquire a program ITN. Error bars represent ±2.14 standard errors (95% confidence interval with fourteen degrees of freedom). At the time this study was conducted, ITNs in Kenya were social-marketed through prenatal clinics at a price of 50 Ksh.

III. BACKGROUND ON ITNS AND EXPERIMENTAL SETUP

III.A. Background on Insecticide-Treated Nets

ITNs have been shown to reduce overall child mortality by at least 20% in regions of Africa where malaria is the leading cause of death among children under five (Lengeler 2004). ITN coverage protects pregnant women and their children from the serious detrimental effects of maternal malaria. In addition, ITN use can help avert some of the substantial direct costs of treatment and the indirect costs of malaria infection on impaired learning and lost income. Lucas (forthcoming) estimates that the gains to education from a malaria-free environment alone more than compensate for the cost of an ITN.

Despite the proven efficacy and increasing availability of ITNs on the retail market, the majority of children and pregnant women in sub-Saharan Africa do not use ITNs.⁵ At $5–$7 a net (US$ in PPP), they are unaffordable to most families, and so governments and NGOs distribute ITNs at heavily subsidized prices. However, the price that is charged for the net varies greatly by the distributing organization, country, and consumer.

5. According to the World Malaria Report (2008), which compiled results from surveys in 18 African countries, 23% of children and 27% of pregnant women sleep under ITNs.

The failure to achieve higher ITN coverage rates despite repeated pledges by governments and the international community (such as the Abuja Declaration of 2000) has put ITNs at the center of a lively debate over how to price vital public health products in developing countries (Lengeler et al. 2007).

Proponents of cost-sharing ITN distribution programs argue that a positive price is needed to screen out people who will not use the net, and thus avoid wasting the subsidy on nonusers. Cost-sharing programs often have a “social marketing” component, which uses mass media communication strategies and branding to increase the consumer’s willingness to pay (Schellenberg et al. 2001; PSI 2003). The goal is to shore up demand and usage by making the value of ITN use salient to consumers. Proponents of cost-sharing programs also point out that positive prices are necessary to ensure the development of a commercial market, considered key to ensuring a sustainable supply of ITNs.

Proponents of full subsidization argue that, although the private benefits of ITN use can be substantial, ITNs also have important positive health externalities deriving from reduced disease transmission.⁶,⁷ In a randomized trial of an ITN distribution program at the village level in western Kenya, the positive impacts of ITN distribution on child mortality, anemia, and malaria infection were as strong among nonbeneficiary households within 300 meters of beneficiary villages as they were among households in the beneficiary villages themselves (Gimnig et al. 2003).⁸ Although ITNs may have positive externalities at low levels of coverage (e.g., for unprotected children in the same household), it is estimated that at least 50% coverage is required to achieve strong community effects on mortality and morbidity (Hawley et al. 2003). To date, no cost-sharing distribution program is known to have reached this threshold (WHO 2007).

6. The external effects of ITN use derive from three sources: (1) fewer mosquitoes due to contact with insecticide, (2) reduction in the infective mosquito population due to the decline in the available blood supply, and (3) fewer malaria parasites to be passed on to others.

7. The case for fully subsidizing ITNs has also been made on the basis of the substantial costs to the government of hospital admissions and outpatient consultations due to malaria (Evans et al. 1997).

8. In a similar study in Ghana, Binka, Indome, and Smith (1998) find that child mortality increases by 6.7% with each 100-meter shift away from the nearest household with an ITN.


III.B. Experimental Setup

The experiment was conducted in twenty communities in western Kenya, spread across four districts: Busia, Bungoma, Butere, and Mumias. Malaria is endemic in this region of Kenya: transmission occurs throughout the year with two peaks corresponding to periods of heavy rain, in May/June/July and October/November. In two nearby districts, a study by the CDC and the Kenyan Medical Research Institute found that pregnant women may receive as many as 230 infective bites during their forty weeks of gestation, and as a consequence of the high resulting levels of maternal anemia, up to a third of all infants are born either premature, small for gestational age, or with low birth weight (Ter Kuile et al. 2003).

The latest published data on net ownership and usage available for the region come from the Kenya Demographic and Health Survey of 2003. It estimated that 19.8% of households in Western Kenya had at least one net and 6.7% had a treated net (an ITN); 12.4% of children under five slept under a net and 4.8% under an ITN; 6% of pregnant women slept under a net the night before and 3% under an ITN. Net ownership is very likely to have gone up since, however. In July 2006, the Measles Initiative ran a one-week campaign throughout western Kenya to vaccinate children between nine months and five years of age and distributed a free long-lasting ITN to each mother who brought her children to be vaccinated. The 2008 World Malaria Report uses ITN distribution figures to estimate that 65% of Kenyan households now own an ITN. A 2007 survey conducted (for a separate project) in the area of study among households with school-age children found a rate of long-lasting ITN ownership around 30% (Dupas 2009b).

Our experiment targeted ITN distribution to pregnant women visiting health clinics for prenatal care.⁹ We worked with 20 rural public health centers chosen from a total of 70 health centers in the region, 17 of which were private and 53 were public. The 20 health centers we sampled were chosen based on their public status, their size, services offered, and distance from each other. We then randomly assigned them to one of five groups: four clinics formed the control group; five clinics were provided with ITNs and instructed to give them free of charge to all expectant mothers coming for prenatal care; five clinics were provided with ITNs to be sold at 10 Ksh (corresponding to a 97.5% subsidy); three clinics were provided with ITNs to be sold at 20 Ksh (95.0% subsidy); and the last three clinics were provided with ITNs to be sold at 40 Ksh (90% subsidy). The highest price is 10 Ksh below the prevailing subsidized price of ITNs in this region, offered through PSI to pregnant women at prenatal clinics.¹⁰ Table I presents summary statistics on the main characteristics of health centers in each group. Although the relatively small number of clinics leads to imperfect balancing of characteristics, the clinics appear reasonably similar across ITN price assignment and we show below that controlling for clinic characteristics does not change our estimates except to add precision.

9. The ITNs distributed in our experiment were PermaNets, sold by Vestergaard Frandsen. They are circular polyester bed nets treated with the insecticide Deltamethrin and maintain efficacy without retreatment for about three to five years (or about twenty washes).

Clinics were provided with financial incentives to carry out the program as designed. For each month of implementation, clinics received a cash bonus (or a piece of equipment of their choice) worth 5,000 Ksh (approximately $75) if no evidence of “leakage” or mismanagement of the ITNs or funds was observed. Clinics were informed that random spot checks of their record books would be conducted, as well as visits to a random subsample of beneficiaries to confirm the price at which the ITNs had been sold and to confirm that they had indeed purchased ITNs (if the clinic’s records indicated so). Despite this, we observed leakages and mismanagement of the ITNs in four of the eleven clinics that were asked to sell ITNs for a positive price. We did not observe any evidence of mismanagement in the five clinics instructed to give out the ITNs for free. Of the four clinics that mismanaged the ITNs, none of them altered the price at which ITNs were made available to prenatal clients, but they sold some of the program ITNs to ineligible recipients (i.e., nonprenatal clients).
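The subsidy rates and dollar equivalents quoted for the four price arms can be checked with a few lines of arithmetic. The sketch below assumes a full cost of 400 Ksh per net, a figure that is not stated in the text but is implied by "40 Ksh = 90% subsidy", and uses the approximate exchange rate of 67 Ksh per US dollar given in the notes to Table I. The function names are illustrative.

```python
KSH_PER_USD = 67.0     # approximate rate reported in the notes to Table I
FULL_COST_KSH = 400.0  # assumption: backed out from "40 Ksh = 90% subsidy"


def subsidy_rate(price_ksh: float, full_cost_ksh: float = FULL_COST_KSH) -> float:
    """Fraction of the full cost that is not charged to the client."""
    if not 0 <= price_ksh <= full_cost_ksh:
        raise ValueError("price must lie between 0 and the full cost")
    return 1.0 - price_ksh / full_cost_ksh


def ksh_to_usd(price_ksh: float, ksh_per_usd: float = KSH_PER_USD) -> float:
    """Convert a price in Kenyan shillings to US dollars."""
    return price_ksh / ksh_per_usd


if __name__ == "__main__":
    # Experimental prices (0, 10, 20, 40 Ksh) plus the prevailing 50 Ksh price.
    for price in (0, 10, 20, 40, 50):
        print(f"{price:>2} Ksh = ${ksh_to_usd(price):.2f}, "
              f"subsidy {subsidy_rate(price):.1%}")
```

Under the 400 Ksh assumption, the 10, 20, and 40 Ksh arms reproduce the 97.5%, 95.0%, and 90% subsidies quoted in the text, and the dollar values round to $0.15, $0.30, and $0.60.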
The ITN distribution program was phased into program clinics between March and May 2007 and was kept in place for at least three months in each clinic, throughout the peak “long rains” malaria season and subsequent months. Posters were put up in clinics to inform prenatal clients of the price at which the ITNs were sold. Other than offering a free hemoglobin test to each woman on survey days, we did not interfere with the normal 10. Results from a preprogram clinic survey suggest that it is perhaps not appropriate to interpret our results in the context of widely available ITNs to pregnant women at 50 Ksh, as many of the clinics reported the supply of PSI nets to be erratic and frequently out of stock.

TABLE I
CHARACTERISTICS OF PRENATAL CLINICS IN THE SAMPLE, BY TREATMENT GROUP

                                                                Control      Treatment groups, ITN price:            p-value,      p-value,
                                                                group        0 Ksh     10 Ksh    20 Ksh    40 Ksh    joint test 1  joint test 2
                                                                             (free)    ($0.15)   ($0.30)   ($0.60)
                                                                (1)          (2)       (3)       (4)       (5)       (6)           (7)
Average monthly attendance in 2006 (first visits only)          67 [46.3]    63        75        54        62        .769          .965
Average monthly attendance in 2006 (first + subsequent visits)  114 [69.4]   117       164       106       122       .565          .847
Prenatal enrollment fee (in Ksh)                                10.0 [8.2]   12.0      4.0       13.3      10.0      .292          .619
Fraction of clinics with HIV testing services                   0.50 [0.58]  0.40      0.80      0.67      0.33      .507          .713
Total other prenatal clinics within 10 kilometers (km)          3.8 [2.9]    3.4       3.6       4.3       5.0       .769          .758
Distance (in km) to closest prenatal clinic in the sample       11.3 [2.6]   13.3      13.0      12.1      11.4      .743          .593
Number of clinics                                               4            5         5         3         3

Notes: Standard deviations presented in brackets. At the time of the program, $US 1 was equivalent to around 67 Kenyan shillings (Ksh). Prenatal clinics were sampled from a pool of seventy prenatal clinics over four districts in Kenya’s Western Province: Busia, Bungoma, Butere, and Mumias. Joint test 1: Test of equality of means across four treatment groups. Joint test 2: Joint test that means in treatment groups are equal to mean in control group.


procedures these clinics used at prenatal care visits, which in principle included a discussion of the importance of bed net usage.

Within clinics where the posted price was positive, a second-stage randomization was conducted on unannounced, random days. On those days, women who had expressed their willingness and showed their ability to purchase an ITN at the posted price (by putting the required amount of money on the counter) were surprised by the opportunity to participate in a lottery for an additional promotion by picking an envelope from a basket. All women given the opportunity to participate in the lottery agreed to pick an envelope. The final price paid by these women was the initial offer price if they picked an empty envelope; zero if they picked a “free net” envelope; or a positive price below the initial offer price if the initial price was 40 Ksh. This second-stage randomization started at least five weeks after the program had started in a given clinic, and took place no more than once a week, on varying week days, to avoid biasing the women’s decisions to purchase the ITN based on the expectation of a second-stage discount.11

III.C. Data

Three types of data were collected. First, administrative records kept by the clinic on ITN sales were collected. Second, each clinic was visited three or four times on random days, and on those days enumerators surveyed all pregnant women who came for a prenatal visit. Women were asked basic background questions and whether they purchased a net, and their hemoglobin levels were recorded. In total, these measures were collected from 545 pregnant women. Third, a random sample of 246 prenatal clients who had purchased/received a net through the program were selected to be visited at their homes three to ten weeks after their net purchases.
All home visits were conducted within three weeks in July 2007 to ensure that all respondents faced the same environment (especially in terms of malaria seasonality) at the time of the follow-up. Of this subsample, 92% (226 women) were found and consented to be interviewed. During the home visits, respondents were asked to show the net, whether they had started using it, and who was sleeping under it. Surveyors

11. By comparing days with and those without the lottery, we can test whether women heard about the lottery on days we did the lottery. We do not find evidence that uptake was higher on the days we performed the lottery; we also do not observe a significant increase in the uptake of nets after the first lottery day (data not shown).


checked to see whether the net had been taken out of the packaging, whether it was hanging, and the condition of the net.12 Note that, at the time of the baseline survey and ITN purchase, women were not told that follow-up visits could be made at their homes. What’s more, neither the clinic staff nor the enumerators conducting the baseline surveys knew that usage would be checked. This limits the risk that usage behavior might be abnormally high during the study period. Also note that we do not observe an increase in reported or observed usage over the three weeks during which the home surveys were conducted. This suggests that the spread of information about the usage checks was limited and unlikely to have altered usage behavior.

III.D. Clinic-Level Randomization

The price at which ITNs were sold was randomized at the clinic level, but our outcomes of interest are at the individual level: uptake, usage rates, and health. When regressing individual-level dependent variables on clinic-level characteristics, we are likely to overstate the precision of our estimators if we ignore the fact that observations within the same clinic (cluster) are not independent (Moulton 1990; Donald and Lang 2007). We compute cluster-robust standard errors using the cluster-correlated Huber–White covariance matrix method. In addition, because the number of clusters is small (sixteen treatment clinics), the critical values for the tests of significance are drawn from a t-distribution with fourteen (= 16 − 2) degrees of freedom (Cameron, Miller, and Gelbach 2007). The critical values for the 1%, 5%, and 10% significance levels are thus 2.98, 2.14, and 1.76, respectively.

Another approach to credibly assessing causal effects with a limited number of randomization units is to use (nonparametric) randomization inference, first proposed by Fisher (1935), later developed by Rosenbaum (2002), and recently used by Bloom et al. (2006). Hypothesis testing under this method is done as follows.
For each clinic, we observe the share of prenatal clients who purchased a net (or were using a net). Let $y_i$ denote the observed purchase rate for clinic $i$. For each clinic $i = 1, 2, \ldots, 16$, $Y_i(P_i)$ represents the purchase rate at clinic $i$ when the ITN price at clinic $i$ is $P_i$, $P_i \in \{0, 10, 20, 40\}$. The outcome variable is a function of the treatment variable and potential outcomes:

$$y_i = \sum_{k \in \{0, 10, 20, 40\}} \mathbf{1}(P_i = k)\, Y_i(k).$$

The effect of charging price $k$ in clinic $i$ (relative to free distribution) is $E_{ki} = Y_i(k) - Y_i(0)$. To make causal inferences for a price level $k$ via Fisher’s exact test, we use the null hypothesis that the effect of charging $k$ is zero for all clinics:

$$H_0 : E_{ki} = 0 \quad \text{for all } i = 1, \ldots, 16.$$

Under this null hypothesis, all potential outcomes are known exactly. For example, although we do not observe the outcome under price 0 for clinic $i$ subject to price $k > 0$, the null hypothesis implies that the unobserved outcome is equal to the observed outcome, $Y_i(0) = y_i$. For a given price level $k$, we can test the null hypothesis against the alternative hypothesis that $E_{ki} \neq 0$ for some clinics by using the difference in average outcomes by treatment status as a test statistic:

$$T_k = \frac{\sum_i \mathbf{1}(P_i = k)\, y_i}{\sum_i \mathbf{1}(P_i = k)} - \frac{\sum_i \mathbf{1}(P_i = 0)\, y_i}{\sum_i \mathbf{1}(P_i = 0)}.$$

Under the null hypothesis, only the price variable $P$ is random, and thus the distribution of the test statistic (generated by taking all possible treatment assignments of clinics to prices) is completely determined by that of $P$. By checking whether $T_k^{\mathrm{obs}}$, the statistic for the “true” assignment of prices (the actual assignment in our experiment), falls in the tails of the distribution, we can test the null hypothesis. We can reject the null hypothesis with a confidence level of $1 - \alpha$ if the test statistic for the true assignment is in the $(\alpha/2)$% tails of the distribution. This test is nonparametric because it does not make distributional assumptions. We call the p-values computed this way “randomization inference p-values.”

12. The nets that were distributed through the program were easily recognizable through their tags. Enumerators were instructed to check the tags to confirm the origin of the nets.

IV. RESULTS

IV.A. Clinic-Level Analysis: Randomization Inference Results

Table II presents the results of randomization inference tests of the hypotheses that the three positive prices in our experiment

TABLE II
CLINIC-LEVEL ANALYSIS: FISHERIAN PERMUTATION TESTS

Panel A: Takeup

Average weekly ITN sales over first 6 weeks
                                                (1)        (2)        (3)        (4)        (5)        (6)
Mean in free group                              41.03
Difference with free group:
ITN price = 10 Ksh                              −3.43      −13.77
  S.E.                                          (18.60)    (13.98)
  Randomization inference p-value               .824       .460
ITN price = 20 Ksh                                                    −13.87     −10.75
  S.E.                                                                (20.36)    (18.16)
  Randomization inference p-value                                     .64        .61
ITN price = 40 Ksh                                                                          −32.12     −34.03
  S.E.                                                                                      (25.05)    (22.00)
  Randomization inference p-value                                                           .23        .19
Clinic-level controls                                      X                     X                     X
Number of clinics                               10         10         8          8          7          7
R2                                              .00        .54        .07        .39        .25        .54
# of possible assignments for random inference  252        252        56         56         21         21

Share of prenatal clients who acquired program ITN
                                                (7)        (8)        (9)        (10)       (11)       (12)
Mean in free group                              0.99
Difference with free group:
ITN price = 10 Ksh                              −0.07      −0.07
  S.E.                                          (0.03)∗∗   (0.03)∗
  Randomization inference p-value               .125       .091
ITN price = 20 Ksh                                                    −0.17      −0.18
  S.E.                                                                (0.02)∗∗∗  (0.02)∗∗∗
  Randomization inference p-value                                     .000       .036
ITN price = 40 Ksh                                                                          −0.58      −0.58
  S.E.                                                                                      (0.06)∗∗∗  (0.05)∗∗∗
  Randomization inference p-value                                                           .000       .018
Clinic-level controls                                      X                     X                     X
Number of clinics                               10         10         8          8          8          8
R2                                              .45        .45        .93        .96        .95        .97
# of possible assignments for random inference  252        252        56         56         56         56

TABLE II (CONTINUED)

Panel B: Effective coverage

Share using program ITN at follow-up (unconditional on takeup)
                                                        (13)       (14)       (15)       (16)       (17)       (18)
Mean in free group                                      0.70
Difference with free group:
ITN price = 10 Ksh                                      −0.18      −0.17
  S.E.                                                  (0.13)     (0.14)
  Randomization inference p-value                       .173       .206
ITN price = 20 Ksh                                                            −0.27      −0.27
  S.E.                                                                        (0.13)∗    (0.14)
  Randomization inference p-value                                             .071       .143
ITN price = 40 Ksh                                                                                  −0.55      −0.54
  S.E.                                                                                              (0.14)∗∗∗  (0.15)∗∗
  Randomization inference p-value                                                                   .018       .054
Clinic-level controls                                              X                     X                     X
Number of clinics                                       10         10         8          8          8          8
R2                                                      .2         .22        .42        .42        .71        .73
# of possible assignments for random inference          252        252        56         56         56         56
Ratio [(H/T) from Panel B / ((H + N)/T) from Panel A]   2.571      2.429      1.588      1.500      0.948      0.931
Standard error of ratio (H/T)/((H + N)/T)               2.107      1.418      0.598      0.822      0.185      0.153

Notes: Panel A, columns (1)–(6): Sales data from clinics’ records. Data missing for one clinic due to misreporting of sales. Panel A, columns (7)–(12), and Panel B: Individual data collected by research team, averaged at the clinic level (the level of randomization). “Using program ITN” is equal to 1 only for those who (1) acquired the ITN and (2) had the ITN hanging in the home during an unannounced visit. Standard errors in parentheses, estimated through linear regressions. P-values for treatment effects computed by randomization inference. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.


had no effect on demand and coverage. The data used in Table II were collapsed at the clinic level. (The raw data on clinic level outcomes are provided in the Online Appendix). We have two indicators of demand presented in Panel A: average weekly sales of ITNs (recorded by the clinics) in columns (1)–(6) and the share of surveyed pregnant women who acquired an ITN in columns (7)–(12). Panel B shows the rate of effective coverage: the share of surveyed pregnant women in the clinic who not only acquired the ITN but also reported using it at follow-up. For each outcome (sales, uptake, effective coverage), we present the estimated effect of prices both without and with clinic-level controls. We present the standard errors estimated through parametric linear regressions, as well as the randomization inference p-values. Results in columns (1)–(6) suggest that, although the ITN sales were lower on average in clinics charging a higher price for the ITN, none of the differences between clinics can be attributed to the price. Even the 32/41 = 78% lower sales in the clinics charging 40 Ksh are not significant. Note, however, that the sales data are missing for one of the three 40 Ksh clinics, and as a consequence the power of the randomization inference test in columns (5) and (6) is extremely low: there are only 21 possible assignments of seven clinics to two groups of sizes five and two, and each of them has a 1/21 = 4.76% chance of being selected. This means that even the largest effect cannot fall within the 2.5% tails of the distribution, and randomization inference would thus fail to reject the null hypothesis of no price effect with 95% confidence no matter how large the difference in uptake between 0 Ksh and 40 Ksh clinics is (Bloom et al. 2006). The power is higher for the tests performed on the survey data (columns (7)–(12) of Panel A, and Panel B), but still lower relative to tests that impose some structure on the error term. 
Nevertheless, the p-values in columns (9)–(12) suggest that we can reject the hypothesis that charging either 20 or 40 Ksh for nets has no effect on uptake with 95% confidence. In particular, uptake of the net is 58 percentage points lower in the 40 Ksh group than in the free distribution group, and the confidence level for this effect is 98%. The results on effective coverage (usage of the net unconditional on uptake) are weaker for the 20 Ksh treatment but still significant for the 40 Ksh treatment: effective coverage is 54 percentage points lower in the 40 Ksh group than in the free distribution group, and the confidence level for this effect is 94%.
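The enumeration behind these tests can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the function name `tail_share` and the uptake figures are hypothetical, and the paper does not spell out its exact p-value convention, so the sketch simply reports the share of equally likely assignments that fall in the same tail as the observed one, which is the quantity the text compares with α/2.

```python
from itertools import combinations
from math import comb


def tail_share(outcomes, priced_idx):
    """Share of equally likely assignments at least as extreme as the observed one.

    outcomes   -- clinic-level outcomes y_i, one per clinic
    priced_idx -- indices of the clinics actually assigned the positive price k;
                  all other clinics form the free-distribution group
    Returns (T_obs, smaller one-sided tail share).
    """
    n, m = len(outcomes), len(priced_idx)
    total = sum(outcomes)

    def diff_in_means(idx):
        priced_sum = sum(outcomes[i] for i in idx)
        return priced_sum / m - (total - priced_sum) / (n - m)

    t_obs = diff_in_means(priced_idx)
    # Under the sharp null, outcomes are fixed and only the assignment varies.
    stats = [diff_in_means(idx) for idx in combinations(range(n), m)]
    eps = 1e-12  # tolerance for floating-point ties
    lower = sum(t <= t_obs + eps for t in stats)
    upper = sum(t >= t_obs - eps for t in stats)
    return t_obs, min(lower, upper) / len(stats)


# Hypothetical uptake shares: five free clinics followed by three 40 Ksh clinics.
uptake = [1.00, 0.98, 1.00, 0.97, 0.99, 0.45, 0.40, 0.38]
t_obs, share = tail_share(uptake, priced_idx=(5, 6, 7))
print(f"T_obs = {t_obs:.3f}, tail share = {share:.4f}")

# Number of possible assignments for the three comparison sizes in Table II.
print(comb(10, 5), comb(8, 3), comb(7, 2))
```

With only 21 possible assignments, the smallest attainable tail share is 1/21, which is why the test on the 40 Ksh sales data cannot reject at the 95% level however large the observed difference.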


As shown in Section II, the key parameter of interest in determining the optimal subsidy level is the ratio (H/T)/((H + N)/T). We compute this ratio for T = 10 Ksh, T = 20 Ksh, and T = 40 Ksh at the bottom of Panel B in Table II. The ratio is greater than 1 for price changes from 0 to 10 Ksh or 0 to 20 Ksh, but the standard errors are massive and there is little informational content in those numbers. For T = 40 Ksh, the ratio is more precisely estimated, at 0.95, still quite close to 1. The standard error of this ratio is 0.18 in the absence of covariates, and implies a 95% confidence interval of [0.58–1.31]. When we control for clinic-level covariates in the estimations of the two effects, the confidence interval on the ratio is somewhat reduced to [0.63–1.23].

The finding in Table II that effective coverage is statistically significantly lower by 54 percentage points in the 40 Ksh group (the group that proxies the cost-sharing program in place in Kenya at the time of the study) compared to the free distribution group is the main result of the paper. In the remainder of the analysis, we investigate the effects in more detail by conducting parametric analysis on the disaggregated data with cluster standard errors adjusted for the small number of clusters.

IV.B. Micro-Level Analysis: Price Elasticity of Demand for ITNs

Table III presents coefficient estimates from OLS regressions of weekly ITN sales on price. The coefficient estimate on ITN price from the most basic specification in column (1) is −0.797. This estimate implies that weekly ITN sales drop by about eight nets for each 10 Ksh increase in price. Because clinics distributing ITNs for free to their clients distribute an average of 41 ITNs per week, these estimates imply that a 10 Ksh increase in ITN price leads to a 20% decline in weekly ITN sales. The specification in column (4) regresses weekly ITN sales on indicator variables for each ITN price (0 Ksh is excluded).
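Two of the calculations in the preceding passage can be reproduced from the reported estimates. This is a back-of-envelope sketch: the helper `normal_ci` is a name introduced here, and the use of the normal critical value 1.96 for the ratio's interval is an assumption, since the text does not say which critical value was used. With the rounded table entries the intervals match the quoted ones only up to rounding.

```python
def normal_ci(estimate, std_err, z=1.96):
    """Symmetric interval estimate +/- z * std_err (z = 1.96 is assumed here)."""
    return estimate - z * std_err, estimate + z * std_err


# Ratio for T = 40 Ksh and its standard error (bottom of Panel B, Table II).
ci_no_controls = normal_ci(0.948, 0.185)
ci_with_controls = normal_ci(0.931, 0.153)
print("ratio CI, no controls:   [%.2f, %.2f]" % ci_no_controls)
print("ratio CI, with controls: [%.2f, %.2f]" % ci_with_controls)

# Linear demand estimate from Table III, column (1).
slope = -0.797           # change in weekly ITN sales per 1 Ksh
free_mean_sales = 41.03  # mean weekly sales in free-distribution clinics
drop_per_10_ksh = 10 * slope
relative_drop = drop_per_10_ksh / free_mean_sales
print(f"10 Ksh increase: {drop_per_10_ksh:.2f} nets per week ({relative_drop:.1%})")
```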
Raising the price from 0 to 40 Ksh reduces demand by 80% (from 41 ITNs per week to 9)— a substantial decline in demand, a bit smaller than the decline implied by the linear estimate in column (1). These results are not sensitive to adding controls for time effects (columns (2) and (5)). Columns (3) and (6) present results of robustness checks conducted by including various characteristics of the clinics as controls. Because net sales are conditional on enrollment at prenatal clinics, one concern is that our demand estimates are confounded

TABLE III
WEEKLY ITN SALES ACROSS PRICES: CLINIC-LEVEL DATA

Dependent variable: Weekly ITN sales
                                                  (1)         (2)         (3)         (4)         (5)         (6)
ITN price in Kenyan shillings (Ksh)               −0.797      −0.797      −0.803
                                                  (0.403)∗    (0.401)∗    (0.107)∗∗∗
ITN price = 10 Ksh ($0.15)                                                            −0.33       −0.33       1.52
                                                                                      (16.81)     (16.92)     (4.37)
ITN price = 20 Ksh ($0.30)                                                            −9.50       −9.50       −14.08
                                                                                      (16.04)     (16.14)     (5.00)∗∗
ITN price = 40 Ksh ($0.60)                                                            −32.42      −32.42      −33.71
                                                                                      (15.38)∗    (15.47)∗    (2.88)∗∗∗
Number of weeks since program started                         −5.08       −5.08                   −5.08       −5.08
                                                              (1.41)∗∗∗   (1.46)∗∗∗               (1.42)∗∗∗   (1.48)∗∗∗
Average attendance in 2006 (first visits)                                 1.48                                1.56
                                                                          (0.21)∗∗∗                           (0.22)∗∗∗
Average attendance in 2006 (total)                                        −0.46                               −0.50
                                                                          (0.15)∗∗∗                           (0.15)∗∗∗
Prenatal enrollment fee (in Ksh)                                          −0.77                               −0.54
                                                                          (0.27)∗∗                            (0.32)
ANC clinic offers HIV testing services                                    14.08                               7.07
                                                                          (7.44)∗                             (7.65)
Distance to the closest ANC clinic                                        −1.08                               −1.84
                                                                          (0.77)                              (0.68)∗∗
Distance to the closest ANC clinic in the sample                          −8.85                               −9.63
                                                                          (2.89)∗∗∗                           (2.70)∗∗∗
Observations (clinic-weeks)                       90          90          90          90          90          90
R2                                                .13         .21         .64         .14         .23         .65
Mean of dep. var. in clinics with free ITNs       41.03

Notes: Each column is an OLS regression of weekly ITN sales on ITN price or on a set of indicator variables for each price (0 Ksh is excluded). All regressions include district fixed effects. The sample includes fifteen clinics in three districts over six weeks after program introduction. (One 40 Ksh clinic is not included because of problems with net sales reporting.) Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (fifteen), the critical values for T-tests were drawn from a t-distribution with 13 (15 − 2) degrees of freedom. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.


by variation in the level of prenatal attendance across clinics. Subsidized ITNs may provide an incentive to receive prenatal care, and therefore the level of prenatal enrollment after the introduction of the program is an endogenous variable of interest (Dupas 2005). Any impact of ITN price on total enrollment should be captured by total ITN sales (which reflect the change in the number of patients and in the fraction of patients willing to buy ITNs at each price). However, our demand estimates could be biased if total attendance prior to program introduction is correlated with the assigned ITN price. To check whether this is the case, the specifications in columns (3) and (6) control for monthly prenatal attendance at each clinic in 2006, as well as additional clinic characteristics that could potentially influence attendance, such as any fee for prenatal care, whether the clinic offers counseling and/or testing for HIV, the distance to the closest other clinic/hospital in our sample, and the distance to the closest other clinic/hospital in the area. The coefficient estimates on ITN price are basically unchanged when clinic controls are included, but their precision is improved.

One might be concerned that our net sales data are biased due to (a moderate amount of) mismanagement, theft, and misreporting by clinics. Further, because the number of observations in Table III is small, the demand estimates are imprecise. For these reasons, it is important to check that the demand estimates based on net sales are consistent with those based on our survey data. Table IV presents additional estimates of demand based on individual-level data from surveys conducted among all prenatal clients who visited the clinics on the randomly chosen days when baseline surveys were conducted.
These specifications correspond to linear probability models where the dependent variable is a dummy equal to one if the prenatal client bought or received an ITN; the independent variables are the price at which ITNs were sold, or dummies for each price. The coefficient estimate of −0.015 on ITN price in column (1) implies that a 10 Ksh ($0.15) increase in the price of ITNs reduces demand by fifteen percentage points (or roughly 20% at the mean purchase probability of .81). This is very consistent with the results based on net sales and corresponds to a price elasticity (at the mean price and purchase probability) of −.37. These results imply that demand for ITNs is 75% lower at the cost-sharing price prevailing in Kenya at the time of the study (50 Ksh or $0.75) than it is under a free distribution scheme.
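The arithmetic linking the column (1) coefficient to the figures quoted in this paragraph can be laid out explicitly. In the sketch below the slope, mean purchase probability, and elasticity are taken from the text; the mean price is not reported in this passage, so it is backed out from the stated elasticity and should be read as an inferred quantity.

```python
slope = -0.015               # Table IV, column (1): change in purchase probability per 1 Ksh
mean_purchase = 0.81         # mean of the dependent variable
reported_elasticity = -0.37  # elasticity at the mean, as stated in the text

# Effect of a 10 Ksh increase, in probability units and relative to the mean.
drop_10 = 10 * slope
relative_drop_10 = drop_10 / mean_purchase

# elasticity = slope * mean_price / mean_purchase, so the implied mean price is:
implied_mean_price = reported_elasticity * mean_purchase / slope

# Effect of moving from free distribution to the prevailing 50 Ksh price.
drop_50 = 50 * slope

print(f"+10 Ksh: {drop_10:.2f} ({relative_drop_10:.1%} of the mean)")
print(f"implied mean price: {implied_mean_price:.1f} Ksh")
print(f"0 -> 50 Ksh: {drop_50:.2f}")
```

The implied mean price of roughly 20 Ksh is plausible given the four price points in the experiment, and the 0.75 drop at 50 Ksh corresponds to the 75% figure because uptake under free distribution is close to one.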

TABLE IV
DEMAND FOR ITNS ACROSS PRICES: INDIVIDUAL-LEVEL DATA

Dependent variable: Bought/received an ITN during prenatal visit
                                                           (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
ITN price in Kenyan shillings (Ksh)                        −0.015      −0.017                              −0.018      −0.012      −0.016
                                                           (0.002)∗∗∗  (0.001)∗∗∗                          (0.001)∗∗∗  (0.002)∗∗∗  (0.002)∗∗∗
ITN price = 10 Ksh ($0.15)                                                         −0.073      −0.058                                          0.046
                                                                                   (0.018)∗∗∗  (0.037)                                         (0.034)
ITN price = 20 Ksh ($0.30)                                                         −0.172      −0.331                                          −0.350
                                                                                   (0.035)∗∗∗  (0.102)∗∗∗                                      (0.142)∗∗
ITN price = 40 Ksh ($0.60)                                                         −0.605      −0.656                                          −0.635
                                                                                   (0.058)∗∗∗  (0.037)∗∗∗                                      (0.061)∗∗∗
Time controls                                                          X                       X           X           X           X           X
Clinic controls                                                        X                       X           X           X           X           X
Restricted sample: first prenatal visit                                                                    X
Restricted sample: first pregnancy                                                                                     X
Restricted sample: did not receive free ITN previous year                                                                          X           X
Observations                                               424         424         424         424         201         134         266         266
R2                                                         .26         .28         .32         .32         .42         .24         .32         .33
Mean of dep. var.                                          0.81        0.81        0.81        0.81        0.77        0.84        0.84        0.84
Intracluster correlation                                   .23

Notes: Data are from clinic-based surveys conducted in April–June 2007, throughout the first six weeks of the program. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for T-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. All specifications are OLS regressions of an indicator variable equal to one if the respondent bought or received an ITN for free on the price of the ITN, except columns (3), (4), and (8), in which regressors are indicator variables for each price (price = 0 is excluded). Time controls include fixed effects for the day of the week the survey was administered and a variable indicating how much time had elapsed between the day the survey was administered and the program introduction. Clinic controls include total monthly first prenatal care visits between April and June of 2006, the fee charged for a prenatal care visit, whether or not the clinic offers voluntary counseling and testing for HIV or prevention-of-mother-to-child-transmission of HIV services, the distance between the clinic and the closest other clinic or hospital and the distance between the clinic and the closest other clinic or hospital in the program. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.


In column (2) of Table IV, we add controls for when the survey was administered, including day-of-the-week fixed effects and the time elapsed since program introduction, as well as controls for the clinic characteristics used in Table III, column (3). The coefficient estimate for price remains very close to that obtained in the basic specification. Columns (3) and (4) present estimates of demand at each price point. In the absence of clinic or time controls, the decrease in demand for an increase in price from 0 to 10 Ksh is estimated at seven percentage points (larger than suggested by the clinic-level ITN sales in Table III). An increase in price from 20 to 40 Ksh leads to a 43–percentage point drop in demand. Column (5) presents demand estimates for the restricted sample of women who are making their first prenatal care visits for their current pregnancies. It is important to separate first visits from revisits because the latter may be returning because they are sick. Alternatively, women who are coming for a second or third visit may be healthier, because they have already received the benefits of the earlier visit(s), some of which can directly affect their immediate need for an ITN (such as malaria prophylaxis and iron supplementation). The coefficient estimate in column (5) is larger than that for the entire sample, implying that women coming for the first time are more sensitive to price than women coming for a revisit. This could be because women learn about the subsidized ITN program at their first visit and bring the cash to purchase the net at their second visit. Access to free ITNs from other sources could have dampened demand for ITNs distributed through the program. This is a real concern, because the Measles Initiative ran a campaign in July 2006 (nine months before the start of our experiment) throughout Kenya to vaccinate children between nine months and five years of age, distributing free ITNs to mothers of these children in western Kenya. 
To examine the demand response among women who are less likely to have had access to free ITNs in the past, column (6) estimates the impact of ITN price on demand for women in their first pregnancies only. When we restrict the sample in this way, the coefficient on ITN price drops to −0.012. This implies that women in their first pregnancies are indeed less sensitive to ITN price differences, but their demand still drops by 60 percentage points (50 × 0.012) when the ITN price is raised from 0 to 50 Ksh. Our baseline survey asked respondents if they had received a free ITN in the previous year, and 37.3% said they did. In columns


(7) and (8), we focus on the 63% who reported not having received a free ITN and estimate how their demand for an ITN in our program was affected by price. We find a coefficient on price very similar to that obtained with the full sample (−0.016), and the specifications with dummies for each price group generate estimates that are also indistinguishable from those obtained with the full sample.

Nearly three-quarters of prenatal clients walked to the clinics for prenatal care. Because clinics included in our sample were at least 13 kilometers from one another, it is unlikely that prenatal clients would switch from one of our program clinics to another. However, it is likely that our program generated some crowd-out of prenatal clients at nonprogram clinics in the vicinity, particularly in the case of free nets. Because these “switchers” are driven by price differences in ITNs that would not exist in a nationwide distribution program, we should look at the demand response of those prenatal clients who, at the time of the interview, were attending the same clinic that they had in the past. In Online Appendix Table A.1, we replicate Table IV for this subsample of prenatal clients who did not switch clinics. The results are nearly unchanged, suggesting that the same degree of price sensitivity would prevail in a program with a uniform price across all clinics.

In sum, our findings suggest that demand for ITNs is not sensitive to small increases in price from zero, but that even a moderate degree of cost-sharing leads to large decreases in demand. At the mean, a 10 Ksh ($0.15) increase in ITN price decreases demand by 20%. These estimates suggest that the majority of pregnant women are either unable or unwilling to pay the prevailing cost-sharing price, which is itself still far below the manufacturing cost of ITNs.

IV.C. Price-Elasticity of the Usage of ITNs

Usage Conditional on Ownership.
Let us start this section with an important caveat: Our sample size to study usage conditional on uptake is considerably hampered by the fact that uptake was low in the higher-priced groups: only a small fraction of the respondents interviewed at baseline in the 40 Ksh group purchased an ITN and could be followed up at home for a usage check. Keeping this caveat in mind, Figure II shows the average usage rate of program-issued ITNs across price groups. The top panel shows self-reported usage rates, and the bottom panel shows the likelihood that the ITN was found hanging, both measured during

[FIGURE II. Program ITN Usage Rates (Conditional on Uptake) by ITN Price. Top panel: “Declare using ITN”; bottom panel: “ITN seen visibly hanging.” Horizontal axis: ITN price (Free, 10 Ksh, 20 Ksh, 40 Ksh); vertical axis: share, from 0 to 1. Each panel plots the average and its 95% CI. Error bars represent ±2.14 standard errors (95% confidence interval with fourteen degrees of freedom). Number of observations: 226.]

an unannounced home visit by an enumerator. On average, 62% of women visited at home claimed to be using the ITN they acquired through the program, a short-term usage rate that is very consistent with previous usage studies (D’Alessandro 1994; Alaii et al. 2003). The observed hanging rate was only slightly lower, at 57%. However, we find little variation in usage across price groups, and no systematic pattern. This is confirmed by the regression estimates of selection effects on usage, presented in Table V. Our coefficient estimate on ITN price in column (1) is positive, but insignificant, suggesting that a price increase of 10 Ksh increases usage by four percentage points, representing an increase of 6% at the mean. The confidence interval is large, however, and the true coefficient could be on either side of zero (the 95% confidence interval is −0.004; 0.012). These estimates correspond to a price elasticity of usage (at the mean price and usage rate) of 0.097. Adding controls in column (2) does not improve precision but reduces the size of the estimated effect. The results also hold when the sample is restricted to the subsample of women coming for their first prenatal visit, women in their first pregnancy, or to those who reported not having received a free ITN the previous year (data not shown). Estimates using indicators for each price in column (3) are also very imprecise, but show no pattern of increasing use with price. Women who pay 10 or 20 Ksh are less likely to be using their ITNs than women receiving them for free, but women who pay 40 Ksh appear close to 10% more likely to be using their ITNs. In none of the cases, however, can we reject the null hypothesis that price has no effect on intensity of usage. 
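The interval and magnitudes quoted for the column (1) estimate follow from the rounded coefficient and standard error. The sketch below assumes the interval is built with the small-sample critical value of 2.14 (fourteen degrees of freedom) described earlier in the paper; the mean price is not reported here, so it is backed out from the stated elasticity and is an inferred quantity.

```python
coef, std_err = 0.004, 0.004  # Table V, column (1): usage per 1 Ksh of price
t_crit = 2.14                 # 5% critical value, t-distribution with 14 d.f.
mean_usage = 0.62             # sample mean of self-reported usage
reported_elasticity = 0.097   # elasticity at the mean, as stated in the text

# 95% confidence interval for the price coefficient.
ci = (coef - t_crit * std_err, coef + t_crit * std_err)

# Effect of a 10 Ksh increase, in probability units and relative to the mean.
effect_10 = 10 * coef
relative_effect_10 = effect_10 / mean_usage

# elasticity = coef * mean_price / mean_usage, so the implied mean price is:
implied_mean_price = reported_elasticity * mean_usage / coef

print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
print(f"+10 Ksh: {effect_10:.2f} ({relative_effect_10:.1%} of the mean)")
print(f"implied mean price among buyers: {implied_mean_price:.1f} Ksh")
```

The computed interval is slightly wider than the quoted one because both inputs are rounded to three decimals; either way it comfortably includes zero.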
We cannot observe whether the net is actually used at night, but it is reasonable to believe that, if the ITN is taken out of its packaging and has been hung on the ceiling, it is being used.13 Of those women who claimed to be using the ITN, 95% had the net hanging. Results for whether or not the net is hanging (columns (5) and (6)) are very similar to those using self-reported usage.

One might be concerned that usage rates among prenatal clients receiving a free net are higher than they would be under a one-price policy, because pregnant women who value an ITN

13. Having the insecticide-treated net hanging from the ceiling creates health benefits even if people do not sleep under the net, because it repels, disables, and/or kills mosquitoes coming into contact with the insecticide on the netting material (WHO 2007).

TABLE V
ITN USAGE RATES ACROSS PRICES, CONDITIONAL ON OWNERSHIP

                            Respondent is currently using the ITN          ITN is visibly
                            acquired through the program                   hanging
                            (1)        (2)        (3)        (4)        (5)        (6)
ITN price                   0.004      0.003                            0.003
                            (0.004)    (0.003)                          (0.003)
ITN price = 10 Ksh                                −0.125     −0.094                −0.154
                                                  (0.120)    (0.103)               (0.129)
ITN price = 20 Ksh                                −0.017     −0.017                −0.088
                                                  (0.107)    (0.119)               (0.124)
ITN price = 40 Ksh                                0.098      0.125                 0.071
                                                  (0.135)    (0.123)               (0.131)
Time controls                          X                     X
Clinic controls                        X                     X
Observations                226        226        226        226        222        222
Sample mean of dep. var.    0.62       0.62       0.62       0.62       0.57       0.57
R2                          .01        .06        .03        .07        .01        .03
Intracluster correlation    .04
Joint F-test                                      1.16       1.14                  1.87
Prob > F                                          .36        .37                   .18

Notes: Data are from home visits to a random sample of patients who bought nets at each price or received a net for free. Home visits were conducted for a subsample of patients roughly three to six weeks after their prenatal visit. Each column is an OLS regression of the dependent variable indicated by column on either the price of the ITN or an indicator variable for each price. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for T-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. The specifications in columns (2) and (4) control for the number of days that have elapsed since the net was purchased, the number of days that have elapsed since the program was introduced at the clinic in which the net was purchased, and whether the woman has given birth already, is still pregnant, or miscarried, as well as the clinic controls in Table III.


highly may have switched clinics in order to get a free net. We show in Online Appendix Table A.2 that, as with our demand estimates, usage rates among the subsample of women who did not switch clinics (i.e., attended the same prenatal clinic after our program was introduced as before it) are not different from the sample as a whole.

Overall, one might be surprised that the level of net usage is not higher than 60%. This result might come from the fact that usage was measured a relatively short time after the net was purchased. In the usage regressions, the coefficients on time controls (not shown) suggest that usage increases as time passes after the ITN purchase. Among women not using the net, the most common reasons given for not using it were waiting for the birth of the child and waiting for another net (typically untreated with insecticide) to wear out. Dupas (2009a) finds that, among the general population, usage among both buyers and recipients of free ITNs is around 90% a year after the ITNs were acquired.

Unconditional Usage: “Effective Coverage.” Although our estimates of usage rates among buyers suffer from small sample size imprecision, effective coverage (i.e., the fraction of the population using a program net) can be precisely estimated. Figure I presents effective coverage with program ITNs across ITN prices. The corresponding regression is presented in Table VI, column (1). The coefficient on price is −0.012, significant at the 1% level. This corresponds to a price elasticity of effective coverage of −0.44. The share of prenatal clients that are protected by an ITN under the free distribution scheme is 65%, versus 15% when ITNs are sold for 40 Ksh; this difference is significant at the 1% level (column (3)). The results are robust to the addition of clinic controls (columns (2) and (4)), and hold for all subgroups (data not shown).
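The effective coverage figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it takes the free-distribution mean (0.65), the linear price coefficient (−0.012 per Ksh), and the 40 Ksh indicator coefficient (−0.504) from Table VI, and it assumes, for simplicity, that the district fixed effects can be ignored so that the free-group mean acts as the intercept. The function names are hypothetical.

```python
# Values reported in Table VI and the surrounding text.
FREE_GROUP_MEAN = 0.65   # effective coverage when the ITN is free
SLOPE_PER_KSH = -0.012   # column (1): change in coverage per 1 Ksh
DUMMY_40_KSH = -0.504    # column (3): coefficient on the 40 Ksh indicator


def linear_prediction(price_ksh: float) -> float:
    """Coverage implied by the linear specification.

    Assumption: district fixed effects are ignored, so the free-group
    mean is used as the intercept. This is an approximation.
    """
    return FREE_GROUP_MEAN + SLOPE_PER_KSH * price_ksh


def dummy_prediction_40() -> float:
    """Coverage at 40 Ksh implied by the indicator specification."""
    return FREE_GROUP_MEAN + DUMMY_40_KSH


if __name__ == "__main__":
    for price in (0, 10, 20, 40):
        print(price, round(linear_prediction(price), 3))
    print("40 Ksh (indicator):", round(dummy_prediction_40(), 3))
```

Under these simplifications both specifications put coverage at 40 Ksh at roughly 15% to 17%, in line with the 15% reported in the text.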
Overall, our results suggest that, at least in the Kenyan context, positive prices do not help generate higher usage intensity than free distribution. The absence of a selection effect on usage could be due to the nature of the good studied, which is probably valued very highly in areas of endemic malaria, particularly among pregnant women who want to protect their babies. The context in which the evaluation took place also probably contributed to the high valuation among those who didn’t have to pay. In particular, women had to travel to the health clinic for the prenatal visit and were told at the check-up about the importance

TABLE VI
EFFECTIVE COVERAGE: ITN USAGE RATES ACROSS PRICES, UNCONDITIONAL ON OWNERSHIP

Dependent variable: Respondent is currently using an ITN acquired through the program

                                     (1)          (2)          (3)          (4)
ITN price                          −0.012       −0.010
                                   (0.003)∗∗∗   (0.002)∗∗∗
ITN price = 10 Ksh                                           −0.188        0.020
                                                             (0.123)      (0.145)
ITN price = 20 Ksh                                           −0.203       −0.143
                                                             (0.097)∗     (0.104)
ITN price = 40 Ksh                                           −0.504       −0.389
                                                             (0.112)∗∗∗   (0.095)∗∗∗
Time controls                                     X                         X
Clinic controls                                   X                         X
Observations                         259          259          259          259
Sample mean of dep. var.             0.42         0.42         0.42         0.42
Mean in (ITN price = 0) group        0.65         0.65         0.65         0.65
Intracluster correlation             .02
Joint F-test                                                   12.71        8.12
Prob > F                                                       .00          .00

Notes: Data are from random sample of patients who visited program clinics. Usage for those who acquired the ITNs was measured through home visits conducted roughly three to six weeks after their prenatal visit. Each column is an OLS regression of the dependent variable indicated by column on either the price of the ITN or an indicator variable for each price. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for t-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.


of protection against malaria. In addition, PSI has been conducting a very intense advertising campaign for ITN use throughout Kenya over the past five years. Last, the evaluation took place in a very poor region of Kenya, in which many households do not have access to credit and have difficulty affording even modest prices for health goods. Thus, a large number of prenatal clients may value ITNs but be unable to pay higher prices for them.

IV.D. Are There Psychological Effects of Prices on Usage of ITNs?

In this section, we test whether the act of paying itself can stimulate higher product use by triggering a sunk cost effect, when willingness to pay is held constant. We use data from the ex post price randomization conducted with a subset of women who had expressed their willingness to pay the posted price (in clinics charging a positive price). For those women, the transaction price ranged from “free” to the posted price they initially agreed to pay. Table VII presents estimates of the effect of price (columns (1) and (2)) and of the act of paying (columns (3)–(6)) on the likelihood of usage and likelihood that the ITN has been hung. These coefficients are from linear probability models with clinic fixed effects, estimated on the sample of women who visited a clinic where ITNs were sold at a positive price, decided to buy an ITN at the posted price, and were sampled to participate in the ex post lottery determining the transaction price they eventually had to pay to take the net home. Because the uptake of ITNs decreased sharply with the price, the sample we have at hand to test for the presence of sunk cost effects is small, and therefore the precision of the estimates we present below is limited. We find no psychological effect of price or the act of paying on usage, as expected from the earlier result that there is no overall effect of prices on usage.
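A linear probability model with clinic fixed effects, as described above, can be illustrated with the standard within transformation: demeaning the outcome and the regressor inside each clinic gives the same slope as including one indicator per clinic. The sketch below uses only the Python standard library and a small made-up dataset; the function name `within_slope` and the data are hypothetical, and it omits the additional controls used in the paper.

```python
from collections import defaultdict


def within_slope(rows):
    """OLS slope of y on x after demeaning both within each clinic.

    rows: iterable of (clinic_id, x, y) tuples, where y is a 0/1 usage
    indicator in a linear probability model. Demeaning by clinic is
    numerically equivalent to adding one dummy variable per clinic.
    """
    groups = defaultdict(list)
    for clinic, x, y in rows:
        groups[clinic].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("no within-clinic variation in x")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical data: (clinic, transaction price in Ksh, uses the net).
    sample = [
        ("A", 0, 1), ("A", 10, 1), ("A", 10, 0),
        ("B", 0, 0), ("B", 20, 1), ("B", 40, 1), ("B", 40, 0),
    ]
    print("within-clinic slope:", within_slope(sample))
```

The ex post lottery generates the within-clinic variation in transaction price that this estimator relies on; without such variation the slope is not identified, which is why the function raises an error in that case.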
In column (1), the coefficient for price is negative, suggesting that higher prices could discourage usage, but the effect is not significant and cannot be distinguished from zero. The 95% confidence interval is (−0.0158; 0.0098), suggesting that a 10 Ksh increase in price could lead to anything from a decrease of sixteen to an increase of ten percentage points in usage. Larger effects on either side can be confidently rejected, however. Adding controls, including a dummy for having received a free ITN from the government in the previous year, does not reduce the standard error but decreases the coefficient of price further, enabling us to rule out sunk cost effects of more than seven percentage points per 10 Ksh increase in price (column (2)).
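The conversion from the reported confidence interval to "percentage points per 10 Ksh" is simple rescaling, sketched below. The interval bounds are the ones quoted in the text; the helper names are hypothetical.

```python
def scale_interval(lower, upper, price_step_ksh=10):
    """Rescale a confidence interval for a per-Ksh effect on a 0-1
    probability into percentage points per `price_step_ksh` of price."""
    factor = price_step_ksh * 100
    return lower * factor, upper * factor


def midpoint(lower, upper):
    """Centre of a symmetric interval, i.e. the implied point estimate."""
    return (lower + upper) / 2


low_pp, high_pp = scale_interval(-0.0158, 0.0098)
print(f"{low_pp:.1f} to {high_pp:.1f} percentage points per 10 Ksh")
print(f"implied point estimate per Ksh: {midpoint(-0.0158, 0.0098):.4f}")
```

The rescaled bounds round to a decrease of sixteen and an increase of ten percentage points, matching the wording of the paragraph above.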

TABLE VII
SUNK COST EFFECTS? ITN USAGE RATES ACROSS PRICES (CONDITIONAL ON OWNERSHIP), HOLDING WILLINGNESS TO PAY CONSTANT

Dependent variable, columns (1)-(5): Respondent is currently using the ITN acquired through the program
Dependent variable, column (6): ITN is visibly hanging

                                        (1)        (2)        (3)        (4)        (5)        (6)
Transaction price                     −0.003     −0.006
                                      (0.006)    (0.006)
Transaction price > 0                                       −0.017     −0.072     −0.065     −0.084
                                                            (0.100)    (0.101)    (0.100)    (0.099)
Got a free ITN the previous year                 −0.192                           −0.191     −0.165
                                                 (0.100)∗                         (0.101)∗   (0.102)
Still pregnant at time of follow-up              −0.234                −0.195     −0.231     −0.213
                                                 (0.121)∗              (0.122)    (0.122)∗   (0.125)∗
First prenatal visit                              0.202                 0.199      0.202      0.121
                                                 (0.102)∗∗             (0.103)∗   (0.104)∗   (0.107)
First pregnancy                                   0.148                 0.184      0.153      0.063
                                                 (0.104)               (0.100)∗   (0.104)    (0.106)
Time to clinic                                    0.000                 0.000      0.000      0.000
                                                 (0.001)               (0.001)    (0.001)    (0.001)
Time elapsed since ITN purchase                   0.015                 0.014      0.015      0.011
                                                 (0.006)∗∗∗            (0.006)∗∗  (0.006)∗∗∗ (0.005)∗∗
Observations                            132        123        132        124        123        121
Sample mean of dep. var.                0.58       0.58       0.58       0.58       0.58       0.53
F stat                                             3.23                  2.99       3.60       1.97
Prob > F                                           .00                   .01        .00        .07

Notes: Standard errors in parentheses. Estimates are from linear probability models with clinic fixed effects, estimated on the sample of women who (1) visited a clinic where ITNs were sold at a positive price; (2) decided to buy an ITN at the posted price; and (3) were sampled to participate in the ex post lottery determining the transaction price they eventually had to pay to take the net home. The transaction prices ranged from 0 (free) to the posted price. Some of the individual control variables are missing for some respondents. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.


In column (3), the coefficient for the act of paying a positive price is also negative, suggesting that if the act of paying had any effect, it would decrease usage rather than increase it, but here again the coefficient cannot be confidently distinguished from zero. The 95% confidence interval for this estimate is quite large and suggests that paying a positive price could lead to anything from a decrease of 22 to an increase of 20 percentage points in usage. Overall, these results suggest that, in the case of ITNs marketed through health clinics, there is no large positive psychological effect of price on usage. We do not have data on baseline time preferences to check whether certain subgroups are more likely to exhibit a “sunk cost” effect. We also do not have data on what women perceived ex post as the price they paid for the ITN; we thus cannot verify that those who received a discount mentally “integrated” the two events (payment and discount) to “cancel” the loss, in the terms of Thaler (1985), or whether they “segregated” the two events and perceived the payment as a cash loss and the discount as a cash gain.

If usage might not increase with price, what about the private benefits to the users? Is it the case that the users reached through the 40 Ksh distribution system are those who really need the ITN, whereas the additional users obtained through the free distribution will not benefit from using the ITN because they don’t need it as much (i.e., they are healthier, or can afford other means to protect themselves against malaria)? From a public health point of view, this issue might be irrelevant in the case of ITNs, given the important community-wide effects of ITN use documented in the medical literature cited earlier. Nevertheless, it is interesting to test the validity of the argument advanced by cost-sharing programs with respect to the private returns of ITN use. This is what we attempt to do in the next section.

IV.E. Selection Effects of ITN Prices

This section presents results on selection effects of positive prices on the health of patients who buy ITNs. The argument that cost-sharing targets those who are more vulnerable by screening out women who appear to need the ITN less assumes that willingness to pay is the main factor in the decision to buy an ITN. In the presence of extreme poverty and weak credit markets, however, it is possible that people are not able (do not have the cash) to pay what they would be willing to pay in the absence of


credit constraints. The optimal subsidy level will have to be low enough to discourage women who do not need the product from buying it, although at the same time high enough to enable credit-constrained women to buy it if they need it. We focus our analysis on an objective measure of health among prenatal clients: their hemoglobin levels. Women who are anemic (i.e., with low hemoglobin levels) are likely the women with the most exposure and least resistance to malaria, and are likely the consumers that a cost-sharing program would want to target.

To judge whether higher prices encourage sicker women to purchase nets, we study the impact of price on the health of “takers” (i.e., buyers and recipients of free nets) relative to the health of the prenatal clients attending control clinics. Figure III plots the cumulative density functions (CDFs) of hemoglobin levels for women buying/receiving a net at each price relative to women in the control group. The surprising result in Figure III is that the CDF for women receiving free nets stochastically dominates the distribution in the control group, implying that women who get free nets are healthier than the average prenatal woman (Panel A). In contrast, the CDFs of hemoglobin levels of women who pay a positive price (whether 10, 20, or 40 Ksh) are indistinguishable from the CDFs of women in the control clinics (Panels B, C, and D). In other words, women who pay a higher price do not appear to be sicker than the average prenatal clients in the area.14

Why would it be that women who receive free nets appear substantially healthier, even though higher prices do not appear to induce selection of women who are sicker than the general prenatal population? Dupas (2005) shows that there is a strong incentive effect of free ITNs on enrollment for prenatal care. To test whether such an effect was at play in our experiment, Table VIII presents the average characteristics of prenatal clients in control clinics (column (1)), and, for each price group, how the average buyer diverges from the average woman in the control group (columns (2)–(5)). The results provide some evidence that the incentive effect of free ITNs was strong: women who came for free nets were 12%

14. For each price level, we test the significance of the differences in CDFs (compared to the control group) with the Kolmogorov–Smirnov equality-of-distributions test. Following Præstgaard (1995), we use the bootstrap method to adjust the p-values for clustering at the clinic level. The results of the tests are presented in the notes of Figure III. We can reject the null hypothesis of equality of distributions between women who receive free nets and those attending control clinics at the 10% significance level. We cannot reject the equality of distributions for women in the control population and those paying 10, 20, or 40 Ksh for an ITN.

FIGURE III
Cumulative Density of Hemoglobin Levels among ITN Recipients/Buyers

Panel A: Clients at control clinics vs. clients receiving free net
Panel B: Clients at control clinics vs. clients buying 10 Ksh net
Panel C: Clients at control clinics vs. clients buying 20 Ksh net
Panel D: Clients at control clinics vs. clients buying 40 Ksh net
Horizontal axis in each panel: hemoglobin level (g/dL). Vertical axis: cumulative density (0 to 1).

The p-values for Kolmogorov–Smirnov tests of equality of distribution (adjusted for clustering at the clinic level by bootstrap) are .091 (Panel A), .385 (Panel B), .793 (Panel C), and .781 (Panel D). Number of observations: 198 (Panel A), 217 (Panel B), 208 (Panel C), and 139 (Panel D).
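The Kolmogorov–Smirnov comparison behind Figure III can be sketched as follows. The statistic is the largest vertical gap between two empirical CDFs. To respect clustering, the sketch reassigns whole clinics between groups at random, which is a clinic-level permutation variant chosen here for brevity; it is an assumption of this example and not necessarily the bootstrap adjustment the authors applied. All function names and the demonstration data are hypothetical.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def cluster_permutation_pvalue(clusters, treated_ids, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic with clinics as the
    unit of reassignment.

    clusters: dict mapping clinic id -> list of hemoglobin values.
    treated_ids: ids of the clinics in the comparison (non-control) group.
    """
    rng = random.Random(seed)
    ids = sorted(clusters)
    k = len(treated_ids)

    def stat(treated):
        a = [v for c in ids if c in treated for v in clusters[c]]
        b = [v for c in ids if c not in treated for v in clusters[c]]
        return ks_statistic(a, b)

    observed = stat(set(treated_ids))
    hits = sum(stat(set(rng.sample(ids, k))) >= observed for _ in range(n_perm))
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up hemoglobin values (g/dL) for four hypothetical clinics.
    demo = {
        "c1": [9.8, 10.4, 11.0, 10.1],
        "c2": [10.2, 9.5, 10.9],
        "f1": [11.2, 11.8, 10.9, 12.0],
        "f2": [11.5, 10.7, 12.3],
    }
    print(cluster_permutation_pvalue(demo, {"f1", "f2"}, n_perm=500))
```

With only a handful of clinics per arm the number of distinct reassignments is small, so the attainable p-values are coarse; this mirrors the limited precision that the paper acknowledges for its clinic-level inference.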


TABLE VIII
CHARACTERISTICS OF PRENATAL CLIENTS BUYING/RECEIVING ITN RELATIVE TO CLIENTS OF CONTROL CLINICS

                                                Mean in          Differences with control clinics
                                                control     0 Ksh      10 Ksh     20 Ksh     40 Ksh
                                                clinics     (free)     ($0.15)    ($0.30)    ($0.60)
                                                  (1)        (2)        (3)        (4)        (5)

Panel A. Characteristics of visit to prenatal clinic
First prenatal visit for current pregnancy       0.48      −0.12      −0.02       0.03       0.02
                                                [0.50]     (0.06)∗∗   (0.04)     (0.06)     (0.04)
Walked to the clinic                             0.73      −0.12       0.04       0.07      −0.16
                                                [0.45]     (0.13)     (0.07)     (0.06)     (0.08)∗
If took transport to clinic: price paid (Ksh)    4.58       3.52       0.79      −1.17       4.27
                                               [10.83]     (3.29)     (1.78)     (1.37)     (1.94)∗∗
Can read Swahili                                 0.81       0.10       0.05       0.00       0.09
                                                [0.40]     (0.03)∗∗∗  (0.05)     (0.04)     (0.02)∗∗∗
Wearing shoes                                    0.61       0.06       0.07      −0.11       0.11
                                                [0.49]     (0.12)     (0.12)     (0.12)     (0.12)
Respondent owns animal assets                    0.19       0.00       0.01       0.12       0.07
                                                [0.39]     (0.06)     (0.05)     (0.05)∗∗   (0.09)

Panel B. Health status
Hemoglobin level (Hb), in g/dL                  10.44       0.94       0.49       0.22       0.48
                                                [1.77]     (0.34)∗∗   (0.49)     (0.47)     (0.78)
Moderate anemia (Hb < 11.5 g/dL)                 0.69      −0.18      −0.09      −0.08      −0.05
                                                [0.46]     (0.07)∗∗   (0.12)     (0.10)     (0.19)
Severe anemia (Hb ≤ 9 g/dL)                      0.16      −0.10      −0.01       0.07      −0.06
                                                [0.37]     (0.06)     (0.07)     (0.09)     (0.11)

Observations                                     110         98        120         99         28

Notes: For each variable, column (1) shows the mean observed among prenatal clients enrolling in control clinics; the standard deviations are presented in brackets. Columns (2), (3), (4), and (5) show the differences between “buyers” in the clinics providing ITNs at 0, 10, 20, and 40 Ksh and prenatal clients enrolling in control clinics. Standard errors in parentheses are clustered at the clinic level; given the small number of clusters (sixteen), the critical values for t-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.

more likely to be coming for a repeat visit and 12% less likely to have come by foot (i.e., more likely to have come by public transportation), and they paid about 3.5 Ksh more to travel to the clinic than women in the control group (Panel A). These results suggest that the free ITN distribution induced women who had come to the clinic before the introduction of the program to come back for a revisit earlier than scheduled, and therefore before the health benefits of their first prenatal visit had worn out.15 As a result,

15. In Kenya, pregnant women are typically given free iron supplements, as well as free presumptive malaria treatment, when they come for prenatal care. Both of these “treatments” have a positive impact on hemoglobin levels.


as seen in Figure III, women receiving free nets are substantially less likely to be anemic (eighteen percentage points off of a base of 69% in Panel B of Table VIII).16

In absolute terms, however, the number of anemic women covered by an ITN is substantially greater under free distribution than under cost-sharing. As shown in Table VIII, the great majority of pregnant women in Kenya are moderately anemic (71%). All of them receive ITNs under free distribution, but only 40% of them invest in ITNs when the price is 40 Ksh (Table IV). Given that usage of the ITN (conditional on ownership) is similar across price groups, effective coverage of the anemic population is thus 60% lower under cost-sharing.17

Finally, it is interesting to note in Table VIII that women who bought nets for 40 Ksh were more likely to pay for transportation and paid more to come to the clinic than the control group. Women who paid 40 Ksh were also more likely to be literate, more likely to be wearing shoes, and more likely to report owning animal assets. Not all of these differences are statistically different from zero, given the small-sample problem, but overall these results are suggestive that selection under cost-sharing happened at least partially along wealth lines.18

V. COST-EFFECTIVENESS ANALYSIS

This section presents estimates of the cost-effectiveness of each pricing strategy in terms of children’s lives saved. There are many benefits to preventing malaria transmission in addition to saving children’s lives, and restricting ourselves to child mortality will lead to conservative estimates of cost-effectiveness.

An important dimension to keep in mind in the cost-effectiveness analysis is the nonlinearity in the health benefits associated with ITN use: high-density ITN coverage reduces overall transmission rates and thus positively affects the health of both

16. Because some of the women who received free nets appear to have traveled farther and spent more money on travel to the clinic, one might expect that this group was composed of many switchers from nonprogram clinics. However, we find that the effects of price on selection in terms of health are unchanged for the subsample of women staying with the same clinic (Online Appendix Table A3).

17. The usage results in Table V hold when the sample is restricted to moderately anemic women (data not shown).

18. This hypothesis is supported by the fact that, when we compare the average client at 40 Ksh clinics (rather than the average buyer at these clinics) to the average control client, they are not more likely to have paid for transportation and paid no more for transportation than the control group (results not shown).


nonusers and users. The results of a 2003 medical trial of ITNs in western Kenya imply that “in areas with intense malaria transmission with high ITN coverage, the primary effect of insecticide-treated nets is via area-wide effects on the mosquito population and not, as commonly supposed, by simple imposition of a physical barrier protecting individuals from biting” (Hawley et al. 2003, p. 121). In this context, we propose the following methodology to measure the health impact of each ITN pricing scheme: we create a “protection index for nonusers” (a logistic function of the share of users in the total population) and a “protection index for users” (a weighted sum of a “physical barrier” effect of the ITN and the externality effect, the weights depending on the share of users). This enables us to compute the health impact of each pricing scheme on both users and nonusers and to (roughly) approximate the total number of child lives saved, as well as the cost per life saved. Because the relative importance of the “physical barrier” effect and of the externality are uncertain, we consider three possible values for the parameter of the logistic function predicting the protection index for nonusers (the “threshold externality parameter”) and three possible values for the effectiveness of ITNs as physical barriers. This gives us a total of 3 × 3 = 9 different scenarios and 9 different cost-per-life-saved estimates for each of the four pricing strategies.

The cost-effectiveness estimates are presented in Table IX. These estimates are provided to enable comparisons across distribution schemes, but their absolute values should be taken with caution, as they rely on a number of coarse assumptions (the details of the calculations are provided in the Online Appendix). In particular, two key assumptions made are the following:

(1) We assume that the only difference in cost per ITN between free distribution and cost-sharing is the difference in the subsidy. That is, we assume that an ITN given for free costs 40 Ksh more to the social planner than an ITN sold for 40 Ksh. We thus ignore money management costs associated with cost-sharing schemes.
(2) We assume that 65% of households will experience a pregnancy within five years and be eligible for the ITN distribution program.19

The estimates in Table IX suggest that, under all nine scenarios we study, child mortality is reduced more under free distribution than any cost-sharing strategy (Panel A). This result is not

19. Making less conservative assumptions would increase the relative cost-effectiveness of free distribution programs.

TABLE IX
COST-EFFECTIVENESS COMPARISONS

Hypothesis on externality threshold:                    Low                    Medium                    High
Hypothesis on physical barrier effectiveness:    High  Medium   Low     High  Medium   Low     High  Medium   Low
ITN price (Ksh)    Subsidy level (%)              (1)    (2)    (3)      (4)    (5)    (6)      (7)    (8)    (9)

Panel A. Child lives saved per 1,000 prenatal clients
 0                 100.0                           38     37     36       30     27     24       22     17     11
10                  97.5                           29     28     26       20     16     13       15     11      7
20                  95.0                           32     30     28       22     19     15       17     12      8
40                  90.0                           16     14     12       11      8      6        9      7      4

Panel B. Cost per child life saved (US$)
 0                 100.0                          200    206    212      255    284    321      352    460    662
10                  97.5                          234    251    270      348    421    531      448    609    949
20                  95.0                          189    200    213      274    325    399      361    487    748
40                  90.0                          175    201    235      261    339    483      302    418    678

Notes: Each cell corresponds to a separate state of the world. To this date, existing medical evidence on the relative importance of the physical barrier provided by an ITN and on the externality threshold is insufficient to know which cells are closest to the actual state of the world. See Online Appendix for details on how these estimates were computed and the hypotheses they rely on.
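The comparisons drawn from Table IX can be reproduced mechanically. The sketch below transcribes the two panels of the table (rows are the four prices, columns are scenarios (1) to (9)) and, for each scenario, reports which price saves the most lives and which has the lowest cost per life saved. The helper name `best_price` is hypothetical, and the ranking ignores the sampling uncertainty that the paper stresses.

```python
PRICES_KSH = (0, 10, 20, 40)

# Rows follow PRICES_KSH; columns are scenarios (1)-(9) of Table IX.
LIVES_SAVED = [
    [38, 37, 36, 30, 27, 24, 22, 17, 11],
    [29, 28, 26, 20, 16, 13, 15, 11, 7],
    [32, 30, 28, 22, 19, 15, 17, 12, 8],
    [16, 14, 12, 11, 8, 6, 9, 7, 4],
]
COST_PER_LIFE_USD = [
    [200, 206, 212, 255, 284, 321, 352, 460, 662],
    [234, 251, 270, 348, 421, 531, 448, 609, 949],
    [189, 200, 213, 274, 325, 399, 361, 487, 748],
    [175, 201, 235, 261, 339, 483, 302, 418, 678],
]


def best_price(table, pick):
    """For each scenario column, return the ITN price (in Ksh) whose
    entry is selected by `pick`, which should be the builtin min or max."""
    result = []
    for col in range(len(table[0])):
        column = [row[col] for row in table]
        result.append(PRICES_KSH[column.index(pick(column))])
    return result


most_lives = best_price(LIVES_SAVED, max)
cheapest = best_price(COST_PER_LIFE_USD, min)

for scenario, (lives, cost) in enumerate(zip(most_lives, cheapest), start=1):
    print(f"scenario ({scenario}): most lives at {lives} Ksh, "
          f"lowest cost per life at {cost} Ksh")
```

Free distribution saves the most lives in every scenario, while the lowest cost per life saved alternates between free distribution and positive prices depending on the scenario.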


surprising considering the large negative effect of cost-sharing on the share of ITN users in the population. Under the low threshold assumption for the externality effect, in terms of cost per life saved, we find that charging 40 Ksh is more cost-effective than free distribution if the physical barrier effect of ITNs is high (Panel B, column (1)). When the assumptions about the effectiveness of ITNs as physical barriers for their users are less optimistic, we find that free distribution becomes at least as cost-effective, if not more, than cost-sharing. Under the assumption of a “medium” externality threshold level, we find that free distribution could dominate cost-sharing in terms of cost-effectiveness (Panel B, columns (4)–(6)). Last, in the scenario where a large share of ITN users is necessary for a substantial externality to take place, we find that cost-sharing is again slightly cheaper than free distribution, unless the physical barrier effectiveness is very low. This is due to the fact that under the high threshold hypothesis, even free distribution to pregnant women is not enough to generate significant community-wide effects, because not all households experience a pregnancy. That said, given the very large standard errors on the usage estimates, the differences observed across schemes in cost per life saved typically cannot be distinguished from zero. The general conclusion of this cost-effectiveness exercise is thus that cost-sharing is at best marginally more cost-effective than free distribution, but free distribution leads to many more lives saved.

VI. DISCUSSION AND CONCLUSION

The argument that charging a positive price for a commodity is necessary to ensure that it is effectively used has recently gained prominence in the debate on the efficiency of foreign aid. The cost-sharing model of selling nets for $0.50 to mothers through prenatal clinics is believed to reduce waste because “it gets the nets to those who both value them and need them” (Easterly 2006, p. 13). Our randomized pricing experiment in western Kenya finds no evidence to support this assumption. We find no evidence that cost-sharing reduces wastage by sifting out those who would not use the net: pregnant women who receive free ITNs are no less likely to put them to intended use than pregnant women who pay for their nets. This suggests that cost-sharing does not increase usage intensity in this context. Although it doesn’t increase usage intensity, cost-sharing does considerably


dampen demand: we find that the cost-sharing scheme ongoing in Kenya at the time of this study results in a coverage rate 75 percentage points lower than with a full subsidy. In terms of getting nets to those who need them, our results on selection based on health imply that women who purchase nets at cost-sharing prices are no more likely to be anemic than the average prenatal woman in the area. We also find that localized, short-lived free distribution programs disproportionately benefit healthier women who can more easily travel to the distribution sites. Although our results speak to the ongoing debate regarding the optimal subsidization level for ITNs—one of the most promising health tools available in public health campaigns in sub-Saharan Africa—they may not be applicable to other public health goods that are important candidates for subsidization. In particular, it is important to keep in mind that this study was conducted when ITNs were already highly valued in Kenya, thanks to years of advertising by both the Ministry of Health and Population Services International. This high ex ante valuation likely diminished the risk that a zero or low price be perceived as a signal of bad quality. Our findings are consistent with previous literature on the value of free products: in a series of lab experiments, both hypothetical and real, Ariely and Shampan’er (2007) found that when people have to choose between two products, one of which is free, charging zero price increases consumers’ valuation of the product itself, in addition to reducing its cost. In a recent study in Uganda, Hoffmann (2007) found that households that are told about the vulnerability of children to malaria on the day they acquire an ITN are more likely to use the ITN to protect their children when they receive it for free than when they have to pay for it. 
In a study conducted with the general Kenyan population, Dupas (2009b) randomly varied ITN prices over a much larger range (between $0 and $4), and also found no evidence that charging higher prices leads to higher usage intensity. Dupas (2009b) also found that the demand curve for ITNs remains unaffected by common marketing techniques derived from psychology (such as the framing of marketing messages, the gender of the person targeted by the marketing, or verbal commitment elicitation), further suggesting that the high price-elasticity of the demand for ITNs is driven mostly by budget constraints. Our finding that usage of ITNs is insensitive to the price paid to acquire them contrasts with the finding of Ashraf, Berry, and Shapiro (forthcoming), in which Zambian households that paid a


higher price for a water-treatment product were more likely to report treating their drinking water two weeks later. Their experimental design departs from ours in multiple ways that could explain the difference in findings. First, because the range of prices at which the product was offered in their experiment did not include zero, Ashraf, Berry, and Shapiro do not measure usage under a free distribution scheme. Second, in contrast to a bed net that can be used for three years before it wears out, the bottle of water disinfectant used in Ashraf, Berry, and Shapiro lasts for only about one month if used consistently to treat the drinking water of an average family; in this context, it is possible that households that purchased the water disinfectant but were not using it two weeks later had stored the bottle for later use (e.g., for the next sickness episode in their household or the next cholera outbreak), and therefore the evidence on usage in Ashraf, Berry, and Shapiro has a different interpretation from ours. In addition, the baseline level of information about the product (its effectiveness, how to use it) might have differed across experiments. Although ITN distribution programs that use cost-sharing are less effective and not more cost-effective than free distribution in terms of health impact, they might have other benefits. Indeed, they often have the explicit aim of promoting sustainability. The aim is to encourage a sustainable retail sector for ITNs by combining public and private sector distribution channels (Mushi et al. 2003; Webster, Lines, and Smith 2007). Our experiment does not enable us to quantify the potentially negative impact of free distribution on the viability of the retail sector and therefore our analysis does not consider this externality. Another important dimension of the debate on free distribution versus cost-sharing is the effect of full subsidies on the distribution system. 
In particular, the behavior of agents on the distribution side, notably health workers in our context, could depend on the level of subsidy. Although user fees can be used to incentivize providers (World Bank 2004), free distribution schemes have been shown to be plagued by corruption (in the form of diversion) among providers (Olken 2006). Our experiment focused on the demand side and was not powered to address this distribution question. As with most randomized experiments, we are unable to characterize or quantify the impact of the various possible distribution schemes when they have been scaled up and general equilibrium effects have set in. Our experimental results should thus be seen as one piece in the puzzle of how to increase uptake of effective, externality-generating health products in resource-poor settings.

HARVARD SCHOOL OF PUBLIC HEALTH
UNIVERSITY OF CALIFORNIA, LOS ANGELES

REFERENCES

Alaii, Jane A., William A. Hawley, Margarette S. Kolczak, Feiko O. Ter Kuile, John E. Gimnig, John M. Vulule, Amos Odhacha, Aggrey J. Oloo, Bernard L. Nahlen, and Penelope A. Phillips-Howard, “Factors Affecting Use of Permethrin-Treated Bed Nets during a Randomized Controlled Trial in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 137–141.
Ariely, Dan, and Kristina Shampan’er, “How Small Is Zero Price? The True Value of Free Products,” Marketing Science, 26 (2007), 742–757.
Arkes, Hal R., and Catherine Blumer, “The Psychology of Sunk Cost,” Organizational Behavior and Human Decision Processes, 35 (1985), 124–140.
Ashraf, Nava, James Berry, and Jesse Shapiro, “Can Higher Prices Stimulate Product Use? Evidence from a Field Experiment in Zambia,” American Economic Review, forthcoming.
Bagwell, Kyle, and Michael H. Riordan, “High and Declining Prices Signal Product Quality,” American Economic Review, 81 (1991), 224–239.
Binka, F. N., F. Indome, and T. Smith, “Impact of Spatial Distribution of Permethrin-Impregnated Bed Nets on Child Mortality in Rural Northern Ghana,” American Journal of Tropical Medicine and Hygiene, 59 (1998), 80–85.
Bloom, Erik, Indu Bhushan, David Clingingsmith, Elizabeth King, Michael Kremer, Benjamin Loevinsohn, Rathavuth Hong, and J. Brad Schwartz, “Contracting for Health: Evidence from Cambodia,” Brookings Institution Report, 2006.
Cameron, A. Colin, Douglas Miller, and Jonah B. Gelbach, “Bootstrap-Based Improvements for Inference with Clustered Errors,” Review of Economics and Statistics, 90 (2007), 414–427.
D’Alessandro, Umberto, “Nationwide Survey of Bednet Use in Rural Gambia,” Bulletin of the World Health Organization, 72 (1994), 391–394.
Donald, Stephen, and Kevin Lang, “Inference with Differences-in-Differences and Other Panel Data,” Review of Economics and Statistics, 89 (2007), 221–233.
Dupas, Pascaline, “Short-Run Subsidies and Long-Term Adoption of New Health Products: Evidence from a Field Experiment,” Mimeo, UCLA, 2009a.
Dupas, Pascaline, “What Matters (and What Does Not) in Households’ Decision to Invest in Malaria Prevention?” American Economic Review: Papers and Proceedings, 99 (2009b), 224–230.
Dupas, Pascaline, The Impact of Conditional In-Kind Subsidies on Preventive Health Behaviors: Evidence from Western Kenya, unpublished manuscript, 2005.
Easterly, William, The White Man’s Burden: Why the West’s Efforts to Aid the Rest Have Done So Much Ill and So Little Good (New York: Penguin Press, 2006).
Evans, David B., Girma Azene, and Joses Kirigia, “Should Governments Subsidize the Use of Insecticide-Impregnated Mosquito Nets in Africa? Implications of a Cost-Effectiveness Analysis,” Health Policy and Planning, 12 (1997), 107–114.
Fisher, Ronald A., The Design of Experiments (London: Oliver and Boyd, 1935).
Gimnig, John E., Margarette S. Kolczak, Allen W. Hightower, John M. Vulule, Erik Schoute, Luna Kamau, Penelope A. Phillips-Howard, Feiko O. Ter Kuile, Bernard L. Nahlen, and William A. Hawley, “Effect of Permethrin-Treated Bed Nets on the Spatial Distribution of Malaria Vectors in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 115–120.
Harvey, Philip D., “The Impact of Condom Prices on Sales in Social Marketing Programs,” Studies in Family Planning, 25 (1994), 52–58.
Hawley, William A., Penelope A. Phillips-Howard, Feiko O. Ter Kuile, Dianne J. Terlouw, John M. Vulule, Maurice Ombok, Bernard L. Nahlen, John E. Gimnig, Simon K. Kariuki, Margarette S. Kolczak, and Allen W. Hightower, “Community-Wide Effects of Permethrin-Treated Bed Nets on Child Mortality and Malaria Morbidity in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 121–127.

Hoffmann, Vivian, “Psychology, Gender, and the Intrahousehold Allocation of Free and Purchased Mosquito Nets,” Mimeo, Cornell University, 2007.
Karlan, Dean, and Jonathan Zinman, “Observing Unobservables: Identifying Information Asymmetries with a Consumer Credit Field Experiment,” Econometrica, forthcoming.
Kremer, Michael, and Edward Miguel, “The Illusion of Sustainability,” Quarterly Journal of Economics, 122 (2007), 1007–1065.
Lengeler, Christian, “Insecticide-Treated Bed Nets and Curtains for Preventing Malaria,” Cochrane Database Syst Rev 2:CD000363, 2004.
Lengeler, Christian, Mark Grabowsky, David McGuire, and Don deSavigny, “Quick Wins versus Sustainability: Options for the Upscaling of Insecticide-Treated Nets,” American Journal of Tropical Medicine and Hygiene, 77 (2007), 222–226.
Lucas, Adrienne, “Economic Effects of Malaria Eradication: Evidence from the Malarial Periphery,” American Economic Journal: Applied Economics, forthcoming.
Moulton, Brent R., “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units,” Review of Economics and Statistics, 72 (1990), 334–338.
Mushi, Adiel K., Jonna R. Schellenberg, Haji Mponda, and Christian Lengeler, “Targeted Subsidy for Malaria Control with Treated Nets Using a Discount Voucher System in Southern Tanzania,” Health Policy and Planning, 18 (2003), 163–171.
Olken, Benjamin, “Corruption and the Costs of Redistribution: Micro Evidence from Indonesia,” Journal of Public Economics, 90 (2006), 853–870.
Oster, Sharon, Strategic Management for Nonprofit Organizations: Theory and Cases (Oxford, UK: Oxford University Press, 1995).
Population Services International [PSI], “What Is Social Marketing?” available online at http://www.psi.org/resources/pubs/what is smEN.pdf, 2003.
Præstgaard, Jens P., “Permutation and Bootstrap Kolmogorov-Smirnov Test for the Equality of Two Distributions,” Scandinavian Journal of Statistics, 22 (1995), 305–322.
Riley, John G., “Silver Signals: Twenty-Five Years of Screening and Signaling,” Journal of Economic Literature, 39 (2001), 432–478.
Rosenbaum, Paul R., Observational Studies (New York: Springer-Verlag, 2002).
Sachs, Jeffrey, The End of Poverty: Economic Possibilities for Our Time (New York: Penguin, 2005).
Schellenberg, Joanna A., Salim Abdulla, Rose Nathan, Oscar Mukasa, Tanya Marchant, Nassor Kikumbih, Adiel Mushi, Haji Mponda, Happiness Minja, and Hassan Mshinda, “Effect of Large-Scale Social Marketing of Insecticide-Treated Nets on Child Survival in Rural Tanzania,” Lancet, 357 (2001), 1241–1247.
Ter Kuile, Feiko O., Dianne J. Terlouw, Penelope A. Phillips-Howard, William A. Hawley, Jennifer F. Friedman, Simon K. Kariuki, Ya Ping Shi, Margarette S. Kolczak, Altaf A. Lal, John M. Vulule, and Bernard L. Nahlen, “Reduction of Malaria during Pregnancy by Permethrin-Treated Bed Nets in an Area of Intense Perennial Malaria Transmission in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 50–60.
Thaler, Richard, “Toward a Positive Theory of Consumer Choice,” Journal of Economic Behavior and Organization, 1 (1980), 39–60.
Thaler, Richard, “Mental Accounting and Consumer Choice,” Marketing Science, 4 (1985), 199–214.
Webster, Jayne, Jo Lines, and Lucy Smith, “Protecting All Pregnant Women and Children under Five Years Living in Malaria Endemic Areas in Africa with Insecticide Treated Mosquito Nets,” World Health Organization Working Paper, available at http://www.who.int/malaria/docs/VulnerableGroupsWP.pdf, 2007.
World Bank, World Development Report 2004: Making Services Work for Poor People (Washington, DC: World Bank and Oxford University Press, 2004).
World Health Organization [WHO], “WHO Global Malaria Programme: Position Statement on ITNs,” available at http://www.who.int/malaria/docs/itn/ITNspospaperfinal.pdf, 2007.
World Malaria Report, available at http://www.who.int/malaria/wmr2008/malaria2008.pdf, 2008.

SOPHISTICATED MONETARY POLICIES∗

ANDREW ATKESON
VARADARAJAN V. CHARI
PATRICK J. KEHOE

In standard monetary policy approaches, interest-rate rules often produce indeterminacy. A sophisticated policy approach does not. Sophisticated policies depend on the history of private actions, government policies, and exogenous events and can differ on and off the equilibrium path. They can uniquely implement any desired competitive equilibrium. When interest rates are used along the equilibrium path, implementation requires regime-switching. These results are robust to imperfect information. Our results imply that the Taylor principle is neither necessary nor sufficient for unique implementation. They also provide a direction for empirical work on monetary policy rules and determinacy.

∗ The authors thank the National Science Foundation for financial support and Kathleen Rolfe and Joan Gieseke for excellent editorial assistance. The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.
© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010.

I. INTRODUCTION

The now-classic Ramsey (1927) approach to policy analysis under commitment specifies the set of instruments available to policy makers and finds the best competitive equilibrium outcomes given those instruments. This approach has been adapted to situations with uncertainty, by Barro (1979) and Lucas and Stokey (1983), among others, by specifying the policy instruments as functions of exogenous events.1

1. The Ramsey approach has been used extensively to discuss optimal monetary policy. See, among others, the work of Chari, Christiano, and Kehoe (1996); Schmitt-Grohé and Uribe (2004); Siu (2004); and Correia, Nicolini, and Teles (2008).

Although the Ramsey approach has been useful in identifying the best outcomes, it needs to be extended before it can be used to guide policy. Such an extension must describe what would happen for every history of private agent actions, government policies, and exogenous events. It should also structure policy in such a way that policy makers can ensure that their desired outcomes occur. Here, we provide such an extended approach. To construct it, we extend the language of Chari and Kehoe (1990) in a natural fashion by describing private agent actions and government policies as functions of the histories of those actions and policies as well as of exogenous events. The key to our approach is our
requirement that for all histories, including those in which private agents deviate from the equilibrium path, the continuation outcomes constitute a continuation competitive equilibrium.2 We label such policy functions sophisticated policies and the resulting equilibrium a sophisticated equilibrium. If policies can be structured to ensure that the desired outcomes occur, then we say that the policies uniquely implement the desired outcome.

2. This requirement is the natural analog of subgame perfection in an environment in which private agents are competitive. In this sense, our equilibrium concept is the obvious one for our macroeconomic environment.

Here we describe this approach and use it to analyze an important outstanding question in monetary economics: How should policy be designed in order to avoid indeterminacy and achieve unique implementation? It has been known, at least since the work of Sargent and Wallace (1975), that when interest rates are the policy instrument, many ways of specifying policy lead to indeterminate outcomes including multiple equilibria. Indeterminacy is risky because some of those outcomes can be bad, including hyperinflation. Researchers thus agree that designing policies that achieve unique implementation is desirable. Here we demonstrate that our sophisticated policy approach does that for monetary policy.

We illustrate our approach in two standard monetary economies: a simple sticky-price model with one-period price-setting and a sticky-price model with staggered price-setting (often referred to as the New Keynesian model). For both, we show that, under sufficient conditions, any outcome of a competitive equilibrium can be uniquely implemented by appropriately constructed sophisticated policies. In particular, the Ramsey equilibrium can be uniquely implemented.

In the two model economies, we construct central bank policies that uniquely implement a desired competitive equilibrium in the same basic way. Along the equilibrium path, we choose the policies to be those given by the desired competitive equilibrium. We structure the policies off the equilibrium path, the reversion policies, to discourage deviations. Specifically, if the average choice of private agents deviates from that in the desired equilibrium, then we choose the reversion policies so that the optimal choice, or best response, of each individual agent is different from the average choice.

One way to see why such reversion policies can eliminate multiplicity is to recall how multiple equilibria arise in the first
place. At an intuitive level, they arise if, when each agent believes that all other agents will choose some particular action other than the desired one, each agent finds it optimal to go along with the deviation by also picking that particular action. Our construction of reversion policies breaks the self-fulfilling nature of such deviations. It does so by ensuring that even if an agent believes that all other agents are choosing a particular action that differs from the desired action, the central bank policy makes it optimal for that agent not to go along with that deviation. When such reversion policies can be found, we say that the best responses are controllable. A sufficient condition for controllability is that policies can be found such that after a deviation the continuation equilibrium is unique and varies with policy. Variation with policy typically holds, so if policies can be found under which the continuation equilibrium is unique (somewhere), then we have unique implementation (everywhere). This sufficient condition suggests a simple way to state our message in a general way: uniqueness somewhere generates uniqueness everywhere. One concern with our construction of sophisticated policies is that it apparently relies on the idea that the central bank perfectly observes private agents’ actions and thus can detect any deviation. We show that this concern is unwarranted: our results are robust to imperfect information about private agents’ actions. Specifically, with imperfect detection of deviations, sophisticated policies can be designed that have unique equilibria that are close to the desired outcomes when the detection error is small and that converge to the desired equilibria as the detection error goes to zero. The approach proposed here suggests an operational guide to policy making: First use the Ramsey approach to determine the best competitive equilibrium, and then check whether in that situation, best responses are controllable. 
If they are, then sophisticated policies of the kind we have constructed can uniquely implement the Ramsey outcome. If best responses are not controllable, then the only option is to accept indeterminacy.

Our work here is related to previous work on the problem of indeterminacy in monetary economies (Wallace 1981; Obstfeld and Rogoff 1983; King 2000; Benhabib, Schmitt-Grohé, and Uribe 2001; Christiano and Rostagno 2001; Svensson and Woodford 2005). The previous work pursues an approach different from ours (and from that in the microeconomic literature on implementation); we call it unsophisticated implementation. The basic idea of that approach is to specify policies as functions of the history
and check only to see whether the period-zero competitive equilibrium is unique. Unsophisticated implementation has been criticized in the macroeconomic and the microeconomic literature. For example, in the macroeconomic literature, Kocherlakota and Phelan (1999), Bassetto (2002), Buiter (2002), and Ljungqvist and Sargent (2004) criticize this general idea in the context of the fiscal theory of the price level; Bassetto (2005) criticizes it in the context of a simple tax example; and Cochrane (2007) criticizes it in the context of the literature on monetary policy rules. In the microeconomic literature, Jackson (2001) criticizes a related approach to implementation. In our view, unsophisticated implementation is deficient because it does not describe how the economy will behave after a deviation by private agents from the desired outcome. This deficiency leaves open the possibility that the approach achieves implementation via nonexistence. By this phrase, we mean an approach that specifies policy actions under which no continuation equilibrium exists after private agent deviations. We agree with those who argue that implementation via nonexistence trivializes the implementation problem. To see why it does, consider the following policy rule: If private agents choose the desired outcome, then continue with the desired policy; if private agents deviate from the desired outcome, then forever after set government spending at a high level and taxes at zero. Clearly, under this policy rule, any deviation from the desired outcome leads to nonexistence of equilibrium, and hence, we trivially have implementation via nonexistence. We find this way of achieving implementation unpalatable. Our approach, in contrast, insists that policies be specified such that a competitive equilibrium exists after any deviation. We achieve implementation in the traditional microeconomic sense— by discouraging deviations, not by nonexistence. 
In our approach, policies are specified so that even if an individual agent believes that all other agents will deviate to some specific action, that individual agent finds it optimal to choose a different action. Our approach not only ensures that the continuation equilibria always exist, but also has the desirable property that the reversion policies are not extreme in any sense. That is, after deviations, our reversion policies do not threaten the private economy with dire outcomes such as hyperinflation; they simply bring inflation back to the desired path.

Despite the shortcomings of the unsophisticated implementation approach, this literature has made two contributions that we find useful. One is the idea of regime-switching. This idea dates back at least to Wallace (1981) and has been used by Obstfeld and Rogoff (1983), Benhabib, Schmitt-Grohé, and Uribe (2001), and Christiano and Rostagno (2001). The basic idea in, say, Benhabib, Schmitt-Grohé, and Uribe (2001) is that if the economy embarks on an undesirable path, then the monetary and fiscal policy regime switches in such a way that the government’s budget constraint is violated, and the undesirable path is not an equilibrium.

The other useful contribution of the literature on unsophisticated implementation is what Cochrane (2007) calls the King rule. This rule seeks to implement a desired equilibrium through an interest-rate policy that makes the difference between the interest rate and its desired equilibrium level a linear function of the difference between inflation and its desired equilibrium level, with a coefficient greater than 1. This idea dates back to at least King (2000) and has been used by Svensson and Woodford (2005). As we show here, the King rule, like other rules that use interest rates for all histories, namely, pure interest-rate rules, always leads to indeterminacy in our simple model and does so for a large class of parameters in our staggered price-setting model as well.

We build on these two contributions by considering a King–money hybrid rule: When private agents deviate from the equilibrium path, the central bank uses the King rule for small deviations and switches regimes (from interest rates to money) for large deviations. Notice that with this rule, under our definition of equilibrium, outcomes return to the desired outcome path in the period after the deviation. In this sense, our hybrid rule achieves unique implementation without threatening agents with dire outcomes.
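As a rough illustration of the switching logic in the King–money hybrid rule described above, the following Python sketch maps an observed deviation of inflation from its desired level into a regime and an instrument setting. It is a minimal sketch under stated assumptions, not the construction used in the paper: the names `phi`, `threshold`, `i_star`, and `mu_reversion`, the default values, and the use of a fixed cutoff on the absolute deviation are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PolicyChoice:
    regime: str   # "I" = interest-rate regime, "M" = money regime
    value: float  # interest rate if regime == "I", money growth if regime == "M"


def king_money_hybrid(pi_dev, i_star, mu_reversion, phi=1.5, threshold=0.02):
    """Illustrative hybrid rule (all parameter names are assumptions).

    pi_dev       : deviation of inflation from its desired level
    i_star       : desired equilibrium interest rate
    mu_reversion : money growth used after a large deviation (hypothetical)
    phi          : King-rule coefficient, required to exceed 1
    threshold    : cutoff separating "small" from "large" deviations (hypothetical)
    """
    if phi <= 1.0:
        raise ValueError("the King rule uses a coefficient greater than 1")
    if abs(pi_dev) <= threshold:
        # Small deviation: King rule, i - i* = phi * (pi - pi*).
        return PolicyChoice("I", i_star + phi * pi_dev)
    # Large deviation: switch from interest rates to money.
    return PolicyChoice("M", mu_reversion)
```

With no deviation (`pi_dev = 0`), the sketch returns the desired interest rate itself, which matches the idea that policy along the equilibrium path is the one given by the desired equilibrium.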
Our work here is also related to another substantial literature that aims to find monetary policy rules which eliminate indeterminacy. (See, for example, McCallum [1981] and, more recently, Woodford [2003].) The recent literature argues that to achieve a unique outcome, interest-rate rules should follow the Taylor principle: interest rates relative to exogenously specified levels should rise more than one for one when inflation rates rise relative to their exogenously specified levels. We show here that adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. It is not necessary because the sophisticated policy approach can uniquely
implement any desired competitive equilibrium outcome, including outcomes in which, along the equilibrium path, the central bank follows an interest-rate rule that violates the Taylor principle. It is not sufficient because pure interest-rate rules may lead to indeterminacy even if they satisfy the Taylor principle. Notwithstanding these considerations, our analysis of the King–money hybrid rule does lend support to the idea that adherence to the Taylor principle can sometimes help achieve unique implementation. Specifically, this is true within the class of King–money hybrid rules when the Taylor principle is used in the region where the King part of the rules applies.

Our findings also cast light on empirical investigations of determinacy based on the Taylor principle. We argue that, under the set of assumptions made explicit in the literature, inferences about determinacy based on existing estimation procedures should be treated skeptically. For our simple model economies, we provide assumptions under which such inferences can be confidently made. Although there is some hope that such inference may be possible in more interesting applied examples using variants of our assumptions, difficult challenges remain.

Using sophisticated policies is our proposed way to eliminate indeterminacy when setting monetary policy. For some other recent proposals, see the work of Bassetto (2002) and Adão, Correia, and Teles (2007).

II. A SIMPLE MODEL WITH ONE-PERIOD PRICE-SETTING

We begin by illustrating the basic idea of our construction of sophisticated policies using a simple model with one-period price-setting. The dynamical system associated with the competitive equilibrium of this model is straightforward, which lets us focus on the strategic aspects of sophisticated policies. With this model, we demonstrate that any desired outcome of a competitive equilibrium can be uniquely implemented by sophisticated policies with reversion to a money regime.
We show that pure interest-rate rules, which exclusively use interest rates as the policy instrument, cannot achieve unique implementation. Finally, we show that reversion to a particular hybrid rule, which uses interest rates as the policy instrument for small deviations and money for large deviations, can achieve unique implementation. The model we analyze here is a modified version of the basic sticky-price model with a New Classical Phillips curve (as in
Woodford [2003, Chap. 3, Sect. 1.3]). In order to make our results comparable to those in the literature, we here describe a simple, linearized version of the model. In Atkeson, Chari, and Kehoe (2009), we describe the general equilibrium version that, when linearized, produces the equilibrium conditions studied here.

II.A. The Determinants of Output and Inflation

Consider a monetary economy populated by a large number of identical, infinitely lived consumers, a continuum of producers, and a central bank. Each producer uses labor to produce a differentiated good on the unit interval. A fraction of producers $j \in [0, \alpha)$ are flexible-price producers, and a fraction $j \in [\alpha, 1]$ are sticky-price producers.

In this economy, the timing within a period $t$ is as follows. At the beginning of the period, sticky-price producers set their prices, after which the central bank chooses its monetary policy by setting one of its instruments, either interest rates or the quantity of money. Two shocks, $\eta_t$ and $\nu_t$, are then realized. We interpret the shock $\eta_t$ as a flight to quality shock that affects the attractiveness of government debt relative to private claims and the shock $\nu_t$ as a velocity shock. At the end of the period, flexible-price producers set their prices, and consumers make their decisions.

Now we develop necessary conditions for a competitive equilibrium in this economy and then, in the next section, formally define a competitive equilibrium. Here and throughout, we express all variables in log-deviation form. This way of expressing variables implies that none of our equations will have constant terms.

Consumer behavior in this model is summarized by an intertemporal Euler equation and a cash-in-advance constraint. We can write the linearized Euler equation as

\[
y_t = E_t[y_{t+1}] - \psi \left( i_t - E_t[\pi_{t+1}] \right) + \eta_t , \tag{1}
\]

where $y_t$ is aggregate output, $i_t$ is the nominal interest rate, $\eta_t$ (the flight to quality shock) is an i.i.d. mean-zero shock with variance $\mathrm{var}(\eta)$, and $\pi_{t+1} = p_{t+1} - p_t$ is the inflation rate from time period $t$ to $t+1$, where $p_t$ is the aggregate price level. The parameter $\psi$ determines the intertemporal elasticity, and $E_t$ denotes the expectations of a representative consumer given that consumer’s information in period $t$, which includes the shock $\eta_t$.

The cash-in-advance constraint, when first-differenced, implies that the relationships among inflation $\pi_t$, money growth $\mu_t$, and output growth $y_t - y_{t-1}$ are given by a quantity equation of the form

\[
\pi_t = \mu_t - (y_t - y_{t-1}) + \nu_t , \tag{2}
\]

where $\nu_t$ (the velocity shock) is an i.i.d. mean-zero shock with variance $\mathrm{var}(\nu)$.

We turn now to producer behavior. The optimal price set by an individual flexible-price producer $j$ satisfies

\[
p_{ft}(j) = p_t + \gamma y_t , \tag{3}
\]

where the parameter $\gamma$ is the elasticity of the equilibrium real wage with respect to output (often referred to in the literature as Taylor’s $\gamma$). The optimal price set by a sticky-price producer $j$ satisfies

\[
p_{st}(j) = E_{t-1}\left[ p_t + \gamma y_t \right] , \tag{4}
\]

where $E_{t-1}$ denotes expectations at the beginning of period $t$ before the shocks $\eta_t$ and $\nu_t$ are realized. The aggregate price level $p_t$ is a linear combination of the prices $p_{ft}$ set by the flexible-price producers and the prices $p_{st}$ set by the sticky-price producers and is given by

\[
p_t = \int_0^{\alpha} p_{ft}(j)\,dj + \int_{\alpha}^{1} p_{st}(j)\,dj . \tag{5}
\]

Using language from game theory, we can think of equations (3) and (4) as akin to the best responses of the flexible- and sticky-price producers given their beliefs about the aggregate price level and aggregate output. In this model, the flexible-price producers are strategically uninteresting. Their expectations about the future have no influence on their decisions; their prices are set mechanically according to the static considerations reflected in (3). Thus, in all that follows, equation (3) will hold on and off the equilibrium path, and we can think of $p_{ft}(j)$ as being residually determined by (3) and substitute out for $p_{ft}(j)$. To do so, substitute (3) into (5) and solve for $p_t$ to get

\[
p_t = \kappa y_t + \frac{1}{1-\alpha} \int_{\alpha}^{1} p_{st}(j)\,dj , \tag{6}
\]

where $\kappa = \alpha\gamma/(1-\alpha)$.
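The substitution that leads to (6) can be spelled out step by step. The following is a worked version of that step, using only (3), (5), and the fact that the flexible-price producers occupy the interval $[0, \alpha)$ of mass $\alpha$; it adds no assumptions beyond those in the text.

```latex
\begin{align*}
p_t &= \int_0^{\alpha} \left( p_t + \gamma y_t \right) dj + \int_{\alpha}^{1} p_{st}(j)\,dj
     = \alpha p_t + \alpha\gamma y_t + \int_{\alpha}^{1} p_{st}(j)\,dj , \\
(1-\alpha)\,p_t &= \alpha\gamma y_t + \int_{\alpha}^{1} p_{st}(j)\,dj , \\
p_t &= \frac{\alpha\gamma}{1-\alpha}\,y_t + \frac{1}{1-\alpha}\int_{\alpha}^{1} p_{st}(j)\,dj
     = \kappa y_t + \frac{1}{1-\alpha}\int_{\alpha}^{1} p_{st}(j)\,dj .
\end{align*}
```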

We follow the literature and express the sticky-price producers’ decisions in terms of inflation rates rather than price levels. To do so, let $x_t(j) = p_{st}(j) - p_{t-1}$, and rewrite (4) as

\[
x_t(j) = E_{t-1}\left[ \pi_t + \gamma y_t \right] . \tag{7}
\]

For convenience, we define

\[
x_t = \frac{1}{1-\alpha} \int_{\alpha}^{1} x_t(j)\,dj \tag{8}
\]

to be the average price set by the sticky-price producers relative to the aggregate price level in period $t-1$, so that we can rewrite (7) as

\[
x_t = E_{t-1}\left[ \pi_t + \gamma y_t \right] . \tag{9}
\]

We can also rewrite (6) as

\[
\pi_t = \kappa y_t + x_t . \tag{10}
\]
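Equations (1), (2), and (10) can be read as a small within-period system: once the sticky-price choice $x_t$ and one policy instrument are fixed, output and inflation follow, and the other instrument is determined residually. The Python sketch below illustrates that arithmetic. It is only a sketch under stated assumptions: the function names are hypothetical, and the expectation terms `exp_y_next` and `exp_pi_next` (standing for $E_t[y_{t+1}]$ and $E_t[\pi_{t+1}]$) are taken as given inputs rather than solved for.

```python
def solve_money_regime(x, mu, y_prev, nu, kappa):
    """Output and inflation when money growth mu is the instrument.

    Combines (2), pi = mu - (y - y_prev) + nu, with (10), pi = kappa*y + x.
    """
    y = (mu + y_prev + nu - x) / (1.0 + kappa)
    pi = kappa * y + x
    return y, pi


def solve_interest_regime(x, i, exp_y_next, exp_pi_next, eta, psi, kappa):
    """Output and inflation when the interest rate i is the instrument.

    Output comes from the Euler equation (1); inflation from (10).
    """
    y = exp_y_next - psi * (i - exp_pi_next) + eta
    pi = kappa * y + x
    return y, pi


def residual_interest_rate(y, exp_y_next, exp_pi_next, eta, psi):
    """Interest rate implied by (1) once output is known (money regime)."""
    return exp_pi_next + (exp_y_next + eta - y) / psi


def residual_money_growth(pi, y, y_prev, nu):
    """Money growth implied by (2) once output and inflation are known
    (interest-rate regime)."""
    return pi + (y - y_prev) - nu
```

For example, with `x=0.01`, `mu=0.03`, `y_prev=0.0`, `nu=0.0`, and `kappa=1.0`, the money-regime solver gives output of 0.01 and inflation of 0.02, and feeding those values back into `residual_money_growth` recovers money growth of 0.03.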

Consider now the setting of monetary policy in this model. When the central bank sets its policy, it has to choose to operate under either a money regime or an interest-rate regime. In the money regime, the central bank’s policy instrument is money growth $\mu_t$; it sets $\mu_t$, and the nominal interest rate $i_t$ is residually determined from the Euler equation (1) after the realization of the shock $\eta_t$. In the interest-rate regime, the central bank’s instrument is the interest rate; it sets $i_t$, and money growth $\mu_t$ is residually determined from the cash-in-advance constraint (2) after the realization of the shock $\nu_t$. Of course, in both regimes, the Euler equation and the cash-in-advance constraint both hold.

II.B. Competitive Equilibrium

Now we define a notion of competitive equilibrium for the simple model in the spirit of the work of Barro (1979) and Lucas and Stokey (1983). In this equilibrium, allocations, prices, and policies are all defined as functions of the history of exogenous events, or shocks, $s^t = (s_0, \ldots, s_t)$, where $s_t = (\eta_t, \nu_t)$. Sticky-price producer decisions and aggregate inflation and output levels can be summarized by $\{x_t(s^{t-1}), \pi_t(s^t), y_t(s^t)\}$. In terms of the policies, we let the regime choice and the policy choice within the regime be $\delta_t(s^{t-1}) = (\delta_{1t}(s^{t-1}), \delta_{2t}(s^{t-1}))$, where the first argument $\delta_{1t}(s^{t-1}) \in \{M, I\}$ denotes the regime choice, either money ($M$) or the interest rate ($I$), and the second argument
denotes the policy choice within the regime, either money growth $\mu_t(s^{t-1})$ or the interest rate $i_t(s^{t-1})$. If the money regime is chosen in $t$, then the interest rate is determined residually at the end of that period, whereas if the interest-rate regime is chosen in $t$, then the money growth rate is determined residually at the end of the period.

Let $\{a_t(s^t)\} = \{x_t(s^{t-1}), \delta_t(s^{t-1}), \pi_t(s^t), y_t(s^t)\}$ denote a collection of allocations, prices, and policies in this competitive equilibrium. Such a collection is a competitive equilibrium given $y_{-1}$ if it satisfies (i) consumer optimality, namely, (1) and (2) for all $s^t$; (ii) optimality by sticky-price producers, namely, (9) for all $s^{t-1}$; and (iii) optimality by flexible-price producers, namely, (10) for all $s^t$.

We also define a continuation competitive equilibrium starting from any point in time. For example, consider the beginning of period $t$ with state variables $s^{t-1}$ and $y_{t-1}$. A collection of allocations, prices, and policies

\[
\{a(s^{t-1}, y_{t-1})\}_{r \ge t} = \{x_r(s^{r-1} \mid s^{t-1}, y_{t-1}),\ \delta_r(s^{r-1} \mid s^{t-1}, y_{t-1}),\ \pi_r(s^{r} \mid s^{t-1}, y_{t-1}),\ y_r(s^{r} \mid s^{t-1}, y_{t-1})\}_{r \ge t}
\]

is a continuation competitive equilibrium from $(s^{t-1}, y_{t-1})$ if it satisfies the three conditions of a competitive equilibrium above for all periods starting from $(s^{t-1}, y_{t-1})$. In this definition, we effectively drop the equilibrium conditions from period 0 through period $t-1$. This notion of a continuation competitive equilibrium from the beginning of period $t$ onward is very similar to that of a competitive equilibrium from the beginning of period 0 onward, except that the initial conditions are now given by $(s^{t-1}, y_{t-1})$.

We define a continuation competitive equilibrium that starts at the end of period $t$ from $(s^{t-1}, y_{t-1}, x_t, \delta_t, s_t)$ in a similar way. This latter definition requires optimality by consumers and flexible-price producers from $s^t$ onward and optimality by sticky-price producers from $s^{t+1}$ onward. Note that this equilibrium must satisfy all the conditions of a continuation competitive equilibrium that starts at the beginning of period $t$, except for the sticky-price optimality condition in period $t$, namely, (9) in period $t$. Finally, a continuation competitive equilibrium starting at the beginning of period 0 is simply a competitive equilibrium.

The following lemma proves that any competitive equilibrium gives rise to a New Classical Phillips curve along with some other useful properties of such an equilibrium.

LEMMA 1 (New Classical Phillips Curve and Other Useful Properties). Any competitive equilibrium must satisfy

(11)  $\pi_t(s^t) = \kappa y_t(s^t) + E[\pi_t(s^t) \mid s^{t-1}],$

which is often referred to as the New Classical Phillips curve;

(12)  $E[y_t(s^t) \mid s^{t-1}] = 0$ and $x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}];$

and

(13)  $E[x_{t+1}(s^t) \mid s^{t-1}] = E[\pi_{t+1}(s^{t+1}) \mid s^{t-1}] = i_t,$

where $i_t = i_t(s^{t-1})$ if the central bank uses an interest-rate regime in period t and $i_t = i_t(s^t)$ if the central bank uses a money regime in period t.

Proof. To see that $E[y_t(s^t) \mid s^{t-1}] = 0$, take expectations of (10) as of $s^{t-1}$ and substitute into (9). Using this result in (10), we obtain $x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}]$. Substituting this result into (10) yields (11). To show (13), take expectations of the Euler equation (1) with respect to $s^{t-1}$ and use $E[y_t(s^t) \mid s^{t-1}] = 0$ along with the law of iterated expectations to get (13). QED

A similar argument establishes that (11)–(13) hold for any continuation competitive equilibrium.

II.C. Sophisticated Equilibrium

We now turn to what we call sophisticated equilibrium. The definition of this concept is very similar to that for competitive equilibrium, except that here we allow allocations, prices, and policies to be functions of more than just the history of exogenous events; they are also functions of the history of both aggregate private actions and central bank policies. For sophisticated equilibrium, we require as well that for every history, the continuation of allocations, prices, and policies from that history onward constitutes a continuation competitive equilibrium.

Setup and Definition. Before turning to our formal definition, we note that our definition of sophisticated equilibrium simply specifies policy rules that the central bank must follow; it does not require that the policy rules be optimal. We specify sophisticated policies in this way in order to show that our unique implementation result does not depend on the objectives of the central bank. We think of sophisticated policies as being specified at the beginning of period 0 and of the central bank as being committed to following them.

We turn now to defining the histories that private agents and the central bank confront when they make their decisions. The public events that occur in a period are, in chronological order, qt = (xt ; δt ; st ; yt , πt ). Letting ht denote the history of these events from period −1 up to and including period t, we have that ht = (ht−1 , qt ) for t ≥ 0. The history h−1 = y−1 is given. For notational convenience, we focus on perfect public equilibria in which the central bank’s strategy (choice of regime and policy) is a function only of the public history. The public history faced by the sticky-price producers at the beginning of period t when they set their prices is ht−1 . A strategy for the sticky-price producers is a sequence of rules σx = {xt (ht−1 )} for choosing prices for every possible public history. The public history faced by the central bank when it chooses its regime and sets either its money-growth or interest-rate policy is hgt = (ht−1 , xt ). A strategy for the central bank {δt (hgt )} is a sequence of rules for choosing the regime as well as the policy within the regime, either μt (hgt ) or it (hgt ). Let σg denote that strategy. At the end of period t, then, output and inflation are determined as functions of the relevant history hyt according to the rules yt (hyt ) and πt (hyt ). We let σ y = {yt (hyt )} and σπ = {πt (hyt )} denote the sequence of output and inflation rules. Notice that for any history, the strategies σ induce continuation outcomes in the natural way. For example, starting at some history ht−1 , these strategies recursively induce outcomes {ar (sr | ht−1 ; σ )}. We illustrate this recursion for period t. The sticky-price producer’s decision in t is given by xt ( j, st−1 | ht−1 ; σ ) = xt (ht−1 ), where xt (ht−1 ) is obtained from σx . The central bank’s decision in t is given by δt (st−1 | ht−1 ; σ ) = δt (hgt ), where hgt = (ht−1 , xt (ht−1 )) and δt (hgt ) is obtained from σg . 
The consumer and flexible-price producer decisions in t are given by yt (st | ht−1 ; σ ) = yt (hyt ) and πt (st | ht−1 ; σ ) = πt (hyt ), where hyt = (ht−1 , xt (ht−1 ), δt (ht−1 , xt (ht−1 ))) and yt (hyt ) and πt (hyt ) are obtained from σ y and σπ . Continuing in a similar way, we can recursively define continuation outcomes for subsequent periods. We can likewise define continuation outcomes {ar (sr | hgt ; σ )} and {ar (sr | hyt ; σ )} following histories hgt and hyt , respectively. We now use these strategies and continuation outcomes to formally define our notion of equilibrium. A sophisticated equilibrium given the policies here is a collection of strategies (σx , σg ) and allocation rules (σ y , σπ ) such that (i) given any history ht−1 , the continuation outcomes {ar (sr | ht−1 ; σ )} induced by σ constitute

a continuation competitive equilibrium and (ii) given any history hyt , so do the continuation outcomes {ar (sr | hyt ; σ )}.3 Associated with each sophisticated equilibrium σ = (σg , σx , σ y , σπ ) are the particular stochastic processes for outcomes that occur along the equilibrium path, which we call sophisticated outcomes. These outcomes are competitive equilibrium outcomes. We will say a policy σg∗ uniquely implements a desired competitive equilibrium {at∗ (st )} if the sophisticated outcome associated with any sophisticated equilibrium of the form (σg∗ , σx , σ y , σπ ) coincides with the desired competitive equilibrium. A central feature of our definition of sophisticated equilibrium is our requirement that for all histories, including deviation histories, the continuation outcomes constitute a continuation competitive equilibrium. We think of this requirement as analogous to the requirement that in a subgame perfect equilibrium, the continuation strategies constitute a Nash equilibrium. This requirement constitutes the most important difference between our approach to determinacy and that in the macroeconomic literature. Technically, one way of casting that literature’s approach into our language of strategies and allocation rules is to consider the following notion of equilibrium. An unsophisticated equilibrium is a strategy for the central bank σg and allocations, policies, and prices {at (st )} = {xt (st−1 ), δt (st−1 ), πt (st ), yt (st )} such that {at (st )} is a period-zero competitive equilibrium and the policies induced by σg from {at (st )} coincide with {δt (st−1 )}. In our view, unsophisticated equilibrium is a deficient guide to policy. Although an unsophisticated equilibrium does tell policy makers what to do for every history, it does not specify what will happen under their policies for every history, in particular for deviation histories. Achieving implementation using the notion of unsophisticated equilibrium is, in general, trivial. 
As we explained earlier, one way of achieving implementation is via nonexistence: simply specify policies so that no competitive equilibrium exists after deviation histories. We find this way of achieving implementation uninteresting.

3. In general, a sophisticated equilibrium would require that for every history (including histories in which the government acts, hgt ), the continuation outcomes from that history onward constitute a competitive equilibrium. Here, that requirement would be redundant because the conditions for a competitive equilibrium for hgt are the same as those for hyt .

Finally, to help avoid a common confusion, we stress that our definition does not require that, when there is a deviation in period t, the entire sequence starting from period 0, including the deviation in period t, constitute a period-zero competitive equilibrium. Indeed, if we achieve unique implementation, then such a sequence will not constitute a period-zero equilibrium. Implementation with Sophisticated Policies. We focus on implementing competitive equilibria with sophisticated policies in which the central bank uses interest rates along the equilibrium path. This focus is motivated in part by the observation that most central banks seem to use interest rates as their policy instruments. Another motivation is that if the variance of the velocity shock νt is large, then all of the outcomes under the money regime are undesirable. To set up our construction of sophisticated policies, recall that in our economy the only strategically interesting agents are the sticky-price producers. Their choices must satisfy a key property, that (14)

xt (ht−1 ) = E[πt (hyt ) + γ yt (hyt ) | ht−1 ],

where $h_{yt} = (h_{t-1}, x_t(h_{t-1}), \delta_t(h_{t-1}, x_t(h_{t-1})), s_t)$. Notice that $x_t(h_{t-1})$ shows up on both sides of equation (14), so we require that the optimal choice $x_t(h_{t-1})$ satisfy a fixed point property. To get some intuition for this property, suppose that each sticky-price producer believes that all other sticky-price producers will choose some value, say, $\hat{x}_t$. This choice, together with the central bank's strategy and the inflation and output rules, induces the outcomes $\pi_t(\hat{h}_{yt})$ and $y_t(\hat{h}_{yt})$, where $\hat{h}_{yt} = (h_{t-1}, \hat{x}_t, \delta_t(h_{t-1}, \hat{x}_t), s_t)$. The fixed point property requires that for $\hat{x}_t$ to be part of an equilibrium, each sticky-price producer's best response must coincide with $\hat{x}_t$.

The basic idea behind our sophisticated policy construction is that the central bank starts by picking any desired competitive equilibrium allocations and sets its policy on the equilibrium path consistent with them. The central bank then constructs its policy off the equilibrium path so that even if an individual agent believes that all other agents will deviate to some specific action, that individual agent finds it optimal to choose a different action. In this sense, the policies are specified so that the fixed point property is satisfied at only the desired allocations.

We now analyze several possible ways for a central bank to attempt the implementation of competitive equilibria in which it uses interest rates as its monetary policy instrument.

With reversion to a money regime. We show first that in the simple sticky-price model, any competitive equilibrium in which the central bank uses the interest rate as its instrument in all periods can be uniquely implemented with sophisticated policies that involve a one-period reversion to money. Under these policies, after a deviation, the central bank switches to a money regime for one period. More precisely, fix a desired competitive equilibrium outcome path $(x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t))$ together with central bank policies $i_t^*(s^{t-1})$. Consider the following trigger-type policy: If sticky-price producers choose $x_t$ in period t to coincide with the desired outcomes $x_t^*(s^{t-1})$, then let central bank policy in t be $i_t^*(s^{t-1})$. If not, and these producers deviate to some $\hat{x}_t \neq x_t^*(s^{t-1})$, then for that period t, let the central bank switch to a money regime with a suitably chosen level of money growth. This level of money growth makes it not optimal for any individual sticky-price setter to cooperate with the deviation. If such a level of money growth exists, we say that the best responses of the sticky-price setters are controllable. The following lemma shows that this property holds for our model.

LEMMA 2 (Controllability of Best Responses with One-Period Price-Setting). For any history $(h_{t-1}, \hat{x}_t)$, if the central bank chooses the money regime, then there exists a choice for money growth $\mu_t$ such that

(15)  $\hat{x}_t \neq E[\pi_t(\hat{h}_{yt}) + \gamma y_t(\hat{h}_{yt})],$

where $\hat{h}_{yt} = (h_{t-1}, \hat{x}_t, M, \mu_t)$.

Proof. Substituting (2) into (10), we have a result showing that if the central bank chooses the money regime with money growth $\mu_t$, then output $y_t$ and inflation $\pi_t$ are uniquely determined and given by

(16)  $y_t = \dfrac{\mu_t + \nu_t + y_{t-1} - \hat{x}_t}{1+\kappa},$

(17)  $\pi_t = \kappa y_t + \hat{x}_t.$
Hence,

$E[\pi_t(\hat{h}_{yt}) + \gamma y_t(\hat{h}_{yt})] = \dfrac{\kappa+\gamma}{1+\kappa}\,(\mu_t + y_{t-1} - \hat{x}_t) + \hat{x}_t.$

Clearly, then, any choice of $\mu_t \neq \hat{x}_t - y_{t-1}$ will ensure that (15) holds. QED

We use this lemma to guide our choice of the suitable money growth rate after deviations. We choose this growth rate to generate the same expected inflation as in the original equilibrium. (Of course, we could have chosen many other values that also would discourage deviations, but we found this value to be the most intuitive.4) In particular, if the producers deviate to some $\hat{x}_t \neq x_t^*(s^{t-1})$, then for that period t, let the central bank switch to a money regime with money growth set so that

(18)  $\mu_t = \hat{x}_t - y_{t-1} + \dfrac{1+\kappa}{\kappa}\left(x_t^*(s^{t-1}) - \hat{x}_t\right).$
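The reversion policy in (16)–(18) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and all parameter values are hypothetical, and expectations are computed by setting the velocity shock to an assumed mean of zero.

```python
def money_regime_outcomes(mu, nu, y_prev, x_hat, kappa):
    """Output and inflation in a one-period money regime, following (16)-(17)."""
    y = (mu + nu + y_prev - x_hat) / (1.0 + kappa)
    pi = kappa * y + x_hat
    return y, pi


def reversion_money_growth(x_hat, x_star, y_prev, kappa):
    """Money growth chosen after a deviation to x_hat, following (18)."""
    return x_hat - y_prev + (1.0 + kappa) / kappa * (x_star - x_hat)


# Hypothetical parameter values, chosen only for illustration.
kappa, gamma = 0.5, 1.0
x_star, x_hat, y_prev = 0.02, 0.05, 0.01

mu = reversion_money_growth(x_hat, x_star, y_prev, kappa)

# Outcomes are linear in nu, so evaluating at the assumed mean E[nu_t] = 0
# gives expected output and expected inflation.
expected_y, expected_pi = money_regime_outcomes(mu, 0.0, y_prev, x_hat, kappa)

# Right-hand side of (15): an individual price setter's best response.
best_response = expected_pi + gamma * expected_y

print("expected inflation:", expected_pi)
print("best response:", best_response)
```

Under these assumed numbers, expected inflation in the reversion period equals the desired value x_star, while the individual best response differs from the conjectured deviation x_hat, which is the controllability property of Lemma 2.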

Note that $\mu_t \neq \hat{x}_t - y_{t-1}$. With such a money growth rate, expected inflation is the same in the reversion period as it would have been in the desired outcome. From Lemma 1, such a choice of $\hat{x}_t$ cannot be part of an equilibrium. It is also easy to see that if a deviation occurs in period t, the economy returns to the desired outcomes in period t + 1. We have established the following proposition.

PROPOSITION 1 (Unique Implementation with Money Reversion). Any competitive equilibrium outcome in which the central bank uses interest rates as its instrument can be implemented as a unique equilibrium with sophisticated policies with one-period reversion to a money regime. Moreover, under this rule, after any deviation in period t, the equilibrium outcomes from period t + 1 are the desired outcomes.

4. We choose this part of the policy as a clear demonstration that after a deviation, the central bank is not doing anything exotic, such as producing a hyperinflation. Rather, in an intuitive sense, the central bank is simply getting the economy back on the track it had been on before the deviation threatened to shift it in another direction.

A simple way to describe our unique implementation result is that controllability of best responses under some regime guarantees unique implementation of any desired outcome. We obtain controllability by reversion to a money regime. Note that even though the money regime is not used on the equilibrium path, it is useful as an off-equilibrium commitment that helps support
desired outcomes in which the central bank uses interest rates on the equilibrium path. Notice also that the proposition implies that deviations lead to only very transitory departures from desired outcomes. In particular, we do not achieve implementation by threatening the economy with dire outcomes after deviations. (Note that the particular result, that the economy returns exactly to the desired outcomes in the period after the deviation, would not hold in a version of this model with state variables, such as capital.) So far we have focused on uniquely implementing competitive outcomes when the central bank uses interest rates as its instrument. Equations (16) and (17) imply that the equilibrium outcome under a money regime is unique, so that implementing desired outcomes is trivial when the central bank uses money as its instrument. Clearly, we can use a simple generalization of Proposition 1 to uniquely implement a competitive equilibrium in which the central bank uses interest rates in some periods and money in others. With pure interest-rate rules. Now, as a second possible way for a central bank to implement competitive equilibria, we analyze pure interest-rate rules. We find that this way cannot achieve unique implementation. We begin with a pure interest-rate rule of the form (19)

it (st−1 ) = it∗ (st−1 ) + φ(xt (st−1 ) − xt∗ (st−1 )),

where $i_t^*(s^{t-1})$ and $x_t^*(s^{t-1})$ are the interest rates and the sticky-price producer choices associated with a competitive equilibrium that the central bank wants to implement uniquely, and the parameter $\phi$ represents how aggressively the central bank changes interest rates when private agents deviate from the desired equilibrium. Notice that this rule (19) specifies policy both on and off the equilibrium path. On the equilibrium path, $x_t(s^{t-1}) = x_t^*(s^{t-1})$, and the rule yields $i_t(s^{t-1}) = i_t^*(s^{t-1})$. Off the equilibrium path, the rule specifies how $i_t(s^{t-1})$ should differ from $i_t^*(s^{t-1})$ when $x_t(s^{t-1})$ differs from $x_t^*(s^{t-1})$. Pure interest-rate rules of the form (19) have been discussed by King (2000) and Svensson and Woodford (2005). We follow Cochrane (2007) and call (19) the King rule. Note from Lemma 1 that $x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}]$, so that the King rule can be thought of as targeting expected inflation, in the
sense that (19) is equivalent to (20)

it (st−1 ) = it∗ (st−1 ) + φ(E[πt (st ) | st−1 ] − E[πt∗ (st ) | st−1 ]).

We now show that if the central bank follows the King rule (19), it cannot ensure unique implementation of the desired outcome. Indeed, under this rule, the economy has a continuum of equilibria. More formally: PROPOSITION 2 (Indeterminacy of Equilibrium under the King Rule). Suppose the central bank sets interest rates it according to the simple economy’s King rule (19). Then any of the continuum of sequences indexed by the initial condition x0 and the parameter c that satisfies (21)

xt+1 = it + cηt , πt = xt + κ(1 + ψc)ηt , and yt = (1 + ψc)ηt

is a sophisticated outcome.

Proof. In order to verify that the multiple outcomes that satisfy (21) are part of a period-zero competitive equilibrium, we need to check that they satisfy (1), (9), and (10). That they satisfy (9) follows by taking expectations of the second and third equations in (21). Substituting for $i_t$ from (19) and for $x_{t+1}$ from (21) into (1), we obtain that $y_t = (1 + \psi c)\eta_t$, as required by (21). Inspecting the expressions for $\pi_t$ and $y_t$ in (21) shows that they satisfy (10). Clearly, any such period-zero competitive equilibrium can be supported by a government strategy, $\sigma_g$, of the King rule form and appropriately chosen $\sigma_x$, $\sigma_y$, and $\sigma_\pi$. QED

The intuitive idea behind the multiplicity of equilibria associated with the initial condition $x_0$ is that interest-rate rules, including the King rule, induce nominal indeterminacy and do not pin down the initial price level. The intuitive idea behind the multiplicity of stochastic equilibria associated with $c \neq 0$ is that interest rates pin down only expected inflation and not the state-by-state realizations indexed by the parameter c. Note that Proposition 2 implies that even if the King rule parameter $\phi > 1$, the economy has a continuum of equilibria. In that case, all but one of the equilibria have exploding inflation, in the sense that inflation eventually becomes unbounded. In the literature, researchers often restrict attention to bounded equilibria. We argue that, in this model, equilibria with exploding inflation
cannot be dismissed on logical grounds. Indeed, these equilibria are perfectly reasonable because the inflation explosion is associated with a money supply explosion. To see this association, suppose that the economy has no stochastic shocks and the desired outcomes are $\pi_t = 0$ and $y_t = 0$ in all periods. Then, from the cash-in-advance constraint (2), we know that the growth of the money supply is given by

(22)  $\mu_t = x_t = \phi^t x_0.$
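The path in (22) can be illustrated with a short simulation. This is only a sketch under the deterministic, zero-target case described above: the function name, the value of φ, and the initial conditions are hypothetical choices, not taken from the paper.

```python
def king_rule_path(x0, phi, periods):
    """Iterate x_{t+1} = phi * x_t, the deterministic dynamics behind (22)
    when the desired outcomes are zero in every period."""
    path = [x0]
    for _ in range(periods):
        path.append(phi * path[-1])
    return path


# Hypothetical values: phi > 1 and three candidate initial conditions.
phi = 1.5
for x0 in (0.0, 0.001, 0.01):
    path = king_rule_path(x0, phi, 20)
    print(f"x0 = {x0}: x after 20 periods = {path[-1]:.4f}")
```

Each initial value x0 indexes a different path. Only x0 = 0 stays at the desired outcome; any other starting value grows geometrically when φ > 1, and money growth grows along with it.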

Thus, in these equilibria, inflation explodes because money growth explodes. Each equilibrium is indexed by a different initial value of the endogenous variable x0 . This endogenous variable depends solely on expectations of future policy and is not pinned down by any initial condition or transversality condition. Such equilibria are reasonable because at the core of most monetary models is the idea that the central bank’s printing of money at an ever-increasing rate leads to a hyperinflation. In these equilibria, inflation does not arise from the speculative reasons analyzed by Obstfeld and Rogoff (1983), but from the conventional money-printing reasons analyzed by Cagan (1956). In this sense, our model predicts, for perfectly standard and sensible reasons, that the economy can suffer from any one of a continuum of very undesirable paths for inflation. (Cochrane [2007] makes a similar point for a flexible-price model.) The same proposition obviously applies to more general interest-rate rules that are restricted to be the same on and off the equilibrium path. For example, Proposition 2 applies to linear feedback rules of the form (23)

$i_t = \bar{\imath}_t + \sum_{s=0}^{\infty} \phi_{xs}\, x_{t-s} + \sum_{s=1}^{\infty} \phi_{ys}\, y_{t-s} + \sum_{s=1}^{\infty} \phi_{\pi s}\, \pi_{t-s},$

where the intercept term ¯ıt can depend on the history of stochastic events. With reversion to a hybrid rule. Analysis of a third possible way to implement competitive equilibria is a bit more complicated. In Proposition 1, we have shown how reversion to a money regime can achieve unique implementation. In Proposition 2 and the subsequent discussion, we have shown that pure interest-rate rules, such as the King rule, cannot. Notice that in our money reversion policies, even tiny deviations trigger a reversion to a money

regime. A natural question arises: Can unique implementation be achieved using a combination of these two strategies, or a hybrid rule, specifying, for example, that the central bank continue to use interest rates unless the deviations are very large and then revert to a money regime? The answer is yes. To see this, consider a particular hybrid rule that is intended to implement a bounded competitive equilibrium $\{x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t)\}$ with an associated interest rate $i_t^*(s^{t-1})$. Fix some $\bar{x}$ and $\underline{x}$ that satisfy $\bar{x} > \max_t x_t^*(s^{t-1})$ and $\underline{x} < \min_t x_t^*(s^{t-1})$. What we will call the King–money hybrid rule specifies that if $x_t(s^{t-1})$ is within the interest-rate interval $[\underline{x}, \bar{x}]$, then the central bank follows a King rule of the form (19); and if $x_t(s^{t-1})$ falls outside this interval, then the central bank reverts to a money regime and chooses the money growth rate that produces an expected inflation rate $\bar{\pi} \in [\underline{x}, \bar{x}]$. That the money growth rate can be so chosen follows from (16) and (17). We show that an attractive feature of outcomes under this hybrid rule is that deviations from the desired path lead only to very transitory movements away from the desired path. More precisely, after any deviation in period t, even though inflation and output in period t may differ from the desired outcomes, those in subsequent periods coincide with the desired outcomes. More formally:

PROPOSITION 3 (Unique Implementation with a Hybrid Rule). In the simple economy, the King–money hybrid rule with $\phi > 1$ uniquely implements any bounded competitive equilibrium. Moreover, under this rule, after any deviation in period t, the equilibrium outcomes from period t + 1 are the desired outcomes.

We prove this proposition in the Appendix. Here we simply sketch the argument for a deterministic version of the model. The key to the proof is a preliminary result that shows that no equilibrium outcome $x_t$ can be outside the interval $[\underline{x}, \bar{x}]$. To see that this is true, suppose that in some period t, $x_t$ is outside that interval. But when this is true, the hybrid rule specifies a money growth rate in that period that yields expected inflation inside the interval. Because $x_t$ equals expected inflation, this gives a contradiction and proves the preliminary result.

To establish uniqueness, suppose that there is some sophisticated equilibrium with $\hat{x}_r \neq x_r^*$ for some r. From the preliminary result, $\hat{x}_r$ must be in the interval $[\underline{x}, \bar{x}]$, where the King rule
is operative. From Lemma 1, we know that in any equilibrium, $i_t = x_{t+1}$, so that the King rule implies that $\hat{x}_{t+1} - x_{t+1}^* = \phi(\hat{x}_t - x_t^*) = \phi^{t+1-r}(\hat{x}_r - x_r^*)$. Because $\phi > 1$ and $x_t^*$ is bounded, eventually $\hat{x}_{t+1}$ must leave the interval $[\underline{x}, \bar{x}]$, which is a contradiction.

Extension to Interest-Elastic Money Demand. So far, to keep the exposition simple, we have assumed a cash-in-advance setup in which money demand is interest-inelastic. This feature of the model implies that if a money regime is adopted in some period t, then the equilibrium outcomes in that period are uniquely determined by the money growth rate in that period. This uniqueness under a money regime is what allows the central bank to switch to a one-period money regime in order to support any desired competitive equilibrium. Now we consider economies with interest-elastic money demand. We argue that under appropriate conditions, our unique implementation result extends to such economies. When economies have interest-elastic money demand, sophisticated policies that specify reversion to money or to a hybrid rule can uniquely implement any desired outcome if best responses are controllable. A sufficient condition for such controllability is that competitive equilibria are unique under a suitably chosen money regime. Here, as with inelastic money demand, the uniqueness under a money regime is what enables unique implementation. A sizable literature has analyzed the uniqueness of competitive equilibria under money growth policies with interest-elastic money demand. Obstfeld and Rogoff (1983) and Woodford (1994) provide sufficient conditions for this uniqueness. For example, Obstfeld and Rogoff (1983) consider a money-in-the-utility-function model with preferences of the form $u(c) + v(m)$, where c is consumption and m is real money balances, and show that a sufficient condition for uniqueness under a money regime is

$\lim_{m \to 0} m\, v'(m) > 0.$
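To make this condition concrete, it can be evaluated for two common functional forms. Both forms are illustrative assumptions introduced here, not specifications used in the paper.

```latex
% Illustrative functional forms for v(m); neither is taken from the paper.
\begin{aligned}
v(m) = \log m
  &\;\Longrightarrow\; m\,v'(m) = 1,
  &\lim_{m \to 0} m\,v'(m) &= 1 > 0, \\
v(m) = \frac{m^{1-\sigma}}{1-\sigma},\quad 0 < \sigma < 1
  &\;\Longrightarrow\; m\,v'(m) = m^{1-\sigma},
  &\lim_{m \to 0} m\,v'(m) &= 0 .
\end{aligned}
```

Under these assumed forms, logarithmic utility from money meets the sufficient condition, while the second form does not. Because the condition is only sufficient, the second case is left undetermined by it.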

Obstfeld and Rogoff (1983) focus attention on flexible-price models, but their results can be readily extended to our simple sticky-price model. Indeed, their sufficient conditions apply unchanged to a deterministic version of that model because our model without shocks is effectively identical to a flexible-price model. Hence, under appropriate sufficient conditions, our unique

implementation result extends to environments with interest-elastic money demand. More generally, for our hybrid rule to uniquely implement desired outcomes, we need a reversion policy that has a unique equilibrium. An alternative to a money regime is a commodity standard such as those in the work of Wallace (1981) and Obstfeld and Rogoff (1983). With this type of standard, the government promises to redeem money for goods for some arbitrarily low price and finances the supply of goods with taxation. An alternative to our hybrid rule with money reversion is, therefore, a hybrid rule with reversion to a commodity standard.
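Returning to the deterministic sketch of Proposition 3, the role of φ > 1 in forcing any deviating path out of the interest-rate interval can be illustrated numerically. The sketch below assumes a constant target and uses hypothetical names and values; it covers only the King-rule part of the hybrid rule and is not the Appendix proof.

```python
def periods_until_exit(x_dev, x_star, phi, x_low, x_high, max_periods=10_000):
    """Count periods until a deviating path leaves [x_low, x_high].

    Deterministic sketch with a constant target x_star: while the path is
    inside the interval, the King rule implies
        x_{t+1} - x_star = phi * (x_t - x_star).
    Returns None if the path is still inside after max_periods.
    """
    x = x_dev
    for t in range(max_periods):
        if not (x_low <= x <= x_high):
            return t
        x = x_star + phi * (x - x_star)
    return None


# Hypothetical values: target 0.02 inside the interval [0.0, 0.05].
print(periods_until_exit(0.021, 0.02, 1.5, 0.0, 0.05))  # phi > 1, small deviation
print(periods_until_exit(0.021, 0.02, 0.5, 0.0, 0.05))  # phi < 1, same deviation
print(periods_until_exit(0.020, 0.02, 1.5, 0.0, 0.05))  # no deviation
```

With φ > 1 even a small deviation exits the interval in finitely many periods, at which point the hybrid rule would switch to the money regime. With φ < 1 the same deviation never leaves the interval, so the contradiction used in the uniqueness argument is not available.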

III. A MODEL WITH STAGGERED PRICE-SETTING

We turn now to a version of our simple model with staggered price-setting, often referred to as the New Keynesian model. We show that, along the lines of the argument developed above, policies with infinite reversion to either a money regime or a hybrid rule can uniquely implement any desired outcome under an interest-rate regime. We also show that for a large class of economies, pure interest-rate rules of the King form still lead to indeterminacy. To make our points in the simplest way, we abstract from aggregate uncertainty.

III.A. Setup and Competitive Equilibrium

We begin by setting up the model with staggered price-setting. In the model, prices are set in a staggered fashion as in the work of Calvo (1983). At the beginning of each period, a fraction $1 - \alpha$ of producers are randomly chosen and allowed to reset their prices. After that, the central bank makes its decisions, and then, finally, consumers make theirs. This economy has no flexible-price producers. The linearized equations in this model are similar to those in the simple model. The Euler equation (1) and the quantity equation (2) are unchanged, except that here they have no shocks. The price set by a producer permitted to reset its price is given by the analog of (4), which is

(24)  $p_{st}(j) = (1 - \alpha\beta) \sum_{r=t}^{\infty} (\alpha\beta)^{r-t} (\gamma y_r + p_r),$

where β is the discount factor. Here, again, Taylor’s γ is the elasticity of the equilibrium real wage with respect to output. Letting pst denote the average price set by producers permitted to reset their prices in period t, we can recursively rewrite this equation as (25)

pst ( j) = (1 − αβ) (γ yt + pt ) + αβpst+1 ,

together with a type of transversality condition limT →∞ (αβ)T psT ( j) = 0. The aggregate price level can then be written as (26)

pt = αpt−1 + (1 − α) pst .

To make our analysis parallel to the literature, we again translate the decisions of the sticky-price producers from price levels to inflation rates. Letting xt ( j) = pst ( j) − pt−1 and letting xt denote the average of xt ( j), with some manipulation we can rewrite (25) as (27)

xt = (1 − αβ)γ yt + πt + αβxt+1 .

We can also rewrite (26) as (28)

πt = (1 − α)xt

and the transversality condition as $\lim_{T \to \infty} (\alpha\beta)^T x_T(j) = 0$. Together, (28) and the fact that $x_t$ is the average of $x_t(j)$ imply that this condition is equivalent to

(29)  $\lim_{t \to \infty} (\alpha\beta)^t \pi_t = 0.$
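As a worked step, (27) and (28) can be combined to eliminate $x_t$, leaving a difference equation in inflation and output alone. The derivation below is a sketch that uses only (27) and (28) and assumes $0 < \alpha < 1$ so that the divisions are well defined.

```latex
% Uses only (27) and (28); assumes 0 < \alpha < 1.
\begin{aligned}
x_t &= \frac{\pi_t}{1-\alpha}
  && \text{from (28)} \\
\frac{\pi_t}{1-\alpha} &= (1-\alpha\beta)\gamma\, y_t + \pi_t + \alpha\beta\, \frac{\pi_{t+1}}{1-\alpha}
  && \text{substituting into (27)} \\
\alpha\, \pi_t &= (1-\alpha)(1-\alpha\beta)\gamma\, y_t + \alpha\beta\, \pi_{t+1}
  && \text{multiplying by } (1-\alpha) \text{ and collecting } \pi_t \\
\pi_t &= \frac{(1-\alpha)(1-\alpha\beta)\gamma}{\alpha}\, y_t + \beta\, \pi_{t+1}.
\end{aligned}
```

The last line has the form of the New Keynesian Phillips curve, with the coefficient on output determined by the price-resetting probability, the discount factor, and $\gamma$.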

In addition to these conditions, we now argue that in this staggered price-setting model, a competitive equilibrium must satisfy two boundedness conditions. In general, boundedness conditions are controversial in the literature. Standard analyses of New Keynesian models impose strict boundedness conditions: in any reasonable equilibrium, both output and inflation must be bounded both above and below. Cochrane (2007) has forcefully criticized this practice, arguing that any boundedness condition must have a solid economic rationale. Here we provide rationales for two such conditions: output yt must be bounded above, so that (30)

$y_t \le \bar{y}$ for some $\bar{y}$,

and interest rates must be bounded below, so that (31)

$i_t \ge \underline{i}$ for some $\underline{i}$.

The rationale for output being bounded above is that the economy has a finite amount of labor to produce the output. The rationale for requiring that interest rates be bounded below comes from the restriction that the nominal interest rate must be nonnegative.5 These bounds allow outcomes in which (the log of) output, yt , falls without bound (so that the level of output converges to zero). The bounds also allow for outcomes in which inflation rates explode upward without limit. Here, then, a collection of allocations, prices, and policies at = {xt , δt , πt , yt } is a competitive equilibrium if it satisfies (i) consumer optimality, namely, the deterministic versions of (1) and (2); (ii) sticky-price producer optimality, (27)–(29); and (iii) the boundedness conditions, (30) and (31). Note that any allocations that satisfy (27)–(29) also satisfy the New Keynesian Phillips curve, (32)

πt = κ yt + βπt+1 ,

where now $\kappa = (1 - \alpha)(1 - \alpha\beta)\gamma/\alpha$. To see this result, use (28) to substitute for $x_t$ and $x_{t+1}$ in (27) and collect terms.

Here, as we did in the simple sticky-price model, we define continuation competitive equilibria. For example, consider the beginning of period t with a state variable $y_{t-1}$. A collection of allocations $a(y_{t-1}) = \{x_r(y_{t-1}), \delta_r(y_{t-1}), \pi_r(y_{t-1}), y_r(y_{t-1})\}_{r \ge t}$ is a continuation competitive equilibrium with $y_{t-1}$ if it satisfies the three conditions of a competitive equilibrium above in all periods $r \ge t$. A continuation competitive equilibrium that starts at the end of period t given $(y_{t-1}, x_t, \delta_t)$ is defined similarly. This definition requires optimality by consumers from t onward and optimality by sticky-price producers from t + 1 onward.

5. Note that even though the real value of consumer holdings of bonds must satisfy a transversality condition, this condition does not impose any restrictions on the paths of $y_t$ and $\pi_t$. The reason is that in our nonlinear model, the government has access to lump-sum taxes, so that government debt can be arbitrarily chosen to satisfy any transversality condition.

III.B. Sophisticated Equilibrium

We turn now to sophisticated equilibrium in the staggered price-setting model, its definition and how it can be implemented.

SOPHISTICATED MONETARY POLICIES


Definition. The definition of a sophisticated equilibrium in the staggered price-setting model parallels that in the simple sticky-price model. The elements needed for that definition are basically the same. The public events that occur in a period are, in chronological order, $q_t = (x_t; \delta_t; y_t, \pi_t)$. We let $h_{t-1}$ denote the history of these events up until the beginning of period $t$. A strategy for the sticky-price producers is a sequence of rules $\sigma_x = \{x_t(h_{t-1})\}$. The public history faced by the central bank is $h_{gt} = (h_{t-1}, x_t)$, and its strategy is $\sigma_g = \{\delta_t(h_{gt})\}$. The public history faced by consumers in period $t$ is $h_{yt} = (h_{t-1}, x_t, \delta_t)$. We let $\sigma_y = \{y_t(h_{yt})\}$ and $\sigma_\pi = \{\pi_t(h_{yt})\}$ denote the sequences of output and inflation rules. Strategies and allocation rules induce continuation outcomes written as $\{a_r(h_{t-1}; \sigma)\}_{r \ge t}$ or $\{a_r(h_{yt}; \sigma)\}_{r \ge t}$ in the obvious recursive fashion.

Formally, then, a sophisticated equilibrium given the policies here is a collection of strategies $(\sigma_x, \sigma_g)$ and allocation rules $(\sigma_y, \sigma_\pi)$ such that (i) given any history $h_{t-1}$, the continuation outcomes $\{a_r(h_{t-1}; \sigma)\}_{r \ge t}$ induced by $\sigma$ constitute a continuation competitive equilibrium and (ii) given any history $h_{yt}$, so do the continuation outcomes $\{a_r(h_{yt}; \sigma)\}_{r \ge t}$. In this model, as in the simple sticky-price model, the choices of the sticky-price producers must satisfy a key fixed point property, that
$$x_t(h_{t-1}) = (1-\alpha\beta)\gamma\, y_t(h_{yt}) + \pi_t(h_{yt}) + \alpha\beta\, x_{t+1}(h_t), \tag{33}$$

where hyt = (ht−1 , xt (ht−1 ), δt (ht−1 , xt (ht−1 ))) and ht = (hyt , πt (hyt ), yt (hyt )). Here, as in the simple sticky-price model, xt (ht−1 ) shows up on both sides of the fixed point equation—on the right side, through its effect on the histories hyt and ht . Implementation with Sophisticated Policies. We now show that in the staggered price-setting model, any competitive equilibrium can be uniquely implemented with sophisticated policies. The basic idea behind our construction is, again, that the central bank starts by picking any competitive equilibrium allocations and sets its policy on the equilibrium path consistent with those allocations. The central bank then constructs its policy off the equilibrium path so that any deviations from these allocations would never be a best response for any individual price-setter. In so doing, the constructed sophisticated policies support the chosen allocations as the unique equilibrium allocations. As we did with the simple model, here we show that, under sufficient conditions, policies that specify infinite reversion


QUARTERLY JOURNAL OF ECONOMICS

to a money regime can achieve unique implementation, a pure interest-rate rule of the King rule form cannot, and a King–money hybrid rule can.

With reversion to a money regime. We start with sophisticated policies that specify reversion to a money regime after deviations. In our construction of sophisticated policies, we assume that the best responses of sticky-price producers are controllable in that if they deviate by setting $\hat{x}_t \neq x_t^*$, then by infinitely reverting to the money regime, the central bank can set money growth rate policies so that the profit-maximizing value of $x_t(j)$ is such that $x_t(j) = \hat{x}_t$. The sophisticated policy that supports a desired outcome is to follow the chosen monetary policy as long as private agents have not deviated from the desired outcome. If sticky-price producers ever deviate to some choice $\hat{x}_t$, the central bank switches to a money regime set such that $x_t(j) = \hat{x}_t$. The following proposition follows immediately:

PROPOSITION 4 (Unique Implementation with Money Reversion). If the best responses of the sticky-price producers are controllable, then any competitive equilibrium outcome in which the central bank uses interest rates as its instrument can be implemented as a unique equilibrium by sophisticated policies which specify reversion to a money regime.

A sufficient condition for best responses to be controllable is that in the nonlinear economy, preferences are given by $U(c, l) = \log c + b(1 - l)$, where $c$ is consumption and $l$ is labor supply, so that in the linearized economy, Taylor's $\gamma$ equals one. To demonstrate controllability, suppose that after a deviation, the central bank reverts to a constant money supply $m = \log M$. With a constant money supply, it is convenient to use the original formulation of the economy with price levels rather than inflation rates. With that translation, the cash-in-advance constraint implies that $y_r + p_r = m$ for all $r$, so that (24) implies that the producer's optimal price is simply
$$p_{st}(j) = (1-\alpha\beta)\sum_{r=t}^{\infty}(\alpha\beta)^{r-t}\, m = m. \tag{34}$$
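As a quick numerical check of (34), the sketch below evaluates the discounted sum for a constant money supply and then backs out the money level that makes a given deviation a best response. It is only an illustration: it takes the weighted-sum form shown in (34) as given, truncates the infinite sum at a finite horizon, and uses hypothetical helper names (`reset_price`, `money_supply_for_target`) and parameter values that are not taken from the paper.

```python
def reset_price(m_path, alpha, beta):
    """Truncated version of (1 - alpha*beta) * sum_{r>=t} (alpha*beta)^(r-t) * m_r.

    m_path[k] is the (log) money supply k periods after the deviation.
    The truncation horizon len(m_path) is an assumption of this sketch.
    """
    ab = alpha * beta
    return (1 - ab) * sum(ab ** k * m for k, m in enumerate(m_path))


def money_supply_for_target(x_hat, p_prev):
    """Constant (log) money supply m for which x_t(j) = m - p_{t-1} equals x_hat."""
    return x_hat + p_prev


# Illustrative values only (not calibrated to the paper).
alpha, beta = 0.5, 0.99
p_prev, x_hat = 1.0, 0.03           # last period's price level and the deviation
m = money_supply_for_target(x_hat, p_prev)
p_reset = reset_price([m] * 2000, alpha, beta)
x_best_response = p_reset - p_prev  # reproduces x_hat up to truncation error
print(p_reset, x_best_response)
```

With a constant path the weights sum to one (up to truncation), so the reset price coincides with the money level, which is the sense in which the central bank controls the best response.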

That is, if after a deviation the central bank chooses a constant level of the money supply m, then sticky-price producers optimally


choose their prices to be $m$. Clearly, (34) implies that the best responses of these producers are controllable. For example, consider a history in which price-setters in period $t$ deviate from $p_{st}^*$ to $\hat{p}_{st}$. Obviously, the central bank can choose the level of the money supply so that the optimal choice for an individual price-setter becomes $p_{st}(j) = \hat{p}_{st}$, so that $x_t(j) = m - p_{t-1} = \hat{x}_t$.

With pure interest-rate rules. Now, as with the simple model, we turn to pure interest-rate rules such as the King rule. For the staggered price-setting model, we ask, can such rules uniquely implement bounded competitive equilibrium? We find that for a large class of parameter values, the answer is, again, no. We arrive at this answer by first showing that under the King rule, the economy has a continuum of period-zero competitive equilibria. We then argue that associated with each competitive equilibrium is a sophisticated equilibrium. Here, we write the King rule as (35)

$$i_t = i_t^* + \phi(1-\alpha)(x_t - x_t^*),$$

where it∗ and πt∗ are the interest rates and the inflation rates associated with the desired (bounded) competitive equilibrium. From (28), it follows that in all periods, inflation and the aggregate price-setting choice are mechanically linked by πt = (1 − α)xt . This mechanical link means that we can equally well think of policy as feeding back on either inflation or the price-setting choice, so that (35) is equivalent to (36)

$$i_t = i_t^* + \phi(\pi_t - \pi_t^*).$$

Now we show that the economy has a continuum of competitive equilibria by showing that there is a continuum of solutions to (1), (32), and (36) and that these solutions do not violate the transversality and boundedness conditions (29), (30), and (31). Expressing the variables as deviations from the desired equilibrium is convenient. To that end, let π˜ t = πt − πt∗ and y˜t = yt − yt∗ . Subtracting the equations governing {πt∗ , yt∗ } from those governing {πt , yt } gives a system governing {π˜ t , y˜t } that satisfies (1), (32), and (36). Substituting for ˜ıt in (1), using (36), we get that (37)

$$\tilde{y}_{t+1} + \psi\tilde{\pi}_{t+1} = \tilde{y}_t + \psi\phi\tilde{\pi}_t,$$


and from (32) we have that (38)

$$\tilde{\pi}_t = \kappa\tilde{y}_t + \beta\tilde{\pi}_{t+1}.$$
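The step from (37) and (38) to a first-order system can be made explicit. The following worked derivation is a sketch that uses only those two equations: it solves (38) for next period's inflation deviation and substitutes the result into (37).

```latex
\begin{align*}
\tilde{\pi}_{t+1} &= -\frac{\kappa}{\beta}\,\tilde{y}_t + \frac{1}{\beta}\,\tilde{\pi}_t
  && \text{(solve (38) for } \tilde{\pi}_{t+1}\text{)} \\
\tilde{y}_{t+1} &= \tilde{y}_t + \psi\phi\,\tilde{\pi}_t - \psi\,\tilde{\pi}_{t+1}
  && \text{(rearrange (37))} \\
 &= \Bigl(1 + \frac{\kappa\psi}{\beta}\Bigr)\tilde{y}_t
    + \psi\Bigl(\phi - \frac{1}{\beta}\Bigr)\tilde{\pi}_t
  && \text{(substitute the first line)}
\end{align*}
```

The coefficients in the last line govern the output deviation and those in the first line govern the inflation deviation, so together they give the two rows of the transition matrix of the system.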

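Equations (37) and (38) define a dynamical system. Letting $z_t = (\tilde{y}_t, \tilde{\pi}_t)'$, with some manipulation we can stack these equations to give $z_{t+1} = A z_t$, where
$$A = \begin{bmatrix} a & b \\ -\dfrac{\kappa}{\beta} & \dfrac{1}{\beta} \end{bmatrix}$$
and where $a = 1 + \kappa\psi/\beta$ and $b = \psi(\phi - 1/\beta)$. This system has a continuum of solutions of the form
$$\tilde{y}_t = \lambda_1^t\,\omega_1 + \lambda_2^t\,\omega_2 \quad\text{and}\quad \tilde{\pi}_t = \lambda_1^t\left(\frac{\lambda_1 - a}{b}\right)\omega_1 + \lambda_2^t\left(\frac{\lambda_2 - a}{b}\right)\omega_2, \tag{39}$$
where $\lambda_1 < \lambda_2$, the eigenvalues of $A$, are given by
$$\lambda_1, \lambda_2 = \frac{1}{2}\left(\frac{1+\kappa\psi}{\beta} + 1\right) \pm \frac{1}{2}\sqrt{\left(\frac{1+\kappa\psi}{\beta} - 1\right)^2 - 4(\phi - 1)\frac{\kappa\psi}{\beta}}, \tag{40}$$
and $\omega_1 = \left[\left(\frac{\lambda_2 - a}{b}\right)\tilde{y}_0 - \tilde{\pi}_0\right]/\Delta$ and $\omega_2 = \left[\left(\frac{a - \lambda_1}{b}\right)\tilde{y}_0 + \tilde{\pi}_0\right]/\Delta$, where $\Delta$ is the determinant of $A$.⁶ This continuum of solutions is indexed by $\tilde{y}_0$ and $\tilde{\pi}_0$. In the Appendix, we show that for a class of economies that satisfy the restriction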

$$1 - \kappa\psi < \beta \quad\text{and}\quad \alpha(1 + \kappa\psi) < 1, \tag{41}$$

equilibrium is indeterminate under the King rule. We can think of (41) as requiring that the period length is sufficiently short, in the sense that $\beta$ is close enough to 1, and that the price stickiness is not too large, in the sense that $\alpha$ is sufficiently small. Formally, in the Appendix, we prove the following proposition:

6. Here and throughout, we restrict attention to values of $\phi \in [0, \phi_{\max}]$, where $\phi_{\max}$ is the largest value of $\phi$ that yields real eigenvalues. That is, at $\phi_{\max}$, the discriminant in (40) is zero.

PROPOSITION 5 (Indeterminacy of Equilibrium under the King Rule). Suppose that the central bank sets interest rates $i_t$ according to the King rule (35) with $\phi > 1$ and that (41) is satisfied. Then the economy has a continuum of competitive equilibria indexed by $y_0 \le y_0^*$,
$$y_t = y_t^* + \lambda_2^t\,(y_0 - y_0^*) \quad\text{and}\quad \pi_t = \pi_t^* + \lambda_2^t\, c\,(y_0 - y_0^*), \tag{42}$$

where $\lambda_2 > 1$ and $c = (\lambda_2 - a)/b < 0$ are constants. It is immediate to construct a sophisticated equilibrium for each of the continuum of competitive equilibria in (42). Notice that under the King rule, there is one equilibrium with $y_t = y_t^*$ and $\pi_t = \pi_t^*$ for all $t$, and in the rest, $y_t$ goes to minus infinity and $\pi_t$ to plus infinity. All of these equilibria satisfy the boundedness conditions (30) and (31) and, under (41), the transversality condition (29). It turns out that if the inequality in the second part of (41) is reversed, then the set of solutions to the New Keynesian dynamical system, (1), (28), (32), and (35), has the form (42), but the transversality condition rules out all solutions except the one with $y_t = y_t^*$ and $\pi_t = \pi_t^*$ for all $t$. We find this way of ruling out solutions unappealing because it hinges critically on the idea that sticky-price producers may be unable to change their prices for extremely long periods, even in the face of exploding inflation.

With reversion to a hybrid rule. We now show that in the staggered price-setting model, as in the simple model, a King–money hybrid rule can uniquely implement any bounded competitive equilibrium. To do so in this model, we will assume boundedness under money, namely, that for any state variable $y_{t-1}$ there exists a money regime from period $t$ onward such that a continuation competitive equilibrium exists, and for all such equilibria, inflation in period $t$, $\pi_t$, is uniformly bounded. Here uniformly bounded means that there exist constants $\underline{\pi}$ and $\bar{\pi}$ such that for all $y_{t-1}$, $\pi_t \in [\underline{\pi}, \bar{\pi}]$. It is immediate that a sufficient condition for boundedness under money is that preferences in the nonlinear economy are given by $U(c, l) = \log c + b(1 - l)$. In an economy that satisfies boundedness under money, the King–money hybrid rule that implements a competitive equilibrium $\{x_t^*, \pi_t^*, y_t^*\}$ with an associated interest rate $i_t^*$ is defined as follows.
Set $\bar{x}$ to be greater than both $\max_t x_t^*$ and $\bar{\pi}$, and set $\underline{x}$ to be lower than both $\min_t x_t^*$ and $\underline{\pi}$. This rule specifies that if $x_t \in [\underline{x}, \bar{x}]$, then the central bank follows a King rule of the form (35) with $\phi > 1$. If $x_t$ falls outside the interval $[\underline{x}, \bar{x}]$, then the central bank reverts to a money regime forever.

PROPOSITION 6 (Unique Implementation with a Hybrid Rule). Suppose the staggered price-setting economy satisfies boundedness under money. Then the King–money hybrid rule implements any desired bounded competitive equilibrium. Moreover, under this rule, after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes.

The formal proof of this proposition is in the Appendix. The key idea of this proof is the same as that for the proof of Proposition 3. The idea is that under the King rule, any $\hat{x}_t$ that does not equal $x_t^*$ leads subsequent price-setting choices to eventually leave the interval $[\underline{x}, \bar{x}]$. But given boundedness under money, price-setting choices outside of the interval $[\underline{x}, \bar{x}]$ cannot be part of an equilibrium. Note that with the staggered price-setting model, as with the simple model, under a hybrid rule, deviations lead to only very transitory departures from desired outcomes.
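The indeterminacy under the King rule, and the reason an interval for price-setting choices restores uniqueness, can be illustrated numerically. The sketch below is not the authors' code. It assumes illustrative parameter values (α = 0.5, β = 0.99, γ = ψ = 1, φ = 1.1) chosen only so that (41) holds and the eigenvalues in (40) are real. The helper names and the symmetric interval half-width are hypothetical simplifications. It builds a and b, evaluates (40), and follows a path of the form (42) until the implied price-setting choice leaves the interval.

```python
import math


def king_rule_coefficients(alpha, beta, gamma, psi, phi):
    """Return (kappa, a, b) as defined around equations (32) and (39)."""
    kappa = (1 - alpha) * (1 - alpha * beta) * gamma / alpha
    a = 1 + kappa * psi / beta
    b = psi * (phi - 1 / beta)
    return kappa, a, b


def eigenvalues(kappa, beta, psi, phi):
    """Eigenvalues (lambda_1, lambda_2) from equation (40); needs a real root."""
    half_trace = 0.5 * ((1 + kappa * psi) / beta + 1)
    disc = ((1 + kappa * psi) / beta - 1) ** 2 - 4 * (kappa * psi / beta) * (phi - 1)
    if disc < 0:
        raise ValueError("phi exceeds phi_max: eigenvalues are complex")
    root = 0.5 * math.sqrt(disc)
    return half_trace - root, half_trace + root


def satisfies_41(alpha, beta, kappa, psi):
    """Restriction (41): 1 - kappa*psi < beta and alpha*(1 + kappa*psi) < 1."""
    return (1 - kappa * psi < beta) and (alpha * (1 + kappa * psi) < 1)


def first_exit_period(y0_gap, lam2, c, alpha, half_width, max_periods=500):
    """First t at which |x_t - x_t^*| exceeds half_width along a path of form (42).

    Uses pi_t - pi_t^* = lam2**t * c * y0_gap and pi_t = (1 - alpha) * x_t.
    half_width is a hypothetical symmetric stand-in for the interval bounds.
    """
    for t in range(max_periods):
        x_gap = lam2 ** t * c * y0_gap / (1 - alpha)
        if abs(x_gap) > half_width:
            return t
    return None


alpha, beta, gamma, psi, phi = 0.5, 0.99, 1.0, 1.0, 1.1   # illustrative only
kappa, a, b = king_rule_coefficients(alpha, beta, gamma, psi, phi)
lam1, lam2 = eigenvalues(kappa, beta, psi, phi)
c = (lam2 - a) / b
t_exit = first_exit_period(-0.01, lam2, c, alpha, half_width=0.5)
print("restriction (41) holds:", satisfies_41(alpha, beta, kappa, psi))
print("lambda_2 > 1:", lam2 > 1, " c < 0:", c < 0)
print("first period outside the interval:", t_exit)
```

Under these assumed values every path with output initially below its desired level drifts away geometrically, so the price-setting choice crosses any fixed bound in finite time. That crossing is the event that triggers reversion to the money regime under the hybrid rule.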

IV. TREMBLES AND IMPERFECT INFORMATION

We have shown that in both of the models we have analyzed (a simple one-period price-setting model and a staggered price-setting model), any equilibrium outcome can be implemented as a unique equilibrium with sophisticated policies. In our equilibria, deviations in private actions lead to changes in the regime. This observation leads to the question of how to construct sophisticated policies if trembles in private actions occur or if deviations in private actions can be detected only imperfectly, say, with measurement error. We show that we can achieve unique implementation with trembles. We show that, with imperfect detection, the King–money hybrid rule leads to a unique equilibrium. This equilibrium is arbitrarily close to the desired equilibrium when the detection error is small. In this sense, our results are robust to trembles and imperfect information.

IV.A. Trembles

Unique implementation is not a problem if trembles in private actions occur.


To see that, consider allowing for trembles in private decisions by supposing that the actual price chosen by a price-setter, $x_t(j)$, differs from the intended price, $\tilde{x}_t(j)$, by an additive error $\varepsilon_t(j)$, so that $x_t(j) = \tilde{x}_t(j) + \varepsilon_t(j)$. Trembles are clearly a trivial consideration. If $\varepsilon_t(j)$ is independently distributed across agents, then it simply washes out in the aggregate; it is irrelevant. Even if $\varepsilon_t(j)$ is correlated across agents, say, because it has both aggregate and idiosyncratic components, our argument goes through unchanged if the central bank can observe the aggregate component, for example, with a random sample of prices.

IV.B. Imperfect Information

Not as trivial is a situation in which the central bank has imperfect information about prices. But even in that situation, the King–money hybrid rule leads to a unique equilibrium; and when the detection error is small, this equilibrium is arbitrarily close to the desired equilibrium. To see that, consider a formulation in which the central bank observes the actions of price-setters with measurement error. Of course, if the central bank could see some other variable perfectly, such as output or interest rates on private debt, then it could infer what the private agents did. We think of this formulation as giving the central bank minimal amounts of information relative to what actual central banks have. We show here that with this sort of imperfect information, we can implement outcomes that are close to the desired outcomes when the measurement error is small. Here the central bank observes the price-setters' choices with error, so that (43)

$$\hat{x}_t = x_t + \varepsilon_t,$$

where the error $\varepsilon_t$ is i.i.d. over time with mean zero and bounded support $[\underline{\varepsilon}, \bar{\varepsilon}]$. Consider using the King–money hybrid rule to support some desired competitive equilibrium. Choose the interest-rate interval $[\underline{x}, \bar{x}]$ such that $x_t^* + \varepsilon_t$ is contained in this interval for all $t$. Here, the King rule is of the form
$$i_t(h_{gt}) = i_t^* + \phi(1-\alpha)(\hat{x}_t - x_t^*) \tag{44}$$
with $\phi > 1$.


In this economy with measurement error, the best response of any individual price-setter is identical to that in the economy without measurement error. This result follows because the best response depends on only the expected values of future variables. Because the measurement error εt has mean zero, these expected values are unchanged. Therefore, the unique equilibrium in this economy with measurement error has xt = xt∗ ; thus, πt = πt∗ . The realized values of the interest rate it and output yt , however, fluctuate around their desired values it∗ and yt∗ . Using (43) and (44), we know that the realized value of the interest rate is given by (45)

$$i_t = i_t^* + \phi(1-\alpha)\varepsilon_t,$$

whereas using the Euler equation, we know that the realized value of output is given by (46)

$$y_t = y_t^* - \psi\phi(1-\alpha)\varepsilon_t.$$
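Equations (45) and (46) imply that the realized gaps are linear in the measurement error, so they shrink one for one with the bound on that error. The following minimal sketch illustrates this under stated assumptions: the desired outcomes are normalized to zero, the error is drawn from a symmetric uniform distribution (one example of an i.i.d., mean-zero error with bounded support), and the parameter values and function names are hypothetical.

```python
import random


def realized_outcomes(i_star, y_star, eps, alpha, psi, phi):
    """Realized interest rate and output from equations (45) and (46)."""
    i_t = i_star + phi * (1 - alpha) * eps
    y_t = y_star - psi * phi * (1 - alpha) * eps
    return i_t, y_t


def max_output_gap(eps_bound, alpha, psi, phi, periods=1000, seed=0):
    """Largest |y_t - y_t^*| over a simulated path with errors uniform on
    [-eps_bound, eps_bound]; desired outcomes are normalized to zero."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(periods):
        eps = rng.uniform(-eps_bound, eps_bound)
        _, y_t = realized_outcomes(0.0, 0.0, eps, alpha, psi, phi)
        worst = max(worst, abs(y_t))
    return worst


alpha, psi, phi = 0.5, 1.0, 1.1      # illustrative values only
for bound in (0.1, 0.01, 0.001):
    print(bound, max_output_gap(bound, alpha, psi, phi))
```

In this sketch the largest output gap never exceeds ψφ(1 − α) times the error bound, which is the sense in which outcomes approach the desired ones as the measurement error vanishes.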

Notice that when the central bank observes private actions imperfectly, the King–money hybrid rule does not exactly implement any desired competitive equilibrium. Rather, this rule implements an equilibrium in which output fluctuates around its desired level. These fluctuations are proportional to the size of the measurement error. Clearly, as the size of the measurement error $\varepsilon_t$ goes to zero, the outcomes converge to the desired outcomes. We have thus established a proposition:

PROPOSITION 7 (Approximate Implementation with Measurement Error). Suppose the sophisticated policy is described by the King–money hybrid rule described above. Then the economy has a unique equilibrium with $x_t = x_t^*$ and $y_t$ given by (46). As the variance of the measurement error approaches zero, the economy's outcomes converge to the desired outcomes.

Note that although the central bank never reverts to a money regime when it is on the equilibrium path, the possibility that it will do so off the equilibrium path plays a critical role in this implementation.

V. IMPLICATIONS FOR THE TAYLOR PRINCIPLE

The sophisticated policy approach we have just described has implications for the use of the Taylor principle as a device to


ensure determinacy and to guide inferences from empirical investigations about whether central bank policy has led the economy into a determinate or indeterminate region. (Recall that the Taylor principle is the notion that interest rates should rise more than one for one with inflation rates, both compared to some exogenous, possibly stochastic, levels.)

V.A. Setup

In order to show what the sophisticated policy approach implies for our discussion of the Taylor principle, we consider a popular specification of the Taylor rule of the form (47)

$$i_t = \bar{\imath}_t + \phi E_{t-1}\pi_t + b E_{t-1} y_t,$$

where $\bar{\imath}_t$ is an exogenously given, possibly stochastic, sequence. (See Taylor [1993] for a similar specification.) In our simple model, from (12), policies of the Taylor rule form (47) can be written as (48)

$$i_t = \bar{\imath}_t + \phi(x_t - \bar{x}_t).$$

When the parameter $\phi > 1$, such policies are said to satisfy the Taylor principle: The central bank should raise its interest rate more than one for one with increases in inflation. When $\phi < 1$, such policies are said to violate that principle. Notice that when $\bar{\imath}_t$ and $\bar{x}_t$ coincide with the desired competitive equilibrium outcomes $i_t^*$ and $x_t^*$ for all periods, the Taylor rule (48) reduces to the simple model's King rule (19).

V.B. Implications for Determinacy

Many economists have argued that central banks must adhere to the Taylor principle in order to ensure unique implementation. Our results clearly imply that if the central bank is following a pure interest-rate rule, then adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. If, however, the central bank is following a King–money hybrid rule, then adherence to this principle after deviations between observed outcomes and desired outcomes can help ensure unique implementation.

Note that policies of the Taylor rule form (48) are linear feedback rules of the form (23) and lead to indeterminacy, regardless of the value of $\phi$. In this sense, if the central bank is following a pure interest-rate rule, then adherence to the Taylor principle is not sufficient for unique implementation. A similar argument implies


that, under (41), it is not sufficient in the staggered price-setting model either. Clearly, under pure interest-rate rules, adherence to the Taylor principle is also not necessary for unique implementation. Propositions 1 and 4 imply that, in both models, the central bank can uniquely implement any competitive equilibrium, including those that violate the Taylor principle along the equilibrium path.

V.C. Implications for Estimation

Many economists have estimated monetary policy rules and then inferred that these rules have led the economy to be in the determinate region if and only if they satisfy the Taylor principle. Indeed, one branch of this literature argues that the undesirable inflation experiences of the 1970s in the United States occurred in part because monetary policy led the economy to be in the indeterminate region. See, for example, the work of Clarida, Galí, and Gertler (2000). We provide a set of stark assumptions under which such inferences can be made more confidently. Nonetheless, finding appropriate assumptions in more interesting applied examples remains a challenge.

Perfect Information. In economies in which the central bank and private agents have the same information, observations of variables along the equilibrium path shed no light on the properties of policies off that path, and it is these properties that govern the determinacy of equilibrium. Of course, any estimation procedure can rely only on data along the equilibrium path; it cannot uncover the properties of policies off that path. In this sense, estimation procedures in economies with perfect information cannot determine whether monetary policy is leading the economy to be in the determinate or the indeterminate region. (See Cochrane [2007] for a related point.) To see this general point in the context of our models, note that any estimation procedure can only uncover relationships between the equilibrium interest rate $i_t^*$ and the equilibrium inflation rate $\pi_t^*$.
These relationships have nothing whatsoever to do with the off-equilibrium path policies that govern determinacy. For example, in the context of the King–money hybrid rule with the King rule of the form (35), neither it∗ nor πt∗ depend on the parameter φ, but the size of this parameter plays a key role in ensuring determinacy. In this sense, without trivial identifying


assumptions, no estimation procedure can uncover the key parameter for determinacy. For example, suppose that along the equilibrium path, interest rates satisfy (49)

$$i_t^* = \bar{\imath} + \phi^*(x_t^* - \bar{x}),$$

where it∗ and xt∗ are the desired equilibrium outcomes and ı¯ and x¯ are some constants that differ from those desired outcomes. This equilibrium can be supported in many ways, including reversion after deviations to a money regime or some sort of hybrid rule. Notice that in (49) the parameter φ ∗ simply describes the relation between the equilibrium outcomes it∗ and xt∗ and has no connection to the behavior of policy after deviations. Obviously, with a policy that specifies reversion to a money regime, the size of φ ∗ (whether it is smaller or larger than one) has no bearing on the determinacy of equilibrium. That is also true with a policy that reverts to a hybrid rule after deviations, though perhaps not as obviously. Suppose that for small deviations, the hybrid rule specifies the King rule (20) with φ > 1. The parameter φ of this King rule has no connection to the parameter φ ∗ in (49). The former governs the behavior of policies after deviations, whereas the latter simply describes a relationship that holds along the equilibrium path. Furthermore, although φ > 1 ensures determinacy, the size of φ ∗ —whether it is smaller or larger than 1—has no bearing on determinacy. These arguments clearly generalize to situations in which the constants ¯ı and x¯ are replaced by exogenous, possibly stochastic, sequences ¯ıt and x¯t that differ from the desired outcomes, so that along the equilibrium path, interest rates satisfy (50)

$$i_t^* = \bar{\imath}_t + \phi^*(x_t^* - \bar{x}_t).$$

We interpret most of the current estimation procedures of the Taylor rule variety as estimating φ ∗ , the parameter governing desired outcomes in (50) or its analog in more general setups. To use these estimates to draw inferences about determinacy, researchers implicitly assume that the parameter φ (the parameter describing off-equilibrium path behavior) is the same as φ ∗ (the parameter describing on-equilibrium path behavior). Researchers also restrict attention to bounded solutions. As we have discussed, with perfect information, theory imposes no connection between φ and φ ∗ , so the assumption that φ = φ ∗ is not grounded in theory.


Also, the rationale for restricting attention to bounded solutions is not clear. With perfect information, then, current estimation procedures simply cannot uncover whether the economy is in the determinate or the indeterminate region. Imperfect Information. With imperfect information, however, there is some hope that variants of current procedures may be able to uncover some of the key parameters for determinacy, provided researchers are willing to make some quite strong assumptions. Here we provide a stark example in which a variant of current procedures can uncover one of the key parameters governing determinacy. Consider our staggered price-setting economy, in which the central bank observes the price-setters’ choices with error. Recall that in this economy, the equilibrium outcomes for interest rates and output, (45) and (46), depend on the parameter φ in the King–money hybrid rule and that this parameter plays a key role in ensuring determinacy. Note the contrast with the perfect information economy, in which the equilibrium outcomes do not depend on the parameter φ. The fact that equilibrium outcomes depend on the key determinacy parameter here offers some hope that researchers will be able to estimate it. For our stark example, we assume that researchers observe the same data as the central bank and that along the equilibrium path, the central bank follows a King rule of the form (51)

$$i_t = i_t^* + \phi(1-\alpha)(\hat{x}_t - x_t^*).$$
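Because (51) is linear in φ, it can be inverted at any date at which the observed price-setting choice differs from the desired one. The sketch below illustrates that inversion under strong assumptions: the researcher is taken to know the desired outcomes and α exactly, the rule is taken to hold without shocks, and the function name and the numbers are hypothetical.

```python
def implied_phi(i_obs, i_star, x_hat, x_star, alpha):
    """Solve (51) for phi at one date; undefined when x_hat equals x_star."""
    gap = x_hat - x_star
    if gap == 0:
        raise ZeroDivisionError("phi is not identified when x_hat == x_star")
    return (i_obs - i_star) / ((1 - alpha) * gap)


# Illustrative data generated from (51) with a known phi (values are made up).
alpha, phi_true = 0.5, 1.5
i_star, x_star = 0.04, 0.02
x_hat = x_star + 0.005                       # observed with measurement error
i_obs = i_star + phi_true * (1 - alpha) * (x_hat - x_star)
print(implied_phi(i_obs, i_star, x_hat, x_star, alpha))

# With perfect information x_hat coincides with x_star and the ratio is undefined.
try:
    implied_phi(i_star, i_star, x_star, x_star, alpha)
except ZeroDivisionError as err:
    print("not identified:", err)
```

The undefined case mirrors the perfect-information economy, where the equilibrium data carry no variation from which the feedback parameter could be recovered.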

If researchers know the desired outcomes $x_t^*$ and $i_t^*$, as well as the parameter $\alpha$, then they can simply solve (51) for $\phi$ as long as $\hat{x}_t$ does not identically equal $x_t^*$. To go from this solution for $\phi$ to an inference about determinacy requires more assumptions. One set of assumptions is that the data are generated by our staggered price-setting model, in which the central bank observes $\hat{x}_t = x_t + \varepsilon_t$, where $\varepsilon_t$ is i.i.d. over time and has mean zero and bounded support $[\underline{\varepsilon}, \bar{\varepsilon}]$, and the central bank follows the King–money hybrid rule, with the King rule given by (51). The key feature of the formulation that allows this inference is that $\hat{x}_t$ does not identically equal $x_t^*$ as it does in the economies with perfect information. Note that in our stark example, this procedure can uncover the King rule parameter $\phi$, but not the hybrid rule parameters $\underline{\pi}$ and $\bar{\pi}$. More generally, no procedure can uncover what behavior would be in situations that are never reached in equilibrium, even


if the specification of such behavior plays a critical role in unique implementation. This observation implies that even in our stark example, we cannot distinguish between a pure interest-rate rule and the King–money hybrid rule.

Although we have offered some hope for uncovering some of the key parameters for determinacy, applying our insight to a broader class of environments is apt to be hard. In practice, after all, the desired outcomes are not known, the other parameters of the economy are not known, the measurement error is likely to be serially correlated, and the interest-rate rule is subject to stochastic shocks. Quite beyond these practical issues is a theoretical one: drawing inferences about determinacy requires confronting a subtle identification issue. This issue stems from the fact that characterizing the equilibrium is relatively easy if the economy is in the determinate region, but extremely hard if it is not. Specifically, if the economy is in the determinate region, then the probability distribution over observed variables is a relatively straightforward function of the primitive parameters. If the economy is in the indeterminate region, however, then this probability distribution (which must take account of the possibility of sunspots) is more complicated. One way to proceed is to tentatively assume that the economy is in the determinate region and estimate the key parameters governing determinacy. Suppose that under this tentative assumption, we find that the parameters fall in the determinate region. Can we then conclude that the economy is in the determinate region? Not yet. We must still show that the data could not have been generated by one of the indeterminate equilibria, which is not an easy task.

VI. CONCLUSIONS

We have here described our sophisticated policy approach and illustrated its use as an operational guide to policy that achieves unique implementation of any competitive equilibrium outcome.
We have demonstrated that using a pure interest-rate rule leads to indeterminacy. We have also constructed policies that avoid this by switching regimes: they use interest rates until private agents deviate and then revert to a money regime or a hybrid rule. Our work has strong implications for the use of the Taylor principle as a guide to policy. We have shown that if a central bank


follows a pure interest-rate rule, then adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. Adherence to that principle may ensure determinacy, however, if monetary policy includes a reversion to the King–money hybrid rule after deviations. We have also argued that existing empirical procedures used to draw inferences about the relationship between adherence to the Taylor principle and determinacy should be treated with caution. We have provided a set of stark assumptions that can be more confidently used in applied work to draw inferences regarding the relationship between central bank policy and determinacy. Using this method, however, requires solving multiple difficult identification problems. Finally, although we have here focused exclusively on monetary policy, the use of our operational guide is not necessarily limited to that application. The logic behind the construction of the guide should be applicable as well to other governmental policies—for example, to fiscal policy and to policy responses to financial crises—or to any application that aims to uniquely implement a desired outcome.

APPENDIX: THE PROOFS OF PROPOSITIONS 3, 5, AND 6

A. Proof of Proposition 3: A Unique Implementation with a Hybrid Rule in the Simple Model

Given that the central bank follows the King–money hybrid rule, say, $\sigma_g^*$, we will show here that there are unique strategies $\sigma_x$, $\sigma_y$, and $\sigma_\pi$ for private agents that, together with $\sigma_g^*$, constitute a sophisticated equilibrium. We then show that this sophisticated equilibrium implements the desired outcomes. The strategies $\sigma_x$, $\sigma_y$, and $\sigma_\pi$ are as follows. The strategy $\sigma_x$ specifies that $x_t(h_{t-1}) = x_t^*(s^{t-1})$ for all histories. The strategies $\sigma_y$ and $\sigma_\pi$ specify $y_t(h_{yt})$ and $\pi_t(h_{yt})$ as the unique solutions to the conditions defining consumer optimality, (1) and (2); flexible-price producer optimality, (10); and the King–money hybrid rule, with $y_{t+1}(s^{t+1}) = y_{t+1}^*(s^{t+1})$ and $x_{t+1}(s^{t+1}) = x_{t+1}^*(s^{t+1})$. Note that the value of $x_t$ in the history $h_{yt} = (h_{t-1}, x_t, \delta_t, s_t)$ determines the regime in the current period and, hence, determines whether the Euler equation (1) or the cash-in-advance constraint (2) is used to solve for $y_t(h_{yt})$ and $\pi_t(h_{yt})$.


We now show that (σg∗ , σx , σ y , σπ ) is a sophisticated equilibrium. Given that {xt∗ (st−1 ), πt∗ (st ), yt∗ (st )} is a period-zero competi¯ so that the central bank tive equilibrium and that xt∗ (st−1 ) ∈ [x, x], is following an interest-rate regime, we know that any tail of these outcomes {xt∗ (st−1 ), πt∗ (st ), yt∗ (st )}t≥r is a continuation competitive equilibrium starting in period r regardless of the history hr−1 . On the equilibrium path, this claim follows immediately because the continuation of any competitive equilibrium is also a competitive equilibrium. Off the equilibrium path, for histories ht−1 , the tail is a period-zero competitive equilibrium (with periods suitably relabeled) and is, therefore, a continuation competitive equilibrium. A similar argument shows that the tail of the outcomes starting from the end of period r, namely, πr (hyr ) and yr (hyr ), together with the outcomes {xt∗ (st−1 ), πt∗ (st ), yt∗ (st )}t≥r+1 , constitutes a continuation competitive equilibrium. Note that our construction implies that after any deviation in period t, the equilibrium outcomes from period t + 1 are the desired outcomes. We now establish uniqueness of the sophisticated equilibrium of the form (σg∗ , σx , σ y , σπ ). We begin with a preliminary result that shows that for any st−1 in any equilibrium, xt (st−1 ) ∈ [x, x]. ¯ / This argument is by contradiction. Suppose that at st−1 , xt (st−1 ) ∈ ¯ Under the hybrid rule, the central bank reverts to a money [x, x]. ¯ From Lemma 1, regime with expected inflation equal to π¯ ∈ [x, x]. ¯ which contradicts xt (st−1 ) ∈ / [x, x]. ¯ This result xt (st−1 ) = π¯ ∈ [x, x], implies that along the equilibrium path, the central bank never reverts to money, so that interest rates are given by the King rule (19). With this preliminary result, we establish uniqueness by another contradiction argument. 
Suppose that the economy has a sophisticated equilibrium in which in some history $h_{r-1}$, $x_r(h_{r-1}) = \hat{x}_r$, which differs from $x_r^*(s^{r-1})$. Without loss of generality, suppose that $\hat{x}_r - x_r^*(s^{r-1}) = \varepsilon > 0$. Let $\{\hat{x}_t(s^{t-1}), \hat{\pi}_t(s^t), \hat{y}_t(s^t)\}_{t \ge r}$ denote the associated continuation competitive equilibrium outcomes. Our preliminary result implies that the central bank follows the King rule in all periods. Let $\{\hat{\imath}_t(s^{t-1})\}_{t \ge r}$ denote the associated interest rates. From (13), using the law of iterated expectations, we have that

(52) $E[i_t^*(s^{t-1}) \mid s^{r-1}] = E[x_{t+1}^*(s^t) \mid s^{r-1}]$ and $E[\hat{\imath}_t(s^{t-1}) \mid s^{r-1}] = E[\hat{x}_{t+1}(s^t) \mid s^{r-1}]$.


Substituting (52) into the King rule (19) gives that

$E[\hat{x}_{t+1}(s^t) - x_{t+1}^*(s^t) \mid s^{r-1}] = \phi^{t-r} \varepsilon.$

Because $\phi > 1$ and $x_{t+1}^*(s^t)$ is bounded, for every $\varepsilon$ there exists some $T$ such that

$E[\hat{x}_{T+1}(s^T) \mid s^{T-1}] > \bar{x}.$

But this contradicts our preliminary result that $x_t(s^{t-1}) \le \bar{x}$ for all $t$ and $s^{t-1}$. QED

B. Proof of Proposition 5: Indeterminacy of Equilibrium under the King Rule in the Staggered Price-Setting Model

It is straightforward to verify that output and inflation satisfying (42) satisfy all equilibrium conditions except the model’s transversality condition (29) and its two boundedness conditions (30) and (31). Here we verify these conditions.

Consider first the transversality condition. Under (40) it follows that the larger eigenvalue $\lambda_2(\phi)$ is a decreasing function of $\phi$ and that $\lambda_2(1) = (1 + \kappa\psi)/\beta$. From (41) it then follows that $\beta\alpha\lambda_2(\phi) < 1$ for all $\phi \ge 1$. Hence, $\lim_{t \to \infty} (\alpha\beta)^t \tilde{\pi}_t = 0$. Because $\pi_t^*$ is bounded, it follows that $\pi_t$ satisfies the transversality condition (29).

Consider next the output and interest-rate boundedness conditions. We first show that $[\lambda_2(\phi) - a]/b < 0$ for all $\phi \ge 1$. To do so, we show that $\lambda_2(\phi) - a$ is positive for $\phi \in [1, 1/\beta)$, zero at $\phi = 1/\beta$, and negative for $\phi \in (1/\beta, \phi_{\max}]$. From (40) we know that

(53) $\lambda_2\!\left(\frac{1}{\beta}\right) = \frac{1}{2}\left[\frac{1 + \kappa\psi}{\beta} + 1\right] + \frac{1}{2}\sqrt{\left(\frac{1}{\beta} - 1 + \frac{\kappa\psi}{\beta}\right)^2 - 4\,\frac{\kappa\psi}{\beta}\left(\frac{1}{\beta} - 1\right)}.$

Note that the term in the radical is a perfect square: it equals $\left(\frac{\kappa\psi}{\beta} - \frac{1}{\beta} + 1\right)^2$. Then using that and the first part of (41) turns (53) into

$\lambda_2\!\left(\frac{1}{\beta}\right) = 1 + \frac{\kappa\psi}{\beta} = a.$

Because $\lambda_2(\phi)$ is decreasing, it follows that $\lambda_2(\phi) - a$ has the desired sign pattern. Because $b = \psi(\phi - 1/\beta)$, the numerator and the denominator of $[\lambda_2(\phi) - a]/b$ have opposite signs for all $\phi \ge 1$, so that $[\lambda_2(\phi) - a]/b$ is negative. Thus, the boundedness conditions
are satisfied for all $\omega_2 \le 0$. In the resulting equilibria, inflation goes to plus infinity and output goes to minus infinity (so that the level of output goes to zero). QED

C. Proof of Proposition 6: Unique Implementation with a Hybrid Rule in the Staggered Price-Setting Model

Let $\{x_t^*, \pi_t^*, y_t^*\}$ be the desired bounded competitive equilibrium. The strategies that implement this competitive equilibrium are as follows. The strategy $\sigma_g^*$ is the King–money hybrid rule. The strategy $\sigma_x$ specifies that $x_t(h_{t-1}) = x_t^*$ for all histories. The strategies $\sigma_y$ and $\sigma_\pi$ specify $y_t(h_{yt})$ and $\pi_t(h_{yt})$ that are the unique solutions to the deterministic versions of the conditions defining consumer optimality, (1), (2), (28), (32), and the King–money hybrid rule with $y_{t+1} = y_{t+1}^*$ and $x_{t+1} = x_{t+1}^*$.

The proof that $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$ is a sophisticated equilibrium closely parallels that of Proposition 3. We now establish uniqueness of the sophisticated equilibrium of the form $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$. We begin by showing that given $\sigma_g^*$, $x_t(h_{t-1}) = x_t^*$ for all histories. (Clearly, given $\sigma_g^*$ and $\sigma_x$, $\sigma_y$ and $\sigma_\pi$ are unique.) For reasons similar to those underlying the preliminary result in Proposition 3, for any history $h_{t-1}$, $x_t(h_{t-1})$ must be in the interval $[\underline{x}, \bar{x}]$, so that for any history, interest rates are given by the King rule (35). Under an interest-rate rule, the state $y_{t-1}$ is irrelevant; therefore, a continuation competitive equilibrium starting at the beginning of any period $t$ solves the same equations as a competitive equilibrium (starting from period 0). For notational simplicity, we focus on a competitive equilibrium starting from period 0. Suppose by way of contradiction that $\{\hat{x}_t, \hat{\pi}_t, \hat{y}_t\}$ is an equilibrium that does not coincide with $\{x_t^*, \pi_t^*, y_t^*\}$. Let $\tilde{x}_t = \hat{x}_t - x_t^*$, and use similar notation for $\tilde{\pi}_t$ and $\tilde{y}_t$.
Then, subtracting the equations governing the systems denoted with an asterisk from those denoted with a caret, we have a system governing $\{\tilde{x}_t, \tilde{\pi}_t, \tilde{y}_t\}$ that satisfies (the analogs of) (1), (32), and (35). The resulting system, given by (37) and (38), coincides with that in the proof of Proposition 5. Hence, the solution is given by (39) with eigenvalues given by (40). It is easy to check that $\phi > 1$ implies that both eigenvalues $\lambda_1$ and $\lambda_2$ are greater than one. Furthermore, at least one of $(\lambda_1 - a)/b$ and $(\lambda_2 - a)/b$ is nonzero. Because both of the eigenvalues are greater than one, (39) implies that if the two equilibria ever differ, then $\tilde{\pi}_t$ becomes unbounded, so that $\tilde{x}_t$ does as well. Because $x_t^*$ is bounded, $\hat{x}_t$ must eventually leave the interval $[\underline{x}, \bar{x}]$, which cannot happen in equilibrium. So we have a contradiction, and the first part of Proposition 6 is established. Note that our construction implies that after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes. Thus, we have also established the second part of the proposition. QED

UNIVERSITY OF CALIFORNIA, LOS ANGELES, FEDERAL RESERVE BANK OF MINNEAPOLIS, AND NATIONAL BUREAU OF ECONOMIC RESEARCH
UNIVERSITY OF MINNESOTA AND FEDERAL RESERVE BANK OF MINNEAPOLIS
FEDERAL RESERVE BANK OF MINNEAPOLIS, UNIVERSITY OF MINNESOTA, AND NATIONAL BUREAU OF ECONOMIC RESEARCH


EARNINGS INEQUALITY AND MOBILITY IN THE UNITED STATES: EVIDENCE FROM SOCIAL SECURITY DATA SINCE 1937∗

WOJCIECH KOPCZUK
EMMANUEL SAEZ
JAE SONG

This paper uses Social Security Administration longitudinal earnings micro data since 1937 to analyze the evolution of inequality and mobility in the United States. Annual earnings inequality is U-shaped, decreasing sharply up to 1953 and increasing steadily afterward. Short-term earnings mobility measures are stable over the full period except for a temporary surge during World War II. Virtually all of the increase in the variance in annual (log) earnings since 1970 is due to an increase in the variance of permanent earnings (as opposed to transitory earnings). Mobility at the top of the earnings distribution is stable and has not mitigated the dramatic increase in annual earnings concentration since the 1970s. Long-term mobility among all workers has increased since the 1950s but has slightly declined among men. The decrease in the gender earnings gap and the resulting substantial increase in upward mobility over a lifetime for women are the driving force behind the increase in long-term mobility among all workers.

∗ We thank Tony Atkinson, Clair Brown, David Card, Jessica Guillory, Russ Hudson, Jennifer Hunt, Markus Jantti, Alan Krueger, David Lee, Thomas Lemieux, Michael Leonesio, Joyce Manchester, Robert Margo, David Pattison, Michael Reich, Jonathan Schwabish, numerous seminar participants, and especially the editor, Lawrence Katz, and four anonymous referees for very helpful comments and discussions. We also thank Ed DeMarco, Linda Maxfield, and especially Joyce Manchester for their support, Bill Kearns, Joel Packman, Russ Hudson, Shirley Piazza, Greg Diez, Fred Galeas, Bert Kestenbaum, William Piet, Jay Rossi, and Thomas Mattson for help with the data, and Thomas Solomon and Barbara Tyler for computing support. Financial support from the Sloan Foundation and NSF Grant SES-0617737 is gratefully acknowledged. All our series are available in electronic format in the Online Appendix.

© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010

I. INTRODUCTION

Market economies are praised for creating macroeconomic growth but blamed for the economic disparities they generate among individuals. Economic inequality is often measured using high-frequency economic outcomes such as annual income. However, market economies also generate substantial mobility in earnings over a working lifetime. As a result, annual earnings inequality might substantially exaggerate the extent of true economic disparity among individuals. To the extent that individuals can smooth changes in earnings using savings and credit markets, inequality based on longer periods than a year is a better measure
of economic disparity. Thus, a comprehensive analysis of disparity requires studying both inequality and mobility. A large body of academic work has indeed analyzed earnings inequality and mobility in the United States. A number of key facts from the pre–World War II years to the present have been established using five main data sources:1 (1) Decennial Census data show that earnings inequality decreased substantially during the “Great Compression” from 1939 to 1949 (Goldin and Margo 1992) and remained low over the next two decades; (2) the annual Current Population Surveys (CPS) show that earnings inequality has increased substantially since the 1970s and especially during the 1980s (Katz and Murphy 1992; Katz and Autor 1999); (3) income tax statistics show that the top of the annual earnings distribution experienced enormous gains over the last 25 years (Piketty and Saez 2003); (4) panel survey data, primarily the Panel Study of Income Dynamics (PSID), show that short-term rank-based mobility has remained fairly stable since the 1970s (Gottschalk 1997); and (5) the gender gap has narrowed substantially since the 1970s (Goldin 1990, 2006; Blau 1998). There are, however, important questions that remain open due primarily to lack of homogeneous and longitudinal earnings data covering a long period of time. First, no annual earnings survey data covering most of the U.S. workforce are available before the 1960s, so that it is difficult to measure overall earnings inequality on a consistent basis before the 1960s, and in particular to analyze the exact timing of the Great Compression. Second, studies of mobility have focused primarily on short-term mobility measures due to lack of longitudinal data with large sample size and covering a long time period. Therefore, little is known about earnings mobility across an entire working life, let alone how such long-term mobility has evolved over time. 
1. A number of studies have also analyzed inequality and mobility in America in earlier periods (see Lindert [2000] for a survey on inequality and Ferrie [2008] for an analysis of occupational mobility).
2. See, for example, Cutler and Katz (1991), Slesnick (2001), Krueger and Perri (2006), and Attanasio, Battistin, and Ichimura (2007).

Third, and relatedly, there is an active debate over whether the increase in inequality since the 1970s has been offset by increases in earnings mobility, and whether consumption inequality has increased to the same extent as income inequality.2 In particular, the development of performance pay such as bonuses and stock options for highly compensated employees might have increased year-to-year earnings variability substantially among
top earners, so that the trends documented in Piketty and Saez (2003) could be misleading. The goal of this paper is to use the Social Security Administration (SSA) earnings micro data available since 1937 to make progress on those questions. The SSA data we use combine four key advantages relative to the data that have been used in previous studies on inequality and mobility in the United States. First, the SSA data we use for our research purposes have a large sample size: a 1% sample of the full US covered workforce is available since 1957, and a 0.1% sample since 1937. Second, the SSA data are annual and cover a very long time period of almost seventy years. Third, the SSA data are longitudinal balanced panels, as samples are selected based on the same Social Security number pattern every year. Finally, the earnings data have very little measurement error and are fully uncapped (with no top code) since 1978.3 Although Social Security earnings data have been used in a number of previous studies (often matched to survey data such as the Current Population Survey), the data we have assembled for this study overcome three important previous limitations. First, from 1946 to 1977, we use quarterly earnings information to extrapolate earnings up to four times the Social Security annual cap.4 Second, we can match the data to employer and industry information starting in 1957, allowing us to control for expansions in Social Security coverage that started in the 1950s. Finally, to our knowledge, the Social Security annual earnings data before 1951 have not been used outside the SSA for research purposes since Robert Solow’s unpublished Harvard Ph.D. thesis (Solow 1951). Few sociodemographic variables are available in the SSA data relative to standard survey data. Date of birth, gender, place of birth (including a foreign country birthplace), and race are available since 1937. 
3. A number of studies have compared survey data to matched administrative data to assess measurement error in survey data (see, e.g., Abowd and Stinson [2005]).
4. Previous work using SSA data before the 1980s has almost always used data capped at the Social Security annual maximum (which was around the median of the earnings distribution in the 1960s), making it impossible to study the top half of the distribution. Before 1946, the top code was above the top quintile, allowing us to study earnings up to the top quintile over the full period.

Employer information (including geographic location, industry, and size) is available since 1957. Because we do not have information on important variables such as family
structure, education, and hours of work, our analysis will focus only on earnings rather than on wage rates and will not attempt to explain the links between family structure, education, labor supply, and earnings, as many previous studies have done. In contrast to studies relying on income tax returns, the whole analysis is also based on individual rather than family-level data. Furthermore, we focus only on employment earnings and hence exclude self-employment earnings as well as all other forms of income such as capital income, business income, and transfers. We further restrict our analysis to employment earnings from commerce and industry workers, who represent about 70% of all U.S. employees, as this is the core group always covered by Social Security since 1937. This is an important limitation when analyzing mobility as (a) mobility within the commerce and industry sector may be different than overall mobility and (b) mobility between the commerce and industry sector and all other sectors is eliminated. We obtain three main findings. First, our annual series confirm the U-shaped evolution of earnings inequality since the 1930s. Inequality decreases sharply up to 1953 and increases steadily and continuously afterward. The U-shaped evolution of inequality over time is also present within each gender group and is more pronounced for men. Percentile ratio series show that (1) the compression in the upper part of the distribution took place from 1942 to 1950 and was followed by a steady and continuous widening ever since the early 1950s, and (2) the compression in the lower part of the distribution took place primarily in the postwar period from 1946 to the late 1960s and unraveled quickly from 1970 to 1985, especially for men, and has been fairly stable over the last two decades. Second, we find that short-term relative mobility measures such as rank correlation measures and Shorrocks indices comparing annual vs. 
multiyear earnings inequality have been quite stable over the full period, except for a temporary surge during World War II.5 In particular, short-term mobility has been remarkably stable since the 1950s, for a variety of mobility measures and also when the sample is restricted to men only. Therefore, the evolution of annual earnings inequality over time is very close to the evolution of inequality of longer term earnings. Furthermore, we show that most of the increase in the variance of (log) annual earnings is due to increases in the variance of (log) permanent earnings, with modest increases in the variance of transitory (log) earnings. Finally, mobility at the top of the earnings distribution, measured by the probability of staying in the top percentile after one, three, or five years, has also been very stable since 1978 (the first year in our data with no top code). Therefore, in contrast to the stock-option scenario mentioned above, the SSA data show very clearly that mobility has not mitigated the dramatic increase in annual earnings concentration.

5. Such a surge is not surprising in light of the large turnover in the labor market generated by the war.

Third, we find that long-term mobility measures among all workers, such as the earnings rank correlations from the early part of a working life to the late part of a working life, display significant increases since 1951 either when measured unconditionally or when measured within cohorts. However, those increases mask substantial heterogeneity across gender groups. Long-term mobility among males has been stable over most of the period, with a slight decrease in recent decades. The decrease in the gender earnings gap and the resulting substantial increase in upward mobility over a lifetime for women are the driving force behind the increase in long-term mobility among all workers.

The paper is organized as follows. Section II presents the conceptual framework linking inequality and mobility measures, the data, and our estimation methods. Section III presents inequality results based on annual earnings. Section IV focuses on short-term mobility and its effect on inequality, whereas Section V focuses on long-term mobility and inequality. Section VI concludes. Additional details on the data and our methodology, as well as extensive sensitivity analysis and the complete series, are presented in the Online Appendix.

II. FRAMEWORK, DATA, AND METHODOLOGY

II.A. Conceptual Framework

Our main goal is to document the evolution of earnings inequality. Inequality can be measured over short-term earnings (such as annual earnings) or over long-term earnings (such as earnings averaged over several years or even a lifetime). When there is mobility in individual earnings over time, long-term
inequality will be lower than short-term inequality, as moving up and down the distribution of short-term earnings will make the distribution of long-term earnings more equal. Therefore, conceptually, a way to measure mobility (Shorrocks 1978) is to compare inequality of short-term earnings to inequality of long-term earnings and define mobility as a coefficient between zero and one (inclusive) as follows:

(1) Long-term earnings inequality = Short-term earnings inequality × (1 − Mobility).
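
As a purely illustrative reading of equation (1), the worked example below backs out the mobility coefficient from two inequality values. The numbers are hypothetical and chosen only for round arithmetic; they are not estimates from the SSA data.

```latex
% Hypothetical inequality values, for illustration only.
\begin{align*}
\text{Short-term earnings inequality} &= 0.50, \qquad
\text{Long-term earnings inequality} = 0.45, \\
\text{Mobility} &= 1 - \frac{\text{Long-term earnings inequality}}{\text{Short-term earnings inequality}}
                 = 1 - \frac{0.45}{0.50} = 0.10 .
\end{align*}
```

Under these assumed values, averaging earnings over the longer horizon removes 10% of measured short-term inequality.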

Alternatively, one can define mobility directly as changes or “shocks” in earnings.6 In our framework, such shocks are defined broadly as any deviation from long-term earnings. Those shocks could indeed be real shocks such as unemployment, disability, or an unexpected promotion. Changes could also be the consequence of voluntary choices such as reducing (or increasing) hours of work, voluntarily changing jobs, or obtaining an expected pay raise. Such shocks can be transitory (such as working overtime in response to a temporarily increased demand for an employer’s product, or a short unemployment spell in the construction industry) or permanent (being laid off from a job in a declining industry). In that framework, both long-term inequality and the extent of shocks contribute to shaping short-term inequality:

(2) Short-term earnings inequality = Long-term earnings inequality + Variability in earnings.

Equations (1) and (2) are related by the formula

(3) Variability in earnings = Short-term earnings inequality × Mobility = Long-term earnings inequality × Mobility/(1 − Mobility).

Thus, equation (3) shows that an increase in mobility with no change in long-term inequality is due to an increase in variability in earnings. Conversely, an increase in inequality (either short-term or long-term) with no change in mobility implies an increased variability in earnings. Importantly, our concept of mobility is relative rather than absolute.7

Formally, we consider a situation where a fixed group of individuals $i = 1, \dots, I$ have short-term earnings $z_{it} > 0$ in each period $t = 1, \dots, K$. For example, $t$ can represent a year. We can define long-term earnings for individual $i$ as average earnings across all $K$ periods: $\bar{z}_i = \sum_t z_{it}/K$. We normalize earnings so that average earnings (across individuals) are the same in each period.8 From a vector of individual earnings $z = (z_1, \dots, z_I)$, an inequality index can be defined as $G(z)$, where $G(\cdot)$ is convex in $z$ and homogeneous of degree zero (multiplying all earnings by a given factor does not change inequality). For example, $G(\cdot)$ can be the Gini index or the variance of log earnings. Shorrocks (1978, Theorem 1, p. 381) shows that

$G(\bar{z}) \le \sum_{t=1}^{K} G(z_t)/K,$

where $z_t$ is the vector of earnings in period $t$ and $\bar{z}$ the vector of long-term earnings (the average across the $K$ periods). This inequality result captures the idea that movements in individual earnings up and down the distribution reduce long-term inequality (relative to short-term inequality). Hence we can define a related Shorrocks mobility index $0 \le M \le 1$ as

$1 - M = \dfrac{G(\bar{z})}{\sum_{t=1}^{K} G(z_t)/K},$

which is a formalization of equation (1) above. $M = 0$ if and only if individuals’ incomes (relative to the mean) do not change over time. The central advantage of the Shorrocks mobility index is that it formally links short-term and long-term inequality, which is perhaps the primary motivation for analyzing mobility. The disadvantage of the Shorrocks index is that it is an indirect measure of mobility.

6. See Fields (2007) for an overview of different approaches to measuring income mobility.
7. Our paper focuses exclusively on relative mobility measures, although absolute mobility measures (such as the likelihood of experiencing an earnings increase of at least X% after one year) are also of great interest. Such measures might produce different time series if economic growth or annual inequality changed over time.
8. In our empirical analysis, earnings will be indexed to the nominal average earnings index.
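
The Shorrocks mobility index defined above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: it assumes a small balanced panel stored as a list of per-person earnings lists, takes the Gini index as the inequality measure G (one of the two examples named in the text), and uses hypothetical function names (`gini`, `shorrocks_mobility`) and made-up earnings.

```python
from statistics import mean


def gini(z):
    """Gini index of a list of positive earnings (rank-weighted formula)."""
    zs = sorted(z)
    n = len(zs)
    weighted = sum((rank + 1) * value for rank, value in enumerate(zs))
    return 2.0 * weighted / (n * sum(zs)) - (n + 1.0) / n


def shorrocks_mobility(earnings, index=gini):
    """Shorrocks index M, where 1 - M = G(z_bar) / mean_t G(z_t).

    earnings[i][t] is the earnings of person i in period t (balanced panel).
    """
    n_periods = len(earnings[0])
    # Normalize so that average earnings are the same in every period.
    period_means = [mean(row[t] for row in earnings) for t in range(n_periods)]
    norm = [[row[t] / period_means[t] for t in range(n_periods)]
            for row in earnings]
    # Long-term earnings: each person's average across the K periods.
    long_term = [mean(row) for row in norm]
    # Average of the period-by-period (short-term) inequality indices.
    short_term_ineq = mean(index([row[t] for row in norm])
                           for t in range(n_periods))
    return 1.0 - index(long_term) / short_term_ineq


# Made-up two-period panels, for illustration only.
no_change = [[1.0, 1.0], [3.0, 3.0]]              # nobody moves
full_swap = [[1.0, 3.0], [3.0, 1.0]]              # the two people trade places
partial = [[1.0, 2.0], [2.0, 1.0], [6.0, 6.0]]    # reranking at the bottom only

print(shorrocks_mobility(no_change))   # 0.0
print(shorrocks_mobility(full_swap))   # 1.0
print(shorrocks_mobility(partial))     # approximately 0.1
```

In the third panel only the two lowest earners trade places, so long-term inequality falls only slightly below average short-term inequality and the index is small.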


Therefore, it is also useful to define direct mobility indices such as the rank correlation in earnings from year $t$ to year $t + p$ (or quintile mobility matrices from year $t$ to year $t + p$). Such mobility indices are likely to be closely related to the Shorrocks indices, as reranking from one period to another is precisely what creates a wedge between long-term inequality and (the average of) short-term inequality. The advantage of direct mobility indices is that they are more concrete and transparent than Shorrocks indices. In our paper, we will therefore use both and show that they evolve very similarly over time.

One specific measure of inequality, the variance of log earnings, has received substantial attention in the literature on inequality and mobility. Introducing $y_{it} = \log z_{it}$ and $\bar{y}_i = \sum_t \log z_{it}/K$, we can define deviations in (log) earnings as

$\varepsilon_{it} = y_{it} - \bar{y}_i.$

It is important to note that $\varepsilon_{it}$ may reflect both transitory earnings shocks (such as an i.i.d. process) and permanent earnings shocks (such as a Brownian motion). The deviation $\varepsilon_{it}$ could either be uncertain ex ante from the individual perspective, or predictable.9 The Shorrocks theorem applied to the inequality index variance of log earnings implies that

$\mathrm{var}_i(\bar{y}_i) \le \mathrm{var}_{it}(y_{it}),$

where the variance $\mathrm{var}_{it}(y_{it})$ is taken over both $i = 1, \dots, I$ and $t = 1, \dots, K$. If, for illustration, we make the statistical assumption that $\varepsilon_{it} \perp \bar{y}_i$ and we denote $\mathrm{var}(\varepsilon_{it}) = \sigma_\varepsilon^2$, then we have

$\mathrm{var}_{it}(y_{it}) = \mathrm{var}_i(\bar{y}_i) + \sigma_\varepsilon^2,$

which is a formalization of equation (2) above. The Shorrocks mobility index in that case is

$M = \sigma_\varepsilon^2 / \mathrm{var}_{it}(y_{it}) = \sigma_\varepsilon^2 / \left(\mathrm{var}_i(\bar{y}_i) + \sigma_\varepsilon^2\right).$

This shows that short-term earnings variance can increase because of an increase in long-term earnings variance or an increase in the variance of earnings deviations. Alternatively and equivalently, short-term inequality can increase while long-term inequality remains stable if mobility increases.

9. Uncertainty is important conceptually because individuals facing no credit constraints can fully smooth predictable shocks, whereas uncertain shocks can only be smoothed with insurance. We do not pursue this distinction in our analysis, because we cannot observe the degree of uncertainty in the empirical earnings shocks.

This simple framework can help us understand the findings from the previous literature on earnings mobility in the United States. Rank-based mobility measures (such as year-to-year rank correlation or quintile mobility matrices) are stable over time (Gottschalk 1997), whereas there has been an increase in the variance of transitory earnings (Gottschalk and Moffitt 1994). Such findings can be reconciled if the disparity in permanent earnings has simultaneously widened to keep rank-based mobility of earnings stable.

In the theoretical framework we just described, the same set of individuals are followed across the $K$ short-term periods. In practice, because individuals leave or enter the labor force (or the “commerce and industry” sector we will be focusing on), the set of individuals with positive earnings varies across periods. As the number of periods $K$ becomes large, the sample will become smaller. Therefore, we will mostly consider relatively small values of $K$ such as $K = 3$ or $K = 5$. When a period is a year, that allows us to analyze short-term mobility. When a period is a longer period of time such as twelve consecutive years, with $K = 3$, we cover 36 years, which is almost a full lifetime of work, allowing us to analyze long-term mobility, that is, mobility over a full working life.

Our analysis will focus on the time series of various inequality and mobility statistics. The framework we have considered can be seen as an analysis at a given point in time $s$. We can recompute those statistics for various points in time to create time series.

II.B. Data and Methodology

Social Security Administration Data.
We use primarily data sets constructed in SSA for research and statistical analysis, known as the continuous work history sample (CWHS) system.10 The annual samples are selected based on a fixed subset of digits of (a transformation of) the Social Security number (SSN). The same digits are used every year so that the sample is a balanced panel and can be treated as a random sample of the full population data. We use three main SSA data sets.

(1) The 1% CWHS file contains information about taxable Social Security earnings from 1951 to 2004, basic demographic characteristics such as year of birth, sex, and race, type of work (farm or nonfarm, employment or self-employment), self-employment taxable income, insurance status for the Social Security programs, and several other variables. Because Social Security taxes apply up to a maximum level of annual earnings, however, earnings in this data set are effectively top-coded at the annual cap before 1978. Starting in 1978, the data set also contains information about full compensation derived from the W2 forms, and hence earnings are no longer top-coded. Employment earnings (either FICA employment earnings before 1978 or W2 earnings from 1978 on) are defined as the sum of all wages and salaries, bonuses, and exercised stock options exactly as wage income reported on individual income tax returns.11

(2) The second file is known as the employee–employer file (EE-ER), and we will rely on its longitudinal version (LEED), which covers 1957 to date. Although the sampling approach based on the SSN is the same as the 1% CWHS, individual earnings are reported at the employer level so that there is a record for each employer a worker is employed by in a year. This data set contains demographic characteristics, compensation information subject to top-coding at the employer–employee record level (and with no top code after 1978), and information about the employer, including geographic information and industry at the three-digit (major group and industry group) level. The industry information allows us to control for expansion in coverage over time (see below). Importantly, the LEED (and EE-ER) data set also includes imputations based on quarterly earnings structure from 1957 to 1977, which allows us to handle earnings above the top code (see below).12

(3) Third, we use the so-called 0.1% CWHS file (one-tenth of 1%) that is constructed as a subset of the 1% file but covers 1937–1977. This file is unique in its covering the Great Compression of the 1940s. The 0.1% file contains the same demographic variables as well as quarterly earnings information starting with 1951 (and quarter at which the top code was reached for 1946–1950), thereby extending our ability to deal with top-coding problems (see below).

10. Detailed documentation of these data sets can be found in Panis et al. (2000).
11. FICA earnings include elective employee contributions for pensions (primarily 401(k) contributions), whereas W2 earnings exclude such contributions. However, before 1978, such contributions were almost nonexistent.
12. To our knowledge, the LEED has hardly ever been used in academic publications. Two notable exceptions are Schiller (1977) and Topel and Ward (1992).


Top Coding Issues. From 1937 to 1945, no information above the taxable ceiling is available. From 1946 to 1950, the quarter at which the ceiling is reached is available. From 1951 to 1977, we rely on imputations based on quarterly earnings (up to the quarter at which the annual ceiling is reached). Finally, since 1978, the data are fully uncapped. To our knowledge, the exact quarterly earnings information seems to have been retained only in the 0.1% CWHS sample since 1951. The LEED 1% sample since 1957 contains imputations that are based on quarterly earnings, but the quarterly earnings themselves were not retained in the data available to us. The imputation method is discussed in more detail in Kestenbaum (1976, his method II) and in the Online Appendix. It relies on earnings for quarters when they are observed to impute earnings in quarters that are not observed (when the taxable ceiling is reached after the first quarter). Importantly, this imputation method might not be accurate if individual earnings were not uniform across quarters. We extend the same procedure to 1951–1956 using the 0.1% file and because of the overlap of the 0.1% file and 1% LEED between 1957 and 1977 are able to verify that this is indeed the exact procedure that was applied in the LEED data. For 1946–1950, the imputation procedure (see the Online Appendix and Kestenbaum [1976, his method I]) uses Pareto distributions and preserves the rank order based on the quarter when the taxable maximum was reached. For individuals with earnings above the taxable ceiling (from 1937 to 1945) or who reach the taxable ceiling in the first quarter (from 1946 to 1977), we impute earnings assuming a Pareto distribution above the top code (1937–1945) or four times the top code (1946–1977). The Pareto distribution is calibrated from wage income tax statistics published by the Internal Revenue Service to match the top wage income shares series estimated in Piketty and Saez (2003). 
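The Pareto step of the imputation can be illustrated with a minimal sketch, assuming inverse-CDF sampling from a uniform draw. The function name `impute_pareto`, the shape parameter of 2.0, and the $3,000 cap are hypothetical values chosen for illustration; the paper calibrates the distribution to IRS wage income statistics, and the exact procedure is the one documented in the Online Appendix, not this code.

```python
import random

def impute_pareto(threshold, alpha, u):
    """Return an earnings value above `threshold` from a Pareto tail.

    Inverse-CDF method: if u is uniform on [0, 1), then
    threshold * (1 - u) ** (-1 / alpha) follows a Pareto distribution
    with scale `threshold` and shape `alpha`.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return threshold * (1.0 - u) ** (-1.0 / alpha)

# Hypothetical inputs: a $3,000 cap and a shape parameter of 2.0.
rng = random.Random(0)
cap = 3000.0
imputed = [impute_pareto(cap, 2.0, rng.random()) for _ in range(5)]
```

Drawing a fresh `u` for each person and year, or keeping one `u` per person across years, gives two variants of the same imputation that differ only in how persistent imputed ranks are.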
The number of individuals who were top-coded in the first quarter and whose earnings are imputed based on the Pareto imputation is less than 1% of the sample for virtually all years after 1951. Consequently, high-quality earnings information is available for the bottom 99% of the sample, allowing us to study both inequality and mobility up to the top percentile. From 1937 to 1945, the fraction of workers top-coded (in our sample of interest defined below) increases from 3.6% in 1937 to 19.5% in 1944 and 17.4% in 1945. The share of top-coded observations increases to 32.9% by 1950, but the quarter when a person reached the taxable maximum helps in classifying people into broad income categories. This implies that we cannot study groups smaller than the top percentile from 1951 to 1977 and we cannot study groups smaller than the top quintile from 1937 to 1950.

To assess the sensitivity of our mobility and multiyear inequality estimates with respect to top code imputation, we use two Pareto imputation methods (see the Online Appendix). In the first or main method, the Pareto imputation is based on draws from a uniform distribution that are independent across individuals but also across time periods. As there is persistence in ranking even at the top of the distribution, this method generates an upward bias in mobility within top-coded individuals. In the alternative method, the uniform distribution draws are independent across individuals but fixed over time for a given individual. As there is some mobility in rankings at the top of the distribution, this method generates a downward bias in mobility. We always test that the two methods generate virtually the same series (see Online Appendix Figures A.5 to A.9 for examples).13

Changing Coverage Issues. Initially, Social Security covered only “commerce and industry” employees, defined as most private for-profit sector employees, and excluding farm and domestic employees as well as self-employed workers. Since 1951, there has been an expansion in the workers covered by Social Security and hence included in the data. An important expansion took place in 1951 when self-employed workers and farm and domestic employees were included. This reform also expanded coverage to some government and nonprofit employees (including large parts of the education and health care industries), with coverage increasing significantly further in 1954 and then slowly expanding since then.
We include in our sample only commerce and industry employment earnings in order to focus on a consistent definition of workers. Using SIC classification in the LEED, we define commerce and industry as all SIC codes excluding agriculture, forestry, and fishing (01–09), hospitals (8060–8069), educational services (82), social services (83), religious organizations and nonclassified membership organizations (8660–8699), private households (88), and public administration (91–97).

13. This is not surprising because, starting with 1951, imputations matter for just the top 1% of the sample and mobility measures for the full population are not very sensitive to what happens within the very top group.
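The sector definition can be written as a simple filter. The sketch below assumes industry is available as a four-digit integer SIC code (the text notes that the LEED records industry at the three-digit level, so this is an illustrative simplification), and the function name `is_commerce_and_industry` is hypothetical.

```python
# Two-digit SIC major groups excluded in the text: 01-09, 82, 83, 88, 91-97.
EXCLUDED_MAJOR_GROUPS = set(range(1, 10)) | {82, 83, 88} | set(range(91, 98))
# Four-digit ranges excluded in the text: hospitals and religious or
# nonclassified membership organizations.
EXCLUDED_CODE_RANGES = [(8060, 8069), (8660, 8699)]

def is_commerce_and_industry(sic4):
    """Return True if a four-digit SIC code counts as commerce and industry."""
    major_group = sic4 // 100
    if major_group in EXCLUDED_MAJOR_GROUPS:
        return False
    return not any(lo <= sic4 <= hi for lo, hi in EXCLUDED_CODE_RANGES)
```

For example, a manufacturing code such as 2011 passes the filter, whereas 8062 (within the hospital range) and 8211 (educational services) do not.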


Between 1951 and 1956, we do not have industry information, as the LEED starts in 1957. Therefore, we impute “commerce and industry” classification using 1957–1958 industrial classification as well as discontinuities in covered earnings from 1950 to 1951 (see the Online Appendix for complete details). In 2004, commerce and industry employees are about 70% of all employees, and this proportion has declined only very modestly since 1937. Using only commerce and industry earnings is a limitation for our study for two reasons. First, inequality and mobility within the commerce and industry sector may be different from those in the full population. Second and more important, mobility between the commerce and industry sector and all other sectors is eliminated. Because in recent decades Social Security covers over 95% of earnings, we show in the Online Appendix that our mobility findings for recent decades are robust to including all covered workers. However, we cannot perform such a robustness check for earlier periods when coverage was much less complete. Note also that, throughout the period, the data include immigrant workers only if they have valid SSNs. Sample Selection. For our primary analysis, we are restricting the sample to adult individuals aged 25 to 60 (by January 1 of the corresponding year). This top age restriction allows us to concentrate on the working-age population.14 Second, we consider for our main sample only workers with annual (commerce and industry) employment earnings above a minimum threshold defined as one-fourth of a full year–full time minimum wage in 2004 ($2,575 in 2004), and then indexed by nominal average wage growth for earlier years. For many measures of inequality, such as log-earnings variance, it is necessary to trim the bottom of the earnings distribution. We show in Online Appendix Figures A.2 to A.9 that our results are not sensitive to choosing a higher minimum threshold such as a full year–full time minimum wage. 
We cannot analyze the transition into and out of the labor force satisfactorily using our sample because the SSA data cover only about 70% of employees in the early decades. From now on, we refer to our main sample of interest, namely “commerce and industry” workers aged 25 to 60 with earnings above the indexed minimum threshold (of $2,575 in 2004), as the “core sample.”

14. Kopczuk, Saez, and Song (2007) used a wider age group from 18 to 70 and obtained the same qualitative findings.
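The core-sample rule combines an age window with a wage-indexed earnings floor. The following sketch shows one way to express it; the function names and the index values in `example_index` are hypothetical (they are not SSA figures), and age on January 1 is taken as a given input.

```python
THRESHOLD_2004 = 2575.0  # from the text: threshold in 2004 dollars

def min_threshold(year, avg_wage_index):
    """Scale the 2004 threshold by average wages in `year` relative to 2004."""
    return THRESHOLD_2004 * avg_wage_index[year] / avg_wage_index[2004]

def in_core_sample(year, age_on_jan_1, ci_earnings, avg_wage_index):
    """Aged 25 to 60 with commerce and industry earnings above the threshold."""
    return (25 <= age_on_jan_1 <= 60
            and ci_earnings > min_threshold(year, avg_wage_index))

# Hypothetical index values, for illustration only.
example_index = {1980: 50.0, 2004: 100.0}
```

With these made-up index values the 1980 threshold would be half of the 2004 one, so a worker aged 30 earning $1,300 in 1980 would be in the sample while one earning $1,000 would not.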


[Figure I: plot of the annual Gini coefficient (vertical axis, 0.30 to 0.50) against year (horizontal axis, 1940 to 2000). Series: all workers, men, women.]

FIGURE I Annual Gini Coefficients The figure displays the Gini coefficients from 1937 to 2004 for earnings of individuals in the core sample, men in the core sample, and women in the core sample. The core sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Commerce and industry are defined as all industrial sectors excluding government employees, agriculture, hospitals, educational services, social services, religious and membership organizations, and private households. Self-employment earnings are fully excluded. Estimations are based on the 0.1% CWHS data set for 1937 to 1956, the 1% LEED sample from 1957 to 1977, and the 1% CWHS (matched to W-2 data) from 1978 on. See the Online Appendix for complete details.
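For readers who want to reproduce a Gini series of this kind on their own data, a minimal sketch of the coefficient follows. It uses the standard sorted-rank formula on a plain list of earnings; the function name `gini` is an illustrative choice, and the code is not taken from the paper or its appendix.

```python
def gini(earnings):
    """Gini coefficient of a list of non-negative earnings.

    Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and i running from 1 to n.
    """
    x = sorted(earnings)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one observation and positive total")
    weighted = sum((i + 1) * value for i, value in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Equal earnings give a coefficient of zero, and the coefficient rises toward one as earnings concentrate in fewer hands.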

III. ANNUAL EARNINGS INEQUALITY

Figure I plots the annual Gini coefficient from 1937 to 2004 for the core sample of all workers, and for men and women separately in lighter gray. The Gini series for all workers follows a U-shape over the period, which is consistent with previous work based on decennial Census data (Goldin and Margo 1992), wage income from tax return data for the top of the distribution (Piketty and Saez 2003), and CPS data available since the early 1960s (Katz and Autor 1999). The series displays a sharp decrease of the Gini coefficient from 0.44 in 1938 down to 0.36 in 1953 (the Great Compression) followed by a steady increase since 1953 that accelerates in the 1970s and especially the 1980s. The Gini coefficient surpassed the prewar level in the late 1980s and was highest in 2004 at 0.47.

Our series shows that the Great Compression is indeed the period of most dramatic change in inequality since the late 1930s and that it took place in two steps. The Gini coefficient decreased sharply during the war from 1942 to 1944, rebounded very slightly from 1944 to 1946, and then declined again from 1946 to 1953. Among all workers, the increase in the Gini coefficient over the five decades from 1953 to 2004 is close to linear, which suggests that changes in overall inequality were not limited to an episodic event in the 1980s.

Figure I shows that the series for males and females separately display the same U-shaped evolution over time. Interestingly, the Great Compression as well as the upward trend in inequality is much more pronounced for men than for all workers. This shows that the rise in the Gini coefficient since 1970 cannot be attributed to changes in gender composition of the labor force. The Gini for men shows a dramatic increase from 0.35 in 1979 to 0.43 in 1988, which is consistent with the CPS evidence extensively discussed in Katz and Autor (1999).15 On the other hand, stability of the Gini coefficients for men and for women from the early 1950s through the late 1960s highlights that the overall increase in the Gini coefficient in that period has been driven by a widening of the gender gap in earnings (i.e., the between- rather than within-group component). Strikingly, there is more earnings inequality among women than among men in the 1950s and 1960s, whereas the reverse is true before the Great Compression and since the late 1970s.

Finally, the increase in the Gini coefficient has slowed since the late 1980s in the overall sample. It is interesting to note that a large part of the 3.5-point increase in the Gini from 1990 to 2004 is due to a surge in earnings within the top percentile of the distribution.
The series of Gini coefficients estimated excluding the top percentile increases by less than 2 points since 1990 (see Online Appendix Figure A.3).16 It should also be noted that, since the 1980s, the Gini coefficient has increased faster for men and women separately than for all workers. This has been driven by an increase in the earnings of women relative to men, especially at the top of the distribution, as we shall see.

Most previous work in the labor economics literature has focused on gender-specific measures of inequality. As men and women share a single labor market, it is also valuable to analyze the overall inequality generated in the labor market (in the “commerce and industry” sector in our analysis). Our analysis for all workers and by gender provides clear evidence of the importance of changes in women’s labor market behavior and outcomes for understanding overall changes in inequality, a topic we will return to.

To understand where in the distribution the changes in inequality displayed in Figure I are occurring, Figure II displays the (log) percentile annual earnings ratios P80/P50 (measuring inequality in the upper half of the distribution) and P50/P20 (measuring inequality in the lower half of the distribution). We also depict the series for men and women only separately in lighter gray.17

[Figure II: plot of log percentile ratios (vertical axis, 0.3 to 1.0) against year (horizontal axis, 1940 to 2000). The P50/P20 series appear in the upper part and the P80/P50 series in the lower part. Series: all workers, men, women.]

FIGURE II Percentile Ratios log(P80/P50) and log(P50/P20) Sample is the core sample (commerce and industry employees aged 25 to 60; see Figure I). The figure displays the log of the 50th to 20th percentile earnings ratio (upper part of the figure) and the log of the 80th to 50th percentile earnings ratio (lower part of the figure) among all workers, men only (in lighter gray), and women only (in lighter gray).

15. There is a controversial debate in labor economics about the timing of changes in male wage inequality, due in part to discrepancies across different data sets. For example, Lemieux (2006), using May CPS data, argues that most of the increase in inequality occurs in the 1980s, whereas Autor, Katz, and Kearney (2008), using March CPS data, estimate that inequality starts to increase in the late 1960s. The Social Security data also point to an earlier increase in earnings inequality among males.
16. Hence, results based on survey data such as official Census Bureau inequality statistics, which do not measure the top percentile well, can give an incomplete view of inequality changes even when using global indices such as the Gini coefficient.
17. We choose P80 (instead of the more usual P90) to avoid top-coding issues before 1951 and P20 (instead of the more usual P10) so that our low percentile

EARNINGS INEQUALITY AND MOBILITY IN THE U.S.

107

The P80/P50 series (depicted in the bottom half of the figure) are also U-shaped over the period, with a brief but substantial Great Compression from 1942 to 1947 and a steady increase starting in 1951, which accelerates in the 1970s. Interestingly, P80/P50 is virtually constant from 1985 to 2000, showing that the gains at the top of the distribution occurred above P80. The series for men is similar except that P80/P50 increases sharply in the 1980s and continues to increase in the 1990s.

The P50/P20 series (depicted in the upper half of the figure) display a fairly different time pattern from the P80/P50 series. First, the compression happens primarily in the postwar period from 1946 to 1953. There are large swings in P50/P20 during the war, especially for men, as many young low income earners leave and enter the labor force because of the war, but P50/P20 is virtually the same in 1941 and 1946 or 1947.18 After the end of the Great Compression in 1953, the P50/P20 series for all workers remains fairly stable to the present, alternating periods of increase and decrease. In particular, it decreases smoothly from the mid-1980s to 2000, implying that inequality in the bottom half shrank in the last two decades, although it started increasing after 2000. The series for men only is quite different and displays an overall U shape over time, with a sharper Great Compression that extends well into the postwar period, with an absolute minimum in 1969 followed by a sharp increase up to 1983 and relative stability since then (consistent with recent evidence by Autor, Katz, and Kearney [2008]). For women, the P50/P20 series display a secular and steady fall since World War II.

Table I summarizes the annual earnings inequality trends for all (Panel A), men (Panel B), and women (Panel C) with various inequality measures for selected years (1939, 1960, 1980, and 2004).
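The log percentile ratios discussed here are straightforward to compute from a cross-section of earnings. The sketch below is illustrative only: the helper names are hypothetical, and the linear-interpolation percentile is an assumption, since the text does not say which percentile convention is used.

```python
import math

def percentile(sorted_x, p):
    """p-th percentile (p in 0..100) by linear interpolation between ranks."""
    pos = (len(sorted_x) - 1) * p / 100
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac

def log_percentile_ratios(earnings):
    """Return log(P80/P50), log(P50/P20), and log(P80/P20) for positive earnings."""
    x = sorted(earnings)
    p20, p50, p80 = (percentile(x, p) for p in (20, 50, 80))
    return {
        "p80_p50": math.log(p80 / p50),
        "p50_p20": math.log(p50 / p20),
        "p80_p20": math.log(p80 / p20),
    }
```

Because the ratios are in logs, the upper-half and lower-half measures add up to the P80/P20 measure, which is a convenient consistency check on any tabulated values.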
In addition to the series depicted in the figures, Table I contains the variance of log-earnings, which also displays a U-shaped pattern over the period, as well as the shares of total earnings going to the bottom quintile group (P0–20), the top quintile group (P80–100), and the top percentile group (P99–100). Those last two series also display a U shape over the period. In particular, the top percentile share has almost doubled from 1980 to 2004 in the sample of men only and the sample of women only and accounts for over half of the increase in the top quintile share from 1980 to 2004.

17 (continued). estimate is not too closely driven by the average wage-indexed minimum threshold we have chosen ($2,575 in 2004).
18. In the working paper version (Kopczuk, Saez, and Song 2007), we show that compositional changes during the war are strongly influencing the bottom of the distribution during the early 1940s.

TABLE I
ANNUAL EARNINGS INEQUALITY
(0.1% sample from 1937 to 1956, 1% from 1957 to 2004; number of workers in thousands)

Year   Gini    Variance of    Log percentile ratios        Earnings shares (%)          Average earnings   #Workers
               log earnings   P80/P20  P50/P20  P80/P50    P0–20   P80–100  P99–100     (2004 $)           ('000s)
(1)    (2)     (3)            (4)      (5)      (6)        (7)     (8)      (9)         (10)               (11)

A. All
1939   0.433   0.826          1.43     0.88     0.55       3.64    46.82    9.55        15,806             20,404
1960   0.375   0.681          1.24     0.79     0.46       4.54    41.66    5.92        27,428             35,315
1980   0.408   0.730          1.33     0.76     0.57       4.34    44.98    7.21        35,039             50,129
2004   0.471   0.791          1.39     0.76     0.63       3.91    51.41    12.28       44,052             75,971

B. Men
1939   0.417   0.800          1.32     0.85     0.47       3.82    45.52    9.58        17,918             15,493
1960   0.326   0.533          0.94     0.58     0.35       5.89    38.80    5.55        32,989             24,309
1980   0.366   0.618          1.06     0.64     0.43       5.25    42.02    6.85        44,386             30,564
2004   0.475   0.797          1.34     0.73     0.61       3.92    51.83    13.44       52,955             42,908

C. Women
1939   0.380   0.635          1.36     0.87     0.49       4.49    42.25    6.11        9,145              4,911
1960   0.349   0.570          1.31     0.82     0.50       4.98    39.18    4.05        15,148             11,006
1980   0.354   0.564          1.22     0.74     0.49       5.15    40.38    4.37        20,439             19,566
2004   0.426   0.693          1.34     0.74     0.59       4.45    47.36    8.00        32,499             33,063

Notes. The table displays various annual earnings inequality statistics for selected years, 1939, 1960, 1980, and 2004 for all workers in the core sample (Panel A), men in the core sample (Panel B), and women in the core sample (Panel C). The core sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Commerce and industry are defined as all industrial sectors excluding government employees, agriculture, hospitals, educational services, social services, religious and membership organizations, and private households. Self-employment earnings are fully excluded. Estimates are based on the 0.1% CWHS data set for 1937 to 1956, the 1% LEED sample from 1957 to 1977, and the 1% CWHS from 1978 on. See the Online Appendix for complete details. Columns (2) and (3) report the Gini coefficient and variance of log earnings. Columns (4), (5), and (6) report the percentile log ratios P80/P20, P50/P20, and P80/P50. P80 denotes the 80th percentile, etc. Columns (7), (8), and (9) report the share of total earnings accruing to P0–20 (the bottom quintile), P80–100 (the top quintile), and P99–100 (the top percentile). Column (10) reports average earnings in 2004 dollars using the CPI index (the new CPI-U-RS index is used after 1978). Column (11) reports the number of workers in thousands.

IV. THE EFFECTS OF SHORT-TERM MOBILITY ON EARNINGS INEQUALITY

In this section, we apply our theoretical framework from Section II.A to analyze multiyear inequality and relate it to the annual earnings inequality series analyzed in Section III. We will consider each period to be a year and the longer period to be five years (K = 5).19 We will compare inequality based on annual earnings and earnings averaged over five years. We will then derive the implied Shorrocks mobility indices and decompose annual inequality into permanent and transitory inequality components. We will also examine some direct measures of mobility such as rank correlations.

Figure III plots the Gini coefficient series for earnings averaged over five years20 (numerator of the Shorrocks index) and the five-year average of the Gini coefficients of annual earnings (the denominator of the Shorrocks index). For a given year t, the sample for both the five-year Gini and the annual Ginis is defined as all individuals with “Commerce and Industry” earnings above the minimum threshold in all five years, t − 2, t − 1, t, t + 1, t + 2 (and aged 25 to 60 in the middle year t). We show the average of the five annual Gini coefficients between t − 2 and t + 2 as our measure of the annual Gini coefficient, because it matches the Shorrocks approach. Because the sample is the same for both series, Shorrocks’ theorem implies that the five-year Gini is always smaller than the average of the annual Gini (over the corresponding five years), as indeed displayed in the figure.21 We also display the same series for men only (in lighter gray).

The annual Gini displays the same overall evolution over time as in Figure I. The level is lower, as there is naturally less inequality in the group of individuals with positive earnings for five consecutive years than in the core sample. The Gini coefficient estimated for five-year earnings average follows a very similar evolution over time and is actually extremely close to the annual Gini, especially in recent decades. Interestingly, in this sample, the Great Compression takes place primarily during the war from 1940 to 1944. The war compression is followed by a much more modest decline until 1952. This suggests that the postwar compression observed in annual earnings in Figure I was likely due to entry (of young men in the middle of the distribution) and exit (likely of wartime working women in the lower part of the distribution). Since the early 1950s, the two Gini series are remarkably parallel, and the five-year earnings average Gini displays an accelerated increase during the 1970s and especially the 1980s, as did our annual Gini series. The five-year average earnings Gini series for men show that the Great Compression is concentrated during the war, with little change in the Gini from 1946 to 1970, and a very sharp increase over the next three decades, especially the 1980s.

[Figure III: plot of Gini coefficients (vertical axis, 0.25 to 0.45) against year (horizontal axis, 1940 to 2000). Series: annual earnings, all workers; five-year earnings, all workers; annual earnings, men; five-year earnings, men.]

FIGURE III Gini Coefficients: Annual Earnings vs. Five-Year Earnings The figure displays the Gini coefficients for annual earnings and for earnings averaged over five years from 1939 to 2002. In year t, the sample for both series is defined as all individuals aged 25 to 60 in year t, with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. Earnings are averaged over the five-year span using the average earnings index. The Gini coefficient for annual earnings displayed for year t is the average of the Gini coefficient for annual earnings in years t − 2, . . . , t + 2. The same series are reported in lighter gray for the sample restricted to men only.

19. Series based on three-year averages instead of five-year averages display a very similar time pattern. Increasing K beyond five would reduce sample size substantially, as we require earnings to be above the minimum threshold in each of the five years, as described below.
20. The average is taken after indexing annual earnings by the average wage index.
21. Alternatively, we could have defined the sample as all individuals with earnings above the minimum threshold in any of the five years, t − 2, t − 1, t, t + 1, t + 2. The time pattern of those series is very similar. We prefer to use the positive-earnings in all five years criterion because this is a necessity when analyzing variability in log-earnings, as we do below.


[Figure IV: plot of the Shorrocks Gini mobility index and the rank correlation (vertical axis, 0.5 to 1.0) against year (horizontal axis, 1940 to 2000). Series: Shorrocks index (five-year Gini/annual Gini), all workers; Shorrocks index (five-year Gini/annual Gini), men; rank correlation (after one year), all workers; rank correlation (after one year), men.]

FIGURE IV Short-Term Mobility: Shorrocks’ Index and Rank Correlation The figure displays the Shorrocks mobility coefficient based on annual earnings Gini vs. five-year average earnings Gini and the rank correlation between earnings in year t and year t + 1. The Shorrocks mobility coefficient in year t is defined as the ratio of the five-year earnings (from t − 2 to t + 2) Gini coefficient to the average of the annual earnings Gini for years t − 2, . . . , t + 2 (those two series are displayed in Figure III). The rank correlation in year t is estimated on the sample of individuals present in the core sample (commerce and industry employees aged 25 to 60; see Figure I) in both year t and year t + 1. The same series are reported in lighter gray for the sample restricted to men only.
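The year-to-year rank correlation described in the caption can be illustrated as follows. This is a sketch under stated assumptions: the inputs are the earnings of the same individuals in years t and t + 1, in the same order; ties receive their average rank, which is a choice made here and not something the text specifies; and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_correlation(earn_t, earn_t1):
    """Pearson correlation between within-year ranks in two adjacent years."""
    r0, r1 = average_ranks(earn_t), average_ranks(earn_t1)
    n = len(r0)
    m0, m1 = sum(r0) / n, sum(r1) / n
    cov = sum((a - m0) * (b - m1) for a, b in zip(r0, r1))
    var0 = sum((a - m0) ** 2 for a in r0)
    var1 = sum((b - m1) ** 2 for b in r1)
    return cov / math.sqrt(var0 * var1)
```

A correlation of one means that nobody changes rank between the two years, so lower values indicate more mobility.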

Figure IV displays two measures of mobility (in black for all workers and in lighter gray for men only). The first measure is the Shorrocks measure, defined as the ratio of the five-year Gini to (the average of) the annual Gini. Mobility decreases with the index, and an index equal to one implies no mobility at all. The Shorrocks index series is above 0.9, except for a temporary dip during the war. The increased earnings mobility during the war is likely explained by the large movements into and out of the labor force of men serving in the army and women temporarily replacing men in the civilian labor force. The Shorrocks series have very slightly increased since the early 1970s, from 0.945 to 0.967 in 2004.22 This small change in the direction of reduced mobility further confirms that, as we expected from Figure III, short-term mobility has played a minor role in the surge in annual earnings inequality documented in Figure I.

22. The increase is slightly more pronounced for the sample of men.


The second mobility measure displayed on Figure IV is the straight rank correlation in earnings between year t and year t + 1 (computed in the sample of individuals present in our core sample in both years t and t + 1).23 As with the Shorrocks index, mobility decreases with the rank correlation and a correlation of one implies no year-to-year mobility. The rank mobility series follows the same overall evolution over time as the Shorrocks mobility index: a temporary but sharp dip during the war followed by a slight increase. Over the last two decades, the rank correlation in year-to-year earnings has been very stable and very high, around 0.9. As with the Shorrocks index, the increase in rank correlation is slightly more pronounced for men (than for the full sample) since the late 1960s.

Figure V displays (a) the average of variance of annual log earnings from t − 2 to t + 2 (defined on the stable sample as in the Shorrocks index analysis before), (b) the variance of five-year average log-earnings, $\operatorname{var}\left(\frac{1}{5}\sum_{s=t-2}^{t+2}\log z_{is}\right)$, and (c) the variance of log earnings deviations, estimated as

$$D_t = \operatorname{var}\left(\log z_{it} - \frac{1}{5}\sum_{s=t-2}^{t+2}\log z_{is}\right),$$

where the variance is taken across all individuals i with earnings above the minimum threshold in all five years t − 2, . . . , t + 2. As with the previous two mobility measures, those series, displayed in black for all workers and in lighter gray for men only, show a temporary surge in the variance of transitory earnings during the war, and are stable after 1960. In particular, it is striking that we do not observe an increased earnings variability over the last twenty years, so that all the increase in the log-earnings variance can be attributed to the increase in the variance of permanent (five-year average) log-earnings.
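The three variance series can be computed from a five-year balanced panel along the following lines. This is a minimal sketch: the function name `variance_components` is hypothetical, the population variance (`statistics.pvariance`) is an assumption because the text does not specify a variance estimator, and earnings are assumed to be strictly positive in every year so that logs are defined.

```python
import math
from statistics import pvariance

def variance_components(panel):
    """Return (annual, permanent, transitory) variances of log earnings.

    `panel` holds one list per person with earnings for years t-2 .. t+2.
    """
    logs = [[math.log(z) for z in person] for person in panel]
    k = len(logs[0])
    # (a) average over years of the cross-sectional variance of log earnings
    annual = sum(pvariance([p[s] for p in logs]) for s in range(k)) / k
    # (b) variance of each person's average log earnings over the window
    mean_log = [sum(p) / k for p in logs]
    permanent = pvariance(mean_log)
    # (c) variance of the middle-year deviation from the person's own average
    transitory = pvariance([p[k // 2] - m for p, m in zip(logs, mean_log)])
    return annual, permanent, transitory
```

When each person's earnings are constant over the window, the transitory component is zero and the annual and permanent variances coincide, which is a quick sanity check on any implementation.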
Our results differ somewhat from those of Gottschalk and Moffitt (1994), using PSID data, who found that over one-third of the increase in the variance of log-earnings from the 1970s to the 1980s was due to an increase in transitory earnings (Table 1, row 1, p. 223). We find a smaller increase in transitory earnings in 23. More precisely, within the sample of individuals present in the core sample in both years t and t + 1, we measure the rank rt and rt+1 of each individual in each of the two years, and then compute the correlation between rt and rt+1 across individuals.

[Figure V plot: variance of log(earnings) on the vertical axis (0.0 to 0.7) by year on the horizontal axis (1940 to 2000). Series: annual variance, permanent (five-year) variance, and transitory variance, for all workers and for men.]

FIGURE V Variance of Annual, Permanent, and Transitory (log) Earnings The figure displays the variance of (log) annual earning, the variance of (log) five-year average earnings (permanent variance), and the transitory variance, defined as the variance of the difference between (log) annual earnings and (log) five-year average earnings. In year t, the sample for all three series is defined as all individuals aged 25 to 60 in year t, with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. The (log) annual earnings variance is estimated as the average (across years t − 2, . . . , t + 2) of the variance of (log) annual earnings. The same series are reported in lighter gray for the sample restricted to men only.

the 1970s and we find that this increase reverts in the late 1980s and 1990s so that transitory earnings variance is virtually identical in 1970 and 2000. To be sure, our results could differ from those of Gottschalk and Moffitt (1994) for many reasons, such as measurement error and earnings definition consistency issues in the PSID or the sample definition. Gottschalk and Moffitt focus exclusively on white males, use a different age cutoff, take out age-profile effects, and include earnings from all industrial sectors. Gottschalk and Moffitt also use nine-year earnings periods (instead of five as we do) and include all years with positive annual earnings (instead of requiring positive earnings in every year of the period, as we do).24

24. The recent studies of Dynan, Elmendorf, and Sichel (2008) and Shin and Solon (2008) revisit mobility using PSID data. Shin and Solon (2008) find an increase in mobility in the 1970s followed by stability, which is consistent with our results. Dynan, Elmendorf, and Sichel (2008) find an increase in mobility in recent decades, but they focus on household total income instead of individual earnings.

[Figure VI plots. Panel A, top 1% earnings share, annual vs. five-year: earnings share (%) on the vertical axis (6 to 13); series: annual earnings and five-year average earnings. Panel B, probability of staying in the top 1%: probability (%) on the vertical axis (50 to 100) by year (1980 to 2005); series: after one year, after three years, and after five years.]

FIGURE VI Top Percentile Earnings Share and Mobility In Panel A, the sample in year t is all individuals aged 25 to 60 in year t and with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. In year t, Panel A displays (1) the share of total year t annual earnings accruing to the top 1% earners in that year t and (2) the share of total five-year average earnings (from year t − 2, . . . , t + 2) accruing to the top 1% earners (defined as top 1% in terms of average five-year earnings). Panel B displays the probability of staying in the top 1% annual earnings group after X years (where X = 1, 3, 5). The sample in year t is all individuals present in the core sample (commerce and industry employees aged 25 to 60; see Figure I) in both year t and year t + X. Series in both panels are restricted to 1978 and on because sample has no top code since 1978.
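The two quantities plotted in Figure VI can be sketched in a few lines of Python. This is a minimal illustration, not the procedure applied to the SSA data: the function names, the dictionary layout (worker ID to earnings), and the toy numbers are hypothetical, and the sketch assumes that the top 1% is redefined within the sample of workers observed in both years. Passing each worker's five-year average earnings to `top_share` would give the second series of Panel A.

```python
def top_share(earnings, top_fraction=0.01):
    """Share of total earnings accruing to the top `top_fraction` of earners."""
    ranked = sorted(earnings, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the top group
    return sum(ranked[:k]) / sum(ranked)


def top_group(earnings_by_id, top_fraction=0.01):
    """IDs of the top `top_fraction` earners in a dict id -> earnings."""
    ids = sorted(earnings_by_id, key=earnings_by_id.get, reverse=True)
    k = max(1, round(len(ids) * top_fraction))
    return set(ids[:k])


def prob_staying(year_t, year_tx, top_fraction=0.01):
    """P(in top group in year t+X | in top group in year t).

    Only individuals observed in both years are kept, and the top group is
    redefined within that common sample (an assumption of this sketch).
    """
    common = year_t.keys() & year_tx.keys()
    top_t = top_group({i: year_t[i] for i in common}, top_fraction)
    top_tx = top_group({i: year_tx[i] for i in common}, top_fraction)
    return len(top_t & top_tx) / len(top_t)


# Toy data: 200 workers with distinct earnings; worker 199 later drops out
# of the top group while everyone else keeps the same earnings.
year_t = {i: float(i + 1) for i in range(200)}
year_tx = dict(year_t)
year_tx[199] = 0.5
print(top_share(year_t.values()))
print(prob_staying(year_t, year_tx))
```

With 200 workers the top 1% contains two people, one of whom leaves the group, so the staying probability in the toy example is one half.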

The absence of top-coding since 1978 allows us to zoom in on top earnings, which, as we showed in Table I, have surged in recent decades. Figure VI.A uses the uncapped data since 1978 to plot the share of total annual earnings accruing to the top 1% (those with
earnings above $236,000 in 2004). The top 1% annual earnings share doubles from 6.5% in 1978 to 13% in 2004.25 Figure VI.A then compares the share of earnings of the top 1% based on annual data with shares of the top 1% defined based on earnings averaged at the individual level over five years. The five-year average earnings share series naturally smoothes short-term fluctuations but shows the same time pattern of robust increase as the annual measure.26 This shows that the surge in top earnings is not due to increased mobility at the top. This finding is confirmed in Figure VI.B, which shows the probability of staying in the top 1% earnings group after one, three, and five years (conditional on staying in our core sample) starting in 1978. The one-year probability is between sixty and seventy percent and it shows no overall trend. Therefore, our analysis shows that the dramatic surge in top earnings has not been accompanied by a similar surge in mobility into and out of top earnings groups. Hence, annual earnings concentration measures provide a very good approximation to longer-term earnings concentration measures. In particular, the development of performance-based pay such as bonuses and profits from exercised stock options (both included in our earnings measure) does not seem to have increased mobility dramatically.27 Table II summarizes the key short-term mobility trends for all (Panel A) and men (Panel B) with various mobility measures for selected years (1939, 1960, 1980, and 2002). In sum, the movements in short-term mobility series appear to be much smaller than changes in inequality over time. As a result, changes in short-term mobility have had no significant impact on inequality trends in the United States. Those findings are consistent with previous studies for recent decades based on PSID data (see, e.g., Gottschalk [1997] for a summary) as well as the most recent SSA

25. The closeness of our SSA-based (individual-level) results and the tax return–based (family-level) results of Piketty and Saez (2003) shows that changes in assortative mating played at best a minor role in the surge of family employment earnings at the top of the earnings distribution.

26. Following the framework from Section II.A (applied in this case to the top 1% earnings–share measure of inequality), we have computed such shares (in year t) on the sample of all individuals with minimum earnings in all five years, t − 2, . . . , t + 2. Note also that, in contrast to Shorrocks’ theorem, the series cross because we do not average the annual income share in year t across the five years t − 2, . . . , t + 2.

27. Conversely, the widening of the gap in annual earnings between the top 1% and the rest of the workforce has not affected the likelihood of top-1% earners falling back into the bottom 99%.

TABLE II
FIVE-YEAR AVERAGE EARNINGS INEQUALITY AND SHORT-TERM MOBILITY

Columns: (1) Year; (2) 5-year earnings average Gini; (3) Annual earnings Gini (average t − 2, . . . , t + 2); (4) Rank correlation after 1 year; (5) Permanent (5-year average) log-earnings variance; (6) Annual log-earnings variance (average t − 2, . . . , t + 2); (7) Transitory log-earnings variance; (8) #Workers (’000s).

(1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)

A. All
1939    0.357   0.380   0.859   0.416   0.531   0.085   14,785
1960    0.307   0.324   0.883   0.371   0.447   0.054   26,479
1980    0.347   0.364   0.885   0.426   0.513   0.061   35,500
2002    0.421   0.435   0.897   0.514   0.594   0.058   55,108

B. Men
1939    0.340   0.365   0.853   0.373   0.494   0.091   11,700
1960    0.272   0.291   0.855   0.288   0.362   0.052   19,577
1980    0.310   0.329   0.869   0.337   0.425   0.062   23,190
2002    0.426   0.440   0.898   0.509   0.591   0.061   32,259

Notes. The table displays various measures of 5-year average earnings inequality and short-term mobility measures centered around selected years, 1939, 1960, 1980, and 2002 for all workers (Panel A) and men (Panel B). In all columns (except (4)), the sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) in all five years t − 2, t − 1, t, t + 1, and t + 2, and aged 25 to 60 (by January 1 of year t). Column (2) reports the Gini coefficients based on average earnings from year t − 2 to year t + 2 (averages are computed using indexed wages). Column (3) reports the average across years t − 2, . . . , t + 2 of the Gini coefficients of annual earnings. Column (4) reports the rank correlation between annual earnings in year t and annual earnings in year t + 1 in the sample of workers in the core sample (see Table I footnote for the definition) in both years t and t + 1. Column (5) reports the variance of average log-earnings from year t − 2 to year t + 2. Column (6) reports the average across years t − 2, . . . , t + 2 of the variance of annual log-earnings. Column (7) reports the variance of the difference between log earnings in year t and the average of log earnings from year t − 2 to t + 2. Column (8) reports the number of workers in thousands.
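To make the difference between columns (2) and (3) concrete, the sketch below computes both on a toy panel: the Gini coefficient of multi-year average earnings and the average of the annual Gini coefficients. It is only an illustration under simplifying assumptions: the function names are hypothetical, wage indexation is ignored, and the Gini coefficient uses the standard discrete formula on sorted values rather than any estimator specific to this paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (discrete population formula)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    # G = 2 * sum_i(i * x_i) / (n * sum_i x_i) - (n + 1) / n, with i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_columns(panel):
    """Return (Gini of multi-year average earnings, average of annual Ginis).

    panel: one list of annual earnings per individual, all of equal length.
    Wage indexation is ignored here for simplicity (an assumption).
    """
    n_years = len(panel[0])
    averages = [sum(person) / n_years for person in panel]
    annual = [gini([person[s] for person in panel]) for s in range(n_years)]
    return gini(averages), sum(annual) / n_years


# Two workers who swap places: unequal in every year, equal on average.
print(gini_columns([[1.0, 3.0], [3.0, 1.0]]))
```

In this toy panel all annual inequality is transitory, so the Gini coefficient of average earnings is zero while the average annual Gini coefficient is positive. In Table II the two columns are much closer, which is what limited short-term mobility implies.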

data–based analysis of the Congressional Budget Office (2007)28 and the tax return–based analysis of Carroll, Joulfaian, and Rider (2007). They are more difficult to reconcile, however, with the findings of Hungerford (1993) and especially Hacker (2006), who find great increases in family income variability in recent decades using PSID data. Our finding of stable transitory earnings variance is also at odds with the findings of Gottschalk and Moffitt (1994), who decompose transitory and permanent variance in log-earnings using PSID data and show an increase in both components. Our decomposition using SSA data shows that only the variance of the relatively permanent component of earnings has increased in recent decades.

28. The CBO study focuses on probabilities of large earnings increases (or drops).

V. LONG-TERM MOBILITY AND LIFETIME INEQUALITY

The very long span of our data allows us to estimate long-term mobility. Such mobility measures go beyond the issue of transitory
earnings analyzed above and instead describe mobility across a full working life. Such estimates have not yet been produced for the United States in any systematic way because of the lack of panel data with large sample size and covering a long time period.

V.A. Unconditional Long-Term Inequality and Mobility

We begin with the simplest extension of our previous analysis to a longer horizon. In the context of the theoretical framework from Section II.A, we now assume that a period is eleven consecutive years. We define the “core long-term sample” in year t as all individuals aged 25–60 in year t with average earnings (using the standard wage indexation) from year t − 5 to year t + 5 above the minimum threshold. Hence, our sample includes individuals with zeros in some years as long as average earnings are above the threshold.29 Figure VII displays the Gini coefficients for all workers, and for men and women separately based on those eleven-year average earnings from 1942 to 1999. The overall picture is actually strikingly similar to our annual Figure I. The Gini coefficient series for all workers displays an overall U shape with a Great Compression from 1942 to 1953 and an absolute minimum in 1953, followed by a steady increase that accelerates in the 1970s and 1980s and slows down in the 1990s. The U-shaped evolution over time is also much more pronounced for men than for women and shows that, for men, the inequality increase was concentrated in the 1970s and 1980s.30

29. This allows us to analyze large and representative samples as the number of individuals with positive “commerce and industry” earnings in eleven consecutive years is only between 35% and 50% of the core annual samples.

30. We show in Online Appendix Figures A.8 and A.9 that these results are robust to using a higher minimum threshold.

After exploring base inequality over those eleven-year spells, we turn to long-term mobility. Figure VIII displays the rank correlation between the eleven-year earnings spell centered in year t and the eleven-year earnings spell after T years (i.e., centered in year t + T ) in the same sample of individuals present in the “long-term core sample” in both year t and year t + T . The figure presents such correlations for three choices of T : ten years, fifteen years, and twenty years. Given our 25–60 age restriction (which applies in both year t and year t + T ), for T = 20, the sample in year t is aged 25 to 40 (and the sample in year t + 20 is aged 45 to 60). Thus, this measure captures mobility from early career to late career. The figure also displays the same series for men only

[Figure VII plot: Gini coefficient on the vertical axis (0.40 to 0.50) by year, middle of the eleven-year span, on the horizontal axis (1940 to 2000). Series: all workers, men, and women.]

FIGURE VII Long-Term Earnings Gini Coefficients The figure displays the Gini coefficients from 1942 to 1999 for eleven-year average earnings for all workers, men only, and women only. The sample in year t is defined as all employees aged 25 to 60 in year t, alive in all years t − 5 to t + 5, and with average commerce and industry earnings (averaged using the average wage index) from year t − 5 to t + 5 above the minimum threshold. Gini coefficient in year t is based on average (indexed) earnings across the eleven-year span from year t − 5 to t + 5.

in lighter gray, in which case rank is defined within the sample of men. Three points are worth noting. First, the correlation is unsurprisingly lower as T increases, but it is striking to note that even after twenty years, the correlation is still substantial (in the vicinity of .5). Second, the series for all workers shows that rank correlation has actually significantly decreased over time: for example, the rank correlation between 1950s and 1970s earnings was around .57, but it is only .49 between 1970s and 1990s earnings. This shows that long-term mobility has increased significantly over the last five decades. This result stands in contrast to our short-term mobility results displaying substantial stability. Third, however, Figure VIII shows that this increase in long-term mobility disappears in the sample of men. The series for men displays a slight decrease in rank correlation in the first part of the period followed by an increase in the last part of the period. On net, the series for men displays almost no change in rank correlation and hence no change in long-term mobility over the full period.
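The long-term rank correlation discussed above can be illustrated with a short Python sketch. It is a hedged example rather than the implementation used for Figure VIII: the function names, the dictionary layout (worker ID to eleven-year average earnings), the toy values, and the treatment of ties (average ranks) are assumptions.

```python
from statistics import fmean


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_correlation(spell_t, spell_later):
    """Rank correlation between two earnings spells.

    spell_t, spell_later: dicts id -> eleven-year average earnings centered
    on year t and on year t + T. Only individuals present in both spells
    are used, and ranks are computed within that common sample.
    """
    common = sorted(spell_t.keys() & spell_later.keys())
    ranks_t = average_ranks([spell_t[i] for i in common])
    ranks_later = average_ranks([spell_later[i] for i in common])
    return pearson(ranks_t, ranks_later)


# Toy spells (made-up values): worker "e" is absent from the first spell.
spell_1956 = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}
spell_1976 = {"a": 15.0, "b": 45.0, "c": 35.0, "d": 80.0, "e": 5.0}
print(rank_correlation(spell_1956, spell_1976))
```

Restricting the input dictionaries to men before calling `rank_correlation` would correspond to ranking within the sample of men only.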

[Figure VIII plot: rank correlation on the vertical axis (0.4 to 0.8) by year, middle of the initial eleven-year span, on the horizontal axis (1950 to 1990). Series: after ten years, after fifteen years, and after twenty years, each for all workers and for men.]

FIGURE VIII Long-Term Mobility: Rank Correlation in Eleven-Year Earnings Spans The figure displays in year t the rank correlation between eleven-year average earnings centered around year t and eleven-year average earnings centered around year t + X, where X = ten, fifteen, twenty. The sample is defined as all individuals aged 25 to 60 in year t and t + X, with average eleven-year earnings around years t and t + X above the minimum threshold. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1. The same series are reported in lighter gray for the sample restricted to men only (in which case, rank is estimated within the sample of men only).
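The smoothing rule described in the caption (weight 0.5 on year t and 0.25 on each neighbor) can be written as follows. This is a minimal sketch: the function name, the `should_smooth` hook for restricting smoothing to certain years, and the choice to leave years without both neighbors unsmoothed are assumptions, since the caption does not say how endpoints are handled.

```python
def smooth_three_year(series, should_smooth=lambda year: True):
    """Weighted three-year moving average with weights 0.25, 0.5, 0.25.

    series: dict year -> value. `should_smooth` selects the years to smooth
    (hypothetical hook). Years lacking a neighbor on either side are
    returned unchanged, which is an assumption of this sketch.
    """
    smoothed = {}
    for year, value in series.items():
        before, after = series.get(year - 1), series.get(year + 1)
        if should_smooth(year) and before is not None and after is not None:
            smoothed[year] = 0.25 * before + 0.5 * value + 0.25 * after
        else:
            smoothed[year] = value
    return smoothed


# Toy series (made-up values).
raw = {1950: 0.4, 1951: 0.8, 1952: 0.4, 1953: 0.6}
print(smooth_three_year(raw))
print(smooth_three_year(raw, should_smooth=lambda year: year < 1952))
```

The second call shows how smoothing could be limited to early years only, in the spirit of applying it to series that include earnings before 1957.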

V.B. Cohort-Based Long-Term Inequality and Mobility

The analysis so far ignored changes in the age structure of the population as well as changes in the wage profiles over a career. We turn to cohort-level analysis to control for those effects. In principle, we could control for age (as well as other demographic changes) using a regression framework. In this paper, we focus exclusively on series without controls because they are more transparent, easier to interpret, and less affected by imputation issues. We defer a more comprehensive structural analysis of earnings processes to future work.31

31. An important strand of the literature on income mobility has developed covariance structure models to estimate such earnings processes. The estimates of such models are often difficult to interpret and sensitive to the specification (see, e.g., Baker and Solon [2003]). As a result, many recent contributions in the mobility literature have also focused on simple measures without using a complex framework (see, e.g., Congressional Budget Office [2007] and in particular the discussion in Shin and Solon [2008]).

We divide working lifetimes from age 25 to 60 into three stages: Early career is defined as from the calendar year the

[Figure IX plot: Gini coefficient on the vertical axis (0.30 to 0.55) by year of birth on the horizontal axis (1900 to 1960). Series: early career (age 25 to 36), mid-career (age 37 to 48), and late career (age 49 to 60); men only in lighter gray.]

FIGURE IX Long-Term Earnings Gini Coefficients by Birth Cohort Sample is career sample defined as follows for each career stage and birth cohort: all employees with average commerce and industry earnings (using average wage index) over the twelve-year career stage above the minimum threshold ($2,575 in 2004 and indexed on average wage for earlier years). Note that earnings can be zero for some years. Early career is from age 25 to 36, middle career is from age 37 to 48, late career is from age 49 to 60. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1.
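The career-stage definitions in the caption translate directly into calendar-year windows for each birth cohort. The sketch below is illustrative only: the names `CAREER_STAGES`, `career_years`, and `stage_average` are hypothetical, and wage indexation is ignored, so the averages are in nominal terms.

```python
# Age ranges of the three twelve-year career stages.
CAREER_STAGES = {"early": (25, 36), "mid": (37, 48), "late": (49, 60)}


def career_years(birth_year, stage):
    """First and last calendar year of a career stage for a birth cohort."""
    first_age, last_age = CAREER_STAGES[stage]
    return birth_year + first_age, birth_year + last_age


def stage_average(earnings_by_year, birth_year, stage):
    """Average earnings over the twelve years of a stage.

    earnings_by_year: dict calendar year -> earnings. Years with no record
    count as zero, so the average includes zeros. Wage indexation is
    ignored here (an assumption of this sketch).
    """
    first, last = career_years(birth_year, stage)
    years = range(first, last + 1)
    return sum(earnings_by_year.get(y, 0.0) for y in years) / len(years)


# A worker born in 1944 with earnings recorded in only a few years.
record = {1969: 12000.0, 1975: 12000.0, 1990: 50000.0}
for stage in CAREER_STAGES:
    print(stage, career_years(1944, stage), stage_average(record, 1944, stage))
```

A worker would enter the sample for a given stage when the stage average is above the minimum threshold, even if earnings are zero in some of the twelve years.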

person reaches 25 to the calendar year the person reaches 36. Middle and later careers are defined similarly from age 37 to 48 and age 49 to 60, respectively. For example, for a person born in 1944, the early career is calendar years 1969–1980, the middle career is 1981–1992, and the late career is 1993–2004. For a given year-of-birth cohort, we define the “core early career sample” as all individuals with average “commerce and industry” earnings over the twelve years of the early career stage above the minimum threshold (including zeros and using again the standard wage indexation). The “core mid-career” and “core late career” samples are defined similarly for each birth cohort. The earnings in early, mid-, and late career are defined as average “commerce and industry” earnings during the corresponding stage (always using the average wage index).

Figure IX reports the Gini coefficient series by year of birth for early, mid-, and late career. The Gini coefficients for men only are also displayed in lighter gray. The cohort-based Gini coefficients are consistent with our previous findings and display a U shape over the full period. Three results are notable. First, there is much more inequality in late career than in middle career, and in middle career than in early career, showing that long-term inequality fans out over the course of a working life. Second, the Gini series show that long-term inequality has been stable for the baby-boom cohorts born after 1945 in the sample of all workers (we can observe only early- and mid-career inequality for those cohorts, as their late-career earnings are not completed by 2004). Those results are striking in light of our previous results showing a worsening of inequality in annual and five-year average earnings. Third, however, the Gini series for men only show that inequality has increased substantially across baby-boom cohorts born after 1945. This sharp contrast between series for all workers versus men only reinforces our previous findings that gender effects play an important role in shaping the trends in overall inequality. We also find that cohort-based rank mobility measures display stability or even slight decreases over the last five decades in the full sample, but that rank mobility has decreased substantially in the sample of men (figure omitted to save space). This confirms that the evolution of long-term mobility is heavily influenced by gender effects, to which we now turn.

V.C. The Role of Gender Gaps in Long-Term Inequality and Mobility

As we saw, there are striking differences in the long-term inequality and mobility series for all workers vs. for men only: Long-term inequality has increased much less in the sample of all workers than in the sample of men only. Long-term mobility has increased over the last four decades in the sample of all workers, but not in the sample of men only. Such differences can be explained by the reduction in the gender gap that has taken place over the period.
Figure X plots the fraction of women in our core sample and in various upper earnings groups: the fourth quintile group (P60–80), the ninth decile group (P80–90), the top decile group (P90–100), and the top percentile group (P99–100). As adult women aged 25 to 60 are about half of the adult population aged 25 to 60, with no gender differences in earnings, those fractions should be approximately 0.5. Those representation indices with no adjustment capture the total realized earnings gap including labor

[Figure X plot: fraction of women in each group on the vertical axis (0.0 to 0.5) by year on the horizontal axis (1940 to 2000). Series: all workers, P60–80, P80–90, P90–100, and P99–100.]

FIGURE X Gender Gap in Upper Earnings Groups Sample is core sample (commerce and industry employees aged 25 to 60; see Figure I). The figure displays the fraction of women in various groups. P60–80 denotes the fourth quintile group from percentile 60 to percentile 80, P90–100 denotes the top 10%, etc. Because of top-coding in the micro data, estimates from 1943 to 1950 for P80–90 and P90–100 are estimated using published tabulations in Social Security Administration (1937–1952, 1967) and reported in lighter gray.

supply decisions.32 We use those representation indices instead of the traditional ratio of mean (or median) female earnings to male earnings because such representation indices remain meaningful in the presence of differential changes in labor force participation or in the wage structure across genders, and we do not have covariates to control for such changes, as is done in survey data (see, e.g., Blau, Ferber, and Winkler [2006]). Two elements in Figure X are worth noting. First, the fraction of women in the core sample of commerce and industry workers has increased from around 23% in 1937 to about 44% in 2004. World War II generated a temporary surge in women’s labor force participation, two-thirds of which was reversed immediately after the war.33 Women’s labor force participation has been steadily and continuously increasing since the mid-1950s and has been stable at around 43%–44% since 1990.

32. As a result, they combine not only the traditional wage gap between males and females but also the labor force participation gap (including the decision to work in the commerce and industry sector rather than other sectors or self-employment).

33. This is consistent with the analysis of Goldin (1991), who uses unique micro survey data covering women’s workforce history from 1940 to 1951.
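The representation indices in Figure X are the share of women among workers who fall in a given percentile range of the earnings distribution of all workers. A minimal Python sketch of that calculation follows. The function name, the tuple layout `(earnings, is_woman)`, the percentile cutoff convention, and the toy data are assumptions for illustration.

```python
def fraction_women(workers, lower_pct, upper_pct):
    """Share of women among workers ranked between two percentiles.

    workers: list of (earnings, is_woman) tuples for the whole sample.
    The group runs from percentile `lower_pct` to `upper_pct` of the
    earnings ranking of all workers (integer percentiles; the cutoff
    convention is an assumption of this sketch).
    """
    ranked = sorted(workers, key=lambda worker: worker[0])
    n = len(ranked)
    group = ranked[n * lower_pct // 100 : n * upper_pct // 100]
    return sum(1 for _, is_woman in group if is_woman) / len(group)


# Toy sample of ten workers earning 1..10; women earn 1-5 and 9.
sample = [(float(e), e <= 5 or e == 9) for e in range(1, 11)]
print(fraction_women(sample, 0, 100))    # whole sample
print(fraction_women(sample, 60, 80))    # fourth quintile group
print(fraction_women(sample, 80, 100))   # top quintile group
```

With no gender differences in earnings or participation, each of these fractions would be close to the share of women in the population, which is the benchmark of roughly 0.5 mentioned in the text.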

Second, Figure X shows that the representation of women in upper earnings groups has increased significantly over the last four decades and in a staggered time pattern across upper earnings groups.34 For example, the fraction of women in P60–80 starts to increase in 1966 from around 8% and reaches about 34% in the early 1990s and has remained about stable since then. The fraction of women in the top percentile (P99– 100) does not really start to increase significantly before 1980. It grows from around 2% in 1980 to almost 14% in 2004 and is still quickly increasing. Those results show that the representation of women in top earnings groups has increased substantially over the last three to four decades. They also suggest that economic progress of women is likely to impact measures of upward mobility significantly, as many women are likely to move up the earnings distribution over their lifetimes. Indeed, we have found that such gender effects are strongest in upward mobility series such as the probability of moving from the bottom two quintile groups (those earning less than $25,500 in 2004) to the top quintile group (those earning over $59,000 in 2004) over a lifetime. Figure XI displays such upward mobility series, defined as the probability of moving from the bottom two quintile groups to the top quintile group after twenty years (conditional on being in the “long-term core sample” in both year t and year t + 20) for all workers, men, and women.35 The figure shows striking heterogeneity across groups. First, men have much higher levels of upward mobility than women. Thus, in addition to the annual earnings gap we documented, there is an upward mobility gap as well across groups. Second, the upward mobility gap has also been closing over time: the probability of upward mobility among men has been stable overall since World War II, with a slight increase up to the 1960s and declines after the 1970s. 
In contrast, the probability of upward mobility of women has continuously increased from a very low level of less than 1% in the 1950s to about 7% in the 1980s. The increase in upward mobility for women compensates for the stagnation or slight decline in mobility for men, so that upward mobility among

34. There was a surge in women in P60–80 during World War II, but this was entirely reversed by 1948. Strikingly, women were better represented in upper groups in the late 1930s than in the 1950s.

35. Note that quintile groups are always defined based on the sample of all workers, including both male and female workers.

[Figure XI plot: probability of moving from P0–40 to P80–100 (%) after twenty years on the vertical axis (0 to 10) by year, middle of the initial eleven-year span, on the horizontal axis (1950 to 1980). Series: all, men, and women.]

FIGURE XI Long-Term Upward Mobility: Gender Effects The figure displays in year t the probability of moving to the top quintile group (P80–100) for eleven-year average earnings centered around year t + 20 conditional on having eleven-year average earnings centered around year t in the bottom two quintile groups (P0–40). The sample is defined as all individuals aged 25 to 60 in year t and t + 20, with average eleven-year “commerce and industry” earnings around years t and t + 20 above the minimum threshold. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1. The series are reported for all workers, men only, and women only. In all three cases, quintile groups are defined based on the sample of all workers.
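The upward mobility measure defined in the caption can be sketched as a conditional frequency. The example below is a hedged illustration, not the code behind Figure XI: the function name, the dictionary layout, the percentile cutoff convention, and the toy data are hypothetical. Following the caption, quintile groups are defined on all workers present in both spells, and a subgroup (for example women) only restricts whose transitions are counted.

```python
def upward_mobility(spell_t, spell_t20, subgroup=None):
    """P(top quintile in spell t+20 | bottom two quintiles in spell t).

    spell_t, spell_t20: dicts id -> eleven-year average earnings.
    Quintile groups are defined on all individuals present in both
    spells; `subgroup` (a set of ids) restricts who is counted.
    """
    common = sorted(spell_t.keys() & spell_t20.keys())
    n = len(common)
    by_start = sorted(common, key=lambda i: spell_t[i])
    by_end = sorted(common, key=lambda i: spell_t20[i])
    bottom_two_quintiles = set(by_start[: n * 40 // 100])  # P0-40 in spell t
    top_quintile = set(by_end[n * 80 // 100 :])            # P80-100 in spell t+20
    starters = bottom_two_quintiles
    if subgroup is not None:
        starters = starters & set(subgroup)
    if not starters:
        return float("nan")  # nobody from the subgroup starts in P0-40
    return len(starters & top_quintile) / len(starters)


# Toy data: ten workers; worker 0 moves from the bottom to the top.
spell_t = {i: float(i + 1) for i in range(10)}
spell_t20 = dict(spell_t)
spell_t20[0] = 100.0
women = {0, 1}
men = set(range(2, 10))
print(upward_mobility(spell_t, spell_t20))
print(upward_mobility(spell_t, spell_t20, subgroup=women))
print(upward_mobility(spell_t, spell_t20, subgroup=men))
```

Because the cutoffs come from the pooled sample, the probabilities for men and women are directly comparable with the series for all workers.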

all workers is slightly increasing.36 Figure XI also suggests that the gains in female annual earnings we documented above were in part due to earnings gains of women already in the labor force rather than entirely due to the entry of new cohorts of women with higher earnings. Such gender differential results are robust to conditioning on birth cohort, as series of early- to late-career upward mobility display a very similar evolution over time (see Online Appendix Figure A.10). Hence, our upward mobility results show that the economic progress of women since the 1960s has had a large impact on long-term mobility series among all U.S. workers.

36. It is conceivable that upward mobility is lower for women because even within P0–40, they are more likely to be in the bottom half of P0–40 than men. Kopczuk, Saez, and Song (2007) show that controlling for those differences leaves the series virtually unchanged. Therefore, controlling for base earnings does not affect our results.

Table III summarizes the long-term inequality and mobility results for all (Panel A), men (Panel B), and women (Panel C) by

TABLE III
LONG-TERM INEQUALITY AND MOBILITY

Columns: (1) Year; (2) 11-year earnings average Gini; (3) Rank correlation after 20 years; (4) Upward mobility after 20 years; (5) #Workers (’000s).

(1)     (2)     (3)     (4)     (5)

A. All
1956    0.437   0.572   0.037   42,753
1978    0.477   0.494   0.053   61,828
1999    0.508                   94,930

B. Men
1956    0.376   0.465   0.084   27,952
1978    0.429   0.458   0.071   37,187
1999    0.506                   52,761

C. Women
1956    0.410   0.361   0.008   14,801
1978    0.423   0.358   0.041   24,641
1999    0.459                   42,169

Notes. The table displays various measures of eleven-year average earnings inequality and long-term mobility centered around selected years, 1956, 1978, and 1999, for all workers (Panel A), men (Panel B), and women (Panel C). The sample in year t is defined as all employees with commerce and industry earnings averaged across the eleven-year span from t − 5 to t + 5 above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Column (2) reports the Gini coefficients for those eleven-year earnings averages. Column (3) reports the rank correlation between eleven-year average earnings centered around year t and eleven-year average earnings centered around year t + 20 in the sample of workers (1) aged between 25 and 60 in both years t and t + 20, and (2) with eleven-year average earnings above the minimum threshold in both earnings spans t − 5 to t + 5 and t + 15 to t + 25. Column (4) reports the probability of moving to the top quintile group (P80–100) for eleven-year average earnings centered around year t + 20 conditional on having eleven-year average earnings centered around year t in the bottom two quintile groups (P0–40). The sample is the same as in column (3). Column (5) reports the number of workers in thousands.

reporting measures for selected eleven-year spans (1950–1960, 1973–1983, and 1994–2004).

VI. CONCLUSIONS

Our paper has used U.S. Social Security earnings administrative data to construct series of inequality and mobility in the United States since 1937. The analysis of these data has allowed us to start exploring the evolution of mobility and inequality over a lifetime as well as to complement the more standard analysis of annual inequality and short-term mobility in several ways. We found that changes in short-term mobility have not substantially affected the evolution of inequality, so that annual snapshots of the distribution provide a good approximation of the evolution of the longer-term measures of inequality. In particular, we find that increases in annual earnings inequality are driven almost entirely by increases in permanent earnings inequality, with much more modest changes in the variability of transitory earnings.

However, our key finding is that although the overall measures of mobility are fairly stable, they hide heterogeneity by gender groups. Inequality and mobility among male workers have worsened along almost any dimension since the 1950s: our series display sharp increases in annual earnings inequality, slight reductions in short-term mobility, and large increases in long-term inequality with slight reduction or stability of long-term mobility. Against those developments stand the very large earning gains achieved by women since the 1950s, due to increases in labor force attachment as well as increases in earnings conditional on working. Those gains have been so great that they have substantially reduced long-term inequality in recent decades among all workers, and actually almost exactly compensate for the increase in inequality for males.

COLUMBIA UNIVERSITY AND NATIONAL BUREAU OF ECONOMIC RESEARCH
UNIVERSITY OF CALIFORNIA BERKELEY AND NATIONAL BUREAU OF ECONOMIC RESEARCH
SOCIAL SECURITY ADMINISTRATION


THE ROLE OF THE STRUCTURAL TRANSFORMATION IN AGGREGATE PRODUCTIVITY∗

MARGARIDA DUARTE AND DIEGO RESTUCCIA

We investigate the role of sectoral labor productivity in explaining the process of structural transformation (the secular reallocation of labor across sectors) and the time path of aggregate productivity across countries. We measure sectoral labor productivity across countries using a model of the structural transformation. Productivity differences across countries are large in agriculture and services and smaller in manufacturing. Over time, productivity gaps have been substantially reduced in agriculture and industry but not nearly as much in services. These sectoral productivity patterns generate implications in the model that are broadly consistent with the cross-country data. We find that productivity catch-up in industry explains about 50% of the gains in aggregate productivity across countries, whereas low productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline observed across countries.

I. INTRODUCTION

It is a well-known observation that over the last fifty years countries have experienced remarkably different paths of economic performance.1 Looking at the behavior of GDP per hour in individual countries relative to that in the United States, we find experiences of sustained catch-up, catch-up followed by a slowdown, stagnation, and even decline. (See Figure I for some illustrative examples.2) Consider, for instance, the experience of Ireland. Between 1960 and 2004, GDP per hour in Ireland relative to that of the United States rose from about 35% to 75%.3 Spain also experienced a period of rapid catch-up to the United States from 1960 to around 1990, a period during which relative GDP per hour rose from about 35% to 80%. Around 1990, however, this process slowed down dramatically and relative GDP per hour in Spain stagnated and later declined. Another remarkable growth experience is that of New Zealand, where GDP per hour fell from about 70% to 60% of that of the United States between 1970 and 2004.

∗ We thank Robert Barro, three anonymous referees, and Francesco Caselli for very useful and detailed comments. We also thank Tasso Adamopoulos, John Coleman, Mike Dotsey, Gary Hansen, Gueorgui Kambourov, Andrés Rodríguez-Clare, Richard Rogerson, Marcelo Veracierto, Xiaodong Zhu, and seminar participants at several conferences and institutions for comments and suggestions. Andrea Waddle provided excellent research assistance. All errors are our own. We gratefully acknowledge support from the Connaught Fund at the University of Toronto (Duarte) and the Social Sciences and Humanities Research Council of Canada (Restuccia).
1. See Chari, Kehoe, and McGrattan (1996), Jones (1997), Prescott (2002), and Duarte and Restuccia (2006), among many others.
2. We use GDP per hour as our measure of economic performance. Throughout the paper we refer to labor productivity, output per hour, and GDP per hour interchangeably.
3. All numbers reported refer to data trended using the Hodrick–Prescott filter. See Section II for details.
© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010

FIGURE I. Relative GDP per Hour: Some Countries. GDP per hour in each country relative to that of the United States.

Along their modern paths of development, countries undergo a process of structural transformation by which labor is reallocated among agriculture, industry, and services. Over the last fifty years many countries have experienced substantial amounts of labor reallocation across sectors. For instance, from 1960 to 2004 the share of hours in agriculture in Spain fell from 44% to 6%, while the share of hours in services rose from 25% to 64%. In about the same period, the share of hours in agriculture in Belgium fell from 7% to 2%, while the share in services rose from 43% to 72%.

In this paper we study the behavior of GDP per hour over time from the perspective of sectoral productivity and the structural transformation.4 Does a sectoral analysis contribute to the understanding of aggregate productivity paths? At a qualitative level the answer to this question is clearly yes. Because aggregate labor productivity is the sum of labor productivity across sectors weighted by the share of hours in each sector, the structural transformation matters for aggregate productivity. At a quantitative level the answer depends on whether there are substantial differences in sectoral labor productivity across countries.

4. See Baumol (1967) for a discussion of the implications of structural change on aggregate productivity growth.

Our approach in this paper is to first develop a simple model of the structural transformation that is calibrated to the growth experience of the United States. We then use the model to measure sectoral labor productivity differences across countries at a point in time. These measures, together with data on growth in sectoral labor productivity, imply time paths of sectoral labor productivity for each country. We use these measures of sectoral productivity in the model to assess their quantitative effect on labor reallocation and aggregate productivity outcomes across countries.

We find that there are large and systematic differences in sectoral labor productivity across countries. In particular, differences in labor productivity levels between rich and poor countries are larger in agriculture and services than in manufacturing. Moreover, over time, productivity gaps have been substantially reduced in agriculture and industry but not nearly as much in services. To illustrate the implications of these sectoral differences for aggregate productivity, imagine that productivity gaps remain constant as countries undergo the structural transformation. Then as developing countries reallocate labor from agriculture to manufacturing, aggregate productivity can catch up as labor is reallocated from a low–relative productivity sector to a high–relative productivity sector.
Countries further along the structural transformation can slow down, stagnate, and decline as labor is reallocated from industry (a high–relative productivity sector) to services (a low–relative productivity sector). When the time series of sectoral productivity are fed into the model of the structural transformation, we find that high growth in labor productivity in industry relative to that of the United States explains about 50% of the catch-up in relative aggregate productivity across countries. Although there is substantial catch-up in agricultural productivity, we show that this factor contributes little to aggregate productivity gains in our sample countries. In addition, we show that low relative productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline in relative aggregate productivity observed across countries.
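The accounting identity behind this argument, aggregate labor productivity as the hours-weighted sum of sectoral labor productivity, can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and chosen only for illustration; they are not taken from the paper's data or calibration, and sectoral output is assumed to be measured in common prices. Sectoral productivity is held fixed so that every change in the aggregate comes from reallocating hours.

```python
import math


def aggregate_productivity(productivity, hours_share):
    """Hours-weighted sum of sectoral labor productivity (output per hour)."""
    if productivity.keys() != hours_share.keys():
        raise ValueError("sectors must match")
    if not math.isclose(sum(hours_share.values()), 1.0, abs_tol=1e-9):
        raise ValueError("hours shares must sum to one")
    return sum(productivity[s] * hours_share[s] for s in productivity)


# Hypothetical output per hour by sector, held fixed over time.
PRODUCTIVITY = {"agriculture": 2.0, "industry": 9.0, "services": 5.0}

# Hypothetical stages of the structural transformation (shares of hours).
STAGES = {
    "early": {"agriculture": 0.60, "industry": 0.20, "services": 0.20},
    "middle": {"agriculture": 0.20, "industry": 0.40, "services": 0.40},
    "late": {"agriculture": 0.05, "industry": 0.25, "services": 0.70},
}

for stage, shares in STAGES.items():
    print(stage, round(aggregate_productivity(PRODUCTIVITY, shares), 3))
```

With these illustrative numbers the aggregate rises from 4.0 to 6.0 as hours leave agriculture, then falls to 5.85 as hours move from industry to services, even though no sector's productivity has changed.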


We construct a panel data set on PPP-adjusted real output per hour and disaggregated output and hours worked for agriculture, industry, and services. Our panel data include 29 countries with annual data covering the period from 1956 to 2004 for most countries.5 From these data, we document three basic facts. First, countries follow a common process of structural transformation characterized by a declining share of hours in agriculture over time, an increasing share of hours in services, and a hump-shaped share of hours in industry. Second, there is substantial lag in the process of structural transformation for some countries, and this lag is associated with the level of relative income. Third, there are sizable and systematic differences in sectoral growth rates of labor productivity across countries. In particular, most countries observe higher growth rates of labor productivity in agriculture and manufacturing than in services. In addition, countries with high rates of aggregate productivity growth tend to have much higher productivity growth in agriculture and manufacturing than the United States, but this strong relative performance is not observed in services. Countries with low rates of aggregate labor productivity growth tend to observe low labor productivity growth in all sectors. We develop a general equilibrium model of the structural transformation with three sectors—agriculture, industry, and services. Following Rogerson (2008), labor reallocation across sectors is driven by two channels: income effects due to nonhomothetic preferences and substitution effects due to differential productivity growth across sectors.6 We calibrate the model to the structural transformation of the United States between 1956 and 2004. A model of the structural transformation is essential for the purpose of this paper for two reasons. First, we use the calibrated model to measure sectoral productivity differences across countries at one point in time. 
This step is needed because of the lack of comparable (PPP-adjusted) sectoral output data across a large set of countries. Second, the process of structural transformation is endogenous to the level and changes over time in sectoral labor productivity. As a result, a quantitative assessment of the aggregate implications of sectoral productivity differences requires that changes in the distribution of labor across sectors be consistent with sectoral productivity paths.7

5. Our sample does not include the poorest countries in the world: the labor productivity ratio between the richest and poorest countries in our data is only 10:1.
6. For recent models of the structural transformation emphasizing nonhomothetic preferences, see Kongsamut, Rebelo, and Xie (2001), and emphasizing substitution effects see Ngai and Pissarides (2007).
7. This is in sharp contrast to the widely followed shift-share analysis approach where aggregate productivity changes are decomposed into productivity changes within sectors and labor reallocation.

The model implies that sectoral productivity levels in the first year in the sample tend to be lower in poor than in rich countries, particularly in agriculture and services, and the model implies low dispersion in productivity levels in manufacturing across countries. We argue that these differences in sectoral labor productivity levels implied by the model are consistent with the available evidence from studies using producer and micro data for specific sectors, for instance, Baily and Solow (2001) for manufacturing and service sectors and Restuccia, Yang, and Zhu (2008) for agriculture. These productivity levels together with data on sectoral labor productivity growth for each country imply time paths for sectoral productivity. Given these time paths, the model reproduces the broad patterns of labor reallocation and aggregate productivity growth across countries. The model also has implications for sectoral output and relative prices that are broadly consistent with the cross-country data.

This paper is related to a large literature studying income differences across countries. Closely connected is the literature studying international income differences in the context of models with delay in the start of modern growth.8 Because countries in our data set have started the process of structural transformation well before the first year in the sample period, our focus is on measuring sectoral productivity across countries at a point in time and on assessing the role of their movement over time in accounting for the patterns of structural transformation and aggregate productivity growth across countries.9 Our paper is also closely related to a literature that emphasizes the sectoral composition of the economy in aggregate outcomes, for instance, Caselli and Coleman (2001), Córdoba and Ripoll (2004), Coleman (2007), Chanda and Dalgaard (2008), Restuccia, Yang, and Zhu (2008), Adamopoulos and Akyol (2009), and Vollrath (2009).10 In studying the role of the structural transformation for cross-country aggregate productivity catch-up, our paper is closest to that of Caselli and Tenreyro (2006). We differ in that we use a model of the structural transformation to measure sectoral productivity levels and to assess the contribution of sectoral productivity for aggregate growth. In studying labor productivity over time, our paper is related to a literature studying country episodes of slowdown and depression.11 Most of this literature focuses on the effect of exogenous movements in aggregate total factor productivity and aggregate distortions on GDP relative to trend. We differ from this literature by emphasizing the importance of sectoral productivity in the structural transformation and the secular movements in relative GDP per hour across countries.

8. See, for instance, Lucas (2000), Hansen and Prescott (2002), Ngai (2004), and Gollin, Parente, and Rogerson (2002).
9. Herrendorf and Valentinyi (2006) also consider a model to measure sectoral productivity levels across countries but instead use expenditure data from the Penn World Table.
10. See also the survey article by Caselli (2005) and the references therein.

The paper is organized as follows. In the next section we document some facts about the process of structural transformation and sectoral labor productivity growth across countries. Section III describes the economic environment and calibrates a benchmark economy to U.S. data for the period between 1956 and 2004. In Section IV we discuss the quantitative experiment and perform counterfactual analysis. We conclude in Section V.

II. SOME FACTS

In this section we document the process of structural transformation and labor productivity growth in agriculture, industry, and services for the countries in our data set. Because we focus on long-run trends, data are trended using the Hodrick–Prescott filter with a smoothing parameter λ = 100. The Appendix provides a detailed description of the data.

II.A. The Process of Structural Transformation

The reallocation of labor across sectors over time is typically referred to in the economic development literature as the process of structural transformation. This process has been extensively documented.12 The structural transformation is characterized by a systematic fall over time in the share of labor allocated to agriculture, by a steady increase in the share of labor in services, and by a hump-shaped pattern for the share of labor in manufacturing. That is, the typical process of sectoral reallocation involves an increase in the share of labor in manufacturing in the early stages of the reallocation process, followed by a decrease in the later stages.13

11. See Kehoe and Prescott (2002) and the references therein.
12. See, for instance, Kuznets (1966) and Maddison (1980), among others.
13. In this paper we refer to manufacturing and industry interchangeably. In the Appendix we describe in detail our definition of sectors in the data.

We document the processes of structural transformation in our data set by focusing on the distribution of labor hours across sectors. We note, however, that this characterization is very similar to the one obtained by looking at shares of employment. Our panel data cover countries at very different stages in the process of structural transformation. For instance, our data include countries that in 1960 allocated about 70% of their labor hours to agriculture (e.g., Turkey and Bolivia), as well as countries that in the same year had shares of hours in agriculture below 10% (e.g., the United Kingdom). Despite this diversity, all countries in the sample follow a common process of structural transformation. First, all countries exhibit declining shares of hours in agriculture, even the most advanced countries in this process, such as the United Kingdom and the United States. Second, countries at an early stage of the process of structural transformation exhibit a hump-shaped share of hours in industry, whereas this share is decreasing for countries at a more advanced stage. Finally, all countries exhibit an increasing share of hours in services. To illustrate these features, Figure II plots sectoral shares of hours for Greece, Ireland, Spain, and Canada.

The processes of structural transformation observed in our sample suggest two additional observations. First, the lag in the structural transformation observed across countries is systematically related to the level of development: poor countries have the largest shares of hours in agriculture, while rich countries have the smallest shares.14 Second, our data suggest the basic tendency for countries that start the process of structural transformation later to accomplish a given amount of labor reallocation faster than those countries that initiated this process earlier.15

14. See, for instance, Gollin, Parente, and Rogerson (2007) and Restuccia, Yang, and Zhu (2008) for a detailed documentation of this fact for shares of employment across a wider set of countries.
15. According to the U.S. Census Bureau (1975), Historical Statistics of the United States, the distribution of employment in the United States circa 1870 resembles that of Portugal in 1950. By 1948 the sectoral shares in the United States were 0.10, 0.34, and 0.56, levels that Portugal reached sometime during the 1990s. Although Portugal is lagging behind the process of structural transformation of the United States, it has accomplished about the same reallocation of labor across sectors in less than half the time (39 years as opposed to 89 years in the United States). See Duarte and Restuccia (2007) for a detailed documentation of these observations.
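For readers who want to reproduce the detrending step described at the start of Section II, the Hodrick–Prescott trend solves a penalized least-squares problem: it minimizes the sum of squared deviations from the trend plus λ times the sum of squared second differences of the trend. The first-order condition is the linear system (I + λD'D)τ = y, where D is the second-difference operator. The sketch below is a plain-Python illustration with λ = 100, not the authors' code; the function name `hp_trend` and the sample series are hypothetical, and a dense solver is used only because annual series are short.

```python
def hp_trend(series, lam=100.0):
    """Hodrick-Prescott trend: solve (I + lam * D'D) trend = series."""
    y = [float(v) for v in series]
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # Build K = I + lam * D'D, with D the (n-2) x n second-difference operator.
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p, dp in enumerate(stencil):
            for q, dq in enumerate(stencil):
                K[r + p][r + q] += lam * dp * dq
    # Solve K * trend = y by Gaussian elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(K, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = (aug[r][n] - known) / aug[r][r]
    return trend


# Hypothetical annual series: a linear trend plus an alternating deviation.
sample = [10.0 + 0.3 * t + (0.5 if t % 2 else -0.5) for t in range(20)]
smoothed = hp_trend(sample, lam=100.0)
```

Two properties make the sketch easy to check: a perfectly linear series is returned unchanged because its second differences are zero, and the trend always has the same sum as the input series.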


FIGURE II Shares of Hours—Some Countries

II.B. Sectoral Labor Productivity Growth

For the United States, the annualized growth rate of labor productivity between 1956 and 2004 has been highest in agriculture (3.8%), second in industry (2.4%), and lowest in services (1.3%).16 This ranking of growth rates of labor productivity across sectors is observed in 23 of the 29 countries in our sample, and in all countries but Venezuela, the growth rate in services is the smallest. Nevertheless, there is an enormous variation in sectoral labor productivity growth across countries. Figure III plots the annualized growth rate of labor productivity in each sector against the annualized growth rate of aggregate labor productivity for all countries in our data set. The sectoral growth rate of the United States in each panel is identified by the horizontal dashed line, whereas the vertical dashed line marks the growth rate of aggregate productivity of the United States. This figure documents the tendency for countries to feature higher growth rates of labor productivity in agriculture and manufacturing than in services. For instance, in our panel, the average growth rates in agriculture and manufacturing are 4.0% and 3.1%, whereas the average growth rate in services is 1.3%.

16. The annualized percentage growth rate of variable $x$ over the period $t$ to $t + T$ is computed as $[(x_{t+T}/x_t)^{1/T} - 1] \times 100$.

FIGURE III. Sectoral Growth Rates of Labor Productivity (%). Aggregate labor productivity is GDP per hour, whereas sectoral labor productivity is value added per hour in each sector. Annualized percentage growth rates during the sample period are given for each country. The horizontal lines indicate the sectoral growth rates observed in the United States, and the vertical line indicates the aggregate growth rate of the United States. Horizontal axis: annualized growth rate of aggregate labor productivity.

Figure III also illustrates that countries with low aggregate labor productivity growth relative to the United States tend to have low productivity growth in all sectors (e.g., Latin American countries), whereas countries with high relative aggregate labor productivity growth tend to have higher productivity growth than the United States in agriculture and, especially, industry (e.g., European countries, Japan, and Korea). For the countries that grew faster than the United States in aggregate productivity, labor productivity growth exceeds that for the United States by, on average, 1 percentage point in agriculture and 1.5 percentage points in industry. In contrast, labor productivity growth in services for these countries exceeds that for the United States by only 0.4 percentage point. The fact is that few countries have observed a much higher growth rate of labor productivity in services than the United States. These features of the data motivate some of the counterfactual exercises we perform in Section IV.
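The annualized growth rate defined in footnote 16 is a geometric average, and a short sketch makes the computation concrete. The function name and the example inputs are hypothetical; only the formula itself comes from the text.

```python
def annualized_growth(x_start, x_end, years):
    """Annualized percentage growth rate: [(x_end / x_start)^(1/years) - 1] * 100."""
    if years <= 0:
        raise ValueError("years must be positive")
    if x_start <= 0 or x_end <= 0:
        raise ValueError("levels must be positive")
    return ((x_end / x_start) ** (1.0 / years) - 1.0) * 100.0


# A level that compounds at 3.8% a year for the 48 years from 1956 to 2004
# recovers an annualized rate of 3.8.
level_1956 = 1.0
level_2004 = level_1956 * 1.038 ** 48
rate = annualized_growth(level_1956, level_2004, 48)
```

Because the rate is geometric rather than arithmetic, it depends only on the first and last observations of the (trended) series and on the number of years between them.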

III. ECONOMIC ENVIRONMENT

We develop a simple model of the structural transformation of an economy where at each date three goods are produced: agriculture, industry, and services. Following Rogerson (2008), labor reallocation across sectors is driven by two forces: an income effect due to nonhomothetic preferences and a substitution effect due to differential productivity growth between industry and services. We calibrate a benchmark economy to U.S. data and show that this basic framework captures the salient features of the structural transformation in the United States from 1956 to 2004.

III.A. Description

Production. At each date three goods are produced, agriculture ($a$), manufacturing ($m$), and services ($s$), according to the following constant-returns-to-scale production functions:

$$Y_i = A_i L_i, \qquad i \in \{a, m, s\}, \tag{1}$$

where $Y_i$ is output in sector $i$, $L_i$ is labor input in sector $i$, and $A_i$ is a sector-specific technology parameter.17 When mapping the model to data, we associate the labor input $L_i$ with hours allocated to sector $i$. We assume that there is a continuum of homogeneous firms in each sector that are competitive in goods and factor markets. At each date, given the price of sector-$i$ output, $p_i$, and the wage $w$, a representative firm in sector $i$ solves

$$\max_{L_i \ge 0} \; \{ p_i A_i L_i - w L_i \}. \tag{2}$$

17. We note that labor productivity in each sector is summarized in the model by the productivity parameter $A_i$. There are many features that can explain differences over time and across countries in labor productivity such as capital intensity and factor endowments. Accounting for these sources can provide a better understanding of labor productivity facts. Our analysis abstracts from the sources driving labor productivity observations.

Households. The economy is populated by an infinitely lived representative household of constant size. Without loss of generality we normalize the population size to one. The household is endowed with $L$ units of time each period, which are supplied inelastically to the market. We associate $L$ with total hours per capita in the data. The household has preferences over consumption goods as follows:

$$\sum_{t=0}^{\infty} \beta^{t} u(c_{a,t}, c_t), \qquad \beta \in (0, 1),$$

where $c_{a,t}$ is the consumption of agricultural goods at date $t$ and $c_t$ is the consumption of a composite of manufacturing and service goods at date $t$. The per-period utility is given by

$$u(c_{a,t}, c_t) = a \log(c_{a,t} - \bar{a}) + (1 - a) \log(c_t), \qquad a \in [0, 1],$$

where $\bar{a} > 0$ is a subsistence level of agricultural goods below which the household cannot survive. This feature of preferences has a long tradition in the development literature and it has been emphasized as a quantitatively important feature leading to the movement of labor away from agriculture in the process of structural transformation.18 The composite nonagricultural consumption good $c_t$ is given by

$$c_t = \left[ b\, c_{m,t}^{\rho} + (1 - b)(c_{s,t} + \bar{s})^{\rho} \right]^{1/\rho},$$

where $\bar{s} > 0$, $b \in (0, 1)$, and $\rho < 1$. For $\bar{s} > 0$, these preferences imply that the income elasticity of service goods is greater than one. We note that $\bar{s}$ works as a negative subsistence consumption level: when the income of the household is low, fewer resources are allocated to the production of services, and when the income of the household increases, resources are reallocated to services. The parameter $\bar{s}$ can also be interpreted as a constant level of production of service goods at home. Our approach to modeling the home sector for services is reduced-form. Rogerson (2008) considers a generalization of this feature where people can allocate time to market and nonmarket production of service goods. However, we argue that our simplification is not as restrictive as it may first appear, because we abstract from the allocation of time between market and nonmarket activities. Our focus is on the determination of aggregate productivity from the allocation of time across market sectors.

18. See, for instance, Echevarria (1997), Laitner (2000), Caselli and Coleman (2001), Kongsamut, Rebelo, and Xie (2001), Gollin, Parente, and Rogerson (2002), and Restuccia, Yang, and Zhu (2008).

Because we abstract from intertemporal decisions, the problem of the household is effectively a sequence of static problems.19 At each date and given prices, the household chooses consumption of each good to maximize the per-period utility subject to the budget constraint. Formally,

$$\max_{c_i \ge 0} \; a \log(c_a - \bar{a}) + (1 - a)\, \frac{1}{\rho} \log\left[ b\, c_m^{\rho} + (1 - b)(c_s + \bar{s})^{\rho} \right], \tag{3}$$

subject to

$$p_a c_a + p_m c_m + p_s c_s = wL.$$

Market Clearing. The demand for labor from firms must equal the exogenous supply of labor by households at every date:

$$L_a + L_m + L_s = L. \tag{4}$$

Also, at each date, the market for each good produced must clear:

$$c_a = Y_a, \qquad c_m = Y_m, \qquad c_s = Y_s. \tag{5}$$

III.B. Equilibrium

A competitive equilibrium is a set of prices $\{p_a, p_m, p_s\}$, allocations $\{c_a, c_m, c_s\}$ for the household, and allocations $\{L_a, L_m, L_s\}$ for firms such that (i) given prices, the firm's allocations $\{L_a, L_m, L_s\}$ solve the firm's problem in (2); (ii) given prices, the household's allocations $\{c_a, c_m, c_s\}$ solve the household's problem in (3); and (iii) markets clear: equations (4) and (5) hold. The first-order condition from the firm's problem implies that the benefit and cost of a marginal unit of labor must be equal. Normalizing the wage rate to one, this condition implies that prices of goods are inversely related to productivity:

$$p_i = \frac{1}{A_i}. \tag{6}$$

19. Because we are abstracting from intertemporal decisions such as investment, our analysis is not crucially affected by alternative stochastic assumptions on the time path for labor productivity.

Note that in the model, price movements are driven solely by labor productivity changes. The first-order conditions for consumption imply that the labor input in agriculture is given by

$$L_a = (1-a) \frac{\bar{a}}{A_a} + a \left( L + \frac{\bar{s}}{A_s} \right). \tag{7}$$
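The step from the first-order conditions to equation (7) is not spelled out in the text. One way to obtain it is sketched below, assuming an interior solution, the wage normalization $w = 1$, prices from (6), and sectoral output equal to labor productivity times labor input ($Y_i = A_i L_i$). The intermediate expressions are a reconstruction for illustration, not a derivation quoted from the paper.

```latex
\begin{align*}
% Log utility over (c_a - \bar{a}) and the composite good implies a constant
% share a of income, net of the two nonhomothetic terms, spent on c_a - \bar{a}.
p_a (c_a - \bar{a}) &= a \left( L - p_a \bar{a} + p_s \bar{s} \right) \\
% Collect the terms in p_a \bar{a}.
p_a c_a &= (1-a)\, p_a \bar{a} + a \left( L + p_s \bar{s} \right) \\
% Use p_i = 1/A_i and c_a = Y_a = A_a L_a, so that p_a c_a = L_a.
L_a &= (1-a) \frac{\bar{a}}{A_a} + a \left( L + \frac{\bar{s}}{A_s} \right)
\end{align*}
```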

When $a = 0$, the household consumes $\bar{a}$ of agricultural goods each period, and labor allocation in agriculture depends only on the level of labor productivity in that sector. When productivity in agriculture increases, labor moves away from the agricultural sector. This restriction on preferences implies that output and consumption per capita of agricultural goods are constant over time, implications that are at odds with data. When $a > 0$ and productivity growth is positive in all sectors, the share of labor allocated to agriculture converges asymptotically to $a$ and the nonhomothetic terms in preferences become asymptotically irrelevant in the determination of the allocation of labor. In this case, output and consumption per capita of agricultural goods grow at the rate of labor productivity. The first-order conditions for consumption of manufacturing and service goods imply that

$$\frac{b}{1-b} \left( \frac{c_m}{c_s + \bar{s}} \right)^{\rho-1} = \frac{p_m}{p_s}.$$

This equation can be rewritten as

$$L_m = \frac{(L - L_a) + \bar{s}/A_s}{1 + x}, \tag{8}$$

where

$$x \equiv \left( \frac{b}{1-b} \right)^{1/(\rho-1)} \left( \frac{A_m}{A_s} \right)^{\rho/(\rho-1)},$$


and $L_a$ is given by (7).20 Equation (8) reflects the two forces that drive labor reallocation between manufacturing and services in the model. First, suppose that preferences are homothetic (i.e., $\bar{s} = 0$). In this case, $L_m/L_s = 1/x$ and differential productivity growth in manufacturing relative to services is the only source of labor reallocation between these sectors (through movements in $x$) as long as $\rho$ is not equal to zero. In particular, when $\bar{s} = 0$, the model can be consistent with the observed labor reallocation from manufacturing into services as labor productivity grows in the manufacturing sector relative to services if the elasticity of substitution between these goods is low ($\rho < 0$). Second, suppose that $\bar{s} > 0$ (i.e., preferences are nonhomothetic) and that either labor productivity grows at the same rate in manufacturing and services, or $\rho = 0$, so that $x$ is constant. Then, for a given $L_a$, productivity improvements lead to the reallocation of labor from manufacturing into services (services are more income-elastic). The model allows both channels to be operating during the structural transformation.

III.C. Calibration

We calibrate a benchmark economy to U.S. data for the period from 1956 to 2004. Our calibration strategy involves selecting parameter values so that the equilibrium of the model matches the salient features of the structural transformation for the United States during this period. We assume that a period in the model is one year. We need to select parameter values for $a$, $b$, $\rho$, $\bar{a}$, $\bar{s}$, and the time series of productivity for each sector $A_{i,t}$ for $t$ from 1956 to 2004 and $i \in \{a, m, s\}$. We proceed as follows. First, we normalize productivity levels across sectors to one in 1956; that is, $A_{i,1956} = 1$ for all $i \in \{a, m, s\}$. Then we use data on the growth rate of sectoral value added per hour in the United States to obtain the time paths of sectoral labor productivity.
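As an aside on the first channel in equation (8): with homothetic preferences, the direction in which relative productivity growth moves labor depends on the sign of ρ. The sketch below illustrates this numerically. The function names and the value of `b` are illustrative assumptions, and the doubling of manufacturing productivity is a hypothetical experiment, not a number from the data.

```python
def x_ratio(A_m, A_s, b, rho):
    """x as defined after equation (8)."""
    return (b / (1.0 - b)) ** (1.0 / (rho - 1.0)) * (A_m / A_s) ** (rho / (rho - 1.0))


def manufacturing_to_services(A_m, A_s, b, rho):
    """L_m / L_s = 1 / x, valid in the homothetic case (s_bar = 0)."""
    return 1.0 / x_ratio(A_m, A_s, b, rho)


b = 0.04  # illustrative weight on manufacturing goods (assumption)
for rho in (-1.5, 0.0, 0.5):
    before = manufacturing_to_services(1.0, 1.0, b, rho)
    after = manufacturing_to_services(2.0, 1.0, b, rho)  # hypothetical doubling of A_m
    print(f"rho={rho:+.1f}: L_m/L_s {before:.4f} -> {after:.4f}")
```

Under these assumptions the ratio falls when ρ < 0 (labor moves toward services), is unchanged when ρ = 0, and rises when 0 < ρ < 1, which matches the case distinction made for equation (8).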
In particular, denoting as $\gamma_{i,t}$ the growth rate of labor productivity in sector $i$ at date $t$, we obtain the time path of labor productivity in each sector as $A_{i,t+1} = (1 + \gamma_{i,t}) A_{i,t}$. Second, with positive productivity growth in all sectors, the share of hours

20. When the growth rates of sectoral labor productivity are positive, the model implies that, in the long run, the share of hours in manufacturing and services asymptote to constants that depend on preference parameters $a$, $b$, $\rho$ and any permanent level difference in labor productivity between manufacturing and services. If productivity growth in manufacturing is higher than in services, then the share of hours in manufacturing asymptotes to 0 and the share of hours in services to $(1 - a)$.


TABLE I
PARAMETER VALUES AND U.S. DATA TARGETS

| Parameter | Value | Target |
|---|---|---|
| $A_{i,1956}$ | 1.0 | Normalization |
| $\{A_{a,t}\}_{t=1957}^{2004}$ | {·} | Productivity growth in agriculture |
| $\{A_{m,t}\}_{t=1957}^{2004}$ | {·} | Productivity growth in industry |
| $\{A_{s,t}\}_{t=1957}^{2004}$ | {·} | Productivity growth in services |
| $a$ | 0.01 | Long-run share of hours in agriculture |
| $\bar{a}$ | 0.11 | Share of hours in agriculture 1956 |
| $\bar{s}$ | 0.89 | Share of hours in industry 1956 |
| $b$ | 0.04 | Share of hours in industry 1957–2004 |
| $\rho$ | −1.5 | Aggregate productivity growth |

in agriculture converges to $a$ in the long run. Because the share of hours in agriculture has been falling systematically and was about 3% in 2004, we assume a long-run share of 1%. Although this target is somewhat arbitrary, our main results are not sensitive to this choice. Third, given values for $\rho$ and $b$, $\bar{a}$ and $\bar{s}$ are chosen to match the shares of hours in agriculture and manufacturing in the United States in 1956 using equations (7) and (8). Finally, $b$ and $\rho$ are jointly chosen to match as closely as possible the share of hours in manufacturing over time and the annualized growth rate of aggregate productivity. The annualized growth rate in labor productivity in the United States between 1956 and 2004 is roughly 2%. Table I summarizes the calibrated parameters and targets. The shares of hours implied by the model are reported in Figure IV (dotted lines), together with data on the shares of hours in the United States (solid lines). The equilibrium allocation of hours across sectors in the model closely matches the process of structural transformation in the United States during the calibrated period. The model implies a fall in the share of hours in manufacturing from about 39% in 1956 to 24% in 2004, whereas the share of hours in services increases from about 49% to 73% during this period.21 Notice that even though the calibration only targets the share of hours in agriculture in 1956 (13%), the model implies a time path for the equilibrium share of hours in agriculture that is remarkably close to the data, declining to about 3% in 2004.

21. We emphasize that the model can deliver a hump-shaped pattern for labor in manufacturing for less developed economies even though during the calibrated period the U.S. economy is already in the second stage of the structural transformation, whereby labor is being reallocated away from manufacturing.
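The 1956 shares quoted above can be checked against equations (7) and (8) using the Table I values. The sketch below is a minimal illustration rather than the authors' code: the function name is hypothetical, and total hours are normalized to L = 1 (an assumption) so that hours and shares coincide.

```python
def labor_allocation(A_a, A_m, A_s, a, a_bar, s_bar, b, rho, L=1.0):
    """Hours in agriculture, manufacturing, and services from equations (7), (8), and (4)."""
    L_a = (1.0 - a) * a_bar / A_a + a * (L + s_bar / A_s)  # equation (7)
    x = (b / (1.0 - b)) ** (1.0 / (rho - 1.0)) * (A_m / A_s) ** (rho / (rho - 1.0))
    L_m = ((L - L_a) + s_bar / A_s) / (1.0 + x)            # equation (8)
    L_s = L - L_a - L_m                                    # labor market clearing, equation (4)
    return L_a, L_m, L_s


# Table I parameter values; sectoral productivity is normalized to one in 1956.
params = dict(a=0.01, a_bar=0.11, s_bar=0.89, b=0.04, rho=-1.5)
shares_1956 = labor_allocation(1.0, 1.0, 1.0, **params)
print([round(s, 2) for s in shares_1956])  # [0.13, 0.39, 0.49]
```

The rounded values agree with the 13%, 39%, and 49% reported for agriculture, manufacturing, and services in 1956.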


FIGURE IV Share of Hours by Sector—Model vs. U.S. Data

The model also has implications for sectoral output and for relative prices. Sectoral output is given by labor productivity times labor input. Because the model matches closely the time path of sectoral labor allocation for the U.S. economy, the output implications of the model over time for the United States are very close to the data. In particular, the model implies that output growth in agriculture is 2.08% per year (versus 2.29% in the data), whereas output growth in manufacturing and services in the model is 2.74% and 3.60% (versus 2.70% and 3.61% in the data). The model implies that the producer price of good $i$ relative to good $i'$ is given by the ratio of labor productivity in these sectors:

$$\frac{p_i}{p_{i'}} = \frac{A_{i'}}{A_i}. \tag{9}$$
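To see how equation (9) turns productivity paths into annualized relative price changes, consider the sketch below. It assumes constant growth rates that are hypothetical and chosen only for illustration; the function names are not from the paper.

```python
def productivity_path(growth_rates, A0=1.0):
    """Compound yearly growth rates into levels: A[t+1] = (1 + g[t]) * A[t]."""
    path = [A0]
    for g in growth_rates:
        path.append((1.0 + g) * path[-1])
    return path


def relative_price(A_i, A_k):
    """Price of good i relative to good k, p_i / p_k = A_k / A_i, year by year."""
    return [ak / ai for ai, ak in zip(A_i, A_k)]


def annualized_change(series):
    """Annualized growth rate between the first and last entries of a yearly series."""
    years = len(series) - 1
    return (series[-1] / series[0]) ** (1.0 / years) - 1.0


# Hypothetical constant growth rates over 33 years (not taken from the data).
years = 33
A_a = productivity_path([0.04] * years)
A_m = productivity_path([0.03] * years)
A_s = productivity_path([0.013] * years)

services_vs_industry = annualized_change(relative_price(A_s, A_m))
agriculture_vs_industry = annualized_change(relative_price(A_a, A_m))
print(f"services:    {100 * services_vs_industry:+.2f}% per year")
print(f"agriculture: {100 * agriculture_vs_industry:+.2f}% per year")
```

With productivity growing fastest in agriculture and slowest in services, the relative price of services rises and that of agriculture falls, which is the qualitative pattern the model is compared against.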

We assess the price implications of the model against data on sectoral relative prices.22 The model implies that the producer price of 22. Data for sectoral relative prices are available from 1971 to 2004. See the Appendix for details.


services relative to industry increases by 0.94% per year between 1971 and 2004, very close to the increase in the data for the relative price of services from the implicit price deflators (0.87% per year). The price of agriculture relative to manufacturing declines in the model at a rate of 1.04% per year from 1971 to 2004. This fall in the relative price of agriculture is consistent with the data, although the relative price of agriculture falls somewhat more in the data (3.12% per year) than in the model.23 Because productivity growth across sectors is the driving force in the model, it is reassuring that this mechanism generates implications that are broadly consistent with the data. For this reason, we also discuss the relative price implications of the model when assessing the relevance of sectoral productivity growth for labor reallocation in the cross-country data in Section IV. IV. QUANTITATIVE ANALYSIS In this section, we assess the quantitative effect of sectoral labor productivity on the structural transformation and aggregate productivity outcomes across countries. In this analysis we maintain preference parameters as in the benchmark economy and proceed in three steps. First, we use the model to restrict the level of sectoral labor productivity in the first period for each country. Second, using these levels and data on sectoral labor productivity growth in each country as the exogenous time-varying factors, the model implies time paths for the allocation of hours across sectors and aggregate labor productivity for each country. We assess the cross-country implications of the model with data for labor reallocation across sectors, aggregate productivity, and relative prices. Third, we perform counterfactual exercises to assess the quantitative importance of sectoral analysis in explaining aggregate productivity experiences across countries. IV.A. 
Relative Sectoral Productivity Levels We use the model to restrict the levels of labor productivity in agriculture, industry, and services relative to those in the 23. We note that in the context of our model distortions to the price of agriculture would not substantially affect the equilibrium allocation of labor in agriculture because this is mainly determined by labor productivity in agriculture relative to the subsistence constraint (a is close to zero in the calibration). In this context, it would be possible to introduce price distortions to match the faster decline in the relative price of agriculture in the data without affecting our main quantitative results.


United States for the first year in the sample for each country. This step is needed because of the lack of comparable (PPP-adjusted) sectoral output data across a large set of countries. Because our data on sectoral value added are in constant local currency units, some adjustment is needed. Using market exchange rates would be problematic for reasons well discussed in the literature, such as Summers and Heston (1991). Another approach would be to apply the national currency shares of value added to the PPP-adjusted measure of real aggregate output from the Penn World Tables (PWT). This is problematic because it assumes that the PPP-conversion factor for aggregate output applies to all sectors in that country, whereas there is strong evidence that the PPP-conversion factors differ systematically across sectors in development.24 Using detailed categories from the International Comparisons Program (ICP) benchmark data in the PWT would also be problematic for inferences at the sector level because these data are based on the expenditure side of national accounts. For instance, it would not be advisable to use food expenditures and their PPP-conversion factor to adjust units of agricultural output across countries because food expenditures include charges for goods and services not directly related to agricultural production. Our approach is to use the model to back out sector-specific PPP-conversion factors for each country and to use the constant-price value-added data in local currency units to calculate growth rates of labor productivity in each sector for each country. In particular, we use the model to restrict productivity levels in the initial period and use the data on growth rates of labor productivity to construct the time series for productivity that we feed into the model. The underlying assumption is that the growth rate of value added in constant domestic prices is a good measure of real changes in output.
This approach of using growth rates as a measure of changes in “quantities” is similar to the approach followed in the construction of panel data of comparable output across countries, such as the PWT.25 We proceed as follows. For each country $j$, we choose the three labor productivity levels $A_a^j$, $A_m^j$, and $A_s^j$ to match three targets

24. See, for instance, the evidence on agriculture relative to nonagriculture in Restuccia, Yang, and Zhu (2008).
25. In particular, in the PWT, the growth rates of expenditure categories such as consumption and investment are the growth rates of constant domestic price expenditures from national accounts.


FIGURE V Relative Labor Productivity across Sectors—First Year Labor productivity relative to the level of the United States.

from the data in the first year in the sample: (1) the share of hours in agriculture, (2) the share of hours in manufacturing (therefore the model matches the share of hours in services by labor market clearing), and (3) aggregate labor productivity relative to that of the United States.26 Figure V plots the average level of sectoral labor productivity relative to the level of the United States for countries in each quintile of aggregate productivity in the first year. The model implies that relative sectoral productivity in the first year tends to be lower in poorer countries than in richer countries, but particularly so in agriculture and services. In fact, the model implies 26. We adjust s¯ by the level of relative productivity in services in the first period for each country so that s¯ /As is constant across countries in the first period of the sample. Although it is not modeled explicitly, one interpretation of s¯ is as service goods produced at home. Therefore, s¯ cannot be invariant to large changes in productivity levels in services.
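The text does not spell out how the three targets pin down the three productivity levels. The sketch below shows one way the inversion could be carried out under stated assumptions; it is not the authors' procedure. It assumes hours normalized to one, ρ different from zero, $\bar{s}/A_s$ held at a common value `k` across countries (as footnote 26 describes), and aggregate productivity `y` valued at unit sectoral prices in units where the benchmark economy in its normalization year equals one. All names and the example targets are hypothetical.

```python
def initial_productivity(share_a, share_m, y, a, a_bar, k, b, rho):
    """Back out (A_a, A_m, A_s) from first-year targets by inverting (7) and (8).

    k is s_bar / A_s, assumed common across countries in the first period.
    y is aggregate labor productivity at unit sectoral prices (assumption).
    """
    A_a = (1.0 - a) * a_bar / (share_a - a * (1.0 + k))   # invert equation (7)
    x = ((1.0 - share_a) + k) / share_m - 1.0             # invert equation (8)
    x_base = (b / (1.0 - b)) ** (1.0 / (rho - 1.0))       # value of x when A_m = A_s
    ratio_ms = (x / x_base) ** ((rho - 1.0) / rho)        # implied A_m / A_s
    share_s = 1.0 - share_a - share_m
    # y = A_a * share_a + A_m * share_m + A_s * share_s, solved for A_s
    A_s = (y - A_a * share_a) / (ratio_ms * share_m + share_s)
    return A_a, ratio_ms * A_s, A_s


# Hypothetical first-year targets for a poorer economy (not from the data).
A_a, A_m, A_s = initial_productivity(
    share_a=0.40, share_m=0.22, y=0.30,
    a=0.01, a_bar=0.11, k=0.89, b=0.04, rho=-1.5,
)
print(A_a, A_m, A_s)
```

Feeding the recovered levels back into equations (7) and (8) reproduces the two hours shares and the aggregate productivity target, which is the sense in which three targets identify three levels.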


FIGURE VI Relative Labor Productivity across Sectors—First and Last Years Labor productivity relative to the level of the United States.

that the dispersion of relative productivity in agriculture and services is much larger than in manufacturing. In the first year, the six poorest countries have relative productivity in agriculture and services of around 20% and 10%, whereas the six richest countries have relative productivity in these sectors of around 86% and 84%. In contrast, for manufacturing, average relative productivity of the six poorest countries in the first year is 31% and that of the six richest countries is 70%. The levels of sectoral labor productivity implied by the model for the first year, together with data on growth rates of sectoral value added per hour in local currency units, imply time paths for sectoral labor productivity in each country. In particular, letting $\gamma_{i,t}^{j}$ denote the growth rate of labor productivity in country $j$, sector $i$, at date $t$, we obtain sectoral productivity as $A_{i,t+1}^{j} = (1 + \gamma_{i,t}^{j}) A_{i,t}^{j}$. Figure VI plots the average level of sectoral labor productivity relative to the level in the United States in the first and last years


for countries in each quintile of aggregate productivity in the first year. We note that, on average, countries have experienced substantial gains in productivity in agriculture and industry relative to the United States (from an average relative productivity level of 48% and 51% in the first period to 71% and 75% in the last period). In sharp contrast, countries experienced, on average, much smaller gains in productivity in services relative to the United States (from an average relative productivity level of 46% to 49%). These features are particularly pronounced for countries in the top three quintiles of the productivity distribution. For these countries, average relative labor productivity in agriculture and industry increased from 66% and 59% to 100% and 85%, whereas average productivity in services increased from 63% to only 66%. We emphasize that the low levels of relative productivity in services in the first period together with the lack of catch-up over time imply that, for most countries, relative productivity levels in services are lower than those in agriculture and industry at the end of the sample period. Therefore, as these economies allocate an increasing share of hours to services, low relative labor productivity in this sector dampens aggregate productivity growth. These relative productivity patterns are suggestive of the results we discuss in Section IV.C, where we show that productivity catch-up in industry explains a large portion of the gains in aggregate productivity across countries. In addition, we show that low relative productivity levels in services and the lack of catch-up play a quantitatively important role in explaining the growth episodes of slowdown, stagnation, and decline in aggregate productivity across countries. We argue that our productivity-level results are consistent with the available evidence from studies using producer and micro data.
Empirical studies provide internationally comparable measures of labor productivity for some sectors and some countries. These studies typically provide estimates for narrow sectoral definitions at a given point in time. One such study for agriculture is from the Food and Agriculture Organization (FAO) of the United Nations. This study uses producer data (prices of detailed categories at the farm gate) to calculate international prices and comparable measures of output in agriculture using a procedure similar to that of Summers and Heston (1991) for the construction of the PWT. We find that the labor productivity differences in agriculture implied by the model are qualitatively consistent with the differences in GDP per worker in agriculture between


rich and poor countries from the FAO for 1985.27 Baily and Solow (2001) have compiled a number of case studies from the McKinsey Global Institute (MGI) documenting labor productivity differences in some sectors and countries. Their findings are broadly consistent with our results. In particular, Baily and Solow emphasize a pattern that emerges from the micro studies where productivity differences across countries in services are not only large but also larger than the differences for manufacturing. The Organization for Economic Cooperation and Development (OECD) and MGI provide studies at different levels of sectoral disaggregation for manufacturing. These studies report relative productivity for a relatively small set of countries, and most studies report estimates only at one point in time. One exception is Pilat (1996). This study reports relative labor productivity levels in manufacturing for 1960, 1973, 1985, and 1995 for thirteen countries. Although the implied relative labor productivity levels in industry in our model tend to be higher than those reported in this study, the patterns of relative productivity are consistent for most countries. Finally, consistent with our findings, several studies report that the United States has higher levels of labor productivity in service sectors than other developed countries and that lower labor productivity in service sectors compared to manufacturing is pervasive.28 IV.B. The Structural Transformation across Countries Given paths for sectoral labor productivity, the model has time-series implications for the allocation of labor hours and output across sectors, aggregate labor productivity, and relative prices for each country. In this section we evaluate the implications of the model against the available cross-country data. Overall, the model reproduces the salient features of the structural transformation and aggregate productivity across countries. Figures VII and VIII illustrate this performance. 
Figure VII reports the shares of hours in each sector and relative aggregate productivity in the last period of the sample for each country in the model and in the data. Figure VIII reports the change in 27. See Restuccia, Yang, and Zhu (2008) for a detailed documentation of the cross-country differences in labor productivity in agriculture. 28. Baily, Farrell, and Remes (2005), for instance, estimate that, relative to the United States, France and Germany had lower relative productivity levels in 2000 and had lower growth rates of labor productivity between 1992 and 2000 for a set of narrowly defined service sectors, with the exception of mobile telecommunications.


FIGURE VII Model vs. Data across Countries—Levels in the Last Year Each plot reports the value for each variable in the last period for the model and the data.

these variables (in percentage points) between the last and first periods in the model and in the data. As these figures illustrate, the model replicates well the patterns of the allocation of hours across sectors and relative aggregate productivity observed in the data, particularly so for the share of hours in agriculture and relative aggregate productivity. This performance attests to the ability of the model to replicate the basic trends observed for the share of hours in agriculture across a large sample of countries. Regarding the share of hours in industry, the model tends to imply a smaller increase over time compared to the data, particularly for less developed economies where the share of hours in industry increased over the sample period. Conversely, the model tends to imply a larger increase in the share of hours in services over the sample period than that observed in the data. This implication of the model suggests that, especially for some less developed countries, distortions or frictions in labor reallocation between industry and services may be important in accounting for their


FIGURE VIII Model vs. Data across Countries—Changes Each plot reports the change between the last and first period (in percentage points) of each variable during the sample period in the data and in the model.

structural transformation.29 As a summary statistic for the performance of the model in replicating the time-series properties of the data, we compute the average absolute deviation (over time and across countries) in percentage points (p.p.) between a given time series in the model and in the data.30 The average absolute deviations for the shares of hours in agriculture, industry, and

29. Although in most cases the model does well in reproducing the time series in the data, in some countries modifications to the simple model would be required in order to better account for the process of structural transformation and aggregate productivity growth; see Duarte and Restuccia (2007) for an application of wedges across sectors in Portugal. These richer environments, however, would require country-specific analysis. We instead maintain our simple model specification and leave these interesting country-specific experiences for future research.
30. We measure the average absolute deviation in percentage points between the time series in the model and the data across countries as

$$\Upsilon = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{T_j} \sum_{t=1}^{T_j} \left| x_{j,t}^{d} - x_{j,t}^{m} \right| \times 100,$$

where $j$ is the country index and $T_j$ is the sample size for country $j$.
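The statistic in footnote 30 averages first over time within each country and then across countries, so countries with longer samples do not receive more weight. A minimal sketch follows; the function name, the dictionary layout, and the two-country series are hypothetical.

```python
def average_absolute_deviation(data, model):
    """Average absolute deviation in percentage points.

    data and model map a country name to a list of yearly values
    expressed as fractions (for example, 0.30 for a 30% share of hours).
    """
    total = 0.0
    for country, x_data in data.items():
        x_model = model[country]
        if len(x_data) != len(x_model):
            raise ValueError(f"series lengths differ for {country}")
        # Time average of absolute deviations for this country.
        total += sum(abs(d - m) for d, m in zip(x_data, x_model)) / len(x_data)
    # Average across countries, converted to percentage points.
    return 100.0 * total / len(data)


# Hypothetical series with different sample sizes per country.
data = {"A": [0.30, 0.28, 0.25], "B": [0.10, 0.09]}
model = {"A": [0.32, 0.27, 0.25], "B": [0.12, 0.10]}
print(round(average_absolute_deviation(data, model), 2))  # 1.25
```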


services are 2, 6, and 7 p.p., respectively, and 4 p.p. for relative aggregate productivity. We conclude that the model captures the bulk of the labor reallocation and aggregate productivity experiences across countries. To better understand our finding about aggregate productivity, recall that aggregate labor productivity is the sum of labor productivity in each sector weighted by the share of labor in that sector, that is,

$$\frac{Y}{L} = \sum_{i \in \{a,m,s\}} \frac{Y_i}{L_i} \frac{L_i}{L}.$$

As a result, the behavior of aggregate productivity arises from the behavior of sectoral labor productivity and the allocation of labor across sectors over time.31 Because the model reproduces the salient features of labor reallocation across countries, aggregate productivity growth in the model is also broadly consistent with the cross-country data. The model has implications for sectoral output in each country. Sectoral output is given by the product of labor productivity and labor hours. As a result, the growth rate of output in sector $i$ is the sum of the growth rates of labor productivity $A_i$ (which we take from the data) and the growth in labor hours $L_i$. The fact that the model reproduces well the cross-country patterns of the structural transformation implies that sectoral output growth is also well captured by the model. The model also has implications for levels and changes over time in relative prices across countries. We first discuss the implications for changes in relative prices. Figure IX plots the annualized percentage change in the producer prices of agriculture and services relative to manufacturing in the model and in the data. The figure shows that the model captures the broad patterns of price changes in the data: because productivity growth tends to be faster in agriculture than in industry and in industry than in services in most countries, the tendency is for the relative price of agriculture to fall and the relative price of services to increase over time.
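The weighted-sum identity for aggregate labor productivity implies that moving hours toward a sector with low productivity lowers the aggregate even when no sectoral productivity level changes. The sketch below illustrates this with hypothetical relative productivity levels and hours shares; none of the numbers are taken from the data.

```python
def aggregate_productivity(productivity, hours_share):
    """Y/L as the hours-weighted sum of sectoral labor productivity Y_i/L_i."""
    return sum(productivity[i] * hours_share[i] for i in ("a", "m", "s"))


# Hypothetical sectoral productivity levels relative to a benchmark economy.
productivity = {"a": 0.70, "m": 0.75, "s": 0.50}

# Hypothetical hours shares before and after reallocation toward services.
early = {"a": 0.10, "m": 0.40, "s": 0.50}
late = {"a": 0.05, "m": 0.25, "s": 0.70}

y_early = aggregate_productivity(productivity, early)
y_late = aggregate_productivity(productivity, late)
print(f"early: {y_early:.4f}, late: {y_late:.4f}")
```

Here sectoral productivity is held fixed, so the decline in the aggregate comes entirely from the change in weights.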
The direction of changes in the relative price of agriculture in the model matches the data for 23 of 29 countries in the sample (80%). For the relative price of services, the model is consistent 31. Note that in the above equation, sectoral labor productivity is measured at a common set of prices across countries. We use the prices of the benchmark economy in 1956.


FIGURE IX Changes in Relative Prices (%) Each figure reports the annualized percentage change of the variable in the time series in the data and in the model. Relative prices of agriculture and services refer to the prices of agriculture and services relative to industry. Data on relative prices cover the period 1971 to 2004.

with the data in 25 countries (86%). We note that in the model, the only factors driving relative price changes over time are the growth in labor productivity across sectors. Of course, many other factors can affect the magnitude of price changes over time, so the model cannot capture all the changes. Now we turn to the implications of the model for price-level differences across countries. Recall that the prices of agriculture and services relative to industry are given by the inverse of labor productivity ( pa / pm = Am/Aa and ps / pm = Am/As ). The fact that the dispersion in productivity across rich and poor countries is large in agriculture and services relative to industry implies that the relative prices of agriculture and services are higher in poor than in rich countries. These implications may seem inconsistent at first with conventional wisdom about price-level differences


across countries. We emphasize that this view stems from observations about expenditure prices (often from ICP or PWT data) instead of producer prices. Our model, however, is better characterized as having implications for producer prices across countries. To see why the distinction between producer and expenditure prices is important, consider first the conventional wisdom that food is cheap in poor countries. This observation arises when the PPP-expenditure price of food is compared across countries using market exchange rates. For the sample of countries in Restuccia, Yang, and Zhu (2008), the dollar price of food is 60% higher in rich than in poor countries and the elasticity with respect to GDP per worker is positive and significant at 0.23.32 Food expenditures, however, include distribution and other charges— in the United States, for every dollar of food expenditure, only 20 cents represents payments to the farmer for the agricultural product—and the distinction between producer and expenditure prices may differ systematically across countries. In fact, producer-price data reveal a striking conclusion about the relative price of agriculture across countries: the evidence from FAO and PWT data in Restuccia, Yang, and Zhu (2008) is that the price of agricultural goods relative to nonagricultural goods is much lower in rich than in poor countries (a ratio of 0.22) and the elasticity of this relative price with respect to GDP per worker is negative and statistically significant at −0.34. This evidence is consistent with the price implications of the model for agriculture.33 Regarding the relative price of services, the conventional wisdom is that the price of services is higher in rich than in poor countries. This view stems again from observations about expenditure prices; see Summers and Heston (1991, pp. 338 and 339). 
We argue that this evidence is not necessarily inconsistent with the producer-price implications of the model, because the gap between expenditure and producer price-levels may be affected by many factors that can be systematically related to development.34 32. Note, however, that when the price of food is compared relative to the price of all goods, food appears expensive in poor countries. See Summers and Heston (1991, p. 338). 33. The distinction between food and agricultural goods prices is also important for the implications of price changes through time. For example, in the United States, the annualized growth rate of food prices from the Consumer Price Index relative to the price of manufacturing goods is positive, about 1% per year from 1971 to 2005, whereas the growth rate of the price of agriculture relative to manufacturing is negative, at roughly −2.5%. 34. Nevertheless, it is an interesting question for future research to assess the factors explaining higher expenditure price levels of services in rich countries.


Because there are no systematic producer price-level data for services that can be compared with the price implications of the model, we focus instead on the indirect evidence from productivity measurements found in micro studies. The lower relative price of services in rich countries in the model stems from a higher relative productivity in services than in manufacturing compared to poor countries. Thus, we use the available sectoral productivity measurements to indirectly assess the price implications of the model for services. The evidence presented by Baily and Solow (2001) and other OECD studies discussed earlier suggests that labor productivity differences between rich and poor countries in services are larger than those for manufacturing sectors. This evidence is consistent with our productivity findings and therefore indirectly provides some assurance of the price implications of the model for services. IV.C. Counterfactuals We construct a series of counterfactuals aimed at assessing the quantitative importance of sectoral labor productivity on the process of structural transformation and aggregate productivity experiences across countries. We focus on two sets of counterfactuals. The first set is designed to illustrate the mechanics of positive sectoral productivity growth for labor reallocation and the contribution of productivity growth differences across sectors and countries for labor reallocation and aggregate productivity. The second set of counterfactuals focuses on explaining aggregate productivity growth experiences of catch-up, slowdown, stagnation, and decline by assessing the contribution of specific cross-country sectoral productivity patterns, such as productivity catch-up in agriculture and industry and low productivity levels and the lack of catch-up in services. The Mechanics of Sectoral Productivity Growth. 
We start by considering counterfactuals where we set the growth rate of labor productivity in one sector to zero in all countries, leaving the remaining growth rates as in the data. These counterfactuals illustrate the importance of productivity growth in each sector for labor reallocation and aggregate productivity. Summary statistics are reported in Figure X and Table II. In Figure X we report, for each country, the change in the time series of the share of hours in each sector and relative aggregate productivity between the last and first periods (in percentage points) in the counterfactual

FIGURE X The Mechanics of Sectoral Productivity Growth Counterfactuals (1) to (3) set the growth rate of labor productivity in a sector to zero in all countries, leaving the other sectors as in the data, for agriculture (first column), industry (second column), and services (third column). Counterfactual (4) sets labor productivity growth in each sector to aggregate productivity growth in the United States. Each panel plots the change between the last and the first period in the time series (in percentage points) of the share of hours in each sector and relative aggregate productivity in the model and in the counterfactual.

and in the model. In Table II we report the average change in the model for all countries, for countries that catch up, and for countries that decline relative to the United States. Consider first the counterfactual for agriculture (γa = 0). No productivity growth in agriculture generates no labor reallocation away from agriculture: there is an average increase in the share of hours in agriculture of 2 p.p. in the counterfactual instead of a decrease of 26 p.p. in the model. As a result, much less labor is reallocated to services. This counterfactual has important negative implications for relative aggregate productivity for most countries regardless of their level

TABLE II
SECTORAL GROWTH, LABOR REALLOCATION, AND AGGREGATE PRODUCTIVITY

                          Change in share of hours         Change in relative
                       Agriculture  Industry  Services   aggregate productivity

All countries
Model                     −25.5      −10.3      35.8            12.8
Counterfactual:
  (1) γa = 0                2.1      −13.7      11.6            −0.5
  (2) γm = 0              −25.5        7.3      18.2            −7.0
  (3) γs = 0              −25.2      −11.8      36.9            −2.2
  (4) γi = γUS            −16.8       −4.7      21.5             0.4

Catch-up countries
Model                     −24.3      −13.5      37.8            25.8
Counterfactual:
  (1) γa = 0                4.9      −17.3      12.4             7.9
  (2) γm = 0              −24.3        9.5      14.8            −1.5
  (3) γs = 0              −23.8      −15.6      39.4             4.0
  (4) γi = γUS            −13.3       −4.5      17.8             1.6

Decline countries
Model                     −27.6       −4.5      32.1           −10.5
Counterfactual:
  (1) γa = 0               −2.9       −7.2      10.1           −15.7
  (2) γm = 0              −27.6        3.3      24.3           −16.8
  (3) γs = 0              −27.6       −4.9      32.5           −13.3
  (4) γi = γUS            −23.2       −5.1      28.2            −1.9

Notes. The table reports the average change between the last and first periods in the time series (in percentage points) of each variable for the model and the counterfactuals. Counterfactuals (1) to (3) assume zero growth in labor productivity in a sector, leaving the other sectoral growth rates as in the data. Counterfactual (4) assumes labor productivity growth in each sector equal to the aggregate productivity growth in the United States.

of development: there is an average decline in relative aggregate productivity of 1 p.p. in the counterfactual instead of the 13 p.p. increase in the model. Next we turn to the counterfactual for industry (γm = 0). This counterfactual has no effect on the share of hours in agriculture (see equation (7)). With no productivity growth in industry there is much less reallocation of labor away from industry into services compared to the model and thus industry represents a larger share of output in the counterfactual. The result is a process for relative aggregate productivity that is sharply diminished across countries: an average decline of 7 p.p. in the counterfactual instead of the catch-up of 13 p.p. in the model. And indeed the largest negative impact is on countries that observed the most

catch-up in relative aggregate productivity in the model. Finally, having no productivity growth in services (γs = 0) has a very small impact on labor reallocation across sectors.35 Relative aggregate productivity declines by an average of 2 p.p. in this counterfactual. The negative impact of this counterfactual on relative aggregate productivity is smaller than that in the case with no productivity growth in industry for all countries but three (Japan, Portugal, and Venezuela), even though services account for a larger share of hours than industry in most countries.

We end this set of counterfactuals by assessing the quantitative importance of differences in labor productivity growth across sectors and countries. We set labor productivity growth in each sector to the growth rate of aggregate labor productivity in the United States (γi = γUS) and document the results in Table II and in the fourth column in Figure X. The counterfactual has a substantial impact on the process of structural transformation. In particular, much less labor is reallocated away from agriculture and industry toward services. For instance, over the sample period, the share of hours in agriculture fell, on average, 26 p.p. in the model and 17 p.p. in the counterfactual. In turn, the share of hours in services increased, on average, 36 p.p. in the model and 22 p.p. in the counterfactual. And indeed this different reallocation process, together with the assumption about sectoral labor productivity growth, explains a large portion of the experiences of catch-up and decline in aggregate productivity. For countries that catch up in aggregate productivity to the United States in the model over the sample period, the average catch-up is 26 p.p. in the model and only 2 p.p. in the counterfactual. For countries that decline in relative aggregate productivity, the average decline is 11 p.p. in the model and only 2 p.p. in the counterfactual.36 We conclude from these counterfactuals that sectoral productivity growth generates substantial effects on labor reallocation,

35. This is due to two opposing effects of productivity growth in services on the labor allocation between industry and services, which roughly cancel each other in the model. See Duarte and Restuccia (2007, p. 42) for a detailed discussion of these effects.
36. Notice that this counterfactual does not eliminate all aggregate productivity growth differences across countries, even though productivity growth rates are identical across sectors and countries and labor reallocation is much diminished as a result. For instance, in the counterfactual, relative aggregate productivity in Finland increases by 8 p.p. over the sample period, and it decreases by 6 p.p. in Mexico. These movements in relative aggregate productivity in the counterfactual stem solely from labor reallocation across sectors (due to positive productivity growth) that have different labor productivity levels.

TABLE III
CHANGE IN RELATIVE AGGREGATE PRODUCTIVITY

                               All countries   Catch-up countries   Decline countries
Model                              12.8              25.8               −10.5
Counterfactual:
  (1) γi = γiUS
    (1a) Agriculture               11.5              23.2                −9.4
    (1b) Industry                   6.0              13.9                −8.4
    (1c) Services                  10.4              18.3                −3.7
  (2) γi = γiUS ∀i                  3.9               5.8                 0.5
  (3) Catch-up in services         30.7              46.9                 1.6

Notes. The table reports the average change between the last and first periods in the time series (in percentage points) of relative aggregate productivity for the model and the counterfactuals. Counterfactuals (1a) to (1c) set the growth rate in a sector to the rate in the United States in that sector. Counterfactual (2) sets the growth rate of all sectors to the sectoral growth rates in the United States. Counterfactual (3) sets the productivity growth in services such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country.

which in turn are important in understanding aggregate productivity growth across countries.

Sectoral Productivity Patterns and Cross-Country Experiences. We now turn to the second set of counterfactuals, where we assess the role of specific labor productivity patterns across sectors in explaining cross-country episodes of catch-up, slowdown, stagnation, and decline in relative aggregate productivity. In Figure VI we documented a substantial catch-up across countries in labor productivity in agriculture and industry but not in services. To assess the importance of sectoral catch-up for aggregate productivity, we compute counterfactuals where we set the growth rate of labor productivity in one sector to the growth rate in that sector in the United States, leaving the other sectoral growth rates as in the data (γi = γiUS for each i ∈ {a, m, s}). For completeness we also compute a counterfactual where all sectoral growth rates are set to the ones in the United States (γi = γiUS ∀i). Table III summarizes the results for these counterfactuals.

Although there has been substantial catch-up of labor productivity in agriculture during the sample period (from an average relative productivity of 48% in the first period to 71% in the last period of the sample), this factor contributes little, about 10%, to catch-up in aggregate productivity across countries (1.3 p.p. of 12.8 p.p. in the model). The substantial catch-up in agricultural productivity produces a reallocation of labor away from this

FIGURE XI Change in Relative Aggregate Productivity—The Importance of Industry This counterfactual sets the growth of labor productivity in industry in each country to the rate in the United States. The figure plots the difference between the last and first periods (in percentage points) of relative aggregate productivity during the sample period in the model and in the counterfactual.

sector, which dampens its positive effect on aggregate productivity growth.37

The catch-up in industry productivity has also been substantial. Unlike agriculture, this catch-up has a significant impact on relative aggregate productivity. Given that most countries have observed higher growth rates of labor productivity in industry than the United States, labor reallocation away from industry and toward services is diminished in the counterfactual for industry. On average, the share of hours in industry decreases 6.5 p.p. in the counterfactual, compared to a decrease of 10.3 p.p. in the model. Figure XI summarizes our findings for the effect of this counterfactual on relative aggregate productivity by reporting

37. The effect of labor reallocation on relative aggregate productivity depends on the normalized end-of-period sectoral labor productivity. Because there is substantial catch-up in agricultural productivity but not in services, the effect of reallocation from agriculture to services is negative.

the difference in relative aggregate productivity between the last and the first period in the time series for each country in the model and in the counterfactual. Industry productivity growth is important for countries that catch up in aggregate productivity to the United States: these countries lie substantially below the 45° line. In fact, we draw in this figure a dash-dotted line indicating half the gains in aggregate productivity in the counterfactual relative to the model. Many countries are in this category, and some countries are substantially below it, such as Australia, Sweden, and the United Kingdom. For all countries, the average change in relative aggregate productivity is only 6 p.p. in the counterfactual instead of 12.8 p.p. in the model.38 We conclude from this counterfactual that productivity catch-up in industry explains about 50% (6.8 p.p. of 12.8 p.p. in the model) of the relative aggregate productivity gains observed during the sample period.

Recall that, in contrast to agriculture and industry, there has been no substantial catch-up in services across countries and, as reported in Figure VI, there has been a decline in relative productivity in services for the richer countries. As a result, even though services represent an increasing share of output in the economy, we do not expect services to contribute much to catch-up in the model. This is confirmed in the third counterfactual, as productivity catch-up in services contributes about 15% of the catch-up in relative aggregate productivity (2.4 p.p. of 12.8 p.p. in the model). We note, however, that for countries that decline in relative aggregate productivity, lower growth in services than in the United States contributes substantially to this decline (−6.8 p.p. of −10.5 p.p. in the model; see Table III). Among the developed economies, which feature a large share of hours in services, Canada, New Zealand, and Sweden had lower productivity growth rates in services than the United States. In the model, Canada and New Zealand declined in relative aggregate productivity by 9 p.p. and 8 p.p., respectively, over the sample period, whereas Sweden observed a substantial catch-up in relative aggregate productivity but stagnated at around 82% during the mid-1970s. In the counterfactual, relative aggregate productivity increases by 3 p.p. in Canada, remains constant for New Zealand, and increases by 9 p.p. from the stagnated level in Sweden. Low productivity growth in services is

38. Note that among countries that decline in relative productivity the effect of industry growth is not systematic and the gaps are not as large.

essential for understanding these growth experiences of stagnation and decline among rich economies. Figure VI also documents that the level of relative productivity in services is lower than that of industry and that most countries failed to catch up in services to the relative level of industry. For instance, the average relative productivity in services increased from 46% in the first period to 49% in the last period in the sample, whereas the average relative productivity in industry increased from 51% to 75%. In the last period of the sample, all countries except Austria, France, Denmark, the United Kingdom, and New Zealand feature lower relative productivity in services than in industry. Moreover, in many instances the differences in productivity between services and industry are substantial: around 40% lower in services in Spain, Finland, and Norway, around 60% lower in Portugal, and around 80% lower in Korea and Ireland. These features imply that the service sector represents an increasing drag on aggregate productivity as resources are reallocated to this sector in the process of structural transformation. To illustrate the role of low productivity in services and the lack of catch-up in accounting for the growth experiences of slowdown, stagnation, and decline, we compute a counterfactual where we let productivity growth in services be such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country. Although the impact of these different productivity growth rates in services on labor reallocation is somewhat limited, the impact on growth experiences across countries is quite striking: for countries that catch up to the United States during the sample period, the average catch-up increases by almost 80% to 46 p.p., whereas for countries that decline there is instead a catch-up of 1.6 p.p. during the sample period. (See Table III.) 
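The services counterfactual just described can be illustrated with a short calculation. The sketch below is not the authors' procedure: it assumes a constant annual growth rate over the sample, and the function name, arguments, and input values are hypothetical. It solves for the growth rate of services labor productivity at which, in the last period, productivity in services relative to the United States equals relative productivity in industry.

```python
def required_services_growth(rel_industry_last, services_first,
                             us_services_last, periods):
    """Constant annual growth rate of services labor productivity such that
    services productivity relative to the US in the last period equals
    rel_industry_last (industry productivity relative to the US, last period).
    """
    if periods <= 0:
        raise ValueError("periods must be positive")
    # Level that services productivity must reach in the last period.
    target_level = rel_industry_last * us_services_last
    return (target_level / services_first) ** (1.0 / periods) - 1.0


# Illustrative (made-up) inputs, not taken from the paper's data.
g = required_services_growth(
    rel_industry_last=0.75,  # industry at 75% of the US level, last period
    services_first=0.46,     # services level in the first period
    us_services_last=2.0,    # US services level in the last period
    periods=40,
)

# Check: growing at rate g for 40 periods reaches the industry relative level.
implied_last = 0.46 * (1.0 + g) ** 40 / 2.0
print(round(g, 4), round(implied_last, 4))
```

The counterfactual in the text changes only the services growth rate; labor allocations and aggregate productivity then follow from the model, which this sketch does not reproduce.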
More important, these summary statistics hide the impact of productivity in services in explaining experiences of slowdown, stagnation, and decline observed in the time series. For this reason, Figure XII plots the time path of relative aggregate productivity for all country experiences of slowdown, stagnation, and decline in relative aggregate productivity. The solid lines represent the model and the dash-dotted lines represent the counterfactual. This figure clearly indicates the extent to which low productivity in services and the lack of catch-up account for all these poor growth experiences. To summarize, although productivity convergence in industry (and agriculture) is essential in the first stages of the process

FIGURE XII Relative Aggregate Productivity—The Importance of Services This counterfactual sets the productivity growth in services such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country. Each panel plots aggregate labor productivity relative to that of the United States in the model and the counterfactual for each country which, during the sample period, experienced an episode of slowdown, stagnation, or decline. The solid line represents the model and the dash-dotted line the counterfactual.

of structural transformation, poor relative performance in services has determined a slowdown, stagnation, and decline in aggregate productivity. In fact, in the last period of the sample, almost all countries observe a lower relative labor productivity in services than in aggregate. (See Figure XIII.) Because growth rate differences across countries in the service sector tend to be small and services represent a large and increasing share of hours in most countries, this suggests an increasing role of services in determining cross-country aggregate productivity outcomes.
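The contribution figures quoted in this section follow from Table III by subtraction: the contribution of catch-up in a sector is the change in the model minus the change in the counterfactual in which that sector grows at the U.S. rate. A minimal sketch using the "All countries" column of Table III; the function and variable names are illustrative, and shares computed from the rounded table entries are only approximate.

```python
# Average change in relative aggregate productivity (p.p.),
# "All countries" column of Table III.
MODEL_CHANGE = 12.8
COUNTERFACTUAL_CHANGE = {
    "agriculture": 11.5,  # counterfactual (1a)
    "industry": 6.0,      # counterfactual (1b)
    "services": 10.4,     # counterfactual (1c)
}


def contribution(model_change, counterfactual_change):
    """Return (contribution in p.p., share of the model change)."""
    diff = model_change - counterfactual_change
    return diff, diff / model_change


for sector, cf in COUNTERFACTUAL_CHANGE.items():
    pp, share = contribution(MODEL_CHANGE, cf)
    print(f"{sector:12s} {pp:4.1f} p.p.  {100 * share:4.0f}% of the model change")
```

The same subtraction applied to the "Decline countries" column gives the services contribution for declining countries cited in the text (−10.5 minus −3.7).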

FIGURE XIII
Labor Productivity in Services across Countries: Last Period
This figure plots relative labor productivity in services against relative aggregate productivity in the last period of the sample for all countries. (Scatter plot not reproduced. Vertical axis: relative labor productivity in services, last year. Horizontal axis: relative aggregate labor productivity, last year.)

IV.D. Discussion

Our analysis of structural transformation and aggregate productivity growth relies on a collection of closed economies. It is of interest to discuss the limitations and implications of this assumption for the results. Openness and trade can have two important effects in an economy. First, competition from trade can affect domestic productivity. Second, for an open economy, prices of traded goods reflect world market conditions and not just domestic factors.

Regarding the effect of trade on productivity, we argue that the closed-economy assumption is not as restrictive for our analysis as it may first appear. To see this point, notice that the effect of openness on labor allocations and aggregate productivity is already embedded in the measures of labor productivity growth by sector, which the analysis takes as given. For instance, we found that the growth rate of labor productivity in manufacturing for Korea was almost three times that of the United States. It is

likely that openness to trade during this period can help explain this fact. Moreover, openness would imply that productivity differences across countries for those goods that are most tradable would tend to be small relative to the differences for those goods that are less traded. The productivity implications of the model are consistent with this broad prediction, because differences in manufacturing productivity are smaller than productivity differences in services (mostly nontraded goods). It is an interesting question for future research to assess the importance of trade for productivity convergence in manufacturing across countries and the lack of convergence in services. Regarding the effect of trade on relative prices, recall that the closed-economy assumption implies a one-to-one mapping from sectoral productivity growth to relative prices. An open-economy version of the model would tend to produce a weaker link between domestic productivity growth and relative prices. In fact, in a small open economy, relative prices are invariant to domestic productivity. As we discussed earlier, the relative price implications of the model are broadly consistent with the data, which suggests that domestic productivity growth is a substantial component of the movements in relative prices. To put it differently, we found a strong correlation between changes in relative prices and labor productivity growth across countries, as documented in Figure IX. As a result, the labor allocations implied by the model are broadly consistent with the incentives that consumers face in these economies. We found that not all differences in relative prices are captured by the model. In particular, we found that the price of services relative to manufacturing increased faster in the model than in the data for many countries. 
This departure of the model from the data may arise not only from the closed-economy assumption, but also from other features of the data, such as price distortions and barriers to labor reallocation across sectors. Finally, note that standard open-economy models imply that the prices of traded goods are equalized across countries. The evidence, however, suggests large departures from the law of one price. For instance, the price exercise on agricultural goods from the FAO suggests large price differences across countries and the international macro literature documents large deviations in prices across countries even for highly tradable goods. Another potential avenue to assessing the limitations of the closed-economy assumption of the model would be to compare the consumption and production implications with the data. For

instance, in the closed economy, output and consumption shares are equal, but in the open economy, they would differ. Unfortunately, this implication cannot be tested directly, because consumption is measured as expenditures on final goods and any gap between production and consumption of goods may also be due to processing, distribution, and marketing services and other charges. But because for the more developed countries most of the trade occurs intra-industry, consumption and production shares of broad sectors tend not to differ greatly.

V. CONCLUSIONS

This paper highlights the role of sectoral labor productivity for the structural transformation and aggregate productivity experiences across countries. Using a model of the structural transformation that is calibrated to the growth experience of the United States, we showed that sectoral differences in labor productivity levels and growth explain the broad patterns of labor reallocation and aggregate productivity experiences across countries. We found that sectoral labor productivity differences across countries are large and systematic both at a point in time and over time. In particular, labor productivity differences between rich and poor countries are large in agriculture and services and smaller in manufacturing. Moreover, most countries have experienced substantial productivity catch-up in agriculture and industry, but productivity in services has remained low relative to the United States. An implication of these findings is that, as countries move through the process of structural transformation, relative aggregate labor productivity can first increase and later stagnate or decline. We find that labor productivity catch-up in manufacturing explains about 50% of the gains in aggregate productivity across countries and that low labor productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline in relative aggregate productivity across countries.
Our findings suggest that understanding the sources of sectoral differences in labor productivity levels and growth across countries is crucial in understanding the relative performance of countries. In analyzing sectoral labor productivity levels and growth rates across countries, a number of interesting questions arise. What factors contribute to cross-country differences in labor productivity across sectors? Why were countries able to catch up in manufacturing productivity but not in services? What are the

barriers that prevent other developed economies from sustaining growth rates of labor productivity in services as high as in the United States? How are trade openness and regulation related to these productivity differences across countries?

Although there may not be a unifying explanation for all these observations, a recurrent theme in productivity studies at the sectoral level is that the threat or actual pressure of competition is crucial for productivity performance; see, for instance, Schmitz (2005) and Galdón-Sánchez and Schmitz (2002). Because services are less traded than manufacturing goods, there is a tendency for services to be less subject to competitive pressure, which may explain the larger productivity gaps observed in services relative to manufacturing across countries. Moreover, protected domestic sectors may be the explanation for poor productivity performance in some countries. Because openness to trade would not generally have the desired competitive-pressure impact in services, other factors such as the regulatory environment may prove useful in explaining productivity differences across countries in this sector. For instance, the role of land and size regulations on productivity in retail services is often emphasized; see, for instance, Baily and Solow (2001). As a first pass at providing some empirical support for this potential explanation for productivity differences across countries, we have correlated labor productivity differences in industry and services derived from our model to measures of trade openness and government regulation. We find that trade openness is strongly correlated with industry productivity but less so with services productivity, whereas measures of regulation (such as that from the World Bank’s Doing Business) are strongly correlated with productivity in services. We leave a detailed investigation of these important issues for future research.
APPENDIX: DATA SOURCES AND DEFINITIONS

We build a panel data set with annual observations for aggregate GDP per hour and value added per hour and shares of hours for agriculture, industry, and services for 29 countries. The countries covered in our data set are, with sample period in parentheses, Argentina (1950–2004), Australia (1964–2004), Austria (1960–2004), Belgium (1956–2004), Bolivia (1950–2002), Brazil (1950–2003), Canada (1956–2004), Chile (1951–2004), Colombia (1950–2003), Costa Rica (1950–2002), Denmark (1960–2004), Finland (1959–2004), France (1969–2003), Greece (1960–2004), Ireland (1958–2004), Italy (1956–2004), Japan (1960–2004), Korea (1972–2003), Mexico (1950–2004), the Netherlands (1960–2004), New Zealand (1971–2004), Norway (1956–2004), Portugal (1956–2004), Spain (1960–2004), Sweden (1960–2004), Turkey (1960–2003), the United Kingdom (1956–2004), the United States (1956–2004), and Venezuela (1950–2004). All series are trended using the Hodrick–Prescott filter with a smoothing parameter λ = 100 before any ratios are computed.

A. Aggregate Data

We obtain data on PPP-adjusted real GDP per capita in constant prices (RGDPL) and population (POP) from Penn World Tables version 6.2; see Heston, Summers, and Aten (2006). We obtain data on employment (EMP) and annual hours actually worked per person employed (HOURS) from the Total Economy Database; see the Conference Board (2008). With these data we construct annual time series of PPP-adjusted GDP per hour in constant prices for each country as Y/L_h = RGDPL × POP/(EMP × HOURS).

B. Sectoral Data

We obtain annual data on employment, hours worked, and constant domestic-price value added for agriculture, industry, and services for the countries listed above. The sectors are defined by the International Standard Industrial Classification, revision 3 (ISIC III) definitions, with agriculture corresponding to ISIC divisions 1–5 (agriculture, forestry, hunting, and fishing), industry to ISIC divisions 10–45 (mining, manufacturing, construction, electricity, water, and gas), and services to ISIC divisions 50–99 (wholesale and retail trade, including hotels and restaurants; transport; and government, financial, professional, and personal services such as education, health care, and real estate services).

Value Added by Sector.
Value added by sector is obtained by combining data from the World Bank (2008) World Development Indicators online and historical data from the OECD National Accounts publications for the following countries: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Greece, Ireland, Italy, Japan, Korea, the Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Turkey, the United Kingdom, and the United States. The data series from the World Bank’s World Development Indicators are agriculture value added, industry value added, and services value added. All series are measured in constant local currency units, base year 2000 (with the exception of Turkey, whose base year is 1987). These series are extended backward using historical data from the OECD National Accounts publications, except for Korea. A combination of three OECD publications was used: National Accounts of OECD Countries (1950–1968), National Accounts of OECD Countries (1950–1961), and National Accounts of OECD Countries (1960–1977); see OECD (1963, 1970, 1979). The primary resource was the book covering the period from 1950 to 1968. We compute growth rates of the OECD data for corresponding variables for years prior to those available through the World Bank and apply them to the World Bank series.

Data on value added by sector for all Latin American countries in our data set (Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Mexico, and Venezuela) are obtained from the 10-Sector Database; see Timmer and de Vries (2009). This database has data on value added in constant local prices for ten sectors. These data are aggregated into value added in agriculture, industry, and services using the ISIC III definitions above.

Employment by Sector. The sectoral employment data are obtained from a variety of sources as well. We obtain data on civilian employment in each broad sector from the OECD (2008) Labor Force Statistics database online for Australia, Austria, Belgium, Canada, Finland, France, Ireland, Italy, Japan, the Netherlands, New Zealand, Norway, Spain, Turkey, the United Kingdom, and the United States. Data for Portugal on sectoral employment are obtained from the Banco de Portugal (2006). The data are aggregated into the same three broad sectors. We extend this series forward to 2005 by using growth rates for each variable computed from the EU KLEMS database; see O’Mahony and Timmer (2009). Data for Korea and all Latin American countries are obtained from the 10-Sector Database.
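The backward extension of a series with the growth rates of a historical source, as described for value added, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: both series are taken to be annual with no gaps, they overlap in the first year of the base series, and the function name and the numbers are hypothetical. Extending a series forward with growth rates from another source is the mirror image of the same operation.

```python
def extend_backward(base, historical):
    """Extend `base` backward in time using the growth rates of `historical`.

    Both arguments map year -> value. The historical series must be annual
    without gaps and must include the first year of the base series, which
    serves as the link year.
    """
    link = min(base)
    if link not in historical:
        raise ValueError("historical series must cover the first base year")
    extended = dict(base)
    for year in sorted((y for y in historical if y < link), reverse=True):
        # historical[year] / historical[year + 1] is the inverse of the gross
        # growth rate; apply it to the already-linked value of the next year.
        extended[year] = extended[year + 1] * historical[year] / historical[year + 1]
    return dict(sorted(extended.items()))


# Made-up example: the base series starts in 1960, the historical one in 1958.
base = {1960: 200.0, 1961: 210.0}
historical = {1958: 80.0, 1959: 90.0, 1960: 100.0}
spliced = extend_backward(base, historical)
print(spliced)
```

Only the growth rates of the historical series are used, so a difference in base year or units between the two sources does not affect the linked series.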
We aggregate these data into the three broad sectors using the ISIC III definitions above.

Hours Worked by Sector. We obtain data on hours of work per worker from the EU KLEMS database for Australia, Austria, Belgium, Denmark, Finland, France, Ireland, Italy, Japan, the Netherlands, Portugal, Spain, Sweden, the United Kingdom, and the United States. These data cover the period 1970 to 2005. Data for Brazil, Canada, Chile, Colombia, Costa Rica, Greece, Mexico, Norway, New Zealand, and Turkey are obtained from the International Labour Office (2008) Laborsta database. These series are much shorter; the time period covered varies by country, but it starts after 1990 for all countries. From these data, we compute the ratio of per-worker hours by sector relative to per-worker aggregate hours. In analyzing these ratios, we find that relative sectoral hours are remarkably stable over time for most countries and that these ratios are very close to one for many countries. Moreover, any deviations from one in relative hours across countries are not systematically related to the level of development. For each country, we use the average value of each of these ratios, denoted as h_i, i ∈ {a, m, s}, to calculate shares of hours by sector and value added per hour by sector. Because the time series of sectoral hours are shorter than those of sectoral employment and value added, this simplification allows us to compute sectoral shares of total hours and value added per hour without shortening the time series. We do not have data on sectoral hours for Argentina, Bolivia, Korea, and Venezuela, and we assume that h_i = 1 for these countries. Total hours by sector are computed by multiplying employment by hours per worker in each sector. We construct value added per hour by dividing the series of value added by the corresponding series of total hours for each sector. Shares of hours by sector are simply the ratio of total hours by sector relative to total aggregate hours.

Prices by Sector. We compute implicit producer price deflators for each sector using data on sectoral value added at constant and current prices from the World Development Indicators. The price data are consistent with the sectoral definitions for value added. They cover the period from 1971 to 2004.

DEPARTMENT OF ECONOMICS, UNIVERSITY OF TORONTO
DEPARTMENT OF ECONOMICS, UNIVERSITY OF TORONTO
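The appendix constructions of GDP per hour, total hours by sector, value added per hour, and shares of hours can be sketched as follows. The sketch rests on assumptions that go beyond the text: hours per worker in sector i are taken to be h_i times aggregate hours per worker, total aggregate hours are taken to be the sum of sectoral hours, and all function names and input values are hypothetical.

```python
SECTORS = ("a", "m", "s")  # agriculture, industry, services


def gdp_per_hour(rgdpl, pop, emp, hours):
    """GDP per hour: RGDPL x POP / (EMP x HOURS); POP and EMP in the same units."""
    return rgdpl * pop / (emp * hours)


def sectoral_measures(employment, value_added, agg_hours_per_worker, h=None):
    """Return (total hours, value added per hour, share of hours) by sector.

    employment, value_added: dicts keyed by sector for one country-year.
    agg_hours_per_worker: aggregate annual hours per person employed.
    h: average ratio of sectoral to aggregate hours per worker; defaults to 1
       in every sector, the assumption used when sectoral hours are missing.
    """
    if h is None:
        h = {i: 1.0 for i in SECTORS}
    # Total hours by sector: employment times hours per worker in the sector.
    hours = {i: employment[i] * h[i] * agg_hours_per_worker for i in SECTORS}
    total = sum(hours.values())
    va_per_hour = {i: value_added[i] / hours[i] for i in SECTORS}
    shares = {i: hours[i] / total for i in SECTORS}
    return hours, va_per_hour, shares


# Made-up inputs for a single country-year.
employment = {"a": 10.0, "m": 30.0, "s": 60.0}
value_added = {"a": 100.0, "m": 900.0, "s": 1200.0}
h = {"a": 1.1, "m": 1.0, "s": 0.95}
hours, va_per_hour, shares = sectoral_measures(employment, value_added, 2000.0, h)
print(shares)
print(va_per_hour)
```

Using a single average h_i per country, as the text describes, means the sectoral series are as long as the employment and value-added series even where sectoral hours data start late.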

REFERENCES

Adamopoulos, Tasso, and Ahmet Akyol, “Relative Underperformance Alla Turca,” Review of Economic Dynamics, 12 (2009), 697–717.
Baily, Martin, Diana Farrell, and Jaana Remes, “Domestic Services: The Hidden Key to Growth,” McKinsey Global Institute, 2005.
Baily, Martin, and Robert Solow, “International Productivity Comparisons Built from the Firm Level,” Journal of Economic Perspectives, 15 (2001), 151–172.
Banco de Portugal, “Séries Longas para a Economia Portuguesa pós II Guerra Mundial, 2006.” Available at http://www.bportugal.pt/publish/serlong/ serlong p.htm.
Baumol, William, “Macroeconomics of Unbalanced Growth: The Anatomy of Urban Crisis,” American Economic Review, 57 (1967), 415–426.
Caselli, Francesco, “Accounting for Cross-Country Income Differences,” in Handbook of Economic Growth, Philippe Aghion and Steven Durlauf, eds. (New York: North Holland Elsevier, 2005).


Caselli, Francesco, and Wilbur J. Coleman II, “The U.S. Structural Transformation and Regional Convergence: A Reinterpretation,” Journal of Political Economy, 109 (2001), 584–616.
Caselli, Francesco, and Silvana Tenreyro, “Is Poland the Next Spain?” in NBER International Seminar on Macroeconomics 2004, Richard Clarida, Jeffrey Frankel, Francesco Giavazzi, and Kenneth West, eds. (Cambridge, MA: The MIT Press, 2006).
Chanda, Areendam, and Carl-Johan Dalgaard, “Dual Economies and International Total Factor Productivity Differences: Channelling the Impact from Institutions, Trade, and Geography,” Economica, 75 (2008), 629–661.
Chari, Varadarajan V., Patrick Kehoe, and Ellen McGrattan, “The Poverty of Nations: A Quantitative Exploration,” NBER Working Paper No. 5414, 1996.
Coleman, Wilbur J., “Accommodating Emerging Giants,” Mimeo, Duke University, 2007.
Conference Board, Total Economy Database, 2008. Available at www.conference-board.org/economics/.
Córdoba, Juan, and Marla Ripoll, “Agriculture, Aggregation, and Development Accounting,” Mimeo, University of Pittsburgh, 2004.
Duarte, Margarida, and Diego Restuccia, “The Productivity of Nations,” Federal Reserve Bank of Richmond Economic Quarterly, 92 (2006), 195–223.
——, “The Structural Transformation and Aggregate Productivity in Portugal,” Portuguese Economic Journal, 6 (2007), 23–46.
Echevarria, Cristina, “Changes in Sectoral Composition Associated with Growth,” International Economic Review, 38 (1997), 431–452.
Galdón-Sánchez, José, and James Schmitz Jr., “Competitive Pressure and Labor Productivity: World Iron-Ore Markets in the 1980’s,” American Economic Review, 92 (2002), 1222–1235.
Gollin, Douglas, Stephen Parente, and Richard Rogerson, “The Role of Agriculture in Development,” American Economic Review Papers and Proceedings, 92 (2002), 160–164.
——, “The Food Problem and the Evolution of International Income Levels,” Journal of Monetary Economics, 54 (2007), 1230–1255.
Hansen, Gary, and Edward C. Prescott, “From Malthus to Solow,” American Economic Review, 92 (2002), 1205–1217.
Herrendorf, Berthold, and Ákos Valentinyi, “Which Sectors Make the Poor Countries So Unproductive?” Mimeo, Arizona State University, 2006.
Heston, Alan, Robert Summers, and Bettina Aten, “Penn World Table Version 6.2,” Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, 2006. Available at http://pwt.econ.upenn.edu.
International Labour Office, “LABORSTA Database,” Bureau of Statistics, 2008. Available at http://laborsta.ilo.org/.
Jones, Charles, “On the Evolution of the World Income Distribution,” Journal of Economic Perspectives, 11 (1997), 19–36.
Kehoe, Timothy, and Edward C. Prescott, “Great Depressions of the 20th Century,” Review of Economic Dynamics, 5 (2002), 1–18.
Kongsamut, Piyabha, Sérgio Rebelo, and Danyang Xie, “Beyond Balanced Growth,” Review of Economic Studies, 68 (2001), 869–882.
Kuznets, Simon, Modern Economic Growth (New Haven, CT: Yale University Press, 1966).
Laitner, John, “Structural Change and Economic Growth,” Review of Economic Studies, 67 (2000), 545–561.
Lucas, Robert, “Some Macroeconomics for the 21st Century,” Journal of Economic Perspectives, 14 (2000), 159–168.
Maddison, Angus, “Economic Growth and Structural Change in the Advanced Countries,” in Western Economies in Transition, Irving Leveson and Jimmy Wheeler, eds. (London: Croom Helm, 1980).
Ngai, Rachel, “Barriers and the Transition to Modern Growth,” Journal of Monetary Economics, 51 (2004), 1353–1383.
Ngai, Rachel, and Christopher Pissarides, “Structural Change in a Multisector Model of Growth,” American Economic Review, 97 (2007), 429–443.
OECD, National Accounts of OECD Countries: Detailed Tables, Volume II, 1950–1961 (Paris, France: OECD, 1963).


——, National Accounts of OECD Countries: Detailed Tables, Volume II, 1950–1968 (Paris, France: OECD, 1970).
——, National Accounts of OECD Countries: Detailed Tables, Volume II, 1960–1977 (Paris, France: OECD, 1979).
——, Labor Force Statistics, 2008. Available at http://hermia.sourceoecd.org/vl=718832/cl=16/nw=1/rpsv/outlookannuals.htm.
O’Mahony, Mary, and Marcel P. Timmer, “Output, Input and Productivity Measures at the Industry Level: The EU KLEMS Database,” Economic Journal, 119 (2009), F374–F403. Available at www.euklems.net.
Pilat, Dirk, “Labour Productivity Levels in OECD Countries: Estimates for Manufacturing and Selected Service Sectors,” OECD Working Paper No. 169, 1996.
Prescott, Edward C., “Prosperity and Depression,” American Economic Review, 92 (2002), 1–15.
Restuccia, Diego, Dennis Yang, and Xiaodong Zhu, “Agriculture and Aggregate Productivity: A Quantitative Cross-Country Analysis,” Journal of Monetary Economics, 55 (2008), 234–250.
Rogerson, Richard, “Structural Transformation and the Deterioration of European Labor Market Outcomes,” Journal of Political Economy, 116 (2008), 235–259.
Schmitz, James Jr., “What Determines Productivity? Lessons from the Dramatic Recovery of the U.S. and Canadian Iron Ore Industries Following Their Early 1980s Crisis,” Journal of Political Economy, 113 (2005), 582–625.
Summers, Robert, and Alan Heston, “The Penn World Table: An Expanded Set of International Comparisons, 1950–1988,” Quarterly Journal of Economics, 106 (1991), 327–368.
Timmer, Marcel P., and Gaaitzen J. de Vries, “Structural Change and Growth Accelerations in Asia and Latin America: A New Sectoral Data Set,” Cliometrica, 3 (2009), 165–190. Available at www.ggdc.net.
U.S. Census Bureau, Department of Commerce, Historical Statistics of the United States: Colonial Times to 1970 (Part I) (Washington, DC: U.S. Government Printing Office, 1975).
Vollrath, Dietrich, “How Important Are Dual Economy Effects for Aggregate Productivity?” Journal of Development Economics, 88 (2009), 325–334.
World Bank, World Development Indicators, 2008. Available at http://devdata.worldbank.org/dataonline/.

TEACHER QUALITY IN EDUCATIONAL PRODUCTION: TRACKING, DECAY, AND STUDENT ACHIEVEMENT∗

JESSE ROTHSTEIN

Growing concerns over the inadequate achievement of U.S. students have led to proposals to reward good teachers and penalize (or fire) bad ones. The leading method for assessing teacher quality is “value added” modeling (VAM), which decomposes students’ test scores into components attributed to student heterogeneity and to teacher quality. Implicit in the VAM approach are strong assumptions about the nature of the educational production function and the assignment of students to classrooms. In this paper, I develop falsification tests for three widely used VAM specifications, based on the idea that future teachers cannot influence students’ past achievement. In data from North Carolina, each of the VAMs’ exclusion restrictions is dramatically violated. In particular, these models indicate large “effects” of fifth grade teachers on fourth grade test score gains. I also find that conventional measures of individual teachers’ value added fade out very quickly and are at best weakly related to long-run effects. I discuss implications for the use of VAMs as personnel tools.

I. INTRODUCTION

Parallel literatures in labor economics and education adopt similar econometric strategies for identifying the effects of firms on wages and of teachers on student test scores. Outcomes are modeled as the sum of firm or teacher effect, individual heterogeneity, and transitory, orthogonal error. The resulting estimates of firm effects are used to gauge the relative importance of firm and worker heterogeneity in the determination of wages. In education, so-called “value added” models (hereafter, VAMs) have been used to measure the importance of teacher quality to educational production, to assess teacher preparation and certification programs, and as important inputs to personnel evaluations and merit pay programs.1

∗ Earlier versions of this paper circulated under the title “Do Value Added Models Add Value?” I am grateful to Nathan Wozny and Enkeleda Gjeci for exceptional research assistance. I thank Orley Ashenfelter, Henry Braun, David Card, Henry Farber, Bo Honoré, Brian Jacob, Tom Kane, Larry Katz, Alan Krueger, Sunny Ladd, David Lee, Lars Lefgren, Austin Nichols, Amine Ouazad, Mike Rothschild, Cecilia Rouse, Diane Schanzenbach, Eric Verhoogen, Tristan Zajonc, anonymous referees, and conference and seminar participants for helpful conversations and suggestions. I also thank the North Carolina Education Data Research Center at Duke University for assembling, cleaning, and making available the confidential data used in this study. Financial support was generously provided by the Princeton Industrial Relations Section and Center for Economic Policy Studies and the U.S. Department of Education (under Grant R305A080560).

1. On firm effects, see, for example, Abowd and Kramarz (1999). For recent examinations of teacher effects modeling, see McCaffrey et al. (2003); Wainer (2004); Braun (2005a, 2005b); and Harris and Sass (2006).

© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010

All of these applications suppose that the estimates can be interpreted causally. But observational analyses can identify causal effects only under unverifiable assumptions about the correlation between treatment assignment—the assignment of students to teachers, or the matching of workers to firms—and other determinants of test scores and wages. If these assumptions do not hold, the resulting estimates of teacher and firm effects are likely to be quite misleading. Anecdotally, assignments of students to teachers incorporate matching to take advantage of teachers’ particular specialties, intentional separation of children who are known to interact badly, efforts on the principal’s part to reward favored teachers through the allocation of easy-to-teach students, and parental requests (see, e.g., Monk [1987]; Jacob and Lefgren [2007]). These are difficult to model statistically. Instead, VAMs typically assume that teacher assignments are random conditional on a single (observed or latent) factor. In this paper, I develop and implement tests of the exclusion restrictions of commonly used value added specifications. My strategy exploits the fact that future teachers cannot have causal effects on past outcomes, whereas violations of model assumptions may lead to apparent counterfactual “effects” of this form. Test scores, like wages, are serially correlated, and as a result an association between the current teacher and the lagged score is strong evidence against exogeneity with respect to the current score. I examine three commonly used VAMs, two of which have direct parallels in the firm effects literature. In the simplest, most widely used VAM—which resembles the most common specification for firm effects—the necessary exclusion restriction is that teacher assignments are orthogonal to all other determinants of the so-called “gain” score, the change in a student’s test score over the course of the year. 
If this restriction holds, fifth grade teacher assignments should not be correlated with students’ gains in fourth grade. Using a large microdata set describing North Carolina elementary students, I find that there is in fact substantial within-school dispersion of students’ fourth grade gains across fifth grade classrooms. Sorting on past reading gains is particularly prominent, though there is clear evidence of sorting on math gains as well. Because test scores exhibit strong mean reversion—and thus gains are negatively autocorrelated—sorting on past gains produces bias in the simple VAM’s estimates.


The other VAMs that I consider rely on different exclusion restrictions, namely that classroom assignments are as good as random conditional on either the lagged test score or the student’s (unobserved, but permanent) ability. I discuss how similar strategies can be used to test these restrictions as well. I find strong evidence in the data against each. Evidently, classroom assignments respond dynamically to annual achievement in ways that are not captured by the controls typically included in VAM specifications. To evaluate the magnitude of the biases that assignments produce, I compare common VAMs to a richer model that conditions on the complete achievement history. Estimated teacher effects from the rich model diverge importantly from those obtained from the simple VAMs in common use. I discuss how selection on unobservables is likely to produce substantial additional biases. I use a simple simulation to explore the sensitivity of teacher rankings to these biases. Under plausible assumptions, simple VAMs can be quite misleading. The rich VAM that controls for all observables does better, but still yields rankings that diverge meaningfully from the truth. My estimates also point to an important substantive result. To the extent that any of the VAMs that I consider identify causal effects, they indicate that teachers’ long-run effects are at best weakly proxied by their immediate impacts. A teacher’s effect in the year of exposure—the universal focus of value added analyses—is correlated only .3 to .5 with her cumulative effect over two years, and even less with her effect over three years. Accountability policies that rely on measures of short-term value added would do an extremely poor job of rewarding the teachers who are best for students’ longer-run outcomes. An important caveat to the empirical results is that they may be specific to North Carolina. 
Students in other states or in individual school districts might be assigned to classrooms in ways that satisfy the assumptions required for common VAMs. But at the least, VAM-style analyses should attempt to evaluate the model assumptions, perhaps with methods like those used here. Models that rely on incorrect assumptions are likely to yield misleading estimates, and policies that use these estimates in hiring, firing, and compensation decisions may reward and punish teachers for the students they are assigned as much as for their actual effectiveness in the classroom.

Section II reviews the use of preassignment variables to test exogeneity assumptions. Section III introduces the three VAMs, discusses their implicit assumptions, and describes my proposed tests. Section IV describes the data. Section V presents results. Section VI attempts to quantify the biases that nonrandom classroom assignments produce in VAM-based analyses. Section VII presents evidence on teachers’ long-run effects. I conclude, in Section VIII, by discussing some implications for the design of incentive pay systems in education.

II. USING PANEL DATA TO TEST EXCLUSION RESTRICTIONS

A central assumption in all econometric studies of treatment effects is that the treatment is uncorrelated with other determinants of the outcome, conditional on covariates. Although the assumption is ultimately untestable—the “fundamental problem of causal inference” (Holland 1986)—the data can provide indications that it is unlikely to hold. In experiments, for example, significant correlations between treatment and preassignment variables are interpreted as evidence that randomization was unsuccessful.2

Panel data can be particularly useful. A correlation between treatment and some preassignment variable X need not indicate bias in the estimated treatment effect if X is uncorrelated with the outcome variable of interest. But outcomes are typically correlated within individuals over time, so an association between treatment and the lagged outcome strongly suggests that the treatment is not exogenous with respect to posttreatment outcomes.

This insight has been most fully explored in the literature on the effect of job training on wages and employment. Today’s wage or employment status is quite informative about tomorrow’s, even controlling for all observables. Evidence that assignment to job training is correlated with lagged wage dynamics indicates that simple specifications for the effect of training on outcomes are likely to yield biased estimates (Ashenfelter 1978). Richer models of the training assignment process may absorb this correlation while permitting identification (Heckman, Hotz, and Dabos 1987). But even these models may impose testable restrictions on the relationship between treatment and the outcome history (Ashenfelter and Card 1985; Card and Sullivan 1988; Jacobson, LaLonde, and Sullivan 1993).3

In value added studies, the multiplicity of teacher “treatments” can blur the connection to program evaluation methods. But the utility of past outcomes for specification diagnostics carries over directly. Identification of a teacher’s effect rests on assumptions about the relationship between the teacher assignment and the other determinants of future achievement, and the relationship with past achievement can be informative about the plausibility of these assumptions.

Only a few studies have attempted to validate VAMs. Harris and Sass (2007) and Jacob and Lefgren (2008) show that value added coefficients are weakly but significantly correlated with principals’ ratings of teacher performance. Of course, if principal decisions about classroom assignments created bias in the VAMs, causality could run from principal opinions to estimated value added rather than the reverse. More relevant to the current analysis, Kane and Staiger (2008) demonstrate that VAM estimates from observational data are approximately unbiased predictors of teachers’ effects when students are randomly assigned.

Although I examine a question closely related to that considered by Kane and Staiger, my larger and more representative sample permits me to extend their analysis in two ways. First, I have much more statistical power. This enables me to identify biases that are substantively important but that lie well within Kane and Staiger’s confidence intervals. Second, my sample resembles the sort that would be used for any VAM intended as a teacher compensation or retention tool. In particular, it includes teachers specializing in students (e.g., late readers) who cannot be readily identified and excluded from large-scale analyses. The likely exclusion of such teachers from Kane and Staiger’s sample quite plausibly avoids the most severe biases in observational VAM estimates.4

2. Similar tests are often used in nonexperimental analyses: Researchers conducting propensity score matching studies frequently check for “balance” of covariates conditional on the propensity score (Rosenbaum and Rubin 1984), and Imbens and Lemieux (2008) recommend analogous tests for regression discontinuity analyses.

3. Of course, these sorts of tests cannot diagnose all model violations. If treatment assignments depend on unobserved determinants of future outcomes that are uncorrelated with the outcome history, the treatment effect estimator may be biased even though treatment is uncorrelated with past outcomes.

4. In the Kane and Staiger experiment, principals were given the name of one teacher and asked to identify a comparison teacher such that it would be appropriate to randomly assign students within the pair. One imagines that principals generally chose a comparison who was assigned similar students as the focal teacher in the preexperimental data. Moreover, a substantial majority of principals declined to participate, perhaps because the initial teacher was a specialist for whom no similar comparison could be found.
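The diagnostic logic of this section can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a hypothetical two-period setting with a persistent individual component, and it compares random assignment with a rule that treats everyone whose lagged outcome is below the median. The function name `simulate_training`, the variance choices, and the true effect of 1.0 are all illustrative assumptions.

```python
import random
import statistics


def simulate_training(n, sort_on_lagged, effect=1.0, seed=0):
    """Return (t-stat of treatment on the lagged outcome, naive effect estimate).

    Hypothetical setup: outcome = persistent component + transitory noise,
    plus `effect` for treated individuals in the post period only.
    """
    rng = random.Random(seed)
    mu = [rng.gauss(0, 1) for _ in range(n)]          # persistent heterogeneity
    y_lag = [m + rng.gauss(0, 1) for m in mu]         # pre-assignment outcome
    if sort_on_lagged:
        cutoff = statistics.median(y_lag)
        treated = [y < cutoff for y in y_lag]         # low lagged outcome -> treated
    else:
        treated = [rng.random() < 0.5 for _ in range(n)]
    y_post = [m + rng.gauss(0, 1) + effect * d for m, d in zip(mu, treated)]

    def diff_and_t(values):
        a = [v for v, d in zip(values, treated) if d]
        b = [v for v, d in zip(values, treated) if not d]
        diff = statistics.mean(a) - statistics.mean(b)
        se = (statistics.variance(a) / len(a)
              + statistics.variance(b) / len(b)) ** 0.5
        return diff, diff / se

    _, t_lag = diff_and_t(y_lag)      # placebo: treatment "effect" on the past
    naive, _ = diff_and_t(y_post)     # naive post-period difference in means
    return t_lag, naive


if __name__ == "__main__":
    for label, flag in (("random", False), ("sorted on lagged outcome", True)):
        t_lag, naive = simulate_training(2000, flag)
        print(f"{label}: placebo t = {t_lag:.1f}, naive estimate = {naive:.2f}")
```

Under random assignment the placebo statistic should be small and the naive estimate close to the assumed effect. Under sorting, the association between treatment and the lagged outcome is large, and because outcomes are correlated within individuals the naive estimate is pulled well away from the assumed effect.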


III. STATISTICAL MODEL AND METHODS

This section develops the statistical framework for VAM analysis and introduces my tests. I begin by defining the parameters of interest in Section III.A. In Section III.B, I introduce the three VAMs that I consider. Section III.C describes the exclusion restrictions that each VAM requires to permit identification of the causal effects of interest and develops the implications of these restrictions for the relationship between the current teacher and lagged outcome. Section III.D discusses the implementation of the tests.

III.A. Defining the Problem

I take the parameter of interest in value added modeling to be the effect on a student’s test score at the end of grade $g$ of being assigned to a particular grade-$g$ classroom rather than to another classroom at the same school. Later, I extend this to look at dynamic treatment effects (that is, the effect of the grade-$g$ classroom on the $g + s$ score). I do not distinguish between classroom and teacher effects, and use the terms interchangeably. In the Online Appendix, I consider this distinction, defining a teacher’s effect as the time-invariant component of the effects of the classrooms taught by the teacher over several years. The basic conclusions are unaffected by this redefinition.

I am interested in whether common VAMs identify classroom effects with arbitrarily large samples. I therefore sidestep small-sample issues by considering the properties of VAM estimates as the number of students grows with the number of teachers (and classrooms) fixed.5 If classroom effects are identified under these unrealistic asymptotics, VAMs may be usable in compensation and retention policy with appropriate allowances for the sampling errors that arise with finite class sizes;6 if not, these corrections are likely to go awry.

A final important distinction is between identification of the variance of teacher quality and identification of individual teachers’ effects. I focus exclusively on the latter. It is impractical to report each of several thousand teachers’ estimated effects, however. I therefore report only the implied standard deviations (across teachers) of teachers’ actual and counterfactual effects, along with tests of the hypothesis that the teacher effects are all zero.7

5. Under realistic asymptotics, the number of classrooms should rise in proportion to the number of students. If so, classroom effects are not identified under any exogeneity restrictions: Even in the asymptotic limit, the number of students per teacher remains finite and the sampling error in an individual teacher’s effect remains nontrivial.

6. A typical approach shrinks a teacher’s estimated effect toward the population mean in proportion to the degree of imprecision in the estimate. The resulting empirical Bayes estimate is the best linear predictor of the teacher’s true effect, given the noisy estimate. See McCaffrey et al. (2003, pp. 63–68).

III.B. Data Generating Process and the Three VAMs

I develop the three VAMs and the associated tests in the context of a relatively general educational production function, modeled on those used by Todd and Wolpin (2003) and Harris and Sass (2006), that allows student achievement to depend on the full history of inputs received to date plus the student’s innate ability. Separating classroom effects from other inputs, I assume that the test score of student $i$ at the end of grade $g$, $A_{ig}$, can be written as

$$A_{ig} = \alpha_g + \sum_{h=1}^{g} \beta_{hgc(i,h)} + \mu_i \tau_g + \sum_{h=1}^{g} \varepsilon_{ih}\phi_{hg} + v_{ig}. \tag{1}$$

Here, $\beta_{hgc}$ is the effect of being in classroom $c$ in grade $h$ on the grade-$g$ test score, and $c(i, h) \in \{1, \ldots, J_h\}$ indexes the classroom to which student $i$ is assigned in grade $h$. $\mu_i$ is individual ability. We might expect the achievement gap between high-ability and low-ability students to grow over time; this would correspond to $\tau_k > \tau_g > 0$ for each $k > g$. $\varepsilon_{ih}$ captures all other inputs in grade $h$, including those received from the family, nonclassroom peers, and the community. It might also include developmental factors: A precocious child might have positive $\varepsilon$s in early grades and negative $\varepsilon$s in later grades as her classmates caught up. As this example shows, $\varepsilon$ is quite likely to be serially correlated within students across grades. Finally, $v_{ig}$ represents measurement error in the grade-$g$ test relative to the student’s “true” grade-$g$ achievement. This is independent across grades within students.8

A convenient restriction on the time pattern of classroom effects is uniform geometric decay, $\beta_{hg'c} = \beta_{hgc}\lambda^{g'-g}$ for some $0 \le \lambda \le 1$ and all $h \le g \le g'$. A special case is $\lambda = 1$, corresponding to perfect persistence. Although my results do not depend on these restrictions, I impose them as needed for notational simplicity.

7. Rivkin, Hanushek, and Kain (2005) develop a strategy for identifying the variance of teachers’ effects, but not the effect of individual teachers, under weaker assumptions than are required by the VAMs described below.

8. I define the $\beta$ parameters to include any classroom-level component of $v_{ig}$ and assume that $v_{ig}$ is independent across students in the same classroom.
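To make the production function (1) concrete, the following sketch simulates scores under uniform geometric decay of classroom effects. It is only an illustration under assumptions that are not in the paper: $\alpha_g = 0$ and $\tau_g = 1$ in every grade, nonclassroom inputs that decay as $\phi_{hg} = \phi^{g-h}$, and purely random classroom assignment. The function name `simulate_scores` and all parameter values are hypothetical.

```python
import random


def simulate_scores(n_students=200, n_grades=3, n_classes=4, lam=1.0,
                    sd_mu=1.0, sd_eps=0.5, sd_v=0.3, phi=0.5, seed=0):
    """Simulate test scores from a stylized version of equation (1).

    Assumptions (illustrative only): alpha_g = 0, tau_g = 1,
    beta_{hgc} = beta_{hhc} * lam**(g - h), phi_{hg} = phi**(g - h),
    and random assignment of students to classrooms in every grade.
    """
    rng = random.Random(seed)
    # beta[h][c]: immediate effect of classroom c in grade h
    beta = [[rng.gauss(0, 0.2) for _ in range(n_classes)]
            for _ in range(n_grades)]
    # assign[i][h]: classroom of student i in grade h
    assign = [[rng.randrange(n_classes) for _ in range(n_grades)]
              for _ in range(n_students)]
    scores = []
    for i in range(n_students):
        mu = rng.gauss(0, sd_mu)                           # ability mu_i
        eps = [rng.gauss(0, sd_eps) for _ in range(n_grades)]
        row = []
        for g in range(n_grades):
            classroom = sum(beta[h][assign[i][h]] * lam ** (g - h)
                            for h in range(g + 1))         # decayed teacher inputs
            other = sum(eps[h] * phi ** (g - h) for h in range(g + 1))
            row.append(classroom + mu + other + rng.gauss(0, sd_v))
        scores.append(row)
    return beta, assign, scores
```

With `lam=1.0` and the noise terms switched off (`sd_eps=0`, `sd_v=0`), each student's gain between consecutive grades equals the effect of the current classroom, which is the perfect-persistence special case described above.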


I consider nonuniform decay in Section VII. Note that there is no theoretical basis for restrictions on the decay of nonclassroom effects (i.e., on $\phi_{hg}$).

It will be useful to adopt some simplifying notation. Let $\omega_{ig} \equiv \sum_{h=1}^{g} \varepsilon_{ih}\phi_{hg}$ be the composite grade-$g$ residual achievement, and let $\Delta$ indicate first differences across student grades: $\Delta\beta_{hgc} \equiv \beta_{hgc} - \beta_{h,g-1,c}$, $\Delta\tau_g \equiv \tau_g - \tau_{g-1}$, $\Delta\omega_{ig} \equiv \omega_{ig} - \omega_{ig-1}$, and so on.

Tractable VAMs amount to decompositions of $A_{ig}$ (or, more commonly, of $\Delta A_{ig} \equiv A_{ig} - A_{ig-1}$) into the current teacher’s effect $\beta_{ggc(i,g)}$, a student heterogeneity component, and an error assumed to be orthogonal to the classroom assignment. Models differ in the form of this decomposition. In this paper I consider three specifications: a simple regression of gain scores on grade and contemporaneous classroom indicators,

$$\text{VAM1:}\quad \Delta A_{ig} = \alpha_g + \beta_{ggc(i,g)} + e_{1ig};$$

a regression of score levels (or, equivalently, of gains) on classroom indicators and the lagged score,

$$\text{VAM2:}\quad A_{ig} = \alpha_g + A_{ig-1}\lambda + \beta_{ggc(i,g)} + e_{2ig};$$

and a regression that stacks gain scores from several grades and adds student fixed effects,

$$\text{VAM3:}\quad \Delta A_{ig} = \alpha_g + \beta_{ggc(i,g)} + \mu_i + e_{3ig}.$$

All three are widely used.9 VAM2 and VAM3 can both be seen as generalizations of VAM1: Constraining $\lambda = 1$ converts VAM2 to VAM1, whereas constraining $\mu_i \equiv 0$ converts VAM3.

III.C. Exclusion Restrictions and Falsification Tests

Despite their similarity, the three VAMs rely on quite distinct restrictions on the process by which students are assigned to classrooms. I discuss the three in turn.

9. The most widely used VAM, the Tennessee Value Added Assessment System (TVAAS; see Sanders, Saxton, and Horn [1997]), is specified as a mixed model for level scores that depend on the full history of classroom assignments, but this model implies an equation for annual gain scores of the form used in VAM1. VAM2 is more widely used in the recent economics literature. See, for example, Aaronson, Barrow, and Sander (2007); Goldhaber (2007); Jacob and Lefgren (2008); and Kane, Rockoff, and Staiger (2008). VAM3 was proposed by Boardman and Murnane (1979) and has been used recently by Rivkin, Hanushek, and Kain (2005); Harris and Sass (2006); Boyd et al. (2007); and Jacob and Lefgren (2008).
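As a rough illustration of how the VAM1 and VAM2 regressions of Section III.B could be computed for a single grade in a single school, the sketch below uses classroom means for VAM1 and within-classroom demeaning for VAM2 (which gives the same coefficients as OLS on the lagged score plus a full set of classroom indicators). This is a minimal sketch, not the paper's estimation code: the function names and the normalization of effects to average zero across classrooms are assumptions made here.

```python
from collections import defaultdict


def class_means(values, classes):
    """Mean of `values` within each classroom label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, classes):
        sums[c] += v
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def vam1(score, lagged, classes):
    """Gain-score model: classroom effect = mean gain, centered across classrooms."""
    gains = [a - b for a, b in zip(score, lagged)]
    means = class_means(gains, classes)
    center = sum(means.values()) / len(means)
    return {c: m - center for c, m in means.items()}


def vam2(score, lagged, classes):
    """Lagged-score model: OLS of score on lagged score and classroom dummies."""
    m_y = class_means(score, classes)
    m_x = class_means(lagged, classes)
    # Slope on the lagged score from within-classroom variation only.
    num = sum((x - m_x[c]) * (y - m_y[c])
              for x, y, c in zip(lagged, score, classes))
    den = sum((x - m_x[c]) ** 2 for x, c in zip(lagged, classes))
    lam = num / den
    raw = {c: m_y[c] - lam * m_x[c] for c in m_y}
    center = sum(raw.values()) / len(raw)
    return lam, {c: r - center for c, r in raw.items()}


# Toy data (hypothetical): score = 0.5 * lagged + classroom effect, no noise,
# with classroom 1 receiving students who have higher lagged scores.
classes = [0, 0, 0, 1, 1, 1]
lagged = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
score = [1.5, 2.0, 2.5, 4.0, 5.0, 6.0]
lam_hat, effects2 = vam2(score, lagged, classes)
effects1 = vam1(score, lagged, classes)
```

In this toy example the lagged-score model recovers the coefficient of 0.5 and a gap of 2 between the two classrooms, whereas the gain-score model, which imposes $\lambda = 1$, reports a gap of only 1 because the classroom with the larger effect also received students with higher lagged scores.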


The Gain Score Model (VAM1). First-differencing the production function (1), we can write the grade-g gain score as (2) Aig = αg +

g−1

βhgc(i,h) + βggc(i,g) + μi τg + ωig + vig .

h=1

If we assume that teacher effects do not decay, βhgc = 0 for all h < g. The error term e1ig from VAM1 then has three components: e1ig = μi τg + ωig + vig . VAM1 will yield consistent estimates of the grade-g classroom effects only if, for each c, (3)

E[e1ig | c(i, g) = c] = 0.

The most natural model that is consistent with (3) is for assignments to depend only on student ability, μi , and for ability to have the same effect on achievement in grades g and g − 1 (i.e., τg = 0). With these restrictions, VAM1 can be seen as the firstdifference estimator for a fixed effects model, with strict exogeneity of classroom assignments conditional on μi . By contrast, (3) is not likely to hold if c (i, g) depends, even in part, on ωig−1 , vig−1 , or Aig−1 . Differences in last year’s gains across this year’s classrooms are informative about the exclusion restriction. Using (2), the average g − 1 gain in classroom c is (4) E[ Aig−1 | c(i, g) = c] = αg−1 + E[βg−1,g−1,c(i, g−1) | c(i, g) = c] + E[e1ig−1 | c(i, g) = c]. The first term is constant across c and can be neglected. The second term might vary with c if (for example) a principal compensated for a bad teacher in grade g − 1 by assignment to a better-than-average teacher in grade g. This can be absorbed by examining the across-c (i, g) variation in Aig−1 controlling for c (i, g − 1). I estimate specifications of this form below.10 Any 10. This is a test of the hypothesis that students are randomly assigned to grade-g classrooms conditional on the g − 1 classroom. This test is uninformative unless there is independent variation in c (i, g − 1) and c (i, g). To take one example, Nye, Konstantopoulos, and Hedges (2004) use data from the Tennessee STAR class size experiment to study teacher effects. In STAR, “streaming” was quite common, and in many schools there is zero independent variation in third grade classroom assignments controlling for second grade assignments. In this case, identification of teacher effects rests entirely on the assumption that past teachers’ effects do not decay.

184

QUARTERLY JOURNAL OF ECONOMICS

remaining variation across grade-g classrooms in g − 1 gains, after controlling for g − 1 classroom assignments, must indicate that students are sorted into grade-g classrooms on the basis of e1ig−1 . Sorting on e1ig−1 would not necessarily violate (3) if e1ig were not serially correlated. But the definition of e1ig above indicates four sources of potential serial correlation. First, ability μi appears in both e1ig and e1ig−1 (unless τg = 0). Second, the εig process may be serially correlated. Third, even if ε is white noise, ωig is a moving average of order g − 1 (absent strong restrictions on the φ coefficients). Finally, vig is an MA(1), degenerate only if var(v) = 0.11 Thus, (3) is not likely to hold if E[e1ig−1 | c(i, g)] is nonzero. The Lagged Score Model (VAM2). VAM2 frees up the coefficient on the lagged test score. If teacher effects decay geometrically at uniform rate 1 − λ, the grade-g score can be written in terms of the g − 1 score, (5)

Aig = αˇ g + Aig−1 λ + βggc(i, g) + e2ig ,

where αˇ g = αg − αg−1 λ. This can equivalently be expressed as a model for the grade-g gain, by subtracting Aig−1 from each side of (5). In either case, the error is (6) e2ig = μi τg − τg−1 λ + εih φhg − φhg−1 λ + εig + vig − vig−1 λ . g−1

h=1

As before, each of the terms in (6) is likely to be serially correlated. The exclusion restriction for VAM2 is that e2ig is uncorrelated with c (i, g) conditional on Aig−1 . This would hold if c (i, g) were randomly assigned conditional on Aig−1 . It is unlikely to hold if assignments depend on e2ig−1 or on any of its components (including μi ).12 As with the VAM1, I test the VAM2 exclusion restriction by 11. In Rothstein (2008), I conclude that vig accounts for as much as 80% of the variance of Aig . 12. Alternatively, if τg − τg−1 λ is constant across g, (5) can be seen as a fixed effects model with a lagged dependent variable. λ and βgg can be identified via IV or GMM (instrumenting for Aig−1 in a model for Aig ) if c (i, g) depends on μi but is strictly exogenous conditional on this (Anderson and Hsiao 1981; Arellano and Bond 1991). See, for example, Koedel and Betts (2007). Value added researchers typically apply OLS to (5). This is inconsistent for λ and identifies βggc only if c (i, g) is random conditional on Aig−1 .

TEACHER QUALITY IN EDUCATIONAL PRODUCTION


reestimating the model with the $g-1$ gain as the dependent variable. By rearranging the lag of (5), we can write the $g-1$ gain, $\Delta A_{ig-1} = A_{ig-1} - A_{ig-2}$, as

$$\Delta A_{ig-1} = \lambda^{-1}\left(\check{\alpha}_{g-1} + A_{ig-1}(\lambda - 1) + \beta_{g-1,g-1,c(i,g-1)} + e_{2ig-1}\right). \tag{7}$$

Thus, the grade-$g$ classroom assignment will have predictive power for the gain in grade $g-1$, controlling for the $g-1$ achievement level, if grade-$g$ classrooms are correlated either with the $g-1$ teacher’s effect (i.e., with $\beta_{g-1,g-1,c(i,g-1)}$) or with $e_{2ig-1}$.13 As in VAM1, the former can be ruled out by controlling for $g-1$ classroom assignments; the latter would indicate a violation of the VAM2 exclusion restriction if $e_2$ is serially correlated.

The Fixed Effects in Gains Model (VAM3). For the final VAM, we return to equation (2) and to the earlier assumption of zero decay of teachers’ effects.14 The student fixed effects used in VAM3 absorb any variation in $\mu_i$ (assuming that $\tau_g = 1$ for each $g$). Thus, the VAM3 error term is $e_{3ig} = \omega_{ig} + v_{ig}$. The reliance on fixed effects, combined with the small time dimension of student data sets, means that VAM3 requires stronger assumptions than the earlier models. To avoid bias in the teacher effects $\beta_{ggc}$, even in large samples, teacher assignments must be strictly exogenous conditional on $\mu_i$: $E[e_{3ih} \mid c(i,g)] = 0$ for all $g$ and all $h$ (Wooldridge 2002, p. 253).15 Conditional strict exogeneity means that the same information, $\mu_i$ or some function of it, is used to make teacher assignments in each grade. This requires, in effect, that principals decide on classroom assignments for the remainder of a child’s career before she starts kindergarten. If teacher assignments are updated each year in response to the student’s performance during the previous year, strict exogeneity is violated.

13. The test can alternatively be expressed in terms of a model for the score level in $g-2$. (Simply rearrange terms in (7).) The VAM2 exclusion restriction of random assignment conditional on $A_{ig-1}$ will be rejected if the grade-$g$ classroom predicts $A_{ig-2}$ conditional on $A_{ig-1}$.

14. Although VAM1 and VAM2 can easily be generalized to allow for nonuniform decay, VAM3 cannot.

15. For practical value added implementations, it is rare to have more than three or four student grades, so asymptotics based on the $g$ dimension are infeasible. One approach if strict exogeneity does not hold is to focus on the first difference of (2). OLS estimation of the first-differenced equation requires that $c(i,g)$ be uncorrelated with $e_{3ig-1}$, $e_{3ig}$, and $e_{3ig+1}$. Though this is weaker than strict exogeneity, it is difficult to imagine an assignment process that would satisfy one but not the other. If the OLS requirements are not satisfied, the only option is IV/GMM (see note 12), instrumenting for both the $g$ and $g-1$ classroom assignments. Satisfactory instruments are not apparent.
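The rearrangement that produces (7) is short enough to spell out. The sketch below assumes the notation $\Delta A_{ig-1} = A_{ig-1} - A_{ig-2}$ for the grade $g-1$ gain and $\lambda \neq 0$; it starts from (5) lagged one grade, so the intercept carries the subscript $g-1$.

```latex
\begin{align*}
A_{ig-1} &= \check{\alpha}_{g-1} + A_{ig-2}\lambda + \beta_{g-1,g-1,c(i,g-1)} + e_{2ig-1}
  && \text{(lag of (5))} \\
A_{ig-2} &= \lambda^{-1}\bigl(A_{ig-1} - \check{\alpha}_{g-1} - \beta_{g-1,g-1,c(i,g-1)} - e_{2ig-1}\bigr)
  && \text{(solve for } A_{ig-2}\text{)} \\
\Delta A_{ig-1} &= A_{ig-1} - A_{ig-2} \\
  &= \lambda^{-1}\bigl(\check{\alpha}_{g-1} + A_{ig-1}(\lambda - 1) + \beta_{g-1,g-1,c(i,g-1)} + e_{2ig-1}\bigr)
\end{align*}
```

The last line makes the two channels visible: once $A_{ig-1}$ is held fixed, the $g-1$ gain varies only with the $g-1$ teacher effect and with $e_{2ig-1}$.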


As before, my test is based on analyses of the apparent effects of grade-$g$ teachers on gains in prior grades. Consider estimation of VAM1, without the student fixed effects that are added in VAM3. If teacher assignments depend on ability, this will bias the VAM coefficients and will lead me to reject the VAM1 exclusion restriction. But the conditional strict exogeneity assumption imposes restrictions on the coefficients from the VAM1 falsification test. Under this assumption, the only source of bias in VAM1 is the omission of controls for $\mu_i$. As $\mu_i$ enters into every grade’s gain equation, grade-$g$ teachers should have the same apparent effects on $g-2$ gains as they do on $g-1$ gains. A finding that these differ would indicate that omitted time-varying determinants of gains are correlated with teacher assignments, and therefore that assignments are not strictly exogenous.

Following Chamberlain (1984), consider a projection of $\mu$ onto the full sequence of classroom assignments in grades 1 through $G$:

$$\mu_i = \xi_{1c(i,1)} + \cdots + \xi_{Gc(i,G)} + \eta_i. \tag{8}$$

$\xi_{hc}$ is the incremental information about $\mu_i$ provided by the knowledge that the student was in classroom $c$ in grade $h$, conditional on classroom assignments in all other grades. Substituting (8) into (2), we obtain

$$\Delta A_{ig} = \alpha_g + \sum_{h=1}^{G} \pi_{hgc(i,h)} + \eta_i + e_{3ig}, \tag{9}$$

where $\pi_{ggc} = \xi_{gc}\tau_g + \beta_{ggc}$ and $\pi_{hgc} = \xi_{hc}\tau_g$ for $h \neq g$. Under conditional strict exogeneity, $E[e_{3ih} \mid c(i,1), \ldots, c(i,G)] = 0$ for each $h$, and the fact that (8) is a linear projection ensures that $\eta_i$ is uncorrelated with the regressors as well. An OLS regression of grade-$g$ gains onto classroom indicators in grades 1 through $G$ thus estimates the $\pi_{hgc}$ coefficients without bias. When $G \geq 3$, the underlying parameters are overidentified. To see this, note that

$$\pi_{hgc} = \xi_{hc}\tau_g = \xi_{hc}\tau_{g-1}\frac{\tau_g}{\tau_{g-1}} = \pi_{h,g-1,c}\frac{\tau_g}{\tau_{g-1}} \tag{10}$$

for all h > g: The coefficient for grade-h classroom c in a model of gains in grade g is proportional to the same coefficient in a model of gains in g − 1. If there are Jh grade-h classrooms in the sample, this represents Jh − 1 overidentifying restrictions on


the $2J_h$ elements of the vectors $\Pi_{hg} = \{\pi_{hg1} \ldots \pi_{hgJ_h}\}'$ and $\Pi_{hg-1} = \{\pi_{h,g-1,1} \ldots \pi_{h,g-1,J_h}\}'$.16 To test these restrictions, I estimate the $J_h$-vector $\Pi_h$ and the scalars $\tau_{g-1}$ and $\tau_g$ that minimize

$$D = \left(\begin{pmatrix}\hat{\Pi}_{hg-1}\\ \hat{\Pi}_{hg}\end{pmatrix} - \begin{pmatrix}\Pi_h\tau_{g-1}\\ \Pi_h\tau_g\end{pmatrix}\right)' W^{-1} \left(\begin{pmatrix}\hat{\Pi}_{hg-1}\\ \hat{\Pi}_{hg}\end{pmatrix} - \begin{pmatrix}\Pi_h\tau_{g-1}\\ \Pi_h\tau_g\end{pmatrix}\right), \tag{11}$$

using the sampling variance of $(\hat{\Pi}_{hg-1}'\ \hat{\Pi}_{hg}')'$ as $W$. Under the null hypothesis of strict exogeneity, the minimized value $D$ is distributed $\chi^2$ with $J_h - 1$ degrees of freedom.17 If $D$ is above the 95% critical value from this distribution, the null is rejected. Intuitively, the correlation between corresponding elements of the coefficient vectors $\Pi_{hg-1}$ and $\Pi_{hg}$, representing apparent “effects” of grade-$h$ teachers on gains in grades $g-1$ and $g$ ($g < h$), should be 1 or −1 under the null; a correlation far from this would suggest that the exclusion restriction is violated.

III.D. Implementation

To put the three VAMs in the best possible light, I focus on estimation of within-school differences in classroom effects. For many purposes, one might want to make across-school comparisons. But students are not randomly assigned to schools, and those at one school may gain systematically faster than those at another for reasons unrelated to teacher quality. Random assignment to classrooms within schools is at least somewhat plausible. To isolate within-school variation, I augment each of the estimating equations discussed above with a set of indicators for the school attended.18

16. When $G > 3$, there are many such pairs of vectors that must be proportional. Even when $G = 3$, there are additional overidentifying restrictions created by similar proportionality relationships for teachers’ effects on future gains. These restrictions might fail either because strict exogeneity is violated or because teachers’ effects decay (that is, $\beta_{hh} \neq \beta_{hg}$ for some $g > h$). I therefore focus on restrictions on the coefficients for teachers’ effects on past gains, as these provide sharper tests of strict exogeneity.

17. Although there are $J_h + 2$ unknown parameters, they are underidentified: Multiplying $\Pi_h$ by a constant and dividing $\tau_{g-1}$ and $\tau_g$ by the same constant does not change the fit.

18. This makes $W$ singular in (11). For the OMD analysis of VAM3, I drop the elements of $\pi_{gh}$ that correspond to the largest class at each school.

The tests for VAM1 and VAM2 then amount to tests of whether students are (conditionally) randomly assigned to


classrooms within schools. They resemble tests of successful randomization in stratified experiments, treating schools as strata. Intuitively, I will reject random assignment if replacing a set of school indicators with grade-$g$ classroom indicators adds more explanatory power for $g-1$ gains than would be expected by chance alone. Let $S_g$ and $T_g$ be matrices of indicators for grade-$g$ schools and classrooms. These are collinear, so I define $\tilde{T}_g$ as the submatrix of $T_g$ that results from excluding the columns corresponding to one classroom per school. The VAM1 test is based on a simple regression:

$$\Delta A_{g-1} = \alpha + S_g\delta + \tilde{T}_g\beta + e. \tag{12}$$

The identifying assumption of VAM1 is rejected if $\beta \neq 0$. I use a heteroscedasticity-robust score test (Wooldridge 2002, p. 60) to evaluate this. I also estimate versions of (12) that include controls for grade-$(g-1)$ classroom assignments. To test VAM2, I simply add a control for $A_{g-1}$ on the right-hand side of (12).

It is clear from the definition of $\tilde{T}_g$ that only schools with multiple classrooms per grade can contribute to the analysis. One might be concerned that results from schools with only two or three classrooms will be misleading, as even with random assignment of students to classrooms there will be substantial overlap in the composition of a student’s grade-$g$ and grade-$(g-1)$ classrooms. The Online Appendix presents a Monte Carlo analysis of the VAM1 and VAM2 tests in schools of varying sizes. The VAM1 test has appropriate size even with just two classrooms per school, so long as the number of students per classroom is large. (Recall that I focus on large-class asymptotics.) With small classes, the asymptotic distribution of the test statistic is an imperfect approximation, and as a result the test over-rejects slightly. When there are twenty students per class, the test of VAM1 has size around 10%. With empirically reasonable parameter values, the VAM2 test performs similarly.19,20

19. When students are assigned to classrooms based on the lagged score and when this score incorporates implausibly high degrees of clustering at the fourth grade classroom level, the VAM2 test rejects at high rates even with large classes. This reflects my use of a test that assumes independence of residuals within schools. Unfortunately, it is not possible to allow for dependence, as clustered variance-covariance matrices are consistent only if the number of clusters grows with the number of parameters fixed (Kezdi 2004) and in my application, the number of parameters grows with the number of clusters.

20. Kinsler (2008) claims that the VAM3 test also overrejects in simulations. In personal communication, he reports that the problem disappears with large classes.
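The logic of the test built on (12) can be illustrated on simulated data. The sketch below is a simplified stand-in, not the procedure used in the paper: it replaces the heteroscedasticity-robust score test with a classical F statistic, and every name and parameter value in it (`classroom_f_stat`, 50 schools, three classes of 25 students) is hypothetical. It compares random within-school assignment with assignment that tracks students by their lagged gain.

```python
import numpy as np


def classroom_f_stat(lag_gain, school, classroom):
    """Classical F statistic for adding classroom dummies to school dummies.

    Classroom ids must be unique across schools, so classroom dummies span
    the school dummies plus one-classroom-per-school-omitted indicators.
    """
    lag_gain = np.asarray(lag_gain, dtype=float)
    school = np.asarray(school)
    classroom = np.asarray(classroom)

    def rss_within(groups):
        # Regressing on a full set of group dummies equals demeaning by group.
        total = 0.0
        for g in np.unique(groups):
            y = lag_gain[groups == g]
            total += np.sum((y - y.mean()) ** 2)
        return total

    n = lag_gain.size
    n_classes = np.unique(classroom).size
    extra = n_classes - np.unique(school).size  # number of tested coefficients
    rss_school = rss_within(school)             # restricted model
    rss_class = rss_within(classroom)           # unrestricted model
    return ((rss_school - rss_class) / extra) / (rss_class / (n - n_classes))


# Hypothetical design: 50 schools, 3 classrooms each, 25 students per class.
rng = np.random.default_rng(0)
n_schools, classes_per_school, class_size = 50, 3, 25
school = np.repeat(np.arange(n_schools), classes_per_school * class_size)
lag_gain = rng.normal(size=school.size)  # simulated grade g-1 gains

slots = np.repeat(np.arange(classes_per_school), class_size)
random_class = np.empty(school.size, dtype=int)
sorted_class = np.empty(school.size, dtype=int)
for s in range(n_schools):
    idx = np.flatnonzero(school == s)
    # Random assignment within the school.
    random_class[idx] = s * classes_per_school + rng.permutation(slots)
    # Tracking: rank students by lagged gain and fill classrooms in order.
    order = idx[np.argsort(lag_gain[idx])]
    sorted_class[order] = s * classes_per_school + slots

f_random = classroom_f_stat(lag_gain, school, random_class)
f_sorted = classroom_f_stat(lag_gain, school, sorted_class)
print(f"F under random assignment: {f_random:.2f}")
print(f"F under sorting on lagged gains: {f_sorted:.2f}")
```

Under random assignment the statistic should be close to one, whereas sorting on the lagged gain makes future classrooms "predict" past gains and pushes it far above any conventional critical value.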


I also report the standard deviation of the teacher coefficients (the $\beta$s in (12)) themselves. The standard deviation of the estimated coefficients necessarily exceeds that of the true coefficients (those that would be identified with large samples of students per teacher, even if these are biased estimates of teachers’ true causal effects). Aaronson, Barrow, and Sander (2007) propose a simple estimator for the variance of the true coefficients across teachers. Let $\beta$ be a mean-zero vector of true projection coefficients and let $\hat{\beta}$ be an unbiased finite-sample estimate of $\beta$, with $E[\beta'(\hat{\beta} - \beta)] = 0$. The variance (across elements) of $\beta$ can be written as

$$E[\beta'\beta] = E[\hat{\beta}'\hat{\beta}] - E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)]. \tag{13}$$
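Equation (13) amounts to subtracting the average sampling variance from the raw variance of the estimates. A minimal sketch of that calculation follows, assuming student-count weights; the function name and the example numbers are hypothetical, and the within-school normalization and degrees-of-freedom adjustment described in note 21 are deliberately left out.

```python
import numpy as np


def true_coefficient_variance(beta_hat, sampling_var, n_students):
    """Variance of estimates minus average sampling variance, as in (13).

    beta_hat     : estimated teacher coefficients
    sampling_var : sampling variance of each estimate (squared standard error)
    n_students   : students taught by each teacher, used as weights
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    weights = np.asarray(n_students, dtype=float)
    weights = weights / weights.sum()

    centered = beta_hat - np.sum(weights * beta_hat)     # enforce mean zero
    var_of_estimates = np.sum(weights * centered ** 2)   # first term of (13)
    mean_sampling_var = np.sum(weights * sampling_var)   # second term of (13)
    return var_of_estimates - mean_sampling_var


# Hypothetical example: four teachers with 25 students each.
result = true_coefficient_variance(
    beta_hat=[0.20, -0.10, 0.05, -0.15],
    sampling_var=[0.01, 0.01, 0.01, 0.01],
    n_students=[25, 25, 25, 25],
)
print(result)
```

In small samples the difference can come out negative; a negative value is best read as an estimated variance of zero.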

$E[\hat{\beta}'\hat{\beta}]$ is simply the variance across teachers of the coefficient estimates.21 $E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)]$ is the average heteroscedasticity-robust sampling variance. I weight each by the number of students taught.

Specifications that include indicators for classroom assignments in several grades simultaneously (such as that used for the test of VAM3) introduce two complications. First, the coefficients for teachers in different grades can only be separately identified when there is sufficient shuffling of students between classrooms. If students are perfectly streamed (that is, if a student’s classmates in third grade are also his or her classmates in fourth grade), the third and fourth grade classroom indicators are collinear. I exclude from my samples a few schools where inadequate shuffling leads to perfect collinearity. Second, these regressions are difficult to compute, due to the presence of several overlapping sets of fixed effects. As discussed in the Online Appendix, this difficulty is avoided by restricting the samples to students who do not switch schools during the grades for which classroom assignments are controlled.

21. $\hat{\beta}$ is normalized to have mean zero across teachers at the same school, and its variance is adjusted for the degrees of freedom that this consumes.

IV. DATA AND SAMPLE CONSTRUCTION

The specifications described in Section III require longitudinal data that track students’ outcomes across several grades, linked to classroom assignments in each grade. I use administrative data on elementary students in North Carolina public schools, assembled and distributed by the North Carolina


Education Research Data Center. These data have been used for several previous value added analyses (see, e.g., Clotfelter, Ladd, and Vigdor [2006]; Goldhaber [2007]). I examine end-of-grade math and reading tests from grades 3 through 5, plus “pretests” from the beginning of third grade (which I treat as second grade tests). I standardize the scale scores separately for each subject–grade–year combination.22 The North Carolina data identify the school staff member who administered the end-of-grade tests. In the elementary grades, this was usually the regular teacher. Following Clotfelter, Ladd, and Vigdor (2006), I count a student–teacher match as valid if the test administrator taught a “self-contained” (i.e., all day, all subject) class for the relevant grade in the relevant year, if that class was not designated as special education or honors, and if at least half of the tests that the teacher administered were to students in the correct grade. Using this definition, 73% of fifth graders can be matched to teachers. In each of my analyses, I restrict the sample to students with valid teacher matches in all grades for which teacher assignments are controlled. I focus on the cohort of students who were in fifth grade in 2000–2001. Beginning with the population (N = 99,071), I exclude students who have inconsistent longitudinal records (e.g., gender changes between years); who were not in fourth grade in 1999– 2000; who are missing fourth or fifth grade test scores; or who cannot be matched to a fifth grade teacher. I additionally exclude fifth grade classrooms that contain fewer than twelve sample students or are the only included classroom at the school. This leaves my base sample, consisting of 60,740 students from 3,040 fifth grade classrooms and 868 schools. My analyses all use subsets of this sample that provide sufficient longitudinal data. 
In analyses of fourth grade gains, for example, I exclude students who have missing third grade scores or who were not in third grade in 1998–1999. In specifications that include identifiers for teachers in multiple grades, I further exclude students who changed schools between grades, plus a few schools where streaming produces perfect collinearity. Table I presents summary statistics. I show statistics for the population, for the base sample, and for my most restricted sample 22. The original score scale is meant to ensure that one point corresponds to an equal amount of learning at each grade and at each point in the within-grade distribution. Rothstein (2008) and Ballou (2009) emphasize the importance of this property for value added modeling. All of the results here are robust to using the original scale.

TABLE I
SUMMARY STATISTICS

                                                          Population           Base sample       Most restricted sample
                                                         Mean        SD       Mean        SD       Mean        SD
                                                          (1)       (2)        (3)       (4)        (5)       (6)
# of students                                          99,071               60,740               23,415
# of schools                                            1,269                  868                  598
  1 fifth grade teacher                                   122                    0                    0
  2 fifth grade teachers                                  168                  207                  122
  3–5 fifth grade teachers                                776                  602                  440
  >5 fifth grade teachers                                 203                   59                   36
# of fifth grade classrooms                             4,876                3,040                2,116
# of fifth grade classrooms w/valid teacher match       3,315                3,040                2,116
Female (%)                                                 49                   50                   51
Black (%)                                                  29                   28                   23
Other nonwhite (%)                                          8                    7                    6
Consistent student record (%)                              99                  100                  100
Complete test score record
  G4–5 (%)                                                 88                   99                  100
  G3–5 (%)                                                 81                   91                  100
  G2–5 (%)                                                 72                   80                  100
Changed schools between G3 and G5 (%)                      30                   27                    0
Valid teacher assignment in
  grade 3 (%)                                              68                   78                  100
  grade 4 (%)                                              70                   86                  100
  grade 5 (%)                                              72                  100                  100
Fr. of students in G5 class in same G4 class             0.22    [0.19]      0.22    [0.17]      0.30    [0.19]
Fr. of students in G5 class in same G3 class             0.15    [0.15]      0.15    [0.13]      0.28    [0.18]

TABLE I (CONTINUED)

                                       Population           Base sample       Most restricted sample
                                      Mean        SD       Mean        SD       Mean        SD
                                       (1)       (2)        (3)       (4)        (5)       (6)
Math scores
  Third grade (beginning of year)     0.11    [0.97]      0.14    [0.96]      0.20    [0.96]
  Third grade (end of year)           0.09    [0.94]      0.11    [0.94]      0.19    [0.91]
  Fourth grade (end of year)          0.04    [0.97]      0.07    [0.97]      0.20    [0.93]
  Fifth grade (end of year)           0.00    [1.00]      0.09    [0.98]      0.20    [0.94]
  Third grade gain                   −0.02    [0.70]     −0.02    [0.69]      0.00    [0.69]
  Fourth grade gain                  −0.02    [0.58]     −0.01    [0.58]      0.01    [0.56]
  Fifth grade gain                   −0.01    [0.55]      0.01    [0.55]     −0.01    [0.53]
Reading scores
  Third grade (beginning of year)     0.08    [0.98]      0.12    [0.98]      0.17    [0.98]
  Third grade (end of year)           0.08    [0.95]      0.11    [0.94]      0.19    [0.91]
  Fourth grade (end of year)          0.04    [0.98]      0.07    [0.97]      0.18    [0.93]
  Fifth grade (end of year)           0.00    [1.00]      0.07    [0.97]      0.17    [0.94]
  Third grade gain                    0.01    [0.76]      0.00    [0.75]      0.01    [0.75]
  Fourth grade gain                  −0.02    [0.59]     −0.02    [0.59]      0.00    [0.57]
  Fifth grade gain                   −0.01    [0.59]      0.00    [0.58]     −0.02    [0.57]

Notes. Summary statistics are computed over all available observations. Test scores are standardized using all third graders in 1999, fourth graders in 2000, and fifth graders in 2001, regardless of grade progress. “Population” in columns (1) and (2) is students enrolled in fifth grade in 2001, merged with third and fourth grade records (if present) for the same students in 1999 and 2000, respectively. Columns (3) and (4) describe the base sample discussed in the text; it excludes students with missing fourth and fifth grade test scores, students without valid fifth grade teacher matches, fifth grade classes with fewer than twelve sample students, and schools with only one fifth grade class. Columns (5) and (6) further restrict the sample to students with nonmissing scores in grades 3–5 (plus the third grade beginning-of-year tests) and valid teacher assignments in each grade, at schools with multiple classes in each school in each grade and without perfect collinearity of classroom assignments in different grades.

(used for estimation of equation (9)). The last is much smaller than the others, largely because I require students to have attended the same school in grades 3 through 5 and to have valid teacher matches in each grade. Table I indicates that the restricted sample has higher mean fifth grade scores than the full population. This primarily reflects the lower scores of students who switch schools frequently.23 Average fifth grade gains are similar across samples. The Online Appendix describes each sample in more detail. As discussed above, my tests can be applied only if there is sufficient reshuffling of classrooms between grades. Table A2 in the Online Appendix shows the fraction of students’ fifth grade classmates who were also in the same fourth grade classes, by the number of fourth grade classes at the school. Complete reshuffling (combined with equal-sized classes) would produce 0.5 with two classes, 0.33 with three, and so on. The actual fractions are larger than this, but only slightly. In schools with exactly three fifth grade teachers, for example, 35% of students’ fifth grade classmates were also their classmates in fourth grade. In only 7% of multiple-classroom schools do the fourth and fifth grade classroom indicators have deficient rank. Table II presents the correlation of test scores and gains across grades and subjects. The table indicates that fifth grade scores are correlated above .8 with fourth grade scores in the same subject, whereas correlations with scores in earlier grades or other subjects are somewhat lower. Fifth grade gains are strongly negatively correlated with fourth grade levels and gains in the same subject and weakly negatively correlated with those in the other subject. The correlations between fifth and third grade gains are small but significant both within and across subjects. VAM3 is predicated on the notion that student ability is an important component of annual gains. 
Assuming that high-ability students gain faster, this would imply positive correlations between gains in different years. There is no indication of this in Table II. One potential explanation is that noise in the annual tests introduces negative autocorrelation in gains, but I conclude elsewhere (Rothstein 2008) that even true gains are negatively 23. Table I shows that average third and fourth grade scores in the “population” are well above zero. The norming sample that I use to standardize scores in each grade consists of all students in that grade in the relevant year (i.e., of all third graders in 1999), whereas only those who make normal progress to fifth grade in 2001 are included in the sample for columns (1) and (2). The low scores of students who repeat grades account for the discrepancy.

TABLE II
CORRELATIONS OF TEST SCORES AND SCORE GAINS ACROSS GRADES

                  Summary statistics                  Correlations
                                         Fifth grade score     Fifth grade gain
                      Mean        SD      Math    Reading      Math    Reading           N
                       (1)       (2)       (3)       (4)       (5)       (6)         (7)
Math scores
  G5                  0.02      1.00         1       .78       .29       .08      70,740
  G4                  0.07      0.97       .84       .73      −.27      −.07      61,535
  G3                  0.09      0.95       .80       .70      −.02      −.03      57,382
  G3 pretest          0.08      0.97       .71       .64       .00      −.03      50,661
Reading scores
  G5                  0.01      1.00       .78         1       .10       .31      70,078
  G4                  0.06      0.97       .73       .82      −.05      −.29      61,535
  G3                  0.09      0.95       .70       .78      −.01      −.05      57,344
  G3 pretest          0.08      0.99       .59       .65       .00      −.05      50,629
Math gains
  G4–G5               0.01      0.55       .29       .10         1       .25      61,349
  G3–G4              −0.01      0.58       .11       .07      −.41      −.07      56,171
  G2–G3               0.02      0.70       .08       .05      −.02       .01      50,615
Reading gains
  G4–G5               0.00      0.58       .08       .31       .25         1      60,987
  G3–G4              −0.02      0.59       .08       .10      −.08      −.41      56,159
  G2–G3               0.02      0.75       .09       .10      −.01       .02      50,558

Notes. Each statistic is calculated using the maximal possible sample of valid student records with observations on all necessary scores and normal grade progress between the relevant grades. Column (7) lists the sample size for each row variable; correlations use smaller samples for which the column variable is also available. Italicized correlations are not different from zero at the 5% level.

autocorrelated. This strongly suggests that VAM3 is poorly suited to the test score data generating process. V. RESULTS Tables III, IV, and V present results for the three VAMs in turn. I begin with VAM1, in Table III. I regress fifth grade math and reading gains (in columns (1) and (2), respectively) on indicators for fifth grade schools and classrooms, excluding one classroom per school. In each case, the hypothesis that all of the classroom coefficients are zero (i.e., that classroom indicators have no explanatory power beyond that provided by school indicators) is decisively rejected. The VAM indicates that the withinschool standard deviations of fifth grade teachers’ effects on math and reading are 0.15 and 0.11, respectively. This is similar to what

0.160 0.113 μ M or μ A < μ M . In the case where μ A > μ M (μ A < μ M ), the chairman is hawkish (dovish) in the sense that, conditional on inflation, inflation volatility, and the output gap, he prefers a higher (lower) interest rate than M. Note that this specification does not require the chairman to be the most hawkish (or dovish) member of the committee, and there may be members that systematically prefer higher (lower) interest rates than the chairman. The identity of the chairman is assumed to be fixed over time. For the sake of exposition, this section focuses on the case of the hawkish chairman only, but the dovish case is perfectly symmetric. The voting protocol is the following. In each meeting, given the current status quo qt = it−1 , the chairman proposes an interest rate it under closed rule. That is, the other committee members can either accept or reject the chairman’s proposal. If the proposal passes (i.e., it obtains at least (N + 1)/2 votes), then the proposed policy is implemented and becomes the status quo for next meeting. If the proposal is rejected, then the status quo is maintained and it = it−1 . This procedure is repeated at the next meeting. As in the consensus model, individuals vote as if they were pivotal and disregard the consequences of their voting decisions for future meetings via the status quo. Thus, members accept a proposal whenever the current utility from the proposal is larger than or equal to the utility from the current status quo, and the chairman picks the policy closest to his ideal point among those that are acceptable to a majority of (N + 1)/2 members. This voting game is well-known in the political economy literature and was originally derived by Romer and Rosenthal (1978) under the assumption 20. The case μ A = μ M is trivial in that it always delivers the median outcome, and it is therefore observationally equivalent to the protocols studied in the next section.

MONETARY POLICY BY COMMITTEE


of symmetric preferences.21 Here, instead, the induced utilities of all members other than the median are single-peaked but not symmetric. In principle, this lack of symmetry may imply that proposals are accepted by a coalition that excludes the median. The proof of Proposition 2 ensures that this is not the case. Define ϒ(st , qt , ωt ) to be the political aggregator in the agendasetting game. The following proposition establishes the policy outcome under this protocol. PROPOSITION 2. The policy outcome in the agenda-setting model with μ A > μ M is given by (10)

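$$i_t = \Upsilon(s_t, q_t, \omega_t) = \begin{cases} i^*_{A,t}, & \text{if } q_t > i^*_{A,t}, \\ q_t, & \text{if } i^*_{M,t} \leq q_t \leq i^*_{A,t}, \\ 2i^*_{M,t} - q_t, & \text{if } 2i^*_{M,t} - i^*_{A,t} \leq q_t < i^*_{M,t}, \\ i^*_{A,t}, & \text{if } q_t < 2i^*_{M,t} - i^*_{A,t}. \end{cases}$$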

Proof. The proof consists of the following steps. Step 1. Let V j (.) denote the indirect utility of member j as a function of the interest rate and let it denote the current proposal. We show that V j (it ) − V j (qt ) is increasing in μ j for all it and qt ∗ such that qt ≤ iM,t ≤ it . The difference of the expected payoff of committee member j associated with interest rates it and qt is V j (it ) − V j (qt ) exp(t )(exp(−μ j α1 β2 qt ) − exp(−μ j α1 β2 it )) + μ j α1 β2 (qt − it ) , = μ2j where t = (1 + α1 β2 )πt + α1 β2 ι + α1 (1 + β1 )yt + μ2j σπ2 /2 + γ ut + ∗ ≤ it , it can be shown that a sufficient, but ς α1 vt . When qt ≤ iM,t not necessary, condition for V j (it ) − V j (qt ) to be increasing in μ j is that μ2j σπ2 ≥ 2 for all j > M. In the rest of the proof we will assume that this condition is verified. Step 2. First, when qt ∈ (i ∗A, i], the agenda setter proposes i ∗A,t , which is accepted by all members j such that μ j ≤ μ A. This follows ∗ , i ∗A,t ], from the indirect utility being single-peaked. When qt ∈ [iM,t the agenda setter cannot increase the interest rate. The best proposal among the acceptable ones is the status quo, which 21. The agenda-setting model has been used to study monetary policy making by Riboni and Ruge-Murcia (2008c) and Montoro (2006).


is always accepted. When $q_t \in [2i^*_{M,t} - i^*_{A,t}, i^*_{M,t})$, the set of policies that the median accepts is given by the interval $[q_t, 2i^*_{M,t} - q_t]$. By Step 1, we know that these proposals are accepted by all members $j$ such that $\mu_j \geq \mu_M$ and that any proposal greater than $2i^*_{M,t} - q_t$, which is rejected by the median, is also rejected by all members $j$ such that $\mu_j \leq \mu_M$. Finally, when $q_t \in [0, 2i^*_{M,t} - i^*_{A,t})$, the agenda setter is again able to propose $i^*_{A,t}$, which is accepted by the median. By Step 1 this proposal is also accepted by all members $j$ such that $\mu_j \geq \mu_M$.
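The four cases of Proposition 2 translate directly into a small function. The sketch below is illustrative only: the function name, argument names, and numerical values are hypothetical, and it covers the hawkish-chairman case (chairman's ideal rate at or above the median's).

```python
def agenda_setting_outcome(q, i_median, i_chair):
    """Interest rate chosen under agenda setting with a hawkish chairman.

    q        : status quo interest rate
    i_median : rate preferred by the median member
    i_chair  : rate preferred by the chairman (must be >= i_median)
    """
    if i_chair < i_median:
        raise ValueError("this sketch covers only the hawkish case")
    if q > i_chair:
        return i_chair               # chairman lowers the rate to his ideal point
    if q >= i_median:
        return q                     # gridlock interval: status quo survives
    if q >= 2 * i_median - i_chair:
        return 2 * i_median - q      # reflection of q around the median's ideal
    return i_chair                   # status quo so low that the ideal point passes


# Hypothetical ideal points: median at 4.0, chairman at 5.0.
for status_quo in (6.0, 4.5, 3.5, 2.0):
    print(status_quo, "->", agenda_setting_outcome(status_quo, 4.0, 5.0))
```

The outcome is continuous at the lower boundary: at a status quo of $2i^*_{M,t} - i^*_{A,t}$ the reflection equals the chairman's ideal rate.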

The political aggregator as a function of $q_t$ is plotted in Panel B of Figure II. The policy aggregator for the case of a dovish chairman is plotted in Panel C of the same figure, and it is easy to see that it is the mirror image of the one derived here for the hawkish chairman. Control over the agenda on the part of the chairman implies deviations from the median outcome. This is due to the fact that the chairman can propose the policy he prefers, among those alternatives that at least a majority of committee members (weakly) prefer to the status quo. Among the acceptable alternatives, there is no reason to expect the chairman to propose the median outcome. Moreover, deviations from the median outcome are systematically in one direction. That is, they will always bring the policy outcome closer to the policy preferred by the chairman.

As before, there is an interval of status quo policies for which policy change is not possible (i.e., a gridlock interval). This interval is given by $[i^*_{M,t}, i^*_{A,t}]$, that is, all policies between the interest rate preferred by the median and the chairman. If the status quo falls within this interval, policy changes are blocked by either the chairman or a majority of committee members. To see this, note that when $q_t \in [i^*_{M,t}, i^*_{A,t}]$, a majority would veto any increase of the instrument value towards $i^*_{A,t}$ and proposing the status quo is then the best option for the chairman. The width of the gridlock interval is increasing in the distance between the chairman’s and the median’s preferred interest rates.

A policy change occurs only if the status quo is sufficiently extreme, compared with the members’ preferred policies. In particular, when $q_t$ falls in the interval $[2i^*_{M,t} - i^*_{A,t}, i^*_{M,t})$, the chairman chooses the policy closest to his or her ideal point subject to the constraint that $M$ will accept it. This constraint is binding at equilibrium, meaning that $M$ will be indifferent between the status quo and the interest rate that $A$ proposes. Because the median has a symmetric induced utility (recall that $\mu_M = 0$), this proposal


is the reflection point of $q_t$ with respect to $i^*_{M,t}$. When the status quo policy is either lower than $2i^*_{M,t} - i^*_{A,t}$ or higher than $i^*_{A,t}$, the chairman is able to offer and pass the proposal that coincides with his ideal point.

In the rest of this section, we compare the theoretical predictions of the consensus and agenda-setting models. First, both models deliver a gridlock interval where it is not possible to change the status quo. However, it is difficult to predict a priori which voting procedure features the largest gridlock interval, because the comparison depends on the degree of consensus that the committee requires (summarized by $K$) and on the extent of disagreement between the chairman and the median. The intersection of the two gridlock intervals is nonempty, given that $i^*_{M,t}$ must belong to both intervals. In principle, the gridlock interval in the agenda-setting model could be a strict subset of the one in the consensus model if $|\mu_A - \mu_M|$ were sufficiently small and $K$ sufficiently large, but the converse cannot happen.

Second, whenever the committee decides to change the status quo, the models deliver different predictions with respect to the size of the policy change. The agenda-setting model with hawkish (dovish) chairman yields more aggressive interest rate increases (decreases) than the other two models. For example, suppose that the chairman is a hawk. Then, when $q_t < i^*_{M,t}$, the agenda-setting model unambiguously predicts a larger policy change than the consensus model. Instead, when $q_t \geq i^*_{M,t}$, the comparison is ambiguous, and the size of the interest rate decrease depends on the location of $i^*_{A,t}$ versus $i^*_{M+K,t}$.

Finally, note that under both protocols, the endpoints of the gridlock interval are stochastic and depend on the current state of the economy. An implication of the predicted local inertia is that the relation between changes in the state of nature and in policy is nonlinear. In particular, small changes in the state of economy are less likely to produce policy changes compared with larger ones. Empirically, this would mean, for example, that small variations in the rates of inflation and unemployment would be less likely to result in a change in the key nominal interest rate, compared with large movements in these variables.
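The local-inertia prediction can be illustrated with a toy simulation of the agenda-setting protocol. Everything in the sketch below is an assumption made for illustration rather than part of the model's calibration: the median's preferred rate follows a random walk, the chairman's preferred rate sits a fixed `gap` above it, and the function names, shock sizes, and seed are hypothetical.

```python
import random


def outcome(q, i_m, i_a):
    """Agenda-setting outcome with a hawkish chairman (i_a >= i_m)."""
    if q > i_a or q < 2 * i_m - i_a:
        return i_a              # chairman passes his ideal point
    if q >= i_m:
        return q                # gridlock interval: no change
    return 2 * i_m - q          # reflection around the median's ideal point


def share_without_change(shock_sd, gap=0.5, meetings=5000, seed=1):
    """Fraction of meetings in which the interest rate is left unchanged."""
    rng = random.Random(seed)
    i_m = 4.0                   # median's preferred rate at the start
    q = i_m                     # initial status quo
    unchanged = 0
    for _ in range(meetings):
        i_m += rng.gauss(0.0, shock_sd)       # state of the economy moves
        new_q = outcome(q, i_m, i_m + gap)
        unchanged += (new_q == q)
        q = new_q                             # policy becomes next status quo
    return unchanged / meetings


small = share_without_change(shock_sd=0.05)   # shocks small relative to gap
large = share_without_change(shock_sd=1.0)    # shocks large relative to gap
print(f"no-change share, small shocks: {small:.2f}")
print(f"no-change share, large shocks: {large:.2f}")
```

When shocks are small relative to the distance between the two ideal points, the status quo tends to stay inside the gridlock interval and the rate is left unchanged far more often than under large shocks.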
In particular, small changes in the state of economy are less likely to produce policy changes compared with larger ones. Empirically, this would mean, for example, that small variations in the rates of inflation and unemployment would be less likely to result in a change in the key nominal interest rate, compared with large movements in these variables.
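The decision rules compared in this section can be summarized in a few lines of code. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `consensus_rate` and `agenda_setting_rate` and the numerical values are hypothetical, and the agenda-setting rule is written for a hawkish chairman only (chairman's ideal rate at or above the median's).

```python
def consensus_rate(q, i_low, i_high):
    """Consensus model: the rate moves only to the nearer edge of the gridlock interval.

    q      -- status quo (last meeting's rate)
    i_low  -- ideal rate of member M-K (lower edge of the gridlock interval)
    i_high -- ideal rate of member M+K (upper edge of the gridlock interval)
    """
    if i_low > i_high:
        raise ValueError("expected i_low <= i_high")
    return min(max(q, i_low), i_high)


def agenda_setting_rate(q, i_median, i_chair):
    """Agenda-setting model with a hawkish chairman (i_chair >= i_median)."""
    if i_chair < i_median:
        raise ValueError("this sketch covers the hawkish case only")
    if q > i_chair:
        # Status quo above the chairman's ideal point: cut to that ideal point.
        return i_chair
    if q >= i_median:
        # Gridlock interval [i_median, i_chair]: the status quo is kept.
        return q
    # Status quo below the median's ideal point: the median accepts anything up to
    # the reflection of q around i_median, so the chairman proposes his ideal
    # point if it is within reach and the reflection point otherwise.
    return min(2 * i_median - q, i_chair)


if __name__ == "__main__":
    # Hypothetical ideal points: M-K = 2.5, M = 3.0, M+K = 3.5, chairman = 5.0.
    for q in (0.0, 2.0, 4.0, 6.0):
        print(q, consensus_rate(q, 2.5, 3.5), agenda_setting_rate(q, 3.0, 5.0))
```

With a status quo of 2.0, the consensus rule moves the rate to 2.5 while the agenda-setting rule moves it to 4.0, which illustrates the larger policy change predicted by the agenda-setting model when the status quo lies below the median's ideal point.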

III.C. The Frictionless Model

This model is used to describe two protocols, namely the median and the dictator models, which involve different decision-making processes but deliver essentially the same empirical


implications for the nominal interest rate. In particular, both protocols predict that regardless of the initial status quo, the committee will adopt the interest rate preferred by one key individual: the median member (in the median model) or the chairman (in the dictator model). The protocols are, therefore, frictionless in the sense that the status quo plays no role in determining the current interest rate.

Consider first the median model, which is the standard framework of analysis in political economy. Under standard conditions, which are all satisfied in our setting, the Median Voter Theorem (Black 1958) implies a unique core outcome represented by the alternative preferred by the individual whose ideal point constitutes the median of the set of ideal points. Although in its original formulation the Median Voter Theorem lacks a noncooperative underpinning, notice that the median outcome may be obtained as a special case of the consensus-based model when a simple majority is needed to pass a proposal. Applying Proposition 1 to the case where K = 0 (that is, when the required majority equals (N + 1)/2), it is easy to see that, starting from any status quo policy, the interest rate preferred by the median is always selected.

Consider now the dictator model. Under this protocol, the chairman, denoted by C, has absolute power over the committee and is able to impose his or her views at every meeting. Hence, the interest rate selected by the committee is the one preferred by the chairman. In this respect, the chairman has much greater power than in the agenda-setting model, where the chairman fully controls the agenda but is subject to an acceptance constraint because a majority is required to pass a proposal.

Absent any friction in the political process, both the median and dictator models predict that, within each meeting and starting from any status quo, the committee adopts

$$i^*_{S,t} = a_S + b\pi_t + c y_t + \zeta_t, \tag{11}$$

where S equals M or C, depending on the model. (Recall that M and C, respectively, stand for the median and chairman.) It is important to note that in a frictionless model there is neither inertia nor path dependence. Having a committee is then equivalent to having either the median or the chairman as single central banker and, therefore, the reaction function is observationally indistinguishable from a standard Taylor rule derived under the assumption that monetary policy is selected by one individual. This


model predicts a proportional adjustment of the policy instrument in response to any change in inflation and unemployment, regardless of their size, and generates interest rate autocorrelation only from the serial correlation of the fundamentals. The policy outcome predicted by the frictionless model is plotted in Panel D of Figure II. In all the panels of this figure, the size of the policy change may be inferred from the vertical distance between the policy rule and the 45° line.

IV. EMPIRICAL ANALYSIS

IV.A. The Data

The data set consists of interest rate decisions by monetary policy committees in five central banks, namely the Bank of Canada, the Bank of England, the ECB, the Swedish Riksbank, and the U.S. Federal Reserve, along with measures of inflation and the output gap in their respective countries. Inflation is measured by the twelve-month percentage change of the Consumer Price Index (Canada and Sweden), the Retail Price Index excluding mortgage-interest payments or RPIX (United Kingdom), the Harmonized Consumer Price Index (European Union), and the Consumer Price Index for All Urban Consumers (United States).22 The output gap is measured by the deviation of the seasonally adjusted unemployment rate from a trend computed using the Hodrick–Prescott filter. Interest rate decisions concern the target values for the Overnight Rate (Canada), the Repo Rate (United Kingdom and Sweden), the Rate for Main Refinancing Operations (European Union), and the Federal Funds Rate (United States). For the Federal Reserve, the sources are Chappell, McGregor, and Vermilyea (2005) and the minutes of the FOMC meetings, which are available at www.federalreserve.gov. For the Riksbank, the source is the minutes of the meetings of the Executive Board, which are available at www.riksbank.com. For the other central banks, the sources are official press releases compiled by the authors.
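The two data transformations described above (twelve-month inflation and the deviation of unemployment from a Hodrick–Prescott trend) might be computed along the following lines. This is an illustrative sketch only: the function names are hypothetical, the input series are assumed to be monthly, and the smoothing parameter is not reported in the text, so the default of 129600 (a value often used for monthly data) is an assumption.

```python
import numpy as np


def twelve_month_inflation(price_index):
    """Twelve-month percentage change of a monthly price index."""
    p = np.asarray(price_index, dtype=float)
    return 100.0 * (p[12:] / p[:-12] - 1.0)


def hp_trend(series, lam):
    """Hodrick-Prescott trend: minimizes squared deviations from the series
    plus lam times the squared second differences of the trend."""
    y = np.asarray(series, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three observations")
    # Second-difference operator of shape (n - 2, n).
    d = np.zeros((n - 2, n))
    for r in range(n - 2):
        d[r, r:r + 3] = (1.0, -2.0, 1.0)
    # First-order condition of the HP problem: (I + lam * D'D) trend = y.
    return np.linalg.solve(np.eye(n) + lam * (d.T @ d), y)


def unemployment_gap(unemployment, lam=129600.0):
    """Deviation of the unemployment rate from its HP trend (lam is assumed)."""
    u = np.asarray(unemployment, dtype=float)
    return u - hp_trend(u, lam)
```

A series that is exactly linear has zero second differences, so its HP trend coincides with the series and the implied gap is zero; this gives a quick sanity check of the implementation.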
22. Since December 2003, the inflation target in the United Kingdom applies to the consumer price index (CPI) rather than to the RPIX. However, results using the CPI are similar to the ones reported below and are available upon request.

The sample for Canada starts with the first preannounced date for monetary policy decisions in December 2000 and ends in March 2007. The sample for the United Kingdom starts with the


first meeting of the Monetary Policy Committee in June 1997 and ends in June 2007. The sample for the European Union starts in January 1999, when the ECB officially took over monetary policy from the national central banks, and ends in March 2007. The sample for Sweden starts with the first meeting of the Executive Board in January 1999 and ends in June 2007. The sample for the United States starts in August 1988 and ends in January 2007. This period corresponds to the chairmanship of Alan Greenspan, with a small number of observations from the chairmanship of Ben Bernanke.23 The number of scheduled meetings per year varies from seven or eight (Bank of Canada, Riksbank, and Federal Reserve) to eleven (ECB) and twelve (Bank of England).

There is substantial heterogeneity in the formal procedures followed by the monetary policy committees in our sample. The Governing Council of the Bank of Canada consists of the Governor and five Deputy Governors and explicitly operates on a consensus basis. This means that the discussion at the meeting is expected to move the committee toward a shared view. The Monetary Policy Committee of the Bank of England consists of nine members, of whom five are internal (that is, chosen from within the ranks of bank staff) and four are external appointees. Meetings are chaired by the Governor of the Bank of England, decisions are made by simple majority, and dissenting votes are public. The decision-making body of the ECB consists of six members of the Executive Board and thirteen governors of the national central banks. According to the statutes (see footnote 3 above), monetary policy is decided by simple majority rule. The ECB issues no minutes and, consequently, dissenting opinions are not made public.
Under the Riksbank Act of 1999, the Swedish Riksbank is governed by an Executive Board, which includes the Governor and five Deputy Governors, and decisions concerning the Repo Rate are made by majority vote, but formal reservations against the majority decision are recorded in the minutes. Finally, the FOMC takes decisions by majority rule among the seven members of the Board of Governors, the president of the New York Fed, and four members of the remaining district banks, chosen according to an annual rotation scheme. The minutes of FOMC meetings are made public.

23. The working paper version of this article (Riboni and Ruge-Murcia 2008a) also reports results for a U.S. subsample from February 1970 to February 1978, which corresponds to the chairmanship of Arthur Burns. The conclusions drawn from that subsample are the same as those reported here.

However, unlike the Riksbank and


the Bank of England, dissenting members in the FOMC do not always state the exact interest rate they would have preferred, but only the direction of dissent (either tightening or easing).

IV.B. Formulation of the Likelihood Functions

This section shows that the political aggregators derived in Section II imply particular time-series processes for the nominal interest rate and presents their log-likelihood functions under the maintained assumption that shocks are normally distributed.24

First, consider the consensus-based model. The political aggregator (9) in Proposition 1 means that the nominal interest rate follows a nonlinear process whereby each observation may belong to one of three possible regimes depending on whether the status quo ($q_t = i_{t-1}$) is larger than $i^*_{M+K,t}$, smaller than $i^*_{M-K,t}$, or in between these two values. In the first case, the committee cuts the interest rate to $i^*_{M+K,t}$; in the second case, it raises the interest rate to $i^*_{M-K,t}$; in the third case, it keeps the interest rate unchanged. Because the data clearly show the instances where the committee takes each of these three possible actions, it follows that the sample separation is perfectly observable and each interest rate observation can be unambiguously assigned to its respective regime. Define the set $\Omega_t = \{i_{t-1}, \pi_t, y_t\}$ with the predetermined variables at time t, and the sets $\Psi_1$, $\Psi_2$, and $\Psi_3$ that contain the observations where the interest rate was cut, left unchanged, and raised, respectively. Denote by $T_1$, $T_2$, and $T_3$ the number of observations in each of these sets and by $T$ ($= T_1 + T_2 + T_3$) the total number of observations. Then the log likelihood function of the T available interest rate observations is simply

$$L(\theta) = -(T_1 + T_3)\,\sigma + \sum_{i_t \in \Psi_1} \log \phi(z_{M+K,t}) + \sum_{i_t \in \Psi_2} \log\bigl(\Phi(z_{M-K,t}) - \Phi(z_{M+K,t})\bigr) + \sum_{i_t \in \Psi_3} \log \phi(z_{M-K,t}),$$

where $\theta = \{a_{M+K}, a_{M-K}, b, c, \sigma\}$ is the set of unknown parameters, $z_{M+K,t} = (i_{t-1} - a_{M+K} - b\pi_t - c y_t)/\sigma$, $z_{M-K,t} = (i_{t-1} - a_{M-K} - b\pi_t - c y_t)/\sigma$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density and cumulative distribution functions of the standard normal variable, respectively. The maximization of this function with respect to $\theta$ delivers consistent maximum likelihood (ML) estimates of the parameters of the interest rate process under the consensus model.

24. For the detailed derivation of these functions, see Section 4.2 in the working paper version of this article (Riboni and Ruge-Murcia 2008a).

The log likelihood function of the consensus model is similar to the one studied by Rosett (1959), who generalizes the two-sided Tobit model to allow the mass point to be anywhere in the conditional cumulative distribution function. In both models, the dependent variable reacts only to large changes in the fundamentals. However, whereas Rosett’s frictional model is static and the mass point is concentrated around a fixed value, the consensus-based model is dynamic and the mass point is concentrated around a time-varying and endogenous value, albeit predetermined at the beginning of the meeting.

Second, consider the agenda-setting model with a hawkish chairman. (The case of the dovish chairman is isomorphic and not presented here to save space.) The political aggregator (10) in Proposition 2 means that the nominal interest rate follows a nonlinear process where each realization belongs to one of four possible regimes, rather than three, as in the consensus model. In the case where $i_{t-1}$ is larger than $i^*_{A,t}$, the committee cuts the interest rate to $i^*_{A,t}$, and the observation can be unambiguously assigned to the set $\Psi_1$. In the case where $i_{t-1}$ is between $i^*_{M,t}$ and $i^*_{A,t}$, the committee keeps the interest rate unchanged and the observation clearly belongs to $\Psi_2$. However, in the case where $i_{t-1}$ is smaller than $i^*_{M,t}$ (for example, as a result of a sufficiently large realization of $\zeta_t$), the agenda setter may propose an interest rate increase to either $2i^*_{M,t} - i_{t-1}$ or $i^*_{A,t}$ depending on whether the acceptance constraint is binding or not. Although the observation can be assigned to $\Psi_3$, one cannot be sure which of the two regimes (whether $2i^*_{M,t} - i_{t-1}$ or $i^*_{A,t}$) has generated $i_t$. The reason is simply that on the basis of interest rate data alone, it is not possible to know ex ante whether the acceptance constraint is binding or not. Hence, in the agenda-setting model, the sample separation is imperfect. The log likelihood function of the T available observations is

$$L(\vartheta) = -(T_1 + T_3)\,\sigma + \sum_{i_t \in \Psi_1} \log \phi(z_{A,t}) + \sum_{i_t \in \Psi_2} \log\bigl(\Phi(z_{M,t}) - \Phi(z_{A,t})\bigr) + \sum_{i_t \in \Psi_3} \log\bigl(\phi(z_{A,t})\,I(w_t) + (1/2)\,\phi(z_{D,t})\,(1 - I(w_t))\bigr),$$


where $\vartheta = \{a_A, a_M, b, c, \sigma\}$ is the set of unknown parameters, $z_{A,t} = (i_{t-1} - a_A - b\pi_t - c y_t)/\sigma$, $z_{M,t} = (i_{t-1} - a_M - b\pi_t - c y_t)/\sigma$, $z_{D,t} = (i_t - 2(a_M + b\pi_t + c y_t) + i_{t-1})/\sigma$, $w_t$ is shorthand for the condition $i_t - i_{t-1} - 2(a_A - a_M) < 0$, and $I(\cdot)$ is an indicator function that takes the value 1 if its argument is true and zero otherwise. The terms in the latter summation show that for interest rate increases, the density is a mixture of the two normal distributions associated with the processes $2i^*_{M,t} - i_{t-1}$ and $i^*_{A,t}$. The weights of this mixture take either the value zero or one because the disturbance term is the same in both processes and, hence, these distributions are perfectly correlated. By maximizing this function with respect to $\vartheta$, it is possible to obtain consistent ML estimates of the parameters of the interest rate process under the agenda-setting model.25

Finally, consider the frictionless model where $i_t = a_S + b\pi_t + c y_t + \zeta_t$ and the log likelihood function of the T available observations is just

$$L(\varphi) = -T\sigma + \sum_{\forall i_t} \log \phi(Z_{S,t}),$$

where $\varphi = \{a_S, b, c, \sigma\}$ is the set of unknown parameters and $Z_{S,t} = (i_t - a_S - b\pi_t - c y_t)/\sigma$. The maximization of this function with respect to $\varphi$ delivers consistent ML estimates of the parameters of the interest rate process under the frictionless model. Notice, however, that with data on interest rate decisions alone, it is not possible to distinguish between the two possible interpretations of the frictionless model.

IV.C. Empirical Results

Tables I through V report empirical results for the monetary committees of the Bank of Canada, the Bank of England, the ECB, the Swedish Riksbank, and the U.S. Federal Reserve. Panel A in these tables reports maximum likelihood estimates of the parameters of the interest rate process under each protocol. Although some coefficients are not statistically significant, estimates for all protocols are generally in line with the theory in the sense that

25. The indicator function I(·) induces a discontinuity in the likelihood function and, consequently, this maximization requires either the use of a non-gradient-based optimization algorithm or a smooth approximation to the indicator function. We followed the latter approach here, but using the simulated annealing algorithm in Corana et al. (1987), which does not require numerical derivatives but is more time-consuming, delivers the same results.
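To make the structure of the likelihood functions in Section IV.B concrete, the sketch below evaluates the consensus-model and frictionless-model log likelihoods for a list of observations. It is a hedged illustration, not the authors' code. Two modelling choices are assumptions: each density term is normalized by subtracting log σ (the usual Gaussian change of variables), and the density terms are evaluated at the newly chosen rate i_t, while the no-change probability is evaluated at the status quo i_{t-1}. The function names and the tuple layout `(i_prev, i_now, pi_t, y_t)` are hypothetical.

```python
import math


def _log_phi(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _big_phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def consensus_loglik(theta, data):
    """theta = (a_hi, a_lo, b, c, sigma), with a_hi for member M+K and a_lo for M-K.
    data is a list of (i_prev, i_now, pi_t, y_t) tuples."""
    a_hi, a_lo, b, c, sigma = theta
    ll = 0.0
    for i_prev, i_now, pi_t, y_t in data:
        fit = b * pi_t + c * y_t
        if i_now < i_prev:
            # Cut: the new rate is the ideal rate of member M+K.
            ll += _log_phi((i_now - a_hi - fit) / sigma) - math.log(sigma)
        elif i_now > i_prev:
            # Increase: the new rate is the ideal rate of member M-K.
            ll += _log_phi((i_now - a_lo - fit) / sigma) - math.log(sigma)
        else:
            # No change: the status quo lies inside the gridlock interval.
            z_lo = (i_prev - a_lo - fit) / sigma
            z_hi = (i_prev - a_hi - fit) / sigma
            ll += math.log(_big_phi(z_lo) - _big_phi(z_hi))
    return ll


def frictionless_loglik(params, data):
    """params = (a_s, b, c, sigma); the status quo plays no role."""
    a_s, b, c, sigma = params
    ll = 0.0
    for _i_prev, i_now, pi_t, y_t in data:
        z = (i_now - a_s - b * pi_t - c * y_t) / sigma
        ll += _log_phi(z) - math.log(sigma)
    return ll
```

Either function could be passed (with its sign flipped) to a numerical optimizer to obtain ML estimates. The agenda-setting likelihood is omitted here because its indicator term makes the objective discontinuous, as footnote 25 notes.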

TABLE I
BANK OF CANADA

                            Dominant chairman
              Consensus     Hawkish       Dovish        Frictionless  With size friction  Data

A. Parameter estimates
a_{M+K}       3.175*
              (0.422)
a_{M−K}       1.357*
              (0.471)
a_M                         1.314*        3.257*
                            (0.462)       (0.414)
a_A                         3.162*        1.377*
                            (0.413)       (0.462)
a_S                                                     2.618*        2.604*
                                                        (0.343)       (0.343)
b             0.386*        0.381*        0.388*        0.231         0.237
              (0.174)       (0.171)       (0.171)       (0.143)       (0.143)
c             −0.120        −0.112        −0.135        −0.041        −0.036
              (0.304)       (0.297)       (0.297)       (0.256)       (0.246)
σ             0.965*        0.942*        0.941*        0.845*        0.845*
              (0.124)       (0.120)       (0.120)       (0.085)       (0.086)

B. Criteria for model selection
L(·)          −58.76        −67.07        −67.53        −62.82        −77.38
AIC           127.53        144.14        145.06        133.64        162.75
RMSE          0.506         0.631         0.867         0.850         0.992
MAE           0.388         0.502         0.561         0.715         0.844
Chairman extracts all rents (p-value)
Autocorrelation
Standard deviation
Proportion of
  Cuts
  Increases
  No changes
Policy reversals
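The AIC row of Table I can be cross-checked against the log likelihood row. The sketch below assumes the standard definition AIC = 2k − 2L, which the text does not state, and takes the parameter counts k from the parameter sets of Section IV.B (five for the consensus and agenda-setting models, four for the frictionless model). The assignment of the reported values to columns is itself inferred from the flattened table, so this is a consistency check rather than a replication.

```python
def aic(loglik, n_params):
    """Akaike information criterion under the standard definition (assumed here)."""
    return 2 * n_params - 2 * loglik


# (log likelihood, number of parameters, AIC as reported in Table I)
reported = {
    "consensus": (-58.76, 5, 127.53),
    "hawkish chairman": (-67.07, 5, 144.14),
    "dovish chairman": (-67.53, 5, 145.06),
    "frictionless": (-62.82, 4, 133.64),
}

if __name__ == "__main__":
    for name, (loglik, k, table_value) in reported.items():
        # Differences of about 0.01 are expected because L is rounded in the table.
        print(name, round(aic(loglik, k), 2), table_value)
```

On this criterion the consensus model has the lowest AIC among the four protocols listed.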

QUARTERLY JOURNAL OF ECONOMICS Vol. CXXV

February 2010

Issue 1

FREE DISTRIBUTION OR COST-SHARING? EVIDENCE FROM A RANDOMIZED MALARIA PREVENTION EXPERIMENT∗

JESSICA COHEN AND PASCALINE DUPAS

It is often argued that cost-sharing (charging a subsidized, positive price) for a health product is necessary to avoid wasting resources on those who will not use or do not need the product. We explore this argument through a field experiment in Kenya, in which we randomized the price at which prenatal clinics could sell long-lasting antimalarial insecticide-treated bed nets (ITNs) to pregnant women. We find no evidence that cost-sharing reduces wastage on those who will not use the product: women who received free ITNs are not less likely to use them than those who paid subsidized positive prices. We also find no evidence that cost-sharing induces selection of women who need the net more: those who pay higher prices appear no sicker than the average prenatal client in the area in terms of measured anemia (an important indicator of malaria). Cost-sharing does, however, considerably dampen demand. We find that uptake drops by sixty percentage points when the price of ITNs increases from zero to $0.60 (i.e., from 100% to 90% subsidy), a price still $0.15 below the price at which ITNs are currently sold to pregnant women in Kenya. We combine our estimates in a cost-effectiveness analysis of the impact of ITN prices on child mortality that incorporates both private and social returns to ITN usage. Overall, our results suggest that free distribution of ITNs could save many more lives than cost-sharing programs have achieved so far, and, given the large positive externality associated with widespread usage of ITNs, would likely do so at a lesser cost per life saved.

∗ We thank Larry Katz, the editor, and four anonymous referees for comments that significantly improved the paper. We also thank David Autor, Moshe Bushinsky, Esther Duflo, William Easterly, Greg Fischer, Raymond Guiteras, Sendhil Mullainathan, Mead Over, Dani Rodrik, and numerous seminar participants for helpful comments and suggestions. We thank the Mulago Foundation for its financial support, and the donors to TAMTAM Africa for providing the free nets distributed in this study. Jessica Cohen was funded by a National Science Foundation Graduate Research Fellowship. We are very grateful to the Kenya Ministry of Health and its staff for their collaboration. We thank Eva Kaplan, Nejla Liias, and especially Katharine Conn, Carolyne Nekesa, and Moses Baraza for the smooth implementation of the project and the excellent data collection. All errors are our own.

© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology.
The Quarterly Journal of Economics, February 2010


I. INTRODUCTION

Standard public finance analysis implies that health goods generating positive externalities should be publicly funded, or even subsidized at more than 100% if the private nonmonetary costs (such as side effects) are high. Although this analysis applies to goods whose effectiveness is independent of the behavior of the recipients (e.g., vaccines, deworming pills administered to schoolchildren), it does not necessarily apply to goods that require active usage (adherence) by their owner for the public health benefits to be realized (e.g., bed nets for reduced malaria transmission, pit latrines for reduced water contamination). For such goods, charging nonzero prices (“cost-sharing”) could improve the efficacy of public subsidies by reducing wastage from giving products to those who will not use them.

There are three possible effects of positive prices on the likelihood that people who acquire the product use it appropriately. First, a selection effect: charging a positive price could select out those who do not value the good and place it only in the hands of those who are likely to use it (Oster 1995; Population Services International [PSI] 2003; Ashraf, Berry, and Shapiro forthcoming). Second, a psychological effect: paying a positive price for a good could induce people to use it more if they exhibited “sunk cost” effects (Thaler 1980; Arkes and Blumer 1985). Third, higher prices may encourage usage if they are interpreted as a signal of higher quality (Bagwell and Riordan 1991; Riley 2001).

Although cost-sharing may lead to higher usage intensity than free distribution, it may also reduce program coverage by dampening demand. A number of experimental and field studies indicate that there may be special psychological properties to zero financial price and that demand may drop precipitously when the price is raised slightly above zero (Ariely and Shampan’er 2007; Kremer and Miguel 2007).
Beyond reducing demand, selection effects are not straightforward in the context of credit and cash constraints: if people who cannot afford to pay a positive price are more likely to be sick and need the good, then charging a positive price would screen out the neediest and could significantly reduce the health benefits of the partial subsidy. In the end, the relative benefits of various levels of subsidization of health products depend on a few key factors: (1) the elasticity of demand with respect to price, (2) the elasticity of usage with respect to price (which potentially includes selection, psychological, and signaling effects), (3) the impact of price variation on the


vulnerability (i.e., need) of the marginal consumer, and, finally, (4) the presence of nonlinearities or externalities in the health production function.1

This paper estimates the first three parameters and explores the trade-offs between free distribution and cost-sharing for a health product with a proven positive externality: insecticide-treated bed nets (ITNs). ITNs are used to prevent malaria infection and have proven highly effective in reducing maternal anemia and infant mortality, both directly for users and indirectly for nonusers with a large enough share of users in their vicinity. The manufacture of ITNs is expensive, and the question of how much to subsidize them is at the center of a vigorous debate in the international community, pitting proponents of free distribution (Sachs 2005; World Health Organization [WHO] 2007) against advocates of cost-sharing (PSI 2003; Easterly 2006).

In a field experiment in Kenya, we randomized the price at which 20 prenatal clinics could sell long-lasting ITNs to pregnant women. Four clinics served as a control group and four price levels were used among the other 16 clinics, ranging from 0 (free distribution) to 40 Kenyan shillings (Ksh) ($0.60). ITNs were thus heavily subsidized, with the highest price corresponding to a 90% subsidy, comparable to the subsidies offered by the major cost-sharing interventions operating in the area and in many other malaria-endemic African countries. To check whether women who need the ITN most are willing to pay more for it, we measured hemoglobin levels (a measure of anemia and an important indicator of malaria in pregnancy) at the time of the prenatal visit. To estimate the impact of price variation on usage, we visited a subsample of women at home a few months later to check whether they still had the nets and whether they were using them.

The relationship between prices and usage that we estimate based on follow-up home visits is the combined effect of selection and sunk cost effects.2 To isolate these separate channels, we follow Karlan and Zinman (forthcoming) and Ashraf, Berry, and Shapiro (forthcoming) and implement a randomized two-stage pricing design. In clinics charging a positive price, a subsample of women who decided to buy the net at the posted price were surprised with a lottery for an additional discount; for the women sampled for this second-stage lottery, the actual price ranged from 0 to the posted price. Among these women, any variation in usage with the actual price paid should be the result of psychological sunk cost effects. Taken together, both stages of this experimental design enable us to estimate the relative merits of free distribution and varying degrees of cost-sharing on uptake, selection, and usage intensity.

1. There are other potential channels from the price of a health product to its health impact. For example, the price could influence how the product is cared for (e.g., a more expensive bed net could be washed too frequently, losing the efficacy of its insecticide) or could have spillover effects to other health behaviors. We focus on the four channels described because these are most commonly cited in the debate over pricing of public health products and likely to have first-order impacts on the relationship between prices and health outcomes.

2. The correlation between prices and usage is also potentially the product of signaling effects of prices, but this is unlikely in our context. Qualitative evidence suggests that the great majority of households in Kenya know that ITNs are subsidized heavily for pregnant women and young children and that the “true” price of ITNs (i.e., the signal of their value) is in the $4–$6 range. This is likely due to the fact that retail shops sell unsubsidized ITNs at these prices.

We find that uptake of ITNs drops significantly at modest cost-sharing prices. Demand drops by 60% when the price is increased from zero to 40 Ksh ($0.60). This latter price is still 10 Ksh ($0.15) below the prevailing cost-sharing price offered to pregnant women through prenatal clinics in this region. Our estimates suggest that of 100 pregnant women receiving an ITN under full subsidy, 25 would purchase an ITN at the prevailing cost-sharing price.

Given the very low uptake at higher prices, the sample of women for which usage could be measured is much smaller than the initial sample of women included in the experiment, limiting the precision of the estimates of the effect of price on usage. Keeping this caveat in mind, we find no evidence that usage intensity is increasing with the offer price of ITNs. Women who paid the highest price were slightly more likely (though without statistical significance) to be using the net than women who received the net for free, but at intermediate prices the opposite was true, showing no clear relationship between the price paid and probability of usage, as well as no discontinuity in usage rates between zero and positive prices.
Further, when we look only at women coming for their first prenatal care visits (the relevant long-run group to consider), usage is highest among women receiving the fully subsidized net. Women who received a net free were also no more likely to have resold it than women paying higher prices. Finally, we did not observe a second-hand market develop. Among both buyers of ITNs and recipients of free ITNs, the retention rate was above 90%. The finding that there is no overall effect of ITN prices on usage suggests that potential psychological effects of prices on usage are minor in this context, unless they are counteracted by opposite selection effects, which is unlikely. The second-stage randomization enables us to formally test for the presence of sunk-cost


effects (without potentially confounding selection effects) and yields no significant effect of the actual price paid (holding the posted price constant) on usage. This result is consistent with a recent test of the sunk-cost fallacy for usage of a water purification product in Zambia (Ashraf, Berry, and Shapiro forthcoming). In order to explore whether higher prices induce selection of women who need the net more, we measured baseline hemoglobin levels (anemia rates) for women buying/receiving nets at each price. Anemia is an important indicator of malaria, reflecting repeated infection with malaria parasites, and is a common symptom of the disease in pregnant women in particular. We find that prenatal clients who pay positive prices for an ITN are no sicker, at baseline, than the clients at the control clinics. On the other hand, we find that recipients of free nets are healthier at baseline than the average prenatal population observed at control clinics. We suspect this is driven by the incentive effect the free net had on returning for follow-up prenatal care before the benefits of the previous visit (e.g., iron supplementation) had worn off. Taken together, our results suggest that cost-sharing ITN programs may have difficulty reaching a large fraction of the populations most vulnerable to malaria. Although our estimates of usage rates among buyers suffer from small-sample imprecision, effective coverage (i.e., the fraction of the population using a program net) can be precisely estimated and appears significantly (and considerably) higher under free distribution than under a 90% subsidy. In other words, we can confidently reject the possibility that the drop in demand induced by higher prices is offset by an increase in usage. Because effective coverage declines with price increases, the level of coverage under cost-sharing is likely to be too low to achieve the strong social benefits that ITNs can confer. 
When we combine our estimates of demand elasticity and usage elasticity in a model of cost-effectiveness that incorporates both private and social benefits of ITNs on child mortality, we find that for reasonable parameters, free distribution is at least as cost-effective as partially but still highly subsidized distribution, such as the cost-sharing program for ITNs that was under way in Kenya at the time of this study. We also find that, for the full range of parameter values, the number of child lives saved is highest when ITNs are distributed free. Our results have to be considered in their context: ITNs have been advertised heavily for the past few years in Kenya, both by the Ministry of Health and by the social-marketing


nongovernmental organization Population Services International (PSI); pregnant women and parents of young children have been particularly targeted by the malaria prevention messages; and most people (even in rural areas) are aware that the unsubsidized price of ITNs is high, thus reducing the risk that low prices through large subsidies are taken as a signal of bad quality. Our results thus do not speak to the debate on optimal pricing for health products that are unknown to the public. But if widespread awareness about ITNs explains why price does not seem to affect usage among owners, it makes the price sensitivity we observe all the more puzzling. Although large effects of prices on uptake have been observed in other contexts, they were found for less well-known products, such as deworming medication (Kremer and Miguel 2007) and contraceptives (Harvey 1994). Given the high private returns to ITN use and the absence of a detected effect of price on usage, the price sensitivity of demand we observe suggests that pregnant women in rural Kenya are credit- or saving-constrained. The remainder of the paper proceeds as follows. Section II presents the conceptual framework. Section III provides background information on ITNs and describes the experiment and the data. Section IV describes the results on price elasticity of demand, price elasticity of usage, and selection effects on health. Section V presents a cost-effectiveness analysis, and Section VI concludes. II. A SIMPLE MODEL OF PIGOUVIAN SUBSIDIES This section develops a simple model to highlight the parameters that must be identified by the experiment to determine the optimal subsidy level. Assume that ITNs have two uses: a health use, when the net is hung, and a nonhealth use, for which the net is not hung.3 Nonhealth uses could be using the net for fishing, or simply leaving it in its bag for later use, for example, when a previous net wears out. 
Health use of the ITNs generates positive health externalities but nonhealth uses do not. Purchasing a net for health or nonhealth purposes costs the same to the household. The price of a net to a household is the marginal cost C minus a subsidy T. We call h the number of nets used for health purposes and n the number of nets used for nonhealth purposes. The household utility is U = u(h) + v(n) − (C − T)(h + n) + kH, where u(h) is the utility from having hanging nets, with u′ ≥ 0 and u″ ≤ 0; v(n) is the utility from nonhanging nets, with v′ ≥ 0 and v″ ≤ 0; H is the average number of nets used for health purposes per household; and the constant k represents the positive health externality.4 When choosing how many nets to invest in, the household ignores the health externality and chooses h and n such that u′(h) = v′(n) = C − T. Increasing the size of the subsidy T increases households’ investment in nets for health use, and thus the health externality. Because the subsidy is common for all nets, however, increasing T might also affect households’ investment in nets for nonhealth use. Call N the average number of nets used for nonhealth purposes per household. The marginal cost of increasing the health externality is T × [d(H + N)/dT], whereas the marginal benefit is only k × (dH/dT). The efficient subsidy level is the level that equates the marginal cost of increasing the externality to the marginal benefit of increasing it: T = [k × (dH/dT)]/[d(H + N)/dT]. If N does not respond to the subsidy (dN/dT = 0), then d(H + N)/dT = dH/dT and the optimal subsidy is k, the level of the externality, as in Pigou’s standard theory. But if subsidizing H distorts the amount of N consumed upward, the optimal subsidy is lower than the level of the externality. The gap between the level of the externality and the optimal subsidy level will depend on how sensitive the hanging of nets is to price, relative to total ownership of nets. In other words, what we need to learn from the experiment is the following: when we increase the price, by how much do we reduce the number of hanging nets (nets put to health use), and how does it compare to the reduction in the total number of nets acquired?

3. We thank an anonymous referee for suggesting this formalization.

4. For simplicity we assume that the positive health externality is linear in the share of the population that is covered with a net. In reality the health externality for malaria seems to be S-shaped.

This simple model could be augmented to incorporate imperfect information (for the household) on the true returns to hanging nets, especially on the relative curvature of u(.) and v(.). The lack of information could be on the effectiveness or the quality of ITNs. In this context, households could use the subsidy level as a signal of effectiveness or quality (i.e., if households interpret the size of the subsidy as the government’s willingness to pay to increase coverage and thus as a measure of the net’s likely effectiveness).
In such a case, subsidizing H would distort the amount of N consumed downward, and the optimal subsidy would be greater than the level of the externality. Alternatively, households could lack information on the nonmonetary transaction cost of hanging the net and underestimate this cost when they invest in nets for health use. Once households realize how much effort is required to hang the net (hanging it every evening and dehanging it every morning can be cumbersome for households that sleep in their living rooms), they might decide to reallocate a net from health use to nonhealth use. Households that suffer from the sunk-cost fallacy, however, would be less likely to reallocate a net from health use to nonhealth use if they had to pay a greater price for the net. This could be formalized, for example, by adding an effort cost in the function u(.), and assuming that the disutility of the effort needed to hang the net is weighted by the relative importance of the nonmonetary cost (effort) in the total cost of the net (nonmonetary cost + monetary cost). Increasing the subsidy level (decreasing the price) would then increase the disutility of putting forth effort to hang the net and increase the likelihood that households do not use the net. This sunk cost effect would lead to an upward distortion of N, and imply a subsidy level lower than the level of the externality. For a quick preview of our findings, Figure I plots the demand curve and the “hanging curve” observed in our experiment. The slope of the top curve is an estimate of −d(H + N)/dT and the slope of the bottom curve estimates −dH/dT. We find no systematic effect of the price on the ratio of these two slopes. When the price decreases from 10 Ksh to 0, the ratio of hanging nets to acquired nets actually increases, suggesting that the full subsidy (a price of zero) distorts the demand for nonhanging nets downward. However, at higher price levels, the effect of changing the subsidy is different. 
The ratio decreases when the price decreases from 40 to 20 Ksh and from 20 to 10 Ksh. Overall, however, the ratio remains quite close to 1 over the price range we study.
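As an illustration of the efficient-subsidy condition in this section, the following Python sketch evaluates T = [k × (dH/dT)]/[d(H + N)/dT] for three response patterns. The function name and all numerical values are hypothetical and chosen only for illustration; they are not estimates from the experiment.

```python
def optimal_subsidy(k, dH_dT, dN_dT):
    """Efficient subsidy T = k * (dH/dT) / (d(H + N)/dT).

    k      -- size of the health externality (hypothetical units)
    dH_dT  -- response of nets in health use to the subsidy
    dN_dT  -- response of nets in nonhealth use to the subsidy
    """
    total_response = dH_dT + dN_dT  # d(H + N)/dT
    if total_response <= 0:
        raise ValueError("total net ownership must rise with the subsidy")
    return k * dH_dT / total_response


k = 100.0  # hypothetical externality

# N does not respond: the Pigouvian benchmark, T = k.
print(optimal_subsidy(k, dH_dT=0.5, dN_dT=0.0))    # 100.0

# The subsidy also raises nonhealth use: T falls below k.
print(optimal_subsidy(k, dH_dT=0.5, dN_dT=0.125))  # 80.0

# The subsidy lowers nonhealth use (the quality-signal case): T exceeds k.
print(optimal_subsidy(k, dH_dT=0.5, dN_dT=-0.25))  # 200.0
```

The three calls correspond to the three cases discussed in the text: no distortion of N, an upward distortion of N, and a downward distortion of N.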

III. BACKGROUND ON ITNS AND EXPERIMENTAL SETUP

III.A. Background on Insecticide-Treated Nets

ITNs have been shown to reduce overall child mortality by at least 20% in regions of Africa where malaria is the leading cause of death among children under five (Lengeler 2004). ITN coverage protects pregnant women and their children from the serious detrimental effects of maternal malaria. In addition, ITN use can help avert some of the substantial direct costs of treatment and the indirect costs of malaria infection on impaired learning and lost income. Lucas (forthcoming) estimates that the gains to education from a malaria-free environment alone more than compensate for the cost of an ITN.

FIGURE I
Ownership vs. Effective Coverage
[Plot not reproduced. Horizontal axis: price of ITN (Free, 10 Ksh, 20 Ksh, 40 Ksh). Vertical axis: 0 to 1. Series: “Acquired ITN” and “Acquired ITN and using it,” each with a 95% CI.]
Sample includes women sampled for baseline survey during clinic visit, and who either did not acquire an ITN or acquired one and were later randomly sampled for the home follow-up. Usage of program ITN is zero for those who did not acquire a program ITN. Error bars represent ±2.14 standard errors (95% confidence interval with fourteen degrees of freedom). At the time this study was conducted, ITNs in Kenya were social-marketed through prenatal clinics at a price of 50 Ksh.

Despite the proven efficacy and increasing availability of ITNs on the retail market, the majority of children and pregnant women in sub-Saharan Africa do not use ITNs.5 At $5–$7 a net (US$ in PPP), they are unaffordable to most families, and so governments and NGOs distribute ITNs at heavily subsidized prices. However, the price that is charged for the net varies greatly by the distributing organization, country, and consumer. The failure to achieve higher ITN coverage rates despite repeated pledges by governments and the international community (such as the Abuja Declaration of 2000) has put ITNs at the center of a lively debate over how to price vital public health products in developing countries (Lengeler et al. 2007).

Proponents of cost-sharing ITN distribution programs argue that a positive price is needed to screen out people who will not use the net, and thus avoid wasting the subsidy on nonusers. Cost-sharing programs often have a “social marketing” component, which uses mass media communication strategies and branding to increase the consumer’s willingness to pay (Schellenberg et al. 2001; PSI 2003). The goal is to shore up demand and usage by making the value of ITN use salient to consumers. Proponents of cost-sharing programs also point out that positive prices are necessary to ensure the development of a commercial market, considered key to ensuring a sustainable supply of ITNs.

Proponents of full subsidization argue that, although the private benefits of ITN use can be substantial, ITNs also have important positive health externalities deriving from reduced disease transmission.6,7 In a randomized trial of an ITN distribution program at the village level in western Kenya, the positive impacts of ITN distribution on child mortality, anemia, and malaria infection were as strong among nonbeneficiary households within 300 meters of beneficiary villages as they were among households in the beneficiary villages themselves (Gimnig et al. 2003).8 Although ITNs may have positive externalities at low levels of coverage (e.g., for unprotected children in the same household), it is estimated that at least 50% coverage is required to achieve strong community effects on mortality and morbidity (Hawley et al. 2003). To date, no cost-sharing distribution program is known to have reached this threshold (WHO 2007).

5. According to the World Malaria Report (2008), which compiled results from surveys in 18 African countries, 23% of children and 27% of pregnant women sleep under ITNs.

6. The external effects of ITN use derive from three sources: (1) fewer mosquitoes due to contact with insecticide, (2) reduction in the infective mosquito population due to the decline in the available blood supply, and (3) fewer malaria parasites to be passed on to others.

7. The case for fully subsidizing ITNs has also been made on the basis of the substantial costs to the government of hospital admissions and outpatient consultations due to malaria (Evans et al. 1997).

8. In a similar study in Ghana, Binka, Indome, and Smith (1998) find that child mortality increases by 6.7% with each 100-meter shift away from the nearest household with an ITN.

III.B. Experimental Setup

The experiment was conducted in twenty communities in western Kenya, spread across four districts: Busia, Bungoma, Butere, and Mumias. Malaria is endemic in this region of Kenya: transmission occurs throughout the year with two peaks corresponding to periods of heavy rain, in May/June/July and October/November. In two nearby districts, a study by the CDC and the Kenyan Medical Research Institute found that pregnant women may receive as many as 230 infective bites during their forty weeks of gestation, and as a consequence of the high resulting levels of maternal anemia, up to a third of all infants are born either premature, small for gestational age, or with low birth weight (Ter Kuile et al. 2003).

The latest published data on net ownership and usage available for the region come from the Kenya Demographic and Health Survey of 2003. It estimated that 19.8% of households in Western Kenya had at least one net and 6.7% had a treated net (an ITN); 12.4% of children under five slept under a net and 4.8% under an ITN; 6% of pregnant women slept under a net the night before and 3% under an ITN. Net ownership is very likely to have gone up since, however. In July 2006, the Measles Initiative ran a one-week campaign throughout western Kenya to vaccinate children between nine months and five years of age and distributed a free long-lasting ITN to each mother who brought her children to be vaccinated. The 2008 World Malaria Report uses ITN distribution figures to estimate that 65% of Kenyan households now own an ITN. A 2007 survey conducted (for a separate project) in the area of study among households with school-age children found a rate of long-lasting ITN ownership around 30% (Dupas 2009b).

Our experiment targeted ITN distribution to pregnant women visiting health clinics for prenatal care.9 We worked with 20 rural public health centers chosen from a total of 70 health centers in the region, 17 of which were private and 53 were public. The 20 health centers we sampled were chosen based on their public status, their size, services offered, and distance from each other.

9. The ITNs distributed in our experiment were PermaNets, sold by Vestergaard Frandsen. They are circular polyester bed nets treated with the insecticide Deltamethrin and maintain efficacy without retreatment for about three to five years (or about twenty washes).

We then randomly assigned them to one of five groups: four clinics formed the control group; five clinics were provided with ITNs
and instructed to give them free of charge to all expectant mothers coming for prenatal care; five clinics were provided with ITNs to be sold at 10 Ksh (corresponding to a 97.5% subsidy); three clinics were provided with ITNs to be sold at 20 Ksh (95.0% subsidy); and the last three clinics were provided with ITNs to be sold at 40 Ksh (90% subsidy). The highest price is 10 Ksh below the prevailing subsidized price of ITNs in this region, offered through PSI to pregnant women at prenatal clinics.10 Table I presents summary statistics on the main characteristics of health centers in each group. Although the relatively small number of clinics leads to imperfect balancing of characteristics, the clinics appear reasonably similar across ITN price assignment and we show below that controlling for clinic characteristics does not change our estimates except to add precision. Clinics were provided with financial incentives to carry out the program as designed. For each month of implementation, clinics received a cash bonus (or a piece of equipment of their choice) worth 5,000 Ksh (approximately $75) if no evidence of “leakage” or mismanagement of the ITNs or funds was observed. Clinics were informed that random spot checks of their record books would be conducted, as well as visits to a random subsample of beneficiaries to confirm the price at which the ITNs had been sold and to confirm that they had indeed purchased ITNs (if the clinic’s records indicated so). Despite this, we observed leakages and mismanagement of the ITNs in four of the eleven clinics that were asked to sell ITNs for a positive price. We did not observe any evidence of mismanagement in the five clinics instructed to give out the ITNs for free. Of the four clinics that mismanaged the ITNs, none of them altered the price at which ITNs were made available to prenatal clients, but they sold some of the program ITNs to ineligible recipients (i.e., nonprenatal clients). 
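The subsidy rates quoted for the price arms can be checked with a short calculation. The sketch below is only illustrative: it assumes an unsubsidized cost of 400 Ksh per net, a figure that is not stated in the text but is implied by the statement that 10 Ksh corresponds to a 97.5% subsidy, and it uses the approximate exchange rate of 67 Ksh per US dollar given in the notes to Table I. The function names are hypothetical.

```python
KSH_PER_USD = 67.0     # approximate exchange rate from the notes to Table I
FULL_COST_KSH = 400.0  # assumption: implied by "10 Ksh = 97.5% subsidy"


def subsidy_rate(price_ksh, full_cost_ksh=FULL_COST_KSH):
    """Share of the net's cost covered by the subsidy at a given client price."""
    return 1.0 - price_ksh / full_cost_ksh


def price_in_usd(price_ksh, ksh_per_usd=KSH_PER_USD):
    """Client price converted to US dollars."""
    return price_ksh / ksh_per_usd


# The four experimental prices, plus the prevailing 50 Ksh PSI price.
for price in (0, 10, 20, 40, 50):
    print(f"{price:>2} Ksh: subsidy {subsidy_rate(price):.1%}, "
          f"about ${price_in_usd(price):.2f}")
```

Under these assumptions the 10, 20, and 40 Ksh arms come out at 97.5%, 95.0%, and 90.0% subsidies, matching the figures in the text.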
TABLE I
CHARACTERISTICS OF PRENATAL CLINICS IN THE SAMPLE, BY TREATMENT GROUP

| | Control group (1) | ITN price 0 Ksh (free) (2) | ITN price 10 Ksh ($0.15) (3) | ITN price 20 Ksh ($0.30) (4) | ITN price 40 Ksh ($0.60) (5) | p-value, joint test 1 (6) | p-value, joint test 2 (7) |
|---|---|---|---|---|---|---|---|
| Average monthly attendance in 2006 (first visits ONLY) | 67 [46.3] | 63 | 75 | 54 | 62 | .769 | .965 |
| Average monthly attendance in 2006 (first + subsequent visits) | 114 [69.4] | 117 | 164 | 106 | 122 | .565 | .847 |
| Prenatal enrollment fee (in Ksh) | 10.0 [8.2] | 12.0 | 4.0 | 13.3 | 10.0 | .292 | .619 |
| Fraction of clinics with HIV testing services | 0.50 [0.58] | 0.40 | 0.80 | 0.67 | 0.33 | .507 | .713 |
| Total other prenatal clinics within 10 kilometers (km) | 3.8 [2.9] | 3.4 | 3.6 | 4.3 | 5.0 | .769 | .758 |
| Distance (in km) to closest prenatal clinic in the sample | 11.3 [2.6] | 13.3 | 13.0 | 12.1 | 11.4 | .743 | .593 |
| Number of clinics | 4 | 5 | 5 | 3 | 3 | | |

Notes: Standard deviations presented in brackets. At the time of the program, US$1 was equivalent to around 67 Kenyan shillings (Ksh). Prenatal clinics were sampled from a pool of seventy prenatal clinics over four districts in Kenya’s Western Province: Busia, Bungoma, Butere, and Mumias. Joint test 1: Test of equality of means across four treatment groups. Joint test 2: Joint test that means in treatment groups are equal to mean in control group.

10. Results from a preprogram clinic survey suggest that it is perhaps not appropriate to interpret our results in the context of widely available ITNs to pregnant women at 50 Ksh, as many of the clinics reported the supply of PSI nets to be erratic and frequently out of stock.

The ITN distribution program was phased into program clinics between March and May 2007 and was kept in place for at least three months in each clinic, throughout the peak “long rains” malaria season and subsequent months. Posters were put up in clinics to inform prenatal clients of the price at which the ITNs were sold. Other than offering a free hemoglobin test to each woman on survey days, we did not interfere with the normal
procedures these clinics used at prenatal care visits, which in principle included a discussion of the importance of bed net usage.

Within clinics where the posted price was positive, a second-stage randomization was conducted on unannounced, random days. On those days, women who had expressed their willingness and showed their ability to purchase an ITN at the posted price (by putting the required amount of money on the counter) were surprised by the opportunity to participate in a lottery for an additional promotion by picking an envelope from a basket. All women given the opportunity to participate in the lottery agreed to pick an envelope. The final price paid by these women was the initial offer price if they picked an empty envelope; zero if they picked a “free net” envelope; or a positive price below the initial offer price if the initial price was 40 Ksh. This second-stage randomization started at least five weeks after the program had started in a given clinic, and took place no more than once a week, on varying week days, to avoid biasing the women’s decisions to purchase the ITN based on the expectation of a second-stage discount.11

III.C. Data

Three types of data were collected. First, administrative records kept by the clinic on ITN sales were collected. Second, each clinic was visited three or four times on random days, and on those days enumerators surveyed all pregnant women who came for a prenatal visit. Women were asked basic background questions and whether they purchased a net, and their hemoglobin levels were recorded. In total, these measures were collected from 545 pregnant women. Third, a random sample of 246 prenatal clients who had purchased/received a net through the program were selected to be visited at their homes three to ten weeks after their net purchases.
All home visits were conducted within three weeks in July 2007 to ensure that all respondents faced the same environment (especially in terms of malaria seasonality) at the time of the follow-up. Of this subsample, 92% (226 women) were found and consented to be interviewed. During the home visits, respondents were asked to show the net and to say whether they had started using it and who was sleeping under it. Surveyors checked to see whether the net had been taken out of the packaging, whether it was hanging, and the condition of the net.12 Note that, at the time of the baseline survey and ITN purchase, women were not told that follow-up visits could be made at their homes. What’s more, neither the clinic staff nor the enumerators conducting the baseline surveys knew that usage would be checked. This limits the risk that usage behavior might be abnormally high during the study period. Also note that we do not observe an increase in reported or observed usage over the three weeks during which the home surveys were conducted. This suggests that the spread of information about the usage checks was limited and unlikely to have altered usage behavior.

11. By comparing days with and those without the lottery, we can test whether women heard about the lottery on days we did the lottery. We do not find evidence that uptake was higher on the days we performed the lottery; we also do not observe a significant increase in the uptake of nets after the first lottery day (data not shown).

III.D. Clinic-Level Randomization

The price at which ITNs were sold was randomized at the clinic level, but our outcomes of interest are at the individual level: uptake, usage rates, and health. When regressing individual-level dependent variables on clinic-level characteristics, we are likely to overstate the precision of our estimators if we ignore the fact that observations within the same clinic (cluster) are not independent (Moulton 1990; Donald and Lang 2007). We compute cluster-robust standard errors using the cluster-correlated Huber–White covariance matrix method. In addition, because the number of clusters is small (sixteen treatment clinics), the critical values for the tests of significance are drawn from a t-distribution with fourteen (= 16 − 2) degrees of freedom (Cameron, Miller, and Gelbach 2007). The critical values for the 1%, 5%, and 10% significance levels are thus 2.98, 2.14, and 1.76, respectively.

Another approach to credibly assessing causal effects with a limited number of randomization units is to use (nonparametric) randomization inference, first proposed by Fisher (1935), later developed by Rosenbaum (2002), and recently used by Bloom et al. (2006). Hypothesis testing under this method is done as follows.
For each clinic, we observe the share of prenatal clients who purchased a net (or were using a net). Let y_i denote the observed purchase rate for clinic i. For each clinic i = 1, 2, . . . , 16, Y_i(P_i) represents the purchase rate at clinic i when the ITN price at clinic i is P_i, with P_i ∈ {0, 10, 20, 40}. The outcome variable is a function of the treatment variable and potential outcomes:

$$y_i = \sum_{k \in \{0, 10, 20, 40\}} \mathbf{1}\{P_i = k\}\, Y_i(k).$$

The effect of charging price k in clinic i (relative to free distribution) is E_i^k = Y_i(k) − Y_i(0). To make causal inferences for a price level k via Fisher’s exact test, we use the null hypothesis that the effect of charging k is zero for all clinics: H0: E_i^k = 0 for all i = 1, . . . , 16. Under this null hypothesis, all potential outcomes are known exactly. For example, although we do not observe the outcome under price 0 for clinic i subject to price k > 0, the null hypothesis implies that the unobserved outcome is equal to the observed outcome, Y_i(0) = y_i. For a given price level k, we can test the null hypothesis against the alternative hypothesis that E_i^k ≠ 0 for some clinics by using the difference in average outcomes by treatment status as a test statistic:

$$T_k = \frac{\sum_i \mathbf{1}\{P_i = k\}\, y_i}{\sum_i \mathbf{1}\{P_i = k\}} - \frac{\sum_i \mathbf{1}\{P_i = 0\}\, y_i}{\sum_i \mathbf{1}\{P_i = 0\}}.$$

Under the null hypothesis, only the price variable P is random, and thus the distribution of the test statistic (generated by taking all possible treatment assignments of clinics to prices) is completely determined by that of P. By checking whether T_k^obs, the statistic for the “true” assignment of prices (the actual assignment in our experiment), falls in the tails of the distribution, we can test the null hypothesis. We can reject the null hypothesis with a confidence level of 1 − α if the test statistic for the true assignment is in the α/2 tails of the distribution. This test is nonparametric because it does not make distributional assumptions. We call the p-values computed this way “randomization inference p-values.”

12. The nets that were distributed through the program were easily recognizable through their tags. Enumerators were instructed to check the tags to confirm the origin of the nets.

IV. RESULTS

IV.A. Clinic-Level Analysis: Randomization Inference Results

Table II presents the results of randomization inference tests of the hypotheses that the three positive prices in our experiment

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(10)

(11)

Share of prenatal clients who acquired program ITN (12)

Panel A: Takeup Mean in free group 41.03 0.99 Difference with free group: ITN price = 10 Ksh −3.43 −13.77 −0.07 −0.07 S.E. (18.60) (13.98) (0.03)∗∗ (0.03)∗ Randomization inference p-value .824 .460 .125 .091 ITN price = 20 Ksh −13.87 −10.75 −0.17 −0.18 S.E. (20.36) (18.16) (0.02)∗∗∗ (0.02)∗∗∗ Randomization inference p-value .64 .61 .000 .036 ITN price = 40 Ksh −32.12 −34.03 −0.58 −0.58 S.E. (25.05) (22.00) (0.06)∗∗∗ (0.05)∗∗∗ Randomization inference p-value .23 .19 .000 .018 Clinic-level controls X X X X X X Number of clinics 10 10 8 8 7 7 10 10 8 8 8 8 R2 .00 .54 .07 .39 .25 .54 .45 .45 .93 .96 .95 .97 # of possible assignments 252 252 56 56 21 21 252 252 56 56 56 56 for random inference

(1)

Average weekly ITN sales over first 6 weeks

TABLE II CLINIC-LEVEL ANALYSIS: FISHERIAN PERMUTATION TESTS FREE DISTRIBUTION OR COST SHARING?

17

Panel B: Effective coverage

(14)

X 10 .22 252 2.429 1.418

10 .2 252 2.571 2.107

−0.18 −0.17 (0.13) (0.14) .173 .206

0.70

(13)

(16)

0.598

1.588

8 .42 56

0.822

1.500

X 8 .42 56

−0.27 −0.27 (0.13)∗ (0.14) .071 .143

(15)

(18)

0.185

0.948

0.153

0.931

−0.55 −0.54 (0.14)∗∗∗ (0.15)∗∗ .018 .054 X 8 8 .71 .73 56 56

(17)

Share using program ITN at follow-up (unconditional on takeup)

Notes: Panel A, columns (1)–(6): Sales data from clinics’ records. Data missing for one clinic due to misreporting of sales. Panel A, columns (7)–(12), and Panel B: Individual data collected by research team, averaged at the clinic level (the level of randomization). “Using program ITN” is equal to 1 only for those who (1) acquired the ITN and (2) had the ITN hanging in the home during an unannounced visit. Standard errors in parentheses, estimated through linear regressions. P-values for treatment effects computed by randomization inference. ∗∗∗ , ∗∗ , ∗ Significance at 1%, 5%, and 10% levels, respectively.

Mean in free group Difference with free group: ITN price = 10 Ksh S.E. Randomization inference p-value ITN price = 20 Ksh S.E. Randomization inference p-value ITN price = 40 Ksh S.E. Randomization inference p-value Clinic-level controls Number of clinics R2 # of possible assignments for random inference Ratio [(H/T ) from Panel B / ((H + N)/T ) from Panel A] Standard error of ratio (H/T )/(H + N/T )

TABLE II (CONTINUED)

18 QUARTERLY JOURNAL OF ECONOMICS

FREE DISTRIBUTION OR COST SHARING?

19

had no effect on demand and coverage. The data used in Table II were collapsed at the clinic level. (The raw data on clinic level outcomes are provided in the Online Appendix). We have two indicators of demand presented in Panel A: average weekly sales of ITNs (recorded by the clinics) in columns (1)–(6) and the share of surveyed pregnant women who acquired an ITN in columns (7)–(12). Panel B shows the rate of effective coverage: the share of surveyed pregnant women in the clinic who not only acquired the ITN but also reported using it at follow-up. For each outcome (sales, uptake, effective coverage), we present the estimated effect of prices both without and with clinic-level controls. We present the standard errors estimated through parametric linear regressions, as well as the randomization inference p-values. Results in columns (1)–(6) suggest that, although the ITN sales were lower on average in clinics charging a higher price for the ITN, none of the differences between clinics can be attributed to the price. Even the 32/41 = 78% lower sales in the clinics charging 40 Ksh are not significant. Note, however, that the sales data are missing for one of the three 40 Ksh clinics, and as a consequence the power of the randomization inference test in columns (5) and (6) is extremely low: there are only 21 possible assignments of seven clinics to two groups of sizes five and two, and each of them has a 1/21 = 4.76% chance of being selected. This means that even the largest effect cannot fall within the 2.5% tails of the distribution, and randomization inference would thus fail to reject the null hypothesis of no price effect with 95% confidence no matter how large the difference in uptake between 0 Ksh and 40 Ksh clinics is (Bloom et al. 2006). The power is higher for the tests performed on the survey data (columns (7)–(12) of Panel A, and Panel B), but still lower relative to tests that impose some structure on the error term. 
Nevertheless, the p-values in columns (9)–(12) suggest that we can reject the hypothesis that charging either 20 or 40 Ksh for nets has no effect on uptake with 95% confidence. In particular, uptake of the net is 58 percentage points lower in the 40 Ksh group than in the free distribution group, and the confidence level for this effect is 98%. The results on effective coverage (usage of the net unconditional on uptake) are weaker for the 20 Ksh treatment but still significant for the 40 Ksh treatment: effective coverage is 54 percentage points lower in the 40 Ksh group than in the free distribution group, and the confidence level for this effect is 94%.

20

QUARTERLY JOURNAL OF ECONOMICS
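The permutation procedure of Section III.D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for Table II: the clinic-level uptake shares below are invented, the function name is hypothetical, and the quantity reported is the share of possible assignments whose statistic lies at least as far out in the same tail as the observed one, which is then compared with α/2.

```python
from itertools import combinations
from math import comb


def tail_share(outcomes, priced):
    """Share of possible assignments whose difference in means is at least as
    extreme, in the same tail, as the observed difference.

    outcomes -- clinic-level outcomes y_i for the free clinics and the clinics
                charging price k (invented data in this sketch)
    priced   -- indices of the clinics actually assigned price k
    """
    n = len(outcomes)

    def diff_in_means(group):
        group = set(group)
        inside = [outcomes[i] for i in group]
        outside = [outcomes[i] for i in range(n) if i not in group]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = diff_in_means(priced)
    # Under the sharp null, reassigning prices leaves every y_i unchanged,
    # so the statistic can be recomputed for every possible assignment.
    draws = [diff_in_means(g) for g in combinations(range(n), len(priced))]
    tol = 1e-12  # guards against floating-point ties
    lower = sum(d <= observed + tol for d in draws)
    upper = sum(d >= observed - tol for d in draws)
    return observed, min(lower, upper) / len(draws)


# Invented uptake shares: five free clinics followed by three 40 Ksh clinics.
uptake = [1.00, 0.98, 1.00, 0.97, 1.00, 0.45, 0.40, 0.38]
observed, share = tail_share(uptake, priced=(5, 6, 7))

print(comb(10, 5), comb(8, 3), comb(7, 2))  # 252 56 21
print(round(observed, 2), round(share, 3))  # -0.58 0.018
print(1 / comb(7, 2) > 0.025)               # True
```

The first printed line reproduces the counts of possible assignments reported in Table II. The last line restates the power problem for the sales data: with only 21 assignments, the smallest attainable tail share is 1/21, which exceeds 2.5%.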

As shown in Section II, the key parameter of interest in determining the optimal subsidy level is the ratio (ΔH/ΔT)/(Δ(H + N)/ΔT). We compute this ratio for ΔT = 10 Ksh, ΔT = 20 Ksh, and ΔT = 40 Ksh at the bottom of Panel B in Table II. The ratio is greater than 1 for price changes from 0 to 10 Ksh or 0 to 20 Ksh, but the standard errors are massive and there is little informational content in those numbers. For ΔT = 40 Ksh, the ratio is more precisely estimated, at 0.95, still quite close to 1. The standard error of this ratio is 0.18 in the absence of covariates, and implies a 95% confidence interval of [0.58–1.31]. When we control for clinic-level covariates in the estimations of the two effects, the confidence interval on the ratio is somewhat reduced to [0.63–1.23].

The finding in Table II that effective coverage is statistically significantly lower by 54 percentage points in the 40 Ksh group (the group that proxies the cost-sharing program in place in Kenya at the time of the study) compared to the free distribution group is the main result of the paper. In the remainder of the analysis, we investigate the effects in more detail by conducting parametric analysis on the disaggregated data with cluster standard errors adjusted for the small number of clusters.

IV.B. Micro-Level Analysis: Price Elasticity of Demand for ITNs

Table III presents coefficient estimates from OLS regressions of weekly ITN sales on price. The coefficient estimate on ITN price from the most basic specification in column (1) is −0.797. This estimate implies that weekly ITN sales drop by about eight nets for each 10 Ksh increase in price. Because clinics distributing ITNs for free to their clients distribute an average of 41 ITNs per week, these estimates imply that a 10 Ksh increase in ITN price leads to a 20% decline in weekly ITN sales. The specification in column (4) regresses weekly ITN sales on indicator variables for each ITN price (0 Ksh is excluded). Raising the price from 0 to 40 Ksh reduces demand by 80% (from 41 ITNs per week to 9), a substantial decline in demand that closely matches the decline of about 32 nets implied by the linear estimate in column (1). These results are not sensitive to adding controls for time effects (columns (2) and (5)).

TABLE III
WEEKLY ITN SALES ACROSS PRICES: CLINIC-LEVEL DATA

Dependent variable: weekly ITN sales

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ITN price in Kenyan shillings (Ksh) | −0.797 (0.401)∗ | −0.797 (0.403)∗ | −0.803 (0.107)∗∗∗ | | | |
| ITN price = 10 Ksh ($0.15) | | | | −0.33 (16.81) | −0.33 (16.92) | 1.52 (4.37) |
| ITN price = 20 Ksh ($0.30) | | | | −9.50 (16.04) | −9.50 (16.14) | −14.08 (5.00)∗∗ |
| ITN price = 40 Ksh ($0.60) | | | | −32.42 (15.38)∗ | −32.42 (15.47)∗ | −33.71 (2.88)∗∗∗ |
| Number of weeks since program started | | −5.08 (1.41)∗∗∗ | −5.08 (1.46)∗∗∗ | | −5.08 (1.42)∗∗∗ | −5.08 (1.48)∗∗∗ |
| Average attendance in 2006 (first visits) | | | 1.48 (0.21)∗∗∗ | | | 1.56 (0.22)∗∗∗ |
| Average attendance in 2006 (total) | | | −0.46 (0.15)∗∗∗ | | | −0.50 (0.15)∗∗∗ |
| Prenatal enrollment fee (in Ksh) | | | −0.77 (0.27)∗∗ | | | −0.54 (0.32) |
| ANC clinic offers HIV testing services | | | 14.08 (7.44)∗ | | | 7.07 (7.65) |
| Distance to the closest ANC clinic | | | −1.08 (0.77) | | | −1.84 (0.68)∗∗ |
| Distance to the closest ANC clinic in the sample | | | −8.85 (2.89)∗∗∗ | | | −9.63 (2.70)∗∗∗ |
| Observations (clinic-weeks) | 90 | 90 | 90 | 90 | 90 | 90 |
| R2 | .13 | .21 | .64 | .14 | .23 | .65 |
| Mean of dep. var. in clinics with free ITNs | 41.03 | | | | | |

Notes: Each column is an OLS regression of weekly ITN sales on ITN price or on a set of indicator variables for each price (0 Ksh is excluded). All regressions include district fixed effects. The sample includes fifteen clinics in three districts over six weeks after program introduction. (One 40 Ksh clinic is not included because of problems with net sales reporting.) Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (fifteen), the critical values for t-tests were drawn from a t-distribution with 13 (= 15 − 2) degrees of freedom. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.

Columns (3) and (6) present results of robustness checks conducted by including various characteristics of the clinics as controls. Because net sales are conditional on enrollment at prenatal clinics, one concern is that our demand estimates are confounded
by variation in the level of prenatal attendance across clinics. Subsidized ITNs may provide an incentive to receive prenatal care, and therefore the level of prenatal enrollment after the introduction of the program is an endogenous variable of interest (Dupas 2005). Any impact of ITN price on total enrollment should be captured by total ITN sales (which reflect the change in the number of patients and in the fraction of patients willing to buy ITNs at each price). However, our demand estimates could be biased if total attendance prior to program introduction is correlated with the assigned ITN price. To check whether this is the case, the specifications in columns (3) and (6) control for monthly prenatal attendance at each clinic in 2006, as well as additional clinic characteristics that could potentially influence attendance such as any fee for prenatal care, whether the clinic offers counseling and/or testing for HIV, the distance to the closest other clinic/hospital in our sample, and the distance to the closest other clinic/hospital in the area. The coefficient estimates on ITN price are basically unchanged when clinic controls are included, but their precision is improved.

One might be concerned that our net sales data are biased due to (a moderate amount of) mismanagement, theft, and misreporting by clinics. Further, because the number of observations in Table III is small, demand estimates are not precisely estimated. For these reasons, it is important to check that the demand estimates based on net sales are consistent with those based on our survey data. Table IV presents additional estimates of demand based on individual-level data from surveys conducted among all prenatal clients who visited the clinics on the randomly chosen days when baseline surveys were conducted.
These specifications correspond to linear probability models where the dependent variable is a dummy equal to one if the prenatal client bought or received an ITN; the independent variables are the price at which ITNs were sold, or dummies for each price. The coefficient estimate of −0.015 on ITN price in column (1) implies that a 10 Ksh ($0.15) increase in the price of ITNs reduces demand by fifteen percentage points (or roughly 20% at the mean purchase probability of .81). This is very consistent with the results based on net sales and corresponds to a price elasticity (at the mean price and purchase probability) of −.37. These results imply that demand for ITNs is 75% lower at the cost-sharing price prevailing in Kenya at the time of the study (50 Ksh or $0.75) than it is under a free distribution scheme.
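The quantities quoted from Table IV, column (1), follow from simple arithmetic on the linear probability model coefficient. The sketch below is illustrative only: the helper names are hypothetical, and because the mean price used for the elasticity is not stated in this passage, it is backed out from the reported elasticity rather than taken from the data.

```python
BETA = -0.015          # Table IV, column (1): change in purchase probability per 1 Ksh
MEAN_PURCHASE = 0.81   # mean of the dependent variable
REPORTED_ELASTICITY = -0.37


def pp_change(beta, price_increase_ksh):
    """Change in purchase probability (as a fraction) for a given price increase."""
    return beta * price_increase_ksh


def point_elasticity(beta, mean_price, mean_outcome):
    """Elasticity of a linear model evaluated at the means: beta * p / y."""
    return beta * mean_price / mean_outcome


def implied_mean_price(elasticity, beta, mean_outcome):
    """Mean price consistent with a reported elasticity (an inference, not a reported figure)."""
    return elasticity * mean_outcome / beta


if __name__ == "__main__":
    print(pp_change(BETA, 10))                   # fifteen percentage points per 10 Ksh
    print(pp_change(BETA, 10) / MEAN_PURCHASE)   # roughly 20% of the mean purchase probability
    print(pp_change(BETA, 50))                   # extrapolation to the 50 Ksh cost-sharing price
    print(implied_mean_price(REPORTED_ELASTICITY, BETA, MEAN_PURCHASE))
```

Under these assumptions the implied mean price is close to 20 Ksh, and the 50 Ksh extrapolation gives the 75-point drop discussed in the text.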

TABLE IV
DEMAND FOR ITNS ACROSS PRICES: INDIVIDUAL-LEVEL DATA

Dependent variable: bought/received an ITN during prenatal visit

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| ITN price in Kenyan shillings (Ksh) | −0.015 (0.002)∗∗∗ | −0.017 (0.001)∗∗∗ | | | −0.018 (0.001)∗∗∗ | −0.012 (0.002)∗∗∗ | −0.016 (0.002)∗∗∗ | |
| ITN price = 10 Ksh ($0.15) | | | −0.073 (0.018)∗∗∗ | −0.058 (0.037) | | | | 0.046 (0.034) |
| ITN price = 20 Ksh ($0.30) | | | −0.172 (0.035)∗∗∗ | −0.331 (0.102)∗∗∗ | | | | −0.350 (0.142)∗∗ |
| ITN price = 40 Ksh ($0.60) | | | −0.605 (0.058)∗∗∗ | −0.656 (0.037)∗∗∗ | | | | −0.635 (0.061)∗∗∗ |
| Time controls | | X | | X | X | X | X | X |
| Clinic controls | | X | | X | X | X | X | X |
| Restricted sample: first prenatal visit | | | | | X | | | |
| Restricted sample: first pregnancy | | | | | | X | | |
| Restricted sample: did not receive free ITN previous year | | | | | | | X | X |
| Observations | 424 | 424 | 424 | 424 | 201 | 134 | 266 | 266 |
| R2 | .26 | .28 | .32 | .32 | .42 | .24 | .32 | .33 |
| Mean of dep. var. | 0.81 | 0.81 | 0.81 | 0.81 | 0.77 | 0.84 | 0.84 | 0.84 |
| Intracluster correlation | .23 | | | | | | | |

Notes: Data are from clinic-based surveys conducted in April–June 2007, throughout the first six weeks of the program. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for t-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. All specifications are OLS regressions of an indicator variable equal to one if the respondent bought or received an ITN for free on the price of the ITN, except columns (4) and (8), in which regressors are indicator variables for each price (price = 0 is excluded). Time controls include fixed effects for the day of the week the survey was administered and a variable indicating how much time had elapsed between the day the survey was administered and the program introduction. Clinic controls include total monthly first prenatal care visits between April and June of 2006, the fee charged for a prenatal care visit, whether or not the clinic offers voluntary counseling and testing for HIV or prevention-of-mother-to-child-transmission of HIV services, the distance between the clinic and the closest other clinic or hospital and the distance between the clinic and the closest other clinic or hospital in the program. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.

In column (2) of Table IV, we add controls for when the survey was administered, including day-of-the-week fixed effects and the time elapsed since program introduction, as well as controls for the clinic characteristics used in Table III, column (3). The coefficient estimate for price remains very close to that obtained in the basic specification. Columns (3) and (4) present estimates of demand at each price point. In the absence of clinic or time controls, the decrease in demand for an increase in price from 0 to 10 Ksh is estimated at seven percentage points (larger than suggested by the clinic-level ITN sales in Table III). An increase in price from 20 to 40 Ksh leads to a 43–percentage point drop in demand. Column (5) presents demand estimates for the restricted sample of women who are making their first prenatal care visits for their current pregnancies. It is important to separate first visits from revisits because the latter may be returning because they are sick. Alternatively, women who are coming for a second or third visit may be healthier, because they have already received the benefits of the earlier visit(s), some of which can directly affect their immediate need for an ITN (such as malaria prophylaxis and iron supplementation). The coefficient estimate in column (5) is larger than that for the entire sample, implying that women coming for the first time are more sensitive to price than women coming for a revisit. This could be because women learn about the subsidized ITN program at their first visit and bring the cash to purchase the net at their second visit. Access to free ITNs from other sources could have dampened demand for ITNs distributed through the program. This is a real concern, because the Measles Initiative ran a campaign in July 2006 (nine months before the start of our experiment) throughout Kenya to vaccinate children between nine months and five years of age, distributing free ITNs to mothers of these children in western Kenya. 
To examine the demand response among women who are less likely to have had access to free ITNs in the past, column (6) estimates the impact of ITN price on demand for women in their first pregnancies only. When we restrict the sample in this way, the coefficient on ITN price drops to −0.012. This implies that women in their first pregnancies are indeed less sensitive to ITN price differences, but their demand still drops by 60 percentage points when the ITN price is raised from 0 to 50 Ksh. Our baseline survey asked respondents if they had received a free ITN in the previous year, and 37.3% said they had. In columns


(7) and (8), we focus on the 63% who reported not having received a free ITN and estimate how their demand for an ITN in our program was affected by price. We find a coefficient on price very similar to that obtained with the full sample (−0.016), and the specifications with dummies for each price group generate estimates that are also indistinguishable from those obtained with the full sample.

Nearly three-quarters of prenatal clients walked to the clinics for prenatal care. Because clinics included in our sample were at least 13 kilometers from one another, it is unlikely that prenatal clients would switch from one of our program clinics to another. However, it is likely that our program generated some crowd-out of prenatal clients at nonprogram clinics in the vicinity, particularly in the case of free nets. Because these “switchers” are driven by price differences in ITNs that would not exist in a nationwide distribution program, we should look at the demand response of those prenatal clients who, at the time of the interview, were attending the same clinic that they had in the past. In Online Appendix Table A.1, we replicate Table IV for this subsample of prenatal clients who did not switch clinics. The results are nearly unchanged, suggesting that the same degree of price sensitivity would prevail in a program with a uniform price across all clinics.

In sum, our findings suggest that demand for ITNs is not sensitive to small increases in price from zero, but that even a moderate degree of cost-sharing leads to large decreases in demand. At the mean, a 10 Ksh ($0.15) increase in ITN price decreases demand by 20%. These estimates suggest that the majority of pregnant women are either unable or unwilling to pay the prevailing cost-sharing price, which is itself still far below the manufacturing cost of ITNs.

IV.C. Price-Elasticity of the Usage of ITNs

Usage Conditional on Ownership.
Let us start this section with an important caveat: Our sample size to study usage conditional on uptake is considerably hampered by the fact that uptake was low in the higher-priced groups: only a small fraction of the respondents interviewed at baseline in the 40 Ksh group purchased an ITN and could be followed up at home for a usage check. Keeping this caveat in mind, Figure II shows the average usage rate of program-issued ITNs across price groups. The top panel shows self-reported usage rates, and the bottom panel shows the likelihood that the ITN was found hanging, both measured during

[Figure II: two panels, "Declare using ITN" (top) and "ITN seen visibly hanging" (bottom). Horizontal axis: ITN price (Free, 10 Ksh, 20 Ksh, 40 Ksh); vertical axis: share of respondents, from 0 to 1. Each panel shows the average and its 95% CI.]

FIGURE II
Program ITN Usage Rates (Conditional on Uptake) by ITN Price
Error bars represent ±2.14 standard errors (95% confidence interval with fourteen degrees of freedom). Number of observations: 226.

an unannounced home visit by an enumerator. On average, 62% of women visited at home claimed to be using the ITN they acquired through the program, a short-term usage rate that is very consistent with previous usage studies (D’Alessandro 1994; Alaii et al. 2003). The observed hanging rate was only slightly lower, at 57%. However, we find little variation in usage across price groups, and no systematic pattern. This is confirmed by the regression estimates of selection effects on usage, presented in Table V. Our coefficient estimate on ITN price in column (1) is positive, but insignificant, suggesting that a price increase of 10 Ksh increases usage by four percentage points, representing an increase of 6% at the mean. The confidence interval is large, however, and the true coefficient could be on either side of zero (the 95% confidence interval is −0.004; 0.012). These estimates correspond to a price elasticity of usage (at the mean price and usage rate) of 0.097. Adding controls in column (2) does not improve precision but reduces the size of the estimated effect. The results also hold when the sample is restricted to the subsample of women coming for their first prenatal visit, women in their first pregnancy, or to those who reported not having received a free ITN the previous year (data not shown). Estimates using indicators for each price in column (3) are also very imprecise, but show no pattern of increasing use with price. Women who pay 10 or 20 Ksh are less likely to be using their ITNs than women receiving them for free, but women who pay 40 Ksh appear close to 10% more likely to be using their ITNs. In none of the cases, however, can we reject the null hypothesis that price has no effect on intensity of usage. 
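The interval quoted for Table V, column (1), can be approximated from the rounded coefficient and standard error. The sketch below is a minimal illustration, assuming the two-sided 5% critical value of a t-distribution with fourteen degrees of freedom (about 2.145, the ±2.14 multiplier mentioned in the caption of Figure II); the function name is hypothetical, and small differences from the published bounds are to be expected because the inputs are rounded.

```python
T_CRIT_14_DF = 2.145   # assumed two-sided 5% critical value, t-distribution with 14 df

COEF = 0.004           # Table V, column (1): coefficient on ITN price
STD_ERR = 0.004        # clustered standard error
MEAN_USAGE = 0.62      # sample mean of self-reported usage


def t_interval(estimate, std_err, t_crit=T_CRIT_14_DF):
    """Symmetric confidence interval: estimate +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width


low, high = t_interval(COEF, STD_ERR)
effect_10_ksh = 10 * COEF                       # about four percentage points
relative_effect = effect_10_ksh / MEAN_USAGE    # about 6% of mean usage

if __name__ == "__main__":
    print(low, high)
    print(effect_10_ksh, relative_effect)
```

The interval straddles zero, which is the sense in which the text says the true coefficient could lie on either side of zero.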
We cannot observe whether the net is actually used at night, but it is reasonable to believe that, if the ITN is taken out of its packaging and has been hung on the ceiling, it is being used.13 Of those women who claimed to be using the ITN, 95% had the net hanging. Results for whether or not the net is hanging (columns (5) and (6)) are very similar to those using self-reported usage. One might be concerned that usage rates among prenatal clients receiving a free net are higher than they would be under a one-price policy, because pregnant women who value an ITN

13. Having the insecticide-treated net hanging from the ceiling creates health benefits even if people do not sleep under the net, because it repels, disables, and/or kills mosquitoes coming into contact with the insecticide on the netting material (WHO 2007).

TABLE V
ITN USAGE RATES ACROSS PRICES, CONDITIONAL ON OWNERSHIP

Dependent variable, columns (1)–(4): respondent is currently using the ITN acquired through the program. Dependent variable, columns (5)–(6): ITN is visibly hanging.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ITN price | 0.004 (0.004) | 0.003 (0.003) | | | 0.003 (0.003) | |
| ITN price = 10 Ksh | | | −0.125 (0.120) | −0.094 (0.103) | | −0.154 (0.129) |
| ITN price = 20 Ksh | | | −0.017 (0.107) | −0.017 (0.119) | | −0.088 (0.124) |
| ITN price = 40 Ksh | | | 0.098 (0.135) | 0.125 (0.123) | | 0.071 (0.131) |
| Time controls | | X | | X | | |
| Clinic controls | | X | | X | | |
| Observations | 226 | 226 | 226 | 226 | 222 | 222 |
| Sample mean of dep. var. | 0.62 | 0.62 | 0.62 | 0.62 | 0.57 | 0.57 |
| R2 | .01 | .06 | .03 | .07 | .01 | .03 |
| Intracluster correlation | .04 | | | | | |
| Joint F-test | | | 1.14 | 1.16 | | 1.87 |
| Prob > F | | | .37 | .36 | | .18 |

Notes: Data are from home visits to a random sample of patients who bought nets at each price or received a net for free. Home visits were conducted for a subsample of patients roughly three to six weeks after their prenatal visit. Each column is an OLS regression of the dependent variable indicated by column on either the price of the ITN or an indicator variable for each price. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for t-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. The specifications in columns (2) and (4) control for the number of days that have elapsed since the net was purchased, the number of days that have elapsed since the program was introduced at the clinic in which the net was purchased, and whether the woman has given birth already, is still pregnant, or miscarried, as well as the clinic controls in Table III.

highly may have switched clinics in order to get a free net. We show in Online Appendix Table A.2 that, as with our demand estimates, usage rates among the subsample of women who did not switch clinics (i.e., attended the same prenatal clinic after our program was introduced as before it) are not different from the sample as a whole.

Overall, one might be surprised that the level of net usage is not higher than 60%. This result might come from the fact that usage was measured a relatively short time after the net was purchased. In the usage regressions, the coefficients on time controls (not shown) suggest that usage increases as time passes after the ITN purchase. Among women not using the net, the most common reasons given for not using it were waiting for the birth of the child and waiting for another net (typically untreated with insecticide) to wear out. Dupas (2009a) finds that, among the general population, usage among both buyers and recipients of free ITNs is around 90% a year after the ITNs were acquired.

Unconditional Usage: “Effective Coverage.” Although our estimates of usage rates among buyers suffer from small sample size imprecision, effective coverage (i.e., the fraction of the population using a program net) can be precisely estimated. Figure I presents effective coverage with program ITNs across ITN prices. The corresponding regression is presented in Table VI, column (1). The coefficient on price is −0.012, significant at the 1% level. This corresponds to a price elasticity of effective coverage of −0.44. The share of prenatal clients that are protected by an ITN under the free distribution scheme is 65%, versus 15% when ITNs are sold for 40 Ksh; this difference is significant at the 1% level (column (3)). The results are robust to the addition of clinic controls (columns (2) and (4)), and hold for all subgroups (data not shown).
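The coverage levels quoted above can be recovered by adding the price-group differences in Table VI, column (3), to the mean in the free-distribution group. This is a rough sketch: the helper name is hypothetical, and because the regressions include district fixed effects, the sums should be read as approximations to the group means rather than exact values.

```python
FREE_GROUP_COVERAGE = 0.65   # Table VI: mean of the dependent variable in the (ITN price = 0) group

# Table VI, column (3): differences relative to the free group, keyed by price in Ksh.
PRICE_DUMMIES = {10: -0.188, 20: -0.203, 40: -0.504}


def implied_coverage(free_mean, dummies):
    """Effective coverage by price: free-group mean plus each price-group difference."""
    coverage = {0: free_mean}
    for price, diff in dummies.items():
        coverage[price] = free_mean + diff
    return coverage


coverage = implied_coverage(FREE_GROUP_COVERAGE, PRICE_DUMMIES)

if __name__ == "__main__":
    for price in sorted(coverage):
        print(price, round(coverage[price], 3))
```

At 40 Ksh the implied coverage is about 15%, matching the comparison with the 65% observed under free distribution.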
Overall, our results suggest that, at least in the Kenyan context, positive prices do not help generate higher usage intensity than free distribution. The absence of a selection effect on usage could be due to the nature of the good studied, which is probably valued very highly in areas of endemic malaria, particularly among pregnant women who want to protect their babies. The context in which the evaluation took place also probably contributed to the high valuation among those who didn’t have to pay. In particular, women had to travel to the health clinic for the prenatal visit and were told at the check-up about the importance

TABLE VI
EFFECTIVE COVERAGE: ITN USAGE RATES ACROSS PRICES, UNCONDITIONAL ON OWNERSHIP

Dependent variable: respondent is currently using an ITN acquired through the program

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| ITN price | −0.012 (0.003)∗∗∗ | −0.010 (0.002)∗∗∗ | | |
| ITN price = 10 Ksh | | | −0.188 (0.123) | 0.020 (0.145) |
| ITN price = 20 Ksh | | | −0.203 (0.097)∗ | −0.143 (0.104) |
| ITN price = 40 Ksh | | | −0.504 (0.112)∗∗∗ | −0.389 (0.095)∗∗∗ |
| Time controls | | X | | X |
| Clinic controls | | X | | X |
| Observations | 259 | 259 | 259 | 259 |
| Sample mean of dep. var. | 0.42 | 0.42 | 0.42 | 0.42 |
| Mean in (ITN price = 0) group | 0.65 | 0.65 | 0.65 | 0.65 |
| Intracluster correlation | .02 | | | |
| Joint F-test | | | 12.71 | 8.12 |
| Prob > F | | | .00 | .00 |

Notes: Data are from random sample of patients who visited program clinics. Usage for those who acquired the ITNs was measured through home visits conducted roughly three to six weeks after their prenatal visit. Each column is an OLS regression of the dependent variable indicated by column on either the price of the ITN or an indicator variable for each price. All regressions include district fixed effects. Standard errors in parentheses are clustered at the clinic level. Given the small number of clusters (sixteen), the critical values for t-tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.

of protection against malaria. In addition, PSI has been conducting a very intense advertising campaign for ITN use throughout Kenya over the past five years. Last, the evaluation took place in a very poor region of Kenya, in which many households do not have access to credit and have difficulty affording even modest prices for health goods. Thus, a large number of prenatal clients may value ITNs but be unable to pay higher prices for them.

IV.D. Are There Psychological Effects of Prices on Usage of ITNs?

In this section, we test whether the act of paying itself can stimulate higher product use by triggering a sunk cost effect, when willingness to pay is held constant. We use data from the ex post price randomization conducted with a subset of women who had expressed their willingness to pay the posted price (in clinics charging a positive price). For those women, the transaction price ranged from “free” to the posted price they initially agreed to pay. Table VII presents estimates of the effect of price (columns (1) and (2)) and of the act of paying (columns (3)–(6)) on the likelihood of usage and likelihood that the ITN has been hung. These coefficients are from linear probability models with clinic fixed effects, estimated on the sample of women who visited a clinic where ITNs were sold at a positive price, decided to buy an ITN at the posted price, and were sampled to participate in the ex post lottery determining the transaction price they eventually had to pay to take the net home. Because the uptake of ITNs decreased sharply with the price, the sample we have at hand to test for the presence of sunk cost effects is small, and therefore the precision of the estimates we present below is limited.

We find no psychological effect of price or the act of paying on usage, as expected from the earlier result that there is no overall effect of prices on usage.
In column (1), the coefficient for price is negative, suggesting that higher prices could discourage usage, but the effect is not significant and cannot be distinguished from zero. The 95% confidence interval is (−0.0158; 0.0098), suggesting that a 10 Ksh increase in price could lead to anything from a decrease of sixteen to an increase of ten percentage points in usage. Larger effects on either side can be confidently rejected, however. Adding controls, including a dummy for having received a free ITN from the government in the previous year, does not reduce the standard error but decreases the coefficient of price further, enabling us to rule out sunk cost effects of more than seven percentage points per 10 Ksh increase in price (column (2)).

TABLE VII
SUNK COST EFFECTS? ITN USAGE RATES ACROSS PRICES (CONDITIONAL ON OWNERSHIP), HOLDING WILLINGNESS TO PAY CONSTANT

Dependent variable, columns (1)–(5): respondent is currently using the ITN acquired through the program. Dependent variable, column (6): ITN is visibly hanging.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Transaction price | −0.003 (0.006) | −0.006 (0.006) | | | | |
| Transaction price > 0 | | | −0.017 (0.100) | −0.072 (0.101) | −0.065 (0.100) | −0.084 (0.099) |
| Got a free ITN the previous year | | −0.192 (0.100)∗ | | | −0.191 (0.101)∗ | −0.165 (0.102) |
| Still pregnant at time of follow-up | | −0.234 (0.121)∗ | | −0.195 (0.122) | −0.231 (0.122)∗ | −0.213 (0.125)∗ |
| First prenatal visit | | 0.202 (0.102)∗∗ | | 0.199 (0.103)∗ | 0.202 (0.104)∗ | 0.121 (0.107) |
| First pregnancy | | 0.148 (0.104) | | 0.184 (0.100)∗ | 0.153 (0.104) | 0.063 (0.106) |
| Time to clinic | | 0.000 (0.001) | | 0.000 (0.001) | 0.000 (0.001) | 0.000 (0.001) |
| Time elapsed since ITN purchase | | 0.015 (0.006)∗∗∗ | | 0.014 (0.006)∗∗ | 0.015 (0.006)∗∗∗ | 0.011 (0.005)∗∗ |
| Observations | 132 | 123 | 132 | 124 | 123 | 121 |
| Sample mean of dep. var. | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | 0.53 |
| F stat | | 3.23 | | 2.99 | 3.60 | 1.97 |
| Prob > F | | .00 | | .01 | .00 | .07 |

Notes: Standard errors in parentheses. Estimates are from linear probability models with clinic fixed effects, estimated on the sample of women who (1) visited a clinic where ITNs were sold at a positive price; (2) decided to buy an ITN at the posted price; and (3) were sampled to participate in the ex post lottery determining the transaction price they eventually had to pay to take the net home. The transaction prices ranged from 0 (free) to the posted price. Some of the individual control variables are missing for some respondents. ∗∗∗, ∗∗, ∗ Significance at 1%, 5%, and 10% levels, respectively.

In column (3), the coefficient for the act of paying a positive price is also negative, suggesting that if the act of paying had any effect, it would decrease usage rather than increase it, but here again the coefficient cannot be confidently distinguished from zero. The 95% confidence interval for this estimate is quite large and suggests that the act of paying could lead to anything from a decrease of 22 to an increase of 20 percentage points in usage. Overall, these results suggest that, in the case of ITNs marketed through health clinics, there is no large positive psychological effect of price on usage. We do not have data on baseline time preferences to check whether certain subgroups are more likely to exhibit a “sunk cost” effect. We also do not have data on what women perceived ex post as the price they paid for the ITN; we thus cannot verify that those who received a discount mentally “integrated” the two events (payment and discount) to “cancel” the loss, in the terms of Thaler (1985), or whether they “segregated” the two events and perceived the payment as a cash loss and the discount as a cash gain.

If usage might not increase with price, what about the private benefits to the users? Is it the case that the users reached through the 40 Ksh distribution system are those who really need the ITN, whereas the additional users obtained through the free distribution will not benefit from using the ITN because they don’t need it as much (i.e., they are healthier, or can afford other means to protect themselves against malaria)? From a public health point of view, this issue might be irrelevant in the case of ITNs, given the important community-wide effects of ITN use documented in the medical literature cited earlier. Nevertheless, it is interesting to test the validity of the argument advanced by cost-sharing programs with respect to the private returns of ITN use. This is what we attempt to do in the next section.

IV.E. Selection Effects of ITN Prices

This section presents results on selection effects of positive prices on the health of patients who buy ITNs. The argument that cost-sharing targets those who are more vulnerable by screening out women who appear to need the ITN less assumes that willingness to pay is the main factor in the decision to buy an ITN. In the presence of extreme poverty and weak credit markets, however, it is possible that people are not able (do not have the cash) to pay what they would be willing to pay in the absence of


credit constraints. The optimal subsidy level will have to be low enough to discourage women who do not need the product from buying it, although at the same time high enough to enable credit-constrained women to buy it if they need it. We focus our analysis on an objective measure of health among prenatal clients: their hemoglobin levels. Women who are anemic (i.e., with low hemoglobin levels) are likely the women with the most exposure and least resistance to malaria, and are likely the consumers that a cost-sharing program would want to target.

To judge whether higher prices encourage sicker women to purchase nets, we study the impact of price on the health of “takers” (i.e., buyers and recipients of free nets) relative to the health of the prenatal clients attending control clinics. Figure III plots the cumulative density functions (CDFs) of hemoglobin levels for women buying/receiving a net at each price relative to women in the control group. The surprising result in Figure III is that the CDF for women receiving free nets stochastically dominates the distribution in the control group, implying that women who get free nets are healthier than the average prenatal woman (Panel A). In contrast, the CDFs of hemoglobin levels of women who pay a positive price (whether 10, 20, or 40 Ksh) are indistinguishable from the CDFs of women in the control clinics (Panels B, C, and D). In other words, women who pay a higher price do not appear to be sicker than the average prenatal clients in the area.14

Why would it be that women who receive free nets appear substantially healthier, even though higher prices do not appear to induce selection of women who are sicker than the general prenatal population? Dupas (2005) shows that there is a strong incentive effect of free ITNs on enrollment for prenatal care.
To test whether such an effect was at play in our experiment, Table VIII presents the average characteristics of prenatal clients in control clinics (column (1)), and, for each price group, how the average buyer diverges from the average woman in the control group (columns (2)–(5)). The results provide some evidence that the incentive effect of free ITNs was strong: women who came for free nets were 12%

14. For each price level, we test the significance of the differences in CDFs (compared to the control group) with the Kolmogorov–Smirnov equality-of-distributions test. Following Præstgaard (1995), we use the bootstrap method to adjust the p-values for clustering at the clinic level. The results of the tests are presented in the notes of Figure III. We can reject the null hypothesis of equality of distributions between women who receive free nets and those attending control clinics at the 10% significance level. We cannot reject the equality of distributions for women in the control population and those paying 10, 20, or 40 Ksh for an ITN.

[Figure III: four panels, each plotting the cumulative distribution (vertical axis, 0 to 1) of hemoglobin level in g/dL (horizontal axis, 5 to 15) for the control group and one price group.
A: Clients at control clinics vs. clients receiving free net
B: Clients at control clinics vs. clients buying 10 Ksh net
C: Clients at control clinics vs. clients buying 20 Ksh net
D: Clients at control clinics vs. clients buying 40 Ksh net]

FIGURE III
Cumulative Density of Hemoglobin Levels among ITN Recipients/Buyers
The p-values for Kolmogorov–Smirnov tests of equality of distribution (adjusted for clustering at the clinic level by bootstrap) are .091 (Panel A), .385 (Panel B), .793 (Panel C), and .781 (Panel D). Number of observations: 198 (Panel A), 217 (Panel B), 208 (Panel C), and 139 (Panel D).

TABLE VIII
CHARACTERISTICS OF PRENATAL CLIENTS BUYING/RECEIVING ITN RELATIVE TO CLIENTS OF CONTROL CLINICS

Column (1): mean in control clinics, with the standard deviation in square brackets. Columns (2)–(5): differences with control clinics, with standard errors in parentheses.

| | (1) Mean in control clinics | (2) 0 Ksh (free) | (3) 10 Ksh ($0.15) | (4) 20 Ksh ($0.30) | (5) 40 Ksh ($0.60) |
|---|---|---|---|---|---|
| Panel A. Characteristics of visit to prenatal clinic | | | | | |
| First prenatal visit for current pregnancy | 0.48 [0.50] | −0.12 (0.06)∗∗ | −0.02 (0.04) | 0.03 (0.06) | 0.02 (0.04) |
| Walked to the clinic | 0.73 [0.45] | −0.12 (0.13) | 0.04 (0.07) | 0.07 (0.06) | −0.16 (0.08)∗ |
| If took transport to clinic: price paid (Ksh) | 4.58 [10.83] | 3.52 (3.29) | 0.79 (1.78) | −1.17 (1.37) | 4.27 (1.94)∗∗ |
| Can read Swahili | 0.81 [0.40] | 0.10 (0.03)∗∗∗ | 0.05 (0.05) | 0.00 (0.04) | 0.09 (0.02)∗∗∗ |
| Wearing shoes | 0.61 [0.49] | 0.06 (0.12) | 0.07 (0.12) | −0.11 (0.12) | 0.11 (0.12) |
| Respondent owns animal assets | 0.19 [0.39] | 0.00 (0.06) | 0.01 (0.05) | 0.12 (0.05)∗∗ | 0.07 (0.09) |
| Panel B. Health status | | | | | |
| Hemoglobin level (Hb), in g/dL | 10.44 [1.77] | 0.94 (0.34)∗∗ | 0.49 (0.49) | 0.22 (0.47) | 0.48 (0.78) |
| Moderate anemia (Hb < 11.5 g/dL) | 0.69 [0.46] | −0.18 (0.07)∗∗ | −0.09 (0.12) | −0.08 (0.10) | −0.05 (0.19) |
| Severe anemia (Hb ≤ 9 g/dL) | 0.16 [0.37] | −0.10 (0.06) | −0.01 (0.07) | 0.07 (0.09) | −0.06 (0.11) |
| Observations | 110 | 98 | 120 | 99 | 28 |

Notes: For each variable, column (1) shows the mean observed among prenatal clients enrolling in control clinics; the standard deviations are presented in italics. Columns (2), (3), (4), and (5) show the differences between “buyers” in the clinics providing ITNs at 0, 10, 20, and 40 Ksh and prenatal clients enrolling in control clinics. Standard errors in parentheses are clustered at the clinic level; given the small number of clusters (sixteen), the critical values for T -tests were drawn from a t-distribution with 14 (16 − 2) degrees of freedom. ∗∗∗ , ∗∗ , ∗ Significance at 1%, 5%, and 10% levels, respectively.

more likely to be coming for a repeat visit and 12 percentage points less likely to have come by foot (i.e., more likely to have come by public transportation), and they paid about 3.5 Ksh more to travel to the clinic than women in the control group (Panel A). These results suggest that the free ITN distribution induced women who had come to the clinic before the introduction of the program to come back for a revisit earlier than scheduled, and therefore before the health benefits of their first prenatal visit had worn out.15 As a result, as seen in Figure III, women receiving free nets are substantially less likely to be anemic (eighteen percentage points off of a base of 69% in Panel B of Table VIII).16

In absolute terms, however, the number of anemic women covered by an ITN is substantially greater under free distribution than under cost-sharing. As shown in Table VIII, the great majority of pregnant women in Kenya are moderately anemic (71%). All of them receive ITNs under free distribution, but only 40% of them invest in ITNs when the price is 40 Ksh (Table IV). Given that usage of the ITN (conditional on ownership) is similar across price groups, effective coverage of the anemic population is thus 60% lower under cost-sharing.17

Finally, it is interesting to note in Table VIII that women who bought nets for 40 Ksh were more likely to pay for transportation and paid more to come to the clinic than the control group. Women who paid 40 Ksh were also more likely to be literate, more likely to be wearing shoes, and more likely to report owning animal assets. Not all of these differences are statistically different from zero, given the small-sample problem, but overall these results are suggestive that selection under cost-sharing happened at least partially along wealth lines.18

15. In Kenya, pregnant women are typically given free iron supplements, as well as free presumptive malaria treatment, when they come for prenatal care. Both of these “treatments” have a positive impact on hemoglobin levels.
16. Because some of the women who received free nets appear to have traveled farther and spent more money on travel to the clinic, one might expect that this group was composed of many switchers from nonprogram clinics. However, we find that the effects of price on selection in terms of health are unchanged for the subsample of women staying with the same clinic (Online Appendix Table A3).
17. The usage results in Table V hold when the sample is restricted to moderately anemic women (data not shown).
18. This hypothesis is supported by the fact that, when we compare the average client at 40 Ksh clinics (rather than the average buyer at these clinics) to the average control client, they are not more likely to have paid for transportation and paid no more for transportation than the control group (results not shown).

V. COST-EFFECTIVENESS ANALYSIS

This section presents estimates of the cost-effectiveness of each pricing strategy in terms of children’s lives saved. There are many benefits to preventing malaria transmission in addition to saving children’s lives, and restricting ourselves to child mortality will lead to conservative estimates of cost-effectiveness.

An important dimension to keep in mind in the cost-effectiveness analysis is the nonlinearity in the health benefits associated with ITN use: high-density ITN coverage reduces overall transmission rates and thus positively affects the health of both
nonusers and users. The results of a 2003 medical trial of ITNs in western Kenya imply that “in areas with intense malaria transmission with high ITN coverage, the primary effect of insecticide-treated nets is via area-wide effects on the mosquito population and not, as commonly supposed, by simple imposition of a physical barrier protecting individuals from biting” (Hawley et al. 2003, p. 121). In this context, we propose the following methodology to measure the health impact of each ITN pricing scheme: we create a “protection index for nonusers” (a logistic function of the share of users in the total population) and a “protection index for users” (a weighted sum of a “physical barrier” effect of the ITN and the externality effect, the weights depending on the share of users). This enables us to compute the health impact of each pricing scheme on both users and nonusers and to (roughly) approximate the total number of child lives saved, as well as the cost per life saved. Because the relative importance of the “physical barrier” effect and of the externality is uncertain, we consider three possible values for the parameter of the logistic function predicting the protection index for nonusers (the “threshold externality parameter”) and three possible values for the effectiveness of ITNs as physical barriers. This gives us a total of 3 × 3 = 9 different scenarios and 9 different cost-per-life-saved estimates for each of the four pricing strategies.

The cost-effectiveness estimates are presented in Table IX. These estimates are provided to enable comparisons across distribution schemes, but their absolute values should be taken with caution, as they rely on a number of coarse assumptions (the details of the calculations are provided in the Online Appendix). In particular, two key assumptions made are the following: (1) We assume that the only difference in cost per ITN between free distribution and cost-sharing is the difference in the subsidy.
That is, we assume that an ITN given for free costs 40 Ksh more to the social planner than an ITN sold for 40 Ksh. We thus ignore money management costs associated with cost-sharing schemes. (2) We assume that 65% of households will experience a pregnancy within five years and be eligible for the ITN distribution program.19

19. Making less conservative assumptions would increase the relative cost-effectiveness of free distribution programs.

TABLE IX
COST-EFFECTIVENESS COMPARISONS

Hypothesis on externality threshold:      Low                       Medium                    High
Hypothesis on physical barrier
effectiveness:                            High    Medium  Low       High    Medium  Low       High    Medium  Low
ITN price (Ksh)   Subsidy level (%)       (1)     (2)     (3)       (4)     (5)     (6)       (7)     (8)     (9)

Panel A. Child lives saved per 1,000 prenatal clients
0                 100.0                   38      37      36        30      27      24        22      17      11
10                97.5                    29      28      26        20      16      13        15      11      7
20                95.0                    32      30      28        22      19      15        17      12      8
40                90.0                    16      14      12        11      8       6         9       7       4

Panel B. Cost per child life saved (US$)
0                 100.0                   200     206     212       255     284     321       352     460     662
10                97.5                    234     251     270       348     421     531       448     609     949
20                95.0                    189     200     213       274     325     399       361     487     748
40                90.0                    175     201     235       261     339     483       302     418     678

Notes: Each cell corresponds to a separate state of the world. To this date, existing medical evidence on the relative importance of the physical barrier provided by an ITN and on the externality threshold is insufficient to know which cells are closest to the actual state of the world. See Online Appendix for details on how these estimates were computed and the hypotheses they rely on.

The estimates in Table IX suggest that, under all nine scenarios we study, child mortality is reduced more under free distribution than any cost-sharing strategy (Panel A). This result is not

surprising considering the large negative effect of cost-sharing on the share of ITN users in the population. Under the low threshold assumption for the externality effect, in terms of cost per life saved, we find that charging 40 Ksh is more cost-effective than free distribution if the physical barrier effect of ITNs is high (Panel B, column (1)). When the assumptions about the effectiveness of ITNs as physical barriers for their users are less optimistic, we find that free distribution becomes at least as cost-effective as cost-sharing, if not more so. Under the assumption of a “medium” externality threshold level, we find that free distribution could dominate cost-sharing in terms of cost-effectiveness (Panel B, columns (4)–(6)). Last, in the scenario where a large share of ITN users is necessary for a substantial externality to take place, we find that cost-sharing is again slightly cheaper than free distribution, unless the physical barrier effectiveness is very low. This is due to the fact that under the high threshold hypothesis, even free distribution to pregnant women is not enough to generate significant community-wide effects, because not all households experience a pregnancy. That said, given the very large standard errors on the usage estimates, the differences observed across schemes in cost per life saved typically cannot be distinguished from zero. The general conclusion of this cost-effectiveness exercise is thus that cost-sharing is at best marginally more cost-effective than free distribution, but free distribution leads to many more lives saved.

VI. DISCUSSION AND CONCLUSION

The argument that charging a positive price for a commodity is necessary to ensure that it is effectively used has recently gained prominence in the debate on the efficiency of foreign aid.
The cost-sharing model of selling nets for $0.50 to mothers through prenatal clinics is believed to reduce waste because “it gets the nets to those who both value them and need them” (Easterly 2006, p. 13). Our randomized pricing experiment in western Kenya finds no evidence to support this assumption. We find no evidence that cost-sharing reduces wastage by sifting out those who would not use the net: pregnant women who receive free ITNs are no less likely to put them to intended use than pregnant women who pay for their nets. This suggests that cost-sharing does not increase usage intensity in this context. Although it doesn’t increase usage intensity, cost-sharing does considerably


dampen demand: we find that the cost-sharing scheme ongoing in Kenya at the time of this study results in a coverage rate 75 percentage points lower than with a full subsidy. In terms of getting nets to those who need them, our results on selection based on health imply that women who purchase nets at cost-sharing prices are no more likely to be anemic than the average prenatal woman in the area. We also find that localized, short-lived free distribution programs disproportionately benefit healthier women who can more easily travel to the distribution sites. Although our results speak to the ongoing debate regarding the optimal subsidization level for ITNs—one of the most promising health tools available in public health campaigns in sub-Saharan Africa—they may not be applicable to other public health goods that are important candidates for subsidization. In particular, it is important to keep in mind that this study was conducted when ITNs were already highly valued in Kenya, thanks to years of advertising by both the Ministry of Health and Population Services International. This high ex ante valuation likely diminished the risk that a zero or low price be perceived as a signal of bad quality. Our findings are consistent with previous literature on the value of free products: in a series of lab experiments, both hypothetical and real, Ariely and Shampan’er (2007) found that when people have to choose between two products, one of which is free, charging zero price increases consumers’ valuation of the product itself, in addition to reducing its cost. In a recent study in Uganda, Hoffmann (2007) found that households that are told about the vulnerability of children to malaria on the day they acquire an ITN are more likely to use the ITN to protect their children when they receive it for free than when they have to pay for it. 
In a study conducted with the general Kenyan population, Dupas (2009b) randomly varied ITN prices over a much larger range (between $0 and $4), and also found no evidence that charging higher prices leads to higher usage intensity. Dupas (2009b) also found that the demand curve for ITNs remains unaffected by common marketing techniques derived from psychology (such as the framing of marketing messages, the gender of the person targeted by the marketing, or verbal commitment elicitation), further suggesting that the high price-elasticity of the demand for ITNs is driven mostly by budget constraints. Our finding that usage of ITNs is insensitive to the price paid to acquire them contrasts with the finding of Ashraf, Berry, and Shapiro (forthcoming), in which Zambian households that paid a


higher price for a water-treatment product were more likely to report treating their drinking water two weeks later. Their experimental design departs from ours in multiple ways that could explain the difference in findings. First, because the range of prices at which the product was offered in their experiment did not include zero, Ashraf, Berry, and Shapiro do not measure usage under a free distribution scheme. Second, in contrast to a bed net that can be used for three years before it wears out, the bottle of water disinfectant used in Ashraf, Berry, and Shapiro lasts for only about one month if used consistently to treat the drinking water of an average family; in this context, it is possible that households that purchased the water disinfectant but were not using it two weeks later had stored the bottle for later use (e.g., for the next sickness episode in their household or the next cholera outbreak), and therefore the evidence on usage in Ashraf, Berry, and Shapiro has a different interpretation from ours. In addition, the baseline level of information about the product (its effectiveness, how to use it) might have differed across experiments. Although ITN distribution programs that use cost-sharing are less effective and not more cost-effective than free distribution in terms of health impact, they might have other benefits. Indeed, they often have the explicit aim of promoting sustainability. The aim is to encourage a sustainable retail sector for ITNs by combining public and private sector distribution channels (Mushi et al. 2003; Webster, Lines, and Smith 2007). Our experiment does not enable us to quantify the potentially negative impact of free distribution on the viability of the retail sector and therefore our analysis does not consider this externality. Another important dimension of the debate on free distribution versus cost-sharing is the effect of full subsidies on the distribution system. 
In particular, the behavior of agents on the distribution side, notably health workers in our context, could depend on the level of subsidy. Although user fees can be used to incentivize providers (World Bank 2004), free distribution schemes have been shown to be plagued by corruption (in the form of diversion) among providers (Olken 2006). Our experiment focused on the demand side and was not powered to address this distribution question. As with most randomized experiments, we are unable to characterize or quantify the impact of the various possible distribution schemes when they have been scaled up and general equilibrium effects have set in. Our experimental results should thus be seen as one piece in the puzzle of how to increase uptake of effective, externality-generating health products in resource-poor settings.
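The comparisons drawn from Table IX in Section V can be checked mechanically. The sketch below transcribes the table's point estimates as read from the published layout; the variable and function names are hypothetical, and the check ignores the sampling uncertainty that the text stresses.

```python
PRICES_KSH = (0, 10, 20, 40)

# Table IX, Panel A: child lives saved per 1,000 prenatal clients, columns (1)-(9).
LIVES_SAVED = {
    0: (38, 37, 36, 30, 27, 24, 22, 17, 11),
    10: (29, 28, 26, 20, 16, 13, 15, 11, 7),
    20: (32, 30, 28, 22, 19, 15, 17, 12, 8),
    40: (16, 14, 12, 11, 8, 6, 9, 7, 4),
}

# Table IX, Panel B: cost per child life saved (US$), columns (1)-(9).
COST_PER_LIFE = {
    0: (200, 206, 212, 255, 284, 321, 352, 460, 662),
    10: (234, 251, 270, 348, 421, 531, 448, 609, 949),
    20: (189, 200, 213, 274, 325, 399, 361, 487, 748),
    40: (175, 201, 235, 261, 339, 483, 302, 418, 678),
}


def free_saves_most_lives():
    """True if the zero price saves the most lives in every scenario."""
    return all(
        LIVES_SAVED[0][col] > max(LIVES_SAVED[p][col] for p in (10, 20, 40))
        for col in range(9)
    )


def cheapest_prices():
    """Price with the lowest cost per life saved, for columns (1) to (9)."""
    return [
        min(PRICES_KSH, key=lambda p: COST_PER_LIFE[p][col]) for col in range(9)
    ]


print(free_saves_most_lives())
print(cheapest_prices())
```

On these point estimates, free distribution saves the most lives in all nine scenarios and has the lowest cost per life saved in five of them, which is in line with the paper's reading that cost-sharing is at best marginally more cost-effective.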


HARVARD SCHOOL OF PUBLIC HEALTH
UNIVERSITY OF CALIFORNIA, LOS ANGELES

REFERENCES

Alaii, Jane A., William A. Hawley, Margarette S. Kolczak, Feiko O. Ter Kuile, John E. Gimnig, John M. Vulule, Amos Odhacha, Aggrey J. Oloo, Bernard L. Nahlen, and Penelope A. Phillips-Howard, “Factors Affecting Use of Permethrin-Treated Bed Nets during a Randomized Controlled Trial in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 137–141.
Ariely, Dan, and Kristina Shampan’er, “How Small Is Zero Price? The True Value of Free Products,” Marketing Science, 26 (2007), 742–757.
Arkes, Hal R., and Catherine Blumer, “The Psychology of Sunk Cost,” Organizational Behavior and Human Decision Processes, 35 (1985), 124–140.
Ashraf, Nava, James Berry, and Jesse Shapiro, “Can Higher Prices Stimulate Product Use? Evidence from a Field Experiment in Zambia,” American Economic Review, forthcoming.
Bagwell, Kyle, and Michael H. Riordan, “High and Declining Prices Signal Product Quality,” American Economic Review, 81 (1991), 224–239.
Binka, F. N., F. Indome, and T. Smith, “Impact of Spatial Distribution of Permethrin-Impregnated Bed Nets on Child Mortality in Rural Northern Ghana,” American Journal of Tropical Medicine and Hygiene, 59 (1998), 80–85.
Bloom, Erik, Indu Bhushan, David Clingingsmith, Elizabeth King, Michael Kremer, Benjamin Loevinsohn, Rathavuth Hong, and J. Brad Schwartz, “Contracting for Health: Evidence from Cambodia,” Brookings Institution Report, 2006.
Cameron, A. Colin, Douglas Miller, and Jonah B. Gelbach, “Bootstrapped-Based Improvements for Inference with Clustered Errors,” Review of Economics and Statistics, 90 (2007), 414–427.
D’Alessandro, Umberto, “Nationwide Survey of Bednet Use in Rural Gambia,” Bulletin of the World Health Organization, 72 (1994), 391–394.
Donald, Stephen, and Kevin Lang, “Inference with Differences-in-Differences and Other Panel Data,” Review of Economics and Statistics, 89 (2007), 221–233.
Dupas, Pascaline, “Short-Run Subsidies and Long-Term Adoption of New Health Products: Evidence from a Field Experiment,” Mimeo, UCLA, 2009a.
——, “What Matters (and What Does Not) in Households’ Decision to Invest in Malaria Prevention?” American Economic Review: Papers and Proceedings, 99 (2009b), 224–230.
——, The Impact of Conditional In-Kind Subsidies on Preventive Health Behaviors: Evidence from Western Kenya, unpublished manuscript, 2005.
Easterly, William, The White Man’s Burden: Why the West’s Efforts to Aid the Rest Have Done So Much Ill and So Little Good (New York: Penguin Press, 2006).
Evans, David B., Girma Azene, and Joses Kirigia, “Should Governments Subsidize the Use of Insecticide-Impregnated Mosquito Nets in Africa? Implications of a Cost-Effectiveness Analysis,” Health Policy and Planning, 12 (1997), 107–114.
Fisher, Ronald A., The Design of Experiments (London: Oliver and Boyd, 1935).
Gimnig, John E., Margarette S. Kolczak, Allen W. Hightower, John M. Vulule, Erik Schoute, Luna Kamau, Penelope A. Phillips-Howard, Feiko O. Ter Kuile, Bernard L. Nahlen, and William A. Hawley, “Effect of Permethrin-Treated Bed Nets on the Spatial Distribution of Malaria Vectors in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 115–120.
Harvey, Philipp D., “The Impact of Condom Prices on Sales in Social Marketing Programs,” Studies in Family Planning, 25 (1994), 52–58.
Hawley, William A., Penelope A. Phillips-Howard, Feiko O. Ter Kuile, Dianne J. Terlouw, John M. Vulule, Maurice Ombok, Bernard L. Nahlen, John E. Gimnig, Simon K. Kariuki, Margarette S. Kolczak, and Allen W. Hightower, “Community-Wide Effects of Permethrin-Treated Bed Nets on Child Mortality and Malaria Morbidity in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 121–127.


Hoffmann, Vivian, “Psychology, Gender, and the Intrahousehold Allocation of Free and Purchased Mosquito Nets,” Mimeo, Cornell University, 2007.
Karlan, Dean, and Jonathan Zinman, “Observing Unobservables: Identifying Information Asymmetries with a Consumer Credit Field Experiment,” Econometrica, forthcoming.
Kremer, Michael, and Edward Miguel, “The Illusion of Sustainability,” Quarterly Journal of Economics, 122 (2007), 1007–1065.
Lengeler, Christian, “Insecticide-Treated Bed Nets and Curtains for Preventing Malaria,” Cochrane Database Syst Rev 2:CD000363, 2004.
Lengeler, Christian, Mark Grabowsky, David McGuire, and Don deSavigny, “Quick Wins versus Sustainability: Options for the Upscaling of Insecticide-Treated Nets,” American Journal of Tropical Medicine and Hygiene, 77 (2007), 222–226.
Lucas, Adrienne, “Economic Effects of Malaria Eradication: Evidence from the Malarial Periphery,” American Economic Journal: Applied Economics, forthcoming.
Moulton, Brent R., “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units,” Review of Economics and Statistics, 72 (1990), 334–338.
Mushi, Adiel K., Jonna R. Schellenberg, Haji Mponda, and Christian Lengeler, “Targeted Subsidy for Malaria Control with Treated Nets Using a Discount Voucher System in Southern Tanzania,” Health Policy and Planning, 18 (2003), 163–171.
Olken, Benjamin, “Corruption and the Costs of Redistribution: Micro Evidence from Indonesia,” Journal of Public Economics, 90 (2006), 853–870.
Oster, Sharon, Strategic Management for Nonprofit Organizations: Theory and Cases (Oxford, UK: Oxford University Press, 1995).
Population Services International [PSI], “What Is Social Marketing?” available online at http://www.psi.org/resources/pubs/what is smEN.pdf, 2003.
Præstgaard, Jens P., “Permutation and Bootstrap Kolmogorov-Smirnov Test for the Equality of Two Distributions,” Scandinavian Journal of Statistics, 22 (1995), 305–322.
Riley, John G., “Silver Signals: Twenty-Five Years of Screening and Signaling,” Journal of Economic Literature, 39 (2001), 432–478.
Rosenbaum, Paul R., Observational Studies (New York: Springer-Verlag, 2002).
Sachs, Jeffrey, The End of Poverty: Economic Possibilities for Our Time (New York: Penguin, 2005).
Schellenberg, Joanna A., Salim Abdulla, Rose Nathan, Oscar Mukasa, Tanya Marchant, Nassor Kikumbih, Adiel Mushi, Haji Mponda, Happiness Minja, and Hassan Mshinda, “Effect of Large-Scale Social Marketing of Insecticide-Treated Nets on Child Survival in Rural Tanzania,” Lancet, 357 (2001), 1241–1247.
Ter Kuile, Feiko O., Dianne J. Terlouw, Penelope A. Phillips-Howard, William A. Hawley, Jennifer F. Friedman, Simon K. Kariuki, Ya Ping Shi, Margarette S. Kolczak, Altaf A. Lal, John M. Vulule, and Bernard L. Nahlen, “Reduction of Malaria during Pregnancy by Permethrin-Treated Bed Nets in an Area of Intense Perennial Malaria Transmission in Western Kenya,” American Journal of Tropical Medicine and Hygiene, 68 (2003), 50–60.
Thaler, Richard, “Toward a Positive Theory of Consumer Choice,” Journal of Economic Behavior and Organization, 1 (1980), 39–60.
——, “Mental Accounting and Consumer Choice,” Marketing Science, 4 (1985), 199–214.
Webster, Jayne, Jo Lines, and Lucy Smith, “Protecting All Pregnant Women and Children under Five Years Living in Malaria Endemic Areas in Africa with Insecticide Treated Mosquito Nets,” World Health Organization Working Paper, available at http://www.who.int/malaria/docs/VulnerableGroupsWP.pdf, 2007.
World Bank, World Development Report 2004: Making Services Work for Poor People (Washington, DC: World Bank and Oxford University Press, 2004).
World Health Organization [WHO], “WHO Global Malaria Programme: Position Statement on ITNs,” available at http://www.who.int/malaria/docs/itn/ITNspospaperfinal.pdf, 2007.
World Malaria Report, available at http://www.who.int/malaria/wmr2008/malaria2008.pdf, 2008.

SOPHISTICATED MONETARY POLICIES∗

ANDREW ATKESON
VARADARAJAN V. CHARI
PATRICK J. KEHOE

In standard monetary policy approaches, interest-rate rules often produce indeterminacy. A sophisticated policy approach does not. Sophisticated policies depend on the history of private actions, government policies, and exogenous events and can differ on and off the equilibrium path. They can uniquely implement any desired competitive equilibrium. When interest rates are used along the equilibrium path, implementation requires regime-switching. These results are robust to imperfect information. Our results imply that the Taylor principle is neither necessary nor sufficient for unique implementation. They also provide a direction for empirical work on monetary policy rules and determinacy.

I. INTRODUCTION

The now-classic Ramsey (1927) approach to policy analysis under commitment specifies the set of instruments available to policy makers and finds the best competitive equilibrium outcomes given those instruments. This approach has been adapted to situations with uncertainty, by Barro (1979) and Lucas and Stokey (1983), among others, by specifying the policy instruments as functions of exogenous events.1

Although the Ramsey approach has been useful in identifying the best outcomes, it needs to be extended before it can be used to guide policy. Such an extension must describe what would happen for every history of private agent actions, government policies, and exogenous events. It should also structure policy in such a way that policy makers can ensure that their desired outcomes occur.

Here, we provide such an extended approach. To construct it, we extend the language of Chari and Kehoe (1990) in a natural fashion by describing private agent actions and government policies as functions of the histories of those actions and policies as well as of exogenous events. The key to our approach is our requirement that for all histories, including those in which private agents deviate from the equilibrium path, the continuation outcomes constitute a continuation competitive equilibrium.2 We label such policy functions sophisticated policies and the resulting equilibrium a sophisticated equilibrium. If policies can be structured to ensure that the desired outcomes occur, then we say that the policies uniquely implement the desired outcome.

∗ The authors thank the National Science Foundation for financial support and Kathleen Rolfe and Joan Gieseke for excellent editorial assistance. The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.
1. The Ramsey approach has been used extensively to discuss optimal monetary policy. See, among others, the work of Chari, Christiano, and Kehoe (1996); Schmitt-Grohé and Uribe (2004); Siu (2004); and Correia, Nicolini, and Teles (2008).
© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010

Here we describe this approach and use it to analyze an important outstanding question in monetary economics: How should policy be designed in order to avoid indeterminacy and achieve unique implementation? It has been known, at least since the work of Sargent and Wallace (1975), that when interest rates are the policy instrument, many ways of specifying policy lead to indeterminate outcomes including multiple equilibria. Indeterminacy is risky because some of those outcomes can be bad, including hyperinflation. Researchers thus agree that designing policies that achieve unique implementation is desirable. Here we demonstrate that our sophisticated policy approach does that for monetary policy.

We illustrate our approach in two standard monetary economies: a simple sticky-price model with one-period price-setting and a sticky-price model with staggered price-setting (often referred to as the New Keynesian model). For both, we show that, under sufficient conditions, any outcome of a competitive equilibrium can be uniquely implemented by appropriately constructed sophisticated policies. In particular, the Ramsey equilibrium can be uniquely implemented.

In the two model economies, we construct central bank policies that uniquely implement a desired competitive equilibrium in the same basic way. Along the equilibrium path, we choose the policies to be those given by the desired competitive equilibrium. We structure the policies off the equilibrium path, the reversion policies, to discourage deviations.
Specifically, if the average choice of private agents deviates from that in the desired equilibrium, then we choose the reversion policies so that the optimal choice, or best response, of each individual agent is different from the average choice.

2. This requirement is the natural analog of subgame perfection in an environment in which private agents are competitive. In this sense, our equilibrium concept is the obvious one for our macroeconomic environment.

One way to see why such reversion policies can eliminate multiplicity is to recall how multiple equilibria arise in the first place. At an intuitive level, they arise if, when each agent believes that all other agents will choose some particular action other than the desired one, each agent finds it optimal to go along with the deviation by also picking that particular action. Our construction of reversion policies breaks the self-fulfilling nature of such deviations. It does so by ensuring that even if an agent believes that all other agents are choosing a particular action that differs from the desired action, the central bank policy makes it optimal for that agent not to go along with that deviation. When such reversion policies can be found, we say that the best responses are controllable. A sufficient condition for controllability is that policies can be found such that after a deviation the continuation equilibrium is unique and varies with policy. Variation with policy typically holds, so if policies can be found under which the continuation equilibrium is unique (somewhere), then we have unique implementation (everywhere). This sufficient condition suggests a simple way to state our message in a general way: uniqueness somewhere generates uniqueness everywhere.

One concern with our construction of sophisticated policies is that it apparently relies on the idea that the central bank perfectly observes private agents’ actions and thus can detect any deviation. We show that this concern is unwarranted: our results are robust to imperfect information about private agents’ actions. Specifically, with imperfect detection of deviations, sophisticated policies can be designed that have unique equilibria that are close to the desired outcomes when the detection error is small and that converge to the desired equilibria as the detection error goes to zero.

The approach proposed here suggests an operational guide to policy making: First use the Ramsey approach to determine the best competitive equilibrium, and then check whether in that situation, best responses are controllable.
If they are, then sophisticated policies of the kind we have constructed can uniquely implement the Ramsey outcome. If best responses are not controllable, then the only option is to accept indeterminacy.

Our work here is related to previous work on the problem of indeterminacy in monetary economies (Wallace 1981; Obstfeld and Rogoff 1983; King 2000; Benhabib, Schmitt-Grohé, and Uribe 2001; Christiano and Rostagno 2001; Svensson and Woodford 2005). The previous work pursues an approach different from ours (and from that in the microeconomic literature on implementation); we call it unsophisticated implementation. The basic idea of that approach is to specify policies as functions of the history and check only to see whether the period-zero competitive equilibrium is unique.

Unsophisticated implementation has been criticized in the macroeconomic and the microeconomic literature. For example, in the macroeconomic literature, Kocherlakota and Phelan (1999), Bassetto (2002), Buiter (2002), and Ljungqvist and Sargent (2004) criticize this general idea in the context of the fiscal theory of the price level; Bassetto (2005) criticizes it in the context of a simple tax example; and Cochrane (2007) criticizes it in the context of the literature on monetary policy rules. In the microeconomic literature, Jackson (2001) criticizes a related approach to implementation.

In our view, unsophisticated implementation is deficient because it does not describe how the economy will behave after a deviation by private agents from the desired outcome. This deficiency leaves open the possibility that the approach achieves implementation via nonexistence. By this phrase, we mean an approach that specifies policy actions under which no continuation equilibrium exists after private agent deviations. We agree with those who argue that implementation via nonexistence trivializes the implementation problem. To see why it does, consider the following policy rule: If private agents choose the desired outcome, then continue with the desired policy; if private agents deviate from the desired outcome, then forever after set government spending at a high level and taxes at zero. Clearly, under this policy rule, any deviation from the desired outcome leads to nonexistence of equilibrium, and hence, we trivially have implementation via nonexistence. We find this way of achieving implementation unpalatable.

Our approach, in contrast, insists that policies be specified such that a competitive equilibrium exists after any deviation. We achieve implementation in the traditional microeconomic sense: by discouraging deviations, not by nonexistence.
In our approach, policies are specified so that even if an individual agent believes that all other agents will deviate to some specific action, that individual agent finds it optimal to choose a different action. Our approach not only ensures that the continuation equilibria always exist, but also has the desirable property that the reversion policies are not extreme in any sense. That is, after deviations, our reversion policies do not threaten the private economy with dire outcomes such as hyperinflation; they simply bring inflation back to the desired path.

Despite the shortcomings of the unsophisticated implementation approach, this literature has made two contributions that we find useful. One is the idea of regime-switching. This idea dates back at least to Wallace (1981) and has been used by Obstfeld and Rogoff (1983), Benhabib, Schmitt-Grohé, and Uribe (2001), and Christiano and Rostagno (2001). The basic idea in, say, Benhabib, Schmitt-Grohé, and Uribe (2001) is that if the economy embarks on an undesirable path, then the monetary and fiscal policy regime switches in such a way that the government’s budget constraint is violated, and the undesirable path is not an equilibrium.

The other useful contribution of the literature on unsophisticated implementation is what Cochrane (2007) calls the King rule. This rule seeks to implement a desired equilibrium through an interest-rate policy that makes the difference between the interest rate and its desired equilibrium level a linear function of the difference between inflation and its desired equilibrium level, with a coefficient greater than 1. This idea dates back to at least King (2000) and has been used by Svensson and Woodford (2005). As we show here, the King rule, like other rules that use interest rates for all histories, namely, pure interest-rate rules, always leads to indeterminacy in our simple model and does so for a large class of parameters in our staggered price-setting model as well.

We build on these two contributions by considering a King–money hybrid rule: When private agents deviate from the equilibrium path, the central bank uses the King rule for small deviations and switches regimes (from interest rates to money) for large deviations. Notice that with this rule, under our definition of equilibrium, outcomes return to the desired outcome path in the period after the deviation. In this sense, our hybrid rule achieves unique implementation without threatening agents with dire outcomes.
Our work here is also related to another substantial literature that aims to find monetary policy rules which eliminate indeterminacy. (See, for example, McCallum [1981] and, more recently, Woodford [2003].) The recent literature argues that to achieve a unique outcome, interest-rate rules should follow the Taylor principle: interest rates relative to exogenously specified levels should rise more than one for one when inflation rates rise relative to their exogenously specified levels. We show here that adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. It is not necessary because the sophisticated policy approach can uniquely implement any desired competitive equilibrium outcome, including outcomes in which, along the equilibrium path, the central bank follows an interest-rate rule that violates the Taylor principle. It is not sufficient because pure interest-rate rules may lead to indeterminacy even if they satisfy the Taylor principle.

Notwithstanding these considerations, our analysis of the King–money hybrid rule does lend support to the idea that adherence to the Taylor principle can sometimes help achieve unique implementation. Specifically, this is true within the class of King–money hybrid rules when the Taylor principle is used in the region where the King part of the rules applies.

Our findings also cast light on empirical investigations of determinacy based on the Taylor principle. We argue that, under the set of assumptions made explicit in the literature, inferences about determinacy based on existing estimation procedures should be treated skeptically. For our simple model economies, we provide assumptions under which such inferences can be confidently made. Although there is some hope that such inference may be possible in more interesting applied examples using variants of our assumptions, difficult challenges remain.

Using sophisticated policies is our proposed way to eliminate indeterminacy when setting monetary policy. For some other recent proposals, see the work of Bassetto (2002) and Adão, Correia, and Teles (2007).

II. A SIMPLE MODEL WITH ONE-PERIOD PRICE-SETTING

We begin by illustrating the basic idea of our construction of sophisticated policies using a simple model with one-period price-setting. The dynamical system associated with the competitive equilibrium of this model is straightforward, which lets us focus on the strategic aspects of sophisticated policies. With this model, we demonstrate that any desired outcome of a competitive equilibrium can be uniquely implemented by sophisticated policies with reversion to a money regime.
We show that pure interest-rate rules, which exclusively use interest rates as the policy instrument, cannot achieve unique implementation. Finally, we show that reversion to a particular hybrid rule, which uses interest rates as the policy instrument for small deviations and money for large deviations, can achieve unique implementation.

The model we analyze here is a modified version of the basic sticky-price model with a New Classical Phillips curve (as in Woodford [2003, Chap. 3, Sect. 1.3]). In order to make our results comparable to those in the literature, we here describe a simple, linearized version of the model. In Atkeson, Chari, and Kehoe (2009), we describe the general equilibrium version that, when linearized, produces the equilibrium conditions studied here.

II.A. The Determinants of Output and Inflation

Consider a monetary economy populated by a large number of identical, infinitely lived consumers, a continuum of producers, and a central bank. Each producer uses labor to produce a differentiated good on the unit interval. A fraction of producers $j \in [0, \alpha)$ are flexible-price producers, and a fraction $j \in [\alpha, 1]$ are sticky-price producers.

In this economy, the timing within a period $t$ is as follows. At the beginning of the period, sticky-price producers set their prices, after which the central bank chooses its monetary policy by setting one of its instruments, either interest rates or the quantity of money. Two shocks, $\eta_t$ and $\nu_t$, are then realized. We interpret the shock $\eta_t$ as a flight to quality shock that affects the attractiveness of government debt relative to private claims and the shock $\nu_t$ as a velocity shock. At the end of the period, flexible-price producers set their prices, and consumers make their decisions.

Now we develop necessary conditions for a competitive equilibrium in this economy and then, in the next section, formally define a competitive equilibrium. Here and throughout, we express all variables in log-deviation form. This way of expressing variables implies that none of our equations will have constant terms.

Consumer behavior in this model is summarized by an intertemporal Euler equation and a cash-in-advance constraint. We can write the linearized Euler equation as

$$y_t = E_t[y_{t+1}] - \psi\,(i_t - E_t[\pi_{t+1}]) + \eta_t, \tag{1}$$

where $y_t$ is aggregate output, $i_t$ is the nominal interest rate, $\eta_t$ (the flight to quality shock) is an i.i.d. mean-zero shock with variance $\mathrm{var}(\eta)$, and $\pi_{t+1} = p_{t+1} - p_t$ is the inflation rate from time period $t$ to $t+1$, where $p_t$ is the aggregate price level. The parameter $\psi$ determines the intertemporal elasticity, and $E_t$ denotes the expectations of a representative consumer given that consumer’s information in period $t$, which includes the shock $\eta_t$.

The cash-in-advance constraint, when first-differenced, implies that the relationships among inflation $\pi_t$, money growth $\mu_t$, and output growth $y_t - y_{t-1}$ are given by a quantity equation of the form

$$\pi_t = \mu_t - (y_t - y_{t-1}) + \nu_t, \tag{2}$$

where $\nu_t$ (the velocity shock) is an i.i.d. mean-zero shock with variance $\mathrm{var}(\nu)$.

We turn now to producer behavior. The optimal price set by an individual flexible-price producer $j$ satisfies

$$p_{ft}(j) = p_t + \gamma y_t, \tag{3}$$

where the parameter $\gamma$ is the elasticity of the equilibrium real wage with respect to output (often referred to in the literature as Taylor’s $\gamma$). The optimal price set by a sticky-price producer $j$ satisfies

$$p_{st}(j) = E_{t-1}[p_t + \gamma y_t], \tag{4}$$

where $E_{t-1}$ denotes expectations at the beginning of period $t$ before the shocks $\eta_t$ and $\nu_t$ are realized. The aggregate price level $p_t$ is a linear combination of the prices $p_{ft}$ set by the flexible-price producers and the prices $p_{st}$ set by the sticky-price producers and is given by

$$p_t = \int_0^{\alpha} p_{ft}(j)\,dj + \int_{\alpha}^{1} p_{st}(j)\,dj. \tag{5}$$

Using language from game theory, we can think of equations (3) and (4) as akin to the best responses of the flexible- and sticky-price producers given their beliefs about the aggregate price level and aggregate output. In this model, the flexible-price producers are strategically uninteresting. Their expectations about the future have no influence on their decisions; their prices are set mechanically according to the static considerations reflected in (3). Thus, in all that follows, equation (3) will hold on and off the equilibrium path, and we can think of $p_{ft}(j)$ as being residually determined by (3) and substitute out for $p_{ft}(j)$. To do so, substitute (3) into (5) and solve for $p_t$ to get

$$p_t = \kappa y_t + \frac{1}{1-\alpha}\int_{\alpha}^{1} p_{st}(j)\,dj, \tag{6}$$

where $\kappa = \alpha\gamma/(1-\alpha)$.
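As a quick numerical check of the substitution step from (3) and (5) to (6), the following sketch may help. It is only an illustration: the parameter values and the names `aggregate_price_level` and `avg_sticky_price` are assumptions introduced here, and `avg_sticky_price` stands for the average of $p_{st}(j)$ over $[\alpha, 1]$.

```python
# Numerical check of the aggregation step (3) + (5) -> (6).
# All numbers below are illustrative assumptions, not values from the paper.

def aggregate_price_level(alpha, gamma, y, avg_sticky_price):
    """Price level implied by equation (6).

    avg_sticky_price is the average price set by sticky-price producers,
    i.e. (1 / (1 - alpha)) times the integral of p_st(j) over [alpha, 1].
    """
    kappa = alpha * gamma / (1.0 - alpha)
    return kappa * y + avg_sticky_price


alpha, gamma = 0.4, 0.5            # flexible-price share, Taylor's gamma
y, avg_sticky_price = 0.02, 0.01   # log deviations

p = aggregate_price_level(alpha, gamma, y, avg_sticky_price)

# Plug p back into (5): by (3) every flexible-price producer charges
# p + gamma * y, and sticky-price producers have mass (1 - alpha).
p_from_5 = alpha * (p + gamma * y) + (1.0 - alpha) * avg_sticky_price
residual = p - p_from_5
```

If (6) is the correct solution of (5), `residual` should be zero up to floating-point error for any admissible choice of the inputs.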

We follow the literature and express the sticky-price producers’ decisions in terms of inflation rates rather than price levels. To do so, let $x_t(j) = p_{st}(j) - p_{t-1}$, and rewrite (4) as

$$x_t(j) = E_{t-1}[\pi_t + \gamma y_t]. \tag{7}$$

For convenience, we define

$$x_t = \frac{1}{1-\alpha}\int_{\alpha}^{1} x_t(j)\,dj \tag{8}$$

to be the average price set by the sticky-price producers relative to the aggregate price level in period $t-1$, so that we can rewrite (7) as

$$x_t = E_{t-1}[\pi_t + \gamma y_t]. \tag{9}$$

We can also rewrite (6) as

$$\pi_t = \kappa y_t + x_t. \tag{10}$$

Consider now the setting of monetary policy in this model. When the central bank sets its policy, it has to choose to operate under either a money regime or an interest-rate regime. In the money regime, the central bank’s policy instrument is money growth $\mu_t$; it sets $\mu_t$, and the nominal interest rate $i_t$ is residually determined from the Euler equation (1) after the realization of the shock $\eta_t$. In the interest-rate regime, the central bank’s instrument is the interest rate; it sets $i_t$, and money growth $\mu_t$ is residually determined from the cash-in-advance constraint (2) after the realization of the shock $\nu_t$. Of course, in both regimes, the Euler equation and the cash-in-advance constraint both hold.

II.B. Competitive Equilibrium

Now we define a notion of competitive equilibrium for the simple model in the spirit of the work of Barro (1979) and Lucas and Stokey (1983). In this equilibrium, allocations, prices, and policies are all defined as functions of the history of exogenous events, or shocks, $s^t = (s_0, \ldots, s_t)$, where $s_t = (\eta_t, \nu_t)$. Sticky-price producer decisions and aggregate inflation and output levels can be summarized by $\{x_t(s^{t-1}), \pi_t(s^t), y_t(s^t)\}$. In terms of the policies, we let the regime choice and the policy choice within the regime be $\delta_t(s^{t-1}) = (\delta_{1t}(s^{t-1}), \delta_{2t}(s^{t-1}))$, where the first argument $\delta_{1t}(s^{t-1}) \in \{M, I\}$ denotes the regime choice, either money ($M$) or the interest rate ($I$), and the second argument denotes the policy choice within the regime, either money growth $\mu_t(s^{t-1})$ or the interest rate $i_t(s^{t-1})$. If the money regime is chosen in $t$, then the interest rate is determined residually at the end of that period, whereas if the interest-rate regime is chosen in $t$, then the money growth rate is determined residually at the end of the period.

Let $\{a_t(s^t)\} = \{x_t(s^{t-1}), \delta_t(s^{t-1}), \pi_t(s^t), y_t(s^t)\}$ denote a collection of allocations, prices, and policies in this competitive equilibrium. Such a collection is a competitive equilibrium given $y_{-1}$ if it satisfies (i) consumer optimality, namely, (1) and (2) for all $s^t$; (ii) optimality by sticky-price producers, namely, (9) for all $s^{t-1}$; and (iii) optimality by flexible-price producers, namely, (10) for all $s^t$.

We also define a continuation competitive equilibrium starting from any point in time. For example, consider the beginning of period $t$ with state variables $s^{t-1}$ and $y_{t-1}$. A collection of allocations, prices, and policies $\{a(s^{t-1}, y_{t-1})\}_{r \ge t} = \{x_r(s^{r-1} \mid s^{t-1}, y_{t-1}), \delta_r(s^{r-1} \mid s^{t-1}, y_{t-1}), \pi_r(s^r \mid s^{t-1}, y_{t-1}), y_r(s^r \mid s^{t-1}, y_{t-1})\}_{r \ge t}$ is a continuation competitive equilibrium from $(s^{t-1}, y_{t-1})$ if it satisfies the three conditions of a competitive equilibrium above for all periods starting from $(s^{t-1}, y_{t-1})$. In this definition, we effectively drop the equilibrium conditions from period 0 through period $t-1$. This notion of a continuation competitive equilibrium from the beginning of period $t$ onward is very similar to that of a competitive equilibrium from the beginning of period 0 onward, except that the initial conditions are now given by $(s^{t-1}, y_{t-1})$. We define a continuation competitive equilibrium that starts at the end of period $t$ from $(s^{t-1}, y_{t-1}, x_t, \delta_t, s_t)$ in a similar way. This latter definition requires optimality by consumers and flexible-price producers from $s^t$ onward and optimality by sticky-price producers from $s^{t+1}$ onward.
Note that this equilibrium must satisfy all the conditions of a continuation competitive equilibrium that starts at the beginning of period t, except for the sticky-price optimality condition in period t, namely, (9) in period t. Finally, a continuation competitive equilibrium starting at the beginning of period 0 is simply a competitive equilibrium. The following lemma proves that any competitive equilibrium gives rise to a New Classical Phillips curve along with some other useful properties of such an equilibrium.

LEMMA 1 (New Classical Phillips Curve and Other Useful Properties). Any competitive equilibrium must satisfy

$$\pi_t(s^t) = \kappa y_t(s^t) + E[\pi_t(s^t) \mid s^{t-1}], \tag{11}$$

which is often referred to as the New Classical Phillips curve;

$$E[y_t(s^t) \mid s^{t-1}] = 0 \quad \text{and} \quad x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}]; \tag{12}$$

and

$$E[x_{t+1}(s^t) \mid s^{t-1}] = E[\pi_{t+1}(s^{t+1}) \mid s^{t-1}] = i_t, \tag{13}$$

where $i_t = i_t(s^{t-1})$ if the central bank uses an interest-rate regime in period $t$ and $i_t = i_t(s^t)$ if the central bank uses a money regime in period $t$.

Proof. To see that $E[y_t(s^t) \mid s^{t-1}] = 0$, take expectations of (10) as of $s^{t-1}$ and substitute into (9). Using this result in (10), we obtain $x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}]$. Substituting this result into (10) yields (11). To show (13), take expectations of the Euler equation (1) with respect to $s^{t-1}$ and use $E[y_t(s^t) \mid s^{t-1}] = 0$ along with the law of iterated expectations to get (13). QED

A similar argument establishes that (11)–(13) hold for any continuation competitive equilibrium.

II.C. Sophisticated Equilibrium

We now turn to what we call sophisticated equilibrium. The definition of this concept is very similar to that for competitive equilibrium, except that here we allow allocations, prices, and policies to be functions of more than just the history of exogenous events; they are also functions of the history of both aggregate private actions and central bank policies. For sophisticated equilibrium, we require as well that for every history, the continuation of allocations, prices, and policies from that history onward constitutes a continuation competitive equilibrium.

Setup and Definition. Before turning to our formal definition, we note that our definition of sophisticated equilibrium simply specifies policy rules that the central bank must follow; it does not require that the policy rules be optimal. We specify sophisticated policies in this way in order to show that our unique implementation result does not depend on the objectives of the central bank. We think of sophisticated policies as being specified at the beginning of period 0 and of the central bank as being committed to following them.

We turn now to defining the histories that private agents and the central bank confront when they make their decisions. The public events that occur in a period are, in chronological order, $q_t = (x_t; \delta_t; s_t; y_t, \pi_t)$. Letting $h_t$ denote the history of these events from period $-1$ up to and including period $t$, we have that $h_t = (h_{t-1}, q_t)$ for $t \ge 0$. The history $h_{-1} = y_{-1}$ is given. For notational convenience, we focus on perfect public equilibria in which the central bank’s strategy (choice of regime and policy) is a function only of the public history.

The public history faced by the sticky-price producers at the beginning of period $t$ when they set their prices is $h_{t-1}$. A strategy for the sticky-price producers is a sequence of rules $\sigma_x = \{x_t(h_{t-1})\}$ for choosing prices for every possible public history. The public history faced by the central bank when it chooses its regime and sets either its money-growth or interest-rate policy is $h_{gt} = (h_{t-1}, x_t)$. A strategy for the central bank $\{\delta_t(h_{gt})\}$ is a sequence of rules for choosing the regime as well as the policy within the regime, either $\mu_t(h_{gt})$ or $i_t(h_{gt})$. Let $\sigma_g$ denote that strategy. At the end of period $t$, then, output and inflation are determined as functions of the relevant history $h_{yt}$ according to the rules $y_t(h_{yt})$ and $\pi_t(h_{yt})$. We let $\sigma_y = \{y_t(h_{yt})\}$ and $\sigma_\pi = \{\pi_t(h_{yt})\}$ denote the sequence of output and inflation rules.

Notice that for any history, the strategies $\sigma$ induce continuation outcomes in the natural way. For example, starting at some history $h_{t-1}$, these strategies recursively induce outcomes $\{a_r(s^r \mid h_{t-1}; \sigma)\}$. We illustrate this recursion for period $t$. The sticky-price producer’s decision in $t$ is given by $x_t(j, s^{t-1} \mid h_{t-1}; \sigma) = x_t(h_{t-1})$, where $x_t(h_{t-1})$ is obtained from $\sigma_x$. The central bank’s decision in $t$ is given by $\delta_t(s^{t-1} \mid h_{t-1}; \sigma) = \delta_t(h_{gt})$, where $h_{gt} = (h_{t-1}, x_t(h_{t-1}))$ and $\delta_t(h_{gt})$ is obtained from $\sigma_g$. The consumer and flexible-price producer decisions in $t$ are given by $y_t(s^t \mid h_{t-1}; \sigma) = y_t(h_{yt})$ and $\pi_t(s^t \mid h_{t-1}; \sigma) = \pi_t(h_{yt})$, where $h_{yt} = (h_{t-1}, x_t(h_{t-1}), \delta_t(h_{t-1}, x_t(h_{t-1})))$ and $y_t(h_{yt})$ and $\pi_t(h_{yt})$ are obtained from $\sigma_y$ and $\sigma_\pi$. Continuing in a similar way, we can recursively define continuation outcomes for subsequent periods. We can likewise define continuation outcomes $\{a_r(s^r \mid h_{gt}; \sigma)\}$ and $\{a_r(s^r \mid h_{yt}; \sigma)\}$ following histories $h_{gt}$ and $h_{yt}$, respectively.

We now use these strategies and continuation outcomes to formally define our notion of equilibrium. A sophisticated equilibrium given the policies here is a collection of strategies $(\sigma_x, \sigma_g)$ and allocation rules $(\sigma_y, \sigma_\pi)$ such that (i) given any history $h_{t-1}$, the continuation outcomes $\{a_r(s^r \mid h_{t-1}; \sigma)\}$ induced by $\sigma$ constitute a continuation competitive equilibrium and (ii) given any history $h_{yt}$, so do the continuation outcomes $\{a_r(s^r \mid h_{yt}; \sigma)\}$.$^{3}$

Associated with each sophisticated equilibrium $\sigma = (\sigma_g, \sigma_x, \sigma_y, \sigma_\pi)$ are the particular stochastic processes for outcomes that occur along the equilibrium path, which we call sophisticated outcomes. These outcomes are competitive equilibrium outcomes. We will say a policy $\sigma_g^*$ uniquely implements a desired competitive equilibrium $\{a_t^*(s^t)\}$ if the sophisticated outcome associated with any sophisticated equilibrium of the form $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$ coincides with the desired competitive equilibrium.

A central feature of our definition of sophisticated equilibrium is our requirement that for all histories, including deviation histories, the continuation outcomes constitute a continuation competitive equilibrium. We think of this requirement as analogous to the requirement that in a subgame perfect equilibrium, the continuation strategies constitute a Nash equilibrium. This requirement constitutes the most important difference between our approach to determinacy and that in the macroeconomic literature.

Technically, one way of casting that literature’s approach into our language of strategies and allocation rules is to consider the following notion of equilibrium. An unsophisticated equilibrium is a strategy for the central bank $\sigma_g$ and allocations, policies, and prices $\{a_t(s^t)\} = \{x_t(s^{t-1}), \delta_t(s^{t-1}), \pi_t(s^t), y_t(s^t)\}$ such that $\{a_t(s^t)\}$ is a period-zero competitive equilibrium and the policies induced by $\sigma_g$ from $\{a_t(s^t)\}$ coincide with $\{\delta_t(s^{t-1})\}$.

In our view, unsophisticated equilibrium is a deficient guide to policy. Although an unsophisticated equilibrium does tell policy makers what to do for every history, it does not specify what will happen under their policies for every history, in particular for deviation histories. Achieving implementation using the notion of unsophisticated equilibrium is, in general, trivial. As we explained earlier, one way of achieving implementation is via nonexistence: simply specify policies so that no competitive equilibrium exists after deviation histories. We find this way of achieving implementation uninteresting.

3. In general, a sophisticated equilibrium would require that for every history (including histories in which the government acts, $h_{gt}$), the continuation outcomes from that history onward constitute a competitive equilibrium. Here, that requirement would be redundant because the conditions for a competitive equilibrium for $h_{gt}$ are the same as those for $h_{yt}$.

Finally, to help avoid a common confusion, we stress that our definition does not require that, when there is a deviation in period $t$, the entire sequence starting from period 0, including the deviation in period $t$, constitute a period-zero competitive equilibrium. Indeed, if we achieve unique implementation, then such a sequence will not constitute a period-zero equilibrium.

Implementation with Sophisticated Policies. We focus on implementing competitive equilibria with sophisticated policies in which the central bank uses interest rates along the equilibrium path. This focus is motivated in part by the observation that most central banks seem to use interest rates as their policy instruments. Another motivation is that if the variance of the velocity shock $\nu_t$ is large, then all of the outcomes under the money regime are undesirable.

To set up our construction of sophisticated policies, recall that in our economy the only strategically interesting agents are the sticky-price producers. Their choices must satisfy a key property, that

$$x_t(h_{t-1}) = E[\pi_t(h_{yt}) + \gamma y_t(h_{yt}) \mid h_{t-1}], \tag{14}$$

where $h_{yt} = (h_{t-1}, x_t(h_{t-1}), \delta_t(h_{t-1}, x_t(h_{t-1})), s_t)$. Notice that $x_t(h_{t-1})$ shows up on both sides of equation (14), so we require that the optimal choice $x_t(h_{t-1})$ satisfy a fixed point property. To get some intuition for this property, suppose that each sticky-price producer believes that all other sticky-price producers will choose some value, say, $\hat{x}_t$. This choice, together with the central bank’s strategy and the inflation and output rules, induces the outcomes $\pi_t(\hat{h}_{yt})$ and $y_t(\hat{h}_{yt})$, where $\hat{h}_{yt} = (h_{t-1}, \hat{x}_t, \delta_t(h_{t-1}, \hat{x}_t), s_t)$. The fixed point property requires that for $\hat{x}_t$ to be part of an equilibrium, each sticky-price producer’s best response must coincide with $\hat{x}_t$.

The basic idea behind our sophisticated policy construction is that the central bank starts by picking any desired competitive equilibrium allocations and sets its policy on the equilibrium path consistent with them. The central bank then constructs its policy off the equilibrium path so that even if an individual agent believes that all other agents will deviate to some specific action, that individual agent finds it optimal to choose a different action. In this sense, the policies are specified so that the fixed point property is satisfied at only the desired allocations.
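The fixed point property can be made concrete with a small sketch. By (10), inflation equals $\kappa y_t$ plus the common sticky-price choice, so the right-hand side of (14) evaluated at a conjectured choice is that choice plus $(\kappa + \gamma)$ times expected output. The two "policies" in the code are hypothetical reduced forms introduced purely for illustration (they are not the paper's policies): each maps a conjectured choice `x_hat` directly into the expected output it induces. All names and numbers are assumptions.

```python
# Sketch of the fixed point property behind (14), using only equation (10).
# The policies below are hypothetical reduced forms, not the paper's rules.

def best_response(x_hat, expected_output, kappa, gamma):
    """E[pi_t + gamma * y_t] when all other producers choose x_hat.

    By (10), pi_t = kappa * y_t + x_hat, so the best response equals
    x_hat + (kappa + gamma) * E[y_t].
    """
    return x_hat + (kappa + gamma) * expected_output(x_hat)


def fixed_points(candidates, expected_output, kappa, gamma, tol=1e-12):
    """Candidates whose best response coincides with the candidate itself."""
    return [x for x in candidates
            if abs(best_response(x, expected_output, kappa, gamma) - x) < tol]


kappa, gamma = 0.5, 0.5      # illustrative values
x_star = 0.02                # desired sticky-price choice (assumed)
candidates = [0.0, 0.01, 0.02, 0.03, 0.04]


def accommodating(x_hat):
    """Hypothetical policy A: expected output is zero for every x_hat."""
    return 0.0


def leaning(x_hat):
    """Hypothetical policy B: expected output moves against the deviation."""
    return 2.0 * (x_star - x_hat)


all_fixed = fixed_points(candidates, accommodating, kappa, gamma)
unique_fixed = fixed_points(candidates, leaning, kappa, gamma)
```

Under the accommodating reduced form every candidate is a fixed point, whereas under the leaning one only `x_star` survives. That contrast is the sense in which off-path policy can be designed so that the fixed point property holds only at the desired allocation.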

We now analyze several possible ways for a central bank to attempt the implementation of competitive equilibria in which it uses interest rates as its monetary policy instrument.

With reversion to a money regime. We show first that in the simple sticky-price model, any competitive equilibrium in which the central bank uses the interest rate as its instrument in all periods can be uniquely implemented with sophisticated policies that involve a one-period reversion to money. Under these policies, after a deviation, the central bank switches to a money regime for one period.

More precisely, fix a desired competitive equilibrium outcome path $(x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t))$ together with central bank policies $i_t^*(s^{t-1})$. Consider the following trigger-type policy: If sticky-price producers choose $x_t$ in period $t$ to coincide with the desired outcomes $x_t^*(s^{t-1})$, then let central bank policy in $t$ be $i_t^*(s^{t-1})$. If not, and these producers deviate to some $\hat{x}_t \ne x_t^*(s^{t-1})$, then for that period $t$, let the central bank switch to a money regime with a suitably chosen level of money growth. This level of money growth makes it not optimal for any individual sticky-price setter to cooperate with the deviation. If such a level of money growth exists, we say that the best responses of the sticky-price setters are controllable. The following lemma shows that this property holds for our model.

LEMMA 2 (Controllability of Best Responses with One-Period Price-Setting). For any history $(h_{t-1}, \hat{x}_t)$, if the central bank chooses the money regime, then there exists a choice for money growth $\mu_t$ such that

$$\hat{x}_t \ne E[\pi_t(\hat{h}_{yt}) + \gamma y_t(\hat{h}_{yt})], \tag{15}$$

where $\hat{h}_{yt} = (h_{t-1}, \hat{x}_t, M, \mu_t)$.

Proof. Substituting (2) into (10), we have a result showing that if the central bank chooses the money regime with money growth $\mu_t$, then output $y_t$ and inflation $\pi_t$ are uniquely determined and given by

$$y_t = \frac{\mu_t + \nu_t + y_{t-1} - \hat{x}_t}{1+\kappa}, \tag{16}$$

$$\pi_t = \kappa y_t + \hat{x}_t. \tag{17}$$

Hence,

$$E[\pi_t(\hat{h}_{yt}) + \gamma y_t(\hat{h}_{yt})] = \frac{\kappa+\gamma}{1+\kappa}\,(\mu_t + y_{t-1} - \hat{x}_t) + \hat{x}_t.$$

Clearly, then, any choice of $\mu_t \ne \hat{x}_t - y_{t-1}$ will ensure that (15) holds. QED

We use this lemma to guide our choice of the suitable money growth rate after deviations. We choose this growth rate to generate the same expected inflation as in the original equilibrium. (Of course, we could have chosen many other values that also would discourage deviations, but we found this value to be the most intuitive.$^{4}$) In particular, if the producers deviate to some $\hat{x}_t \ne x_t^*(s^{t-1})$, then for that period $t$, let the central bank switch to a money regime with money growth set so that

$$\mu_t = \hat{x}_t - y_{t-1} + \frac{1+\kappa}{\kappa}\left(x_t^*(s^{t-1}) - \hat{x}_t\right). \tag{18}$$
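A small numerical sketch may make the reversion rule concrete. It evaluates the money-regime outcomes (16) and (17) at the money growth rate in (18) and checks two things: expected inflation equals the desired $x_t^*$, and the best response in (15) differs from the deviation $\hat{x}_t$. The function names and all parameter values are assumptions made for illustration only.

```python
# Sketch of the one-period reversion to money, equations (16)-(18).
# Function names and parameter values are illustrative assumptions.

def money_regime_outcome(mu, nu, y_prev, x_hat, kappa):
    """Output and inflation under the money regime, equations (16)-(17)."""
    y = (mu + nu + y_prev - x_hat) / (1.0 + kappa)
    pi = kappa * y + x_hat
    return y, pi


def reversion_money_growth(x_hat, x_star, y_prev, kappa):
    """Money growth chosen after a deviation to x_hat, equation (18)."""
    return x_hat - y_prev + (1.0 + kappa) / kappa * (x_star - x_hat)


kappa, gamma = 0.5, 0.5
x_star, x_hat, y_prev = 0.02, 0.05, 0.01   # desired choice, deviation, lagged output

mu = reversion_money_growth(x_hat, x_star, y_prev, kappa)

# The velocity shock nu has mean zero and (16)-(17) are linear in nu, so
# expectations can be computed by evaluating the outcome at nu = 0.
exp_y, exp_pi = money_regime_outcome(mu, 0.0, y_prev, x_hat, kappa)
best_response = exp_pi + gamma * exp_y     # right-hand side of (15)
```

With these numbers, `exp_pi` coincides with `x_star` while `best_response` differs from `x_hat`, so an individual sticky-price producer would not want to go along with the conjectured deviation.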

Note that $\mu_t \ne \hat{x}_t - y_{t-1}$. With such a money growth rate, expected inflation is the same in the reversion period as it would have been in the desired outcome. From Lemma 1, such a choice of $\hat{x}_t$ cannot be part of an equilibrium. It is also easy to see that if a deviation occurs in period $t$, the economy returns to the desired outcomes in period $t+1$. We have established the following proposition.

PROPOSITION 1 (Unique Implementation with Money Reversion). Any competitive equilibrium outcome in which the central bank uses interest rates as its instrument can be implemented as a unique equilibrium with sophisticated policies with one-period reversion to a money regime. Moreover, under this rule, after any deviation in period $t$, the equilibrium outcomes from period $t+1$ are the desired outcomes.

A simple way to describe our unique implementation result is that controllability of best responses under some regime guarantees unique implementation of any desired outcome. We obtain controllability by reversion to a money regime. Note that even though the money regime is not used on the equilibrium path, it is useful as an off-equilibrium commitment that helps support desired outcomes in which the central bank uses interest rates on the equilibrium path. Notice also that the proposition implies that deviations lead to only very transitory departures from desired outcomes. In particular, we do not achieve implementation by threatening the economy with dire outcomes after deviations. (Note that the particular result, that the economy returns exactly to the desired outcomes in the period after the deviation, would not hold in a version of this model with state variables, such as capital.)

4. We choose this part of the policy as a clear demonstration that after a deviation, the central bank is not doing anything exotic, such as producing a hyperinflation. Rather, in an intuitive sense, the central bank is simply getting the economy back on the track it had been on before the deviation threatened to shift it in another direction.

So far we have focused on uniquely implementing competitive outcomes when the central bank uses interest rates as its instrument. Equations (16) and (17) imply that the equilibrium outcome under a money regime is unique, so that implementing desired outcomes is trivial when the central bank uses money as its instrument. Clearly, we can use a simple generalization of Proposition 1 to uniquely implement a competitive equilibrium in which the central bank uses interest rates in some periods and money in others.

With pure interest-rate rules. Now, as a second possible way for a central bank to implement competitive equilibria, we analyze pure interest-rate rules. We find that this way cannot achieve unique implementation. We begin with a pure interest-rate rule of the form

$$i_t(s^{t-1}) = i_t^*(s^{t-1}) + \phi\,(x_t(s^{t-1}) - x_t^*(s^{t-1})), \tag{19}$$

where $i_t^*(s^{t-1})$ and $x_t^*(s^{t-1})$ are the interest rates and the sticky-price producer choices associated with a competitive equilibrium that the central bank wants to implement uniquely, and the parameter $\phi$ represents how aggressively the central bank changes interest rates when private agents deviate from the desired equilibrium. Notice that this rule (19) specifies policy both on and off the equilibrium path. On the equilibrium path, $x_t(s^{t-1}) = x_t^*(s^{t-1})$, and the rule yields $i_t(s^{t-1}) = i_t^*(s^{t-1})$. Off the equilibrium path, the rule specifies how $i_t(s^{t-1})$ should differ from $i_t^*(s^{t-1})$ when $x_t(s^{t-1})$ differs from $x_t^*(s^{t-1})$.

Pure interest-rate rules of the form (19) have been discussed by King (2000) and Svensson and Woodford (2005). We follow Cochrane (2007) and call (19) the King rule. Note from Lemma 1 that $x_t(s^{t-1}) = E[\pi_t(s^t) \mid s^{t-1}]$, so that the King rule can be thought of as targeting expected inflation, in the sense that (19) is equivalent to

$$i_t(s^{t-1}) = i_t^*(s^{t-1}) + \phi\,\bigl(E[\pi_t(s^t) \mid s^{t-1}] - E[\pi_t^*(s^t) \mid s^{t-1}]\bigr). \tag{20}$$

We now show that if the central bank follows the King rule (19), it cannot ensure unique implementation of the desired outcome. Indeed, under this rule, the economy has a continuum of equilibria. More formally:

PROPOSITION 2 (Indeterminacy of Equilibrium under the King Rule). Suppose the central bank sets interest rates $i_t$ according to the simple economy’s King rule (19). Then any of the continuum of sequences indexed by the initial condition $x_0$ and the parameter $c$ that satisfies

$$x_{t+1} = i_t + c\,\eta_t, \quad \pi_t = x_t + \kappa(1+\psi c)\,\eta_t, \quad \text{and} \quad y_t = (1+\psi c)\,\eta_t \tag{21}$$

is a sophisticated outcome. Proof. In order to verify that the multiple outcomes that satisfy (21) are part of a period-zero competitive equilibrium, we need to check that they satisfy (1), (9), and (10). That they satisfy (9) follows by taking expectations of the second and third equations in (21). Substituting for it from (19) and for xt+1 from (21) into (1), we obtain that yt = (1 + ψc)ηt , as required by (21). Inspecting the expressions for πt and yt in (21) shows that they satisfy (10). Clearly, any such period-zero competitive equilibrium can be supported by a government strategy, σg , of the King rule form and QED appropriately chosen σx , σ y , and σπ . The intuitive idea behind the multiplicity of equilibria associated with the initial condition x0 is that interest-rate rules, including the King rule, induce nominal indeterminacy and do not pin down the initial price level. The intuitive idea behind the multiplicity of stochastic equilibria associated with c = 0 is that interest rates pin down only expected inflation and not the stateby-state realizations indexed by the parameter c. Note that Proposition 2 implies that even if the King rule parameter φ > 1, the economy has a continuum of equilibria. In that case, all but one of the equilibria has exploding inflation, in the sense that inflation eventually becomes unbounded. In the literature, researchers often restrict attention to bounded equilibria. We argue that, in this model, equilibria with exploding inflation


cannot be dismissed on logical grounds. Indeed, these equilibria are perfectly reasonable because the inflation explosion is associated with a money supply explosion. To see this association, suppose that the economy has no stochastic shocks and the desired outcomes are $\pi_t = 0$ and $y_t = 0$ in all periods. Then, from the cash-in-advance constraint (2), we know that the growth of the money supply is given by

(22) $\mu_t = x_t = \phi^t x_0$.

Thus, in these equilibria, inflation explodes because money growth explodes. Each equilibrium is indexed by a different initial value of the endogenous variable $x_0$. This endogenous variable depends solely on expectations of future policy and is not pinned down by any initial condition or transversality condition. Such equilibria are reasonable because at the core of most monetary models is the idea that the central bank’s printing of money at an ever-increasing rate leads to a hyperinflation. In these equilibria, inflation does not arise from the speculative reasons analyzed by Obstfeld and Rogoff (1983), but from the conventional money-printing reasons analyzed by Cagan (1956). In this sense, our model predicts, for perfectly standard and sensible reasons, that the economy can suffer from any one of a continuum of very undesirable paths for inflation. (Cochrane [2007] makes a similar point for a flexible-price model.) The same proposition obviously applies to more general interest-rate rules that are restricted to be the same on and off the equilibrium path. For example, Proposition 2 applies to linear feedback rules of the form

(23) $i_t = \bar{\imath}_t + \sum_{s=0}^{\infty} \phi_{xs} x_{t-s} + \sum_{s=1}^{\infty} \phi_{ys} y_{t-s} + \sum_{s=1}^{\infty} \phi_{\pi s} \pi_{t-s}$,

where the intercept term $\bar{\imath}_t$ can depend on the history of stochastic events.

With reversion to a hybrid rule. Analysis of a third possible way to implement competitive equilibria is a bit more complicated. In Proposition 1, we have shown how reversion to a money regime can achieve unique implementation. In Proposition 2 and the subsequent discussion, we have shown that pure interest-rate rules, such as the King rule, cannot. Notice that in our money reversion policies, even tiny deviations trigger a reversion to a money


regime. A natural question arises: Can unique implementation be achieved using a combination of these two strategies, or a hybrid rule, specifying, for example, that the central bank continue to use interest rates unless the deviations are very large and then revert to a money regime? The answer is yes. To see this, consider a particular hybrid rule that is intended to implement a bounded competitive equilibrium $\{x_t^*(s_{t-1}), \pi_t^*(s_t), y_t^*(s_t)\}$ with an associated interest rate $i_t^*(s_{t-1})$. Fix some $\bar{x}$ and $\underline{x}$ which satisfy $\bar{x} > \max_t x_t^*(s_{t-1})$ and $\underline{x} < \min_t x_t^*(s_{t-1})$. What we will call the King–money hybrid rule specifies that if $x_t(s_{t-1})$ is within the interest-rate interval $[\underline{x}, \bar{x}]$, then the central bank follows a King rule of the form (19); and if $x_t(s_{t-1})$ falls outside this interval, then the central bank reverts to a money regime and chooses the money growth rate that produces an expected inflation rate $\bar{\pi} \in [\underline{x}, \bar{x}]$. That the money growth rate can be so chosen follows from (16) and (17). We show that an attractive feature of outcomes under this hybrid rule is that deviations from the desired path lead only to very transitory movements away from the desired path. More precisely, after any deviation in period $t$, even though inflation and output in period $t$ may differ from the desired outcomes, those in subsequent periods coincide with the desired outcomes. More formally: PROPOSITION 3 (Unique Implementation with a Hybrid Rule). In the simple economy, the King–money hybrid rule with $\phi > 1$ uniquely implements any bounded competitive equilibrium. Moreover, under this rule, after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes. We prove this proposition in the Appendix. Here we simply sketch the argument for a deterministic version of the model. The key to the proof is a preliminary result that shows that no equilibrium outcome $x_t$ can be outside the interval $[\underline{x}, \bar{x}]$. To see that this is true, suppose that in some period $t$, $x_t$ is outside that interval. But when this is true, the hybrid rule specifies a money growth rate in that period that yields expected inflation inside the interval. Because $x_t$ equals expected inflation, this gives a contradiction and proves the preliminary result. To establish uniqueness, suppose that there is some sophisticated equilibrium with $\hat{x}_r \neq x_r^*$ for some $r$. From the preliminary result, $\hat{x}_r$ must be in the interval $[\underline{x}, \bar{x}]$, where the King rule


is operative. From Lemma 1, we know that in any equilibrium, $i_t = x_{t+1}$, so that the King rule implies that $\hat{x}_{t+1} - x_{t+1}^* = \phi(\hat{x}_t - x_t^*) = \phi^{t+1-r}(\hat{x}_r - x_r^*)$. Because $\phi > 1$ and $x_t^*$ is bounded, eventually $\hat{x}_{t+1}$ must leave the interval $[\underline{x}, \bar{x}]$, which is a contradiction.

Extension to Interest-Elastic Money Demand. So far, to keep the exposition simple, we have assumed a cash-in-advance setup in which money demand is interest-inelastic. This feature of the model implies that if a money regime is adopted in some period $t$, then the equilibrium outcomes in that period are uniquely determined by the money growth rate in that period. This uniqueness under a money regime is what allows the central bank to switch to a one-period money regime in order to support any desired competitive equilibrium. Now we consider economies with interest-elastic money demand. We argue that under appropriate conditions, our unique implementation result extends to such economies. When economies have interest-elastic money demand, sophisticated policies that specify reversion to money or to a hybrid rule can uniquely implement any desired outcome if best responses are controllable. A sufficient condition for such controllability is that competitive equilibria are unique under a suitably chosen money regime. Here, as with inelastic money demand, the uniqueness under a money regime is what enables unique implementation. A sizable literature has analyzed the uniqueness of competitive equilibria under money growth policies with interest-elastic money demand. Obstfeld and Rogoff (1983) and Woodford (1994) provide sufficient conditions for this uniqueness. For example, Obstfeld and Rogoff (1983) consider a money-in-the-utility-function model with preferences of the form $u(c) + v(m)$, where $c$ is consumption and $m$ is real money balances, and show that a sufficient condition for uniqueness under a money regime is

$\lim_{m \to 0} m v'(m) > 0$.

Obstfeld and Rogoff (1983) focus attention on flexible-price models, but their results can be readily extended to our simple sticky-price model. Indeed, their sufficient conditions apply unchanged to a deterministic version of that model because our model without shocks is effectively identical to a flexible-price model. Hence, under appropriate sufficient conditions, our unique


implementation result extends to environments with interest-elastic money demand. More generally, for our hybrid rule to uniquely implement desired outcomes, we need a reversion policy that has a unique equilibrium. An alternative to a money regime is a commodity standard such as those in the work of Wallace (1981) and Obstfeld and Rogoff (1983). With this type of standard, the government promises to redeem money for goods for some arbitrarily low price and finances the supply of goods with taxation. An alternative to our hybrid rule with money reversion is, therefore, a hybrid rule with reversion to a commodity standard.
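The deterministic logic behind Propositions 2 and 3 in the simple model can be illustrated with a short numerical sketch. It is not part of the original analysis. It assumes a desired path $x_t^* = 0$, so that the King rule recursion reduces to $x_{t+1} = \phi x_t$, and the interval bounds and function names (`king_rule_path`, `first_exit`) are hypothetical choices made only for illustration.

```python
def king_rule_path(x0_dev, phi, periods):
    """Deviations x_t - x_t* under a pure King rule.

    In the deterministic simple model i_t = x_{t+1}, so the King rule
    implies dev_{t+1} = phi * dev_t. Returns [dev_0, ..., dev_periods].
    """
    devs = [x0_dev]
    for _ in range(periods):
        devs.append(phi * devs[-1])
    return devs


def first_exit(devs, lower, upper):
    """First period t in which the deviation lies outside [lower, upper].

    Under the King-money hybrid rule such a period would trigger reversion
    to a money regime. Returns None if the path never leaves the interval.
    """
    for t, dev in enumerate(devs):
        if dev < lower or dev > upper:
            return t
    return None


if __name__ == "__main__":
    # Assumed illustrative values: phi = 1.5 and an interval of +/- 5 points.
    path = king_rule_path(x0_dev=0.001, phi=1.5, periods=20)
    print("first exit period:", first_exit(path, -0.05, 0.05))
```

Under the pure King rule every initial deviation generates a path of its own, which is the indeterminacy of Proposition 2. With $\phi > 1$ any nonzero deviation eventually leaves a bounded interval, which is the step that the hybrid rule exploits in the sketch of the proof of Proposition 3.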

III. A MODEL WITH STAGGERED PRICE-SETTING

We turn now to a version of our simple model with staggered price-setting, often referred to as the New Keynesian model. We show that, along the lines of the argument developed above, policies with infinite reversion to either a money regime or a hybrid rule can uniquely implement any desired outcome under an interest-rate regime. We also show that for a large class of economies, pure interest-rate rules of the King form still lead to indeterminacy. To make our points in the simplest way, we abstract from aggregate uncertainty.

III.A. Setup and Competitive Equilibrium

We begin by setting up the model with staggered price-setting. In the model, prices are set in a staggered fashion as in the work of Calvo (1983). At the beginning of each period, a fraction $1 - \alpha$ of producers are randomly chosen and allowed to reset their prices. After that, the central bank makes its decisions, and then, finally, consumers make theirs. This economy has no flexible-price producers. The linearized equations in this model are similar to those in the simple model. The Euler equation (1) and the quantity equation (2) are unchanged, except that here they have no shocks. The price set by a producer permitted to reset its price is given by the analog of (4), which is

(24) $p_{st}(j) = (1 - \alpha\beta) \sum_{r=t}^{\infty} (\alpha\beta)^{r-t} (\gamma y_r + p_r)$,


where $\beta$ is the discount factor. Here, again, Taylor’s $\gamma$ is the elasticity of the equilibrium real wage with respect to output. Letting $p_{st}$ denote the average price set by producers permitted to reset their prices in period $t$, we can recursively rewrite this equation as

(25) $p_{st}(j) = (1 - \alpha\beta)(\gamma y_t + p_t) + \alpha\beta p_{s,t+1}$,

together with a type of transversality condition $\lim_{T \to \infty} (\alpha\beta)^T p_{sT}(j) = 0$. The aggregate price level can then be written as

(26) $p_t = \alpha p_{t-1} + (1 - \alpha) p_{st}$.

To make our analysis parallel to the literature, we again translate the decisions of the sticky-price producers from price levels to inflation rates. Letting $x_t(j) = p_{st}(j) - p_{t-1}$ and letting $x_t$ denote the average of $x_t(j)$, with some manipulation we can rewrite (25) as

(27) $x_t = (1 - \alpha\beta)\gamma y_t + \pi_t + \alpha\beta x_{t+1}$.

We can also rewrite (26) as

(28) $\pi_t = (1 - \alpha) x_t$

and the transversality condition as $\lim_{T \to \infty} (\alpha\beta)^T x_T(j) = 0$. Using (28) and the fact that $x_t$ is the average of $x_t(j)$, we see that this condition is equivalent to

(29) $\lim_{t \to \infty} (\alpha\beta)^t \pi_t = 0$.

In addition to these conditions, we now argue that in this staggered price-setting model, a competitive equilibrium must satisfy two boundedness conditions. In general, boundedness conditions are controversial in the literature. Standard analyses of New Keynesian models impose strict boundedness conditions: in any reasonable equilibrium, both output and inflation must be bounded both above and below. Cochrane (2007) has forcefully criticized this practice, arguing that any boundedness condition must have a solid economic rationale. Here we provide rationales for two such conditions: output $y_t$ must be bounded above, so that

(30) $y_t \le \bar{y}$ for some $\bar{y}$,


and interest rates must be bounded below, so that

(31) $i_t \ge \underline{i}$ for some $\underline{i}$.

The rationale for output being bounded above is that the economy has a finite amount of labor to produce the output. The rationale for requiring that interest rates be bounded below comes from the restriction that the nominal interest rate must be nonnegative.5 These bounds allow outcomes in which (the log of) output, $y_t$, falls without bound (so that the level of output converges to zero). The bounds also allow for outcomes in which inflation rates explode upward without limit. Here, then, a collection of allocations, prices, and policies $a_t = \{x_t, \delta_t, \pi_t, y_t\}$ is a competitive equilibrium if it satisfies (i) consumer optimality, namely, the deterministic versions of (1) and (2); (ii) sticky-price producer optimality, (27)–(29); and (iii) the boundedness conditions, (30) and (31). Note that any allocations that satisfy (27)–(29) also satisfy the New Keynesian Phillips curve,

(32) $\pi_t = \kappa y_t + \beta \pi_{t+1}$,

where now $\kappa = (1 - \alpha)(1 - \alpha\beta)\gamma/\alpha$. To see this result, use (28) to substitute for $x_t$ and $x_{t+1}$ in (27) and collect terms. Here, as we did in the simple sticky-price model, we define continuation competitive equilibria. For example, consider the beginning of period $t$ with a state variable $y_{t-1}$. A collection of allocations $a(y_{t-1}) = \{x_r(y_{t-1}), \delta_r(y_{t-1}), \pi_r(y_{t-1}), y_r(y_{t-1})\}_{r \ge t}$ is a continuation competitive equilibrium with $y_{t-1}$ if it satisfies the three conditions of a competitive equilibrium above in all periods $r \ge t$. A continuation competitive equilibrium that starts at the end of period $t$ given $(y_{t-1}, x_t, \delta_t)$ is defined similarly. This definition requires optimality by consumers from $t$ onward and optimality by sticky-price producers from $t + 1$ onward.

III.B. Sophisticated Equilibrium

We turn now to sophisticated equilibrium in the staggered price-setting model, its definition and how it can be implemented.

5. Note that even though the real value of consumer holdings of bonds must satisfy a transversality condition, this condition does not impose any restrictions on the paths of $y_t$ and $\pi_t$. The reason is that in our nonlinear model, the government has access to lump-sum taxes, so that government debt can be arbitrarily chosen to satisfy any transversality condition.


Definition. The definition of a sophisticated equilibrium in the staggered price-setting model parallels that in the simple sticky-price model. The elements needed for that definition are basically the same. The public events that occur in a period are, in chronological order, $q_t = (x_t; \delta_t; y_t, \pi_t)$. We let $h_{t-1}$ denote the history of these events up until the beginning of period $t$. A strategy for the sticky-price producers is a sequence of rules $\sigma_x = \{x_t(h_{t-1})\}$. The public history faced by the central bank is $h_{gt} = (h_{t-1}, x_t)$, and its strategy is $\sigma_g = \{\delta_t(h_{gt})\}$. The public history faced by consumers in period $t$ is $h_{yt} = (h_{t-1}, x_t, \delta_t)$. We let $\sigma_y = \{y_t(h_{yt})\}$ and $\sigma_\pi = \{\pi_t(h_{yt})\}$ denote the sequences of output and inflation rules. Strategies and allocation rules induce continuation outcomes written as $\{a_r(h_{t-1}; \sigma)\}_{r \ge t}$ or $\{a_r(h_{yt}; \sigma)\}_{r \ge t}$ in the obvious recursive fashion. Formally, then, a sophisticated equilibrium given the policies here is a collection of strategies $(\sigma_x, \sigma_g)$ and allocation rules $(\sigma_y, \sigma_\pi)$ such that (i) given any history $h_{t-1}$, the continuation outcomes $\{a_r(h_{t-1}; \sigma)\}_{r \ge t}$ induced by $\sigma$ constitute a continuation competitive equilibrium and (ii) given any history $h_{yt}$, so do the continuation outcomes $\{a_r(h_{yt}; \sigma)\}_{r \ge t}$. In this model, as in the simple sticky-price model, the choices of the sticky-price producers must satisfy a key fixed point property, that

(33) $x_t(h_{t-1}) = (1 - \alpha\beta)\gamma y_t(h_{yt}) + \pi_t(h_{yt}) + \alpha\beta x_{t+1}(h_t)$,

where $h_{yt} = (h_{t-1}, x_t(h_{t-1}), \delta_t(h_{t-1}, x_t(h_{t-1})))$ and $h_t = (h_{yt}, \pi_t(h_{yt}), y_t(h_{yt}))$. Here, as in the simple sticky-price model, $x_t(h_{t-1})$ shows up on both sides of the fixed point equation: on the right side, it enters through its effect on the histories $h_{yt}$ and $h_t$.

Implementation with Sophisticated Policies. We now show that in the staggered price-setting model, any competitive equilibrium can be uniquely implemented with sophisticated policies. The basic idea behind our construction is, again, that the central bank starts by picking any competitive equilibrium allocations and sets its policy on the equilibrium path consistent with those allocations. The central bank then constructs its policy off the equilibrium path so that any deviations from these allocations would never be a best response for any individual price-setter. In so doing, the constructed sophisticated policies support the chosen allocations as the unique equilibrium allocations. As we did with the simple model, here we show that, under sufficient conditions, policies that specify infinite reversion


to a money regime can achieve unique implementation, a pure interest-rate rule of the King rule form cannot, and a King–money hybrid rule can.

With reversion to a money regime. We start with sophisticated policies that specify reversion to a money regime after deviations. In our construction of sophisticated policies, we assume that the best responses of sticky-price producers are controllable in that if they deviate by setting $\hat{x}_t \neq x_t^*$, then by infinitely reverting to the money regime, the central bank can set money growth rate policies so that the profit-maximizing value of $x_t(j)$ is such that $x_t(j) \neq \hat{x}_t$. The sophisticated policy that supports a desired outcome is to follow the chosen monetary policy as long as private agents have not deviated from the desired outcome. If sticky-price producers ever deviate to some choice $\hat{x}_t$, the central bank switches to a money regime set such that $x_t(j) \neq \hat{x}_t$. The following proposition follows immediately: PROPOSITION 4 (Unique Implementation with Money Reversion). If the best responses of the sticky-price producers are controllable, then any competitive equilibrium outcome in which the central bank uses interest rates as its instrument can be implemented as a unique equilibrium by sophisticated policies which specify reversion to a money regime. A sufficient condition for best responses to be controllable is that in the nonlinear economy, preferences are given by $U(c, l) = \log c + b(1 - l)$, where $c$ is consumption and $l$ is labor supply, so that in the linearized economy, Taylor’s $\gamma$ equals one. To demonstrate controllability, suppose that after a deviation, the central bank reverts to a constant money supply $m = \log M$. With a constant money supply, it is convenient to use the original formulation of the economy with price levels rather than inflation rates. With that translation, the cash-in-advance constraint implies that $y_r + p_r = m$ for all $r$, so that (24) implies that the producer’s optimal price is simply

(34) $p_{st}(j) = (1 - \alpha\beta) \sum_{r=t}^{\infty} (\alpha\beta)^{r-t} m = m$.

That is, if after a deviation the central bank chooses a constant level of the money supply m, then sticky-price producers optimally


choose their prices to be $m$. Clearly, (34) implies that the best responses of these producers are controllable. For example, consider a history in which price-setters in period $t$ deviate from $p_{st}^*$ to $\hat{p}_{st}$. Obviously, the central bank can choose the level of the money supply so that the optimal choice for an individual price-setter becomes $p_{st}(j) \neq \hat{p}_{st}$, so that $x_t(j) = m - p_{t-1} \neq \hat{x}_t$.

With pure interest-rate rules. Now, as with the simple model, we turn to pure interest-rate rules such as the King rule. For the staggered price-setting model, we ask, can such rules uniquely implement bounded competitive equilibria? We find that for a large class of parameter values, the answer is, again, no. We arrive at this answer by first showing that under the King rule, the economy has a continuum of period-zero competitive equilibria. We then argue that associated with each competitive equilibrium is a sophisticated equilibrium. Here, we write the King rule as

(35) $i_t = i_t^* + \phi(1 - \alpha)(x_t - x_t^*)$,

where $i_t^*$ and $\pi_t^*$ are the interest rates and the inflation rates associated with the desired (bounded) competitive equilibrium. From (28), it follows that in all periods, inflation and the aggregate price-setting choice are mechanically linked by $\pi_t = (1 - \alpha)x_t$. This mechanical link means that we can equally well think of policy as feeding back on either inflation or the price-setting choice, so that (35) is equivalent to

(36) $i_t = i_t^* + \phi(\pi_t - \pi_t^*)$.

Now we show that the economy has a continuum of competitive equilibria by showing that there is a continuum of solutions to (1), (32), and (36) and that these solutions do not violate the transversality and boundedness conditions (29), (30), and (31). Expressing the variables as deviations from the desired equilibrium is convenient. To that end, let $\tilde{\pi}_t = \pi_t - \pi_t^*$ and $\tilde{y}_t = y_t - y_t^*$. Subtracting the equations governing $\{\pi_t^*, y_t^*\}$ from those governing $\{\pi_t, y_t\}$ gives a system governing $\{\tilde{\pi}_t, \tilde{y}_t\}$ that satisfies (1), (32), and (36). Substituting for $\tilde{\imath}_t$ in (1), using (36), we get that

(37) $\tilde{y}_{t+1} + \psi\tilde{\pi}_{t+1} = \tilde{y}_t + \psi\phi\tilde{\pi}_t$,


and from (32) we have that

(38) $\tilde{\pi}_t = \kappa\tilde{y}_t + \beta\tilde{\pi}_{t+1}$.

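Equations (37) and (38) define a dynamical system. Letting $z_t = (\tilde{y}_t, \tilde{\pi}_t)'$, with some manipulation we can stack these equations to give $z_{t+1} = A z_t$, where

$A = \begin{bmatrix} a & b \\ -\kappa/\beta & 1/\beta \end{bmatrix}$

and where $a = 1 + \kappa\psi/\beta$ and $b = \psi(\phi - 1/\beta)$. This system has a continuum of solutions of the form

(39) $\tilde{y}_t = \lambda_1^t \omega_1 + \lambda_2^t \omega_2$ and $\tilde{\pi}_t = \lambda_1^t \left(\frac{\lambda_1 - a}{b}\right)\omega_1 + \lambda_2^t \left(\frac{\lambda_2 - a}{b}\right)\omega_2$,

where $\lambda_1 < \lambda_2$, the eigenvalues of $A$, are given by

(40) $\lambda_1, \lambda_2 = \frac{1}{2}\left(\frac{1 + \kappa\psi}{\beta} + 1\right) \pm \frac{1}{2}\sqrt{\left(\frac{1 + \kappa\psi}{\beta} - 1\right)^2 - 4(\phi - 1)\frac{\kappa\psi}{\beta}}$,

and $\omega_1 = \left[\left(\frac{\lambda_2 - a}{b}\right)\tilde{y}_0 - \tilde{\pi}_0\right]/\Delta$ and $\omega_2 = \left[\left(\frac{a - \lambda_1}{b}\right)\tilde{y}_0 + \tilde{\pi}_0\right]/\Delta$, where $\Delta = (\lambda_2 - \lambda_1)/b$ is the determinant of the matrix of eigenvectors of $A$.6 This continuum of solutions is indexed by $\tilde{y}_0$ and $\tilde{\pi}_0$. In the Appendix, we show that for a class of economies that satisfy the restriction

(41) $1 - \kappa\psi < \beta$ and $\alpha(1 + \kappa\psi) < 1$,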

equilibrium is indeterminate under the King rule. We can think of (41) as requiring that the period length is sufficiently short, in the sense that $\beta$ is close enough to 1, and that the price stickiness is not too large, in the sense that $\alpha$ is sufficiently small.

6. Here and throughout, we restrict attention to values of $\phi \in [0, \phi_{\max}]$, where $\phi_{\max}$ is the largest value of $\phi$ that yields real eigenvalues. That is, at $\phi_{\max}$, the discriminant in (40) is zero.

Formally, in the Appendix, we prove the following proposition: PROPOSITION 5 (Indeterminacy of Equilibrium under the King Rule). Suppose that the central bank sets interest rates $i_t$ according to the King rule (35) with $\phi > 1$ and that (41) is satisfied. Then the economy has a continuum of competitive equilibria indexed by $y_0 \le y_0^*$,

(42) $y_t = y_t^* + \lambda_2^t (y_0 - y_0^*)$ and $\pi_t = \pi_t^* + \lambda_2^t c (y_0 - y_0^*)$,

where $\lambda_2 > 1$ and $c = (\lambda_2 - a)/b < 0$ are constants. It is immediate to construct a sophisticated equilibrium for each of the continuum of competitive equilibria in (42). Notice that under the King rule, there is one equilibrium with $y_t = y_t^*$ and $\pi_t = \pi_t^*$ for all $t$, and in the rest, $y_t$ goes to minus infinity and $\pi_t$ to plus infinity. All of these equilibria satisfy the boundedness conditions (30) and (31) and, under (41), the transversality condition (29). It turns out that if the inequality in the second part of (41) is reversed, then the set of solutions to the New Keynesian dynamical system, (1), (28), (32), and (35), has the form (42), but the transversality condition rules out all solutions except the one with $y_t = y_t^*$ and $\pi_t = \pi_t^*$ for all $t$. We find this way of ruling out solutions unappealing because it hinges critically on the idea that sticky-price producers may be unable to change their prices for extremely long periods, even in the face of exploding inflation.

With reversion to a hybrid rule. We now show that in the staggered price-setting model, as in the simple model, a King–money hybrid rule can uniquely implement any bounded competitive equilibrium. To do so in this model, we will assume boundedness under money, namely, that for any state variable $y_{t-1}$ there exists a money regime from period $t$ onward such that a continuation competitive equilibrium exists, and for all such equilibria, inflation in period $t$, $\pi_t$, is uniformly bounded. Here uniformly bounded means that there exist constants $\underline{\pi}$ and $\bar{\pi}$ such that for all $y_{t-1}$, $\pi_t \in [\underline{\pi}, \bar{\pi}]$. It is immediate that a sufficient condition for boundedness under money is that preferences in the nonlinear economy are given by $U(c, l) = \log c + b(1 - l)$. In an economy that satisfies boundedness under money, the King–money hybrid rule that implements a competitive equilibrium $\{x_t^*, \pi_t^*, y_t^*\}$ with an associated interest rate $i_t^*$ is defined as follows. Set $\bar{x}$ to be greater than both $\max_t x_t^*$ and $\bar{\pi}$, and set $\underline{x}$ to be lower than both $\min_t x_t^*$ and $\underline{\pi}$. This rule specifies that if $x_t \in [\underline{x}, \bar{x}]$, then the central bank follows a King rule of the form


(35) with $\phi > 1$. If $x_t$ falls outside the interval $[\underline{x}, \bar{x}]$, then the central bank reverts to a money regime forever. PROPOSITION 6 (Unique Implementation with a Hybrid Rule). Suppose the staggered price-setting economy satisfies boundedness under money. Then the King–money hybrid rule implements any desired bounded competitive equilibrium. Moreover, under this rule, after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes. The formal proof of this proposition is in the Appendix. The key idea of this proof is the same as that for the proof of Proposition 3. The idea is that under the King rule, any $\hat{x}_t$ that does not equal $x_t^*$ leads subsequent price-setting choices to eventually leave the interval $[\underline{x}, \bar{x}]$. But given boundedness under money, price-setting choices outside of the interval $[\underline{x}, \bar{x}]$ cannot be part of an equilibrium. Note that with the staggered price-setting model, as with the simple model, under a hybrid rule, deviations lead to only very transitory departures from desired outcomes.
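The indeterminacy result of Proposition 5 rests on the eigenvalues in (40) and on restriction (41). The following sketch, which is not part of the original analysis, evaluates them for one set of parameter values. The values $\alpha = 0.5$, $\beta = 0.99$, $\gamma = 1$, $\psi = 1$, and $\phi = 1.1$ are assumptions chosen only so that the discriminant in (40) is positive, and the function names are hypothetical.

```python
import math


def king_rule_system(alpha, beta, gamma, psi, phi):
    """Return (kappa, a, b, A) for the stacked system z_{t+1} = A z_t."""
    kappa = (1 - alpha) * (1 - alpha * beta) * gamma / alpha
    a = 1 + kappa * psi / beta
    b = psi * (phi - 1 / beta)
    A = [[a, b], [-kappa / beta, 1 / beta]]
    return kappa, a, b, A


def eigenvalues(kappa, beta, psi, phi):
    """Closed-form eigenvalues (lambda_1, lambda_2) from equation (40)."""
    half_trace = 0.5 * ((1 + kappa * psi) / beta + 1)
    disc = ((1 + kappa * psi) / beta - 1) ** 2 - 4 * (phi - 1) * kappa * psi / beta
    if disc < 0:
        raise ValueError("phi exceeds phi_max: eigenvalues are not real")
    root = 0.5 * math.sqrt(disc)
    return half_trace - root, half_trace + root


def satisfies_41(alpha, beta, kappa, psi):
    """Check restriction (41): 1 - kappa*psi < beta and alpha*(1 + kappa*psi) < 1."""
    return (1 - kappa * psi < beta) and (alpha * (1 + kappa * psi) < 1)


if __name__ == "__main__":
    alpha, beta, gamma, psi, phi = 0.5, 0.99, 1.0, 1.0, 1.1  # assumed values
    kappa, a, b, A = king_rule_system(alpha, beta, gamma, psi, phi)
    lam1, lam2 = eigenvalues(kappa, beta, psi, phi)
    c = (lam2 - a) / b  # slope of inflation on output along the paths in (42)
    print("restriction (41) holds:", satisfies_41(alpha, beta, kappa, psi))
    print("lambda_1, lambda_2:", lam1, lam2)
    print("c:", c)
```

For these assumed values the larger eigenvalue exceeds one and $c$ is negative, so along the paths in (42) output falls while inflation rises, as the proposition describes.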

IV. TREMBLES AND IMPERFECT INFORMATION

We have shown that in both of the models we have analyzed (a simple one-period price-setting model and a staggered price-setting model), any equilibrium outcome can be implemented as a unique equilibrium with sophisticated policies. In our equilibria, deviations in private actions lead to changes in the regime. This observation leads to the question of how to construct sophisticated policies if trembles in private actions occur or if deviations in private actions can be detected only imperfectly, say, with measurement error. We show that we can achieve unique implementation with trembles. We show that, with imperfect detection, the King–money hybrid rule leads to a unique equilibrium. This equilibrium is arbitrarily close to the desired equilibrium when the detection error is small. In this sense, our results are robust to trembles and imperfect information.

IV.A. Trembles

Unique implementation is not a problem if trembles in private actions occur.


To see that, consider allowing for trembles in private decisions by supposing that the actual price chosen by a price-setter, $x_t(j)$, differs from the intended price, $\tilde{x}_t(j)$, by an additive error $\varepsilon_t(j)$, so that $x_t(j) = \tilde{x}_t(j) + \varepsilon_t(j)$. Trembles are clearly a trivial consideration. If $\varepsilon_t(j)$ is independently distributed across agents, then it simply washes out in the aggregate; it is irrelevant. Even if $\varepsilon_t(j)$ is correlated across agents, say, because it has both aggregate and idiosyncratic components, our argument goes through unchanged if the central bank can observe the aggregate component, for example, with a random sample of prices.

IV.B. Imperfect Information

Not as trivial is a situation in which the central bank has imperfect information about prices. But even in that situation, the King–money hybrid rule leads to a unique equilibrium; and when the detection error is small, this equilibrium is arbitrarily close to the desired equilibrium. To see that, consider a formulation in which the central bank observes the actions of price-setters with measurement error. Of course, if the central bank could see some other variable perfectly, such as output or interest rates on private debt, then it could infer what the private agents did. We think of this formulation as giving the central bank minimal amounts of information relative to what actual central banks have. We show here that with this sort of imperfect information, we can implement outcomes that are close to the desired outcomes when the measurement error is small. Here the central bank observes the price-setters’ choices with error, so that

(43) $\hat{x}_t = x_t + \varepsilon_t$,

where the error $\varepsilon_t$ is i.i.d. over time with mean zero and bounded support $[\underline{\varepsilon}, \bar{\varepsilon}]$. Consider using the King–money hybrid rule to support some desired competitive equilibrium. Choose the interest-rate interval $[\underline{x}, \bar{x}]$ such that $x_t^* + \varepsilon_t$ is contained in this interval for all $t$. Here, the King rule is of the form

(44) $i_t(h_{gt}) = i_t^* + \phi(1 - \alpha)(\hat{x}_t - x_t^*)$

with $\phi > 1$.


In this economy with measurement error, the best response of any individual price-setter is identical to that in the economy without measurement error. This result follows because the best response depends on only the expected values of future variables. Because the measurement error $\varepsilon_t$ has mean zero, these expected values are unchanged. Therefore, the unique equilibrium in this economy with measurement error has $x_t = x_t^*$; thus, $\pi_t = \pi_t^*$. The realized values of the interest rate $i_t$ and output $y_t$, however, fluctuate around their desired values $i_t^*$ and $y_t^*$. Using (43) and (44), we know that the realized value of the interest rate is given by

(45) $i_t = i_t^* + \phi(1 - \alpha)\varepsilon_t$,

whereas using the Euler equation, we know that the realized value of output is given by

(46) $y_t = y_t^* - \psi\phi(1 - \alpha)\varepsilon_t$.
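Equations (45) and (46) can be illustrated with a small simulation, offered here only as a sketch and not as part of the original analysis. It assumes desired paths $i_t^* = y_t^* = 0$, errors drawn uniformly from a symmetric interval, and illustrative parameter values; the function names and default arguments are hypothetical.

```python
import random


def realized_outcomes(i_star, y_star, errors, phi, alpha, psi):
    """Apply (45) and (46) period by period to a list of measurement errors."""
    scale = phi * (1 - alpha)
    i_path = [i_s + scale * e for i_s, e in zip(i_star, errors)]
    y_path = [y_s - psi * scale * e for y_s, e in zip(y_star, errors)]
    return i_path, y_path


def max_output_gap(error_bound, periods=200, phi=1.5, alpha=0.5, psi=1.0, seed=0):
    """Largest |y_t - y_t*| when errors are uniform on [-error_bound, error_bound].

    The desired paths are assumed to be zero, and the parameter defaults
    are illustrative only.
    """
    rng = random.Random(seed)
    errors = [rng.uniform(-error_bound, error_bound) for _ in range(periods)]
    zeros = [0.0] * periods
    _, y_path = realized_outcomes(zeros, zeros, errors, phi, alpha, psi)
    return max(abs(y) for y in y_path)


if __name__ == "__main__":
    for bound in (0.01, 0.001, 0.0001):
        print(bound, max_output_gap(bound))
```

Because output deviations are linear in $\varepsilon_t$, shrinking the support of the error shrinks the largest output deviation proportionally, in line with the convergence claim in the text.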

Notice that when the central bank observes private actions imperfectly, the King–money hybrid rule does not exactly implement any desired competitive equilibrium. Rather, this rule implements an equilibrium in which output fluctuates around its desired level. These fluctuations are proportional to the size of the measurement error. Clearly, as the size of the measurement error $\varepsilon_t$ goes to zero, the outcomes converge to the desired outcomes. We have thus established a proposition: PROPOSITION 7 (Approximate Implementation with Measurement Error). Suppose the sophisticated policy is described by the King–money hybrid rule described above. Then the economy has a unique equilibrium with $x_t = x_t^*$ and $y_t$ given by (46). As the variance of the measurement error approaches zero, the economy’s outcomes converge to the desired outcomes. Note that although the central bank never reverts to a money regime when it is on the equilibrium path, the possibility that it will do so off the equilibrium path plays a critical role in this implementation.

V. IMPLICATIONS FOR THE TAYLOR PRINCIPLE

The sophisticated policy approach we have just described has implications for the use of the Taylor principle as a device to


ensure determinacy and to guide inferences from empirical investigations about whether central bank policy has led the economy into a determinate or indeterminate region. (Recall that the Taylor principle is the notion that interest rates should rise more than one for one with inflation rates, both compared to some exogenous, possibly stochastic, levels.)

V.A. Setup

In order to show what the sophisticated policy approach implies for our discussion of the Taylor principle, we consider a popular specification of the Taylor rule of the form

(47) $i_t = \bar{\imath}_t + \phi E_{t-1}\pi_t + b E_{t-1} y_t$,

where $\bar{\imath}_t$ is an exogenously given, possibly stochastic, sequence. (See Taylor [1993] for a similar specification.) In our simple model, from (12), policies of the Taylor rule form (47) can be written as

(48) $i_t = \bar{\imath}_t + \phi(x_t - \bar{x}_t)$.

When the parameter $\phi > 1$, such policies are said to satisfy the Taylor principle: The central bank should raise its interest rate more than one for one with increases in inflation. When $\phi < 1$, such policies are said to violate that principle. Notice that when $\bar{\imath}_t$ and $\bar{x}_t$ coincide with the desired competitive equilibrium outcomes $i_t^*$ and $x_t^*$ for all periods, the Taylor rule (48) reduces to the simple model’s King rule (19).

V.B. Implications for Determinacy

Many economists have argued that central banks must adhere to the Taylor principle in order to ensure unique implementation. Our results clearly imply that if the central bank is following a pure interest-rate rule, then adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. If, however, the central bank is following a King–money hybrid rule, then adherence to this principle after deviations between observed outcomes and desired outcomes can help ensure unique implementation. Note that policies of the Taylor rule form (48) are linear feedback rules of the form (23) and lead to indeterminacy, regardless of the value of $\phi$. In this sense, if the central bank is following a pure interest-rate rule, then adherence to the Taylor principle is not sufficient for unique implementation. A similar argument implies


that, under (41), it is not sufficient in the staggered price-setting model either. Clearly, under pure interest-rate rules, adherence to the Taylor principle is also not necessary for unique implementation. Propositions 1 and 4 imply that, in both models, the central bank can uniquely implement any competitive equilibrium, including those that violate the Taylor principle along the equilibrium path.

V.C. Implications for Estimation

Many economists have estimated monetary policy rules and then inferred that these rules have led the economy to be in the determinate region if and only if they satisfy the Taylor principle. Indeed, one branch of this literature argues that the undesirable inflation experiences of the 1970s in the United States occurred in part because monetary policy led the economy to be in the indeterminate region. See, for example, the work of Clarida, Galí, and Gertler (2000). We provide a set of stark assumptions under which such inferences can be made more confidently. Nonetheless, finding appropriate assumptions in more interesting applied examples remains a challenge.

Perfect Information. In economies in which the central bank and private agents have the same information, observations of variables along the equilibrium path shed no light on the properties of policies off that path, and it is these properties that govern the determinacy of equilibrium. Of course, any estimation procedure can rely only on data along the equilibrium path; it cannot uncover the properties of policies off that path. In this sense, estimation procedures in economies with perfect information cannot determine whether monetary policy is leading the economy to be in the determinate or the indeterminate region. (See Cochrane [2007] for a related point.)

To see this general point in the context of our models, note that any estimation procedure can only uncover relationships between the equilibrium interest rate $i_t^*$ and the equilibrium inflation rate $\pi_t^*$. These relationships have nothing whatsoever to do with the off-equilibrium path policies that govern determinacy. For example, in the context of the King–money hybrid rule with the King rule of the form (35), neither $i_t^*$ nor $\pi_t^*$ depends on the parameter $\phi$, but the size of this parameter plays a key role in ensuring determinacy. In this sense, without trivial identifying


assumptions, no estimation procedure can uncover the key parameter for determinacy. For example, suppose that along the equilibrium path, interest rates satisfy (49)

$$i_t^* = \bar{\imath} + \phi^*(x_t^* - \bar{x}),$$

where it∗ and xt∗ are the desired equilibrium outcomes and ı¯ and x¯ are some constants that differ from those desired outcomes. This equilibrium can be supported in many ways, including reversion after deviations to a money regime or some sort of hybrid rule. Notice that in (49) the parameter φ ∗ simply describes the relation between the equilibrium outcomes it∗ and xt∗ and has no connection to the behavior of policy after deviations. Obviously, with a policy that specifies reversion to a money regime, the size of φ ∗ (whether it is smaller or larger than one) has no bearing on the determinacy of equilibrium. That is also true with a policy that reverts to a hybrid rule after deviations, though perhaps not as obviously. Suppose that for small deviations, the hybrid rule specifies the King rule (20) with φ > 1. The parameter φ of this King rule has no connection to the parameter φ ∗ in (49). The former governs the behavior of policies after deviations, whereas the latter simply describes a relationship that holds along the equilibrium path. Furthermore, although φ > 1 ensures determinacy, the size of φ ∗ —whether it is smaller or larger than 1—has no bearing on determinacy. These arguments clearly generalize to situations in which the constants ¯ı and x¯ are replaced by exogenous, possibly stochastic, sequences ¯ıt and x¯t that differ from the desired outcomes, so that along the equilibrium path, interest rates satisfy (50)

$$i_t^* = \bar{\imath}_t + \phi^*(x_t^* - \bar{x}_t).$$
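The point that on-path data reveal only $\phi^*$ can be illustrated with a small simulation. The sketch below is a hypothetical example, not the authors' procedure: it generates equilibrium-path data from the constant-coefficient relation (49), runs a least-squares regression, and shows that the estimate is the same whatever off-path parameter is assumed. All function names and numbers are assumptions made for illustration.

```python
import random


def simulate_on_path(phi_star, phi_off_path, n=200, i_bar=0.04, x_bar=0.02, seed=0):
    """Generate on-path data satisfying (49).

    phi_off_path is deliberately unused: it only matters after deviations,
    and no deviations occur along the equilibrium path.
    """
    rng = random.Random(seed)
    xs = [x_bar + rng.uniform(-0.01, 0.01) for _ in range(n)]
    rates = [i_bar + phi_star * (x - x_bar) for x in xs]
    return xs, rates


def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on xs (with intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# The estimated slope tracks phi_star for every assumed off-path phi.
for phi_off_path in (0.5, 1.5, 3.0):
    xs, rates = simulate_on_path(phi_star=0.8, phi_off_path=phi_off_path)
    print(phi_off_path, round(ols_slope(xs, rates), 6))
```

Because the off-path parameter never enters the simulated data, the regression cannot distinguish a policy with $\phi > 1$ after deviations from one with $\phi < 1$.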

We interpret most of the current estimation procedures of the Taylor rule variety as estimating φ ∗ , the parameter governing desired outcomes in (50) or its analog in more general setups. To use these estimates to draw inferences about determinacy, researchers implicitly assume that the parameter φ (the parameter describing off-equilibrium path behavior) is the same as φ ∗ (the parameter describing on-equilibrium path behavior). Researchers also restrict attention to bounded solutions. As we have discussed, with perfect information, theory imposes no connection between φ and φ ∗ , so the assumption that φ = φ ∗ is not grounded in theory.


Also, the rationale for restricting attention to bounded solutions is not clear. With perfect information, then, current estimation procedures simply cannot uncover whether the economy is in the determinate or the indeterminate region. Imperfect Information. With imperfect information, however, there is some hope that variants of current procedures may be able to uncover some of the key parameters for determinacy, provided researchers are willing to make some quite strong assumptions. Here we provide a stark example in which a variant of current procedures can uncover one of the key parameters governing determinacy. Consider our staggered price-setting economy, in which the central bank observes the price-setters’ choices with error. Recall that in this economy, the equilibrium outcomes for interest rates and output, (45) and (46), depend on the parameter φ in the King–money hybrid rule and that this parameter plays a key role in ensuring determinacy. Note the contrast with the perfect information economy, in which the equilibrium outcomes do not depend on the parameter φ. The fact that equilibrium outcomes depend on the key determinacy parameter here offers some hope that researchers will be able to estimate it. For our stark example, we assume that researchers observe the same data as the central bank and that along the equilibrium path, the central bank follows a King rule of the form (51)

$$i_t = i_t^* + \phi(1 - \alpha)(\hat{x}_t - x_t^*).$$
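Because (51) is linear in $\phi$, it can be inverted observation by observation whenever the observed $\hat{x}_t$ differs from $x_t^*$. The following sketch shows that inversion under the stark assumption that the desired outcomes and $\alpha$ are known; the function name and the numbers are hypothetical, and the guard against $\alpha = 1$ is an added assumption.

```python
def recover_phi(i_t, i_star, x_hat, x_star, alpha):
    """Invert (51), i_t = i*_t + phi * (1 - alpha) * (x_hat_t - x*_t), for phi."""
    gap = x_hat - x_star
    if gap == 0 or alpha == 1:
        # With no gap (or alpha equal to one) the rate carries no information on phi.
        raise ValueError("phi is not identified when x_hat equals x_star or alpha equals 1")
    return (i_t - i_star) / ((1 - alpha) * gap)


# Hypothetical values: generate one observation from (51), then invert it.
true_phi, alpha = 1.5, 0.75
i_star, x_star, x_hat = 0.04, 0.02, 0.025
i_t = i_star + true_phi * (1 - alpha) * (x_hat - x_star)
print(round(recover_phi(i_t, i_star, x_hat, x_star, alpha), 6))
```

The error branch corresponds to the perfect-information case, in which the observed and desired values coincide and nothing about $\phi$ can be learned.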

If researchers know the desired outcomes $x_t^*$ and $i_t^*$, as well as the parameter $\alpha$, then they can simply solve (51) for $\phi$ as long as $\hat{x}_t$ does not identically equal $x_t^*$. To go from this solution for $\phi$ to an inference about determinacy requires more assumptions. One set of assumptions is that the data are generated by our staggered price-setting model, in which the central bank observes $\hat{x}_t = x_t + \varepsilon_t$, where $\varepsilon_t$ is i.i.d. over time and has mean zero and bounded support $[\underline{\varepsilon}, \bar{\varepsilon}]$, and the central bank follows the King–money hybrid rule, with the King rule given by (51). The key feature of the formulation that allows this inference is that $\hat{x}_t$ does not identically equal $x_t^*$ as it does in the economies with perfect information. Note that in our stark example, this procedure can uncover the King rule parameter $\phi$, but not the hybrid rule parameters $\underline{\pi}$ and $\bar{\pi}$. More generally, no procedure can uncover what behavior would be in situations that are never reached in equilibrium, even


if the specification of such behavior plays a critical role in unique implementation. This observation implies that even in our stark example, we cannot distinguish between a pure interest-rate rule and the King–money hybrid rule.

Although we have offered some hope for uncovering some of the key parameters for determinacy, applying our insight to a broader class of environments is apt to be hard. In practice, after all, the desired outcomes are not known, the other parameters of the economy are not known, the measurement error is likely to be serially correlated, and the interest-rate rule is subject to stochastic shocks. Quite beyond these practical issues is a theoretical one: drawing inferences about determinacy requires confronting a subtle identification issue. This issue stems from the fact that characterizing the equilibrium is relatively easy if the economy is in the determinate region, but extremely hard if it is not. Specifically, if the economy is in the determinate region, then the probability distribution over observed variables is a relatively straightforward function of the primitive parameters. If the economy is in the indeterminate region, however, then this probability distribution (which must take account of the possibility of sunspots) is more complicated. One way to proceed is to tentatively assume that the economy is in the determinate region and estimate the key parameters governing determinacy. Suppose that under this tentative assumption, we find that the parameters fall in the determinate region. Can we then conclude that the economy is in the determinate region? Not yet. We must still show that the data could not have been generated by one of the indeterminate equilibria—not an easy task.

VI. CONCLUSIONS

We have here described our sophisticated policy approach and illustrated its use as an operational guide to policy that achieves unique implementation of any competitive equilibrium outcome.
We have demonstrated that using a pure interest-rate rule leads to indeterminacy. We have also constructed policies that avoid this by switching regimes: they use interest rates until private agents deviate and then revert to a money regime or a hybrid rule. Our work has strong implications for the use of the Taylor principle as a guide to policy. We have shown that if a central bank


follows a pure interest-rate rule, then adherence to the Taylor principle is neither necessary nor sufficient for unique implementation. Adherence to that principle may ensure determinacy, however, if monetary policy includes a reversion to the King–money hybrid rule after deviations. We have also argued that existing empirical procedures used to draw inferences about the relationship between adherence to the Taylor principle and determinacy should be treated with caution. We have provided a set of stark assumptions that can be more confidently used in applied work to draw inferences regarding the relationship between central bank policy and determinacy. Using this method, however, requires solving multiple difficult identification problems. Finally, although we have here focused exclusively on monetary policy, the use of our operational guide is not necessarily limited to that application. The logic behind the construction of the guide should be applicable as well to other governmental policies—for example, to fiscal policy and to policy responses to financial crises—or to any application that aims to uniquely implement a desired outcome.

APPENDIX: THE PROOFS OF PROPOSITIONS 3, 5, AND 6

A. Proof of Proposition 3: A Unique Implementation with a Hybrid Rule in the Simple Model

Given that the central bank follows the King–money hybrid rule, say, $\sigma_g^*$, we will show here that there are unique strategies $\sigma_x$, $\sigma_y$, and $\sigma_\pi$ for private agents that, together with $\sigma_g^*$, constitute a sophisticated equilibrium. We then show that this sophisticated equilibrium implements the desired outcomes. The strategies $\sigma_x$, $\sigma_y$, and $\sigma_\pi$ are as follows. The strategy $\sigma_x$ specifies that $x_t(h_{t-1}) = x_t^*(s^{t-1})$ for all histories. The strategies $\sigma_y$ and $\sigma_\pi$ specify $y_t(h_{yt})$ and $\pi_t(h_{yt})$ as the unique solutions to the conditions defining consumer optimality, (1) and (2); flexible-price producer optimality, (10); and the King–money hybrid rule with $y_{t+1}(s^{t+1}) = y_{t+1}^*(s^{t+1})$ and $x_{t+1}(s^{t+1}) = x_{t+1}^*(s^{t+1})$. Note that the value of $x_t$ in the history $h_{yt} = (h_{t-1}, x_t, \delta_t, s_t)$ determines the regime in the current period and, hence, determines whether the Euler equation (1) or the cash-in-advance constraint (2) is used to solve for $y_t(h_{yt})$ and $\pi_t(h_{yt})$.
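The way the observed $x_t$ selects the current regime can be sketched in a few lines. The code below is a simplified, hypothetical rendering of the King–money hybrid rule as this appendix describes it: inside the interval $[\underline{x}, \bar{x}]$ the bank sets the rate by a King rule of the form $i_t = i_t^* + \phi(x_t - x_t^*)$, and outside it the bank reverts to a money regime with expected inflation $\bar{\pi}$. The function name, the return format, and the numbers are assumptions; how the money regime delivers $\bar{\pi}$ is not modeled.

```python
def hybrid_rule(x_t, x_star, i_star, phi, x_low, x_high, pi_bar):
    """Pick the regime from x_t and return (regime, value).

    Inside [x_low, x_high]: interest-rate regime, value is the King-rule rate.
    Outside: money regime, value is the expected-inflation target pi_bar.
    """
    if x_low <= x_t <= x_high:
        return "interest-rate", i_star + phi * (x_t - x_star)
    return "money", pi_bar


# Hypothetical parameter values, for illustration only.
params = dict(x_star=0.02, i_star=0.04, phi=1.5, x_low=0.0, x_high=0.05, pi_bar=0.02)
print(hybrid_rule(0.03, **params))  # small deviation: King rule applies
print(hybrid_rule(0.08, **params))  # large deviation: reversion to money
```

In the paper's notation, the first case is the one solved with the Euler equation (1) and the second is the one solved with the cash-in-advance constraint (2).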


We now show that $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$ is a sophisticated equilibrium. Given that $\{x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t)\}$ is a period-zero competitive equilibrium and that $x_t^*(s^{t-1}) \in [\underline{x}, \bar{x}]$, so that the central bank is following an interest-rate regime, we know that any tail of these outcomes $\{x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t)\}_{t \ge r}$ is a continuation competitive equilibrium starting in period $r$ regardless of the history $h_{r-1}$. On the equilibrium path, this claim follows immediately because the continuation of any competitive equilibrium is also a competitive equilibrium. Off the equilibrium path, for histories $h_{t-1}$, the tail is a period-zero competitive equilibrium (with periods suitably relabeled) and is, therefore, a continuation competitive equilibrium. A similar argument shows that the tail of the outcomes starting from the end of period $r$, namely, $\pi_r(h_{yr})$ and $y_r(h_{yr})$, together with the outcomes $\{x_t^*(s^{t-1}), \pi_t^*(s^t), y_t^*(s^t)\}_{t \ge r+1}$, constitutes a continuation competitive equilibrium. Note that our construction implies that after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes.

We now establish uniqueness of the sophisticated equilibrium of the form $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$. We begin with a preliminary result that shows that for any $s^{t-1}$ in any equilibrium, $x_t(s^{t-1}) \in [\underline{x}, \bar{x}]$. This argument is by contradiction. Suppose that at $s^{t-1}$, $x_t(s^{t-1}) \notin [\underline{x}, \bar{x}]$. Under the hybrid rule, the central bank reverts to a money regime with expected inflation equal to $\bar{\pi} \in [\underline{x}, \bar{x}]$. From Lemma 1, $x_t(s^{t-1}) = \bar{\pi} \in [\underline{x}, \bar{x}]$, which contradicts $x_t(s^{t-1}) \notin [\underline{x}, \bar{x}]$. This result implies that along the equilibrium path, the central bank never reverts to money, so that interest rates are given by the King rule (19). With this preliminary result, we establish uniqueness by another contradiction argument.
Suppose that the economy has a sophisticated equilibrium in which in some history $h_{r-1}$, $x_r(h_{r-1}) = \hat{x}_r$, which differs from $x_r^*(s^{r-1})$. Without loss of generality, suppose that $\hat{x}_r - x_r^*(s^{r-1}) = \varepsilon > 0$. Let $\{\hat{x}_t(s^{t-1}), \hat{\pi}_t(s^t), \hat{y}_t(s^t)\}_{t \ge r}$ denote the associated continuation competitive equilibrium outcomes. Our preliminary result implies that the central bank follows the King rule in all periods. Let $\{\hat{\imath}_t(s^{t-1})\}_{t \ge r}$ denote the associated interest rates. From (13), using the law of iterated expectations, we have that (52)

$$E[i_t^*(s^{t-1}) \mid s^{r-1}] = E[x_{t+1}^*(s^t) \mid s^{r-1}]$$

and

$$E[\hat{\imath}_t(s^{t-1}) \mid s^{r-1}] = E[\hat{x}_{t+1}(s^t) \mid s^{r-1}].$$


Substituting (52) into the King rule (19) gives that

$$E[\hat{x}_{t+1}(s^t) - x_{t+1}^*(s^t) \mid s^{r-1}] = \phi^{t-r}\varepsilon.$$

Because $\phi > 1$ and $x_{t+1}^*(s^t)$ is bounded, for every $\varepsilon$ there exists some $T$ such that

$$E[\hat{x}_{T+1}(s^T) \mid s^{T-1}] > \bar{x}.$$
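The existence of such a $T$ rests only on geometric growth: a gap that is multiplied by $\phi > 1$ each period eventually exceeds any finite bound. The sketch below counts how many multiplications are needed. The function name and the quantity `room` (an assumed upper bound on how far the expected gap can grow before $\hat{x}$ must exceed $\bar{x}$) are illustrative, and the numbers are hypothetical.

```python
def periods_until_bound_exceeded(phi, eps, room, max_periods=10_000):
    """Smallest k >= 0 with eps * phi**k > room."""
    if phi <= 1 or eps <= 0:
        raise ValueError("the argument requires phi > 1 and eps > 0")
    gap, k = eps, 0
    while gap <= room:
        gap *= phi  # the expected gap grows by a factor phi each period
        k += 1
        if k > max_periods:
            raise RuntimeError("bound not exceeded within max_periods")
    return k


# Hypothetical values: even a tiny initial deviation escapes the bound.
print(periods_until_bound_exceeded(phi=1.5, eps=0.001, room=0.05))
print(periods_until_bound_exceeded(phi=1.5, eps=1e-6, room=0.05))
```

A smaller initial deviation only delays the escape; it never prevents it, which is why the argument works for every $\varepsilon > 0$.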

But this contradicts our preliminary result that $x_t(s^{t-1}) \le \bar{x}$ for all $t$ and $s^{t-1}$. QED

B. Proof of Proposition 5: Indeterminacy of Equilibrium under the King Rule in the Staggered Price-Setting Model

It is straightforward to verify that output and inflation satisfying (42) satisfy all equilibrium conditions except the model’s transversality condition (29) and its two boundedness conditions (30) and (31). Here we verify these conditions. Consider first the transversality condition. Under (40) it follows that the larger eigenvalue $\lambda_2(\phi)$ is a decreasing function of $\phi$ and that $\lambda_2(1) = (1 + \kappa\psi)/\beta$. From (41) it then follows that $\beta\alpha\lambda_2(\phi) < 1$ for all $\phi \ge 1$. Hence, $\lim_{t \to \infty} (\alpha\beta)^t \tilde{\pi}_t = 0$. Because $\pi_t^*$ is bounded, it follows that $\pi_t$ satisfies the transversality condition (29). Consider next the output and interest-rate boundedness conditions. We first show that $[\lambda_2(\phi) - a]/b < 0$ for all $\phi \ge 1$. To do so, we show that $\lambda_2(\phi) - a$ is positive for $\phi \in [1, 1/\beta)$, zero at $\phi = 1/\beta$, and negative for $\phi \in (1/\beta, \phi_{\max}]$. From (40) we know that (53)

$$\lambda_2\!\left(\frac{1}{\beta}\right) = \frac{1}{2}\left(\frac{1 + \kappa\psi}{\beta} + 1\right) + \frac{1}{2}\sqrt{\left(\frac{1}{\beta} - 1 + \frac{\kappa\psi}{\beta}\right)^2 - 4\,\frac{\kappa\psi}{\beta}\left(\frac{1}{\beta} - 1\right)}.$$

Note that the term in the radical is a perfect square. Then using that and the first part of (41) turns (53) into

$$\lambda_2\!\left(\frac{1}{\beta}\right) = 1 + \frac{\kappa\psi}{\beta} = a.$$

Because $\lambda_2(\phi)$ is decreasing, it follows that $\lambda_2(\phi) - a$ has the desired sign pattern. Because $b = \psi(\phi - 1/\beta)$, the numerator and the denominator of $[\lambda_2(\phi) - a]/b$ have opposite signs for all $\phi \ge 1$, so that $[\lambda_2(\phi) - a]/b$ is negative. Thus, the boundedness conditions


are satisfied for all $\omega_2 \le 0$. In the resulting equilibria, inflation goes to plus infinity and output goes to minus infinity (so that the level of output goes to zero). QED

C. Proof of Proposition 6: Unique Implementation with a Hybrid Rule in the Staggered Price-Setting Model

Let $\{x_t^*, \pi_t^*, y_t^*\}$ be the desired bounded competitive equilibrium. The strategies that implement this competitive equilibrium are as follows. The strategy $\sigma_g^*$ is the King–money hybrid rule. The strategy $\sigma_x$ specifies that $x_t(h_{t-1}) = x_t^*$ for all histories. The strategies $\sigma_y$ and $\sigma_\pi$ specify $y_t(h_{yt})$ and $\pi_t(h_{yt})$ that are the unique solutions to the deterministic versions of the conditions defining consumer optimality, (1), (2), (28), (32), and the King–money hybrid rule with $y_{t+1} = y_{t+1}^*$ and $x_{t+1} = x_{t+1}^*$.

The proof that $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$ is a sophisticated equilibrium closely parallels that of Proposition 3. We now establish uniqueness of the sophisticated equilibrium of the form $(\sigma_g^*, \sigma_x, \sigma_y, \sigma_\pi)$. We begin by showing that given $\sigma_g^*$, $x_t(h_{t-1}) = x_t^*$ for all histories. (Clearly, given $\sigma_g^*$ and $\sigma_x$, $\sigma_y$ and $\sigma_\pi$ are unique.) For reasons similar to those underlying the preliminary result in Proposition 3, for any history $h_{t-1}$, $x_t(h_{t-1})$ must be in the interval $[\underline{x}, \bar{x}]$, so that for any history, interest rates are given by the King rule (35). Under an interest-rate rule, the state $y_{t-1}$ is irrelevant; therefore, a continuation competitive equilibrium starting at the beginning of any period $t$ solves the same equations as a competitive equilibrium (starting from period 0). For notational simplicity, we focus on a competitive equilibrium starting from period 0. Suppose by way of contradiction that $\{\hat{x}_t, \hat{\pi}_t, \hat{y}_t\}$ is an equilibrium that does not coincide with $\{x_t^*, \pi_t^*, y_t^*\}$. Let $\tilde{x}_t = \hat{x}_t - x_t^*$, and use similar notation for $\tilde{\pi}_t$ and $\tilde{y}_t$.
Then, subtracting the equations governing the systems denoted with an asterisk from those denoted with a caret, we have a system governing {x˜t , π˜ t , y˜t } that satisfies (the analogs of) (1), (32), and (35). The resulting system, given by (37) and (38), coincides with that in the proof of Proposition 5. Hence, the solution is given by (39) with eigenvalues given by (40). It is easy to check that φ > 1 implies that both eigenvalues λ1 and λ2 are greater than one. Furthermore, at least one of (λ1 − a)/b and (λ2 − a)/b is nonzero. Because both of the eigenvalues are greater than one, (39) implies that if the two equilibria ever differ, then π˜ t becomes unbounded, so that x˜t does as well. Because


$x_t^*$ is bounded, $\hat{x}_t$ must eventually leave the interval $[\underline{x}, \bar{x}]$, which cannot happen in equilibrium. So we have a contradiction, and the first part of Proposition 6 is established. Note that our construction implies that after any deviation in period $t$, the equilibrium outcomes from period $t + 1$ are the desired outcomes. Thus, we have also established the second part of the proposition. QED

UNIVERSITY OF CALIFORNIA, LOS ANGELES, FEDERAL RESERVE BANK OF MINNEAPOLIS, AND NATIONAL BUREAU OF ECONOMIC RESEARCH
UNIVERSITY OF MINNESOTA AND FEDERAL RESERVE BANK OF MINNEAPOLIS
FEDERAL RESERVE BANK OF MINNEAPOLIS, UNIVERSITY OF MINNESOTA, AND NATIONAL BUREAU OF ECONOMIC RESEARCH

REFERENCES

Adão, Bernardino, Isabel Correia, and Pedro Teles, “Unique Monetary Equilibria with Interest Rate Rules,” manuscript, Bank of Portugal, 2007.
Atkeson, Andrew, V. V. Chari, and Patrick J. Kehoe, “Sophisticated Monetary Policies,” Federal Reserve Bank of Minneapolis, Research Department Staff Report 419, 2009.
Barro, Robert J., “On the Determination of the Public Debt,” Journal of Political Economy, 87 (1979), 940–971.
Bassetto, Marco, “A Game-Theoretic View of the Fiscal Theory of the Price Level,” Econometrica, 70 (2002), 2167–2195.
——, “Equilibrium and Government Commitment,” Journal of Economic Theory, 124 (2005), 79–105.
Benhabib, Jess, Stephanie Schmitt-Grohé, and Martín Uribe, “Monetary Policy and Multiple Equilibria,” American Economic Review, 91 (2001), 167–186.
Buiter, Willem H., “The Fiscal Theory of the Price Level: A Critique,” Economic Journal, 112 (2002), 459–480.
Cagan, Phillip, “The Monetary Dynamics of Hyperinflation,” in Studies in the Quantity Theory of Money, Milton Friedman, ed. (Chicago: University of Chicago Press, 1956).
Calvo, Guillermo A., “Staggered Prices in a Utility-Maximizing Framework,” Journal of Monetary Economics, 12 (1983), 383–398.
Chari, Varadarajan V., Lawrence J. Christiano, and Patrick J. Kehoe, “Optimality of the Friedman Rule in Economies with Distorting Taxes,” Journal of Monetary Economics, 37 (1996), 203–223.
Chari, Varadarajan V., and Patrick J. Kehoe, “Sustainable Plans,” Journal of Political Economy, 98 (1990), 783–802.
Christiano, Lawrence J., and Massimo Rostagno, “Money Growth Monitoring and the Taylor Rule,” NBER Working Paper No. 8539, 2001.
Clarida, Richard, Jordi Galí, and Mark Gertler, “Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory,” Quarterly Journal of Economics, 115 (2000), 147–180.
Cochrane, John H., “Inflation Determination with Taylor Rules: A Critical Review,” NBER Working Paper No. 13409, 2007.
Correia, Isabel, Juan Pablo Nicolini, and Pedro Teles, “Optimal Fiscal and Monetary Policy: Equivalence Results,” Journal of Political Economy, 116 (2008), 141–170.
Jackson, Matthew O., “A Crash Course in Implementation Theory,” Social Choice and Welfare, 18 (2001), 655–708.
King, Robert G., “The New IS-LM Model: Language, Logic, and Limits,” Federal Reserve Bank of Richmond Economic Quarterly, 86 (2000), 45–103.
Kocherlakota, Narayana, and Christopher Phelan, “Explaining the Fiscal Theory of the Price Level,” Federal Reserve Bank of Minneapolis Quarterly Review, 23 (1999), 14–23.
Ljungqvist, Lars, and Thomas J. Sargent, Recursive Macroeconomic Theory, 2nd ed. (Cambridge, MA: MIT Press, 2004).
Lucas, Robert E., Jr., and Nancy L. Stokey, “Optimal Fiscal and Monetary Policy in an Economy without Capital,” Journal of Monetary Economics, 12 (1983), 55–93.
McCallum, Bennett T., “Price Level Determinacy with an Interest Rate Policy Rule and Rational Expectations,” Journal of Monetary Economics, 8 (1981), 319–329.
Obstfeld, Maurice, and Kenneth Rogoff, “Speculative Hyperinflations in Maximizing Models: Can We Rule Them Out?” Journal of Political Economy, 91 (1983), 675–687.
Ramsey, Frank P., “A Contribution to the Theory of Taxation,” Economic Journal, 37 (1927), 47–61.
Sargent, Thomas J., and Neil Wallace, “‘Rational’ Expectations, the Optimal Monetary Instrument, and the Optimal Money Supply Rule,” Journal of Political Economy, 83 (1975), 241–254.
Schmitt-Grohé, Stephanie, and Martín Uribe, “Optimal Fiscal and Monetary Policy under Sticky Prices,” Journal of Economic Theory, 114 (2004), 198–230.
Siu, Henry E., “Optimal Fiscal and Monetary Policy with Sticky Prices,” Journal of Monetary Economics, 51 (2004), 575–607.
Svensson, Lars E. O., and Michael Woodford, “Implementing Optimal Policy through Inflation-Forecast Targeting,” in The Inflation-Targeting Debate, Ben S. Bernanke and Michael Woodford, eds. (Chicago: University of Chicago Press, 2005).
Taylor, John B., “Discretion Versus Policy Rules in Practice,” Carnegie–Rochester Conference Series on Public Policy, 39 (1993), 195–214.
Wallace, Neil, “A Hybrid Fiat–Commodity Monetary System,” Journal of Economic Theory, 25 (1981), 421–430.
Woodford, Michael, “Monetary Policy and Price Level Determinacy in a Cash-in-Advance Economy,” Economic Theory, 4 (1994), 345–380.
——, Interest and Prices: Foundations of a Theory of Monetary Policy (Princeton, NJ: Princeton University Press, 2003).

EARNINGS INEQUALITY AND MOBILITY IN THE UNITED STATES: EVIDENCE FROM SOCIAL SECURITY DATA SINCE 1937∗

WOJCIECH KOPCZUK
EMMANUEL SAEZ
JAE SONG

This paper uses Social Security Administration longitudinal earnings micro data since 1937 to analyze the evolution of inequality and mobility in the United States. Annual earnings inequality is U-shaped, decreasing sharply up to 1953 and increasing steadily afterward. Short-term earnings mobility measures are stable over the full period except for a temporary surge during World War II. Virtually all of the increase in the variance in annual (log) earnings since 1970 is due to increase in the variance of permanent earnings (as opposed to transitory earnings). Mobility at the top of the earnings distribution is stable and has not mitigated the dramatic increase in annual earnings concentration since the 1970s. Long-term mobility among all workers has increased since the 1950s but has slightly declined among men. The decrease in the gender earnings gap and the resulting substantial increase in upward mobility over a lifetime for women are the driving force behind the increase in long-term mobility among all workers.

∗ We thank Tony Atkinson, Clair Brown, David Card, Jessica Guillory, Russ Hudson, Jennifer Hunt, Markus Jantti, Alan Krueger, David Lee, Thomas Lemieux, Michael Leonesio, Joyce Manchester, Robert Margo, David Pattison, Michael Reich, Jonathan Schwabish, numerous seminar participants, and especially the editor, Lawrence Katz, and four anonymous referees for very helpful comments and discussions. We also thank Ed DeMarco, Linda Maxfield, and especially Joyce Manchester for their support, Bill Kearns, Joel Packman, Russ Hudson, Shirley Piazza, Greg Diez, Fred Galeas, Bert Kestenbaum, William Piet, Jay Rossi, and Thomas Mattson for help with the data, and Thomas Solomon and Barbara Tyler for computing support. Financial support from the Sloan Foundation and NSF Grant SES-0617737 is gratefully acknowledged. All our series are available in electronic format in the Online Appendix.
© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010

I. INTRODUCTION

Market economies are praised for creating macroeconomic growth but blamed for the economic disparities among individuals they generate. Economic inequality is often measured using high-frequency economic outcomes such as annual income. However, market economies also generate substantial mobility in earnings over a working lifetime. As a result, annual earnings inequality might substantially exaggerate the extent of true economic disparity among individuals. To the extent that individuals can smooth changes in earnings using savings and credit markets, inequality based on longer periods than a year is a better measure
of economic disparity. Thus, a comprehensive analysis of disparity requires studying both inequality and mobility. A large body of academic work has indeed analyzed earnings inequality and mobility in the United States. A number of key facts from the pre–World War II years to the present have been established using five main data sources:1 (1) Decennial Census data show that earnings inequality decreased substantially during the “Great Compression” from 1939 to 1949 (Goldin and Margo 1992) and remained low over the next two decades; (2) the annual Current Population Surveys (CPS) show that earnings inequality has increased substantially since the 1970s and especially during the 1980s (Katz and Murphy 1992; Katz and Autor 1999); (3) income tax statistics show that the top of the annual earnings distribution experienced enormous gains over the last 25 years (Piketty and Saez 2003); (4) panel survey data, primarily the Panel Study of Income Dynamics (PSID), show that short-term rank-based mobility has remained fairly stable since the 1970s (Gottschalk 1997); and (5) the gender gap has narrowed substantially since the 1970s (Goldin 1990, 2006; Blau 1998). There are, however, important questions that remain open due primarily to lack of homogeneous and longitudinal earnings data covering a long period of time. First, no annual earnings survey data covering most of the U.S. workforce are available before the 1960s, so that it is difficult to measure overall earnings inequality on a consistent basis before the 1960s, and in particular to analyze the exact timing of the Great Compression. Second, studies of mobility have focused primarily on short-term mobility measures due to lack of longitudinal data with large sample size and covering a long time period. Therefore, little is known about earnings mobility across an entire working life, let alone how such long-term mobility has evolved over time. 

1. A number of studies have also analyzed inequality and mobility in America in earlier periods (see Lindert [2000] for a survey on inequality and Ferrie [2008] for an analysis of occupational mobility).
2. See, for example, Cutler and Katz (1991), Slesnick (2001), Krueger and Perri (2006), and Attanasio, Battistin, and Ichimura (2007).

Third and related, there is a controversial debate on whether the increase in inequality since the 1970s has been offset by increases in earnings mobility, and whether consumption inequality has increased to the same extent as income inequality.2 In particular, the development of performance pay such as bonuses and stock options for highly compensated employees might have increased year-to-year earnings variability substantially among
top earners, so that the trends documented in Piketty and Saez (2003) could be misleading. The goal of this paper is to use the Social Security Administration (SSA) earnings micro data available since 1937 to make progress on those questions. The SSA data we use combine four key advantages relative to the data that have been used in previous studies on inequality and mobility in the United States. First, the SSA data we use for our research purposes have a large sample size: a 1% sample of the full US covered workforce is available since 1957, and a 0.1% sample since 1937. Second, the SSA data are annual and cover a very long time period of almost seventy years. Third, the SSA data are longitudinal balanced panels, as samples are selected based on the same Social Security number pattern every year. Finally, the earnings data have very little measurement error and are fully uncapped (with no top code) since 1978.3 Although Social Security earnings data have been used in a number of previous studies (often matched to survey data such as the Current Population Survey), the data we have assembled for this study overcome three important previous limitations. First, from 1946 to 1977, we use quarterly earnings information to extrapolate earnings up to four times the Social Security annual cap.4 Second, we can match the data to employer and industry information starting in 1957, allowing us to control for expansions in Social Security coverage that started in the 1950s. Finally, to our knowledge, the Social Security annual earnings data before 1951 have not been used outside the SSA for research purposes since Robert Solow’s unpublished Harvard Ph.D. thesis (Solow 1951). Few sociodemographic variables are available in the SSA data relative to standard survey data. Date of birth, gender, place of birth (including a foreign country birthplace), and race are available since 1937. 
Employer information (including geographic location, industry, and size) is available since 1957. Because we do not have information on important variables such as family 3. A number of studies have compared survey data to matched administrative data to assess measurement error in survey data (see, e.g., Abowd and Stinson [2005]). 4. Previous work using SSA data before the 1980s has almost always used data capped at the Social Security annual maximum (which was around the median of the earnings distribution in the 1960s), making it impossible to study the top half of the distribution. Before 1946, the top code was above the top quintile, allowing us to study earnings up to the top quintile over the full period.


structure, education, and hours of work, our analysis will focus only on earnings rather than on wage rates and will not attempt to explain the links between family structure, education, labor supply, and earnings, as many previous studies have done. In contrast to studies relying on income tax returns, the whole analysis is also based on individual rather than family-level data. Furthermore, we focus only on employment earnings and hence exclude self-employment earnings as well as all other forms of income such as capital income, business income, and transfers. We further restrict our analysis to employment earnings from commerce and industry workers, who represent about 70% of all U.S. employees, as this is the core group always covered by Social Security since 1937. This is an important limitation when analyzing mobility as (a) mobility within the commerce and industry sector may be different than overall mobility and (b) mobility between the commerce and industry sector and all other sectors is eliminated. We obtain three main findings. First, our annual series confirm the U-shaped evolution of earnings inequality since the 1930s. Inequality decreases sharply up to 1953 and increases steadily and continuously afterward. The U-shaped evolution of inequality over time is also present within each gender group and is more pronounced for men. Percentile ratio series show that (1) the compression in the upper part of the distribution took place from 1942 to 1950 and was followed by a steady and continuous widening ever since the early 1950s, and (2) the compression in the lower part of the distribution took place primarily in the postwar period from 1946 to the late 1960s and unraveled quickly from 1970 to 1985, especially for men, and has been fairly stable over the last two decades. Second, we find that short-term relative mobility measures such as rank correlation measures and Shorrocks indices comparing annual vs. 
multiyear earnings inequality have been quite stable over the full period, except for a temporary surge during World War II.5 In particular, short-term mobility has been remarkably stable since the 1950s, for a variety of mobility measures and also when the sample is restricted to men only. Therefore, the

5. Such a surge is not surprising in light of the large turnover in the labor market generated by the war.


evolution of annual earnings inequality over time is very close to the evolution of inequality of longer-term earnings. Furthermore, we show that most of the increase in the variance of (log) annual earnings is due to increases in the variance of (log) permanent earnings, with modest increases in the variance of transitory (log) earnings. Finally, mobility at the top of the earnings distribution, measured by the probability of staying in the top percentile after one, three, or five years, has also been very stable since 1978 (the first year in our data with no top code). Therefore, in contrast to the stock-option scenario mentioned above, the SSA data show very clearly that mobility has not mitigated the dramatic increase in annual earnings concentration.

Third, we find that long-term mobility measures among all workers, such as the earnings rank correlations from the early part of a working life to the late part of a working life, display significant increases since 1951 either when measured unconditionally or when measured within cohorts. However, those increases mask substantial heterogeneity across gender groups. Long-term mobility among males has been stable over most of the period, with a slight decrease in recent decades. The decrease in the gender earnings gap and the resulting substantial increase in upward mobility over a lifetime for women are the driving force behind the increase in long-term mobility among all workers.

The paper is organized as follows. Section II presents the conceptual framework linking inequality and mobility measures, the data, and our estimation methods. Section III presents inequality results based on annual earnings. Section IV focuses on short-term mobility and its effect on inequality, whereas Section V focuses on long-term mobility and inequality. Section VI concludes. Additional details on the data and our methodology, as well as extensive sensitivity analysis and the complete series, are presented in the Online Appendix.

II. FRAMEWORK, DATA, AND METHODOLOGY

II.A. Conceptual Framework

Our main goal is to document the evolution of earnings inequality. Inequality can be measured over short-term earnings (such as annual earnings) or over long-term earnings (such as earnings averaged over several years or even a lifetime). When there is mobility in individual earnings over time, long-term


inequality will be lower than short-term inequality, as moving up and down the distribution of short-term earnings will make the distribution of long-term earnings more equal. Therefore, conceptually, a way to measure mobility (Shorrocks 1978) is to compare inequality of short-term earnings to inequality of long-term earnings and define mobility as a coefficient between zero and one (inclusive) as follows:

(1) Long-term earnings inequality = Short-term earnings inequality × (1 − Mobility).

Alternatively, one can define mobility directly as changes or “shocks” in earnings.6 In our framework, such shocks are defined broadly as any deviation from long-term earnings. Those shocks could indeed be real shocks such as unemployment, disability, or an unexpected promotion. Changes could also be the consequence of voluntary choices such as reducing (or increasing) hours of work, voluntarily changing jobs, or obtaining an expected pay raise. Such shocks can be transitory (such as working overtime in response to a temporarily increased demand for an employer’s product, or a short unemployment spell in the construction industry) or permanent (being laid off from a job in a declining industry). In that framework, both long-term inequality and the extent of shocks contribute to shaping short-term inequality:

(2) Short-term earnings inequality = Long-term earnings inequality + Variability in earnings.

Equations (1) and (2) are related by the formula

(3) Variability in earnings = Short-term earnings inequality × Mobility = Long-term earnings inequality × Mobility/(1 − Mobility).

6. See Fields (2007) for an overview of different approaches to measuring income mobility.

Thus, equation (3) shows that an increase in mobility with no change in long-term inequality is due to an increase in variability in earnings. Conversely, an increase in inequality (either short-term or long-term) with no change in mobility implies an increased


variability in earnings. Importantly, our concept of mobility is relative rather than absolute.7

Formally, we consider a situation where a fixed group of individuals $i = 1, \ldots, I$ have short-term earnings $z_{it} > 0$ in each period $t = 1, \ldots, K$. For example, $t$ can represent a year. We can define long-term earnings for individual $i$ as average earnings across all $K$ periods: $\bar{z}_i = \sum_{t=1}^{K} z_{it}/K$. We normalize earnings so that average earnings (across individuals) are the same in each period.8 From a vector of individual earnings $z = (z_1, \ldots, z_I)$, an inequality index can be defined as $G(z)$, where $G(\cdot)$ is convex in $z$ and homogeneous of degree zero (multiplying all earnings by a given factor does not change inequality). For example, $G(\cdot)$ can be the Gini index or the variance of log earnings. Shorrocks (1978, Theorem 1, p. 381) shows that

$$G(\bar{z}) \le \sum_{t=1}^{K} G(z_t)/K,$$

where $z_t$ is the vector of earnings in period $t$ and $\bar{z}$ the vector of long-term earnings (the average across the $K$ periods). This inequality result captures the idea that movements in individual earnings up and down the distribution reduce long-term inequality (relative to short-term inequality). Hence we can define a related Shorrocks mobility index $0 \le M \le 1$ as

$$1 - M = \frac{G(\bar{z})}{\sum_{t=1}^{K} G(z_t)/K},$$

which is a formalization of equation (1) above. $M = 0$ if and only if individuals’ incomes (relative to the mean) do not change over time. The central advantage of the Shorrocks mobility index is that it formally links short-term and long-term inequality, which is perhaps the primary motivation for analyzing mobility. The disadvantage of the Shorrocks index is that it is an indirect measure of mobility.

7. Our paper focuses exclusively on relative mobility measures, although absolute mobility measures (such as the likelihood of experiencing an earnings increase of at least X% after one year) are also of great interest. Such measures might produce different time series if economic growth or annual inequality changed over time.

8. In our empirical analysis, earnings will be indexed to the nominal average earnings index.
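As an illustration of the Shorrocks mobility index defined above, the following minimal sketch computes M for a small earnings panel using the Gini coefficient as the inequality index G. It is not the estimation code used in the paper: the function names, the rank-based Gini formula, and the toy panels are assumptions introduced here for illustration only.

```python
import numpy as np


def gini(z):
    """Gini coefficient of a 1-D array of positive earnings."""
    z = np.sort(np.asarray(z, dtype=float))
    n = z.size
    ranks = np.arange(1, n + 1)
    # Rank-based formula for sorted data: G = 2*sum(i*z_i)/(n*sum(z)) - (n+1)/n
    return 2.0 * np.sum(ranks * z) / (n * z.sum()) - (n + 1.0) / n


def shorrocks_mobility(earnings, index=gini):
    """Shorrocks index M for an (I x K) array: rows are individuals, columns are periods."""
    z = np.asarray(earnings, dtype=float)
    # Normalize so that average earnings across individuals are the same in each period.
    z = z / z.mean(axis=0, keepdims=True)
    long_term = index(z.mean(axis=1))  # G(z-bar)
    short_term = np.mean([index(z[:, t]) for t in range(z.shape[1])])  # sum_t G(z_t)/K
    return 1.0 - long_term / short_term


# Hypothetical panels (illustrative numbers, not SSA data).
no_reranking = np.array([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0]])
full_swap = np.array([[1.0, 3.0], [3.0, 1.0]])

m_none = shorrocks_mobility(no_reranking)  # relative earnings never change
m_swap = shorrocks_mobility(full_swap)     # the two individuals trade places
print(m_none, m_swap)
```

In the first panel earnings relative to the mean are constant, so long-term and short-term inequality coincide and M is zero; in the second panel the two individuals swap positions, long-term earnings are equal, and M is one.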


Therefore, it is also useful to define direct mobility indices such as the rank correlation in earnings from year t to year t + p (or quintile mobility matrices from year t to year t + p). Such mobility indices are likely to be closely related to the Shorrocks indices, as reranking from one period to another is precisely what creates a wedge between long-term inequality and (the average of) short-term inequality. The advantage of direct mobility indices is that they are more concrete and transparent than Shorrocks indices. In our paper, we will therefore use both and show that they evolve very similarly over time.

One specific measure of inequality, the variance of log earnings, has received substantial attention in the literature on inequality and mobility. Introducing $y_{it} = \log z_{it}$ and $\bar{y}_i = \sum_{t=1}^{K} \log z_{it}/K$, we can define deviations in (log) earnings as

$$\varepsilon_{it} = y_{it} - \bar{y}_i.$$

It is important to note that $\varepsilon_{it}$ may reflect both transitory earnings shocks (such as an i.i.d. process) and permanent earnings shocks (such as a Brownian motion). The deviation $\varepsilon_{it}$ could either be uncertain ex ante from the individual perspective, or predictable.9 The Shorrocks theorem applied to the inequality index variance of log-earnings implies that

$$\mathrm{var}_i(\bar{y}_i) \le \mathrm{var}_{it}(y_{it}),$$

where the variance $\mathrm{var}_{it}(y_{it})$ is taken over both $i = 1, \ldots, I$ and $t = 1, \ldots, K$. If, for illustration, we make the statistical assumption that $\varepsilon_{it} \perp \bar{y}_i$ and we denote $\mathrm{var}(\varepsilon_{it}) = \sigma_\varepsilon^2$, then we have

$$\mathrm{var}_{it}(y_{it}) = \mathrm{var}_i(\bar{y}_i) + \sigma_\varepsilon^2,$$

which is a formalization of equation (2) above. The Shorrocks mobility index in that case is

$$M = \sigma_\varepsilon^2 / \mathrm{var}_{it}(y_{it}) = \sigma_\varepsilon^2 / \left(\mathrm{var}_i(\bar{y}_i) + \sigma_\varepsilon^2\right).$$

This shows that short-term earnings variance can increase because of an increase in long-term earnings variance or an increase in the variance of earnings deviations.

9. Uncertainty is important conceptually because individuals facing no credit constraints can fully smooth predictable shocks, whereas uncertain shocks can only be smoothed with insurance. We do not pursue this distinction in our analysis, because we cannot observe the degree of uncertainty in the empirical earnings shocks.

Alternatively and


equivalently, short-term inequality can increase while long-term inequality remains stable if mobility increases. This simple framework can help us understand the findings from the previous literature on earnings mobility in the United States. Rank-based mobility measures (such as year-to-year rank correlation or quintile mobility matrices) are stable over time (Gottschalk 1997), whereas there has been an increase in the variance of transitory earnings (Gottschalk and Moffitt 1994). Such findings can be reconciled if the disparity in permanent earnings has simultaneously widened to keep rank-based mobility of earnings stable.

In the theoretical framework we just described, the same set of individuals is followed across the K short-term periods. In practice, because individuals leave or enter the labor force (or the “commerce and industry” sector we will be focusing on), the set of individuals with positive earnings varies across periods. As the number of periods K becomes large, the sample will become smaller. Therefore, we will mostly consider relatively small values of K such as K = 3 or K = 5. When a period is a year, that allows us to analyze short-term mobility. When a period is a longer period of time such as twelve consecutive years, with K = 3, we cover 36 years, which is almost a full lifetime of work, allowing us to analyze long-term mobility, that is, mobility over a full working life.

Our analysis will focus on the time series of various inequality and mobility statistics. The framework we have considered can be seen as an analysis at a given point in time s. We can recompute those statistics for various points in time to create time series.

II.B. Data and Methodology

Social Security Administration Data.
We use primarily data sets constructed in SSA for research and statistical analysis, known as the continuous work history sample (CWHS) system.10 The annual samples are selected based on a fixed subset of digits of (a transformation of) the Social Security number (SSN). The same digits are used every year so that the sample is a balanced panel and can be treated as a random sample of the full population data. We use three main SSA data sets.

10. Detailed documentation of these data sets can be found in Panis et al. (2000).

(1) The 1% CWHS file contains information about taxable Social Security earnings from 1951 to 2004, basic demographic


characteristics such as year of birth, sex, and race, type of work (farm or nonfarm, employment or self-employment), self-employment taxable income, insurance status for the Social Security programs, and several other variables. Because Social Security taxes apply up to a maximum level of annual earnings, however, earnings in this data set are effectively top-coded at the annual cap before 1978. Starting in 1978, the data set also contains information about full compensation derived from the W2 forms, and hence earnings are no longer top-coded. Employment earnings (either FICA employment earnings before 1978 or W2 earnings from 1978 on) are defined as the sum of all wages and salaries, bonuses, and exercised stock options exactly as wage income reported on individual income tax returns.11

(2) The second file is known as the employee–employer file (EE-ER), and we will rely on its longitudinal version (LEED), which covers 1957 to date. Although the sampling approach based on the SSN is the same as the 1% CWHS, individual earnings are reported at the employer level so that there is a record for each employer a worker is employed by in a year. This data set contains demographic characteristics, compensation information subject to top-coding at the employer–employee record level (and with no top code after 1978), and information about the employer, including geographic information and industry at the three-digit (major group and industry group) level. The industry information allows us to control for expansion in coverage over time (see below). Importantly, the LEED (and EE-ER) data set also includes imputations based on quarterly earnings structure from 1957 to 1977, which allows us to handle earnings above the top code (see below).12

(3) Third, we use the so-called 0.1% CWHS file (one-tenth of 1%) that is constructed as a subset of the 1% file but covers 1937–1977. This file is unique in covering the Great Compression of the 1940s. The 0.1% file contains the same demographic variables as well as quarterly earnings information starting with 1951 (and the quarter at which the top code was reached for 1946–1950), thereby extending our ability to deal with top-coding problems (see below).

11. FICA earnings include elective employee contributions for pensions (primarily 401(k) contributions), whereas W2 earnings exclude such contributions. However, before 1978, such contributions were almost nonexistent.

12. To our knowledge, the LEED has hardly ever been used in academic publications. Two notable exceptions are Schiller (1977) and Topel and Ward (1992).


Top Coding Issues. From 1937 to 1945, no information above the taxable ceiling is available. From 1946 to 1950, the quarter at which the ceiling is reached is available. From 1951 to 1977, we rely on imputations based on quarterly earnings (up to the quarter at which the annual ceiling is reached). Finally, since 1978, the data are fully uncapped.

To our knowledge, the exact quarterly earnings information has been retained only in the 0.1% CWHS sample since 1951. The LEED 1% sample since 1957 contains imputations that are based on quarterly earnings, but the quarterly earnings themselves were not retained in the data available to us. The imputation method is discussed in more detail in Kestenbaum (1976, his method II) and in the Online Appendix. It relies on earnings for quarters when they are observed to impute earnings in quarters that are not observed (when the taxable ceiling is reached after the first quarter). Importantly, this imputation method might not be accurate if individual earnings were not uniform across quarters. We extend the same procedure to 1951–1956 using the 0.1% file and, because the 0.1% file and the 1% LEED overlap between 1957 and 1977, we are able to verify that this is indeed the exact procedure that was applied in the LEED data. For 1946–1950, the imputation procedure (see the Online Appendix and Kestenbaum [1976, his method I]) uses Pareto distributions and preserves the rank order based on the quarter when the taxable maximum was reached.

For individuals with earnings above the taxable ceiling (from 1937 to 1945) or who reach the taxable ceiling in the first quarter (from 1946 to 1977), we impute earnings assuming a Pareto distribution above the top code (1937–1945) or four times the top code (1946–1977). The Pareto distribution is calibrated from wage income tax statistics published by the Internal Revenue Service to match the top wage income shares series estimated in Piketty and Saez (2003).
The number of individuals who were top-coded in the first quarter and whose earnings are imputed based on the Pareto imputation is less than 1% of the sample for virtually all years after 1951. Consequently, high-quality earnings information is available for the bottom 99% of the sample, allowing us to study both inequality and mobility up to the top percentile. From 1937 to 1945, the fraction of workers top-coded (in our sample of interest defined below) increases from 3.6% in 1937 to 19.5% in 1944 and 17.4% in 1945. The share of top-coded observations increases


to 32.9% by 1950, but the quarter when a person reached the taxable maximum helps in classifying people into broad income categories. This implies that we cannot study groups smaller than the top percentile from 1951 to 1977 and we cannot study groups smaller than the top quintile from 1937 to 1950.

To assess the sensitivity of our mobility and multiyear inequality estimates with respect to top code imputation, we use two Pareto imputation methods (see the Online Appendix). In the first or main method, the Pareto imputation is based on draws from a uniform distribution that are independent both across individuals and across time periods. As there is persistence in ranking even at the top of the distribution, this method generates an upward bias in mobility within top-coded individuals. In the alternative method, the uniform distribution draws are independent across individuals but fixed over time for a given individual. As there is some mobility in rankings at the top of the distribution, this method generates a downward bias in mobility. We always test that the two methods generate virtually the same series (see Online Appendix Figures A.5 to A.9 for examples).13

Changing Coverage Issues. Initially, Social Security covered only “commerce and industry” employees, defined as most private for-profit sector employees, and excluding farm and domestic employees as well as self-employed workers. Since 1951, there has been an expansion in the workers covered by Social Security and hence included in the data. An important expansion took place in 1951 when self-employed workers and farm and domestic employees were included. This reform also expanded coverage to some government and nonprofit employees (including large parts of the education and health care industries), with coverage increasing significantly further in 1954 and then slowly expanding since then. We include in our sample only commerce and industry employment earnings in order to focus on a consistent definition of workers. Using SIC classification in the LEED, we define commerce and industry as all SIC codes excluding agriculture, forestry, and fishing (01–09), hospitals (8060–8069), educational services (82), social services (83), religious organizations and nonclassified membership organizations (8660–8699), private households (88), and public administration (91–97).

13. This is not surprising because, starting with 1951, imputations matter for just the top 1% of the sample and mobility measures for the full population are not very sensitive to what happens within the very top group.
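The two Pareto imputation variants described under Top Coding Issues can be sketched as follows. This is only an illustration under stated assumptions, not the procedure documented in the Online Appendix: the threshold, the Pareto shape parameter `alpha`, the sample sizes, and the helper names are hypothetical, whereas the paper calibrates the Pareto distribution to published wage income tax statistics.

```python
import numpy as np


def pareto_draw(threshold, alpha, u):
    """Inverse-CDF Pareto draw above `threshold` from uniform draws u in [0, 1)."""
    return threshold * (1.0 - u) ** (-1.0 / alpha)


def rank_correlation(x, y):
    """Spearman rank correlation (continuous draws, so ties are not expected)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


rng = np.random.default_rng(0)
n_individuals, n_years = 1000, 5
threshold, alpha = 100.0, 2.0  # hypothetical top code and shape parameter

# Main method: uniform draws independent across individuals AND across years,
# so imputed ranks among top-coded individuals reshuffle every year.
u_independent = rng.uniform(size=(n_individuals, n_years))
z_independent = pareto_draw(threshold, alpha, u_independent)

# Alternative method: one uniform draw per individual, held fixed over time,
# so imputed ranks among top-coded individuals never change.
u_fixed = np.repeat(rng.uniform(size=(n_individuals, 1)), n_years, axis=1)
z_fixed = pareto_draw(threshold, alpha, u_fixed)

corr_independent = rank_correlation(z_independent[:, 0], z_independent[:, 1])
corr_fixed = rank_correlation(z_fixed[:, 0], z_fixed[:, 1])
print(corr_independent, corr_fixed)
```

Within the top-coded group, the first variant produces a year-to-year rank correlation close to zero (too much mobility) and the second a rank correlation of one (too little), which is why the two variants bracket the true amount of mobility at the top.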


Between 1951 and 1956, we do not have industry information, as the LEED starts in 1957. Therefore, we impute “commerce and industry” classification using 1957–1958 industrial classification as well as discontinuities in covered earnings from 1950 to 1951 (see the Online Appendix for complete details). In 2004, commerce and industry employees are about 70% of all employees, and this proportion has declined only very modestly since 1937. Using only commerce and industry earnings is a limitation for our study for two reasons. First, inequality and mobility within the commerce and industry sector may be different from those in the full population. Second and more important, mobility between the commerce and industry sector and all other sectors is eliminated. Because in recent decades Social Security covers over 95% of earnings, we show in the Online Appendix that our mobility findings for recent decades are robust to including all covered workers. However, we cannot perform such a robustness check for earlier periods when coverage was much less complete. Note also that, throughout the period, the data include immigrant workers only if they have valid SSNs.

Sample Selection. First, for our primary analysis, we restrict the sample to adult individuals aged 25 to 60 (by January 1 of the corresponding year). This top age restriction allows us to concentrate on the working-age population.14 Second, we consider for our main sample only workers with annual (commerce and industry) employment earnings above a minimum threshold defined as one-fourth of a full year–full time minimum wage in 2004 ($2,575 in 2004), and then indexed by nominal average wage growth for earlier years. For many measures of inequality, such as log-earnings variance, it is necessary to trim the bottom of the earnings distribution. We show in Online Appendix Figures A.2 to A.9 that our results are not sensitive to choosing a higher minimum threshold such as a full year–full time minimum wage.
We cannot analyze the transition into and out of the labor force satisfactorily using our sample because the SSA data cover only about 70% of employees in the early decades. From now on, we refer to our main sample of interest, namely “commerce and industry” workers aged 25 to 60 with earnings above the indexed minimum threshold (of $2,575 in 2004), as the “core sample.”

14. Kopczuk, Saez, and Song (2007) used a wider age group from 18 to 70 and obtained the same qualitative findings.
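The core sample definition can be expressed as a simple filter. The sketch below is a hedged illustration only: the record layout, the field names, and the average wage index values used in the example are hypothetical; only the age range (25 to 60 on January 1) and the $2,575 threshold for 2004 come from the text.

```python
from dataclasses import dataclass

MIN_EARNINGS_2004 = 2575.0  # threshold stated in the text, in 2004 dollars


@dataclass
class Record:
    """One worker-year observation (hypothetical layout)."""
    age_on_jan1: int     # age on January 1 of the earnings year
    ci_earnings: float   # nominal commerce and industry employment earnings


def threshold_for_year(avg_wage_year, avg_wage_2004):
    """Index the 2004 threshold by nominal average wage growth."""
    return MIN_EARNINGS_2004 * avg_wage_year / avg_wage_2004


def in_core_sample(record, threshold):
    """Aged 25 to 60 and earnings strictly above the indexed threshold."""
    return 25 <= record.age_on_jan1 <= 60 and record.ci_earnings > threshold


# Example with a made-up wage index in which wages were half their 2004 level.
threshold = threshold_for_year(avg_wage_year=100.0, avg_wage_2004=200.0)
workers = [Record(30, 5000.0), Record(24, 5000.0), Record(45, 1000.0)]
core = [w for w in workers if in_core_sample(w, threshold)]
print(threshold, len(core))
```

Indexing the threshold by average wage growth, rather than by prices, keeps the cutoff at a constant position relative to average earnings across years.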


[Figure I: plot of the Gini coefficient (vertical axis, 0.30 to 0.50) by year (horizontal axis, 1940 to 2000) for all workers, men, and women]

FIGURE I
Annual Gini Coefficients

The figure displays the Gini coefficients from 1937 to 2004 for earnings of individuals in the core sample, men in the core sample, and women in the core sample. The core sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Commerce and industry are defined as all industrial sectors excluding government employees, agriculture, hospitals, educational services, social services, religious and membership organizations, and private households. Self-employment earnings are fully excluded. Estimations are based on the 0.1% CWHS data set for 1937 to 1956, the 1% LEED sample from 1957 to 1977, and the 1% CWHS (matched to W-2 data) from 1978 on. See the Online Appendix for complete details.

III. ANNUAL EARNINGS INEQUALITY

Figure I plots the annual Gini coefficient from 1937 to 2004 for the core sample of all workers, and for men and women separately in lighter gray. The Gini series for all workers follows a U-shape over the period, which is consistent with previous work based on decennial Census data (Goldin and Margo 1992), wage income from tax return data for the top of the distribution (Piketty and Saez 2003), and CPS data available since the early 1960s (Katz and Autor 1999). The series displays a sharp decrease of the Gini coefficient from 0.44 in 1938 down to 0.36 in 1953 (the Great Compression) followed by a steady increase since 1953 that accelerates in the 1970s and especially the 1980s. The Gini coefficient surpassed the prewar level in the late 1980s and was highest in 2004 at 0.47. Our series shows that the Great Compression is indeed the period of most dramatic change in inequality since the late 1930s


and that it took place in two steps. The Gini coefficient decreased sharply during the war from 1942 to 1944, rebounded very slightly from 1944 to 1946, and then declined again from 1946 to 1953. Among all workers, the increase in the Gini coefficient over the five decades from 1953 to 2004 is close to linear, which suggests that changes in overall inequality were not limited to an episodic event in the 1980s.

Figure I shows that the series for males and females separately display the same U-shaped evolution over time. Interestingly, the Great Compression as well as the upward trend in inequality is much more pronounced for men than for all workers. This shows that the rise in the Gini coefficient since 1970 cannot be attributed to changes in gender composition of the labor force. The Gini for men shows a dramatic increase from 0.35 in 1979 to 0.43 in 1988, which is consistent with the CPS evidence extensively discussed in Katz and Autor (1999).15 On the other hand, stability of the Gini coefficients for men and for women from the early 1950s through the late 1960s highlights that the overall increase in the Gini coefficient in that period has been driven by a widening of the gender gap in earnings (i.e., the between- rather than within-group component). Strikingly, there is more earnings inequality among women than among men in the 1950s and 1960s, whereas the reverse is true before the Great Compression and since the late 1970s.

Finally, the increase in the Gini coefficient has slowed since the late 1980s in the overall sample. It is interesting to note that a large part of the 3.5-point increase in the Gini from 1990 to 2004 is due to a surge in earnings within the top percentile of the distribution. The series of Gini coefficients estimated excluding the top percentile increases by less than 2 points since 1990 (see Online Appendix Figure A.3).16

15. There is a controversial debate in labor economics about the timing of changes in male wage inequality, due in part to discrepancies across different data sets. For example, Lemieux (2006), using May CPS data, argues that most of the increase in inequality occurs in the 1980s, whereas Autor, Katz, and Kearney (2008), using March CPS data, estimate that inequality starts to increase in the late 1960s. The Social Security data also point to an earlier increase in earnings inequality among males.

16. Hence, results based on survey data such as official Census Bureau inequality statistics, which do not measure the top percentile well, can give an incomplete view of inequality changes even when using global indices such as the Gini coefficient.

It should also be noted that, since the 1980s, the Gini coefficient has increased faster for men and women separately than for all workers. This has been driven by


[Figure II: plot of log percentile ratios (vertical axis, 0.3 to 1.0) by year (horizontal axis, 1940 to 2000), showing the P50/P20 series (upper part) and the P80/P50 series (lower part) for all workers, men, and women]

FIGURE II
Percentile Ratios log(P80/P50) and log(P50/P20)

Sample is the core sample (commerce and industry employees aged 25 to 60; see Figure I). The figure displays the log of the 50th to 20th percentile earnings ratio (upper part of the figure) and the log of the 80th to 50th percentile earnings ratio (lower part of the figure) among all workers, men only, and women only (the latter two in lighter gray).

an increase in the earnings of women relative to men, especially at the top of the distribution, as we shall see.

Most previous work in the labor economics literature has focused on gender-specific measures of inequality. As men and women share a single labor market, it is also valuable to analyze the overall inequality generated in the labor market (in the “commerce and industry” sector in our analysis). Our analysis for all workers and by gender provides clear evidence of the importance of changes in women’s labor market behavior and outcomes for understanding overall changes in inequality, a topic we will return to.

To understand where in the distribution the changes in inequality displayed in Figure I are occurring, Figure II displays the (log) percentile annual earnings ratios P80/P50, measuring inequality in the upper half of the distribution, and P50/P20, measuring inequality in the lower half of the distribution. We also depict the series for men only and for women only in lighter gray.17

The P80/P50 series (depicted in the bottom half of the figure) are also U-shaped over the period, with a brief but substantial Great Compression from 1942 to 1947 and a steady increase starting in 1951, which accelerates in the 1970s. Interestingly, P80/P50 is virtually constant from 1985 to 2000, showing that the gains at the top of the distribution occurred above P80. The series for men is similar except that P80/P50 increases sharply in the 1980s and continues to increase in the 1990s.

The P50/P20 series (depicted in the upper half of the figure) display a fairly different time pattern from the P80/P50 series. First, the compression happens primarily in the postwar period from 1946 to 1953. There are large swings in P50/P20 during the war, especially for men, as many young low income earners leave and enter the labor force because of the war, but P50/P20 is virtually the same in 1941 and 1946 or 1947.18 After the end of the Great Compression in 1953, the P50/P20 series for all workers remains fairly stable to the present, alternating periods of increase and decrease. In particular, it decreases smoothly from the mid-1980s to 2000, implying that inequality in the bottom half shrank in the last two decades, although it started increasing after 2000. The series for men only is quite different and displays an overall U shape over time, with a sharper Great Compression that extends well into the postwar period, with an absolute minimum in 1969 followed by a sharp increase up to 1983 and relative stability since then (consistent with recent evidence by Autor, Katz, and Kearney [2008]). For women, the P50/P20 series display a secular and steady fall since World War II.

17. We choose P80 (instead of the more usual P90) to avoid top-coding issues before 1951 and P20 (instead of the more usual P10) so that our low percentile estimate is not too closely driven by the average wage-indexed minimum threshold we have chosen ($2,575 in 2004).

18. In the working paper version (Kopczuk, Saez, and Song 2007), we show that compositional changes during the war are strongly influencing the bottom of the distribution during the early 1940s.

Table I summarizes the annual earnings inequality trends for all (Panel A), men (Panel B), and women (Panel C) with various inequality measures for selected years (1939, 1960, 1980, and 2004). In addition to the series depicted in the figures, Table I contains the variance of log-earnings, which also displays a U-shaped pattern over the period, as well as the shares of total earnings going to the bottom quintile group (P0–20), the top quintile group (P80–100), and the top percentile group (P99–100). Those last two series also display a U shape over the period. In particular, the top percentile share has almost doubled from 1980

TABLE I
ANNUAL EARNINGS INEQUALITY
0.1% sample from 1937 to 1956, 1% from 1957 to 2004. Number of workers in thousands.

                  Variance   Log percentile ratios      Earnings shares (%)      Average     #Workers
                  of log                                                         earnings    (’000s)
Year    Gini      earnings   P80/P20  P50/P20  P80/P50  P0–20  P80–100  P99–100  (2004 $)
(1)     (2)       (3)        (4)      (5)      (6)      (7)    (8)      (9)      (10)        (11)

A. All
1939    0.433     0.826      1.43     0.88     0.55     3.64   46.82    9.55     15,806      20,404
1960    0.375     0.681      1.24     0.79     0.46     4.54   41.66    5.92     27,428      35,315
1980    0.408     0.730      1.33     0.76     0.57     4.34   44.98    7.21     35,039      50,129
2004    0.471     0.791      1.39     0.76     0.63     3.91   51.41    12.28    44,052      75,971

B. Men
1939    0.417     0.800      1.32     0.85     0.47     3.82   45.52    9.58     17,918      15,493
1960    0.326     0.533      0.94     0.58     0.35     5.89   38.80    5.55     32,989      24,309
1980    0.366     0.618      1.06     0.64     0.43     5.25   42.02    6.85     44,386      30,564
2004    0.475     0.797      1.34     0.73     0.61     3.92   51.83    13.44    52,955      42,908

C. Women
1939    0.380     0.635      1.36     0.87     0.49     4.49   42.25    6.11     9,145       4,911
1960    0.349     0.570      1.31     0.82     0.50     4.98   39.18    4.05     15,148      11,006
1980    0.354     0.564      1.22     0.74     0.49     5.15   40.38    4.37     20,439      19,566
2004    0.426     0.693      1.34     0.74     0.59     4.45   47.36    8.00     32,499      33,063

Notes. The table displays various annual earnings inequality statistics for selected years, 1939, 1960, 1980, and 2004 for all workers in the core sample (Panel A), men in the core sample (Panel B), and women in the core sample (Panel C). The core sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Commerce and industry are defined as all industrial sectors excluding government employees, agriculture, hospitals, educational services, social services, religious and membership organizations, and private households. Self-employment earnings are fully excluded. Estimates are based on the 0.1% CWHS data set for 1937 to 1956, the 1% LEED sample from 1957 to 1977, and the 1% CWHS from 1978 on. See the Online Appendix for complete details. Columns (2) and (3) report the Gini coefficient and variance of log earnings. Columns (4), (5), and (6) report the percentile log ratios P80/P20, P50/P20, and P80/P50. P80 denotes the 80th percentile, etc. Columns (7), (8), and (9) report the share of total earnings accruing to P0–20 (the bottom quintile), P80–100 (the top quintile), and P99–100 (the top percentile). Column (10) reports average earnings in 2004 dollars using the CPI index (the new CPI-U-RS index is used after 1978). Column (11) reports the number of workers in thousands.


to 2004 in the sample of men only and the sample of women only and accounts for over half of the increase in the top quintile share from 1980 to 2004.

IV. THE EFFECTS OF SHORT-TERM MOBILITY ON EARNINGS INEQUALITY

In this section, we apply our theoretical framework from Section II.A to analyze multiyear inequality and relate it to the annual earnings inequality series analyzed in Section III. We will consider each period to be a year and the longer period to be five years (K = 5).19 We will compare inequality based on annual earnings and earnings averaged over five years. We will then derive the implied Shorrocks mobility indices and decompose annual inequality into permanent and transitory inequality components. We will also examine some direct measures of mobility such as rank correlations.

Figure III plots the Gini coefficient series for earnings averaged over five years20 (numerator of the Shorrocks index) and the five-year average of the Gini coefficients of annual earnings (the denominator of the Shorrocks index). For a given year t, the sample for both the five-year Gini and the annual Ginis is defined as all individuals with “Commerce and Industry” earnings above the minimum threshold in all five years, t − 2, t − 1, t, t + 1, t + 2 (and aged 25 to 60 in the middle year t). We show the average of the five annual Gini coefficients between t − 2 and t + 2 as our measure of the annual Gini coefficient, because it matches the Shorrocks approach. Because the sample is the same for both series, Shorrocks’ theorem implies that the five-year Gini is always smaller than the average of the annual Gini (over the corresponding five years), as indeed displayed in the figure.21 We also display the same series for men only (in lighter gray).

19. Series based on three-year averages instead of five-year averages display a very similar time pattern. Increasing K beyond five would reduce sample size substantially, as we require earnings to be above the minimum threshold in each of the five years, as described below.

20. The average is taken after indexing annual earnings by the average wage index.

21. Alternatively, we could have defined the sample as all individuals with earnings above the minimum threshold in any of the five years, t − 2, t − 1, t, t + 1, t + 2. The time pattern of those series is very similar. We prefer to use the positive-earnings in all five years criterion because this is a necessity when analyzing variability in log-earnings, as we do below.

The annual Gini displays the same overall evolution over time as in Figure I. The level is lower, as there is naturally less inequality in the group of

[Figure III: line plot. Vertical axis: Gini coefficient (0.25 to 0.45); horizontal axis: Year (1940 to 2000). Series: annual earnings, all workers; five-year earnings, all workers; annual earnings, men; five-year earnings, men.]

FIGURE III Gini Coefficients: Annual Earnings vs. Five-Year Earnings The figure displays the Gini coefficients for annual earnings and for earnings averaged over five years from 1939 to 2002. In year t, the sample for both series is defined as all individuals aged 25 to 60 in year t, with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. Earnings are averaged over the five-year span using the average earnings index. The Gini coefficient for annual earnings displayed for year t is the average of the Gini coefficient for annual earnings in years t − 2, . . . , t + 2. The same series are reported in lighter gray for the sample restricted to men only.

individuals with positive earnings for five consecutive years than in the core sample. The Gini coefficient estimated for five-year average earnings follows a very similar evolution over time and is actually extremely close to the annual Gini, especially in recent decades. Interestingly, in this sample, the Great Compression takes place primarily during the war from 1940 to 1944. The war compression is followed by a much more modest decline till 1952. This suggests that the postwar compression observed in annual earnings in Figure I was likely due to entry (of young men in the middle of the distribution) and exit (likely of wartime working women in the lower part of the distribution). Since the early 1950s, the two Gini series are remarkably parallel, and the five-year average earnings Gini displays an accelerated increase during the 1970s and especially the 1980s, as did our annual Gini series. The five-year average earnings Gini series for men show that the Great Compression is concentrated during the war, with little change in the Gini from 1946 to 1970, and a very sharp increase over the next three decades, especially the 1980s.

[Figure IV: line plot. Vertical axis: Shorrocks Gini mobility index and rank correlation (0.5 to 1.0); horizontal axis: Year (1940 to 2000). Series: Shorrocks index (five-year Gini/annual Gini), all workers; Shorrocks index (five-year Gini/annual Gini), men; rank correlation (after one year), all workers; rank correlation (after one year), men.]

FIGURE IV Short-Term Mobility: Shorrocks’ Index and Rank Correlation The figure displays the Shorrocks mobility coefficient based on annual earnings Gini vs. five-year average earnings Gini and the rank correlation between earnings in year t and year t + 1. The Shorrocks mobility coefficient in year t is defined as the ratio of the five-year earnings (from t − 2 to t + 2) Gini coefficient to the average of the annual earnings Gini for years t − 2, . . . , t + 2 (those two series are displayed in Figure III). The rank correlation in year t is estimated on the sample of individuals present in the core sample (commerce and industry employees aged 25 to 60; see Figure I) in both year t and year t + 1. The same series are reported in lighter gray for the sample restricted to men only.

Figure IV displays two measures of mobility (in black for all workers and in lighter gray for men only). The first measure is the Shorrocks measure, defined as the ratio of the five-year Gini to (the average of) the annual Gini. Mobility decreases with the index, and an index equal to one implies no mobility at all. The Shorrocks index series is above 0.9, except for a temporary dip during the war. The increased earnings mobility during the war is likely explained by the large movements into and out of the labor force of men serving in the army and women temporarily replacing men in the civilian labor force. The Shorrocks series have very slightly increased since the early 1970s, from 0.945 to 0.967 in 2004.22 This small change in the direction of reduced mobility further confirms that, as we expected from Figure III, short-term mobility has played a minor role in the surge in annual earnings inequality documented in Figure I.

22. The increase is slightly more pronounced for the sample of men.
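As an illustration of the Shorrocks measure described above, the following is a minimal Python sketch, not the code used for the paper. It assumes a hypothetical balanced panel stored as one list of already wage-indexed annual earnings per worker; the function names `gini` and `shorrocks_index` and the toy data are illustrative only.

```python
from statistics import mean


def gini(values):
    """Gini coefficient of a collection of non-negative earnings."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total earnings")
    # Rank-weighted formula on sorted data (ranks i = 1..n):
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def shorrocks_index(panel):
    """Ratio of the multi-year Gini to the average of the annual Ginis.

    `panel` is a hypothetical balanced panel: one list of annual earnings
    per worker, all lists of the same length K (K = 5 in the text).
    """
    n_years = len(panel[0])
    # Numerator: Gini of each worker's earnings averaged over the K years.
    multi_year = [mean(worker) for worker in panel]
    # Denominator: simple average of the K cross-sectional annual Ginis.
    annual = [gini([worker[k] for worker in panel]) for k in range(n_years)]
    return gini(multi_year) / mean(annual)


# Toy two-worker, two-year panels (made-up numbers, for illustration only).
no_mobility = [[1, 1], [3, 3]]   # ranks never change
full_swap = [[1, 3], [3, 1]]     # workers trade places every year
print(shorrocks_index(no_mobility), shorrocks_index(full_swap))
```

In the toy panel where earnings never change, the index equals one (no mobility); where the two workers swap places, multi-year earnings are equal and the index falls to zero. For comparison, the 2002 entries of Table II (five-year Gini 0.421, average annual Gini 0.435) give a ratio of roughly 0.97.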


The second mobility measure displayed in Figure IV is the straight rank correlation in earnings between year t and year t + 1 (computed in the sample of individuals present in our core sample in both years t and t + 1).23 As with the Shorrocks index, mobility decreases with the rank correlation and a correlation of one implies no year-to-year mobility. The rank mobility series follows the same overall evolution over time as the Shorrocks mobility index: a temporary but sharp dip during the war followed by a slight increase. Over the last two decades, the rank correlation in year-to-year earnings has been very stable and very high, around .9. As with the Shorrocks index, the increase in rank correlation is slightly more pronounced for men (than for the full sample) since the late 1960s.

23. More precisely, within the sample of individuals present in the core sample in both years t and t + 1, we measure the rank $r_t$ and $r_{t+1}$ of each individual in each of the two years, and then compute the correlation between $r_t$ and $r_{t+1}$ across individuals.

Figure V displays (a) the average of the variance of annual log earnings from t − 2 to t + 2 (defined on the stable sample as in the Shorrocks index analysis before), (b) the variance of five-year average log-earnings, $\mathrm{var}\bigl(\sum_{s=t-2}^{t+2} \log z_{is} / 5\bigr)$, and (c) the variance of log earnings deviations, estimated as

\[ D_t = \mathrm{var}\left( \log z_{it} - \frac{\sum_{s=t-2}^{t+2} \log z_{is}}{5} \right), \]

where the variance is taken across all individuals i with earnings above the minimum threshold in all five years t − 2, . . . , t + 2. As with the previous two mobility measures, those series, displayed in black for all workers and in lighter gray for men only, show a temporary surge in the variance of transitory earnings during the war, and are stable after 1960. In particular, it is striking that we do not observe an increased earnings variability over the last twenty years, so that all the increase in the log-earnings variance can be attributed to the increase in the variance of permanent (five-year average) log-earnings.

Our results differ somewhat from those of Gottschalk and Moffitt (1994), using PSID data, who found that over one-third of the increase in the variance of log-earnings from the 1970s to the 1980s was due to an increase in transitory earnings (Table 1, row 1, p. 223). We find a smaller increase in transitory earnings in

[Figure V: line plot. Vertical axis: Variance of log(earnings) (0.0 to 0.7); horizontal axis: Year (1940 to 2000). Series for all workers and for men: annual variance; permanent (five-year) variance; transitory variance.]

FIGURE V Variance of Annual, Permanent, and Transitory (log) Earnings The figure displays the variance of (log) annual earnings, the variance of (log) five-year average earnings (permanent variance), and the transitory variance, defined as the variance of the difference between (log) annual earnings and (log) five-year average earnings. In year t, the sample for all three series is defined as all individuals aged 25 to 60 in year t, with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. The (log) annual earnings variance is estimated as the average (across years t − 2, . . . , t + 2) of the variance of (log) annual earnings. The same series are reported in lighter gray for the sample restricted to men only.

the 1970s and we find that this increase reverts in the late 1980s and 1990s so that transitory earnings variance is virtually identical in 1970 and 2000. To be sure, our results could differ from those of Gottschalk and Moffitt (1994) for many reasons, such as measurement error and earnings definition consistency issues in the PSID or the sample definition. Gottschalk and Moffitt focus exclusively on white males, use a different age cutoff, take out age-profile effects, and include earnings from all industrial sectors. Gottschalk and Moffitt also use nine-year earnings periods (instead of five as we do) and include all years with positive annual earnings (instead of requiring positive earnings in all nine years as we do).24

24. The recent studies of Dynan, Elmendorf, and Sichel (2008) and Shin and Solon (2008) revisit mobility using PSID data. Shin and Solon (2008) find an increase in mobility in the 1970s followed by stability, which is consistent with our results. Dynan, Elmendorf, and Sichel (2008) find an increase in mobility in recent decades, but they focus on household total income instead of individual earnings.
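The three variance series shown in Figure V can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's code: the panel layout (one list of strictly positive annual earnings per worker over five consecutive years), the function name `variance_components`, and the use of the population variance rather than the sample variance are all assumptions made here.

```python
import math
from statistics import mean, pvariance


def variance_components(panel):
    """Return (annual, permanent, transitory) variances of log earnings.

    `panel` is a hypothetical balanced panel: one list of strictly positive
    annual earnings per worker over an odd number of consecutive years
    (five in the text), with year t in the middle position.
    """
    logs = [[math.log(z) for z in worker] for worker in panel]
    n_years = len(logs[0])
    mid = n_years // 2
    # (a) average across years of the cross-sectional variance of log earnings
    annual = mean([pvariance([w[k] for w in logs]) for k in range(n_years)])
    # (b) variance of each worker's average log earnings (permanent component)
    avg_log = [mean(w) for w in logs]
    permanent = pvariance(avg_log)
    # (c) variance of the year-t deviation from the worker's own average
    #     (transitory component, D_t in the text)
    transitory = pvariance([w[mid] - a for w, a in zip(logs, avg_log)])
    return annual, permanent, transitory


# Toy panel (made-up numbers): two workers whose earnings never change.
print(variance_components([[1.0] * 5, [math.e] * 5]))
```

When every worker's earnings are constant over the five years, the transitory component is zero and the annual and permanent variances coincide, which is the benchmark against which the stability of the transitory series is read.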

[Figure VI: two line plots. Panel A. Top 1% earnings share: annual vs. five-year. Vertical axis: Earnings share (%) (6 to 13). Series: annual earnings; five-year average earnings. Panel B. Probability of staying in the top 1%. Vertical axis: Probability (%) (50 to 100); horizontal axis: 1980 to 2005. Series: after one year; after three years; after five years.]

FIGURE VI Top Percentile Earnings Share and Mobility In Panel A, the sample in year t is all individuals aged 25 to 60 in year t and with commerce and industry earnings above the minimum threshold in all five years t − 2, t − 1, t, t + 1, t + 2. In year t, Panel A displays (1) the share of total year t annual earnings accruing to the top 1% earners in that year t and (2) the share of total five-year average earnings (from year t − 2, . . . , t + 2) accruing to the top 1% earners (defined as top 1% in terms of average five-year earnings). Panel B displays the probability of staying in the top 1% annual earnings group after X years (where X = 1, 3, 5). The sample in year t is all individuals present in the core sample (commerce and industry employees aged 25 to 60; see Figure I) in both year t and year t + X. Series in both panels are restricted to 1978 and on because sample has no top code since 1978.
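The two quantities plotted in Figure VI can be illustrated with a short Python sketch. It is only a sketch under assumptions: earnings are held in hypothetical dictionaries mapping a worker identifier to annual earnings, the top group is taken to be the highest-ranked 1% of workers (rounded down, with at least one worker), and the top group in each year is defined within the set of workers present in both years. The caption does not spell out these last two choices, so they are assumptions rather than the authors' procedure.

```python
import math


def top_group(earnings, fraction=0.01):
    """Ids of the workers in the top `fraction` of the earnings ranking."""
    # Rounding rule is an assumption: floor, but keep at least one worker.
    k = max(1, math.floor(len(earnings) * fraction))
    ranked = sorted(earnings, key=earnings.get, reverse=True)
    return set(ranked[:k])


def top_share(earnings, fraction=0.01):
    """Share of total earnings accruing to the top `fraction` of workers."""
    top = top_group(earnings, fraction)
    return sum(earnings[i] for i in top) / sum(earnings.values())


def stay_probability(earnings_t, earnings_tx, fraction=0.01):
    """Fraction of year-t top earners still in the top group in year t + X.

    Only workers present in both years are used (assumption: the top group
    is re-ranked within this common sample in each year).
    """
    common = earnings_t.keys() & earnings_tx.keys()
    e0 = {i: earnings_t[i] for i in common}
    e1 = {i: earnings_tx[i] for i in common}
    top0, top1 = top_group(e0, fraction), top_group(e1, fraction)
    return len(top0 & top1) / len(top0)


# Toy data (made-up): 100 workers earning 1, 2, ..., 100.
year_t = {i: float(i) for i in range(1, 101)}
print(top_share(year_t), stay_probability(year_t, year_t))
```

A stable staying probability alongside a rising top share is the pattern the text describes: concentration increases without a matching increase in turnover at the top.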

The absence of top-coding since 1978 allows us to zoom in on top earnings, which, as we showed in Table I, have surged in recent decades. Figure VI.A uses the uncapped data since 1978 to plot the share of total annual earnings accruing to the top 1% (those with earnings above $236,000 in 2004). The top 1% annual earnings share doubles from 6.5% in 1978 to 13% in 2004.25 Figure VI.A then compares the share of earnings of the top 1% based on annual data with shares of the top 1% defined based on earnings averaged at the individual level over five years. The five-year average earnings share series naturally smoothes short-term fluctuations but shows the same time pattern of robust increase as the annual measure.26 This shows that the surge in top earnings is not due to increased mobility at the top. This finding is confirmed in Figure VI.B, which shows the probability of staying in the top 1% earnings group after one, three, and five years (conditional on staying in our core sample) starting in 1978. The one-year probability is between sixty and seventy percent and it shows no overall trend. Therefore, our analysis shows that the dramatic surge in top earnings has not been accompanied by a similar surge in mobility into and out of top earnings groups. Hence, annual earnings concentration measures provide a very good approximation to longer-term earnings concentration measures. In particular, the development of performance-based pay such as bonuses and profits from exercised stock options (both included in our earnings measure) does not seem to have increased mobility dramatically.27

25. The closeness of our SSA-based (individual-level) results and the tax return–based (family-level) results of Piketty and Saez (2003) shows that changes in assortative mating played at best a minor role in the surge of family employment earnings at the top of the earnings distribution.

26. Following the framework from Section II.A (applied in this case to the top 1% earnings–share measure of inequality), we have computed such shares (in year t) on the sample of all individuals with minimum earnings in all five years, t − 2, . . . , t + 2. Note also that, in contrast to Shorrocks’ theorem, the series cross because we do not average the annual income share in year t across the five years t − 2, . . . , t + 2.

27. Conversely, the widening of the gap in annual earnings between the top 1% and the rest of the workforce has not affected the likelihood of top-1% earners falling back into the bottom 99%.

Table II summarizes the key short-term mobility trends for all (Panel A) and men (Panel B) with various mobility measures for selected years (1939, 1960, 1980, and 2002). In sum, the movements in short-term mobility series appear to be much smaller than changes in inequality over time. As a result, changes in short-term mobility have had no significant impact on inequality trends in the United States. Those findings are consistent with previous studies for recent decades based on PSID data (see, e.g., Gottschalk [1997] for a summary) as well as the most recent SSA

TABLE II
FIVE-YEAR AVERAGE EARNINGS INEQUALITY AND SHORT-TERM MOBILITY

        5-year     Annual        Rank          Permanent     Annual           Transitory
        average    earnings      correlation   (5-year       log-earnings     log-
        earnings   Gini (avg.    after         average)      variance (avg.   earnings     #Workers
Year    Gini       t−2,...,t+2)  1 year        log-earnings  t−2,...,t+2)     variance     (’000s)
                                               variance
(1)     (2)        (3)           (4)           (5)           (6)              (7)          (8)

A. All
1939    0.357      0.380         0.859         0.416         0.531            0.085        14,785
1960    0.307      0.324         0.883         0.371         0.447            0.054        26,479
1980    0.347      0.364         0.885         0.426         0.513            0.061        35,500
2002    0.421      0.435         0.897         0.514         0.594            0.058        55,108

B. Men
1939    0.340      0.365         0.853         0.373         0.494            0.091        11,700
1960    0.272      0.291         0.855         0.288         0.362            0.052        19,577
1980    0.310      0.329         0.869         0.337         0.425            0.062        23,190
2002    0.426      0.440         0.898         0.509         0.591            0.061        32,259

Notes. The table displays various measures of 5-year average earnings inequality and short-term mobility measures centered around selected years, 1939, 1960, 1980, and 2002 for all workers (Panel A) and men (Panel B). In all columns (except (4)), the sample in year t is defined as all employees with commerce and industry earnings above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) in all five years t − 2, t − 1, t, t + 1, and t + 2, and aged 25 to 60 (by January 1 of year t). Column (2) reports the Gini coefficients based on average earnings from year t − 2 to year t + 2 (averages are computed using indexed wages). Column (3) reports the average across years t − 2, . . . , t + 2 of the Gini coefficients of annual earnings. Column (4) reports the rank correlation between annual earnings in year t and annual earnings in year t + 1 in the sample of workers in the core sample (see Table I footnote for the definition) in both years t and t + 1. Column (5) reports the variance of average log-earnings from year t − 2 to year t + 2. Column (6) reports the average across years t − 2, . . . , t + 2 of the variance of annual log-earnings. Column (7) reports the variance of the difference between log earnings in year t and the average of log earnings from year t − 2 to t + 2. Column (8) reports the number of workers in thousands.

data–based analysis of the Congressional Budget Office (2007)28 and the tax return–based analysis of Carroll, Joulfaian, and Rider (2007). They are more difficult to reconcile, however, with the findings of Hungerford (1993) and especially Hacker (2006), who find great increases in family income variability in recent decades using PSID data. Our finding of stable transitory earnings variance is also at odds with the findings of Gottschalk and Moffitt (1994), who decompose transitory and permanent variance in log-earnings using PSID data and show an increase in both components. Our decomposition using SSA data shows that only the variance of the relatively permanent component of earnings has increased in recent decades.

28. The CBO study focuses on probabilities of large earnings increases (or drops).

V. LONG-TERM MOBILITY AND LIFETIME INEQUALITY

The very long span of our data allows us to estimate long-term mobility. Such mobility measures go beyond the issue of transitory earnings analyzed above and instead describe mobility across a full working life. Such estimates have not yet been produced for the United States in any systematic way because of the lack of panel data with large sample size and covering a long time period.

V.A. Unconditional Long-Term Inequality and Mobility

We begin with the simplest extension of our previous analysis to a longer horizon. In the context of the theoretical framework from Section II.A, we now assume that a period is eleven consecutive years. We define the “core long-term sample” in year t as all individuals aged 25–60 in year t with average earnings (using the standard wage indexation) from year t − 5 to year t + 5 above the minimum threshold. Hence, our sample includes individuals with zeros in some years as long as average earnings are above the threshold.29 Figure VII displays the Gini coefficients for all workers, and for men and women separately based on those eleven-year average earnings from 1942 to 1999. The overall picture is actually strikingly similar to our annual Figure I. The Gini coefficient series for all workers displays an overall U shape with a Great Compression from 1942 to 1953 and an absolute minimum in 1953, followed by a steady increase that accelerates in the 1970s and 1980s and slows down in the 1990s. The U-shaped evolution over time is also much more pronounced for men than for women and shows that, for men, the inequality increase was concentrated in the 1970s and 1980s.30

29. This allows us to analyze large and representative samples as the number of individuals with positive “commerce and industry” earnings in eleven consecutive years is only between 35% and 50% of the core annual samples.

30. We show in Online Appendix Figures A.8 and A.9 that these results are robust to using a higher minimum threshold.

After exploring base inequality over those eleven-year spells, we turn to long-term mobility. Figure VIII displays the rank correlation between the eleven-year earnings spell centered in year t and the eleven-year earnings spell after T years (i.e., centered in year t + T ) in the same sample of individuals present in the “long-term core sample” in both year t and year t + T . The figure presents such correlations for three choices of T : ten years, fifteen years, and twenty years. Given our 25–60 age restriction (which applies in both year t and year t + T ), for T = 20, the sample in year t is aged 25 to 40 (and the sample in year t + 20 is aged 45 to 60). Thus, this measure captures mobility from early career to late career. The figure also displays the same series for men only

[Figure VII: line plot. Vertical axis: Gini coefficient (0.40 to 0.50); horizontal axis: Year (middle of the eleven-year span) (1940 to 2000). Series: all workers; men; women.]

FIGURE VII Long-Term Earnings Gini Coefficients The figure displays the Gini coefficients from 1942 to 1999 for eleven-year average earnings for all workers, men only, and women only. The sample in year t is defined as all employees aged 25 to 60 in year t, alive in all years t − 5 to t + 5, and with average commerce and industry earnings (averaged using the average wage index) from year t − 5 to t + 5 above the minimum threshold. Gini coefficient in year t is based on average (indexed) earnings across the eleven-year span from year t − 5 to t + 5.

in lighter gray, in which case rank is defined within the sample of men. Three points are worth noting. First, the correlation is unsurprisingly lower as T increases, but it is striking to note that even after twenty years, the correlation is still substantial (in the vicinity of .5). Second, the series for all workers shows that rank correlation has actually significantly decreased over time: for example, the rank correlation between 1950s and 1970s earnings was around .57, but it is only .49 between 1970s and 1990s earnings. This shows that long-term mobility has increased significantly over the last five decades. This result stands in contrast to our short-term mobility results displaying substantial stability. Third, however, Figure VIII shows that this increase in long-term mobility disappears in the sample of men. The series for men displays a slight decrease in rank correlation in the first part of the period followed by an increase in the last part of the period. On net, the series for men displays almost no change in rank correlation and hence no change in long-term mobility over the full period.

[Figure VIII: line plot. Vertical axis: Rank correlation (0.4 to 0.8); horizontal axis: Year (middle of the initial eleven-year span) (1950 to 1990). Series: after ten years, all and men; after fifteen years, all and men; after twenty years, all and men.]

FIGURE VIII Long-Term Mobility: Rank Correlation in Eleven-Year Earnings Spans The figure displays in year t the rank correlation between eleven-year average earnings centered around year t and eleven-year average earnings centered around year t + X, where X = ten, fifteen, twenty. The sample is defined as all individuals aged 25 to 60 in year t and t + X, with average eleven-year earnings around years t and t + X above the minimum threshold. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1. The same series are reported in lighter gray for the sample restricted to men only (in which case, rank is estimated within the sample of men only).
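The rank correlation and the three-year smoothing mentioned in the caption can be sketched as follows in Python. This is an illustrative sketch, not the authors' code: the inputs are hypothetical lists of eleven-year average earnings for the same workers around years t and t + X, ties are given average ranks, and the two endpoints of the smoothed series are left unsmoothed. The caption specifies neither the treatment of ties nor of endpoints, so both are assumptions.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_correlation(earnings_t, earnings_tx):
    """Correlation between workers' ranks in the two periods."""
    return pearson(ranks(earnings_t), ranks(earnings_tx))


def smooth(series):
    """Weighted moving average: 0.5 on t, 0.25 on t - 1 and on t + 1.

    Endpoints are returned unchanged (assumption).
    """
    out = list(series)
    for t in range(1, len(series) - 1):
        out[t] = 0.25 * series[t - 1] + 0.5 * series[t] + 0.25 * series[t + 1]
    return out


# Toy data (made-up): three workers, the top two trade places.
print(rank_correlation([10.0, 20.0, 30.0], [10.0, 30.0, 20.0]))
```

A rank correlation of one corresponds to no mobility between the two eleven-year spans, so a falling series, as reported for all workers, indicates rising long-term mobility.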

V.B. Cohort-Based Long-Term Inequality and Mobility

The analysis so far ignored changes in the age structure of the population as well as changes in the wage profiles over a career. We turn to cohort-level analysis to control for those effects. In principle, we could control for age (as well as other demographic changes) using a regression framework. In this paper, we focus exclusively on series without controls because they are more transparent, easier to interpret, and less affected by imputation issues. We defer a more comprehensive structural analysis of earnings processes to future work.31

31. An important strand of the literature on income mobility has developed covariance structure models to estimate such earnings processes. The estimates of such models are often difficult to interpret and sensitive to the specification (see, e.g., Baker and Solon [2003]). As a result, many recent contributions in the mobility literature have also focused on simple measures without using a complex framework (see, e.g., Congressional Budget Office [2007] and in particular the discussion in Shin and Solon [2008]).

We divide working lifetimes from age 25 to 60 into three stages: Early career is defined as from the calendar year the

[Figure IX: line plot. Vertical axis: Gini coefficient (0.30 to 0.55); horizontal axis: Year of birth (1900 to 1960). Series: early career (age 25 to 36); mid-career (age 37 to 48); late career (age 49 to 60); men only in lighter gray.]

FIGURE IX Long-Term Earnings Gini Coefficients by Birth Cohort Sample is career sample defined as follows for each career stage and birth cohort: all employees with average commerce and industry earnings (using average wage index) over the twelve-year career stage above the minimum threshold ($2,575 in 2004 and indexed on average wage for earlier years). Note that earnings can be zero for some years. Early career is from age 25 to 36, middle career is from age 37 to 48, late career is from age 49 to 60. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1.

person reaches 25 to the calendar year the person reaches 36. Middle and later careers are defined similarly from age 37 to 48 and age 49 to 60, respectively. For example, for a person born in 1944, the early career is calendar years 1969–1980, the middle career is 1981–1992, and the late career is 1993–2004. For a given year-of-birth cohort, we define the “core early career sample” as all individuals with average “commerce and industry” earnings over the twelve years of the early career stage above the minimum threshold (including zeros and using again the standard wage indexation). The “core mid-career” and “core late career” samples are defined similarly for each birth cohort. The earnings in early, mid-, and late career are defined as average “commerce and industry” earnings during the corresponding stage (always using the average wage index). Figure IX reports the Gini coefficient series by year of birth for early, mid-, and late career. The Gini coefficients for men only are also displayed in lighter gray. The cohort-based Gini coefficients


are consistent with our previous findings and display a U shape over the full period. Three results are notable. First, there is much more inequality in late career than in middle career, and in middle career than in early career, showing that long-term inequality fans out over the course of a working life. Second, the Gini series show that long-term inequality has been stable for the baby-boom cohorts born after 1945 in the sample of all workers (we can observe only early- and mid-career inequality for those cohorts, as their late-career earnings are not completed by 2004). Those results are striking in light of our previous results showing a worsening of inequality in annual and five-year average earnings. Third, however, the Gini series for men only show that inequality has increased substantially across baby-boom cohorts born after 1945. This sharp contrast between series for all workers versus men only reinforces our previous findings that gender effects play an important role in shaping the trends in overall inequality. We also find that cohort-based rank mobility measures display stability or even slight decreases over the last five decades in the full sample, but that rank mobility has decreased substantially in the sample of men (figure omitted to save space). This confirms that the evolution of long-term mobility is heavily influenced by gender effects, to which we now turn.

V.C. The Role of Gender Gaps in Long-Term Inequality and Mobility

As we saw, there are striking differences in the long-term inequality and mobility series for all workers vs. for men only: Long-term inequality has increased much less in the sample of all workers than in the sample of men only. Long-term mobility has increased over the last four decades in the sample of all workers, but not in the sample of men only. Such differences can be explained by the reduction in the gender gap that has taken place over the period.
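The comparisons above rest on Gini coefficients of multi-year average earnings, computed separately for all workers and for men only, and on the three-year smoothing described in the note to Figure IX. The following is a minimal sketch of both steps, not the authors' code. The function names `gini` and `smooth_three_year` and all sample numbers are hypothetical, and leaving the end points of a series unsmoothed is an assumption of this sketch that the text does not specify.

```python
import numpy as np


def gini(earnings):
    """Gini coefficient of a 1-D array of non-negative earnings."""
    x = np.sort(np.asarray(earnings, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        raise ValueError("need at least one positive earnings value")
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n


def smooth_three_year(series):
    """Weighted three-year moving average: 0.5 on cohort t, 0.25 on t-1 and t+1.

    Assumption of this sketch: the first and last points are left unsmoothed.
    """
    s = np.asarray(series, dtype=float)
    out = s.copy()
    out[1:-1] = 0.25 * s[:-2] + 0.5 * s[1:-1] + 0.25 * s[2:]
    return out


# Hypothetical twelve-year average earnings for one birth cohort.
men = np.array([18_000, 30_000, 42_000, 55_000, 90_000])
women = np.array([6_000, 12_000, 20_000, 28_000, 40_000])

print("Gini, all workers:", round(gini(np.concatenate([men, women])), 3))
print("Gini, men only:   ", round(gini(men), 3))

# Hypothetical Gini series across consecutive birth cohorts.
print("Smoothed series:  ", smooth_three_year([0.30, 0.34, 0.30, 0.32]))
```

In this sketch the Gini for all workers is computed on the pooled sample, so changes in the gap between men's and women's earnings move the pooled series even when inequality within each group is unchanged.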
Figure X plots the fraction of women in our core sample and in various upper earnings groups: the fourth quintile group (P60–80), the ninth decile group (P80–90), the top decile group (P90–100), and the top percentile group (P99–100). As adult women aged 25 to 60 are about half of the adult population aged 25 to 60, with no gender differences in earnings, those fractions should be approximately 0.5. Those representation indices with no adjustment capture the total realized earnings gap including labor


[Figure X plot omitted. Vertical axis: fraction of women in each group (0.0 to 0.5); horizontal axis: year (1940 to 2000). Series: all workers, P60–80, P80–90, P90–100, P99–100.]

FIGURE X Gender Gap in Upper Earnings Groups Sample is core sample (commerce and industry employees aged 25 to 60; see Figure I). The figure displays the fraction of women in various groups. P60–80 denotes the fourth quintile group from percentile 60 to percentile 80, P90–100 denotes the top 10%, etc. Because of top-coding in the micro data, estimates from 1943 to 1950 for P80–90 and P90–100 are estimated using published tabulations in Social Security Administration (1937–1952, 1967) and reported in lighter gray.

supply decisions.32 We use those representation indices instead of the traditional ratio of mean (or median) female earnings to male earnings because such representation indices remain meaningful in the presence of differential changes in labor force participation or in the wage structure across genders, and we do not have covariates to control for such changes, as is done in survey data (see, e.g., Blau, Ferber, and Winkler [2006]). Two elements in Figure X are worth noting. First, the fraction of women in the core sample of commerce and industry workers has increased from around 23% in 1937 to about 44% in 2004. World War II generated a temporary surge in women’s labor force participation, two-thirds of which was reversed immediately after the war.33 Women’s labor force participation has been steadily and continuously increasing since the mid-1950s and has been stable at around 43%–44% since 1990.

32. As a result, they combine not only the traditional wage gap between males and females but also the labor force participation gap (including the decision to work in the commerce and industry sector rather than other sectors or self-employment).
33. This is consistent with the analysis of Goldin (1991), who uses unique micro survey data covering women’s workforce history from 1940 to 1951.
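The representation indices plotted in Figure X can be sketched as the share of women among workers whose earnings fall in a given percentile band of the all-workers distribution. This is a minimal sketch under stated assumptions: the function name `fraction_women` and the sample data are hypothetical, and the handling of ties at the band boundaries (lower bound inclusive, upper bound exclusive except for the top band) is a choice made here, not something the text specifies.

```python
import numpy as np


def fraction_women(earnings, is_female, lower_pct, upper_pct):
    """Share of women among workers with earnings in [P_lower, P_upper).

    Percentile thresholds are taken from the distribution of all workers.
    The top band (upper_pct == 100) includes its upper bound.
    """
    earnings = np.asarray(earnings, dtype=float)
    is_female = np.asarray(is_female, dtype=bool)
    lo, hi = np.percentile(earnings, [lower_pct, upper_pct])
    if upper_pct >= 100:
        in_group = earnings >= lo
    else:
        in_group = (earnings >= lo) & (earnings < hi)
    return is_female[in_group].mean()


# Hypothetical sample of ten workers with annual earnings 1, 2, ..., 10.
earnings = np.arange(1, 11)
is_female = earnings % 2 == 0  # hypothetical gender flags

for label, (lo, hi) in {"All workers": (0, 100), "P60-80": (60, 80),
                        "P90-100": (90, 100)}.items():
    print(label, fraction_women(earnings, is_female, lo, hi))
```

With no gender differences in earnings or participation, every band would return a value close to the share of women in the population, which is the 0.5 benchmark the text refers to.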


Second, Figure X shows that the representation of women in upper earnings groups has increased significantly over the last four decades and in a staggered time pattern across upper earnings groups.34 For example, the fraction of women in P60–80 starts to increase in 1966 from around 8% and reaches about 34% in the early 1990s and has remained about stable since then. The fraction of women in the top percentile (P99–100) does not really start to increase significantly before 1980. It grows from around 2% in 1980 to almost 14% in 2004 and is still quickly increasing. Those results show that the representation of women in top earnings groups has increased substantially over the last three to four decades. They also suggest that economic progress of women is likely to impact measures of upward mobility significantly, as many women are likely to move up the earnings distribution over their lifetimes. Indeed, we have found that such gender effects are strongest in upward mobility series such as the probability of moving from the bottom two quintile groups (those earning less than $25,500 in 2004) to the top quintile group (those earning over $59,000 in 2004) over a lifetime. Figure XI displays such upward mobility series, defined as the probability of moving from the bottom two quintile groups to the top quintile group after twenty years (conditional on being in the “long-term core sample” in both year t and year t + 20) for all workers, men, and women.35 The figure shows striking heterogeneity across groups. First, men have much higher levels of upward mobility than women. Thus, in addition to the annual earnings gap we documented, there is an upward mobility gap as well across groups. Second, the upward mobility gap has also been closing over time: the probability of upward mobility among men has been stable overall since World War II, with a slight increase up to the 1960s and declines after the 1970s.
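The upward mobility measure just defined can be sketched as a conditional frequency. The code below is an illustration rather than the authors' procedure: the names `percentile_rank` and `upward_mobility` and the sample data are hypothetical, the inputs are assumed to be already restricted to workers present in the sample in both year t and year t + 20, and ties are broken by input order. Quintile groups are defined on all workers, and the subgroup filter is applied only afterward.

```python
import numpy as np


def percentile_rank(x):
    """Rank each value on (0, 1) within the array; ties broken by input order."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(x.size)
    ranks[order] = (np.arange(x.size) + 0.5) / x.size
    return ranks


def upward_mobility(earn_t, earn_t20, subgroup=None):
    """Probability of being in P80-100 at t + 20 given P0-40 at t.

    Ranks are computed on all workers; `subgroup` (a boolean mask, for
    example women only) restricts whose transitions are counted.
    """
    r0 = percentile_rank(earn_t)
    r1 = percentile_rank(earn_t20)
    start_low = r0 < 0.40
    if subgroup is not None:
        start_low &= np.asarray(subgroup, dtype=bool)
    return (r1[start_low] >= 0.80).mean()


# Hypothetical eleven-year average earnings for ten workers.
earn_t = np.arange(1, 11)
earn_t20 = np.array([100, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # worker 0 moves up
women = np.array([True, True] + [False] * 8)

print("All:  ", upward_mobility(earn_t, earn_t20))
print("Women:", upward_mobility(earn_t, earn_t20, women))
print("Men:  ", upward_mobility(earn_t, earn_t20, ~women))
```

Because the thresholds come from the pooled distribution, the series for men and women are directly comparable with the series for all workers.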
In contrast, the probability of upward mobility of women has continuously increased from a very low level of less than 1% in the 1950s to about 7% in the 1980s. The increase in upward mobility for women compensates for the stagnation or slight decline in mobility for men, so that upward mobility among

34. There was a surge in women in P60–80 during World War II, but this was entirely reversed by 1948. Strikingly, women were better represented in upper groups in the late 1930s than in the 1950s.
35. Note that quintile groups are always defined based on the sample of all workers, including both male and female workers.

[Figure XI plot omitted. Vertical axis: probability of moving from P0–40 to P80–100 (%) after twenty years (0 to 10); horizontal axis: year, middle of the initial eleven-year span (1950 to 1980). Series: all, men, women.]

FIGURE XI Long-Term Upward Mobility: Gender Effects The figure displays in year t the probability of moving to the top quintile group (P80–100) for eleven-year average earnings centered around year t + 20 conditional on having eleven-year average earnings centered around year t in the bottom two quintile groups (P0–40). The sample is defined as all individuals aged 25 to 60 in year t and t + 20, with average eleven-year “commerce and industry” earnings around years t and t + 20 above the minimum threshold. Because of small sample size, series including earnings before 1957 are smoothed using a weighted three-year moving average with weight of 0.5 for cohort t and weights of 0.25 for t − 1 and t + 1. The series are reported for all workers, men only, and women only. In all three cases, quintile groups are defined based on the sample of all workers.

all workers is slightly increasing.36 Figure XI also suggests that the gains in female annual earnings we documented above were in part due to earnings gains of women already in the labor force rather than entirely due to the entry of new cohorts of women with higher earnings. Such gender differential results are robust to conditioning on birth cohort, as series of early- to late-career upward mobility display a very similar evolution over time (see Online Appendix Figure A.10). Hence, our upward mobility results show that the economic progress of women since the 1960s has had a large impact on long-term mobility series among all U.S. workers. Table III summarizes the long-term inequality and mobility results for all (Panel A), men (Panel B), and women (Panel C) by

36. It is conceivable that upward mobility is lower for women because even within P0–40, they are more likely to be in the bottom half of P0–40 than men. Kopczuk, Saez, and Song (2007) show that controlling for those differences leaves the series virtually unchanged. Therefore, controlling for base earnings does not affect our results.


TABLE III
LONG-TERM INEQUALITY AND MOBILITY

            11-year earnings    Rank correlation    Upward mobility    #Workers
Year        average Gini        after 20 years      after 20 years     (’000s)
(1)         (2)                 (3)                 (4)                (5)

A. All
1956        0.437               0.572               0.037              42,753
1978        0.477               0.494               0.053              61,828
1999        0.508                                                      94,930

B. Men
1956        0.376               0.465               0.084              27,952
1978        0.429               0.458               0.071              37,187
1999        0.506                                                      52,761

C. Women
1956        0.410               0.361               0.008              14,801
1978        0.423               0.358               0.041              24,641
1999        0.459                                                      42,169

Notes. The table displays various measures of eleven-year average earnings inequality and long-term mobility centered around selected years, 1956, 1978, and 1999, for all workers (Panel A), men (Panel B), and women (Panel C). The sample in year t is defined as all employees with commerce and industry earnings averaged across the eleven-year span from t − 5 to t + 5 above a minimum threshold ($2,575 in 2004 and indexed using average wage for earlier years) and aged 25 to 60 (by January 1 of year t). Column (2) reports the Gini coefficients for those eleven-year earnings averages. Column (3) reports the rank correlation between eleven-year average earnings centered around year t and eleven-year average earnings centered around year t + 20 in the sample of workers (1) aged between 25 and 60 in both years t and t + 20, and (2) with eleven-year average earnings above the minimum threshold in both earnings spans t − 5 to t + 5 and t + 15 to t + 25. Column (4) reports the probability of moving to the top quintile group (P80–100) for eleven-year average earnings centered around year t + 20 conditional on having eleven-year average earnings centered around year t in the bottom two quintile groups (P0–40). The sample is the same as in column (3). Column (5) reports the number of workers in thousands.

reporting measures for selected eleven-year spans (1951–1961, 1973–1983, and 1994–2004).

VI. CONCLUSIONS

Our paper has used U.S. Social Security earnings administrative data to construct series of inequality and mobility in the United States since 1937. The analysis of these data has allowed us to start exploring the evolution of mobility and inequality over a lifetime as well as to complement the more standard analysis of annual inequality and short-term mobility in several ways. We found that changes in short-term mobility have not substantially affected the evolution of inequality, so that annual snapshots of the distribution provide a good approximation of the evolution of the longer-term measures of inequality. In particular, we find that increases in annual earnings inequality are driven almost entirely by increases in permanent earnings inequality, with much more modest changes in the variability of transitory earnings.


However, our key finding is that although the overall measures of mobility are fairly stable, they hide heterogeneity by gender groups. Inequality and mobility among male workers have worsened along almost any dimension since the 1950s: our series display sharp increases in annual earnings inequality, slight reductions in short-term mobility, and large increases in long-term inequality with slight reduction or stability of long-term mobility. Against those developments stand the very large earning gains achieved by women since the 1950s, due to increases in labor force attachment as well as increases in earnings conditional on working. Those gains have been so great that they have substantially reduced long-term inequality in recent decades among all workers, and actually almost exactly compensate for the increase in inequality for males.

COLUMBIA UNIVERSITY AND NATIONAL BUREAU OF ECONOMIC RESEARCH
UNIVERSITY OF CALIFORNIA BERKELEY AND NATIONAL BUREAU OF ECONOMIC RESEARCH
SOCIAL SECURITY ADMINISTRATION

REFERENCES

Abowd, John M., and Martha Stinson, “Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Survey and SSA Administrative Data,” Cornell University, Mimeo, 2005.
Attanasio, Orazio, Erich Battistin, and Hidehiko Ichimura, “What Really Happened to Consumption Inequality in the US?” in Measurement Issues in Economics—The Paths Ahead. Essays in Honor of Zvi Griliches, Ernst Berndt and Charles Hulten, eds. (Chicago: University of Chicago Press, 2007).
Autor, David, Lawrence F. Katz, and Melissa Schettini Kearney, “Trends in U.S. Wage Inequality: Revising the Revisionists,” Review of Economics and Statistics, 90 (2008), 300–323.
Baker, Michael, and Gary Solon, “Earnings Dynamics and Inequality among Canadian Men, 1976–1992: Evidence from Longitudinal Income Tax Records,” Journal of Labor Economics, 21 (2003), 289–321.
Blau, Francine D., “Trends in the Well-being of American Women, 1970–1995,” Journal of Economic Literature, 36 (1998), 112–165.
Blau, Francine D., Marianne Ferber, and Anne Winkler, The Economics of Women, Men and Work, 4th ed. (Prentice-Hall, 2006).
Carroll, Robert, David Joulfaian, and Mark Rider, “Income Mobility: The Recent American Experience,” Andrew Young School of Policy Studies, Georgia State University Working Paper 07-18, 2007.
Congressional Budget Office, “Trends in Earnings Variability over the Past 20 Years,” Letter to the Honorable Charles E. Schumer and the Honorable Jim Webb, April 2007. Available at http://www.cbo.gov/ftpdocs/80xx/doc8007/04-17-EarningsVariability.pdf.
Cutler, David, and Lawrence Katz, “Macroeconomic Performance and the Disadvantaged,” Brookings Papers on Economic Activity, 2 (1991), 1–74.
Dynan, Karen E., Douglas W. Elmendorf, and Daniel E. Sichel, “The Evolution of Household Income Volatility,” Brookings Institution Working Paper, 2008.


Ferrie, Joseph P., “History Lessons: The End of American Exceptionalism? Mobility in the United States since 1850,” Journal of Economic Perspectives, 19 (2005), 199–215.
Fields, Gary S., “Income Mobility,” Cornell University ILR School Working Paper 19, 2007. Available at http://digitalcommons.ilr.cornell.edu/workingpapers/19.
Goldin, Claudia, Understanding the Gender Gap: An Economic History of American Women, NBER Series on Long-Term Factors in Economic Development (New York/Oxford/Melbourne: Oxford University Press, 1990).
——, “The Role of World War II in the Rise of Women’s Employment,” American Economic Review, 81 (1991), 741–756.
——, “The Quiet Revolution That Transformed Women’s Employment, Education, and Family,” American Economic Review Papers and Proceedings, 96 (2006), 1–21.
Goldin, Claudia, and Robert A. Margo, “The Great Compression: The Wage Structure in the United States at Mid-Century,” Quarterly Journal of Economics, 107 (1992), 1–34.
Gottschalk, Peter, “Inequality, Income Growth, and Mobility: The Basic Facts,” Journal of Economic Perspectives, 11 (1997), 21–40.
Gottschalk, Peter, and Robert Moffitt, “The Growth of Earnings Instability in the U.S. Labor Market,” Brookings Papers on Economic Activity, 2 (1994), 217–254.
Hacker, Jacob S., The Great Risk Shift: The Assault on American Jobs, Families, Health Care, and Retirement—And How You Can Fight Back (Oxford, UK: Oxford University Press, 2006).
Hungerford, Thomas L., “U.S. Income Mobility in the Seventies and Eighties,” Review of Income and Wealth, 39 (1993), 403–417.
Katz, Lawrence F., and David Autor, “Changes in the Wage Structure and Earnings Inequality,” in Handbook of Labor Economics, Orley Ashenfelter and David Card, eds. (Amsterdam/New York: Elsevier/North Holland, 1999).
Katz, Lawrence F., and Kevin M. Murphy, “Changes in Relative Wages, 1963–87: Supply and Demand Factors,” Quarterly Journal of Economics, 107 (1992), 35–78.
Kestenbaum, Bert, “Evaluating SSA’s Current Procedure for Estimating Untaxed Wages,” American Statistical Association Proceedings of the Social Statistics Section, Part 2 (1976), 461–465.
Kopczuk, Wojciech, Emmanuel Saez, and Jae Song, “Uncovering the American Dream: Inequality and Mobility in Social Security Earnings Data since 1937,” National Bureau of Economic Research Working Paper 13345, 2007.
Krueger, Dirk, and Fabrizio Perri, “Does Income Inequality Lead to Consumption Inequality? Evidence and Theory,” Review of Economic Studies, 73 (2006), 163–193.
Lemieux, Thomas, “Increasing Residual Wage Inequality: Composition Effects, Noisy Data, or Rising Demand for Skill?” American Economic Review, 96 (2006), 461–498.
Lindert, Peter, “Three Centuries of Inequality in Britain and America,” in Handbook of Income Distribution, Anthony B. Atkinson and Francois Bourguignon, eds. (Amsterdam/New York: Elsevier/North Holland, 2000).
Panis, Constantijn, Roald Euller, Cynthia Grant, Melissa Bradley, Christine E. Peterson, Randall Hirscher, and Paul Steinberg, SSA Program Data User’s Manual, RAND, 2000. Prepared for the Social Security Administration.
Piketty, Thomas, and Emmanuel Saez, “Income Inequality in the United States, 1913–1998,” Quarterly Journal of Economics, 118 (2003), 1–39.
Schiller, Bradley R., “Relative Earnings Mobility in the United States,” American Economic Review, 67 (1977), 926–941.
Shin, Donggyun, and Gary Solon, “Trends in Men’s Earnings Volatility: What Does the Panel Study of Income Dynamics Show?” National Bureau of Economic Research Working Paper 14075, 2008.
Shorrocks, Anthony F., “Income Inequality and Income Mobility,” Journal of Economic Theory, 19 (1978), 376–393.
Slesnick, Daniel T., Consumption and Social Welfare: Living Standards and Their Distribution in the United States (Cambridge/New York/Melbourne: Cambridge University Press, 2001).


Social Security Administration, Handbook of Old-Age and Survivors Insurance Statistics (annual), (Washington, DC: U.S. Government Printing Office, 1937–1952).
——, Social Security Bulletin: Annual Statistical Supplement (Washington, DC: Government Printing Press Office, 1967).
Solow, Robert M., “On the Dynamics of the Income Distribution,” Ph.D. dissertation (Harvard University, 1951).
Topel, Robert H., and Michael P. Ward, “Job Mobility and the Careers of Young Men,” Quarterly Journal of Economics, 107 (1992), 439–479.

THE ROLE OF THE STRUCTURAL TRANSFORMATION IN AGGREGATE PRODUCTIVITY∗

MARGARIDA DUARTE AND DIEGO RESTUCCIA

We investigate the role of sectoral labor productivity in explaining the process of structural transformation—the secular reallocation of labor across sectors—and the time path of aggregate productivity across countries. We measure sectoral labor productivity across countries using a model of the structural transformation. Productivity differences across countries are large in agriculture and services and smaller in manufacturing. Over time, productivity gaps have been substantially reduced in agriculture and industry but not nearly as much in services. These sectoral productivity patterns generate implications in the model that are broadly consistent with the cross-country data. We find that productivity catch-up in industry explains about 50% of the gains in aggregate productivity across countries, whereas low productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline observed across countries.

I. INTRODUCTION

It is a well-known observation that over the last fifty years countries have experienced remarkably different paths of economic performance.1 Looking at the behavior of GDP per hour in individual countries relative to that in the United States, we find experiences of sustained catch-up, catch-up followed by a slowdown, stagnation, and even decline. (See Figure I for some illustrative examples.2) Consider, for instance, the experience of Ireland. Between 1960 and 2004, GDP per hour in Ireland relative to that of the United States rose from about 35% to 75%.3 Spain also experienced a period of rapid catch-up to the United States from 1960 to around 1990, a period during which relative GDP per hour rose from about 35% to 80%. Around 1990, however, this

∗ We thank Robert Barro, three anonymous referees, and Francesco Caselli for very useful and detailed comments. We also thank Tasso Adamopoulos, John Coleman, Mike Dotsey, Gary Hansen, Gueorgui Kambourov, Andrés Rodríguez-Clare, Richard Rogerson, Marcelo Veracierto, Xiaodong Zhu, and seminar participants at several conferences and institutions for comments and suggestions. Andrea Waddle provided excellent research assistance. All errors are our own. We gratefully acknowledge support from the Connaught Fund at the University of Toronto (Duarte) and the Social Sciences and Humanities Research Council of Canada (Restuccia). [email protected], [email protected]
1. See Chari, Kehoe, and McGrattan (1996), Jones (1997), Prescott (2002), and Duarte and Restuccia (2006), among many others.
2. We use GDP per hour as our measure of economic performance. Throughout the paper we refer to labor productivity, output per hour, and GDP per hour interchangeably.
3. All numbers reported refer to data trended using the Hodrick–Prescott filter. See Section II for details.
© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010


FIGURE I Relative GDP per Hour—Some Countries GDP per hour in each country relative to that of the United States.

process slowed down dramatically and relative GDP per hour in Spain stagnated and later declined. Another remarkable growth experience is that of New Zealand, where GDP per hour fell from about 70% to 60% of that of the United States between 1970 and 2004. Along their modern paths of development, countries undergo a process of structural transformation by which labor is reallocated among agriculture, industry, and services. Over the last fifty years many countries have experienced substantial amounts of labor reallocation across sectors. For instance, from 1960 to 2004 the share of hours in agriculture in Spain fell from 44% to 6%, while the share of hours in services rose from 25% to 64%. In about the same period, the share of hours in agriculture in Belgium fell from 7% to 2%, while the share in services rose from 43% to 72%. In this paper we study the behavior of GDP per hour over time from the perspective of sectoral productivity and the structural transformation.4 Does a sectoral analysis contribute to the

4. See Baumol (1967) for a discussion of the implications of structural change on aggregate productivity growth.

STRUCTURAL TRANSFORMATION AND PRODUCTIVITY


understanding of aggregate productivity paths? At a qualitative level the answer to this question is clearly yes. Because aggregate labor productivity is the sum of labor productivity across sectors weighted by the share of hours in each sector, the structural transformation matters for aggregate productivity. At a quantitative level the answer depends on whether there are substantial differences in sectoral labor productivity across countries. Our approach in this paper is to first develop a simple model of the structural transformation that is calibrated to the growth experience of the United States. We then use the model to measure sectoral labor productivity differences across countries at a point in time. These measures, together with data on growth in sectoral labor productivity, imply time paths of sectoral labor productivity for each country. We use these measures of sectoral productivity in the model to assess their quantitative effect on labor reallocation and aggregate productivity outcomes across countries. We find that there are large and systematic differences in sectoral labor productivity across countries. In particular, differences in labor productivity levels between rich and poor countries are larger in agriculture and services than in manufacturing. Moreover, over time, productivity gaps have been substantially reduced in agriculture and industry but not nearly as much in services. To illustrate the implications of these sectoral differences for aggregate productivity, imagine that productivity gaps remain constant as countries undergo the structural transformation. Then as developing countries reallocate labor from agriculture to manufacturing, aggregate productivity can catch up as labor is reallocated from a low–relative productivity sector to a high–relative productivity sector. 
Countries further along the structural transformation can slow down, stagnate, and decline as labor is reallocated from industry (a high–relative productivity sector) to services (a low–relative productivity sector). When the time series of sectoral productivity are fed into the model of the structural transformation, we find that high growth in labor productivity in industry relative to that of the United States explains about 50% of the catch-up in relative aggregate productivity across countries. Although there is substantial catch-up in agricultural productivity, we show that this factor contributes little to aggregate productivity gains in our sample countries. In addition, we show that low relative productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline in relative aggregate productivity observed across countries.
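The accounting identity behind this argument, that aggregate labor productivity is the hours-share-weighted sum of sectoral labor productivity, can be illustrated with a small numerical sketch. It illustrates the identity only and is not the calibrated model of the paper. All numbers are hypothetical: sectoral productivity levels are held fixed in arbitrary units, with industry highest and agriculture lowest, and only the hours shares change across three stylized stages. The function name `aggregate_productivity` is likewise made up for this example.

```python
def aggregate_productivity(hours_share, sector_productivity):
    """Aggregate output per hour as the hours-share-weighted sum across sectors."""
    if set(hours_share) != set(sector_productivity):
        raise ValueError("sectors must match")
    if abs(sum(hours_share.values()) - 1.0) > 1e-9:
        raise ValueError("hours shares must sum to one")
    return sum(hours_share[s] * sector_productivity[s] for s in hours_share)


# Hypothetical sectoral productivity levels, held constant over time.
PRODUCTIVITY = {"agriculture": 0.2, "industry": 0.8, "services": 0.4}

# Hypothetical hours shares at three stages of the structural transformation.
STAGES = {
    "early": {"agriculture": 0.60, "industry": 0.20, "services": 0.20},
    "middle": {"agriculture": 0.20, "industry": 0.45, "services": 0.35},
    "late": {"agriculture": 0.05, "industry": 0.25, "services": 0.70},
}

for stage, shares in STAGES.items():
    print(stage, round(aggregate_productivity(shares, PRODUCTIVITY), 3))
```

With sectoral levels fixed, the aggregate rises while labor moves from agriculture into industry and then falls as labor moves from industry into services, which mirrors the catch-up followed by slowdown described in the text.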


We construct a panel data set on PPP-adjusted real output per hour and disaggregated output and hours worked for agriculture, industry, and services. Our panel data include 29 countries with annual data covering the period from 1956 to 2004 for most countries.5 From these data, we document three basic facts. First, countries follow a common process of structural transformation characterized by a declining share of hours in agriculture over time, an increasing share of hours in services, and a hump-shaped share of hours in industry. Second, there is substantial lag in the process of structural transformation for some countries, and this lag is associated with the level of relative income. Third, there are sizable and systematic differences in sectoral growth rates of labor productivity across countries. In particular, most countries observe higher growth rates of labor productivity in agriculture and manufacturing than in services. In addition, countries with high rates of aggregate productivity growth tend to have much higher productivity growth in agriculture and manufacturing than the United States, but this strong relative performance is not observed in services. Countries with low rates of aggregate labor productivity growth tend to observe low labor productivity growth in all sectors. We develop a general equilibrium model of the structural transformation with three sectors—agriculture, industry, and services. Following Rogerson (2008), labor reallocation across sectors is driven by two channels: income effects due to nonhomothetic preferences and substitution effects due to differential productivity growth across sectors.6 We calibrate the model to the structural transformation of the United States between 1956 and 2004. A model of the structural transformation is essential for the purpose of this paper for two reasons. First, we use the calibrated model to measure sectoral productivity differences across countries at one point in time. 
This step is needed because of the lack of comparable (PPP-adjusted) sectoral output data across a large set of countries. Second, the process of structural transformation is endogenous to the level and changes over time in sectoral labor productivity. As a result, a quantitative assessment of the aggregate implications of sectoral productivity differences requires that

5. Our sample does not include the poorest countries in the world: the labor productivity ratio between the richest and poorest countries in our data is only 10:1.
6. For recent models of the structural transformation emphasizing nonhomothetic preferences, see Kongsamut, Rebelo, and Xie (2001), and emphasizing substitution effects see Ngai and Pissarides (2007).


changes in the distribution of labor across sectors be consistent with sectoral productivity paths.7 The model implies that sectoral productivity levels in the first year in the sample tend to be lower in poor than in rich countries, particularly in agriculture and services, and the model implies low dispersion in productivity levels in manufacturing across countries. We argue that these differences in sectoral labor productivity levels implied by the model are consistent with the available evidence from studies using producer and micro data for specific sectors, for instance, Baily and Solow (2001) for manufacturing and service sectors and Restuccia, Yang, and Zhu (2008) for agriculture. These productivity levels together with data on sectoral labor productivity growth for each country imply time paths for sectoral productivity. Given these time paths, the model reproduces the broad patterns of labor reallocation and aggregate productivity growth across countries. The model also has implications for sectoral output and relative prices that are broadly consistent with the cross-country data. This paper is related to a large literature studying income differences across countries. 
Closely connected is the literature studying international income differences in the context of models with delay in the start of modern growth.8 Because countries in our data set have started the process of structural transformation well before the first year in the sample period, our focus is on measuring sectoral productivity across countries at a point in time and on assessing the role of their movement over time in accounting for the patterns of structural transformation and aggregate productivity growth across countries.9 Our paper is also closely related to a literature that emphasizes the sectoral composition of the economy in aggregate outcomes, for instance, Caselli and Coleman (2001), Córdoba and Ripoll (2004), Coleman (2007), Chanda and Dalgaard (2008), Restuccia, Yang, and Zhu (2008), Adamopoulos and Akyol (2009), and Vollrath (2009).10 In studying the role of the structural transformation for cross-country aggregate productivity catch-up, our paper is closest to that of Caselli

7. This is in sharp contrast to the widely followed shift-share analysis approach where aggregate productivity changes are decomposed into productivity changes within sectors and labor reallocation.
8. See, for instance, Lucas (2000), Hansen and Prescott (2002), Ngai (2004), and Gollin, Parente, and Rogerson (2002).
9. Herrendorf and Valentinyi (2006) also consider a model to measure sectoral productivity levels across countries but instead use expenditure data from the Penn World Table.
10. See also the survey article by Caselli (2005) and the references therein.

134

QUARTERLY JOURNAL OF ECONOMICS

and Tenreyro (2006). We differ in that we use a model of the structural transformation to measure sectoral productivity levels and to assess the contribution of sectoral productivity for aggregate growth. In studying labor productivity over time, our paper is related to a literature studying country episodes of slowdown and depression.11 Most of this literature focuses on the effect of exogenous movements in aggregate total factor productivity and aggregate distortions on GDP relative to trend. We differ from this literature by emphasizing the importance of sectoral productivity in the structural transformation and the secular movements in relative GDP per hour across countries. The paper is organized as follows. In the next section we document some facts about the process of structural transformation and sectoral labor productivity growth across countries. Section III describes the economic environment and calibrates a benchmark economy to U.S. data for the period between 1956 and 2004. In Section IV we discuss the quantitative experiment and perform counterfactual analysis. We conclude in Section V.

II. SOME FACTS

In this section we document the process of structural transformation and labor productivity growth in agriculture, industry, and services for the countries in our data set. Because we focus on long-run trends, data are trended using the Hodrick–Prescott filter with a smoothing parameter λ = 100. The Appendix provides a detailed description of the data.

II.A. The Process of Structural Transformation

The reallocation of labor across sectors over time is typically referred to in the economic development literature as the process of structural transformation. This process has been extensively documented.12 The structural transformation is characterized by a systematic fall over time in the share of labor allocated to agriculture, by a steady increase in the share of labor in services, and by a hump-shaped pattern for the share of labor in manufacturing. That is, the typical process of sectoral reallocation involves an increase in the share of labor in manufacturing in the early stages of the reallocation process, followed by a decrease in the later stages.13

We document the processes of structural transformation in our data set by focusing on the distribution of labor hours across sectors. We note, however, that this characterization is very similar to the one obtained by looking at shares of employment. Our panel data cover countries at very different stages in the process of structural transformation. For instance, our data include countries that in 1960 allocated about 70% of their labor hours to agriculture (e.g., Turkey and Bolivia), as well as countries that in the same year had shares of hours in agriculture below 10% (e.g., the United Kingdom). Despite this diversity, all countries in the sample follow a common process of structural transformation. First, all countries exhibit declining shares of hours in agriculture, even the most advanced countries in this process, such as the United Kingdom and the United States. Second, countries at an early stage of the process of structural transformation exhibit a hump-shaped share of hours in industry, whereas this share is decreasing for countries at a more advanced stage. Finally, all countries exhibit an increasing share of hours in services. To illustrate these features, Figure II plots sectoral shares of hours for Greece, Ireland, Spain, and Canada.

The processes of structural transformation observed in our sample suggest two additional observations. First, the lag in the structural transformation observed across countries is systematically related to the level of development: poor countries have the largest shares of hours in agriculture, while rich countries have the smallest shares.14 Second, our data suggest the basic tendency for countries that start the process of structural transformation later to accomplish a given amount of labor reallocation faster than those countries that initiated this process earlier.15

11. See Kehoe and Prescott (2002) and the references therein.
12. See, for instance, Kuznets (1966) and Maddison (1980), among others.
13. In this paper we refer to manufacturing and industry interchangeably. In the Appendix we describe in detail our definition of sectors in the data.
14. See, for instance, Gollin, Parente, and Rogerson (2007) and Restuccia, Yang, and Zhu (2008) for a detailed documentation of this fact for shares of employment across a wider set of countries.
15. According to the U.S. Census Bureau (1975), Historical Statistics of the United States, the distribution of employment in the United States circa 1870 resembles that of Portugal in 1950. By 1948 the sectoral shares in the United States were 0.10, 0.34, and 0.56, levels that Portugal reached sometime during the 1990s. Although Portugal is lagging behind the process of structural transformation of the United States, it has accomplished about the same reallocation of labor across sectors in less than half the time (39 years as opposed to 89 years in the United States). See Duarte and Restuccia (2007) for a detailed documentation of these observations.


FIGURE II Shares of Hours—Some Countries

II.B. Sectoral Labor Productivity Growth

For the United States, the annualized growth rate of labor productivity between 1956 and 2004 has been highest in agriculture (3.8%), second in industry (2.4%), and lowest in services (1.3%).16 This ranking of growth rates of labor productivity across sectors is observed in 23 of the 29 countries in our sample, and in all countries but Venezuela, the growth rate in services is the smallest. Nevertheless, there is an enormous variation in sectoral labor productivity growth across countries. Figure III plots the annualized growth rate of labor productivity in each sector against the annualized growth rate of aggregate labor productivity for all countries in our data set. The sectoral growth rate of the United States in each panel is identified by the horizontal dashed line, whereas the vertical dashed line marks the growth rate of aggregate productivity of the United States. This figure documents the tendency for countries to feature higher growth rates of labor productivity in agriculture and manufacturing than in services. For instance, in our panel, the average growth rates in agriculture and manufacturing are 4.0% and 3.1%, whereas the average growth rate in services is 1.3%. Figure III also illustrates that countries with low aggregate labor productivity growth relative to the United States tend to have low productivity growth in all sectors (e.g., Latin American countries), whereas countries with high relative aggregate labor productivity growth tend to have higher productivity growth than the United States in agriculture and, especially, industry (e.g., European countries, Japan, and Korea). For the countries that grew faster than the United States in aggregate productivity, labor productivity growth exceeds that for the United States by, on average, 1 percentage point in agriculture and 1.5 percentage points in industry. In contrast, labor productivity growth in services for these countries exceeds that for the United States by only 0.4 percentage point. The fact is that few countries have observed a much higher growth rate of labor productivity in services than the United States. These features of the data motivate some of the counterfactual exercises we perform in Section IV.

FIGURE III
Sectoral Growth Rates of Labor Productivity (%)
Horizontal axis: annualized growth rate of aggregate labor productivity. Aggregate labor productivity is GDP per hour, whereas sectoral labor productivity is value added per hour in each sector. Annualized percentage growth rates during the sample period are given for each country. The horizontal lines indicate the sectoral growth rates observed in the United States, and the vertical line indicates the aggregate growth rate of the United States.

16. The annualized percentage growth rate of variable $x$ over the period $t$ to $t + T$ is computed as $[(x_{t+T}/x_t)^{1/T} - 1] \times 100$.
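The annualized growth rate formula in footnote 16 can be sketched in a few lines of Python. This is only an illustration: the function name and the sample series are hypothetical and are not taken from the paper's data.

```python
def annualized_growth(x_start: float, x_end: float, years: int) -> float:
    """Annualized percentage growth rate: [(x_end / x_start)^(1/years) - 1] * 100."""
    if x_start <= 0 or x_end <= 0 or years <= 0:
        raise ValueError("levels and the number of years must be positive")
    return ((x_end / x_start) ** (1.0 / years) - 1.0) * 100.0


# Hypothetical series: a productivity index growing 2% a year for 48 years,
# the length of the 1956-2004 sample window.
level_1956 = 100.0
level_2004 = level_1956 * 1.02 ** 48

print(round(annualized_growth(level_1956, level_2004, 48), 2))
```

Because the formula uses only the first and last observations, it recovers the constant rate that would link the two endpoints, whatever path the series followed in between.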

III. ECONOMIC ENVIRONMENT

We develop a simple model of the structural transformation of an economy where at each date three goods are produced: agriculture, industry, and services. Following Rogerson (2008), labor reallocation across sectors is driven by two forces: an income effect due to nonhomothetic preferences and a substitution effect due to differential productivity growth between industry and services. We calibrate a benchmark economy to U.S. data and show that this basic framework captures the salient features of the structural transformation in the United States from 1956 to 2004.

III.A. Description

Production. At each date three goods are produced, agriculture (a), manufacturing (m), and services (s), according to the following constant-returns-to-scale production functions:

$$Y_i = A_i L_i, \qquad i \in \{a, m, s\}, \tag{1}$$

where $Y_i$ is output in sector $i$, $L_i$ is labor input in sector $i$, and $A_i$ is a sector-specific technology parameter.17 When mapping the model to data, we associate the labor input $L_i$ with hours allocated to sector $i$. We assume that there is a continuum of homogeneous firms in each sector that are competitive in goods and factor markets. At each date, given the price of good-$i$ output $p_i$ and wages $w$, a representative firm in sector $i$ solves

$$\max_{L_i \ge 0} \{ p_i A_i L_i - w L_i \}. \tag{2}$$

17. We note that labor productivity in each sector is summarized in the model by the productivity parameter $A_i$. There are many features that can explain differences over time and across countries in labor productivity such as capital intensity and factor endowments. Accounting for these sources can provide a better understanding of labor productivity facts. Our analysis abstracts from the sources driving labor productivity observations.

Households. The economy is populated by an infinitely lived representative household of constant size. Without loss of generality we normalize the population size to one. The household is endowed with $L$ units of time each period, which are supplied inelastically to the market. We associate $L$ with total hours per capita in the data. The household has preferences over consumption goods as follows:

$$\sum_{t=0}^{\infty} \beta^t u(c_{a,t}, c_t), \qquad \beta \in (0, 1),$$

where $c_{a,t}$ is the consumption of agricultural goods at date $t$ and $c_t$ is the consumption of a composite of manufacturing and service goods at date $t$. The per-period utility is given by

$$u(c_{a,t}, c_t) = a \log(c_{a,t} - \bar{a}) + (1 - a) \log(c_t), \qquad a \in [0, 1],$$

where $\bar{a} > 0$ is a subsistence level of agricultural goods below which the household cannot survive. This feature of preferences has a long tradition in the development literature and it has been emphasized as a quantitatively important feature leading to the movement of labor away from agriculture in the process of structural transformation.18 The composite nonagricultural consumption good $c_t$ is given by

$$c_t = \left[ b c_{m,t}^{\rho} + (1 - b)(c_{s,t} + \bar{s})^{\rho} \right]^{1/\rho},$$

where $\bar{s} > 0$, $b \in (0, 1)$, and $\rho < 1$. For $\bar{s} > 0$, these preferences imply that the income elasticity of service goods is greater than one. We note that $\bar{s}$ works as a negative subsistence consumption level: when the income of the household is low, fewer resources are allocated to the production of services, and when the income of the household increases, resources are reallocated to services. The parameter $\bar{s}$ can also be interpreted as a constant level of production of service goods at home. Our approach to modeling the home sector for services is reduced-form. Rogerson (2008) considers a generalization of this feature where people can allocate time to market and nonmarket production of service goods. However, we argue that our simplification is not as restrictive as it may first appear, because we abstract from the allocation of time between market and nonmarket activities. Our focus is on the determination of aggregate productivity from the allocation of time across market sectors.

Because we abstract from intertemporal decisions, the problem of the household is effectively a sequence of static problems.19 At each date and given prices, the household chooses consumption of each good to maximize the per-period utility subject to the budget constraint. Formally,

$$\max_{c_i \ge 0} \; a \log(c_a - \bar{a}) + (1 - a) \frac{1}{\rho} \log\left[ b c_m^{\rho} + (1 - b)(c_s + \bar{s})^{\rho} \right], \tag{3}$$

subject to

$$p_a c_a + p_m c_m + p_s c_s = wL.$$

Market Clearing. The demand for labor from firms must equal the exogenous supply of labor by households at every date:

$$L_a + L_m + L_s = L. \tag{4}$$

Also, at each date, the market for each good produced must clear:

$$c_a = Y_a, \qquad c_m = Y_m, \qquad c_s = Y_s. \tag{5}$$

18. See, for instance, Echevarria (1997), Laitner (2000), Caselli and Coleman (2001), Kongsamut, Rebelo, and Xie (2001), Gollin, Parente, and Rogerson (2002), and Restuccia, Yang, and Zhu (2008).

19. Because we are abstracting from intertemporal decisions such as investment, our analysis is not crucially affected by alternative stochastic assumptions on the time path for labor productivity.

III.B. Equilibrium

A competitive equilibrium is a set of prices $\{p_a, p_m, p_s\}$, allocations $\{c_a, c_m, c_s\}$ for the household, and allocations $\{L_a, L_m, L_s\}$ for firms such that (i) given prices, firm's allocations $\{L_a, L_m, L_s\}$ solve the firm's problem in (2); (ii) given prices, household's allocations $\{c_a, c_m, c_s\}$ solve the household's problem in (3); and (iii) markets clear: equations (4) and (5) hold.

The first-order condition from the firm's problem implies that the benefit and cost of a marginal unit of labor must be equal. Normalizing the wage rate to one, this condition implies that prices of goods are inversely related to productivity:

$$p_i = \frac{1}{A_i}. \tag{6}$$

Note that in the model, price movements are driven solely by labor productivity changes. The first-order conditions for consumption imply that the labor input in agriculture is given by

$$L_a = (1 - a) \frac{\bar{a}}{A_a} + a \left( L + \frac{\bar{s}}{A_s} \right). \tag{7}$$

When $a = 0$, the household consumes $\bar{a}$ of agricultural goods each period, and labor allocation in agriculture depends only on the level of labor productivity in that sector. When productivity in agriculture increases, labor moves away from the agricultural sector. This restriction on preferences implies that output and consumption per capita of agricultural goods are constant over time, implications that are at odds with data. When $a > 0$ and productivity growth is positive in all sectors, the share of labor allocated to agriculture converges asymptotically to $a$ and the nonhomothetic terms in preferences become asymptotically irrelevant in the determination of the allocation of labor. In this case, output and consumption per capita of agricultural goods grow at the rate of labor productivity. The first-order conditions for consumption of manufacturing and service goods imply that

$$\frac{b}{1 - b} \left( \frac{c_m}{c_s + \bar{s}} \right)^{\rho - 1} = \frac{p_m}{p_s}.$$

This equation can be rewritten as

$$L_m = \frac{(L - L_a) + \bar{s}/A_s}{1 + x}, \tag{8}$$

where

$$x \equiv \left( \frac{b}{1 - b} \right)^{1/(\rho - 1)} \left( \frac{A_m}{A_s} \right)^{\rho/(\rho - 1)},$$


and $L_a$ is given by (7).20 Equation (8) reflects the two forces that drive labor reallocation between manufacturing and services in the model. First, suppose that preferences are homothetic (i.e., $\bar{s} = 0$). In this case, $L_m/L_s = 1/x$ and differential productivity growth in manufacturing relative to services is the only source of labor reallocation between these sectors (through movements in $x$) as long as $\rho$ is not equal to zero. In particular, when $\bar{s} = 0$, the model can be consistent with the observed labor reallocation from manufacturing into services as labor productivity grows in the manufacturing sector relative to services if the elasticity of substitution between these goods is low ($\rho < 0$). Second, suppose that $\bar{s} > 0$ (i.e., preferences are nonhomothetic) and that either labor productivity grows at the same rate in manufacturing and services, or $\rho = 0$, so that $x$ is constant. Then, for a given $L_a$, productivity improvements lead to the reallocation of labor from manufacturing into services (services are more income-elastic). The model allows both channels to be operating during the structural transformation.

III.C. Calibration

We calibrate a benchmark economy to U.S. data for the period from 1956 to 2004. Our calibration strategy involves selecting parameter values so that the equilibrium of the model matches the salient features of the structural transformation for the United States during this period. We assume that a period in the model is one year. We need to select parameter values for $a$, $b$, $\rho$, $\bar{a}$, $\bar{s}$, and the time series of productivity for each sector $A_{i,t}$ for $t$ from 1956 to 2004 and $i \in \{a, m, s\}$. We proceed as follows. First, we normalize productivity levels across sectors to one in 1956; that is, $A_{i,1956} = 1$ for all $i \in \{a, m, s\}$. Then we use data on the growth rate of sectoral value added per hour in the United States to obtain the time paths of sectoral labor productivity.
In particular, denoting as $\gamma_{i,t}$ the growth rate of labor productivity in sector $i$ at date $t$, we obtain the time path of labor productivity in each sector as $A_{i,t+1} = (1 + \gamma_{i,t}) A_{i,t}$. Second, with positive productivity growth in all sectors, the share of hours in agriculture converges to $a$ in the long run. Because the share of hours in agriculture has been falling systematically and was about 3% in 2004, we assume a long-run share of 1%. Although this target is somewhat arbitrary, our main results are not sensitive to this choice. Third, given values for $\rho$ and $b$, $\bar{a}$ and $\bar{s}$ are chosen to match the shares of hours in agriculture and manufacturing in the United States in 1956 using equations (7) and (8). Finally, $b$ and $\rho$ are jointly chosen to match as closely as possible the share of hours in manufacturing over time and the annualized growth rate of aggregate productivity. The annualized growth rate in labor productivity in the United States between 1956 and 2004 is roughly 2%. Table I summarizes the calibrated parameters and targets.

TABLE I
PARAMETER VALUES AND U.S. DATA TARGETS

Parameter                       Value    Target
$A_{i,1956}$                    1.0      Normalization
$\{A_{a,t}\}_{t=1957}^{2004}$   {·}      Productivity growth in agriculture
$\{A_{m,t}\}_{t=1957}^{2004}$   {·}      Productivity growth in industry
$\{A_{s,t}\}_{t=1957}^{2004}$   {·}      Productivity growth in services
$a$                             0.01     Long-run share of hours in agriculture
$\bar{a}$                       0.11     Share of hours in agriculture 1956
$\bar{s}$                       0.89     Share of hours in industry 1956
$b$                             0.04     Share of hours in industry 1957–2004
$\rho$                          −1.5     Aggregate productivity growth

The shares of hours implied by the model are reported in Figure IV (dotted lines), together with data on the shares of hours in the United States (solid lines). The equilibrium allocation of hours across sectors in the model closely matches the process of structural transformation in the United States during the calibrated period. The model implies a fall in the share of hours in manufacturing from about 39% in 1956 to 24% in 2004, whereas the share of hours in services increases from about 49% to 73% during this period.21 Notice that even though the calibration only targets the share of hours in agriculture in 1956 (13%), the model implies a time path for the equilibrium share of hours in agriculture that is remarkably close to the data, declining to about 3% in 2004.

20. When the growth rates of sectoral labor productivity are positive, the model implies that, in the long run, the share of hours in manufacturing and services asymptote to constants that depend on preference parameters $a$, $b$, $\rho$ and any permanent level difference in labor productivity between manufacturing and services. If productivity growth in manufacturing is higher than in services, then the share of hours in manufacturing asymptotes to 0 and the share of hours in services to $(1 - a)$.
21. We emphasize that the model can deliver a hump-shaped pattern for labor in manufacturing for less developed economies even though during the calibrated period the U.S. economy is already in the second stage of the structural transformation, whereby labor is being reallocated away from manufacturing.
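As a rough check of equations (7) and (8) against the parameter values in Table I, the sketch below evaluates the model's labor shares in 1956 and 2004. It rests on two assumptions that are not stated in the text: total hours are normalized to L = 1, and the year-by-year productivity series are replaced by the constant annualized U.S. growth rates quoted in Section II.B (3.8%, 2.4%, and 1.3%), so the 2004 values are only approximate. All function and variable names are illustrative.

```python
# Calibrated preference parameters from Table I.
A_SHARE = 0.01   # a: long-run share of hours in agriculture
A_BAR = 0.11     # a-bar: subsistence level of agricultural goods
S_BAR = 0.89     # s-bar: nonhomothetic term for services
B = 0.04         # b: weight on manufacturing in the composite good
RHO = -1.5       # rho: governs substitution between manufacturing and services
L_TOTAL = 1.0    # assumption: total hours normalized to one


def labor_shares(A_a, A_m, A_s):
    """Return (L_a, L_m, L_s) from equations (7), (8), and (4)."""
    L_a = (1 - A_SHARE) * A_BAR / A_a + A_SHARE * (L_TOTAL + S_BAR / A_s)
    x = (B / (1 - B)) ** (1 / (RHO - 1)) * (A_m / A_s) ** (RHO / (RHO - 1))
    L_m = ((L_TOTAL - L_a) + S_BAR / A_s) / (1 + x)
    L_s = L_TOTAL - L_a - L_m
    return L_a, L_m, L_s


def productivity_path(growth_rates, A_0=1.0):
    """Build A_{t+1} = (1 + gamma_t) * A_t starting from A_0."""
    path = [A_0]
    for gamma in growth_rates:
        path.append((1 + gamma) * path[-1])
    return path


# 1956: productivity is normalized to one in every sector.
shares_1956 = labor_shares(1.0, 1.0, 1.0)

# 2004: assumption of constant growth at the annualized U.S. rates
# (agriculture 3.8%, industry 2.4%, services 1.3%) for 48 years.
A_a = productivity_path([0.038] * 48)[-1]
A_m = productivity_path([0.024] * 48)[-1]
A_s = productivity_path([0.013] * 48)[-1]
shares_2004 = labor_shares(A_a, A_m, A_s)

for year, shares in (("1956", shares_1956), ("2004", shares_2004)):
    print(year, [round(s, 3) for s in shares])
```

Under these assumptions the 1956 shares come out near 13%, 39%, and 49%, and the 2004 shares near 3%, 25%, and 72%, which is close to the figures reported above even though the true calibration feeds in annual growth rates rather than a constant average.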


FIGURE IV Share of Hours by Sector—Model vs. U.S. Data

The model also has implications for sectoral output and for relative prices. Sectoral output is given by labor productivity times labor input. Because the model matches closely the time path of sectoral labor allocation for the U.S. economy, the output implications of the model over time for the United States are very close to the data. In particular, the model implies that output growth in agriculture is 2.08% per year (versus 2.29% in the data), whereas output growth in manufacturing and services in the model is 2.74% and 3.60% (versus 2.70% and 3.61% in the data). The model implies that the producer price of good $i$ relative to good $i'$ is given by the ratio of labor productivity in these sectors:

$$\frac{p_i}{p_{i'}} = \frac{A_{i'}}{A_i}. \tag{9}$$
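Equation (9) implies that a relative price grows at roughly the gap between the two sectors' productivity growth rates. A minimal sketch follows, using the full-sample annualized U.S. growth rates from Section II.B as illustrative inputs. Those rates cover 1956 to 2004, so the output is not meant to reproduce the 1971 to 2004 price changes discussed in the text, and the function name is hypothetical.

```python
def relative_price_growth(gamma_i: float, gamma_j: float) -> float:
    """Annual percentage growth of p_i / p_j when p = 1 / A in each sector.

    From equation (9), p_i / p_j = A_j / A_i, so the ratio is multiplied
    by (1 + gamma_j) / (1 + gamma_i) each year.
    """
    return ((1 + gamma_j) / (1 + gamma_i) - 1) * 100


# Illustrative inputs: U.S. annualized productivity growth, 1956-2004.
GAMMA = {"agriculture": 0.038, "industry": 0.024, "services": 0.013}

services_vs_industry = relative_price_growth(GAMMA["services"], GAMMA["industry"])
agriculture_vs_industry = relative_price_growth(GAMMA["agriculture"], GAMMA["industry"])

print(f"services / industry:    {services_vs_industry:+.2f}% per year")
print(f"agriculture / industry: {agriculture_vs_industry:+.2f}% per year")
```

The signs follow directly from the ranking of growth rates: the sector with slower productivity growth (services) becomes relatively more expensive, and the sector with faster growth (agriculture) becomes relatively cheaper.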

We assess the price implications of the model against data on sectoral relative prices.22 The model implies that the producer price of services relative to industry increases by 0.94% per year between 1971 and 2004, very close to the increase in the data for the relative price of services from the implicit price deflators (0.87% per year). The price of agriculture relative to manufacturing declines in the model at a rate of 1.04% per year from 1971 to 2004. This fall in the relative price of agriculture is consistent with the data, although the relative price of agriculture falls somewhat more in the data (3.12% per year) than in the model.23 Because productivity growth across sectors is the driving force in the model, it is reassuring that this mechanism generates implications that are broadly consistent with the data. For this reason, we also discuss the relative price implications of the model when assessing the relevance of sectoral productivity growth for labor reallocation in the cross-country data in Section IV.

IV. QUANTITATIVE ANALYSIS

In this section, we assess the quantitative effect of sectoral labor productivity on the structural transformation and aggregate productivity outcomes across countries. In this analysis we maintain preference parameters as in the benchmark economy and proceed in three steps. First, we use the model to restrict the level of sectoral labor productivity in the first period for each country. Second, using these levels and data on sectoral labor productivity growth in each country as the exogenous time-varying factors, the model implies time paths for the allocation of hours across sectors and aggregate labor productivity for each country. We assess the cross-country implications of the model with data for labor reallocation across sectors, aggregate productivity, and relative prices. Third, we perform counterfactual exercises to assess the quantitative importance of sectoral analysis in explaining aggregate productivity experiences across countries.

22. Data for sectoral relative prices are available from 1971 to 2004. See the Appendix for details.
23. We note that in the context of our model distortions to the price of agriculture would not substantially affect the equilibrium allocation of labor in agriculture because this is mainly determined by labor productivity in agriculture relative to the subsistence constraint ($a$ is close to zero in the calibration). In this context, it would be possible to introduce price distortions to match the faster decline in the relative price of agriculture in the data without affecting our main quantitative results.

IV.A. Relative Sectoral Productivity Levels

We use the model to restrict the levels of labor productivity in agriculture, industry, and services relative to those in the


United States for the first year in the sample for each country. This step is needed because of the lack of comparable (PPP-adjusted) sectoral output data across a large set of countries. Because our data on sectoral value added are in constant local currency units, some adjustment is needed. Using market exchange rates would be problematic for arguments well discussed in the literature, such as Summers and Heston (1991). Another approach would be to apply the national currency shares of value added to the PPP-adjusted measure of real aggregate output from the Penn World Tables (PWT). This is problematic because it assumes that the PPP-conversion factor for aggregate output applies to all sectors in that country, whereas there is strong evidence that the PPP-conversion factors differ systematically across sectors in development.24 Using detailed categories from the International Comparisons Program (ICP) benchmark data in the PWT would also be problematic for inferences at the sector level because these data are based on the expenditure side of national accounts. For instance, it would not be advisable to use food expenditures and their PPP-conversion factor to adjust units of agricultural output across countries because food expenditures include charges for goods and services not directly related to agricultural production.

Our approach is to use the model to back out sector-specific PPP-conversion factors for each country and to use the constant-price value-added data in local currency units to calculate growth rates of labor productivity in each sector for each country. In particular, we use the model to restrict productivity levels in the initial period and use the data on growth rates of labor productivity to construct the time series for productivity that we feed into the model. The underlying assumption is that the growth rate of value added in constant domestic prices is a good measure of real changes in output. This approach of using growth rates as a measure of changes in “quantities” is similar to the approach followed in the construction of panel data of comparable output across countries, such as the PWT.25

24. See, for instance, the evidence on agriculture relative to nonagriculture in Restuccia, Yang, and Zhu (2008).
25. In particular, in the PWT, the growth rates of expenditure categories such as consumption and investment are the growth rates of constant domestic price expenditures from national accounts.

We proceed as follows. For each country $j$, we choose the three labor productivity levels $A_a^j$, $A_m^j$, and $A_s^j$ to match three targets


FIGURE V Relative Labor Productivity across Sectors—First Year Labor productivity relative to the level of the United States.

from the data in the first year in the sample: (1) the share of hours in agriculture, (2) the share of hours in manufacturing (therefore the model matches the share of hours in services by labor market clearing), and (3) aggregate labor productivity relative to that of the United States.26

26. We adjust $\bar{s}$ by the level of relative productivity in services in the first period for each country so that $\bar{s}/A_s$ is constant across countries in the first period of the sample. Although it is not modeled explicitly, one interpretation of $\bar{s}$ is as service goods produced at home. Therefore, $\bar{s}$ cannot be invariant to large changes in productivity levels in services.

Figure V plots the average level of sectoral labor productivity relative to the level of the United States for countries in each quintile of aggregate productivity in the first year. The model implies that relative sectoral productivity in the first year tends to be lower in poorer countries than in richer countries, but particularly so in agriculture and services. In fact, the model implies


FIGURE VI Relative Labor Productivity across Sectors—First and Last Years Labor productivity relative to the level of the United States.

that the dispersion of relative productivity in agriculture and services is much larger than in manufacturing. In the first year, the six poorest countries have relative productivity in agriculture and services of around 20% and 10%, whereas the six richest countries have relative productivity in these sectors of around 86% and 84%. In contrast, for manufacturing, average relative productivity of the six poorest countries in the first year is 31% and that of the six richest countries is 70%. The levels of sectoral labor productivity implied by the model for the first year, together with data on growth rates of sectoral value added per hour in local currency units, imply time paths for sectoral labor productivity in each country. In particular, letting $\gamma_{i,t}^j$ denote the growth rate of labor productivity in country $j$, sector $i$, at date $t$, we obtain sectoral productivity as $A_{i,t+1}^j = (1 + \gamma_{i,t}^j) A_{i,t}^j$. Figure VI plots the average level of sectoral labor productivity relative to the level in the United States in the first and last years


for countries in each quintile of aggregate productivity in the first year. We note that, on average, countries have experienced substantial gains in productivity in agriculture and industry relative to the United States (from an average relative productivity level of 48% and 51% in the first period to 71% and 75% in the last period). In sharp contrast, countries experienced, on average, much smaller gains in productivity in services relative to the United States (from an average relative productivity level of 46% to 49%). These features are particularly pronounced for countries in the top three quintiles of the productivity distribution. For these countries, average relative labor productivity in agriculture and industry increased from 66% and 59% to 100% and 85%, whereas average productivity in services increased from 63% to only 66%. We emphasize that the low levels of relative productivity in services in the first period together with the lack of catch-up over time imply that, for most countries, relative productivity levels in services are lower than those in agriculture and industry at the end of the sample period. Therefore, as these economies allocate an increasing share of hours to services, low relative labor productivity in this sector dampens aggregate productivity growth. These relative productivity patterns are suggestive of the results we discuss in Section IV.C, where we show that productivity catchup in industry explains a large portion of the gains in aggregate productivity across countries. In addition, we show that low relative productivity levels in services and the lack of catch-up play a quantitatively important role in explaining the growth episodes of slowdown, stagnation, and decline in aggregate productivity across countries. We argue that our productivity-level results are consistent with the available evidence from studies using producer and micro data. 
Empirical studies provide internationally comparable measures of labor productivity for some sectors and some countries. These studies typically provide estimates for narrow sectoral definitions at a given point in time. One such study for agriculture is from the Food and Agriculture Organization (FAO) of the United Nations. This study uses producer data (prices of detailed categories at the farm gate) to calculate international prices and comparable measures of output in agriculture using a procedure similar to that of Summers and Heston (1991) for the construction of the PWT. We find that the labor productivity differences in agriculture implied by the model are qualitatively consistent with the differences in GDP per worker in agriculture between


rich and poor countries from the FAO for 1985.²⁷ Baily and Solow (2001) have compiled a number of case studies from the McKinsey Global Institute (MGI) documenting labor productivity differences in some sectors and countries. Their findings are broadly consistent with our results. In particular, Baily and Solow emphasize a pattern that emerges from the micro studies where productivity differences across countries in services are not only large but also larger than the differences for manufacturing. The Organization for Economic Cooperation and Development (OECD) and MGI provide studies at different levels of sectoral disaggregation for manufacturing. These studies report relative productivity for a relatively small set of countries, and most studies report estimates only at one point in time. One exception is Pilat (1996). This study reports relative labor productivity levels in manufacturing for 1960, 1973, 1985, and 1995 for thirteen countries. Although the implied relative labor productivity levels in industry in our model tend to be higher than those reported in this study, the patterns of relative productivity are consistent for most countries. Finally, consistent with our findings, several studies report that the United States has higher levels of labor productivity in service sectors than other developed countries and that lower labor productivity in service sectors compared to manufacturing is pervasive.²⁸

27. See Restuccia, Yang, and Zhu (2008) for a detailed documentation of the cross-country differences in labor productivity in agriculture.
28. Baily, Farrell, and Remes (2005), for instance, estimate that, relative to the United States, France and Germany had lower relative productivity levels in 2000 and had lower growth rates of labor productivity between 1992 and 2000 for a set of narrowly defined service sectors, with the exception of mobile telecommunications.

IV.B. The Structural Transformation across Countries

Given paths for sectoral labor productivity, the model has time-series implications for the allocation of labor hours and output across sectors, aggregate labor productivity, and relative prices for each country. In this section we evaluate the implications of the model against the available cross-country data. Overall, the model reproduces the salient features of the structural transformation and aggregate productivity across countries. Figures VII and VIII illustrate this performance.

FIGURE VII Model vs. Data across Countries—Levels in the Last Year Each plot reports the value for each variable in the last period for the model and the data.

Figure VII reports the shares of hours in each sector and relative aggregate productivity in the last period of the sample for each country in the model and in the data. Figure VIII reports the change in these variables (in percentage points) between the last and first periods in the model and in the data. As these figures illustrate, the model replicates well the patterns of the allocation of hours across sectors and relative aggregate productivity observed in the data, particularly so for the share of hours in agriculture and relative aggregate productivity. This performance attests to the ability of the model to replicate the basic trends observed for the share of hours in agriculture across a large sample of countries. Regarding the share of hours in industry, the model tends to imply a smaller increase over time compared to the data, particularly for less developed economies where the share of hours in industry increased over the sample period. Conversely, the model tends to imply a larger increase in the share of hours in services over the sample period than that observed in the data. This implication of the model suggests that, especially for some less developed countries, distortions or frictions in labor reallocation between industry and services may be important in accounting for their

FIGURE VIII Model vs. Data across Countries—Changes Each plot reports the change between the last and first period (in percentage points) of each variable during the sample period in the data and in the model.

structural transformation.²⁹ As a summary statistic for the performance of the model in replicating the time-series properties of the data, we compute the average absolute deviation (over time and across countries) in percentage points (p.p.) between a given time series in the model and in the data.³⁰ The average absolute deviations for the shares of hours in agriculture, industry, and services are 2, 6, and 7 p.p., respectively, and 4 p.p. for relative aggregate productivity. We conclude that the model captures the bulk of the labor reallocation and aggregate productivity experiences across countries.

29. Although in most cases the model does well in reproducing the time series in the data, in some countries modifications to the simple model would be required in order to better account for the process of structural transformation and aggregate productivity growth; see Duarte and Restuccia (2007) for an application of wedges across sectors in Portugal. These richer environments, however, would require country-specific analysis. We instead maintain our simple model specification and leave these interesting country-specific experiences for future research.
30. We measure the average absolute deviation in percentage points between the time series in the model and the data across countries as
$$\Upsilon = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{T_j} \sum_{t=1}^{T_j} \left| x^{d}_{j,t} - x^{m}_{j,t} \right| \times 100,$$
where $j$ is the country index and $T_j$ is the sample size for country $j$.

To better understand our finding about aggregate productivity, recall that aggregate labor productivity is the sum of labor productivity in each sector weighted by the share of labor in that sector, that is,
$$\frac{Y}{L} = \sum_{i \in \{a,m,s\}} \frac{Y_i}{L_i} \frac{L_i}{L}.$$
As a result, the behavior of aggregate productivity arises from the behavior of sectoral labor productivity and the allocation of labor across sectors over time.³¹ Because the model reproduces the salient features of labor reallocation across countries, aggregate productivity growth in the model is also broadly consistent with the cross-country data.

The model has implications for sectoral output in each country. Sectoral output is given by the product of labor productivity and labor hours. As a result, the growth rate of output in sector $i$ is the sum of the growth rates of labor productivity $A_i$ (which we take from the data) and the growth in labor hours $L_i$. The fact that the model reproduces well the cross-country patterns of the structural transformation implies that sectoral output growth is also well captured by the model.

The model also has implications for levels and changes over time in relative prices across countries. We first discuss the implications for changes in relative prices. Figure IX plots the annualized percentage change in the producer prices of agriculture and services relative to manufacturing in the model and in the data. The figure shows that the model captures the broad patterns of price changes in the data: because productivity growth tends to be faster in agriculture than in industry and in industry than in services in most countries, the tendency is for the relative price of agriculture to fall and the relative price of services to increase over time.
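The fit statistic of footnote 30 and the identity for aggregate labor productivity stated above can be illustrated with a minimal Python sketch. The function names and all numbers are hypothetical and are not taken from the paper's data; the statistic is computed as the average across countries of each country's mean absolute gap, which is one reading of the formula in the footnote.

```python
def avg_abs_deviation(data, model):
    """Average absolute deviation (in p.p.) between data and model series.

    `data` and `model` map a country label to a list of values expressed
    as fractions; the series length T_j may differ across countries.
    """
    per_country = []
    for country, x_d in data.items():
        x_m = model[country]
        gaps = [abs(d - m) for d, m in zip(x_d, x_m)]
        per_country.append(sum(gaps) / len(gaps))  # mean over t for country j
    return 100 * sum(per_country) / len(per_country)  # mean over j, in p.p.


def aggregate_productivity(sector_productivity, hours_share):
    """Y/L as the hours-share-weighted sum of sectoral productivity Y_i/L_i."""
    return sum(sector_productivity[i] * hours_share[i] for i in sector_productivity)


# Hypothetical inputs, for illustration only.
data = {"X": [0.30, 0.20], "Z": [0.50, 0.40, 0.30]}
model = {"X": [0.32, 0.19], "Z": [0.47, 0.40, 0.33]}
upsilon = avg_abs_deviation(data, model)

productivity = {"a": 0.5, "m": 1.0, "s": 0.8}  # Y_i / L_i
shares = {"a": 0.2, "m": 0.3, "s": 0.5}        # L_i / L, sums to one
y_over_l = aggregate_productivity(productivity, shares)
```

In this toy example a reallocation of hours toward the sector with the lowest productivity level lowers `y_over_l` even if no sectoral productivity changes, which is the channel the text emphasizes.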
The direction of changes in the relative price of agriculture in the model matches the data for 23 of 29 countries in the sample (80%). For the relative price of services, the model is consistent with the data in 25 countries (86%). We note that in the model, the only factor driving relative price changes over time is the growth in labor productivity across sectors. Of course, many other factors can affect the magnitude of price changes over time, so the model cannot capture all the changes.

31. Note that in the above equation, sectoral labor productivity is measured at a common set of prices across countries. We use the prices of the benchmark economy in 1956.

FIGURE IX Changes in Relative Prices (%) Each figure reports the annualized percentage change of the variable in the time series in the data and in the model. Relative prices of agriculture and services refer to the prices of agriculture and services relative to industry. Data on relative prices cover the period 1971 to 2004.

Now we turn to the implications of the model for price-level differences across countries. Recall that the prices of agriculture and services relative to industry are given by the inverse of labor productivity ($p_a/p_m = A_m/A_a$ and $p_s/p_m = A_m/A_s$). The fact that the dispersion in productivity across rich and poor countries is large in agriculture and services relative to industry implies that the relative prices of agriculture and services are higher in poor than in rich countries. These implications may seem inconsistent at first with conventional wisdom about price-level differences

across countries. We emphasize that this view stems from observations about expenditure prices (often from ICP or PWT data) instead of producer prices. Our model, however, is better characterized as having implications for producer prices across countries. To see why the distinction between producer and expenditure prices is important, consider first the conventional wisdom that food is cheap in poor countries. This observation arises when the PPP-expenditure price of food is compared across countries using market exchange rates. For the sample of countries in Restuccia, Yang, and Zhu (2008), the dollar price of food is 60% higher in rich than in poor countries and the elasticity with respect to GDP per worker is positive and significant at 0.23.³² Food expenditures, however, include distribution and other charges (in the United States, for every dollar of food expenditure, only 20 cents represents payments to the farmer for the agricultural product), and the distinction between producer and expenditure prices may differ systematically across countries. In fact, producer-price data reveal a striking conclusion about the relative price of agriculture across countries: the evidence from FAO and PWT data in Restuccia, Yang, and Zhu (2008) is that the price of agricultural goods relative to nonagricultural goods is much lower in rich than in poor countries (a ratio of 0.22) and the elasticity of this relative price with respect to GDP per worker is negative and statistically significant at −0.34. This evidence is consistent with the price implications of the model for agriculture.³³

Regarding the relative price of services, the conventional wisdom is that the price of services is higher in rich than in poor countries. This view stems again from observations about expenditure prices; see Summers and Heston (1991, pp. 338 and 339). We argue that this evidence is not necessarily inconsistent with the producer-price implications of the model, because the gap between expenditure and producer price-levels may be affected by many factors that can be systematically related to development.³⁴

32. Note, however, that when the price of food is compared relative to the price of all goods, food appears expensive in poor countries. See Summers and Heston (1991, p. 338).
33. The distinction between food and agricultural goods prices is also important for the implications of price changes through time. For example, in the United States, the annualized growth rate of food prices from the Consumer Price Index relative to the price of manufacturing goods is positive, about 1% per year from 1971 to 2005, whereas the growth rate of the price of agriculture relative to manufacturing is negative, at roughly −2.5%.
34. Nevertheless, it is an interesting question for future research to assess the factors explaining higher expenditure price levels of services in rich countries.
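To make the producer-price mechanism concrete: because the relative prices of agriculture and services equal the inverse of relative labor productivity, their changes over time depend only on sectoral productivity growth. The sketch below assumes constant annual growth rates; the function name and the growth rates are hypothetical and are not estimates from the paper.

```python
def relative_price_growth(gamma_m, gamma_i):
    """Annual growth of p_i/p_m when p_i/p_m = A_m/A_i and productivity
    in industry and sector i grows at constant rates gamma_m and gamma_i."""
    return (1 + gamma_m) / (1 + gamma_i) - 1


# Hypothetical growth rates ordered as in the text:
# agriculture faster than industry, industry faster than services.
gamma = {"a": 0.04, "m": 0.03, "s": 0.01}

dp_a = relative_price_growth(gamma["m"], gamma["a"])  # relative price of agriculture
dp_s = relative_price_growth(gamma["m"], gamma["s"])  # relative price of services
```

With this ordering of growth rates, `dp_a` is negative and `dp_s` is positive, matching the tendency described for most countries in the sample.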

Because there are no systematic producer price-level data for services that can be compared with the price implications of the model, we focus instead on the indirect evidence from productivity measurements found in micro studies. The lower relative price of services in rich countries in the model stems from a higher relative productivity in services than in manufacturing compared to poor countries. Thus, we use the available sectoral productivity measurements to indirectly assess the price implications of the model for services. The evidence presented by Baily and Solow (2001) and other OECD studies discussed earlier suggests that labor productivity differences between rich and poor countries in services are larger than those for manufacturing sectors. This evidence is consistent with our productivity findings and therefore indirectly provides some assurance of the price implications of the model for services.

IV.C. Counterfactuals

We construct a series of counterfactuals aimed at assessing the quantitative importance of sectoral labor productivity on the process of structural transformation and aggregate productivity experiences across countries. We focus on two sets of counterfactuals. The first set is designed to illustrate the mechanics of positive sectoral productivity growth for labor reallocation and the contribution of productivity growth differences across sectors and countries for labor reallocation and aggregate productivity. The second set of counterfactuals focuses on explaining aggregate productivity growth experiences of catch-up, slowdown, stagnation, and decline by assessing the contribution of specific cross-country sectoral productivity patterns, such as productivity catch-up in agriculture and industry and low productivity levels and the lack of catch-up in services.

The Mechanics of Sectoral Productivity Growth.
We start by considering counterfactuals where we set the growth rate of labor productivity in one sector to zero in all countries, leaving the remaining growth rates as in the data. These counterfactuals illustrate the importance of productivity growth in each sector for labor reallocation and aggregate productivity. Summary statistics are reported in Figure X and Table II. In Figure X we report, for each country, the change in the time series of the share of hours in each sector and relative aggregate productivity between the last and first periods (in percentage points) in the counterfactual

FIGURE X The Mechanics of Sectoral Productivity Growth Counterfactuals (1) to (3) set the growth rate of labor productivity in a sector to zero in all countries, leaving the other sectors as in the data, for agriculture (first column), industry (second column), and services (third column). Counterfactual (4) sets labor productivity growth in each sector to aggregate productivity growth in the United States. Each panel plots the change between the last and the first period in the time series (in percentage points) of the share of hours in each sector and relative aggregate productivity in the model and in the counterfactual.

and in the model. In Table II we report the average change in the model for all countries, for countries that catch up, and for countries that decline relative to the United States. Consider first the counterfactual for agriculture (γa = 0). No productivity growth in agriculture generates no labor reallocation away from agriculture: there is an average increase in the share of hours in agriculture of 2 p.p. in the counterfactual instead of a decrease of 26 p.p. in the model. As a result, much less labor is reallocated to services. This counterfactual has important negative implications for relative aggregate productivity for most countries regardless of their level

TABLE II
SECTORAL GROWTH, LABOR REALLOCATION, AND AGGREGATE PRODUCTIVITY

                         Change in share of hours        Change in relative
                     Agriculture  Industry  Services   aggregate productivity
All countries
  Model                 −25.5      −10.3      35.8            12.8
  Counterfactual:
  (1) γa = 0              2.1      −13.7      11.6            −0.5
  (2) γm = 0            −25.5        7.3      18.2            −7.0
  (3) γs = 0            −25.2      −11.8      36.9            −2.2
  (4) γi = γ^US         −16.8       −4.7      21.5             0.4
Catch-up countries
  Model                 −24.3      −13.5      37.8            25.8
  Counterfactual:
  (1) γa = 0              4.9      −17.3      12.4             7.9
  (2) γm = 0            −24.3        9.5      14.8            −1.5
  (3) γs = 0            −23.8      −15.6      39.4             4.0
  (4) γi = γ^US         −13.3       −4.5      17.8             1.6
Decline countries
  Model                 −27.6       −4.5      32.1           −10.5
  Counterfactual:
  (1) γa = 0             −2.9       −7.2      10.1           −15.7
  (2) γm = 0            −27.6        3.3      24.3           −16.8
  (3) γs = 0            −27.6       −4.9      32.5           −13.3
  (4) γi = γ^US         −23.2       −5.1      28.2            −1.9

Notes. The table reports the average change between the last and first periods in the time series (in percentage points) of each variable for the model and the counterfactuals. Counterfactuals (1) to (3) assume zero growth in labor productivity in a sector, leaving the other sectoral growth rates as in the data. Counterfactual (4) assumes labor productivity growth in each sector equal to the aggregate productivity growth in the United States.

of development: there is an average decline in relative aggregate productivity of 1 p.p. in the counterfactual instead of the 13 p.p. increase in the model. Next we turn to the counterfactual for industry (γm = 0). This counterfactual has no effect on the share of hours in agriculture (see equation (7)). With no productivity growth in industry there is much less reallocation of labor away from industry into services compared to the model and thus industry represents a larger share of output in the counterfactual. The result is a process for relative aggregate productivity that is sharply diminished across countries: an average decline of 7 p.p. in the counterfactual instead of the catch-up of 13 p.p. in the model. And indeed the largest negative impact is on countries that observed the most

catch-up in relative aggregate productivity in the model. Finally, having no productivity growth in services (γs = 0) has a very small impact on labor reallocation across sectors.³⁵ Relative aggregate productivity declines by an average of 2 p.p. in this counterfactual. The negative impact of this counterfactual on relative aggregate productivity is smaller than that in the case with no productivity growth in industry for all countries but three (Japan, Portugal, and Venezuela), even though services account for a larger share of hours than industry in most countries.

We end this set of counterfactuals by assessing the quantitative importance of differences in labor productivity growth across sectors and countries. We set labor productivity growth in each sector to the growth rate of aggregate labor productivity in the United States (γi = γ^US) and document the results in Table II and in the fourth column in Figure X. The counterfactual has a substantial impact on the process of structural transformation. In particular, much less labor is reallocated away from agriculture and industry toward services. For instance, over the sample period, the share of hours in agriculture fell, on average, 26 p.p. in the model and 17 p.p. in the counterfactual. In turn, the share of hours in services increased, on average, 36 p.p. in the model and 22 p.p. in the counterfactual. And indeed this different reallocation process, together with the assumption about sectoral labor productivity growth, explains a large portion of the experiences of catch-up and decline in aggregate productivity. For countries that catch up in aggregate productivity to the United States in the model over the sample period, the average catch-up is 26 p.p. in the model and only 2 p.p. in the counterfactual. For countries that decline in relative aggregate productivity, the average decline is 11 p.p. in the model and only 2 p.p. in the counterfactual.³⁶

35. This is due to two opposing effects of productivity growth in services on the labor allocation between industry and services, which roughly cancel each other in the model. See Duarte and Restuccia (2007, p. 42) for a detailed discussion of these effects.
36. Notice that this counterfactual does not eliminate all aggregate productivity growth differences across countries, even though productivity growth rates are identical across sectors and countries and labor reallocation is much diminished as a result. For instance, in the counterfactual, relative aggregate productivity in Finland increases by 8 p.p. over the sample period, and it decreases by 6 p.p. in Mexico. These movements in relative aggregate productivity in the counterfactual stem solely from labor reallocation across sectors (due to positive productivity growth) that have different labor productivity levels.

We conclude from these counterfactuals that sectoral productivity growth generates substantial effects on labor reallocation,

TABLE III
CHANGE IN RELATIVE AGGREGATE PRODUCTIVITY

                            All countries   Catch-up countries   Decline countries
Model                           12.8              25.8               −10.5
Counterfactual:
(1) γi = γi^US
  (1a) Agriculture              11.5              23.2                −9.4
  (1b) Industry                  6.0              13.9                −8.4
  (1c) Services                 10.4              18.3                −3.7
(2) γi = γi^US ∀i                3.9               5.8                 0.5
(3) Catch-up in services        30.7              46.9                 1.6

Notes. The table reports the average change between the last and first periods in the time series (in percentage points) of relative aggregate productivity for the model and the counterfactuals. Counterfactuals (1a) to (1c) set the growth rate in a sector to the rate in the United States in that sector. Counterfactual (2) sets the growth rate of all sectors to the sectoral growth rates in the United States. Counterfactual (3) sets the productivity growth in services such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country.

which in turn are important in understanding aggregate productivity growth across countries.

Sectoral Productivity Patterns and Cross-Country Experiences.

We now turn to the second set of counterfactuals, where we assess the role of specific labor productivity patterns across sectors in explaining cross-country episodes of catch-up, slowdown, stagnation, and decline in relative aggregate productivity. In Figure VI we documented a substantial catch-up across countries in labor productivity in agriculture and industry but not in services. To assess the importance of sectoral catch-up for aggregate productivity, we compute counterfactuals where we set the growth rate of labor productivity in one sector to the growth rate in that sector in the United States, leaving the other sectoral growth rates as in the data (γi = γi^US for each i ∈ {a, m, s}). For completeness we also compute a counterfactual where all sectoral growth rates are set to the ones in the United States (γi = γi^US ∀i). Table III summarizes the results for these counterfactuals. Although there has been substantial catch-up of labor productivity in agriculture during the sample period (from an average relative productivity of 48% in the first period to 71% in the last period of the sample), this factor contributes little, about 10%, to catch-up in aggregate productivity across countries (1.3 p.p. of 12.8 p.p. in the model). The substantial catch-up in agricultural productivity produces a reallocation of labor away from this

FIGURE XI Change in Relative Aggregate Productivity—The Importance of Industry This counterfactual sets the growth of labor productivity in industry in each country to the rate in the United States. The figure plots the difference between the last and first periods (in percentage points) of relative aggregate productivity during the sample period in the model and in the counterfactual.

sector, which dampens its positive effect on aggregate productivity growth.³⁷ The catch-up in industry productivity has also been substantial. Unlike agriculture, this catch-up has a significant impact on relative aggregate productivity. Given that most countries have observed higher growth rates of labor productivity in industry than the United States, labor reallocation away from industry and toward services is diminished in the counterfactual for industry. On average, the share of hours in industry decreases 6.5 p.p. in the counterfactual, compared to a decrease of 10.3 p.p. in the model.

37. The effect of labor reallocation on relative aggregate productivity depends on the normalized end-of-period sectoral labor productivity. Because there is substantial catch-up in agricultural productivity but not in services, the effect of reallocation from agriculture to services is negative.

Figure XI summarizes our findings for the effect of this counterfactual on relative aggregate productivity by reporting

the difference in relative aggregate productivity between the last and the first period in the time series for each country in the model and in the counterfactual. Industry productivity growth is important for countries that catch up in aggregate productivity to the United States, because these countries are substantially below the 45° line. In fact, we draw in this figure a dash-dotted line indicating half the gains in aggregate productivity in the counterfactual relative to the model. Many countries are in this category and some countries are substantially below it, such as Australia, Sweden, and the United Kingdom. For all countries, the average change in relative aggregate productivity is only 6 p.p. in the counterfactual instead of 12.8 p.p. in the model.³⁸ We conclude from this counterfactual that productivity catch-up in industry explains about 50% (6.8 p.p. of 12.8 p.p. in the model) of the relative aggregate productivity gains observed during the sample period.

Recall that, in contrast to agriculture and industry, there has been no substantial catch-up in services across countries and, as reported in Figure VI, there has been a decline in relative productivity in services for the richer countries. As a result, even though services represent an increasing share of output in the economy, we do not expect services to contribute much to catch-up in the model. This is confirmed in the third counterfactual, as productivity catch-up in services contributes about 15% of the catch-up in relative aggregate productivity (2.4 p.p. of 12.8 p.p. in the model). We note, however, that for countries that decline in relative aggregate productivity, lower growth in services than in the United States contributes substantially to this decline (−6.8 p.p. of −10.5 p.p. in the model; see Table III). Among the developed economies, which feature a large share of hours in services, Canada, New Zealand, and Sweden had lower productivity growth rates in services than the United States. In the model, Canada and New Zealand declined in relative aggregate productivity by 9 p.p. and 8 p.p. over the sample period, whereas Sweden observed a substantial catch-up in relative aggregate productivity but stagnated at around 82% during the mid-1970s. In the counterfactual, relative aggregate productivity increases by 3 p.p. in Canada, remains constant for New Zealand, and increases by 9 p.p. from the stagnated level in Sweden.

38. Note that among countries that decline in relative productivity the effect of industry growth is not systematic and the gaps are not as large.

Low productivity growth in services is

essential for understanding these growth experiences of stagnation and decline among rich economies. Figure VI also documents that the level of relative productivity in services is lower than that of industry and that most countries failed to catch up in services to the relative level of industry. For instance, the average relative productivity in services increased from 46% in the first period to 49% in the last period in the sample, whereas the average relative productivity in industry increased from 51% to 75%. In the last period of the sample, all countries except Austria, France, Denmark, the United Kingdom, and New Zealand feature lower relative productivity in services than in industry. Moreover, in many instances the differences in productivity between services and industry are substantial: around 40% lower in services in Spain, Finland, and Norway, around 60% lower in Portugal, and around 80% lower in Korea and Ireland. These features imply that the service sector represents an increasing drag on aggregate productivity as resources are reallocated to this sector in the process of structural transformation. To illustrate the role of low productivity in services and the lack of catch-up in accounting for the growth experiences of slowdown, stagnation, and decline, we compute a counterfactual where we let productivity growth in services be such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country. Although the impact of these different productivity growth rates in services on labor reallocation is somewhat limited, the impact on growth experiences across countries is quite striking: for countries that catch up to the United States during the sample period, the average catch-up increases by almost 80% to 46 p.p., whereas for countries that decline there is instead a catch-up of 1.6 p.p. during the sample period. (See Table III.) 
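The services catch-up counterfactual can be sketched as follows, under the simplifying assumption (made here only for illustration) that relative productivity grows at a constant annual rate. The function solves for the growth rate of relative productivity in services that closes the gap with industry by the last period. The two levels are the cross-country averages quoted above (46% for services in the first period, 75% for industry in the last period); the horizon of 48 years and the function name are hypothetical.

```python
def required_relative_growth(start_level, target_level, years):
    """Constant annual growth rate that moves relative productivity
    from start_level to target_level over `years` years."""
    return (target_level / start_level) ** (1 / years) - 1


years = 48  # hypothetical sample length, for illustration only
g = required_relative_growth(0.46, 0.75, years)

# Check: compounding g over the horizon reaches the industry level.
end_level = 0.46 * (1 + g) ** years
```

In the paper the counterfactual is computed country by country, so each country has its own starting level, target level, and sample length.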
More important, these summary statistics hide the impact of productivity in services in explaining experiences of slowdown, stagnation, and decline observed in the time series. For this reason, Figure XII plots the time path of relative aggregate productivity for all country experiences of slowdown, stagnation, and decline in relative aggregate productivity. The solid lines represent the model and the dash-dotted lines represent the counterfactual. This figure clearly indicates the extent to which low productivity in services and the lack of catch-up account for all these poor growth experiences. To summarize, although productivity convergence in industry (and agriculture) is essential in the first stages of the process

FIGURE XII Relative Aggregate Productivity—The Importance of Services This counterfactual sets the productivity growth in services such that in the last period in the sample relative productivity in services is the same as relative productivity in industry in each country. Each panel plots aggregate labor productivity relative to that of the United States in the model and the counterfactual for each country which, during the sample period, experienced an episode of slowdown, stagnation, or decline. The solid line represents the model and the dash-dotted line the counterfactual.

of structural transformation, poor relative performance in services has determined a slowdown, stagnation, and decline in aggregate productivity. In fact, in the last period of the sample, almost all countries observe a lower relative labor productivity in services than in aggregate. (See Figure XIII.) Because growth rate differences across countries in the service sector tend to be small and services represent a large and increasing share of hours in most countries, this suggests an increasing role of services in determining cross-country aggregate productivity outcomes.
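The contribution figures quoted in this subsection follow from Table III by subtraction: the contribution of catch-up in a sector is the change in relative aggregate productivity in the model minus the change in the counterfactual that removes that catch-up. A short sketch using the all-countries column of Table III (the variable names are ours):

```python
# All-countries column of Table III, in percentage points.
model_change = 12.8
counterfactual_change = {"agriculture": 11.5, "industry": 6.0, "services": 10.4}

# Contribution of sectoral catch-up in p.p.: model minus counterfactual.
contribution_pp = {
    sector: round(model_change - cf, 1)
    for sector, cf in counterfactual_change.items()
}

# The same contribution as a fraction of the model's total change.
contribution_share = {
    sector: pp / model_change for sector, pp in contribution_pp.items()
}
```

The entries of `contribution_pp` reproduce the 1.3, 6.8, and 2.4 p.p. quoted in the text for agriculture, industry, and services.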

FIGURE XIII Labor Productivity in Services across Countries—Last Period This figure plots relative labor productivity in services against relative aggregate productivity in the last period of the sample for all countries. [Scatter plot not reproduced. Horizontal axis: relative aggregate labor productivity, last year. Vertical axis: relative labor productivity in services, last year.]

IV.D. Discussion

Our analysis of structural transformation and aggregate productivity growth relies on a collection of closed economies. It is of interest to discuss the limitations and implications of this assumption for the results. Openness and trade can have two important effects in an economy. First, competition from trade can affect domestic productivity. Second, for an open economy, prices of traded goods reflect world market conditions and not just domestic factors.

Regarding the effect of trade on productivity, we argue that the closed-economy assumption is not as restrictive for our analysis as it may first appear. To see this point, notice that the effect of openness on labor allocations and aggregate productivity is already embedded in the measures of labor productivity growth by sector, which the analysis takes as given. For instance, we found that the growth rate of labor productivity in manufacturing for Korea was almost three times that of the United States. It is


likely that openness to trade during this period can help explain this fact. Moreover, openness would imply that productivity differences across countries for those goods that are most tradable would tend to be small relative to the differences for those goods that are less traded. The productivity implications of the model are consistent with this broad prediction, because differences in manufacturing productivity are smaller than productivity differences in services (mostly nontraded goods). It is an interesting question for future research to assess the importance of trade for productivity convergence in manufacturing across countries and the lack of convergence in services. Regarding the effect of trade on relative prices, recall that the closed-economy assumption implies a one-to-one mapping from sectoral productivity growth to relative prices. An open-economy version of the model would tend to produce a weaker link between domestic productivity growth and relative prices. In fact, in a small open economy, relative prices are invariant to domestic productivity. As we discussed earlier, the relative price implications of the model are broadly consistent with the data, which suggests that domestic productivity growth is a substantial component of the movements in relative prices. To put it differently, we found a strong correlation between changes in relative prices and labor productivity growth across countries, as documented in Figure IX. As a result, the labor allocations implied by the model are broadly consistent with the incentives that consumers face in these economies. We found that not all differences in relative prices are captured by the model. In particular, we found that the price of services relative to manufacturing increased faster in the model than in the data for many countries. 
This departure of the model from the data may arise not only from the closed-economy assumption, but also from other features of the data, such as price distortions and barriers to labor reallocation across sectors. Finally, note that standard open-economy models imply that the prices of traded goods are equalized across countries. The evidence, however, suggests large departures from the law of one price. For instance, the price exercise on agricultural goods from the FAO suggests large price differences across countries and the international macro literature documents large deviations in prices across countries even for highly tradable goods. Another potential avenue to assessing the limitations of the closed-economy assumption of the model would be to compare the consumption and production implications relative to data. For


instance, in the closed economy, output and consumption shares are equal, but in the open economy, they would differ. Unfortunately, this implication cannot be tested directly, because consumption is measured as expenditures in final goods and any gap between production and consumption of goods may also be due to processing, distribution, and marketing services and other charges. But because for the more developed countries most of the trade occurs intra-industry, consumption and production shares of broad sectors tend not to differ greatly.

V. CONCLUSIONS

This paper highlights the role of sectoral labor productivity for the structural transformation and aggregate productivity experiences across countries. Using a model of the structural transformation that is calibrated to the growth experience of the United States, we showed that sectoral differences in labor productivity levels and growth explain the broad patterns of labor reallocation and aggregate productivity experiences across countries. We found that sectoral labor productivity differences across countries are large and systematic both at a point in time and over time. In particular, labor productivity differences between rich and poor countries are large in agriculture and services and smaller in manufacturing. Moreover, most countries have experienced substantial productivity catch-up in agriculture and industry, but productivity in services has remained low relative to the United States. An implication of these findings is that, as countries move through the process of structural transformation, relative aggregate labor productivity can first increase and later stagnate or decline. We find that labor productivity catch-up in manufacturing explains about 50% of the gains in aggregate productivity across countries and that low labor productivity in services and the lack of catch-up explain all the experiences of slowdown, stagnation, and decline in relative aggregate productivity across countries.
Our findings suggest that understanding the sources of sectoral differences in labor productivity levels and growth across countries is crucial in understanding the relative performance of countries. In analyzing sectoral labor productivity levels and growth rates across countries, a number of interesting questions arise. What factors contribute to cross-country differences in labor productivity across sectors? Why were countries able to catch up in manufacturing productivity but not in services? What are the


barriers that prevent other developed economies from sustaining growth rates of labor productivity in services as high as in the United States? How are trade openness and regulation related to these productivity differences across countries? Although there may not be a unifying explanation for all these observations, a recurrent theme in productivity studies at the sectoral level is that the threat or actual pressure of competition is crucial for productivity performance; see, for instance, Schmitz (2005) and Galdón-Sánchez and Schmitz (2002). Because services are less traded than manufacturing goods, there is a tendency for services to be less subject to competitive pressure, which may explain the larger productivity gaps observed in services relative to manufacturing across countries. Moreover, protected domestic sectors may be the explanation for poor productivity performance in some countries. Because openness to trade would not generally have the desired competitive-pressure impact in services, other factors such as the regulatory environment may prove useful in explaining productivity differences across countries in this sector. For instance, the role of land and size regulations on productivity in retail services is often emphasized; see, for instance, Baily and Solow (2001). As a first pass at providing some empirical support for this potential explanation for productivity differences across countries, we have correlated labor productivity differences in industry and services derived from our model to measures of trade openness and government regulation. We find that trade openness is strongly correlated with industry productivity but less so with services productivity, whereas measures of regulation (such as that from the World Bank’s Doing Business) are strongly correlated with productivity in services. We leave a detailed investigation of these important issues for future research.
APPENDIX: DATA SOURCES AND DEFINITIONS

We build a panel data set with annual observations for aggregate GDP per hour and value added per hour and shares of hours for agriculture, industry, and services for 29 countries. The countries covered in our data set are, with sample period in parentheses, Argentina (1950–2004), Australia (1964–2004), Austria (1960–2004), Belgium (1956–2004), Bolivia (1950–2002), Brazil (1950–2003), Canada (1956–2004), Chile (1951–2004), Colombia (1950–2003), Costa Rica (1950–2002), Denmark (1960–2004), Finland (1959–2004), France (1969–2003), Greece (1960–2004),


Ireland (1958–2004), Italy (1956–2004), Japan (1960–2004), Korea (1972–2003), Mexico (1950–2004), the Netherlands (1960–2004), New Zealand (1971–2004), Norway (1956–2004), Portugal (1956–2004), Spain (1960–2004), Sweden (1960–2004), Turkey (1960–2003), the United Kingdom (1956–2004), the United States (1956–2004), and Venezuela (1950–2004). All series are trended using the Hodrick–Prescott filter with a smoothing parameter λ = 100 before any ratios are computed.

A. Aggregate Data

We obtain data on PPP-adjusted real GDP per capita in constant prices (RGDPL) and population (POP) from Penn World Tables version 6.2; see Heston, Summers, and Aten (2006). We obtain data on employment (EMP) and annual hours actually worked per person employed (HOURS) from the Total Economy Database; see the Conference Board (2008). With these data we construct annual time series of PPP-adjusted GDP per hour in constant prices for each country as $Y/L_h = \mathrm{RGDPL} \times \mathrm{POP} / (\mathrm{EMP} \times \mathrm{HOURS})$.

B. Sectoral Data

We obtain annual data on employment, hours worked, and constant domestic-price value added for agriculture, industry, and services for the countries listed above. The sectors are defined by the International Standard Industrial Classification, revision 3 (ISIC III) definitions, with agriculture corresponding to ISIC divisions 1–5 (agriculture, forestry, hunting, and fishing), industry to ISIC divisions 10–45 (mining, manufacturing, construction, electricity, water, and gas), and services to ISIC divisions 50–99 (wholesale and retail trade—including hotels and restaurants, transport, and government, financial, professional, and personal services such as education, health care, and real estate services).

Value Added by Sector.
Value added by sector is obtained by combining data from the World Bank (2008) World Development Indicators online and historical data from the OECD National Accounts publications for the following countries: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Greece, Ireland, Italy, Japan, Korea, the Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Turkey, the United Kingdom, and the United States. The data series from the World Bank’s World Development Indicators are agriculture value added, industry


value added, and services value added. All series are measured in constant local currency units, base year 2000 (with the exception of Turkey, 1987). These series are extended backward using historical data from the OECD National Accounts publications, except for Korea. A combination of three OECD publications was used: National Accounts of OECD Countries (1950–1968), National Accounts of OECD Countries (1950–1961), and National Accounts of OECD Countries (1960–1977); see OECD (1963, 1970, 1979). The primary resource was the book covering the period from 1950 to 1968. We compute growth rates of the OECD data for corresponding variables for years prior to those available through the World Bank and apply them to the World Bank series. Data on value added by sector for all Latin American countries in our data set (Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Mexico, and Venezuela) are obtained from the 10-Sector Database; see Timmer and de Vries (2009). This database has data on value added in constant local prices for ten sectors. These data are aggregated into value added in agriculture, industry, and services using the ISIC III definitions above.

Employment by Sector. The sectoral employment data are obtained from a variety of sources as well. We obtain data on civilian employment in each broad sector from the OECD (2008) Labor Force Statistics database online for Australia, Austria, Belgium, Canada, Finland, France, Ireland, Italy, Japan, the Netherlands, New Zealand, Norway, Spain, Turkey, the United Kingdom, and the United States. Data for Portugal on sectoral employment are obtained from the Banco de Portugal (2006). The data are aggregated into the same three broad sectors. We extend this series forward to 2005 by using growth rates for each variable computed from the EU KLEMS database; see O’Mahony and Timmer (2009). Data for Korea and all Latin American countries are obtained from the 10-Sector Database.
We aggregate these data into the three broad sectors using the ISIC III definitions above. Hours Worked by Sector. We obtain data on hours of work per worker from the EU KLEMS database for Australia, Austria, Belgium, Denmark, Finland, France, Ireland, Italy, Japan, the Netherlands, Portugal, Spain, Sweden, the United Kingdom, and the United States. These data cover the period 1970 to 2005. Data for Brazil, Canada, Chile, Colombia, Costa Rica, Greece, Mexico, Norway, New Zealand, and Turkey are obtained from the International Labour Office (2008) Laborsta database. These series are


much shorter; the time period covered varies by country, but it starts after 1990 for all countries. From these data, we compute the ratio of per-worker hours by sector relative to per-worker aggregate hours. In analyzing these ratios, we find that relative sectoral hours are remarkably stable over time for most countries and that these ratios are very close to one for many countries. Moreover, any deviations from one in relative hours across countries are not systematically related to the level of development. For each country, we use the average value of each of these ratios, denoted as $h_i$, $i \in \{a, m, s\}$, to calculate shares of hours by sector and value added per hour by sector. Because the time series of sectoral hours are shorter than those of sectoral employment and value added, this simplification allows us to compute sectoral shares of total hours and value added per hour without shortening the time series. We do not have data on sectoral hours for Argentina, Bolivia, Korea, and Venezuela, and we assume that $h_i = 1$ for these countries. Total hours by sector are computed by multiplying employment by hours per worker in each sector. We construct value added per hour by dividing the series of value added by the corresponding series of total hours for each sector. Shares of hours by sector are simply the ratio of total hours by sector relative to total aggregate hours.

Prices by Sector. We compute implicit producer price deflators for each sector using data on sectoral value added at constant and current prices from the World Development Indicators. The price data are consistent with the sectoral definitions for value added. They cover the period from 1971 to 2004.

DEPARTMENT OF ECONOMICS, UNIVERSITY OF TORONTO
DEPARTMENT OF ECONOMICS, UNIVERSITY OF TORONTO
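The appendix's construction of total hours, value added per hour, and hours shares by sector can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the toy numbers are hypothetical, hours per worker in sector i are assumed to equal the average ratio for that sector times aggregate hours per worker, and total aggregate hours are assumed to be the sum of sectoral hours.

```python
SECTORS = ("a", "m", "s")  # agriculture, industry, services


def sectoral_series(employment, value_added, rel_hours, agg_hours_per_worker):
    """Value added per hour and hours shares by sector.

    employment, value_added: dict mapping sector -> list over years
    rel_hours: dict mapping sector -> average relative hours ratio (set to 1.0
        where sectoral hours data are unavailable)
    agg_hours_per_worker: list over years of aggregate hours per worker
    """
    years = range(len(agg_hours_per_worker))
    # Total hours = employment x hours per worker in the sector (assumed to be
    # the relative hours ratio times aggregate hours per worker).
    total_hours = {
        i: [employment[i][t] * rel_hours[i] * agg_hours_per_worker[t] for t in years]
        for i in SECTORS
    }
    # Assumption: total aggregate hours are the sum over the three sectors.
    aggregate_hours = [sum(total_hours[i][t] for i in SECTORS) for t in years]
    va_per_hour = {
        i: [value_added[i][t] / total_hours[i][t] for t in years] for i in SECTORS
    }
    hours_share = {
        i: [total_hours[i][t] / aggregate_hours[t] for t in years] for i in SECTORS
    }
    return va_per_hour, hours_share


# Toy two-year example with made-up numbers and all ratios set to one.
emp = {"a": [10.0, 8.0], "m": [20.0, 22.0], "s": [30.0, 40.0]}
va = {"a": [100.0, 110.0], "m": [600.0, 700.0], "s": [900.0, 1300.0]}
hours = [2000.0, 1900.0]
va_ph, share = sectoral_series(emp, va, {"a": 1.0, "m": 1.0, "s": 1.0}, hours)
print(share["s"], va_ph["m"])
```

With every ratio equal to one, hours shares coincide with employment shares, which is the case the appendix assumes for the countries lacking sectoral hours data.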

REFERENCES

Adamopoulos, Tasso, and Ahmet Akyol, “Relative Underperformance Alla Turca,” Review of Economic Dynamics, 12 (2009), 697–717.
Baily, Martin, Diana Farrell, and Jaana Remes, “Domestic Services: The Hidden Key to Growth,” McKinsey Global Institute, 2005.
Baily, Martin, and Robert Solow, “International Productivity Comparisons Built from the Firm Level,” Journal of Economic Perspectives, 15 (2001), 151–172.
Banco de Portugal, “Séries Longas para a Economia Portuguesa pós II Guerra Mundial, 2006.” Available at http://www.bportugal.pt/publish/serlong/serlong p.htm.
Baumol, William, “Macroeconomics of Unbalanced Growth: The Anatomy of Urban Crisis,” American Economic Review, 57 (1967), 415–426.
Caselli, Francesco, “Accounting for Cross-Country Income Differences,” in Handbook of Economic Growth, Philippe Aghion and Steven Durlauf, eds. (New York: North Holland Elsevier, 2005).
Caselli, Francesco, and Wilbur J. Coleman II, “The U.S. Structural Transformation and Regional Convergence: A Reinterpretation,” Journal of Political Economy, 109 (2001), 584–616.
Caselli, Francesco, and Silvana Tenreyro, “Is Poland the Next Spain?” in NBER International Seminar on Macroeconomics 2004, Richard Clarida, Jeffrey Frankel, Francesco Giavazzi, and Kenneth West, eds. (Cambridge, MA: The MIT Press, 2006).
Chanda, Areendam, and Carl-Johan Dalgaard, “Dual Economies and International Total Factor Productivity Differences: Channelling the Impact from Institutions, Trade, and Geography,” Economica, 75 (2008), 629–661.
Chari, Varadarajan V., Patrick Kehoe, and Ellen McGrattan, “The Poverty of Nations: A Quantitative Exploration,” NBER Working Paper No. 5414, 1996.
Coleman, Wilbur J., “Accommodating Emerging Giants,” Mimeo, Duke University, 2007.
Conference Board, Total Economy Database, 2008. Available at www.conference-board.org/economics/.
Córdoba, Juan, and Marla Ripoll, “Agriculture, Aggregation, and Development Accounting,” Mimeo, University of Pittsburgh, 2004.
Duarte, Margarida, and Diego Restuccia, “The Productivity of Nations,” Federal Reserve Bank of Richmond Economic Quarterly, 92 (2006), 195–223.
——, “The Structural Transformation and Aggregate Productivity in Portugal,” Portuguese Economic Journal, 6 (2007), 23–46.
Echevarria, Cristina, “Changes in Sectoral Composition Associated with Growth,” International Economic Review, 38 (1997), 431–452.
Galdón-Sánchez, José, and James Schmitz Jr., “Competitive Pressure and Labor Productivity: World Iron-Ore Markets in the 1980’s,” American Economic Review, 92 (2002), 1222–1235.
Gollin, Douglas, Stephen Parente, and Richard Rogerson, “The Role of Agriculture in Development,” American Economic Review Papers and Proceedings, 92 (2002), 160–164.
——, “The Food Problem and the Evolution of International Income Levels,” Journal of Monetary Economics, 54 (2007), 1230–1255.
Hansen, Gary, and Edward C. Prescott, “From Malthus to Solow,” American Economic Review, 92 (2002), 1205–1217.
Herrendorf, Berthold, and Ákos Valentinyi, “Which Sectors Make the Poor Countries So Unproductive?” Mimeo, Arizona State University, 2006.
Heston, Alan, Robert Summers, and Bettina Aten, “Penn World Table Version 6.2,” Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, 2006. Available at http://pwt.econ.upenn.edu.
International Labour Office, “LABORSTA Database,” Bureau of Statistics, 2008. Available at http://laborsta.ilo.org/.
Jones, Charles, “On the Evolution of the World Income Distribution,” Journal of Economic Perspectives, 11 (1997), 19–36.
Kehoe, Timothy, and Edward C. Prescott, “Great Depressions of the 20th Century,” Review of Economic Dynamics, 5 (2002), 1–18.
Kongsamut, Piyabha, Sérgio Rebelo, and Danyang Xie, “Beyond Balanced Growth,” Review of Economic Studies, 68 (2001), 869–882.
Kuznets, Simon, Modern Economic Growth (New Haven, CT: Yale University Press, 1966).
Laitner, John, “Structural Change and Economic Growth,” Review of Economic Studies, 67 (2000), 545–561.
Lucas, Robert, “Some Macroeconomics for the 21st Century,” Journal of Economic Perspectives, 14 (2000), 159–168.
Maddison, Angus, “Economic Growth and Structural Change in the Advanced Countries,” in Western Economies in Transition, Irving Leveson and Jimmy Wheeler, eds. (London: Croom Helm, 1980).
Ngai, Rachel, “Barriers and the Transition to Modern Growth,” Journal of Monetary Economics, 51 (2004), 1353–1383.
Ngai, Rachel, and Christopher Pissarides, “Structural Change in a Multisector Model of Growth,” American Economic Review, 97 (2007), 429–443.
OECD, National Accounts of OECD Countries: Detailed Tables, Volume II, 1950–1961 (Paris, France: OECD, 1963).
——, National Accounts of OECD Countries: Detailed Tables, Volume II, 1950–1968 (Paris, France: OECD, 1970).
——, National Accounts of OECD Countries: Detailed Tables, Volume II, 1960–1977 (Paris, France: OECD, 1979).
——, Labor Force Statistics, 2008. Available at http://hermia.sourceoecd.org/vl=718832/cl=16/nw=1/rpsv/outlookannuals.htm.
O’Mahony, Mary, and Marcel P. Timmer, “Output, Input and Productivity Measures at the Industry Level: The EU KLEMS Database,” Economic Journal, 119 (2009), F374–F403. Available at www.euklems.net.
Pilat, Dirk, “Labour Productivity Levels in OECD Countries: Estimates for Manufacturing and Selected Service Sectors,” OECD Working Paper No. 169, 1996.
Prescott, Edward C., “Prosperity and Depression,” American Economic Review, 92 (2002), 1–15.
Restuccia, Diego, Dennis Yang, and Xiaodong Zhu, “Agriculture and Aggregate Productivity: A Quantitative Cross-Country Analysis,” Journal of Monetary Economics, 55 (2008), 234–250.
Rogerson, Richard, “Structural Transformation and the Deterioration of European Labor Market Outcomes,” Journal of Political Economy, 116 (2008), 235–259.
Schmitz, James Jr., “What Determines Productivity? Lessons from the Dramatic Recovery of the U.S. and Canadian Iron Ore Industries Following Their Early 1980s Crisis,” Journal of Political Economy, 113 (2005), 582–625.
Summers, Robert, and Alan Heston, “The Penn World Table: An Expanded Set of International Comparisons, 1950–1988,” Quarterly Journal of Economics, 106 (1991), 327–368.
Timmer, Marcel P., and Gaaitzen J. de Vries, “Structural Change and Growth Accelerations in Asia and Latin America: A New Sectoral Data Set,” Cliometrica, 3 (2009), 165–190. Available at www.ggdc.net.
U.S. Census Bureau, Department of Commerce, Historical Statistics of the United States: Colonial Times to 1970 (Part I) (Washington, DC: U.S. Government Printing Office, 1975).
Vollrath, Dietrich, “How Important Are Dual Economy Effects for Aggregate Productivity?” Journal of Development Economics, 88 (2009), 325–334.
World Bank, World Development Indicators, 2008. Available at http://devdata.worldbank.org/dataonline/.

TEACHER QUALITY IN EDUCATIONAL PRODUCTION: TRACKING, DECAY, AND STUDENT ACHIEVEMENT∗

JESSE ROTHSTEIN

Growing concerns over the inadequate achievement of U.S. students have led to proposals to reward good teachers and penalize (or fire) bad ones. The leading method for assessing teacher quality is “value added” modeling (VAM), which decomposes students’ test scores into components attributed to student heterogeneity and to teacher quality. Implicit in the VAM approach are strong assumptions about the nature of the educational production function and the assignment of students to classrooms. In this paper, I develop falsification tests for three widely used VAM specifications, based on the idea that future teachers cannot influence students’ past achievement. In data from North Carolina, each of the VAMs’ exclusion restrictions is dramatically violated. In particular, these models indicate large “effects” of fifth grade teachers on fourth grade test score gains. I also find that conventional measures of individual teachers’ value added fade out very quickly and are at best weakly related to long-run effects. I discuss implications for the use of VAMs as personnel tools.

I. INTRODUCTION

Parallel literatures in labor economics and education adopt similar econometric strategies for identifying the effects of firms on wages and of teachers on student test scores. Outcomes are modeled as the sum of firm or teacher effect, individual heterogeneity, and transitory, orthogonal error. The resulting estimates of firm effects are used to gauge the relative importance of firm and worker heterogeneity in the determination of wages. In education, so-called “value added” models (hereafter, VAMs) have been used to measure the importance of teacher quality to educational production, to assess teacher preparation and certification programs, and as important inputs to personnel evaluations and merit pay programs.1

∗ Earlier versions of this paper circulated under the title “Do Value Added Models Add Value?” I am grateful to Nathan Wozny and Enkeleda Gjeci for exceptional research assistance. I thank Orley Ashenfelter, Henry Braun, David Card, Henry Farber, Bo Honoré, Brian Jacob, Tom Kane, Larry Katz, Alan Krueger, Sunny Ladd, David Lee, Lars Lefgren, Austin Nichols, Amine Ouazad, Mike Rothschild, Cecilia Rouse, Diane Schanzenbach, Eric Verhoogen, Tristan Zajonc, anonymous referees, and conference and seminar participants for helpful conversations and suggestions. I also thank the North Carolina Education Data Research Center at Duke University for assembling, cleaning, and making available the confidential data used in this study. Financial support was generously provided by the Princeton Industrial Relations Section and Center for Economic Policy Studies and the U.S. Department of Education (under Grant R305A080560).

1. On firm effects, see, for example, Abowd and Kramarz (1999). For recent examinations of teacher effects modeling, see McCaffrey et al. (2003); Wainer (2004); Braun (2005a, 2005b); and Harris and Sass (2006).

© 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology. The Quarterly Journal of Economics, February 2010


All of these applications suppose that the estimates can be interpreted causally. But observational analyses can identify causal effects only under unverifiable assumptions about the correlation between treatment assignment—the assignment of students to teachers, or the matching of workers to firms—and other determinants of test scores and wages. If these assumptions do not hold, the resulting estimates of teacher and firm effects are likely to be quite misleading. Anecdotally, assignments of students to teachers incorporate matching to take advantage of teachers’ particular specialties, intentional separation of children who are known to interact badly, efforts on the principal’s part to reward favored teachers through the allocation of easy-to-teach students, and parental requests (see, e.g., Monk [1987]; Jacob and Lefgren [2007]). These are difficult to model statistically. Instead, VAMs typically assume that teacher assignments are random conditional on a single (observed or latent) factor. In this paper, I develop and implement tests of the exclusion restrictions of commonly used value added specifications. My strategy exploits the fact that future teachers cannot have causal effects on past outcomes, whereas violations of model assumptions may lead to apparent counterfactual “effects” of this form. Test scores, like wages, are serially correlated, and as a result an association between the current teacher and the lagged score is strong evidence against exogeneity with respect to the current score. I examine three commonly used VAMs, two of which have direct parallels in the firm effects literature. In the simplest, most widely used VAM—which resembles the most common specification for firm effects—the necessary exclusion restriction is that teacher assignments are orthogonal to all other determinants of the so-called “gain” score, the change in a student’s test score over the course of the year. 
If this restriction holds, fifth grade teacher assignments should not be correlated with students’ gains in fourth grade. Using a large microdata set describing North Carolina elementary students, I find that there is in fact substantial within-school dispersion of students’ fourth grade gains across fifth grade classrooms. Sorting on past reading gains is particularly prominent, though there is clear evidence of sorting on math gains as well. Because test scores exhibit strong mean reversion—and thus gains are negatively autocorrelated—sorting on past gains produces bias in the simple VAM’s estimates.
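The logic of this falsification test can be illustrated with a small simulation. The sketch below is a simplified stand-in, not the specification estimated in the paper: it simulates a single hypothetical school, draws fourth grade gains at random, and compares a one-way ANOVA F statistic for the dispersion of those gains across fifth grade classrooms under random assignment and under assignment that tracks on a noisy signal of the past gain. All names, sample sizes, and distributions are assumptions made for illustration.

```python
import random
import statistics


def anova_f(values, groups):
    """One-way ANOVA F statistic for `values` grouped by the labels in `groups`."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    grand_mean = statistics.fmean(values)
    k, n = len(by_group), len(values)
    between = sum(
        len(vs) * (statistics.fmean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    within = sum(
        (v - statistics.fmean(vs)) ** 2 for vs in by_group.values() for v in vs
    )
    return (between / (k - 1)) / (within / (n - k))


rng = random.Random(0)
n_students, n_classes = 400, 8  # hypothetical school
gain4 = [rng.gauss(0.0, 1.0) for _ in range(n_students)]  # fourth grade gains

# (a) Random assignment to fifth grade classrooms (equal class sizes).
random_class = [i % n_classes for i in range(n_students)]
rng.shuffle(random_class)

# (b) Tracking: classrooms are filled in order of a noisy signal of the past gain.
signal = [g + rng.gauss(0.0, 1.0) for g in gain4]
order = sorted(range(n_students), key=lambda i: signal[i])
tracked_class = [0] * n_students
for rank, i in enumerate(order):
    tracked_class[i] = rank * n_classes // n_students

f_random = anova_f(gain4, random_class)
f_tracked = anova_f(gain4, tracked_class)
print(f"F statistic, random assignment:  {f_random:.2f}")
print(f"F statistic, tracked assignment: {f_tracked:.2f}")
```

Under random assignment the F statistic should be close to one, because the future classroom carries no information about past gains. Under tracking it is far larger, which is the kind of apparent "effect" of a fifth grade teacher on fourth grade gains that signals a violated exclusion restriction.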


The other VAMs that I consider rely on different exclusion restrictions, namely that classroom assignments are as good as random conditional on either the lagged test score or the student’s (unobserved, but permanent) ability. I discuss how similar strategies can be used to test these restrictions as well. I find strong evidence in the data against each. Evidently, classroom assignments respond dynamically to annual achievement in ways that are not captured by the controls typically included in VAM specifications. To evaluate the magnitude of the biases that assignments produce, I compare common VAMs to a richer model that conditions on the complete achievement history. Estimated teacher effects from the rich model diverge importantly from those obtained from the simple VAMs in common use. I discuss how selection on unobservables is likely to produce substantial additional biases. I use a simple simulation to explore the sensitivity of teacher rankings to these biases. Under plausible assumptions, simple VAMs can be quite misleading. The rich VAM that controls for all observables does better, but still yields rankings that diverge meaningfully from the truth. My estimates also point to an important substantive result. To the extent that any of the VAMs that I consider identify causal effects, they indicate that teachers’ long-run effects are at best weakly proxied by their immediate impacts. A teacher’s effect in the year of exposure—the universal focus of value added analyses—is correlated only .3 to .5 with her cumulative effect over two years, and even less with her effect over three years. Accountability policies that rely on measures of short-term value added would do an extremely poor job of rewarding the teachers who are best for students’ longer-run outcomes. An important caveat to the empirical results is that they may be specific to North Carolina. 
Students in other states or in individual school districts might be assigned to classrooms in ways that satisfy the assumptions required for common VAMs. But at the least, VAM-style analyses should attempt to evaluate the model assumptions, perhaps with methods like those used here. Models that rely on incorrect assumptions are likely to yield misleading estimates, and policies that use these estimates in hiring, firing, and compensation decisions may reward and punish teachers for the students they are assigned as much as for their actual effectiveness in the classroom. Section II reviews the use of preassignment variables to test exogeneity assumptions. Section III introduces the three VAMs,


discusses their implicit assumptions, and describes my proposed tests. Section IV describes the data. Section V presents results. Section VI attempts to quantify the biases that nonrandom classroom assignments produce in VAM-based analyses. Section VII presents evidence on teachers’ long-run effects. I conclude, in Section VIII, by discussing some implications for the design of incentive pay systems in education.

II. USING PANEL DATA TO TEST EXCLUSION RESTRICTIONS

A central assumption in all econometric studies of treatment effects is that the treatment is uncorrelated with other determinants of the outcome, conditional on covariates. Although the assumption is ultimately untestable—the “fundamental problem of causal inference” (Holland 1986)—the data can provide indications that it is unlikely to hold. In experiments, for example, significant correlations between treatment and preassignment variables are interpreted as evidence that randomization was unsuccessful.2 Panel data can be particularly useful. A correlation between treatment and some preassignment variable X need not indicate bias in the estimated treatment effect if X is uncorrelated with the outcome variable of interest. But outcomes are typically correlated within individuals over time, so an association between treatment and the lagged outcome strongly suggests that the treatment is not exogenous with respect to posttreatment outcomes. This insight has been most fully explored in the literature on the effect of job training on wages and employment. Today’s wage or employment status is quite informative about tomorrow’s, even controlling for all observables. Evidence that assignment to job training is correlated with lagged wage dynamics indicates that simple specifications for the effect of training on outcomes are likely to yield biased estimates (Ashenfelter 1978).
Richer models of the training assignment process may absorb this correlation while permitting identification (Heckman, Hotz, and Dabos 1987). But even these models may impose testable restrictions on the relationship between treatment and the outcome history (Ashenfelter and Card 1985; Card and Sullivan 1988; Jacobson, LaLonde, and Sullivan 1993).3

In value added studies, the multiplicity of teacher “treatments” can blur the connection to program evaluation methods. But the utility of past outcomes for specification diagnostics carries over directly. Identification of a teacher’s effect rests on assumptions about the relationship between the teacher assignment and the other determinants of future achievement, and the relationship with past achievement can be informative about the plausibility of these assumptions.

Only a few studies have attempted to validate VAMs. Harris and Sass (2007) and Jacob and Lefgren (2008) show that value added coefficients are weakly but significantly correlated with principals’ ratings of teacher performance. Of course, if principal decisions about classroom assignments created bias in the VAMs, causality could run from principal opinions to estimated value added rather than the reverse. More relevant to the current analysis, Kane and Staiger (2008) demonstrate that VAM estimates from observational data are approximately unbiased predictors of teachers’ effects when students are randomly assigned.

Although I examine a question closely related to that considered by Kane and Staiger, my larger and more representative sample permits me to extend their analysis in two ways. First, I have much more statistical power. This enables me to identify biases that are substantively important but that lie well within Kane and Staiger’s confidence intervals. Second, my sample resembles the sort that would be used for any VAM intended as a teacher compensation or retention tool. In particular, it includes teachers specializing in students (e.g., late readers) who cannot be readily identified and excluded from large-scale analyses. The likely exclusion of such teachers from Kane and Staiger’s sample quite plausibly avoids the most severe biases in observational VAM estimates.4

2. Similar tests are often used in nonexperimental analyses: Researchers conducting propensity score matching studies frequently check for “balance” of covariates conditional on the propensity score (Rosenbaum and Rubin 1984), and Imbens and Lemieux (2008) recommend analogous tests for regression discontinuity analyses.

3. Of course, these sorts of tests cannot diagnose all model violations. If treatment assignments depend on unobserved determinants of future outcomes that are uncorrelated with the outcome history, the treatment effect estimator may be biased even though treatment is uncorrelated with past outcomes.

4. In the Kane and Staiger experiment, principals were given the name of one teacher and asked to identify a comparison teacher such that it would be appropriate to randomly assign students within the pair. One imagines that principals generally chose a comparison who was assigned similar students as the focal teacher in the preexperimental data. Moreover, a substantial majority of principals declined to participate, perhaps because the initial teacher was a specialist for whom no similar comparison could be found.


III. STATISTICAL MODEL AND METHODS

This section develops the statistical framework for VAM analysis and introduces my tests. I begin by defining the parameters of interest in Section III.A. In Section III.B, I introduce the three VAMs that I consider. Section III.C describes the exclusion restrictions that the VAM requires to permit identification of the causal effects of interest and develops the implications of these restrictions for the relationship between the current teacher and lagged outcome. Section III.D discusses the implementation of the tests.

III.A. Defining the Problem

I take the parameter of interest in value added modeling to be the effect on a student’s test score at the end of grade g of being assigned to a particular grade-g classroom rather than to another classroom at the same school. Later, I extend this to look at dynamic treatment effects (that is, the effect of the grade-g classroom on the g + s score). I do not distinguish between classroom and teacher effects, and use the terms interchangeably. In the Online Appendix, I consider this distinction, defining a teacher’s effect as the time-invariant component of the effects of the classrooms taught by the teacher over several years. The basic conclusions are unaffected by this redefinition.

I am interested in whether common VAMs identify classroom effects with arbitrarily large samples. I therefore sidestep small-sample issues by considering the properties of VAM estimates as the number of students grows with the number of teachers (and classrooms) fixed.5 If classroom effects are identified under these unrealistic asymptotics, VAMs may be usable in compensation and retention policy with appropriate allowances for the sampling errors that arise with finite class sizes;6 if not, these corrections are likely to go awry.

A final important distinction is between identification of the variance of teacher quality and identification of individual teachers’ effects. I focus exclusively on the latter. It is impractical to report each of several thousand teachers’ estimated effects, however. I therefore report only the implied standard deviations (across teachers) of teachers’ actual and counterfactual effects, along with tests of the hypothesis that the teacher effects are all zero.7

5. Under realistic asymptotics, the number of classrooms should rise in proportion to the number of students. If so, classroom effects are not identified under any exogeneity restrictions: Even in the asymptotic limit, the number of students per teacher remains finite and the sampling error in an individual teacher’s effect remains nontrivial.

6. A typical approach shrinks a teacher’s estimated effect toward the population mean in proportion to the degree of imprecision in the estimate. The resulting empirical Bayes estimate is the best linear predictor of the teacher’s true effect, given the noisy estimate. See McCaffrey et al. (2003, pp. 63–68).

III.B. Data Generating Process and the Three VAMs

I develop the three VAMs and the associated tests in the context of a relatively general educational production function, modeled on those used by Todd and Wolpin (2003) and Harris and Sass (2006), that allows student achievement to depend on the full history of inputs received to date plus the student’s innate ability. Separating classroom effects from other inputs, I assume that the test score of student i at the end of grade g, $A_{ig}$, can be written as

$$A_{ig} = \alpha_g + \sum_{h=1}^{g} \beta_{hgc(i,h)} + \mu_i \tau_g + \sum_{h=1}^{g} \varepsilon_{ih}\phi_{hg} + v_{ig}. \tag{1}$$

Here, $\beta_{hgc}$ is the effect of being in classroom c in grade h on the grade-g test score, and $c(i, h) \in \{1, \ldots, J_h\}$ indexes the classroom to which student i is assigned in grade h. $\mu_i$ is individual ability. We might expect the achievement gap between high-ability and low-ability students to grow over time; this would correspond to $\tau_k > \tau_g > 0$ for each k > g. $\varepsilon_{ih}$ captures all other inputs in grade h, including those received from the family, nonclassroom peers, and the community. It might also include developmental factors: A precocious child might have positive εs in early grades and negative εs in later grades as her classmates caught up. As this example shows, ε is quite likely to be serially correlated within students across grades. Finally, $v_{ig}$ represents measurement error in the grade-g test relative to the student’s “true” grade-g achievement. This is independent across grades within students.8

A convenient restriction on the time pattern of classroom effects is uniform geometric decay, $\beta_{hg'c} = \beta_{hgc}\lambda^{g'-g}$ for some $0 \le \lambda \le 1$ and all $h \le g \le g'$. A special case is λ = 1, corresponding to perfect persistence. Although my results do not depend on these restrictions, I impose them as needed for notational simplicity. I consider nonuniform decay in Section VII. Note that there is no theoretical basis for restrictions on the decay of nonclassroom effects (i.e., on $\phi_{hg}$).

It will be useful to adopt some simplifying notation. Let $\omega_{ig} \equiv \sum_{h=1}^{g} \varepsilon_{ih}\phi_{hg}$ be the composite grade-g residual achievement, and let Δ indicate first differences across student grades: $\Delta\beta_{hgc} \equiv \beta_{hgc} - \beta_{h,g-1,c}$, $\Delta\tau_g \equiv \tau_g - \tau_{g-1}$, $\Delta\omega_{ig} \equiv \omega_{ig} - \omega_{ig-1}$, and so on.

Tractable VAMs amount to decompositions of $A_{ig}$ (or, more commonly, of $\Delta A_{ig} \equiv A_{ig} - A_{ig-1}$) into the current teacher’s effect $\beta_{ggc(i,g)}$, a student heterogeneity component, and an error assumed to be orthogonal to the classroom assignment. Models differ in the form of this decomposition. In this paper I consider three specifications: a simple regression of gain scores on grade and contemporaneous classroom indicators,

VAM1: $\Delta A_{ig} = \alpha_g + \beta_{ggc(i,g)} + e_{1ig}$;

a regression of score levels (or, equivalently, of gains) on classroom indicators and the lagged score,

VAM2: $A_{ig} = \alpha_g + A_{ig-1}\lambda + \beta_{ggc(i,g)} + e_{2ig}$;

and a regression that stacks gain scores from several grades and adds student fixed effects,

VAM3: $\Delta A_{ig} = \alpha_g + \beta_{ggc(i,g)} + \mu_i + e_{3ig}$.

All three are widely used.9 VAM2 and VAM3 can both be seen as generalizations of VAM1: Constraining λ = 1 converts VAM2 to VAM1, whereas constraining $\mu_i \equiv 0$ converts VAM3 to VAM1.

III.C. Exclusion Restrictions and Falsification Tests

Despite their similarity, the three VAMs rely on quite distinct restrictions on the process by which students are assigned to classrooms. I discuss the three in turn.

7. Rivkin, Hanushek, and Kain (2005) develop a strategy for identifying the variance of teachers’ effects, but not the effect of individual teachers, under weaker assumptions than are required by the VAMs described below.

8. I define the β parameters to include any classroom-level component of $v_{ig}$ and assume that $v_{ig}$ is independent across students in the same classroom.

9. The most widely used VAM, the Tennessee Value Added Assessment System (TVAAS; see Sanders, Saxton, and Horn [1997]), is specified as a mixed model for level scores that depend on the full history of classroom assignments, but this model implies an equation for annual gain scores of the form used in VAM1. VAM2 is more widely used in the recent economics literature. See, for example, Aaronson, Barrow, and Sander (2007); Goldhaber (2007); Jacob and Lefgren (2008); and Kane, Rockoff, and Staiger (2008). VAM3 was proposed by Boardman and Murnane (1979) and has been used recently by Rivkin, Hanushek, and Kain (2005); Harris and Sass (2006); Boyd et al. (2007); and Jacob and Lefgren (2008).
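Before turning to the individual exclusion restrictions, the VAM1 and VAM2 specifications of Section III.B can be illustrated with a small simulation. The Python sketch below is only an illustration under assumed conditions: one grade of data, random classroom assignment, perfect persistence (λ = 1), and made-up parameter values. The function name `simulate_and_estimate` and the simplified data generating process are hypothetical and are not taken from the paper; VAM3, which needs several grades of gains per student, is omitted.

```python
import numpy as np


def simulate_and_estimate(n_classes=20, class_size=200, seed=0):
    """Simulate one grade under random assignment and fit VAM1 and VAM2."""
    rng = np.random.default_rng(seed)
    n = n_classes * class_size

    # Hypothetical true grade-g classroom effects, centered at zero.
    beta = rng.normal(0.0, 0.15, n_classes)
    beta -= beta.mean()

    # Lagged score A_{i,g-1}: ability plus noise (illustrative scales).
    ability = rng.normal(0.0, 1.0, n)
    lagged = ability + rng.normal(0.0, 0.5, n)

    # Random assignment: the benign case in which both models should work.
    classroom = rng.permutation(np.repeat(np.arange(n_classes), class_size))

    # Perfect persistence (lambda = 1): score = lagged score + effect + noise.
    current = lagged + beta[classroom] + rng.normal(0.0, 0.5, n)
    gain = current - lagged

    # VAM1: gains on classroom indicators, i.e. classroom means of gains.
    vam1 = np.array([gain[classroom == c].mean() for c in range(n_classes)])
    vam1 -= vam1.mean()

    # VAM2: score levels on the lagged score plus classroom indicators (OLS).
    dummies = (classroom[:, None] == np.arange(n_classes)).astype(float)
    X = np.column_stack([lagged, dummies])
    coef, *_ = np.linalg.lstsq(X, current, rcond=None)
    lam_hat = coef[0]
    vam2 = coef[1:] - coef[1:].mean()
    return beta, vam1, vam2, lam_hat


if __name__ == "__main__":
    beta, vam1, vam2, lam_hat = simulate_and_estimate()
    print("estimated lambda:", round(float(lam_hat), 3))
    print("corr(true, VAM1):", round(float(np.corrcoef(beta, vam1)[0, 1]), 3))
    print("corr(true, VAM2):", round(float(np.corrcoef(beta, vam2)[0, 1]), 3))
```

Because assignment is random in this sketch, both estimators track the simulated effects closely; the subsections that follow concern what happens when assignment is not random.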


The Gain Score Model (VAM1). First-differencing the production function (1), we can write the grade-g gain score as

$$\Delta A_{ig} = \Delta\alpha_g + \sum_{h=1}^{g-1} \Delta\beta_{hgc(i,h)} + \beta_{ggc(i,g)} + \mu_i \Delta\tau_g + \Delta\omega_{ig} + \Delta v_{ig}. \tag{2}$$

If we assume that teacher effects do not decay, $\Delta\beta_{hgc} = 0$ for all h < g. The error term $e_{1ig}$ from VAM1 then has three components: $e_{1ig} = \mu_i \Delta\tau_g + \Delta\omega_{ig} + \Delta v_{ig}$. VAM1 will yield consistent estimates of the grade-g classroom effects only if, for each c,

$$E[e_{1ig} \mid c(i,g) = c] = 0. \tag{3}$$

The most natural model that is consistent with (3) is for assignments to depend only on student ability, $\mu_i$, and for ability to have the same effect on achievement in grades g and g − 1 (i.e., $\Delta\tau_g = 0$). With these restrictions, VAM1 can be seen as the first-difference estimator for a fixed effects model, with strict exogeneity of classroom assignments conditional on $\mu_i$. By contrast, (3) is not likely to hold if c(i, g) depends, even in part, on $\omega_{ig-1}$, $v_{ig-1}$, or $A_{ig-1}$.

Differences in last year’s gains across this year’s classrooms are informative about the exclusion restriction. Using (2), the average g − 1 gain in classroom c is

$$E[\Delta A_{ig-1} \mid c(i,g) = c] = \Delta\alpha_{g-1} + E[\beta_{g-1,g-1,c(i,g-1)} \mid c(i,g) = c] + E[e_{1ig-1} \mid c(i,g) = c]. \tag{4}$$

The first term is constant across c and can be neglected. The second term might vary with c if (for example) a principal compensated for a bad teacher in grade g − 1 by assignment to a better-than-average teacher in grade g. This can be absorbed by examining the across-c(i, g) variation in $\Delta A_{ig-1}$ controlling for c(i, g − 1). I estimate specifications of this form below.10 Any remaining variation across grade-g classrooms in g − 1 gains, after controlling for g − 1 classroom assignments, must indicate that students are sorted into grade-g classrooms on the basis of $e_{1ig-1}$.

Sorting on $e_{1ig-1}$ would not necessarily violate (3) if $e_{1ig}$ were not serially correlated. But the definition of $e_{1ig}$ above indicates four sources of potential serial correlation. First, ability $\mu_i$ appears in both $e_{1ig}$ and $e_{1ig-1}$ (unless $\Delta\tau_g = 0$). Second, the $\varepsilon_{ig}$ process may be serially correlated. Third, even if ε is white noise, $\Delta\omega_{ig}$ is a moving average of order g − 1 (absent strong restrictions on the φ coefficients). Finally, $\Delta v_{ig}$ is an MA(1), degenerate only if var(v) = 0.11 Thus, (3) is not likely to hold if $E[e_{1ig-1} \mid c(i,g)]$ is nonzero.

The Lagged Score Model (VAM2). VAM2 frees up the coefficient on the lagged test score. If teacher effects decay geometrically at uniform rate 1 − λ, the grade-g score can be written in terms of the g − 1 score,

$$A_{ig} = \check\alpha_g + A_{ig-1}\lambda + \beta_{ggc(i,g)} + e_{2ig}, \tag{5}$$

where $\check\alpha_g = \alpha_g - \alpha_{g-1}\lambda$. This can equivalently be expressed as a model for the grade-g gain, by subtracting $A_{ig-1}$ from each side of (5). In either case, the error is

$$e_{2ig} = \mu_i(\tau_g - \tau_{g-1}\lambda) + \sum_{h=1}^{g-1} \varepsilon_{ih}(\phi_{hg} - \phi_{h,g-1}\lambda) + \varepsilon_{ig} + (v_{ig} - v_{ig-1}\lambda). \tag{6}$$

As before, each of the terms in (6) is likely to be serially correlated. The exclusion restriction for VAM2 is that $e_{2ig}$ is uncorrelated with c(i, g) conditional on $A_{ig-1}$. This would hold if c(i, g) were randomly assigned conditional on $A_{ig-1}$. It is unlikely to hold if assignments depend on $e_{2ig-1}$ or on any of its components (including $\mu_i$).12

As with VAM1, I test the VAM2 exclusion restriction by reestimating the model with the g − 1 gain as the dependent variable. By rearranging the lag of (5), we can write the g − 1 gain as

$$\Delta A_{ig-1} = \lambda^{-1}\left(\check\alpha_{g-1} + A_{ig-1}(\lambda - 1) + \beta_{g-1,g-1,c(i,g-1)} + e_{2ig-1}\right). \tag{7}$$

Thus, the grade-g classroom assignment will have predictive power for the gain in grade g − 1, controlling for the g − 1 achievement level, if grade-g classrooms are correlated either with the g − 1 teacher’s effect (i.e., with $\beta_{g-1,g-1,c(i,g-1)}$) or with $e_{2ig-1}$.13 As in VAM1, the former can be ruled out by controlling for g − 1 classroom assignments; the latter would indicate a violation of the VAM2 exclusion restriction if $e_2$ is serially correlated.

The Fixed Effects in Gains Model (VAM3). For the final VAM, we return to equation (2) and to the earlier assumption of zero decay of teachers’ effects.14 The student fixed effects used in VAM3 absorb any variation in $\mu_i$ (assuming that $\Delta\tau_g = 1$ for each g). Thus, the VAM3 error term is $e_{3ig} = \Delta\omega_{ig} + \Delta v_{ig}$. The reliance on fixed effects, combined with the small time dimension of student data sets, means that VAM3 requires stronger assumptions than the earlier models. To avoid bias in the teacher effects $\beta_{ggc}$, even in large samples, teacher assignments must be strictly exogenous conditional on $\mu_i$: $E[e_{3ih} \mid c(i,g)] = 0$ for all g and all h (Wooldridge 2002, p. 253).15

Conditional strict exogeneity means that the same information, $\mu_i$ or some function of it, is used to make teacher assignments in each grade. This requires, in effect, that principals decide on classroom assignments for the remainder of a child’s career before she starts kindergarten. If teacher assignments are updated each year in response to the student’s performance during the previous year, strict exogeneity is violated.

10. This is a test of the hypothesis that students are randomly assigned to grade-g classrooms conditional on the g − 1 classroom. This test is uninformative unless there is independent variation in c(i, g − 1) and c(i, g). To take one example, Nye, Konstantopoulos, and Hedges (2004) use data from the Tennessee STAR class size experiment to study teacher effects. In STAR, “streaming” was quite common, and in many schools there is zero independent variation in third grade classroom assignments controlling for second grade assignments. In this case, identification of teacher effects rests entirely on the assumption that past teachers’ effects do not decay.

11. In Rothstein (2008), I conclude that $\Delta v_{ig}$ accounts for as much as 80% of the variance of $\Delta A_{ig}$.

12. Alternatively, if $\tau_g - \tau_{g-1}\lambda$ is constant across g, (5) can be seen as a fixed effects model with a lagged dependent variable. λ and $\beta_{gg}$ can be identified via IV or GMM (instrumenting for $\Delta A_{ig-1}$ in a model for $\Delta A_{ig}$) if c(i, g) depends on $\mu_i$ but is strictly exogenous conditional on this (Anderson and Hsiao 1981; Arellano and Bond 1991). See, for example, Koedel and Betts (2007). Value added researchers typically apply OLS to (5). This is inconsistent for λ and identifies $\beta_{ggc}$ only if c(i, g) is random conditional on $A_{ig-1}$.

13. The test can alternatively be expressed in terms of a model for the score level in g − 2. (Simply rearrange terms in (7).) The VAM2 exclusion restriction of random assignment conditional on $A_{ig-1}$ will be rejected if the grade-g classroom predicts $A_{ig-2}$ conditional on $A_{ig-1}$.

14. Although VAM1 and VAM2 can easily be generalized to allow for nonuniform decay, VAM3 cannot.

15. For practical value added implementations, it is rare to have more than three or four student grades, so asymptotics based on the g dimension are infeasible. One approach if strict exogeneity does not hold is to focus on the first difference of (2). OLS estimation of the first-differenced equation requires that c(i, g) be uncorrelated with $e_{3ig-1}$, $e_{3ig}$, and $e_{3ig+1}$. Though this is weaker than strict exogeneity, it is difficult to imagine an assignment process that would satisfy one but not the other. If the OLS requirements are not satisfied, the only option is IV/GMM (see note 12), instrumenting for both the g and g − 1 classroom assignments. Satisfactory instruments are not apparent.
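The logic of the falsification tests for VAM1 and VAM2 can be seen in a toy simulation: a grade-g teacher cannot cause gains in grade g − 1, so systematic differences in prior-year gains across grade-g classrooms point to sorting. The Python sketch below is a hedged illustration only. It compares random assignment with a hypothetical tracking rule that fills classrooms in order of the grade g − 1 score, and summarizes the evidence with a plain one-way ANOVA F statistic rather than the tests used in the paper. All names and parameter values are assumptions.

```python
import numpy as np


def prior_gain_anova_f(prior_gain, classroom):
    """One-way ANOVA F statistic for equal mean prior gains across classrooms."""
    classes = np.unique(classroom)
    n, k = prior_gain.size, classes.size
    grand_mean = prior_gain.mean()
    between = 0.0
    within = 0.0
    for c in classes:
        gains_c = prior_gain[classroom == c]
        between += gains_c.size * (gains_c.mean() - grand_mean) ** 2
        within += ((gains_c - gains_c.mean()) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def lagged_gain_f(tracked, n_classes=10, class_size=50, seed=1):
    """F statistic for grade-g classrooms 'predicting' grade g-1 gains."""
    rng = np.random.default_rng(seed)
    n = n_classes * class_size
    score_gm2 = rng.normal(0.0, 1.0, n)    # score at the end of grade g-2
    prior_gain = rng.normal(0.0, 0.6, n)   # gain during grade g-1
    score_gm1 = score_gm2 + prior_gain     # score at the end of grade g-1
    slots = np.repeat(np.arange(n_classes), class_size)
    if tracked:
        # Hypothetical tracking rule: rank students on the g-1 score and
        # fill classrooms in that order.
        classroom = np.empty(n, dtype=int)
        classroom[np.argsort(score_gm1)] = slots
    else:
        classroom = rng.permutation(slots)
    return prior_gain_anova_f(prior_gain, classroom)


if __name__ == "__main__":
    print("F, random assignment:", round(float(lagged_gain_f(False)), 2))
    print("F, tracked on lagged score:", round(float(lagged_gain_f(True)), 2))
```

Under random assignment the statistic should hover near one, whereas under the tracking rule it should be far larger, because the lagged score on which students are sorted contains last year's gain.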


As before, my test is based on analyses of the apparent effects of grade-g teachers on gains in prior grades. Consider estimation of VAM1, without the student fixed effects that are added in VAM3. If teacher assignments depend on ability, this will bias the VAM coefficients and will lead me to reject the VAM1 exclusion restriction. But the conditional strict exogeneity assumption imposes restrictions on the coefficients from the VAM1 falsification test. Under this assumption, the only source of bias in VAM1 is the omission of controls for $\mu_i$. As $\mu_i$ enters into every grade’s gain equation, grade-g teachers should have the same apparent effects on g − 2 gains as they do on g − 1 gains. An indication that these differ would indicate that omitted time-varying determinants of gains are correlated with teacher assignments, and therefore that assignments are not strictly exogenous.

Following Chamberlain (1984), consider a projection of μ onto the full sequence of classroom assignments in grades 1 through G:

$$\mu_i = \xi_{1c(i,1)} + \cdots + \xi_{Gc(i,G)} + \eta_i. \tag{8}$$

$\xi_{hc}$ is the incremental information about $\mu_i$ provided by the knowledge that the student was in classroom c in grade h, conditional on classroom assignments in all other grades. Substituting (8) into (2), we obtain

$$\Delta A_{ig} = \Delta\alpha_g + \sum_{h=1}^{G} \pi_{hgc(i,h)} + \eta_i + e_{3ig}, \tag{9}$$

where $\pi_{ggc} = \xi_{gc}\Delta\tau_g + \beta_{ggc}$ and $\pi_{hgc} = \xi_{hc}\Delta\tau_g$ for $h \ne g$. Under conditional strict exogeneity, $E[e_{3ih} \mid c(i,1), \ldots, c(i,G)] = 0$ for each h, and the fact that (8) is a linear projection ensures that $\eta_i$ is uncorrelated with the regressors as well. An OLS regression of grade-g gains onto classroom indicators in grades 1 through G thus estimates the $\pi_{hgc}$ coefficients without bias.

When G ≥ 3, the underlying parameters are overidentified. To see this, note that

$$\pi_{hgc} = \xi_{hc}\Delta\tau_g = \xi_{hc}\Delta\tau_{g-1}\frac{\Delta\tau_g}{\Delta\tau_{g-1}} = \pi_{h,g-1,c}\frac{\Delta\tau_g}{\Delta\tau_{g-1}} \tag{10}$$

for all h > g: The coefficient for grade-h classroom c in a model of gains in grade g is proportional to the same coefficient in a model of gains in g − 1. If there are $J_h$ grade-h classrooms in the sample, this represents $J_h - 1$ overidentifying restrictions on the $2J_h$ elements of the vectors $\Pi_{hg} = \{\pi_{hg1}, \ldots, \pi_{hgJ_h}\}'$ and $\Pi_{hg-1} = \{\pi_{h,g-1,1}, \ldots, \pi_{h,g-1,J_h}\}'$.16 To test these restrictions, I estimate the $J_h$-vector $\Pi_h$ and the scalars $\Delta\tau_{g-1}$ and $\Delta\tau_g$ that minimize

$$D = \left[\begin{pmatrix}\hat\Pi_{hg-1}\\ \hat\Pi_{hg}\end{pmatrix} - \begin{pmatrix}\Pi_h\,\Delta\tau_{g-1}\\ \Pi_h\,\Delta\tau_g\end{pmatrix}\right]' W^{-1} \left[\begin{pmatrix}\hat\Pi_{hg-1}\\ \hat\Pi_{hg}\end{pmatrix} - \begin{pmatrix}\Pi_h\,\Delta\tau_{g-1}\\ \Pi_h\,\Delta\tau_g\end{pmatrix}\right], \tag{11}$$

using the sampling variance of $(\hat\Pi_{hg-1}' \;\; \hat\Pi_{hg}')'$ as W. Under the null hypothesis of strict exogeneity, the minimized value D is distributed $\chi^2$ with $J_h - 1$ degrees of freedom.17 If D is above the 95% critical value from this distribution, the null is rejected. Intuitively, the correlation between corresponding elements of the coefficient vectors $\Pi_{hg-1}$ and $\Pi_{hg}$, representing apparent “effects” of grade-h teachers on gains in grades g − 1 and g (g < h), should be 1 or −1 under the null; a correlation far from this would suggest that the exclusion restriction is violated.

III.D. Implementation

To put the three VAMs in the best possible light, I focus on estimation of within-school differences in classroom effects. For many purposes, one might want to make across-school comparisons. But students are not randomly assigned to schools, and those at one school may gain systematically faster than those at another for reasons unrelated to teacher quality. Random assignment to classrooms within schools is at least somewhat plausible.

To isolate within-school variation, I augment each of the estimating equations discussed above with a set of indicators for the school attended.18 The tests for VAM1 and VAM2 then amount to tests of whether students are (conditionally) randomly assigned to classrooms within schools. They resemble tests of successful randomization in stratified experiments, treating schools as strata. Intuitively, I will reject random assignment if replacing a set of school indicators with grade-g classroom indicators adds more explanatory power for g − 1 gains than would be expected by chance alone.

Let $S_g$ and $T_g$ be matrices of indicators for grade-g schools and classrooms. These are collinear, so to eliminate this I define $\tilde T_g$ as the submatrix of $T_g$ that results from excluding the columns corresponding to one classroom per school. The VAM1 test is based on a simple regression:

$$\Delta A_{g-1} = \alpha + S_g\delta + \tilde T_g\beta + e. \tag{12}$$

The identifying assumption of VAM1 is rejected if $\beta \ne 0$. I use a heteroscedasticity-robust score test (Wooldridge 2002, p. 60) to evaluate this. I also estimate versions of (12) that include controls for grade-(g − 1) classroom assignments. To test VAM2, I simply add a control for $A_{g-1}$ on the right-hand side of (12).

It is clear from the definition of $\tilde T_g$ that only schools with multiple classrooms per grade can contribute to the analysis. One might be concerned that schools with only two or three classrooms will be misleading, as even with random assignment of students to classrooms there will be substantial overlap in the composition of a student’s grade-g and grade-(g − 1) classrooms. The Online Appendix presents a Monte Carlo analysis of the VAM1 and VAM2 tests in schools of varying sizes. The VAM1 test has appropriate size even with just two classrooms per school, so long as the number of students per classroom is large. (Recall that I focus on large-class asymptotics.) With small classes, the asymptotic distribution of the test statistic is an imperfect approximation, and as a result the test over-rejects slightly. When there are twenty students per class, the test of VAM1 has size around 10%. With empirically reasonable parameter values, the VAM2 test performs similarly.19,20

16. When G > 3, there are many such pairs of vectors that must be proportional. Even when G = 3, there are additional overidentifying restrictions created by similar proportionality relationships for teachers’ effects on future gains. These restrictions might fail either because strict exogeneity is violated or because teachers’ effects decay (that is, $\beta_{hh} \ne \beta_{hg}$ for some g > h). I therefore focus on restrictions on the coefficients for teachers’ effects on past gains, as these provide sharper tests of strict exogeneity.

17. Although there are $J_h + 2$ unknown parameters, they are underidentified: Multiplying $\Pi_h$ by a constant and dividing $\Delta\tau_{g-1}$ and $\Delta\tau_g$ by the same constant does not change the fit.

18. This makes W singular in (11). For the OMD analysis of VAM3, I drop the elements of $\pi_{gh}$ that correspond to the largest class at each school.

19. When students are assigned to classrooms based on the lagged score and when this score incorporates implausibly high degrees of clustering at the fourth grade classroom level, the VAM2 test rejects at high rates even with large classes. This reflects my use of a test that assumes independence of residuals within schools. Unfortunately, it is not possible to allow for dependence, as clustered variance-covariance matrices are consistent only if the number of clusters grows with the number of parameters fixed (Kezdi 2004) and in my application, the number of parameters grows with the number of clusters.

20. Kinsler (2008) claims that the VAM3 test also overrejects in simulations. In personal communication, he reports that the problem disappears with large classes.
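The regression in (12) can be sketched in a few lines of Python. The example below is an illustration under stated assumptions rather than the procedure used in the paper: it builds the school indicators and the reduced classroom indicators (one classroom dropped per school) explicitly, and it substitutes a classical homoscedastic F test for the heteroscedasticity-robust score test described in the text. The simulated data, the noisy-signal sorting rule, and all function names are hypothetical.

```python
import numpy as np


def within_school_f(prior_gain, school, classroom):
    """Classical F test of the classroom indicators in a regression like (12).

    Classroom ids are assumed to be unique across schools.
    Returns (F statistic, numerator df, denominator df).
    """
    schools, s_idx = np.unique(school, return_inverse=True)
    classes, c_idx = np.unique(classroom, return_inverse=True)
    S = np.eye(schools.size)[s_idx]   # school indicators
    T = np.eye(classes.size)[c_idx]   # classroom indicators

    # Drop one classroom per school to remove collinearity with S.
    class_school = np.array([s_idx[c_idx == j][0] for j in range(classes.size)])
    keep = np.ones(classes.size, dtype=bool)
    for s in range(schools.size):
        keep[np.flatnonzero(class_school == s)[0]] = False
    T_tilde = T[:, keep]

    def ssr(X):
        coef, *_ = np.linalg.lstsq(X, prior_gain, rcond=None)
        resid = prior_gain - X @ coef
        return float(resid @ resid)

    ssr_restricted = ssr(S)                              # schools only
    ssr_unrestricted = ssr(np.column_stack([S, T_tilde]))
    q = T_tilde.shape[1]
    df = prior_gain.size - S.shape[1] - q
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df)
    return f_stat, q, df


def simulate_within_school_test(sorted_within_school, n_schools=30,
                                classes_per_school=3, class_size=25, seed=2):
    """Simulate prior-year gains and grade-g classrooms, then run the test."""
    rng = np.random.default_rng(seed)
    per_school = classes_per_school * class_size
    school = np.repeat(np.arange(n_schools), per_school)
    school_effect = rng.normal(0.0, 0.3, n_schools)[school]
    prior_gain = school_effect + rng.normal(0.0, 0.6, school.size)
    classroom = np.empty(school.size, dtype=int)
    for s in range(n_schools):
        idx = np.flatnonzero(school == s)
        slots = np.repeat(np.arange(classes_per_school), class_size)
        slots = slots + s * classes_per_school
        if sorted_within_school:
            # Hypothetical rule: rank students on a noisy signal of last
            # year's gain and fill classrooms in that order.
            signal = prior_gain[idx] + rng.normal(0.0, 0.6, idx.size)
            classroom[idx[np.argsort(signal)]] = slots
        else:
            classroom[idx] = rng.permutation(slots)
    return within_school_f(prior_gain, school, classroom)


if __name__ == "__main__":
    print("random within school:", simulate_within_school_test(False))
    print("sorted within school:", simulate_within_school_test(True))
```

School-level differences in gains are absorbed by the school indicators, so only within-school sorting moves the statistic. A robust score test, as in the text, would replace the classical F statistic in a closer replication.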


I also report the standard deviation of the teacher coefficients (the βs in (12)) themselves. The standard deviation of the estimated coefficients necessarily exceeds that of the true coefficients (those that would be identified with large samples of students per teacher, even if these are biased estimates of teachers’ true causal effects). Aaronson, Barrow, and Sander (2007) propose a simple estimator for the variance of the true coefficients across teachers. Let β be a mean-zero vector of true projection coefficients and let $\hat\beta$ be an unbiased finite-sample estimate of β, with $E[\beta'(\hat\beta - \beta)] = 0$. The variance (across elements) of β can be written as

$$E[\beta'\beta] = E[\hat\beta'\hat\beta] - E[(\hat\beta - \beta)'(\hat\beta - \beta)]. \tag{13}$$

$E[\hat\beta'\hat\beta]$ is simply the variance across teachers of the coefficient estimates.21 $E[(\hat\beta - \beta)'(\hat\beta - \beta)]$ is the average heteroscedasticity-robust sampling variance. I weight each by the number of students taught.

Specifications that include indicators for classroom assignments in several grades simultaneously, such as that used for the test of VAM3, introduce two complications. First, the coefficients for teachers in different grades can only be separately identified when there is sufficient shuffling of students between classrooms. If students are perfectly streamed (if a student’s classmates in third grade are also his or her classmates in fourth grade), the third and fourth grade classroom indicators are collinear. I exclude from my samples a few schools where inadequate shuffling leads to perfect collinearity. Second, these regressions are difficult to compute, due to the presence of several overlapping sets of fixed effects. As discussed in the Online Appendix, this difficulty is avoided by restricting the samples to students who do not switch schools during the grades for which classroom assignments are controlled.

21. $\hat\beta$ is normalized to have mean zero across teachers at the same school, and its variance is adjusted for the degrees of freedom that this consumes.

IV. DATA AND SAMPLE CONSTRUCTION

The specifications described in Section III require longitudinal data that track students’ outcomes across several grades, linked to classroom assignments in each grade. I use administrative data on elementary students in North Carolina public schools, assembled and distributed by the North Carolina Education Research Data Center. These data have been used for several previous value added analyses (see, e.g., Clotfelter, Ladd, and Vigdor [2006]; Goldhaber [2007]). I examine end-of-grade math and reading tests from grades 3 through 5, plus “pretests” from the beginning of third grade (which I treat as second grade tests). I standardize the scale scores separately for each subject–grade–year combination.22

The North Carolina data identify the school staff member who administered the end-of-grade tests. In the elementary grades, this was usually the regular teacher. Following Clotfelter, Ladd, and Vigdor (2006), I count a student–teacher match as valid if the test administrator taught a “self-contained” (i.e., all day, all subject) class for the relevant grade in the relevant year, if that class was not designated as special education or honors, and if at least half of the tests that the teacher administered were to students in the correct grade. Using this definition, 73% of fifth graders can be matched to teachers. In each of my analyses, I restrict the sample to students with valid teacher matches in all grades for which teacher assignments are controlled.

I focus on the cohort of students who were in fifth grade in 2000–2001. Beginning with the population (N = 99,071), I exclude students who have inconsistent longitudinal records (e.g., gender changes between years); who were not in fourth grade in 1999–2000; who are missing fourth or fifth grade test scores; or who cannot be matched to a fifth grade teacher. I additionally exclude fifth grade classrooms that contain fewer than twelve sample students or are the only included classroom at the school. This leaves my base sample, consisting of 60,740 students from 3,040 fifth grade classrooms and 868 schools. My analyses all use subsets of this sample that provide sufficient longitudinal data.
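The standardization step mentioned above (scale scores standardized separately for each subject–grade–year combination) can be sketched with the Python standard library. This is a minimal illustration under assumptions: the record layout and field names (`subject`, `grade`, `year`, `score`) are hypothetical, and the use of the population standard deviation within each cell is a choice made here, not a detail reported in the text.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_scores(records):
    """Add a z-score to each record, computed within subject-grade-year cells.

    `records` is a list of dicts with the (hypothetical) keys
    'subject', 'grade', 'year', and 'score'. Each cell is assumed to
    contain at least two distinct scores, so its SD is nonzero.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["subject"], r["grade"], r["year"])].append(r["score"])

    # Mean and (population) standard deviation of each cell.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}

    standardized = []
    for r in records:
        m, s = stats[(r["subject"], r["grade"], r["year"])]
        standardized.append({**r, "z": (r["score"] - m) / s})
    return standardized


if __name__ == "__main__":
    rows = [
        {"subject": "math", "grade": 3, "year": 1999, "score": 140},
        {"subject": "math", "grade": 3, "year": 1999, "score": 150},
        {"subject": "math", "grade": 3, "year": 1999, "score": 160},
    ]
    for row in standardize_scores(rows):
        print(row)
```

Standardizing within cells puts every test on a mean-zero, unit-variance scale, so gains computed as differences of standardized scores are measured in student-level standard deviations.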
In analyses of fourth grade gains, for example, I exclude students who have missing third grade scores or who were not in third grade in 1998–1999. In specifications that include identifiers for teachers in multiple grades, I further exclude students who changed schools between grades, plus a few schools where streaming produces perfect collinearity. Table I presents summary statistics. I show statistics for the population, for the base sample, and for my most restricted sample

22. The original score scale is meant to ensure that one point corresponds to an equal amount of learning at each grade and at each point in the within-grade distribution. Rothstein (2008) and Ballou (2009) emphasize the importance of this property for value added modeling. All of the results here are robust to using the original scale.

# of students # of schools 1 fifth grade teacher 2 fifth grade teachers 3–5 fifth grade teachers >5 fifth grade teachers # of fifth grade classrooms # of fifth grade classrooms w/valid teacher match Female (%) Black (%) Other nonwhite (%) Consistent student record (%) Complete test score record, G4–5 (%) G3–5 (%) G2–5 (%) Changed schools between G3 and G5 (%) Valid teacher assignment in grade 3 (%) grade 4 (%) grade 5 (%) Fr. of students in G5 class in same G4 class Fr. of students in G5 class in same G3 class [0.19] [0.15]

(2)

(1) 99,071 1,269 122 168 776 203 4,876 3,315 49 29 8 99 88 81 72 30 68 70 72 0.22 0.15

SD

Mean

Population

TABLE I SUMMARY STATISTICS

60,740 868 0 207 602 59 3,040 3,040 50 28 7 100 99 91 80 27 78 86 100 0.22 0.15

(3)

Mean

[0.17] [0.13]

(4)

SD

Base sample

23,415 598 0 122 440 36 2,116 2,116 51 23 6 100 100 100 100 0 100 100 100 0.30 0.28

(5)

Mean

[0.19] [0.18]

(6)

SD

Most restricted sample TEACHER QUALITY IN EDUCATIONAL PRODUCTION

191

TABLE I (CONTINUED)

                                    Population          Base sample         Most restricted sample
                                    Mean      SD        Mean      SD        Mean      SD
                                    (1)       (2)       (3)       (4)       (5)       (6)
Math scores
  Third grade (beginning of year)   0.11      [0.97]    0.14      [0.96]    0.20      [0.96]
  Third grade (end of year)         0.09      [0.94]    0.11      [0.94]    0.19      [0.91]
  Fourth grade (end of year)        0.04      [0.97]    0.07      [0.97]    0.20      [0.93]
  Fifth grade (end of year)         0.00      [1.00]    0.09      [0.98]    0.20      [0.94]
  Third grade gain                  −0.02     [0.70]    −0.02     [0.69]    0.00      [0.69]
  Fourth grade gain                 −0.02     [0.58]    −0.01     [0.58]    0.01      [0.56]
  Fifth grade gain                  −0.01     [0.55]    0.01      [0.55]    −0.01     [0.53]
Reading scores
  Third grade (beginning of year)   0.08      [0.98]    0.12      [0.98]    0.17      [0.98]
  Third grade (end of year)         0.08      [0.95]    0.11      [0.94]    0.19      [0.91]
  Fourth grade (end of year)        0.04      [0.98]    0.07      [0.97]    0.18      [0.93]
  Fifth grade (end of year)         0.00      [1.00]    0.07      [0.97]    0.17      [0.94]
  Third grade gain                  0.01      [0.76]    0.00      [0.75]    0.01      [0.75]
  Fourth grade gain                 −0.02     [0.59]    −0.02     [0.59]    0.00      [0.57]
  Fifth grade gain                  −0.01     [0.59]    0.00      [0.58]    −0.02     [0.57]

Notes. Summary statistics are computed over all available observations. Test scores are standardized using all third graders in 1999, fourth graders in 2000, and fifth graders in 2001, regardless of grade progress. “Population” in columns (1) and (2) is students enrolled in fifth grade in 2001, merged with third and fourth grade records (if present) for the same students in 1999 and 2000, respectively. Columns (3) and (4) describe the base sample discussed in the text; it excludes students with missing fourth and fifth grade test scores, students without valid fifth grade teacher matches, fifth grade classes with fewer than twelve sample students, and schools with only one fifth grade class. Columns (5) and (6) further restrict the sample to students with nonmissing scores in grades 3–5 (plus the third grade beginning-of-year tests) and valid teacher assignments in each grade, at schools with multiple classes in each school in each grade and without perfect collinearity of classroom assignments in different grades.


(used for estimation of equation (9)). The last is much smaller than the others, largely because I require students to have attended the same school in grades 3 through 5 and to have valid teacher matches in each grade. Table I indicates that the restricted sample has higher mean fifth grade scores than the full population. This primarily reflects the lower scores of students who switch schools frequently.23 Average fifth grade gains are similar across samples. The Online Appendix describes each sample in more detail. As discussed above, my tests can be applied only if there is sufficient reshuffling of classrooms between grades. Table A2 in the Online Appendix shows the fraction of students’ fifth grade classmates who were also in the same fourth grade classes, by the number of fourth grade classes at the school. Complete reshuffling (combined with equal-sized classes) would produce 0.5 with two classes, 0.33 with three, and so on. The actual fractions are larger than this, but only slightly. In schools with exactly three fifth grade teachers, for example, 35% of students’ fifth grade classmates were also their classmates in fourth grade. In only 7% of multiple-classroom schools do the fourth and fifth grade classroom indicators have deficient rank. Table II presents the correlation of test scores and gains across grades and subjects. The table indicates that fifth grade scores are correlated above .8 with fourth grade scores in the same subject, whereas correlations with scores in earlier grades or other subjects are somewhat lower. Fifth grade gains are strongly negatively correlated with fourth grade levels and gains in the same subject and weakly negatively correlated with those in the other subject. The correlations between fifth and third grade gains are small but significant both within and across subjects. VAM3 is predicated on the notion that student ability is an important component of annual gains. 
Assuming that high-ability students gain faster, this would imply positive correlations between gains in different years. There is no indication of this in Table II. One potential explanation is that noise in the annual tests introduces negative autocorrelation in gains, but I conclude elsewhere (Rothstein 2008) that even true gains are negatively autocorrelated. This strongly suggests that VAM3 is poorly suited to the test score data generating process.

23. Table I shows that average third and fourth grade scores in the “population” are well above zero. The norming sample that I use to standardize scores in each grade consists of all students in that grade in the relevant year (i.e., of all third graders in 1999), whereas only those who make normal progress to fifth grade in 2001 are included in the sample for columns (1) and (2). The low scores of students who repeat grades account for the discrepancy.

TABLE II
CORRELATIONS OF TEST SCORES AND SCORE GAINS ACROSS GRADES

                Summary statistics  Correlations
                                    Fifth grade score   Fifth grade gain
                Mean      SD        Math      Reading   Math      Reading   N
                (1)       (2)       (3)       (4)       (5)       (6)       (7)
Math scores
  G5            0.02      1.00      1         .78       .29       .08       70,740
  G4            0.07      0.97      .84       .73       −.27      −.07      61,535
  G3            0.09      0.95      .80       .70       −.02      −.03      57,382
  G3 pretest    0.08      0.97      .71       .64       .00       −.03      50,661
Reading scores
  G5            0.01      1.00      .78       1         .10       .31       70,078
  G4            0.06      0.97      .73       .82       −.05      −.29      61,535
  G3            0.09      0.95      .70       .78       −.01      −.05      57,344
  G3 pretest    0.08      0.99      .59       .65       .00       −.05      50,629
Math gains
  G4–G5         0.01      0.55      .29       .10       1         .25       61,349
  G3–G4         −0.01     0.58      .11       .07       −.41      −.07      56,171
  G2–G3         0.02      0.70      .08       .05       −.02      .01       50,615
Reading gains
  G4–G5         0.00      0.58      .08       .31       .25       1         60,987
  G3–G4         −0.02     0.59      .08       .10       −.08      −.41      56,159
  G2–G3         0.02      0.75      .09       .10       −.01      .02       50,558

Notes. Each statistic is calculated using the maximal possible sample of valid student records with observations on all necessary scores and normal grade progress between the relevant grades. Column (7) lists the sample size for each row variable; correlations use smaller samples for which the column variable is also available. Italicized correlations are not different from zero at the 5% level.

V. RESULTS

Tables III, IV, and V present results for the three VAMs in turn. I begin with VAM1, in Table III. I regress fifth grade math and reading gains (in columns (1) and (2), respectively) on indicators for fifth grade schools and classrooms, excluding one classroom per school. In each case, the hypothesis that all of the classroom coefficients are zero (i.e., that classroom indicators have no explanatory power beyond that provided by school indicators) is decisively rejected. The VAM indicates that the within-school standard deviations of fifth grade teachers’ effects on math and reading are 0.15 and 0.11, respectively. This is similar to what

… $\mu_M$ or $\mu_A < \mu_M$. In the case where $\mu_A > \mu_M$ ($\mu_A < \mu_M$), the chairman is hawkish (dovish) in the sense that, conditional on inflation, inflation volatility, and the output gap, he prefers a higher (lower) interest rate than $M$. Note that this specification does not require the chairman to be the most hawkish (or dovish) member of the committee, and there may be members that systematically prefer higher (lower) interest rates than the chairman. The identity of the chairman is assumed to be fixed over time. For the sake of exposition, this section focuses on the case of the hawkish chairman only, but the dovish case is perfectly symmetric.

The voting protocol is the following. In each meeting, given the current status quo $q_t = i_{t-1}$, the chairman proposes an interest rate $i_t$ under closed rule. That is, the other committee members can either accept or reject the chairman’s proposal. If the proposal passes (i.e., it obtains at least $(N + 1)/2$ votes), then the proposed policy is implemented and becomes the status quo for next meeting. If the proposal is rejected, then the status quo is maintained and $i_t = i_{t-1}$. This procedure is repeated at the next meeting. As in the consensus model, individuals vote as if they were pivotal and disregard the consequences of their voting decisions for future meetings via the status quo. Thus, members accept a proposal whenever the current utility from the proposal is larger than or equal to the utility from the current status quo, and the chairman picks the policy closest to his ideal point among those that are acceptable to a majority of $(N + 1)/2$ members. This voting game is well-known in the political economy literature and was originally derived by Romer and Rosenthal (1978) under the assumption of symmetric preferences.21 Here, instead, the induced utilities of all members other than the median are single-peaked but not symmetric. In principle, this lack of symmetry may imply that proposals are accepted by a coalition that excludes the median. The proof of Proposition 2 ensures that this is not the case. Define $\Upsilon(s_t, q_t, \omega_t)$ to be the political aggregator in the agenda-setting game. The following proposition establishes the policy outcome under this protocol.

20. The case $\mu_A = \mu_M$ is trivial in that it always delivers the median outcome, and it is therefore observationally equivalent to the protocols studied in the next section.

PROPOSITION 2. The policy outcome in the agenda-setting model with $\mu_A > \mu_M$ is given by

\[
i_t = \Upsilon(s_t, q_t, \omega_t) =
\begin{cases}
i^*_{A,t}, & \text{if } q_t > i^*_{A,t}, \\
q_t, & \text{if } i^*_{M,t} \le q_t \le i^*_{A,t}, \\
2 i^*_{M,t} - q_t, & \text{if } 2 i^*_{M,t} - i^*_{A,t} \le q_t < i^*_{M,t}, \\
i^*_{A,t}, & \text{if } q_t < 2 i^*_{M,t} - i^*_{A,t}.
\end{cases}
\tag{10}
\]

Proof. The proof consists of the following steps.

Step 1. Let $V_j(\cdot)$ denote the indirect utility of member $j$ as a function of the interest rate and let $i_t$ denote the current proposal. We show that $V_j(i_t) - V_j(q_t)$ is increasing in $\mu_j$ for all $i_t$ and $q_t$ such that $q_t \le i^*_{M,t} \le i_t$. The difference of the expected payoff of committee member $j$ associated with interest rates $i_t$ and $q_t$ is

\[
V_j(i_t) - V_j(q_t) = \frac{\exp(\Lambda_t)\left(\exp(-\mu_j \alpha_1 \beta_2 q_t) - \exp(-\mu_j \alpha_1 \beta_2 i_t)\right) + \mu_j \alpha_1 \beta_2 (q_t - i_t)}{\mu_j^2},
\]

where $\Lambda_t = (1 + \alpha_1 \beta_2)\pi_t + \alpha_1 \beta_2 \iota + \alpha_1 (1 + \beta_1) y_t + \mu_j^2 \sigma_\pi^2/2 + \gamma u_t + \varsigma \alpha_1 v_t$. When $q_t \le i^*_{M,t} \le i_t$, it can be shown that a sufficient, but not necessary, condition for $V_j(i_t) - V_j(q_t)$ to be increasing in $\mu_j$ is that $\mu_j^2 \sigma_\pi^2 \ge 2$ for all $j > M$. In the rest of the proof we will assume that this condition is verified.

Step 2. First, when $q_t \in (i^*_A, i]$, the agenda setter proposes $i^*_{A,t}$, which is accepted by all members $j$ such that $\mu_j \le \mu_A$. This follows from the indirect utility being single-peaked. When $q_t \in [i^*_{M,t}, i^*_{A,t}]$, the agenda setter cannot increase the interest rate. The best proposal among the acceptable ones is the status quo, which is always accepted. When $q_t \in [2i^*_{M,t} - i^*_{A,t}, i^*_{M,t})$, the set of policies that the median accepts is given by the interval $[q_t, 2i^*_{M,t} - q_t]$. By Step 1, we know that these proposals are accepted by all members $j$ such that $\mu_j \ge \mu_M$ and that any proposal greater than $2i^*_{M,t} - q_t$, which is rejected by the median, is also rejected by all members $j$ such that $\mu_j \le \mu_M$. Finally, when $q_t \in [0, 2i^*_{M,t} - i^*_{A,t})$, the agenda setter is again able to propose $i^*_A$, which is accepted by the median. By Step 1 this proposal is also accepted by all members $j$ such that $\mu_j \ge \mu_M$.

21. The agenda-setting model has been used to study monetary policy making by Riboni and Ruge-Murcia (2008c) and Montoro (2006).
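As an illustration, the piecewise rule in equation (10) can be written as a short function. This is only a sketch for the hawkish-chairman case. The function and argument names (`agenda_setting_outcome`, `q`, `i_m`, `i_a`) are hypothetical labels for $q_t$, $i^*_{M,t}$, and $i^*_{A,t}$ and do not come from the paper.

```python
def agenda_setting_outcome(q, i_m, i_a):
    """Policy outcome of equation (10) for a hawkish chairman.

    q   : status quo interest rate (last period's rate)
    i_m : interest rate preferred by the median member
    i_a : interest rate preferred by the chairman (agenda setter)

    The names are illustrative only. The hawkish case requires i_a >= i_m.
    """
    if i_a < i_m:
        raise ValueError("hawkish case requires i_a >= i_m")
    if q > i_a:
        # Status quo above the chairman's ideal rate: cut to i_a.
        return i_a
    if q >= i_m:
        # Gridlock interval [i_m, i_a]: the status quo is kept.
        return q
    if q >= 2 * i_m - i_a:
        # The median's acceptance constraint binds: the chairman proposes
        # the reflection of q around the median's preferred rate.
        return 2 * i_m - q
    # Status quo so low that the median accepts the chairman's ideal rate.
    return i_a


if __name__ == "__main__":
    # Made-up numbers: median prefers 3.0, chairman prefers 4.0.
    for status_quo in (6.0, 3.5, 2.5, 1.0):
        print(status_quo, agenda_setting_outcome(status_quo, 3.0, 4.0))
```

The four branches mirror the four cases of the proposition, in the same order, so the function can be checked against each case directly.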

The political aggregator as a function of $q_t$ is plotted in Panel B of Figure II. The policy aggregator for the case of a dovish chairman is plotted in Panel C of the same figure, and it is easy to see that it is the mirror image of the one derived here for the hawkish chairman.

Control over the agenda on the part of the chairman implies deviations from the median outcome. This is due to the fact that the chairman can propose the policy he prefers, among those alternatives that at least a majority of committee members (weakly) prefer to the status quo. Among the acceptable alternatives, there is no reason to expect the chairman to propose the median outcome. Moreover, deviations from the median outcome are systematically in one direction. That is, they will always bring the policy outcome closer to the policy preferred by the chairman.

As before, there is an interval of status quo policies for which policy change is not possible (i.e., a gridlock interval). This interval is given by $[i^*_{M,t}, i^*_{A,t}]$, that is, all policies between the interest rate preferred by the median and the chairman. If the status quo falls within this interval, policy changes are blocked by either the chairman or a majority of committee members. To see this, note that when $q_t \in [i^*_{M,t}, i^*_{A,t}]$, a majority would veto any increase of the instrument value towards $i^*_A$ and proposing the status quo is then the best option for the chairman. The width of the gridlock interval is increasing in the distance between the chairman’s and the median’s preferred interest rates.

A policy change occurs only if the status quo is sufficiently extreme, compared with the members’ preferred policies. In particular, when $q_t$ falls in the interval $[2i^*_{M,t} - i^*_{A,t}, i^*_{M,t})$, the chairman chooses the policy closest to his or her ideal point subject to the constraint that $M$ will accept it. This constraint is binding at equilibrium, meaning that $M$ will be indifferent between the status quo and the interest rate that $A$ proposes. Because the median has a symmetric induced utility (recall that $\mu_M = 0$), this proposal is the reflection point of $q_t$ with respect to $i^*_{M,t}$. When the status quo policy is either lower than $2i^*_{M,t} - i^*_{A,t}$ or higher than $i^*_{A,t}$, the chairman is able to offer and pass the proposal that coincides with his ideal point.

In the rest of this section, we compare the theoretical predictions of the consensus and agenda-setting models. First, both models deliver a gridlock interval where it is not possible to change the status quo. However, it is difficult to predict a priori which voting procedure features the larger gridlock interval, because the comparison depends on the degree of consensus that the committee requires (summarized by $K$) and on the extent of disagreement between the chairman and the median. The intersection of the two gridlock intervals is nonempty, given that $i^*_{M,t}$ must belong to both intervals. In principle, the gridlock interval in the agenda-setting model could be a strict subset of the one in the consensus model if $|\mu_A - \mu_M|$ were sufficiently small and $K$ sufficiently large, but the converse cannot happen.

Second, whenever the committee decides to change the status quo, the models deliver different predictions with respect to the size of the policy change. The agenda-setting model with hawkish (dovish) chairman yields more aggressive interest rate increases (decreases) than the other two models. For example, suppose that the chairman is a hawk. Then, when $q_t < i^*_{M,t}$, the agenda-setting model unambiguously predicts a larger policy change than the consensus model. Instead, when $q_t \ge i^*_{M,t}$, the comparison is ambiguous, and the size of the interest rate decrease depends on the location of $i^*_{A,t}$ versus $i^*_{M+K,t}$.

Finally, note that under both protocols, the endpoints of the gridlock interval are stochastic and depend on the current state of the economy. An implication of the predicted local inertia is that the relation between changes in the state of nature and in policy is nonlinear. In particular, small changes in the state of the economy are less likely to produce policy changes compared with larger ones. Empirically, this would mean, for example, that small variations in the rates of inflation and unemployment would be less likely to result in a change in the key nominal interest rate, compared with large movements in these variables.
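The local inertia described above can be illustrated with the consensus rule. As summarized in Section IV.B, under that rule the committee cuts the rate to $i^*_{M+K,t}$ when the status quo lies above it, raises it to $i^*_{M-K,t}$ when the status quo lies below it, and otherwise leaves it unchanged. The sketch below uses made-up numbers. The names `consensus_outcome` and `shift_bounds` are hypothetical, and a shock is assumed to move both endpoints of the gridlock interval by the same amount.

```python
def consensus_outcome(q, lower, upper):
    """Consensus rule: move the status quo q to the nearest endpoint of the
    gridlock interval [lower, upper], or keep q if it is already inside.

    lower and upper stand for the preferred rates of members M-K and M+K.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if q > upper:
        return upper  # cut to the rate preferred by member M+K
    if q < lower:
        return lower  # raise to the rate preferred by member M-K
    return q          # gridlock: the rate is left unchanged


def shift_bounds(lower, upper, shock):
    """Assumption of this sketch: a change in the state of the economy
    shifts both endpoints of the gridlock interval by the same amount."""
    return lower + shock, upper + shock


if __name__ == "__main__":
    q = 3.0                  # status quo rate (illustrative)
    lower, upper = 2.5, 3.5  # illustrative preferred rates
    for shock in (0.2, 1.0):
        lo, hi = shift_bounds(lower, upper, shock)
        # Print the shock and the resulting change in the policy rate.
        print(shock, consensus_outcome(q, lo, hi) - q)
```

With these numbers the small shock leaves the status quo inside the shifted interval, so the rate does not move, whereas the large shock pushes the lower endpoint above the status quo and triggers an increase.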

III.C. The Frictionless Model

This model is used to describe two protocols, namely the median and the dictator models, which involve different decision-making processes, but deliver essentially the same empirical implications for the nominal interest rate. In particular, both protocols predict that regardless of the initial status quo, the committee will adopt the interest rate preferred by one key individual: the median member (in the median model) or the chairman (in the dictator model). The protocols are, therefore, frictionless in the sense that the status quo plays no role in determining the current interest rate.

Consider first the median model, which is the standard framework of analysis in political economy. Under standard conditions, which are all satisfied in our setting, the Median Voter Theorem (Black 1958) implies a unique core outcome represented by the alternative preferred by the individual whose ideal point constitutes the median of the set of ideal points. Although in its original formulation the Median Voter Theorem lacks a noncooperative underpinning, notice that the median outcome may be obtained as a special case of the consensus-based model when a simple majority is needed to pass a proposal. Applying Proposition 1 to the case where $K = 0$ (that is, when the required majority equals $(N + 1)/2$), it is easy to see that, starting from any status quo policy, the interest rate preferred by the median is always selected.

Consider now the dictator model. Under this protocol, the chairman, denoted by $C$, has absolute power over the committee and is able to impose his or her views at every meeting. Hence, the interest rate selected by the committee is the one preferred by the chairman. In this respect, the chairman has much greater power than in the agenda-setting model, where the chairman fully controls the agenda but is subject to an acceptance constraint because a majority is required to pass a proposal.

Absent any friction in the political process, both the median and dictator models predict that, within each meeting and starting from any status quo, the committee adopts

\[
i^*_{S,t} = a_S + b\pi_t + c y_t + \zeta_t, \tag{11}
\]

where $S$ equals $M$ or $C$, depending on the model. (Recall that $M$ and $C$, respectively, stand for the median and chairman.) It is important to note that in a frictionless model there is neither inertia nor path dependence. Having a committee is then equivalent to having either the median or the chairman as single central banker and, therefore, the reaction function is observationally indistinguishable from a standard Taylor rule derived under the assumption that monetary policy is selected by one individual. This model predicts a proportional adjustment of the policy instrument in response to any change in inflation and unemployment, regardless of their size, and generates interest rate autocorrelation only from the serial correlation of the fundamentals. The policy outcome predicted by the frictionless model is plotted in Panel D of Figure II. In all the panels of this figure, the size of the policy change may be inferred from the vertical distance between the policy rule and the 45° line.

IV. EMPIRICAL ANALYSIS

IV.A. The Data

The data set consists of interest rate decisions by monetary policy committees in five central banks, namely the Bank of Canada, the Bank of England, the ECB, the Swedish Riksbank, and the U.S. Federal Reserve, along with measures of inflation and the output gap in their respective countries. Inflation is measured by the twelve-month percentage change of the Consumer Price Index (Canada and Sweden), the Retail Price Index excluding mortgage-interest payments or RPIX (United Kingdom), the Harmonized Consumer Price Index (European Union), and the Consumer Price Index for All Urban Consumers (United States).22 The output gap is measured by the deviation of the seasonally adjusted unemployment rate from a trend computed using the Hodrick–Prescott filter. Interest rate decisions concern the target values for the Overnight Rate (Canada), the Repo Rate (United Kingdom and Sweden), the Rate for Main Refinancing Operations (European Union), and the Federal Funds Rate (United States).

For the Federal Reserve, the sources are Chappell, McGregor, and Vermilyea (2005) and the minutes of the FOMC meetings, which are available at www.federalreserve.gov. For the Riksbank, the source is the minutes of the meetings of the Executive Board, which are available at www.riksbank.com. For the other central banks, the sources are official press releases compiled by the authors.
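The two constructed series described above, twelve-month inflation and the unemployment-based output gap, can be sketched in plain Python as follows. This is a minimal illustration and not the authors' code. The function names are hypothetical, the text does not report the Hodrick–Prescott smoothing parameter (so `lam` is an assumption, with 129600 being one value often used for monthly data), and the dense linear solve is only suitable for short series.

```python
def twelve_month_inflation(index):
    """Twelve-month percentage change of a monthly price index."""
    return [100.0 * (index[t] / index[t - 12] - 1.0)
            for t in range(12, len(index))]


def hp_trend(y, lam):
    """Hodrick-Prescott trend: solves (I + lam * D'D) trend = y, where D is
    the second-difference matrix. Dense solve, for illustration only."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        # Add lam * d d' for the second-difference row starting at r.
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * coef[p] * coef[q]
    b = [float(v) for v in y]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    # Back substitution.
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = s / a[r][r]
    return trend


def unemployment_gap(unemployment, lam=129600.0):
    """Deviation of the unemployment rate from its HP trend.
    The default lam is an assumption, not a value taken from the paper."""
    trend = hp_trend(unemployment, lam)
    return [u - t for u, t in zip(unemployment, trend)]
```

A useful sanity check is that a perfectly linear unemployment series has no second differences, so its HP trend coincides with the series and the implied gap is zero up to rounding error.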
22. Since December 2003, the inflation target in the United Kingdom applies to the consumer price index (CPI) rather than to the RPIX. However, results using the CPI are similar to the ones reported below and are available upon request.

The sample for Canada starts with the first preannounced date for monetary policy decisions in December 2000 and ends in March 2007. The sample for the United Kingdom starts with the first meeting of the Monetary Policy Committee in June 1997 and ends in June 2007. The sample for the European Union starts in January 1999, when the ECB officially took over monetary policy from the national central banks, and ends in March 2007. The sample for Sweden starts with the first meeting of the Executive Board in January 1999 and ends in June 2007. The sample for the United States starts in August 1988 and ends in January 2007. This period corresponds to the chairmanship of Alan Greenspan, with a small number of observations from the chairmanship of Ben Bernanke.23 The number of scheduled meetings per year varies from seven or eight (Bank of Canada, Riksbank, and Federal Reserve) to eleven (ECB) and twelve (Bank of England).

There is substantial heterogeneity in the formal procedures followed by the monetary policy committees in our sample. The Governing Council of the Bank of Canada consists of the Governor and five Deputy Governors and explicitly operates on a consensus basis. This means that the discussion at the meeting is expected to move the committee toward a shared view. The Monetary Policy Committee of the Bank of England consists of nine members of whom five are internal (that is, chosen from within the ranks of bank staff) and four are external appointees. Meetings are chaired by the Governor of the Bank of England, decisions are made by simple majority, and dissenting votes are public. The decision-making body of the ECB consists of six members of the Executive Board and thirteen governors of the national central banks. According to the statutes (see footnote 3 above), monetary policy is decided by simple majority rule. The ECB issues no minutes and, consequently, dissenting opinions are not made public.
Under the Riksbank Act of 1999, the Swedish Riksbank is governed by an Executive Board, which includes the Governor and five Deputy Governors, and decisions concerning the Repo Rate are made by majority vote, but formal reservations against the majority decision are recorded in the minutes. Finally, the FOMC takes decisions by majority rule among the seven members of the Board of Governors, the president of the New York Fed, and four members of the remaining district banks, chosen according to an annual rotation scheme. The minutes of FOMC meetings are made public. However, unlike the Riksbank and the Bank of England, dissenting members in the FOMC do not always state the exact interest rate they would have preferred, but only the direction of dissent (either tightening or easing).

23. The working paper version of this article (Riboni and Ruge-Murcia 2008a) also reports results for a U.S. subsample from February 1970 to February 1978, which corresponds to the chairmanship of Arthur Burns. The conclusions drawn from that subsample are the same as those reported here.

IV.B. Formulation of the Likelihood Functions

This section shows that the political aggregators derived in Section II imply particular time-series processes for the nominal interest rate and presents their log-likelihood functions under the maintained assumption that shocks are normally distributed.24

24. For the detailed derivation of these functions, see Section 4.2 in the working paper version of this article (Riboni and Ruge-Murcia 2008a).

First, consider the consensus-based model. The political aggregator (9) in Proposition 1 means that the nominal interest rate follows a nonlinear process whereby each observation may belong to one of three possible regimes depending on whether the status quo ($q_t = i_{t-1}$) is larger than $i^*_{M+K,t}$, smaller than $i^*_{M-K,t}$, or in between these two values. In the first case, the committee cuts the interest rate to $i^*_{M+K,t}$; in the second case, it raises the interest rate to $i^*_{M-K,t}$; in the third case, it keeps the interest rate unchanged. Because the data clearly show the instances where the committee takes each of these three possible actions, it follows that the sample separation is perfectly observable and each interest rate observation can be unambiguously assigned to its respective regime. Define the set $\Omega_t = \{i_{t-1}, \pi_t, y_t\}$ with the predetermined variables at time $t$, and the sets $\Psi_1$, $\Psi_2$, and $\Psi_3$ that contain the observations where the interest rate was cut, left unchanged, and raised, respectively. Denote by $T_1$, $T_2$, and $T_3$ the number of observations in each of these sets and by $T$ ($= T_1 + T_2 + T_3$) the total number of observations. Then the log likelihood function of the $T$ available interest rate observations is simply

\[
L(\theta) = -(T_1 + T_3)\log\sigma + \sum_{i_t \in \Psi_1} \log \phi(z_{M+K,t}) + \sum_{i_t \in \Psi_2} \log\left(\Phi(z_{M-K,t}) - \Phi(z_{M+K,t})\right) + \sum_{i_t \in \Psi_3} \log \phi(z_{M-K,t}),
\]

where $\theta = \{a_{M+K}, a_{M-K}, b, c, \sigma\}$ is the set of unknown parameters, $z_{M+K,t} = (i_{t-1} - a_{M+K} - b\pi_t - c y_t)/\sigma$, $z_{M-K,t} = (i_{t-1} - a_{M-K} - b\pi_t - c y_t)/\sigma$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density and cumulative distribution functions of the standard normal variable, respectively. The maximization of this function with respect to $\theta$ delivers consistent maximum likelihood (ML) estimates of the parameters of the interest rate process under the consensus model.

The log likelihood function of the consensus model is similar to the one studied by Rosett (1959), who generalizes the two-sided Tobit model to allow the mass point to be anywhere in the conditional cumulative distribution function. In both models, the dependent variable reacts only to large changes in the fundamentals. However, whereas Rosett’s frictional model is static and the mass point is concentrated around a fixed value, the consensus-based model is dynamic and the mass point is concentrated around a time-varying and endogenous value, albeit predetermined at the beginning of the meeting.

Second, consider the agenda-setting model with a hawkish chairman. (The case of the dovish chairman is isomorphic and not presented here to save space.) The political aggregator (10) in Proposition 2 means that the nominal interest rate follows a nonlinear process where each realization belongs to one of four possible regimes, rather than three, as in the consensus model. In the case where $i_{t-1}$ is larger than $i^*_{A,t}$, the committee cuts the interest rate to $i^*_{A,t}$, and the observation can be unambiguously assigned to the set $\Psi_1$. In the case where $i_{t-1}$ is between $i^*_{M,t}$ and $i^*_{A,t}$, the committee keeps the interest rate unchanged and the observation clearly belongs to $\Psi_2$. However, in the case where $i_{t-1}$ is smaller than $i^*_{M,t}$ (for example, as a result of a sufficiently large realization of $\zeta_t$), the agenda setter may propose an interest rate increase to either $2i^*_{M,t} - i_{t-1}$ or $i^*_{A,t}$ depending on whether the acceptance constraint is binding or not. Although the observation can be assigned to $\Psi_3$, one cannot be sure which of the two regimes (whether $2i^*_{M,t} - i_{t-1}$ or $i^*_{A,t}$) has generated $i_t$. The reason is simply that on the basis of interest rate data alone, it is not possible to know ex ante whether the acceptance constraint is binding or not. Hence, in the agenda-setting model, the sample separation is imperfect. The log likelihood function of the $T$ available observations is

\[
L(\vartheta) = -(T_1 + T_3)\log\sigma + \sum_{i_t \in \Psi_1} \log \phi(z_{A,t}) + \sum_{i_t \in \Psi_2} \log\left(\Phi(z_{M,t}) - \Phi(z_{A,t})\right) + \sum_{i_t \in \Psi_3} \log\left(\phi(z_{A,t}) I(w_t) + (1/2)\phi(z_{D,t})(1 - I(w_t))\right),
\]

where $\vartheta = \{a_A, a_M, b, c, \sigma\}$ is the set of unknown parameters, $z_{A,t} = (i_{t-1} - a_A - b\pi_t - c y_t)/\sigma$, $z_{M,t} = (i_{t-1} - a_M - b\pi_t - c y_t)/\sigma$, $z_{D,t} = (i_t - 2(a_M + b\pi_t + c y_t) + i_{t-1})/\sigma$, $w_t$ is short-hand for the condition $i_t - i_{t-1} - 2(a_A - a_M) < 0$, and $I(\cdot)$ is an indicator function that takes the value 1 if its argument is true and zero otherwise. The terms in the latter summation show that for interest rate increases, the density is a mixture of the two normal distributions associated with the processes $2i^*_{M,t} - i_{t-1}$ and $i^*_{A,t}$. The weights of this mixture take either the value zero or one because the disturbance term is the same in both processes and, hence, these distributions are perfectly correlated. By maximizing this function with respect to $\vartheta$, it is possible to obtain consistent ML estimates of the parameters of the interest rate process under the agenda-setting model.25

Finally, consider the frictionless model where $i_t = a_S + b\pi_t + c y_t + \zeta_t$ and the log likelihood function of the $T$ available observations is just

\[
L(\varphi) = -T\log\sigma + \sum_{\forall i_t} \log \phi(Z_{S,t}),
\]

where $\varphi = \{a_S, b, c, \sigma\}$ is the set of unknown parameters and $Z_{S,t} = (i_t - a_S - b\pi_t - c y_t)/\sigma$. The maximization of this function with respect to $\varphi$ delivers consistent ML estimates of the parameters of the interest rate process under the frictionless model. Notice, however, that with data on interest rate decisions alone, it is not possible to distinguish between the two possible interpretations of the frictionless model.

IV.C. Empirical Results

Tables I through V report empirical results for the monetary committees of the Bank of Canada, the Bank of England, the ECB, the Swedish Riksbank, and the U.S. Federal Reserve. Panel A in these tables reports maximum likelihood estimates of the parameters of the interest rate process under each protocol. Although some coefficients are not statistically significant, estimates for all protocols are generally in line with the theory in the sense that

25. The indicator function $I(\cdot)$ induces a discontinuity in the likelihood function and, consequently, this maximization requires either the use of a non-gradient-based optimization algorithm or a smooth approximation to the indicator function. We followed the latter approach here, but using the simulated annealing algorithm in Corana et al. (1987), which does not require numerical derivatives but is more time-consuming, delivers the same results.
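To make the structure of the consensus-model log likelihood in Section IV.B concrete, the following sketch evaluates it for a list of observations. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the density terms are evaluated at the newly chosen rate while the no-change probability is evaluated at the status quo, and each rate change contributes the usual Gaussian normalization of minus the log of $\sigma$.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def consensus_loglik(theta, observations):
    """Log likelihood of the consensus model (a sketch).

    theta        : (a_hi, a_lo, b, c, sigma), standing for
                   (a_{M+K}, a_{M-K}, b, c, sigma) in the text
    observations : iterable of (i_prev, i_now, inflation, gap) tuples
    """
    a_hi, a_lo, b, c, sigma = theta
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for i_prev, i_now, infl, gap in observations:
        fit = b * infl + c * gap
        if i_now < i_prev:
            # Cut: the new rate is the rate preferred by member M+K.
            z = (i_now - a_hi - fit) / sigma
            total += math.log(norm_pdf(z)) - math.log(sigma)
        elif i_now > i_prev:
            # Increase: the new rate is the rate preferred by member M-K.
            z = (i_now - a_lo - fit) / sigma
            total += math.log(norm_pdf(z)) - math.log(sigma)
        else:
            # No change: the status quo lies inside the gridlock interval.
            z_hi = (i_prev - a_hi - fit) / sigma
            z_lo = (i_prev - a_lo - fit) / sigma
            total += math.log(norm_cdf(z_lo) - norm_cdf(z_hi))
    return total
```

A numerical optimizer would maximize this function over `theta` subject to `a_hi >= a_lo`, since otherwise the no-change probability is not positive and its logarithm is undefined.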

TABLE I
BANK OF CANADA

                                                            Dominant chairman
                                        Consensus           Hawkish             Dovish              Frictionless        With size friction  Data

A. Parameter estimates
a_{M+K}                                 3.175∗ (0.422)
a_{M−K}                                 1.357∗ (0.471)
a_M                                                         1.314∗ (0.462)      3.257∗ (0.414)
a_A                                                         3.162∗ (0.413)      1.377∗ (0.462)
a_S                                                                                                 2.618∗ (0.343)      2.604∗ (0.343)
b                                       0.386∗ (0.174)      0.381∗ (0.171)      0.388∗ (0.171)      0.231 (0.143)       0.237 (0.143)
c                                       −0.120 (0.304)      −0.112 (0.297)      −0.135 (0.297)      −0.041 (0.256)      −0.036 (0.246)
σ                                       0.965∗ (0.124)      0.942∗ (0.120)      0.941∗ (0.120)      0.845∗ (0.085)      0.845∗ (0.086)

B. Criteria for model selection
L(·)                                    −58.76              −67.07              −67.53              −62.82              −77.38
AIC                                     127.53              144.14              145.06              133.64              162.75
RMSE                                    0.506               0.631               0.867               0.850               0.992
MAE                                     0.388               0.502               0.561               0.715               0.844
Chairman extracts all rents (p-value)
Autocorrelation
Standard deviation
Proportion of
  Cuts
  Increases
  No changes
Policy reversals
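The AIC entries in Panel B can be cross-checked against the reported log likelihoods. The sketch below assumes the conventional definition, AIC = 2k − 2L, where k is the number of estimated parameters. The text does not state the formula, so this is an assumption, although it reproduces the reported values to within rounding. The parameter counts are taken from the parameter sets defined in Section IV.B (five for the consensus and agenda-setting models, four for the frictionless model), and the helper name `aic` is hypothetical.

```python
def aic(loglik, n_params):
    """Akaike information criterion under the conventional definition."""
    return 2 * n_params - 2 * loglik


# (log likelihood, number of parameters, reported AIC) from Table I.
reported = {
    "Consensus": (-58.76, 5, 127.53),
    "Hawkish": (-67.07, 5, 144.14),
    "Dovish": (-67.53, 5, 145.06),
    "Frictionless": (-62.82, 4, 133.64),
}

if __name__ == "__main__":
    for name, (loglik, k, table_value) in reported.items():
        # Print the recomputed AIC next to the value reported in the table.
        print(name, round(aic(loglik, k), 2), table_value)
```

Because the log likelihoods are themselves rounded to two decimals, a recomputed value can differ from the table in the last digit.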
