Journal of Financial Economics 99 (2011) 235–261
Capital structure dynamics and transitory debt☆

Harry DeAngelo a, Linda DeAngelo a, Toni M. Whited b,*

a Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA
b Simon Graduate School of Business Administration, University of Rochester, Rochester, NY 14627, USA
Article info

Article history: Received 5 May 2009; received in revised form 11 January 2010; accepted 15 February 2010; available online 19 September 2010
JEL classification: G32; G31
Keywords: Dynamic capital structure; Financial flexibility; Target capital structure

Abstract

Firms deliberately but temporarily deviate from permanent leverage targets by issuing transitory debt to fund investment. Leverage targets conservatively embed the option to issue transitory debt, with the evolution of leverage reflecting the sequence of investment outlays. We estimate a dynamic capital structure model with these features and find that it replicates industry leverage very well, explains debt issuances/repayments better than extant tradeoff models, and accounts for the leverage changes accompanying investment “spikes.” It generates leverage ratios with slow average speeds of adjustment to target, which are dampened by intentional temporary movements away from target, not debt issuance costs. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

We estimate a dynamic capital structure model that differs from prior models in which firms have leverage targets because it recognizes that firms sometimes issue transitory debt and deviate deliberately, but temporarily, from target in order to fund investment. The model generates leverage dynamics that differ radically from those of prior tradeoff models and yields a rich set of testable predictions that link capital structure to variation
☆ This research was supported by the Charles E. Cook/Community Bank and Kenneth King Stonier Chairs at the Marshall School of Business of the University of Southern California and the Michael and Diane Jones Chair at the University of Rochester. For useful comments and suggestions, we thank Utpal Bhattacharya, Murray Carlson, Wayne Ferson, Joao Gomes, Bob McDonald, Kevin J. Murphy, Michael Roberts, René Stulz, Mark Westerfield, an anonymous referee, and participants in seminars at American University, Arizona State, the Federal Reserve Board, Indiana University, Kellogg, Lancaster, Lausanne, Notre Dame, Ohio State, University of Houston, University of Michigan, University of Rochester, University of Zurich, and Wharton.
* Corresponding author. E-mail address: [email protected] (T.M. Whited).
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.005
in the volatility of shocks to investment policy, the serial correlation of such shocks, and the marginal profitability of investment, as well as to variation in both fixed and convex costs of adjusting the capital stock. Because firms often borrow and move away from target to fund investment, the model generates leverage ratios with slow average speeds of adjustment (SOA) to target that are close to the estimates reported in empirical SOA studies, with rebalancing toward target largely occurring in states of the world in which firms’ investment needs are moderate. We find that the model replicates industry leverage very well, that it explains firms’ debt issuance/repayment decisions better than extant tradeoff models of capital structure, and that it can account for the leverage changes that accompany investment “spikes.” In this model, firms’ use of transitory debt and their target capital structures are systematically related to the nature of their investment opportunities because (i) borrowing is a cost-efficient means of raising capital when a given shock to investment opportunities dictates a funding need, and (ii) the option to issue debt is a scarce resource whose optimal intertemporal utilization depends on both current and prospective shocks. The
H. DeAngelo et al. / Journal of Financial Economics 99 (2011) 235–261
option to issue debt is valuable in our model because investment is endogenous and because of three assumptions that dictate that all sources of capital (external equity, corporate cash balances, and borrowing) are costly means of funding investment. First, equity issuance entails costs, an assumption intended to reflect the existence of adverse selection problems or security flotation expenses. Second, holding cash incurs costs, which can reflect corporate taxes, agency costs, or an interest rate differential on precautionary liquid asset holdings in the spirit of Keynes (1936). Finally, debt capacity is finite, an assumption that can reflect financial distress costs or asymmetric information problems that prevent creditors from gauging firms’ ability to support debt. As a result, when a firm borrows today, the relevant ‘‘leverage-related cost’’ includes the opportunity cost of its consequent future inability to borrow. This opportunity cost of borrowing implies that target capital structures are more conservative than those produced by otherwise similar tax/distress cost tradeoff models because the cost of borrowing today includes the value lost when a firm fails to preserve the option to borrow later at comparable terms. Intuitively, a firm’s long-run target is the theoretically ideal debt level that, when viewed ex ante, optimally balances its tax shield from debt against not only distress costs, but also against the opportunity cost of borrowing now rather than preserving the option to borrow later. More precisely, in our model a firm’s target capital structure is the matching of debt and assets to which the firm would converge if it optimized its debt and assets decisions in the face of uncertainty, but then by chance happened to receive only neutral shocks to investment opportunities for many periods in a row. (In this case, the firm has ample resources to pay down any debt in excess of target, and also has no new projects that must be funded externally.) 
In general, the target debt level is a function of the probability distribution of investment opportunities, taxes, distress costs, external equity financing costs, and the costs of holding cash. We show that the target is a single ratio of debt to assets, except when firms face fixed costs of adjusting their stock of physical capital, in which case firms have a range of target leverage ratios. Our model yields a variety of specific testable predictions that link firms’ investment attributes to their capital structure decisions. For example, average debt outstanding is inversely related to the volatility of unexpected shocks to investment opportunities, and the imposition of corporate taxes induces greater leverage for firms that face low as opposed to high shock volatility. Intuitively, firms that face high shock volatility find it especially valuable to preserve debt capacity to address substantial funding needs associated with future shocks to investment opportunities, and this benefit looms large relative to the interest tax shields they lose by maintaining low debt ratios on average. The more volatile investment outlays of high versus low shock volatility firms also induce the former to rely more on (costly) cash balances to fund investment. For similar reasons, the model also predicts lower average debt ratios and greater reliance on cash balances for firms that face higher (i) serial
correlation of shocks to investment opportunities, (ii) marginal profitability of investment, and (iii) fixed (compared to convex) costs of adjusting the stock of physical capital. We refer to the difference between actual and target debt levels as transitory debt,¹ with actual debt deviating from target because investment policy is endogenous. For example, with no tax or other permanent benefit from corporate debt, firms nevertheless find that issuing debt is sometimes the most efficient source of capital (in a sense made precise below), yet zero debt is the capital structure target. Paying down debt (issued to fund prior investment) frees up debt capacity, which reduces the expected future costs of capital access. Hence managers always have incentives to return their firms to zero debt in the absence of taxes. They may not be able to accomplish this objective quickly, however, since multiple sequential shocks may arrive, requiring additional funds and, perhaps, more borrowing. Although firms have leverage targets as in static tradeoff models, managers sometimes choose to deviate from target, and subsequently seek to rebalance to target by reducing debt with a lag determined in part by the time path of investment opportunities and earnings realizations. The prediction that firms deliberately deviate from target differentiates our analysis from static tradeoff models and from the multi-period contingent claims generalizations thereof, which assume exogenous investment and positive leverage rebalancing costs, e.g., Fischer, Heinkel, and Zechner (1989) and Goldstein, Ju, and Leland (2001). These models predict that all management-initiated changes in capital structure move firms toward target, although as we detail in Section 6 several prior studies and our own evidence indicate that this prediction is not borne out empirically.
A second empirical shortcoming of extant tradeoff theories (and a shortcoming plausibly related to the high observed incidence of deliberate deviations from target) is the slow speed of adjustment (SOA) to target estimated in leverage rebalancing studies. For example, Fama and French (2002) find “suspiciously slow” speeds of adjustment to target, and other studies estimate that firms adjust an average of somewhere between one-third and one-twelfth of the way toward target in a typical year (see, e.g., Flannery and Rangan, 2006; Kayhan and Titman, 2007; Lemmon, Roberts, and Zender, 2008; Parsons and Titman, 2008). With empirical studies uniformly reporting slow SOA estimates, it is difficult to accept the premise of extant tradeoff theories that rebalancing to target is the sole driver of proactive changes in capital structure undertaken by firms.
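To put the quoted SOA range in perspective, the following minimal Python sketch converts an annual speed of adjustment into a half-life under a textbook partial-adjustment rule, and then runs a toy simulation in which a firm rebalances aggressively in some years and deliberately issues debt in others. The simulation is not the model estimated in this paper; the function names, the 50/50 state mix, the 0.8 within-state adjustment rate, and the issuance range are all hypothetical choices made only for illustration.

```python
import math
import random

def half_life(soa):
    """Years needed to close half of a leverage gap when
    gap_{t+1} = (1 - soa) * gap_t (standard partial adjustment)."""
    return math.log(0.5) / math.log(1.0 - soa)

def pooled_soa(n_periods=100_000, rebalance_prob=0.5, state_soa=0.8, seed=0):
    """Toy illustration (hypothetical, not the paper's model).

    Each year the firm either closes `state_soa` of its gap to target or
    issues debt in an amount unrelated to the gap. Returns the OLS slope of
    the leverage change on (target - leverage), i.e. the pooled SOA estimate.
    """
    rng = random.Random(seed)
    target = 0.25
    lev = target
    gaps, changes = [], []
    for _ in range(n_periods):
        gap = target - lev
        if rng.random() < rebalance_prob:
            change = state_soa * gap           # aggressive move toward target
        else:
            change = rng.uniform(0.0, 0.04)    # transitory debt funds investment
        gaps.append(gap)
        changes.append(change)
        lev += change
    mean_gap = sum(gaps) / n_periods
    mean_change = sum(changes) / n_periods
    cov = sum((g - mean_gap) * (c - mean_change) for g, c in zip(gaps, changes))
    var = sum((g - mean_gap) ** 2 for g in gaps)
    return cov / var

if __name__ == "__main__":
    print(half_life(1 / 3), half_life(1 / 12))
    print(pooled_soa())
```

Under these assumptions, SOA values of one-third and one-twelfth correspond to half-lives of roughly 1.7 and 8 years. In the toy simulation the pooled slope is well below the 0.8 rate at which the firm rebalances whenever it chooses to do so, because years with deliberate moves away from target enter the same regression.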
1 Transitory debt is not synonymous with short-term debt. Indeed, our model includes only perpetual debt, which managers issue and later retire or perhaps leave outstanding indefinitely as future circumstances dictate. In reality, transitory debt can include bonds, term loans, and borrowing under lines of credit that managers intend to pay off in the short to intermediate term to free up debt capacity. In other words, it is managerial intent (and behavior) that determines whether debt is transitory, and not the stated life or any other contractual feature of a given debt issue.
The leverage ratios implied by our estimated model parameters exhibit slow average speeds of adjustment to target in the neighborhood of the estimates in prior SOA studies. Since our estimation procedure does not match model parameter values to real-world data on the basis of SOA, this finding provides additional empirical support for the model. Perhaps more importantly, the fact that firms exhibit slow speeds of adjustment to target in our model does not imply that they have weak incentives to rebalance leverage. On the contrary, we find that firms in our model rebalance aggressively toward target in some, but not all, states of the world, most notably when optimal investment outlays are low. Our analysis implies that the SOA measures used in earlier rebalancing studies understate the strength of firms’ incentives to rebalance leverage because their estimates of the average SOA include financing decisions in which firms choose to move temporarily away from target. We conduct four tests of our model’s ability to explain observed leverage decisions. First, we run simulated method of moments (SMM) estimations for each of 41 two-digit standard industrial classification (SIC) code industries and find that, with the exception of railroads, actual average leverage ratios from Compustat do not differ significantly from the simulated average leverage generated by our industry-level SMM analyses. Second, we regress average industry leverage on the structural parameters from 40 of these estimations (excluding railroads), and find that almost all of the model-specified attributes of investment opportunities exert statistically significant influences on leverage in a manner consistent with the model’s predictions. 
Third, analysis of debt issuance and repayment decisions by Compustat firms reveals that our model does a significantly better job predicting such decisions than is done by extant tradeoff models in which investment policy is exogenous and firms rebalance leverage subject to capital structure adjustment costs. Fourth, we analyze financing decisions associated with investment ‘‘spikes,’’ and find that even when leverage is currently above average, large investment outlays are typically accompanied by substantial debt issuances that increase leverage. Perhaps the most important ‘‘carry away’’ of the paper is that our evidence indicates that a simple dynamic capital structure model in which firms have leverage targets and issue transitory debt to fund investment does a good job explaining industry leverage, and is markedly better than extant tradeoff models at explaining debt issuance and repayment decisions and the financing decisions that accompany investment spikes. At the same time, it is also worth noting that our analysis moves beyond existing dynamic capital structure models in that our approach (i) develops the target capital structure and leverage dynamics implications of the opportunity cost of borrowing, (ii) demonstrates that this opportunity cost induces firms to use debt as a transitory financing vehicle, taking deliberate, but temporary deviations from their target capital structures, (iii) establishes that transitory debt implies radically different leverage dynamics from those of adjustment cost models in which all proactive capital structure decisions move firms toward their
leverage targets, (iv) formally operationalizes the notion of (and demonstrates the existence of) capital structure targets in a dynamic model with endogenous investment policy, (v) yields new testable implications that link the time-series and cross-sectional variation in firms’ capital structure to variation in the nature of their investment opportunities, (vi) shows that when transitory debt is an important component of capital structure, conventional measures of the speed of adjustment to target leverage markedly understate the strength of firms’ actual rebalancing incentives, and (vii) demonstrates that the types of physical capital stock adjustment costs that firms face affect predicted leverage dynamics and determine whether capital structure targets are unique. Endogeneity of investment policy is critical to our analysis, and variation in investment opportunity attributes is the main driver of our comparative statics predictions. The importance of endogenous investment is also evident in the results of a small but growing literature of dynamic models that explore the interactions of investment policy and capital structure. For example, Tserlukevich (2008), Morellec and Schürhoff (2010), and Sundaresan and Wang (2006) study the leverage impact of real options. Similarly, Brennan and Schwartz (1984), Hennessy and Whited (2005), Titman and Tsyplakov (2007) and Gamba and Triantis (2008) treat investment as endogenous while focusing, respectively, on debt covenants, taxes, agency issues, and cash holdings.² Our analysis complements that of these studies by focusing directly on the capital structure impact of variation in investment attributes and, in particular, on the leverage impact of variation in the volatility and serial correlation of investment shocks, the marginal profitability of investment, and the properties of capital stock adjustment costs.
(We use the shorthand term ‘‘investment shock’’ to mean a shock to investment opportunities, and not a stochastic shift in the level of investment.) While our model shares several features with Whited (1992) and Hennessy and Whited (2005), e.g., endogenous investment, there are a number of important differences. For example, while Hennessy and Whited (2005, Abstract) conclude that ‘‘there is no target leverage ratio’’ in their analysis, we recognize that a meaningful leverage target does in fact exist in dynamic models of the general type they employ. Our analysis is also differentiated by its consideration of a more realistic set of investment policy features, which both generate a richer set of leverage predictions and enable our model to do a markedly better job than Hennessy and Whited’s model does in matching the empirical volatility of investment. More generally, our
2 Tserlukevich (2008, Table 1) catalogs the assumptions of 10 dynamic capital structure models, and notes that five treat investment as exogenous (Kane, Marcus, and McDonald, 1984; Fischer, Heinkel, and Zechner, 1989; Goldstein, Ju, and Leland, 2001; Strebulaev, 2007; Leary and Roberts, 2005) and five treat it as endogenous (Brennan and Schwartz, 1984; Mauer and Triantis, 1994; Titman and Tsyplakov, 2007; Hennessy and Whited, 2005; Tserlukevich, 2008). Hennessy and Whited (2007), Gamba and Triantis (2008), and Bolton, Chen, and Wang (2009) also treat investment as endogenous, as do Riddick and Whited (2008), who analyze cash balances (but not leverage decisions) in an analytical framework close in spirit to the one we employ.
analysis is distinctive for the role it assigns to deliberate transitory deviations from target leverage, and for the associated implications it derives for (i) the existence of (and cross-firm variation in) capital structure targets, and (ii) the systematic connections between the nature of investment opportunities and leverage dynamics, including firms’ use of transitory debt to fund investment. We posit a simple dynamic model to sharply highlight the capital structure role of transitory debt, but as we show in Section 7, our conclusions generalize to considerably more complex model settings. These extensions illustrate that any cost of leverage that is increasing in the debt level implies that borrowing today entails an opportunity cost in terms of reduced future ability to borrow at terms the firm currently faces. We begin in Section 2 by explaining the model and presenting the estimation results. Sections 3 and 4 present our comparative statics analysis that establishes the predicted connections between leverage and attributes of investment opportunities. Section 5 analyzes the speed of adjustment to target. Section 6 presents evidence on our model’s ability to explain industry leverage, firms’ debt issuance/repayment decisions, and the leverage changes that accompany investment spikes. Section 7 demonstrates that our conclusions generalize to allow for collateral constraints, endogenous default, and cash holdings simultaneous with outstanding debt. Section 8 summarizes our findings.

2. A simple dynamic model of capital structure
Managers select the firm’s investment and financial policies at each date in an infinite-horizon world so that they must always be mindful of the consequences of today’s decisions on the feasible set of future decisions. Their decisions include (i) investment in real assets, (ii) changes in cash balances, (iii) equity or debt issuances, and (iv) distributions to debt and equity holders. Debt capacity is finite, an assumption that reflects the view that distress costs and/or asymmetric information problems prevent creditors from gauging with precision the firm’s ability to support debt. Equity issuance incurs exogenously given costs, which can be interpreted as flotation or adverse selection costs, as in Myers and Majluf (1984). Cash balances are also costly, an assumption motivated by differential borrowing and lending rates (Cooley and Quadrini, 2001), agency costs (Jensen, 1986; Stulz, 1990), and/or a premium paid for precautionary cash holdings (Keynes, 1936). For simplicity, we refer to such costs as “agency costs” or “costs of maintaining cash balances.”

2.1. Model setup

The firm’s managers select investment and financing decisions to maximize the wealth of owners, which is determined by risk-neutral security pricing in the capital market. The firm’s per period profit function is $\pi(k,z)$, in which $k$ is capital and $z$ is a shock observed by managers each period before making investment and financing decisions. For brevity, we often refer to $z$ as an “investment shock” to capture the idea that variation in $z$ alters the marginal productivity of capital and therefore the firm’s investment opportunities. The profit function $\pi(k,z)$ is continuous and concave, with $\pi(0,z) = 0$, $\pi_z(k,z) > 0$, $\pi_k(k,z) > 0$, $\pi_{kk}(k,z) < 0$, and $\lim_{k\to\infty} \pi_k(k,z) = 0$. We use the standard functional form $\pi(k,z) = zk^{\theta}$, where $\theta$ is an index of the curvature of the profit function, with $0 < \theta < 1$, which satisfies concavity and the Inada condition. The shock $z$ takes values in the interval $[\underline{z},\overline{z}]$ and follows a first-order Markov process with transition probability $g(z',z)$, where a prime indicates a variable in the next period. The transition probability $g(z',z)$ has the Feller property. A convenient parameterization is an AR(1) in logs,

$$\ln(z') = \rho \ln(z) + v', \qquad (1)$$

in which $v'$ has a truncated normal distribution with mean zero and variance $\sigma_v^2$. Without loss of generality, $k$ lies in a compact set. As in Gomes (2001), define $\overline{k}$ as

$$(1-\tau_c)\,\pi(\overline{k},\overline{z}) - \delta\overline{k} \equiv 0, \qquad (2)$$

in which $\delta$ is the capital depreciation rate, $0 < \delta < 1$, and $\tau_c$ is the corporate income tax rate. Concavity of $\pi(k,z)$ in $k$ and $\lim_{k\to\infty} \pi_k(k,z) = 0$ ensure that $\overline{k}$ is well-defined. Because $k > \overline{k}$ is not economically profitable, $k$ lies in the interval $[0,\overline{k}]$. Compactness of the state space and continuity of $\pi(k,z)$ ensure that $\pi(k,z)$ is bounded. Investment, $I$, is defined as

$$I \equiv k' - (1-\delta)k, \qquad (3)$$

in which a prime once again indicates a variable in the next period. The firm purchases and sells capital at a price of one and incurs capital stock adjustment costs that are given by

$$A(k,k') = \gamma k \Phi_i + \frac{a}{2}\left(\frac{k' - (1-\delta)k}{k}\right)^{2} k. \qquad (4)$$

The functional form of (4) is standard in the empirical investment literature, and it encompasses both fixed and smooth adjustment costs. See, for example, Cooper and Haltiwanger (2006). The first term captures the fixed component, $\gamma k \Phi_i$, in which $\gamma$ is a constant, and $\Phi_i$ equals one if investment is nonzero, and zero otherwise. The fixed cost is proportional to the capital stock so that the firm has no incentive to grow out of the fixed cost. The smooth component is captured by the second term, in which $a$ is a constant. We include the quadratic component to isolate the effects of smooth adjustment costs, which turn out to have interesting effects on leverage dynamics.

The firm can finance via debt, internal cash, current profits, and external equity. Define the stock of net debt, $p$, as the difference between the stock of debt, $d$, and the stock of cash, $c$. Given no debt issuance costs and positive agency costs of holding cash, which are formalized below, a firm never simultaneously has positive values of both $d$ and $c$ because using the cash to pay off debt would leave the tax bill unchanged and reduce agency costs. It follows that $d = \max(p,0)$ and $c = -\min(0,p)$, and so we can parsimoniously represent the model with the variable $p$ and then use the definitions of $d$ and $c$ to obtain debt and cash balances at each point in time. Debt takes the form of a riskless perpetual bond that incurs taxable interest at the after-corporate tax rate $r(1-\tau_c)$, while cash earns the same after-tax rate of return (aside from the incremental cost, $s$, formalized below). For simplicity, we abstract from the effects of personal taxes and debt covenants, which are treated in Miller (1977), Hennessy and Whited (2005), Smith and Warner (1979) and Brennan and Schwartz (1984).

We motivate the modeling of a riskless bond from the literature that has focused on adverse selection as a mechanism for credit rationing. Jaffee and Russell (1976) discuss the potential for the quality of the credit pool to decline as the amount borrowed increases, and Stiglitz and Weiss (1981) demonstrate that lenders, recognizing the existence of adverse selection and asset substitution problems, may ration credit rather than rely on higher promised interest rates as a device for allocating funds. Based on this consideration, we assume lenders allocate funds on the basis of a screening process that ensures the borrower can repay the loan in all states of the world. This assumption translates into an upper bound, $\overline{p}$, on the stock of net debt:

$$p \le \overline{p}. \qquad (5)$$
The estimated value of the parameter $\overline{p}$ leads to a solution for equity value that always exceeds zero, which implies that the firm never defaults. Although the assumption of riskless debt with an exogenously specified upper bound may appear unduly simple and restrictive, we show in Section 7 that relaxing this assumption has no material effect on our results. A value of $p$ greater than zero indicates a positive net debt position, and a value less than zero indicates a positive net cash position. Bounded savings are ensured by the corporate tax on interest earned on cash balances and by the assumption that firms face what we refer to as “agency costs,” as in Eisfeldt and Rampini (2006). For simplicity, we do not bound cash holdings via a stochastic probability of default, as in Carlstrom and Fuerst (1997). The agency cost function is given by

$$s(p) = s\,p\,\Phi_c, \qquad (6)$$
in which s is a constant and $\Phi_c$ is an indicator variable that takes a value of one if $p < 0$, and zero otherwise. To make the choice set compact, we assume an arbitrary lower bound on liquid assets, $\underline{p}$. This lower bound is imposed without loss of generality because of our taxation and agency cost assumptions. As in the case of an exogenously specified upper bound on debt, the assumption that cash equals negative debt has no qualitative effects on our results.

The final source of finance is external equity. Equity issuance/distributions are determined simultaneously with investment, debt, and cash, and these decision variables are connected by the familiar identity that stipulates that the sources and uses of funds are equal in each period. To express this identity, we first define $e(k,k',p,p',z)$ as gross equity issuance/distributions. If $e(\cdot) > 0$, the firm is making distributions to shareholders, and if $e(\cdot) < 0$, the firm is issuing equity. As in Hennessy and Whited (2005, 2007) and Riddick and Whited (2008), we model the cost of external equity finance in a reduced-form fashion that preserves tractability, which is necessary to estimate the model. The external equity-cost function is linear-quadratic and weakly convex:
$$\phi(e(k,k',p,p',z)) \equiv \Phi_e\left(\lambda_1\, e(k,k',p,p',z) - \tfrac{1}{2}\lambda_2\, e(k,k',p,p',z)^2\right), \qquad \lambda_i \ge 0,\ i = 1,2,$$

in which $\lambda_1 > 0$ and $\lambda_2 \ge 0$. The indicator function $\Phi_e$ equals one if $e(\cdot) < 0$, and zero otherwise. Convexity of $\phi(e(\cdot))$ is consistent with the evidence on underwriting fees in Altinkilic and Hansen (2000). Net equity issuance/distributions are then given by $e(\cdot) + \phi(e(\cdot))$. This quantity must be equal to the difference between the firm's sources of funds and uses of funds via the identity:

$$e(\cdot) + \phi(e(\cdot)) \equiv (1-\tau_c)\,\pi(k,z) + p' - p\,(1 + r(1-\tau_c)) + \delta k \tau_c - (k' - (1-\delta)k) - A(k,k') + s(p). \qquad (7)$$
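As a minimal sketch of how the pieces of identity (7) fit together, the Python functions below evaluate the equity-cost function, the agency cost in (6), and the right-hand side of (7). Several ingredients are assumptions made here for illustration only: the functional form of the adjustment cost A(k,k') (a fixed cost plus a quadratic term), the revenue function $zk^\theta$, and the values of r, $\delta$ and $\tau_c$. The remaining parameter values are the point estimates reported in Table 1.

```python
def equity_cost(e, lam1, lam2):
    """phi(e): zero for distributions (e >= 0), linear-quadratic for issuance (e < 0)."""
    if e >= 0:
        return 0.0
    return lam1 * e - 0.5 * lam2 * e ** 2


def agency_cost(p, s):
    """s(p) = s * p * Phi_c, nonzero only for net cash positions (p < 0)."""
    return s * p if p < 0 else 0.0


def adjustment_cost(k, k_next, a, gamma, delta):
    """Assumed form: fixed cost gamma * k whenever investment is nonzero, plus a convex term."""
    invest = k_next - (1 - delta) * k
    fixed = gamma * k if invest != 0 else 0.0
    return fixed + 0.5 * a * (invest / k) ** 2 * k


def sources_minus_uses(k, k_next, p, p_next, z, *, theta, delta, r, tau_c, a, gamma, s):
    """Right-hand side of identity (7)."""
    profit = z * k ** theta                        # assumed revenue function z * k^theta
    invest = k_next - (1 - delta) * k
    return ((1 - tau_c) * profit
            + p_next - p * (1 + r * (1 - tau_c))   # new net debt less old net debt with after-tax interest
            + delta * k * tau_c                    # depreciation tax shield
            - invest
            - adjustment_cost(k, k_next, a, gamma, delta)
            + agency_cost(p, s))


params = dict(theta=0.788, a=0.1519, gamma=0.0034, s=0.0077,   # Table 1 point estimates
              delta=0.15, r=0.05, tau_c=0.35)                   # hypothetical values
flow = sources_minus_uses(100.0, 110.0, 20.0, 35.0, 1.0, **params)
issue_cost = equity_cost(-10.0, lam1=0.1615, lam2=0.0041)      # cost attached to a gross issuance of 10
print(flow, issue_cost)
```

Under these assumptions, an extra dollar of next-period net debt p' raises the right-hand side of (7) one for one, which is the sense in which debt and equity are substitute sources of funds in the identity.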
The firm chooses $(k',p')$ each period to maximize the value of expected future cash flows, discounting at the opportunity cost of funds, r. The Bellman equation for the problem is

$$V(k,p,z) = \max_{k',p'} \left\{ e(k,k',p,p',z) + \phi(e(k,k',p,p',z)) + \frac{1}{1+r}\int V(k',p',z')\,dg(z',z) \right\}. \qquad (8)$$

The first two terms represent the current equity distribution net of equity infusions and issuance costs and the third term represents the continuation value of equity. The model satisfies the conditions for Theorem 9.6 in Stokey and Lucas (1989), which guarantees a solution for (8). Theorem 9.8 in Stokey and Lucas (1989) ensures a unique optimal policy function, $\{k',p'\} = u(k,p,z)$, if $e(\cdot) + \phi(e(\cdot))$ is weakly concave in its first and third arguments. This requirement puts easily verified restrictions on $\phi(\cdot)$ that are satisfied by the functional forms chosen above.

The policy function is essentially a rule that states the best choice of k' and p' in the next period for any (k,p,z) triple in the current period. Intuitively, it tells the firm how much to invest given the tradeoff between the cost of investing and expectations about future productivity. It also positions the firm's capital structure optimally to balance current financing needs with the possible need to raise debt capital once again in response to future shocks that might materialize.

2.2. Optimal financial policy

This subsection develops the intuition behind the model by examining its optimality conditions. For simplicity of exposition, we assume in this subsection that V is once differentiable. This assumption is not necessary for the existence of a solution to (8) or of an optimal policy function. The optimal interior financial policy, obtained by solving the optimization problem (8), satisfies

$$1 + (\lambda_1 - \lambda_2 e(\cdot))\Phi_e = -\frac{1}{1+r}\int V_2(k',p',z')\,dg(z',z). \qquad (9)$$
H. DeAngelo et al. / Journal of Financial Economics 99 (2011) 235–261
The left side represents the marginal cost of equity finance. If the firm is issuing equity, this cost includes issuance costs. If the firm is not issuing equity, then this cost is simply a dollar-for-dollar cost of cutting distributions to shareholders. The right side represents the expected marginal cost of debt next period. At an optimum, the firm is indifferent between issuing equity, which incurs costs today, and issuing debt, which entails costs in the future.

To see precisely what these costs are, we use the envelope condition. Let $\mu$ be the Lagrange multiplier on the constraint (5). Then the envelope condition can be written as

$$V_2(k,p,z) = -\big((1 + (1-\tau_c)r) - s\Phi_c\big)\big(1 + (\lambda_1 - \tfrac{1}{2}\lambda_2 e(\cdot))\Phi_e\big) + \mu. \qquad (10)$$

This condition clearly illustrates the marginal costs of having debt/cash on the balance sheet. The first term in parentheses represents interest payments (less the tax shield). In the case of cash balances, this term represents the benefit of the interest on the cash (less taxes) minus the extra cost of carrying cash. The second term in parentheses captures the fact that this debt service is all the more costly if the firm has to issue external equity to make the payments. Finally, the third term is the shadow value of relaxing the constraint on debt issuance. This last term captures the intuitive point that a firm may want to preserve debt capacity today in order to avoid bumping up against its constraint in the next period.

One clear implication of the value of preserving debt capacity is the intertwining of real and financial decisions. In particular, if a firm characteristic increases the probability that the firm will optimally want to make a large future investment, that characteristic also implies that the firm preserves debt capacity now.

2.3. Defining a target

Hennessy and Whited (2005) state that, in this type of model, there is no single optimal capital structure independent of the current state of the world.
Indeed, in our model, capital structure choices are made each period and are state-contingent, exhibiting (local) path dependence. One of our main contributions is the observation that even in this type of setting, firms nonetheless have capital structure targets in a long-run sense. Consider the following thought experiment. What if the firm forms an optimal policy in the face of uncertainty but then happens by chance to face an arbitrarily long sequence of shocks, all of which are neutral (z = 1)? In this case, no new funding requirements arrive randomly, and the firm eventually receives enough internally generated resources to enable it to reach its desired debt level without having to incur the costs of issuing equity. Would its optimal policy converge under this sequence of neutral shocks, and, if so, to a single {k,p} pair or to a range of {k,p} pairs?³

³ The intuition behind this definition of a long-run target capital structure is analogous to that which drives the notion of a target payout ratio in Lintner (1956). Consider a firm for which last period's dividend
To answer the first part of this question, we define $u_1(k,p,1)$ as the first element of the policy function, evaluated at z = 1, and we define $u_1^j(k,p,1)$ as the first element of the function that results from mapping u(k,p,1) into itself j times. We then define the target capital stock as lying in the interval

$$\left[\liminf_{j\to\infty} u_1^j(k,p,1),\ \limsup_{j\to\infty} u_1^j(k,p,1)\right]. \qquad (11)$$
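The construction in (11) can be sketched numerically: solve a discretized version of the Bellman equation (8) by value function iteration, then feed the resulting policy back into itself at the neutral shock and record the range of capital stocks visited in the limit. The Python sketch below is a deliberately small stand-in, not the solution method used for the estimates in this paper. Its assumptions are: a coarse grid, a hypothetical three-state Markov chain for z, hypothetical values of r, $\delta$ and $\tau_c$, no fixed adjustment cost, $\lambda_2 = 0$, and the convention that the right-hand side of (7) is treated as the gross flow e on which the issuance cost is then charged.

```python
import numpy as np

theta, a, lam1, s, pbar_ratio = 0.788, 0.1519, 0.1615, 0.0077, 0.7196  # Table 1 estimates
r, delta, tau_c = 0.05, 0.15, 0.35                                      # hypothetical values
z_grid = np.array([0.75, 1.0, 1.25])        # hypothetical shock states; index 1 is z = 1
P = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])      # hypothetical transitions

k_ss = (theta * (1 - tau_c) / (r + delta)) ** (1 / (1 - theta))
k_grid = np.linspace(0.4 * k_ss, 1.6 * k_ss, 13)
p_grid = np.linspace(-0.5 * k_ss, pbar_ratio * k_ss, 11)   # top of grid enforces p <= pbar
nz, nk, n_p = len(z_grid), len(k_grid), len(p_grid)

# Broadcast the current state (z, k, p) against the choice (k', p').
z = z_grid[:, None, None, None, None]
k = k_grid[None, :, None, None, None]
p = p_grid[None, None, :, None, None]
k_next = k_grid[None, None, None, :, None]
p_next = p_grid[None, None, None, None, :]

invest = k_next - (1 - delta) * k
adjust = 0.5 * a * invest ** 2 / k                          # convex adjustment cost only
e = ((1 - tau_c) * z * k ** theta + p_next - p * (1 + r * (1 - tau_c))
     + delta * k * tau_c - invest - adjust + s * p * (p < 0))
payoff = e + lam1 * e * (e < 0)                             # e + phi(e) with lambda_2 = 0

beta = 1 / (1 + r)
V = np.zeros((nz, nk, n_p))
for _ in range(5000):
    EV = (P @ V.reshape(nz, -1)).reshape(nz, 1, 1, nk, n_p)  # E[V(k', p', z') | z]
    Q = (payoff + beta * EV).reshape(nz, nk, n_p, nk * n_p)
    V_new = Q.max(axis=3)
    gap = np.max(np.abs(V_new - V))
    V = V_new
    if gap < 1e-8:
        break
policy = Q.argmax(axis=3)    # flat index of the chosen (k', p') for each (z, k, p)


def target_interval(ik, ip):
    """Smallest and largest capital stock visited in the limit under repeated neutral shocks."""
    step = policy[1].reshape(-1)        # neutral-shock policy as a map on flat (k, p) states
    n = nk * n_p
    state = ik * n_p + ip
    for _ in range(n):                  # after n steps the path has entered its limit cycle
        state = step[state]
    visited = []
    for _ in range(n):                  # n further steps cover the whole cycle
        state = step[state]
        visited.append(k_grid[state // n_p])
    return min(visited), max(visited)


k_lo, k_hi = target_interval(0, 0)
print(k_lo / k_ss, k_hi / k_ss)
```

On a finite grid the neutral-shock policy is a deterministic map on finitely many states, so every path ends in a cycle; the minimum and maximum over that cycle are the discrete counterparts of the liminf and limsup in (11). A single limiting point corresponds to a unique target capital stock.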
The existence of this interval is determined trivially by the compactness of the state space and the boundedness of u(k,p,z). For each capital stock in this interval, there is exactly one optimal level of p because the value function for this class of models is strictly concave (Hennessy and Whited, 2005). In intuitive terms, for any given k, there cannot be two values of p that yield the same maximum value. Of course, because u(k,p,z) has no closed-form solution, we use simulation to solve for the target and to determine its exact form. As elaborated in Section 4.3, whether or not the firm has a unique leverage target depends on whether it has a unique capital stock target.

The target is a special case (i.e., limit) of the policy function. Therefore, like the state-contingent optimum defined by the policy function, the target also positions the firm optimally to raise capital in the future, given the nature of the uncertainty it faces. In addition, the particular limit of the policy function that we use to define a target is economically relevant. It isolates the long-run tax and opportunity cost incentives for optimizing capital structure, while abstracting from optimal financing decisions that are at least in part due to the need to finance specific investment projects.

2.4. Model estimation

Because the solution of the model must be obtained numerically, the quantitative properties of the model can depend on the parameters chosen. To address concerns about this dependency, we select parameters via structural estimation of the model. This procedure helps ensure that the parameters chosen produce results that are relevant given observed data. We use simulated method of moments (SMM), which chooses model parameters that set moments of artificial data simulated from the model as close as possible to corresponding real-data moments.
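The logic of SMM can be illustrated on a toy problem. The Python sketch below recovers the serial correlation and innovation standard deviation of a first-order autoregression by matching two moments, the slope and the error variance of an AR(1) regression, between a stand-in "data" series and simulated series. Everything here is an assumption chosen for brevity: the AR(1) stand-in model, the identity weighting matrix, the coarse grid search, and the true values 0.7 and 0.3. It is not the estimator, weighting matrix, or optimizer used for Table 1.

```python
import numpy as np


def simulate_ar1(rho, sigma, shocks):
    """x_t = rho * x_{t-1} + sigma * eps_t, started at zero."""
    x = np.empty(len(shocks))
    prev = 0.0
    for t, eps in enumerate(shocks):
        prev = rho * prev + sigma * eps
        x[t] = prev
    return x


def ar1_moments(x):
    """Slope coefficient and error variance from a first-order autoregression."""
    y, lag = x[1:], x[:-1]
    slope, intercept = np.polyfit(lag, y, 1)
    resid = y - (intercept + slope * lag)
    return np.array([slope, resid.var()])


def smm_objective(params, data_moments, shocks, weight):
    """Quadratic distance between simulated and data moments."""
    rho, sigma = params
    diff = ar1_moments(simulate_ar1(rho, sigma, shocks)) - data_moments
    return diff @ weight @ diff


rng = np.random.default_rng(0)
data = simulate_ar1(0.7, 0.3, rng.standard_normal(20_000))   # stand-in for observed data
data_moments = ar1_moments(data)

sim_shocks = rng.standard_normal(20_000)   # drawn once and reused for every candidate
weight = np.eye(2)                         # identity weighting, for simplicity
candidates = [(rho, sigma) for rho in (0.5, 0.6, 0.7, 0.8, 0.9) for sigma in (0.2, 0.3, 0.4)]
rho_hat, sigma_hat = min(
    candidates, key=lambda c: smm_objective(c, data_moments, sim_shocks, weight)
)
print(rho_hat, sigma_hat)
```

Holding the simulated shocks fixed across candidate parameter values keeps the objective a deterministic function of the parameters, which is what allows a minimizer to compare candidates on a like-for-like basis.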
³ (continued) and this period's earnings give it an actual payout ratio below its long-run target payout ratio. Suppose the firm experiences a series of neutral earnings shocks, i.e., repeated realizations of this period's earnings. The firm will respond by increasing dividends over time so that its actual payout ratio converges to its long-run target. In the Lintner model, the firm virtually never has an actual payout ratio equal to target, but the existence of a long-run target payout ratio represents an economic force that governs the dynamics of dividend policy. In our model, the existence of a long-run target capital structure governs leverage dynamics in the same sense.

We estimate the following parameters: profit function curvature, $\theta$; shock serial correlation, $\rho$; shock standard deviation, $\sigma_v$; smooth and fixed physical adjustment costs, a and $\gamma$; the agency cost parameter, s;
Table 1
Simulated moments estimation.
Calculations are based on a sample of nonfinancial, unregulated firms from the annual 2007 Compustat industrial files. The sample period is 1988–2001. Estimation is done with SMM, which chooses structural model parameters by matching the moments from a simulated panel of firms to the corresponding moments from the data. The first panel reports the simulated and estimated moments and the t-statistics for the differences between the corresponding moments. All moments are self-explanatory, except the serial correlation and innovation to income. These moments are the slope coefficient and error variance from a first-order autoregression of the ratio of income to assets. The second panel reports the estimated structural parameters, with standard errors in parentheses. $\lambda_1$ and $\lambda_2$ are the linear and quadratic costs of equity issuance. $\sigma_v$ is the standard deviation of the innovation to $\ln(z)$, in which z is the shock to the revenue function. $\rho$ is the serial correlation of $\ln(z)$. $\theta$ is the curvature of the revenue function, $zk^\theta$. $\gamma$ and a are the fixed and convex adjustment cost parameters, and s is the agency cost parameter. $\bar{p}/k_{ss}$ is the debt ceiling expressed as a fraction of the steady state capital stock, $k_{ss}$. The J-test is the $\chi^2$ test for the overidentifying restrictions of the model. Its p-value is indicated in parentheses.

A. Moments
                                                          Actual     Simulated
                                                          moments    moments     t-Statistics
Variance of investment (I/k)                              0.0385     0.0329      0.4150
Variance of leverage (d/k)                                0.0118     0.0165      1.8220
Average leverage (d/k)                                    0.2393     0.2423      0.3347
Average equity issuance (e/k)                             0.0407     0.0231      2.0196
Average Tobin's q ((V + p)/k)                             3.6707     3.9549      0.4181
Third central moment of investment (I/k)                  0.0534     0.0363      0.5629
Average operating income ($zk^\theta/k$)                  0.1915     0.1903      0.0352
Serial correlation of income ($zk^\theta/k$)              0.6635     0.6488      0.2360
Variance of the innovation to income ($zk^\theta/k$)      0.0048     0.0015      0.2974
Average cash balances (c/k), conditional on c > 0         0.1399     0.1350      0.3395
Variance of equity issuance (e/k)                         0.0079     0.0051      1.1062
Average investment (I/k)                                  0.1868     0.1657      1.1146

B. Parameter estimates
Parameter            Estimate    Standard error
$\lambda_1$          0.1615      (0.0164)
$\lambda_2$          0.0041      (0.4662)
$\sigma_v$           0.2843      (0.0479)
$\rho$               0.7280      (0.1790)
$\theta$             0.7880      (0.0673)
$\gamma$             0.0034      (0.0016)
a                    0.1519      (0.0092)
s                    0.0077      (0.0157)
$\bar{p}/k_{ss}$     0.7196      (0.0143)
J-test               2.0360      (p-value 0.5650)
the two external equity cost parameters, $\lambda_1$ and $\lambda_2$, and the ratio of the debt limit, $\bar{p}$, to the steady state capital stock, $k_{ss}$, that would prevail in the absence of financing or physical adjustment frictions.⁴ We estimate the latter parameter to ensure that the firm's debt limit reflects the asset base that serves as collateral and, more generally, to address concerns that average leverage from our model might be hard-wired by an arbitrary choice of $\bar{p}$. Appendix A contains details concerning the model's numerical solution, the data, the choice of moments, and the estimation.

⁴ The steady state capital stock is $k_{ss} = \left(\theta(1-\tau_c)/(r+\delta)\right)^{1/(1-\theta)}$. It equates the marginal product of capital with the user cost: $r + \delta$. In our model, $k_{ss}$ is always close to the average simulated capital stock.

Table 1 presents the estimation results, with Panel A reporting actual and simulated moments with t-statistics for the difference between the two, and Panel B reporting parameter point estimates and a J-test for general specification. The J-test does not provide a rejection at even the 10% level, implying that the model provides a good overall match to the set of moments viewed collectively. Most simulated moments in Panel A match the corresponding data moments well. In particular, the simulated and actual variances of investment are economically and statistically indistinguishable. In contrast, the models in Hennessy and Whited (2005, 2007) fail to match this particular moment. We attribute this result to the presence of physical adjustment costs in our model and the absence of such costs in their models.

Only one moment, average equity issuance, has a simulated value that differs significantly from its corresponding average value in the data. While the quantitative gap between actual and simulated moments is not large, this finding suggests that the model could be improved by the introduction of debt issuance costs, since equity is more attractive as a marginal financing vehicle when it is costly to issue debt. And in fact, when we include such costs (see Section 7), the equity issuance difference is no longer statistically significant (and all other simulated moments remain close to those in Table 1 and insignificantly different from their average values in the data). We maintain the assumption of no debt issuance costs to establish clearly that our leverage predictions are driven by the value of preserving the option to borrow, and not by any other impediment to the use of debt.

The point estimates of the profit function curvature ($\theta$) and of the serial correlation of profit shocks ($\rho$) in Panel B of Table 1 are qualitatively similar to those in Hennessy and Whited (2005, 2007). The estimates of the external equity cost parameters, $\lambda_1$ and $\lambda_2$, and the standard
deviation of shocks ($\sigma_v$) are, however, much higher than the estimates from their models. The reason makes intuitive sense. Our upper limit on debt does not have as much of a dampening effect on leverage as does the modeling of financial distress in Hennessy and Whited's papers. Therefore, in order for simulated average leverage to be as low as actual average leverage, other parameters in our model need to adjust to hold simulated leverage down. As explained in detail below, both high external equity costs and high shock volatility work to lower optimal average leverage.

Our estimates of the physical adjustment cost parameters, a and $\gamma$, are also comparable to those in previous studies. For example, they display a different, but understandable, pattern from the estimates in Cooper and Haltiwanger (2006). The estimate of our fixed cost parameter is smaller, while the estimate of our smooth cost parameter is higher. This result makes sense because we estimate these parameters with firm-level data, which are substantially smoother than the plant-level data that they use. Our estimated agency costs are small and statistically insignificant because we operationalize this variable as the marginal cost of maintaining cash balances over and above the statutory tax penalty for holding cash.

Finally, the estimate of $\bar{p}/k_{ss}$ is quite high at 0.72. This level is much higher than the model-simulated average leverage, which is approximately 0.24. This large difference indicates that our model predicts that firms set leverage conservatively relative to their debt capacities. This result is remarkable and instructive, given that the only force in the model keeping leverage down is the value of preserving debt capacity for future use.
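The gap between the debt ceiling and average leverage can be made concrete with a short calculation. The Python sketch below computes the steady state capital stock from the formula in footnote 4, the implied dollar debt ceiling, and the share of debt capacity left unused on average. The values of r, $\delta$ and $\tau_c$ are hypothetical placeholders, and the unused-capacity ratio compares d/k with $\bar{p}/k_{ss}$, which is only approximate because it relies on $k_{ss}$ being close to the average simulated capital stock.

```python
def steady_state_capital(theta, tau_c, r, delta):
    """k_ss from footnote 4: after-tax marginal product of capital equals r + delta."""
    return (theta * (1 - tau_c) / (r + delta)) ** (1 / (1 - theta))


theta = 0.788                 # Table 1 point estimate
pbar_over_kss = 0.7196        # Table 1 point estimate
avg_leverage = 0.2423         # simulated average leverage, Table 1
r, delta, tau_c = 0.05, 0.15, 0.35   # hypothetical values

k_ss = steady_state_capital(theta, tau_c, r, delta)
p_bar = pbar_over_kss * k_ss                           # debt ceiling in units of capital
after_tax_mpk = (1 - tau_c) * theta * k_ss ** (theta - 1)
unused_share = 1 - avg_leverage / pbar_over_kss        # approximate, since d/k is scaled by k, not k_ss

print(k_ss, p_bar, after_tax_mpk, unused_share)
```

Under these assumptions roughly two-thirds of estimated debt capacity goes unused on average, which is the sense in which simulated leverage is conservative relative to the ceiling.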
3. Comparative statics: illustrations and preliminary results

This section provides a highly simplified example to illustrate firms' incentives to issue and retire transitory debt (Section 3.1), and presents our predictions regarding the impact of financing frictions on the average amount of debt employed by firms (Section 3.2).
3.1. Predicted capital structure paths: a simple example

Consider a firm that faces baseline model parameter values (per Section 2) and, for simplicity of illustration, assume temporarily that there is no corporate tax benefit to borrowing. Assume that the firm currently has no debt outstanding so that, given the absence of a tax incentive to borrow, it is at its target capital structure. Assume also that an investment shock arrives with an associated large funding need that the firm cannot fully meet from cash balances and current cash flow. (In our model, firms use internal sources of capital before borrowing because of the costs of maintaining cash balances.) Finally, assume for the moment that managers issue debt to raise the remaining funds they need because equity issuance entails direct costs, whereas debt issuance does not. (As discussed below, in our model, managers sometimes issue equity even when debt capacity is available.) If
managers do issue debt today, they will treat that debt as purely transitory. Intuitively, the ability to issue debt is valuable because borrowing is a low (zero in our model) transactions cost means of raising cash to fund investment, and so a firm that borrows today will subsequently seek to pay off debt to be able to borrow again if and when future funding needs arise.

Fig. 1 plots a realized leverage path for a firm that has no tax motive to borrow, so that its target leverage is zero and any and all debt that it issues is transitory.⁵ The figure documents how leverage responds to a sequence of investment shocks, and illustrates three key points. First, transitory debt increases as the firm borrows to meet shock-induced funding needs, and then recedes with a lag as the firm pays down debt and returns to its zero-debt target. Second, full payoff of the debt (i.e., rebalancing leverage back to target) can take multiple periods because of the arrival of multiple investment shocks, each with a new funding need. Third, the amount of debt that the firm has outstanding, on average, is positive and markedly greater than target, and so when a firm issues material amounts of transitory debt, its time-series average leverage ratio may not be a good indicator of its leverage target. We would add that the firm has a positive cash balance target (per Section 2.3) that embeds the option value of drawing down cash to meet funding needs. Hence, a time-series average of outstanding cash balances will generally include such draw downs and therefore provide a downward-biased estimate of the firm's cash target.

Our model recognizes that firms' financing decisions are considerably more complicated than in this simple example.
In general, managers must decide whether to issue debt to meet an immediate cash need generated by today's investment shock given that future shocks may soon arrive, rendering debt capacity even more scarce, while also considering the likelihood that future cash flow realizations may be inadequate to retire debt. Our model simulations indicate that firms' financing decisions generally do not follow the static pecking order described by Myers and Majluf (1984) and Myers (1984).⁶ Specifically,

⁵ Specifically, firms continue to incur the same total cost of holding cash as in the Section 2 model, which equals the sum of the marginal agency cost, s, and the tax penalty for holding cash, $\tau_c r$. We adopt this approach in this example because our SMM procedure estimates the total cost of cash balances, and we define the marginal agency cost as that total cost minus $\tau_c r$, an algorithm that assigns maximal weight to taxes. Since $\tau_c r$ is the largest possible tax penalty and since many real-world firms have no taxable earnings, this allocation likely underestimates the non-tax costs of holding cash. We use this approach only in simplified examples to illustrate the intuition of the model, and not in our comparative statics analysis.

⁶ Myers (1984, p. 590) notes that, once one moves beyond the single investment outlay decision modeled by Myers and Majluf (1984), firms may issue equity even when debt capacity is not exhausted. Viswanath (1993) and Chang and Dasgupta (2003) extend Myers and Majluf to include a second investment decision, and formally establish this result. Lemmon and Zender (2007) also note that firms may save debt capacity for future use in dynamic versions of Myers and Majluf. Our analysis moves beyond these studies by establishing two fundamental results. First, firms have leverage targets that incorporate an amount of unused debt capacity that reflects the optimal future (state-contingent) utilization
[Fig. 1: time-series plot of two series, leverage and debt issuance/retirement. Vertical axis: debt-to-assets ratio, from -0.2 to 0.4. Horizontal axis: time, periods 1 to 61.]

Fig. 1. Illustrative time path of leverage and debt issuance/retirement with no tax incentive to borrow. The time path is generated from the model in Section 2 with estimated parameters from Table 1, in which the tax incentive to borrow has been set to zero. As such, zero debt is the long-run capital structure target.
if managers of a firm with unused debt capacity assess a sufficiently high probability that future funding needs would force them to incur higher equity issuance costs in a present value sense (because borrowing today leaves the firm with inadequate debt capacity), they forego borrowing and instead issue costly equity to meet an immediate funding need. In general, the rational financing response to any given investment shock depends not only on the volatility and serial correlation of those shocks, but also on firm profitability and the nature of any costs of adjusting the stock of physical capital.
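The qualitative pattern described for Fig. 1 (debt rising when funding needs arrive, receding with a lag, and averaging above a zero target) can be mimicked with a very simple simulation. The Python sketch below uses a hypothetical rule of thumb rather than the model's optimal policy: when a funding need arrives the firm uses current cash flow first and borrows the shortfall up to a ceiling, and otherwise it applies cash flow to debt retirement. The shock probability, the size of funding needs, the cash flow, and the ceiling are all illustrative assumptions.

```python
import random


def simulate_transitory_debt(periods=61, shock_prob=0.25, cash_flow=0.04,
                             debt_cap=0.72, seed=1):
    """Debt-to-assets path under a hypothetical borrow-then-repay rule with a zero target."""
    rng = random.Random(seed)
    debt, path = 0.0, [0.0]                       # the firm starts at its zero-debt target
    for _ in range(periods - 1):
        if rng.random() < shock_prob:
            need = rng.uniform(0.05, 0.25)        # funding need as a fraction of assets
            shortfall = max(need - cash_flow, 0.0)            # internal funds are used first
            debt = min(debt + shortfall, debt_cap)            # borrow up to the ceiling
        else:
            debt = max(debt - cash_flow, 0.0)     # retire debt toward the zero target
        path.append(debt)
    return path


path = simulate_transitory_debt()
average_leverage = sum(path) / len(path)
share_at_target = sum(1 for d in path if d == 0.0) / len(path)
print(average_leverage, share_at_target)
```

Even though the target in this toy setting is zero by construction, the time-series average of the simulated path is positive, which is the reason an average leverage ratio can be a poor proxy for the target when transitory debt is material.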
⁶ (continued) of the option to borrow. Second, because unused debt capacity is a valuable asset, firms have the incentive to follow decisions to borrow and deviate from target by subsequently rebalancing toward target.

3.2. The capital structure impact of variation in financing frictions

To obtain comparative statics results, we start with the baseline parameter values (from Section 2's tax-inclusive SMM estimation) and analyze the model for a significant range of parameter values around each baseline value. For each set of specific parameter values, we run the model for 100,200 periods, with each firm receiving random investment shocks and responding to each by adjusting its investment and financing decisions optimally. We discard the initial 200 periods of data and, from the remaining 100,000 observations, we record economically relevant, empirically quantifiable measures of a firm's capital structure decisions, e.g., its average debt-to-assets ratio. We interpret the resultant large sample statistics as expected values implied by the specific parameterization of the model. We repeat the exercise for different combinations of model parameters. We then generate testable predictions based on the difference in the expected value of a given capital structure variable
associated with an underlying difference in the model's parameter values.

Table 2 reports expected leverage (Panel A) and the standard deviation of leverage (Panel B) as a function of the costs of issuing equity and of maintaining cash balances, with leverage measured as the debt-to-assets ratio (d/k). Each panel contains a 5 × 5 matrix whose elements are the model's predicted (leverage or leverage volatility) values as a function of different costs of accessing external equity (columns) and of maintaining cash balances (rows). For example, the northeast corner of the matrix in Panel A reports the expected d/k ratio for the model specified with relatively high costs of accessing external equity coupled with relatively low costs of maintaining cash balances.

Table 2 yields three main findings. First, average leverage is well above zero (never less than 19.3% of assets) under all cost specifications, as one would expect since firms in our model capture tax benefits from debt. Second, variation in the costs of maintaining cash balances has only a modest influence on the cross-firm variation in average leverage and in the volatility of leverage (scan each column in Panels A and B). The intuitive explanation is that corporate taxes themselves provide strong incentives to maintain positive net debt, and so an increase in the cost of holding cash does little on the margin to induce firms to rely more on debt and less on cash balances. Third, average leverage is around 20–30% of assets for all cost specifications except when firms face low costs of accessing external equity, in which case the expected amount of debt is much higher at around 80% of total assets (Panel A), and leverage volatility is around 50% (Panel B).

The third finding contrasts sharply with standard static model reasoning about capital structure, which indicates that cheaper access to equity encourages firms to adopt lower leverage ratios.
Our finding indicates that a firm typically maintains higher leverage when it faces lower costs of accessing equity capital to meet financing needs. Since the corporate tax subverts a firm’s incentive to hold
Table 2
Average and standard deviation of the debt-to-assets ratio.
The average and standard deviation of the debt-to-assets ratio, d/k, are expressed as a function of equity access costs and of agency costs of cash balances. Panel A reports average leverage, and Panel B reports the standard deviation of leverage. We start with the baseline model (per Section 2's SMM estimation results) and consider a significant range of parameter values around each baseline parameter value. Here, we consider variation in (i) the linear cost of accessing outside equity, $\lambda_1$ (which varies from 0.001 to 0.3 across the columns of the table) and (ii) the marginal agency cost, s, which varies from 0.0 to 0.05 down the rows. For these experiments, the quadratic cost of equity, $\lambda_2$, is set to zero. For each combination of parameter values, we run the model for 100,200 periods, with the firm receiving random productivity shocks and responding to each by adjusting its investment and financing decisions. We discard the initial 200 periods of data, and record and report the average of the debt-to-assets ratio, d/k, and the standard deviation of d/k for the remaining 100,000 observations.

A. Average debt-to-assets ratio (d/k)
Cost of maintaining       Linear cost of accessing external equity:
cash balances:            Low                                    High
Low                       0.803    0.313    0.232    0.210    0.193
                          0.806    0.324    0.266    0.246    0.234
                          0.801    0.326    0.271    0.253    0.240
                          0.806    0.328    0.272    0.258    0.243
High                      0.794    0.328    0.275    0.265    0.254

B. Standard deviation of d/k
Cost of maintaining       Linear cost of accessing external equity:
cash balances:            Low                                    High
Low                       0.498    0.099    0.093    0.095    0.093
                          0.497    0.103    0.098    0.100    0.101
                          0.499    0.104    0.099    0.100    0.104
                          0.498    0.102    0.102    0.102    0.105
High                      0.495    0.103    0.101    0.102    0.105
cash balances, "levering up" is all the more costly in our dynamic analysis because it utilizes debt capacity that could have been preserved as a substitute future source of capital. However, the opportunity cost of utilizing debt capacity loses much of its deterrent value to "levering up" today when it is cheap to tap equity markets. Hence, when firms have the safety-valve alternative of issuing equity at low cost, they will "lever up" and remain highly levered to capture interest tax shields, even though doing so sacrifices their ability to tap the debt market in the future.⁷

⁷ This point is conceptually distinct from the idea discussed by Myers (1984), Viswanath (1993), and others that firms time equity issuances in periods when information asymmetries imply temporarily low equity issuance costs, and obtain capital from debt issuances and cash balances in periods with high equity issuance costs. These authors offer a prediction about the particular times at which firms make marginal financing decisions, whereas our statement is about the average leverage ratios that firms maintain over time.

4. Comparative statics: leverage and investment opportunities

This section presents comparative statics results that predict how firms' leverage decisions vary with the nature of their investment opportunities. Section 4.1 examines the impact of variation in investment shock volatility on average leverage, leverage volatility, and a broad variety of other dimensions of capital structure. Section 4.2 analyzes how these leverage predictions differ for firms whose investment opportunities are characterized in turn by (i) high as opposed to low shock serial correlation, (ii) high rather than low marginal profitability, and (iii) lumpy versus smooth investment outlays attributable to differences in the fixed and convex costs of capital stock adjustment. Section 4.3 delineates the model's predictions regarding variation in target leverage as a function of differences in the nature of firms' investment opportunities.

4.1. The capital structure impact of variation in investment shock volatility

Table 3 summarizes the predicted capital structure impact of variation in the volatility of investment shocks. The rows of the table list capital structure attributes and the columns detail the predicted impact of various investment shock volatilities centered around the baseline values (per Section 2), with standard deviations ranging from 15% to 50%. The model predicts that average investment as a percent of assets (I/k) is somewhat higher for firms subject to high shock volatility (row 1), while the standard deviation of I/k is markedly higher (row 2) and the frequency of investment is a bit lower (row 3) for such firms. For brevity here and throughout, shock volatility refers to the standard deviation of the error term in the investment shock generating process (1), and leverage volatility refers to the time-series standard deviation of the debt-to-assets ratio.

Table 3 indicates that firms facing low shock volatility have a debt-to-assets ratio of 0.722, on average, whereas firms facing high shock volatility have average leverage of only 0.091 (row 4). The former firms always carry some debt, whereas the latter have no debt outstanding almost 40% of the time (row 10), reflecting their strong incentives to preserve debt capacity and to accumulate greater cash balances (row 8) to meet the potentially substantial funding needs that can arise with future investment shocks.
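The mechanics of the experiment behind Table 3 (vary the shock standard deviation, simulate 100,200 periods, discard the first 200, and summarize the remaining 100,000 observations) can be sketched for the shock process alone. The Python code below simulates $\ln(z)$ as a first-order autoregression with serial correlation $\rho$ and innovation standard deviation $\sigma_v$, following the description of these parameters in Table 1. Normally distributed innovations are an assumption, and the firm's optimal investment and financing responses are deliberately left out.

```python
import math
import random
import statistics


def simulate_log_shock(rho, sigma_v, periods=100_200, burn_in=200, seed=0):
    """Simulate ln(z') = rho * ln(z) + v' and drop the initial burn-in periods."""
    rng = random.Random(seed)
    ln_z, kept = 0.0, []
    for t in range(periods):
        ln_z = rho * ln_z + rng.gauss(0.0, sigma_v)   # assumed normal innovations
        if t >= burn_in:
            kept.append(ln_z)
    return kept


rho = 0.728                                   # Table 1 point estimate
results = {}
for sigma_v in (0.15, 0.2843, 0.50):          # low, baseline (Table 1), and high volatility
    draws = simulate_log_shock(rho, sigma_v)
    results[sigma_v] = {
        "observations": len(draws),
        "simulated_sd": statistics.pstdev(draws),
        "implied_sd": sigma_v / math.sqrt(1 - rho ** 2),   # unconditional sd of a stationary AR(1)
    }

for sigma_v, row in results.items():
    print(sigma_v, row)
```

With 100,000 retained observations the simulated dispersion of $\ln(z)$ sits close to its large-sample value, which is why statistics from runs of this length can be read as expected values implied by a given parameterization.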
Table 3
Capital structure and investment shock volatility.
This table reports a variety of summary statistics from simulations of the baseline model. We simulate the model for 100,200 periods, with the firm receiving random investment shocks and responding to each by adjusting its investment and financing decisions. We discard the initial 200 periods of data. Each column reports statistics for a different model simulation, each of which corresponds to a different standard deviation of the investment shock. We let this standard deviation range from 0.15 to 0.5. Average transitory debt is the mean value of leverage (d/k) minus target, conditional upon leverage exceeding target.

Standard deviation of investment shocks: Low (column 1) through Moderate to High (column 8)

                                              (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
 1. Average investment (I/k)                 0.158  0.160  0.163  0.167  0.169  0.171  0.172  0.171
 2. Standard deviation of investment (I/k)   0.129  0.145  0.166  0.187  0.197  0.211  0.214  0.213
 3. Frequency of investment                  0.852  0.833  0.791  0.775  0.763  0.754  0.767  0.772

 4. Average debt-to-assets ratio (d/k)       0.722  0.508  0.336  0.203  0.133  0.104  0.091  0.091
 5. Standard deviation of leverage (d/k)     0.110  0.089  0.101  0.096  0.087  0.077  0.072  0.069

 6. Average net debt ((d − c)/k)             0.722  0.508  0.333  0.190  0.081  0.011  0.010  0.007
 7. Standard deviation of net debt           0.110  0.089  0.127  0.145  0.227  0.272  0.261  0.248
 8. Average cash balances to assets (c/k)    0.000  0.000  0.002  0.010  0.046  0.085  0.091  0.088
 9. Standard deviation of (c/k)              0.000  0.000  0.083  0.153  0.303  0.306  0.269  0.236

10. Frequency of positive debt outstanding   1.000  1.000  0.965  0.908  0.779  0.672  0.639  0.627

11. Average of positive leverage values      0.722  0.508  0.348  0.223  0.171  0.155  0.142  0.146
12. Average of positive cash balance values  0.000  0.000  0.091  0.165  0.283  0.334  0.322  0.302

13. Debt issuance frequency                  0.468  0.463  0.451  0.438  0.393  0.341  0.330  0.328
14. Debt repayment frequency                 0.521  0.534  0.528  0.499  0.436  0.390  0.367  0.356

15. Average debt issuance/assets             0.088  0.092  0.106  0.114  0.106  0.111  0.106  0.104
16. Average debt repayment/assets            0.069  0.070  0.076  0.075  0.069  0.067  0.064  0.064

17. Equity issuance frequency                0.255  0.257  0.261  0.255  0.245  0.237  0.239  0.236
18. Average equity issuance/assets           0.028  0.024  0.022  0.018  0.019  0.018  0.018  0.017

Average fraction of investment funded from:
19. Current cash flow                        0.840  0.847  0.832  0.823  0.816  0.802  0.807  0.810
20. Cash balances                            0.000  0.000  0.004  0.017  0.047  0.076  0.083  0.086
21. Debt issuance                            0.143  0.135  0.148  0.148  0.125  0.112  0.098  0.093
22. Equity issuance                          0.017  0.018  0.016  0.012  0.012  0.009  0.011  0.011

23. Target                                   0.748  0.503  0.304  0.032  0.000  0.000  0.000  0.000
24. Incidence of leverage above target       0.398  0.645  0.708  0.890  0.779  0.675  0.642  0.631
25. Average deviation from target            0.027  0.005  0.032  0.170  0.133  0.104  0.091  0.092
26. Average transitory debt                  0.065  0.057  0.092  0.195  0.170  0.155  0.142  0.146
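The leverage statistics in Table 3 are simple functions of the simulated time series. As a rough illustration (a minimal sketch, not the authors' simulation code), the Python function below computes several of them from a debt-to-assets path. The function name, the dictionary keys, and the toy path are hypothetical. The 200-period burn-in and the definition of average transitory debt follow the table caption; treating the average deviation from target as mean leverage minus target, and using the population standard deviation, are assumptions.

```python
from statistics import mean, pstdev


def leverage_summary(leverage, target, burn_in=200):
    """Summarize a simulated debt-to-assets path in the style of Table 3.

    leverage : list of d/k values, one per simulated period
    target   : a single target leverage ratio (assumed constant here)
    burn_in  : number of initial periods to discard (200 in the caption)
    """
    path = leverage[burn_in:]
    if not path:
        raise ValueError("no observations left after discarding the burn-in")
    positive = [x for x in path if x > 0.0]
    above = [x - target for x in path if x > target]
    return {
        "average_leverage": mean(path),
        # Assumption: population (not sample) standard deviation.
        "leverage_volatility": pstdev(path),
        "freq_positive_debt": len(positive) / len(path),
        "avg_positive_leverage": mean(positive) if positive else 0.0,
        "incidence_above_target": len(above) / len(path),
        # Assumption: deviation is measured as mean leverage minus target.
        "avg_deviation_from_target": mean(path) - target,
        # Caption: mean of leverage minus target, given leverage > target.
        "avg_transitory_debt": mean(above) if above else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy path: two burn-in periods followed by four observations.
    toy_path = [0.5, 0.5, 0.0, 0.2, 0.4, 0.0]
    for name, value in leverage_summary(toy_path, target=0.1, burn_in=2).items():
        print(f"{name}: {value:.3f}")
```

Because average transitory debt conditions on leverage exceeding target, it can be well above the unconditional average deviation whenever the firm spends many periods at or below target.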
The latter needs translate to higher volatility of investment (row 2) and somewhat higher average investment (row 1) for high as opposed to low shock volatility firms. Table 3’s most notable result is that, even in the face of corporate tax incentives to borrow, low leverage is the predicted norm for firms that face high investment shock volatility. Intuitively, high volatility implies a greater probability that large investment outlays will be optimal, and so the firm preserves debt capacity to address the commensurately large need for external finance. Variation in firms’ investment opportunities—and in particular their potential future funding needs—may therefore help resolve the ‘‘debt conservatism’’ puzzle that Graham (2000) poses, i.e., that it is difficult to explain why some firms maintain low leverage despite strong tax incentives to borrow. Such variation may also help resolve Miller’s (1977) closely related ‘‘horse and rabbit stew’’ criticism that the corporate tax benefit of borrowing swamps expected bankruptcy costs, leading traditional tradeoff models to predict unrealistically high leverage ratios, and in effect raising the question: what factors are missing from these tradeoff models? The answer offered by the static models of Miller (1977) and DeAngelo and
Masulis (1980), among others, is that attributes of the personal and corporate tax codes reduce firms’ incentives to borrow. The answer offered by our dynamic model is that, with high investment shock volatility, low leverage is desirable despite the foregone corporate tax benefits because it preserves the option to borrow to fund investment. Table 3 further indicates that low shock volatility firms have higher leverage volatility than high shock volatility firms (row 5), which reflects the latter’s tendency to hold large cash balances (rows 8, 12, and 6) as well as their higher volatility of cash balances and net debt (rows 9 and 7). Low shock volatility firms eschew large cash holdings (row 8) in part because these holdings trigger taxes, but also because, given the relatively high predictability of their funding needs (row 2), they can forego preserving large amounts of debt capacity to address such needs—hence, low shock volatility firms find it attractive to have consistently high leverage and negligible cash balances. For all shock volatility levels in Table 3, current cash flow is by far the most important source of funding for investment (rows 19–22), with debt issuances occurring more frequently than equity issuances (rows 13 and 17) and in larger amounts (rows 15 and 18). Debt reductions
Table 4
Capital structure comparative statics.
This table reports a variety of summary statistics from simulations of the baseline model. We simulate the model for 100,200 periods, with the firm receiving random investment shocks and responding to each by adjusting its investment and financing decisions. We discard the initial 200 periods of data. Each column reports statistics for a different model simulation. The first two are for low and high shock serial correlation, set at 0.1 and 0.9. The next two are for low and high θ, the parameter governing the marginal profitability of capital, set at 0.4 and 0.9. The last two are for smooth and lumpy investment. For smooth investment we set the convex adjustment cost parameter at 0.3 and the fixed adjustment cost parameter at 0.0. For lumpy investment we set the convex cost parameter to 0.0 and the fixed cost parameter to 0.04. Average transitory debt is the mean value of leverage (d/k) minus target, conditional upon leverage exceeding target.

                                                Shock serial        Marginal         Optimal
                                                 correlation   profitability      investment
                                                 Low    High     Low    High  Smooth   Lumpy
 1. Average investment (I/k)                   0.151   0.178   0.161   0.164   0.158   0.251
 2. Standard deviation of investment (I/k)     0.051   0.244   0.150   0.171   0.129   0.742
 3. Frequency of investment                    0.998   0.816   0.828   0.772   0.872   0.426

 4. Average debt-to-assets ratio (d/k)         0.889   0.085   0.711   0.094   0.400   0.088
 5. Standard deviation of leverage (d/k)       0.038   0.084   0.192   0.083   0.087   0.134

 6. Average net debt ((d − c)/k)               0.889   0.453   0.711   0.004   0.399   0.184
 7. Standard deviation of net debt             0.038   1.413   0.192   0.258   0.089   0.669
 8. Average cash balances to assets (c/k)      0.000   0.530   0.000   0.090   0.000   0.207
 9. Standard deviation of (c/k)                0.000   1.894   0.000   0.252   0.023   0.811

10. Frequency of positive debt outstanding     1.000   0.527   1.000   0.644   0.998   0.456

11. Average of positive leverage values        0.889   0.161   0.711   0.146   0.400   0.194
12. Average of positive cash balance values    0.000   1.256   0.000   0.318   0.083   0.528

13. Debt issuance frequency                    0.024   0.263   0.420   0.331   0.489   0.217
14. Debt repayment frequency                   0.017   0.272   0.437   0.385   0.503   0.390

15. Average debt issuance/assets               0.027   0.100   0.030   0.101   0.060   0.564
16. Average debt repayment/assets              0.034   0.059   0.028   0.065   0.052   0.138

17. Equity issuance frequency                  0.165   0.293   0.061   0.253   0.248   0.248
18. Average equity issuance/assets             0.016   0.039   0.020   0.015   0.020   0.032

Average fraction of investment funded from:
19. Current cash flow                          0.987   0.809   0.975   0.786   0.899   0.671
20. Cash balances                              0.000   0.092   0.000   0.090   0.000   0.126
21. Debt issuance                              0.002   0.067   0.022   0.114   0.089   0.198
22. Equity issuance                            0.011   0.028   0.003   0.009   0.011   0.005

23. Target                                     0.904   0.000   0.702   0.000   0.371   0.000
24. Incidence of leverage above target         0.037   0.532   0.501   0.643   0.712   0.593
25. Average deviation from target              0.015   0.085   0.009   0.094   0.035   0.129
26. Average transitory debt                    0.012   0.160   0.166   0.146   0.084   0.217
occur roughly as often as debt issuances (rows 13 and 14), reflecting firms’ incentives to pay down debt today to free up debt capacity for future use, even though a cost of doing so is the loss of tax benefits. Although in our model firms have positive debt and cash balances on average, they do not carry both simultaneously. With or without corporate taxes, firms with positive cash balances and debt are always better off if they use the cash to retire debt and thereby avoid the costs of maintaining cash balances, while freeing up debt capacity. Of course, real-world firms do simultaneously borrow and hold cash, most obviously because they require some cash to operate the business, a motive that is easy to incorporate in our analytics and that does not change our transitory debt predictions. Gamba and Triantis (2008) note that, by accumulating cash while debt is outstanding, firms can reduce future debt issuance costs. We exclude direct costs of debt issuance from the model posited in Section 2 to highlight our point that the opportunity cost of issuing debt today (i.e., the debt capacity that is no longer available for borrowing
tomorrow) is by itself an impediment to borrowing. Section 7 shows that our qualitative conclusions remain unchanged when we add debt flotation costs to the model and allow firms to carry debt and cash balances simultaneously. In this case, firms are less aggressive in both borrowing and paying down debt, but they still treat debt capacity as a scarce resource and use debt as a transitory financing vehicle. Firms that face low shock volatility have relatively predictable funding needs and therefore see correspondingly little value from having unused debt capacity, and so they have high target leverage ratios (row 23 of Table 3) and, on average, have leverage ratios close to target (row 25). Firms that face moderate to high shock volatility have lower leverage targets (row 23) because they place greater value on the option to borrow to meet investment needs, and they accordingly have a larger gap between average and target leverage (row 25), which reflects a greater amount of transitory debt outstanding, on average (row 26). In Table 4 (discussed below), we compare target versus average leverage ratios for firms with different
shock serial correlations, marginal investment profitability, and capital stock adjustment costs. As in Table 3, target leverage is higher and average leverage is closer to target when firms have less use for unused debt capacity to fund investment (compare rows 23 and 25 of each column pair in Table 4).8 These comparisons from Tables 3 and 4 indicate that average leverage is a good proxy for target when having the option to borrow to fund investment is of little value, while average leverage markedly exceeds target when the option to borrow is of substantial value, with that option value manifesting in a markedly lower target debt ratio.

4.2. Serial correlation, profitability, and capital stock adjustment costs

Table 4 summarizes the capital structure impact of varying the serial correlation of investment shocks, the marginal profitability of investment, and the degree of smoothness in investment outlays. (Smooth investment is generated by no fixed costs of adjusting the capital stock coupled with high convex costs, while lumpy investment is induced by high fixed costs coupled with no variable adjustment costs.) For brevity, Table 4 reports predicted capital structure values for ‘‘high’’ and ‘‘low’’ values of the first two parameters and contrasts smooth versus lumpy investment for the latter, with all other parameters held constant. The table indicates that firms that have high shock serial correlation, high marginal profitability, or lumpy optimal investment programs have relatively low average leverage ratios compared to those with the opposite attributes (row 4). Firms with the former investment characteristics typically forego large tax benefits of debt to preserve debt capacity that can be tapped to fund their more volatile prospective investment outlays (row 2). The reasons for the attraction of a conservative capital structure differ depending on the investment attribute.
The higher the serial correlation of investment shocks, the more likely a current large shock will soon be followed by another shock, with an additional material need for funds. High serial correlation also implies that optimal investment outlays tend to be large because the profitability of these investments is expected to persist. Similarly, the higher a firm’s marginal profitability of investment (i.e., the θ parameter), the larger is its optimal investment outlay in response to a given shock, and the possibility of a large funding need induces the firm to maintain conservative leverage, on average. Finally, holding constant the fixed component of capital stock adjustment costs, the lower the convex component of those costs, the more responsive is investment to shock arrival, and the more variable is the resultant optimal time profile of investment (Cooper and Haltiwanger, 2006). Accordingly, lower convexity in capital stock adjustment costs translates to less predictability in funding needs, and therefore to greater value from preserving debt capacity. The same intuition explains the higher average cash balances and lower net debt of firms with high shock serial correlation, high marginal profitability, and lumpy investment outlays (rows 8 and 6). The volatilities of cash balances and net debt are also markedly higher for these firms as opposed to those with the opposite investment attributes (rows 9 and 7). Such firms also exhibit higher volatility of cash balances than of leverage (rows 9 and 5), which reflects their large build-up of cash balances in anticipation of future funding needs followed by large subsequent cash draw downs (coupled with incremental borrowing) when those needs do manifest. In all cases in Table 4, cash flow realizations are the main source of funds for new investment (rows 19–22), with equity issuance typically covering only a small fraction of investment. The latter property conforms to real-world financing patterns and thereby provides something of an ‘‘out of sample’’ check on the model, given that our SMM estimation procedure does not match on any ‘‘source of funds’’ moments.

8 Although Section 4.3 explains why some firms have a range of leverage targets, all but one of the parameter values underlying Tables 3 and 4 imply unique targets, i.e., any range of target values for a given firm is less than the grid for p in the model. For high fixed adjustment costs, the target is a range, and we report the upper bound.

4.3. Target capital structures

Although capital structures exhibit path dependence locally, they are also globally self-correcting in the sense that, when managers find it optimal to borrow and deviate (or deviate further) from target leverage, they subsequently have incentives to return the firm to target by paying down debt as circumstances permit. The reason is that the option to borrow is valuable because it enables the firm to avoid more costly forms of financing in future periods, and so reducing debt is attractive because it restores that option.
Analytically, a given firm’s target capital structure is the optimal matching of debt and assets to which that firm would converge if it optimized its debt and asset decisions in the face of uncertainty but then received only neutral investment shocks (z = 1) for many periods in a row. Absent taxes, the model yields an analytically simple characterization of target capital structure: zero debt is the target for all firms. With taxes, target capital structures typically contain some debt, which enables firms to capture interest tax shields on a permanent basis, and different firms have different leverage targets, which depend on the characteristics of their investment opportunities. The capital structure target in our (tax-inclusive) model is either a fixed ratio of debt to total assets or a range9 of such ratios, depending on the precise structure of the costs that a firm faces from adjusting its stock of physical capital. We consider three cases, each characterized by a different specification of capital stock adjustment costs.

9 Many capital structure models are characterized by a range of target leverage ratios rather than by a unique target ratio independent of the scale of the firm. For example, in static tradeoff models such as Robichek and Myers (1966), Kraus and Litzenberger (1973), and DeAngelo and Masulis (1980), there is no single fixed ratio of debt to assets (or debt to market value) that is optimal independent of scale, unless one imposes restrictive assumptions on the functional forms of investment opportunities and bankruptcy/agency costs.

In case #1, firms face no such costs and, in this case, it is easy to demonstrate that the optimal capital stock at any point in time is the level that equates the price of capital goods with the shadow value of capital. Because the value function is strictly convex, a neutral shock (z = 1) corresponds to a uniquely optimal level of the capital stock, k*, which remains constant in the face of a repeated sequence of neutral shocks. Strict convexity of the value function then implies a unique target level of net debt, p*, and therefore a unique target level of debt, d* = max(p*, 0), and an associated fixed target leverage ratio, d*/k*.

In case #2, there are no fixed costs of adjusting the capital stock, but firms face variable costs of adjusting capital that are convex in the rate of investment (I/k). In this case, if a firm receives a long series of neutral shocks, it also converges to a unique capital stock, although this level generally differs from that which obtains in the zero adjustment cost case (case #1 above) because the expected future marginal product of capital incorporates potential future adjustment costs, as discussed by Cooper and Willis (2004). The reasoning is as follows. Because the adjustment cost function is convex in the rate of investment, Jensen’s inequality implies that the firm’s optimal policy in the face of uncertainty is to avoid changing its rate of investment, except in response to a shock. If the firm receives a long series of neutral (z = 1) shocks, the firm keeps investment constant at a rate that just allows for replacement of depreciated capital. The capital stock therefore remains constant at a level k* (generally different from case #1). This rate of investment equates the marginal adjustment and purchasing costs with the shadow value of capital.
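The mapping from target net debt to target leverage can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical inputs are hypothetical, and the cash expression c = max(−p, 0) is an assumption that mirrors the statement that firms in the model do not carry debt and cash simultaneously.

```python
def split_net_debt(p):
    """Split net debt p = d - c into debt d and cash c.

    Debt follows d = max(p, 0) as in the text; the cash leg
    c = max(-p, 0) is an assumption consistent with firms never
    holding debt and cash at the same time.
    """
    debt = max(p, 0.0)
    cash = max(-p, 0.0)
    return debt, cash


def target_leverage(p_star, k_star):
    """Target debt-to-assets ratio d*/k* implied by p* and k*."""
    if k_star <= 0.0:
        raise ValueError("k_star must be positive")
    d_star, _ = split_net_debt(p_star)
    return d_star / k_star


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the two regimes.
    print(target_leverage(p_star=30.0, k_star=100.0))   # positive net debt
    print(target_leverage(p_star=-5.0, k_star=100.0))   # net cash position
```

A firm whose target net debt is negative therefore has a zero leverage target, with the target cash balance absorbing the remainder.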
As in case #1, in case #2 a unique target capital stock, k*, implies a unique target level of debt, d*, and a unique target leverage ratio, d*/k*. In case #3, firms face only fixed costs of capital stock adjustment. In this case, a firm’s optimal investment policy (in the limit after a series of neutral shocks) is not to maintain a constant capital stock, but to allow that stock to depreciate from an upper to a lower bound, at which point it invests to restore the depreciated capital. (See Caballero and Leahy, 1996; Caballero, 1999; Whited, 2006) The upper bound is the optimal level of the capital stock at which the shadow value of capital equals the price of capital goods, a level that in general differs from those for both cases #1 and #2. In case #3, the firm does not immediately return to this level when capital depreciates; rather it waits until its capital stock reaches the lower bound, at which point the marginal benefit from returning to the optimal level just covers the fixed cost of doing so. Under the baseline model parameterization, at this lower bound the firm faces a funding need that exceeds its internal resources, which it satisfies by borrowing. As the capital stock depreciates from the upper to the lower bound, the firm uses its cash flow first to pay down debt and then to increase cash balances in anticipation of the approaching large funding need. This behavior dictates a fixed range for the optimal levels of physical capital and debt (and of net debt). Hence, in case
#3, the firm has a range of target leverage ratios that is determined by its levels of debt and capital as physical capital depreciates from the upper to the lower bound described above.

Fig. 2 plots target ratios of debt to total assets as a function of investment shock volatility and serial correlation for firms that respectively face (i) zero costs of adjusting the physical capital stock, (ii) high convex costs of adjustment, and (iii) high fixed costs of adjustment, i.e., for variants of cases #1–3 discussed above. Target leverage is unique for capital stock adjustment cost scenarios (i) and (ii), but takes a range of values for scenario (iii), with Fig. 2 reporting the upper bound of the target range and for simplicity omitting the lower bound, which is 0.0 in all cases. The figure indicates that lower target leverage is associated with higher levels of shock volatility and of shock serial correlation (Panels A and B, respectively). The intuitive explanation is that a higher value of each parameter implies a higher probability that large investment outlays are optimal, which in turn provides incentives for firms to adopt capital structures with more conservative leverage, hence greater ability to issue transitory debt. Target leverage is also negatively related to the marginal profitability of investment, but the relation is not as strongly negative as it is for shock volatility and serial correlation (details not shown in the figure).

Fig. 3 illustrates the existence of a leverage target and the convergence to target for a firm that faces convex capital stock adjustment costs but no fixed adjustment costs (a = 0.15, γ = 0.00). Over dates t = 0 through t = 52, the firm’s debt level fluctuates in response to the arrival of investment shocks and to its decision to pay down debt in periods in which cash flow realizations exceed contemporaneous funding needs.
In some cases, the firm reduces debt below target because shock realizations—coupled with serial correlation of investment shocks—indicate that large future investment outlays are likely to be optimal, and so the firm temporarily builds debt capacity in anticipation. At t = 52, the firm experiences a neutral investment shock, and such shocks continue to arrive. The firm uses its cash flow realizations to pay down debt and, at t = 55, it thereby attains its long-run leverage target where it remains as neutral shocks continue to arrive. (Note that the target is not the leverage ratio at which the firm begins receiving neutral shocks, but rather is the leverage to which the firm moves in the limit if it were to experience repeated neutral shocks.) If non-neutral shocks were to resume, leverage would once again follow a volatile path. If instead the firm faced fixed costs of adjusting its capital stock, it would not have a constant leverage target. Rather, after t = 52, the firm’s target d*/k* ratio would fluctuate as the optimal capital stock, k*, depreciates and the firm delays replenishment, while the debt level, d*, is adjusted downward in response, but generally not in strict proportion, to the reduction in k*.
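The contrast between a constant target and a fluctuating one can be illustrated by tracing the capital stock under repeated neutral shocks. The sketch below is purely illustrative and is not the authors' solution method: the depreciation rate, the upper and lower bounds, and both function names are hypothetical, and the bounds are taken as given rather than derived from the firm's optimization.

```python
def smooth_capital_path(k_star, delta, periods):
    """Convex-cost case: replacement investment keeps capital at k_star."""
    path, investment = [k_star], []
    k = k_star
    for _ in range(periods):
        outlay = delta * k                   # replace depreciated capital only
        k = (1.0 - delta) * k + outlay
        investment.append(outlay)
        path.append(k)
    return path, investment


def lumpy_capital_path(k_upper, k_lower, delta, periods):
    """Fixed-cost case: capital depreciates until it reaches the lower
    bound, then a single outlay restores it to the upper bound."""
    if not 0.0 < k_lower < k_upper:
        raise ValueError("require 0 < k_lower < k_upper")
    path, investment = [k_upper], []
    k = k_upper
    for _ in range(periods):
        k = (1.0 - delta) * k                # depreciation, no investment
        outlay = 0.0
        if k <= k_lower:                     # lower bound reached: invest
            outlay = k_upper - k
            k = k_upper
        investment.append(outlay)
        path.append(k)
    return path, investment


if __name__ == "__main__":
    # Hypothetical parameters chosen only to make the pattern visible.
    smooth_k, smooth_i = smooth_capital_path(100.0, 0.10, 6)
    lumpy_k, lumpy_i = lumpy_capital_path(100.0, 80.0, 0.10, 6)
    print("smooth capital:", [round(k, 1) for k in smooth_k])
    print("lumpy capital: ", [round(k, 1) for k in lumpy_k])
    print("lumpy outlays: ", [round(i, 1) for i in lumpy_i])
```

In the lumpy case the denominator of d*/k* moves every period and funding needs arrive in occasional large outlays, which is why the target is a range rather than a single ratio.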
5. Speed of adjustment to target capital structure

Extant tradeoff models predict that, whenever leverage differs from target because of factors beyond managers’
[Figure 2 omitted. Panel A: Target capital structure and investment shock standard deviation. Panel B: Target capital structure and investment shock serial correlation. Vertical axis in both panels: target debt-to-assets ratio. Series: no capital stock adjustment costs; high convex adjustment costs; high fixed adjustment costs (upper bound).]
Fig. 2. Target capital structure as a function of the attributes of investment opportunities. Leverage is measured as the ratio of debt to total assets. Shock volatility (σ_v) and serial correlation (ρ) parameters are centered around the estimates from the SMM estimation in Section 2. Target leverage is unique for the no capital-stock adjustment cost and high convex adjustment cost cases, but not for the high fixed cost case. Both panels plot the upper bound on target leverage for the latter case, with the lower bound always equal to 0.00.
[Figure 3 omitted. Vertical axis: debt-to-assets ratio. Horizontal axis: time. Series: leverage; debt issuance/retirement.]
Fig. 3. Illustrative convergence to target leverage in the estimated model. The firm experiences random investment shocks until date t = 52, at which point it begins to receive a series of neutral investment shocks. It converges to target at t = 55 and remains there as neutral shocks continue to arrive. This illustrative firm faces convex capital stock adjustment costs, but no fixed costs of adjustment, and so it has a unique long-run target leverage ratio.
control, firms rebalance toward target as quickly as is economical, given the costs of security issuance. Fama and French (2002) cast doubt on the explanatory power of tradeoff models because the estimated speed of adjustment (SOA) to target is ‘‘suspiciously slow.’’ Subsequent studies support this view with estimates that firms move on average between one-third and one-twelfth of the way toward target each year (see, e.g., Flannery and Rangan, 2006; Kayhan and Titman, 2007; Lemmon, Roberts, and Zender, 2008; Parsons and Titman, 2008). Our estimated model parameters imply slow average speeds of adjustment to target leverage in the same neighborhood as the average SOA estimates reported in prior empirical studies. The slow SOA in our model reflects an ongoing shock-dependent sequence of both (i) debt issuances that raise funds needed for investment, but that also move firms temporarily away from target, and (ii) debt repayments in which firms rebalance toward target when investment needs slacken, in order to free up debt capacity for future borrowing. Extant tradeoff models treat investment as exogenous, and so they rule out the transitory deviations from target to fund investment which, in our model, slow the average SOA as it is measured in prior empirical studies. As a result, the SOA measures employed in prior studies understate the strength of the actual leverage rebalancing incentives firms face in our model because they inappropriately include leverage changes in which firms deliberately but temporarily move away from target to fund new investment. When we exclude the latter changes from our SOA measures, firms in our model move aggressively toward target leverage. Table 5 presents our main SOA results, which are generated using the approach employed in prior sections to obtain our comparative statics results, with model parameters set at the estimated baseline values (per Section 2). 
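The difference between a conventional average SOA and one restricted to rebalancing moves can be made concrete with a small Python sketch. It follows the verbal definitions given in the caption of Table 5 (change in leverage divided by the distance to target, with the ratio capped at 1.000 when a move overshoots), but it is a simplified illustration rather than the authors' code: the function names and the toy leverage path are hypothetical, the target is assumed constant, and observations that start exactly at target are skipped because the ratio is undefined there.

```python
from statistics import mean


def adjustment_ratios(leverage, target):
    """Per-period ratio of the leverage change to the distance from target.

    Positive values are moves toward target, negative values are moves
    away from it, and values are capped at 1.0 when a move overshoots.
    """
    ratios = []
    for current, following in zip(leverage[:-1], leverage[1:]):
        gap = target - current
        if gap == 0.0:
            continue  # assumption: skip periods that start exactly at target
        ratios.append(min((following - current) / gap, 1.0))
    return ratios


def average_soa(leverage, target):
    """Mean ratio over all observations, including moves away from target."""
    ratios = adjustment_ratios(leverage, target)
    return mean(ratios) if ratios else float("nan")


def average_soa_toward_target(leverage, target):
    """Mean ratio over observations that move leverage toward target only."""
    toward = [r for r in adjustment_ratios(leverage, target) if r > 0.0]
    return mean(toward) if toward else float("nan")


if __name__ == "__main__":
    # Hypothetical path: rebalance, borrow to fund investment, rebalance.
    toy_path = [0.50, 0.40, 0.45, 0.30, 0.30]
    print(round(average_soa(toy_path, target=0.30), 3))
    print(round(average_soa_toward_target(toy_path, target=0.30), 3))
```

In this toy path a single deliberate move away from target is enough to pull the conventional average well below the average computed over rebalancing moves alone.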
In rows 1 through 3, we report three different measures of the average rate at which firms move toward target leverage, with each model-generated SOA reported both (i) unconditional on current leverage, and (ii) conditioned on whether current leverage is above/at or below target. The remaining rows (4–33) of Table 5 show model-generated leverage changes and attributes of the related financing decisions that underlie our measured speeds of adjustment to target. Because investment plays an important role in the rate at which firms deviate from and rebalance to target, each variable in rows 4–33 is reported unconditional on investment, as well as conditional on low, moderate, and high levels of investment. In Table 5, rows 1 and 2 report model-generated average SOA measures that are calculated, as done in prior empirical studies, to include both (i) rebalancing decisions, and (ii) financing decisions that deliberately move firms away from target, while row 3 reports the average SOA measured as our model indicates it should be, i.e., by (i) alone.10 Since the model is estimated using yearly data, each SOA in Table 5 represents an annual rate of movement toward target. Row 1’s average SOA is 0.142, which indicates that, in a randomly selected model year, firms move about one-seventh of the way toward target, on average, a figure that is close to the annual estimates in recent SOA studies. Row 2’s average regression-based SOA is 0.378, which implies a movement of a little over one-third of the way toward target in each randomly selected year, a figure that is again close to prior empirical estimates. In sharp contrast, row 3’s average SOA, which excludes firms’ proactive decisions to deviate from target, is 0.605. This result indicates that firms whose current circumstances favor rebalancing do so aggressively, moving on average about 60% of the distance toward target in each year. As expected when firms have target leverage ratios, firms are more likely to decrease leverage when it is currently above target (rows 7 and 8 of Table 5) and more likely to increase leverage when it is below target (rows 22 and 23), with the average leverage change negative in the former case and positive in the latter (−0.015 and 0.036, respectively, per rows 4 and 19). While these average changes are modest in absolute value, both include substantial increases and decreases (rows 5, 6, 20, and 21), an indication that leverage changes often move firms significantly away from target. More precisely, the probability of a leverage increase is 0.372 when leverage is above target (row 7), while the probability of a leverage decrease is 0.347 when leverage is below target (row 23). Firms often take material (temporary) excursions away from target, and that is why conventional SOA measures indicate slow speeds of adjustment for model-generated leverage ratios. In short, conventional SOA measures obscure the consequences of our model’s implication that firms aggressively rebalance leverage toward target in some but not all states of the world.

10 ‘‘Average SOA’’ in row 1 is the mean of (1) the change in leverage divided by (2) the distance from current leverage to target leverage. As discussed below, firms sometimes overshoot the leverage target, and inclusion of such overshooting biases upward the measured average rate at which firms adjust leverage towards target. We accordingly cap the ratio at 1.000 for any observation in which the leverage change exceeds the distance from target. ‘‘Average SOA toward target’’ in row 3 is calculated in the same fashion, but excludes observations that do not move leverage toward target. ‘‘Regression-based SOA’’ in row 2 is the SOA implied by a regression that uses model-generated leverage ratios and follows the general approach of extant SOA tests, i.e., current leverage is the left-hand side variable, while the explanatory variables are lagged leverage, firm value/assets, and cash flow/assets. The latter variables are included as proxies for the determinants of target leverage of the type posited in empirical SOA studies.

With endogenous investment, the specific attributes of a firm’s investment opportunities dictate whether rebalancing toward or deviating further from target leverage is currently optimal. In our model, firms have incentives to avoid maintaining a permanent large cash reserve due to corporate taxes, agency costs, and/or Keynesian liquidity costs. And, since firms are operating on a ‘‘tight leash’’ with respect to cash balances, external financing becomes necessary more often to meet the marginal funding needs associated with investment shocks. Moreover, because equity issuance is more costly than debt issuance, debt is an attractive source of marginal financing, with proactive debt issuance decisions (to fund
Table 5
Speed of adjustment to target.
This table reports speeds of adjustment to target in a simulation of the baseline model. We simulate the model for 100,200 periods, with the firm receiving random investment shocks and responding to each by adjusting its investment and financing decisions. We discard the initial 200 periods of data. ‘‘Average SOA’’ in row 1 is the mean of the ratio of the change in leverage to the distance from current leverage to target leverage. ‘‘Average SOA toward target’’ in row 3 is calculated in the same fashion, but excludes observations that do not move leverage towards target. ‘‘Regression-based SOA’’ in row 2 is the SOA implied by a regression that uses model-generated leverage ratios and follows the general approach of extant SOA tests, i.e., current leverage is the left-hand side variable, while the explanatory variables are lagged leverage, firm value/assets, and cash flow/assets. All statistics are reported for the entire simulated sample, as well as for the first through third terciles of the ratio of investment to capital.

A. Alternative SOA measures

                                          Current leverage:
                                          Unconditional   At or above target   Below target
 1. Average SOA                                   0.142                0.115          0.204
 2. Regression based SOA                          0.378                0.332          0.593
 3. Average SOA towards target                    0.605                0.543          0.769

B. Leverage changes and investment

                                          Investment:
                                          Unconditional      Low   Moderate     High
Leverage above or at target
 4. Average change in leverage                   −0.015   −0.021     −0.017   −0.007
 5. Average increase in leverage                  0.057    0.057      0.052    0.063
 6. Average decrease in leverage                  0.058    0.061      0.056    0.057
 7. Probability of leverage increase              0.372    0.344      0.366    0.414
 8. Probability of leverage decrease              0.627    0.655      0.634    0.585
 9. Probability of overshooting target            0.280    0.293      0.253    0.293

10. Probability of debt issuance                  0.406    0.124      0.238    0.940
11. Probability of debt repayment                 0.592    0.875      0.760    0.059
12. Probability of equity issuance                0.244    0.083      0.263    0.416

13. Size of debt issuance                         0.089    0.046      0.031    0.113
14. Size of debt repayment                        0.079    0.099      0.059    0.025
15. Size of equity issuance                       0.015    0.018      0.015    0.015

16. Average shock                                 0.061    0.292      0.113    0.429
17. Probability of a positive shock               0.561    0.139      0.671    0.949
18. Fraction of observations                      0.702    0.254      0.238    0.210

Leverage below target
19. Average change in leverage                    0.036    0.021      0.024    0.055
20. Average increase in leverage                  0.099    0.083      0.088    0.115
21. Average decrease in leverage                  0.053    0.053      0.054    0.053
22. Probability of leverage increase              0.554    0.524      0.514    0.604
23. Probability of leverage decrease              0.347    0.410      0.389    0.273
24. Probability of overshooting target            0.464    0.521      0.434    0.449

25. Probability of debt issuance                  0.547    0.161      0.378    0.930
26. Probability of debt repayment                 0.352    0.717      0.493    0.005
27. Probability of equity issuance                0.293    0.113      0.293    0.410

28. Size of debt issuance                         0.154    0.047      0.039    0.203
29. Size of debt repayment                        0.061    0.073      0.047    0.022
30. Size of equity issuance                       0.029    0.028      0.026    0.031

31. Average shock                                 0.150    0.459      0.189    0.080
32. Probability of a positive shock               0.348    0.145      0.222    0.577
33. Fraction of observations                      0.298    0.080      0.095    0.123
current investment) and repayment decisions (to replenish future borrowing capacity) reflecting the sequence of optimal investment outlays. Furthermore, because investment shocks are serially correlated, firms will sometimes respond to a specific favorable shock by moving/remaining temporarily below target in order to obtain/preserve additional borrowing capacity (and perhaps to build cash balances) so as to be in a better position to fund the higher future investment outlays that are more likely given the recent shock realizations.
To clarify the link between investment outlays and movements relative to target, we first consider firms whose current leverage is above/at target. When such firms face high investment outlays, the probability of a debt issuance is 0.940 (row 10 of Table 5), and the average issuance is 11.3% of total capital (row 13). When investment needs are low, the debt issuance probability falls to 0.124 (row 10), and new borrowing averages only 4.6% of capital (row 13). The situation is reversed for debt repayments, as firms with low investment outlays repay
debt with a 0.875 probability, which far exceeds the 0.059 repayment probability in high investment states of the world (row 11). Moreover, in low investment states, the average debt reduction is four times that associated with high investment (9.9% versus 2.5% of capital, per row 14). Finally, while the average equity issuance is always small (1.8% of capital or less, per row 15), its likelihood nonetheless depends on investment—at 0.416 with high investment versus 0.083 with low investment (row 12). In sum, with high investment, firms that are currently above target leverage often issue substantial debt (and sometimes issue small amounts of equity), thereby deviating further from target, whereas with low investment, these firms typically pay down debt and thus replenish future borrowing capacity. The attributes of firms’ investment decisions also govern the leverage rebalancing decisions of firms whose current leverage is below target. With high investment, these firms’ debt issuance probability is 0.930 (row 25), with an average issuance of 20.3% of capital (row 28). Such debt issuances represent aggressive movements toward target and, because the typical need for cash to fund investment is great, the (conditional) probability is 0.449 that the new leverage ratio overshoots the long-run leverage target (row 24). When investment is low, the debt issuance probability is far smaller (0.161, per row 25), as is the size of the average issuance (4.7% of capital, per row 28). However, the probability of overshooting target is nontrivial in all cases (row 24), indicating that when firms move toward target, they do so aggressively, motivated more by the need to fund investment than by the desirability of quickly returning to target leverage. Finally, when investment is low and leverage is below target, firms repay debt with probability 0.717 (row 26), a result that seems counterintuitive because firms are reducing their debt when leverage is below target. 
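The two averaging-based SOA measures defined in the Table 5 caption (rows 1 and 3) can be illustrated with a minimal sketch. The leverage path and the target of 0.20 below are hypothetical numbers chosen for illustration, not model output, and the function names are our own. The cap at 1.0 follows the footnote's statement that the ratio is capped when the leverage change exceeds the distance from target; skipping observations that start exactly at target is an assumption of this sketch.

```python
from statistics import mean

def soa_ratio(lev_prev, lev_next, target, cap=1.0):
    """Change in leverage divided by the distance from current leverage to target."""
    distance = target - lev_prev
    if distance == 0:
        return None  # assumption: ratio is undefined when the firm starts at target
    # Cap at 1.0 when the leverage change exceeds the distance from target.
    return min((lev_next - lev_prev) / distance, cap)

def average_soa(path, target, toward_only=False):
    """Mean SOA over consecutive periods of a leverage path.

    With toward_only=True, observations that do not move leverage
    toward target (ratio <= 0) are excluded, as in row 3 of Table 5.
    """
    ratios = []
    for prev, nxt in zip(path, path[1:]):
        ratio = soa_ratio(prev, nxt, target)
        if ratio is None:
            continue
        if toward_only and ratio <= 0:
            continue
        ratios.append(ratio)
    return mean(ratios)

# Hypothetical leverage path and target (illustrative only).
path = [0.10, 0.15, 0.25, 0.22, 0.30]
target = 0.20

soa_all = average_soa(path, target)                       # includes moves away from target
soa_toward = average_soa(path, target, toward_only=True)  # moves toward target only
print(round(soa_all, 3), round(soa_toward, 3))
```

In this example the last transition moves leverage away from target and contributes a large negative ratio, which is why the all-observations average is far below the toward-target average. The same mechanism drives the gap between rows 1 and 3 of Table 5.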
In general, why do firms sometimes choose to remain or move below target leverage when they could lever up at zero transactions costs by borrowing and immediately distributing the proceeds to stockholders? The answer is that investment shocks are serially correlated in our model, which implies that firms sometimes rationally build additional (temporary) debt capacity by moving below target when a given shock implies an increased likelihood of future shocks that will require additional resources to fund investment. While the influence of each such investment shock erodes over time, serially correlated shocks nonetheless encourage firms to remain below target for multiple periods, even when they could easily lever up. The same logic explains why firms with current leverage above target sometimes overshoot the target when reducing leverage (row 9). It also explains why we analytically define the long-run leverage target in terms of a limiting sequence of capital structures: since the influence on leverage of any given shock approaches zero over time, firms’ leverage converges to the long-run target in the limit as neutral shocks continue to arrive and any lingering influence of prior non-neutral shocks fades over time.

Dudley (2009) argues that the SOA toward target is faster when firms raise external capital for investment because capital structure adjustments are lumpy and coincide with discrete investment outlays. Our analysis also indicates that conventional SOA measures vary with the level of investment. But our model differs in its prediction that the measured SOA often takes negative values because firms often issue transitory debt and move away from target to fund investment. Hence, our model explains the empirically observed low average levels of the SOA to target on grounds that many inputs to the average are negative. Dudley presents evidence consistent with his model’s prediction that firms adjust leverage toward target when cash is raised for projects. However, our Section 6 evidence is inconsistent with the latter prediction. We examine capital structure changes that accompany investment “spikes” and find that firms with above-average leverage ratios typically issue debt and move further away from target to meet their funding needs.

6. Leverage, investment, and the explanatory power of the model

This section reports the results of four tests of our model’s ability to explain observed leverage decisions. First, we use SMM estimation to gauge the model’s ability to match industry-level leverage for 41 two-digit SIC code industries. Second, we regress average industry leverage on the structural parameters for 40 of these estimations (excluding railroads, the sole poorly matched industry) and test whether attributes of investment opportunities exert statistically significant influences on leverage in a manner consistent with our model’s predictions. Third, we run a “horse race” that compares the explanatory power of our model with that of tradeoff models in which investment policy is exogenous and firms rebalance leverage subject to capital structure adjustment costs. Fourth, we test the investment/capital structure implications delineated in Panel B of Table 5 by analyzing the financing decisions associated with investment “spikes” by Compustat firms.

6.1. Industry leverage

Frank and Goyal (2009, Table III) find that industry median leverage is the single most important (out of 34) determinant of corporate leverage, with almost three times the stand-alone explanatory power of the next most important factor. This finding suggests that a relevant gauge of the empirical usefulness of any model of capital structure is its ability to explain cross-industry variation in leverage. Fig. 4 indicates that our model does a remarkably good job explaining cross-industry variation in leverage.11 The figure plots the actual versus simulated average debt-to-asset ratios for 41 two-digit SIC industries, each of which has at least 100 firm-year observations over 1988–2001. We obtain the simulated leverage ratios from 41 distinct SMM estimations (per the approach in Section 2), and take actual leverage ratios from Compustat for the estimation period. With one exception (railroads), all industries plot close to the 45° line in Fig. 4, a graphical manifestation of the fact that our parsimonious nine-parameter SMM estimations generate industry average leverage ratios that are insignificantly different from actual industry average leverage for 40 of the 41 industries. We also find that the model performs respectably in terms of matching most other moments (details not in Fig. 4), with the volatility of equity issuance and the level of industry investment representing the two areas in which modeling and estimation refinements would most likely improve explanatory power.

11 Although it would be appealing to be able to gauge the explanatory power of firm-specific SMM estimations, we focus here on industry-level explanatory power because current computational limitations make it problematic to run thousands of different SMM estimations, and because of the impossibility of calculating standard errors for firms that have few observations relative to model parameters.

[Figure 4: scatter plot of the simulated average debt-to-assets ratio (vertical axis, 0.0 to 0.5) against the actual average debt-to-assets ratio (horizontal axis, 0.0 to 0.5).]

Fig. 4. Actual versus simulated debt: 41 industries. This figure depicts the results of 41 separate SMM estimations, each using data from a different two-digit SIC code industry. Calculations are based on a sample of nonfinancial, unregulated firms from the annual 2007 Compustat industrial files. The sample period is 1988–2001. On the vertical axis is the model-implied average leverage (debt-to-assets) ratio given the industry-specific estimated parameters. On the horizontal axis is the actual average industry leverage ratio. The figure includes a 45° (diagonal) line out of the origin to make it easier to gauge the gap between actual and model-implied leverage. Simulated and actual leverage are not statistically different from each other, except in the case of the railroad industry, which is represented by the point away from the 45° line in the upper right corner.

Table 6
Regression of industry average debt on estimated industry parameters: 40 industries. OLS regressions are estimated from a cross-section of 40 two-digit SIC code industries over 1988–2001. The left-hand side variable is average industry book leverage. The right-hand side variables are structural parameter estimates from 40 separate SMM estimations of the model in Section 2. $\lambda_1$ and $\lambda_2$ are the linear and quadratic costs of equity issuance. $\sigma_v$ is the standard deviation of the innovation to $\ln(z)$, in which $z$ is the shock to the revenue function. $\rho$ is the serial correlation of $\ln(z)$. $\theta$ is the curvature of the revenue function, $zk^{\theta}$. $\gamma$ and $a$ are the fixed and convex adjustment cost parameters, and $s$ is the agency cost parameter. $p/k_{ss}$ is the debt ceiling expressed as a fraction of the steady state capital stock, $k_{ss}$. Standard errors are in parentheses.

| Parameter | Coefficient | Standard error | t-statistic |
|---|---|---|---|
| $\lambda_1$ | 0.073 | (0.186) | 0.392 |
| $\lambda_2$ | 1.730 | (2.315) | 0.747 |
| $\sigma_v$ | 1.056 | (0.470) | 2.246 |
| $\rho$ | 0.757 | (0.423) | 1.789 |
| $\theta$ | 1.154 | (0.387) | 2.983 |
| $\gamma$ | 1.677 | (2.353) | 0.713 |
| $a$ | 0.664 | (0.180) | 3.696 |
| $s$ | 0.935 | (0.272) | 3.443 |
| $p/k_{ss}$ | 0.589 | (0.219) | 2.691 |

$R^2$ = 0.669

Table 6 presents the results of a regression of average industry leverage on the structural parameter estimates that we obtain from 40 of the 41 SMM industry estimations described above (excluding railroads). The $R^2$ of 0.669 indicates that our model explains a substantial proportion of the cross-industry variation in average leverage. Three estimated parameters that describe aspects of the investment opportunity set (shock volatility, profit function curvature, and convex costs of adjusting the capital stock) have statistically significant influences of the predicted signs, with respective t-statistics of 2.246, 2.983, and 3.696, but the serial correlation parameter is only marginally significant (t-statistic of 1.789). The parameters for costs of cash balances and debt capacity are of the expected signs and statistically significant (t-statistics of 3.443 and 2.691, respectively), while the equity issuance cost and fixed capital stock adjustment cost parameters are immaterially different from zero. The overall picture is that the model does a good job explaining variation in industry average leverage, with most estimated parameters exerting significant influences on industry leverage in a manner consistent with the model’s predictions.
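As a quick consistency check on Table 6, the sketch below recomputes each t-statistic as the ratio of the reported coefficient to its standard error. The coefficient, standard error, and t-statistic values are copied from the table as printed; the dictionary keys are descriptive labels chosen here, and the 1.96 cutoff is the usual large-sample two-sided 5% critical value, which is an assumption of this sketch rather than a threshold stated in the text.

```python
# (coefficient, standard error, reported t-statistic) as printed in Table 6.
table6 = {
    "linear_equity_cost":     (0.073, 0.186, 0.392),
    "quadratic_equity_cost":  (1.730, 2.315, 0.747),
    "shock_volatility":       (1.056, 0.470, 2.246),
    "serial_correlation":     (0.757, 0.423, 1.789),
    "revenue_curvature":      (1.154, 0.387, 2.983),
    "fixed_adjustment_cost":  (1.677, 2.353, 0.713),
    "convex_adjustment_cost": (0.664, 0.180, 3.696),
    "agency_cost":            (0.935, 0.272, 3.443),
    "debt_ceiling":           (0.589, 0.219, 2.691),
}

def t_stat(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error

# Largest gap between recomputed and reported t-statistics; small gaps
# are expected because the table reports rounded inputs.
max_gap = max(abs(t_stat(c, se) - t) for c, se, t in table6.values())

# Parameters whose recomputed |t| exceeds the assumed 1.96 cutoff.
significant = [name for name, (c, se, _) in table6.items()
               if abs(t_stat(c, se)) > 1.96]

print(f"max gap: {max_gap:.4f}")
print("significant at the assumed cutoff:", significant)
```

The recomputed ratios agree with the reported t-statistics to within rounding, and five of the nine parameters clear the assumed cutoff, matching the five described in the text as statistically significant.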
6.2. The model benchmarked against extant tradeoff models

Table 7 indicates that our model does a better job explaining debt issuances and repayments over 1988–2001 than do extant tradeoff models in which investment policy is exogenous and firms rebalance leverage subject to adjustment costs. We arrive at this inference by first noting that all such tradeoff models share the following two testable predictions about debt issuance and repayment decisions. First, firms with leverage currently above target will never issue debt and have their resultant leverage ratios remain above target, since such issuances unnecessarily move leverage away from target. By symmetric reasoning, extant tradeoff models also predict that firms with leverage currently below target will never pay down debt and have their resultant leverage ratios remain below target. Therefore, when beginning-of-period leverage exceeds target, the tradeoff model error rate in Table 7 is the fraction of cases in which firms both issue debt and have end-of-period leverage above target. Similarly, when beginning-of-period leverage is below target, the model error rate is the fraction of cases in which firms both repay debt and have end-of-period leverage below target.

The error rates for our model (labeled DDW in Table 7) are calculated analogously, and equal the absolute value of (i) the actual frequency of debt issuances or repayments minus (ii) the corresponding frequency predicted by the model. Predicted issuance or repayment frequencies are determined as in Table 5 (rows 10, 11, 25, and 26), except that now we generate different predictions for each of the 40 two-digit SIC industries whose SMM estimations are described above. (We take industry average leverage as a proxy for the leverage target under the tradeoff model of all firms in the industry in question, whereas the target for our model is determined as in Section 2.3, with all firms in a given industry assumed to have the same target.)

Table 7
Model error rates. Each model error rate equals the absolute value of the difference between the actual frequency of debt issuances (or repayments) over 1988–2001 and the frequency predicted by the model in question. For the current model (labeled DDW), predicted frequencies of issuance or repayment are based on 40 distinct SMM model estimation exercises, one for each of 40 two-digit SIC code industries, as described in the text, and with each target determined as in Section 2.3. The tradeoff model refers to the class of either static or dynamic leverage rebalancing models in which (i) investment policy is exogenous, and (ii) firms potentially face costs of capital structure adjustment. For the tradeoff model, we take industry average leverage as the empirical proxy for target leverage. When leverage is above or at target, the tradeoff model error rate equals the empirical incidence of debt issuances that result in leverage above target (a result that is ruled out by rebalancing theories). Similarly, when leverage is below target, the tradeoff model error rate equals the incidence of debt repayments that result in leverage below target. The mean (median) error rate is the average (median) across the 40 industries of the mean within-industry error rates (n = 40 industries). The Z-statistic is for a test of the null hypothesis that the mean model error rate for the tradeoff model is greater than the mean model error rate for the DDW model.

| Leverage | Investment | Mean error rate: Tradeoff | Mean error rate: DDW | Median error rate: Tradeoff | Median error rate: DDW | % of industries with DDW error rate < tradeoff error rate | Z-statistic |
|---|---|---|---|---|---|---|---|
| Above or at target | Unconditional | 0.441 | 0.078 | 0.440 | 0.074 | 100.0% | 22.258 |
| Above or at target | Low | 0.349 | 0.282 | 0.355 | 0.292 | 85.0% | 2.682 |
| Above or at target | Medium | 0.424 | 0.240 | 0.429 | 0.236 | 100.0% | 6.858 |
| Above or at target | High | 0.588 | 0.325 | 0.572 | 0.324 | 97.5% | 8.942 |
| Below target | Unconditional | 0.330 | 0.062 | 0.338 | 0.047 | 100.0% | 14.358 |
| Below target | Low | 0.365 | 0.277 | 0.382 | 0.298 | 62.5% | 2.209 |
| Below target | Medium | 0.367 | 0.120 | 0.377 | 0.109 | 95.0% | 9.055 |
| Below target | High | 0.279 | 0.287 | 0.285 | 0.282 | 40.0% | 0.346 |
We follow the structure of Table 5 and report model error rates for the full sample and for subsamples conditioned on low, medium, and high investment. The mean (median) error rate in Table 7 is the average (median) across the 40 industries of the mean within-industry error rates. The Z-statistic is for a test of the null hypothesis that the mean of the error rates for the 40 industries under the tradeoff model equals the mean of the absolute value of the error rates under the DDW model. The table also reports the percent of industries for which the value of the mean error rate for the DDW model is less than the corresponding error rate for the tradeoff model.

For all 40 industries (100.0% of the sample), Table 7 indicates that the error rate for the DDW model is less than the error rate for the tradeoff model, both when leverage is currently above (or at) target, and when it is below target. When leverage is above target, the mean DDW model error rate is 0.078, which is roughly one-sixth the corresponding mean error rate of 0.441 for the tradeoff model, and the difference between the mean error rates is highly significant (Z-statistic = 22.258). The median error rates (0.074 and 0.440) indicate a similarly large differential in predictive power in favor of the DDW model, as does the compilation in Table 7 of the two models’ industry error rate statistics conditioned on leverage currently falling below target.

When leverage is currently above target, the DDW model has a lower absolute error rate than the tradeoff model for 97.5%, or 39 of the 40 industries when investment is high, and for 100.0% and 85.0% of the industries, respectively, when investment falls in the medium and low groups. In all cases, the mean error rate for the DDW model is significantly less than that for the tradeoff model, with Z-statistics ranging from 8.942 for high investment to 2.682 for low investment. When leverage is currently below target, the DDW model retains an edge over the tradeoff model, but the difference is narrower, with 62.5% and 95.0% of industries showing smaller error rates under our model at low and medium investment levels. The tradeoff model has a smaller error rate in 60.0% of the industries when investment is high and leverage is below target, but the difference in mean error rates is not significant (Z-statistic = 0.346).

When leverage is above target and investment is high, the tradeoff model exhibits especially large mean and median error rates of 0.588 and 0.572 versus 0.325 and 0.324 for the DDW model (per Table 7). The large error rates of the tradeoff model reflect its strong tendency to under-predict debt issuances to fund investment, which arises because all such issuances are ruled out by the model’s assumption that investment is exogenous. The smaller error rates of the DDW model arise because investment is determined endogenously with capital structure, and because the model assigns a central role to firms’ incentive to issue transitory debt to fund investment.
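The two error-rate definitions used in Table 7 can be sketched as follows. The observations and the target of 0.25 are hypothetical and serve only to illustrate the bookkeeping; the tuple layout and function names are our own. The predicted issuance frequency of 0.406 is borrowed from row 10 of Table 5 (unconditional column) purely as an illustrative input, whereas Table 7 itself uses industry-specific predictions.

```python
def tradeoff_error_rate(observations, target, above):
    """Share of observations that contradict the tradeoff prediction.

    Each observation is (begin_leverage, end_leverage, net_debt_issuance).
    above=True:  firms starting at or above target that issue debt and end above target.
    above=False: firms starting below target that repay debt and end below target.
    """
    if above:
        group = [o for o in observations if o[0] >= target]
        errors = [o for o in group if o[2] > 0 and o[1] > target]
    else:
        group = [o for o in observations if o[0] < target]
        errors = [o for o in group if o[2] < 0 and o[1] < target]
    return len(errors) / len(group)

def ddw_error_rate(actual_frequency, predicted_frequency):
    """Absolute gap between the actual and model-predicted frequency."""
    return abs(actual_frequency - predicted_frequency)

# Hypothetical firm-year observations (illustrative only).
target = 0.25
observations = [
    (0.30, 0.35, 0.05),   # above target, issues debt, stays above: tradeoff error
    (0.30, 0.28, -0.02),  # above target, repays debt
    (0.40, 0.45, 0.04),   # above target, issues debt, stays above: tradeoff error
    (0.26, 0.24, -0.03),  # above target, repays debt
    (0.10, 0.08, -0.02),  # below target, repays debt, stays below: tradeoff error
    (0.20, 0.30, 0.10),   # below target, issues debt
]

err_above = tradeoff_error_rate(observations, target, above=True)
err_below = tradeoff_error_rate(observations, target, above=False)

# Actual issuance frequency among firms starting at or above target.
above_group = [o for o in observations if o[0] >= target]
actual_issue_freq = sum(o[2] > 0 for o in above_group) / len(above_group)
err_ddw = ddw_error_rate(actual_issue_freq, 0.406)

print(err_above, err_below, round(err_ddw, 3))
```

The sketch makes the asymmetry between the two measures visible: the tradeoff model predicts a frequency of zero for the ruled-out events, so every such event counts as an error, while the DDW error depends only on how far its predicted frequency sits from the observed one.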
6.3. Debt issuances and investment “spikes”

We next focus on large investment “spikes” to test the predictions about the connection between investment and leverage decisions that are delineated in Panel B of Table 5. We study investment spikes because the data inherently contain a substantial amount of noise, hence there is greater ability to detect any material capital structure changes when focusing on firms at times that they make large investment outlays. Our main findings are reported in Table 8, which analyzes debt issuance and other financing decisions associated with investment spikes by Compustat industrial firms over 1962–2008. We define an investment spike as an annual capital expenditure outlay (divided by beginning-of-year total assets) that is two or more standard deviations above the mean for the firm’s two-digit SIC code industry, with all smaller investment outlays defined as non-spikes. As a check, we also use several other thresholds (as in Whited, 2006, Table 2) to define investment spikes and find results similar to those in Table 8 (details not reported).

The sample contains 3,756 spikes and 81,437 non-spikes. The mean investment spike is 11.4% of assets versus 7.1% for non-spikes, and the respective medians are 8.1% and 5.4% (row 1 of Table 8). The typical spike entails a mean increase of 5.6% in the ratio of investment to total assets, while the average non-spike corresponds to an increase in that ratio of only 0.2% (row 2). Investment spikes are associated with large debt issuances that average 12.7% of assets (row 3 of Table 8), versus an average issuance of only 2.0% in the prior year (row 4) and 1.8% in the year after the spike (row 5), with medians showing a comparably large increase in debt issuance in the year of the spike. In contrast, debt issuances average only 0.8%, 1.5%, and 1.0% in the years around non-spikes (see rows 3, 4, and 5 for the non-spike column). The 12.7% average debt issuance (row 3) is far greater than the 1.4% drawdown of cash balances in the year of the spike (row 8) and the 3.4% equity issuance in that year (row 11). The average change in cash and issuance of equity are closer to zero in non-spike years (rows 8 and 11). In the year before an investment spike, firms increase cash balances by an average of 2.4% (row 9) and issue equity equal to 2.9% of assets (row 12), both of which are swamped by the size of the average debt issuance in the year of the spike (row 3).

Table 8
Debt issuance and investment spikes. The table reports mean (median) values of annual investment, debt issuance, and other financing variables for Compustat industrial firms over 1962–2008. An investment spike is defined as an investment outlay (variable 1) that is two or more standard deviations above the mean for the firm’s two-digit SIC code industry. Non-spikes are investment outlays that are less than two standard deviations from the industry mean. Variables 1–4, 7, 8, 10, and 11 are standardized by total assets at the beginning of the year. Variable 13 is calculated using the “speed of adjustment toward target” definition for item 3 of Table 5, i.e., excluding observations that do not move leverage toward target. The partitioning into above- and below-average leverage ratios is based on the average of all observations over 1962–2008 for the firm’s two-digit industry. Similar results obtain when we restrict attention to observations for 1988–2001 and use the target leverage estimates from our industry SMM analyses and partition the sample into above and below target based on the SMM results.

| | Full sample: Spike | Full sample: Non-spike | Above average leverage: Spike | Above average leverage: Non-spike | Below average leverage: Spike | Below average leverage: Non-spike |
|---|---|---|---|---|---|---|
| 1. Investment | 0.114 (0.081) | 0.071 (0.054) | 0.116 (0.077) | 0.069 (0.050) | 0.113 (0.083) | 0.072 (0.057) |
| 2. Change in investment | 0.056 (0.031) | 0.002 (0.002) | 0.053 (0.027) | 0.003 (0.000) | 0.058 (0.033) | 0.007 (0.004) |
| 3. Debt issuance | 0.127 (0.112) | 0.008 (0.000) | 0.128 (0.123) | 0.008 (−0.003) | 0.127 (0.107) | 0.021 (0.000) |
| 4. Lagged debt issuance | 0.020 (0.000) | 0.015 (0.000) | 0.066 (0.051) | 0.040 (0.027) | 0.005 (0.000) | 0.006 (0.000) |
| 5. Lead debt issuance | 0.018 (0.002) | 0.010 (0.000) | 0.015 (0.008) | 0.004 (−0.002) | 0.019 (0.000) | 0.022 (0.000) |
| 6. Beginning-of-year debt/assets | 0.203 (0.185) | 0.248 (0.233) | 0.380 (0.345) | 0.398 (0.366) | 0.108 (0.103) | 0.116 (0.116) |
| 7. End-of-year debt/assets | 0.283 (0.277) | 0.246 (0.229) | 0.416 (0.403) | 0.379 (0.355) | 0.211 (0.212) | 0.129 (0.119) |
| 8. Change in cash | −0.014 (−0.002) | 0.008 (0.002) | 0.001 (0.003) | 0.005 (0.001) | −0.020 (−0.009) | 0.010 (0.004) |
| 9. Lagged change in cash | 0.024 (0.008) | 0.006 (0.002) | 0.015 (0.004) | 0.002 (0.001) | 0.028 (0.012) | 0.010 (0.004) |
| 10. Beginning-of-year cash/assets | 0.141 (0.086) | 0.105 (0.059) | 0.086 (0.048) | 0.063 (0.038) | 0.171 (0.117) | 0.142 (0.095) |
| 11. Equity issuance | 0.034 (0.002) | 0.017 (0.001) | 0.039 (0.002) | 0.018 (0.000) | 0.031 (0.002) | 0.016 (0.001) |
| 12. Lagged equity issuance | 0.029 (0.002) | 0.018 (0.000) | 0.020 (0.001) | 0.014 (0.000) | 0.033 (0.002) | 0.021 (0.001) |
| 13. Speed of adjustment | 0.234 | 0.070 | −0.295 | 0.113 | 0.518 | 0.032 |
| 14. Number of observations | 3,756 | 81,437 | 1,314 | 37,978 | 2,442 | 43,459 |
The overall effect is that the average debt-to-assets ratio increases from 20.3% right before an investment spike to 28.3% at the end of the spike year (rows 6 and 7), while non-spike years begin with an average leverage ratio of 24.8% and end with a virtually identical 24.6% ratio. Investment spikes are associated with comparably large average debt issuances of 12.8% when leverage is currently above average (see row 3 of the third column of Table 8), and these large issuances now come on the heels of a nontrivial average debt issuance of 6.6% in the immediately prior year (row 4). Cash changes and equity issuances remain modest in the year of and year preceding the investment spike (rows 8, 9, 11, and 12), and so debt issuances continue to be much larger than these alternative sources of capital. In terms of the net impact on leverage, firms with investment spikes and above-average leverage increase the debt-to-assets ratio from an average of 38.0% at the beginning of the spike year to 41.6% at the end of that year. Overall then, even when leverage is above average—and therefore typically above target leverage for the tradeoff model—investment spikes are accompanied by increases in leverage. The explanation offered by our model is that the benefit of issuing debt to fund the current period investment spike typically overrides the advantages of immediately rebalancing leverage in the direction of target.
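The investment-spike classification behind Table 8 can be sketched in a few lines. The industry label and the investment rates below are hypothetical, and the function names are our own. Using the sample standard deviation (rather than the population version) and computing it over all observations in the industry are assumptions of this sketch, since the text specifies only the threshold of two standard deviations above the industry mean.

```python
from statistics import mean, stdev

def investment_rate(capex, assets_begin):
    """Capital expenditure scaled by beginning-of-year total assets."""
    return capex / assets_begin

def flag_spikes(rates_by_industry, num_sd=2.0):
    """Flag observations at least num_sd standard deviations above the industry mean."""
    flags = {}
    for industry, rates in rates_by_industry.items():
        threshold = mean(rates) + num_sd * stdev(rates)  # assumption: sample std. dev.
        flags[industry] = [rate >= threshold for rate in rates]
    return flags

# Hypothetical investment rates for one two-digit industry (illustrative only).
rates_by_industry = {
    "industry_A": [0.05, 0.06, 0.05, 0.07, 0.06, 0.05, 0.06, 0.07, 0.05, 0.30],
}

flags = flag_spikes(rates_by_industry)
print(flags["industry_A"])
```

Only the final observation clears the threshold in this example. With a fixed multiple of the standard deviation, a single very large outlay also raises the threshold itself, which is one reason robustness checks with alternative thresholds, as mentioned in the text, are useful.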
The findings in Table 8 are close in spirit to those of four prior studies that document that debt issuances commonly move firms away from their leverage targets. Hovakimian (2004) finds that “…debt issues do not reduce the deviation from the target debt ratio. The pre-debt-issue deviation from the target is essentially zero. The issuance of debt increases rather than reduces the deviation from the target.” In Hovakimian, Opler, and Titman’s (2001, Table 4) sample, the average long-term debt issue is 17.4% of total assets, and is undertaken when the firm’s debt-to-assets ratio is 1.3% below the authors’ estimate of target. Harford, Klasa, and Walcott (2008) report that, in debt-financed acquisitions, bidding firms typically move away from their target capital structures, and then rebalance back toward target with a lag. Denis and McKeon (2009) document 2,513 cases (over 1971–1999) in which 2,272 firms substantially increase their total debt at a time when their debt ratios are at least 10% above estimated target leverage.

Our Table 8 findings are consistent with those of Mayer and Sussman (2004) for large firms, but differ from theirs on small firms. Mayer and Sussman study 535 investment events, each of five-year duration with an investment spike in the middle, and find that large firms tend to issue debt to fund large investments, while small firms tend to issue equity. In untabulated sensitivity checks, we find results very close to those in our Table 8 for each of nine subsamples, which we form by partitioning our full sample of investment spikes into firms with high, medium, and low (i) values of total assets, (ii) ratios of the market-to-book value of equity, and (iii) ratios of the market-to-book value of total assets. These sensitivity checks indicate that the debt issuance and leverage change inferences drawn from the Table 8 findings continue to hold even when we focus on small firms and on companies with quintessential growth-firm attributes.
Finally, Table 8 highlights the importance of investment outlays in explaining why empirical studies have tended to find more sluggish rebalancing to target than one would expect if extant tradeoff models were empirically descriptive. For firms with investment spikes, the average speed of adjustment (SOA) to target is 0.234, or about four years to reach target, where target is proxied by industry average leverage (row 13). When leverage is below average, the average SOA is twice as fast at 0.518, and the reason is that the attraction of borrowing to fund investment strongly propels the firm to increase leverage and move toward target. On the other hand, when leverage is above target, the SOA is actually negative, i.e., firms typically take on debt and move away from target at a moderate rate (-0.295). Because extant tradeoff models leave no room for transitory debt issuances to fund investment, the latter cases enter into and dampen the SOA estimates, and give the impression that firms have little motive to rebalance leverage. The explanation offered by our model is that the dominant incentive is for firms to fund immediate investment outlays in a cost-effective manner, which often means by issuing transitory debt and deviating from target. Rebalancing toward target occurs most prominently when current period investment incentives are supportive of such leverage changes, and otherwise leverage rebalancing grinds away slowly as dictated by cash flow realizations and the impediments associated with equity issuance costs.

7. Model robustness

Fig. 5. Comparative statics in models with issuance costs, simultaneous debt and cash balances, collateral constraints, and endogenous default. Each panel depicts average leverage as a function of one of the model parameters: linear equity issuance costs, shock standard deviation, shock serial correlation, profit function curvature, and the convexity of capital stock adjustment costs. Each line in a panel depicts the relation from a particular version of the baseline model: one with issuance costs, one with simultaneous positive amounts of debt outstanding and cash balances (labeled "extra state variable"), one with a collateral constraint, and one with endogenous default.
[Figure omitted: five panels with the debt-to-assets ratio on the vertical axis and, on the horizontal axes, equity issuance cost, serial correlation, standard deviation, profit function curvature, and adjustment costs (convex to fixed).]

To assuage concerns that our results are artifacts of the model's simplicity, in this section we add several more realistic features to the model to examine whether leverage responds to our model parameters in a qualitatively similar way. We examine four extensions to the model, one at a time. First, we add debt issuance costs of 4%. Second, we add an extra state variable that allows the firm to hold cash and debt at the same time. This model contains a small issuance cost of ten basis points to ensure that optimal behavior entails the simultaneous holding of cash and debt. Third, following exactly Hennessy and Whited (2005), we add a collateral constraint on debt
financing and financial distress in the form of a fire-sale discount of 40% on capital that must be sold when profits are insufficient to pay off debt. Fourth, we allow for endogenous default and deadweight costs of default. This model is described in Appendix B. For each of these four cases, we examine how leverage responds to changes in the linear cost of issuing equity, $\lambda_1$, the serial correlation of productivity shocks, $\rho$, the standard deviation of productivity shocks, $\sigma_v$, and the curvature of the production function, $\theta$. We also perform an additional experiment in which we simultaneously increase the fixed cost, $\gamma$, and decrease the convex cost, $a$, of adjusting the capital stock. The results from these comparative statics exercises appear in Fig. 5. In the first panel, we plot the relation between average leverage and linear equity issuance costs, $\lambda_1$, for the baseline model estimates from Table 1, as well as for the four model variants described above. We allow $\lambda_1$ to vary from near 0.0 to 0.3, which is roughly double its baseline estimate of 0.1615. In all four model variants, we find the same negative relation between equity issuance costs and leverage as in the baseline model. Although the patterns from the model with separate cash holding and with debt issuance costs are almost identical to the pattern from the baseline model, the negative relation between equity issuance costs and leverage is attenuated in the models that incorporate financial distress because both equity issuance costs and financial distress work to depress leverage. This same general pattern appears in the second and third panels, which depict the relations between leverage and the serial correlation of productivity shocks, $\rho$, and the standard deviation of productivity shocks, $\sigma_v$. We allow $\rho$ to vary from 0.1 to 0.9 and $\sigma_v$ to vary from 0.15 to 0.5.
In all five models, leverage decreases with both $\rho$ and $\sigma_v$, and the models with a collateral constraint and endogenous default generate slightly weaker relations. The fourth panel shows the relation between leverage and profit function curvature, $\theta$, in which we let $\theta$ range from 0.5 to 0.9. In this case leverage falls sharply with profitability in all five models. Finally, the fifth panel shows the relation between leverage and the nature of physical adjustment costs. To generate this plot, we allow the fixed cost of adjustment to vary from 0.0 to 0.04, while the convex cost varies from 0.3 to 0.0. Once again, leverage falls in all five models as adjustment costs become more fixed in nature and investment therefore optimally becomes more lumpy. In sum, our original simple model with a fixed debt capacity generates qualitatively the same comparative statics as do more complicated models. The advantage of the simple model is its ability to highlight the role of preserving debt capacity in a dynamic setting. In contrast, the additional features, such as financial distress, in these more complicated models sometimes muddy but never erase the tradeoff between utilizing debt capacity today and preserving it for future usage. We therefore view the results from our original simple model as broadly representative of the results from a much broader class of dynamic models.
8. Summary and conclusions

We develop and estimate a dynamic capital structure model in which debt serves as a transitory financing vehicle that enables firms to meet funding needs associated with imperfectly anticipated investment shocks, while allowing them to economize on the costs of issuing equity and of maintaining cash balances. Firms that issue debt incur no flotation or other direct issuance costs, but nonetheless face an economically meaningful opportunity cost of borrowing, since a firm's decision to issue debt in a given period reduces the debt capacity available to meet its future funding needs or, more generally, reduces the firm's future ability to borrow at the terms it currently faces. The firm's ex ante optimum debt level reflects the value of the option to use its debt capacity to borrow ex post and deliberately, but temporarily, move away from target to fund investment. The opportunity cost of borrowing, and the resultant transitory role of debt in capital structures, radically alters the nature of predicted leverage dynamics from those of other tradeoff models in which firms have leverage targets, but all proactive financing decisions move firms toward target. Our emphasis is squarely on the role of transitory debt, a concept that plays no role in extant tradeoff theories in which firms have leverage targets because those theories ignore the interplay among target leverage, leverage dynamics, and firms' desire to raise capital to meet the intertemporal sequence of funding needs that arise from investment shocks. Because in our model firms issue transitory debt to finance investment outlays, the time path of deviations from and rebalancing to target is shaped both by the nature of prospective investment opportunities and by the precise sequence of shock realizations from the firm's stochastic investment opportunity set.
Our approach yields a variety of new testable predictions that link capital structure decisions to variation in the volatility and serial correlation of investment shocks, the marginal profitability of investment, and properties of capital stock adjustment costs. The model offers plausible explanations for otherwise puzzling aspects of observed capital structure decisions, including why firms often choose to deviate from their leverage targets and why empirical studies find such slow average speeds of adjustment to target. And our evidence indicates that the model replicates industry leverage very well, that it explains firms' debt issuance/repayment decisions better than extant tradeoff models of capital structure, and that it can account for the leverage changes that accompany investment "spikes."

Appendix A

This appendix discusses the numerical procedure, the data, and the estimation procedure.

A.1. Model solution

To find a numerical solution, we need to specify a finite state space for the three state variables. We let the capital stock lie on the points

$$\left[\bar{k}(1-\delta)^{35}, \ldots, \bar{k}(1-\delta)^{1/2}, \bar{k}\right]. \tag{12}$$

We let the productivity shock $z$ have 19 points of support, transforming (1) into a discrete-state Markov chain on the interval $[-4\sigma_v, 4\sigma_v]$ using the method in Tauchen (1986). We let $p$ have 29 equally spaced points in the interval $[-\bar{p}/2, \bar{p}]$, in which $\bar{p}$ is a parameter to be estimated. The optimal choice of $p$ never hits the lower endpoint, although it occasionally hits the upper endpoint when the firm finds it optimal to exhaust its debt capacity. For our estimated value of $\bar{p}$, equity value, $V$, is always strictly positive in all states of the world. We solve the model via iteration on the Bellman equation, which produces the value function $V(k, p, z)$ and the policy function $\{k', p'\} = u(k, p, z)$. In the subsequent model simulation, the space for $z$ is expanded to include 152 points, with interpolation used to find corresponding values of $V$, $k$, and $p$. The model simulation proceeds by taking a random draw from the distribution of $z'$ (conditional on $z$), and then computing $V(k, p, z)$ and $u(k, p, z)$. We use these computations to generate an artificial panel of firms.

A.2. Data

We obtain data on U.S. nonfinancial firms from the 2007 Standard and Poor's Compustat industrial files. These data constitute an unbalanced panel that covers 1988 to 2001. As in Hennessy and Whited (2005), we choose this sample period because the tax code during this period contains no large structural breaks. To select the sample, we delete firm-year observations with missing data and for which total assets, the gross capital stock, or sales are either zero or negative. Then for each firm we select the longest consecutive time series of data and exclude firms with only one observation. Finally, we omit all firms whose primary SIC code is between 4900 and 4999, between 6000 and 6999, or greater than 9000, because our model is inappropriate for regulated, financial, or quasi-public firms. We end up with between 3,066 and 5,036 observations per year, for a total of 53,677 firm-year observations.

A.3. Estimation

We now give a brief outline of the estimation procedure, which closely follows Ingram and Lee (1991). Let $x_i$ be an i.i.d. data vector, $i = 1, \ldots, n$, and let $y_{ik}(b)$ be an i.i.d. simulated vector from simulation $k$, $i = 1, \ldots, n$, and $k = 1, \ldots, K$. Here, $n$ is the length of the simulated sample, and $K$ is the number of times the model is simulated. We pick $n = 53{,}677$ and $K = 10$, following Michaelides and Ng (2000), who find that good finite-sample performance of a simulation estimator requires a simulated sample that is approximately ten times as large as the actual data sample. The simulated data vector, $y_{ik}(b)$, depends on a vector of structural parameters, $b$. In our application, $b \equiv (\theta, \rho, \sigma_v, a, \gamma, s, \lambda_1, \lambda_2)$. Three parameters we do not estimate are the depreciation rate, $\delta$, the real interest
rate, $r$, and the effective corporate tax rate, $\tau_c$. We set $\delta$ at 0.15, which is approximately equal to the average in our data set of the ratio of depreciation to the capital stock. We set the real interest rate equal to 0.015, which is approximately equal to the average of the realized real interest rate over the twentieth century. We set $\tau_c$ at the statutory rate of 0.35. The goal is to estimate $b$ by matching a set of simulated moments, denoted as $h(y_{ik}(b))$, with the corresponding set of actual data moments, denoted as $h(x_i)$. The candidates for the moments to be matched include simple summary statistics, Ordinary Least Squares (OLS) regression coefficients, and coefficient estimates from non-linear reduced-form models. Define

$$g_n(b) = n^{-1} \sum_{i=1}^{n} \left[ h(x_i) - K^{-1} \sum_{k=1}^{K} h(y_{ik}(b)) \right].$$

The simulated moments estimator of $b$ is then defined as the solution to the minimization of

$$\hat{b} = \arg\min_{b} \; g_n(b)' \, \hat{W}_n \, g_n(b),$$

in which $\hat{W}_n$ is a positive definite matrix that converges in probability to a deterministic positive definite matrix $W$. In our application, we use the inverse of the sample covariance matrix of the moments, which we calculate using the influence-function approach in Erickson and Whited (2000). The simulated moments estimator is asymptotically normal for fixed $K$. The asymptotic distribution of $\hat{b}$ is given by

$$\sqrt{n}\,(\hat{b} - b) \xrightarrow{d} N\!\left(0, \operatorname{avar}(\hat{b})\right),$$

in which

$$\operatorname{avar}(\hat{b}) \equiv \left(1 + \frac{1}{K}\right) \left[ \frac{\partial g_n(b)}{\partial b} W \frac{\partial g_n(b)}{\partial b'} \right]^{-1} \left[ \frac{\partial g_n(b)}{\partial b} W \Omega W \frac{\partial g_n(b)}{\partial b'} \right] \left[ \frac{\partial g_n(b)}{\partial b} W \frac{\partial g_n(b)}{\partial b'} \right]^{-1}, \tag{13}$$

in which $W$ is the probability limit of $\hat{W}_n$ as $n \to \infty$, and in which $\Omega$ is the probability limit of a consistent estimator of the covariance matrix of $h(x_i)$. We set $W \equiv \Omega^{-1}$.

The success of this procedure relies on picking moments $h$ that can identify the structural parameters $b$. In other words, the model must be identified. Global identification of a simulated moments estimator obtains when the expected value of the difference between the simulated moments and the data moments equals zero if and only if the structural parameters equal their true values. A sufficient condition for identification is a one-to-one mapping between the structural parameters and a subset of the data moments of the same dimension. Although our model does not yield such a closed-form mapping, we take care in choosing appropriate moments to match, and we use a minimization algorithm, simulated annealing, that avoids local minima.

We pick the following 12 moments to match. Because the firm's real and financial decisions are intertwined, all of the model parameters affect all of these moments in
some way. We can, nonetheless, categorize the moments roughly as representing the real or financial side of the firm's decision-making problem. The first of the nonfinancial or "real" moments are the first and second moments of the rate of investment, defined in the simulation as $I/k$, and defined in Compustat as the sum of items 128 and 129 divided by item 7.¹² Average investment helps identify the adjustment cost parameters, $a$ and $\gamma$, because smooth investment tends to be less skewed than lumpy investment. Therefore, the mean is lower because it tends to lie nearer the median than the upper percentiles of the distribution of investment. The variance helps identify both the curvature of the profit function, $\theta$, and the adjustment cost parameters. Lower $\theta$, higher $a$, and lower $\gamma$ produce less volatile investment. The next moment is the skewness of the rate of investment, which helps identify the fixed adjustment cost parameter, $\gamma$. Higher values of this parameter lead to more intermittent, and thus more skewed investment. The next moment, average operating income, is primarily affected by the curvature of the profit function. This relation can be seen by the definition of simulated operating income as $zk^{\theta}/k$: the higher $\theta$, the higher average operating income. Our next two moments capture the important features of the driving process for $z$. Here, we estimate a first-order panel autoregression of operating income on lagged operating income, in which actual operating income is defined as the ratio of Compustat items 13 and 6. The two moments that we match from this exercise are the autoregressive coefficient and the shock variance. Our next moment is the mean of Tobin's $q$. Simulated Tobin's $q$ is constructed as $(V + p)/k$ and actual Tobin's $q$ is constructed following Erickson and Whited (2000). All model parameters affect the mean of $q$.

The remaining moments pertain to the firm's financing decisions. The first two are the mean and second moment of the ratio of debt to assets. In our simulation, debt is defined as $d/k$, and in Compustat, this variable is defined as items 9 plus 34, all divided by item 6. All of the parameters in the model affect these two moments. The next two moments are average equity issuance and the variance of equity issuance. In the model, equity issuance is defined as $e/k$ and in Compustat, it is defined as the ratio of items 108 and 6. These two moments help identify the two equity adjustment cost parameters, $\lambda_1$ and $\lambda_2$. Our final moment is the ratio of cash to assets. In our simulations it is defined as $c/k$, conditional on $c > 0$, and in Compustat, it is defined as the ratio of item 1 to item 6. This moment helps identify the agency cost parameter.

Because our moment vector consists of separately estimated regression coefficients and first through third moments, we use the influence-function approach in Erickson and Whited (2000) to calculate the covariance matrix of the moment vector. Specifically, we stack the influence functions for each moment and then form the covariance matrix by taking the inner product of this stack.

¹² We define investment this way because our model allows for the optimality of lumpy investment. Therefore, we can allow for a much more general definition of investment than that in Hennessy and Whited (2005, 2007).
One final issue is unobserved heterogeneity in our data from Compustat. Recall that our simulations produce i.i.d. firms. Therefore, in order to render our simulated data comparable to our actual data we can either add heterogeneity to the simulations, or remove the heterogeneity from the actual data. We opt for the latter approach, using fixed firm and year effects in the estimation of our regression-based data moments and our estimates of variance and skewness.
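The estimation logic in A.3 can be illustrated on a toy problem. The sketch below is not the estimation code used for the paper: it assumes a simple AR(1) process in place of the full model, matches only two moments (the autoregressive coefficient and the shock variance), uses an identity weight matrix rather than the inverse covariance matrix of the moments, and replaces simulated annealing with a coarse grid search. All function names and the parameter grid are hypothetical.

```python
import random


def simulate_ar1(rho, sigma, shocks):
    """Simulate z' = rho * z + sigma * eps from z = 0 using pre-drawn N(0, 1) shocks."""
    z, path = 0.0, []
    for eps in shocks:
        z = rho * z + sigma * eps
        path.append(z)
    return path


def moments(path):
    """Moment vector h: (no-intercept OLS AR(1) coefficient, residual variance)."""
    lagged, current = path[:-1], path[1:]
    coef = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    resid_var = sum((b - coef * a) ** 2 for a, b in zip(lagged, current)) / len(lagged)
    return coef, resid_var


def smm_objective(params, data_moments, shock_sets):
    """g' W g with W = identity (an assumption made only to keep the sketch short)."""
    rho, sigma = params
    sims = [moments(simulate_ar1(rho, sigma, s)) for s in shock_sets]
    # g = data moments minus the average of the K simulated moment vectors.
    g = [data_moments[j] - sum(m[j] for m in sims) / len(sims) for j in range(2)]
    return sum(gj * gj for gj in g)


def estimate(data, K=10, seed=0):
    """Minimize the objective over a coarse grid (a stand-in for simulated annealing)."""
    rng = random.Random(seed)
    # Draw the K shock sequences once and reuse them for every candidate parameter.
    shock_sets = [[rng.gauss(0.0, 1.0) for _ in data] for _ in range(K)]
    target = moments(data)
    grid = [(i / 10, j / 10) for i in range(10) for j in range(1, 11)]
    return min(grid, key=lambda p: smm_objective(p, target, shock_sets))


# Hypothetical "actual" data generated with rho = 0.6 and sigma = 0.3.
data_rng = random.Random(1)
data = simulate_ar1(0.6, 0.3, [data_rng.gauss(0.0, 1.0) for _ in range(1000)])
rho_hat, sigma_hat = estimate(data)
```

Holding the simulated shocks fixed across candidate parameters keeps the objective a deterministic function of the parameters, which is what lets any minimizer compare candidates meaningfully.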
Appendix B

The model that includes endogenous default replaces the upper bound on leverage, $\bar{p}$, with the following mechanism, which is similar to that in Hennessy and Whited (2007) and Cooley and Quadrini (2001), except that physical adjustment costs prevent the firm from costlessly transforming capital into liquid assets. The presence of physical adjustment costs complicates slightly what happens to the firm when it defaults, that is, when equity value reaches zero. The endogenous default schedule is then defined implicitly by the equation $V(k, p, z) = 0$. In the event of default, debtholders seize the firm's profits and almost all of its capital stock, less any applicable adjustment costs and less a fraction, $\xi$, of the capital stock that can be thought of as deadweight default costs. Because physical adjustment costs are a function of the rate of investment, they are not well-defined for a firm with a zero capital stock. We therefore leave the firm with the smallest capital stock in the discrete grid described by (12), $\underline{k}$, and require the firm to pay the amount $(1-\xi)(1-\delta)\underline{k}$ in cash to the debtholders. The debtholders' recovery in default ($R$) is equal to

$$R(k', z') = (1-\xi)(1-\delta)(k' - \underline{k}) + (1-\tau_c)\left(z'\pi(k') - \delta k'\right) - A(k', \underline{k}) + (1-\xi)(1-\delta)\underline{k} \tag{14}$$

$$\phantom{R(k', z')} = (1-\xi)(1-\delta)k' + (1-\tau_c)\left(z'\pi(k') - \delta k'\right) - A(k', \underline{k}). \tag{15}$$
As an approximation to the U.S. tax code, this formulation of the debtholders' recovery assumes that in the event of default, interest deductions on the debt obligation are disallowed. The interest rate on debt, $r_d$, is determined endogenously via a zero-profit condition for the debtholders. Let $Z_d(k', p', z)$ be the set of states in which the firm defaults, as a function of $k'$, $p'$, and the current state $z$. Similarly, let $Z_s(k', p', z)$ be the set of states in which the firm remains solvent. The interest rate, $r_d(k', p', z)$, is then defined by

$$\int_{Z_d(k', p', z)} R(k', z') \, dg(z', z) + \left(1 + r_d(k', p', z)\right) p' \int_{Z_s(k', p', z)} dg(z', z) = (1 + r)\, p'.$$
In words, debtholders expect over all states to earn the risk-free rate. For a proof of the existence of a solution to this class of models, see Hennessy and Whited (2007). In this model debt does not have an arbitrary upper bound, but the higher interest rate charged by debtholders limits the optimal amount of debt chosen by the firm.
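As a rough illustration of the zero-profit condition, the sketch below solves for the interest rate on a discrete state space: expected recovery in default states plus the promised repayment in solvent states must equal the risk-free payoff on the amount lent. It takes the default set as given, which is a simplifying assumption; in the model the default set itself depends on the interest rate, so a step like this would sit inside a fixed-point iteration. The function name and all numbers are hypothetical.

```python
def zero_profit_rate(p_next, r, probs, recoveries, defaults):
    """Solve the debtholders' zero-profit condition for r_d on a discrete state space.

    p_next     : face value of debt p' (must be positive)
    r          : risk-free rate
    probs      : probability of each next-period state
    recoveries : debtholders' recovery R in each state (used only in default states)
    defaults   : True where the firm defaults in that state (taken as given here)
    """
    if p_next <= 0:
        raise ValueError("p_next must be positive")
    expected_recovery = sum(
        pr * rec for pr, rec, d in zip(probs, recoveries, defaults) if d
    )
    prob_solvent = sum(pr for pr, d in zip(probs, defaults) if not d)
    if prob_solvent <= 0:
        raise ValueError("no solvent state: no finite r_d satisfies the condition")
    # Rearranged condition: E[R; default] + (1 + r_d) * p' * P(solvent) = (1 + r) * p'
    return ((1 + r) * p_next - expected_recovery) / (p_next * prob_solvent) - 1


# Hypothetical numbers: debt of 100, risk-free rate of 1.5%, and a 10% chance
# of default in which debtholders recover 60.
r_d = zero_profit_rate(100.0, 0.015, [0.1, 0.9], [60.0, 0.0], [True, False])
print(round(r_d, 4))  # 0.0611
```

With no default states the function returns the risk-free rate, and the required rate rises as default becomes more likely or recovery falls, which is the channel that limits borrowing in this version of the model.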
References

Altinkilic, O., Hansen, R., 2000. Are there economies of scale in underwriting fees? Evidence of rising external financing costs. Review of Financial Studies 13, 191–218.
Bolton, P., Chen, H., Wang, N., 2009. A unified theory of Tobin's q, corporate investment, financing, and risk management. Unpublished working paper. Columbia University.
Brennan, M., Schwartz, E., 1984. Optimal financial policy and firm valuation. Journal of Finance 39, 593–607.
Caballero, R., 1999. Aggregate investment. In: Taylor, J., Woodford, M. (Eds.), Handbook of Macroeconomics, vol. 1B. North-Holland, Amsterdam, pp. 813–862.
Caballero, R., Leahy, J., 1996. Fixed costs: the demise of marginal q. Unpublished working paper. NBER.
Carlstrom, C., Fuerst, T., 1997. Agency costs, net worth, and business fluctuations: a computable general equilibrium analysis. American Economic Review 87, 893–910.
Chang, X., Dasgupta, S., 2003. Financing the deficit: debt capacity, information asymmetry, and the debt–equity choice. Unpublished working paper. Hong Kong University of Science and Technology.
Cooley, T., Quadrini, V., 2001. Financial markets and firm dynamics. American Economic Review 91, 1286–1310.
Cooper, R., Haltiwanger, J., 2006. On the nature of capital adjustment costs. Review of Economic Studies 73, 611–634.
Cooper, R., Willis, J., 2004. A comment on the economics of labor adjustment: mind the gap. American Economic Review 94, 1223–1237.
DeAngelo, H., Masulis, R., 1980. Optimal capital structure under corporate and personal taxation. Journal of Financial Economics 8, 3–29.
Denis, D., McKeon, S., 2009. Financial flexibility and capital structure policy: evidence from proactive leverage increases. Unpublished working paper. Purdue University.
Dudley, E., 2009. Capital structure and large investment projects. Unpublished working paper. University of Florida.
Eisfeldt, A., Rampini, A., 2006. Financing shortfalls and the value of aggregate liquidity. Unpublished working paper. Northwestern University.
Erickson, T., Whited, T., 2000. Measurement error and the relationship between investment and q. Journal of Political Economy 108, 1027–1057.
Fama, E., French, K., 2002. Testing tradeoff and pecking order predictions about dividends and debt. Review of Financial Studies 15, 1–33.
Fischer, E., Heinkel, R., Zechner, J., 1989. Dynamic capital structure choice: theory and tests. Journal of Finance 44, 19–39.
Flannery, M., Rangan, K., 2006. Partial adjustment toward target capital structure. Journal of Financial Economics 79, 469–506.
Frank, M., Goyal, V., 2009. Capital structure decisions: which factors are reliably important? Financial Management 38, 1–37.
Gamba, A., Triantis, A., 2008. The value of financial flexibility. Journal of Finance 63, 2263–2296.
Goldstein, R., Ju, N., Leland, H., 2001. An EBIT-based model of dynamic capital structure. Journal of Business 74, 483–511.
Gomes, J., 2001. Financing investment. American Economic Review 91, 1263–1285.
Graham, J., 2000. How big are the tax benefits of debt? Journal of Finance 55, 1901–1941.
Harford, J., Klasa, S., Walcott, N., 2008. Do firms have leverage targets? Evidence from acquisitions. Journal of Financial Economics 93, 1–14.
Hennessy, C., Whited, T., 2005. Debt dynamics. Journal of Finance 60, 1129–1165.
Hennessy, C., Whited, T., 2007. How costly is external financing? Evidence from a structural estimation. Journal of Finance 62, 1705–1745.
Hovakimian, A., 2004. The role of target leverage in security issues and repurchases. Journal of Business 77, 1041–1071.
Hovakimian, A., Opler, T., Titman, S., 2001. The debt–equity choice. Journal of Financial and Quantitative Analysis 36, 1–24.
Ingram, B., Lee, B., 1991. Simulation estimation of time-series models. Journal of Econometrics 47, 197–205.
Jaffee, D., Russell, T., 1976. Imperfect information and credit rationing. Quarterly Journal of Economics 90, 651–666.
Jensen, M., 1986. The agency costs of free cash flow: corporate finance and takeovers. American Economic Review 76, 323–329.
Kane, A., Marcus, A., McDonald, R., 1984. How big is the tax advantage to debt? Journal of Finance 39, 841–853.
Kayhan, A., Titman, S., 2007. Firms' histories and their capital structures. Journal of Financial Economics 83, 1–32.
Keynes, J., 1936. The General Theory of Employment, Interest and Money. Harcourt Brace, London.
Kraus, A., Litzenberger, R., 1973. A state preference model of optimal financial leverage. Journal of Finance 28, 911–921.
Leary, M., Roberts, M., 2005. Do firms rebalance their capital structures? Journal of Finance 60, 2575–2619.
Lemmon, M., Roberts, M., Zender, J., 2008. Back to the beginning: persistence and the cross-section of corporate capital structure. Journal of Finance 63, 1575–1608.
Lemmon, M., Zender, J., 2007. Debt capacity and tests of capital structure theories. Unpublished working paper. University of Utah and University of Colorado at Boulder.
Lintner, J., 1956. Distribution of incomes of corporations among dividends retained earnings and taxes. American Economic Review 46, 97–113.
Mauer, D., Triantis, A., 1994. Interaction of corporate financing and investment decisions: a dynamic framework. Journal of Finance 49, 1253–1277.
Mayer, C., Sussman, O., 2004. A new test of capital structure. Unpublished working paper. University of Oxford.
Michaelides, A., Ng, S., 2000. Estimating the rational expectations model of speculative storage: a Monte Carlo comparison of three simulation estimators. Journal of Econometrics 96, 231–266.
Miller, M., 1977. Debt and taxes. Journal of Finance 32, 261–275.
Morellec, E., Schürhoff, N., 2010. Dynamic investment and financing under personal taxation. Review of Financial Studies 23, 101–146.
Myers, S., 1984. The capital structure puzzle. Journal of Finance 39, 575–592.
Myers, S., Majluf, N., 1984. Corporate financing and investment decisions when firms have information that investors do not have. Journal of Financial Economics 13, 187–221.
Parsons, C., Titman, S., 2008. Empirical capital structure: a review. Foundations and Trends in Finance 3, 1–93.
Riddick, L., Whited, T., 2008. The corporate propensity to save. Journal of Finance 64, 1729–1766.
Robichek, A., Myers, S., 1966. Problems in the theory of optimal capital structure. Journal of Financial and Quantitative Analysis 1, 1–35.
Smith, C., Warner, J., 1979. On financial contracting: an analysis of bond covenants. Journal of Financial Economics 7, 117–161.
Stiglitz, J., Weiss, A., 1981. Credit rationing in markets with imperfect information. American Economic Review 71, 393–410.
Stokey, N., Lucas, R., 1989. Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, MA.
Strebulaev, I., 2007. Do tests of capital structure theory mean what they say? Journal of Finance 62, 1747–1787.
Stulz, R., 1990. Managerial discretion and optimal financing policies. Journal of Financial Economics 26, 3–27.
Sundaresan, S., Wang, N., 2006. Dynamic investment, capital structure, and debt overhang. Unpublished working paper. Columbia University.
Tauchen, G., 1986. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters 20, 177–181.
Titman, S., Tsyplakov, S., 2007. A dynamic model of optimal capital structure. Review of Finance 11, 401–451.
Tserlukevich, Y., 2008. Can real options explain financing behavior? Journal of Financial Economics 89, 232–252.
Viswanath, P., 1993. Strategic considerations, the pecking order hypothesis and market reactions to equity financing. Journal of Financial and Quantitative Analysis 28, 213–234.
Whited, T., 1992. Debt, liquidity constraints, and corporate investment: evidence from panel data. Journal of Finance 47, 1425–1460.
Whited, T., 2006. External finance constraints and the intertemporal pattern of intermittent investment. Journal of Financial Economics 81, 467–502.
Journal of Financial Economics 99 (2011) 262–288
Corporate investment and financing under asymmetric information☆

Erwan Morellec a,*, Norman Schürhoff b,1

a Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Finance Institute, and CEPR
b Faculty of Business and Economics at University of Lausanne, Swiss Finance Institute, and CEPR
Article info

Article history: Received 29 September 2009; received in revised form 5 March 2010; accepted 11 March 2010; available online 7 September 2010
JEL classification: G14, G31, G32
Keywords: Asymmetric information; Financing decisions; Endogenous financing constraints; Corporate investment; Real options

Abstract

We develop a dynamic model of corporate investment and financing decisions in which corporate insiders have superior information about the firm's growth prospects. We show that firms with positive private information can credibly signal their type to outside investors using the timing of corporate actions and their debt-equity mix. Using this result, we show that asymmetric information induces firms with good prospects to speed up investment, leading to a significant erosion of the option value of waiting to invest. Additionally, we demonstrate that informational asymmetries may not translate into a financing hierarchy or pecking order over securities. Finally, we generate a rich set of testable implications relating firms' investment and financing strategies, abnormal announcement returns, and external financing costs to a number of managerial, firm, and industry characteristics. © 2010 Elsevier B.V. All rights reserved.
☆ First draft: June 2007. We thank the referee, Toni M. Whited, for very constructive comments on the paper. We also thank Giovanni Favara, Ruediger Fahlenbrach, Hoang Ngoc Giang, Bart Lambrecht, Ernst Ludwig von Thadden, Alexei Zhdanov, and seminar participants at the University of Mannheim for helpful comments. Both authors acknowledge financial support from the Swiss Finance Institute and from NCCR FINRISK of the Swiss National Science Foundation. This paper was previously circulated under a different title.
* Corresponding author. Postal: Swiss Finance Institute at EPFL, Extranef 210, Quartier UNIL-Dorigny, CH-1015 Lausanne, Switzerland. Tel.: +41 21 693 0116.
E-mail addresses: erwan.morellec@epfl.ch (E. Morellec), [email protected] (N. Schürhoff).
1 Postal: Université de Lausanne, Ecole des HEC, Extranef 239, CH-1015 Lausanne, Switzerland. Tel.: +41 21 692 3447.
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.003

1. Introduction

Ever since Myers and Majluf (1984) showed that adverse selection could induce firms to bypass profitable projects and lead to a pecking order among securities, distortions in investment and financing policies resulting from informational asymmetries have been the subject of considerable research in corporate finance. Although we have learned much from this work, virtually all of the existing models are static and focus either on investment or on financing decisions. This has made it difficult to develop tests of the connection between investment and finance and, to date, empiricists have struggled to identify the effects of asymmetric information on corporate policy choices. In this paper, we advance the literature by developing a dynamic model of investment and financing with endogenous financing constraints arising from adverse selection. We then use this dynamic model to shed light on existing empirical results, generate a rich set of testable predictions, and offer insights and implications as to why the pecking order is not strictly observed empirically. A prerequisite for our study is a model that captures in a simple fashion the effects of asymmetric information on
firms’ policy choices. In this paper, we base our analysis on a dynamic real options model in which firms’ investment and financing strategies are jointly and endogenously determined. While standard real options models assume that firms have enough resources to fund investment or that the capital market has unlimited access to information, we consider instead an environment in which firms need to raise outside funds from uninformed investors to finance capital expenditures. Our paper addresses a set of key questions in corporate finance. First, how does investment policy reflect the informational advantage of corporate insiders? Second, how does asymmetric information affect financing decisions, i.e., the debt-equity mix and the cost of external funds? Third, how do investment and financing decisions interact and what are the factors that drive these interactions? We consider, as in McDonald and Siegel (1986), a firm that has a valuable real investment opportunity. In order to undertake the investment project, the firm needs to raise outside funds by issuing securities. The firm has flexibility in the timing of its investment and financing decisions and can choose to issue debt or equity. The investment project, once completed, produces a stochastic stream of cash flows that depend on firm type. There are two types of firms in the economy: good type (high cash flow) firms and bad type (low cash flow) firms. Firm types are private information, so that insiders know more about the value of the firm’s investment opportunities than potential investors. When making investment and financing decisions, management acts in the best interests of the incumbent stockholders. The model demonstrates that while under perfect information different types of firms choose different investment policies and issue fairly priced claims, this need not be the case when outside investors are imperfectly informed about the firms’ growth prospects. 
With asymmetric information, the low type has incentives to mimic the high type and sell overpriced securities. Hence, in a pooling equilibrium in which all firms raise funds and invest at the same time, asymmetric information reduces (increases) the value of high-type (low-type) firms and increases (reduces) their cost of investment. This forces good firms to delay investment and bad firms to speed up investment compared to the perfect information benchmark. Because asymmetric information raises the cost of external funds for good firms, these firms may try to separate by imposing mimicking costs on bad firms. We show in the paper that good firms can separate from bad firms by changing their investment and financing policy. Notably, we are the first to show that by accelerating investment (i.e. by reducing the value of the project at the time of investment) firms with good prospects can eliminate the benefits of mimicking for other firms and signal their positive information to outside investors. Although these distortions in investment policy have a cost, they allow good firms to obtain better terms for the claims they issue. We show that when the cost of distorting investment is lower than the underpricing cost due to adverse selection, firms with positive private information will choose to invest early to
signal their type. That is, a central message from our analysis is that informational asymmetries imply investment behavior that differs substantially from that of standard real options models with perfect information. The possibility for firms to signal their type through the timing of investment also has important implications for capital structure decisions. Static signaling models usually predict that when outside funds are necessary, firms prefer debt to equity because of the lower information costs associated with debt issues. While this pecking order hypothesis should perform best among firms that face particularly severe adverse selection problems, Frank and Goyal (2003) find that small high-growth firms often issue equity in lieu of issuing debt (see also Helwege and Liang, 1996; Leary and Roberts, 2010). Our model reveals that firms can signal their private information to investors using the timing of corporate actions and, thus, that they can find ways to issue equity that avoid adverse selection costs, as conjectured by Fama and French (2005). As a result, asymmetric information may not translate into a preference ranking over securities. In particular, one implication of our analysis is that equity issues can be more attractive than debt issues even for firms with ample debt capacity, consistent with the evidence in Leary and Roberts (2010). Our theory of corporate investment and financing differs from existing contributions in three important respects. First, unlike most dynamic models of investment and costly external finance, financing constraints are endogenous in our framework, arising from adverse selection. Second, unlike most asymmetric information models, we consider dynamics. Third, we endogenize both investment and financing decisions. 
These unique features allow us to generate a rich set of testable predictions about firms’ investment rates, abnormal announcement returns, the probability of project failure after investment, and external financing costs. We highlight the main empirical implications. First, our model predicts that adverse selection should lead firms to accelerate investment. Additionally, we find that firms with a higher market-to-book ratio or growth potential should invest more readily. By contrast, cash flow volatility and operating leverage should diminish investment propensities. Another specific prediction of our model is that the dispersion in industry investment rates should be lower in industries that are more heavily debt financed and higher in industries with higher cash flow volatility or operating leverage. Second, our theory predicts that the information released at the time of investment should trigger a positive jump in the value of the good type, consistent with the finding of McConnell and Muscarella (1985) that unexpected increases in investment lead to increases in stock prices. Another prediction of the model is that abnormal returns should be higher with debt financing than with equity financing, consistent with Masulis (1983). We also show that positive abnormal announcement returns following increases in capital expenditures should be limited to firms with good investment opportunities, as documented by Chan, Gau, and Wang (1995). A novel testable implication of the model is that,
independently of the financing strategy of the firm, abnormal announcement returns should (i) decrease with the growth rate and volatility of the firm’s cash flow shock and (ii) increase with the growth differential between types, i.e., with the degree of valuation uncertainty prior to investment. Third, since adverse selection problems are more severe for young, high-growth firms, another specific prediction of the model is that these firms will invest sooner so that their investment projects will have a greater likelihood of turning out poorly. Importantly, our model generates unique predictions about the probability of project failure after investment. Notably we find that this probability should be negatively related to the size of abnormal announcement returns at the time of investment or to the degree of valuation uncertainty prior to investment. Finally, our model with endogenous financing constraints allows us to quantify financing costs and relate these to various firm and industry characteristics. We show that the cost of debt and equity are not constant, as often assumed in the literature on exogenous financing constraints. We show they are not even monotonic functions of the model parameters. For example, while worsening adverse selection may reduce investment in a model with exogenous financing constraints, we find that in our model it may encourage investment and discourage the use of debt. The present paper relates to several contributions in the literature. Myers and Majluf (1984) are the first to analyze the effects of asymmetric information on investment and financing decisions. A key assumption in their model is that ‘‘the project evaporates if the firm does not go ahead at time t = 0’’ (p. 190). This paper considers instead that the firm has flexibility in the timing of investment. Hennessy, Livdan, and Miranda (HLM, 2010) examine investment and financing decisions in a model with repeated signaling and short-lived private information. 
HLM find that while bad types do not use debt (and so not all firms adhere to the cash-debt-equity pecking order), good types in their model always use the least information-sensitive financing vehicle, i.e., debt over equity. In HLM, firms signal through financing only. In our model, good firms can signal their type through the timing of investment. This can lead them to prefer equity over debt. Grenadier and Wang (2005) develop a real options model to examine the effects of moral hazard on investment timing. Our paper differs from theirs in two respects. First, we consider that firms have to raise funds to invest. Second, we abstract from owner-manager conflicts and focus instead on insider-outsider conflicts, as in Myers and Majluf (1984). These differences have important implications for equilibrium investment strategies. Notably, while Grenadier and Wang find that moral hazard leads to late investment, we show that adverse selection leads to early investment. Grenadier and Malenko (2010) present a variant of our setup with a continuum of types and show our main qualitative results are robust. In their analysis, Grenadier and Malenko constrain firms to finance the capital expenditure with equity. In addition, they focus exclusively on separating
equilibria. Our analysis of investment and financing decisions and of pooling equilibria allows us to determine in which economic environments it will be optimal for firms to distort investment (rather than pool or distort financing) and to generate a rich set of testable implications on firms’ policy choices. Another difference between the two models is that Grenadier and Malenko have an agency component (with a managerial contract that is specified exogenously), making it more difficult to interpret their results. Finally, our paper also relates to the line of research that studies the magnitude of investment and financing distortions due to conflicts of interest between inside equityholders and outside investors (see Mello and Parsons, 1992; Morellec, 2001; Hennessy and Whited, 2007; Sundaresan and Wang, 2007; Hackbarth and Mauer, 2010; Morellec and Schürhoff, 2010). None of these papers have examined the effects of adverse selection on the cost of external finance and firms’ investment and financing strategies.

The paper is organized as follows. Section 2 describes the model. Section 3 explores the effects of asymmetric information on firms’ equilibrium investment strategies. Section 4 introduces debt financing. Section 5 develops the model’s empirical predictions. Section 6 concludes. Technical developments are gathered in the Appendices.

2. Model and assumptions

This paper considers a firm that must issue securities to invest in a risky project. Management knows more about project quality than potential investors. The firm has discretion over the timing of investment as well as the timing and type of security issuance. Investors interpret the firm’s actions rationally and use Bayes’ rule to update their beliefs. An equilibrium model of the issue-invest decision is developed under these assumptions.

2.1. Setup

The model is an adaptation of Myers and Majluf (1984) and McDonald and Siegel (1986). Throughout the paper, financial markets are competitive.
Agents are risk neutral and discount cash flows at a constant rate $r$. We consider a set of infinitely lived firms, each of which has monopoly rights to an investment project. The direct cost of investment is constant, denoted by $I$, and investment is irreversible. The project, once completed, produces a continuous stream of cash flows. We assume that the level of cash flows depends on firm type, which is indexed by $k$. Specifically, at any time $t$ after investment, a firm of type $k$ generates a profit flow given by $\Lambda_k X_t - f$, where $\Lambda_k > 0$ is known to corporate insiders only, $f > 0$ represents constant operating expenses, and $X_t$ is an observable cash flow shock that evolves according to:

$$dX_t = \mu X_t\,dt + \sigma X_t\,dZ_t, \qquad X_0 > 0. \tag{1}$$

In this equation, the growth rate $\mu < r$ and volatility $\sigma > 0$ of the cash flow shock are constant and $(Z_t)_{t \ge 0}$ is a standard Brownian motion. We consider that there are two types of firms, high-growth (good type $g$) and low-growth (bad type $b$) firms, so that $\Lambda_k$ has discrete sample space $\{\Lambda_b, \Lambda_g\}$, with $\Lambda_g > \Lambda_b > 0$ and $\Pr(\Lambda_k = \Lambda_g) = p \in (0,1)$.$^{1}$ Before investment, firm types are private information, i.e., are known to corporate insiders only.

The initial capital structure of each firm consists of $n_k = 1$ share of common equity. To fund the project the firm sells risky debt or new equity. Following Myers and Majluf (1984), we assume that when making investment and financing decisions, management acts in the old stockholders’ interests by maximizing the intrinsic value of existing shares, which equals the selling price of the shares when investors have full information. We also assume that when the firm issues shares, old stockholders are passive so that the issue goes to a different group of investors. We denote by $n_k^{+} = 1 + \Delta n_k$ the number of shares outstanding after the round of financing and by $c$ the selected debt coupon payment. When the capital outlay is financed with risky debt, the decision to default is endogenous and chosen by shareholders. In default, bankruptcy costs consume a fraction $\alpha \in (0,1]$ of the firm’s revenue stream.
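To make the dynamics in Eq. (1) concrete, the following minimal sketch simulates one path of the cash flow shock. It relies on the standard fact that a geometric Brownian motion has a log-normal transition, so each step can be drawn exactly. The function name `simulate_shock`, the starting value `x0 = 10.0`, the horizon, and the grid size are illustrative assumptions; the values of $\mu$ and $\sigma$ are the base-case values quoted in Section 3.2.

```python
import math
import random


def simulate_shock(x0, mu, sigma, horizon, n_steps, rng):
    """Simulate one path of dX = mu*X dt + sigma*X dZ on [0, horizon].

    Each step uses the exact log-normal transition of geometric Brownian
    motion, so there is no discretization bias at the grid points.
    """
    dt = horizon / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    path = [x0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0)  # standard normal increment
        path.append(path[-1] * math.exp(drift + vol * shock))
    return path


# Hypothetical inputs: x0, horizon and n_steps are chosen for illustration.
rng = random.Random(0)  # fixed seed, for reproducibility only
path = simulate_shock(x0=10.0, mu=0.01, sigma=0.25, horizon=5.0, n_steps=60, rng=rng)
```

Because the path is built from exponentials, the simulated shock stays strictly positive, in line with the requirement $X_0 > 0$.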
2.2. Investment timing under symmetric information

Before analyzing the effects of asymmetric information on equilibrium investment strategies, we start by reviewing the benchmark case in which all agents have full information about the firms’ investment projects. Since debt financing induces deadweight costs of bankruptcy and claims are fairly priced, it is optimal for firms to finance the capital expenditure by issuing $\Delta n_k$ shares of common equity in this full information benchmark. Denote by $V_k$ the value of type $k$’s project before investment and by $\Pi(x)$ the present value of a perpetual stream of cash flows $X$ starting at $X_0 = x$:

$$\Pi(x) = E\left[\int_0^{\infty} e^{-rt} X_t\,dt \,\Big|\, X_0 = x\right] = \frac{x}{r-\mu}. \tag{2}$$

Similarly, denote by $F$ the present value of operating expenses, i.e., $F = \int_0^{\infty} e^{-rt} f\,dt = f/r$. Because the firm does not generate any income before investment, old shareholders are only entitled to the capital gain $E[dV_k]$ over each time interval $dt$. The required rate of return for investing in the firm’s equity is $r$. Applying Itô’s lemma, it is then immediate to show that the value of equity before investment satisfies:

$$r V_k = \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 V_k}{\partial X^2} + \mu X \frac{\partial V_k}{\partial X} \quad \text{for } k = g,b. \tag{3}$$
This equation is solved subject to the following boundary conditions. First, the value of equity at the time of investment is equal to the payoff from investment (value-matching): $V_k(X)|_{X=\bar{X}_k} = \Lambda_k \Pi(\bar{X}_k) - F - I$, where $\bar{X}_k$ is the threshold selected by type $k = g,b$.$^{2}$ In addition, to ensure that investment occurs along the optimal path, the value of equity satisfies the smooth-pasting condition: $\partial V_k/\partial X|_{X=\bar{X}_k} = \Lambda_k\,\partial \Pi(X)/\partial X|_{X=\bar{X}_k}$ at the endogenous investment threshold (see Dixit and Pindyck, 1994). Finally, as the value of the cash flow shock tends to zero, the option to invest becomes worthless so that $\lim_{X \to 0} V_k(X) = 0$. Solving this optimization problem yields the following expression for equity value in the perfect information benchmark (all proofs are relegated to the Appendices):

$$V_k(X) = \left(\frac{X}{\bar{X}_k}\right)^{\xi} \left[\Lambda_k \Pi(\bar{X}_k) - F - I\right], \tag{4}$$

where the value-maximizing investment threshold $\bar{X}_k$ is given by

$$\bar{X}_k = \frac{\xi}{\xi-1}\,\frac{r-\mu}{\Lambda_k}\,(F+I) \quad \text{for } k = g,b, \tag{5}$$

and $\xi = (\sigma^2/2-\mu)/\sigma^2 + \sqrt{[(\sigma^2/2-\mu)/\sigma^2]^2 + 2r/\sigma^2} > 1$.

Eq. (4) shows that equity value can be written as the product of the surplus created by investment (term in brackets) and the present value of $1 contingent on investment, given by $(X/\bar{X}_k)^{\xi}$. The investment threshold in Eq. (5) reflects the option value of waiting through the factor $\xi/(\xi-1)$. If this option had no value, shareholders would follow the simple net present value (NPV) rule, according to which one should invest as soon as $X \ge X_k^{0} = (r-\mu)(F+I)/\Lambda_k$. Importantly, since $\bar{X}_g < \bar{X}_b$, high-type firms invest before low-type firms in the full information benchmark. Finally, under perfect information, outside (equity) financing is costless and the firm achieves the same value as if it was financing the capital expenditure itself. The number of shares issued by type $k$ is defined by the budget constraint: $\Delta n_k[\Lambda_k \Pi(\bar{X}_k) - F] = I(1 + \Delta n_k)$.

$^{1}$ This assumption is not crucial and can easily be relaxed (see Grenadier and Malenko, 2010).

$^{2}$ For now, we ignore the option to abandon assets to keep the analysis tractable. We will examine the effects of exit/default on equilibrium investment and financing policies when we introduce debt. Any intermittent cash shortfalls can be covered through external equity without frictional costs as any asymmetric information is resolved by the time the firm’s productive assets operate.

2.3. Investment timing and signaling

While under perfect information different types of firms choose different investment thresholds and issue fairly priced claims, this need not be the case when outside investors are imperfectly informed about firms’ growth prospects. Indeed, since $\Lambda_g > \Lambda_b$, we have $V_g > V_b$ and there is an incentive for the bad type to sell overpriced securities by means of mimicking the good type’s behavior. In a pooling equilibrium, in which all firms invest at the same time and have the same market value, asymmetric information imposes costs on high-type firms: they need to dilute their equity stake more than they would otherwise. As a result, good type firms may try to separate by imposing mimicking costs on bad type firms. In the following, we show under which conditions this is feasible.
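The full-information benchmark of Section 2.2 (Eqs. (2), (4), and (5)) can be evaluated numerically. The sketch below is one way to do so, under stated assumptions: $r$, $\mu$, $\sigma$, $F = 10/r$, $\Lambda_g$, and $\Lambda_b$ are the base-case values quoted in Section 3.2, while the investment cost `I = 100` is not reported in this section and is purely illustrative. All function names are hypothetical.

```python
import math

# Base-case values quoted in Section 3.2. The investment cost I is not
# reported in this section, so I = 100 is an illustrative assumption.
R, MU, SIGMA = 0.05, 0.01, 0.25
F = 10.0 / R                    # present value of operating expenses, F = f / r
I = 100.0                       # hypothetical investment cost
LAMBDA_G, LAMBDA_B = 1.25, 1.0


def xi(r=R, mu=MU, sigma=SIGMA):
    """Positive root of 0.5*sigma^2*xi*(xi-1) + mu*xi - r = 0, as in Eq. (5)."""
    a = (sigma ** 2 / 2.0 - mu) / sigma ** 2
    return a + math.sqrt(a ** 2 + 2.0 * r / sigma ** 2)


def pv(x, r=R, mu=MU):
    """Pi(x) = x / (r - mu), Eq. (2)."""
    return x / (r - mu)


def first_best_threshold(lam):
    """Value-maximizing investment threshold, Eq. (5)."""
    e = xi()
    return e / (e - 1.0) * (R - MU) / lam * (F + I)


def npv_threshold(lam):
    """Zero-NPV threshold (r - mu)(F + I) / Lambda_k."""
    return (R - MU) * (F + I) / lam


def option_value(x, lam):
    """Equity value before investment, Eq. (4), for x at or below the threshold."""
    xbar = first_best_threshold(lam)
    return (x / xbar) ** xi() * (lam * pv(xbar) - F - I)


xbar_g = first_best_threshold(LAMBDA_G)
xbar_b = first_best_threshold(LAMBDA_B)
```

Under these inputs the ordering stated in the text can be checked directly: the good type's zero-NPV threshold lies below its first-best threshold, which in turn lies below the bad type's first-best threshold.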
Suppose the firm invests at date $t$ for a value $X_t$ of the cash flow shock. Let the firm type perceived by investors be $\Lambda$, $\Lambda_b \le \Lambda \le \Lambda_g$. Since investors need to break even in expectation, investors’ beliefs about firm type determine the number of shares, $\Delta n(X_t;\Lambda)$, that need to be issued at the time of financing. The required number of shares solves the budget constraint: $\Delta n(X_t;\Lambda)[\Lambda \Pi(X_t) - F]/n^{+}(X_t;\Lambda) = I$, the solution to which is given by

$$\Delta n(X_t;\Lambda) = \frac{I}{\Lambda \Pi(X_t) - F - I}. \tag{6}$$
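As a quick numerical illustration of Eq. (6), the sketch below computes the number of new shares and the stake retained by incumbents for a given perceived type. The parameter values $r$, $\mu$, and $F = 10/r$ come from Section 3.2; the investment cost `I = 100` and the function names are assumptions made for illustration only.

```python
R, MU = 0.05, 0.01
F = 200.0    # F = 10 / r, as in Section 3.2
I = 100.0    # hypothetical investment cost (not given in this section)


def pv(x):
    """Pi(x) = x / (r - mu), Eq. (2)."""
    return x / (R - MU)


def new_shares(x, lam):
    """Delta n(X; Lambda) from Eq. (6); requires a positive NPV at x."""
    npv = lam * pv(x) - F - I
    if npv <= 0.0:
        raise ValueError("no finite share issue can fund I when NPV <= 0")
    return I / npv


def incumbent_stake(x, lam):
    """Fraction of the firm kept by old shareholders, 1 / (1 + Delta n)."""
    return 1.0 / (1.0 + new_shares(x, lam))
```

For example, comparing `new_shares(20.0, 1.25)` with `new_shares(20.0, 1.0)` shows that being perceived as the higher type requires fewer new shares at the same threshold.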
Eq. (6) reveals that the benefit of being perceived as a higher type $\Lambda$ translates into a larger equity stake for incumbents. More generally, it shows that the higher the type $\Lambda$ and the larger the investment trigger $X$, the lower the ownership dilution. Proposition 1 shows under which conditions the first effect dominates the second so that the single-crossing (or Spence-Mirrlees) condition holds (see Appendix B for a proof):

Proposition 1. High-type firms find it less costly to distort investment than low-type firms so long as $f > 0$, such that the single-crossing property holds globally:

$$\frac{\partial}{\partial \Lambda_k}\left(\frac{\partial V_k(X;\bar{X},\Lambda)/\partial \Lambda}{\partial V_k(X;\bar{X},\Lambda)/\partial \bar{X}}\right) > 0 \quad \text{for all } (\Lambda,\bar{X}). \tag{7}$$
When deciding whether to mimic or separate, each firm type balances investment distortions with ownership dilution. As a result, the possibility for the good type to separate from the bad type relates to each type’s willingness to exchange equity stakes for changes in the investment threshold. The single-crossing condition in Proposition 1 asserts that firm type affects the marginal rate of substitution between investment distortions and ownership dilution in a systematic way. In particular, the elasticity between the competitively required ownership dilution $\Delta n(\bar{X};\Lambda)$ and investment threshold $\bar{X}$ depends negatively on the type $k$ so long as $f \ge 0$. This implies that the high type may find it worthwhile to speed up investment and still realize a positive NPV on the project, while the bad type may face a negative NPV project at the same investment threshold. As a result, financial markets can reasonably view the timing of investment as a valid signal for project or firm quality.

For a general exercise value, $v(\Lambda,\bar{X})$ say, the single-crossing condition generalizes to the requirement that the elasticity with respect to the investment threshold, $v_2(\Lambda_k,\bar{X})\bar{X}/v(\Lambda_k,\bar{X})$, is decreasing in firm type $\Lambda_k$ or, equivalently, $v(\Lambda_k,\bar{X}) \le v_1(\Lambda_k,\bar{X})v_2(\Lambda_k,\bar{X})/v_{12}(\Lambda_k,\bar{X})$, where the subscript $i$ denotes the partial derivative with respect to the $i$th argument. That is, when altering the timing of investment, the high type’s value needs to drop by less than the low type’s for it to act as a valid signal, which in this case occurs if and only if $f > 0$. In the remainder of the paper, we consider that operating expenses $f$ are positive so that the timing of corporate actions represents a valid signal.
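The elasticity condition can be illustrated numerically. The sketch below assumes, for illustration only, that the exercise value is the post-investment firm value $v(\Lambda,\bar{X}) = \Lambda \Pi(\bar{X}) - F$; the section does not state the functional form of $v$, and the proof in Appendix B may use a different one. Under this assumption the elasticity equals $\Lambda \Pi(\bar{X})/(\Lambda \Pi(\bar{X}) - F)$, which falls with the type when $F > 0$ and is constant when $F = 0$.

```python
R, MU = 0.05, 0.01   # base-case values from Section 3.2


def pv(x):
    """Pi(x) = x / (r - mu), Eq. (2)."""
    return x / (R - MU)


def elasticity(lam, x, F):
    """Elasticity of v(Lambda, X) = Lambda*Pi(X) - F with respect to X.

    The functional form of v is an assumption made for illustration.
    """
    v = lam * pv(x) - F
    v2 = lam / (R - MU)          # partial derivative of v with respect to X
    return v2 * x / v


def single_crossing_holds(lam, x, F):
    """Equivalent form from the text: v <= v1 * v2 / v12."""
    v = lam * pv(x) - F
    v1 = pv(x)                   # partial derivative with respect to Lambda
    v2 = lam / (R - MU)          # partial derivative with respect to X
    v12 = 1.0 / (R - MU)         # cross partial derivative
    return v <= v1 * v2 / v12
```

With `F = 200.0` and `x = 20.0`, the elasticity for `lam = 1.25` is smaller than for `lam = 1.0`, whereas with `F = 0.0` both equal one, which is the sense in which positive operating expenses make timing informative in this sketch.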
3. Signaling through investment timing

3.1. Investment timing in the separating equilibrium

Assume for now that the capital outlay is financed by issuing equity. Our objective in this section is to show that there exists a timing of investment (for the good type) that makes it possible to sustain a separating equilibrium in which the two types of firms choose different investment thresholds and issue fairly priced claims. The mechanism underlying the equilibrium is a simple one. When deciding whether to mimic or not, the low-type firm balances the overpricing of the shares (positive effect) with the reduction in intrinsic value due to the change in investment policy (negative effect). By speeding up investment, the high-type firm reduces the value of the project at the time of investment and, hence, reduces the benefits of mimicking for the bad type. The question we want to address is whether there exists an investment threshold such that the good type finds it profitable to invest and the bad type does not find it profitable to mimic.

To determine whether there exists a separating equilibrium, we first need to check the incentive compatibility constraint (ICC) of the bad type. Suppose that the good type invests at date $t$ for a value $X_t$ of the cash flow shock. If the bad type mimics the investment behavior of the good type, the value of the old shareholders’ claim in the bad firm after investment is given by

$$\frac{\Lambda_b \Pi(X_t) - F}{1 + \Delta n(X_t;\Lambda_g)} = \frac{\Lambda_b \Pi(X_t) - F}{\Lambda_g \Pi(X_t) - F}\left[\Lambda_g \Pi(X_t) - F - I\right], \tag{8}$$
since by mimicking, the bad type only needs to issue a number of shares equal to $\Delta n(X_t;\Lambda_g)$ (given by Eq. (6)) to finance the capital expenditure. This equation shows that mimicking the good type reduces both the cost (equity dilution) and benefits (NPV) of investment. Instead of mimicking the good type, the bad type can follow its first-best strategy under perfect information, i.e., raise equity and invest when the cash flow shock reaches $\bar{X}_b$. The bad type firm prefers mimicking the good type at $X \le \bar{X}_b$ if:

$$\underbrace{\frac{\Lambda_b \Pi(X) - F}{\Lambda_g \Pi(X) - F}\left[\Lambda_g \Pi(X) - F - I\right]}_{\text{Value when mimicking}} \ge \underbrace{\left(\frac{X}{\bar{X}_b}\right)^{\xi}\left[\Lambda_b \Pi(\bar{X}_b) - F - I\right]}_{\text{Real option value}\,>\,0} = V_b(X). \tag{9}$$

At the zero-NPV threshold for the good type, $X_g^{0} = (F+I)/\Pi(\Lambda_g)$, the left-hand side of Eq. (9) is equal to zero whereas the right-hand side is positive (being an option value). In this case, it is better for the bad firm to wait and not mimic the good firm. At the value-maximizing investment threshold of the bad type, $\bar{X}_b$, the left-hand side of Eq. (9) is larger than the right-hand side (since $I > 0$). In this case, it is better for the bad firm to mimic the good firm. These observations, combined with the strict monotonicity of $V_b(X)$, imply there exists a unique value $X^{*}$ of the cash flow shock, with $X_g^{0} < X^{*} < \bar{X}_b$, such that good firms can separate from bad firms by raising funds and investing before the cash flow shock
exceeds this value. The critical threshold $X^{*}$ is given by the solution to Eq. (9).$^{3}$ To determine whether investing at or below $X^{*}$ is an equilibrium strategy, we need to verify incentive compatibility of the good type. The following incentive compatibility constraint is a necessary condition for the good type to separate from the bad type at $X \le \bar{X}_b$:

$$\underbrace{\Lambda_g \Pi(X) - F - I}_{\text{Value in separating equilibrium}} \ge \underbrace{\left(\frac{X}{\bar{X}_b}\right)^{\xi}\frac{\Lambda_g \Pi(\bar{X}_b) - F}{1 + \Delta n(\bar{X}_b;\Lambda_b)}}_{\text{Value when mimicking}\,>\,0}. \tag{10}$$
The threshold $X_{\min}$ for which ICC (10) is binding represents the lowest value of the cash flow shock such that the good type prefers separation over mimicking. Since the value of the good type when mimicking is strictly positive, the separating investment threshold cannot be too close to the zero-NPV threshold. A separating equilibrium exists only if $X_{\min} \le X^{*}$. By the optimality of $\bar{X}_g$ in the absence of information asymmetry, we also have $X_{\min} \le \bar{X}_g$. Not all of the incentive compatible allocations $X \in [X_{\min}, X^{*}]$ necessarily constitute a Perfect Bayesian Equilibrium (PBE). A sufficient condition for a feasible threshold $X$ to be a PBE is that the good type has no incentive to defect to any other allocation $X$ given a set of out-of-equilibrium beliefs $\Lambda(X)$. It is straightforward to show that this is the case for all $X \in [X(\Lambda), X^{*}]$ where $X_{\min} \le X(\Lambda) \le X^{*}$. Using the incentive compatibility constraints of the good and bad types, it is immediate to establish the following result (proofs are relegated to Appendix C):
Proposition 2. (i) There exists a separating equilibrium in which firms issue fairly priced claims and invest so long as $f > 0$. (ii) In the least-cost separating equilibrium, good firms invest at the lower of the thresholds $X^{*}$ $(\ge X_{\min})$ and $\bar{X}_g$, while bad firms invest at their first-best threshold $\bar{X}_b$. The market value of each firm before investment is independent of project quality and satisfies for $X < X^{*} \wedge \bar{X}_g$:

$$V_{lcs}(X) = \begin{cases} \left(\dfrac{\Lambda_{pool}\Pi(X^{*}) - F}{\Lambda_b \Pi(X^{*}) - F}\right) V_b(X), & \text{if } X^{*} < \bar{X}_g, \\[2ex] \left(\dfrac{p\Lambda_g^{\xi} + (1-p)\Lambda_b^{\xi}}{\Lambda_b^{\xi}}\right) V_b(X), & \text{otherwise,} \end{cases} \tag{11}$$

where $\Lambda_{pool} = p\Lambda_g + (1-p)\Lambda_b$ and the market value under perfect information is defined in Eq. (4). The intrinsic value of the bad and good firms before investment are given by $V_b(X)$ and

$$V_{lcs,g}(X) = \begin{cases} \left(\dfrac{\Lambda_g \Pi(X^{*}) - F}{\Lambda_b \Pi(X^{*}) - F}\right) V_b(X), & \text{if } X^{*} < \bar{X}_g, \\[2ex] \left(\dfrac{\Lambda_g}{\Lambda_b}\right)^{\xi} V_b(X), & \text{otherwise.} \end{cases} \tag{12}$$

(iii) Good firms invest more aggressively than first-best ($X^{*} < \bar{X}_g$, i.e., overinvest) whenever

$$\frac{\xi}{\xi-1}\left[\frac{\Lambda_b}{\Lambda_g} - \left(\frac{\Lambda_b}{\Lambda_g}\right)^{\xi}\right] > \frac{F}{F+I}\left[1 - \left(\frac{\Lambda_b}{\Lambda_g}\right)^{\xi}\right]. \tag{13}$$

$^{3}$ We can restrict attention to values of the cash flow shock $X \le \bar{X}_b$ since all allocations with $X > \bar{X}_b$ are incentive incompatible and Pareto-dominated by corresponding allocations with $X \le \bar{X}_b$. Consider the case that condition (9) holds for all $X \ge X^{**} > \bar{X}_b$ such that the good type can separate by investing past $X^{**}$. In this strategy profile, the bad type moves before the good type. After the bad type has moved, the uncertainty is revealed and beliefs should assign probability one to the good type for any $X > \bar{X}_b$. The good type then has no incentive to wait any longer, and the proposed strategy cannot be an equilibrium.
Proposition 2 shows that there exists an investment threshold $X^{*}$ solving Eq. (9) such that good types can separate from bad types by issuing equity and investing at or below that threshold. The good type will want to follow this strategy only if the cost of separating (i.e., the cost of investing early) is not too high compared to the underpricing of the shares. Since the value of the good type falls as the selected investment threshold is lowered (for any threshold below $\bar{X}_g$), this implies that there exists a lower bound $X_{\min} \le X^{*}$ on the separating investment threshold. Finally, since investing early is costly for good firms, in the least-cost separating equilibrium the good firm will want to raise equity and invest the first time the cash flow shock reaches the lower of $X^{*}$ and $\bar{X}_g$. The equilibrium characterized in Proposition 2 (ii) can be sustained under pessimistic beliefs (i.e., $\Lambda(X) = \Lambda_b\ \forall X > X^{*} \wedge \bar{X}_g$). The good type then has no incentive to defect from $X^{*} \wedge \bar{X}_g$ to any other allocation. By applying the Intuitive Criterion of Cho and Kreps (1987), the least-cost separating contract is uniquely selected in equilibrium (as in, e.g., Hennessy, Livdan, and Miranda, 2010).

Importantly, while Grenadier and Wang (2005) show that moral hazard leads to late investment, our analysis demonstrates that adverse selection leads to early investment. This difference in equilibrium investment strategies is not surprising as, in the presence of owner-manager conflicts, the good type wants to hide its positive information about project values (and pool with the bad type) to extract more rents from the principal. As a result, the objective of the principal is to offer a contract to the agent that will induce truthful revelation. This can be achieved by making it more costly for the good type to pool with the bad type, i.e., by delaying investment for the bad type. By contrast, in the presence of adverse selection, the good type wants to reveal its positive private information and the bad type wants to mimic the good type. It is therefore optimal for the good type to speed up investment, to make it more costly for the bad type to mimic.
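The thresholds discussed above can be computed numerically. The following sketch solves ICC (9) at equality for $X^{*}$ and ICC (10) at equality for $X_{\min}$ by bisection, then compares condition (13) with the direct check $X^{*} < \bar{X}_g$. It is an illustration under stated assumptions, not the authors' procedure: the investment cost `I = 100` is hypothetical (it is not reported in this section), the other parameters are the base-case values from Section 3.2, and all function names are invented for the example.

```python
import math

R, MU, SIGMA = 0.05, 0.01, 0.25
F, I = 10.0 / R, 100.0           # I is a hypothetical value
LG, LB = 1.25, 1.0               # Lambda_g, Lambda_b

_a = (SIGMA ** 2 / 2.0 - MU) / SIGMA ** 2
XI = _a + math.sqrt(_a ** 2 + 2.0 * R / SIGMA ** 2)   # exponent xi, Eq. (5)


def pv(x):
    """Pi(x) = x / (r - mu), Eq. (2)."""
    return x / (R - MU)


def xbar(lam):
    """First-best investment threshold, Eq. (5)."""
    return XI / (XI - 1.0) * (R - MU) * (F + I) / lam


def v_b(x):
    """Bad type's option value under full information, Eq. (4)."""
    return (x / xbar(LB)) ** XI * (LB * pv(xbar(LB)) - F - I)


def mimic_gain(x):
    """Left-hand side minus right-hand side of ICC (9)."""
    stake = (LB * pv(x) - F) / (LG * pv(x) - F)
    return stake * (LG * pv(x) - F - I) - v_b(x)


def separation_gain(x):
    """Left-hand side minus right-hand side of ICC (10)."""
    dn_b = I / (LB * pv(xbar(LB)) - F - I)            # Eq. (6) at the bad type's threshold
    mimic = (x / xbar(LB)) ** XI * (LG * pv(xbar(LB)) - F) / (1.0 + dn_b)
    return LG * pv(x) - F - I - mimic


def bisect(fn, lo, hi, tol=1e-10):
    """Root of fn on [lo, hi], assuming fn(lo) < 0 < fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x0_g = (R - MU) * (F + I) / LG                        # good type's zero-NPV threshold
x_star = bisect(mimic_gain, x0_g, xbar(LB))           # ICC (9) binds
x_min = bisect(separation_gain, x0_g, x_star)         # ICC (10) binds
x_lcs = min(x_star, xbar(LG))                         # least-cost separating threshold

ratio = LB / LG
cond_13 = XI / (XI - 1.0) * (ratio - ratio ** XI) > F / (F + I) * (1.0 - ratio ** XI)
```

Condition (13) is Eq. (9) evaluated with strict inequality at $\bar{X}_g$, so `cond_13` and the comparison `x_star < xbar(LG)` should agree; with these illustrative inputs both indicate that the good type invests below its first-best threshold.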
3.2. Discussion

Investment timing: One of the major contributions of the real options literature is to show that with uncertainty and irreversibility, there exists a value of waiting to invest so that firms should only invest when the asset value exceeds the investment cost by a potentially large option
premium. This effect is well summarized in the survey by Dixit and Pindyck (1994). These authors write:

We find that [...] the option value [of waiting] is quantitatively very important. Waiting remains optimal even though the expected rate of return on immediate investment is substantially above the interest rate or the normal rate of return on capital. Return multiples as much as two or three times the normal rate are typically needed before the firm will exercise its option and make the investment.

Although this investment policy is consistent with what firms would do in the perfect information benchmark, it is not consistent with what they will do when taking into account informational asymmetries. As shown in Proposition 2, asymmetric information leads the good type to invest early in the project, as manifested by its decision to select an investment threshold $X^{*}$ that is lower than the threshold $\bar{X}_g$ that maximizes project value. The intuition underlying this result is that when choosing whether to mimic the good type or not, the bad type balances the value of waiting to invest with the overpricing associated with a mimicking strategy. By investing early, the good type reduces the intrinsic value of the bad type at the selected investment threshold and, hence, the benefits of mimicking for the bad type. At the separating threshold, the cost of distorting investment becomes too high and the bad type no longer wants to mimic, allowing the good type to issue fairly priced claims.

Fig. 1, Panel A plots the value-maximizing investment threshold (solid line), the separating investment threshold (bold line), and the zero-NPV threshold (dashed line) as a function of the growth potential of the high type $\Lambda_g$, the volatility of the cash flow shock $\sigma$, and operating leverage $F$. We use the following parameter values: the risk-free rate $r = 5\%$, the volatility and growth rate of cash flow shock: $\sigma = 25\%$ and $\mu = 1\%$, operating leverage $F = 10/r$, the growth potential of the good and bad firms: $\Lambda_g = 1.25$ and $\Lambda_b = 1$.$^{4}$ Fig. 1 reveals the following. An increase in volatility decreases the bad firms’ incentives to mimic and allows good firms to invest later, since the value of the option to invest increases with the volatility of the cash flow shock.$^{5}$ An increase in the size of the growth option for good firms $\Lambda_g$ leads first to an increase in the benefits of mimicking and hence to a decrease of the separating threshold. After some critical value, however, mimicking becomes punitively expensive so that the separating threshold approaches first best. Finally, as operating leverage increases, the cost of mimicking for the bad type increases and distortions in investment decline.

$^{4}$ The risk-free rate is taken from the yield curve on Treasury bonds. The growth rate of cash flows has been selected to generate a payout ratio consistent with observed payout ratios. The firm’s payout ratio reflects the sum of the payments to both bondholders and shareholders. Following Huang and Huang (2002), we take the weighted averages between the average dividend yields (4% according to Ibbotson and Associates) and the average historical coupon rate (close to 9%), with weights given by the median leverage ratio of S&P 500 firms (approximately 20%). Similarly, cash flow volatility is chosen to match the (leverage-adjusted) asset return volatility of an average S&P 500 firm’s equity return volatility, as in Strebulaev (2007).

$^{5}$ Note that the threshold for investment in the perfect information benchmark increases more than the hurdle rate with adverse selection since the vega of an option increases with its moneyness (and the good type’s real option, determining $\bar{X}_g$, is more in the money than the bad type’s real option, determining $X^{*}$).

Cost of adverse selection: By distorting investment, asymmetric information reduces the value of the good firm. This reduction in value is equal to the difference between the value of the firm under the value-maximizing investment policy and the value of the firm under the selected investment policy. Fig. 1, Panel B plots this drop in value, defined by $(V_g(X) - V_{lcs,g}(X))/V_g(X)$, as a function of the growth potential of the high type $\Lambda_g$, the volatility of the cash flow shock $\sigma$, and operating leverage $F$. The figure reveals that the reduction in project value can reach 30%. The comparative statics mirror those in Panel A since greater investment distortions imply a larger reduction in firm value.

Change in the stock price at the time of investment: In the separating equilibrium, outside investors have incomplete information about the quality of the firms’ growth prospects before investment. However, at the time of investment, firm types become public information and this uncertainty is resolved. Because outside investors cannot predict when investment will take place, the information released at the time of the good type’s investment triggers a positive jump in the good type’s stock price and a negative price reaction in the bad type’s stock price. This is consistent with the finding of McConnell and Muscarella (1985) that unexpected increases in investment lead to increases in stock prices (and vice versa for unexpected decreases). Denote by $AR_k(X) = (V_k^{+}(X) - V_{lcs}(X))/V_{lcs}(X)$ the jump in the value of type $k$ when the value of the cash flow shock is $X$ where $V_{lcs}(X)$ is defined in Proposition 2. At the time of investment (consider the case $X^{*} \le \bar{X}_g$), these abnormal returns are given by
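\[
AR_g(\bar{X}) = \frac{\Lambda_g \Pi(\bar{X}) - F}{\Lambda_{pool} \Pi(\bar{X}) - F} - 1 > 0
\quad \text{and} \quad
AR_b(\bar{X}) = \frac{\Lambda_b \Pi(\bar{X}) - F}{\Lambda_{pool} \Pi(\bar{X}) - F} - 1 < 0. \tag{14}
\]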
Eq. (14) shows that the jump in the value of the good type at the time of investment is positive, as investment at $\bar{X}$ signals good news about project quality. By contrast, the jump in the value of the firm that does not invest (bad type) is negative. Fig. 1, Panel C, plots the abnormal announcement returns of the good type (solid line) and bad type (dashed line) as a function of $\Lambda_g$, $\sigma$, and $F$. We set $p = 50\%$ in the base case. Because the timing of investment depends on the growth potential of each type, on the volatility of the cash flow shock, and on the firm's operating leverage, abnormal returns to shareholders depend on these factors as well. In particular, the model predicts that abnormal returns should decrease with the project's cash flow volatility $\sigma$ and increase with the growth potential of the good type and operating leverage $F$ (since mimicking is more costly).
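The announcement returns in Eq. (14) are simple to evaluate once a threshold is given. The sketch below does so under two assumptions that are not spelled out in this passage: $\Pi(X) = X/(r-\mu)$, and $\Lambda_{pool} = p\Lambda_g + (1-p)\Lambda_b$ as in Eq. (15). The threshold value passed in is a hypothetical input, not the equilibrium threshold of Proposition 2, and the function names are illustrative.

```python
def pooled_growth(p, lam_g, lam_b):
    """Prior-weighted growth potential, Lambda_pool = p*Lambda_g + (1-p)*Lambda_b."""
    return p * lam_g + (1 - p) * lam_b


def abnormal_returns(x, p, lam_g, lam_b, r, mu, F):
    """Return (AR_g, AR_b) from Eq. (14) at cash flow level x.

    Assumes Pi(x) = x / (r - mu) and requires Lambda_pool * Pi(x) > F
    so that the pre-announcement value is positive.
    """
    pv = x / (r - mu)
    pooled = pooled_growth(p, lam_g, lam_b) * pv - F
    if pooled <= 0:
        raise ValueError("pooled value must be positive at x")
    ar_g = (lam_g * pv - F) / pooled - 1
    ar_b = (lam_b * pv - F) / pooled - 1
    return ar_g, ar_b


if __name__ == "__main__":
    # Base parametrization from the Fig. 1 caption; x = 15 is a hypothetical threshold.
    print(abnormal_returns(15.0, 0.5, 1.25, 1.0, 0.05, 0.01, 10.0 / 0.05))
```

Because the pooled value is the prior-weighted average of the two types' values, the prior-weighted average of the two abnormal returns is zero in this sketch.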
3.3. Pooling in equity: the underpricing–overinvestment trade-off

While the available evidence on abnormal announcement returns suggests that firms are often able to signal
E. Morellec, N. Schürhoff / Journal of Financial Economics 99 (2011) 262–288
[Figure 1. Panel A: investment threshold; Panel B: cost (%); Panel C: abnormal return (%). Each panel is plotted against growth potential Λg, volatility σ, and operating leverage F.]
Fig. 1. Equity separating equilibrium. Panel A plots the investment threshold in the equity separating equilibrium (bold line), the first-best investment threshold (solid line), and the zero-NPV threshold (dashed line) for the high-type firm. Panel B plots the external financing costs under equity financing and separation for different parameter values. The value loss is measured by the drop in firm value due to investment distortions in percent of first-best (option) value. Panel C plots the abnormal announcement returns under equity financing and separation for different parameter values. Depicted is the rise in the stock price for high-type firms (solid line) and the drop for low-type firms (dashed line) at the separating investment threshold. The base parametrization is $p = 0.5$, $\Lambda_g = 1.25$, $\Lambda_b = 1$, $r = 0.05$, $\mu = 0.01$, $\sigma = 0.25$, $I = 100$, and $F = 10/r$.
their private information to outside investors, the study of pooling equilibria provides additional insights on the determinants of firms' policy choices.⁶ In a pooling equilibrium, financial markets are not able to distinguish among firms of different types. All firms invest at the same time and issue common equity to finance the capital outlay. The pooled value of the firms at the time of investment is given by

\[
V^+_{pool}(X) = \sum_{k=b,g} \Pr(\Lambda = \Lambda_k)\, \Lambda_k \Pi(X) - F = \Lambda_{pool} \Pi(X) - F. \tag{15}
\]

⁶ A conceptual drawback of the Intuitive Criterion applied in Proposition 2 is that equilibrium selection is insensitive to the prior distribution of types. Costly separation is unlikely, however, from an ex ante perspective when the fraction of high-type firms $p$ is close to one. In this situation, a pooling contract can be beneficial for the good type, since its optimal pooling contract approaches the first-best allocation as $p$ tends to one.

The constraint $V^+_{pool}(X)\,\Delta n(X; \Lambda_{pool})/[1 + \Delta n(X; \Lambda_{pool})] = I$ determines the number of shares that have to be issued at the time of investment given the value of the cash flow shock $X_{pool}$ at that time. Solving this budget constraint
yields

\[
\Delta n(X_{pool}; \Lambda_{pool}) = \frac{I}{\Lambda_{pool} \Pi(X_{pool}) - F - I}. \tag{16}
\]
This equation shows that asymmetric information leads to a dilution of the good type's equity stake. Ex post, outside investors make money on the good type and lose money on the bad type: there is cross-subsidization. To determine whether a pooling equilibrium exists, we first need to verify that pooling with the good type is an optimal strategy for the bad type. The incentive compatibility constraint of the bad type reads

\[
\underbrace{\frac{\Lambda_b \Pi(X_{pool}) - F}{1 + \Delta n(X_{pool}; \Lambda_{pool})}}_{\text{Value in pooling equilibrium}}
\;\ge\;
\underbrace{\left(\frac{X_{pool}}{X_b^*}\right)^{\xi} \left[\Lambda_b \Pi(X_b^*) - F - I\right]}_{\text{Real option value} \,>\, 0}. \tag{17}
\]
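Eqs. (16) and (17) can be checked numerically for a candidate pooling threshold. The sketch below is illustrative only: it assumes $\Pi(X) = X/(r-\mu)$, $\Lambda_{pool} = p\Lambda_g + (1-p)\Lambda_b$, and the standard first-best threshold $X_b^* = \frac{\xi}{\xi-1}\frac{r-\mu}{\Lambda_b}(F+I)$, with $\xi$ the positive root of the usual fundamental quadratic. The candidate thresholds used in the example are hypothetical inputs, and all function names are made up for the illustration.

```python
import math


def positive_root(r, mu, sigma):
    """Positive root xi of 0.5*sigma^2*x*(x-1) + mu*x - r = 0."""
    a = (sigma ** 2 / 2 - mu) / sigma ** 2
    return a + math.sqrt(a ** 2 + 2 * r / sigma ** 2)


def dilution(x_pool, lam_pool, r, mu, F, I):
    """New shares per existing share, Eq. (16); needs pooled value above I."""
    pooled_net = lam_pool * x_pool / (r - mu) - F - I
    if pooled_net <= 0:
        raise ValueError("pooled value must exceed the investment outlay")
    return I / pooled_net


def bad_type_pools(x_pool, p, lam_g, lam_b, r, mu, sigma, F, I):
    """True if the bad type's constraint, Eq. (17), holds at x_pool."""
    xi = positive_root(r, mu, sigma)
    lam_pool = p * lam_g + (1 - p) * lam_b
    # Assumed first-best threshold of the bad type.
    x_b = xi / (xi - 1) * (r - mu) * (F + I) / lam_b
    dn = dilution(x_pool, lam_pool, r, mu, F, I)
    value_pooling = (lam_b * x_pool / (r - mu) - F) / (1 + dn)
    value_waiting = (x_pool / x_b) ** xi * (lam_b * x_b / (r - mu) - F - I)
    return value_pooling >= value_waiting


if __name__ == "__main__":
    args = (0.5, 1.25, 1.0, 0.05, 0.01, 0.25, 200.0, 100.0)
    for x in (14.0, 25.0):  # hypothetical candidate thresholds
        print(x, bad_type_pools(x, *args))
```

With the base parametrization, the constraint fails at the lower candidate and holds at the higher one in this sketch, in line with the observation that condition (17) is violated for small values of the cash flow shock.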
Since $\Lambda_{pool} < \Lambda_g$, the threshold at which condition (17) is binding lies between $\bar{X}$ and the first-best threshold $X_b^*$. For smaller values of the cash flow shock, condition (17) is violated and investment at such a threshold does not constitute a pooling equilibrium. As is standard in signaling games, we face multiplicity of equilibria. Maskin and Tirole (1992), however, show that in the mechanism design game in which the firm's insiders (the 'informed principal') ex ante offer contracts to investors in the capital market (the 'uninformed agents'), only those pooling equilibria survive that (weakly) Pareto-dominate the least-cost separating equilibrium characterized in Proposition 2. In any pooling equilibrium all firms invest at the same time and issue common equity. The incentive compatibility constraint (17) for the bad type puts a Perfect Bayesian best-response restriction on the set of Pareto-dominant pooling equilibria. The remaining restriction is that the value of the good type in the pooling equilibrium is larger than its value in the least-cost separating equilibrium:

\[
\underbrace{\frac{\Lambda_g \Pi(X_{pool}) - F}{1 + \Delta n(X_{pool}; \Lambda_{pool})} \left(\frac{X}{X_{pool}}\right)^{\xi}}_{\text{Value in pooling equilibrium}}
\;\ge\;
\underbrace{\mathbf{1}_{X_g^* \le \bar{X}}\, V_g(X) + \mathbf{1}_{X_g^* > \bar{X}}\, \frac{\Lambda_g \Pi(\bar{X}) - F}{\Lambda_b \Pi(\bar{X}) - F}\, V_b(X)}_{\text{Value in separating equilibrium}}. \tag{18}
\]
Pooling equilibria exist if and only if there is a threshold $X_{pool}$ for which conditions (17) and (18) hold. We can check these conditions as follows. Whenever $X_g^* \le \bar{X}$, condition (18) is violated since the good type cannot do better than the first-best value $V_g$. Consider next the reverse situation. The best pooling equilibrium for type $k$ is the one that selects the investment threshold $X_{pool,k}$ that maximizes the present value $V_{pool,k}(X)$ of the cash flows accruing to incumbent shareholders. We show in Appendix D that

\[
X_{pool,k} = \frac{\chi_k}{\chi_k - 1}\, \frac{r - \mu}{\Lambda_{pool}}\, (F + I), \quad k = g, b, \tag{19}
\]

where $\chi_k > 1$ depends on firm type. There will therefore not be a single Pareto-optimal pooling equilibrium. Notice, however, that condition (17) holds whenever condition (18) is satisfied. A pooling equilibrium therefore exists if and only if $\bar{X} < X_g^*$ and condition (18) holds at $X_{pool,g}$. We then have the following existence result (see the appendix):

Proposition 3. There exists a Pareto-dominant pooling equilibrium in which all firms issue equity and invest the first time the cash flow shock reaches a threshold $X_{pool}$ satisfying (17) and (18) with $\bar{X} \le X_{pool} \le X_b^*$ if and only if condition (13) holds (so that $\bar{X} < X_g^*$) and the fraction of good projects in the economy, $p$, exceeds the threshold defined in Appendix D.
The next proposition characterizes the Pareto-dominant pooling equilibria:

Proposition 4. Good firms invest more conservatively than first best (underinvest) and bad firms invest more aggressively than first best (overinvest) in the pooling equilibrium, that is, $X_g^* \le X_{pool} \le X_b^*$. The market value in the pooling equilibrium is independent of project type, and market and intrinsic values are equal to

\[
V_{pool}(X) = \left(\frac{X}{X_{pool}}\right)^{\xi} \left[\Lambda_{pool} \Pi(X_{pool}) - F - I\right]
\quad \text{and} \quad
V_{pool,k}(X) = \frac{\Lambda_k \Pi(X_{pool}) - F}{\Lambda_{pool} \Pi(X_{pool}) - F}\, V_{pool}(X). \tag{20}
\]

Several important results follow from Proposition 4. First, asymmetric information lowers the investment threshold selected by the bad type compared to the perfect information benchmark. As a result, asymmetric information causes bad firms to speed up investment or overinvest relative to first best, consistent with the evidence in Kedia and Philippon (2009). Second, asymmetric information reduces the value of good firms and increases the value of bad firms. Thus, although the good type can raise the funds required to finance its project, it is hurt by the presence of the bad one.

Fig. 2 plots the least-cost equilibrium (Panel A), the equilibrium investment threshold (Panel B), the external financing costs (Panel C), and the underpricing of the shares issued in the Pareto-dominant pooling equilibrium (Panel D) for the high-type firm as a function of the growth potential of the high type $\Lambda_g$, the volatility of the cash flow shock $\sigma$, operating leverage $F$, and investors' belief about the fraction of high-type firms $p$ in the economy (depicted on the vertical axis in Panel A). Input parameter values are set as in Fig. 1. In Panels B–D, we assume that the fraction of high-type firms in the economy is $p = 50\%$. The figure shows that for low values of operating leverage $F$, cash flow volatility $\sigma$, or growth differential $\Lambda_g/\Lambda_b$, it is too costly for the good type to separate so that pooling equilibria Pareto-dominate the least-cost separating equilibrium.
When this is the case, the good type underinvests relative to first best and there is no abnormal announcement return. One essential difference between our analysis and the analysis in Myers and Majluf (1984, MM) is that MM assume that ‘‘the investment opportunity evaporates if the firm does not go ahead at time t=0’’ (p. 190). In the current
[Figure 2, Panel A: regions labeled first best, equity pooling, and equity separation, with probability p on the vertical axis, plotted against growth potential of the high type Λg, growth potential of the low type Λb, growth rate μ, volatility σ, operating leverage F, and interest rate r.]
Fig. 2. Least-cost equity equilibrium. The different panels plot the least-cost equity equilibrium, investment threshold, and external financing costs under equity financing for different parameter values. Panel A depicts the least-cost equilibrium as a function of the parameters $\Lambda_g$, $\Lambda_b$, $\mu$, $\sigma$, $F$, and $r$ on the x-axis and of investors' beliefs about the fraction of high-type firms $p$ on the y-axis. Panel B depicts the investment threshold in the least-cost equity equilibrium (bold line), the first-best investment threshold (solid line), and the zero-NPV threshold (dashed line). Panel C depicts the drop in firm value due to investment distortions in the least-cost equity equilibrium for different parameter values. The value loss is evaluated at the zero-NPV threshold, measured in percent of first-best (option) value, and given by Cost(%) $= (V_g - \max(V_{pool,g}, V_{lcs,g}))/V_g$. Panel D depicts the underpricing of the shares issued in the Pareto-dominant pooling equilibrium for different parameter values. The underpricing is measured in percent and given by Underpricing(%) $= (V_g^+ - V_{pool})/V_{pool}$ at the issuance date. The base parametrization is $p = 0.5$, $\Lambda_g = 1.25$, $\Lambda_b = 1$, $r = 0.05$, $\mu = 0.01$, $\sigma = 0.25$, $I = 100$, and $F = 10/r$.
paper we make the opposite assumption and consider that each firm can delay investment as much as it desires. In Appendix E, we consider the effects of timing constraints on equilibrium investment strategies and show that the model of Myers and Majluf is nested in ours. Notably, we demonstrate that in the limit as the firm cannot postpone investment, the option value of waiting to invest vanishes and firms face a now-or-never investment decision. When this is the case, both types of firms want to invest immediately as long as the pooled net present value of the project is positive and only pooling equilibria in equity survive.
4. Signaling through investment and financing choice

We have thus far not explored the possibility that firms might issue debt to finance the capital expenditure. In this section, we relax this assumption and examine the effects of debt financing on firms' equilibrium investment strategies. While good types can separate from bad types by issuing equity and investing earlier than first best, we show they can also separate by issuing debt (as in Leland and Pyle, 1977). We demonstrate in this section that the equilibria can be Pareto-ranked and that the least-cost financing choice depends on investors' prior beliefs about project quality and on the project's characteristics.⁷

⁷ For lack of space we restrict attention to situations in which firms issue only one type of financing instrument. A mix of debt and equity could be incorporated in the model and would be a natural extension. External financing is associated with substantial fixed costs, including advisory, legal, and regulatory fees and underwriter expenses. This often forces firms to choose between instruments. Common equity and straight debt are the most popular public financing vehicles for U.S. corporations.

4.1. Debt issuance and firm valuation with perfect information

Suppose that the investment outlay $I$ is funded with risky perpetual debt with coupon flow $c$. Denote the default threshold of type $k$ by $X_k(c)$. In the perfect information benchmark, the total value of the firm $V_k^+$ and the value of debt $D_k$ after investment are given by the
Fig. 2. (Continued) [Panel B: investment threshold; Panel C: cost (%); Panel D: underpricing (%). Each panel is plotted against growth potential Λg, volatility σ, and operating leverage F.]
following expressions for $k = g, b$ (see Appendix F):

\[
V_k^+(X, c) = \Lambda_k \Pi(X) - F - \alpha \Lambda_k \Pi(X_k(c)) \left(\frac{X}{X_k(c)}\right)^{\nu}, \tag{21}
\]

\[
D_k(X, c) = \frac{c}{r} - \left[\frac{c}{r} - (1 - \alpha) \Lambda_k \Pi(X_k(c)) + F\right] \left(\frac{X}{X_k(c)}\right)^{\nu}, \tag{22}
\]

where $\nu = (\sigma^2/2 - \mu)/\sigma^2 - \sqrt{[(\sigma^2/2 - \mu)/\sigma^2]^2 + 2r/\sigma^2} < 0$. Eq. (21) shows that for any value of the cash flow shock, debt financing reduces firm value by inducing bankruptcy costs (third term on the right-hand side). In Eq. (22), the first term on the right-hand side is the value of risk-free debt. The second term captures the impact of default risk on the value of corporate debt. The intrinsic value of equity is in turn given by $V_k^+(X, c) - D_k(X, c)$, $k = g, b$, and the endogenous default threshold satisfies (see Appendix F):

\[
X_k(c) = \frac{\nu}{\nu - 1}\, \frac{r - \mu}{\Lambda_k} \left(\frac{c}{r} + F\right), \quad k = g, b. \tag{23}
\]
Since $\Lambda_g > \Lambda_b$, Eq. (23) implies that for a given coupon payment $c$, the default threshold of the good firm is lower than the default threshold of the bad firm.⁸ In order to simplify the exposition, define the coefficients $\eta$ and $\eta_k$ as follows:

\[
\eta = \frac{\alpha \nu}{\nu - 1 - (1 - \alpha)\nu}
\quad \text{and} \quad
\eta_k = 1 - \frac{\Lambda_k^{\nu} (1 - \eta)}{p \Lambda_g^{\nu} + (1 - p) \Lambda_b^{\nu}}. \tag{24}
\]

⁸ This result is robust to alternative bankruptcy policies. For instance, if the firm is constrained from raising additional funds, it defaults when $X_k(c) = (c + f)/\Lambda_k$, $k = g, b$, and we again have $X_g(c) < X_b(c)$.
Under symmetric information, the budget constraint $D_k(X, c_k(X)) = I$ uniquely determines the coupon $c_k(X)$ selected by type $k$ at the time of investment ($c_k(X)$ is given by expression (F.10) in Appendix F). The credit spread on the debt contract is then given by $\rho_k(X) = c_k(X)/(rI) - 1$ at the date of issuance, and firm value at the investment date satisfies $V_k^+(X, c_k(X)) = \Lambda_k \Pi(X) - F - \eta \rho_k(X) I$. The third term in this expression captures the discount due to deadweight costs of default. In the perfect information benchmark, a firm of type $k$ optimally invests when the cash flow shock reaches the investment threshold $X_{k,D}$ defined by

\[
X_{k,D} = X_k^* \left\{ 1 + \frac{\eta \rho_k I}{I + F} \left[ 1 - \frac{\nu [F + (1 + \rho_k) I]}{\xi [F + (1 + \nu \rho_k) I]} \right] \right\} > X_k^*. \tag{25}
\]

Since the term in brackets is larger than unity, Eq. (25) shows that debt financing delays investment. This is due to the increase in the cost of capital from deadweight losses in default. Firm value before investment is then given by

\[
V_{k,D}(X) = \frac{F + \left(1 + \nu \frac{1 - \alpha}{1 - \nu \alpha} \rho_k\right) I}{F + (1 + \nu \rho_k) I} \left(\frac{X_k^*}{X_{k,D}}\right)^{\xi} V_k(X), \quad k = g, b. \tag{26}
\]

This expression reveals that debt financing reduces firm value through two channels in the perfect information benchmark. First, for a given investment policy, debt financing induces bankruptcy costs (first factor on the right-hand side of this equation). Second, debt financing distorts investment policy (second factor).
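The budget constraint $D_k(X, c) = I$ has no simple closed form in this passage (the coupon is given in Appendix F), but it can be solved numerically from Eqs. (21)–(23). The sketch below is a minimal illustration assuming $\Pi(X) = X/(r-\mu)$ and writing the credit spread as $\rho_k = c_k/(rI) - 1$ and the default-cost coefficient as $\eta = \alpha\nu/(\nu - 1 - (1-\alpha)\nu)$; the bisection routine and all function names are hypothetical and are not the procedure of Appendix F. It also checks that the bankruptcy-cost term of Eq. (21) equals $\eta \rho_k I$ once the budget constraint holds.

```python
import math


def negative_root(r, mu, sigma):
    """Negative root nu of 0.5*sigma^2*x*(x-1) + mu*x - r = 0."""
    a = (sigma ** 2 / 2 - mu) / sigma ** 2
    return a - math.sqrt(a ** 2 + 2 * r / sigma ** 2)


def default_threshold(c, lam, r, mu, nu, F):
    """Endogenous default threshold, Eq. (23)."""
    return nu / (nu - 1) * (r - mu) / lam * (c / r + F)


def bankruptcy_cost(x, c, lam, r, mu, nu, alpha, F):
    """Third term on the right-hand side of Eq. (21)."""
    xd = default_threshold(c, lam, r, mu, nu, F)
    return alpha * lam * xd / (r - mu) * (x / xd) ** nu


def debt_value(x, c, lam, r, mu, nu, alpha, F):
    """Value of risky perpetual debt, Eq. (22)."""
    xd = default_threshold(c, lam, r, mu, nu, F)
    recovery = (1 - alpha) * lam * xd / (r - mu) - F
    return c / r - (c / r - recovery) * (x / xd) ** nu


def solve_coupon(x, lam, r, mu, nu, alpha, F, I, steps=200):
    """Smallest bracketed coupon with debt_value == I, found by bisection."""
    lo = r * I          # risk-free coupon: risky debt is worth less than I
    hi = lo
    while debt_value(x, hi, lam, r, mu, nu, alpha, F) < I:
        hi *= 2
        if default_threshold(hi, lam, r, mu, nu, F) >= x:
            raise ValueError("no coupon raises I at this cash flow level")
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if debt_value(x, mid, lam, r, mu, nu, alpha, F) < I:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r, mu, sigma, alpha, F, I, lam = 0.05, 0.01, 0.25, 0.25, 200.0, 100.0, 1.25
    nu = negative_root(r, mu, sigma)
    c = solve_coupon(30.0, lam, r, mu, nu, alpha, F, I)  # x = 30 is hypothetical
    print(c, c / (r * I) - 1)
```

Bisection is used here only because it is robust; any root finder that respects the bracket would serve equally well.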
4.2. Separation through debt issuance

In the perfect information benchmark, equity issuance maximizes firm value because debt financing induces deadweight costs of default and investment distortions (i.e., $X_{g,D} > X_g^*$). However, in the presence of informational asymmetries, issuing equity may not be in the best interests of incumbent shareholders as it may lead to underpricing in a pooling equilibrium or to large investment distortions in a separating equilibrium. To determine whether there exists a separating equilibrium in which the good type issues debt, we first need to check the incentive compatibility constraint of the bad type. The budget constraint $D_g(X, c_g) = I$ implies that the bad type is indifferent between mimicking the good type by issuing debt at the threshold $X_D$ and waiting to follow its first-best strategy if the following incentive compatibility constraint is satisfied:

\[
V_b^+(X_D, c_g(X_D)) - I + \underbrace{\left[\left(\frac{\Lambda_b}{\Lambda_g}\right)^{\nu} - 1\right] \rho_g(X_D)\, I}_{\text{Cross-subsidization}}
= \left(\frac{X_D}{X_b^*}\right)^{\xi} \left[\Lambda_b \Pi(X_b^*) - F - I\right]. \tag{27}
\]

This incentive compatibility constraint is similar to that derived under equity financing and reflects the fact that cross-subsidization reduces the cost of debt financing for the bad type ($D_b(X, c_g) = I[1 - (\Lambda_b^{\nu} \Lambda_g^{-\nu} - 1) \rho_g(X)] < I$). The positive effect of selling overpriced debt is counterbalanced by the investment distortions ($X_D \neq X_b^*$) and the negative effect of bankruptcy costs on the value of the bad type after investment (captured in the term $V_b^+$). When the value of the cash flow shock is below $X_D$, the bad type prefers to invest at its first-best trigger $X_b^*$. Otherwise, the bad type prefers to mimic the good type. To determine whether issuing debt and investing below the threshold $X_D$ is an equilibrium strategy for the good firm, we also need to check its incentive compatibility constraint:

\[
V_g^+(X, c_g(X)) - I \;\ge\; \frac{\Lambda_g \Pi(X_b^*) - F}{1 + \Delta n(X_b^*; \Lambda_b)} \left(\frac{X}{X_b^*}\right)^{\xi}, \tag{28}
\]

where $V_g^+(X, c_g(X))$ is given in Eq. (21). The threshold $X_{\min,D}$ for which the incentive compatibility constraint (28) is binding represents the lowest value of the cash flow shock such that the good type prefers separation with debt over pooling with the bad type. A separating equilibrium in debt exists only if $X_{\min,D} \le X_D$. By the optimality of $X_{g,D}$ in the absence of information asymmetry, we also have $X_{\min,D} \le X_{g,D}$. Finally, for a feasible threshold $X \in [X_{\min,D}, X_D]$ to be a PBE, the good type must not want to defect to any other debt- or equity-financed investment policy. Combining these results, we obtain the following proposition:

Proposition 5. (i) There exists a separating equilibrium in which growth firms separate by issuing debt and investing the first time the cash flow shock reaches $X$ satisfying $X_{\min,D} \le X \le X_D$ so long as $\Lambda_b/\Lambda_g$, $\sigma$, and $f$ are low enough and $\mu$ is large enough. (ii) In the unique least-cost separating equilibrium with debt, good firms invest at the lower of the thresholds $X_D$ and $X_{g,D}$, given by (25), while bad firms invest at their first-best investment threshold $X_b^*$. The selected coupon payment $c_g$ and the credit spread $\rho_g$ are determined by condition (F.10) in Appendix F. Before investment, firm value is independent of project quality and satisfies:

\[
V_{lcs,D}(X) = p\, V_{g,D}(X) + (1 - p)\, V_b(X). \tag{29}
\]

(iii) Separation in debt is least-cost if and only if

\[
\frac{F + \left(1 + \nu \frac{1 - \alpha}{1 - \nu \alpha} \rho_g\right) I}{F + (1 + \nu \rho_g) I} \left(\frac{X_b^*}{X_{g,D}}\right)^{\xi}
\;\ge\;
\max\left[ \mathbf{1}_{X_g^* \le \bar{X}} \left(\frac{\Lambda_g}{\Lambda_b}\right)^{\xi} + \mathbf{1}_{X_g^* > \bar{X}}\, \frac{\Lambda_g \Pi(\bar{X}) - F}{\Lambda_b \Pi(\bar{X}) - F}\; ;\; \frac{V_{pool,g}(X)}{V_g(X)} \left(\frac{\Lambda_g}{\Lambda_b}\right)^{\xi} \right], \tag{30}
\]

where $V_{pool,g}(X)$ is defined in Proposition 4.

Proposition 5 shows that there exists an investment threshold $X_D$ solving Eq. (27) such that good types can separate from bad types by issuing debt and investing at or below that threshold. The good type will follow this strategy only if the cost of separating (i.e., the cost of issuing debt and distorting investment) is not too high compared to the underpricing of the shares. Since the value of the good type increases with the selected investment threshold (for any threshold below $X_{g,D}$), this implies that there exists a lower bound $X_{\min,D} \le X_D$ on the separating investment threshold. Finally, since distorting
investment is costly, in the least-cost separating equilibrium the good firm issues debt and invests the first time the cash flow shock reaches the lower of $X_D$ and $X_{g,D}$.⁹

⁹ With two-dimensional signals (timing and financing choice) the space of deviations to consider is now larger. This has the effect that some of the equity equilibria derived in Section 3 become incentive incompatible, since it may be profitable for the good type to deviate from accelerated investment at $\bar{X}$ to delayed investment and debt issuance, even under pessimistic out-of-equilibrium beliefs. The results in this and the next section, nonetheless, are unaffected. The parameter combinations where such deviations are relevant coincide with those satisfying condition (30), where equity is dominated by debt issuance and the debt-financed investment strategy characterized in Proposition 5 (ii) is the least-cost separating equilibrium. Similarly, debt-financed investment policies where the good type wants to deviate to equity-financed investment are dominated by equity pooling or separating equilibria.

4.3. Least-cost equilibrium

Given that several investment and financing strategies are available, one question that naturally arises is what is the value-maximizing strategy for the good type? The traditional answer to this question is that firms should first issue the securities with the lowest information costs, i.e., that informational asymmetries make conventional equity issues unattractive. Proposition 5 shows that in our model the financing choice of the good type is determined by a trade-off between investment distortions, the more severe underpricing of equity, and the deadweight costs of default associated with risky debt claims.

Fig. 3, Panel A maps out the least-cost equilibrium (as characterized in Proposition 5) as a function of the various parameters of the model. The figure shows that when $\Lambda_g/\Lambda_b$ is high, the good type finds it optimal to separate from the bad type by issuing equity, as it is too costly for the bad type to distort its investment strategy in that case. By contrast, when $\Lambda_g/\Lambda_b$ is small both types pool in equity. These results are due to the fact that the value of the option to invest is increasing and concave in the selected investment threshold so that the cost to the bad type of deviating from its first-best threshold is increasing and convex. Separation with debt issuance is the optimal strategy for intermediate values of $\Lambda_g/\Lambda_b$.

When deciding which type of security to issue, the good type balances the investment distortions associated with equity issues and the expected bankruptcy costs of debt. Fig. 3 reveals that high bankruptcy costs, high cash flow volatility, large operating leverage, or a large growth differential between types makes it more likely for the good type to issue equity. This suggests that, consistent with the evidence reported by Frank and Goyal (2003), small high-growth firms may not find it optimal to behave according to the static pecking order theory. Hence, our model can explain observed departures from this theory such as equity issuance by firms with ample debt capacity.

As in the case of equity financing, we can examine the implications of debt financing for investment distortions (Panel B) and announcement returns (Panel C). Fig. 3 plots these quantities as a function of the relative size of the growth option of bad firms, $\Lambda_b/\Lambda_g$, the volatility of the cash flow shock $\sigma$, and operating leverage $F$. Input parameter values are set as in Fig. 1. The figure shows that when separation in debt is least-cost, good firms select an investment threshold that is higher than the first-best threshold (i.e., $X_{g,D} \ge X_g^*$; see Eq. (25)). Since the timing of investment depends on the size and volatility of the cash flows generated by the firms' investment project and on operating leverage, the figure reveals that abnormal returns depend on these factors as well (Panel C). In particular, the model predicts that abnormal returns should decline with volatility $\sigma$ and the fraction $p$ of good firms, and rise with the growth differential $\Lambda_g/\Lambda_b$ and operating leverage $F$.

5. Implications and empirical predictions of the model

Over the past two decades, the literature examining the relation between external finance constraints and corporate investment has developed substantially. Most of the theoretical contributions in this area model financing frictions exogenously and loosely relate these frictions to moral hazard or adverse selection problems. Our model adds to this literature by allowing us to examine explicitly the effects of endogenous financing constraints arising from adverse selection on firms' equilibrium investment and financing strategies, abnormal returns following the announcement of corporate policy choices, and external financing costs.

5.1. Endogenous financing constraints and corporate investment

Table 1 summarizes the investment and financing behavior of the good and bad firms in each of the three types of equilibrium: separating in equity, pooling in equity, and separating in debt. The table shows that pooling equilibria in equity and separating equilibria in debt lead to late investment for the good type compared to the perfect information benchmark, while separating equilibria in equity lead to early investment.
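The early-versus-late comparison above can be made concrete with the first-passage probability of a geometric Brownian motion, the quantity formalized in Eq. (31). The sketch below is an illustration under assumed inputs: the initial cash flow level and the two thresholds are hypothetical numbers rather than equilibrium values, the normal distribution function is built from `math.erf`, and the function names are made up for the example.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def investment_probability(t, x0, k, mu, sigma):
    """Probability that a geometric Brownian motion started at x0 reaches k by time t."""
    if x0 >= k:
        return 1.0          # the threshold is already reached
    if t <= 0:
        return 0.0
    m = mu - sigma ** 2 / 2  # drift of the log cash flow shock
    s = sigma * math.sqrt(t)
    b = math.log(x0 / k)
    return (norm_cdf((b + m * t) / s)
            + (k / x0) ** (2 * m / sigma ** 2) * norm_cdf((b - m * t) / s))


if __name__ == "__main__":
    # Hypothetical thresholds: a lower (accelerated) and a higher (delayed) trigger.
    for k in (20.0, 24.0):
        print(k, [round(investment_probability(t, 10.0, k, 0.01, 0.25), 4)
                  for t in (1.0, 5.0, 10.0)])
```

A lower threshold raises the probability of investing over any horizon, which is the sense in which separation in equity speeds up investment and pooling or debt separation delays it for the good type.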
To quantify the effects of adverse selection on the timing of investment, we can examine the relation between asymmetric information and a firm's investment hazard, defined as the probability of undertaking the project as a function of time (as in Whited, 2006). In our model, the probability of investment over the next $t$ years can be computed as (see, e.g., Harrison, 1985, p. 15):

\[
F(t) = \Pr\left[\sup_{s \in [0, t]} X_s \ge K\right]
= N\!\left(\frac{\ln(X_0/K) + m t}{\sigma \sqrt{t}}\right)
+ \left(\frac{K}{X_0}\right)^{2m/\sigma^2} N\!\left(\frac{\ln(X_0/K) - m t}{\sigma \sqrt{t}}\right), \tag{31}
\]

where $m \equiv \mu - \sigma^2/2$, $N$ is the standard normal cumulative distribution function, $K = X_g^*$ or $K = X_b^*$ under perfect information, $K = \bar{X} \wedge X_g^*$ in separating equilibria, and $K = X_{pool}$ in pooling equilibria. Fig. 4 plots the probability of investment as a function of time under adverse selection for high-type (solid line) and low-type (dotted line) firms in comparison with the
[Figure 3, Panel A: regions labeled first best, equity pooling, equity separation, and debt separation, with probability p on the vertical axis. Panel B: investment threshold. Horizontal axes: growth potential of the high type Λg, growth potential of the low type Λb, growth rate μ, volatility σ, operating leverage F, bankruptcy cost α, and probability p.]
Fig. 3. Least-cost equilibrium with debt. The different panels plot the least-cost equilibrium, investment threshold, external financing costs, and abnormal announcement returns in the least-cost equilibrium for different parameter values. Panel A depicts the least-cost equilibrium as a function of the parameters $\Lambda_g$, $\Lambda_b$, $\mu$, $\sigma$, $F$, and $\alpha$ on the x-axis and of investors' beliefs about the fraction of high-type firms $p$ on the y-axis. Panel B depicts the investment threshold in the least-cost equilibrium (bold line), the first-best investment threshold (solid line), and the zero-NPV threshold (dashed line). Panel C plots the announcement returns in the least-cost equilibrium for different parameter values. Depicted is the rise in the stock price for high-type firms (solid line) and the drop for low-type firms (dashed line) at the (equity or debt) separating investment threshold. Abnormal announcement returns are given by $AR_k(\%) = (V_k^+(X) - D_k(X))/\max(V_{lcs}(X), V_{lcs,D}(X)) - 1$ for $k = g, b$ at $X = \bar{X} \wedge X_g^*$ if equity separation is least-cost and at $X = X_{g,D}$ if debt separation is least-cost. Panel D depicts the drop in firm value due to investment distortions in the least-cost equilibrium for different parameter values. The value loss is evaluated at the zero-NPV threshold, measured in percent of first-best (option) value, and given by Cost(%) $= (V_g - \max(V_{pool,g}, V_{lcs,g}, V_{g,D}))/V_g$. The base parametrization is $p = 0.5$, $\Lambda_g = 1.25$, $\Lambda_b = 1$, $r = 0.05$, $\mu = 0.01$, $\sigma = 0.25$, $\alpha = 0.25$, $I = 100$, and $F = 10/r$.
E. Morellec, N. Schürhoff / Journal of Financial Economics 99 (2011) 262–288
Fig. 3. (Continued) Panels C and D. Panel C: abnormal return (%). Panel D: cost (%). Horizontal axes: growth potential of low type $\Lambda_b$, growth potential of high type $\Lambda_g$, growth rate $\mu$, volatility $\sigma$, operating leverage $F$, and probability $p$.
probability of investment under the first-best investment policy (dashed line). We compute this probability when firms separate and when firms pool (dash-dotted line). The top chart in each panel plots the cumulative probability of investment $F(t)$ given in expression (31), while the bottom chart plots the hazard rate $F'(t)/(1-F(t))$. Across panels we vary the initial value of the cash flow shock. Input parameter values are set as in the base case environment, with the fraction of good projects given by $p = 0.5$ in the separating equilibrium and by $p = 0.7$ in the pooling equilibrium.

Fig. 4 shows that in the separating equilibrium, adverse selection speeds up investment compared to the perfect information benchmark and that the effect is quantitatively important. In the separating equilibrium, constrained firms
Table 1
Adverse selection and investment policy. The table summarizes the investment and financing behavior of the good and bad firms in each of the three types of equilibrium: separating in equity, pooling in equity, and separating in debt. It also reports the stock price reaction following the announcement of corporate policies.

| Equilibrium | Good type | Bad type |
| --- | --- | --- |
| Separation in equity | Invests early; positive abnormal returns; issues equity | Invests optimally; negative abnormal returns; issues equity |
| Separation in debt | Invests late; positive abnormal returns; issues debt | Invests optimally; negative abnormal returns; issues equity |
| Pooling in equity | Invests late; no announcement returns; issues equity | Invests early; no announcement returns; issues equity |
invest more (i.e., the hazard rate is higher) and have a higher marginal productivity of capital (i.e., only the good firms with a high $\Lambda$ accelerate investment). By contrast, Fig. 4 shows that in the pooling equilibrium, firms cannot signal their quality to outside investors and end up investing later than first best. The quantitative effect is limited, however, since pooling is optimal for the good type only if the cost of the investment distortion is not too high.

As discussed in Section 3, a similar investment pattern would emerge in a model in which frictions are generated by moral hazard. In such models, the optimal contract between the principal and the agent implies an increase in the investment threshold of the bad type to induce truthful revelation. By contrast, in the separating equilibrium, firms with positive private information speed up investment to make it more costly for firms with negative private information to mimic.

The behavior of constrained firms in the separating equilibrium is consistent with the empirical findings in Hall (1987), Evans (1987), and Dunne and Hughes (1994) that small (and presumably more financially constrained) firms invest more and grow faster than large firms. It also fits the standard folklore that smaller firms are more aggressive at entering new markets or launching new products than bigger, safer, and financially unconstrained firms. Importantly, a recent study examining private firms’ decisions to go public by Bustamante (2009) provides direct evidence supporting our theory. Her empirical analysis reveals that firm age is a significant characteristic in firms’ decisions to exercise their option to do an initial public offering (IPO). She also finds that the probability of receiving a high rating by Standard & Poor’s in the years following an IPO on the NYSE is negatively and significantly related to the age of the firm at the time of the IPO, consistent with the prediction of the model that good firms invest early.
Finally, the prediction that firms underinvest when separating in debt is consistent with the negative relation between ‘‘market leverage’’
(measured as the value of debt divided by the value of the firm) and growth options shown in the literature examining the relation between firms’ leverage choices and the composition of their investment opportunity sets (see Smith and Watts, 1992; Rajan and Zingales, 1995; or Barclay, Morellec, and Smith, 2006).

To make the analysis complete, Table 2 examines the determinants of investment hazards. To do so, we first perform the simulation experiment described in Appendix G, generating a set of 60,000 artificial firms from our model. We then regress investment hazards on a set of characteristics. Specifically, we first compute the theoretical investment hazard rates at different points in time in the simulated data. We then estimate how this hazard function depends on observed firm characteristics. Explanatory variables that accelerate investment are expected to raise hazards for small $T$ and lower hazards for large $T$, and vice versa. The construction of the explanatory variables is discussed in the appendix. Consistent with the above discussion, firms with a higher market-to-book (M/B) ratio, a higher growth potential, or a larger probability of being successful invest more readily. By contrast, cash flow volatility and bankruptcy costs diminish investment propensities.10

While our model has implications for the investment decisions of individual firms, it also has direct implications for the dispersion in industry investment rates. In particular, our theory predicts that this dispersion should be greatest when firms separate in equity. Indeed, when firms separate in equity (respectively, in debt), the investment threshold of the good type is below (above) the investment threshold in the perfect information benchmark and hence further away from (closer to) that of the bad type. As shown in Section 4, firms are more likely to separate in equity when the degree of valuation uncertainty, the volatility of cash flows, or operating leverage is higher.
By contrast, our theory predicts that the dispersion in industry investment rates should be lower in industries that are more heavily debt financed. These predictions on the dispersion in industry investment rates are unique to our theory. Another robust prediction of our model is that, since operating leverage facilitates separation for high-type firms through accelerated investment, investment rates should appear inefficiently high in high operating leverage industries (separating equilibrium) and inefficiently
10 The negative effect of the rate of cash flow growth on the investment hazard comes from the fact that we set the initial value of the cash flow shock at the zero-NPV threshold of the full information benchmark, defined by $X_g^0 = (r-\mu)(F+I)/\Lambda_g$. Since this zero-NPV threshold is more sensitive to changes in $\mu$ than the equilibrium investment threshold, an increase in $\mu$ implies that the starting value of the cash flow shock is further away from the investment threshold (thereby reducing the probability of investment). Alternatively, we could have fixed the initial value of the cash flow shock independently of its growth rate. In this case, the rate of cash flow growth would have had a positive effect on the investment hazard. However, with $X_0$ fixed, one has to either set $X_0$ very low in order not to have cases where immediate investment is optimal (which implies that the probability of investment is very low) or one gets many immediate investment cases. Importantly, the choice of the initial value for the cash flow shock has no bearing on the sign of the relation between the investment hazard and the other explanatory variables in Table 2.
[Fig. 4 plots. Top charts: investment probability (%) against time (0 to 10). Bottom charts: hazard rate (%) against time (0 to 10).]
Fig. 4. Investment probability. The figure plots the probability of investment as a function of time under adverse selection in comparison with the probability of investment under the first-best investment policy (dashed line). We compute this probability when the firms separate (good type: solid line, bad type: dotted line) and when the firms pool (dash-dotted line). The top chart in each panel plots the cumulative probability of investment $F(t)$ given in expression (31), and the bottom chart plots the hazard rate $F'(t)/(1-F(t))$. Input parameter values are set as in the base case environment with the fraction of good projects given by $p = 0.5$ in the separating equilibrium and by $p = 0.7$ in the pooling equilibrium. The initial value of the cash flow shock, $X_0$, is set to $x$ times the zero-NPV threshold, and we vary $x$ across panels. The base parametrization is $\Lambda_g = 1.25$, $\Lambda_b = 1$, $r = 0.05$, $\mu = 0.01$, $\sigma = 0.25$, $I = 100$, and $F = 10/r$.
Table 2
Determinants of corporate investment hazards. The table shows the determinants of corporate investment, as described by investment hazards (the probability of undertaking the project as a function of time). The table reports the parameter estimates from a linear regression of investment hazards at different points in time ($T$ = 1, 2, or 5) on a set of explanatory variables in simulated data. The three columns in each panel consider different specifications. The marker † next to the coefficient indicates the p-value is larger than 0.001 in a t-test of insignificance. The number of observations in each panel is 60,000.
T = 1
Market-to-book
Cash flow volatility [$\sigma$]
T= 2
2.19 0.05
1.99 0.05
0.05† 0.48 1.40 0.00
0.48 1.36 0.00
0.01
0.00† 0.03
0.53
2.13 0.03
1.16 0.01
3.48 0.05
1.20 0.03
1.36 0.03
0.79 0.02
0.18
0.33
0.40
0.52
0.48
0.27
0.09 1.77 0.02
0.10 1.61 0.02
0.14 2.52 0.04
0.84 0.22 0.02
0.87 0.19 0.02
0.77 0.39 0.03
0.00† 2.26
0.01
0.00† 2.48
0.00
3.52
0.37
2.50
1.99 0.04
0.92 0.01
0.09 1.59 1.29 0.96
0.03 0.48 0.17 0.38 0.03
Constant
0.49
†
0.00† 1.83
0.87
4.04
3.87
2.61
0.42
0.44
0.73
0.06† 0.66
1.55
R2
0.78
0.53
0.52
0.57
Firm size [$\ln(X)$]
Leverage [$F$]
Cash flow growth [$\mu$]
Default cost [$\alpha$]
Growth potential [$\Lambda_g/\Lambda_b$]
Belief [$p$]
Interaction terms:
$p \times$ Market-to-book
$p \times$ Cash flow volatility
$p \times$ Firm size
$p \times$ Leverage
$p \times$ Cash flow growth
$p \times$ Default cost
$p \times$ Growth potential
0.09
1.30 0.01† 0.77
T = 5
0.08 0.41
0.09 0.94
low in low-leverage industries (pooling equilibrium), compared to first best. This prediction is opposite to neoclassical models of investment with fixed costs of adjustment and differential operating leverage (see Whited, 2006). In a neoclassical model higher operating leverage makes firms less profitable; so they optimally
3.01 0.01† 0.05 0.28 0.16 1.59 0.04
replace capital less often, which lowers investment hazards. This difference has not been tested, and is a way to distinguish the two classes of theories.11
11 We thank the referee for pointing this out to us.
5.2. Adverse selection and ex post losses

In the perfect information benchmark, firms invest with a large option premium over the cost of investment, resulting in a significant cushion against future market downturns. As a result, the probability of real asset values falling below investment cost is very low. This prediction is clearly at odds with the recent empirical evidence on the behavior of real estate markets or on the high exit rate of firms in many industries. In the analysis below, we examine the effects of adverse selection on the potential for future investment losses and derive a number of new empirical implications.

To illustrate the effects of adverse selection on the likelihood of ex post losses, we can compute the probability that, given a value $X_t$ of the cash flow shock at the time of investment, the asset value (net of operating costs) falls below the investment cost over the next $T$ years. This probability is given by (see Harrison, 1985, p. 15):

$$\Pr\left[\inf_{s \in [t,\, t+T]} \Lambda_k \Pi(X_s) - F \le I\right] = \left(\frac{a_k}{X_t}\right)^{2m/\sigma^2} N\left[\frac{\ln(a_k/X_t) + mT}{\sigma\sqrt{T}}\right] + N\left[\frac{\ln(a_k/X_t) - mT}{\sigma\sqrt{T}}\right], \qquad (32)$$
where $a_k \equiv (r-\mu)(F+I)/\Lambda_k$, $m \equiv \mu - \sigma^2/2$, $N$ is the normal cumulative distribution function, and $X_t$ is the good type's investment threshold: the first-best threshold under perfect information and the separating threshold under asymmetric information. In the base case environment, asymmetric information increases the probability of ex post losses over a five-year horizon from 13% to 23%, showing that the effects of adverse selection can be significant. Since adverse selection problems are more severe for young, high-growth firms, an immediate consequence of the model is that these firms will invest sooner so that their investment projects will have a greater likelihood of turning out poorly.

To get more insights on the determinants of the probability of ex post losses, Fig. 5 plots this probability in the separating equilibrium (which is the focus of most empirical studies) over a five-year horizon (changing the horizon would change the magnitude of the probability but not its functional form). We focus on three determinants of the probability of ex post losses, namely operating leverage, the growth potential of the good type, and the volatility of the cash flow shock. In each plot, we report the probability of ex post losses under the first-best investment policy (dashed line) and in the separating equilibrium (solid line). In addition, we plot in Panel B the abnormal announcement returns that were computed in Fig. 1 against the loss probability for the same set of parameter values.
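As a rough numerical check of expression (32), the following Python sketch evaluates the loss probability for the base parametrization under the first-best policy. The function names are hypothetical and not part of the paper. The sketch also assumes $\Pi(X) = X/(r-\mu)$, which is implied by the definition of $a_k$ but not stated in this section; under that assumption the first-best threshold equals $\xi/(\xi-1)$ times $a_k$, where $\xi$ is the positive root defined in Appendix A.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def positive_root(r: float, mu: float, sigma: float) -> float:
    """Positive root xi of 0.5*sigma^2*y*(y-1) + mu*y - r = 0."""
    a = 0.5 * sigma**2
    b = mu - 0.5 * sigma**2
    return (-b + math.sqrt(b**2 + 4.0 * a * r)) / (2.0 * a)


def loss_probability(ratio: float, mu: float, sigma: float, horizon: float) -> float:
    """Expression (32), written in terms of ratio = a_k / X_t (0 < ratio <= 1)."""
    m = mu - 0.5 * sigma**2
    scale = sigma * math.sqrt(horizon)
    log_ratio = math.log(ratio)
    first = ratio ** (2.0 * m / sigma**2) * normal_cdf((log_ratio + m * horizon) / scale)
    second = normal_cdf((log_ratio - m * horizon) / scale)
    return first + second


# Base case: r = 0.05, mu = 0.01, sigma = 0.25, five-year horizon.
xi = positive_root(0.05, 0.01, 0.25)
# Assumption: Pi(X) = X / (r - mu), so a_k / X_t = (xi - 1) / xi at first best.
first_best_ratio = (xi - 1.0) / xi
first_best_loss = loss_probability(first_best_ratio, 0.01, 0.25, 5.0)
print(round(first_best_loss, 3))
```

Under these assumptions the first-best figure comes out close to the 13% reported in the text. The separating-equilibrium figure would additionally require the separating threshold, which is not derived in this section.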
[Fig. 5 plots. Panel A: loss probability (%) against growth potential $\Lambda_g$, volatility $\sigma$, and operating leverage $F$. Panel B: loss probability (%) against abnormal return (%).]
Fig. 5. Announcement returns and ex post losses. Panel A plots the loss probability under equity financing and separation for different parameter values. Depicted are the loss probabilities in the equity separating equilibrium for the high-type firm (solid line) and under the first-best investment policy (dashed line). The loss probability is measured by the likelihood that the asset value falls below the investment cost at some time over the next five years and is given by expression (32). Panel B plots the abnormal announcement returns under equity financing and separation against the loss probability for different parameter values. The horizontal axis measures the rise in the stock price for high-type firms at the separating investment threshold. The base parametrization is $p = 0.5$, $\Lambda_g = 1.25$, $\Lambda_b = 1$, $r = 0.05$, $\mu = 0.01$, $\sigma = 0.25$, $I = 100$, and $F = 10/r$. We focus on environments in which firms separate by issuing equity.
One interesting testable implication that comes out of our model is that the probability of ex post losses should be negatively related to the size of abnormal announcement returns at the time of investment. Indeed, we can observe in Fig. 5 that abnormal announcement returns increase with the growth potential of the good type and operating leverage, and decrease with volatility. By contrast, the probability of ex post losses decreases with the growth potential of the good type and operating leverage and increases with volatility. Similarly, because the size of announcement returns depends on the degree of valuation uncertainty (as measured by the difference in project quality) prior to the announcement of the investment decision, another testable implication of our model is that the likelihood of ex post losses should be negatively related to the degree of valuation uncertainty.

Our model also allows us to generate empirical implications on the expected time to ex post losses. Denote by $T$ the first time that real asset values fall below investment cost. We have for $\sigma^2 > 2\mu$:

$$E[T] = -\lim_{r \to 0} \frac{\partial E[e^{-rT}]}{\partial r} = \frac{1}{\sigma^2/2 - \mu} \ln\left[\frac{\Lambda_k \Pi(X_t)}{F + I}\right].$$

In this equation, $X_t$ is the first-best threshold under perfect information and the separating threshold under asymmetric information. In the separating equilibrium, an increase in the quality of the good type makes it more costly for the bad type to mimic and hence allows the good type to invest closer to first best, thereby reducing the probability of ex post losses. As a result, another testable implication of our model is that the lag between the time of investment and the first occurrence of sustained operating losses should increase with abnormal announcement returns at the time of investment and with the degree of valuation uncertainty prior to investment. These empirical predictions are again unique to our theory.
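The expected-time expression above can be evaluated directly. The short Python sketch below is illustrative only: the function name and the `value_ratio` argument, standing for $\Lambda_k \Pi(X_t)/(F+I)$, are hypothetical labels introduced here rather than notation from the paper.

```python
import math


def expected_time_to_loss(value_ratio: float, mu: float, sigma: float) -> float:
    """E[T] = ln(value_ratio) / (sigma^2/2 - mu), valid only for sigma^2 > 2*mu.

    value_ratio stands for Lambda_k * Pi(X_t) / (F + I), i.e. the asset value at
    the time of investment relative to total cost; it must be at least 1.
    """
    if sigma**2 <= 2.0 * mu:
        raise ValueError("the formula requires sigma^2 > 2*mu")
    if value_ratio < 1.0:
        raise ValueError("value_ratio must be at least 1")
    return math.log(value_ratio) / (0.5 * sigma**2 - mu)


# Base-case drift and volatility; an asset worth twice its cost at investment.
years = expected_time_to_loss(2.0, 0.01, 0.25)
print(round(years, 2))
```

A larger cushion at the time of investment (a higher `value_ratio`) lengthens the expected time to the first loss, which is the channel behind the prediction that the lag should increase as the good type invests closer to first best.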
5.3. The debt-equity choice

So far the analysis has focused on the predictions of the model for the timing of investment and the likelihood of operating losses following investment, that is, the asset side of the balance sheet. In this section, we perform a simulation experiment using the procedure described in Appendix G (see also Berk, Green, and Naik, 1999; or Strebulaev, 2007) to validate that the predictions of the model are consistent with the data on financing. For this purpose, we simulate a total of $N$ = 60,000 artificial firms and construct explanatory variables for the debt-equity choice similar to the ones used in recent empirical studies (see, e.g., Leary and Roberts, 2010). The construction of the explanatory variables is discussed in Appendix G.

In the model, the debt-equity choice is a nonlinear function of input parameter values. This relation can be linearized, yielding a binary choice equation like the one typically estimated in the empirical literature. Therefore, the specification we estimate takes the form of a simple discrete choice model for the financing vehicle. Let $e_i = 1$ when equity issuance is the least-cost financing vehicle according to the model, and $e_i = 0$ otherwise. Denote by $y_i^*$ the projection of the net benefit of equity over debt issuance on a vector $x_i$ of observable characteristics and proxies for the model parameters, defined by $y_i^* = \theta' x_i + \epsilon_i$. Then, $e_i = 1$ is equivalent to $y_i^* \ge 0$ and we have

$$\Pr(e_i = 1) = \Pr(\epsilon_i \ge -\theta' x_i). \qquad (33)$$

When $\epsilon_i$ follows a normal (logistic) distribution, estimating the parameters $\theta$ amounts to a probit (logit) regression. Table 3, Panel A summarizes the estimation results for different specifications. Consistent with the empirical literature on the financing choice between equity and debt, and in contrast to the pecking order hypothesis, the model predicts that equity issuance is more prominent in small firms with sizeable investment opportunities, that is, for small high M/B firms, and when cash flow volatility, leverage, and default costs are high (see, e.g., Leary and Roberts, 2010). The remaining columns show that these results are robust to the specification and the assumption on the error term distribution. The second column restricts the explanatory variables to observables and uses the alternative M/B definition, and the last column performs a logit estimation. The results are similar to those in the first column.

Using the estimated regression parameters $\theta$, we can compute predicted probabilities for the financing decision. These can then be used to evaluate the ability of a linear discrete choice model to replicate the choice probabilities from the structural model. We follow Leary and Roberts (2010) and first determine the empirical likelihood of an equity issuance in the simulated data, $\bar{p} = (1/N)\sum_i e_i$. The firm’s predicted financing choice is equity issuance if $\Pr(e_i = 1) \ge \bar{p}$. As in Leary and Roberts (2010), the exact choice of threshold ($\bar{p}$ rather than a 0.50 cutoff) has little impact on our conclusions since we are interested in the model’s ability to characterize financing decisions as a whole. Table 3, Panel B summarizes the prediction accuracy of the model. In 82% to 95% of cases the empirical model predicts the correct financing choice. These numbers compare favorably with the classification accuracy of probit/logit regressions applied to Compustat data.
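The classification step described above can be sketched in a few lines of Python. This is a minimal illustration on made-up numbers, not the authors' estimation code: the function names and the toy data are hypothetical, and the probit probability uses the symmetry of the standard normal, so that $\Pr(\epsilon_i \ge -\theta' x_i) = N(\theta' x_i)$.

```python
import math


def probit_probability(theta, x):
    """Pr(e_i = 1) = Pr(eps_i >= -theta'x_i) = N(theta'x_i) for standard normal eps_i."""
    index = sum(t * v for t, v in zip(theta, x))
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def classification_table(observed, predicted_prob):
    """Row-normalised accuracy table, using the sample equity share as the cutoff.

    observed: 0/1 indicators (1 = equity, 0 = debt).
    predicted_prob: fitted Pr(e_i = 1) for each firm.
    Returns {observed label: {predicted label: share of that observed group}}.
    """
    cutoff = sum(observed) / len(observed)  # p-bar = (1/N) * sum_i e_i
    counts = {0: {0: 0, 1: 0}, 1: {0: 0, 1: 0}}
    for e, prob in zip(observed, predicted_prob):
        counts[e][1 if prob >= cutoff else 0] += 1
    table = {}
    for label, row in counts.items():
        total = row[0] + row[1]
        table[label] = {k: (v / total if total else float("nan")) for k, v in row.items()}
    return table


# Toy data (hypothetical): eight firms, four of which issue equity.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
fitted = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]
table = classification_table(observed, fitted)
print(table)
```

Each row of the resulting table sums to one, which is the same layout as Table 3, Panel B: the diagonal entries are the shares of debt and equity decisions that the linear model classifies correctly.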
This confirms that the explanatory variables used in empirical studies can capture the impact of asymmetric information on financing choices.

5.4. Additional implications and empirical predictions

Our model predicts that real and financial decisions should be jointly determined and that financing decisions, abnormal announcement returns, and the costs of outside funds should depend on a number of firm- and industry-specific factors. Table 4 summarizes these predictions for the different types of equilibria. While many of our predictions on financing decisions are shared with other theories and, hence, have already been tested (see the discussion below), most of the other predictions in Table 4 are unique to our theory and provide grounds for further empirical work.

Cost of outside funds: Our model with endogenous financing constraints allows us to quantify financing costs and to relate these to observable characteristics. Fig. 3, Panel D plots the cost of financing as a function of the
Table 3
The debt-equity choice. The table shows the determinants of the debt-equity financing choice. Panel A reports parameter estimates from a linear discrete choice model for the equity issuance decision in a sample of 60,000 artificial firms simulated from our model. The first column of each specification reports the coefficient estimate, and the second the marginal effect on the choice probability. The marker † next to the coefficient indicates the p-value is larger than 0.001 in a t-test of insignificance. Panel B shows the in-sample prediction accuracy for this model.
Panel A: Parameter estimates
Probit (Spec. 1)
Probit (Spec. 2)
Logit (Spec. 1)
Logit (Spec. 2)
Market-to-book
Coef. 3.12
$\partial y/\partial x$ 0.85
Coef. 4.58
$\partial y/\partial x$ 1.01
Coef. 5.87
$\partial y/\partial x$ 0.86
Cash flow volatility [$\sigma$]
0.13
0.03
0.03
0.07† 0.27 0.63 0.05 1.18 0.64
0.01 0.22
0.21
0.26† 1.01 2.31 0.20 4.32 2.35
0.06 1.00 0.54 2.70 0.16
0.12 0.60 0.04
0.42† 1.62 4.16 0.35 7.71 4.00
– – – – – – –
– – – – – – –
– – – – – – –
– – – – – – –
– – – – – – –
Firm size [$\ln(X)$]
Leverage [$F$]
Cash flow growth [$\mu$]
Default cost [$\alpha$]
Growth potential [$\Lambda_g/\Lambda_b$]
Belief [$p$]
Interaction terms:
$p \times$ Market-to-book
$p \times$ Cash flow volatility
$p \times$ Firm size
$p \times$ Leverage
$p \times$ Cash flow growth
$p \times$ Default cost
$p \times$ Growth potential
$R^2$
Coef.
$\partial y/\partial x$
0.06† 0.24 0.61 0.05 1.13 0.59
1.54† 1.58 15.62
0.37† 0.38 3.74
20.66 9.24 2.25 13.90 214.35
4.95 2.21 0.54 3.33 51.31
– – – – – – –
6.66 13.25 1.62 22.64 23.45 5.00 2.23
1.59 3.17 0.39 5.42 5.61 1.20 0.53
0.70
0.54
0.69
0.81
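Panel B: Predictive accuracy

| Specification | Observed decision | Predicted debt | Predicted equity |
| --- | --- | --- | --- |
| Probit (Spec. 1) | Debt | 0.93 | 0.07 |
| Probit (Spec. 1) | Equity | 0.11 | 0.89 |
| Probit (Spec. 2) | Debt | 0.82 | 0.18 |
| Probit (Spec. 2) | Equity | 0.05 | 0.95 |
| Logit (Spec. 1) | Debt | 0.92 | 0.08 |
| Logit (Spec. 1) | Equity | 0.10 | 0.90 |
| Logit (Spec. 2) | Debt | 0.94 | 0.06 |
| Logit (Spec. 2) | Equity | 0.06 | 0.94 |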
Table 4
Additional empirical predictions of the model. The table summarizes the predictions of the model for the relation between informational asymmetries and the cost of external funds, firms’ financing strategies, abnormal announcement returns, and the probability of ex post losses. In the table, a + sign indicates that the variable of the corresponding row has a positive first derivative with respect to the parameter in the corresponding column. The marker % next to the sign indicates the prediction is unique to our theory.
Rows: External financing costs; Financial leverage; Abnormal announcement returns; Probability of ex post losses
Columns: Val. uncertainty $\Lambda_g/\Lambda_b$; Oper. leverage $F$; Volatility $\sigma$; Growth rate $\mu$
+ / +
= / – +
+/ +
%
%
%
various parameters of the model. The figure shows that the costs of outside funds are not constant, as often assumed in the literature on exogenous financing constraints (see, e.g., Gomes, 2001). The costs of outside funds are not even a monotonic function of the model parameters. For example, while worsening adverse selection may reduce investment in a model with exogenous financing constraints, it may actually encourage investment and discourage the use of debt. Similarly, a change in the volatility of the cash flow shock may lead to a change in the firm’s financing strategy and, hence, may not imply a monotonic response of the costs of outside funds. We explore these differences further in Table 5.
%
%
%
%
%
%
+
%
%
%
Table 5 reports the determinants of external financing costs in the least-cost equilibrium. The estimates are obtained by regressing financing costs on a set of characteristics using a sample of 60,000 artificial firms generated from the model. Consistent with Fig. 3, high market-to-book firms face higher funding costs, while cash flow growth and volatility and measures of project quality ($\Lambda_g$ and $p$) diminish overall costs.

Abnormal announcement returns: Consider next abnormal equity returns in the time window surrounding the public announcement of investment and financing decisions. The model predicts that independently of the type of separating equilibrium that prevails, abnormal
Table 5
Determinants of external financing cost. The table summarizes the determinants of external financing costs in the model. Reported are the coefficients from an ordinary-least-squares regression of a cost measure on observable characteristics and structural parameters. The dependent variable is the cost of external funding measured in percent of first-best (option) value. The cost measure is evaluated in present value terms at the zero-NPV threshold and given by $Cost(\%) = 100 \times (V_g - \max(V_{pool,g}, V_{lcs,g}, V_{g,D}))/V_g$. The p-value in a t-test of insignificance is smaller than 0.001 for each of the coefficients reported. The number of observations in each panel is 60,000.

| Variable | Specification 1 | Specification 2 | Specification 3 |
| --- | --- | --- | --- |
| Market-to-book | 3.82 | 2.88 | 5.23 |
| Cash flow volatility [$\sigma$] | 0.13 | 0.12 | 0.18 |
| Firm size [$\ln(X)$] | 0.32 | 0.52 | 0.47 |
| Leverage [$F$] | 1.62 | 1.60 | 2.03 |
| Cash flow growth [$\mu$] | 1.66 | 1.55 | 2.45 |
| Default cost [$\alpha$] | 0.01 | 0.01 | 0.01 |
| Growth potential [$\Lambda_g/\Lambda_b$] | 0.01 | – | 0.03 |
| Belief [$p$] | 2.78 | – | 1.23 |
| Interaction terms: | | | |
| $p \times$ Market-to-book | – | – | 3.94 |
| $p \times$ Cash flow volatility | – | – | 0.12 |
| $p \times$ Firm size | – | – | 0.23 |
| $p \times$ Leverage | – | – | 1.17 |
| $p \times$ Cash flow growth | – | – | 1.81 |
| $p \times$ Default cost | – | – | 0.01 |
| $p \times$ Growth potential | – | – | 0.08 |
| Constant | 2.69 | 4.09 | 2.20 |
| $R^2$ | 0.81 | 0.66 | 0.88 |
announcement returns should decrease with the growth rate and volatility of the firm’s cash flow shock. The model also predicts that abnormal announcement returns should increase with the growth differential between types, i.e., with the degree of valuation uncertainty prior to investment. Finally, the model predicts that abnormal returns should be higher with debt financing than with equity financing (see Fig. 3). While most of these predictions on abnormal announcement returns are novel, this last prediction of the model is consistent with the positive return shown in debt-for-equity exchanges (see, e.g., Masulis, 1983), even though the effect predicted by the model is small (consistent with the study of Eckbo, 1986). In addition, our model predicts that positive abnormal announcement returns following increases in capital expenditures should be limited to firms with good investment opportunities, as documented by Chan, Gau, and Wang (1995), and Chung, Wright, and Charoenwong (1998).

Characteristics of equity issuers: Our finding that firms will choose to finance some of their growth options by issuing equity is consistent with a number of recent empirical studies on firms’ investment and financing decisions (see, e.g., Fama and French, 2005; or Frank and Goyal, 2003). These studies show that equity issues are common and that equity issuers are not typically under duress. In addition, and consistent with the predictions of our theory, Gatchev, Spindt, and Tarhan (2009) find that firms issue equity to fund investments in intangible assets such as research and development and in funding internally developed investment opportunities (for which asymmetric information is more important) rather than external acquisitions. More generally, our model shows that even in the presence of asymmetric information, financing with equity is not a last resort. We show that firms do not follow the pecking order in financing decisions; they simply avoid issuing equity in ways that involve asymmetric information problems. This does not mean that asymmetric information is irrelevant. In fact, the firms that decide to issue equity signal their quality by distorting investment. However, we find that, in such situations, the implications of asymmetric information for capital structure can become quite limited.

Pecking order theory vs. trade-off theory: Finally, another relevant feature of our model is that some of its predictions on financing decisions could also arise from a standard trade-off model or from a model based on agency conflicts within the firm. In fact, the mechanism implementing the least-cost equilibrium in our model trades off the underpricing cost due to adverse selection with investment distortions and deadweight costs of bankruptcy. Based on this trade-off, our model predicts that the use of debt should decline with the quality of the good type’s investment opportunities, the volatility of the cash flow shock, bankruptcy costs, and operating leverage. The first prediction could also be generated by a model based on shareholder-debtholder conflicts (see, e.g., Myers, 1977) or based on manager-shareholder conflicts (see, e.g., Morellec, 2004). The last three predictions could be generated by a trade-off model in which optimal capital structure balances taxes with deadweight costs of bankruptcy (see, e.g., Leland, 1994). One essential difference between our model and these competing theories is that investment and financing decisions provide information to outside investors and trigger abnormal announcement returns, consistent with the empirical evidence. Another important difference is that first-best trade-off models predict that firms should
always issue some debt whereas our model can generate zero-leverage firms.

6. Conclusion

This paper develops a real options model to examine the effects of asymmetric information on investment and financing decisions when external funds are needed to finance investment. In the model, the firm’s financing and investment strategies are jointly determined and result from value-maximizing decisions. We show that by timing their decisions, corporate insiders can communicate their private information about the firm’s prospects to outside investors. In particular, we show that by accelerating investment, firms with positive private information can make it more costly for firms with negative information to mimic and, hence, get better terms on the securities they issue. We then show that this result has a wide range of empirical implications for firms’ investment and financing policies, abnormal returns following the announcement of corporate policy choices, external financing costs, and the role of firm and industry characteristics in shaping corporate policies. Some of these predictions shed light on existing findings. Others are novel and provide grounds for further empirical work on corporate policy choices.

Appendix A. Investment timing with symmetric information

The required rate of return for investing in the firm’s equity is $r$. In turn, old shareholders only receive capital gains of $E[dV_k]$ over each time interval $dt$, because the firm does not produce any cash flows before investment. Thus, in the region for the cash flow shock where there is no investment ($X < X_k$), the value of the firm’s growth option satisfies:

$$r V_k = \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 V_k}{\partial X^2} + \mu X \frac{\partial V_k}{\partial X} \quad \text{for } k = g, b. \qquad (A.1)$$

The solution of (A.1) is

$$V_k(X) = A X^{\xi} + B X^{\nu}, \qquad (A.2)$$

where $\xi$ and $\nu$ are the positive and negative roots of the equation $\frac{1}{2}\sigma^2 y(y-1) + \mu y - r = 0$. Eq. (A.2) is solved subject to the following boundary conditions. First, the value of equity at the time of investment is equal to the payoff from investment (value-matching). In addition, to ensure that investment occurs along the optimal path, the value of equity satisfies the smooth-pasting condition at the endogenous investment threshold (see Dixit and Pindyck, 1994). Finally, as the value of the cash flow shock tends to zero, the option to invest becomes worthless. In summary,

$$V_k(X)\big|_{X = X_k} = \Lambda_k \Pi(X_k) - F - I, \qquad (A.3)$$

$$\partial V_k/\partial X\big|_{X = X_k} = \Lambda_k\, \partial \Pi(X)/\partial X\big|_{X = X_k}, \qquad (A.4)$$

$$\lim_{X \to 0} V_k(X) = 0. \qquad (A.5)$$

Condition (A.5) implies $B = 0$. Condition (A.3) implies $A = [\Lambda_k \Pi(X_k) - F - I](X_k)^{-\xi}$. Simple manipulations of (A.3) and (A.4) yield the expression for $X_k$.

Appendix B. Single-crossing property
The valuation of type k when signaling by investing at X and when the perceived type is L equals Lk PðX ÞF X x Vk ðX; X , LÞ ¼ 1þ DnðX ; LÞ X x L PðX ÞF X ¼ k : ðB:1Þ ½LPðX ÞFI LPðX ÞF X This implies that we have " @ Lk PðX Þ LPðX Þ Vk ðX; X , LÞ ¼ Lk PðX ÞF LPðX ÞF @X # LPðX Þ 1 þ x Vk ðX; X , LÞ, LPðX ÞFI X " # @ PðX Þ PðX Þ Vk ðX; X , LÞ: Vk ðX; X , LÞ ¼ @L LPðX ÞFI LPðX ÞF Single-crossing can be checked as follows. Along any isovalue curve, ð@=@X ÞVk ðX; X , LÞ þ ð@=@LÞVk ðX; X , LÞð@L=@X Þ ¼ 0. The elasticity of substitution between perceived quality L and investment signal X equals Lk PðX Þ @ x Vk ðX; X , LÞX @L X L k PðX ÞF 1: ¼ @X ¼ @ @X L LPðX Þ LPðX Þ V ðX; X , LÞL @L k LPðX ÞF LPðX ÞFI ðB:2Þ Expression (B.2) depends positively (and, hence, the elasticity between the competitively required ownership dilution DnðX ; LÞ and investment threshold X depends negatively) on the type k as long as f Z 0. That is, the single-crossing property holds: 0 1 @ V ðX; X , L Þ k C @ B B@ L C 4 0: ðB:3Þ A @Lk @ @ Vk ðX; X , LÞ @X This result can be extended to a broader class of production functions. For a general exercise value, denoted vðLk ,X Þ, the valuation of type k when signaling by investing at X and when the perceived type is L equals x x vðLk ,X Þ X vðLk ,X Þ X ¼ : ½vðL,X ÞI Vk ðX; X , LÞ ¼ X 1þ DnðX ; LÞ X vðL,X Þ ðB:4Þ This implies that we have " @ v2 ðLk ,X Þ v2 ðL,X Þ Vk ðX; X , LÞ ¼ @X vðLk ,X Þ vðL,X Þ # v2 ðL,X Þ 1 þ x Vk ðX; X , LÞ, X vðL,X ÞI " # @ v1 ðL,X Þ v1 ðL,X Þ V ðX; X , LÞ ¼ Vk ðX; X , LÞ: @L k vðL,X ÞI vðL,X Þ
Hence, the term 0 11 @ " # B@LVk ðX; X , LÞC B C ¼ v2 ðLk ,X ÞX x @ @ A vðLk ,X Þ Vk ðX; X , LÞ @X ! vðLX Þ 1 v2 ðL,X Þ ½vðL,X ÞI þ v1 ðL,X ÞX I v1 ðL,X Þ depends negatively (and, hence, the elasticity between ownership dilution DnðX ; LÞ and investment threshold X depends negatively) on the type k as long as @ @Lk
v2 ðLk ,X ÞX vðLk ,X Þ
%
Vlcs,g ðXÞ 8 ! x > X Lg PðX ÞF F þ I X x > > > ½Lg PðX ÞFI ¼ > > X Lb PðX ÞF x1 X b < !x ¼ > > X Lg x F þ I X x > > ¼ ½Lg PðX g ÞFI > > Lb x1 X b : Xg %
if X o X g ,
%
%
%
%
otherwise,
ðC:3Þ and
!
x X F þI X x ðXÞ ¼ ½Lb PðX b ÞFI ¼ ¼ Vb ðXÞ: Vlcs,b x1 X b Xb
r0,
or, equivalently,
%
v1 ðLk ,X Þv2 ðLk ,X Þ
vðLk ,X Þ r
Hence, the market value of each firm before investment satisfies condition (11) for X oX , and the intrinsic values of the high-type and, respectively, low-type firm before investment are given by
v12 ðLk ,X Þ
:
ðB:5Þ
A sufficient condition for X 2 ½X min ,X to be a Perfect Bayesian Equilibrium (PBE) is that the good type has no incentive to deviate to any out-of-equilibrium allocation X given some set of out-of-equilibrium beliefs LðX Þ:
Appendix C. Investment timing in the separating equilibrium
Lg PðXÞFI
|fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl}
Value in separating equilibrium
The bad type firm is indifferent between mimicking the good type at X r X b and waiting to follow its firstbest strategy under perfect information if the incentive compatibility constraint (9) holds. After simplifications, this equation can be written as
x Lg PðX ÞF X Z max : X 1þ DnðX ; LðX ÞÞ X |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} Value under defection 4 0
ðC:4Þ
%
%
X Xb
!x "
Lg 1 1
Lb
%
xðF þ IÞX xðF þ IÞX ðx1ÞFX b
#
%
%
!
Lg X ¼ ð1xÞ þ x : Lb Xb %
The left-hand side of (C.4) is the value under separation at X. The right-hand side of (C.4), in turn, is the good type’s value when investing at the threshold X and given beliefs LðX Þ. For X to constitute a PBE strategy, it suffices to show that condition (C.4) holds under pessimistic beliefs (LðX Þ ¼ Lb 8X Þ:
ðC:1Þ
The condition for X min can be derived analogously. The threshold X min for which the incentive compatibility constraint (10) is binding represents the lowest value of the cash flow shock such that the good type prefers separation over pooling with the bad type. The threshold X min satisfies: !x Lg PðX b ÞF X min Lg PðX min ÞFI ¼ , ½Lb PðX b ÞFI Lb PðX b ÞF Xb or
%
%
Lg PðX ÞFI Z
Lg PðX ÞF X ½Lb PðX ÞFI Lb PðX ÞF X
!x
%
for all X ZX : From the bad type’s incentive compatibility (9), we have !x L PðX ÞF X Lg PðX ÞFI ¼ g ½Lb PðX b ÞFI Lb PðX ÞF Xb !x Lg PðX ÞF X ½Lb PðX ÞFI Z Lb PðX ÞF X !x Lg PðX ÞF X Z : ½Lb PðX ÞFI Lb PðX ÞF X %
%
%
%
%
%
%
%
X min Xb
!x
Lg 1 1
xðF þIÞ Lb xðF þ IÞðx1ÞF
! Lg X min ¼ ð1xÞ þ x : Lb Xb
ðC:2Þ
Investment is distorted in the separating equilibrium if at X g the left-hand side in (9) is larger than the right-hand side, i.e., !x Lb PðX g ÞF Xg , ½Lg PðX g ÞFI 4 ½Lb PðX b ÞFI Lg PðX g ÞF Xb or, equivalently, condition (13). From the bad type’s incentive compatibility (9), we have ! !x Lg PðX ÞF F þ I X Lg PðX ÞFI ¼ : Lb PðX ÞF x1 X b %
%
%
%
The first inequality stems from the optimality of X b , and the second inequality from the fact that ½Lg PðX ÞF=½Lb PðX ÞF is decreasing in X .
Appendix D. Investment timing in the pooling equilibrium Conditions (17) and (18) can be, respectively, rewritten as
Lb PðX pool ÞF F þ I X pool ½Lpool PðX pool ÞFI Z x1 X b Lpool PðX pool ÞF
!x ,
brium is related to the intrinsic value of each type as follows:
and
Lg PðX pool ÞF ½Lpool PðX pool ÞFI Lpool PðX pool ÞF
Vpool ðXÞ ¼ p
8 ! !x > > Lg PðX ÞF F þ I X pool > > > > < Lb PðX ÞF x1 X b Z !x > > Lg x F þI X pool > > > > : Lb x1 X b %
Lg PðX pool ÞF X ½Lpool PðX pool ÞFI Lpool PðX pool ÞF X pool
if X o X g , þ ð1pÞ
Lb PðX pool ÞF X ½Lpool PðX pool ÞFI Lpool PðX pool ÞF X pool
otherwise: ¼ ½Lpool PðX pool ÞFI
Since
Lb PðX ÞF ½Lg PðX pool ÞF, Lg PðX ÞF
condition (17) holds whenever condition (18) is satisfied so long as X pool Z X (and X o X g Þ. The best pooling equilibrium for a type k firm is one that maximizes the present value of the cash flows accruing to the incumbent shareholders. The objective of management is thus to pick an investment threshold solving the following optimization program: 8 !x 9 < L PðX = X k pool,k ÞF maxX : ½Lpool PðX pool,k ÞFI pool,k : Lpool PðX pool,k ÞF X pool,k ; %
%
The solution to firm k’s problem is given by the smoothpasting condition
Lk PðX pool,k Þ Lpool PðX pool,k Þ Lpool PðX pool,k Þ þ ¼ x: Lk PðX pool,k ÞF Lpool PðX pool,k ÞFI Lpool PðX pool,k ÞF ðD:1Þ The solution to firm g’s problem is given by X pool,g ¼
wg 1 Lpool
ðF þ IÞ,
ðD:2Þ
with wg 4 0 as the solution to
wg þ
F þI
"
F þI
!#
Lpool wg 1 I þ 1 Lg wg
F
I 1þ
! ¼ x,
F
wg I
or, equivalently " ! # wg x wg x F 2 F F w þ þ wg þ 1 þ I I wg ðwg 1Þ g I wg Lpool F 1 ¼ 0: ðD:3Þ Lk I A pooling equilibrium, hence, exists if and only if X o X g and condition (18) holds at X pool,g or, equivalently, if the fraction of good projects in the economy, p), is large enough that the positive root wg of Eq. (D.3) satisfies " !# Lpool wg 1 I þ 1 F %
wg
Lg
!1x
ðIwg þ FÞ Z
wg 1 wg
1 Lpool
x Lg
x
1x
x1
!x
X pool
:
ðD:5Þ
Appendix E. Adverse selection and timing constraints
%
rm
X
!x
!
%
wg
!x
%
%
Lb PðX pool ÞF Z
%
Lg PðX ÞF Lb PðX ÞF %
!
Lb Lpool
x :
ðD:4Þ
Denote by p the critical threshold at which (D.4) is binding. Finally, the market value in the pooling equili-
One essential difference between the analysis in this paper and the analysis in Myers and Majluf (1984, MM) is that MM assume that “the investment opportunity evaporates if the firm does not go ahead at time t = 0” (p. 190). In the current paper, we make the opposite assumption and consider that each firm can delay investment as much as it desires. To consider intermediate cases, suppose that if the firm does not exercise its investment opportunity promptly, the project can evaporate. Specifically, consider that with some probability $\lambda\,dt$ over the time interval $dt$ the project can disappear because, e.g., the firm’s product becomes obsolete. Under this assumption, the expected time before the project evaporates is given by

$$E[T] = \int_0^{\infty} \lambda t e^{-\lambda t}\,dt = \frac{1}{\lambda}, \tag{E.1}$$

showing that as $\lambda$ tends to infinity, the firm can no longer delay investment. As before, denote by $V_k$ the value of type $k$’s investment project and by $\Pi(x)$ the present value of a perpetual stream of cash flows $X$ starting at $X_0 = x$. Because the firm does not produce any cash flows before investment, the value of the growth option satisfies the following ordinary differential equation (ODE) (for $X < \bar{X}_k$):

$$(r + \lambda)V_k = \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 V_k}{\partial X^2} + \mu X \frac{\partial V_k}{\partial X}. \tag{E.2}$$

This ordinary differential equation is similar to the one obtained above and incorporates an additional term that reflects the impact of the timing constraint on the value of the project. This term equals $\lambda(0 - V_k)$, since with probability $\lambda\,dt$ the value of the investment opportunity will drop from $V_k$ to zero. In the perfect information benchmark, this equation is solved subject to the same no-bubbles, value-matching, and smooth-pasting conditions as before. Solving the optimization problem of shareholders yields the value of equity in the perfect information benchmark given by

$$V_k(X) = [\Lambda_k \Pi(\bar{X}_k(\lambda)) - F - I]\left(\frac{X}{\bar{X}_k(\lambda)}\right)^{\beta(\lambda)}. \tag{E.3}$$

In this equation, the value-maximizing investment threshold $\bar{X}_k(\lambda)$ is defined by

$$\bar{X}_k(\lambda) = \frac{\beta(\lambda)}{\beta(\lambda) - 1}\,\frac{r - \mu}{\Lambda_k}\,(F + I), \tag{E.4}$$

where

$$\beta(\lambda) = \frac{\sigma^2/2 - \mu}{\sigma^2} + \sqrt{\left[\frac{\mu - \sigma^2/2}{\sigma^2}\right]^2 + \frac{2(r + \lambda)}{\sigma^2}}.$$

These expressions are identical to those reported above except that the elasticity $\beta(\lambda)$ reflects the time constraint imposed on the investment decision. As the hazard rate $\lambda$ increases, $\beta(\lambda)$ increases and therefore $\beta(\lambda)/(\beta(\lambda) - 1)$ decreases. That is, firms speed up investment as $\lambda$ increases. In the limit as $\lambda \to \infty$, the factor $\beta(\lambda)/(\beta(\lambda) - 1)$ capturing the delay in investment converges to one.

Consider now the limiting case in which $\lambda \to \infty$. As long as the pooled value exceeds the investment cost, incumbent shareholders can finance the project and keep a positive fraction of the firm’s equity after investment. In particular, using the expression for the number of shares $\Delta n(X, \Lambda_{pool})$ that have to be issued at the time of investment, we have that the value of the claims of the incumbents in the pooling equilibrium is given by
$$\frac{\Lambda_k \Pi(X)}{1 + \Delta n(X, \Lambda_{pool})} = \frac{\Lambda_k \Pi(X)}{\Lambda_{pool}\Pi(X) - F}\,[\Lambda_{pool}\Pi(X) - F - I] \quad \text{for } k = g, b. \tag{E.5}$$

This expression is positive and investment creates value for the old shareholders of the two types of firms as long as the pooled value exceeds the cost of investment (i.e., $\Lambda_{pool}\Pi(X) - F > I$). By contrast, if the firms do not invest at time 0, the value of the incumbent’s claims falls to zero as the investment opportunity evaporates. We then have the following result:

Proposition 6. In the limit as $\lambda \to \infty$, both types of firms find it profitable to invest as long as the initial value of the cash flow shock satisfies

$$X_0 > \frac{I + F}{\Pi(\Lambda_{pool})}. \tag{E.6}$$

In this case, it is no longer possible to have a separating equilibrium with equity issuance. Good types reject positive NPV projects for initial values of the cash flow shock satisfying

$$\frac{I + F}{\Pi(\Lambda_g)} \le X_0 \le \frac{I + F}{\Pi(\Lambda_{pool})}. \tag{E.7}$$

This proposition shows that the model of Myers and Majluf (1984) is nested in ours. In the limit as the firm cannot postpone investment, the option value of waiting to invest vanishes and firms face a now-or-never investment decision. As long as the pooled net present value of the project is positive, both types of firms will want to invest now. For initial values of the cash flow shock satisfying (E.7), the good type will find it profitable to invest while the bad type will want to mimic. The pooled value of the firm is negative, however, and no investor would be willing to provide sufficient funds for investment. This is the standard lemon’s problem in markets with asymmetric information.

Appendix F. Separation through debt issuance

We denote the values of equity and corporate debt after investment by $E_k^+(X,c)$ and $D_k^+(X,c)$, respectively. Assuming that the firm has issued debt with coupon payment $c$, the cash flow accruing to shareholders after investment over each interval of time of length $dt$ is given by $(\Lambda_k X - f - c)\,dt$. In addition to this cash flow, shareholders receive capital gains of $E[dE_k^+]$ over each time interval. The required rate of return for investing in the firm’s equity is $r$. Applying Itô’s lemma, it is then immediate to show that the value of equity after investment satisfies the following ODE in the region for the cash flow shock where there is no default ($X > \underline{X}_k(c)$):

$$rE_k^+ = \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 E_k^+}{\partial X^2} + \mu X \frac{\partial E_k^+}{\partial X} + \Lambda_k X - f - c. \tag{F.1}$$

The solution of (F.1) is

$$E_k^+(X,c) = A X^{\xi} + B X^{\nu} + \Lambda_k \Pi(X) - F - \frac{c}{r}, \tag{F.2}$$

where $\xi$ and $\nu$ are the positive and negative roots of the equation $\frac{1}{2}\sigma^2 y(y-1) + \mu y - r = 0$, and $F = f/r$. This ordinary differential equation is solved subject to the following two boundary conditions:

$$\lim_{X \to \infty} \left[E_k^+(X,c)/X\right] < \infty, \tag{F.3}$$

$$E_k^+(X,c)\big|_{X = \underline{X}_k(c)} = 0. \tag{F.4}$$

The first condition is a standard no-bubble condition implying $A = 0$. The second condition states that equity is worthless in default, implying $B = -[\Lambda_k \Pi(\underline{X}_k(c)) - F - c/r](\underline{X}_k(c))^{-\nu}$. In addition to these two conditions, the value of equity satisfies the smooth-pasting condition $\partial E_k^+/\partial X\big|_{X = \underline{X}_k(c)} = 0$ at the endogenous default threshold (see, e.g., Leland, 1994, 1998). Solving this optimization problem yields the following expression for equity value:

$$E_k^+(X,c) = \Lambda_k \Pi(X) - F - \frac{c}{r} - \left[\Lambda_k \Pi(\underline{X}_k) - F - \frac{c}{r}\right]\left(\frac{X}{\underline{X}_k}\right)^{\nu}, \tag{F.5}$$

where the selected default threshold is given by

$$\underline{X}_k(c) = \frac{\nu}{\nu - 1}\,\frac{r - \mu}{\Lambda_k}\left(F + \frac{c}{r}\right). \tag{F.6}$$

Taking the trigger strategy $\underline{X}_k(c)$ as given, the value of corporate debt satisfies in the region for the cash flow shock where there is no default ($X > \underline{X}_k(c)$):

$$rD_k^+ = \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 D_k^+}{\partial X^2} + \mu X \frac{\partial D_k^+}{\partial X} + c. \tag{F.7}$$
This equation is solved subject to the standard no-bubbles condition limX-1 Dkþ ðX,cÞ ¼ c=r and the value-matching condition: Dkþ ðX,cÞjX ¼ X k ðcÞ ¼ ð1aÞPðX k ðcÞÞF. This condition states that when the firm defaults, the value of corporate debt is equal to the abandonment value of the firm net of default costs. Solving this valuation problem gives the value of corporate debt as i X n c hc Dk ðX,cÞ ¼ ð1aÞLk PðX k ðcÞÞ þF r r X k ðcÞ 0 1n 1n h i c c n P ðXÞ B C pool þF ¼ Lnk 1ð1aÞ @ n A : r r n1 n1 ðF:8Þ Expression (F.8) yields the following useful relation between the debt values: Lng ½Db ðX,cÞc=r ¼ Lnb ½Dg ðX,cÞc=r.
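The closed-form thresholds in Appendices E and F can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the present value function takes the form Π(x) = x/(r − μ), which is implied by the factor (r − μ)/Λ_k in (E.4) and (F.6) but is not stated in these appendices. The function names and the coupon c = 5 are hypothetical; the remaining parameter values follow the base parametrization reported in Appendix G (with F = f/r).

```python
import math


def char_roots(r, mu, sigma, lam=0.0):
    """Roots of 0.5*sigma^2*y*(y-1) + mu*y - (r + lam) = 0 as (positive, negative)."""
    a = 0.5 - mu / sigma**2
    disc = math.sqrt(a * a + 2.0 * (r + lam) / sigma**2)
    return a + disc, a - disc


def investment_threshold(Lam, r, mu, sigma, F, I, lam=0.0):
    """Investment threshold as in (E.4); lam = 0 gives the unconstrained case."""
    beta, _ = char_roots(r, mu, sigma, lam)
    return beta / (beta - 1.0) * (r - mu) / Lam * (F + I)


def default_threshold(Lam, r, mu, sigma, F, c):
    """Default threshold as in (F.6), using the negative root nu."""
    _, nu = char_roots(r, mu, sigma)
    return nu / (nu - 1.0) * (r - mu) / Lam * (F + c / r)


def equity_value(X, Lam, r, mu, sigma, F, c):
    """Equity value after investment as in (F.5); ASSUMES Pi(x) = x / (r - mu)."""
    _, nu = char_roots(r, mu, sigma)
    Xd = default_threshold(Lam, r, mu, sigma, F, c)
    if X <= Xd:
        return 0.0  # equity is worthless at and below the default threshold

    def unlevered_claim(x):
        # Lam * Pi(x) - F - c/r under the assumed form of Pi
        return Lam * x / (r - mu) - F - c / r

    return unlevered_claim(X) - unlevered_claim(Xd) * (X / Xd) ** nu


# Base parametrization from Appendix G; the coupon is a hypothetical illustration value.
r, mu, sigma, I, f = 0.05, 0.01, 0.25, 100.0, 10.0
F = f / r
Lam_g, coupon = 1.25, 5.0

for lam in (0.0, 0.1, 1.0, 10.0):
    print(lam, investment_threshold(Lam_g, r, mu, sigma, F, I, lam))
print(default_threshold(Lam_g, r, mu, sigma, F, coupon))
```

The loop shows the comparative static discussed in Appendix E: the threshold falls as the hazard rate λ rises and approaches (r − μ)(F + I)/Λ_k for large λ.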
The value of the firm after investment is now given by Vkþ ðX,cÞ ¼ Dkþ ðX,cÞ þ Ekþ ðX,cÞ, or n X F Vkþ ðX,cÞ ¼ Lk PðXÞaPðX k ðcÞÞ X k ðcÞ c ðF:9Þ ¼ Lk PðXÞFZ½ Dk ðX,cÞ, r where the coefficient Z is given by (24). The second line in (F.8) and (F.9), respectively, follow from (F.6). F.1. Investment timing with debt under symmetric information Under symmetric information the budget constraint Dk(X,ck(X)) =I implies that the coupon ck(X) selected by type k at the time of investment is given by the solution to
c n1 c a n 1n k k þF I ¼ ½Lk PðXÞn : ðF:10Þ r r Z n1 The credit spread on the debt contract is then given by rk ðXÞ ¼ ck ðXÞ=r=I1 at the date of issuance. One can now rewrite (F.9) as Vkþ ðX,ck ðXÞÞ ¼ Lk PðXÞFZIrk ðXÞ. The value function of a type k firm at any time before investment at threshold X k,D then equals !x X ðXÞ ¼ ½Lk PðX k,D ÞFIð1 þ Zrk ðX k,D ÞÞ : ðF:11Þ Vk,D X k,D The smooth-pasting condition for the optimal investment threshold X k,D requires
Using the expressions for Vb+ (X,cg), Db(X,cg), and Vb ðX D Þ given in the main text, we can rewrite this equation as 2 n 3 Lb !x ! 1ð1 Z Þ 6 Lg 7 XD XD 6 7 x ¼ ð1xÞ61 þ rg ðX D ÞI 7: 4 5 IþF Xb Xb %
%
%
%
The IC condition (28) holds with equality at some threshold $X_{\min,D} < X_{g,D}$. For all $X < X_{\min,D}$, separation is not a best response for the good type since the investment distortions required to separate from the bad type are too large compared to the underpricing in a pooling equilibrium. A separating equilibrium in debt exists only if $X_{\min,D} \le X_D$. The critical threshold $X_{\min,D}$ at which the incentive compatibility constraint (28) of the good type binds is given by the solution to
Vgþ ðX min,D ,cg ðX min,D ÞÞDg ðX min,D ,cg ðX min,D ÞÞ !x Lg PðX b ÞF X min,D , ¼ Xb 1þ DnðX b ; Lb Þ which reduces to !x ! X min,D Lb ð1xÞF X min,D x 1 1 Lg xI þ F Xb Xb Lb ZI : 1þ rg ðX min,D Þ ¼ ð1xÞ IþF Lg F.3. When is debt financing the least-cost separating equilibrium?
x½Lk PðX k,D ÞFIð1 þ Zrk ðX k,D ÞÞ ¼ Lk PðX k,D Þ ZI
@rk ðX k,D Þ @X k,D
The valuations of the good type when separating in debt, separating in equity or pooling in equity are:
X k,D : ðF:12Þ
In Eq. (F.12), applying the Implicit Function theorem to (F.10) yields 0 1 B B @rk ðX k,D Þ B ¼B B @X k,D B @1þ ðn1Þ
n rk ðX k,D Þ F 1 þ þ rk ðX k,D Þ I
C C C rk ðX k,D Þ C : C X C k,D A
F.2. Incentive compatibility

The critical threshold $X_D$ at which the incentive compatibility (IC) constraint (27) of the bad type binds is given by the solution to
%
%
%
%
1a r ÞI F þð1 þ n 1 na k ¼ F þ ð1þ nrk ÞI
Xb
Separation in equity : x X ¼ ½Lg PðX ÞFI X X ½Lg PðX g ÞFI Xg
!x
X g,D
!x Vg ðXÞ
Vb ðXÞ:
%
!
Lg PðX ÞF V ðXÞ if X o X g : Lb PðX ÞF b %
%
¼
Xg
!x
X g,D
%
ðF:13Þ
Vbþ ðX D ,cg ðX D ÞÞDb ðX D ,cg ðX D ÞÞ ¼ Vb ðX D Þ:
1a F þ ð1 þ n r ÞI 1na k F þ ð1þ nrk ÞI
%
Solving for the investment threshold yields Eq. (25). The value function then equals " # !x @rk ðX k,D Þ 1 Xk X x Vk,D ðXÞ ¼ Lk PðX k,D ÞZI X k,D x X k,D Xk @X k,D " # !x ð1nÞZrk ðX k,D Þ Xk F þI X x ¼ 1þ : x1 X k 1 þ FI þ nrk ðX k,D Þ X k,D
%
Separation in debt :
Lg Lb
x
%
Vb ðXÞ if X Z X g :
ðXÞ: Pooling in equity : Vpool,g
Appendix G. Simulation procedure

This appendix describes the procedure used to simulate the panel of firms underlying the analysis of the determinants of investment hazards, financing choices, and costs of external funds. In all of these analyses, we assume that the economy consists of $N$ firms. Each firm $i$ is characterized by the model parameters $(\Lambda_g, \Lambda_b, \sigma, \mu, \alpha, F, p)$, which may be firm- or industry-specific. The investment expenditure is normalized to $I = 100$. The risk-free rate is assumed to equal $r = 5\%$. The firms’ own parameters, $\Lambda_g$, $\sigma$, $\mu$, $\alpha$, and $F$, and the capital market’s beliefs about other firms, $\Lambda_b$ and $p$, are all allowed to vary, the latter representing differences across industries and varying economic conditions. We introduce variation across firms by drawing for each firm separate parameters from their natural domains. We also allow for correlation across firms in their respective characteristics. There are several ways in which this can be achieved. For comparability with the numerical analysis in Sections 3 and 4, we opt for perturbations of the base parametrization in Figs. 1 to 4. That is, we start with the base parametrization $\Lambda_g = 1.25$, $\Lambda_b = 1$, $\mu = 0.01$, $\sigma = 0.25$, $\alpha = 0.25$, $f = 10$ and draw each of the parameters from uniform distributions with the same bounds as in Figs. 1–5 while keeping the other parameters fixed. The belief $p$ varies from zero to one in steps of 1%. We simulate a total of $N = 60{,}000$ firms. The variables that determine investment and financing strategies in our setting are the firms’ market-to-book ratio, the firms’ growth potential (as measured by $\Lambda_g/\Lambda_b$), cash flow volatility $\sigma$, cash flow growth $\mu$, firm size (measured by the natural logarithm of cash flows $X$), operating leverage $F$, default costs $\alpha$, and the fraction $p$ of good firms in the industry/economy. We measure the market-to-book ratio either at the time of investment or at the zero-NPV threshold. Firm types are chosen randomly according to the value of $p$.

References

Barclay, M., Morellec, E., Smith, C.W., 2006. On the debt capacity of growth options. Journal of Business 79, 37–59.
Berk, J., Green, R., Naik, V., 1999. Optimal investment, growth options and security returns. Journal of Finance 54, 1553–1607.
Bustamante, C., 2009. The dynamics of going public. Unpublished working paper, London School of Economics.
Chan, S., Gau, G., Wang, K., 1995.
Stock market reaction to capital investment decisions: evidence from business relocations. Journal of Financial and Quantitative Analysis 30, 81–100. Cho, I.K., Kreps, D., 1987. Signaling games and stable equilibria. Quarterly Journal of Economics 102, 179–221. Chung, K., Wright, P., Charoenwong, C., 1998. Investment opportunities and market reaction to capital expenditure decisions. Journal of Banking and Finance 22, 41–60. Dixit, A., Pindyck, R., 1994. Investment Under Uncertainty. Princeton University Press, Princeton, NJ. Dunne, P., Hughes, A., 1994. Age, size, growth, and survival: UK companies in the 1980s. Journal of Industrial Economics 42, 115–140. Eckbo, E., 1986. Valuation effects of corporate debt offerings. Journal of Financial Economics 15, 119–151. Evans, D., 1987. The relationship between firm growth, size, and age: estimates for 100 manufacturing industries. Journal of Industrial Economics 35, 567–581. Fama, E., French, K., 2005. Financing decisions: who issues stock? Journal of Financial Economics 76, 549–582. Frank, M., Goyal, V., 2003. Testing the pecking order theory of capital structure. Journal of Financial Economics 67, 217–248. Gatchev, V., Spindt, P., Tarhan, V., 2009. How do firms finance their investments? The relative importance of equity issuance and debt contracting costs. Journal of Corporate Finance 15, 179–195.
Gomes, J., 2001. Financing investment. American Economic Review 91, 1263–1285. Grenadier, S., Wang, N., 2005. Investment timing, agency, and information. Journal of Financial Economics 75, 493–533. Grenadier, S., Malenko, A., 2010. Real options signaling games with applications to corporate finance. Unpublished working paper, Stanford University. Hackbarth, D., Mauer, D., 2010. Optimal priority structure, capital structure, and investment. Unpublished working paper, University of Illinois. Hall, B., 1987. The relationship between firm size and firm growth in the U.S. manufacturing sector. Journal of Industrial Economics 35, 583–606. Harrison, M., 1985. Brownian Motion and Stochastic Flow Systems. Krieger Publication Co, Malabar. Helwege, J., Liang, N., 1996. Is there a pecking order? Evidence from a panel of IPO firms. Journal of Financial Economics 40, 429–458. Hennessy, C., Livdan, D., Miranda, B., 2010. Repeated signalling and firm dynamics. Review of Financial Studies 23, 1981–2023. Hennessy, C., Whited, T., 2007. How costly is external financing? Evidence from a structural estimation. Journal of Finance 62, 1705–1743. Huang, J., Huang, M., 2002. How much of the corporate-treasury yield spread is due to credit risk? A new calibration approach. Unpublished working paper, Stanford University. Kedia, S., Philippon, T., 2009. The economics of fraudulent accounting. Review of Financial Studies 22, 2169–2199. Leary, M., Roberts, M., 2010. The pecking order, debt capacity, and information asymmetry. Journal of Financial Economics 95, 332–355. Leland, H., 1994. Corporate debt value, bond covenants, and optimal capital structure. Journal of Finance 49, 1213–1252. Leland, H., 1998. Agency costs, risk management, and capital structure. Journal of Finance 53, 1213–1243. Leland, H., Pyle, D., 1977. Informational asymmetries, financial structure, and financial intermediation. Journal of Finance 32, 371–387. Maskin, E., Tirole, J., 1992. 
The principal–agent relationship with informed principal II: common values. Econometrica 60, 1–42. Masulis, R., 1983. The impact of capital structure change on firm value: some estimates. Journal of Finance 38, 107–126. McConnell, J., Muscarella, C., 1985. Corporate capital expenditure decisions and the market value of the firm. Journal of Financial Economics 14, 399–422. McDonald, R., Siegel, J., 1986. The value of waiting to invest. Quarterly Journal of Economics 101, 707–728. Mello, A., Parsons, J., 1992. Measuring the agency cost of debt. Journal of Finance 47, 1887–1904. Morellec, E., 2001. Asset liquidity, capital structure and secured debt. Journal of Financial Economics 61, 173–206. Morellec, E., 2004. Can managerial discretion explain observed leverage ratios? Review of Financial Studies 17, 257–294. Morellec, E., Schürhoff, N., 2010. Dynamic investment and financing under personal taxation. Review of Financial Studies 23, 101–146. Myers, S., 1977. The determinants of corporate borrowing. Journal of Financial Economics 5, 147–175. Myers, S., Majluf, N., 1984. Corporate financing and investment decisions when firms have information that investors do not have. Journal of Financial Economics 13, 187–221. Rajan, R., Zingales, L., 1995. What do we know about capital structure? Some evidence from international data. Journal of Finance 50, 1421–1460. Smith, C., Watts, R., 1992. The investment opportunity set, and corporate financing, dividend, and compensation policies. Journal of Financial Economics 32, 262–292. Strebulaev, I., 2007. Do tests of capital structure mean what they say? Journal of Finance 62, 1747–1787. Sundaresan, S., Wang, N., 2007. Dynamic investment, capital structure, and debt overhang. Unpublished working paper, Columbia University. Whited, T., 2006. External finance constraints and the intertemporal pattern of intermittent investment. Journal of Financial Economics 81, 467–502.
Journal of Financial Economics 99 (2011) 289–307
Empty voting and the efficiency of corporate governance☆

Alon Brav¹, Richmond D. Mathews

Fuqua School of Business, Duke University, 1 Towerview Dr., Durham, NC 27708, USA
Article info

Article history: Received 11 March 2009; received in revised form 3 February 2010; accepted 2 March 2010; available online 13 October 2010

JEL classification: G34, G38

Keywords: Voting; Informed trading; Hedge funds; Corporate governance

Abstract

We model corporate voting outcomes when an informed trader, such as a hedge fund, can establish separate positions in a firm’s shares and votes (empty voting). The positions are separated by borrowing shares on the record date, hedging economic exposure, or trading between record and voting dates. We find that the trader’s presence can improve efficiency overall despite the fact that it sometimes ends up selling to a net short position and then voting to decrease firm value. An efficiency improvement is likely if other shareholders’ votes are not highly correlated with the correct decision or if it is relatively expensive to separate votes from shares on the record date. On the other hand, empty voting will tend to decrease efficiency if it is relatively inexpensive to separate votes from shares and other shareholders are likely to vote the right way. © 2010 Elsevier B.V. All rights reserved.
☆ We thank Daniel Ferreira (the referee), Alex Edmans, Ron Kaniel, Samuel Lee, Adriano Rampini, David Robinson, Zacharias Sautner, Berk Sensoy, S. Viswanathan, seminar/conference participants at Boston College, Duke, NC State, the University of Washington, Virginia Tech, the Financial Intermediaries and Markets at the Crossroads symposium, the 2009 AFA Annual Meetings, and especially Simon Gervais for helpful discussions and comments. All errors are our own.
Corresponding author. Tel.: +1 919 660 8026; fax: +1 919 660 8038.
E-mail addresses: [email protected] (A. Brav), [email protected] (R.D. Mathews).
¹ The author is also at the NBER.
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.10.005

1. Introduction

The impact of hedge funds on corporate governance has received considerable attention recently as the rise in popularity of hedge funds has coincided with an increased focus on governance in general. Much of the attention has been devoted to “activist” funds that take significant stakes in firms and then advocate for change.² However, other more subtle strategies undertaken by hedge funds or other strategic traders can also significantly affect the efficiency of the corporate governance system. In particular, recent work has shown that some funds may use “empty voting” (a practice whereby they accumulate voting power in excess of their economic share ownership) to manipulate shareholder vote outcomes and generate trading gains. This practice is possible even when one share, one vote is the explicit rule. It can be accomplished, for example, by borrowing shares of stock on the record date or hedging economic exposure in the derivatives markets. Hu and Black (2006, 2007) provide a number of examples where such behavior seems to result in perverse voting incentives. In one case, a hedge fund acquired votes by borrowing shares, then voted against a buyout proposal and apparently profited from a short position when the share price dropped following the vote.³ These authors suggest that some form of regulation, starting with additional disclosure requirements, may be necessary to curb the negative effects of such activities.

² See Kahan and Rock (2007), Clifford (2008), Klein and Zur (2009), Greenwood and Schor (2009), and Brav, Jiang, Partnoy, and Thomas (2008).
³ This incident involved a Hong Kong company named Henderson Land, which wanted to buy out a 25% minority interest in its publicly traded affiliate Henderson Investment.

Regulators have expressed significant concern over empty voting, particularly given the boom in the hedge fund industry and the increasing number and importance of items requiring a shareholder vote. The Wall Street Journal (January 26, 2007, p. A1) quotes Securities and Exchange Commission chairman Christopher Cox as saying that the practice of empty voting “is almost certainly going to force further regulatory response to ensure that investors’ interests are protected... This is already a serious issue and it is showing all signs of growing.” Many large institutional shareholders are examining their share lending practices in response to these concerns. In addition, some companies have recently amended their bylaws to force additional disclosure of complex transactions in their securities due to concerns about corporate governance implications (The Wall Street Journal, July 14, 2008, p. B4).

On the other hand, Christoffersen, Geczy, Musto, and Reed (2007) argue that “vote trading” in the share lending market can increase efficiency because information about proposals can be costly to acquire. Uninformed shareholders who are not willing to pay the cost to become informed can sell their votes to informed parties in order to increase the efficiency of the voting outcome. Of course, this argument requires that the vote buyer and vote seller have coincident interests, which often seems to be violated in the examples cited by Hu and Black (2006, 2007).

To date, there is no agreement on whether empty voting constitutes a significant problem that should be regulated. Importantly, the literature does not currently provide an integrated theoretical framework to help assess the tradeoff between increased information efficiency and the cost of possible manipulations via empty voting. In this paper, we develop a theoretical model to explore this tradeoff.
We derive the optimal share and vote position of a strategic trader that has the ability to acquire unique information about the value of a management proposal and the ability to acquire votes separately from shares. We show that while the trader may sometimes reduce efficiency by ultimately selling to a net short position and then ‘‘voting the wrong way’’ (from a firm value perspective), the cost of these possible manipulations can be offset by a greater probability that the trader will ‘‘do the right thing’’ and vote to maximize firm value. In other words, in equilibrium both the presence of the strategic trader and the ability to separate votes from economic ownership can increase overall efficiency by making the ‘‘right’’ outcome more likely. This occurs when either the establishment of an empty voting stake on the record date is relatively expensive or other shareholders’ votes are not very highly correlated with the true state. However, we find that a negative efficiency effect is likely when separating votes from shares is relatively inexpensive and other shareholders are relatively likely to vote the right way. Our analysis deals with deviations from the one share, one vote rule, on which there is a large existing literature dating back to at least Manne (1964). Much of the modern
literature focuses on how the one share, one vote rule affects the efficiency of the market for corporate control (see, e.g., Harris and Raviv, 1988; Grossman and Hart, 1988; Burkart and Lee, 2008), or how disparities between cash flow and voting rights held by insiders affect efficiency (e.g., DeAngelo and DeAngelo, 1985; Gilson, 1987). These studies generally focus on long-term deviations from one share, one vote that are codified in the corporate charter. An important recent exception is Kalay and Pant (2009), which shows that the ability to separate economic and voting interests via derivatives markets can increase efficiency by allowing shareholders to extract more surplus in a control contest. Like Kalay and Pant (2009), we examine short-term deviations arising from activities in the derivatives or share lending markets. However, we focus on how these deviations affect the efficiency of voting by outsiders on regular proposals (as opposed to control contests). We think of outsiders as parties who do not make proposals themselves, but face uncertainty over whether an insider’s proposal is value-increasing or instead self-serving. There are many types of proposals other than proxy contests or takeover bids for control that can have important value implications for the firm. Examples include proposals for the purchase of another firm, a divestiture, or a change in the corporate charter (often involving a takeover defense).
In our model, the firm’s management initially proposes an action that requires shareholder approval. The proposed action may be either good or bad (i.e., its approval may either increase or decrease firm value), and its value is not observable at this stage. All shares are initially held by atomistic shareholders.
After the proposal is announced, a single strategic trader can buy or sell shares in a transparent market prior to the record date (i.e., with no noise trading) and can also acquire ‘‘extra’’ votes in excess of its economic ownership by paying a convex cost. This cost represents, for example, increasing difficulty in finding shareholders from whom to borrow shares, or the increasing cost of finding counterparties to hedge a large economic interest. On the record date, voting interests are set according to share or vote ownership on that day. After the record date, there is a significant time lag before the actual date of the vote, so the strategic trader is able to further adjust its economic ownership (but not its voting interest) as well as learn about the value of the proposal.4 At this intermediate trading stage, however, the market is not completely transparent because there is noise trading by atomistic investors. Finally, on the voting date the strategic trader votes according to its economic incentives, as determined by its net economic position in the firm, while the voting of atomistic shareholders is effectively random. We do not explicitly model the atomistic holders’ voting decisions; the important feature is that their behavior induces randomness in the final voting outcome.
4 The timing of when the trader becomes informed does not matter—it can occur either before or after the record date with no significant change in the analysis.
Our assumptions are meant to reflect the realities of corporate governance in the United States. Christoffersen, Geczy, Musto, and Reed (2007) report that there is a significant time lag between the record date and the meeting date (a median of 54 calendar days in their sample) as opposed to the relatively short time between the announcement of the agenda and the record date. Thus, it seems reasonable that there would be little ability to trade strategically prior to the record date (which corresponds to our assumption of a transparent market at that stage), but a significant opportunity to gather information and trade less transparently between the record and voting dates. It is important to note that we highlight two ways in which empty voting can occur in the U.S. corporate governance system. In addition to the lending and derivatives markets, there is also the time lag between record and voting dates. Even if voting and economic interests have to be aligned on the record date, it is possible to divorce the two prior to the voting date by trading in the stock market during the intervening period. Our model allows us to separate the two effects. As will become clear, we find that the ease with which votes and shares can be separated on the record date is of key importance with respect to whether empty voting helps or hurts efficiency. In our model, we find that the strategic trader optimally trades to a long economic position on the record date while simultaneously acquiring ‘‘extra’’ votes, both of which set the stage for the possibility of future trading gains. The number of extra votes acquired depends on the cost of the votes versus the value (in terms of larger expected trading profits) of the increased ability to affect the vote outcome. For the bulk of the analysis, we assume that the cost of separating votes from ownership is high enough that the trader will not acquire enough votes to single-handedly determine the voting outcome. 
The trader’s economic position on the record date is driven by a separate tradeoff. On the positive side, a larger economic stake increases the trader’s voting power. This is valuable when extra votes are costly. On the negative side, greater economic ownership reduces expected trading gains for two reasons. First, the ‘‘future self’’ of the trader will be concerned with protecting the value of its stake in addition to maximizing trading gains. This ‘‘commitment effect’’ of owning an economic stake on the record date thus reduces expected future trading profits. Second, the strategic trader’s position reduces overall market depth. In equilibrium, the extent of the long position is determined by the expected amount of noise trading between the record and voting dates, and the ease with which votes can be acquired separately from shares on the record date. After the record date, the strategic trader plays a mixed strategy; it either buys additional shares and then votes to maximize firm value, or it sells to a net short position and then votes to minimize firm value. We find in our benchmark model that the presence of the strategic trader is good for efficiency overall when the other shareholders’ votes are not highly correlated with the correct decision, or when the ability to separate shares and votes on the
record date is highly restricted. Because of its long position on the record date, the strategic trader tends to ‘‘vote the right way’’ more often than not, increasing the probability of a correct decision in these situations. As market depth increases, the optimal long position on the record date increases, intensifying the positive effect. Thus, we find that allowing for trading gains by a strategic trader can increase efficiency even though the trader sometimes engages in value-reducing strategies. Also, since the strategic trader would not acquire any votes or shares in the absence of possible trading gains, noise trading improves voting efficiency. The positive efficiency effect we document for these cases is driven by the fact that the strategic trader has unique value-relevant information that other shareholders do not have. If the strategic trader brought no new information to the model, there would be no possibility of an efficiency improvement, but manipulation could still be possible. As such, our model provides a framework for determining whether and how an informed trader’s unique information is ultimately reflected in the final voting outcome and thus firm value. On the other hand, we go on to show that efficiency can be reduced by the strategic trader when other shareholders’ votes are sufficiently biased toward the correct decision and it is not too expensive to separate votes from shares on the record date. Intuitively, when it is easy to separate votes from shares on the record date, the trader chooses a smaller long position and the commitment effect is attenuated. In these cases, the trader’s efficiency-reducing votes are relatively more likely than its efficiency-enhancing votes to change the actual decision if other shareholders are likely to vote the right way on their own. Thus, the negative effects can start to outweigh the positive ones. This occurs despite the fact that the trader brings value-relevant information to the table. 
We also investigate how changes in the underlying parameters affect the trader’s strategy as well as overall efficiency. We find that making it easier to separate votes from economic ownership on the record date tends to decrease the trader’s economic ownership (in order to maximize trading gains by avoiding the commitment effect, as noted above). This increases the probability that the trader will go net short and vote the wrong way later. However, the net effect on efficiency can be positive when votes are not too cheap, since the additional votes also increase the trader’s ability to affect the voting outcome. This increased ability to affect the outcome can outweigh the higher probability of voting the wrong way. This result can be reversed, however, if votes are cheap enough and the correlation between others’ votes and the correct decision is high enough. Our results can provide some guidance on the efficacy of proposed regulatory reforms designed to curb or eliminate the negative effects of empty voting. For example, Hu and Black (2006, 2007) advocate additional disclosure requirements as a reasonable starting point. In the framework of our model, disclosure of an ‘‘empty voting’’ position on the record date would have no effect, because we already assume that the market maker
observes the strategic trader’s actions at that stage. Disclosure of a change in economic position relative to voting rights between the record and voting dates would have the effect of reducing or eliminating any trading profits the strategic trader could otherwise generate. This would reduce the trader’s willingness to gather information and vote. Thus, the efficiency effect of such a rule would depend case by case on whether the model predicted a positive or negative effect from the trader’s presence. Overall, our results imply that regulators should consider the possibility that curbs to empty voting behavior could be costly in cases where there is significant uncertainty about the value of a proposal. One issue we do not formally analyze is how the equilibrium would change if the strategic trader entered the model with an ex ante long or short position in the stock. The effect of this will be for the trader to have an initial bias toward protecting firm value (if initially long) or destroying firm value (if initially short). The trader’s anticipation of the commitment effect discussed above will then cause it to choose a record-date position that is increasing in its initial ownership. If the trader arrives long, the record-date position is increased and the commitment effect is strengthened, leading to a higher likelihood of a positive efficiency effect. However, if the trader arrives short, the record-date position is decreased, possibly even to a net short position, implying a higher probability of a negative efficiency effect. Another issue that is not explicitly considered in our model is the possibility that the quality of managers’ project decisions could be positively correlated with either their ability or the extent of agency problems in the firm. In other words, bad project proposals may be more likely when managers have low skill or in firms with more severe agency problems. 
In this situation, the strategic trader’s information about the quality of the project translates into information about the manager or firm’s overall quality, which may not be known by the market. While we do not formally model this possibility, it has interesting implications since it could result in some positive efficiency effects from empty voting even when the strategic trader votes the wrong way. In particular, when the strategic trader votes in favor of a bad proposal there will still be an efficiency cost of the proposal being more likely to be accepted, but there could be a countervailing positive effect of causing the manager’s type to be revealed more quickly than it otherwise would. This could increase efficiency if it led to greater price efficiency or an earlier opportunity for managerial discipline or turnover. Furthermore, this effect could be intensified if the strategic trader were able to condition its pre-vote trading on a good versus bad project (which we currently do not consider). In such a setting, a correlation between manager or firm type and project quality is likely to induce some asymmetry in the strategic trader’s strategy for good versus bad projects—it is likely to make the trader go short and vote the wrong way more often with bad proposals (and thus bad managers) than with good ones because the fundamental value is lower in those cases, making a short position even more attractive. We believe this is an important avenue for further research.
1.1. Related literature
Our model obviously involves a form of stock price manipulation, but where the manipulation is accomplished by affecting the firm’s real operations. Closely related are Kyle and Vila (1991), Maug (1998), and Kahn and Winton (1998). In all of these models, a strategic trader can directly take an action that will affect firm value, and its ability to trade in a noisy stock market affects its incentives to do so. The main difference between our analysis and theirs is that we endogenize the trader’s ability to affect firm value by modeling the voting game and deriving the optimal ex ante share and vote position of the trader.
Another strand of literature studies incentives for manipulation by traders when managers make investment decisions partially based on information gleaned from stock prices. In particular, Khanna and Sonti (2004) and Khanna and Marietta-Westberg (2005) show that informed traders may trade against their private information if that will send a valuable signal to managers about investment prospects. On the other hand, Goldstein and Guembel (2008) show that uninformed traders may take advantage of feedback effects and sell shares to manipulate prices negatively and generate trading gains. Attari, Banerjee, and Noe (2006) show that informed investors may have incentives to dump shares and move prices to induce shareholder activism. Other models of manipulation involving real activities include Bagnoli and Lipman (1996) and Vila (1989), both of which study manipulation involving direct actions such as a takeover bid. Manipulation based on information alone has also been studied widely, such as by Allen and Gale (1992) and Chakraborty and Yilmaz (2004).
Many of these papers contribute to a more general literature on how large shareholders affect corporate governance.
Other papers in this literature tend to focus either on blockholders who exercise ‘‘voice’’ by directly intervening in the firm’s activities (Shleifer and Vishny, 1986; Burkart, Gromb, and Panunzi, 1997), or those who use informed trading, also called ‘‘exit,’’ to improve stock price efficiency and encourage correct actions by managers (Admati and Pfleiderer, 2009; Edmans, 2009; Edmans and Manso, forthcoming). Several recent empirical papers specifically study activism by hedge funds (Kahan and Rock, 2007; Clifford, 2008; Klein and Zur, 2009; Greenwood and Schor, 2009; Brav, Jiang, Partnoy, and Thomas, 2008). Such strategies contrast sharply with the trading strategy we study in this paper, where the fund optimally hides its information about the value of a proposal while strategically using it to manipulate real outcomes and generate trading gains. Zachariadis and Olaru (2010) study the governance implications of a similarly subtle strategy, wherein hedge funds may own both debt and equity in a distressed firm, which will affect their voting on a reorganization plan. Our analysis is also closely related to the small but growing literature on vote buying. For example, Blair, Golbe, and Gerard (1989) and Neeman and Orosel (2006) show that allowing a contest for votes in addition to a contest for shares can have efficiency advantages.
Fig. 1. Timeline of the model.
However, they do not model how stock trading interacts with vote buying. Other authors have modeled trading and voting together, but without allowing for ‘‘empty voting.’’ For example, Maug (1999) models a strategic voting game in which voting and trading both help aggregate dispersed information about the value of a proposal. Musto and Yilmaz (2003) study how the operation of a financial market affects political voting. Our study is also related to papers studying the value of corporate votes. For example, Zwiebel (1995) models shareholders’ incentives to form blocks and participate in voting coalitions. Barclay and Holderness (1989) study block trades and find that there is a significant premium paid for large minority blocks, which indicates that less than majority voting control can be valuable.
The paper proceeds as follows. In Section 2 we describe the model. In Section 3 we derive the equilibrium. In Section 4 we illustrate the results with several numerical examples. We discuss implications and conclude in Section 5.
2. The model
The model focuses on a firm with an upcoming shareholder vote. The firm has one perfectly divisible share outstanding. The players, all of whom are risk neutral, consist of the management of the firm, a strategic trader (hereafter ‘‘H’’), a market maker, and atomistic shareholders. Management sets the agenda for the vote, but does not hold any shares and cannot vote. The market maker also does not vote. The discount rate is zero. The timeline of the game is illustrated in Fig. 1.
At the beginning of the game (at time 0), management proposes an action that can be either good or bad. The proposal’s ultimate approval status determines firm value, which is either $\underline{v}$ or $\bar{v}$. In particular, with a good proposal firm value equals $\underline{v}$ if the proposal is defeated and $\bar{v} = \underline{v} + \Delta v$ if the proposal is approved.
With a bad proposal, firm value is $\bar{v}$ if the proposal is defeated and $\underline{v}$ if it is approved.$^{5}$ We do not model the reason why management may make a bad proposal. As an example, it could be caused by an agency problem or a lack of ability or information on management’s part.
5 The assumption that firm value can take on only two values is made for tractability. An alternative specification, where firm value equals one if any proposal is defeated and is increased by $\Delta v$ if a good proposal is approved or decreased by $\Delta v$ if a bad proposal is approved, yields the same qualitative results but requires additional simplifying assumptions.
All players are initially uninformed about whether the proposal is good or bad. We assume that H costlessly becomes informed about whether the proposal is good or bad at some point on or before the voting date (it makes no difference whether this occurs before or after the record date). The market maker and the atomistic shareholders do not become informed.
At time 0, all shares are held by atomistic shareholders. After the proposal is announced, but before the record date (between times 0 and 1), H can submit a market order to buy or sell shares in the firm. The order is filled by the market maker at a price equal to the expected value of the shares (the market maker and H have the same information at this point, including any ‘‘extra’’ votes H may acquire, and H’s trade is transparent to the market maker). For simplicity, we assume that the market maker holds no inventory at any stage of the model. So, for example, if it sells shares to H it is immediately able to purchase shares from atomistic holders at the same price. This simplifies the analysis because it implies that all shares will be held by H and the atomistic shareholders on the record date. The important feature of the assumption is that ownership of shares by H reduces ownership by atomistic shareholders, who are the only other parties allowed to vote, and who also determine later market liquidity. We denote H’s final economic position on the record date by $\alpha_H$.
The strategic trader is also able to acquire votes in addition to those represented by its record-date share ownership. It can do this at a cost, $c(\alpha_X)$, that is increasing and convex in the number of ‘‘extra’’ votes, $\alpha_X$, as long as $\alpha_X \le 1 - \alpha_H$ (H cannot obtain more than 100% of the votes). The cost for extra votes reflects any expense H incurs in separating its voting interest from its economic ownership.
For example, the votes could be purchased on the share lending market.6 When H approaches a given atomistic shareholder to borrow its shares, H may have all of the bargaining power, and thus be able to borrow the shares at effectively zero cost; the shareholder does not believe its vote will be pivotal, and thus is willing to sell the vote for any nominal price. However, since the
6 We focus on buying votes through the share lending market for expositional simplicity. The analysis is equivalent if the empty voting position arises because H buys shares to gain votes and then hedges part of the economic exposure. In that case, the cost of extra votes becomes the cost of finding counterparties to hedge the economic exposure.
lending market is decentralized, H must first find the shareholder. Our assumption then corresponds to a convex search cost function (it becomes harder to find the next shareholder the more you have already located). For simplicity, we assume that the cost function takes the form $c(\alpha_X) = \max[0, (\alpha_X - \bar{\alpha}_X)K]$, where $\bar{\alpha}_X \ge 0$ and $K$ is very large. Thus, extra votes are free up to $\bar{\alpha}_X$ and then prohibitively expensive beyond that.$^{7}$ Consistent with these assumptions, Christoffersen, Geczy, Musto, and Reed (2007) find that the average vote sells for zero in the share lending market. Furthermore, Kolasinski, Reed, and Ringgenberg (2008) show that the share loan supply schedule is relatively flat at lower quantity levels but becomes very steep at higher levels, and that the share lending market exhibits features consistent with significant search frictions.
On the record date (at time 1), voting rights are determined according to the share and vote positions of H and the atomistic shareholders. In particular, H’s voting power is set at $\max[0, \alpha_H] + \alpha_X$ votes while the atomistic shareholders retain the remaining $1 - \max[0, \alpha_H] - \alpha_X$ votes.
Next (between times 1 and 2), a random liquidity shock may hit some of the atomistic shareholders. We assume that with probability $\tfrac{1}{2}$ the liquidity shock hits a proportion $\alpha_Z$ of the atomistic shareholders who held shares at time 0. Any shareholders hit by the liquidity shock immediately place market orders to sell all of their holdings. If the liquidity shock does not hit, there are no orders by atomistic shareholders. Since H owns $\alpha_H$ shares at time 1, this means that the total number of shares sold by atomistic holders if the liquidity shock hits is $\alpha_Z(1 - \max[0, \alpha_H])$.$^{8}$ Thus, if H buys shares before the record date, market liquidity is reduced because the pool of potential liquidity sellers is smaller.
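The record-date bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names (`RecordDatePosition`, `vote_cost`, `h_votes`, `atomistic_votes`, `liquidity_sale`) and the numerical values are assumptions introduced here, not part of the model's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordDatePosition:
    """H's position at time 1 (class and field names are illustrative)."""
    alpha_h: float  # economic position; negative values mean net short
    alpha_x: float  # "extra" votes held in excess of economic ownership


def vote_cost(alpha_x: float, alpha_x_bar: float, k: float) -> float:
    """Cost of extra votes: c(alpha_X) = max[0, (alpha_X - alpha_X_bar) * K]."""
    return max(0.0, (alpha_x - alpha_x_bar) * k)


def h_votes(pos: RecordDatePosition) -> float:
    """H's voting power: max[0, alpha_H] + alpha_X."""
    return max(0.0, pos.alpha_h) + pos.alpha_x


def atomistic_votes(pos: RecordDatePosition) -> float:
    """Votes retained by atomistic holders: 1 - max[0, alpha_H] - alpha_X."""
    return 1.0 - h_votes(pos)


def liquidity_sale(pos: RecordDatePosition, alpha_z: float) -> float:
    """Shares sold if the shock hits: alpha_Z * (1 - max[0, alpha_H])."""
    return alpha_z * (1.0 - max(0.0, pos.alpha_h))


# Hypothetical example positions: one long, one short on the record date.
long_pos = RecordDatePosition(alpha_h=0.2, alpha_x=0.1)
short_pos = RecordDatePosition(alpha_h=-0.3, alpha_x=0.1)
```

With these hypothetical numbers, a long record-date stake shrinks the potential liquidity sale relative to a short stake, which mirrors the statement that buying before the record date reduces market liquidity.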
If H sells shares before the record date, liquidity is not affected because the new atomistic holders are not subject to the liquidity shock.$^{9}$ H can also place a market order at this time to buy or sell whatever quantity it wishes (without first observing whether atomistic shareholders sell). The market maker observes only the total net order flow, $n$, which equals the sum of all orders by H and the atomistic shareholders, and sets the price at the expected firm value (given its information). There are no short sale constraints.
Finally, on the voting date (at time 2), H votes its $\max[0, \alpha_H] + \alpha_X$ votes according to its own economic incentives and information. If at least $\tfrac{1}{2}$ of the total votes are cast in favor, the proposal passes. For the $1 - \max[0, \alpha_H] - \alpha_X$ votes held by atomistic stockholders, we assume that the proportion of ‘‘yes’’ votes cast, denoted by $Y$, is distributed on $[0,1]$ according to the distribution function $F_G(\cdot)$ with associated density $f_G(\cdot) \in C^2$ for a good
7 All of the qualitative results of the paper are unchanged if a continuous, convex function is assumed for $c(\alpha_X)$, as long as the function is sufficiently ‘‘steep.’’
8 This is equivalent to assuming that the atomistic traders buy $\alpha_Z(1 - \max[0, \alpha_H])/2$ shares with probability $\tfrac{1}{2}$ and sell $\alpha_Z(1 - \max[0, \alpha_H])/2$ shares with probability $\tfrac{1}{2}$.
9 It seems reasonable to assume that a new shareholder buying just prior to the record date is less likely to face a liquidity shock in the short run than are pre-existing, long-term shareholders.
proposal, and $F_B(\cdot)$ with associated density $f_B(\cdot) \in C^2$ for a bad proposal. We assume different distributions for good versus bad proposals to allow for the possibility of correlation between atomistic stockholders’ votes and the true state. For tractability, we assume that $F_G(\cdot)$ and $F_B(\cdot)$ are always mirror images, that is, that for any $\eta \in [0, \tfrac{1}{2}]$, $F_B(\tfrac{1}{2} - \eta) = 1 - F_G(\tfrac{1}{2} + \eta)$. This implies that if $F_G(\cdot) = F_B(\cdot)$ (there is no correlation with the true state), then the common distribution must be symmetric around $\tfrac{1}{2}$, and thus the expected ex ante probability of approval if H has no votes equals $\tfrac{1}{2}$ in this case. Note that the atomistic holders’ total number of ‘‘yes’’ votes equals $Y(1 - \max[0, \alpha_H] - \alpha_X)$.
After the proposal passes or fails, the resulting value of the firm is realized and immediately reflected in the share price.
3. Equilibrium
We solve the model under the following parametric assumptions:
Assumption 1: $\alpha_Z \in (0, \tfrac{1}{2})$.
Assumption 2: $K$ is sufficiently large to deter any $\alpha_X > \bar{\alpha}_X$.
Assumption 3: $\bar{\alpha}_X < (2 - \alpha_Z)/(2(2 + \alpha_Z))$.
For the range of $\alpha_Z$ allowed by Assumption 1, the maximum value of $\bar{\alpha}_X$ allowed by Assumption 3 varies from just below $\tfrac{1}{2}$ as $\alpha_Z$ approaches zero to just above $\tfrac{3}{10}$ as $\alpha_Z$ approaches $\tfrac{1}{2}$.$^{10}$ As noted previously, we also require $\alpha_X \le 1 - \alpha_H$ since H cannot have more than 100% voting power.
10 It turns out that this ensures that H will never optimally take full control of the vote, i.e., will never set $\alpha_H \ge \tfrac{1}{2} - \alpha_X$.
We derive a perfect Bayesian equilibrium (PBE), using backward induction where possible. Behavior at the voting date (time 2) is straightforward. As noted above, a random proportion $Y$ of the atomistic shareholders vote in favor of the proposal according to $F_G(\cdot)$ or $F_B(\cdot)$. H’s vote depends on its economic position in the shares. Since there is no information asymmetry after the vote, the shares will trade at their true value of $\underline{v}$ or $\bar{v}$. Thus, if H is net long in the stock, it maximizes the value of its stake by voting in the direction that makes the $\bar{v}$ outcome more likely, i.e., in favor of a good proposal and against a bad proposal. If it is net short, it maximizes the value of its stake by voting in the opposite direction, making the $\underline{v}$ outcome more likely.
We now derive the expected value of the firm depending on whether H is net long or short on the voting date. First consider a good proposal, and assume H does not have effective voting control (that is, $\max[0, \alpha_H] + \alpha_X < \tfrac{1}{2}$). If H votes in favor of the proposal, the probability of acceptance is
\[
\Pr\!\left(\max[0, \alpha_H] + \alpha_X + Y\,(1 - \max[0, \alpha_H] - \alpha_X) > \tfrac{1}{2}\right)
= \Pr\!\left[Y > \frac{\tfrac{1}{2} - \max[0, \alpha_H] - \alpha_X}{1 - \max[0, \alpha_H] - \alpha_X}\right]
= 1 - F_G\!\left(\frac{\tfrac{1}{2} - \max[0, \alpha_H] - \alpha_X}{1 - \max[0, \alpha_H] - \alpha_X}\right). \tag{1}
\]
Similarly, if H votes against, the probability of acceptance is $1 - F_G\!\left(\tfrac{1}{2}/(1 - \max[0, \alpha_H] - \alpha_X)\right)$. Now note that if H has full voting control, the probability of acceptance is one if H votes in favor and zero if H votes against. To economize on notation, we define $\gamma \equiv \min\!\left[\tfrac{1}{2}/(1 - \max[0, \alpha_H] - \alpha_X),\, 1\right]$ and $\delta \equiv \max\!\left[(\tfrac{1}{2} - \max[0, \alpha_H] - \alpha_X)/(1 - \max[0, \alpha_H] - \alpha_X),\, 0\right]$. Now, regardless of the value of $\alpha_H$, we can write the probability of acceptance as $1 - F_G(\delta)$ if H votes in favor and $1 - F_G(\gamma)$ if H votes against. Expected firm value conditional on a good proposal and H being net long is then
\[
V_L \equiv \underline{v} + \Delta v\,(1 - F_G(\delta)). \tag{2}
\]
Similarly, expected firm value conditional on a good proposal and H being net short is
\[
V_S \equiv \underline{v} + \Delta v\,(1 - F_G(\gamma)). \tag{3}
\]
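As a rough numerical illustration of the thresholds $\gamma$ and $\delta$ and of the values $V_L$ and $V_S$, the sketch below plugs in one concrete pair of mirror-image distributions. The choice $F_G(y) = y^k$ and $F_B(y) = 1 - (1 - y)^k$ is purely an assumption made for this example (the model only requires mirror images with $C^2$ densities), and all function names and parameter values are hypothetical.

```python
from typing import Tuple


def f_good(y: float, k: int = 3) -> float:
    """Assumed CDF of Y under a good proposal: F_G(y) = y**k on [0, 1]."""
    return y ** k


def f_bad(y: float, k: int = 3) -> float:
    """Assumed mirror-image CDF under a bad proposal: F_B(y) = 1 - (1 - y)**k."""
    return 1.0 - (1.0 - y) ** k


def thresholds(alpha_h: float, alpha_x: float) -> Tuple[float, float]:
    """Return (gamma, delta) for H's record-date position."""
    h_votes = max(0.0, alpha_h) + alpha_x
    if h_votes >= 0.5:
        # Full voting control: the caps in the min/max definitions bind.
        return 1.0, 0.0
    rest = 1.0 - h_votes  # votes held by atomistic shareholders
    gamma = min(0.5 / rest, 1.0)
    delta = max((0.5 - h_votes) / rest, 0.0)
    return gamma, delta


def expected_values(alpha_h: float, alpha_x: float,
                    v_low: float, dv: float, k: int = 3) -> Tuple[float, float]:
    """Return (V_L, V_S) for a good proposal, following Eqs. (2) and (3)."""
    gamma, delta = thresholds(alpha_h, alpha_x)
    v_long = v_low + dv * (1.0 - f_good(delta, k))
    v_short = v_low + dv * (1.0 - f_good(gamma, k))
    return v_long, v_short


# Hypothetical example: H holds 20% of the shares plus 10% extra votes.
v_long, v_short = expected_values(alpha_h=0.2, alpha_x=0.1, v_low=1.0, dv=1.0)
```

Setting `k = 1` in this sketch gives the uniform case with no correlation between atomistic votes and the true state, while larger `k` tilts atomistic votes toward the correct decision.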
With a bad proposal, the probability of approval given that H is net long and votes against is $1 - F_B(\gamma)$, which implies that the probability of achieving the value $\bar{v}$ is $F_B(\gamma)$. Since $F_G(\cdot)$ and $F_B(\cdot)$ are mirror images and $\gamma = 1 - \delta$, we know that
\[
F_B(\gamma) = 1 - F_G(\delta), \tag{4}
\]
which implies that expected firm value if H is net long equals $V_L$ as derived above whether the proposal is good or bad. Using a similar argument, it is easy to see that the expected value if H is net short will always equal $V_S$ as derived above. Note that this implies that the ‘‘value wedge,’’ $V_L - V_S$, that H can generate by varying the sign of its final position on the voting date (and therefore its vote) can be expressed as
\[
V_L - V_S = \Delta v\,(F_G(\gamma) - F_G(\delta)) \ge 0, \tag{5}
\]
where the inequality follows from $\gamma \ge \delta$.
We next derive the optimal trading strategy for H between the record and voting dates (between times 1 and 2) given its economic position in the stock on the record date, $\alpha_H$, and its quantity of ‘‘extra’’ votes, $\alpha_X$. H’s continuation payoff at this stage equals the expected value of its existing position plus any expected trading profits. This implies that H will never trade a quantity at this stage that leads to an expected trading loss. Any such strategy is always dominated by not trading, letting its stake remain at $\alpha_H$ (trades at this date do not affect voting power anyway), and voting as outlined above to maximize the value of its stake. Thus, we consider only equilibria with non-negative trading profits. We assume H’s trading strategy cannot be conditioned on a good versus bad proposal. Since the values $V_L$ and $V_S$ are symmetric for good versus bad proposals, this assumption is without loss of generality.$^{11}$
11 It is easy to show that if H could condition on a good versus bad proposal at this stage, the equilibrium we derive would still exist, but as part of a family of equilibria where the total probability of H ending up net long or short is the same, but the probability conditional on a good versus bad proposal can be different. This is because of the symmetry of
We define H’s (mixed) trading strategy as a probability distribution, $\sigma(\cdot \mid \alpha_H, \alpha_X, \alpha_Z, \underline{v}, \Delta v)$, over all possible trading quantities $t \in [-1, 1]$, conditioned on the fixed parameters at this stage. The market maker’s strategy is to set a price schedule, $p(\cdot \mid \alpha_H, \alpha_X, \alpha_Z, \underline{v}, \Delta v)$, defined over all possible order flows $n$. Since we are using the PBE concept, the price must be consistent with the market maker’s belief about which node in the game tree has been reached at any nonsingleton information sets. In particular, when the market maker observes a given net order flow $n$, it has a nonsingleton information set because it is faced with two possibilities (recall that the market maker observes only the total net order flow, not individual orders): H could have placed an order of $n$ while atomistic holders did not trade, or H could have placed an order for $n + \alpha_Z(1 - \max[0, \alpha_H])$ while atomistic holders sold $\alpha_Z(1 - \max[0, \alpha_H])$. In a PBE, the market maker must have a belief over those two possibilities for every $n$ and price the trades accordingly, where the belief is determined by Bayes’ rule and the equilibrium strategies where possible.
Given the analysis above, the important element of the market maker’s belief is the probability with which it believes H’s final position is net long (which would imply a correct price of $V_L$) versus net short (which would imply a correct price of $V_S$). Thus, for every possible order flow $n$ we summarize the market maker’s belief with the function $\mu(n \mid \alpha_H, \alpha_X, \alpha_Z, \underline{v}, \Delta v)$, defined as the perceived probability given an observed net order flow $n$ that H’s unobserved trading quantity leads to a net short final position. For example, if both trades that could possibly lead to a net order flow of $n$ leave H net long given a starting record-date position of $\alpha_H$, we must have $\mu(n) = 0$.
In other words, since only trades by H of n or n þ aZ ð1max½0, aH Þ could lead to a net order flow of n, we have mðnÞ ¼ 0 if both aH þ n and aH þ n þ aZ ð1max½0, aH Þ are positive. Similarly, we have mðnÞ ¼ 1 if both trades that could possibly lead to a net order flow of n leave H net short, that is, if both aH þ n and aH þn þ aZ ð1max½0, aH Þ are negative. For all n such that the two possible trades leave H with net positions of opposite sign, we have mðnÞ 2 ½0,1. In a PBE, this belief must be determined by the relative probabilities of the two trades in equilibrium if either receives positive probability in s, and can be assigned arbitrarily if neither is played with positive probability. Note that we henceforth drop the conditioning variables from the functions defined above for notational simplicity. First, consider possible equilibria with zero expected trading profits for H. In this case, H’s continuation payoff simply equals the expected value of its existing stake, so it will optimally want to maximize the value of that position. If aH 40, this is accomplished by ensuring that H’s ‘‘future self’’ votes to maximize firm value (i.e., by ensuring that the final position is still net long), while if aH o 0 it is accomplished by ensuring a vote to minimize firm value (i.e., by ensuring that the final position is still net short). Thus, H will be indifferent between any trading
(footnote continued) the situation—H’s expected payoff to buying to remain net long versus selling to go net short are the same whether the proposal is good or bad.
296
A. Brav, R.D. Mathews / Journal of Financial Economics 99 (2011) 289–307
strategies s that have zero expected trading profits as long as s gives zero probability to trades that change the sign of H’s net position. Next, consider possible equilibria with positive expected trading profits. For any particular trading quantity t to have a positive expected trading profit, the market maker must sometimes be uncertain after observing one of the equilibrium n that could follow that trade, i.e., t or taZ ð1max½0, aH Þ, whether H will end up net long or net short. Otherwise, the price will always be correct at those n and the expected trading profit will be zero. Thus, for any t to have a positive expected profit in equilibrium, H’s strategy s must also put positive probability on another quantity that leads to the same net order flow some of the time, and that also leads to a different sign for H’s final net position. This implies that we must consider only the universe of strategies s that put positive probability on both elements of one or more ‘‘quantity pairs’’ defined by the fact that the market maker is sometimes unable to distinguish between cases where atomistic holders sell while H buys and cases where H sells while atomistic holders do not trade, and by the fact that the two trades in the pair lead to different signs for H’s final net position. For any possible such quantity pair, we define the two elements of the pair as one quantity that is bought, aB (i.e., a trading quantity of aB ), and one quantity that is sold, aS (i.e., a trading quantity of aS ), where we specify that aB 4 aS . We later show that aB , aS 4 0 is also required for a pair to feasibly be part of an equilibrium, so these are truly a buy and sell trade. For a given quantity pair, Table 1 gives a matrix of possible net order flows following trades by H and the atomistic holders. Since we have specified aB 4 aS , the only two of these four order flows that could ever be made equal are aS and aB aZ ð1max½0, aH Þ. 
Thus, we have the following constraint on the quantities within any given pair:

\[ \alpha_B - \alpha_Z(1-\max[0,\alpha_H]) = -\alpha_S. \tag{6} \]

For each possible quantity pair, the buy and sell trades will result in the same net order flow only if the atomistic holders sell when H trades $\alpha_B$ or do not sell when H trades $-\alpha_S$. We denote the associated “hidden” net order flow, after which the market maker cannot determine H’s trade, as $n_{\alpha_B} \equiv \alpha_B - \alpha_Z(1-\max[0,\alpha_H]) = -\alpha_S$. Each quantity pair can be indexed by any of these three quantities, $n_{\alpha_B}$, $\alpha_B$, or $\alpha_S$, which then determines the other two according to (6) and the definition of $n_{\alpha_B}$. In what follows, we generally index them using $n_{\alpha_B}$.

Table 1
Shows possible net order flow combinations depending on the trades of the strategic trader, H, who either buys $\alpha_B$ or sells $\alpha_S$ shares, and atomistic holders of the stock, who either do not trade or sell an aggregate quantity of $\alpha_Z(1-\max[0,\alpha_H])$.

                     | Atomistics do not trade | Atomistics sell $\alpha_Z(1-\max[0,\alpha_H])$
H buys $\alpha_B$    | $\alpha_B$              | $\alpha_B - \alpha_Z(1-\max[0,\alpha_H])$
H sells $\alpha_S$   | $-\alpha_S$             | $-\alpha_S - \alpha_Z(1-\max[0,\alpha_H])$
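The structure of Table 1 and constraint (6) can be illustrated with a short sketch. The function names and the parameter values below are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch builds the four net order flows and confirms that, once $\alpha_S$ is tied to $\alpha_B$ through (6), exactly one order flow is shared by the buy and the sell trade.

```python
def atomistic_sale(alpha_H, alpha_Z):
    """Aggregate quantity atomistic holders sell when they do trade."""
    return alpha_Z * (1 - max(0.0, alpha_H))


def paired_sell_quantity(alpha_B, alpha_H, alpha_Z):
    """alpha_S implied by constraint (6) for a given buy quantity alpha_B."""
    return atomistic_sale(alpha_H, alpha_Z) - alpha_B


def net_order_flows(alpha_B, alpha_S, alpha_H, alpha_Z):
    """The four cells of Table 1, keyed by (H's trade, atomistic action)."""
    z = atomistic_sale(alpha_H, alpha_Z)
    return {
        ("buy", "no_trade"): alpha_B,
        ("buy", "sell"): alpha_B - z,
        ("sell", "no_trade"): -alpha_S,
        ("sell", "sell"): -alpha_S - z,
    }


# Hypothetical parameter values, for illustration only.
alpha_H, alpha_Z, alpha_B = 0.05, 0.2, 0.06
alpha_S = paired_sell_quantity(alpha_B, alpha_H, alpha_Z)
flows = net_order_flows(alpha_B, alpha_S, alpha_H, alpha_Z)

for cell, n in flows.items():
    print(cell, round(n, 4))
```

In this example the cells ("buy", "sell") and ("sell", "no_trade") coincide, which is the “hidden” order flow at which the market maker cannot tell the two trades apart.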
For any given quantity pair used in equilibrium, we let $q_{\alpha_B} \in [0,1]$ denote the relative probability with which H’s strategy $\sigma$ has it selling to end up short, so we have $q_{\alpha_B} \equiv \sigma(-\alpha_S)/(\sigma(-\alpha_S) + \sigma(\alpha_B))$. Since the atomistic holders’ trades occur with equal and independent probability, this means that for quantity pairs used in equilibrium, the market maker’s belief at $n_{\alpha_B}$ must be $\mu(n_{\alpha_B}) = q_{\alpha_B}$, leading to a required price $p(n_{\alpha_B}) = q_{\alpha_B} V_S + (1-q_{\alpha_B}) V_L$. Given this, H’s expected continuation payoff for buying $\alpha_B$ shares and voting to maximize firm value is

\[ \alpha_B\left(V_L - \left(\tfrac{1}{2} V_L + \tfrac{1}{2}\left(q_{\alpha_B} V_S + (1-q_{\alpha_B}) V_L\right)\right)\right) + \alpha_H V_L \tag{7} \]

\[ = \tfrac{1}{2}\alpha_B q_{\alpha_B}(V_L - V_S) + \alpha_H V_L. \tag{8} \]

The first term in (7) is H’s expected trading profit. The expected price at which H buys shares, $\tfrac{1}{2} V_L + \tfrac{1}{2}(q_{\alpha_B} V_S + (1-q_{\alpha_B}) V_L)$, reflects the fact that half of the time, no atomistic holders sell and the net order flow is $\alpha_B$, leading to a price $p(\alpha_B) = V_L$. This must be the price since a net order flow of $n = \alpha_B$ could only be reached if H bought $\alpha_B$ or more, which means the market maker must believe H’s position on the voting date will be net long, implying $\mu(\alpha_B) = 0$.¹² The rest of the time, H’s trade successfully hides among the noise, and the market maker sets the price at the “uninformed” expected value, $p(n_{\alpha_B}) = q_{\alpha_B} V_S + (1-q_{\alpha_B}) V_L$. The second term in the equation reflects the value of H’s existing stake in the firm given that H will ultimately vote to maximize firm value. Similarly, H’s expected continuation payoff for selling and voting to minimize firm value equals

\[ (-\alpha_S)\left(V_S - \left(\tfrac{1}{2} V_S + \tfrac{1}{2}\left(q_{\alpha_B} V_S + (1-q_{\alpha_B}) V_L\right)\right)\right) + \alpha_H V_S \tag{9} \]

\[ = \tfrac{1}{2}\left(\alpha_Z(1-\max[0,\alpha_H]) - \alpha_B\right)(1-q_{\alpha_B})(V_L - V_S) + \alpha_H V_S, \tag{10} \]
where in (10) we use $-(\alpha_B - \alpha_Z(1-\max[0,\alpha_H]))$ to replace $\alpha_S$ given the constraint in (6). In this case, note that the price when atomistic holders also sell is $p(-\alpha_S - \alpha_Z(1-\max[0,\alpha_H])) = V_S$ since that $n$ could be reached only by a sale of $\alpha_S$ or more and therefore $\mu(-\alpha_S - \alpha_Z(1-\max[0,\alpha_H])) = 1$. Eq. (8) must equal (10) in order for H to be indifferent and willing to put positive probability on both elements of the pair. Now, note that for the trading profit portion of (7) to be positive, it must be the case that $\alpha_B > 0$, and similarly for the trading profit portion of (9) to be positive, it must be the case that $\alpha_B - \alpha_Z(1-\max[0,\alpha_H]) = -\alpha_S$ be negative, i.e., that $\alpha_S > 0$. As noted above, if expected trading profits for any quantity are negative, H will be better off by not trading. Thus, $\alpha_B, \alpha_S > 0$ is another necessary condition for a feasible quantity pair.

¹² As noted previously, any net order flow $n$ could result from two possible trades by H, depending on the actions of the atomistic shareholders. Thus, when a net order flow of $n = \alpha_B$ is observed, the market maker’s beliefs could conceivably place positive weight on two possible trades by H: $\alpha_B$ and $\alpha_B + \alpha_Z(1-\max[0,\alpha_H])$. The latter trade is not part of the quantity pair under consideration, but we have so far not ruled out the use of other trades in $\sigma$, so we note here that the belief will be the same even if some probability is ascribed to this quantity.

We next solve for the $q_{\alpha_B}$ that makes H indifferent between the two quantities in a given quantity pair. For any given quantity pair indexed by $n_{\alpha_B}$ to feasibly be part of an equilibrium mixed strategy, it must be the case that the associated $q_{\alpha_B} \in [0,1]$ makes H indifferent between the two quantities in the pair. Setting (8) equal to (10) and solving for $q_{\alpha_B}$ yields

\[ \tfrac{1}{2}\alpha_B q_{\alpha_B}(V_L - V_S) + \alpha_H V_L = \tfrac{1}{2}\left(\alpha_Z(1-\max[0,\alpha_H]) - \alpha_B\right)(1-q_{\alpha_B})(V_L - V_S) + \alpha_H V_S, \tag{11} \]

\[ q_{\alpha_B}\left(\tfrac{1}{2}\alpha_B + \tfrac{1}{2}\left(\alpha_Z(1-\max[0,\alpha_H]) - \alpha_B\right)\right) = \tfrac{1}{2}\left(\alpha_Z(1-\max[0,\alpha_H]) - \alpha_B\right) - \alpha_H, \tag{12} \]

\[ q_{\alpha_B} = 1 - \frac{\alpha_B + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}. \tag{13} \]

We can also express this in terms of the corresponding hidden net order flow, $n_{\alpha_B}$, by replacing $\alpha_B$ according to the identity $\alpha_B = n_{\alpha_B} + \alpha_Z(1-\max[0,\alpha_H])$ from above. This yields $q_{\alpha_B} = 1 - (n_{\alpha_B} + \alpha_Z(1-\max[0,\alpha_H]) + 2\alpha_H)/(\alpha_Z(1-\max[0,\alpha_H]))$, or, simplifying,

\[ q_{\alpha_B} = -\frac{n_{\alpha_B} + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}. \tag{14} \]
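The indifference condition can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the values of $\alpha_H$, $\alpha_Z$, $\alpha_B$, $V_L$, and $V_S$ are made up rather than taken from the paper. It evaluates the buy payoff (8) and the sell payoff (10) at the probability from (13), and confirms that (13) and (14) give the same number.

```python
def depth(alpha_H, alpha_Z):
    """alpha_Z * (1 - max[0, alpha_H]), the atomistic sale quantity."""
    return alpha_Z * (1 - max(0.0, alpha_H))


def payoff_buy(alpha_B, q, alpha_H, V_L, V_S):
    """Continuation payoff from buying alpha_B, Eq. (8)."""
    return 0.5 * alpha_B * q * (V_L - V_S) + alpha_H * V_L


def payoff_sell(alpha_B, q, alpha_H, alpha_Z, V_L, V_S):
    """Continuation payoff from selling the paired quantity, Eq. (10)."""
    d = depth(alpha_H, alpha_Z)
    return 0.5 * (d - alpha_B) * (1 - q) * (V_L - V_S) + alpha_H * V_S


def indifference_q(alpha_B, alpha_H, alpha_Z):
    """Selling probability that equates (8) and (10), Eq. (13)."""
    return 1 - (alpha_B + 2 * alpha_H) / depth(alpha_H, alpha_Z)


def indifference_q_from_flow(n, alpha_H, alpha_Z):
    """The same probability indexed by the hidden order flow, Eq. (14)."""
    return -(n + 2 * alpha_H) / depth(alpha_H, alpha_Z)


# Hypothetical values, for illustration only.
alpha_H, alpha_Z, alpha_B, V_L, V_S = 0.05, 0.2, 0.06, 1.0, 0.9
q13 = indifference_q(alpha_B, alpha_H, alpha_Z)
n_hidden = alpha_B - depth(alpha_H, alpha_Z)
q14 = indifference_q_from_flow(n_hidden, alpha_H, alpha_Z)
buy = payoff_buy(alpha_B, q13, alpha_H, V_L, V_S)
sell = payoff_sell(alpha_B, q13, alpha_H, alpha_Z, V_L, V_S)
print(q13, q14, buy, sell)
```

With these inputs the two payoffs agree to floating-point precision, as the indifference argument requires.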
Condition (14) must be satisfied by any equilibrium strategy $\sigma$ that uses the pair indexed by $n_{\alpha_B}$.

We now have three conditions that must be met for a given quantity pair to feasibly be part of an equilibrium mixed strategy for H: (a) it must be such that $\alpha_B, \alpha_S > 0$ (for both to have any chance of yielding a trading profit); (b) it must be such that H ends up net long after buying $\alpha_B$ shares and net short after selling $\alpha_S$ shares (to induce uncertainty in expected firm value); and (c) it must be such that the associated $q_{\alpha_B}$ that makes H indifferent is in the range [0,1] (for H to be indifferent between the two trades).

Condition (a) implies that $\alpha_B = n_{\alpha_B} + \alpha_Z(1-\max[0,\alpha_H]) > 0$ must hold, which translates to $n_{\alpha_B} > -\alpha_Z(1-\max[0,\alpha_H])$. It also implies that $\alpha_S = -n_{\alpha_B} > 0$ must hold. That is, feasible pairs must have $n_{\alpha_B}$ that fall into the range $[-\alpha_Z(1-\max[0,\alpha_H]), 0]$.

Turning to condition (b), for H to end up net long after buying, it must be the case that $\alpha_H + \alpha_B = \alpha_H + n_{\alpha_B} + \alpha_Z(1-\max[0,\alpha_H]) > 0$, or, rearranging, $n_{\alpha_B} > -\alpha_H - \alpha_Z(1-\max[0,\alpha_H])$. For H to end up net short after selling, it must be the case that $\alpha_H - \alpha_S = \alpha_H + n_{\alpha_B} < 0$, i.e., that $n_{\alpha_B} < -\alpha_H$. Thus, feasible pairs must also have $n_{\alpha_B}$ that fall into the range $[-\alpha_H - \alpha_Z(1-\max[0,\alpha_H]), -\alpha_H]$. Note that this range coincides with that from condition (a) when $\alpha_H = 0$, but then diverges as $\alpha_H$ rises or falls.

Finally, consider condition (c). Note from (14) that the $q_{\alpha_B}$ required for indifference within pairs is monotonically decreasing in $n_{\alpha_B}$. Thus, to find a feasible range for quantity pairs we can set (14) equal to zero and one and solve for $n_{\alpha_B}$. Setting (14) equal to one yields $n_{\alpha_B} = -2\alpha_H - \alpha_Z(1-\max[0,\alpha_H])$, while setting it equal to zero yields $n_{\alpha_B} = -2\alpha_H$. Thus, feasible pairs must also have $n_{\alpha_B}$ that fall into the range $[-2\alpha_H - \alpha_Z(1-\max[0,\alpha_H]), -2\alpha_H]$. As with condition (b), this corresponds to the range given by condition (a) when $\alpha_H = 0$, but diverges as $\alpha_H$ rises or falls (but faster than the condition (b) range diverges).
For any pair to be feasible, its $n_{\alpha_B}$ must fall within all three ranges. Thus, to find the relevant feasible range of quantity pairs for a given $\alpha_H$, we need consider simply the intersection of the ranges provided by conditions (a) and (c) (condition (b) becomes irrelevant since the range provided by condition (c) changes faster in $\alpha_H$, i.e., the intersection of the other two ranges is always contained within the range given by condition (b)). The intersection of the two ranges is defined differently depending on the sign of $\alpha_H$. First, consider $\alpha_H > 0$. Note that both limits of the range given by condition (c) are clearly decreasing in $\alpha_H$, so in this case, the lower limit of the true feasible range will be determined by condition (a) while the upper limit will be determined by condition (c), i.e., the range of $n_{\alpha_B}$ for feasible pairs must be

\[ n_{\alpha_B} \in [-\alpha_Z(1-\alpha_H),\, -2\alpha_H]. \tag{15} \]

By similar logic, the range of $n_{\alpha_B}$ for feasible pairs given $\alpha_H < 0$ must be

\[ n_{\alpha_B} \in [-2\alpha_H - \alpha_Z,\, 0]. \tag{16} \]
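As an informal check on ranges (15) and (16), the overlap of the three condition ranges can be computed directly. This is a sketch under stated assumptions: the function names are hypothetical, the endpoints are treated as closed for simplicity, and $\alpha_Z = 0.2$ with the stakes shown are illustrative values only.

```python
def condition_ranges(alpha_H, alpha_Z):
    """Ranges for the hidden order flow implied by conditions (a), (b), (c)."""
    d = alpha_Z * (1 - max(0.0, alpha_H))
    return {
        "a": (-d, 0.0),
        "b": (-alpha_H - d, -alpha_H),
        "c": (-2 * alpha_H - d, -2 * alpha_H),
    }


def feasible_range(alpha_H, alpha_Z):
    """Overlap of all three ranges, or None when the overlap is empty."""
    ranges = list(condition_ranges(alpha_H, alpha_Z).values())
    lo = max(r[0] for r in ranges)
    hi = min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None


alpha_Z = 0.2  # hypothetical market depth
long_case = feasible_range(0.05, alpha_Z)    # compare with (15)
short_case = feasible_range(-0.05, alpha_Z)  # compare with (16)
too_long = feasible_range(0.10, alpha_Z)     # stake above alpha_Z / (2 + alpha_Z)
too_short = feasible_range(-0.11, alpha_Z)   # stake below -alpha_Z / 2
print(long_case, short_case, too_long, too_short)
```

For the two interior stakes the computed overlap matches (15) and (16) respectively, and condition (b) never binds; for the two extreme stakes the overlap is empty.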
The range in (15) shrinks as $\alpha_H$ rises above zero, so an equilibrium with positive trading profits is feasible only up to the point where it collapses to a single point. This is when the limits are equal, or $-\alpha_Z(1-\alpha_H) = -2\alpha_H \Rightarrow \alpha_H = \alpha_Z/(2+\alpha_Z)$. Similarly, the range in (16) shrinks as $\alpha_H$ falls below zero, so an equilibrium with positive trading profits is feasible only down to the point where the limits are equal, or $-2\alpha_H - \alpha_Z = 0 \Rightarrow \alpha_H = -\alpha_Z/2$. Thus, equilibria with positive trading profits are feasible only for stakes $\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]$. For all other values of $\alpha_H$, the equilibrium must have H maximizing the value of its existing stake by either not trading, or always trading in a way that the sign of its position does not change, so that its voting incentives are not affected by its trade (see the proof of Lemma 1 in the Appendix for a verification of the existence of such equilibria). Note that since we have assumed $\bar{\alpha}_X < (2-\alpha_Z)/(2(2+\alpha_Z))$ (Assumption 3), the maximum voting power H can have in an equilibrium with positive trading profits is less than $(2-\alpha_Z)/(2(2+\alpha_Z)) + \alpha_Z/(2+\alpha_Z) = \tfrac{1}{2}$.

For $\alpha_H$ within the specified range, an equilibrium with positive trading profits is possible. Now we must determine which feasible quantity pairs are used by H if such an equilibrium exists. First, we find which quantity pair, if used in equilibrium, gives the highest expected payoff for H. To do this, we must simply maximize either (8) or (10) with respect to $\alpha_B$ using the $q_{\alpha_B}$ defined in (13); they will have the same payoff since this $q_{\alpha_B}$ was chosen to make them equal. Using (13) in (8) yields

\[ \tfrac{1}{2}(V_L - V_S)\,\alpha_B\left(1 - \frac{\alpha_B + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}\right) + \alpha_H V_L. \tag{17} \]

Taking the derivative of (17) with respect to $\alpha_B$ yields

\[ \tfrac{1}{2}(V_L - V_S)\left(1 - \frac{\alpha_B + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])} - \frac{\alpha_B}{\alpha_Z(1-\max[0,\alpha_H])}\right) \tag{18} \]

\[ = \tfrac{1}{2}(V_L - V_S)\left(1 - \frac{2\alpha_B + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}\right). \tag{19} \]
The second-order condition is clearly satisfied, so the maximal profit is found by solving the first-order condition
(19) = 0 for the optimal $\alpha_B$ as follows:

\[ 1 - \frac{2\alpha_B + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])} = 0, \tag{20} \]

\[ \frac{2\alpha_B}{\alpha_Z(1-\max[0,\alpha_H])} = 1 - \frac{2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}, \tag{21} \]

\[ \alpha_B^* = \frac{\alpha_Z(1-\max[0,\alpha_H])}{2} - \alpha_H. \tag{22} \]

The corresponding “hidden” net order flow is then

\[ n^* \equiv n_{\alpha_B^*} = \alpha_B^* - \alpha_Z(1-\max[0,\alpha_H]) = -\alpha_H - \frac{\alpha_Z(1-\max[0,\alpha_H])}{2}. \tag{23} \]

The corresponding probability of H selling $\alpha_S^* = \alpha_Z(1-\max[0,\alpha_H]) - \alpha_B^*$ shares as opposed to buying $\alpha_B^*$ shares, i.e., $q^* \equiv q_{\alpha_B^*} = \sigma(-\alpha_S^*)/(\sigma(-\alpha_S^*) + \sigma(\alpha_B^*))$, is given by (14) using the $n^*$ from (23), or

\[ q^* = -\frac{-\alpha_H - \frac{\alpha_Z(1-\max[0,\alpha_H])}{2} + 2\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])} = \frac{1}{2} - \frac{\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}. \tag{24} \]

Now we have the question of when the pair indexed by $n^*$ is feasible. First, note that $n^*$ is clearly within the feasible range when $\alpha_H = 0$, and it is decreasing in $\alpha_H$. As $\alpha_H$ increases above zero, $n^*$ falls, but not as quickly as the upper limit of the relevant feasible range in (15). Also, as $\alpha_H$ increases, $n^*$ is clearly always greater than the lower limit of (15). The upper limit and $n^*$ converge when $-2\alpha_H = -\alpha_H - \alpha_Z(1-\alpha_H)/2 \Rightarrow \alpha_H = \alpha_Z/(2+\alpha_Z)$. As $\alpha_H$ decreases below zero, $n^*$ increases, but not as fast as the lower limit of (16). As $\alpha_H$ approaches $-\alpha_Z/2$, the lower limit of (16) approaches zero, as does $n^*$. Thus, whenever any quantity pairs with positive trading profits are feasible, $n^*$ is one of them.

In the Appendix, we show that since the pair indexed by $n^*$ is always the most profitable feasible pair, it is the only pair used by H in equilibrium when positive trading profits are feasible. We also prove the existence of this equilibrium when $\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]$, and prove the existence of the zero trading profit equilibrium otherwise. To do so, we must consider all possible deviations by H given out of equilibrium prices set by the market maker. See the Appendix for the full details, which gives the following result.

Lemma 1. If $\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]$, then in the unique perfect Bayesian equilibrium of the post-record-date subgame, H plays a mixed strategy in which it sells $\alpha_S^*$ shares with probability $q^*$, resulting in a net short position on the voting date, and buys $\alpha_B^*$ shares with probability $(1-q^*)$, resulting in a net long position on the voting date. Otherwise, H plays a strategy with zero expected trading profits, and its final net position on the voting date has the same sign as $\alpha_H$.

We can now characterize H’s expected continuation payoff at this stage of the game. For $\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]$, H sells with probability $q^*$, leading to a total expected payoff of $q^*$ times (10) plus $(1-q^*)$ times (8) where we set $\alpha_B = \alpha_B^*$ and $q_{\alpha_B} = q^*$, or

\[ q^*\left(\tfrac{1}{2}\left(\alpha_Z(1-\max[0,\alpha_H]) - \alpha_B^*\right)(1-q^*)(V_L - V_S) + \alpha_H V_S\right) + (1-q^*)\left(\tfrac{1}{2}\alpha_B^* q^*(V_L - V_S) + \alpha_H V_L\right) \tag{25} \]

\[ = \tfrac{1}{2}(V_L - V_S)\,q^*(1-q^*)\,\alpha_Z(1-\max[0,\alpha_H]) + \alpha_H\left(q^* V_S + (1-q^*) V_L\right). \tag{26} \]

Using (24) we have

\[ q^*(1-q^*) = \left(\frac{1}{2} - \frac{\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}\right)\left(\frac{1}{2} + \frac{\alpha_H}{\alpha_Z(1-\max[0,\alpha_H])}\right) \tag{27} \]

\[ = \left(\frac{\alpha_Z(1-\max[0,\alpha_H]) - 2\alpha_H}{2\alpha_Z(1-\max[0,\alpha_H])}\right)\left(\frac{\alpha_Z(1-\max[0,\alpha_H]) + 2\alpha_H}{2\alpha_Z(1-\max[0,\alpha_H])}\right) \tag{28} \]

\[ = \frac{\alpha_Z^2(1-\max[0,\alpha_H])^2 - 4\alpha_H^2}{4\alpha_Z^2(1-\max[0,\alpha_H])^2}. \tag{29} \]
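The closed forms in (22) through (24) and (29) are easy to evaluate, and the maximizer of (17) can be checked against a coarse grid search. The sketch below is illustrative only: the function names are hypothetical, the value wedge $V_L - V_S$ is treated as a fixed made-up number (in the model it depends on H’s voting power through (5)), and the chosen stake lies inside the range of Lemma 1.

```python
def depth(alpha_H, alpha_Z):
    """alpha_Z * (1 - max[0, alpha_H])."""
    return alpha_Z * (1 - max(0.0, alpha_H))


def optimal_pair(alpha_H, alpha_Z):
    """Return (alpha_B*, alpha_S*, n*, q*) from Eqs. (22), (6), (23), (24)."""
    d = depth(alpha_H, alpha_Z)
    alpha_B = d / 2 - alpha_H
    alpha_S = d - alpha_B
    n_star = alpha_B - d
    q_star = 0.5 - alpha_H / d
    return alpha_B, alpha_S, n_star, q_star


def q_product_closed_form(alpha_H, alpha_Z):
    """q*(1 - q*) as given by Eq. (29)."""
    d = depth(alpha_H, alpha_Z)
    return (d ** 2 - 4 * alpha_H ** 2) / (4 * d ** 2)


def payoff_17(alpha_B, alpha_H, alpha_Z, V_L, V_S):
    """H's payoff from a pair indexed by alpha_B, Eq. (17)."""
    d = depth(alpha_H, alpha_Z)
    q = 1 - (alpha_B + 2 * alpha_H) / d
    return 0.5 * (V_L - V_S) * alpha_B * q + alpha_H * V_L


def expected_trading_profit(alpha_H, alpha_Z, wedge):
    """First term of Eq. (26): 0.5 * wedge * q*(1 - q*) * depth."""
    q = optimal_pair(alpha_H, alpha_Z)[3]
    return 0.5 * wedge * q * (1 - q) * depth(alpha_H, alpha_Z)


# Hypothetical values; alpha_H = 0.05 lies in [-alpha_Z/2, alpha_Z/(2 + alpha_Z)].
alpha_H, alpha_Z, V_L, V_S = 0.05, 0.2, 1.0, 0.9
alpha_B_star, alpha_S_star, n_star, q_star = optimal_pair(alpha_H, alpha_Z)

# Coarse grid search over buy quantities in [0, 0.19] as a sanity check.
grid = [i / 10000 for i in range(0, 1901)]
best_on_grid = max(grid, key=lambda b: payoff_17(b, alpha_H, alpha_Z, V_L, V_S))
profit = expected_trading_profit(alpha_H, alpha_Z, V_L - V_S)
print(alpha_B_star, n_star, q_star, best_on_grid, profit)
```

In this example the grid maximizer coincides with $\alpha_B^*$ from (22), and $q^*(1-q^*)$ computed from (24) agrees with the closed form (29).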
Substituting (29) into the first term of (26), we can rewrite (26) as

\[ \frac{(V_L - V_S)\left(\alpha_Z^2(1-\max[0,\alpha_H])^2 - 4\alpha_H^2\right)}{8\alpha_Z(1-\max[0,\alpha_H])} + \alpha_H\left(q^* V_S + (1-q^*) V_L\right). \tag{30} \]

The first term represents H’s expected trading profit, while the second is simply the expected value of its stake given its mixed strategy. For cases where trading profits are zero ($\alpha_H$ not in the range given by Lemma 1), the expected payoff is simply $\alpha_H V_L$ or $\alpha_H V_S$ depending on the sign of $\alpha_H$.

We now proceed to solve for H’s optimal share and vote trading prior to the record date (between times 0 and 1). H’s expected profits viewed from this stage of the game include its expected future trading profits plus the expected value of the stake it acquires today, less the price it pays for the stake and the cost of any “excess” votes, $c(\alpha_X)$. From above, H’s expected future trading profits are $(V_L - V_S)(\alpha_Z^2(1-\max[0,\alpha_H])^2 - 4\alpha_H^2)/(8\alpha_Z(1-\max[0,\alpha_H]))$ if $\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]$, and zero otherwise. Given our assumption that the shares H buys at this stage are priced at their true expected value, the price of the stake exactly offsets its expected value, so H chooses its economic and voting stakes solely to maximize expected future trading profits less the cost of extra votes. We thus have the objective function

\[ \max_{\alpha_H,\,\alpha_X}\; \mathbf{1}_{\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]}\left(\frac{(V_L - V_S)\left(\alpha_Z^2(1-\max[0,\alpha_H])^2 - 4\alpha_H^2\right)}{8\alpha_Z(1-\max[0,\alpha_H])}\right) - \max[0,\, (\alpha_X - \bar{\alpha}_X)K], \tag{31} \]
where $\mathbf{1}_{\alpha_H \in [-\alpha_Z/2,\, \alpha_Z/(2+\alpha_Z)]}$ is an indicator function equaling one if $\alpha_H$ is in the specified range, and $V_L - V_S$ is given by (5) above. The basic tension in H’s stake purchase decision can be seen in this expression. For a given value wedge ($V_L - V_S$), trading profits are maximized by choosing a stake of $\alpha_H = 0$. This is reflected in the term $(\alpha_Z^2(1-\max[0,\alpha_H])^2 - 4\alpha_H^2)/(8\alpha_Z(1-\max[0,\alpha_H]))$, which decreases as $\alpha_H$ rises or falls from zero. Intuitively, the larger the stake H holds (in absolute value) at the later trading stage, the more it will worry at that point about protecting the value of that stake rather than generating trading gains. Furthermore, positive economic ownership by H offsets ownership by atomistic shareholders, and thus reduces market depth and the potential for profitable trading. On the other hand, the greater the (positive) stake, the more voting power H has, so the larger is the value wedge it can generate. This is reflected in the expression given in (5) for $V_L - V_S$, in which $\gamma$ is increasing in $\alpha_H$ and $\delta$ is decreasing in $\alpha_H$ over the range for which positive trading profits are feasible. The optimal stake trades off these effects. The optimal amount of extra votes, $\alpha_X$, affects trading profits only indirectly, through the increased value wedge. Thus, this positive effect is weighed against the direct cost $\max[0,\, (\alpha_X - \bar{\alpha}_X)K]$. Analyzing (31) yields the following result.

Proposition 1. The strategic trader’s optimal record-date share and vote position is characterized by a long economic position, $0 \le \alpha_H < \alpha_Z/(2+\alpha_Z)$, and an optimal number of “extra” votes $\alpha_X = \bar{\alpha}_X$.

The solution for H’s optimal quantity of extra votes, $\alpha_X = \bar{\alpha}_X$, is trivial given the votes’ usefulness in generating a value wedge together with the assumed cost function. The economic stake, $\alpha_H$, is determined by the tradeoff between voting power and trading gains. Going short at this stage is never optimal since it confers no votes and has no effect on liquidity, but reduces future trading gains by biasing H toward value destruction. A long position reduces trading gains both by reducing liquidity and by biasing H toward value creation (the commitment effect). Indeed, if the stake gets too large (larger than $\alpha_Z/(2+\alpha_Z)$), the desire to protect the value of the stake will completely overcome any incentive to profit by trading ($q^*$ goes to zero). Thus, when choosing an ex ante stake purchased at its expected value, H will significantly limit the size of the stake purchase. However, the stake is generally positive since a long position increases H’s voting power and enhances its ability to generate a value wedge. Note that without further assumptions on the other shareholders’ vote distributions, we cannot show that there is a unique optimal $\alpha_H$, just that it is weakly positive and within the specified range. Also, the only instances where H does not buy a strictly positive stake are when $\bar{\alpha}_X$ is sufficiently large and/or the distribution of Y is sufficiently concentrated that the extra votes are essentially worthless.

Now that we have the solution to H’s share and vote purchase strategy, it remains to determine how H’s actions will affect efficiency. We measure efficiency by the probability with which the correct value-maximizing decision is made. H votes to minimize firm value with probability $q^*$, so the ex ante probability of the correct decision is

\[ q^*\left(1 - F_G(\gamma)\right) + (1-q^*)\left(1 - F_G(\delta)\right). \tag{32} \]

Here, we again use the mirror image property of $F_G(\cdot)$ and $F_B(\cdot)$ to express the probability in terms of $F_G(\cdot)$ alone regardless of the true state. Analyzing (32) yields the following result.
Proposition 2. Whenever $F_G(\cdot) = F_B(\cdot)$, the presence of the strategic trader (weakly) increases the ex ante probability of a correct decision.

Thus, despite the fact that H will seek to generate trading profits by sometimes going net short, voting the wrong way, and manipulating the firm’s decisions to decrease value, its presence overall is actually beneficial to the firm from an ex ante perspective whenever other shareholders’ votes are uncorrelated with the true state. (The only times this is not true are in the exceptional cases discussed above where $\alpha_H = 0$, in which case H’s presence does not affect efficiency.) This is because the positive economic position H takes in order to increase voting power and help generate a value wedge causes it to vote the right way more often than not. Thus, the more rare cases where H manipulates negatively can be seen as the “price to be paid” for greater overall voting efficiency when H’s information is particularly valuable. Note that if there were no liquidity and thus no possibility of trading gains ($\alpha_Z = 0$), H would have no incentive to purchase shares or votes, or to try to learn the value of the proposal. It is the possibility of trading gains introduced by noise trade that induces the information gathering and thereby increases efficiency.

The key to the unambiguous nature of Proposition 2 is that the “noise” induced by the random votes is centered around the threshold when $F_G(\cdot) = F_B(\cdot)$. In other words, the expected proportion of yes votes by atomistic stockholders is the same as the acceptance threshold, $\tfrac{1}{2}$. When that is not the case, it is possible for H’s presence to reduce overall efficiency. Thus, a correct interpretation of Proposition 2 is that finding a negative overall efficiency effect will require that atomistic shareholders’ votes be correlated with the actual value of the proposal. We now proceed to investigate these issues in greater depth.
In order to do so, we must make further assumptions on the distributions $F_G(\cdot)$ and $F_B(\cdot)$. For the remainder of this section, we assume that the underlying probability density functions, $f_G(\cdot)$ and $f_B(\cdot)$, are linear on [0,1]. Further, we assume that $f_G(\cdot)$ is (weakly) positively sloped while $f_B(\cdot)$ is (weakly) negatively sloped by the same magnitude (to maintain the distributions’ mirror image quality). Thus, we have $f_G(y) = 1 + a(y - \tfrac{1}{2})$, where $a \in [0,2]$ is the slope parameter, and $f_B(y) = 1 - a(y - \tfrac{1}{2})$. When $a = 0$ the distributions are uniform, and as $a$ increases the probability of acceptance increases for a good proposal and decreases for a bad proposal. Using these properties, we derive the following comparative statics for the optimal stake size.

Proposition 3. Assume $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then there is a unique optimal stake size $\alpha_H > 0$. The optimal stake size is increasing in $\alpha_Z$, decreasing in $\bar{\alpha}_X$, and unaffected by $\Delta v$ and $a$.

The greater the market depth ($\alpha_Z$), the greater is the potential for trading profits, and the more equity H will purchase, in general, in order to take advantage. On the other hand, as votes become easier to purchase directly, i.e., $\bar{\alpha}_X$ increases, control can be achieved by other means, so a smaller economic stake is taken to reduce the commitment effect and increase future trading gains. The importance of the proposal, $\Delta v$, does not affect H’s stake purchase decision. The magnitude of $\Delta v$ will certainly affect the magnitude of trading profits H is able to generate, but it does not affect the tradeoff between trading profits and the ability to generate a value wedge, which determines the optimal stake size. Finally, the fact that changes in $a$, which measures the correlation between other shareholders’ votes and the true state, do not affect the optimal stake $\alpha_H > 0$ is a function of the linear distributional assumption.

Next we consider how efficiency is affected. First, we derive results on how the probability of a correct decision depends on H’s economic and voting positions. We derive the first result without taking into account the equilibrium choices of $\alpha_H$ and $\alpha_X$ (though we do assume that H does not have full voting control, i.e., $\alpha_H + \alpha_X < \tfrac{1}{2}$, as is always true in equilibrium).

Proposition 4. Assume $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then the probability of a correct decision is increasing in $\alpha_H$. When $a > 0$, the probability of a correct decision is increasing in $\alpha_X$ for sufficiently large $\alpha_H$, and decreasing in $\alpha_X$ for sufficiently small $\alpha_H$.

An increase in $\alpha_H$ increases both H’s voting power and the probability that H will vote to maximize firm value, tending to reinforce a positive efficiency effect. On the other hand, an increase in $\alpha_X$ increases only H’s voting power, so its effect on overall efficiency depends on H’s propensity for voting the right way, measured by the extent of its long position $\alpha_H$. These findings lead directly to the following important result.

Corollary 1. Assume $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then if $\bar{\alpha}_X$ is sufficiently low, H’s presence always (weakly) increases efficiency.

When H’s voting power is closely tied to its economic stake on the record date, we see that allowing H to “play games” by sometimes selling to a net short position and voting to destroy value always increases efficiency. In other words, allowing for empty voting stakes generated between the record and voting dates always improves efficiency as long as the record-date stakes are close enough to being equal, implying that the commitment effect is large. Similar to Proposition 2, the negative outcomes are the price to be paid for inducing H to participate and contribute its information to the voting process.

Next we consider how efficiency is affected as other shareholders’ votes become more correlated with the true state.

Proposition 5. Assume $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then the probability of a correct decision is increasing in $a$, but the rate of increase decreases in both $\alpha_H$ and $\alpha_X$.

As expected, as others’ votes become more “informed,” efficiency improves. However, this is attenuated when H controls more votes. As shown in Proposition 3, changes in $a$ do not affect $\alpha_H$ in equilibrium, so Proposition 5 provides the equilibrium result that efficiency will not rise as quickly with $a$ when H is present versus when there is no strategic trader. Putting Propositions 4 and 5 together with Corollary 1 leads to the following conclusion.

Corollary 2. Assume $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then H’s presence is more likely to decrease efficiency the larger is $a$, and a negative efficiency effect requires a sufficiently large $\bar{\alpha}_X$.

Not surprisingly, H is more likely to reduce efficiency as other shareholders are more likely to arrive at the correct decision on their own. Furthermore, as noted above, a negative efficiency effect requires that a significant empty voting stake be created on the record date.

Finally, we relax the conditions that ensure H never optimally takes full control of the vote by looking at the extreme case where separating votes from ownership at the record date is effectively unlimited ($\bar{\alpha}_X \ge \tfrac{1}{2}$). The equilibrium is as follows.

Proposition 6. Assume $\bar{\alpha}_X \ge \tfrac{1}{2}$ and $f_G(\cdot)$ and $f_B(\cdot)$ are linear on [0,1]. Then H will not trade in the stock prior to the record date (i.e., $\alpha_H = 0$), but will accumulate sufficient votes to determine the election outcome (i.e., $\alpha_X \ge \tfrac{1}{2}$). The probability of H selling short and voting to minimize firm value will be $\tfrac{1}{2}$ in equilibrium, and H’s presence will not affect ex ante efficiency if $F_G(\cdot) = F_B(\cdot)$, but will decrease efficiency if $F_G(\cdot) \ne F_B(\cdot)$.

This result reflects the fact that H’s trading gains are maximized when its stake is zero. Since buying enough votes to swing the election maximizes the value wedge, $V_L - V_S$, there is no longer any reason for H to take a position in the stock on the record date. It maximizes its trading profits by buying/selling with equal probability. It is interesting to note that this is an extreme version of the model in which H’s ability to manipulate firm value is maximized, yet the overall effect is neutral for firm value in the absence of correlated voting by others.
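The linear densities assumed in this section give a simple closed form for the benchmark without a strategic trader. The sketch below rests on an assumption that should be read as ours rather than the paper’s derivation: absent H, a good proposal is taken to pass when the atomistic yes-vote share $y$ exceeds the threshold $\tfrac{1}{2}$, so that (by the mirror image property) the probability of a correct decision is $1 - F_G(\tfrac{1}{2})$. The function names are hypothetical.

```python
from fractions import Fraction


def f_G(y, a):
    """Linear density of the yes-vote share for a good proposal."""
    return 1 + a * (y - Fraction(1, 2))


def F_G(y, a):
    """CDF obtained by integrating f_G from 0 to y."""
    return y + a * (y * y - y) / 2


def prob_correct_without_H(a, threshold=Fraction(1, 2)):
    """Assumed benchmark: a good proposal passes if the yes share exceeds the threshold."""
    return 1 - F_G(threshold, a)


benchmarks = {a: prob_correct_without_H(Fraction(a)) for a in (0, 1, 2)}
for a, p in benchmarks.items():
    print(a, p)
```

Under this assumption the benchmark equals $\tfrac{1}{2} + a/8$, so it rises linearly in the slope parameter $a$ from $\tfrac{1}{2}$ in the uniform case.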
4. Numerical examples

We now provide some numerical examples to illustrate the above results. Continuing with the assumption of linear probability density functions, we consider the cases $a = 0$ (uniform), $a = 1$ (the probability of a correct decision in the absence of H is $\tfrac{5}{8}$), and $a = 2$ (the probability of a correct decision in the absence of H is $\tfrac{3}{4}$). We also set $\alpha_Z = 0.2$.

We first consider how H’s optimal record-date economic ownership, $\alpha_H$, varies as votes become easier to acquire in the lending market, i.e., as $\bar{\alpha}_X$ increases. Fig. 2 graphs this relationship. As noted in Proposition 3, $\alpha_H$ does not depend on $a$ so this graph has a single line. As indicated in Proposition 3, the extent of H’s long position on the record date is clearly decreasing in the number of extra votes it can buy in the share lending market. In this case, it falls from a maximum of about 25% of $\alpha_Z$ to about 8% of $\alpha_Z$ as $\bar{\alpha}_X$ approaches its maximum value.

Fig. 2. Shows the strategic trader’s optimal record-date holding, $\alpha_H$, as a function of the quantity of votes available in the lending market, $\bar{\alpha}_X$.

Next we consider how the ease of purchasing extra votes affects efficiency. Fig. 3 graphs the relationship between the number of extra votes available, $\bar{\alpha}_X$, and the difference in the probability of a correct decision caused by H’s presence. In particular, the graph plots the probability when H plays its optimal strategy minus the probability when H is not present in the model. Here, the results depend critically on the correlation between others’ votes and the true state. When atomistic holders’ votes are uniformly distributed ($a = 0$), making more votes available to H always improves efficiency as long as $\bar{\alpha}_X < \tfrac{1}{2}$. This occurs despite the fact that extra votes result in a smaller economic stake (as in Proposition 3 and Fig. 2), and therefore a smaller commitment effect and a greater probability that H will go net short and vote to minimize firm value. The offsetting force is the greater voting power H has in equilibrium despite his reduced economic stake; i.e., the direct increase in voting power via $\alpha_X$ is greater in equilibrium than the decrease in $\alpha_H$. This direct positive effect (with H still voting the right way more often than not) outweighs the negative effect of a greater probability of going short.

With the correlated distributions ($a = 1, 2$), H’s presence still improves efficiency when $\bar{\alpha}_X$ is sufficiently low, as shown in Corollary 1. However, as indicated by Proposition 4, the extra votes at some point start to reduce efficiency as $\alpha_H$ is reduced and H’s increased voting power together with the reduced commitment effect works against the other shareholders, who are now fairly likely to arrive at the correct decision on their own. Ultimately, H’s presence reduces efficiency overall when $\bar{\alpha}_X$ is sufficiently large, as indicated by Corollary 2.

Fig. 3. Shows the incremental probability of a correct decision as a result of the strategic trader’s presence as a function of the quantity of votes available in the lending market, $\bar{\alpha}_X$, for varying levels of correlation between atomistic holders’ votes and the correct decision, indexed by $a$ (larger $a$ implies a greater positive correlation).

5. Implications and conclusion
Our model is stylized, but it nevertheless provides a number of useful empirical implications. Most obviously, it implies that empty voting behavior should result in both positive and negative outcomes from an efficiency perspective. Thus, the negative anecdotes described by Hu and Black (2006, 2007) should be only one side of the story. However, it is likely that direct, large-scale evidence of favorable (or unfavorable) empty voting behavior would be difficult or impossible to gather. Our model can also guide less direct empirical investigations. For instance, with slight adjustments to the model, we could predict which types of firms and proposals are more likely to be targeted by strategic empty voters. In particular, if we add an information cost that the strategic trader must pay after the record date to become informed about the value of the proposal, the model would predict that such traders are likely to target firms where the strategy is most profitable. This would tend to be firms with: (a) high liquidity (high $\alpha_Z$), which could be measured by trading volume or a statistic summarizing the dispersion of share ownership; (b) potentially important pending proposals (high $\Delta v$), which could be measured either directly by looking at types of proposals, or indirectly using a proxy for the quality of corporate governance; (c) greater availability of ‘‘extra’’ votes (high $\bar{\alpha}_X$), which could be measured by volume and specialness in the lending market, dispersion of ownership, or the availability of derivatives to hedge economic exposure; and (d) voting outcomes that are uncertain (i.e., the vote is fairly likely to go either way) but where relatively few votes are needed to swing the result (which would correspond to a tighter distribution of non-strategic votes). This last point could again be related to the quality of corporate governance (outcomes are likely to be less certain when bad proposals are more likely).
Another example could be a vote with a supermajority rule where a high percentage is needed for approval of the proposal. The distribution of share ownership between institutions and individuals could also be important if different types of shareholders have different information or voting incentives.
A. Brav, R.D. Mathews / Journal of Financial Economics 99 (2011) 289–307
Our model would therefore predict that more empty voting activity should occur in these types of settings. Direct evidence could be sought by looking at share lending volumes around record dates and/or share trading by certain types of investors (such as hedge funds) before and after record dates. The comparative statics in Proposition 3 could also be tested in this context. Furthermore, the model provides predictions about when the actual decisions are likely to be more efficient when empty voting occurs. Thus, an indirect test of the model could focus on ex post measures of how voting outcomes affect firm value. As noted in the Introduction, our results may also provide some guidance on the efficacy of proposed regulatory reforms designed to curb or eliminate the negative effects of empty voting. For example, Hu and Black (2006, 2007) advocate additional disclosure requirements as a reasonable starting point. In the framework of our model, disclosure of an empty voting position on the record date would have no effect, because we already assume that the market maker observes the strategic trader’s actions at that stage. The effect of a rule requiring disclosure of a change in economic position relative to voting rights between the record and voting dates depends on how the rule is implemented. If the rule made it more difficult for the trader to hide its trades from the market maker, this would have the effect of reducing or eliminating any trading profits the strategic trader could otherwise generate. Thus, the rule could reduce efficiency if it causes the trader not to accumulate votes in the first place (which is likely if there is a cost to gathering information about the quality of the proposal) in cases where the trader’s presence improves efficiency. As noted previously, such cases are likely when either separating shares from votes is not very expensive or other shareholders’ votes are not too highly correlated with the correct decision. 
Otherwise, such a rule is likely to improve efficiency. Extending the model to allow for initial holdings by the strategic trader would lead to some additional implications (as noted in the Introduction). In particular, if the strategic trader were an existing blockholder with a long position, it would have an increased incentive to protect the value of its initial stake, meaning that its record-date stake would be increasing in the size of its initial position. Thus, the commitment effect would be intensified and the trader’s actions would be more likely to increase efficiency. Indeed, if initial ownership were large enough, the trader may actually find it optimal to forgo trading gains, instead buying enough shares and/or votes to ensure that the vote outcome always maximizes firm value.13 On the other hand, if the trader arrived with an initial short position, its record-date position would be reduced (potentially to a net short position) due to the existing incentive to decrease firm value, and the trader’s presence would be more likely to decrease efficiency overall.
13 A full analysis of the case of arrival with a long position is available from the authors upon request.
While our model provides a coherent framework for addressing the efficiency consequences of empty voting, there are a number of issues that remain unexplored. First of all, it would be interesting to allow for some correlation between managers’ project selection and their ability. As noted in the Introduction, if project choices are correlated with managerial ability or agency problems, empty voting could have additional efficiency implications. Furthermore, it would be interesting to study how the results would change if there were multiple strategic traders who compete to generate trading gains. It would also be interesting to study the interaction between a strategic trader with no initial interest in the stock (such as in our model) and an existing large shareholder who could also act to influence the vote outcome. Finally, we would like to more closely investigate specific mechanisms by which shares and votes can be separated, such as the share lending market. For example, if the uninformed shareholders were not all atomistic, how would they analyze the decision of whether to lend their shares on the record date? How would this affect pricing in the lending market? Our framework should provide a platform for exploring these issues in the future.

Appendix

Proof of Lemma 1. First, we show that if $n^*$ is feasible and $\alpha_B$ and $\alpha_S$ receive positive probability in H’s mixed strategy $\sigma$, then no other quantities can receive positive probability. From the text, any feasible quantity pair that receives positive probability in equilibrium must have $p(n_{\alpha_B}) = q_{\alpha_B} V_S + (1 - q_{\alpha_B}) V_L$ at its associated hidden order flow, where $q_{\alpha_B}$ is determined by (14), while $p(\alpha_B) = V_L$ and $p(-\alpha_S - \alpha_Z(1 - \max[0, \alpha_H])) = V_S$. But at these prices, it has been shown that trading profits are superior for the quantities in the $n^*$ pair, so H cannot be indifferent between the quantities in that pair and any quantities in any other feasible pair.
Furthermore, no quantity with zero or lower expected trading profits can receive mixing weight in an equilibrium where the $n^*$ pair is used. Any such quantity will be dominated by one of the $n^*$ quantities (either $\alpha_B$ or $\alpha_S$ will have the same value for the existing stake as the no trading profit quantity, but will also have positive expected trading profits). Next, we show there cannot be an equilibrium with positive trading profits when $\alpha_H \in [-\alpha_Z/2, \alpha_Z/(2 + \alpha_Z)]$ that places zero weight on the quantities in $n^*$. First, note that even if the order flow is not reached in equilibrium, $p(\alpha_B) = V_L$ must still hold since the market maker cannot believe H is short with any probability, and similarly $p(-\alpha_S - \alpha_Z(1 - \max[0, \alpha_H])) = V_S$. Clearly, then, if the off-equilibrium price $p(n^*)$ were expected to be $q^* V_S + (1 - q^*) V_L$, then the $n^*$ quantities would always offer higher expected profits than the quantities in any other feasible pair used in the proposed equilibrium. Next, note that if $p(n^*)$ is expected to be higher than $q^* V_S + (1 - q^*) V_L$, a trade of $\alpha_S$ will look even more profitable, while if $p(n^*)$ is expected to be lower than that, a trade of $\alpha_B$ will look even more profitable. Thus, no $p(n^*)$ exists that can prevent a deviation to one of the $n^*$ quantities from the quantities in any feasible pair in any proposed equilibrium.

Finally, to prove that the equilibrium given in Lemma 1 is unique if it exists, we must show that a no trading profits equilibrium cannot exist when $\alpha_H \in [-\alpha_Z/2, \alpha_Z/(2 + \alpha_Z)]$. As noted in the text, any no trading profits equilibrium must leave the sign of H’s stake unchanged after all trades with positive probability in $\sigma$. First, assume $\alpha_H > 0$. Now note $p(n^*) = V_L$ is the only possible off-equilibrium price that could prevent a deviation to $\alpha_B$ from any other trade that maintains a long position (otherwise both trades would have a stake value of $\alpha_H V_L$, and $\alpha_B$ would additionally have positive expected trading profits). Next, note that at this price, deviating to $\alpha_S$ yields expected trading profits of $(-\alpha_S)(V_S - (\tfrac{1}{2} V_S + \tfrac{1}{2} V_L)) = \tfrac{1}{2}(\alpha_Z(1 - \alpha_H)/2 + \alpha_H)(V_L - V_S)$ while the cost of decreasing the value of the existing stake is $\alpha_H(V_L - V_S)$. The former exceeds the latter for all $\alpha_H < \alpha_Z/(2 + \alpha_Z)$, so the deviation is profitable. Similarly, if $\alpha_H < 0$, a price $p(n^*) = V_S$ is required to prevent deviation to $\alpha_S$ from any other trade that maintains a short position, but this price makes deviating to $\alpha_B$ profitable.

Next we prove the existence of the equilibria given in the result by characterizing the prices set by the market maker on and off the equilibrium path and checking for deviations. First, consider the case with feasible positive trading profits, $\alpha_H \in [-\alpha_Z/2, \alpha_Z/(2 + \alpha_Z)]$. From the text, the equilibrium has prices $p(n^*) = q^* V_S + (1 - q^*) V_L$, $p(\alpha_B) = V_L$, and $p(-\alpha_S - \alpha_Z(1 - \max[0, \alpha_H])) = V_S$. These give the market maker zero expected profit in equilibrium by construction, which is the market maker’s objective function, so deviations by the market maker along the equilibrium path are not a concern.
Since the PBE concept puts no constraints on off-equilibrium path beliefs, we also do not have to worry about deviations by the market maker at those nodes as long as the prices we specify are consistent with the beliefs we specify. We now rule out deviations by H to different trading quantities. In order to test for such deviations, we must specify the market maker’s beliefs and thus prices that will be set at out-of-equilibrium net order flows. For all feasible ‘‘hidden’’ order flows $n_{\alpha_B}$ as defined in the text, we assume $\mu(n_{\alpha_B}) = q_{\alpha_B}$ where $q_{\alpha_B}$ is as in (14). Thus, $p(n_{\alpha_B}) = q_{\alpha_B} V_S + (1 - q_{\alpha_B}) V_L$. For all order flows $n$ above the range given by (15) if $\alpha_H > 0$ or (16) if $\alpha_H < 0$, we assume $\mu(n) = 0$, so that $p(n) = V_L$, and for all those below the relevant range, we assume $\mu(n) = 1$ and thus, $p(n) = V_S$. This implies that for any feasible quantity pair, the price will be $V_L$ if H buys $\alpha_B$ and atomistic holders do not trade, while the price will be $V_S$ if H sells $\alpha_S$ and atomistic holders also sell. Thus, the analysis in the text showing that the quantities in the pair indexed by $n^*$ have superior profits implies that H will not deviate to any quantity in such an alternative feasible quantity pair.
Next we rule out deviations to any quantity not in a feasible quantity pair. First, assume $\alpha_H < 0$. The analysis in the text shows that the upper end of the feasible range has $n_{\alpha_B} = 0$, and that for all pairs with higher $n_{\alpha_B}$, $\alpha_S$ must be negative, i.e., no sale trade can lead to a higher net order flow. Finally, note that at the price specified above for these order flows, $p(n) = V_L$, H cannot expect a trading profit, so such deviations have an expected payoff of $\alpha_H V_S$ or $\alpha_H V_L$ or less, all of which are dominated by the expected payoff to one of the quantities in the pair indexed by $n^*$ given their positive expected trading profits. The lower end of the feasible range with $\alpha_H < 0$ has $n_{\alpha_B} = -2\alpha_H - \alpha_Z$, at which point we have $q_{\alpha_B} = 1$ and the price equals $V_S$, since at this point H has no more incentive to buy the corresponding quantity in the pair, only to sell at the equilibrium prices. Now, consider quantity pairs with hidden net order flows in the range $[-\alpha_Z, -2\alpha_H - \alpha_Z]$, where $q_{\alpha_B} \in [0, 1]$ does not hold (condition (c) fails) but $\alpha_B, \alpha_S > 0$ still holds (condition (a) is satisfied). As specified above, we keep the price at those order flows at $V_S$. To get to those lower order flows, H must either sell more or buy less than the quantities in the pair with $n_{\alpha_B} = -2\alpha_H - \alpha_Z$, but at the same expected price. This leaves the expected sell profit the same (since $V_S$ is the correct price from H’s perspective) while decreasing the expected buy profit. Thus, since the quantities in the pair indexed by $n_{\alpha_B} = -2\alpha_H - \alpha_Z$ do not have higher expected profit than the quantities indexed by $n^*$ (from above), those indexed by these lower $n_{\alpha_B}$ also cannot. Finally, for quantity pairs with $n_{\alpha_B} < -\alpha_Z$, the discussion in the text shows that only sell orders can result in those order flows, so a price of $V_S$ as specified prevents trading profits. Now, assume $\alpha_H > 0$. The analysis in the text shows that the lower end of the feasible range has $n_{\alpha_B} = -\alpha_Z(1 - \alpha_H)$, at which point $\alpha_B = 0$.
Thus, any trade that leads to a lower $n_{\alpha_B}$ must be a sell trade, and H cannot profit given the price of $V_S$ (as specified above). The upper end of the feasible range has $n_{\alpha_B} = -2\alpha_H$, at which point we have $q_{\alpha_B} = 0$ and the price is $V_L$, since at this point H has no incentive to sell the corresponding quantity in the pair, only to buy at the equilibrium prices. Now, consider quantity pairs with $n_{\alpha_B} \in [-2\alpha_H, 0]$, where $q_{\alpha_B} \in [0, 1]$ does not hold but $\alpha_B, \alpha_S > 0$ still holds. As specified above, we keep the price at those order flows at $V_L$. To get to these higher order flows, H must either buy more or sell less than the quantities in the pair indexed by $n_{\alpha_B} = -2\alpha_H$, but at the same expected price. This leaves the expected buy profit the same (since $V_L$ is the correct price from H’s perspective) while decreasing the expected sell profit. Thus, since the quantities in the pair indexed by $n_{\alpha_B} = -2\alpha_H$ do not have higher expected profit than the quantities indexed by $n^*$ (from above), those indexed by these higher $n$ also cannot. Finally, for quantity pairs with $n_{\alpha_B} > 0$, the discussion in the text shows that only buy orders can result in those order flows, so a price of $V_L$ as specified prevents any trading gains. This suffices to prove the existence of the positive trading profit equilibrium described in the result.

Now, consider the no trading profit cases $\alpha_H < -\alpha_Z/2$ and $\alpha_H > \alpha_Z/(2 + \alpha_Z)$, where H’s equilibrium strategy $\sigma$ places weight only on quantities that do not change the sign of H’s position (and where all such $\sigma$ are equally valid equilibrium strategies). Take the case $\alpha_H < -\alpha_Z/2$ first. We specify beliefs (both on and off the equilibrium path) as follows: for $n > 0$, $\mu(n) = 0$ so $p(n) = V_L$, and for $n \le 0$, $\mu(n) = 1$ so $p(n) = V_S$. For $n = 0$ to occur with any probability, H either must buy $\alpha_Z$ shares or not trade. Thus, to reach any higher order flow, it must always buy, and a price of $V_L$ prevents any trading profits. To reach any lower order flow, it must either sell some quantity, in which case the price of $V_S$ eliminates trading profits, or buy some quantity. A buy at a price of $V_S$ has positive expected profit if the buy quantity exceeds $-\alpha_H$, so that H’s final stake is positive and expected firm value is $V_L$. The largest possible expected trading profit occurs at the largest possible buy that sometimes leads to an order flow weakly less than zero, i.e., a buy of $\alpha_Z$. The expected trading profit to that trade is $\tfrac{1}{2}\alpha_Z(V_L - V_S)$ (half of the time atomistic holders will not trade and the trade will be priced at $V_L$). However, since a buy with positive trading profits changes the sign of H’s final position, H takes an expected loss on its existing stake of $\alpha_H(V_S - V_L)$, which loss is larger than the above trading gain since $\alpha_H < -\alpha_Z/2$. Thus, H is better off trading any quantity with zero expected trading profits that keeps the sign of its stake constant. The pricing scheme outlined above is therefore optimal for the market maker (there is no uncertainty about how H will vote in equilibrium, and all trades $t$ that do not change the sign of H’s position and can therefore have positive weight in $\sigma$ lead to a price of $V_S$, which is correct).
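The comparison that rules out the sign-changing buy in the case above can be checked numerically. The following is only an illustrative sketch: the function names are hypothetical, the value wedge $V_L - V_S$ is normalized to one by assumption, and the two quantities are taken directly from the preceding paragraph (a trading gain of $\tfrac{1}{2}\alpha_Z(V_L - V_S)$ against a stake loss of $\alpha_H(V_S - V_L)$).

```python
def buy_deviation_gain(alpha_z: float, wedge: float = 1.0) -> float:
    """Expected trading profit from the largest profitable buy, alpha_Z shares:
    (1/2) * alpha_Z * (V_L - V_S), with wedge = V_L - V_S (normalized here)."""
    return 0.5 * alpha_z * wedge


def short_stake_loss(alpha_h: float, wedge: float = 1.0) -> float:
    """Loss on the existing stake alpha_H < 0 when firm value moves from V_S
    to V_L: alpha_H * (V_S - V_L), which is positive for a short position."""
    return alpha_h * (-wedge)


def buy_deviation_is_unprofitable(alpha_h: float, alpha_z: float) -> bool:
    """True when the stake loss strictly exceeds the trading gain, which the
    text states is the case whenever alpha_H < -alpha_Z / 2."""
    return short_stake_loss(alpha_h) > buy_deviation_gain(alpha_z)


if __name__ == "__main__":
    alpha_z = 0.4  # hypothetical liquidity parameter, below 1/2
    for alpha_h in (-0.30, -0.25, -0.15):
        print(alpha_h, buy_deviation_is_unprofitable(alpha_h, alpha_z))
```

With these hypothetical numbers the threshold is $-\alpha_Z/2 = -0.2$: stakes more negative than that make the deviation unprofitable, while a stake of $-0.15$ does not, consistent with the positive trading profit range starting at $-\alpha_Z/2$.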
Finally, take the case $\alpha_H > \alpha_Z/(2 + \alpha_Z)$. We specify $\mu(n) = 0$ so $p(n) = V_L$ for all $n \ge -\alpha_Z(1 - \alpha_H)$, and $\mu(n) = 1$ so $p(n) = V_S$ for all lower order flows. For $n = -\alpha_Z(1 - \alpha_H)$ to occur with any probability, H must either not trade or sell $\alpha_Z(1 - \alpha_H)$ shares. Thus, to reach any lower order flow, it must always sell, and a price of $V_S$ prevents any trading profits. To reach any higher order flow, it must either buy some quantity, in which case the price of $V_L$ eliminates trading profits, or sell some quantity. A sell at a price of $V_L$ has positive expected profit if the sale quantity exceeds $\alpha_H$, so that H ends up short and the expected firm value is $V_S$. The most profitable possible such sale occurs at the largest sale quantity that can lead to an order flow weakly above $-\alpha_Z(1 - \alpha_H)$, i.e., a sale of $\alpha_Z(1 - \alpha_H)$. The expected trading profit to such a sale is $\tfrac{1}{2}\alpha_Z(1 - \alpha_H)(V_L - V_S)$ (half of the time atomistic holders will also sell and the price will be $V_S$). However, since a sell with positive trading profits changes the sign of H’s position, H takes an expected loss on its existing stake of $\alpha_H(V_L - V_S)$, which loss is larger than the above trading gain since
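$\alpha_H > \alpha_Z/(2 + \alpha_Z)$ (note that $\alpha_Z(1 - \alpha_H) = 2\alpha_Z/(2 + \alpha_Z)$ when $\alpha_H = \alpha_Z/(2 + \alpha_Z)$, and is decreasing in $\alpha_H$). Thus, as above, H is better off trading any quantity with zero expected trading profits that keeps the sign of its stake constant. The pricing scheme outlined above is therefore optimal for the market maker (again, there is no uncertainty about how H will vote in equilibrium, and all trades $t$ that do not change the sign of H’s position and can therefore have positive weight in $\sigma$ lead to a price of $V_L$, which is correct). ∎

Proof of Proposition 1. Taking the derivative of $V_L - V_S$ with respect to $\alpha_X$ using (5) yields

$$\Delta v\left(f_G(\gamma)\frac{\partial \gamma}{\partial \alpha_X} - f_G(\delta)\frac{\partial \delta}{\partial \alpha_X}\right). \tag{33}$$

Note that

$$\frac{\partial \gamma}{\partial \alpha_X} = \frac{-(-1)\tfrac{1}{2}}{(1 - \max[0, \alpha_H] - \alpha_X)^2} = \frac{1}{2(1 - \max[0, \alpha_H] - \alpha_X)^2} \tag{34}$$

and

$$\frac{\partial \delta}{\partial \alpha_X} = \frac{(-1)(1 - \max[0, \alpha_H] - \alpha_X) - (-1)\left(\tfrac{1}{2} - \max[0, \alpha_H] - \alpha_X\right)}{(1 - \max[0, \alpha_H] - \alpha_X)^2} = -\frac{1}{2(1 - \max[0, \alpha_H] - \alpha_X)^2} \tag{35}$$

when $\alpha_H < \tfrac{1}{2} - \alpha_X$, which is always true over the range of feasible positive trading profits. Using these in (33), (33) becomes

$$\Delta v\left(\frac{1}{2(1 - \max[0, \alpha_H] - \alpha_X)^2}\right)\left(f_G(\gamma) + f_G(\delta)\right) > 0. \tag{36}$$

The result for $\alpha_X^*$ is then immediate conditional on $\alpha_X < 1 - \alpha_H$, which is shown to hold in equilibrium below. Next, note that when $\alpha_H < 0$,

$$V_L - V_S = \Delta v\left(F_G\!\left(\frac{\tfrac{1}{2}}{1 - \alpha_X}\right) - F_G\!\left(\frac{\tfrac{1}{2} - \alpha_X}{1 - \alpha_X}\right)\right) \tag{37}$$

does not vary with $\alpha_H$. Given this, it is obvious that (31) is increasing in $\alpha_H$ for all $\alpha_H < 0$, so $\alpha_H^* \ge 0$ follows. Now, consider $\alpha_H \ge 0$. Given that the second term in the objective function (31) equals zero in equilibrium (since $\alpha_X^* = \bar{\alpha}_X$), the entire objective function equals zero for $\alpha_H > \alpha_Z/(2 + \alpha_Z)$. However, it is positive at $\alpha_H = 0$ (it equals $(V_L - V_S)\alpha_Z/8$), which proves $\alpha_H^* < \alpha_Z/(2 + \alpha_Z)$. Taking the first derivative of (31) with respect to $\alpha_H$ assuming $\alpha_H \in [0, \alpha_Z/(2 + \alpha_Z)]$ and $\alpha_X = \bar{\alpha}_X$ yields (noting that $V_L - V_S = \Delta v(F_G(\gamma) - F_G(\delta))$ and that $\partial\gamma/\partial\alpha_H = \partial\gamma/\partial\alpha_X$ and $\partial\delta/\partial\alpha_H = \partial\delta/\partial\alpha_X$, which were calculated above for the assumed range of $\alpha_H$):

$$\Delta v\left(f_G(\gamma)\frac{\partial \gamma}{\partial \alpha_H} - f_G(\delta)\frac{\partial \delta}{\partial \alpha_H}\right)\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right) + \Delta v\left(F_G(\gamma) - F_G(\delta)\right)\left(\frac{\left(-\alpha_Z^2\, 2(1 - \alpha_H) - 8\alpha_H\right)8\alpha_Z(1 - \alpha_H) - (-8\alpha_Z)\left(\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2\right)}{64\alpha_Z^2(1 - \alpha_H)^2}\right) \tag{38}$$

$$= \Delta v\left(\frac{f_G(\delta) + f_G(\gamma)}{2(1 - \alpha_H - \alpha_X)^2}\right)\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right) + \Delta v\left(F_G(\gamma) - F_G(\delta)\right)\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right). \tag{39}$$

Finally, we show that this is positive when $\alpha_H = \alpha_X = 0$, implying that $\alpha_H^*$ will be positive at least for sufficiently small $\bar{\alpha}_X$. First, note that $\gamma = \delta = \tfrac{1}{2}$ when $\alpha_H = \alpha_X = 0$, so $F_G(\gamma) - F_G(\delta) = 0$ and (39) reduces to

$$\Delta v\, f_G\!\left(\tfrac{1}{2}\right)\frac{\alpha_Z}{8} > 0. \tag{40}$$

∎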
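As a numerical companion to the proof of Proposition 1, the sketch below evaluates the objective in its equilibrium form (the value wedge times the trading-profit factor that also appears in (38)) and compares the closed-form derivative (39) with a finite difference. This is an illustration under stated assumptions, not part of the original analysis: the function names are hypothetical, the distribution is taken from the linear family $F_G(y) = ((2 - a(1 - y))/2)\,y$ used in the proof of Proposition 3 (with $a = 0$ giving the uniform case), and all parameter values are made up.

```python
def gamma(alpha_h, alpha_x):
    """gamma = (1/2) / (1 - max[0, alpha_H] - alpha_X)."""
    return 0.5 / (1.0 - max(0.0, alpha_h) - alpha_x)


def delta(alpha_h, alpha_x):
    """delta = (1/2 - max[0, alpha_H] - alpha_X) / (1 - max[0, alpha_H] - alpha_X)."""
    m = max(0.0, alpha_h)
    return (0.5 - m - alpha_x) / (1.0 - m - alpha_x)


def f_G(y, a=0.0):
    """Assumed linear density 1 + a (y - 1/2); a = 0 is uniform."""
    return 1.0 + a * (y - 0.5)


def F_G(y, a=0.0):
    """Distribution function implied by the linear density."""
    return y * (2.0 - a * (1.0 - y)) / 2.0


def profit_factor(alpha_h, alpha_z):
    """(alpha_Z^2 (1 - alpha_H)^2 - 4 alpha_H^2) / (8 alpha_Z (1 - alpha_H))."""
    return (alpha_z**2 * (1 - alpha_h) ** 2 - 4 * alpha_h**2) / (8 * alpha_z * (1 - alpha_h))


def objective(alpha_h, alpha_x, alpha_z, dv=1.0, a=0.0):
    """Value wedge V_L - V_S times the trading-profit factor."""
    wedge = dv * (F_G(gamma(alpha_h, alpha_x), a) - F_G(delta(alpha_h, alpha_x), a))
    return wedge * profit_factor(alpha_h, alpha_z)


def derivative_39(alpha_h, alpha_x, alpha_z, dv=1.0, a=0.0):
    """Closed-form derivative (39), valid for alpha_H >= 0."""
    g, d = gamma(alpha_h, alpha_x), delta(alpha_h, alpha_x)
    first = dv * (f_G(d, a) + f_G(g, a)) / (2 * (1 - alpha_h - alpha_x) ** 2)
    first *= profit_factor(alpha_h, alpha_z)
    second = dv * (F_G(g, a) - F_G(d, a)) * (
        alpha_h * (alpha_h - 2) / (2 * alpha_z * (alpha_h - 1) ** 2) - alpha_z / 8
    )
    return first + second


if __name__ == "__main__":
    h, x, z, eps = 0.05, 0.10, 0.40, 1e-6
    numeric = (objective(h + eps, x, z) - objective(h - eps, x, z)) / (2 * eps)
    print(numeric, derivative_39(h, x, z))
    print(derivative_39(0.0, 0.0, z))  # compare with (40): f_G(1/2) * alpha_Z / 8
```

The same functions can be used to confirm the two boundary facts quoted in the proof: the objective equals $(V_L - V_S)\alpha_Z/8$ at $\alpha_H = 0$ and vanishes at $\alpha_H = \alpha_Z/(2 + \alpha_Z)$.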
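Proof of Proposition 2. If $F_G(\cdot) = F_B(\cdot)$, then, as noted in the text, $F_G(\cdot)$ must be symmetric around $\tfrac{1}{2}$. This implies both that the probability of approval without H is $\tfrac{1}{2}$ and that $F_G(\delta) = 1 - F_G(\gamma)$. Thus, if $\alpha_H = 0$, from (24) the optimal mixing probability is $q^* = \tfrac{1}{2}$, and the probability of a correct decision becomes $\tfrac{1}{2}F_G(\delta) + \tfrac{1}{2}(1 - F_G(\delta)) = \tfrac{1}{2}$, so H does not affect efficiency regardless of $\alpha_X$. Thus, it suffices to show that (32) is increasing in $\alpha_H$ for $\alpha_H \in [0, \alpha_Z/(2 + \alpha_Z)]$ and any feasible $\alpha_X$. Taking the derivative of (32) with respect to $\alpha_H$ yields

$$\frac{\partial q^*}{\partial \alpha_H}\left(1 - F_G(\gamma)\right) + q^*\left(-f_G(\gamma)\frac{\partial \gamma}{\partial \alpha_H}\right) + (1 - q^*)\left(-f_G(\delta)\frac{\partial \delta}{\partial \alpha_H}\right) + \left(1 - F_G(\delta)\right)\left(-\frac{\partial q^*}{\partial \alpha_H}\right). \tag{41}$$

Now, note that $\partial q^*/\partial \alpha_H = -(\alpha_Z(1 - \alpha_H) - (-\alpha_Z)\alpha_H)/\alpha_Z^2(1 - \alpha_H)^2 = -1/\alpha_Z(1 - \alpha_H)^2$. Also noting that $f_G(\gamma) = f_G(\delta)$ when $F_G(\cdot)$ is symmetric around $\tfrac{1}{2}$ and using the expressions derived in the proof of Proposition 1 above for $\partial\delta/\partial\alpha_H$ and $\partial\gamma/\partial\alpha_H$, (41) simplifies easily to

$$\frac{F_G(\gamma) - F_G(\delta)}{\alpha_Z(\alpha_H - 1)^2} + \frac{f_G(\gamma)(1 - 2q^*)}{2(1 - \alpha_H - \alpha_X)^2} > 0, \tag{42}$$

where the inequality follows from $q^* < \tfrac{1}{2}$. ∎

Proof of Proposition 3. First, note that given $f_G(y) = 1 + a(y - \tfrac{1}{2})$, we have $F_G(y) = \int_0^y (1 + a(w - \tfrac{1}{2}))\,dw = ((2 - a(1 - y))/2)\,y$. Thus, $F_G(\gamma) - F_G(\delta) = F_G(\gamma) - F_G(1 - \gamma) = ((2 - a(1 - \gamma))/2)\gamma - ((2 - a(1 - (1 - \gamma)))/2)(1 - \gamma) = 2\gamma - 1$. Using the definition of $\gamma$ and noting from the text that $\alpha_H + \alpha_X < \tfrac{1}{2}$ in equilibrium, this implies $V_L - V_S = \Delta v(2\gamma - 1) = \Delta v(\alpha_H + \alpha_X)/(1 - \alpha_H - \alpha_X)$. Also note that since $\delta = 1 - \gamma$, we have $f_G(\gamma) + f_G(\delta) = 1 + a(\gamma - \tfrac{1}{2}) + 1 + a((1 - \gamma) - \tfrac{1}{2}) = 2$. With these, (39) becomes

$$\frac{\Delta v}{(1 - \alpha_H - \alpha_X)^2}\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right) + \Delta v\left(\frac{\alpha_H + \alpha_X}{1 - \alpha_H - \alpha_X}\right)\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right). \tag{43}$$

Now, to prove $\alpha_H^* > 0$, we evaluate this setting $\alpha_H = 0$. This gives

$$\frac{\Delta v}{(1 - \alpha_X)^2}\left(\frac{\alpha_Z}{8}\right) + \Delta v\left(\frac{\alpha_X}{1 - \alpha_X}\right)\left(-\frac{\alpha_Z}{8}\right) \tag{44}$$

$$= \Delta v\,\frac{\alpha_Z}{8}\left(\frac{1 - \alpha_X(1 - \alpha_X)}{(1 - \alpha_X)^2}\right) > 0. \tag{45}$$

To prove there is a unique maximum, it suffices to show that (31) is concave in the relevant range, $\alpha_H \in [0, \alpha_Z/(2 + \alpha_Z)]$. Taking the derivative of (43) with respect to $\alpha_H$ yields

$$\Delta v\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right)\left(\frac{2}{(1 - \alpha_H - \alpha_X)^3}\right) + \frac{\Delta v}{(1 - \alpha_H - \alpha_X)^2}\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right) + \Delta v\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right)\left(\frac{(1 - \alpha_H - \alpha_X) + (\alpha_H + \alpha_X)}{(1 - \alpha_H - \alpha_X)^2}\right) + \Delta v\left(\frac{\alpha_H + \alpha_X}{1 - \alpha_H - \alpha_X}\right)\left(\frac{(2\alpha_H - 2)\,2\alpha_Z(\alpha_H - 1)^2 - 4\alpha_Z(\alpha_H - 1)\alpha_H(\alpha_H - 2)}{4\alpha_Z^2(\alpha_H - 1)^4}\right) \tag{46}$$

$$= \Delta v\left(\frac{4\alpha_H\left((1 - \alpha_X)(6\alpha_H - 3(1 - \alpha_X)) - \alpha_H^2(3 - \alpha_X)\right) + \alpha_X\left(\alpha_Z^2(1 - \alpha_H)^3 - 4(\alpha_X - 1)^2\right)}{4(\alpha_H - 1)^3(\alpha_H + \alpha_X - 1)^3\alpha_Z}\right) < 0. \tag{47}$$

The sign is derived as follows. The denominator is clearly positive given $\alpha_H + \alpha_X < \tfrac{1}{2}$. The term $6\alpha_H - 3(1 - \alpha_X)$ is maximized when $\alpha_H$ and $\alpha_X$ are large, and with $\alpha_X = \tfrac{1}{2}$ it is negative for all $\alpha_H \in [0, \tfrac{1}{5}]$, which is the maximum allowable range. The other terms in the numerator are clearly negative. Next, note that $V_L - V_S$ is independent of $a$ (see the expression derived above), so $\alpha_H^*$ will also be independent of $a$. This proves the last comparative static result. For the first two comparative static results, it suffices to sign the appropriate cross-partial derivative. Taking the derivative of (43) with respect to $\alpha_Z$ yields the cross-partial

$$\frac{\Delta v}{(1 - \alpha_H - \alpha_X)^2}\left(\frac{2\alpha_Z(1 - \alpha_H)^2\, 8\alpha_Z(1 - \alpha_H) - 8(1 - \alpha_H)\left(\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2\right)}{64\alpha_Z^2(1 - \alpha_H)^2}\right) + \Delta v\left(\frac{\alpha_H + \alpha_X}{1 - \alpha_H - \alpha_X}\right)\left(\frac{-2(\alpha_H - 1)^2\alpha_H(\alpha_H - 2)}{4\alpha_Z^2(\alpha_H - 1)^4} - \frac{1}{8}\right) \tag{48}$$

$$= \frac{1}{8}\left[\frac{\Delta v}{(1 - \alpha_H - \alpha_X)^2}\left(1 - \alpha_H + \frac{4\alpha_H^2}{\alpha_Z^2(1 - \alpha_H)}\right) + \Delta v\left(\frac{\alpha_H + \alpha_X}{1 - \alpha_H - \alpha_X}\right)\left(-1 + \frac{4\alpha_H(2 - \alpha_H)}{\alpha_Z^2(1 - \alpha_H)^2}\right)\right] > 0. \tag{49}$$

The inequality is derived as follows. First, note that $\Delta v/(1 - \alpha_H - \alpha_X)^2 \ge 4\Delta v((\alpha_H + \alpha_X)/(1 - \alpha_H - \alpha_X))$ given $\alpha_H + \alpha_X < \tfrac{1}{2}$. The result then follows from the fact that $(-1 + 4\alpha_H(2 - \alpha_H)/\alpha_Z^2(1 - \alpha_H)^2) \ge -1$ clearly always holds, while $(1 - \alpha_H + 4\alpha_H^2/\alpha_Z^2(1 - \alpha_H)) \ge \tfrac{4}{5}$ must hold given that $\alpha_H < \tfrac{1}{5}$ in equilibrium according to Proposition 1 given our assumption that $\alpha_Z < \tfrac{1}{2}$ (Assumption 1). Next, taking the derivative of (43) with respect to $\alpha_X$ yields the cross-partial

$$\Delta v\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right)\left(\frac{2}{(1 - \alpha_H - \alpha_X)^3}\right) + \Delta v\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right)\left(\frac{(1 - \alpha_H - \alpha_X) + (\alpha_H + \alpha_X)}{(1 - \alpha_H - \alpha_X)^2}\right) \tag{50}$$

$$= \frac{\Delta v}{(1 - \alpha_H - \alpha_X)^2}\left[\left(\frac{\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2}{8\alpha_Z(1 - \alpha_H)}\right)\left(\frac{2}{1 - \alpha_H - \alpha_X}\right) + \left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right)\right] < 0. \tag{51}$$

The inequality is derived as follows. Consider the terms in the square brackets. The last term is clearly negative, while the first is positive. Now, note that in the first-order condition [(43) $= 0$], $(\alpha_H(\alpha_H - 2)/2\alpha_Z(\alpha_H - 1)^2 - \alpha_Z/8)$ multiplies $\Delta v((\alpha_H + \alpha_X)/(1 - \alpha_H - \alpha_X))$, while $((\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2)/8\alpha_Z(1 - \alpha_H))$ multiplies $\Delta v/(1 - \alpha_H - \alpha_X)^2$. As above, we have $\Delta v/(1 - \alpha_H - \alpha_X)^2 \ge 4\Delta v((\alpha_H + \alpha_X)/(1 - \alpha_H - \alpha_X))$, so, from the first-order condition we must have $-(\alpha_H(\alpha_H - 2)/2\alpha_Z(\alpha_H - 1)^2 - \alpha_Z/8) \ge 4((\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2)/8\alpha_Z(1 - \alpha_H))$ in equilibrium. This proves the result since $2/(1 - \alpha_H - \alpha_X) \le 4$. For the third comparative static result, note that when setting (39) equal to zero and solving for $\alpha_H$, $\Delta v$ will drop out of the solution. ∎

Proof of Proposition 4. The derivative of (32) with respect to $\alpha_H$ is given by (41) above. Evaluating this with linear density functions (and noting from above that $\partial q^*/\partial\alpha_H = -1/\alpha_Z(1 - \alpha_H)^2$, $F_G(\gamma) - F_G(\delta) = (\alpha_H + \alpha_X)/(1 - \alpha_H - \alpha_X)$, $\partial\gamma/\partial\alpha_H = 1/2(1 - \alpha_H - \alpha_X)^2$, and $\partial\delta/\partial\alpha_H = -1/2(1 - \alpha_H - \alpha_X)^2$) yields

$$\left(\frac{1}{\alpha_Z(1 - \alpha_H)^2}\right)\left(\frac{\alpha_H + \alpha_X}{1 - \alpha_H - \alpha_X}\right) + \frac{1}{2(1 - \alpha_H - \alpha_X)^2}\left[(1 - q^*)\left(1 + a\left(\delta - \tfrac{1}{2}\right)\right) - q^*\left(1 + a\left(\gamma - \tfrac{1}{2}\right)\right)\right] \tag{52}$$

$$= \frac{\alpha_H + \alpha_X}{\alpha_Z(1 - \alpha_H)^2(1 - \alpha_H - \alpha_X)} - \frac{a(\alpha_H + \alpha_X)}{4(1 - \alpha_H - \alpha_X)^3} + \frac{4\alpha_H}{4\alpha_Z(1 - \alpha_H)(1 - \alpha_H - \alpha_X)^2} > 0. \tag{53}$$

The inequality is derived as follows. The last term is clearly positive. The second term is negative and its absolute value increases in $a$. Setting $a = 2$, the sum of the first two terms is positive if $\alpha_Z(1 - \alpha_H)^2(1 - \alpha_H - \alpha_X) < 2(1 - \alpha_H - \alpha_X)^3 \Longrightarrow 2 > \alpha_Z(1 - \alpha_H)^2/(1 - \alpha_H - \alpha_X)^2$, which always holds (the denominator is larger than $\tfrac{1}{4}$ and the numerator is at most $\tfrac{1}{2}$). Taking the derivative of (32) with respect to $\alpha_X$ yields

$$(1 - q^*)\left(-f_G(\delta)\frac{\partial \delta}{\partial \alpha_X}\right) + q^*\left(-f_G(\gamma)\frac{\partial \gamma}{\partial \alpha_X}\right) \tag{54}$$

$$= (1 - q^*)\left(\frac{f_G(\delta)}{2(1 - \alpha_H - \alpha_X)^2}\right) - q^*\left(\frac{f_G(\gamma)}{2(1 - \alpha_H - \alpha_X)^2}\right). \tag{55}$$

Since $f_G(\cdot)$ is linear and increasing, we have $f_G(\delta) \le f_G(\gamma)$. Also, as $\alpha_H$ falls from $\alpha_Z/(2 + \alpha_Z)$ to zero, $q^*$ rises from zero to $\tfrac{1}{2}$. The result for $\alpha_X$ follows. ∎

Proof of Proposition 5. Given the linear density, (32) can be written as

$$q^*\left(1 - \frac{2 - a(1 - \gamma)}{2}\,\gamma\right) + (1 - q^*)\left(1 - \frac{2 - a(1 - \delta)}{2}\,\delta\right). \tag{56}$$

Taking the derivative with respect to $a$ yields

$$q^*\,\frac{\gamma(1 - \gamma)}{2} + (1 - q^*)\,\frac{\delta(1 - \delta)}{2}. \tag{57}$$

Note that $\gamma(1 - \gamma)/2 = \delta(1 - \delta)/2$ since $\delta = 1 - \gamma$, so, noting that we always have $\alpha_H + \alpha_X < \tfrac{1}{2}$ in equilibrium, (57) equals

$$\frac{\gamma(1 - \gamma)}{2} = \frac{1}{2}\left(\frac{\tfrac{1}{2}}{1 - \alpha_H - \alpha_X}\right)\left(\frac{\tfrac{1}{2} - \alpha_H - \alpha_X}{1 - \alpha_H - \alpha_X}\right) = \frac{1 - 2(\alpha_H + \alpha_X)}{8(1 - \alpha_H - \alpha_X)^2} > 0. \tag{58}$$

Finally, taking the derivative of this with respect to $\alpha_H$ or $\alpha_X$ yields

$$\frac{-2\left(8(1 - \alpha_H - \alpha_X)^2\right) - 16(1 - \alpha_H - \alpha_X)(-1)(1 - 2\alpha_H - 2\alpha_X)}{64(1 - \alpha_H - \alpha_X)^4} = -\frac{\alpha_H + \alpha_X}{4(1 - \alpha_H - \alpha_X)^3} < 0. \tag{59}$$

∎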
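The linear-density identities used in the proofs of Propositions 3 to 5 lend themselves to a quick numerical check. The sketch below is illustrative only and rests on assumptions: the helper names are hypothetical, $s$ stands for $\alpha_H + \alpha_X$ with $\alpha_H \ge 0$, and the values of $s$, $q$ and $a$ are made up. It checks that $F_G(\gamma) - F_G(\delta) = 2\gamma - 1$, that $f_G(\gamma) + f_G(\delta) = 2$, that the derivative of (56) with respect to $a$ equals the closed form in (58), and that the slope in (59) is negative.

```python
def f_G(y, a):
    """Linear density from the proof of Proposition 3: 1 + a (y - 1/2)."""
    return 1.0 + a * (y - 0.5)


def F_G(y, a):
    """Its distribution function: ((2 - a (1 - y)) / 2) y."""
    return y * (2.0 - a * (1.0 - y)) / 2.0


def gamma_of(s):
    """gamma = (1/2) / (1 - s), where s = alpha_H + alpha_X (alpha_H >= 0)."""
    return 0.5 / (1.0 - s)


def efficiency(q, s, a):
    """Expression (56): q (1 - F_G(gamma)) + (1 - q)(1 - F_G(delta)), delta = 1 - gamma."""
    g = gamma_of(s)
    return q * (1.0 - F_G(g, a)) + (1.0 - q) * (1.0 - F_G(1.0 - g, a))


def slope_in_a(s):
    """Closed form (58): (1 - 2 s) / (8 (1 - s)^2)."""
    return (1.0 - 2.0 * s) / (8.0 * (1.0 - s) ** 2)


def slope_in_a_derivative(s):
    """Closed form (59): -s / (4 (1 - s)^3)."""
    return -s / (4.0 * (1.0 - s) ** 3)


if __name__ == "__main__":
    s, q = 0.3, 0.35  # hypothetical values with s < 1/2 and q <= 1/2
    g = gamma_of(s)
    for a in (0.0, 1.0, 2.0):
        print(a, F_G(g, a) - F_G(1.0 - g, a), 2.0 * g - 1.0, f_G(g, a) + f_G(1.0 - g, a))
    # (56) is linear in a, so a unit difference in a recovers (57) exactly.
    print(efficiency(q, s, 1.0) - efficiency(q, s, 0.0), slope_in_a(s))
    eps = 1e-6
    print((slope_in_a(s + eps) - slope_in_a(s - eps)) / (2 * eps), slope_in_a_derivative(s))
```

Because (56) is linear in $a$, the check of (57) does not depend on the step size; the check of (59) uses a central difference and so agrees only up to numerical error.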
Proof of Proposition 6. With $\bar{\alpha}_X \ge \tfrac{1}{2}$, votes are free up to the point where H takes complete control of the vote. It is easy to see that the value wedge H can create, $V_L - V_S$, is maximized when H can single-handedly determine the outcome. Furthermore, its maximized value equals $\Delta v$. Thus, H will acquire at least $\tfrac{1}{2}$ of the votes, and the value wedge will no longer depend on $\alpha_H$. H’s maximization problem with respect to $\alpha_H$ is therefore given by (note that there is still no incentive to go short, as the analysis in the proof of Proposition 1 above is valid here):

$$\max_{\alpha_H}\ \frac{(\Delta v)\left(\alpha_Z^2(1 - \alpha_H)^2 - 4\alpha_H^2\right)}{8\alpha_Z(1 - \alpha_H)}. \tag{60}$$
Taking the derivative of this with respect to $\alpha_H$, using results derived above, yields

$$\Delta v\left(\frac{\alpha_H(\alpha_H - 2)}{2\alpha_Z(\alpha_H - 1)^2} - \frac{\alpha_Z}{8}\right) < 0. \tag{61}$$

Finally, with respect to the efficiency results, note that the probability of approval is $\tfrac{1}{2}$ with or without H’s presence when $F_G(\cdot) = F_B(\cdot)$. On the other hand, when $F_G(\cdot) \ne F_B(\cdot)$, Proposition 4 implies that the probability of a correct decision is decreasing in $\alpha_X$ when $\alpha_H = 0$. ∎

References

Admati, A., Pfleiderer, P., 2009. The wall street walk and shareholder activism: exit as a form of voice. Review of Financial Studies 22, 2645–2685.
Allen, F., Gale, D., 1992. Stock price manipulation. Review of Financial Studies 5, 503–529.
Attari, M., Banerjee, S., Noe, T., 2006. Crushed by a rational stampede: strategic share dumping and shareholder insurrections. Journal of Financial Economics 79, 181–222.
Bagnoli, M., Lipman, B., 1996. Stock price manipulation through takeover bids. RAND Journal of Economics 27, 124–147.
Barclay, M., Holderness, C., 1989. Private benefits from control of public corporations. Journal of Financial Economics 25, 371–395.
Blair, D., Golbe, D., Gerard, J., 1989. Unbundling the voting rights and profit claims of common shares. Journal of Political Economy 97, 420–443.
Brav, A., Jiang, W., Partnoy, F., Thomas, R., 2008. Hedge fund activism, corporate governance, and firm performance. Journal of Finance 63, 1729–1775.
Burkart, M., Gromb, D., Panunzi, F., 1997. Large shareholders, monitoring, and the value of the firm. Quarterly Journal of Economics 112, 693–728.
Burkart, M., Lee, S., 2008. The one share-one vote debate: a theoretical perspective. Review of Finance 12, 1–49.
Chakraborty, A., Yilmaz, B., 2004. Informed manipulation. Journal of Economic Theory 114, 142–152.
Christoffersen, S., Geczy, C., Musto, D., Reed, A., 2007. Vote trading and information aggregation. Journal of Finance 62, 2897–2929.
Clifford, C., 2008. Value creation or destruction: hedge funds as shareholder activists. Journal of Corporate Finance 14, 323–336.
DeAngelo, H., DeAngelo, L., 1985. Managerial ownership of voting rights: a study of public corporations with dual classes of stock. Journal of Financial Economics 14, 33–69.
Edmans, A., 2009. Blockholder trading, market efficiency, and managerial myopia. Journal of Finance 64, 2481–2513.
Edmans, A., Manso, G., forthcoming. Governance through trading and intervention: a theory of multiple blockholders. Review of Financial Studies.
Gilson, R., 1987. Evaluating dual class common stock: the relevance of substitutes. Virginia Law Review 73, 807–844.
Goldstein, I., Guembel, A., 2008. Manipulation and the allocational role of prices. Review of Economic Studies 75, 133–164.
Greenwood, R., Schor, M., 2009. Investor activism and takeovers. Journal of Financial Economics 92, 362–375.
Grossman, S., Hart, O., 1988. One share-one vote and the market for corporate control. Journal of Financial Economics 20, 175–202.
Harris, M., Raviv, A., 1988. Corporate governance: voting rights and majority rules. Journal of Financial Economics 20, 203–255.
Hu, H., Black, B., 2006. The new vote buying: empty voting and hidden (morphable) ownership. Southern California Law Review 79, 811–908.
Hu, H., Black, B., 2007. Hedge funds, insiders, and the decoupling of economic and voting ownership: empty voting and hidden (morphable) ownership. Journal of Corporate Finance 13, 343–367.
Kahan, M., Rock, E., 2007. Hedge funds in corporate governance and corporate control. University of Pennsylvania Law Review 155, 1021–1093.
Kahn, C., Winton, A., 1998. Ownership structure, speculation and shareholder intervention. Journal of Finance 53, 99–129.
Kalay, A., Pant, S., 2009. One share-one vote is unenforceable and suboptimal. Unpublished working paper, University of Utah.
Khanna, N., Marietta-Westberg, J., 2005. Rational manipulation by large shareholders: the hijacking of an election. Unpublished working paper, Michigan State University.
Khanna, N., Sonti, R., 2004. Value creating stock manipulation: feedback effect of stock prices on firm value. Journal of Financial Markets 7, 237–270.
Klein, A., Zur, E., 2009. Entrepreneurial shareholder activism: hedge funds and other private investors. Journal of Finance 64, 187–229.
Kolasinski, A., Reed, A., Ringgenberg, M., 2008. A multiple lender approach to understanding supply and demand in the equity lending market. Unpublished working paper, University of North Carolina at Chapel Hill.
Kyle, A., Vila, J., 1991. Noise trading and takeovers. RAND Journal of Economics 22, 54–71.
Manne, H., 1964. Some theoretical aspects of share voting: an essay in honor of Adolf A. Berle. Columbia Law Review 64, 1427–1445.
Maug, E., 1998. Large shareholders as monitors: is there a trade-off between liquidity and control? Journal of Finance 53, 65–98.
Maug, E., 1999. How effective is proxy voting? Information aggregation and conflict resolution in corporate voting contests. Unpublished working paper, University of Mannheim.
Musto, D., Yilmaz, B., 2003. Trading and voting. Journal of Political Economy 111, 990–1003.
Neeman, Z., Orosel, G., 2006. On the efficiency of vote buying when voters have common interests. International Review of Law and Economics 26, 536–556.
Shleifer, A., Vishny, R., 1986. Large shareholders and corporate control. Journal of Political Economy 94, 461–488.
Vila, J., 1989. Simple games of market manipulation. Economics Letters 29, 21–26.
Zachariadis, K., Olaru, I., 2010. Trading and voting in distressed firms. Unpublished working paper, Northwestern University.
Zwiebel, J., 1995. Block investment and partial benefits of control. Review of Economic Studies 62, 161–185.
Journal of Financial Economics 99 (2011) 308–332
Vintage capital and creditor protection☆
Efraim Benmelech a,c,*, Nittai K. Bergman b,c
a Department of Economics, Harvard University, Littauer Center, Cambridge, MA 02138, USA
b Sloan School of Management, MIT, 50 Memorial Drive, Cambridge, MA 02142, USA
c NBER, USA
Article info
Article history: Received 5 October 2009; Received in revised form 26 January 2010; Accepted 31 January 2010; Available online 9 September 2010
JEL classification: G01; G24; G33
Keywords: Asset-backed securities; Bankruptcy; Collateral; Contagion

Abstract
We provide novel evidence linking the level of creditor protection provided by law to the degree of usage of technologically older, vintage capital in the airline industry. Using a panel of aircraft-level data around the world, we find that better creditor rights are associated with both aircraft of a younger vintage and newer technology, as well as firms with larger aircraft fleets. We propose that by mitigating financial shortfalls, enhanced legal protection of creditors facilitates the ability of firms to make large capital investments, adapt advanced technologies, and foster productivity. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
☆ We thank Marios Angeletos, Douglas Baird, Lucian Bebchuk, Guido Imbens, Boyan Jovanovic, Rafael La Porta, Florencio Lopez-de-Silanes, Giacomo Ponzetto, Adriano Rampini, Bill Schwert (the editor), Andrei Shleifer, Jeremy Stein, seminar participants at Berkeley, Columbia Law School, Duke, the European Summer Symposium in Financial Markets, Federal Reserve Bank of Boston, Harvard Business School, Harvard Economics, Harvard Law School, Hebrew University, Harvard-MIT organizational economics workshop, Imperial College, London Business School, London School of Economics, Northwestern, Ohio State University, Stanford, Stockholm School of Economics, Tel-Aviv University, Tilburg University, 2008 WFA meetings in Waikoloa, University of Alberta, University of Amsterdam, University of Illinois at Urbana-Champaign, Vienna University of Economics and Business Administration, Yale Law School, and especially Andrea Eisfeldt (the referee) for very constructive and helpful comments. We also thank Robert Grundy and Phil Shewring from Airclaims Inc. Alex Radu and Kate Waldock provided excellent research assistance. All errors are our own.
* Corresponding author at: Department of Economics, Harvard University, Littauer Center, Cambridge, MA 02138, USA.
E-mail addresses: effi [email protected] (E. Benmelech), [email protected] (N.K. Bergman).
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.004
There is a large body of evidence that better legal rules covering protection of corporate shareholders and creditors are associated with more developed financial markets and higher economic growth (La Porta, Lopez-de-Silanes, Shleifer, and Vishny, 1997, 1998; King and Levine, 1993; Beck, Levine, and Loayza, 2000; Rajan and Zingales, 1998). While the empirical regularities found in the data are quite robust, most of the research is based on cross-country outcomes and suffers from small samples and potential identification problems (see Djankov, McLiesh, and Shleifer, 2007). In addition, the results from cross-country regressions do not pin down the underlying mechanism through which creditor rights and shareholder protection affect real economic outcomes. This paper attempts to fill this gap. We study the relation between creditor protection and the use of vintage capital in the airline industry in a sample of most of the aircraft in the world (489,916 aircraft-year observations) covering 5,987 operators in 129 countries in the years 1978–2003.
E. Benmelech, N.K. Bergman / Journal of Financial Economics 99 (2011) 308–332
We find that airlines enjoying the benefits of higher creditor protection operate aircraft of a newer technology and younger vintage. The importance of new capital goods for economic growth has been suggested by Solow (1960): “...many if not most innovations need to be embodied in new kinds of durable equipment before they can be made effective. Improvements in technology affect output only to the extent that they are carried into practice either by net capital formation or by the replacement of old-fashioned equipment by the latest models...” More recent theoretical models show that capital of older vintage hampers productivity and growth (Benhabib and Rustichini, 1991; Hsieh, 2001), slows technology diffusion (Chari and Hopenhayn, 1991), and increases income inequality across individuals and countries (Jovanovic, 1998). Empirical estimates suggest that around 60% of U.S. per capita growth is due to technical change that is embodied in new, more efficient capital goods (Greenwood, Hercowitz, and Krusell, 1997).¹

Our paper provides novel evidence on a financial channel in technological adaption and capital formation. While we propose and provide evidence on one mechanism connecting financial constraints and creditor protection to aircraft vintage and fleet size, our results suggest a broader link between financial development, investor protection, and economic activity. Our empirical methodology differs from previous research, which has focused mostly on aggregate, macroeconomic outcomes of investor protection such as financial market development and economic growth (King and Levine, 1993; La Porta, Lopez-de-Silanes, Shleifer, and Vishny, 1997, 1998; Rajan and Zingales, 1998).
The wealth of the data and our focus on an important global industry allow careful consideration and identification of the specific mechanism through which investor protection affects and fosters technical progress and economic development.²

We start by developing a simple price-theory model of an airline choosing its scale and average asset age given an internal financing constraint and an external creditor protection environment that is determined at the country level. The airline must decide on the quantity of aircraft to purchase and their average age. Older aircraft are assumed to be less efficient, either because of depreciation in aircraft efficiency stemming from their normal use, or because of technological improvements in aircraft design over time. The model shows that increased availability of external finance due to enhanced creditor protection will have two important effects on firms. First, when creditor rights are greater, and hence financial constraints more relaxed, firms will be able to invest in newer, more expensive technologies. Second, since financing considerations will place fewer constraints on firm scale, firms will tend to be larger in countries with greater creditor rights.³

Using detailed profiles of most aircraft in the world during the period 1978–2003, we then study the relation between the level of creditor protection and two measures of aircraft vintage. The first measure of vintage is simply aircraft age, defined as the time elapsed since the date of the aircraft delivery. The second measure of vintage, called “technological age,” is calculated as the time elapsed since the model type of that aircraft was first introduced. The level of country creditor protection is measured using the creditor rights score as developed by La Porta, Lopez-de-Silanes, Shleifer, and Vishny (1997, 1998, 2008), and in particular, the more recent score that covers 129 countries in the years 1978–2003 (Djankov, McLiesh, and Shleifer, 2007).

Consistent with our first prediction, our analysis shows that aircraft operated in countries with higher creditor protection are of a younger vintage and newer technology. Furthermore, we also find that operators’ size is larger in countries with better creditor protection. Our regressions control for a battery of economic development variables, legal origin, government ownership of airlines, a country’s civil aviation quality, and a variety of year, country, and airline fixed effects. In particular, the panel dimension of our data allows us to control for country fixed effects and hence to identify the effect from changes in creditor rights within a country.

1 See Boucekkine, de la Croix, and Licandro (2008) for a survey of the vintage capital literature.
2 Our paper adds to a growing body of literature that uses industry- and firm-level data to evaluate the effects of investor protection and financial development on resource allocation (Fisman and Love, 2004; Wurgler, 2000), economic growth (Demirguc-Kunt and Maksimovic, 1998; Guiso, Sapienza, and Zingales, 2004), corporate risk-taking and innovation (Acharya, Amihud, and Litov, 2008; Acharya and Subramanian, 2009), and financial contracts and lending structures (Bergman and Nicolaievsky, 2007; Braun, 2003; Esty and Megginson, 2003; Lerner and Schoar, 2005; Liberty and Mian, 2010; Ongena and Smith, 2000; Qian and Strahan, 2007).
3 Our model is closely related to Eisfeldt and Rampini (2007), who show that firms which are credit constrained purchase more used, rather than new, capital because higher ex post maintenance payments of used capital relax current ex ante financial constraints.

To alleviate concerns about omitted variables and to provide additional evidence in support of the financing channel of technology adoption, we conduct a number of tests which split our sample into aircraft that should be treated by the creditor rights index and those that should not be treated by this index. We begin by splitting our sample into aircraft operated by commercial and private airlines, and those operated by the military. We expect the negative relation between creditor protection and aircraft age, and the positive relation between creditor protection and fleet size, to hold only for nonmilitary operators, since private and commercial operators are those required to raise funds from outside investors in cases of cash-flow shortages. Moreover, only commercial and private operators would fall under the bankruptcy provisions of the local corporate and bankruptcy laws which are the essence of the creditor rights score. In contrast, sovereign debtors are incentivized to repay creditors mainly for reputational concerns and continued access to capital markets (Bulow and Rogoff, 1989a,b). Our results confirm this first conjecture: we find that the creditor rights score is correlated with the age and fleet size of commercial and private operators but is uncorrelated with the age and size of military fleets.

Focusing on commercial and private operators, we then split our sample into planes that are leased and those which are not leased. Following Eisfeldt and Rampini (2009), we conjecture that leasing allows firms to alleviate some of the financial frictions associated with debt financing, as asset repossession is easier for a lessor than for a creditor. This difference in financing frictions implies, first, that airlines in countries with poor creditor rights will be more likely to lease rather than own aircraft, and second, that the negative relation between creditor rights and aircraft vintage described above should be concentrated amongst non-leased aircraft. We find support for both of these hypotheses in the data: (i) aircraft are more likely to be leased in countries with worse creditor protection, and (ii) while there is a strong, statistically significant negative relation between creditor rights and the vintage of non-leased aircraft, we find no relation between creditor rights and aircraft vintage amongst leased aircraft.

By examining the relation between creditor rights and both leased and non-leased aircraft separately, we alleviate the concern that our results are driven by variation in unobserved variables, and in particular, variation in investment opportunities, correlated with variation in creditor rights. Indeed, there is little reason to suspect that increased investment opportunities should differentially impact the vintage of leased vs. non-leased aircraft. In contrast, the financing channel provides a clear prediction regarding the differential impact of creditor rights on the two methods of aircraft financing.

We continue by splitting the sample of commercial aircraft based on airlines’ financial condition. The financing channel predicts that the negative relation between aircraft vintage and creditor rights should be stronger for airlines in poor financial health. We find that the vintage of aircraft operated by airlines with lower leverage ratios and by airlines with less debt overhang is less sensitive to creditor rights.
While both leverage and long-term debt are clearly endogenous, our identification strategy relies on the interaction effects between country and firm characteristics. Furthermore, testing this prediction also alleviates the concern that a correlation between creditor rights and unobserved investment opportunities is driving our results, since there is little reason to suspect that increases in creditor rights are more strongly correlated with improved investment opportunities in financially constrained firms as compared to financially unconstrained firms. The rest of the paper is organized as follows. Section 2 presents a simple price-theory model. Section 3 provides a description of our data sources and summary statistics. Section 4 presents the empirical link between aircraft age and utilization and efficiency. Section 5 describes the empirical analysis of the relation between creditor rights and aircraft age and fleet size. Section 6 concludes.
2. The model

We begin by providing a simple model of a firm that faces an external financing constraint and must choose the vintage of the technology it will operate and its scale of operation. For simplicity, firms in our model choose between two technologies, new and old. Our main goal is to describe the cross-sectional variation in the allocation of vintage capital across firms operating in countries with creditor protection, and hence financial constraints, of varying degrees. The model is related to Eisfeldt and Rampini (2007), but assumes that technologies of different vintage are characterized by different production functions.

2.1. Technology allocation with exogenous prices

Consider a continuous set of firms deciding on their scale of operation and deciding between the use of assets which embody either an old or a new technology. For consistency with the empirical section, we refer to firms as airlines and their assets as aircraft. For simplicity, we assume that airlines can use only one type of technology in their fleets. A fleet of $q_{new}$ new aircraft is assumed to provide income $f(q_{new})$, where $f$ is twice differentiable and concave, with $f(0) = 0$.⁴ Similarly, a fleet of $q_{old}$ old aircraft is assumed to provide income $g(q_{old})$, where again $g$ is twice differentiable and concave, with $g(0) = 0$. As is common, concavity of the production function stems from decreasing returns to scale. Airlines are assumed to be price takers. New-technology aircraft are supplied perfectly elastically at a price normalized to one, while the price of an old aircraft is assumed in this section to be exogenously given at $p_{old}$.

An important aspect of the model is the tradeoff between airline size and fleet quality. Given a fixed amount of capital expenditure, an airline will need to choose between operating a relatively large number of old-technology aircraft, or a smaller number of new-technology aircraft. To solve this problem, it is useful to define an equivalence function between old and new aircraft, $h$, which relates a fleet of new aircraft of size $q_{new}$ to the size of the old aircraft fleet with equal income.
Formally, $h$ satisfies $g(h(q_{new})) = f(q_{new})$, so that $h(q_{new})$ old aircraft provide the same income as $q_{new}$ new aircraft.⁵

Initially, we assume that airlines have no internal funds and must purchase their fleets using funds raised in an external capital market. Each airline operates in a country with a level of protection provided to investors parameterized by $\mu$, where $\mu$ measures the fraction of income that insiders within the airline can pledge to outside investors.⁶ Thus, given any income $R$, the airline’s pledgeable income (i.e., the maximal amount that it can guarantee as repayment to its investors) is $\mu R$, with $\mu$ between zero and one. It should be emphasized that $\mu$ is a country-level parameter, determined, for example, by the legal code in which firms operate. Capital markets are assumed to be perfectly competitive, and the discount factor is taken for simplicity to be one.

4 Airline output is measured in passenger miles flown, which combined with the average ticket price, generate approximately 80% of airline revenue (Air Transport Association, 2007).
5 Since new aircraft are assumed to be more efficient than older aircraft (either because of depreciation in aircraft efficiency stemming from their normal use, or because of technological improvements in aircraft design over time), we have that $h(q_{new}) > q_{new}$ for all $q_{new}$. Clearly, also, $h = g^{-1}(f)$.
6 $1 - \mu$ can be interpreted as the fraction of income insiders can costlessly expropriate from outside investors.

To choose the size and technology of its fleet, each airline will compare the value of a new fleet to the value of an old one. The value of a fleet comprised of new aircraft in a country with investor protection $\mu$ is given by:
$$V_{new}(\mu) = \max_{q_{new}} \left[ f(q_{new}) - q_{new} \right] \quad \text{s.t.} \quad q_{new} \le \mu f(q_{new}). \tag{1}$$

Similarly, the value of a fleet comprised of old aircraft in a country with investor protection $\mu$ is given by:

$$V_{old}(\mu, p_{old}) = \max_{q_{old}} \left[ g(q_{old}) - p_{old}\, q_{old} \right] \quad \text{s.t.} \quad p_{old}\, q_{old} \le \mu g(q_{old}). \tag{2}$$
Since capital markets are perfectly competitive, outside investors break even, so that in both maximization problems airlines obtain the full NPV of the project subject to the financing constraint. We further assume that $V_{new}(1) > V_{old}(1, p_{old})$, so that in an economy without financial constraints (or in one in which firms can pledge all of their output to investors), the new technology is superior to the old technology.⁷

The solution to (1), the new-technology maximization problem, is obtained in the following manner (the old-technology solution is analogous). Define $q^{uc}_{new}$ to be the solution to the unconstrained problem, i.e., $\max_q (f(q) - q)$. If $q^{uc}_{new}$ satisfies the financing constraint $q^{uc}_{new} \le \mu f(q^{uc}_{new})$, and is hence achievable by the airline, it will be the constrained solution as well. On the other hand, if $q^{uc}_{new}$ is not achievable ($q^{uc}_{new} > \mu f(q^{uc}_{new})$), the financing constraint will be binding and the constrained solution will be defined implicitly by $q = \mu f(q)$. In this region, investment and firm value will be increasing in $\mu$, as increases in $\mu$ relax the financing constraint.

In choosing between the new and old technologies, an airline in a country with investor protection $\mu$ simply compares $V_{new}(\mu)$ to $V_{old}(\mu, p_{old})$. Proposition 1 describes this choice (all proofs are provided in Appendix A):

Proposition 1. If $h$, the equivalency function between new and old technology, is convex in $q_{new}$, then for every $p_{old}$ there exists a $\mu^* \ge 0$ such that airlines operating in countries with investor protection $\mu < \mu^*$ choose the old technology, and airlines operating in countries with investor protection $\mu > \mu^*$ choose the new technology.

Proposition 1 states that if $h$ is convex, the allocation of vintage capital is such that airlines operating in low investor protection countries will choose an old aircraft fleet, while those operating in high investor protection countries will choose a new aircraft fleet.
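The constrained solution to problem (1) described above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the income function f(q) = 4√q is a hypothetical choice (the paper does not specify a functional form), and the helper names `bisect_root` and `new_fleet` are introduced here for exposition. It first finds the unconstrained fleet size from f'(q) = 1 and, if that fleet violates the financing constraint, solves q = μ f(q) instead.

```python
import math


def bisect_root(func, lo, hi, tol=1e-12):
    """Root of func on [lo, hi], assuming func(lo) > 0 > func(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def f(q):
    """Hypothetical concave income function with f(0) = 0 (an assumption)."""
    return 4.0 * math.sqrt(q)


def f_prime(q):
    """Derivative of the assumed income function."""
    return 2.0 / math.sqrt(q)


def new_fleet(mu, q_min=1e-9, q_max=1e3):
    """Return (fleet size, value) for problem (1) at investor protection mu.

    Assumes mu is not so small that the binding solution falls below q_min.
    """
    if mu <= 0:
        return 0.0, 0.0
    # Unconstrained optimum: marginal income equals the unit price of new aircraft.
    q_uc = bisect_root(lambda q: f_prime(q) - 1.0, q_min, q_max)
    if q_uc <= mu * f(q_uc):
        q = q_uc  # financing constraint is slack
    else:
        # Binding constraint: largest q with q = mu * f(q).
        q = bisect_root(lambda q: mu * f(q) - q, q_min, q_uc)
    return q, f(q) - q


for mu in (0.1, 0.25, 0.5, 0.8):
    q, value = new_fleet(mu)
    print(f"mu={mu:.2f}  q={q:.4f}  V_new={value:.4f}")
```

With this assumed f, the binding solution has the closed form q = 16μ² for μ < 0.5, and the fleet stays at the unconstrained size q = 4 for larger μ, so both fleet size and value rise with μ only while the constraint binds.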
7 This assumption is for expositional use only, and is no longer required once $p_{old}$ is endogenized.

One potential reason for the equivalence function to be convex is that there are economies of scale in the use of the new technology but not in the use of the old technology. For example, in the context of aircraft, there might be economies of scale in the homogeneous maintenance of new aircraft, but not of older heterogeneous aircraft.⁸ The intuition for this convexity is that because of the economies of scale, the relative advantage of the new aircraft fleet increases with its size. As the size of the new aircraft fleet increases, the marginal increase in old aircraft required to replicate the new fleet will be ever increasing, i.e., $h$ is convex. We provide a proof showing this result in Appendix A.⁹

Fig. 1. Combined cost and revenue functions of new and old technologies. [Curves $f(q_{new})$, $q_{new}$, and $p_{old} h(q_{new})$ plotted against $q_{new}$, with $q^*_{new}$ marked on the horizontal axis.]

To understand Proposition 1, it is useful to combine maximization problems (1) and (2) into one, by realizing that, in effect, each airline can produce income $f(q_{new})$ in two ways: either by employing $q_{new}$ new aircraft or by employing $h(q_{new})$ old aircraft. Thus, an airline can obtain income $f(q_{new})$ at an effective cost of $c(q_{new}) = \min[q_{new}, p_{old} h(q_{new})]$. The maximization problem of an airline in a country with investor protection $\mu$ can therefore be written as

$$\max_{q_{new}} \left[ f(q_{new}) - c(q_{new}) \right] \quad \text{s.t.} \quad c(q_{new}) \le \mu f(q_{new}). \tag{3}$$
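The effective cost function in problem (3) can be sketched in a few lines. This is a hedged illustration only: the equivalence function h(q) = q + q² and the price p_old = 0.5 are hypothetical choices made here (the paper leaves both general), picked so that h is convex and h(q) > q for q > 0. The function name `effective_cost` is likewise introduced for exposition.

```python
def h(q_new):
    """Hypothetical convex equivalence function (an assumption, not from the paper)."""
    return q_new + q_new ** 2


def effective_cost(q_new, p_old):
    """Return (cost, technology) of obtaining income f(q_new) at least cost."""
    cost_new = q_new              # new aircraft are priced at one
    cost_old = p_old * h(q_new)   # old fleet that yields the same income
    if cost_old < cost_new:
        return cost_old, "old"
    return cost_new, "new"


for q in (0.5, 1.0, 2.0):
    print(q, effective_cost(q, p_old=0.5))
```

Under these assumed values the two cost curves cross at a new-fleet-equivalent scale of one: below it the old technology is the cheaper way to produce a given income, and above it the new technology is.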
8 We thank the referee for making this point.
9 In proving this result, we also show that while economies of scale lead to convexity of the equivalence function, they are not required. Indeed, assuming simply that both new and old aircraft have fixed marginal costs, and that their sole difference is that the marginal cost of new aircraft is lower than that of old ones (due, for example, to higher fuel efficiency), the equivalence function will be convex when marginal revenue is weakly concave.

Fig. 1 depicts maximization problem (3) graphically in the case where $h$ is convex in $q_{new}$. As can be seen, the convexity of $h$ is equivalent to stating that the old technology has a comparative advantage when a firm operates on a small scale, while the new technology has a comparative advantage when the firm operates on a large scale. Indeed, if an airline operates in the region where income $f(q_{new})$ is comparatively low, then $p_{old} h(q_{new}) < q_{new}$, i.e., the old technology is the efficient method of production. In contrast, if an airline operates in the region where income $f(q_{new})$ is comparatively high, then $p_{old} h(q_{new}) > q_{new}$ and the new technology represents the efficient method of production. Economically, convexity of $h$ arises if employing new-technology aircraft economizes on firm scale and organizational costs are convex in scale.

To understand Proposition 1, note that when an airline operates in a low investor protection country (i.e., low $\mu$) its financial constraint will be binding, and thus its operating scale and income will be limited. It will therefore choose to operate the old technology because in low-scale production this is the efficient method of production. As the degree of investor protection improves, the financial constraint relaxes, and firm operating scale increases. Firms therefore switch to the new technology since it is the more efficient method of production at higher levels of production. Fig. 2 presents the value functions of the two technologies as a function of $\mu$. As Proposition 1 states, low-$\mu$ firms select the old technology, while high-$\mu$ firms select the new technology.¹⁰

Fig. 2. New and old technology value functions. [Vertical axis: Value; horizontal axis: $\mu$, up to 1.0; curves $V_{new}(\mu)$ and $V_{old}(\mu, p_{old})$; regions labeled "Old tech" and "New tech".]

10 Both value functions become flat when the financing constraint becomes non-binding.

2.2. Endogenous prices and fleet size

We now endogenize the price of old technology, $p_{old}$. To do so, we assume that there is a finite measure of preexisting old aircraft and that $\mu$ is distributed according to some distribution function $G$. An equilibrium $p^*_{old}$ is a price for old-technology aircraft such that the market for old aircraft clears. The equilibrium is characterized by the following proposition:

Proposition 2. If $h$, the equivalency function between new and old technology, is convex in $q_{new}$, then the equilibrium $p^*_{old}$ is such that there exists a $\mu^* > 0$ such that airlines operating in countries with investor protection $\mu < \mu^*$ choose the old technology, and airlines operating in countries with investor protection $\mu > \mu^*$ choose the new technology.

The intuition behind Proposition 2 is straightforward. All else equal, as the price of old technology, $p_{old}$, decreases, $V_{old}(\mu, p_{old})$ increases compared to $V_{new}(\mu)$. Old technology therefore becomes more attractive and a larger fraction of airlines select it. The price of old technology simply decreases to the point where the demand for old technology equals the supply. Further, from Proposition 1, we know that at the equilibrium price $p^*_{old}$, it is the low-$\mu$ firms that choose the old technology while the high-$\mu$ firms choose the new technology. We next characterize the size of airlines’ fleets as measured by their number of aircraft.
Proposition 3. There exist $\underline{\mu}$ and $\bar{\mu}$ such that fleet sizes of airlines with $\mu \le \underline{\mu}$ are smaller than the fleet sizes of airlines with $\mu \ge \bar{\mu}$.

Proposition 3 states that airlines operating in low-$\mu$ countries are financially constrained, so that their fleet sizes are restricted. In contrast, those operating in relatively high-$\mu$ countries will not be constrained, and indeed, their fleet sizes will be equal to the unconstrained level. As a final step, we relax the assumption that airlines have no internal funds and must fund all of their fleet acquisition employing external finance. We prove the following proposition:

Proposition 4. If $h$, the equivalency function between new and old technology, is convex, then for any level of investor protection $\mu$ there exists an $A^*(\mu)$ such that an airline operating in a country with investor protection $\mu$ with internal capital $A \ge A^*(\mu)$ employs the new-technology fleet, while an airline with internal capital $A < A^*(\mu)$ employs the old-technology fleet. Further, $A^*(\mu)$ is decreasing in $\mu$.

Proposition 4 states that, all else equal, firms with internal wealth above a certain threshold will employ a new-technology fleet. Further, the level of internal funds at which airlines switch to the new technology is lower in countries with better investor protection. This is because these firms can more easily rely on external finance to ease their credit constraints. Internal wealth and investor protection are therefore natural substitutes in investment decisions. From Propositions 2–4, we have the following three predictions, which are tested in the empirical section:

Prediction 1. All else equal, airlines operating in countries with lower investor protection will have older vintage fleets.

Prediction 2. All else equal, airlines operating in low investor protection countries will have smaller fleets than those operating in high investor protection countries.

Prediction 3. The effect of the level of investor protection on airline fleet vintage will be smaller for airlines with greater internal funds.
3. Data and summary statistics

This section describes the data sources used in the empirical analysis and presents summary statistics for both aircraft age and fleet size.
3.1. Aircraft-level data

Throughout our analysis we utilize data from the Ascend CASE database, a leading provider of individual aircraft and airline data which contains ownership and operating information about most commercial and corporate aircraft worldwide as well as many military and government aircraft. We construct a sample of all aircraft that are available in the database for the 129 countries that are included in Djankov, McLiesh, and Shleifer (2007). Our sample consists of all aircraft worldwide over the period January 1, 1978 to December 31, 2003 in the Ascend CASE database.¹¹ The data are very detailed and include information on individual aircraft characteristics such as model-type, serial number, year of construction, operating airline, and owner. The data in Ascend CASE thus enable us to uniquely identify most of the aircraft in the world during the time period studied in the paper.

For each aircraft in the sample we construct two measures of aircraft vintage, which are then related to the creditor rights scores described below. The first measure is aircraft age, defined in each year as the time elapsed from the year of the aircraft’s initial delivery. The second measure of vintage, which we name “technological age,” is defined as the time elapsed from the year in which the aircraft’s model-type was first introduced. This second measure proxies for the age of the technology that is embodied in the aircraft. The Ascend CASE database defines two aircraft-type classifications: narrow and broad. We thus define two variants of technological age corresponding to the two aircraft-type classifications. To fix ideas, consider the following example: aircraft N368AA, built in 1991 and delivered on December 5, 1991 to American Airlines, is a Boeing 767-300ER. In this case, the broad classification is the B767 model-type, which was first introduced in 1981. This particular variant of B767 (i.e., 300ER) was first introduced in 1986.
Thus, as of the year 2008, the aircraft’s age is 17 years, its technological age using the broad classification is 27 years, and its technological age using the narrow classification is 22 years.

Panel A of Table 1 displays summary statistics of aircraft age for four subperiods and for the entire sample. There are 489,916 aircraft-year observations in the entire sample, with an average (median) age of 13 (12) years, and a standard deviation of 9.2 years. The sample represents 219 aircraft types and 5,987 operators from 129 countries. In the last two columns of Panel A, we split our sample into aircraft operated by commercial and private airlines (Commercial), and those operated by the military and government agencies (Military), a distinction that plays an important role later in the analysis. There are 373,261 commercial and 116,655 military aircraft-year observations in the sample. The commercial sample represents 161 aircraft types and 5,437 operators from 129 countries, while the military sample represents 200 aircraft types and 893 operators from 115 countries.
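The two vintage measures defined above amount to simple year differences. The sketch below reproduces the N368AA example from the text; the `Aircraft` record and its field names are hypothetical and are not the Ascend CASE schema.

```python
from dataclasses import dataclass


@dataclass
class Aircraft:
    """Hypothetical record; field names are illustrative only."""
    delivery_year: int            # year of initial delivery
    broad_type_intro_year: int    # year the broad model-type was first introduced
    narrow_type_intro_year: int   # year the narrow variant was first introduced


def vintage_measures(aircraft, year):
    """Aircraft age and both technological ages as of a given year."""
    return {
        "age": year - aircraft.delivery_year,
        "tech_age_broad": year - aircraft.broad_type_intro_year,
        "tech_age_narrow": year - aircraft.narrow_type_intro_year,
    }


# N368AA: Boeing 767-300ER delivered in 1991; B767 introduced in 1981, 300ER in 1986.
n368aa = Aircraft(delivery_year=1991, broad_type_intro_year=1981,
                  narrow_type_intro_year=1986)
print(vintage_measures(n368aa, 2008))
# {'age': 17, 'tech_age_broad': 27, 'tech_age_narrow': 22}
```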
11 Benmelech and Bergman (2008, 2009) provide an extensive description of the Ascend CASE database.
Further, as can be seen in Panel A of the table, military aircraft are older than commercial aircraft; the average age of a commercial aircraft is 12.0 years compared to 16.0 years for military aircraft (p-value for an equal means t-test= 0.000). Panel B of Table 1 presents summary statistics for broad and narrow (in parentheses) technological age. The mean broad (narrow) technological age of the entire sample is 21.9 (18.2). As in Panel A, we split our sample into commercial and military aircraft in the last two columns of Panel B. Military aircraft embody older technology than commercial aircraft; the average broad technological age of a commercial aircraft is 20.2 years compared to 27.1 years for military aircraft (p-value for an equal means t-test= 0.000). 3.2. Country level data We match the data on individual aircraft to countrylevel macro and legal variables of the aircraft’s country of operator and owner. We augment the data from Ascend CASE with country-level macro data from the World Bank’s World Development Indicators database. This macro data include GDP and GDP per capita in U.S. dollars, GDP growth, GDP per capita growth, as well as country area (in sq. km.) and population data. We obtain data on legal origins and creditor rights from the new database assembled by Djankov, McLiesh, and Shleifer (2007) that covers 129 countries in the period 1978–2003. These new data are a major improvement upon the La Porta, Lopez-de-Silanes, Shleifer, and Vishny (1997, 1998) data, as they cover many more countries and track their variation in creditor rights score over time. For each country, the creditor rights index measures four powers of secured lenders in bankruptcy.12 First, whether there are restrictions on bankruptcy filing; second, whether there is no ‘‘automatic stay’’ or ‘‘asset freeze’’ that prevent secured creditors from seizing their collateral. 
Third, whether secured creditors are paid first, and finally, whether a trustee different from the management runs the firm during reorganization. A value of one is assigned to each of the provisions when a country’s law provides these powers to secured creditors. The creditor rights index is then calculated by aggregating the scores of the four provisions, and varies between a score of zero (poor creditor rights) and four (strong creditor rights). Djankov, McLiesh, and Shleifer (2007) collect time-series data on creditor rights for each of the 129 countries by identifying all major reforms and assessing their impact on the creditor rights score. Panel C of Table 1 reports summary statistics of the creditor rights index, GDP per capita, and legal origin. The mean (median) creditor rights in the sample is 1.64 (1.0) and the standard deviation is 1.01. The sample includes 276,601 aircraft from countries with English legal origin, 107,587 from countries with French legal origin, 50,982 from Socialist legal origin, 47,070 from German legal origin, and 7,676 aircraft are from countries with a Nordic legal origin. Table B1 in Appendix B lists the top ten countries with the most aircraft-year observations in the sample, and the bottom ten countries with the fewest aircraft-year observations. With a total of 184,122 observations, the U.S. accounts for 37.58% of the sample, followed by the Russian Federation (37,907 aircraft), U.K. (19,556 aircraft), and Canada (18,406 aircraft). The countries with the fewest observations in the data are Bosnia and Herzegovina (44 aircraft), Albania (49 aircraft), Niger (51 aircraft), and Togo (62 aircraft).
12 See Djankov, McLiesh, and Shleifer (2007) for a comprehensive description of the index and its construction.
E. Benmelech, N.K. Bergman / Journal of Financial Economics 99 (2011) 308–332
Table 1 Summary statistics. This table reports summary statistics for aircraft age, technological age (broad and narrow), and country characteristics. The summary statistics for aircraft age are reported for the periods 1978–1979, 1980–1989, 1990–1999, 2000–2003, as well as for the entire period. The table also reports summary statistics separately for commercial and military aircraft.

Panel A: Aircraft age
Aircraft age          1978–1979    1980–1989    1990–1999    2000–2003    Full sample  Commercial   Military
Minimum               0            0            0            0            0            0            0
25th Percentile       4            5            5            6            6            5            8
Mean                  9.2          11.0         13.5         14.7         13.0         12.0         16.0
Median                10           11           12           13           12           11           15
75th Percentile       13           17           21           22           19           18           23
Maximum               32           42           52           56           56           56           47
Standard deviation    5.5          7.4          9.4          10.6         9.2          8.8          9.8
# of Aircraft types   96           136          196          202          219          161          200
# of Operators        969          2,134        4,051        3,133        5,987        5,437        893
# of Countries        89           102          129          129          129          129          115
# of Observations     18,854       128,455      235,860      106,747      489,916      373,261      116,655

Panel B: Technological age
Broad technological age (narrow technological age)
                      1978–1979    1980–1989    1990–1999    2000–2003    Full sample  Commercial   Military
Minimum               0 (0)        0 (0)        0 (0)        0 (0)        0 (0)        0 (0)        0 (0)
25th Percentile       13 (9)       15 (10)      14 (11)      16 (13)      14 (11)      13 (10)      20 (15)
Mean                  16.1 (12.6)  19.3 (15.3)  22.7 (19.0)  24.2 (20.7)  21.9 (18.2)  20.2 (16.7)  27.1 (22.7)
Median                16 (12)      20 (16)      24 (19)      22 (19)      21 (17)      20 (16)      27 (22)
75th Percentile       20 (17)      24 (20)      30 (27)      34 (30)      29 (25)      27 (23)      35 (30)
Maximum               32 (32)      43 (43)      52 (52)      56 (56)      56 (56)      56 (56)      47 (47)
Standard deviation    5.2 (5.1)    7.3 (6.9)    9.9 (9.5)    11.6 (11.1)  9.8 (9.4)    9.1 (8.8)    10.1 (10.0)
# of Aircraft types   96           136          196          202          219          161          200
# of Operators        969          2,134        4,051        3,133        5,987        5,437        893
# of Countries        89           102          129          129          129          129          115
# of Observations     18,854       128,455      235,860      106,747      489,916      373,261      116,655

Panel C: Country characteristics
                                                       Legal origin
                      Creditor rights  GDP per capita  English   French    German    Nordic    Socialist
Minimum               0                $82.16          0         0         0         0         0
25th Percentile       1                $2,122.3        0         0         0         0         0
Mean                  1.64             $17,037.1       0.57      0.22      0.10      0.02      0.10
Median                1                $19,591.6       1         0         0         0         0
75th Percentile       2                $28,262.6       1         0         0         0         0
Maximum               4                $45,390.5       1         1         1         1         1
Standard deviation    1.01             $12,410.4       0.50      0.41      0.30      0.12      0.31
Number of observations by legal origin                 276,601   107,587   47,070    7,676     50,982
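The construction of the creditor rights index described in Section 3.2 (one point for each of four powers of secured lenders, summed to a score between zero and four) can be sketched as follows. This is a minimal illustration, not the coding procedure of Djankov, McLiesh, and Shleifer (2007); the class name, field names, and the example country-year are hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SecuredLenderPowers:
    """Four provisions, each True when the law grants the power (hypothetical field names)."""
    restrictions_on_filing: bool          # restrictions on bankruptcy filing
    no_automatic_stay: bool               # no automatic stay or asset freeze
    secured_creditors_paid_first: bool    # secured creditors are paid first
    trustee_replaces_management: bool     # a trustee, not management, runs the firm


def creditor_rights_index(powers: SecuredLenderPowers) -> int:
    """Sum the four 0/1 provisions: 0 = poor creditor rights, 4 = strong."""
    return sum(int(flag) for flag in astuple(powers))


# Hypothetical country-year in which two of the four provisions hold.
example = SecuredLenderPowers(
    restrictions_on_filing=True,
    no_automatic_stay=False,
    secured_creditors_paid_first=True,
    trustee_replaces_management=False,
)
print(creditor_rights_index(example))
```

Because the score is recomputed whenever a reform changes one of the provisions, the same function applied year by year would give the within-country time variation that the later fixed-effects specifications rely on.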
3.3. Airline level data
Finally, we match aircraft information to airline financial data where available. Information on airline financial data is obtained from Compustat Global. We collect all firms in SIC codes 4500–4580 and manually match them to the aircraft-level data from Ascend CASE. We also supplement the information with data from Compustat North America for U.S. airlines. After matching Ascend CASE to Compustat Global and Compustat North America and restricting the sample to the countries covered by Djankov, McLiesh, and Shleifer (2007), we are left with a subsample of 72 airlines from 29 countries, representing a panel of 94,272 aircraft-year observations.
4. Aircraft vintage and usage
We assume that assets of an older vintage are less efficient, either because they are less technologically advanced or due to physical depreciation. We therefore begin our empirical analysis with motivational evidence testing this assumption in the context of aircraft.
Measuring individual aircraft efficiency requires information on inputs (number of seats, man hours, fuel costs, operating times, routes, etc.) and outputs (number of passengers, revenue, arrival times). We cannot measure aircraft efficiency directly since we do not have access to these data at the individual aircraft level. Instead, we utilize data from the Ascend CASE database on aircraft usage as an approximation of aircraft efficiency.13 Spanning the period 1996–2006, the data provide hourly utilization rates for 25,009 aircraft worldwide. For each aircraft in the sample, the data tally the number of hours flown each year, as well as the aircraft type and year of build. We hypothesize that if aircraft efficiency is indeed decreasing with aircraft vintage, airlines will tend to decrease the operating times of their older vintage aircraft. Thus, for example, if older vintage aircraft are less fuel-efficient, airlines will shift their operations to the newer vintage aircraft in their fleet to the extent possible. Moreover, older aircraft require more maintenance and engine overhauls that would ground older aircraft for longer periods of time compared to newer ones.
Table 2 reports the results from estimating the relation between annual hourly usage and both aircraft age and aircraft technological age for all aircraft with non-zero usage.14 For each aircraft-year pair, we calculate the age of an aircraft as the time that elapsed from its year of build and the aircraft’s technological age as the time elapsed from the year in which the aircraft’s model-type was first introduced. As can be seen from the first column of the table, the coefficient on aircraft age is negative (−57.61) and is statistically significant at the 1% level. Thus, consistent with our underlying assumption that older vintage aircraft are less efficient, we find that aircraft usage declines with age. This result is robust to the addition of both aircraft type and aircraft fixed effects (not reported). The economic magnitude of this effect is significant: a one-standard-deviation increase in aircraft age of 8.62 years decreases aircraft yearly usage by approximately 450 hours, representing an 18% decline relative to the sample mean hourly usage of 2,466 hours. In column 2 we add the log of jet fuel price (averaged through the year) as a regressor, and in column 3 we also add the interaction term between jet fuel price and aircraft age.15 As column 3 demonstrates, the interaction term is negative and significant, implying that old aircraft are utilized less when jet fuel price is high. This is very much consistent with the notion that older aircraft are less fuel efficient. A one-standard-deviation increase in jet fuel price reduces the utilization of a 20-year-old aircraft by 124.9 hours per year compared to a five-year-old aircraft. Similarly, as column 4 shows, aircraft technological age is negatively related to annual hourly usage, with a one-standard-deviation increase in technological age reducing annual usage by 312 hours.16 Finally, as the last column of Table 2 shows, the use of old-technology aircraft is more sensitive to jet fuel price than the use of new-technology aircraft, consistent again with the notion that old-technology aircraft are less fuel efficient.
Fig. 3 provides a graphical representation of this monotonic relation between age and usage. To construct the figure we regress yearly aircraft usage on the set of indicator variables defined for each possible value of aircraft age, while including year and aircraft-type fixed effects as well. The figure graphs the coefficients on the age indicator-variables along with their 95% confidence interval calculated by clustering at the aircraft-type level.17 The graph illustrates the evolution of aircraft usage with aircraft age. Consistent with our assumption that aircraft efficiency improves over time, aircraft usage declines with aircraft age. We also used aircraft age as an explanatory variable instead of the set of indicator variables that were used to construct Fig. 3. In unreported results, we find that the coefficient on aircraft age is consistently negative and is statistically significant at the 1% level whether we cluster the standard errors by aircraft-type or at the individual aircraft level. Thus, consistent with our underlying assumption that older vintage aircraft are less efficient, we find that aircraft usage declines with age.
13 Airline flights commonly have what is known as a “break-even load” which is the percentage of passenger seats that must be sold to justify the flight actually occurring (Morrel, 2007). Among other factors, the break-even load is influenced by the cost of flying operations (fuel costs being an important component), which implies that less efficient aircraft have higher break-even loads. Aircraft of lower efficiency are therefore flown less often.
14 Since aircraft may drop out of the sample when they are retired from active service, we analyze the relation between usage and age only for aircraft that have been utilized during the year. Thus, we analyze the intensive, rather than extensive, margin, and as such our results can be viewed as a lower bound on the relation between age and utilization.
15 We use the New York Harbor Kerosene-Type Jet Fuel Spot Price FOB (cents per gallon).
16 When using technological age as an explanatory variable, we do not employ aircraft-type fixed effects since this regression would not be well identified: for any given year, all aircraft of the same type have equal technological age. Adding aircraft-type fixed effects is thus equivalent to identifying off of a simple linear time trend. Further, we do not report technological age results with aircraft fixed effects, as clearly, the coefficient on technological age in this specification is identical to that on age in the specification with aircraft fixed effects (columns 3 and 4).
17 The indicator variable for age equaling one is omitted, so that all coefficients are calculated in relation to the usage of aircraft of age one.
Table 2 Aircraft vintage and usage. The dependent variable is aircraft yearly usage in hours. Age is the age of the aircraft. Tech age is the technological age of the aircraft. Jet fuel is the logarithm of the average annual jet fuel price. All regressions include an intercept (not reported). Standard errors, reported in parentheses, are clustered by aircraft type. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

                        (1)         (2)         (3)         (4)         (5)         (6)
Dependent variable=     Hours       Hours       Hours       Hours       Hours       Hours
                        (per year)  (per year)  (per year)  (per year)  (per year)  (per year)
Age                     57.61a      57.54a      4.287
                        (6.219)     (6.207)     (29.662)
Tech age (broad)                                            31.964b     31.896b     27.010
                                                            (13.799)    (13.763)    (23.422)
Jet fuel                            84.456      276.341b                119.151     330.397a
                                    (63.858)    (137.546)               (72.515)    (95.056)
Age × jet fuel                                  18.457a
                                                (4.726)
Tech age × jet fuel                                                                 11.951c
                                                                                    (6.407)
Fixed-effects
Year                    Yes         No          Yes         Yes         No          Yes
# of Aircraft types     76          76          76          76          76          76
# of Aircraft           25,009      25,009      25,009      25,009      25,009      25,009
Adjusted R2             0.20        0.20        0.23        0.09        0.09        0.22
Observations            179,836     179,836     179,836     179,836     179,836     179,836

Fig. 3. Annual hourly utilization as a function of aircraft age. Regression coefficients are calculated using year and aircraft-type fixed-effects. 95% confidence intervals are calculated using standard errors that are clustered by aircraft type. [Plot not reproduced. Horizontal axis: Age (years), 0 to 46. Vertical axis: Hours, 0 to -3000.]
5. Creditor rights and aircraft vintage
5.1. Baseline results
Our simple model shows that the effects of financial constraints should be exacerbated in countries with poor creditor rights, where the availability of debt capital may be limited and its cost much higher. We therefore predict that airlines that operate in countries with poorer investor protection operate older vintage aircraft with older technologies. To test this prediction, we calculate the age and the technological age (using both the narrow and broad measures described above) of every aircraft in the 129 countries that are in our sample during the period 1978–2003. We then run the following specification:
$$\text{Vintage}_{iact} = \alpha\,\text{Creditor rights}_{ct} + X_{ct}\lambda + y_t\theta + z_{ac}\psi + \epsilon_{iact}, \qquad (4)$$
where the dependent variable, $\text{Vintage}_{iact}$, is either the age or the technological age of aircraft i operated by operator a in country c in year t. Creditor rights is the creditor rights score of country c in year t, as measured by Djankov, McLiesh, and Shleifer (2007). $X_{ct}$ is a vector of country-specific control variables which includes the logarithm of country c’s GDP, GDP per capita, and annual rates of growth of both GDP and GDP per capita, the
logarithm of its population, and the logarithm of its area. In addition, in all specifications that do not include country fixed-effects, we include as control variables a set of indicator variables indicating the legal origin of the country: common law, French, German, Nordic, or Socialist.18 Finally, all regressions include year fixed-effects, $y_t$, and depending on the specification may also include country and operator fixed-effects represented by the vector of variables $z$. Since aircraft operators maintain their affiliation with the country of operation throughout the sample, country and operator fixed-effects are always applied separately in each specification. All regressions are estimated with heteroskedasticity-robust standard errors which are clustered by country. In our data, standard errors that are clustered by country are tenfold larger than simple robust standard errors. Thus, when we do not cluster, we get a t-statistic on creditor rights that is between 19.0 and 26.8. Since our variable of interest is creditor rights which is determined at the country level, we use the higher hurdle of clustering by country. The magnitudes of the differences between the standard errors when we cluster compared to simple robust standard errors are consistent with Kloek (1981), who shows that clustered standard errors are proportional to the square root of the number of clusters, which is $\sqrt{129} = 11.4$ in our sample.
18 Country fixed-effects naturally preclude using legal origin controls as there is no time-series variation in legal origin in our sample period. For brevity of exposition, tables do not exhibit the coefficients on the legal origin dummy variables.
Table 3 provides results of regression (4) over the entire sample. As hypothesized, we find that enhanced creditor rights are consistently negatively associated with both aircraft age as well as aircraft technological age. As the first column of Table 3 demonstrates, with year fixed-effects, increasing a country’s creditor rights score from zero to four reduces the age of aircraft by 1.78 years, or 13.7% of the mean aircraft age of 13 years. Adding either country or operator fixed-effects (representing the 129 countries in the sample and 5,987 different operators) increases the magnitude of the negative impact of creditor rights on fleet aircraft age. With these fixed-effects, a movement from a creditor rights score of zero to a score of four reduces aircraft age by between 2.7 and 3.2 years, representing an approximate 20% reduction in the sample mean aircraft age.
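The comparison between country-clustered and simple robust standard errors discussed for specification (4) can be illustrated with a stripped-down sketch. It assumes a single regressor (the creditor rights score), year fixed-effects absorbed by demeaning, and no finite-sample corrections, so it is not the estimator behind Table 3. The function names and the toy data (four countries, three aircraft each, one year) are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def demean_by(values, groups):
    """Subtract the group mean from each value (absorbs group fixed effects)."""
    total, count = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for v, g in zip(values, groups)]


def slope_with_ses(x, y, years, countries):
    """OLS slope of y on x with year fixed effects.

    Returns (slope, heteroskedasticity-robust SE, country-clustered SE).
    """
    xd, yd = demean_by(x, years), demean_by(y, years)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    # Robust (White) variance: every observation contributes its own score.
    robust_var = sum((a * e) ** 2 for a, e in zip(xd, resid)) / sxx ** 2
    # Clustered variance: scores are summed within country before squaring.
    score = defaultdict(float)
    for a, e, c in zip(xd, resid, countries):
        score[c] += a * e
    cluster_var = sum(s * s for s in score.values()) / sxx ** 2
    return beta, sqrt(robust_var), sqrt(cluster_var)


# Hypothetical toy panel: the regressor varies only at the country level.
countries = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
rights = [0] * 3 + [1] * 3 + [3] * 3 + [4] * 3
age = [16, 17, 18, 11, 12, 13, 13, 14, 15, 9, 10, 11]
years = [2000] * 12

beta, se_robust, se_cluster = slope_with_ses(rights, age, years, countries)
print(beta, se_robust, se_cluster)
```

In this toy panel the residuals share a common country component, so summing scores within country before squaring yields a larger standard error than treating the aircraft as independent, which is the reason the text adopts clustering by country as the higher hurdle.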
Columns 4–6 of Table 3 show that enhanced creditor rights are also negatively related to aircraft technological age constructed using the broad aircraft classification scheme. This result holds when using year, country, and operator fixed-effects. The impact of creditor rights is significant: moving the creditor rights score from zero to four reduces average technological age of aircraft in an airline’s fleet by between 1.6 and 2.8 years, representing between 7.2% and 12.7% of the average technological age.19 Finally, as columns 7–9 show, repeating the analysis using technological age defined using the narrow classification scheme yields similar results. In sum, consistent with our prediction, aircraft are younger and embody newer technology in countries with better creditor rights, controlling for GDP per capita, population, area, and a battery of fixed-effects at the operator, country, and year level.
We repeat the analysis in Table 3 by calculating average aircraft age within a country for each of the years, thereby collapsing the data to the country level, and estimating weighted least-squares regressions. These regressions, which are not reported for brevity, yield similar results.
The negative relation between creditor rights and both aircraft age and aircraft technological age points to a financing channel through which improved investor protection and its associated reduction in financial frictions affect firm investment policy and ultimately real outcomes. According to this channel, the ability to raise external finance is an important determinant of a firm’s capacity to invest in newer technologies, which is a key driver of economic growth.
19 As in columns 1–3, the magnitude of the effect is larger when either operator or country fixed-effects are included.
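The economic magnitudes quoted for Table 3 follow from simple arithmetic: the creditor rights coefficient times a four-point change in the index, expressed in years and as a share of the sample mean. The sketch below reproduces that calculation. The coefficient magnitudes and sample means are taken from Tables 1 and 3; the negative signs are an assumption based on the text's statement that the association is negative, and the function and variable names are hypothetical.

```python
def reform_effect(coef_per_point, sample_mean, low=0, high=4):
    """Effect of moving the index from `low` to `high`.

    Returns (change in years, change as a percentage of the sample mean).
    """
    change = coef_per_point * (high - low)
    return change, 100.0 * change / sample_mean


MEAN_AGE = 13.0             # full-sample mean aircraft age (Table 1, Panel A)
MEAN_BROAD_TECH_AGE = 21.9  # full-sample mean broad technological age (Panel B)

# Creditor rights coefficients by specification; signs assumed negative per the text.
age_coefs = {"year FE": -0.446, "country FE": -0.791, "operator FE": -0.668}
tech_coefs = {"year FE": -0.395, "country FE": -0.696, "operator FE": -0.443}

for label, coef in age_coefs.items():
    years, pct = reform_effect(coef, MEAN_AGE)
    print(f"age, {label}: {years:.2f} years ({pct:.1f}% of mean)")

for label, coef in tech_coefs.items():
    years, pct = reform_effect(coef, MEAN_BROAD_TECH_AGE)
    print(f"broad tech age, {label}: {years:.2f} years ({pct:.1f}% of mean)")
```

With year fixed-effects only, this gives the 1.78-year (13.7%) reduction in aircraft age reported in the text, and the broad technological age effects span the 7.2% to 12.7% range quoted above.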
5.2. Identification strategy
As is usually the case in cross-country analysis, the main empirical challenge is endogeneity and, in particular, an omitted variable bias problem. Specifically, the creditor rights score could be correlated with other unidentified variables, such as investment opportunities, which in turn influence asset vintage choices. The relation between creditor rights and vintage could then be explained by effects other than the financing channel proposed in this paper. Most of the analysis that follows is aimed, therefore, at addressing the direction of causality in the empirical findings of Table 3. To overcome the omitted variables problem, we first utilize the panel nature of our data and the changes in creditor rights over time. By including country and operator fixed-effects, we control for unobserved and non-time-varying heterogeneity of operators and countries. In these specifications we identify off of changes in creditor rights over time within a country. Indeed, we find that in specifications which include country fixed-effects, the negative association between creditor rights and age is the largest, which is consistent with a large effect of changes in creditor rights within a country. However, while country fixed-effects help to mitigate concerns about unobserved heterogeneity, including these fixed-effects raises the issue of the endogeneity of creditor protection laws themselves. For example, a country may revise its corporate and bankruptcy laws precisely when underlying economic conditions improve. In this case the correlation between creditor rights and aircraft vintage may merely reflect increased demand for better aircraft driven by improvements in economic conditions. One solution for the endogeneity concern is to use an instrumental variable approach.
However, variables that are correlated with creditor rights are also potentially correlated with aircraft age through channels other than the law, and hence, will not meet the “exclusion” restriction. Consider, for example, legal origins (as in La Porta, Lopez-de-Silanes, Shleifer, and Vishny, 1998) as an instrument for creditor rights. While correlated with creditor rights, legal origins are potentially correlated with aircraft vintage through other legal and economic mechanisms such as safety regulation or engineering quality. Djankov, McLiesh, and Shleifer (2007) raise similar concerns about the validity of legal origin as an instrument for creditor rights in general. In the absence of an instrument, we identify the causal effect of creditor rights on aircraft age by splitting our sample into aircraft that are expected to be treated by stronger creditor rights and those that should not. Our identification strategy is threefold. First, we split the sample between commercial and military aircraft and show that, as expected, military aircraft are not treated by creditor rights. Second, we focus on commercial aircraft and split the sample between leased and owned aircraft. Since prior literature has shown that leasing allows firms to relax financial frictions (see, e.g., Eisfeldt and Rampini, 2009), the financing channel predicts that the negative relation between creditor rights and vintage should be concentrated amongst non-leased aircraft. Our empirical results confirm the prediction that leased aircraft are not treated by the creditor rights index similarly to non-leased aircraft. Finally, we study the differential effect of creditor rights on commercial aircraft conditional on the financial health of the airline. The financing channel predicts, and we indeed find, that the relation between creditor rights and vintage is concentrated amongst airlines that are in poorer financial health.
Table 3 Creditor rights and aircraft vintage. The dependent variable is aircraft age (columns 1–3), broad technological age (columns 4–6), and narrow technological age (columns 7–9). GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. Columns without country or operator fixed-effects also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

                        (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
Dependent variable=     Age       Age       Age       Tech age  Tech age  Tech age  Tech age  Tech age  Tech age
                                                      (broad)   (broad)   (broad)   (narrow)  (narrow)  (narrow)
GDP                     0.021     0.125     0.094     0.007     0.124     0.075     0.039     0.037     0.054
                        (0.150)   (0.162)   (0.142)   (0.144)   (0.142)   (0.156)   (0.122)   (0.134)   (0.138)
GDP growth              0.056     0.090a    0.056c    0.005     0.073a    0.050c    0.038     0.073a    0.049b
                        (0.061)   (0.032)   (0.031)   (0.048)   (0.031)   (0.027)   (0.045)   (0.027)   (0.024)
GDP per capita          0.476c    2.245a    1.444c    0.806a    1.890a    1.263c    0.917a    2.719a    1.681b
                        (0.242)   (0.571)   (0.862)   (0.284)   (0.583)   (0.710)   (0.212)   (0.581)   (0.774)
GDP per capita growth   0.095b    0.053a    0.050a    0.025     0.039a    0.031b    0.069b    0.042a    0.037a
                        (0.048)   (0.020)   (0.020)   (0.027)   (0.016)   (0.014)   (0.029)   (0.014)   (0.012)
Population              0.531b    10.610a   2.959b    0.647c    8.366a    1.567     0.511c    7.851a    1.776
                        (0.258)   (2.005)   (1.352)   (0.330)   (2.208)   (1.279)   (0.263)   (2.092)   (1.369)
Area                    0.362     6.993     0.483     0.398     17.035a   0.530     0.413c    2.330     0.303
                        (0.211)   (5.497)   (1.317)   (0.245)   (4.659)   (1.082)   (0.225)   (4.477)   (1.234)
Creditor rights         0.446b    0.791b    0.668b    0.395c    0.696b    0.443c    0.548b    0.527c    0.301
                        (0.218)   (0.339)   (0.305)   (0.232)   (0.302)   (0.257)   (0.229)   (0.301)   (0.285)
Fixed-effects
Year                    Yes       Yes       Yes       Yes       Yes       Yes       Yes       Yes       Yes
Country                 No        Yes       No        No        Yes       No        No        Yes       No
Operator                No        No        Yes       No        No        Yes       No        No        Yes
# of Countries          129       129       129       129       129       129       129       129       129
# of Operators          5,883     5,883     5,883     5,883     5,883     5,883     5,883     5,883     5,883
Adjusted R2             0.09      0.12      0.45      0.11      0.15      0.51      0.15      0.18      0.51
Observations            489,407   489,407   489,407   489,407   489,407   489,407   489,407   489,407   489,407

5.3. Commercial vs. military aircraft
In Table 4, for every country, we divide our sample into aircraft operated by commercial airlines and private operators, and those operated by militaries, armed forces, and government agencies. For example, as of December 31, 2003, there are ten U.S. federal agencies or military operators in our sample: Federal Aviation Administration, NASA, U.S. Air Force, U.S. Air National Guard, U.S. Army, U.S. Army National Guard, U.S. Coast Guard, U.S. Customs Service, U.S. Marine Corps, and the U.S. Navy. Likewise, as of December 31, 2003, there are four military operators and one government agency in the Islamic Republic of Iran: Iran National Cartographic Center, Iranian Air Force, Iranian Army, Iranian Navy, and the Iranian Revolutionary Guard. We expect the negative relation between aircraft age and creditor protection to be concentrated in commercial and private aircraft operators, since these are the firms which would be required to raise funds from external investors in cases of financial shortfalls. In addition, commercial and private firms would fall under the bankruptcy provisions of the local corporate bankruptcy laws which are the essence of the creditor protection score. In contrast, government agencies, militaries and other armed forces obtain funding from their governments that are in turn subject to international law. However, when sovereign governments default on their debt, creditors cannot effectively seize the country’s assets.20 Instead, creditors litigate with their sovereign borrowers using international law. Sovereign borrowers are then induced to settle with their creditors as they want to maintain access to capital markets (Bulow and Rogoff, 1989a,b). In summary, corporate and bankruptcy law do not apply to aircraft operated by militaries, armed forces, and government agencies.
20 Although lenders can potentially seize commercial or military and government-owned aircraft, this strategy is not very useful except as a strategy of harassment (Shleifer, 2003).
Table 4 Creditor rights and aircraft age: commercial vs. military aircraft. The dependent variable is aircraft age of either commercial or military aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Airforce size is the fraction of military aircraft as a percentage of all aircraft in a country. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. The first two columns also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

                        (1)         (2)         (3)         (4)         (5)         (6)
Dependent variable=     Commercial  Military    Commercial  Military    Commercial  Military
                        age         age         age         age         age         age
GDP                     0.180       0.153       0.242       0.064       0.142       0.038
                        (0.387)     (0.147)     (0.154)     (0.153)     (0.115)     (0.173)
GDP growth              0.059       0.206       0.092a      0.132c      0.058a      0.150b
                        (0.053)     (0.164)     (0.026)     (0.068)     (0.020)     (0.070)
GDP per capita          0.756c      1.464a      2.390a      1.056       1.815a      1.003
                        (0.387)     (0.353)     (0.623)     (1.189)     (0.640)     (1.293)
GDP per capita growth   0.059b      0.284c      0.032a      0.167a      0.030a      0.178a
                        (0.029)     (0.166)     (0.011)     (0.065)     (0.010)     (0.067)
Population              0.919a      0.254       9.251a      1.732       1.800       2.381
                        (0.289)     (0.343)     (2.361)     (4.402)     (1.395)     (4.488)
Area                    0.546b      0.071       2.611       83.003b     0.857       9.820
                        (0.255)     (0.249)     (5.878)     (37.051)    (1.126)     (6.390)
Airforce size           0.177       1.160b      0.774c      5.538a      0.986b      5.100a
                        (0.389)     (0.453)     (0.413)     (0.863)     (0.412)     (0.972)
Creditor rights         0.512b      0.351       0.731c      0.293       0.619b      0.490
                        (0.240)     (0.409)     (0.412)     (0.597)     (0.312)     (0.646)
Fixed-effects
Year                    Yes         Yes         Yes         Yes         Yes         Yes
Country                 No          No          Yes         Yes         No          No
Operator                No          No          No          No          Yes         Yes
# of Countries          129         114         129         114         129         114
# of Operators          5,435       733         5,435       733         5,435       733
Adjusted R2             0.10        0.15        0.14        0.20        0.49        0.32
Observations            372,897     116,510     372,897     116,510     372,897     116,510
To formally test the hypothesis that the negative relation between aircraft age and creditor protection will be concentrated in commercial and private aircraft operators, Table 4 reports the results of running regression (4) separately for commercial aircraft and aircraft operated by military and other government agencies.21 The dependent variable is aircraft age and the explanatory variables are as in Table 3 with the addition of the country’s airforce size. Consistent with the results in Table 3, in all specifications, the age of commercial aircraft is negatively and statistically significantly related to creditor rights, with an economic impact similar to that found in Table 3, while the age of military aircraft is unrelated to the creditor rights score in a statistically significant manner.22 Table 5 repeats the analysis, separating the sample into commercial aircraft and aircraft operated by military and other government agencies, but this time using aircraft technological age as the dependent variable. Again, we find that while aircraft technological age, defined using either the narrow or broad classification, is negatively related to creditor rights in the subsample of commercial aircraft, there is no statistically significant relation between the technological age of military aircraft and country creditor rights scores. Interestingly, while not statistically significant, we find that in the subsample of military aircraft, the coefficients on creditor rights scores are actually positive.
21 This specification is identical to running one regression with all the explanatory variables interacted with a military dummy. We prefer to report the results separately for commercial and military aircraft as the exposition is clearer. Table 10 reports tests on the statistical significance of the difference between commercial and military aircraft.
22 We note that the main benefit of comparing military to commercial aircraft is not the comparison of their average aircraft vintage as this could be driven by technological differences, but rather the comparison of the sensitivity of their aircraft vintage to creditor protection. Showing that military aircraft vintage is not related to creditor protection can be thought of as a first test that the data should pass.
5.4. Creditor rights and aircraft leasing
In the analysis above, we do not distinguish between airlines that lease aircraft instead of purchasing them through debt financing.23 Eisfeldt and Rampini (2009) show that, in the U.S., since the repossession of leased assets is easier than foreclosure on collateral of secured debt, lease financing allows for higher debt capacity than secured debt. Put differently, lease financing aids firms to
23 Lease financing of aircraft is fairly common, particularly in the United States (see, e.g., Benmelech and Bergman, 2008; Gavazza, 2010).
E. Benmelech, N.K. Bergman / Journal of Financial Economics 99 (2011) 308–332
Table 5 Creditor rights and technological age: commercial vs. military aircraft. The dependent variable is the broad technological age (columns 1–4), or narrow technological age (columns 5–8) of either commercial or military aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Airforce size is the fraction of military aircraft as a percentage of all aircraft in a country. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

| | (1) Commercial tech age (broad) | (2) Military tech age (broad) | (3) Commercial tech age (broad) | (4) Military tech age (broad) | (5) Commercial tech age (narrow) | (6) Military tech age (narrow) | (7) Commercial tech age (narrow) | (8) Military tech age (narrow) |
|---|---|---|---|---|---|---|---|---|
| GDP | 0.246c (0.140) | 0.040 (0.150) | 0.093 (0.125) | 0.046 (0.163) | 0.169 (0.145) | 0.176c (0.103) | 0.095 (0.121) | 0.141 (0.125) |
| GDP growth | 0.076a (0.024) | 0.097a (0.033) | 0.044a (0.018) | 0.114a (0.030) | 0.069a (0.022) | 0.105a (0.035) | 0.041a (0.016) | 0.115a (0.033) |
| GDP per capita | 2.066b (0.893) | 0.422 (1.086) | 1.423b (0.626) | 0.380 (1.202) | 2.664a (0.725) | 0.047 (1.061) | 1.874a (0.605) | 0.131 (1.231) |
| GDP per capita growth | 0.022b (0.009) | 0.102a (0.031) | 0.016c (0.009) | 0.110a (0.025) | 0.025a (0.007) | 0.125a (0.029) | 0.021a (0.006) | 0.127a (0.027) |
| Population | 8.562a (2.562) | 5.885c (3.110) | 0.659 (1.309) | 5.480c (3.006) | 7.231a (2.436) | 3.058 (2.878) | 0.974 (1.369) | 2.490 (2.993) |
| Area | 5.600 (5.573) | 40.959c (22.716) | 1.110 (1.005) | 3.170 (3.537) | 1.877 (4.482) | 111.283a (20.545) | 1.355 (1.078) | 3.474 (3.470) |
| Airforce size | 0.481b (0.532) | 5.588a (0.846) | 0.627 (0.493) | 5.309a (0.908) | 0.840b (0.410) | 5.047a (0.745) | 0.992b (0.395) | 4.619a (0.843) |
| Creditor rights | 0.804b (0.371) | 0.487 (0.639) | 0.680a (0.260) | 0.401 (0.706) | 0.678c (0.381) | 0.629 (0.557) | 0.481c (0.280) | 0.533 (0.632) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | Yes | Yes | No | No | Yes | Yes | No | No |
| Fixed-effects: Operator | No | No | Yes | Yes | No | No | Yes | Yes |
| # of Countries | 129 | 114 | 129 | 114 | 129 | 114 | 129 | 114 |
| # of Operators | 5,437 | 893 | 5,437 | 893 | 5,437 | 893 | 5,437 | 893 |
| Adjusted R2 | 0.18 | 0.31 | 0.48 | 0.49 | 0.23 | 0.26 | 0.53 | 0.41 |
| Observations | 372,897 | 116,510 | 372,897 | 116,510 | 372,897 | 116,510 | 372,897 | 116,510 |
circumvent some of the financial frictions associated with debt financing. To the extent that this result generalizes to other countries outside the U.S. (for example, because the title to the asset remains with the lessor but is not in the possession of a secured creditor) and to the extent that as creditor protection improves, the wedge between the availability of debt and lease financing decreases, we would expect two implications to arise. First, airlines operating in countries with poor creditor rights should be more likely to use lease financing rather than plain debt because of the associated reduction in financial frictions. Second, if leasing reduces financial frictions, then the results found in Tables 3–5 showing that creditor rights is negatively related to aircraft vintage should be stronger for non-leased aircraft. To test the first conjecture, we run a probit regression on the sample of all commercial aircraft, relating a country’s creditor rights score to the likelihood of an individual aircraft being leased:

$$\Pr(\text{leased} = 1) = \Phi\left(0.026 \times \text{Creditor rights}_{ct} + X_{ct}\lambda + y_t\psi\right), \qquad (5)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function, $\text{Creditor rights}_{ct}$ is the creditor rights score of country c in year t as measured by Djankov, McLiesh, and Shleifer (2007), $X_{ct}$ is the vector of control variables used in regression (4), and $y_t$ is a vector of year fixed-effects. The point estimate of 0.026 (p-value = 0.04) implies that airlines in countries with poor creditor rights are indeed more likely to lease their aircraft. This effect is economically significant: moving from a creditor rights score of four to a creditor rights score of zero increases the likelihood that an aircraft will be leased by 10.4 percentage points, representing an increase of 24.2% relative to the unconditional mean. Thus, the data do indeed suggest that airlines operating in countries with low creditor protection are more likely to resort to lease financing.24 To test the second conjecture, that the negative relation between creditor rights and aircraft vintage should be stronger for non-leased aircraft, we repeat the analysis in regression (4) separately for leased and non-leased aircraft.25 Results are reported in Table 6.

24 As an aside, there must be some added cost associated with lease financing, since otherwise it would be the dominant form of raising external capital. One cost associated with leasing that is typically considered is the extra associated agency costs. These will be paid for up front by the lessee.
25 As before, this specification is isomorphic to running one regression with all the explanatory variables interacted with a
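As a rough check on the magnitudes quoted for regression (5), the sketch below redoes the arithmetic. It assumes that the reported 0.026 can be read as a per-point marginal effect on the lease probability (the four-point scaling in the text implies this reading) and that the unconditional mean is the share of leased aircraft-year observations, taken here from the observation counts in Table 6. Both readings, and all variable names, are illustrative assumptions rather than the authors' procedure.

```python
# Figures quoted in the text around regression (5).
MARGINAL_EFFECT_PER_POINT = 0.026  # assumed to be a per-point marginal effect
INDEX_MOVE = 4                     # creditor rights score moving between 4 and 0
RELATIVE_INCREASE = 0.242          # "24.2% relative to the unconditional mean"

# Observation counts for leased and non-leased aircraft reported in Table 6.
LEASED_OBS = 159_647
NON_LEASED_OBS = 213_250

# Change in lease probability implied by a four-point move in the index.
change_in_probability = MARGINAL_EFFECT_PER_POINT * INDEX_MOVE

# Unconditional mean implied by the quoted relative increase.
implied_mean = change_in_probability / RELATIVE_INCREASE

# Share of leased observations in the sample (assumed proxy for that mean).
sample_share = LEASED_OBS / (LEASED_OBS + NON_LEASED_OBS)

print(f"change in lease probability: {change_in_probability * 100:.1f} percentage points")
print(f"implied unconditional mean:  {implied_mean:.3f}")
print(f"leased share of observations: {sample_share:.3f}")
```

The implied mean and the sample share of leased observations land close to each other, which suggests the quoted percentages are internally consistent under these assumptions.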
Table 6 Creditor rights and age: leased vs. non-leased aircraft. The dependent variable is aircraft age or broad technological age of either non-leased or leased aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. Columns without country or operator fixed-effects also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.
| | (1) Non-leased age | (2) Leased age | (3) Non-leased age | (4) Leased age | (5) Non-leased age | (6) Leased age | (7) Non-leased tech age (broad) | (8) Leased tech age (broad) |
|---|---|---|---|---|---|---|---|---|
| GDP | 0.408 (0.271) | 0.111 (0.137) | 0.515c (0.278) | 0.031 (0.146) | 0.331c (0.180) | 0.115 (0.136) | 0.315 (0.287) | 0.026 (0.124) |
| GDP growth | 0.245b (0.113) | 0.159a (0.057) | 0.085b (0.036) | 0.094a (0.028) | 0.048 (0.036) | 0.069a (0.026) | 0.231b (0.112) | 0.114b (0.061) |
| GDP per capita | 1.159a (0.374) | 0.778b (0.320) | 3.614a (0.747) | 2.129a (0.587) | 2.909a (0.970) | 2.322 (0.531) | 1.470a (0.372) | 1.332a (0.273) |
| GDP per capita growth | 0.168 (0.230) | 0.029 (0.018) | 0.067a (0.025) | 0.016a (0.005) | 0.063b (0.029) | 0.016a (0.002) | 0.140 (0.119) | 0.021 (0.019) |
| Population | 1.373a (0.487) | 1.026a (0.355) | 9.949a (2.634) | 8.020b (3.870) | 5.106b (2.304) | 2.169c (1.288) | 1.291a (0.466) | 0.789c (0.417) |
| Area | 0.938a (0.287) | 0.926 (0.202) | 42.389a (16.105) | 10.853b (5.161) | 4.013b (1.719) | 1.287c (0.701) | 0.866a (0.312) | 0.871a (0.271) |
| Creditor rights | 0.984a (0.376) | 0.155 (0.243) | 0.699 (0.475) | 0.320 (0.450) | 0.651b (0.323) | 0.237 (0.418) | 0.732b (0.357) | 0.132 (0.226) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | No | No | Yes | Yes | No | No | No | No |
| Fixed-effects: Operator | No | No | No | No | Yes | Yes | No | No |
| # of Countries | 128 | 129 | 128 | 129 | 128 | 129 | 128 | 129 |
| # of Operators | 3,750 | 3,600 | 3,750 | 3,600 | 3,750 | 3,600 | 3,750 | 3,600 |
| Adjusted R2 | 0.11 | 0.05 | 0.20 | 0.11 | 0.52 | 0.53 | 0.17 | 0.07 |
| Observations | 213,250 | 159,647 | 213,250 | 159,647 | 213,250 | 159,647 | 213,250 | 159,647 |
(footnote continued) leasing dummy. Table 9 reports tests on the statistical significance of the difference between leased and non-leased aircraft.

Consistent with our conjecture, we find that the negative relation between aircraft vintage and creditor rights is indeed concentrated among owned, rather than leased, aircraft. For owned aircraft, a zero to four increase in creditor rights is associated with a reduction of between 2.6 and 3.9 years in aircraft vintage. In contrast, in the leased aircraft subsample, the coefficient on creditor rights is not statistically different from zero in any of the specifications.

Another benefit of examining the effect of creditor rights on leased and non-leased aircraft separately is that it alleviates the concern that our results are driven by variation in omitted variables, in particular variation in investment opportunities correlated with variation in creditor rights. This concern is partly addressed by our battery of GDP-based controls and by the operator and country fixed-effects specifications. Still, to the extent that time-series variation in creditor rights is correlated with investment opportunities (for example, because bankruptcy reform may be enacted simultaneously with other economic reforms; see, e.g., Acharya and Subramanian, 2009), we cannot completely rule out the possibility that variation in investment opportunities is driving the results. However, the fact that the negative relation between creditor rights and aircraft vintage is concentrated in non-leased aircraft alleviates this concern. There is little reason to suspect that increased investment opportunities should differentially impact the vintage of leased as compared to non-leased aircraft, while the financing channel provides a clear prediction regarding the differential impact of creditor rights on the two methods of aircraft financing.

One concern with our analysis in this section is that while the ability to foreclose and repossess assets is, in general, unique to lessors, there is one exception that also allows secured creditors to foreclose on assets in the event of bankruptcy. Section 1110 of the U.S. Bankruptcy Code provides relief from the automatic stay of assets in bankruptcy to creditors holding a secured interest in aircraft, strengthening the creditor rights of these creditors.26 Given that many non-leased aircraft are likely to be financed through secured financing, our measure of creditor rights, which includes automatic stay as one of its components, may be biased downwards for the case of the U.S.
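The range of 2.6 to 3.9 years quoted in this section for owned aircraft can be traced back to the creditor rights coefficients for non-leased aircraft in Table 6. The sketch below assumes the range comes from scaling each coefficient magnitude by the four-point span of the index; the dictionary labels are illustrative names, not the authors' notation.

```python
# Creditor rights coefficient magnitudes for non-leased aircraft (Table 6).
NON_LEASED_COEFS = {
    "age, year fixed-effects": 0.984,
    "age, year and country fixed-effects": 0.699,
    "age, year and operator fixed-effects": 0.651,
    "broad tech age, year fixed-effects": 0.732,
}
INDEX_SPAN = 4  # creditor rights index runs from zero to four

# Change in vintage (in years) implied by moving across the full index range.
implied_years = {spec: coef * INDEX_SPAN for spec, coef in NON_LEASED_COEFS.items()}
low = min(implied_years.values())
high = max(implied_years.values())

for spec, years in implied_years.items():
    print(f"{spec}: {years:.1f} years")
print(f"range: {low:.1f} to {high:.1f} years")
```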
26 The U.S. Bankruptcy Code began to treat aircraft financing favorably in 1957, but it was not until 1979 that Congress amended the Bankruptcy Code and introduced Section 1110 protection which provides creditors relief from the automatic stay.
Table 7 Creditor rights and age of non-leased aircraft: poor vs. rich countries. The dependent variable is aircraft age in either rich or poor countries. Rich (poor) countries are defined in each year as those with GDP per capita greater (smaller) than the sample median of that year. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. Enforcement follows Knack and Keefer (1995) and is an index on the scale of zero (low enforcement) to ten (high enforcement) defined as the ‘‘The relative degree to which contractual agreements are honored and complications presented by language and mentality differences.’’ Columns without country or operator fixed-effects also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.
| | (1) Rich age | (2) Poor age | (3) Rich age | (4) Poor age | (5) Rich age | (6) Poor age | (7) Rich (U.S. excluded) age | (8) Rich age | (9) Poor age |
|---|---|---|---|---|---|---|---|---|---|
| GDP | 0.483c (0.274) | 0.351a (0.132) | 0.655a (0.250) | 0.014 (0.112) | 0.368c (0.189) | 0.148 (0.115) | 0.233 (0.316) | 0.577 (0.372) | 0.129 (0.168) |
| GDP growth | 0.033b (0.091) | 0.136 (0.264) | 0.085b (0.039) | 0.268c (0.159) | 0.063 (0.039) | 0.220 (0.142) | 0.061c (0.034) | 0.019 (0.099) | 1.583 (1.143) |
| GDP per capita | 1.805a (0.329) | 1.553 (1.037) | 4.579a (1.620) | 3.380a (0.735) | 5.005a (1.677) | 2.695a (0.427) | 4.354a (1.534) | 2.625b (1.222) | 0.889 (2.184) |
| GDP per capita growth | 0.106 (0.087) | 0.042 (0.286) | 0.064a (0.024) | 0.186 (0.162) | 0.056b (0.027) | 0.168 (0.143) | 0.066b (0.026) | 0.009 (0.041) | 1.602 (1.195) |
| Population | 0.453a (0.447) | 2.118a (0.515) | 9.023a (2.896) | 0.370 (6.180) | 2.743 (2.031) | 0.869 (4.260) | 10.043a (2.949) | 0.579 (0.600) | 0.729 (1.017) |
| Area | 0.762a (0.252) | 0.117 (0.522) | 47.645a (14.562) | 731.985b (336.94) | 2.274 (1.418) | 1.904c (3.924) | 44.943a (16.585) | 0.884b (0.353) | 2.007 (1.893) |
| Creditor rights | 0.715b (0.355) | 0.172 (0.460) | 1.017a (0.335) | 0.025 (0.634) | 0.648b (0.311) | 0.345 (0.548) | 0.941a (0.348) | 0.803b (0.403) | 0.489 (0.400) |
| Enforcement | | | | | | | | 0.893 (0.643) | 3.904c (2.258) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | No | No | Yes | Yes | No | No | Yes | No | No |
| Fixed-effects: Operator | No | No | No | No | Yes | Yes | No | No | No |
| # of Countries | 66 | 72 | 66 | 72 | 66 | 72 | 65 | 37 | 15 |
| # of Operators | 2,963 | 845 | 2,963 | 845 | 2,963 | 845 | 2,093 | 1,649 | 269 |
| Adjusted R2 | 0.12 | 0.26 | 0.16 | 0.37 | 0.51 | 0.58 | 0.25 | 0.08 | 0.36 |
| Observations | 177,937 | 35,313 | 177,937 | 35,313 | 177,937 | 35,313 | 106,027 | 86,058 | 13,492 |
However, our empirical analysis uses the entire score of the creditor rights index, and given that the incidence of automatic stay is highly correlated with the general leniency towards borrowers, our results are unlikely to be driven by the provision of Section 1110 of the U.S. Bankruptcy Code. Nevertheless, we rerun our regressions excluding all aircraft from the U.S. and find (in unreported results available upon request) that our results are unchanged; they are therefore not driven by Section 1110.
5.5. Creditor rights and aircraft leasing: poor vs. rich countries

Table 7 revisits the results of the previous table for richer (higher or equal to GDP per capita median) and poorer (below GDP per capita median) countries separately. This allows us to control better for economic development, by not only including all four GDP-based measures of development, but also allowing for a more flexible functional form in the relation between aircraft age and economic development. As can be seen in the table, we find that among the development variables, GDP per capita is the strongest predictor of aircraft age across most of the specifications with higher GDP per capita growth associated with younger aircraft. More importantly, we find that the negative relation between creditor rights and aircraft age is driven by rich countries (the coefficient on creditor rights is between 0.648 and 1.017), while for poor countries the coefficient is between 0.025 and 0.345 and is not statistically different from zero. There are fewer aircraft in the poor countries sample compared to the rich countries sample (35,313 vs. 177,937 observations). However, since we cluster the standard errors by country and given that there are roughly the same number of countries in each of the subsamples, the results are not driven by lack of statistical power in the poor countries’ regressions. Column 7 of Table 7 focuses on non-leased aircraft of rich countries excluding all U.S. aircraft, which account for 37.58% of all aircraft in our sample. We exclude U.S. aircraft as a robustness check to verify that our results are not driven by specific characteristics of U.S. airlines that are potentially correlated with
Table 8 Creditor rights, government ownership, and aircraft vintage. The dependent variable is aircraft age or broad technological age in either non-leased or leased aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Government is a dummy variable that equals one if the airline operating the aircraft is at least partially owned by the government and equals zero otherwise. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. Columns without country or operator fixed-effects also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.
| | (1) Non-leased age | (2) Leased age | (3) Non-leased age | (4) Leased age | (5) Non-leased age | (6) Leased age | (7) Non-leased tech age (broad) | (8) Leased tech age (broad) |
|---|---|---|---|---|---|---|---|---|
| GDP | 0.384 (0.270) | 0.122 (0.131) | 0.511c (0.274) | 0.021 (0.143) | 0.330c (0.178) | 0.113 (0.138) | 0.294 (0.286) | 0.036 (0.120) |
| GDP growth | 0.203b (0.102) | 0.144b (0.055) | 0.084b (0.037) | 0.097a (0.028) | 0.048 (0.036) | 0.069a (0.026) | 0.196c (0.104) | 0.139b (0.058) |
| GDP per capita | 1.326a (0.386) | 0.896b (0.324) | 3.356a (0.722) | 2.103a (0.583) | 2.904a (0.958) | 2.322 (0.530) | 1.611a (0.377) | 1.443a (0.280) |
| GDP per capita growth | 0.128 (0.101) | 0.029c (0.016) | 0.064a (0.024) | 0.015a (0.004) | 0.063b (0.029) | 0.016a (0.002) | 0.106 (0.105) | 0.021 (0.018) |
| Population | 1.398a (0.495) | 1.074a (0.340) | 10.134a (2.780) | 7.614c (3.959) | 4.790b (2.226) | 2.174c (1.288) | 1.312a (0.469) | 0.833b (0.407) |
| Area | 0.824a (0.287) | 0.844a (0.195) | 28.776 (20.156) | 3.363 (7.745) | 3.785b (1.646) | 1.285c (0.701) | 0.771b (0.303) | 0.794a (0.260) |
| Government | 2.760a (0.431) | 2.676a (0.826) | 2.797a (0.390) | 1.792b (0.895) | 1.507 (0.973) | 0.571 (0.833) | 2.304a (0.430) | 2.495a (0.925) |
| Creditor rights | 1.067a (0.357) | 0.297 (0.236) | 0.704 (0.490) | 0.282 (0.440) | 0.619b (0.298) | 0.249 (0.404) | 0.801b (0.339) | 0.0001 (0.218) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | No | No | Yes | Yes | No | No | No | No |
| Fixed-effects: Operator | No | No | No | No | Yes | Yes | No | No |
| # of Countries | 128 | 129 | 128 | 129 | 128 | 129 | 128 | 129 |
| # of Operators | 3,750 | 3,600 | 3,750 | 3,600 | 3,750 | 3,600 | 3,750 | 3,600 |
| Adjusted R2 | 0.13 | 0.06 | 0.20 | 0.11 | 0.52 | 0.53 | 0.18 | 0.07 |
| Observations | 213,250 | 159,647 | 213,250 | 159,647 | 213,250 | 159,647 | 213,250 | 159,647 |
creditor rights. The coefficient of creditor rights is very similar to our previous estimates that include the U.S., and thus our results are not driven by specific features of the U.S. airline industry. Taken together, our results thus suggest that creditor rights are more important in richer countries, which is consistent with Djankov, McLiesh, and Shleifer (2007), who find that creditor rights have an impact on credit markets only in richer countries. In contrast, creditor rights in poorer countries have little impact on credit market development, possibly due to their lack of enforcement.

We now turn to test the roles that creditor rights and contract enforcement play in the alleviation of financial constraints and the resultant investment in vintage capital. We follow Djankov, La Porta, Lopez-de-Silanes, and Shleifer (2003) and use the enforceability of contracts index developed by Business Environmental Risk Intelligence. The index is on the scale of zero (low enforcement) to ten (high enforcement) and is defined as “The relative degree to which contractual agreements are honored and complications presented by language and mentality differences.”27 The index is available as a cross section of 52 countries (37 rich and 15 poor) for which we have micro-level data on aircraft. Since the index averages enforcement during the 1990s and early 2000s, we restrict the analysis in regressions using contract enforcement to data for aircraft-years from 1990 onwards. The mean enforcement index in our sample is 7.48, with rich countries having a higher level of enforcement compared to poor countries (7.48 compared to 4.65 with the difference significant at the 1% level). As the last two columns of Table 7 clearly show, creditor rights has a strong and significant negative relation with aircraft age in rich countries while enforcement is not statistically significant. In contrast, in poor countries, enforcement is more important than creditor rights. An increase of one point in the enforcement index lowers aircraft age by 3.9 years in poor countries while the effect of creditor rights is not statistically significant. The results in the last two columns of Table 7 support the conjecture that creditor rights in poorer countries have little impact as they are unlikely to be enforced.

27 See Knack and Keefer (1995) for the exact definition.
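The rich versus poor split behind Table 7 is defined year by year relative to the sample median of GDP per capita. A minimal sketch of that classification is given below, using made-up country labels and income figures purely for illustration. It follows the wording of Section 5.5 in treating a country exactly at the median as rich, which is an assumption about tie handling; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (country, year, GDP per capita) records, for illustration only.
records = [
    ("A", 1990, 30000.0), ("B", 1990, 12000.0), ("C", 1990, 2500.0), ("D", 1990, 800.0),
    ("A", 1991, 31000.0), ("B", 1991, 11000.0), ("C", 1991, 12500.0), ("D", 1991, 900.0),
]


def classify_by_year(rows):
    """Label each country-year as rich or poor relative to that year's median."""
    values_by_year = defaultdict(list)
    for _, year, gdp_per_capita in rows:
        values_by_year[year].append(gdp_per_capita)
    medians = {year: median(values) for year, values in values_by_year.items()}
    return {
        (country, year): "rich" if gdp_per_capita >= medians[year] else "poor"
        for country, year, gdp_per_capita in rows
    }


groups = classify_by_year(records)
for key in sorted(groups):
    print(key, groups[key])
```

Because the median is recomputed each year, a country can move between the two groups over time, as countries B and C do in this toy data.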
5.6. Robustness tests: government ownership and regulation

We now turn to check the robustness of our results in several ways. First, since many commercial airlines are
either fully or partially state-owned, we need to control directly for government ownership in our regressions. One concern about government ownership is that it is correlated with creditor rights, potentially biasing our point estimates. Similarly, differences in aviation regulation may influence aircraft vintage while at the same time being correlated with a country’s creditor rights index. Thus, we need direct measures of both government ownership at the airline level and aviation regulation quality at the country level.

5.6.1. Government ownership, creditor rights, and aircraft vintage

The financing channel would predict that commercial airlines with government ownership can utilize the government as a source of capital to ease financing constraints. Commercial airlines wholly or partially owned by the government may have a “soft” budget constraint and as a result should have fleets of younger vintage. Furthermore, governments may be willing to invest more in new aircraft in their “flag carriers” as they represent the country internationally. According to Littlejohns and McGairl (1998): “Because a new aircraft symbolizes not only the nation’s prestige, but also the skill of its leaders, it is easy for politicians to value these symbols far above mere prudence.”28 We collect data on government ownership in airlines from Ascend CASE, and supplement it with information from airline Web sites and Lexis-Nexis to construct a dummy variable taking on the value of one for airlines with some government ownership in a particular year, and zero otherwise. We then run regressions similar to the specification in regression (4) for all commercial aircraft with the government ownership dummy as an additional control. The sample is divided into leased and non-leased aircraft, and results are presented in Table 8.
28 Littlejohns and McGairl (1998, p. 216).

First, as hypothesized, and consistent with a financing channel for vintage capital, government ownership is negatively related to aircraft vintage, suggesting that governments do indeed relieve some of the financial constraints of the airlines which they own. Moreover, even after controlling for government ownership, creditor rights is negatively related to both aircraft age and technological age of non-leased aircraft. The coefficients of creditor rights in the different specifications (between 0.619 and 1.067) are generally higher than those found in the panel data regressions.

5.6.2. Aviation regulation, creditor rights, and aircraft vintage

As an additional robustness test, in Table 9, we add to the regressions in Table 8 a measure of the quality of aviation regulation for country c at year t. We construct this measure using information from the Federal Aviation Administration (FAA) that ranks the overall quality of a country’s civil aviation authority. FAA inspectors assess civil aviation authorities around the world based on their authority to license and oversee air carriers in accordance with International Civil Aviation Organization (ICAO) aviation safety standards. The FAA classifies countries into two groups: those that are compliant with ICAO standards and those that are not.29 We were able to obtain data on FAA classifications for 67 countries for the years 1994–2003, resulting in a total of 188,142 aircraft. We construct a dummy variable measuring aviation quality which takes on the value of one for countries that comply with ICAO standards in a particular year, and zero for those that fail to comply. We then run similar specifications to regression (4) adding the aviation quality dummy as an explanatory variable.30 Column 1 reports results using the sample of all the commercial aircraft; in columns 2 and 3 the sample is divided into leased and non-leased aircraft; and in column 4 we use only military aircraft.

The first four columns of Table 9 confirm our previous findings. After controlling for both government ownership and aviation regulation quality, it is only the vintage of non-leased commercial aircraft which is related to the creditor rights index while the vintages of leased commercial aircraft and military aircraft are not. Furthermore, controlling for government ownership and aviation regulation increases the impact of creditor rights as compared to our previous estimates: moving from a creditor rights score of zero to a creditor rights score of four reduces non-leased commercial aircraft age by 7.38 years. In the last two columns of Table 9, we test the significance of the difference between the coefficient on creditor rights in: (i) the military and commercial non-leased aircraft subsamples; and (ii) the commercial leased and non-leased aircraft subsamples.
To do so, we first add an interaction term between the creditor rights score and a dummy variable capturing whether an aircraft is a military aircraft to the previous regressions (columns 1–4). As can be seen, the coefficient on the interaction term is positive and significant at the 10% level, indicating that the negative relation between creditor rights and the vintage of non-leased aircraft is statistically different from that between creditor rights and the vintage of military aircraft. In column 6 we add an interaction term between creditor rights and a dummy variable taking on the value of one if an aircraft is leased. Again, the positive coefficient on the interaction term demonstrates that the difference between the coefficients on the creditor rights variable in the leased and non-leased subsamples (columns 2 and 3) is statistically significant.
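The interaction tests rest on an algebraic equivalence noted in the footnotes: estimating a regression separately on two subsamples yields the same point estimates as one pooled regression in which every regressor, including the intercept, is interacted with the subsample dummy. The sketch below illustrates this with ordinary least squares on a small made-up data set with a single regressor. The data, the helper functions, and the omission of controls, fixed-effects, and clustered standard errors are all simplifying assumptions, so this shows the mechanics only and not the authors' estimation.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def ols(design, outcome):
    """Return OLS coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: (creditor rights score, leased dummy, aircraft age).
data = [
    (0, 0, 20.0), (1, 0, 18.5), (2, 0, 18.0), (3, 0, 16.0), (4, 0, 15.5),
    (0, 1, 10.0), (1, 1, 10.5), (2, 1, 9.5), (3, 1, 10.2), (4, 1, 9.9),
]


def subsample_slope(leased):
    """Slope on creditor rights from a regression run on one subsample only."""
    subset = [(score, age) for score, dummy, age in data if dummy == leased]
    coefs = ols([[1.0, score] for score, _ in subset], [age for _, age in subset])
    return coefs[1]


# Pooled regression: intercept, creditor rights, leased dummy, and interaction.
pooled = ols(
    [[1.0, score, dummy, score * dummy] for score, dummy, _ in data],
    [row[2] for row in data],
)
slope_owned = subsample_slope(0)
slope_leased = subsample_slope(1)

print(f"owned subsample slope:  {slope_owned:.4f}")
print(f"leased subsample slope: {slope_leased:.4f}")
print(f"pooled base slope:      {pooled[1]:.4f}")
print(f"pooled interaction:     {pooled[3]:.4f}")
```

In this setup the pooled base slope matches the owned-subsample slope, and the interaction coefficient equals the gap between the two subsample slopes, which is why a test on the interaction term is a test of the difference between subsamples.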
29 According to the FAA, a country fails to comply if one of the following deficiencies is identified: (i) lack of laws or regulations necessary to support the certification and oversight of air carriers; (ii) lack of the technical expertise, resources, and organization to license or oversee air carrier operations; (iii) lack of adequately trained and qualified technical personnel; (iv) lack of enforcement of, or compliance with, minimum international standards; and (v) insufficient documentation and records of certification of air carrier operations.
30 We do not include operator or country fixed-effects in these regressions since the aviation quality measure hardly changes over time within a country.
Table 9 Aircraft age and aviation regulation. The dependent variable is aircraft age of commercial non-leased aircraft, commercial leased aircraft, or military aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Government is a dummy variable that equals one if the airline operating the aircraft is at least partially owned by the government and equals zero otherwise. Aviation is a dummy variable that equals one for aircraft in countries that comply with ICAO standards and equals zero otherwise. Military is a dummy variable that equals one for military aircraft and zero otherwise. Leased is a dummy variable that equals one for leased aircraft and zero otherwise. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from 0 (weak creditor rights) to 4 (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. All regressions include an intercept (not reported), legal origins dummies, and year fixed effects. Standard-errors are clustered by country and reported in parentheses. a, b and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.
| | (1) Commercial leased and non-leased age | (2) Commercial non-leased age | (3) Commercial leased age | (4) Military age | (5) Military and commercial all age | (6) Military and commercial non-leased age | (7) Commercial leased and non-leased age |
|---|---|---|---|---|---|---|---|
| GDP | 0.026 (0.256) | 0.255 (0.425) | 0.161 (0.176) | 0.175 (0.173) | 0.094 (0.240) | 0.088 (0.321) | 0.030 (0.272) |
| GDP growth | 0.170 (0.108) | 0.133 (0.139) | 0.209a (0.071) | 0.091 (0.099) | 0.113 (0.087) | 0.100 (0.095) | 0.162 (0.108) |
| GDP per capita | 1.134a (0.506) | 1.396b (0.605) | 0.640 (0.443) | 0.067 (0.386) | 0.844b (0.414) | 0.915b (0.449) | 1.039b (0.504) |
| GDP per capita growth | 0.045 (0.033) | 0.049 (0.070) | 0.035c (0.019) | 0.150 (0.095) | 0.055 (0.037) | 0.081 (0.066) | 0.039 (0.028) |
| Population | 1.567a (0.467) | 1.915a (0.623) | 1.114a (0.321) | 1.157a (0.381) | 1.468a (0.403) | 1.683a (0.468) | 1.522a (0.451) |
| Area | 1.353a (0.313) | 1.310a (0.342) | 1.016a (0.236) | 0.745a (0.282) | 1.244a (0.261) | 1.168a (0.260) | 1.185a (0.285) |
| Government | 3.854a (0.744) | 4.470a (0.783) | 3.564a (1.001) | | 3.627a (0.698) | 4.067a (0.757) | 4.119a (0.691) |
| Aviation | 3.277a (1.079) | 3.858a (1.138) | 2.983b (1.194) | 0.745 (1.58) | 2.453a (0.926) | 2.307b (0.993) | 3.397a (1.083) |
| Creditor rights | 0.820b (0.388) | 1.844a (0.522) | 0.290 (0.274) | 0.171 (0.829) | 0.800b (0.356) | 1.780a (0.465) | 1.837a (0.461) |
| Military | | | | | 4.122a (1.334) | 0.995 (1.224) | |
| Military × creditor rights | | | | | 0.601 (0.958) | 1.604c (0.843) | |
| Leased | | | | | | | 5.444a (0.439) |
| Leased × creditor rights | | | | | | | 1.556a (0.246) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| # of Countries | 67 | 67 | 67 | 64 | 67 | 67 | 67 |
| # of Operators | 3,254 | 2,226 | 2,170 | 520 | 3,774 | 2,746 | 3,254 |
| Adjusted R2 | 0.10 | 0.13 | 0.06 | 0.04 | 0.13 | 0.11 | 0.13 |
| Observations | 188,142 | 93,871 | 94,271 | 51,696 | 239,838 | 145,567 | 188,142 |
5.7. Creditor rights, financial constraints, and aircraft vintage We now turn to analyze the effect of creditor rights on aircraft age conditional on the financial condition of the operator. According to Prediction 3 of the model, since airlines with greater internal funds are less likely to rely on external financing, they should be less affected by the legal system in which they operate or local financial development. Thus, we expect the effect of creditor rights on aircraft age to be larger for more financially constrained airlines. Similar to the previous section, testing this prediction alleviates the concern that creditor rights are positively correlated with unobserved investment opportunities, and
that it is this correlation which is driving the negative relation between creditor rights and aircraft vintage. This is because there is little reason to suspect that increases in creditor rights are more strongly correlated with improved investment opportunities in financially constrained firms as compared to financially unconstrained firms. To test this prediction, we obtain information on airline financial data from Compustat Global. We are able to match 67 airlines from 28 countries to the countries covered by Djankov, McLiesh, and Shleifer (2007), representing a panel of 63,036 non-military aircraft. We then employ in our regression specification interaction terms between the country’s creditor rights index and airline-level measures of financial distress. Our approach is similar to Rajan and Zingales (1998) who identify
E. Benmelech, N.K. Bergman / Journal of Financial Economics 99 (2011) 308–332
Table 10 Creditor rights, leverage and aircraft age: leased vs. non-leased aircraft. The dependent variable is aircraft age of either leased or non-leased aircraft. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Government is a dummy variable that equals one if the airline operating the aircraft is at least partially owned by the government and equals zero otherwise. Aviation is a dummy variable that equals one for aircraft in countries that comply with ICAO standards and equals zero otherwise. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. Leverage is total debt divided by total assets. LT debt is long-term debt divided by total assets. Columns 1, 2, 5, and 6 also include dummies for French legal origin, German legal origin, Nordic legal origin, and Socialist legal origin (not reported for brevity). All regressions include an intercept (not reported) and year fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

| | (1) Non-leased age | (2) Leased age | (3) Non-leased age | (4) Leased age | (5) Non-leased age | (6) Leased age | (7) Non-leased age | (8) Leased age |
|---|---|---|---|---|---|---|---|---|
| GDP | 0.982b (0.437) | 0.328c (0.179) | 0.839c (0.440) | 0.261 (0.213) | 0.942a (0.431) | 0.396b (0.184) | 0.852c (0.432) | 0.325 (0.220) |
| GDP growth | 0.102 (0.108) | 0.023 (0.036) | 0.164c (0.084) | 0.011 (0.036) | 0.089 (0.114) | 0.025 (0.036) | 0.171c (0.087) | 0.008 (0.038) |
| GDP per capita | 0.025 (0.839) | 0.994a (0.285) | 0.022 (2.366) | 1.404 (2.154) | 0.016 (0.836) | 0.944a (0.294) | 0.269 (2.339) | 1.203 (2.255) |
| GDP per capita growth | 0.023 (0.016) | 0.020a (0.005) | 0.020 (0.029) | 0.020a (0.003) | 0.023 (0.015) | 0.021a (0.006) | 0.023 (0.030) | 0.019a (0.003) |
| Population | 0.119 (0.332) | 0.614b (0.264) | 14.966 (17.276) | 9.332 (9.441) | 0.085 (0.310) | 0.654b (0.251) | 16.029 (16.878) | 9.234 (9.266) |
| Area | 0.399 (0.255) | 0.023 (0.206) | 7.281 (63.774) | 102.413a (5.306) | 0.404 (0.258) | 0.046 (0.203) | 4.121 (62.968) | 97.970a (5.591) |
| Government | 1.507 (1.378) | 0.123 (0.556) | 1.537 (1.042) | 1.421 (1.232) | 1.564 (1.444) | 0.020 (0.571) | 1.594 (1.068) | 1.420 (1.258) |
| Aviation | 8.643 (3.240) | 6.415a (1.548) | 4.474a (1.399) | 1.614a (0.419) | 8.969b (3.303) | 6.880a (1.569) | 4.606a (1.467) | 1.630a (0.428) |
| Creditor rights | 0.426 (0.686) | 0.029 (0.572) | 1.342 (0.793) | 0.275 (0.854) | 0.315 (0.783) | 0.115 (0.511) | 0.910 (0.805) | 0.104 (0.789) |
| Leverage | 6.581b (3.224) | 4.330a (1.546) | 9.038a (2.923) | 3.947b (1.538) | | | | |
| Creditor rights × Leverage | 3.158b (1.378) | 0.279 (0.946) | 5.067a (1.626) | 0.204 (1.020) | | | | |
| LT debt | | | | | 5.719 (4.381) | 3.786a (1.350) | 9.412b (3.648) | 3.140b (1.399) |
| Creditor rights × LT debt | | | | | 3.212c (1.782) | 0.116 (0.899) | 5.737b (2.178) | 0.510 (1.007) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | No | No | Yes | Yes | No | No | Yes | Yes |
| # of Countries | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 |
| Adjusted R2 | 0.11 | 0.05 | 0.13 | 0.06 | 0.11 | 0.04 | 0.13 | 0.06 |
| Observations | 33,652 | 29,384 | 33,652 | 29,384 | 33,652 | 29,384 | 33,652 | 29,384 |
the effects of financial development on growth using interaction terms between financial development (at the country level) and financial dependence (at the industry level). Our analysis focuses on two measures of financial constraints: leverage and long-term debt, both used by Eisfeldt and Rampini (2007), which were found empirically to be determinants of used capital investment. We obtain similar results using other measures such as profitability. We estimate the following regression for both leased and non-leased aircraft separately:

$$\text{Vintage}_{iact} = \alpha \times \text{Creditor rights}_{ct} + \beta \times \text{FinConst}_{act} + \gamma \times \text{Creditor rights}_{ct} \times \text{FinConst}_{act} + X_{ct}\lambda + y_t\psi + z_{ac}\chi + \varepsilon_{iact}, \tag{6}$$

where $\text{FinConst}_{act}$ is a measure of the airline financial constraints (either leverage defined as total debt divided by the book value of assets, or long-term debt defined as long-term debt divided by the book value of assets), and $\text{Creditor rights}_{ct} \times \text{FinConst}_{act}$ is an interaction term between creditor rights and airline financial constraints. Regressions are estimated with heteroskedasticity-robust standard errors clustered by country. Results using aircraft age as the dependent variable are presented in Table 10.31 Consistent with the financing channel, the interaction term between creditor rights and leverage and the interaction term between creditor rights and long-term debt in the non-leased aircraft subsample (columns 1, 3, 5, 7) are negative, indicating that the effect of creditor rights on aircraft age is indeed concentrated in financially constrained airlines. In contrast, consistent with our previous results and consistent with a financing channel, we do not find a
31 We obtain similar results using aircraft technological age as the dependent variable. We do not report these results for brevity.
statistically significant interaction coefficient between creditor rights and measures of financial constraints in the leased aircraft subsample. While both leverage and long-term debt are clearly endogenous, our identification strategy in Table 10 relies on the interaction between country and firm characteristics. By focusing on interaction effects, we reduce the number of potential alternative explanations for our findings. Focusing on the first column of Table 10, we find that reducing a country’s level of investor protection from a creditor rights score of four to a creditor rights score of zero increases the average age of aircraft operated by airlines in the 25th percentile of leverage by 0.95 years. In contrast, for airlines in the 75th leverage percentile, i.e., those that are arguably more financially constrained, we find that reducing creditor rights from a score of four to a score of zero increases average age by roughly four times as much, 3.98 years, representing a 30% increase compared to the sample-wide average of aircraft age.
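The magnitudes above follow from the interaction specification: for a change of ΔCR in the creditor rights score, predicted age changes by (α + γ × FinConst) × ΔCR. The following is a minimal Python sketch of that arithmetic. The coefficient magnitudes are those in column 1 of Table 10; the signs (positive on creditor rights, negative on the interaction, the latter as stated in the text) and the two leverage values are illustrative assumptions chosen to be consistent with the 0.95 and 3.98 year figures, not values reported in the paper.

```python
def age_change(alpha, gamma, fin_const, delta_cr):
    """Change in predicted aircraft age when the creditor rights score
    changes by delta_cr, for an airline with financial constraint fin_const."""
    return (alpha + gamma * fin_const) * delta_cr


# Column 1 of Table 10 (magnitudes from the table; signs are assumptions).
ALPHA = 0.426    # coefficient on creditor rights (sign assumed positive)
GAMMA = -3.158   # coefficient on creditor rights x leverage (negative per the text)
DELTA_CR = -4    # moving from a score of four to a score of zero

# Hypothetical leverage ratios standing in for the 25th and 75th percentiles.
LEVERAGE_P25 = 0.21
LEVERAGE_P75 = 0.45

low = age_change(ALPHA, GAMMA, LEVERAGE_P25, DELTA_CR)
high = age_change(ALPHA, GAMMA, LEVERAGE_P75, DELTA_CR)
print(f"25th percentile: {low:.2f} years, 75th percentile: {high:.2f} years")
```

Under these assumed inputs the sketch gives increases of about 0.95 and 3.98 years, in line with the figures quoted in the text.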
5.8. Creditor rights and fleet size We now analyze the relation between creditor rights and fleet size. According to Prediction 2 of the model,
firms operating in countries with better creditor rights should operate larger fleets, on average. This is because operator scale will not be constrained by the availability and cost of external finance. This prediction is broadly consistent with the empirical findings in Kumar, Rajan, and Zingales (2002) who find that the average firm size is larger in countries with better institutional development. In order to test this prediction we need a measure of fleet size. This is somewhat complicated by the fact that airline fleets include multiple aircraft types of different size and use. Thus, a measure of fleet size must weigh aircraft of different varieties in an appropriate manner. Rather than committing to one particular weight system, we test Prediction 2 using a number of weighting schemes. To do so, for each aircraft type in our sample, we gather information on that aircraft type’s maximal seat capacity, its maximal takeoff weight, and the aircraft type’s wingspan. These data are gathered from Singfield (2005) as well as from a variety of Internet sources. Based on this information, for each operator and year in our sample, we then construct four measures of fleet size. The first is simply an equal-weighted sum of all aircraft operated. The remaining three measures of fleet size are: (1) the sum of the seat capacities of all aircraft in the fleet, (2) the sum of the maximal takeoff weight of all aircraft in
Table 11 Creditor rights and fleet size: commercial vs. military aircraft. The dependent variable is fleet size defined as the logarithm of either (1) the sum of all aircraft operated, (2) the sum of the seat capacities of all aircraft in the fleet, (3) the sum of the maximal takeoff weight of all aircraft in the fleet, (4) the sum of the wingspans of all aircraft in the fleet. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Government is a dummy variable that equals one if the airline operating the aircraft is at least partially owned by the government and equals zero otherwise. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. All regressions include an intercept (not reported), year fixed-effects, and operator fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.
Size = GDP GDP growth GDP per capita GDP per capita growth Population Area Government Creditor rights
Commercial number
Military number
Commercial seats
Military seats
Commercial weight
Military weight
Commercial wings
Military wings
0.024b (0.011) 0.006 (0.004) 0.227a (0.056) 0.001 (0.003) 0.047 (0.133) 0.106 (0.105) 0.007 (0.027) 0.055b (0.027)
0.022 (0.097) 0.006a (0.036) 0.328b (0.129) 0.002b (0.001) 0.047 (0.213) 0.360 (0.286)
0.116b (0.052) 0.001 (0.005) 0.597 (0.477) 0.007b (0.003) 0.113 (0.763) 0.411 (0.923)
0.029 (0.126)
0.061b (0.025) 0.012 (0.009) 0.508a (0.129) 0.001 (0.006) 0.096 (0.298) 0.184 (0.251) 0.077 (0.234) 0.138b (0.059)
0.073 (0.044) 0.014a (0.005) 0.775b (0.309) 0.004c (0.002) 0.036 (0.574) 0.746 (0.726)
0.009 (0.118)
0.088b (0.038) 0.011 (0.010) 0.611a (0.166) 0.007a (0.006) 0.244 (0.337) 0.104 (0.297) 0.118 (0.214) 0.183b (0.091)
0.070 (0.055) 0.047b (0.006) 0.921b (0.439) 0.005c (0.003) 0.323 (0.921) 1.354 (1.091)
0.061 (0.049)
0.060b (0.026) 0.011 (0.020) 0.616a (0.154) 0.0003 (0.007) 0.217 (0.324) 0.202 (0.271) 0.310 (0.218) 0.182a (0.062)
Yes Yes
Yes Yes
Yes Yes
Yes Yes
Yes Yes
Yes Yes
Yes Yes
Yes Yes
129 5,284 0.85 32,913
110 616 0.93 6,164
129 5,284 0.87 32,913
110 616 0.91 6,164
129 5,284 0.83 32,913
110 616 0.92 6,164
129 5,284 0.86 32,913
110 616 0.94 6,164
0.058 (0.101)
Fixed-effects Year Operator # of Countries # of Operators Adjusted R2 Observations
the fleet, and (3) the sum of the wingspans of all aircraft in the fleet. Having constructed these four fleet-size measures, we then run the following regression specification for all operator fleets in our sample period of 1978–2003:

$$\log(\text{Size}_{act}) = \alpha \times \text{Creditor rights}_{ct} + X_{ct}\lambda + y_t\psi + z_a\chi + \varepsilon_{act}. \tag{7}$$

The dependent variable, $\log(\text{Size}_{act})$, is the logarithm of each of our four fleet-size measures for operator a in country c in year t. As usual, $\text{Creditor rights}_{ct}$ is the creditor rights score of country c in year t, and $X_{ct}$ is the standard vector of country-specific control variables. All regressions include year fixed-effects, $y_t$, and operator fixed-effects represented by the vector of variables $z_a$. The regressions are estimated with heteroskedasticity-robust standard errors clustered by country. Finally, as in the case of aircraft age, regression (7) is estimated separately for commercial operators and military operators. The results are provided in Table 11. As our results demonstrate, using all four fleet-size proxies, the coefficient on the creditor rights index is consistently positive and statistically significant in the commercial operators’ regressions after controlling for GDP per capita, population, area, as well as year and operator fixed-effects. In contrast to commercial operators, and consistent with our previous results, there is no robust relation between creditor rights and fleet size of military operators. None of the creditor rights coefficients is statistically different from zero in any of the military operators’ regressions, and the point estimates in these regressions are always lower than their commercial regression counterparts (they are actually negative in three out of four regressions). Thus, consistent with Prediction 2 of the model, airlines in countries with higher creditor rights do indeed operate larger fleets. Given that we run a semi-log specification with respect to creditor rights, the coefficient on creditor rights gives the approximate proportional change in fleet size associated with a unit change in creditor rights ($\partial \log(\text{Size}_{act}) / \partial \text{CR}_{ct}$). This effect is economically significant. For example, moving from the lowest creditor rights score of zero to the highest score of four increases the number of aircraft in a commercial airline’s fleet by 22%. Moving from the lowest to the highest score of creditor rights increases total fleet seat capacity by 72.8%. The effects for the remaining two fleet-size measures, total fleet maximal takeoff weight and total fleet wingspan, are 73.2% and 55.2%, respectively.
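The percentages above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the coefficients are those implied by the reported percentages (each divided by the four-point change in the index), and the function names are hypothetical. It also shows, for comparison, the exact change implied by a logarithmic dependent variable, exp(βΔ) − 1, which exceeds the linear approximation when βΔ is large; the paper reports the linear figure.

```python
import math

# Creditor rights coefficients implied by the percentages in the text
# (reported effect divided by the four-point change in the index).
CREDITOR_RIGHTS_COEF = {
    "number": 0.055,
    "seats": 0.182,
    "weight": 0.183,
    "wingspan": 0.138,
}


def approx_pct_change(beta, delta_cr):
    """Linear (semi-log) approximation: 100 * beta * delta_cr."""
    return 100.0 * beta * delta_cr


def exact_pct_change(beta, delta_cr):
    """Exact change implied by a log dependent variable."""
    return 100.0 * (math.exp(beta * delta_cr) - 1.0)


for measure, beta in CREDITOR_RIGHTS_COEF.items():
    approx = approx_pct_change(beta, 4)
    exact = exact_pct_change(beta, 4)
    print(f"{measure:9s} approx {approx:5.1f}%  exact {exact:6.1f}%")
```

The gap between the two columns is small for the aircraft-count measure and widens for the seat, weight, and wingspan measures, where the coefficients are larger.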
Table 12 Aircraft age and fleet size. The dependent variable is aircraft age of non-leased aircraft. Fleet size is defined as the logarithm of either (1) the sum of all aircraft operated, (2) the sum of the seat capacities of all aircraft in the fleet, (3) the sum of the maximal takeoff weight of all aircraft in the fleet, (4) the sum of the wingspans of all aircraft in the fleet. GDP is the natural logarithm of real GDP, GDP growth is the annual growth rate of GDP, GDP per capita is the natural logarithm of real GDP per capita, GDP per capita growth is the annual growth rate of GDP per capita. Population is the natural logarithm of the population. Area is the natural logarithm of the country surface in sq. km. Government is a dummy variable that equals one if the airline operating the aircraft is at least partially owned by the government and equals zero otherwise. Creditor rights is an index aggregating creditor rights, following Djankov, McLiesh, and Shleifer (2007). The index ranges from zero (weak creditor rights) to four (strong creditor rights) and is constructed as of January for every year from 1978 to 2003. All regressions include an intercept (not reported), year fixed-effects and operator fixed-effects. Standard errors are clustered by country and reported in parentheses. a, b, and c denote statistical significance at the 1%, 5%, and 10% levels, respectively.

| Size = | Number | Number | Seats | Seats | Wings | Wings | Weight | Weight |
|---|---|---|---|---|---|---|---|---|
| Fleet size | 1.231a (0.148) | 1.286a (0.139) | 0.428a (0.041) | 0.433a (0.038) | 0.521a (0.058) | 0.533 (0.053) | 0.311a (0.044) | 0.311a (0.043) |
| GDP | 0.407 (0.272) | 0.476 (0.308) | 0.398 (0.263) | 0.472 (0.293) | 0.403 (0.265) | 0.474 (0.298) | 0.408 (0.263) | 0.493c (0.289) |
| GDP growth | 0.080 (0.092) | 0.072b (0.031) | 0.067 (0.089) | 0.078b (0.032) | 0.071 (0.090) | 0.077b (0.031) | 0.069 (0.091) | 0.081b (0.033) |
| GDP per capita | 0.581c (0.348) | 3.611a (0.835) | 0.600c (0.356) | 3.451a (0.765) | 0.574c (0.344) | 3.573a (0.818) | 0.733b (0.346) | 3.439a (0.750) |
| GDP per capita growth | 0.088 (0.080) | 0.051b (0.019) | 0.081 (0.077) | 0.054a (0.020) | 0.083 (0.078) | 0.052a (0.019) | 0.085 (0.078) | 0.057a (0.021) |
| Population | 0.221 (0.354) | 8.592a (2.550) | 0.276 (0.355) | 8.688a (2.487) | 0.233 (0.349) | 8.557a (2.505) | 0.417 (0.347) | 8.916a (2.565) |
| Area | 0.051 (0.311) | 39.458b (19.130) | 0.027 (0.305) | 40.613b (18.938) | 0.026 (0.305) | 38.865b (18.697) | 0.064 (0.309) | 36.789c (19.389) |
| Government | 1.665a (0.505) | 1.018b (0.459) | 1.551a (0.500) | 0.958b (0.434) | 1.588a (0.505) | 0.979b (0.455) | 1.911a (0.506) | 1.442a (0.467) |
| Creditor rights | 0.980a (0.281) | 0.653c (0.359) | 0.951a (0.279) | 0.669c (0.374) | 0.956a (0.279) | 0.655c (0.362) | 0.930a (0.280) | 0.676c (0.388) |
| Fixed-effects: Year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fixed-effects: Country | No | Yes | No | Yes | No | Yes | No | Yes |
| # of Countries | 129 | 129 | 129 | 129 | 129 | 129 | 129 | 129 |
| Adjusted R2 | 0.18 | 0.23 | 0.18 | 0.23 | 0.17 | 0.23 | 0.17 | 0.22 |
| Observations | 213,250 | 213,250 | 213,250 | 213,250 | 213,250 | 213,250 | 213,250 | 213,250 |
In addition, we study the relation between fleet size and creditor rights conditional on the financial constraints that an airline is facing. In results not reported, we find no evidence that the positive relation between creditor rights and fleet size exhibited in Table 11 is concentrated amongst financially constrained airlines. This could stem from a selection bias arising from the fact that accounting data availability constrains our analysis to publicly traded airlines, which tend to be the largest airlines in each country. Alternatively, this could be driven by the positive relation which arises between leverage and fleet size when airlines use debt to purchase additional aircraft. As a final test, we analyze the relation between fleet age and fleet size. The model predicts that constrained firms should operate both smaller and older fleets. Therefore, we test whether smaller fleets are indeed comprised of older vintage aircraft. We regress aircraft age on the logarithm of our four measures of fleet size—the number of aircraft operated in the fleet, the sum of the seat capacities of all aircraft in the fleet, the sum of the maximal takeoff weight of all aircraft in the fleet, and the sum of the wingspans of all aircraft in the fleet—while including our standard control variables. The results are presented in Table 12. As can be seen, in all specifications we find a statistically significant negative relation between fleet size and aircraft age; as the model predicts, smaller fleets tend to operate older aircraft. This result is very much consistent with Eisfeldt and Rampini (2007) who show that smaller firms are more likely to employ used, as opposed to new, capital.
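The four fleet-size measures used in these regressions can be sketched as a simple aggregation over aircraft-level records. The code below is a minimal illustration, not the authors' procedure: the `Aircraft` record, its field names, and the sample values are all hypothetical, and it assumes every operator-year total is strictly positive so that the logarithm is defined.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Aircraft:
    """Hypothetical aircraft-level record (field names are assumptions)."""
    operator: str
    year: int
    seats: int          # maximal seat capacity of the aircraft type
    mtow_kg: float      # maximal takeoff weight of the aircraft type
    wingspan_m: float   # wingspan of the aircraft type


def log_fleet_size(aircraft):
    """Return the log of the four fleet-size measures per (operator, year)."""
    totals = defaultdict(
        lambda: {"number": 0.0, "seats": 0.0, "weight": 0.0, "wingspan": 0.0}
    )
    for plane in aircraft:
        fleet = totals[(plane.operator, plane.year)]
        fleet["number"] += 1            # equal-weighted count of aircraft
        fleet["seats"] += plane.seats
        fleet["weight"] += plane.mtow_kg
        fleet["wingspan"] += plane.wingspan_m
    # Assumes all totals are positive; a zero total would make log undefined.
    return {
        key: {name: math.log(value) for name, value in fleet.items()}
        for key, fleet in totals.items()
    }


# Made-up sample data, for illustration only.
sample = [
    Aircraft("operator_X", 2000, 100, 50_000.0, 30.0),
    Aircraft("operator_X", 2000, 200, 150_000.0, 45.0),
    Aircraft("operator_Y", 2000, 150, 70_000.0, 34.0),
]
measures = log_fleet_size(sample)
print(measures[("operator_X", 2000)])
```

Keeping all four measures side by side mirrors the choice in the text not to commit to a single weighting scheme.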
6. Conclusion We provide novel evidence linking creditor rights and vintage capital using a panel of aircraft-level data around the world. Consistent with theories that emphasize the protection of property rights as essential for economic development, we find that better creditor rights are associated with aircraft of a younger vintage and firms with larger aircraft fleets. Further, consistent with a financing channel, we find that the association between creditor rights and aircraft vintage is concentrated amongst non-leased commercial aircraft. Finally, we find that airlines with lower leverage ratios and airlines with less debt overhang are less sensitive to creditor rights as they may use internal funds, rather than external capital, to finance investment. The evidence in our paper shows that legal protection of creditor rights affects capital vintage, technological diffusion, and firm scale. Better creditor protection helps airlines to mitigate financial shortfalls and enhance investment in newer, more efficient, and more technologically advanced aircraft. While we study the relation between vintage aircraft and creditor rights, our results suggest a broader link, not confined only to the airline industry, between investor protection, real corporate investment, and economic growth; legal protection of creditors facilitates the ability of firms to make large capital investments and adopt advanced technologies, and fosters productivity.
Appendix A
Proof of Proposition 1. Define $z(q_{new}) = p_{old}\,h(q_{new})$ to be the price of obtaining income $f(q_{new})$ using a fleet of old-technology aircraft. For simplicity in what follows, we drop the subscript and denote $q_{new}$ by $q$. If $z'(0) \geq 1$, from the convexity of $z$ we have that $z(q) > q$ for all $q > 0$ and the proposition trivially holds with $\mu^* = 0$. Similarly, if $z'(0) < 1$ and $z(q) < q$ for all $q > 0$, the proposition trivially holds with $\mu^* = 1$. Assume then that $z'(0) < 1$ and that there exists a $q > 0$ with $z(q) = q$. By the convexity of $z$, this $q$ is unique, and we denote it by $q^*$. Clearly $z(q) < q$ for $q < q^*$ and $z(q) > q$ for $q > q^*$. Now, for any $\mu$ we define the ‘‘financeable set’’ to be all $q$ with $c(q) \leq \mu f(q)$, where $c(q) = \min(z(q), q)$. Since $f$ is concave, $z$ is convex, $f(0) = 0$, and $z(0) = 0$, it is easy to see that for any $\mu$ the financeable set equals $[0, \bar{q}]$ for some $\bar{q}$. Now, define $\underline{\mu}$ to satisfy $\underline{\mu} f(q^*) = q^*$. By definition of $q^*$, the financeable set at $\underline{\mu}$ is $[0, q^*]$. Since in this region $z(q) < q$ for all $q$ which are financeable, a firm operating in a country with creditor protection $\underline{\mu}$ chooses the old-technology fleet. Similarly, for any $\mu < \underline{\mu}$, the financeable set equals $[0, q']$ for some $q' < q^*$, so that again, $z(q) < q$ for all financeable $q$. Thus, again, any firm operating in a country with creditor protection $\mu < \underline{\mu}$ chooses the old technology. Define $q_{uc}$ to be the solution to the unconstrained problem $\max_q [f(q) - c(q)]$. Since by assumption the new technology is preferred to the old when $\mu = 1$, we have that $c(q_{uc}) = q_{uc}$, so that $q_{uc} > q^*$. Define $\bar{\mu}$ to satisfy $\bar{\mu} f(q_{uc}) = q_{uc}$. Clearly, for any $\mu \geq \bar{\mu}$, $q_{uc}$ is financeable, so that a firm operating in a country with $\mu \geq \bar{\mu}$ chooses the unconstrained solution and hence the new technology. Now, for all $\underline{\mu} < \mu < \bar{\mu}$ the financeable set equals $[0, \bar{q}(\mu)]$ for some $q^* < \bar{q}(\mu) < q_{uc}$. Define $V(\mu)$ to be the solution to the maximization problem of a firm operating in a country with creditor protection $\underline{\mu} < \mu < \bar{\mu}$. That is, $V = \max_q [f(q) - c(q)]$ s.t. $q \leq \bar{q}(\mu)$. Further, define $V_1$ as the solution to the maximization problem $\max_q [f(q) - c(q)]$ s.t. $q \leq q^*$, and $V_2(\mu)$ as the solution to the maximization problem $\max_q [f(q) - c(q)]$ s.t. $q^* \leq q < \bar{q}(\mu)$. Clearly, $V = \max[V_1, V_2(\mu)]$ and a firm in country $\mu$ will choose the new technology iff $V_2(\mu) > V_1$. Now, it is easy to see that $V_2(\mu)$ is increasing in $\mu$. Also, since the new technology is preferred to the old with no constraints, we have that $V_2(\bar{\mu}) > V_1$. Thus, since $V_1$ is independent of $\mu$, there exists a $\underline{\mu} \leq \mu^* < \bar{\mu}$ such that for all $\underline{\mu} \leq \mu \leq \mu^*$, $V_1 \geq V_2(\mu)$ and for all $\mu^* \leq \mu \leq \bar{\mu}$, $V_2(\mu) \geq V_1$. Thus, firms in countries with $\underline{\mu} \leq \mu \leq \mu^*$ choose the old technology, while firms in countries with $\mu^* < \mu \leq \bar{\mu}$ choose the new technology. This, combined with the fact that firms in countries with $\mu \leq \underline{\mu}$ choose the old technology, while firms in countries with $\mu \geq \bar{\mu}$ choose the new technology, proves the proposition. ∎

Proof of Propositions 2 and 3. Proposition 2 is a direct consequence of Proposition 1, and Proposition 3 is a direct consequence of the fact that as $\mu$ tends to zero, the fleet size that is financeable under the constrained maximization problem tends to zero as well. ∎

Proof of Proposition 4. Following the proof of Proposition 1, define $q^*$ to solve $z(q) = q$ and define $V_1$ as the solution to the maximization problem $\max_q [f(q) - c(q)]$ s.t. $q \leq q^*$. Note that to obtain $V_1$, a firm will choose the old-technology fleet since in the region $q \leq q^*$ this is the efficient means of production ($c(q) < q$). Define now $V_2(\bar{q})$ as the solution to the maximization problem $\max_q [f(q) - c(q)]$ s.t. $q^* \leq q < \bar{q}$. Since $f$ is concave, $V_2$ is increasing in $\bar{q}$ up to $q_{uc}$, the solution to the unconstrained problem. Further, since $V_2(q_{uc}) > V_1$, i.e., the new technology is preferred to the old technology under the unconstrained solution, by continuity of $V_2$ and by the fact that $V_2(q^*) \leq V_1$ there exists a $q^{**} \geq q^*$ such that $V_2(q^{**}) = V_1$. Since $V_2$ is increasing in the range $[q^{**}, q_{uc})$, if $q^{**}$ is financeable, the optimal solution of the firm’s maximization problem, $q_{opt}$, will satisfy $q_{opt} \geq q^{**}$. Since in this region the new-technology fleet is the efficient means of production (i.e., $q < z(q)$), the firm will then choose the new-technology fleet. In contrast, if $q^{**}$ is not financeable, then $q_{opt}$ will be smaller than $q^{**}$. By definition of $q^{**}$, $V_1$ will then be greater than $V_2(q_{opt})$. In this case, therefore, the firm will choose the old-technology fleet to obtain $V_1$. Consider now a firm operating in a country with creditor protection $\mu$. Suppose first that $c(q^{**}) \leq \mu f(q^{**})$, so that a firm with no internal wealth can finance $q^{**}$. As described above, since $q^{**}$ is financeable, the firm will employ the new-technology fleet. Further, since any firm with positive internal wealth will also be able to finance $q^{**}$, the proposition will then trivially hold with $A^* = 0$. Consider now the case where $c(q^{**}) > \mu f(q^{**})$, so that $q^{**}$ is not financeable for a firm with no internal wealth. Define $A^*(\mu) = c(q^{**}) - \mu f(q^{**})$. By definition, any firm with internal wealth $A \geq A^*(\mu)$ will be able to finance $q^{**}$, and so, as above, will choose to employ the new-technology fleet. Finally, since $q^{**}$ is independent of $\mu$, the threshold level $A^*(\mu)$ is decreasing in $\mu$. ∎

A.1. Economies of scale and the convexity of the equivalency function In this subsection, we provide an analysis showing how economies of scale in the operation of new technology aircraft will drive the equivalency function to be convex. While we formulate the discussion by referring to aircraft, so long as a new technology provides greater economies of scale in operating costs as compared to an old technology, our results clearly generalize to other industries. Consider an airline deciding on its scale of operation and whether to use new- or old-technology aircraft. New and old aircraft differ only in their associated
cost structures. Marginal revenues associated with operating either an old or new fleet are assumed to be equal, for example, because passengers do not place much emphasis on the vintage of the aircraft in which they fly. Denote therefore by $p(q)$ the marginal revenue associated with operating $q$ aircraft, whether employing a new or old aircraft fleet. As usual, marginal revenue is decreasing in the quantity of aircraft operated, $p' < 0$, for example, because when an airline operates a larger fleet, it will serve more marginal markets. Assume that operating a fleet of new aircraft involves economies of scale, for example, due to the lower maintenance costs of the associated new, more homogeneous aircraft. In contrast, operating a fleet of old-technology aircraft does not exhibit such economies of scale, for example, because a fleet of old aircraft is more heterogeneous, which does not allow operating efficiencies in maintenance. Formally, we assume that the marginal cost of the old-technology aircraft is constant at $c_H$, while the new technology’s marginal cost is $c(q)$ with $c' < 0$ and $c(0) = c_H$. The profit from operating $q$ new-technology aircraft is $\int_0^q (p(x) - c(x))\,dx$. Equivalently, the profit from operating $h$ old-technology aircraft is $\int_0^h (p(x) - c_H)\,dx$. The equivalency function $h(q)$ is then defined by

$$\int_0^q (p(x) - c(x))\,dx = \int_0^{h(q)} (p(x) - c_H)\,dx. \tag{8}$$

To investigate the conditions under which $h$ is convex, we differentiate with respect to $q$ to obtain

$$p(q) - c(q) = (p(h(q)) - c_H)\,h'(q). \tag{9}$$

Rearranging yields $h'(q) = (p(q) - c(q))/(p(h(q)) - c_H)$. Since a firm will never choose to operate at a scale where marginal cost is greater than marginal revenue, we have that both numerator and denominator are greater than zero, implying that $h' > 0$. Further, from (8), it is easy to see that $h(q) > q$ (i.e., a fleet of new aircraft is equivalent to a larger fleet of old aircraft), which together with $c(q) \leq c_H$ implies that $h' > 1$. Differentiating (9) with respect to $q$ once more and rearranging yields:

$$h''(q) = \frac{p'(q) - c'(q) - p'(h(q))\,(h'(q))^2}{p(h(q)) - c_H}. \tag{10}$$

Since the denominator is positive, the equivalency function $h$ is convex when:

$$p'(q) - c'(q) - p'(h(q))\,(h'(q))^2 > 0. \tag{11}$$

It is therefore easy to see that as the economies of scale associated with the new-technology fleet increase, i.e., $c'$ becomes more negative, the left-hand side of Eq. (11) increases and the equivalency function $h$ is more likely to be convex. Indeed, if the economies of scale are sufficiently large, $h$ will be convex. The intuition is straightforward: because of the economies of scale, the relative advantage of the new aircraft fleet increases with its size. Therefore, as the size of a new aircraft fleet increases, the marginal increase in old aircraft required to replicate the new fleet will be ever increasing, i.e., $h$ is convex.
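As a numerical illustration of this argument, the sketch below evaluates the equivalency function for one hypothetical parameterization: linear marginal revenue p(x) = A − Bx, a constant old-technology marginal cost C_H, and a new-technology marginal cost c(x) = C_H − Kx that falls with scale. The functional forms, parameter values, and function names are assumptions made for illustration and do not come from the paper. Under them the profit-equivalence condition is a quadratic in h, so h(q) has a closed form, and convexity can be checked with second differences on an evenly spaced grid.

```python
import math

# Hypothetical parameters (illustrative assumptions only).
A, B = 10.0, 1.0    # marginal revenue p(x) = A - B*x, decreasing in x
C_H, K = 4.0, 0.5   # old marginal cost C_H; new marginal cost c(x) = C_H - K*x


def profit_new(q):
    """Integral of p(x) - c(x) from 0 to q for the new-technology fleet."""
    return (A - C_H) * q - (B - K) * q ** 2 / 2.0


def equivalent_old_fleet(q):
    """Old-technology fleet size h(q) that yields the same profit.

    Solves (A - C_H)*h - B*h^2/2 = profit_new(q) and takes the smaller
    root, which lies where old-fleet marginal revenue exceeds C_H.
    """
    margin = A - C_H
    return (margin - math.sqrt(margin ** 2 - 2.0 * B * profit_new(q))) / B


qs = [0.5 * i for i in range(1, 7)]           # new fleet sizes 0.5, 1.0, ..., 3.0
hs = [equivalent_old_fleet(q) for q in qs]    # equivalent old fleet sizes

# Positive second differences on an even grid indicate convexity of h.
second_diffs = [hs[i + 1] - 2.0 * hs[i] + hs[i - 1] for i in range(1, len(hs) - 1)]

for q, h in zip(qs, hs):
    print(f"q = {q:.1f}  h(q) = {h:.4f}")
print("convex on grid:", all(d > 0 for d in second_diffs))
```

In this example the equivalent old fleet is always larger than the new fleet, and it grows at an increasing rate as the new fleet expands, which is the pattern the text describes.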
Table B1 Countries with most and least aircraft. This table ranks countries based on the total number of aircraft in the sample. Share is total aircraft divided by total aircraft in the sample.

Countries with most aircraft 1978–2003

| Rank | Country | Commercial | Military | Total | Share (%) |
|---|---|---|---|---|---|
| 1. | United States | 147,880 | 36,242 | 184,122 | 37.58 |
| 2. | Russian Federation | 24,129 | 13,778 | 37,907 | 7.74 |
| 3. | United Kingdom | 15,396 | 4,160 | 19,556 | 3.99 |
| 4. | Canada | 15,356 | 3,050 | 18,406 | 3.76 |
| 5. | France | 10,441 | 3,359 | 13,800 | 2.82 |
| 6. | China | 9,499 | 3,061 | 12,560 | 2.56 |
| 7. | Brazil | 6,741 | 4,662 | 11,403 | 2.33 |
| 8. | Japan | 9,530 | 1,561 | 11,091 | 2.26 |
| 9. | Germany | 10,120 | 793 | 10,913 | 2.22 |
| 10. | Spain | 6,053 | 3,119 | 9,172 | 1.87 |

Countries with least aircraft 1978–2003

| Rank | Country | Commercial | Military | Total | Share (%) |
|---|---|---|---|---|---|
| 120. | Macedonia | 90 | 0 | 90 | 0.018 |
| 121. | Burundi | 87 | 0 | 87 | 0.018 |
| 122-3. | Haiti | 71 | 11 | 82 | 0.017 |
| 122-3. | Benin | 82 | 0 | 82 | 0.017 |
| 124. | Rwanda | 79 | 0 | 79 | 0.016 |
| 125. | Central African Republic | 67 | 0 | 67 | 0.014 |
| 126. | Togo | 18 | 44 | 62 | 0.013 |
| 127. | Niger | 17 | 34 | 51 | 0.010 |
| 128. | Albania | 49 | 0 | 49 | 0.010 |
| 129. | Bosnia and Herzegovina | 44 | 0 | 44 | 0.009 |
As an aside, from the additional terms in (11), it is readily seen that while economies of scale create convexity, they are not required. Indeed, assume that both new and old aircraft have fixed marginal costs and that the marginal cost of new aircraft, $c_L$, is lower than that of old ones, $c_H$, for example, because of new aircraft's higher fuel efficiency. From (11), the condition for the convexity of $h$ becomes $p'(q) - p'(h(q))(h'(q))^2 > 0$. A sufficient condition for convexity of $h$ is then that the marginal revenue $p$ is weakly concave. To see this, recall that $h' > 1$, and that since $h(q) > q$, we have that $0 > p'(q) \geq p'(h(q))$.
The intuition is as follows. Since a new aircraft fleet has lower marginal cost, the “equivalent” old fleet will always be larger in size (i.e., $h(q) > q$). As the new fleet increases in size, the equivalent old fleet must increase as well, and further, it must do so at a faster rate than that of the new fleet (i.e., $h'(q) > 1$). This is due to two effects: first, new aircraft are (by assumption) advantaged in that they have lower marginal cost. Second, because the old fleet is larger than the new fleet, the marginal revenue that it obtains when it expands is smaller than that obtained by the expansion of the new fleet. To make up for these two disadvantages, when the new fleet expands, the equivalent old fleet must expand by a greater amount. Now, when the marginal revenue function is weakly concave, the severity of the second effect (i.e., of differential marginal revenues) increases with the size of the new fleet. This is because the difference between the size of the new fleet and its equivalent old fleet increases with the size of the new fleet ($h(q) - q$ increases). Since the marginal revenue is weakly concave, this implies that the difference in marginal revenue of the old and the new fleets will increase as well. Thus, since the severity of the second effect increases with fleet size, the marginal amount by which the equivalent old fleet needs to expand is increasing in new fleet size; that is, the equivalency function $h$ is convex.
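The sufficiency argument can be written as a short chain of inequalities. This is only a sketch: it uses the facts quoted in the text ($h' > 1$, $h(q) > q$, downward-sloping marginal revenue, and weak concavity of $p$) and does not reproduce (11) itself.

```latex
\begin{align*}
p'(h(q)) &\le p'(q) < 0
  && \text{(weak concavity of } p \text{ and } h(q) > q\text{)} \\
p'(h(q))\,(h'(q))^{2} &< p'(h(q))
  && \text{(since } (h'(q))^{2} > 1 \text{ and } p'(h(q)) < 0\text{)} \\
\Longrightarrow\quad
p'(q) - p'(h(q))\,(h'(q))^{2} &> p'(q) - p'(h(q)) \;\ge\; 0 .
\end{align*}
```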
Appendix B Table B1.
Journal of Financial Economics 99 (2011) 333–348
Journal of Financial Economics journal homepage: www.elsevier.com/locate/jfec
Diversification disasters☆
Rustam Ibragimov a,1, Dwight Jaffee b,2, Johan Walden b,*
a Department of Economics, Harvard University, Littauer Center, 1875 Cambridge St., Cambridge, MA 02138, USA
b Haas School of Business, University of California at Berkeley, 545 Student Services Building #1900, CA 94720-1900, USA
Article info
Article history: Received 20 November 2009; Received in revised form 8 February 2010; Accepted 10 March 2010; Available online 17 September 2010
JEL classification: G20; G21; G28
Keywords: Financial crisis; Financial institutions; Systemic risk; Limits of diversification
Abstract
The recent financial crisis has revealed significant externalities and systemic risks that arise from the interconnectedness of financial intermediaries’ risk portfolios. We develop a model in which the negative externality arises because intermediaries’ actions to diversify that are optimal for individual intermediaries may prove to be suboptimal for society. We show that the externality depends critically on the distributional properties of the risks. The optimal social outcome involves less risk-sharing, but also a lower probability for massive collapses of intermediaries. We derive the exact conditions under which risk-sharing restrictions create a socially preferable outcome. Our analysis has implications for regulation of financial institutions and risk management. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
It is a common view that the interdependence of financial institutions, generated by new derivative products, had a large role in the recent financial meltdown. A primary mechanism is that by entering into sophisticated
☆ We thank Greg Duffee, Paul Embrechts, Vito Gala, Todd Keister, seminar participants at the Wharton Symposium on the Measurement of Low Probability Events in the context of Financial Risk Management, April 16–17, 2009, the Second Annual NHH Symposium on Extreme Events, Bergen, Norway, May 16th 2009, the 2010 FIRS Conference on Banking, Insurance and Intermediation, and at the 2010 meetings of the Western Finance Association. Ibragimov and Walden thank the NUS Risk Management Institute for support. Ibragimov also gratefully acknowledges partial support provided by the National Science Foundation grant SES-0820124 and a Harvard Academy Junior Faculty Development grant.
* Corresponding author. Tel.: +1 510 643 0547; fax: +1 510 643 1420.
E-mail addresses: [email protected] (R. Ibragimov), [email protected] (D. Jaffee), [email protected] (J. Walden).
1 Tel.: +1 203 887 1631; fax: +1 617 495 7730.
2 Tel.: +1 510 642 1273; fax: +1 510 643 7441.
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.08.015
derivative contracts, e.g., credit default swaps (CDS) and collateralized debt obligations (CDO), financial institutions became so heavily exposed to each other's risks that when a shock eventually hit, it immediately spread through the whole system, bringing down critical parts of the financial sector. Since it created this systemic failure, the interdependence was extremely costly from a social standpoint.1
There is an opposite viewpoint, however, namely that such interdependence plays a much more functional role: that of diversification. To motivate this in a simple way, start with a case in which each financial firm holds a particular risk class with its unique idiosyncratic risk. Now allow the firms to form a joint mutual market portfolio, with each firm contributing its risky portfolio to
1 This view is, e.g., expressed in Warren Buffett's 2009 letter to the shareholders of Berkshire Hathaway, in which Mr. Buffett argues that the result is that “a frightening web of mutual dependence develops among huge financial institutions.” A similar view is expressed in the academic literature in Jaffee (2009).
R. Ibragimov et al. / Journal of Financial Economics 99 (2011) 333–348
the total and receiving back its proportional share of the total. Given a sufficient variety and number of risk classes, the firms may succeed in eliminating the idiosyncratic risks embedded in their individual portfolios. This is in line with the classical view in finance, that risk-sharing, i.e., diversification, is always valuable (see, e.g., Samuelson, 1967). Therefore, interdependence is valuable and, indeed, what we should expect.
In practice, we may expect both effects to be present, i.e., by sharing risks, intermediaries decrease the risk of individual failure, but increase the risk of massive, systemic failure.2 Which factors determine the risks of systemic failures of financial institutions and the benefits of diversification? When do the risks outweigh the benefits? What are the policy implications of such a trade-off?
In this paper, we analyze these questions by introducing a parsimonious model that combines the two aspects of diversification in an integrated analysis. Along the lines of the previous viewpoints, while individual institutions may have an incentive to diversify their risks, diversification creates a negative externality in the form of systemic risk. If all intermediaries are essentially holding the same diversified portfolio, a shock may disrupt all the institutions simultaneously, which is costly to society, since it may take time for the financial system, and thereby the economy, to recover. Specifically, the slow recovery time creates a significant and continuing social cost because the unique market-making and information analysis provided by banks and other intermediaries3 is lost until they recover; see Bernanke (1983). Indeed, Bernanke’s concern with the social cost created by bank failures appears to have motivated many of the government bank bailouts.
We show that the costs and benefits of risk-sharing are functions of five properties of the economy.
First, the number of asset classes is crucial: The fewer distinct asset classes are present, the weaker the case for risk-sharing. Second and third, the correlation between risks within an asset class and the heavy-tailedness of the risks are important. The higher the correlation and the heavier the tails of the risk distribution, the less beneficial risk-sharing is. Fourth, the longer it takes for the economy to recover after a systemic failure, the more costly risk-sharing is and, fifth, lower discount rates also work against risk-sharing. We define the diversification threshold to be the threshold at which the cost to society of systemic failure begins to exceed the private benefits of diversification, and we derive a formula for the threshold as a function of these five properties.
The distributions of the risks that intermediaries take on are key to our results. When these risks are thin-tailed, risk-sharing is always optimal for both individual intermediaries and society. But, with moderately heavy-tailed risks, risk-sharing may be suboptimal for society, although individual intermediaries still benefit from it.
2 That the risk of massive failure increases when risks are shared was noted within a finance context by Shaffer (1994). 3 Our analysis applies to banks, but more broadly to general financial intermediaries, like pension funds, insurance companies, and hedge funds.
In this case, the interests of society and intermediaries are unaligned. For extremely heavy-tailed risks, intermediaries and society once again agree, this time that risk-sharing is suboptimal.
One can argue that the focus of our study, moderately heavy-tailed distributions, is the empirically interesting case to study. It is well-known that diversification may be suboptimal in the extremely heavy-tailed case,4 and some risks may indeed have extremely heavy tails (e.g., catastrophic losses, in which case individual insurers may withdraw from the market precisely because the benefits of diversification are unavailable; see Ibragimov, Jaffee, and Walden, 2009). However, arguably most financial risks are moderately heavy-tailed, as shown by several empirical studies in recent years. Specifically, the rate at which a distribution decreases for large values, the so-called tail exponent, $\alpha$ (which we rigorously define in the paper), provides a useful classification of heavy-tailedness. Risks with $\alpha < 1$ are extremely heavy-tailed, whereas risks with $1 < \alpha < \infty$ are moderately heavy-tailed, and when $\alpha = \infty$, they are thin-tailed. Many recent studies argue that the tail exponents in heavy-tailed models typically lie in the interval $2 < \alpha < 5$ for financial returns on various stocks and stock indices (see, among others, Jansen and de Vries, 1991; Loretan and Phillips, 1994; Gabaix, Gopikrishnan, Plerou, and Stanley, 2006; Gabaix, 2009). Among other results, Gabaix, Gopikrishnan, Plerou, and Stanley (2006) and Gabaix (2009) provide theoretical results and empirical estimates that support heavy-tailed distributions with tail exponents $\alpha \approx 3$ for financial returns on many stocks and stock indices in different markets.5
Our analysis has implications for risk management and policies to mitigate systemic externalities. We show that value at risk (VaR) considerations lead individual intermediaries to diversify, as per incentives similar to those in the Basel bank capital requirements.
Within our framework, however, the diversification actions may lead to suboptimal behavior from a societal viewpoint. It then becomes natural to look for devices that would allow individual firms to obtain the benefits of diversification, but without creating a systemic risk that could topple the entire financial system. In Section 4, we provide a framework to develop such solutions and provide specific proposals. Our paper is related to the recent, rapidly expanding literature on systemic risk and market crashes. The closest paper is Acharya (2009). Our definition of systemic risk is similar to Acharya’s, as are the negative externalities of joint failures of intermediaries. The first and foremost
4 See Mandelbrot (1997), Fama (1965), Ross (1976), Ibragimov and Walden (2007), Ibragimov (2009b), the review in Ibragimov (2009a), and references therein.
5 As discussed in, e.g., Lux (1998), Guillaume, Dacorogna, Davé, Müller, Olsen, and Pictet (1997), and Gabaix, Gopikrishnan, Plerou, and Stanley (2006), tail exponents are similar for financial and economic time series in different countries. For a general discussion of heavy-tailedness in financial time series, see the discussion in Loretan and Phillips (1994), Rachev, Menn, and Fabozzi (2005), Embrechts, Klüppelberg, and Mikosch (1997), Gabaix, Gopikrishnan, Plerou, and Stanley (2006), Ibragimov (2009a), and references therein.
difference between the two papers is our focus on the distributional properties of risks and the number of risk classes in the economy, which is not part of the analysis in Acharya (2009). Moreover, the mechanisms that generate the systemic risks are different in the two papers. Whereas the systemic risk in Acharya (2009) arises when individual intermediaries choose correlated real investments, in our model the systemic risk is introduced when intermediaries with limited liability become interdependent when they hedge their idiosyncratic risks by taking positions in what is in effect each other's risk portfolios. Such interdependence may have been especially important for systemic risk in the recent financial crisis. This leads to a distinctive set of policy implications, as we develop in Section 4.
Wagner (2010) independently develops a model of financial institutions in which there are negative externalities of systemic failures, and diversification therefore may be suboptimal from society’s perspective. Wagner’s analysis, however, focuses on the effects of conglomerate institutions created through mergers and acquisitions and the effects of contagion. Furthermore, the intermediary size and investment decisions are exogenously given in Wagner’s study, and only a uniform distribution of asset returns is considered. Our model, in contrast, emphasizes the importance of alternative risk distributions and the number of risks in determining the possibly negative externality of diversification. The two studies therefore complement each other.
A related literature models market crashes based on contagion between individual institutions or markets. A concise survey is available in Brunnermeier (2009). Various mechanisms to propagate the contagion have been used. Typically it propagates through an externality, in which the failure of some institutions triggers the failure of others.
Rochet and Tirole (1996) model an interbank lending market, which intrinsically propagates a shock in one bank across the banking system. Allen and Gale (2000) extend the Diamond and Dybvig (1983) bank run liquidity risk model, such that geographic or industry connections between individual banks, together with incomplete markets, allow shocks to some banks to generate industry-wide collapse. Kyle and Xiong (2001) focus on cumulative price declines that are propagated by wealth effects from losses on trader portfolios. Kodres and Pritsker (2002) use informational shocks to trigger a sequence of synchronized portfolio rebalancing actions, which can depress market prices in a cumulative fashion. Caballero and Krishnamurthy (2008) focus on Knightian uncertainty and ambiguity aversion as the common factor that triggers a flight to safety and a market crash. Most recently, Brunnermeier and Pedersen (2009) model a cumulative collapse created by margin requirements and a string of margin calls.
The key commonality between our paper and this literature is the possibility of an outcome that allows a systemic market crash, with many firms failing at the same time. Moreover, as in many other papers, in our model there is an externality of the default of an intermediary: in our case, the extra time it takes to recover when many defaults occur at the same time. The
key distinction between our paper and this literature is, again, our focus on the importance of risk distributions and number of asset classes in an economy. Thus, a unique feature of our model is that the divergence between private and social welfare arises from the statistical features of the loss distributions for the underlying loans alone. This leads to strong, testable implications and to distinctive policy implications. In our model, these effects arise even without additional assumptions about agency problems (e.g., asymmetric information) or third-party subsidies (e.g., government bailouts). No doubt, such frictions and distortions would make the incentives of intermediaries and society even less aligned.
The paper is organized as follows. In the next section we introduce some notation. In Section 3, we introduce the model, and in Section 4 we discuss its potential implications for risk management and policy making. Finally, some concluding remarks are made in Section 5. Proofs are left to a separate appendix. To simplify the reading, we provide a list of commonly used variables at the end of the paper.
2. Notation
We use the following conventions: lower case thin letters represent scalars, upper case thin letters represent sets and functions, lower case bold letters represent vectors, and upper case bold letters represent matrices. The $i$th element of the vector $\mathbf{v}$ is denoted $(\mathbf{v})_i$, or $v_i$ if this does not lead to confusion, and the $n$ scalars $v_i$, $i = 1, \ldots, n$, form the vector $[v_i]_i$. We use $T$ to denote the transpose of vectors and matrices. One specific vector is $\mathbf{1}_n = (\underbrace{1, 1, \ldots, 1}_{n})^T$ (or just $\mathbf{1}$ when $n$ is obvious). Similarly, we define $\mathbf{0}_n = (\underbrace{0, 0, \ldots, 0}_{n})^T$.
The expectation and variance of a random variable, $x$, are denoted by $E[x]$ (or simply $Ex$) and $\mathrm{var}(x)$ (or $\sigma^2(x)$), respectively, provided they are finite. The covariance and correlation between two random variables $x_1$ and $x_2$ with $E x_1^2 < \infty$, $E x_2^2 < \infty$, are denoted by $\mathrm{cov}(x_1, x_2)$ and $\mathrm{corr}(x_1, x_2)$, respectively. We will make significant use, in particular, of the Pareto distribution. A random variable, $x$, with cumulative distribution function (c.d.f.) $F(x)$ is said to be of Pareto-type (in the left tail), if
$$F(-x) = \frac{c + o(1)}{x^{\alpha}}\, \ell(x), \qquad x \to +\infty, \tag{1}$$
where $c$, $\alpha$ are some positive constants and $\ell(x)$ is a slowly varying function at infinity: $\ell(\lambda x)/\ell(x) \to 1$ as $x \to +\infty$ for all $\lambda > 0$. Here, $f(x) = o(1)$ as $x \to +\infty$ means that $\lim_{x \to +\infty} f(x) = 0$. The parameter $\alpha$ in (1) is referred to as the tail index or tail exponent of the c.d.f. $F$ (or the random variable $x$). It characterizes the heaviness (the rate of decay) of the tail of $F$. We define a distribution to be heavy-tailed (in the loss domain) if it satisfies (1) with $\alpha \le 1$. It is moderately heavy-tailed if $1 < \alpha < \infty$, and it is
thin-tailed if $\alpha = \infty$, i.e., if the distribution decreases faster than any algebraic power.
3. Model
Consider an infinite horizon economy, $t \in \{0, 1, 2, \ldots\}$, in which there are $M$ different risk classes. Time value of money is represented by a discount factor $\delta < 1$ so that the present value of one dollar at $t = 1$ is $\delta$. There is a bond market in perfectly elastic supply, so that at $t$, a risk-free bond that pays off one dollar at $t + 1$ costs $\delta$. There are $M$ risk-neutral trading units, each trading in a separate risk class. We may think of unit $m$ as a representative trading unit for risk class $m$. Henceforth, we shall call these trading units intermediaries, capturing a large number of financial institutions, like banks, pension funds, insurance companies, and hedge funds. We thus assume that each trading unit, or intermediary, specializes in one risk class. Of course, in reality, intermediaries hold a variety, perhaps a wide variety, of risks. The key point here is that the intermediaries are not initially holding the market portfolio of risks, so that they may have an incentive to share risks with each other.
We think of the $M$ risk classes as different risk lines or “industries,” e.g., representing real estate, publicly traded stocks, private equity, etc. Within each risk class, in each time period $t$, there is a large number, $N$, of individual multivariate normally distributed risks, $x^{t,m}_n$, $1 \le n \le N$. We will subsequently let $N$ tend to infinity, whereas $M$ will be a small constant, typically less than 50, as is typical in financial and insurance applications (the results in the paper also hold in the case when the number of risks, $N$, in the $m$th class depends on $m$, as long as we let the number of risks in each risk class tend to infinity). For simplicity, we assume that risks belonging to different risk classes are independent, across time, $t$, and risk class, $m$, i.e., $x^{t,m}_n$ is independent of $x^{t',m'}_{n'}$ if $m \neq m'$ or $t \neq t'$.
This is not a crucial assumption; similar results would arise with correlated risk classes. For the time being, we focus on the first risk class, in time period zero. We therefore drop the $m$ and $t$ superscripts. Per assumption, the individual risks have multivariate normal distributions, related by
$$x_{i+1} = \rho x_i + w_{i+1}, \qquad i = 1, \ldots, N-1, \tag{2}$$
for some $\rho \in [0, 1)$.6 Here, $w_i$ are independent and identically distributed (i.i.d.) normally distributed random variables with zero mean and variance $\sigma^2(w_i) = 1 - \rho^2$. Each $x_i$ represents cross-sectional risk, with local dependence in the sense that $\mathrm{cov}(x_i, x_j)$ quickly approaches zero when $|i - j|$ grows, i.e., the decay is exponential. The risks could, for example, represent individual mortgages and the total risk class would then represent all the mortgages in the economy. For low $\rho$, the risks of these mortgages are effectively uncorrelated, except for risks that are very close. “Close” here could, for example, represent mortgages on houses in the same geographical area. If $\rho$ is close to one, shocks are correlated across large distances, e.g., representing country-wide shocks to real estate prices. This structure thus allows for both “local” and “global” risk dependencies in a simple setting.7
6 We focus on multivariate normal risks, for tractability. Similar results arise with other, thin-tailed, individual risks, e.g., Bernoulli distributions, although the analysis becomes more complex, because other distribution classes are not closed under portfolio formation so the central limit theorem needs to be incorporated into the analysis.
For simplicity, we introduce symmetry in the risk structure by requiring that
$$x_1 = \rho x_N + w_1, \tag{3}$$
i.e., the relationship between $x_1$ and $x_N$ is the same as that between $x_{i+1}$ and $x_i$, $i = 1, \ldots, N-1$. We can rewrite the risk structure in matrix notation, by defining $\mathbf{x} = [x_i]_i$, $\mathbf{w} = [w_i]_i$. The relationship (2), (3) then becomes
$$\mathbf{A}\mathbf{x} = \mathbf{w}.$$
Here, $\mathbf{A}$ is an invertible so-called circulant Toeplitz matrix,8 given by
$$\mathbf{A} = \mathrm{Toeplitz}_N[-\rho, 1, \mathbf{0}^T_{N-2}, -\rho].$$
Thus, given the vector with independent noise terms, $\mathbf{w}$, the risk structure, $\mathbf{x}$, is defined by
$$\mathbf{x} \stackrel{\mathrm{def}}{=} \mathbf{A}^{-1}\mathbf{w}.$$
The symmetry is merely for tractability and we would expect to get similar results without it (although at the expense of higher model complexity). In fact, for large $N$, the covariance structure in our model is very similar to that of a standard AR(1) process, defined by
$$\hat{x}_1 = \hat{w}_1, \qquad \hat{x}_i = \rho \hat{x}_{i-1} + \hat{w}_i, \qquad i = 2, \ldots, N,$$
where the $\hat{w}_i$'s are independent, $\sigma^2(\hat{w}_1) = 1$, $\sigma^2(\hat{w}_i) = 1 - \rho^2$, $i > 1$ (although, of course, $i$ does not denote a time subscript in our model, as it does in a standard AR(1) process). In this case, the matrix notation becomes $\hat{\mathbf{A}}\hat{\mathbf{x}} = \hat{\mathbf{w}}$, where $\hat{\mathbf{A}} = \mathrm{Toeplitz}_N[-\rho, 1]$. The only difference between $\mathbf{A}$ and $\hat{\mathbf{A}}$ is that $A_{1N} = -\rho$, whereas $\hat{A}_{1N} = 0$. The covariance matrix $\hat{\Sigma} = [\mathrm{cov}(\hat{x}_i, \hat{x}_j)]_{ij}$ has elements
$$\hat{\Sigma}_{ij} = \rho^{|i-j|}.$$
Thus, in the AR(1) setting, the covariance between risks decreases geometrically with the distance $|i - j|$. As we shall see in Theorem 1, this property carries over to the covariances in our symmetric structure when the number of risks within an asset class, $N$, is large.
7 For review and discussion of models with common shocks and modeling approaches for spatially dependent economic and financial data, see, among others, Conley (1999), Andrews (2005), Ibragimov and Walden (2007), and Ibragimov (2009b).
8 A Toeplitz matrix $\mathbf{A} = \mathrm{Toeplitz}_N[a_{-N+1}, a_{-N+2}, \ldots, a_{-1}, a_0, a_1, \ldots, a_{N-2}, a_{N-1}]$ is an $N \times N$ matrix with the elements given by $(\mathbf{A})_{ij} = a_{j-i}$, $1 \le i \le N$, $1 \le j \le N$. A Toeplitz matrix is banded if $(\mathbf{A})_{ij} = 0$ for large $|i - j|$, corresponding to $a_i = 0$ for indices $i$ that are large by absolute value. When $a_i = 0$ if $i < -k$ or $i > m$, for $k < N-1$, $m < N-1$, we use the notation $a_{-k}, a_{-k+1}, \ldots, a_0, \ldots, a_{m-1}, a_m$ to represent the whole sequence generating the Toeplitz matrix. For example, the notation $\mathbf{A} = \mathrm{Toeplitz}_N[a_{-1}, a_0]$ then means that $A_{ii} = a_0$, $A_{i,i-1} = a_{-1}$, and that all other elements of $\mathbf{A}$ are zero. For an $N \times N$ Toeplitz matrix, if $a_{N-j} = a_{-j}$, then the matrix is, in addition, circulant. See Horn and Johnson (1990) for more on the definition and properties of Toeplitz and circulant matrices.
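The claim that the circulant structure inherits the geometric covariance decay for large N can be checked numerically. The following is a minimal sketch rather than anything taken from the paper: it assumes the structure (2)-(3) with noise variance 1 - rho^2, and it uses the moving-average representation obtained by iterating the recursion once around the circle, x_i = sum over k of rho^k w_{(i-k) mod N} / (1 - rho^N). The function name `circulant_cov` and the parameter values are illustrative.

```python
def circulant_cov(rho: float, n: int, d: int) -> float:
    """Covariance between x_i and x_{i+d} under the circulant structure (2)-(3)."""
    # Iterating x_i = rho * x_{i-1} + w_i once around the circle gives
    # x_i = sum_k rho**k * w_{(i-k) mod n} / (1 - rho**n).
    coef = [rho**k / (1.0 - rho**n) for k in range(n)]
    var_w = 1.0 - rho**2  # variance of each i.i.d. noise term w_i
    # x_i and x_{i+d} load on the same noise term through coef[k] and coef[(k + d) % n].
    return var_w * sum(coef[k] * coef[(k + d) % n] for k in range(n))


if __name__ == "__main__":
    rho = 0.5
    for n in (4, 20, 200):
        row = [round(circulant_cov(rho, n, d), 6) for d in range(4)]
        print(n, row)  # approaches [1, rho, rho**2, rho**3] as n grows
```

For small N the wrap-around term is visible (the covariance at distance d works out to (rho^d + rho^(N-d)) / (1 - rho^N)), while for large N it vanishes and the AR(1) pattern rho^d remains.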
Fig. 2. Intermediary’s cash flows in model. At t = 0, the intermediary reserves capital $k$ and trades in a risk portfolio $\mathbf{c}$. At t = 1, risks are realized. If the intermediary defaults, it is out of business; otherwise, the game is repeated, i.e., at t = 1, the intermediary reserves capital and trades in risk, at t = 2 risks are realized, etc.
Now, if the intermediary survives, which it does with probability
$$q \stackrel{\mathrm{def}}{=} P(\mathbf{c}^T\mathbf{x} \le k),$$
the situation is repeated, i.e., at t = 1 the intermediary takes on new risk, reserves capital, and then at t = 2 risks are realized, etc. If the intermediary defaults at any point in time, it goes out of business and its cash flows are zero from there on. In the infinite horizon case, the value of the intermediary in recursive form is therefore $V^B = \delta(\mathbf{c}^T\mathbf{1}) + \delta q V^B$, implying that
$$V^B = \frac{\delta(\mathbf{c}^T\mathbf{1})}{1 - \delta q}. \tag{8}$$
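As a quick check on Eq. (8), the sketch below compares the closed form with a truncated version of the infinite sum that the recursion unrolls into. The function names and the numerical values (discount factor 0.95, survival probability 0.99, exposure c^T 1 = 10) are illustrative assumptions, not taken from the paper.

```python
def value_closed_form(delta: float, exposure: float, q: float) -> float:
    """Eq. (8): V = delta * exposure / (1 - delta * q), with exposure = c^T 1."""
    return delta * exposure / (1.0 - delta * q)


def value_by_unrolling(delta: float, exposure: float, q: float, periods: int = 2000) -> float:
    """Unroll V = delta*exposure + delta*q*V; round j is reached with probability q**j."""
    return sum(delta * exposure * (delta * q) ** j for j in range(periods))


if __name__ == "__main__":
    v_closed = value_closed_form(0.95, 10.0, 0.99)
    v_unrolled = value_by_unrolling(0.95, 10.0, 0.99)
    print(v_closed, v_unrolled)  # the two agree up to truncation error
```

Holding the exposure fixed, a lower survival probability q raises the denominator and lowers the value, which is the default-risk side of the trade-off in Eq. (8).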
We note that the counterparty of the $\mathbf{x}$ risk transaction understands that the intermediary may default, and takes this into account when the price for the risk is agreed upon. Therefore, since the price of the contract takes into account the risk level of the intermediary, the counterparty does not need to impose additional covenants.
Eq. (8) describes the trade-off the individual intermediary makes when choosing its portfolio. On the one hand, a larger portfolio increases the cash flows per unit time, which increases the numerator; on the other hand, it also increases the risk of default, which increases the denominator. It is straightforward to show that if the intermediary could, it would take on an infinitely large portfolio. This is shown as a part of the proof of Theorem 1 in Appendix A. In terms of Eq. (8), the numerator effect dominates the denominator effect.
A regulator, representing society, therefore imposes restrictions on the probability of default, to counterbalance the risk-shifting motive. Specifically, the intermediary’s one-period probability of default is not allowed to exceed $\beta$ at any point in time. In other words, the intermediary faces the constraint that the value at risk at the $1 - \beta$ confidence level can be at most $k$, $\mathrm{VaR}_{1-\beta}(\mathbf{c}^T\mathbf{x}) \le k$. Since the realized dollar losses are $\mathbf{c}^T\mathbf{x}$ and the capital reserved is $k$, the value at risk constraint says that the probability is at most $\beta$ that the realized losses exceed the capital. In our notation, this is the same as saying that $q \ge 1 - \beta$. The VaR constraint is imposed in the model to reflect the existing management and regulatory standards
through which most financial intermediaries currently operate. Most major financial intermediaries apply VaR as a management tool and regularly report their VaR values. The Basel II bank capital requirements are based on VaR. European securities firms face the same Basel II VaR requirements, while the major U.S. securities firms have become bank holding companies and thus also face these requirements. European insurance firms are being reregulated under “Solvency II,” which is VaR-based, in parallel to Basel II. U.S. insurers are regulated by individual states with capital requirements that vary by state and insurance line; these are generally consistent with a VaR interpretation.
To be clear, VaR is not necessarily the optimal risk measure when financial intermediary behavior may create systemic externalities. Indeed, in this section we show that VaR requirements can lead to diversification decisions by individual intermediaries that are inconsistent with maximizing societal welfare. In Section 4, we consider alternative proposals for intermediary regulation when systemic externalities are important.
It is natural to think of $\mathbf{c}^T\mathbf{1}/k$ as a measure of the leverage of the firm, since $\mathbf{c}^T\mathbf{1}$ represents the total liability exposure and $k$ represents the capital that can be used to cover losses. The assumption that there is an upper bound, $K$, on how much capital the intermediary can reserve is a reduced-form one. In a full equilibrium model without frictions, the investment level would be chosen such that the marginal benefit and cost of an extra dollar of investment would be equal, and $K$ would then be endogenously derived. In our reduced-form model, where the benefits $\delta$ are constant, we assume that the marginal cost of raising capital beyond $K$ is very high, so that $K$ provides an effective hard constraint on the capital raising abilities of the intermediary.
We note that the bound may be strictly lower than the friction-free outcome, because of other frictions on capital availability in imperfect financial markets (see, e.g., Froot, Scharfstein, and Stein, 1993 for a discussion of such frictions). Given a choice of capital, $k \le K$, the VaR constraint imposed by the regulator then imposes a bound on how aggressively the intermediary can invest, i.e., on the intermediary's size. The VaR constraint therefore also automatically imposes a capital requirement restriction on the intermediary. We next study the intermediary's behavior in this environment.

3.1. Optimal behavior of intermediary

In line with the previous arguments, the program for the intermediary is

$$\max_{\mathbf{c},\,k}\ \frac{d\,(\mathbf{c}^T\mathbf{1})}{1-\delta\,P(-\mathbf{c}^T\mathbf{x}\le k)} \tag{9}$$

s.t.

$$k \in [0,K], \tag{10}$$

$$\mathbf{c} \in \mathbb{R}^n, \tag{11}$$

$$k \ge \mathrm{VaR}_{1-\beta}(-\mathbf{c}^T\mathbf{x}). \tag{12}$$
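As a rough illustration of the VaR constraint (12), the following sketch checks it on a finite sample of portfolio payoffs. The function names, the sample values, and the empirical-quantile convention are assumptions made for illustration only; they are not part of the model.

```python
import math

def empirical_var(payoffs, beta):
    """Empirical VaR of the losses (-payoff) at the 1 - beta level.

    Returns an order statistic v of the sample losses such that at most
    a share beta of the sample losses exceed v (one common convention).
    """
    losses = sorted(-p for p in payoffs)
    n = len(losses)
    # Small tolerance guards against floating-point noise in (1 - beta) * n.
    idx = max(0, math.ceil((1.0 - beta) * n - 1e-9) - 1)
    return losses[idx]

def default_probability(payoffs, k):
    """Share of sample scenarios in which losses exceed the capital k."""
    return sum(1 for p in payoffs if -p > k) / len(payoffs)

def satisfies_var_constraint(payoffs, k, beta):
    """Sample analogue of (12): k >= VaR at the 1 - beta level."""
    return empirical_var(payoffs, beta) <= k

# Hypothetical payoff sample, for illustration only.
sample = [1, 2, 3, -1, -2, -3, -4, 0, 5, -10]
```

On this sample, capital of 4 leaves exactly one scenario in ten with losses above capital, so the constraint holds for a default probability bound of 10%, while capital of 3.5 does not suffice.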
R. Ibragimov et al. / Journal of Financial Economics 99 (2011) 333–348
The following theorem characterizes the intermediary’s behavior:
Theorem 1. Consider a VaR-constrained intermediary solving (9)–(12), where the risks are of the form (2), (3), $\beta$ is close to zero, and the distribution of correlations is of the form (4). Then, for large $N$:

(a) For a given $\rho$, the covariance $\mathrm{cov}(x_i, x_j)$ converges to $\rho^{|i-j|}$ for any $i$, $j$.

(b) The payoff of the intermediary's chosen risk portfolio, $\mathbf{c}^T\mathbf{x}$, is of Pareto-type with the tail index $2\gamma$, and with the probability distribution function (p.d.f.) $f(y)$, where

$$f(y) = \frac{\gamma\,2^{\gamma}\,b^{2\gamma}\,\Gamma\!\left(\gamma+\tfrac{1}{2},\ \tfrac{y^{2}}{2b^{2}}\right)}{\sqrt{\pi}\,|y|^{1+2\gamma}}, \tag{13}$$

for some constant $b > 0$, for all $y \in \mathbb{R}\setminus\{0\}$. Here, $\Gamma$ is the lower incomplete Gamma function, $\Gamma(a,x) \stackrel{\mathrm{def}}{=} \int_0^x t^{a-1}e^{-t}\,dt$, $a > 0$. Also, $f(0) = 2\gamma/\left(b\sqrt{2\pi}\,(2\gamma+1)\right)$.

(c) Maximal capital is reserved, $k = K$, i.e., (10) is binding.

(d) Maximal value at risk is chosen, i.e., (12) is binding.

(e) If $\gamma > 1$, then the variance of the portfolio is

$$\sigma^2 = b^2\,\frac{\gamma}{\gamma-1}, \tag{14}$$

else it is infinite.
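Part (e) of the theorem and the tail index in part (b) can be read off as two small helper functions. This is only a sketch; the function names are hypothetical and the parameter values in the usage lines are chosen for illustration.

```python
import math

def tail_index(gamma):
    """Tail index of the Pareto-type portfolio payoff, alpha = 2 * gamma."""
    return 2 * gamma

def portfolio_variance(b, gamma):
    """Eq. (14): b^2 * gamma / (gamma - 1) if gamma > 1, otherwise infinite."""
    if b <= 0 or gamma <= 0:
        raise ValueError("b and gamma must be positive")
    if gamma > 1:
        return b ** 2 * gamma / (gamma - 1)
    return math.inf

# The two cases plotted in Fig. 3, with a hypothetical scale b = 1.
var_heavy = portfolio_variance(1.0, 1.0)      # gamma = 1: infinite variance
var_moderate = portfolio_variance(1.0, 2.5)   # gamma = 2.5: finite variance
```

The boundary case $\gamma = 1$ falls on the infinite-variance side, matching the description of the first curve in Fig. 3.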
We note that, since $\Gamma(a,x) = \Gamma(a) + o(1)$ as $x \to +\infty$ (where $\Gamma(a)$ is the Gamma function, $\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt$, $a > 0$), we obtain that $\mathbf{c}^T\mathbf{x}$ has a Pareto-type distribution in accordance with (1):

$$P(\mathbf{c}^T\mathbf{x} > y) = \frac{2^{\gamma-1}\,b^{2\gamma}\left(\Gamma\!\left(\gamma+\tfrac{1}{2}\right)+o(1)\right)}{\sqrt{\pi}\,|y|^{2\gamma}}, \qquad y \to +\infty.$$

Thus, the portfolio chosen by the intermediary when solving (9)–(12) is moderately heavy-tailed, or even heavy-tailed when $\gamma < \tfrac{1}{2}$, although the individual risks are thin-tailed. We define the random variable $\xi = \mathbf{c}^T\mathbf{x}$ and the constant $c = \mathbf{c}^T\mathbf{1}$, and show the p.d.f. of $\xi$ for the two cases $\gamma = 1$ (heavy-tailed distribution with infinite variance) and $\gamma = 2.5$ (moderately heavy-tailed distribution with finite variance) in Fig. 3.

We also note that the total portfolio risk changes with the number of investments, $N$, in a very different way than what is implied by standard diversification results, for which the size of the investment is taken as given. If portfolio size were given, then the portfolio risk would vanish as $N$ grew. However, taking into account that the intermediary can change its total exposure with the number of risks, in our framework, the risk is not mitigated, but instead converges to a portfolio risk that is much more heavy-tailed than the individual risks.

Our approach of letting the number of risks within an asset class grow has some similarities with the idea of an asymptotically fine-grained portfolio (see Gordy, 2000, 2003), used for a theoretical motivation of the capital rules laid out in Basel II. Gordy shows that under the assumptions of one systematic risk factor and infinitely many small idiosyncratic risks, the portfolio invariant VaR rules of Basel II can be motivated. Similar to Gordy's setting, in our model each risk is a small part of the total asset class risk when $N$ is large. However, in our setting not all risk is diversified away when the number of risks increases, as shown in Theorem 1. In fact, the risk that is not diversified away adds up to systematic risk in our model (which turns into systemic risk if it brings down the whole system), and since the number of risk classes is $M > 1$, there will be multiple risk factors. This is contrary to the analysis in Gordy (2000), where, after diversification, all institutions hold the same one-factor risk. Therefore, the capital rules in Basel II would not be motivated in our model.

Fig. 3. Probability distribution function, $f(y)$, of the intermediary's chosen portfolio of risk, for $\gamma = 1$, corresponding to a heavy-tailed distribution with infinite variance, and for $\gamma = 2.5$, corresponding to a moderately heavy-tailed distribution with finite variance.
3.1.1. The value to society We next turn to the value to society of the markets for the M separate risk classes. For the individual intermediary, the game ends if it defaults. From society’s perspective, however, we would expect other players to step in and take over the business if an intermediary defaults, although it may take time to set up the business, develop client relationships, etc. Therefore, the cost to society of an intermediary’s default may not be as serious as the value lost by the specific intermediary. The argument is that since there is a well-functioning market, it is quite easy to set up a copy of the intermediary that defaulted. We assume that an outside provider of capital steps in and sets up an intermediary identical to the one that defaulted. The argument per assumption, however, only works if there is a well-functioning market. In the unlikely event in which all intermediaries default, the market is not well-functioning and it may take a much longer time to set up the individual copy. Thus, individual intermediary default is less serious from society’s perspective, but massive intermediary default (for simplicity, the case when all M intermediaries default at the same time) is much more serious. A similar argument is made in Acharya (2009).
To incorporate this reasoning into a tractable model, we use a special numerical mechanism. When only one individual intermediary fails (or a few intermediaries), the market continues to operate reasonably well. In such cases, the replacement of a failed intermediary occurs immediately if an intermediary defaults in an even period ($t = 0, 2, 4, \ldots$), and takes just one period if it defaults in an odd period ($t = 1, 3, 5, \ldots$). We note that in this favorable case without massive default, it takes, on average, half a period to rebuild after an intermediary's default (zero periods or one period with 50% chance each). On the other hand, when all intermediaries fail (a massive default), we assume there is a 50% chance that it will take another $2T$ periods ($2T + 1 - 1$) to return to normal operations. Therefore, the expected extra time to recover after a massive default is $(50\%) \times (2T) + (50\%) \times (0) = T$. Henceforth, we call $T$ the recovery time after massive default.

Our distinction between defaults in odd and even time periods significantly simplifies the analysis by ensuring that in even periods there are either 0 or $M$ intermediaries in the market. Without the assumption, we would need to introduce a state variable describing how many intermediaries are alive at each point in time, which would increase the complexity of the analysis. Qualitatively similar results arise when relaxing the assumption, i.e., when instead assuming that it always takes one period to replace up to $M - 1$ defaulting intermediaries and $T$ periods if all $M$ intermediaries default at the same time.

The assumption of longer recovery times after massive intermediary defaults can be viewed as a reduced-form description of the externalities imposed on society by intermediary defaults. The simplicity of the assumption allows us to carry out a rigorous analysis of the role of risk distributions, which is the focus of this paper.
Microfoundations for such externalities have been suggested elsewhere in the literature, e.g., liquidity risk as in Allen and Gale (2000). In Allen and Gale (2000), the externality occurs when failure of intermediaries triggers failure of other intermediaries.

Society's and individual intermediaries' objectives may be unaligned. From our previous discussion, it follows that the value to society of all intermediaries at $t = 0$ is, in recursive form:

$$V^S = Mdc + \delta q\,Mdc + \delta^{2}\left(1-(1-q)^{M}\right)V^S + \delta^{2T+2}(1-q)^{M}\,V^S, \tag{15}$$

implying that

$$V^S = \frac{Mdc\,(1+\delta q)}{1-\delta^{2}+\delta^{2}\left(1-\delta^{2T}\right)(1-q)^{M}}. \tag{16}$$

The first term on the right-hand side of (15) is the value generated between $t = 0$ and 1, and the second term is the expected value between $t = 1$ and 2. The third term is the discounted contribution to the value from $t = 2$ and forward if not all $M$ intermediaries default (which occurs with probability $1-(1-q)^{M}$), and the fourth term is the contribution to the value if all $M$ intermediaries default (which occurs with probability $(1-q)^{M}$). We shall see that this externality of massive intermediary default in the form of longer recovery time significantly decreases, or even reverses, the value of diversification from society's perspective and that the situations in which diversification is optimal versus suboptimal are easily characterized.
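The step from the recursion (15) to the closed form (16) can be checked numerically. The sketch below evaluates (16) and confirms that it is a fixed point of (15). The function names and all parameter values are hypothetical and serve only as an illustration.

```python
def society_value(M, d, c, delta, q, T):
    """Closed form (16): value to society without risk-sharing."""
    p_massive = (1.0 - q) ** M  # probability that all M intermediaries default
    numerator = M * d * c * (1.0 + delta * q)
    denominator = (1.0 - delta ** 2
                   + delta ** 2 * (1.0 - delta ** (2 * T)) * p_massive)
    return numerator / denominator

def recursion_rhs(V, M, d, c, delta, q, T):
    """Right-hand side of the recursion (15) at a candidate value V."""
    p_massive = (1.0 - q) ** M
    return (M * d * c
            + delta * q * M * d * c
            + delta ** 2 * (1.0 - p_massive) * V
            + delta ** (2 * T + 2) * p_massive * V)

# Hypothetical parameter values, for illustration only.
params = dict(M=5, d=0.01, c=100.0, delta=0.99, q=0.95, T=10)
V = society_value(**params)
residual = abs(recursion_rhs(V, **params) - V)  # close to zero if (16) solves (15)
```

With $T = 0$ the massive-default term drops out of the denominator, and a longer recovery time $T$ lowers the value, in line with the discussion of the externality above.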
3.2. Risk-sharing

We next analyze what happens when the $M$ different intermediaries, with identically distributed risk portfolios, $\xi_1, \ldots, \xi_M$, get the opportunity to share risks. We recall that the risk portfolios in the different risk classes are independently distributed. Therefore, as long as the intermediaries do not trade, a shock to one intermediary will not spread to others. If the intermediaries trade, however, systemic risk may arise when their portfolios become more similar because of trades. In what follows, we explore this idea in detail.

We assume that the intermediaries may trade risks at $t = \tfrac{1}{2}, \tfrac{3}{2}, \tfrac{5}{2}, \ldots$, after they have formed their portfolios, but before the risks are realized. We assume that the VaR requirement must hold at all times. For example, even if the intermediaries trade with each other at $t = \tfrac{1}{2}$, the VaR requirement needs to be satisfied between $t = 0$ and $t = \tfrac{1}{2}$.13 Also, the counterparty of the risk trade correctly anticipates whether trades between intermediaries will take place at $t = \tfrac{1}{2}$, taking the impact of such a trade on the default option, $Q$, into account when the price is decided. We focus on symmetric equilibria, in which all intermediaries choose to share risks fully. We define

$$q_M = P\!\left(-\frac{\sum_{m=1}^{M}\xi_m}{M} \le K\right),$$

so $q_M$ is the probability that total losses in the market are lower than total capital. Thus, $1 - q_M$ is the probability of massive intermediary default if intermediaries fully share risks.

13 This is not a critical assumption. Alternatively, we could have assumed that the regulator anticipates whether the individual intermediaries will trade risks, and adjusts the VaR requirements accordingly.

We note, in passing, that systemic risk is generated through a different mechanism in our model than in Acharya (2009). In Acharya (2009), systemic risk arises when intermediaries take on correlated real investments, and intermediaries do not trade risks. In our model, the risk portfolios of different intermediaries are independent, and systemic risk arises when intermediaries become interdependent through trades. The latter type of systemic risk may have been especially important in the financial crisis.

All intermediaries hold portfolios with the same risk features, albeit in different risk classes, so when they share risks, the net "price" they pay at $t = \tfrac{1}{2}$ is zero. However, the states of the world in which default occurs are different when the intermediaries share risks. In fact, defaults will be perfectly correlated in the full risk-sharing situation: With probability $1 - q_M$ a massive intermediary default occurs and with probability $q_M$ no intermediary defaults. Risk-sharing could, of course, be achieved not only by cross-ownership, but also by trading in derivatives contracts (credit default swaps, corporate debt, etc.). For individual intermediaries, the value when sharing risks (using the same arguments as when deriving (8)) is then

$$V_M^B = \frac{dc}{1-\delta q_M}, \tag{17}$$
which should be compared with the value of not sharing, (8). Therefore, as long as

$$q_M > q, \tag{18}$$

the intermediaries prefer to trade risks, since it leads to a higher probability of survival, and thereby a higher value. We note that there may be additional reasons for intermediaries to prefer risk-sharing. For example, if intermediaries anticipate that they will be bailed out in case of massive defaults, if managers are punished less if an intermediary performs poorly when all other intermediaries perform poorly too, or if the VaR restrictions are relaxed, this provides additional incentives for risk-sharing, which would amplify our main result.

3.2.1. The value to society

From society's perspective, a similar argument as when deriving Eqs. (15) and (16) shows that

$$V_M^S = Mdc + \delta q_M\,Mdc + \delta^{2} q_M V_M^S + \delta^{2T+2}(1-q_M)\,V_M^S, \tag{19}$$

so the total value of risk-sharing between intermediaries is

$$V_M^S = \frac{Mdc\,(1+\delta q_M)}{1-\delta^{2}+\delta^{2}\left(1-\delta^{2T}\right)(1-q_M)}. \tag{20}$$

Therefore, via (16), it follows that society prefers risk-sharing when

$$\frac{1+\delta q_M}{1+\lambda(1-q_M)} > \frac{1+\delta q}{1+\lambda(1-q)^{M}}, \tag{21}$$

where

$$\lambda = \frac{\delta^{2}\left(1-\delta^{2T}\right)}{1-\delta^{2}}. \tag{22}$$

The value of $\lambda$ determines the relative trade-off society makes between the costs in foregone investment opportunities of individual and massive intermediary default.14 If $T = 0$, there is no extra delay when massive default occurs. In this case, $\lambda = 0$, and society's trade-off (21) is the same as individual intermediaries' (18). If, on the other hand, $T$ is large and $\delta$ is close to one, representing a situation in which it takes a long while to set up a market after massive default and the discount rate is low, $\lambda$ is large, and the trade-off between $(1 - q_M)$ and $(1 - q)^{M}$ in (21) becomes very important. In this case, society will mainly be interested in minimizing the risk of massive default and this risk is minimized when intermediaries do not share risk, since $1 - q_M \ge (1 - q)^{M}$.15

14 It is easy to show that $\lambda \in [0, T]$, and that $\lambda$ is increasing in $\delta$ and $T$.

15 This follows trivially, since $1 - q_M = P\left(-\sum_i \xi_i > MK\right) \ge P\left(\cap_i \{-\xi_i > K\}\right) = (1-q)^{M}$.
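Conditions (18) and (21) can be compared directly once $q$, $q_M$, $\delta$, and $T$ are given. The sketch below does this for one hypothetical parameter set; the function names and the numerical values are assumptions for illustration and are not taken from the paper.

```python
def lam(delta, T):
    """Eq. (22): lambda = delta^2 (1 - delta^(2T)) / (1 - delta^2), 0 < delta < 1."""
    return delta ** 2 * (1.0 - delta ** (2 * T)) / (1.0 - delta ** 2)

def intermediaries_prefer_sharing(q, q_M):
    """Condition (18): sharing raises the survival probability."""
    return q_M > q

def society_prefers_sharing(q, q_M, M, delta, T):
    """Condition (21), which weighs massive default by lambda."""
    weight = lam(delta, T)
    lhs = (1.0 + delta * q_M) / (1.0 + weight * (1.0 - q_M))
    rhs = (1.0 + delta * q) / (1.0 + weight * (1.0 - q) ** M)
    return lhs > rhs

# Hypothetical values: sharing raises survival from 99% to 99.5%.
q, q_M, M, delta = 0.99, 0.995, 5, 0.99
agree_without_delay = society_prefers_sharing(q, q_M, M, delta, T=0)
agree_with_long_delay = society_prefers_sharing(q, q_M, M, delta, T=30)
```

In this example the intermediaries prefer to share risks in both cases, while society prefers sharing only when $T = 0$. With $T = 30$ the weight $\lambda$ on massive default is large enough for (21) to fail.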
Fig. 4. Cost to society for diversified and separated cases, when there are two intermediaries. The axes show the losses $-\xi_1$ and $-\xi_2$, with the capital level $K$ marked on each axis and the line $-\xi_1 - \xi_2 = 2K$ separating the diversified outcomes. In the diversified case, massive default occurs if $-\xi_1 - \xi_2 \ge 2K$. In the separated case, massive default occurs if $-\xi_1 \ge K$ and $-\xi_2 \ge K$, whereas single default occurs if $-\xi_1 \ge K$ and $-\xi_2 < K$, or if $-\xi_2 \ge K$ and $-\xi_1 < K$. Thus, massive default is rarer in the separated case than in the diversified case, but it is more common that at least one intermediary defaults in the separated case.
It is clear that society and individual intermediaries may disagree about whether it is optimal to diversify. Specifically, this occurs when (18) holds, but (21) fails. The opposite, that (18) fails but (21) holds, can never occur.16 To understand the situation conceptually, we study the special case when there are two risk classes, $M = 2$. In Fig. 4, we show the different outcomes depending on the realizations of losses in the diversified and separated cases. When both $-\xi_1 > K$ and $-\xi_2 > K$, it does not matter whether the intermediaries diversify or not, since there will be massive intermediary default either way. Similarly, if $-\xi_1 \le K$ and $-\xi_2 \le K$, neither intermediary defaults, regardless of whether they diversify or not. However, in the case when $-\xi_1 > K$ and $-\xi_2 \le K$, the outcome is different, depending on the diversification strategy the intermediaries have chosen. In this case, if $-\xi_1 - \xi_2 > 2K$, then a massive default occurs if the intermediaries are diversified, but only a single default if they are not. If $-\xi_1 - \xi_2 \le 2K$, on the other hand, no default occurs if the intermediaries are diversified, but a single default occurs if they are not. Exactly the same argument applies in the case when $-\xi_1 \le K$ and $-\xi_2 > K$. Therefore, the optimal outcome depends on the trade-off between avoiding single defaults in some states of the world but introducing massive defaults in others, when diversifying.

The point that risk-sharing increases the risk for joint failure is, of course, not new; it was made as early as in Shaffer (1994). Our analyses of the trade-off between diversification benefits and costs of massive failures, the importance of distributional properties, correlation uncertainty, and number of asset classes, as well as the implications for financial institutions, however, are to the best of our knowledge novel.

16 This also follows trivially, since $1 - q_M \ge (1-q)^{M}$ and therefore, if $q \ge q_M$, then $(1+\delta q)/(1+\lambda(1-q)^{M}) \ge (1+\delta q)/(1+\lambda(1-q_M)) \ge (1+\delta q_M)/(1+\lambda(1-q_M))$.
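The case analysis for $M = 2$ can be written as a small classification routine. This is a sketch under the strict-inequality convention used in the text; the function name and the loss values in the usage lines are hypothetical.

```python
def outcomes(loss1, loss2, K):
    """Classify default outcomes for M = 2 given realized losses.

    loss1 and loss2 stand for the losses -xi_1 and -xi_2.
    Returns (separated, diversified), each one of
    "no default", "single default", "massive default".
    """
    n_defaults = int(loss1 > K) + int(loss2 > K)
    separated = ("no default", "single default", "massive default")[n_defaults]
    # Under full risk-sharing, both intermediaries fail together
    # exactly when total losses exceed total capital 2K.
    diversified = "massive default" if loss1 + loss2 > 2 * K else "no default"
    return separated, diversified

# Hypothetical loss realizations with capital K = 1.
large_single_loss = outcomes(3.0, 0.5, K=1.0)
small_single_loss = outcomes(1.2, 0.5, K=1.0)
```

The two usage lines reproduce the trade-off described above: a large loss in one class turns a single default into a massive default under diversification, whereas a small excess loss is absorbed and no default occurs.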
It is clear that the objectives of the intermediaries and society will depend on the distributions of the $\xi$-risks, and it may not be surprising that standard results from the theory of diversification apply to the individual intermediaries' problem. Society's objective function is more complex, however, since it trades off the costs of massive and individual intermediary defaults. It comes as a pleasant surprise that for low default risks (i.e., for a $\beta$ close to zero in the $\mathrm{VaR}_{1-\beta}$ constraint), given the distribution of the $\xi$-risk, we can completely characterize when the objectives of intermediaries and society are different.

Theorem 2. Consider i.i.d. asset class risks, $\xi_1, \ldots, \xi_M$, of Pareto-type (1) with the tail index $\alpha \ne 1$, and $d$, $\delta$, and $T$ as previously defined. Then, for low $\beta$:

(a) From an intermediary's perspective, risk-sharing is optimal if and only if $\alpha > 1$, regardless of the number of risk classes, $M$.

(b) From society's perspective, risk-sharing is optimal if and only if $\alpha > 1$ and $M > M^*$, where $M^*$ is the diversification threshold

$$M^* = \left(\frac{1-\delta^{2T+1}}{1-\delta}\right)^{1/(\alpha-1)}. \tag{23}$$

Thus, the break-even for individual intermediaries, from (18), is a Pareto distribution with the tail index of one, e.g., a Cauchy distribution. If the distribution is heavier, then (18) fails, and intermediaries and society therefore agree that there should be no risk-sharing. For thinner tails, intermediaries prefer risk-sharing. For society, in contrast, what is optimal depends on $M$. It is easy to show that $M^* \in [1, (2T+1)^{1/(\alpha-1)})$, and that $M^*$ is increasing in $\delta$ and $T$. We note that for the limit case when $\delta \to 0$, $\alpha \to \infty$, or $T = 0$, i.e., when $M^* \to 1$, society and intermediaries always agree. It is useful to define

$$\eta = \frac{1-\delta^{2T+1}}{1-\delta}, \tag{24}$$

so that $M^* = \eta^{1/(\alpha-1)}$. This separates the impact on $M^*$ of the tail behavior of the risk distributions from the other factors ($\delta$ and $T$). It follows immediately that $\eta \in [1, 2T+1)$. In Fig. 5, we show $\eta$ as a function of $\delta$ for $T = 1, 10, 20, 30$. From Theorem 2, it follows that if there is a large number of risk classes, $M$, society agrees with the individual intermediaries:

Corollary 1. Given i.i.d. Pareto-type risk classes with the tail index $\alpha$, if there is a large enough number of risk classes available, intermediaries and society agree on whether risk-sharing is optimal.

In Fig. 6, we show the break-even number of risk classes, $M^*$, for Pareto-type risks with the tail indices $\alpha = 2, 2.5, 3, 4$. With no externality of massive default ($\eta = 1$), society prefers diversification when $\alpha > 1$, just like individual intermediaries. However, when $\eta > 1$, diversification is suboptimal up until $M^*$. For example, for $\alpha = 2$ and $\eta = 10$, at least 10 risk classes are needed for diversification to be optimal for society.

Theorem 2 provides a complete characterization of the objectives of intermediaries and firms for VaR close to the 100% confidence level, by relating the number of risk classes, $M$, the discount factor, $\delta$, the tail distribution of the risks, $\alpha$, and the recovery time after massive default, $T$. It also relates to the uncertainty of correlations, $\gamma$, through the relation $\alpha = 2\gamma$. Theorem 2 is therefore the fundamental result of this paper.

Fig. 5. The coefficient, $\eta = (1-\delta^{2T+1})/(1-\delta)$, as a function of the discount factor, $\delta$, for different recovery times after massive default, $T = 1, 10, 20, 30$. All else equal, a higher $\eta$ leads to a higher diversification threshold, $M^*$.

Fig. 6. Diversification threshold, $M^*$, for Pareto-type risk distributions with tail indices $\alpha = 2, 2.5, 3, 4$, as a function of the coefficient $\eta = (1-\delta^{2T+1})/(1-\delta)$, where $\delta$ is the per period discount factor and $T$ is the recovery time after massive default. The diversification threshold is the break-even number of risk classes needed for diversification to be optimal for society. Results are asymptotically valid for VaR confidence levels close to 100%, i.e., for $\beta$ close to zero in (12).

The theorem has several immediate empirical implications, since when (23) fails, we would expect there to exist financial regulations against risk-sharing across risk classes:
Implication 1. Economies with heavier-tailed risk distributions should have stricter regulations for risk-sharing between risk classes.
Equivalently, using the relationship between uncertainty of correlation structure and tail distributions,

Implication 2. Economies with more uncertain correlation between risk classes should have stricter regulations for risk-sharing.

We also have

Implication 3. Economies with lower interest rates should have stricter regulations for risk-sharing.

Implication 4. Economies with fewer risk classes should have stricter regulations for risk-sharing.

Implication 5. Economies with risk classes for which it takes longer to recover after a massive default should have stricter regulations for risk-sharing.

Implication 4 may be related to the size of the economy, in that larger economies may have more risk classes and therefore, society may be more tolerant to risk-sharing across these classes. Moreover, Implication 5 may be related to the degree to which an economy is open to foreign investments in that economies that are open may be faster in recovering after a massive default, and thereby allow more risk-sharing between risk classes. In Section 4, we apply these implications to derive policy solutions for controlling the social costs created when actions by intermediaries to diversify create investments in highly correlated market portfolios.

Theorem 2 provides an asymptotic result and therefore does not enlighten us about what the break-even number of risk classes is when the default risk, $\beta$, is significantly different from zero. For $\beta$ significantly different from zero, the analysis is less tractable. Because of the noncoherency of the value at risk measure, we suspect that diversification may be less valuable for society in such cases. Indeed, we have the following sufficient condition for diversification to be optimal from society's perspective, which holds for all $\beta$ below some threshold.

Theorem 3. Consider i.i.d. asset class risks, $\xi_1, \ldots, \xi_M$, of Pareto-type with the tail index $\alpha > 2$, and $\eta$ as defined in (24). Define the tail probability $F(x) \stackrel{\mathrm{def}}{=} P(-\xi_1 > x)$. Assume that the variance of $\xi_1$ is $\sigma^2$, that $F(K_0) = \beta_0$, and that for $K > K_0$,

$$\frac{c_1}{K^{\alpha}} \le F(K) \le \frac{c_2}{K^{\alpha}}, \tag{25}$$

for constants $0 < c_1 \le c_2$. Then, for all $\beta \le \beta_0$, risk-sharing is optimal from society's perspective if the number of asset classes, $M$, satisfies

$$c_1 2^{\alpha} M^{\alpha} - C\eta M^{\alpha/2} - M c_2 \eta\,\alpha^{\alpha} \ge 0, \tag{26}$$

where $C = (2e\alpha\sigma^{2})^{\alpha/2}$, and $e$ is the constant $e = 2.71828\ldots$.

In Fig. 7 we show the break-even number of risks for the Pareto-distributed risks of Section 3.1, for which $\alpha = 2\gamma$. The bound is chosen so that the results are valid for all value at risk at confidence levels above 99%, i.e., for $\beta \le 1\%$. We see that, compared with Fig. 6, $M$ needs to be significantly higher to ensure that society prefers diversification. The bound gets even worse if the VaR is at a lower confidence level, e.g., at the 95% level.

As an example, assume that the average time it takes to replace a defaulted intermediary is six months if the default is individual (i.e., that one period is six months), that the one-period discount factor is $\delta = 0.99$ (so that the one-year discount rate is approximately 2%), and that if there is massive intermediary default, it takes five years to restore the whole market. In this case, $\eta = (1 - 0.99^{10})/(1 - 0.99) \approx 9.6$. From Fig. 7, if the risk has a tail exponent of $\alpha = 4$, there needs to be about $M = 30$ risk classes to guarantee that it is optimal to diversify at a VaR at the 99% confidence level in this case. Thus, diversification may be suboptimal from society's perspective even with moderately heavy-tailed risks, as long as the number of risk classes is not large.

Fig. 7. Diversification threshold, $M^*$, i.e., sufficient condition for number of risk classes, $M > M^*$, to be large enough to guarantee that society prefers diversification, for Pareto-type risk distributions with tail indices $\alpha = 3, 4, 5$, as a function of $\eta$. Here, $\eta = (1-\delta^{2T+1})/(1-\delta)$, $\delta$ is the per period discount factor and $T$ is the recovery time after massive default. Results are valid for VaRs above the 99% confidence level.
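The coefficient $\eta$ in (24) and the asymptotic threshold $M^* = \eta^{1/(\alpha-1)}$ from Theorem 2 are simple to evaluate. The sketch below reproduces the two numerical statements made in the text. The function names are hypothetical, and the six-month example simply re-evaluates the expression printed above, with the exponent as stated there.

```python
def eta(delta, T):
    """Eq. (24): eta = (1 - delta^(2T+1)) / (1 - delta), for 0 < delta < 1."""
    return (1.0 - delta ** (2 * T + 1)) / (1.0 - delta)

def diversification_threshold(eta_value, alpha):
    """Eq. (23) written as M* = eta^(1 / (alpha - 1)), valid for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the threshold is only defined for alpha > 1")
    return eta_value ** (1.0 / (alpha - 1.0))

# Example from the text: alpha = 2 and eta = 10.
m_star = diversification_threshold(10.0, 2.0)

# Arithmetic of the six-month example, with the exponent as printed in the text.
eta_example = (1.0 - 0.99 ** 10) / (1.0 - 0.99)
```

Because the exponent $1/(\alpha-1)$ falls as $\alpha$ rises, thinner tails lower the asymptotic threshold for a given $\eta$, which is the pattern shown in Fig. 6.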
3.3. Extensions to general distributions

We have analyzed the simplest possible framework, with specific independent identically distributed risk classes. Here, we sketch how our results can be generalized to more general distributions. Similar to the proof of Theorem 3, one can obtain its extension to the case of distributions such that, for $K > K_0$, $c_1/K^{\alpha} \le F(K) \le c_2/K^{\zeta}$ for constants $0 < c_1 \le c_2$ and $\alpha \ge \zeta > 1$. For such distributions, the risk-sharing outcome is optimal from society's perspective if the number of asset classes, $M$, is sufficiently large compared to $K$ so that it satisfies

$$2^{\zeta} c_1 M^{\zeta} K^{\zeta} - M\eta c_2\,\alpha^{\zeta} K^{\alpha} - C\eta M^{\zeta-\alpha/2} K^{\zeta} \ge 0,$$

where $C = 2^{\zeta-\alpha/2}(e\zeta\sigma^{2})^{\alpha/2}$.

Using conditioning arguments, it is straightforward to show that the results in this paper continue to hold for (possibly dependent) scale mixtures of normals, with $w_i = V_i Z_i$, where $Z_i$ are i.i.d. normally distributed zero-mean random variables and $V_i$ are (possibly dependent) random variables independent of $Z_i$. This is a rather large class of distributions: it includes, for instance, the Student's t-distributions with arbitrary degrees of freedom (including the Cauchy distribution), the double exponential distribution, the logistic distribution, and all symmetric stable distributions.

The bounds on the number of risk classes $M$ provided by Theorem 3 can be sharpened using extensions and refinements of probability inequality (36) provided by Lemma 1 used in the proof of the theorem. In particular, one can use probability inequalities in terms of higher moments of the independent random variables $\xi$ (see, for instance, Theorem 1.3 and Corollaries 1.7 and 1.8 in Nagaev, 1979).
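The sufficient condition above (and its special case (26) when $\zeta = \alpha$) can be evaluated numerically for candidate values of $M$. The following sketch searches for the smallest integer $M$ with a nonnegative left-hand side. The function names and every parameter value are hypothetical; in particular, the constants $c_1$, $c_2$, and $\sigma^2$ are not calibrated to any distribution in the paper.

```python
import math

def sharing_bound_lhs(M, K, c1, c2, alpha, zeta, sigma2, eta_value):
    """Left-hand side of the generalized sufficient condition of Section 3.3.

    With zeta == alpha, dividing by K**alpha gives the left-hand side of (26).
    """
    C = 2.0 ** (zeta - alpha / 2.0) * (math.e * zeta * sigma2) ** (alpha / 2.0)
    return (2.0 ** zeta * c1 * M ** zeta * K ** zeta
            - M * eta_value * c2 * alpha ** zeta * K ** alpha
            - C * eta_value * M ** (zeta - alpha / 2.0) * K ** zeta)

def smallest_sufficient_M(K, c1, c2, alpha, zeta, sigma2, eta_value, M_max=10_000):
    """Smallest integer M <= M_max with a nonnegative left-hand side, else None."""
    for M in range(1, M_max + 1):
        if sharing_bound_lhs(M, K, c1, c2, alpha, zeta, sigma2, eta_value) >= 0:
            return M
    return None

# Hypothetical inputs: alpha = zeta = 4, c1 = c2 = 1, sigma^2 = 1, eta = 10, K = 1.
m_bound = smallest_sufficient_M(1.0, 1.0, 1.0, 4.0, 4.0, 1.0, 10.0)
```

Since the condition is only sufficient, the value returned here is an upper bound on the number of risk classes needed under the assumed constants, not a break-even point.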
4. Potential implications for risk management and for policy makers

Capital requirements provide the most common mechanism used to control the risk of bank failure. Set at a high enough level, a simple capital-to-asset requirement can achieve any desired level of safety for an individual bank. Such capital requirements, however, impose significant costs on banks by limiting their use of debt tax shields, expanding the problem of debt overhang, and creating agency problems for the shareholders.17 These costs have a negative impact on the overall economy because they reduce the efficiency of financial intermediation.18 In our model, it is clear that increasing capital requirements is an imperfect tool for the regulator, since it cannot be used to specifically target negative externalities of systemic risk. In fact, there is a one-to-one relationship between VaR and capital requirements in our model: increasing the capital requirement is equivalent to decreasing the VaR.

For this reason, new proposals to control systemic risk in the banking sector recommend focused capital requirements based on each bank's specific contribution to the aggregate systemic risk. Acharya (2009), for example, advocates higher capital requirements for banks holding asset portfolios that are highly correlated with the portfolios of other banks. This follows from his model in which banks create systemic risk by choosing correlated portfolios. In a similar spirit, Adrian and Brunnermeier (2009) advocate a "CoVaR" method in which banks face higher capital requirements based on their measured contribution to the aggregate systemic risk.

Focused capital requirements will be efficient in controlling systemic risk, however, only if the source of the systemic risk is properly identified. In particular, the model in this paper creates a symmetric equilibrium in which each of the M banks is responsible for precisely 1/M of the systemic risk. Furthermore, systemic risk in our model arises only as a byproduct (a true negative externality) of each bank's attempt to eliminate its own idiosyncratic risk. The risks that they take on are independent, but the diversification changes the states of the world in which massive default occurs. For this reason, neither VaR constraints, nor the capital requirement plans advocated by Acharya (2009) will be effective in controlling the type of systemic risk that arises in our model. The CoVaR measure in Adrian and Brunnermeier (2009) may also be imperfect, since it does not take tail distributions into account.

Of course, systemic risk in the banking industry may reflect a variety of generating mechanisms, so we are not claiming anything like a monopoly on the proper regulatory response. But it is the case that even focused capital requirements will not be effective if the systemic risk arises because banks hedge their idiosyncratic risk by swapping into a "market" portfolio that is then held in common by all banks. For the case developed in our model, where all banks wish to adopt the same diversified market portfolio, direct prohibitions against specific banking activities or investments in specific asset classes will be more effective than capital requirements as a mechanism to control systemic risk. For example, Volcker (2010) has recently proposed to restrict commercial banking organizations from certain proprietary and more speculative activities.

17 The debt overhang problem arises in recapitalizing a bank because the existing shareholder ownership is diluted while some of the cash inflow benefit accrues as a credit upgrade for the existing bondholders and other bank creditors. The agency problems arise because larger capital ratios provide management greater incentive to carry out risky investments that raise the expected value of compensation but may reduce expected equity returns; for further discussion, see Kashyap, Rajan, and Stein (2008). While the tax shield benefit of debt is valuable for the banking industry, it is not necessarily welfare-enhancing for society.

18 The efficiency costs of capital requirements can be mitigated by setting the requirements in terms of contingent capital in lieu of balance sheet capital. One mechanism is based on bonds that convert to capital if bankruptcy is threatened (Flannery, 2005), but that instrument is not particularly directed to systemic risk. Kashyap, Rajan, and Stein (2008) take the contingent capital idea a step further by requiring banks to purchase "capital insurance" that provides cash to the bank if industry losses, or some comparable aggregate trigger, hit a specified threshold. This mechanism may reduce or eliminate the costs that are otherwise created by bankruptcy, but it does not eliminate the negative externality that creates the systemic risk.
While such prohibitions may seem draconian, they would apply only to activities or asset classes in which moderately heavy tails create a discrepancy between the private and public benefits of diversification. No regulatory action would be needed for asset classes with thin-tailed risks, where the banks and society both benefit from diversification, and for severely heavy-tailed risks, where the banks and society agree that diversification is not beneficial. It is also useful to note that direct prohibitions have long existed in U.S. banking regulation. For one thing, U.S. commercial banks have long been prohibited from investing in equity shares. Even more relevant, the 1933 Glass-Steagall Act forced U.S. commercial banks to divest their investment banking divisions. Subsequent legislation, specifically the 1956 Bank Holding Company Act and the Gramm-Leach-Bliley Act of 1999, provided more flexibility by expanding the range of allowed activities for a bank holding company, although commercial banks are themselves still restricted to a “banking business.” Glass-Steagall thus provides a precedent for direct prohibitions on bank activities as well as an indication that the prohibitions can be changed over time as conditions warrant.
Experience with regulating catastrophe insurance counterparty risk suggests another practical and specific regulatory approach, namely “monoline” requirements. Monoline requirements have long been successfully
R. Ibragimov et al. / Journal of Financial Economics 99 (2011) 333–348
imposed on insurance firms that provide coverage against default by mortgage and municipal bond borrowers; see Jaffee (2006, 2009). The monoline requirements prohibit these insurers from operating as multiline insurers that offer coverage on multiple insurance lines. The monoline restriction eliminates the possibility that large losses on the catastrophe line would bankrupt a multiline insurer, thus creating a cascade of losses for policyholders across its other insurance lines. Such monoline restrictions do create a cost in the form of the lost benefits of diversification, because a monoline insurer is unable to deploy its capital to pay claims against a portfolio of insurance risks. Nevertheless, Ibragimov, Jaffee, and Walden (2008) show that when the benefits of diversification are muted by heavy tails or other distributional features, the social benefits of controlling the systemic risk dominate the lost benefits of diversification. As a specific example, we reference the use of credit default swaps (CDS) purchased by banks and other investors to provide protection against default on the subprime mortgage securities they held in their portfolios. The systemic problem was that the CDS protection was provided by other banks and financial service firms acting as banks, i.e., American International Group, Inc. (AIG), with the effect that a set of large banks ended up holding a very similar, albeit diversified, portfolio of subprime mortgage risks.19 When the risks on the individual underlying mortgages proved to be highly correlated, this portfolio suffered enormous losses, creating the systemic crisis. Capital requirements actually provided the investing banks with incentive to purchase the CDS protection, so that higher capital requirements, per se, do not solve the systemic problem. 
Instead, there must be regulatory recognition that moderately heavy-tailed risk distributions create situations in which the social costs may exceed the private benefits of diversification.

19 It is important to note that AIG wrote its CDS contracts from its Financial Products subsidiary, which was chartered as a savings and loan association and not as an insurance firm. Indeed, AIG also owns a monoline mortgage insurer, United Guaranty, but this subsidiary was not the source of the losses that forced the government bailout.

5. Concluding remarks

The subprime financial crisis has revealed highly significant externalities through which the actions of individual intermediaries may create enormous systemic risks. The model in this paper highlights the differences between risks evaluated by individual intermediaries versus society. We develop a model in which the negative externality arises because actions to diversify that are optimal for individual intermediaries may prove to be suboptimal for society. We show that the distributional properties of the risks are crucial: most importantly, with moderately heavy-tailed risks, the diversification actions of individual intermediaries may be suboptimal for society. We also show that when there is uncertainty about correlations between a large number of thin-tailed risks and intermediaries face value at risk constraints, this is exactly the type of risk portfolios they will choose. Also, the number of distinct asset classes in the economy, the discount rate, and the time to recover after a massive intermediary default influence the value to society of diversification. The optimal outcome from society’s perspective involves less risk-sharing, but also creates a lower probability for massive intermediary collapses.

The policy implications of our model are very direct: banking regulation should be expanded in order to restrict the ability of banks to swap their loan portfolios containing idiosyncratic risk for market portfolios of loan risk. Such direct restrictions appear preferable to VaR constraints, higher capital requirements, and to CoVaR measures when the goal is to deter the massive bank defaults that may arise when banks diversify in this manner. Monoline requirements, which are already applied to insurance firms that offer insurance coverage against default on mortgages and municipal bonds, provide an example of how such direct restrictions can be implemented.

Appendix A. Proofs
Proof of Theorem 1. We first study the problem, given a fixed $N$, and show that there exists a solution to Eqs. (9)–(12). We replace the maximum by the supremum and show that the problem has a finite solution. Of course, $\mathbf{c} = \mathbf{0}$, $k = 0$, is a feasible choice that satisfies Eqs. (10)–(12), immediately leading to a lower bound of 0 for the function under the maximum sign in (9). Now, if there is a finite upper bound on (9), then the supremum problem has a finite solution. Given the definition of $\Sigma$, it is clear that the variance of $\mathbf{c}^T\boldsymbol{\xi}$ is increasing in $\rho$, and since, given $\rho$, the distribution of the portfolio is normal, the value at risk, $\mathrm{VaR}_{1-\beta}(\mathbf{c}^T\boldsymbol{\xi})$, for a given $\mathbf{c}$ is also increasing in $\rho$. Therefore, the value at risk of $\mathbf{c}^T\boldsymbol{\xi}$ is greater than the value at risk for $\mathbf{c}^T\mathbf{y}$, where $\mathbf{y}$ is a vector of i.i.d. normally distributed random variables with mean 0 and variance $\sigma^2$. Define the sets $C_y = \{\mathbf{c} \in \mathbb{R}^N_+ : \mathrm{VaR}_{1-\beta}(\mathbf{c}^T\mathbf{y}) \le K\}$ and $C_\xi = \{\mathbf{c} \in \mathbb{R}^N_+ : \mathrm{VaR}_{1-\beta}(\mathbf{c}^T\boldsymbol{\xi}) \le K\}$. From the previous argument, it follows that $C_\xi \subset C_y$. Now, $C_y$ is defined by the elements $\mathbf{c}$ that satisfy $\sum_i (c_i)^2 \le r$, for some $0 < r < \infty$, and since this is a compact set, and the objective function is continuous, the supremum of the relaxed problem (over $C_y$) is bounded above, which obviously then is also true for the intermediary’s problem. Moreover, it is clear that $C_\xi$ is a closed set, and (since $C_\xi \subset C_y$) thereby compact. By continuity, it follows that the supremum to the intermediary’s problem is achieved. We will show that the p.d.f. of a uniform portfolio, $\mathbf{c} = (1/N)\mathbf{1}$, takes the form (30). Therefore, given that $K > 0$, it is always possible to choose $k = K$, find a strictly positive $\epsilon$, and choose a uniform portfolio $\mathbf{c} = (\epsilon/N)\mathbf{1}$, such that the constraint (12) is satisfied. Clearly, for such a portfolio, (9) is strictly positive, so $\mathbf{c} = \mathbf{0}$ is not a solution to the optimization problem.
Given that there is a strictly positive solution to Eqs. (9)–(12), we now analyze its properties. It is easy to see that (10) must be binding with $k = K$, since the program is homogeneous of degree one in $k$; i.e., assuming that $k < K$ and that $\mathbf{c}$ is a solution with $r = d\,\mathbf{c}^T\mathbf{1}/(1 - \delta P(\mathbf{c}^T\boldsymbol{\xi} > -k)) > 0$, then choosing capital $K$ and portfolio $\mathbf{c}_2 = (K/k)\mathbf{c}$ will give $r' = d\,\mathbf{c}_2^T\mathbf{1}/(1 - \delta P(\mathbf{c}_2^T\boldsymbol{\xi} > -K)) = (K/k)\,d\,\mathbf{c}^T\mathbf{1}/(1 - \delta P(\mathbf{c}^T\boldsymbol{\xi} > -k)) = (K/k)\,r > r$, yielding a contradiction. Thus, property (c) is proved.

We next show that it is always optimal for the intermediary to choose a uniform portfolio, $\mathbf{c} = c\mathbf{1}$ for some $c > 0$. Given a normally distributed risk, $y \sim N(0, \sigma^2)$, and a value at risk requirement of $K_0$ at the $1-\beta$ confidence level, it immediately follows that the maximum size of investment in this risk, $c$, and thereby the maximum premium collected (that does not break the VaR requirement), is
$$c = \frac{K_0}{\Phi^{-1}(1-\beta)}\,\frac{1}{\sigma}, \tag{27}$$
where $\Phi$ is the standard normal c.d.f.: $\Phi(x) = \int_{-\infty}^{x} e^{-t^2/2}\,dt/\sqrt{2\pi}$. From (27) it follows that any VaR-constrained portfolio manager whose compensation is monotonically increasing in scale, $\mathbf{c}^T\mathbf{1}$ (e.g., linear, as assumed), and who chooses among portfolios of multivariate normally distributed risks, will choose the portfolio with the lowest variance, $\sigma^2$. Given $\rho$, standard matrix algebra yields that $\Sigma = \mathrm{cov}(A^{-1}x, A^{-1}x) = (1-\rho^2)A^{-1}A^{-T}$. Also, for large $N$, standard theory of Toeplitz matrices (see, e.g., Gray, 2006) implies that $\Sigma_{ij} \to \hat{\Sigma}_{ij} = \rho^{|i-j|}$. Therefore, property (a) holds.

Conditional on $\rho$, to choose the portfolio with the lowest variance, the manager solves $\min_{\mathbf{q} \in S^n} \mathbf{q}^T\Sigma\mathbf{q}$, where $\Sigma$ depends on $\rho$. We have $\Sigma^{-1} = (1/(1-\rho^2))A^TA$. The solution to the portfolio problem is
$$\mathbf{q} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}} = \frac{A^TA\mathbf{1}}{\mathbf{1}^TA^TA\mathbf{1}} = \frac{s\mathbf{1}}{sN} = \frac{1}{N}\mathbf{1}.$$
Here, the result follows since $\mathbf{1}$ is an eigenvector to $A^TA$ with nonzero eigenvalue, $s$, since $A$ is invertible. The manager thus invests uniformly, which is not surprising since the risks are symmetric. This is the optimal strategy regardless of the correlation, $\rho$, so even with uncertain correlations, it is always optimal to choose the uniform portfolio.

We next study the distribution of the uniform portfolio. We define $\xi_N = (1/\sqrt{N})\,\mathbf{1}_N^T\boldsymbol{\xi}_N$, where $\boldsymbol{\xi}_N$ on the right-hand side contains $N$ elements. The c.d.f. of $\xi_N$ is $F_N(y) = P(\xi_N \le y)$. From the previous argument, it is clear that the agent’s program with $N$ risks simplifies to
$$\max_{c_N}\ \frac{d\,c_N}{1 - \delta F_N(K/c_N)} \quad \text{s.t.} \tag{28}$$
$$F_N\!\left(\frac{K}{c_N}\right) \ge 1-\beta. \tag{29}$$
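As a numerical illustration of (27) and of the uniform-portfolio result, the following Python sketch computes the VaR-binding position size and the minimum-variance weights. It is only a sketch under stated assumptions: the function names are hypothetical, and the matrix $A = I - \rho P$ (with $P$ a cyclic shift) is one convenient choice for which the vector of ones is an eigenvector of $A^TA$; the argument above does not depend on this particular form.

```python
from statistics import NormalDist


def max_position(K0: float, sigma: float, beta: float) -> float:
    """Largest scale c with VaR_{1-beta}(c * y) <= K0 for y ~ N(0, sigma^2), as in (27)."""
    return K0 / (NormalDist().inv_cdf(1.0 - beta) * sigma)


def min_variance_weights(rho: float, n: int) -> list:
    """Weights q = Sigma^{-1} 1 / (1' Sigma^{-1} 1) with Sigma^{-1} = A'A / (1 - rho^2).

    Assumption (illustrative only): A = I - rho * P, where P is a cyclic
    shift, so that the vector of ones is an eigenvector of A'A.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 1.0
        A[i][(i - 1) % n] -= rho
    # Precision matrix Sigma^{-1} = A'A / (1 - rho^2).
    prec = [
        [sum(A[k][i] * A[k][j] for k in range(n)) / (1.0 - rho ** 2) for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in prec]  # Sigma^{-1} 1
    total = sum(row_sums)                  # 1' Sigma^{-1} 1
    return [v / total for v in row_sums]


if __name__ == "__main__":
    print(max_position(K0=1.0, sigma=1.0, beta=0.01))
    print(min_variance_weights(rho=0.3, n=5))
```

Under this assumed structure the weights come out equal for any admissible $\rho$, which is the sense in which the uniform portfolio remains optimal when the correlation is uncertain.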
Conditional on $\rho$, $\xi_N$ is normally distributed with variance $\sigma_N^2(\rho) = ((1-\rho^2)/N)\,\mathbf{1}^TA^{-1}A^{-T}\mathbf{1}$. However, since $A^TA$ is a symmetric positive definite matrix and $\mathbf{1}$ is an eigenvector with corresponding eigenvalue $(1-\rho)^2$, it follows that $\mathbf{1}$ is an eigenvector to $A^{-1}A^{-T}$ with eigenvalue $1/(1-\rho)^2$. Therefore, $\sigma_N^2(\rho) = ((1-\rho^2)/N)\,\mathbf{1}^TA^{-1}A^{-T}\mathbf{1} = ((1-\rho^2)/N)\,\mathbf{1}^T(1/(1-\rho)^2)\mathbf{1} = (1+\rho)/(1-\rho)$. Using (5), we immediately obtain that the p.d.f. of $\xi_N$ is
$$f_0(y) = \int_1^{\infty} \frac{1}{\sqrt{2\pi v}}\,e^{-y^2/(2v)}\,\frac{\gamma}{v^{\gamma+1}}\,dv = \frac{\gamma\,2^{\gamma}\left[\Gamma\!\left(\gamma+\tfrac{1}{2}\right) - \Gamma\!\left(\gamma+\tfrac{1}{2},\,\tfrac{y^2}{2}\right)\right]}{\sqrt{\pi}\,|y|^{1+2\gamma}}, \tag{30}$$
where $\Gamma(a, x)$ denotes the upper incomplete gamma function. It is easy to show that $f_0(y)$ is decreasing in $y \in (0,\infty)$, that is, $f_0(y)$ is unimodal and symmetric (about 0): this conclusion may be obtained directly or by noting that $f_0(y)$ is a scale mixture of normal distributions and conditioning arguments (see, for instance, the proof of Theorems 5.1 and 5.2 in Ibragimov, 2009b).

We study the optimization problem (28) and (29), and rewrite it as
$$\max_z\ \frac{Kd}{z}\,\frac{1}{1-\delta F(z)} \quad \text{s.t.} \tag{31}$$
$$F(z) \ge 1-\beta, \tag{32}$$
where $F(y) = \int_{-\infty}^{y} f_0(z)\,dz$. The constraint that $\beta$ is close to zero now implies that the minimal feasible $z$, such that (12) is satisfied, is large. Clearly, $(Kd/z)(1/(1-\delta F(z)))$ is positive for all $z$. We note that $(Kd/z)(1/(1-\delta F(z)))$ becomes arbitrarily large as $z \to 0$, so if there was no VaR constraint, it would be optimal to take on an arbitrarily large amount of risk. The derivative with respect to $z$ is
$$\frac{d}{dz}\left[\frac{Kd}{z}\,\frac{1}{1-\delta F(z)}\right] = \frac{Kd}{z^2(1-\delta F(z))^2}\,\bigl(\delta F(z) - 1 + z\,\delta f_0(z)\bigr).$$
Now, since $F(z) \le 1$, $\delta < 1$ and $z f_0(z) \to 0$ for large $z$, $\delta F(z) - 1 + z\,\delta f_0(z) \le \delta - 1 + z f_0(z) < 0$ for large $z$. Therefore, $d[(Kd/z)(1/(1-\delta F(z)))]/dz < 0$ for large $z$. The feasible set for $z$ is $[\underline{z}, \infty)$, for some $\underline{z} > 0$, and the VaR constraint is binding if $z = \underline{z}$ is chosen. When $\beta$ is close to zero, $\underline{z}$ is large and therefore $d[(Kd/z)(1/(1-\delta F(z)))]/dz < 0$ for all feasible $z$. Therefore, the optimum is reached at $\underline{z}$ and the VaR constraint, (32), is indeed binding. We note that this argument does not depend on the distribution, so the VaR constraint will also be binding when we study the diversified case. We have shown property (d). This binding VaR constraint corresponds to $c = K/F^{-1}(1-\beta)$. Since there is a unique optimum, the same $c_N \equiv c$ in (28) will be chosen regardless of $N$, which via (30) immediately implies property (b) (with $b = 1/c$).
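The scale-mixture density in (30) can be sanity-checked numerically. The sketch below, which is illustrative rather than part of the proof, integrates the normal density against the Pareto-type mixing density $\gamma v^{-(\gamma+1)}$ on $[1,\infty)$ and compares the result with a closed form written in terms of the lower incomplete gamma function, $\Gamma(a) - \Gamma(a, x)$. The function names, the midpoint rule, and the series used for the incomplete gamma function are choices made here for the example.

```python
import math


def f0_integral(y: float, gamma: float, steps: int = 20000) -> float:
    """Mixture density: integral over v in [1, inf) of the N(0, v) pdf times gamma * v**-(gamma + 1).

    The substitution t = 1/v maps the range to (0, 1]; a midpoint rule is applied there.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (gamma - 0.5) * math.exp(-0.5 * y * y * t)
    return gamma / math.sqrt(2.0 * math.pi) * total * h


def lower_incomplete_gamma(a: float, x: float, terms: int = 200) -> float:
    """Series x**a * exp(-x) * sum_{n>=0} x**n / (a (a+1) ... (a+n))."""
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return x ** a * math.exp(-x) * total


def f0_closed_form(y: float, gamma: float) -> float:
    """Closed form of the mixture density for y != 0."""
    a = gamma + 0.5
    numerator = gamma * 2.0 ** gamma * lower_incomplete_gamma(a, 0.5 * y * y)
    return numerator / (math.sqrt(math.pi) * abs(y) ** (1.0 + 2.0 * gamma))


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.5):
        print(y, f0_integral(y, 1.5), f0_closed_form(y, 1.5))
```

For large $|y|$ the incomplete gamma term approaches $\Gamma(\gamma + 1/2)$, so the density decays like $|y|^{-(1+2\gamma)}$, consistent with the tail index $\alpha = 2\gamma$ used later in the appendix.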
Finally, it is easy to check that for $\gamma > 1$, $\mathrm{Var}(\xi_N) = \int_{-\infty}^{\infty} y^2 f_0(y)\,dy = \gamma/(\gamma-1)$, which immediately leads to property (e). We are done. ∎

Proof of Theorem 2. Define the tail probabilities $F(x) = P(\xi_1 > x)$ and $F_M(K) = P(\sum_{m=1}^{M}\xi_m/M > K)$. From Feller (1970, pp. 275–279), it follows that since $\xi_1$ is of Pareto-type with $F_1(K) = ((1+o(1))/K^{\alpha})\,\ell(K)$, as $K \to +\infty$, for some slowly varying function, $\ell(K)$,
$$F_M(K) = M\,\frac{1+o(1)}{(MK)^{\alpha}}\,\ell(MK), \tag{33}$$
as $K \to +\infty$. For risk-sharing to be optimal from an intermediary’s perspective, (18) is therefore equivalent to
$$M\,\frac{1+o(1)}{(MK)^{\alpha}}\,\ell(MK) < \frac{1+o(1)}{K^{\alpha}}\,\ell(K),$$
which implies
$$M^{1-\alpha} < 1 + o(1),$$
as $K \to +\infty$, which, for large $K$, holds if and only if $\alpha > 1$. We have proved the first part of the theorem.

For the second part, since $q = 1 - F(K)$ and $q_M = 1 - F_M(K)$, (21) and (33) together imply that risk-sharing from society’s perspective is optimal if and only if
$$(1 + \delta - \delta F_M(K))(1 + \lambda F(K)^M) > (1 + \delta - \delta F(K))(1 + \lambda F_M(K)),$$
as $K \to +\infty$, which, in turn, is satisfied if and only if, for large $K$,
$$\delta F(K) > \delta F_M(K) + \lambda(1+\delta)F_M(K) = \delta\eta F_M(K).$$
This is equivalent to
$$\frac{1+o(1)}{K^{\alpha}}\,\ell(K) > \eta M\,\frac{1+o(1)}{(MK)^{\alpha}}\,\ell(MK),$$
as $K \to +\infty$, which is satisfied if and only if
$$M^{\alpha-1} > \eta. \tag{34}$$
Since $\eta \ge 1$, (34) is never satisfied for $M > 1$ and $\alpha < 1$, so risk-sharing is never optimal from society’s perspective when $\alpha < 1$. For $\alpha > 1$, on the other hand, (34) is satisfied if and only if
$$M > \eta^{1/(\alpha-1)}. \tag{35}$$
We have proved the second part of the theorem. ∎

Proof of Theorem 3. Define $F_M(K) = P((\sum_{m=1}^{M}\xi_m/M) > K)$. We first show that $\eta F_M(K) < F(K)$ is a sufficient condition for (21). We have
$$\begin{aligned}
\eta F_M < F &\Leftrightarrow \delta F_M + (1+\delta)\lambda F_M < \delta F \\
&\Rightarrow 1 + \delta - \delta F + (1+\delta)\lambda F_M - \delta F\lambda F_M < 1 + \delta - \delta F_M \\
&\Leftrightarrow (1 + \delta - \delta F)(1 + \lambda F_M) < 1 + \delta - \delta F_M \\
&\Leftrightarrow 1 + \delta - \delta F < \frac{1 + \delta - \delta F_M}{1 + \lambda F_M} \\
&\Leftrightarrow 1 + \delta q < \frac{1 + \delta q_M}{1 + \lambda(1 - q_M)} \\
&\Rightarrow \frac{1 + \delta q}{1 + \lambda(1-q)^M} < \frac{1 + \delta q_M}{1 + \lambda(1 - q_M)}.
\end{aligned}$$
To show that $\eta F_M(K) < F(K)$, we use the following lemma.

Lemma 1 (Nagaev, 1979, Corollary 1.11). For independent random variables $X_i$, with $E[X_i] = 0$ and $\sum_{i=1}^{M} E[X_i^2] = B_M$, it is the case that
$$P\!\left(\sum_{i=1}^{M} X_i > x\right) \le \sum_{i=1}^{M} P(X_i > y) + e^{x/y}\left(\frac{B_M}{xy}\right)^{x/y}, \quad x, y > 0. \tag{36}$$
Using (36) for the $\xi$-risks, for which $B_M = M\sigma^2$, by choosing $x = MK$ and $y = 2MK/\alpha$, we get
$$F_M(K) \le M F\!\left(\frac{2MK}{\alpha}\right) + e^{\alpha/2}\sigma^{\alpha}\left(\frac{2M}{\alpha}\right)^{-\alpha/2} K^{-\alpha}.$$
From (25), it then follows that, for $K > K_0$,
$$F_M(K) \le M c_2\,\alpha^{\alpha}(2MK)^{-\alpha} + e^{\alpha/2}\sigma^{\alpha}\left(\frac{2M}{\alpha}\right)^{-\alpha/2} K^{-\alpha},$$
and since $F(K) \ge c_1 K^{-\alpha}$, it follows that the condition
$$\eta\left(M c_2\,\alpha^{\alpha}(2MK)^{-\alpha} + e^{\alpha/2}\sigma^{\alpha}\left(\frac{2M}{\alpha}\right)^{-\alpha/2} K^{-\alpha}\right) \le c_1 K^{-\alpha} \tag{37}$$
implies $\eta F_M(K) < F(K)$. Now, by multiplying (37) by $(2MK)^{\alpha}$, we get
$$M c_2\,\eta\,\alpha^{\alpha} + \eta\,e^{\alpha/2}\sigma^{\alpha}(2M\alpha)^{\alpha/2} \le c_1 2^{\alpha}M^{\alpha},$$
i.e.,
$$c_1 2^{\alpha}M^{\alpha} - C\eta M^{\alpha/2} - M\eta c_2\,\alpha^{\alpha} \ge 0.$$
We are done. ∎

List of variables

$M$: number of risk classes
$N$: number of risks within each risk class
$\xi^{t,m}_n$: individual risk $n$, within risk class $m$ at time $t$
$\rho$: correlation between risks within risk class (for large $N$)
$\boldsymbol{\xi}$: vector of individual risks in first risk class
$\mathbf{c}$: vector of investments of intermediary (in first risk class)
$c$: total size of risk investment, $c = \mathbf{c}^T\mathbf{1}$
$\xi$, $\xi_i$: risk portfolio invested in by intermediary (in risk class $i$), $\xi = \mathbf{c}^T\boldsymbol{\xi}$
$\delta$: one period discount factor
$K$: capital reserved by intermediary
$\beta$: value at risk at $1-\beta$ confidence level may not be higher than $K$, $\mathrm{VaR}_{1-\beta}(\xi) \le K$
$Q$: outcome of intermediary’s option to default
$M^*$: diversification threshold, above which society prefers risk-sharing
$T$: recovery time after massive default
$d$: premium intermediary collects per unit risk of investment
$\gamma$: parameter governing uncertainty of correlations between risks in same risk class
$\alpha$: tail distribution of risk class ($\xi$), $\alpha = 2\gamma$
$f$: probability distribution function of portfolio risk, $\xi$
$F$: cumulative loss distribution of portfolio risk, $F(x) = P(\xi > x)$
$q$: chance for individual intermediary of not defaulting when risks are separated, $q = 1 - F(K)$
$F_M$: cumulative loss distribution of fully diversified portfolio, $F_M(x) = P(\sum_i \xi_i/M > x)$
$q_M$: chance for intermediaries of not defaulting when risks are shared, $q_M = 1 - F_M(K)$
$\lambda$: society’s trade off between risk-sharing and separated outcomes (Eq. (22))
$\eta$: $\eta = M^{\alpha-1}$, $\alpha > 1$
VB: value of one intermediary if separated
VS: value of one intermediary if risk-sharing
VBM: value to society if separated
VSM: value to society if risk-sharing
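Using the symbols listed above, conditions (34) and (35) from the proof of Theorem 2 can be evaluated directly. The following Python sketch is a minimal illustration: the function names are hypothetical, and $\eta$ is taken as a given input (assumed to be at least 1, as in the proof) rather than derived from the other model parameters.

```python
def diversification_threshold(eta: float, alpha: float) -> float:
    """Threshold eta**(1 / (alpha - 1)) from (35); only defined for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the threshold is only defined for alpha > 1")
    if eta < 1.0:
        raise ValueError("eta is assumed to be at least 1")
    return eta ** (1.0 / (alpha - 1.0))


def society_prefers_sharing(M: int, eta: float, alpha: float) -> bool:
    """Condition (34): M**(alpha - 1) > eta."""
    return M ** (alpha - 1.0) > eta


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the comparison.
    print(diversification_threshold(eta=4.0, alpha=3.0))
    print(society_prefers_sharing(M=3, eta=4.0, alpha=3.0))
    print(society_prefers_sharing(M=5, eta=1.0, alpha=0.5))
```

With $\alpha < 1$ and $M > 1$ the left-hand side of (34) is below one, so the check returns False for any admissible $\eta$, in line with the statement that risk-sharing is never optimal for society in that case.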
References

Acharya, V., 2009. A theory of systemic risk and design of prudent bank regulation. Journal of Financial Stability 5, 224–255.
Adrian, T., Brunnermeier, M., 2009. CoVaR. Unpublished working paper. Princeton University.
Allen, F., Gale, D., 2000. Financial contagion. Journal of Political Economy 108, 1–33.
Andrews, D.W.K., 2005. Cross-section regression with common shocks. Econometrica 73, 1551–1585.
Bernanke, B.S., 1983. Nonmonetary effects of the financial crisis in the propagation of the great depression. American Economic Review 73, 257–276.
Brunnermeier, M., 2009. Deciphering the liquidity and credit crunch 2007–2008. Journal of Economic Perspectives 23, 77–100.
Brunnermeier, M., Pedersen, L.H., 2009. Market liquidity and funding liquidity. Review of Financial Studies 22, 2201–2238.
Caballero, R., Krishnamurthy, A., 2008. Collective risk management in a flight to quality episode. Journal of Finance 63, 2195–2230.
Conley, T.G., 1999. GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.
Diamond, D., Dybvig, P., 1983. Bank runs, deposit insurance, and liquidity. Journal of Political Economy 91, 401–419.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events for Insurance and Finance. Springer, New York.
Fama, E., 1965. Portfolio analysis in a stable Paretian market. Management Science 11, 404–419.
Feller, W., 1970. An Introduction to Probability Theory and its Applications, second ed., vol. 2. Wiley, New York.
Flannery, M., 2005. No pain, no gain: effecting market discipline via ‘reverse convertible debentures’. In: Scott, H. (Ed.), Capital Adequacy Beyond Basel: Banking, Securities, and Insurance. Oxford University Press, Oxford, pp. 171–197 (Chapter 5).
Froot, K.A., Scharfstein, D., Stein, J., 1993. Risk management: coordinating corporate investment and financing policies. Journal of Finance 48, 1629–1658.
Froot, K.A., Stein, J., 1998. Risk management, capital budgeting, and capital structure policy for financial institutions: an integrated approach. Journal of Financial Economics 47, 55–82.
Gabaix, X., 2009. Power laws in economics and finance. Annual Review of Economics 1, 255–293.
Gabaix, X., Gopikrishnan, P., Plerou, V., Stanley, H.E., 2006. Institutional investors and stock market volatility. Quarterly Journal of Economics 121, 461–504.
Gordy, M.B., 2000. Credit VaR and risk-bucket capital rules: a reconciliation. Federal Reserve Bank of Chicago May, 406–417.
Gordy, M.B., 2003. A risk-factor based model foundation for ratings-based bank capital rules. Journal of Financial Intermediation 12, 199–232.
Gray, R.M., 2006. Toeplitz and circulant matrices: a review. Foundations and Trends in Communications and Information Theory 2, 155–239.
Guillaume, D., Dacorogna, M., Davé, M., Müller, U., Olsen, R., Pictet, O., 1997. From the bird’s eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets. Finance and Stochastics 1, 95–129.
Horn, R.A., Johnson, C.R., 1990. Matrix Analysis. Cambridge University Press, Cambridge (Corrected reprint of the 1985 original).
Ibragimov, R., 2009a. Heavy-tailed densities. In: Durlauf, S.N., Blume, L.E. (Eds.), The New Palgrave Dictionary of Economics Online. Palgrave Macmillan.
Ibragimov, R., 2009b. Portfolio diversification and value at risk under thick-tailedness. Quantitative Finance 9, 565–580.
Ibragimov, R., Jaffee, D., Walden, J., 2008. Insurance equilibrium with monoline and multiline insurers. Unpublished working paper, University of California at Berkeley.
Ibragimov, R., Jaffee, D., Walden, J., 2009. Nondiversification traps in catastrophe insurance markets. The Review of Financial Studies 22, 959–993.
Ibragimov, R., Jaffee, D., Walden, J., 2010. Pricing and capital allocation for multiline insurance firms. Journal of Risk and Insurance 77 (3), 551–578.
Ibragimov, R., Walden, J., 2007. The limits of diversification when losses may be large. Journal of Banking and Finance 31, 2551–2569.
Jaffee, D., 2006. Monoline restrictions, with applications to mortgage insurance and title insurance. Review of Industrial Organization 28, 88–108.
Jaffee, D., 2009. Monoline regulations to control the systemic risk created by investment banks and GSEs. The B.E. Journal of Economic Analysis & Policy 9. Available at: http://www.bepress.com/bejeap/vol9/iss3/art17.
Jansen, D.W., de Vries, C.G., 1991. On the frequency of large stock returns: putting booms and busts into perspective. Review of Economics and Statistics 73, 18–32.
Kashyap, A., Rajan, R., Stein, J., 2008. Rethinking capital regulation. Presented at Federal Reserve Bank of Kansas City Symposium. Available at: http://www.kc.frb.org/publicat/sympos/2008/KashyapRajanStein.03.12.09.pdf.
Kodres, L., Pritsker, M., 2002. A rational expectations model of financial contagion. Journal of Finance 57, 769–799.
Kyle, A., Xiong, W., 2001. Contagion as a wealth effect. Journal of Finance 56, 1401–1440.
Loretan, M., Phillips, P.C.B., 1994. Testing the covariance stationarity of heavy-tailed time series. Journal of Empirical Finance 1, 211–248.
Lux, T., 1998. The socio-economic dynamics of speculative markets: interacting agents, chaos and the fat tails of return distributions. Journal of Economic Behavior & Organization 33, 143–165.
Mandelbrot, B., 1997. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk. Springer-Verlag, New York.
Nagaev, S.V., 1979. Large deviations of sums of independent random variables. Annals of Probability 7, 745–789.
Rachev, S.T., Menn, C., Fabozzi, F.J., 2005. Fat-tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing. Wiley, Hoboken, NJ.
Rochet, J.C., Tirole, J., 1996. Interbank lending and systemic risk. Journal of Money, Credit, and Banking 28, 762–773.
Ross, S.A., 1976. A note on a paradox in portfolio theory. Unpublished Mimeo, University of Pennsylvania.
Samuelson, P.A., 1967. General proof that diversification pays. The Journal of Financial and Quantitative Analysis 2, 1–13.
Shaffer, S., 1994. Pooling intensifies joint failure risk. Research in Financial Services 6, 249–280.
Volcker, P.A., 2010. Statement for the committee on banking, housing, and urban affairs of the United States Senate, February 2, 2010. Available at: http://banking.senate.gov/public/index.cfm?FuseAction=Files.View&FileStore_id=ec787c56-dbd2-4498-bbbd-ddd23b58c1c4.
Wagner, W., 2010. Diversification at financial institutions and systemic crises. Journal of Financial Intermediation 19, 373–386.
Journal of Financial Economics 99 (2011) 349–364
Frequent issuers’ influence on long-run post-issuance returns☆
Matthew T. Billett a, Mark J. Flannery b,*, Jon A. Garfinkel a
a Henry B. Tippie College of Business, University of Iowa, United States
b Warrington College of Business, University of Florida, United States
Article info
Article history: Received 16 July 2009; received in revised form 10 February 2010; accepted 28 April 2010; available online 29 September 2010
JEL classification: G14; G32
Keywords: Security issuance; Long-run performance

Abstract
Prior studies conclude that firms’ equity underperforms following many individual sorts of external financing. These conclusions naturally raise significant questions about market efficiency and/or about the techniques used to measure long-run “abnormal returns.” Rather than concentrating on a single security type or issuance, we examine long-run performance following any and all sorts of security issuances. Initial financing events do not associate with underperformance; however, subsequent financings do. Our results suggest that negative post-issuance returns have nothing to do with the specific type of security issued, and everything to do with the number of types of securities issued. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

A substantial literature concludes that a firm’s decision to raise external funds is followed by negative long-run abnormal stock returns. Published results include an estimated −5.4% mean annual abnormal return in the five years following a seasoned equity offering (Spiess and Affleck-Graves, 1995), −3.0% per year following public debt issues (Spiess and Affleck-Graves, 1999), −5% per year following a bank loan (Billett, Flannery, and Garfinkel, 2006), and −8.7% following a private equity placement (Hertzel, Lemmon, Linck, and Rees, 2002).1
☆ We thank Stas Nikolova and Brandon Lockhart for research assistance, and Jay Ritter for IPO data. We thank Charlie Hadlock, seminar participants at Florida International University, Iowa State University, Massey University, Victoria University Wellington and University of Auckland, and especially an anonymous referee for helpful comments. Remaining errors are our own.
* Corresponding author. Tel.: +1 352 392 3184; fax: +1 352 392 0301.
E-mail addresses: [email protected] (M.T. Billett), flannery@ufl.edu (M.J. Flannery), jon-garfi [email protected] (J.A. Garfinkel).
1 Ritter (2003) provides a nice summary.
doi:10.1016/j.jfineco.2010.09.009
Initial public offerings (IPOs) were also followed by severe underperformance [nearly 9% per year for three years, according to Ritter (1991)], although this effect has disappeared from the more recent data (Ritter, 2003). These studies span most forms of external finance, including both public and private debt and public and private equity. Some researchers argue that overvaluation and market inefficiency may explain this phenomenon: if firms tend to issue securities when outsiders are inappropriately bullish on the firm, shares inevitably underperform. On the other hand, Fama (1998) concludes that the performance models generating these conclusions are flawed. Here, we investigate a third possibility. Existing studies evaluate a single type of external claim issuance without controlling for the sample firms’ other financing activities. For example, if a firm both issues seasoned equity and borrows from a bank within the analysis window, a researcher studying seasoned equity issues would fail to observe the bank loan while a researcher studying bank loans would not observe the seasoned-equity offering (SEO). The same firm thus affects both studies’ conclusions, and a relatively small number of serial-issuers may
M.T. Billett et al. / Journal of Financial Economics 99 (2011) 349–364
disproportionately influence the conclusions from several studies of individual security types. Moreover, additional financing events may reflect special features of the issuing firms, not the issuance of external claims, per se. Previous ‘‘security-specific’’ studies thus potentially suffer from an omitted variable problem because firms returning repeatedly to the market may be quite different from those that seek external finance infrequently. Indeed, we find that a subsample of frequent issuers causes a large amount of the underperformance following security issuances. To isolate the effect of these frequent issuers, we evaluate firms’ long-run equity performance following five types of external financing events investigated in the prior literature: IPOs, SEOs, public debt issues (PD), bank loans (BL), and private equity issues (PVEQ). Unlike previous studies, we control for both the issuing frequency and the number of claim types issued. We pay special attention to firm-event months that likely occur in multiple studies—firms that issue two or more different types of securities within a three-year period. We use three distinct methodologies to compute expected long-run stock returns. First, we estimate Fama–MacBeth (1973) regressions for each (monthly) cross-section of realized returns, controlling for ex ante firm characteristics and securities issuance. Because some firm characteristics have been shown to predict security returns (Fama and French, 2008), we control for a wide variety of firm characteristics in our regressions assessing whether security issuers suffer negative long-run stock returns. Second, we assess the long-run returns to security issuers using the three-factor model of Fama and French (1993), augmented by Carhart’s (1997) momentum factor. Finally, we identify a variety of ‘‘peer’’ firms for each issuing firm and evaluate the buy-and-hold abnormal returns (BHARs) associated with various types of securities issuance. 
All three methodologies yield similar conclusions. We make several discoveries. First, multiple-type security issuances are not terribly rare events. This makes the omitted variable problem potentially important for previous studies of security issuance. Using a 36-month post-financing window, multiple-type issuers account for 34.3% of the firm-months following security issuance. In other words, a non-trivial fraction of economically important post-issuance firm-months have been overlooked by other studies. Second, significant equity underperformance does not follow the issuance of any single security type when the regression controls for multiple issuances and ex ante firm characteristics. Indeed, public debt issuance is followed by small, positive abnormal returns: 19 basis points (bps) monthly (t = 1.89). In other words, our results indicate that external finance is not bad, per se. Finally, substantial underperformance follows the issuance of multiple security types. For example, a firm issuing three different security types (say IPO, bank loan, and SEO) within a 36-month window significantly underperforms by 42 bps per month (4.9% annually) over the subsequent three years. Four different security type issuances within 36 months elicit monthly underperformance of 153 bps (16.9% per year).
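The monthly and annual figures quoted above are related by compounding. The short Python sketch below assumes that the annual numbers are obtained by compounding a constant monthly shortfall over 12 months; under that assumption it reproduces both reported annual rates. The function name is hypothetical.

```python
def annualized_underperformance(monthly_bps: float) -> float:
    """Compound a constant monthly shortfall (in basis points) over 12 months.

    Returns the annual shortfall as a positive fraction.
    """
    monthly = monthly_bps / 10_000.0
    return 1.0 - (1.0 - monthly) ** 12


if __name__ == "__main__":
    for bps in (42, 153):
        print(bps, round(100.0 * annualized_underperformance(bps), 1))
```

Simply multiplying the monthly figure by 12 would overstate the annual shortfall, since each month's loss applies to an already reduced base.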
The remainder of the paper is organized as follows: Section 2 describes our data. We explain our variables for describing a firm’s external financing activity in Section 3. Section 4 investigates the association between securities issuance and firm characteristics, where we see that firms issuing multiple types of securities exhibit different ex ante characteristics than other firms. Section 5 describes our long-run performance measurement techniques. Section 6 presents results on the relationship between financing and stock returns. The final section concludes.

2. Data

Our base sample begins with firms listed on both the Center for Research in Security Prices (CRSP) and Compustat. We include all firm-months for U.S. firms, excluding financials and utilities, with valid CRSP returns and positive book equity on Compustat at the preceding fiscal year-end. The resulting panel includes 1,007,902 firm-month observations between January 1983 and December 2005. We augment this basic CRSP/Compustat sample with data about five distinct types of security issuances during the period 1980–2005. Securities Data Corporation’s (SDC) new issues database provides information about seasoned equity offerings (SEO), private equity (PVEQ) offerings, and public debt offerings (PD).2 Jay Ritter graciously provided access to his IPO database. We obtain a sample of bank loans (BL) from two sources. We begin with data from Billett, Flannery, and Garfinkel (1995), who collected bank loan announcements using a keyword search of news stories during the calendar years 1980 through 1989. The sample includes 1,468 announced loan agreements between nonfinancial borrowers and bank or nonbank lenders. We augment this sample with 16,686 additional loans contained in the Loan Pricing Corporation (LPC) database from 1988 through 2005.3 We include all IPOs in our final sample, regardless of their size.
M.T. Billett et al. / Journal of Financial Economics 99 (2011) 349–364

For other security types, we omit issuances that raised less than 5% of the prior fiscal year-end’s market value of equity. This restriction is consistent with the prior literature examining long-run performance following external financing events. We aggregate all “same-vehicle” financings (e.g., all SEOs) in a month to ascertain whether that issue-month meets the 5% threshold. Although our securities issuance data begin in 1980, we begin our analysis of post-issuance returns in January of 1983 to ensure that we have a complete three-year financing history. For example, an unobserved bank loan in 1979 might influence some of the 36 monthly returns following a 1980 SEO. Correspondingly, we end our returns analysis in 2005 because this is the last full year for which we have complete financing data.⁴ We measure stock returns using CRSP’s monthly returns from January 1983 through December 2005 (276 months). Given the documented influence of various firm characteristics on realized returns (see, for example, Fama and French, 2008; Cooper, Gulen, and Schill, 2008) and the fact that we find significant differences in the characteristics of the firms that engage in multiple financings, we include a variety of firm variables as controls in our tests. These variables are defined in Section 4 below.

² In our reported results, convertible debt issues are classified with other forms of “straight” public debt, but classifying convertible debt as equity yields similar overall results.
³ LPC’s DealScan distinguishes between a loan “facility” and a loan “deal,” which may include multiple facilities. Each of our events is a “deal.” Furthermore, some of these loan agreements may be negotiated with non-bank lenders. For brevity, we refer to all these transactions as “bank loans” (BL).
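The 5% size screen described in Section 2 can be sketched as follows. This is an illustration only: the function name, the tuple layout of `issues`, and the use of a single prior market value per firm (rather than one per fiscal year) are simplifying assumptions, not the authors’ code.

```python
from collections import defaultdict


def qualifying_issue_months(issues, prior_market_equity, threshold=0.05):
    """Return (firm, month, security_type) issue-months that pass the size screen.

    issues: iterable of (firm, month, security_type, proceeds) tuples (hypothetical layout).
    prior_market_equity: dict mapping firm -> prior fiscal year-end market value of equity.
    """
    # Aggregate all same-vehicle financings (e.g., all SEOs) within a firm-month.
    totals = defaultdict(float)
    for firm, month, sec_type, proceeds in issues:
        totals[(firm, month, sec_type)] += proceeds

    kept = []
    for (firm, month, sec_type), total in sorted(totals.items()):
        # IPOs are retained regardless of size; other types must raise at least 5%.
        if sec_type == "IPO" or total >= threshold * prior_market_equity[firm]:
            kept.append((firm, month, sec_type))
    return kept
```

Aggregating before applying the threshold matters: two small SEOs in the same month can jointly clear the 5% cutoff even though neither does alone.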
Fig. 1. The ‘‘fixed-length window’’ approach to defining issuance events. The timeline below runs from 36 months prior to the financing month through 48 months following the event month t. XF1 and XF2 are dummy variables. XF1 equals one over the 36 months following the first financing event and zero otherwise. XF2 equals one over the 36 months following the second financing event which occurs in month t + 12. All ‘‘tick-marks’’ on the time line denote the end of the month.
3. Measuring external financing patterns

3.1. Fixed-length windows

The null hypothesis in all financing event studies is that an issuing firm’s equity returns are not unusual in the months following its issuance event. We therefore construct dummy variables to identify the months following financing events (the “post-event window”). These dummies are designed to pick up the effect on returns of financing events. With isolated security issuances, these dummy variables are straightforward to construct. We define five separate dummy variables (BL, IPO, SEO, PD, and PVEQ) equal to unity for the 36 months following issuance of the indicated type of security. These dummies allow us to replicate the results from prior studies of post-issuance returns. To identify firm-months related to multiple security issuances, we construct additional dummy variables indicating the number of types of securities issued and multiple issuances of the same type.⁵

We employ two alternative methods to specify how the months following multiple security issuances might affect post-issuance returns. Our “fixed-length window” defines the post-financing window to be the 36 months following the financing, regardless of whether other financing occurs within that time period. For our “variable-length window,” the post-financing window extends from the month following the financing until the sooner of 36 months or the occurrence of a subsequent financing event. Each approach offers a way to control for the overlap between two (or three or four) different financings’ post-event 36-month windows. The fixed-length window is conducive to measuring the effect of subsequent financings on returns in a Fama/MacBeth methodology. The variable-length window is conducive to measuring these effects using the Fama/French and BHAR methods. We report results based on both approaches, which yield similar conclusions.
⁴ In other words, if we study an SEO in 2005 and wish to measure returns into 2008, we risk not attributing some of those returns to a bank loan that occurs in 2006, which we have not observed.
⁵ Although combining different types of financing into a smaller set of variables may conceal some relevant information, identifying all possible financing combinations would be very unwieldy. We did explore certain combinations and orderings (such as switches between debt and equity or between public and private issuances), but found no obvious distinction from our multiple “number of security types” categorizations.
Fig. 1 illustrates the financing history for a firm that issues two types of securities, for example a bank loan during month t and an SEO during month t+12. A fixed-length event window defines the post-event period to be the 36 months following a specific financing event, regardless of what additional financing events occur during that window. In this case, we define XF1 (short for eXternal Finance event #1) to equal unity for each of the 36 months following a firm’s issuance of any single security type, provided that there was no other different type of external financing during the preceding 36 months. As shown in Fig. 1, XF1 equals unity for the interval [t+1, t+36]. A second security type issued in month t+12 makes XF2 (short for eXternal Financing event #2) equal to unity for each of the next 36 months, [t+13, t+48]. XF2 thus indicates “a second type of security was issued within 36 months of the first type.” XF3 and XF4 are defined analogously.⁶

The above variables account for the issuance of multiple security types. To account for repeat issuances of the same security type, we define Repeat equal to unity for each of the 36 months following a firm’s second (third, etc.) issuance of the same security type, provided that (1) no different security type was issued in between, and (2) the second issuance was within 36 months of the first.⁷ For example, if the second security issuance in Fig. 1 were also a bank loan, Repeat would equal unity for the interval [t+13, t+48].

Defining fixed-length event windows has the advantage that all windows cover the same interval of equity returns, which conforms to the literature on post-financing event performance. Also, the fixed-length window allows us to see the economic effects of multi-type financing through the coefficients on dummies in the Fama/MacBeth regression tests, which simultaneously control for many firm characteristics known to influence ex post returns.
⁶ If a third (fourth) different type of external finance was issued in month t+15 (t+20), XF3 (XF4) would equal one between t+16 and t+51 (t+21 and t+56). There are no instances in our data of five different types of external finance issued by a firm within 36 months.
⁷ Prior studies differ in their treatment of multiple issuances of the same security type: some authors include all issuances while others include only the first or last transaction within their measurement window.
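The fixed-length dummies of Section 3.1 can be illustrated with a short sketch. The function name and event layout are hypothetical, and the rule used to pick the dummy (the number of distinct security types issued within the trailing 36 months, counting the current issue) is an assumption that reproduces the Fig. 1 and footnote 6 examples; it is not a statement of the authors’ implementation, and mixed repeat patterns are handled only approximately.

```python
WINDOW = 36  # length of the post-event window in months


def fixed_length_dummies(events):
    """Map each dummy name to the set of months in which it equals one.

    events: list of (month, security_type) pairs for a single firm.
    """
    flags = {name: set() for name in ("XF1", "XF2", "XF3", "XF4", "Repeat")}
    ordered = sorted(events)
    for i, (month, sec_type) in enumerate(ordered):
        # Security types issued within the 36 months before this event.
        prior_types = {s for m, s in ordered[:i] if month - m <= WINDOW}
        if prior_types == {sec_type}:
            # Same type again, with no different type in between.
            name = "Repeat"
        else:
            name = f"XF{len(prior_types | {sec_type})}"
        # The dummy is switched on for the 36 months after the event month.
        flags.setdefault(name, set()).update(range(month + 1, month + WINDOW + 1))
    return flags


# Fig. 1 pattern with t = 0: a bank loan in month 0 and an SEO in month 12.
fig1 = fixed_length_dummies([(0, "BL"), (12, "SEO")])
```

Note that in months 13 through 36 both XF1 and XF2 equal one, which is the overlap that the variable-length definition is designed to remove.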
On the other hand, this measurement scheme suffers from two related disadvantages. First, the indirect effect of the first financing (at t) may last quite a long time. In Fig. 1, the BL is specified to affect returns for 48 months (given its effect carries to t+48 via XF2). Had the bank loan preceded the SEO by a longer time (up to 35 months), the direct and indirect effects of the BL could have been specified to last up to 71 months.⁸ Second, defining fixed-length event windows implies that a subsequent event’s effect on equity returns is the same whether the second financing event was one month or 36 months after the first one. Yet an immediate return to capital markets seems to imply different conditions than a return after nearly three years.

Econometrically, fixed-length windows can also complicate interpretation of some estimated coefficients. For example, the total effect of financing during the interval [t+13, t+36] equals the sum of the coefficients on XF1 and XF2. Another concern with the fixed-length window is how to implement it for the portfolios required to test abnormal returns using the factor-based and buy-and-hold return calculations. In a given month, one firm might belong to two (or more) portfolios, as in the months between t+12 and t+36 in Fig. 1. Given these concerns, we also explore definitions of financing events based on a variable-length window approach.

3.2. Variable-length windows

We alternatively define dummy variables using a variable-length window from the month following a financing event to the earlier of either the month of the next financing event or 36 months. This variable-length window directly removes the effect of overlapping months (i.e., months that are within 36 months of multiple financing events) from the initial financing window and attributes the effect of these overlapping months to the subsequent financing window. This dummy variable definition is illustrated in Fig. 2, which is based on the same financing pattern as in Fig. 1: a BL at time t and an SEO at t+12. Between t+1 and t+12 (inclusive), the firm had only one sort of external financing within the past 36 months, so XF1 = 1 and all other dummy variables equal zero. Starting at the end of t+12, the firm had two different financing events within the past 36 months, so we set the dummy variable for this pattern (XF2) equal to unity and XF1 = 0. In other words, the 36-month windows following the two different financing events “overlap” for 24 months starting at t+13. At the end of month t+36, the bank borrowing date passes out of the trailing period. For the subsequent 12 months [t+37, t+48], the firm is again categorized as having only one type of financing during the prior 36-month period, so again XF1 = 1.⁹

Fig. 2. The “variable-length window” approach to defining issuance events. The timeline below runs from 36 months prior to the financing month through 48 months following the event month t. XF1 and XF2 are dummy variables. XF1 equals one over the 12 months following the first financing event and ending at the month of the second financing event. XF2 then equals one over the 24 months following the month of the second financing event (t+12). After 36 months from the first financing event, XF2 reverts to zero and XF1 becomes 1. All “tick-marks” on the time line denote the end of the month.

With a variable-length window definition, no event affects abnormal returns for more than 36 months, even indirectly. Moreover, a firm has only one financing dummy turned “on” at any point in time. The estimated coefficient on XF1 therefore measures the ex post return effect of a single type of financing event within the preceding 36 months. XF2 measures the effect of two different financing types during the window over which their post-event periods overlap.

In sum, the fixed-length and the variable-length definitions of return effects each offer some advantages. Fixed-length windows facilitate comparison with prior studies of security issuance, and provide a clear picture of the economic effect of subsequent financing. However, the indirect effect of the first of several types of security issuances can be protracted. The variable-length window approach limits all financing event effects to 36 months, and it categorizes each firm-month with a unique financing dummy, as required by the factor-based and BHAR methods. However, it reduces our comparability with previous studies, which all use a fixed-length window. Fortunately, the implications are very similar for both approaches.

3.3. Financing event statistics

Table 1 describes the incidence of different financing events. Panel A describes the number of different types of financings for the entire sample of firm-months. More than half of the firm-months (55.58%) are associated with no external financing activity within the preceding three years. The remaining 44.42% of firm-months break down as follows: 24.25% are associated with a single financing event and 4.94% follow serial issues (two or more) of the same type of security. The next three rows in Panel A indicate that 15.22% of all firm-months follow the issuance of more than one security type within a 36-month period.¹⁰ Put another way, more than one-third of all the post-financing months (15.22% out of 44.42%) follow multiple financing types, indicating that prior single-security studies of financing events have omitted potentially important information for a substantial portion of their sample.

⁸ The direct effect would have been from [t, t+36], and the indirect effect from [t+36, t+71].
⁹ If third and fourth different types of external finance were issued in months t+15 and t+20, the dummies would take the following forms: XF1 = 1 in [t+1, t+12], [t+52, t+56]; XF2 = 1 in [t+13, t+15], [t+49, t+51]; XF3 = 1 in [t+16, t+20], [t+37, t+48]; XF4 = 1 in [t+21, t+36].
¹⁰ Only a small fraction of firm-months following external financings are associated with either three (XF3 = 1) or four (XF4 = 1) different types of finance. However, we shall see below that these events have large economic effects on computed ex post returns.
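The variable-length classification of Section 3.2 can be sketched as a count of the distinct security types issued during the trailing 36 months. The function names below are hypothetical, and the sketch deliberately ignores repeat issuances of the same type (the Repeat dummy); it is an illustration that reproduces the Fig. 2 and footnote 9 patterns, not the authors’ implementation.

```python
WINDOW = 36  # trailing window length in months


def variable_length_state(events, month):
    """Number of distinct security types issued in months [month-36, month-1].

    events: list of (event_month, security_type) pairs for a single firm.
    A return value of k means XFk = 1 and every other XF dummy is zero.
    """
    return len({s for m, s in events if 1 <= month - m <= WINDOW})


def months_in_state(events, k, last_month):
    """List the months from 1 to last_month in which exactly k types are active."""
    return [m for m in range(1, last_month + 1) if variable_length_state(events, m) == k]


# Footnote 9 pattern with t = 0: four different types in months 0, 12, 15, and 20.
events = [(0, "BL"), (12, "SEO"), (15, "PD"), (20, "PVEQ")]
xf4_months = months_in_state(events, 4, 60)
```

Because the state is a single count, each firm-month falls into exactly one category, which is the property the portfolio-based tests require.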
Table 1
Incidence of different forms of financing.
Percent of firm-months with dummy variable = 1 for “number of different types of external finance.” Dummy variables are defined based on the fixed-length window definition as follows: No external financing equals one in all months that are bereft of any external financing within the prior 36 months. Dummy for single-category financing (XF1) equals one in each of 36 months following the first of any sequence of (one or more) different-type external financings. Two types (XF2) of external finance (dummy) equals one in each of 36 months following the second of any sequence of two or more different-type external financings. Three types (XF3) of external finance (dummy) equals one in each of 36 months following the third of any sequence of three or more different-type external financings. Four types (XF4) of external finance (dummy) equals one in each of 36 months following the fourth in the sequence of four different-type external financings.

Panel A: Entire sample

Category                                               No. firmsᵃ  % of total firm-months  Number of firm-months
No external financing                                       3,494                   55.58                560,221
Single-category financing (XF1 = 1)                         3,715                   24.25                244,457
Two or more similar securities (Repeat = 1)                   377                    4.94                 49,774
Two different types of external finance (XF2 = 1)           3,309                   13.58                136,912
Three different types of external finance (XF3 = 1)           535                    1.57                 15,835
Four different types of external finance (XF4 = 1)             28                    0.07                    703
Total                                                      11,458                     100              1,007,902

Panel B: Security types of subsample using external finance

Category   No. firmsᵃ  % of total firm-months  Number of firm-months
BL              3,906                   49.02                195,037
IPO             5,150                   27.32                108,692
SEO             2,942                   29.22                116,249
PD              1,215                   15.44                 61,426
PVEQ              206                    1.63                  6,473
Total                                  100.00                487,877

Panel C: Overlap among multiple-type security issuers

Category                                                % of firm-months  Number of firm-months
Overlapping months within XF2 = 1                                  73.00                 99,945
  Overlap of BL, IPO                                               10.87                 14,882
  Overlap of BL, SEO                                               21.61                 29,582
  Overlap of BL, PD                                                14.38                 19,684
  Overlap of BL, PVEQ                                               0.69                    945
  Overlap of IPO, SEO                                              11.47                 15,709
  Overlap of IPO, PD                                                2.51                  3,443
  Overlap of IPO, PVEQ                                              0.29                    402
  Overlap of PD, SEO                                                9.67                 13,238
  Overlap of PD, PVEQ                                               0.50                    683
  Overlap of SEO, PVEQ                                              1.01                  1,377
Three different types of external finance (XF3 = 1)                65.17                 10,320
  Overlap of BL, IPO, PD                                            4.01                    635
  Overlap of BL, IPO, SEO                                          19.70                  3,119
  Overlap of BL, IPO, PVEQ                                          0.28                     44
  Overlap of BL, PVEQ, PD                                           0.87                    138
  Overlap of BL, PVEQ, SEO                                          1.77                    281
  Overlap of BL, PD, SEO                                           30.64                  4,852
  Overlap of IPO, PD, PVEQ                                          0.09                     15
  Overlap of IPO, PD, SEO                                           6.23                    987
  Overlap of IPO, SEO, PVEQ                                         0.50                     79
  Overlap of PD, SEO, PVEQ                                          1.07                    170
Four different types of external finance (XF4 = 1)                 45.09                    317
  Overlap of IPO, SEO, PD, PVEQ                                     0.00                      0
  Overlap of BL, SEO, PD, PVEQ                                      3.70                     26
  Overlap of BL, IPO, SEO, PVEQ                                     2.42                     17
  Overlap of BL, IPO, PD, PVEQ                                      0.71                      5
  Overlap of BL, IPO, SEO, PD                                      38.26                    269

ᵃ Firm total exceeds sample population because some firms issued multiple securities within a 36-month window.
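The headline shares discussed in Section 3.3 follow directly from the Panel A firm-month counts. A minimal sketch of that arithmetic is shown below; the dictionary keys are hypothetical labels, while the counts are those reported in Table 1.

```python
# Firm-month counts from Table 1, Panel A.
firm_months = {
    "none": 560_221,
    "XF1": 244_457,
    "Repeat": 49_774,
    "XF2": 136_912,
    "XF3": 15_835,
    "XF4": 703,
}

total = sum(firm_months.values())
post_financing = total - firm_months["none"]
multi_type = firm_months["XF2"] + firm_months["XF3"] + firm_months["XF4"]

pct_post_financing = 100 * post_financing / total      # share of all firm-months
pct_multi_of_all = 100 * multi_type / total            # share of all firm-months
pct_multi_of_post = 100 * multi_type / post_financing  # share of post-financing months
```

The last ratio is the "more than one-third" figure cited in the text.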
Panel B examines the distribution of financing events across security types. Bank loans (BL) account for almost half of the firm-months in our post-financing sample. IPOs and SEOs account for 27% and 29%, respectively. Public debt issuance is associated with 15%, and the remaining 1.6% is attributable to private equity.

Panel C provides further information about the potential importance of multiple financings for previous, single-security studies. Of the 136,912 firm-months where XF2 = 1, 99,945 (73%) occurred within 36 months of an initial financing event. A single-security study would not have controlled for the second issuance in these months. Similarly, 65% and 45% of the firm-months associated with XF3 = 1 and XF4 = 1, respectively, overlap with the initial issue’s 36-month post-financing window.

4. Financing events and firm characteristics

In assessing the long-run return effect of securities issuance, we need to control for firm characteristics that prior literature has shown to affect returns, but that may also be correlated with security issuances.¹¹ While most studies generally control for size and book-to-market (B/M), recent work also finds that growth, financial distress, earnings management, and other characteristics are associated with future long-run returns (see the more detailed discussion below). We therefore begin our analysis by assessing the extent to which a firm’s ex ante characteristics correlate with its subsequent securities issuance. We rely heavily on Fama and French (2008) to identify firm characteristics that have been linked to abnormal long-run equity returns. We divide the Fama–French firm characteristics (and a few additional characteristics) into three groups: growth/investment, financial condition, and traditional firm characteristics.¹² Our methodological approach is described in greater detail below (Section 4.4), but we summarize it here. For each individual firm characteristic, we regress that characteristic on dummy variables identifying the subsequent three years’ financing behavior. The coefficients on these dummies illustrate whether future financing behavior is tied to current firm characteristics.

4.1. Traditional characteristics

Many previous studies have concluded that stock returns are reliably affected by:

Size: The natural log of the firm’s equity market value (Compustat [data199 × data25]).

B/M: Book-to-market equity ratio: book value of equity (Compustat [data60]) divided by its market value (Compustat [data199 × data25]).

Momentum: The cumulative raw return on the firm’s stock over the 12 months of the firm’s preceding fiscal year. Returns are from CRSP (see Jegadeesh and Titman, 1993; Chopra, Lakonishok, and Ritter, 1992).

¹¹ For example, security issuers may suffer from managerial tendencies to overinvest or they might more commonly issue overvalued securities.
¹² Fama and French’s (2008) seven “anomalies” (size, value, profitability, growth, accruals, momentum, and net stock issues) all “seem to have unique information about future returns” (p. 1675). We include all of these variables as controls except for net stock issues, for which we control via our financing dummy variables. All characteristics for the overall sample of both issuers and non-issuers are winsorized at the 1st and 99th percentiles.
4.2. Growth and investment characteristics

Cooper, Gulen, and Schill (2008) conclude that asset growth is negatively related to subsequent equity returns. Titman, Wei, and Xie (2004) show that firms with surprisingly large capital expenditures subsequently underperform, consistent with their hypothesis that agency problems permit some managers to “empire-build” (see also Pontiff and Woodgate, 2008; Richardson and Sloan, 2003). Lower stock returns might also follow investments that constitute exercise of a real (growth) option: converting the option into a physical project delevers the firm, which naturally lowers the expected stock return (Carlson, Fisher, and Giammarino, 2006). Eberhart, Maxwell, and Siddique (2004) take a complementary view of investment by arguing that research and development (R&D) spending generates growth options whose higher effective leverage causes the observed positive abnormal returns following R&D expansions. To investigate whether a firm’s investment behavior is correlated with its subsequent financing strategies, we collect the following firm growth and investment characteristics:

TA_g: Lagged growth in total assets, defined as Compustat [data6(t−1) − data6(t−2)]/data6(t−2). This is exactly the calculation approach in Cooper, Gulen, and Schill (2008).

CAPEX: Capital expenditures divided by total assets, defined as Compustat [data128/data6]. CAPEX is a component of Cooper, Gulen, and Schill’s (2008) aggregate growth measure. Although they conclude that the total asset growth variable is more informative than any of its components, we include it due to the findings of Titman, Wei, and Xie (2004).

CAPEX_g: The forward constructed percentage change in the ratio of capital expenditures to assets, defined as CAPEX(t+1)/CAPEX(t) − 1. Note the timing of this variable is unique in that it is measured over the year following the fiscal year in question (year t+1). It is designed to pick up the de-levering of a growth option, in line with Carlson, Fisher, and Giammarino (2006). This will make it an important control in returns tests. Thus, we examine its link with financing here.

R&D: Expenditures on research and development divided by total assets (Compustat [data46/data6]). Missing data46 values are set to zero.

Q: Tobin’s Q, defined as total assets minus book equity plus market value of equity, all divided by total assets (Compustat [data6 − data60 + (data25 × data199)]/data6).
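The growth and investment definitions above translate into simple ratios. The sketch below restates three of them in code; the function and argument names are hypothetical stand-ins for the Compustat items named in the text, and no data handling (missing values, winsorization) is attempted.

```python
def asset_growth(assets_lag1, assets_lag2):
    """TA_g: [data6(t-1) - data6(t-2)] / data6(t-2)."""
    return (assets_lag1 - assets_lag2) / assets_lag2


def capex_growth(capex_ratio_next, capex_ratio_now):
    """CAPEX_g: CAPEX(t+1) / CAPEX(t) - 1, where CAPEX is capital expenditures over assets."""
    return capex_ratio_next / capex_ratio_now - 1


def tobins_q(assets, book_equity, shares, price):
    """Q: (total assets - book equity + market value of equity) / total assets."""
    market_equity = shares * price  # data25 x data199
    return (assets - book_equity + market_equity) / assets


# Hypothetical firm: assets grow from 100 to 120, CAPEX/assets rises from 5% to 6%.
example_ta_g = asset_growth(120.0, 100.0)
example_capex_g = capex_growth(0.06, 0.05)
example_q = tobins_q(assets=100.0, book_equity=40.0, shares=10.0, price=8.0)
```

Note that CAPEX_g is forward-looking by construction, so it uses the ratio from year t+1 in the numerator.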
4.3. Financial condition characteristics

Some firms returning to external capital markets to issue a variety of security types may be financially distressed, which tends to predict lower subsequent equity returns. One measure of financial distress is the firm’s Z-score (Denis and Mihov, 2003; Altman, 1977). High leverage, low cash flow, and low cash holdings are also potential indicators of financial distress. Discretionary accruals have been shown to explain anomalous post-issuance returns for IPOs and SEOs (Teoh, Welch, and Wong, 1998a, 1998b).¹³ We represent potential financial distress with the following five variables:

Cash: Cash and marketable securities divided by total assets (Compustat [data1/data6]).

Leverage: Debt in current liabilities plus long-term debt, all divided by total assets (Compustat [data34 + data9]/data6).

Low Z: An indicator variable equal to unity if the firm’s Z-score is less than 1.81, which is a critical value for predicting failure.

Accruals: Discretionary accruals calculated using the modified Jones (1991) model of Dechow, Sloan, and Sweeney (1995).

OIBD: Operating income before depreciation divided by total assets (Compustat [data13/data6]).
4.4. Results

We regress each of the above fiscal-year-end characteristics on dummy variables describing the firm’s external financing events over the subsequent 36 months¹⁴:

$$Z_{jt} = a_0 + \sum_{k=1}^{4} b_k \, \mathit{XF\_bk}_{j}.$$

Z_jt is any one of firm j’s characteristics listed above, measured at the end of any fiscal year t.¹⁵ The XF_bk dummies are similar to the fixed-length XF dummies, but we attach a “_b” to reflect the following difference: they measure the total number (k) of different security types of external financings that occur over the 36 months following the end of year t. It is a simple count, and either one, two, three, or four different financings can occur within 36 months of the characteristic date.

Table 2 presents the results. The dependent variables in Panel A are the firm’s industry-adjusted characteristics (net of the two-digit SIC code median characteristic). Panel B presents regression results for the unadjusted firm characteristics. We discuss primarily the results from Panel A, although the results in Panel B are basically consistent.

Columns 1–5 report coefficient estimates (b_k) for growth-related variables. Cooper, Gulen, and Schill (2008) find that a firm’s asset growth is negatively correlated with its subsequent stock returns. For the asset growth measure, TA_g, the coefficients on the future financing dummy variables are all negative and significant. This suggests that prior to financing, asset growth is abnormally low. Given that high asset growth has been shown to have a negative relation to future returns and that the issuers of multiple types of securities have lower TA_g, this asset growth channel seems unlikely to explain the underperformance of multiple issuers. Despite their low rate of asset growth, multiple issuers’ capital expenditures are not correspondingly low. In fact, CAPEX is significantly greater for firms that subsequently issue multiple security types. (The forward growth in CAPEX (CAPEX_g) is unrelated to subsequent financing.) Interestingly, when we look at investment opportunities, proxied by Q, we find that future financing activity is associated with lower ex ante Q, raising the possibility that multiple-type issuing firms were overinvesting (Titman, Wei, and Xie, 2004). The fifth column of Table 2 examines another sort of investment, R&D expenditures. Single-type issuers (XF_b1 = 1) exhibit greater R&D expense than non-issuers, but multiple claim-type issuers (XF_b2 = 1, XF_b3 = 1, XF_b4 = 1) spend less on R&D. As we move from two to four issue types, the coefficients become ever more negative, suggesting that R&D is less important for the multiple claim-type financing firms. Given that Eberhart, Maxwell, and Siddique (2004) find high returns following large R&D, these low levels of R&D could be associated with lower future returns.

We next examine indicators of the firm’s financial condition in columns 6–10. For the Cash specification, the ratio of cash-to-assets decreases as the diversity of future external finance activity increases, perhaps suggesting that low internal funds partially motivate the future issuances. Leverage is increasing in future external finance activity, consistent with a need to deleverage and/or a higher likelihood of financial distress. The Low Z tests are only conducted for Panel B, given that the variable is constructed as a dummy (Z < 1.81). It seems the multi-type issuers are more likely to be distressed than the single-type issuers. Accruals are, if anything, lower for firms that subsequently issue multiple securities, suggesting that they may have exhausted their ability to enhance reported income through discretionary accruals. Multi-type issuers have significantly higher cash flows (OIBD), suggesting a greater ability to at least meet debtholders’ subsequent cash-flow requirements.

¹³ Some firms use discretionary accounting accruals to enhance their reported earnings. Eventually, however, the firm runs out of positive accruals and reported income subsequently falls.
¹⁴ Each regression is a panel regression adjusted with Rogers’ standard errors to account for the residual dependence created by a firm-specific effect (see Petersen, 2009).
¹⁵ The size and B/M variables require some timing assumptions to link the CRSP and Compustat data. We follow Fama and French (1992) in calculating the ex ante size as CRSP’s market value of equity in June of year t, where returns are from July of year t through June of year t+1. For book value of equity, we use Compustat’s fiscal year-end book equity [data60], and we ensure that it precedes the monthly stock return by at least six months (Fama and French, 1992). We scale that book equity by market equity from December of year t−1 (Fama and French, 1992). For IPO transactions, we have no “ex ante” market value. We therefore measure firm size for IPO financings as the firm’s market value at the close of the first day of trading. Also for IPO firms, book-to-market equity uses the first available Compustat measure of book equity, which may either precede or follow the IPO date.
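Because the XF_b dummies in Section 4.4 are a simple count and therefore mutually exclusive, the OLS point estimates in such a regression reduce to differences in group means: the intercept is the mean characteristic of firm-years with no subsequent financing, and each coefficient is the gap between a group’s mean and that baseline. The sketch below illustrates this property with a hypothetical function and toy data; it omits the clustered standard errors and the industry adjustment used in the paper.

```python
from statistics import mean


def dummy_regression(values, type_counts):
    """OLS point estimates for a characteristic regressed on mutually exclusive dummies.

    values: characteristic observations (one per firm-year).
    type_counts: number of different security types issued over the next 36 months (0 to 4).
    Returns (intercept, {k: coefficient on the dummy for count k}).
    """
    groups = {}
    for value, count in zip(values, type_counts):
        groups.setdefault(count, []).append(value)
    intercept = mean(groups[0])  # baseline: no subsequent external financing
    coefficients = {k: mean(v) - intercept for k, v in sorted(groups.items()) if k != 0}
    return intercept, coefficients


# Toy data (hypothetical): two non-issuers, two single-type issuers, one two-type issuer.
a0, b = dummy_regression([1, 3, 5, 7, 10], [0, 0, 1, 1, 2])
```

Reading the coefficients this way makes the interpretation of Table 2 direct: each entry compares a group of future issuers with firms that do not issue.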
Panel A: Dependent variable = firm characteristics, relative to industry median values

Growth indicators
           TA_g      CAPEX_g   CAPEX     Q         R&D
           (1)       (2)       (3)       (4)       (5)
Mean       0.0861    0.1907    0.0185    0.5871    0.0235
Std dev    0.9498    1.5676    0.0837    4.3199    0.1666
Intercept  0.1015c   0.1875c   0.0153c   0.6281c   0.0231c
XF_b1      0.0531c   0.0111    0.0110c   0.1433c   0.0031b
XF_b2      0.1107c   0.0151    0.0208c   0.3658c   0.0017
XF_b3      0.0757a   0.0639    0.0267c   0.4124c   0.0171c
XF_b4      0.5025c   0.2434    0.0601c   0.6941c   0.0245c

Financial condition indicators
           Cash      Leverage  Low Z (logit)  Accruals  OIBD
           (6)       (7)       (8)            (9)       (10)
Mean       0.0602    0.0320    N/A            0.0382    0.0485
Std dev    0.1952    0.1850    N/A            5.0761    0.4741
Intercept  0.0685c   0.0226c   N/A            0.0372a   0.0594c
XF_b1      0.0296c   0.0311c   N/A            0.0114    0.0441c
XF_b2      0.0548c   0.0696c   N/A            0.0291    0.0463b
XF_b3      0.0596c   0.0950c   N/A            0.0242    0.0702c
XF_b4      0.0904c   0.1980c   N/A            0.0777b   0.0514b

Firm characteristics
           B/M       Size      Momentum
           (11)      (12)      (13)
Mean       0.5059    0.0842    0.1319
Std dev    45.6636   2.0455    0.8732
Intercept  0.6296c   0.1344c   0.1120c
XF_b1      0.5355b   0.8738c   0.0734c
XF_b2      0.6222c   1.4327c   0.1271c
XF_b3      0.6372c   1.6259c   0.1516c
XF_b4      0.5722b   0.7449    0.0050
Panel B: Dependent variable= raw firm characteristics, with no industry adjustment 1 2 3 4 5
6
7
8
9
10
11
12
13
Mean Std dev Intercept XF_b1 XF_b2 XF_b3 XF_b4
0.1717 0.2142 0.1840c 0.0425c 0.0854c 0.1013c 0.1264c
0.2255 0.1956 0.2118c 0.0439c 0.1065c 0.1496c 0.2462c
0.1520 0.3591 1.6617c 0.3307c 0.0087 0.4213c 0.5675
0.2223 0.5597 0.2031c 0.0637c 0.1417c 0.1946c 0.4598c
0.6864 0.7030 0.7303c 0.1663c 0.2436c 0.2717c 0.4481c
3.9131 2.5244 3.8082c 0.4201c 0.5271c 0.3029a 2.3564c
0.0160 0.7017 0.0326c 0.0725c 0.0563c 0.0139 0.4578c
0.1630 1.5760 0.1584c 0.0161a 0.0253 0.0743 0.2646
CAPEX 3
0.0740 0.0820 0.0703c 0.0123c 0.0265c 0.0390c 0.0756c
Q 4
1.7879 1.7010 1.8392c 0.1574c 0.4345c 0.5689c 1.1227c
R&D 5
Firm characteristics
Cash 6
0.1868 0.9641 0.2020c 0.0523c 0.1112c 0.0768a 0.5136c
CAPEX_g 2
Financial condition indicators
0.0441 0.0931 0.0445c 0.0015 0.0142c 0.0296c 0.0433c
0.0529 0.2433 0.0405c 0.0470c 0.0710c 0.0697c 0.0768c
Table 2 Firm characteristics preceding external financing dummies (36-month window). We regress firms’ fiscal-year-end characteristics on dummy variables describing external financing events over the subsequent 36 months (similar to the fixed-length window definition): XF_bk = unity when the firm issues k types of security over the subsequent 36 months; where k= 1, 2, 3, or 4. Panel A expresses each firm characteristic net of the industry (two-digit SIC code) median value. Panel B uses raw firm characteristics. Firm characteristics are winsorized at the 1st and 99th percentiles. a, b, c indicate significance at 10%, 5%, 1% levels. CAPEX, R&D expenditures, and Cash are all relative to total assets. TA_g is Cooper, Gulen, and Schill’s (2008) measure of asset growth. CAPEX_g is the percentage increase in the ratio of CAPEX-to-assets from the prior year. Tobin’s Q is market-to-book assets. Leverage is long- plus short-term debt divided by assets. OIBD is operating income before depreciation scaled by assets. Size is the natural log of the market value of equity. Momentum is cumulative stock return over the preceding fiscal year. Low Z is a dummy equal to one if the Z-score is less than 1.81 (Denis and Mihov, 2003; Altman, 1977). B/M is book-to-market equity. Accruals are discretionary accruals calculated using the modified Jones method (see Jones, 1991; Dechow, Sloan, and Sweeney, 1995).
Columns 11–13 in Table 2 indicate how borrowing firms fit on the scale of three common return predictors: firm value (B/M), Size, and Momentum, which Fama and French (2008) conclude have positive, negative, and positive effects (respectively) on subsequent returns. The conclusion that issuing firms start with significantly lower B/M values indicates that these firms should experience lower subsequent returns, ceteris paribus. Single and multi-issuers’ larger size should also lead to lower returns. Offsetting at least some of these effects is the tendency for multiple issuers to have relatively large stock price runups (as seen in the Momentum column). In sum, numerous statistically significant differences exist in the characteristics of single- versus multi-issuers. Because many of these characteristics have been reported to associate with future returns, we control for all of these characteristics in two of our three types of post-financing return tests.16
5. Measuring long-run performance

The literature on measuring long-run stock performance following corporate events is extensive, primarily because accurately measuring “normal” expected returns over long periods of time has proven to be extremely challenging. We present results based on three methodologies for measuring “normal” long-run returns. Two of these methodologies derive from models of the underlying returns: the Fama–MacBeth (1973) method, and the Fama–French (1993) method augmented with Carhart’s (1997) momentum factor. Given the “bad model” critique of long-run returns (Fama, 1998), we also compute buy-and-hold abnormal returns (BHARs) to assess robustness.
5.1. Fama–MacBeth (1973) methodology

Daniel and Titman (1997) argue that security returns reflect firm characteristics, specifically size and the book-to-market ratio of equity. In this view, abnormal returns manifest themselves as non-zero realized returns after controlling for firm characteristics.17 For each month between January 1983 and December 2005, we estimate a Fama–MacBeth (1973) regression of the form18

$$(r_{jt} - \mathit{VWRETD}_t) = \alpha_0 + \sum_{k=1}^{4} \alpha_k (\mathit{XF}_{jkt}) + \gamma (\mathit{Repeat}_{jt}) + \sum \beta_J Z_{j,t-1} + \tilde{\varepsilon}_{jt}, \qquad (1)$$
16 We control only for size, book-to-market equity, and momentum in our Fama/French factor portfolio tests.
17 Daniel and Titman (1997) find that firms with similar characteristics but different loadings on the Fama and French (1993) factors exhibit similar returns, although Davis, Fama, and French (2000) contradict that evidence.
18 Petersen (2009) shows that the Fama–MacBeth methodology works well when regression residuals in a given time period are correlated across firms.
where $r_{jt}$ is the return to stock j in month t, measured in percentage points. $\mathit{VWRETD}_t$ is the return to the CRSP value-weighted index for month t, measured in percentage points. $\mathit{XF}_{jkt}$ is the set of external finance dummy variables defined above in Section 3. A dummy equals one if in month t, the jth firm had the kth pattern of external financing within the past relevant window; $\mathit{XF}_{jkt} = 0$ otherwise. $\mathit{Repeat}_{jt}$ is a dummy equal to unity for each of the 36 months following a firm’s second (third, etc.) issuance of the same security type, provided that (1) no different security type was issued in between, and (2) the first issuance was within 36 months of the second. $Z_{j,t-1}$ is a vector of the dependent variables in Table 2, which prior research has associated with future share returns. We measure these variables as of the fiscal year-end prior to the month.

Estimated coefficients on the issuance dummy variables ($\mathit{XF}_{jkt}$ and $\mathit{Repeat}_{jt}$) measure the average contribution to market-adjusted returns during month t, across all firms for which the dummy variable was turned on. We then report the time-series average of the coefficients in (1), and t-statistics computed using the time-series standard deviation of coefficient estimates.

5.2. Fama–French (1993) methodology

Fama and French (1993) model equity returns as depending on the firm’s exposure to non-diversifiable factor realizations, such as the market risk premium, the differential return to small vs. large firms, and the differential return to firms with high vs. low book-to-market ratios. Carhart (1997) shows that momentum provides an additional, significant factor. We use this four-factor model of returns to compute abnormal returns associated with securities issuance. In each month, we form a portfolio of firms with similar recent financing patterns.
We use the variable-length post-event window to determine the values for the external financing variables XF1, XF2, XF3, XF4, and Repeat.19 Specifically, the portfolios are formed for each of the XFk = 1 (where k = 1, 2, 3, 4) and for Repeat = 1. We then regress the time series of each portfolio’s monthly excess returns on the four return factors:

$$(R_{pt} - R_{ft}) = a + b(\mathit{VWRETD}_t - R_{ft}) + s\,\mathit{SMB}_t + h\,\mathit{HML}_t + m\,\mathit{MOM}_t + e_t, \qquad (2)$$

where $R_{pt}$ is the return on the portfolio of sample firms in month t; $R_{ft}$ is the three-month T-bill yield in month t; $\mathit{VWRETD}_t$ is the return on the value-weighted index of NYSE, Amex, and Nasdaq stocks in month t; $\mathit{SMB}_t$ is the return on small firms minus the return on large firms in month t; $\mathit{HML}_t$ is the return on high book-to-market stocks minus the return on low book-to-market stocks in month t; and $\mathit{MOM}_t$ is Carhart’s (1997) momentum factor realization for month t. A significant intercept term in (2) implies that abnormal returns are associated with the event used to assemble the portfolio.

19 We cannot use fixed-window dummy variables, which often assign multi-issuing firms to more than one portfolio in the same month. The variable-length window controls for subsequent financing behavior by excluding the months associated with the “next” financing event in the “current” financing event’s return window.

5.3. Buy-and-hold abnormal return (BHAR) methodology

Starting with Ritter (1991), many authors have used peer-adjusted, buy-and-hold abnormal returns (BHARs) to measure long-run performance effects. For each security-issuing firm, a matching peer firm is chosen on the basis of a set of firm characteristics with the notable exception that the peer did not issue securities. Each individual firm’s subsequent holding period return is then calculated as:

$$\mathit{HPR}_j = \left( \prod_{t=1}^{T_j} (1 + R_{jt}) - 1 \right) \times 100\%,$$
where $R_{jt}$ is the jth firm’s stock return on the tth day, and $T_j$ is the number of trading months in the variable-length (up to three-year) window. We use the variable-length window because we cannot include a dummy control for subsequent financings (as we need to do with a fixed-length window) when we are not running a cross-sectional regression. After calculating HPR for each sample firm and for its matching firm, the difference measures the stylized investor’s buy-and-hold abnormal return (BHAR):

$$\mathit{BHAR}_j = \mathit{HPR}_j^{\mathit{Event}} - \mathit{HPR}_j^{\mathit{Peer}}$$

A positive mean return differential is consistent with the “Event” having a positive effect on the typical event firm’s long-run returns.

The value of this approach depends on the quality of its matching process. At one level, the concept that a second firm is “otherwise equivalent” to an issuing firm seems oxymoronic: if two firms are so similar, why did only one raise external funds? Yet Barber and Lyon (1997) report that BHARs based on peer firms with similar market capitalization and equity’s book-to-market ratio perform well in randomized samples. Lyon, Barber, and Tsai (1999) point out that BHAR test statistics may be biased if peer firms are not matched on the basis of all relevant characteristics (such as industry or pre-event returns). They suggest using a variety of alternative peer-choice criteria, to protect against inadvertent conclusions based on excluded, clustered firm characteristics.

Despite the potential shortcomings, an advantage of BHARs is that they do not rely on a specific model of security returns, obviating concerns about a “bad model problem.” We therefore compute BHAR returns for a variety of peer definitions. Specifically, we identify a peer firm for each issuer based on size, B/M, and one other firm characteristic from among those listed in Table 2.
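The holding-period return and BHAR definitions above can be illustrated with a minimal Python sketch. The function names and the return series are hypothetical and chosen only for illustration; the sketch assumes simple per-period returns expressed as decimals and that the event firm and its peer are compounded over the same window.

```python
import numpy as np

def holding_period_return(returns):
    """Compound simple per-period returns (decimals) into a
    buy-and-hold return expressed in percent."""
    r = np.asarray(returns, dtype=float)
    return (np.prod(1.0 + r) - 1.0) * 100.0

def bhar(event_returns, peer_returns):
    """Buy-and-hold abnormal return: event-firm HPR minus peer-firm HPR."""
    return holding_period_return(event_returns) - holding_period_return(peer_returns)

# Hypothetical three-period return paths (illustration only, not sample data).
event = [0.02, -0.01, 0.03]
peer = [0.01, 0.00, 0.01]

print(round(holding_period_return(event), 4))  # event-firm HPR in percent
print(round(holding_period_return(peer), 4))   # peer-firm HPR in percent
print(round(bhar(event, peer), 4))             # BHAR in percentage points
```

Because returns are compounded rather than summed, the BHAR reflects what a buy-and-hold investor would actually have earned relative to holding the peer firm.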
For each issuing firm, we examine all non-issuing firms in the same size decile of the CRSP-Compustat universe and keep those with an equity market value within 25% of the issuer’s.20 We then sort these firms by their book-to-market equity ratio and the third matching characteristic (from among the dependent variables in Table 2). We examined all firms in the same decile of each of these two characteristics, and chose the one with the lowest sum of absolute percentage differences in size, B/M, and the third characteristic. For some events, our requirement that all three firm characteristics be in the same population decile made it impossible to find a suitable matching firm. The number of matches is reported for each set of matching criteria in Table 6 below.

20 As in our primary sample approach, we exclude financial firms and regulated utilities from our sample of potential peer firms.

6. Estimation results

6.1. Fama–MacBeth results

We start by replicating the previous literature’s results using Fama–MacBeth regressions, variations of (1), that control for all the ex ante firm characteristics in Table 2 (except Q, which is omitted because of its high correlation with B/M).21 Columns 1–5 in Table 3 report these regression results for each type of security issuance studied in the extant literature, without controlling for subsequent financing. Consistent with previous studies (which did not, however, control for so many firm characteristics), bank loans, SEOs, and private equity exhibit significantly negative abnormal returns of approximately -3% to -4% annually over the three years following an issuance event (Spiess and Affleck-Graves, 1995; Billett, Flannery, and Garfinkel, 2006; Hertzel, Lemmon, Linck, and Rees, 2002). IPOs exhibit negative, but statistically insignificant, long-run returns, consistent with the recent literature cited by Ritter (2003). Also consistent with the prior literature, we find no evidence of underperformance associated with public debt issuances.22 The statistically significant control variables in Table 3 generally carry the coefficient signs previously shown in the literature: negative effects for size, momentum, and the growth indicators, and positive effects for B/M, R&D, and OIBD.

We examine the impact of subsequent financing in columns 6–15 of Table 3. First, in columns 6–10, we add dummy variables for multiple security issuances defined in the fixed-length window.
These dummies capture the overlapping months between multiple security issuance windows. In columns 11–15 we repeat the analysis in columns 1–5, but we compute the issuance-type variables (BL, IPO, SEO, PD, and PVEQ) based on the variable-length window, which removes the effect of multiple financings on the initial security issuance.

The results are striking. Regardless of whether we control explicitly for subsequent financing variables (columns 6–10), or separate the first security issuance from the effects of subsequent issues (columns 11–15), we find that controlling for subsequent financing eliminates any evidence of underperformance associated with any particular claim type. The coefficients on BL, SEO, and
21 Note that CAPEX_g is not a true conditioning variable because it measures investment (CAPEX) growth over the following year. We include this to control for the increased investment activity that likely follows the financings; however, our results with respect to the influence of the financing dummies are similar when this variable is excluded.
22 Spiess and Affleck-Graves (1999) find that the mean abnormal performance following debt issues is insignificant, although the median performance is significantly negative.
Table 3 Security issuance and subsequent equity performance with and without controls for subsequent financing activity. Table presents time-series averages (over 276 months, January 1983–December 2005) of the coefficients (in percentage points) from monthly cross-sectional regressions of the following form:

$$(r_{jt} - \mathit{VWRETD}_t) = \alpha_0 + \alpha_1 \mathit{FirstFinance}_{jt} + \sum_{k=2}^{4} \alpha_k (\mathit{XF}_{jkt}) + \gamma (\mathit{Repeat}_{jt}) + \sum \beta_j Z_{j,t-1} + \tilde{\varepsilon}_{jt}$$
No control for subsequent financings
Control for subsequent financings Fixed-length window XF dummies capture overlap with subsequent financing months
Variable
1
2
3
5
6
7
8
9
10
0.0012
11
12
13
15
0.0018 0.0012
0.0017 0.0017
0.0021 0.0019a
0.0014 0.0037a
0.0104c 0.0102c 0.0013c 0.0012b 0.0052c 0.0052c 0.0022 0.0022
14
0.0010 0.0027
c
b
0.0008 0.0018a 0.0037b 0.0146c
0.0015a 0.0042c 0.0150c
0.0039 0.0010 0.0034a 0.0150c
0.0025 0.0023c 0.0051c 0.016c
0.0018 0.0016a 0.0040b 0.0142c
0.0098c 0.0012b 0.0052c 0.0021
0.0099c 0.0012b 0.0051c 0.0021
0.0098c 0.0012b 0.0052c 0.0020
0.0102c 0.0013b 0.0051c 0.0021
0.0099c 0.0012b 0.0052c 0.0021
0.0043
0.0108c 0.0014c 0.0053c 0.0022
0.0109c 0.0014 0.0053c 0.0022
0.0108c 0.0014c 0.0053c 0.0022
0.0109c 0.0014c 0.0053c 0.0022
0.0109c 0.0014c 0.0053c 0.0022
FirstFinance BL 0.0024c IPO 0.0017 SEO 0.0028c PD PVEQ Subsequent financing Repeat XF2 XF3 XF4 Controls Constant 0.0100c 0.0102c 0.0100c Size 0.0012b 0.0013b 0.0012b B/M 0.0052c 0.0051c 0.0052c TA_g 0.0022c 0.0022 0.0021
4
Variable-length window Window ends at earlier of next financing or 36 months
where $r_{jt}$ is the return to stock j in month t, measured in percentage points. $\mathit{VWRETD}_t$ is the return to the CRSP value-weighted index for month t, measured in percentage points. FirstFinance is a set of dummy variables equal to one for the months following the first financing in at least 36 months. BL, IPO, SEO, PD, and PVEQ are the dummies when the FirstFinance is a bank loan, initial public offering, seasoned equity offering, public debt offering, or private equity offering. $\mathit{XF}_{jkt}$ is the set of external finance dummy variables defined in Section 3. A dummy equals one if in month t, the jth firm had the kth pattern of external financing within the past relevant window; $\mathit{XF}_{jkt} = 0$ otherwise. $\mathit{Repeat}_{jt}$ is a dummy equal to unity for each of the 36 months following a firm’s second (third, etc.) issuance of the same security type, provided that (1) no different security type was issued in between, and (2) the first issuance was within 36 months of the second. $Z_{j,t-1}$ is a vector of ex ante firm characteristics that prior research has associated with future share returns. These variables are the dependent variables in Table 2, with the exception of momentum, which is the prior six-month cumulative stock return. The statistical significance of each coefficient is based on the time-series standard deviation of its monthly estimated values. a, b, c indicate significance at 10%, 5%, 1% levels.

Columns 6–10 use financing dummy variables created according to the fixed-length window definition: XF1 equals one in each of 36 months following the first of any sequence of (one or more) different-type external financings. Repeat equals one in each of 36 months following the second consecutive issue of a claim type, as long as the second issue occurred within 36 months of the first, and as long as there was no intervening different type of external finance issue. XF2 equals one in each of 36 months following the second of any sequence of two or more different-type external financings. XF3 equals one in each of 36 months following the third of any sequence of three or more different-type external financings. XF4 equals one in each of 36 months following the fourth in the sequence of four different-type external financings.

Columns 11–15 use financing dummy variables created according to the variable-length window definition: XF1 equals one in each of the 36 months following an external finance, as long as there is no other type of external finance within the prior (to this month) 36 months. Repeat equals one in each of 36 months following the second consecutive issue of a claim type, as long as the second issue occurred within 36 months of the first, and as long as there was no intervening different type of external finance issue. XF2 equals one in months where there is overlap between two 36-month post-event windows following issuance of two different external finance vehicles. XF3 equals one in months where there is overlap between 36-month post-event windows following issuance of three different external finance vehicles. XF4 equals one in months where there is overlap between 36-month post-event windows following issuance of four different external finance vehicles.

CAPEX and R&D expenditures, as well as Cash, are all relative to total assets. TA_g is Cooper, Gulen, and Schill’s (2008) measure of asset growth. CAPEX_g is the percentage increase in the ratio of CAPEX-to-assets from the prior year. Tobin’s Q is market-to-book assets. Leverage is long- plus short-term debt divided by assets. OIBD is operating income before depreciation scaled by assets. Size is the natural log of the market value of equity. Momentum is cumulative stock return over the preceding fiscal year. Low Z is a dummy equal to one if the Z-score is less than 1.81 (Denis and Mihov, 2003; Altman, 1977). B/M is book-to-market equity. Accruals are discretionary accruals calculated using the modified Jones method (see Jones, 1991; Dechow, Sloan, and Sweeney, 1995).
0.0114c 0.0143c 0.0377c 0.0049a 0.0031 0.0152c 0.0003 0.0031c 0.0011c 0.0114c 0.0144c 0.0377c 0.0049a 0.0033 0.0153c 0.0003 0.0031c 0.0011c 0.0114c 0.0142c 0.038c 0.0051b 0.003 0.0152c 0.0003 0.0031c 0.0011c 0.0114c 0.0139c 0.0377c 0.0054b 0.0031 0.0156c 0.0004 0.0031c 0.0011c 0.0114c 0.0145c 0.0376c 0.0050b 0.0031 0.0152c 0.0004 0.0031c 0.0011c 0.0095c 0.0137c 0.0363c 0.0043a 0.0033a 0.0142c 0.0004 0.0031c 0.0011c 0.0095c 0.0136c 0.0363c 0.0044a 0.0037a 0.0146c 0.0004 0.0032c 0.0011c 0.0095c 0.0133c 0.0366c 0.0047a 0.0033a 0.0143c 0.0004 0.0031c 0.0011c 0.0095c 0.0135c 0.0361c 0.0048a 0.0034a 0.0145c 0.0005 0.0032c 0.0011c 0.0096c 0.0137c 0.0362c 0.0045a 0.0034a 0.0143c 0.0005 0.0032c 0.0011c 0.0094c 0.0094c 0.0094c 0.0137c 0.0146c 0.0145c 0.0365c 0.0361c 0.0364c 0.0044a 0.0041 0.0041 0.0036a 0.0041b 0.0039b 0.0143c 0.0145c 0.0142c 0.0004 0.0004 0.0004 0.0031c 0.0033c 0.0032c 0.0011c 0.0011c 0.0011c 0.0095 0.0095c 0.0146c 0.0141c 0.0362c 0.0362c 0.0041a 0.0047a 0.0037b 0.0039b 0.0143c 0.0147c 0.0005 0.0005 0.0032c 0.0033c 0.0011c 0.0011c Momentum CAPEX R&D Cash Leverage OIBD Low Z Accruals CAPEX_g
15 14 13 12 11 10 9 8 7 6 4 3 2 1 Variable
Table 3 (continued )
No control for subsequent financings
5
Fixed-length window XF dummies capture overlap with subsequent financing months
Variable-length window Window ends at earlier of next financing or 36 months
Control for subsequent financings
PVEQ become insignificant once we control for subsequent financing. Specifically, we see in columns 6–10 that the estimated coefficients’ values are also smaller than in corresponding columns 1–5, except for public debt’s positive coefficient, which is slightly larger and marginally significant (t = 1.89). Repeated bank loans or private equity issuances do not depress subsequent returns, but Repeated SEOs carry a significantly negative effect (-4.6% annually). Notably, Repeated public debt issuances are followed by positive long-run returns of about 3% per year, perhaps because a firm with multiple debt issues is profitable, and can forestall further leverage increases by retaining earnings. The detrimental return implications of issuing multiple security types are clearly identified by the significantly negative coefficients on XF3 and XF4. (XF2 demonstrates similar, but more muted, effects.) Averaging across security types, a firm that issues three security types within 36 months suffers approximately -0.40% monthly long-run abnormal returns, or about -4.7% annually for three years. The relatively few firms that issue four different securities suffer about 1.5% monthly underperformance, or roughly 18% per year for three years.

Columns 11–15 of Table 3 control for subsequent financing by using the variable-length window approach, and confirm the important effect of repeated issuances on the long-run returns following security issuance. As in columns 6–10, the variable-length approach results indicate that a single security issuance is never associated with significant long-run underperformance.

We conclude from Table 3 that ignoring multiple security issuances substantially changes how one views a bank loan, SEO, or private equity issuance event. For these security types, no significant underperformance follows a single issuance.
Rather, issuing more than one type of security within a short window reflects undiagnosed problems that show up in later returns.23 Multiple security issuances apparently drive the statistical significance of ex post returns in columns 1–5 of Table 3, and the same is likely true of previous studies that concentrated exclusively on a single type of security. Moreover, the effect of other issues is most pronounced for the small sample of firms with three or more different financing types, which means that those firms were likely included in three or more of the preceding studies that look at individual security-type issuances.

Table 4 aggregates financing events to examine the impact of initial vs. subsequent security issuances on long-run returns.24 These results confirm the conclusions in Table 3. The first column indicates that a single security issuance is followed by small, but significant, long-run
23 Given that multiple security issuances bode ill for a firm, it is natural to ask whether the order in which securities are issued has any long-run return implications. For example, does debt followed by equity carry different implications than equity followed by debt? Likewise, one might ask whether a public security issuance followed by a private one reflects worse information than the converse. In unreported results, test statistics fail to reject the null hypothesis that sequence does not matter.
24 We omit reporting the control variables’ coefficients from Table 4 in order to focus more directly on the financing coefficients. (The unreported coefficients are all similar to their values in Table 3.)
Table 4 Returns following initial and subsequent financings. Table presents time-series averages (over 276 months, January 1983–December 2005) of the coefficients (in percentage points) from monthly cross-sectional regressions of the following form:

$$(r_{jt} - \mathit{VWRETD}_t) = \alpha_0 + \sum_{k=1}^{4} \alpha_k (\mathit{XF}_{jkt}) + \gamma (\mathit{Repeat}_{jt}) + \sum \beta_j Z_{j,t-1} + \tilde{\varepsilon}_{jt} \qquad (1)$$

where $r_{jt}$ is the return to stock j in month t, measured in percentage points. $\mathit{VWRETD}_t$ is the return to the CRSP value-weighted index for month t, measured in percentage points. $\mathit{XF}_{jkt}$ is the set of external finance dummy variables defined in Section 3. A dummy equals one if in month t, the jth firm had the kth pattern of external financing within the past relevant window; $\mathit{XF}_{jkt} = 0$ otherwise. $\mathit{Repeat}_{jt}$ is a dummy equal to unity for each of the 36 months following a firm’s second (third, etc.) issuance of the same security type, provided that (1) no different security type was issued in between, and (2) the first issuance was within 36 months of the second. $Z_{j,t-1}$ is a vector of ex ante firm characteristics that prior research has associated with future share returns. These variables are the dependent variables in Table 2, with the exception of momentum, which is the prior six-month cumulative stock return. The statistical significance of each coefficient is based on the time-series standard deviation of its monthly estimated values. a, b, c indicate significance at 10%, 5%, 1% levels.

Unless otherwise noted the XF variables are defined on a fixed-length window. XF1 equals one in each of 36 months following the first financing event in at least 36 months. Repeat equals one in each of 36 months following the second consecutive issue of a claim type, as long as the second issue occurred within 36 months of the first, and as long as there was no intervening different type of external finance issue. XF2 equals one in each of 36 months following the second of any sequence of two or more different-type external financings. XF3 equals one in each of 36 months following the third of any sequence of three or more different-type external financings. XF4 equals one in each of 36 months following the fourth in the sequence of four different-type external financings.
For the variable-length window, XF1 equals one in each month following the first financing event for 36 months or until the next financing event, whichever is sooner. In column 4, we remove all firm-months where any multiple financing dummy (Repeat, XF2, XF3, or XF4) is on (= 1).

Variable                              1           2           3           4
XF1                               -0.0017b                -0.0011     -0.0012
                                  (-2.18)                 (-1.48)     (-1.57)
XF1 (variable-length window)                  -0.0010
                                              (-1.62)
Repeat                                                    -0.0003
                                                          (-0.34)
XF2                                                       -0.0013a
                                                          (-1.86)
XF3                                                       -0.0042c
                                                          (-2.40)
XF4                                                       -0.0153c
                                                          (-2.53)
Includes controls from Table 3      Yes         Yes         Yes         Yes
underperformance even after controlling for the firm characteristics in Table 2. In column 2, we repeat the analysis of column 1 but with the overlapping months removed (i.e., XF1 is now defined based on the variable-length window). As in columns 11–15 of Table 3, we see that removing the influence of months overlapping with subsequent financing activity eliminates the evidence of underperformance. Column 3 returns to the fixed-length window approach and includes the additional subsequent financing dummies. Controlling for Repeat and multiple security-type issuances (XF2, XF3, XF4) in the third column reduces the estimated effect of a single security issuance (from -0.17% monthly to -0.11%) and renders the estimate statistically indistinguishable from zero (t = -1.48). Unlike in Table 3, a generic Repeat issuance has no significant effect. The multiple-type issuance dummies still carry large and significantly negative coefficients. In the last column of Table 4, we revert to the fixed-length window approach for XF1 and remove the overlapping months from the data set. The resulting coefficient on XF1 is insignificant, suggesting this third approach also renders no evidence of underperformance following initial financings.

The results of Tables 3 and 4 suggest that subsequent financings drive the evidence of underperformance following security offerings, regardless of claim type and especially when multiple types of claims are involved. We now examine a second model of expected returns.
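As a rough illustration of how Fama–MacBeth estimates of the kind reported in Tables 3 and 4 are formed, the sketch below runs one cross-sectional OLS regression per month and then averages the monthly coefficients, with t-statistics based on their time-series standard deviation. It is a minimal sketch on simulated data, not the authors' code: the function name `fama_macbeth`, the single issuance dummy, the single control, and all parameter values are hypothetical.

```python
import numpy as np

def fama_macbeth(y_by_month, X_by_month):
    """Two-step Fama-MacBeth estimate.

    y_by_month: list of 1-D arrays of market-adjusted returns, one per month.
    X_by_month: list of 2-D arrays (firms x regressors, no constant column).
    Returns the time-series mean of the monthly coefficients and the
    t-statistics (mean divided by its time-series standard error).
    """
    monthly = []
    for y, X in zip(y_by_month, X_by_month):
        X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)   # cross-sectional OLS
        monthly.append(coef)
    monthly = np.asarray(monthly)
    n_months = monthly.shape[0]
    mean_coef = monthly.mean(axis=0)
    std_err = monthly.std(axis=0, ddof=1) / np.sqrt(n_months)
    return mean_coef, mean_coef / std_err

# Simulated panel (hypothetical): 60 months, 200 firms, one issuance dummy
# with a true effect of -0.4 percentage points per month, and one control.
rng = np.random.default_rng(0)
ys, Xs = [], []
for _ in range(60):
    dummy = (rng.random(200) < 0.3).astype(float)
    control = rng.normal(size=200)
    ys.append(0.1 - 0.4 * dummy + 0.2 * control + rng.normal(size=200))
    Xs.append(np.column_stack([dummy, control]))

mean_coef, t_stat = fama_macbeth(ys, Xs)
print(mean_coef)  # [intercept, dummy, control]
print(t_stat)
```

The standard errors here come only from the month-to-month variation in the estimated coefficients, which is why the procedure tolerates cross-sectional correlation of residuals within a month.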
6.2. Four-factor model results

The factor-model approach to detecting abnormal long-run returns (2) requires assembling a portfolio of firms that have recently issued similar types of securities. These tests perform best when the portfolios all include a large number of firms, which minimizes the effects of idiosyncratic risk. We form five portfolios based on XF1, XF2, XF3, XF4, and Repeat.25 The firm-month return is included in a given portfolio if the corresponding dummy variable is equal to one under the variable-length window definition. We use the variable-length window definition because it assigns each firm-month to a unique portfolio. (In contrast, the fixed-length definition could assign the same firm-month return to multiple portfolios simultaneously. This would not allow us to control for or isolate the influence of subsequent financings.) We regress the monthly portfolio returns on the three Fama–French factors and Carhart’s (1997) momentum factor. A nonzero intercept term implies an abnormal return to the set of firms with similar financing characteristics (i.e., XF1, XF2, XF3, XF4, and Repeat).

Most researchers have found that anomalous financial effects tend to be more apparent in equal-weighted (EW) portfolios than they are in value-weighted (VW) portfolios. (Presumably, small stocks are more difficult to arbitrage.) We therefore construct both EW and VW portfolios for each group of firms in each month. The results in Table 5 are somewhat unusual because the VW and EW portfolios yield quite similar conclusions, equally confirming the association between multiple security-type issuances and subsequent equity underperformance. For brevity, we limit our discussion to the VW results.

Panel A of Table 5 reports the estimated abnormal returns to portfolios of firms with various external financing patterns. The sample of firm-months with
25 An alternative is to study specific security issuances (as in Table 3). However, forming portfolios on each combination of external financing types would yield numerous portfolios: BL and SEO, BL and PD, SEO and PD, IPO, SEO, and PD, etc. The portfolios for many combinations of security issuance would include only a small number of firms.
Table 5 Four-factor model results. Results from estimating the Fama/French three-factor model augmented with Carhart’s (1997) momentum factor for portfolios made up of firms with similar financing histories:

$$(R_{pt} - R_{ft}) = a + b(\mathit{VWRETD}_t - R_{ft}) + s\,\mathit{SMB}_t + h\,\mathit{HML}_t + m\,\mathit{MOM}_t + e_t$$

where $R_{pt}$ is the return on the portfolio of sample firms in month t; $R_{ft}$ is the three-month T-bill yield in month t; $\mathit{VWRETD}_t$ is the return on the value-weighted index (VWRETD) or equal-weighted index (EWRETD) of NYSE, Amex, and Nasdaq stocks in month t; $\mathit{SMB}_t$ is the return on small firms minus the return on large firms in month t; $\mathit{HML}_t$ is the return on high book-to-market stocks minus the return on low book-to-market stocks in month t; and $\mathit{MOM}_t$ is Carhart’s (1997) momentum factor realization for month t. Portfolios of firms with similar funding were formed on both an equal-weighted (EW) and a value-weighted (VW) basis. Each portfolio’s regression was estimated for the period January 1983 through December 2005. We form five portfolios based on XF1, XF2, XF3, XF4, and Repeat. The firm-month return is included in a given portfolio if the corresponding dummy variable is equal to one under the variable-length window definition. The weighted regressions weight each observation by the square root of N, where N is the number of firms in the portfolio for that month. p-Value of difference (Int–IntXF1) is the p-value of the test that the intercept is different from the intercept from the XF1 portfolio. a, b, c indicate significance at 10%, 5%, 1% levels.
Value-weighted portfolios Portfolio
Intercept
Panel A: Unweighted regressions XF1 0.0003 Repeat 0.0009 XF2 0.0031b XF3 0.0080b XF4 0.0200a
p-Value of difference (Int–IntXF1)
0.4264 0.0121 0.0087 0.0519
Panel B: Weighted regressions by square root of N XF1 0.0004 Repeat 0.0004 0.7885 XF2 0.0029b 0.0180 b XF3 0.0075 0.0082 a XF4 0.0190 0.0522
Equal-weighted portfolios N
Adj. R2
Intercept
276 276 276 276 170
0.90 0.85 0.87 0.58 0.13
0.0021 0.0011 0.0031b 0.0089c 0.0225b
276 276 276 276 170
0.90 0.85 0.87 0.62 0.16
0.0024a 0.0005 0.0029b 0.0079c 0.0202b
XF1 = 1 indicates an insignificant monthly excess return of 3 bps. Repeatedly issuing the same security type also has no significant long-run return effect. However, the estimated effects of issuing two, three, or four different security types are very large, with monthly underperformance of 31, 80, and 200 bps, respectively. Moreover, each of these coefficients differs reliably from the return to firms issuing a single security type. Subsequent financing activity associates with significantly different return performance than single issuances. The literature’s inferences appear to be driven by previously unexamined effects of subsequent financing.

As noted, high idiosyncratic noise in those portfolios containing few firms may bias test statistics toward zero. The XF1 and XF2 portfolios in Panel A contain more than 100 firms for all 276 months (January 1983–December 2005). The Repeat portfolios contain more than 100 firms in all but 40 of the 276 months, and always more than 60 observations in a month. The other two portfolios tend to be smaller: the XF3 monthly portfolios include fewer than 51 (but more than ten) securities in 153 of the 276 sample months, and the XF4 monthly portfolios never include more than ten individual securities. We therefore re-estimate the models in Panel A using weighted least squares where each observation’s weight equals the square root of the number of firms in that month’s portfolio. Although the estimated underperformance amounts associated with XF2–XF4 are slightly reduced, they remain large, statistically significant, and reliably different from the return to firms issuing only one type of security (XF1). We conclude that the factor-model results in Table 5 closely conform to the Fama–MacBeth estimates in Tables 3 and 4.
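The calendar-time regressions behind Panels A and B can be sketched in a few lines of Python. This is an illustrative sketch on simulated factor data, not the authors' implementation: the function name `four_factor_alpha` and every number below are hypothetical. For the weighted case the sketch assumes that each monthly observation (including the constant) is multiplied by the square root of that month's portfolio size before OLS is run, which is one reading of the weighting described above.

```python
import numpy as np

def four_factor_alpha(excess_ret, factors, n_firms=None):
    """Regress portfolio excess returns on the factor columns and return
    the intercept (alpha) and its t-statistic.

    excess_ret: 1-D array of monthly portfolio returns minus the risk-free rate.
    factors:    2-D array (months x factors), e.g. MKT-RF, SMB, HML, MOM.
    n_firms:    optional 1-D array of firms in the portfolio each month;
                if given, each row is scaled by sqrt(N) (assumed weighting).
    """
    y = np.asarray(excess_ret, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(factors, dtype=float)])
    if n_firms is not None:
        w = np.sqrt(np.asarray(n_firms, dtype=float))
        y = y * w
        X = X * w[:, None]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = y.size - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)   # classical OLS covariance
    return beta[0], beta[0] / np.sqrt(cov[0, 0])

# Simulated inputs (hypothetical): 276 months, true alpha of -80 bps per month.
rng = np.random.default_rng(1)
T = 276
factors = rng.normal(scale=0.04, size=(T, 4))
loadings = np.array([1.1, 0.5, -0.2, 0.1])
excess = -0.008 + factors @ loadings + rng.normal(scale=0.01, size=T)
n_firms = rng.integers(10, 150, size=T)

alpha_ols, t_ols = four_factor_alpha(excess, factors)
alpha_wls, t_wls = four_factor_alpha(excess, factors, n_firms)
print(alpha_ols, t_ols)
print(alpha_wls, t_wls)
```

Weighting by portfolio size gives less influence to months in which the portfolio holds few firms and its return is therefore noisier.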
N
Adj. R2
0.0138 0.0001 0.0001 0.0113
276 276 276 276 170
0.92 0.87 0.90 0.73 0.24
0.0129 0.0001 0.0001 0.0203
276 276 276 276 170
0.92 0.87 0.91 0.76 0.31
p-Value of difference (Int–IntXF1)
6.3. Buy-and-hold abnormal returns (BHARs)

Evaluating the BHARs associated with various financing patterns provides a third method for testing whether security issuance is reliably followed by long-run underperformance. Given the importance of matching event firms with otherwise-similar, non-issuing firms, we report BHARs based on a variety of three-dimensional matching criteria. All matches were undertaken on the basis of size and equity’s book-to-market value. The columns in Table 6 differ in the third characteristic on which matching firms were selected. The third factor was total asset growth in column 1, CAPEX_g in column 2, and so forth, giving nine sets of BHARs for each financing pattern (XF1–XF4). We report the event firms’ mean return, the peer firms’ mean return, the mean BHAR, and its statistical significance (compared to zero).

Given that we are not conducting cross-sectional regressions, we cannot use the fixed-length window approach since we do not have dummy variables that will capture the influence of subsequent financing activity on returns. Thus, as in the Fama–French approach, we use the variable-length window in constructing the XFi variables to ensure that subsequent financing holding periods capture the months that would otherwise overlap with prior financing windows. That is, the XF1 dummy is "on" only when the firm has issued a single type of security in the past 36 months, e.g., from [t, t+12] and [t+37, t+48] in Fig. 2.

The results indicate that the BHAR following a single security issuance is negative and significant only for one set of peer firms out of the nine. (This is the set based on a size, B/M, and R&D match.) For the other eight peer groupings, the BHARs for XF1 are positive, and four of them differ reliably from zero. In other words, there is
M.T. Billett et al. / Journal of Financial Economics 99 (2011) 349–364
Table 6
Buy-and-hold abnormal returns.
Mean buy-and-hold returns for financing firms (event firms) and non-issuers (peer firms) and their difference (BHAR). Peer firms are matched on the basis of size, equity’s market-to-book ratio, and the characteristic identified at the top of each column. CAPEX and R&D expenditures, as well as Cash, are all relative to total assets. TA_g is the Cooper, Gulen, and Schill (2008) measure of asset growth. CAPEX_g is the percentage increase in the ratio of CAPEX-to-assets from the prior year. Tobin’s Q is market-to-book assets. Leverage is long- plus short-term debt divided by assets. OIBD is operating income before depreciation scaled by assets. Size is the natural log of the market value of equity. Momentum is cumulative stock return over the preceding fiscal year. Low Z is a dummy equal to one if the Z-score is less than 1.81 (Denis and Mihov, 2003; Altman, 1977). B/M is book-to-market equity. Accruals are discretionary accruals calculated using the modified Jones method (see Jones, 1991; Dechow, Sloan, and Sweeney, 1995). The third factor was TA_g in column 1, CAPEX_g in column 2, and so forth. For each financing pattern (XF1–XF4), we report the event firms’ mean cumulative return, the peer firms’ mean cumulative return, and the difference between these two. The returns are computed using the variable-length window approach. a, b, c indicate significance at the 10%, 5%, and 1% levels, respectively.

Matching criteria (in addition to size and book-to-market). Column groups: growth indicators, financial condition indicators, stock performance.

               TA_g      CAPEX_g   CAPEX     R&D       CASH      Leverage  Accruals  OIBD      Momentum
XF1 = 1
Event firms    0.2362    0.2100    0.2421    0.2644    0.2067    0.2116    0.2180    0.2423    0.2063
Peer firms     0.1918    0.1840    0.2012    0.2998    0.1801    0.1893    0.1798    0.1940    0.1958
BHAR           0.0444b   0.0260    0.0408a  -0.0354b   0.0266    0.0223    0.0382b   0.0482c   0.0105
N              8,106     8,170     8,282     10,860    8,423     7,671     7,874     8,610     9,142

XF2 = 1
Event firms    0.0555    0.0414    0.0463    0.0612    0.0451    0.0375    0.0378    0.0481    0.0487
Peer firms     0.0523    0.0833    0.0523    0.1082    0.0627    0.0691    0.0801    0.0787    0.0717
BHAR           0.0032   -0.0419c  -0.0060   -0.0470c  -0.0175b  -0.0316c  -0.0423c  -0.0306c  -0.0229b
N              6,554     6,685     6,699     7,959     6,890     6,365     6,440     6,826     7,620

XF3 = 1
Event firms    0.0058   -0.0029   -0.0067    0.0095   -0.0029   -0.0075    0.0229   -0.0068    0.0034
Peer firms     0.0317    0.0515    0.0624    0.0625    0.0550    0.0668    0.0446    0.0712    0.0491
BHAR          -0.0259   -0.0544c  -0.0691c  -0.0530c  -0.0578c  -0.0743c  -0.0217   -0.0780c  -0.0457c
N              940       978       969       1,192     1,002     926       943       992       1,137

XF4 = 1
Event firms   -0.0529    0.0070   -0.0528   -0.0283   -0.0376   -0.0549   -0.0370   -0.0711   -0.0739
Peer firms     0.0744    0.0407   -0.0079    0.0944    0.0704    0.0541    0.1016    0.0301    0.0546
BHAR          -0.1273b  -0.0337   -0.0450   -0.1227b  -0.1080b  -0.1091a  -0.1385   -0.1012b  -0.1285b
N              41        49        49        55        43        31        39        43        53
little support for the hypothesis that issuing a single security type causes subsequent underperformance, consistent with our previous findings. All but one of the 27 BHARs associated with XF2, XF3, and XF4 are negative, and most differ significantly from zero. Moreover, the extent of underperformance rises with the number of security types issued, averaging 2.63%, 5.33%, and 10.16% for XF2, XF3, and XF4, respectively, across the nine peer definitions. Overall, we conclude from Table 6 that our BHAR results confirm the conclusions from our Fama–MacBeth and four-factor model tests.
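The BHAR arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `buy_and_hold_return` and `bhar` and the three-month return series are hypothetical, and the Table 6 BHARs are entered with each sign taken as the event-firm return minus the peer-firm return. Under that sign convention, the averages across the nine peer definitions reproduce the 2.63%, 5.33%, and 10.16% quoted in the text.

```python
import numpy as np


def buy_and_hold_return(monthly_returns):
    """Compound simple monthly returns into a single buy-and-hold return."""
    r = np.asarray(monthly_returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)


def bhar(event_returns, peer_returns):
    """Buy-and-hold abnormal return: event-firm BHR minus matched-peer BHR."""
    return buy_and_hold_return(event_returns) - buy_and_hold_return(peer_returns)


# Hypothetical three-month example for one event firm and its matched peer.
example = bhar([0.02, -0.01, 0.03], [0.01, 0.00, 0.02])

# Mean BHARs from Table 6 (nine peer definitions per financing pattern),
# signed as event-firm return minus peer-firm return.
table6_bhar = {
    "XF2": [0.0032, -0.0419, -0.0060, -0.0470, -0.0175,
            -0.0316, -0.0423, -0.0306, -0.0229],
    "XF3": [-0.0259, -0.0544, -0.0691, -0.0530, -0.0578,
            -0.0743, -0.0217, -0.0780, -0.0457],
    "XF4": [-0.1273, -0.0337, -0.0450, -0.1227, -0.1080,
            -0.1091, -0.1385, -0.1012, -0.1285],
}

# Average underperformance in percent (positive number = underperformance).
avg_underperformance = {
    pattern: round(-100.0 * float(np.mean(values)), 2)
    for pattern, values in table6_bhar.items()
}
```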
7. Conclusion

The existing literature indicates that the issuance of most external financial claims portends poor subsequent stock returns. Taken literally, these single-claim studies imply that raising external finance is associated with poor future performance. Could corporate governance or managerial incentives generally be so poor that this is true? We examine financing choices in a comprehensive context, to see whether underperformance is associated with claim type or rather with the tendency to issue multiple claim types. We find strong evidence supporting the latter. The estimated underperformance following issuances of single claim types is highly dependent on whether one accounts for other financing events by the same issuer. Controlling for issuances of additional claim types eliminates the estimated underperformance following bank loans, SEOs, and, to a lesser degree, private equity issuances.26 We find that multiple financing patterns generate much worse performance than the single events evaluated previously in the literature. Taken together, our results suggest that external finance, per se, does not augur future underperformance; rather, underperformance is more a function of the variety and frequency of firms’ issuance activities. Future research may determine why firms would engage in such issuance behavior, knowing that, on average, underperformance follows.
26 While the coefficient on PVEQ becomes statistically insignificant, the economic magnitude remains large.
References

Altman, E., 1977. The Z-Score Bankruptcy Model: Past, Present, and Future. Wiley, New York.
Barber, B., Lyon, J., 1997. Detecting long-run abnormal stock returns: the empirical power and specification of test statistics. Journal of Financial Economics 43, 341–372.
Billett, M., Flannery, M., Garfinkel, J., 1995. The effect of lender identity on a borrowing firm’s equity return. The Journal of Finance 50, 699–718.
Billett, M., Flannery, M., Garfinkel, J., 2006. Are bank loans special? Evidence on the post-announcement performance of bank borrowers. Journal of Financial and Quantitative Analysis 41, 733–751.
Carlson, M., Fisher, A., Giammarino, R., 2006. Corporate investment and asset price dynamics: implications for SEO event studies and long-run performance. The Journal of Finance 61, 1009–1034.
Carhart, M., 1997. On persistence in mutual fund performance. The Journal of Finance 52, 57–82.
Chopra, N., Lakonishok, J., Ritter, J., 1992. Measuring abnormal performance: do stocks overreact? Journal of Financial Economics 31, 235–268.
Cooper, M.J., Gulen, H., Schill, M.J., 2008. Asset growth and the cross-section of stock returns. The Journal of Finance 63, 1609–1651.
Daniel, K., Titman, S., 1997. Evidence on the characteristics of cross-sectional variation in stock returns. The Journal of Finance 52, 1–33.
Davis, J., Fama, E., French, K., 2000. Characteristics, covariances and average returns: 1929 to 1997. The Journal of Finance 55, 389–406.
Dechow, P.M., Sloan, R.G., Sweeney, A.P., 1995. Detecting earnings management. The Accounting Review 70, 193–225.
Denis, D., Mihov, V., 2003. The choice among bank debt, non-bank private debt, and public debt: evidence from new corporate borrowings. Journal of Financial Economics 70, 3–28.
Eberhart, A., Maxwell, W., Siddique, A., 2004. An examination of long-term abnormal stock returns and operating performance following R&D increases. The Journal of Finance 59, 623–650.
Fama, E., 1998. Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics 49, 283–306.
Fama, E., French, K., 1992. The cross-section of expected stock returns. The Journal of Finance 47, 427–466.
Fama, E., French, K., 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33, 3–56.
Fama, E., French, K., 2008. Dissecting anomalies. The Journal of Finance 63, 1653–1678.
Fama, E., MacBeth, J., 1973. Risk, return, and equilibrium: empirical tests. Journal of Political Economy 81, 607–636.
Hertzel, M., Lemmon, M., Linck, J., Rees, L., 2002. Long-run performance following private placements of equity. The Journal of Finance 57, 2595–2617.
Jegadeesh, N., Titman, S., 1993. Returns to buying winners and selling losers: implications for stock market efficiency. The Journal of Finance 48, 65–91.
Jones, J., 1991. Earnings management during import relief investigations. Journal of Accounting Research 29, 193–228.
Lyon, J., Barber, B., Tsai, C., 1999. Improved methods for tests of long-run abnormal stock returns. The Journal of Finance 54, 165–201.
Petersen, M., 2009. Estimating standard errors in finance panel data sets: comparing approaches. Review of Financial Studies 22, 435–480.
Pontiff, J., Woodgate, A., 2008. Share issuance and cross-sectional returns. The Journal of Finance 63, 921–945.
Richardson, S., Sloan, R., 2003. External financing and future stock returns. Rodney L. White Center for Financial Research Working Paper No. 03-03.
Ritter, J., 1991. The long-run performance of initial public offerings. The Journal of Finance 46, 3–27.
Ritter, J., 2003. Investment banking and securities issuance. In: Constantinides, G., Harris, M., Stulz, R. (Eds.), Handbook of the Economics of Finance. Elsevier Science, North-Holland, pp. 255–306 (Chapter 5).
Spiess, D.K., Affleck-Graves, J., 1995. Underperformance in long-run stock returns following seasoned equity offerings. Journal of Financial Economics 38, 243–267.
Spiess, D.K., Affleck-Graves, J., 1999. The long-run performance of stock returns following debt offerings. Journal of Financial Economics 54, 45–73.
Teoh, S., Welch, I., Wong, T., 1998a. Earnings management and the underperformance of seasoned equity offerings. Journal of Financial Economics 50, 63–99.
Teoh, S., Welch, I., Wong, T., 1998b. Earnings management and the long-run market performance of initial public offerings. The Journal of Finance 53, 1935–1974.
Titman, S., Wei, K., Xie, F., 2004. Capital investments and stock returns. Journal of Financial and Quantitative Analysis 39, 677–700.
Journal of Financial Economics 99 (2011) 365–384
A theory of corporate financial decisions with liquidity and solvency concerns☆

Sebastian Gryglewicz
Erasmus University Rotterdam, Erasmus School of Economics, 3000DR Rotterdam, The Netherlands
Article info

Article history: Received 9 September 2009; received in revised form 18 February 2010; accepted 23 March 2010; available online 29 September 2010

JEL classification: G32; G33; G35

Keywords: Financial distress; Capital structure; Cash holdings; Dividends; Financing constraints

Abstract

This paper studies the impact of both liquidity and solvency concerns on corporate finance. I present a tractable model of a firm that optimally chooses capital structure, cash holdings, dividends, and default while facing cash flows with long-term uncertainty and short-term liquidity shocks. The model explains how changes in solvency affect liquidity and also how liquidity concerns affect solvency via capital structure choice. These interactions result in a dynamic cash policy in which cash reserves increase in profitability and are positively correlated with cash flows. The optimal dividend distributions implied by the model are smoothed relative to cash flows. I also find that liquidity concerns lead to a decrease of dispersion of credit spreads. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Financial distress is recognized as a driving force behind many corporate decisions. At the same time, however, there is little understanding of the roles of and relations between corporate illiquidity and insolvency, the two sources of financial distress. In this paper, I argue that the interactions of liquidity and solvency can explain empirical patterns in
☆ I would like to thank Kuno Huisman, Peter Kort, and an anonymous referee for their many insightful suggestions. I also thank Viral Acharya, Jean-Paul Décamps, Ingolf Dittmann, Thomas Mariotti, Alina Maurer, Norman Schürhoff, Stéphane Villeneuve, and seminar participants at London Business School, Toulouse School of Economics, Erasmus University Rotterdam, the 2008 European Finance Association Annual Meeting, and the 2008 Conference on Price, Liquidity, and Credit Risks in Konstanz for their helpful comments.
E-mail address: [email protected]
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.010
cash and dividend policies, and also shed further light on capital structure choice, valuation, and credit spreads. Corporate finance literature has long been interested in how firms that generate uncertain cash flows distribute dividends. Firms paying dividends tend to smooth distributions relative to earnings and, when in distress, they reduce dividends rather than omit them (Lintner, 1956; DeAngelo and DeAngelo, 1990; Brav, Graham, Harvey, and Michaely, 2005). Leary and Michaely (2008) show that dividend smoothing has been steadily increasing over the past decades, but the reasons for such a payout policy remain a puzzle. A related question is how much cash firms save out of high cash flows and disburse in periods of low cash flows. Corporate cash policies have been recently receiving increasing attention due to the vast and growing cash holdings of U.S. companies (see Bates, Kahle, and Stulz, 2009). Financially constrained firms appear to show a positive cash flow sensitivity of cash, that is, a propensity to save cash from positive cash flow shocks (Almeida,
S. Gryglewicz / Journal of Financial Economics 99 (2011) 365–384
Campello, and Weisbach, 2004; Khurana, Martin, and Pereira, 2006; Sufi, 2009). However, the direction of the sensitivity and the reasoning behind it are unsettled (see Riddick and Whited, 2009).1 The prevailing evidence indicates that corporate cash serves mainly as a buffer against adverse cash flow shocks. To provide unifying insights into cash and dividend policies, this paper proposes a tractable dynamic model of corporate finance that integrates liquidity and solvency concerns and uncovers linkages between them. Consistent with the above empirical facts, my analysis reveals that firms that use cash to hedge liquidity shocks hold large amounts of cash, smooth dividends, and exhibit a positive cash flow sensitivity of cash. In essence, the model shows that persistent liquidity shocks affect solvency and that solvency levels, in turn, determine demand for corporate liquidity. For example, a negative cash flow surprise decreases solvency and, consequently, such a firm requires less cash. Because any excess cash is distributed, dividend payment is smoothed and the shock is absorbed by cash holdings. To motivate the approach taken in this paper, I start with some elementary properties of liquidity and solvency risks. Corporate liquidity is a short-term characteristic that measures the ability of a firm to pay its obligations on time. Corporate solvency is the ability to cover debt obligations in the long run. Liquidity and solvency risks are closely related to cash flow uncertainty. Short-term shocks to cash flows, together with the availability of cash reserves, affect corporate liquidity. Uncertainty about average future profitability, together with financial leverage, generates solvency concerns.2 These relations indicate that firms enter financial distress in two ways: a firm can become illiquid after a negative short-term cash flow or it can become insolvent if the expected rate of cash flows decreases sufficiently. 
The defining characteristic of the model is that it recognizes that these two sources of cash flow shocks are separate but interconnected. If a firm generates negative liquidity surprises, that is, if cash flows persistently fall below their expected level, expectations about future cash flow are adjusted downwards. Conversely, a firm that persistently generates positive liquidity shocks must be, after all, more profitable than expected. In both situations, liquidity shocks accumulate to change expected firm value and thereby solvency. To disentangle solvency and liquidity concerns, I model cash flows with two sources of uncertainty. The
1 Riddick and Whited (2009) use an alternative empirical methodology to the one used by Almeida, Campello, and Weisbach (2004). The reasoning employed in the literature on cash flow sensitivity of cash relies on the assumption that cash is used mainly to fund future investments. However, both empirical and survey evidence in Opler, Pinkowitz, Stulz, and Williamson (1999) and Lins, Servaes, and Tufano (2010) shows that the main motive for corporate cash holdings is precautionary, but rather than to fund investments, cash is used to cushion adverse cash flow shocks.
2 I use the terms "liquidity" and "short-term" interchangeably to describe liquidity-related shocks and risks. Similarly, "solvency," "long-term," and "profitability" are used interchangeably for solvency shocks and risks.
first is short-term liquidity uncertainty: at each time cash flow realizations may fall above or below expected cash flow due to a liquidity shock. The second source of uncertainty is long-term solvency uncertainty: the expected cash flow rate evolves over time. Liquidity and solvency are connected because short-term liquidity shocks affect the expected cash flow rate through Bayesian learning. More specifically, it is assumed that cash flows follow a Brownian motion with drift and that the drift parameter is not directly observable. The firm and investors observe noisy cash flows (subject to liquidity shocks) and learn about the drift (the average rate of cash flows). In this way, short-term liquidity shocks around average profitability are not only noise but also, if persistent, affect the assessment of solvency. I embed this cash flow process in a model of dynamic corporate finance with financing constraints, endogenous capital structure, dividend policy, cash holdings, and default. In the model, the firm issues a combination of equity and debt to finance the required investment and initial cash. Corporate debt offers a tax advantage but also creates bankruptcy costs. The firm generates cash flows with two sources of uncertainty, pays debt coupons and taxes. At each time, positive net earnings can be either distributed as dividends or retained to increase cash holdings. Net losses and dividend payouts can be covered from cash reserves. The payout-retention policy maximizes equity value. If at any time the firm is unable to pay its obligations, it is illiquid. If firm value falls below debt value, the firm becomes insolvent and it defaults. The model uncovers several linkages between liquidity and solvency, and underscores their roles in cash, dividend, and leverage policies. In the presence of financing constraints, a firm without sufficient cash reserves may become illiquid and be forced into default while still solvent. 
The model characterizes a level of cash, denoted by C , that allows the firm to withstand liquidity shocks up to the point at which the equity holders endogenously trigger solvency default. The analysis shows that C evolves over time and increases with expected profitability. Intuitively, a more profitable firm is more solvent and thus, it has a greater continuation value that is to be saved. Such a firm is willing to withstand larger liquidity shocks before it is eventually deemed insolvent and so the required cash buffer is higher. In other words, higher solvency creates a higher demand for corporate liquidity. I further show that it is optimal for a firm that maximizes equity value to retain all earnings if cash is below C and, subsequently, to pay out dividends that allow the firm to maintain cash at C . The model predicts dividend distributions that are smooth in comparison with cash flows or earnings. The reason is that the target cash level C is not constant but increases and decreases with firm value. A positive earnings surprise provides some positive information about future cash flows, increases expected firm value, and thereby also the optimal level of C . Therefore, not all but only a fraction of additional earnings will be paid out as a dividend. Conversely, if earnings are surprisingly low, firm value and cash level C decrease, so that dividends are complemented by released cash holdings.
This mechanism simultaneously explains dividend smoothing and positive cash flow sensitivity of cash. The insensitivity of dividends and the sensitivity of cash to cash flow shocks must be closely related because, without external financing of dividends, payouts can be smooth only if cash reserves absorb cash flow shocks. This model is the first to show how this corporate policy arises from an equity-value-maximizing dividend policy through the interplay of liquidity and solvency.

Apart from implications for cash and dividend policies, the model also has consequences for capital structure. The model predicts that firms select their leverage to limit exposure to liquidity risk. In this way, liquidity has an impact on capital structure and thereby on solvency. This results in a novel trade-off in debt choice. Additional cash holdings are costly, so debt should be chosen such that required cash levels are low. If debt is very low, solvency is high and demand for cash is also high. On the other hand, if debt is very high, the liquidity pressure from debt coupon payments also leads to high required cash holdings. Consequently, an intermediate level of debt is optimal.

As the changes in leverage translate ultimately into changes in debt credit spreads, the model predicts lower dispersion of credit spreads across firms than in the standard environment without liquidity concerns. Empirically, Eom, Helwege, and Huang (2004) show that existing models tend to predict credit spreads that are too high if observed spreads are relatively high, while predicted spreads are too low if observed spreads are relatively low. The recognition of liquidity concerns in this paper moves credit spreads in the empirically observed direction.

Further analysis indicates that short-term cash flow volatility and long-term uncertainty about profitability can have very different effects on financial variables. I find that cash holdings increase in volatility and decrease in profitability uncertainty.
The first relation is in line with the explanation of Bates, Kahle, and Stulz (2009) for high cash levels among U.S. firms. The second prediction is novel and provides grounds for further empirical tests of determinants of cash holdings. Debt credit spreads also react differently to the two measures of risk, namely, they decrease in volatility and increase in profitability uncertainty. The two sources of uncertainty have different effects because, in essence, volatility is related to liquidity and profitability uncertainty to solvency concerns. In the following section, I set up the model. Section 3 analyzes a benchmark case of a firm without financing constraints concerned only about solvency. Sections 4 and 5 present the main model with both liquidity and solvency concerns and discuss its implications. Section 4 analyzes optimal cash and dividend policies. Section 5 examines capital structure. In Section 6, I relate the paper to previous literature. Section 7 concludes.
2. Setup

2.1. Outline and timing

I consider financial decisions of a firm that generates uncertain cash flows. The firm selects its capital structure, cash holdings, dividend payout, and default policy. The model is set in continuous time with an infinite horizon; time is indexed as $t \in [0, \infty)$. It is assumed that management acts in the interest of equity holders, all investors are risk neutral and discount cash flows at a constant risk-free rate $r$. The original equity holders are financially constrained and seek external financing to cover investment cost $I$ and initial cash reserves $C_0$. Investment cannot be delayed. Once successfully financed, the firm generates a continuous flow of earnings, with cumulative earnings at time $t$ denoted as $X_t$. The earnings process is the main state variable and is described in detail in the next subsection. Earnings are subject to corporate taxes at rate $\tau$ with a full loss offset provision. The debt coupon payments are deducted from earnings for tax purposes, creating the tax benefit of debt. Corporate cash reserves earn interest at the risk-free rate $r$. Other interest rates could influence the quantity of cash holdings, but should not affect the economic insights of the model. The financing may come from a combination of equity and perpetual debt, which promises coupon payments at rate $k$. The value function of equity is denoted $E$ and that of debt is $D$. The model allows for both fixed and proportional flotation costs of new issuance, denoted $\Lambda \ge 0$ and $\lambda \in [0, 1)$, respectively. For the sake of simplicity, the costs are the same for both debt and equity.

The sequence of events and decisions is as follows. At time $t = 0$, the firm issues a combination of equity and debt to maximize the value of the original equity holders. After that, the firm starts receiving the flow of earnings and pays out the promised coupon and corporate taxes. Net profits (or losses) are left at the disposal of the firm and are either retained to increase (decrease) cash reserves or are paid out to equity holders as dividends (in the case of instantaneous losses, dividends may be paid out from positive cash reserves). Cumulative dividends up to time $t$ are denoted by $Div_t$. To deal with indeterminate situations, I assume that equity holders pay out marginal cash holdings whenever they weakly prefer to do so.

When the firm has no means to cover the current coupon payments, it defaults for reasons of illiquidity. Such an event is called a liquidity default. The financial distress is driven here by short-term factors. The firm may also, acting in the interest of equity holders, voluntarily default if the value of equity falls below zero. In this case, the firm is not profitable enough for the equity holders to run it and pay the debt coupons. Then, the firm faces long-term distress; this type of default is referred to as a solvency default. In the event of either type of default, the firm is liquidated, which is costly. The debt claims have absolute priority in the case of default and the liquidation value is $\alpha A$, $\alpha \in (0, 1)$. Here, $1 - \alpha$ is the proportional liquidation cost and $A$ is the value of the all-equity firm at the moment of default.3

3 Following the standard in the literature, we simplify the analysis by assuming that the firm is not refinanced with an optimal capital structure after default.
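As a rough illustration of the accounting described in Section 2.1, the sketch below advances cash reserves by one small time step. It is an assumption-laden toy and not the model's solution: the function name `step_cash` and all numbers are hypothetical, interest on cash is assumed to be untaxed, and a liquidity default is flagged simply when cash would turn negative.

```python
def step_cash(cash, d_earnings, dt, r, tau, coupon, dividend):
    """Advance cash reserves over one short interval of length dt.

    Assumptions (not from the source): Euler discretization, interest on
    cash is untaxed, and illiquidity means cash would fall below zero.
    """
    # EBIT increment net of the tax-deductible coupon, taxed at rate tau.
    # Full loss offset: losses are "taxed" symmetrically, giving a rebate.
    after_tax_income = (1.0 - tau) * (d_earnings - coupon * dt)
    new_cash = cash + r * cash * dt + after_tax_income - dividend
    liquidity_default = new_cash < 0.0
    return new_cash, liquidity_default


# Hypothetical numbers: a negative earnings shock absorbed by a large buffer...
cash_ok, default_ok = step_cash(
    cash=1.0, d_earnings=-0.5, dt=0.1, r=0.05, tau=0.3, coupon=1.0, dividend=0.0
)
# ...and the same shock hitting a firm with a thin cash buffer.
cash_low, default_low = step_cash(
    cash=0.2, d_earnings=-0.5, dt=0.1, r=0.05, tau=0.3, coupon=1.0, dividend=0.0
)
```

The same shock leaves the first firm liquid and pushes the second into liquidity default, which is the sense in which cash reserves act as a buffer against short-term shocks.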
2.2. Earnings and uncertainty The firm generates a stochastic flow of earnings before interest and taxes (EBIT): dX t ¼ m dt þ s dZ t ,
ð1Þ
where m is the mean of EBIT, s is its volatility, and Z is a standard Brownian motion. All parties (insiders and outsiders) have the same information at each time t. They observe the cumulative EBIT process fXs ,s rtg that generates a filtration fF t g. There are two sources of uncertainty. First, instantaneous flows are subject to Brownian shocks dZ t , which represent short-term liquidity shocks. Second, the profitability of the firm is uncertain, which is represented by the fact that the true mean m is ex ante unknown to all parties. It is assumed that m is fixed and can take either of the two values mL or mH , with mL o mH . All parties share a common prior expectation m0 about m , with m0 2 ðmL , mH Þ. The two sources of uncertainty serve to capture the two main sides of corporate financial distress. The unpredictable immediate earnings (due to Brownian shocks) bring in the short-term liquidity risk. The uncertain drift m puts the firm in a position to undergo solvency distress and, ultimately, solvency default. As time evolves, more information becomes available and the parties update their expectation of mean earnings. The current set of information generated by Xt is described by F t and is used in a Bayesian fashion to update the conditional expectation to
mt ¼ E½m jF t : One can use the optimal filtering theory to find the law of motion of the posterior expectation variable. Let an innovation process Z be the difference between the realized and expected earnings; it is defined by the differential equation dX t ¼ mt dt þ s dZ t :
ð2Þ
The process Z is a Brownian motion adapted to filtration F t . Note that Z differs from Z (which is not observable and not adapted to F t ). Eq. (2) describes the dynamics of X in terms of observables. A version of Theorem 9.1 in Liptser and Shiryaev (2001) then yields that the posterior expectation of the mean earnings level evolves as dmt ¼
1
s
ðmt mL ÞðmH mt Þ dZ t :
ð3Þ
Note first that the posterior expectation process is a martingale as it incorporates all predictable information. Second, the volatility of $\mu$ is inversely related to $\sigma$, reflecting the fact that expectations adjust more rapidly if the noise term in the earnings process is small (the earnings signals are informative). Finally, learning slows down as evidence accumulates in favor of one state and $\mu$ is close to either $\mu_L$ or $\mu_H$.

The specification of the cash flow process in (1), which, with the use of filtering theory, can be rewritten as (2) and (3), is the novel and defining feature of the model. The motivation for this modeling choice is threefold.

First, the formulation allows me to capture the key characteristics of corporate liquidity and solvency shocks. Eq. (2) implies that short-term negative (positive) liquidity shocks are more likely if the firm is of low (high) expected long-term profitability. To see this, note that a negative liquidity shock, $dZ_t < 0$, occurs if the time $t$ EBIT $dX_t$ falls below the expected EBIT $\mu_t\,dt$. This is more probable if the true EBIT rate is low ($\mu = \mu_L$) rather than if the rate is high ($\mu = \mu_H$). Similarly, a positive liquidity shock, $dZ_t > 0$, is more likely if $\mu = \mu_H$. Hence, through the learning mechanism, as Eq. (3) demonstrates, liquidity shocks affect expected profitability. In this way, liquidity and solvency are separate but closely interrelated, or to use a phrase heard from an investment analyst, they are like non-identical twins.

Second, the present paper can integrate two strands of corporate finance literature that have so far been separate. On the one hand, cash flows are subject to unpredictable liquidity shocks to introduce non-trivial cash and dividend policy. This is similar to liquidity management models that analyze optimal dividend policy and predict precautionary cash reserves that cushion liquidity shocks (Jeanblanc-Picqué and Shiryaev, 1995). Technically, cumulative cash flows are modeled here as a stochastic process following an arithmetic Brownian motion. As a result, instantaneous cash flows are increments of the process and are subject to Brownian shocks.4 In contrast, the structural default literature typically models instantaneous cash flows as the level of a geometric Brownian motion, in which case instantaneous cash flows are predictable and liquidity management becomes trivial. On the other hand, this model also allows for the drift of the arithmetic Brownian motion to be uncertain to enable endogenous solvency default. In the models based on a simple arithmetic Brownian motion with constant drift, the expected profitability is constant and, given fixed debt obligations, the firm is always either solvent or insolvent. This removes endogenous default from the model. With uncertain drift, as assumed here, the firm may become insolvent, in the sense that it is not profitable enough for equity holders to cover its debt obligations (as in Leland, 1994; Leland and Toft, 1996, and others).

Third, it is analytically convenient to assume cash flows following the stochastic differential Eq. (1). Specifically, I obtain closed-form solutions for corporate securities values, optimal cash reserves, dividends, and a default threshold. The same stochastic environment has been successfully adapted in different contexts by Moscarini (2005) to study job matching in labor markets and Keppo, Moscarini, and Smith (2008) to analyze the value of and demand for information.
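The learning mechanism described above can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: it takes the EBIT process to be $dX_t = \mu\,dt + \sigma\,dZ_t$ with $\mu \in \{\mu_L, \mu_H\}$, uses the standard two-state Bayesian update for the posterior probability of the high state, and assumes a prior probability `p0 = 0.5` (so that $\mu_0$ is the midpoint of $\mu_L$ and $\mu_H$). The function name and the default parameter values are hypothetical choices, not taken from the paper.

```python
import numpy as np

def simulate_learning(mu_true, mu_L=0.0, mu_H=0.2, sigma=0.2, p0=0.5,
                      T=10.0, n_steps=2000, seed=0):
    """Simulate cumulative EBIT X_t and the posterior mean mu_t.

    Assumption: dX = mu_true dt + sigma dZ with mu_true in {mu_L, mu_H};
    p0 is the assumed prior probability of the high state.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    dX = mu_true * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    X = np.concatenate(([0.0], np.cumsum(dX)))
    # Log posterior odds of mu_H against mu_L given cumulative EBIT up to t
    # (Gaussian likelihood ratio of the two drifts).
    log_odds = (np.log(p0 / (1.0 - p0))
                + (mu_H - mu_L) / sigma**2 * (X - 0.5 * (mu_H + mu_L) * t))
    p = 1.0 / (1.0 + np.exp(-log_odds))
    mu_t = mu_L + p * (mu_H - mu_L)   # posterior expected EBIT rate
    return t, X, mu_t

if __name__ == "__main__":
    t, X, mu_t = simulate_learning(mu_true=0.2)
    print("posterior mean at start and end:", mu_t[0], mu_t[-1])
```

In this sketch, a run of EBIT realizations below the midpoint drift lowers the posterior mean, which is the sense in which liquidity shocks accumulate into changes in expected profitability.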
4 Instantaneous cash flows have also been modeled as increments of an arithmetic Brownian motion in the continuous-time agency-based models of corporate finance (DeMarzo and Sannikov, 2006; Biais, Mariotti, Plantin, and Rochet, 2007).
S. Gryglewicz / Journal of Financial Economics 99 (2011) 365–384
3. Solvency default without liquidity concerns
For the sake of comparison, I start with a benchmark. Following the framework introduced by Leland (1994), assume in this section that the firm is not subject to liquidity default. The endogenous solvency default is triggered by equity holders when equity value becomes negative. The equity holders are willing and able to inject any funds necessary to keep operations running whenever the value of equity is positive. Following Leland (1994), the proceeds from secondary equity financing are not subject to flotation costs. As in numerous contingent claims models of capital structure, a closed-form solution is available under the simplifying assumption that debt is issued only once at the initial date (Leland, 1994; Leland and Toft, 1996; Fan and Sundaresan, 2000; Duffie and Lando, 2001; Miao, 2005; Hackbarth, Hennessy, and Leland, 2007; Sundaresan and Wang, 2007).5 Accordingly, in this section, assume the following.

Assumption 1. New debt financing is available only at the initial time $t = 0$.

Assumption 2. Equity financing is costless beyond $t = 0$.

Under these assumptions the firm is without liquidity concerns and there is no room for cash holdings because any liquidity needs can be covered by an injection of equity financing. Subscript $nc$ is used in this section with the value functions to denote the financially unconstrained case. For brevity, I suppress the dependence of the value functions on other parameters except for $\mu$, but most notably they also depend on coupon $k$.

Consider first the value of the firm if it were financed fully by equity. Assuming that $\mu_L \ge 0$,6 the firm is always profitable and its value is simply equal to the expected discounted future after-tax cash flows:

$$A_{nc}(\mu) = E_\mu\left[\int_0^\infty e^{-rt}(1-\tau)\,dX_t\right] = (1-\tau)\frac{\mu}{r}.$$

The liquidation value that debt holders receive in the event of default is $\alpha A_{nc}(\mu)$, with $1-\alpha$ representing the proportional liquidation cost. The next step is to find the values of the claims held by the debt and equity holders. These values depend on the flows to the claimants and the default time. The optimal default time, chosen by the equity holders, is the first time expected profitability $\mu$ falls to some threshold $\mu_{nc}$. The firm issues perpetual debt that pays a constant continuous coupon at rate $k$ per unit of time. It follows from the standard arguments and Ito's lemma that, before default, debt value $D_{nc}$ satisfies the following Bellman-type ordinary differential equation:

$$r D_{nc}(\mu) = \frac{1}{2\sigma^2}(\mu-\mu_L)^2(\mu_H-\mu)^2 D_{nc}''(\mu) + k, \tag{4}$$

subject to

$$D_{nc}(\mu_{nc}) = \alpha A_{nc}(\mu_{nc}), \qquad D_{nc}(\mu_H) = \frac{k}{r}.$$

5 In an alternative and more complex setup, Goldstein, Ju, and Leland (2001) allow for upward leverage adjustments.
6 The alternative assumption that $\mu_L < 0$ would introduce an optimal liquidation of the firm even in the absence of debt financing. In this case, $A_{nc}(\mu)$ equals the expected discounted future after-tax cash flows up to the time of liquidation, which is optimally chosen by the equity holders. We omit this minor extension, which adds little to our model, while slightly raising the complexity of expressions.
This system states that if the firm is not in default, the required rate of return on the debt equals the sum of the coupon flow and the expected increase in the value of debt. At $\mu_{nc}$ the firm defaults and the debt is valued at $\alpha A_{nc}(\mu_{nc})$. The boundary condition at $\mu_H$, which is an absorbing state for $\mu$, asserts that $D_{nc}$ is bounded and equal to the risk-free value.

At each period $t$ before default, the equity receives the expected flow of $(1-\tau)(\mu_t - k)$, which is the expected free cash flow after taxes and coupon payments. Because, in general, $\mu_{nc} < k$ (confirmed below in (8)), this means that nonnegative dividends are expected as long as $\mu_t \ge k$ and that in periods with $\mu_t < k$, equity receives "negative dividends" in expectation. The negative distributions are typically interpreted in this type of model as equity issuances. This implies that, unrealistically and inconsistently with evidence on costly equity issuance, the firm resorts to frequent external financing, especially when close to default. This issue is addressed in the main model in Section 4 below. Within this setting, the equity value $E_{nc}$ must satisfy the following differential equation:

$$r E_{nc}(\mu) = \frac{1}{2\sigma^2}(\mu-\mu_L)^2(\mu_H-\mu)^2 E_{nc}''(\mu) + (1-\tau)(\mu - k), \tag{5}$$

subject to

$$E_{nc}(\mu_{nc}) = 0, \qquad E_{nc}(\mu_H) = (1-\tau)\frac{\mu_H - k}{r}.$$
This equation and the boundary conditions can be interpreted similarly to the ones for debt valuation. Having defined equity and debt values, one can calculate total levered firm value $F_{nc}$, which by definition equals the sum of equity and debt:

$$F_{nc}(\mu) = E_{nc}(\mu) + D_{nc}(\mu). \tag{6}$$

The equity holders choose the default trigger ex post, that is, after the initial financing. This means that they maximize equity value $E_{nc}$ over $\mu_{nc}$, which is equivalent to setting the smooth pasting condition on $E_{nc}(\mu)$ at $\mu_{nc}$:

$$E_{nc}'(\mu_{nc}) = 0. \tag{7}$$
The condition requires the optimal value function to be smooth at the default trigger, and, indeed, it can be shown that it corresponds to the first-order condition from maximization of $E_{nc}(\mu)$ with respect to $\mu_{nc}$.

The optimal capital structure is determined at the issuance point with the choice of coupon $k$, which maximizes the value of the initial equity holders (to indicate the dependence on $k$ directly, it is used explicitly as a parameter of the value functions in the remainder of this section). The firm seeks to finance the investment cost $I$ with debt and new equity. If the new equity holders obtain a fraction $\phi(k)$ of the equity and if the proportional and fixed issuance costs are $\lambda$ and $L$, then the following financing identity holds:

$$I = (1-\lambda)\bigl(D_{nc}(\mu_0,k) + \phi(k)E_{nc}(\mu_0,k)\bigr) - L,$$

which can be rewritten as

$$(1-\phi(k))\,E_{nc}(\mu_0,k) = D_{nc}(\mu_0,k) + E_{nc}(\mu_0,k) - \frac{I+L}{1-\lambda}.$$

The left-hand side represents the value of the initial equity holders. Hence, maximization of the left-hand side is equivalent to maximization of $E_{nc}(\mu_0,k) + D_{nc}(\mu_0,k)$. It then follows, using (6), that the optimal choice of coupon $k$ (and thus of the initial leverage) by the initial equity holders is equivalent to maximizing $F_{nc}(\mu_0,k)$. The findings of this section are summarized in the following proposition.

Proposition 1. Suppose Assumptions 1 and 2 hold and $\mu_L \ge 0$. The optimal solvency default is characterized by the first time $\mu$ is at or below $\mu_{nc}$ given by

$$\mu_{nc} = \frac{\mu_L\mu_H + [(\beta-1)\mu_H - \beta\mu_L]\,k}{(1-\beta)\mu_L + \beta\mu_H - k}. \tag{8}$$

If $\mu \ge \mu_{nc}$, the values of equity $E_{nc}(\mu)$, debt $D_{nc}(\mu)$, and total firm $F_{nc}(\mu)$ are given by

$$E_{nc}(\mu) = (1-\tau)\frac{\mu - k}{r} - (1-\tau)\frac{\mu_{nc} - k}{r}\left(\frac{\mu-\mu_L}{\mu_{nc}-\mu_L}\right)^{1-\beta}\left(\frac{\mu_H-\mu}{\mu_H-\mu_{nc}}\right)^{\beta}, \tag{9}$$

$$D_{nc}(\mu) = \frac{k}{r} - \left(\frac{k}{r} - \alpha A_{nc}(\mu_{nc})\right)\left(\frac{\mu-\mu_L}{\mu_{nc}-\mu_L}\right)^{1-\beta}\left(\frac{\mu_H-\mu}{\mu_H-\mu_{nc}}\right)^{\beta}, \tag{10}$$

and

$$F_{nc}(\mu) = (1-\tau)\frac{\mu}{r} + \tau\frac{k}{r} - \left((1-\alpha)A_{nc}(\mu_{nc}) + \tau\frac{k}{r}\right)\left(\frac{\mu-\mu_L}{\mu_{nc}-\mu_L}\right)^{1-\beta}\left(\frac{\mu_H-\mu}{\mu_H-\mu_{nc}}\right)^{\beta}, \tag{11}$$

where

$$\beta = \frac{1}{2} + \frac{1}{2}\sqrt{1 + \frac{8r\sigma^2}{(\mu_H-\mu_L)^2}} > 1. \tag{12}$$

The optimal coupon rate $k_{nc}^*$ maximizes $F_{nc}(\mu_0)$ over $k$.

The closed-form expressions for the value functions are interpreted as follows. The value of equity (9) is the sum of the present value of perpetual distributions to equity and the present value of cash flows lost at default. The value of risky debt in (10) consists of two terms. The first term, $k/r$, is the value of risk-free perpetual debt. The second term reflects the impact of default risk and equals the present value of cash flows lost by debt in case of default. Total firm value (11) consists of three elements: the first one is the present value of the perpetual flow of net earnings, the second is the present value of the tax benefits of debt, and finally, the negative term corrects for the present value of the cash flows lost at default. Eq. (8) implies that, in general, $\mu_{nc} < k$ (see also the discussion below Proposition 5 and Fig. 3). This means that, as in other structural default models following Leland (1994), the equity holders expect negative cash flows when close to default, yet they prefer to keep the firm running. Moreover, it is worth noting that neither the proportional flotation cost $\lambda$ nor the fixed one $L$ influences the optimal choice of $k$.

4. Cash holdings and dividends with financing constraints

As the previous section demonstrates, firms without financing constraints bear no liquidity risk and thus hold no cash reserves. To introduce liquidity risk, the model now restricts the firm's access to external financing. After the initial issuance, which is subject to fixed and proportional costs, the firm cannot raise additional capital. This assumption allows me to find closed-form solutions for the model and obtain a clear-cut comparison between the policies of constrained and unconstrained firms. Within the model, this assumption can be justified by a sufficiently high fixed issuance cost. More generally, financing constraints can be caused by asymmetric information between insiders and outside investors. For example, the firm's insiders may observe cash flows before they are reported to the outsiders. Because the firm knows more about its liquidity and solvency levels, it may be difficult to obtain reasonably priced external financing (similar to Myers and Majluf, 1984). For further reference, the following assumption is introduced.7

Assumption 3. New external financing is available only at the initial time $t = 0$.

As in the benchmark case, debt holders' claims have absolute priority over the productive assets in the case of default. However, the firm now also holds liquid nonproductive assets, namely cash reserves, and it is assumed that these can be distributed to equity before default. The analysis abstracts from any possible contracts that might limit such distributions as its focus is on cash and dividend policies at the discretion of equity holders. In any case, covenants that limit distributions just before default may be difficult to enforce as equity holders would try to preempt them. As shown below, in most cases the optimizing firm reaches the endogenous solvency default trigger with zero cash holdings.

7 Two more arguments can be given to justify Assumption 3. First, the model focuses on financial distress and constraints are related to external financing of firms in distress. It can well be that external financing of growth opportunities, left unmodeled here, is less constrained. Direct evidence on the significance of financial constraints, especially for firms in distress, is provided by, e.g., Holtz-Eakin, Joulfaian, and Rosen (1994), Zingales (1998), and Campello, Graham, and Harvey (2010). Second, Assumption 3 replaces Assumptions 1 and 2 of the benchmark model, in which following the standard in the related literature we assumed that, after the initial issuance, equity could be issued frequently and without cost and that the debt flotation costs (or other implicit concerns) would prohibit debt re-issuance. Empirical evidence indicates the opposite: new equity is issued less frequently than debt (Leary and Roberts, 2005) and, if anything, the issuance costs of debt are lower than those of equity (Altinkiliç and Hansen, 2000; Leary and Roberts, 2005). While still simplifying, Assumption 3 may be better in reflecting corporate reality as indicated by the empirical evidence than Assumptions 1 and 2.
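As a numerical companion to Proposition 1, the closed-form expressions (8)-(12) can be evaluated directly. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the default parameter values are chosen for illustration, and the optimal coupon is approximated by a plain grid search over $k$ that maximizes $F_{nc}(\mu_0)$, skipping coupons for which the firm would already be in default at $\mu_0$.

```python
import math

def beta(r, sigma, mu_L, mu_H):
    """Exponent from Eq. (12)."""
    return 0.5 + 0.5 * math.sqrt(1.0 + 8.0 * r * sigma**2 / (mu_H - mu_L)**2)

def default_trigger(k, r, sigma, mu_L, mu_H):
    """Solvency default trigger mu_nc from Eq. (8)."""
    b = beta(r, sigma, mu_L, mu_H)
    num = mu_L * mu_H + ((b - 1.0) * mu_H - b * mu_L) * k
    den = (1.0 - b) * mu_L + b * mu_H - k
    return num / den

def benchmark_values(mu, k, r=0.06, sigma=0.2, mu_L=0.0, mu_H=0.2,
                     tau=0.15, alpha=0.6):
    """Equity, debt and firm value from Eqs. (9)-(11), evaluated at mu."""
    b = beta(r, sigma, mu_L, mu_H)
    m_nc = default_trigger(k, r, sigma, mu_L, mu_H)
    # Common factor multiplying the default-related terms in (9)-(11).
    g = (((mu - mu_L) / (m_nc - mu_L)) ** (1.0 - b)
         * ((mu_H - mu) / (mu_H - m_nc)) ** b)
    a_nc = (1.0 - tau) * m_nc / r            # unlevered value at the trigger
    equity = (1.0 - tau) * (mu - k) / r - (1.0 - tau) * (m_nc - k) / r * g
    debt = k / r - (k / r - alpha * a_nc) * g
    firm = ((1.0 - tau) * mu / r + tau * k / r
            - ((1.0 - alpha) * a_nc + tau * k / r) * g)
    return equity, debt, firm

def optimal_coupon(mu0, r=0.06, sigma=0.2, mu_L=0.0, mu_H=0.2,
                   tau=0.15, alpha=0.6, n_grid=2000):
    """Grid-search approximation of the coupon maximizing F_nc(mu0)."""
    best_k, best_f = 0.0, (1.0 - tau) * mu0 / r   # unlevered value (k = 0)
    for i in range(1, n_grid):
        k = mu_H * i / n_grid
        m_nc = default_trigger(k, r, sigma, mu_L, mu_H)
        if not (mu_L < m_nc < mu0):
            continue                               # already in default at mu0
        f = benchmark_values(mu0, k, r, sigma, mu_L, mu_H, tau, alpha)[2]
        if f > best_f:
            best_k, best_f = k, f
    return best_k, best_f

if __name__ == "__main__":
    k_star, f_star = optimal_coupon(mu0=0.1)
    print("approximate optimal coupon:", k_star, "firm value:", f_star)
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid is used here only to keep the sketch dependency-free.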
4.1. Optimal policies

At each time before default, the firm generates stochastic EBIT $dX_t$ and pays out tax-deductible debt coupon $k\,dt$. The dynamics of earnings net of taxes and debt obligations, denoted by $Y_t$, is thus

$$dY_t = (1-\tau)(dX_t - k\,dt) = (1-\tau)(\mu_t - k)\,dt + (1-\tau)\sigma\,dZ_t. \tag{13}$$

Without cash reserves and with financing constraints, the firm becomes illiquid and is forced into default as soon as $dX_t < k\,dt$. In this model, positive cash reserves serve as a means to decrease liquidity risk. Denote cash reserves at time $t$ by $C_t$. Cash reserves change at each time by the instantaneous interest earned on current cash holdings and the difference between net earnings and dividend payout:

$$dC_t = rC_t\,dt + dY_t - dDiv_t. \tag{14}$$

In general, the higher $C_t$, the lower is the risk of liquidity distress. Of special interest is the level of cash holdings that allows the firm to avoid liquidity default altogether. The next proposition characterizes this level of cash reserves for given coupon $k$ and solvency default trigger $\mu^*$ (these values are endogenized in Section 5).

Proposition 2. Let $C^*$ be the lowest level of cash reserves that allows the firm to avoid liquidity default under Assumption 3. $C^*(\mu)$ is given by

$$C^*(\mu) = (1-\tau)\left[\frac{\sigma^2}{\mu_H-\mu_L}\ln\left(\frac{\mu-\mu_L}{\mu_H-\mu}\,\frac{\mu_H-\mu^*}{\mu^*-\mu_L}\right) + \max\left\{0,\ \frac{1}{r}\left(k - \frac{\mu_L+\mu_H}{2}\right)\right\}\right]. \tag{15}$$

The proof, given in the Appendix, relies on the requirement that the dividend process $Div_t$ is nondecreasing. This requirement implies a set of differential equations, with (15) being the minimal solution satisfying these equations.8 I show below that cash level $C^*$ plays a key role in optimal liquidity policy.

Before interpreting the expression for $C^*$ in (15), it is useful to determine the dividend stream that is implied by the cash policy $C_t = C^*(\mu_t)$. First, by Ito's lemma the dynamics of $C^*$ are

$$dC_t^* = (1-\tau)\left[\mu_t - \tfrac{1}{2}(\mu_H+\mu_L)\right]dt + (1-\tau)\sigma\,dZ_t. \tag{17}$$

8 An alternative and instructive way to see the result is to think of $C_t^*$ as the level of cash that is sufficient to withstand a shock in $Z_t$ that brings $\mu_t$ to $\mu^*$ (irrespective of how quickly the shock is realized). For brevity, we focus here on the case of $k \le \tfrac{1}{2}(\mu_H+\mu_L)$. Eq. (14) then implies that $C^*(\mu_t) = (1-\tau)\sigma(Z_t - Z^*)$, where $Z_t - Z^*$ is the shock that brings $\mu_t$ to default trigger $\mu^*$. To characterize $Z_t - Z^*$, let us define $y_t = f(\mu_t) = \frac{1}{\mu_H-\mu_L}\ln\frac{\mu_t-\mu_L}{\mu_H-\mu_t}$ and $y^* = f(\mu^*)$ (note that $y_t = y^*$ if and only if $\mu_t = \mu^*$). Applying Ito's lemma to $y_t$, we have

$$y_{t'} = y_t + \int_t^{t'} \frac{1}{2\sigma^2}(2\mu_s - \mu_H - \mu_L)\,ds + \frac{1}{\sigma}(Z_{t'} - Z_t).$$

This equation also holds for $y_{t'} = y^*$ in particular. So the shock that brings $y_t$ to $y^*$ (and also $\mu_t$ to $\mu^*$) is $Z_t - Z^* = \sigma(y_t - y^*)$. It follows that $C^*(\mu_t)$ must satisfy

$$C^*(\mu_t) = (1-\tau)\sigma^2\bigl(f(\mu_t) - f(\mu^*)\bigr) = (1-\tau)\frac{\sigma^2}{\mu_H-\mu_L}\ln\left(\frac{\mu_t-\mu_L}{\mu_H-\mu_t}\,\frac{\mu_H-\mu^*}{\mu^*-\mu_L}\right), \tag{16}$$

which confirms (15) in the proposition for the case $k \le \tfrac{1}{2}(\mu_H+\mu_L)$. To obtain the additional term in (15), one must impose the condition that the implied dividend payout is not negative for all $\mu_t > \mu^*$ (which is not the case under (16) if $k > \tfrac{1}{2}(\mu_H+\mu_L)$).
Then using (14) with (17) and (13), the dividend stream is given by

$$dDiv_t = rC^*(\mu_t)\,dt + dY_t - dC_t^* = \left[rC^*(\mu_t) + (1-\tau)\left(\frac{\mu_H+\mu_L}{2} - k\right)\right]dt. \tag{18}$$

As $C^*$ is the lowest level of cash reserves that allows the firm to avoid liquidity default, it is not surprising that $C^* = 0$ as $\mu$ reaches $\mu^*$ in case $k$ is not too large ($k \le \tfrac{1}{2}(\mu_H+\mu_L)$). If $k$ is larger than $\tfrac{1}{2}(\mu_H+\mu_L)$, then high coupon payments require positive cash holdings at all times before default. Note that the additional term in (15) when $k > \tfrac{1}{2}(\mu_H+\mu_L)$, that is $\frac{1}{r}\left[k - \tfrac{1}{2}(\mu_H+\mu_L)\right]$, makes the dividend rate in (18) equal to zero at default. The explicit formula for $C^*$ in (15) allows for easy calculation of several interesting direct effects of other variables:

$$\frac{\partial C^*}{\partial \mu} > 0, \qquad \frac{\partial C^*}{\partial \mu^*} < 0, \qquad \frac{\partial C^*}{\partial k} \ge 0, \qquad \frac{\partial C^*}{\partial \sigma} > 0.$$
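The target cash level (15) and the implied dividend rate (18) can be evaluated numerically to check the direct effects listed above. The sketch below rests on two assumptions that should be read as such: the factor $(1-\tau)$ is taken to multiply both terms of (15), which is the reading under which the dividend rate in (18) is zero at default when $k > \tfrac{1}{2}(\mu_H+\mu_L)$; and the default trigger `mu_star` and coupon `k` are treated as given inputs with hypothetical values rather than the optimal levels derived in Section 5.

```python
import math

def target_cash(mu, mu_star, k, r=0.06, sigma=0.2,
                mu_L=0.0, mu_H=0.2, tau=0.15):
    """Target cash C*(mu) as read from Eq. (15), for mu_star <= mu < mu_H."""
    log_term = math.log((mu - mu_L) / (mu_H - mu)
                        * (mu_H - mu_star) / (mu_star - mu_L))
    # Extra buffer needed only when the coupon exceeds the midpoint drift.
    extra = max(0.0, (k - 0.5 * (mu_L + mu_H)) / r)
    return (1.0 - tau) * (sigma**2 / (mu_H - mu_L) * log_term + extra)

def dividend_rate(mu, mu_star, k, r=0.06, sigma=0.2,
                  mu_L=0.0, mu_H=0.2, tau=0.15):
    """Dividend per unit of time when cash is kept at the target, Eq. (18)."""
    c = target_cash(mu, mu_star, k, r, sigma, mu_L, mu_H, tau)
    return r * c + (1.0 - tau) * (0.5 * (mu_H + mu_L) - k)

if __name__ == "__main__":
    # Hypothetical inputs: trigger 0.03, coupon 0.08, expected EBIT rate 0.10.
    print("C*      :", target_cash(0.10, 0.03, 0.08))
    print("div rate:", dividend_rate(0.10, 0.03, 0.08))
```

Evaluating `target_cash` at nearby values of `mu`, `mu_star`, `k`, and `sigma` reproduces the signs of the four direct effects, holding the other inputs fixed.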
The effects of $\mu$ and $\mu^*$ are opposite: $C^*$ increases in $\mu$ and decreases in $\mu^*$. To see the intuition for this result, note that the difference $\mu - \mu^*$ is a measure of solvency (distance to insolvency). If, with other variables kept constant, $\mu$ decreases or $\mu^*$ increases, the firm becomes less solvent. Now recall that persistent negative shocks decrease expected profitability and solvency (see Eq. (3)). So a less solvent firm needs to suffer from a less significant series of liquidity shocks before it is considered insolvent. The target level of cash $C^*$ is meant to protect against illiquidity, but not against insolvency. Consequently, lower solvency implies lower $C^*$.

The level of debt coupon $k$ affects $C^*$ directly and positively if $k$ is relatively high ($k > \tfrac{1}{2}(\mu_H+\mu_L)$). This effect is due to the burden that coupon payments impose on cash flows. If $k$ is high, then larger cash holdings are required to complement operational cash flows in meeting high debt obligations. It is important to note, however, that coupon choice will affect $C^*$ also indirectly via the endogenous insolvency trigger $\mu^*$. The analysis will return to the combined effect of $k$ on $C^*$ and its implications in Section 5.

The direct effect of EBIT volatility $\sigma$ on $C^*$ is positive. $\sigma$ measures the magnitude of liquidity shocks so higher cash reserves are needed to cushion more pronounced shocks. Also in this case, there are other indirect effects; a change in $\sigma$ will affect the coupon choice and default policy. Section 4.2.3 looks at the total effect and demonstrates, more interestingly, that it remains positive.

Suppose now that the dividend-cash policy aims at decreasing the risk of liquidity default. It is soon verified that this is indeed optimal if the firm's objective is to maximize equity value. Intuitively, this suggests that all cash flows are retained if the firm is at risk of liquidity default and that dividends are paid out as long as such distributions do not bring in liquidity risk.
To characterize this proposed dividend policy more formally, denote it by $Div_t^*$ at each time $t$. If, for a given $\mu_t$, the cash reserves are below the target level $C^*$, the firm retains all the earnings:

$$dDiv_t^* = 0 \quad \text{if } C_t < C^*(\mu_t). \tag{19}$$

If the cash level is at $C^*(\mu_t)$, the payout policy is such that this level is maintained as $\mu_t$ fluctuates. This is, according to (18),

$$dDiv_t^* = \left[rC_t + (1-\tau)\left(\frac{\mu_H+\mu_L}{2} - k\right)\right]dt \quad \text{if } C_t = C^*(\mu_t). \tag{20}$$

If the cash level exceeds $C^*(\mu_t)$, the residual is paid out:

$$dDiv_t^* = C_t - C^*(\mu_t) \quad \text{if } C_t > C^*(\mu_t). \tag{21}$$
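The three-regime payout rule (19)-(21) translates naturally into a small function. The sketch below is illustrative only: the function name, the tolerance `tol` used to decide whether cash is "at" the target in floating-point arithmetic, and the time step `dt` are assumptions introduced here, and the target level $C^*(\mu_t)$ is supplied by the caller.

```python
def payout(cash, target, r, k, mu_L, mu_H, tau, dt, tol=1e-12):
    """Dividend paid over a short interval dt under the rule (19)-(21).

    cash   -- current cash reserves C_t
    target -- target level C*(mu_t), computed elsewhere
    Returns (dividend, is_lump_sum).
    """
    if cash < target - tol:
        return 0.0, False                 # (19): retain all earnings
    if cash > target + tol:
        return cash - target, True        # (21): pay out the residual at once
    # (20): flow payout that keeps reserves at the target as mu_t moves
    rate = r * cash + (1.0 - tau) * (0.5 * (mu_H + mu_L) - k)
    return rate * dt, False

if __name__ == "__main__":
    params = dict(r=0.06, k=0.08, mu_L=0.0, mu_H=0.2, tau=0.15, dt=0.25)
    for cash in (0.10, 0.30, 0.50):
        print(cash, payout(cash, target=0.30, **params))
```

The flag distinguishing lump-sum from flow payouts is a convenience of this sketch; in the continuous-time rule the lump sum in (21) occurs instantaneously, after which the firm follows (20).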
Before proving that this cash-dividend policy is optimal for equity holders, it is useful to demonstrate an intuitive property of optimal equity value that states that the partial derivative of the optimal equity value $E(\mu,C)$ with respect to $C$ is larger than or equal to one. This is expected because any extra cash holdings can be paid out immediately as dividends, and the optimal dividend policy can be followed again. To see it, note that for any cash level $C$, equity value $E(\mu,C)$ of the firm following the optimal dividend policy must be at least equal to the sum of optimal equity value with $C - \Delta C$ cash, $E(\mu, C-\Delta C)$, and $\Delta C$ in a dividend payout: $E(\mu,C) \ge E(\mu,C-\Delta C) + \Delta C$. After rearranging the inequality and letting $\Delta C$ go to zero, one obtains

$$E_C(\mu,C) \ge 1. \tag{22}$$
The following proposition characterizes the optimal dividend policy.

Proposition 3. The payout policy (19)–(21) maximizes equity value.

The formal proof of this assertion provided in the Appendix is rather involved, but the basic intuition is fairly straightforward. The proposed payout policy is optimal because it directs the retention of all cash flows whenever marginal cash holdings decrease the probability of illiquidity (so that the cash withheld in the firm is worth more than its face value, $E_C(\mu,C) > 1$) and the payout of excess cash flows otherwise (when marginal cash holdings in the firm are equal to their face value, $E_C(\mu,C) = 1$). A useful corollary of Proposition 3 is that once cash holdings reach $C^*$, then dividend policy is given by (20) and cash balance stays at $C^*(\mu)$ for all levels of $\mu$ until solvency default at $\mu^*$. Consequently, the optimal cash and dividends are then described by deterministic functions of $\mu$.

4.2. Implications

This section derives empirical implications with respect to cash and dividend policies and compares the present model to the standard structural models without liquidity concerns. Changes in exogenous parameters typically affect a number of or all endogenous variables simultaneously. I analyze the comparative statics using the base case as a reference level. The base case parameter values are the following: $\mu_L = 0$, $\mu_H = 0.2$, $\sigma = 0.2$, $r = 0.06$, $\tau = 0.15$, $\alpha = 0.6$, $\lambda = 0.1$, and $\mu_0 = \tfrac{1}{2}(\mu_H+\mu_L) = 0.1$. The initial value of the expected cash flows is the mean of the binomial distribution. The volatility of cash flows is chosen such that the initial coefficient of variation (that is, $\sigma/\mu_0$) is equal to 2.0. This corresponds to the annualized coefficients of variation reported in Irvine and Pontiff (2009); they are equal to 1.59 for cash flows and 2.42 for earnings. The choice of the proportional flotation cost of $\lambda = 0.1$ is above the average parameter value estimated in some other studies (Gomes, 2001; Hennessy and Whited, 2005), and is justified by this paper's focus on firms that are financially constrained beyond the initial issuance. The values of the risk-free rate $r$, the tax advantage of debt $\tau$, and the recovery rate $\alpha$ closely correspond to the recent calibration exercises for trade-off models; see, for example, Hackbarth, Miao, and Morellec (2006).

4.2.1. Interaction of liquidity and solvency

The model identifies several channels that link liquidity and solvency. The first one, referred to as the information channel, is a consequence of the EBIT process specified in (1), in which both long- and short-term prospects are uncertain. This gives rise to a filtering problem with the solution in (2) and (3). The dynamics of both the observable EBIT in (2) and profitability in (3) are subject to common shocks. In financial terms, this means that liquidity shocks accumulate to affect solvency levels. If a firm is persistent in liquidity surprises, either positive or negative, they stop being surprising. The second link works from solvency to liquidity and I designate it as the hedging channel. As shown in Section 4.1, changes in solvency (measured by the distance to insolvency $\mu - \mu^*$) affect liquidity needs. A less solvent firm has a decreased continuation value and liquidity shocks it is willing to hedge are lower, so the firm needs less cash.
In one extreme, for example, a nearly insolvent firm should optimally hold only a small amount of cash, sufficient to hedge the last negative shocks before insolvency default. The hedging channel also means that for a given level of cash, a decrease in solvency leads to an increase in liquidity. The information and hedging channels determine the dynamics of cash management and dividend payouts. A third linkage between liquidity and solvency is related to optimal capital structure and will be discussed in Section 5.

It should be noted that the interactions of liquidity and solvency are not specific to firms in financial distress. The dynamics of cash and payouts that these interactions imply are valid for safe firms as well. What is important is that firms are equally financially constrained over time. However, if financing constraints are time-varying (e.g., safe firms become unconstrained), then the strength of the effects may also vary.

4.2.2. Precautionary cash holdings

The structural models following Leland (1994) have typically assumed away a meaningful cash policy. As in the benchmark analysis in Section 3, the equity holders are assumed to have no financial constraints and equity issuance is costless. Consequently, any necessary funds
are provided by new equity issuance as long as the equity holders are willing to continue operating the firm. This leaves the cash policy irrelevant. In contrast, the present model predicts that firms hold positive amounts of cash to meet debt coupon payments in case these obligations exceed current earnings. In other words, with costly external financing, cash reserves serve as a cushion to prevent short-term liquidity distress. The key feature of the model is that cash reserves are not meant to cover any losses. If the firm persistently generates losses, the expected profitability decreases and, ultimately, the firm becomes insolvent. As a result, the optimal policy prescribes cash holdings that are a function of the expected earnings and are sufficient to cover liquidity shocks up to the point of endogenous default. This cash policy has several desirable features and interesting implications, but first it is important to consider whether the role of cash implied by the model reflects the practice of corporate finance. Empirical studies indicate that the demand for corporate cash is driven mainly by the precautionary motives (Opler, Pinkowitz, Stulz, and Williamson, 1999; Lins, Servaes, and Tufano, 2010; Bates, Kahle, and Stulz, 2009). Precautionary cash can serve to fund future growth via capital expenditures and acquisitions (as in, e.g., Almeida, Campello, and Weisbach, 2004; Riddick and Whited, 2009) or to buffer against adverse cash flows. In a study based on financial accounting data, Opler, Pinkowitz, Stulz, and Williamson (1999) find that cash holdings do not seem to be used for capital expenditures, acquisitions, or dividend payments. Instead, large changes in cash are driven by negative or positive cash flow shocks. 
Based on a recent extensive survey among international chief financial officers (CFOs), Lins, Servaes, and Tufano (2010) conclude about strategic cash (their paper differentiates between operational cash, required in day-to-day operations, and strategic cash, the one studied here and in most of the literature):

"[S]trategic cash serves a basic function – to provide a general purpose buffer against future cash shortfalls. CFOs state that this is the primary driver of strategic cash holdings – with its importance ranking far exceeding that of other response choices. Thus, it appears that firms use strategic cash to insure against all types of negative shocks to cash flows, rather than to just fund growth when external capital may not be available. This finding positions strategic cash holdings as a form of financial distress (or bankruptcy) insurance."

Besides the fact that CFOs do not report funding future investment as an important reason for holding cash, Lins, Servaes, and Tufano (2010) also find that firms that self-report high needs for future external capital hold, in fact, significantly less cash than other firms. Overall, the evidence from both survey and accounting data closely matches the role of cash that is specified by the model. It is worth noting that the cash ratio (defined as cash holdings divided by total firm value) implied by the model is in line with cash holdings observed among U.S. firms.
With the base case parameters, the cash ratio equals 20.6% and is similar to the average cash ratio of 23.2% shown for a sample of U.S. firms in 2006 by Bates, Kahle, and Stulz (2009). The model predicts that optimal precautionary cash holdings increase in profitability. This relation is directly explained by the hedging channel linking liquidity and solvency. A more refined prediction is that cash holdings of financially constrained firms are strongly correlated with cash flows (compare (17) and (2)), while cash holdings of unconstrained firms are not systematically related to cash flows. This implication provides an alternative interpretation of the evidence of Almeida, Campello, and Weisbach (2004) that shows the same pattern of cash flow sensitivity of cash holdings. Almeida, Campello, and Weisbach (2004) explain their findings and precautionary cash holdings by the firms’ need to fund future investments while facing financing constraints.9 In contrast, in the present fully dynamic model, a constrained firm uses positive cash flows to build up cash holdings and uses cash holdings to cover negative cash flows to avoid inefficient default in the future. This mechanism can also be explained by the interaction of liquidity and solvency. Positive cash flow shocks increase the level of solvency via the information channel. Higher solvency results in higher demand for cash via the hedging channel.
9 Khurana, Martin, and Pereira (2006) and Sufi (2009) find further supporting evidence. Riddick and Whited (2009) question these results and, applying a correction in measurement error in Tobin's q, find a negative cash flow sensitivity of cash. Our theoretical contribution can be seen as supporting the positive cash flow sensitivity of cash using a different, arguably more prevalent, motive for corporate cash.

4.2.3. Earnings volatility, profitability uncertainty, and cash holdings

This section looks at how the two sources of uncertainty present in the model affect cash holdings. EBIT volatility is related to short-term liquidity risk and profitability uncertainty is related to long-term solvency risk. Fig. 1 examines their effects on the cash ratio.10

Increasing EBIT volatility $\sigma$ has two main direct effects on the endogenous variables. First, it increases the magnitude of liquidity shocks and, thus, liquidity risk. Second, it makes the instantaneous cash flows less informative about the true profitability $\mu$. Less informative signals lead to an increase in solvency default trigger $\mu^*$ due to a lower value of waiting with the decision to default (see Fig. 3). Fig. 1A shows that the cash ratio increases in $\sigma$. There are a number of forces at work. A larger liquidity risk requires a larger cash buffer. An increase of $\mu^*$ in $\sigma$ means lower solvency and so a lower demand for cash. However, with less informative cash flow signals, the firm must be ready to withstand significant negative liquidity shocks before eventual insolvency, which requires high levels of cash. Fig. 1A indicates that the first and third effects dominate the second one. This prediction is consistent with the empirical findings of Opler, Pinkowitz, Stulz, and Williamson (1999) and Han and Qiu (2007). Altogether, the analysis confirms that the explanation in Bates, Kahle, and Stulz (2009), that the recent spectacular expansion in cash holdings among U.S. firms is to a large degree due to the increasing volatility of cash flows, has a theoretical grounding in a model with endogenous cash and financing.

[Figure 1: two panels plotting the cash ratio (vertical axis) against EBIT volatility $\sigma$ (panel A) and against profitability uncertainty $\mu_H - \mu_L$ (panel B).]

Fig. 1. Effects of EBIT volatility $\sigma$ and profitability uncertainty $\mu_H - \mu_L$ on the cash ratio. The plotted values are of the financially constrained firm with liquidity concerns and with default and leverage determined endogenously. The other parameter values are $\mu_L = 0$, $\mu_H = 0.2$, $r = 0.06$, $\tau = 0.15$, $\alpha = 0.6$, $\lambda = 0.1$, and $\mu_0 = 0.1$.

Consider now the effects of changes in the uncertainty about the true level of profitability. With the binomial distribution of $\mu$, this uncertainty is measured by the spread between the high ($\mu_H$) and low ($\mu_L$) realizations of mean instantaneous earnings. This variable captures the uncertain economic value of the firm. In the comparative statics exercise, I vary $\mu_H - \mu_L$ around the mean $\mu_0 = \tfrac{1}{2}(\mu_H+\mu_L) = 0.1$. One effect is that a higher $\mu_H - \mu_L$ increases both the profit and loss potentials of the firm. The other effect is that with a higher spread $\mu_H - \mu_L$, the learning dynamics in $\mu_t$ become more rapid as the cash flow signals are more informative about either realization of $\mu$ (see Eq. (3)). This leads to a decrease in default trigger $\mu^*$ (see Fig. 3). Fig. 1B shows that cash holdings fall in increasing $\mu_H - \mu_L$. The negative effect comes from the increased speed of learning from cash flow shocks about the expected profitability. If negative liquidity translates quickly into a drop in expected profitability $\mu_t$, then less cash is required to cushion liquidity distress before insolvency at $\mu^*$. It turns out that this effect dominates and $C^*$ falls. This impact of $\mu_H - \mu_L$ on cash levels is opposite to the one of $\sigma$ and has not been tested empirically.

10 All the results presented in this section hold equally for cash measured in levels.
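The opposite effects of $\sigma$ and $\mu_H - \mu_L$ on the speed of learning can be made concrete with a small calculation. The sketch below assumes that Eq. (3) takes the standard two-state filtering form, in which the diffusion coefficient of $\mu_t$ is $(\mu_t-\mu_L)(\mu_H-\mu_t)/\sigma$; that form is an assumption here, as are the function names. It covers only the informativeness of cash flow signals and does not reproduce the endogenous default and leverage choices behind Fig. 1.

```python
def learning_speed(mu, mu_L, mu_H, sigma):
    """Diffusion coefficient of the posterior mean mu_t (assumed form of Eq. (3))."""
    return (mu - mu_L) * (mu_H - mu) / sigma

def speed_at_mean(spread, sigma, mu0=0.1):
    """Learning speed at mu0 when mu_L and mu_H lie symmetrically around mu0."""
    return learning_speed(mu0, mu0 - spread / 2.0, mu0 + spread / 2.0, sigma)

if __name__ == "__main__":
    # Higher EBIT volatility makes signals less informative (slower learning).
    for sigma in (0.1, 0.2, 0.3):
        print(f"sigma={sigma:.2f}   speed={speed_at_mean(0.2, sigma):.4f}")
    # A wider spread around the same mean makes signals more informative.
    for spread in (0.15, 0.20, 0.25):
        print(f"spread={spread:.2f}  speed={speed_at_mean(spread, 0.2):.4f}")
```

Under this assumed form, the speed at the mean equals the squared half-spread divided by $\sigma$, so it falls in $\sigma$ and rises in $\mu_H - \mu_L$, in line with the two mechanisms described in the text.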
4.2.4. Smooth dividends

The standard trade-off models treat dividends simply as a balancing item. This leads to a dividend pattern that bears little resemblance to actual corporate payout decisions. As in the benchmark case in Section 3, in these models all positive free cash flows are paid out and
dividends are omitted in periods of negative free cash flows. The model of this paper predicts a very different optimal payout policy. Consistent with empirical observations, firms either do not pay dividends at all (if cash is below $C^*(\mu)$) or, after they have initiated distributions, pay dividends regularly (if cash is at $C^*(\mu)$).

When cash reserves are at the target level $C^*(\mu)$, the optimal dividend payout is given by (20). These payouts allow the firm to maintain cash reserves at $C^*(\mu)$ as $\mu$ changes. The dividends characterized in (20) are, in contrast to net earnings in (13), without a Brownian shock and, moreover, are strictly positive before default. This implies that dividends are smoothed relative to earnings, in line with persistent empirical evidence (Lintner, 1956; Brav, Graham, Harvey, and Michaely, 2005; Leary and Michaely, 2008), and that firms in distress would rather reduce dividends than omit them (DeAngelo and DeAngelo, 1990).

Fig. 2 illustrates the dividend smoothing generated by the model. The left-hand panel presents a simulation of the EBIT process $X_t$ and posterior expectations $\mu_t$. I then use the model with liquidity concerns to calculate optimal dividends and cash reserves (the debt coupon and default trigger are set at the optimal levels from the analysis of Section 5). The right-hand panel shows quarterly net earnings and dividends from this simulation. Clearly, the net earnings are positive and negative in different quarters, but these changes are only partly reflected in dividend changes. The dividends remain relatively stable and, even in the case of losses, the firm continues to pay out dividends.

The dividend smoothing is driven by the interactions between liquidity and solvency and by the role of cash holdings as a cushion against liquidity shocks. The mechanism can be described as follows. Positive earnings surprises that bring in disposable cash flows also increase expected profitability (the information channel).
A more profitable firm is more valuable and thus, it requires more cash reserves to fend off liquidity distress before declaring solvency default (the hedging channel). As a result,
S. Gryglewicz / Journal of Financial Economics 99 (2011) 365–384

[Figure 2. Left panel: simulated EBIT process $X_t$ and posterior expectation $\mu_t$ against years. Right panel: quarterly net earnings and dividends against years.]

Fig. 2. Simulated quarterly earnings and dividends. Dividends are smoothed compared to earnings. The parameter values are $\mu_L = 0$, $\mu_H = 0.2$, $\sigma = 0.2$, $r = 0.06$, $\tau = 0.15$, $\alpha = 0.6$, $\lambda = 0.1$, and $\mu_0 = 0.1$.
dividends are flattened in the case of high earnings because an increase in cash flows is offset by increasing optimal cash reserves. In the case of surprisingly low earnings, expected profitability decreases (the information channel), the firm gets closer to endogenous solvency default, and the cash level that allows it to hedge liquidity distress decreases (the hedging channel). Consequently, low earnings lead to a release of some of the cash holdings, which are then distributed to equity. Both positive and negative earnings surprises are smoothed out, and as Fig. 2 demonstrates, the model predicts positive and stable dividends even if earnings are very volatile.

5. Capital structure, default, and credit spreads

5.1. Valuation of corporate securities

The values of corporate securities depend on a large number of factors, among them the initial cash level financed by external investors. To obtain closed-form solutions, it is assumed that the firm issues securities sufficient to cover cash holdings $C^*(\mu_0)$, which allow the firm to avoid liquidity risk.

Assumption 4. $C_0 = C^*(\mu_0)$.

Note that this assumption is partially validated by Assumption 3, which constrains the availability of external financing to the initial date. Without additional external financing, all the required cash is raised with the initial issuance.11

11 Note that with a sufficiently high variable issuance cost $\lambda$, the firm might prefer issuing securities for less than $I + C^*(\mu_0)$ (but more than $I$) and collecting the remaining cash up to $C^*$ from the retained earnings. The firm would balance the cost of exposure to liquidity risk and the benefit of a cheaper source of capital. We focus on the case generated by Assumption 4 that results in closed-form solutions for equity and debt values.

If $C_0 = C^*(\mu_0)$, then by Proposition 3 the optimal dividend policy is given in (20) for all $\mu_t > \mu^*$. This payout policy implies that $C_t = C^*(\mu_t)$ for all $\mu_t > \mu^*$. In
other words, under Assumption 4, the firm holds cash reserves at the level $C^*(\mu_t)$ until the endogenous solvency default and is hedged against liquidity risk.

Under the assumptions of the model, debt value $D$ equals the present value of continuous coupon payments up to the time of default, which occurs as soon as $\mu_t$ reaches $\mu^*$. $D(\mu)$ must satisfy the following differential equation:

$$ r D(\mu) = \frac{1}{2\sigma^2}(\mu - \mu_L)^2(\mu_H - \mu)^2 D''(\mu) + k. $$

At default, debt holders receive a fraction $\alpha$ of the EBIT-generating technology. That is, following the earlier literature, the model simplifies the financing issues after default. This implies that the debt holders recover $\alpha A(\mu^*)$ at default, where $A(\mu) = (1-\tau)\mu/r$ if $\mu_L \ge 0$. Thus, the differential equation for $D$ is coupled with the following boundary conditions:

$$ D(\mu^*) = \alpha A(\mu^*), \qquad D(\mu_H) = \frac{k}{r}. $$

Before default at the first time $\mu_t$ falls to $\mu^*$, the equity receives a flow of dividends that is equal to (combining (20) and (15))

$$ dDiv_t = a_1 \ln\!\left(\frac{\mu_t - \mu_L}{\mu_H - \mu_t}\,\frac{\mu_H - \mu^*}{\mu^* - \mu_L}\right) dt + a_2\, dt, $$

where

$$ a_1 = \frac{(1-\tau)\, r \sigma^2}{\mu_H - \mu_L} \qquad \text{and} \qquad a_2 = (1-\tau)\max\left\{0,\ \frac{\mu_L + \mu_H}{2} - k\right\}. $$

Then it follows from the standard arguments that equity value $E$ must satisfy the ordinary differential equation:

$$ r E(\mu) = \frac{1}{2\sigma^2}(\mu - \mu_L)^2(\mu_H - \mu)^2 E''(\mu) + a_1 \ln\!\left(\frac{\mu - \mu_L}{\mu_H - \mu}\,\frac{\mu_H - \mu^*}{\mu^* - \mu_L}\right) + a_2, \qquad (23) $$
subject to the following boundary conditions:

$$ E(\mu^*) - C^*(\mu^*) = 0, \qquad E(\mu_H) - C^*(\mu_H) = (1-\tau)\frac{\mu_H - k}{r}. \qquad (24) $$
As usual, the left-hand side of (23) reflects the required rate of return per unit of time for holding equity. The right-hand side represents the expected change in equity value plus the dividend flow per unit of time. The boundary condition at $\mu^*$ states that the value of equity net of cash is zero at insolvency and is in line with the assumption that the equity holders can withdraw non-productive liquid assets prior to default. The boundary condition at $\mu_H$ ensures that $E(\mu_H) - C^*(\mu_H)$ is bounded and equal to the risk-free value of free cash flows.

Solving the respective differential equations with the boundary conditions, one can obtain closed-form solutions for both equity and debt values. The following proposition shows these results.

Proposition 4. Suppose Assumptions 3 and 4 hold. Then, for a given default trigger $\mu^*$ and $\mu \ge \mu^*$, debt and equity value satisfy

$$ D(\mu) = \frac{k}{r} - \left(\frac{k}{r} - \alpha A(\mu^*)\right)\left(\frac{\mu - \mu_L}{\mu^* - \mu_L}\right)^{1-\beta}\left(\frac{\mu_H - \mu}{\mu_H - \mu^*}\right)^{\beta}, \qquad (25) $$

and

$$ E(\mu) = C^*(\mu) + (1-\tau)\frac{\mu - k}{r} - (1-\tau)\frac{\mu^* - k}{r}\left(\frac{\mu - \mu_L}{\mu^* - \mu_L}\right)^{1-\beta}\left(\frac{\mu_H - \mu}{\mu_H - \mu^*}\right)^{\beta}, \qquad (26) $$

with $\beta$ given in (12).

Eq. (25) implies that, for a given coupon $k$ and default trigger $\mu^*$, the value of debt is the same as in the benchmark case reported in Eq. (10) and is equal to the sum of the value of risk-free debt and the present value of the loss at default. Eq. (26) reveals that the value of equity, which is the present value of the flow of dividends until default, can be decomposed into three elements. It is a sum of the value of corporate cash plus the present value of all future net earnings plus the value of the option to default on debt of the insolvent firm at $\mu^*$. Notably, despite the fact that optimal dividends are different from net earnings, the equity value consists of the discounted value of net earnings plus current cash holdings. The reason for this is that even with liquidity concerns, the equity holders remain, in expectation, the claimants of all net earnings before default, but they use cash as a buffer between net profits and dividends to time distributions appropriately to manage liquidity risk.

The total firm value $F$ equals the sum of the value of equity and the value of corporate debt. From Proposition 4, it follows that, if $\mu_L \ge 0$,

$$ F(\mu) = E(\mu) + D(\mu) = C^*(\mu) + (1-\tau)\frac{\mu}{r} + \tau\frac{k}{r} - \left[(1-\alpha)(1-\tau)\frac{\mu^*}{r} + \tau\frac{k}{r}\right]\left(\frac{\mu - \mu_L}{\mu^* - \mu_L}\right)^{1-\beta}\left(\frac{\mu_H - \mu}{\mu_H - \mu^*}\right)^{\beta}. \qquad (27) $$

Eq. (27) demonstrates that the firm value is a sum of four components. It consists of the face value of cash holdings plus the present value of earnings net of taxes plus the
present value of the tax shield of debt minus the probability-adjusted present value of cash flows lost at default. Using (6) in (27) shows that the firm value net of cash, $F(\mu) - C^*(\mu)$, is equal to $F_{nc}(\mu)$, that is, the value of the firm with no financing constraints. By holding cash reserves $C^*(\mu)$, the firm is hedged against liquidity distress and thus, the value of its productive assets is equal to that of the financially unconstrained firm. Moreover, the cash in the firm $C^*(\mu)$ is worth exactly $C^*(\mu)$ because the interest gained on cash equals the investors’ discount rate. However, this close relation between the values of constrained and unconstrained firms holds only for given common debt levels (if the $k$’s are equal). But as the next section shows, the constrained firm with liquidity concerns chooses different financial leverage than the firm with no financing constraints.
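The closed forms in Proposition 4 can be checked numerically. The sketch below codes Eqs. (25) to (27) net of cash at the base-case parameters and verifies the boundary conditions, the decomposition $D + (E - C^*) = F - C^*$, and the debt differential equation by finite differences. Eq. (12) is not reproduced in this section, so `beta()` is an assumption: it takes the root above one of $\beta(\beta - 1) = 2r\sigma^2/(\mu_H - \mu_L)^2$, which is what the differential equation implies for the homogeneous solution. The trigger value 0.02 is an arbitrary illustrative choice, and all function names are hypothetical.

```python
import math

# Base-case parameters quoted in the figure captions.
MU_L, MU_H, SIGMA, R, TAU, ALPHA = 0.0, 0.2, 0.2, 0.06, 0.15, 0.6


def beta():
    """ASSUMPTION: root above one of b*(b-1) = 2*r*sigma^2/(mu_H-mu_L)^2."""
    return 0.5 + math.sqrt(0.25 + 2.0 * R * SIGMA**2 / (MU_H - MU_L) ** 2)


def default_factor(mu, mu_star, b):
    """((mu-mu_L)/(mu*-mu_L))^(1-b) * ((mu_H-mu)/(mu_H-mu*))^b."""
    left = ((mu - MU_L) / (mu_star - MU_L)) ** (1.0 - b)
    right = ((MU_H - mu) / (MU_H - mu_star)) ** b
    return left * right


def asset_value(mu):
    """A(mu) = (1 - tau) * mu / r, for mu_L >= 0."""
    return (1.0 - TAU) * mu / R


def debt_value(mu, k, mu_star, b):
    """Eq. (25)."""
    gap = k / R - ALPHA * asset_value(mu_star)
    return k / R - gap * default_factor(mu, mu_star, b)


def equity_net_of_cash(mu, k, mu_star, b):
    """Eq. (26) without the cash term C*(mu)."""
    option = (1.0 - TAU) * (mu_star - k) / R * default_factor(mu, mu_star, b)
    return (1.0 - TAU) * (mu - k) / R - option


def firm_net_of_cash(mu, k, mu_star, b):
    """Eq. (27) without the cash term C*(mu)."""
    loss = (1.0 - ALPHA) * asset_value(mu_star) + TAU * k / R
    return asset_value(mu) + TAU * k / R - loss * default_factor(mu, mu_star, b)


def debt_ode_residual(mu, k, mu_star, b, h=1e-4):
    """r*D - coeff*D'' - k, with D'' approximated by central differences."""
    d_mid = debt_value(mu, k, mu_star, b)
    d2 = (debt_value(mu + h, k, mu_star, b) - 2.0 * d_mid
          + debt_value(mu - h, k, mu_star, b)) / h**2
    coeff = (mu - MU_L) ** 2 * (MU_H - mu) ** 2 / (2.0 * SIGMA**2)
    return R * d_mid - coeff * d2 - k


if __name__ == "__main__":
    b, k, mu_star, mu0 = beta(), 0.1, 0.02, 0.1  # mu_star is illustrative only
    print("debt at default vs recovery:",
          debt_value(mu_star, k, mu_star, b), ALPHA * asset_value(mu_star))
    print("equity net of cash at default:", equity_net_of_cash(mu_star, k, mu_star, b))
    print("D + (E - C*) vs F - C* at mu0:",
          debt_value(mu0, k, mu_star, b) + equity_net_of_cash(mu0, k, mu_star, b),
          firm_net_of_cash(mu0, k, mu_star, b))
    print("ODE residual for D at mu0:", debt_ode_residual(mu0, k, mu_star, b))
```

The cash term is left out on purpose: by Eq. (27) it enters equity and firm value additively, so the checks above do not depend on the form of $C^*$.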
5.2. Default and optimal capital structure

Under Assumptions 3 and 4, the firm uses cash reserves to hedge against liquidity shocks. Then the timing of default is endogenously selected by the equity holders. Default takes place at the moment that the firm is not solvent enough. The default policy takes the form of a lower threshold on $\mu$, which maximizes equity value. This is achieved at $\mu^*$, which satisfies the smooth pasting condition:

$$ E'(\mu^*) = C^{*\prime}(\mu^*). \qquad (28) $$

(Compare it with the smooth pasting condition (7) and the boundary condition for $E$ at $\mu = \mu^*$ in (24) in the present model.)

The initial equity holders use equity and debt to finance the investment cost $I$ and the initial level of cash reserves $C^*(\mu_0, k)$ (to stress the dependence on $k$, I add parameter $k$ to cash and value functions in the rest of this section). If the new equity holders obtain a fraction $\phi(k)$ of equity and if the proportional cost of issuance of both debt and equity is $\lambda$ and the fixed cost of issuance is $L$, then the following financing identity holds:

$$ I + C^*(\mu_0, k) = (1-\lambda)\bigl(D(\mu_0, k) + \phi(k) E(\mu_0, k)\bigr) - L. $$

This can be rewritten as

$$ (1 - \phi(k))\, E(\mu_0, k) = D(\mu_0, k) + E(\mu_0, k) - \frac{C^*(\mu_0, k)}{1-\lambda} - \frac{I + L}{1-\lambda}. $$

The left-hand side represents the value to the initial equity holders. It follows that the optimal $k$ that maximizes $(1 - \phi(k)) E(\mu_0, k)$ also maximizes the right-hand side. Because $I$ and $L$ do not depend on $k$, the objective function can be expressed as (30) in the next proposition. The same proposition also presents the solution to the smooth pasting condition (28) for the optimal default trigger.

Proposition 5. Under Assumptions 3 and 4, the optimal solvency default is characterized by the first time that $\mu$ is at or below $\mu^*$, given by

$$ \mu^* = \frac{\mu_L \mu_H + \left[(\beta - 1)\mu_H - \beta \mu_L\right] k}{(1-\beta)\mu_L + \beta \mu_H - k}. \qquad (29) $$
The optimal debt coupon rate $k^*$ maximizes

$$ F(\mu_0, k) - \frac{C^*(\mu_0, k)}{1-\lambda} \qquad (30) $$

over $k$.

Fig. 3 presents the main properties of the optimal default trigger function (29). $\mu^*$ is a convex increasing function of $k$. It is intuitive that $\mu^*$ is equal to $\mu_H$ ($\mu_L$) with coupon equal to $\mu_H$ ($\mu_L$). This is because, with $k = \mu_H$, the equity holders expect losses for all $\mu$ and thus default immediately with $\mu^* = \mu_H$. When $k = \mu_L$, the firm generates positive expected profit net of coupon for all $\mu$ except at the absorbing state at $\mu_L$, and thus the equity value is maximized with a default at $\mu^* = \mu_L$. For the intermediate values of $k$ in $(\mu_L, \mu_H)$, the default threshold falls below the coupon rate; in the figure, $\mu^*$ lies below the diagonal $\mu^* = k$. This difference between the expected earnings at default and the coupon represents the value of waiting to default. Because of this value, the equity holders prefer to keep the firm running even though the coupon obligations exceed the expected earnings.

As illustrated in Fig. 3, default triggers $\mu^*$ increase in $\beta$. By Eq. (12), $\beta$ depends on the earnings signal quality (that is, on $\sigma$ and on $\mu_H - \mu_L$) and the discount rate. It follows that the default trigger increases with the noisiness of the earnings signals (higher $\sigma$ or smaller $\mu_H - \mu_L$) and with the level of the discount rate $r$. Intuitively, with noisy signals and high $r$, the value of postponing default in order to wait for new information decreases.

$\mu^*$ in Eq. (29) is the same as $\mu^*_{nc}$ in the benchmark case reported in (8). Since the firm is effectively hedged against liquidity distress, it makes sense that the solvency default trigger that maximizes equity value is the same as for the financially unconstrained firm. Interestingly, this is despite the precautionary cash reserves that need to be held in the firm. However, the isomorphism of $\mu^*$ and $\mu^*_{nc}$ means only that the default policy in both cases is the same if coupon obligations are the same. The second part
of Proposition 5 implies that, in general, the optimal coupons differ in the two cases with and without liquidity concerns. Using (27), that is, $F(\mu_0, k) = F_{nc}(\mu_0, k) + C^*(\mu_0, k)$, the objective function (30) can be rewritten as

$$ F_{nc}(\mu_0, k) - \frac{\lambda}{1-\lambda}\, C^*(\mu_0, k). \qquad (31) $$
Comparing this objective function with the one of the financially unconstrained firm (which was $F_{nc}(\mu_0, k)$), one notes the major difference between the cases. Whereas the coupon choice in the benchmark analysis was independent of any issuance cost, the optimal coupon of the constrained firm depends on the proportional issuance cost $\lambda$. This is because now the capital structure choice interferes with the firm’s financing needs: the firm needs to raise capital to cover the initial cash balance, and the required cash balance depends on the coupon rate itself. As raising additional units of cash is costly due to the variable issuance cost, the firm’s optimal choice of debt also takes into account its impact on the initial amount of cash to be raised. One can expect that the outcome depends on the sign of the relation between $k$ and $C^*$. If $C^*$ decreases in $k$, the firm should be willing to accept a higher coupon to limit the needed cash and save on the cost of raising additional capital. If, on the other hand, $C^*$ increases in $k$, it should be optimal to take a somewhat lower coupon and less debt. I shall analyze these effects and their consequences in Section 5.3.

In the model, liquidity risk can be hedged with appropriate cash policy. However, the presence of liquidity concerns is sufficient to distort the financing policies and the firm value. The value of the constrained firm net of cash ($F - C^*$) is always less than (or equal to, in some special cases) the value of the unconstrained firm ($F_{nc}$):

$$ F(\mu_0, k^*) - C^*(\mu_0, k^*) \le F_{nc}(\mu_0, k^*_{nc}). $$

The relevant comparison is with the constrained firm net of cash because the unconstrained firm does not hold cash in the model. The reason for the inequality is that the unconstrained firm chooses its debt coupon $k^*_{nc}$ to maximize $F_{nc}(\mu, k)$. By (27), the same coupon maximizes $F(\mu, k) - C^*(\mu, k)$. However, the constrained firm selects its coupon $k^*$ to maximize (30), $F(\mu, k) - C^*(\mu, k)/(1-\lambda)$. It follows that the capital structure of the constrained firm is distorted in such a way that its net value is below that of the unconstrained firm.

It is worth noting that, in the absence of financing frictions in the sense of zero variable cost of issuance ($\lambda = 0$), the objective function simplifies to $F_{nc}(\mu_0, k)$ and is exactly equivalent to the problem in the case without liquidity constraints. Moreover, the fixed cost of issuance does not matter for the choice of the optimal $k$.
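The properties of the default trigger in Proposition 5 can be sketched in a few lines. The function `default_trigger` codes Eq. (29) directly. Eq. (12) is not reproduced in this section, so `beta` is an assumption: it uses the root above one of $\beta(\beta - 1) = 2r\sigma^2/(\mu_H - \mu_L)^2$, the value implied by the valuation equations of Section 5.1. Both function names are hypothetical.

```python
import math


def beta(r, sigma, mu_l, mu_h):
    """ASSUMPTION: root above one of b*(b-1) = 2*r*sigma^2/(mu_h-mu_l)^2."""
    return 0.5 + math.sqrt(0.25 + 2.0 * r * sigma**2 / (mu_h - mu_l) ** 2)


def default_trigger(k, b, mu_l, mu_h):
    """Eq. (29): equity-maximizing solvency default threshold."""
    numerator = mu_l * mu_h + ((b - 1.0) * mu_h - b * mu_l) * k
    denominator = (1.0 - b) * mu_l + b * mu_h - k
    return numerator / denominator


if __name__ == "__main__":
    mu_l, mu_h = 0.0, 0.2
    b_base = beta(r=0.06, sigma=0.2, mu_l=mu_l, mu_h=mu_h)
    print("coupon  trigger(base beta)  trigger(beta=2)  trigger(beta=10)")
    for k in (0.0, 0.05, 0.10, 0.15, 0.20):
        row = [default_trigger(k, b, mu_l, mu_h) for b in (b_base, 2.0, 10.0)]
        print(f"{k:6.2f}  " + "  ".join(f"{x:15.4f}" for x in row))
```

The table shows the trigger pinned to the coupon at both ends of the interval, lying below the diagonal in between, and moving toward the diagonal as $\beta$ grows.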
Fig. 3. Optimal default trigger $\mu^*$ as a function of debt coupon $k$ for various values of $\beta$ with $\beta_1 < \beta_2 < \beta_3$ and $\beta_3 \to \infty$. $\beta$ increases in $\sigma$ and $r$ and decreases in $(\mu_H - \mu_L)$.

5.3. Implications
To examine implications of the model with respect to capital structure and credit spreads, I employ the same base case parameters as in Section 4.2.
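A rough base-case calculation shows how the objective functions of Section 5.2 translate into coupon and leverage choices. The sketch below grid-searches the coupon that maximizes $F_{nc}(\mu_0, k)$ for the unconstrained firm and objective (31) for the constrained firm, and then reports debt over firm value. Two ingredients are assumptions because Eqs. (12) and (15) are not reproduced in this section: `BETA` is the root above one of $\beta(\beta-1) = 2r\sigma^2/(\mu_H-\mu_L)^2$, and `target_cash` uses the form of $C^*$ that is consistent with the dividend flow and Eqs. (23) and (26). All names are hypothetical, and the output is only meant to be compared informally with the leverage ratios quoted in Section 5.3.5.

```python
import math

# Base-case parameters from the figure captions.
MU_L, MU_H, SIGMA, R, TAU, ALPHA, LAM, MU_0 = 0.0, 0.2, 0.2, 0.06, 0.15, 0.6, 0.1, 0.1
# ASSUMPTION: Eq. (12) is not shown here; this is the root implied by the ODE.
BETA = 0.5 + math.sqrt(0.25 + 2.0 * R * SIGMA**2 / (MU_H - MU_L) ** 2)


def default_trigger(k):
    """Eq. (29)."""
    num = MU_L * MU_H + ((BETA - 1.0) * MU_H - BETA * MU_L) * k
    return num / ((1.0 - BETA) * MU_L + BETA * MU_H - k)


def default_factor(k):
    """Power term of Eqs. (25)-(27), evaluated at mu_0."""
    ms = default_trigger(k)
    return (((MU_0 - MU_L) / (ms - MU_L)) ** (1.0 - BETA)
            * ((MU_H - MU_0) / (MU_H - ms)) ** BETA)


def debt_value(k):
    """Eq. (25) at mu_0."""
    recovery = ALPHA * (1.0 - TAU) * default_trigger(k) / R
    return k / R - (k / R - recovery) * default_factor(k)


def firm_value_nc(k):
    """Eq. (27) net of cash, at mu_0."""
    loss = (1.0 - ALPHA) * (1.0 - TAU) * default_trigger(k) / R + TAU * k / R
    return (1.0 - TAU) * MU_0 / R + TAU * k / R - loss * default_factor(k)


def target_cash(k):
    """ASSUMED form of C*(mu_0, k); Eq. (15) is not reproduced in this section."""
    ms = default_trigger(k)
    log_term = math.log((MU_0 - MU_L) * (MU_H - ms) / ((MU_H - MU_0) * (ms - MU_L)))
    direct = max(0.0, k - 0.5 * (MU_L + MU_H)) / R
    return (1.0 - TAU) * (SIGMA**2 / (MU_H - MU_L) * log_term + direct)


def leverage_ratios():
    """Grid search over coupons for which the trigger stays below mu_0."""
    grid = [i / 1000.0 for i in range(10, 181)]
    k_nc = max(grid, key=firm_value_nc)
    k_c = max(grid, key=lambda k: firm_value_nc(k) - LAM / (1.0 - LAM) * target_cash(k))
    lev_nc = debt_value(k_nc) / firm_value_nc(k_nc)
    lev_c = debt_value(k_c) / (firm_value_nc(k_c) + target_cash(k_c))
    return k_nc, lev_nc, k_c, lev_c


if __name__ == "__main__":
    k_nc, lev_nc, k_c, lev_c = leverage_ratios()
    print(f"unconstrained: coupon={k_nc:.3f} leverage={lev_nc:.3f}")
    print(f"constrained:   coupon={k_c:.3f} leverage={lev_c:.3f}")
```

A grid search is used instead of a smooth optimizer because, under the assumed cash formula, the constrained objective has a kink at $k = (\mu_L + \mu_H)/2$.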
5.3.1. Interaction of liquidity and solvency

Section 4.2 introduced the information and hedging channels linking liquidity and solvency. Endogenous leverage generates a third linkage: liquidity concerns affect the level of solvency via the capital structure choice. Accordingly, this effect is referred to as the leverage channel. It originates from the interaction between capital structure and the demand for external financing. As discussed above, the constrained firm raises capital to cover not only the required investment outlay but also the initial cash reserves, and the level of cash is affected by the amount of issued debt. Because marginal external financing is costly, the firm, in its capital structure choice, will attempt to minimize the initial level of cash (see Eq. (30)).

To understand the direction of this mechanism, consider the impact of the debt coupon rate $k$ on cash holdings. From (15) observe that $k$ affects $C^*$ in two ways. The main effect works for all levels of $k$ indirectly via the solvency default trigger $\mu^*$. Higher coupon obligations mean closer insolvency (that is, higher $\mu^*$; see (29)) and this results in lower cash needs. The direct effect, discussed already in Section 4.1, comes from the last term of (15) and stems from the fact that high debt obligations deplete cash flows. It works if the debt coupon is relatively high ($k > \tfrac{1}{2}(\mu_H + \mu_L)$) and results in $C^*$ increasing in $k$. Fig. 4 demonstrates the effects of the coupon on the target level of cash $C^*$ for the base case environment. The total impact is such that cash holdings decrease in $k$ for small $k$ and increase in $k$ for larger $k$, and it appears robust for various parameter choices. Because $C^*$ is minimized at the intermediate levels of $k$, it follows that to minimize the flotation cost of raising the initial cash reserves, the constrained firm issues more debt than the unconstrained firm if the unconstrained firm’s optimal coupon is relatively low (below $(\mu_L + \mu_H)/2$). The opposite happens if the unconstrained firm’s optimal coupon is high (above $(\mu_L + \mu_H)/2$).

The leverage channel can be viewed as an extension of the trade-off theory. In the standard trade-off theory without liquidity concerns, capital structure is determined to balance tax benefits of debt and bankruptcy costs. With liquidity concerns, firms must take into account another trade-off layer. Firms do not want to take too little debt because high solvency exposes them to high liquidity risk. This requires firms to raise more initial cash holdings, which is costly. On the other hand, firms do not want to accept too high debt levels as this implies high coupon payments that put a strain on cash flows. In this case, firms need more cash holdings to complement cash flows in case of liquidity shocks, and this is again costly.
5.3.2. Cash holdings and debt

The empirical literature has been interested in the impact of debt on corporate cash holdings, treating the former variable as exogenous. Fig. 4 presents the cash level $C^*$ as a function of debt coupon and shows that cash decreases in debt for low and moderate levels of debt and increases with high levels of debt. The empirical evidence of Opler, Pinkowitz, Stulz, and Williamson (1999) shows a negative relation between cash and leverage. A more refined study by Guney, Ozkan, and Ozkan (2007) provides evidence for a non-monotonic relation between cash holdings and debt, in line with this paper’s prediction.

The model predicts that the marginal value of cash holdings to equity holders varies across firms with different capital structures. In particular, the model is able to encompass all the main hypotheses of the recent empirical study of Faulkender and Wang (2006). They hypothesize and empirically show that the marginal value of cash is higher for financially constrained firms and is decreasing in the level of cash reserves and the leverage ratio. In the present framework, the marginal value of cash is equal to one for both unconstrained and constrained firms at or above the target cash level $C^*$. Because the probability of liquidity default decreases with an additional unit of cash, the marginal value of cash exceeds one in constrained firms with cash below $C^*$.
[Figure 4. Vertical axis: target cash. Horizontal axis: debt coupon.]

Fig. 4. Cash balance $C^*$ as a function of debt coupon $k$. The parameter values are: $\mu_L = 0$, $\mu_H = 0.2$, $\sigma = 0.2$, $r = 0.06$, $\tau = 0.15$, $\alpha = 0.6$, and $\mu_0 = 0.1$.
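The shape described for Fig. 4 can be reproduced approximately with the sketch below. Eq. (15) is not reproduced in this section, so `target_cash` is an assumption: it uses a log term that falls as the default trigger rises plus a term that is active only when the coupon exceeds $\tfrac{1}{2}(\mu_H + \mu_L)$, which is the form consistent with the dividend flow and Eqs. (23) and (26). `BETA` is likewise assumed to be the root above one of $\beta(\beta-1) = 2r\sigma^2/(\mu_H-\mu_L)^2$. The function names are hypothetical.

```python
import math

# Parameters from the caption of Fig. 4.
MU_L, MU_H, SIGMA, R, TAU, MU_0 = 0.0, 0.2, 0.2, 0.06, 0.15, 0.1
# ASSUMPTION: Eq. (12) is not shown in this section.
BETA = 0.5 + math.sqrt(0.25 + 2.0 * R * SIGMA**2 / (MU_H - MU_L) ** 2)


def default_trigger(k):
    """Eq. (29)."""
    num = MU_L * MU_H + ((BETA - 1.0) * MU_H - BETA * MU_L) * k
    return num / ((1.0 - BETA) * MU_L + BETA * MU_H - k)


def target_cash(mu, k):
    """ASSUMED form of C*(mu, k); valid for mu above the default trigger."""
    ms = default_trigger(k)
    # Indirect effect: a higher coupon raises the trigger and shrinks the log term.
    indirect = SIGMA**2 / (MU_H - MU_L) * math.log(
        (mu - MU_L) * (MU_H - ms) / ((MU_H - mu) * (ms - MU_L)))
    # Direct effect: active only for coupons above the midpoint of mu_L and mu_H.
    direct = max(0.0, k - 0.5 * (MU_L + MU_H)) / R
    return (1.0 - TAU) * (indirect + direct)


if __name__ == "__main__":
    for i in range(4, 17, 2):
        k = i / 100.0
        print(f"coupon={k:.2f} target cash={target_cash(MU_0, k):.3f}")
```

Under these assumptions the target cash level is U-shaped in the coupon, with the turning point at the midpoint of $\mu_L$ and $\mu_H$ where the direct effect switches on.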
[Figure 5. Panel A: credit spread against $\sigma$. Panel B: credit spread against $\mu_H - \mu_L$.]

Fig. 5. Effects of EBIT volatility $\sigma$ (panel A) and profitability uncertainty $\mu_H - \mu_L$ (panel B) on credit spreads. The solid lines plot the respective values of the financially constrained firm with liquidity concerns, and the dashed lines plot the values of the unconstrained firm. Default and leverage are determined endogenously. The other parameter values are $\mu_L = 0$, $\mu_H = 0.2$, $r = 0.06$, $\tau = 0.15$, $\alpha = 0.6$, $\lambda = 0.1$, and $\mu_0 = 0.1$.
It follows that the marginal value of cash is larger for constrained firms and that, in the case of constrained firms, it decreases with the level of cash holdings. Most interestingly, one can derive a clear interpretation of the negative cross-sectional relation between the marginal value of cash and debt level found by Faulkender and Wang (2006) (they seem to build their hypothesis and interpretation on the contingent claims models that do not have a meaningful cash policy). As explained above, for small and moderate levels of debt, the target level of cash decreases in debt. Then, for a fixed level of cash below $C^*$, an increase in debt implies that the current cash holdings are closer to $C^*$, so the firm is closer to being fully hedged against liquidity risk. Consequently, the marginal value of cash decreases in debt. The model also predicts an untested possibility that the relation is reversed for high levels of debt.

5.3.3. Earnings volatility, profitability uncertainty, and credit spreads

This section analyzes the effects of the two sources of uncertainty on credit spreads. As in Section 4.2.3, EBIT volatility is measured by $\sigma$. Profitability uncertainty is varied by a mean-preserving spread of $\mu_H - \mu_L$ around 0.1. Fig. 5 displays their effects on credit spreads, defined as the difference between the debt yield and the risk-free rate, $k/D - r$. The presented values are calculated with default triggers and coupons at the optimal levels.

The total effect of EBIT volatility $\sigma$ on credit spreads is negative (see Fig. 5A). Higher volatility magnifies liquidity shocks and makes cash flow signals less informative about profitability. Higher liquidity risk and lower informativeness of cash flows increase the cost of debt. The opposing effect, which decreases credit spreads, is that the firm responds to the more expensive debt financing by issuing less debt. It turns out that the second effect dominates.
Changes in the uncertainty about the true level of profitability affect credit spreads in several ways. A higher spread $\mu_H - \mu_L$ means that cash flow signals are more informative and that default is relatively late. These effects make debt cheaper, but they may be offset if more debt is issued. As demonstrated in Fig. 5B, this is the case: credit spreads increase in $\mu_H - \mu_L$ because the combined effects of the decreased default trigger and of informative cash flows make it attractive for shareholders to issue more debt.

It is interesting to investigate how cash and credit spreads are related to each other when the exogenous variables vary (that is, combine Fig. 1A with Fig. 5A and Fig. 1B with Fig. 5B). It appears that credit spreads decrease in cash. This pattern is persistent and holds irrespective of whether the underlying exogenous variable is $\sigma$ or $\mu_H - \mu_L$. Empirically, such a negative relation is found in Acharya, Davydenko, and Strebulaev (2008).

When comparing the effects of earnings volatility and profitability uncertainty in Figs. 1 and 5, it is striking that the two measures of risk have opposing signs for some of the key financial variables.12 In essence, this is because EBIT volatility and profitability uncertainty are differently related to liquidity and solvency concerns. Taken together, the results presented here call for a differentiation between short-term volatility in cash flows and long-term uncertainty about economic prospects in both theoretical and empirical analysis of corporate finance.
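The credit spread definition used in this subsection, $k/D - r$, can be evaluated directly from Eq. (25) and Eq. (29). The sketch below does so at the base-case parameters for a few fixed coupons. It holds the coupon fixed, whereas Fig. 5 uses the optimal coupon, so it illustrates the formula, not the figure. `BETA` is an assumption (Eq. (12) is not reproduced in this section): the root above one of $\beta(\beta-1) = 2r\sigma^2/(\mu_H-\mu_L)^2$. The function names are hypothetical.

```python
import math

# Base-case parameters from the caption of Fig. 5.
MU_L, MU_H, SIGMA, R, TAU, ALPHA, MU_0 = 0.0, 0.2, 0.2, 0.06, 0.15, 0.6, 0.1
# ASSUMPTION: Eq. (12) is not shown in this section.
BETA = 0.5 + math.sqrt(0.25 + 2.0 * R * SIGMA**2 / (MU_H - MU_L) ** 2)


def default_trigger(k):
    """Eq. (29)."""
    num = MU_L * MU_H + ((BETA - 1.0) * MU_H - BETA * MU_L) * k
    return num / ((1.0 - BETA) * MU_L + BETA * MU_H - k)


def debt_value(mu, k):
    """Eq. (25) with the default trigger from Eq. (29)."""
    ms = default_trigger(k)
    factor = (((mu - MU_L) / (ms - MU_L)) ** (1.0 - BETA)
              * ((MU_H - mu) / (MU_H - ms)) ** BETA)
    recovery = ALPHA * (1.0 - TAU) * ms / R
    return k / R - (k / R - recovery) * factor


def credit_spread(mu, k):
    """Debt yield minus the risk-free rate, k / D - r."""
    return k / debt_value(mu, k) - R


if __name__ == "__main__":
    for k in (0.06, 0.08, 0.10, 0.12, 0.14):
        print(f"coupon={k:.2f} debt={debt_value(MU_0, k):.3f} "
              f"spread={credit_spread(MU_0, k):.4f}")
```

With the coupon held fixed, the spread rises with the coupon because a larger promised payment brings the default trigger closer to current expected profitability.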
12 The effects are not always opposite for other variables. In unreported analysis, we find, for example, that the leverage ratio decreases in both measures of uncertainty. This relation is in accordance with the empirical evidence on leverage (Titman and Wessels, 1988).

5.3.4. Dispersion of credit spreads

The model has further implications for debt credit spreads when it is compared to the benchmark model without liquidity concerns. In Fig. 5, the solid lines plot the values for the financially constrained firm with liquidity concerns, and the dashed lines plot the values for the unconstrained firm. Both Figs. 5A and B show that with financing constraints, the predicted credit spreads
are less dispersed than in the case without financing constraints. This result is explained by the leverage channel that links liquidity and solvency, that is, by the influence of exposure to liquidity risk on optimal leverage. As discussed before, because external financing is costly, firms attempt to not raise too much initial liquidity and this is achieved at intermediate debt levels. Hence, high debt levels are decreased and low debt levels are increased, which ultimately translates into flattened credit spreads when compared to the financially unconstrained case.

This effect allows the model to address the key problem with the predictive power of structural models as reported by Eom, Helwege, and Huang (2004). They test the yield spread predictions of several structural models and conclude that the available models tend to produce too high a dispersion of predicted credit spreads. Where the structural models predict high credit spreads, these predictions notably exceed the actual spreads, and where the models predict low credit spreads, these predictions fall significantly below the observed ones. A closely related regularity is shown by Huang and Huang (2003), who find that a small (large) fraction of yield spreads of investment-grade bonds (high-yield bonds) is explained by credit risk implied by structural models. Liquidity concerns in this model shift the predicted credit spreads in the desired direction.

5.3.5. Leverage

A weakness of the standard trade-off model of capital structure that has frequently been raised in the literature is that the optimal leverage implied by the model exceeds the leverage ratios observed empirically. The model proposed here lessens this problem. Numerical analysis indicates that the leverage ratio (debt to firm value) of the firm with liquidity concerns is significantly below the ratio of the unconstrained firm. For different parameter values, the drop in the leverage ratio is between 15% and 40%.
For example, with the base case parameters, the unconstrained firm’s leverage ratio is 0.68 and falls to 0.53 for the firm with liquidity concerns. While there are a number of effects that liquidity concerns bring to capital structure, the driving force behind this remarkably reduced leverage is the recognition of the role of cash in corporate assets. As the total assets of the constrained firm incorporate the value of cash, the leverage ratio decreases.

6. Related literature

This paper builds on the contingent claims models of risky asset valuation introduced by Black and Scholes (1973) and Merton (1974). Since the trade-off models of Fischer, Heinkel, and Zechner (1989) and Leland (1994), an important part of the literature has focused on the corporate-finance implications of contingent claims modeling with the central role given to the optimal choice of capital structure. Subsequent extensions analyze debt maturity, debt renegotiation, recapitalization,
incomplete accounting information, macroeconomic regimes, debt structure, and investment.13 Despite these developments, the structural trade-off framework has not been successful in incorporating some essential corporate financial decisions. The existing models typically predict no role for corporate cash holdings, treat dividends merely as balancing items, and focus on solvency default and neglect liquidity concerns. The contribution of this paper is to provide a tractable model of dynamic cash and dividend policies with realistic treatment of liquidity and solvency concerns.

An exception within the structural trade-off literature is the paper of Anderson and Carverhill (2007). Like this model, theirs also features two sources of uncertainty in cash flows. However, because the uncertainties are left independent, their analysis does not share with this paper the richness of interactions between liquidity and solvency and the predictions with respect to cash, dividends, and credit spreads. Instead, they employ numerical techniques and focus on dynamic refinancing.

This paper is also related to the literature on dynamic liquidity management and dividend payout optimization. Jeanblanc-Picqué and Shiryaev (1995) study a tractable model of a financially constrained firm threatened by costly liquidation, in which the optimal payout policy is to retain all earnings if cash reserves are below a certain fixed threshold and to pay out everything otherwise. The model has been extended to incorporate, among others, risk management, investment, and costly financing (Højgaard and Taksar, 1999; Décamps and Villeneuve, 2007; Décamps, Mariotti, Rochet, and Villeneuve, 2008). This paper shows that adding uncertainty in the expected level of cash flows and concerns over solvency leads the optimizing firm to smooth dividends relative to cash flows. The analysis here is also related to DeMarzo and Sannikov (2008).
In their model, an agent controls the firm’s expected cash flows through costly effort, and the initially unknown expected profitability is learned over time. While their approach differs from mine, the optimal compensation contract in their model specifies payments that are smoothed over cash flows, as are equity-value-maximizing dividends in this paper.

Several recent papers also feature both cash holdings and debt financing. Hennessy and Whited (2005) present a trade-off model in which firms use a mix of equity, one-period debt, and cash balance to cover their financing needs. In contrast to this model, in Hennessy and Whited (2005) default is precluded, which results in riskless debt and zero credit spreads, and firms never hold both debt and positive cash balance at the same time. Moreover, the analysis here is focused on the roles of short-term liquidity and long-term solvency distresses, while the framework of Hennessy and Whited (2005) does not model and distinguish these forces. Gamba and Triantis
13 See Leland and Toft (1996), Fan and Sundaresan (2000), Goldstein, Ju, and Leland (2001), Duffie and Lando (2001), Hackbarth, Miao, and Morellec (2006), Broadie, Chernov, and Sundaresan (2007), Hackbarth, Hennessy, and Leland (2007), and Sundaresan and Wang (2007), among others.
(2008) extend Hennessy and Whited (2005) and allow firms to hold both debt and cash holdings at the same time, but the other differences remain. Acharya, Almeida, and Campello (2007) recognize that, as in this paper, the presence of financing frictions is a precondition for a meaningful role of cash holdings in corporate policy. Their motivation for cash is, however, based on the distinct roles of cash and negative debt in hedging future investment opportunities against future cash flows. Acharya, Huang, Subrahmanyam, and Sundaram (2006) introduce cash holdings into a discrete-time model of risky debt. Their focus is on the role of strategic debt renegotiation. Overall, the analysis in this paper with closed-form results is more tractable than previous models that relied on numerical solutions.
7. Conclusions

Earlier literature has studied either solvency distress with optimal capital structure or liquidity distress with cash and dividend policy. The analytically tractable framework presented in this paper allows one to study both sources of financial distress simultaneously and to explore the interplay of financing, cash, and dividends. I find that corporate liquidity and solvency interact through information, hedging, and leverage channels. These interactions can help to explain several empirical regularities. The information and hedging channels cause equity-maximizing firms to smooth dividends and to absorb cash flow shocks in cash holdings. The leverage channel, which captures the fact that firms select their leverage to limit exposure to liquidity risk, can explain a low dispersion of credit spreads found in empirical studies. I further find that long-term profitability uncertainty, measuring solvency risk, and cash flow volatility, measuring liquidity risk, can have opposing effects on various variables. These findings suggest that empirical studies should pay attention to the effects of uncertainty besides the usual focus on volatility (see, e.g., Anderson, Ghysels, and Juergens (2009) for an empirical proxy of long-term uncertainty).

The model can be extended to study a number of additional issues, which are left for future research. First, it would be interesting to analyze dynamic capital structure choice with different degrees of financing constraints. If debt and equity refinancing is costly, then the decision whether to finance liquidity needs or to default due to illiquidity might depend on the level of solvency. Second, the paper considers a single firm with exogenously determined cash flows. Competition can affect firms’ cash flows and default strategies, and thus optimal leverage, demand for cash, and also dividends. Analyses of both oligopoly and competitive equilibria may be worth pursuing (Lambrecht, 2001; Miao, 2005).
Third, future work can also analyze the role of changing macroeconomic conditions in the framework of this paper. Macroeconomic risk has been recently successfully incorporated into contingent claims models of capital structure (Hackbarth, Miao, and Morellec, 2006; Bhamra, Kuehn, and Strebulaev, 2010; Chen, 2010) and may also
have an important impact on corporate liquidity and solvency risks and their interaction. Finally, a promising direction for future research would be to introduce asymmetric information between corporate insiders and outside investors. It is likely that outsiders observe true cash flows but with a lag. It could also be that the firm’s insiders know the true profitability before (alternatively, after) they seek external financing whereas investors cannot observe it directly. Each of these situations might create adverse selection problems that could deepen financial constraints.

Appendix A. Proofs
Proof of Proposition 1. I first solve for the equity value function. Differential equation (5) has an analytical solution of the following general form:
\[ E_{nc}(\mu) = B_1 (\mu-\mu_L)^{1-\beta} (\mu_H-\mu)^{\beta} + B_2 (\mu-\mu_L)^{\beta} (\mu_H-\mu)^{1-\beta} + (1-\tau)\frac{\mu-k}{r}, \tag{32} \]
where $\beta > 1$ is the positive root of
\[ \beta^2 - \beta - \frac{2 r \sigma^2}{(\mu_H-\mu_L)^2} = 0, \]
and $B_1$, $B_2$ are constants that are determined by boundary conditions. The first two terms constitute the general solution to the homogeneous part of (5) and the third term is an easy-to-guess particular solution to the whole non-homogeneous equation (5). The boundary condition at $\mu_H$ implies that $B_2 = 0$. This is because, with $\beta > 1$, $E_{nc}(\mu_H)$ is unbounded for any other $B_2$. Using the boundary condition at $\mu_{nc}$ to determine $B_1$ delivers the expression for $E_{nc}(\mu)$ given in the proposition. Debt value is found analogously using that the general solution to differential equation (4) is
\[ D_{nc}(\mu) = B_3 (\mu-\mu_L)^{1-\beta} (\mu_H-\mu)^{\beta} + B_4 (\mu-\mu_L)^{\beta} (\mu_H-\mu)^{1-\beta} + \frac{k}{r}, \]
with $\beta$ as above and constants $B_3$ and $B_4$. Applying the boundary conditions on $D_{nc}$ at $\mu_H$ and $\mu_{nc}$ yields (10). Firm value $F_{nc}$ given in (11) follows by adding (9) and (10). Optimal default trigger $\mu_{nc}$ in (8) is delivered by applying the smooth pasting condition (7) to Eq. (9). ∎

Proof of Proposition 2. For an arbitrary function $C(\cdot,\cdot)$, let $C_t = C(\mu_t, X_t)$, so that $C_t$ is allowed to depend on both state variables. Denote the default time associated with trigger $\mu^*$ by $t^* = \inf\{t \ge 0 : \mu_t < \mu^*\}$. The firm is liquid up to time $t^*$ if $C_t \ge 0$ for all $t \le t^*$. Note that, for example, a simple cash policy $C_t = 0$, $t \le t^*$, satisfies this liquidity condition, but such a policy is not feasible as it requires negative dividends. From (14) it follows that
\[ dDiv_t = r C_t\,dt - dC_t + dY_t. \tag{33} \]
The cash and dividend policy is feasible if the equality holds at each time. As the firm has full discretion over non-negative dividends, the cash policy remains feasible as long as $dDiv_t \ge 0$ in (33). The goal is to determine the
lowest cash level $C^*$ that satisfies both liquidity and feasibility conditions. Suppose first that $C^*(\mu,X)$ is a continuous and differentiable function. Applying Ito’s lemma to $C^*$, the right-hand side of (33) can be written as
\[ \begin{aligned} dDiv_t = {} & \Big[ r C^* + (1-\tau)(\mu_t-k) - \frac{1}{2\sigma^2}(\mu_t-\mu_L)^2(\mu_H-\mu_t)^2 C^*_{\mu\mu} - \mu_t C^*_X \\ & \quad - \frac{1}{2}\sigma^2 C^*_{XX} - (\mu_t-\mu_L)(\mu_H-\mu_t) C^*_{\mu X} \Big]\,dt \\ & + \Big[ (1-\tau)\sigma - \frac{1}{\sigma}(\mu_t-\mu_L)(\mu_H-\mu_t) C^*_{\mu} - \sigma C^*_X \Big]\,dZ_t, \end{aligned} \tag{34} \]
where subindices at $C^*$ denote partial derivatives. The requirement that increments of this process are nonnegative for all $t \le t^*$ can be satisfied if and only if, first, the volatility coefficient at $dZ_t$ is constant and zero and, second, the drift parameter at $dt$ is non-negative. The first condition yields the following partial differential equation:
\[ \frac{1}{\sigma^2}(\mu_t-\mu_L)(\mu_H-\mu_t) C^*_{\mu} + C^*_X = (1-\tau). \tag{35} \]
Its general solution is
\[ C^*(\mu,X) = (1-\tau)\frac{\sigma^2}{\mu_H-\mu_L}\ln\frac{\mu-\mu_L}{\mu_H-\mu} + M_1\left(\frac{\mu_H-\mu_L}{\sigma^2}X - \ln\frac{\mu-\mu_L}{\mu_H-\mu}\right) + M_2, \tag{36} \]
where $M_1$ and $M_2$ are constants. As $X_t$, $t \le t^*$, can in general take any positive or negative values, the liquidity condition $C_t \ge 0$, $t \le t^*$, is satisfied only if $M_1 = 0$. This means that $C^*$ is independent of $X$. To determine $M_2$, use the non-negativity condition on the drift parameter in (34), which, with the use of (36), can be written as
\[ r C^*(\mu,X) + (1-\tau)\left(\frac{\mu_H+\mu_L}{2} - k\right) \ge 0. \]
Note that $C^*$ is increasing in $\mu$, which implies that the inequality is most demanding at $\mu = \mu^*$. Moreover, the liquidity condition at all $t \le t^*$ requires that
\[ C^*(\mu^*,X) \ge 0. \]
Solving the last two inequalities for the constant $M_2$, one obtains the formula given in the proposition. The final step is to rule out that there are points of discontinuity and non-differentiability in $C^*$ if $\mu > \mu^*$. If $C^*$ is discontinuous, it can only have downward jumps. But if immediately after the jump, $C^*$ is the smallest $C$ that allows the firm to avoid liquidity default, then in the continuous environment of the model, $C^*$ before the jump could not be the smallest $C$ satisfying this desired property. Hence, $C^*$ must be continuous. Suppose now that $C^*$ has some nondifferentiable points. In between the points, $C^*$ must satisfy differential equation (35) with the general solution in (36), subject to the boundary conditions implied by the continuity of $C^*$. But with $M_1 = 0$, it will result in $C^*$ that is a continuous differentiable function of $\mu$ for all $\mu > \mu^*$. ∎

Proof of Proposition 3. Define the time of liquidity default by $t_0 = \inf\{t \ge 0 : C_t < 0\}$, the time of solvency default by $t^* = \inf\{t \ge 0 : \mu_t < \mu^*\}$, and let $\tilde{t} = t_0 \wedge t^*$. For a given $\mu^*$, the equity value with the optimal dividend policy is given by
\[ E(\mu,C) = \sup_{Div}\ \mathbb{E}_{\mu,C}\left[\int_0^{\tilde{t}} e^{-rs}\,dDiv_s + e^{-r\tilde{t}} C_{\tilde{t}}\right]. \]
To shorten notation, it is useful to introduce the infinitesimal generator $\mathcal{A}$ of the two-dimensional process $(\mu,C)$ with dynamics described by (3) and (14). For a function $f(\mu,C)$ of class $C^2$, $\mathcal{A}$ is a partial differential operator describing the rate of change in $f$ and by Ito’s lemma is given by
\[ \begin{aligned} \mathcal{A} f(\mu,C) = {} & \frac{1}{2\sigma^2}(\mu-\mu_L)^2(\mu_H-\mu)^2 f_{\mu\mu}(\mu,C) + \frac{1}{2}(1-\tau)^2\sigma^2 f_{CC}(\mu,C) \\ & + (1-\tau)(\mu-\mu_L)(\mu_H-\mu) f_{\mu C}(\mu,C) + \big[rC + (1-\tau)(\mu-k)\big] f_C(\mu,C). \end{aligned} \]
I use the guessed dividend policy (19)–(21) to characterize $E(\mu,C)$ in different regions. For $\mu > \mu^*$ and $0 < C < C^*(\mu)$, $dDiv^* = 0$ by (19) and, using Ito’s lemma, $E(\mu,C)$ satisfies the differential equation:
\[ r E(\mu,C) = \mathcal{A} E(\mu,C). \tag{37} \]
For $\mu \ge \mu^*$ and $C \ge C^*(\mu)$, $E(\mu,C)$ is given by
\[ E(\mu,C) = E_{nc}(\mu) + C. \tag{38} \]
For the case of $C = C^*(\mu)$, (38) is proven in Proposition 4. The case $C > C^*(\mu)$ follows then directly from (21). For $\mu \ge \mu^*$ and $C = 0$, the firm defaults and $E(\mu,C)$ is
\[ E(\mu,0) = 0. \tag{39} \]
Note that while the guessed policy derives from the intuition gained from Proposition 2, in principle, a guess of the optimal dividend policy and equity value could have been found starting from the variational inequality usual for a singular stochastic control problem of this type. Namely, one can expect that the optimal solution is a $C^2$ function that satisfies the variational inequality:
\[ \max\{-rE(\mu,C) + \mathcal{A}E(\mu,C),\ 1 - E_C(\mu,C)\} = 0, \]
and (39) at liquidity default. The proof proceeds in two steps. First, it is proved that the policy specified in (19)–(21), $Div^*$, attains $E(\mu,C)$:
\[ \mathbb{E}_{\mu,C}\left[\int_0^{\tilde{t}} e^{-rs}\,dDiv^*_s + e^{-r\tilde{t}} C_{\tilde{t}}\right] = E(\mu,C). \tag{40} \]
Second, it is shown that no other feasible dividend policy provides a higher value:
\[ \mathbb{E}_{\mu,C}\left[\int_0^{\tilde{t}} e^{-rs}\,dDiv_s + e^{-r\tilde{t}} C_{\tilde{t}}\right] \le E(\mu,C). \tag{41} \]
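As a numerical sanity check on the variational inequality above, the following sketch verifies that a function of the form (38), with $E_{nc}$ taken from the general solution (32) and $B_2 = 0$, makes both arguments of the max vanish. It is only an illustration under stated assumptions: all parameter values and the constant $B_1$ are hypothetical, and ODE (5), which is not reproduced in this appendix, is assumed to read $rE_{nc} = \frac{1}{2\sigma^2}(\mu-\mu_L)^2(\mu_H-\mu)^2 E_{nc}'' + (1-\tau)(\mu-k)$, as implied by the generator and the particular solution in (32).

```python
import math

# Hypothetical parameter values, chosen only for illustration.
R, SIGMA, TAU, K = 0.05, 0.3, 0.35, 0.08
MU_L, MU_H = 0.0, 0.25


def beta_root():
    """Positive root of b^2 - b - 2 r sigma^2 / (mu_H - mu_L)^2 = 0."""
    c = 2.0 * R * SIGMA**2 / (MU_H - MU_L) ** 2
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * c))


def equity_no_cash(mu, b1=-0.4):
    """General solution (32) with B2 = 0; b1 is an arbitrary (assumed) constant."""
    b = beta_root()
    homogeneous = b1 * (mu - MU_L) ** (1.0 - b) * (MU_H - mu) ** b
    particular = (1.0 - TAU) * (mu - K) / R
    return homogeneous + particular


def equity(mu, c):
    """Candidate value function of the form (38): E(mu, C) = E_nc(mu) + C."""
    return equity_no_cash(mu) + c


def variational_residuals(mu, c, h=1e-4):
    """Return (-r E + A E, 1 - E_C), using central finite differences."""
    e_mm = (equity(mu + h, c) - 2.0 * equity(mu, c) + equity(mu - h, c)) / h**2
    e_c = (equity(mu, c + h) - equity(mu, c - h)) / (2.0 * h)
    diffusion = (mu - MU_L) ** 2 * (MU_H - mu) ** 2 / (2.0 * SIGMA**2)
    drift_c = R * c + (1.0 - TAU) * (mu - K)
    # E is linear in C, so the E_CC and E_muC terms of the generator are zero.
    generator = diffusion * e_mm + drift_c * e_c
    return -R * equity(mu, c) + generator, 1.0 - e_c


if __name__ == "__main__":
    for mu, c in [(0.10, 0.02), (0.15, 0.0), (0.20, 0.10)]:
        print(mu, c, variational_residuals(mu, c))
```

Both residuals should be zero up to discretization error for any choice of $B_1$, which is the sense in which (38) combined with (5) makes the first integrand vanish in the verification argument.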
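In the first step, I start with the case of $0 < C \le C^*(\mu)$. Applying Ito’s lemma to $e^{-rt}E(\mu_t,C_t)$, it follows that
\[ \begin{aligned} e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}}) = {} & E(\mu,C) - \int_0^{t\wedge\tilde{t}} r e^{-rs} E(\mu_s,C_s)\,ds + \int_0^{t\wedge\tilde{t}} e^{-rs} E_\mu(\mu_s,C_s)\,d\mu_s + \int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dC_s \\ & + \int_0^{t\wedge\tilde{t}} e^{-rs}\frac{1}{2\sigma^2}(\mu_s-\mu_L)^2(\mu_H-\mu_s)^2 E_{\mu\mu}(\mu_s,C_s)\,ds + \int_0^{t\wedge\tilde{t}} e^{-rs}\frac{1}{2}(1-\tau)^2\sigma^2 E_{CC}(\mu_s,C_s)\,ds \\ & + \int_0^{t\wedge\tilde{t}} e^{-rs}(1-\tau)(\mu_s-\mu_L)(\mu_H-\mu_s) E_{\mu C}(\mu_s,C_s)\,ds. \end{aligned} \]
Moving $E(\mu,C)$ to the left-hand side and using the dynamics of $\mu$ and $C$ results in
\[ \begin{aligned} E(\mu,C) = {} & e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}}) - \int_0^{t\wedge\tilde{t}} e^{-rs}\big(-rE(\mu_s,C_s) + \mathcal{A}E(\mu_s,C_s)\big)\,ds + \int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^*_s \\ & - \int_0^{t\wedge\tilde{t}} e^{-rs}\left[\frac{1}{\sigma}(\mu_s-\mu_L)(\mu_H-\mu_s)E_\mu(\mu_s,C_s) + (1-\tau)\sigma E_C(\mu_s,C_s)\right] dZ_s. \end{aligned} \]
The first integrand on the right-hand side is equal to zero from (37) if $0 < C < C^*(\mu)$ or (38) combined with (5) if $C = C^*(\mu)$. Given that the first derivatives of $E$ are bounded, the last term is a martingale. Thus, taking expectations results in
\[ E(\mu,C) = \mathbb{E}_{\mu,C}\big[e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^*_s\right]. \]
Taking $t \to \infty$ leads to
\[ E(\mu,C) = \mathbb{E}_{\mu,C}\big[e^{-r\tilde{t}} E(\mu_{\tilde{t}},C_{\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^*_s\right]. \]
The first term on the right-hand side is equal to $\mathbb{E}_{\mu,C}[e^{-r\tilde{t}} C_{\tilde{t}}]$ by (38) and (39). As dividends are nonzero under $Div^*$ only if $C_s = C^*(\mu_s)$ and, by (38), $E_C(\mu,C^*(\mu)) = 1$, the required equality (40) is satisfied.

Next, let $C > C^*(\mu)$. In this case, $Div^*$ and the corresponding process $C$ are non-continuous at $t = 0$. Applying a generalized Ito’s lemma to $e^{-rt}E(\mu_t,C_t)$ and setting $E(\mu,C)$ aside, one gets
\[ \begin{aligned} E(\mu,C) = {} & e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}}) - \int_0^{t\wedge\tilde{t}} e^{-rs}\big(-rE(\mu_s,C_s) + \mathcal{A}E(\mu_s,C_s)\big)\,ds + \int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^*_s \\ & - \int_0^{t\wedge\tilde{t}} e^{-rs}\left[\frac{1}{\sigma}(\mu_s-\mu_L)(\mu_H-\mu_s)E_\mu(\mu_s,C_s) + (1-\tau)\sigma E_C(\mu_s,C_s)\right] dZ_s \\ & - E(\mu,C^*(\mu)) + E(\mu,C) + E_C(\mu,C)\big(C^*(\mu) - C\big). \end{aligned} \]
Using (38) in the last term gives $-E(\mu,C^*(\mu)) + E(\mu,C) + E_C(\mu,C)(C^*(\mu) - C) = -C^*(\mu) + C + C^*(\mu) - C = 0$. Thus, following the same manipulations as in the previous case, one arrives at the assertion of equality (40).

The second step is to show that any other feasible dividend policy $Div$ yields at most $E$. Let $C$ be the cash holdings process corresponding to $Div$. As $Div$ does not need to be continuous, one can decompose it to $Div_t = Div^c_t + \sum_{s \le t}(Div_s - Div_{s-})$, where $Div^c_t$ is the purely continuous part. Applying a generalized Ito’s lemma to $e^{-rt}E(\mu_t,C_t)$ and rearranging to have $E(\mu,C)$ on the left-hand side leads to
\[ \begin{aligned} E(\mu,C) = {} & e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}}) - \int_0^{t\wedge\tilde{t}} e^{-rs}\big(-rE(\mu_s,C_s) + \mathcal{A}E(\mu_s,C_s)\big)\,ds + \int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv_s \\ & - \int_0^{t\wedge\tilde{t}} e^{-rs}\left[\frac{1}{\sigma}(\mu_s-\mu_L)(\mu_H-\mu_s)E_\mu(\mu_s,C_s) + (1-\tau)\sigma E_C(\mu_s,C_s)\right] dZ_s \\ & - \sum_{s \le t\wedge\tilde{t}} e^{-rs}\big[E(\mu_s,C_s) - E(\mu_s,C_{s-}) - E_C(\mu_s,C_{s-})(C_s - C_{s-})\big]. \end{aligned} \]
The first integrand on the right-hand side is equal to zero from (37) if $0 < C < C^*(\mu)$ or (38) combined with (5) if $C = C^*(\mu)$. As the first derivatives of $E$ are bounded, the Ito integral on the right-hand side is a martingale. After taking expectations and rearranging the $Div$ process, it follows that
\[ E(\mu,C) = \mathbb{E}_{\mu,C}\big[e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^c_s\right] - \mathbb{E}_{\mu,C}\left[\sum_{s \le t\wedge\tilde{t}} e^{-rs}\big[E(\mu_s,C_s) - E(\mu_s,C_{s-})\big]\right]. \]
Observe that $E(\mu_s,C_s) - E(\mu_s,C_{s-}) \le C_s - C_{s-}$ as $E_C(\mu,C) \ge 1$. Moreover, $C_s - C_{s-} = -(Div_s - Div_{s-})$. It follows that
\[ \begin{aligned} E(\mu,C) & \ge \mathbb{E}_{\mu,C}\big[e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{t\wedge\tilde{t}} e^{-rs} E_C(\mu_s,C_s)\,dDiv^c_s\right] + \mathbb{E}_{\mu,C}\left[\sum_{s \le t\wedge\tilde{t}} e^{-rs}(Div_s - Div_{s-})\right] \\ & \ge \mathbb{E}_{\mu,C}\big[e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{t\wedge\tilde{t}} e^{-rs}\,dDiv^c_s\right] + \mathbb{E}_{\mu,C}\left[\sum_{s \le t\wedge\tilde{t}} e^{-rs}(Div_s - Div_{s-})\right] \\ & \ge \mathbb{E}_{\mu,C}\big[e^{-r(t\wedge\tilde{t})} E(\mu_{t\wedge\tilde{t}},C_{t\wedge\tilde{t}})\big] + \mathbb{E}_{\mu,C}\left[\int_0^{t\wedge\tilde{t}} e^{-rs}\,dDiv_s\right]. \end{aligned} \]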
Finally, taking the limit $t \to \infty$, one reaches the required inequality (41). ∎

Proof of Proposition 4. Debt value is found as in the proof of Proposition 1. To determine equity value, one can use the general solution to differential equation (23). By
direct verification, it is 1b
b
b
1b
EðmÞ ¼ B5 ðmmL Þ ðmH mÞ þB6 ðmmL Þ ðmH mÞ a1 mmL mH m mH mL mH þ mL a2 ln þ : þ m þ r mH m m mL 2 r r s2 Applying the boundary conditions at mH and m to determine constants B5 and B6 leads to the expression provided in the proposition. & References Acharya, V., Almeida, H., Campello, M., 2007. Is cash negative debt? A hedging perspective on corporate financial policies. Journal of Financial Intermediation 16, 515–554. Acharya, V., Davydenko, S., Strebulaev, I., 2008. Cash holdings and credit spreads. Unpublished working paper, London Business School. Acharya, V., Huang, J., Subrahmanyam, M., Sundaram, R., 2006. When does strategic debt-service matter? Economic Theory 29, 363–378 Almeida, H., Campello, M., Weisbach, M., 2004. The cash flow sensitivity of cash. Journal of Finance 59, 1777–1804. Altinkilic-, O., Hansen, R., 2000. Are there economies of scale in underwriting fees? Evidence of rising external financing costs. Review of Financial Studies 13, 191–218. Anderson, E., Ghysels, E., Juergens, J., 2009. The impact of risk and uncertainty on expected returns. Journal of Financial Economics 94, 233–263. Anderson, R., Carverhill, A., 2007. Liquidity and capital structure. CEPR Discussion Paper No. 6044. Bates, T., Kahle, K., Stulz, R., 2009. Why do U.S. firms hold so much more cash than they used to? Journal of Finance 64, 1985–2021 Bhamra, H., Kuehn, L., Strebulaev, I., 2010. The levered equity risk premium and credit spreads: a unified framework. Review of Financial Studies 23, 645–703. Biais, B., Mariotti, T., Plantin, G., Rochet, J., 2007. Dynamic security design: convergence to continuous time and asset pricing implications. Review of Economic Studies 74, 345–390. Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–654. Brav, A., Graham, J., Harvey, C., Michaely, R., 2005. Payout policy in the 21st century. Journal of Financial Economics 77, 483–527. 
Broadie, M., Chernov, M., Sundaresan, S., 2007. Optimal debt and equity values in the presence of Chapter 7 and Chapter 11. Journal of Finance 62, 1341–1377. Campello, M., Graham, J., Harvey, C., 2010. The real effects of financial constraints: evidence from a financial crisis. Journal of Financial Economics 97, 470–487. Chen, H., 2010. Macroeconomic conditions and the puzzles of credit spreads and capital structure. Journal of Finance, forthcoming. DeAngelo, H., DeAngelo, L., 1990. Dividend policy and financial distress: an empirical investigation of troubled NYSE firms. Journal of Finance 45, 1415–1431. Décamps, J.P., Mariotti, T., Rochet, J., Villeneuve, S., 2008. Free cash-flow, issuance costs and stock price volatility. IDEI Working Paper No. 518. Décamps, J.P., Villeneuve, S., 2007. Optimal dividend policy and growth option. Finance and Stochastics 11, 3–27. DeMarzo, P., Sannikov, Y., 2006. Optimal security design and dynamic capital structure in a continuous-time agency model. Journal of Finance 61, 2681–2724. DeMarzo, P., Sannikov, Y., 2008. Learning in dynamic incentive contracts. Unpublished working paper, Stanford University. Duffie, D., Lando, D., 2001. Term structures of credit spreads with incomplete accounting information. Econometrica 69, 633–664. Eom, Y., Helwege, J., Huang, J., 2004. Structural models of corporate bond pricing: an empirical analysis. Review of Financial Studies 17, 499–544. Fan, H., Sundaresan, S., 2000. Debt valuation, renegotiation, and optimal dividend policy. Review of Financial Studies 13, 1057–1099. Faulkender, M., Wang, R., 2006. Corporate financial policy and the value of cash. Journal of Finance 61, 1957–1990. Fischer, E., Heinkel, R., Zechner, J., 1989. Dynamic capital structure choice: theory and tests. Journal of Finance 44, 19–40. Gamba, A., Triantis, A., 2008. The value of financial flexibility. Journal of Finance 63, 2263–2296.
Goldstein, R., Ju, N., Leland, H., 2001. An EBIT-based model of dynamic capital structure. Journal of Business 74, 483–512. Gomes, J., 2001. Financing investment. American Economic Review 91, 1263–1285. Guney, Y., Ozkan, A., Ozkan, N., 2007. International evidence on the nonlinear impact of leverage on corporate cash holdings. Journal of Multinational Financial Management 17, 45–60. Hackbarth, D., Hennessy, C., Leland, H., 2007. Can the trade-off theory explain debt structure? Review of Financial Studies 20, 1389–1428. Hackbarth, D., Miao, J., Morellec, E., 2006. Capital structure, credit risk, and macroeconomic conditions. Journal of Financial Economics 82, 519–550. Han, S., Qiu, J., 2007. Corporate precautionary cash holdings. Journal of Corporate Finance 13, 43–57. Hennessy, C., Whited, T., 2005. Debt dynamics. Journal of Finance 60, 1129–1165. Højgaard, B., Taksar, M., 1999. Controlling risk exposure and dividends payout schemes: insurance company example. Mathematical Finance 9, 153–182. Holtz-Eakin, D., Joulfaian, D., Rosen, H., 1994. Sticking it out: entrepreneurial survival and liquidity constraints. Journal of Political Economy 102, 53–75. Huang, J., Huang, M., 2003. How much of the corporate-treasury yield spread is due to credit risk? Unpublished working paper, Pennsylvania State University. Irvine, P.J., Pontiff, J., 2009. Idiosyncratic return volatility, cash flows, and product market competition. Review of Financial Studies 22, 1149–1177. Jeanblanc-Picqué, M., Shiryaev, A.N., 1995. Optimization of the flow of dividends. Russian Mathematical Surveys 50, 257–277. Keppo, J., Moscarini, G., Smith, L., 2008. The demand for information: more heat than light. Journal of Economic Theory 138, 21–50. Khurana, I.K., Martin, X., Pereira, R., 2006. Financial development and the cash flow sensitivity of cash. Journal of Financial and Quantitative Analysis 41, 787–807. Lambrecht, B., 2001. The impact of debt financing on entry and exit in a duopoly.
Review of Financial Studies 14, 765. Leary, M., Michaely, R., 2008. Why firms smooth dividends: empirical evidence. Johnson School Research Paper Series 11-08. Leary, M., Roberts, M., 2005. Do firms rebalance their capital structures? Journal of Finance 60, 2575–2619. Leland, H., Toft, K., 1996. Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads. Journal of Finance 51, 987–1019. Leland, H.E., 1994. Corporate debt value, bond covenants, and optimal capital structure. Journal of Finance 49, 1213–1252. Lins, K., Servaes, H., Tufano, P., 2010. What drives corporate liquidity? An international survey of cash holdings and lines of credit. Journal of Financial Economics 98, 160–176. Lintner, J., 1956. Distribution of incomes of corporations among dividends, retained earnings, and taxes. American Economic Review 46, 97–113. Liptser, R.S., Shiryaev, A.N., 2001. Statistics of Random Processes I: General Theory. Springer, Berlin, Heidelberg. Merton, R., 1974. On the pricing of corporate debt: the risk structure of interest rates. Journal of Finance 29, 449–470. Miao, J., 2005. Optimal capital structure and industry dynamics. Journal of Finance 60, 2621–2659. Moscarini, G., 2005. Job matching and the wage distribution. Econometrica 73, 481–516. Myers, S., Majluf, N., 1984. Corporate investment and financing decisions when firms have information that investors do not have. Journal of Financial Economics 13, 187–221. Opler, T., Pinkowitz, L., Stulz, R., Williamson, R., 1999. The determinants and implications of corporate cash holdings. Journal of Financial Economics 52, 3–46. Riddick, L., Whited, T., 2009. The corporate propensity to save. Journal of Finance 64, 1729–1766. Sufi, A., 2009. Bank lines of credit in corporate finance: an empirical analysis. Review of Financial Studies 22, 1057–1088. Sundaresan, S., Wang, N., 2007. Investment under uncertainty with strategic debt service. American Economic Review 97, 256–261.
Titman, S., Wessels, R., 1988. The determinants of capital structure choice. Journal of Finance 43, 1–19. Zingales, L., 1998. Survival of the fittest or the fattest? Exit and financing in the trucking industry. Journal of Finance 53, 905–938.
Journal of Financial Economics 99 (2011) 385–399
Do time-varying risk premiums explain labor market performance?☆

Long Chen a, Lu Zhang b,*
a John M. Olin Business School, Washington University in St. Louis, United States
b Fisher College of Business, Ohio State University, and NBER, United States

Article info
Article history: Received 31 October 2009. Received in revised form 1 February 2010. Accepted 10 March 2010. Available online 7 September 2010.
JEL classification: G31; G12; J23
Keywords: Time-varying risk premiums; Payroll growth; Hiring rate; Search and matching frictions; Labor markets

Abstract
Within the standard search and matching model, time-to-build implies that high aggregate risk premiums should forecast low employment growth in the short run but high employment growth in the long run. If there is also time-to-plan, high risk premiums should forecast low net hiring rates in the short run but high net hiring rates in the long run. Our evidence indicates two-quarter time-to-build in the aggregate payroll data, no time-to-plan in the aggregate hiring data, but two-quarter time-to-plan in the job creation data for manufacturing firms. High payroll growth and high net job creation rate in manufacturing also forecast low stock market excess returns at business cycle frequencies.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction

Modern asset pricing research has shown that aggregate stock market returns in excess of the short-term interest rate are predictable, meaning that expected aggregate risk premiums are time-varying.1 This body of evidence suggests that a large fraction of the variation in the cost of capital in standard labor market models is driven by time-varying risk premiums, as opposed to the interest rate. However, probably because of the longstanding divide between labor economics and finance (especially asset pricing), prior work that draws the linkage between time-varying risk premiums and labor market performance seems scarce. Our reading of the labor economics literature suggests that it has largely ignored the impact of time-varying risk premiums on the labor markets.

☆ We thank Frederico Belo, Bill Schwert (the editor), Chen Xue, and an anonymous referee for helpful comments. A part of the paper was completed in July 2009 while Lu Zhang was visiting Shanghai Jiao Tong University’s Shanghai Advanced Institute of Finance, whose hospitality is gratefully acknowledged. This paper supersedes our National Bureau of Economic Research working paper no. 15129 titled ‘‘The stock market and aggregate employment.’’ All the remaining errors are our own.
* Corresponding author. E-mail address: zhanglu@fisher.osu.edu (L. Zhang).
1 For example, Campbell and Shiller (1988), Fama and French (1988), and Hodrick (1992) show that the dividend yield forecasts market excess returns. Fama and Schwert (1977) and Fama (1981) show that the relative Treasury bill rate, defined as the Treasury bill rate minus its past four-quarter moving average, predicts market excess returns. Keim and Stambaugh (1986) and Fama and French (1989) find that the term premium and the default premium predict returns. Cochrane (1991) shows that the aggregate investment-to-capital ratio, and Lettau and Ludvigson (2001) show that the log consumption-to-wealth ratio, forecast market excess returns.

0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.002
L. Chen, L. Zhang / Journal of Financial Economics 99 (2011) 385–399
In this article, we use the standard search and matching framework (e.g., Pissarides, 1985, 2000; Mortensen and Pissarides, 1994) to study the impact of time-varying risk premiums on the labor market. When risk premiums are time-varying, different labor market frictions give rise to different sets of temporal relations between the expected return, labor hiring, and employment growth.

Time-to-build means that hiring in the current period leads to more productive workers in the next period. Consider a discount rate drop at the beginning of the current period. The stock price rises immediately, meaning that the marginal benefit of hiring and therefore hiring also increase. With time-to-build, the employment stock increases only at the beginning of the next period. As such, the current-period employment growth is positive, and regressing it on the discount rate yields a negative slope. However, the discount rate drop also means that the realized return declines on average in the current period. The resulting lower stock price at the beginning of the next period means a lower marginal benefit of hiring and therefore lower hiring in the next period. Time-to-build implies that the next-period employment growth is negative, and that regressing it on the current-period discount rate yields a positive slope. In short, the discount rate should forecast employment growth with a negative slope in the short run but a positive slope in the long run. However, forecasting the next-period hiring rate with the current-period discount rate should yield only a positive slope without sign flipping at longer horizons. A similar logic shows that the effect of two-period time-to-build is to prolong the horizon over which the slope switches sign by one more period.

Time-to-plan means that time lags exist between the decision to hire and the actual hiring expenditure. Consider again a discount rate drop but with one-period time-to-plan (along with one-period time-to-build).
The discount rate drop at the beginning of $t$ generates a higher stock price at $t$. With the planning lag, hiring rises only in period $t+1$ but remains constant in $t$. With one-period time-to-build, employment rises at the beginning of $t+2$ but remains unchanged in $t+1$. The discount rate drop also means that the stock return drops on average over period $t$. The resulting lower stock price at the beginning of $t+1$, together with time-to-plan, means a drop in hiring over period $t+2$ and a fall in employment at the beginning of $t+3$. Pulling the dynamics together, we observe that the discount rate should forecast employment growth (up to $t+2$) and the hiring rate (up to $t+1$) with a negative slope in the short run, but a positive slope in the long run.

We report three empirical findings. First, measuring employment growth as the growth rate of seasonally adjusted total nonfarm payrolls from US Bureau of Labor Statistics (BLS), we find that high values of the log consumption-to-wealth ratio (CAY) of Lettau and Ludvigson (2001) predict low payroll growth at short horizons within two quarters, but high payroll growth at longer horizons. Pulling all the information contained in standard risk premium proxies including the dividend yield, CAY, the relative Treasury bill rate, the term spread, and the default premium, we correlate the one-quarter-ahead
fitted risk premiums with cumulative payroll growth over various horizons. We find that the correlations are insignificantly negative within two quarters, insignificantly positive at the fourth quarter, but significantly positive from the eight-quarter horizon and onward. The evidence so far suggests that either two-period time-to-build or the combined effect of one-period time-to-build and one-period time-to-plan is at work in the aggregate employment data.

Second, we measure the hiring rate as the difference between gross hiring rate and separation rate from the Current Population Survey, conducted by the US Census Bureau for the BLS, and the BLS’s Job Openings and Labor Turnover Survey (JOLTS). We find that high values of CAY predict high net hiring rates at various horizons. The correlations between the one-quarter-ahead fitted risk premiums and the I-quarter-ahead net hiring rates are all positive, ranging from 0.16 to 0.35, and are mostly significant. The evidence suggests that there is no time-to-plan in the aggregate hiring data and that the temporal relations between the discount rate and payroll growth must be driven by two-period time-to-build. The evidence is more supportive of time-to-plan in manufacturing firms. When forecasting the net job creation rate in manufacturing from Davis, Faberman, and Haltiwanger (2006), the relative bill rate has a significantly positive slope in the one-quarter horizon, a weakly positive slope in the two-quarter horizon, but significantly negative slopes at the four- and eight-quarter horizons. The correlations between the one-quarter-ahead fitted risk premiums and the I-quarter-ahead net job creation rates in manufacturing are significantly negative in the one-quarter horizon, effectively zero in the two-quarter horizon, and significantly positive in the four- and eight-quarter horizons. The evidence suggests that time-to-plan for hiring lasts for about two quarters in manufacturing.
Third, lagged payroll growth predicts market excess returns, especially at business cycle frequencies. In univariate regressions, the adjusted $R^2$ peaks at 5% in the four-quarter horizon. Across various horizons, the slopes are universally negative and mostly significant. Judged on Newey and West (1987) t-statistics and adjusted $R^2$s in univariate regressions, the predictive power of payroll growth dominates that of standard risk premium proxies such as the default spread and the relative Treasury bill rate. Whereas the dividend yield and the term spread maximize their predictive power at long horizons, the predictive power of payroll growth peaks at short business cycle frequencies around four quarters. We also find similar evidence using the net job creation rate in manufacturing, but stock market predictability with the net hiring rate for the overall economy is weak.

Our work shows that time-varying risk premiums are quantitatively important in forecasting employment growth. However, leading models in labor economics ignore risk premiums. In particular, the constant discount rate assumption is embedded in the partial equilibrium Mortensen and Pissarides search and matching framework. As such, risk premiums are constant and cannot forecast future employment growth. Merz (1995), Andolfatto
(1996), and Gertler and Trigari (2009) integrate the search and matching model into the standard business cycle framework with general equilibrium. However, their models follow the real business cycle literature in assuming log utility, which in turn implies that the risk premiums in their models are close to zero and largely time-invariant.

Our work is related to Lettau and Ludvigson (2002), who build on Barro (1990) and Lamont (2000) to study the impact of time-varying risk premiums on aggregate investment. We focus on the labor market. The asset pricing literature has only started to analyze the impact of labor on stock prices. Boyd, Hu, and Jagannathan (2005) show that stock market index responds positively to an announcement of rising unemployment in expansions but negatively in contractions. Merz and Yashiv (2007) quantify the importance of labor in explaining stock market valuation. Bazdresch, Belo, and Lin (2009) show that high employment growth predicts low average returns in the cross section. We instead study the impact of time-varying risk premiums on the labor market as well as stock market predictability with labor market variables. Finally, the voluminous literature on stock market predictability (see footnote 1) has largely ignored labor market variables. We fill this gap.

The rest of the paper is organized as follows. Section 2 develops testable hypotheses, Section 3 describes our data and test design, Section 4 presents the results, and Section 5 concludes.

2. Hypothesis development

We formulate the search and matching model as in Yashiv (2000) and Merz and Yashiv (2007) in Section 2.1, and we develop testable hypotheses in Section 2.2.

2.1. The model

The economy is populated by identical workers and identical firms. Time is discrete and horizon infinite. Labor is the only input in a constant-return-to-scale production function.
The operating profits are given by $\Pi(N_t, X_t) = f(X_t) N_t$, in which $N_t$ is total employment and $X_t$ is the productivity shock. To attract new workers, a firm needs to post a number of job vacancies, $J_t$. For each vacancy posted, the firm takes as given the probability $\lambda_t$ at which the vacancy is filled. The firm's gross hires are given by $H_t \equiv \lambda_t J_t$. Workers are paid a gross compensation rate of $W_t$. Hiring costs include both the cost of advertising, screening, and selecting new workers and the cost of training. These costs depend on the stock of employment, the number of vacancies, and the probability of filling the vacancy. For simplicity, we assume that the hiring costs function is quadratic: $(a/2)(\lambda_t J_t / N_t)^2 N_t$, in which $a > 0$. The hiring costs are increasing and convex in the number of new hires and are decreasing in the employment stock. (The costs depend on $\lambda_t$ and $J_t$ only through their product.) These properties are desirable because training costs and costs of time spent on screening and selecting new
workers increase with the number of new hires. Firms make hiring decisions at the beginning of each period $t$, and the new hires enter production at the beginning of period $t+1$. Separation of workers from jobs occurs at a constant rate of $s$, $0 \le s \le 1$, which firms take as given. As a result, the employment stock evolves as

$$N_{t+1} = (1-s) N_t + \lambda_t J_t. \qquad (1)$$
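The law of motion in Eq. (1) can be iterated forward directly. The following is a minimal sketch, not part of the original analysis: the function name `simulate_employment` and all numerical values are assumptions chosen only for illustration, and gross hires $H_t = \lambda_t J_t$ are passed in as a single number per period.

```python
def simulate_employment(n0, hires, s):
    """Iterate Eq. (1): N_{t+1} = (1 - s) * N_t + H_t, with H_t = lambda_t * J_t."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("separation rate s must lie in [0, 1]")
    path = [n0]
    for h in hires:
        # Hires made during period t become productive at the beginning of t + 1.
        path.append((1.0 - s) * path[-1] + h)
    return path


# Illustrative (assumed) numbers: 100 workers, 5% separation per period.
path = simulate_employment(100.0, [5.0, 5.0, 8.0], s=0.05)

# Gross employment growth N_{t+1} / N_t over each period.
growth = [n1 / n0 for n0, n1 in zip(path[:-1], path[1:])]
```

With hires that exactly replace separations the stock is constant, and a one-off increase in hires raises employment growth only in the period in which those hires enter production.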
Firms choose the number of job vacancies to post each period to maximize the discounted present value of future free cash flows. When discounting, firms take as given the stochastic discount factor from period $t$ to $t+i$, denoted $M_{t+i}$. The dynamic problem of the firms is given by

$$\max_{\{J_{t+i},\, N_{t+i+1}\}} E_t\left[\sum_{i=0}^{\infty} M_{t+i}\left[\Pi(N_{t+i}, X_{t+i}) - W_{t+i} N_{t+i} - \frac{a}{2}\left(\frac{\lambda_{t+i} J_{t+i}}{N_{t+i}}\right)^2 N_{t+i}\right]\right], \qquad (2)$$

subject to Eq. (1). Let $q_t$ denote the Lagrangian multiplier associated with the constraint given by Eq. (1). The multiplier is the marginal benefit of an additional unit of employment. The first-order conditions of $J_t$ and $N_{t+1}$ are given by, respectively,

$$q_t = a\,\frac{\lambda_t J_t}{N_t}, \qquad (3)$$

and

$$q_t = E_t\left[M_{t+1}\left[f(X_{t+1}) - W_{t+1} + \frac{a}{2}\left(\frac{\lambda_{t+1} J_{t+1}}{N_{t+1}}\right)^2 + (1-s)\, q_{t+1}\right]\right]. \qquad (4)$$

Eq. (3) says that the marginal benefit of hiring equals the marginal cost of hiring. Eq. (4) says that the marginal benefit of hiring equals the next period marginal product of labor net of gross compensation plus the saving of hiring costs and the continuation value of the employment stock net of separation, discounted to time $t$ using $M_{t+1}$. Combining the two first-order conditions, using Eq. (1) to substitute out $\lambda_t J_t$, and simplifying, we obtain $E_t[M_{t+1} R^H_{t+1}] = 1$, in which $R^H_{t+1}$ is the hiring return, defined as

$$R^H_{t+1} \equiv \frac{f(X_{t+1}) - W_{t+1} + (a/2)(N_{t+2}/N_{t+1})^2 - (a/2)(1-s)^2}{a(N_{t+1}/N_t) - a(1-s)}. \qquad (5)$$

As such, $R^H_{t+1}$ is the ratio of the marginal benefit of hiring at period $t+1$ divided by the marginal cost of hiring at period $t$. With constant returns to scale, the hiring return equals the stock return, $R_{t+1}$.$^{2}$

$^{2}$ Cochrane (1991) first outlines the basic idea underlying this equivalence. Our proof follows the logic in Liu, Whited, and Zhang (2009, Appendix A). Let $V_t$ be the cum-dividend value of equity given by Eq. (2), and let $P_t \equiv V_t - D_t$ be the ex-dividend value of equity, in which $D_t \equiv f(X_t) N_t - W_t N_t - (a/2)(\lambda_t J_t / N_t)^2 N_t$ is the current-period dividend. We expand $V_t$ as follows (noting $H_t = \lambda_t J_t$):

$$P_t + D_t = D_t - q_t\left[N_{t+1} - (1-s) N_t - H_t\right] + E_t\Big[M_{t+1}\Big(f(X_{t+1}) N_{t+1} - W_{t+1} N_{t+1}$$
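The step from the first-order conditions (3) and (4) to the hiring return in Eq. (5) can be checked numerically. The sketch below is illustrative only: the function names and every parameter value are assumptions, not values taken from the paper. It evaluates Eq. (5) from the two employment growth rates and compares it with the ratio built directly from Eqs. (3) and (4), using $H_t/N_t = N_{t+1}/N_t - (1-s)$ from Eq. (1).

```python
def hiring_return(f_next, w_next, a, s, g_now, g_next):
    """Eq. (5), with g_now = N_{t+1}/N_t and g_next = N_{t+2}/N_{t+1}."""
    numer = f_next - w_next + 0.5 * a * g_next**2 - 0.5 * a * (1.0 - s)**2
    denom = a * g_now - a * (1.0 - s)  # zero when no hiring takes place
    return numer / denom


def return_from_focs(f_next, w_next, a, s, g_now, g_next):
    """Period-(t+1) marginal benefit from Eq. (4) divided by q_t from Eq. (3)."""
    h_now = g_now - (1.0 - s)    # H_t / N_t, from Eq. (1)
    h_next = g_next - (1.0 - s)  # H_{t+1} / N_{t+1}, from Eq. (1)
    q_now = a * h_now            # Eq. (3)
    q_next = a * h_next          # Eq. (3), one period ahead
    benefit = f_next - w_next + 0.5 * a * h_next**2 + (1.0 - s) * q_next
    return benefit / q_now


# Hypothetical parameter values, chosen only for illustration.
params = dict(f_next=1.0, w_next=0.9, a=20.0, s=0.05, g_now=1.01, g_next=1.01)
r_eq5 = hiring_return(**params)
r_foc = return_from_focs(**params)
```

The two numbers agree up to floating-point error because $(a/2)h^2 + a(1-s)h = (a/2)[g^2 - (1-s)^2]$ when $h = g - (1-s)$. Holding the numerator fixed, a higher $N_{t+1}/N_t$ raises the denominator and lowers the return, which is the short-horizon negative relation used in the hypotheses.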
2.2. Testable hypothesis

With the stock market return on the left-hand side, Eq. (5) motivates our testable hypotheses.

2.2.1. Forecasting employment growth

The empirical finance literature has documented a standard list of risk premium proxies (see footnote 1). Because the interest rate has high persistence and small variance, these variables are in effect proxies for the discount rate, $E_t[R_{t+1}]$. As such, Eq. (5) implies that regressing short-horizon employment growth, $N_{t+1}/N_t$, on $E_t[R_{t+1}]$ should yield negative slopes, but regressing long-horizon employment growth, $N_{t+2}/N_{t+1}$, on $E_t[R_{t+1}]$ should yield positive slopes.

H1. The risk premium proxies that predict market excess returns positively should have negative slopes in the short run but positive slopes in the long run in predicting employment growth. The proxies that predict market excess returns negatively should have positive slopes in the short run but negative slopes in the long run in predicting employment growth.

The one-period time-to-build embedded in Eq. (1) is important for producing the predictability of employment growth. This friction says that hiring at time $t$, $\lambda_t J_t$, leads only to more productive workers at the beginning of $t+1$. The effect of this friction on employment growth predictability is intuitive. The length of the decision period (e.g., one month, one quarter, one year, or longer) is unspecified in the model. If the decision period is one year, Eq. (5) says that regressing employment growth up to four quarters ahead on the discount rate should yield negative slopes and that regressing employment growth at longer horizons on the discount rate should yield positive slopes. If the decision period is one quarter instead, we should see only negative slopes from using the one-quarter-ahead employment growth as the dependent variable. Employment growth at longer horizons should produce positive slopes.
As such, the horizon at which the regression slopes switch signs indicates the length of time-to-build.

(footnote continued)

$$\qquad - \frac{a}{2}\left(\frac{H_{t+1}}{N_{t+1}}\right)^2 N_{t+1} - q_{t+1}\left[N_{t+2} - (1-s) N_{t+1} - H_{t+1}\right] + \cdots\Big)\Big]. \qquad (6)$$

Recursively substituting Eqs. (3) and (4) into Eq. (6) yields $P_t = q_t[(1-s) N_t + H_t] = q_t N_{t+1}$ and

$$
\begin{aligned}
R_{t+1} &= \frac{P_{t+1} + D_{t+1}}{P_t} \\
&= \frac{q_{t+1}\left[H_{t+1} + (1-s) N_{t+1}\right] + f(X_{t+1}) N_{t+1} - W_{t+1} N_{t+1} - (a/2)(H_{t+1}/N_{t+1})^2 N_{t+1}}{q_t N_{t+1}} \\
&= \frac{q_{t+1}\left[H_{t+1}/N_{t+1} + (1-s)\right] + f(X_{t+1}) - W_{t+1} - (a/2)(H_{t+1}/N_{t+1})^2}{q_t} \\
&= \frac{f(X_{t+1}) - W_{t+1} + (a/2)(N_{t+2}/N_{t+1})^2 - (a/2)(1-s)^2}{a(N_{t+1}/N_t) - a(1-s)} = R^H_{t+1}.
\end{aligned}
$$

2.2.2. Forecasting hiring rate

The time-to-build mechanism differs subtly from time-to-plan discussed in Lamont (2000) and Lettau and
Ludvigson (2002) in the context of investment. Time-to-plan means that there are time lags between the decision to hire and the actual hiring expenditure. Fig. 1 clarifies the differences between time-to-build and time-to-plan by depicting the hypothetical responses of realized returns, stock prices, employment growth, and hiring to a one-time shock to the expected return.

In the one-period time-to-build model (Panel A), a discount rate drop at the beginning of $t$ generates a higher stock price at the beginning of $t$. Without time-to-plan, hiring over period $t$ rises immediately. With time-to-build, employment stock increases only at the beginning of $t+1$, meaning that employment growth over period $t$ is positive. In addition, because the discount rate at the beginning of $t$ drops, the realized return over period $t$, denoted $R_{t+1}$, declines on average. The stock price at the beginning of $t+1$ also drops, along with hiring over period $t+1$. Time-to-build again implies that employment stock decreases only at the beginning of $t+2$, meaning that employment growth over period $t+1$ is negative. As such, regressing short-term employment growth, $N_{t+1}/N_t$, on the discount rate, $E_t[R_{t+1}]$, should yield a negative slope, but regressing long-term employment growth, $N_{t+2}/N_{t+1}$, on the discount rate should yield a positive slope. However, regressing the hiring rate, $H_{t+1}/N_{t+1}$, on the discount rate should yield only a positive slope without sign switching at longer horizons.

Panel B of Fig. 1 analyzes two-period time-to-build, which means that hiring at the beginning of $t$, $H_t$, leads only to more productive workers at the beginning of $t+2$. A discount rate drop at the beginning of $t$ generates a higher stock price at the beginning of $t$. Hiring goes up immediately, but with two-period time-to-build, employment stock at the beginning of $t+1$ remains unchanged. Because the discount rate at the beginning of $t$ drops, the realized return over period $t$ declines on average.
The stock price at the beginning of $t+1$ and hiring over period $t+1$ both fall. However, employment stock at the beginning of $t+2$ increases as a result of hiring two periods ago. At period $t+2$, the stock price and hiring remain constant because there is only a one-time shock to the discount rate at the beginning of $t$. However, employment stock at the beginning of $t+3$ decreases as a result of firing at $t+1$. The bottom line is that regressing employment growth up to $t+2$, $N_{t+2}/N_t$, on the discount rate should yield a negative slope, but regressing long-term employment growth, $N_{t+3}/N_{t+2}$, should yield a positive slope. As such, two-period time-to-build prolongs the horizon over which the slope switches signs by one more period. However, hiring rate dynamics remain the same. Regressing the hiring rate, $H_{t+1}/N_{t+1}$, on the discount rate yields only a positive slope without sign flipping at longer horizons.

Panel C of Fig. 1 combines one-period time-to-build with one-period time-to-plan as in Lettau and Ludvigson (2002). Similar to Panels A and B, a discount rate drop at the beginning of $t$ generates a higher stock price. With one-period time-to-plan, hiring rises only in the next period but remains unchanged in the current period. As such, employment stock at the beginning of $t+1$ remains unchanged. Because the discount rate at the beginning of $t$ drops, the realized return over period $t$ declines on
Fig. 1. Temporal relations between the expected return, labor hiring, and employment growth. The time lines depict the hypothesized responses of hiring, employment growth, stock prices, and realized returns to a one-time shock to the expected return. $E_t[R_{t+1}]$ is the expected return from the beginning to the end of period $t$ conditional on information at the beginning of $t$. $P_t$ is the ex-dividend stock price at the beginning of $t$. $H_t$ is the number of new hires (a flow variable) over period $t$. $N_{t+1}/N_t$ is employment growth from the beginning to the end of period $t$. $R_{t+1}$ is the realized return from the beginning to the end of $t$. We depict the time lines for three models: standard one-period time-to-build (no time-to-plan, Panel A), two-period time-to-build (no time-to-plan, Panel B), and one-period time-to-build (and one-period time-to-plan, Panel C).
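The timing logic of the three panels in Fig. 1 can be traced with a stylized sketch. Everything below is an illustrative assumption rather than the paper's procedure: the function `responses`, the unit-sized deviations, and the reading of the shock as a hiring decision that rises at $t$ and falls at $t+1$ after a one-time discount rate drop. Entries are deviations from the no-shock path for periods $t, t+1, \ldots$

```python
def responses(signal, build_lag, plan_lag, periods=5):
    """Return (hiring, employment_growth) deviations for periods t, t+1, ...

    signal[k] is the hiring response firms decide on in period t + k.
    Hiring is carried out plan_lag periods after the decision. Hires made in
    period k raise the employment stock at the beginning of k + build_lag,
    so they appear in employment growth over period k + build_lag - 1.
    """
    hiring = [0] * periods
    growth = [0] * periods
    for k, x in enumerate(signal):
        j = k + plan_lag            # period in which hiring actually occurs
        if j < periods:
            hiring[j] += x
        m = j + build_lag - 1       # period whose employment growth is affected
        if m < periods:
            growth[m] += x
    return hiring, growth


signal = [+1, -1]  # stylized: decision to hire rises at t, falls at t + 1

panel_a = responses(signal, build_lag=1, plan_lag=0)  # one-period time-to-build
panel_b = responses(signal, build_lag=2, plan_lag=0)  # two-period time-to-build
panel_c = responses(signal, build_lag=1, plan_lag=1)  # time-to-build and time-to-plan
```

In this sketch Panels B and C share the same employment growth path but differ in hiring, which is why the hiring rate regressions, rather than the employment growth regressions, are informative about time-to-plan. Because the shock is a discount rate drop, regression slopes on the discount rate carry the opposite sign of these deviations.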
average. The stock price at the beginning of $t+1$ drops and causes firms to commit to decreasing hiring over the next period. However, because of the hiring commitment made at period $t$, hiring rises in period $t+1$. With one-period time-to-build, employment stock increases at the beginning of $t+2$. Over period $t+2$, the stock price and the stock return are constant, hiring falls per the prior commitment, and employment stock falls at the beginning of $t+3$. Comparing Panels B and C shows that employment growth dynamics remain the same. However, hiring dynamics are different. Regressing the hiring rate on the discount rate yields negative slopes at short horizons but positive slopes at long horizons. As such, we can test the empirical relevance of time-to-plan by studying the dynamic relations between the hiring rate and the discount rate.

H2. With time-to-plan, the risk premium proxies that predict market excess returns positively should have negative slopes in the short run but positive slopes in the long run in forecasting the hiring rate. Without time-to-plan, these proxies should have only positive slopes in forecasting the hiring rate. With time-to-plan, the risk premium proxies that predict market excess returns negatively should have positive slopes in the short run but negative slopes in the long run in forecasting the hiring rate. Without time-to-plan, these proxies should have only negative slopes.

2.2.3. Forecasting market excess returns

Eq. (5) also has implications for stock market predictability. If employment growth is persistent, meaning that lagged employment growth, $N_t/N_{t-1}$, forecasts current employment growth, $N_{t+1}/N_t$, with a positive slope, lagged employment growth should forecast market excess
return, $R_{t+1}$, with a negative slope. In addition, this forecasting power should concentrate in short horizons. It is $R_{t+1}$, instead of $R_{t+2}$, for example, that appears in the left-hand side of Eq. (5).$^{3}$

Using Eq. (1), we can rewrite the denominator of the hiring return Eq. (5) as $a(H_t/N_t)$. This formulation, derived under one-period time-to-build without time-to-plan, implies that the current hiring rate should forecast market excess returns, $R_{t+1}$, with a negative slope. With one-period time-to-plan, the actual hiring expenditure is delayed by one period, and neither the current hiring rate nor the lagged employment growth predicts future returns. However, this result implicitly assumes that the length of one period in time-to-plan and in time-to-build equals the length of the period that it takes for the average realized return to converge to the expected return, $E_t[R_{t+1}]$. If this convergence takes longer than it takes to plan and to build, both the hiring rate and lagged employment growth should forecast returns.

H3. With time-to-build, the hiring rate and lagged employment growth forecast market excess returns with negative slopes. With time-to-plan, whether the hiring rate and lagged employment growth forecast returns depends on the relative length of time-to-plan, time-to-build, and the convergence time between average and expected returns.

While we have emphasized the role of labor market frictions such as time-to-build and time-to-plan, search and matching costs also are important in generating the linkages between time-varying risk premiums and labor market performance. Without search and matching costs, $a = 0$, Eq. (5) collapses to $R_{t+1} = f(X_{t+1}) - W_{t+1}$. As such, no relation exists between the stock return and the employment growth (and the hiring rate) across various horizons. Intuitively, in a frictionless world, hiring is perfectly elastic to changes in the discount rate, meaning that a small change in the discount rate gives rise to an infinitely large response of the hiring rate. As such, regressing future market excess returns on past employment growth or on the hiring rate should yield a slope of zero. Stock returns are not predictable with labor market variables.

$^{3}$ We use lagged employment growth, instead of current employment growth, to predict returns. Strictly speaking, in the context of the model with one-period time-to-build, $N_{t+1}$ is known at the beginning of period $t$. As such, $N_{t+1}/N_t$ can be used, at least in principle, to predict $R_{t+1}$ that goes from the beginning to the end of period $t$. However, in the data both $R_{t+1}$ and $N_{t+1}$ are observable only at the end of period $t$, meaning that we should use lagged employment growth to avoid look-ahead bias in forecasting returns.

3. Data and empirical specifications

We describe our data in Section 3.1 and empirical specifications in Section 3.2.

3.1. Data

Stock market returns. Following Lettau and Ludvigson (2002), we use the returns on the Standard & Poor's (S&P) index of 500 stocks from the Center for Research in Security Prices (CRSP). The sample is quarterly from the first quarter of 1952 to the first quarter of 2009. Let $r_t$ denote the log return of the S&P index and $r_{f,t}$ the log return on the three-month Treasury bill from the Federal Reserve. The log market excess return is then $r_t - r_{f,t}$.

Employment growth. Employment growth is the log growth rate of payrolls (seasonally adjusted total nonfarm payrolls of all employees) from the US Bureau of Labor Statistics. The sample is quarterly from the first quarter of 1952 to the first quarter of 2009.

Net hiring rate. In the model, the separation rate, $s$, is constant, meaning that the hiring rate ($H_t/N_t$) captures the same amount of information as the net hiring rate ($H_t/N_t - s$). In the data, however, the separation rate is time-varying and countercyclical.
There is more job destruction in recessions than in booms. To capture this feature of the data, we use net hiring rates in the data to test the model's implications for hiring rate dynamics. We merge two series to construct net hiring rates. The first series is the difference between gross hiring rates and separation rates from the Current Population Survey (CPS) from the first quarter of 1977 to the fourth quarter of 2002 (e.g., Bleakley, Ferris, and Fuhrer, 1999; and Merz and Yashiv, 2007). The second series is the difference between gross hiring rates and separation rates from the Job Openings and Labor Turnover Survey (JOLTS) from the first quarter of 2001 to the first quarter of 2009. To make the two series comparable in magnitude, we scale the JOLTS series from the first quarter of 2003 to the first quarter of 2009 by the ratio of the average CPS net hiring rate to the average JOLTS net hiring rate in the 2001–2002 period (the only overlapping period for the two series). The
merged series contains the CPS net hiring rates from the first quarter of 1977 to the fourth quarter of 2002 and the JOLTS net hiring rates from the first quarter of 2003 to the first quarter of 2009.

Net job creation rate in manufacturing. We calculate the net job creation rates in manufacturing as the job creation rates minus the job destruction rates for manufacturing firms from Davis, Faberman, and Haltiwanger (2006). The data from the first quarter of 1952 to the first quarter of 2005 are from John Haltiwanger's website.

Risk premium proxies. The empirical finance literature has uncovered a list of financial variables that forecast market excess returns (see footnote 1). We measure the dividend yield, DP, as the natural logarithm of the sum of the past four quarters of dividends per share minus the natural logarithm of the S&P 500 index level. The source for the S&P index and its dividends is CRSP. The relative bill rate, TB, is the three-month Treasury bill rate from the Federal Reserve minus its four-quarter moving average. The term premium, TRM, is the difference between the ten-year Treasury bond yield and the three-month Treasury bill yield from the Federal Reserve. The default premium, DEF, is the difference between the BAA-rated corporate bond yield and the AAA-rated corporate bond yield from the Federal Reserve. The data for CAY, Lettau and Ludvigson's (2001) log consumption-wealth ratio, are from Sydney Ludvigson's website. The sample for all the risk premium proxies is from the first quarter of 1952 to the first quarter of 2009.

Macro controls. To quantify the incremental predictive power of risk premium proxies, we employ a group of macro control variables used in prior studies to forecast future macroeconomic performance (e.g., Barro, 1990; and Lettau and Ludvigson, 2002).
These macro controls are lagged payroll growth, Δe; lagged net hiring rate, Δh; lagged net job creation rate in manufacturing, Δhm; lagged corporate profit growth, Δprofit, measured as the growth of the after-tax corporate profit with inventory valuation and capital consumption adjustments, seasonally adjusted in current dollars, from the Bureau of Economic Analysis; lagged growth of gross domestic product, Δgdp, measured as the growth of gross domestic product (GDP), seasonally adjusted in chain-weighted 2000 dollars, from the Bureau of Economic Analysis; and the growth of average Q, Δq.$^{4}$

Table 1 reports the descriptive statistics for the variables listed above. Payroll growth has a mean of 0.43% per quarter and a standard deviation of 0.64%. Lagged payroll growth forecasts future payroll growth up
4 We define a firm’s average Q as the ratio of the market value of assets to the book value of assets (Compustat annual item AT). The market value of assets equals the market value of common equity (price per share times common shares outstanding from CRSP) plus the book value of preferred stock (in sequence of availability, items PSTKL, PSTKRV, and PSTK) plus the book value of total debt [the sum of total short-term debt (item DLC) and total long-term debt (item DLTT)]. We calculate the aggregate average Q as the aggregate market value of assets divided by the aggregate value of book assets (excluding financial firms). To calculate the average Q observations within a given year, we use the market value of common equity observed at the end of each quarter within the year along with all the other components observed at the last fiscal year-end.
Table 1
Summary statistics.
For a list of key variables, we report the summary statistics such as mean, standard deviation (Std), minimum, 25th percentile, median, 75th percentile, maximum, and the first-, second-, and fourth-order autocorrelations (ρ1, ρ2, and ρ4, respectively). Standard & Poor's 500 index returns are from the Center for Research in Security Prices. Payroll growth is the log growth rate of payroll (seasonally adjusted total nonfarm payrolls of all employees) from the US Bureau of Labor Statistics. Net hiring rate is the merged series of the difference between gross hiring and separation rates from the Current Population Survey (CPS) from 1977:Q1 to 2002:Q4 and the difference between gross hiring and separation rates from the Job Openings and Labor Turnover Survey (JOLTS) from 2003:Q1 to 2009:Q1. We scale the JOLTS series from the first quarter of 2003 to the first quarter of 2009 by the ratio of the average CPS net hiring rate to the average JOLTS net hiring rate in the period 2001–2002. The net job creation rate in manufacturing is the difference between the job creation and job destruction rates for manufacturing firms from Davis, Faberman, and Haltiwanger (2006). The series is from 1952:Q1 to 2005:Q1 and is from John Haltiwanger's website. DP is the natural logarithm of the sum of the past four quarters of dividends per share minus the natural logarithm of the S&P index level. TB is the relative bill rate, measured as the three-month Treasury bill rate from the Federal Reserve Board minus its four-quarter moving average. TRM is the difference between the ten-year Treasury bond yield and the three-month Treasury bill yield from the Federal Reserve. DEF is the difference between BAA-rated and AAA-rated corporate bond yields from the Federal Reserve. CAY is Lettau and Ludvigson's (2001) log consumption-wealth ratio and is from Sydney Ludvigson's website. Corporate profit growth is the growth rate of the after-tax corporate profit with inventory valuation and capital consumption adjustments, seasonally adjusted in current dollars, from the Bureau of Economic Analysis. GDP growth is the growth rate of gross domestic product, seasonally adjusted in chain-weighted 2000 dollars, from the Bureau of Economic Analysis. Tobin's Q is the ratio of the aggregate market value of assets divided by the aggregate book value of assets (excluding financial firms). The market value of assets is the sum of the market value of common equity, the book value of preferred stock, and the book value of total debt. Except for net hiring rate and net job creation rate in manufacturing, the sample for all the other variables is from 1952:Q1 to 2009:Q1. All the series, except for DP, CAY, and Tobin's Q, are in quarterly percent.

Variable                                   Mean     Std  Minimum     25%   Median     75%  Maximum    ρ1    ρ2    ρ4
Log S&P 500 excess return                  1.22    8.09   -31.57   -2.94     2.47    6.53    19.06  0.10  0.03  0.02
Payroll growth                             0.43    0.64    -2.12    0.10     0.54    0.82     2.09  0.70  0.43  0.03
Net hiring rate                            0.26    0.36    -0.82    0.07     0.28    0.46     1.49  0.44  0.29  0.18
Net job creation rate in manufacturing    -0.22    1.22    -4.31   -0.83     0.07    0.44     4.50  0.70  0.40  0.11
DP                                        -3.47    0.41    -4.49   -3.59    -3.43   -3.18    -2.78  0.97  0.95  0.89
CAY                                        0.00    0.01    -0.03   -0.01     0.00    0.01     0.04  0.88  0.80  0.67
TB                                         0.01    0.83    -4.07   -0.37     0.04    0.42     3.56  0.46  0.10  0.11
TRM                                        1.37    1.20    -2.65    0.51     1.32    2.25     4.42  0.79  0.63  0.41
DEF                                        0.97    0.46     0.34    0.68     0.82    1.16     3.38  0.88  0.74  0.57
Corporate profit growth                    1.47    5.41   -25.82   -1.85     1.57    4.61    15.10  0.15  0.03  0.13
GDP growth                                 0.49    0.98    -3.26    0.03     0.54    1.01     3.39  0.37  0.19  0.08
Tobin's Q                                  1.09    0.24     0.71    0.90     1.03    1.29     1.67  0.95  0.89  0.80
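The construction of the merged net hiring rate described in the Table 1 notes can be sketched as follows. This is a minimal illustration under stated assumptions: the function `merge_net_hiring`, the dictionary layout keyed by `(year, quarter)`, and the toy numbers are all hypothetical, and the actual CPS and JOLTS data are not reproduced here.

```python
def merge_net_hiring(cps, jolts, overlap_years=(2001, 2002), switch_year=2003):
    """Splice CPS and JOLTS net hiring rates, both keyed by (year, quarter).

    JOLTS observations from switch_year onward are scaled by the ratio of the
    average CPS rate to the average JOLTS rate over the overlapping years.
    """
    common = [k for k in cps
              if k in jolts and overlap_years[0] <= k[0] <= overlap_years[1]]
    if not common:
        raise ValueError("the two series do not overlap in the given years")
    cps_mean = sum(cps[k] for k in common) / len(common)
    jolts_mean = sum(jolts[k] for k in common) / len(common)
    scale = cps_mean / jolts_mean

    # CPS before the switch date, rescaled JOLTS from the switch date onward.
    merged = {k: v for k, v in cps.items() if k[0] < switch_year}
    merged.update({k: scale * v for k, v in jolts.items() if k[0] >= switch_year})
    return dict(sorted(merged.items())), scale


# Toy (made-up) quarterly rates in percent, for illustration only.
quarters = [(y, q) for y in (2001, 2002) for q in (1, 2, 3, 4)]
cps = {k: (0.30 if k[0] == 2001 else 0.20) for k in quarters}
jolts = {k: (0.15 if k[0] == 2001 else 0.10) for k in quarters}
jolts[(2003, 1)] = 0.12

merged, scale = merge_net_hiring(cps, jolts)
```

In the toy data the CPS average over the overlap is twice the JOLTS average, so the 2003:Q1 JOLTS observation is doubled before being appended to the CPS series.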
to two quarters. The first-order autocorrelation is 0.70, the second-order autocorrelation is 0.43, but the fourth-order autocorrelation is close to zero. The net hiring rate is on average 0.26% per quarter with a standard deviation of 0.36%. The net job creation rate in manufacturing is on average -0.22% per quarter, meaning that the manufacturing sector has been declining in our sample period. The net hiring rate for the manufacturing sector is also more volatile (with a standard deviation of 1.22% per quarter) than the net hiring rate for the overall economy. Both the net hiring rate and the net job creation rate in manufacturing are autocorrelated at short horizons.

3.2. Empirical specification

To forecast market excess returns and payroll growth, we use standard long-horizon predictive regressions (e.g., Lettau and Ludvigson, 2002). For market excess returns, we use as the dependent variables the $I$-quarter cumulative log excess returns on the S&P 500 composite index, $\sum_{i=1}^{I} (r_{t+i} - r_{f,t+i})$, in which $I$ is the forecast horizon ranging from one quarter to 16 quarters. For payroll growth, we use as the dependent variables the $I$-quarter cumulative growth rates of total nonfarm payrolls, $\sum_{i=1}^{I} (n_{t+i} - n_{t+i-1}) = n_{t+I} - n_t$, where $n_t$ is the natural logarithm of total nonfarm payrolls in quarter $t$. For each regression, we report the slopes, the Newey and West (1987) corrected t-statistics, the adjusted $R^2$s, and the implied $R^2$s adjusted for overlapping observations in long-horizon regressions and calculated from vector autoregressions per Hodrick (1992).

To forecast net hiring rates, we use as the dependent variables the $I$-quarter-ahead net hiring rate, $H_{t+I}/N_{t+I} - s_{t+I}$, where $H_{t+I}/N_{t+I}$ and $s_{t+I}$ are the $I$-quarter-ahead gross hiring rate and separation rate, respectively. The dependent variables in forecasting the net job creation rate in manufacturing are defined analogously. We forecast the single-period net hiring rates because time-aggregating net hiring rates (using Eq. (1)) leads to long-horizon employment growth, which is irrelevant for testing the hiring rate dynamics per Hypothesis 2. For each regression, we report the slopes, the Newey and West corrected t-statistics, and the adjusted $R^2$s. Because the single-period net hiring rates do not involve overlapping observations, there is no need for the Hodrick adjustment.

4. Empirical results

We use risk premium proxies to forecast payroll growth in Section 4.1 and net hiring rates in Section 4.2. In Section 4.3 we use payroll growth and net hiring rates to forecast market excess returns.

4.1. Do risk premium proxies forecast payroll growth?

To provide background on time-varying risk premiums, we present up-to-date long-horizon forecasts of market excess returns with standard risk premium
proxies. We then use these proxies to forecast payroll growth, with and without macro controls.

4.1.1. Risk premium proxies

Using our updated sample, Table 2 reports the long-horizon forecasts of S&P 500 index excess returns. Panel A shows that the dividend yield shows some ability to forecast excess returns. The slopes are all positive, with the Newey and West t-statistics mostly above two. Using the same empirical specifications but in a shorter sample
Table 2
Forecasting stock market excess returns with financial variables (1952:Q1–2009:Q1).
This table reports long-horizon regressions of log excess returns on the Standard & Poor's 500 index, $\sum_{i=1}^{I} (r_{t+i} - r_{f,t+i})$, in which $I$ is the forecast horizon in quarters. The regressors are one-quarter lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the relative Treasury bill rate (TB), the term premium (TRM), the default premium (DEF), and their combination. We report the ordinary least squares estimate of the slopes (Slope), the Newey and West corrected t-statistics (tNW), the adjusted R2s, and the implied R2s calculated from vector autoregressions per Hodrick (1992).

Forecast horizon in quarters       1       2       4       8      12      16

Panel A: Univariate regressions with DP
Slope                           0.03    0.06    0.11    0.20    0.26    0.29
tNW                             1.94    2.12    2.31    2.48    2.57    2.83
Adjusted R2                     0.01    0.03    0.07    0.12    0.15    0.15
Implied R2                      0.01    0.03    0.05    0.09    0.15    0.18

Panel B: Univariate regressions with CAY
Slope                           1.19    2.37    4.57    8.17   10.66   12.35
tNW                             3.96    4.08    4.20    4.98    5.24    5.71
Adjusted R2                     0.04    0.08    0.14    0.25    0.31    0.34
Implied R2                      0.05    0.09    0.15    0.20    0.21    0.21

Panel C: Univariate regressions with TB
Slope                          -1.15   -1.97   -2.86   -3.51   -3.61   -3.71
tNW                            -1.81   -1.65   -1.18   -1.57   -1.61   -1.22
Adjusted R2                     0.01    0.01    0.01    0.01    0.01    0.01
Implied R2                      0.01    0.02    0.02    0.01    0.01    0.01

Panel D: Univariate regressions with TRM
Slope                           0.79    1.38    2.91    4.72    5.95    7.36
tNW                             1.51    1.42    1.91    2.71    3.44    3.24
Adjusted R2                     0.01    0.01    0.04    0.05    0.07    0.09
Implied R2                      0.01    0.02    0.04    0.04    0.03    0.02

Panel E: Univariate regressions with DEF
Slope                           0.14    0.32    0.22    2.32    3.21    0.00
tNW                             0.09    0.12    0.05    0.43    0.45    0.00
Adjusted R2                     0.00    0.00    0.00    0.00    0.00    0.00
Implied R2                      0.00    0.00    0.01    0.01    0.01    0.01

Panel F: Multiple regressions
DP, Slope                       0.03    0.06    0.12    0.19    0.23    0.26
DP, tNW                         1.73    1.97    2.14    2.30    2.60    2.85
CAY, Slope                      0.90    1.81    3.41    6.38    8.86   10.64
CAY, tNW                        2.42    2.80    2.99    3.34    3.74    4.37
TB, Slope                       0.98    1.69    1.71    0.71    1.04    2.38
TB, tNW                         1.26    1.33    0.76    0.27    0.43    0.85
TRM, Slope                      0.24    0.37    1.59    3.09    4.68    6.24
TRM, tNW                        0.35    0.33    0.96    1.21    1.81    2.51
DEF, Slope                      1.16    2.29    4.70    8.33    7.81    4.68
DEF, tNW                        0.72    0.90    0.99    1.50    1.32    0.61
Adjusted R2                     0.05    0.10    0.20    0.35    0.45    0.51
Implied R2                      0.08    0.13    0.25    0.37    0.44    0.44
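The univariate long-horizon regressions reported in Table 2 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are simulated, and the Newey and West lag length of $I - 1$ is an assumed choice because the text does not state the lag length used. The implied $R^2$ from vector autoregressions per Hodrick (1992) is not implemented.

```python
import math
import random


def cumulative_sum_ahead(series, horizon):
    """y[t] = series[t+1] + ... + series[t+horizon], wherever this is defined."""
    return [sum(series[t + 1:t + 1 + horizon])
            for t in range(len(series) - horizon)]


def ols_newey_west(y, x, lags):
    """Regress y on a constant and x; return (slope, Newey-West t-statistic)."""
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    g = [(u, u * a) for u, a in zip(resid, x)]  # scores u_t * [1, x_t]

    def gamma(j):
        """Sum over t of g_t g_{t-j}' as (s00, s01, s10, s11)."""
        s00 = s01 = s10 = s11 = 0.0
        for t in range(j, n):
            s00 += g[t][0] * g[t - j][0]
            s01 += g[t][0] * g[t - j][1]
            s10 += g[t][1] * g[t - j][0]
            s11 += g[t][1] * g[t - j][1]
        return s00, s01, s10, s11

    S = list(gamma(0))
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)  # Bartlett kernel weight
        g00, g01, g10, g11 = gamma(j)
        S[0] += w * 2.0 * g00       # adds w * (Gamma_j + Gamma_j')
        S[1] += w * (g01 + g10)
        S[2] += w * (g10 + g01)
        S[3] += w * 2.0 * g11
    r0, r1 = -sx / det, n / det     # slope row of (X'X)^{-1}
    var_slope = r0 * (S[0] * r0 + S[1] * r1) + r1 * (S[2] * r0 + S[3] * r1)
    return slope, slope / math.sqrt(var_slope)


# Simulated data (an assumption): the predictor x leads the return r by one quarter.
random.seed(0)
T = 200
x = [random.gauss(0.0, 1.0) for _ in range(T)]
r = [0.0] + [x[t - 1] + random.gauss(0.0, 0.5) for t in range(1, T)]

horizon = 4
y = cumulative_sum_ahead(r, horizon)  # I-quarter cumulative return after quarter t
slope, t_nw = ols_newey_west(y, x[:len(y)], lags=horizon - 1)
```

The overlapping cumulative sums induce serial correlation in the residuals up to roughly $I - 1$ lags, which is the reason for using a heteroskedasticity- and autocorrelation-consistent standard error rather than the ordinary least squares one.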
through 1999, Lettau and Ludvigson (2002) show only weak predictability with the dividend yield. Our evidence suggests that the dividend yield's predictive power has substantially increased over the past decade, probably because market valuation ratios have mean-reverted from their exceedingly high levels in the late 1990s.

Consistent with Lettau and Ludvigson (2001), Panel B shows that CAY reliably predicts market excess returns. The implied $R^2$ starts at 5% at the quarterly horizon, rises to 15% at the four-quarter horizon, and increases further to 21% at the 16-quarter horizon. The slopes are universally positive. The Newey and West t-statistics start at 4.0 at the quarterly horizon, increase to 4.2 at the four-quarter horizon and further to 5.7 at the 16-quarter horizon. The relative bill rate forecasts market excess returns, but the predictive power is low. The slopes are insignificantly negative, and the adjusted $R^2$ remains at 1% across all horizons. The term spread forecasts excess returns with a positive slope, albeit insignificant. As in the sample through 1999 in Lettau and Ludvigson (2002), the default premium does not show any forecasting power in our sample. The slopes have mixed signs and are all within 0.5 standard errors from zero. In multiple regressions with all five regressors, CAY is the strongest proxy, followed by the dividend yield.

4.1.2. Forecasting payroll growth

Table 3 reports the long-horizon regressions of the quarterly growth rate of total nonfarm payrolls on the risk premium proxies. Consistent with Hypothesis 1, the evidence shows that time-varying aggregate risk premiums are negatively correlated with short-horizon employment growth but are positively correlated with long-horizon employment growth. From Panel A, the dividend yield forecasts short-horizon payroll growth with a negative slope and long-horizon payroll growth with a positive slope. However, the predictability evidence is weak.
The slopes across different horizons are all within 1.6 standard errors from zero. Panel B shows that the results using CAY as a risk premium proxy are slightly stronger than those using the dividend yield. High values of CAY weakly predict low payroll growth at short horizons but high payroll growth at long horizons. In particular, the Newey and West t-statistic is 1.5 at the eight-quarter horizon and about 1.8 at the 12- and 16-quarter horizons.

From Panel C, the results using the relative bill rate strongly conform to Hypothesis 1. High values of the relative bill rate that predict low risk premiums (see Table 2) also forecast high payroll growth at short horizons but low payroll growth at long horizons. The dynamic sign pattern is significant. The Newey and West t-statistics of the slope start at 2.9 in the first quarter, decrease to 2.0 by the second quarter and further to 0.8 by the fourth quarter, before turning significantly negative from the eight-quarter horizon onward.

Consistent with Lettau and Ludvigson's (2002) evidence that the term spread has strong forecasting power for investment growth, Panel D shows that the term spread also forecasts payroll growth. However, the slopes are all positive and mostly significant across all horizons. Following Lettau
L. Chen, L. Zhang / Journal of Financial Economics 99 (2011) 385–399
Table 3
Forecasting payroll growth with risk premium proxies (1952:Q1–2009:Q1).
This table reports long-horizon regressions of payroll growth. The dependent variable is the I-quarter cumulative growth of seasonally adjusted total nonfarm payrolls of all employees, $n_{t+I} - n_t$, in which $n_t$ is the logarithm of total payrolls in period t. The regressors are one-quarter lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the detrended short-term Treasury bill rate (TB), the term premium (TRM), the default premium (DEF), and their combination. We report the ordinary least squares estimate of the slopes (Slope), the Newey and West corrected t-statistics (tNW), the adjusted R²s, and the implied R²s calculated from vector autoregressions per Hodrick (1992).

Horizon (quarters)      1      2      4      8     12     16

Panel A: Univariate regressions with DP
Slope                0.00   0.00   0.00   0.01   0.02   0.02
tNW                  0.32   0.08   0.42   0.99   1.41   1.54
Adjusted R²          0.00   0.00   0.00   0.01   0.04   0.06
Implied R²           0.00   0.00   0.00   0.01   0.02   0.03

Panel B: Univariate regressions with CAY
Slope                0.02   0.02   0.12   0.39   0.52   0.60
tNW                  0.80   0.27   0.92   1.54   1.79   1.85
Adjusted R²          0.00   0.00   0.00   0.03   0.04   0.04
Implied R²           0.00   0.00   0.00   0.00   0.00   0.00

Panel C: Univariate regressions with TB
Slope                0.22   0.29   0.18   0.71   1.06   0.94
tNW                  2.87   2.03   0.79   2.42   2.79   2.22
Adjusted R²          0.08   0.04   0.00   0.03   0.05   0.03
Implied R²           0.09   0.06   0.03   0.01   0.01   0.01

Panel D: Univariate regressions with TRM
Slope                0.03   0.13   0.44   1.05   1.10   0.81
tNW                  0.65   1.53   2.86   3.96   3.19   1.91
Adjusted R²          0.00   0.01   0.06   0.17   0.12   0.05
Implied R²           0.00   0.01   0.06   0.10   0.08   0.06

Panel E: Univariate regressions with DEF
Slope                0.49   0.78   0.74   0.03   0.53   0.91
tNW                  4.30   2.93   1.25   0.03   0.38   0.61
Adjusted R²          0.12   0.08   0.02   0.00   0.00   0.00
Implied R²           0.17   0.12   0.05   0.03   0.02   0.02

Panel F: Multiple regressions
DP, Slope            0.00   0.00   0.01   0.02   0.03   0.03
DP, tNW              0.87   0.85   1.18   2.28   2.48   1.96
CAY, Slope           0.07   0.12   0.11   0.07   0.08   0.31
CAY, tNW             2.37   1.92   0.77   0.26   0.27   0.90
TB, Slope            0.27   0.43   0.57   0.04   0.41   0.49
TB, tNW              3.52   3.01   2.52   0.11   0.87   0.88
TRM, Slope           0.22   0.44   0.83   1.27   1.14   0.70
TRM, tNW             4.75   4.69   4.86   4.62   3.00   1.61
DEF, Slope           0.52   0.91   1.30   1.95   1.63   0.86
DEF, tNW             3.72   2.88   1.94   1.85   1.19   0.51
Adjusted R²          0.25   0.21   0.17   0.25   0.21   0.14
Implied R²           0.26   0.16   0.11   0.12   0.12   0.11
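The regressions summarized in Table 3 can be sketched in a few lines. The block below is a minimal illustration, not the authors' code: it regresses the I-quarter cumulative log payroll growth on a constant and one proxy, and computes Newey and West t-statistics with a Bartlett kernel. The function names, the timing convention (the proxy observed at t predicts growth from t to t+I), and the default lag length equal to the horizon are all assumptions made for this sketch; the implied R² from the Hodrick (1992) vector autoregression is not covered.

```python
import numpy as np


def newey_west_ols(y, X, lags):
    """OLS slopes, Newey-West t-statistics, and adjusted R^2.

    X must already include a column of ones for the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Long-run covariance of the moments x_t * u_t with Bartlett weights.
    xu = X * resid[:, None]
    S = xu.T @ xu
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        gamma = xu[j:].T @ xu[:-j]
        S += weight * (gamma + gamma.T)

    cov = xtx_inv @ S @ xtx_inv
    t_stats = beta / np.sqrt(np.diag(cov))

    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (T - 1) / (T - k)
    return beta, t_stats, adj_r2


def long_horizon_regression(log_payroll, proxy, horizon, lags=None):
    """Regress n_{t+I} - n_t on a constant and the proxy observed at t."""
    n = np.asarray(log_payroll, dtype=float)
    x = np.asarray(proxy, dtype=float)
    y = n[horizon:] - n[:-horizon]      # cumulative growth over I quarters
    X = np.column_stack([np.ones(len(y)), x[:-horizon]])
    if lags is None:
        lags = horizon                  # assumption: lag length tied to the overlap
    return newey_west_ols(y, X, lags)
```

Calling `long_horizon_regression(log_payroll, proxy, horizon=8)` returns the intercept and slope, their t-statistics, and the adjusted R² for the eight-quarter horizon. Because cumulative growth rates overlap at horizons above one quarter, the residuals are serially correlated by construction, which is why a heteroskedasticity- and autocorrelation-consistent correction is used rather than plain OLS standard errors.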
and Ludvigson, we interpret the evidence as indicating the term spread's strong forecasting power for output growth.5 The reason might be that the effect of the term spread works primarily through the cash flow channel, as opposed to the risk premium channel that we focus on. In particular, the term spread is strongly affected by inflationary expectations and monetary policy, and the predictive power of the term spread for economic growth depends on the degree to which the Federal Reserve reacts to deviations in output from its long-term trend (e.g., Estrella, 2005). The term spread tends to rise when the Federal Reserve cuts the short-term interest rate to stimulate the economy, and a boom in economic activity and inflation typically follows such a policy move with a lag. The term spread tends to fall when the Federal Reserve raises the short-term interest rate to curb inflation, and a slowdown in economic activity and inflation typically follows with a lag.

5 A large body of work shows the predictive power of the term spread for real economic activity. Harvey (1988) shows the predictive relation of the term spread with consumption growth. Stock and Watson (1989) and Chen (1991) show that the term spread forecasts output growth. Estrella and Hardouvelis (1991) report that the term spread predicts the growth of gross national product, consumption (nondurables plus services), consumption durables, investment, and recession probabilities.

From Panel E, the default spread predicts payroll growth with significantly negative slopes at short horizons but with insignificant slopes at long horizons. Although the sign pattern is consistent with Hypothesis 1, the predictability at long horizons is negligible. However, this evidence might simply suggest that the default spread is a weak risk premium proxy at long horizons (see Table 2).

Panel F reports long-horizon multiple regressions of payroll growth with all five risk premium proxies. All the proxies show marginal predictive power for payroll growth at some horizons. With all five proxies included, the empirical specification has reliable predictive power for payroll growth at every horizon, with the adjusted R² varying from 14% to 25% and the implied R² from 11% to 26%. However, multicollinearity between regressors can make the sign pattern of any individual proxy in the multiple regressions difficult to interpret. To facilitate the economic interpretation, we calculate the correlations between the fitted one-quarter-ahead risk premiums using all five proxies and cumulative payroll growth rates at the various future horizons. (The slopes of the proxies in the fitted one-quarter-ahead risk premiums are reported in the first column of Panel F in Table 2.) We use the expected one-quarter-ahead risk premiums because our testable hypotheses are derived under a one-time shock to the discount rate, $E_t[R_{t+1}]$. In any case, using the fitted multi-quarter-ahead risk premiums yields largely similar results (not reported).

Panel A of Fig. 2 shows the impact of time-varying risk premiums. The correlations between the one-quarter-ahead expected risk premiums and cumulative payroll growth are insignificantly negative at the one- and two-quarter horizons, insignificantly positive at the four-quarter horizon, but significantly positive from the eight-quarter horizon onward. Realized payroll growth rates, however, are affected by ex post shocks that can bias the estimated correlations toward zero. As such, we also report in Panel B the correlations between the fitted one-quarter-ahead risk premiums and the fitted payroll growth from the long-horizon regressions in Panel F of Table 3. The evidence is clear. The correlations between risk premiums
[Fig. 2 plot area: Panels A and B, each showing correlations on the vertical axis against the forecast horizon in quarters on the horizontal axis.]
Fig. 2. Correlations between the fitted one-quarter-ahead risk premiums and cumulative payroll growth, both realized and expected, across different forecast horizons. Panel A plots the correlations between the fitted one-quarter-ahead risk premiums, $E_t[R_{t+1}]$, using all five risk premium proxies and the I-quarter-ahead cumulative payroll growth rate, where I varies from one quarter to 16 quarters. (The slopes for the proxies in the fitted one-quarter-ahead risk premiums are from the first column of Panel F in Table 2.) Panel B plots the correlations between the fitted one-quarter-ahead risk premiums and the fitted I-quarter-ahead cumulative payroll growth. Both fitted series use all five risk premium proxies. The correlations that are significant at the 5% level are indicated with big squares in red, and the correlations that are insignificant at the 5% level are indicated with small squares in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
and the expected payroll growth are significantly negative in the one-quarter horizon, are close to zero in the two-quarter horizon, and are significantly positive in all subsequent horizons. The evidence suggests that the combined effect of time-to-build and time-to-plan lasts for about two quarters in the aggregate payroll data.

In untabulated results, we have also studied long-horizon regressions on risk premium proxies using the growth rate of average weekly hours (seasonally adjusted average weekly hours of total private industries from the US Bureau of Labor Statistics). This variable is also an indicator of labor market performance (e.g., Stock and Watson, 1999). Without showing the details, we can report that CAY, the term spread, and to a lesser extent the default spread all predict the growth of weekly hours with significantly positive slopes, especially at long horizons. The relative bill rate predicts the growth of weekly hours with significantly negative slopes across most horizons. More importantly, there is no dynamic sign-switching pattern as in the case of payroll growth. The evidence suggests that adjusting average weekly hours is a relatively smooth process, whereas adjusting total nonfarm payrolls is a more sluggish process. Adjusting payrolls means hiring and firing workers, a process that is time-consuming and costly. In contrast, adjusting weekly hours means changing the utilization rate of existing workers, a process that is likely smooth.

4.1.3. Forecasting payroll growth relative to macro controls

As an indicator of the macroeconomy, payroll growth is likely correlated with past macroeconomic performance. We ask whether the risk premium proxies contain any information about future payroll growth beyond what is already contained in standard macro control variables. Table 4 reports the forecasts of payroll growth with macro controls. From Panel A, the lagged values of payroll growth, corporate profit growth, and GDP growth predict
future payroll growth with largely positive slopes. Unlike risk premium proxies, their predictive power is mostly concentrated at short horizons. The adjusted R² peaks at 51% at the one-quarter horizon and monotonically decreases to 20% at the four-quarter horizon and to 3% at the 12-quarter horizon. The implied R² peaks at 53% at the one-quarter horizon and monotonically decreases to 28% at the four-quarter horizon and to 15% at the 12-quarter horizon. Turning to the slopes, lagged payroll growth has predictive power within four quarters. Lagged GDP growth has some predictive power from the two- to 12-quarter horizons. Lagged corporate profit growth retains some predictive power at horizons longer than four quarters, but lagged growth of Tobin's Q has insignificant slopes across all horizons.

From Panel B of Table 4, when we add all five risk premium proxies to the empirical specification with four macro controls, the regression explains a larger fraction of the variation in future payroll growth than the macro controls alone. The incremental fraction explained per the adjusted R² is substantial. Using only the macro controls, the regression explains only 5% and 1% of the payroll growth variation at the eight-quarter and the 16-quarter horizons, respectively. Adding risk premium proxies increases the respective fractions to 33% and 26%. However, the improvement is more modest in the implied R². Risk premium proxies increase the implied R² by only 3 to 12 percentage points. Also, there is no evidence that the improvement is larger at long horizons. The improvement in the adjusted R² seems mostly driven by overlapping observations.
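The comparison between the two panels of Table 4 amounts to measuring how much the adjusted R² rises when the risk premium proxies are added to the macro controls. The sketch below shows that calculation under simple assumptions: the function names are hypothetical, the regressions are plain OLS with an intercept, and nothing here reproduces the implied R² from the Hodrick (1992) vector autoregression. With the figures reported above, the gains are 33% - 5% = 28 percentage points at the eight-quarter horizon and 26% - 1% = 25 percentage points at the 16-quarter horizon.

```python
import numpy as np


def adjusted_r2(y, regressors):
    """Adjusted R^2 from an OLS regression of y on a constant and regressors."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(regressors, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    X = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    T, k = X.shape
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (T - 1) / (T - k)


def incremental_adjusted_r2(y, controls, proxies):
    """Adjusted R^2 with controls only, with controls plus proxies, and the gain."""
    base = adjusted_r2(y, controls)
    full = adjusted_r2(y, np.column_stack([controls, proxies]))
    return base, full, full - base
```

The gain in adjusted R² should be read with the caveat stated in the text: with overlapping long-horizon observations, a larger adjusted R² does not by itself establish stronger predictability, which is why the implied R² is reported alongside it.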
4.2. Do risk premium proxies forecast net hiring rates?

Hypothesis 2 says that, without time-to-plan, regressing future hiring rates on the discount rate should yield only positive slopes without sign switching at long
Table 4
Payroll growth regressions (1952:Q1–2009:Q1).
The dependent variable is the I-quarter cumulative growth of seasonally adjusted total nonfarm payrolls of all employees, $n_{t+I} - n_t$, in which $n_t$ is the logarithm of total payrolls in period t. The regressors are combinations of one-period lagged values of employment growth (Δn), profit growth (Δprofit), growth of average Q (Δq), growth of gross domestic product (Δgdp), and one-quarter lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the relative Treasury bill rate (TB), the term premium (TRM), and the default premium (DEF). We report the ordinary least squares estimate of the slopes (Slope), the Newey and West corrected t-statistics (tNW), the adjusted R²s, and the implied R²s calculated from vector autoregressions per Hodrick (1992).

Horizon (quarters)      1      2      4      8     12     16

Panel A: Multiple regressions with macro controls
Δn, Slope            0.60   0.88   0.81   0.15   0.19   0.04
Δn, tNW              7.47   4.58   1.93   0.23   0.26   0.04
Δprofit, Slope       0.01   0.02   0.04   0.09   0.10   0.08
Δprofit, tNW         1.18   1.14   1.17   1.88   1.62   0.98
Δq, Slope            0.00   0.00   0.01   0.00   0.00   0.00
Δq, tNW              1.63   1.35   0.63   0.08   0.14   0.04
Δgdp, Slope          0.05   0.18   0.38   0.32   0.39   0.34
Δgdp, tNW            1.12   1.91   1.74   0.97   1.16   0.91
Adjusted R²          0.51   0.41   0.20   0.05   0.03   0.01
Implied R²           0.53   0.43   0.28   0.18   0.15   0.14

Panel B: Multiple regressions with macro controls and risk premium proxies
Δn, Slope            0.66   1.05   1.12   0.97   0.80   1.17
Δn, tNW              6.66   4.44   2.46   1.66   1.22   1.57
Δprofit, Slope       0.01   0.01   0.01   0.03   0.05   0.04
Δprofit, tNW         0.82   0.88   0.43   0.63   0.87   0.53
Δq, Slope            0.00   0.01   0.02   0.03   0.04   0.07
Δq, tNW              1.73   1.85   2.04   1.66   1.87   1.97
Δgdp, Slope          0.04   0.14   0.35   0.32   0.38   0.38
Δgdp, tNW            0.83   1.47   1.92   1.12   1.13   0.87
DP, Slope            0.00   0.00   0.01   0.03   0.04   0.04
DP, tNW              1.21   1.40   1.83   2.70   2.96   2.77
CAY, Slope           0.01   0.01   0.07   0.14   0.36   0.75
CAY, tNW             0.57   0.21   0.61   0.61   1.31   2.10
TB, Slope            0.01   0.06   0.01   0.52   0.81   0.87
TB, tNW              0.22   0.52   0.03   1.36   1.54   1.43
TRM, Slope           0.10   0.23   0.59   1.06   0.98   0.64
TRM, tNW             3.14   3.30   4.04   3.66   2.38   1.23
DEF, Slope           0.07   0.13   0.03   0.50   0.20   1.89
DEF, tNW             0.68   0.46   0.05   0.59   0.22   1.40
Adjusted R²          0.58   0.50   0.36   0.33   0.30   0.26
Implied R²           0.60   0.52   0.40   0.28   0.22   0.17
horizons. With time-to-plan, however, regressing the hiring rates on the discount rate should yield negative slopes at short horizons but positive slopes at long horizons. We test this hypothesis in this subsection. Table 5 regresses the I-quarter-ahead net hiring rate constructed from the merged CPS and JOLTS series on risk premium proxies. Overall, there is no evidence in support of time-to-plan. From the univariate regressions, the dividend yield, CAY, and term spread slopes are mostly positive. The relative bill rate slopes show the hypothesized sign switching pattern, but the slopes in the short horizons are only insignificantly positive. The default spread slopes show a more clear-cut sign switching pattern, but, as shown in Table 2, the default spread is only a weak risk premium proxy.
Table 5
Forecasting net hiring rate with risk premium proxies (1977:Q1–2009:Q1).
The dependent variable is the I-quarter-ahead net hiring rate. The regressors are one-quarter lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the relative Treasury bill rate (TB), the term premium (TRM), the default premium (DEF), and their combination. We report the ordinary least squares estimate of the slopes (Slope), the Newey and West corrected t-statistics (tNW), and the adjusted R²s.

Horizon (quarters)      1      2      4      8     12     16

Panel A: Univariate regressions with DP
Slope                0.00   0.00   0.00   0.00   0.00   0.00
tNW                  1.57   1.88   2.26   2.43   2.49   2.30
Adjusted R²          0.02   0.03   0.05   0.08   0.09   0.07

Panel B: Univariate regressions with CAY
Slope                0.01   0.01   0.03   0.04   0.05   0.02
tNW                  0.64   0.92   1.31   1.27   1.30   0.51
Adjusted R²          0.01   0.00   0.01   0.03   0.03   0.00

Panel C: Univariate regressions with TB
Slope                0.03   0.00   0.00   0.07   0.10   0.01
tNW                  0.61   0.07   0.08   1.99   2.18   0.38
Adjusted R²          0.00   0.01   0.01   0.02   0.06   0.01

Panel D: Univariate regressions with TRM
Slope                0.03   0.05   0.05   0.08   0.03   0.04
tNW                  1.14   2.11   1.74   2.28   0.66   1.24
Adjusted R²          0.00   0.02   0.03   0.08   0.00   0.01

Panel E: Univariate regressions with DEF
Slope                0.22   0.13   0.04   0.19   0.22   0.12
tNW                  2.35   1.11   0.33   1.92   2.29   1.41
Adjusted R²          0.09   0.02   0.01   0.05   0.07   0.02

Panel F: Multiple regressions
DP, Slope            0.00   0.00   0.00   0.00   0.00   0.00
DP, tNW              4.76   4.03   2.40   1.74   1.11   1.33
CAY, Slope           0.06   0.05   0.01   0.03   0.05   0.03
CAY, tNW             3.21   2.62   0.45   0.86   1.24   0.53
TB, Slope            0.03   0.03   0.01   0.00   0.10   0.07
TB, tNW              0.50   0.70   0.22   0.04   1.81   1.26
TRM, Slope           0.05   0.06   0.06   0.06   0.03   0.07
TRM, tNW             2.35   2.31   1.80   1.76   0.66   1.94
DEF, Slope           0.49   0.39   0.14   0.10   0.12   0.01
DEF, tNW             5.57   3.78   0.92   0.83   0.70   0.08
Adjusted R²          0.26   0.17   0.07   0.14   0.16   0.09
Summarizing the information contained in different risk premium proxies, we plot the correlations between the fitted one-quarter-ahead risk premiums (estimated with all five proxies) and the I-quarter-ahead net hiring rates, where I varies from one to 16 quarters. Panel A of Fig. 3 shows that the correlations are all positive, ranging from 0.16 to 0.35, and are mostly significant at the 5% level. Using an instrumental variables approach to control for noise in realized net hiring rates, we also correlate the fitted one-quarter-ahead risk premiums with the fitted I-quarter-ahead net hiring rates (estimated in Panel F of Table 5). The correlations are significantly positive across all horizons, suggesting that there is no time-to-plan in the aggregate hiring data.

The evidence is more supportive of time-to-plan for manufacturing firms. Table 6 regresses the I-quarter-ahead net job creation rate in manufacturing on risk premium proxies. The dynamic sign pattern predicted by time-to-plan
[Fig. 3 plot area: Panels A to D, each showing correlations on the vertical axis against the forecast horizon in quarters on the horizontal axis.]
Fig. 3. Correlations of the fitted one-quarter-ahead risk premiums with the net hiring rate and the net job creation rate in manufacturing, both realized and expected, across different forecast horizons. Panel A plots the correlations between the fitted one-quarter-ahead risk premiums, $E_t[R_{t+1}]$, using all five risk premium proxies and the I-quarter-ahead net hiring rates, where I varies from one quarter to 16 quarters. (The slopes for the proxies in the fitted one-quarter-ahead risk premiums are from the first column of Panel F in Table 2.) Panel B plots the correlations between the fitted one-quarter-ahead risk premiums and the fitted I-quarter-ahead net hiring rates. Both fitted series use all five risk premium proxies. Panel C plots the correlations between the fitted one-quarter-ahead risk premiums and the I-quarter-ahead net job creation rates in manufacturing. Panel D plots the correlations between the fitted one-quarter-ahead risk premiums and the fitted I-quarter-ahead net job creation rates in manufacturing. Both fitted series again use all five risk premium proxies. The correlations that are significant at the 5% level are indicated with big squares in red, and the correlations that are insignificant at the 5% level are indicated with small squares in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
is clearly visible in the relative bill rate slopes in Panel C. The relative bill rate has a significantly positive slope of 0.48 (t = 2.96) in the one-quarter horizon, a weakly positive slope in the two-quarter horizon, and a significantly negative slope of -0.27 (t = -2.46) in the four-quarter horizon. The slope remains significantly negative at the eight-quarter horizon but is close to zero afterward. Aggregating the information from different risk premium proxies, we correlate the fitted one-quarter-ahead risk premiums with the I-quarter-ahead realized net job creation rates in manufacturing. Panel C of Fig. 3 reports a clear sign-switching pattern in support of time-to-plan. The correlation is significantly negative at -0.16 in the one-quarter horizon, close to zero in the two-quarter horizon, and significantly positive at 0.26 and 0.30 in the four- and eight-quarter horizons. Using the instrumental variables approach to control for noise in realized net job creation rates in manufacturing, Panel D correlates the fitted one-quarter-ahead risk premiums with the fitted I-quarter-ahead net job creation rates (estimated in Panel F of Table 6). The correlation starts by being significantly negative, -0.29, in the one-quarter horizon, becomes zero
in the two-quarter horizon, and turns significantly positive in the subsequent horizons. In all, the evidence suggests a time-to-plan of about two quarters in job creation in the manufacturing sector.
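The correlation profiles plotted in Fig. 3 can be illustrated with a short sketch. It correlates a fitted one-quarter-ahead risk premium observed at t with an outcome observed I quarters later, for each horizon I, and reports the first horizon at which the correlation is positive. The function names are hypothetical, the inputs are assumed to be equal-length quarterly series, and the sketch ignores the 5% significance tests used in the figure.

```python
import numpy as np


def horizon_correlations(fitted_premium, outcome, horizons):
    """Correlation between the premium at t and the outcome at t + I."""
    rp = np.asarray(fitted_premium, dtype=float)
    z = np.asarray(outcome, dtype=float)
    return {h: float(np.corrcoef(rp[:-h], z[h:])[0, 1]) for h in horizons}


def first_positive_horizon(correlations):
    """Smallest horizon with a positive correlation, or None if there is none."""
    for h in sorted(correlations):
        if correlations[h] > 0:
            return h
    return None
```

A profile that starts negative and turns positive after a couple of quarters is the sign-switching pattern that the text attributes to time-to-plan; a profile that is positive at every horizon corresponds to the aggregate net hiring result.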
4.3. Do labor market variables forecast stock market excess returns?

To test stock market predictability with labor market variables, we use empirical specifications similar to those in Table 2. The dependent variables are future log excess returns on the S&P 500 index over various horizons. The regressors are one-quarter lagged values of payroll growth, net hiring rate, and net job creation rate in manufacturing, with and without the lagged values of the dividend yield, CAY, the relative bill rate, the term spread, and the default spread in multiple regressions. From Panel A of Table 7, payroll growth predicts market excess returns, especially at business cycle frequencies. The adjusted R² is hump-shaped. It starts at 1% at the one-quarter horizon, peaks at 5% at the
Table 6
Forecasting net job creation rate in manufacturing with risk premium proxies (1952:Q1–2005:Q1).
The dependent variable is the I-quarter-ahead net job creation rate in manufacturing. The regressors are one-quarter lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the relative Treasury bill rate (TB), the term premium (TRM), the default premium (DEF), and their combination. We report the ordinary least squares estimate of slopes (Slope), the Newey and West t-statistics (tNW), and adjusted R²s.

Horizon (quarters)      1      2      4      8     12     16

Panel A: Univariate regressions with DP
Slope                0.00   0.00   0.00   0.00   0.00   0.00
tNW                  0.28   0.20   0.61   0.69   0.97   0.18
Adjusted R²          0.00   0.00   0.00   0.00   0.01   0.00

Panel B: Univariate regressions with CAY
Slope                0.06   0.00   0.12   0.11   0.11   0.07
tNW                  1.12   0.08   1.47   1.36   1.55   0.96
Adjusted R²          0.00   0.00   0.01   0.01   0.01   0.00

Panel C: Univariate regressions with TB
Slope                0.48   0.10   0.27   0.32   0.03   0.07
tNW                  2.96   0.71   2.46   2.57   0.22   0.62
Adjusted R²          0.11   0.00   0.03   0.05   0.00   0.00

Panel D: Univariate regressions with TRM
Slope                0.04   0.20   0.28   0.13   0.01   0.06
tNW                  0.57   2.47   2.88   1.60   0.05   0.74
Adjusted R²          0.00   0.04   0.09   0.01   0.01   0.00

Panel E: Univariate regressions with DEF
Slope                1.08   0.67   0.17   0.02   0.14   0.23
tNW                  4.10   2.14   0.45   0.06   0.51   1.04
Adjusted R²          0.14   0.05   0.00   0.00   0.00   0.00

Panel F: Multiple regressions
DP, Slope            0.00   0.00   0.00   0.01   0.00   0.00
DP, tNW              0.39   0.24   1.38   2.02   0.62   0.17
CAY, Slope           0.12   0.09   0.01   0.06   0.13   0.11
CAY, tNW             2.41   1.48   0.11   0.75   1.77   1.25
TB, Slope            0.60   0.26   0.13   0.38   0.01   0.01
TB, tNW              3.95   1.69   1.03   2.25   0.05   0.04
TRM, Slope           0.40   0.40   0.30   0.01   0.06   0.10
TRM, tNW             5.13   4.14   2.74   0.11   0.45   0.82
DEF, Slope           0.95   0.75   0.58   0.47   0.04   0.12
DEF, tNW             3.16   2.08   1.73   1.36   0.13   0.38
Adjusted R²          0.29   0.14   0.12   0.10   0.01   0.00
four-quarter horizon, and declines to 3% at the 16-quarter horizon. The implied R² pattern is similar. The slopes are all negative and are significant from the four-quarter horizon onward. As such, high payroll growth forecasts low market excess returns, and low payroll growth forecasts high market excess returns from one quarter to 16 quarters ahead. This evidence is consistent with the view that aggregate risk premiums are countercyclical, whereas payroll growth is procyclical. It is useful to compare the evidence with payroll growth and the evidence with the standard risk premium proxies in Table 2. Judged by the Newey and West t-statistics and R²s, the predictive power of payroll growth dominates that of the default spread. The R²s of the default spread are close to zero, and the slopes are within 0.5 standard errors of zero across all horizons. Payroll growth also dominates the relative bill rate in forecasting returns. The R²s of the relative bill rate are flat
across different horizons at 1–2% and are lower than those of payroll growth. The slopes for the relative bill rate are all within 1.9 standard errors of zero, while the slopes for payroll growth are all significant from the four-quarter horizon onward. The predictive power of payroll growth differs from that of the dividend yield and that of the term spread. Whereas the forecasting power of payroll growth peaks at relatively short business cycle frequencies, the dividend yield and the term spread maximize their predictive power at long horizons. Only CAY dominates payroll growth in predicting market excess returns, as evidenced by Newey and West t-statistics and R²s in univariate regressions (Panel B of Table 2). Even with CAY in bivariate regressions, payroll growth retains some predictive power for returns, as shown in Panel B of Table 7. Panel C of the same table includes all five risk premium proxies along with payroll growth in forecasting long-horizon excess returns. Payroll growth retains some predictive power from the four-quarter horizon onward. Judged by the t-statistics, payroll growth dominates the relative bill rate, the term spread, and the default spread in predicting returns. The default spread even has negative slopes across all horizons.

Panels D to F of Table 7 show that the net hiring rate has no predictive power for market excess returns. Although the slopes are negative in univariate regressions across most horizons, going in the right direction as predicted by Hypothesis 3, the negative slopes are all within one standard error of zero. The results from bivariate regressions with CAY are largely similar. When all five risk premium proxies are included, the net hiring rate has significantly negative slopes at the four-quarter horizon and beyond. However, multicollinearity makes the interpretation of any individual slope in multiple regressions difficult.
From Panel G, the net job creation rate in manufacturing strongly predicts market excess returns with negative slopes within four quarters. The adjusted R² starts at 2% at the one-quarter horizon, peaks at 7% at the four-quarter horizon, and drops to zero in subsequent horizons. The implied R² starts at 3% at the one-quarter horizon, peaks at 4% at the four-quarter horizon, and drops to 2% in subsequent horizons. The slopes are negative across all horizons and are more than 2.3 standard errors from zero within four quarters. Panel H shows that although controlling for CAY weakens the predictive power of the net job creation rate in manufacturing, its negative slopes are still at least two standard errors from zero within the four-quarter horizon. In the multiple regressions that include all five risk premium proxies, the forecasting power of the net job creation rate in manufacturing dominates that of the relative bill rate, the term premium, and the default spread, is comparable with that of the dividend yield, and is dominated only by the forecasting power of CAY.
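The dependent variable in these return regressions is a sum of future quarterly log excess returns. The sketch below builds that sum for a given horizon and estimates a univariate predictive slope by plain OLS. It is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the predictor observed at t is paired with returns from t+1 to t+I, and no Newey and West correction or implied R² is computed.

```python
import numpy as np


def cumulative_excess_returns(log_returns, log_riskfree, horizon):
    """Element t holds the sum of excess log returns over t+1, ..., t+horizon."""
    excess = np.asarray(log_returns, dtype=float) - np.asarray(log_riskfree, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(excess)])
    return csum[horizon + 1:] - csum[1:-horizon]


def predictive_slope(cum_excess, predictor):
    """OLS slope from regressing cumulative excess returns on the predictor at t."""
    y = np.asarray(cum_excess, dtype=float)
    x = np.asarray(predictor, dtype=float)[: len(y)]   # align predictor at t
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])
```

A negative slope on lagged payroll growth in this setup corresponds to the pattern described in the text: high payroll growth is followed by low cumulative excess returns.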
5. Conclusion

We show empirical linkages between the stock market and the labor market. We report three major findings.
Table 7
Forecasting stock market excess returns with labor market variables.
This table reports long-horizon regressions of log excess returns on the Standard & Poor's 500 index, $\sum_{i=1}^{I} (r_{t+i} - r_{f,t+i})$, in which I is the return forecast horizon in quarters. The regressors are one-quarter lagged values of employment growth (Δn), net hiring rate (Δh), and net job creation rate in manufacturing (Δhm), with and without one-period lagged values of the log consumption-to-wealth ratio (CAY), the log dividend yield (DP), the detrended short-term Treasury bill rate (TB), the term premium (TRM), the default premium (DEF), and their combination. Employment is the seasonally adjusted total nonfarm payrolls of all employees and Δn is $n_t - n_{t-1}$, in which $n_t$ is the logarithm of employment in quarter t. We report the ordinary least squares estimate of the slopes (Slope), the Newey and West corrected t-statistics (tNW), the adjusted R²s, and the implied R²s calculated from vector autoregressions per Hodrick (1992). The sample is from 1952:Q1 to 2009:Q1 for Panels A to C, from 1977:Q1 to 2009:Q1 for Panels D to F, and from 1952:Q1 to 2005:Q1 for Panels G to I.

Horizon (quarters)      1      2      4      8     12     16

Panel A: Univariate regressions with Δn
Slope                1.54   2.92   6.74   7.80   7.40   8.82
tNW                  1.60   1.74   2.57   2.05   2.07   2.27
Adjusted R²          0.01   0.02   0.05   0.04   0.02   0.03
Implied R²           0.01   0.02   0.04   0.03   0.02   0.01

Panel B: Bivariate regressions with Δn and CAY
CAY, Slope           1.14   2.26   4.34   7.91  10.42  12.05
CAY, tNW             3.78   3.86   3.96   4.75   4.99   5.25
Δn, Slope            1.27   2.39   5.81   5.88   4.66   5.62
Δn, tNW              1.33   1.41   2.21   1.58   1.30   1.38
Adjusted R²          0.05   0.09   0.18   0.28   0.32   0.35
Implied R²           0.06   0.11   0.18   0.20   0.20   0.19

Panel C: Multiple regressions with Δn and all five risk premium proxies
DP, Slope            0.03   0.06   0.12   0.20   0.24   0.27
DP, tNW              1.85   2.08   2.34   2.63   3.08   3.63
CAY, Slope           0.79   1.65   2.94   5.71   8.19   9.85
CAY, tNW             2.11   2.51   2.62   3.07   3.50   4.00
TB, Slope            0.26   0.56   1.86   4.33   5.94   7.72
TB, tNW              0.29   0.38   0.78   1.64   2.19   2.64
TRM, Slope           0.52   0.81   2.96   5.04   6.57   8.26
TRM, tNW             0.72   0.69   1.79   1.98   2.59   3.62
DEF, Slope           2.02   3.47   8.11  13.07  12.42   9.59
DEF, tNW             1.27   1.41   1.83   2.51   2.22   1.44
Δn, Slope            2.02   3.47   8.11  13.07  12.42   9.59
Δn, tNW              1.66   1.47   3.35   3.03   2.93   3.42
Adjusted R²          0.05   0.11   0.25   0.41   0.49   0.55
Implied R²           0.09   0.15   0.29   0.36   0.40   0.40

Panel D: Univariate regressions with Δh
Slope                2.53   0.74   4.57   0.83   2.36   5.85
tNW                  1.05   0.18   0.81   0.14   0.39   0.69
Adjusted R²          0.00   0.01   0.00   0.01   0.01   0.00
Implied R²           0.03   0.02   0.00   0.00   0.00   0.00

Panel E: Bivariate regressions with Δh and CAY
CAY, Slope           0.77   1.61   3.53   7.29  10.68  13.29
CAY, tNW             2.30   2.65   2.99   3.54   3.83   4.16
Δh, Slope            2.35   1.13   5.87   2.69   4.31   8.14
Δh, tNW              0.98   0.28   1.08   0.44   0.65   0.89
Adjusted R²          0.02   0.03   0.11   0.26   0.36   0.42
Implied R²           0.06   0.07   0.12   0.16   0.21   0.22

Panel F: Multiple regressions with Δh and all five risk premium proxies
DP, Slope            0.02   0.06   0.11   0.12   0.10   0.09
DP, tNW              0.81   1.64   1.67   1.41   1.22   1.23
CAY, Slope           0.59   0.89   1.97   5.26   9.67  13.66
CAY, tNW             1.24   1.24   1.58   2.62   3.78   4.74
TB, Slope            1.54   0.87   3.38   6.65   7.54   8.45
TB, tNW              1.34   0.53   1.24   2.51   2.54   2.98
TRM, Slope           0.41   0.02   2.81   7.17   8.78   9.13
TRM, tNW             0.48   0.02   1.60   2.91   3.44   3.56
DEF, Slope           1.04   4.85   7.37   3.49   7.16  18.44
DEF, tNW             0.45   1.57   1.00   0.63   1.21   2.75
Δh, Slope            3.22   3.61  14.92  14.48  13.43  14.90
Δh, tNW              1.11   0.93   2.95   2.70   2.06   2.14
Adjusted R²          0.01   0.03   0.16   0.39   0.52   0.63
Implied R²           0.10   0.13   0.15   0.24   0.35   0.37

Panel G: Univariate regressions with Δhm
Slope                1.02   1.92   3.80   2.28   0.63   0.98
tNW                  2.35   2.51   2.65   1.04   0.26   0.34
Adjusted R²          0.02   0.04   0.07   0.01   0.00   0.00
Implied R²           0.03   0.03   0.04   0.03   0.02   0.02

Panel H: Bivariate regressions with Δhm and CAY
CAY, Slope           1.25   2.35   4.14   7.33  10.33  12.40
CAY, tNW             3.86   3.77   3.74   4.45   4.98   5.64
Δhm, Slope           0.87   1.63   3.29   1.39   0.63   0.51
Δhm, tNW             2.00   2.11   2.38   0.69   0.27   0.18
Adjusted R²          0.06   0.11   0.20   0.22   0.29   0.34
Implied R²           0.08   0.14   0.20   0.22   0.21   0.19

Panel I: Multiple regressions with Δhm and all five risk premium proxies
DP, Slope            0.02   0.06   0.12   0.20   0.23   0.27
DP, tNW              1.56   1.85   2.17   2.23   2.66   3.18
CAY, Slope           1.01   1.86   2.98   5.62   8.29  10.25
CAY, tNW             2.67   2.70   2.65   2.94   3.37   4.09
TB, Slope            0.58   0.97   0.79   3.04   4.48   5.58
TB, tNW              0.59   0.62   0.36   1.12   1.51   1.71
TRM, Slope           0.47   0.95   2.78   4.09   5.73   7.32
TRM, tNW             0.63   0.79   1.64   1.58   2.19   2.89
DEF, Slope           1.07   2.60   8.85  13.29  11.63   8.55
DEF, tNW             0.63   0.93   2.20   2.29   1.89   1.18
Δhm, Slope           0.71   1.38   4.80   5.21   4.05   3.88
Δhm, tNW             1.18   1.31   3.08   2.53   2.02   1.84
Adjusted R²          0.06   0.13   0.27   0.34   0.44   0.52
Implied R²           0.15   0.24   0.33   0.40   0.43   0.42
First, high aggregate risk premiums forecast low payroll growth within two quarters but high payroll growth in subsequent horizons. Second, high aggregate risk premiums forecast high net hiring rates for the overall economy from one to 16 quarters ahead. The evidence suggests that time-to-build, but not time-to-plan, is at work in the aggregate employment and hiring data. However, we also find that high aggregate risk premiums forecast low net job creation rates in manufacturing at the one-quarter horizon but high net job creation rates at the four- and eight-quarter horizons. The evidence suggests a two-quarter time-to-plan in the manufacturing sector. Finally, we find that lagged payroll growth and the lagged net job creation rate in manufacturing predict market excess returns at business cycle frequencies, but that the net hiring rate for the overall economy does not.

Our empirical analysis has implications for the existing labor economics literature. Most of the labor studies that build on the adjustment costs formulation of labor demand (e.g., Hamermesh, 1996) or on the search and matching framework of Pissarides (1985, 2000) and Mortensen and Pissarides (1994) assume constant discount rates over the business cycle. However, constant risk premiums cannot forecast future employment growth.
L. Chen, L. Zhang / Journal of Financial Economics 99 (2011) 385–399
Because of their log utility assumption, the general equilibrium models of Merz (1995), Andolfatto (1996), and Gertler and Trigari (2009) are likely to imply low and largely time-invariant risk premiums. As such, their models cannot explain our evidence on the linkages between time-varying risk premiums and labor market performance either. In all, our empirical analysis calls for a deep integration between labor economics and asset pricing.

References
Andolfatto, D., 1996. Business cycles and labor-market search. American Economic Review 86, 112–132.
Barro, R.J., 1990. The stock market and investment. Review of Financial Studies 3, 115–131.
Bazdresch, S., Belo, F., Lin, X., 2009. Labor hiring, investment, and stock return predictability in the cross section. Unpublished working paper, University of Minnesota, Minneapolis, MN.
Bleakley, H., Ferris, A.E., Fuhrer, J.C., 1999. New data on worker flows during business cycles. New England Economic Review (July/August), 49–76.
Boyd, J.H., Hu, J., Jagannathan, R., 2005. The stock market’s reaction to unemployment news: Why bad news is usually good for stocks. Journal of Finance 60, 649–672.
Campbell, J.Y., Shiller, R., 1988. The dividend–price ratio and expectations of future dividends and discount factors. Review of Financial Studies 1, 195–228.
Chen, N., 1991. Financial investment opportunities and the macroeconomy. Journal of Finance 46, 529–554.
Cochrane, J.H., 1991. Production-based asset pricing and the link between stock returns and economic fluctuations. Journal of Finance 46, 209–237.
Davis, S.J., Faberman, R.J., Haltiwanger, J., 2006. The flow approach to labor markets: new data sources and micro–macro links. Journal of Economic Perspectives 20, 3–26.
Estrella, A., 2005. Why does the yield curve predict output and inflation? Economic Journal 115, 722–744.
Estrella, A., Hardouvelis, G.A., 1991. The term structure as a predictor of real economic activity. Journal of Finance 46, 555–576.
Fama, E.F., 1981. Stock returns, real activity, inflation, and money. American Economic Review 71, 545–565.
Fama, E.F., French, K.R., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25.
Fama, E.F., French, K.R., 1989. Business conditions and expected returns on stocks and bonds. Journal of Financial Economics 25, 23–49.
Fama, E.F., Schwert, G.W., 1977. Asset returns and inflation. Journal of Financial Economics 5, 115–146.
Gertler, M., Trigari, A., 2009. Unemployment fluctuations with staggered Nash wage bargaining. Journal of Political Economy 117, 38–86.
Hamermesh, D.S., 1996. Labor Demand. Princeton University Press, Princeton, NJ.
Harvey, C.R., 1988. The real term structure and consumption growth. Journal of Financial Economics 22, 305–333.
Hodrick, R.J., 1992. Dividend yields and expected stock returns: alternative procedures for inference and measurement. Review of Financial Studies 5, 357–386.
Keim, D.B., Stambaugh, R.F., 1986. Predicting returns in the stock and bond markets. Journal of Financial Economics 17, 357–390.
Lamont, O., 2000. Investment plans and stock returns. Journal of Finance 55, 2719–2745.
Lettau, M., Ludvigson, S., 2001. Consumption, aggregate wealth, and expected stock returns. Journal of Finance 56, 815–849.
Lettau, M., Ludvigson, S., 2002. Time-varying risk premia and the cost of capital: an alternative implication of the Q theory of investment. Journal of Monetary Economics 49, 31–66.
Liu, L.X., Whited, T.M., Zhang, L., 2009. Investment-based expected stock returns. Journal of Political Economy 117, 1105–1139.
Merz, M., 1995. Search in the labor market and the real business cycle. Journal of Monetary Economics 36, 269–300.
Merz, M., Yashiv, E., 2007. Labor and the market value of the firm. American Economic Review 97, 1419–1431.
Mortensen, D.T., Pissarides, C.A., 1994. Job creation and job destruction in the theory of unemployment. Review of Economic Studies 61, 397–415.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Pissarides, C.A., 1985. Short-run dynamics of unemployment, vacancies, and real wages. American Economic Review 75, 676–690.
Pissarides, C.A., 2000. Equilibrium Unemployment Theory, second ed. MIT Press, Cambridge, MA.
Stock, J.H., Watson, M.W., 1989. New indexes of coincident and leading economic indicators. In: Blanchard, O.J., Fischer, S. (Eds.), NBER Macroeconomics Annual. MIT Press, Cambridge, MA, pp. 351–394.
Stock, J.H., Watson, M.W., 1999. Business cycle fluctuations in US macroeconomic time series. In: Taylor, J.B., Woodford, M. (Eds.), Handbook of Macroeconomics. Elsevier B.V., pp. 3–64.
Yashiv, E., 2000. The determinants of equilibrium unemployment. American Economic Review 90, 1297–1322.
Journal of Financial Economics 99 (2011) 400–426
Journal of Financial Economics journal homepage: www.elsevier.com/locate/jfec
General equilibrium pricing of options with habit formation and event risks*
Du Du
Hong Kong University of Science and Technology, Hong Kong
Article info
Article history: Received 18 June 2008; received in revised form 26 October 2009; accepted 23 November 2009; available online 7 September 2010
JEL classification: G01; G12; G13
Keywords: Habit formation; Economic disasters; Jump-risk premium; Volatility smirk

Abstract
This paper proposes a general equilibrium model that explains the pricing of the S&P 500 index options. The central ingredients are a peso component in the consumption growth rate and the time-varying risk aversion induced by habit formation, which amplifies consumption shocks. The amplifying effect generates excess volatility and a large jump-risk premium, which combine to produce a pronounced volatility smirk for index options. The time-varying volatility and jump-risk premiums explain the observed state-dependent smirk patterns. Besides volatility smirks, the model has a variety of other implications which are broadly consistent with the aggregate stock and option market data. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
It is well known that the Black and Scholes (1973, B/S) model cannot explain the observed index option data after the 1987 market crash. The biggest puzzle is the so-called volatility smirk. Options, including ATM options,
* This paper is based on my dissertation at the University of Chicago. I am very grateful to my committee members, Lars Hansen, Susanne Schennach, and especially Pietro Veronesi, for their encouragement and insightful comments. I am also grateful to an anonymous referee whose constructive suggestions have greatly improved the paper, to Hui Chen, John Cochrane, George Constantinides, Sudipto Dasgupta, Christian B. Hansen, John Heaton, Kenneth Judd, Robert Lucas, Monika Piazzesi, and seminar participants at McGill University, Hong Kong University of Science and Technology, Nanyang Technological University, National University of Singapore, Singapore Management University, University of Chicago, and University of Hong Kong for many helpful comments. Financial support from the Hong Kong RGC Research Grant (HKUST 641208) is acknowledged. Tel.: +852 2358 5049; fax: +852 2358 1749.
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.09.001
are typically priced at a premium; and the premium is higher for OTM (out-of-the-money) put options than for ATM (at-the-money) options, generating a smirk pattern in the cross-sectional plot of the implied Black-Scholes volatility (B/S-vol) against the options’ moneyness.1 In addition, the smirk patterns tend to vary over changing economic conditions. None of the above empirical observations can be explained by the traditional Black-Scholes model in which the option-implied volatilities are equal to the return volatility of the underlying index which is assumed to be a constant. This paper proposes a consumption-based explanation of both the average and the state dependences of the smirk patterns. I use a representative agent model with
1 Moneyness is defined as the ratio of the strike price of an option contract to the spot price of the underlying asset on which the option is written. Therefore, an ATM option would have the moneyness of one. By convention, OTM and ITM (in-the-money) puts refer to put options with moneyness less than and greater than one, respectively.
D. Du / Journal of Financial Economics 99 (2011) 400–426
[Figure 1, image not reproduced. Three panels: "Data"; "Habit formation models" (MSV and CC); "Naik and Lee (1990) with Barro (2006) jump calibration". Vertical axis: Black–Scholes implied volatility (%). Horizontal axis: degree of moneyness, from 0.9 to 1.]
Fig. 1. Option pricing implied from the data and the previous models. This figure plots the implied volatility smirks for options with 30 days to expiration. The top panel plots the observed volatility smirk for the S&P 500 index option data averaged over the period from April 4, 1988 to September 30, 2008. The middle panel plots the volatility smirks implied from Campbell and Cochrane (1999, CC) and Menzly, Santos, and Veronesi (2004, MSV) under their original calibrations. The bottom panel plots the volatility smirk implied from Naik and Lee (1990) with Barro’s (2006) choice of jump parameters.
non-time-separable preferences and time-varying risk aversion induced by habit formation. Aggregate consumption is exogenous, and its instantaneous growth rate follows an i.i.d. lognormal process subject to a small-probability jump. The jump component models rare economic disasters, which strike at a constant intensity. A closed-form valuation is derived for the aggregate stock as the claim to aggregate consumption, and index options are proxied by options written on the aggregate stock. Within the representative agent framework, the model nests as special cases both habit formation models (e.g., Campbell and Cochrane, 1999, CC; the aggregate model of Menzly, Santos, and Veronesi, 2004, MSV) and “peso problem”2 models (e.g., Naik and Lee, 1990; Barro, 2006).
The main mechanism of the model is as follows. In the presence of habit formation, risk aversion reacts negatively to changes in aggregate consumption, creating an extra channel by which consumption innovations induce excess innovations in the stock market beyond
2 The term “peso problem” is attributed to Milton Friedman’s comments about the effects of infrequent but disastrous events on the Mexican peso market in the early 1970s.
and above those due to cash-flow innovations. Excess stock return innovations take two forms: excess diffusive volatility and excess jumps. First, excess volatility generates high ATM prices, which determine the level of the observed volatility smirk. Second, positive risk-aversion jumps induced by negative consumption jumps raise the marginal utility, which makes stock market crashes particularly unpleasant. This effect, combined with the amplified return jumps, generates a large jump-risk premium. Third, options with varying moneyness differ in their sensitivity to potential jumps. In particular, deep OTM puts are much more sensitive to market crashes than ATMs and hence bear higher jump-risk premiums. Taken together, excess volatility and a large compensation for jump risks combine to generate the pronounced smirk pattern observed in the data, which is plotted in the top panel of Fig. 1, where I use the data of S&P 500 index options with 30 days to expiration for the period from April 1988 to September 2008.
The model also predicts that the smirk premium, measured as the price (quoted in B/S-vol) difference between 10% OTM puts and ATMs, is decreasing in the underlying volatility. Except for very bad states which are unlikely to occur, both volatility and jumps in my model
are negatively correlated with the surplus ratio, S, which serves as the state variable in habit formation models and rises as the economy improves. An increase in S has two offsetting effects on the implied smirk premium. First, it decreases jumps, which lowers the premiums assigned to options for a given volatility. Second, it decreases volatility, which increases the relative importance of jumps in total stock price variation. As a result, the agent has stronger incentives to buy insurance from OTM puts against potential market crashes, which drives up the smirk premium for given jumps. The second effect dominates under my calibration, which leads to a rising smirk premium as volatility decreases. The data largely support this prediction.
In calibrating the model, I first drop the jump component and match the resulting consumption process to the aggregate consumption data over a period with no documented economic disasters. The preference and jump parameters are then calibrated to the financial markets. In the presence of habit formation, the implied jumps strike once every 45 years and cause a 15.8% drop in aggregate consumption. The 15.8% consumption jumps, which still look very large compared to the maximum 9.9% annual consumption contraction during the Great Depression, are much more in line with the historical observations than those implied from peso problem models with standard CRRA preferences (e.g., Barro, 2006), in which highly severe jumps in consumption or GDP (37% used by Barro) are required to rationalize the observed equity premium. Quoted in B/S-vol, the calibrated model yields an average 17.0% ATM price and an average 10.4% smirk premium, measured by the price difference between 10% OTM puts and ATMs. Both numbers are close to their data counterparts of 16.4% and 9.0%, respectively, as plotted in the top panel of Fig. 1.
The calibrated model also fits various other pricing features of both options and stocks besides the volatility smirk. It matches the aggregate stock market behavior, including the low interest rate, the high equity premium, and the price–dividend ratio. Conditional on the medium economic state, it replicates the decreasing term structure for deep OTM puts and the increasing term structure for ATMs. When the model is fed with the historical data of either the aggregate consumption or the model-free realized volatility, the implied price–dividend ratios account for much of the fluctuation in historical stock prices. I further show that the model’s main implications are robust to (i) alternative habit specifications and (ii) randomness in the consumption jump size.
In its focus on explaining the option volatility smirk from an equilibrium setting, this paper closely relates to Liu, Pan, and Wang (2005, LPW), and Benzoni, Collin-Dufresne, and Goldstein (2007, BCDG). Starting from Naik and Lee (1990), LPW add an additional layer of uncertainty aversion to rare events, which gives rise to the rare-event premium, and show that a significant portion of the pronounced smirk pattern is attributable to the varying degrees of rare-event premiums implicit in options. BCDG expand on Bansal and Yaron’s (2004) insight by considering the Epstein and Zin (1989) preference, which separates the elasticity of intertemporal substitution
(EIS) from risk aversion, and by specifying the expected consumption growth rate to be driven by a persistent component, x, which is subject to jumps. When the EIS is larger than one, a downward jump in x induces both an upward jump in the marginal utility and a stock market crash, hence the premium paid for OTM puts, which deliver insurance against jumps. In both BCDG and my model, the diffusive and the jump components of the consumption process are essentially priced in the same way, and a pronounced smirk pattern is generated through a large compensation for jumps, which is due to habit formation in my model and to the persistence of x in BCDG. This is in contrast to LPW, who emphasize the importance of pricing jump risks and diffusive risks differently by adding an extra layer of uncertainty toward jumps but not toward diffusions. The reason is that the Naik-Lee model, with the standard CRRA preference, is unable to generate a sizable jump-risk premium after being calibrated conditional on the “excess volatility puzzle” (see Section 6.1 for a more detailed discussion).
Unlike my model, neither LPW nor BCDG can explain variations of smirk premiums across economic states. This is natural for the LPW model, which does not have a state variable. While x serves as the state in BCDG, they report that its variations lead to virtually no changes in option prices. My paper thus provides the first model that accounts for the state-dependent smirk patterns from an equilibrium setting.
My model for habit formation builds on the work of CC and MSV, which are well known for explaining a wide variety of stock pricing phenomena. More recently, researchers have successfully applied their models to explain the pricing of various other assets as well (e.g., Wachter, 2006, and Buraschi and Jiltsov, 2007 for default-free bonds; Chen, Collin-Dufresne, and Goldstein, 2008 for defaultable bonds). To check their implications for derivatives, I plot in the middle panel of Fig. 
1 the volatility smirks of options with 30 days to expiration implied from CC and MSV under their original calibrations. Unlike what is implied from the data, the ‘‘smirk’’ implied from CC is both inverted and severely underpriced.3 MSV generate a ‘‘normal’’ smirk pattern, but the implied smirk premium is less than 3% which is clearly off compared to its data counterpart. I also tried different calibrations for both models but still failed to match the observed smirk pattern. In addition, as I search parameter values for a better match of option prices, other important properties of these models, such as the replication of the average equity premium, decrease. These results prompt the motivation of resorting to a peso explanation for options. My model also builds on the work of Rietz (1988) and Barro (2006) who advocate the peso explanation for the aggregate stock market behavior. Using the discrete-time analog of the Naik-Lee model with constant jump sizes,
3 The implied volatilities from CC are cut for small moneyness because they do not exist for moneyness less than 0.97. As pointed out by Wachter (2006), the unconditional stock return volatility implied from CC is actually around 8% which is only about half of what CC report. My numerical results are very close to Wachter’s result.
Barro (2006) recently shows that rare but very severe disasters in the form of 30–40% consumption or GDP jumps, which are calibrated to the sharp economic contractions in various countries during the twentieth century, can rationalize the high U.S. equity premium. However, his calibration seems too severe for the U.S. data, which is also inconsistent with option prices. To see it, I plot in the bottom panel of Fig. 1 the implied volatility smirk from Naik and Lee (1990) under Barro’s jump calibrations. The generated 22% smirk premium is apparently too high compared to the 9% premium in the data, which is partly due to the severely underpriced ATMs. Introducing habit formation is thus essential for a peso problem model to match the observed smirk pattern with more reasonable jump calibrations. Finally, my paper relates to the literature of using reduced-form models to explain index option prices, and examples include Heston (1993), Bakshi, Cao, and Chen (1997), Bates (2000), Duffie, Pan, and Singleton (2000), Pan (2002), and Eraker (2004), etc. While these models tend to match well the observed option pricing dynamics, their successes only raise the question of what economic mechanism is at work. For example, Pan (2002) points out the importance of a time-varying jump-risk premium for reconciling the dynamics implied from both the time-series and the cross-sectional options data. In her model, time variations in such a premium are exogenously imposed by assuming a jump intensity proportional to the diffusive volatility. In contrast, the comovements of jump intensity (under the risk-neutral measure) and volatility are endogenous in my model which provides an economic story for the time-varying jump-risk premium: the time-varying risk aversions driven by consumption shocks give rise to variable compensations for bearing event risks. The remainder of the paper is organized as follows. Section 2 presents the model setup and derives the implied equilibrium asset prices. 
Section 3 examines the mechanism of the model in detail. Section 4 discusses the identifiability of the model, the calibration methodology, and the calibration results. In Sections 5 and 6, I present the model’s fit to the data and check the robustness of the main results. I also discuss in Section 6 whether different pricing of diffusive and jump risks is crucial for option pricing. Section 7 concludes the paper. Technical details are provided in Appendix A.

2. The model and the equilibrium prices

2.1. The setup
Time is continuous and infinite, and the uncertainty is represented by a complete probability space $(\Omega, \mathcal{F}, P)$ and an information filtration $(\mathcal{F}_t)_{t \ge 0}$, where $\mathcal{F}_t$ denotes the information set observed up to period t. A representative agent in the economy maximizes

$$E\left[\int_0^\infty e^{-\rho t}\, u(C_t, H_t)\, dt\right] = E\left[\int_0^\infty e^{-\rho t} \ln(C_t - H_t)\, dt\right], \qquad (1)$$

where $C_t$ denotes the aggregate consumption, $H_t$ denotes the habit level, and $\rho$ denotes the subjective time-discount rate. As in both CC and MSV, I adopt the external habit formation specification in that the agent’s habit level is determined by the aggregate consumption rather than by his own consumption in the past. Since habit is external, the local curvature of the utility function, $\gamma_t$, which measures the agent’s instantaneous degree of risk aversion, is given by

$$\gamma_t \equiv -\frac{C_t\, u_{cc}(C_t, H_t)}{u_c(C_t, H_t)} = \frac{C_t}{C_t - H_t} = \frac{1}{S_t}, \qquad (2)$$

where $S_t \equiv (C_t - H_t)/C_t$ denotes the surplus consumption ratio. $S_t$ is procyclical in that a low S implies a bad economic state, under which $C_t$ is close to $H_t$. As a result, $\gamma_t$ is countercyclical, implying that the agent is more risk averse during bad states. In particular, $\gamma$ tends toward infinity as S goes to zero. Due to their monotone relation, both $\gamma_t$ and $S_t$ serve as the state variable in the model, whose variations naturally translate into the corresponding variations in the prices and returns of the financial assets.

Following MSV, I impose the stochastic structure on $\gamma_t$, which implicitly characterizes the dynamics of habit persistence. Analogously to both CC and MSV, I assume that $\gamma_t$ follows a mean-reverting process, perfectly negatively correlated with innovations in log consumption, i.e.,

$$d\gamma_t = \kappa(\bar{\gamma} - \gamma_t)\, dt - \alpha(\gamma_t - \beta)\,(dc_t - E_t[dc_t]), \qquad (3)$$

where $\bar{\gamma}$ is the long-run average of the agent’s risk aversion; $\kappa$ controls the speed of mean reversion; $c_t$ denotes the log consumption; $\alpha > 0$ captures the sensitivity of $\gamma_t$ to consumption innovations; $\beta \ge 1$ sets the lower bound for $\gamma_t$, and hence, the upper bound for $S_t$. I deviate from MSV by assuming that log consumption evolves as a random walk subject to a small-probability negative jump, i.e.,

$$dc_t \equiv d\log(C_t) = \mu_1\, dt + \sigma\, dB_t + b\, dN_t, \qquad (4)$$

where $B_t$ is a standard Brownian motion; $N_t$ is a Poisson process capturing the random arrival of a small-probability economic disaster with the constant intensity $\lambda$; $b < 0$ denotes the jump size of $c_t$: upon the occurrence of the i-th disaster at $t_i$, log consumption jumps from $c(t_i)$ to $c(t_i) + b$. Following Naik and Lee (1990), I assume b is normally distributed with mean $\mu_b$ ($< 0$) and variance $\sigma_b^2$. By Ito’s lemma with jumps (e.g., Duffie, 2001, Appendix F),

$$\frac{dC_t}{C_t} = \mu\, dt + \sigma\, dB_t + J_C\, dN_t, \qquad (5)$$

where $J_C \equiv e^{b} - 1$ denotes the consumption jump size and $\mu \equiv \mu_1 + \tfrac{1}{2}\sigma^2$. Following numerous previous papers, I treat the aggregate dividend (cash flow) and the aggregate consumption as a single process. In exercises that are not reported, I find that the model implications are insensitive to alternatively assumed dividend processes. By substituting out the consumption innovation, the $\gamma_t$ process can be rewritten as

$$\frac{d\gamma_t}{\gamma_t} = \mu_{\gamma t}\, dt + \sigma_{\gamma t}\, dB_t + J_{\gamma t}\, dN_t, \qquad (6)$$

where

$$\begin{cases}
\mu_{\gamma t} = \kappa\,\dfrac{\bar{\gamma} - \gamma_t}{\gamma_t} + \alpha\,\dfrac{\gamma_t - \beta}{\gamma_t}\,\mu_b \lambda, \\[2mm]
\sigma_{\gamma t} = -\alpha\,\dfrac{\gamma_t - \beta}{\gamma_t}\,\sigma, \\[2mm]
J_{\gamma t} \equiv \dfrac{\gamma_t^{+}}{\gamma_t} - 1 = -\alpha\,\dfrac{\gamma_t - \beta}{\gamma_t}\, b > 0,
\end{cases} \qquad (7)$$
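To make the dynamics in Eqs. (3), (4), and (7) concrete, here is a minimal Euler-discretization sketch of the risk-aversion process with a constant jump size, so that $E[b] = b$. It is an illustration and not the paper's simulation code. The function names and every parameter value in the usage note are hypothetical, except the disaster frequency (once every 45 years) and the 15.8% consumption drop quoted in the introduction. The floor at $\beta$ is a numerical guard added for this example.

```python
import numpy as np

def gamma_jump(gamma, alpha, beta, b):
    """Proportional jump in risk aversion, J_gamma in Eq. (7): -alpha*(gamma-beta)/gamma*b."""
    return -alpha * (gamma - beta) / gamma * b

def simulate_gamma(gamma0, kappa, gamma_bar, alpha, beta, sigma, lam, b,
                   dt=1.0 / 252, n_steps=2520, seed=0):
    """Euler scheme for d(gamma) in Eq. (3), with constant log-consumption jump size b."""
    rng = np.random.default_rng(seed)
    gamma = np.empty(n_steps + 1)
    gamma[0] = gamma0
    for i in range(n_steps):
        g = gamma[i]
        dB = rng.normal(0.0, np.sqrt(dt))          # Brownian increment
        dN = rng.random() < lam * dt               # at most one jump per step (approximation)
        # unexpected log-consumption growth, dc_t - E_t[dc_t]
        shock = sigma * dB + b * dN - lam * b * dt
        g_next = g + kappa * (gamma_bar - g) * dt - alpha * (g - beta) * shock
        gamma[i + 1] = max(g_next, beta)           # guard against discretization overshoot
    return gamma
```

A possible call, with hypothetical preference parameters, is `simulate_gamma(30.0, kappa=0.16, gamma_bar=34.0, alpha=5.0, beta=20.0, sigma=0.01, lam=1/45, b=np.log(1 - 0.158))`. A negative consumption shock, whether diffusive or a jump, raises the simulated risk aversion, as the sign discussion following Eq. (7) describes.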
Here $\gamma_t^{+}$ denotes the value of $\gamma$ an instant after the occurrence of a jump. Since $\gamma_t > \beta$, $\mathrm{sign}(\sigma_{\gamma t}) = -\mathrm{sign}(\sigma)$ and $\mathrm{sign}(J_{\gamma t}) = -\mathrm{sign}(e^{b} - 1)$, implying that a negative consumption innovation, whether driven by diffusions or by jumps, leads to a positive innovation in the agent’s degree of risk aversion, or equivalently, a negative innovation in the surplus ratio S. In addition, $|\sigma_{\gamma t}|$ and $|J_{\gamma t}|$ are both increasing in $\gamma_t$, implying that the agent’s risk aversion is more volatile when he becomes more risk averse.

2.2. Equilibrium prices of stocks and options

With external habit formation, the pricing kernel $\Lambda_t$ in the economy is given by

$$\Lambda_t = e^{-\rho t}\, u_c(C_t, H_t) = e^{-\rho t}\, \frac{\gamma_t}{C_t}. \qquad (8)$$

Eq. (8) implies that the pricing kernel is determined both by the aggregate consumption C, which represents the economic fundamentals, and by the representative agent’s risk aversion $\gamma$, which represents the market sentiment. As widely acknowledged, both economic fundamentals and market sentiment are important for asset prices. By Ito’s lemma with jumps,

$$\frac{d\Lambda_t}{\Lambda_t} = \mu_{\Lambda t}\, dt + \sigma_{\Lambda t}\, dB_t + J_{\Lambda t}\, dN_t, \qquad (9)$$

where

$$\begin{cases}
\mu_{\Lambda t} = -\rho - \mu + \sigma^2 + \mu_{\gamma t} - \sigma \sigma_{\gamma t}, \\[1mm]
\sigma_{\Lambda t} = \sigma_{\gamma t} - \sigma, \\[1mm]
J_{\Lambda t} \equiv \dfrac{\Lambda_t^{+}}{\Lambda_t} - 1 = e^{-b}(J_{\gamma t} + 1) - 1 > 0.
\end{cases} \qquad (10)$$

$\sigma_{\Lambda t}$ and $J_{\Lambda t}$ are sometimes referred to as the price of the diffusive risk and the price of the jump risk, respectively (e.g., Dai and Singleton, 2003). From (10), a negative consumption jump leads to an amplified positive jump in the pricing kernel through the induced positive jump in the agent’s risk aversion $\gamma$, which characterizes the agent’s overreaction to jumps in economic fundamentals. Given b, $J_{\gamma t}$, and hence $J_{\Lambda t}$, is larger for higher $\gamma$. Intuitively, a given negative consumption jump causes more pain during bad states than during good states, which is reflected in the different magnitudes of jumps in the agent’s marginal utility.

Denote by $P_t$ the equilibrium price of the aggregate stock at period t, which is by definition

$$P_t = E_t\left[\int_t^\infty \frac{\Lambda_s}{\Lambda_t}\, C_s\, ds\right]. \qquad (11)$$

The following proposition gives the closed form for $P_t$.

Proposition 1. Under the assumed consumption (= dividend) process of (5) and the habit formation dynamics characterized by (6)–(7), the price–consumption ratio is given by

$$\frac{P_t}{C_t} = a_1 + a_2\,\frac{1}{\gamma_t} = a_1 + a_2 S_t \equiv X_t, \quad \text{with } a_1 = \frac{1}{\rho + \kappa} > 0, \;\; a_2 = \frac{\kappa \bar{\gamma}}{\rho(\rho + \kappa)} > 0. \qquad (12)$$

Proof. See the Appendix.

Proposition 1 extends Proposition 1 in MSV to the case where the consumption process is subject to a small-probability jump. Whereas the price–consumption ratio is a constant in models with the standard CRRA preference (e.g., Naik and Lee, 1990), it is procyclical in habit formation models. Intuitively, a bad economic state or a low S is associated with a high risk aversion, which depresses the stock price relative to its dividend.

At period t the equilibrium price of a put option written on the aggregate stock is by definition

$$O_t = E_t\left[\frac{\Lambda_{t+\tau}}{\Lambda_t}\, \max(K - P_{t+\tau}, 0)\right], \qquad (13)$$

where K denotes the strike price and $\tau$ denotes the time to maturity. Unlike the stock price, option prices cannot be derived in closed form. I therefore simulate a large number of $O_t$-realizations and use their average as the option price implied from the model. Given $O_t$, the implied Black-Scholes volatility (B/S-vol) is computed as

$$\text{B/S-vol}_t = BSC^{-1}(\tau, K, O_t, r_{t,t+\tau}, CP_{t,t+\tau}), \qquad (14)$$

where $BSC^{-1}$ is the inverse of the Black-Scholes formula for the put option, inverted over the argument $\sigma$; $r_{t,t+\tau}$ and $CP_{t,t+\tau}$ are the interest rate and the dividend–price ratio over the period $[t, t+\tau]$. Following convention, I quote option prices in terms of B/S-vol in the following analysis. To use a common metric for comparing option prices, I fix $r_{t,t+\tau}$ and $CP_{t,t+\tau}$ at 5% and 3% throughout the paper.4

3. The mechanism of the model

This section examines why and how the model works. Section 3.1 studies the implied stock return dynamics under both the actual measure and the risk-neutral measure. Section 3.2 examines the effects of consumption jumps on the implied option prices. Next, I discuss in Section 3.3 the main mechanism of the model for option pricing: a large jump-risk premium driven by jumps in the marginal utility and the excess stock price jumps. I comment in Section 3.3 that the endogenized leverage effect also contributes to a pronounced smirk pattern. Finally, Section 3.4 discusses the state dependences of the implied smirk patterns. To focus on the central intuitions, I consider a simplified version of the model, referred to as the base case, in which the consumption jump size is a constant, i.e., $\sigma_b = 0$ and $b = \mu_b$. Unless
4 In the model, both r and CP are time-varying. Since my analysis focuses on comparing the option prices implied from the model vs. from the data, the results are unaffected so long as the common r and CP are used.
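The inversion in Eq. (14) can be illustrated with a small sketch that prices a European put by the Black-Scholes formula and recovers the volatility by bisection. This is an illustration and not the paper's code. Treating the fixed dividend–price ratio as a continuously compounded dividend yield `q` is an assumption of this example, as are the function names, the bracketing interval, and the tolerance. The defaults `r=0.05` and `q=0.03` mirror the common values of 5% and 3% stated in the text.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(spot, strike, tau, r, q, vol):
    """Black-Scholes price of a European put with continuous dividend yield q."""
    d1 = (log(spot / strike) + (r - q + 0.5 * vol ** 2) * tau) / (vol * sqrt(tau))
    d2 = d1 - vol * sqrt(tau)
    return strike * exp(-r * tau) * norm_cdf(-d2) - spot * exp(-q * tau) * norm_cdf(-d1)

def implied_vol_put(price, spot, strike, tau, r=0.05, q=0.03,
                    lo=1e-4, hi=5.0, tol=1e-8):
    """Invert bs_put over vol by bisection; the put price is increasing in vol."""
    if not bs_put(spot, strike, tau, r, q, lo) <= price <= bs_put(spot, strike, tau, r, q, hi):
        raise ValueError("price outside the range attainable on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_put(spot, strike, tau, r, q, mid) < price:
            lo = mid      # model price too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In use, a simulated average of the put payoffs in Eq. (13) would be passed as `price`, with `strike / spot` giving the moneyness at which the B/S-vol is quoted.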
indicated otherwise, the following discussions are based on the base-case calibration presented in Table 2.

3.1. Stock return dynamics

Under the actual and the risk-neutral measure, the stock return evolves according to

$$\frac{dP_t}{P_t} = (rf_t + EP_t - CP_t)\, dt + \sigma_{Rt}\, dB_t + J_{Pt}\, dN_t - \lambda J_{Pt}\, dt, \qquad (15)$$

and

$$\frac{dP_t}{P_t} = (rf_t - CP_t)\, dt + \sigma_{Rt}\, dB_t^{Q} + J_{Pt}\, dN_t - \lambda_t^{Q} J_{Pt}\, dt, \qquad (16)$$

respectively, where $dB_t^{Q}$ denotes the standard Brownian motion under the risk-neutral measure; $dN_t$ denotes the Poisson process satisfying $E_t(dN_t) = \lambda\, dt$ and $E_t^{Q}(dN_t) = \lambda_t^{Q}\, dt$, where $E_t^{Q}(\cdot)$ denotes the expectation under the risk-neutral measure; $CP_t$, $rf_t$, $EP_t$, $\sigma_{Rt}$, $J_{Pt}$, and $\lambda_t^{Q}$ denote, respectively, the dividend–price ratio, the risk-free rate, the equity premium, the diffusive return volatility, the stock price jump size, and the jump arrival intensity under the risk-neutral measure, which are all functions of the state S. The model thus captures two important features of the stock return dynamics: stochastic volatility and price jumps. Since the consumption jump size is a constant under the base case, $J_{Pt} = E_t(J_{Pt}) = E_t^{Q}(J_{Pt})$, and the return process varies between the two measures in terms of the average stock return and the jump intensity. The following proposition summarizes the closed forms for variables related to the above two processes.

Proposition 2. In the benchmark case with a constant consumption jump size, the consumption–price ratio $CP_t$, the risk-free rate $rf_t$, the equity premium $EP_t$, the diffusive return volatility $\sigma_{Rt}$, the stock price jump size $J_{Pt}$, and the jump intensity under the risk-neutral measure $\lambda_t^{Q}$ are given by

$$CP_t = \left(a_1 + a_2\,\frac{1}{\gamma_t}\right)^{-1}, \qquad (17)$$

$$rf_t = -\mu_{\Lambda t} - \lambda J_{\Lambda t}, \qquad (18)$$

$$EP_t = -\sigma_{\Lambda t}\,\sigma_{Rt} - \lambda J_{\Lambda t} J_{Pt}, \qquad (19)$$

$$\sigma_{Rt} = \sigma - a_2\,\frac{S_t}{X_t}\,\sigma_{\gamma t}, \qquad (20)$$

$$J_{Pt} \equiv \frac{P_t^{+}}{P_t} - 1 = e^{b}\left[1 + a_2\,\frac{S_t}{X_t}\left(\frac{1}{1 + J_{\gamma t}} - 1\right)\right] - 1, \qquad (21)$$

$$\lambda_t^{Q} = \lambda\,(J_{\Lambda t} + 1), \qquad (22)$$
where $S_t$ and $\gamma_t$ denote the surplus ratio and the risk aversion, respectively; $\mu_{\gamma t}$, $\sigma_{\gamma t}$, $J_{\gamma t}$, $\mu_{\Lambda t}$, $\sigma_{\Lambda t}$, $J_{\Lambda t}$, $X_t$, and $a_2$ are given by (7), (10), and (12).

Proof. See the Appendix.

Since $J_{\Lambda t} > 0$ and $J_{Pt} < 0$, Eqs. (18) and (19) imply that a peso component decreases the (real) risk-free rate and increases the equity premium. These results are well established in the peso problem literature (e.g., Rietz, 1988; Barro, 2006), and the intuition is that precautionary saving prompts the risk-averse agent to shift his investment from the risky stock to the risk-free asset in the presence of jump risk. The top left panel of Fig. 2 plots $CP_t$, $rf_t$, and $EP_t$ as functions of the surplus ratio S, where we see that the implied $CP_t$ and $rf_t$ are both countercyclical (since S is procyclical). While the equity premium is also countercyclical for large S, it becomes procyclical for low values of S. As explained by MSV, the volatility of S must vanish as $S \to 0$ in order to prevent negative marginal utility. This results in a decrease in the diffusive return volatility, and hence, in a decrease in the equity premium.

Unlike Naik and Lee (1990), in which the diffusive return volatility $\sigma_R$ equals the diffusive consumption volatility $\sigma$, habit persistence adds to $\sigma_R$ an extra term, $-a_2 (S_t/X_t)\,\sigma_{\gamma t}$, which is positive since $a_2 > 0$ and $\sigma_{\gamma t} < 0$. Intuitively, a negative consumption diffusion translates into a negative diffusion in the stock price through two channels in the presence of habit formation: (i) the usual channel of the negative innovation in the cash flow; and (ii) the extra channel of the positive innovation in the agent’s risk aversion, leading to the negative innovation in the price–consumption ratio. To show their relative importance, I plot in the top right panel of Fig. 2 the state-dependences of $\sigma_{Rt}$ and $-a_2 (S_t/X_t)\,\sigma_{\gamma t}$ under the base-case calibration, where their difference is attributed to the cash-flow innovation equaling $\sigma$. The extra channel, which is missing in Naik and Lee (1990), is clearly the key to generating the observed excess return volatility conditional on no jumps.

The bottom left panel of the same figure plots together $\lambda_t^{Q}$ and the absolute value of $J_{Pt}$ as functions of S. Since $J_{\Lambda t} > 0$, Eq. (22) implies that $\lambda_t^{Q} > \lambda$, i.e., jumps are viewed as more intensive under the risk-neutral measure than under the actual measure.
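As a check on the volatility decomposition discussed above, Eqs. (20) and (21) can be recovered directly from Proposition 1 and Eq. (6). The steps below are a sketch in the paper's notation and are not the appendix proof. They use only $P_t = C_t X_t$, $S_t = 1/\gamma_t$, and the jump relation $\gamma_t^{+} = \gamma_t (1 + J_{\gamma t})$.

```latex
\begin{align*}
P_t &= C_t X_t, \qquad X_t = a_1 + a_2 S_t, \qquad S_t = 1/\gamma_t, \\
\left.\frac{dS_t}{S_t}\right|_{\text{diffusive}} &= -\sigma_{\gamma t}\, dB_t
  \quad \text{(It\^o's lemma applied to } 1/\gamma_t \text{ with (6))}, \\
\left.\frac{dP_t}{P_t}\right|_{\text{diffusive}}
  &= \sigma\, dB_t + \frac{a_2 S_t}{X_t}\left(-\sigma_{\gamma t}\right) dB_t
  \;\Longrightarrow\;
  \sigma_{Rt} = \sigma - a_2 \frac{S_t}{X_t}\, \sigma_{\gamma t}, \\
\frac{P_t^{+}}{P_t} &= \frac{C_t^{+}}{C_t} \cdot \frac{X_t^{+}}{X_t}
  = e^{b}\, \frac{a_1 + a_2 S_t / (1 + J_{\gamma t})}{X_t}
  = e^{b}\left[1 + a_2 \frac{S_t}{X_t}\left(\frac{1}{1 + J_{\gamma t}} - 1\right)\right].
\end{align*}
```

Because $\sigma_{\gamma t} < 0$ and $J_{\gamma t} > 0$, the first result shows that habit formation raises the diffusive volatility above $\sigma$, and the second shows that it deepens the price jump beyond the consumption jump $e^{b} - 1$.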
In addition, $\lambda_t^Q$ is state-dependent, driven by the state-dependent jump size in the pricing kernel, which is decreasing in $S$. Intuitively, as the agent becomes more risk averse for a smaller $S$, he views the arrival of jumps as more probable under the risk-neutral measure. On the other hand, $|J_{Pt}|$ decreases in $S$ for large $S$, implying a less severe stock market crash as the economy gets better. For the same technical reason as that for $EP_t$ and $\sigma_{Rt}$, $|J_{Pt}|$ becomes increasing in $S$ for small $S$. In contrast to the reduced-form models (e.g., Duffie, Pan, and Singleton, 2000), in which the stock price jump size has a time-invariant probability distribution, $J_{Pt}$ under the base case is fully observed conditional on the information set $\mathcal{F}_t$, and it varies over time with the evolution of the state $S$. Taking into account the contribution of jumps, the stock return volatility in the base model, denoted by $\mathrm{vol}_{Rt}$ under the actual measure and by $\mathrm{vol}^Q_{Rt}$ under the risk-neutral measure, is

$$
\begin{cases}
\mathrm{vol}_{Rt} = \sqrt{\sigma_{Rt}^2 + \lambda J_{Pt}^2}, \\
\mathrm{vol}^Q_{Rt} = \sqrt{\sigma_{Rt}^2 + \lambda_t^Q J_{Pt}^2},
\end{cases}
\tag{23}
$$
where $\sigma_{Rt}$ denotes the diffusive volatility given by (20). The bottom right panel of Fig. 2 plots together the state-dependences of $\sigma_{Rt}$, $\mathrm{vol}_{Rt}$, and $\mathrm{vol}^Q_{Rt}$. The peso component further raises the excess volatility generated by the model. Since $\lambda_t^Q > \lambda$ and $\lambda_t^Q$ is decreasing in $S$,
D. Du / Journal of Financial Economics 99 (2011) 400–426
[Fig. 2 image omitted. Four panels, each with the surplus ratio on the x-axis: (top left) consumption–price ratio, risk-free rate, and equity premium; (top right) $\sigma_R$ and its component due to variations in $\gamma$; (bottom left) $\lambda^Q$ and $|J_P|$; (bottom right) $\sigma_R$, $\mathrm{vol}_R$, and $\mathrm{vol}^Q_R$.]
Fig. 2. Stock return dynamics. This figure was plotted using the base-case calibration reported in Table 2. The top left panel plots the consumption–price ratio, the risk-free rate, and the equity premium as functions of the surplus ratio $S$. The top right panel plots the diffusive return volatility, $\sigma_R$, and its component due to the variations in the agent's risk aversion $\gamma$, as functions of the state $S$. The bottom left panel plots the state-dependences of the jump arrival intensity under the risk-neutral measure, $\lambda^Q$, and of the absolute stock price jump size, $|J_P|$. The bottom right panel plots together the state-dependences of the diffusive volatility, $\sigma_R$, and of the return volatilities that take into account the contribution of jumps, $\mathrm{vol}_R$ and $\mathrm{vol}^Q_R$, which are under the actual measure and the risk-neutral measure, respectively.
$\mathrm{vol}^Q_R > \mathrm{vol}_R$, and their difference decreases as the economy gets better.
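The two volatilities in Eq. (23) differ only in the jump intensity that weights the squared jump size. The following is a minimal sketch of that calculation for a single state; the function name and all numerical inputs are hypothetical placeholders chosen for illustration, not the paper's calibration.

```python
import math


def total_volatility(sigma_r, jump_intensity, jump_size):
    """Eq. (23): sqrt(sigma_R^2 + intensity * J_P^2) for one state of the economy."""
    return math.sqrt(sigma_r ** 2 + jump_intensity * jump_size ** 2)


# Hypothetical inputs for one value of the surplus ratio (illustration only).
sigma_r = 0.15      # diffusive return volatility
j_p = -0.5          # stock price jump size (negative: a crash)
lam = 1.0 / 45.0    # jump intensity under the actual measure
lam_q = 1.0 / 15.0  # jump intensity under the risk-neutral measure

vol_r = total_volatility(sigma_r, lam, j_p)      # actual measure
vol_r_q = total_volatility(sigma_r, lam_q, j_p)  # risk-neutral measure
print(f"vol_R = {vol_r:.4f}, vol_R^Q = {vol_r_q:.4f}")
```

Whenever the risk-neutral intensity exceeds the actual one and the jump size is nonzero, the sketch returns a larger risk-neutral volatility, which is the ordering stated in the text.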
3.2. The effect of consumption jumps

To illustrate the impact of consumption jumps on the generated volatility smirk, I plot in the top two panels of Fig. 3 the changes in the average option prices implied by the model in response to changes in $|\mu_b|$ and $\lambda$, which denote, respectively, the absolute log consumption jump size and the jump intensity under the actual measure; all other parameters are at their base-case levels reported in Table 2. I consider options with three degrees of moneyness: $M = 0.9$ (10% OTM puts), $M = 0.95$ (5% OTM puts), and $M = 1$ (ATMs). The smirk premiums, measured as the price differences between OTM puts and ATMs, rise rapidly with both $|\mu_b|$ and $\lambda$. To illustrate the reason, I plot in the bottom left panel of the same figure the jump intensity under the risk-neutral measure, $\lambda^Q$, and the absolute stock price jump size, $|J_P|$, as functions of $|\mu_b|$ conditional on $\gamma_t = \bar\gamma$, where we see that both $\lambda^Q$ and $|J_P|$ are increasing in $|\mu_b|$. Intuitively, more severe consumption jumps induce more severe jumps in both the stock market and the pricing kernel,5 and the latter effect implies that jumps are viewed as more intensive under the risk-neutral measure. $\lambda$ works in a similar way except that it does not directly affect $|J_P|$. Since OTM puts, in particular deep OTM puts, provide excellent hedges against stock market crashes, the agent, when facing a more severe peso problem, has a stronger incentive to shift his investment to these options, which drives up their prices in equilibrium.

5 In my model, as in the usual consumption-based peso problem models, jumps in aggregate consumption (or GDP) and jumps in the stock market occur simultaneously. In the real world, however, consumption jumps and stock market crashes do not always happen at the same time, a phenomenon whose explanation is beyond the scope of this paper.

The increases in smirk premiums are partly due to the slight decreases in the ATM prices as potential jumps become more severe. To explain this, I plot in the bottom right panel of Fig. 3 the frequency of the surplus ratio $S$ in its stationary region under the benchmark calibration and under the otherwise identical calibration but with no jumps. Given $\bar\gamma$, the peso component adds a negative compensation term, $\alpha((\gamma_t - b)/\gamma_t)\mu_b\lambda$, to the drift of the
[Fig. 3 image omitted. Four panels: (top left) Black–Scholes implied volatility (%) against the absolute log consumption jump size for M = 0.9, M = 0.95, and M = 1; (top right) Black–Scholes implied volatility (%) against the jump intensity under the actual measure for the same three degrees of moneyness; (bottom left) $\lambda^Q$ and $|J_P|$ against the absolute log consumption jump size; (bottom right) frequency of the surplus ratio $S$ (%) against surplus consumption, with and without jumps.]
Fig. 3. Effect of consumption jumps. The top two panels plot the changes in the average prices of options with 30 days to expiration and three degrees of moneyness in response to changes in $|\mu_b|$ and $\lambda$, which denote, respectively, the absolute log consumption jump size and the jump intensity under the actual measure. The bottom left panel plots the jump intensity under the risk-neutral measure, $\lambda^Q$, and the absolute stock price jump size, $|J_P|$, as functions of $|\mu_b|$ conditional on $\gamma_t = \bar\gamma$. The bottom right panel plots the frequency of the surplus ratio $S$ in its stationary region under the base-case calibration and under the otherwise identical calibration but with no jumps. Unless otherwise indicated, parameters are fixed at their base-case levels reported in Table 2.
$\gamma$ process, as is shown in (2.5), which shifts the mass of the realized $\gamma$ to the left and, hence, shifts the mass of the realized $S$ to the right. On the other hand, ATM prices, which are tightly linked to the diffusive return volatility $\sigma_R$, are decreasing in $S$, except for very low values of $S$ that are unlikely to be realized. The change in the $S$-distribution thus leads to the decreases in ATM prices.

3.3. Excess stock price jumps and leverage effect

Since $J_{\gamma t}, a_2 > 0$, Eq. (21) implies that $J_{Pt} < e^{\mu_b} - 1 = J_C < 0$. Hence, unlike Naik and Lee (1990), in which $J_P = J_C$, jumps in the stock market are more severe than consumption jumps in my model. As in the discussion of excess diffusive volatility, negative consumption jumps in the presence of habit formation induce negative stock price jumps through two channels: (i) the usual channel of negative jumps in the cash flow; and (ii) the extra channel of positive jumps in the agent's risk aversion, which translate into negative jumps in the price–dividend ratio. Empirical evidence seems to support the implication of excess stock price jumps. For example, Liu, Pan, and Longstaff (2005) report that there were two major crashes in the U.S. stock market during the past 100 years: one from mid-October to mid-November in 1929, and the other on Black Monday, October 19, 1987, when the Dow index fell by 44% and 23%, respectively. While these falls accumulated within only one month or even one day, they are much more dramatic than the maximum 9.9% annual consumption contraction observed during the Great Depression. To see which parameters are important for excess stock price jumps, I plot in the top left panel of Fig. 4 the changes in the average stock price jump size in absolute value, $|\mathrm{mean}(J_{Pt})|$, in response to changes in $\bar\gamma$, $b$, and $|\mu_b|$, where the x-axis plots the percentage changes of the parameters relative to their base-case levels reported in Table 2. For example, 10% for $\bar\gamma$ means a value of $\bar\gamma$ equal to $34 \times (1 + 10\%) = 37.4$. The effect of $|\mu_b|$ is intuitive. As for $\bar\gamma$, when the agent becomes more risk averse on average, a given consumption jump induces a more dramatic jump in the agent's risk aversion and, hence, a larger extra jump in the stock market. The effect of $b$ is opposite to that of $\bar\gamma$, which is largely mechanical. By comparison, $\mathrm{mean}(J_{Pt})$ is relatively insensitive to other parameters. For completeness, I also plot in the bottom left panel of the same figure the average jump intensity under the risk-neutral measure, $\mathrm{mean}(\lambda_t^Q)$, in response to changes in the same three parameters. When the agent becomes more risk averse, or when consumption jumps are more severe, stock price jumps are viewed as more intensive under the risk-neutral measure. Except for $\lambda$, other parameters have only marginal effects on $\mathrm{mean}(\lambda_t^Q)$.
[Fig. 4 image omitted. Four panels: (top left) the absolute stock price jump size against percentage changes relative to benchmark levels; (bottom left) jump arrival intensity under the risk-neutral measure against percentage changes relative to benchmark levels; (top right) "Jump–risk premium vs. smirk premium": smirk premium (%) against jump-risk premium (%); (bottom right) "Impact of excess price jumps": Black–Scholes implied volatility (%) against moneyness for the base case and for $J_P = J_C$.]
Fig. 4. The effect of the excess stock price jumps and the leverage effect. The left two panels plot the average stock price jump size in absolute value, $|\mathrm{mean}(J_{Pt})|$, and the average jump intensity under the risk-neutral measure, $\mathrm{mean}(\lambda_t^Q)$, respectively, as functions of $\bar\gamma$, $b$, and $|\mu_b|$, where the x-axis plots the percentage changes of $\bar\gamma$, $b$, and $|\mu_b|$ relative to their base-case levels. The top right panel plots the average smirk premium in response to changes in the average jump-risk premium as we vary the consumption jump size. The bottom right panel plots the model-implied volatility smirk for options with 30 days to expiration under two scenarios: the base-case scenario, and the otherwise identical scenario but with no excess jumps in the stock market, i.e., $J_P = J_C$.
In the presence of excess stock price jumps, the model generates a large jump-risk premium, which is given by

$$
EP_{j,t} = \lambda E_t(J_{Pt}) - \lambda_t^Q E_t^Q(J_{Pt}) = (\lambda - \lambda_t^Q) J_{Pt},
\tag{24}
$$

where the second equality follows from the assumption that the consumption jump size is a constant.6 The positive jumps in the pricing kernel due to the negative consumption jumps create a wedge between the actual and the risk-neutral stock price jumps, which is captured by the implied jump-risk premium. From (3.8), a more dramatic jump in the stock price is compensated with a higher premium, implying a wider wedge in the jump component of the stock return process viewed under the two measures. For equilibrium models advocating the peso problem story, this wedge, measured by the jump-risk premium, is the key to their success in explaining the pronounced smirk pattern in the data. To show this, I vary the consumption jump size and plot in the top right panel of Fig. 4 the implied jump-risk premiums together with the implied smirk premiums, measured as the price differentials between 10% OTM puts and ATMs. The positive relation is clear: as the compensation for jump risks increases from zero to 4%, the smirk premium increases monotonically from 4.2% to 13.4%. Given the link between the jump-risk premium and the smirk premium, excess price jumps are important for the model to generate a pronounced smirk pattern. To illustrate this, I plot in the bottom right panel of the same figure the implied volatility smirk for options with 30 days to expiration under both the base-case scenario and the otherwise identical scenario but with $J_P = J_C$, where $J_C$ denotes the (constant) consumption jump size. The smirk pattern is much flatter in the latter scenario, with the smirk premium dropping from 10.4% to 6.1%, which is associated with the drop in the jump-risk premium from 2.46% (accounting for 41.8% of the total premium) to 0.72% (accounting for 17.4% of the total premium) when we shut down the excess jumps in the stock market.

In addition to price jumps, stochastic volatility is also important for option pricing, as is well documented in the literature on reduced-form models (e.g., Heston, 1993; Bakshi, Cao, and Chen, 1997). Under the MSV habit specification, volatility (both the diffusive and the jump component) is decreasing in the surplus ratio $S$ except for small values of $S$ that are unlikely to be realized. On the other hand, (2.9) implies that the price–dividend ratio is increasing in $S$. Combining the two relations, stock returns in my model typically move in the opposite direction to changes in volatility, which is consistent with the empirical observations (e.g., Black, 1976). As first shown by Heston (1993), this negative correlation, sometimes referred to as the "leverage effect", creates a fat left tail in the risk-neutral distribution of the stock returns, which makes OTM puts more expensive than ATMs. The leverage effect explains why my model still generates a sizable smirk premium when the jump-risk premium drops to zero, as shown in Fig. 3.

The above analysis illustrates the mechanism by which the model matches the average smirk pattern observed in the data, and it also highlights the difficulty that equilibrium models with the standard CRRA preference (e.g., Naik and Lee, 1990) have in explaining option prices: the negative correlation between risk aversion and aggregate consumption due to habit formation creates an extra channel by which consumption innovations induce excess innovations in the stock market over and above those induced by the cash-flow innovations. The excess stock price innovations take two forms: the excess diffusive volatility and the excess stock price jumps. The former generates the high ATM prices, which match the level of the observed volatility smirk; the latter, combined with the positive jump in risk aversion, generates a large jump-risk premium, which helps match the size of the observed volatility smirk determined by the smirk premium. The endogenized leverage effect also contributes to a pronounced smirk pattern. Both the excess volatilities and the excess jumps are missing under the CRRA preference.

6 Since $\lambda_t^Q = \lambda(J_{\Lambda t} + 1)$, substituting for $J_{\Lambda t}$ in (24) from (22) yields the jump-risk premium of the form $-\lambda J_{\Lambda t} J_{Pt}$, which is the second term in the expression for the equity premium given by (19).
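The jump-risk premium of Eq. (24) and its link to the pricing-kernel jump can be checked with a few lines of arithmetic. The sketch below assumes a constant jump size, as in the base case, and uses the relation $\lambda_t^Q = \lambda(J_{\Lambda t} + 1)$; substituting it into $(\lambda - \lambda_t^Q) J_{Pt}$ gives $-\lambda J_{\Lambda t} J_{Pt}$. The function names and all numbers are hypothetical and are not taken from the calibration.

```python
def risk_neutral_intensity(lam, j_lambda):
    """Risk-neutral jump intensity implied by a pricing-kernel jump: lam * (J_Lambda + 1)."""
    return lam * (j_lambda + 1.0)


def jump_risk_premium(lam, lam_q, j_p):
    """Eq. (24) with a constant jump size: (lambda - lambda_Q) * J_P."""
    return (lam - lam_q) * j_p


# Hypothetical inputs (illustration only).
lam = 0.02       # jump intensity under the actual measure
j_lambda = 2.0   # jump size in the pricing kernel (positive)
j_p = -0.5       # stock price jump size (negative)

lam_q = risk_neutral_intensity(lam, j_lambda)
premium = jump_risk_premium(lam, lam_q, j_p)
premium_alt = -lam * j_lambda * j_p  # same quantity after substituting lam_q
print(f"lambda_Q = {lam_q:.3f}, jump-risk premium = {premium:.4f}")
```

With a positive pricing-kernel jump and a negative price jump, both expressions return the same positive premium, and a larger $|J_P|$ scales it up proportionally.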
3.4. State-dependences

An important advantage of habit formation models is that they have implications for how asset prices depend on economic states. To examine the effects of state-dependences on option prices, I simulate the risk-neutral stock return processes under three scenarios: (i) the base-case scenario; (ii) the otherwise identical scenario but with the diffusive volatility, $\sigma_R$, fixed at its average level; (iii) the otherwise identical scenario but with the jump intensity, $\lambda^Q$, and the stock price jump size, $J_P$, both fixed at their average levels. The implied average volatility smirks for options with 30 days to expiration under (i)–(iii) are plotted together in the top panel of Fig. 5. Imposing a constant $\sigma_R$ largely shuts down the leverage effect, which results in an apparently less pronounced
[Fig. 5 image omitted. Two panels, each with three lines (base case, constant $\sigma_R$, constant $J_P$ and $\lambda^Q$): (top) "Volatility smirk": Black–Scholes implied volatility (%) against moneyness; (bottom) "State-dependent smirk premium": smirk premium (%) against the surplus ratio.]
Fig. 5. The effect of the state-dependent volatility and jump. The top panel plots the model-implied volatility smirk for options with 30 days to expiration under three scenarios: (i) the base-case scenario; (ii) the otherwise identical scenario but with the diffusive volatility, $\sigma_R$, fixed at its average level; and (iii) the otherwise identical scenario but with the jump intensity, $\lambda^Q$, and the stock price jump size, $J_P$, both fixed at their average levels. The bottom panel plots the state-dependences of the smirk premium, measured as the price differentials between 10% OTM puts and ATMs, under the same three scenarios.
smirk pattern. In contrast, the implied average volatility smirk is virtually unchanged when we ignore the state-dependences in jumps. In the bottom panel of the same figure, I plot the state-dependences of the smirk premium under the same three scenarios. I ignore very small values of $S$, which are unlikely to be realized but tend to cause different state-dependences for purely technical reasons. For relatively large $S$, an increase in the surplus ratio has two offsetting effects on the implied smirk premium. First, it decreases both $\lambda^Q$ and $|J_P|$, which gives the agent less incentive to hedge against jumps and, hence, lowers the option premium for the given $\sigma_R$. Second, it decreases $\sigma_R$, which increases the relative importance of jumps in the total stock price variations. As a result, the agent has a stronger incentive to buy insurance against jumps from OTM puts, which drives up the smirk premium for the given $\lambda^Q$ and $J_P$. The second effect dominates under the base-case calibration, which leads to a rising smirk premium as the economy gets better.
4. Model calibration

4.1. Can we identify all parameters?

There are a total of nine parameters in the benchmark model: $\{\mu, \sigma, \rho, \bar\gamma, b, k, \alpha, \lambda, \mu_b\}$. $\mu$ and $\sigma$ characterize the evolution of aggregate consumption conditional on no occurrence of jumps, and they are calibrated to the consumption data during the period with no documented economic disasters. I use the average price–dividend ratio to calibrate the subjective discount rate $\rho$, which is insensitive to other pricing moments. Simulation studies show that $k$, which controls the speed of the mean reversion in $\gamma_t$, determines the serial correlation of the price–dividend ratio. This is reasonable since $P_t/C_t$ is a function of $\gamma_t$ alone. I therefore choose $k$ to match the first-order autocorrelation of the log price–dividend ratio. Similar calibration is used by CC and Wachter (2006). This leaves me with five parameters $\{\bar\gamma, b, \alpha, \lambda, \mu_b\}$, which are jointly denoted by $\Theta$. As discussed in the last section, the risk-free rate, the equity premium, and the smirk premium are all sensitive to the two jump parameters, $\lambda$ and $\mu_b$. To check the informativeness of pricing moments about the remaining three parameters associated with habit formation, $\bar\gamma$, $b$, and $\alpha$, I plot in Fig. 6 their influences on six moments implied by the model: the average risk-free rate, mean(rf); the average equity premium, mean(EP); the average return volatility, mean(volR); the average prices of options with 30 days to expiration under two degrees of moneyness, $M = 0.9$ (10% OTM puts) and $M = 1$ (ATMs); and the average smirk premium, measured as the price difference between 10% OTM puts and ATMs. The x-axis plots the percentage changes of $\bar\gamma$, $b$, and $\alpha$ relative to their base-case levels reported in Table 2; the y-axis plots the implied moment values quoted in percentages. All other parameters remain at their base-case values. As plotted in solid lines, an increase in $\bar\gamma$ reduces mean(rf) but raises both mean(EP) and the average option
prices. Intuitively, as the agent becomes more risk averse, on average, he sells the stock for the risk-free asset; hence the implied changes in returns. While the diffusive consumption volatility is invariant to $\bar\gamma$ in my model, a larger $\bar\gamma$ induces higher diffusive volatility in $\gamma_t$, which translates into higher diffusive return volatility through the extra channel. Intuitively, the agent's risk aversion becomes more volatile as he becomes more risk averse. As illustrated in Fig. 4, a larger $\bar\gamma$ also induces more severe and more intensive stock price jumps under the risk-neutral measure, which further raises the return volatility. A higher volatility thus leads to higher prices of options across all degrees of moneyness. It may seem a little surprising that the average smirk premium is decreasing in $\bar\gamma$; the reason is similar to the one that leads to a rising smirk premium as the economy gets better. A decrease in $\bar\gamma$, or an increase in the average surplus ratio, has two offsetting effects on the implied smirk premium. First, it decreases both $\mathrm{mean}(\lambda^Q)$ and $|\mathrm{mean}(J_P)|$, which gives the agent less incentive to hedge against jumps. Second, it decreases $\sigma_R$, which increases the relative importance of jumps and, hence, strengthens the hedging incentive against jumps. The second effect dominates under the benchmark calibration, implying a larger smirk premium as the agent gets less risk averse, on average. As plotted in the dashed lines, $b$ affects asset pricing moments in exactly the opposite way to $\bar\gamma$. These relations are mechanical, with no apparent intuition. For example, combining (7) with (10) shows that the absolute price of diffusive risk, $|\sigma_\Lambda|$, is lower for larger $b$, which implies that the average equity premium is decreasing in $b$. $\alpha$, which controls the sensitivity of $\gamma_t$ to consumption innovations, affects the pricing moments other than the smirk premium in a similar way to $\bar\gamma$, albeit to a lesser degree.
Intuitively, a larger $\alpha$ implies that a negative consumption innovation induces a larger positive innovation in the agent's risk aversion, which (i) gives him a stronger incentive to reallocate his investment from the stock to the risk-free asset, and (ii) induces higher return volatility through the extra channel; hence the implied return and price changes. By comparison, the smirk premium is insensitive to $\alpha$. The above analysis illustrates an important difference between the preference-based equilibrium models (e.g., Naik and Lee, 1990; MSV), to which my model belongs, and the reduced-form models (e.g., Bates, 2000; Duffie, Pan, and Singleton, 2000), which are more common in the option pricing literature. Due to their exogenous specifications, reduced-form models give rise to separate parameterizations of the price–dividend (PD) ratio, the risk-free rate, the prices of risks (which determine the equity premium), and the stock return process under the risk-neutral measure (which determines option prices), and these are calibrated separately to the corresponding portions of data. This is in contrast to the general equilibrium models, in which all financial variables are driven by the common parameters associated with the underlying economic mechanism, such as $\bar\gamma$, $b$, and $\alpha$, which are associated with habit formation in my model. These parameters, therefore, need to be identified from the joint data on stocks and options.
[Fig. 6 image omitted. Six panels, each plotting moment values quoted in percentages (y-axis) against percentage changes of the parameters relative to their base-case levels (x-axis): the average risk-free rate, the average price of 10% OTM puts, the average equity premium, the average price of ATMs, the average return volatility, and the average smirk premium.]
Fig. 6. Informativeness of $\bar\gamma$, $b$, and $\alpha$ on asset pricing moments. This figure plots the informativeness of three parameters, $\bar\gamma$, $b$, and $\alpha$, on six asset pricing moments, i.e., the average risk-free rate, mean(rf), the average equity premium, mean(EP), the average return volatility, mean(volR), the average prices of options with 30 days to expiration under two degrees of moneyness (10% OTM puts and ATMs), plus the average smirk premium, measured as the price difference between 10% OTM puts and ATMs. The x-axis plots the percentage changes of $\bar\gamma$, $b$, and $\alpha$ relative to their base-case levels; the y-axis plots the implied moment values quoted in percentages.
So far I have shown that the pricing moments are informative about all parameters in $\Theta$. In a stricter sense, the "identification condition" is the assumption that the expected value of the derivative of the moment equations with respect to the model parameters has full rank (Hansen, 1982, Assumption 3.4). In other words, a structural parameter $\theta_1$ is not identified if there exists another parameter $\theta_2$ such that $\partial z/\partial\theta_1$ and $\partial z/\partial\theta_2$ are collinear, where $z \in \mathbb{R}^N$ denotes the $N$ moments used for estimation. In my model, the nonlinear relations between the parameters and the pricing moments, and the existence of a state variable, ensure that $\partial z/\partial\theta_1$ and $\partial z/\partial\theta_2$ are not collinear for all $\theta_1, \theta_2 \in \Theta$. For example, it is easy to verify numerically that

$$
\frac{\partial\,\mathrm{mean}(r_f)/\partial\bar\gamma}{\partial\,\mathrm{mean}(r_f)/\partial\alpha} \neq \frac{\partial\,\mathrm{mean}(EP)/\partial\bar\gamma}{\partial\,\mathrm{mean}(EP)/\partial\alpha}.
$$

I thus conclude that the parameters in $\Theta$ are identifiable from the joint data on stocks and options.
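The ratio comparison above is a two-parameter, two-moment version of a collinearity check on the Jacobian of the moments. The sketch below shows how such a check could be run numerically with central finite differences. The moment function `toy_moments` is a hypothetical stand-in with made-up coefficients, not the paper's model; in practice it would be replaced by the model-implied mean(rf) and mean(EP). The evaluation point for $\alpha$ is likewise hypothetical.

```python
import math


def toy_moments(gamma_bar, alpha):
    """Hypothetical stand-in for two model-implied moments (NOT the paper's model)."""
    mean_rf = 0.05 - 0.001 * gamma_bar - 0.0002 * alpha * math.sqrt(gamma_bar)
    mean_ep = 0.002 * gamma_bar + 0.00005 * alpha ** 2
    return [mean_rf, mean_ep]


def jacobian(f, theta, rel_step=1e-5):
    """Central finite-difference Jacobian: rows are moments, columns are parameters."""
    n_moments = len(f(*theta))
    jac = [[0.0] * len(theta) for _ in range(n_moments)]
    for j, value in enumerate(theta):
        h = rel_step * max(1.0, abs(value))
        up = list(theta)
        down = list(theta)
        up[j] = value + h
        down[j] = value - h
        f_up, f_down = f(*up), f(*down)
        for i in range(n_moments):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return jac


theta0 = [34.0, 10.0]  # (gamma_bar, alpha); the alpha value is hypothetical
jac = jacobian(toy_moments, theta0)

# Ratio of sensitivities to gamma_bar and alpha, moment by moment.
ratio_rf = jac[0][0] / jac[0][1]
ratio_ep = jac[1][0] / jac[1][1]

# Equal ratios would mean the two Jacobian columns are collinear.
columns_collinear = math.isclose(ratio_rf, ratio_ep, rel_tol=1e-6)
print(f"ratio (rf) = {ratio_rf:.4f}, ratio (EP) = {ratio_ep:.4f}, collinear = {columns_collinear}")
```

With more than two parameters or moments, the same idea extends to checking that the full Jacobian has rank equal to the number of parameters.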
4.2. Data

4.2.1. Option data

I collect S&P 500 index option data from two data sets: CBOE data from April 4, 1988 to January 3, 1996 and Ivy DB data from January 4, 1996 to September 30, 2008, which combine to cover a period of more than 20 years with 5,166 trading days. I exclude the six months following the 1987 market crash and the pre-crash period because the pronounced volatility smirks emerged only after the crash and have persisted ever since. CBOE data are no longer publicly available; I instead use those from Eraker, who independently computes the relevant option prices (see Eraker, 2004, for the details). Ivy DB data are available from OptionMetrics, and the option prices are computed as the bid-ask averages following the standard
practice. Option prices and the S&P 500 index are both closing prices at 4 pm EST. I apply several exclusion filters to choose the option observations used in this paper. First, I delete price quotes that violate the basic arbitrage restrictions:

$$
S e^{-\delta\tau} \ge \mathrm{call} \ge \max(0,\, S e^{-\delta\tau} - K e^{-r\tau}) \quad \text{and} \quad K e^{-r\tau} \ge \mathrm{put} \ge \max(K e^{-r\tau} - S e^{-\delta\tau},\, 0).
\tag{25}
$$

Second, I eliminate price quotes lower than \$3/8 to avoid the impact of price discreteness. The deleted options constitute less than 1% of the data. Third, I use only put option data, since this paper focuses on the price differentials between deep OTM puts and ATMs. Fourth, I choose from the remaining option observations those with time to maturity between 20 days and 140 days and moneyness between 0.895 and 1.005. The final sample contains a total of 130,964 quotes, of which 16,883 quotes are from CBOE and the remaining 114,081 quotes are from Ivy DB. Given the target time to maturity $\tau$ (quoted in days) and the target moneyness $M$, I collect all available option contracts with time to maturity and moneyness falling into the ranges $[\tau - 10, \tau + 10]$ and $[M - 0.005, M + 0.005]$, respectively, and I use their average price as the unconditional first-order moment implied by the data for options characterized by $(\tau, M)$. Different choices of the ranges induce little variation in the average prices. Table 1 reports the summary statistics of the categorized option data. We observe a pronounced smirk pattern for short-term options, and the smirk premiums tend to decrease as $\tau$ increases. Given the moneyness, long-term ATMs and near ATMs are more expensive than their short-term counterparts, while long-term deep OTM puts are cheaper than their short-term counterparts. In terms of trading volume, shorter-term options, ATMs, and near ATMs are more heavily traded than longer-term options and deep OTM puts.
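The filters and the bucketing rule described above can be sketched in a few lines. The code below is an illustrative sketch only: the `PutQuote` fields, the function names, the definition of moneyness as strike over spot, and the 365-day year are all assumptions made for the example, and the `price` field stands for whichever price unit is being averaged.

```python
import math
from dataclasses import dataclass


@dataclass
class PutQuote:
    price: float      # put price (e.g., bid-ask midpoint)
    spot: float       # index level S
    strike: float     # strike K
    days: int         # time to maturity in days
    rate: float       # annualized risk-free rate r
    div_yield: float  # annualized dividend yield delta


def passes_filters(q, min_price=3.0 / 8.0):
    """Put-side arbitrage bounds of Eq. (25), price floor, maturity and moneyness ranges."""
    tau = q.days / 365.0  # assumed day-count convention
    pv_strike = q.strike * math.exp(-q.rate * tau)
    pv_spot = q.spot * math.exp(-q.div_yield * tau)
    no_arbitrage = pv_strike >= q.price >= max(pv_strike - pv_spot, 0.0)
    moneyness = q.strike / q.spot  # assumed definition of M
    return (no_arbitrage
            and q.price >= min_price
            and 20 <= q.days <= 140
            and 0.895 <= moneyness <= 1.005)


def bucket_average(quotes, target_days, target_m):
    """Average price of filtered quotes within +/-10 days and +/-0.005 moneyness of the target."""
    selected = [q.price for q in quotes
                if passes_filters(q)
                and abs(q.days - target_days) <= 10
                and abs(q.strike / q.spot - target_m) <= 0.005]
    return sum(selected) / len(selected) if selected else None


# Hypothetical quotes (illustration only).
q_ok = PutQuote(price=5.0, spot=1000.0, strike=950.0, days=30, rate=0.03, div_yield=0.02)
q_cheap = PutQuote(price=0.25, spot=1000.0, strike=950.0, days=30, rate=0.03, div_yield=0.02)
q_deep = PutQuote(price=3.0, spot=1000.0, strike=900.0, days=35, rate=0.03, div_yield=0.02)
print(bucket_average([q_ok, q_cheap, q_deep], target_days=30, target_m=0.95))
```

Here the second quote is dropped by the price floor and the third falls outside the moneyness window of the target bucket, so only the first quote enters the average.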
4.2.2. Other data

By convention, aggregate consumption and the risk-free rate are proxied by the sum of non-durables and services and by the 30-day T-bill rate, respectively. Monthly consumption, CPI, and population data are from the St. Louis Fed database. Monthly 30-day T-bill rates, S&P 500 index values, total market values of the S&P 500 firms (PSP), and the ex-dividend and the cum-dividend S&P 500 returns, denoted, respectively, by $R_{dt}$ and $R_{xt}$, are all from CRSP. The monthly data range from January 1959 to December 2007, and the daily S&P 500 index ranges from April 1988 to September 2008. Equity premiums are computed as the cum-dividend returns of the S&P 500 index minus the 30-day T-bill rates. Monthly dividend–price ratios for the S&P 500 firms, $D_t/P_t$, are computed from the difference between $R_{dt}$ and $R_{xt}$ and smoothed using 12-month trailing averages, as discussed in Hansen, Heaton, and Li (2004). I construct the monthly series of dividends from PSP and the smoothed $D_t/P_t$, which are used to calibrate the separate dividend processes considered in Section 6.3. To test whether risk aversion actually follows the estimated dynamics, I also need intraday index data to construct the model-free realized volatility. I collect the S&P 500 index data at the five-minute frequency from February 1983 to September 2008, which are provided by the Institute for Financial Markets. The data cover the period from 9:30 am to 4:00 pm New York time every day, for a total of 78 observations per day.

4.3. Calibration methodology

I calibrate the base-case model in three steps. First, since the two U.S. economic disasters during the past 100 years documented by Barro (2006) both occurred outside of my consumption sample period, I pick $\mu$ and $\sigma$ to exactly match the mean and the standard deviation of the
Table 1
Summary statistics of the option data.
Table 1 reports the summary statistics of the option data collected from two combined data sets: CBOE data from April 4, 1988 to January 3, 1996, and Ivy DB data from January 4, 1996 to September 30, 2008. CBOE data are no longer publicly available; I instead use those from Eraker, who independently computes the relevant option prices (see Eraker, 2004, for the details). Ivy DB data are available from OptionMetrics, and the option prices are computed as the bid-ask averages following the standard practice. I report the statistics conditioned on intersections of various moneyness and time-to-maturity ranges. For each intersection, I report the average option prices quoted in Black-Scholes implied volatility. I also report (in parentheses) the number of option contracts for each of the moneyness and time-to-maturity categories.

                 Time to maturity (days)
Moneyness        20–40      40–60      60–80      80–100     100–120    120–140    Subtotal
0.895–0.905      25.4       23.8       23.4       23.2       23.6       22.9       (8,398)
0.905–0.915      24.1       23.0       22.7       22.5       22.5       21.8       (8,989)
0.915–0.925      23.2       22.3       22.2       22.4       21.9       21.9       (9,588)
0.925–0.935      22.0       21.5       21.7       21.6       21.4       21.0       (10,027)
0.935–0.945      21.2       20.9       21.0       21.4       21.0       20.7       (10,670)
0.945–0.955      20.2       20.1       20.1       20.7       20.4       20.0       (11,458)
0.955–0.965      19.4       19.4       19.6       20.0       20.1       19.8       (12,432)
0.965–0.975      18.5       18.5       18.8       19.4       19.5       19.2       (13,326)
0.975–0.985      17.8       17.8       18.1       18.6       18.5       18.3       (14,464)
0.985–0.995      17.0       17.0       17.3       18.1       18.2       18.2       (15,610)
0.995–1.005      16.4       16.4       16.6       17.3       17.6       17.4       (16,002)
Subtotal         (44,367)   (32,682)   (23,828)   (15,267)   (7,912)    (6,908)    (130,964)
consumption growth rate over this period. Second, I select values of r and k to match the average price–dividend ratio and the first-order autocorrelation of the log price– dividend ratio. Third, I treat the calibrated m, s, r, and k as given and select Y fg , b, a, l, mb g to minimize the sum of the squared percentage differences between the observed and the model-implied asset pricing moments. Below, I discuss the details of the third-step estimation. I use the short-term option contracts with 30 days to expiration, which have the most pronounced volatility smirk, to construct the option pricing moments. In particular, I choose to match the average option prices under 11 degrees of moneyness: M ½0:92,0:93,0:94, 0:95,0:96,0:97,0:98,0:99,1, which cover both deep OTM puts and ATMs. For non-option pricing moments, I choose to match the average risk-free rate, mean(rf), the average equity premium, mean(EP), the average return volatility, mean(volR), and the standard deviation of the risk-free rate, std(rf). The last moment condition is imposed to avoid large variations in the interest rates which many habit formation models give rise to. In summary, I have a total of 15 moments to estimate five unknowns summarized by Y. Unlike risk-free rates or the equity premium, option prices cannot be derived in closed-form. I thus apply interpolation to efficiently compute the unconditional option pricing moments implied from the model. I first obtain fSi gN i ¼ 1 , a large number of the realizations of the surplus ratio realizations S in its stationary distribution region. Next, I simulate option prices conditioned on n interpolation nodes within the range of fSi gN i ¼ 1 . Finally, I use the n conditional prices to interpolate the prices conditioned on each of realizations in fSi gN i ¼ 1 , whose
average is reported as the unconditional option moments implied from the model. Experiments show that Chebyshev interpolation with ten interpolation nodes provides fairly accurate results compared to direct simulations.
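The interpolation step can be illustrated with a minimal sketch. It assumes a hypothetical pricing function `toy_conditional_price` in place of the simulated conditional option prices, and hypothetical uniform draws in place of realizations from the stationary distribution of $S$; neither is taken from the paper. Only the mechanics are meant to carry over: evaluate the expensive function at ten Chebyshev nodes, interpolate at every draw, and average.

```python
import math
import random


def chebyshev_nodes(lo, hi, n):
    """Return the n roots of the degree-n Chebyshev polynomial, mapped to [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    return [mid + half * math.cos(math.pi * (k + 0.5) / n) for k in range(n)]


def chebyshev_coefficients(node_values):
    """Interpolant coefficients from values at the nodes (same order as chebyshev_nodes)."""
    n = len(node_values)
    return [
        2.0 / n * sum(v * math.cos(math.pi * j * (k + 0.5) / n)
                      for k, v in enumerate(node_values))
        for j in range(n)
    ]


def chebyshev_eval(coeffs, lo, hi, x):
    """Evaluate the Chebyshev interpolant at a point x in [lo, hi]."""
    z = max(-1.0, min(1.0, (2.0 * x - lo - hi) / (hi - lo)))
    theta = math.acos(z)
    return 0.5 * coeffs[0] + sum(c * math.cos(j * theta)
                                 for j, c in enumerate(coeffs) if j > 0)


def toy_conditional_price(surplus):
    """Hypothetical stand-in for a simulated option price conditional on S."""
    return 0.10 + 0.002 / surplus


random.seed(0)
# Hypothetical draws standing in for realizations of S from its stationary distribution.
draws = [random.uniform(0.0148, 0.0498) for _ in range(5000)]
lo, hi = min(draws), max(draws)

# Only ten "expensive" evaluations are needed, one per interpolation node.
nodes = chebyshev_nodes(lo, hi, 10)
coeffs = chebyshev_coefficients([toy_conditional_price(s) for s in nodes])

interpolated = [chebyshev_eval(coeffs, lo, hi, s) for s in draws]
direct = [toy_conditional_price(s) for s in draws]

unconditional_interp = sum(interpolated) / len(draws)
unconditional_direct = sum(direct) / len(draws)
max_abs_error = max(abs(a - b) for a, b in zip(interpolated, direct))
```

In this toy setting the interpolated average and the directly computed average agree closely, which is the kind of comparison the accuracy experiments describe. How well ten nodes work for the actual model prices depends on how smooth those prices are in $S$.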
4.4. Calibration results

Panel A of Table 2 reports the calibrated non-jump parameters. Excluding the jump component, the mean and the standard deviation of the consumption growth rate are both around 2%. The calibrated 4.7% subjective time-discount rate is consistent with the values usually assumed in the macro-finance literature. The $k$ estimate is close to its calibration in MSV, which matches the autocorrelation of the historical price–dividend ratios. As argued in CC, a high risk aversion is inescapable in the class of representative-agent models with low consumption-growth-rate volatility that are consistent with the observed equity premium. The $b$ estimate implies that the surplus ratio cannot exceed 0.05, which ensures that small consumption innovations lead to large variations in the financial market. Finally, the estimated $\alpha$ is much smaller than that in MSV (= 79.39), implying that the risk aversion is much less sensitive to consumption innovations and, hence, a smoother process for the habit level $H$.

Panel B of the same table reports variables related to jump estimates. Jumps strike once every 45 years under the actual measure, and they are viewed as more intensive at once every 15 years under the risk-neutral measure. This feature helps generate the desired smirk
Table 2
The base-case calibration and the matched moments. In the general model, the log consumption jump size, $b$, is assumed to be normally distributed with mean $\mu_b$ and standard deviation $\sigma_b$. In the base-case model, I assume a constant consumption jump size by setting $\sigma_b$ to 0, hence, $\mu_b = b$. Panel A of Table 2 reports the base-case calibration of non-jump parameters. Panel B reports the base-case calibration related to jumps, where $\lambda$ and $\lambda^Q$ denote the jump intensity under the actual and the risk-neutral measure, respectively; $J_C$ and $J_P$ denote the consumption jump size and the stock price jump size, respectively. Panel C reports the non-option pricing moments implied from both the data and the model, where mean(PD), mean(rf), std(rf), mean(EP), and mean(volR) denote, respectively, the average price–dividend ratio, the average risk-free rate, the standard deviation of the risk-free rate, the average equity premium, and the average return volatility. Panel D reports part of the prices for options with 30 days to expiration implied from both the model and the data.

Panel A: Non-jump parameters
  $\mu$     $\sigma$   $\rho$   $k$     $\bar{\gamma}$   $b$    $\alpha$
  0.0205    0.0182     0.047    0.13    34               20     39.4

Panel B: Variables related to jumps
  $\lambda$   $|\mu_b| = -\mu_b = -b$   mean($\lambda^Q$)   $J_C = e^{\mu_b} - 1$   mean($J_P$)
  0.022       0.172                     0.047               0.158                   0.51

Panel C: Non-option pricing moments
                Data      Model
  mean(PD)      26.2      25.9
  mean(rf)      1.46%     1.46%
  std(rf)       1.00%     3.13%
  mean(EP)      5.68%     5.88%
  mean(volR)    14.01%    16.05%

Panel D: Part of the option pricing moments (%)
  Moneyness     Data      Model
  0.90          25.4      27.4
  0.92          23.2      24.6
  0.94          21.2      21.8
  0.96          19.4      19.5
  0.98          17.8      17.7
  1.00          16.4      17.0
pattern, which is determined by the stock return process under the risk-neutral measure, while keeping the stock dynamics not so “jumpy” under the actual measure, since major economic events that cause severe stock market crashes are relatively rare. The 15.8% consumption jumps implied from the pricing data, which still look very large compared to the maximum 9.9% annual consumption contraction during the Great Depression, are much more in line with the empirical observations than those implied from peso problem models with standard CRRA preference (e.g., Barro, 2006). On the other hand, habit persistence induces much more dramatic crashes in the stock market. This feature, combined with the wedge between $\lambda^Q$ and $\lambda$, generates a high jump-risk premium and hence a pronounced volatility smirk, as discussed in Section 3.3.

Pan (2002) also estimates the stock price jumps using the joint data of the S&P 500 index and prices of ATM index options, but with a reduced-form model. By restricting the jump intensity to be the same under the actual and the risk-neutral measures, she excludes the jump-timing risk premium. In contrast, I impose the same jump sizes under the two measures in the base case and, hence, exclude the jump-size risk premium. Pan reports an average jump intensity of about 0.36 per year, which is much larger than my estimates, and average stock price jump sizes of 0.3% and 18% under the actual and the risk-neutral measure, respectively, which are much smaller than my estimates. The differences are not unexpected given the different setups. The implied negative skewness in the return distribution, on the other hand, is unlikely to be much different between the two models, since both Pan’s (2002) model and mine match the observed volatility smirk.
The estimates of Pan imply an average jump-size risk premium of 3.5%, which accounts for 38.9% of the total equity premium, while under my base-case calibration the average jump-timing risk premium is 2.46%, accounting for 41.8% of the total equity premium. My estimate for the relative contribution of the jump-risk premium is thus very much in line with Pan’s estimate.

Panels C and D of Table 2 report part of the moments that are used for calibration. The model closely matches the averages of the price–dividend ratio, the risk-free rate, and the equity premium. It generates volatilities that are higher than but still comparable to their data counterparts. For option pricing moments, the model matches well the prices of ATMs and near ATMs but somewhat overprices deep OTM puts. In summary, the model produces an average 17.0% ATM price and an average 10.4% smirk premium, measured by the price difference between 10% OTM puts and ATMs, compared to their data counterparts of 16.4% and 9.0%, respectively.
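As a rough illustration of the third-step criterion, the sketch below evaluates squared percentage differences on the moments that Table 2 reports. Two assumptions are mine rather than the paper's: each percentage difference is taken relative to the data value, and all moments are equally weighted. Because Panel D lists only six of the eleven option moments, the sum is a partial version of the criterion. The same numbers also reproduce the smirk premiums and the jump-risk share quoted in the text.

```python
# Data and model moments as reported in Table 2 (Panels C and D), in percent.
# mean(PD) is omitted because it is matched in the second calibration step.
reported_moments = {
    "mean(rf)":   (1.46, 1.46),
    "std(rf)":    (1.00, 3.13),
    "mean(EP)":   (5.68, 5.88),
    "mean(volR)": (14.01, 16.05),
    "M=0.90":     (25.4, 27.4),
    "M=0.92":     (23.2, 24.6),
    "M=0.94":     (21.2, 21.8),
    "M=0.96":     (19.4, 19.5),
    "M=0.98":     (17.8, 17.7),
    "M=1.00":     (16.4, 17.0),
}


def squared_pct_difference(data, model):
    """Assumed form of one term of the criterion: ((model - data) / data) ** 2."""
    return ((model - data) / data) ** 2


contributions = {name: squared_pct_difference(d, m)
                 for name, (d, m) in reported_moments.items()}
partial_criterion = sum(contributions.values())
largest_term = max(contributions, key=contributions.get)

# Smirk premium: price of the 10% OTM put minus the ATM price.
smirk_premium_data = reported_moments["M=0.90"][0] - reported_moments["M=1.00"][0]
smirk_premium_model = reported_moments["M=0.90"][1] - reported_moments["M=1.00"][1]

# Share of the model equity premium attributed to the jump-timing risk premium.
jump_share_pct = 100.0 * 2.46 / reported_moments["mean(EP)"][1]
```

Under these assumptions the std(rf) term dominates the partial sum, in line with the observation that the model-implied volatilities exceed their data counterparts.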
5. Model fit

In this section, I examine the model’s fit of the observed asset prices from three aspects: the fit of the average levels, the fit of the variations across states, and the fit of the variations across time. I first check the matches of the unconditional option pricing moments. Taking into account that prices are state-dependent, I then study the model implications in relation to data on the option price variations across economic states, which are proxied by volatility deciles. Finally, I examine the model’s fit of the time-series data.

5.1. Unconditional option pricing

Table 2 shows that the model matches well both the average ATM price and the average return volatility. In addition, the average ATM price is higher than the average return volatility in the model, which is also consistent with the data. To corroborate this result, I plot together in the top left panel of Fig. 8 the model-implied ATM prices with 30 days to expiration and volatilities as functions of the surplus ratio $S$, which shows that ATM options are priced with a premium relative to the volatilities of the underlying index across all economic states.

The top left panel of Fig. 7 plots the average prices of options with 30 days to expiration against moneyness, i.e., the unconditional volatility smirk, implied from both the model and the data, along with (plus and minus) the one-standard-deviation error band implied from the data. Given an option characterized by $\tau$ (in days) and $M$, I compute the data-implied standard deviation of its price as that of the prices of all option contracts in my sample period with moneyness and time to maturity falling into the ranges of $[M - 0.005, M + 0.005]$ and $[\tau - 10, \tau + 10]$, respectively. The calibrated model generates a pronounced smirk pattern and achieves fairly good matches of the data, except that the deep OTM puts are somewhat overpriced. As discussed in Section 3.3, the main driving forces are the excess volatility and the large compensation for jump risks, both of which are due to the existence of habit formation.

5.2. Conditional option pricing

In this subsection, I examine the fit of option prices conditioned on economic states. Since only unconditional moments are used to estimate the model, all matches reported in this subsection are out-of-sample. Given the benchmark estimates denoted by $\hat{\Theta}$, I use the daily option prices to back out the state for each of the trading days within my option sample period. More specifically, let $N_t$ be the number of option prices on day $t$, and let $O_n^{(d)}(t, \tau_n, M_n)$ and $O_n^{(m)}(t, \tau_n, M_n, \hat{\Theta}; S_t)$ be, respectively, the observed and the model price of the $n$th option ($n = 1, 2, \ldots, N_t$). I identify day $t$’s surplus consumption ratio, $\hat{S}_t$, by minimizing the sum of the squared pricing errors as follows:

$$\hat{S}_t = \arg\min_{S_t} \sum_{n=1}^{N_t} \left[ O_n^{(d)}(t, \tau_n, M_n) - O_n^{(m)}(t, \tau_n, M_n, \hat{\Theta}; S_t) \right]^2. \tag{26}$$
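The one-dimensional minimization in (26) can be sketched as follows. The function `toy_model_price` is a hypothetical stand-in for the model price $O_n^{(m)}$ (the actual model price has no closed form), and golden-section search is simply one convenient choice of scalar minimizer; the text does not state which routine was used. The search bounds are the range of backed-out surplus ratios quoted in the text.

```python
import math


def toy_model_price(tau, moneyness, surplus):
    """Hypothetical stand-in for the model price O^(m); decreasing in the surplus ratio.

    tau is accepted to mirror the arguments in (26) but is ignored by this toy function.
    """
    return 0.10 + 0.002 / surplus + 0.8 * (1.0 - moneyness)


def sum_squared_errors(surplus, quotes):
    """Objective in (26): squared gaps between observed and model prices on one day."""
    return sum((price - toy_model_price(tau, m, surplus)) ** 2
               for tau, m, price in quotes)


def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical one-day cross-section generated from a known state, then recovered.
true_surplus = 0.03
quotes = [(30, m, toy_model_price(30, m, true_surplus)) for m in (0.90, 0.95, 1.00)]
surplus_hat = golden_section_minimize(
    lambda s: sum_squared_errors(s, quotes), 0.0148, 0.0498)
```

Because the toy price is monotone in the surplus ratio, the objective has a single minimum on the interval and the search recovers the state used to generate the quotes. With real quotes the fit would not be exact, and the minimized error itself becomes informative.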
With $\{\hat{S}_t\}$ as the inputs, I compute the implied daily return volatility, $\{\widehat{\mathrm{vol}R}\}$, according to (23). Except for one day (July 22, 2002, for which $\hat{S}_t = 0.0138$), the range of the backed-out $\{\hat{S}_t\}$ is from 0.0148 to 0.0498, within which volR is monotonically decreasing in $S$, and hence, can be
[Figure 7. Four panels: Unconditional; Low volatility decile; Medium volatility decile; High volatility decile. Vertical axis: Black–Scholes implied volatility (%). Horizontal axis: Moneyness, from 0.9 to 1. Legend: Data; One std bound; Model.]

Fig. 7. Unconditional and conditional volatility smirk. The top left panel plots the average prices of options with 30 days to expiration against the moneyness, i.e., the unconditional volatility smirk, implied from both the model and the data, along with (plus and minus) the one-standard-deviation error band implied from the data. Given an option characterized by $\tau$ (in days) and $M$, I compute the data-implied standard deviation of its price as that of the prices of all option contracts in my sample period (from April 4, 1988 to September 30, 2008) with moneyness and time to maturity falling into the ranges of $[M - 0.005, M + 0.005]$ and $[\tau - 10, \tau + 10]$, respectively. The other three panels plot the volatility smirks implied from both the model and the data along with the one-standard-deviation error bands, which are conditioned on the low volatility days, the medium volatility days, and the high volatility days, respectively.
used as the equivalent state variable. This is consistent with the usual practice in reduced-form option pricing models, in which return volatilities, with exogenously specified dynamics, are used as the state.

Based on the implied $\{\widehat{\mathrm{vol}R}\}$, I sort the total of 5,166 trading days in my option sample period into ten volatility deciles, with days in the higher deciles experiencing higher return volatilities. Given a particular decile, I collect all available option contracts from days in that decile with moneyness and time to maturity falling into the ranges of $[M - 0.005, M + 0.005]$ and $[\tau - 10, \tau + 10]$, and compute the average and the standard deviation of their prices, which are used as the average and the standard deviation of the data-implied prices conditioned on the given volatility decile for the option characterized by $(\tau, M)$. The model-implied prices for the given decile are computed conditioned on the mid-point of the decile. The second and the third columns of Table 3 report the ranges and the mid-points, quoted in percentages, for each of the ten volatility deciles. In the following analysis, I interpret the second, the sixth, and the ninth deciles as the low volatility, the medium volatility, and the high volatility deciles, respectively.
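The sorting and bucketing procedure can be sketched as below. All inputs are hypothetical: the implied volatilities are random draws, the contract records use made-up field names (`day`, `tau`, `moneyness`, `price`), and the mid-point is taken as the average of a decile's lowest and highest volatility, which is my reading of "mid-point of the range" in Table 3.

```python
import random
import statistics


def sort_into_deciles(vols):
    """Split day indices into ten groups of near-equal size, lowest volatility first."""
    order = sorted(range(len(vols)), key=lambda i: vols[i])
    n = len(order)
    return [order[d * n // 10:(d + 1) * n // 10] for d in range(10)]


def range_and_midpoint(vols, members):
    """Lowest and highest volatility in a decile, and the mid-point of that range."""
    lo = min(vols[i] for i in members)
    hi = max(vols[i] for i in members)
    return lo, hi, 0.5 * (lo + hi)


def bucket_stats(contracts, days, tau, moneyness):
    """Mean and sample std of prices with |M_n - M| <= 0.005 and |tau_n - tau| <= 10."""
    picked = [c["price"] for c in contracts
              if c["day"] in days
              and abs(c["moneyness"] - moneyness) <= 0.005
              and abs(c["tau"] - tau) <= 10]
    if len(picked) < 2:
        return None  # a sample standard deviation needs at least two prices
    return statistics.mean(picked), statistics.stdev(picked)


random.seed(1)
# Hypothetical implied volatilities (%) for 5,166 trading days.
implied_vol = [random.uniform(3.3, 32.3) for _ in range(5166)]
deciles = sort_into_deciles(implied_vol)
summary = [range_and_midpoint(implied_vol, members) for members in deciles]

# Hypothetical contracts, used only to exercise the bucketing rule.
contracts = [
    {"day": 0, "tau": 30, "moneyness": 0.900, "price": 25.0},
    {"day": 0, "tau": 35, "moneyness": 0.902, "price": 27.0},
    {"day": 0, "tau": 30, "moneyness": 0.950, "price": 20.0},  # outside the moneyness band
    {"day": 1, "tau": 30, "moneyness": 0.900, "price": 99.0},  # day not in the decile
]
mean_price, std_price = bucket_stats(contracts, {0}, tau=30, moneyness=0.90)
```

With 5,166 days the deciles cannot be exactly equal in size, so this sketch lets them differ by one day. How ties and the remainder were handled in the original sort is not stated in the text.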
5.2.1. Volatility smirk

The top right and the bottom two panels of Fig. 7 plot the volatility smirks conditioned on the low volatility days, the medium volatility days, and the high volatility days, respectively, for options with 30 days to expiration that are implied from both the model and the data. I plot in the same panels (plus and minus) the one-standard-deviation error bands implied from the data, which, as expected, are much narrower than those for unconditional prices. The observed smirk patterns vary across states, and the model generally captures this variation. In particular, the match gets better as volatility increases. This is an attractive feature to practitioners, since options are usually more actively traded during periods of high return volatilities.

I further plot in the top right panel of Fig. 8 the implied smirk premiums, measured as the price differentials between 10% OTM puts and ATMs, against the time to maturity $\tau$ conditioned on the medium volatility decile. In both the data and the model, the smirk premium monotonically decreases in $\tau$. While the model somewhat overshoots the smirk premium for short-term options, the matches improve and become nearly perfect for longer-term options.
Table 3
Volatility deciles and state-dependent smirk premium. I sort the total of 5,166 trading days in my option sample period (from April 4, 1988 to September 30, 2008) into ten deciles according to the return volatilities backed out from the daily option data, where days in the higher deciles experience higher volatilities. Given a particular decile, I collect all available option contracts from days in that decile with moneyness and time to maturity (quoted in days) falling into the ranges of $[M - 0.005, M + 0.005]$ and $[\tau - 10, \tau + 10]$, and compute their average price as the data-implied price conditioned on the given volatility decile for the option characterized by $(\tau, M)$. The model-implied prices for the given decile are computed conditioned on the mid-point of the decile. I then compute the implied smirk premium, measured as the price difference between 10% OTM puts and ATMs. Columns 2–3 report the range and the mid-point of the range for each of the ten deciles. Columns 4–5 report the average smirk premium implied from both the data and the model under the base-case calibration for each of the ten deciles.

  Decile   Range in terms   Mid-point of    Data-implied   Model-implied
           of volR (%)      the range (%)   premium (%)    premium (%)
  1        3.3–9.4          6.3             9.5            12.0
  2        9.4–10.5         9.9             9.6            11.0
  3        10.5–11.7        11.1            9.2            10.6
  4        11.7–13.0        12.4            9.1            10.1
  5        13.0–14.4        13.7            8.5            9.5
  6        14.5–16.2        15.3            8.1            8.9
  7        16.2–17.9        17.1            7.9            8.3
  8        17.9–19.7        18.8            7.9            7.9
  9        19.7–22.6        21.1            7.3            7.4
  10       22.6–32.3        27.4            7.7            6.2
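The two premium columns of Table 3 can be compared with a short check. The sketch below uses only the reported values; the helper `is_nonincreasing` is introduced here for illustration. It tests whether each premium series declines from one decile to the next and measures the absolute gap between the model and the data.

```python
# Smirk premiums (%) by volatility decile, as reported in Table 3.
data_premium = [9.5, 9.6, 9.2, 9.1, 8.5, 8.1, 7.9, 7.9, 7.3, 7.7]
model_premium = [12.0, 11.0, 10.6, 10.1, 9.5, 8.9, 8.3, 7.9, 7.4, 6.2]


def is_nonincreasing(values):
    """True when no entry exceeds the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


model_declines = is_nonincreasing(model_premium)
data_declines_all = is_nonincreasing(data_premium)
data_declines_interior = is_nonincreasing(data_premium[1:9])  # deciles 2 through 9

# Absolute model-minus-data gap for each decile, in percentage points.
abs_gap = [abs(m - d) for m, d in zip(model_premium, data_premium)]
```

On the reported values, the model premium falls in every step, the data premium falls across deciles 2 through 9 but not at either end, and the gap is widest in the lowest volatility decile.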
[Figure 8. Four panels. Top left: volR and ATM B/S-vol against surplus consumption. Top right: smirk premium in medium volatility days against time to maturity (days); legend: Data, Model. Bottom left (M = 0.9) and bottom right (M = 1): option prices against time to maturity (days); legend: Data, One std bound, Model, Model with no jump.]

Fig. 8. State-dependent ATM prices, smirk premium, and term structure. The top left panel plots together the model-implied state-dependences of ATM prices of options with 30 days to expiration and the return volatility. The top right panel plots the smirk premiums, measured as the price difference between 10% OTM puts and ATMs, against the time to maturity $\tau$ conditioned on the medium volatility days. The bottom two panels plot option prices conditioned on the medium volatility decile against the time to maturity $\tau$, i.e., the option term structure, under two degrees of moneyness, $M = 0.9$ and $M = 1$, implied from both the model and the data, along with (plus and minus) the one-standard-deviation bands. I also plot in the same panels the model-implied term structure when jumps are shut down.
In the model, as in the data, the smirk premium varies across economic states proxied by the volatility deciles, as reported in the last two columns of Table 3 for options with 30 days to expiration. As explained in Section 3.4, there are two offsetting effects on the model-implied smirk premium as we go from bad states to good states: the decreasing jumps, which depress the hedging motivations; and the decreasing diffusive return volatility, which induces stronger incentives to hedge against jumps. The second effect dominates, and the model predicts that the smirk premium decreases as we go from lower volatility deciles (with higher $S$) to the higher ones (with lower $S$). Except for the first and the last deciles, the data largely support this prediction.

Both LPW and BCDG study the unconditional volatility smirk, but neither considers volatility smirks conditioned on states. This is natural for the LPW model, which, like Naik and Lee (1990), does not have a state variable. While the persistent component $x$ in the expected consumption growth rate serves as the state variable in BCDG, its variations induce little price change. As reported by BCDG, option prices computed at values of $x$ up to ±3 standard deviations away from its steady-state value turn out to be very similar to each other. In summary, neither LPW nor BCDG is able to generate the observed variations in volatility smirks across different economic states.

David and Veronesi (2002) consider an economy in which the expected growth rates in dividends switch between two unobservable states, interpreted as “booms” and “recessions”, in which the representative agent uses past realizations of dividends to infer the current drift rate. The implied volatility smirk is state-dependent in their model as well. In particular, their model generates the normal positive smirk premium during economic expansions, but a negative smirk premium during recessions, which implies that ATMs are more expensive than OTM puts.
In contrast, the smirk premium is always positive in my model, and its magnitude varies across states in a way that is largely consistent with the data.

5.2.2. Term structure

While option pricing differentials across moneyness have been extensively studied in the literature, relatively little attention has been paid to pricing differentials across times to maturity, i.e., the option term structure. In the bottom two panels of Fig. 8, I plot option prices conditioned on the medium volatility decile against the time to maturity $\tau$ for two degrees of moneyness, $M = 0.9$ and $M = 1$, implied from both the model and the data, along with the one-standard-deviation bands. To facilitate exposition, I define the “term premium” as the price difference between options with 130 days to expiration and options with 30 days to expiration at the same moneyness. Consistent with the data, the term premium is negative for deep OTM puts but turns positive for ATMs. Intuitively, the agent is more concerned about jump risks in the near term, which drives up the equilibrium prices of short-term deep OTM puts relative to their long-term counterparts. On the other hand, the longer the time to maturity of ATMs, which are not used for hedging, the greater the chance that jumps will kick in to enhance the option values.
To further illustrate the effects of jumps, I plot in the same two panels the model-implied term structure when jumps are shut down. Apart from the expected changes in levels, the term premium turns positive for deep OTM puts, leading to a counterfactually increasing term structure. On the other hand, the term premium becomes less positive for ATMs. While the impacts are relatively small compared to those for volatility smirks, this exercise shows that potential jumps are important to generate the observed cross-sectional option pricing differentials, not only along the dimension of moneyness but also along the dimension of time to maturity, an issue that was not explored before.

5.3. Time-series properties

The model makes the strong prediction that all time variations are driven by a single state variable, the surplus ratio $S$, or equivalently, the risk aversion $\gamma$. Below, I back out $\gamma$ from the historical data of aggregate consumption, option prices, and the model-free realized volatilities, and investigate the implied time-series properties.

5.3.1. Risk aversion backed out of consumptions

In habit formation models, an important restriction is that asset pricing variations are ultimately driven by consumption shocks. To test it, I obtain historical innovations in the real consumption growth rate, and approximate the innovations appearing in the $\gamma$ process of (2.2) by $\Delta c_{t+1} - E_t[\Delta c_{t+1}]$. Note that $\gamma$ reacts to diffusive and jump innovations in the same way. Assuming that $\gamma$ starts at the steady state at the beginning of the sample, I back out the time series of $\gamma$, from which I compute the time series of the PD ratio according to (2.9), which is denoted by $PD_t^{(m)}$. I then run the time-series regression of the form

$$\ln(PD_t^{(m)}) = a_0 + a_1 \ln(PD_t^{(d)}), \tag{27}$$

where $\{PD_t^{(d)}\}$ denotes the historical price–dividend ratios. To make the results more reliable, I obtain consumption and PD data before 1959 from Robert Shiller’s web site, and $\{PD_t^{(m)}\}$ and $\{PD_t^{(d)}\}$ under analysis are both from 1890 to 2007 at the annual frequency. The correlation between the model-implied and the historical PD series is 0.32, compared to 0.34 found by CC. The regression coefficient $\hat{a}_1$ equals 0.50, which is significantly different from zero at the 5% level with a (Newey–West adjusted) standard error of 0.15. The model is thus able to generate a substantial component of the historical stock price data, although not completely. These matches relieve the concern about shuffling the asset pricing puzzles to the state process, since I do not calibrate the pricing kernel to fit the historical pricing data for the given consumption process. An arbitrarily “reverse engineered” model clearly cannot replicate to a large extent the historical PD ratios with only consumption data.

5.3.2. Risk aversion backed out of option prices

The model predicts that a high risk aversion is associated with a high return volatility. To test this property with the time-series data, I back out the surplus
ratio from the daily option prices according to (26), from which I obtain the implied daily $\gamma$, denoted by $\{\hat{\gamma}_t\}$. The key step is to construct the model-free volatility measure, which aims to yield the historical volatilities in an objective way. Denote by $V^n_{t,t+\Delta}$ the realized volatility computed by summing the squared high-frequency returns over the $[t, t+\Delta]$ interval, i.e.,

$$V^n_{t,t+\Delta} \equiv \sum_{i=1}^{n} \left( s_{t+(i/n)\Delta} - s_{t+((i-1)/n)\Delta} \right)^2, \tag{28}$$

where $s$ denotes the log stock price. By the theory of quadratic variation (e.g., Barndorff-Nielsen and Shephard, 2002),

$$\lim_{n \to \infty} V^n_{t,t+\Delta} = V_{t,t+\Delta} \equiv \int_t^{t+\Delta} V_s \, ds, \tag{29}$$

where $V_t$ denotes the instantaneous volatility incorporating both the diffusive and the jump components. As demonstrated by Barndorff-Nielsen and Shephard (2002), the model-free volatility $V^n_{t,t+\Delta}$ thus computed offers a much more accurate ex post observation of the actual volatility than the more traditional sample variances based on daily or coarser frequency data.

I construct the daily series of $\hat{V}_{t,t+\Delta}$ based on the five-minute S&P 500 index data from April 1988 to September 2008, where the five-minute sampling frequency strikes a reasonable balance between the desire for finely sampled returns and the contaminating effects of market microstructure frictions. The two series, $\{\hat{\gamma}_t\}$ and $\{\hat{V}_{t,t+\Delta}\}$, have a correlation of 0.69, providing empirical evidence that a high risk aversion is associated with a high volatility. This exercise shows that the time-varying risk aversion induced by habit persistence not only generates the average level of the return volatilities but also explains a substantial component of their time variations.

5.3.3. Risk aversion backed out of volatilities

Due to the unusually high PD ratios since the early 1990s, a significant portion of the observed PDs in my sample are beyond the PD range implied by the calibrated model. To avoid this difficulty, I choose instead to back out the risk aversions from the historical volatilities proxied by $\{\hat{V}_{t,t+\Delta}\}$, and check whether the implied PD ratios match their historical values.7 Since PD data are only available at the monthly frequency, I obtain monthly $\{\hat{V}_{t,t+\Delta}\}$ from February 1983 to December 2007, which are inverted to obtain the implied monthly risk aversions $\{\hat{\gamma}_t\}$. Five out of the 299 $\hat{V}$ are beyond the volatility range implied by the model and are deleted from the sample. The implied PDs from the backed-out $\{\hat{\gamma}_t\}$ have a correlation of 0.38 with the historical PDs, suggesting that a significant component of variations in the PD ratios and the volatilities is driven by the evolution of a single state, which is predicted by this model.

7 As pointed out by Lettau, Ludvigson, and Wachter (2008), the unusually high PD ratios can be justified by a model incorporating a shift to substantially lower consumption volatility at the beginning of the 1990s. While introducing a regime switch helps generate PD ratios above their previous historical norms in the more recent period, it does not change the prediction that risk aversion and PD ratios are negatively correlated, as implied by single-state models such as mine.

6. Further discussions and robustness

6.1. Is different pricing of diffusive and jump risks crucial for option pricing?
In my model, habit innovations, induced by both diffusions and jumps, are perfectly correlated with consumption innovations. Therefore, the implied pricing kernel effectively prices the diffusive and the jump risks in the same way. This raises the question of why my model is capable of resolving the option-premium puzzle, since LPW state that a successful equilibrium model for options should be able to price jump risk separately from the diffusive risk. Below, I first briefly review LPW’s argument, which is based on the Naik and Lee (1990) model. I then show that their arguments do not necessarily generalize to other models.

In Naik-Lee, a common parameter, i.e., the risk aversion $\gamma$, controls the prices of both diffusive and jump risks. Using jump parameters close to those estimated by Pan (2002) and matching the model to the stock market volatility and a total of 8% equity premium, LPW show that the implied volatility smirk is virtually flat, and the option-implied $\gamma$ for the jump risk is considerably larger than that for the diffusive risk. To reconcile this discrepancy, LPW add an additional layer of uncertainty, the model uncertainty, to the jump component but not to the diffusive component, and show that the implied smirk pattern becomes much more pronounced. LPW thus conclude that to explain the option smirk premium, the pricing kernel needs to differentiate the pricing of jump risks from that of the diffusive risks.

An important part of LPW’s argument is that they calibrate the consumption volatility $\sigma$ to the stock market. In other words, they calibrate Naik-Lee conditional on the excess volatility puzzle. As a result, the implied jump-risk premium (0.2%) is quite small, which explains the virtually flat smirk pattern. To illustrate that LPW’s conclusion does not generalize to other models, the simplest way is to consider a pseudo-model that has the same pricing kernel as Naik-Lee but delivers a stock return volatility $\sigma_R$ different from $\sigma$.
In other words, the pseudo-model is capable of “resolving” the excess volatility puzzle. This difference is critical, because a separate calibration of $\sigma$ and $\sigma_R$ helps reconcile the discrepancy in the implied $\gamma$ for the jump risk and for the diffusive risk.8 Indeed, setting $\sigma$ at 2% and $\sigma_R$ at 15%, using the same jump parameters, and searching for the $\gamma$ that matches the 8% equity premium, I find that the implied volatility smirk becomes much more pronounced than that implied from Naik-Lee; the two are plotted together in the top left panel of Fig. 9.

Why does the pseudo-model price options better? The reason is that the implied $\gamma$ is much higher at 20.1 (vs. 3.47 in Naik-Lee), which generates a much larger jump-risk premium of 1.97% (vs. 0.20% in Naik-Lee). In fact, the implied jump-risk premium from the pseudo-model is close to the 2.09% compensation for the jump-related risks9 in the high uncertainty aversion ($\phi = 20$) case of LPW reported in their Table 1, which explains why the smirk patterns implied from these two cases are similar to each other. Indeed, the pseudo-model generates 15.5% (ATM) and 17.1% (10% OTM), compared to 15.6% and 17.2% implied from the high uncertainty aversion case, which is plotted in the top left panel of Fig. 1 in LPW.

8 It is worth noting that the “pseudo-model” can also be interpreted as an alternative calibration to the Naik-Lee model: instead of calibrating both $\sigma$ and $\sigma_R$ to the stock market data as in LPW, we can calibrate them separately to the consumption data and to the stock market data. This calibration, while in violation of the implication that $\sigma = \sigma_R$, seems to make more sense for an equilibrium model in which the prices of risks are determined not by the return dynamics, but by the dynamics of the economic fundamentals. Calibrating $\sigma$ to the S&P 500 index market is thus equivalent to letting evidence from the equity market guide the specification of the diffusive risk, as people do with a reduced-form model, which conflicts with the fact that the Naik-Lee model is built in an equilibrium setting.

9 In the case with uncertainty aversion, compensation for jump-related risks consists of two parts: the usual jump-risk premium, and the premium for the aversion against the model uncertainty in jumps, which is called the rare-event premium in LPW.

[Figure 9. Three panels, titled $\lambda = 1/3$; $\mu_b = -1\%$, $\lambda = 1/25$; $\mu_b = -10\%$, and $\lambda = 1/100$; $\mu_b = -20\%$. Vertical axis: Black–Scholes implied volatility (%). Horizontal axis: Moneyness, from 0.9 to 1. Legend: Naik–Lee; The pseudo-model.]

Fig. 9. Naik-Lee vs. a pseudo-model that “resolves” the issue of excess volatility. This figure plots together the implied volatility smirk from the model of Naik and Lee (1990) and a pseudo-model that has the same pricing kernel as Naik-Lee but delivers a stock return volatility different from the consumption growth rate volatility. In particular, I plot the volatility smirks under three jump scenarios, where $\lambda$ denotes the jump intensity, and $\mu_b$ denotes the jump size of the log consumption growth rate.
To show the robustness of the result, I try alternative jump parameters as reported in Tables 2 and 3 of LPW, and plot the implied volatility smirks from both the pseudo-model and the Naik-Lee model in the bottom two panels of Fig. 9. In each case, I search for the $\gamma$ that matches the 8% equity premium. Again, the pseudo-model generates much more pronounced smirk patterns than Naik-Lee, and the patterns are similar to those implied from the high uncertainty aversion case of LPW with the same jumps. Commensurate with the increase of the smirk premiums as potential jumps become more severe, the associated jump-risk premiums in the pseudo-model increase from 1.97% to 2.90% and then to 3.76%.

The pseudo-model, albeit rather contrived, illustrates that the failure of Naik-Lee to account for the volatility smirk lies not in the same pricing of diffusive and jump risks, but in its inability to generate a large compensation for jump risks after being calibrated conditional on the excess volatility puzzle. My base model provides another example in which the prices of diffusive and jump risks are both determined by a common parameter, $\alpha$, controlling the sensitivity of risk aversion to consumption shocks. In contrast to Naik-Lee, my model can generate a large jump-risk premium due to habit formation and, hence, the implied volatility smirk. Similar logic applies to CC’s model augmented with a peso component: while both
Table 4 Prices of diffusive and jump risks in various models. Table 4 reports the prices of diffusive and jump risks in various models, where I use s and Z to denote the diffusive consumption volatility and consumption jump size for all models. In models of both Naik and Lee (1990) and LPW, g denotes the risk aversion. In LPW’s 2
2
model, the extra term ea þ bZbmJ ð1=2Þb sJ is due to the uncertainty aversion toward jumps. In my base model, gt denotes the time-varying risk aversion; a controls the sensitivity of risk aversion to consumption shocks. In CC’s model augmented with a peso component, lðst Þ denotes the sensitivity function controlling the sensitivity of the surplus ratio to consumption shocks, which is defined by Eq. (10) in CC. Note the local risk aversion in CC is given by g=St , not by g. Price of diffusive risk
Price of jump risk
Naik-Lee LPW
gs gs
1eZg
My base case
g b sþa t s gt gs þ glðst Þs
CC with jumps
2 2 Z g a þ bZbmJ ð1=2Þb sJ
1e
e g b 1eZ 1a t Z
gt
1egZ eglðst ÞZ
6.2. Alternative habit specifications At the heart of habit formation models is the timevarying risk aversion that amplifies consumption shocks. By adding a peso component to the consumption growth rate, habit formation of all specifications are likely to generate a large jump-risk premium, hence, a large smirk premium. In this subsection, I give two concrete examples: (i) the MSV model under an alternative calibration so that the implied habit dynamics mimic those from CC habit specification; (ii) the original CC model with a peso component, and I show that they are both able to produce pronounced smirk patterns. 6.2.1. CC habit specification In CC, time is discrete, and the representative agent has the following utility: ðCt Ht Þ1g 1 , 1g
ð30Þ
where Ct and Ht denote consumption and the habit level, respectively. Since habit formation is external, the pricing 10 lðst Þ controls the sensitivity of the surplus ratio to consumption shock, and it is defined by Eq. (10) in CC.
ð31Þ
where St ðCt Ht Þ=Ct denotes the surplus ratio. CC report that in terms of the implied stock pricing, the model with i.i.d. log dividend/consumption process is almost indistinguishable from the model treating consumption and dividend as a single process. For simplicity, I adopt the simplified assumption, under which the log consumption ( = dividend) evolves according to
Dc ¼ g Dt þ sDz,
risks are controlled by the common sensitivity function lðst Þ,10 the CC habit model is able to produce pronounced smirk patterns as well through the induced large compensation for jumps. Section 6.2.3 gives more detailed discussions about the smirk pattern implied from CC habit. To facilitate comparison, I list in Table 4 the prices of diffusive and jump risks under various models just discussed. In summary, so long as it can generate a wide wedge between the actual and the risk-neutral jumps in the form of a large jump-related premium, differentiating the pricing of diffusive and jump risks is no longer crucial for an equilibrium model to replicate the observed volatility smirk.
ert uðCt ,Ht Þ ¼ ert
kernel is given by g S C Lt ¼ ert t þ 1 t þ 1 , St Ct
ð32Þ
where Dz denotes a random variable with the standard normal distribution. In addition, CC assume that the log surplus ratio s logðSÞ follows pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 12ðlogSt sÞ1 sDz Dst ¼ ð1fÞðslogSt ÞDt þ S for St rSmax , ð33Þ where 8 rffiffiffiffiffiffiffiffiffiffiffi g > > >S s , < 1f > 1 2 > > : smax s þ ð1S Þ ¼ lnSmax : 2
ð34Þ
I ignore the part for St 4Smax since realizations of St rarely fall beyond Smax. Finally, CC choose m ¼ 1:89%, s ¼ 1:5%, f ¼ 0:87, g ¼ 2, and r ¼ logðdÞ ¼ 0:1165. 6.2.2. Mimicking CC habit External habit formation models differ in their specifications about how the surplus ratio S (or a function of it) reacts to consumption innovations, which is given by Eq. (3) for MSV habit and by Eq. (33) for CC habit. To make the two processes more comparable, I apply Ito’s lemma and Euler discretization to rewrite (3) as a2 Ds ¼ kðg Sgt 1Þ þ ð1bSt Þ2 s2 Dt þ að1bSt ÞsDz 2 for St r b
1
,
ð35Þ 1
where s =ln (S); St r b follows from gt Z b; I shut down jumps to facilitate the comparison with CC. Although (35) and (33) denote two different processes, the implied habits can follow similar dynamics if within the range of S the drift and the diffusion components in (35) approximate their counterparts in (33). This is the problem of functional fitting, and the basic idea is to transform it into the problem of fitting the coefficients. More specifically, I approximate the drifts and the diffusions in both (35) and (33) as linear combinations of a set of n independent basis functions, where the associated coefficients are sometimes referred to as ‘‘interpolation coefficients’’. Combine interpolation coefficients for the drift and the diffusion, and denote them by c_CC 2 Rn2 and c_SV 2 Rn2 for CC and MSV habit models, respectively. I select parameter values in MSV to minimize the distance between c_CC and c_SV under certain metrics. By using the common basis functions, a smaller distance between c_CC and c_SV implies a better match of the two s-processes in (35) and (33).
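As a small numerical companion to Eqs. (33)–(34), the sketch below evaluates the steady-state surplus ratio, the upper bound $S_{\max}$, and the bracketed sensitivity term in (33) at the CC parameter values quoted above. It assumes $\bar{s} = \log\bar{S}$, as in CC; the function names are hypothetical and chosen only for illustration.

```python
import math

def cc_steady_state(sigma, gamma, phi):
    """Return (S_bar, s_bar, s_max, S_max) following Eq. (34).

    Assumes s_bar = log(S_bar), which is not restated in this section.
    """
    S_bar = sigma * math.sqrt(gamma / (1.0 - phi))
    s_bar = math.log(S_bar)
    s_max = s_bar + 0.5 * (1.0 - S_bar ** 2)
    return S_bar, s_bar, s_max, math.exp(s_max)

def cc_sensitivity(s, S_bar):
    """Bracketed coefficient on sigma * dz in Eq. (33), valid for s <= s_max."""
    return math.sqrt(1.0 - 2.0 * (s - math.log(S_bar))) / S_bar - 1.0

# CC values quoted in Section 6.2.1
S_bar, s_bar, s_max, S_max = cc_steady_state(sigma=0.015, gamma=2.0, phi=0.87)
print("S_bar =", S_bar, " S_max =", S_max)
print("sensitivity at s_bar:", cc_sensitivity(s_bar, S_bar))
print("sensitivity at s_max:", cc_sensitivity(s_max, S_bar))
```

By construction the sensitivity equals $1/\bar{S} - 1$ at $s = \bar{s}$ and falls to zero at $s = s_{\max}$, which is why the specification is only used for $S_t \le S_{\max}$.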
Table 5
The alternative calibration of MSV habit that mimics CC habit dynamics.
Table 5 reports the alternative calibration of MSV habit that mimics CC habit dynamics governed by the calibrated s-process, where s denotes the log surplus ratio. To mimic the s-process in CC with the s-process implied from the MSV specification, I write the drifts and the diffusions in both processes as linear combinations of common basis functions, where the implied coefficients are referred to as interpolation coefficients. I then choose parameters to minimize the distance between the two sets of coefficients under a certain metric. Panel A reports the parameters that enable MSV to best match CC. Panel B reports the matches in terms of the implied interpolation coefficients, where rows 3–12 list the coefficients on Chebyshev polynomials from the first order to the tenth order.

Panel A: Non-jump parameters
$\mu$      $\sigma$   $\rho$     $\kappa$   $\bar{\gamma}$   $\beta$   $\alpha$
1.89%      1.5%       0.1165     0.172      17.8             10.7      45.0

Panel B: Implied interpolation coefficients
Order   Drift (CC)   Drift (MSV)   Diffusion (CC)   Diffusion (MSV)
1       0.113        0.113         22.46            22.49
2       0.258        0.258         22.01            22.49
3       0.119        0.029         3.879            0.00
4       0.073        0.000         4.024            0.00
5       0.049        0.000         1.656            0.00
6       0.035        0.000         1.678            0.00
7       0.025        0.000         0.831            0.00
8       0.017        0.000         0.772            0.00
9       0.011        0.000         0.360            0.00
10      0.005        0.000         0.229            0.00
Parameters governing the consumption dynamics and the subjective discount rate are directly matched to the CC calibration. To ensure the same range of S, I impose $\beta^{-1} = S_{\max}$, where $S_{\max}$ is given by (34). Next, I select values of $\{\bar{\gamma}, \alpha, \kappa\}$ to minimize
\[ \sum_{i=1}^{2} \left[c_{CC}(:,i) - c_{MSV}(:,i)\right]' \, W \, \left[c_{CC}(:,i) - c_{MSV}(:,i)\right], \tag{36} \]
where W is the weighting matrix; $c_{CC}(:,1) \in R^{n \times 1}$ and $c_{CC}(:,2) \in R^{n \times 1}$ denote the interpolation coefficients for the drift and the diffusion of the s-process under CC habit, and similarly for $c_{MSV}(:,i)$. I choose Chebyshev-node polynomials as the basis functions and I set n = 10. Since higher order polynomials are less important for the Chebyshev interpolation, I assign the weight $10^{10-i}$ to the coefficient associated with the ith order polynomial (i = 1,…,10), with the off-diagonal elements in W all set to zero. Panel A of Table 5 reports the implied calibration. Compared with the base-case calibration, the implied habit dynamics are apparently different. I report in Panel B of the same table the degree of functional fitting in terms of the implied interpolation coefficients. The minimization procedure matches the coefficients of the first two orders well, but has difficulty in replicating the higher order coefficients. I further illustrate the matches by plotting in the top two panels of Fig. 10 the implied drift and diffusion functions from both (33) and (35) under their respective calibrations. For comparison, I plot in the same panels the implied drift and diffusion functions from my base-case calibration with jumps shut down. While the mimicking is good in terms of the basic shapes, it misses higher order curvatures. Still, the implied habit dynamics, the diffusion component in particular, come much closer to those implied from CC than do those implied from my base model.
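The coefficient-matching idea behind (36) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for Table 5: the n = 10 basis functions are taken to be the Chebyshev polynomials $T_0,\dots,T_9$ (labelled i = 1,…,10), the fitting interval is assumed to be $[0, \beta^{-1}]$, the diffusion is taken as the coefficient multiplying $\sigma\Delta z$, and $\bar{s} = \log\bar{S}$ is assumed for CC. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

SIGMA, PHI, GAMMA = 0.015, 0.87, 2.0                 # CC values from Section 6.2.1
KAPPA, GBAR, BETA, ALPHA = 0.172, 17.8, 10.7, 45.0   # Table 5, Panel A
S_BAR = SIGMA * np.sqrt(GAMMA / (1.0 - PHI))         # Eq. (34)

def cc_drift(S):   # drift of s in Eq. (33), assuming s_bar = log(S_BAR)
    return (1.0 - PHI) * (np.log(S_BAR) - np.log(S))

def cc_diff(S):    # coefficient on sigma*dz in Eq. (33)
    return np.sqrt(1.0 - 2.0 * (np.log(S) - np.log(S_BAR))) / S_BAR - 1.0

def msv_drift(S):  # drift of s in Eq. (35)
    return -KAPPA * (GBAR * S - 1.0) + 0.5 * ALPHA**2 * (1.0 - BETA * S)**2 * SIGMA**2

def msv_diff(S):   # coefficient on sigma*dz in Eq. (35)
    return ALPHA * (1.0 - BETA * S)

def cheb_coeffs(f, lo, hi, n=10):
    """Coefficients on T_0..T_{n-1} interpolating f at n Chebyshev nodes on [lo, hi]."""
    x = C.chebpts1(n)                              # nodes on [-1, 1]
    S = 0.5 * (hi - lo) * x + 0.5 * (hi + lo)      # map nodes to [lo, hi]
    return C.chebfit(x, f(S), n - 1)

def weighted_distance(c_a, c_b):
    """Objective (36) with diagonal W and weight 10**(n - i) on the ith coefficient."""
    n = c_a.shape[0]
    w = 10.0 ** (n - np.arange(1, n + 1))
    return float(np.sum(w[:, None] * (c_a - c_b) ** 2))

lo, hi = 0.0, 1.0 / BETA                           # assumed fitting interval
c_cc = np.column_stack([cheb_coeffs(cc_drift, lo, hi), cheb_coeffs(cc_diff, lo, hi)])
c_msv = np.column_stack([cheb_coeffs(msv_drift, lo, hi), cheb_coeffs(msv_diff, lo, hi)])
print(np.round(c_msv[:4], 3))
print("objective (36):", weighted_distance(c_cc, c_msv))
```

Because the MSV drift is quadratic and the MSV diffusion is linear in S, their coefficients vanish beyond the third and second entries, respectively, which is the pattern visible in the MSV columns of Panel B. To search over $\{\bar{\gamma}, \alpha, \kappa\}$, `weighted_distance` could be wrapped in a function of those parameters and passed to a numerical optimizer.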
6.2.3. Option pricing implications
Given the alternative habit dynamics mimicking those from the CC specification, I add the peso component at its base-case calibration and report in the bottom left panel of Fig. 10 the state-dependent ATM prices and return volatilities. Consistent with the data, ATM options are priced with premiums relative to the total market volatilities over all states. I also plot in the same panel the state-dependent volatility and jump intensity under the risk-neutral measure, which look similar to their base-case counterparts. The bottom right panel plots the implied average volatility smirk, where the smirk premium (11.9%) is somewhat higher than that (10.4%) in the base model. As expected, I find that the implied jump-risk premium accounts for 50.5% of the total equity premium, which is also somewhat higher than its base-model counterpart (41.8%).

I also examine the option pricing implication from the original CC model augmented with a peso component. One issue is that the sensitivity function in CC, i.e., $\lambda(s_t)$ defined by Eq. (10) in their paper, rises very quickly as the surplus ratio S decreases, which is illustrated in their Fig. 1. The amplifying effect is thus very large for low S. In particular, a mild consumption jump induces extreme jumps in the pricing kernel during bad economic states, implying very large wedges between the actual and the risk-neutral jumps, which lead to counterfactually high option prices with implied volatility well above 100%. To gauge the quantity, recall that the pricing kernel M in CC satisfies
\[ M_{t+1} \propto \exp\left(-\gamma[\lambda(s_t) + 1]\,c_{t+1}\right), \tag{37} \]
which implies the following jump size of M:
\[ J_M + 1 = \exp\left(-\gamma[\lambda(s_t) + 1]\,\mu_b\right) \tag{38} \]
when the peso component as in the base model is added. Using the CC calibration for habit with jump parameters set at their base levels, $J_M + 1 = 513$ at $s_t = \bar{s}$. For comparison, the largest pricing kernel jump size in my base model (when $S \to 0$) is below 10.0. To avoid extreme numbers, I focus on option prices conditioned on good states under which $\lambda(s_t)$ and, hence, $J_M$ are relatively small. In particular, I choose the s at which the implied $J_M$ matches the average jump size of the pricing kernel in my base model. The selected state denotes a very good economic condition under which $\lambda(s)$ equals 2.27. The implied volatility smirk is also plotted in the bottom right panel of Fig. 10, and is again highly pronounced. Note the implied ATM price is very small, because return volatility in CC is very low during good states, as illustrated in their Fig. 5. The above results corroborate the point made at the start of this section: the implied smirk pattern for options is robust to different habit specifications.
[Fig. 10, four panels. Top left: "The drift in the log surplus ratio process", plotted against the surplus ratio for CC, mimicking CC, and the base case. Top right: "The diffusion in the log surplus ratio process", same three series against the surplus ratio. Bottom left: ATM B/S-vol, vol_R, vol_R^Q, and λ^Q against the surplus ratio. Bottom right: "Implied volatility smirks when there are jumps", Black–Scholes implied volatility (%) against moneyness, for MSV habit mimicking CC habit and the original CC habit.]
Fig. 10. Robustness check on CC habit dynamics. The top two panels plot, respectively, the drifts and the diffusions of the log surplus-ratio process as functions of the surplus ratio S, implied from (i) CC habit dynamics, (ii) an alternative calibration of MSV habit that mimics CC habit dynamics, and (iii) the base-case calibration with jumps shut down. The bottom left panel plots together the state-dependences of ATM prices, the return volatilities under both the actual measure, vol_R, and the risk-neutral measure, vol_R^Q, and the jump intensity under the risk-neutral measure, λ^Q, implied from the mimicking calibration. The bottom right panel plots together the volatility smirks for options with 30 days to expiration implied from (i) MSV habit that mimics CC habit, and (ii) the original CC habit, where in both cases I add a peso component calibrated at the base-case level.
6.3. Effects of randomness in consumption jump size
In the base model, I assume a constant consumption jump size b by setting $\sigma_b$ to zero. A natural question is whether and how option prices will change when we add randomness to the jump size. To answer it, I set $\sigma_b = \frac{1}{3}|\mu_b| = 5.73\%$, representing a fairly volatile consumption jump size, and I plot in the top left panel of Fig. 11 the implied average volatility smirk for options with 30 days to expiration under the two scenarios of $\sigma_b$, which shows that the implied option prices are virtually unchanged. To see why, I plot in the top right panel of the same figure the state-dependent jump-risk premiums under the same two scenarios, which again show little difference, particularly for large S that are more likely to occur. Adding randomness to the consumption jump size thus induces little change in both the jump-risk compensation and the implied smirk pattern.

The above results look a little surprising, because with a high risk aversion ($\gamma = 34$ in the base model), the Naik-Lee model would imply significant changes in option prices when we raise $\sigma_b$. Indeed, using jump parameters from Table 1 of LPW ($\mu_b = -1\%$, $\lambda_b = 1/3$), I find that the implied volatility smirk from Naik-Lee changes dramatically when I raise $\sigma_b$ from zero to 5.73% with $\gamma$ set at 34, which is plotted in the bottom left panel of Fig. 11. The reason is that with the standard CRRA preference, the jump-risk premium in Naik-Lee is sensitive to risk aversion when the consumption jump size is random. To illustrate it, I vary $\gamma$ and plot in the bottom right panel of Fig. 11 the jump-risk premium, $EP_j$, implied from Naik-Lee under the two $\sigma_b$ scenarios. $EP_j$ rises quickly with $\gamma$ when $\sigma_b > 0$. In contrast, $EP_j$ barely changes conditioned on a constant jump size. When $\gamma$ is high, the substantial difference in jump-risk compensations between the two $\sigma_b$ scenarios explains the large discrepancy in the generated smirk patterns.

The contrast between the top two and the bottom two panels of Fig. 11 is striking, and the reason is that, unlike Naik-Lee, raising $\sigma_b$ in my model with habit formation does not induce significant changes in the compensation for jump risks even though the average risk aversion is fairly high. The best way to illustrate this critical difference is to consider the limit case where risk aversion approaches infinity. Given $\gamma$, denote by $\Delta EP_j^{NL}(\gamma; \bar{\sigma}_b)$ and $\Delta EP_j^{H}(\gamma_t; \bar{\sigma}_b)$ the changes of the jump-risk premium in Naik-Lee and in my model, respectively, following the increase of $\sigma_b$ from zero to some $\bar{\sigma}_b > 0$. In Appendix A.2, I show that $\Delta EP_j^{NL}(\gamma; \bar{\sigma}_b) \to \infty$ as $\gamma \to \infty$. In contrast, $\Delta EP_j^{H}(\gamma_t; \bar{\sigma}_b)$ approaches the number $A(\bar{\sigma}_b)$ given by (61)–(62) when $\gamma_t \to \infty$. As a result, the implied smirk pattern from my model is much less sensitive to randomness in the consumption jump size even when the agent is highly risk averse.

[Fig. 11, four panels, each showing the scenarios σ_b = 0 and σ_b = 5.73%. Top left: "Unconditional volatility smirk from my model", Black–Scholes implied volatility (%) against moneyness. Top right: "Jump-risk premium from my model", jump-risk premium (%) against the surplus ratio. Bottom left: "Volatility smirk from Naik–Lee model", Black–Scholes implied volatility (%) against moneyness. Bottom right: "Jump-risk premium from the Naik–Lee model", jump-risk premium (%) against risk aversion.]

Fig. 11. Effects of randomness in the consumption jump size. The top two panels plot, respectively, the unconditional volatility smirk of options with 30 days to expiration, and the state-dependent smirk premium, measured as the price difference between 10% OTM puts and ATMs, under two jump scenarios: the base-case scenario with $\sigma_b = 0$, and the otherwise identical scenario but with $\sigma_b = \frac{1}{3}|\mu_b|$, where $\sigma_b$ denotes the volatility of the consumption jump size. The bottom two panels plot, respectively, the volatility smirk and the jump-risk premium as a function of the risk aversion implied from the Naik-Lee model under the same two $\sigma_b$ scenarios. Other parameters are from Table 1 of LPW for the risk-aversion-only case, except that I set $\gamma = 34$ when computing the volatility smirk from Naik and Lee (1990).
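The sensitivity of the Naik-Lee jump-risk premium to $\sigma_b$ can be checked with a short calculation. The sketch below is an illustration under assumptions rather than the paper's code: it takes the price jump to equal the consumption jump, $J_P = e^b - 1$ with $b \sim N(\mu_b, \sigma_b^2)$, uses the risk-neutral intensity and mean from Eq. (52) in Appendix A.2, and measures the premium as $\lambda E(J_P) - \lambda^Q E^Q(J_P)$. The function name is hypothetical; the inputs $\mu_b = -1\%$ and an intensity of 1/3 are the LPW values quoted above.

```python
import math

def naik_lee_jump_premium(gamma, mu_b, sigma_b, lam):
    """Jump-risk premium lam*E[J_P] - lam_Q*E_Q[J_P], assuming J_P = exp(b) - 1."""
    lam_q = lam * math.exp(-gamma * mu_b + 0.5 * gamma**2 * sigma_b**2)  # Eq. (52)
    mu_b_q = mu_b - gamma * sigma_b**2                                   # Eq. (52)
    mean_jump_p = math.exp(mu_b + 0.5 * sigma_b**2) - 1.0    # E[J_P], actual measure
    mean_jump_q = math.exp(mu_b_q + 0.5 * sigma_b**2) - 1.0  # E_Q[J_P], risk-neutral
    return lam * mean_jump_p - lam_q * mean_jump_q

for gamma in (5.0, 10.0, 15.0, 20.0):
    constant = naik_lee_jump_premium(gamma, -0.01, 0.0, 1.0 / 3.0)
    random_size = naik_lee_jump_premium(gamma, -0.01, 0.0573, 1.0 / 3.0)
    print(f"gamma={gamma:4.0f}  sigma_b=0: {100 * constant:.3f}%  "
          f"sigma_b=5.73%: {100 * random_size:.3f}%")
```

With a constant jump size the premium stays small over this range of $\gamma$, while with $\sigma_b = 5.73\%$ the term $e^{(1/2)\gamma^2\sigma_b^2}$ in the risk-neutral intensity makes it grow rapidly, which is the mechanism described in the text.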
7. Conclusion This paper proposes a preference-based general equilibrium model that explains the pricing of the S&P 500 index options since the 1987 market crash. The central ingredients are a peso component in the consumption growth rate and the time-varying risk aversion induced by habit formation that amplifies consumption shocks. The amplifying effect generates excess volatility and excess jumps in the stock market. Consumption jumps also induce jumps in the marginal utility which, when combined with excess price jumps, generate a large jump-risk premium. Taken together, excess volatility and a large compensation for jump risks combine to produce a pronounced smirk pattern observed
in the data. The time-varying volatility and jump-risk premiums enable the model to make predictions about the state-dependent smirk patterns as well, which are largely supported in the data. Besides volatility smirks, the model has a variety of other pricing implications, such as the high equity premium, the option term structure, and variations of price–dividend ratios across time, which are broadly consistent with the observations in the aggregate stock and option market. Featuring state-dependent premiums and prices, my model differs in an important way from the other two equilibrium option pricing models, LPW and BCDG, in which asset prices have no state-variations. As such, my paper provides the first model that accounts for the state-dependent smirk patterns from an equilibrium setting. In contrast to reduced-form option pricing models (e.g., Bates, 2000; Pan, 2002; Eraker, 2004) which exogenously specify the dynamics of the interest rate, risk premiums, diffusive volatility, and jumps, my model endogenizes all the above variables and traces their dynamics back to those of aggregate consumption. For example, it attributes the comovements of jump intensity and volatility to the time-varying risk aversion driven by consumption shocks, which provides an economic story
for the time-varying jump-risk premiums implicit in options that are shown by Pan (2002). My paper also contributes to the peso problem literature. First, I show that the very severe jumps required by peso problem models (e.g., Barro, 2006) to rationalize the equity premium lead to counterfactually high smirk premiums for index options. Second, I show that by introducing habit formation, a jump calibration much more in line with the historical U.S. data simultaneously replicates the observed equity premium and smirk premium. In summary, with more reasonably calibrated jumps, this paper unifies stock and option pricing and connects both of them to the underlying macroeconomic behavior.

Appendix A

A.1. Proof of propositions
I need the following lemma to prove Proposition 1:

Lemma 1. Fix a probability space $(\Omega, \mathcal{F}, P)$ and an information filtration $(\mathcal{F}_t)_{t \ge 0}$, where $\mathcal{F}_t$ denotes the information available up to period t. Suppose X is a Markov process in some state space $D \subset R^n$, which satisfies the following process:
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t + dZ_t, \tag{39} \]
where $B_t$ is an m-dimensional standard Brownian motion; $\mu: D \to R^n$, $\sigma: D \to R^{n \times m}$; $Z_t$ is a pure jump process with the constant jump arrival intensity $\lambda$ and the jump sizes $J_{Xt}$ ($\equiv X_{t^+} - X_t$), which are allowed to be inadequately observed conditioned on $\mathcal{F}_t$; the Brownian motions and the jumps are independent of each other. Assume necessary technical conditions, which are similar to conditions (i)–(iii) in Duffie, Pan, and Singleton (2000). For an n-dimensional vector x, denote by diag(x) an n-by-n diagonal matrix with its ith principal diagonal element equaling the ith entry of x (i = 1,…,n). Then, if
\[ E_t(dX_t) = (A_0 + A_1 X_t)\,dt, \tag{40} \]
where $A_0 \in R^n$ and $A_1 \in R^{n \times n}$ are both constants, $E_t(X_s)$ ($s > t$) can be derived in the following closed form:
\[ E_t(X_s) = \Psi_{s-t} X_t + \int_t^s \Psi_{s-u} A_0\,du, \tag{41} \]
where $\Psi_t \equiv U e^{\omega t} U^{-1}$, and $U \in R^{n \times n}$ and $\omega \equiv [\omega_1, \ldots, \omega_n]' \in R^n$ are the eigenvectors and eigenvalues of $A_1$, i.e., $A_1 = U\,\mathrm{diag}(\omega)\,U^{-1}$; $e^{\omega t} \equiv \mathrm{diag}[e^{\omega_1 t}, \ldots, e^{\omega_n t}] \in R^{n \times n}$. In particular, when n = 1,
\[ E_t(X_s) = e^{(s-t)A_1} X_t + \frac{A_0}{A_1}\left[e^{A_1(s-t)} - 1\right]. \tag{42} \]

Proof. Under the assumption of (40), the $X_t$-process can be written in terms of the Ito integral as follows:
\[ X_s = X_t + \int_t^s (A_0 + A_1 X_u)\,du + \int_t^s \sigma(X_u)\,dB_u + J_{t,s}, \tag{43} \]
where
\[ J_{t,s} \equiv \sum_{t < t_i \le s} \left(X_{t_i^+} - X_{t_i}\right) - \lambda \int_t^s E_u(J_{Xu})\,du, \tag{44} \]
where $t_i^+$ denotes the time an instant after the occurrence of the ith jump at $t_i$; $E_t(J_{Xt})$ denotes the average jump size of X conditioned on $\mathcal{F}_t$. Under necessary technical conditions and by following an argument similar to the proof of Lemma 1 in Duffie, Pan, and Singleton (2000), I conclude that $\int_t^s \sigma(X_u)\,dB_u$ and $J_{t,s}$ are both martingales. Hence, by taking conditional expectations on both sides of (43),
\[ E_t(X_s) = X_t + E_t\left[\int_t^s (A_0 + A_1 X_u)\,du\right], \tag{45} \]
which implies:
\[ \frac{dE_t(X_s)}{ds} = A_0 + A_1 E_t(X_s). \tag{46} \]
Eq. (46) is a vector-form ordinary differential equation (ODE) in $E_t(X_s)$ with the initial condition $E_t(X_t) = X_t$, and its solution is given by (41). Eq. (42) follows by noticing that U = 1 and $\omega = A_1$ in the scalar case. ∎

Proof of Proposition 1. By substituting out the expression for the pricing kernel in (11),
\[ P_t = \frac{C_t}{\gamma_t}\,E_t\left[\int_t^\infty e^{-\rho(s-t)}\gamma_s\,ds\right]. \tag{47} \]
From (6) to (7), $E_t(d\gamma_s) = \kappa(\bar{\gamma} - \gamma_t)$, which satisfies the assumption of (40) with $A_0 = \kappa\bar{\gamma}$ and $A_1 = -\kappa$. Hence,
\[
\begin{aligned}
f_s(\gamma_t) &\equiv E_t\left[\int_t^\infty e^{-\rho(s-t)}\gamma_s\,ds\right] = \int_t^\infty e^{-\rho(s-t)} E_t(\gamma_s)\,ds \\
&= \int_t^\infty e^{-\rho(s-t)}\left\{e^{(s-t)A_1}\gamma_t + \frac{A_0}{A_1}\left[e^{A_1(s-t)} - 1\right]\right\}ds \\
&= \frac{1}{\rho - A_1}\gamma_t + \frac{A_0}{A_1}\left(\frac{1}{\rho - A_1} - \frac{1}{\rho}\right) \\
&= \frac{1}{\rho + \kappa}\gamma_t + \frac{\kappa\bar{\gamma}}{\rho(\rho + \kappa)},
\end{aligned} \tag{48}
\]
where the second equality follows from the Fubini Theorem; the third equality follows from (42); the last equality follows from the definitions of $A_0$ and $A_1$ for this problem. Proposition 1 thus follows by substitution. ∎

Proof of Proposition 2. Eq. (17) is obvious from Proposition 1. By its definition (e.g., Cochrane, 2001, Chapter 2),
\[ r_{ft}\,dt = -E_t\left(\frac{d\Lambda_t}{\Lambda_t}\right) = -\mu_{\Lambda t}\,dt - \lambda J_{\Lambda t}\,dt, \tag{49} \]
which yields (18). Eq. (19) follows from (A.14) in Longstaff and Piazzesi (2004), which computes the equity premium with the recognition of jumps, where $\sigma_{Rt}$ and $J_{Pt}$ are defined by (20)–(21). Eq. (20) is a direct result of Ito's lemma applied to $P_t$ in (12). To verify (21), note
\[ \frac{P_t^+}{P_t} = \frac{C_t^+ X_t^+}{C_t X_t} = e^{b}\left[1 + \frac{a_2\left(\frac{1}{\gamma_t^+} - \frac{1}{\gamma_t}\right)}{X_t}\right] = e^{b}\left[1 + a_2\frac{S_t}{X_t}\left(\frac{\gamma_t}{\gamma_t^+} - 1\right)\right], \tag{50} \]
where the first and the second equality follow from (12). Eq. (21) thus follows by noting $\gamma_t^+/\gamma_t = 1 + J_{\gamma t}$. Finally, the impact on the jump processes due to the change of measure (e.g., Duffie, Pan, and Singleton, 2000, Appendix B) implies that
\[ \lambda_t^Q = \lambda E_t(J_{\Lambda t} + 1), \tag{51} \]
and (22) follows since $E_t(J_{\Lambda t}) = J_{\Lambda t}$ with a constant consumption jump size. ∎

A.2. Randomness in consumption jump size: Naik-Lee vs. habit formation
I show in the following that the implied smirk pattern from my model, when compared to that in the Naik-Lee model, is much less sensitive to randomness in the consumption jump size. Due to the relation between the smirk premium and the jump-risk premium, I present the results in terms of the comparison between $\Delta EP_j^{NL}(\gamma; \bar{\sigma}_b)$ and $\Delta EP_j^{H}(\gamma_t; \bar{\sigma}_b)$, which denote the changes of the jump-risk premium for the given risk aversion $\gamma$ in Naik-Lee and in my model, respectively, following the increase of $\sigma_b$ from zero to some $\bar{\sigma}_b > 0$, where $\sigma_b$ denotes the volatility of the consumption jump size. In particular, I consider the limit case where $\gamma \to \infty$, which illustrates the point. To facilitate comparison, I transform Naik and Lee's (1990) notation as much as possible into mine. First consider the Naik-Lee model, in which the representative agent has the CRRA preference $C^{1-\gamma}/(1-\gamma)$; the aggregate consumption C follows (5) with the log jump size being normally distributed. In their model,
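\[ \lambda^Q = \lambda e^{-\gamma\mu_b + (1/2)\gamma^2\sigma_b^2}; \qquad \mu_b^Q = \mu_b - \gamma\sigma_b^2, \tag{52} \]
where $\lambda^Q$ denotes the jump arrival intensity under the risk-neutral measure; $\mu_b$ and $\mu_b^Q$ denote the average log consumption jump size under the actual measure and under the risk-neutral measure, respectively. From (24), the implied jump-risk premium is
\[ EP_j^{NL} = \lambda\left(e^{\mu_b + (1/2)\sigma_b^2} - 1\right) - \lambda e^{-\gamma\mu_b + (1/2)\gamma^2\sigma_b^2}\left(e^{\mu_b - \gamma\sigma_b^2 + (1/2)\sigma_b^2} - 1\right). \tag{53} \]
Hence,
\[
\begin{aligned}
\Delta EP_j^{NL}(\gamma; \bar{\sigma}_b) &\equiv EP_j^{NL}\big|_{\sigma_b = \bar{\sigma}_b} - EP_j^{NL}\big|_{\sigma_b = 0} \\
&= \lambda\left(e^{\mu_b + (1/2)\bar{\sigma}_b^2} - e^{\mu_b} - e^{-\gamma\mu_b} + e^{(1-\gamma)\mu_b}\right) + \lambda e^{-\gamma\mu_b + (1/2)\gamma^2\bar{\sigma}_b^2}\left(1 - e^{\mu_b - \gamma\bar{\sigma}_b^2 + (1/2)\bar{\sigma}_b^2}\right).
\end{aligned} \tag{54}
\]
Since $\mu_b < 0$ and $\bar{\sigma}_b > 0$, it is not difficult to see that $\Delta EP_j^{NL}(\gamma; \bar{\sigma}_b) \to \infty$ as $\gamma \to \infty$.

In my model, combining (7) with (10) yields
\[ J_{\Lambda t} \equiv \frac{\Lambda_t^+}{\Lambda_t} - 1 = e^{-b}\left[1 - \alpha\frac{\gamma_t - \beta}{\gamma_t}\,b\right] - 1. \tag{55} \]
Using (22),
\[ \lambda_t^Q = \lambda\left[E_t(J_{\Lambda t}) + 1\right] = \lambda e^{-\mu_b + (1/2)\sigma_b^2}\left[1 + \alpha\frac{\gamma_t - \beta}{\gamma_t}\left(-\mu_b + \sigma_b^2\right)\right], \tag{56} \]
where I've used that b is normally distributed with mean $\mu_b$ and volatility $\sigma_b$. Hence,
\[ \lim_{\gamma_t \to \infty} \lambda_t^Q = \lambda e^{-\mu_b + (1/2)\sigma_b^2}\left[1 + \alpha\left(-\mu_b + \sigma_b^2\right)\right]. \tag{57} \]
On the other hand, (21) combined with (7) and (12) implies $J_{Pt} \to e^{b} - 1$ as $\gamma_t \to \infty$, hence,
\[ \lim_{\gamma_t \to \infty} E_t(J_{Pt}) = e^{\mu_b + (1/2)\sigma_b^2} - 1. \tag{58} \]
We can similarly write $\lim_{\gamma_t \to \infty} E_t^Q(J_{Pt}) = e^{\mu_b^Q + (1/2)\sigma_b^2} - 1$ under the risk-neutral measure, but the direct computation of $\mu_b^Q$ is hard in the presence of habit formation. Instead, I first compute $E_t^Q(J_{Pt})$ for the given $\gamma_t$ and then take the limit. Following the change of measure for jump processes (e.g., Duffie, Pan, and Singleton, 2000, Appendix B),
\[
\begin{aligned}
E_t^Q(J_{Pt}) &= E_t^Q\left(\frac{P_t^+}{P_t}\right) - 1 = \frac{E_t\left(\frac{\Lambda_t^+ P_t^+}{\Lambda_t P_t}\right)}{E_t\left(\frac{\Lambda_t^+}{\Lambda_t}\right)} - 1 = \frac{E_t\left(\frac{X_t^+ \gamma_t^+}{X_t \gamma_t}\right)}{E_t(J_{\Lambda t} + 1)} - 1 \\
&= \frac{E_t\left[1 + \frac{a_1}{X_t}\,\frac{\gamma_t^+ - \gamma_t}{\gamma_t}\right]}{E_t(J_{\Lambda t} + 1)} - 1 = e^{\mu_b - (1/2)\sigma_b^2}\,\frac{1 - \frac{a_1}{X_t}\,\frac{\alpha(\gamma_t - \beta)\mu_b}{\gamma_t}}{1 + \alpha\frac{\gamma_t - \beta}{\gamma_t}\left(-\mu_b + \sigma_b^2\right)} - 1,
\end{aligned} \tag{59}
\]
where I've used (12) for the third and the fourth equality, and (7) and (10) for the last equality. Hence,
\[ \lim_{\gamma_t \to \infty} E_t^Q(J_{Pt}) = e^{\mu_b - (1/2)\sigma_b^2}\,\frac{1 - \alpha\mu_b}{1 + \alpha\left(-\mu_b + \sigma_b^2\right)} - 1. \tag{60} \]
Substituting (57)–(60) into (24) yields the limit jump-risk premium implied from my model:
\[ \lim_{\gamma_t \to \infty} EP_{j,t}^{H} = \lambda\left(e^{\mu_b + (1/2)\sigma_b^2} - 1\right) - \lambda e^{-\mu_b + (1/2)\sigma_b^2}\left[1 + \alpha\left(-\mu_b + \sigma_b^2\right)\right]\left[e^{\mu_b - (1/2)\sigma_b^2}\,\frac{1 - \alpha\mu_b}{1 + \alpha\left(-\mu_b + \sigma_b^2\right)} - 1\right]. \tag{61} \]
Therefore,
\[ \lim_{\gamma_t \to \infty} \Delta EP_j^{H}(\gamma_t; \bar{\sigma}_b) \equiv \lim_{\gamma_t \to \infty}\left[EP_{j,t}^{H}\big|_{\sigma_b = \bar{\sigma}_b} - EP_{j,t}^{H}\big|_{\sigma_b = 0}\right] \equiv A(\bar{\sigma}_b), \tag{62} \]
which is some finite number.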
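The scalar closed form (42) and the discounted integral in (48) are easy to check numerically. The sketch below is a minimal illustration, not part of the original proofs: it evaluates (42) for a mean-reverting risk aversion with $A_0 = \kappa\bar{\gamma}$ and $A_1 = -\kappa$, integrates $e^{-\rho\tau}E_t(\gamma_{t+\tau})$ by the trapezoid rule over a long but finite horizon, and compares the result with the last line of (48). The parameter values are taken from Table 5 for concreteness, the starting value $\gamma_t = 20$ is arbitrary, and the function names are hypothetical.

```python
import math

def cond_mean(x_t, a0, a1, tau):
    """Eq. (42): E_t(X_{t+tau}) when E_t(dX) = (a0 + a1*X) dt and a1 != 0."""
    return math.exp(a1 * tau) * x_t + (a0 / a1) * (math.exp(a1 * tau) - 1.0)

def discounted_integral(gamma_t, kappa, gamma_bar, rho, horizon=400.0, steps=200_000):
    """Trapezoid approximation of the integral of exp(-rho*tau) * E_t(gamma_{t+tau})."""
    a0, a1 = kappa * gamma_bar, -kappa
    h = horizon / steps
    total = 0.0
    for k in range(steps + 1):
        tau = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # end points get half weight
        total += weight * math.exp(-rho * tau) * cond_mean(gamma_t, a0, a1, tau)
    return h * total

def closed_form(gamma_t, kappa, gamma_bar, rho):
    """Last line of Eq. (48)."""
    return gamma_t / (rho + kappa) + kappa * gamma_bar / (rho * (rho + kappa))

numeric = discounted_integral(20.0, kappa=0.172, gamma_bar=17.8, rho=0.1165)
exact = closed_form(20.0, kappa=0.172, gamma_bar=17.8, rho=0.1165)
print(numeric, exact)
```

The two numbers should agree up to the truncation and discretization error of the quadrature; dividing either by $\gamma_t$ gives the price–consumption ratio implied by (47).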
References
Bakshi, G.S., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. Journal of Finance 52, 2003–2051.
Bansal, R., Yaron, A., 2004. Risks for the long run: a potential resolution of asset pricing puzzles. Journal of Finance 59, 1481–1509.
Barndorff-Nielsen, O., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B 64, 253–280.
Barro, R.J., 2006. Rare disasters and asset markets in the twentieth century. Quarterly Journal of Economics 121, 823–866.
Bates, D.S., 2000. Post-'87 crash fears in the S&P 500 futures option market. Journal of Econometrics 94, 181–238.
Benzoni, L., Collin-Dufresne, P., Goldstein, R.S., 2007. Explaining pre- and post-1987 crash asset prices within a unified general equilibrium framework. Unpublished working paper, Federal Reserve Bank of Chicago, Columbia University, University of Minnesota.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–654.
Black, F., 1976. Studies of stock price volatility changes. In: Proceedings of the 1976 Meetings of the Business and Economics Statistics Section. American Statistical Association, pp. 177–181.
Buraschi, A., Jiltsov, A., 2007. Habit formation and macroeconomic models of the term structure of interest rates. Journal of Finance 62, 3009–3063.
Campbell, J., Cochrane, J., 1999. By force of habit: a consumer-based explanation of aggregate stock market behavior. Journal of Political Economy 107, 205–251.
Chen, L., Collin-Dufresne, P., Goldstein, R.S., 2008. On the relation between credit spread puzzles and the equity premium puzzle. Review of Financial Studies 22, 3367–3409.
Cochrane, J., 2001. Asset Pricing. Princeton University Press, Princeton, NJ.
Dai, Q., Singleton, K., 2003. Term structure dynamics in theory and reality. Review of Financial Studies 16, 631–678.
David, A., Veronesi, P., 2002. Option prices with uncertain fundamentals. Unpublished working paper, Washington University in St. Louis, University of Chicago.
Duffie, D., 2001. Dynamic Asset Pricing Theory. Princeton University Press, Princeton, NJ.
Duffie, D., Pan, J., Singleton, K., 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.
Epstein, L.G., Zin, S.E., 1989. Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework. Econometrica 57, 937–969.
Eraker, B., 2004. Do stock prices and volatility jump? Reconciling evidence from spot and option prices. Journal of Finance 59, 1367–1404.
Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hansen, L.P., Heaton, J., Li, N., 2004. Intangible risk. Unpublished working paper, University of Chicago.
Heston, S., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–343.
Lettau, M., Ludvigson, S., Wachter, J., 2008. The declining equity premium: what role does macroeconomic risk play? Review of Financial Studies 21, 1653–1687.
Liu, J., Pan, J., Wang, T., 2005. An equilibrium model of rare-event premiums and its implication for option smirks. Review of Financial Studies 18, 131–164.
Longstaff, F., Piazzesi, M., 2004. Corporate earnings and the equity premium. Journal of Financial Economics 74, 401–421.
Menzly, L., Santos, T., Veronesi, P., 2004. Understanding predictability. Journal of Political Economy 112, 1–47.
Naik, V., Lee, M., 1990. General equilibrium pricing of options on the market portfolio with discontinuous returns. Review of Financial Studies 3, 493–521.
Pan, J., 2002. The jump-risk premia implicit in options: evidence from an integrated time-series study. Journal of Financial Economics 63, 3–50.
Rietz, T., 1988. The equity premium puzzle: a solution? Journal of Monetary Economics 21, 117–132.
Wachter, J., 2006. A consumption-based model of the term structure of interest rates. Journal of Financial Economics 79, 365–399.
Journal of Financial Economics 99 (2011) 427–446
Contents lists available at ScienceDirect
Journal of Financial Economics journal homepage: www.elsevier.com/locate/jfec
Maxing out: Stocks as lotteries and the cross-section of expected returns$ Turan G. Bali a,1, Nusret Cakici b,2, Robert F. Whitelaw c,d,n a
Department of Economics and Finance, Zicklin School of Business, Baruch College, One Bernard Baruch Way, Box 10-225, New York, NY 10010, United States Department of Finance, Fordham University, Fordham University, 113 West 60th Street, New York, NY 10023, United States Stern School of Business, New York University, 44 W. 4th Street, Suite 9-190, New York, NY 10012, United States d NBER, United States b c
Article info
Article history: Received 24 November 2009; received in revised form 2 February 2010; accepted 2 March 2010
JEL classification: G11; G17; G12
Keywords: Extreme returns; Lottery-like payoffs; Cross-sectional return predictability; Skewness preference; Idiosyncratic volatility
Abstract
Motivated by existing evidence of a preference among investors for assets with lottery-like payoffs and that many investors are poorly diversified, we investigate the significance of extreme positive returns in the cross-sectional pricing of stocks. Portfolio-level analyses and firm-level cross-sectional regressions indicate a negative and significant relation between the maximum daily return over the past one month (MAX) and expected stock returns. Average raw and risk-adjusted return differences between stocks in the lowest and highest MAX deciles exceed 1% per month. These results are robust to controls for size, book-to-market, momentum, short-term reversals, liquidity, and skewness. Of particular interest, including MAX reverses the puzzling negative relation between returns and idiosyncratic volatility recently shown in Ang, Hodrick, Xing, and Zhang (2006, 2009). © 2010 Elsevier B.V. All rights reserved.
1. Introduction
What determines the cross-section of expected stock returns? This question has been central to modern financial economics since the path-breaking work of
☆ We would like to thank Yakov Amihud, Xavier Gabaix, Evgeny Landres, Orly Sade, Jacob Sagi, Daniel Smith, Jeff Wurgler, and seminar participants at the Cesaerea 6th Annual Conference, Arison School of Business, IDC; HEC, Paris; INSEAD; New York University; and Simon Fraser University for helpful comments.
* Corresponding author at: Stern School of Business, New York University, 44 W. 4th Street, Suite 9-190, New York, NY 10012, United States. Tel.: +1 212 998 0338.
E-mail addresses: [email protected] (T.G. Bali), [email protected] (N. Cakici), [email protected] (R.F. Whitelaw).
1 Tel.: +1 646 312 3506; fax: +1 646 312 3451.
2 Tel.: +1 212 636 6776; fax: +1 212 765 5573.
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.08.014
Sharpe (1964), Lintner (1965), and Mossin (1966). Much of this work has focused on the joint distribution of individual stock returns and the market portfolio as the determinant of expected returns. In the classic capital asset pricing model (CAPM) setting, i.e., with either quadratic preferences or normally distributed returns, expected returns on individual stocks are determined by the covariance of their returns with the market portfolio. Introducing a preference for skewness leads to the three-moment CAPM of Kraus and Litzenberger (1976), which has received empirical support in the literature as, for example, in Harvey and Siddique (2000) and Smith (2007). Diversification plays a critical role in these models due to the desire of investors to avoid variance risk, i.e., to diversify away idiosyncratic volatility, yet a closer examination of the portfolios of individual investors suggests
T.G. Bali et al. / Journal of Financial Economics 99 (2011) 427–446
that these investors are, in general, not well-diversified.3 There may be plausible explanations for this lack of diversification, such as the returns to specialization in information acquisition (Van Nieuwerburgh and Veldkamp, 2010), but nevertheless this empirical phenomenon suggests looking more closely at the distribution of individual stock returns rather than just co-moments as potential determinants of the cross-section of expected returns.
There is also evidence that investors have a preference for lottery-like assets, i.e., assets that have a relatively small probability of a large payoff. Two prominent examples are the favorite-longshot bias in horsetrack betting, i.e., the phenomenon that the expected return per dollar wagered tends to increase monotonically with the probability of the horse winning, and the popularity of lottery games despite the prevalence of negative expected returns (Thaler and Ziemba, 1988). Interestingly, in the latter case, there is increasing evidence that it is the degree of skewness in the payoffs that appeals to participants (Garrett and Sobel, 1999; Walker and Young, 2001), although there are alternative explanations, such as lumpiness in the goods market (Patel and Subrahmanyam, 1978). In the context of the stock market, Kumar (2009) shows that certain groups of individual investors appear to exhibit a preference for lottery-type stocks, which he defines as low-priced stocks with high idiosyncratic volatility and high idiosyncratic skewness.
Motivated by these two literatures, we examine the role of extreme positive returns in the cross-sectional pricing of stocks. Specifically, we sort stocks by their maximum daily return during the previous month and examine the monthly returns on the resulting portfolios over the period July 1962–December 2005. For value-weighted decile portfolios, the difference between returns on the portfolios with the highest and lowest maximum daily returns is -1.03% per month. The corresponding Fama-French-Carhart four-factor alpha is -1.18% per month. Both return differences are statistically significant at all standard significance levels. In addition, the results are robust to sorting stocks not only on the single maximum daily return during the month, but also on the average of the two, three, four, or five highest daily returns within the month.
This evidence suggests that investors may be willing to pay more for stocks that exhibit extreme positive returns, and thus, these stocks exhibit lower returns in the future. This interpretation is consistent with cumulative prospect theory (Tversky and Kahneman, 1992) as modeled in Barberis and Huang (2008). Errors in the probability weighting of investors cause them to overvalue stocks that have a small probability of a large positive return. It is also consistent with the optimal beliefs framework of Brunnermeier, Gollier, and Parker (2007). In this model, agents optimally choose to distort
3 See, for example, Odean (1999), Mitton and Vorkink (2007), and Goetzmann and Kumar (2008) for evidence based on the portfolios of a large sample of U.S. individual investors. Calvet, Campbell, and Sodini (2007) present evidence on the underdiversification of Swedish households, which can also be substantial, although the associated welfare costs for the median household appear to be small.
their beliefs about future probabilities in order to maximize their current utility. Critical to these interpretations of the empirical evidence, stocks with extreme positive returns in a given month should also be more likely to exhibit this phenomenon in the future. We confirm this persistence, showing that stocks in the top decile in one month have a 35% probability of being in the top decile in the subsequent month and an almost 70% probability of being in one of the top three deciles. Moreover, maximum daily returns exhibit substantial persistence in firm-level cross-sectional regressions, even after controlling for a variety of other firm-level variables. Not surprisingly, the stocks with the most extreme positive returns are not representative of the full universe of equities. For example, they tend to be small, illiquid securities with high returns in the portfolio formation month and low returns over the prior 11 months. To ensure that it is not these characteristics, rather than the extreme returns, that are driving the documented return differences, we perform a battery of bivariate sorts and reexamine the raw return and alpha differences. The results are robust to sorts on size, book-to-market ratio, momentum, short-term reversals, and illiquidity. Results from cross-sectional regressions corroborate this evidence. Are there alternative interpretations of this apparently robust empirical phenomenon? Recent papers by Ang, Hodrick, Xing, and Zhang (2006, 2009) contain the anomalous finding that stocks with high idiosyncratic volatility have low subsequent returns. It is no surprise that the stocks with extreme positive returns also have high idiosyncratic (and total) volatility when measured over the same time period. 
This positive correlation is partially by construction, since realized monthly volatility is calculated as the sum of squared daily returns, but even excluding the day with the largest return in the volatility calculation only reduces this association slightly. Could the maximum return simply be proxying for idiosyncratic volatility? We investigate this question using two methodologies, bivariate sorts on extreme returns and idiosyncratic volatility and firm-level cross-sectional regressions. The conclusion is that not only is the effect of extreme positive returns we find robust to controls for idiosyncratic volatility, but that this effect reverses the idiosyncratic volatility effect shown in Ang, Hodrick, Xing, and Zhang (2006, 2009). When sorted first on maximum returns, the equal-weighted return difference between high and low idiosyncratic volatility portfolios is positive and both economically and statistically significant. In a cross-sectional regression context, when both variables are included, the coefficient on the maximum return is negative and significant while that on idiosyncratic volatility is positive, albeit insignificant in some specifications. These results are consistent with our preferred explanation—poorly diversified investors dislike idiosyncratic volatility, like lottery-like payoffs, and influence prices and hence future returns. A slightly different interpretation of our evidence is that extreme positive returns proxy for skewness, and investors exhibit a preference for skewness. For example, Mitton and Vorkink (2007) develop a model of agents with heterogeneous skewness preferences and show that
the result is an equilibrium in which idiosyncratic skewness is priced. However, we show that the extreme return effect is robust to controls for total and idiosyncratic skewness and to the inclusion of a measure of expected skewness as in Boyer, Mitton, and Vorkink (2010). It is also unaffected by controls for co-skewness, i.e., the contribution of an asset to the skewness of a well-diversified portfolio.
The paper is organized as follows. Section 2 provides the univariate portfolio-level analysis, and the bivariate analyses and firm-level cross-sectional regressions that examine a comprehensive list of control variables. Section 3 focuses more specifically on extreme returns and idiosyncratic volatility. Section 4 presents results for skewness and extreme returns. Section 5 concludes.
2. Extreme positive returns and the cross-section of expected returns
2.1. Data
The first data set includes all New York Stock Exchange (NYSE), American Stock Exchange (Amex), and Nasdaq financial and nonfinancial firms from the Center for Research in Security Prices (CRSP) for the period from January 1926 through December 2005. We use daily stock returns to calculate the maximum daily stock returns for each firm in each month as well as such variables as the market beta, idiosyncratic volatility, and various skewness measures; we use monthly returns to calculate proxies for intermediate-term momentum and short-term reversals; we use volume data to calculate a measure of illiquidity; and we use share prices and shares outstanding to calculate market capitalization. The second data set is Compustat, which is used to obtain the equity book values for calculating the book-to-market ratios of individual firms. These variables are defined in detail in the Appendix and are discussed as they are used in the analysis.
2.2. Univariate portfolio-level analysis
Table 1 presents the value-weighted and equal-weighted average monthly returns of decile portfolios that are formed by sorting the NYSE/Amex/Nasdaq stocks based on the maximum daily return within the previous month (MAX). The results are reported for the sample period July 1962–December 2005. Portfolio 1 (low MAX) is the portfolio of stocks with the lowest maximum daily returns during the past month, and portfolio 10 (high MAX) is the portfolio of stocks with the highest maximum daily returns during the previous month. The value-weighted average raw return difference between decile 10 (high MAX) and decile 1 (low MAX) is -1.03% per month with a corresponding Newey-West (1987) t-statistic of -2.83. In addition to the average raw returns, Table 1 also presents the intercepts (Fama-French-Carhart four-factor alphas) from the regression of the value-weighted portfolio returns on a constant, the excess market return, a size factor (SMB),
a book-to-market factor (HML), and a momentum factor (MOM), following Fama and French (1993) and Carhart (1997).4 As shown in the last row of Table 1, the difference in alphas between the high MAX and low MAX portfolios is -1.18% per month with a Newey-West t-statistic of -4.71. This difference is economically significant and statistically significant at all conventional levels.
Taking a closer look at the value-weighted average returns and alphas across deciles, it is clear that the pattern is not one of a uniform decline as MAX increases. The average returns of deciles 1–7 are approximately the same, in the range of 1.00–1.16% per month, but, going from decile 7 to decile 10, average returns drop significantly, from 1.00% to 0.86%, 0.52%, and then to -0.02% per month. The alphas for the first seven deciles are also similar and close to zero, but again they fall dramatically for deciles 8 through 10. Interestingly, the reverse of this pattern is evident across the deciles in the average across months of the average maximum daily return of the stocks within each decile. By definition, this average increases monotonically from deciles 1 to 10, but this increase is far more dramatic for deciles 8, 9, and 10. These deciles contain stocks with average maximum daily returns of 9%, 12%, and 24%, respectively. Given a preference for upside potential, investors may be willing to pay more for, and accept lower expected returns on, assets with these extremely high positive returns. In other words, it is conceivable that investors view these stocks as valuable lottery-like assets, with a small chance of a large gain.
As shown in the third column of Table 1, similar, although somewhat less economically and statistically significant, results are obtained for the returns on equal-weighted portfolios. The average raw return difference between the high MAX and low MAX portfolios is -0.65% per month with a t-statistic of -1.83. The corresponding difference in alphas is -0.66% per month with a t-statistic of -2.31. As with the value-weighted returns, it is the extreme deciles, in this case deciles 9 and 10, that exhibit low future returns and negative alphas.
For the analysis in Table 1, we start the sample in July 1962 because this starting point corresponds to that used in much of the literature on the cross-section of expected returns; however, the results are similar using the sample starting in January 1926 and for various subsamples. For example, for the January 1926–June 1962 subsample, the average risk-adjusted return difference for the value-weighted portfolios is -1.25% per month, with a corresponding t-statistic of -3.43. When we break the original sample at the end of 1983, the subperiods have alpha differences of -1.62% and -0.99% per month, both of which are statistically significant. In the remainder of the paper, we continue presenting results for the July 1962–December 2005 sample for comparability with earlier studies.
While conditioning on the single day with the maximum return is both simple and intuitive as a proxy for
4 Small minus big (SMB), high minus low (HML), and MOM (winner minus loser) are described in and obtained from Kenneth French’s data library: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/.
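The four-factor regression that produces the alphas can be written out explicitly. The notation below is an assumption introduced for illustration (the text describes the regression only in words), and the left-hand side is written as an excess return, which is the usual convention rather than something the text states. For the 10-1 difference portfolio the risk-free rate cancels, so the distinction does not affect the reported alpha difference.

```latex
% Assumed notation: R_{p,t} is the month-t return on MAX decile portfolio p,
% r_{f,t} the risk-free rate, R_{m,t} the market return.
\begin{equation*}
R_{p,t} - r_{f,t} = \alpha_p + \beta_p \,(R_{m,t} - r_{f,t})
  + s_p \,\mathrm{SMB}_t + h_p \,\mathrm{HML}_t + m_p \,\mathrm{MOM}_t
  + \varepsilon_{p,t}
\end{equation*}
% The "alpha difference" in Table 1 is the intercept obtained when the
% dependent variable is R_{10,t} - R_{1,t}.
```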
Table 1
Returns and alphas on portfolios of stocks sorted by MAX.
Decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the maximum daily return (MAX) over the past one month. Portfolio 1 (10) is the portfolio of stocks with the lowest (highest) maximum daily returns over the past one month. The table reports the value-weighted (VW) and equal-weighted (EW) average monthly returns, the four-factor Fama-French-Carhart alphas on the value-weighted and equal-weighted portfolios, and the average maximum daily return of stocks within a month. The last two rows present the differences in monthly returns and the differences in alphas with respect to the four-factor Fama-French-Carhart model between portfolios 10 and 1 and the corresponding t-statistics. Average raw and risk-adjusted returns, and average daily maximum returns are given in percentage terms. Newey-West (1987) adjusted t-statistics are reported in parentheses.

                    VW portfolios                          EW portfolios
Decile              Average return   Four-factor alpha    Average return   Four-factor alpha    Average MAX
Low MAX              1.01             0.05                 1.29             0.22                  1.30
2                    1.00             0.00                 1.45             0.33                  2.47
3                    1.00             0.04                 1.55             0.39                  3.26
4                    1.11             0.16                 1.55             0.39                  4.06
5                    1.02             0.09                 1.49             0.31                  4.93
6                    1.16             0.15                 1.49             0.33                  5.97
7                    1.00             0.03                 1.37             0.23                  7.27
8                    0.86            -0.21                 1.32             0.20                  9.07
9                    0.52            -0.49                 1.04            -0.09                 12.09
High MAX            -0.02            -1.13                 0.64            -0.44                 23.60
10-1 difference     -1.03            -1.18                -0.65            -0.66
                    (-2.83)          (-4.71)              (-1.83)          (-2.31)
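The sorting procedure behind Table 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the tuple and dictionary layouts are hypothetical, the portfolios are equal-weighted only (the value-weighted version would weight each return by market capitalization), and ties are broken by stock identifier.

```python
from collections import defaultdict
from statistics import mean


def monthly_max(daily_returns):
    """Map (stock, month) -> maximum daily return within that month.

    daily_returns: iterable of (stock, month, daily_return) tuples.
    """
    out = {}
    for stock, month, ret in daily_returns:
        key = (stock, month)
        if key not in out or ret > out[key]:
            out[key] = ret
    return out


def assign_deciles(values, n_groups=10):
    """Rank stocks by value and split them into n_groups portfolios.

    values: dict stock -> sorting variable (here, last month's MAX).
    Returns dict stock -> portfolio number (1 = lowest, n_groups = highest).
    """
    ranked = sorted(values, key=lambda s: (values[s], s))
    n = len(ranked)
    return {s: (i * n_groups) // n + 1 for i, s in enumerate(ranked)}


def high_minus_low(max_prev, ret_next, n_groups=10):
    """Equal-weighted next-month return of the top group minus the bottom group."""
    groups = assign_deciles(max_prev, n_groups)
    by_group = defaultdict(list)
    for stock, g in groups.items():
        if stock in ret_next:
            by_group[g].append(ret_next[stock])
    return mean(by_group[n_groups]) - mean(by_group[1])
```

Averaging `high_minus_low` over all formation months would give the equal-weighted analogue of the "10-1 difference" row.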
extreme positive returns, it is also slightly arbitrary. As an alternative, we also rank stocks by the average of the N (N = 1, 2, …, 5) highest daily returns within the month, with the results reported in Table 2. As before, we report the difference between the returns and alphas on the deciles of firms with the highest and lowest average daily returns over the prior month. For ease of comparison, we report the results from Table 1 in the first column (N = 1). For both the value-weighted (Panel A) and the equal-weighted portfolios (Panel B), the return patterns when sorting on average returns over multiple days are similar to those when sorting on the single maximum daily return. In fact, if anything, the raw return and alpha differences are both economically and statistically more significant as we average over more days. For example, for value-weighted returns these differences increase in magnitude from -1.03% and -1.18% for N = 1 to -1.23% and -1.32% for N = 5.
Another alternative measure of the extent to which a stock exhibits lottery-like payoffs is to compute MAX over longer past periods. Consequently, we first form the MAX(1) portfolios based on the highest daily return over the past 3, 6, and 12 months, and the average raw return differences between the high MAX and low MAX portfolios are -0.63%, -0.52%, and -0.41% per month, respectively. Although these return differences are economically significant, we have statistical significance only for MAX(1) computed over the past quarter. When the MAX(5) portfolios are formed based on the five largest daily returns over the past 3, 6, and 12 months, the average raw return differences are larger (-1.27%, -1.15%, and -0.86% per month, respectively), and they are all statistically significant. More importantly, the differences between the four-factor Fama-French-Carhart alphas for the low and high MAX portfolios are negative and economically and statistically significant for all measures of MAX(1) and MAX(5). Specifically, the alpha differences for the MAX(1) portfolios are in the range of -0.68% to -0.74% per month with t-statistics ranging from -2.52 to -2.92. For MAX(5) the results are even stronger, with alpha differences ranging between -1.20% and -1.41% per month and t-statistics between -3.78 and -4.36. Finally, we also consider a measure defined as the maximum daily return in a month averaged over the past 3, 6, and 12 months. The average raw and risk-adjusted return differences between the extreme portfolios are negative and highly significant without exception.5
These analyses show that different proxies for lottery-like payoffs generate similar results, confirming their robustness and thus providing further support for the explanation we offer. For simplicity we focus on MAX(1) over the previous month in the remainder of the paper except in cases where the multiple-day averages are needed to illustrate or illuminate a point.
Of course, the maximum daily returns shown in Table 1 and those underlying the portfolio sorts in Table 2 are for the portfolio formation month, not for the subsequent month over which we measure average returns. Investors may pay high prices for stocks that have exhibited extreme positive returns in the past in the expectation that this behavior will be repeated in the future, but a natural question is whether these expectations are rational. We investigate this issue by examining the average month-to-month portfolio transition matrix, i.e., the average probability that a stock in decile i in one month will be in decile j in the subsequent month (although for brevity, we do not report these results in detail). If maximum daily returns are completely random, then all the probabilities should be approximately 10%,
5 In the interest of brevity, we do not present detailed results for these alternative measures of MAX, but they are available from the authors upon request.
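The multi-day measure MAX(N) is simply the mean of the N largest daily returns in the window. A minimal sketch follows; the function name is hypothetical, and averaging over however many days are available when a stock has fewer than N observations is an assumption, since the text does not say how short months are handled.

```python
def max_n(daily_returns, n=1):
    """Average of the n highest daily returns in one stock's window.

    daily_returns: list of daily returns for one stock over the window
    (one month for the baseline measure; 3, 6, or 12 months of pooled
    daily returns for the longer-horizon variants).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not daily_returns:
        raise ValueError("need at least one daily return")
    top = sorted(daily_returns, reverse=True)[:n]
    return sum(top) / len(top)
```

With `n=1` this reduces to the single maximum daily return used in Table 1.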
Table 2
Returns on portfolios of stocks sorted by multi-day maximum returns.
Decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the average of the N highest daily returns (MAX(N)) over the past one month. Portfolio 1 (10) is the portfolio of stocks with the lowest (highest) maximum multi-day returns over the past one month. The table reports the value-weighted (Panel A) and equal-weighted (Panel B) average monthly returns for N = 1, …, 5. The last two rows present the differences in monthly returns and the differences in alphas with respect to the four-factor Fama-French-Carhart model between portfolios 10 and 1. Average raw and risk-adjusted returns are given in percentage terms. Newey-West (1987) adjusted t-statistics are reported in parentheses.

Panel A: Value-weighted returns on MAX(N) portfolios
Decile               N = 1     N = 2     N = 3     N = 4     N = 5
Low MAX(N)            1.01      1.00      1.05      1.02      1.05
2                     1.00      0.96      0.98      1.02      1.07
3                     1.00      1.06      1.09      1.08      1.06
4                     1.11      1.08      1.02      1.01      1.04
5                     1.02      1.08      1.05      1.06      1.04
6                     1.16      1.03      1.08      1.03      1.01
7                     1.00      1.04      1.00      1.06      1.06
8                     0.86      0.78      0.68      0.70      0.70
9                     0.52      0.50      0.49      0.43      0.48
High MAX(N)          -0.02     -0.16     -0.13     -0.12     -0.18
Return difference    -1.03     -1.16     -1.18     -1.14     -1.23
                     (-2.83)   (-2.97)   (-2.95)   (-2.74)   (-2.93)
Alpha difference     -1.18     -1.29     -1.26     -1.21     -1.32
                     (-4.71)   (-4.56)   (-4.12)   (-3.71)   (-4.07)

Panel B: Equal-weighted returns on MAX(N) portfolios
Decile               N = 1     N = 2     N = 3     N = 4     N = 5
Low MAX(N)            1.29      1.28      1.27      1.29      1.30
2                     1.45      1.45      1.48      1.49      1.54
3                     1.55      1.55      1.56      1.59      1.59
4                     1.55      1.58      1.61      1.62      1.60
5                     1.49      1.56      1.56      1.52      1.55
6                     1.49      1.45      1.49      1.53      1.52
7                     1.37      1.44      1.43      1.42      1.43
8                     1.32      1.28      1.27      1.28      1.26
9                     1.04      1.01      1.00      0.95      0.94
High MAX(N)           0.64      0.59      0.54      0.51      0.49
Return difference    -0.65     -0.69     -0.73     -0.78     -0.81
                     (-1.83)   (-1.88)   (-1.99)   (-2.11)   (-2.21)
Alpha difference     -0.66     -0.72     -0.78     -0.84     -0.89
                     (-2.31)   (-2.36)   (-2.53)   (-2.75)   (-2.93)
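The decile transition matrix used to assess persistence in MAX can be sketched as follows. This is an illustration under stated assumptions: the function name and input layout are hypothetical, and the sketch pools counts over all consecutive month pairs and then normalizes each row, which is close to but not identical to averaging the separate monthly matrices as the text describes. Stocks missing from the following month are ignored.

```python
def transition_matrix(deciles_by_month, n_groups=10):
    """Estimate the probability of moving from decile i to decile j next month.

    deciles_by_month: list of dicts (one per month, in time order), each
    mapping stock -> decile number in 1..n_groups.
    Returns an n_groups x n_groups list of row-normalized frequencies,
    pooled over all consecutive month pairs.
    """
    counts = [[0] * n_groups for _ in range(n_groups)]
    for current, following in zip(deciles_by_month, deciles_by_month[1:]):
        for stock, i in current.items():
            j = following.get(stock)
            if j is not None:
                counts[i - 1][j - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Under the null of no persistence every entry would be close to 1/n_groups, so diagonal entries well above that value indicate persistence.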
since a high or low maximum return in one month should say nothing about the maximum return in the following month. Instead, there is clear evidence that MAX is persistent, with all the diagonal elements of the transition matrix exceeding 10%. Of greater importance, this persistence is especially strong for the extreme portfolios. Stocks in decile 10 (high MAX) have a 35% chance of appearing in the same decile next month. Moreover, they have a 68% probability of being in deciles 8–10, all of which exhibit high maximum daily returns in the portfolio formation month and low returns in the subsequent month. A slightly different way to examine the persistence of extreme positive daily returns is to look at firm-level cross-sectional regressions of MAX on lagged predictor variables. Specifically, for each month in the sample we run a regression across firms of the maximum daily return within that month on the maximum daily return from the previous month and seven lagged control variables that are defined in the Appendix and discussed in more detail later—the market beta (BETA), the market capitalization
(SIZE), the book-to-market ratio (BM), the return in the previous month (REV), the return over the 11 months prior to that month (MOM), a measure of illiquidity (ILLIQ), and the idiosyncratic volatility (IVOL).6 Table 3 reports the average cross-sectional coefficients from these regressions and the Newey-West (1987) adjusted t-statistics. In the univariate regression of MAX on lagged MAX, the coefficient is positive, quite large, and extremely statistically significant, and the R-squared of over 16% indicates substantial cross-sectional explanatory power. In other words, stocks with extreme positive daily returns in one month also tend to exhibit similar features in the following month. When the seven control variables are added to the regression, the coefficient on lagged MAX remains large and significant. Of these seven variables, it
6 The high cross-sectional correlation between MAX and IVOL, as shown later in Table 9 and discussed in Section 3, generates a multicollinearity problem in the regression; therefore, we orthogonalize IVOL for the purposes of regressions that contain both variables.
is SIZE and IVOL that contribute most to the explanatory power of the regression, with univariate R-squareds of 16% and 27%, respectively. The remaining five variables all have univariate R-squareds of less than 5%.
As a final check on the return characteristics of stocks with extreme positive returns, we examine more closely the distribution of monthly returns on stocks in the high MAX and low MAX portfolios. Tables 1 and 2 report the mean returns on these stocks, and the cross-sectional regressions in Table 3 and the portfolio transition matrix show that the presence, or absence, of extreme positive returns is persistent, but what are the other features of the return distribution? Table 4 presents descriptive statistics for the approximately 240,000 monthly returns on stocks within the two extreme deciles in the post-formation month. The mean returns are almost identical to those reported in Table 1 for the equal-weighted portfolio. The slight difference is attributable to the fact that Table 1 reports averages of returns across equal-weighted portfolios that contain slightly different numbers of stocks, whereas Table 4 weights all returns equally. In addition to having a lower average return, high MAX stocks display significantly higher volatility and more positive skewness. The percentiles of the return distribution illustrate the upper tail behavior. While median returns on high MAX stocks are lower, the returns at the 90th, 95th, and 99th percentiles are more than twice as large as those for low MAX stocks. Clearly, high MAX stocks exhibit higher probabilities of extreme positive returns in the following month. The percentiles of the distribution are robust to outliers, but the moments are not, so in the final two columns we report statistics for returns where the 0.5% most extreme returns in both tails have been eliminated. While means, standard deviations, and skewness for the trimmed distributions fall, the relative ordering remains: high

Table 3
Cross-sectional predictability of MAX.
Each month from July 1962 to December 2005 we run a firm-level cross-sectional regression of the maximum daily return in that month (MAX) on subsets of lagged predictor variables including MAX in the previous month and seven control variables that are defined in the Appendix. The table reports the time-series averages of the cross-sectional regression coefficients, their associated Newey-West (1987) adjusted t-statistics (in parentheses), and the regression R-squareds.

MAX       BETA      SIZE      BM        MOM       REV       ILLIQ     IVOL      R2
0.4054                                                                          16.64%
(45.34)
          0.1116                                                                1.00%
          (4.47)
                    -1.3381                                                     15.99%
                    (-22.42)
                              0.5334                                            1.81%
                              (6.41)
                                        -1.7264                                 3.52%
                                        (-6.87)
                                                  -0.0655                       3.31%
                                                  (-11.19)
                                                            0.1209              4.28%
                                                            (8.13)
                                                                      0.1643    27.36%
                                                                      (86.41)
0.3325    0.2500    -0.4737   -0.1277   -0.3432   -0.0504   0.0200    0.1930    35.10%
(31.31)   (12.14)   (-30.45)  (-5.86)   (-4.47)   (-22.25)  (6.16)    (41.60)
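The procedure behind Table 3 (a cross-sectional regression each month, then a time-series average of the coefficients with a Newey-West adjusted t-statistic) can be sketched for the one-regressor case. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, only univariate regressions are handled, the long-run variance uses a 1/n normalization, and the lag length is a free parameter because the text does not state the value used.

```python
from math import sqrt


def ols_slope(x, y):
    """Slope and intercept from a one-regressor OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def newey_west_tstat(series, lags=0):
    """t-statistic of the mean of a series, using a Bartlett-kernel
    (Newey-West) estimate of the long-run variance."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    lrv = sum(d * d for d in dev) / n
    for k in range(1, lags + 1):
        gamma = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        lrv += 2 * (1 - k / (lags + 1)) * gamma
    return m / sqrt(lrv / n)


def fama_macbeth(panel, lags=0):
    """Time-series average of monthly cross-sectional slopes and its t-stat.

    panel: list of (x_values, y_values) pairs, one pair per month, where
    x is the lagged predictor (e.g., last month's MAX) and y is this
    month's MAX for the same firms.
    """
    slopes = [ols_slope(x, y)[0] for x, y in panel]
    return sum(slopes) / len(slopes), newey_west_tstat(slopes, lags)
```

The multivariate row of the table would require a full least-squares solve per month, but the averaging and t-statistic steps are unchanged.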
Table 4
Distribution of monthly returns for stocks in the high and low MAX portfolios.
Decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the maximum daily returns (MAX) over the past one month. The table reports descriptive statistics for the approximately 240,000 monthly returns on the individual stocks in deciles 1 (low MAX) and 10 (high MAX) in the following month. The tails of the return distribution are trimmed by removing the 0.5% most extreme observations in each tail prior to the calculation of the statistics in the final two columns.

                                          Trimmed
                Low MAX     High MAX      Low MAX     High MAX
Mean             1.26%        0.60%        1.04%        0.16%
Median           0.35%       -2.50%        0.35%       -2.50%
Std dev         12.54%       30.21%        9.70%       24.12%
Skewness         4.26         5.80         0.59         1.35
Percentiles
1%             -29.6%       -52.1%
5%             -14.7%       -33.8%
10%             -9.3%       -25.9%
25%             -3.4%       -14.3%
50%              0.3%        -2.5%
75%              5.1%         9.5%
90%             11.6%        28.6%
95%             17.7%        46.3%
99%             40.0%       100.0%
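The trimmed statistics in the final two columns of Table 4 can be reproduced in outline as follows. This is a minimal sketch: the function names are hypothetical, and the use of population (divide-by-n) moments for the standard deviation and skewness is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, median, pstdev


def trim_tails(returns, fraction=0.005):
    """Drop the given fraction of observations from each tail."""
    ordered = sorted(returns)
    # Small epsilon guards against floating-point shortfall in the product.
    k = int(len(ordered) * fraction + 1e-9)
    return ordered[k:len(ordered) - k] if k else ordered


def skewness(returns):
    """Population skewness: third central moment over sigma cubed."""
    m = mean(returns)
    s = pstdev(returns)
    return sum((r - m) ** 3 for r in returns) / (len(returns) * s ** 3)


def describe(returns):
    """Mean, median, standard deviation, and skewness of a return sample."""
    return {
        "mean": mean(returns),
        "median": median(returns),
        "std": pstdev(returns),
        "skew": skewness(returns),
    }
```

Calling `describe(trim_tails(sample))` alongside `describe(sample)` shows how sensitive the moments are to the extreme observations, while the median is unaffected by symmetric trimming.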
MAX stocks have lower means, but higher volatilities and skewness than their low MAX counterparts in the subsequent month. We do not measure investor expectations directly, but the results presented in Tables 3 and 4 are certainly consistent with the underlying theory about preferences for stocks with extreme positive returns. While MAX measures the propensity for a stock to deliver lottery-like
Table 5
Summary statistics for decile portfolios of stocks sorted by MAX.
Decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the maximum (MAX) daily returns over the past one month. Portfolio 1 (10) is the portfolio of stocks with the lowest (highest) maximum daily returns over the past one month. The table reports for each decile the average across the months in the sample of the median values within each month of various characteristics for the stocks: the maximum daily return (in percent), the market beta, the market capitalization (in millions of dollars), the book-to-market (BM) ratio, our measure of illiquidity (scaled by 10^5), the price (in dollars), the return in the portfolio formation month (labeled REV), the cumulative return over the 11 months prior to portfolio formation (labeled MOM), and the idiosyncratic volatility over the past one month (IVOL). There is an average of 309 stocks per portfolio.

Decile      MAX      Size ($10^6)   Price ($)   Market beta   BM ratio   Illiquidity (10^5)   IVOL    REV     MOM
Low MAX      1.62    316.19         25.44       0.33          0.7259     0.2842               0.97    2.44    10.95
2            2.51    331.47         25.85       0.55          0.6809     0.1418               1.26    0.96    11.16
3            3.22    250.98         23.88       0.68          0.6657     0.1547               1.51    0.42    10.90
4            3.92    188.27         21.47       0.76          0.6563     0.1935               1.77    0.01    10.25
5            4.71    142.47         19.27       0.87          0.6605     0.2456               2.05    0.43     9.77
6            5.63    108.56         16.95       0.97          0.6636     0.3242               2.37    0.82     8.62
7            6.80     80.43         14.53       1.04          0.6738     0.4501               2.76    1.48     6.71
8            8.40     58.69         12.21       1.12          0.7013     0.7067               3.27    2.34     3.75
9           11.01     39.92          9.57       1.15          0.7487     1.3002               4.07    4.01     0.85
High MAX    17.77     21.52          6.47       1.20          0.8890     4.0015               6.22    9.18   -11.74
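Each entry in Table 5 is a two-step summary: the median of a characteristic across the stocks in a decile within each month, then the average of those medians across months. A minimal sketch, with a hypothetical function name and input layout:

```python
from statistics import mean, median


def average_of_monthly_medians(values_by_month):
    """Average across months of the within-month cross-sectional median.

    values_by_month: dict month -> list of one characteristic's values
    (e.g., market capitalization) for the stocks in a given decile.
    """
    return mean(median(values) for values in values_by_month.values())
```

Taking the median first limits the influence of a few extreme stocks within any month, which matters for right-skewed characteristics such as MAX and market capitalization.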
payoffs in the portfolio formation month, these stocks continue to exhibit this behavior in the future.
To get a clearer picture of the composition of the high MAX portfolios, Table 5 presents summary statistics for the stocks in the deciles. Specifically, the table reports the average across the months in the sample of the median values within each month of various characteristics for the stocks in each decile. We report values for the maximum daily return (in percent), the market capitalization (in millions of dollars), the price (in dollars), the market beta, the book-to-market (BM) ratio, a measure of illiquidity (scaled by 10^5), the return in the portfolio formation month (REV), the return over the 11 months prior to portfolio formation (MOM), and the idiosyncratic volatility (IVOL).7 Definitions of these variables are given in the Appendix.
The portfolios exhibit some striking patterns. As we move from the low MAX to the high MAX decile, the average across months of the median daily maximum return of stocks increases from 1.62% to 17.77%. With the exception of decile 10, these values are similar to those reported in Table 1 for the average maximum daily return. For decile 10, the average maximum return exceeds the median by approximately 6%. The distribution of maximum daily returns is clearly right-skewed, with some stocks exhibiting very high returns. These outliers are not a problem in the portfolio-level analysis, but we will revisit this issue in the firm-level, cross-sectional regressions.
As MAX increases across the deciles, market capitalization decreases. The absolute numbers are difficult to interpret since market capitalizations go up over time, but the relative values indicate that the high MAX portfolios are dominated by smaller stocks. This pattern is good news for the raw return differences shown in Table 1 since the concentration of small stocks in the high MAX
7 The qualitative results from the average statistics are very similar to those obtained from the median statistics. Since the median is a robust measure of the center of the distribution that is less sensitive to outliers than the mean, we choose to present the median statistics in Table 5.
deciles would suggest that these portfolios should earn a return premium, not the return discount observed in the data. This phenomenon may partially explain why the alpha difference exceeds the difference in raw returns. The small stocks in the high MAX portfolios also tend to have low prices, declining to a median price of $6.47 for decile 10. While this pattern is not surprising, it does suggest that there may be measurement issues associated with microstructure phenomena for some of the small, low-priced stocks in the higher MAX portfolios, or, more generally, that the results we show may be confined solely to micro-cap stocks with low stock prices. The fact that the results hold for value-weighted portfolios, as well as equal-weighted portfolios, does allay this concern somewhat, but it is still worthwhile to check the robustness of the results to different sample selection procedures.
First, we repeat the analysis in Table 1 excluding all stocks with prices below $5/share. The four-factor alpha differences between the low MAX and high MAX value-weighted and equal-weighted portfolios are -0.81% and -1.14% per month, respectively, and both differences are highly statistically significant. Second, we exclude all Amex and Nasdaq stocks from the sample and form portfolios of stocks trading only on the NYSE. Again, the average risk-adjusted return differences are large and negative: -0.45% per month with a t-statistic of -2.48 for the value-weighted portfolios and -0.89% per month with a t-statistic of -5.15 for the equal-weighted portfolios. Finally, we sort all NYSE stocks by firm size each month to determine the NYSE decile breakpoints for market capitalization. Then, each month we exclude all NYSE/Amex/Nasdaq stocks with market capitalizations that would place them in the smallest NYSE size quintile, i.e., the two smallest size deciles, consistent with the definition of micro-cap stocks in Keim (1999) and Fama and French (2008). The average risk-adjusted return differences are -0.72% and -0.44% per month with t-statistics of -4.00 and -2.25 for the value-weighted and equal-weighted portfolios, respectively. These analyses provide convincing evidence that, while our main findings are certainly concentrated among smaller stocks,
T.G. Bali et al. / Journal of Financial Economics 99 (2011) 427–446
the phenomenon is not confined to only the smallest, lowest-price segment of the market. We can also look more directly at the distribution of market capitalizations within the high MAX decile. For example, during the last 2 years of our sample period, approximately 68% of these stocks fell below the size cutoff necessary for inclusion in the Russell 3000 index. In other words, almost one-third of the high MAX stocks were among the largest 3000 stocks. Over the full sample, approximately 50% of the high MAX stocks, on average, fell into the two smallest size deciles. This prevalence of small stocks with extreme positive returns, and their corresponding low future returns, is consistent with the theoretical motivation discussed earlier. It is individual investors, rather than institutions, that are most likely to be subject to the phenomena modeled in Barberis and Huang (2008) and Brunnermeier, Gollier, and Parker (2007), and individual investors also exhibit underdiversification. Thus, these effects should show up in the same small stocks that are held and traded by individual investors but by very few institutions. Returning to the descriptive statistics in Table 5, betas are calculated monthly using a regression of daily excess stock returns on daily excess market returns; thus, these values are clearly noisy estimates of the true betas. Nevertheless, the monotonic increase in beta as MAX increases does suggest that stocks with high maximum daily returns are more exposed to market risk. To the extent that market risk explains the cross-section of expected returns, this relation between MAX and beta serves only to emphasize the low raw returns earned by the high MAX stocks as shown in Table 1. The difference in four-factor alphas should control for this effect, which partially explains why this difference is larger than the difference in the raw returns. 
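The monthly beta estimate described above can be illustrated with a minimal Python sketch: the OLS slope from a regression of one month of daily excess stock returns on daily excess market returns. The function name `monthly_beta` and its argument names are hypothetical and are not taken from the paper. With only about 20 daily observations per month, a slope computed this way is necessarily a noisy estimate.

```python
from statistics import fmean


def monthly_beta(stock_excess, market_excess):
    """OLS slope of daily excess stock returns on daily excess market returns.

    Both arguments are sequences of daily excess returns for one month.
    Hypothetical helper for illustration only.
    """
    if len(stock_excess) != len(market_excess) or len(stock_excess) < 3:
        raise ValueError("need two equal-length series with at least 3 days")
    mean_x = fmean(market_excess)
    mean_y = fmean(stock_excess)
    # Sum of squared deviations of the market return (denominator of the slope).
    sxx = sum((x - mean_x) ** 2 for x in market_excess)
    if sxx == 0.0:
        raise ValueError("market returns have no variation in this month")
    # Sum of cross-products of deviations (numerator of the slope).
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(market_excess, stock_excess))
    return sxy / sxx
```

Applied stock by stock and month by month, a helper of this kind would yield the beta values that are then summarized by their within-decile medians.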
Median book-to-market ratios are similar across the portfolios, although if anything, high MAX portfolios do have a slight value tilt. In contrast, the liquidity differences are substantial. Our measure of illiquidity is the absolute return over the month divided by the monthly trading volume, which captures the notion of price impact, i.e., the extent to which trading moves prices (see Amihud, 2002). We use monthly returns over monthly trading volume, rather than a monthly average of daily values of the same quantity, because a significant fraction of stocks have days with no trade. Eliminating these stocks from the sample reduces the sample size with little apparent change in the empirical results. Based on this monthly measure, illiquidity increases quite dramatically for the high MAX deciles, consistent with these portfolios containing smaller stocks. Again, this pattern only serves to strengthen the raw return differences shown in Table 1 since these stocks should earn a higher return to compensate for their illiquidity. Moreover, the four-factor alphas do not control for this effect except to the extent that the size and book-to-market factors also proxy for liquidity. The final two columns of Table 5 report median returns in the portfolio formation month (REV) and the return over the previous 11 months (MOM). These two variables indicate the extent to which the portfolios are subject to
short-term reversal and intermediate-term momentum effects, respectively. Jegadeesh and Titman (1993) and subsequent papers show that over intermediate horizons, stocks exhibit a continuation pattern, i.e., past winners continue to do well and past losers continue to perform badly. Over shorter horizons, stocks exhibit return reversals, due partly to microstructure effects such as bid-ask bounce (Jegadeesh, 1990; Lehmann, 1990). The Fama-French-Carhart four-factor model does not control for short-term reversals; therefore, we control for the effects of REV in the context of bivariate sorts and cross-sectional regressions later in the paper. However, it is also possible that REV, a monthly return, does not adequately capture shorter-term effects. To verify that it is not daily or weekly microstructure effects that are driving our results, we subdivide the stocks in the high MAX portfolio according to when in the month the maximum daily return occurs. If the effect we find is more prominent for stocks whose maximum return occurs towards the end of the month, it would cast doubt on our interpretation of the evidence. There is no evidence of this phenomenon. For example, for value-weighted portfolios, average raw return differences between the low MAX and high MAX portfolios are -0.98% per month for stocks with the maximum return in the first half of the month versus -0.95% per month for those with the maximum return in the second half of the month. The alpha differences follow the same pattern. Similarly, the raw return differences for stocks with the maximum return in the first week of the month are -1.41% per month, which is larger in magnitude than the return difference of -0.89% per month for those stocks with maximum returns in the last week. Again, the alpha differences follow the same ordering. Moreover, the low returns associated with high MAX stocks persist beyond the first month after portfolio formation.
Thus, short-term reversals at the daily or weekly frequency do not seem to explain the results. Given that the portfolios are sorted on maximum daily returns, it is hardly surprising that median returns in the same month are also high, i.e., stocks with a high maximum daily return also have a high return that month. More interesting is the fact that the differences in median monthly returns for the portfolios of interest are smaller than the differences in the median MAX. For example, the difference in MAX between deciles 9 and 10 is 6.8% relative to a difference in monthly returns of 5.2%. In other words, the extreme daily returns on the lottery-like stocks are offset to some extent by lower returns on other days. This phenomenon explains why these same stocks can have lower average returns in the subsequent month (Table 1) even though they continue to exhibit a higher frequency of extreme positive returns (Tables 3 and 4). This lower average return is also mirrored in the returns over the prior 11 months. The high MAX portfolios exhibit significantly lower and even negative returns over the period prior to the portfolio formation month. The strength of this relation is perhaps surprising, but it is consistent with the fact that stocks with extreme positive daily returns are small and have low prices.
The final column in Table 5 reports the idiosyncratic volatility of the MAX-sorted portfolios. It is clear that MAX and IVOL are strongly positively correlated in the cross-section. We address the relation between extreme returns and idiosyncratic volatility in detail in Section 3.
Given these differing characteristics, there is some concern that the four-factor model used in Table 1 to calculate alphas is not adequate to capture the true difference in risk and expected returns across the portfolios sorted on MAX. For example, the HML and SMB factors of Fama and French do not fully explain the returns of portfolios sorted by book-to-market ratios and size.⁸ Moreover, the four-factor model does not control explicitly for the differences in expected returns due to differences in illiquidity or other known empirical phenomena such as short-term reversals. With the exception of short-term reversals and intermediate-term momentum, it seems unlikely that any of these factors can explain the return differences in Table 1 because high MAX stocks have characteristics that are usually associated with high expected returns, while these portfolios actually exhibit low returns. Nevertheless, in the following two subsections we provide different ways of dealing with the potential interaction of the maximum daily return with firm size, book-to-market, liquidity, and past returns. Specifically, we test whether the negative relation between MAX and the cross-section of expected returns still holds once we control for size, book-to-market, momentum, short-term reversal, and liquidity using bivariate portfolio sorts and Fama-MacBeth (1973) regressions.
2.3. Bivariate portfolio-level analysis
In this section we examine the relation between maximum daily returns and future stock returns after controlling for size, book-to-market, momentum, short-term reversals, and liquidity. For example, we control for size by first forming decile portfolios ranked based on market capitalization. Then, within each size decile, we sort stocks into decile portfolios ranked based on MAX so that decile 1 (decile 10) contains stocks with the lowest (highest) MAX. For brevity, we do not report returns for all 100 (10 × 10) portfolios. Instead, the first column of Table 6, Panel A presents returns averaged across the ten size deciles to produce decile portfolios with dispersion in MAX, but which contain all sizes of firms. This procedure creates a set of MAX portfolios with similar levels of firm size, and thus, these MAX portfolios control for differences in size. After controlling for size, the value-weighted average return difference between the low MAX and high MAX portfolios is about -1.22% per month with a Newey–West t-statistic of -4.49. The 10–1 difference in the four-factor alphas is -1.19% per month with a t-statistic of -5.98. Thus, market capitalization does not explain the high (low) returns to low (high) MAX stocks.
8 Daniel and Titman (1997) attribute this failure to the fact that returns are driven by characteristics, not risk. We take no stand on this issue, but instead conduct a further battery of tests to demonstrate the robustness of our results.
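The dependent double sort described in this section can be sketched for a single month as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary keys `"max"`, `"cap"`, and `"ret"`, and the rank-based breakpoint rule (ties broken by input order) are all hypothetical choices. Stocks are first grouped on the control variable, then grouped on MAX within each control group, and the value-weighted return of each MAX group is averaged across the control groups.

```python
from collections import defaultdict
from statistics import fmean


def rank_groups(values, n_groups=10):
    """Label each value with a group from 1 (lowest) to n_groups (highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for position, index in enumerate(order):
        groups[index] = position * n_groups // len(values) + 1
    return groups


def dependent_double_sort(stocks, control_key, n_groups=10):
    """Average value-weighted return of each MAX group across control groups.

    stocks: list of dicts for one month, each holding the control variable,
    "max" (maximum daily return in the previous month), "cap" (market
    capitalization used as the weight), and "ret" (next-month return).
    """
    # Step 1: sort on the control variable.
    control_groups = rank_groups([s[control_key] for s in stocks], n_groups)
    by_control = defaultdict(list)
    for stock, group in zip(stocks, control_groups):
        by_control[group].append(stock)

    # Step 2: within each control group, sort on MAX and value-weight returns.
    cell_returns = defaultdict(list)
    for members in by_control.values():
        max_groups = rank_groups([s["max"] for s in members], n_groups)
        cells = defaultdict(list)
        for stock, group in zip(members, max_groups):
            cells[group].append(stock)
        for group, cell in cells.items():
            total_cap = sum(s["cap"] for s in cell)
            vw_return = sum(s["cap"] * s["ret"] for s in cell) / total_cap
            cell_returns[group].append(vw_return)

    # Step 3: average each MAX group across the control groups.
    return {g: fmean(r) for g, r in sorted(cell_returns.items())}
```

Repeating this each month and averaging the resulting series over time would give numbers comparable in construction to a column of Table 6; replacing the capitalization weights with equal weights would correspond to the equal-weighted variant.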
If, instead of averaging across the size deciles, we look at the alpha differences for each decile in turn, the results are consistent with those reported in Section 2.2. Specifically, while the direction of the MAX effect is consistent across all the deciles, it is generally increasing in both magnitude and statistical significance as the market capitalization of the stocks decreases. The fact that the results from the bivariate sort on size and MAX are, if anything, both economically and statistically more significant than those presented for the univariate sort in Table 1 is perhaps not too surprising. As shown in Table 5, the high MAX stocks, which have low subsequent returns, are generally small stocks. The standard size effect would suggest that these stocks should have high returns. Thus, controlling for size should enhance the effect on raw returns and even on four-factor alphas to the extent that the SMB factor is an imperfect proxy. However, there is a second effect of bivariate sorts that works in the opposite direction. Size and MAX are correlated; hence, variation in MAX within size-sorted portfolios is smaller than in the broader universe of stocks. That this smaller variation in MAX still generates substantial return variation is further evidence of the significance of this phenomenon. The one concern with dependent bivariate sorts on correlated variables is that they do not sufficiently control for the control variable. In other words, there could be some residual variation in size across the MAX portfolios. We address this concern in two ways. First, we also try independent bivariate sorts on the two variables. These sorts produce very similar results. Second, in the next section we perform cross-sectional regressions in which all the variables appear as control variables. We control for book-to-market (BM) in a similar way, with the results reported in the second column of Table 6, Panel A. 
Again the effect of MAX is preserved, with a value-weighted average raw return difference between the low MAX and high MAX deciles of -0.93% per month and a corresponding t-statistic of -3.23. The 10–1 difference in the four-factor alphas is also negative, -1.06% per month, and highly significant. When controlling for momentum in column 3, the raw return and alpha differences are smaller in magnitude, but they are still economically large and statistically significant at all conventional levels. Again, the fact that momentum and MAX are correlated reduces the dispersion in maximum daily returns across the MAX portfolios, but intermediate-term continuation does not explain the phenomenon we show. Column 4 controls for short-term reversals. Since firms with large positive daily returns also tend to have high monthly returns, it is conceivable that MAX could be proxying for the well-known reversal phenomenon at the monthly frequency, which we do not control for in the four-factor model in Table 1. However, this is not the case. After controlling for the magnitude of the monthly return in the portfolio formation month, the return and alpha differences are still -81 and -98 basis points, respectively, and both numbers exhibit strong statistical significance. Finally, we control for liquidity by first forming decile portfolios ranked based on the illiquidity measure of
Table 6
Returns on portfolios of stocks sorted by MAX after controlling for SIZE, BM, MOM, REV, and ILLIQ.
Double-sorted, value-weighted (Panel A) and equal-weighted (Panel B) decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the maximum daily returns after controlling for size, book-to-market, intermediate-term momentum, short-term reversals, and illiquidity. In each case, we first sort the stocks into deciles using the control variable, then within each decile, we sort stocks into decile portfolios based on the maximum daily returns over the previous month so that decile 1 (10) contains stocks with the lowest (highest) MAX. This table presents average returns across the ten control deciles to produce decile portfolios with dispersion in MAX but with similar levels of the control variable. "Return difference" is the difference in average monthly returns between the High MAX and Low MAX portfolios. "Alpha difference" is the difference in four-factor alphas on the High MAX and Low MAX portfolios. Newey-West (1987) adjusted t-statistics are reported in parentheses.

Panel A: Value-weighted portfolios
Decile               SIZE      BM        MOM       REV       ILLIQ
Low MAX              1.47      1.22      1.32      1.06      1.29
2                    1.60      1.19      1.14      1.18      1.31
3                    1.69      1.27      1.17      1.19      1.30
4                    1.65      1.19      1.07      1.18      1.23
5                    1.57      1.17      1.03      1.15      1.12
6                    1.49      1.23      1.03      1.15      1.06
7                    1.29      1.13      0.96      1.04      0.99
8                    1.20      0.99      0.93      1.07      0.88
9                    0.93      0.89      0.88      0.86      0.60
High MAX             0.25      0.29      0.67      0.25      0.18
Return difference   -1.22     -0.93     -0.65     -0.81     -1.11
                    (-4.49)   (-3.23)   (-3.18)   (-2.70)   (-4.07)
Alpha difference    -1.19     -1.06     -0.70     -0.98     -1.12
                    (-5.98)   (-4.87)   (-5.30)   (-5.37)   (-5.74)

Panel B: Equal-weighted portfolios
Decile               SIZE      BM        MOM       REV       ILLIQ
Low MAX              1.52      1.37      1.47      1.36      1.40
2                    1.63      1.50      1.45      1.56      1.59
3                    1.73      1.53      1.38      1.60      1.60
4                    1.70      1.54      1.32      1.58      1.58
5                    1.62      1.48      1.29      1.59      1.52
6                    1.54      1.52      1.20      1.53      1.52
7                    1.38      1.45      1.15      1.44      1.40
8                    1.27      1.33      1.08      1.33      1.32
9                    1.04      1.19      1.03      1.15      1.05
High MAX             0.41      0.78      0.71      0.52      0.59
Return difference   -1.11     -0.59     -0.76     -0.83     -0.81
                    (-4.05)   (-2.00)   (-3.70)   (-2.83)   (-2.68)
Alpha difference    -1.06     -0.54     -0.88     -1.02     -0.79
                    (-5.18)   (-1.96)   (-7.62)   (-5.09)   (-3.40)
Amihud (2002), with the results reported in the final column of Table 6. Again, variation in MAX is apparently priced in the cross-section, with large return differences and corresponding t-statistics. Thus, liquidity does not explain the negative relation between maximum daily returns and future stock returns. As mentioned earlier, we compute illiquidity as the ratio of the absolute monthly return to the monthly trading volume. We can also compute the original illiquidity measure of Amihud (2002), defined as the daily absolute return divided by daily dollar trading volume averaged within the month. These measures are strongly correlated, but in the latter case, we need to make a decision about how to handle stocks with zero trading volume on at least one day within the month. When we eliminate these stocks from the sample, the findings remain essentially unchanged. Raw return and alpha differences are -1.25% per month and -1.20% per month, respectively. Thus, for the remainder of the paper we focus on the larger sample and the monthly measure of illiquidity.
Next, we turn to an examination of the equal-weighted average raw and risk-adjusted returns on MAX portfolios after controlling for the same cross-sectional effects as in Table 6, Panel A. Again, to save space, instead of presenting the returns of all 100 (10 × 10) portfolios for each control variable, we report the average returns of the MAX portfolios, averaged across the 10 control deciles to produce decile portfolios with dispersion in MAX but with similar levels of the control variable. Table 6, Panel B shows that after controlling for size, book-to-market, momentum, short-term reversal, and liquidity, the equal-weighted average return differences between the low MAX and high MAX portfolios are -1.11%, -0.59%, -0.76%, -0.83%, and -0.81% per month, respectively. These average raw return differences are both economically and statistically significant. The corresponding values for the equal-weighted average risk-adjusted
return differences are -1.06%, -0.54%, -0.88%, -1.02%, and -0.79%, which are also highly significant. These results indicate that for both the value-weighted and the equal-weighted portfolios, the well-known cross-sectional effects such as size, book-to-market, momentum, short-term reversal, and liquidity cannot explain the low returns to high MAX stocks.
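The firm-level regressions that complement these sorts follow the two-step Fama and MacBeth (1973) procedure: a cross-sectional regression each month, followed by a time-series average of the monthly slopes. The sketch below is a minimal illustration under stated assumptions rather than the authors' implementation. The function name `fama_macbeth` and its input layout are hypothetical, and the t-statistics use the plain time-series standard error of the slopes, not the Newey-West (1987) adjustment reported in the paper.

```python
import numpy as np


def fama_macbeth(returns, predictors):
    """Time-series averages of monthly cross-sectional OLS slopes.

    returns:    list with one 1-D array per month of next-month stock returns.
    predictors: list with one array per month of lagged characteristics,
                shaped (stocks, K) or (stocks,) for a single predictor.
    Returns (mean_slopes, t_stats), both of length K, intercept excluded.
    """
    monthly_slopes = []
    for r, x in zip(returns, predictors):
        r = np.asarray(r, dtype=float)
        x = np.asarray(x, dtype=float)
        # Add an intercept column and run the cross-sectional regression.
        design = np.column_stack([np.ones(len(r)), x])
        coef, *_ = np.linalg.lstsq(design, r, rcond=None)
        monthly_slopes.append(coef[1:])
    slopes = np.vstack(monthly_slopes)
    mean_slopes = slopes.mean(axis=0)
    # Simple standard error of the mean; NOT Newey-West adjusted.
    std_err = slopes.std(axis=0, ddof=1) / np.sqrt(slopes.shape[0])
    return mean_slopes, mean_slopes / std_err
```

Passing a single column (MAX alone) would correspond to a univariate specification, while stacking MAX with the control variables would correspond to a multivariate one.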
Table 7
Firm-level cross-sectional return regressions.
Each month from July 1962 to December 2005 we run a firm-level cross-sectional regression of the return in that month on subsets of lagged predictor variables including MAX in the previous month and six control variables that are defined in the Appendix. In each row, the table reports the time-series averages of the cross-sectional regression slope coefficients and their associated Newey-West (1987) adjusted t-statistics (in parentheses).

MAX        BETA       SIZE       BM         MOM        REV        ILLIQ
-0.0434
(-2.92)
           -0.0118
           (-0.43)
                      -0.1988
                      (-4.08)
                                 0.4651
                                 (6.73)
                                            0.7317
                                            (4.67)
                                                       -0.0675
                                                       (-11.24)
                                                                  0.0371
                                                                  (3.87)
           0.0140     -0.0865    0.3390     0.7436     -0.0751    0.0223
           (0.56)     (-1.73)    (4.82)     (5.29)     (-14.15)   (3.64)
-0.0637    0.0485     -0.1358    0.3201     0.6866     -0.0712    0.0224
(-6.16)    (2.18)     (-3.10)    (4.69)     (4.97)     (-13.53)   (3.78)

2.4. Firm-level cross-sectional regressions
So far we have tested the significance of the maximum daily return as a determinant of the cross-section of future returns at the portfolio level. This portfolio-level analysis has the advantage of being non-parametric in the sense that we do not impose a functional form on the relation between MAX and future returns. The portfolio-level analysis also has two potentially significant disadvantages. First, it throws away a large amount of information in the cross-section via aggregation. Second, it is a difficult setting in which to control for multiple effects or factors simultaneously. Consequently, we now examine the cross-sectional relation between MAX and expected returns at the firm level using Fama and MacBeth (1973) regressions. We present the time-series averages of the slope coefficients from the regressions of stock returns on maximum daily return (MAX), market beta (BETA), log market capitalization (SIZE), log book-to-market ratio (BM), momentum (MOM), short-term reversal (REV), and illiquidity (ILLIQ). The average slopes provide standard Fama-MacBeth tests for determining which explanatory variables, on average, have non-zero premiums. Monthly cross-sectional regressions are run for the following econometric specification and nested versions thereof:

$$R_{i,t+1} = \lambda_{0,t} + \lambda_{1,t}\,\mathrm{MAX}_{i,t} + \lambda_{2,t}\,\mathrm{BETA}_{i,t} + \lambda_{3,t}\,\mathrm{SIZE}_{i,t} + \lambda_{4,t}\,\mathrm{BM}_{i,t} + \lambda_{5,t}\,\mathrm{MOM}_{i,t} + \lambda_{6,t}\,\mathrm{REV}_{i,t} + \lambda_{7,t}\,\mathrm{ILLIQ}_{i,t} + \varepsilon_{i,t+1}, \qquad (1)$$

where $R_{i,t+1}$ is the realized return on stock i in month t+1. The predictive cross-sectional regressions are run on the one-month lagged values of MAX, BETA, SIZE, BM, REV, and ILLIQ, and MOM is calculated over the 11-month period ending 2 months prior to the return of interest. Table 7 reports the time-series averages of the slope coefficients $\lambda_{i,t}$ (i = 1, 2, ..., 7) over the 522 months from July 1962 to December 2005 for all NYSE/Amex/Nasdaq stocks. The Newey-West adjusted t-statistics are given in parentheses. The univariate regression results show a negative and statistically significant relation between the maximum daily return and the cross-section of future stock returns. The average slope, $\lambda_{1,t}$, from the monthly regressions of realized returns on MAX alone is -0.0434 with a t-statistic of -2.92. The economic magnitude of the associated effect is similar to that shown in Tables 1 and 6 for the univariate and bivariate sorts. The spread in median maximum daily returns between deciles 10 and 1 is approximately 16%. Multiplying this spread by the average slope yields an estimate of the monthly risk premium of -69 basis points. In general, the coefficients on the individual control variables are also as expected: the size effect is negative
and significant, the value effect is positive and significant, stocks exhibit intermediate-term momentum and short-term reversals, and illiquidity is priced. The average slope on BETA is negative and statistically insignificant, which contradicts the implications of the CAPM but is consistent with prior empirical evidence. In any case, these results should be interpreted with caution since BETA is estimated over a month using daily data, and thus, is subject to a significant amount of measurement error. The regression with all six control variables shows similar results, although the size effect is weaker and the coefficient on BETA is now positive, albeit statistically insignificant. Of primary interest is the last line of Table 7, which shows the results for the full specification with MAX and the six control variables. In this specification, the average slope coefficient on MAX is -0.0637, substantially larger in magnitude than in the univariate regression, with a commensurate increase in the magnitude of the t-statistic to -6.16. This coefficient corresponds to a 102 basis-point difference in expected monthly returns between median stocks in the high and low MAX deciles. The explanation for the increased magnitude of the estimated effect in the full specification is straightforward. Since stocks with high maximum daily returns tend to be small and illiquid, controlling for the increased expected return associated with these characteristics pushes the return premium associated with extreme positive return stocks even lower. These effects more than offset the reverse effect associated with intermediate-term momentum and short-term reversals, which partially explain the low future returns on high MAX stocks. The strength of the results is somewhat surprising given that there are sure to be low-priced, thinly traded
stocks within our sample whose daily returns will exhibit noise due to microstructure and other effects. To confirm this intuition, we re-run the cross-sectional regressions after winsorizing MAX at the 99th and 95th percentiles to eliminate outliers. In the full specification, the average coefficient on MAX increases in magnitude to -0.0788 and -0.0902, suggesting that the true economic effect is even larger than that shown in Table 7. A different but related robustness check is to run the same analysis using only NYSE stocks, which tend to be larger and more actively traded and are thus likely to have less noisy daily returns. For this sample, the baseline coefficient of -0.064 in Table 7 increases in magnitude to -0.077.
Given the characteristics of the high MAX stocks, as discussed previously, it is also worthwhile verifying that different methods of controlling for illiquidity do not affect the main results. Using the daily Amihud (2002) measure averaged over the month, the coefficient on MAX is somewhat larger in magnitude. In addition, controlling for the liquidity risk measure of Pastor and Stambaugh (2003) has little effect on the results.
The regression in Eq. (1) imposes a linear relation between returns and MAX for simplicity rather than for theoretical reasons. However, adding a quadratic term to the regression or using a piecewise linear specification appears to add little, if anything, to the explanatory power. Similarly, interacting MAX with contemporaneous volume, with the idea that trading volume may be related to the informativeness of the price movements, also proved fruitless.
The clear conclusion is that cross-sectional regressions provide strong corroborating evidence for an economically and statistically significant negative relation between extreme positive returns and future returns, consistent with models that suggest that idiosyncratic lottery-like payoffs are priced in equilibrium.
3. Idiosyncratic volatility and extreme returns
While arguably MAX is a theoretically motivated variable, there is still a concern that it may be proxying for a different effect. In particular, stocks with high volatility are likely to exhibit extreme returns of both signs. Moreover, stocks with high maximum daily returns in a given month will also have high realized volatility in the same month, measured using squared daily returns, almost by construction. Ang, Hodrick, Xing, and Zhang (2006, 2009) show that idiosyncratic volatility has a significant negative price in the cross-section, i.e., stocks with high idiosyncratic volatility have low subsequent returns⁹; thus, it is plausible that MAX is proxying for this effect. We examine this issue in detail in this section.
9 Fu (2009) emphasizes the time-series variation in idiosyncratic volatility and finds a significantly positive relation between conditional idiosyncratic variance and the cross-section of expected returns. Spiegel and Wang (2005) estimate idiosyncratic volatility from monthly rather than daily returns and find that stock returns increase with the level of idiosyncratic risk and decrease with the stock's liquidity but that idiosyncratic risk often subsumes the explanatory power of liquidity. Fu (2009) and Huang, Liu, Rhee, and Zhang (2010) argue that the results are
Table 8
Time-series average of cross-sectional correlations.
The table reports the average across months of the cross-sectional correlation of the maximum daily return in a month (MAX), the average of the highest five daily returns in a month (MAX(5)), the minimum daily return in a month (MIN), total volatility (TVOL), and idiosyncratic volatility (IVOL) for the period July 1962 to December 2005.

          MAX       MAX(5)    MIN       TVOL      IVOL
MAX       1         0.8981    0.5491    0.7591    0.7533
MAX(5)              1         0.6153    0.8312    0.8204
MIN                           1         0.7603    0.7554
TVOL                                    1         0.9842
IVOL                                              1
As preliminary evidence, Table 8 provides the average monthly cross-sectional correlations between five variables of interest: MAX (the maximum daily return within the month), MAX(5) (the average of the highest five daily returns within the month), MIN (the negative of the minimum daily return within the month), TVOL (monthly realized total volatility measured using daily returns within the month), and IVOL (monthly realized idiosyncratic volatility measured using the residuals from a daily market model within the month). TVOL, IVOL, and MIN are defined in the Appendix. We reverse the sign on the minimum daily returns so that high values of MIN correspond to more extreme returns. Note that idiosyncratic volatility and total volatility are essentially identical when measured within a month due to the low explanatory power of the market model regression. In our sample, the average cross-sectional correlation between these variables exceeds 0.98. We choose to work with IVOL since it corresponds to the variable used by Ang, Hodrick, Xing, and Zhang (2006).¹⁰
Not surprisingly, MAX and MAX(5) are highly correlated. Of greater interest, the average cross-sectional correlations between IVOL and both MAX and MIN are approximately 0.75, which is very high given that all three variables are calculated at the individual stock level. MAX(5) is even more highly correlated with IVOL than MAX. Moreover, these correlations are not driven simply by the fact that a squared extreme daily return leads to a high measured realized volatility. Even when the maximum and minimum daily returns are eliminated prior to the calculation of volatility, volatility remains highly correlated with MAX, MAX(5), and MIN. MAX and MAX(5) are also quite closely related to MIN, with correlations of 0.55 and 0.62, respectively. Clearly stocks with high volatility exhibit extreme returns and vice versa.
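The five variables compared in Table 8 can be illustrated for one stock-month with the minimal Python sketch below. The precise definitions of TVOL and IVOL are given in the paper's Appendix, which is not reproduced here, so the sketch assumes that both are sample standard deviations (of daily returns and of daily market-model residuals, respectively). The function name `extreme_return_measures` and the dictionary keys are hypothetical.

```python
from statistics import fmean, stdev


def extreme_return_measures(stock, market, n_top=5):
    """MAX, MAX(5), MIN, TVOL, and IVOL from one month of daily returns.

    stock, market: equal-length sequences of daily stock and market returns.
    Assumption: volatilities are sample standard deviations; the paper's
    Appendix may scale or define them differently.
    """
    if len(stock) != len(market) or len(stock) <= n_top:
        raise ValueError("need equal-length series with more than n_top days")
    mean_s, mean_m = fmean(stock), fmean(market)
    sxx = sum((m - mean_m) ** 2 for m in market)
    if sxx == 0.0:
        raise ValueError("market returns have no variation in this month")
    # Daily market model: stock = alpha + beta * market + residual.
    beta = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock, market)) / sxx
    alpha = mean_s - beta * mean_m
    residuals = [s - alpha - beta * m for s, m in zip(stock, market)]
    top_returns = sorted(stock, reverse=True)[:n_top]
    return {
        "MAX": max(stock),            # maximum daily return
        "MAX(5)": fmean(top_returns),  # average of the n_top highest returns
        "MIN": -min(stock),           # sign reversed, as in the text
        "TVOL": stdev(stock),         # total volatility
        "IVOL": stdev(residuals),     # idiosyncratic volatility
    }
```

Because the market-model residuals can never have a larger sum of squares than the demeaned returns, IVOL computed this way is bounded above by TVOL, and the two are close whenever the market model explains little of the daily variation.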
A second important piece of preliminary evidence is to verify the relation between idiosyncratic volatility and
(footnote continued) driven by monthly stock-return reversals, although Nyberg (2008) disputes this claim. Fang and Peress (2009) show that the idiosyncratic volatility effect is reversed for stocks with no media coverage. 10 Measuring idiosyncratic volatility relative to a three-factor or four-factor model rather than the market model has little effect on the results.
T.G. Bali et al. / Journal of Financial Economics 99 (2011) 427–446
future returns in our sample. We conduct a univariate portfolio sort on IVOL, similar to that given in Table 1 for MAX, although, for brevity, we do not report the results in detail. These results look very similar to those in Table 1. For value-weighted returns, deciles 1 through 7 (lower idiosyncratic volatility) all exhibit average monthly returns of around 1%. These returns fall dramatically for the higher volatility stocks, all the way to 0.02% per month for decile 10. Both the return differences between the low and high IVOL deciles of −0.93% per month and the corresponding four-factor alpha differences of −1.33% are economically and statistically significant. These results coincide closely with the results in Ang, Hodrick, Xing, and Zhang (2006), although they form quintiles rather than deciles and use a slightly shorter sample period. Of some interest, there is no evidence of an idiosyncratic volatility effect in equal-weighted portfolios, a result that is found in Bali and Cakici (2008). Given the strong positive correlation between MAX and IVOL shown in Table 8 above, it is not surprising that average maximum daily returns increase across the IVOL-sorted portfolios. In fact, the range is not that much smaller than in the MAX-sorted portfolios (Table 1), with the stocks in the low and high IVOL portfolios having an average MAX of 1.95% and 17.31%, respectively. To examine the relation between extreme returns and volatility more closely, we first conduct four bivariate sorts. In Table 9, Panel A we sort on both the maximum daily return (MAX) and the average of the five highest daily returns (MAX(5)), controlling for idiosyncratic volatility. We first form decile portfolios ranked based
Table 9
Returns on portfolios of stocks sorted by MAX and IVOL after controlling for IVOL and MAX.
Double-sorted, value-weighted (VW) and equal-weighted (EW) decile portfolios are formed every month from July 1962 to December 2005. In Panel A we sort stocks based on the maximum daily return (MAX) or average of the five highest daily returns (MAX(5)) after controlling for idiosyncratic volatility (IVOL). In Panel B we sort stocks based on idiosyncratic volatility (IVOL) after controlling for the maximum daily return (MAX) or average of the five highest daily returns (MAX(5)). In both cases, we first sort the stocks into deciles using the control variable, then within each decile, we sort stocks into decile portfolios based on the variable of interest. The columns report average returns across the ten control deciles to produce decile portfolios with dispersion in the variable of interest but with similar levels of the control variable. "Return difference" is the difference in average monthly returns between deciles 10 and 1. "Alpha difference" is the difference in four-factor alphas between deciles 10 and 1. Newey-West (1987) adjusted t-statistics are reported in parentheses.
Panel A: Sorted by MAX and MAX(5) controlling for IVOL
                         N=1       N=1       N=5       N=5
Decile                    VW        EW        VW        EW
Low MAX(N)              1.12      2.01      1.39      2.25
2                       1.09      1.65      1.18      1.81
3                       0.94      1.54      1.20      1.67
4                       0.93      1.41      1.11      1.51
5                       0.80      1.34      0.99      1.38
6                       0.77      1.22      0.84      1.21
7                       0.79      1.19      0.74      1.11
8                       0.82      1.23      0.79      1.06
9                       0.76      1.04      0.67      0.93
High MAX(N)             0.77      1.10      0.53      0.75
Return difference      −0.35     −0.91     −0.86     −1.50
                     (−2.42)   (−7.86)   (−4.36)   (−9.21)
Alpha difference       −0.34     −0.92     −0.84     −1.58
                     (−2.48)   (−7.96)   (−4.98)  (−10.05)
Panel B: Sorted by IVOL controlling for MAX and MAX(5)
                         MAX       MAX    MAX(5)    MAX(5)
Decile                    VW        EW        VW        EW
Low IVOL                1.03      1.18      0.89      0.84
2                       0.93      1.15      0.86      1.02
3                       0.90      1.10      0.78      1.03
4                       0.92      1.17      0.93      1.17
5                       0.95      1.27      0.97      1.20
6                       0.88      1.21      0.98      1.28
7                       0.94      1.37      0.99      1.40
8                       0.83      1.48      1.09      1.56
9                       0.73      1.52      0.96      1.69
High IVOL               0.66      2.16      0.95      2.51
Return difference      −0.38      0.98      0.06      1.67
                     (−1.98)    (4.88)    (0.29)    (8.04)
Alpha difference       −0.44      0.95      0.05      1.74
                     (−3.12)    (4.76)    (0.34)    (7.67)
on idiosyncratic volatility, and within each IVOL decile we sort stocks into decile portfolios based on MAX or MAX(5) so that decile 1 (decile 10) contains stocks with the lowest (highest) MAX(N). Panel A shows the average of the value-weighted and equal-weighted returns across the IVOL deciles and the associated Newey-West t-statistics. The key statistics are the return and four-factor alpha differences (and Newey-West t-statistics) between the low MAX(N) and high MAX(N) portfolios, i.e., the differences between returns on portfolios that vary in MAX(N) but have approximately the same levels of idiosyncratic volatility. The value-weighted average raw return difference between the low MAX and high MAX deciles is −0.35% per month with a t-statistic of −2.42. The 10–1 difference in the four-factor alphas is also negative, −0.34% per month, and highly significant. These magnitudes are much smaller than we have seen previously, but this result is hardly surprising. Idiosyncratic volatility and MAX are highly correlated; thus, after controlling for idiosyncratic volatility, the spread in maximum returns is significantly reduced. Nevertheless, idiosyncratic volatility does not completely explain the high (low) returns to low (high) MAX stocks. The equal-weighted average raw and risk-adjusted return differences between the low MAX and high MAX portfolios are much more negative, greater than 90 basis points per month in absolute magnitude, and highly significant with t-statistics of −7.86 and −7.96, respectively. However, recall that the idiosyncratic volatility effect does not exist in equal-weighted portfolios. When we sort on the average of the five highest daily returns within the month, the return and alpha differences for both value-weighted and equal-weighted portfolios exhibit substantially greater economic and statistical significance, consistent with the univariate results reported in Table 2.
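The conditional (dependent) double sort described above can be sketched for a single month as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the dictionary keys `ret` (next-month return) and `cap` (market capitalization), and the near-equal-size rank grouping are all hypothetical choices.

```python
from statistics import mean


def rank_groups(items, key, n_groups):
    """Split items into n_groups rank-ordered groups of near-equal size."""
    ranked = sorted(items, key=key)
    n = len(ranked)
    return [ranked[i * n // n_groups:(i + 1) * n // n_groups]
            for i in range(n_groups)]


def portfolio_return(stocks, value_weighted):
    """Portfolio return, weighted by market cap ('cap') or equally."""
    if value_weighted:
        total_cap = sum(s["cap"] for s in stocks)
        return sum(s["cap"] * s["ret"] for s in stocks) / total_cap
    return mean(s["ret"] for s in stocks)


def dependent_double_sort(stocks, control, target, n_groups=10,
                          value_weighted=False):
    """One month of a conditional double sort.

    Sort on `control` first, then on `target` within each control group,
    and average the j-th target portfolio across all control groups.
    Returns n_groups portfolio returns, ordered from low to high target.
    """
    if len(stocks) < n_groups ** 2:
        raise ValueError("need at least n_groups**2 stocks")
    by_target = [[] for _ in range(n_groups)]
    for control_group in rank_groups(stocks, lambda s: s[control], n_groups):
        inner = rank_groups(control_group, lambda s: s[target], n_groups)
        for j, bucket in enumerate(inner):
            by_target[j].append(portfolio_return(bucket, value_weighted))
    return [mean(returns) for returns in by_target]
```

Calling `dependent_double_sort(stocks, control="ivol", target="max")` corresponds to the Panel A design and swapping the two arguments to the Panel B design. The reported figures would then be time-series averages of these monthly portfolio returns, with the 10–1 difference taken as the last element minus the first.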
In both cases, if we examine the alpha differences individually for each IVOL decile, the pattern is intuitive. Given the high correlation between MAX and IVOL, it is only in the higher IVOL deciles where there are larger numbers of stocks with extreme positive returns. Therefore, the MAX effect tends to increase in magnitude and statistical significance as IVOL increases. What happens if we perform the reverse sort, i.e., if we examine the explanatory power of idiosyncratic volatility after controlling for MAX(N)? In Table 9, Panel B, we first form decile portfolios ranked based either on the maximum daily returns over the past one month (MAX) or the average of the five highest daily returns (MAX(5)). Then, within each MAX(N) decile, we sort stocks into decile portfolios ranked based on IVOL so that decile 1 (decile 10) contains stocks with the lowest (highest) IVOL. When controlling for MAX, the average value-weighted raw return difference between the low IVOL and high IVOL portfolios is −0.38% per month with a t-statistic of −1.98. The 10–1 difference in the four-factor alphas is also negative, −0.44% per month, and statistically significant. These magnitudes are much smaller than those obtained from the univariate volatility portfolios; nevertheless, for the value-weighted portfolios, maximum daily return does not completely explain the idiosyncratic volatility puzzle in a simple bivariate sort.
There are two possible explanations for this result in combination with the results of Table 9, Panel A, and the significance of IVOL in the context of a univariate sort. First, MAX and IVOL could be picking up separate effects, both of which exist in the data. The absence of an idiosyncratic volatility effect in equal-weighted portfolios could be due to measurement issues for smaller stocks. Alternatively, it could be that bivariate sorts are not powerful enough to disentangle the true effect. While the idea of the bivariate sort is to produce portfolios with variation in the variable of interest but similar levels of the control variable, this goal is extremely difficult to achieve for highly correlated variables. While the stocks in the portfolios whose returns are reported in the first column of Table 9, Panel B do vary in their levels of idiosyncratic volatility, they also vary in their maximum daily returns. For example, the averages of the median idiosyncratic volatilities are 1.69% and 4.57% for the low and high IVOL portfolios, respectively, but the averages of the median MAX for these portfolios are 6.03% and 8.90%. Thus, it is difficult to know which effect is actually producing the negative return and alpha differences between these portfolios. One might think that an independent bivariate sort would solve this problem. Unfortunately, such a sort is infeasible because there are so few stocks with extreme positive returns and low volatility, or high volatility and no extreme returns. As a result, the portfolios of interest are exactly those for which we cannot observe reliable returns. However, columns 2–4 of Panel B do shed further light on the issue of disentangling the effects of the two variables. In column 2, we report the results for equal-weighted portfolios, controlling for MAX. The average return difference between the high IVOL and low IVOL portfolios is about 0.98% per month with a Newey-West t-statistic of 4.88.
The 10–1 difference in the four-factor alphas is 0.95% per month with a t-statistic of 4.76. Thus, after controlling for MAX, we find a significant and positive relation between IVOL and the cross-section of expected returns. This is the reverse of the counterintuitive negative relation shown by Ang, Hodrick, Xing, and Zhang (2006, 2009). Once we control for extreme positive returns, there appears to be a reward for holding idiosyncratic risk. This result is consistent with a world in which risk-averse and poorly diversified agents set prices, yet these agents have a preference for lottery-like assets, i.e., assets with extreme positive returns in some states. First, note that measurement error in idiosyncratic volatility cannot explain this positive and significant relation between idiosyncratic volatility and returns. Measurement error in the sorting variable will push return differences toward zero, but it cannot explain a sign reversal that is statistically significant, especially at the levels we report. Second, the inability to adequately control for variation in the control variable MAX is also not a viable explanation for these results. Residual variation in MAX is generating, if anything, the opposite effect. Finally, a positive relation between idiosyncratic volatility and returns and a negative relation between MAX and returns provides an explanation for the absence
Based on the bivariate equal-weighted portfolios and the firm-level cross-sectional regressions with MAX and IVOL, our conclusion is that there is no idiosyncratic volatility puzzle as recently reported in Ang, Hodrick, Xing, and Zhang (2006, 2009). In fact, stocks with high idiosyncratic volatility have higher future returns as would be expected in a world where poorly diversified and risk-averse investors help determine prices. We conclude that the reason for the presence of a negative relation between IVOL and expected returns shown by Ang et al. is that IVOL is a proxy for MAX. Interestingly, Han and Kumar (2008) provide evidence that the idiosyncratic volatility puzzle is concentrated in stocks dominated by retail investors. This evidence complements our results, since it is retail investors who are more likely to suffer from underdiversification and exhibit a preference for lottery-like assets. A slightly different way to examine the relation between extreme returns and volatility is to look at minimum returns. If it is a volatility effect that is driving returns, then MIN (the minimum daily return over the month), which is also highly correlated with volatility, should generate a similar effect to MAX. On the other hand, much of the theoretical literature would predict that the effect of MIN should be the opposite of that of MAX. For example, if investors have a skewness preference, then stocks with negatively skewed returns should require higher returns. Similarly, under the cumulative prospect theory of Barberis and Huang (2008), small probabilities or large losses are over-weighted, and thus, these stocks have lower prices and higher expected returns. To examine this issue, we form portfolios of stocks sorted on MIN after controlling for MAX. For brevity the results are not reported, but the return and alpha differences are positive and statistically significant,
of a univariate idiosyncratic volatility effect in equal-weighted portfolios. This particular weighting scheme causes the IVOL and MAX effects to cancel, generating small and insignificant return differences. To confirm these conclusions, the last two columns of Table 9, Panel B present results for portfolios that control for our somewhat more powerful measure of extreme returns, the average of the five highest daily returns during the month (MAX(5)). Using this control variable, the differences between the raw and risk-adjusted returns on high IVOL and low IVOL value-weighted portfolios are positive, albeit insignificant, and the differences for equal-weighted portfolios are positive and extremely economically and statistically significant. The evidence supports the theoretically coherent hypothesis that lottery-like stocks command a price premium and those with high idiosyncratic risk trade at a discount. We further examine the cross-sectional relation between IVOL and expected returns at the firm level using Fama-MacBeth regressions, with the results reported in the top half of Table 10. In the univariate regression, the average slope coefficient on IVOL is negative, −0.05, but it is not statistically significant (t-stat = −0.97). This lack of significance mirrors the result that there is little or no relation between volatility and future returns in equal-weighted portfolios. The cross-sectional regressions put equal weight on each firm observation. When we add MAX to the regression, the negative relation between idiosyncratic volatility and expected returns is reversed. Specifically, the estimated average slope coefficient on IVOL is 0.39 with a Newey-West t-statistic of 4.69. This positive relation between IVOL and expected returns remains significant even after augmenting the regression with the six control variables.
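The Fama-MacBeth procedure referred to above can be sketched in two steps: run one cross-sectional OLS regression per month, then average the monthly coefficients and compute a Newey-West t-statistic on each coefficient series. The sketch below is a plain-Python illustration under stated assumptions, not the authors' code. The function names are hypothetical, the lag length `lags` is a free parameter (the text does not state the lag used), and the Bartlett-kernel variance is computed without a small-sample adjustment.

```python
from statistics import mean


def ols(y, X):
    """OLS coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        if A[pivot_row][col] == 0.0:      # catches exact singularity only
            raise ValueError("regressors are collinear")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def newey_west_tstat(series, lags):
    """t-statistic of the mean using a Bartlett-kernel long-run variance."""
    T = len(series)
    avg = mean(series)
    dev = [x - avg for x in series]
    long_run_var = sum(d * d for d in dev) / T
    for lag in range(1, lags + 1):
        gamma = sum(dev[t] * dev[t - lag] for t in range(lag, T)) / T
        long_run_var += 2.0 * (1.0 - lag / (lags + 1)) * gamma
    # Raises ZeroDivisionError if the series has no variation at all.
    return avg / (long_run_var / T) ** 0.5


def fama_macbeth(months, lags=0):
    """months: list of (returns, regressor_rows) pairs, one pair per month.

    Returns (average coefficients, Newey-West t-statistics), intercept first.
    """
    monthly = [ols(y, X) for y, X in months]
    k = len(monthly[0])
    series = [[coefs[j] for coefs in monthly] for j in range(k)]
    return ([mean(s) for s in series],
            [newey_west_tstat(s, lags) for s in series])
```

Here each regressor row would hold the lagged values of MAX, IVOL, and any controls for one firm. Because every firm enters each monthly regression with the same weight, the averaged slopes behave more like equal-weighted than value-weighted portfolio results.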
Table 10
Firm-level cross-sectional return regressions with MAX, MIN, and IVOL.
Each month from July 1962 to December 2005 we run a firm-level cross-sectional regression of the return in that month on subsets of lagged predictor variables including MAX, MIN, and IVOL in the previous month and six control variables that are defined in the Appendix. In each row, the table reports the time-series averages of the cross-sectional regression slope coefficients and their associated Newey-West (1987) adjusted t-statistics (in parentheses).
       MAX      IVOL       MIN      BETA      SIZE        BM       MOM       REV     ILLIQ
   −0.0434
   (−2.92)
             −0.0530
             (−0.97)
   −0.1549    0.3857
  (−10.19)    (4.69)
   −0.0961    0.1210              0.0496   −0.1047    0.3244    0.7199   −0.0716    0.0232
   (−7.90)    (1.95)              (2.27)   (−2.71)    (4.85)    (5.35)  (−14.27)    (3.76)

                        0.0593
                        (2.41)
   −0.0900              0.1280
   (−7.84)              (6.21)
   −0.0761              0.0393    0.0390   −0.1135    0.3312    0.7098   −0.0694    0.0223
   (−8.62)              (2.66)    (1.87)   (−2.77)    (4.95)    (5.18)  (−14.35)    (3.74)
   −0.1103    0.0840    0.1029
   (−6.90)    (0.94)    (5.43)
   −0.0886    0.0589    0.0219    0.0382   −0.1082    0.3287    0.7100   −0.0706    0.0230
   (−6.67)    (0.80)    (1.45)    (1.80)   (−2.80)    (4.92)    (5.31)  (−14.75)    (3.73)
4. Skewness and MAX
Our final empirical exercise is to examine the link, if any, between extreme positive returns and skewness in terms of their ability to explain the cross-section of expected returns. The investigation of the role of higher moments in asset pricing has a long history. Arditti (1967), Kraus and Litzenberger (1976), and Kane (1982) extend the standard mean-variance portfolio theory to incorporate the effect of skewness on valuation. They present a three-moment asset pricing model in which investors hold concave preferences and like positive skewness. In this framework, assets that decrease a portfolio’s skewness (i.e., that make the portfolio returns more left-skewed) are less desirable and should command higher expected returns. Similarly, assets that increase a portfolio’s skewness should generate lower expected returns.11 From our perspective, the key implication of these models is that it is systematic skewness, not idiosyncratic skewness, that explains the cross-sectional variation in stock returns. Investors hold the market portfolio in which idiosyncratic skewness is diversified away, and thus, the appropriate measure of risk is co-skewness, i.e., the extent to which the return on an individual asset covaries with the variance of market returns. Harvey and Siddique (1999, 2000) and Smith (2007) measure conditional coskewness and find that stocks with lower co-skewness outperform stocks with higher co-skewness, consistent with the theory, and that this premium varies significantly over time. In contrast, the extreme daily returns measured by MAX are almost exclusively idiosyncratic in nature, at least for the high MAX stocks, which produce the anomalous, low subsequent returns. Of course, this does not mean that MAX is not proxying for the systematic skewness, or co-skewness, of stocks. Thus, the first question is whether MAX, despite its idiosyncratic nature, is robust to controls for co-skewness. The second question is whether MAX is priced because it proxies for idiosyncratic skewness. In other words, is MAX simply a good proxy for the third moment of returns? There is some empirical evidence for a skewness effect in returns.
For example, Zhang (2005) computes a measure of cross-sectional skewness, e.g., the skewness of firm returns within an industry, that predicts future returns at the portfolio level. Boyer, Mitton, and Vorkink (2010) employ a measure of expected skewness, i.e., a projection of 5-year-ahead skewness on a set of predetermined variables, including stock characteristics, to predict portfolio returns over the subsequent month. Finally, Conrad, Dittmar, and Ghysels (2008) show that measures of risk-neutral skewness from option prices predict subsequent returns. In all three cases, the direction of the results is consistent with our evidence, i.e., more positively skewed stocks have lower returns, but these effects are generally weaker than the economically and statistically strong evidence we provide in Section 2. Of equal importance, there is no theoretical reason to prefer return skewness to extreme returns as a potential variable to explain the cross-section of expected returns. In the model of Barberis and Huang (2008), based on the cumulative prospect theory of Tversky and Kahneman
11 Arditti (1971), Friend and Westerfield (1980), Sears and Wei (1985), Barone-Adesi (1985), and Lim (1989) provide empirical analyses of the role of skewness.
although both the magnitudes and levels of significance are lower than those for MAX. This evidence suggests that stocks with extreme low returns have higher expected returns in the subsequent month. The opposite effects of MAX and MIN are consistent with cumulative prospect theory and skewness preference, but they are not consistent with the hypothesis that extreme returns are simply proxying for idiosyncratic volatility. In addition to the portfolio-level analyses, we run firm-level Fama-MacBeth cross-sectional regressions with MAX, MIN, and IVOL. The bottom half of Table 10 presents the average slope coefficients and the Newey-West adjusted t-statistics. For all econometric specifications, the average slope on MAX remains negative and significant, confirming our earlier findings from the bivariate sorts. After controlling for MIN and IVOL, as well as market beta, size, book-to-market, momentum, short-term reversals, and liquidity, the average slope on MAX is −0.089 with a t-statistic of −6.67. For specifications with MAX and MIN, but not IVOL, the average slope on MIN is positive and both economically and statistically significant. Note that the original minimum returns are multiplied by −1 in constructing the variable MIN. Therefore, the positive slope coefficient means that the more a stock fell in value, the higher the future expected return. The addition of the six control variables clearly weakens the estimated effect. This result is not surprising since stocks with extreme negative returns have characteristics similar to those of firms with extreme positive returns, i.e., they tend to be small and illiquid. Thus, size and illiquidity both serve to explain some of the positive returns earned by these stocks. Moreover, the MIN effect is not robust to the same subsampling exercises we report in Section 2.2 for MAX.
When we exclude stocks whose market capitalizations would place them in the smallest NYSE size quintile, or when we examine NYSE stocks only, the MIN effect is no longer statistically significant, and, in fact, the sign of the effect is often reversed. Thus, the MIN effect, in contrast to the MAX effect, appears to be limited to micro-cap stocks. This result is perhaps not surprising because it may be costly to engage in the short-selling necessary to exploit the MAX effect, while exploiting the MIN effect involves taking a long position in the relevant stocks. For the full specification with MAX, MIN, and IVOL, the coefficients on MIN and IVOL are no longer statistically significant. However, this result is most likely due to the multicollinearity in the regression, i.e., the correlations between MIN and IVOL (see Table 8) and between MIN, IVOL, and the control variables. The true economic effect of extreme negative returns is still an open issue, but these regressions provide further evidence that there is no idiosyncratic volatility puzzle.
(1992), it is the low probability, extreme return states that drive the results, not skewness directly. Similarly, in the optimal beliefs model of Brunnermeier, Gollier, and Parker (2007), it is again low probability states that drive the relevant pricing effects. Only in the model of Mitton and Vorkink (2007), who assume a preference for positive skewness, is skewness the natural measure. To determine whether the information content of maximum daily returns and skewness is similar, we test the significance of the cross-sectional relation between MAX and future stock returns after controlling for total skewness (TSKEW), idiosyncratic skewness (ISKEW), and systematic skewness (SSKEW). In contrast with our other control variables, we calculate these skewness measures primarily over one year using daily returns.12 A one-year horizon provides a reasonable tradeoff between having a sufficient number of observations to estimate skewness and accommodating time-variation in skewness. Total skewness is the natural measure of the third central moment of returns; systematic skewness, or co-skewness, is the coefficient of a regression of returns on squared market returns, including the market return as a second regressor (as in Harvey and Siddique, 2000); and idiosyncratic skewness is the skewness of the residuals from this regression. These variables are defined in more detail in the Appendix. Total skewness and idiosyncratic skewness are similar for most stocks due to the low explanatory power of the regression using daily data. We first perform bivariate sorts on MAX while controlling for skewness. We control for total skewness by forming decile portfolios ranked based on TSKEW. Then, within each TSKEW decile, we sort stocks into decile portfolios ranked based on MAX so that decile 1 (decile 10) contains stocks with the lowest (highest) MAX. 
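The three skewness measures defined above can be illustrated with a short sketch. It assumes raw (not excess) daily returns, a population-moment skewness estimator, and a closed-form two-regressor OLS fit; the function names and dictionary keys are hypothetical, and the precise definitions used in the paper are those in the Appendix.

```python
from statistics import mean


def skewness(values):
    """Third central moment scaled by the cubed standard deviation."""
    avg = mean(values)
    m2 = mean((v - avg) ** 2 for v in values)
    m3 = mean((v - avg) ** 3 for v in values)
    scale = m2 ** 1.5
    return m3 / scale if scale > 0 else 0.0   # degenerate series: report 0


def skewness_measures(stock, market):
    """TSKEW, SSKEW and ISKEW from daily returns over the estimation window.

    Fits r_i = a + b * r_m + g * r_m**2 + e by OLS; SSKEW is the slope g
    on squared market returns and ISKEW is the skewness of the residuals e.
    """
    squared = [m * m for m in market]
    y_bar, m_bar, q_bar = mean(stock), mean(market), mean(squared)
    dy = [y - y_bar for y in stock]
    dm = [m - m_bar for m in market]
    dq = [q - q_bar for q in squared]

    # Demeaned cross-products for the 2x2 normal equations.
    s_mm = sum(a * a for a in dm)
    s_qq = sum(a * a for a in dq)
    s_mq = sum(a * b for a, b in zip(dm, dq))
    s_my = sum(a * b for a, b in zip(dm, dy))
    s_qy = sum(a * b for a, b in zip(dq, dy))
    det = s_mm * s_qq - s_mq ** 2
    if det == 0:
        raise ValueError("market return and its square are collinear")

    beta = (s_qq * s_my - s_mq * s_qy) / det
    gamma = (s_mm * s_qy - s_mq * s_my) / det
    alpha = y_bar - beta * m_bar - gamma * q_bar
    residuals = [y - alpha - beta * m - gamma * q
                 for y, m, q in zip(stock, market, squared)]
    return {"TSKEW": skewness(stock), "SSKEW": gamma,
            "ISKEW": skewness(residuals), "BETA": beta}
```

When the regression explains little of the daily variation, the residuals are close to the demeaned returns themselves, which is why TSKEW and ISKEW are similar for most stocks.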
The first column of Table 11 shows returns averaged across the 10 TSKEW deciles to produce decile portfolios with dispersion in MAX, but which contain firms with all levels of total skewness. After controlling for total skewness, the value-weighted average return difference between the low MAX and high MAX portfolios is about −0.94% per month with a Newey-West t-statistic of −3.06. The 10–1 difference in the four-factor alphas is −1.00% per month with a t-statistic of −4.34. Thus, total skewness does not explain the high (low) returns to low (high) MAX stocks. The last two columns of Table 11 present similar results from the bivariate sorts of portfolios formed based on MAX after controlling for systematic and idiosyncratic skewness, respectively. After controlling for systematic skewness, or co-skewness, the value-weighted average raw and risk-adjusted return differences between the low MAX and high MAX portfolios are in the range of −110 to −123 basis points per month and highly significant. After controlling for idiosyncratic skewness, the value-weighted average raw and risk-adjusted return differences between the low MAX and high MAX portfolios are −0.93% and −1.01% per month with t-statistics of −2.96 and −4.34, respectively. These results indicate that systematic and idiosyncratic skewness
12 We test the robustness of our conclusions to variation in the measurement horizon (1, 3, 6, and 12 months) and find similar results.
Table 11
Returns on portfolios of stocks sorted by MAX after controlling for skewness.
Double-sorted, value-weighted decile portfolios are formed every month from July 1962 to December 2005 by sorting stocks based on the maximum daily returns after controlling for total (TSKEW), systematic (SSKEW), and idiosyncratic skewness (ISKEW). In each case, we first sort the stocks into deciles using the control variable, then within each decile, we sort stocks into decile portfolios based on the maximum daily returns over the previous month so that decile 1 (10) contains stocks with the lowest (highest) MAX. The table reports average returns across the ten control deciles to produce decile portfolios with dispersion in MAX but with similar levels of the control variable. "Return difference" is the difference in average monthly returns between high MAX and low MAX portfolios. "Alpha difference" is the difference in four-factor alphas between high MAX and low MAX portfolios. Newey-West (1987) adjusted t-statistics are reported in parentheses.
Decile                 TSKEW     SSKEW     ISKEW
Low MAX                 1.06      1.12      1.04
2                       1.11      1.06      1.14
3                       1.21      1.06      1.18
4                       1.07      1.10      1.08
5                       1.13      1.11      1.17
6                       1.14      1.10      1.10
7                       0.97      0.98      0.99
8                       0.87      0.89      0.91
9                       0.76      0.80      0.74
High MAX                0.12      0.03      0.11
Return difference      −0.94     −1.10     −0.93
                     (−3.06)   (−3.75)   (−2.96)
Alpha difference       −1.00     −1.23     −1.01
                     (−4.34)   (−5.50)   (−4.34)
cannot explain the significantly negative relation between MAX and expected stock returns. One concern with this analysis is that lagged skewness may not be a good predictor of future skewness, as argued by Boyer, Mitton, and Vorkink (2010). In a rational market, it is expected future skewness that matters. This issue is addressed in Table 12, which presents results from cross-sectional, firm-level regressions of total skewness on lagged values of total skewness and our six control variables.13 Skewness is significantly persistent, both in a univariate and multivariate context, although the explanatory power of the regressions is not very high. One possibility is to use the fitted values from the month-by-month cross-sectional regressions as a measure of expected skewness (as in Boyer, Mitton, and Vorkink, 2010), and thus, we include this variable in the cross-sectional return regressions that follow. Table 13 presents the cross-sectional Fama-MacBeth regression results including TSKEW, SSKEW, ISKEW, and expected total skewness (E(TSKEW)) as control variables. The table reports the time-series averages of the slope coefficients over the sample period July 1962–December 2005, with Newey-West adjusted t-statistics given in parentheses. The inclusion of any of the skewness measures has only a limited effect on MAX. The average coefficients on MAX in the different specifications are all
13 Using idiosyncratic skewness generates similar results.
Table 12
Cross-sectional predictability of skewness.
Each month from July 1962 to December 2005 we run a firm-level cross-sectional regression of the total skewness measured using daily returns over the subsequent year (TSKEW) on subsets of lagged predictor variables including TSKEW in the previous year and six control variables that are defined in the Appendix. The table reports the time-series averages of the cross-sectional regression coefficients, their associated Newey-West (1987) adjusted t-statistics (in parentheses), and the regression R-squareds.
     TSKEW      BETA      SIZE        BM       MOM       REV     ILLIQ        R²
    0.1507                                                                 2.46%
   (28.38)
    0.0813    0.0026   −0.1218   −0.0472    0.1489    0.0007    0.0271     9.69%
   (21.70)    (1.82)  (−23.10)   (−6.30)   (12.70)    (3.79)    (1.94)
Table 13
Firm-level cross-sectional return regressions with MAX and skewness.
Each month from July 1962 to December 2005 we run a firm-level cross-sectional regression of the return in that month on subsets of lagged predictor variables including MAX in the previous month, skewness measured over the preceding year (TSKEW, SSKEW, ISKEW), fitted expected total skewness (E(TSKEW)) based on the regression in Table 12, and six control variables that are defined in the Appendix. In each row, the table reports the time-series averages of the cross-sectional regression slope coefficients and their associated Newey-West (1987) adjusted t-statistics (in parentheses).
       MAX      BETA      SIZE        BM       MOM       REV     ILLIQ     TSKEW     SSKEW     ISKEW  E(TSKEW)
                                                                          0.1330
                                                                          (2.56)
                                                                                    0.2436
                                                                                    (0.84)
                                                                                              0.1324
                                                                                              (2.53)
   −0.0551    0.0474   −0.1292    0.3194    0.6983   −0.0712    0.0288    0.0436
   (−5.36)    (1.90)   (−2.89)    (4.54)    (4.95)  (−13.29)    (3.53)    (1.67)
   −0.0538    0.0473   −0.1318    0.3214    0.7057   −0.0711    0.0290              0.5202
   (−5.17)    (1.89)   (−2.89)    (4.53)    (4.98)  (−13.28)    (3.54)              (0.91)
   −0.0551    0.0475   −0.1294    0.3195    0.6981   −0.0712    0.0287                        0.0426
   (−5.36)    (1.90)   (−2.90)    (4.54)    (4.95)  (−13.29)    (3.53)                        (1.63)
                                                                                                        1.5188
                                                                                                        (3.92)
   −0.0524    0.0435   −0.0112    0.3366    0.5073   −0.0752    0.0108                                  0.9947
   (−4.82)    (1.30)   (−0.10)    (3.55)    (2.68)  (−12.06)    (0.05)                                  (0.87)
approximately −0.055, slightly smaller in magnitude than the −0.064 reported in Table 7, but still economically very significant and statistically significant at all conventional levels, with t-statistics above 5.0 in magnitude. In all the specifications, the coefficients on the skewness variables are positive, the opposite of the sign one would expect if investors have a preference for positive skewness. However, in the full specifications these average coefficients are statistically insignificant. The results for systematic skewness (co-skewness) differ from the significant negative relation found in Harvey and Siddique (2000) and Smith (2007), presumably due to differences in the methodology. For idiosyncratic skewness, we cannot replicate the negative and significant relation found in Zhang (2005) and Boyer, Mitton, and Vorkink (2010). Again, differences in methodology presumably account for the discrepancy, a key difference being that both papers predict only portfolio returns, not the returns on individual securities. For our purposes, however, the message of Tables 11 and 13 is clear. There is no evidence that the effect of extreme positive returns that we show is subsumed by available measures of skewness.
5. Conclusion
We find a statistically and economically significant relation between lagged extreme positive returns, as measured by the maximum daily return over the prior month or the average of the highest daily returns within the month, and future returns. This result is robust to controls for numerous other potential risk factors and control variables. Of particular interest, inclusion of our MAX variable reverses the anomalous negative relation between idiosyncratic volatility and returns in Ang, Hodrick, Xing, and Zhang (2006, 2009). We interpret our results in the context of a market with poorly diversified yet risk-averse investors who have a preference for lottery-like assets. In fact, it may be the preference for lottery-like payoffs that causes underdiversification in the first place, since well-diversified equity portfolios do not exhibit this feature. Thus, the expected returns on stocks that exhibit extreme positive returns are low but, controlling for this effect, the expected returns on stocks with high idiosyncratic risk are high. Why is the effect we report not traded away by other well-diversified investors? Exploiting this phenomenon
T.G. Bali et al. / Journal of Financial Economics 99 (2011) 427–446
would require shorting stocks with extreme positive returns. The inability and/or unwillingness of many investors to engage in short-selling has been discussed extensively in the literature. Moreover, stocks with extreme positive returns are small and illiquid, on average, suggesting that transactions costs may be a serious impediment to implementing the relevant trading strategy. Finally, these small stocks tend to be held and traded by individual investors, rather than by institutions who might attempt to exploit this phenomenon. We also present some evidence that stocks with extreme negative returns exhibit the reverse effect, i.e., investors find them undesirable and hence, they offer higher future returns. However, this phenomenon is not robust in all our cross-sectional regression specifications, and it appears to be concentrated in a smaller subsample of stocks than the effect of extreme positive returns. Of course, since exploiting this anomaly does not require taking a short position, one might expect the effect to be smaller than for stocks with extreme positive returns due to the actions of well-diversified traders. While the extreme daily returns we exploit are clearly idiosyncratic, we make no effort to classify them further. In other words, we do not discriminate between returns due to earnings announcements, takeovers, other corporate events, or releases of analyst recommendations. Nor do we distinguish price moves that occur in the absence of any new public information. Interestingly, the preponderance of existing evidence indicates that stocks under-react rather than over-react to firm-specific news14; therefore, if the extreme positive returns were caused by good news, one should expect to see the reverse of the effect that we show. Given the magnitude and robustness of our results, this presents a potentially fruitful avenue of further research. Investigating the time-series patterns in the return premiums we document is also of interest. 
For example, it is conceivable that the magnitude of these premiums is affected by investor sentiment (Baker and Wurgler, 2007).
14 See Daniel, Hirshleifer, and Subrahmanyam (1998) for a survey of some of this literature.

Appendix. Variable definitions

MAXIMUM: MAX is the maximum daily return within a month:

$$\mathrm{MAX}_{i,t} = \max(R_{i,d}), \quad d = 1, \ldots, D_t, \tag{2}$$

where $R_{i,d}$ is the return on stock i on day d and $D_t$ is the number of trading days in month t.

MINIMUM: MIN is the negative of the minimum daily return within a month:

$$\mathrm{MIN}_{i,t} = -\min(R_{i,d}), \quad d = 1, \ldots, D_t, \tag{3}$$

where $R_{i,d}$ is the return on stock i on day d and $D_t$ is the number of trading days in month t.

TOTAL VOLATILITY: The total volatility of stock i in month t is defined as the standard deviation of daily returns within month t:

$$\mathrm{TVOL}_{i,t} = \sqrt{\operatorname{var}(R_{i,d})}. \tag{4}$$
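The definitions in Eqs. (2)-(4) can be illustrated with a minimal sketch for a single stock-month. The function name `monthly_extremes` and the sample returns are hypothetical, and the use of the sample standard deviation (dividing by D_t − 1) is an assumption, since the text does not state which variance estimator is used.

```python
import statistics


def monthly_extremes(daily_returns):
    """Return (MAX, MIN, TVOL) for one stock-month of daily returns."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns")
    max_ret = max(daily_returns)            # Eq. (2): largest daily return
    min_ret = -min(daily_returns)           # Eq. (3): negative of the smallest daily return
    tvol = statistics.stdev(daily_returns)  # Eq. (4): sample std. dev. (assumed convention)
    return max_ret, min_ret, tvol


# Hypothetical daily returns for one stock in one month (illustrative only)
example = [0.01, -0.03, 0.08, 0.00, -0.01]
mx, mn, tv = monthly_extremes(example)
```

Because MIN is defined as the negative of the minimum, a stock whose worst day was a loss has a positive MIN, so larger values of both MAX and MIN indicate more extreme returns.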
BETA: To take into account nonsynchronous trading, we follow Scholes and Williams (1977) and Dimson (1979) and use the lag and lead of the market portfolio as well as the current market when estimating beta:

$$R_{i,d} - r_{f,d} = \alpha_i + \beta_{1,i}(R_{m,d-1} - r_{f,d-1}) + \beta_{2,i}(R_{m,d} - r_{f,d}) + \beta_{3,i}(R_{m,d+1} - r_{f,d+1}) + \varepsilon_{i,d}, \tag{5}$$

where $R_{i,d}$ is the return on stock i on day d, $R_{m,d}$ is the market return on day d, and $r_{f,d}$ is the risk-free rate on day d.15 We estimate Eq. (5) for each stock using daily returns within a month. The market beta of stock i in month t is defined as $\hat{\beta}_i = \hat{\beta}_{1,i} + \hat{\beta}_{2,i} + \hat{\beta}_{3,i}$.

IDIOSYNCRATIC VOLATILITY: To estimate the monthly idiosyncratic volatility of an individual stock, we assume a single-factor return-generating process:

$$R_{i,d} - r_{f,d} = \alpha_i + \beta_i (R_{m,d} - r_{f,d}) + \varepsilon_{i,d}, \tag{6}$$

where $\varepsilon_{i,d}$ is the idiosyncratic return on day d. The idiosyncratic volatility of stock i in month t is defined as the standard deviation of daily residuals in month t:

$$\mathrm{IVOL}_{i,t} = \sqrt{\operatorname{var}(\varepsilon_{i,d})}. \tag{7}$$

SIZE: Following the existing literature, firm size is measured by the natural logarithm of the market value of equity (a stock’s price times shares outstanding in millions of dollars) at the end of month $t-1$ for each stock.

BOOK-TO-MARKET: Following Fama and French (1992), we compute a firm’s book-to-market ratio in month t using the market value of its equity at the end of December of the previous year and the book value of common equity plus balance-sheet deferred taxes for the firm’s latest fiscal year ending in the prior calendar year.16

INTERMEDIATE-TERM MOMENTUM: Following Jegadeesh and Titman (1993), the momentum variable for each stock in month t is defined as the cumulative return on the stock over the previous 11 months starting two months ago, i.e., the cumulative return from month $t-12$ to month $t-2$.

SHORT-TERM REVERSAL: Following Jegadeesh (1990) and Lehmann (1990), the reversal variable for each stock in month t is defined as the return on the stock over the previous month, i.e., the return in month $t-1$.

ILLIQUIDITY: Following Amihud (2002), we measure stock illiquidity for each stock in month t as the ratio of the absolute monthly stock return to its dollar trading volume:

$$\mathrm{ILLIQ}_{i,t} = |R_{i,t}| / \mathrm{VOLD}_{i,t}, \tag{8}$$

where $R_{i,t}$ is the return on stock i in month t, and $\mathrm{VOLD}_{i,t}$ is the respective monthly trading volume in dollars.

15 In our empirical analysis, $R_{m,d}$ is measured by the CRSP daily value-weighted index and $r_{f,d}$ is the one-month T-bill return available at Kenneth French’s online data library.
16 To avoid issues with extreme observations, following Fama and French (1992), the book-to-market ratios are winsorized at the 0.5% and 99.5% levels, i.e., the smallest and largest 0.5% of the observations on the book-to-market ratio are set equal to the 0.5th and 99.5th percentiles, respectively.
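As a rough illustration of Eqs. (6)-(8), the sketch below fits the single-factor regression by ordinary least squares for one stock-month and takes the standard deviation of the residuals, then computes the illiquidity ratio. The function names `ivol` and `illiq` are hypothetical, the n − 1 divisor for the residual variance is an assumption (the text does not specify a degrees-of-freedom convention), and the lead-lag beta of Eq. (5) is not covered here.

```python
import math


def ivol(excess_stock, excess_market):
    """Eqs. (6)-(7): std. dev. of residuals from a single-factor OLS regression."""
    n = len(excess_stock)
    if n != len(excess_market) or n < 3:
        raise ValueError("need matched series with at least three observations")
    mean_y = sum(excess_stock) / n
    mean_x = sum(excess_market) / n
    sxx = sum((x - mean_x) ** 2 for x in excess_market)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(excess_market, excess_stock))
    beta = sxy / sxx                   # OLS slope on the market excess return
    alpha = mean_y - beta * mean_x     # OLS intercept
    resid = [y - alpha - beta * x
             for x, y in zip(excess_market, excess_stock)]
    # OLS residuals have zero mean; dividing by n - 1 is an assumed convention.
    return math.sqrt(sum(e ** 2 for e in resid) / (n - 1))


def illiq(monthly_return, dollar_volume):
    """Eq. (8): absolute monthly return divided by monthly dollar volume."""
    if dollar_volume <= 0:
        raise ValueError("dollar volume must be positive")
    return abs(monthly_return) / dollar_volume


# Hypothetical inputs (illustrative only): a stock that is exactly linear in the market
market = [0.01, -0.02, 0.03, 0.00]
stock = [0.001 + 2.0 * x for x in market]
residual_vol = ivol(stock, market)      # essentially zero: no idiosyncratic component
ratio = illiq(-0.05, 2.0e6)
```

A stock whose excess return is an exact linear function of the market has no idiosyncratic component, so its IVOL is zero up to rounding error; any departure from that line raises IVOL.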
TOTAL SKEWNESS: The total skewness of stock i for month t is computed using daily returns within year t:

$$\mathrm{TSKEW}_{i,t} = \frac{1}{D_t} \sum_{d=1}^{D_t} \left( \frac{R_{i,d} - \mu_i}{\sigma_i} \right)^3, \tag{9}$$

where $D_t$ is the number of trading days in year t, $R_{i,d}$ is the return on stock i on day d, $\mu_i$ is the mean of returns of stock i in year t, and $\sigma_i$ is the standard deviation of returns of stock i in year t.

SYSTEMATIC and IDIOSYNCRATIC SKEWNESS: Following Harvey and Siddique (2000), we decompose total skewness into idiosyncratic and systematic components by estimating the following regression for each stock:

$$R_{i,d} - r_{f,d} = \alpha_i + \beta_i (R_{m,d} - r_{f,d}) + \gamma_i (R_{m,d} - r_{f,d})^2 + \varepsilon_{i,d}, \tag{10}$$
where $R_{i,d}$ is the return on stock i on day d, $R_{m,d}$ is the market return on day d, $r_{f,d}$ is the risk-free rate on day d, and $\varepsilon_{i,d}$ is the idiosyncratic return on day d. The idiosyncratic skewness (ISKEW) of stock i in year t is defined as the skewness of daily residuals $\varepsilon_{i,d}$ in year t. The systematic skewness (SSKEW) or co-skewness of stock i in year t is the estimated slope coefficient $\hat{\gamma}_{i,t}$ in Eq. (10).

References

Amihud, Y., 2002. Illiquidity and stock returns: cross-section and time-series effects. Journal of Financial Markets 5, 31–56.
Ang, A., Hodrick, R.J., Xing, Y., Zhang, X., 2006. The cross-section of volatility and expected returns. Journal of Finance 61, 259–299.
Ang, A., Hodrick, R.J., Xing, Y., Zhang, X., 2009. High idiosyncratic volatility and low returns: international and further U.S. evidence. Journal of Financial Economics 91, 1–23.
Arditti, F.D., 1967. Risk and the required return on equity. Journal of Finance 22, 19–36.
Arditti, F.D., 1971. Another look at mutual fund performance. Journal of Financial and Quantitative Analysis 6, 909–912.
Baker, M., Wurgler, J., 2007. Investor sentiment in the stock market. Journal of Economic Perspectives 21, 129–151.
Bali, T.G., Cakici, N., 2008. Idiosyncratic volatility and the cross-section of expected returns. Journal of Financial and Quantitative Analysis 43, 29–58.
Barberis, N., Huang, M., 2008. Stocks as lotteries: the implications of probability weighting for security prices. American Economic Review 98, 2066–2100.
Barone-Adesi, G., 1985. Arbitrage equilibrium with skewed asset returns. Journal of Financial and Quantitative Analysis 20, 299–313.
Boyer, B., Mitton, T., Vorkink, K., 2010. Expected idiosyncratic skewness. Review of Financial Studies 23, 169–202.
Brunnermeier, M.K., Gollier, C., Parker, J.A., 2007. Optimal beliefs, asset prices and the preference for skewed returns. American Economic Review 97, 159–165.
Calvet, L.E., Campbell, J.Y., Sodini, P., 2007. Down or out: assessing the welfare costs of household investment mistakes. Journal of Political Economy 115, 707–747.
Carhart, M.M., 1997. On persistence in mutual fund performance. Journal of Finance 52, 57–82.
Conrad, J., Dittmar, R.F., Ghysels, E., 2008. Skewness and the bubble. Unpublished working paper, University of North Carolina at Chapel Hill.
Daniel, K., Hirshleifer, D., Subrahmanyam, A., 1998. Investor psychology and security market under- and overreactions. Journal of Finance 53, 1839–1885.
Daniel, K., Titman, S., 1997. Evidence on the characteristics of cross-sectional variation in stock returns. Journal of Finance 52, 1–33.
Dimson, E., 1979. Risk measurement when shares are subject to infrequent trading. Journal of Financial Economics 7, 197–226.
Fama, E.F., French, K.R., 1992. Cross-section of expected stock returns. Journal of Finance 47, 427–465.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33, 3–56.
Fama, E.F., French, K.R., 2008. Average returns, B/M, and share issues. Journal of Finance 63, 2971–2995.
Fama, E.F., MacBeth, J.D., 1973. Risk and return: some empirical tests. Journal of Political Economy 81, 607–636.
Fang, L., Peress, J., 2009. Media coverage and the cross-section of stock returns. Journal of Finance 64, 2023–2052.
Friend, I., Westerfield, R., 1980. Co-skewness and capital asset pricing. Journal of Finance 35, 897–913.
Fu, F., 2009. Idiosyncratic risk and the cross-section of expected stock returns. Journal of Financial Economics 91, 24–37.
Garrett, T.A., Sobel, R.S., 1999. Gamblers favor skewness, not risk: further evidence from United States’ lottery games. Economics Letters 63, 85–90.
Goetzmann, W.N., Kumar, A., 2008. Equity portfolio diversification. Review of Finance 12, 433–463.
Han, B., Kumar, A., 2008. Retail clienteles and the idiosyncratic volatility puzzle. Unpublished working paper, University of Texas at Austin.
Harvey, C., Siddique, A., 1999. Autoregressive conditional skewness. Journal of Financial and Quantitative Analysis 34, 465–487.
Harvey, C., Siddique, A., 2000. Conditional skewness in asset pricing tests. Journal of Finance 55, 1263–1295.
Huang, W., Liu, Q., Rhee, G., Zhang, L., 2010. Return reversals, idiosyncratic risk, and expected returns. Review of Financial Studies 23, 147–168.
Jegadeesh, N., 1990. Evidence of predictable behavior of security returns. Journal of Finance 45, 881–898.
Jegadeesh, N., Titman, S., 1993. Returns to buying winners and selling losers: implications for stock market efficiency. Journal of Finance 48, 65–91.
Kane, A., 1982. Skewness preference and portfolio choice. Journal of Financial and Quantitative Analysis 17, 15–25.
Keim, D.B., 1999. An analysis of mutual fund design: the case of investing in small-cap stocks. Journal of Financial Economics 51, 173–194.
Kraus, A., Litzenberger, R.H., 1976. Skewness preference and the valuation of risk assets. Journal of Finance 31, 1085–1100.
Kumar, A., 2009. Who gambles in the stock market? Journal of Finance 64, 1889–1933.
Lehmann, B., 1990. Fads, martingales, and market efficiency. Quarterly Journal of Economics 105, 1–28.
Lim, K.-G., 1989. A new test of the three-moment capital asset pricing model. Journal of Financial and Quantitative Analysis 24, 205–216.
Lintner, J., 1965. The valuation of risky assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47, 13–37.
Mitton, T., Vorkink, K., 2007. Equilibrium underdiversification and the preference for skewness. Review of Financial Studies 20, 1255–1288.
Mossin, J., 1966. Equilibrium in a capital asset market. Econometrica 34, 768–783.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Nyberg, P., 2008. The dynamic behavior of the idiosyncratic volatility discount: aggregate idiosyncratic volatility and return reversals revisited. Unpublished working paper, Hanken School of Economics.
Odean, T., 1999. Do investors trade too much? American Economic Review 89, 1279–1298.
Pastor, L., Stambaugh, R., 2003. Liquidity risk and expected stock returns. Journal of Political Economy 111, 642–685.
Patel, N.R., Subrahmanyam, M.G., 1978. Utility theory and participation in unfair lotteries. Journal of Economic Theory 19, 555–557.
Scholes, M., Williams, J., 1977. Estimating betas from nonsynchronous data. Journal of Financial Economics 5, 309–327.
Sears, R.S., Wei, J.K.C., 1985. Asset pricing, higher moments, and the market risk premium: a note. Journal of Finance 40, 1251–1253.
Sharpe, W.F., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance 19, 425–442.
Smith, D.R., 2007. Conditional coskewness and asset pricing. Journal of Empirical Finance 14, 91–119.
Spiegel, M.I., Wang, X., 2005. Cross-sectional variation in stock returns: liquidity and idiosyncratic risk. Unpublished working paper, Yale University.
Thaler, R.H., Ziemba, W.T., 1988. Parimutuel betting markets: racetracks and lotteries. Journal of Economic Perspectives 2, 161–174.
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty 5, 297–323.
Van Nieuwerburgh, S., Veldkamp, L., 2010. Information acquisition and under-diversification. Review of Economic Studies 77, 779–805.
Walker, I., Young, J., 2001. An economist’s guide to lottery design. Economic Journal 111, F700–F722.
Zhang, Y., 2005. Individual skewness and the cross-section of expected returns. Unpublished working paper, Yale University.
Journal of Financial Economics 99 (2011) 447–475
Journal of Financial Economics journal homepage: www.elsevier.com/locate/jfec
The structure and formation of business groups: Evidence from Korean chaebols☆

Heitor Almeida a,*, Sang Yong Park b, Marti G. Subrahmanyam c, Daniel Wolfenzon d

a University of Illinois, Urbana-Champaign, United States
b Yonsei University, South Korea
c New York University, United States
d Columbia University, United States
Article info

Article history:
Received 15 May 2009
Received in revised form 2 February 2010
Accepted 2 March 2010
Available online 18 September 2010

JEL classification: G32, G34

Keywords: Business groups; Family firms; Pyramids; Cross-shareholdings; Tunneling; Mergers and acquisitions

Abstract

We study the evolution of Korean chaebols (business groups) using ownership data. Chaebols grow vertically (as pyramids) when the controlling family uses well-established group firms (“central firms”) to acquire firms with low pledgeable income and high acquisition premiums. Chaebols grow horizontally (through direct ownership) when the family acquires firms with high pledgeable income and low acquisition premiums. Central firms trade at a relative discount, due to shareholders’ anticipation of value-destroying acquisitions. Our evidence is consistent with the selection of firms into different positions in the chaebol and ascribes the underperformance of pyramidal firms to a selection effect rather than tunneling. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
☆ We wish to thank David Thesmar (the referee), Utpal Battacharya, Mara Faccio, Radha Gopalan, Raghu Rau (WFA discussant), Woochan Kim, Hannes Wagner (EFA discussant), Luigi Zingales (NBER discussant), Paolo Volpin (Paris Spring Corporate Finance discussant), and participants at the 2008 WFA meetings, the 2008 NBER Summer Institute (Corporate Finance), the 2008 EFA meetings, the 2008 Paris Spring Corporate Finance conference, the 2008 AsianFA-NFA International Conference, and seminars at Temple University, University of Minnesota, Notre Dame University, University of Texas at Austin, University of Melbourne, Washington University at St Louis, Indiana University, Purdue University, University of Toronto, University of Washington, Duke University, SIFR, FGV-Rio, Vienna University of Economics and Business Administration, and New York University for helpful comments. Ki Beom Binh, Yong Hyuk Choi, Jiyoon Lee, Quoc Nguyen, Seul-Ki Jung, Igor Cunha and Andre De Souza provided outstanding research assistance. All errors are our own.
* Corresponding author. E-mail address: [email protected] (H. Almeida).
0304-405X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2010.08.017
Groups of firms under common ownership are prevalent around the world. These so-called business groups account for a large fraction of the economic activity of many countries.1 Most of these groups are controlled by families that hold equity stakes in group firms either directly or indirectly through other firms in the group. For example, one typical ownership structure is referred to as a pyramid. In this structure, the family achieves control of the constituent firms by a chain of ownership relations: the
1 Claessens, Fan, and Lang (2002) find that, in eight out of the nine Asian countries they study, the top 15 family groups control more than 20% of the listed corporate assets. In a sample of 13 Western European countries, Faccio and Lang (2002) find that, in nine countries, the top 15 family groups control more than 20% of the listed corporate assets.
H. Almeida et al. / Journal of Financial Economics 99 (2011) 447–475
family directly controls a firm, which in turn controls another firm.2 The previous empirical literature has generally taken group structure as given, and studied the consequences induced by its ownership structure. The literature focuses mostly on the relationship between the controlling family’s cash flow and voting rights and measures of accounting performance and valuation (see, e.g., Claessens, Djankov, and Lang, 2000; Faccio and Lang, 2002). In particular, the findings in the literature suggest that pyramidal ownership may reduce firm performance (see, e.g., Claessens, Djankov, Fan, and Lang, 2002; Joh, 2003), perhaps because of tunneling incentives created by pyramiding (Bertrand, Mehta, and Mullainathan, 2002; Bae, Kang, and Kim, 2002; Baek, Kang, and Lee, 2006). However, the causes that determine a group’s ownership structure remain largely unexplored. In particular, while there have been some recent theoretical attempts to understand pyramidal ownership, there is little empirical research that focuses on how pyramids evolve over time.3 We try to fill this gap in this paper. Our tests draw mostly on Almeida and Wolfenzon’s (2006) theory of pyramidal ownership. In their model, the controlling family chooses the optimal ownership structure of a new firm (call it firm B) which is to be added to the group (for example, through an acquisition). The choices are a pyramidal structure, whereby the family uses the equity of an existing group firm (call it firm A) to finance the investment in the new firm, and a direct ownership structure, whereby the investment is paid for with the family’s personal wealth. The theory generates predictions about the characteristics of firms that are placed in pyramids rather than under direct control. First, firms that have cash flows and/or assets that are difficult to pledge to outside investors (low pledgeability) should be placed in pyramids. 
This relationship arises because group equity (such as the equity of firm A) is particularly valuable as a financing tool when the family is financially constrained. Since financial constraints are more likely to bind for low pledgeability firms, such firms are optimally controlled through pyramids. Second, the lower the net present value (NPV) of the new firm, the more likely it is that the new firm will be placed in a pyramid. Pyramidal ownership forces the family to share the NPV of firm B with minority shareholders of firm A. Thus, the family prefers to directly control high NPV firms. Third, the theory predicts that firms that are used by the family to set up and acquire other firms (such as firm A) should trade at a discount relative to other public group firms. The valuation discount arises because investors anticipate the
2 Pyramids are very common throughout the world. See, among others, Claessens, Djankov, and Lang (2000), for the evidence on East Asia, Faccio and Lang (2002) and Barca and Becht (2001) for Western Europe, Khanna (2000) for emerging markets, and Morck, Stangeland, and Yeung (2000) for Canada. 3 A recent paper by Fan, Wong, and Zhang (2009) focuses on the formation of state-owned pyramids in China. As discussed by those authors, state-owned Chinese firms are special in that they show no separation between ownership and control. Bertrand, Johnson, Samphantharak, and Schoar (2008) use cross-sectional data on Thai business groups to study the role of family structure for group ownership structure and group firm performance. In particular, they find that groups that are controlled by larger families are more pyramidal in structure.
selection of low NPV firms into pyramids and thus, discount firm A’s shares to compensate for the poor returns associated with future pyramidal investments. We use a unique data set of Korean business groups to test the theory’s implications. The political and regulatory context of chaebols allows us to obtain extremely detailed ownership data on chaebol firms. Since the mid-1990s, the top Korean chaebols have had to report their complete ownership information to the Korean Fair Trade Commission (KFTC). These reports include ownership and accounting data on all firms (public or private) in each chaebol. Another feature that distinguishes our data is their dynamic nature. We have a panel from 1998 to 2004, for a relatively comprehensive sample of chaebol firms. In most countries, these types of data are not generally available.4 The theoretical arguments above motivate new metrics of group ownership other than the standard measures of cash flow and voting rights. First, we provide a measure of the position of any group firm relative to the controlling shareholder. This metric allows us to distinguish pyramidal from direct ownership. In addition, to identify firms that the family uses to set up new firms (such as firm A in the description above), we compute the centrality of a firm for the group structure (e.g., whether a given firm is used by the family to control other group firms).5 We also introduce a new metric to compute voting rights that we call critical control threshold. This metric is closely related to the concept of the weakest link that is used in existing literature. However, unlike the weakest link, it can be computed for group structures of any degree of complexity. We provide algorithms that generate these ownership measures.
In our data, this is necessary because the complex ownership structures of Korean chaebols with dozens of firms and several ownership links among them make it difficult for the researcher to directly compute them.6 We start by describing the basic characteristics of Korean chaebols. We find that both pyramids and cross-shareholdings are common in Korean chaebols. Nevertheless, pyramids in Korean chaebols are not ‘‘deep.’’ A large majority of chaebol firms belong to pyramids with a total of two or three firms in the chain. Only a few group firms in each group are classified as being central, and they tend to be the older and larger firms in the group. These findings suggest that in a typical Korean chaebol, a few central firms hold stakes in a large number of firms controlled through a pyramid involving the central firms. We also observe a substantial number of firms that are controlled directly by the family, with no ownership links to other chaebol firms. This cross-sectional variation in chaebol firm ownership structures allows us to test the predictions described above. The empirical evidence on the characteristics of group firms is consistent with the theoretical predictions. First, we
4 Franks, Mayer, Volpin, and Wagner (2008) assemble a data set that contains ownership information on private firms in France, Germany, Italy, and the UK. They focus on the trade-off between family and dispersed ownership, rather than on the ownership structure of groups. 5 The measure of centrality that we derive is similar (but not identical) to that proposed by Kim and Sung (2006). 6 Our algorithms can also be useful in other countries in which groups have complex ownership structures.
find that firms that are controlled through pyramids have lower profitability than directly controlled firms. This result is consistent with the selection of firms with low pledgeable income into pyramids. However, it is also consistent with the tunneling of profits away from pyramidal firms and towards firms in which the family has higher cash flow ownership (those at the top of the group). In order to distinguish between the tunneling and the selection explanations for the low profitability of pyramidal firms, we perform several additional empirical tests. First, we focus on instances of large changes in a firm’s position in the group, and ask whether low (high) past profitability predicts large increases (decreases) in the degree of pyramiding in a firm’s ownership structure. Examining large changes in position is useful because it allows us to rule out alternative explanations that rely on the fact that position does not change much over time. Second, we examine a sample of new firms that are added to chaebols during our sample period, and study the determinants of their ownership structure. A firm’s profitability in the year prior to becoming a chaebol firm cannot be affected by the ownership structure chosen later by the chaebol’s controlling family. However, pre-chaebol profitability should explain the firm’s ownership structure, according to the selection hypothesis. Third, we construct additional proxies for pledgeability other than profitability and ask whether these proxies also help predict a new firm’s position in the group. These alternative proxies focus on the ‘‘asset’’ rather than the ‘‘cash flow’’ dimension of pledgeability (for which profitability is a natural proxy). Specifically, we ask whether the family tends to place firms with low tangibility and low collateral in pyramids. Fourth, we examine additional implications of the tunneling hypothesis. 
We use the sample of newly added firms, and analyze the change in their profitability in the year following their acquisition by the chaebol. The tunneling hypothesis predicts that if a new firm is placed in a pyramid, its profitability should decrease. We also examine predictions of the tunneling hypothesis for the distribution of dividends and accruals in chaebol firms. Specifically, firms at the top of the group should pay higher dividends and display lower accruals than firms that are owned through pyramids. In addition, we use the sample of newly added firms to ask whether dividends and accruals change from the year prior to the year following the firm’s acquisition by a chaebol. Our results are consistent with the selection hypothesis. First, poor past performance predicts an increase in the extent of pyramidal ownership in a firm’s ownership structure. Second, pre-chaebol profitability is strongly related to a firm’s initial position in the group—incoming low profitability firms are more likely to be placed lower down in pyramids. Third, there is some suggestive evidence that low asset pledgeability firms are selected into pyramids. Fourth, we find no support for specific implications of the tunneling hypothesis. New pyramidal firms’ profitability does not decline after they are placed in a chaebol. Dividends and accruals are also unrelated to a firm’s position in the chaebol, and they do not change after a new firm is added to a chaebol. These results suggest that the correlation between pyramidal ownership and profitability that is shown in the previous literature may be due to reverse causality. Low profitability firms are selected
into pyramids, but pyramidal ownership does not appear to affect the relative performance of chaebol firms. Next, we test the implication that low NPV firms are selected into pyramids while higher NPV firms are controlled directly by the family. To do so, we focus on a sample of acquisitions that were made by chaebols during our sample period, and use the acquisition premium as a proxy for the NPV of the transaction (the ratio of the acquisition price to the new firm’s book value of equity). We find that the controlling family tends to place firms with high acquisition premiums in pyramids, and chooses to directly control firms with low acquisition premiums. This implication is specific to the selection hypothesis, and thus, provides further evidence that new chaebol firms are selected into different positions of the chaebol according to the financing and valuation consequences of these new acquisitions. Finally, we test the implication that the chaebol firms that are used by the family to acquire other firms through pyramids (‘‘central firms’’) should trade at a discount relative to other chaebol firms. To do so, we compare the market-to-book (Tobin’s Q) ratios of central firms to those of other group firms that do not hold substantial equity in other firms (‘‘non-central’’ firms). Consistent with this implication, we find a robust negative correlation between centrality and market-to-book ratios (Tobin’s Q). Central firms are valued at a discount relative to all other types of group firms (including both directly and pyramidally owned firms). This result indicates that the defining firm characteristic that generates the valuation discount is the fact that a firm holds significant equity in other firms, and not the firm’s position in the group. We also provide direct evidence that the central firm discount is due to the anticipation of value-destroying, pyramidal acquisitions by the chaebol central firms (as suggested by the selection hypothesis). 
First, while we cannot directly measure shareholders’ expectations of future acquisitions, we can examine whether central firms did more acquisitions than other group firms during our sample period. Under the hypothesis that shareholders look at past acquisitions to predict future ones, a positive correlation between centrality and past acquisition activity can help validate the selection interpretation for the central firm discount. Second, if the selection interpretation is correct, central firms that have shown greater acquisition activity should trade at a larger discount than other central firms that have not been used as acquisition vehicles. The evidence that we find supports the selection hypothesis. Central group firms are, in fact, the ones that are most likely to be used as acquisition vehicles. In addition, central firms that are active acquirers trade at larger discounts than other central firms. These results suggest that the selection of low NPV firms into pyramids is anticipated by shareholders, who discount the shares of central firms accordingly. Overall, we believe that our paper contributes to the literature on business groups in five ways. First, our results shed new light on the process by which pyramids form (through the acquisition of low profitability, low asset pledgeability, high acquisition premium firms by the group’s central firms). Second, we provide evidence that the relative performance of group firms is fundamentally
H. Almeida et al. / Journal of Financial Economics 99 (2011) 447–475
affected by the selection of different types of firms into different positions in the group. In particular, we show evidence that the underperformance of pyramidal firms shown in previous literature could be due to a selection effect. Third, we present new results on the relative valuation of business group firms (the central firm valuation discount and its relationship to acquisition activity). Fourth, we develop new metrics of group ownership structure (e.g., position, centrality, and the critical control threshold) that can be useful for other researchers studying complex ownership structures of firms. Fifth, we use our metrics to describe and summarize the typical structure of a Korean chaebol. The outline of the paper is as follows. Section 2 provides a brief review of the literature on the financial performance of family groups. Section 3 develops the empirical implications that we test in this paper. Section 4 introduces our methodology to compute ownership variables for group firms. In Section 5 we describe our data set. Section 6 presents our main empirical tests, and Section 7 concludes.
2. Literature review

There is a vast literature on family business groups.7 In this section, we discuss briefly the part of the literature that links ownership structure to financial performance. The existing literature points out that the ownership structure of business groups is a potential determinant of group firm performance and valuation. Most papers use cash flows and voting rights as the main metrics to describe group structure. For example, Bertrand, Mehta, and Mullainathan (2002) use a sample of Indian business groups to show that the value of group firms is affected by the controlling families’ tunneling of resources from firms in which they have low cash flow rights to firms in which their ultimate stake is high.8 In the context of Korean chaebols, Baek, Kang, and Lee (2006) argue that discounted equity issues are more likely when the controlling shareholder has higher ultimate ownership in the acquirer than in the issuer. Bae, Kang, and Kim (2002) argue that intra-chaebol acquisitions transfer wealth from firms in which the family has low cash flow rights (typically the acquirer) to those in which the family has higher cash flow rights.9 Claessens, Djankov, Fan, and Lang (2002) show that firm value is negatively related to the separation between ownership and control in East Asia, while Lins (2003) finds similar results for a sample of firms from emerging markets. Joh (2003) finds that the separation between ownership and control is negatively related to profitability in Korea.10

7 For a detailed review, see Morck, Wolfenzon, and Yeung (2005).
8 In contrast, Gopalan, Nanda, and Seru (2006) examine intra-group loans in Indian business groups, and find little evidence of tunneling. They suggest that loans are used to support financially weaker firms in the group.
9 In a related fashion, Cheung, Rau, and Stouraitis (2006) find that connected transactions between Hong Kong-listed companies and their controlling shareholders (such as transfer of assets across firms under the shareholders’ control) result in value losses for minority shareholders. Their sample includes both group and non-group firms.
10 Bennedsen and Nielsen (2006) find that valuation is negatively related to the separation between ownership and control in Continental
Instead of focusing on measures of cash flow and voting rights, other papers examine variables that indicate whether a firm has some indirect (e.g., pyramidal) ownership. In particular, Claessens, Djankov, Fan, and Lang (2002) and Volpin (2002) provide evidence that firms with indirect ownership have lower Tobin’s Q than other firms. In contrast, Masulis, Pham, and Zein (2008) find that Tobin’s Q is higher in pyramidal firms than in firms at the top of the group. The literature has also examined whether group membership affects valuation (Khanna and Rivkin, 2001; Khanna and Palepu, 2000; Fisman and Khanna, 2000; Claessens, Fan, and Lang, 2002). Khanna and Palepu (2000), for example, find a positive effect of group membership in their sample from India. In contrast, Ferris, Kim, and Kitsabunnarat (2003) find a negative effect of Korean chaebol membership on firm value. Baek, Kang, and Park (2004) focus on the effects of the Asian crisis on Korean firms, and show evidence for a stronger impact of the crisis on chaebol firms. In a cross-country study, Masulis, Pham, and Zein (2008) find that, after controlling for group membership choice, groups help improve firm value. In the Initial Public Offering (IPO) context, Marisetty and Subrahmanyam (2010) study underpricing of stand-alone and group firms. Finally, the literature provides some evidence on the correlation between ownership variables and firm characteristics. In particular, there is evidence that firms that are owned through pyramids are smaller and younger than firms at the top of the group (those that own shares in other firms). Aganin and Volpin (2005) describe the evolution of the Pesenti group in Italy, and show that it was created by adding new subsidiaries to the firms the Pesenti family already owned, through carve-outs of existing group firms. One of their conclusions is that, in Italy, business groups expand through acquisitions when they are large and have significant cash resources. 
Claessens, Fan, and Lang (2002) find that firms with the highest separation of votes and ownership (i.e., those most likely to be owned through pyramids) are younger than those with less separation. Pyramidal firms also seem to be associated with larger scales of capital investment (Attig, Fischer, and Gadhoum, 2003). Claessens, Fan, and Lang (2002) also find that, in East Asia, group firms tend to be larger than unaffiliated firms. Bianchi, Bianco, and Enriques (2001) find similar evidence for Italy.11 Finally, Bianco and Nicodano (2006) find that in Italian pyramids, pyramidal firms have lower leverage than firms at the top of the group.

3. Hypotheses regarding the formation of pyramids

The traditional informal explanation for pyramidal structures is based on the idea that families try to control as many firms as possible to enjoy private benefits of control. Pyramidal structures lead to a separation of cash flow from

(footnote continued) Europe, but also that profitability is unrelated to measures of separation in the same region.
11 Kang, Park, and Jang (2006a) also analyze the family’s choice of ownership structure in chaebols. However, they focus on average ownership characteristics of the entire group rather than on characteristics of individual chaebol firms.
voting rights that allows these families to minimize their ultimate cash flow stake in the firms they control (see, e.g., Bebchuk, Kraakman and Triantis, 2000).12 According to this argument, pyramidal structures are only a device to achieve the desired separation of cash flow from control rights. As discussed by Almeida and Wolfenzon (2006), while pyramids are generally associated with large deviations from ‘‘one share-one vote,’’ this pattern is not universal (see, e.g., Franks and Mayer, 2001). In addition, despite the fact that the family can also use dual-class shares to separate ownership from control, the incidence of pyramids in different countries does not appear to be caused by restrictions on the use of dual-class shares (La Porta, Lopez-de-Silanes, and Shleifer, 1999). This evidence suggests that considerations other than the separation of cash flow from voting rights motivate the creation of pyramids. Almeida and Wolfenzon (2006) present a model of pyramidal ownership that does not rely on separation between ownership and control. In their model, a family has the choice of setting up a new firm (call it firm B) either through a pyramid or directly. The total value created by the new firm has two components: the private benefits enjoyed by the controlling family, and the project’s net present value that is shared by all owners (henceforth, the ‘‘NPV’’). Under the pyramidal structure, firm B is owned by all the shareholders of the original firm (call it firm A). As a result, the family shares the NPV of the new firm with non-family shareholders of firm A. In addition, the family has access to all of the retained earnings (cash) of firm A to acquire equity stakes in firm B. Under direct ownership, non-family shareholders of firm A have no rights to the cash flows of firm B, and thus, the family captures all of its NPV. 
However, in this case, the family has access only to its share of the retained earnings of firm A (for example, through dividend payments).13 This argument generates a number of testable hypotheses. First, firms that generate low pledgeable income (for example, low cash flows) are more likely to be set up in pyramids. These firms find it harder to raise external finance, and thus, the family’s ability to use the cash retained in firm A to finance the investment in firm B becomes very valuable. In addition, other firm characteristics that facilitate access to external financing should reduce the likelihood that a firm is placed in a pyramid. For example, firms that can pledge their assets as collateral to raise external finance may not require internal equity investments from the group’s central firms and can be owned directly by the family.14
12 This argument goes back at least to the beginning of the 20th century: Berle and Means (1932) and Graham and Dodd (1934) use it to explain the creation of pyramids in the US. 13 Gopalan, Nanda, and Seru (2007) develop a theory of dividends in business groups that uses arguments that resemble those in Almeida and Wolfenzon (2006). In particular, they show how families can use dividends as a way of transferring cash across group firms to finance group investments. Their focus is on explaining group dividend policy rather than ownership structure. See also De Jong, Dejong, Hege, and Mertens (2009), who study the link between dividend policy and leverage among group firms. 14 It might seem that the family would prefer not to place low pledgeability firms in pyramids as this will only exacerbate the expropriation problem. However, the implication that low pledgeability firms are best placed in pyramids is robust to this effect. In Almeida and Wolfenzon (2006), the controlling family benefits from choosing an
The second implication that arises from the theory is that projects with high NPV are more likely to be owned directly by the family. The family is more likely to choose a direct ownership structure for these firms to avoid sharing the high value created with the minority shareholders of firm A. By the same token, the family has incentives to use a pyramidal structure when acquiring a firm that provides high private benefits but low NPV. Third, since the family places low NPV firms in pyramids, investors should expect low returns from pyramidal investments. If investors anticipate significant future pyramidal investments by a group firm, then they should discount the shares of this firm accordingly to compensate for the expected effects of future pyramidal investments on its equity returns. We summarize this discussion with a list of the implications about the structure of business groups, which can be tested with our data on Korean chaebols: Implication 1. The controlling family places new firms with low pledgeability of cash flows/assets in pyramids and directly controls firms with high pledgeability. Implication 2. The family places low NPV firms in pyramids and directly controls firms with high NPV. Implication 3. Public group firms that are used by the family to set up and acquire new group firms should have lower valuations than public group firms that are not used for this purpose. Testing Implication 1 requires a measure of pledgeability of cash flows and assets. We use reported profitability as a measure of the pledgeability of cash flows. The reason is that highly profitable firms require less external funds. In addition, it is reasonable to assume that firms can credibly commit to pay out cash flows that they report in audited financial statements. 
We use tangibility (defined as property, plant, and equipment normalized by assets), and collateral (defined as property, plant, and equipment plus inventories normalized by assets) as proxies for the pledgeability of assets. In addition, we use intangibles (defined as intangible assets normalized by total assets) as a proxy for non-pledgeability of assets.15 Almeida and (footnote continued) ownership structure that minimizes future expropriation of cash flows, since the costs of expropriation are internalized by the family when setting up the new firm. Still, the family may benefit from a pyramidal structure, since the pyramid allows the family to use more internal funds (the cash retained in firm A), and thereby to reduce the amount of external funds that are required to set up the new firm. This financing advantage, in turn, may allow the family to retain higher ownership of the new firm under a pyramidal structure, and thus, to minimize expropriation. In fact, Almeida and Wolfenzon show that when the new firm produces low pledgeable income (for example, when the new firm has low profitability), the pyramidal structure actually reduces expropriation when compared to a direct ownership structure. 15 We also experimented with other proxies for pledgeability, including capital expenditures over assets (CAPEX), capital-labor, and asset-employee ratios. We believe that the relationship of these variables to the firm’s position in the group is less clear than for profitability and collateral-related proxies, since it is confounded by other effects. For example, a firm with high capital expenditures requires more external financing. However, capital expenditures on tangible assets may generate
Campello (2007) provide evidence that asset tangibility relaxes firm financing constraints by allowing them to pledge a greater fraction of their assets as collateral. The specific collateral proxy that we use (property, plant, and equipment plus inventory) comes from Frank and Goyal (2009), who argue that this proxy is more robustly correlated with firms’ leverage ratios than the standard proxy for tangibility. Implication 1 predicts that high profitability firms are owned directly while low profitability firms are set up in pyramids. This prediction is consistent with evidence in previous studies (see Section 2). However, the interpretation so far has been that this association is evidence that pyramids reduce profitability because they induce tunneling behavior by the family (Bertrand, Mehta, and Mullainathan, 2002; Bae, Kang, and Kim, 2002, Baek, Kang, and Lee, 2006). In contrast, in our argument, the correlation is driven by the opposite direction of causality: lower profitability firms are selected into pyramids. We provide a battery of empirical tests to provide evidence on the direction of causality suggested by the selection hypothesis. First, we examine a sample of firms that are newly added to the chaebol during our sample period, and relate their position in the chaebol to their profitability in the year prior to the firm’s addition. Since profitability is measured prior to the firm’s acquisition, it cannot be affected by the chaebol’s tunneling activities. Second, we also examine additional predictions from the tunneling hypothesis. For example, we use our sample of newly added firms and examine what happens to their profitability in the year following their acquisition by the chaebol. The tunneling hypothesis predicts that if a new firm is placed in a pyramid, its profitability should decrease. In contrast, the selection hypothesis makes no such prediction. 
Testing Implication 2 requires a measure of NPV that can be operationalized in a relatively large sample of firms. In the empirical analysis, we focus on a subsample of firms for which a reasonable proxy for NPV can be constructed. Specifically, we use the sample of newly added firms for which we can obtain data on the acquisition price. The new firm’s NPV can then be proxied by the acquisition premium. We compute this premium as the ratio of the acquisition price to the book value of the equity of the new firm.16 The argument above would then predict that the family will use pyramids to acquire a new firm, when the acquisition premium is high. In other words, the family would tend to place firms with high acquisition premiums in pyramids, and directly control firms with low acquisition premiums. Testing Implication 3 requires us to identify the group firms that shareholders expect will be used for acquiring new firms. As Section 5.1.1 shows, there are only a few firms in each group that hold substantial equity stakes in
(footnote continued) collateral which can be pledged to outside investors. A similar reasoning holds for capital-labor ratios. In our analysis (not reported), firms with high capital expenditures tend to be owned through pyramids. Also, there is no clear correlation between capital-labor ratios and firms’ positions in the group. 16 Almost all of the newly acquired firms are private, and thus, we cannot compute an acquisition premium using the pre-acquisition market value of these firms.
other group firms. We call them the group’s central firms. Under the assumption that the family will continue to use the central firms to set up and acquire new firms, these are the firms that should trade at a discount. In order to validate this assumption, we construct a measure of a group firm’s acquisition intensity during our sample period, defined as the sum of the value of equity stakes acquired by each group firm in the event of an acquisition of a new group firm, divided by the book value of the equity of the acquirer. We then examine whether a group firm’s centrality helps predict its future acquisition intensity. In addition, Implication 3 also suggests that central firms that have shown greater acquisition activity should trade at a larger discount than other central firms that have not been used as acquisition devices. We examine this implication as well in our empirical tests.
4. Metrics of group ownership structures

In order to test the empirical implications described in Section 3, we develop some new metrics of group structure. Specifically, the theory models the family’s choice of whether to set up a new firm as a partial subsidiary of an established firm, or to hold stakes directly. To capture this notion, we define the variable position. We also define the variable centrality to identify firms that the controlling family uses to set up and acquire new firms. In addition, we argue that the standard measure of voting rights (the weakest link) is difficult to apply to groups with complex ownership structures such as the Korean chaebols. We propose an alternative measure of control in a group, the critical control threshold. We provide formulas and simple algorithms to compute all the metrics we propose. This is crucial for the case of Korea, where the web of ownership relations among group firms can be quite complex. As an illustration of this complexity, in Fig. 1, we have selected only 11 of the 27 firms that form part of the Hyundai Motor group and drawn its ownership structure as of 2004. Needless to say, computing ownership metrics in this group can be a daunting task. Importantly, the formulae we propose can easily deal with any type of ownership structure. In Appendix A, we show a numerical example that illustrates the computation of several of the ownership variables described here, including position, the critical control threshold, and centrality.
4.1. Ultimate cash flow rights, position, and loops

We start by considering a business group with N firms. We define the matrix of inter-corporate holdings A as follows:

\[
A = \begin{bmatrix}
0 & s_{12} & \cdots & s_{1N} \\
s_{21} & 0 & \cdots & s_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
s_{N1} & \cdots & s_{N,N-1} & 0
\end{bmatrix},
\]

where $s_{ij}$ is the stake of firm i in firm j. We also define a vector with the direct stakes of the family in each of the
[Figure 1: ownership diagram not reproduced. It shows Mr. Mong Koo Chung and 11 group firms (Glovis, Changwon, Hyundai Motor, INI Steel, Hyundai Mobis, Hyundai Capital, Dymos, BNG Steel, Kia Motor, Ajumetal, and World Industries), linked by arrows labeled with equity stakes.]

Fig. 1. Ownership structure of Hyundai Motor in 2004. This figure depicts the ownership structure of the Hyundai Motor chaebol in 2004. Mr. Mong Koo Chung is the ultimate owner of Hyundai Motor. The other boxes represent individual group firms. The figures over the arrows show the equity stakes that each firm holds in other firms.
N firms:17

\[
f = [\,f_1 \;\; f_2 \;\; \cdots \;\; f_N\,]'. \tag{1}
\]

The key insight to derive all formulas in this section is to follow one dollar of dividends paid by firm i. We write the dividend as a vector of zeroes with a one in the ith position, $d_i$. The family receives $f'd_i$ when the dividend is paid and group firms receive $Ad_i$. Now suppose group firms pay out to shareholders what they themselves receive as dividends from other companies, i.e., the new dividend is now $Ad_i$. The family receives an additional $f'(Ad_i)$ and the cash in group firms out of the original dollar paid is $A(Ad_i) = A^2 d_i$. A simple pattern emerges: After n rounds of dividends, the cash position of group firms is $A^n d_i$.18
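The dividend-tracing argument can be illustrated with a minimal sketch in plain Python. The function name `trace_dividend`, the three-firm group, and the fixed number of rounds are assumptions made for illustration only; they do not come from the paper. The sketch follows one dollar paid by firm i, adds up what the family receives in each round, and passes the remainder held by group firms on to the next round.

```python
def trace_dividend(f, A, i, rounds=200):
    """Follow one dollar of dividends paid by firm i (zero-based index).

    f[k]    : direct stake of the family in firm k
    A[k][j] : stake of firm k in firm j (zero diagonal)
    Returns the cumulative amount received by the family after `rounds` rounds.
    """
    n = len(f)
    cash = [1.0 if k == i else 0.0 for k in range(n)]  # the vector d_i
    family_total = 0.0
    for _ in range(rounds):
        # Family's share of the dividends paid in this round: f' (A^n d_i)
        family_total += sum(f[k] * cash[k] for k in range(n))
        # Receipts of group firms, paid out in the next round: A (A^n d_i)
        cash = [sum(A[k][j] * cash[j] for j in range(n)) for k in range(n)]
    return family_total


# Hypothetical group: firm 1 holds 40% of firm 2, firm 2 holds 20% of
# firm 1 and 30% of firm 3; the family holds 50% of firm 1 and 10% of firm 3.
f = [0.5, 0.0, 0.1]
A = [[0.0, 0.4, 0.0],
     [0.2, 0.0, 0.3],
     [0.0, 0.0, 0.0]]
for firm in range(3):
    print(firm + 1, round(trace_dividend(f, A, firm), 6))
```

Because firms 1 and 2 hold shares in each other, the dollar keeps circulating between them in ever smaller amounts, so the family's total is a geometric series rather than a single product of stakes along a chain.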
4.1.1. Ultimate cash flow rights

We can now compute the family’s ultimate cash flow rights in firm i, $u_i$, which is defined as the fraction of the dividend originally paid by firm i that is (eventually) received by the family:

Proposition 1. The ultimate ownership of the family in each of the N firms is given by $u = [\,u_1 \;\; u_2 \;\; \cdots \;\; u_N\,]'$:

\[
u' = f'(I_N - A)^{-1}, \tag{2}
\]

where $I_N$ is the N × N identity matrix.

This formula is easy to use and can accommodate any group structure, regardless of its complexity.19 Brioschi, Buzzacchi, and Colombo (1989) use a different method to derive this formula. Essentially the formula works through the matrix of cross-shareholdings to arrive at the ultimate ownership. This is very much in the same spirit as input–output analysis (Leontief, 1986) where the share of an industry or sector in the aggregate economy is being computed.

4.1.2. Position

Using the same idea, we can now compute the position of a firm in a group. We define position as the distance between the family and a firm in the group. For example, in the case of a simple pyramid with two firms, the firm at the top of the pyramid is in position 1 and the one at the bottom is in position 2. Since there might be multiple chains from a particular firm to the family, we weigh each chain by its importance in terms of the cash flows the family receives. Note that the family receives $f'd_i$ from firm i directly (position 1). It also receives $f'Ad_i$ from firm i through chains that contain one intermediate firm (position 2) and so on. Therefore, the position of firm i is defined by

\[
position_i = \frac{f'd_i}{u_i}\cdot 1 + \frac{f'Ad_i}{u_i}\cdot 2 + \frac{f'A^2 d_i}{u_i}\cdot 3 + \cdots = \sum_{n=1}^{\infty} \frac{f'A^{n-1}d_i}{u_i}\, n. \tag{3}
\]

Simplifying this expression leads to:20

17 For brevity, we refer to the controlling shareholder as the ‘‘family’’ in the ensuing discussion.
18 This argument does not presume that dividends are actually paid. If the dollar is retained in firm i, the formulas will tell us the fraction of the dollar that is owned by the family and the other group firms (e.g., the cash flow rights of the family and group firms).
19 Most papers in the literature compute cash flow rights by multiplying the stakes along the ownership chain. This is correct under the assumption that no cross-shareholdings exist. Under this assumption, the chain multiplication formula is a special case of Eq. (2).
20 Kang, Park, and Jang (2006b) derive an alternative measure of a firm’s position in a group based on whether a firm owns significant equity in other group firms, or whether other firms own a large fraction of the firm’s equity. The first component of the definition creates a mechanical correlation with our centrality variable (defined below), and so we
Proposition 2. The position of firm i can be written as:

\[
position_i = \frac{1}{u_i}\, f'(I_N - A)^{-2} d_i, \tag{4}
\]

where $I_N$ is the N × N identity matrix.

4.1.3. Loops

While it is not the main focus of the empirical tests, we can also use these calculations to check whether a firm is part of a cross-ownership pattern and to compute the number of firms involved in this loop. Essentially, if a dividend paid by firm i eventually reappears in firm i, then i is part of a loop. Also, the number of steps that it takes for funds to reappear for the first time in firm i measures the number of firms in the shortest loop, which we define as loop:

Definition 1. Let

\[
loop_i = \min\{\, n \mid n \ge 1 \text{ and } d_i' A^n d_i > 0 \,\}, \tag{5}
\]

then firm i is in a loop if and only if $loop_i < \infty$. The number of firms in the shortest loop firm i is involved in is given by $loop_i$.
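The quantities in this subsection can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the cash flow rights and position are obtained by truncating the dividend series after a fixed number of rounds instead of inverting $I_N - A$, which gives the same values whenever the series converges.

```python
def mat_vec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def cash_flow_rights_and_position(f, A, i, rounds=500):
    """Truncated-series sketch of u_i and position_i for firm i."""
    n = len(f)
    cash = [1.0 if k == i else 0.0 for k in range(n)]  # A^0 d_i
    u_i = 0.0
    weighted = 0.0
    for step in range(1, rounds + 1):
        received = sum(fk * ck for fk, ck in zip(f, cash))  # f' A^(step-1) d_i
        u_i += received
        weighted += step * received  # each chain weighted by its length
        cash = mat_vec(A, cash)
    position = weighted / u_i if u_i > 0 else float("nan")
    return u_i, position


def shortest_loop(A, i):
    """Smallest n >= 1 with d_i' A^n d_i > 0, or None if firm i is in no loop."""
    n = len(A)
    cash = [1.0 if k == i else 0.0 for k in range(n)]
    for step in range(1, n + 1):  # a shortest loop involves at most N firms
        cash = mat_vec(A, cash)
        if cash[i] > 0:
            return step
    return None


# Simple two-firm pyramid (hypothetical numbers): the family owns 50% of
# firm 1, and firm 1 owns 40% of firm 2.
f = [0.5, 0.0]
A = [[0.0, 0.4],
     [0.0, 0.0]]
print(cash_flow_rights_and_position(f, A, 0))
print(cash_flow_rights_and_position(f, A, 1))
print(shortest_loop(A, 0))
```

In the two-firm pyramid the firm at the top comes out in position 1 and the firm at the bottom in position 2, as in the example given in the text, and neither firm is part of a loop.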
4.2. Control rights and centrality

The computation of control rights in a complex group is challenging because it is not clear what fraction of the votes held by intermediate firms is ultimately controlled by the family. The most frequently used measure in the literature is the weakest link, which is defined as the minimum stake along the chain of control. This measure is intuitive for simple pyramids: the controlling family must have a better grip on the control of a firm that is higher up in the pyramid than over a firm lower down that is controlled via the initial one. Yet, this measure has some drawbacks. First, when there are multiple chains used to control a firm, the definition calls for adding up the minimums over all chains. The intuition for this is not as clear. Second, in groups where there are multiple chains leading to one firm, this definition can generate numbers above 100%.21 Finally, the weakest link is not well-defined for firms that are part of loops as there are infinite chains leading to these firms. In light of these problems, we define our own measure of control, the critical control threshold. Essentially, the critical control threshold, or CC in short, is the maximum control threshold for which the firm belongs to the set of firms controlled by the family. This new definition has several appealing features. First, it can be defined for any group structure, regardless of its complexity. Second, it is derived from clearly stated assumptions about the characteristics of control. Finally, it turns out that this measure is equivalent to the weakest link when cross-shareholdings and multiple links are absent (that is, for simple pyramids).22 In that sense, it is a reasonable generalization of that simple, intuitive concept.

4.2.1. The set of firms controlled by the family

To compute the set of firms controlled by the family, we make two assumptions:

Assumption 1. A family controls a firm if and only if it holds more than T votes in it, directly or indirectly.

Assumption 2. The votes that a family holds in a firm are the sum of its direct votes, plus all the direct votes of firms under family control, where control is defined in Assumption 1.

This definition of control is a combination of the idea of a control threshold (Assumption 1), plus the assumption that, if a family controls a firm, it controls the votes that this firm holds in other firms. The following proposition establishes the formal condition that the set of firms controlled by the family must satisfy (for a given control threshold T). Suppose we start the analysis with a set N, which contains the universe of all candidate firms that could be controlled by the family. For example, this set can represent all firms in a country, or a pre-identified subset of those firms. We then have:

Proposition 3. For a given threshold T, the set of firms controlled by the family is given by

\[
C(T) = \Big\{\, i \in N : f_i + \sum_{j \in C(T),\, j \ne i} s_{ji} \ge T \,\Big\}. \tag{6}
\]

In Appendix B we describe an algorithm that can be used to find C(T).

4.2.2. Critical control threshold: definition

We can now define our measure of control rights:

Definition 2. For any firm $i \in N$, the critical control threshold is given by

\[
CC_i = \max\{\, T \mid i \in C(T) \,\}. \tag{7}
\]

The critical control threshold is the highest control threshold that is consistent with family control of firm i. In other words, if the control threshold were higher than $CC_i$, then firm i would not be part of the set of firms controlled by the family.

4.2.3. Centrality of a firm for the control of the group

In the empirical tests, we need to identify group firms that the controlling family uses to set up and control new firms. We identify such firms as those that are important for the control of other firms. This leads to the following definition.23

(footnote continued) believe our definition is more appropriate to the general case of complex ownership structures.
21 Simple examples are available from the authors upon request.
22 In particular, if cross-shareholdings and multiple links are absent or not very substantial the weakest link methodology can be used to compute control rights. For example, Faccio and Lang (2002) show that neither problem is very prevalent in Europe, justifying the use of the weakest link as a measure of control in their sample.
23 Kim and Sung (2006) compute a similar variable for Korea, using cash flow rights instead of voting rights. They show that their measure of centrality is inversely related to the probability that the firm goes public. In contrast, we show below that firms with a high centrality value are much more likely to be public in our sample.
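The controlled set of Proposition 3 and the critical control threshold of Definition 2 can be sketched in plain Python. The algorithm of Appendix B is not reproduced in this section, so the procedure below is an assumption: it starts from the full candidate set, repeatedly drops firms whose family-controlled votes fall below T, and therefore returns the largest set satisfying the condition in Proposition 3. The critical control threshold is then approximated by bisection on T, which relies on the fact that this largest set can only shrink as T rises. All function names are hypothetical.

```python
def controlled_set(f, S, T):
    """Largest set C such that every i in C has
    f[i] + sum of S[j][i] over j in C, j != i, of at least T.

    f[i]    : direct votes of the family in firm i
    S[j][i] : stake of firm j in firm i
    """
    C = set(range(len(f)))
    while True:
        keep = {i for i in C
                if f[i] + sum(S[j][i] for j in C if j != i) >= T}
        if keep == C:
            return C
        C = keep


def critical_control_threshold(f, S, i, tol=1e-9):
    """Approximate max{T : i in C(T)} by bisection, to within tol."""
    if i in controlled_set(f, S, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # firm i is always in C(0) and is not in C(1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if i in controlled_set(f, S, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Simple pyramid (hypothetical numbers): the family holds 50% of firm 1,
# and firm 1 holds 30% of firm 2.
f = [0.5, 0.0]
S = [[0.0, 0.3],
     [0.0, 0.0]]
print(controlled_set(f, S, 0.4))
print([round(critical_control_threshold(f, S, i), 6) for i in range(2)])
```

For this simple pyramid the sketch reproduces the weakest link: the bottom firm's threshold equals the smaller of the two stakes along the chain, in line with the equivalence for simple pyramids stated in the text.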
Korean regulators to compute the separation between ownership and control in chaebol firms.
Definition 3. We define the centrality of a firm i as

\[
central_i = \frac{\sum_{j \ne i} CC_j - \sum_{j \ne i} CC_j^{-i}}{\#N - 1}, \tag{8}
\]

where $CC_j^{-i}$ is the critical control threshold of firm j, computed as if firm i held no shares in the other group firms. In words, we compute the centrality of firm i as the average decrease in CC across all group firms other than firm i, after we exclude firm i from the group. This computation essentially determines how central a firm is, by comparing the average critical control threshold with and without including the stakes the firm holds in other firms. In order to show that the empirical results are not driven by the control proxy that we use, we also experiment with an alternative measure of centrality that is based only on the direct equity stakes that each firm holds in other group firms. If we let $A_j$ be the total assets and $E_j$ be the total equity of firm j, we have the following definition:

5. Data description
This section describes the sources for the ownership, accounting, and financial data that we use in this study.
5.1. Ownership data
In words, to compute the total votes held by the family in firm i, we simply add the direct votes held by the family in firm i with all the votes held by other firms that belong to C(T). The VR measure is also the measure that is used by
The ownership data for our study are from the Korean Fair Trade Commission (KFTC), which was established in 1981 with the purpose of regulating competition. In particular, the KFTC’s stated goal is to deter excessive concentration of economic power in a small number of large companies, including chaebols. Among other regulatory constraints, the KFTC requires that chaebol firms report complete ownership data. Chaebols are required to report the status of affiliate shareholders and persons with special interest and the financial status of group companies as of April 1 of each year. Shareholders are categorized into seven types: family owner, the relatives of the family owner, affiliates, nonprofit affiliate, group officer, treasury stock, and others. In addition, our data contain the name, the holding quantity, and the ratio of common stocks and preferred stocks of each individual shareholder. The KFTC defines a chaebol in two steps.25 In the first step, the KFTC defines the set of firms that belong to a business group. There are two criteria for this. The first is based on stock ownership. According to this criterion, a firm belongs to a business group if ownership by the controlling shareholder and related persons (relatives and other affiliated companies of the same business group) amounts to more than 30%, excluding preferred shares. The second criterion is qualitative. Firms are also classified as belonging to a business group when the controlling shareholder exercises ‘‘controlling influence’’ over it. The latter criterion is further detailed to include cases of exchange of directors and managers, and also substantial business transactions between a firm that belongs to the business group and the company in question. Because this criterion of controlling influence is interpreted broadly, some companies legally belong to a group even though neither the families, nor other affiliated companies in the group, own shares in those companies. 
In the second step, some business groups are designated as chaebols based on size, which is defined as the value of the combined total assets of affiliated companies in the group. From 1987 to 2001, the KFTC annually designated the 30 largest business groups as chaebols. From 2002 onwards, the KFTC started using a new category by including any group with total combined assets greater than a certain cutoff, which currently is two trillion won.26
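The two-step definition above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the KFTC's actual procedure: the function names and the input layout are hypothetical, the qualitative "controlling influence" test is reduced to a precomputed flag, and the two trillion won cutoff is simply the figure quoted in the text, applied to every year from 2002 onwards.

```python
# Illustrative sketch of the two-step definition described in the text.
# All names and the data layout are hypothetical.

OWNERSHIP_THRESHOLD = 0.30  # step 1: stake of controlling shareholder and related persons
ASSET_CUTOFF_WON = 2e12     # step 2 (2002 onwards): two trillion won, as quoted in the text


def belongs_to_group(related_stake, controlling_influence):
    """Step 1: a firm is in the group if the related-party stake (excluding
    preferred shares) exceeds 30%, or if the qualitative test is met."""
    return related_stake > OWNERSHIP_THRESHOLD or controlling_influence


def designated_groups(total_assets, year):
    """Step 2: select the groups designated as large business groups.

    total_assets maps group name -> combined total assets of affiliates (won).
    """
    if 1987 <= year <= 2001:
        # The 30 largest groups by combined assets.
        ranked = sorted(total_assets, key=total_assets.get, reverse=True)
        return set(ranked[:30])
    if year >= 2002:
        # Any group whose combined assets exceed the cutoff.
        return {g for g, a in total_assets.items() if a > ASSET_CUTOFF_WON}
    raise ValueError("designation rule not described for years before 1987")
```

The size-based step is kept separate from the membership step because the text applies them in sequence: membership is decided firm by firm, and designation is decided group by group.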
24 Some researchers attribute the weakest-link measure to the paper by La Porta, Lopez-de-Silanes, and Shleifer (1999), but, in fact, they use a different definition of voting rights which is closer to the VR measure. Specifically, they measure indirect ownership in a firm i as the percentage of votes that other group firms hold directly in firm i, provided that these other group firms are also controlled by the family (under control thresholds of either 10% or 20%). See Table I on p. 478 of their paper.
25 To be more precise, the KFTC’s definition that we describe here is that of a large business group. A chaebol is a large business group that is controlled by a family. Because our sample contains only family-controlled groups, we refer to chaebols and large business groups interchangeably.
26 Based on the won/dollar exchange rate of 946 on March 9, 2007, two trillion won amounts to approximately 2.1 billion US dollars.
Definition 4. We define the aggregate equity stake of firm i in other group firms as

$$\text{stake}_i = \frac{\sum_{j} s_{ij} E_j}{A_i}. \tag{9}$$
This measure is essentially the total size of the equity stake that firm i holds in other group firms, normalized by the total assets of firm i. We normalize by the assets of firm i because firm i’s valuation is more likely to be affected when the equity stakes are large relative to the size of firm i.
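As a minimal sketch of Definition 4, the stake variable can be computed as follows. The function name and the dictionary layout are hypothetical, and we assume here that E_j is the book value of equity of firm j and A_i the total assets of firm i, in line with the description of Stake in Table 1.

```python
def aggregate_equity_stake(stakes_of_i, equity, assets_i):
    """stake_i = (sum over j of s_ij * E_j) / A_i.

    stakes_of_i: firm j -> fraction s_ij of firm j's equity held by firm i
    equity:      firm j -> book value of equity E_j (assumed interpretation)
    assets_i:    total assets A_i of firm i
    """
    held = sum(share * equity[j] for j, share in stakes_of_i.items())
    return held / assets_i


# Firm i owns 30% of B (equity 100) and 50% of C (equity 40); its assets are 500.
example = aggregate_equity_stake({"B": 0.3, "C": 0.5}, {"B": 100.0, "C": 40.0}, 500.0)
```

In this example the stakes are worth 30 and 20, so the stake variable is 50/500, or 10% of the assets of firm i.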
4.2.4. Consistent voting rights

Besides the weakest link, the previous literature has also used an alternative measure of voting rights (VR), namely the sum of the direct stakes held by the controlling shareholder, and all stakes held by firms controlled by this shareholder (La Porta, Lopez-de-Silanes, and Shleifer, 1999; Lins, 2003).24

Definition 5. Given a threshold T, the consistent voting rights of the family in firm i ∈ C(T) are defined as

$$VR_i(T) = f_i + \sum_{j \in C(T),\, j \neq i} s_{ji}. \tag{10}$$
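Definition 5 can be illustrated with a short sketch. The function name and data layout are hypothetical, and the set C(T) of firms controlled at threshold T is taken as given rather than computed, since its construction is described elsewhere in the paper.

```python
def consistent_voting_rights(i, direct, stakes, controlled):
    """VR_i(T) = f_i + sum over j in C(T), j != i, of s_ji.

    direct:     firm -> direct family stake f_i
    stakes:     (holder j, target i) -> stake s_ji of firm j in firm i
    controlled: set of firms in C(T), supplied by the caller
    """
    if i not in controlled:
        raise ValueError("VR is defined only for firms in C(T)")
    # Only stakes held by other controlled firms count as family votes.
    indirect = sum(stakes.get((j, i), 0.0) for j in controlled if j != i)
    return direct.get(i, 0.0) + indirect


direct = {"A": 0.5, "B": 0.1}
stakes = {("A", "B"): 0.3, ("A", "C"): 0.2, ("B", "C"): 0.25, ("D", "C"): 0.4}
controlled = {"A", "B", "C"}  # firm D is outside C(T)
vr_c = consistent_voting_rights("C", direct, stakes, controlled)
```

Here the family holds no direct stake in firm C, but firms A and B together hold 45% of its votes; the 40% held by firm D is ignored because D is not in the controlled set.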
H. Almeida et al. / Journal of Financial Economics 99 (2011) 447–475
From the ownership and financial database that the KFTC has maintained, we obtained data for the period 1998–2004. We focus only on business groups with the ownership of a natural person (i.e., family business groups), and exclude other business groups such as government-controlled business groups. Our ownership data contain 3,545 firm-year observations. In some tests we use the subsample of firms that were newly added to the chaebol during our sample period. We describe these data later.

5.1.1. Summary statistics: ownership variables and firm characteristics

Table 1 shows the average values for the ownership variables across all firm-years in our sample (Panel A), and the cross-correlation matrix (Panel B). We also include other firm characteristics that we use in the analysis. Panel A shows that there are a total of 47 groups and 1,085 firms that were present at some point in the sample between 1998 and 2004. The controlling family holds 13% of the cash flows of the median firm, but it holds substantially more votes according to our two alternative measures of voting power. The consistent voting rights measure (VR) yields the largest voting power. The family and the affiliate firms hold 68% of the votes of the median firm in the sample. In contrast, the critical control threshold (CC) of the median firm is 30%. The data also indicate a substantial degree of pyramiding in Korean chaebols (the median position of a firm is 2.06), but with substantial cross-sectional variation (for example, 25% of the firms show an average position lower than 1.40). The typical pyramid is not deep (the 75th percentile of the position variable is approximately 2.5). Thus, while many chaebol firms are owned through pyramids, most of the time there is only one intermediate firm between the firm in question and the family. Regarding centrality, the main pattern is that only a few firms are central for the group structure. The 75th percentile of centrality is zero.
Similarly, the median aggregate stake held by group firms in other firms is zero, and the 75th percentile is just 3.5%. This statistic suggests that only a small fraction of firms hold substantial stakes in other firms. The relatively flat structure of Korean chaebols, coupled with the fact that only a few chaebol firms are central, helps us in the design of the empirical test of Implication 3 (the valuation implication). Implication 3 states that firms that are used by the family to set up and acquire new group firms (firm A in the theory) should trade at a discount, due to the anticipation of future pyramidal investments. The structure of Korean chaebols suggests that we can indeed identify firm A in the data using the centrality variable. If a typical chaebol had several layers and many central firms, then it would be difficult to predict which firms are likely to make future pyramidal investments. We also confirm this assumption by studying whether central firms are more active acquirers than other firms. Most chaebol firms are private (74% of firm-years involve unlisted firms). The median chaebol firm is 13 years old and has 190 employees. Therefore, despite the presence of a few very large firms in the sample, a typical chaebol involves many firms that are small, young, and privately held. The
summary statistics also show that 25% of the firm-years involve firms in indirect cross-shareholding loops.27 The high incidence of cross-shareholdings underscores the importance of taking cross-shareholdings into account when computing the other ownership measures. In Panel B, we present the simple correlations among the ownership variables and the other firm characteristics in Panel A. The correlations show that public firms, central firms, and firms in cross-shareholding loops tend to be higher up in the group structure (negative correlation with position). These variables are also correlated among themselves, that is, central firms are more likely to be public and belong to loops. Regarding firm characteristics, central firms are, on average, older, larger, and more likely to be public than other group firms. The same pattern holds for cross-shareholdings, which are more common among public, larger, and older firms. Position, in turn, is negatively correlated with age, public status, and the number of employees.28 The measures of cash flow rights and separation between ownership and control display expected patterns. The family has higher ultimate ownership in private and smaller firms. Position is highly positively correlated with both of the separation measures, indicating that firms in pyramids have higher separation between ownership and control.

5.1.2. The typical structure of a Korean chaebol

Fig. 2 summarizes the statistics above by charting the ownership structure of the typical chaebol. We can think of a typical chaebol structure as being organized in three layers. Some firms (firms 1, 2 in the figure) are owned directly at the very top of the group (a position value close to one), without ownership links to the other firms. The middle layer contains firms that belong to cross-shareholding loops, and also central firms (firms 3, 4 and 5).
Unlike the firms in the top layer, firms in this middle layer hold equity stakes in other chaebol firms, including other firms in the middle layer and firms in the bottom layer (such as firms 6, 7, etc.). Central firms in the middle layer tend to be public, and they are, on average, larger and older than other chaebol firms. In the bottom layer, in contrast, we observe firms that are more likely to be private, smaller, and younger. These firms do not own substantial stakes in other firms. Overall, this snapshot of chaebol structure is largely consistent with a historical evolution of chaebols. Chaebols appear to have grown as the controlling family used

27 The fraction of firms participating in cross-shareholding loops may seem surprising, given the fact that Korean regulation prohibits direct cross-shareholdings in chaebols. However, out of the 893 firm-years in which firms are involved in cross-shareholdings, we find that 72% belong to loops involving three firms, 13% are in loops involving four firms, and 6% are in loops involving five firms or more. Thus, Korean chaebols appear to circumvent the regulations prohibiting cross-shareholdings by creating loops of three or more firms.
28 Despite the negative correlation between centrality and position, we note that there is also a significant amount of variation in position among non-central firms. This variation is important, because it allows us to test some of the theoretical predictions described above. To illustrate this point, we compute the standard deviation in position for firms that have a centrality value lower than the mean (0.02). This standard deviation is 0.81, which is virtually identical to the standard deviation in the entire sample reported in Table 1 (0.82).
Table 1
Summary statistics of ownership variables and firm characteristics.
Panel A presents summary statistics of ownership variables of Korean chaebol firms for the period 1998–2004. Data are from the Korean Fair Trade Commission (KFTC). The variables are defined in detail in the text (see Section 4). Ultimate ownership is a measure of the family’s cash flow rights, and VR (consistent voting rights) and CC (critical control threshold) are two alternative measures of voting rights. Separation CC and Separation VR are defined, respectively, as CC minus ultimate ownership, and VR minus ultimate ownership. Position is a measure of the distance of a firm relative to the controlling family in the group structure. Centrality is the average drop in voting rights when a firm’s votes are not taken into account to compute CC for the other group firms. Stake is the book value of equity stakes held by a chaebol firm in other firms, normalized by assets. Cross-shareholdings takes a value of one if the firm belongs to a cross-shareholding loop. Public is a variable that takes the value of one if the firm is publicly traded. Panel B presents the correlation matrix for the variables summarized in Panel A.

Panel A: Basic statistics

| Variable | Mean | StDev | Median | 25% | 75% | Firm-years |
|---|---|---|---|---|---|---|
| Ultimate ownership | 0.21 | 0.22 | 0.13 | 0.05 | 0.28 | 3545 |
| VR | 0.68 | 0.28 | 0.68 | 0.47 | 1.00 | 3545 |
| CC | 0.33 | 0.19 | 0.30 | 0.19 | 0.43 | 3545 |
| Separation VR | 0.47 | 0.29 | 0.44 | 0.23 | 0.73 | 3545 |
| Separation CC | 0.12 | 0.11 | 0.12 | 0.03 | 0.19 | 3545 |
| Position | 2.11 | 0.82 | 2.06 | 1.40 | 2.56 | 3545 |
| Centrality | 0.02 | 0.05 | 0.00 | 0.00 | 0.00 | 3545 |
| Stake | 0.08 | 0.34 | 0.00 | 0.00 | 0.04 | 3545 |
| Cross-shareholdings | 0.25 | 0.43 | 0.00 | 0.00 | 1.00 | 3545 |
| Public | 0.26 | 0.44 | 0.00 | 0.00 | 1.00 | 3545 |
| Employees | 1198 | 3755 | 190 | 43 | 840 | 3545 |
| Firm age | 17 | 14 | 13 | 4 | 26 | 3545 |
| No. firms | 1085 | | | | | |
| No. groups | 47 | | | | | |
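Panel B: Correlations

| | Ult own | Separ VR | Separ CC | Position | Centrality | Cross-SH | Public | Employees |
|---|---|---|---|---|---|---|---|---|
| Separation VR | 0.42 | | | | | | | |
| Separation CC | 0.50 | 0.28 | | | | | | |
| Position | 0.52 | 0.60 | 0.54 | | | | | |
| Centrality | 0.10 | 0.25 | 0.06 | 0.26 | | | | |
| Cross-SH | 0.06 | 0.20 | 0.04 | 0.18 | 0.21 | | | |
| Public | 0.16 | 0.44 | 0.06 | 0.23 | 0.37 | 0.42 | | |
| Employees | 0.09 | 0.18 | 0.01 | 0.16 | 0.24 | 0.30 | 0.35 | |
| Firm age | 0.01 | 0.33 | 0.04 | 0.31 | 0.39 | 0.46 | 0.59 | 0.32 |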
successful (e.g., large, public) group firms to set up and acquire new group firms that are placed at the bottom of the group, i.e., those with high position values.29

5.2. Accounting and financial data

In addition to the data obtained from the KFTC, we also used two other databases developed by Korea Listed Companies Association (KLCA) and Korea Investors Service (KIS), respectively, to obtain additional financial information. KLCA and KIS’s databases contain information not only on listed companies, but also on some private firms that are subject to external audit. We follow the standard procedure of dropping the data on financial institutions (insurance, brokerage, and other financial institutions), which comprise 316 firm-years of the 3,545 firm-years of the sample. These firms are subject to specific regulations and accounting rules that make their financial statements less comparable to the other chaebol firms, which are mostly in the manufacturing sector.

29 Aganin and Volpin (2005) also report similar evidence for one particular Italian business group (the Pesenti group).
Our measure of profitability is profits before interest and taxes normalized by assets. However, to correctly measure the profitability of each individual chaebol firm, we need to ensure that reported figures are not affected by equity stakes that a chaebol firm holds in other firms. Starting in 1999, the financial statements of Korean chaebol firms became subject to the equity method reporting rule. The basic idea behind this accounting rule is to record firm A’s share of firm B’s equity as an asset for firm A, and firm A’s share of firm B’s profits as a source of non-operating income for firm A. The financial statements contain enough information to allow us to back out the exact amount by which accounting figures have been adjusted because of these equity stakes. We use this information to calculate our measures of assets and profitability for chaebol firms, which we denote stand-alone assets and stand-alone profitability. The details are provided in Appendix C. There are similar issues involved in the computation of a measure of Tobin’s Q for chaebol firms. The market value of a publicly listed chaebol firm includes the value of the equity stakes that this firm holds in other chaebol firms, both listed and unlisted. However, adjusting for the value of equity stakes is more difficult because the market value of private firms (which comprise a large fraction of the sample) is not
[Figure 2 diagram. The Family sits in the top box. Top layer (direct ownership): Firm 1, Firm 2. Middle layer (central firms, cross-shareholdings; public firms, larger and older): Firm 3, Firm 4, Firm 5. Bottom layer (pyramidal ownership; private firms, smaller and younger): Firm 6, Firm 7, Firm 8, Firm 9, Firm 10.]

Fig. 2. Typical ownership structure of a Korean chaebol, 1998–2004. This figure depicts the typical structure of a Korean chaebol. Each box represents an individual firm, with the controlling family in the top box. The arrows represent equity stakes that the family and the firms hold in other firms. In the left, we describe some of the key characteristics of the firms in each layer of the chaebol. For example, central firms (those in the middle layer) tend to be larger and older than other chaebol firms.
observable. Therefore, our preferred measure of valuation is a measure of Q that is unadjusted for the value of equity stakes:

$$Q = \frac{EV + \text{Book value of liabilities}}{\text{Book value of assets}}, \tag{11}$$

where EV is the market value of equity. To show that the results are not driven by mismeasurement, we also experiment with a measure of Q that takes the value of equity stakes into account, “stand-alone Q”:

$$Q_{sa} = \frac{EV + \text{Book value of liabilities} - \text{Value of equity stakes}}{\text{Stand-alone assets}}. \tag{12}$$

To compute Qsa, we assume that private firms are valued at book value. Provided this assumption is correct, Qsa can be interpreted as the Q that a group firm would have if it were valued as a stand-alone entity. We use non-current liabilities divided by stand-alone assets to measure leverage, the absolute value of the difference between operating cash flows over stand-alone assets and net income over stand-alone assets to measure accruals, and we normalize dividends and capital expenditures by stand-alone assets.30 We define tangibility as property, plant, and equipment divided by stand-alone assets, intangibles as the ratio of intangible assets to stand-alone assets, and collateral as the ratio of the sum of property, plant and equipment and inventories to stand-alone assets.31

30 Korean cash flow statements disaggregate gross investments in tangible assets (e.g., increase in buildings) from the liquidation of tangible assets (e.g., decrease in buildings). Our capital expenditure measure is the sum of all gross investment items minus the sum of all liquidation items (e.g., net capital expenditures).
31 Frank and Goyal (2009) propose the use of this collateral variable in lieu of a standard tangibility measure in the context of capital structure regressions.

Finally, we use acquisition data to test some of the implications of the theory. We estimate the NPV of acquisitions of new firms by the chaebol using the acquisition premium:

$$\text{Acquisition premium} = \frac{\text{Acquisition price}}{\text{Book value of equity}}. \tag{13}$$
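The three valuation ratios in Eqs. (11) to (13) are simple enough to sketch directly. The function and argument names below are hypothetical, and the example figures are made up for illustration; the subtraction of the value of equity stakes in the numerator of stand-alone Q follows the description in the text.

```python
def tobins_q(ev, liabilities, assets):
    """Eq. (11): unadjusted Q, with ev the market value of equity."""
    return (ev + liabilities) / assets


def stand_alone_q(ev, liabilities, equity_stakes, stand_alone_assets):
    """Eq. (12): Q net of the value of equity stakes held in other firms."""
    return (ev + liabilities - equity_stakes) / stand_alone_assets


def acquisition_premium(acquisition_price, book_equity):
    """Eq. (13): acquisition price relative to the book value of equity."""
    return acquisition_price / book_equity


# Made-up firm: market equity 80, liabilities 60, book assets 150,
# equity stakes worth 30, stand-alone assets 120.
q = tobins_q(80.0, 60.0, 150.0)
q_sa = stand_alone_q(80.0, 60.0, 30.0, 120.0)
premium = acquisition_premium(107.0, 100.0)
```

With these inputs Q is 140/150 and stand-alone Q is 110/120, so the two measures are close, which is the pattern the summary statistics report for the actual sample.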
In order to compute this variable, we first identify those firms that appear for the first time in the ownership data of the KFTC. There are 303 firms that are newly added to a chaebol during our sample period (1998–2004). Calculation of the acquisition premium requires the book value of equity and the acquisition price. Out of 303 firms, there are 214 firms for which the accounting data are available from KLCA and KIS. Out of these 214 firms, acquisition prices are obtained for 144 firms from the electronic disclosure system, DART (Data Analysis, Retrieval and Transfer System) of the Korean FSS (Financial Supervisory Service). In some cases, the new firms represent new establishments, and not acquisitions of existing firms. In other cases, the data on the acquisition price are not available. The book value of equity is obtained from KLCA and KIS. We also estimate the extent to which a chaebol firm is used by the family to acquire stakes in other firms with the variable acquisition intensity, which is the sum of the value of equity stakes acquired by each group firm in the event of an acquisition of a new group firm, divided by the book value of the equity of the acquirer. We use the same sample of acquisitions of new firms with available accounting data (described above) to compute this variable.
5.2.1. Summary statistics: Accounting and valuation data

Table 2, Panel A, reports the summary statistics for the accounting and valuation variables. Given data availability, we end up with a sample of 2,695 firm-years between
1998 and 2004.32 Stand-alone assets are lower than total assets because of the adjustment for equity stakes (approximately by 10%, on average). There are a total of 823 firm-years available for public firms between 1998 and 2004. Notice that Qsa and Q have very similar distributions.33 Finally, notice that the median acquisition premium is 1.07, and the mean is 1.54, indicating that the family pays large premiums for some of the acquisitions. Panel B displays some of the correlations between the financial and ownership variables. Some patterns worth noting are as follows. Stand-alone assets are positively correlated with centrality and negatively correlated with position. Q is negatively correlated with both centrality and separation between ownership and control, but only if such separation is measured using the CC measure of control. Capital expenditures and the acquisition premium are positively correlated with position, and centrality is positively correlated with acquisition intensity.

6. Empirical tests

In this section we test the hypotheses that relate group structure to accounting and financial variables (Implications 1 to 3). These implications are all based on the selection hypothesis that we develop in Section 3. In addition, we test some implications that are specific to the tunneling hypothesis, with the goal of evaluating the relative importance of these two hypotheses in our data.
6.1. Family ownership and profitability

The previous literature shows a positive correlation between profitability and ultimate ownership. In this section we estimate the following empirical model and show that we are able to replicate previous results in our data:

$$\text{Stand-alone profitability}_{i,t} = \beta_1\,\text{Ownership variable}_{i,t} + \beta\,\text{Controls}_{i,t} + \sum_{j} \text{industry}_j + \sum_{t} \text{year}_t + \varepsilon_{i,t}, \tag{14}$$

where we use both ultimate ownership and the measures of separation between ownership and control as alternative ownership variables. The vector of controls includes firm size (measured by the log of stand-alone assets), age, public status, and leverage. In some specifications, we include other measures of group structure (namely centrality and cross-shareholdings) to examine their correlations with profitability. In addition, we control for industry and year fixed effects. The industry classification corresponds roughly to a two-digit standard industrial classification (SIC) in the US (there are 45 different industries in the sample). In some specifications, we also include group fixed effects to measure within-group effects. The standard errors are clustered at the level of the firm. The results are presented in Table 3. Column 1 shows the standard positive correlation between ultimate ownership and profitability. This correlation is robust to the inclusion of group dummies (column 2). Interestingly, ultimate ownership appears to be more robustly related to profitability than the measures of separation between ownership and control (columns 3 to 6). These results show that we can replicate previous findings in our data. A result to be noted in Table 3 is the negative correlation between centrality and profitability (columns 7 and 8). Despite the fact that central firms are higher up in the group structure relative to pyramidal firms, this finding does not contradict standard findings in the literature. In fact, notice that the positive correlation between ultimate ownership and profitability continues to hold after including centrality in the regressions. In other words, central group firms have lower profitability than other group firms, irrespective of the family’s ultimate ownership. We discuss this finding further in Section 7.

32 The data for stand-alone profitability, stand-alone Q, dividends, accruals, and acquisition premium are winsorized at the 1st and 99th percentiles.
33 This is consistent with the results in Bohren and Michalsen (1994), who compute distortions due to double-counting of value of firms with cross-shareholdings in Norway. Valuation metrics such as price-earnings ratio are relatively unaffected by cross-shareholdings, since there is double-counting in both the numerator and the denominator. In contrast, French and Poterba (1991) report a substantial effect of cross-shareholdings on price-earnings ratios in Japan in the 1980s.

6.2. Pyramids and profitability

In order to examine the family’s choice of where to place a firm in the group structure, we estimate empirical models in which position is the dependent variable:

$$\text{Position}_{i,t} = \alpha_1\,\text{Stand-alone profitability}_{i,t-1} + \Phi\,\text{Controls}_{i,t} + \sum_{j} \text{industry}_j + \sum_{t} \text{year}_t + \varepsilon_{i,t}. \tag{15}$$
The selection hypothesis predicts that the controlling family is more likely to place a firm in a pyramid (high position) if the firm has low profitability, as per Implication 1. Thus, the coefficient a1 should be negative. We use lagged profitability because the theory on group formation suggests that profitability should predict pyramidal ownership. We recognize, though, that simply lagging this variable is not sufficient to provide evidence on causality, and we address the issue of causality in greater detail below. The controls are identical to those used in Table 3 above (firm size, age, public status, leverage, and dummies for year, industry, and group in some specifications). The selection hypothesis has no clear prediction about the relative profitability of central firms (those at the middle layer of Fig. 2, such as ‘‘firm 3’’). Instead, it predicts that firms that are owned directly by the family (such as ‘‘firm 1’’) should have higher profitability than firms that the family places in pyramids (such as ‘‘firm 6’’). This observation suggests that the effect of profitability on a firm’s position should be driven by variations in position among non-central firms. In order to verify whether this is true, we also estimate Eq. (15) separately for central and non-central firms. We divide the sample in central and non-central firms using the mean value of centrality as a cutoff. In the following tables, central firms (non-central firms) are those for which centrality is greater (lower) than its mean value of 0.02 (see Table 1).
Table 2
Summary statistics of accounting and financial variables.
This table presents summary statistics for financial and accounting variables for chaebol firms during 1998–2004. Insurance, securities firms, and other financial institutions are excluded from the sample. Data are from KLCA (Korea Listed Companies Association) and KIS (Korea Investors Service). Stand-alone profitability and Stand-alone assets are computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). See Eqs. (11) and (12) for the definitions of Q, Qsa, and the stand-alone market value of assets (the numerator of Eq. (12)). Leverage is defined as non-current liabilities divided by stand-alone assets. Accruals are defined as the absolute value of the difference between net income divided by stand-alone assets, and cash flows divided by stand-alone assets. The Acquisition premium is the ratio between the acquisition price and the book value of the equity of firms that were added to a chaebol during the sample period of 1998–2004. Tangibility is property, plant, and equipment divided by stand-alone assets. Intangibles is the ratio of intangible assets to stand-alone assets. Collateral is the ratio of the sum of property, plant, and equipment and inventories to stand-alone assets. Acquisition intensity is the sum of the value of equity stakes acquired by each group firm in the event of an acquisition of a new group firm, divided by the book value of the equity of the acquirer. Panel A presents summary statistics, and Panel B presents the correlations among these variables and the ownership measures described in Table 1.

Panel A: Basic statistics

| Variables | Mean | StDev | 25% | Median | 75% | Firm-years |
|---|---|---|---|---|---|---|
| Stand-alone profitability | 0.050 | 0.115 | 0.007 | 0.058 | 0.103 | 2695 |
| Assets (millions USD) | 794 | 2320 | 29 | 110 | 527 | 2695 |
| Stand-alone assets (millions USD) | 714 | 2029 | 27 | 103 | 489 | 2695 |
| Q | 0.917 | 0.324 | 0.734 | 0.838 | 0.994 | 823 |
| Qsa | 0.908 | 0.363 | 0.707 | 0.828 | 1.011 | 806 |
| Capital expenditure/Stand-alone assets | 0.056 | 0.148 | 0.008 | 0.029 | 0.073 | 2601 |
| Leverage | 0.213 | 0.296 | 0.043 | 0.146 | 0.301 | 2644 |
| Accruals | 0.107 | 0.126 | 0.030 | 0.070 | 0.136 | 2679 |
| Dividends/Stand-alone assets | 0.006 | 0.013 | 0.000 | 0.000 | 0.007 | 2695 |
| Tangibility | 0.419 | 0.275 | 0.170 | 0.431 | 0.634 | 2670 |
| Intangibles | 0.037 | 0.109 | 0.001 | 0.006 | 0.024 | 2065 |
| Collateral | 0.518 | 0.251 | 0.325 | 0.556 | 0.718 | 2297 |
| Acquisition intensity | 0.004 | 0.021 | 0.000 | 0.000 | 0.000 | 2691 |
| Acquisition premium | 1.543 | 1.537 | 0.693 | 1.065 | 1.548 | 144 |

Panel B: Correlations

| | Stand-alone profitability | Stand-alone assets | Q | Capex/Stand-alone assets | Leverage | Accruals | Dividends/Stand-alone assets | Acquisition Premium |
|---|---|---|---|---|---|---|---|---|
| Stand-alone assets | 0.073 | | | | | | | |
| Q | 0.146 | 0.154 | | | | | | |
| Capital expenditure/Stand-alone assets | 0.003 | 0.005 | 0.246 | | | | | |
| Leverage | 0.131 | 0.075 | 0.004 | 0.009 | | | | |
| Accruals | 0.219 | 0.013 | 0.223 | 0.103 | 0.149 | | | |
| Dividends/Stand-alone assets | 0.328 | 0.051 | 0.291 | 0.002 | 0.125 | 0.076 | | |
| Acquisition Premium | 0.094 | 0.139 | 0.701 | 0.060 | 0.056 | 0.242 | 0.024 | |
| Tangibility | 0.061 | 0.098 | 0.128 | 0.232 | 0.227 | 0.123 | 0.063 | 0.034 |
| Intangibility | 0.110 | 0.031 | 0.173 | 0.023 | 0.150 | 0.051 | 0.049 | 0.069 |
| Collateral | 0.057 | 0.085 | 0.175 | 0.151 | 0.241 | 0.143 | 0.024 | 0.118 |
| Acquisition intensity | 0.019 | 0.031 | 0.089 | 0.034 | 0.031 | 0.086 | 0.006 | 0.282 |
| Separation VR | 0.062 | 0.90 | 0.027 | 0.068 | 0.011 | 0.033 | 0.147 | 0.138 |
| Separation VC | 0.016 | 0.028 | 0.101 | 0.037 | 0.051 | 0.012 | 0.056 | 0.087 |
| Average position | 0.000 | 0.137 | 0.082 | 0.139 | 0.021 | 0.029 | 0.021 | 0.133 |
| Centrality | 0.032 | 0.252 | 0.147 | 0.057 | 0.068 | 0.020 | 0.015 | NA |
0.026 0.282 0.598 0.250 0.021 0.037 0.017 0.116
0.537 0.079
Separation VR Acquisition intensity
Separation VC
Average Position
In addition, we also use the sample of non-central firms but replace the position variable with a dummy variable (pyramid) that takes the value of one if a firm is owned through a pyramid, and zero if the firm is owned directly. We classify a firm as being owned through a pyramid if its position is greater than or equal to 2.0 (recall that a position of 2.0 characterizes a pure pyramid). The firm is owned directly if its position is lower than 1.5. This discrete classification of firms into pyramidal versus direct ownership ensures that the results are, in fact, driven by a comparison between non-central firms at the top, with non-central firms at the bottom of the group. Since the dependent variable is a dummy, we use a probit specification in these regressions. The results are reported in Table 4. Columns 1 and 2, which include both central and non-central firms, show that lagged profitability is correlated with position in a way that is consistent with the selection hypothesis, both before and after controlling for group fixed effects. The control variables have the expected sign. For example, older and larger firms are more likely to be found at the top of the group. Consistent with Implication 1, these correlations are driven mostly by variation among non-central firms (columns 3 and 4). This result is further confirmed by the probit regressions in columns 5 and 6. Non-central pyramidal firms do appear to have lower profitability than non-central, directly owned firms.
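The discrete classification used for the probit regressions can be sketched as follows. The function name is hypothetical, and we assume that firms with a position between the two cutoffs are left unclassified (returned as None) and dropped from these regressions; the text states the two cutoffs but does not spell out the treatment of intermediate cases.

```python
def pyramid_dummy(position):
    """Return 1 for pyramidal ownership, 0 for direct ownership, None otherwise.

    Cutoffs follow the text: position >= 2.0 is a pyramid and position < 1.5
    is direct ownership. Leaving the middle range unclassified is an assumption.
    """
    if position >= 2.0:
        return 1
    if position < 1.5:
        return 0
    return None


# Positions taken from the quartiles and median reported in Table 1, plus one
# intermediate value.
labels = [pyramid_dummy(p) for p in (1.40, 1.80, 2.06, 2.56)]
```

Keeping a gap between the two cutoffs is what makes the comparison one between firms clearly at the top and firms clearly at the bottom of the group.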
0.009 0.044 0.114 0.041 0.102 0.191 0.025 0.166 0.185 0.196 0.072 Intangibility Collateral Acquisition intensity Separation VR Separation VC Average position Centrality
0.213 0.912 0.005 0.030 0.085 0.042 0.091
Collateral Tangibility
Intangibles
6.3. Does profitability predict pyramidal ownership?

The lower profitability of non-central firms placed at the bottom of the group reported in Table 4 is consistent with the selection hypothesis. However, these results are not sufficient to rule out an alternative explanation due to the tunneling hypothesis, which predicts that the family will divert resources away from firms that are placed in pyramids. In this subsection, we provide two additional tests that attempt to distinguish between the selection and the tunneling stories. Both tests exploit the dynamic nature of our data, in that they both focus on large shocks to the group structure.

6.3.1. Evidence from large changes in position

One of the challenges in interpreting the results in Table 4 is that lagging profitability is not sufficient to show that it influences a firm’s position because the position of an individual group firm does not vary much over time. It could also be the case that past profitability was determined by the firm’s relative position in the group, which might be very similar to the current position. In fact, in most firm-years, position changes very little. The 25th percentile of the distribution of annual firm-level changes in the position variable is 0.024, while the 75th percentile is 0.04. Thus, most of the variation in position is cross-sectional. In order to provide additional evidence for the selection hypothesis, we experiment with instances of large changes in a firm’s position in a group. Specifically, we create dummy variables that capture cases in which a firm’s position changed by more than 0.10 from one year to the next. This cutoff represents more than 10% of the total standard
Table 3 Family ownership and profitability. This Table contains the tests described in Section 6.1, which relate a firm’s profitability to family ownership variables (Eq. (14)). The dependent variable is Stand-alone profitability. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ultimate ownership is a measure of the family’s cash flow rights. Ln assets is the logarithm of the book value of assets. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. The coefficients on firm age are multiplied by 1,000. Separation CC and Separation VR are defined, respectively, as CC (critical control threshold) minus ultimate ownership, and VR (consistent voting rights) minus ultimate ownership. Centrality is the average drop in voting rights when a firm’s votes are not taken into account to compute the critical control threshold for the other group firms. Cross-shareholdings takes a value of one if the firm belongs to a cross-shareholding loop, and zero otherwise. We computed robust standard errors clustered at the firm level. T-statistics are in parenthesis. * significant at 10%; ** significant at 5%; *** significant at 1%. Dependent variable: Stand-alone profitability
Firm age Ln assets Public Leverage Ultimate ownership
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
-0.223 (-0.866) 0.008*** (2.824) -0.002 (-0.314) -0.053*** (-4.683) 0.064*** (4.132)
0.074 (0.259) 0.007** (2.216) -0.002 (-0.215) -0.050*** (-4.052) 0.058*** (3.173)
-0.098 (-0.387) 0.006** (2.319) -0.011 (-1.380) -0.052*** (-4.669)
0.125 (0.435) 0.006** (2.165) -0.010 (-1.269) -0.049*** (-4.030)
-0.089 (-0.354) 0.006** (2.244) -0.005 (-0.670) -0.052*** (-4.711)
0.118 (0.411) 0.006** (2.184) -0.005 (-0.649) -0.049*** (-4.065)
-0.161 (-0.586) 0.008*** (2.795) -0.002 (-0.214) -0.053*** (-4.623) 0.067*** (4.242)
0.110 (0.368) 0.007** (2.170) -0.002 (-0.218) -0.050*** (-4.018) 0.061*** (3.298)
-0.020 (-1.579)
-0.016 (-1.211) -0.040 (-1.586)
-0.042 (-1.445) -0.089** (-2.140) 0.002 (0.286) -0.127** (-2.118) Yes Yes No 2620 0.098
-0.102** (-2.141) 0.007 (1.122) -0.240*** (-3.873) Yes Yes Yes 2620 0.165
Separation VR Separation CC Centrality Cross-shareholdings Constant Industry fixed effects Year fixed effects Group fixed effects Observations R2
-0.105* (-1.826) Yes Yes No 2643 0.097
-0.046 (-0.724) Yes Yes Yes 2643 0.163
-0.064 (-1.113) Yes Yes No 2643 0.086
-0.019 (-0.304) Yes Yes Yes 2643 0.156
-0.063 (-1.096) Yes Yes No 2643 0.086
-0.017 (-0.277) Yes Yes Yes 2643 0.157
deviation in the position variable.34 The variable position increase takes the value of one if position increased by more than 0.10 from one year to the next, and zero otherwise (there are 388 firm-years that satisfy this criterion). The variable position decrease takes the value of one if position decreased by more than 0.10 from one year to the next, and zero otherwise (there are 278 firm-years that satisfy this criterion). We then replace the variable position with position increase and position decrease in Eq. (15). Since our dependent variable is a dummy, we use a probit model for these regressions. The results from these regressions are reported in the first four columns of Table 5. Clearly, lagged profitability helps predict large changes in position in a way that is consistent with the selection hypothesis. The first two columns show that low past profitability predicts increases in a firm’s position in the group structure, both before and after controlling for group fixed effects. Thus, poor past performance predicts that a firm will be moved to the bottom of the pyramid. High past profitability is also positively correlated with decreases in position, though the coefficients are not statistically significant.
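The construction of the two indicator variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the function name `large_change_dummies`, the input layout (a list of (year, position) pairs for one firm), and the sample numbers are hypothetical. Only the 0.10 cutoff is taken from the text.

```python
def large_change_dummies(series, cutoff=0.10):
    """Return {year: (position_increase, position_decrease)} for one firm.

    `series` is a list of (year, position) pairs (hypothetical layout).
    A dummy is defined only when the previous calendar year is observed,
    so gaps in the panel produce no entry.
    """
    by_year = dict(series)
    dummies = {}
    for year in sorted(by_year):
        if year - 1 not in by_year:
            continue  # no prior-year position, so no annual change exists
        change = by_year[year] - by_year[year - 1]
        dummies[year] = (int(change > cutoff), int(change < -cutoff))
    return dummies


# Hypothetical firm: pushed further from the family in 2000, moved closer in 2002.
example = [(1999, 1.00), (2000, 2.00), (2001, 2.05), (2002, 1.40)]
flags = large_change_dummies(example)
```

Each dummy would then replace position as the dependent variable in a probit regression. The estimation step is omitted here because it depends on the statistical package used.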
6.3.2. Evidence from new chaebol firms

A different way to overcome the lack of time-series variability in the position variable is to examine cases in which the family decides for the first time where to place a firm in the group structure. Specifically, there are 303 firms in our data that appear as chaebol firms for the first time in the sample window of 1998–2004. For 163 of these firms, we also have performance data for the year prior to their inclusion. While the size of the sample is drastically reduced if we study only these firms, examining a firm’s profitability before it is added to a chaebol allows for sharper tests of causality. To wit, if lower profitability does predict pyramidal ownership (Implication 1), then the relationship uncovered in Table 4 should also hold if we measure the firm’s profitability before it became a chaebol firm. Presumably, a firm’s profitability in the year prior to becoming a chaebol firm cannot be affected by the ownership structure chosen later by the chaebol’s controlling family. However, pre-chaebol profitability should explain the firm’s ownership structure, according to Implication 1.35
34 The results that we present are invariant to the particular cutoff used. We have also experimented with using changes larger than the mean change in position (0.03), or changes larger than certain percentiles of the distribution of the position variable (25th and 75th for example).
35 When running these regressions, we also lag the values of the other control variables that are available in the year prior to the firm’s addition to the chaebol (size and leverage).
Table 4 Determinants of a firm’s position in the Chaebol. This table contains the tests described in Section 6.2, which relate a firm’s position in the group to firm characteristics (Eq. (15)). Position is a measure of the distance of a firm relative to the controlling family in the group structure. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ln assets is the logarithm of the book value of assets. Centrality is the average drop in voting rights when a firm’s votes are not taken into account to compute the critical control threshold for the other group firms. Central (Non-central) firms are those for which Centrality is greater (lower) than its mean value of 0.02. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. Pyramid is a dummy that takes the value of one if a (Non-central) firm is owned through a pyramid, and zero if the Non-central firm is owned directly. We classify a firm as being owned through a pyramid if its Position is greater than or equal to 2.0. The (Non-central) firm is owned directly if its Position is lower than 1.5. The coefficients on Firm age are multiplied by 1,000. We computed robust standard errors clustered at the firm level. T-statistics are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.

Column                            (1)         (2)         (3)         (4)         (5)         (6)
Model                             Regression  Regression  Regression  Regression  Probit      Probit
Dependent variable                Position    Position    Position    Position    Pyramid     Pyramid
Sample                            All firms   All firms   Non-central Central     Non-central Non-central

Stand-alone profitability (t-1)   -0.380**    -0.347**    -0.347**    -0.403      -2.308***   -2.857***
                                  (-2.567)    (-2.274)    (-2.201)    (-1.293)    (-4.813)    (-5.269)
Firm age                          -9.557***   -8.356***   -10.455***  6.674       -32.171***  -36.757***
                                  (-3.997)    (-3.367)    (-3.379)    (1.399)     (-4.419)    (-4.523)
Ln assets                         -0.027      -0.078***   -0.028      -0.070      0.086*      0.019
                                  (-1.143)    (-3.422)    (-1.118)    (-1.218)    (1.695)     (0.311)
Public                            -0.111      -0.072      0.008       -0.387**    -0.039      -0.050
                                  (-1.309)    (-0.913)    (0.102)     (-2.154)    (-0.188)    (-0.239)
Leverage                          -0.123*     -0.147**    -0.169**    -0.112      -0.183      -0.137
                                  (-1.697)    (-2.010)    (-2.204)    (-0.569)    (-1.300)    (-0.861)
Constant                          2.461***    3.021***    1.499*      3.237***    -0.080      0.427
                                  (3.528)     (3.579)     (1.907)     (2.851)     (-0.0918)   (0.322)

Industry fixed effects            Yes         Yes         Yes         Yes         Yes         Yes
Year fixed effects                Yes         Yes         Yes         Yes         Yes         Yes
Group fixed effects               No          Yes         Yes         Yes         No          Yes
Observations                      2160        2160        1745        396         1386        1325
R2                                0.287       0.455       0.454       0.668       NA          NA
The last four columns of Table 5 contain the results. In columns 5 and 6, we run the regression in Eq. (15) using only the sample of new chaebol firms. Low profitability continues to predict that a new firm will be controlled through a pyramid (high position), before and after including group dummies. These results suggest that when the family adds a new firm to the group that has low profitability relative to other group firms, it is more likely to place such a firm in a pyramidal structure. These results support the direction of causality suggested by the selection story.

The economic magnitude of the profitability effect also appears to be large. In column 5, for example, the estimates imply that a one-standard-deviation decrease in stand-alone profitability (0.12, according to Table 2) increases the firm’s first position in the group by approximately 0.18 (which corresponds to 22% of the overall standard deviation of the position variable, which is equal to 0.82 in Table 1).

In columns 7 and 8, we perform two robustness checks. First, notice that our argument rests on the assumption that a firm’s profitability in the year prior to becoming a chaebol firm cannot be affected by its ownership structure. Nevertheless, in some cases, a firm might have been owned by another chaebol, and through a pyramid, in the year prior to its acquisition. In these cases, the firm’s lagged profitability might have been affected by its placement in a pyramidal structure. To ensure that this story does not explain our results, we eliminate all cases in which we can determine that a firm belonged to another chaebol in the year prior to its first appearance in a new chaebol. There are 16 of these cases. As column 7 shows, eliminating these firms does not affect the previous results. Second, we replicate the results using the dummy variable pyramid defined above, rather than position. Column 8, which uses a probit specification, shows that low pre-chaebol profitability predicts that a firm will be placed in a pyramid by the chaebol.
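As an illustration of how the pyramid dummy partitions firms, the classification rule stated in the notes to Table 4 can be written as a short Python function. The function name `pyramid_dummy` and the sample positions are hypothetical. Returning None for positions between 1.5 and 2.0, and dropping those firms, is an assumption about how the unclassified range is handled, not something the text states explicitly.

```python
def pyramid_dummy(position):
    """Classify a non-central firm by its position in the group.

    Returns 1 if position >= 2.0 (owned through a pyramid), 0 if
    position < 1.5 (owned directly), and None for the range in between,
    which the two thresholds leave unclassified.
    """
    if position >= 2.0:
        return 1
    if position < 1.5:
        return 0
    return None


positions = [1.0, 1.4, 1.7, 2.0, 3.2]  # hypothetical position values
labels = [pyramid_dummy(p) for p in positions]

# Assumption: unclassified firms are excluded from the probit sample.
usable = [(p, d) for p, d in zip(positions, labels) if d is not None]
```

Under this reading, the probit sample is smaller than the sample used when position itself is the dependent variable, because firms in the intermediate range drop out.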
6.4. Position and stock measures of pledgeability

We interpret the negative effect of profitability on position reported above as a result of the selection hypothesis. Since firms with low profitability produce a smaller flow of pledgeable income, the family finds it valuable to use the internal equity of other group firms (rather than external funds) to finance investment in new firms added to the group. In contrast, firms with high profitability require less external funding and thus less internal group equity. This argument suggests that the nature of the assets of new group firms can also affect their position in the group. Specifically, firms that have a larger stock of pledgeable assets could find it easier to raise funds in external capital markets, and thus require less group equity. In this subsection, we examine whether stock-based proxies for pledgeability are also
Table 5 Does profitability predict pyramidal ownership? This table contains the tests described in Section 6.3. The variable Position increase takes the value of one if position increased by more than 0.10 from one year to the next, and zero otherwise. The variable Position decrease takes the value of one if position decreased by more than 0.10 from one year to the next, and zero otherwise. In columns 1 to 4 we use a probit model. The regressions in columns 5 to 8 use a sample of firms in the years in which they first appear as a member of a chaebol. In column 7 we eliminate all cases in which we can determine that a firm was owned by a chaebol in the year prior to its inclusion in a chaebol as a new firm. In column 8 we use the dummy variable Pyramid (which is defined in Table 4) as a dependent variable. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ln assets is the logarithm of the book value of assets. Position is a measure of the distance of a firm relative to the controlling family in the group structure. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. The coefficients on Firm age are multiplied by 1,000. We computed robust standard errors clustered at the firm level. T-statistics are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.

Column                            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Dependent variable                Position    Position    Position    Position    Position    Position    Position    Pyramid
                                  increase    increase    decrease    decrease

Stand-alone profitability (t-1)   -0.380*     -0.468*     0.317       0.596       -1.709***   -1.591***   -1.717***   -4.279***
                                  (-1.645)    (-1.915)    (0.989)     (1.565)     (-2.904)    (-2.619)    (-2.722)    (-2.798)
Firm age                          -3.967      -3.677      -2.246      1.511       19.044**    22.311**    17.049*     13.673
                                  (-1.142)    (-0.887)    (-0.615)    (0.386)     (2.105)     (2.358)     (1.739)     (0.770)
Ln assets                         0.053*      0.020       0.020       -0.028      -0.002      -0.053      -0.026      0.277**
                                  (1.814)     (0.611)     (0.719)     (-0.857)    (-0.0234)   (-0.794)    (-0.307)    (2.078)
Public                            0.019       0.035       -0.003      -0.074      -0.921***   -0.988***   -0.801***   -1.264*
                                  (0.158)     (0.280)     (-0.028)    (-0.647)    (-3.795)    (-3.859)    (-3.010)    (-1.898)
Leverage                          -0.121      -0.246      0.087       0.107       -0.411      0.137       -0.483      -1.058*
                                  (-1.030)    (-1.624)    (0.851)     (0.978)     (-1.317)    (0.400)     (-1.563)    (-1.767)
Constant                          -1.094**    -0.260      -1.433***   1.045       3.848***    2.857*      5.665***    -3.666
                                  (-2.133)    (-0.266)    (-2.823)    (0.955)     (2.847)     (1.896)     (4.031)     (-1.556)

Industry fixed effects            Yes         Yes         Yes         Yes         Yes         Yes         Yes         No
Year fixed effects                Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Group fixed effects               No          Yes         No          Yes         No          Yes         No          No
Firms in sample                   All         All         All         All         Additions   Additions   Additions   Additions
                                                                                                          not owned
                                                                                                          by chaebol
Observations                      1849        1641        1820        1786        143         143         127         113
R2                                NA          NA          NA          NA          0.531       0.750       0.567       NA
related to the firm’s position in the group in a way that is consistent with the selection hypothesis. In order to do this, we estimate the following empirical model, which relates the firm’s position in the group to stock-based proxies for pledgeability (denoted by stock variable):

$$\text{Position}_{i,t} = \delta_1\,\text{Stock variable}_{i,t-1} + \Phi\,\text{Controls}_{it} + \sum_j \text{industry}_j + \sum_t \text{year}_t + \varepsilon_{i,t}. \tag{16}$$

We use three alternative proxies for the stock of pledgeable assets, namely tangibility, collateral, and intangibles. Firms with low tangibility and collateral and high intangibles (which are all measured relative to the firm’s stand-alone assets) should be placed in pyramids. As in Section 6.3.2, we use the sample of new firms for these tests, so that all stock variables are measured prior to the firm’s inclusion in the chaebol to mitigate endogeneity concerns. As in Table 5, we also measure the new firm’s size and leverage in the year prior to chaebol inclusion. In addition, since profitability could be correlated with these stock variables, we include pre-chaebol profitability in the regressions. We use both position and also the pyramid dummy in the regressions. The results are presented in Table 6. Consistent with the selection hypothesis, there is a negative correlation between pre-chaebol tangibility and collateral and a new firm’s position in the chaebol, and a positive correlation between intangibles and position. Nevertheless, with the exception of collateral in column 4, which is significant at a 10% level, the coefficients are statistically insignificant. Thus, while the empirical evidence is suggestive of a selection of firms with low asset pledgeability into pyramids, the evidence is inconclusive.

6.5. Testing predictions of the tunneling argument
While the previous results support the selection hypothesis, they do not directly refute the tunneling argument. In order to provide a more direct test of the tunneling hypothesis, we would need exogenous variation in a firm’s position to test whether changes in position have a causal effect on profitability. In the absence of such variation, we experiment with an alternative strategy, which involves testing auxiliary predictions of the tunneling hypothesis. First, in the sample of firms added to the group, we examine the change in profitability around group inclusion. While the selection argument makes no specific prediction about this change, the tunneling story predicts that profitability should go down if the firm is placed in a pyramid, rather than in a direct ownership structure. In addition, because tunneling might increase when the firm is placed in a group (irrespective of its position), we
Table 6 Position and stock measures of pledgeability. This table contains the tests described in Section 6.4. Tangibility is property, plant, and equipment divided by stand-alone assets. Intangibles is the ratio of intangible assets to stand-alone assets. Collateral is the ratio of the sum of property, plant, and equipment and inventories to stand-alone assets. The dummy variable Pyramid is defined in Table 4. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ln assets is the logarithm of the book value of assets. Position is a measure of the distance of a firm relative to the controlling family in the group structure. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. The coefficients on Firm age are multiplied by 1,000. We computed robust standard errors clustered at the firm level. T-statistics are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.

Column                            (1)         (2)         (3)         (4)         (5)         (6)
Dependent variable                Position    Pyramid     Position    Pyramid     Position    Pyramid

Stand-alone profitability (t-1)   -1.727***   -4.188***   -1.904***   -2.436      -1.699***   -4.015**
                                  (-2.785)    (-2.673)    (-2.626)    (-1.562)    (-2.709)    (-2.439)
Tangibility (t-1)                 -0.186      -0.518
                                  (-0.578)    (-0.928)
Collateral (t-1)                                          -0.341      -1.798*
                                                          (-0.663)    (-1.756)
Intangibles (t-1)                                                                 0.975       0.931
                                                                                  (1.500)     (0.670)
Firm age                          20.209**    16.218      14.936      9.983       20.199*     38.636*
                                  (2.192)     (0.881)     (1.645)     (0.517)     (1.860)     (1.653)
Ln assets                         0.009       0.312**     -0.024      0.504***    -0.012      0.269*
                                  (0.119)     (2.483)     (-0.263)    (2.763)     (-0.172)    (1.669)
Public                            -0.918***   -1.327**    -0.772***   -1.900**    -0.951***   -2.104**
                                  (-3.792)    (-2.087)    (-2.751)    (-2.492)    (-3.126)    (-2.518)
Leverage                          -0.379      -0.970      -0.370      -1.480**    0.125       0.002
                                  (-1.216)    (-1.544)    (-0.871)    (-2.063)    (0.298)     (0.00165)
Constant                          5.081***    -5.344**    2.319       -7.496**    3.522**     -6.155**
                                  (4.050)     (-2.411)    (1.316)     (-2.458)    (2.420)     (-2.173)

Industry fixed effects            Yes         No          Yes         No          Yes         No
Year fixed effects                Yes         Yes         Yes         Yes         Yes         Yes
Group fixed effects               No          No          No          No          No          No
Observations                      141         111         110         71          105         84
R2                                0.538       NA          0.558       NA          0.495       NA
examine whether profitability decreases, on average, from the year prior to the year following the firm’s addition to the chaebol. Second, we examine dividends paid by group firms. The tunneling story predicts that subsidiaries at the bottom of the group are likely to pay lower dividends than firms at the top of the group, as cash flows are diverted to other group firms. We test this implication by examining the relationship between dividends and the firm’s position in the chaebol, for the entire sample of chaebol firms. In addition, we use the smaller sample of newly added chaebol firms and examine the change in dividends from the year prior to the year following the firm’s addition to the chaebol. As with the profitability tests, we look at both the average change in dividends, and also whether this change is related to the firm’s position in the chaebol in a way that might suggest tunneling (e.g., greater decrease in dividends for firms that are placed in pyramids). Third, we look at whether financial statements are manipulated to hide potential tunneling activities. In particular, one might expect to find a large difference between earnings and cash flows due to accruals in firms that are placed in pyramids, as the family attempts to hide the impact of tunneling activities on pyramidal firms’ cash flows. To see whether this holds in our data, we implement the same tests that we do for dividends. In other words, we
look at the relationship between accruals and position in the full sample of chaebol firms, and also examine the pre- to post-chaebol change in accruals, and whether this change is associated with the firm’s initial position in the chaebol. The results are reported in Table 7. The first two columns examine the relationship between dividends, accruals, and position in the full sample of firms. In these regressions, we use the same controls as those used above (firm age, size, public status, and leverage). In addition, in the dividend regressions we include profitability among the controls, since profitable firms are likely to be able to pay higher dividends. We note, however, that the results that we report do not change if we exclude profitability from the control set. Column 1 shows that there is no significant relationship between a firm’s position in the group and the amount of dividends that it pays out to shareholders. The accrual regression (column 2) also shows a lack of correlation between a firm’s position in the chaebol and the magnitude of accruals.36 Columns 3 to 8 examine the pre- to post-chaebol changes in dividends, accruals, and profitability for the
36 We note that a similar result holds without controls, and after including group dummies.
Table 7 Testing predictions of the tunneling argument. This table contains the tests described in Section 6.5, which test some predictions of the tunneling hypothesis. Position is a measure of the distance of a firm relative to the controlling family in the group structure. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ln assets is the logarithm of the book value of assets. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. Accruals are defined as the absolute value of the difference between net income divided by stand-alone assets, and cash flows divided by stand-alone assets. The changes in stand-alone profits, dividends, and accruals are measured from the year prior to the firm being added to the chaebol, to the year following the firm’s addition. The coefficients on Firm age are multiplied by 1,000. In columns (1) and (2) we computed robust standard errors clustered at the firm level. In columns (3) to (8) we computed robust standard errors. T-statistics are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.

Column                            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Dependent variable                Dividends/  Accruals    Change in   Change in   Change in   Change in   Change in   Change in
                                  stand-alone             stand-alone stand-alone dividends   dividends   accruals    accruals
                                  assets                  profits     profits

Constant                          -0.001      0.366***    -0.001      0.379       0.003**     -0.037      0.016       0.377
                                  (-0.099)    (3.560)     (-0.0247)   (1.033)     (2.456)     (-1.446)    (0.380)     (0.987)
Firm age                          -0.077**    -0.054                  1.131                   -0.153                  -4.054
                                  (-2.376)    (-0.222)                (0.357)                 (-1.272)                (-1.224)
Ln assets                         0.000       -0.011***               -0.002                  0.002                   -0.001
                                  (0.176)     (-4.275)                (-0.171)                (1.331)                 (-0.102)
Public                            0.003**     -0.002                  0.035                   0.002                   0.015
                                  (2.410)     (-0.270)                (0.717)                 (0.728)                 (0.265)
Leverage                          -0.003**    0.079***                0.065                   -0.002                  -0.073
                                  (-2.427)    (4.573)                 (0.778)                 (-0.560)                (-0.804)
Stand-alone profitability         0.035***
                                  (7.725)
Position                          -0.000      0.001                   0.038                   0.002                   0.006
                                  (-0.621)    (0.251)                 (1.572)                 (1.058)                 (0.251)

Industry fixed effects            Yes         Yes         No          Yes         No          Yes         No          Yes
Year fixed effects                Yes         Yes         No          Yes         No          Yes         No          Yes
Observations                      2643        2629        137         118         138         119         137         119
R2                                0.194       0.108       0.000       0.235       0.000       0.397       0.000       0.372

37 As in Table 5, the firm’s size and leverage are measured in the year prior to the firm’s inclusion (year t-1), while the other variables are measured in year t.

sample of firms that are added to chaebols during our sample period. If the year of a firm’s addition to the chaebol is year t, these changes are calculated from year t-1 (that is, the year prior to the firm’s addition) to year t+1 (the year following the firm’s addition). In these regressions, the constant term captures the average change in these variables, from year t-1 to year t+1. We experiment with a specification with no controls, and also with a specification in which we include the same controls used in Tables 5 and 6, plus the firm’s initial position in the group.37 Thus, we can also examine whether the changes in dividends, stand-alone profitability, or accruals are related to the firm’s position in the group in a way that is consistent with the tunneling hypothesis. The results do not support the tunneling hypothesis. The coefficient on the constant term is significant only in the dividend regressions with no controls, and even in that case, the sign of the coefficient suggests that dividends increase, on average, after a firm is added to a chaebol, which is opposite to the predictions of the tunneling hypothesis. The coefficients on position are also not significant and typically
have the opposite sign to the predictions of the tunneling hypothesis. For example, changes in stand-alone profitability are positively related to the firm’s position in the group.
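The variable construction behind these tests can be sketched as follows. This is a hedged illustration rather than the authors’ code: the function names `accruals` and `pre_post_change`, the data layout, and all numbers are hypothetical. The accruals formula follows the definition in the notes to Table 7, and the last step relies only on the standard fact that the least-squares constant in a regression with no regressors equals the sample mean.

```python
from statistics import mean


def accruals(net_income, cash_flow, assets):
    """Absolute gap between earnings and cash flows, each scaled by stand-alone assets."""
    return abs(net_income / assets - cash_flow / assets)


def pre_post_change(values_by_year, t):
    """Change in a variable from year t-1 to year t+1, where t is the year of addition."""
    return values_by_year[t + 1] - values_by_year[t - 1]


# Hypothetical firms: (year of addition, {year: dividends / stand-alone assets}).
firms = [
    (2000, {1999: 0.010, 2001: 0.016}),
    (2001, {2000: 0.000, 2002: 0.002}),
    (2002, {2001: 0.020, 2003: 0.019}),
]
changes = [pre_post_change(series, t) for t, series in firms]

# With no regressors, the least-squares constant is the sample mean of the changes.
average_change = mean(changes)
```

In the specifications without controls, the reported constant can therefore be read directly as the average pre- to post-addition change.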
6.6. Discussion: selection versus tunneling

Taken together, the results presented in Tables 4–7 are largely consistent with the selection hypothesis, and do not support the tunneling hypothesis. There is strong evidence that the family selects low profitability firms into pyramids. There is some suggestive evidence that low asset pledgeability firms are also selected into pyramids. While the tunneling hypothesis provides an alternative explanation for the profitability results (namely that pyramiding reduces profitability), we present evidence consistent with the direction of causality suggested by the selection hypothesis (low profitability causes pyramiding). In addition, there is no evidence that a new chaebol firm’s relative performance decreases after it is added to a pyramid. There is also no evidence that dividends and accruals are related to a firm’s position in the group in a way that would suggest the existence of tunneling. Next, we provide additional evidence for the selection hypothesis by testing an implication that is unique to that hypothesis (Implication 2).
Table 8 Acquisition premium and position in the Chaebol. This table contains the tests described in Section 6.7, which relate a firm’s position in the group to the acquisition premium and firm characteristics. Position is a measure of the distance of a firm relative to the controlling family in the group structure. Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Capital expenditure is capital expenditures/stand-alone assets. Ln assets is the logarithm of the book value of assets. Public is a variable that takes the value of one if the firm is publicly traded. Leverage is defined as non-current liabilities divided by stand-alone assets. The Acquisition premium is the ratio between the acquisition price and the book value of the equity of firms that were added to a chaebol during the sample period of 1998–2004. The coefficients on Firm age are multiplied by 1,000. We computed robust standard errors. T-statistics are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%. Dependent variable: Position

Column                            (1)         (2)         (3)         (4)         (5)

Acquisition premium               0.142**     0.104**     0.115**     0.108**     0.100*
                                  (2.560)     (2.091)     (2.317)     (1.998)     (1.680)
Stand-alone profitability (t-1)               -1.033**    -0.981**    -0.924*     -1.330**
                                              (-2.180)    (-2.097)    (-1.921)    (-2.509)
Firm age                                      25.758***   27.320***   33.760***   37.106***
                                              (2.904)     (3.103)     (3.781)     (3.818)
Ln assets                                     -0.004      0.030       -0.004      -0.011
                                              (-0.0706)   (0.560)     (-0.0628)   (-0.156)
Public                                        -1.085***   -1.142***   -1.223***   -1.233***
                                              (-3.531)    (-3.753)    (-4.089)    (-3.817)
Leverage                                      0.170       -0.061      0.188       0.461
                                              (0.440)     (-0.149)    (0.388)     (0.808)
Capital expenditure                                       0.683       0.924*      1.427**
                                                          (1.476)     (1.884)     (2.203)
Constant                          2.477***    2.529***    1.889*      1.988       2.379
                                  (20.59)     (2.757)     (1.963)     (1.564)     (1.584)

Industry fixed effects            No          No          No          No          Yes
Year fixed effects                No          Yes         Yes         Yes         Yes
Group fixed effects               No          No          No          Yes         Yes
Observations                      108         107         106         106         106
R2                                0.058       0.409       0.432       0.652       0.782
6.7. Position and the acquisition premium

Implication 2 predicts that the family will select low NPV firms into pyramids. We test this implication by examining whether the position of a new firm in the chaebol is correlated with the acquisition premium that the family pays to acquire this new firm. As discussed above, the basic idea is that if the chaebol pays more for a new firm, then the acquisition’s NPV will decrease. To implement this test, we use the acquisition data described in Section 6.3.2. In order to relate a new firm’s position to the acquisition premium, we estimate an empirical model similar to that in Eq. (15), using the acquisition premium as an additional explanatory variable:

$$\text{Position}_{i,t} = \beta_1\,\text{Acquisition premium}_{i,t} + \Phi\,\text{Controls}_{it} + \sum_j \text{industry}_j + \sum_t \text{year}_t + \varepsilon_{i,t}. \tag{17}$$
The set of controls is similar to those included in Tables 5 and 6. Given the size of the sample, we use different sets of controls and dummy variables in each regression. Table 8 shows that the family does appear to place high premium (or lower NPV) firms into pyramids. Specifically, a new firm’s position in the chaebol is positively correlated with the acquisition premium. In column 1, we show that this result holds in a specification with no
controls.38 In column 2, we introduce the same controls as in Tables 5 and 6, including pre-chaebol profitability and year dummies, but no industry or group dummies. Notice, in particular, that pre-chaebol profitability remains negatively related to a firm’s position, with a coefficient similar to that in Table 5. This result suggests that the negative effect of profitability on position reported in Table 5 is not driven by the omission of the acquisition premium from those regressions. In column 3, we introduce capital expenditures over assets to control for the new firm’s growth opportunities, given that the acquisition premium can also be related to growth opportunities. The coefficient on acquisition premium is largely unaffected. In column 4, we add group dummies to the basic regression, and in column 5, we add both group and industry dummies. In particular, the industry dummies help control for the effect of unobservable growth opportunities on the position of the new firm in the group. These results show that the coefficient on the premium variable is largely unaffected by the inclusion of dummies. Not surprisingly, statistical significance decreases as we include both industry and group dummies. Still, the coefficient on the premium variable remains significant at
38 To be consistent with the later regressions, we require the sample firms to have available data on pre-chaebol profitability to include them in this regression. This results in a sample of 108 firms.
the 10% level in column (5). These results support the selection hypothesis that lower NPV firms are placed in pyramids (Implication 2).

6.8. Valuation and centrality

We now examine whether central firms trade at a discount relative to non-central firms in the group. According to Implication 3, this valuation discount is due to minority shareholders’ anticipation of future pyramidal acquisitions by central firms. This explanation for the discount suggests that it should be larger for central firms that do more acquisitions. We first provide evidence on the relative valuation of central firms, and then we confirm that the central firm valuation discount seems to be associated with acquisition activity. In order to examine the valuation of central firms, we run the following regression:

$$Q_{i,t} = \gamma_1\,\text{centrality}_{i,t} + \lambda\,\text{Controls}_{it} + \sum_j \text{industry}_j + \sum_t \text{year}_t + \varepsilon_{i,t}, \tag{18}$$

where the controls include firm size (measured by the market value of total assets), age and public status, leverage, capital expenditures (to control for growth opportunities), and stand-alone profitability (to control for current profitability). The previous literature reports some evidence that firms in which the family retains low ownership but high voting rights trade at discounts. Thus, we also control for measures of ownership concentration and separation between ownership and control. To measure centrality, we use the benchmark measure (Eq. (8)) and, in (unreported) robustness checks, we also use the firm’s aggregate equity stake in other firms normalized by its assets (Eq. (9)). We include a variable that measures whether a firm belongs to a cross-shareholding loop, because, as explained in Section 5.1.1, central firms also tend to be part of such loops. We control for industry and year fixed effects, and also for group fixed effects in some specifications. Standard errors are clustered at the firm level. Implication 3 suggests that the coefficient $\gamma_1$ should be negative.

Table 9 presents the results, which indicate that centrality is negatively related to firm valuation. The other variables have the expected signs. Larger and younger firms have higher Q, as do firms with high growth opportunities, proxied by their capital expenditures. There is also some indication that firms in cross-shareholding loops trade at a discount, although this effect is not significant statistically. These results are robust to controlling for ultimate ownership and the different measures of separation between ownership and control (columns 1, 2, and 3). Interestingly, only the measure of separation based on the critical control threshold is significant in these regressions, with the standard negative sign that other papers in the literature have shown. In columns 4 to 6, we include group dummies in the regressions.
These regressions show that the correlation between centrality and valuation also holds within groups, suggesting that, in each group, central firms carry lower valuations than other group firms. The magnitude of central firms’ valuation discount also appears to be significant. The distribution of the centrality variable is highly skewed (see Table 1), with 75% of the firms having a zero value for centrality, while a few firms (5% of the sample) have centrality values greater than 10%. If we look at these extremes, the coefficients in Table 9 (which range approximately from 0.4 to 0.6 in absolute value) imply that a firm with a centrality value equal to 10% would have a Q that is 4.5% to 6.5% lower than a firm with zero centrality.39

As shown in Fig. 2, there are three basic types of firms in a chaebol, namely central firms, pyramidal firms, and noncentral firms owned directly. Thus, an interesting question is what is the relevant comparison group for central firms in the regressions discussed above. In other words, are central firms valued at a discount relative to directly owned, pyramidal, or both types of firms? The regressions in columns 1 to 6 effectively compare public central firms to all other public group firms, controlling for variation in ownership concentration. In order to provide evidence on this point, we construct dummy variables that capture specific comparison groups more directly. As above (in Table 4), we define a central firm as a firm that has a value of centrality greater than the mean value. Among public firms, the average value of centrality is 0.05, so we create a dummy variable (called Central) that takes the value of one if centrality is greater than 0.05, and zero otherwise. We can then use this dummy variable to make specific valuation comparisons by selecting the group of non-central firms for which Central takes the value of zero (the comparison group). Specifically, we create a dummy variable Central vs. Direct that takes the value of zero only for non-central firms which are owned directly by the family (as in Table 4, those with position lower than 1.5). And we create a second dummy variable (Central vs. Pyramid) which takes the value of zero only for non-central firms that are owned through pyramids (as in Table 4, those with position greater than 2.0).
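The construction of the three comparison dummies can be sketched as follows. This is only an illustration under stated assumptions: the function name `comparison_dummies` and the use of `None` to mark a firm that falls outside a given comparison sample are hypothetical choices, while the cutoffs (0.05 for centrality, 1.5 and 2.0 for position) are the ones quoted in the text.

```python
# Illustrative sketch only; names and the None convention are assumptions.
CENTRALITY_CUTOFF = 0.05   # mean centrality among public firms (from the text)
DIRECT_MAX_POSITION = 1.5  # "owned directly": position lower than 1.5
PYRAMID_MIN_POSITION = 2.0 # "owned through pyramids": position greater than 2.0


def comparison_dummies(centrality, position):
    """Return the Central, Central vs. Direct and Central vs. Pyramid dummies.

    A value of None means the firm belongs to neither the central group nor
    the relevant comparison group, so it would be left out of that regression.
    """
    if centrality > CENTRALITY_CUTOFF:
        # Central firms take the value one in all three dummies.
        return {"central": 1, "central_vs_direct": 1, "central_vs_pyramid": 1}
    return {
        "central": 0,
        "central_vs_direct": 0 if position < DIRECT_MAX_POSITION else None,
        "central_vs_pyramid": 0 if position > PYRAMID_MIN_POSITION else None,
    }


if __name__ == "__main__":
    print(comparison_dummies(0.10, 1.0))  # a central firm
    print(comparison_dummies(0.00, 1.0))  # non-central, directly owned
    print(comparison_dummies(0.00, 2.5))  # non-central, owned through a pyramid
```

Under this convention, a non-central firm with a position between 1.5 and 2.0 enters only the regression that uses Central.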
We then replace centrality with these dummy variables in our valuation regressions. We drop from the regression in which we use Central vs. Direct (Central vs. Pyramid) noncentral firms that are owned through a pyramid (directly). The results are shown in columns 7 to 9. In column 7, we use the dummy variable Central to compare the valuation of central firms to all other public group firms, as in columns 1 to 6. The purpose of this regression is to verify that the results continue to hold when we use a dummy variable for centrality instead of the continuous version, and to provide a better benchmark for the next valuation regressions. The result is shown in column 7, and it is consistent with the previous results. The coefficient on Central indicates that a central firm has a Q that is approximately 9% lower than an average non-central group firm. The next columns introduce the specific comparisons to directly owned and pyramidal firms. Column 8 shows that the central firm valuation discount relative to firms owned through pyramids is approximately 16%. And column 9 shows that the discount measured relative to directly owned firms is very similar to the average discount, 9%. Thus, central firms are valued at a discount relative to all other types of group
39 This calculation assumes that other variables are evaluated at their unconditional averages, that is, the discount is 4.5% to 6.5% of average Q (which is 0.9 in our data according to Table 2).
H. Almeida et al. / Journal of Financial Economics 99 (2011) 447–475
Table 9 Valuation and centrality. This table contains the tests described in Section 6.8, which relate a firm’s valuation to firm characteristics (Eq. (18)). The dependent variable is Tobin’s Q, as defined in Eq. (11). Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ultimate ownership is a measure of the family’s cash flow rights. Cross-shareholdings takes a value of one if the firm belongs to a cross-shareholding loop. Size is the log of the market value of equity. Centrality is the average drop in voting rights when a firm’s votes are not taken into account to compute the critical control threshold for the other group firms. Separation CC and Separation VR are defined, respectively, as CC (critical control threshold) minus ultimate ownership, and VR (consistent voting rights) minus ultimate ownership. Leverage is defined as non-current liabilities divided by stand-alone assets. Capital Expenditure is capital expenditures/stand-alone assets. The coefficients on Firm age are multiplied by 1,000. The dummy variable Central takes the value of one if a firm is classified as being a central firm, among the group’s public firms, and zero otherwise. A central firm is defined as a firm with a value of centrality greater than the mean value of centrality among public firms, which is 0.05. The dummy variable Central vs. Pyramid takes the value of one if a public group firm is classified as being a central firm, and zero if a public group firm is not central, and is owned through a pyramid (position greater than 2.0). The dummy variable Central vs. Direct takes the value of one if a public group firm is classified as being a central firm, and zero if a public group firm is not central, and is owned directly (position lower than 1.5). We computed robust standard errors clustered at the firm level. T-statistics are in parenthesis. 
* significant at 10%; ** significant at 5%; *** significant at 1%.

Dependent variable: Tobin’s Q (t-statistics in parentheses)

Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9)
Centrality | −0.465*** (−3.176) | −0.560*** (−3.774) | −0.511*** (−3.402) | −0.386** (−2.237) | −0.396** (−2.435) | −0.405** (−2.377) | | |
Central | | | | | | | −0.086*** (−2.910) | |
Central vs. Pyramid | | | | | | | | −0.159*** (−3.024) |
Central vs. Direct | | | | | | | | | −0.086*** (−2.899)
Cross-shareholdings | −0.052 (−1.597) | −0.054 (−1.630) | −0.052 (−1.591) | −0.046 (−1.107) | −0.044 (−1.068) | −0.045 (−1.104) | −0.046 (−1.442) | −0.012 (−0.300) | −0.036 (−1.077)
Firm age | −4.265*** (−3.694) | −4.258*** (−3.623) | −4.265*** (−3.672) | −4.596*** (−3.393) | −4.521*** (−3.246) | −4.558*** (−3.344) | −4.299*** (−3.663) | −3.375*** (−2.760) | −3.896** (−2.487)
Size | 0.083*** (5.841) | 0.087*** (6.398) | 0.086*** (6.380) | 0.089*** (5.346) | 0.090*** (5.369) | 0.090*** (5.354) | 0.083*** (5.736) | 0.111*** (5.586) | 0.064*** (3.960)
Stand-alone profitability | 0.259 (1.149) | 0.253 (1.136) | 0.252 (1.138) | 0.342 (1.644) | 0.340 (1.648) | 0.342 (1.649) | 0.282 (1.242) | 0.548** (2.220) | −0.176 (−0.536)
Capital expenditure | 0.418* (1.891) | 0.402* (1.844) | 0.393* (1.806) | 0.330 (1.570) | 0.328 (1.565) | 0.329 (1.571) | 0.426* (1.910) | 0.732** (2.135) | 0.278 (1.374)
Leverage | 0.059 (0.463) | 0.068 (0.528) | 0.068 (0.534) | 0.002 (0.017) | 0.001 (0.008) | 0.002 (0.020) | 0.053 (0.409) | −0.079 (−0.648) | 0.179 (1.224)
Ultimate ownership | −0.114 (−1.231) | | | −0.061 (−0.440) | | | −0.121 (−1.283) | −0.051 (−0.390) | −0.194** (−1.988)
Separation VR | | 0.030 (0.336) | | | −0.069 (−0.819) | | | |
Separation CC | | | −0.215* (−1.908) | | | 0.017 (0.138) | | |
Constant | −1.429*** (−3.780) | −1.103*** (−3.002) | −1.101*** (−3.002) | −1.573*** (−3.538) | −1.597*** (−3.545) | −1.581*** (−3.531) | −1.436*** (−3.701) | −2.126*** (−3.927) | −0.907** (−2.141)
Industry fixed effects | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Year fixed effects | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Group fixed effects | No | No | No | Yes | Yes | Yes | No | No | No
Observations | 807 | 807 | 807 | 807 | 807 | 807 | 807 | 482 | 422
R2 | 0.426 | 0.425 | 0.427 | 0.527 | 0.527 | 0.527 | 0.425 | 0.577 | 0.503
firms. This result indicates that the defining firm characteristic that generates the discount is the fact that a firm holds significant equity in other firms, and not the firm’s position in the group (consistent with Implication 3). We also conduct some robustness checks on our definition of centrality and valuation measures. Given the difficulties in measuring control, which is a crucial component of our centrality measure, we also use the stake variable to measure centrality (see Eq. (9)). Given that the definition of stake is independent of our measure of control, these tests help alleviate concerns that the results are driven by the particular control measure we used. The results of our valuation regressions suggest that this is not the case—stake is also negatively and significantly related to firm valuation. In addition, we test whether our results are sensitive to our
definition of Q. Given the difficulty of adjusting Q for the values of equity stakes (see Section 5.2), we used unadjusted Q in our benchmark regressions. However, our robustness checks suggest that the results remain if we use Qsa (the implied stand-alone market-to-book ratio of chaebol firms) to value chaebol firms. In particular, centrality is negatively and statistically significantly related to Qsa.40

6.8.1. Central firm valuation discount and acquisition activity

The fact that central firms trade at a discount is consistent with Implication 3, which suggests that the central firm discount is due to the anticipation of
40 These results are available upon request.
acquisition activity by these firms. In this section, we provide direct evidence that the discount is, in fact, related to acquisition activity by the group’s central firms. While we cannot directly measure shareholders’ expectations of future acquisitions by central firms and other group firms, we can examine whether central firms did more acquisitions than other group firms during our sample period. Under the hypothesis that shareholders look at past acquisitions to predict future ones, a positive correlation between centrality and past acquisition activity can help validate Implication 3. In addition, under the logic suggested by Implication 3, central firms that have shown greater acquisition activity should trade at a larger discount than other central firms that have not been used as acquisition vehicles. We test both these hypotheses in this section. To test these hypotheses, we construct a measure of a group firm’s acquisition intensity during our sample period, by using the sample of acquisitions of new chaebol firms. We define a firm’s acquisition intensity as the sum of the value of equity stakes acquired by each group firm in the event of an acquisition of a new group firm, divided by the book value of the equity of the acquirer. This proxy captures the size of a firm’s acquisitions, relative to its total equity. As shown already in Table 2, for the vast majority of firm-years this variable is equal or close to zero. This suggests that only a few group firms are used as acquisition vehicles by the family. Columns 1 to 3 of Table 10 provide evidence on which group firms are typically used as acquisition devices, by regressing acquisition intensity on ownership variables and other firm characteristics.41 The results show that central group firms are, in fact, the ones that are most likely to be used as acquisition devices. Not surprisingly, older and larger firms are also more likely to acquire stakes in newly added group firms. 
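The acquisition intensity proxy defined in this section can be written as a short calculation. The sketch below is illustrative only: the function name `acquisition_intensity` and the example figures are hypothetical and are not taken from the chaebol data.

```python
# Illustrative sketch only; the function name and sample figures are assumptions.
def acquisition_intensity(stake_values_acquired, acquirer_book_equity):
    """Sum of the values of equity stakes acquired in new group firms,
    divided by the book value of the acquirer's equity."""
    if acquirer_book_equity <= 0:
        raise ValueError("book value of equity must be positive")
    return sum(stake_values_acquired) / acquirer_book_equity


if __name__ == "__main__":
    # Hypothetical firm: two stakes worth 3 and 1, book equity of 100.
    print(acquisition_intensity([3.0, 1.0], 100.0))
    # A firm that acquires no stakes has an intensity of zero.
    print(acquisition_intensity([], 100.0))
```

A firm-year with no acquisitions returns zero, which is the case for the vast majority of observations according to Table 2.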
The family’s ultimate ownership in a group firm is not significantly related to acquisition intensity. These results hold both before and after controlling for industry and group dummies, and help support Implication 3. In columns 4 to 6 we take a further step and ask whether central firms that show large acquisition activity trade at a greater discount than other central firms. Given that the distribution of acquisition intensity is significantly bimodal (with many values at zero), we construct a dummy variable to increase the power of the test. The dummy variable acquirer takes the value of one if a firm’s acquisitions comprise a significant amount of its equity. Specifically, we let acquirer take the value of one if acquisition intensity is greater than 0.02, which is one-standard-deviation above the median value of zero for this variable (see Table 2), and zero otherwise. We then interact this dummy variable with the dummy variable central that characterizes central firms (defined above). We use the same controls as those used in Table 9, including year, industry, and group dummies. The results provide evidence that central firms that are active acquirers trade at larger discounts than other central firms. The coefficient on the interaction variable
41 In these regressions, we lag the centrality variable to make sure that the correlation between acquisition intensity and centrality is not mechanical.
centralXacquirer is negative and statistically significant after controlling for firm characteristics, industry, and group dummies. To interpret the economic magnitude of the results, notice that we have to take into account the effect of the three dummies together (central, acquirer, and centralXacquirer). For example, take the coefficients in column 6 which control for industry and group dummies. A central firm that is not a heavy acquirer (acquirer = centralXacquirer = 0) trades at a discount of approximately 5.6%. After parsing out the correlation between acquirer and Q,42 we obtain that a central firm that is also a heavy acquirer (acquirer = centralXacquirer = 1) trades at a discount of 9.2%, which is significantly larger than the effect of centrality alone. This result supports Implication 3.

6.8.2. Discussion: central firms’ valuation discount

In addition to the systematic evidence presented above, there are many examples of low valuation of central chaebol firms. A well-known case is that of SK Corporation, the most central firm in the SK group. In December 2003, the market capitalization of SK Corporation (the largest oil refinery in Korea) was approximately 2.9 billion dollars. Besides several stakes in private group firms, SK Corporation had a stake of 20% in SK Telecom (the largest mobile telecom company in Korea), which was worth 13.6 billion dollars, and a 39% stake in SK Networks, which was worth 4.3 billion dollars. The value of these equity stakes alone (i.e., assuming a zero value for the stakes in private firms) was 4.4 billion dollars. Thus, the implied equity value of SK Corporation’s stand-alone assets was −1.5 billion dollars. One possible explanation for SK Corporation’s negative equity value is that the firm had a large amount of liabilities (book value of liabilities equal to 8.1 billion dollars).
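The arithmetic behind the SK Corporation figures can be checked with a few lines. This is a sketch that uses only the numbers quoted above (in billions of dollars); the variable and function names are illustrative assumptions, and stakes in private firms are valued at zero, as in the text.

```python
# Illustrative check of the SK Corporation example; names are assumptions.
MARKET_CAP = 2.9  # SK Corporation market capitalization, December 2003

# (ownership share, market value of the firm in which the stake is held)
PUBLIC_STAKES = [
    (0.20, 13.6),  # SK Telecom
    (0.39, 4.3),   # SK Networks
]


def value_of_stakes(stakes):
    """Market value of the equity stakes held in public group firms."""
    return sum(share * firm_value for share, firm_value in stakes)


def implied_stand_alone_equity(market_cap, stakes):
    """Parent's market capitalization net of the value of its public stakes."""
    return market_cap - value_of_stakes(stakes)


if __name__ == "__main__":
    print(round(value_of_stakes(PUBLIC_STAKES), 1))                         # 4.4
    print(round(implied_stand_alone_equity(MARKET_CAP, PUBLIC_STAKES), 1))  # -1.5
```

The negative result reflects the fact that the two public stakes alone are worth more than the parent's entire market capitalization.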
If we add the entire amount of the book liabilities to SK Corporation’s stand-alone equity value, we obtain a stand-alone market value of 6.6 billion dollars for SK Corporation. Under these assumptions, the implied standalone Q (Qsa) of SK Corporation was 0.68 in December 2003. The true Qsa was likely to be even lower, because the stakes in private firms are not worthless, and because the book value of liabilities probably overestimates the true market value of debt of SK Corporation. This relatively low valuation for SK Corporation attracted the interest of an activist investment fund that specializes in emerging market stocks (the Sovereign Fund), which amassed 15% of SK Corporation shares in the market during 2003 and started issuing takeover threats. Sovereign’s attack subsequently raised SK Corporation’s equity value. As a result, by December 2004, SK Corporation’s Qsa had increased to 0.92. The initial low valuation of SK Corporation is consistent with the argument that central firms should be discounted due to anticipated pyramiding. In addition, the increase in its market value after the Sovereign Fund amassed a large stake might be due to the market’s realization that the large blockholder would prevent some of this pyramiding.

The key characteristic of central firms is that they hold substantial equity stakes in other firms. Thus, the finding
42 This positive correlation is consistent with the empirical literature on mergers and acquisitions which suggests that high Q firms are more likely to engage in acquisitions (the Q-theory of mergers). See, for example, Jovanovic and Rousseau (2002).
Table 10 Central firm valuation discount and acquisition activity. This table contains the tests described in Section 6.8.1. In columns 1 to 3 the dependent variable is Acquisition intensity, which is described in Table 2. Centrality is the average drop in voting rights when a firm’s votes are not taken into account to compute the critical control threshold for the other group firms. The variable Centrality is lagged one period in the regressions of columns 1 to 3. In columns 4 to 6 the dependent variable is Tobin’s Q, as defined in Eq. (11). Stand-alone profitability is computed after an adjustment that takes into account the effect of equity stakes held in other chaebol firms (see Appendix C for details). Ultimate ownership is a measure of the family’s cash flow rights. Cross-shareholdings takes a value of one if the firm belongs to a cross-shareholding loop. Size is the log of the market value of equity. The dummy variable Central takes the value of one if a firm is classified as being a central firm, among the group’s public firms, and zero otherwise. A central firm is defined as a firm with a value of centrality greater than the mean value of centrality among public firms, which is 0.05. The dummy variable Acquirer takes the value of one if Acquisition intensity is greater than 0.2, and zero otherwise. The dummy variable Central × Acquirer is the product of the dummy variables Central and Acquirer. Leverage is defined as non-current liabilities divided by stand-alone assets. The coefficients on Firm age are multiplied by 1,000. We computed robust standard errors clustered at the firm level. T-statistics are in parentheses.
* significant at 10%; ** significant at 5%; *** significant at 1%.

Dependent variable: Acquisition intensity in columns (1) to (3); Tobin’s Q in columns (4) to (6)

Variable | (1) | (2) | (3) | (4) | (5) | (6)
Central | | | | −0.032 (−1.262) | −0.072*** (−2.720) | −0.056* (−1.886)
Acquirer | | | | 0.091 (1.607) | 0.089* (1.662) | 0.106** (2.065)
Central × Acquirer | | | | −0.101 (−1.218) | −0.130* (−1.695) | −0.143* (−1.958)
Centrality (t−1) | 0.046*** (3.355) | 0.029** (2.205) | 0.025* (1.790) | | |
Firm age | | 0.079* (1.835) | 0.086* (1.719) | | −4.327*** (−5.624) | −4.769*** (−5.309)
Ln assets | | 0.000 (1.406) | 0.001 (1.273) | | |
Public | | 0.002 (1.147) | 0.001 (0.982) | | |
Leverage | | 0.001 (0.809) | 0.001 (1.085) | | 0.049 (0.747) | 0.000 (0.00375)
Cross-shareholdings | | | | | −0.046** (−2.169) | −0.042* (−1.699)
Size | | | | | 0.080*** (8.925) | 0.087*** (8.094)
Stand-alone profitability | | | | | 0.301** (2.560) | 0.358*** (2.972)
Capital expenditure | | | | | 0.423*** (2.898) | 0.330** (2.301)
Ultimate ownership | | −0.000 (−0.151) | 0.002 (0.678) | | −0.132* (−1.759) | −0.064 (−0.650)
Constant | 0.003** (2.128) | −0.008 (−1.311) | −0.016* (−1.860) | 0.657** (2.358) | −0.970*** (−2.751) | −1.117*** (−3.004)
Industry fixed effects | Yes | Yes | Yes | Yes | Yes | Yes
Year fixed effects | Yes | Yes | Yes | Yes | Yes | Yes
Group fixed effects | No | No | Yes | No | No | Yes
Observations | 1937 | 1885 | 1885 | 812 | 803 | 803
R2 | 0.041 | 0.049 | 0.068 | 0.307 | 0.430 | 0.532
that central firms have low valuations bears some resemblance to the closed-end fund puzzle (see, e.g., Shleifer, 2000). Closed-end mutual funds tend to trade at substantial discounts relative to the NAV (net asset value) of the securities in their portfolios.43 In particular, some of the explanations developed to explain the closed-end fund puzzle bear some resemblance to Implication 3. It is possible, for example, that shareholders of the closed-end fund expect poor portfolio management in the future (similar to Implication 3). Nevertheless, not all arguments
43 See Rommens, Deloof, and Jegers (2008), for related evidence using data from Belgian holding companies.
regarding the closed-end fund puzzle seem equally relevant. For example, the investor sentiment story explained in Shleifer (2000) applied to the chaebol context would require individual investors (who are more subject to fluctuations in sentiment when compared to institutional investors) to be more likely to hold and trade shares of the parent company, relative to the subsidiaries. Although we do not examine this issue directly in this paper, there is no reason to expect that condition to hold in the Korean data. Cornell and Liu (2001), Mitchell, Pulvino, and Stafford (2002), and Lamont and Thaler (2003) provide evidence on another phenomenon that bears some resemblance to the central firm discount (the ‘‘parent company discount’’). For example, in the period of 1985–2000, Mitchell, Pulvino, and
Stafford (2002) identify 70 firms in which the market value of the equity stake that the parent holds in the subsidiary is higher than the market value of the parent. Lamont and Thaler (2003) show some extreme examples of potential misvaluations (such as the Palm and 3Com example), in which a commitment by the parent to spin off the shares of the subsidiary at a fixed rate at a future date creates an apparently clear ‘‘arbitrage’’ opportunity.44 The standard explanation for this phenomenon in the US is that it is due to noise traders bidding up the prices of the subsidiary stocks, and arbitrage costs that make a price correction difficult to sustain (a large fraction of the firms analyzed in these studies is in the internet sector). We believe this market inefficiency story is also not likely to explain the central firm discount in Korea. First, the Korean phenomenon seems to be more general and persistent than the internet bubble-related discounts in the US. In particular, the subsidiaries of central Korean firms are not concentrated in any particular industry. Second, we provide some evidence in Section 6.8.1 that the discount is linked to expectations about acquisition activities of central firms, thus supporting the logic behind Implication 3.

7. Final remarks

The main contribution of this paper is to shed new light on the process by which groups form. In doing this, we depart from the standard approach of assuming that ownership structure is exogenously given. We take advantage of a unique data set that allows us to observe the details of the ownership structure of Korean chaebols, and to have a small window on how chaebol structure evolves over time. We see this paper as a first step towards the understanding of the evolution of business groups. Naturally, many questions are open for future research.
First of all, it would be interesting to see if our findings about group structure are particular to Korean chaebols or if they extend to groups in other countries as well. For that purpose, we note that the metrics of ownership structure that we derive in the paper (such as the critical control threshold, position, and centrality) can be easily applied to other data. To facilitate the implementation of our measures, we provide algorithms that can be used to calculate these variables for groups of any complexity. Second, while our short time series allows us to observe a few major changes in ownership structure (such as the addition of new firms to the group), there are many questions that require a longer time series. For example, besides observing that central firms are the most established firms in the group, we have little to say about how the family chooses central firms among several candidate group firms. Given that centrality changes little over time, addressing such a question requires a much longer time series than the one we currently have. Such data might also allow us to better understand the sources of the low
44 The spin-off fixed a ratio of shares of Palm that each 3Com shareholder would receive (1.5) in one year, subject to the Securities and Exchange Commission (SEC) approval. However, 3Com traded at a price that was substantially lower than 1.5 times the price of Palm. Ross (2004) offers a rational explanation for this phenomenon.
profitability of central firms that we document in Section 6.1. Last, but not least, we have focused exclusively on understanding the family’s choice of ownership for chaebol firms, ignoring the question of why a given firm becomes a chaebol member in the first place. Clearly, understanding the selection of firms into a chaebol is an essential component of a complete theory of business group structure. In addition, while we have taken the presence of cross-shareholdings into account to compute our ownership measure, we have not attempted to understand the reasons that motivate the family to create cross-shareholding loops among chaebol firms. Both of these questions could be analyzed in future research.

Appendix A. Numerical examples of position, voting rights and centrality

We illustrate the computation of our main ownership measures using a simple example. The group is represented in Fig. A1. The family owns a 40% direct stake in firm 1 and a 10% direct stake in firm 2. In addition, firm 1 owns a 50% stake in firm 2. While this simple structure is not representative of real world chaebol structures, it can help the reader understand the logic behind the new measures.

Ultimate ownership: Ultimate ownership is easy to compute. The family’s ultimate ownership in firm 1 is 40% and in firm 2 is 10% + (40%)(50%) = 30%.

Position: Firm 1’s position is clearly equal to one as there is only one chain leading to that firm. The formula we propose leads to the same answer: $\frac{0.4}{0.4} \cdot 1 = 1$. Regarding firm 2, the family holds the direct stake of 10%, and it also retains a 20% ownership stake through firm 1. Our formula yields

$$pos_2 = \frac{0.1}{0.3} \cdot 1 + \frac{0.2}{0.3} \cdot 2 \approx 1.7. \tag{19}$$
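The ultimate ownership and position figures for this example can be reproduced with a short sketch. It enumerates ownership chains from the family to a firm, so it assumes a structure without cross-shareholding loops (the recursion would not terminate otherwise) and is not the general algorithm of the paper. The function names and the dictionary layout are illustrative assumptions.

```python
# Illustrative sketch for loop-free groups only; names are assumptions.
def chains_to(firm, direct, stakes):
    """Yield (cash-flow share, chain length) for every ownership chain that
    runs from the family to `firm`.

    direct[i]      : family's direct stake in firm i
    stakes[(j, i)] : stake that firm j holds in firm i
    """
    if direct.get(firm, 0.0) > 0:
        yield direct[firm], 1
    for (owner, target), stake in stakes.items():
        if target == firm:
            for share, length in chains_to(owner, direct, stakes):
                yield share * stake, length + 1


def ultimate_ownership(firm, direct, stakes):
    """Family's cash-flow rights in `firm`, summed over all chains."""
    return sum(share for share, _ in chains_to(firm, direct, stakes))


def position(firm, direct, stakes):
    """Chain lengths weighted by each chain's share of ultimate ownership."""
    chains = list(chains_to(firm, direct, stakes))
    total = sum(share for share, _ in chains)
    return sum(share / total * length for share, length in chains)


if __name__ == "__main__":
    direct = {1: 0.40, 2: 0.10}   # family stakes in Fig. A1
    stakes = {(1, 2): 0.50}       # firm 1 holds 50% of firm 2
    print(ultimate_ownership(2, direct, stakes))  # approximately 0.30
    print(position(1, direct, stakes))            # 1.0
    print(position(2, direct, stakes))            # approximately 1.67
```

The unrounded position of firm 2 is 5/3, which the text reports as 1.7.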
This is intuitive, since firm 2’s ownership is close to a pure pyramid (the biggest stake is held through firm 1), but it is not a pure pyramid because of the direct stake of 10%.

Voting rights: Take, for example, a control threshold equal to 30% (T = 30%). In that case, the family controls firm 1 (since it holds 40% of its votes). According to our formula, the family has 60% of the votes in firm 2 (10% directly and 50% through firm 1, which it controls). Thus, the family also controls firm 2. Clearly, the family controls both firms
Fig. A1. A simple group.
for any control threshold lower than or equal to 40%. Thus

$$C(T) = \{1, 2\} \quad \text{for any } T \le 40\%. \tag{20}$$

For T above 40%, the family no longer controls firm 1. Also, the votes it controls in firm 2 are only 10% (we no longer add the 50% since for T > 40%, the family does not control firm 1). Thus, the family does not control firm 2 either:

$$C(T) = \emptyset \quad \text{for any } T > 40\%. \tag{21}$$

It follows that the critical control threshold measures are

$$CC_1 = CC_2 = 40\%. \tag{22}$$

The VR measures for any $T \le 40\%$ are $VR_1 = 40\%$ and $VR_2 = 10\% + 50\% = 60\%$. The VR measure adds the entire stake held by firm 1 in firm 2 to the direct stake of 10%, as long as the family retains control of firm 1. If $T > 40\%$, $VR_2$ drops to 10%.

Centrality: To compute centrality measures, we compute the average critical control threshold with and without the relevant firm. Let us start with firm 2. We know that $CC_1 = 40\%$. If we eliminate firm 2 from the group and recompute $CC_1$, we would still have $CC_1 = 40\%$. This implies that eliminating firm 2 from the group does not affect the average voting rights in other group firms. Accordingly, the centrality of firm 2 is zero. In contrast, if we eliminate firm 1, the family will only control firm 2 if $T \le 10\%$. That is, $CC_2$ goes to 10%. Thus

$$central_1 = \frac{40\% - 10\%}{1} = 30\%.$$

Appendix B. Computing the set C(T)

We first provide a formal definition of the algorithm to compute C(T) and then we explain how it works.

Definition 6 (Algorithm). Let the sequence of sets $S(0) \supseteq S(1) \supseteq S(2) \supseteq \ldots$ be defined by $S(0) = N$, and $S(n+1) = \{ i \in S(n) : f_i + \sum_{j \in S(n),\, j \ne i} s_{ji} \ge T \}$.

The idea behind this algorithm is to start with all the firms, $S(0) = N$. In the first stage, we assume that the family controls all the firms and we drop the firms in which the direct and indirect stake of the family is below T. This procedure generates S(1). Next, we assume that the family controls only the firms in S(1) and again drop from S(1) the firms in which the direct and indirect stake of the family is below T. This generates S(2). We can repeat this algorithm a number $\#N$ of times to arrive at $S(\#N)$. This last set is important in light of the following proposition.

Proposition 4. $S(\#N)$ satisfies the condition in Eq. (6), which we re-write here:

$$C(T) = \Big\{ i \in N : f_i + \sum_{j \in C(T),\, j \ne i} s_{ji} \ge T \Big\}.$$
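The iteration in Definition 6, together with the critical control thresholds and centrality of the Fig. A1 example, can be sketched as follows. This is a minimal illustration under stated assumptions: stakes are expressed in whole percentage points, critical thresholds are found by scanning thresholds of 1% to 100% rather than by the exact procedure used in the paper, and eliminating a firm is implemented by deleting it and its stakes from the group. All function names are hypothetical.

```python
# Illustrative sketch only; names and the 1%-grid search are assumptions.
def controlled_set(direct, stakes, threshold):
    """Iterate S(n+1) = {i in S(n): f_i + sum_{j in S(n), j != i} s_ji >= T}
    from S(0) = N until no firm is dropped.

    direct[i]      : family's direct stake in firm i (percentage points)
    stakes[(j, i)] : stake of firm j in firm i (percentage points)
    """
    current = set(direct)  # S(0) = N
    while True:
        kept = {
            i for i in current
            if direct[i] + sum(s for (j, target), s in stakes.items()
                               if target == i and j in current and j != i)
            >= threshold
        }
        if kept == current:
            return kept
        current = kept


def critical_thresholds(direct, stakes):
    """Largest whole-percent threshold at which each firm is still controlled."""
    cc = {i: 0 for i in direct}
    for t in range(1, 101):
        # C(T) shrinks as T rises, so the last t that contains i is the maximum.
        for i in controlled_set(direct, stakes, t):
            cc[i] = t
    return cc


def centrality(direct, stakes, firm):
    """Average drop in the other firms' critical thresholds when `firm`
    is eliminated from the group."""
    others = [i for i in direct if i != firm]
    if not others:
        return 0.0
    cc_full = critical_thresholds(direct, stakes)
    direct_wo = {i: f for i, f in direct.items() if i != firm}
    stakes_wo = {k: s for k, s in stakes.items() if firm not in k}
    cc_wo = critical_thresholds(direct_wo, stakes_wo)
    return sum(cc_full[i] - cc_wo[i] for i in others) / len(others)


if __name__ == "__main__":
    direct = {1: 40, 2: 10}  # Fig. A1, in percentage points
    stakes = {(1, 2): 50}
    print(controlled_set(direct, stakes, 30))  # {1, 2}
    print(controlled_set(direct, stakes, 41))  # set()
    print(critical_thresholds(direct, stakes)) # {1: 40, 2: 40}
    print(centrality(direct, stakes, 1))       # 30.0
    print(centrality(direct, stakes, 2))       # 0.0
```

Working in integer percentage points avoids floating-point comparisons at the threshold; a finer grid or an exact search over candidate thresholds would be needed for stakes that are not whole percentages.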
To prove this proposition, we need to show $S(\#N) = \{ i \in N : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$. The proof is divided into a number of steps.

Step 1: $S(\#N) = S(\#N + 1)$. Consider two cases: (1) $S(\#N) = \emptyset$ and (2) $S(\#N) \ne \emptyset$. In case (1), the result follows directly from the definition of $S(\#N + 1)$. In case (2), we have that, after $\#N$ stages, there are firms that are not yet eliminated. Because we started with $\#N$ firms, this means that there was a stage $n \le \#N$ such that no firm was dropped. In other words, we have that $S(n) = S(n-1)$. We can now compute $S(n+1) = \{ i \in S(n) : f_i + \sum_{j \in S(n),\, j \ne i} s_{ji} \ge T \} = \{ i \in S(n-1) : f_i + \sum_{j \in S(n-1),\, j \ne i} s_{ji} \ge T \} = S(n)$, where the first equality follows from $S(n) = S(n-1)$ and the second from the definition of $S(n)$. Analogously, we can show that $S(n) = S(n+1) = S(n+2) = \cdots = S(\#N) = S(\#N + 1)$. The last equality proves Step 1.

Step 2: $S(\#N) \subseteq \{ i \in N : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$. Note that $S(\#N) = S(\#N + 1) = \{ i \in S(\#N) : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$, where the first equality follows from Step 1 and the second is simply the definition of $S(\#N + 1)$. Because $S(\#N) \subseteq N$, it is clear that $i \in S(\#N) \Rightarrow i \in \{ i \in N : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$.

Step 3: $S(\#N) \supseteq \{ i \in N : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$. Towards a contradiction, we suppose that $k \in \{ i \in N : f_i + \sum_{j \in S(\#N),\, j \ne i} s_{ji} \ge T \}$ and $k \notin S(\#N)$. The first condition implies that

$$f_k + \sum_{j \in S(\#N),\, j \ne k} s_{jk} \ge T. \tag{23}$$

The last condition implies that firm k was eliminated in some earlier stage in the algorithm, say stage n. Thus, $k \in S(n-1)$ but $k \notin S(n)$. We now have

$$T > f_k + \sum_{j \in S(n-1),\, j \ne k} s_{jk} \ge f_k + \sum_{j \in S(\#N),\, j \ne k} s_{jk}, \tag{24}$$

where the first inequality follows from the fact that firm k was eliminated in round n and the second inequality follows from $S(n-1) \supseteq S(\#N)$ and the fact that $s_{ij} \ge 0$. This is a contradiction because Eqs. (23) and (24) cannot hold at the same time. Putting together Steps 2 and 3 leads to the statement of the proposition. ∎

One problem that we need to address is the existence of multiple sets that satisfy Eq. (6). Consider the example in Fig. A2, and assume that T = 25%. Clearly, we have that C(25%) = {1,2,3} because the set {1,2,3} satisfies Eq. (6). However, the null set also satisfies Eq. (6) for the same control threshold. To see this, suppose that the family controls no firms, then its voting rights in firms 1, 2, and 3 are 5%, 7%, and 10%, respectively. Note that all of them are below the threshold of 25%, confirming that the family does not control any of these firms. Because in the case of Korea the firms with which we start (the set N) have already been pre-classified as members of the chaebol, we would like to choose the set that satisfies Eq. (6) and at the same time has the maximum number of firms. We can prove the following proposition.

Proposition 5. Consider all possible sets of firms that satisfy Eq. (6) for a given control threshold T: $C_1, C_2, \ldots, C_M$. The following holds: $S(\#N) = \bigcup_{i=1}^{M} C_i$.

This proposition is important for two reasons. First, it tells us that there is a unique set that has the maximum number of firms over all the sets that satisfy Eq. (6). This is important since it removes the arbitrariness of picking a set among many. Second, the proposition tells us that the
474
H. Almeida et al. / Journal of Financial Economics 99 (2011) 447–475
profits coming from affiliate firms can thus be calculated as
\[
\text{Affiliate profits} = \text{Gain on equity method} - \text{Loss on equity method}. \tag{27}
\]
With this knowledge, it is easy to adjust the financial statements to back out the values of the accounting figures that refer to each individual chaebol firm. Specifically, we have
\[
\text{Stand-alone assets} = \text{Total assets} - \text{Equity method stock}. \tag{28}
\]

Fig. A2. A complex group with many cross-shareholdings.
outcome of the algorithm is precisely the set we are looking for. The proof of this result is divided into two steps.

Step 1: $S(\#N) \subseteq \bigcup_{i=1}^{M} C_i$. By Proposition 4, we know that $S(\#N)$ satisfies Eq. (6); thus, there is an m such that $S(\#N) = C_m$. The result follows.

Step 2: $S(\#N) \supseteq \bigcup_{i=1}^{M} C_i$. We show that $C_m \subseteq S(\#N)$ for all $m = 1, \ldots, M$. Step 2 follows directly from this. Take a set $C_m$. Because $C_m$ satisfies Eq. (6), the following is true:
\[
\text{For all } k \in C_m, \quad f_k + \sum_{j \in C_m,\, j \neq k} s_{jk} \geq T. \tag{25}
\]
Towards a contradiction, suppose that some of the firms in $C_m$ are not in $S(\#N)$. That is, there must be a stage in the algorithm in which the first firm of $C_m$ is eliminated. Let that stage be n. We then have that $C_m \subseteq S(n-1)$ but there is at least one $k \in C_m$ such that $k \notin S(n)$. We now have that
\[
T > f_k + \sum_{j \in S(n-1),\, j \neq k} s_{jk} \;\geq\; f_k + \sum_{j \in C_m,\, j \neq k} s_{jk}, \tag{26}
\]
where the first inequality follows from the fact that k is eliminated in round n and the second follows from $C_m \subseteq S(n-1)$ and the fact that $s_{jk} \geq 0$. This is a contradiction because Eqs. (25) and (26) cannot hold at the same time. This proves Step 2. Finally, putting together Steps 1 and 2 leads to the statement of the proposition. □
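As an illustration only, the elimination procedure that the two proofs refer to can be sketched in Python. The sketch assumes that the algorithm starts from the full set N and, at each stage, drops every firm k whose direct family stake f_k plus the stakes s_jk held in it by the other surviving firms falls below T; it also assumes that s_jk denotes the stake of firm j in firm k. The function names and all cross-shareholding numbers are hypothetical. Only the direct stakes of 5%, 7%, and 10% and the 25% threshold are taken from the Fig. A2 example.

```python
from typing import Dict, Iterable, Set, Tuple

Stakes = Dict[Tuple[int, int], float]  # (j, k) -> stake of firm j in firm k (assumed reading of s_jk)


def family_votes(k: int, members: Set[int], f: Dict[int, float], s: Stakes) -> float:
    """Direct stake in firm k plus the stakes held in k by the other firms in `members`."""
    return f[k] + sum(s.get((j, k), 0) for j in members if j != k)


def controlled_set(firms: Iterable[int], f: Dict[int, float], s: Stakes, T: float) -> Set[int]:
    """Start from all firms and repeatedly drop those whose votes fall below T."""
    firms = list(firms)
    current = set(firms)
    for _ in range(len(firms)):  # each changing round removes a firm, so #N rounds suffice
        survivors = {k for k in current if family_votes(k, current, f, s) >= T}
        if survivors == current:
            break
        current = survivors
    return current


# Direct stakes (in percent) from the Fig. A2 discussion; cross-holdings are hypothetical.
f_a2 = {1: 5, 2: 7, 3: 10}
s_a2 = {(2, 1): 20, (3, 2): 20, (1, 3): 20}
group_a2 = controlled_set([1, 2, 3], f_a2, s_a2, T=25)

# Hypothetical chain in which dropping firm 2 later forces firm 3 out as well.
f_chain = {1: 30, 2: 5, 3: 5}
s_chain = {(1, 2): 10, (2, 3): 25}
group_chain = controlled_set([1, 2, 3], f_chain, s_chain, T=25)

print(sorted(group_a2), sorted(group_chain))
```

With these hypothetical numbers the procedure keeps all three firms in the first case, while with an empty set of controlled firms each firm's votes reduce to the direct stakes of 5%, 7%, and 10%, all below the threshold, which mirrors the multiplicity discussed before Proposition 5.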
Appendix C. Accounting measures of stand-alone assets and stand-alone profitability

After January 1, 2003, the item “stocks accounted in equity method” (code number KLCA 123560) reports the aggregate book value of the shares subject to the equity method. Before 2003, however, “stocks accounted in equity method” was not separately recorded but pooled into all investment securities. The data are available from the footnotes to financial statements, which we examined to calculate this item for the remaining years.

Regarding profitability, the profits coming from affiliate companies (call it “equity method profits”) are recorded in two items in the non-operating portion of the income statement of parent companies. If equity method profits are positive, they are called “Gain on valuation of equity method” (KLCA #242100). If they are negative, they are called “Loss on valuation of equity method” (KLCA #252600). The total
Our profitability measure is
\[
\text{Stand-alone profitability} = \frac{\text{Ordinary income} + \text{Interest payments} - \text{Affiliate profits}}{\text{Stand-alone assets}}. \tag{29}
\]
Notice that we compute profitability before interest payments and taxes. Our profitability measure thus corresponds to an earnings before interest and taxes (EBIT) measure in US accounting, adjusted for profits coming from affiliate firms.

We also check the data for basic consistency requirements. In particular, if the balance sheet shows a number for the equity method stock (i.e., if item KLCA #123560 is non-missing), then there should also be an item in the income statement for gains and losses from equity method (i.e., KLCA #242100 and KLCA #252600 cannot both be missing). The reverse should also hold. In addition, it should not be the case that both items KLCA #242100 and KLCA #252600 are positive, since affiliates will either generate a profit or a loss. We eliminate all firm-years that do not satisfy this consistency requirement.

References

Aganin, A., Volpin, P., 2005. History of corporate ownership in Italy. In: Morck, R. (Ed.), A History of Corporate Governance Around the World: Family Business Groups to Professional Managers. University of Chicago Press, Chicago, IL, pp. 325–366. Almeida, H., Campello, M., 2007. Financial constraints, asset tangibility and corporate investment. Review of Financial Studies 20, 1429–1460. Almeida, H., Wolfenzon, D., 2006. A theory of pyramidal ownership and family business groups. Journal of Finance 61, 2637–2681. Attig, N., Fischer, K., Gadhoum, Y., 2003. On the determinants of pyramidal ownership: evidence on expropriation of minority interests. Unpublished working paper, Laval University. Bae, K., Kang, J., Kim, J., 2002. Tunneling or value added? Evidence from mergers by Korean business groups. Journal of Finance 57, 2695–2740. Baek, J., Kang, J., Park, K., 2004. Corporate governance and firm value: evidence from the Korean financial crisis. Journal of Financial Economics 71, 265–313. Baek, J., Kang, J., Lee, I., 2006. Business groups and tunneling: Evidence from private securities offerings by Korean chaebols.
Journal of Finance 61, 2415–2449. Barca, F., Becht, M., 2001. The Control of Corporate Europe. Oxford University Press, Oxford. Bebchuk, L., Reinier, K., Triantis, G., 2000. Stock pyramids, cross-ownership, and dual class equity: the mechanism and agency costs of separating control from cash-flow rights. In: Morck, R. (Ed.), Concentrated Corporate Ownership. University of Chicago Press, Chicago, IL, pp. 295–318. Bennedsen, M., Nielsen, K., 2006. The principle of proportional ownership, investor protection and firm value in Western Europe. Unpublished working paper, Copenhagen Business School and CEBR. Berle, A., Means, G., 1932. The Modern Corporation and Private Property. McMillan, New York.
Bertrand, M., Mehta, P., Mullainathan, S., 2002. Ferreting out tunneling: an application to Indian business groups. Quarterly Journal of Economics 117, 121–148. Bertrand, M., Johnson, S., Samphantharak, K., Schoar, A., 2008. Mixing family with business: a study of Thai business groups and the families behind them. Journal of Financial Economics 88, 466–498. Bianchi, M., Bianco, M., Enriques, L., 2001. Pyramidal groups and the separation between ownership and control in Italy. Unpublished working paper, Bank of Italy. Bianco, M., Nicodano, G., 2006. Pyramidal groups and debt. European Economic Review 50, 937–961. Bohren, O., Michalsen, D., 1994. Corporate cross-ownership and market aggregates: Oslo Stock Exchange 1980–1990. Journal of Banking and Finance 18, 687–704. Brioschi, F., Buzzacchi, L., Colombo, M., 1989. Risk capital financing and the separation of ownership and control in business groups. Journal of Banking and Finance 13, 747–772. Cheung, Y.L., Rau, R., Stouraitis, A., 2006. Tunneling, propping, and expropriation: evidence from connected party transactions in Hong Kong. Journal of Financial Economics 82, 343–386. Claessens, S., Djankov, S., Fan, J., Lang, L., 2002. Disentangling the incentive and entrenchment effects of large shareholdings. Journal of Finance 57, 2741–2771. Claessens, S., Djankov, S., Lang, L., 2000. The separation of ownership and control in East Asian corporations. Journal of Financial Economics 58, 81–112. Claessens, S., Fan, J., Lang, L., 2002. The benefits of group affiliation: Evidence from East Asia. Unpublished working paper, University of Amsterdam. Cornell, B., Liu, Q., 2001. The parent company puzzle: When is the whole worth less than one of the parts? Journal of Corporate Finance 7, 341–366. De Jong, A., Dejong, V., Hege, U., Mertens, G., 2009. Debt and dividends in pyramidal structures: evidence from France. Unpublished working paper, University of Iowa. Faccio, M., Lang, L., 2002. 
The ultimate ownership of Western European corporations. Journal of Financial Economics 65, 365–395. Fan, J., Wong, T., Zhang, T., 2009. Institutions and Organizational Structure: The Case of State-Owned Corporate Pyramids. Unpublished working paper, Chinese University of Hong Kong. Ferris, S., Kim, A., Kitsabunnarat, P., 2003. The cost(and benefits)? of diversified business groups: the case of Korean chaebols. Journal of Banking and Finance 27, 251–273. Fisman, R., Khanna, T., 2000. Facilitating development: the role of business groups. Unpublished working paper, Columbia University, and Harvard University. Frank, M., Goyal, V., 2009. Capital structure decisions: Which factors are reliably important? Financial Management 38, 1–37. Franks, J., Mayer, C., 2001. Ownership and control of German corporations. Review of Financial Studies 14, 943–977. Franks, J., Mayer, C., Volpin, P., Wagner, H., 2008. Evolution of family capitalism: a comparative study of France, Germany, Italy and the UK. Unpublished working paper, London Business School, University of Oxford, and Bocconi University. French, K., Poterba, J., 1991. Were Japanese stock prices too high? Journal of Financial Economics 29, 337–363. Gopalan, R., Nanda, V., Seru, A., 2006. Affiliated firms and financial support: evidence from Indian business groups. Journal of Financial Economics 86, 759–795.
Gopalan, R., Nanda, V., Seru, A., 2007, Do business groups use dividends to fund investments? Unpublished working paper, Olin School of Business. Graham, B., Dodd, D., 1934. Security Analysis. McGraw-Hill, New York. Joh, W., 2003. Corporate governance and profitability: Evidence from Korea before the economic crisis. Journal of Financial Economics 68, 287–322. Jovanovic, B., Rousseau, P., 2002. The Q-theory of mergers. American Economic Review 92, 198–204. Kang, H., Park, K., Jang, H., 2006a. The choice of group structure: divide and rule. The Korean Journal of Finance 19, 188–230 (in Korean language). Kang, H., Park, K., Jang, H., 2006b. Determinants of internal transactions among the member firms of Korean conglomerates. The Korean Journal of Finance 19, 77–118 (in Korean language). Khanna, T., 2000. Business groups and social welfare in emerging markets: existing evidence and unanswered questions. European Economic Review 44, 748–761. Khanna, T., Palepu, K., 2000. Is group affiliation profitable in emerging markets? An analysis of diversified Indian business groups. Journal of Finance 55, 867–891. Khanna, T., Rivkin, J., 2001. Estimating the performance effects of business groups in emerging markets. Strategic Management Journal 22, 45–74. Kim, W., Sung, T., 2006. What makes group-affiliated firms go public? Unpublished working paper, KDI School of Public Policy and Management. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., 1999. Corporate ownership around the world. Journal of Finance 54, 471–517. Lamont, O., Thaler, R., 2003. Can the market add and subtract? Mispricing in tech stock carve-outs. Journal of Political Economy 111, 227–268. Leontief, W., 1986. Input-Output Economics, second ed. Oxford University Press, New York. Lins, K., 2003. Equity ownership and firm value in emerging markets. Journal of Financial and Quantitative Analysis 38, 159–184. Masulis, R., Pham, P., Zein, J., 2008. 
Pyramids: Empirical evidence on the costs and benefits of family business groups. Unpublished working paper, Vanderbilt University. Marisetty, V., Subrahmanyam, M., 2010. Group affiliation and the performance of IPOs in the Indian stock market. Journal of Financial Markets 13, 196–223. Mitchell, M., Pulvino, T., Stafford, E., 2002. Limited arbitrage in equity markets. Journal of Finance 57, 551–584. Morck, R., Stangeland, D., Yeung, B., 2000. Inherited wealth, corporate control and economic growth: The Canadian disease? In: Morck, R. (Ed.), Concentrated Corporate Ownership University of Chicago Press, Chicago, IL. Morck, R., Wolfenzon, D., Yeung, B., 2005. Corporate governance, economic entrenchment and growth. Journal of Economic Literature 43, 657–722. Rommens, A., Deloof, M., Jegers, M., 2008. Why do holding companies trade at a discount? A clinical study. Unpublished working paper, Free University of Brussels. Ross, S., 2004. Neoclassical Finance. Princeton University Press, Princeton, NJ. Shleifer, A., 2000. Inefficient Markets: An Introduction to Behavioral Finance. Oxford University Press, New York. Volpin, P., 2002. Governance with poor investor protection: evidence from top executive turnover in Italy. Journal of Financial Economics 64, 61–90.
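
As a supplementary illustration of Appendix C, the stand-alone measures in Eqs. (27)–(29) and the consistency screen can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the class and field names are hypothetical, missing items are represented by None, a missing gain or loss is treated as zero when computing affiliate profits, and the loss item is assumed to be recorded as a positive amount. The numbers in the example are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirmYear:
    """One firm-year; field names are hypothetical labels for the KLCA items."""
    total_assets: float
    ordinary_income: float
    interest_payments: float
    equity_method_stock: Optional[float]  # KLCA 123560, None if missing
    equity_method_gain: Optional[float]   # KLCA 242100, None if missing
    equity_method_loss: Optional[float]   # KLCA 252600, None if missing


def is_consistent(fy: FirmYear) -> bool:
    """Consistency screen: stock and gain/loss items must appear together,
    and gain and loss cannot both be positive."""
    has_stock = fy.equity_method_stock is not None
    has_flow = fy.equity_method_gain is not None or fy.equity_method_loss is not None
    if has_stock != has_flow:
        return False
    both_positive = (fy.equity_method_gain or 0) > 0 and (fy.equity_method_loss or 0) > 0
    return not both_positive


def affiliate_profits(fy: FirmYear) -> float:
    """Eq. (27): gain minus loss (missing items counted as zero, an assumption)."""
    return (fy.equity_method_gain or 0.0) - (fy.equity_method_loss or 0.0)


def stand_alone_assets(fy: FirmYear) -> float:
    """Eq. (28): total assets minus equity method stock."""
    return fy.total_assets - (fy.equity_method_stock or 0.0)


def stand_alone_profitability(fy: FirmYear) -> float:
    """Eq. (29): (ordinary income + interest payments - affiliate profits) / stand-alone assets."""
    numerator = fy.ordinary_income + fy.interest_payments - affiliate_profits(fy)
    return numerator / stand_alone_assets(fy)


# Invented firm-year: assets 1000, equity method stock 200, equity method gain 30.
example = FirmYear(1000.0, 80.0, 20.0, 200.0, 30.0, None)
if is_consistent(example):
    print(affiliate_profits(example), stand_alone_assets(example),
          stand_alone_profitability(example))
```

Applying the screen before computing Eq. (29) follows the order described in Appendix C, where firm-years that fail the consistency requirement are dropped from the sample.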