Hard-to-Measure Goods and Services
Studies in Income and Wealth Volume 67
National Bureau of Economic Research Conference on Research in Income and Wealth
Hard-to-Measure Goods and Services Essays in Honor of Zvi Griliches
Edited by
Ernst R. Berndt and Charles R. Hulten
The University of Chicago Press Chicago and London
Ernst R. Berndt is the Louis B. Seley Professor of Applied Economics at the Sloan School of Management, Massachusetts Institute of Technology, and director of the Program on Technological Progress and Productivity Measurement at the National Bureau of Economic Research. Charles R. Hulten is professor of economics at the University of Maryland, chairman of the executive committee of the Conference on Research in Income and Wealth, and a research associate of the National Bureau of Economic Research.
The University of Chicago Press, Chicago 60637
The University of Chicago Press, Ltd., London
© 2007 by the National Bureau of Economic Research
All rights reserved. Published 2007
Printed in the United States of America
16 15 14 13 12 11 10 09 08 07    1 2 3 4 5
ISBN-13: 978-0-226-04449-1 (cloth)
ISBN-10: 0-226-04449-1 (cloth)

Library of Congress Cataloging-in-Publication Data
Hard-to-measure goods and services : essays in honor of Zvi Griliches / edited by Ernst R. Berndt and Charles R. Hulten.
p. cm. — (Studies in income and wealth ; v. 67)
Conference proceedings.
Includes bibliographical references and indexes.
ISBN-13: 978-0-226-04449-1 (cloth : alk. paper)
ISBN-10: 0-226-04449-1 (cloth : alk. paper)
1. Griliches, Zvi, 1930– 2. Econometrics. 3. Income distribution. I. Berndt, Ernst R. II. Hulten, Charles R.
HB139.H368 2007
330.01’5195—dc22    2006100534
The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1992.
National Bureau of Economic Research

Officers
Elizabeth E. Bailey, chairman
John S. Clarkeson, vice-chairman
Martin Feldstein, president and chief executive officer
Susan Colligan, vice president for administration and budget and corporate secretary
Robert Mednick, treasurer
Kelly Horak, controller and assistant corporate secretary
Gerardine Johnson, assistant corporate secretary

Directors at Large
Peter C. Aldrich
Elizabeth E. Bailey
John H. Biggs
Andrew Brimmer
John S. Clarkeson
Don R. Conlan
Kathleen B. Cooper
George C. Eads
Jessica P. Einhorn
Martin Feldstein
Jacob A. Frenkel
Judith M. Gueron
Robert S. Hamada
Karen N. Horn
Judy C. Lewent
John Lipsky
Laurence H. Meyer
Michael H. Moskow
Alicia H. Munnell
Rudolph A. Oswald
Robert T. Parry
Marina v. N. Whitman
Martin B. Zimmerman

Directors by University Appointment
George Akerlof, California, Berkeley
Jagdish Bhagwati, Columbia
Ray C. Fair, Yale
Michael J. Brennan, California, Los Angeles
Glen G. Cain, Wisconsin
Franklin Fisher, Massachusetts Institute of Technology
Saul H. Hymans, Michigan
Marjorie B. McElroy, Duke
Joel Mokyr, Northwestern
Andrew Postlewaite, Pennsylvania
Uwe E. Reinhardt, Princeton
Nathan Rosenberg, Stanford
Craig Swan, Minnesota
David B. Yoffie, Harvard
Arnold Zellner (director emeritus), Chicago

Directors by Appointment of Other Organizations
Richard B. Berner, National Association for Business Economics
Gail D. Fosler, The Conference Board
Martin Gruber, American Finance Association
Arthur B. Kennickell, American Statistical Association
Thea Lee, American Federation of Labor and Congress of Industrial Organizations
William W. Lewis, Committee for Economic Development
Robert Mednick, American Institute of Certified Public Accountants
Angelo Melino, Canadian Economics Association
Jeffrey M. Perloff, American Agricultural Economics Association
John J. Siegfried, American Economic Association
Gavin Wright, Economic History Association

Directors Emeriti
Carl F. Christ
George Hatsopoulos
Lawrence R. Klein
Franklin A. Lindsay
Paul W. McCracken
Peter G. Peterson
Richard N. Rosett
Eli Shapiro
Arnold Zellner
Relation of the Directors to the Work and Publications of the National Bureau of Economic Research

1. The object of the NBER is to ascertain and present to the economics profession, and to the public more generally, important economic facts and their interpretation in a scientific manner without policy recommendations. The Board of Directors is charged with the responsibility of ensuring that the work of the NBER is carried on in strict conformity with this object.

2. The President shall establish an internal review process to ensure that book manuscripts proposed for publication do not contain policy recommendations. This shall apply both to the proceedings of conferences and to manuscripts by a single author or by one or more coauthors but shall not apply to authors of comments at NBER conferences who are not NBER affiliates.

3. No book manuscript reporting research shall be published by the NBER until the President has sent to each member of the Board a notice that a manuscript is recommended for publication and that in the President’s opinion it is suitable for publication in accordance with the above principles of the NBER. Such notification will include a table of contents and an abstract or summary of the manuscript’s content, a list of contributors if applicable, and a response form for use by Directors who desire a copy of the manuscript for review. Each manuscript shall contain a summary drawing attention to the nature and treatment of the problem studied and the main conclusions reached.

4. No volume shall be published until forty-five days have elapsed from the above notification of intention to publish it. During this period a copy shall be sent to any Director requesting it, and if any Director objects to publication on the grounds that the manuscript contains policy recommendations, the objection will be presented to the author(s) or editor(s). In case of dispute, all members of the Board shall be notified, and the President shall appoint an ad hoc committee of the Board to decide the matter; thirty days additional shall be granted for this purpose.

5. The President shall present annually to the Board a report describing the internal manuscript review process, any objections made by Directors before publication or by anyone after publication, any disputes about such matters, and how they were handled.

6. Publications of the NBER issued for informational purposes concerning the work of the Bureau, or issued to inform the public of the activities at the Bureau, including but not limited to the NBER Digest and Reporter, shall be consistent with the object stated in paragraph 1. They shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The Executive Committee of the Board is charged with the review of all such publications from time to time.

7. NBER working papers and manuscripts distributed on the Bureau’s web site are not deemed to be publications for the purpose of this resolution, but they shall be consistent with the object stated in paragraph 1. Working papers shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The NBER’s web site shall contain a similar disclaimer. The President shall establish an internal review process to ensure that the working papers and the web site do not contain policy recommendations, and shall report annually to the Board on this process and any concerns raised in connection with it.

8. Unless otherwise determined by the Board or exempted by the terms of paragraphs 6 and 7, a copy of this resolution shall be printed in each NBER publication as described in paragraph 2 above.
Contents

Prefatory Note
Acknowledgments

I. Context and Prologue
Introduction
    Ernst R. Berndt and Charles R. Hulten
1. Theory and Measurement: An Essay in Honor of Zvi Griliches
    Charles R. Hulten

II. Classic Input Measurement Issues Revisited
2. Production Function and Wage Equation Estimation with Heterogeneous Labor: Evidence from a New Matched Employer-Employee Data Set
    Judith K. Hellerstein and David Neumark
3. Where Does the Time Go? Concepts and Measurement in the American Time Use Survey
    Harley Frazis and Jay Stewart
4. Technology and the Theory of Vintage Aggregation
    Michael J. Harper
5. Why Do Computers Depreciate?
    Michael J. Geske, Valerie A. Ramey, and Matthew D. Shapiro

III. Quality Adjustment and Price Measurement Issues: Recent Developments
6. Downward Bias in the Most Important CPI Component: The Case of Rental Shelter, 1914–2003
    Robert J. Gordon and Todd vanGoethem
7. Pricing at the On-Ramp to the Internet: Price Indexes for ISPs during the 1990s
    Greg Stranger and Shane Greenstein
8. Different Approaches to Estimating Hedonic Indexes
    Saeed Heravi and Mick Silver
9. Price Indexes for Microsoft’s Personal Computer Software Products
    Jaison R. Abel, Ernst R. Berndt, and Alan G. White
10. International Comparisons of R&D Expenditure: Does an R&D PPP Make a Difference?
    Sean M. Dougherty, Robert Inklaar, Robert H. McGuckin, and Bart van Ark

IV. Information Technology and the Acceleration of Productivity Growth
11. Information Technology and the G7 Economies
    Dale W. Jorgenson
12. The Role of Semiconductor Inputs in IT Hardware Price Decline: Computers versus Communications
    Ana Aizcorbe, Kenneth Flamm, and Anjum Khurshid
13. Computer Input, Computer Networks, and Productivity
    B. K. Atrostic and Sang Nguyen

V. Measuring and Modeling Productivity, Consumption, and Diffusion
14. Services Productivity in the United States: Griliches’s Services Volume Revisited
    Barry P. Bosworth and Jack E. Triplett
15. A Consistent Accounting of U.S. Productivity Growth
    Eric J. Bartelsman and J. Joseph Beaulieu
16. Should Exact Index Numbers Have Standard Errors? Theory and Application to Asian Growth
    Robert C. Feenstra and Marshall B. Reinsdorf
17. What Really Happened to Consumption Inequality in the United States?
    Orazio Attanasio, Erich Battistin, and Hidehiko Ichimura
18. Technology Adoption from Hybrid Corn to Beta-Blockers
    Jonathan Skinner and Douglas Staiger

VI. Epilogue
19. Zvi Griliches’s Contributions to Economic Measurement
    Jack E. Triplett

Contributors
Author Index
Subject Index
Prefatory Note
This volume contains revised versions of most of the papers presented at the Conference on Research in Income and Wealth entitled “Hard-to-Measure Goods and Services: Essays in Memory of Zvi Griliches,” held in Bethesda, Maryland, on September 19–20, 2003. It also contains some material not presented at that conference.

Funds for the Conference on Research in Income and Wealth are supplied by the Bureau of Economic Analysis, the Bureau of Labor Statistics, the Census Bureau, the Federal Reserve Board, Statistics of Income/IRS, and Statistics Canada. For this conference, additional financial support was provided by the National Science Foundation under grant SES-0322213. We are indebted to these organizations for their support.

We thank Ernst Berndt and Charles Hulten, who served as conference organizers and editors of the volume. We also thank the NBER staff and University of Chicago Press editors for their assistance in organizing the conference and editing the volume.

Executive Committee, October 2006
John M. Abowd
Susanto Basu
Ernst R. Berndt
Carol A. Corrado
Robert C. Feenstra
John Greenlees
John C. Haltiwanger
Michael J. Harper
Charles R. Hulten, chair
Ronald Jarmin
Lawrence F. Katz
J. Steven Landefeld
Brent R. Moulton
Thomas B. Petska
Mark J. Roberts
Matthew Shapiro
David W. Wilcox
Acknowledgments
The Bethesda, Maryland, conference on September 19–20, 2003, took place during Hurricane Isabel, one of the worst storms to hit the area in modern times. Travel plans were severely disrupted, and a number of participants were forced to cancel their attendance, though many did manage to attend and others participated by teleconference link. The perseverance of these participants is an additional tribute to the memory of Zvi Griliches, over and above the offerings in this volume. Conference planning and logistics were also greatly affected by the hurricane, but were executed superbly by Carl Beck, Lita Kimble, Brett Maranjian, and Rob Shannon in the Conference Department at the National Bureau of Economic Research. We gratefully acknowledge the professionalism and grace with which they organized conference details in the face of this unexpected and highly disruptive event. We also note with great sadness the passing of Robert McGuckin, who died on March 12, 2006, of cancer. He made important contributions to the cause of economic measurement, and he will be missed, both as a friend and a researcher. Last, and not least, we would like to acknowledge that this CRIW-NBER Conference was supported in part by the National Science Foundation grant SES-0333313. We thank Dan Newlon of the National Science Foundation for that support.
I
Context and Prologue
Introduction Ernst R. Berndt and Charles R. Hulten
If the data were perfect, collected from well-designed randomized experiments, there would be hardly room for a separate field of econometrics. Given that it is the “badness” of the data that provides us with our living, perhaps it is not all that surprising that we have shown little interest in improving it.
—Zvi Griliches (1986, 1466)

Great advances have been made in theory and in econometric techniques, but these will be wasted unless they are applied to the right data.
—Zvi Griliches (1994, 2)

My father would never eat “cutlets” (minced meat patties) in the old country. He would not eat them in restaurants because he didn’t know what they were made of and he wouldn’t eat them at home because he did.
—Zvi Griliches (an old family story, 1986, 1472)

Empirical economists have over generations adopted the attitude that having bad data is better than having no data at all, that their task is to learn as much as is possible about how the world works from the unquestionably lousy data at hand.
—Zvi Griliches (1986, 1508)

Why are the data not better? . . . Why does it feel as if the glass is still half-empty? . . . The metaphor of the glass half-empty is also misleading. As we fill it, the glass keeps growing. A major aspect of learning is that the unknown keeps expanding as we learn. This should be looked at positively. It is much better this way—especially for those of us who are engaged in research!
—Zvi Griliches (1994, 14, 17, 18)
Overview

More than fifty years ago, Oskar Morgenstern (1950) pointedly asked whether economic data were sufficiently accurate for the purposes for which economists, econometricians, and economic policymakers were using them. Morgenstern raised serious doubts concerning the quality of many economic data series and implicitly about the foundations of a large number of econometric and economic policy analyses. In 1986, more than thirty-five years later, in the final remarks section of his Handbook of Econometrics chapter entitled “Economic Data Issues,” Zvi Griliches commented with sadness on Morgenstern’s important observations and criticisms, stating, “Years have passed and there has been very little coherent response to his criticisms” (1986, 1507).

The absence of a coherent response cannot be laid at Griliches’s feet. His entire career can be viewed as an attempt to advance the cause of accuracy in economic measurement. His interest in the causes and consequences of technical progress led to his pathbreaking work on price hedonics, now the principal analytical technique available to account for changes in product quality. It also led him to investigate the issue of how research and development (R&D) investment is linked to the growth of real output. His research on human capital and its relation to the production function led him to formulate a measure of human capital-cum-labor quality. This approach to measuring the contribution of labor (and capital) to economic growth was one of Griliches’s main contributions to the pioneering work on total factor productivity with Dale Jorgenson.
The Jorgenson-Griliches collaboration was especially notable because of its insistence that accurate measurement was inextricably linked to economic theory: the theory of production implied an internally consistent accounting framework for the data, the theory outlined specific measurement methods, and price and quantity data that did not conform to this framework could lead to biased and uninterpretable results. This insight is at the heart of current efforts to improve the U.S. National Income and Product Accounts.
The study of multifactor productivity led Griliches to the question of the accuracy of service-sector output. Aggregate productivity growth had slowed in the 1970s, and one explanation was the shift in the composition of output toward service-producing industries where output growth measures are problematic and likely biased downward. The 1992 NBER Conference on Research in Income and Wealth (CRIW) volume that he edited (Griliches 1992) was the most comprehensive summary of measurement problems in these “hard-to-measure” sectors of its time. Moreover, his 1994 article (Griliches 1994), which coined the terms measurable and unmeasurable sectors (the latter including construction, trade, finance, other services, and government) focused attention on the breadth of the problem and challenged the view that each service industry was a special problem to be dealt with on its own. While improvements in the accuracy of service-sector outputs must recognize the unique characteristics of each sector’s products, there is a unity to the problem: in all cases, the problem emanates from the fact that the units of measurement of the underlying product are very difficult to define (what is the “output” of a bank, a lawyer, a consultant, or a college professor?). In other words, one must know what “it” is before trying to measure “it.”

Griliches’s emphasis on the difficulties in measuring outputs and prices in the service sectors is not just an academic issue but also has substantive policy implications. For example, in his 1998 address at the annual meetings of the American Economic Association and the American Finance Association in Chicago, Federal Reserve Board Chairman Alan Greenspan stated:

    Of mounting importance is a deeper understanding of the economic characteristics of sustained price stability. We central bankers need also to better judge how to assess our performance in achieving and maintaining that objective in light of the uncertainties surrounding the accuracy of our measured price indexes. . . . The published price data indicate that the level of output per hour in a number of service-producing industries has been falling for more than two decades. It is simply not credible that firms in these industries have been becoming less and less efficient for more than twenty years. Much more reasonable is the view that prices have been mismeasured, and that the true quality-adjusted prices have been rising more slowly than the published price indexes. Properly measured, output and productivity trends in these service industries are doubtless considerably stronger than suggested by the published data. (Greenspan 1998)

Many goods and services are easy to measure badly but difficult to measure well, and the units-of-measurement problem is by no means restricted to intangible service-sector outputs. Similar issues arise with tangible outputs and inputs where there is important product variety: different technological vintages of capital goods, workers with varying amounts of human capital, alternative qualities of automobiles. Treating all investment in
computing equipment or all worker hours as a homogeneous input with an implicit common unit of measurement, or treating real aggregate expenditure for medical goods and services as a homogeneous output, runs the risk of misstating the true growth of the economy as well as the rate of price inflation. Some of the consequences of such a misstatement were highlighted in earlier remarks by Greenspan in 1995, who indicated to the Senate Finance Committee that he believed the U.S. Consumer Price Index (CPI) overstated true inflation by between 0.5 and 1.5 percent per year. A bias of this potential magnitude was particularly important to monetary policymakers in an environment of low measured inflation in the 1990s. (It is even more critical today, as macroeconomists consider the possibility of low measured inflation actually implying a deflationary environment due to continued failure fully to capture the price index consequences of quality improvements embodied in new goods.) It was also of great importance to fiscal and income security policy as the CPI is widely used for cost-of-living adjustments. A commission was established to study the problem, chaired by Michael Boskin, of which Griliches was a member; this commission concluded that in 1995 the best estimate of the bias in the CPI was about 1.1 percent per year (see Boskin et al. 1996) and that a bias of this magnitude would cost the federal government around $1 trillion over the succeeding twelve years. Debates over the existence of a “new economy” also depend critically on the accuracy of statistics on real output and input. Based on his study of the history of lighting, one prominent academic researcher, William Nordhaus (1997), was led to observe that “The bottom line is simple: traditional price indexes of lighting vastly overstate the increase in lighting prices over the last two centuries, and the true rise in living standards in this sector has consequently been vastly understated” (Nordhaus 1997, 30).
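To give a rough sense of why annual CPI biases of the size cited above matter, the figures can be compounded over the twelve-year horizon mentioned. The sketch below is only illustrative: it assumes a constant annual bias that compounds once per year, the function name is hypothetical, and it is not the calculation the Boskin Commission used to arrive at its budget estimate.

```python
def cumulative_overstatement(annual_bias: float, years: int) -> float:
    """Cumulative overstatement of the price level when measured inflation
    exceeds true inflation by `annual_bias` (a fraction) in every year.

    Assumption for illustration only: the bias is constant and compounds annually.
    """
    return (1.0 + annual_bias) ** years - 1.0


# Annual bias figures taken from the text: 0.5 to 1.5 percent (Greenspan's range)
# and 1.1 percent (the Boskin Commission's best estimate), over twelve years.
low = cumulative_overstatement(0.005, 12)
central = cumulative_overstatement(0.011, 12)
high = cumulative_overstatement(0.015, 12)

print(f"low: {low:.1%}, central: {central:.1%}, high: {high:.1%}")
```

Under these assumptions, a bias of about 1.1 percent per year cumulates to roughly 14 percent of the price level after twelve years, which is why indexed cost-of-living adjustments are so sensitive to it.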
The high-tech meltdown of 2000 underscores the need for accurate statistics on the prices and quantities in an era of rapid technological change. This means that the measurement problems of the hard-to-measure outputs (and inputs) must be confronted head-on. This was the central theme of the conference held to honor the memory of Zvi Griliches.

The CRIW Conferences in Honor of Zvi Griliches

In recognition of Zvi Griliches’s contributions to the cause of economic measurement and to identify and build on ways in which further progress can be made in improving the quality of our economic statistics, the CRIW sponsored a conference held in the Washington, D.C. area on September 19–20, 2003. This conference focused primarily on economic measurement issues in the areas of productivity, price hedonics, capital measurement, diffusion of new technologies, and output and price measurement in hard-to-measure sectors of the economy. An earlier conference was held on August 25–27, 2003, in Paris, France; it focused on other legacies of Griliches, such as returns to R&D, international diffusion of new technologies, econometric tools for dealing with measurement errors of various types, and the economics of intellectual property rights. For the most part, though not exclusively, papers presented at the Paris conference comprise a volume edited by Jacques Mairesse (ENSEE) and Manuel Trajtenberg (Tel Aviv University), assisted by Ernst R. Berndt and Charles R. Hulten, under the title of Zvi Griliches’s last book, R&D, Education and Economic Growth, while those presented at the Washington conference appear in this volume.

Summary of Papers at the Conference on the Hard-to-Measure Sectors of the Economy

The chapters included in this tribute to Zvi Griliches encompass a series of topics in economic measurement to which he contributed directly, exhibited an abiding interest, or supported indirectly through his role as director of the NBER Program on Technological Change and Productivity Measurement. The chapters are linked by the theme of hard-to-measure goods and services and range over themes mentioned earlier: the measurement of service sector outputs, the measurement of capital and labor inputs, issues in the consistent measurement of input quantities and productivity growth, measurement error, the diffusion of new technologies, and the challenges posed by the definition and measurement of output in the new economy.

We begin and end this volume with chapters that focus specifically on Zvi Griliches’s contributions to economic measurement. In chapter 1, “Theory and Measurement: An Essay in Honor of Zvi Griliches,” Charles R. Hulten provides an initial overview of Zvi’s contributions to the cause of economic measurement in the context of how the field of economics (and its general attitude to measurement issues) evolved during the period spanned by his career.
In order to appreciate fully the magnitude of Griliches’s contributions to measurement, Hulten argues that it must be recognized that the whole was greater than the sum of its parts. Hulten’s chapter also examines the link between data and theory in the context of Koopmans’ (1947) famous injunction to avoid “measurement without theory.” One of the great achievements of Griliches’s career was to demonstrate how this injunction could be implemented. Hulten then looks to the future of Koopmans’ injunction and argues for the need to account for possible feedback effects arising from the impact of mismeasurement on the behavior of economic agents, and the associated need to take into consideration the political economy context of measurement bias. The final chapter in this volume, chapter 19, “Zvi Griliches’s Contributions to Economic Measurement,” is based on a luncheon address at the
conference by Jack E. Triplett. Triplett reviews the measurement problems on which Griliches worked, those on which he did not work directly but on which he had a significant influence, and those that will likely continue to be important in the future. He also discusses in greater detail Griliches’s interactions with government economists and statisticians in the various statistical agencies. Triplett emphasizes that Zvi Griliches’s impact on measurement extended far beyond his immediate research to his influence on his many students and colleagues (and their students and colleagues) and to his leadership in the measurement community as a whole.

In between these two chapters focusing specifically on Zvi Griliches’s lasting contributions are sections devoted to specific issues involving the measurement of capital and labor inputs (chapters 2 through 5), various aspects of price measurement (chapters 6 through 10), the role of information technology in productivity growth (chapters 11 through 13), productivity and consumption measurement within a consistent framework, including analyses of data sets old and new (chapters 14 through 17), and a surprising update of Griliches’s classic paper on the diffusion of hybrid corn (chapter 18).

Classic Input Measurement Issues Revisited

The next set of four chapters deals with an issue to which Griliches made one of his most important contributions: the accurate measurement of capital and labor inputs and the associated hypothesis, with Dale Jorgenson, that much of what is recorded as total factor productivity is actually measurement error. The first chapter in this section, chapter 2, “Production Function and Wage Equation Estimation with Heterogeneous Labor: Evidence from a New Matched Employer-Employee Data Set,” by Judith Hellerstein and David Neumark, deals with labor rather than capital measurement.
Hellerstein and Neumark report on efforts underway to link hours worked to labor force characteristics, an innovation that promises to increase the accuracy of the labor input measures used in various analyses of productivity. This is a subject pioneered by Griliches in his early efforts to incorporate human capital in the structure of production. Here Hellerstein and Neumark use cross-sectional data to derive direct estimates of the impact of human capital on output. These findings are then compared to the conventional productivity approach that assumes wages are a satisfactory proxy for the direct effect. In chapter 3, “Where Does The Time Go? Concepts and Measurement in the American Time Use Survey,” Harley Frazis and Jay Stewart report on a newly introduced Bureau of Labor Statistics survey that eventually will provide time series data on the use of household time, both market and nonmarket. Initial results reported in this chapter indicate that the value of nonmarket household time (i.e., household production) amounted to more than $3 trillion in 2003, or about 30 percent of current gross domestic
product (GDP). In addition to providing information of great value to labor economists, this new data series will be useful for gaining insight into the widely noted divergence between the estimates of employment and wages obtained from the household-based Current Population Survey and the establishment-based Current Employment Statistics program.

In chapter 4, “Technology and the Theory of Vintage Aggregation,” Michael J. Harper reexamines a well-known and conceptually difficult aspect of the vintage asset problem: the aggregation of different technological vintages of capital. Harper explores several of the salient theoretical issues and provides an important reminder that much of the empirical literature on the sources of economic growth rests on simplifying assumptions that may not be true. A related set of capital measurement issues is addressed in chapter 5, “Why Do Computers Depreciate?,” by Valerie Ramey, Matthew Shapiro, and Michael Geske. These authors focus on another difficult capital measurement problem: measuring the economic depreciation of computers using the “vintage” price of used computers and separately identifying and measuring the obsolescence and deterioration components of economic depreciation. This separation reveals that the decline in the price of a computer as it ages is largely due to obsolescence and a decline in replacement cost, with only a negligible effect attributed to physical deterioration.

Quality Adjustment and Price Measurement Issues: Recent Developments

Price measurement was a prominent subject of Zvi Griliches’s research. In this section of the volume, five chapters are devoted to various issues in price measurement. Over the years, numerous researchers, as well as several commissions, have concluded that the CPI overstates true inflation, particularly when one takes into account quality changes embodied in new goods. Robert J.
Gordon has long hypothesized that this upward bias may not hold for the most important component of the CPI, rental shelter, and that in this case the bias might in fact be downward, not upward. In chapter 6, “Downward Bias in the Most Important CPI Component: The Case of Rental Shelter, 1914–2003,” Robert J. Gordon and Todd vanGoethem assess and find strong support for the hypothesis that the CPI has been biased downward for its entire history since 1914. The bias appears to have been particularly large, on the order of –1.0 percent annually, prior to the methodological improvements in the CPI that date from the mid-1980s.

The next three chapters in this section on price measurement focus on more disaggregated price measures in product markets undergoing rapid technological change. In chapter 7, “Pricing at the On-Ramp to the Internet: Price Indexes for ISPs during the 1990s,” Greg Stranger and Shane Greenstein estimate hedonic price indexes for dial-up Internet service providers (ISPs) in the United States from November 1993 to January 1999. Not taking into account quality changes, Stranger and Greenstein find that ISP price indexes are flat. However, hedonic price indexes reveal a decline of about 20 percent in price per unit of ISP quality between late 1996 and early 1999.

In chapter 8, “Different Approaches to Estimating Hedonic Indexes,” Saeed Heravi and Mick Silver review and then compare empirically three general approaches to constructing hedonic price indexes: hedonic imputation, dummy time hedonic indexes, and fixed effects model indexes, with each variant also measured as weighted versus unweighted, chained and fixed base, and arithmetic and geometric mean aggregators. These differing methods are applied to 1998–1999 U.K. scanner data for washing machines, dishwashers, and vacuum cleaners. Heravi and Silver summarize their numerous findings on different methods by reporting results of a meta-analysis.

In chapter 9, “Price Indexes for Microsoft’s Personal Computer Software Products,” Jaison R. Abel, Ernst R. Berndt, and Alan G. White report on research based on the universe of Microsoft’s PC-based software transactions in the United States over the July 1993 through June 2001 time period. While previous literature has typically focused on retail prices (mail order in particular), their data encompass the much larger volume licensing and original equipment manufacturer channels. Using matched model methods (as their data are only from one manufacturer, the number of distinct models is too small for hedonic estimation), they take into account product changes, such as upgrades, and the transformation from stand-alone to integrated productivity suites. Although there are differences over time periods and across products, they find that the prices of Microsoft’s desktop operating systems and applications have generally been falling over this time period.
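As a rough illustration of the "dummy time hedonic" approach named in the chapter 8 summary, the sketch below regresses log price on a single quality characteristic and a period dummy, and reads the quality-adjusted index off the dummy coefficient. It is a minimal sketch under stated assumptions, not the method of any chapter in this volume: the function name `time_dummy_hedonic_index`, the single quality variable, the two-period setup, and the data are all hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np


def time_dummy_hedonic_index(log_price, quality, period):
    """Quality-adjusted price index for period 1 relative to period 0.

    Fits log(price) = a + b * quality + d * period by ordinary least
    squares, where `period` is a 0/1 dummy. exp(d) is the index.
    """
    # Design matrix: intercept, quality characteristic, time dummy.
    X = np.column_stack([np.ones_like(quality), quality, period])
    coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    return float(np.exp(coef[2]))


# Hypothetical data: three models observed in each of two periods.
# Average quality is higher in period 1, and the quality-adjusted
# log price is lower by 0.2 by construction.
quality = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0])
period = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
log_price = 1.0 + 0.5 * quality - 0.2 * period

hedonic_index = time_dummy_hedonic_index(log_price, quality, period)

# Unadjusted comparison: ratio of geometric mean prices across periods.
unadjusted_index = float(
    np.exp(log_price[period == 1].mean() - log_price[period == 0].mean())
)

print(f"unadjusted index: {unadjusted_index:.3f}")
print(f"hedonic index:    {hedonic_index:.3f}")
```

In this made-up example the unadjusted index rises because the period 1 models are of higher quality, while the hedonic index falls. That is the same qualitative pattern as an unadjusted index that looks flat or rising alongside a quality-adjusted index that declines.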
Finally, Zvi Griliches had a lifelong interest in assessing the contributions of R&D to economic growth. Over the years he devoted considerable effort to constructing price indexes for R&D that could be used to deflate R&D expenditures into real or quantity measures of R&D. In order to make international comparisons of the contribution of R&D to economic growth, R&D purchasing power parities (PPPs) relative to the United States have been employed; to date, these R&D PPPs have been assumed to be the same as GDP PPPs. In the final chapter of this section, chapter 10, “International Comparisons of R&D Expenditures: Does an R&D PPP Make a Difference?,” Sean M. Dougherty, Robert Inklaar, Robert H. McGuckin, and Bart van Ark develop PPPs for R&D expenditures in nineteen manufacturing industries and six countries (United States, France, Germany, Japan, the Netherlands, and the United Kingdom) for 1987 and 1997, based on separate R&D input prices for various cost categories, particularly labor and materials. They then examine the robustness of various
R&D PPP measures and argue for a preferred PPP that differs considerably from the current norm.

Information Technology and the Acceleration of Productivity Growth

Chapter 11, “Information Technology and the G7 Economies,” by Dale Jorgenson, describes the growth accounting model developed in his collaboration with Griliches and the elaboration and applications that have followed from it, in this case, international comparisons based on harmonized prices for information technology equipment and software. While the seminal Jorgenson and Griliches (1967) study highlighted the importance of disaggregating investment and capital stocks into their equipment and structures components, here Jorgenson focuses on disaggregating equipment into its information technology (IT) and non-IT components. Jorgenson finds that a powerful surge in investment in IT after 1995 characterizes all of the G7 economies. These IT investments accounted for a large portion of the resurgence in U.S. economic growth and a substantial but smaller portion in the remaining G7 economies. More generally, Jorgenson finds that investment in tangible assets was the most important source of economic growth in the G7 nations and, in particular, that the contribution of capital input exceeded that of productivity growth for all countries in all periods.

The next two chapters focus more specifically on microprocessors and IT equipment. In chapter 12, “The Role of Semiconductor Inputs in IT Hardware Price Declines: Computers versus Communications,” Ana Aizcorbe, Kenneth Flamm, and Anjum Khurshid calculate industry-specific semiconductor input price indexes and then assess the relative impact of changes in this high technology input price on the prices and quality improvement in two high-tech downstream industries, PCs and communications equipment.
They find that between 1997 and 1998, changes in semiconductor input prices appear to account for 20–30 percent of price declines in both consumer electronics and local area network (LAN) equipment and for 40–60 percent of price declines in computers. They conclude that differences in the composition of semiconductor input bundles, coupled with significant differences in the relative importance of semiconductor inputs in cost, potentially account for the entire difference between output price declines in the computer and communications equipment producing industries.

In chapter 13, “Computer Investment, Computer Networks, and Productivity,” B. K. Atrostic and Sang Nguyen contribute to the debate over how the IT revolution affects economic growth. Atrostic and Nguyen specifically focus on how computers are used, not just how many there are. They hypothesize that the productivity of a single computer is enhanced when it is connected to other computers. Using data from the new Computer Network Use Supplement (CNUS) database, Atrostic and Nguyen assess empirically the magnitude of this network effect on productivity growth. They find that the impact of networks is statistically large in “new” plants and may be as important as investment in computers itself as a source of growth in output per worker.

Measuring and Modeling Productivity, Consumption, and Diffusion

In chapter 14, “Services Productivity in the United States: Griliches’s Services Volume Revisited,” Barry Bosworth and Jack E. Triplett examine problems associated with measuring output in the service industries of the economy, one of the most prominent hard-to-measure sectors. Bosworth and Triplett begin by summarizing what has been learned from a number of the service-sector conferences held at the Brookings Institution over the last few years. These conferences ranged over issues in finance, insurance and banking, health and education, transportation, and trade. These are all sectors in which conventional measures of output are widely viewed as problematic and which Griliches (1994) dubbed as “unmeasurable” sectors. Bosworth and Triplett provide a brief assessment of current procedures for measuring the output of these sectors and then present estimates of the contribution of the service industries to the recent growth (and pick-up) in overall productivity, which they find to be substantial (in apparent contrast to earlier time periods). Information technology investments play an important role in the productivity growth of the service sectors.

In chapter 15, “A Consistent Accounting of U.S. Productivity Growth,” Eric J. Bartelsman and Joseph Beaulieu outline and develop a framework for integrating economic statistics from a variety of sources into a unified and internally consistent database.
The goal is to present the data in such a way that users can easily change assumptions regarding the way the data are organized and classified so that users can efficiently assess the robustness of their estimates to variations in methodology. The authors illustrate the usefulness of this framework by applying it to productivity measurement in light of the Y2K problem and the possible acceleration of capital retirement during the rush to invest in Y2K-compliant IT capital. When they correct for this effect, the growth rate of multifactor productivity in the nonfarm business sector is found to be larger in the period 1995–1999 and smaller for subsequent years 2000 and 2001. The contribution of capital is correspondingly smaller in the first period and larger in the second. This pattern is of potential importance for the literature on the role of IT investment in the widely discussed post-1995 productivity pick-up. In the following paper, chapter 16—“Should Exact Index Numbers Have Standard Errors? Theory and Application to Asian Growth,” Robert C. Feenstra and Marshall B. Reinsdorf examine the relatively neglected issue of estimating sample variance in the context of constructing exact
index numbers. Published index numbers are rarely accompanied by an indicator of variability, and it is thus difficult to assess whether a new estimate is significantly different from the previous one. This chapter contributes not only to the technical literature on the subject, but also applies the analysis to the specific case of total factor productivity (TFP) indexes and examines the question of whether TFP growth in Singapore has been negative or positive, which has become an issue of considerable controversy. The authors’ application illustrates both the relevance of the problem and the nature of the solution. The next chapter in this volume focuses very specifically on U.S. government data sets—one on measurement error in the venerable Consumer Expenditure Survey (CES) and the other on household time allocation as measured in the just recently introduced American Time Use Survey (ATUS). Much of Zvi Griliches’s research dealt with measurement error and with issues in the measurement of human capital, including the value of time. In chapter 17, “What Really Happened to Consumption Inequality in the United States?,” Orazio Attanasio, Erich Battistin, and Hidehiko Ichimura consider data quality issues for the analysis of consumption inequality exploiting two complementary data sets from the CES—one known as the Interview sample and the other as the Diary sample. The authors develop a methodology that extracts and combines the most reliable information from each sample to derive a correction for the measurement error affecting observed measures of consumption inequality in the two surveys. They conclude that consumption inequality, as measured by the standard deviation of the log of nondurable consumption, has increased by roughly 5 percent during the 1990s. Griliches’s classic 1957 study of hybrid corn emphasized the importance of economic incentives and profitability in the adoption and diffusion of a new technology. 
In the final paper in this section, chapter 18, “Technology Adoption from Hybrid Corn to Beta-Blockers,” Jonathan Skinner and Douglas Staiger return to a forty-year-old debate between Griliches and sociologists who emphasized the structure of organizations, informal networks, and “change agents” as forces affecting the diffusion of hybrid corn. Skinner and Staiger consider state-level factors associated with the adoption of a variety of technological innovations over the last seventy-five years: hybrid corn and farm tractors in the first half of the twentieth century, computers in the 1990s, and treatment following heart attacks with beta-blockers during the last decade. They find first that some states consistently adopted new effective technology, whether it be hybrid corn, farm tractors, or effective treatments for prevention of recurrent heart attacks, such as the beta-blockers. Second, the adoption of these new highly effective technologies was closely associated with social capital and state-level 1928 high school graduation rates, but not per capita income, population density, or (in the case of beta-blockers) expenditures on heart attack patients. Skinner and Staiger therefore reopen old debates and suggest new reasons for why medical practice varies geographically. They conjecture that economic models may be useful in identifying why some regions are more likely to adopt early, but sociological barriers (perhaps related to lack of social capital or informational networks) can potentially explain why other regions lag far behind. Future research on factors affecting new technologies, be they agricultural or medical innovations, will undoubtedly continue to assess empirically issues raised by Zvi Griliches in his pathbreaking PhD dissertation.
References

Boskin, Michael J., Ellen R. Dulberger, Robert J. Gordon, Zvi Griliches, and Dale W. Jorgenson. 1996. Toward a more accurate measure of the cost of living. Final report of the Advisory Commission to Study the Consumer Price Index. Washington, DC: Government Printing Office.
———. 1998. Consumer prices, the Consumer Price Index, and the cost of living. Journal of Economic Perspectives 12 (1): 2–26.
Greenspan, Alan. 1998. Problems of price measurement. Remarks presented at the annual meetings of the American Economic Association and the American Finance Association, Chicago.
Griliches, Zvi. 1986. Economic data issues. In Handbook of econometrics. Vol. 3, ed. Zvi Griliches and Michael D. Intriligator, 1466–1514. Amsterdam: Elsevier Science.
———, ed. 1992. Output measurement in the service sectors. Studies in Income and Wealth, vol. 56. Chicago: University of Chicago Press.
———. 1994. Productivity, R&D, and the data constraint. American Economic Review 84 (1): 1–23.
Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34 (99): 249–80.
Koopmans, Tjalling. 1947. Measurement without theory. Review of Economic Statistics 29 (3): 161–72.
Morgenstern, Oskar. 1950. On the accuracy of economic observations. 2nd ed., Princeton, NJ: Princeton University Press, 1963.
Nordhaus, William D. 1997. Do real-output and real-wage measures capture reality? The history of lighting suggests not. In The economics of new goods, ed. Timothy F. Bresnahan and Robert J. Gordon, 29–66. Studies in Income and Wealth, vol. 58. Chicago: University of Chicago Press.
1 Theory and Measurement: An Essay in Honor of Zvi Griliches
Charles R. Hulten
Charles R. Hulten is a professor of economics at the University of Maryland, chairman of the executive committee of the Conference on Research in Income and Wealth (CRIW), and a research associate of the National Bureau of Economic Research.

1.1 Introduction

Several revolutions in the field of economics occurred more or less simultaneously during the decades following World War II. One was the diffusion of formal mathematical techniques into economic theory with the Arrow-Debreu revolution in general equilibrium analysis and the Hicks-Samuelson revolution, which recast standard economics in a form that made quantitative analysis possible. At the same time, great advances were made in econometric analysis as techniques from statistics and agricultural economics were combined with the functional forms of the mathematical models. Computing power also increased dramatically during this period. Last, but certainly not least, the nation’s macroeconomic statistics were compiled in 1948 into a coherent and internally consistent form in the National Income and Product Accounts (NIPA).

Many of the contributors to this transformation are Nobel laureates: Simon Kuznets, George Stigler, Milton Friedman, Tjalling Koopmans, and Richard Stone. Although the contributions of Zvi Griliches were regrettably not recognized with a Nobel Prize, he was a figure of comparable importance. The significance of his early research in the field of applied econometrics and in the study of the factors making for technological change was recognized in 1965 when he was awarded the John Bates Clark Medal. His many contributions are reviewed in the last book he completed before his death in 1999, R&D, Education, and Productivity; they cover a broad range of subjects but are organized around the central question of how knowledge is generated, diffused, and embodied in production and growth processes.

The importance of careful measurement is another central theme in Griliches’s research. This theme permeated his work in specific areas like human capital and R&D, and while it did not produce the kind of explicit results that could be easily assessed, in sum it constituted one of his most important legacies. Griliches was keenly aware of the injunction by Tjalling Koopmans (1947) that measurement without theory is to be avoided and made major contributions to developing the linkages between the two. His seminal 1967 paper with Dale Jorgenson on the measurement of productivity change is a virtual manifesto on the need for theory to guide measurement practice. Yet, at the same time, Griliches was also deeply concerned about the limitations on empirical work posed by inadequate data and the tendency for economists to substitute more theory for better data when the latter are found wanting. His presidential address to the American Economic Association (Griliches 1994) is a forceful reminder of the danger of theory without measurement in empirical work.

Many others have paid tribute to his contributions to the various topics on which he worked. I will pay my tribute to his contributions to measurement as a whole and to the importance he attached to the accuracy of economic data. In addition to reviewing some of the key issues of the theory-measurement dichotomy, my remarks will build on this theme by stressing the essential duality of theory and measurement and argue that this dichotomy itself is potentially deceptive. First of all, “getting the data right” often requires “getting the theory right.” Second, theory and measurement are not separable aspects of economic activity in the sense that the accuracy of the data can affect subsequent economic decisions, and a complete economic theory must allow for possible feedback effects associated with inaccurate data.
Finally, considerations of political economy come into play because the accuracy of economic measurement affects policy in a variety of ways, and changes in policy can, in turn, affect economic behavior.

1.2 The Quantitative Transformation

The convergence of theory, statistics, and data in the 1950s and 1960s involved more than the incorporation of new techniques and methods: it also involved a shift in the sense of what was possible. Koopmans, for example, starts his famous article on “Measurement without Theory” with the example of synergism between the empiricism of Tycho Brahe and the theory of Johannes Kepler in the development of celestial mechanics. While no one seriously believed that the laws of economics were as precise as those of physical science, there was a new sense of the possibilities of quantitative economics and a newfound faith that there was enough stability in the
underlying economic structure that it is meaningful to give the “laws” of economics an exact mathematical form. And this translated into a program of research that encouraged the partnership of theory and measurement. This structuralist program is apparent in the development of empirical demand and production theory, as well as in the theory of growth. The early phases of this development transformed the largely graphical and heuristic description of the theory of the consumer and producer into a corresponding system of structural equations. This was followed by the effort to express the system in a form suitable for econometric analysis, in part by assigning specific functional forms to the demand and production systems and, more generally, to developing the nonstochastic specification of the system. In the estimation of production and investment functions, for example, ever more flexible functional forms were developed that allowed for more complex interactions between capital and labor inputs: the fixed-proportion and Cobb-Douglas forms were generalized into the constant elasticity of substitution (CES) function (and others), which in turn yielded to the translog and generalized Leontief forms. Duality theory was also developed during this period. The stochastic specification received relatively less attention at this time. Which variables to include in the analysis, and how they should be defined and measured, was also an important part of the nonstochastic specification of econometric models. The debate over whether real output should be measured net or gross of economic depreciation and a parallel debate over capital were central to the famous exchanges between Jorgenson and Griliches (1967, 1972) and Denison (1972). 
Questions about the inclusion of research and development and its relation to the multifactor productivity residual should also be mentioned (Griliches 2000, chapter 4), as should the role of the Jorgenson (1963) user cost of capital in the accounting system (Christensen and Jorgenson 1969, 1970).

A key insight emerged from these debates: theory and measurement were “dual” in the sense that a testable theory of economic growth is associated with a set of data accounts that corresponded to that theory. In this view, theory and measurement imposed mutual constraints on each other. This duality is nowhere more apparent than in the fundamental national income accounting identity between the value of output and the value of factor input. This identity can be derived from a constant-returns-to-scale production function using Euler’s theorem and the assumption of marginal cost pricing. Conversely, if one starts with the accounting identity, different structures of production are implied according to how the prices and quantities are actually measured and interpreted, as well as by what is included in the analysis. This is the Koopmans’s injunction writ large.
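The derivation of the accounting identity mentioned above can be written out in a few lines. What follows is a sketch under assumed notation that does not appear in the text: $Q$ is real output, $K$ and $L$ are capital and labor input, $P$ is the output price, and $c$ and $w$ are the rental price of capital and the wage. The pricing assumption is interpreted here as competitive pricing in which each input is paid the value of its marginal product.

```latex
\begin{align}
% Constant returns to scale: F is homogeneous of degree one.
Q &= F(K, L), \qquad F(\lambda K, \lambda L) = \lambda F(K, L)
     \quad \text{for all } \lambda > 0 \\
% Euler's theorem: differentiate with respect to lambda and set lambda = 1.
Q &= \frac{\partial F}{\partial K}\, K + \frac{\partial F}{\partial L}\, L \\
% Assumed pricing rule: inputs are paid the value of their marginal products.
P\, \frac{\partial F}{\partial K} &= c, \qquad
P\, \frac{\partial F}{\partial L} = w \\
% Multiply the Euler equation by P and substitute.
P\, Q &= c\, K + w\, L
\end{align}
```

The last line is the identity between the value of output and the value of factor input. Read in the other direction, any choice of how to measure $c$, $w$, $K$, and $L$ so that the identity holds carries with it an implied production structure, which is the sense in which the accounts and the theory constrain each other.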
1.3 “Data Woes” Unfortunately, the partnership between theory and measurement proved far from equal, and Griliches devoted a significant part of his 1994 presidential address to the American Economic Association to a discussion of this issue. In a section titled “Data Woes,” he asks the following question: Why are the data not better? The facts themselves are not in dispute. Every decade or so a prestigious commission or committee produces a report describing in detail various data difficulties and lacunae. (Griliches 1994, 14) He acknowledges that he “really doesn’t have good answers to this question,” but goes on to offer three observations. First, “measurement problems are really hard”; second, economists have little influence over the budget for data collection activities, and the statistical agencies are balkanized; and, finally, speaking of the economics profession: We ourselves do not put enough emphasis on the value of data and data collection in our training of graduate students and in the reward structure of our profession. It is the preparation skill of the chef that catches the professional eye, not the quality of the materials in the meal, or the effort that went into procuring them. (Griliches 1994, 14) This observation is an ironic inversion of Koopmans’s injunction against “measurement without theory”; it is conceptual ingenuity that builds the careers of both the brilliant chef and the successful academic. The ingredients are of secondary importance, and this leads to the risk of too much theory without measurement, or, more accurately, theory with measurement so long as someone else finds the ingredients. 
The published data series provided by government statistical agencies are, after all, Samuelsonian public goods, and academic researchers are aware of the disincentives associated with the attendant appropriability/free-rider problem.1

1. It should be emphasized that the neglect of measurement described by Griliches in the mid 1990s was relative to the emphasis it received in earlier periods. And many academic economists, like Griliches, did soldier on despite the diminished interest in measurement by academic economists as a whole. The Productivity Program at the National Bureau of Economic Research, directed by Griliches, and those associated with it deserve special mention in this regard. The same should be said of the CRIW, which was founded in the 1930s to work on the conceptual problems associated with the development of the national accounts and which has continued to promote research on data issues. However, it is also true that during the period with which Griliches was concerned, publication of data-oriented research in prestigious journals became more difficult, and courses on subjects like national income accounting disappeared from graduate curricula.

This situation is well illustrated by the notoriously hard problems involved in measuring the output of the service-producing sectors of the economy. Griliches pointed out the true source of the difficulty: the inability to define in principle what exactly is meant when we speak of the output of banks, insurance companies, doctors, lawyers, teachers, and so on, which he termed the “hard-to-measure” sectors. It is relatively easy for theory to work with concepts like “real output” in an abstract way, without having to take on the hard issue of just what it really means or, significantly, to define the units in which the output of services is to be measured. If theory fails to help with this problem, how is the statistician to implement the Koopmans’s injunction? As Griliches puts it: [I]t is not reasonable for us to expect the government to produce statistics in areas where concepts are mushy and where there is little professional agreement on what is to be measured and how. (Griliches 1994, 14)

Moreover, the difficulty of linking theory and measurement became progressively harder as theory evolved. The theoretical models of the early stages of the Hicks-Samuelson revolution tended to be highly aggregative, or based on the representative agent, and also tended to assume perfect competition (e.g., the aggregate production function underlying growth theory and growth accounting). This made for a tidy paradigm to guide the development of macroeconomic data, but theory also showed that the conditions that made for exact aggregation were highly unlikely (Fisher 1965). Subsequent theoretical development took a more disaggregated view of the world, a world that is inherently more messy, with heterogeneous agents and imperfect information and competition. Correspondingly, interest in measurement issues became more “micro” and cross-sectional (panel data) as well as more field specific. Academic interest in the quality of “official” macro statistical series waned, even as empirical work with microeconomic data sets increased. Moreover, the structural parameters of economic models became harder to identify as these models became more complicated, and the structuralist paradigm of the earlier period was challenged by reduced-form approaches.
The current debate over the interpretation of regression coefficients in the hedonic price model is a recent example of the tension between the two approaches (Pakes 2003; Hulten 2003). Thus, the profession’s apparent neglect of data issues was, in part, a shift in emphasis from macro to micro levels of theory and measurement. Still, it would be hard to make a persuasive case that the economics profession placed much priority on its data during this period. As Zvi Griliches (1986) observed in his contribution to the Handbook of Econometrics, the term data is the plural form of the Latin word datum, which means “given,” and one might further say that researchers in this period were generally happy to be given their data. 1.4 The Greenspan Critique Both theory and data are important elements in the formulation of monetary and fiscal policy. Getting the theory “right” is a priority, as witnessed
by the debate over the Phillips curve in the analysis of price inflation (the critiques of Friedman [1968] and Lucas [1976] are discussed below). But it is also important to know what the current rate of inflation actually is before deciding whether policy intervention is needed. It became increasingly apparent to policymakers in the 1980s and 1990s that the lack of reliable data was a binding constraint on policy formulation. The most pointed criticism came from Federal Reserve Board Chairman Alan Greenspan in the mid-1990s. He suggested, in remarks to the Senate Finance Committee in 1995, that the growth rate of the CPI might be biased upward by 0.5 to 1.5 percentage points per year (Greenspan 1995). This is a very significant bias given that the year-to-year change in the CPI averaged around 3 percent in the years immediately preceding Greenspan’s remarks. An upward bias of this magnitude presented a rather different picture of price inflation and, as we shall see, had major implications for programs like Social Security, in which expenditures are indexed for inflation using the growth rate of the CPI. After a panel of prominent economists concurred with Greenspan’s conjecture of a bias of about 1 percentage point, a commission was established to investigate further (chaired by Michael Boskin, whose earlier efforts to improve the quality of economic data became known as the “Boskin Initiative”). The commission, of which Griliches was a member, also found a bias of around 1 percentage point, and attributed about half to methodological issues and half to a failure to capture dynamic improvements in product quality and the development of new goods. This second source of bias in the CPI pointed to another dimension of the Greenspan critique: Greenspan (1998) also questioned the ability of existing macrodata to represent the true dynamism of the American economy, citing the implausibly low estimates of real output growth in the service sectors of the economy.
The NIPA, for example, evolved at a time (the 1930s and 1940s) in which the production of tangible goods in the manufacturing, agricultural, and natural resource sectors was the major source of gross domestic product (GDP; according to NIPA estimates, the service sectors accounted for 36 percent of GDP in 1947, shortly before the NIPA were launched, and for 56 percent by 1997). With this intersectoral shift came a shift away from the tangibility of output and toward the hard-to-measure intangible services where the units of real output, and thus economic growth, are hard to pin down. A similar problem occurs with the measurement of “knowledge” capital (Corrado, Hulten, and Sichel 2005). Much of the dynamism of the U.S. economy is the result of scientific and technological innovation, which is reflected in the rapid rate of product and process innovation arising from the commitment of resources to education, research, and development activities. As with services, the output associated with these activities is largely intangible and hard to measure: in what units should knowledge be
measured? In what units should innovation-induced quality differences in successive generations of computers be measured? The Boskin Commission estimated that half of its CPI bias was due to the mismeasurement of quality improvement and the introduction of new goods (Boskin et al. 1996), and Shapiro and Wilcox (1996), who produced similar estimates, likened the measure of quality change to “house-to-house combat.” Moreover, estimates of investment by the U.S. private business sector during the 1990s suggest that expenditures for intangible capital were as large as spending for fixed capital. The latter is treated as a component of measured GDP, but intangible investment is not. The U.S. statistical system has confronted these challenges: changes to the CPI have reduced the growth rate of the index by around three-quarters of a percentage point, again a rather sizeable change in percentage terms and a significant proportion of the bias estimated by the Boskin Commission. Programs have been initiated or planned at the Bureau of Economic Analysis that are aimed at more accurate measurement of service-sector output, at improving the way industry output is measured, and, more generally, at better characterization of the role of knowledge in the evolution of the economy (quality-adjustment of high-technology goods, capitalization of software expenditures, and the prospective incorporation of research and development [R&D] investment into the NIPA). Many of these changes have been made in the context of a renewed concern for the architecture or overall design of the statistical system and, in particular, with a concern for how theory shapes measurement.2 In sum, the years since Greenspan’s critique of the mid-1990s have seen an acceleration of change in the macrostatistical system along Koopmansian lines. These years have also seen a renewal of the partnership between theory and measurement, and between academe and the statistical agencies. 
Though it is difficult to quantify, this renewal is reflected in the membership of the various CPI commissions, in the increased number of conferences and workshops attended by economists from both academe and the agencies (the CRIW has greatly expanded its activities in this area), and in the academic membership in agency advisory committees. These trends are part of the Griliches legacy.
2. The recent CRIW conference on “A New Architecture for the National Accounts,” organized by Dale Jorgenson, Steve Landefeld, and William Nordhaus in 2004, is noteworthy in this regard, as is the paper by Fraumeni and Okubo (2005) presented at an earlier CRIW conference on “Capital and the New Economy.” It is also worth noting, as a matter of historical perspective, that the “architecture” of the U.S. NIPA was strongly influenced by the data needs associated with the Great Depression and World War II. This influence left the NIPA with a distinctly Keynesian personality oriented to accounting for short-run expenditure flows. As interest shifted toward explaining the high rate of economic growth and innovation after World War II, growth and production theory became the theoretical paradigms for the design of the national accounts rather than Keynesian demand-management theory, and the problems of accounting for capital input and real output assumed a new design priority.
Charles R. Hulten
1.5 Koopmans Redux
Koopmans’s injunction is grounded in a view of theory and measurement inspired by the physical sciences: the data associated with a system, physical or economic, are a reflection of the principles that guide the evolution of the system, and thus these principles are the ones that must be used to organize the data used to study the system. However, while the analogy with physical science is instructive, it is flawed in one important respect. In the physical world, an atom of iron cannot decide to become a chlorine atom in order to maximize utility; in the economic world, a steel worker can decide to switch jobs and become a chemical-industry worker on the basis of the available information about the attractiveness of employment in the two industries. Moreover, if that information is inaccurate or erroneous, the error may well end up affecting the worker’s future life. In other words, there are feedback mechanisms in economic systems through which measurement error can affect the evolution of the economy. A complete description of the system must therefore include a theory of how measurement error and, more generally, how partial or inaccurate information interacts with the system as a whole.
Two feedback mechanisms are important in this regard. The first involves the perception of measurement error on the part of economic agents and their behavioral reaction. In general, each new round of official macrostatistics adds to the body of past statistics and is interpreted by economic agents in light of the perceived accuracy of those statistics. Subsequent decisions are based on the interpretation given to each increment to the decision makers’ information set, and data bias can therefore become embedded in the evolution of subsequent behavior. This behavioral effect can be thought of as an economic bias, and the key point is that this economic bias may differ from the underlying statistical bias.
A statistical bias in the CPI might, for example, cause people to alter their behavior relative to what it would have been if the statistical bias had been zero, but there is no straightforward link between the size of the measurement error and the behavioral reaction to it. Indeed, a systematic statistical bias in the CPI may not produce any economic bias at all if agents are not “fooled” by the error and therefore do not alter their behavior. This result might seem, at first glance, to be counterintuitive, but it is in fact an extension of the Friedman (1968) analysis of the Phillips curve and the Lucas (1976) critique of policy effectiveness. Friedman and Lucas argued that a tradeoff between inflation and unemployment assumes a high degree of consumer ignorance and that rational consumers would “pierce the veil of money” and focus on the real economy, leaving unemployment unchanged at its natural rate as inflation accelerated. The same logic applies to measurement errors. If agents are not fooled into changing their behavior by a policy-induced increase in the rate of inflation, they would not be fooled by an error-induced increase.
As a more concrete example of this critique, suppose that every agent perceives that the growth of the CPI is biased upward by 1 percentage point each year and that recipients of cost-of-living wage adjustments (COLAs) therefore believe that their COLAs overcompensate them for the effects of price inflation. Those bargaining with them in wage-setting negotiations would perceive the same bias, and the final wage bargain might well reflect this mutual knowledge, leaving the total compensation package (basic wage, COLA, and benefit packages) invariant to the bias. Moreover, if the CPI were returned to its unbiased growth rate on the advice of experts and both sides perceived the change, the previous total compensation package would be unchanged even though the components would be adjusted to reflect the new unbiased COLA.
This line of reasoning presumes full rationality and complete information. A data bias may have a different effect in other models of economic decision making. If agents are completely ignorant of the 1 percentage point CPI bias, negotiations would proceed as though the biased COLA were an accurate compensation for inflation, and the bias would not be offset by a reduction in the other components of total compensation. Moreover, removing the CPI bias would now affect the distribution of income and most likely change the subsequent evolution of incomes and relative prices.
The possibility of the endogenous feedback of measurement to theory implies that the degree of measurement “accuracy” is an organic and nonseparable aspect of the economic system, and the data are not just external characteristics of the system. This, in turn, opens a new frontier beyond Koopmans’s injunction to be explored in future debates over the meaning of data accuracy and the corresponding role of theory.
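The two polar cases in the COLA example can be put in numbers. The following is only a minimal sketch under a strong simplifying assumption that is not in the essay: both sides bargain toward the same real raise, so the non-COLA components of pay absorb whatever part of the COLA the bargainers believe to be bias. The function name and the figures (3 percent true inflation, a 1 point CPI bias, a 1 percent real raise) are hypothetical.

```python
def nominal_compensation_growth(true_inflation, cpi_bias, perceived_bias, real_raise):
    """Percent growth in total compensation (COLA plus all other components).

    Illustrative assumption: both sides target the same real raise, so the
    non-COLA components offset whatever part of the COLA they believe is bias.
    All arguments are in percentage points per year.
    """
    cola = true_inflation + cpi_bias              # COLA tracks the published CPI
    believed_inflation = cola - perceived_bias    # inflation as the bargainers see it
    other_components = real_raise + believed_inflation - cola
    return cola + other_components


# Full information: everyone knows the CPI overstates inflation by 1 point.
informed_biased = nominal_compensation_growth(3.0, 1.0, 1.0, 1.0)
informed_fixed = nominal_compensation_growth(3.0, 0.0, 0.0, 1.0)

# Complete ignorance: the biased COLA is taken at face value.
naive_biased = nominal_compensation_growth(3.0, 1.0, 0.0, 1.0)
naive_fixed = nominal_compensation_growth(3.0, 0.0, 0.0, 1.0)

print(informed_biased, informed_fixed)  # same package with or without the bias
print(naive_biased, naive_fixed)        # removing the bias shrinks the package
```

Under full information the statistical bias produces no economic bias: the package is the same before and after the CPI is corrected. Under complete ignorance the correction lowers nominal compensation growth by the full amount of the bias, which is the distributional effect described above.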
Future research must also come to terms with a second type of data feedback mechanism: the possibility that economic agents may want to change official statistics instead of changing their economic behavior. Where the first type of feedback described previously operates directly through the choices of individual economic agents, the second operates indirectly through collective choice in the political arena. The opportunity for the political-economy feedback effect arises because many of the relevant macroeconomic statistics are produced by government agencies, and agency budgets and programs are subject to the control of the executive and legislative branches of the federal government, which, in turn, are elected by the very agents whose welfare and decisions may be affected by the accuracy of the data. The possibility that some agents may exert pressure through the political process puts the issue of data accuracy on a slippery political slope. It
is tempting to stand on the moral high ground and assert that the national statistical system should be insulated from the political process and reflect only the experts’ assessment of best-practice measurement technique. However, while this position may seem unassailable from the heights of the ivory tower, and while it embodies a large dose of truth, it ignores the larger reality of the national statistical system: like all other aspects of government in a democratic system, a nation’s official statistics are subject to the consent of the governed. It is ultimately up to the governed to decide whether to accept expert advice about statistics like the CPI (or, more accurately, which expert’s advice to choose). The public may be well advised to insulate the statistical system from the vicissitudes of politics, but it is their choice to make.
The history of the CPI over the last decade again illustrates the importance of these political-economy issues. Perhaps the most startling finding of the Boskin Commission was the cost to the government of the estimated CPI bias: approximately $1 trillion over twelve years in Social Security and other COLAs based on the CPI. Taken as an unintended government outlay, the estimated bias would rank as one of the largest federal government programs. Conversely, the cost of removing the CPI bias to those receiving COLAs or inflation escalators based on the CPI was likewise $1 trillion. Surely those whose future pensions are reduced have a right to know the reasons for the reduction and to challenge the expert opinion supporting the change with their own analysis to tell their side of the story.
The implications of these political-economy considerations for economic measurement are illustrated by a National Research Council (NRC) committee empaneled to study the CPI in light of the Boskin Commission report. The NRC report At What Price?
(2002) produced two competing visions of the CPI, with different implications for the distribution of Social Security benefits. The first is based on the economic theory cost-of-living index (COLI), the second on the traditional view of the CPI as a fixed basket of products priced in successive periods, whose composition is updated from time to time (termed by the NRC committee the “cost-of-goods index,” or COGI). The former seeks to ground the CPI as firmly as possible in economic theory, following Koopmans, and provides conceptual support for the policy of reducing the bias in the CPI and thereby decreasing COLAs based on the CPI. The COGI approach, on the other hand, focuses on the shortcomings of theory (e.g., the near impossibility of aggregating data derived from heterogeneous consumers) and opts for a more heuristic basis for the CPI: a Laspeyres index applied to a fixed (or slowly changing) market basket of goods and services. It thus provides conceptual support for the larger COLAs of the pre-Greenspan–Boskin Commission period. The weight of opinion of the economics profession is undoubtedly on the side of the COLI approach. This is the standard textbook solution, and
it gives the COLI approach the perception of scientific credibility with the public as a whole or, at least, with its representatives in Congress who exercise political oversight over the Bureau of Labor Statistics. It should be remembered, in this regard, that the changes in the CPI precipitated by the Boskin Commission emanated from the political process (the Senate Finance Committee), not the expert community of economists. The “perceived credibility” standard was explicitly invoked by the NRC report in its discussion of the use of price hedonic techniques in the CPI program. It endorsed price hedonics as the most promising method for adjusting CPI prices for changes in product quality, but also cautioned the Bureau of Labor Statistics to proceed slowly in actually implementing the technique because hedonic regressions sometimes produce “strange-looking variable coefficients [which] could be indicative of larger problems” (NRC 2002, 142). The NRC panel did not elaborate on this doctrine, nor did it define “perceived credibility,” but it did comment that “it is hard to know when a hedonic function is good enough for CPI work” (NRC 2002, 143). However, the report presumably has in mind both the need for professional consensus on the “science” involved and acceptance of this consensus by the public at large.
My own conclusion is that the doctrine of perceived credibility applies to measurement issues far beyond its application by the NRC report to price-hedonic problems (Hulten 2003). I have argued, more generally, that the adage that “an old tax is a good tax” also applies to data. People adjust their behavior in response to a tax, winners and losers are sorted out as the tax matures, and changes are negotiated and unforeseen defects are ameliorated.
A change in the statistical system operates in much the same way: people adjust their behavior in light of the new data (the first type of feedback effect), and modifications may be made as the data “mature” (the second feedback effect). The roles of the Boskin Commission and the NRC panel should be seen in this light.
1.6 Conclusion
I conclude my tribute to Zvi Griliches with the observation that the quality of data mattered a lot to him. He was bothered by the propensity of many empirical studies to use what he called “found data,” and knew that the quality of the ingredients was at least as important as the method of preparation, to use his own metaphor. He had an uncanny ability in seminars to “sniff” out bad “ingredients” even when they were artfully concealed. He was one of the few prominent economists insisting on the importance of linking theory and measurement. This insistence was particularly important during the period in which academic interest in macroeconomic data was on the wane. This situation has begun to change as the realization
grows that (a) macroeconomic policy is constrained by inadequate data as much as by inadequate theory and (b) problems of economic measurement are often caused by inadequate theory. The presence of potential feedback effects makes the theory-measurement nexus even more difficult. As models of limited information penetrate more deeply into the core of economic thinking, the importance of these feedback effects will become more and more apparent and will force theorists to incorporate the issue of data accuracy into their theoretical models. The field of economic measurement, which Griliches nurtured with such care and dedication, is (one hopes) set to bloom again.
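As a supplementary illustration of the fixed-basket index discussed in section 1.5, the sketch below compares a Laspeyres index, which prices the base-period basket, with Paasche and Fisher indexes that also use current-period quantities. The two goods, their prices, and their quantities are invented for the example. The Fisher index serves here only as a conventional stand-in for an index that allows for consumer substitution; it is not claimed to be the measure the NRC report or the Bureau of Labor Statistics had in mind.

```python
from math import sqrt


def laspeyres(p0, p1, q0):
    """Cost of the base-period basket q0 at new prices relative to old prices."""
    return sum(p1[g] * q0[g] for g in q0) / sum(p0[g] * q0[g] for g in q0)


def paasche(p0, p1, q1):
    """Cost of the current-period basket q1 at new prices relative to old prices."""
    return sum(p1[g] * q1[g] for g in q1) / sum(p0[g] * q1[g] for g in q1)


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical data: beef becomes dearer and consumers shift toward chicken.
p0 = {"beef": 4.0, "chicken": 2.0}
p1 = {"beef": 6.0, "chicken": 2.0}
q0 = {"beef": 10.0, "chicken": 10.0}
q1 = {"beef": 6.0, "chicken": 16.0}

L = laspeyres(p0, p1, q0)
P = paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)
print(round(L, 4), round(P, 4), round(F, 4))
```

With these numbers the fixed-basket index shows the largest price increase because it ignores the shift away from the good whose price rose. That gap is one source of the difference between the COGI and COLI views, and hence of the difference in the COLAs each would support.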
References Boskin, Michael J., Ellen R. Dulberger, Robert J. Gordon, Zvi Griliches, and Dale W. Jorgenson. 1996. Toward a more accurate measure of the cost of living. Final Report of the Advisory Commission to Study the Consumer Price Index. Washington, DC: Government Printing Office. Christensen, Laurits R., and Dale W. Jorgenson. 1969. The measurement of U.S. real capital input, 1929–1967. Review of Income and Wealth 15 (December): 293– 320. ———. 1970. U.S. real product and real factor input, 1929–1969. Review of Income and Wealth 16 (March): 19–50. Corrado, Carol, Charles Hulten, and Daniel Sichel. 2005. Measuring capital and technology: An expanded framework. In Measuring capital in the new economy, ed. Carol Corrado, John Haltiwanger, and Daniel Sichel, 11–41. Studies in Income and Wealth, vol. 65. Chicago: University of Chicago Press. Denison, Edward F. 1972. Some major issues in productivity analysis: An examination of the estimates by Jorgenson and Griliches. Survey of Current Business 49 (5, pt. II): 1–27. Fisher, Franklin. 1965. Embodied technical change and the existence of an aggregate capital stock. Review of Economic Studies 32 (4): 263–88. Fraumeni, Barbara M., and Sumiye Okubo. 2005. R&D in the national income and product accounts: A first look at its effect on GDP. In Measuring capital in the new economy, ed. Carol Corrado, John Haltiwanger, and Daniel Sichel, 275–321. Studies in Income and Wealth, vol. 65. Chicago: University of Chicago Press. Friedman, Milton. 1968. The role of monetary policy. American Economic Review 58 (1): 1–17. Greenspan, Alan. 1995. Prepared statement. In Consumer Price Index: Hearings before the Committee on Finance, United States Senate, 109–15. Washington, DC: Government Printing Office. ———. 1998. Remarks at the annual meeting of the American Economic Association and the American Finance Association, Chicago. Griliches, Zvi. 1986. Economic data issues. In Handbook of econometrics. Vol. 3, ed. 
Zvi Griliches and Michael D. Intriligator, 1465–1514. Amsterdam: Elsevier Science. ———. 1994. Productivity, R&D, and the data constraint. American Economic Review 84 (1): 1–23.
———. 2000. R&D, Education, and Productivity. Cambridge, MA: Harvard University Press. Hulten, Charles R. 2001. Total factor productivity: A short biography. In New directions in productivity analysis, ed. Charles R. Hulten, Edwin R. Dean, and Michael J. Harper, 1–47. Studies in Income and Wealth, vol. 63. Chicago: University of Chicago Press. ———. 2003. Price hedonics: A critical review. Federal Reserve Bank of New York Economic Policy Review 9 (3): 5–15. Jorgenson, Dale W. 1963. Capital theory and investment behavior. American Economic Review 53 (2): 247–59. Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34 (July): 349–83. ———. 1972. Issues in growth accounting: A reply to Edward F. Denison. Survey of Current Business 52:65–94. Koopmans, Tjalling. 1947. Measurement without theory. Review of Economic Statistics 29 (3): 161–72. Lucas, Robert E., Jr. 1976. Econometric policy evaluation: A critique. In Carnegie Rochester Conference on Public Policy, 19–46. Amsterdam: North-Holland. National Research Council. 2002. At what price? Conceptualizing and measuring cost-of-living and prices indexes. Ed. Charles L. Schultze and Christopher Mackie. Washington, DC: National Academy Press. Pakes, Ariel. 2003. A reconsideration of hedonic price indices with an application to PCs. American Economic Review 93 (5): 1578–96. Shapiro, Michael D., and David W. Wilcox. 1996. Mismeasurement in the Consumer Price Index: An evaluation. In NBER macroeconomics annual 1996, ed. Ben S. Bernanke and Julio Rotemberg. Cambridge, MA: MIT Press.
II
Classic Input Measurement Issues Revisited
2
Production Function and Wage Equation Estimation with Heterogeneous Labor
Evidence from a New Matched Employer-Employee Data Set
Judith K. Hellerstein and David Neumark
2.1 Introduction
The measurement of the labor input in production functions arose as an important issue in the middle of the twentieth century, when growth accountants speculated that the large “residual” in economic growth calculations might be due not to disembodied technical change but rather to mismeasurement of the labor input. Since that time, many economists have implemented methods to try to measure more accurately the quality of the labor input (and, when appropriate, its change over time).1 Recent advances in the creation of matched employer-employee data sets have markedly improved our ability to measure the labor input at the level of the establishment. These matched data sets contain detailed information on the characteristics of workers in establishments, which can be used to model and measure the labor input directly, accounting for the different types of workers employed in each establishment. Moreover, estimates of the relationships between the characteristics of workers and their productivity can be contrasted with estimates of the relationships of these characteristics to wages to test theories of wage determination.
Judith K. Hellerstein is an associate professor of economics at the University of Maryland, and a research associate of the National Bureau of Economic Research. David Neumark is a professor of economics at the University of California, Irvine, and a research associate of the National Bureau of Economic Research. Many aspects of this paper were inspired by the teaching, research, and guidance that we were privileged to receive from Zvi Griliches. We thank Chuck Hulten and John Abowd for helpful comments and Melissa Powell and, especially, Joel Elvery for excellent research assistance. We also thank Kim Bayard, Gigi Foster, and Nicole Nestoriak for help in the construction of the Decennial Employer-Employee Dataset (DEED). Our long-term research with the data sets used in this paper has been supported by National Science Foundation (NSF) grant SBR95-10876, the Russell Sage Foundation, and National Institutes of Health (NIH) grant 1-R01-HD43358-01A1. This paper reports the results of research and analysis undertaken while the authors were research affiliates at the Center for Economic Studies at the U.S. Census Bureau. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. Results have been screened to ensure that no confidential information is revealed. Neumark is also a senior fellow of the Public Policy Institute of California and a research fellow of the Institute for the Study of Labor (IZA). Research results and conclusions expressed are those of the authors and do not reflect the views of the Census Bureau or the Public Policy Institute of California.
1. For a comprehensive review of the measurement of the labor input, see Griliches (2000, chapter 3).
In this paper, we first describe a recently constructed matched employer-employee data set for the United States that contains detailed demographic information on workers (most notably, information on education). This data set, known as the 1990 Decennial Employer-Employee Dataset (or 1990 DEED), is a match, created using address-matching software, between the 1990 Decennial Long Form of the Census of Population and the 1990 Standard Statistical Establishment List (SSEL). It is much larger and more representative than previous matched data using the Decennial Long Form data. We then use the data from manufacturing establishments in the 1990 DEED to update and expand on previous findings, obtained with a more limited data set, regarding the measurement of the labor input and theories of wage determination (Hellerstein, Neumark, and Troske 1999). Finally, we examine estimates of some of the key characteristics of production functions and how sensitive they are to the specification and measurement of the labor input.
We find that the productivity of women is less than that of men, but not by enough to fully explain the gap in wages, a result that is consistent with wage discrimination against women. In contrast, we find no evidence of wage discrimination against blacks.
We estimate that both wage and productivity profiles are rising but concave to the origin (consistent with profiles quadratic in age), but the estimated relative wage profile is steeper than the relative productivity profile, consistent with models of deferred wages. We find a productivity premium for marriage equal to the wage premium and a productivity premium for education that somewhat exceeds the wage premium. Exploring the sensitivity of these results, we also find that different specifications of production functions do not have any qualitative effects on these findings. Finally, the results indicate that the estimated coefficients on the productive inputs (capital, materials, labor quality) as well as the residual variance are virtually unaffected by the choice of the construction of the labor quality input. We explore why this is so and discuss why the results might not generalize to estimating production functions and calculating total factor productivity (TFP) growth using longitudinal data.
2.2 The Construction and Evaluation of the 1990 DEED
2.2.1 Introduction
Fifteen years ago, data sets matching employees with their employers were virtually nonexistent. Fortunately, since then matched employer-employee data sets have been created, first for other countries and then more recently for the United States. Indeed, in the most recent volumes of the Handbook of Labor Economics (Ashenfelter and Card 1999), a full chapter is devoted to research using these data (see Abowd and Kramarz 1999).
This section of the paper reviews the construction and evaluation of a new U.S. matched employer-employee data set, based on the Decennial Census of Population for 1990.2 The key innovation in this data set, which we call the 1990 DEED (Decennial Employer-Employee Dataset), is that we match workers to establishments by using the actual written worker responses to the question asking respondents to list the name and business address of their employer in the week prior to the Census. These responses are matched to a Census Bureau file containing business name and address information for all establishments in the United States.
The resulting data set is very large, containing information on 3.2 million workers matched to nearly one million establishments, accounting for 27 percent of workers who are Long-Form respondents in the Decennial Census and 19 percent of active establishments in the 1990 SSEL, an administrative database containing information for all business establishments operating in the United States in 1990. As it stands, it is the largest national matched employer-employee database covering the United States that contains detailed demographic information on workers,3 making it a rich source of information for studying a variety of questions of interest to labor economists, demographers, and others.
2. For a complete description of the construction and evaluation of the data set, see Hellerstein and Neumark (2003).
3.
Work on the construction of the 2000 DEED is underway. Another national matched employer-employee data set currently under construction at the U.S. Census Bureau is a match between state-level data from worker unemployment insurance records and ES-202 records as part of the broader Longitudinal Employer Household Database (LEHD) project. These matched data are very rich in that they contain observations on all workers in covered establishments (not limited to the one-in-six sample of Census Long-Form respondents) and are longitudinal in nature although they do not cover all states (but do cover some of the largest ones). Until recently, these data could not be linked to Decennial Census data, and therefore detailed demographic information on a large sample of workers was not available in the LEHD. In addition, the matching algorithm matches workers to firms within a state rather than establishments so that an exact match between workers and establishments can only be made when the establishment is not part of a multiunit firm within a state. For details, see http://www.lehd.dsd.census.gov. For a good example of how these data can be used, see Abowd, Lengermann, and McKinney (2002).
2.2.2 Previous Matched Data Using the 1990 Decennial Census
In past research, we have used or created two more limited matched data sets based on the 1990 Census of Population. The first data set we have used covers manufacturing only, and is called the Worker-Establishment Characteristics Database (WECD). The second, which we created, covers all industries and is called the New Worker-Establishment Characteristics Database (NWECD). The matched WECD and NWECD data sets are constructed from two data sources: the 1990 Sample Edited Detail File (SEDF), which contains all individual responses to the 1990 Decennial Census one-in-six Long Form, and the 1990 SSEL. The WECD and NWECD were created by using the detailed industry and location information for employers available in both the 1990 SEDF and the 1990 SSEL to link workers to their employers. The WECD and NWECD have proven very valuable. However, they also have some important limitations that are ameliorated in the DEED. To explain the advantages of the DEED, it is useful to first discuss the construction of the WECD and NWECD and then the construction of the DEED.
Households receiving the 1990 Decennial Census Long Form were asked to report the name and address of the employer in the previous week for each employed member of the household. In addition, respondents were asked for the name and a brief (one or two word) description of the type of business or industry of the most recent employer for all members of the household. Based on the responses to these questions, the Census Bureau assigned geographic and industry codes to each record in the data, and it is these codes that are available in the 1990 SEDF.
The SSEL is an annually updated list of all business establishments with one or more employees operating in the United States. The Census Bureau uses the SSEL as a sampling frame for its economic censuses and surveys and continuously updates the information it contains.
The SSEL contains the name and address of each establishment, geographic codes based on its location, its four-digit Standard Industrial Classification (SIC) code, and an identifier that allows the establishment to be linked to other establishments that are part of the same enterprise and to other Census Bureau establishment- or firm-level data sets that contain more detailed employer characteristics.4
4. In both the SEDF and the SSEL, the level of detail of the geographic codes depends on the location of the employer. In metropolitan areas, the Census Bureau assigns codes that identify an employer’s state, county, place, tract, and block. A block is the smallest geographic unit defined by the Census in the SEDF and the SSEL. A typical block is that segment of a street that lies between two other streets but could also be a street segment that lies between a street and a “natural” boundary such as a river or railroad tracks. A tract is a collection of blocks. In nonmetropolitan areas, the Census Bureau defines tracts as “Block Numbering Areas” (BNAs), but for our purposes, tracts and BNAs are equivalent. A Census designated place is a geographic area or township with a population of 2,500 or more.
Matching workers to employers to create the WECD and the NWECD proceeded in four steps. First, we standardized the geographic and industry codes in the SEDF and the SSEL. Second, we selected all establishments that were unique in an industry-location cell. Third, all workers who indicated they worked in the same industry-location cell as a unique establishment were matched to the establishment. Finally, we eliminated all matches based on imputed data. The WECD is restricted to manufacturing plants and is also matched to data from the Longitudinal Research Database (LRD), which provides the ingredients necessary to estimate production functions.
While the WECD and NWECD have yielded new research methods and previously unavailable results, there are a few shortcomings of these data sets that are of serious concern. First, because the match is based on the geographic and industry codes, we matched workers only to establishments that are unique in an industry-location cell, in order to ensure that we linked workers to the correct employers. This substantially reduces the number of establishments available for matching. Of the 5.5 million establishments in the 1990 SSEL with positive employment, only 388,787 are unique in an industry-location cell. (These numbers are for the NWECD; they are much smaller for the WECD, which is restricted to manufacturing.) Once we matched these establishments to workers and imposed a few other sample restrictions to improve the accuracy of the data, we ended up with a data set including about 900,000 workers in 138,000 establishments, covering 7 percent of all workers in the SEDF and 3 percent of all establishments in the SSEL. Second, although this is still a very large data set, matching on location and industry codes affects the representativeness of the resulting matched data. Establishments in the WECD and NWECD are larger and are more likely to be located in a metropolitan statistical area (MSA) than the typical establishment in the SSEL.
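The four-step WECD/NWECD procedure can be sketched in a few lines. This is only an illustration of the logic as described in the text, not the code used to build those data sets. The record layout, the field names (`estab_id`, `worker_id`, `industry`, `location`, `imputed`), and the sample codes are all hypothetical, and the imputation check is applied to worker records only as a simplifying assumption.

```python
from collections import Counter


def standardize(code):
    """Step 1: put industry and location codes in a common form."""
    return str(code).strip().upper()


def cell(record):
    """Industry-location cell for a worker or establishment record."""
    return (standardize(record["industry"]), standardize(record["location"]))


def match_unique_cells(establishments, workers):
    """Return {worker_id: estab_id} for workers in cells holding one establishment."""
    # Step 2: keep only establishments that are alone in their cell.
    counts = Counter(cell(e) for e in establishments)
    unique = {cell(e): e["estab_id"] for e in establishments if counts[cell(e)] == 1}

    matches = {}
    for w in workers:
        # Step 4: discard matches that would rest on imputed codes.
        if w["imputed"]:
            continue
        # Step 3: match the worker to the unique establishment in the same cell.
        if cell(w) in unique:
            matches[w["worker_id"]] = unique[cell(w)]
    return matches


establishments = [
    {"estab_id": "E1", "industry": "2011", "location": "24-033-0001"},
    {"estab_id": "E2", "industry": "5812", "location": "24-033-0002"},
    {"estab_id": "E3", "industry": "5812", "location": "24-033-0002"},  # shares a cell
]
workers = [
    {"worker_id": "W1", "industry": "2011 ", "location": "24-033-0001", "imputed": False},
    {"worker_id": "W2", "industry": "5812", "location": "24-033-0002", "imputed": False},
    {"worker_id": "W3", "industry": "2011", "location": "24-033-0001", "imputed": True},
]
matches = match_unique_cells(establishments, workers)
print(matches)
```

In this toy example only one of the three workers is matched. The second is lost because two establishments share an industry-location cell, which is the mechanism behind the low coverage rates reported above.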
In addition, relative to workers in the SEDF, workers in the matched data are more likely to be white and married, are slightly older, and have different patterns of education.5

5. Finally, the matching procedure used in the NWECD is much more likely to result in matches for manufacturing establishments than for nonmanufacturing establishments, although that is less relevant for the present paper as it focuses on the manufacturing sector.

2.2.3 Overview of the DEED

To address these deficiencies, we have developed an alternative method to match workers to employers that does not require establishments and workers to be located in unique industry-location cells. Instead, this method relies on matching the actual employer name and address information provided by respondents to the Decennial Census to name and address information available for employers in the SSEL. When the WECD and NWECD were created, the specific name and address files for Long-Form respondents were unknown and unavailable to researchers. Subsequently, we were able to help track down the name and address files and to participate in their conversion from an internal Census Bureau input-output language to a readable format. Because this name and address file had been used solely for internal processing purposes, it did not have an official name, but was informally known as the “Write-In” file. We have retained this moniker for reference purposes.

The Write-In file contains the information written on the questionnaires by Long-Form respondents but not actually captured in the SEDF. For example, on the Long Form, workers are asked to supply the name and address of their employer. In the SEDF, this information is retained as a set of geographic codes (state, county, place, tract, block), and the employer name and street address are omitted entirely. The Write-In file, however, contains the geographic codes as well as the employer’s actual business name and address. Because name and address information is also available for virtually all employers in the SSEL, nearly all of the establishments in the SSEL that are classified as “active” by the Census Bureau are available for matching.

We can therefore use employer names and addresses for each worker in the Write-In file to match the Write-In file to the SSEL. Additionally, because both the Write-In file and the SEDF contain identical sets of unique individual identifiers, we can use these identifiers to link the Write-In file to the SEDF. This procedure potentially yields a much larger matched data set, one whose representativeness is not compromised by the need to focus on establishments unique to industry-location cells.
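The chain of links described above (Write-In file to SSEL by employer name and address, Write-In file to SEDF by individual identifier) can be sketched roughly as follows. This is only an illustration under stated assumptions: every field name and record is hypothetical, and an exact comparison of name and address stands in for the probabilistic matching that the actual procedure uses.

```python
# Hypothetical miniature files; all field names and values are invented for illustration.
write_in = [
    {"person_id": 1, "employer_name": "ACME TOOL CO", "employer_address": "125 N MAIN ST"},
    {"person_id": 2, "employer_name": "ACME TOOL CO", "employer_address": "125 N MAIN ST"},
    {"person_id": 3, "employer_name": "UNKNOWN DINER", "employer_address": "9 ELM ST"},
]
sedf = {  # keyed by the individual identifier shared with the Write-In file
    1: {"female": 1, "age": 41},
    2: {"female": 0, "age": 29},
    3: {"female": 1, "age": 35},
}
ssel = [
    {"estab_id": "E100", "name": "ACME TOOL CO", "address": "125 N MAIN ST", "employment": 80},
]


def link_workers(write_in, sedf, ssel):
    """Attach SEDF characteristics and an SSEL establishment ID to each Write-In record."""
    # Assumption: exact name/address agreement stands in for probabilistic matching.
    by_name_address = {(e["name"], e["address"]): e["estab_id"] for e in ssel}
    linked = []
    for rec in write_in:
        key = (rec["employer_name"], rec["employer_address"])
        estab_id = by_name_address.get(key)
        if estab_id is None:
            continue  # unmatched workers are left out, not imputed
        person = sedf[rec["person_id"]]  # link Write-In to SEDF on the shared identifier
        linked.append({"person_id": rec["person_id"], "estab_id": estab_id, **person})
    return linked


def share_female(linked, estab_id):
    """Workforce characteristic assembled from the matched workers of one establishment."""
    workers = [w for w in linked if w["estab_id"] == estab_id]
    return sum(w["female"] for w in workers) / len(workers)


linked = link_workers(write_in, sedf, ssel)
print(len(linked), share_female(linked, "E100"))
```

Once workers carry an establishment identifier, establishment-level aggregates such as the fraction female follow from a simple grouping step.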
As noted previously, for virtually all establishments in the United States, the SSEL contains basic establishment-level information including geography, industry, total employment, payroll, and an indicator for whether the establishment is a single-unit enterprise or part of a multiunit firm. Moreover, the SSEL contains an establishment identification code that can be used to link establishments in the SSEL to establishments in Census Bureau surveys. So for manufacturing establishments, for example, the establishment identification code can be used to link SSEL establishments to the LRD and related data sets. We rely on this type of link to obtain establishment-level inputs used in the production function estimation. Finally, the SEDF contains the full set of responses provided by all Long-Form respondents, including individual-level information on basic demographic characteristics (e.g., gender, age, race/ethnicity, education), earnings, hours worked, industry, occupation, language proficiency, and immigrant status and cohort. Because the DEED links the SSEL and the SEDF together, we can assemble characteristics of the workforce of an establishment, providing detailed measures of the labor input within establishments.

Before we can begin to link the three files together, we select valid observations from the SEDF (matched to the Write-In file) and the SSEL. Details on how this is done can be found in Hellerstein and Neumark (2003). Most importantly, for the SSEL we eliminate “out-of-scope” establishments as defined by the Census Bureau, as the data in the SSEL for these establishments are of questionable quality because they are not validated by the Census Bureau.

2.2.4 Matching Workers and Establishments

Once we select valid worker and establishment observations, we can begin to match worker records to their establishment counterparts. To match workers and establishments based on the Write-In file, we use MatchWare, a specialized record linkage program. MatchWare comprises two parts: a name and address standardization mechanism (AutoStan) and a matching system (AutoMatch). This software has been used previously to link various Census Bureau data sets (Foster, Haltiwanger, and Krizan 1998).

Our method to link records using MatchWare involves two basic steps. The first step is to use AutoStan to standardize employer names and addresses across the Write-In file and the SSEL. Standardization of addresses in the establishment and worker files helps to eliminate differences in how data are reported. For example, a worker may indicate that she works on “125 North Main Street,” while her employer reports “125 No. Main Str.” The standardization software considers a wide variety of different ways that common address and business terms can be written and converts each to a single standard form. Once the software standardizes the business names and addresses, each item is parsed into components. To see how this works, consider the case just mentioned. The software will first standardize both the worker- and employer-provided addresses to something like “125 N Main St.” Then AutoStan will dissect the standardized addresses and create new variables from the pieces.
For example, the standardization software produces separate variables for the House Number (125), directional indicator (N), street name (Main), and street type (St). The value of parsing the addresses into multiple pieces is that we can match on various combinations of these components, and we supplement the AutoStan software with our own list of matching components (e.g., an acronym for company name). The second step of the matching process is to select and implement the matching specifications. The AutoMatch software uses a probabilistic matching algorithm that accounts for missing information, misspellings, and even inaccurate information. This software also permits users to control which matching variables to use, how heavily to weight each matching variable, and how similar two addresses must appear in order to be considered a match. AutoMatch is designed to compare match criteria in a succession of “passes” through the data. Each pass comprises “Block” and “Match”
statements. The Block statements list the variables that must match exactly in that pass in order for a record pair to be linked. In each pass, a worker record from the Write-In file is a candidate for linkage only if the Block variables agree completely with the set of designated Block variables on analogous establishment records in the SSEL. The Match statements contain a set of additional variables from each record to be compared. These variables need not agree completely for records to be linked, but are assigned weights based on their value and reliability. For example, we might assign “employer name” and “city name” as Block variables and assign “street name” and “house number” as Match variables. In this case, AutoMatch compares a worker record only to those establishment records with the same employer name and city name. All employer records meeting these criteria are then weighted by whether and how closely they agree with the worker record on the street name and house number Match specifications. The algorithm applies greater weights to items that appear infrequently. So, for example, if there are several establishments on Main St. in a given town, but only one or two on Mississippi St., then the weight for “street name” for someone who works on Mississippi St. will be greater than the “street name” weight for a comparable Main St. worker. The employer record with the highest weight will be linked to the worker record conditional on the weight being above some chosen minimum. Worker records that cannot be matched to employer records based on the Block and Match criteria are considered residuals, and we attempt to match these records on subsequent passes using different criteria. It is clear that different Block and Match specifications may produce different sets of matches. 
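A single pass of this kind can be sketched as below. This is a simplified illustration, not AutoMatch itself: the field names, the records, the cutoff, and the log inverse-frequency weight are all assumptions made for the example, and the weights reward agreement only rather than also penalizing disagreement.

```python
import math
from collections import Counter


def frequency_weight(value, counts, total):
    """Rarer values earn a larger agreement weight (log inverse frequency, an assumed form)."""
    return math.log(total / counts[value])


def run_pass(workers, establishments, block_vars, match_vars, cutoff):
    """One pass: exact agreement on block_vars, weighted agreement on match_vars."""
    total = len(establishments)
    counts = {v: Counter(e[v] for e in establishments) for v in match_vars}
    links, residuals = {}, []
    for w in workers:
        best_id, best_score = None, float("-inf")
        for e in establishments:
            if any(w[v] != e[v] for v in block_vars):
                continue  # Block variables must agree exactly
            score = sum(
                frequency_weight(e[v], counts[v], total)
                for v in match_vars
                if w[v] == e[v]
            )
            if score > best_score:
                best_id, best_score = e["estab_id"], score
        if best_id is not None and best_score >= cutoff:
            links[w["person_id"]] = best_id
        else:
            residuals.append(w)  # retried on a later pass with different criteria
    return links, residuals


# Hypothetical records: three establishments on Main St., one on Mississippi St.
establishments = [
    {"estab_id": "E1", "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "125"},
    {"estab_id": "E2", "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "300"},
    {"estab_id": "E3", "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MISSISSIPPI", "house_number": "7"},
    {"estab_id": "E4", "employer_name": "BOLT", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "12"},
]
workers = [
    {"person_id": 1, "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "125"},
    {"person_id": 2, "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MISSISSIPPI", "house_number": ""},
    {"person_id": 3, "employer_name": "ACME", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "999"},
    {"person_id": 4, "employer_name": "ZETA", "city": "SPRINGFIELD", "street_name": "MAIN", "house_number": "125"},
]

links, residuals = run_pass(
    workers,
    establishments,
    block_vars=("employer_name", "city"),
    match_vars=("street_name", "house_number"),
    cutoff=1.0,  # arbitrary illustrative minimum weight
)
print(links, [w["person_id"] for w in residuals])
```

In this toy data, agreement on the rare street name (Mississippi) is enough by itself to clear the cutoff, whereas agreement on the common street name (Main) is not, which mirrors the weighting logic described in the text. The two unlinked workers become residuals for a later, looser pass.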
Matching criteria should be broad enough to cover as many potential matches as possible, but narrow enough to ensure that only matches that are correct with a high probability are linked. Because the AutoMatch algorithm is not exact, there is always a range of quality of matches, and we were therefore extremely cautious in how we accepted linked record pairs. Our general strategy was to impose the most stringent criteria in the earliest passes and to loosen the criteria in subsequent passes, while keeping the overall probability of false matches very small. We experimented extensively with different matching algorithms and visually inspected thousands of matches as a guide to help determine cutoff weights. In total, we ran sixteen passes, and most of our matches were obtained in the earliest passes.

2.2.5 Fine-Tuning the Matching

In order to assess the quality of the first version of our national matched data set, we embarked on a project to manually inspect and evaluate the quality of a large number of randomly selected matches. We first
selected random samples of 1,000 worker observations from each of the five most populous states (CA, NY, TX, PA, IL) plus three other states (FL, MD, CO), which were chosen either because they provided ethnic and geographic diversity or because researchers had familiarity with the labor markets and geography of those states. We also chose from these eight states a random sample of 300 establishments and their 8,088 corresponding matched worker observations. We then manually checked these 16,088 employer-employee matches, of which 15,009 were matches to in-scope establishments.6 Two researchers independently scored the quality of each match on a scale of 1 (definitely a correct match) to 5 (definitely a bad match), and we then examined in various ways how a score above 2 by any researcher was related to characteristics of the business address in the SSEL or SEDF.7 We then refined our matching procedure to reflect what we saw as the most prevalent reasons for bad matches (which represented fewer than 12 percent of matches in the first place) and reran the matching algorithm to produce the final version of the 1990 DEED (at least the final version to date). More details on how the manual checking proceeded, how matches were evaluated, and how we refined the matching procedure can be found in Hellerstein and Neumark (2003).8

6. As we were constructing the DEED, a working group at the Census Bureau was revising the list of out-of-scope industries. We obtained the updated list of the Census Bureau’s out-of-scope industries after matching and deleted matches that were in industries new to this updated list.

7. Hellerstein and Neumark (2003) contains examples of matches and their corresponding scores. Table 2A.1 reports frequency distributions of hand-checked scores in the DEED.
The top panel contains the information for all hand-checked scores, and the bottom panel contains the information for hand-checked scores for observations where the establishment is listed in the SSEL as being in manufacturing. Note that over 88 percent of our matches for all establishments received a score of either 1 or 2 from both scorers. In manufacturing, almost 97 percent of the matches received a score of either 1 or 2 from both scorers, illustrating that our match algorithm worked particularly well in manufacturing.

8. Note that the DEED does not contain matches that were formed via imputation. While multiple imputation methods would obviously improve the match rate of workers to establishments, it is not clear that they would improve the accuracy of the data across all dimensions that might be relevant to researchers using the DEED. Consider a simple case where an unmatched female worker was imputed to work in a given establishment based on imputation methods that took into account the sex of the worker and partial business address information of the worker. It might be the case that the imputation worked properly to more accurately characterize the fraction female in the establishment (a variable of interest in this paper). Even if that were true, however, the imputation might harm the quality of other relevant information. For example, if the imputation were not based on residential address information, it is quite possible that the imputation would lead to bias in measuring the average distance traveled to the establishment by its workers. Although we do not utilize residential information on workers in this paper, there are research questions that could be addressed using the DEED that would use such information.
In other words, in developing the DEED we were most interested in constructing a data set that could be used not only for the specific questions in which we were initially interested, but could also be used by other researchers (and, in the future, by us) to study a host of questions. We therefore chose not to impute any matches.
Table 2.1  Means of worker characteristics in manufacturing: 1990 Sample Edited Detail File (SEDF), 1990 Decennial Employer-Employee Dataset (DEED), and Worker-Establishment Characteristics Database (WECD)

                          SEDF (1)                  DEED (2)                  WECD (3)
Age                       38.773 (11.901)           39.321 (11.205)           40.336 (11.141)
Female                    0.329                     0.313                     0.276
Married                   0.676                     0.855                     0.866
White                     0.821                     0.868                     0.887
Hispanic                  0.067                     0.046                     0.029
Black                     0.080                     0.059                     0.067
Full-time                 0.898                     0.940                     0.938
No. of kids (if female)   1.767 (1.643)             1.777 (1.594)             1.811 (1.613)
High school diploma       0.389                     0.414                     0.440
Some college              0.257                     0.273                     0.258
B.A.                      0.114                     0.118                     0.102
Advanced degree           0.036                     0.037                     0.031
Ln(hourly wage)           2.357 (0.570)             2.454 (0.506)             2.513 (0.494)
Hourly wage               12.469 (8.239)            13.250 (7.581)            13.917 (7.367)
Hours worked in 1989      41.929 (8.266)            42.612 (7.089)            42.426 (7.130)
Weeks worked in 1989      48.723 (8.511)            49.870 (6.640)            49.872 (6.612)
Earnings in 1989          29,046.764 (19,033.637)   28,500.626 (17,773.132)   29,742.881 (17,017.719)
No. of observations       2,889,274                 522,802                   128,425

Note: Standard deviations of continuous variables are in parentheses.
2.2.6 The Representativeness of the DEED for Manufacturing Workers

To evaluate the representativeness of the DEED for workers in manufacturing, it is useful to compare basic descriptive statistics from the DEED with their counterparts from the SEDF. In addition, to measure the degree to which the DEED is an improvement over the earlier data sets, it is useful to compare these basic statistics to those in the WECD as well.9 Table 2.1 displays comparisons of the means and standard deviations of an extended set of demographic characteristics from the SEDF, the DEED, and the WECD. The three columns show the means (and standard deviations for continuous variables) for workers in each data set, after imposing sample inclusion criteria that are necessary to conduct the production function estimation. We exclude individuals from the SEDF who were self-employed, did not report working in manufacturing, or whose hourly wage was either missing or not between $2.50 and $100. We exclude workers in the DEED and in the WECD who were matched to a plant that did not report itself in the SSEL to be in manufacturing, who were self-employed, or whose hourly wage was either missing or outside the range of $2.50 to $100. In addition, we restrict the DEED and WECD samples to workers working in plants with more than twenty workers in 1989, and more than 5 percent of workers matched to the plant. The size and match restrictions are made in the DEED and WECD because, as we explain in the following, our empirical methods require us to use plant-level aggregates of worker characteristics that we construct from worker data in the SEDF; limiting the sample to larger plants and those with more workers matched helps reduce measurement error.

9. The WECD contains only manufacturing establishments, while the DEED and the NWECD cover all industries. However, because this paper studies manufacturing establishments, we focus only on comparing data from the WECD and manufacturing establishments in the DEED.

Finally, because the DEED itself only contains limited information on each establishment, and because we want to estimate production functions, we need to link the DEED to a data set that contains detailed information about the DEED manufacturing plants. As in the WECD, then, we link the manufacturing establishments in the DEED to plant-level data from the 1989 LRD,10 and exclude from our sample establishments that do not report in the 1989 LRD or for which critical data for estimation of production functions (such as capital and materials) are missing.
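The inclusion rules just described can be expressed as simple filters. The sketch below is only illustrative: the field names are hypothetical, the wage bounds are treated as inclusive, and the 5 percent match requirement is assumed to be measured against the plant's 1989 employment.

```python
def keep_worker(worker):
    """Worker-level rules: in manufacturing, not self-employed, valid hourly wage."""
    wage = worker.get("hourly_wage")
    return (
        worker["in_manufacturing"]
        and not worker["self_employed"]
        and wage is not None
        and 2.50 <= wage <= 100.0  # assumption: bounds are inclusive
    )


def keep_plant(plant):
    """Plant-level rules for the matched samples (DEED and WECD)."""
    matched_share = plant["workers_matched"] / plant["employment_1989"]
    return (
        plant["employment_1989"] > 20
        and matched_share > 0.05  # assumption: share is relative to 1989 employment
        and plant["has_lrd_1989_data"]  # output, capital, materials available in the LRD
    )


# Hypothetical records for illustration only.
workers = [
    {"in_manufacturing": True, "self_employed": False, "hourly_wage": 12.0},
    {"in_manufacturing": True, "self_employed": False, "hourly_wage": 2.0},
    {"in_manufacturing": True, "self_employed": True, "hourly_wage": 15.0},
    {"in_manufacturing": True, "self_employed": False, "hourly_wage": None},
]
plants = [
    {"employment_1989": 100, "workers_matched": 6, "has_lrd_1989_data": True},
    {"employment_1989": 20, "workers_matched": 5, "has_lrd_1989_data": True},
    {"employment_1989": 100, "workers_matched": 5, "has_lrd_1989_data": True},
]

kept_workers = [w for w in workers if keep_worker(w)]
kept_plants = [p for p in plants if keep_plant(p)]
print(len(kept_workers), len(kept_plants))
```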
Out of all 2,889,274 workers in the SEDF who met the basic sample criteria, 522,802 (approximately 18 percent) are also in the DEED sample we use in this paper, a substantial improvement over the comparable WECD sample, which contains 128,425 workers who met similar criteria, or just 4.4 percent of all possible matches.11 While the means of the demographic variables in both matched data sets are quite close to the means in the SEDF, the means in the DEED often come closer to matching the SEDF means. For example, female workers make up 33 percent of the SEDF, 31 percent of the DEED, and 28 percent of the WECD. In the SEDF, white, Hispanic, and black workers account for 82, 7, and 8 percent of the total, respectively. The comparable figures for the DEED are 87, 5, and 6 percent, and in the WECD, they are 89, 3, and 7 percent. There is also a close parallel among the distributions of workers across education categories in all data sets, but the DEED distribution comes slightly closer than the WECD distribution to matching the SEDF.

10. More details about the LRD are given in the following.

11. In table 2.1, if we did not restrict the samples from the DEED and WECD to observations with valid data in the LRD, the match rate between the DEED and SEDF would be 34 percent and between the WECD and SEDF would be 6 percent.
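As a quick check on the match rates quoted above, the worker counts reported in table 2.1 can be divided directly. The helper name below is hypothetical; the counts are the ones given in the text.

```python
# Worker counts reported in table 2.1.
sedf_workers = 2_889_274
deed_workers = 522_802
wecd_workers = 128_425


def share(matched, total):
    """Matched workers as a percentage of the SEDF sample."""
    return 100.0 * matched / total


deed_pct = share(deed_workers, sedf_workers)
wecd_pct = share(wecd_workers, sedf_workers)
print(f"DEED: {deed_pct:.1f} percent, WECD: {wecd_pct:.1f} percent")
```

The DEED sample is roughly four times the size of the comparable WECD sample.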
Table 2.2  Manufacturing establishment means: 1990 Standard Statistical Establishment List (SSEL), 1990 Decennial Employer-Employee Dataset (DEED), and Worker-Establishment Characteristics Database (WECD)

                               SEDF-style columns
                               SSEL (1)                  DEED (2)                  WECD (3)
Total employment               278.635 (713.039)         265.412 (566.378)         353.114 (846.874)
Establishment size
  21–75 employees              0.312                     0.295                     0.217
  76–150 employees             0.236                     0.249                     0.247
  151–350 employees            0.258                     0.266                     0.287
  351 or more employees        0.193                     0.190                     0.250
Nondurables                    0.471                     0.451                     0.546
In MSA                         0.763                     0.750                     0.876
Region
  North East                   0.231                     0.213                     0.307
  Midwest                      0.299                     0.382                     0.435
  South                        0.296                     0.250                     0.201
  West                         0.173                     0.153                     0.056
Payroll ($1,000)               7,983.219 (27,825.229)    7,730.607 (22,321.237)    10,851.890 (36,299.109)
Payroll/total employment       25.478 (9.397)            26.571 (9.225)            26.525 (8.760)
Percent of employees matched                             0.107                     0.122
Multiunit establishment        0.725                     0.728                     0.819
No. of establishments          41,216                    20,056                    3,101

Note: Standard deviations of continuous variables are reported in parentheses.
In addition to comparing worker-based means in all three data sets, we can examine the similarities across manufacturing establishments in the SSEL, the DEED, and the WECD. Table 2.2 shows descriptive statistics for establishments in each data set. There are 41,216 establishments in the SSEL; of these, 20,056 (49 percent) also appear in the DEED sample we use in the following, compared with only 3,101 (7.5 percent) in the WECD sample.12 One of the noticeable differences between the WECD and SSEL is the discrepancy across the two data sets in total employment. In the SSEL, average total employment is 279, whereas in the WECD it is 353.13 In principle, this difference can arise for two reasons. First, because the worker data in the WECD come from the Long Form of the Census, which is itself a one-in-six sample, it is more likely simply on a probabilistic basis that a match will be formed between a worker and a larger establishment. Second, the WECD match is limited to establishments that are unique in their industry and geography cell. This uniqueness is more likely to occur for large manufacturing plants than for small ones. Note that while WECD employment is much higher than SSEL employment, total employment in the DEED (265) is actually quite close to the SSEL figure of 279, suggesting that it is the issue of uniqueness of plants in industry and geography cells that drives up employment in the WECD. Indeed, table 2.2 shows that the whole size distribution of establishments in the DEED is much closer to the SSEL than is that in the WECD.

12. The same set of restrictions on workers in establishments that is used to create the DEED and WECD samples in table 2.1 is used to create table 2.2. That is, an establishment in all three data sets (SSEL, DEED, WECD) must have more than twenty workers in 1989, and for the latter two matched data sets, more than 5 percent of workers must be matched and the necessary data to estimate production functions must be available.

13. Due to our sample restrictions, both of these total employment figures are conditional on the establishment having more than twenty employees.

Not surprisingly, then, the industry composition of the DEED is closer to the SSEL than the WECD is. In the SSEL, 47 percent of establishments are classified in industries that produce nondurables; the corresponding numbers for the DEED and the WECD are 45 percent and 55 percent, respectively. This basic pattern exists for (not reported) finer industry breakdowns as well. Examining the distribution of establishments across geographic areas also reveals that the DEED is more representative of the SSEL than is the WECD. In the SSEL and the DEED, 76 percent and 75 percent, respectively, of establishments are in an MSA, while this is true for 88 percent of WECD establishments. Additionally, the regional distribution of establishments in the DEED is more similar to that in the SSEL than is the distribution in the WECD. Payroll per worker is very similar across the three data sets. The percentages of multiunit establishments in the DEED and SSEL are virtually identical (73 percent), while the percentage is markedly higher in the WECD (81 percent).

Finally, in table 2.3 we report summary statistics for characteristics of establishments in the WECD and DEED that are not also in the SSEL.
These include variables that originate from the LRD, as well as tabulations of the average demographic characteristics of workers across establishments that are generated by the match between workers and establishments in these data sets. The averages of number of workers matched to each establishment, log output (in dollars), and the log of each of the usual productive inputs (capital, materials, employment) are all smaller in the DEED than in the WECD, reflecting the better representation of smaller establishments in the DEED. Interestingly, however, the demographic composition of establishments between the two data sets is very similar, indicating that, at least for manufacturing plants, the correlations between plant size and worker mix are not very large.

Table 2.3  Manufacturing establishment means: 1990 Decennial Employer-Employee Dataset (DEED) and Worker-Establishment Characteristics Database (WECD)

                                                      DEED                      WECD
                                                      Mean (1)    SD (2)        Mean (3)    SD (4)
No. of workers matched                                26.031      50.379        41.414      98.358
Log output ($1,000)                                   9.852       1.303         10.191      1.335
Log total employment                                  4.953       1.037         5.179       1.072
Log capital                                           8.446       1.512         8.822       1.526
Log materials                                         9.037       1.516         9.429       1.508
Log wages and salaries ($1,000)                       8.172       1.114         8.401       1.167
Log compensation costs ($1,000)                       8.176       1.111         8.404       1.164
Log estimated wages ($1,000)                          8.166       1.129         8.381       1.173
Proportion of matched employees that are:
  Female                                              0.303       0.239         0.295       0.227
  Black                                               0.055       0.119         0.065       0.120
  Aged 34 or less                                     0.410       0.218         0.393       0.202
  Aged 35–54                                          0.472       0.200         0.478       0.183
  Aged 55 or more                                     0.119       0.132         0.129       0.120
  Some college                                        0.400       0.234         0.361       0.207
  Married                                             0.841       0.147         0.839       0.137
  Managerial/professional workers                     0.173       0.171         0.151       0.153
  Technical, sales, administrative, service workers   0.211       0.164         0.203       0.151
  Precision production, craft, and repair workers     0.206       0.170         0.199       0.149
  Operators, fabricators, and laborers                0.411       0.237         0.447       0.218

Note: SD denotes standard deviation.

2.3 The Quality of Labor Input in the Production Function

Assume an economy consists of manufacturing plants that produce output Y with a technology that uses capital, materials, and a labor quality input. We can write the production technology of a plant as

\[ Y = F(K, M, \mathit{QL}), \tag{1} \]
where K is capital, M is materials, and QL is the labor quality input. Consistent production function estimation has focused on four key issues: (a) the correct functional form for F; (b) the existence (or not) of omitted variables; (c) the potential endogeneity of inputs; and (d) the correct measurement of the inputs to production. Our focus is on the measurement of the labor quality input, although we also touch on these other issues.

In the United States, the main source of plant-level data has been the LRD, a longitudinal file of manufacturing establishments maintained by the U.S. Census Bureau.14 The LRD is a compilation of plant responses to the Annual Survey of Manufactures (ASM) and the Census of Manufactures (CM). The CM is conducted in years ending in a 2 or a 7, while the ASM is conducted in all other years for a sample of plants. Data in the LRD are of the sort typically used in production function estimation, such as output, capital stock, materials, and expenditures. One of the big limitations of the LRD (and LBD), however, is that it contains only very limited information about workers in plants for any given year: total employment, the number of production workers, total hours, and labor costs (divided into total salaries and wages and total nonsalary compensation). Because of this, the labor quality input that can be constructed using the LRD alone is quite restrictive.

14. For a review of papers that use the LRD to assess both cross-sectional and time series patterns of productivity, see Bartelsman and Doms (2000). The LRD is now being phased out by the Census Bureau in favor of the Longitudinal Business Database (LBD), which covers more sectors, provides a more comprehensive link to other Census databases, and does a better job of tracking plant births and deaths. For a brief description of the LRD and a long description of the LBD, see Jarmin and Miranda (2002). A complete and older description of the LRD can be found in McGuckin and Pascoe (1988). Due to data access limitations and due to a desire to preserve consistency with our previous work (Hellerstein, Neumark, and Troske 1999), we utilize data from the LRD in this paper. This probably makes little difference as we limit ourselves to a cross section of manufacturing establishments.

Going back to at least Griliches (1960), and including both cross-sectional and longitudinal studies using both microdata and more aggregate data, the labor quality input (or its change over time) has traditionally been adjusted, if at all, by accounting for differences in educational attainment across workers. These studies assume that the labor market can be characterized by a competitive spot labor market, where wages always equal marginal revenue products, so that each type of labor, defined by educational attainment, can be appropriately weighted by its mean income. The (change in the) labor quality input can then be measured as the (change in the) wage-weighted (or income-weighted) sum of the number of workers in each educational category. So, for example, if workers have either a high school or a college degree, the quality of labor input, QL, for a plant would be defined as

\[ \mathit{QL} = H + w_C C, \tag{2} \]

where H is the number of high school-educated workers in the plant, and C is the number of college-educated workers in the plant. The wage of high school-educated workers is normalized to one without loss of generality, and wC is therefore the relative wage of college-educated workers. Equation (2) can be rewritten as

\[ \mathit{QL} = L\left[1 + (w_C - 1)\frac{C}{L}\right], \tag{3} \]

where L = H + C is the total number of workers in the plant. To be clear, QL is a quality-adjusted measure of the labor input, in its entirety, for a plant. If there are no wage differences between high school- and college-educated workers, the quality of labor input will simply equal the number of workers in the establishment. If college-educated workers are paid more than high school-educated workers, QL will be greater than L.
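A small numerical illustration may help fix ideas about the wage-weighted labor input in equations (2) and (3). The plant below is hypothetical (60 high school-educated workers, 40 college-educated workers, and a relative college wage of 1.5), and the function names are invented for the sketch.

```python
def labor_quality(H, C, w_C):
    """Quality-adjusted labor input, equation (2): QL = H + w_C * C."""
    return H + w_C * C


def labor_quality_index(H, C, w_C):
    """Bracketed term of equation (3): 1 + (w_C - 1) * C / L, with L = H + C."""
    L = H + C
    return 1 + (w_C - 1) * C / L


# Hypothetical plant: 60 high school-educated and 40 college-educated workers.
H, C, w_C = 60, 40, 1.5
L = H + C

QL = labor_quality(H, C, w_C)
index = labor_quality_index(H, C, w_C)
print(QL, index, L * index)

# With no wage premium (w_C = 1), QL collapses to the head count L.
QL_no_premium = labor_quality(H, C, 1.0)
```

The two formulations agree: the quality-adjusted input equals the head count times the index, and it exceeds the head count whenever college-educated workers earn a premium.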
One can also define the term [1 + (wC - 1)(C/L)] as the “labor quality index,” which is equal to one if the relative wage of college-educated workers is one, but will be greater than one if college-educated workers are paid more than workers who have only completed high school. For simplicity, and following what was usually assumed in the early work estimating production functions and in the early work on growth accounting, assume that F is a Cobb-Douglas production function:

\[ Y = A K^{\alpha} M^{\beta} (\mathit{QL})^{\gamma}. \tag{4} \]

Then taking logs, substituting for QL, rearranging, and appending an error term ε, we can write

\[ \ln(Y) = \ln(A) + \alpha \ln(K) + \beta \ln(M) + \gamma \ln(L) + \gamma \ln\left[1 + (w_C - 1)\frac{C}{L}\right] + \varepsilon, \tag{5} \]

which can be estimated with standard linear regression using plant-level data on output, capital, materials, and the number of workers in each education category and wages by education category. As Griliches (1970) notes, when one reformulates the production function in this way, one can indirectly test the assumptions about the nature of the relative weights on the quality of labor term by testing whether, when estimated unconstrained, the coefficients on the log of labor, ln(L), and on the log of the labor quality index, ln[1 + (wC - 1)(C/L)], are equal. Specifically, such a test provides some evidence as to whether relative wages are equal to relative marginal products so that there are true productivity differentials associated with more education. Of course, this is only an approximate test in a multivariate context such as this, because mismeasurement of one variable (the log of the labor quality index in this case) can have unpredictable effects on the biases of the estimated coefficients of other variables. So, for example, mismeasurement of the log of the labor quality index could bias the estimates of its own coefficient and the coefficient on the log of labor in opposite ways, leading to a false rejection of the hypothesis that the two coefficients are equal. Moreover, once the quality of labor term varies along multiple dimensions, not just along education as in the preceding example, it becomes much harder to interpret differences between the coefficients on the log of labor [ln(L)] and the log of the labor quality index as arising from a violation of any one particular assumption of the equality of relative wages and relative marginal products.

In Hellerstein and Neumark (1995), and subsequently in Hellerstein and Neumark (1999) and Hellerstein, Neumark, and Troske (1999), we modify this approach to measuring the quality of labor in two important ways. First, we note that one need not start by assuming a priori that relative wages are equal to relative marginal products. For example, one can replace the wage ratio wC in equation (5) with a parameter φ that can be estimated along with the rest of the parameters in equation (5) using nonlinear least squares methods. The estimated parameter φ is an estimate of the relative productivity of college-educated workers to high school-educated workers. This estimate, then, can be compared directly to estimates from data of wC to form a direct test of the equality of relative wages to relative marginal products, without letting violations of this implication of competitive spot labor markets influence the production function estimates. Moreover, by replacing wC in equation (5) with a parameter to be estimated, γ, the coefficient on labor quality, is primarily identified off of variation in the log of unadjusted labor bodies (L) across plants because variation in C/L primarily identifies φ.15 (In the case where the log of the labor quality index is orthogonal to the log of labor, identification of γ comes solely from variation in the log of L.) Finally, it is worth noting (and easy to see in equation [5]) that the closer the estimated parameter φ is to one, the less important it is to measure labor in quality-adjusted units, as the last term in equation (5) prior to the error (replacing wC with φ) will drop out.

The second modification is to go beyond focusing solely on educational differences among workers and to allow instead for labor quality to differ with a number of characteristics of the establishment’s workforce. Using this approach and given sufficiently detailed data on workers, one can directly test numerous theories of wage determination that imply wage differentials across workers that are not equal to differences in marginal products.
This is an important advance over trying to test theories of wage determination using individual-level wage regressions with information on worker characteristics but no direct estimates of productivity differentials. For example, with data on only wages and worker characteristics it is impossible to distinguish human capital models of wage growth (such as Ben-Porath 1967; Mincer 1974; Becker 1975) from incentive-compatible models of wage growth (Lazear 1979) or forced-savings models of life-cycle wage profiles (Loewenstein and Sicherman 1991; Frank and Hutchens 1993). When typical wage regression results report positive coefficients on age, conditional on a variety of controls, these positive coefficients imply neither that older workers are more productive than younger ones nor that wages rise faster than productivity. Similarly, without direct measures of the relative productivity of workers, discrimination by sex, race, or marital status cannot be established based on significant coefficients on sex, race, or marital status dummy variables in standard wage regressions, as the set of usual controls in individual-level wage regressions may not fully capture productivity differences.16

15. In the Cobb-Douglas production function, the coefficients on the productive inputs ($\alpha$, $\beta$, and $\gamma$) are the elasticities of output with respect to these inputs. We more generally refer to these simply as the coefficients of the productive inputs or the production function parameters, given that our discussion is not confined to Cobb-Douglas production function estimates.

2.4 Previous Work

This idea forms the basis for the work done in Hellerstein, Neumark, and Troske (1999), where we used data from the WECD to form plant-level quality of labor terms. Specifically, in our baseline specifications, we defined QL to assume that workers are distinguished by sex, race (black and nonblack), marital status (ever married), age (divided into three broad categories: under thirty-five, thirty-five to fifty-four, and fifty-five and over), education (defined as having attended at least some college), and occupation (divided into four groups: operators, fabricators, and laborers [unskilled production workers]; managers and professionals; technical, sales, administrative, and service; and precision production, craft, and repair). In this way, a plant’s workforce is fully described by the proportions of workers in each of 192 possible combinations of these demographic characteristics.

To reduce the dimensionality of the problem, in our baseline specifications we imposed two restrictions on the form of QL. First, we restricted the proportion of workers in an establishment defined by a demographic group to be constant across all other groups; for example, we restricted blacks in an establishment to be equally represented in that establishment in all occupations, education levels, marital status groups, and so forth. We imposed these restrictions due to data limitations. For each establishment, the WECD contains data on a sample of workers, so one cannot obtain accurate estimates of the number of workers in very narrowly defined subgroups.
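The count of 192 combinations, and the much smaller number of relative productivity parameters left once the baseline restrictions are imposed, can be checked with a short enumeration. This is only an illustrative sketch; the dictionary name and the category labels are written out here for convenience and are not identifiers from the original data.

```python
from itertools import product

# Category labels follow the baseline specification described above.
characteristics = {
    "sex": ["male", "female"],
    "race": ["nonblack", "black"],
    "marital_status": ["never married", "ever married"],
    "age": ["under 35", "35-54", "55 and over"],
    "education": ["no college", "some college"],
    "occupation": [
        "operators, fabricators, and laborers",
        "managers and professionals",
        "technical, sales, administrative, and service",
        "precision production, craft, and repair",
    ],
}

# Every possible combination of characteristics a worker can have.
cells = list(product(*characteristics.values()))
n_cells = len(cells)

# Under the restrictions, each characteristic contributes one relative
# productivity parameter per non-reference category.
n_relative_params = sum(len(levels) - 1 for levels in characteristics.values())

print(n_cells)            # 192
print(n_relative_params)  # 9
```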
Second, we restricted the relative marginal products of two types of workers within one demographic group to be equal to the relative marginal products of those same two types of workers within another demographic group. For example, the relative productivity of black women to black men is restricted to equal the relative marginal productivity of nonblack women to nonblack men.17 With these assumptions, the log of the quality of labor term in the production function becomes

$$
\begin{aligned}
\ln(QL) = \ln\Big\{ &\left[L + (\phi_F - 1)F\right] \left[1 + (\phi_B - 1)\frac{B}{L}\right] \left[1 + (\phi_M - 1)\frac{M}{L}\right] \left[1 + (\phi_C - 1)\frac{C}{L}\right] \\
&\times \left[1 + (\phi_P - 1)\frac{P}{L} + (\phi_O - 1)\frac{O}{L}\right] \left[1 + (\phi_N - 1)\frac{N}{L} + (\phi_S - 1)\frac{S}{L} + (\phi_R - 1)\frac{R}{L}\right] \Big\},
\end{aligned}
\tag{6}
$$

where $B$ is the number of black workers, $M$ is the number of workers ever married, $C$ is the number of workers who have some college education, $P$ is the number of workers in the plant between the ages of thirty-five and fifty-four, $O$ is the number of workers who are aged fifty-five or older, and $N$, $S$, and $R$ are the numbers of workers in the second through fourth occupational categories defined previously. Note that the way QL is defined, productivity differentials are indicated when the estimate of the relevant $\phi$ is significantly different from one.

16. See Hellerstein and Neumark (2006) for a more thorough discussion of these alternative approaches to testing for discrimination.

17. We relax this restriction in many ways in Hellerstein, Neumark, and Troske (1999) and discuss the robustness of the results to this restriction. Relaxing the restrictions here yields similar results, and so we refer readers to Hellerstein, Neumark, and Troske (1999) for more on this issue.

We then estimated the production function using a translog specification18 (although we reported that the relative productivity differentials were robust to using a Cobb-Douglas specification), and we also examined the robustness of the estimates of the $\phi$s to using a value-added specification and to instrumenting one variable input (materials) with its lagged value. We also tested the robustness of our estimates to relaxing in various ways the restrictions on the quality of labor term. In general, the qualitative results were very robust to these changes. See Hellerstein, Neumark, and Troske (1999) for full results.

In order to test whether the estimates of the relative productivity differentials are different from the relative wage differentials, we also estimated wage differentials across workers using a plant-level earnings equation. When estimated jointly with the production function, simple and direct tests can be constructed of the equality of relative productivity and relative wage differentials. Moreover, while there may be unobservables in the production function and the wage equation, any biases from these unobservables ought to affect the estimated productivity and wage differentials similarly, at least under the null hypothesis of competitive spot labor markets equating relative wages and relative marginal products.

18. That is, we estimated a production function (in logs) of the form $\ln(Y) = \ln(A) + \alpha \ln(K) + \beta \ln(M) + \gamma \ln(QL) + g(K, M, QL) + X\delta + \mu$, where $Y$ is output (measured in dollars), $K$ is capital, $M$ is materials, $QL$ is the quality of labor aggregate, $g(K, M, QL)$ represents the second-order terms in the production function (Christensen, Jorgenson, and Lau 1973), $X$ is a set of controls, and $\mu$ is an error term. The vector $X$ contains a full set of two-digit industry controls (to control for, among other things, price variation across industries), four size controls, four region controls, and a control for whether the plant is part of a multiunit firm. All specifications reported in this paper include this full set of controls.
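Equation (6) can be evaluated directly once worker counts and relative productivities are given. The following is a minimal sketch under stated assumptions: the function name, the dictionary layout, and the example plant are hypothetical, and the group letters follow the definitions given after equation (6), with F denoting the number of women.

```python
import math


def log_quality_of_labor(counts, phi):
    """Evaluate ln(QL) as in equation (6).

    counts: worker counts keyed by "L" (total) and the group letters
            F, B, M, C, P, O, N, S, R.
    phi:    relative productivities keyed by the same group letters.
    """
    total = counts["L"]

    def share_term(groups):
        # 1 + sum over groups of (phi_g - 1) * (count_g / L)
        return 1.0 + sum((phi[g] - 1.0) * counts[g] / total for g in groups)

    sex_term = total + (phi["F"] - 1.0) * counts["F"]  # carries the level of L
    return (
        math.log(sex_term)
        + math.log(share_term(["B"]))            # race
        + math.log(share_term(["M"]))            # marital status
        + math.log(share_term(["C"]))            # education
        + math.log(share_term(["P", "O"]))       # age
        + math.log(share_term(["N", "S", "R"]))  # occupation
    )


# Hypothetical plant with 100 workers.
counts = {"L": 100, "F": 40, "B": 10, "M": 70, "C": 30,
          "P": 45, "O": 15, "N": 10, "S": 20, "R": 25}

# With every relative productivity equal to one, QL collapses to L.
phi_equal = {g: 1.0 for g in "FBMCPONSR"}
log_ql_equal = log_quality_of_labor(counts, phi_equal)

# Only a college differential of 1.5: QL = L * (1 + 0.5 * 0.30) = 115.
phi_college = dict(phi_equal, C=1.5)
log_ql_college = log_quality_of_labor(counts, phi_college)

print(math.exp(log_ql_equal), math.exp(log_ql_college))
```

The first case shows why a relative productivity of one signals no differential: the corresponding bracket equals one and drops out of the log.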
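In specifying the plant-level wage equation, we generally retained the same restrictions made in defining QL in the production function. We also assumed that all workers within each unique set of demographic groupings are paid the same amount, up to a plant-specific multiplicative error term. Under these assumptions, the total log wages in a plant can be written as

$$
\begin{aligned}
\ln(w) = a + \ln\Big\{ &\left[L + (\lambda_F - 1)F\right] \left[1 + (\lambda_B - 1)\frac{B}{L}\right] \left[1 + (\lambda_M - 1)\frac{M}{L}\right] \left[1 + (\lambda_C - 1)\frac{C}{L}\right] \\
&\times \left[1 + (\lambda_P - 1)\frac{P}{L} + (\lambda_O - 1)\frac{O}{L}\right] \left[1 + (\lambda_N - 1)\frac{N}{L} + (\lambda_S - 1)\frac{S}{L} + (\lambda_R - 1)\frac{R}{L}\right] \Big\} + \varepsilon,
\end{aligned}
\tag{7}
$$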
where $a$ is the log wage of the reference group (nonblack, never married, male, no college, young, unskilled production worker), and the $\lambda$ terms represent the relative wage differentials associated with each characteristic. It is easy to show that this plant-level equation can be interpreted as the aggregation over workers in the plant of an individual-level wage equation, making relevant direct comparisons between the estimates of $\lambda$ and those obtained from individual-level wage equations. In order to correspond most closely with individual-level wage data, our baseline results used LRD reports of each plant’s total annual wage and salary bill, although the results were robust to more inclusive measures of compensation.

2.5 Using the DEED to Reexamine Productivity and Wage Differentials

2.5.1 Estimates from the DEED

In this subsection, we use the DEED to estimate the production function and wage equation described in the previous section and compare the estimates to those obtained using the WECD and reported in Hellerstein, Neumark, and Troske (1999). As described earlier, the DEED is far larger and more representative than the WECD. This has two potential advantages. First, the fact that it is more representative of workers and plants in manufacturing may mean that the estimates we obtain here suffer less from any bias induced by the sample selection process that occurs when workers are matched to plants. Second, the larger sample size by itself allows us to gain precision in our estimates and potentially allows us to make sharper statistical inferences regarding wage and productivity differences than we were able to make in our earlier work. As mentioned earlier, in order to make the results exactly comparable, we use the same specifications and sample selection criteria that were used in our previous paper.

In table 2.4, we report the results from joint estimation of the production function and wage equations using the total wages and salaries reported in the SSEL as paid by the establishment in 1989 as the wage measure.19 Columns (1)–(3) report results using a Cobb-Douglas production function specification in capital, materials, and the labor aggregate, with the quality of labor term defined as in equation (6); columns (4)–(6) report analogous results using a translog production function.

Table 2.4 Joint production function and wage equation estimates, Cobb-Douglas and translog production functions: 1990 Decennial Employer-Employee Dataset (DEED)

| | Cobb-Douglas: Log (output) (1) | Cobb-Douglas: Log (wages and salaries) (2) | p-value [col. (1) = col. (2)] (3) | Translog: Log (output) (4) | Translog: Log (wages and salaries) (5) | p-value [col. (4) = col. (5)] (6) |
|---|---|---|---|---|---|---|
| Female | 0.869 (0.026) | 0.621 (0.007) | 0.000 | 0.789 (0.021) | 0.617 (0.007) | 0.000 |
| Black | 0.949 (0.051) | 1.010 (0.018) | 0.207 | 0.916 (0.045) | 1.003 (0.018) | 0.045 |
| Ever married | 1.122 (0.052) | 1.118 (0.018) | 0.933 | 1.103 (0.044) | 1.119 (0.018) | 0.715 |
| Some college | 1.565 (0.051) | 1.357 (0.015) | 0.000 | 1.481 (0.043) | 1.354 (0.015) | 0.002 |
| Aged 35–54 | 1.115 (0.035) | 1.211 (0.014) | 0.004 | 1.108 (0.031) | 1.210 (0.014) | 0.001 |
| Aged 55+ | 0.792 (0.043) | 1.124 (0.018) | 0.000 | 0.865 (0.038) | 1.128 (0.018) | 0.000 |
| Managerial/professional | 1.114 (0.050) | 1.214 (0.019) | 0.035 | 1.224 (0.047) | 1.218 (0.019) | 0.898 |
| Technical, sales, administrative, and service | 1.238 (0.048) | 1.257 (0.017) | 0.691 | 1.337 (0.046) | 1.259 (0.017) | 0.073 |
| Precision production, craft, and repair | 1.130 (0.045) | 1.108 (0.016) | 0.602 | 1.130 (0.040) | 1.111 (0.016) | 0.613 |
| Log capital | 0.071 (0.003) | | | 0.066 (0.002) | | |
| Log materials | 0.526 (0.002) | | | 0.562 (0.005) | | |
| Log labor quality | 0.400 (0.007) | | | 0.372 (0.008) | | |
| Log labor quality × log labor quality | | | | 0.099 (0.006) | | |
| Log materials × log materials | | | | 0.156 (0.002) | | |
| Log capital × log capital | | | | 0.030 (0.002) | | |
| Log materials × log labor quality | | | | –0.115 (0.003) | | |
| Log capital × log labor quality | | | | 0.009 (0.003) | | |
| Log capital × log materials | | | | –0.037 (0.002) | | |
| Returns to scale | 0.997 (0.006) | | | 0.9999 (0.006) | | |
| R² | .940 | .937 | | .953 | .937 | |

Notes: Standard errors of the estimates are reported in parentheses. The sample size is 20,056. Test statistics are from Wald tests. The excluded occupation is operators, fabricators, and laborers. Other control variables included in the production function are dummy variables for industry (13), size (4 categories), region (4), and establishment part of multiplant firm. Other control variables in the wage equation are dummy variables for industry (13), size (4 categories), and region (4). The translog model is estimated with the data transformed so that output is homogeneous of degree S in the inputs, where S is the sum of the coefficients of the linear terms of the production function inputs.

Looking first at the production function estimates in column (1), we find that the coefficient for females indicates that women are somewhat less productive than men, with an estimate of $\phi_F$ that is 0.87, which is significantly less than one. The point estimate of $\phi_B$ indicates that blacks are slightly less productive than whites, but this estimate is not statistically significantly different from one.20 The estimated age profile indicates that prime-aged workers (aged thirty-five to fifty-four) are somewhat more productive than young workers, with an estimated relative productivity of 1.12, but the opposite is true for older workers (aged fifty-five and over), who have an estimated relative productivity of 0.79; both of these estimates are statistically significant. Workers who have at least some college education are much more productive than their less-educated counterparts, with a statistically significant relative productivity of 1.57, providing evidence consistent with the human capital model of education in which more-educated workers are more productive. Workers who have ever been married have an estimated productivity of 1.12 relative to never-married workers. As for the controls for occupation, the results in column (1) suggest that unskilled production workers are relatively less productive than workers in the three other occupation categories. Turning to the other estimates, the coefficient on capital is 0.07, the coefficient on materials is 0.53, and the coefficient on labor quality, $\gamma$, is 0.40.
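The significance statements about column (1) can be reproduced approximately from the reported point estimates and standard errors. The sketch below assumes a normal approximation for a two-sided test of whether each coefficient equals one; the chapter's own test statistics are Wald tests from the joint estimation, so the function name and the approximation are assumptions made for illustration.

```python
import math

# Column (1) of table 2.4: (point estimate, standard error).
relative_productivity = {
    "Female": (0.869, 0.026),
    "Black": (0.949, 0.051),
    "Ever married": (1.122, 0.052),
    "Some college": (1.565, 0.051),
    "Aged 35-54": (1.115, 0.035),
    "Aged 55+": (0.792, 0.043),
}


def z_against_one(estimate, std_error):
    """Return (z, two-sided p-value) for H0: coefficient = 1 (normal approximation)."""
    z = (estimate - 1.0) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


results = {name: z_against_one(*vals) for name, vals in relative_productivity.items()}
for name, (z, p) in results.items():
    print(f"{name:13s} z = {z:6.2f}  p = {p:.3f}")

# Sum of the Cobb-Douglas input coefficients (capital, materials, labor quality).
input_coefficients = [0.071, 0.526, 0.400]
returns_to_scale = sum(input_coefficients)
print(f"returns to scale = {returns_to_scale:.3f}")
```

Under this approximation only the estimate for blacks fails to differ from one at conventional levels, and the three input coefficients sum to the returns-to-scale figure reported in the table.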
Note that the returns-to-scale parameter is 0.997, which is neither qualitatively nor statistically different from one, so that constant returns to scale is not rejected.21 Finally, unlike in the aggregate time series growth regressions that generated the first concerns about the mismeasurement of labor quality back in the middle of the last century, the R² of this microlevel production function regression is 0.94, so that the vast majority of the variability in log output across establishments is captured in the measured covariates. It remains to be seen how much of this is a function of simple covariates such as capital, materials, the quantity of labor, and the other controls we include, and how much of it instead can be attributed to the detailed measurement of labor quality.22

19. There are two other possible wage measures. One is an estimate of wages paid in the establishment that can be constructed using the annual wages of workers matched to the establishment, weighted up by the total employment in the establishment. The other is the total compensation measure in the LRD, which includes nonwage benefits. The results we report here are robust to these alternative definitions of wages.

20. Our statistical tests regarding the relative productivity (or relative wages) of workers in various demographic categories are tests of whether the coefficients equal one. For simplicity, we often refer to one of these estimated coefficients as statistically significant if it is statistically different from one.

21. The notion of the returns to scale is somewhat ambiguous in this context, as explained by Griliches (1957), because it is not clear whether one should calculate the returns to labor simply as $\gamma$, the coefficient on the entire log labor quality term, or as $2\gamma$, the returns to the log of $L$, labor bodies, plus the returns to the labor quality index. We consider the returns to labor to be just $\gamma$, interpreting it as the return to an additional unit of labor quality, and calculate the returns to scale accordingly.

The estimates of relative wage differentials that are generated when the wage equation is estimated simultaneously with the Cobb-Douglas production function are reported in column (2). The estimates indicate that women’s wages are 38 percent lower than men’s wages, a statistically significant wage gap that is similar to what is found in individual-level wage regressions using Census data.23 The results show that blacks are paid the same as similar whites and that ever-married workers are paid 12 percent more than similar never-married workers. The estimates of relative wages for workers of different ages clearly show a quadratic-type wage profile, with precisely estimated relative wages of workers aged thirty-five to fifty-four and aged fifty-five and over of 1.21 and 1.12, respectively. There is an estimated college wage premium of 1.36 and occupation premiums for the three occupations relative to the base category of unskilled production workers.

Tests of whether the estimated wage and marginal productivity differentials are equal shed light on whether one can simply substitute relative wages into the production function when forming the labor quality measure and provide evidence regarding specific models of wage determination. Column (3) of table 2.4 reports the p-values of tests of the equality of the coefficients from the production function (column [1]) and the wage equation (column [2]).24 The results for women show clear evidence that while women are estimated to be somewhat less productive than men, the wage gap between men and women exceeds the productivity gap.
The wedge between relative wages and relative productivity is –0.25 (0.621 – 0.869), and the p-value of the test of the equality of relative wages and relative productivity for women is 0.000. That is, we strongly reject the hypothesis that women’s lower wages can be explained fully by lower productivity, a finding that is consistent with the standard wage discrimination hypothesis (e.g., Becker 1971).

The p-value of the equality of the relative wages and relative productivity of blacks is 0.207, which is not surprising given that neither the estimated relative productivity nor the estimated relative wage of blacks is statistically significantly different from one. Therefore, we find no evidence of wage discrimination against blacks.25

22. If we exclude the other controls (industry, size, region, multiunit establishment) we include in the production function, the R-squared falls trivially, to 0.936.

23. We do not report results from individual wage equations using the worker-level wage data in the DEED, but results from the DEED are very close to those in the full SEDF and are similar to those we find for the plant-level wage equations as reported in column (2) of table 2.4.

24. These are p-values from Wald tests of the equality of two parameter estimates, where the covariance between the two parameter estimates is obtained easily because the wage and production function equations are jointly estimated.

Both the estimated productivity profile and the estimated wage profile are concave in age, but the p-values of 0.004 and 0.000 for the two age categories in column (3) show that the relative wages of workers aged thirty-five to fifty-four and aged fifty-five and over are both higher than their respective relative productivities. Because we are identifying relative productivities, this finding implies that the wage profile is steeper than the productivity profile. As mentioned previously, there are a number of models that imply tilted wage profiles like this, with the most famous being Lazear’s (1979) model of long-term incentive-compatible implicit contracts.26

Our results do suggest that more educated workers are underpaid; the p-value of the equality of relative wages and relative productivity by education is 0.000. This result, which, as we report below, was also found in Hellerstein, Neumark, and Troske (1999), remains somewhat puzzling as it is not predicted by any standard model of which we are aware.

Finally, for the occupation categories, the relative wages and relative productivities of two of the three occupation groups are statistically indistinguishable. In contrast, the p-value in column (3) for managerial and professional workers of 0.035 suggests that this group of workers is overpaid relative to its productivity (an estimated relative wage of 1.21 against an estimated relative productivity of 1.11). As we will show, however, this particular result turns out to be sensitive to the production function specification we use.

In columns (4)–(6) we report results where we specify a translog production function and jointly estimate it with the wage equation.
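The equality tests in column (3) are Wald tests that use the covariance between the productivity and wage estimates, which is not reported in the table. The sketch below therefore sets that covariance to zero purely as an assumption, so its p-values need not match the published ones; the function name and its arguments are hypothetical.

```python
import math


def wald_equality_test(est_a, se_a, est_b, se_b, cov_ab=0.0):
    """Wald test of H0: a = b (one restriction); returns (statistic, p-value).

    cov_ab is the covariance between the two estimates. It defaults to zero
    here only because the covariances are not reported in table 2.4.
    """
    variance_of_difference = se_a ** 2 + se_b ** 2 - 2.0 * cov_ab
    statistic = (est_a - est_b) ** 2 / variance_of_difference
    # Survival function of a chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Female: relative productivity (column 1) versus relative wage (column 2).
wedge = 0.621 - 0.869
stat_female, p_female = wald_equality_test(0.869, 0.026, 0.621, 0.007)

# Black: same comparison.
stat_black, p_black = wald_equality_test(0.949, 0.051, 1.010, 0.018)

print(f"wedge = {wedge:.3f}")
print(f"female: W = {stat_female:.1f}, p = {p_female:.3f}")
print(f"black:  W = {stat_black:.2f}, p = {p_black:.3f}")
```

Even with the covariance set to zero, the qualitative conclusions line up with column (3): equality is strongly rejected for women and not rejected for blacks. Passing a nonzero `cov_ab` shows how joint estimation changes the variance of the difference.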
Not surprisingly, the estimated relative wages in column (5) are extremely close to those reported in column (2), as the only difference between how they are derived is the specification of the production function with which they are jointly estimated. The estimated relative productivities show the same patterns as those reported in column (1), although there are some differences between the two. Once again, females are estimated to be less productive than males, with an estimate of $\phi_F$ in column (4) of 0.789, lower than that in column (1). Nonetheless, while the relative productivity and relative wages are estimated to be closer together using the translog specification, the p-value from the test of the equality of the two estimates is still 0.000, strongly rejecting their equality. The relative productivity of blacks in column (4) is 0.916, which is lower than that reported in column (1). This, coupled with the fact that the estimate is slightly more precise, generates a p-value of 0.045 in column (6), which would lead to the conclusion that there is statistical evidence that blacks are slightly overpaid in manufacturing. Given the sensitivity of this result across columns, however, we do not regard the data as decisive about the gap between wages and productivity for blacks.

25. As we discuss in Hellerstein, Neumark, and Troske (1999), blacks in manufacturing face a much lower negative wage premium (relative to whites) than blacks in other sectors of the economy, making it harder to detect possible differences between wages and productivity. Because of this, we are particularly hesitant to draw conclusions that extend beyond manufacturing regarding wage versus productivity differentials by race.

26. Technically, because we are identifying relative rather than absolute productivities, we cannot be sure that the wage and productivity profiles actually cross, which, in addition to deferred wages, is a feature of the Lazear model.

We continue to find in the translog specification that the relative wage and relative productivity of ever-married workers are statistically indistinguishable, and we continue to find strong evidence consistent with wages rising faster than productivity over the life cycle. In the translog specification, unlike in the Cobb-Douglas, the point estimates for the relative wage and relative productivity of managerial and professional workers are indistinguishable both qualitatively and quantitatively (the p-value is 0.898), and we cannot reject the equality of relative wages and productivity for the other two occupations either (although the p-value for the technical, sales, administrative, and service occupation falls to 0.07).

2.5.2 Comparison with Previous Results from the WECD

Before we turn to further estimates and robustness checks using the DEED sample and some of the key issues regarding the more general question of specifying the labor input, in table 2.5 we compare the results from joint estimation of the translog production function and wage equation using the DEED to the previously published results using the WECD. Columns (1)–(3) replicate the results reported in the last three columns of table 2.4, whereas columns (4)–(6) replicate the WECD results reported in table 3 of Hellerstein, Neumark, and Troske (1999).27 The first thing to note is the considerably greater precision of the estimates resulting from the DEED being more than six times larger than the WECD (20,056 versus 3,102 establishments).
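A rough sense of the precision gain can be obtained from the sample sizes given in the notes to tables 2.4 and 2.5. The sketch below assumes the textbook rule that standard errors shrink with the square root of the sample size, which is only an approximation here because the two samples also differ in composition; the variable names are illustrative.

```python
import math

n_deed = 20056  # establishments in the DEED sample
n_wecd = 3102   # establishments in the WECD sample

size_ratio = n_deed / n_wecd

# If standard errors fall with the square root of the sample size, WECD
# standard errors should be roughly this many times the DEED ones.
expected_se_ratio = math.sqrt(size_ratio)

print(f"sample size ratio = {size_ratio:.2f}")
print(f"expected SE ratio = {expected_se_ratio:.2f}")
```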
This is especially visible in the estimates from the production functions and in and of itself (aside from changes in the estimates) affects the inferences one draws from the results. Nonetheless, we consider the qualitative results across the two data sets to be essentially the same, with one important exception that we discuss below.

Table 2.5 Joint production function and wage equation estimates, translog production functions: 1990 Decennial Employer-Employee Dataset (DEED) and Worker-Establishment Characteristics Database (WECD)

| | DEED: Log (output) (1) | DEED: Log (wages and salaries) (2) | p-value [col. (1) = col. (2)] (3) | WECD: Log (output) (4) | WECD: Log (wages and salaries) (5) | p-value [col. (4) = col. (5)] (6) |
|---|---|---|---|---|---|---|
| Female | 0.789 (0.021) | 0.617 (0.007) | 0.000 | 0.840 (0.064) | 0.549 (0.016) | 0.000 |
| Black | 0.916 (0.045) | 1.003 (0.018) | 0.045 | 1.184 (0.140) | 1.119 (0.047) | 0.628 |
| Ever married | 1.103 (0.044) | 1.119 (0.018) | 0.715 | 1.453 (0.207) | 1.371 (0.066) | 0.676 |
| Some college | 1.481 (0.043) | 1.354 (0.015) | 0.002 | 1.673 (0.156) | 1.432 (0.044) | 0.108 |
| Aged 35–54 | 1.108 (0.031) | 1.210 (0.014) | 0.001 | 1.153 (0.108) | 1.193 (0.037) | 0.706 |
| Aged 55+ | 0.865 (0.038) | 1.128 (0.018) | 0.000 | 1.192 (0.145) | 1.183 (0.051) | 0.949 |
| Managerial/professional | 1.224 (0.047) | 1.218 (0.019) | 0.898 | 1.134 (0.136) | 0.998 (0.043) | 0.294 |
| Technical, sales, administrative, and service | 1.337 (0.046) | 1.259 (0.017) | 0.073 | 1.265 (0.124) | 1.111 (0.039) | 0.192 |
| Precision production, craft, and repair | 1.130 (0.040) | 1.111 (0.016) | 0.613 | 1.060 (0.121) | 1.023 (0.039) | 0.750 |
| Log capital | 0.066 (0.002) | | | 0.052 (0.007) | | |
| Log materials | 0.562 (0.005) | | | 0.592 (0.018) | | |
| Log labor quality | 0.372 (0.008) | | | 0.343 (0.024) | | |
| Log labor quality × log labor quality | 0.099 (0.006) | | | 0.106 (0.016) | | |
| Log materials × log materials | 0.156 (0.002) | | | 0.153 (0.007) | | |
| Log capital × log capital | 0.030 (0.002) | | | 0.021 (0.008) | | |
| Log materials × log labor quality | –0.115 (0.003) | | | –0.123 (0.007) | | |
| Log capital × log labor quality | 0.009 (0.003) | | | 0.014 (0.009) | | |
| Log capital × log materials | –0.037 (0.002) | | | –0.027 (0.006) | | |

Notes: Standard errors of the estimates are reported in parentheses. The sample size for the DEED sample is 20,056. The sample size for the WECD is 3,102 and the results in columns (4)–(6) are replicated directly from Hellerstein, Neumark, and Troske (1999). See notes to table 2.4 for other details.

The results in both data sets strongly imply that women are underpaid relative to their productivity, although the gap between relative wages and relative productivity is smaller in the DEED than in the WECD. The results for blacks differ somewhat across the two data sets. The relative productivity of blacks in the DEED is estimated to be 0.916, which is marginally statistically significant, while the relative wage of blacks is estimated to be 1.003, and the p-value of the test of the equality of these two coefficients is 0.045. In contrast, the point estimate of the relative productivity of blacks in the WECD is a much higher 1.18 with a large standard error (0.14), while the relative wage is 1.12, and the p-value of the test of their equality is 0.63. Nonetheless, as we showed in table 2.4, the relative productivity of blacks in the DEED is sensitive to the production function specification, so differences in estimates across samples are perhaps not surprising either. Moreover, blacks constitute only a small portion of employment in both samples, so measurement error in the constructed variable for percent black in the establishment may have a particularly large impact on the results; this may be especially true in the translog production function, where measurement error is exacerbated. It is fair to say, though, that our methods and data have yielded a less sharp picture than we would have liked regarding wages and productivity of blacks relative to whites.

27. The published results in Hellerstein, Neumark, and Troske (1999), which are replicated in columns (4)–(6) of table 2.5, are derived from observations on 3,102 establishments in the WECD, whereas the baseline comparisons between the samples in table 2.2 contain 3,101 establishments. This happened because the original microdata from the WECD sample from our previous work is no longer available at the Census Bureau. We therefore recreated the data from scratch using old programs and confirmed that the omission of one establishment does not affect any of the results.

In both data sets we find a productivity premium associated with marriage that is equal to the wage premium, but the estimates from the DEED are somewhat smaller and much more precise, leading perhaps to more conclusive evidence of the equality of the two premia. Similarly, in both data sets there is a productivity premium for education that exceeds the wage premium, although both of these premia are smaller in the DEED.

The one substantive difference in the inferences that can be made between the results from the two samples is the estimated wage and productivity profiles over the life cycle.
As can be seen in columns (4)–(6), in the WECD the point estimates of the relative wages and productivity of workers in each of the two older age groups are similar, and the p-values for the tests of the equality of the wages and productivity of both groups fail to reject the hypothesis that wage differentials reflect differences in marginal products. However, the relative productivities for workers in the two age groups reported in column (4) are quite imprecise, so that one also cannot reject the hypothesis that relative productivity does not change over the life cycle. In contrast, the results from the DEED for these age groups, as reported in columns (1)–(3), present a very different picture. First, while the estimated relative productivity of workers aged thirty-five to fifty-four in the DEED is 1.11, close to the 1.15 estimate in the WECD, the DEED estimate is statistically significantly different from one. Second, the estimated relative productivity of workers aged fifty-five and over in the DEED is only 0.87 and is statistically significantly different from one and qualitatively quite different from the estimate of 1.19 in the WECD. So, as mentioned previously, there is strong evidence of a quadratic-type productivity profile over the life cycle in the DEED. Both the WECD and DEED results suggest that wages rise as workers age into the thirty-five to fifty-four category, but it is only in the DEED that one sees clear evidence of a quadratic-type wage profile, evidence that again is made possible by the much more precise estimates. Finally, in contrast to the WECD results, and as mentioned previously, the p-values in column (3) from the DEED strongly reject the hypothesis that wage differentials over the life cycle reflect marginal productivity differentials. And again, our ability to find this is due at least in part to the fact that the sample size in the DEED leads to much greater precision in the estimates and, hence, much more statistical power in our tests, although the differences between the wage and productivity estimates are also larger in the DEED.

Interestingly, the point estimates of the coefficients for the productive inputs in the translog production function across the two data sets are remarkably similar although they are, of course, more precisely estimated in the DEED. So although the point estimates of the demographic characteristics are somewhat sensitive to what data set we use, the changes in these coefficients across data sets have virtually no effect on the estimates of the coefficients of the productive inputs. This foreshadows the results we report below, where we examine the sensitivity of the estimates of the production function parameters in the DEED as we alter the definition of labor quality.

In Hellerstein, Neumark, and Troske (1999), we conduct a series of robustness checks on specifications using the WECD that include relaxing in a number of ways the restrictions on the construction of the quality of labor term, estimating value-added production functions, estimating production functions where we instrument for log materials, and splitting up the sample into establishments characterized by high and low percentages of female employees and high and low total employment.
Conducting these same robustness checks using the DEED leads to very similar conclusions to those using the WECD, and so we do not report these here. The other robustness check we report in Hellerstein, Neumark, and Troske (1999) is a Monte Carlo simulation to examine the effects of measurement error in the estimates of the percent of workers in each demographic category on the (nonlinear) production function and wage equation estimates. We conducted that same simulation using the DEED. As with the WECD, the simulation shows that measurement error attenuates the estimates of relative productivity and relative wages toward one, with the attenuation greater in magnitude the farther from one is the true value. However, given that the attenuation occurs in both the wage and production function equations, it serves to bias the results toward finding no differences between the relative productivity and relative wage estimates for a given type of worker. Moreover, these biases are not large enough to change the estimates of the scale parameter, a finding that is consistent with the results we report in the following section.
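The attenuation result described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Monte Carlo design used in the chapter: it replaces the nonlinear production function with a linearized one-regressor model, treats the labor coefficient `gamma` as known, and uses made-up distributions for the group share and the errors. The function names `ols_slope` and `estimated_phi` are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a one-regressor least squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def estimated_phi(phi_true, noise_sd, gamma=0.4, n=20000, seed=0):
    """Estimate relative productivity phi from a mismeasured group share.

    Assumed linearized stand-in for the nonlinear model: log output net of
    other inputs equals gamma * (phi - 1) * share + error, with gamma known.
    """
    rng = random.Random(seed)
    share = [rng.uniform(0.1, 0.9) for _ in range(n)]           # true group share
    y = [gamma * (phi_true - 1.0) * s + rng.gauss(0.0, 0.1) for s in share]
    observed = [s + rng.gauss(0.0, noise_sd) for s in share]    # share with error
    theta_hat = ols_slope(observed, y)
    return 1.0 + theta_hat / gamma

for phi in (0.8, 1.6):
    clean = estimated_phi(phi, noise_sd=0.0)
    noisy = estimated_phi(phi, noise_sd=0.15)
    print(f"true phi={phi:.2f}  no error={clean:.3f}  with error={noisy:.3f}")
```

In this sketch the estimate moves toward one when the share is measured with error, and the absolute shortfall is larger for the true value that lies farther from one, which is the qualitative pattern the text reports.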
2.5.3 Production Function Estimates and Properties

Returning to the DEED estimates, because we have demeaned the inputs, the sum of the linear coefficients on log capital, log materials, and log labor quality in the translog production function can be used to measure returns to scale. This sum is estimated to be 0.9999 with a standard error of 0.006. That is, we continue not to reject constant returns to scale qualitatively or statistically. Indeed, the coefficients on these linear terms are very similar in the translog and in the Cobb-Douglas specifications. The estimates of the coefficients on the higher-order terms in the translog are all statistically significant. But while we can reject the Cobb-Douglas production function specification in favor of the translog, the key coefficients of interest to us (the coefficients of relative productivities of workers and the coefficients on the linear inputs in the production function, particularly that on labor quality) are fundamentally consistent across the two specifications. Because of this, we consider the Cobb-Douglas to be the baseline specification against which further results from the DEED are compared.

2.6 The Importance of Heterogeneous Labor for Production Function Estimates

In this section, we examine the sensitivity of production function estimates to varying the definition of the quality of labor aggregate by specifying the quality of labor aggregate less richly across many dimensions than we allow in the preceding. In this way, we examine whether the richness of the demographic information on workers in the DEED aids in accurate production function estimation.
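As a rough illustration of what "varying the definition of the quality of labor aggregate" means, the sketch below computes a labor quality index for a single dimension of heterogeneity, using the form that appears later in this section in equation (8): total employment scaled by one plus the sum of (relative productivity minus one) times each group's employment share. The function name `labor_quality` and the dictionary layout are hypothetical, the chapter's richer specification combines several demographic dimensions and is not reproduced here, and the inputs 0.40 (mean share with some college) and 1.69 (estimated relative productivity) are the values reported later in this section.

```python
import math

def labor_quality(total_employment, group_shares, relative_productivity):
    """One-dimensional labor quality index: L * (1 + sum_g (phi_g - 1) * share_g).

    group_shares maps each non-reference group to its share of employment;
    relative_productivity maps the same groups to phi_g (reference group = 1).
    """
    adjustment = sum(
        (relative_productivity[g] - 1.0) * share
        for g, share in group_shares.items()
    )
    return total_employment * (1.0 + adjustment)

L = 100.0
# Homogeneous labor: no groups, so quality-adjusted labor equals employment.
homogeneous = labor_quality(L, {}, {})
# High/low education split with illustrative inputs taken from the text.
by_education = labor_quality(L, {"some_college": 0.40}, {"some_college": 1.69})

print(homogeneous, by_education)
print(math.log(by_education) - math.log(homogeneous))  # log-point difference
```

With homogeneous labor the index collapses to total employment, which is the column (1) case; the education split raises measured labor input for a plant with the average college share.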
We focus our attention here on estimated parameters from the production function ($\alpha$, $\beta$, and $\gamma$) and also report the estimated productivity differentials (the $\phi$s).28 As described previously and noted first by Griliches (1970), mismeasuring the quality of labor aggregate will have a first-order effect on the bias of the estimate of the coefficient on labor ($\gamma$), and so we focus most on that parameter. In all specifications, we estimate Cobb-Douglas production functions jointly with similarly modified wage equations, so these estimates are comparable to those in column (1) of table 2.4.

28. We also comment on what changes in the definition of the labor quality aggregate do to estimated gaps between relative wages and relative productivity, but because both the wage and productivity equations have labor quality aggregates that are mismeasured in the same way, we expect that the biases that this produces will affect both equations similarly.

The results are reported in table 2.6. In column (1), we estimate a simple Cobb-Douglas production function where we assume that all labor is homogeneous so that overall labor quality is measured as total employment in the plant. The coefficients on capital, materials, and labor are estimated to be 0.068, 0.525, and 0.407, respectively, and all are very precisely estimated. These estimates are all within 0.010 of the estimates in table 2.4, where we allow labor to be heterogeneous across a wide variety of demographic characteristics. In columns (2)–(6) of table 2.6, we allow labor to be heterogeneous across a very limited set of characteristics. In column (2), we split workers into two occupations, production and nonproduction, paralleling the split available in the LRD. In column (3), we split workers into the same four occupations we used in the previous tables. In column (4), we split workers into two groups defined by whether they had any college education. In columns (5) and (6), we allow workers to vary by both education and occupation. Finally, in column (7) we return to a production function as suggested in papers such as Griliches (1970) and in equation (5), where we constrain the relative productivities of workers to be equal to their relative wages.29

Table 2.6  Joint production function and wage equation estimates, Cobb-Douglas production function, parsimonious variants of the definition of labor quality: 1990 Decennial Employer-Employee Dataset (DEED)

Columns: (1) Homogeneous labor; (2) Production/nonproduction; (3) Four occupations; (4) High- and low-education; (5) High- and low-education, production/nonproduction; (6) High- and low-education, four occupations; (7) Wage-adjusted labor quality.

                                               (1)            (2)            (3)            (4)            (5)            (6)            (7)
Log (capital)                                  0.068 (0.003)  0.068 (0.003)  0.068 (0.003)  0.067 (0.003)  0.068 (0.003)  0.067 (0.003)  0.067 (0.003)
Log (materials)                                0.525 (0.002)  0.525 (0.002)  0.525 (0.002)  0.525 (0.002)  0.525 (0.002)  0.525 (0.002)  0.525 (0.002)
Log (labor quality)                            0.407 (0.007)  0.406 (0.007)  0.407 (0.007)  0.406 (0.007)  0.406 (0.007)  0.406 (0.007)  0.394 (0.006)
Nonproduction                                  --             1.378 (0.034)  --             --             1.081 (0.032)  --             --
Managerial/professional                        --             --             1.545 (0.055)  --             --             1.095 (0.048)  --
Technical, sales, administrative, and service  --             --             1.431 (0.053)  --             --             1.188 (0.046)  --
Precision production, craft, and repair        --             --             1.270 (0.051)  --             --             1.172 (0.045)  --
Some college                                   --             --             --             1.688 (0.045)  1.619 (0.050)  1.611 (0.051)  --
$R^2$                                          0.938          0.939          0.939          0.940          0.940          0.940          0.939

Notes: Standard errors are in parentheses. The sample size is 20,056. The production function is jointly estimated with the wage equation as in table 2.4, columns (1) and (2). The wage equation results are not reported here. See notes to table 2.4 for other details.

Across the first six columns of table 2.6, the estimated coefficients on capital, materials, and labor quality never deviate by more than 0.001. Therefore, at least in these data, variation in the heterogeneity allowed in the quality of labor input has essentially no effect on estimates of the coefficients on capital, materials, or labor. In addition, the $R^2$s of the regressions are virtually identical across the columns and to the Cobb-Douglas results in table 2.4 so that allowing for heterogeneity in the labor input does not lead to measurably lower residual variance. In addition to reporting the estimated coefficients on capital, labor, and materials, we also report in table 2.6 the relative productivity of workers in different groups, as defined across the columns of the table. The most interesting finding based on the estimates of relative productivity is the comparison between columns (2) and (4).
In column (2), we split workers into production and nonproduction workers because in the LRD, establishments report total employment split into these two groupings, so one does not need matched data to estimate a production function with heterogeneous labor defined in this limited way. As a result, these classifications have been used in previous research (e.g., Berman, Bound, and Griliches 1994) as a proxy for the dichotomy between more- and less-educated (skilled) workers. We create our two occupations by taking the four occupation categories we use up to this point and consolidating into nonproduction all workers in the managerial/professional category and the technical and so on category, and consolidating into production all workers in precision production and so on and operators and so on (the omitted category in our estimation results).30 As reported in column (2), we estimate that nonproduction workers are 1.38 times as productive as production workers, with a standard error of 0.03. In column (4), we instead allow workers to vary by skill by using the DEED to directly measure the proportion of workers in each plant who have some college education. We then recover an actual estimate of the relative productivity of more-educated workers, which as reported in column (4) is 1.69, with a standard error of 0.05. Therefore, although classifying workers as production or nonproduction goes part of the way toward allowing workers to be heterogeneous based on education (or skill), in actuality, the relative productivity of more-educated workers is far larger than what one can recover using the production-nonproduction split.31 Nonetheless, it is once again worth remembering that regardless of how one classifies the heterogeneity of labor, little else in the production function estimation is affected.

29. This is done by first estimating the wage equation, equation (7), and substituting the estimates of the relative wages of workers (the estimated $\lambda$s) into the quality of labor as defined in equation (6). We then plug this new quality of labor term into the production function and reestimate it jointly with the wage equation.

30. Alternatively, we could have relied solely on the LRD to create these occupations and not have used the DEED at all, but that would have potentially caused comparability problems across columns of the table. The ASM filing instructions for survey respondents for establishments in 1989 contain lists of occupations that employers should consider when assigning workers to either production or nonproduction. We created an approximate concordance between three-digit Census occupations and the occupations in these filing instructions (not all occupations exist in both classifications), and assigned workers to production and nonproduction work based on their three-digit Census occupation. We then checked how this assignment compared to one where, as in the preceding, we simply split the four broad occupations into production and nonproduction. Using our method, we estimate that 0 percent of precision production and so on workers are misclassified according to the LRD classification, 3 percent of managerial workers are misclassified, 24 percent of technical and so on workers are misclassified, and 5 percent of operators and so on are misclassified; that is, for example, 3 percent of the workers in Census managerial occupations, which we classify as managerial, are classified as production workers according to the ASM classification.

31. Moreover, our definition of more-educated consists of workers with some college or higher, which is a lower threshold than often considered when classifying workers by education as high- or low-skilled.

Table 2.7  Joint production function and wage equation estimates, Cobb-Douglas and "Olley-Pakes" production functions: 1990 Decennial Employer-Employee Dataset (DEED)

Columns: Cobb-Douglas: (1) Log (output); (2) Log (wages and salaries); (3) p-value [col. (1) = col. (2)]. Olley-Pakes: (4) Log (output); (5) Log (wages and salaries); (6) p-value [col. (4) = col. (5)].

                                               (1)            (2)            (3)    (4)            (5)            (6)
Female                                         0.869 (0.026)  0.621 (0.007)  0.000  0.886 (0.028)  0.623 (0.007)  0.000
Black                                          0.949 (0.051)  1.010 (0.018)  0.207  0.963 (0.054)  1.010 (0.018)  0.351
Ever married                                   1.122 (0.052)  1.118 (0.018)  0.933  1.130 (0.055)  1.119 (0.018)  0.832
Some college                                   1.565 (0.051)  1.357 (0.015)  0.000  1.594 (0.055)  1.358 (0.015)  0.000
Aged 35–54                                     1.115 (0.035)  1.211 (0.014)  0.004  1.115 (0.037)  1.211 (0.014)  0.006
Aged 55+                                       0.792 (0.043)  1.124 (0.018)  0.000  0.795 (0.045)  1.123 (0.018)  0.000
Managerial/professional                        1.114 (0.050)  1.214 (0.019)  0.035  1.188 (0.056)  1.213 (0.019)  0.642
Technical, sales, administrative, and service  1.238 (0.048)  1.257 (0.017)  0.691  1.330 (0.054)  1.256 (0.017)  0.149
Precision production, craft, and repair        1.130 (0.045)  1.108 (0.016)  0.602  1.108 (0.048)  1.108 (0.016)  0.974
Log labor quality                              0.400 (0.007)  --             --     0.349 (0.007)  --             --

Notes: Standard errors of the estimates are reported in parentheses. The sample size is 20,056. Test statistics are from Wald tests. The excluded occupation is operators, fabricators, and laborers. Other variables included in the production function for both specifications are log capital, log materials, and dummy variables for industry (13), size (4 categories), region (4), and establishment part of multiplant firm. In the "Olley-Pakes" specification, second- and third-order terms in log materials and log capital are also included. Other control variables in the wage equation are dummy variables for industry (13), size (4 categories), and region (4).

To get some intuition for why the estimated coefficients of capital, materials, and labor units are essentially unaffected by the definition of the quality of labor index, consider the specification of labor quality where labor is just divided into high- and low-educated workers, as is done in column (4) of table 2.6. The production function specification that generates these results is

(8)  $\ln(Y) = A + \alpha \ln(K) + \beta \ln(M) + \gamma \ln\left[L\left(1 + (\phi_C - 1)\frac{C}{L}\right)\right] + X\delta + \varepsilon,$

where $A$ is a constant and $X$ is the vector of other controls we include in the production function. This equation can be approximately linearized as

(9)  $\ln(Y) = A + \alpha \ln(K) + \beta \ln(M) + \gamma \ln(L) + \theta \frac{C}{L} + X\delta + \varepsilon,$

where $\theta = \gamma(\phi_C - 1)$. The issue then becomes why omitting $C/L$ in this linear equation does not cause much omitted variable bias in the estimates of $\alpha$, $\beta$, or $\gamma$. For each of these parameters, the omitted variable bias can be computed by running an auxiliary regression of $C/L$ on all the right-hand-side variables (except $C/L$) in equation (9) and multiplying the estimate of $\theta$ from equation (9) times the estimated conditional correlation of the appropriate right-hand-side variable from the auxiliary regression. So, for example, we estimate from the auxiliary regression that the conditional correlation of $C/L$ and $\ln(K)$ is 0.02. Multiplying this by the estimate of $\theta$ of 0.20 from equation (9) yields 0.004, which is the upward bias in the estimated coefficient on $\ln(K)$ caused by omitting $C/L$ in equation (9). This is so small that it has no noticeable effect on the estimate of $\alpha$, the coefficient on capital that we report (nor any economically meaningful effect). Moreover, although there is variation across plants in the fraction of college workers (as reported in table 2.3, the mean is .40 with a standard deviation of .23), the variance of $C/L$ is small so that the residual variance is also virtually unaffected. This same analysis can also be used to show why, at least with the DEED data used here and the production function estimates we generate, defining labor quality in any of the numerous ways we do between tables 2.4 and 2.6 is not going to have a marked effect on the estimated coefficients on capital, materials, or labor, nor on the $R^2$s.32

Finally, recall the earlier discussion of how Griliches suggested incorporating information on variation in labor quality using wage ratios to proxy for differences in relative productivity. Our results thus far indicate that for quite a few types of workers, the assumption justifying this approach (that wages are set in a competitive spot market and hence relative wages equal relative marginal products) does not hold. On the other hand, our findings indicate that in an approach where wage ratios are used, bias transmitted to the standard production function parameters is negligible as is any change in the residual variance. Does this mean, then, that labor quality differences across workers are unimportant so that human capital cannot explain differences across establishments in productivity in the cross section, or, more importantly, in TFP growth rate calculations if one were to have multiple years of data on establishments, rather than the one cross section we have here? The answer is no.
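The omitted-variable-bias arithmetic used earlier in this section (an auxiliary-regression coefficient of 0.02 times a coefficient of 0.20 on the college share, giving a bias of 0.004) can be checked numerically. The sketch below is an illustration on simulated data, not the chapter's data or code: it uses a single included regressor standing in for log capital, made-up error variances, and an illustrative capital coefficient of 0.07. All variable names are hypothetical. It verifies the in-sample identity that the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the auxiliary slope.

```python
import random

def demean(values):
    """Subtract the sample mean so regressions can omit the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Bias implied by the numbers reported in the text.
reported_bias = 0.02 * 0.20

# Simulated data (assumed design): c plays the role of C/L, k of ln(K).
rng = random.Random(1)
n = 5000
k = [rng.gauss(0.0, 1.0) for _ in range(n)]
c = [0.02 * ki + rng.gauss(0.0, 0.2) for ki in k]
y = [0.07 * ki + 0.20 * ci + rng.gauss(0.0, 0.3) for ki, ci in zip(k, c)]
k, c, y = demean(k), demean(c), demean(y)

# Long regression of y on k and c, solved from the 2x2 normal equations.
skk, scc, skc = dot(k, k), dot(c, c), dot(k, c)
sky, scy = dot(k, y), dot(c, y)
det = skk * scc - skc ** 2
alpha_long = (scc * sky - skc * scy) / det
theta_long = (skk * scy - skc * sky) / det

# Short regression omits c; auxiliary regression is c on k.
alpha_short = sky / skk
aux_slope = skc / skk
bias = theta_long * aux_slope

print(f"reported bias: {reported_bias:.3f}")
print(f"short slope {alpha_short:.4f} = long slope {alpha_long:.4f} + bias {bias:.4f}")
```

The identity holds exactly in any sample, so the size of the bias is governed entirely by the two factors the text multiplies together; when both are small, as in the chapter's estimates, the included coefficient barely moves.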
In the cross section, our estimates clearly show that differences in labor quality across establishments are related to differences in output with a high degree of statistical significance. It is just that, relative to the other inputs and controls in the regression (in particular, industry controls), they contribute far less to explaining cross-sectional differences in output across establishments. Moreover, in a longitudinal setting, the results about the invariance of the production function parameters and the residual variance absolutely cannot be generalized for two reasons. Consider the first-differenced form of equation (9):

(10)  $\Delta \ln(Y) = \Delta A + \alpha \Delta \ln(K) + \beta \Delta \ln(M) + \gamma \Delta \ln(L) + \theta \Delta\left(\frac{C}{L}\right) + \Delta\varepsilon,$

where it is assumed that variables in $X$, such as industry dummies, are unchanged over time. ($\Delta A$ can appear because the intercept can vary over time.) If there is major skill upgrading by many establishments, particularly relative to changes in other inputs, the inclusion of $\Delta(C/L)$ in the regression may have a marked effect on reducing the estimate of $\Delta A$, which is generally called the TFP growth rate. Moreover, if the rate of skill upgrading is very variable across establishments (which it will be if establishments are starting from different initial levels of skill), including the change in the fraction of college-educated workers may also contribute substantially to reducing residual variance. This is all true even if, in the cross section, differences across establishments in the fraction of college-educated workers do not markedly affect residual variance or other estimated parameters. Finally, these changes in the fraction of college-educated workers across establishments may be more correlated with changes in the use of other inputs across establishments than are the levels33 so that the omitted variable bias may be larger in the first-differenced (or panel) setting. This would affect the coefficients on other inputs more heavily in the first difference than in the cross section.

In sum, researchers estimating cross-sectional production functions need not worry about the effect of unobserved labor quality on the other usual parameters of interest, but this does not imply that TFP growth rate calculations, residual variance calculations, or estimates of other parameters of interest would be similarly unaffected in the longitudinal setting.

32. The plant-level residuals can be thought of as measures of relative TFP across plants, controlling for industry affiliation, size, and so on, and the $R^2$s can be thought of as the variance in cross-sectional TFP estimates across plants. Not surprisingly, the correlation and rank correlation in the plant-level residuals across the columns of table 2.6 is consistently in the high 90s.

33. That is, it is possible that, for example, $\mathrm{Corr}[\Delta \ln(K), \Delta(C/L)]$ is much greater than $\mathrm{Corr}[\ln(K), C/L]$. A new type of capital, like computers, may be more complementary with worker skill than other forms of capital so that changes in capital that arise from computer investment may be highly correlated with skill upgrading of workers.

2.7 Accounting for Unobservables in the Production Function

Proper measurement of the quality of labor, of course, will not help yield consistent parameter estimates in the production function if the production function itself is misspecified. One of the most common criticisms of basic production function estimation is that it suffers from specification bias in the sense that there are omitted plant-specific state variables that affect input choices and also output (e.g., Marschak and Andrews 1944). There have been various approaches to dealing with this problem over the years, including using panel data and incorporating fixed plant effects (e.g., Griliches and Regev 1995). One of the most innovative attempts at dealing with omitted plant-specific productivity parameters is found in Olley and Pakes (1996). The basic insight in that paper is that because, with only a few assumptions, the plant investment function will be a monotonic function of observed state variables and the plant-specific unobserved state variable, the investment function can be inverted so that the unobserved state variable is a function of the observed state variable and the plant's observed investment decisions. They then empirically model this by appending to a Cobb-Douglas production function34 a polynomial expansion of capital (the observed state variable) and a proxy for investment. Estimating this modified production function identifies the output elasticity of labor, but identifying the output elasticity of capital requires a second step as capital enters into this modified production function both as a productive input and in the polynomial expansion as a proxy for the firm's unobservable productivity shock.

34. One could do the same thing with any chosen production function specification.

Levinsohn and Petrin (2003) build on this by noting that if there are intermediate inputs (like materials) in the production function, then under the same conditions as in Olley and Pakes (1996), a plant's demand function for this intermediate input is a monotonic function of the observed state variable (capital) and the unobservable state variable. In this case, the input demand function can be inverted so that the unobservable plant-specific state variable is a function of capital and the intermediate inputs. Further, they show that if there are adjustment costs to investment, using investment as a proxy can be problematic. We follow Levinsohn and Petrin by using capital and materials to proxy for the plant-specific unobservable although we continue to follow Olley and Pakes' suggestion to use a polynomial expansion of capital and materials in the regression to flexibly model the plant unobservable (rather than using locally weighted quadratic least squares regression, as Levinsohn and Petrin do).

As Griliches and Mairesse (1998) point out, this idea of using a proxy for a plant's unobservable productivity shock has the advantage over the more typical fixed-effects panel data approach of allowing for time-varying plant effects and allowing for more identifying variation in the other inputs. It is not, however, a complete panacea. Consider the Cobb-Douglas production function we estimate. Estimation consists of least squares regression of the log of output on the log of capital, the log of materials, and the log of labor quality. If we now want to follow the Olley-Pakes method of controlling for plant-level unobservables, we instead regress the log of output on the log of capital, the log of materials, the log of labor quality, and a polynomial in the log of capital and log of materials. Of course, if we include only a second-order polynomial expansion, then we have gone partway toward specifying a translog production function, where we have omitted the higher-order terms involving the log of labor quality. Similarly, if we include a third-order polynomial expansion, then we have gone partway toward specifying a third-order approximation to an arbitrary production function in (the log of) capital, materials, and labor quality. That is, consistent estimation in the Olley-Pakes framework, like consistent estimation of any production function specification, requires one to take a stand on the correct functional form of the production function so that one can separately identify the effects of the productive factors and the plant-specific unobservables. Similarly, misspecification of the underlying functional form of the production function in the Olley-Pakes framework can lead to inconsistent estimates of the production function parameters.35

35. Syverson (2001) points out a theoretical limitation of the Olley-Pakes procedure. The consistency of the Olley-Pakes procedure relies on the assumption that plant-level productivity is the only unobserved plant-level state variable in the investment function. This assumption is violated if, for example, markets are segmented so that a plant's output demand function is another unobserved state variable.

This limitation aside, in table 2.7 we explore the sensitivity of our results to incorporation of an Olley-Pakes type correction for plant-level unobservables. The first three columns report results from the Cobb-Douglas production function, replicating the results from table 2.4, columns (1)–(3). Columns (4)–(6) of table 2.7 list results from an Olley-Pakes production function, where we include a third-order polynomial in capital and materials as a proxy for plant unobservables.36 We only report the key coefficients, those on the demographic characteristics plus the coefficient on log labor quality (all of which are consistently estimated in the Olley-Pakes procedure, conditional on the model's assumptions, using simple nonlinear least squares).

36. Using a fourth-order polynomial did not change the results. Similarly, results comparing a baseline translog production function with a translog augmented by a fourth-order polynomial in materials and capital lead to the same qualitative conclusions that we report in this section.

The estimates from the wage equations in table 2.7, reported in columns (2) and (5), are essentially identical. This may not be surprising given that the wage equation specifications are identical across these columns, although as we model the error structure in the production function equations differently across the columns, the wage equation estimates can change as a result of the simultaneous estimation of the wage and production function equations. As for the estimated parameters on the demographic characteristics from the production function, reported in columns (1) and (4), the results are very similar. The coefficients on female, black, ever married, some college, and the two age categories differ in each case only in the second decimal place. As a result, as is clear from the p-values reported in columns (3) and (6), inferences about the equality of the wage and productivity parameters are identical across the two specifications. In table 2.4, the results on the production function coefficients for the percent black and for two of the three occupations were sensitive enough to the specification of the production function to cause the inferences from the p-values of the tests of the equality of the relative wage and productivity parameters to fluctuate between the Cobb-Douglas and translog specifications. This happens in table 2.7 only with the coefficients for one of the three occupation categories, managerial and professional workers, where the relative productivity estimate rises from 1.11 in column (1) to 1.19 in column (4), causing the p-value to rise from 0.04 in column (3) to 0.64 in column (6). Finally, the coefficient on log labor quality falls from 0.40 in the Cobb-Douglas specification to 0.35 in the Olley-Pakes estimates, which is actually smaller than its linear counterpart in the translog specification in table 2.4 but which does not represent a huge qualitative drop.

In total, across tables 2.4 and 2.7, the results on the coefficients of demographic characteristics are remarkably consistent across specifications, demonstrating that the exact functional form of the production function and unobserved plant-level heterogeneity do not matter much to estimates of the relative productivity and relative wages of workers or to the differences between relative productivity and relative wages.

2.8 Conclusion

In this paper, we document the construction of a new matched employer-employee data set, the 1990 DEED, which is a match between the Long Form of the 1990 Decennial Census of Population and the SSEL. We show that for manufacturing workers and the establishments in which they work, the DEED is representative of the underlying population along many important dimensions, more so than previous matched data in manufacturing, and provides a large and rich data set with which to examine the relationships between workers, wages, and productivity in manufacturing. We then take the subset of manufacturing establishments in the DEED and match them to the 1989 LRD so that we can recover information necessary to estimate production functions. We specify the labor quality input in each plant using the demographic information on workers in the DEED who have been matched to manufacturing establishments and, coupled with information from the LRD, jointly estimate production functions and wage equations.
Our results imply that collecting detailed data on workers in manufacturing establishments is useful for testing models of wage determination, where in order to formally test these models, one needs information on relative productivities of workers of different types. But the results also indicate that this detailed information on establishments’ workforces is not a necessary component to the estimation of the rest of the production function. This last finding should be good news to researchers who use the usual microdata sets that do not contain detailed worker information to estimate cross-sectional production functions and suggests that most unmeasured variation in labor quality is unlikely to have large effects on estimates of the production function parameters or of the residual, at least in the context of estimates using microlevel data from manufacturing plants.
Appendix

Table 2A.1  Two-way frequency of hand-checked scores for all hand-checked data from 1990 DEED

All industries

Score A        Score B = 1     Score B = 2     Score B = 3   Score B = 4   Score B = 5   Row total
1              9,930 (66.16)   2,229 (14.85)   291 (1.94)    56 (0.37)     79 (0.53)     12,585 (83.85)
2                              1,126 (7.50)    406 (2.71)    95 (0.63)     30 (0.20)     1,657 (11.04)
3                                              158 (1.05)    123 (0.82)    252 (1.68)    533 (3.55)
4                                                            40 (0.27)     101 (0.67)    141 (0.94)
5                                                                          93 (0.62)     93 (0.62)
Column total   9,930 (66.16)   3,355 (22.35)   855 (5.70)    314 (2.09)    555 (3.70)    15,009 (100)

Manufacturing

Score A        Score B = 1     Score B = 2     Score B = 3   Score B = 4   Score B = 5   Row total
1              1,981 (72.40)   426 (15.57)     17 (0.62)     3 (0.11)      0 (0.00)      2,427 (88.71)
2                              243 (8.88)      43 (1.57)     4 (0.15)      0 (0.00)      290 (10.60)
3                                              8 (0.29)      9 (0.33)      0 (0.00)      17 (0.62)
4                                                            1 (0.04)      1 (0.04)      2 (0.07)
5                                                                          0 (0.00)      0 (0.00)
Column total   1,981 (72.40)   669 (24.45)     68 (2.49)     17 (0.62)     1 (0.04)      2,736 (100)

Notes: Percent of sample in cell is reported in parentheses. We have recorded all nonmatching scores above the diagonal. Score A and Score B refer to the match scores given in checking matches. 1 = definitely a correct match; 2 = probably a correct match; 3 = not sure; 4 = probably not a correct match; 5 = definitely not a correct match.
References

Abowd, John M., and Francis Kramarz. 1999. The analysis of labor markets using matched employer-employee data. In Handbook of labor economics. Vol. 3B, ed. Orley C. Ashenfelter and David Card, 2629–2710. Amsterdam: Elsevier Science.

Abowd, John M., Paul Lengermann, and Kevin McKinney. 2002. The measurement of human capital in the U.S. economy. Longitudinal Employer-Household Dynamics (LEHD) Technical Paper no. 2002-09. Washington, DC: Government Printing Office.

Ashenfelter, Orley C., and David Card, eds. 1999. Handbook of labor economics, Vols. 3A–3C. Amsterdam: Elsevier Science.

Bartelsman, Eric J., and Mark Doms. 2000. Understanding productivity: Lessons from longitudinal micro data. Journal of Economic Literature 38 (3): 569–94.

Becker, Gary S. 1971. The economics of discrimination. Chicago: University of Chicago Press.

Becker, Gary S. 1975. Human capital. Chicago: University of Chicago Press.

Ben-Porath, Yoram. 1967. The production of human capital and the life cycle of earnings. Journal of Political Economy 75:352–65.

Berman, Eli, John Bound, and Zvi Griliches. 1994. Changes in the demand for skilled labor within U.S. manufacturing industries: Evidence from the Annual Survey of Manufactures. Quarterly Journal of Economics 109:367–97.

Foster, Lucia, John Haltiwanger, and C. J. Krizan. 1998. Aggregate productivity growth: Lessons from microeconomic evidence. NBER Working Paper no. 6803. Cambridge, MA: National Bureau of Economic Research.

Frank, Robert H., and Robert M. Hutchens. 1993. Wages, seniority, and the demand for rising consumption profiles. Journal of Economic Behavior and Organization 21:251–76.

Griliches, Zvi. 1957. Specification bias in estimates of production functions. Journal of Farm Economics 39 (1): 8–20.

Griliches, Zvi. 1960. Measuring inputs in agriculture: A critical survey. Journal of Farm Economics 42 (5): 1411–33.

Griliches, Zvi. 1970. Notes on the role of education in production functions and growth accounting. In Education, income, and human capital, ed. W. Lee Hansen, 71–115. Studies in Income and Wealth, vol. 35. New York: Columbia University Press.

Griliches, Zvi. 2000. R&D, education, and productivity. Cambridge, MA: Harvard University Press.

Griliches, Zvi, and Jacques Mairesse. 1998. Production functions: The search for identification. In Practicing econometrics: Essays in method and application, ed. Zvi Griliches, 383–411. Cheltenham, UK: Edward Elgar.

Griliches, Zvi, and Haim Regev. 1995. Firm productivity in Israeli industry, 1979–1988. Journal of Econometrics 65:175–203.

Hellerstein, Judith K., and David Neumark. 1995. Are earnings profiles steeper than productivity profiles? Evidence from Israeli firm-level data. Journal of Human Resources 30 (1): 89–112.

Hellerstein, Judith K., and David Neumark. 1999. Sex, wages, and productivity: An empirical analysis of Israeli firm-level data. International Economic Review 40 (1): 95–123.

Hellerstein, Judith K., and David Neumark. 2003. Ethnicity, language, and workplace segregation: Evidence from a new matched employer-employee data set. Annales d'Economie et de Statistique 71–72:19–78.

Hellerstein, Judith K., and David Neumark. 2006. Using matched employer-employee data to study labor market discrimination. In Handbook on the economics of discrimination, ed. William Rodgers, 29–60. Cheltenham, UK: Edward Elgar.

Hellerstein, Judith K., David Neumark, and Kenneth R. Troske. 1999. Wages, productivity, and worker characteristics. Journal of Labor Economics 17 (3): 409–46.

Jarmin, Ron S., and Javier Miranda. 2002. The Longitudinal Business Database. CES Working Paper no. CES-WP-02-17. Washington, DC: Center for Economic Studies.

Jorgenson, Dale W., Laurits R. Christensen, and Lawrence J. Lau. 1973. Transcendental logarithmic production frontiers. Review of Economics and Statistics 55:28–45.

Lazear, Edward. 1979. Why is there mandatory retirement? Journal of Political Economy 87 (6): 1261–84.

Levinsohn, James, and Amil Petrin. 2003. Estimating production functions using inputs to control for unobservables. Review of Economic Studies 70 (2): 317–41.

Loewenstein, George, and Nachum Sicherman. 1991. Do workers prefer increasing wage profiles? Journal of Labor Economics 9:67–84.

Marschak, Jacob, and William H. Andrews, Jr. 1944. Random simultaneous equations and the theory of production. Econometrica 12:143–205.

McGuckin, Robert, and George Pascoe. 1988. The Longitudinal Research Database (LRD): Status and research possibilities. Survey of Current Business 68 (November): 30–37.

Mincer, Jacob. 1974. Schooling, experience, and earnings. New York: Columbia University Press.

Olley, Steven, and Ariel Pakes. 1996. The dynamics of productivity in the telecommunications equipment industry. Econometrica 64 (6): 1263–97.

Syverson, Chad. 2001. Output market segmentation, heterogeneity, and productivity. PhD diss., University of Maryland.
3 Where Does the Time Go? Concepts and Measurement in the American Time Use Survey

Harley Frazis and Jay Stewart
[T]his is the single most important statistical initiative of the federal government currently underway. —William Nordhaus, referring to the American Time Use Survey in his testimony before the Joint Economic Committee on July 24, 2002
Harley Frazis and Jay Stewart are research economists on the Employment Research and Program Development Staff at the U.S. Bureau of Labor Statistics. The views expressed in this paper are ours and do not necessarily reflect those of the Bureau of Labor Statistics. We thank Diane Herz, Jim Spletzer, and Charles Hulten for helpful comments. Any remaining errors are ours.

3.1 Introduction

Time use surveys have been around for eighty years. They were conducted in the Union of Soviet Socialist Republics (U.S.S.R.) and by the U.S. Department of Agriculture (USDA) in the 1920s and in Finland in the 1930s and 1940s. The Szalai International Study, which was the first multinational effort, was conducted in 1965–1966 in twelve countries including the United States. The 1970s saw a large increase in the number of time use surveys worldwide, but relatively few have been conducted in the United States. In addition to the 1965–1966 time use survey, there have been surveys in 1975–1976 (with a follow-up in 1981), 1978, 1985, 1992–1994, and 1998–1999. All of these surveys have relatively small sample sizes, and none were conducted by the federal government (see Harvey and St. Croix 2003).

The American Time Use Survey (ATUS) is the first time use survey conducted by the U.S. government since the USDA studies of the 1920s. It is the only ongoing time use survey, and with the release of the 2004 data, its sample size exceeds that of any other time use survey.

The Bureau of Labor Statistics (BLS) started looking into measuring time use in 1991 after the Unremunerated Work Act was introduced. That bill, which did not pass, specifically named BLS as the responsible agency. Since then, the BLS has engaged in many activities (most importantly the BLS Time Use Pilot Study and the MacArthur Conference in 1997 and a report to the National Academy of Sciences in 1999) to assess the feasibility of collecting time use data on an ongoing basis. These activities provided the foundation for the eventual funding and subsequent development of the ATUS.

In the following we describe the ATUS, review some of the uses of time use data, and discuss how specific features of the ATUS affect two key applications: valuing household work and estimating hours worked for pay.

3.2 The American Time Use Survey

This section briefly describes the key elements of the ATUS. For a fuller description and the rationale behind many of the decisions that were made, see Herz and Horrigan (2004, 2005).

3.2.1 Data Collection

The ATUS sample is a stratified random sample that is drawn from households that have completed their participation in the Current Population Survey (CPS). The CPS households (more strictly, addresses, because movers are not followed) are in the CPS sample for eight months over a sixteen-month period (four months in the survey, eight months out, and four months in). Each month, about 7,500 of these are completing their participation in the CPS; that is, they are in their eighth month in sample, hereafter referred to as “MIS 8.” The pool of eligible households is smaller than the MIS 8 sample size because the CPS oversamples smaller states, and these oversample households are eliminated from the pool of eligible households.1 Sample households are selected based on the characteristics of the reference person, and then the respondent is randomly selected from the list of adult (fifteen or older) household members. All adults within a household have the same probability of being selected. During 2003, the ATUS interviewed about 1,725 individuals per month, but beginning in January 2004 the sample size was reduced to about 1,100 per month.

The ATUS is administered using computer assisted telephone interviewing (CATI), rather than the paper diaries that many other countries use.
The cost of collecting paper diaries would be prohibitive for an ongoing survey. (The Canadian time use survey also uses telephone data collection.) Respondents are asked about their activities on the day before they are interviewed. If the respondent is unavailable on his or her initial calling day, then subsequent contact attempts are made on the same day of the week. This ensures that the reference day is always the same day of the week as the initial reference day and allows more control over the distribution of the sample over days of the week. Field testing showed that allowing the respondent more flexibility did not improve response rates.

1. Thus, the ATUS sample is nationally representative but cannot be used to generate state-level estimates.
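To make the sampling calendar in this section concrete, the following Python sketch lists the months in which an address is in the CPS sample under the four-in, eight-out, four-in rotation, and generates contact attempts that all fall on the weekday of the initial calling day. The function names and the n_attempts parameter are illustrative assumptions; the text does not say how many weeks of attempts are made.

```python
from datetime import date, timedelta


def cps_months_in_sample():
    """Month offsets (0 = first month in sample) under the 4-8-4 rotation."""
    first_spell = list(range(0, 4))      # four months in the survey
    second_spell = list(range(12, 16))   # eight months out, then four months in
    return first_spell + second_spell


def designated_day_attempts(initial_calling_day, n_attempts):
    """Calling days that all fall on the weekday of the initial calling day.

    n_attempts is a hypothetical parameter, not a documented ATUS rule.
    """
    return [initial_calling_day + timedelta(weeks=k) for k in range(n_attempts)]


offsets = cps_months_in_sample()
mis8_offset = offsets[-1]                # month-in-sample 8, the ATUS sampling pool
attempts = designated_day_attempts(date(2003, 1, 7), 3)
```

Because every attempt is a whole number of weeks after the initial calling day, the diary day (the day before the call) also stays on a fixed day of the week.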
3.2.2 Demographic Information

Because the ATUS uses the CPS as a sampling frame, it contains the same demographic information as the CPS, most prominently age, race, sex, relationship to the respondent, education, and marital status. For household members who were present during the CPS MIS 8 interview, all demographic information is carried over. For new household members, the ATUS collects only age, sex, and relationship to the respondent.

3.2.3 Labor Force Information

The ATUS updates labor force information using a modified version of the basic CPS questionnaire. The reference period for ATUS employment questions is slightly different from that in the CPS. To ascertain the respondent’s employment status on the reference day, the ATUS asks about work activities during the previous seven days (i.e., the last day is the reference day). This differs from the CPS, which asks about the week that contains the twelfth of the month, which is the calendar week prior to the interview. It was believed (and examination of gross flows data confirms this belief; see Stewart 2004) that there would be too many transitions between labor force states if the previous calendar week were used. Of course, it is still possible that the respondent was employed at the beginning of the seven-day period and had lost or left the job by the reference day. But these transitions should be relatively rare.

The labor force questions allow us to determine whether the respondent is Employed, Unemployed, or Not in Labor Force (NILF) but do not allow us to distinguish between the three categories of NILF (Retired, Disabled/Unable, and Other) as in the CPS.2 Nor does the ATUS ask the CPS questions that permit classification of NILF respondents as “discouraged workers” who have given up the job search or questions on the respondent’s job history. Job history information can be obtained by matching the ATUS interview to the respondent’s MIS 8 interview.
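The seven-day employment reference period described above can be written out as a small date calculation. This is a sketch under the stated rules only (the diary day is the day before the interview, and the seven-day window ends on that day); the function name is hypothetical.

```python
from datetime import date, timedelta


def atus_reference_window(interview_day):
    """Return (first_day, reference_day) of the seven-day employment window."""
    reference_day = interview_day - timedelta(days=1)  # diary day precedes the interview
    first_day = reference_day - timedelta(days=6)      # seven days ending on the reference day
    return first_day, reference_day


first_day, reference_day = atus_reference_window(date(2003, 3, 12))
```

Unlike a fixed calendar week, this window moves with the interview date, so its last day always coincides with the day described in the time diary.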
For respondents who are employed, the ATUS asks questions about hours, earnings, and industry and occupation. The ATUS asks respondents to report usual hours, but does not collect actual hours. Actual hours are highly correlated with usual hours, and, for most purposes, usual hours are more relevant.3 The earnings questions are asked of everybody who has a new job in the ATUS. This includes people who changed jobs between the MIS 8 interview and the ATUS interview and people who made a nonemployment-to-employment transition. We also ask the earnings questions if the MIS 8 earnings data were allocated. The earnings data for all other respondents are carried over from the MIS 8 interview.

The ATUS does not collect as much information about other household members. For the respondent’s spouse or unmarried partner, the ATUS collects basic labor force information: employment status (employed or not employed) and total hours usually worked per week. For other household members, the ATUS does not collect any labor force information.

2. The ATUS does distinguish between “At Work” and “With Job but Absent from Work” for the employed, and between “Looking” and “On Layoff” for the unemployed.

3. There are two problems with collecting hours for the previous seven days. First, respondents may have a difficult time determining hours for a seven-day period that does not correspond to a calendar week. Second, asking about hours for the previous seven days would result in a biased estimate of actual hours worked. For example, an individual who worked unusually long hours during a week would be less likely to be contacted during that week, making it more likely that he or she is contacted the following week (and asked to report hours for the busy week). Hence, long work weeks tend to be oversampled. However, the direction of this bias is indeterminate because vacation weeks also tend to be oversampled.

3.2.4 Time Diary

The core time diary of the ATUS is very similar to those of other time use surveys. The respondent is asked to take the interviewer through his or her day via a conversational interview. The diary day starts at 4:00 a.m. and goes through 4:00 a.m. of the following day (the interview day), so each interview covers a twenty-four-hour period. The respondent describes each activity, which the interviewer either records verbatim or, for a limited set of common activities (such as sleeping or watching television), hits a precode button. For activities that are not precoded, the verbatim responses are coded according to a three-tier scheme so that each activity has a six-digit code (two digits per tier). Coders are also interviewers, which means that when interviewing respondents, they know what level of detail is required for coding. For example, when the respondent reports that they were reading without giving more detail, the interviewer asks: “Was that for your current job, to get a degree, pleasure, or something else?”

For each episode, the ATUS collects either the stop time or duration of the activity (the start time is simply the stop time of the previous activity). For the last activity of the day (the one that spans 4:00 a.m. on the morning of the interview), the ATUS records the actual stop time, even though the episode “ends” at 4:00 a.m. for official estimates.4

Respondents are also asked to report where they were and who they were with, unless the activity is sleeping or grooming (neither is asked) or working at a job (only where is asked). If the respondent was at home, he or she is asked to report who else was in the room. If the respondent was away from home, he or she is asked to report who accompanied him or her. The “who” codes for household members refer to specific individuals, which is particularly useful for researchers who are interested in estimating the amount of time that parents spend with their children. The “where” code for an activity specifies either a location or a mode of transportation.

4. This gives us the completed duration of the final activity, which is usually sleeping. Otherwise, for most respondents, we would have information on two truncated episodes of sleep. For official estimates of the time spent sleeping, we will use the two truncated episodes because the total time in all episodes must sum to 1,440 minutes each day.
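Two mechanical details of the diary can be illustrated with a short Python sketch: splitting a six-digit activity code into its three two-digit tiers, and clipping the final episode at 4:00 a.m. so that the day sums to 1,440 minutes. The function names, the example durations, and the example code value are assumptions made for illustration and are not taken from the ATUS coding lexicon.

```python
DIARY_MINUTES = 24 * 60  # the diary day runs from 4:00 a.m. to 4:00 a.m.


def split_activity_code(code):
    """Split a six-digit activity code into its three two-digit tiers."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit code")
    return code[0:2], code[2:4], code[4:6]


def truncate_to_diary_day(durations):
    """Clip durations (minutes, in diary order from 4:00 a.m.) to 1,440 in total."""
    clipped = []
    remaining = DIARY_MINUTES
    for minutes in durations:
        used = min(minutes, remaining)
        clipped.append(used)
        remaining -= used
    return clipped


# Hypothetical diary: the final episode runs past 4:00 a.m. of the interview day.
reported = [180, 600, 420, 480]
official = truncate_to_diary_day(reported)
tiers = split_activity_code("010203")  # made-up code, not an actual ATUS category
```

Keeping both the reported and the clipped durations mirrors the distinction drawn in the text between the actual stop time that is recorded and the truncated episode used for official estimates.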
It is important to note that the ATUS data only contain information about the respondent’s primary activity. The BLS is looking into the feasibility of systematically collecting secondary activities. Currently, if the respondent reports two activities, both are recorded by the interviewer, but only the primary activity (as determined by the respondent) is coded. Analysis of these simultaneous activities will allow the BLS to determine how often and under what conditions respondents spontaneously report secondary activities and will provide information about how often interviewers will need to probe for this information.

3.2.5 Summary Questions

In addition to the labor force questions and the time diary, the ATUS asks several summary questions that are designed to obtain information that cannot readily be obtained from the core time diary.

Child Care as a Secondary Activity

In the course of developing the survey, the BLS determined that the most important activity missed by not collecting secondary activities is child care. Examination of data on secondary activities from the Australian National Time Use Survey indicates that individuals spend three to four times as much time in child care as they do in other household work. Further, attendees at the MacArthur Conference and the National Academy of Sciences (NAS) Workshop expressed a strong preference that the ATUS collect child care as a secondary activity (henceforth, we will refer to this as secondary child care).

To capture secondary child care, the ATUS asks respondents to identify the activities during which a child under thirteen was “in your care.” Cognitive testing revealed significant variation in how respondents answered the child care summary questions (see Schwartz 2001). Specifically, some respondents reported only times when both the respondent and the child were awake, while others included activities and times when either the respondent or the child was sleeping.
To mitigate the impact of this inconsistent reporting, it was necessary to put limits on when secondary child care can occur. For official estimates of secondary child care, the ATUS only includes activities that occurred when the respondent and at least one child under thirteen were awake.5

5. We begin the child care questions by asking the respondent when the first child under thirteen woke up and when the last child under thirteen went to bed. However, we do not collect any other information about the children’s activities, so secondary child care estimates include time when children are taking naps.

Paid Work

The paid work summary questions are designed to do two things. First, they are designed to identify income generating activities. These are typically things like arts and crafts that are not done as part of the respondent’s main or secondary job but may generate income. Second, and more important, they are designed to identify activities that are done for the respondent’s main or secondary job. This could include things like bringing work home or grading papers. It could also include things like taking clients out to dinner. Ideally, the respondent would report these activities as paid work, but that is not always the case. Furthermore, for self-employed workers who work at home, the distinction between work and home life can be blurred. Although most self-employed workers report to work just like wage and salary workers, some work at home and intermix work and nonwork activities. For example, a respondent may report that he or she spent thirty minutes doing e-mail correspondence at home, but it may not be clear whether this was for work. Like the child care questions, the three paid-work questions ask respondents to identify activities that were done for their main job, their other job(s), or that were done for pay (other income).

The ATUS also asks a similar summary question in which respondents identify activities that were done for a volunteer organization. This is necessary because it may not be clear that, for example, baking cookies for a bake sale is volunteer work if the respondent does not say for whom the cookies are being baked in the time diary.

Absences from Home

Because the ATUS calls respondents at their homes, it does not obtain any information about what people do when they are absent from home. To fill this gap, the ATUS asks “missed days” questions that allow us to estimate the amount of time people spend away from home and to find out the purpose of the absence. We do not envision using these data to augment official time use estimates, but they should help us better understand what we are missing because we cannot contact respondents while they are away from home.
The ATUS asks respondents to report the number of absences from home that lasted two nights or longer during the month prior to the initial calling date.6 For each absence, the respondent is asked to report the purpose of the absence and the number of nights the respondent was away from home. Unfortunately, due to programming difficulties, these data are not available for 2003 and 2004. Although the BLS does not adjust official estimates for time away from home, other analysts may want to adjust their estimates. For example, if one is willing to make some assumptions, estimates of hours worked could be adjusted to account for vacation time and business trips.

6. We do not call these absences “trips,” because they could be hospital stays or jail time. We do not ask about one-night absences, because these are captured by the core time diary.
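The awake-time restriction on secondary child care described earlier in this section can be sketched as an interval overlap. The episode fields (start, stop, in_care, sleeping) are hypothetical names, times are minutes after 4:00 a.m., and the sketch does not try to separate out episodes whose primary activity is already child care. Because only the first wake-up and last bedtime of the children are known, children's naps inside that window are counted, as the chapter notes.

```python
def secondary_child_care_minutes(episodes, first_child_up, last_child_to_bed):
    """Minutes of in-care time while the respondent and a child under 13 are awake.

    Times are minutes after 4:00 a.m. Each episode is a dict with the
    hypothetical keys: start, stop, in_care, sleeping.
    """
    total = 0
    for ep in episodes:
        if not ep["in_care"] or ep["sleeping"]:
            continue  # not reported as in care, or the respondent was asleep
        overlap = min(ep["stop"], last_child_to_bed) - max(ep["start"], first_child_up)
        total += max(0, overlap)
    return total


# Made-up diary for illustration only.
diary = [
    {"start": 0, "stop": 180, "in_care": True, "sleeping": True},
    {"start": 180, "stop": 240, "in_care": True, "sleeping": False},
    {"start": 240, "stop": 780, "in_care": False, "sleeping": False},
    {"start": 780, "stop": 1020, "in_care": True, "sleeping": False},
    {"start": 1020, "stop": 1440, "in_care": True, "sleeping": True},
]
minutes = secondary_child_care_minutes(diary, first_child_up=200, last_child_to_bed=960)
```

In this example the two waking in-care episodes contribute 40 and 180 minutes once they are clipped to the children's waking window.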
Table 3.1  Where does the time go? Time spent in major activities, by sex and employment status (in hours per day)

                                                          Total        Employed     Not employed
Activity                                                 Men  Women    Men  Women    Men  Women
Personal care activities (a)                            9.13   9.54   8.81   9.29   9.93   9.87
Eating and drinking                                     1.24   1.18   1.22   1.16   1.30   1.20
Household activities (b)                                1.33   2.30   1.16   1.88   1.76   2.89
Purchasing goods and services                           0.68   0.94   0.64   0.91   0.80   0.97
Caring for and helping household members                0.34   0.75   0.38   0.72   0.23   0.79
Caring for and helping nonhousehold members             0.26   0.31   0.25   0.27   0.30   0.36
Working and work-related activities (c)                 4.57   2.87   6.26   4.88   0.25   0.11
Educational activities (d)                              0.45   0.50   0.25   0.33   0.95   0.72
Organizational, civic, and religious activities (e)     0.29   0.35   0.26   0.30   0.38   0.42
Leisure and sports                                      5.41   4.83   4.53   3.89   7.66   6.12
Telephone calls, mail, and email (f)                    0.13   0.24   0.11   0.20   0.18   0.31
Other activities, not elsewhere classified (g)          0.18   0.20   0.14   0.16   0.27   0.26

(a) Sleeping, bathing, getting dressed, etc.
(b) Housework; cooking; yard care; home maintenance, repair, renovation, and decoration, etc.
(c) Includes job search activities.
(d) Taking classes, doing homework, taking care of administrative matters, etc.
(e) Includes time spent doing work for volunteer organizations.
(f) Does not include purchases of goods and services.
(g) Uncodeable activities, Don’t Know, Refused.
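As a small worked example, the rows of table 3.1 can be combined into broader aggregates. The sketch below takes nonmarket work to be household activities, purchasing goods and services, and care of household members (the chapter's definition) and adds market work for the first two columns of the table. The variable and function names are illustrative.

```python
# Hours per day from table 3.1; each tuple is (men, women) for the total population.
TABLE_3_1_TOTAL = {
    "Household activities": (1.33, 2.30),
    "Purchasing goods and services": (0.68, 0.94),
    "Caring for and helping household members": (0.34, 0.75),
    "Working and work-related activities": (4.57, 2.87),
}
NONMARKET = [
    "Household activities",
    "Purchasing goods and services",
    "Caring for and helping household members",
]


def work_totals(table, column):
    """Return (nonmarket, market, combined) hours per day for one column."""
    nonmarket = sum(table[name][column] for name in NONMARKET)
    market = table["Working and work-related activities"][column]
    return nonmarket, market, nonmarket + market


men = work_totals(TABLE_3_1_TOTAL, 0)
women = work_totals(TABLE_3_1_TOTAL, 1)
```

The combined totals come to roughly 6.92 hours per day for men and 6.86 for women, so the two are nearly equal even though the split between market and nonmarket work differs.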
3.2.6 Where Does the Time Go? A Look at the Data

Table 3.1 presents a sample of the types of estimates that can be generated using ATUS data. It shows the amount of time (in hours per day) spent in twelve major activities on an average day by sex and employment status. Estimates are for men and women age fifteen and older and were generated using sample weights that account for ATUS sample design, nonresponse, and the fact that the ATUS assigns half the sample to weekends.

The first two columns of table 3.1 show the time spent by men and women in each of the major activities. Women spend more time doing nonmarket work (4.0 versus 2.4 hours per day),7 while men spend more time in market work and work-related activities (4.6 versus 2.9 hours per day). Adding market and nonmarket work together, we find that men and women do the same amount of work overall, about 6.9 hours per day. The time spent in other activities is very similar for men and women, except that men spend 0.6 of an hour more in leisure activities, while women spend 0.4 of an hour more in personal care activities.

7. Nonmarket work includes Household Activities, Purchasing Goods and Services, and Care of Household Members.

In the third and fourth columns of table 3.1, we see the same pattern for employed men and women. The main difference is that the total amount of time spent in market and nonmarket work is 8.4 hours per day.

Turning to nonemployed men and women, we see notable differences. Men do significantly less nonmarket work than women (2.8 versus 4.7 hours per day). They spend more time in work-related activities, mainly job search, but this is not enough to offset the difference in nonmarket work. Thus, nonworking women spend 1.7 hours more per day in market and nonmarket work. The time spent in other activities is very similar for men and women, except that nonworking men spend 1.6 hours more per day in leisure activities.

3.2.7 Comparability to Other Time Use Surveys

Researchers will undoubtedly want to compare estimates from the ATUS to those from earlier time use surveys. For these comparisons to be valid, it will be necessary to make the data sources as comparable as possible. This will require accounting for differences in coding systems, samples, and survey methods. There is research currently under way that will shed light on these differences. A report by Ann Gauthier compared the ATUS coding system to those used in other time use surveys and made recommendations, most of which were followed, on how to make the coding systems more compatible. Andrew Harvey and Jonathan Gershuny are examining the impact of methodological differences between the ATUS and earlier U.S. time use surveys and other countries’ time use surveys.

Table 3.2 summarizes some of the important differences in the major U.S. time use surveys.8 As can be seen, there are some significant differences across surveys. The 1965–1966 survey sampled individuals in small cities, while the others were nationwide.
The 1965–1966 and 1975–1976 surveys contacted respondents in person, while the 1985 survey used several interview modes. The quota sample of days in the 1965–1966, 1975–1976, and 1985 surveys means that individuals were not randomly assigned to days of the week, but that days were assigned to distribute interviews across the days of the week. Respondents in these surveys were generally contacted using a convenient-day approach, in which they were called on consecutive days until they were reached. The “yesterday” diaries were collected either in person or over the telephone, and the diary day was the day before the interview. The reference day for “tomorrow” diaries was the day following the interview.

8. The information in this table is from Robinson and Godbey (1997) and Harvey and St. Croix (2003).
Table 3.2  Comparison of major U.S. time use surveys

Survey: Americans’ Use of Time (1965–1966)
  Dates survey was conducted: Nov. 15–Dec. 15, 1965; March 7–April 29, 1966
  Data collection mode: Self-administered diary collected through personal interviews and leave-behind diaries; quota sample of days
  Sample description: Nationwide: 44 cities with population of 30,000–280,000; 18–65 year-olds; at least one household member must have been employed in the nonfarm sector
  Diary type and sample size: Yesterday (130); Tomorrow (1,244)

Survey: Time Use in Economic and Social Accounts (1975–1976)
  Dates survey was conducted: Oct.–Nov. 1975, and reinterviewed in Feb., May, and Sept. of 1976
  Data collection mode: Personal interviews; quota sample of days
  Sample description: 37 states and Washington, D.C.; 18+ year-olds
  Diary type and sample size: Yesterday (2,405)

Survey: Americans’ Use of Time (1985)
  Dates survey was conducted: Jan. 1–Dec. 30, 1985
  Data collection mode: Self-administered diary collected through mailbacks, telephone, and personal interviews; quota sample of days
  Sample description: Nationwide; 12+ year-olds; restricted to urban population
  Diary type and sample size: Yesterday (1,468); Tomorrow (3,890)

Survey: American Time Use Survey
  Dates survey was conducted: Continuous beginning in January 2003
  Data collection mode: Computer assisted telephone interviewing
  Sample description: Nationwide; 15+ year-olds
  Diary type and sample size: Yesterday (approx. 1,850/month in 2003 and 1,250/month thereafter)
Respondents filled out a paper diary, which either was picked up by the interviewer or mailed back by the respondent. It is not known how differences in interview mode affect response, although it has been shown that contacting respondents using the convenient-day approach for “yesterday” diaries results in systematic underestimates of activities done at home and overestimates of activities done away from home.9 In contrast, the designated-day approach does not generate any bias.

9. More specifically, the convenient-day schedule systematically overestimates activities that are negatively correlated with the probability of contacting the respondent and underestimates activities that are positively correlated with the contact probability. See Stewart (2002).

3.3 What Can We Learn from Time Use Data?

Time use data can shed light on many questions. In the following, we discuss two questions that may be of interest to readers of this volume and then briefly describe several other potential uses of the data.

3.3.1 Household Production and National Income and Product Accounts

Economists’ recognition of the importance of household production goes back at least to Reid (1934). She defined household production as an activity that could be done by a third person with the same result. In his influential article, Becker (1965) modeled households as combining time and market goods as inputs to produce the goods that are ultimately consumed by the household. This approach to modeling does not permit one to distinguish between leisure activities and those that satisfy the “third-person” criterion. An alternative approach taken by Gronau (1986) models households as consuming goods and leisure, as in traditional models, but the goods can either be purchased in the market or produced at home. The key innovation of these models is that households are viewed as factories with goods and time being combined via a production function to produce an output (utility).

Yet household production is ignored in standard national income accounting, which is oriented toward valuing goods and services that are exchanged between economic units, most prominently those exchanged in markets. For example, the United Nations’ (1993, paragraph 1.20) System of National Accounts 1993 (SNA) states:

    The System includes within the production boundary all production actually destined for the market, whether for sale or barter. It also includes all goods or services provided free to individual households or collectively to the community by government units or [nonprofit institutions serving households].

This definition of production excludes some economic activity from national income accounts. While both the SNA and the U.S. National Income and Product Accounts (NIPA) include within-household production of goods for household use (such as food for farm households), they both exclude nonmarket services produced within the household (with the major exception of owner-occupied housing).10 For many purposes, the exclusion of within-household services can be justified. The SNA (United Nations 1993, paragraph 1.22) cites “the need to prevent flows used for the analysis of market behavior and disequilibria from being swamped by non-monetary values.” However, some questions require accounting for household production. For example, the increasing labor force participation rate of women has led to growth in measured production, but one might want to know to what extent this represents a shift from household to market production.

10. This is the only exception in the NIPA; the SNA also includes major repairs and storage (United Nations 1993, 6.24, 6.27). Several countries (Australia, New Zealand, Canada) include volunteer work in their measure of unpaid work. However, Landefeld and McCulla (2000) argue that volunteer work should be excluded from household satellite accounts.

One way to incorporate household production into the national income accounts is to create a satellite account (examples include Landefeld and McCulla 2000; Australian Bureau of Statistics 2000, appendix 1). The first step is to define household production. The usual approach is to apply a third-person test (Reid 1934, cited in Landefeld and McCulla 2000): household production is defined as the output of activities where the same result could be achieved by hiring a third person. For example, cooking a meal is household production, but eating it is not.

Household production (as with other production) can be valued either directly, as the value of output, or indirectly, as the sum of the costs of inputs. The United Kingdom has an experimental household satellite account based on output measures (Holloway, Short, and Tamplin 2002). Drawing on a number of data sources, these accounts estimate the volume and value of such items as clean, warm, furnished, maintained accommodation; total distance traveled; meals and hot drinks prepared in the home; and informal child care and adult care. Under this output approach, time use data can be used to estimate productivity in household production, but they are not used to value output.

Most satellite account proposals use the input approach, which tends to require fewer data sources. As noted by Landefeld and McCulla (2000), the costs of household production include the cost of purchased goods and services that are inputs to household production, the cost of capital services, and the cost of labor input. Purchased goods and services are already part of conventional income accounts. Accounting for capital services would involve imputing rental rates to consumer durables (and reclassifying durables purchases as household-sector investment). Data on labor input must come from time use surveys.

The literature discusses two approaches to valuing labor input. The opportunity-cost approach uses the individual’s market wage to value the time spent in household production. This approach has some conceptual and practical difficulties associated with it. On a conceptual level, the implicit assumption that hours of paid work are freely variable at the margin may not hold; workers, at least in the short run, may have little choice in their working hours. Perhaps more important, the opportunity-cost approach assumes that people who are highly productive in market work are just as productive doing household work. It is hard to imagine that lawyers are five times more productive building a deck than a carpenter. On a practical level, it would be necessary to impute a wage for nonworkers.

The other approach to valuation is the replacement-cost approach, which uses the wage rate that would be paid to a third party. Within the replacement-cost approach, one can use either a generalist or a specialist wage. If specialist wages are used, the labor cost of each task is the wage of specialists in that task. For example, the time spent caring for children is valued according to the rate of pay for child care workers, food preparation is valued at the wage of cooks, and so on. One issue here is that specialists may be more productive than persons working at a variety of tasks in their own household. This shortcoming motivates the generalist-wage approach, which uses the wages for general household workers, namely housekeepers, as the cost of an hour of unpaid work.

Simultaneous activities complicate the valuation of household production because, depending on the specific activities and the valuation approach used, it may be necessary to determine how much time was devoted to each activity. To illustrate, consider a respondent who spent one hour looking after children and doing laundry at the same time. If the generalist-wage approach is used, then valuation is straightforward, with the entire hour being valued at the generalist wage.
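The valuation approaches described above reduce to simple arithmetic once hours and wages are in hand. The following Python sketch contrasts the specialist-wage and generalist-wage versions of the replacement-cost approach with the opportunity-cost approach. All hours and wage figures are invented for illustration, and the function names are hypothetical.

```python
def replacement_cost(hours_by_task, specialist_wages, generalist_wage):
    """Value unpaid hours two ways: task-specific wages and one generalist wage."""
    specialist_value = sum(
        hours * specialist_wages[task] for task, hours in hours_by_task.items()
    )
    generalist_value = generalist_wage * sum(hours_by_task.values())
    return specialist_value, generalist_value


def opportunity_cost(total_hours, market_wage):
    """Value unpaid hours at the individual's own market wage."""
    return total_hours * market_wage


# All hours and wages below are made-up numbers for illustration only.
hours = {"child care": 2.0, "food preparation": 1.0}
wages = {"child care": 10.0, "food preparation": 12.0}
specialist, generalist = replacement_cost(hours, wages, generalist_wage=9.0)
high_wage_worker = opportunity_cost(3.0, market_wage=50.0)
```

With these invented numbers the same three hours are valued at 32 under specialist wages, 27 under the generalist wage, and 150 under the opportunity-cost approach for a high-wage worker, which shows how sensitive the result is to the choice of wage.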
If the specialist-wage approach is used, then one must determine how to value that hour of time: value the entire hour at the housekeeper wage, value the entire hour at the child care worker wage, or value part of the hour at the housekeeper wage and part at the child care worker wage. If the latter valuation is used, the analyst must determine how to apportion time to the two activities. The treatment of simultaneous activities is much simpler if a generalist wage is used although both valuation approaches require disentangling the activities when one of the simultaneous activities is not household work. Australia apparently does not include secondary activities in its measure of unpaid work. New Zealand (Statistics New Zealand 2001) presents some estimates that include secondary activities but excludes passive child care (being “available for care” in their terminology). Conceptually, one might argue for its inclusion on the basis of the third-party test. One could value time spent in passive child care at a housekeeper’s wage (under the generalist-wage approach) or by wages for child care workers or babysitters (under the specialist-wage approach). As long as passive child care is the only secondary activity collected in the ATUS, one can take one of two approaches to incorporating simultaneous activities in valuing household work. The easiest approach would be
Concepts and Measurement in the American Time Use Survey

Table 3.3    Alternative valuations of nonmarket production: 2003

                                                              Aggregate value of nonmarket
                                                              production using (in $billions):
                                          Aggregate hours
Activity                                  (in billions)       Specialist wage   Generalist wage

Housework                                        51                 461               461
Meal preparation/Clean-up                        44                 376               397
Interior/Exterior repair                         13                 178               121
Yard work                                        16                 183               149
Purchasing goods and services                    67                 609               609
Other housework                                  27                 374               243
Child care (as primary activity)                 39                 373               359
Adult care                                        6                  52                52

Total (excluding secondary child care)          263               2,605             2,391
Child care (as secondary activity)               85                 746               777

Total (including secondary child care)          348               3,351             3,167
Paid work (a)                                   277                       4,888
Gross Domestic Product (b)                                               11,004

(a) The estimated number of paid work hours is from the authors’ tabulations of ATUS data. The value of paid work was derived by multiplying total paid work hours by the hours-weighted mean wage computed from the CPS Outgoing Rotation Group files.
(b) The GDP estimate is from the Economic Report of the President.
to value only primary activities and ignore secondary child care. There is some logic to this approach in that it treats all secondary activities the same. But data from the 1992 Australian National Time Use Survey, which collects secondary activities, indicate that individuals spend very little time (nine minutes per day for women and five minutes per day for men) doing “domestic activities” (household work) as a secondary activity. In contrast, men spend seventeen minutes per day doing child care as a secondary activity and women spend thirty-nine minutes.11 The magnitudes of these differences suggest that more is lost by ignoring secondary child care than by treating secondary activities asymmetrically. This leads us to the alternative approach, which is to value the time spent in secondary child care when the primary activity is not household work.12

11. These estimates exclude times when the primary activity was housework, child care, or sleeping. Also keep in mind that the child care estimates are averages over the entire population, not just parents.

12. The implicit assumption is that it is possible to hire someone to do household chores and look after household children. Alternatively, one could assume that it would be necessary to hire two people, one to do the housework and one to look after the children. Given that individuals routinely perform these tasks simultaneously, the former assumption makes more sense.

Table 3.3 presents estimates of the time spent in nonmarket production, broken down by activity, plus four estimates of the total value of this production in the United States in 2003. We apply the generalist wage and the specialist wage to two alternative definitions of nonmarket work. Our first definition includes household activities (including purchasing goods and services) and care of household members done as a primary activity. We exclude volunteer activities and care and helping activities for nonhousehold members. The second definition is the same as the first but includes child care as a secondary activity. As described in the preceding, we exclude secondary child care that was done at times when the respondent was engaged in nonmarket work as a primary activity.

The specialist wages were generated using the Outgoing Rotation Group (ORG) files for 2003 from the CPS. We computed the hours-weighted average wage for each three-digit occupation. The time spent in each nonmarket activity was valued at the wage for the occupation that most closely resembles the activity.13 For the generalist wage, we used the (hours-weighted) average wage for Maids and Housekeepers.

Using the specialist wage rather than the generalist wage adds between 6 and 9 percent to the value of nonmarket work, although there is some variation across activities. Differences in the valuations of individual components are as expected. It is worth noting that the valuation of Child Care (as a primary activity) is higher using the specialist wage, while secondary child care is valued more highly using the generalist wage. The difference arises because primary child care includes high-skill tasks such as helping children with their homework and home schooling. The lower valuation of secondary child care under the specialist wage arises because the wage for Child Care Workers is less than the wage for Maids and Housekeepers. Other Household Activities, which include household management and other high-skill activities, are valued more highly using the specialist wage.
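As a reading aid, the arithmetic behind these comparisons can be checked directly from the totals in table 3.3. The short Python sketch below is illustrative only: the variable and function names are invented here, the figures are the rounded table entries, and the language choice is an assumption rather than anything used by the authors. It recovers the 6 to 9 percent specialist premium and the implied average value of an hour of nonmarket and paid work; the same helper also reproduces the comparisons with paid work and GDP that the text turns to next.

```python
# Totals transcribed from table 3.3 (hours in billions, values in $ billions).
# All names below are illustrative; they do not come from the chapter.
NONMARKET = {
    "excluding secondary child care": {"hours": 263, "specialist": 2605, "generalist": 2391},
    "including secondary child care": {"hours": 348, "specialist": 3351, "generalist": 3167},
}
PAID_WORK = {"hours": 277, "value": 4888}
GDP = 11004


def specialist_premium(row):
    """Percent by which the specialist-wage total exceeds the generalist-wage total."""
    return 100.0 * (row["specialist"] / row["generalist"] - 1.0)


def percent_of(value, benchmark):
    """Express value as a percentage of benchmark."""
    return 100.0 * value / benchmark


def value_range():
    """Smallest and largest of the four nonmarket valuations in the table."""
    values = [row[wage] for row in NONMARKET.values()
              for wage in ("specialist", "generalist")]
    return min(values), max(values)


if __name__ == "__main__":
    for label, row in NONMARKET.items():
        print(f"{label}: specialist premium {specialist_premium(row):.1f} percent, "
              f"generalist value per hour ${row['generalist'] / row['hours']:.2f}")
    print(f"paid work value per hour ${PAID_WORK['value'] / PAID_WORK['hours']:.2f}")
    low, high = value_range()
    print(f"share of paid work value: {percent_of(low, PAID_WORK['value']):.0f} "
          f"to {percent_of(high, PAID_WORK['value']):.0f} percent")
    print(f"share of GDP: {percent_of(low, GDP):.0f} to {percent_of(high, GDP):.0f} percent")
```

Because the inputs are rounded table entries, the per-hour values are approximations rather than the wages actually used in the valuation.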
Finally, it is clear from table 3.3 that secondary child care is an important activity. Secondary child care accounts for about one-quarter of the total amount of time spent in nonmarket production (and slightly more than one-fifth of the value of nonmarket production when the specialist wage is used).

The results in table 3.3 indicate that nonmarket production is economically significant regardless of which estimate is used. For comparison, we present two alternative measures of market output. The gross domestic product (GDP) figure is the published estimate from the Bureau of Economic Analysis, while the Paid Work estimate was calculated by multiplying hours worked from the ATUS by the hours-weighted average wage of all workers in the United States from the CPS ORG files. We can see that nearly as much time was spent doing nonmarket work as was spent doing market work (more if secondary child care is included) but that the value of nonmarket work was significantly less than that of market work. Even though the per-hour value attached to nonmarket work is significantly lower than the value attached to market work, the total value of nonmarket work is still large, ranging from 49 to 69 percent of the value of market work. If the values of market and nonmarket work were added together, nonmarket work would comprise between 33 and 41 percent of that total. Nonmarket work is a smaller, though still significant, fraction of GDP. Depending on the definition and valuation approach used, nonmarket production is equal to between 22 and 30 percent of GDP, or, if nonmarket production were included in GDP estimates, would comprise between 18 and 23 percent of the combined value of market and nonmarket production.

Although these estimates serve as an example of the type of calculation necessary to incorporate nonmarket production into the NIPAs, the lack of a time series means that we cannot examine trends in nonmarket production. One approach, which has been taken by a number of studies, would be to use data from a single time use survey and assume that there are no within-group changes in the time spent in nonmarket production. Under this assumption, any changes in nonmarket production come through changes in the composition of the population (see Eisner 1988). Of course, once enough ATUS data become available, it will be possible to generate a consistent time series that allows for within-group changes in household production, enhancing our ability to examine long-term trends. Just as important, ATUS data will permit the first analysis of the cyclical behavior of household production.

13. This crosswalk is available from the authors upon request.

3.3.2 Measuring Hours of Work

Statistics on hours worked are inputs into estimates of productivity and hourly wages, two numbers of great interest to economists. Differences in measured hours between surveys can lead to substantial differences in trends in productivity and wages.
For example, Abraham, Spletzer, and Stewart (1998) show that the different trends in hours account for all of the divergence between hourly wages derived from the NIPA, which use hours from the establishment-based Current Employment Statistics program (CES), and estimates from the CPS. In this section, we describe the uses of ATUS measures of hours worked and compare hours worked measures from the CPS with those derived from the ATUS. Comparison of ATUS data with CES measures of hours worked is beyond the scope of this paper.

How can the ATUS be used to help measure hours of work? The ATUS sample sizes are too small to compute monthly, or even quarterly, estimates of aggregate hours for the purpose of estimating productivity. Because the ATUS collects time diaries only for a single day, diary-based hours data cannot be used to compute hourly wages for ATUS respondents.14 However, time use data can be used to estimate biases of other sources of data. To do this, analysts typically construct synthetic weeks that are representative of the group of interest (usually the entire population or a specific subgroup). For example, it would be possible to compute average weekly hours worked for everyone who is employed or for major industry groups.

14. It will be possible to use ATUS data to construct an hours-weighted average hourly wage, which would be comparable to hourly wage estimates from the CES and the NIPA, and the CPS estimates in Abraham, Spletzer, and Stewart (1998).

Data from the ATUS have a number of advantages for measuring hours worked and evaluating the biases in existing measures. Respondents need not try to recall over periods longer than a day, and, by reporting individual episodes of work, they avoid having to add the lengths of different episodes. Paid work can include work at home or otherwise not at the workplace, so off-the-clock work is collected. Moreover, as mentioned previously, summary questions are asked after the core time diary, giving the respondent additional chances to identify an activity as paid work. This improves identification of paid work activities for self-employed respondents who work at home and others who do not “go to work” in the traditional sense. The ATUS interviews are conducted nearly every day of the year, with most holidays being covered (because the telephone centers are open the day after most holidays).15 The ATUS data allow analysts to exclude paid breaks from hours worked if they choose.16 Interviewers prompt respondents by asking “Did you take any breaks of fifteen minutes or longer?” whenever a work episode is reported.17 Paid leave presents a more difficult challenge. Workers who travel away from their home are unavailable for interviewing, which means that these trips are missed in the time diary. This biases estimates of hours worked, but the direction of the bias is not clear. Missed business trips bias hours downward, while missed vacations bias hours upward.
The ATUS allows analysts to correct for this missed-days bias by collecting information about the amount and purpose of absences from home as described in section 3.2. This missed-days correction requires an assumption about the hours worked during business travel because they are not collected.

For purposes of illustration, we now compare measures of work hours derived from the ATUS to measured hours from the CPS. There are many differences between the ATUS and the CPS that may affect such a comparison. The most obvious difference is that the questions used to estimate hours of work are different. There also turn out to be differences in responses to other variables, such as employment and multiple jobholding, even though the relevant questions are similar in the two surveys. The reference periods are also different: the CPS asks about the week containing the twelfth of the month, while the ATUS covers almost every day of the year. Nonresponse implies that there may be systematic differences in the sample. We attempt to identify the effects of all of these differences between the two surveys.

15. Reference days before major holidays will be missed, as the telephone centers will be closed. The remaining days in the month that fall on the same day of the week as the missing day will have their weights inflated to make up for the missing day, in effect making the assumption (which we make in the absence of other information) that the activities on the missing day are similar to those on other days with the same day of the week.

16. Hamermesh (1990) is one attempt we have seen to examine the effect of paid breaks on wages.

17. Beginning in 2004, this prompt was incorporated into the instrument. The prompt automatically pops up whenever work episodes of four hours or longer are reported.

We calculate three different measures of hours worked using ATUS data.18 Each of these definitions corresponds to a different concept of hours worked. Going from the most restrictive measure to the least restrictive measure, these are:

1. Time spent in activities coded as “Working at job.”
2. Definition (1) plus activities identified as breaks and time spent in work-related travel (not commuting).19
3. Definition (2) plus activities that were coded as being done for the respondent’s job.

Table 3.4 compares estimates of hours worked from the ATUS (the first three columns) and the CPS (the last column) for calendar year 2003. Depending on the definition used for the ATUS estimates, average weekly hours worked in the ATUS are 1.1 to 1.7 hours less than in the CPS over the same period. (All differences mentioned in the text are statistically significant at the 5 percent level unless otherwise noted.) As noted, part of this difference may be due to differences in the composition of the ATUS and CPS samples due to ATUS nonresponse rather than due to differences in reporting. To aid in analyzing this possibility, we decomposed the difference into four terms as follows: (1)
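$$
\begin{aligned}
E(H^{ATUS}_{i,t}) - E(H^{CPS}_{i,t}) = {} & [E(H^{ATUS}_{i,t}) - E(H^{CPS}_{i,t-3,MIS8} \mid i \text{ in ATUS})] \\
& + [E(H^{CPS}_{i,t-3,MIS8} \mid i \text{ in ATUS}) - E(H^{CPS}_{i,t-3,MIS8})] \\
& + [E(H^{CPS}_{i,t-3,MIS8}) - E(H^{CPS}_{i,t-3})] \\
& + [E(H^{CPS}_{i,t-3}) - E(H^{CPS}_{i,t})]
\end{aligned}
$$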
where H denotes hours of work, i denotes an individual observation, t denotes the reference month for the estimate, the superscript denotes the survey, and the MIS8 subscript indicates that the observation is in MIS 8 (absence of a third subscript indicates that all Months in Sample are included).

The first term in brackets is the difference between the time-diary estimate of hours worked from the ATUS and the CPS estimate of hours worked from the CPS MIS 8 interview (usually three months prior to the ATUS interview) for ATUS respondents. This term represents changes in the responses of ATUS respondents between the last CPS interview and the ATUS interview. These changes include the effects of differences in data collection mode on reporting of hours of work; differences in the reporting of other variables such as employment and multiple job holding; and differences in reference period coverage (the week of the twelfth versus most days of the year).

The second term is the difference between ATUS respondents and the entire MIS 8 CPS sample in the CPS estimate of hours worked at time t – 3 (three months prior to the ATUS reference period). This term represents the effect of differences in sample composition between the CPS and the ATUS due mainly to nonresponse in the ATUS. Note that if the propensity to respond to the ATUS is a function of current hours of work, this is an imperfect proxy for the effect of sample composition: changes in hours of work in the months between the CPS and the ATUS may affect the propensity to respond but are not reflected in this term.

The third term is the difference between the MIS 8 sample and the entire CPS sample in the CPS estimate of hours worked at time t – 3. This term captures rotation-group effects, the well-known phenomenon that responses to certain questions vary systematically with their month in sample.20 Note that the first term only accounts for differences in responses between the ATUS and the CPS MIS 8. The sum of the first and third terms can be thought of as an estimate of the average difference in responses between the ATUS and the entire CPS.

The fourth term is the negative of the change in the CPS estimate of hours worked between three months prior to the ATUS reporting period and the ATUS reporting period and can be thought of as a correction of the first term for the actual change in hours worked between t – 3 and t.

18. When computing these measures, we reweight the data so that, for the subpopulation we are estimating over, each day of the week receives equal weight. (For large samples, this will be approximately true when the original ATUS weights are used; the reweighting is relevant in the following, when we restrict the sample to CPS reference periods.)

19. We defined work-related travel as travel between work sites and identified travel spells as work-related by looking at the surrounding activities.

Table 3.4    Hours of paid work in the ATUS and the CPS

Column   Hours measure   Survey (hours response from)   Sample (respondents participated in)   Period
(1)      Definition 1    ATUS                           ATUS                                   Jan. 2003–Dec. 2003
(2)      Definition 2    ATUS                           ATUS                                   Jan. 2003–Dec. 2003
(3)      Definition 3    ATUS                           ATUS                                   Jan. 2003–Dec. 2003
(4)      Actual CPS      CPS                            ATUS                                   Oct. 2002–Sept. 2003
(5)      Actual CPS      CPS                            MIS 8                                  Oct. 2002–Sept. 2003
(6)      Actual CPS      CPS                            CPS                                    Oct. 2002–Sept. 2003
(7)      Actual CPS      CPS                            CPS                                    Jan. 2003–Dec. 2003

                                                 (1)       (2)       (3)       (4)     (5)     (6)     (7)
Average weekly hours                             37.3      37.7      37.9      38.5    38.8    39.0    39.0
  Difference from CPS actual hours              –1.7**    –1.3**    –1.1**
  Adjusted difference from CPS actual hours     –1.4**    –1.0**    –0.8**
Average weekly hours in CPS reference weeks      38.6      39.0      39.1      38.7    38.8    39.0    39.0
  Difference from CPS actual hours              –0.5      –0.1       0.1
  Adjusted difference from CPS actual hours     –0.3       0.1       0.3
Total annual hours (in billions)                 270       273       274       270     266     267     269
  Difference from CPS actual hours                 1         4         5*
  Adjusted difference from CPS actual hours       –3         0         1

*Significantly different from zero at .05 level.
**Significantly different from zero at .01 level.
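The four-term decomposition in equation (1) is a telescoping sum, which the following Python sketch makes concrete. It is a minimal illustration, not the authors' code: the function name, argument names, and term labels are invented here, and the inputs are the rounded definition 1 averages from table 3.4 as we read the table (37.3, 38.5, 38.8, 39.0, and 39.0), so the individual terms are approximate.

```python
# Labels for the four bracketed terms of equation (1); the wording is ours.
TERM_NAMES = (
    "response change",      # ATUS diary vs. CPS MIS 8 report, same respondents
    "sample composition",   # ATUS respondents vs. full MIS 8 sample, at t - 3
    "rotation group",       # MIS 8 sample vs. full CPS sample, at t - 3
    "change over time",     # full CPS at t - 3 vs. full CPS at t
)


def decompose_hours_gap(atus_t, cps_mis8_atus_resp, cps_mis8_all,
                        cps_all_lag, cps_all_t):
    """Return the four terms of equation (1) as a dict keyed by TERM_NAMES.

    The terms telescope, so their sum equals atus_t - cps_all_t
    (up to floating-point rounding).
    """
    terms = (
        atus_t - cps_mis8_atus_resp,
        cps_mis8_atus_resp - cps_mis8_all,
        cps_mis8_all - cps_all_lag,
        cps_all_lag - cps_all_t,
    )
    return dict(zip(TERM_NAMES, terms))


# Rounded average weekly hours from table 3.4, ATUS definition 1.
gap = decompose_hours_gap(
    atus_t=37.3,
    cps_mis8_atus_resp=38.5,
    cps_mis8_all=38.8,
    cps_all_lag=39.0,
    cps_all_t=39.0,
)
total_gap = sum(gap.values())
# Adjusted difference: total gap net of the sample composition term.
adjusted_gap = total_gap - gap["sample composition"]

if __name__ == "__main__":
    for name, value in gap.items():
        print(f"{name:>20}: {value:+.1f}")
    print(f"{'total':>20}: {total_gap:+.1f}")
    print(f"{'adjusted':>20}: {adjusted_gap:+.1f}")
```

With these inputs the total gap is the –1.7 hours reported in the table, and removing the sample composition term leaves the adjusted difference of –1.4.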
In summary, the sum of the first, third, and fourth terms is an estimate of the difference in hours reporting between the CPS and the ATUS, corrected for the change in actual hours between the time ATUS respondents were in the CPS and when they responded to the ATUS. Put differently, the effect of differing survey methods between the CPS and the ATUS on average reported hours can be estimated by taking the difference in reported hours for the same period and subtracting out the sample composition term. The sample composition effect is –0.3, which yields an adjusted difference between ATUS and CPS hours of between –1.4 and –0.8, depending on the definition of paid work used.

We now control for differences in reference periods by restricting the ATUS sample to CPS reference weeks. The results are shown in the second set of rows of table 3.4. The difference between ATUS and CPS hours estimates changes dramatically. The overall difference between ATUS and CPS hours estimates ranges from –0.5 to 0.1 (and not significantly different from zero) when the ATUS sample is restricted to CPS reference weeks, compared to the –1.7 to –1.1 range when the entire sample is used. Adjusting for sample composition makes the difference more positive: between –0.3 and 0.3 hours, also not significant.

Employment and multiple jobholding rates are higher in ATUS than they are in CPS, and this may affect hours of work comparisons. We can directly estimate the importance of these higher rates in the ATUS and thereby arrive at an indirect estimate of the pure effect of survey mode on hours reports. The higher employment rate should have an effect on average hours only to the extent that hours differ between people who were employed in both the CPS and ATUS interviews, and people whose reported employment status changed between the CPS and ATUS interviews.21 To illustrate this effect, we compared responses to the “usual hours worked” question by ATUS respondents in the CPS with responses to the equivalent question in the ATUS. We adjust for changes in hours between the CPS and the ATUS and possible mode effects by subtracting the change in usual hours for respondents who were employed in both the CPS and the ATUS (0.8 hours) from ATUS usual hours for all respondents employed in the ATUS. For these calculations, we restricted our attention to hours on the respondents’ main jobs to eliminate the effects of the differential multiple jobholding rates. Usual hours worked on main job was 38.8 for respondents who were working at the time of their CPS interview. The adjusted estimate of usual hours worked from the ATUS also was 38.8,22 which implies that the higher employment rate had a negligible effect on average hours worked.

To estimate the effect of differential multiple jobholding rates, we simply multiplied the difference in multiple jobholding rates (10.0 percent – 5.8 percent) by the average number of hours usually worked on second jobs (13.4) in the ATUS.23 This yields a multiple jobholding effect of 0.6, which, combined with the employment effect of zero, implies that the mode effect ranges from –0.9 to –0.3 hours depending on the definition used. These back-of-the-envelope type calculations complicate the calculation of standard errors; however, the mode effect is small for definitions 2 and 3.

20. For example, the unemployment rate is higher for respondents in their first month of the CPS than it is for respondents in their second and subsequent months. See Bailar (1975).

21. The ATUS employment for ages sixteen and older is 144.8 million, while CPS employment for 2003 is 137.7 million. Part of this difference is because, somewhat surprisingly, employed persons are more likely to respond to the ATUS than the nonemployed. If ATUS respondents retained their CPS MIS 8 employment status, the ATUS weighted employment count would be 140.9 million. Performing a decomposition analogous to (1), approximately 50 percent of the difference between CPS and ATUS employment counts can be attributed to differences in responses to the employment questions between the CPS and the ATUS. A substantial proportion of this can be attributed to differences in reporting teenage employment. The ATUS measured employment of sixteen-to-nineteen-year-olds was 8.1 million in 2003, while CPS employment was 5.9 million. Almost all of the difference was due to differences between CPS and ATUS responses, possibly because most CPS responses were by proxy.

22. The 55.5 percent of respondents who were employed at both interviews reported that they usually worked 39.2 hours per week at their main job in their CPS interview and 40.0 hours in their ATUS interview, which implies that usual hours on main job increased by 0.8 of an hour between the CPS and ATUS interviews. About 7.3 percent of respondents changed from nonemployment in the CPS to employment in the ATUS, while 5.9 percent made the opposite transition. Usual hours worked (in their respective interviews) were 36.7 for respondents who were employed in the CPS but not the ATUS and 34.0 for respondents who were employed in the ATUS but not the CPS. Thus, respondents with new jobs reported longer hours than respondents who were about to leave their jobs, even if one were to subtract the 0.8 hour increase in hours. The adjusted usual hours for the ATUS interview were calculated as follows: [0.555 × 39.2 + 0.073 × (36.7 – 0.8)]/(0.555 + 0.073) = 38.8.

To summarize our results, actual hours of work reported in the CPS appear to be quite close to those reported in the ATUS during CPS reference weeks. However, CPS reference weeks appear to have greater hours of work than do nonreference weeks. Frazis and Stewart (2004) also found no evidence of significant mode effects using a somewhat different approach. They compared ATUS hours worked estimates to estimates for the same respondents from their CPS MIS 8 interview. By matching respondents, they eliminated the sample composition effects. Their restriction to respondents whose usual hours changed very little between their CPS MIS 8 and ATUS interviews was designed to restrict the sample to individuals who worked the same or similar hours at each interview, but it eliminated most differences that arose because of the higher multiple jobholding rate in the ATUS. After adjusting for differences in the treatment of rotation-group effects between Frazis and Stewart (2004) and the current paper, their findings were equivalent to a mode effect of –0.7 to 0 hours, which are quite close to the current results. We note that these results contrast with Robinson and Bostrom’s (1994) findings that hours reported from CPS-style questions have increasingly diverged from those reported in time use surveys.
Abraham, Spletzer, and Stewart (1998) cited Robinson and Bostrom’s results as a potential explanation of the divergence between CPS and CES hours trends alluded to in the preceding; our evidence casts doubt on this explanation.

For productivity measurement, analysts may also be interested in total population hours of work. After adjusting for differences in the sample, total hours in the ATUS are quite close to those in the CPS, with no statistically significant differences. The effects of differences in employment and in average hours per worker in the two surveys offset each other.24

23. The corresponding average from the CPS was 13.2. As in the preceding calculation, we used the average number of hours on second job for respondents who were single jobholders in the CPS and multiple jobholders in the ATUS. Average usual hours on second job for respondents who were multiple jobholders in both the CPS and the ATUS were 15.3 in the CPS and 14.3 in the ATUS.

24. This ignores a small amount of measured hours of work for persons not counted as working in the seven days preceding the interview. Including these hours adds about 3 billion hours to total hours worked, and the response change effects are statistically significant at the 5 percent level for definitions 2 and 3.
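The back-of-the-envelope adjustments used in this subsection (the employment adjustment spelled out in note 22 and the multiple jobholding effect) amount to a few lines of arithmetic. The Python sketch below reproduces them under stated assumptions: the function and variable names are invented for illustration, the inputs are the rounded figures quoted in the text and notes, and the pairing of the 7.3 percent share with 36.7 hours follows the formula as written in note 22.

```python
def adjusted_usual_hours(share_both, hours_both_cps, share_entrants,
                         hours_entrants_atus, hours_change):
    """Weighted average of usual hours net of the CPS-to-ATUS change.

    share_both / hours_both_cps: respondents employed at both interviews and
    their usual hours as reported in the CPS (ATUS hours less the change).
    share_entrants / hours_entrants_atus: respondents employed only in the
    ATUS and their reported usual hours, from which the change is subtracted.
    """
    numerator = (share_both * hours_both_cps
                 + share_entrants * (hours_entrants_atus - hours_change))
    return numerator / (share_both + share_entrants)


# Inputs as quoted in note 22 (rounded).
employment_adjusted = adjusted_usual_hours(
    share_both=0.555, hours_both_cps=39.2,
    share_entrants=0.073, hours_entrants_atus=36.7,
    hours_change=0.8,
)

# Multiple jobholding effect: gap in jobholding rates times second-job hours.
mjh_effect = (0.100 - 0.058) * 13.4

# Mode effect: composition-adjusted reference-week difference (table 3.4)
# less the multiple jobholding effect; the employment effect is taken as zero.
adjusted_differences = {"definition 1": -0.3, "definition 2": 0.1, "definition 3": 0.3}
mode_effect = {name: diff - mjh_effect for name, diff in adjusted_differences.items()}

if __name__ == "__main__":
    print(f"adjusted usual hours: {employment_adjusted:.1f}")
    print(f"multiple jobholding effect: {mjh_effect:.1f}")
    for name, value in mode_effect.items():
        print(f"mode effect, {name}: {value:+.1f}")
```

Carrying the unrounded jobholding effect through gives mode effects that round to the –0.9 to –0.3 range reported in the text.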
Analyzing trends in paid work using the ATUS, either by itself or in comparison with other surveys, will clearly not be possible for some time. However, the ATUS will offer the advantages of relatively large sample sizes and consistent survey methods over time.

3.3.3 Other Uses for Time Use Data

In addition to measuring nonmarket work and improving measurement of market work, time use data can be used for a variety of other purposes of interest to economists. We mention a few here.25

Intrahousehold Allocation of Time

The household production models mentioned previously also have implications for the intrahousehold allocation of resources, as do more recent household bargaining models. Both types of models yield testable implications about how husbands and wives spend their time. The ATUS’s collection of time use data from only one household member places some limits on the types of questions that can be answered using ATUS data (mainly questions regarding the temporal coordination of activities by spouses), but research questions regarding average time spent by spouses in given activities, and how these averages vary with the spouses’ characteristics, can be answered (see Friedberg and Webb [2005] for an example). Because the survey sample is drawn from the CPS, which gathers demographic and labor market information for the entire household, analysts have available a rich set of controls for household members other than the respondent. Most of this information was collected in prior months, but, as noted previously, the ATUS updates the spouse’s labor force status and usual hours of work.

One can examine mean hours of time spent in a given activity by individuals in given living arrangements and compare means across different individuals in that same arrangement. For example, one can examine the hours spent in leisure activities for married men and married women.
More complicated examples include comparing the leisure time (and difference in leisure time) of husbands and wives when both work full time to leisure time when the husband works full time and the wife works part time. Because the CPS collects data on wages,26 it is also possible to estimate the average difference in time spent in an activity between husbands and wives with a given wage rate for the wife or a given difference in wage rates between the husband and wife.

25. See also Hamermesh, Frazis, and Stewart (2005).

26. Note that the analyst will have information on the respondent’s wages as of the previous CPS month and as of the current ATUS month if he or she has changed jobs. The analyst will have information on the spouse’s wages as of the previous CPS month.

Income and Well-Being

It is widely known that income inequality has increased over the past two decades (Gottschalk and Smeeding 1997). But as with national income, money income tells only part of the story. Given that the value of nonmarket production is equal to about one-quarter of GDP and that household production models predict a negative relationship between money income and time spent doing household work, we would expect the inclusion of nonmarket work to reduce measured inequality. With ATUS data, it is possible to determine the effect of incorporating household production on measured inequality27 and, once we have more years of data, determine how this affects trends.

With ATUS data it is possible to examine how incorporating household production affects comparisons across education groups either using CPS MIS-8 earnings data (which ignores unearned income) or (for some months) March Income Supplement data. Similarly, it is possible to compare differences in this broader measure of income inclusive of household production among racial groups or among other demographic categories. Differences in leisure between demographic groups could be analyzed in a similar fashion.

Activities of Nonworkers

While prime-age males have higher labor force participation rates than prime-age females, an increasing percentage of prime-age males is not in the labor force (see Juhn 1992, 2003; Welch 1997; Stewart 2004). The Gronau model predicts that nonworkers will spend the time freed up by not working in both leisure and household production activities. But time use data are needed to tell us how the freed-up time is divided between these types of activities. This question is important from a resource-utilization point of view. But it can also shed light on the extent to which time spent in household production is sensitive to macroeconomic conditions.

Stewart (2004) examines this question using data from a 1992–1994 time-diary study conducted by the University of Maryland. He compares how male workers and nonworkers use their time.
He finds that full-time workers spend about 6.6 hours more per day in work and work-related activities than do nonworkers. Put another way, nonworkers have about 6.6 hours more per day to “spend” in activities other than paid work. How do they spend this time? Stewart finds that they spend just over 30 percent of this time in productive activities such as education and household work. The remaining 70 percent is spent in leisure (mainly watching TV) and personal care (mainly sleeping). The ATUS data could add to our understanding of male nonworkers in two ways. First, the sample size is significantly larger. Stewart had 1,833 observations, of which only 151 were nonworkers. A larger sample would generate more precise estimates (although the differences found by Stewart were significant at the 5 percent level) and would allow more detailed

27. See Frazis and Stewart (2005) for an example using 2003 ATUS data.
Harley Frazis and Jay Stewart
comparisons. Second, the ATUS has more detailed information about labor market activities. For example, by matching to previous CPS interviews, it is possible to distinguish between long-term and short-term nonworkers.

3.4 Concluding Remarks

Economists have been aided by time use surveys in seeking to understand the behavior of hours worked, the extent of household production, and other issues. However, existing surveys have been conducted infrequently, with small sample sizes and with differences in methods between surveys. As a result, analysts have used data from a single survey and assumed that all changes in time use over time were due to compositional changes. Or they have constructed a time series from several surveys that use different methods. The ATUS will allow analysts to track trends in time use. The survey can also track trends in differences between hours in an activity (such as paid work) as shown in a time diary and hours as shown in other surveys using simpler questions. The sample size over periods of a year or more will allow more detailed analyses of time use than has been possible in the past.
References Abraham, Katharine G., James R. Spletzer, and Jay C. Stewart. 1998. Divergent trends in alternative wage series. In Labor statistics measurement issues, ed. John Haltiwanger, Marilyn E. Manser, and Robert Topel, 293–324. Studies in Income and Wealth, vol. 60. Chicago: University of Chicago Press. Australian Bureau of Statistics. 2000. Unpaid work and the Australian economy 1997. Catalog no. 5240.0. Canberra, AU: Australian Bureau of Statistics. Bailar, Barbara A. 1975. The effects of rotation group bias on estimates from panel surveys. Journal of the American Statistical Association 70 (349): 23–30. Becker, Gary. 1965. A theory of the allocation of time. Economic Journal 75 (299): 493–517. Eisner, Robert. 1988. Extended accounts for national income and product. Journal of Economic Literature 26:1611–84. Frazis, Harley, and Jay Stewart. 2004. What can time-use data tell us about hours of work? Monthly Labor Review 127 (12): 3–9. ———. 2005. How does household production affect earnings inequality? Evidence from the American Time Use Survey. Paper prepared for conference, Time Use and Economic Well-Being, Annandale-on-Hudson, NY. Friedberg, Leora, and Anthony Webb. 2005. The chore wars: Household bargaining and leisure time. Paper prepared for American Economic Association meetings, Boston. Gottschalk, Peter, and Timothy M. Smeeding. 1997. Cross-national comparisons of earnings and income inequality. Journal of Economic Literature 35 (2): 633–87.
Gronau, Reuben. 1986. Home production—A survey. In Handbook of labor economics, ed. Orley Ashenfelter and Richard Layard, 273–304. Amsterdam: North Holland. Hamermesh, Daniel. 1990. Shirking or productive schmoozing: Wages and the allocation of time at work. Industrial and Labor Relations Review 43 (3): 121S– 133S. Hamermesh, Daniel S., Harley Frazis, and Jay Stewart. 2005. Data watch—The American Time Use Survey. Journal of Economic Perspectives 19 (1): 221–32. Harvey, Andrew, and Aimee St. Croix. 2003. Time-use program. St. Mary’s University. Unpublished Table. Herz, Diane, and Michael Horrigan. 2004. Planning, designing, and executing the BLS American Time Use Survey. Monthly Labor Review 127 (10): 3–19. ———. 2005. A study in the process of planning, designing and executing a survey program: The BLS American Time-Use Survey. In The economics of time use, ed. D. Hamermesh and G. Pfann, 317–50. Amsterdam: Elsevier. Holloway, Sue, Sandra Short, and Sarah Tamplin. 2002. Household satellite account (experimental) methodology. London: Office for National Statistics, April. Juhn, Chinhui. 1992. Decline of male labor market participation: The role of declining market opportunities. Quarterly Journal of Economics 107:79–121. ———. 2003. Labor market dropouts and trends in the wages of black and white men. Industrial and Labor Relations Review 56 (4): 643–62. Landefeld, J. Steven, and Stephanie H. McCulla. 2000. Accounting for nonmarket household production within a national accounts framework. Review of Income and Wealth 46 (3): 289–307. Nordhaus, William. 2002. An economist’s view of the statistical state of the nation. Testimony before the Joint Economic Committee of the U.S. Congress, 107th Cong., 2nd sess., July 24, 2002. Reid, Margaret G. 1934. Economics of household production. New York: Wiley. Robinson, John, and Ann Bostrom. 1994. The overestimated workweek? What time diary measures suggest. Monthly Labor Review 117 (1): 11–23. 
Robinson, John P., and Geoffrey Godbey. 1997. Time for life: The surprising ways Americans use their time. 2nd ed. State College, PA: Pennsylvania State University Press. Schwartz, Lisa K. 2001. Minding the children: Understanding how recall and conceptual interpretations influence responses to a time-use summary question. Bureau of Labor Statistics. Mimeograph. Statistics New Zealand. 2001. Measuring unpaid work in New Zealand 1999. Wellington, NZ: Statistics New Zealand. Stewart, Jay. 2002. Assessing the bias associated with alternative contact strategies in telephone time-use surveys. Survey Methodology 28 (2): 157–68. ———. 2004. What do male nonworkers do? BLS Working Paper no. 371. Washington, DC: Bureau of Labor Statistics. United Nations. 1993. System of national accounts 1993. New York: United Nations. Welch, Finis. 1997. Wages and participation. Journal of Labor Economics 15 (1, pt. 2): S77–S103.
4 Technology and the Theory of Vintage Aggregation
Michael J. Harper
I only want to urge that research not get tied to any one particular picture of the way the economy functions. —Robert Solow (2001)
4.1 Introduction

When vintage investments are aggregated into “capital stocks,” rigid assumptions are made about the effects of technology. This can limit the usefulness of capital stock measures. Vintage aggregation issues were the subject of a vigorous literature in the 1950s and 1960s, and Zvi Griliches (1963) was an active participant in these discussions. By the 1970s, many economists considered these issues to be resolved, and they have received less attention in recent decades. The acceleration of information processing and communications technologies in the 1990s, however, may increase the potential for bias in capital stock measures. This accelerating technological progress has had many ramifications for economic measurement in the United States. Our National Income and Product Accounts did not account for the quality change in computers until Jack Triplett (1989) proposed adjusting Computer Price Indexes for quality change with hedonic methods. As Charles Hulten (1992) observed, the Triplett treatment identifies quality change with embodied technical change, as Robert Hall (1968) had defined it. Once these hedonic methods were in place, growth

Michael J. Harper is Associate Commissioner for Productivity and Technology at the Bureau of Labor Statistics. The views expressed are those of the author and do not necessarily reflect the policies of the U.S. Bureau of Labor Statistics or the views of other staff members. An early draft of this paper was presented at the NBER Summer Institute in 1999, where Zvi Griliches made helpful comments. I would also like to acknowledge Anthony Barkume, Ernst Berndt, Carol Corrado, Edwin Dean, W. Erwin Diewert, Murray Foss, Robert Gordon, Charles Hulten, Marilyn Manser, Peter Meyer, Linda Moeller, Phyllis Otto, Sabrina Pabilonia, Matthew Shapiro, Daniel Sichel, Brian Sliker, Leo Sveikauskas, Jack Triplett, and Cynthia Zoghi for helpful comments, suggestions, and encouragement at various stages of this line of work. Any errors are entirely my own.
accounting studies by Stephen Oliner and Daniel Sichel (2000) and by Dale Jorgenson and Kevin Stiroh (2001) concluded that computer quality was, perhaps, the most important source of U.S. productivity growth in the late 1990s. However, European national accountants found reasons to be skeptical. For example, Peter Hill (2000) pointed out that quality adjustments made from vintage accounts of prices reflected only the positive aspects of quality change while neglecting some negative effects. This paper uses diagrams as well as formal definitions to shed some light on the implications of the rigid assumptions about vintage aggregation made in our standard total factor productivity work. The approach appeals to two different models of how technology and capital formation can explain growth in labor productivity, both proposed by Robert Solow (1957, 1960). The first model assumed that technology was “disembodied” in that it raised productivity independently of the level of investment. Vintage investments were summarized in a “capital stock” measure that was used to separate the contributions of capital and of “residual” technology change to labor productivity growth. This first model is the basis for modern neoclassical total factor productivity exercises. In the second Solow model, technology was “embodied” in capital, and the contribution of each “vintage” of capital to labor productivity could be different. This second model was equally rooted in neoclassical concepts, but it did not make the rigid assumptions needed to build a capital stock. Instead, Solow described the dynamic allocation of labor among capital of different vintages. After a brief review of relevant material on models of production, capital measurement, and quality adjustment (section 4.2), this paper develops a “model of production with machines” (section 4.3) in which the Solow vintage model is extended to individual machines. 
This machine model permits clearer definitions of key concepts such as deterioration and embodied and disembodied technical change. The machine model predicts that older vintages are preferentially discarded during a cyclical downturn. This realistic behavior is inconsistent with what is assumed in capital stock calculations. Section 4.4 examines the machine model in nominal terms. Section 4.5 considers the idea of real capital input. The machine model is used to clarify previous discussions of what the marginal product of capital is: the added output obtained from a collection of machines by adding one machine (not a machine hour) and without adding any labor. Section 4.6 discusses how quality adjustments to capital inputs could be overstated.

4.2 Some Background on Models of Growth, Capital Stocks, and Quality Change

4.2.1 The Solow Residual Model

Solow’s (1957) residual model constructed an aggregate capital stock, $K$, and used it in a production function, $f$, of the form $Y = f(L, K, t)$, where
$Y$ is a real value added output measure, $L$ is labor hours, $K$ is aggregate capital stock, and $t$ is the time of observation, to parse out the contributions to labor productivity growth of capital and of shifts in the production function.1 Solow showed that

$$(y - l) = s_K\,(k - l) + a, \tag{1}$$
where $y$, $l$, and $k$ represent the growth rates of output, labor, and capital stock, respectively, and where $s_K$ is the share of capital or property income, $\Psi$, in the value of output. Property income is calculated using output prices, $p$, and wages, $w$, as the residual of labor compensation in nominal value added output: $pY - wL$. The “Solow residual,” $a$, is a measure of disembodied technical change in that it is presumed to contribute independently of the level of investment in capital. The growth rate of $a$ is typically determined as the residual of output growth not accounted for by growth in capital and labor inputs. Hulten (2001) reviewed many studies which have measured the residual, most commonly known as total factor productivity growth. The Bureau of Labor Statistics (BLS; 2001) produces measures of $a$, which it calls multifactor productivity, using the same general methods. Time series estimates of labor productivity and of $a$ are procyclical. This has been a troubling problem for neoclassical models because $a$ is designed to measure technological progress that should not be highly sensitive to the cycle. This issue was explored by Catherine Morrison and Ernst Berndt (1981). Many ideas, such as labor hoarding and disequilibrium, have been put forth in an effort to reconcile apparent short-run increasing marginal returns to labor with the neoclassical prediction of diminishing returns. Section 4.5 will show how the cyclical nature of the residual is partly a consequence of the rigidity with which capital is measured.

4.2.2 The Solow Vintage Model

“The controversies still rage[d]” when Harcourt (1969, 369) wrote his account of a bitter debate in the literature over whether capital measurement is useful. An understanding of the issues had gradually emerged in the context of Leontief’s (1947) aggregation theorem.
To build a stock, capital had to be like jelly—the ratios of marginal products of different investments could not vary as functions of output or other inputs in the production function. Empirically, the Leontief conditions are rarely satisfied. For example, newer electric power plants are used continuously while older plants are reserved to meet peak demand. Another example is that one fast computer is not a perfect substitute for two slow machines for which the total cost is the same because the latter are designed in most cases to work with two people. 1. The derivation involves differentiating the production function with respect to time, assuming constant returns to scale, that input prices are given, and that inputs are paid the values of their marginal products.
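The derivation summarized in footnote 1 can be sketched in a few lines. This is a reconstruction from the stated assumptions (constant returns to scale, inputs paid the values of their marginal products), not a reproduction of Solow's text. The shorthand $s_L$ for labor's share and the subscript notation $f_L$, $f_K$, $f_t$ for partial derivatives are introduced here only for the sketch.

```latex
\begin{align*}
Y &= f(L, K, t) \\
y &= \frac{f_L L}{Y}\, l + \frac{f_K K}{Y}\, k + a,
  \qquad a \equiv \frac{f_t}{Y} \\
\frac{f_L L}{Y} &= \frac{wL}{pY} \equiv s_L
  \quad \text{(labor is paid its marginal product, } f_L = w/p\text{)} \\
\frac{f_K K}{Y} &= 1 - s_L = s_K
  \quad \text{(constant returns to scale: } f_L L + f_K K = Y\text{)} \\
y &= (1 - s_K)\, l + s_K\, k + a \\
y - l &= s_K\,(k - l) + a
\end{align*}
```

The second line is the total derivative of $\ln Y$ with respect to time, so $y$, $l$, and $k$ are growth rates and $a$ is the proportional shift in the production function.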
In response to this type of problem, Solow (1960) had proposed a vintage capital model. In this model, each vintage, $v$, of capital has its own production function, $f_{t,v}$, where $v$ is the time at which the capital was initially marketed or sold. The function describes how much output could be made with any given amount of surviving capital, $K_{t,v}$, and labor, $L_{t,v}$, in any period subsequent to the year of an initial investment, $I_v$:

$$Y_{t,v} = f_{t,v}(K_{t,v}, L_{t,v}) \tag{2}$$
This allowed for technical progress to be embodied in capital goods. To enforce the idea that vintage production functions are separate, Solow imposed the ground rule that firms must apply labor to specific vintages with no joint effects. Thus, the observed totals for labor hours and output are the sums of vintage specific contributions:

$$L_t = \sum_v L_{t,v} \quad \text{and} \quad Y_t = \sum_v Y_{t,v} \tag{3}$$
This structure accommodates heterogeneity among the production processes used by capital assets of different vintages. In some year, the capital measure required for the Solow residual model might count two slower computers as the same amount of capital as the one fast one. These relative valuations, however, might change over time, and so an aggregate measure of capital counting the two types of machines according to a fixed relationship could be inconsistent, that is, ambiguous.

4.2.3 The Hall Equation

A vintage aggregate of investments, or capital stock, is commonly used to calculate the Solow residual. The theoretical conditions under which this aggregate is consistent were thoroughly reviewed by Franklin Fisher (1965). A capital services aggregate,

$$J_t = \int_v z_{t,v}\, I_v \, dv, \tag{4}$$
effectively assumes vintage investments are featureless perfect substitutes, implying that capital is like “jelly,” denoted J. Fisher showed that in order for J to be consistent, the efficiency function, z, must adjust the quantity measure for all differences in marginal product while remaining independent of output prices and wages. Seemingly, z had to be a predetermined function of time. Hall (1968) argued, however, that there is latitude in defining jelly. He observed that: The basic theorem on capital aggregates makes no restriction on the behavior of the function z(v) over calendar time. From one year to the next, the pattern of efficiency as a function of vintage may change arbitrarily. . . . This formulation is so general as to be almost vacuous. (36) In proposing that the recipe for jelly could be changed from year to year, Hall recognized that this generalization was so vast as to obscure the
capital-related phenomena addressed in previous literature. To reach an interpretation, he proposed a structural form for $z$ involving a decomposition into three factors that he could loosely associate with important phenomena: functions of time ($d_t$, disembodied technical change), of age ($\delta_{t-v}$, deterioration), and of vintage ($b_v$, embodied technical change):

$$J_t = \int_v z_{t,v}\, I_v \, dv = d_t \int_v \delta_{t-v}\, b_v\, I_v \, dv \tag{5}$$
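A small numerical sketch can show why a three-factor decomposition of the form in equation (5) is not unique when only time and vintage vary. The exponential forms for the three factors and all parameter values below are illustrative assumptions, not part of Hall's specification. Two different parameter sets are built so that they imply exactly the same efficiency function.

```python
import math


def efficiency(t, v, D, delta, B):
    """Efficiency z_{t,v} as a product of time, age, and vintage factors.

    The exponential factors are an assumed functional form for illustration.
    """
    d_t = math.exp(D * t)               # function of time
    decay = math.exp(-delta * (t - v))  # function of age, t - v
    b_v = math.exp(B * v)               # function of vintage
    return d_t * decay * b_v


# Hypothetical rates. The second set moves `shift` out of the vintage factor
# and into both the time factor and the age factor.
shift = 0.03
params_a = {"D": 0.01, "delta": 0.05, "B": 0.02}
params_b = {"D": 0.01 + shift, "delta": 0.05 + shift, "B": 0.02 - shift}

# Compare the two efficiency functions on a grid of (time, vintage) pairs.
max_gap = max(
    abs(efficiency(t, v, **params_a) - efficiency(t, v, **params_b))
    for t in range(0, 11)
    for v in range(0, t + 1)
)
print(f"largest difference in z over the grid: {max_gap:.2e}")
```

Under these assumed forms, $\ln z_{t,v} = (D - \delta)\,t + (\delta + B)\,v$, so only two combinations of the three rates affect $z$, and data on $z$ alone cannot separate them.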
Hall then pointed out that functions $d_t$, $\delta_{t-v}$, and $b_v$ reflect only two independent influences, time and vintage, and so the specification can be written in terms of two functions, eliminating the third by including its influence in respecified versions of the other two. This pulled the two Solow models together under a particular specification, which I will refer to as Hall’s equation (5).

4.2.4 Quality Adjustment to Capital Measures

New improved models of high-tech equipment that embody improvements are frequently introduced and marketed alongside older models. Quality adjustment involves comparing prices of new improved goods to new unimproved goods. I will define each asset’s model year, $m$, to be the year in which assets with a given design were first sold. Capital goods prices will be denoted using three subscripts, $p^K_{t,v,m}$. In principle, a quality adjustment factor, $b$, can be defined by comparing the prices of brand new goods ($v = t$) of the latest model ($m = t$) to the prices of brand new goods of a previous period’s model ($m = t - 1$). For example, matched models assume $b_m/b_{m-1} = p^K_{t,t,t}/p^K_{t,t,t-1}$. Hedonic models estimate $b$ from wider sets of prices and characteristics. Statistical agencies then measure real capital by deflating measures of nominal capital expenditures, $E_t$, with a price index that tracks the price for a new good of constant quality:

$$\frac{J_t}{J_{t-1}} = \frac{E_t / p^K_{t,t,t-1}}{E_t / p^K_{t-1,t-1,t-1}} \tag{6}$$

This is equivalent to deflating with a quality-adjusted price index, that is, $J_t/J_{t-1} = (E_t b_t/p^K_{t,t,t})/(E_t b_{t-1}/p^K_{t-1,t-1,t-1})$. The quality-adjustment factor, $b$, would seem to be the right factor to use in Hall’s equation. However, this presumes relative prices of capital goods reflect their relative marginal products, a notion that will be critiqued in section 4.5.

4.3 A Model of Production with Machines

In this section, a model is developed describing production from individual machines, which could be almost any type of asset such as computers, trucks, or buildings. As in section 4.2, three subscripts are used to denote time, vintage, and model, where the model variable will be regarded as the time at which machines with specific physical characteristics were first
sold in the market. This third subscript will be used in the formulation of definitions as well as in the description of available data on capital goods prices. Solow’s vintage model (equation [2]) is modified to describe output as a function of labor associated with specific models, $m$, as well as specific vintages, that is, $Y_{t,v,m} = f_{t,v,m}(K_{t,v,m}, L_{t,v,m})$. At any time the economywide stocks of each vintage and model, $K_{t,v,m}$, will be regarded as fixed by past investment decisions. A firm owning (and planning to keep) a machine, of type $t, v, m$, faces the following short-run production possibilities for generating output from labor:

$$\frac{Y_{t,v,m}}{K_{t,v,m}} = g_{t,v,m}\!\left(\frac{L_{t,v,m}}{K_{t,v,m}}\right) \tag{7}$$
The capital variable, $K_{t,v,m}$, involves aggregation across machines of type $t, v, m$. I will treat these machines as identical, but I will consider their discrete nature rather than treating $K_{t,v,m}$ as jelly. Let $K_{t,v,m}$ refer to the set of all machines in existence at any time, $t$. Consider $K_{t,v,m}$ to consist of discrete numbers, $n_{t,v,m}$, of identical machines. Assume that all machines in each vintage-model category are used identically at each point in time. Then total output from all machines of each vintage-model combination will be $n_{t,v,m}$ times the output of each machine. The machine production function, $f$, is then defined in terms of output per machine, by vintage and model:

$$\frac{Y_{t,v,m}}{n_{t,v,m}} = f_{t,v,m}\!\left(\frac{L_{t,v,m}}{n_{t,v,m}}\right) \quad \forall\, v, m \subset K_{t,v,m} \tag{8}$$
Assume that the output coming from each machine is a smooth function, $f$, of labor and is characterized by diminishing marginal returns to labor. Figure 4.1 depicts such a machine production function, $f$. If the firm chooses to operate at point A, the average product of labor (labor productivity) will be the slope of ray OA and the marginal product of labor will be the slope of the tangent to $f$ at A. I next extend Solow’s ground rule (equation [3]) so that labor is allocated to specific machines to produce output:

$$L_t = \int_v \int_m L_{t,v,m}\, dm\, dv \quad \text{and} \quad Y_t = \int_v \int_m Y_{t,v,m}\, dm\, dv \tag{9}$$
As in Solow’s vintage model, assume that output and labor can be measured and that they are homogeneous. Also, cost minimization implies that the marginal product of labor applied to each machine will be the same (and will equal the wage rate relative to the price of output):

$$\frac{\partial f_{t,v,m}}{\partial L_{t,v,m}} = \frac{w_t}{p_t} \quad \forall\, t, v, m \tag{10}$$
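Condition (10) can be illustrated with a short sketch. The machine function used here, $A(L - L_0)^{\alpha}$ with a fixed labor requirement $L_0$, and every parameter value are hypothetical choices made only so that the optimum has a closed form; the chapter does not commit to a functional form. The sketch finds the labor input at which each machine's marginal product equals the real wage and then compares average products.

```python
def optimal_labor(A, L0, alpha, real_wage):
    """Labor per machine at which the marginal product of labor equals w/p."""
    return L0 + (A * alpha / real_wage) ** (1.0 / (1.0 - alpha))


def output(A, L0, alpha, labor):
    """Hypothetical machine function: no output until the requirement L0 is met."""
    return A * (labor - L0) ** alpha if labor > L0 else 0.0


def marginal_product(A, L0, alpha, labor):
    """Slope of the machine function at the given labor input (for labor > L0)."""
    return A * alpha * (labor - L0) ** (alpha - 1.0)


REAL_WAGE = 0.25  # w/p, hypothetical
ALPHA = 0.5       # curvature, hypothetical
# name -> (A, L0); the newer machine has the higher efficiency parameter A
machines = {"older": (1.0, 1.0), "newer": (1.5, 1.0)}

results = {}
for name, (A, L0) in machines.items():
    labor = optimal_labor(A, L0, ALPHA, REAL_WAGE)
    y = output(A, L0, ALPHA, labor)
    results[name] = {
        "labor": labor,
        "mpl": marginal_product(A, L0, ALPHA, labor),
        "avg": y / labor,
    }
    print(name, results[name])
```

With these assumed numbers both machines operate where the marginal product of labor equals the real wage, yet output per hour is higher on the newer machine, which is the distinction between marginal and average products that the text draws.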
In figure 4.2, three machine functions are depicted. Expression (10) implies they will be operated at points where tangents are parallel. It is also important to note that labor productivity differs by vintage (even though the
Fig. 4.1 A machine production function
Note: The firm can choose where to operate along f and chooses A. The slope of ray OA is the average product of labor (labor productivity). The slope of the tangent of f at A is the marginal product of labor.
Fig. 4.2 A family of machine functions
Note: Several machines may operate simultaneously. Labor productivity may differ even though the marginal product is the same ( f, g, and h are tangent to parallel lines). A machine’s function may shift down with age due to deterioration or up with time due to disembodied technical change (or both). New machines tend to appear higher in the figure, meaning they allow higher labor productivity that is embodied technical change.
marginal product of labor is the same). In this situation, if $w_t/p_t$ changed, labor would be reallocated in such a way that the marginal product of labor on all vintages would adjust proportionally to $w_t/p_t$, but note that the average product of labor could adjust differently on each vintage. If this model were ever to be elaborated as thoroughly as Solow’s residual model, issues such as the heterogeneity of labor and of output (composition or quality effects) and the relationships among different types of capital might be addressed. However, in order to facilitate exposition, this paper will describe a situation where one type of output is made with one type of labor using progressively advancing versions of one type of machine.

4.3.1 Relationships among Functions

Zvi Griliches (1963) made one of the most thorough efforts in the literature to define the key concepts of capital measurement, such as replacement, depreciation, deterioration, obsolescence, and capital services. This paper will provide similar definitions that refer to the machine model. In order to facilitate compact mathematical definitions and analysis of phenomena associated with capital, it is assumed that machine production functions, $f_{t,v,m}$, and related variables are continuous functions of time, vintage, and model. As machines age, their physical characteristics change due to wear and tear. The rate of deterioration of output, $\delta^f_{t,v,m}$ (the output decay rate), is defined as the rate at which the output produced by a given amount of labor with a given model varies by vintage:

$$\delta^f_{t,v,m} \equiv -\frac{\partial \ln f_{t,v,m}}{\partial (t - v)} = \frac{\partial \ln f_{t,v,m}}{\partial v} \tag{11}$$
As indicated, this will be the negative of the rate at which output varies by age alone for a given model. Note that the deterioration rate can vary with time, vintage, or model. Newer models embody features that permit them to make more output with the same amount of labor. The rate at which functions differ due to embodied technical change, $B^f_{t,v,m}$, is defined in terms of models, or equivalently, model age:

$$B^f_{t,v,m} \equiv -\frac{\partial \ln f_{t,v,m}}{\partial (t - m)} = \frac{\partial \ln f_{t,v,m}}{\partial m} \tag{12}$$
These are the only two types of shifts considered that are due solely to the machine’s physical characteristics. However, as time passes, people may learn how to get more out of a given machine. Disembodied technical change, $D^f_{t,v,m}$, is the rate at which the function shifts over time for a specific model and age:

$$D^f_{t,v,m} \equiv \frac{\partial \ln f_{t,v,m}}{\partial t} \tag{13}$$
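Definitions (11) through (13) can be checked numerically with a toy specification. The log-linear form below and the three rate values are hypothetical, chosen only so that the rates are constants; the chapter allows all three to vary with time, vintage, and model. Each rate is recovered from a one-step comparison along a single dimension: vintage alone for deterioration, model alone for embodied change, and time with age and model held fixed for disembodied change.

```python
# Hypothetical constant rates (not estimates from any data)
D_RATE = 0.01      # disembodied technical change
DELTA_RATE = 0.05  # deterioration
B_RATE = 0.03      # embodied technical change


def log_f(t, v, m):
    """Log output from a fixed amount of labor on a machine of vintage v and
    model m, observed at time t, under an assumed log-linear specification."""
    age = t - v
    return D_RATE * t - DELTA_RATE * age + B_RATE * m


t, v, m = 10, 8, 6  # observation time, vintage, model year (m <= v <= t)

# Same time and model, one vintage newer: isolates deterioration.
delta_hat = log_f(t, v + 1, m) - log_f(t, v, m)
# Same time and vintage, one model newer: isolates embodied change.
b_hat = log_f(t, v, m + 1) - log_f(t, v, m)
# One period later at the same age and model: isolates disembodied change.
d_hat = log_f(t + 1, v + 1, m) - log_f(t, v, m)

print(f"delta = {delta_hat:.4f}, B = {b_hat:.4f}, D = {d_hat:.4f}")
```

The three comparisons are separable only because the model index varies independently of time and vintage; with two indices alone, the log differences would mix the rates.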
There was an identification problem with Hall’s (1968) specification in that deterioration ($\delta$) and embodied ($B$) and disembodied ($D$) technical change were defined in terms of functions of time, vintage, and age ($t - v$). These functions were defined, in turn, in terms of only two independent variables, $t$ and $v$. This is not the case here because a third independent variable, model, is introduced to control for the characteristics of new machines that differ even though they are sold as new in the same year. In principle one could use empirical observations to identify $\delta$, $B$, and $D$ separately. Identical brand new models made in different years could help identify disembodied technical change. Thus one could observe $f_{t+1,v+1,m}/f_{t,v,m}$ to measure $D$, $f_{t,v+1,m}/f_{t,v,m}$ to measure $\delta$, and $f_{t,v,m+1}/f_{t,v,m}$ to measure $B$.

4.4 The Nominal Earnings of Assets Described with the Machine Model

This section will use the machine model to analyze the earnings of assets under dynamic conditions, such as how they are influenced by technology and cyclical fluctuations in demand. This material will be helpful in tackling the issues in measuring real capital in section 4.5.

4.4.1 Extraction of Rents from Machines: The Structure of the Shadows

For each vintage and model, define the rent or property income, $\Psi_{t,v,m}$, generated per machine as the difference between revenues and variable costs associated with the machine:

$$\frac{\Psi_{t,v,m}}{n_{t,v,m}} = \frac{p_t Y_{t,v,m}}{n_{t,v,m}} - \frac{w_t L_{t,v,m}}{n_{t,v,m}} \tag{14}$$
As Berndt and Fuss (1986) assumed, in the short run, firms can be expected to behave as if capital costs are fixed and sunk, and so they will go about the business of maximizing the rate at which they accrue property income, $\Psi_{t,v,m}$. The ex post rents generated by the aggregate capital stock emerge as the shadow price of the capital stock. The machine model supports an explanation of how output prices and wages influence decisions on operating individual machines. This begins by assuming that, in the short run, each firm has a fixed collection of assets and is too small to influence wages and output prices. Assume that each firm will extract as much rent as possible from each machine it owns. With a given price, a given wage, and a given set of machines in place, the decision as to how much to run each machine can be represented in terms of values rather than in terms of input and output units. The following describes how much revenue can be earned from one machine as a function of expenditure on labor costs:

$$\frac{p_t Y_{t,v,m}}{n_{t,v,m}} = f^{*}_{t,v,m}\!\left(\frac{w_t L_{t,v,m}}{n_{t,v,m}}\right), \tag{15}$$
Fig. 4.3 Revenue function
Note: The machine owner is a price taker for both wages and product price (these are exogenous). For a given wage and price, the machine function, $f$, can be projected into a revenue-cost plane. The revenue function, $f^{*}$, will look exactly like $f$ if the revenue and cost axes are suitably normalized ($w = 1$, $p = 1$). Ray OB delineates where revenue equals labor cost. The owner will choose operating at point A, where the tangent to $f^{*}$ is parallel to OB. Then segment AB will measure rents (gross profits measured by revenue less variable cost).
where $f^{*}$ is a revenue function that is closely related to the corresponding machine production functions: $\partial f^{*}_{t,v,m}/\partial L_{t,v,m} = w_t\, \partial f_{t,v,m}/\partial L_{t,v,m}$ $\forall\, t$, $v$, and $m$. Given the assumed price-taking behavior at any time, $t$, one can relabel the axes of figure 4.1 as “revenues” and “labor costs” and construct the scale so that $w_t = 1$ and $p_t = 1$. Then the revenue function, $f^{*}$, will be in the same location as $f$, as depicted in figure 4.3. Ray OB has been added through points in the first quadrant for which revenues equal labor costs. A machine earns positive rents when operated at any point above ray OB. Rents will be at a maximum when expression (14) is satisfied, so the firm will operate at point A. The tangent to $f^{*}$ at A is parallel to OB. Line segment AB is a measure of the rents generated by the machine (revenue less cost).

4.4.2 Visualizing Changes in Output Prices or Wages

Fixed output prices and wages are built in to figure 4.3. The revenue function would move when prices or wages changed, while the ray, OB, would remain fixed. If the price of output declined, all points on the function would shift proportionally downward. Similarly, if wages rose, the function would shift rightward and would be stretched to the right. With a
Fig. 4.4 The dynamics of a wage increase
Note: If wages rise relative to prices, the revenue-cost ray, OB, would remain fixed and the revenue functions, $f^{*}$ and $g^{*}$, would shift and elongate rightward. However, it is possible to renormalize the axes in the plane as wages rise so that the functions stay put. The revenue-cost ray would then appear to rotate upward, to the position of ray OD. The rents from $f$ will be driven down from the length of segment AB to that of segment CD. Note that labor productivity rises slightly (the slope of OC is greater than that of OA). Machine $g$ meets a different fate: rents become negative after the wage increase, and so it is shut off abruptly to avoid an operating loss.
little imagination, figure 4.3 can be used for a different visualization of the consequences of changes in these variables. Rather than redrawing all of the curves, one can simply adjust the scales by renormalizing wages and prices. Then the revenue function will stay in its original place, and OB will appear to rotate counterclockwise (up) through the first quadrant, perhaps to position, OD, as depicted in figure 4.4. A wage increase would reduce the rents earned from f from the length of segment AB to that of CD. (An output price decrease would involve a change in the scale of the vertical axes, ruining the correspondence of vertical segment lengths to rents, so we will focus on the wage increase.) This illustrates how a wage increase drives down rents and creates pressure to economize on labor. Note that CD is to the left of AB. Faced with a wage increase, the firm will reduce the amount of labor and output slightly, raising average labor productivity, consistent with what Cooper and Haltiwanger (1993) have observed happening to plants as they aged. In the long run, technological improvements generally lead to investments in improved capital goods that, in turn, bid for scarce labor, driving a persistent upward rotation in the ray representing revenue
Michael J. Harper
equals cost. The effect of obsolescence is just the rent lost due to the persistent rise in wages relative to the price of output. This rise (or rotation in ray OB) is not necessarily constant: a cyclical downturn in the economy can accelerate the upward rotation of the ray, while a surge in demand can temporarily reverse the process, causing a downward rotation of ray OB and an increase in rents. A key point is that the rotation of OB reflects both temporal and cyclical influences. The two exogenous influences tend to get swept together in the standard approach to capital measurement. The temporal influence creates obsolescence, while the cyclical influence is what underlies the Berndt and Fuss (1986) “temporary equilibria.”

4.4.3 Negative Rents, If Permanent, Will Induce Asset Retirement

Negative rents would occur if wages rose enough so that a revenue function fell entirely below the revenue or cost ray, as is the case with g and ray OD in figure 4.4. Negative rents can occur if the revenue function has a fixed labor requirement. No output (revenue) is produced unless this requirement is met, but once it is met, the function rises rapidly. If OD is high enough, and if diminishing marginal returns set in soon enough, revenues may never cover costs. Any attempt to operate the machine will result in a loss. In this situation, we assume that the machine is shut down. Unlike a capital stock, the machine model can be consistent with an abrupt shutdown of a machine or plant: as OB rises (and before it reaches OD), rents would transition from positive to negative, causing all labor suddenly to be withdrawn from the asset. This type of model could potentially be used with microdata to investigate plant closing behavior, but this paper will focus on measuring capital.
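The shutdown logic described above can be sketched numerically. The following is a minimal illustration only: the square-root revenue function, the parameter names (`scale`, `fixed_labor`), and the wage values are assumptions chosen for convenience, not the chapter's specification. The output price is normalized to 1, so the wage plays the role of the slope of the rotating ray.

```python
import math

def operate(scale, fixed_labor, wage):
    """Labor and rent chosen for one machine, with the output price normalized to 1.

    Hypothetical revenue function (an assumption, not the chapter's):
    R(L) = scale * sqrt(L - fixed_labor) for L > fixed_labor, else 0.
    """
    # Best interior operating point: marginal revenue of labor equals the wage.
    labor = fixed_labor + (scale / (2.0 * wage)) ** 2
    rent = scale * math.sqrt(labor - fixed_labor) - wage * labor
    if rent < 0.0:
        # Revenue never covers labor cost, so all labor is withdrawn at once.
        return 0.0, 0.0
    return labor, rent

for wage in (0.8, 0.9, 1.0, 1.1):
    labor, rent = operate(scale=4.0, fixed_labor=4.0, wage=wage)
    print(f"wage={wage:.1f}  labor={labor:.2f}  rent={rent:.2f}")
```

Under these assumed parameters, labor shrinks gradually as the wage rises (from 10.25 to 8 workers) while rents fall toward zero, and then drops to zero in a single step once rents would turn negative. That is the abrupt retirement the fixed labor requirement makes possible.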
In order to predict abrupt retirements, the machine model must be specified with a fixed labor requirement, as in my figures.2 This cannot happen with the Cobb-Douglas specification3 of the vintage production function that Solow (1960) used in an empirical exercise. As depicted in figure 4.5, Solow’s functions would start at the origin and move out into the first quadrant, with newer vintages above older. The slope of each curve would gradually decline, reflecting diminishing marginal returns to labor. But there would be diminishing average returns to labor throughout each curve. If the ray OB gradually rotated upward, squeezing rents, the firm would continue to operate the machine using less and less labor until labor reached zero. Falling rents would not lead to abrupt shutdowns, and, instead, old machines would just gradually fade away.

It is interesting that on any function, f, that has a region where labor productivity is rising, labor productivity will reach a maximum at the point at which the curve is tangent to a ray from the origin. As a consequence of diminishing marginal returns, a machine will never operate to the left of this point. It follows that the labor productivity associated with a machine is at its maximum when the machine is marginal. Machines will operate to the right of this point.

2. This can happen with any specification with a region where average returns to labor (labor productivity) are rising.
3. Output per unit of capital, with Solow’s vintage Cobb-Douglas function, is given by $(Y_{t,v}/K_{t,v}) = B e^{\lambda v}(L_{t,v}/K_{t,v})^{\alpha}$.

Fig. 4.5 Cobb-Douglas specification
Note: Solow mentioned one possible specification for his vintage production function, and here it is graphed for two machines. Because there is no part of the domain where there are increasing average returns to labor, rents will only approach zero as wages become very high. Negative rents are impossible, and there is not an abrupt transition from operating with a lot of labor to being shut down. Old machines do not die, they just gradually fade away.

4.5 Measuring Real Capital

Given the machine model, we now consider what measurement units and weights would be suitable for the aggregation of real capital inputs.

4.5.1 Measurement Units and the Aggregation of Machines

The machine model can be used to devise an aggregate capital measure that reflects many of the factors affecting capital vintages. This perspective will help to identify how these factors are being treated in recent studies of capital and productivity. The model includes a unit, the number of machines, $n_{t,v,m}$, that can be used to add up identical assets. This is a less constrained starting point for vintage aggregation than the usual real value of investment. However, each category of machine is different. Expression (4)
indicates that machine counts need to be weighted by marginal products in order to satisfy the Leontief aggregation conditions. The intuition is that investments must be adjusted for how much work they do.

In productivity measurement, the Bureau of Labor Statistics (BLS) adjusts employment by average hours. Many authors have considered an analogous treatment of capital, that is, adjusting the number of machines, $n_{t,v,m}$, for the intensity of their use. The idea is to adjust for the cyclical changes in marginal product. Jorgenson and Griliches (1967) originally made capacity utilization adjustments, but they later decided to avoid measuring capital in terms of other variables in the production function, like labor hours or energy use. These adjustments seemed to undermine the notion of an independent capital measure. As Berndt and Fuss (1986) pointed out, weighting capital with its shadow price is tantamount to adjusting for capacity utilization, and so a quantity-side adjustment in the Jorgenson and Griliches framework would account for capacity utilization twice. Present-day neoclassical capital measurement studies, such as Jorgenson and Stiroh (2001) and the BLS (2001) measures, do not make quantity-side capacity adjustments.

Nevertheless, let us work through the idea of making a quantity-side adjustment for capacity utilization instead of a rental price-side adjustment. We would need to adjust the quantities for variations in marginal product. In general, machine hours, $h^K_{t,v,m}$, will not correspond to the marginal product of capital. Workers lose utility by giving up leisure time to work, and so they are (usually) compensated by the hour. But idle capital has no utility, and an asset’s owner is (usually) not compensated by how many hours per day it is used. Once an asset is acquired, it is used for as many hours per day as necessary to maximize the difference between revenues and labor costs.
So in the temporary equilibrium described by Berndt and Fuss (1986), the marginal value of running a machine one more hour per day will be zero, $\partial \pi_{t,v,m} / \partial h^K_{t,v,m} = 0$. That is, one machine hour is not equivalent to another, and the contribution of the last hour is marginal. Therefore the total hours of each type of machine is not necessarily the appropriate weight for use in the aggregation of machines.

4.5.2 The Marginal Product and Rental Price of a Machine

While Fisher (1965) showed that vintages needed to be aggregated in terms of marginal product (expression [4]), the literature lacks a careful discussion of what the marginal product of capital is. Present measurement conventions regard “a spade to be a spade” (the “Gertrude Stein dictum,” as Harcourt [1969, 372] put it), so a brand new machine of a given model is assumed to represent the same amount of capital (to have the same marginal product) in each time period. Thus, the quantity unit for capital is tied exclusively to the inherent characteristics of the machine.

The machine model leads to a very different conclusion. When a machine is added to the economy while total labor is held constant, the output of the new machine will be gained, but some output from other machines will be lost because labor must be redeployed to the new machine and away from other machines.4 Because labor is always redeployed at labor’s marginal product, the new machine will boost output by the difference between the average product of labor on this machine and the marginal product of labor (which will be the same on the new machine as on all other machines). Hence the marginal product of machines, $z_{t,v,m} = \partial Y_t / \partial n_{t,v,m}$, is

(16) $z_{t,v,m} = \frac{Y_{t,v,m}}{n_{t,v,m}} - \frac{L_{t,v,m}}{n_{t,v,m}} \frac{\partial Y_{t,v,m}}{\partial L_{t,v,m}} = \left( \frac{Y_{t,v,m}}{L_{t,v,m}} - \frac{\partial Y_{t,v,m}}{\partial L_{t,v,m}} \right) \frac{L_{t,v,m}}{n_{t,v,m}}.$

The marginal product of the machine is determined by its own machine production function and by the marginal product of labor, which in turn is determined by the ratio of exogenous functions of time, $w_t/p_t$. Marginal product closely corresponds to rent. Rent per machine, or the machine rental price, $c_{t,v,m}$, is just the price of output times the machine’s marginal product:

(17) $c_{t,v,m} = p_t z_{t,v,m} = \frac{\pi_{t,v,m}}{n_{t,v,m}}.$
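As a rough check on expressions (16) and (17), the sketch below plugs in the numbers from the chapter's fifty-machine footnote example. The function names and the output price of 2.0 are hypothetical, and the marginal product of labor is approximated by the output lost when one worker is removed from a machine, which is an assumption of this sketch.

```python
def machine_marginal_product(output_per_machine, labor_per_machine, marginal_product_of_labor):
    """Expression (16), stated per machine: z = Y/n - (L/n) * dY/dL."""
    return output_per_machine - labor_per_machine * marginal_product_of_labor

def rental_price(output_price, z):
    """Expression (17): rent per machine is the output price times z."""
    return output_price * z

# Labor relatively abundant: 10 workers make 100 units; one fewer worker costs 100 - 94 units.
z_abundant = machine_marginal_product(100, 10, 100 - 94)

# Labor relatively scarce: 7 workers make 80 units; one fewer worker costs 80 - 70 units.
z_scarce = machine_marginal_product(80, 7, 80 - 70)

print(z_abundant, z_scarce)
print(rental_price(2.0, z_abundant), rental_price(2.0, z_scarce))  # hypothetical output price
```

The same physical machine is worth forty units of output in the first case and ten in the second, so its rental price falls by the same proportion even though nothing about the machine has changed.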
Note that $c_{t,v,m}$ reflects the marginal product of capital in that differences in marginal products between machines, z, will show up as differences in the rental prices, c.

It is possible to picture how a machine’s marginal product changes by projecting figure 4.4 back into the output–labor-hours plane of figure 4.1. Figure 4.6 depicts the rays OB and OD representing the two given wage-price ratios, projected into output-labor space. Marginal products are proportional to the vertical distances between each operating point and the relevant ray. As the wage-price ratio changes, the vertical distances associated with different vintages will clearly change. If figure 4.6 depicted several functions like figure 4.2, it would be clear that the marginal products of machines are affected disproportionately by variations in the exogenous price of output and wage rate. In particular, rents and the marginal products of older assets are affected proportionally more by cyclical effects and by obsolescence than are the rents and marginal products of newer more productive assets. The Leontief aggregation conditions require ratios of marginal products among machines to be independent of exogenous variables. Capital stock measures impose this, at odds with how assets with differences in productivity will behave.

4. An example may help. Suppose there are fifty identical machines in the economy, each machine using ten workers to make 100 units of output (500 workers and 5,000 units of output in all). If one more machine is added to the economy and ten more workers, 100 more units of output will be produced. But to compute the marginal product of capital, total labor must be held fixed, so ten of the fifty-one machines now must be operated with only nine workers. If these ten machines now produce only ninety-four units of output each, the net gain from adding the fifty-first machine to the economy would be only forty units [100 − 10 × (100 − 94)]. This marginal product will depend on how scarce labor is. Had we started instead by operating the fifty machines with only seven laborers making eighty units of output each, the introduction of the fifty-first machine would require seven machines to be operated with six workers each, the seven machines producing perhaps only seventy units each. Then the marginal product of capital would be only ten units [80 − 7 × (80 − 70)]. The usefulness of another machine is lower when labor is a relatively scarce resource.

Fig. 4.6 The marginal product of a machine
Note: Rays OB and OD in figure 4.4 can be projected back into the output–labor-hours plane of figure 4.1. Segments AB and CD then represent the marginal product of the machine before and after the wage increase. Even though the machine is exactly the same, the marginal product of the machine is driven down by an increase in wages because the opportunity cost of labor (the output the worker could make with some other machine) has risen. In the long run, technical change and investments in efficient assets drive a steady upward rotation of the ray.

4.5.3 The Rigid Homogeneity Assumption Underlying Capital Stock Measures

One possible arrangement of machine functions is of special significance. A group of machines, $G_{t,v,m}$, is defined to be homogeneous if, for any two functions $f_{i,j,k}$ and $f_{t,v,m}$ in $G_{t,v,m}$, there exists a $\lambda_{i,j,k}$ such that

(18) $f_{i,j,k}(\lambda_{i,j,k} L) = \lambda_{i,j,k} f_{t,v,m}(L) \quad \forall L.$
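The homogeneity condition in expression (18) can be checked numerically. In this minimal sketch, a machine that is simply a scaled-up copy of a base machine passes the test, while a machine that gets more output from the same labor fails it. The base machine function, the labor grid, and the candidate size factors are all hypothetical choices made for illustration.

```python
import math

def base_machine(labor):
    """Hypothetical machine function with a fixed labor requirement of 4."""
    return 4.0 * math.sqrt(labor - 4.0) if labor > 4.0 else 0.0

def scaled_machine(lam):
    """A machine that is `lam` times bigger, not better: g(L) = lam * f(L / lam)."""
    return lambda labor: lam * base_machine(labor / lam)

def satisfies_expression_18(f_i, f_base, lam, labor_grid, tol=1e-9):
    """Check f_i(lam * L) == lam * f_base(L) at every labor value on the grid."""
    return all(abs(f_i(lam * L) - lam * f_base(L)) <= tol for L in labor_grid)

grid = [0.5 * k for k in range(1, 41)]               # L = 0.5, 1.0, ..., 20.0
bigger = scaled_machine(2.0)
better = lambda labor: 1.25 * base_machine(labor)    # more output from the same labor

bigger_ok = satisfies_expression_18(bigger, base_machine, 2.0, grid)
# Try several candidate size factors for the improved machine; none works on this grid.
better_ok = any(
    satisfies_expression_18(better, base_machine, lam, grid)
    for lam in (0.5, 1.0, 1.25, 2.0, 4.0)
)
print(bigger_ok, better_ok)
```

The grid search over a handful of candidate factors is only suggestive, not a proof. It does show why an embodied improvement in labor productivity cannot be expressed as a difference in size.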
Fisher (1965) and Hall (1968) used different proofs to show that vintages must be homogeneous in order for the vintage aggregate, J, to exist. Figure 4.7 illustrates the similarity of machine functions for a homogeneous group of machines. For any given $p_t/w_t$ ratio, all machines will operate with the same proportions of output and labor, that is, the same labor productivity. One function is never strictly above another as in figure 4.2, that is,
Fig. 4.7 A family of homogeneous machine functions
Note: The Leontief aggregation conditions fail unless the relative marginal products of machines are unaffected by the rotation of ray OB. Machines with higher potential labor productivity than others (like in figure 4.2) are ruled out of a homogeneous group. Machines can be bigger but not better. This is unsuitable for studying technologically improved equipment, but it is what economists assume in measuring capital stock.
one machine will never produce more output than another with the same labor. It is clear that two machines are not homogeneous if one of them embodies an improvement that enhances labor productivity. Nor can an older machine, whose machine production function has deteriorated, be part of the same homogeneous class as a new machine. Within a homogeneous group, machines can produce different amounts of output with proportionally different amounts of labor input. Call the value of $\lambda_{i,j,k}$ that satisfies expression (18) the size of machine i, j, k compared to machine t, v, m. A new machine can be bigger but not better. Imposing a homogeneity assumption is implausible and inappropriate if one is interested in measuring high technology capital and characterizing the sources of growth. Yet such homogeneity is assumed in capital stock measurement.

What are the consequences of assuming homogeneity? The fixed age-efficiency schedule is at odds with the fact that relative marginal products will vary by vintage. The fixed schedule imposes homogeneity, as defined by equation (18), on the vintage machine production functions. All vintages are assumed to be affected proportionally by a demand shock. In effect, vintage machines are assumed to differ only in size. To the extent that assets actually differ in labor productivity, as in figure 4.2, any exogenous shock should actually affect the marginal products of the oldest and least efficient vintages proportionally more than those of the newer ones. Because of the built-in homogeneity assumption, capital stocks can lead to puzzling results when used in short-run models, such as those reported by Brynjolfsson and Hitt (2003).

An age-efficiency function could be constructed to correct for any steady and persistent temporal influence such as obsolescence, as Wykoff (2003) has suggested. In this case, the age-efficiency function is adjusted for the effects on marginal product of obsolescence (the upward rotation of ray OA, as in figure 4.6) as well as the effects of deterioration in the function f. However, it is clear that a time-invariant age-efficiency function will be unable to correct for cyclical influences. The idea of a capacity utilization adjustment is to correct for this problem. However, a capacity utilization ratio for the capital stock will not accurately model the myriad vintage-specific adjustments to marginal products brought about by a cyclical change in demand. Ideally, a separate capacity adjustment would be calculated for each vintage. The vintage aggregate capital service measure would then be the sum across vintages of investments adjusted for capacity effects, as well as for deterioration and obsolescence.

4.6 Measuring Quality Change

In the literature on quality adjustment of consumer goods, it is axiomatic that relative prices reflect relative utilities. In measuring inputs associated with durable capital goods, the usual assumption is that relative goods prices measure relative marginal products. My point of departure is that they do not. Triplett (1989) recognized that rental prices rather than purchase prices should be used to compare marginal products. In neoclassical theory, the purchase price of an asset presumably equals the discounted value of its future rents.
The ratio of purchase prices of two assets is therefore proportional to the ratio of their discounted streams of future rents and not necessarily to the ratio of marginal products. At first blush, it seems modest to assume the rental streams will be proportional to marginal products, ensuring that the purchase prices are in step with the rental prices. After all, such proportionality will occur if age-efficiency functions are geometric. But if new machines embody technical change, that is, if the labor productivity associated with newer models is higher, and if the machine functions contain regions of increasing returns to labor (such as is the case if there is a fixed labor requirement for each machine), then obsolescence will push down rents of older models proportionately faster than rents of newer models. Because of this, the ratio of the price of a more productive model to that of a less productive model will overstate the ratio of marginal products. As Hulten (1992) noted, the notion of capital quality is grounded in Hall’s (1968) embodiment factor, which itself describes aggregation with marginal products.

4.6.1 Obsolescence, the Functional Form of Marginal Product with Age, and Quality Change

Observations of purchase prices, whether determined with hedonic or matched-model techniques, are required to measure quality. But information on the functional pattern by which obsolescence affects marginal products as models age also is required. For example, assume that models are impacted as they age by obsolescence, but not by deterioration. Oliner and Sichel (2000) contend that obsolescence dominates deterioration in contributing to a high-tech asset’s demise. Further, assume that $w_t/p_t$ rotates upward at a steady rate without cyclical disturbances. Processes, such as quality improvements, are then presumed to occur at fixed rates so that the quality of a new model, in any year t, relative to one introduced one year earlier will be $B_t = z_{t,t,t}/z_{t,t,t-1}$. Let $\phi_\tau$ be the age ($\tau$)-efficiency function, i.e., $\phi_\tau = z_{t,t,t-\tau}/z_{t,t,t}$. Under the assumptions, $B_t = 1/\phi_1$. From the neoclassical axiom that the price of an asset equals the discounted future rents, one can determine the price of a new model relative to last year’s model:
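(19) $\frac{p^K_{t,t,t}}{p^K_{t,t,t-1}} = \frac{\int_{u=0}^{\infty} p_{t+u}\,\phi_u\,e^{-ru}\,du}{\int_{u=0}^{\infty} p_{t+u}\,\phi_{u+1}\,e^{-ru}\,du} = B_t\,\frac{\int_{u=0}^{\infty} p_{t+u}\,\phi_u\,e^{-ru}\,du}{\int_{u=0}^{\infty} p_{t+u}\,(\phi_{u+1}/\phi_1)\,e^{-ru}\,du}$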
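To see how the price ratio in expression (19) behaves under different age-efficiency shapes, the sketch below evaluates it numerically. It assumes a constant output price (so the price term cancels), writes the age-efficiency function as `phi`, and uses a midpoint-rule integral truncated at a finite horizon. The two profiles, a straight line with a three-period life and a geometric decay with the same first-year decline, are hypothetical examples.

```python
import math

def price_ratio(phi, r=0.0, horizon=60.0, step=0.001):
    """Price of a new model relative to last year's model, as in expression (19).

    Assumes a constant output price, so it cancels from the ratio.
    phi(u) is the age-efficiency function, with phi(0) == 1.
    """
    n = int(round(horizon / step))
    numerator = denominator = 0.0
    for k in range(n):
        u = (k + 0.5) * step                      # midpoint of each small interval
        discount = math.exp(-r * u)
        numerator += phi(u) * discount * step     # rents of the new model
        denominator += phi(u + 1.0) * discount * step  # rents of the one-year-old model
    return numerator / denominator

def straight_line(life):
    return lambda u: max(0.0, 1.0 - u / life)

def geometric(rate):
    return lambda u: math.exp(-rate * u)

line = straight_line(3.0)
quality = 1.0 / line(1.0)                          # B = 1 / phi(1)
ratio_line = price_ratio(line)
ratio_geo = price_ratio(geometric(math.log(1.5)))  # same B, geometric decay

print(quality, ratio_line, ratio_geo)
print(math.log(ratio_line) / math.log(quality))    # overstatement factor in log terms
```

With these assumed inputs both profiles imply the same relative quality of 1.5, but only the geometric profile yields a price ratio equal to it. The straight-line profile yields a price ratio of about 2.25, so the log price differential is about twice the log quality differential.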
If quality raises the labor productivity of newer models (which it must do if there is a technological improvement as distinct from an increase in size), rents, $p\phi$, will be forced down by obsolescence proportionally faster with age, i.e., $d^2 \ln \phi_\tau / d\tau^2 < 0$, and the machine will eventually be retired, that is, $\phi_\tau = 0$ for $\tau \geq L$. Under these conditions the ratio of integrals on the right-hand side of expression (19) will be greater than one. The ratio of model prices will exceed relative quality.

The bias in the existing durable goods quality adjustments is likely to be substantial. Figure 4.8 plots the marginal products of hypothetical goods. For example, computers could be depicted by straight-line age-efficiency functions with short lives. The age-efficiency functions decline because of the temporal effects of a steady increase in $w_t/p_t$. Newer models embody technical improvements. The relative marginal product of a newer model to that of an older one at any time would be proportional to the ratio of heights of the lines in the left-hand-side portion of figure 4.8. The relative asset prices will be proportional to a ratio involving areas. Thus, the area under each line, from a given time through the rest of the life of the asset, will represent the asset’s (nondiscounted) future rents. If the discount rate is zero and the effects of obsolescence are straight-line, as in figure 4.8, quality change would be overstated by a factor of about two. For an age–marginal-product relationship that declined slowly at first and then faster in absolute level with age, as the BLS assumes, the factor would be even higher. Quality-change bias could occur in successive years for the reasons outlined here. If so, the bias will compound over time in a chained index number.

Fig. 4.8 Tracing vintage marginal products and prices
Note: Obsolescence causes the marginal products of machines of any model to fall over time. For example, suppose the marginal product of a new model in 1992 is three units of output and that of a new 1991 model (in 1992) is two units. Assuming interest rates are negligible, the price of each asset will reflect the remaining area under its marginal product curve. This is the light shaded area for the 1991 model and the total of the light and dark areas for the 1992 model. The ratio of prices (in units of output) will tend to exceed the ratio of marginal products. In the example, the older model has 2/3 the marginal product but only 4/9 of the price of the newer one. If the marginal products decline along parallel straight lines and interest is negligible, the price ratio will be the product of the effects of marginal product (2/3) and of future obsolescence (2/3). Thus, for small price differentials, quality will be overstated by a factor of about two.

A geometric age-efficiency specification appears to escape the problem, that is, $\phi_{u+1}/\phi_1 = \phi_u$ for all u, reducing the right-hand side of equation (19) to $B_t$. The age-efficiency profiles of all models will fade away proportionately, and, therefore, corresponding goods prices will take on the same proportions. Before taking any comfort in this, note that a geometric age-efficiency function cannot describe a situation where obsolescence erodes the marginal products of assets with older designs proportionately more than newer ones, as is likely to happen when the assets embody different technologies. Capital goods prices will be proportional to marginal products only if the geometric model really describes events at the microlevel, that is, only if the older and newer assets belong to a homogeneous family. This will not happen when, in reality, labor productivity in newer models is higher as a result of embodied quality improvements. In reality, obsolescence forces many older assets out of service. The idea that the older assets tend to be retired before newer ones is common sense. As retirement approaches, the level of an asset’s marginal product approaches zero, declining faster and faster in percentage terms. Hence $B_t$ must be greater than 1. By assuming that $B_t = 1$, consistent with a geometric model, the standard approach to quality adjustment disregards key evidence.

4.7 Summary

Capital stock measures are widely used in the economics literature. Capital stocks are constructed from data on vintage investment by means of strong aggregation assumptions. It is assumed that the capital services of vintage investments are predetermined and that they decay with age, independent of prevailing wages and output prices. These assumptions were identified as a potential limitation in the 1950s. Mechanisms have been developed to adjust capital stocks for the manifestations of these rigid assumptions. Capacity utilization adjustments to capital stocks account for cyclical variations in output, while quality adjustments are made to investments to correct for temporal improvements in the technology embodied in capital goods. Like capital stocks, however, these measurement adjustments involve strong assumptions. This has been recognized for decades in the case of capacity utilization. In the case of quality, the potential for bias may be underappreciated. I hope that this paper helps raise awareness of these issues and their importance to our understanding of economic growth.
References

Berndt, Ernst R., and Melvyn A. Fuss. 1986. Productivity measurement with adjustments for variations in capacity utilization and other forms of temporary equilibrium. Journal of Econometrics 33 (1/2): 7–29.
Brynjolfsson, Erik, and Lorin M. Hitt. 2003. Computing productivity: Firm-level evidence. Review of Economics and Statistics 85 (4): 793–808.
Bureau of Labor Statistics. 2001. Multifactor productivity trends, 1999. U.S. Department of Labor News Release no. 01-125, May 3, 2001.
Cooper, Russell, and John Haltiwanger. 1993. The aggregate implications of machine replacement: Theory and evidence. American Economic Review 83:360–82.
Fisher, Franklin M. 1965. Embodied technical change and the existence of an aggregate capital stock. Review of Economic Studies 32:263–88.
Griliches, Zvi. 1963. Capital stock in investment functions: Some problems of concept and measurement. In Measurement in economics, ed. Carl Christ. Stanford, CA: Stanford University Press. Repr. in Technology, education and productivity, ed. Zvi Griliches, 123–43. New York: Basil Blackwell, 1988.
Hall, Robert E. 1968. Technical change and capital from the point of view of the dual. Review of Economic Studies 35 (January): 35–46.
Harcourt, G. C. 1969. Some Cambridge controversies in the theory of capital. Journal of Economic Literature 7 (2): 369–405.
Hill, Peter. 2000. Economic depreciation and the SNA. Paper presented at the 26th conference of the International Association for Research in Income and Wealth, Cracow, Poland.
Hulten, Charles R. 1992. Growth accounting when technical change is embodied in capital. American Economic Review 82 (4): 964–79.
Hulten, Charles R. 2001. Total factor productivity: A short biography. In New developments in productivity analysis, ed. Charles R. Hulten, Edwin R. Dean, and Michael J. Harper, 1–47. Chicago: University of Chicago Press.
Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34 (3): 249–82.
Jorgenson, Dale W., and Kevin J. Stiroh. 2001. Raising the speed limit: U.S. economic growth in the information age. Brookings Papers on Economic Activity, Issue no. 1:125–235. Washington, DC: Brookings Institution.
Leontief, Wassily W. 1947. Introduction to a theory of the internal structure of functional relationships. Econometrica 15:361–73.
Morrison, Catherine J., and Ernst R. Berndt. 1981. Short-run labor productivity in a dynamic model. Journal of Econometrics 16:339–65.
Oliner, Stephen D., and Daniel E. Sichel. 2000. The resurgence of growth in the late 1990s: Is information technology the story? Journal of Economic Perspectives 14 (4): 3–22.
Solow, Robert M. 1957. Technical change and the aggregate production function. Review of Economics and Statistics 39 (3): 312–20.
Solow, Robert M. 1960. Investment and technical progress. In Mathematical methods in the social sciences, ed. K. Arrow, S. Karlin, and P. Suppes, 339–65. Stanford, CA: Stanford University Press.
Solow, Robert M. 2001. After technical progress and the aggregate production function. In New developments in productivity analysis, ed. Charles R. Hulten, Edwin R. Dean, and Michael J. Harper, 173–78. Chicago: University of Chicago Press.
Triplett, Jack E. 1989. Price and technological change in a capital good: A survey of research on computers. In Technology and capital formation, ed. Dale W. Jorgenson and Ralph Landau, 127–213. Cambridge, MA: MIT Press.
Wykoff, Frank C. 2003. Obsolescence in economic depreciation from the point of view of the revaluation term. Paper presented at the NBER Summer Institute’s Conference on Research in Income and Wealth, Cambridge, MA.
5 Why Do Computers Depreciate? Michael J. Geske, Valerie A. Ramey, and Matthew D. Shapiro
Michael J. Geske was a student at the University of Michigan when this research was done, and he is currently attending medical school at Washington University in St. Louis. Valerie A. Ramey is a professor of economics at the University of California at San Diego, and a research associate of the National Bureau of Economic Research. Matthew D. Shapiro is the Lawrence R. Klein Collegiate Professor and Chair in the economics department at the University of Michigan, and a research associate of the National Bureau of Economic Research.
We have benefited from the insightful comments of Kate Antonovics, Richard Carson, Jerry Hausman, Charles Hulten, Daniel Sichel, and participants in seminars at the NBER Summer Institute, the University of California at San Diego, and the University of Michigan. We gratefully acknowledge the support of National Science Foundation Grant SBR9617437.

Personal computers rapidly lose economic value. Within two years after purchase, the price of a used computer falls to one-third of its price when new. This rapid loss in value occurs even though the two-year-old computer can do exactly the same computations it did when it was new and suffers only small changes in reliability, physical appearance, or in other observable attributes. The two-year-old computer can typically produce the same documents, run the same regressions, and connect to the same server as it did when new. Hence, by most measures, it can produce the same output. Thus, economic depreciation takes place with little or no physical deterioration or loss of productive capacity.

The general source of this economic depreciation is not a puzzle. New computer models are typically both cheaper and more powerful than older ones. That new computers are cheaper and better than older computers has distinct effects on the value of older computers. First, the value of old computers falls to bring the value of the computing power they can deliver in line with its current replacement cost. Technical change that reduces the price of producing computers with constant specifications reduces the value of older computers, but it does not affect their productivity. Second, technical change can also improve the specifications of new computers. Such improvements in new computers can depress the value of older computers by making them obsolete. Even though older computers might be able to do existing tasks perfectly well, they can become obsolete because they become incompatible with new operating systems or software or do not have hardware that becomes standard in new models (e.g., compact disc [CD] readers, Internet adapters). The development of new operating systems or software is likely promoted by reductions in the costs of hardware. Hence, the declines in constant-quality replacement cost and obsolescence are distinct but interrelated processes that have separate effects on the value of older computers.

Though the economics of depreciation of computers is relatively clear, there are substantial gaps in measuring this phenomenon. Specifically, we know of no research that explicitly links new and used personal computer prices to measure depreciation rather than presuming a rate of depreciation from the change in prices of new computers.1 The estimates of depreciation of computers in the National Income and Product Accounts are based, for example, on changes in the price of new computers. In this paper, we estimate directly the change in value of personal computers by comparing the price of used computers to the price of the same computer when new. Our data set links new and used prices of several thousand computers, the years sold when new and used, the age, and a precise description of important characteristics. The richness of our data allows us to overcome a common problem in the measurement of depreciation, namely that the effects of vintage, age, and time are typically not separately identified. (See Hall 1968; Hulten 1996.)
The method of this paper extends the procedure of Ramey and Shapiro (2001), who estimated changes in the value of used equipment as a function of age and of measures of the equipment's flexibility in alternative uses; the extension here is to include measures of the obsolescence of the used equipment. This paper also presents estimates of a hedonic price index of new computers that is an important ingredient in the calculation of depreciation and user cost. Precise measurement of the change in value of existing computers, as well as a precise decomposition of its sources, is important for addressing several economic issues. First, depreciation estimates are a necessary ingredient in the measurement of the value of the capital stock. Personal

1. Oliner's (1993, 1996) important work on computer depreciation focuses on mainframe computers and computer peripheral equipment. Berndt and Griliches (1993) use hedonic price regressions based on new personal computers (PCs) only. Since this paper was presented at the conference in 2003, Doms et al. (2004) have produced estimates of depreciation using the same data source as used in this research. Wykoff (2003) presents estimates of obsolescence of laptop computers using data on new prices of computers observed at different points in time.
Why Do Computers Depreciate?
computers have become an increasing fraction of both business and household capital. As measures of depreciation are important for estimating the net value of capital, our estimates should be useful for this purpose.

Second, it is important to understand the change in value of computers to understand investment in new computers. The user cost of computers is among the highest for any type of equipment because of the rapid fall in replacement cost and the high rate of economic depreciation. For investment to be positive, computers must have very high marginal products to balance the high cost of owning them. That is, computers are purchased with the knowledge that investment in them will have to be amortized over a short period of years. This paper will provide a decomposition of the user cost of computers into change in replacement cost and economic depreciation, with economic depreciation decomposed into age-related deterioration and obsolescence.

Third, to calculate an index of capital services for total factor productivity measurement, it is necessary to have a reliable estimate of the user cost of the various types of capital (Jorgenson and Griliches 1967). Given the importance of information technology investment in the recent acceleration in total factor productivity, having a good estimate of the user cost of computers can make an important contribution to measuring the pace of technological change.

Fourth, the estimates of the impact of obsolescence on the value of installed capital can provide valuable insights into the propagation and effects of new technologies. Our results suggest that part of the estimated rate of obsolescence is directly related to the decline in hardware prices. Thus, the estimates imply that a slowdown in the rate of technological progress would reduce the depreciation rate on used computers.

The remainder of the paper is organized as follows. Section 5.1 sketches our theoretical framework. Section 5.2 discusses our data.
Section 5.3 outlines our empirical implementation. Section 5.4 presents the estimation results. Section 5.5 presents their implications for user cost. Section 5.6 gives our conclusions.

5.1 Theoretical Framework

The work of Hall and Jorgenson (1967) on user cost and the work of Hall (1968, 1971), Hulten and Wykoff (1981, 1996), Oliner (1993), Jorgenson (1996), and others on depreciation provide the framework for this analysis. Consider first the definition of user cost,

(1) $R^K = P^I (r + \delta - \pi^I),$

where $P^I$ is the constant-quality price of new investment goods in period t, r denotes the nominal opportunity cost of funds, $\delta$ is the depreciation rate, and $\pi^I$ is the rate of change of $P^I$. The user cost relationship is derived from
an intertemporal arbitrage between purchasing new equipment currently versus purchasing new equipment in the future, in which the capital stock evolves according to $\dot{K} = I - \delta K$, where I is gross investment.2 Absent adjustment costs, the marginal product of having a unit of capital installed at time t should equal the user cost, that is, the sum of the opportunity costs of funds, the economic depreciation, and the capital loss from selling the equipment in the future. This paper will use a second arbitrage, between new and used equipment, to quantify the economic depreciation component of user cost. Specifically, the paper will use the wedge between the new and used price of the same computer to quantify economic depreciation. Consider $q^{NOM}_{t,t-v}$, the nominal ratio of used to new computer prices,

(2) $q^{NOM}_{t,t-v} = \frac{P^U_{t,t-v}}{P^N_{t-v}},$

where $P^U_{t,t-v}$ is the price of a used piece of equipment at time t that was new at time $t-v$, and $P^N_{t-v}$ is the price of the equipment when it was new. Note that in our analysis, the prices refer to a specific piece of equipment, not to a price index.

What makes $q^{NOM}_{t,t-v}$ deviate from unity? Suppose that the only change in the environment were the change in the replacement cost of new equipment. That is, suppose that the same piece of equipment were available at time t as at time $t-v$, and there were no technological change except for potentially a change in the cost of the new equipment. (In the computer example, this would correspond to a decline in the price of a central processing unit (CPU) or random-access memory (RAM) of a given quality.) Moreover, suppose that the used piece of equipment suffered no deterioration whatsoever and that there were no costs of adjustment, installation, or resale. In this case, contemporaneous arbitrage would require that the price of the used computer fall by the amount that replacement cost had declined. That is, with no economic depreciation, $q^{NOM}_{t,t-v} = P^U_{t,t-v}/P^N_{t-v} = P^N_t/P^N_{t-v}$. In practice, we do not typically observe $P^N_t$, the current price of the new good, and instead substitute the constant-quality new (replacement) investment good price index $P^I_t$. Hence, if there is no economic depreciation, $q^{NOM}_{t,t-v} = P^U_{t,t-v}/P^N_{t-v} = P^I_t/P^I_{t-v} = \exp(\pi^I_{t,t-v})$. Note that $\pi^I_{t,t-v} = \int_{t-v}^{t} \pi^I(s)\,ds$ denotes the cumulative rate of change of constant-quality new investment good prices as defined in the user cost formula in equation (1). To create a variable that adjusts for this change in the price of new goods, define q as

(3) $q_{t,t-v} = \frac{P^U_{t,t-v}}{P^N_{t-v}} \exp(-\pi^I_{t,t-v}) = q^{NOM}_{t,t-v} \exp(-\pi^I_{t,t-v}).$

2. This equation may be derived from a continuous time dynamic optimization problem. It can also be viewed as an approximation from a discrete time problem. In the case of computers, though, the rates $\delta$ and $\pi^I$ are so large that the approximation is not very good.
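As a small numerical illustration of the definitions in equations (2) and (3), the following sketch computes the nominal ratio and the ratio adjusted for the change in the constant-quality price index. The function names and all dollar and index values are hypothetical and are not taken from the data set; the cumulative price change is assumed to be measured as the log change in the index.

```python
import math

def nominal_q(used_price, new_price):
    """Nominal ratio of the used price to the price of the same model when new."""
    return used_price / new_price

def real_q(used_price, new_price, index_now, index_when_new):
    """Nominal ratio adjusted for the change in a constant-quality price index."""
    # Cumulative rate of change of the index, measured as a log difference.
    cumulative_pi = math.log(index_now / index_when_new)
    return nominal_q(used_price, new_price) * math.exp(-cumulative_pi)

# Hypothetical case: a $2,000 computer is resold for $500 after the
# constant-quality index has fallen from 100 to 50.
q_nom = nominal_q(500.0, 2000.0)
q = real_q(500.0, 2000.0, 50.0, 100.0)
print(q_nom, q)
```

In this made-up case the nominal ratio understates the remaining real value, because half of the nominal decline reflects cheaper replacement cost rather than economic depreciation.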
Under the special circumstances just outlined, $q_{t,t-v}$ would be equal to one, and user cost would come only from the change in replacement cost. Now consider the more general case, where new and used equipment are not perfect substitutes because of economic depreciation. The variable q measures the fraction of original real value left, so under the assumption of exponential but not necessarily constant decay, it is linked to the depreciation rate by

(4) $q_{t,t-v} = \exp(-\delta_{t,t-v}),$

where $\delta_{t,t-v} = \int_{t-v}^{t} \delta(s)\,ds$, and $\delta(s)$ is the depreciation rate at instant s. We will decompose economic depreciation into three components. Age-related depreciation or deterioration, denoted $\delta_v$, captures the wedge in value between new and used equipment that is strictly a function of age. It is frequently modeled as a geometric function of time. We will consider that specification as well as more general cases.3 Age-zero depreciation, denoted $\delta_0$, captures the loss in value the instant that a piece of equipment is sold. This instantaneous depreciation can arise from lump-sum costs of adjustment, installation costs, and transactions costs.4 Additionally, it may also represent the discount from customization, that is, that a purchaser of a new computer may get to choose its precise configuration while the buyer of the used computer does not.5 Obsolescence, denoted $\delta_s$, represents the change in value of used computers because they have fallen behind the current technology. Our empirical strategy is to use measures of obsolescence to quantify $\delta_s$. In the following, we discuss in detail how we implement this empirical strategy. We will treat the three components of depreciation as additive in rates of change, so

(5) $\delta = \delta_v + \delta_0 + \delta_s.$
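To make the additive decomposition concrete, the sketch below combines hypothetical component rates into a value of q and into the user cost of equation (1). It assumes constant annual rates for age-related depreciation and obsolescence and treats age-zero depreciation as a one-time loss in log points; these simplifications and all numbers are illustrative assumptions, not estimates from this paper.

```python
import math

def user_cost(price, r, delta, pi):
    """User cost: price times (opportunity cost + depreciation - price change)."""
    return price * (r + delta - pi)

def q_after(age_years, lump_age_zero, rate_age, rate_obsolescence):
    """Fraction of real value left under constant annual rates plus a
    one-time age-zero loss (all measured in log points)."""
    cumulative = lump_age_zero + (rate_age + rate_obsolescence) * age_years
    return math.exp(-cumulative)

# Hypothetical rates, chosen only for illustration.
R = user_cost(price=1000.0, r=0.05, delta=0.30, pi=-0.20)
q3 = q_after(age_years=3, lump_age_zero=0.10, rate_age=0.10, rate_obsolescence=0.20)
print(round(R, 2), round(q3, 3))
```

Note that a falling replacement price (a negative rate of price change) raises user cost even when the depreciation components are held fixed.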
Obsolescence is in no sense a residual. The average discount of used relative to new computers that we cannot account for with observed measures of obsolescence or with age will be counted as age-zero depreciation. All three components of depreciation are measured relative to replacement cost, so they account for declines in value in excess of the declining costs of producing computers of constant quality.

3. Deterioration may also be a function of intensity of use. Depreciation in use does not appear, however, to be an important factor for computers and is not considered in this paper.

4. Adjustment costs are another reason for q to differ from 1. We believe these are well captured by the instantaneous depreciation.

5. The instantaneous depreciation could also represent a lemons discount owing to adverse selection. As with the case of machine tools (Ramey and Shapiro 2001), we argue that lemons discounts are unlikely to be substantial in the used PC markets because PCs rarely are lemons and because the rare lemon is easy to detect.

Scrappage is another source of user cost. Old computers are often discarded or given away. Our data set, which contains information on the value of computers that are sold, does not provide any information about computers that are disposed of by other methods. Because the value of scrapped computers is zero (or even negative if there is a cost of disposal), our estimates will not account for the entire user cost of the computers. Doms et al. (2004) use the same data as used in this study, together with parametric assumptions about the scrappage rate, in order to account for this important component of user cost. Though our data do not provide new estimates of scrappage rates, we present some calculations based on presumed scrappage rates to provide a full picture of user cost.

We can combine equations (3), (4), and (5) to characterize the decomposition of the components of nominal q as

(6) $q^{NOM}_{t,t-v} = \exp\left(\int_{t-v}^{t} \pi^I(s)\,ds\right) \exp\left(-\int_{t-v}^{t} [\delta_0(s) + \delta_v(s) + \delta_s(s)]\,ds\right)$

or

(6′) $q^{NOM}_{t,t-v} \exp\left(-\int_{t-v}^{t} \pi^I(s)\,ds\right) = q_{t,t-v} = \exp\left(-\int_{t-v}^{t} [\delta_0(s) + \delta_v(s) + \delta_s(s)]\,ds\right),$
that is, the logarithm of q equals the negative of cumulative economic depreciation. The use of the contemporaneous arbitrage between new and used prices to quantify depreciation in equations (6) and (6′) provides a link to the intertemporal arbitrage in the user cost relationship in equation (1).

It is important to emphasize that technological change can impact user cost through two very different channels. First, technological changes can make new capital goods cheaper over time. This first channel for technological change is captured by $\pi^I$ in the user cost expression. When the price of replacement investment goods is falling, this channel adds substantially to user cost even if there is no deterioration or obsolescence. Second, technological change can lead to obsolescence of old capital by making new capital goods better over time. This change does not directly reduce the intrinsic productivity of existing capital; it can still perform its previous functions (e.g., a steam locomotive can still pull a train in the age of diesel). Nonetheless, technological progress can make existing capital obsolete. There are three separate effects within this channel. First, new capital might perform the same tasks faster, better, or with less labor input. Second, the new capital may be able to work with complementary inputs, such as software, in a manner that is impossible for the old capital. Third, the new capital may have better network abilities, such as sharing documents, exchanging data, and connecting to the Internet.

All three of these effects are potentially important for computers. The IBM AT computer
that this paper might have been written with fifteen years ago would have gotten the job done almost as well as the Pentium IV laptop. Certainly, the current statistical software and word processing software is easier to use and runs faster, but the fifteen-year-old technology would have sufficed to get the job done, presumably with no effect on the quality of the analysis or quality of the writing. Using the fifteen-year-old technology to write this paper now would, however, be considerably more difficult. Media for storing and transferring data have changed. Old software does not work with new printers. The old computer cannot run new software, and new software might have been necessary to read a data set. Hence, even though the old AT could have once performed the task and is still physically operational, that is, has not depreciated physically, its productivity has declined. As technology evolves, a serviceable old technology becomes unproductive as the network and infrastructure for operating it vanish.6 Obsolescence as a result of technological change is not well modeled either as physical deterioration or as a reduction in the price of new computers owing to the decline in production costs of delivering computing power. One of the main goals of this paper is to measure this type of obsolescence and to quantify its role in the user cost of computers.

6. To pursue the rail analogy, steam engines are not productive without water towers.

5.2 Data

The data consist of information on used computers gathered from the Orion Computer blue books. The Orion Research Corporation (various years) has been publishing used pricing guides for a wide range of consumer products since 1973. The products covered by the guides include audio/visual equipment, cameras, musical instruments, copiers, vintage collectibles, and televisions. They have been publishing their computer price guide quarterly since 1982.

The Orion blue books are currently used by retail dealers, insurance companies, computer manufacturers (including Dell, Gateway, and Micron), and the Internal Revenue Service to provide an accurate reflection of the used computer market. Orion determines used computer prices through surveys given to used computer dealers nationwide. Dealers are asked to provide the asking price, selling price, and days the computer was in stock before it was sold. The used price listed in the book is the average price of a computer that was sold in less than thirty days. Computers that were sold after being on the market for longer periods did not have their selling prices included in this computation.

The Orion blue books also include a retail price (price when new) of the used computers listed in the book. Using computer company advertisements in back issues of PC Magazine and PC World, we were able to determine that the retail price listed reflects the new price of the computer approximately nine months to one year after the model was first introduced. The range of dates the specific computer model was manufactured is also given, as are the specific attributes of the model, including monitor type and size (if one was included in the purchase price of the computer), speed, amount of RAM, hard drive storage space, type of hard drive, type and speed of CD-ROM or DVD-ROM, Ethernet card or modem, and type of processor (Pentium, Celeron, 286, AMD Athlon, etc.). The blue books include prices from nearly 700 manufacturers, including all major computer companies.

We limit our analysis to Compaq and Gateway computers. Many manufacturer listings in the blue books were inconsistent from year to year in which models were included in the pricing, making analysis of the same model's used price over a long period of time difficult. This problem was encountered for many major computer manufacturers' listings, including IBM and Dell. Compaq and Gateway have a thorough listing of prices across numerous models and over a long time period, making them well suited for our analysis.

We coded the attributes of the computer available from the blue books. These include the dates the model was sold, the new price, the used price, and some characteristics of the computer: the amount of RAM, the size of the hard drive, the speed of the CPU, the type of CPU, the speed of the CD drive (if any), and the make. After deleting computer models with missing data, we have 3,112 observations. Some models are observed in several years; we have observations on 1,170 distinct models. We observe used prices in years from 1990 to 2001 (excluding 1991 and 1994, years for which we could not obtain the source data). The computers we observe were produced between 1984 and 2001. The computers range in price when new from a minimum of $499 to a maximum of $32,880. The median new price of a computer was $2,490. Used prices range from $7 to $14,140. The median used price is $333.
The computers ranged in speed from 8 to 933 megahertz (MHz), with the median computer having a 100 MHz processor. Random access memory (RAM) varies from 512 kilobytes (KB) to 256 megabytes (MB), with a median of 16 MB. Hard drive space ranged from 1 to 40 gigabytes (GB). The median size is 1 GB. We exclude diskless machines from the sample. These data are summarized in tables 5.1, 5.2, and 5.3. Table 5.1 shows the attributes by year when the computer was new. Several aspects are noteworthy. The rate of quality improvement in computers is striking. From 1984 to 2001 (with adjustments for 2001 where the medians are affected by the small numbers of computers produced in 2001 and resold in the same year), the median RAM in our sample rose 250 times, the median speed rose 87 times, and the median hard drive capacity rose 1,000 times. At the same time, the median price fell 72 percent. Interestingly, median prices rose during the 1980s and then started plummeting during the 1990s. The
Table 5.1  Attributes of new computers, by year produced

Year    N    RAM  Speed  Hard disk  Has CD  Compaq      CPU   Price
1984   16  0.512      8         30       0       1      286   5,145
1985    7  0.512      8         30       0       1      286   4,799
1986   37      1     20         70       0       1      386   7,999
1987   58      1     20         60       0       1      386   7,495
1988   38      1     20         40       0       1      386   5,190
1989   89      2     33        320       0       1      386  12,499
1990  136      4     25        120       0       1      386   9,999
1991  154      4     25        120       0    0.52      486   2,560
1992  354      4     33        120    0.02    0.75      486   2,359
1993  319      8     50        270    0.18    0.88      486   2,480
1994  205      8     66        540    0.44    0.41      486   2,490
1995  206     16    100      1,000    0.59    0.64       PI   3,040
1996  217     16    133      1,600    0.42    0.88       PI   2,930
1997  330     32    200      2,500    0.72    0.59       PI   2,470
1998  559     32    300      4,300    0.72    0.76      PII   1,999
1999  291    128    500     13,000    0.80    0.54  PIII/IV   1,899
2000   94    128    650     10,000    0.85    0.72  PIII/IV   1,431
2001    2     96    700     30,000       1       1  PIII/IV   1,525

Notes: Year = year when new; N = number of observations; RAM = random access memory, megabytes (median); Speed = clock speed of CPU, megahertz (median); Hard disk = size of hard disk, megabytes (median); Has CD = 1 if has a CD drive (mean); Compaq = 1 if a Compaq computer and 0 if a Gateway (mean); CPU = generation of processor: 386 = 80386, 486 = 80486, PI = Pentium I, PII = Pentium II, PIII/IV = Pentium III or IV (median); Price = price of new computer, nominal dollars (median).
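The quality-improvement multiples quoted in the text can be checked against the medians in table 5.1. The sketch below is only an arithmetic check; the choice of end years (2000 for RAM and price, where the 2001 medians rest on two observations, and 2001 for speed and hard disk) is an assumption made here because it reproduces the quoted figures.

```python
# Medians from table 5.1; the end-year choices are an assumption of this sketch.
ram_1984, ram_2000 = 0.512, 128          # megabytes
speed_1984, speed_2001 = 8, 700          # megahertz
disk_1984, disk_2001 = 30, 30_000        # megabytes
price_1984, price_2000 = 5_145, 1_431    # nominal dollars

ram_multiple = ram_2000 / ram_1984
speed_multiple = speed_2001 / speed_1984
disk_multiple = disk_2001 / disk_1984
price_decline = 1 - price_2000 / price_1984

print(round(ram_multiple), speed_multiple, disk_multiple, round(price_decline, 2))
```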
Table 5.2  Attributes of used computers, by year sold

Year    N  RAM  Speed  Hard disk  No CRT  Has CD  Compaq  Age  CPU  Price  q (BEA)  q (hedonic)
1990   26    1     20         65       1       0       1    3  386  3,840     0.68         0.82
1992   73    2     20         84    0.95       0    0.95    3  386  1,470     0.42         1.03
1993  118    4     25        120    0.83    0.01    0.81    2  386    574     0.23         0.44
1995  155    4     33        120    0.81    0.06    0.80    4  486    499     0.30         0.55
1996  233    4     33        270    0.67    0.17    0.76    4  486    417     0.39         0.51
1997  284    8     50        340    0.70    0.24    0.79    4  486    340     0.44         0.55
1998  382    8     75        650    1.00    0.38    0.75    4   PI    320     0.43         0.47
1999  553   16    150      1,800    0.82    0.50    0.73    2   PI    327     0.33         0.44
2000  595   32    200      3,000    0.67    0.56    0.67    2   PI    132     0.16         0.23
2001  693   32    300      4,300    0.61    0.65    0.67    3  PII    239     0.36         0.40

Notes: Year = year when sold; N = number of observations; RAM = random access memory, megabytes (median); Speed = clock speed of CPU, megahertz (median); Hard disk = size of hard disk, megabytes (median); No CRT = 1 if used computer sold without a monitor; Has CD = 1 if has a CD drive (mean); Compaq = 1 if a Compaq computer and 0 if a Gateway (mean); Age = age of the computer when sold, years (median); CPU = generation of processor: 386 = 80386, 486 = 80486, PI = Pentium I, PII = Pentium II (median); Price = retail price of used computer, dollars (median); q = resale price over acquisition price, reflated by BEA price index for computers or reflated by estimated hedonic price index (median).
Table 5.3  Attributes of used computers, by age when sold

Age    N    RAM  Speed  Hard disk  No CRT  Has CD  Compaq  CPU  Price  q (BEA)  q (hedonic)
  1  666     32    300      4,300    0.54    0.63    0.70  PII    934     0.62         0.66
  2  606     32    200      3,200    0.59    0.63    0.67   PI    562     0.47         0.58
  3  500     16    166      2,000    0.77    0.54    0.75   PI    306     0.34         0.46
  4  349      8     66        630    0.92    0.38    0.70   PI    220     0.29         0.38
  5  273      8     50        340    0.89    0.24    0.74  486    155     0.22         0.35
  6  237      4     33        270    0.87    0.20    0.74  486     70     0.14         0.27
  7  198      4     33        240    0.90    0.07    0.76  486     25     0.10         0.19
  8  141      4     33        210    0.97    0.01    0.80  486     17     0.08         0.17
  9   89      4     33        120       1       0    0.83  486     14     0.07         0.18
 10   35      2     20         84       1       0       1  386     16     0.07         0.23
 11   19      1     16         40       1       0       1  386     47     0.07         0.27
 12   13      1     20         60       1       0       1  386     15     0.04         0.17
 13    7      1     16         40       1       0       1  386      9     0.03         0.16
 14    2  0.512     12         30       1       0       1  286     15     0.09         0.30

Note: See notes to table 5.2.
speedup in technological progress during the late 1990s is also evident from the table. Tables 5.2 and 5.3 show the data from the vantage point of when the computer was sold. Table 5.2 shows the attributes by the year that it was sold. The values for q (the ratio of used to new price, adjusted for the change in the price of new computers) show that in most years, the average computer sold used had lost close to 70 percent of its value. (See table 5A.1 for averages.) Table 5.3 shows the attributes by the age when it was sold. One-year-old computers lost 34 percent of their real value on average, and two-year-old computers lost 42 percent of their value, as shown by the values of q in the last column.

5.3 Empirical Framework

5.3.1 Hedonic Model for New Computer Prices

For goods such as computers, where the quality of new goods is changing rapidly, much of the decline in the resale price of existing capital derives from competition with new models that are both better, and possibly cheaper, than the older models. This environment does not alter conceptually the user cost framework, but it does provide a substantial measurement challenge. A constant-quality, that is, hedonic, price index can be used to adjust the acquisition price of a used computer to make it comparable to a new computer. While not the focus of the paper, we use our data set to estimate a hedonic model of new computer prices. We regress the log of new computer
prices, $\log(P^N_{t-v})$, on a constant, year dummies, and attributes to measure the quality. These attributes include the log of the CPU speed, the log of the size of the hard disk drive, the log of the size of the RAM, a dummy for whether the computer has a CD drive, a dummy for whether the computer is made by Compaq, and a set of dummies for six generations of CPU (Intel 80286, 80386, 80486, Pentium I, Pentium II [and non-Intel competitors AMD K6, Celeron, Duron, Cyrix], and Pentium III or IV or AMD Athlon).

We experimented with allowing the prices of the attributes to vary with time as recommended by Pakes (2003). The estimates (fitted values) were extremely noisy. (Estimation error is also given substantial attention by Pakes.) Our data set is not designed for estimating hedonic models along the lines Pakes suggests, that is, it is relatively small and not designed explicitly so that competing models of different attributes are marketed simultaneously, so it is not a good test bed for Pakes's recommended procedure. Therefore, we use time dummy estimates, which a recent National Academy panel has labeled as Griliches-neutral (see Schultze and Mackie 2002, 151).

Table 5.4 reports the estimates of the hedonic equation for new computer prices. Even this simple model explains two-thirds of the variance of the log of price. The year dummies show a sharp and relatively steady rate of decline in prices.
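The time-dummy hedonic regression described above can be sketched on synthetic data. The example below is a minimal illustration in plain Python, not the estimation code used for table 5.4: the data are simulated, the "true" coefficients are invented, only two attributes are included, and the helper `ols` (a hypothetical name) solves the normal equations directly rather than using an econometrics package.

```python
import math
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)
true_year_effect = [0.0, -0.3, -0.6, -0.9]   # invented for illustration
X, y = [], []
for _ in range(400):
    year = random.randrange(4)               # four hypothetical model years
    log_speed = random.gauss(4.0, 0.5)
    log_ram = random.gauss(2.0, 0.5)
    dummies = [1.0 if year == j else 0.0 for j in (1, 2, 3)]  # year 0 omitted
    X.append([1.0, log_speed, log_ram] + dummies)
    y.append(7.0 + 0.4 * log_speed + 0.25 * log_ram
             + true_year_effect[year] + random.gauss(0.0, 0.05))

coef = ols(X, y)
# Exponentiated year-dummy coefficients give a constant-quality index (base year = 1).
price_index = [math.exp(c) for c in [0.0] + coef[3:]]
print([round(c, 3) for c in coef])
print([round(p, 3) for p in price_index])
```

Because attributes are held fixed by the regression, the year-dummy coefficients trace out the price of a computer of constant quality, which is the index used to reflate acquisition prices.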
5.3.2 Modeling Depreciation

The estimation equation we consider follows from taking logarithms of both sides of equation (6) or (6′) and then considering alternative functional forms for the various components of depreciation. Noting that the cumulative change in constant value replacement cost is $\int_{t-v}^{t} \pi^I(s)\,ds = \log(P^I_t / P^I_{t-v})$, the basic equation is

(7) $\log(q^{NOM}_{t,t-v}) = \log\left(\frac{P^I_t}{P^I_{t-v}}\right) - \int_{t-v}^{t} [\delta_0(s) + \delta_v(s) + \delta_s(s)]\,ds$

or

(7′) $\log(q_{t,t-v}) = -\int_{t-v}^{t} [\delta_0(s) + \delta_v(s) + \delta_s(s)]\,ds,$

where $\log(q_{t,t-v}) = \log(P^U_{t,t-v}/P^N_{t-v}) - \log(P^I_t/P^I_{t-v})$. Recall that the used and new prices, $P^U_{t,t-v}$ and $P^N_{t-v}$, are specific to a particular observation, while the constant-quality price index $P^I_t$ is either a function of time only (in the case of the Bureau of Economic Analysis [BEA] index) or of time and attributes of the computer (in the case of our hedonic index).

We will estimate relationship (7) or (7′) over our sample of computers. We observe the same computer models at two points in time, so our data have a panel structure. Note, however, that the theoretically mandated specification above takes the difference (used versus new price) as the dependent variable rather than differencing an expression in the level of
Table 5.4  New computer price hedonic equation. Dependent variable: $\log(P^N_t)$

Variable                 Coefficient  (Std. error)
log of CPU speed               0.410       (0.070)
log of RAM                     0.250       (0.035)
log of hard disk size          0.127       (0.029)
CPU 386                        0.045       (0.100)
CPU 486                        0.273       (0.103)
CPU Pentium I                  0.516       (0.120)
CPU Pentium II                 0.164       (0.151)
CPU Pentium III or IV          0.419       (0.169)
Compaq                         0.138       (0.037)
Has CD drive                  -0.041       (0.023)
Year 1985                     -0.004       (0.074)
Year 1986                     -0.184       (0.117)
Year 1987                     -0.376       (0.117)
Year 1988                     -0.526       (0.139)
Year 1989                     -0.432       (0.152)
Year 1990                     -0.800       (0.120)
Year 1991                     -1.697       (0.123)
Year 1992                     -2.148       (0.133)
Year 1993                     -2.619       (0.146)
Year 1994                     -2.849       (0.153)
Year 1995                     -3.240       (0.173)
Year 1996                     -3.515       (0.186)
Year 1997                     -3.974       (0.199)
Year 1998                     -4.282       (0.207)
Year 1999                     -5.015       (0.229)
Year 2000                     -5.336       (0.242)
Year 2001                     -5.439       (0.254)
Constant                       7.225       (0.208)

No. of observations            3,112
R2                              0.68
SEE                             0.35

Notes: Dependent variable is the log of the price of computer when new. Excluded CPU is 80286. Excluded year is 1984. Standard errors corrected for heteroscedasticity and clustering by model are in parentheses.
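The year-dummy coefficients in table 5.4 can be converted into a constant-quality price index by exponentiation. The sketch below does this for a few years as an illustration; the helper names are hypothetical, and the average annual rate is a simple log-change average computed here, not a figure reported in the text.

```python
import math

# Selected year-dummy coefficients from table 5.4 (1984 is the omitted base year).
year_coef = {1984: 0.0, 1990: -0.800, 1995: -3.240, 2000: -5.336, 2001: -5.439}

def index_level(year, base=1984):
    """Constant-quality price level in `year` relative to the base year."""
    return math.exp(year_coef[year] - year_coef[base])

def average_log_change(start, end):
    """Average annual log change in the constant-quality price between two years."""
    return (year_coef[end] - year_coef[start]) / (end - start)

print(round(index_level(2001), 4))
print(round(average_log_change(1984, 2001), 3))
print(round(average_log_change(1995, 2000), 3))
```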
price, so the coefficients of time-invariant parameters (such as characteristics of the computer) are identified, and time-invariant unobserved effects of the computer remain in the disturbances. These disturbances are implicit in the integrals of the components of depreciation in the preceding expressions. They will be made explicit in what follows. Though we observe different computers at different points in time, our econometric specification is a cross section of changes of price from new to used because we do not follow particular computers at more than two points in time.7
7. For many models, we observe linked new-used prices at different dates when the used computer was sold. In our econometric specifications, it is reasonable to assume that these observations within model are correlated across time. Accordingly, we correct standard errors in our regression estimates for clustering by model.
The following subsections consider alternative parameterizations of the economic depreciation function.

Age-Related Depreciation

To model age-related depreciation, we consider several functional forms for how the value of $q_{t,t-v}$ depends on the age of the computer. In the most general formulation, we allow depreciation to be a general function of the age v of the computer. To do so, consider the relationship

(8) $\log(q_{t,t-v}) = \alpha_0 + \sum_{v=1}^{V-1} \tilde{\alpha}_v \tilde{D}_v + \varepsilon,$

where $\alpha_0$ is instantaneous (time-zero) depreciation, $\tilde{\alpha}_v$ is the cumulative age-related depreciation as of age v, the $\tilde{D}_v$ are dummies that equal one for observations with age v and zero otherwise, and V is the maximum age of a piece of equipment in our sample. The variable $\varepsilon$ is a mean-zero, idiosyncratic disturbance. The $\alpha$s in this regression correspond in most cases to the negative of the $\delta$s in the depreciation model of the theoretical section. In the estimates, we consider a different formulation. Let $D_v$ be a dummy variable that equals one for a piece of equipment of age v or greater and zero otherwise. We will estimate the relationship

(9) $\log(q_{t,t-v}) = \alpha_0 + \sum_{v=2}^{V} \alpha_v D_v + \varepsilon,$
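The relationship between the exact-age dummy specification (8) and the "age v or greater" specification (9) can be illustrated with a toy example. Because both are saturated in age, their fitted values are the age-specific means of log q, so the coefficients can be read off group means without running a regression. The observations below are invented, and the sketch assumes that age 1 is the category absorbed into the constant.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical (age, log q) observations, invented for illustration.
observations = [(1, -0.40), (1, -0.44), (2, -0.60), (2, -0.50),
                (3, -0.80), (3, -0.76), (4, -1.00)]

by_age = defaultdict(list)
for age, log_q in observations:
    by_age[age].append(log_q)
mean_log_q = {age: sum(v) / len(v) for age, v in sorted(by_age.items())}
ages = list(mean_log_q)

intercept = mean_log_q[ages[0]]   # age 1 is absorbed into the constant
# Exact-age dummies: each coefficient is the cumulative change relative to age 1.
cumulative = {a: mean_log_q[a] - intercept for a in ages[1:]}
# "Age v or greater" dummies: each coefficient is the change from age v-1 to v.
annual = {a: mean_log_q[a] - mean_log_q[prev] for prev, a in zip(ages, ages[1:])}

# Summing the annual coefficients recovers the cumulative ones.
rebuilt = dict(zip(ages[1:], accumulate(annual.values())))
print(cumulative)
print(annual)
```

The second parameterization is convenient because each coefficient is directly an annual rate of depreciation rather than a cumulative one.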
where the $\alpha_v$ are the annual rates of depreciation between ages $v-1$ and v. Note that equation (9) fits the data identically to equation (8), but is easier to compare with annual estimates of depreciation. There are a few observations in our sample where the year when new is the same as the year when sold. We have coded these as one-year-old pieces of equipment, so that age 1 depreciation is not separately identified from $\alpha_0$. We also consider the restriction that the annual rate of depreciation is constant or a polynomial function of age. Specifically, we estimate

(10) $\log(q_{t,t-v}) = \alpha_0 + \sum_{k=1}^{K} \alpha_k v^k + \varepsilon,$

where K is the order of the polynomial, $v^k$ is the kth power of age, and the $\alpha_k$ are parameters. With K equal to 1, we have the standard case of constant geometric depreciation, that is, that the annual coefficients $\alpha_v$ in equation (9) are equal. This specification identifies age-related depreciation for all ages.

Obsolescence

We then generalize equations (8) and (9) to allow for shifters of the discount for used capital relative to new capital. Ramey and Shapiro (2001) emphasize how specificity of capital can lead to such discounts over and above physical depreciation. Personal computers are highly fungible
across industry and activity, so these considerations seem very unlikely to be relevant. Computers may, however, be less fungible over time. As discussed in the introduction, computers can lose productivity because of changes in technology. This reduction in productivity relates not to the physical operation of the computer, but to its interoperability with other computers or with current software. Even before the Internet, such network economies related to common software, media, and data formats drove much of the value of computers. Increasing incompatibility with the current state of technology is a source of obsolescence and therefore of user cost that is not well modeled either as physical deterioration or as a reduction in the price of new computers owing to the decline in production costs of delivering computing power. This paper will attempt to measure obsolescence and quantify its role in the user cost of computers. Specifically, we augment equation (9) or (10) as

(11) $\log(q_{t,t-v}) = f(\text{age}) + X\beta + \varepsilon,$
where f(age) is the dummy-variable or polynomial function of age discussed above, X is a vector of indicators of obsolescence, and $\beta$ is a vector of parameters to be estimated. To indicate obsolescence, we consider how the attributes of the used computer (the speed of its CPU, the amount of its RAM, and the size of its hard disk) stand in relationship to attributes of current new computers. Again, current software and operating systems are designed to make use of the power and capacity of current new computers. Often, the owner of an older computer does not have the choice of running new software. The decline of replacement cost of hardware helps drive the development of new software. Incompatibilities with such new software accelerate the obsolescence of older computers. Recall that $q_{t,t-v}$ controls for the direct effect of the change in technology on the cost of production of computers. Hence, X accounts for further effects of technology on the value of used computers that are no longer on the technological frontier.

The specific measures we consider are the deviation of the logarithm of the computer's speed, RAM, or disk size from the median log speed, RAM, or disk size of current new computers. We also consider a composite of these measures, defined as a weighted sum, where the weights are the coefficients of the attributes in a hedonic regression of new computer prices. The estimates of these hedonic coefficients are given in table 5.4. Table 5.5 reports the mean values of these measures of obsolescence by year sold and age when sold.⁸

8. We have few observations on new computers produced in 2001 because very few were resold within the year. The median RAM of the 2001 computers that we observe actually fell from 2000. For this calculation only, we recode the median RAM in 2001 to equal 128MB, the same value as in 2000. For all earlier years, there are sufficient observations of new computers in our sample to get reliable estimates. Note that the observations of new computers are perhaps not a representative sample. To get into the sample as a new computer, the computer must have a resale price. If prices of computers for which there is a secondary market differ systematically from the representative new computer, this feature of the data set leads to a potential source of bias.

Table 5.5 Attributes of used computers relative to current new models (deviation from median attribute of new computers)

| Year sold | Composite | RAM | Speed | Hard disk | CPU lag |
| --- | --- | --- | --- | --- | --- |
| 1990 | 0.506 | 1.301 | 0.290 | 0.483 | 1.2 |
| 1992 | 0.425 | 0.785 | 0.471 | 0.281 | 1.0 |
| 1993 | 0.637 | 1.041 | 0.656 | 0.845 | 1.6 |
| 1995 | 1.109 | 1.477 | 1.189 | 1.988 | 1.4 |
| 1996 | 1.018 | 1.164 | 1.188 | 1.887 | 1.1 |
| 1997 | 1.238 | 1.658 | 1.380 | 2.025 | 0.9 |
| 1998 | 1.131 | 1.228 | 1.398 | 1.975 | 1.6 |
| 1999 | 1.399 | 2.039 | 1.440 | 2.354 | 2.1 |
| 2000 | 1.077 | 1.497 | 1.279 | 1.403 | 1.6 |
| 2001 | 1.020 | 1.228 | 1.094 | 2.082 | 1.3 |

| Age sold | Composite | RAM | Speed | Hard disk | CPU lag |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.332 | 0.478 | 0.357 | 0.521 | 0.7 |
| 2 | 0.606 | 0.826 | 0.678 | 0.954 | 1.0 |
| 3 | 0.930 | 1.205 | 1.030 | 1.622 | 1.3 |
| 4 | 1.209 | 1.566 | 1.351 | 2.078 | 1.6 |
| 5 | 1.520 | 1.958 | 1.704 | 2.610 | 1.8 |
| 6 | 1.765 | 2.278 | 1.986 | 3.003 | 2.2 |
| 7 | 2.133 | 2.765 | 2.386 | 3.644 | 2.6 |
| 8 | 2.313 | 3.005 | 2.587 | 3.949 | 2.8 |
| 9 | 2.401 | 3.105 | 2.684 | 4.126 | 2.9 |
| 10 | 2.512 | 3.512 | 2.733 | 4.040 | 3.0 |
| 11 | 2.583 | 3.665 | 2.758 | 4.215 | 3.1 |
| 12 | 2.818 | 4.047 | 2.999 | 4.537 | 3.5 |
| 13 | 3.125 | 4.545 | 3.309 | 4.972 | 3.7 |
| 14 | 3.151 | 4.135 | 3.624 | 4.965 | 4.0 |

Notes: The first four columns report means of log deviation of the attribute of the used computer relative to the median log attribute for the new computer in the year when sold. Composite is the weighted average of Hard disk, Speed, and RAM, with the weights taken from their coefficients in the hedonic regression reported in table 5.4. The CPU lag is the number of generations the CPU of the used computer is behind the latest CPU being marketed when the used computer is sold. Generations are defined as 80286, 80386, 80486, Pentium I, Pentium II, and Pentium III or IV.

Table 5.5 shows how rapidly attributes of computers get out of date. At an age of one year, the RAM of a used computer is 48 percent below the median RAM of a new computer, its speed is 36 percent slower, and its hard disk is 52 percent smaller. The value metric of the composite attribute shows a 33 percent decline. For older ages, the decline is rapid and continues for all but the oldest ages.
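The construction of these obsolescence measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hedonic weights, the attribute values, and the function names below are hypothetical, since the actual weights come from table 5.4, which is not reproduced in this section. The sign convention follows the table notes, with the deviation positive when the used computer is behind the median new computer.

```python
import math
from statistics import median

# Hypothetical hedonic weights (assumption); the paper uses the coefficients
# from the hedonic regression in its table 5.4.
HEDONIC_WEIGHTS = {"speed": 0.5, "ram": 0.3, "disk": 0.2}


def log_deviations(used, new_computers):
    """Median log attribute of current new computers minus the log attribute
    of the used computer, for each attribute."""
    deviations = {}
    for attr in HEDONIC_WEIGHTS:
        median_log = median(math.log(c[attr]) for c in new_computers)
        deviations[attr] = median_log - math.log(used[attr])
    return deviations


def composite_deviation(deviations, weights=HEDONIC_WEIGHTS):
    """Weighted sum of the attribute deviations."""
    return sum(weights[attr] * deviations[attr] for attr in weights)


# Made-up data: three new models in the year of sale and one used computer
# whose attributes are each half the median of the new models.
new_models = [
    {"speed": 400.0, "ram": 64.0, "disk": 8.0},
    {"speed": 450.0, "ram": 128.0, "disk": 10.0},
    {"speed": 500.0, "ram": 128.0, "disk": 12.0},
]
used_pc = {"speed": 225.0, "ram": 64.0, "disk": 5.0}

devs = log_deviations(used_pc, new_models)
composite = composite_deviation(devs)
print(devs, composite)
```

Because every attribute of the made-up used computer is half the corresponding median and the illustrative weights sum to one, the composite equals log 2 in this example.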
The other measure of distance of the used computer from the current technological frontier is how many generations its CPU is behind the generation of the best CPU available in new computers. We classify CPUs according to the six generations discussed previously in the specification of the hedonic model. The last column of table 5.5 reports the average number of generations a used computer is behind the frontier by year sold and by age. There is an upward trend in the number of generations the CPU is behind with year sold. In earlier years, there are fewer generations of CPUs available. With age sold, the number of generations behind increases from just under one, on average, for one-year-old computers to four generations for the oldest ones.

5.4 Estimates of Depreciation

In this section we present estimates of the depreciation of personal computers based on estimating how resale price falls as a function of the age of the computer (equations [9] and [10]) and how this function shifts when the controls for obsolescence are included (equation [11]).

5.4.1 Age-Related Depreciation

Table 5.6 reports estimates of age-related depreciation for the dummy variable specification (9) and the polynomial specification (10). The left-hand-side variable is the logarithm of nominal q. The right-hand-side variables include the change in the new price index. In columns (1) through (6), the coefficient of the price index is constrained to equal 1, so the regressions have implicitly the log of q on the left-hand side and no price variable on the right-hand side. The last two columns relax this restriction. Table 5.6 presents estimates using two measures of the price of new computers, the BEA deflator for computers, denoted $P_t^{BEA}$, and the hedonic price index reported in table 5.4, denoted $P_t^{HED}$. The index $P_t^{BEA}$ is a function only of year, while $P_t^{HED}$ is a function of year and the attributes of the computer included in the hedonic equation, so it is more closely matched to the specific computers in our sample than the BEA index.

Table 5.6 Explaining the resale price of computers, by age. Dependent variable: $\log(q^{NOM}_{t,t-v}) = \log(P^{U}_{t,t-v}/P^{N}_{t-v})$

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age 2 | –0.287 (0.027) | | | –0.206 (0.024) | | | | |
| Age 3 | –0.261 (0.038) | | | –0.170 (0.035) | | | | |
| Age 4 | –0.090 (0.043) | | | –0.077 (0.042) | | | | |
| Age 5 | –0.191 (0.047) | | | –0.044 (0.048) | | | | |
| Age 6 | –0.341 (0.054) | | | –0.207 (0.054) | | | | |
| Age 7 | –0.423 (0.055) | | | –0.299 (0.053) | | | | |
| Age 8 | –0.247 (0.066) | | | –0.155 (0.067) | | | | |
| Age 9 | –0.227 (0.067) | | | –0.036 (0.068) | | | | |
| Age 10 | –0.306 (0.177) | | | 0.148 (0.147) | | | | |
| Age 11 | 0.354 (0.209) | | | 0.335 (0.174) | | | | |
| Age 12 | –0.498 (0.194) | | | –0.408 (0.156) | | | | |
| Age 13 | –0.195 (0.232) | | | –0.128 (0.161) | | | | |
| Age 14 | 0.902 (0.260) | | | 0.557 (0.235) | | | | |
| Age | | –0.244 (0.008) | –0.242 (0.020) | | –0.134 (0.009) | –0.200 (0.019) | –0.074 (0.030) | –0.133 (0.035) |
| Age² | | | –0.000 (0.002) | | | 0.007 (0.002) | –0.005 (0.002) | 0.006 (0.002) |
| $\log(P_t^{BEA}/P_{t-v}^{BEA})$ | 1.0 | 1.0 | 1.0 | | | | 1.465 (0.066) | |
| $\log(P_t^{HED}/P_{t-v}^{HED})$ | | | | 1.0 | 1.0 | 1.0 | | 1.164 (0.075) |
| No CRT | –0.425 (0.030) | –0.412 (0.029) | –0.412 (0.029) | –0.368 (0.030) | –0.377 (0.030) | –0.360 (0.030) | –0.398 (0.029) | –0.346 (0.031) |
| Constant | –0.180 (0.020) | –0.052 (0.022) | –0.054 (0.032) | –0.225 (0.026) | –0.138 (0.028) | –0.038 (0.035) | –0.081 (0.032) | –0.045 (0.037) |
| R² | 0.60 | 0.59 | 0.59 | 0.36 | 0.34 | 0.35 | 0.86 | 0.85 |
| SEE | 0.57 | 0.58 | 0.58 | 0.58 | 0.59 | 0.58 | 0.57 | 0.58 |

Notes: Dependent variable is the log of the ratio of the used to new price. $P_t^{BEA}$ and $P_t^{HED}$ are the BEA's and the authors' hedonic price indexes for new computers. In columns (1) through (6), the price index for new computers is included in the regression with a coefficient of 1. Taking the price index to the left-hand side of the equation makes the dependent variable the log of q. Standard errors corrected for heteroscedasticity and clustering by model are in parentheses. Age dummies are defined so the coefficient is the annual age-related depreciation (log difference). Age (v) is measured in years. No CRT is a dummy for the used computer being sold without a monitor. No. of observations = 3,112.

The first column of table 5.6 reports the age-dummy estimates of age-related depreciation using the BEA deflator to measure price change on new computers. All specifications also include a dummy variable that is 1 if the used computer is sold without a cathode ray tube monitor (CRT). The CRT represents a substantial fraction, approximately one-quarter, of the value.⁹ The constant of –0.280 indicates instantaneous depreciation of more than 28 percent. Depreciation is 29 percent in year two, 26 percent in year three, 9 percent in year four, and 19 percent in year five. (These rates are measured as log differences. In tables 5.9, 5.10, and 5.11, we convert the estimates to levels and compute percent changes.) Later years have lower rates, on average, and they are more variable. Though the restriction that the annual rates of depreciation are equal, which is imposed in column (2), is rejected at any standard level of statistical significance, there is only a negligible reduction in R² from imposing the restriction. Allowing a quadratic term in column (3) yields a significant coefficient but adds only modestly to the goodness of fit. The annual rate of depreciation of 24 percent is very high, much higher than is plausible for physical deterioration. Together with the constant of 41 percent, these estimates fit the fact that computers lose over half their value over the first two years of their lives.

9. The value of the CRT enters multiplicatively in the specification. An alternative would be to enter it additively, though that would be more awkward econometrically. We do not know anything about the quality or value of the original CRT. Our specification assumes that it varies proportionately with the value of the computer.

The estimates in columns (4), (5), and (6) based on our hedonic price index yield similar patterns of age-related depreciation but point estimates that correspond to about half the annual rate of depreciation.¹⁰ The average geometric rate of depreciation is 13 percent based on the estimates with our hedonic price index, which, though lower than the estimates based on the BEA price index, is still very high relative to our priors about the deterioration of computers.

The last two columns present an informal specification check. We relax the restriction that the coefficient is 1 for each of the measures of new computer price change. For the BEA index, the unrestricted coefficient jumps to nearly 1.5 with a substantial downward effect on the annual rate of depreciation. (With a bigger coefficient on price change, which increases with age, the coefficient of age takes on a lower value.)
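Since the rates in this subsection are reported as log differences, it can help to see what they imply in levels. The following minimal sketch converts the linear age coefficients from table 5.6, columns (2) and (5), into percent declines per year; the function name is ours, and the conversion is the standard one, not a calculation taken from the paper's own tables.

```python
import math


def log_change_to_percent_decline(log_change):
    """Convert a log difference (e.g. -0.244) into a percent decline in levels."""
    return 100.0 * (1.0 - math.exp(log_change))


# Linear age coefficients from table 5.6, columns (2) and (5).
age_coefficients = {"BEA deflator": -0.244, "hedonic index": -0.134}

for label, coef in age_coefficients.items():
    pct = log_change_to_percent_decline(coef)
    print(f"{label}: {coef:+.3f} log points per year = {pct:.1f} percent decline per year")
```

For rates of this size the level decline is noticeably smaller than the log difference, which is one reason the later tables report exponentiated values.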
Though the coefficient of price change differs from 1 when price change is measured with our hedonic index in column (8), it is much closer to 1. Consequently, the annual rate of depreciation is affected less. We use our hedonic price index in the remainder of our estimation. We will return to the specification test based on inclusion of the price index in the estimation equation once we have considered the measures of obsolescence.

10. The R² falls because the implicit left-hand-side variable is more variable.

5.4.2 Obsolescence

As discussed previously, an advantage of our data set is its rich detail about the characteristics of the computers sold. By including measures of computer characteristics in our specifications, we can separate the effects of age, time, and obsolescence. We explore the effect of several measures of obsolescence of personal computers on the discounts of used computers relative to their reflated acquisition cost. These results are reported in table 5.7. The first column of table 5.7 includes the quadratic in age specification from table 5.6, column (6), for reference.
Table 5.7 Explaining the resale price of computers, by age and obsolescence. Dependent variable: $\log(q^{NOM}_{t,t-v}) = \log(P^{U}_{t,t-v}/P^{N}_{t-v})$

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Age | –0.200 (0.019) | –0.001 (0.022) | –0.021 (0.023) | –0.093 (0.020) | –0.037 (0.022) | –0.060 (0.023) | –0.105 (0.034) |
| Age² | 0.007 (0.002) | 0.001 (0.002) | 0.002 (0.002) | 0.008 (0.002) | 0.006 (0.001) | 0.007 (0.002) | 0.008 (0.002) |
| Hard disk deviation | | –0.055 (0.022) | | | –0.016 (0.021) | | |
| RAM deviation | | –0.041 (0.031) | | | 0.064 (0.028) | | |
| CPU speed deviation | | –0.383 (0.058) | | | –0.208 (0.058) | | |
| Composite deviation | | | –0.538 (0.046) | | | –0.145 (0.057) | –0.145 (0.056) |
| CPU lag 1 | | | | 0.014 (0.030) | 0.035 (0.030) | 0.045 (0.032) | 0.014 (0.032) |
| CPU lag 2 | | | | –0.459 (0.045) | –0.400 (0.053) | –0.374 (0.054) | –0.424 (0.057) |
| CPU lag 3 | | | | –1.204 (0.072) | –1.052 (0.092) | –1.043 (0.091) | –1.107 (0.096) |
| CPU lag 4 | | | | –1.240 (0.111) | –1.057 (0.119) | –1.029 (0.116) | –1.139 (0.135) |
| $\log(P_t^{HED}/P_{t-v}^{HED})$ | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.871 (0.065) |
| No CRT | –0.360 (0.030) | –0.229 (0.034) | –0.231 (0.035) | –0.270 (0.029) | –0.248 (0.030) | –0.255 (0.031) | –0.253 (0.031) |
| Constant | –0.038 (0.035) | –0.138 (0.037) | –0.121 (0.040) | –0.217 (0.031) | –0.250 (0.030) | –0.233 (0.032) | –0.221 (0.032) |
| R² | 0.35 | 0.42 | 0.42 | 0.49 | 0.50 | 0.49 | 0.49 |
| SEE | 0.58 | 0.55 | 0.55 | 0.52 | 0.51 | 0.51 | 0.51 |

Notes: Dependent variable is the log of the ratio of the used to new price. $P_t^{HED}$ is the authors' hedonic price index for new computers. In columns (1) through (6), the price index for new computers is included in the regression with a coefficient of 1. Taking the price index to the left-hand side of the equation makes the dependent variable the log of q. Standard errors corrected for heteroscedasticity and clustering by model are in parentheses. Age (v) is measured in years. Hard disk, RAM, and CPU speed deviation are the median value of those variables for the year when the used computer is sold minus the value of those variables for the used computer. The composite deviation is the weighted value of those variables using the hedonic coefficients reported in table 5.4 as weights. The CPU lag variables are dummies for the number of generations the CPU of the used computer is behind the most recent CPU in production when the used computer is sold. The generations are defined as 80286, 80386, 80486, Pentium I, Pentium II, and Pentium III or IV. No CRT is a dummy for the used computer being sold without a monitor. No. of observations = 3,112.
Obsolescence of Attributes

Columns (2) and (3) of table 5.7 present estimates for the determination of depreciation for used computers by age and by these measures of obsolescence, measured as the deviation of the attribute of the computer from the median new computer at the time when it was sold used (see the preceding discussion). Column (2) includes the speed, RAM, and hard disk measures separately. Column (3) includes the composite measure based on the weighted sum of the three separate measures. In column (2), the obsolescence measures based on the individual attributes have jointly significant incremental explanatory power and, except for RAM, are individually statistically significant. The obsolescence of the attributes has negative effects on depreciation. The effects of obsolescence can be quantitatively significant. For example, having a CPU that is half the median speed lowers the value of the used computer by 19 percent, all other things equal. Imposing the restriction that these measures enter as the weighted sum in column (3) has only a negligible effect on the fit. The estimated effect of obsolescence on depreciation is substantial. The average two-year-old computer has a composite obsolescence of 0.61, that is, the hedonic value of the hard disk/speed/RAM bundle has fallen by 61 percent (see table 5.5). Multiplying this amount by the coefficient of –0.538 yields a predicted depreciation from obsolescence of 33 percent.

Controlling for these measures of obsolescence has a substantial effect on the estimates of the age-related depreciation, whether entered individually in column (2) or as a composite in column (3). Age-related depreciation is estimated to be small and insignificant. Hence, controlling for obsolescence essentially eliminates the age-related component of depreciation.

Distance of Used CPU from Frontier

We construct a variable CPU lag that indicates how many generations the computer's CPU is behind the frontier. If the computer has the latest CPU, then this variable is equal to 0. The number of generations the CPU is behind the frontier is broadly a measure of incompatibility of an existing computer with current software and operating systems. Table 5.7, column (4) reports estimates for CPU lag entered as dummy variables for lagging one to four generations.
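The CPU lag variable and its dummy coding can be sketched as follows. This is a minimal illustration, not the authors' code: the generation list follows the six generations named in the table notes, while the function names and the example inputs are hypothetical.

```python
# CPU generations in chronological order, as listed in the table notes.
GENERATIONS = ["80286", "80386", "80486", "Pentium I", "Pentium II", "Pentium III/IV"]


def cpu_lag(cpu, frontier_cpu):
    """Number of generations the used computer's CPU is behind the latest
    generation marketed when the used computer is sold."""
    lag = GENERATIONS.index(frontier_cpu) - GENERATIONS.index(cpu)
    if lag < 0:
        raise ValueError("a used CPU cannot be ahead of the frontier")
    return lag


def lag_dummies(lag, max_lag=4):
    """Dummy variables for lags 1..max_lag; a lag of 0 sets every dummy to 0."""
    return [1 if lag == k else 0 for k in range(1, max_lag + 1)]


# Hypothetical example: an 80486 machine sold when the Pentium II is the frontier.
example_lag = cpu_lag("80486", "Pentium II")
example_dummies = lag_dummies(example_lag)
print(example_lag, example_dummies)
```

Entering the lag as separate dummies rather than as a single count lets each additional generation have its own effect on value, which is how column (4) of table 5.7 is specified.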
The zero lag is the omitted category; there are no computers in our sample sold with a lag of five generations. This set of variables has more explanatory power than the measures of attributes. A lag of 1 has little effect on value, a lag of 2 reduces it by almost half, and a lag of 3 or 4 eliminates most of the value. Columns (5) and (6) of table 5.7 present estimates where both CPU lag and the measures of obsolescence of the CPU speed, RAM, and hard disk are included as explanatory variables. Though the speed variable remains significant, as does the composite attribute, these variables add little to the explanatory power of the CPU lag dummies. The coefficient of RAM has the "wrong" sign, but given its interaction with other factors such as CPU generation, this coefficient should not be overinterpreted.

5.4.3 Specification Test

Recall that in the last two columns of table 5.6, when the restriction that the new price change has a unit coefficient is relaxed, the restriction is rejected. If the measures of age and obsolescence are correctly accounting for depreciation, then the new price change should have unit effect on the nominal used-new price ratio. Table 5.7, column (7) reports an estimate that allows us to test this restriction. When the coefficient of $\log(P_t^{HED}/P_{t-v}^{HED})$ is freely estimated, it is 0.871, with a standard error of 0.05, close to the theoretically mandated value of 1. Hence, the attributes we include in the equation are appropriately controlling for change in value insofar as they are correlated with the change in replacement cost measured by the deflator.

5.4.4 Age-Related Depreciation Revisited

Once obsolescence is taken into account, age-related depreciation of the personal computers that were resold is negligible. This result does not mean that the effect of age on the entire cohort of computers is negligible. By concentrating only on those computers that are resold, we neglect the effect of age on the probability of the computer being scrapped. To see how we should amend our estimates, consider the equation that relates the average price of computers in the entire cohort to those that are sold used,

$$\bar{p}(v) = S(v)\,p(v) + [1 - S(v)] \cdot 0, \tag{12}$$

where $\bar{p}(v)$ is the average price of the entire cohort at age v, p(v) is the price observed in the used computer market, and S(v) is the probability of survival to age v. It is assumed that scrapped computers have zero value. Taking logarithms of equation (12) and then differentiating with respect to age, we obtain

$$\frac{\partial \ln[\bar{p}(v)]}{\partial v} = \frac{\partial \ln[p(v)]}{\partial v} + \frac{\partial \ln[S(v)]}{\partial v}. \tag{13}$$

The first term on the right-hand side comes directly from our estimate of the effects of age on used price. The second term is the percent change in the probability of survival with respect to age. As Oliner (1996) points out, it is the negative of the hazard rate of scrappage with respect to age.

The survival function for computers depends on both physical failure and technological obsolescence. Physical failure is only a small part of scrappage. Electronic components are typically assumed to follow a "bathtub" model of reliability, which is well approximated by a Weibull distribution (Wilkins 2002). A few months of decreasing failure rates are followed by several years of low, constant failure rates, with increasing failure rates during the end-of-life, wear-out phase. The most important factor driving scrappage is technological obsolescence. According to Matthews et al. (1997), the average life span of a personal computer is around five years. Our data suggest that the survival distribution must have a long tail because we have a number of much older computers in our sample. To capture the mean age of scrappage of five years as well as the long tail, we assume that the survival function is governed by a two-parameter Weibull distribution

$$f(v) = \frac{\alpha}{\beta}\left(\frac{v}{\beta}\right)^{\alpha-1} e^{-(v/\beta)^{\alpha}} \tag{14}$$

with shape parameter $\alpha = 2$ and scale parameter $\beta = 5.64$.¹¹ The implied hazard rates for this distribution are given by

$$h(v) = \frac{\alpha}{\beta}\left(\frac{v}{\beta}\right)^{\alpha-1}. \tag{15}$$

Table 5.8 Estimated scrappage rates for installed computers (hazard rates for Weibull distribution with $\alpha = 2$ and $\beta = 5.64$)

| Age | Hazard rate (%) |
| --- | --- |
| 1 | 6 |
| 2 | 13 |
| 3 | 19 |
| 4 | 25 |
| 5 | 31 |
| 10 | 57 |

Notes: Authors' calculations. These hazard rates are parameterized to match the scrappage rate of personal computers. They need to be added to the decline in value of installed computers to account fully for user cost. See section 5.4.4 for details.
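The Weibull hazard can be checked numerically with a short sketch. It assumes the standard two-parameter Weibull form and treats 2 as the shape and 5.64 as the scale; that reading is our assumption about which parameter is which, although it does give a mean life of about five years, as the text requires. The function names are ours.

```python
import math

SHAPE = 2.0   # assumed to be the Weibull shape parameter
SCALE = 5.64  # assumed to be the Weibull scale parameter, in years


def survival(v):
    """Probability that a computer survives to age v."""
    return math.exp(-((v / SCALE) ** SHAPE))


def hazard(v):
    """Instantaneous scrappage rate at age v (density divided by survival)."""
    return (SHAPE / SCALE) * (v / SCALE) ** (SHAPE - 1.0)


def mean_life():
    """Mean of the Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return SCALE * math.gamma(1.0 + 1.0 / SHAPE)


hazard_percent = {age: round(100.0 * hazard(age)) for age in (1, 2, 3, 4, 5)}
print(hazard_percent)
print(round(mean_life(), 2))
```

Under these assumptions the sketch reproduces the table 5.8 entries for ages one through five. By equation (13), the total rate of decline for the cohort at a given age is the age-related decline in used price plus this hazard rate.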
The hazard rates at different ages for this distribution are given in table 5.8. The numbers in table 5.8 give the rate at which surviving computers are scrapped as of various ages. In order to characterize the depreciation rates of the entire cohort of computers instead of just the installed computers on which we base our analysis, these numbers must be added to the depreciation rates estimated based on the sample of used equipment.

11. Doms et al. (2004) use a similar distribution. For a very useful discussion of the Weibull distribution, see http://www.weibull.com.

5.4.5 Premia for Oldest Computers

Though there are powerful factors pushing down the price of used computers, some aspects of our results indicate that older is not uniformly less valuable. First, the quadratic term in age-related depreciation is positive, so the rate of depreciation falls with age. In the next section, we will see that the quadratic term indeed dominates the negative linear term for the older vintages. Second, the coefficient of the CPU generation for lagging four generations is about the same as that for lagging three generations. Hence, for surviving older models, there are factors pushing against deterioration and obsolescence that add to value. Ramey and Shapiro (2001) found similarly that there was a premium for some very old machine tools that were no longer manufactured. This finding could be accounted for by survivorship bias in these very old models or the value of being able to run older applications.

5.5 Decomposing the Decline in Value of Used Computers

Using the estimates from the last section, we can now decompose the decline in computer value into its key components: (a) the change in price of new computers; (b) instantaneous depreciation; (c) age-related depreciation; and (d) obsolescence. Specifically, the decline in value of used computers can be decomposed as

$$\log(q^{NOM}_{t,t-v}) - \log\!\left(\frac{P^{HED}_{t}}{P^{HED}_{t-v}}\right) = \log(q_{t,t-v}) = (\delta_0 + \delta_v + \delta_s) + \varepsilon, \tag{16}$$
that is, instantaneous depreciation, age-related depreciation, obsolescence, and a residual. Tables 5.9, 5.10, and 5.11 summarize the key findings of the paper through this decomposition of user cost. For these tables, we convert the variables on the left-hand side and the fitted value on the right-hand side of equation (16) to levels by exponentiation, compute percent changes for individual observations, and then average. Tables 5.9, 5.10, and 5.11 then report the average cumulative or annualized percent change in value. The average exponentiated values differ from the exponentiation of the averages because of heterogeneity. The appendix tables give the averages in terms of logarithms, the units in which equation (16) is estimated.

Table 5.9 gives the cumulative and annualized values of the variable by age sold. The average nominal decline in value for a used computer relative to its nominal acquisition cost in our sample is 77 percent.¹² The decline in replacement cost is 66 percent, on average, over the interval between acquisition and sale, or 32 percent per year of age. The decline in q is 41 percent over this interval, or 18 percent per year.¹³ Hence, the decline in replacement cost looms very large in the loss in value of used computers.

12. The annual rate of decline per year of age is 45 percent. The average cumulative change in value is –2.19 in logarithms (see table 5A.1). Exp(–2.19) is 0.11, which corresponds to an 89 percent decline in value. The difference between the 77 percent value in table 5.9 and this value illustrates the importance of taking heterogeneity into account when taking averages.

13. The percent changes in table 5.9 are multiplicative and corrected for heterogeneity, so they do not add up either across columns or down rows. The log changes in the appendix tables are additive. Note that the first three columns in the appendix tables do not add to the last three columns because of the residual in equation (16) except for the averages.

Table 5.9 Explaining the price of used computers, by age

Cumulative decrease in value (%)

| Age sold (v) | Nominal q [$\log(q^{NOM}_{t,t-v})$] | New price [$\log(P_t^{HED}/P_{t-v}^{HED})$] | q [$\log(q_{t,t-v})$] | Depreciation: age-related ($\delta_v$) | Depreciation: age-zero ($\delta_0$) | Depreciation: obsolescence ($\delta_s$) |
| --- | --- | --- | --- | --- | --- | --- |
| Average | 77 | 66 | 41 | 7 | 21 | 28 |
| 1 | 48 | 32 | 23 | 5 | 21 | 4 |
| 2 | 69 | 54 | 33 | 9 | 21 | 12 |
| 3 | 82 | 70 | 41 | 11 | 21 | 20 |
| 4 | 87 | 78 | 44 | 12 | 21 | 30 |
| 5 | 91 | 85 | 44 | 11 | 21 | 39 |
| 6 | 95 | 90 | 52 | 9 | 21 | 52 |
| 7 | 98 | 94 | 65 | 6 | 21 | 62 |
| 8 | 99 | 95 | 71 | 1 | 21 | 68 |
| 9 | 99 | 97 | 71 | –5 | 21 | 70 |
| 10 | 99 | 98 | 67 | –14 | 21 | 68 |
| 11 | 99 | 98 | 60 | –25 | 21 | 70 |
| 12 | 100 | 98 | 74 | –39 | 21 | 76 |
| 13 | 100 | 99 | 79 | –57 | 21 | 77 |
| 14 | 99 | 99 | 62 | –80 | 21 | 77 |

Annual rate decrease in value per year of age (%)

| Age sold (v) | Nominal q | New price | q | Depreciation: age-related ($\delta_v$) | Depreciation: age-zero ($\delta_0$) | Depreciation: obsolescence ($\delta_s$) |
| --- | --- | --- | --- | --- | --- | --- |
| Average | 45 | 32 | 18 | 3 | 10 | 8 |
| 1 | 47 | 32 | 20 | 5 | 21 | 4 |
| 2 | 46 | 33 | 20 | 4 | 11 | 7 |
| 3 | 46 | 33 | 19 | 4 | 7 | 8 |
| 4 | 43 | 32 | 17 | 3 | 6 | 9 |
| 5 | 42 | 32 | 14 | 2 | 5 | 10 |
| 6 | 42 | 32 | 15 | 2 | 4 | 12 |
| 7 | 44 | 33 | 17 | 1 | 3 | 14 |
| 8 | 43 | 32 | 17 | 0 | 3 | 14 |
| 9 | 42 | 32 | 15 | –1 | 3 | 13 |
| 10 | 41 | 33 | 13 | –1 | 2 | 11 |
| 11 | 36 | 30 | 9 | –2 | 2 | 11 |
| 12 | 38 | 30 | 11 | –3 | 2 | 11 |
| 13 | 37 | 29 | 11 | –4 | 2 | 11 |
| 14 | 32 | 26 | 7 | –4 | 2 | 10 |

Notes: The percent decreases are calculated by exponentiating the logarithmic values and then averaging. The first panel contains averages of cumulative percent decreases. The second panel contains averages of percent decreases per year of age. The first column is change in nominal q, the second column is the change in the new price holding attributes constant, the third column is the change in q (the difference of the first two columns), and the last three columns give the estimates of components of depreciation from the regression in table 5.7, column (6).

Table 5.10 Explaining the price of used computers, by year

Cumulative decrease in value (%)

| Year sold | Nominal q [$\log(q^{NOM}_{t,t-v})$] | New price [$\log(P_t^{HED}/P_{t-v}^{HED})$] | q [$\log(q_{t,t-v})$] | Depreciation: age-related ($\delta_v$) | Depreciation: age-zero ($\delta_0$) | Depreciation: obsolescence ($\delta_s$) |
| --- | --- | --- | --- | --- | --- | --- |
| Average | 77 | 66 | 41 | 7 | 21 | 28 |
| 1990 | 41 | 38 | 4 | 9 | 21 | 11 |
| 1992 | 71 | 77 | –35 | 9 | 21 | 8 |
| 1993 | 79 | 64 | 40 | 8 | 21 | 24 |
| 1995 | 80 | 73 | 25 | 8 | 21 | 26 |
| 1996 | 79 | 66 | 37 | 8 | 21 | 19 |
| 1997 | 80 | 71 | 30 | 7 | 21 | 21 |
| 1998 | 76 | 63 | 39 | 6 | 21 | 27 |
| 1999 | 76 | 72 | 36 | 6 | 21 | 40 |
| 2000 | 81 | 67 | 57 | 8 | 21 | 30 |
| 2001 | 74 | 60 | 51 | 8 | 21 | 25 |

Annual rate decrease in value per year of age (%)

| Year sold | Nominal q | New price | q | Depreciation: age-related ($\delta_v$) | Depreciation: age-zero ($\delta_0$) | Depreciation: obsolescence ($\delta_s$) |
| --- | --- | --- | --- | --- | --- | --- |
| Average | 45 | 32 | 18 | 3 | 10 | 8 |
| 1990 | 20 | 19 | 1 | 4 | 11 | 3 |
| 1992 | 35 | 40 | –8 | 3 | 8 | 3 |
| 1993 | 54 | 38 | 26 | 4 | 12 | 11 |
| 1995 | 39 | 32 | 11 | 3 | 7 | 7 |
| 1996 | 41 | 29 | 16 | 3 | 8 | 5 |
| 1997 | 41 | 32 | 13 | 3 | 8 | 5 |
| 1998 | 41 | 28 | 18 | 3 | 10 | 6 |
| 1999 | 49 | 42 | 11 | 3 | 11 | 15 |
| 2000 | 53 | 35 | 28 | 3 | 10 | 9 |
| 2001 | 41 | 25 | 21 | 3 | 9 | 7 |

Note: See notes to table 5.9.

The last three columns of table 5.9 decompose the q, the discount of used price relative to new price adjusted for the change in replacement cost. On average, age-related depreciation $\delta_v$ is 7 percent cumulatively and 3 percent per year of age. This accords with the prior that deterioration of computers is negligible. Instantaneous depreciation of 21 percent is substantial but is much less than what is found for other capital goods. For example, Ramey and Shapiro (2001) find that the instantaneous discount for forklifts, the most fungible of the aerospace equipment they study, was about 40 percent. The majority of economic depreciation, 28 percent of value cumulatively or 8 percent per year of age, is attributable to our observed indicators of obsolescence. Recall that this loss of value owing to obsolescence of used computers is above and beyond the loss of value because of the decline in the replacement cost of a computer of constant quality, which is controlled for by the $\log(P_t^{HED}/P_{t-v}^{HED})$ term in equation (16) shown in the third column of table 5.9.

Table 5.9 also shows the user cost decomposition by age sold. Age-related depreciation depresses value in the first years of the computer's life by a modest amount. It accounts for approximately a 10 percent cumulative decline in value in the first two to three years of life. The rate of decline in value attributable solely to age then gets smaller. At high age, the quadratic term dominates, so the age-specific component adds to value. As noted previously, this finding can arise from selectivity or a premium (other things equal) for old models that can operate old software. Given the thinness of markets for very old computers, the results for the earlier ages where the linear term dominates are of greater interest and are also more reliable.

Table 5.11 Obsolescence ($\delta_s$), by year, age sold, and CPU type: Cumulative decrease in value (%)

Rows are age sold (1 to 14) in the top panel and CPU generation (80286, 80386, 80486, Pentium I, Pentium II, Pentium III/IV) in the bottom panel; columns are year sold. Entries are listed by year sold, in row order, for the cells that are populated.

| Year sold | By age sold | By CPU type |
| --- | --- | --- |
| 1990 | –2, 18, 3, 1, 42, 42 | 41, 1, –2 |
| 1992 | 0, 8, –1, 3, 12, 14, 43, 43 | 41, 1, –2 |
| 1993 | 10, 21, 31, 27, 42, 48, 41, 73, 73 | 72, 38, 1 |
| 1995 | 3, 6, 20, 27, 35, 38, 46, 53, 46, 75, 75 | 74, 44, 8, 5 |
| 1996 | 3, 7, 8, 20, 29, 36, 35, 48, 54, 48, 76, 76 | 75, 45, 9, 4 |
| 1997 | 6, 8, 12, 13, 26, 31, 39, 39, 51, 56, 51, 77, 77 | 76, 48, 14, 8 |
| 1998 | 0, 5, 7, 28, 41, 54, 58, 70, 67, 76, 76, 76, 77, 77 | 76, 74, 45, 4, 0 |
| 1999 | 10, 36, 43, 45, 62, 71, 75, 76, 76, 76, 78, 77, 78 | 76, 75, 43, 3 |
| 2000 | 0, 12, 36, 44, 45, 62, 71, 75, 76, 75 | 75, 75, 43, 3, 1 |
| 2001 | 2, 3, 14, 38, 45, 47, 62, 71, 76, 77 | 76, 44, 5, 3 |

Notes: Cumulative obsolescence (percent decrease) based on estimates from table 5.7, column (6). See also notes to table 5.9.

Obsolescence increases substantially with age. The rate of increase, shown in the lower panel of the table, increases for moderate age, but then levels off. Obsolescence accounts for most of the decline in q at all ages once instantaneous depreciation is taken into account. This finding is clearest in the logarithmic results shown in table 5A.1, which are additive across columns and down rows. For example, a five-year-old computer changes in value by –0.82 on log scale. Of this, –0.12 owes to age-related depreciation, –0.23 owes to instantaneous loss of value, and –0.55 owes to obsolescence.¹⁴

14. Note that these figures do not sum to –0.82 because of the residual in equation (16), which only averages to zero for the whole sample, not for a particular age.
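The additivity of the log decomposition for the five-year-old computer can be illustrated with a small sketch. The four log values are the ones quoted in the text; the variable names are ours, and the residual and the level equivalent are simple arithmetic on those quoted values rather than figures reported in the paper.

```python
import math

# Log-scale components for a five-year-old computer, as quoted in the text.
components = {
    "age_related": -0.12,
    "instantaneous": -0.23,
    "obsolescence": -0.55,
}
total_log_change = -0.82  # change in log q for a five-year-old computer

# In logs the fitted components add up; the gap to the total is the residual.
fitted_log_change = sum(components.values())
residual = total_log_change - fitted_log_change

# Converting the log change in q at the mean to a percent decline in levels.
level_decline_percent = 100.0 * (1.0 - math.exp(total_log_change))

print(round(fitted_log_change, 2), round(residual, 2))
print(round(level_decline_percent, 1))
```

Because the components are additive only in logs, the corresponding percent figures in table 5.9 are not expected to sum across columns.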
Finally, note again that results are for depreciation of installed computers. To obtain an estimate of total depreciation per period, the scrappage rates in table 5.8 need to be added to the rates in the bottom panel of table 5.9.

Table 5.10 shows the cumulative components of user cost by year sold. The corresponding results in logarithms are reported in table 5A.2. Our hedonic price index declines on average by 32 percent per year.¹⁵ Our estimate bounces around somewhat from year to year. There is no clear trend in the rate of change of the index. While the estimates of obsolescence also vary a lot from year to year, the rate of obsolescence increases somewhat over time. Recall that the coefficients of the measures of obsolescence are time-invariant, so this increase in obsolescence with time is coming from the declining relative attributes of used computers. Given that personal computers were relatively new products at the beginning of our sample, this pattern is not surprising and would not be expected to apply to mature products.

Table 5.11 takes a closer look at obsolescence by time, age, and CPU generation. The top panel shows how the increase in obsolescence with age gets more pronounced over time. The bottom panel shows how obsolescence increases over time within generation of CPU. There is a distinct pattern of slow aging of CPUs until the next generation is introduced and then higher rates of obsolescence within several years of the introduction. The 80286 had experienced substantial obsolescence at the beginning of the sample. (The 80386 was introduced in the mid-1980s.) The obsolescence of the 80386 becomes significant in 1993. The 80486 does not become significantly obsolete until 1998, while the Pentium I becomes significantly obsolete one year later in 1999. We do not discern a pattern of faster or slower obsolescence of computers over time.
15. The BEA price index for personal consumption of computers declined at an annual rate of 25 percent over the 1990 to 2001 time period. We do not have an explanation of the differences in these rates.

5.6 Conclusion

This paper has sought to provide a detailed answer to the question of why computers lose their economic value so quickly. In order to answer the question, we gathered data on the characteristics of over 3,000 computers, including the new and used price, detailed features of the computer, and age. By linking the ratio of the used and new price of the computer to observable characteristics, we were able to estimate key components of the user cost of installed computers.

The typical computer, when it is sold, has experienced about a 77 percent decline in value compared to its price when new. About half of this decline in value can be accounted for by the decline in replacement cost of computers of constant quality. That is, even if nothing intrinsic has happened to the computer, it can be replaced at much lower cost. What accounts for the remaining decline in the value of this computer? This paper shows that obsolescence accounts for most of the remaining decline. Though instantaneous depreciation is important (accounting for a 20 percent decline in the used price relative to the new price), age-related depreciation is small. By using a parsimonious set of variables to quantify obsolescence, we can account for the remaining quarter of the decline in the value of computers relative to when they are sold new, or over half of the drop in the q of a three-year-old computer. Without accounting for obsolescence, the estimated age-related depreciation is between 15 and 25 percent per year. Therefore, the standard procedure of attributing all age-related depreciation to deterioration can be seriously misleading.

The paper has identified the forward movement in the technological frontier as the source of obsolescence. The interactions of the improvements in hardware with the design of software magnify the effects of technological progress. The high rate of obsolescence during the period of study is in large part the outcome of the unique interaction of hardware and software in computers. During this time period, technological change in hardware manufacturing drastically lowered the cost of RAM, speed, and hard disk space. The lower cost by itself would not cause obsolescence. For example, the real price of new microwaves has also fallen over time, but obsolescence of existing microwaves has been minimal because the only network component of microwaves is electricity, which has not changed. In contrast, the lower cost of computer hardware spurred software designers to write more versatile programs that were more demanding on the hardware. The newer software does not run well on the limited capabilities of older machines.
Moreover, one cannot simply set up two or more older machines to achieve the same capabilities of a newer machine: if a program needs 400 MHz to run well, setting up two 200 MHz machines will not solve the problem. Without the decline in replacement cost, it is unlikely that obsolescence would have proceeded so quickly. If the rate of technological progress in the production of computer hardware slows down, one would expect the rate of obsolescence of used computers to decrease as well.
Appendix

Table 5A.1  Explaining the price of used computers, by age
Cumulative change in value (log levels)

Columns: Nominal q = log(q^NOM_{t,t–v}); New price = log(P^HED_t / P^HED_{t–v}); q = log(q_{t,t–v}). The last three columns are the depreciation components: Age-zero (–0), Obsolescence (–s), and Age-related (–v).

Age sold (v)  Nominal q New price      q  Age-zero  Obsolescence  Age-related
Average           –2.19     –1.46  –0.73     –0.23         –0.42        –0.08
1                 –0.70     –0.42  –0.29     –0.23         –0.05        –0.05
2                 –1.30     –0.81  –0.50     –0.23         –0.15        –0.09
3                 –1.91     –1.22  –0.69     –0.23         –0.26        –0.11
4                 –2.31     –1.53  –0.78     –0.23         –0.38        –0.12
5                 –2.76     –1.94  –0.82     –0.23         –0.55        –0.12
6                 –3.35     –2.33  –1.03     –0.23         –0.81        –0.10
7                 –4.11     –2.78  –1.33     –0.23         –1.07        –0.06
8                 –4.60     –3.10  –1.49     –0.23         –1.22        –0.01
9                 –5.02     –3.49  –1.53     –0.23         –1.27         0.05
10                –5.35     –3.97  –1.38     –0.23         –1.21         0.13
11                –4.93     –3.88  –1.05     –0.23         –1.27         0.22
12                –5.70     –4.25  –1.46     –0.23         –1.45         0.33
13                –6.09     –4.51  –1.58     –0.23         –1.49         0.45
14                –5.31     –4.28  –1.03     –0.23         –1.49         0.59

Note: See notes to table 5.9.
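The additive structure of the "Average" row in table 5A.1 can be checked directly. The following is a minimal Python sketch, not part of the authors' estimation: the variable names are hypothetical, the figures are the published two-decimal log levels, and the tolerance is an assumption chosen to absorb rounding. Shares of the log decline computed this way are not the same thing as the percentage shares quoted in the conclusion, which refer to declines in levels.

```python
# Average row of table 5A.1, in log levels (values copied from the table).
nominal_q = -2.19
new_price = -1.46
q = -0.73
depreciation = {"age-zero": -0.23, "obsolescence": -0.42, "age-related": -0.08}

# Assumed tolerance for rounding in the published two-decimal figures.
TOL = 0.015

# Nominal q splits into the new-price change plus q.
identity_nominal = abs((new_price + q) - nominal_q) < TOL
# q in turn splits into the three depreciation components.
identity_q = abs(sum(depreciation.values()) - q) < TOL

# Share of the total log decline attributed to each component.
components = {"new price": new_price, **depreciation}
shares = {name: value / nominal_q for name, value in components.items()}

print("nominal q = new price + q:", identity_nominal)
print("q = sum of depreciation terms:", identity_q)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the average log decline")
```

The same check applied to individual age rows leaves a small remainder in several rows, so the sketch is limited to the average row, where the published components add up within rounding.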
Table 5A.2  Explaining the price of used computers, by year
Cumulative change in value (log levels)

Columns: Nominal q = log(q^NOM_{t,t–v}); New price = log(P^HED_t / P^HED_{t–v}); q = log(q_{t,t–v}). The last three columns are the depreciation components: Age-zero (–0), Obsolescence (–s), and Age-related (–v).

Year sold     Nominal q New price      q  Age-zero  Obsolescence  Age-related
Average           –2.19     –1.46  –0.73     –0.23         –0.42        –0.08
1990              –0.57     –0.49  –0.08     –0.23         –0.15        –0.09
1992              –1.31     –1.55   0.24     –0.23         –0.10        –0.10
1993              –1.89     –1.27  –0.63     –0.23         –0.34        –0.08
1995              –2.01     –1.62  –0.39     –0.23         –0.35        –0.09
1996              –1.97     –1.41  –0.56     –0.23         –0.25        –0.08
1997              –2.08     –1.63  –0.45     –0.23         –0.27        –0.07
1998              –2.01     –1.45  –0.56     –0.23         –0.42        –0.06
1999              –2.38     –1.68  –0.71     –0.23         –0.65        –0.06
2000              –2.57     –1.46  –1.10     –0.23         –0.47        –0.08
2001              –2.16     –1.26  –0.90     –0.23         –0.38        –0.08

Note: See notes to table 5.9.
III
Quality Adjustment and Price Measurement Issues: Recent Developments

6
Downward Bias in the Most Important CPI Component: The Case of Rental Shelter, 1914–2003

Robert J. Gordon and Todd vanGoethem
6.1 Introduction

This paper develops new price indexes from a variety of sources to assess the hypothesis that the Consumer Price Index (CPI) for rental shelter housing has been biased downward for its entire history since 1914. Rental shelter housing is the most important single category of the CPI, especially for those years when rent data have been used to impute price changes for owner-occupied housing. If valid, the implications of the hypothesis of downward bias would carry over to the deflator for personal consumption expenditures (PCE) and, in the opposite direction, to historical measures of real PCE and real gross domestic product (GDP).[1]

The high-water mark of widespread belief in the pervasiveness of upward bias in the CPI may have been reached on December 4, 1996, the release date of the Boskin Commission Report.[2] Since then the Boskin conclusion has been tempered in at least three directions. First, the report itself was criticized for overstating the extent of upward quality-change bias for several products including the subject of this paper, rental shelter prices (Moulton and Moses 1997). Second, the report appeared in a period of rapid improvement in the CPI, particularly in its treatment of substitution bias, so that the current CPI is substantially less vulnerable to some of the Boskin Report’s criticisms. Third, there is increasing recognition that the Boskin results, which explicitly referred to the situation as of 1995–1996, may not be applicable to previous historical periods.

Robert J. Gordon is Stanley G. Harris Professor in the Social Sciences and professor of economics at Northwestern University, and a research associate of the National Bureau of Economic Research. Todd vanGoethem is associate consultant for Bain & Company, Inc. This research was supported by the National Science Foundation. The authors are grateful to Ted Crone and Leonard Nakamura of the Federal Reserve Bank of Philadelphia for providing their data from the American Housing Survey (AHS), which we have subsequently obtained and extended directly from the AHS source, and to Barbara Fraumeni for supplying numerous sources on Bureau of Economic Analysis (BEA) structures deflation methodology. Matt Scharf provided useful research assistance in the early stages of this research, Ike Song helped to update the final AHS results for publication, Ian Dew-Becker supplied crucial last-minute assistance, and Gail Mandelkern contributed greatly by her painstaking research on Evanston rent and house price indexes. In keeping with the inspiration of the conference where this paper was originally presented, we would not have known about one of our most crucial sources, the 1972 dissertation by Rafael Weston, if Zvi Griliches had not reported its existence thirty years ago.

1. Before 1983, the CPI employed its own idiosyncratic method for owner-occupied housing, while the PCE and GDP deflators used the CPI rental shelter index as the deflator for imputed rent on owner-occupied housing.

6.2 The Logical Case for Downward Bias

For historical analysis a basic point on the direction and magnitude of bias was made by Chuck Hulten (1997) in his discussion of William Nordhaus’s (1997) seminal paper on the history of the price of light. Hulten’s point implies that the CPI (linked to pre-1914 indexes developed by economic historians) could not logically have been upward biased by a significant amount over as long as two centuries. If the CPI had been biased upward by, say, 1.4 percent per year since 1800, as Nordhaus had speculated, then the implied standard of living of U.S. households in the year 1800, Hulten argued, would have been implausibly low. Picking up Hulten’s theme, and using the hypothetical upward bias rate of 1.4 percent per year, Gordon (2004) calculated that the median household in 1800 would have been able to buy only 1.3 pounds of potatoes per day, with nothing left over for clothing, shelter, or anything else. Extending the point back to the happy, well-fed, and clothed Dutch burghers depicted in the paintings of Pieter Bruegel the elder (1525–1569), the Nordhaus 1.4 percent bias would imply the purchase of only 0.8 ounces of potatoes per day, with nothing left over for apparel, shelter, or anything else.
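The force of Hulten's point comes from compounding. The Python sketch below is an illustration only, not the calculation in Hulten (1997) or Gordon (2004): it assumes continuous compounding and a span of 200 years (1800 to 2000), and the function name is hypothetical.

```python
import math


def cumulative_bias_factor(bias_pct_per_year: float, years: float) -> float:
    """Factor by which a price index is overstated after `years` of steady upward bias.

    Assumes the bias compounds continuously at a constant annual rate.
    """
    return math.exp(bias_pct_per_year / 100.0 * years)


# Assumed inputs: the 1.4 percent rate discussed in the text and a 200-year span.
factor = cumulative_bias_factor(1.4, 200)

print(f"Cumulative overstatement of the price level: {factor:.1f}x")
print(f"Implied 1800 real income as a fraction of the measured figure: {1 / factor:.3f}")
```

Under these assumptions the measured price level is overstated by a factor of roughly sixteen, so the bias-corrected standard of living in 1800 would be about one-sixteenth of the figure implied by the unadjusted index. That order of magnitude is what makes a constant upward bias over two centuries implausible.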
Thus, there is a logical case that, if there has been an upward bias in the CPI in recent decades, it must flatten out or even become negative at some point back in the depths of history. If we make the plausible assumption that the CPI for durable goods is upward biased for the entire twentieth century, as Gordon (1990) showed for the period 1947–1983, then some other major component of the CPI must have been downward biased. This paper assesses the extent of a downward bias for rental shelter housing, and a companion paper (Gordon 2004) examines new evidence showing a downward bias for apparel.[3] This set of research results finding upward bias for some products and downward bias for others echoes Jack Triplett’s (1971) perceptive suggestion more than three decades ago that the overall CPI bias could go either way because the bias has different signs for different products.

2. The Boskin conclusion was that, as of 1995–1996, the CPI was biased upward at a rate of 1.1 percent per year. Implicit in the report is the conclusion that prior to 1993 back to some unspecified date the bias rate was 1.4 percent per year. The Boskin Commission Report is listed in the references as Boskin et al. (1996).

3. This line of research awaits a study of the history of food prices, which is needed to complete the trilogy of necessities, food, clothing, and shelter, which together accounted for 79 percent of household expenditure for wage earners in 1918 (Brown 1994, table 3.9, 78).

6.3 Circumstantial Evidence of Downward Bias

We can compare the change in the CPI for shelter rent between the mid-1920s and the late 1990s with scattered pieces of evidence on rents and house prices. The large discrepancies revealed here could occur because of unmeasured CPI bias, unmeasured quality change, or differences in the evolution over time of shelter rent and house prices.

The ratio of the 1999 to 1925 value of the CPI for rental shelter is 177.5/34.6 on a base of 1982–1984 = 100, that is, a ratio of 5.1.[4] The ratio for nominal gross rent per rental unit for the same years is 19.6 (see table 6.1). The 1999-to-1925 ratio for the median price of existing single-family houses in Washington, D.C. is 22.5.[5] Amazingly close is the ratio for the same two years of nominal net residential capital stock per housing unit, 22.1.[6] These alternative indexes are all unadjusted for either inflation or quality change.

Brown’s (1994) detailed study of household expenditure patterns allows us to narrow the comparison to a particular type of household, the “wage earner” and the “salaried worker.” Here data can be used to compare 1988 with 1918, for which the CPI ratio is 5.9. For wage earners, the 1988-to-1918 ratio for rent excluding utilities is 29.1 and for rent including utilities is 25.4. For salaried workers, the ratio excluding fuel is 26.6 and including fuel is 22.9.[7]

For the 1999 to 1925 comparison, a ratio of 22 translates into an annual growth rate of 4.18 percent per year, while the CPI ratio of 5.1 translates into 2.20 percent per year, for a difference of 1.98 percent per year.
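The conversion from a long-period ratio to an annual growth rate can be reproduced with a short Python sketch. It assumes growth rates are computed as log differences divided by the number of years, which reproduces the rounded figures quoted above; the function name is hypothetical.

```python
import math


def annual_log_growth(ratio: float, years: int) -> float:
    """Average annual growth in percent implied by an end-to-start ratio."""
    return 100.0 * math.log(ratio) / years


years = 1999 - 1925  # 74-year interval used in the text

rent_and_house_growth = annual_log_growth(22.0, years)  # ratio of about 22
cpi_growth = annual_log_growth(5.1, years)              # CPI ratio of 5.1
gap = rent_and_house_growth - cpi_growth

print(f"Alternative measures: {rent_and_house_growth:.2f} percent per year")
print(f"CPI for rental shelter: {cpi_growth:.2f} percent per year")
print(f"Difference: {gap:.2f} percentage points per year")
```

The gap is an upper bound on downward CPI bias rather than an estimate of it, because it still includes whatever quality improvement occurred over the interval.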
This difference in growth rates overstates the amount of potential downward CPI bias by the annual growth rate in quality over the same interval. Here, the similarity of the rental and house price ratios is somewhat puzzling as we would expect that the quality of owner-occupied houses has increased substantially more than that of rental apartments. For instance, there has not been any appreciable increase in the size of apartments; the number of rooms in units rented by wage earners was 4.9 in 1918 and by all renters was 4.3 in 1988.[8]

4. For aggregate sources, see table 6.1.

5. For 1925, the median asking price of existing homes in Washington, D.C. was $7,809, Historical Statistics, series N149. For 1999, the median price was $176,500, Statistical Abstract (2000, table 1202, 716).

6. For 1925, the value of net residential wealth consisted of $51.1 billion of structures (excluding land), or an average of $2,621 per each of 19.5 million dwelling units, from Historical Statistics, series N133. For 1998, the value was $9,405 billion, or an average of $81,783 for each of roughly 115 million units, Statistical Abstract (2000, table 1222, 726).

7. For 1988, Brown (1994, table 3.6A, 62) lists annual per-household expenditures on “rent” and “fuel and light” separately for each earner type. Table 7.8A (392–93) lists “tenant rent” and table 7.9 (398) lists “Renter fuel” and “Renter utilities.” For 1918, see Brown (1994, table 3.6A, 62).

6.4 Why Rental Shelter Prices Represent an Appealing Research Topic

The circumstantial evidence reviewed in the preceding, implying a substantial downward bias in the CPI, is only the first of several reasons to place priority on the research topic of this paper. Second, rental shelter carries by far the largest weight in the CPI, especially when one recognizes that owner-occupied housing prices are proxied by the rental shelter index with a different set of weights. Third, rental units are less heterogeneous in size at any given time, are more homogeneous over time, and experience quality change along fewer dimensions than owner-occupied housing units.[9] Fourth, price changes on rental units are more homogeneous across space than for owner-occupied units.[10] Fifth, discussion of tenant rent is conceptually simpler than for owner-occupied housing, where issues of the effect of tax-deductible mortgages and capital gains are central to changes in the true user cost. Rent is not tax deductible and generates no capital gains. If changes in tax laws or capital gains affect the incentives of landlords to supply apartments, this would be reflected (perhaps after a long lag) in the cost of rental as measured by the CPI and any alternative price index.

Because of the importance of rental shelter prices in the CPI, any finding of a significant downward bias over a long period of time would have implications for the history of inflation, economic growth, and productivity change. Findings that the degree of bias differed across historical decades would imply accelerations or decelerations in economic growth that might be different than in the current official data. Evidence developed in this paper would need to be weighed against evidence of upward bias in some other categories, especially consumer durable goods, before a final verdict on the implications of historical CPI bias could be rendered.

8. Rooms per apartment for 1918 come from Brown (1994, table 3.6A, 62). For 1988, we take the average of the mean values for 1987 and 1989 from the American Housing Survey data summarized in table 6.2.

9. In 2001, 80 percent of rental units had between three and five rooms, whereas only 35 percent of owner-occupied units fell in this range. Fully 20 percent of owner-occupied units were in the top-end category of eight rooms, whereas only 2 percent of rental units fell into this top category. See Statistical Abstract (2002, table 937, 599). Over time, between 1960 and 2001 the average number of rooms per owner-occupied unit rose from 5.2 to 6.2, while the average number of rooms per rental unit increased only one-third as much, from 4.0 to 4.3 rooms. These are weighted averages of size distributions given in Statistical Abstract (1962, table 1253, 753), and Statistical Abstract (2002, table 937, 599). The comment about dimensions of quality change is discussed further in the following.

10. The startling dichotomy between selling prices of homes in coastal “glamour” cities compared to the rest of the United States is emphasized in Case and Shiller (2003). They contrast Boston, with a 9.1 percent annual rate of price increase during 1995–2002, with the mere 5.1 percent rate of increase in Milwaukee. For rental units, however, the differential is minuscule, admittedly over a different period of 1988–1997, with annual growth rates of rents of 3.3 percent for Milwaukee and 3.0 percent for Boston; see Goodman (2003, exhibit 1).

6.5 Contributions of This Paper

There are relatively few papers that study rental shelter prices using data external to the CPI, as contrasted to those studies that have examined behavior using the CPI data sample, that is, Randolph (1988). No paper covers our long historical period going back to 1914. Our paper is complementary to the recent pair of papers by Crone, Nakamura, and Voith (CNV; 2003a,b) and shares with Crone, Nakamura, and Voith (2003b) the development of hedonic price indexes for rental shelter based on data from the American Housing Survey (AHS) for the period after 1975, ending in 1999 for Crone, Nakamura, and Voith and in 2003 for this study. However, our research strategy differs from that of Crone, Nakamura, and Voith (2003b), who are primarily interested in issues of functional form, whereas we are mainly interested in quality change. Because there is much more quantitative information on quality change available after 1975 than before, and even more after 1985 than before, we take advantage of this data richness to measure the rate of quality change and its determinants. This then allows us to apply these rates of quality change based on good data to earlier periods when we have much less detailed evidence.

For the period 1930–1975 ours is the first published study to provide quantitative estimates of rental price and quality change, building on an unpublished dissertation by Weston (1972). We bridge the data gap between the end of Weston’s data in 1970 and the beginning of the AHS data in 1975 by estimating hedonic regression equations from micro Census of Housing data for the four years 1960, 1970, 1980, and 1990.
Our results are complementary to the pre-1975 bias estimates of Crone, Nakamura, and Voith (2003a), which unlike ours are not based on actual rental data but rather on a theoretical model of how particular deficiencies in CPI methodology translate into price index bias. Three types of data allow us to push the results back before 1930. First, we use the budget studies in Brown (1994) to create indexes of rent paid per room by different classes of tenants; this allows us to link rent per room in 1918 with selected subsequent years extending up to 1988. We also develop an informal analysis of quality change from comments and data in the Brown book. Second, we compile an alternative set of data on rent per household and per room from early NBER studies of national income and wealth, especially Grebler, Blank, and Winnick (1956), allowing us to go back to 1914 and before. Third, we report on alternative rental price indexes developed by Gordon and Mandelkern (2001) for Evanston, IL covering the period 1925–1999, based on newspaper listings, and in some cases tracking rent changes for apartments having the same street address.
6.6 Comparing the CPI with Gross Rents over a Near Century

Table 6.1 provides our first systematic look at the data. The CPI for rental shelter is available continuously for each year from 1913, and column (1) displays the CPI for each year when we have another index to compare to the CPI. Column (2) displays the implicit rent calculated from data in Grebler, Blank, and Winnick (GBW; 1956). While based on aggregate data, this source implies an average monthly rent of $19.23 in 1914, which is not far from the $20.67 for 1918 reported in column (7) from Brown’s (1994) research based on the Consumer Expenditure Survey (CES).

The next four columns are based on official government sources. The “Weston” column, (3), extracts mean rent from the Census of Housing for 1930 to 1970.[11] The next column, (4), labeled “CNV Median Gross Rent,” combines Census data through 1970 with AHS data beginning in 1977. The subsequent column (5) exhibits mean contract rent from the Census microdata files, and then column (6) presents the mean contract rent from the AHS data. Any differences between the CNV, Census, and AHS columns reflect the distinction between the median used by Crone, Nakamura, and Voith (2003) and the mean values used in our calculations from the original government sources. Column (7) extracts from Brown’s (1994) budget data the monthly cost of rent for “salaried workers” over the five years that she examines.

The index numbers in the top section of table 6.1 are translated into growth rates in the bottom section. Columns (8) and (9) in the bottom section show one or two differences for each interval, each defined as the growth rate of the CPI over that interval minus the growth rate of an alternative index displayed in the top part of the table. All eight of the growth rate comparisons show that the CPI grew more slowly than the comparison index, except for the Crone, Nakamura, and Voith (2003) version of the AHS index over the period 1985–1995.
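The translation from index levels to growth rates and differences can be sketched in Python. This is an illustrative reconstruction, not the authors' code: it assumes growth rates are annualized log differences, and it uses the CPI and AHS mean contract rent levels reported in columns (1) and (6) of table 6.1. The function and variable names are hypothetical.

```python
import math

# Levels copied from table 6.1: column (1) CPI for rent, column (6) AHS mean contract rent.
cpi = {1977: 64.8, 1985: 111.8, 1995: 157.8, 2003: 205.5}
ahs = {1977: 159.33, 1985: 314.50, 1995: 494.76, 2003: 683.18}


def annual_log_growth(series: dict, start: int, end: int) -> float:
    """Average annual growth in percent, computed as an annualized log difference."""
    return 100.0 * math.log(series[end] / series[start]) / (end - start)


differences = {}
for start, end in [(1977, 1985), (1985, 1995), (1995, 2003)]:
    g_cpi = annual_log_growth(cpi, start, end)
    g_ahs = annual_log_growth(ahs, start, end)
    # A negative difference means the CPI grew more slowly than mean rent.
    differences[(start, end)] = g_cpi - g_ahs
    print(f"{start}-{end}: CPI {g_cpi:.2f}, AHS {g_ahs:.2f}, "
          f"difference {differences[(start, end)]:.2f}")
```

Rounded to two decimals, the three differences match the entries in column (9) of the table for these intervals.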
From 1914 to 1985, most of the alternative indexes of mean or median rent grow about 2 percent per year faster than the CPI, and this is true of the Grebler, Blank, and Winnick and Brown indexes that cover the pre-1935 period. Over 1930–1970, the difference with the Weston-based data from the U.S. Census of Housing is also quite large (2.12 percent per year), and this is identical to the difference with the median contract rent calculated by Crone, Nakamura, and Voith from the same Census of Housing data. The next line for 1960–1990 displays a difference between the CPI and Census of Housing mean rent at an annual rate of –2.03 percent, almost the same as the 1930–1970 difference. Finally, the next line for 1973–1988 displays the largest difference, that between the CPI and the Brown budget data of –3.10 percent per year.

The final three lines exhibit differences between the growth rates of the CPI and the AHS data, both as calculated by Crone, Nakamura, and Voith (2003) and in our study. In our calculation (column [9]), the difference in growth rates between the CPI and the AHS mean rent shrinks slowly from –1.68 percent per year in 1977–1985 to –1.08 percent per year in 1985–1995 to –0.73 percent per year in 1995–2003, whereas in the Crone, Nakamura, and Voith (2003) calculation (column [8]) the difference starts higher and ends lower.

11. The Census of Housing began in 1940, but Weston was able to infer similar data from the 1930 Census of Population.

Table 6.1  Alternative measures of monthly rental expenditure, 1914–2001

Columns:
(1) CPI for rent (1982–84 = 100)
(2) GBW mean gross rent
(3) Weston mean gross rent
(4) CNV median contract rent
(5) Census mean contract rent
(6) AHS mean contract rent
(7) Brown CES budget study rent, “salaried”
(8), (9) Differences (bottom panel only)
A period (.) marks a cell with no entry.

Year      (1)      (2)      (3)      (4)      (5)      (6)      (7)
1914     21.0    19.23        .        .        .        .        .
1918     21.5        .        .        .        .        .    20.67
1920     27.4    28.37        .        .        .        .        .
1925     34.6    33.91        .        .        .        .        .
1930     31.2    30.49    33.22    26.26        .        .        .
1935     21.4    28.44        .        .        .        .    30.17
1940     23.7        .    30.89    20.86        .        .        .
1950     29.7        .    46.08    35.09        .        .    41.00
1960     38.7        .    74.92    59.62    62.31        .        .
1970     46.5        .   115.80    91.65    98.95        .        .
1973     52.5        .        .        .        .        .   141.67
1975     58.0        .        .        .        .   135.20        .
1977     64.8        .        .   159.33        .   159.33        .
1979     74.3        .        .        .        .   188.97        .
1980     80.9        .        .        .   216.04        .        .
1981     87.9        .        .        .        .   241.57        .
1983    100.1        .        .        .        .   271.12        .
1985    111.8        .        .   337.98        .   314.50        .
1987    123.1        .        .        .        .   344.11        .
1988    127.8        .        .        .        .        .   549.25
1989    132.8        .        .        .        .   394.76        .
1990    138.4        .        .        .   410.03        .        .
1991    143.3        .        .        .        .   425.61        .
1993    150.3        .        .        .        .   456.02        .
1995    157.8        .        .   474.84        .   494.76        .
1997    166.7        .        .        .        .   509.52        .
1999    177.5        .        .   589.32        .   595.84        .
2001    192.1        .        .        .        .   639.27        .
2003    205.5        .        .        .        .   683.18        .

Growth rates

Period        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
1914–1935    0.01    1.86       .       .       .       .    2.22   –1.85   –2.21
1935–1973    2.36       .       .       .       .       .    4.07       .   –1.71
1930–1970    1.00       .    3.12    3.13       .       .       .   –2.12   –2.13
1960–1990    4.25       .       .       .    6.28       .       .   –2.03       .
1973–1988    5.93       .       .       .       .       .    9.03   –3.10       .
1977–1985    6.82       .       .    9.40       .    8.50       .   –2.58   –1.68
1985–1995    3.45       .       .    3.40       .    4.53       .    0.05   –1.08
1995–2003    3.30       .       .       .       .    4.03       .       .   –0.73

Sources:
Column (1): CPI for rent, 1982–1984 = 100, BLS web site, series CUUR0000SEHA, “U.S. City Average, Rent of Primary Residence, 1982–1984 = 100.”
Column (2): Grebler, Blank, and Winnick (1956). Total nominal expenditures on aggregate rental expenditures from table I-1 on page 407, averaged together as appropriate. For instance, 1914 is based on the line labeled “1909–19,” 1920 is the average of the lines labeled “1909–19” and “1919–29,” and so on. Number of nonfarm households from table 23 on page 82. Mean gross rent is aggregate rental expenditures divided by total nonfarm households.
Column (3): Weston (1972), mean rents calculated from tables 3-3 and 3-4. Table 3-3 contains the number of units cross-classified by type. Table 3-4 contains rents for each of the types. The mean rent was calculated by multiplying each cell from those tables to yield rental revenue, summed to equal total revenue in each year, and then divided by the total number of rental units.
Column (4): Calculated from Crone, Nakamura, and Voith (2003b, table 11), starting with our 1977 value in column (6) from the AHS data, and working forward and backward by converting the Crone, Nakamura, and Voith annualized growth rates into changes in levels using the exponential function. This conversion introduces an unknown degree of error, because the growth rates calculated by Crone, Nakamura, and Voith in their appendix 2 are not accurate calculations of compound growth rates using natural logs.
Column (5): Integrated Public Use Microdata series, University of Minnesota, http://www.ipums.org.
Column (6): Mean of all observations in AHS regression data. For issues involved in sources and manipulation of AHS regression data, see data appendix.
Column (7): Brown (1994), for the five years shown, the source tables and page numbers are 1918, table 3.6A, page 62; 1935, table 4.8, page 127; 1950, table 5.10, pages 212–13; 1973, table 6.8A, pages 294–95; 1988, table 7.8A, pages 392–93.

These “differences” do not, of course, provide any evidence of bias in the CPI as in principle the differences could be explained by quality change. Subsequently, we shall estimate hedonic price indexes for the 1975–2003 period that take account of those aspects of quality change that correspond to quality characteristics reported in the AHS data.

If we were to conjecture that quality change advanced at a steady pace over the twentieth century, then the differences reported in the bottom section of table 6.1 are intriguing. The differences were close to 2 percent per year over most of the period after 1930 and before 1989. The difference was minor during 1914–1930 in the first line and was relatively small for 1995–2003 in the last line. Obviously, a conclusion that quality change proceeded at a rate of 2 percent per year would explain the differences displayed in the bottom of table 6.1 and reject the hypothesis that the CPI for rental shelter is downward biased over the past century. A conclusion that quality change proceeded at a rate significantly slower than 2 percent per year, for example, 1.0 or 0.5 percent per year, would support the hypothesis that the CPI is downward biased by the gap between the difference shown at the bottom of table 6.1 and the calculated rate of quality improvement.

6.7 Conceptual Issues in the Development of Rental Price Indexes

The basic task of the CPI is to measure changes in the quality-adjusted price of a rental unit.
In December, 2002, the share of the total CPI allocated to the rent index was 31.4 percent, consisting of a 6.5 percent share for rent of primary residence, 22.2 percent rental equivalence for owneroccupied housing, and 2.7 percent for lodging away from home (Greenlees 2003, 1). The crucial point is that changes in tenant rent are imputed to owner-occupied housing by changing weights but not by creating a new and different index of the unique costs or benefits of owner occupancy. Thus, the CPI makes the implicit assumption that any benefits of tax deductions or capital gains to home owners are quickly reflected in rents as landlords in a hypothetically competitive rental market pass along their own changes in user cost to their tenants. Of course, this implicit CPI assumption is dubious. Economists have long recognized that rental prices are “sticky,” that is, slow to adjust. As documented by Genesove (1999), 29 percent of rental apartment units had no change in rent from one year to the next. Nominal rigidity was much higher among units where tenants continued from the previous year as
contrasted to units where the tenants changed. Genesove also finds that units in single-unit and small buildings were much more likely to display nominal rigidity. Because apartment rents are sticky, the underlying CPI assumption that apartment rents can be translated into owner occupancy costs is problematic. Fundamental changes that influence home ownership costs, for example, a reduction in interest rates that (as in 2001–2003) allowed many homeowners permanently to reduce their true home ownership cost, may be reflected in rental costs (and hence in the CPI) only after a long lag if at all. It is striking how many dimensions of the literature on house prices refer back to tenant rent as a baseline for analysis. A recent example is Bajari, Benkard, and Krainer (BBK; 2003, 3), who translate the dependence of house price indexes on rental equivalence as follows: Dougherty and Van Order (1982) were among the first to recognize that the user cost could be a good measure of inflation in the cost of housing services. They note that the user cost is a marginal rate of substitution of housing consumption for other consumption. Further, in a competitive economy, the user cost should be equal to the rental price of a single unit of housing services charged by a profit-maximizing landlord. Thus, the inherently difficult task of measuring an unobservable marginal rate of substitution is replaced by the much easier task of measuring rents. The Bajari, Benkard, and Krainer paper makes a striking and controversial point, that all price increases on transactions in existing homes are welfare-neutral, because any benefits of capital gains to sellers are cancelled by reductions in the welfare of buyers. Welfare is increased only by construction of new homes and renovation of existing homes. 
Indeed, the structure of housing finance, at least in the United States, severely handicaps home renters relative to home owners, not only by providing tax deductions on mortgage interest to home owners, but also by transferring the benefits of capital gains to landlords, at least in the short run. In the long run, capital gains on rental properties, as well as tax deductions available to landlords, should translate into an increased supply that drives down rents, just as (more immediately) costs of home ownership are reduced by unrealized capital gains on houses. This process of adjustment may be inhibited by supply constraints.12 Anecdotal evidence suggests that low interest rates in 2001–2003 made the purchase of condominium units so attractive that an oversupply of apartments and softness of rents developed in many cities.
Díaz and Luengo-Prado (2003) provide a convincing explanation of a fundamental puzzle, which is why, given the subsidies and advantages to home ownership, not all households are owner-occupiers. They estimate the effects on the percentage of home ownership (66.5 percent in their data) of adjustment costs, uncertainty, tax deductibility, down payment percentages, and discount rates. Their analysis provides an intuitive explanation of why one-third of American households are tenants and thus the subject of this research on rental prices.13 Renters are young, have not yet saved the down payment necessary for home ownership, move too often to allow the advantages of home ownership to offset transaction and adjustment costs, and are subject to capital market constraints based on credit histories and “permanent” income.
An example of the fundamental role of rents in the analysis of house prices comes from Sinai and Souleles (2003), who demonstrate that the demand for home ownership responds positively to “rent risk,” that is, the perceived variance in rental prices. If a prospective tenant anticipates that rents will be variable in the future, he or she is more likely to hedge that risk by buying a home. The Sinai-Souleles analysis seems to be limited in applicability to the U.S. housing market with its unique institution of fixed-rate, long-term mortgages. In this environment, home buyers can eliminate almost all uncertainty about the cost of mortgage finance (not, of course, energy or maintenance costs or property taxes) by switching from uncertain future rents to home ownership with a fixed-rate mortgage. Likewise, the analysis is quite dependent on a past environment when inflation in rents was relatively rapid. In a hypothetical future environment of low overall inflation, implying low nominal rent inflation, the advantages of home ownership would diminish accordingly.
12. We conjecture that supply constraints may be less significant for rental apartments, where a relatively small parcel of land can accommodate numerous apartments in a high-rise building, than for single-family houses that consume significant land for yards and streets.
13. The proportion of owner-occupiers has increased substantially over time. Brown (1994, table 3.6A, 62) indicates that in 1918 only 19 percent of “laborer” households were homeowners, compared to 24 percent of “wage earners” and 36 percent of “salaried” workers.
6.8 The Analytical Case for Downward Bias in the CPI for Rent
Throughout its history, the CPI has measured tenant rent. Beginning in 1983 (for the CPI-U, 1985 for the CPI-W), the BLS adopted the “rental equivalence” approach to measuring price changes for owner-occupied housing. This attempts to measure the change in the amount a homeowner would pay to rent his or her home in a competitive market. The index used for homeownership does not collect new data but rather reweights the rent sample to apply to owner-occupied units. Between 1987 and 1997, the prices of owner units were moved by rent changes for rental units that were matched to a CPI owner sample based on similar location, structure type, age, number of rooms, and type of air conditioning. Beginning in 1998 the owner sample was dropped due to the difficulty of finding renter-occupied units in neighborhoods consisting mostly or entirely of owner-occupied
units and the methodology returned to the same as during the 1983–1986 period, namely to reweight the rent sample to represent owner-occupied units.14 The ex-ante assumption of downward bias in the CPI is based on more than the circumstantial evidence reviewed in the preceding. The BLS itself studied and then, beginning in 1988, corrected aging bias that results from the neglect of the fact that a given rental unit systematically experiences a decline in rent as the result of depreciation. The extent of aging bias was initially revealed in a BLS research paper based on the hedonic regression methodology (Randolph 1988), and since 1988 the CPI for rental shelter has been corrected by location-specific aging factors based on the hedonic regression. The annual correction for depreciation ranges from a high of 0.36 percent in major northeastern cities to 0.17 percent in the south (Lane, Randolph, and Berenson 1988), and so the CPI for shelter is presumed to be biased downward by this amount prior to 1988. Less well known is the nonresponse bias, which is the major focus of Crone, Nakamura, and Voith (2003a). Beginning in 1942, the BLS began collecting data on rent changes from tenants rather than landlords. This poses the major problem that rent increases tend to take place when one tenant departs and another arrives, but the departing tenant is not reached by the BLS survey while the arriving tenant often has no knowledge of the rent paid by the previous tenant. Crone, Nakamura, and Voith (2003a) estimate that over the period 1942–1977 roughly one-third of rent increases failed to be recorded, leading to a major downward bias that they estimate to be roughly 1.5 percent per year. 
Methodological improvements in the CPI gradually eliminated nonresponse bias.15 Beginning in 1978, the size of the BLS sample was reduced with the explicit intention of giving field agents more time to capture rent increases that occurred when a tenant moved and also giving them the latitude to interview landlords and building managers to obtain data on rent changes. In 1985 a correction was introduced for the bias associated with vacant units, involving the imputation of rent changes for vacant units based on rent changes experienced in occupied units in the same location. Finally, in 1994 the method was changed to eliminate a recall bias that had been introduced in 1978 when respondents were asked not only about the current month’s rent but also the previous month’s rent. Now the monthly rate of rental inflation is calculated as the sixth root of the average six-month inflation rate (since the previous interview taken six months earlier), and this results in roughly a three-month lag in reporting of changes in the rental inflation rate (Armknecht, Moulton, and Stewart 1995).
14. Facts in this paragraph come from Placek and Baskin (1996).
15. This history of CPI improvements is taken from Crone, Nakamura, and Voith (2003a, 11–12).
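The sixth-root calculation described above can be illustrated with a short sketch. It assumes that the six-month change is expressed as a price relative (one plus the six-month rate), which is one natural reading of the description; the function name and the 3 percent figure are hypothetical and are not BLS code or data.

```python
def monthly_rate_from_six_month(six_month_rate: float) -> float:
    """Convert a six-month rate of rent change into the constant monthly
    rate that compounds to it (an assumed reading of the sixth-root rule)."""
    return (1.0 + six_month_rate) ** (1.0 / 6.0) - 1.0


# Hypothetical example: rents in a sample rose 3 percent over six months.
six_month_rate = 0.03
monthly_rate = monthly_rate_from_six_month(six_month_rate)

# Compounding the monthly rate six times recovers the six-month change.
recovered = (1.0 + monthly_rate) ** 6 - 1.0
print(f"monthly rate: {monthly_rate:.4%}")
print(f"recovered six-month rate: {recovered:.4%}")
```

Because each monthly figure spreads a six-month change evenly over the preceding six months, a shift in the underlying rate of rent inflation shows up only gradually, which is consistent with the reporting lag noted above.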
We have seen in table 6.1 that over the period from 1930 to 1985 or 1988, the CPI for rent increased more slowly than unadjusted mean rent at a differential rate of greater than 2 percent per year. Crone, Nakamura, and Voith (2003a) present adjustments based on a theoretical model of nonresponse bias; their average bias correction for 1930–1985 is 1.6 percent per year for their basic estimate and 1.4 percent per year for their “conservative” alternative estimate. We shall return to a discussion of these bias corrections when we present our own evidence for subperiods that overlap with the Crone, Nakamura, and Voith (2003) results.
6.9 Hedonic Regression Estimates of Rents from AHS Data
All hedonic regression studies share the standard issues that arise in estimation using cross-section data, including coping with collinearity, potential nonnormal errors, variables subject to measurement error, and choice of functional form in relationships that may be nonlinear. Most of the literature on hedonic price-index methodology for housing, for example, Wallace (1996), Meese and Wallace (1991, 1997), and Sheppard (1999), refers to the sales price of houses, not rents paid by tenants. Nevertheless, some of the issues confronted in studies of house prices apply to tenant rents as well. Housing markets are characterized by search, imperfect information, and the competition between newly constructed homes and existing units. Housing, both owner-occupied and tenant-occupied, is very heterogeneous, having in common with such products as automobiles extreme complexity but with the added dimensions of location across regions, rural versus urban, and location within metropolitan areas. Houses tend to cost less in the South and more in the West, and they tend to cost more in the suburbs than in the central city, partly because the quantity of land that comes with the house is seldom revealed in the data. 
As noted by Sheppard (1999, 1616), “it is surprising how many hedonic models lack either a variable for land area, or a variable that explicitly identifies the location of the structure.” The importance of location in determining house prices leads to the related problem that observations may lack stochastic independence due to spatial autocorrelation, the tendency of the error in one observation to be correlated with those observations that are located nearby. We might find, for instance, that house prices are higher in a particular suburb or enclave that has any combination of excellent schools, unusually good public services, or unusually low property taxes. Our hedonic study of rents from the AHS shares with Crone, Nakamura, and Voith (2003b) the absence of data on location, except for four regions of the country and urban versus nonurban location. Thus we are unable to include factors determining the value of land, the quality of local schools, or nearby amenities including oceans, lakes, parks, or open space. To the
extent that these left-out determinants of house prices and rent are correlated with included variables, the coefficients on those variables will be biased. Fortunately, the issue of missing information on land value and other location-related variables is less serious for this study of rents than for studies of house prices, as rental units typically have little or no attached land and are more homogeneous than owner-occupied units in many dimensions.16
6.9.1 Mean Values
The AHS data examined in our hedonic regression study extend from 1975 to 2003 and cover only odd-numbered years. Details of sources and data construction and a discussion of problems and weaknesses in the AHS data appear in the data appendix. A problem with the AHS data set that determines our method of presentation is that the data consist of three separate panel data sets covering, respectively, 1975–1983, 1985–1995, and 1997–2003. The number of variables included jumps in the second data set. As Crone, Nakamura, and Voith (2003b, 8) also found, estimated regression coefficients for the time period 1983–1985 are problematic because of the lack of homogeneity of the panels between 1983 and 1985, and we have further found that the 1985–1995 panel cannot be merged with the 1997–2003 panel (see further discussion in the data appendix). Table 6.2 displays for 1975, 1985, 1993, and 2003, the mean values of rent, of four quantitative explanatory variables, and percentage means for a host of additional variables represented in the regression analysis as dummy variables. The top row showing mean rent corresponds to the “AHS” column in table 6.1. 
Particularly interesting on the second line is the size of the rental unit measured in square feet (available only starting in 1985), and this changes remarkably little in contrast to the much more rapid growth in the size of new single-family houses, which over 1970–2001 experienced an increase in median square feet of 52 percent and in mean square feet of 55 percent.17 Other measures of size also show little increase between 1975 and 2003. There is a large jump in average age that presumably reflects changes in the panel of units.
16. Randolph (1988) has additional locational data, namely a large number of separate metropolitan area locational variables. Unfortunately, Randolph’s estimates are of little value for this study as he uses only a single year of data (1983) and thus cannot estimate the variation in a hedonic price index over time.
17. See Statistical Abstract (1987, table 1273, 706; 2002, table 922, 591). The median went from 1,385 square feet in 1970 to 2,103 square feet in 2001. By comparison a sample of new houses started in the first half of 1950 had an average floor area of only 983 square feet (Grebler, Blank, and Winnick 1956, 119).

Table 6.2    Mean values, AHS data

Variable                                    1975        1985        1993        2003
Rent                                      135.20      314.50      453.10      683.18
Unit square feet                            n.a.    1,058.68    1,075.52    1,040.98
Bedrooms                                    1.84        1.88        1.92        1.94
Other rooms                                 2.24        2.39        2.43        2.38
Approximate age                            25.22       30.81       37.10       42.38

Northeast region (%)                       25.68       24.04       23.37       17.29
Midwest region (%)                         23.25       23.15       21.51       21.29
South region (%)                           30.50       30.09       31.17       30.94
West region (%)                            20.57       22.73       23.95       30.48
Urban area (%)                             57.76       89.66       87.22       87.00

Has multiple bathrooms                      6.57       15.51       15.20       20.23
Has central air conditioning               14.99       26.93       35.68       45.52
Interaction: Central air & NE               5.05        8.74        1.16        3.34
Interaction: Central air & MW              13.07       23.48        5.27        9.37
Interaction: Central air & S               28.33       50.04       21.79       23.22
Interaction: Central air & W                9.77       19.06        7.47        9.59
Has dishwasher                              n.a.       28.24       32.80       43.29
Has fireplace                               n.a.       11.20       20.06       12.30
Has porch                                   n.a.       56.92       74.85       71.75
Has elevator                                7.89        9.80        1.99        9.02
Garage included in rent                     n.a.       27.98       48.78       35.15

Lacks piped hot or cold water               n.a.        0.68        0.65        0.27
Incomplete plumbing fixtures                4.67        1.20        2.08        2.00
No sewer connection                        16.43        9.24       20.51        6.28
Visible wiring                              n.a.        3.35        0.77        0.79
Signs of rodents                           11.87        7.52       20.63       12.88
Holes in floors                             3.67        2.92        2.56        1.43
Cracked walls                               n.a.       10.76       10.89        7.21
Noise problem                               n.a.       11.22        3.45        3.29
Litter problem                              n.a.        4.31        1.82        2.08
Neighborhood bothersome                     n.a.       40.26       15.92       14.53

Public housing                              6.83        7.36
Rent is federally subsidized                1.68        4.37
Rent is locally subsidized                  n.a.        1.46
Rent is federally subsidized or
  locally subsidized (1997 and 2003)

Values printed for the subsidy rows in 1993 and 2003 (row and column placement not recoverable here): 1.61, 4.37, 2.80, 5.44

The quality characteristics in table 6.2 are divided into five sections, at the top those representing quantitative attributes like square feet and then below an array of dummy variables representing location, positive quality attributes, negative physical and environmental characteristics, and, finally, special aspects of rental finance, for example, whether the unit is in public housing or carries a subsidy. While the size of rental units does not increase appreciably over time, there is a marked improvement in several other measures of quality between 1975 and 2003. The presence of air conditioning increases from 15 percent of the units in 1975 to 46 percent in 2003, while multiple bathrooms increases from 7 to 20 percent. Units having no sewer connection decreased from 16 percent in 1975 to 6 percent in
2003. There is a modest improvement in the variables in the bottom of the table measuring negative externalities.
6.9.2 Regression Estimates
Estimated coefficients for the full set of available variables are shown separately in table 6.3 for three periods, the first panel covering 1975–1983, the second panel covering 1985–1995, and the third for 1997–2003. Explanatory variables are listed in the same order as in table 6.2. All regressions are estimated in double-log form and thus differ from the Box-Cox flexible functional form estimated by Crone, Nakamura, and Voith (2003b) and the semilog form used by Randolph (1988).18 All coefficients displayed in table 6.3 are significant at the 1 percent level or better (except for scattered negative attributes in 1997–2003), which is perhaps not surprising in light of the large sample sizes of between 30,000 and 52,000 observations in the three regressions. All coefficients appear to have correct signs, except for two negative environmental variables (“Noise Problem” and “Neighborhood bothersome”), which have small positive coefficients. The regional and urban coefficients are quite large, and estimated hedonic price indexes that omit regional effects will miss changes in prices due to the shift of the population from the Northeast and Midwest to the South and West (although the rent-lowering movement to the South is partly or entirely cancelled by the rent-raising movement to the West). 
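As a rough illustration of how a double-log regression with a time dummy separates price change from quality change, the following sketch fits two specifications to a small synthetic, noise-free sample. Everything in it is an assumption made for illustration: the data, the single quality variable (rooms), the parameter values, and the helper `ols` are hypothetical and are not the AHS data or the estimation code used for table 6.3.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic units: (number of rooms, period), with period equal to 0 or 1.
# Units observed in the later period are larger, so mean quality rises.
units = [(2, 0), (3, 0), (4, 0), (3, 0), (3, 1), (4, 1), (5, 1), (4, 1)]
TRUE_ROOM_ELASTICITY = 0.10  # assumed for illustration
TRUE_PRICE_CHANGE = 0.20     # assumed pure price change, in log points
log_rent = [
    5.0 + TRUE_ROOM_ELASTICITY * math.log(rooms) + TRUE_PRICE_CHANGE * period
    for rooms, period in units
]

# Full specification: intercept, ln(rooms), and a time dummy.
X_full = [[1.0, math.log(rooms), float(period)] for rooms, period in units]
full_time_coef = ols(X_full, log_rent)[2]

# "Year only" specification: intercept and time dummy, no quality control.
X_year = [[1.0, float(period)] for _, period in units]
year_only_coef = ols(X_year, log_rent)[1]

print(f"time dummy with quality control: {full_time_coef:.3f}")
print(f"time dummy, year only:           {year_only_coef:.3f}")
```

In this constructed example the time dummy in the full specification recovers the assumed price change, while the year-only coefficient is larger because it also absorbs the rise in average unit size.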
A few of the coefficients are surprising: the coefficient on central air conditioning seems small and declines rapidly to a negligible 5 percent, whereas the coefficients on dishwasher and fireplace seem surprisingly large and may be correlated with other unmeasured attributes, for instance high-grade kitchen cabinets and countertops in the case of “dishwasher” and a higher general level of amenities and trim in the case of “fireplace.” The time dummy coefficients at the bottom of table 6.3 provide an alternative measure of inflation for every two years over the period 1975–2003, except for 1983–1985 and 1995–1997. After completing our discussion of the regression results, we will examine the implications of these estimated time dummy coefficients for annual rates of change over specified intervals. At that point, we will compare our results with the CPI and the hedonic regression results of Crone, Nakamura, and Voith (2003b).
18. Crone, Nakamura, and Voith (2003b, table 5) show that the average rate of increase of their hedonic price index is insensitive to alternative functional forms.

Table 6.3    Parameter estimates, AHS data

Variable                               1975–1983   1985–1995   1997–2003
Intercept                                 4.91∗∗      5.00∗∗      5.16∗∗
ln(Unit square feet)                                  0.04∗∗      0.07∗∗
ln(Bedrooms)                              0.15∗∗      0.09∗∗      0.07∗∗
ln(Other rooms)                           0.11∗∗      0.10∗∗      0.10∗∗
ln(Approximate age)                      –0.18∗∗     –0.07∗∗     –0.04∗∗

Northeast region                          0.26∗∗      0.37∗∗      0.30∗∗
Midwest region                               —           —           —
South region                             –0.31∗∗     –0.21∗∗     –0.23∗∗
West region                               0.15∗∗      0.32∗∗      0.29∗∗
Urban area                                0.16∗∗      0.28∗∗      0.26∗∗

Has multiple bathrooms                    0.31∗∗      0.17∗∗      0.17∗∗
Has central air conditioning              0.17∗∗      0.11∗∗      0.05∗∗
Interaction: Central air & NE             0.15∗∗     –0.06∗∗     –0.02
Interaction: Central air & MW                —           —           —
Interaction: Central air & S              0.28∗∗      0.18∗∗      0.18∗∗
Interaction: Central air & W             –0.18∗∗     –0.22∗∗     –0.16∗∗
Has dishwasher                                        0.16∗∗      0.21∗∗
Has fireplace                                         0.10∗∗      0.12∗∗
Has porch                                            –0.04∗∗     –0.03∗∗
Has elevator                              0.06∗∗      0.21∗∗      0.19∗∗
Garage included in rent                               0.09∗∗      0.09∗∗

Lacks piped hot or cold water                        –0.89∗∗     –0.39∗∗
Incomplete plumbing fixtures             –0.79∗∗     –0.11∗∗     –0.01
No sewer connection                      –0.08∗∗     –0.10∗∗     –0.11∗∗
Visible wiring                                       –0.06∗∗      0.01
Signs of rodents                         –0.08∗∗     –0.04∗∗     –0.02
Holes in floors                          –0.10∗∗     –0.05∗∗     –0.02
Cracked walls                                        –0.02∗∗     –0.05∗∗
Noise problem                                         0.02∗∗      0.01
Litter problem                                       –0.03∗∗     –0.02
Neighborhood bothersome                               0.02∗∗      0.02∗

Public housing                           –0.60∗∗     –0.65∗∗     –0.58∗∗
Rent is federally subsidized             –0.33∗∗     –0.28∗∗
Rent is locally subsidized                           –0.13∗∗
Rent is federally or locally
  subsidized (for 1997–2003 only)                                –0.20∗∗

1977 time dummy                           0.22∗∗
1979 time dummy                           0.30∗∗
1981 time dummy                           0.49∗∗
1983 time dummy                           0.63∗∗
1987 time dummy                                       0.10∗∗
1989 time dummy                                       0.20∗∗
1991 time dummy                                       0.31∗∗
1993 time dummy                                       0.36∗∗
1995 time dummy                                       0.45∗∗
1999 time dummy                                                   0.13∗∗
2001 time dummy                                                   0.21∗∗
2003 time dummy                                                   0.26∗∗

Adjusted R2                               0.51        0.41        0.27
Degrees of Freedom                      30,811      52,169      33,015
Standard Error of Estimate                0.52        0.51        0.61
Sum of Squared Residuals                 8,268      13,424      12,233

Note: In this and subsequent tables, ∗∗ indicates statistical significance at the 1 percent level and ∗ indicates statistical significance at the 5 percent level.

6.10 The Effects of Quality Change: A “Stripping Exercise”
In addition to estimating hedonic price indexes using all the available AHS data, we also want to look more closely at the sources and magnitude of quality change. Our basic question is “by how much we would overstate
the rate of change in rents if we had fewer or no quality change variables?” Asking this question another way, what is the difference between changes over time in the hedonic price index versus mean contract rent and which explanatory variables contribute to this difference?
In this exercise it is important to distinguish between true changes in quality and changes in other explanatory variables that do not represent changes in quality, that is, locational variables and government-related variables (public housing and subsidized housing). To implement this distinction between quality and nonquality explanatory variables, we remove variables in several steps, as shown in table 6.4.

Table 6.4    Effect of stripping sets of variables, AHS data, 1975–2003

Columns: (1) Full specification; (2) Weston analysis specification; (3) Weston housing subsidy variables; (4) Removed quality variables; (5) Year only.

Variable                           (1)         (2)         (3)         (4)         (5)
1975–1983 sample
1977 time dummy                 0.22∗∗      0.18∗∗      0.21∗∗      0.19∗∗      0.17∗∗
1979 time dummy                 0.30∗∗      0.35∗∗      0.36∗∗      0.36∗∗      0.35∗∗
1981 time dummy                 0.49∗∗      0.56∗∗      0.57∗∗      0.56∗∗      0.58∗∗
1983 time dummy                 0.63∗∗      0.69∗∗      0.71∗∗      0.70∗∗      0.68∗∗
SEE                             0.52        0.57        0.54        0.64        0.70
SSR                            8,268       9,965       8,952      13,273      15,384

1985–1995 sample
1987 time dummy                 0.10∗∗      0.13∗∗      0.13∗∗      0.11∗∗      0.09∗∗
1989 time dummy                 0.20∗∗      0.25∗∗      0.25∗∗      0.23∗∗      0.23∗∗
1991 time dummy                 0.31∗∗      0.37∗∗      0.37∗∗      0.34∗∗      0.32∗∗
1993 time dummy                 0.36∗∗      0.45∗∗      0.45∗∗      0.40∗∗      0.40∗∗
1995 time dummy                 0.45∗∗      0.54∗∗      0.54∗∗      0.49∗∗      0.50∗∗
SEE                             0.51        0.58        0.55        0.58        0.65
SSR                           13,424      26,898      23,899      27,601      34,317

1997–2003 sample
1999 time dummy                 0.13∗∗      0.17∗∗      0.18∗∗      0.13∗∗      0.17∗∗
2001 time dummy                 0.21∗∗      0.24∗∗      0.25∗∗      0.21∗∗      0.24∗∗
2003 time dummy                 0.26∗∗      0.30∗∗      0.32∗∗      0.27∗∗      0.31∗∗
SEE                             0.61        0.67        0.64        0.66        0.71
SSR                           12,233      14,219      13,449      14,946      17,102

Annual growth rates
1975–1983                       7.83        8.63        8.88        8.81        8.53
1985–1995                       4.48        5.36        5.42        4.86        4.96
1997–2003                       4.33        5.00        5.33        4.50        5.17

Difference from column (1)
1975–1983                                   0.80        1.05        0.98        0.70
1985–1995                                   0.88        0.94        0.37        0.48
1997–2003                                   0.67        1.00        0.17        0.83

Starting from the full regression in column (1), the first step is to remove all
quality variables other than those available in Weston’s analysis of the 1930–1970 period (discussed in the following). Thus column (2) retains the number of rooms, age, and incompleteness of plumbing fixtures, as well as regional location. The housing subsidy variables are added back in columns (3) and (4), while column (4) removes all remaining quality variables. Column (5) removes all explanatory variables other than the time dummies. We will discuss the differences in the annual rates of price change over each of the three intervals in succession, starting with 1975–1983, and we refer to the annual growth rates of the time coefficients summarized in the bottom three lines of table 6.4. Comparing columns (1) and (2) provides evidence on the effect of quality variables not available to Weston, especially multiple bathrooms, air conditioning, and presence of an elevator. For 1975–1983, these quality variables explain 0.80 percent per year of price change, and a comparison of columns (1) and (4) indicates that removing all quality variables (while leaving in the regional and subsidy dummies) explains 0.98 percent per year of price change. The regional and subsidy effects, dropped in going from column (4) to (5), contribute –0.28 percent per year, indicating that apartment rents were pulled down by a movement to the South and an increased share of subsidized rental housing.19 Because the CPI controls for location and such attributes as public financing, we want to include those variables in the regressions compared with the CPI, as in columns (1), (3), and (4). The next section of table 6.4 carries out the same exercise for the subsequent decade 1985–1995 when our set of explanatory variables is considerably richer. The result in going from column (1) to (2) is slightly larger; 0.88 percent per year of price change is explained by the combined effects of the long list of variables not available to Weston. 
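The annual growth rates in table 6.4 follow from the time dummy coefficients by the conversion described in footnote 19: the coefficient is a decimal log change, which is expressed in percent and divided by the number of years in the interval. A minimal sketch of that arithmetic, using the rounded 2003 coefficients printed in table 6.4 (the function name is hypothetical):

```python
def annual_rate(time_dummy_coef: float, years: int) -> float:
    """Average annual percent change implied by an end-of-period time dummy
    coefficient expressed as a decimal log change."""
    return 100.0 * time_dummy_coef / years


# 2003 time dummy coefficients from the 1997-2003 sample in table 6.4.
full_spec = annual_rate(0.26, 6)    # column (1), full specification
weston_spec = annual_rate(0.30, 6)  # column (2), Weston analysis specification
quality_effect = weston_spec - full_spec

print(f"column (1): {full_spec:.2f} percent per year")
print(f"column (2): {weston_spec:.2f} percent per year")
print(f"difference: {quality_effect:.2f} percent per year")
```

For other intervals the same calculation applied to the rounded coefficients can differ slightly from the printed growth rates, presumably because the published figures are based on unrounded estimates.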
Surprisingly, omitting the remaining quality variables in going from the second to fourth column actually reduces the cumulative price increase, probably reflecting the jump in the average age of rental units shown previously in table 6.2. For the 1985–1995 decade, a comparison of the final two columns indicates that removing the regional and subsidy variables adds back in 0.11 percent per year of price change.
The final section of table 6.4 presents results for 1997–2003. The annual rate of price change explained by quality change in going from column (1) to (2) of table 6.4 is 0.67 percent per year, but again going from column (2) to (4) reveals a quality deterioration of 0.50 percent per year that may be explained by increasing age. Because the sharp jump in age in going from 1975 to 2003 (see table 6.2) is implausible, it may reflect an inconsistency in the AHS sample for which we have not yet found an explanation.20 Removal of the regional and subsidy dummy variables raises price change by 0.66 percent per year. Overall, the regressions reduce the change in the hedonic index by 0.83 percent below the raw price change in the sample, of which just 0.17 points is attributable to quality change and 0.66 points to the regional or subsidy effects.21
19. These annual rates of change are calculated by converting the time dummy coefficients, which are in the form of decimal log changes, into percents and dividing by the number of years in each interval.
20. One source of inconsistency in the AHS sample is that the 1975–1983 panel contains six age subcategories of which the oldest is “built before 1939” while the 1985–2001 panel contains nine age subcategories of which the oldest is “built before 1919.” This inconsistency would cause approximate age to jump spuriously from 1975 to 1985 but not after 1985.
21. To check on the stability of the results during 1997–2003, we ran separate adjacent-year regressions for 1997–1999, 1999–2001, and 2001–2003. Not surprisingly in light of the large samples, the quality and time coefficients in the adjacent-year regressions were almost identical to the six-year regression results shown in table 6.6.
6.11 Hedonic Regressions Based on Census Microdata
A supplementary set of hedonic regressions is estimated from the Census of Housing microdata file, and here we have an amazing sample size of over 750,000, but a much smaller set of quality change variables, lacking even any control for air conditioning.

Table 6.5    Effects of stripping sets of variables, census microdata, 1960–1990

                                                     Quality
                                      Census       variables
                                     hedonic         removed   Year only
Variable                                 (1)             (2)         (3)
Intercept                             4.43∗∗          3.90∗∗      3.98∗∗
ln(Bedrooms)                          0.11∗∗
ln(Other rooms)                       0.15∗∗
ln(Approximate age)                  –0.19∗∗

Northeast region                      0.19∗∗          0.15∗∗
Midwest region                           —               —
South region                         –0.10∗∗         –0.07∗∗
West region                           0.25∗∗          0.27∗∗

Incomplete plumbing fixtures         –0.71∗∗

1970 time dummy                       0.36∗∗          0.46∗∗      0.47∗∗
1980 time dummy                       1.15∗∗          1.27∗∗      1.28∗∗
1990 time dummy                       1.78∗∗          1.89∗∗      1.89∗∗

Adjusted R2                           0.65            0.59        0.57
Degrees of Freedom                 708,246            7E05     708,253
Standard Error of Estimate            0.49            0.53        0.57
Sum of Squared Residuals           170,047            2E05     224,586

Source: Census microdata extract courtesy of the IPUMS project (http://www.ipums.umn.edu/).

In table 6.5 we present in column (1) the full hedonic regression result, in column (2) the effect of removing the
quality variables, and in column (3) the effect of removing the regional variables. The regional variables make no difference throughout, and removing the quality variables has an effect that varies over time. Looking only at 1960–1970, the price increase in column (2) is 10 percent faster than in column (1), indicating a quality effect of 1.0 percent per annum. However, the quality effect declines to 0.60 percent per annum for 1960–1980 and to 0.37 percent per annum for 1960–1990. Decade-by-decade, the implied quality change was at a rate of 1.0 percent per annum in 1960–1970, 0.2 percent in 1970–1980, and –0.1 percent in 1980–1990. The results in tables 6.4 and 6.5 are converted to annual growth rates and summarized in table 6.6. The four lines represent the period of the Census data (1960–1990) and the three subperiods of the AHS data (1975–1985, 1985–1995, and 1995–2003). A comparison of columns (2) and (5) in the first line indicates an annual growth rate of quality over 1960–1990 of 0.37 percent per year and a difference between the CPI and Census hedonic (column [8] minus column [2]) of –1.67 percent per annum. The next three lines of table 6.6 summarize the results using the AHS data. The years of data gaps, 1983–1985 and 1995–1997, are bridged by assuming that each AHS variant index grew at the same rate as the CPI during those two pairs of years. Thus, for the 1975–1985 and 1995–2003 intervals shown in table 6.6, the results shown in columns (2) through (6) are biased toward zero by design. Column (1) displays the baseline regression results of Crone, Nakamura, and Voith (2003b), also based on AHS data but ending in 1995. Their price increase in column (1) is substantially faster than ours in column (2) for 1975–1985 but is very close in 1985–1995. 
As discussed previously, removing the quality variables other than rooms, age, and plumbing completeness yields measures of the annual rate of quality change in the three AHS periods of 0.60, 0.88, and 0.37 percent, respectively, an amazingly consistent record. Removing all quality variables in column (5) implies, in comparison with the full hedonic results in column (2), respective rates of “total” quality change of 0.70, 0.38, and 0.09 percent per year. The implied CPI bias (comparing column [8] with column [2]) is –1.05, –1.03, and –0.78 percent per annum. Figure 6.1 summarizes the hedonic regression results, displaying the Census- and AHS-based hedonic price indexes and the CPI for the period 1960–2003. The Census hedonic indexes and the CPI are expressed on a basis of 1970 = 100, and the AHS index is linked to the Census index in 1975, which amounts to expressing the AHS index on a 1970 base year with the Census average growth rate for 1970–1980 used to proxy the missing AHS observations for 1970–1975. During the overlapping period of 1975–1990, the Census and AHS indexes are surprisingly close in light of the much longer list of explanatory variables in the AHS data set, indicating that the location and subsidy variables essentially offset the effect of the quality variables.
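The implied CPI bias figures quoted above are simple column differences in table 6.6. The sketch below transcribes columns (2) and (8) of that table and recomputes the differences; the variable names are hypothetical, and the values are the rounded growth rates as printed.

```python
# Annualized growth rates (percent per year) transcribed from table 6.6.
full_hedonic = {  # column (2), full specification
    "1960-1990": 5.92,
    "1975-1985": 7.61,
    "1985-1995": 4.48,
    "1995-2003": 4.08,
}
cpi = {  # column (8), CPI
    "1960-1990": 4.25,
    "1975-1985": 6.56,
    "1985-1995": 3.45,
    "1995-2003": 3.30,
}

# Implied bias: CPI growth minus quality-adjusted hedonic growth.
implied_bias = {
    period: round(cpi[period] - full_hedonic[period], 2) for period in cpi
}

for period, bias in implied_bias.items():
    print(f"{period}: CPI minus hedonic = {bias:+.2f} percent per year")
```

A negative difference means that the CPI rose more slowly than the quality-adjusted hedonic index over the interval.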
Table 6.6 Annualized growth rates, by index

Columns: (1) CNV Box-Cox hedonic specification; (2) Full specification; (3) Weston analysis specification; (4) Weston housing subsidy variables; (5) Removed quality variables; (6) Year only; (7) Mean rent; (8) CPI.

Time period    (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
1960–1990      n.a.    5.92    n.a.    n.a.    6.29    6.31    6.28    4.25
1975–1985      9.04    7.61    8.21    8.41    8.31    8.11    8.44    6.56
1985–1995      4.66    4.48    5.36    5.42    4.86    4.96    4.53    3.45
1995–2003      n.a.    4.08    4.58    4.83    4.20    4.70    5.38    3.30

Sources: Column (1): Crone, Nakamura, and Voith (2003b, table 5). Columns (2) through (6) are computed by setting the rate of change of the hedonic index equal to that of the CPI for 1983–1985 and 1995–1997, reflecting the inability to mesh data for 1983 with 1985, or data for 1995 with 1997. Column (7): AHS, IPUMS Census Microdata, see table 6.1. Column (8): BLS.
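The cross-column comparisons of table 6.6 drawn in the text are simple differences of annual growth rates. The Python sketch below is not part of the original analysis; it reproduces those differences from the rounded table entries. The dictionary layout and function names are illustrative assumptions, and the figures are assigned to columns as the text uses them: column (2) is the full specification, column (5) has the quality variables removed, and column (8) is the CPI.

```python
# Annual growth rates in percent, transcribed from table 6.6.
# Only the three columns used in the text's comparisons are included.
TABLE_6_6 = {
    "1960-1990": {"full": 5.92, "removed_quality": 6.29, "cpi": 4.25},
    "1975-1985": {"full": 7.61, "removed_quality": 8.31, "cpi": 6.56},
    "1985-1995": {"full": 4.48, "removed_quality": 4.86, "cpi": 3.45},
    "1995-2003": {"full": 4.08, "removed_quality": 4.20, "cpi": 3.30},
}


def quality_change(period):
    """Column (5) minus column (2): growth attributed to quality change."""
    row = TABLE_6_6[period]
    return round(row["removed_quality"] - row["full"], 2)


def cpi_bias(period):
    """Column (8) minus column (2): CPI growth less hedonic price growth."""
    row = TABLE_6_6[period]
    return round(row["cpi"] - row["full"], 2)


for period in TABLE_6_6:
    print(period, quality_change(period), cpi_bias(period))
```

Because the inputs are rounded to two decimals, the quality figure for 1995–2003 comes out at 0.12 rather than the 0.09 quoted in the text, which was presumably computed from unrounded index values.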
Downward Bias in the Most Important CPI Component
Fig. 6.1 CPI and hedonic price indexes from Census and AHS data, 1960–2003
6.12 Additional Quantitative Evidence on Quality Change, 1918–1970

6.12.1 The Weston Data and Analysis

Our main source for changes in rent over the period 1930–1970 is an unpublished dissertation by Rafael Weston (1972). His data originate in frequency table form published in the 1940, 1950, and 1960 Census of Housing volumes and preliminary data for 1970. While 1940 was the first year in which the Census of Housing was conducted, he was able to obtain corresponding data from the 1930 Census of Population. Weston’s quality characteristics are based on whether a unit was inside or outside an SMSA, its Census geographic region, the age of the unit, the number of rooms, completeness of plumbing, and “condition,” which in turn is either “dilapidated” or “not dilapidated” as subjectively assigned by the Census interviewer. The published frequency tables contain these characteristics cross-classified by rent and region but not by one another. An important advantage of the data is that the number of rental units in each quality category is provided, and this allows us to calculate rental expenditure in each category and thus to develop a price index based on expenditure weights. To generate a full cross-classification from this limited data set, Weston supposed a multinomial model for each variable and fit the data to a log-normal distribution using a complex analysis of variance (ANOVA)–based methodology. He then conducted an analysis of quality change, measuring the implied quality change associated with each variable and its interaction terms.

Robert J. Gordon and Todd vanGoethem

Table 6.7 Mean gross rent and two price indexes from Weston’s data, 1930–1970

                                                                      Implied quality index from:
               Mean gross    Weston price    Törnqvist index
Year           rent          index           from Weston data        Weston        Törnqvist
1930           33.22         100.0           100.0                   100.0         100.0
1940           30.89         97.4            98.3                    95.4          94.6
1950           46.08         149.4           146.6                   92.8          94.6
1960           74.92         222.4           229.7                   101.4         98.2
1970           115.80        292.2           305.5                   119.3         114.1
Annual growth rates
1930–1940      –0.73         –0.26           –0.17                   –0.47         –0.56
1940–1950      4.00          4.28            4.00                    –0.28         0.00
1950–1960      4.86          3.98            4.49                    0.88          0.37
1960–1970      4.35          2.73            2.85                    1.63          1.50
1930–1970      3.12          2.68            2.79                    0.44          0.33

Sources: First and third columns from Weston (1972), tables 3-2 and 3-2. Second column from Weston (1972), table 5-1.

Weston produced price indexes for both house prices and rents. The first column of table 6.7 copies from table 6.1 the mean gross rent data that Weston obtained from the Census. As calculated in table 6.1, this series increases 2.1 percent per year more rapidly than the CPI for rent over the period 1930–1970. Displayed in the second column is a quality-corrected price index that Weston calculated from his own data. Because Weston’s explanation of his methodology is quite obscure, we have calculated an alternative quality-adjusted Törnqvist price index that calculates the rent change separately for each of Weston’s cells (e.g., two rooms, complete plumbing, not dilapidated) and then aggregates the separate log rent changes by the average nominal rental expenditure in each cell in the first and second year of the comparison. Thus, log rent changes in each cell from 1930 to 1940 are aggregated using the nominal expenditure share of that cell averaged between the 1930 and 1940 value. The two right-hand columns compute an implicit quality index as the ratio of an index of mean gross rent to each of the two price indexes. If rent increases faster than a price index, this implies that quality has increased. Quite surprisingly, there was no improvement in quality between 1930 and 1960. A deterioration in quality during the 1930s was just offset by a small improvement in quality in 1950–1960. Only in the final decade, 1960–1970, did quality improve rapidly. The bottom part of table 6.7 calculates annual growth rates for each decade and for the four decades taken together. Over the full period 1930–1970, the Weston price index increases at 0.44 percent per year less than
Table 6.8 Weston quality attributes

                                     1930     1940     1950     1960     1970
Age
  0–10                               30.5     10.8     14.5     16.6     19.6
  >10                                69.5     89.2     85.5     83.4     80.4
Rooms
  1–2                                11.7     16.8     17.7     14.8     5.7
  3–4                                32.7     41.3     52.1     52.9     32.2
  5–6                                37.4     33.5     26.3     27.9     44.5
  >6                                 18.1     8.4      3.9      4.4      17.6
  Mean (a)                           4.65     4.13     3.81     3.91     4.89
Condition
  Not dilapidated                    82.6     84.6     89.6     93.9     97.0
  Dilapidated                        17.4     15.4     10.4     6.1      3.0
Plumbing
  With all                           57.5     63.9     68.3     81.9     93.4
  Lacking                            42.5     36.1     31.7     18.1     6.6
  Weighted mean of rent ratio (b)    1.96     2.07     1.58     1.79     1.76

(a) Calculated on midpoints of each bin; 7 was used for the last bin.
(b) Mean ratio of rent for a unit with proper plumbing to one without, weighted by quantity.
mean gross rent, and the Törnqvist price index increases at 0.33 percent per year less, implying implicit quality change indexes of the same magnitude. This leaves us with the puzzle as to why quality change was so slow in the period 1930–1960 and then accelerated so much from 1960 to 1970. Several answers are suggested in table 6.8, which provides means of the main Weston quality variables. First, due to lack of construction during the Great Depression, average age increased sharply from 1930 to 1940, with a drop in the number of units of ten years or younger from 30 to 11 percent. Going in the same direction, and probably more important, was a decline in the average number of rooms from 4.65 in 1930 to 3.81 in 1950, followed by a slight recovery to 3.91 in 1960 and then a big jump to 4.89 in 1970. The other two quality variables improved steadily, with a decline in “dilapidated” from 17 percent in 1930 to 3 percent in 1970, and in partial or no plumbing from 43 percent in 1930 to 7 percent in 1970. Shown below the plumbing percentages is the implicit value of plumbing, measured as the ratio of the rent of a unit with complete plumbing to a unit lacking plumbing, calculated cell by cell and weighted by the number of units in each cell.22 In the following, we attempt to make a rough correction for the value of improvements over time in heating, plumbing, and electrification. 22. Each “cell” shows the rent and the number of units in every combination of quality attribute, for example, a two-room apartment more than ten years old, not dilapidated, and with full plumbing.
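The Törnqvist aggregation and the implied quality indexes of table 6.7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are invented here, the two-cell example uses hypothetical rents and unit counts rather than Weston's cells, and only the mean rent and Törnqvist index series are taken from table 6.7.

```python
import math


def tornqvist_log_change(cells_start, cells_end):
    """Törnqvist aggregate of cell-level log rent changes.

    Each argument maps a cell label to (mean rent, number of units).
    A cell's weight is its share of nominal rental expenditure,
    averaged over the start and end years of the comparison.
    """
    spend_start = {c: rent * units for c, (rent, units) in cells_start.items()}
    spend_end = {c: rent * units for c, (rent, units) in cells_end.items()}
    total_start = sum(spend_start.values())
    total_end = sum(spend_end.values())
    log_change = 0.0
    for cell in cells_start:
        weight = 0.5 * (spend_start[cell] / total_start
                        + spend_end[cell] / total_end)
        log_change += weight * math.log(cells_end[cell][0] / cells_start[cell][0])
    return log_change


# Hypothetical cells, for illustration only (not Weston's data).
cells_1930 = {"2 rooms, full plumbing": (20.0, 100),
              "4 rooms, full plumbing": (40.0, 50)}
cells_1940 = {"2 rooms, full plumbing": (19.0, 120),
              "4 rooms, full plumbing": (36.0, 60)}
example_log_change = tornqvist_log_change(cells_1930, cells_1940)

# Published series from table 6.7.
MEAN_RENT = {1930: 33.22, 1940: 30.89, 1950: 46.08, 1960: 74.92, 1970: 115.80}
TORNQVIST_INDEX = {1930: 100.0, 1940: 98.3, 1950: 146.6, 1960: 229.7, 1970: 305.5}


def implied_quality_index(year, base=1930):
    """Index of mean rent divided by the price index, base year = 100."""
    rent_index = 100.0 * MEAN_RENT[year] / MEAN_RENT[base]
    return 100.0 * rent_index / TORNQVIST_INDEX[year]


def annual_log_growth(start, end, years):
    """Continuously compounded annual growth rate, in percent."""
    return 100.0 * math.log(end / start) / years


rent_growth = annual_log_growth(MEAN_RENT[1930], MEAN_RENT[1970], 40)
price_growth = annual_log_growth(TORNQVIST_INDEX[1930], TORNQVIST_INDEX[1970], 40)
quality_growth = rent_growth - price_growth
```

Log growth rates are used because they are additive, so the growth of the implied quality index equals the growth of mean rent minus the growth of the price index.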
Table 6.9 Mean values, census microdata

Variable                          1960      1970      1980      1990
Rent                              62.31     98.95     216.04    410.03
Bedrooms                          1.7       1.8       1.8       1.9
Other rooms                       2.2       2.2       2.3       2.2
Approximate age                   23.9      21.6      23.7      26.3
Northeast region (%)              31.5      29.4      26.4      22.4
Midwest region (%)                24.6      23.5      22.3      20.6
South region (%)                  27.1      27.1      28.2      31.2
West region (%)                   17.2      19.9      23.1      25.4
Incomplete plumbing fixtures      17.8      6.1       0.9       0.7
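A cross-check of Weston's figures against the Census microdata rests on a few simple calculations from tables 6.8 and 6.9, sketched here in Python. The sketch is illustrative only: the variable and function names are assumptions, the inputs are the rounded table entries, and the midpoint rule (7 rooms for the open-ended top bin) follows note (a) of table 6.8.

```python
import math

# Percentage of rental units by number of rooms, from table 6.8.
ROOM_SHARES = {
    1960: {"1-2": 14.8, "3-4": 52.9, "5-6": 27.9, ">6": 4.4},
    1970: {"1-2": 5.7, "3-4": 32.2, "5-6": 44.5, ">6": 17.6},
}
# Bin midpoints; 7 stands in for the open-ended top bin (table 6.8, note a).
MIDPOINTS = {"1-2": 1.5, "3-4": 3.5, "5-6": 5.5, ">6": 7.0}

# Census microdata means, from table 6.9.
CENSUS = {
    1960: {"rent": 62.31, "bedrooms": 1.7, "other_rooms": 2.2},
    1970: {"rent": 98.95, "bedrooms": 1.8, "other_rooms": 2.2},
}


def weston_mean_rooms(year):
    """Share-weighted mean of the bin midpoints."""
    shares = ROOM_SHARES[year]
    total = sum(shares.values())
    return sum(MIDPOINTS[b] * share for b, share in shares.items()) / total


def census_total_rooms(year):
    """Bedrooms plus other rooms."""
    return CENSUS[year]["bedrooms"] + CENSUS[year]["other_rooms"]


def annual_log_growth(start, end, years):
    """Continuously compounded annual growth rate, in percent."""
    return 100.0 * math.log(end / start) / years


weston_jump = weston_mean_rooms(1970) - weston_mean_rooms(1960)
census_jump = census_total_rooms(1970) - census_total_rooms(1960)
census_rent_growth = annual_log_growth(CENSUS[1960]["rent"], CENSUS[1970]["rent"], 10)

# Log premium implied by a rent ratio of 1.77 for units with full plumbing.
plumbing_log_premium = math.log(1.77)
```

With the rounded inputs, the Census total comes out at 3.9 and 4.0 rooms. The 3.93 and 3.99 quoted in the text reflect unrounded microdata means that the rounded table cannot reproduce exactly.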
Because Weston’s quality correction for 1960–1970 is so much larger than for the other decades, it is worth checking Weston’s results against the Census microdata that was used to develop the hedonic regressions of table 6.5. As shown in table 6.7, the unadjusted annual growth rate of rent for 1960–1970 is 4.35 percent for Weston and in table 6.9 is 4.63 percent for the Census microdata. The Weston price index based on the Törnqvist method increases at 2.73 percent per year compared to 3.6 percent for the Census hedonic price index of table 6.5. The implicit increase in quality occurs at a rate of 1.5 percent for Weston and 1.0 percent for the Census. An interesting similarity is the implicit value of plumbing. The bottom line of table 6.8 shows that the average value of plumbing is to make rent 1.77 times higher than without plumbing or to make the log 0.57 higher. This is remarkably close to the coefficient for absence of plumbing of –0.71 in the Census microdata regression in table 6.5. The major discrepancy between Weston and the Census microdata concerns the change in the number of rooms from 1960 to 1970. There was virtually no change in the Census, only from 3.93 total rooms to 3.99 total rooms, in contrast to Weston’s jump in table 6.8 from 3.91 to 4.89. It is possible that the Weston data on mean rooms reflect a coding error or the fact that he was using a preliminary summary of 1970 Census data. We note from table 6.2 that total rooms in the AHS data were much closer to the 1970 Census figure throughout 1975–2001, ranging from 4.08 in 1975 to 4.40 in 2001. Accordingly, we discount the Weston conclusion on quality change in the 1960–1970 decade and prefer the conclusion of the hedonic price index developed from the Census microdata.

6.12.2 Brown’s Evidence on Quality Change

In table 6.1 we have already examined Brown’s rental prices from five budget studies based on CES data spanning the period 1918–1988.
We found that over the 1918–1973 period, Brown’s rental price per unit increased at about 1.9 percent per year faster than the CPI. Going beyond
raw rent data, Brown’s book contains a wealth of information on quality change. An initial problem is that all of Brown’s data from the CES on household expenditures by type (types of food, types of clothing, shelter, fuel, home furnishings, etc.) are listed separately for different classes of workers— laborers, wage earners, and salaried workers. Managerial employees and owners of small businesses are excluded from the CES source. As a first attempt to extract some useful information about changes in shelter quality, we average together the percentages displayed for wage earners and salaried workers. This omits laborers at the low end and managerial and self-employed business people at the high end. Also, the data generally refer to urban and nonfarm rural families and omit living conditions on farms. Of the quality changes that Brown quantifies or discusses over the five years of her study (as shown above in table 6.1), we are primarily interested in electrification, heating, plumbing, and household appliances. Of these only the presence or absence of “complete” plumbing facilities is taken into account in the Weston study summarized in tables 6.7 and 6.8. The best that we can do to extract data from the Brown study is presented in table 6.10. As shown there, the definitions of variables tend to differ from one year to the next, and there is progressively less detail shown on the quality of rental apartments in each year after the initial year of 1918. Two surprising facts are listed at the top of table 6.10. Rooms per rental unit were 5.3 in 1918 and 5.2 in 1935, as compared to Weston’s figure for 1930 of 4.7 rooms. The second surprise, doubtless related to the first, is that more than half of the rental units in both 1918 and 1935 were houses rather than apartments. Thus the 1918 households surveyed by the CES cannot be accurately characterized as living in dark, dank tenements as more than half of them lived in houses. 
Presumably these were small houses typical of Chicago’s “bungalow belt” and similar areas of other cities, but at least these rental tenants did have small yards and outside windows on all four sides.23 Even in the lowest “laborer” class houses accounted for 56 percent of rental units. In contrast, in 2001 “single-family detached and attached units and mobile homes” accounted for only 36 percent of rental units.24 As of 1918, electrification of the urban and nonfarm rural population had reached the halfway mark, and the task of spreading electrification to the nonrural population was largely complete by 1935 and totally complete by 1950. Electrification came sooner to large cities than smaller towns, and because rental units were predominately located in large cities, it is likely
23. Brown (1994, 40) indicates that median household income in the CES sample was $1,400 in 1918. The mean incomes for her three classes are $1,037 for laborers, $1,344 for wage earners, and $2,272 for salaried workers. 24. See Statistical Abstract (2002, table 937, 599).
that the data on the third line of table 6.10 understate the spread of electrification to tenant-occupied units in 1918 and 1935. In contrast, central heating was still rare in 1918 and even in 1935. Roughly half the rooms in tenant-occupied units in 1918 were “equipped for heating,” but this usually meant some kind of stove that heated a single room, often fueled by coal. Central heating did not reach a penetration of 50 percent until sometime between 1935 and 1973. Indoor plumbing came to the rental unit earlier than central heating. By 1918 almost 80 percent of units had an indoor toilet and almost two-thirds had a bathroom. By 1935, 80 percent had not just electricity but also both hot running water and a flush toilet. Thus, while there was a substantial further spread of indoor plumbing after 1918, much of the transition had already taken place in prior years. The data exhibit a contradiction for 1950 as it cannot be true simultaneously that 84 percent of all units were equipped with a bathroom, hot running water, and a flush toilet, while at the same time 34 percent “lacked full plumbing.” The mean percentage lacking full plumbing in the Weston data in table 6.8 for 1950 was 32 percent; the Weston number of 18 percent for 1960 agrees with the Census number of 18 percent in table 6.9.
Table 6.10 Data on characteristics of rental units and all dwelling units, 1918–1973

                                                                               Sources                                 Percentages
                                                                               Rental or     Source       Source
Characteristic                                                                 all units?    table        page        1918    1935    1950    1973
Rooms                                                                          R             3.6B, 4.8    62, 127     5.3     5.2
Percent of renters in houses                                                   R             3.6B, 4.8    62, 127     64      55
Electrification, urban and nonfarm rural                                       A             HS           S73         47.4    83.9    96.6
Heating
  Rooms equipped for heating                                                   R             3.6B         62          55
  Warm-air furnaces                                                            A                          126                 31
  Central heating                                                              A                                                              78
Plumbing
  With bathroom                                                                R             3.6B         62          64
  With inside water closet                                                     R             3.6B         62          78
  With hot running water, flush toilet, and electricity                        A             4.8          127                 80
  With bathroom, hot running water, flush toilet, and “not dilapidated”        A             5.1          212–213                     84
  No bathtub or shower                                                         R                          127                 28
  No indoor toilet                                                             R                          127                 20
  Lacking full plumbing                                                        A                          298                         34      3

Notes: Any data referring only to rental units (R) refers to the average of wage earners and salaried workers. HS refers to the Historical Statistics volume cited in the references.
Some additional insight into the quality of housing units (both tenant-occupied and owner-occupied) in 1935 can be obtained from the description of a “typical American home” from the U.S. BLS (1935) as quoted by Brown (1994, 126):

single-family dwelling, about 19 years old, of wood or frame construction containing five rooms. It is equipped with either bathtub or shower, indoor water-closet, uses electricity for lighting and gas for cooking. For the country as a whole, reliance is placed predominantly on heating stoves for heat, although over 31% of all dwelling units use warm-air furnaces. Coal is the principal fuel used.

Not much change was registered in 1950, except for the conversion to central heating and the addition of appliances:

The typical urban home had four to six rooms for three persons. Amenities included running water, private toilet and bath, central heating (except in the South), gas or electric stove, and mechanical refrigerator. The rent for such a home was estimated by one study to be about $38 monthly. (Brown, 1994, 215)

6.12.3 Other Evidence on Quality Change

When we combine the Brown, Weston, and Census data, we are faced with a conflict between an improvement in quality characteristics involving electricity, heating, plumbing, and appliances and a decline in the average number of rooms per unit. This decline is verified by Grebler, Blank, and Winnick (1956, 119–21), who display a special tabulation from the 1950 Census of Housing, showing a decline from 4.76 rooms per urban and rural nonfarm dwelling unit for units built before 1919 to 4.26 rooms for units built after 1945. They argue convincingly that this decline understates the true decline because of conversions that created more units per multifamily building over the years between the construction date and the data source in 1950.
They argue that because conversions to increase the density of multifamily buildings occur mainly in older buildings, the pre-1919 buildings were originally built with more rooms per unit than the 4.76 figure cited previously. Overall, the authors conclude that this decrease in average dwelling size “was probably more than enough to compensate for the addition of new equipment and facilities since the twenties” (Grebler, Blank, and Winnick 1956, 121).

6.13 Quantifying Quality Change

To summarize our findings on quality change to this point, we found that quality attributes available in the 1975–2003 AHS data but not available in the Weston or Census of Housing data contributed an average of 0.67 percent per year to explaining price change (table 6.6, comparison of columns
[2] and [3]). On balance, the characteristics available for the pre-1970 Census years, primarily rooms per unit and age, exhibit a quality deterioration after 1975 due to increasing age. This result is highly suspect because age does not increase nearly as much in the Census microdata (table 6.9, line 3) as in the AHS data (table 6.2, line 4). Quality change is also measured to occur at an annual rate of about 1.0 percent in the Census microdata for the decade 1960–1970, but at a negligible rate of 0.05 percent per year during 1970–1990. The Weston analysis exhibits no net quality change between 1930 and 1960 because a decline in rooms per unit and an increase in age offsets the benefits of improved plumbing and reduced “dilapidation.” But Weston does not include key aspects of quality improvement reported in the CES budget studies summarized by Brown, who documents a transition from 1918, when most tenant units lacked central heating, half lacked electricity, one-third or more lacked full plumbing facilities, and virtually none had electric appliances, to 1973 when central heating, electricity, full plumbing, and a refrigerator and stove were standard equipment in apartments. How much were these quality improvements worth? Both the Weston data and the Census regressions estimate the value of full plumbing as increasing the log of rent by about 0.6. The AHS regressions for 1975–1985 yield a coefficient of 0.8, while after 1985 the coefficient on plumbing is much lower, presumably because it had become almost universal. At least in principle, the Weston quality change measures incorporate a plumbing effect back to 1930. 
If during 1918–1930 the extent of complete plumbing increased roughly from 0.6 to 0.75, a coefficient of 0.6 would imply a quality improvement of 9 percent, or 0.75 percent per year during the 1918–1930 interval.25 An analogy to the value of central heating can be taken from the example of central air conditioning, for which we have coefficients in the range of 0.05 to 0.17 in table 6.3, averaging out at 0.11. Over the period 1975 to 2003, the percentage of units with central air conditioning in the AHS sample increased from 15 to 46 percent, and this can be translated into an annual rate of improvement of quality of 0.11 percent per year.26 It could be argued that the value of central heat was less than the value of air conditioning as housing units were already heated, albeit inconveniently,
25. A coefficient of 0.6 means that the presence of full plumbing compared to the absence of full plumbing raises the log of rent by 0.6. This full effect of 0.6 would occur if the presence of full plumbing went from zero to 100 percent. A 15 percentage point increase would be 15 percent of this, or 0.15 times 0.6, or 0.09.
26. Following the procedure in the previous footnote, a complete conversion from 0 percent to 100 percent central air conditioning would raise the log value of the average apartment by 0.11. The observed increase of 29 percentage points raised the log value by 0.11 times 0.29, or 0.031, and this occurred over twenty-eight years for an annual rate of improvement of 0.031/28, or 0.11 percent per year.
before central heating became pervasive, whereas before the invention of residential air conditioning around 1950, people just sweltered. The convenience and cleanliness advantage of the transition from coal to fuel oil and natural gas raises the value of central heating, so let us consider a coefficient of 0.25, more than double the average 1975–2003 coefficient of 0.11 on central air conditioning. An increase in the percentage use of central heating from 15 percent in 1918 to 100 percent in 1973 would represent an annual rate of quality improvement of 0.39 percent per year. It is more difficult to speculate about electrification. Once a rental unit had electricity, then households could bring lighting into the home for the cost of a few inexpensive light fixtures. Later on, as home appliances were invented and improved, homes with electricity had access to refrigerators and washing machines. The benefit of electricity must have been as great as that of central heating, say a coefficient of 0.25, implying that the increase in electrification from 50 percent in 1918 to 100 percent by 1950 represented an annual rate of quality change of another 0.39 percent per year. Adding up only these three aspects of quality change, we have for 1918–1930 0.75 for plumbing, 0.39 for heating, and 0.39 for electricity, for a sum of 1.53 percent per year. After 1930 there is no separate adjustment for plumbing, which is taken into account in Weston’s analysis, but the heating and electricity contributions continue, adding up to 0.78 percent per year. Gradually in the 1950s and 1960s, the heating and electricity contributions die out but are replaced by other contributions of quality change, as indicated in our regression analysis of the Census and AHS data.
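The back-of-the-envelope contributions for plumbing, heating, and electrification all follow the same rule: multiply the assumed log-rent coefficient by the change in the share of units with the attribute, then divide by the number of years. A minimal Python sketch of that rule follows; the function name and argument layout are illustrative assumptions, while the coefficients, shares, and dates are those given in the text.

```python
def annual_quality_contribution(coefficient, share_start, share_end,
                                year_start, year_end):
    """Annual quality change in percent implied by the spread of an attribute.

    coefficient: assumed effect of the attribute on the log of rent.
    share_start, share_end: fraction of units with the attribute.
    """
    log_gain = coefficient * (share_end - share_start)
    return 100.0 * log_gain / (year_end - year_start)


# Plumbing: coefficient 0.6, share rising from 0.60 to 0.75 over 1918-1930.
plumbing = annual_quality_contribution(0.6, 0.60, 0.75, 1918, 1930)
# Central heating: coefficient 0.25, share rising from 0.15 to 1.00 over 1918-1973.
heating = annual_quality_contribution(0.25, 0.15, 1.00, 1918, 1973)
# Electrification: coefficient 0.25, share rising from 0.50 to 1.00 over 1918-1950.
electricity = annual_quality_contribution(0.25, 0.50, 1.00, 1918, 1950)

total_1918_1930 = plumbing + heating + electricity
total_after_1930 = heating + electricity

print(round(plumbing, 2), round(heating, 2), round(electricity, 2))
print(round(total_1918_1930, 2), round(total_after_1930, 2))
```

The rule treats the coefficient as constant over time and the diffusion of each attribute as linear within its interval, which is the same simplification the footnoted arithmetic makes.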
Overall, there seems ample evidence to support a rate of quality improvement in rental apartments of 1.0 percent per year, with perhaps a greater rate of improvement in the first half of the twentieth century when the impact of indoor plumbing, electricity, the conversion to central heating and away from coal, and the inclusion of a refrigerator and a stove as standard equipment had their maximum effect. By coincidence, a completely independent analysis of the relationship between rent, age, and maintenance costs of commercial office buildings arrives at an estimated rate of technical progress for structures of 1.0 percent per year (Gort, Greenwood, and Rupert 1999, 225).

6.14 Merging the Prehedonic and Hedonic Results into a Century-Long Perspective

Thus far the discussion in this paper has combined two quite different perspectives on quality change, those based on hedonic regressions from Census data for 1960–1990 and AHS data for 1975–2003, with more impressionistic evidence on the pre-1960 period. Table 6.11 provides a systematic summary of all the results in the paper, with the columns representing the seven subperiods suggested by breaks in the data sources. The first three lines exhibit a summary of the information already presented in
Table 6.11 Summary of results on quality change, other determinants of rent, and implied CPI bias, 1914–2003 (annual growth rates in percent)

                                                  1914–1935  1935–1960  1960–1970  1970–1975  1975–1985  1985–1995  1995–2003
1. CPI                                            0.01       2.37       1.84       4.42       6.56       3.45       3.30
2. Actual mean or median rent                     2.04       4.28       4.63       6.24       8.44       4.53       4.03
3. Difference between CPI and actual rent         –2.03      –1.89      –2.79      –1.82      –1.88      –1.08      –0.73
4. Value of quality change
   a. Plumbing                                    0.39
   b. Central heating                             0.39       0.39       0.39
   c. Electrification                             0.39       0.39
   d. Weston quality index                                   0.09
   e. Census hedonic quality index                                      1.00       0.20
   f. AHS hedonic quality index                                                               0.88       0.40       0.10
5. Total change in value of quality               1.17       0.87       1.39       0.20       0.88       0.40       0.10
6. Other rent determinants (location, subsidy)                          0.10       0.00       –0.25      0.10       0.40
7. Fully adjusted rent (comparable to CPI)        0.87       3.41       3.14       6.04       7.81       4.03       3.63
8. Implied CPI bias                               –0.86      –1.04      –1.30      –1.62      –1.235     –0.58      –0.33

Source: Row (1): CPI from table 6.1, column (1). Row (2): 1914–1935, the average of the growth rate of the Grebler, Blank, and Winnick (1957) index in table 6.1, column (2) for 1914–1935 (1.86) with the Brown index in table 6.1, column (7) for 1918–1935 (2.22). 1935–1960, the average of the growth rate of the CPI for 1935–1940 and for 1940–1960 the average of the growth rates of the Weston mean gross rent data from table 6.1, column (3) with the Crone, Nakamura, and Voith (2003) data in table 6.1, column (4). 1960–1970 from the Census mean contract rent, table 6.1, column (5). 1970–1975 is the growth rate from the 1970 Census figure in table 6.1, column (5) to the AHS figure in table 6.1, column (6). Growth rates after 1975 all come from the AHS data in table 6.1, column (6). Row (3): Row (1) minus row (2). Rows (4a–c): See text discussion. Note that the plumbing adjustment is incorporated in the Weston quality index after 1930, which for this discussion we take to mean 1935. Row (4d): Average growth rate of the two quality indexes in table 6.7 between 1935 and 1960, with 1935 interpolated linearly between 1930 and 1940. Row (4e): Time dummy coefficient for 1970 versus 1960 from table 6.5, column (2) minus column (1). 1970–1975 is taken as the annual growth rate of 1980 versus 1970 from table 6.5, column (2) minus column (1). Row (4f): From table 6.4, the annualized growth rate of the time dummy coefficients in column (4) minus column (1). Row (5): The sum of all rows in section 4. Row (6): For 1960–1970 and 1970–1980, the annualized growth rate of the difference in the time dummy coefficients in table 6.5, column (3) minus column (2). For 1975–2003, the annualized growth rate of the difference in the time coefficients in table 6.4, column (5) minus column (4). Row (7): Row (2) minus the sum of rows (5) and (6). Row (8): Row (1) minus row (7).
table 6.1 about the annual growth rate over these seven intervals in the CPI and mean or median nominal rent unadjusted for changes in quality or location, and the difference between the growth rate in the CPI and average rent, which is negative in all seven periods, with an average difference of about –2 percent over the interval 1914–1985, with a smaller difference after 1985. Line 4 of table 6.11 has six sections that extract from all of our previous results the implied rate of quality change in rental units. For 1914–1935 we have conjectured estimates of the contribution of improved plumbing, central heating, and electrification, based in part on coefficients from hedonic regressions for the post-1960 period. For 1935–1960 we drop the plumbing estimate as plumbing is one of the quality characteristics explicitly controlled in Weston’s approach. He also controls for location, age, and condition, characteristics that may have either a positive or negative influence on rent and which are not taken into account in the pre-1935 period. Our hedonic regressions provide most of the evidence on quality change after 1960, except that we add an explicit allowance for heating, which (along with air conditioning) is one of the variables missing from the Census data used to cover the 1960–1975 period. After 1975 the quality estimates are entirely based on the “stripping exercise” carried out for the AHS data in table 6.4, in which we removed quality variables from the regressions to isolate the separate effects of the Weston quality variables, other quality variables, and the location and subsidy variables. In table 6.11, line 5 sums the various sources of quality change, line 6 adds in the effects of the location and subsidy variables in the hedonic regressions, and the final comparison of lines 1 and 7 provides the bottom-line estimates of the CPI bias, which is uniformly negative in each of the seven periods. The average bias for 1914–2003 is –0.97 percent per year. 
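The row identities of table 6.11 and the multi-period averages quoted in the text can be illustrated with a short Python sketch. The sketch assumes that the averages are year-weighted means of the printed period rates; that weighting is an inference from the numbers, not something the text states. Function and variable names are illustrative, and the rates are transcribed from rows (1), (2), and (8) of the table.

```python
# Period boundaries and annual rates (percent) from table 6.11.
PERIODS = [(1914, 1935), (1935, 1960), (1960, 1970), (1970, 1975),
           (1975, 1985), (1985, 1995), (1995, 2003)]
CPI = [0.01, 2.37, 1.84, 4.42, 6.56, 3.45, 3.30]                   # row 1
MEAN_RENT = [2.04, 4.28, 4.63, 6.24, 8.44, 4.53, 4.03]             # row 2
CPI_BIAS = [-0.86, -1.04, -1.30, -1.62, -1.235, -0.58, -0.33]      # row 8


def implied_bias(cpi, rent, quality, other):
    """Row (8) = row (1) minus row (7), where row (7) = row (2) - (row 5 + row 6)."""
    adjusted_rent = rent - (quality + other)
    return cpi - adjusted_rent


def weighted_average(values, first_year, last_year):
    """Average of period rates weighted by period length in years."""
    total = 0.0
    years = 0
    for (start, end), value in zip(PERIODS, values):
        if start >= first_year and end <= last_year:
            total += value * (end - start)
            years += end - start
    return total / years


# Example of the row identities, using the 1960-1970 column.
bias_1960_1970 = implied_bias(cpi=1.84, rent=4.63, quality=1.39, other=0.10)

print(round(weighted_average(CPI_BIAS, 1914, 2003), 2))
print(round(weighted_average(CPI_BIAS, 1914, 1985), 2))
print(round(weighted_average(CPI_BIAS, 1935, 1985), 2))
print(round(weighted_average(MEAN_RENT, 1914, 2003), 2),
      round(weighted_average(CPI, 1914, 2003), 2))
```

The averages use the printed row (8) values rather than recomputing them from rows (1) through (7), since the rounded entries do not reproduce every printed figure to the last digit.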
For the period before major improvements in CPI methodology, 1914–1985, the average bias is –1.09 percent per year. For the period emphasized by Crone, Nakamura, and Voith as involving the tenant nonresponse problem, represented here by 1935–1985, the average bias is –1.19 percent per year. Over the entire 1914–2003 period, the average annual rates of change are 4.37 percent for mean rent and 2.54 percent for the CPI, and this CPI-rent difference of –1.83 is divided between a 0.86 contribution of quality (including a small contribution of location/subsidy) and a remaining –0.97 estimated CPI bias. Thus our initial conjecture that the 2 percent difference between the growth of mean rent and the CPI might be explained roughly half-and-half by quality and CPI bias appears to be roughly validated by the results.

6.15 A Study of Apartment Rents in a Specific Locality, 1925–1999

A final piece of long-term historical evidence on tenant rents comes from a project designed to collect detailed rent data at the local level in order to assess historical changes in the CPI for rent. This local data set has the advantage that it allows us to control for many types of quality change as discussed previously, including type of heat, electrification, and plumbing equipment. Just as important, by its limitation to a single locality, the resulting index is free of the effects of changing regional and metropolitan location on average rents paid. Evanston, Illinois, is the location for a pilot project to determine the feasibility of this kind of research.27 Most important, data were readily available in the archives of the local suburban newspaper, which has published continuously since the 1920s. In addition, the housing stock in Evanston combines aspects of city and suburb, serving as a microcosm for a range of different types of apartments and houses. The closest northern suburb of Chicago along Lake Michigan, Evanston had a population in 2000 of about 72,000. The population ranges from very wealthy to poor, and homes range from mansions to tiny houses and modest apartments. The city was founded in the mid-1800s and was well established by 1925, the year for which our data begin. These factors allowed us to collect data on tenant rent and prices for a variety of living units over the past seventy-five years. The first phase of our research involved collecting apartment prices over the interval 1925–1999 from classified advertisements in the Evanston Review, a weekly local newspaper. In order to control for quality change, data were collected on apartments for which the advertisement provided detailed descriptions, including number of rooms and bathrooms; proximity of public transportation, schools, or shopping; parking; heat (type and whether included in rent); air conditioning (first appearing in the 1960 ads); and whether anything else was included (such as appliances).
We noted other descriptive attributes, such as wood floors or garden view, and terms such as “luxury building.” Because of space limitations, each ad did not contain information for each of the mentioned categories. When possible, we chose buildings that listed the specific address and only considered unfurnished apartments. Data were collected for every five years from 1925 to 1999. September was chosen as the month for each sample because many buildings advertise at this time, possibly to attract returning college students, although August and October were also used as a supplement if the September issues did not contain enough data. Our ideal was to find the same building addresses repeated from sample to sample. In some instances this was possible, and a “Specific Address” index was compiled. However, for several time periods, insufficient data containing specific address information were available. This was particularly a problem for 1945 and 1950, when there was a housing shortage. This problem affected comparisons for the surrounding periods. 27. This is a summary of Gordon and Mandelkern (2001). See also Mandelkern (2001).
Downward Bias in the Most Important CPI Component

Table 6.12    Evanston apartment rent indexes and CPI, 1925 = 100

                                                                     No. of observations
Year          CPI for rent   Specific address index   Median index   Address index   Median index
1925          100.0          100.0                    100.0          n.a.            16
1930          90.3           122.7                    119.8          10              16
1935          61.9           62.2                     73.3           10              37
1940          68.7           82.1                     84.7           6               35
1945          71.9           108.3                    114.2          n.a.            n.a.
1950          86.1           134.5                    143.8          n.a.            9
1955          103.1          158.9                    169.6          n.a.            25
1960          112.1          155.9                    178.9          6               28
1965          118.5          154.9                    177.3          7               23
1970          134.6          232.3                    257.8          n.a.            16
1975          167.9          335.6                    355.0          3               22
1980          234.2          494.5                    504.9          3               23
1985          320.2          695.9                    694.6          5               20
1990          395.4          846.8                    920.8          11              29
1995          450.6          955.7                    996.8          12              42
1999          506.9          1,087.1                  1,257.6        10              26

Annual growth rates
1925–1950     –0.60          1.19                     1.45
1950–1975     2.67           3.66                     3.61
1975–1999     4.60           4.90                     5.27
1925–1975     1.04           2.42                     2.53
1925–1999     2.19           3.22                     3.42
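As a check on the bottom panel of table 6.12, the following minimal sketch recomputes the average annual growth rates from the index levels. It assumes that the reported rates are log growth rates, that is, 100 times the log of the ratio of end to start levels, divided by the number of years. That reading is inferred from the numbers and is not stated in the text; the function and variable names are illustrative only.

```python
import math

# Index levels (1925 = 100) copied from table 6.12 for the benchmark years.
LEVELS = {
    "CPI for rent": {1925: 100.0, 1950: 86.1, 1975: 167.9, 1999: 506.9},
    "Specific address index": {1925: 100.0, 1950: 134.5, 1975: 335.6, 1999: 1087.1},
    "Median index": {1925: 100.0, 1950: 143.8, 1975: 355.0, 1999: 1257.6},
}

INTERVALS = [(1925, 1950), (1950, 1975), (1975, 1999), (1925, 1975), (1925, 1999)]


def annual_log_growth(series, start, end):
    """Average annual growth in percent, assumed to be a log difference."""
    return 100.0 * math.log(series[end] / series[start]) / (end - start)


for name, series in LEVELS.items():
    rates = [annual_log_growth(series, a, b) for a, b in INTERVALS]
    print(name, [round(rate, 2) for rate in rates])
```

Under this assumption the rounded results agree with the rates reported in the table, for example –0.60 for the CPI in 1925–1950 and 3.22 for the Specific Address index in 1925–1999.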
To analyze our data, we matched apartments as closely as possible over each five-year interval. When possible, we matched apartments in the same building and with the same description (especially number of rooms and bathrooms) so that our resulting rent index is equivalent to the matched-model indexes used in previous research on durable goods, apparel, and computers. We were able to find between three and eleven exact address matches for each interval other than 1925–1930, 1940–1955, and 1965–1970. Because of the small number of matches in some instances and the lack of information in others, we filled in the gaps in the “Specific Address” index by borrowing from the Median index (discussed below). The five-year change in rent for each matching apartment was averaged together with equal weights, yielding a log rent change for each five-year period. This series of changes was then cumulated into the “Specific Address” rent index, which is displayed and compared with the CPI for rent in table 6.12. It is important to note that while our Evanston indexes are matched-model indexes like the CPI, we have the important advantage that we have no problem with tenant nonresponse bias as emphasized by Crone, Nakamura, and Voith (2003). All of the price information that we have collected is based on newspaper ads and thus is obtained directly from landlords, not tenants.

To supplement the first index, we grouped apartments into categories based on the number of rooms for three-, four-, five-, and six-room apartments. To make the sample as accurate as possible, we included as many apartments as we could find data for (generally at least ten, but fewer for the intervals previously mentioned for which data were limited). Starting with the 1960 ads, some ads contained information about the number of bedrooms rather than the number of total rooms. This alternative method of counting rooms extended through 1999 and became the norm in the ads. It was not clear whether an apartment listed only as a “one bedroom” was better averaged with the “three-room” or “four-room” categories. However, many ads included wording such as “one-bedroom, four-room apartment” during the transitional years. By using this transitional information and by comparing listed rents, we decided to convert between the listings on the basis that X bedrooms equals (X + 3) rooms.

After compiling the mean data for three-, four-, five-, and six-room apartments for 1925–1999, we used the same raw data to compile several other indexes. In the years from World War II to the present, there were sometimes insufficient listings for three-room and six-room apartments. To make up for this, we compiled an index including only four- and five-room apartments (for which data were plentiful). To compare with our other indexes, we also compiled an index using the median, instead of the mean, for three-, four-, five-, and six-room apartments. Because the median, mean, and the four–five-room indexes were very close, table 6.12 displays only the Specific Address index and the Median index for three-, four-, five-, and six-room apartments.
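The matching and cumulation steps described above can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that the equal-weighted average is taken over the log rent changes of the matched apartments, and it reads the bedroom-to-room conversion as X + 3, consistent with the “one-bedroom, four-room” example. The rents and function names are hypothetical.

```python
import math
from statistics import mean


def bedrooms_to_rooms(bedrooms):
    """Convert a bedroom count to a total room count (assumed X + 3 rule)."""
    return bedrooms + 3


def matched_log_change(pairs):
    """Equal-weighted mean log rent change over matched (earlier, later) rents."""
    return mean(math.log(later / earlier) for earlier, later in pairs)


def chain_index(log_changes, base=100.0):
    """Cumulate per-interval log changes into index levels starting at base."""
    levels = [base]
    for change in log_changes:
        levels.append(levels[-1] * math.exp(change))
    return levels


# Hypothetical matched monthly rents for two consecutive five-year intervals.
interval_1 = [(50.0, 60.0), (40.0, 48.0)]  # both rents rise by 20 percent
interval_2 = [(60.0, 30.0), (48.0, 24.0)]  # both rents fall by half

index = chain_index([matched_log_change(interval_1),
                     matched_log_change(interval_2)])
print([round(level, 1) for level in index])
```

With these made-up rents the index moves from 100 to 120 and then to 60. Because each interval uses only apartments observed at both dates, changes in the mix of advertised apartments do not enter the index, which is the sense in which it is a matched-model index.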
Differences between the CPI and the two new apartment rent indexes are summarized at the bottom of table 6.12, which displays average annual growth rates over the intervals 1925–1950, 1950–1975, and 1975–1999. Differences between the two new rent indexes are relatively minor, and both display growth rates faster than that of the CPI in all three periods. The difference for the Specific Address index is 1.78 percent per year in 1925–1950, 0.98 percent per year in 1950–1975, and a much smaller 0.29 in 1975–1999. The average annual growth rate for the entire period is 1.03 percent faster than the CPI for the Specific Address index and 1.23 percent faster for the Median index. The primary weakness in the new rent indexes is the potential for unmeasured quality change. Presumably the Specific Address index is more accurate than the Median index. The most important types of quality differences among apartments are carefully controlled in the new indexes, especially number of rooms, bathrooms, location, and presence or absence of air conditioning. There may be some downward bias because the indexes
do not make any explicit allowance for age, and many of the apartments were new in the 1920s and more than seventy years old in 1999. While this source of bias was corrected after 1988, it has been estimated that the downward bias for aging in the CPI prior to 1988 is 0.3 percent per year (Randolph 1988). Because our new indexes share with the CPI the method of following the same apartments over time, they share both the aging bias and also the lack of explicit allowance for renovations and modernization that may largely or entirely offset the aging bias. Overall for the 1925–1975 period, the differences between the CPI and our two indexes are –1.38 and –1.49 percent per year, respectively, and these compare with the average CPI bias in table 6.11 for 1914–1975 of –1.07 percent. The smaller bias in the national results in table 6.11 could indicate that rents have risen faster in Evanston than in the nation as a whole or that our Evanston indexes may miss some elements of quality change.

6.16 Conclusion

We have examined a wide variety of data on the historical behavior of tenant rents over the entire history of the CPI from 1914 to 2003. We began from the hypothesis that the CPI is biased downward over its history and have linked that hypothesis to complementary work on CPI methodology by Crone, Nakamura, and Voith (2003a) that traces the downward bias primarily to nonresponse by tenants who moved just as rents were raised. Crone, Nakamura, and Voith (2003a) pinpoint the period of greatest bias as 1942 to 1988, and in our data the CPI rises less rapidly than mean or median contract rent, by exactly 2.00 percent per year between 1940 and 1987.28 Our initial examination of data finds that the 2 percent difference extends to other time periods and data sources, as summarized in the bottom section of table 6.1. The difference was much less after 1987, reflecting presumably an improvement in CPI methodology.
A difference, however large, does not imply a bias in the CPI if quality change was sufficiently rapid. We have gathered a rich set of data sources to assess the importance of quality change in rental housing units over our long historical period of study. We begin with a hedonic regression analysis on a large set of panel data from the American Housing Survey (AHS) covering 1975–2003. Our primary focus is on understanding the contribution of quality characteristics to differences between estimated hedonic price indexes and raw unadjusted changes in apartment rent. We segregate the explanatory variables into traditional quality measures (number of rooms, age, and presence or absence of full plumbing), nontraditional quality characteristics, and variables for regional location and government subsidies that do not themselves measure quality. We find that the traditional quality measures contribute little, or even a negative amount, to the explanation of price change, primarily because of large increases in the age of apartment units that may be partly spurious. The nontraditional quality characteristics consistently contribute about 0.7 percent per year to the explanation of price change.
28. See table 6.1, where we take for 1940 the average of the values in columns (3) and (4).

The major challenge in the paper is to assess the importance of quality change prior to the beginning of the AHS data in 1975. We create an overlap measure of quality-adjusted price change from Census of Housing microdata for 1960–1990. The Census data have the defect that they are limited to the traditional quality measures, and these yield an estimated rate of quality increase of 1.0 percent per year for 1960–1970 but negligible rates after that, at least in part because of the influence of the increasing age of rental units. Also available for the pre-1975 period is Weston’s study based on Census data for 1930–1970. We extract a price and quality index from his data, and these indicate virtually no quality change between 1930 and 1960 and then a rapid rate of about 1.50 percent per year for 1960–1970. Aspects of the Census data look more plausible to us for the 1960–1970 period, and we prefer the Census quality change estimate of 1.0 percent for that decade. For earlier periods, we rely on two types of analysis. First, we rely on Weston’s cross-classification of rents and quality characteristics to develop a basic measure of quality change for 1930–1960. Second, we stitch together data on the diffusion of important quality attributes of rental units, including plumbing, heating, and electrification, over the period 1918–1973. Applying guesstimates about the value of these attributes based in part on the post-1960 hedonic regression coefficients, we conclude that quality change in the 1918–1973 period must have been substantial.
Our guesstimates yield larger estimates of the growth rate of quality as we move further back because the impact of indoor plumbing was largely completed by 1935 and that of electrification by 1950. As summarized in table 6.11, we estimate that quality improved at an annual rate of about 1.2 percent during 1914–1935 and 0.9 percent during 1935–1960. Our final piece of evidence is based on a study of rents in a single local community, Evanston, IL, covering the period 1925–1999. Here we control for location effects by limiting the project to a single small area and control for such quality attributes as number of rooms, number of bathrooms, type of building, heating, and air conditioning. One of our indexes is analogous to repeated-sales indexes of housing prices (Case and Shiller 2003), in that it measures changes in rent for apartments having the same specific street address over time. This study yields a difference between the CPI and the two Evanston indexes of –1.38 and –1.49 percent per year for 1925–1975, about –0.4 percent more than the CPI bias estimates based on the nationwide data. Our overall conclusions are surprisingly consistent that the CPI bias was roughly –1.0 percent prior to the methodological improvements in the CPI
that date from the mid-1980s.29 Our reliance on a wide variety of methodologies and of evidence on types of quality change and their importance, while leaving the outcome still uncertain, at least in our view substantially narrows the range of possibilities regarding the history of CPI bias for rental shelter over the twentieth century.
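The decomposition that runs through these conclusions, in which the gap between the raw change in rent and a hedonic price index is attributed to quality change, can be illustrated with a minimal two-period sketch. It assumes a time-dummy hedonic regression of log rent on a single characteristic (rooms), which is a simplification and not the specification estimated in this chapter. The sample, the coefficient values, and the helper names are hypothetical, and the data are noise-free so that the regression recovers the assumed coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical sample: (rooms, later-period dummy). Later units are larger.
units = [(3, 0), (4, 0), (5, 0), (4, 1), (5, 1), (6, 1)]
# Synthetic log rents from assumed coefficients: 0.10 per room, 0.05 price change.
log_rent = [5.0 + 0.10 * rooms + 0.05 * later for rooms, later in units]

x_rows = [[1.0, float(rooms), float(later)] for rooms, later in units]
intercept, room_coef, price_change = ols(x_rows, log_rent)

early = [y for y, (_, later) in zip(log_rent, units) if later == 0]
late = [y for y, (_, later) in zip(log_rent, units) if later == 1]
raw_change = sum(late) / len(late) - sum(early) / len(early)
quality_change = raw_change - price_change  # part attributed to more rooms
print(round(raw_change, 3), round(price_change, 3), round(quality_change, 3))
```

In this made-up sample, mean log rent rises by 0.15, of which the time dummy attributes 0.05 to price and the remaining 0.10 to the increase in average rooms. The same subtraction underlies the statement that a large raw difference implies no bias if quality change is rapid enough.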
Data Appendix

American Housing Survey (AHS)

This paper uses fifteen cross-sections of American Housing Survey30 microdata for 1975–2003, courtesy of the Inter-University Consortium for Political and Social Research, and the U.S. Department of Housing and Urban Development. The AHS provides detailed cross-sectional microdata in two survey forms, metropolitan and national. The metropolitan survey is conducted during even years and the national survey in odd years. This study makes exclusive use of the national survey. Each year a consistent basic panel is sampled and units are followed year to year whenever possible. Panels are updated for new construction in areas where building permits are required and units missed in the reference census year. Interviews were done in person on paper form until 1997 when laptops were introduced to enhance speed and accuracy in data collection. The resulting data sets provide a robust set of characteristic and quality variables that are well suited for the estimation of hedonic price equations.

The original 1973–1983 AHS panel was based on the 1970 Census of Housing. In 1985, the panel and survey form were redesigned to improve data quality and incorporate the 1980 Census results. This basic 1985 panel has been used every year since.

Data Quality Issues in the American Housing Survey

The most important variable for our analysis is clearly rent. The AHS records contract rent in a continuous fashion from $0 up to a different topcode in each year. Although this will inevitably cut off the tail of the distribution, it is unlikely to adversely influence our results. Units in the highest price echelon are likely to have highly specialized attributes that cannot be recorded in basic characteristic data and thus cannot be priced by a traditional hedonic approach.
29. Hence we reject the Crone, Nakamura, and Voith (2003a) conclusion of a bias of roughly 1.8 percent between 1940 and 1985 as excessive and making insufficient allowance for quality change.
30. Before 1983, the AHS was known as the Annual Housing Survey. We use only the new title in this work.
Robert J. Gordon and Todd vanGoethem
The year a unit was built is not continuous in the AHS. Irregularly shaped bins are used in place of discrete years. The 1973–1983 panel has six such bins, and the 1985–present panel has nine. Our calculations estimate a unit’s approximate age using the midpoint of each bin. The last bin is unbounded and creates a catchall for older units. End bins were problematic; their final coding treats the end bins as if they were the same size as the earlier bins. The approximate age variable cannot be viewed as an ideal measure of mean unit age. While the first panel was in use, between 35 and 45 percent of all rental units fell into the end bin, making age estimates very susceptible to the approximation. The problem is ameliorated in the 1985–present panel by the introduction of more bins covering older build dates.

While generally of very high quality, the AHS data occasionally suffer when a malformed survey question creates double counting or, conversely, undercounting. For example, before 1984 respondents were asked a single question asking for the total count of rooms. This caused acute underreporting of rooms because the definition of exactly what constituted a room was unclear. When the survey was redesigned the definition was established, and the current counts are more accurate.

Differences between the 1975–1983 and 1985–present surveys make some variables noncomparable. Those describing a unit’s location relative to a city or metro area changed due to the methods used to assign status as within a metropolitan area. Privacy concerns previously disallowed identification in any area with a population under 250,000 persons. This rule was relaxed to any area under 100,000. Similarly, the plumbing data were made useless in 1985 when a malformed survey question produced unreliable answers. This resulted in an unreasonable drop (and subsequent rise upon correction) in the number of units with incomplete plumbing facilities.
Also particularly problematic in the first panel are the data on neighborhood characteristics. Respondents were asked if certain attributes— for example, crime, litter, and noise—were earmarked as bothersome instead of merely present, thereby making the measurement of these already difficult to measure characteristics nearly impossible. Surveyors were also instructed to collect some neighborhood variables for certain kinds of dwelling units. This makes comparisons for variables such as having crime, litter, and noise problems unreliable. Our work includes these variables but focuses more on unmeasured quality change due to basic characteristic variables. Crone, Nakamura, and Voith (2003b) came to a similar conclusion with respect to the AHS’s coverage of these variables. At a late stage of this research, we determined that it was impossible to treat post-1985 as a single panel because of data discontinuities between 1995 and 1997. This accounts for the fact that our results are presented as three sets of regressions (1975–1983, 1985–1995, and 1997–2003) with no regressions spanning 1983–1985 and 1995–1997. As discussed in the text and tables, we use the CPI for those intervals, thus assuming that the CPI was
an accurate measure of rent changes during those two pairs of years and hence biasing toward zero our final estimates of the difference between CPI growth and the growth of an alternative hedonic price index fully adjusted for quality, location, and subsidies. Further discussion of the background and reasons for the 1995–1997 data discontinuity can be found at http://www.huduser.org/intercept.asp?loc/Datasets/ahs/docchg1997.pdf.

Decennial Census Data Microdata

To make comparisons to older measures of quality change and specifically Rafael Weston’s PhD thesis, our study makes use of Census of Housing microdata files spanning 1960–1990. These data are used courtesy of the University of Minnesota at Minneapolis’s Historical Census Project. The Integrated Public Use Microdata Series (Ruggles et al. 2003) provides easily accessible data sets and codebooks and maintains information on the comparability of each variable in their series over time. Compared to the AHS, the Census data do not contain nearly as robust a set of variables and are thus less useful for understanding the breakdown of quality change over time. The longer time span of the Census data, however, allows us to extend the analysis further back in time with relative ease. Rent and age information are encoded into discrete bins similarly to the AHS’s build-year variable. This creates artificially low variability in the continuous estimated rent and age variables used in the hedonic price regressions. This is responsible for the very high level of explained variation seen in each of the Census regressions.
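The bin-midpoint approximation used for unit age in the AHS, and similarly for binned rent and age in the Census data, can be sketched as follows. The bins below are hypothetical and do not reproduce the actual AHS or Census coding. The ten-year width assigned to the open-ended bin is likewise an assumption, chosen to illustrate treating the end bin as if it were the same size as the earlier bins.

```python
def bin_midpoint(lower, upper, assumed_width=10):
    """Midpoint build year of a bin; an open-ended bin borrows assumed_width."""
    if lower is None:  # catchall bin for the oldest units
        lower = upper - assumed_width + 1
    return (lower + upper) / 2.0


def approximate_age(survey_year, lower, upper, assumed_width=10):
    """Approximate unit age in years from the midpoint of its build-year bin."""
    return max(0.0, survey_year - bin_midpoint(lower, upper, assumed_width))


# Hypothetical build-year bins (lower, upper); None marks the unbounded end bin.
BINS = [(None, 1939), (1940, 1949), (1950, 1959),
        (1960, 1964), (1965, 1969), (1970, 1975)]

ages = [approximate_age(1975, lower, upper) for lower, upper in BINS]
print(ages)
```

Under these hypothetical bins, a unit built in 1900 would be assigned an age of 40.5 years in a 1975 survey rather than 75, which illustrates why a large share of units in the end bin makes the age estimate sensitive to the approximation.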
References

Armknecht, Paul A., Brent R. Moulton, and Kenneth J. Stewart. 1995. Improvements to the food at home, shelter, and prescription drug indexes in the U.S. Consumer Price Index. BLS Working Paper no. 263. Washington, DC: Bureau of Labor Statistics. Bajari, Patrick, C. Lanier Benkard, and John Krainer. 2003. House prices and consumer welfare. NBER Working Paper no. 9783. Cambridge, MA: National Bureau of Economic Research. Boskin, Michael J., Ellen R. Dulberger, Robert J. Gordon, Zvi Griliches, and Dale Jorgenson. 1996. Toward a more accurate measure of the cost of living. Final Report of the Advisory Commission to Study the Consumer Price Index. Washington, DC: Government Printing Office. Brown, Clair. 1994. American standards of living: 1918–88. Oxford, UK: Blackwell Publishers. Case, Karl E., and Robert J. Shiller. 2003. Is there a bubble in the housing market? Brookings Papers on Economic Activity 34 (2). Crone, Theodore M., Leonard I. Nakamura, and Richard Voith. 2003a. The CPI for rents: A revisionist history. Paper presented at Brookings Workshop on Economic Measurement.
———. 2003b. Regression-based estimates of rental increases. Working Paper, April. Díaz, Antonia, and María José Luengo-Prado. 2003. On the user cost and home ownership. Northeastern University, Working Paper. Dougherty, A., and R. Van Order. 1982. Inflation, housing costs, and the consumer price index. American Economic Review 72 (March): 154–64. Genesove, David. 1999. The nominal rigidity of apartment rents. NBER Working Paper no. 7137. Cambridge, MA: National Bureau of Economic Research, May. Goodman, Jack. 2003. Performance across local apartment markets. http:// www.nmhc.org/Content/ServeContent.cfm?IssueID86&ContentItemID1253 &siteAreaResources,Resources. Gordon, Robert J. 1990. The measurement of durable goods prices. Chicago: University of Chicago Press. ———. 2004. Apparel prices 1914–93 and the Hulten-Bruegel paradox. Paper presented at the CRIW Conference on Price Index Concepts and Measurement, Vancouver, CA. http://www.faculty-web.at.northwestern/economics/gordon/ apparel.pdf. Gordon, Robert J., and Gail Mandelkern. 2001. Local indexes of apartment rent and house sale prices. Working Paper, January. Gort, Michael, Jeremy Greenwood, and Peter Rupert. 1999. Measuring the rate of technological progress in structures. Review of Economic Dynamics 2:207–30. Grebler, Leo, David M. Blank, and Louis Winnick. 1956. Capital formation in residential real estate. Princeton, NJ: Princeton University Press. Greenlees, John. 2003. U.S. Consumer Price Index: Changes in the cost of shelter. Presentation at Brookings Workshop on Economic Measurement. Hulten, Charles R. 1997. Comment. In The economics of new goods, ed. Timothy F. Bresnahan and Robert J. Gordon, 66–70. Chicago: University of Chicago Press. Lane, Walter F., William C. Randolph, and Stephen A. Berenson. 1988. Adjusting the CPI shelter index to compensate for effect of depreciation. Monthly Labor Review 111 (October): 34–37. Mandelkern, Gail. 2001. 
Calculating a price index for residential rent: Methods of controlling for quality change bias. Senior thesis, Northwestern University. Meese, Richard, and Nancy E. Wallace. 1991. Nonparametric estimation of dynamic hedonic price models and the construction of residential housing price indices. Journal of the American Real Estate and Urban Economics Association 19:308–32. ———. 1997. The construction of residential housing price indices: A comparison of repeat sales, hedonic regression, and hybrid approaches. Journal of Real Estate Finance and Economics 14:51–73. Moulton, Brent R., and Karin E. Moses. 1997. Addressing the quality change issue in the Consumer Price Index. Brookings Papers on Economic Activity, Issue no. 1:305–49. Nordhaus, William D. 1997. Do real-output and real-wage measures capture reality? The history of lighting suggests not. In The economics of new goods, ed. Timothy F. Bresnahan and Robert J. Gordon, 29–66. Chicago: University of Chicago Press. Placek, Frank, and Robert M. Baskin. 1996. Revision of the CPI housing sample and estimators. Monthly Labor Review (December): 31–39. Randolph, William C. 1988. Housing depreciation and aging bias in the Consumer Price Index. Journal of Business and Economic Statistics 6 (1): 359–71. Ruggles, Steven, Matthew Sobek, et al. 2003. Integrated Public Use Microdata Series: Version 3.0. Minneapolis: University of Minnesota, Historical Census Project.
Sheppard, Stephen. 1999. Hedonic analysis of housing markets. In Handbook of regional and urban economics, ed. E. S. Mills and P. Cheshire, 1596–1635. Elsevier. Sinai, Todd, and Nicholas S. Souleles. 2003. Owner-occupied housing as a hedge against rent risk. NBER Working Paper no. 9482. Cambridge, MA: National Bureau of Economic Research, January. Triplett, Jack E. 1971. Quality bias in price indexes and new methods of quality measurement. In Price indexes and quality change: Studies in new methods of measurement, ed. Zvi Griliches, 180–214. Cambridge, MA: Harvard University Press. U.S. Bureau of the Census. 1957. Historical statistics of the United States: Colonial times to 1957. Washington, DC: U.S. Bureau of the Census. U.S. Bureau of the Census. (Various years). Statistical abstract of the United States. Washington, DC: U.S. Bureau of the Census. U.S. Bureau of Labor Statistics. 1935. Housing conditions in American cities. Monthly Labor Review 40 (March): 724. Wallace, Nancy E. 1996. Hedonic-based price indexes for housing: Theory, estimation, and index construction. Economic Review of the Federal Reserve Bank of San Francisco (3): 35–48. Weston, Rafael R. 1972. The quality of housing in the United States, 1929–70. PhD diss., Harvard University.
7
Pricing at the On-Ramp to the Internet
Price Indexes for ISPs during the 1990s
Greg Stranger and Shane Greenstein
7.1 Introduction

Prior to commercialization, the Internet was available only to researchers and educators. Less than a decade after commercialization, more than half the households in the United States were online according to the National Telecommunications and Information Administration (NTIA; 2001). The Internet access industry generated 15.6 billion dollars in revenue in 2001 (U.S. Department of Commerce 2003, 733). This growth presents many challenges for measuring the contribution of the Internet to gross domestic product (GDP). In this study we consider the formulation of consumer price indexes for commercial Internet access. We focus on constructing an index for the earliest period of growth of dial-up service, when the challenges for index construction are greatest.

No simple measurement strategy will suffice for formulating price indexes for Internet activity. On average, more than two-thirds of time online is spent at so-called free sites. Many of these are simply browserware or Usenet clubs for which there is no explicit charge. Some of these are partly or fully advertising-supported sites. Households also divide time between
Greg Stranger is a manager at the Boston Consulting Group. Shane Greenstein is the Elinor and Wendell Hobbs Professor in the Management and Strategy Department of the Kellogg School of Management at Northwestern University, and a research associate of the National Bureau of Economic Research. This paper is partly derived from Stranger’s PhD dissertation at Northwestern University. We thank David Dranove, Barbara Fraumeni, Zvi Griliches, Brent Moulton, Mark Roberts, Scott Stern, Jack Triplett, an anonymous reviewer, and especially Ernst Berndt for comments and suggestions. We received funding from the National Science Foundation, the Bureau of Economic Analysis, and the Kellogg School of Management. All errors are our responsibility.
activities that generate revenue directly from use. For example, most electronic retailing does not charge for browsing but does charge per transaction. Other media sites, such as pornography, newspaper archival, and some music, charge directly for participation (Goldfarb 2004). There is one place, however, where almost every household transacts money for service. Internet service providers (ISPs) provide the point of connection for the vast majority of household users, charging for such a connection. From the outset of commercialization, most users moved away from ISPs at not-for-profit institutions, such as higher education (Clemente 1998). Far more than 90 percent of household use was affiliated with commercial providers (NTIA 2001). This continues today. In this paper we investigate the pricing behavior at ISPs from 1993 to 1999 with the goal of generating price indexes. We begin with the earliest point when we could find data, 1993, when the commercial ISP market was still nascent. We stop in 1999 for a number of reasons. For one, the industry takes a new turn with the AOL/Time Warner merger in early 2000, an event that we believe alters strategies for accounting for qualitative change. Second, until the merger many industry sources indicate that all online providers followed the same technological trajectory. This helps us construct indexes without data on market share, which we lack. Third, and somewhat independently, broadband began to diffuse just near the end of our sample. After a few years, it connected enough households to influence Internet price indexes and would require us to alter the procedures carried out in this paper. Finally, spring 2000 marks the end of unqualified optimism about the persistence of the Internet boom. This change in mood was affiliated with restructuring of the ISP industry, potentially bringing about a marked departure in price trends. 
Using a new data set about the early period, we compute a variety of price indexes under many different methods. The results show that ISP pricing has been falling rapidly over time. The bulk of the price decline is in the early years of the sample, especially between early 1995 and the spring of 1996. We also find a 20 percent decline in price per unit of ISP quality for the thirty-three-month period between late 1996 and early 1999. We assess alternative models that vary in their attention to aspects of qualitative change. We find that this attention matters. Accounting for qualitative change shapes the estimates of price declines and the recorded timing of those declines.

This paper is the first to investigate a large sample of U.S.-based ISPs. This setting gives rise to a combination of familiar and unique challenges for measurement. This novelty and challenge should be understood in context. There have been many papers on hedonic price indexes in electronic goods (Berndt and Griliches 1993; Berndt, Griliches, and Rappaport 1995; Berndt and Rappaport 2001) and new industries, such as automobiles (Griliches 1961; Raff and Trajtenberg 1997). We borrow many lessons learned from those settings (see Berndt [1991] for an overview). There is also another paper about prices at Canadian ISPs (see Prud’homme and Yu 1999), which has some similarities to our setting, though involving many fewer firms.

This is one of the first papers to investigate and apply these hedonic methods to estimate price indexes for a service good. In this setting, physical attributes are not key features of the service, but features of the contract for service are. These features can improve quite rapidly from one year to the next as contracting modes change, as new entrants experiment with new service models for delivery, and as technological change alters the scope of possible services available to ISPs. Our primary goal is to understand hedonic price indexes in such an evolving market.

Many, but not all, ISPs offer more than one type of contract for service. In our data there is no one-to-one association between firm and the features of service. This provides some challenges for measurement, as well as some opportunities. We compare alternative ways to control for unobserved quality at the level of the ISP. This is another novelty, albeit a small one for the results.

We view this paper as one small step in a much larger research enterprise, measuring the economic changes brought about by the diffusion of and improvement in the Internet. There is much distance between our historical exercise and an ideal cost-of-living index for the Internet (Greenstein 2002). During the time period under examination, the Internet underwent dramatic changes. The quality of what users got from the Internet skyrocketed. Said another way, what the user did with the service they got from an ISP also changed dramatically over this time period. We measure only a small piece of that dramatic change in experience.
7.2 A Brief History of Internet Service Providers in the United States
The Internet began as a defense department research project to develop networking technologies more reliable than existing daisy-chained networks. The first product of this research was the Advanced Research Projects Agency Network (ARPAnet). Stewardship was handed to the National Science Foundation (NSF) in the mid-1980s, which established NSFnet, another experimental network for universities and their research collaborators. The NSF’s charter prohibited private users from using the infrastructure for commercial purposes, which was not problematic until the network grew. By 1990, the transmission control protocol/Internet protocol (TCP/IP) network had reached a scale that would shortly exceed NSF’s needs. For these and related reasons, the NSF implemented a series of steps to privatize the Internet. These steps began in 1992 and were completed by 1995. Diffusion to households also began to accelerate around 1995, partly as a consequence of these steps and partly because of the commercialization and diffusion of an unanticipated innovation, the browser (Greenstein 2001).
7.2.1 The Origins of Internet Functionality and Pricing
A household employs commercial Internet providers for many services, most of which had their origins in the ARPAnet or NSFnet. The predominant means of communication is e-mail. The e-mail equivalent of bulk mail is called a listserv, where messages are distributed to a wide audience of subscribers. These listservs are a form of conferencing that is based around a topic or theme. Usenet or newsgroups are the Internet equivalent of bulletin board discussion groups. Messages are posted for all to see, and readers can respond or continue the conversation with additional postings. Chat rooms serve as a forum for real-time chat. “Instant messaging” has gained increased popularity, but the basic idea is quite old in computing science: users can communicate directly and instantaneously with other users in private chatlike sessions.
Some tools have been supplanted, but the most common are World Wide Web (WWW) browsers, gopher, telnet, file transfer protocol (ftp), archie, and wais. Browsers and content have grown in sophistication from the one-line interface designed by Tim Berners-Lee, beginning with Lynx, then Mosaic, and, more recently, Netscape Navigator, Internet Explorer, and the open-source browser, Opera. The Internet and WWW are now used for news and entertainment, commerce, messaging, research, application hosting, videoconferencing, and so on. The availability of rich content continues to grow, driving demand for greater bandwidth and broadband connectivity.
Pricing by ISPs requires a physical connection. The architecture of the Internet necessitates this physical connection. Under both the academic and the commercial network, as shown in figure 7.1, the structure of the Internet is organized as a hierarchical tree. Each layer of connectivity is dependent on a layer one level above it.
The connection from a computer to the Internet reaches back through the ISP to the major backbone providers. The lowest level of the Internet is the customer’s computer or network. These are connected to the Internet through an ISP. An ISP will maintain its own subnetwork, connecting its points of presence (POPs) and servers with Internet protocol (IP) networks. These local access providers derive their connectivity to the wider Internet from other providers upstream, either regional or national ISPs. Regional networks connect directly to the national backbone providers. Private backbone providers connect to public (government) backbones at network access points.
Fig. 7.1 The organization of the Internet
7.2.2 The Emergence of Pricing and Services at Commercial Firms
An ISP is a service firm that provides its customers with access to the Internet. There are several types of “access providers.” At the outset of the industry, there was differentiation between commercial ISPs, “online service providers” (OSPs; Meeker and Dupuy 1996), and firms called “commercial online services” by Krol (1992). Internet service providers offer Internet access to individual, business, and corporate Internet users, offering a wide variety of services in addition to access, which will be discussed in the following. Most OSPs evolved into ISPs around 1995–1996, offering the connectivity of ISPs with a greater breadth of additional services and content.
Most households physically connect through dial-up service, although both cable and broadband technologies gained some use among households near the end of the millennium.1 Dial-up connections are usually made with local toll calls or calls to a toll-free number (to avoid long-distance charges). Corporations often make the physical connection through leased lines or other direct connections. Smaller firms may connect using dial-up technology. These physical connections are made through the networks and infrastructure of competitive local exchange companies, incumbent local exchange companies (such as Regional Bell Operating Companies), and other communications firms. Large ISPs may maintain their own network for some of the data traffic and routing; the largest firms often lease their equipment to other ISPs for use by their customers. Smaller ISPs are responsible for the call handling equipment (modems, routers, access concentrators, etc.) and their own connections to the Internet, but in some locations they may lease services for traveling customers.
1. Approximately 5 percent of U.S. households subscribed to a broadband connection as of 2000; see NTIA (2001).
Charging for access occurs at the point of access by phone. Internet service providers generally maintain POPs where banks of modems let users dial in with a local phone call to reach a digital line to the Internet. Regional or national ISPs set up POPs in many cities, so customers do not have to make a long-distance call to reach the ISP offices in another town. Commercial online services, such as America Online, have thousands of POPs across many countries that they either run themselves or lease through a third party.
Many ISPs provide services that complement the physical connection. The most important and necessary service is an address for the user’s computer. All Internet packet traffic has a “from” and “to” address that allows it to be routed to the right destination. An ISP assigns each connecting user an address from its own pool of available addresses. ISPs offer other services in addition to the network addresses. These may include e-mail servers, newsgroup servers, portal content, online account management, customer service, technical support, Internet training, file space and storage, Web-site hosting, and Web development and design. Software is also provided, either privately labeled or by third parties. Some of it is a standard component of the ISP contract (Greenstein 2000b; O’Donnell 2001). Some ISPs also recommend and sell customer equipment they guarantee will be compatible with the ISP’s access equipment.
Internet service providers differ in size. The national private backbone providers (e.g., MCI and Sprint) are the largest ISPs. The remaining ISPs range in size and scale from wholesale regional firms down to the local ISP handling a small number of dial-in customers. There are also many large national providers that serve the entire country.
Many of these are familiar names such as Earthlink/Sprint, AT&T, IBM Global Network, Mindspring, Netcom, PSINet, and so on. The majority of providers provide limited geographic coverage. A larger wholesale ISP serves all ISPs further up the connectivity chain. Local ISPs derive connectivity from regional ISPs who connect to the national private backbone providers. A large dial-up provider may have a national presence with hundreds of POPs, while a local ISP may serve a very limited geographic market.
It is difficult to describe modal pricing behavior for ISPs over time. The most likely date for the existence of the first commercial ISPs is 1991–1992, when the NSF began to allow commercialization of the Internet.2 In one of the earliest Internet handbooks, Krol (1992) lists 45 North American providers (eight have national presence). In the second edition of the same book, Krol (1994) lists 86 North American providers (10 have national presence). Marine et al. (1993) lists 28 North American ISPs and six foreign ISPs. Schneider (1996) lists 882 U.S. ISPs and 149 foreign ISPs. Meeker and Dupuy (1996) reports that there are over 3,000 ISPs, and the Fall 1996 Boardwatch Magazine’s Directory of Internet Service Providers lists 2,934 firms in North America. This growth was accompanied by vast heterogeneity in service, access, and pricing. Descriptions of regional and wholesale connectivity (see Boardwatch Magazine’s Directory of Internet Service Providers [1996]) imply that contracts are short term.
2. PSINet, a now bankrupt ISP, used to claim that it was the first commercial ISP, offering connection in 1991, though many others have also made a similar claim. The history is cloudy because it is unclear whether the NSF “allowed” connection or some firms connected in violation of the restrictions against commercial behavior and, needing an excuse to privatize, NSF figured out how to accommodate such behavior.
7.2.3 Pricing Behavior at Commercial Firms and How It Changed
Prior to the Internet, there were many bulletin boards and other private networks. The bulletin boards were primarily text-based venues where users with similar interests connected, exchanged e-mail, downloaded or uploaded files, and occasionally participated in chat rooms. The private networks or OSPs (e.g., AOL, CompuServe, Genie, and Prodigy) had similar functionality, with segregated content areas for different interests. Users could post and download files and read and post interest group messages (similar to today’s Internet newsgroups, but usually moderated). These forums (as they were called on CompuServe) were often centered on a specific topic and served as a customer service venue for companies.
The pricing structure of the majority of these services was a subscription charge (on a monthly or yearly basis) and possibly an hourly fee for usage. At this early stage, circa 1992–1993, most users would batch together the work they needed to do online, connect, and quickly upload and download files, e-mail, and messages. Then they would disconnect, minimizing time online. Specialized software existed to facilitate this process.
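To make the incentive to batch work concrete, the sketch below computes a monthly bill under a subscription charge plus an hourly usage fee. The function name and all dollar figures are hypothetical, chosen only for illustration; they are not drawn from the data.

```python
def monthly_bill(subscription, hourly_fee, hours_online):
    """Two-part tariff: flat subscription plus a per-hour usage charge."""
    if hours_online < 0:
        raise ValueError("hours_online must be non-negative")
    return subscription + hourly_fee * hours_online


# Hypothetical figures (assumptions for illustration, not from the chapter).
SUBSCRIPTION = 10.00  # dollars per month
HOURLY_FEE = 3.00     # dollars per hour online

batched = monthly_bill(SUBSCRIPTION, HOURLY_FEE, hours_online=5)
browsing = monthly_bill(SUBSCRIPTION, HOURLY_FEE, hours_online=30)
print(f"batched use: ${batched:.2f}, extended browsing: ${browsing:.2f}")
```

Under these assumed rates, each extra hour online adds the full hourly fee to the bill, which is why short, batched sessions were the economical way to use such a service.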
When ISPs first commercialized in 1992, there were similar expectations that users would continue to use the Internet in such bursts of time. Because much of the usage was for uploading and downloading, it was sensible to charge more for faster access. Pricing by speed is close to pricing by volume (or pricing for traffic). Consequently, many ISPs varied the hourly charge based on the speed of the connection. In the early 1990s, speeds moved from 300 bits per second (bps) to 1,200; 2,400; 4,800; 9,600; and eventually to 14,400 and 28,800. The latter two were the norm of the mid-1990s. 56k (or, on some lines, 43,000 bps) became the norm in the latter part of the 1990s. As speeds changed and as behavior changed, a variety of pricing plans emerged. Price plans began to offer larger amounts of hours that were included in the monthly fee and offered marginal pricing above those included hours. These plans offered traditional nonlinear pricing or quantity discounts. In these plans, the marginal hours would be priced lower than
the average cost of the included hours. We will say more about this in the following.
Only later, after the ISP industry began to develop and mature, and users demonstrated preferences for a browsing behavior, pricing began to shift to unlimited usage for a fixed monthly price. These plans are commonly referred to as “flat-rate” or “unlimited” plans. These unlimited plans caused capacity issues at POPs because the marginal cost to the user was zero, and some users remained online much longer. Internet service providers reacted to this behavior by introducing plans with hourly limits and high marginal pricing above the limit. Most such plans were not particularly binding unless the user remained online for hours at a time most days of the month. Some ISPs also instituted automatic session termination when an online user remained inactive, eliminating problems arising from users who forgot to log off. However, this was perceived as poor service by some customers; consequently, many small ISPs hesitated to employ it.
7.2.4 The Structure of the ISP Market and Pricing
The ISP market began to experience explosive entry around 1995, accelerating after the commercialization of the browser around the same time. Early movers in this market had experience with the network used in higher education. Firms such as PSINet, IBM, and MCI tried to stake positions as reliable providers for business and each achieved some initial success. A signal event in 1995–1996 was the entry of AT&T’s Worldnet service, which was first aimed at business in late 1995 and then explicitly marketed at households in early 1996. It became associated with reliable e-mail and browsing as well as flat-rate pricing at $20 a month, which imposed pricing pressure on other ISPs throughout the country. This service quickly grew to over a million users within a year, though its market growth eventually stalled.
Indeed, it never met forecasts from 1995 that it would dominate the market because, in effect, its competitors also grew rapidly. Growing demand for all services meant that no player achieved dominance for several years (Greenstein 2001). The online service providers—Prodigy, Genie, CompuServe, MSN, and AOL—all began converting to Internet service around 1995, with some providing service earlier than others. All failed to gain much additional market share from this move except AOL, who used this conversion as an opportunity to alter their service’s basic features. That said, AOL’s conversion was not smooth. When AOL converted fully to Internet access in 1996, it experienced a difficult transition. Management underanticipated their own users’ responses to the introduction of flat-rate pricing. This bad publicity also facilitated further entry by other firms looking to pick up customers who fled the busy phone lines. AOL survived the bad publicity through a series of new investments in facilities and intense marketing.
Furthermore, in 1997 it made a deal with Microsoft to use Internet Explorer, which allowed it to grow at Microsoft Network’s (MSN’s) expense, who had been one of its big competitors until that point (Cusumano and Yoffie 1998). Furthermore, in 1998 AOL bought CompuServe, a merger that, in retrospect, initiated it on the path toward solidifying its leadership of dial-up service.3 Another important change was due to consolidation, especially in 1998. AOL sold off its physical facilities in 1996. When IBM sold its facilities to AT&T in 1997, AT&T became one of the largest business providers of access in the United States. When MCI and Uunet eventually became part of WorldCom in 1998 (subject to a restructuring and sell-off of MCI’s backbone, as mandated by the Department of Justice) WorldCom became the largest backbone provider in the United States and one of the largest resellers of national POPs to other firms. Neither AT&T’s entry, nor IBM’s or MCI’s positioning, had satisfied all new demand. After 1995, thousands of small entrepreneurial ventures also grew throughout the country and gained enough market share to sustain themselves. New entrants, such as Erols, Earthlink, Mindspring, Main One, Verio, and many others, gained large market positions. The entry (and exit) continued through 1999. Private label ISPs also emerged when associations and affiliation groups offered rebranded Internet access to their members. These groups did not own or operate an ISP. Instead, their access was being repackaged from the original ISP and rebranded. By 1997 more than 92 percent of the U.S. population had access to a competitive market filled with a wide variety of options. Another 5 percent of the population—found in many different rural locations throughout the United States—had access to at least one firm by a local phone call (Downes and Greenstein 2002). Economies of scale and barriers to entry were quite low, so thousands of firms were able to sustain their businesses. 
Roughly speaking, market share was quite skewed. A couple dozen of the largest firms accounted for 75 percent of market share and a couple hundred for 90 percent of market share, but there was so much turnover and fluctuation that estimates more precise than this were difficult to develop.
Just prior to the AOL/Time Warner merger in 1999–2000, the ISP market remained in flux. Broadband connections (digital subscriber line [DSL] or cable) began to become available in select places, primarily urban areas, offering these home users a faster and possibly richer experience. The so-called free-ISP model also emerged in late 1998 and grew rapidly in 1999, offering free Internet access in exchange for advertisements placed on the users’ screen. These firms eventually signed up several million households. The scope of service also continued to differ among ISPs, with no emergence of a norm for what constituted minimal or maximal service. Some ISPs offered simple service for low prices, while other ISPs offered many additional services, charging for some of these services and bundling other services in standard contracts.
3. In 1999, AOL bought Netscape, well after the browser wars. The merger with Time Warner was proposed in 2000.
Stated succinctly, over a six-year period there were many changes in the modal contract form and user behavior. Variations in the delivery of services and changes in user expectations resulted in numerous qualitative changes in the basic service experienced by all users. All players were buffeted by many of the same competitive forces.
7.2.5 Turbulent Times and Price Indexes
In a market as turbulent as this one, we are quite skeptical of traditional price-index construction using only measured prices weighted by market share, unaltered for qualitative change and competitive conditions. Our working hypotheses are simple: (a) it will be difficult to execute matched-model methods; (b) not accounting for quality will lead to problematic indicators of the true state of the market.
Why are these hypotheses our starting point? First, large improvement in the quality of service occurred and went unmeasured. These changes were widespread and not unique to any particular firm. They happened too frequently to be measured. Every surviving firm, whether big or small, had to experiment often with alternative modes for delivery and different features in the service.
Second, market share was frequently in flux, and such changes were likely to fall below the radar screen of any government price record. Experimentation enabled many new entrants to succeed in growing market share well after commercialization began. Yet data on market share normally is collected by government agencies at a frequency of two or three years at most. This only coarsely reflects the rapid addition of new users over time.
Third, marketwide experimentation imposed competitive pressure on incumbent behavior, even when these were very large firms.
Behaving as if they were “paranoid,” the most nimble of the largest firms of this era, such as AOL and Earthlink, did not stand still.4 Incumbent ISPs were compelled to make frequent releases of upgrades to their software, to spend lavishly on marketing, to add new features constantly, and to keep prices low by not charging for extras, all to prevent the growing young firms from cutting into the incumbents’ leads. Yet most of these competitive outcomes, except nominal prices, were not measured.
In short, while it is often satisfactory to ignore the behavior of small fringe firms, that omission (or de-emphasis) could lead us to throw away useful information. If the large and small acted as if they were close substitutes, the small firms provide information about the unmeasured activities of the large. In summary, quality changed so rapidly that market share bounced around, and the large firms acted as if they were afraid of losing market share to the small. These observations will push us to examine the behavior of all firms in this market and not just the top dozen.5
4. This paranoia appeared justified as the least nimble firms, such as AT&T WorldNet, did not keep up and, consequently, did not prosper (after a spectacular start in 1995–1996).
7.3 Data Set Description
The data set used in this paper is compiled chiefly from issues of Boardwatch Magazine’s Directory of Internet Service Providers (1996–1999). The directory debuted in 1996 and continued to be published through 1999. Since 1998, the same publisher has maintained a list of ISPs at http://www.thelist.com. Before the directory was published, Boardwatch Magazine published lists of Internet service providers in its regular magazine. These issues date from November 1993 until July 1995. Another handful of observations in the data set were collected from the contemporaneous “how-to” Internet books that are listed in the references. The sample covers the time period from November 1993 until January 1999, approximately a six-year period. The sample is an unbalanced panel, tracking a total of 5,948 firms with a total of 19,217 price-plan observations.6
The data set consists of demographic information about the ISP (name, location, phone, and Web address). In each year there are a variety of other characteristics of the ISP that are measured, including whether they are a national provider, how many area codes they serve, presence of upstream bandwidth, and their total number of ports. There is additional data from a survey/test done by Boardwatch, tallying the percentage of calls completed and the average speed of actual connections for the national providers in 1998, though we will only partially use this data in this paper. Each ISP is associated with one or more price plans from a given time period.
Each price plan observation includes the connection speed, monthly fee, and whether the plan offers limited or unlimited access. If access is limited, then there is information on the hourly limit threshold and the cost of additional hours. In a given year, there may be multiple price-plan records for a given firm because they offer a variety of plans at different connection speeds. The published information generally gives pricing for 28.8k access as well as higher-speed access.7
5. We could also appeal to precedent. A fair number of hedonic studies for PC software and hardware have used unweighted sales data for their estimation of quality-adjusted price indexes. For example, see Berndt and Rappaport (2001) on PC hardware, or Pakes (2002), or Berndt (1991) more generally.
6. This data does not represent all firms in the industry. Two or three price plans generally listed by Boardwatch for any given provider at one specific time represent most, but not all, plans available from that ISP. Greenstein (2000b) confirms that the Boardwatch data was incomplete in terms of the number of plans actually offered by an ISP. However, Boardwatch does state that the plans represent “the majority of users at an ISP or the most frequently chosen plans.” As table 7.1 documents, 25 observations are drawn from November 1993; 47 from January 1995; 1,283 from May 1996; 2,822 from August 1996; 3,813 from March 1997; 5,659 from January 1998; and 5,568 from January 1999.
7. Boardwatch mildly changed its formats from one year to the next. Depending on the year, this higher-speed plan could be for 64k or 128k ISDN access or for 56k access. It should be noted that the price plans for these higher speeds included no information about hourly limitations or marginal prices. We have chosen to treat them as unlimited plans. The other choice would be to attribute the same hourly limitations as the slower plan from the same firm in the same year, but we have no basis for doing so.
Table 7.1 summarizes the number of observations in the panel. Four observations from the first two years were dropped because they were extreme outliers. These plans were certainly unpopular, but because we lack market-share data with which to weight them, they had an overwhelming and undue impact on the early price-index results. No other cleaning of the data has been done, apart from simple verification and correction of data entry. As table 7.1 shows, the latter part of the sample period produces the greatest number of observations. This is one indication of how fast this industry was growing.8
Table 7.1 ISP price dataset: Counts of firms and observations, by year
                      11/1993   1/1995   5/1996   8/1996   3/1997   1/1998   1/1999    Total
Directory firms (a)         -        -        -    2,934    3,535    4,167    4,511        -
Sample firms (b)           24       35      710        -        -        -        -        -
Total observations         25       47    1,283    2,822    3,813    5,659    5,568        -
Speeds (c)
  14.4k                    25       42        -        -        -        -        -       67
  28.8k                     -        5      702    2,822    3,367    3,972    2,562   13,430
  56k                       -        -        -        -      446    1,564    3,006    5,016
  ISDN 64k                  -        -      299        -        -       54        -      353
  ISDN 128k                 -        -      282        -        -        -        -      282
  T1 1.544mb                -        -        -        -        -       69        -       69
Limited hours              13       22      303      996    1,024    1,130      581        -
Unlimited                  12       25      980    1,826    2,789    4,529    4,987        -
% Limited                  52       47       24       35       27       20       10        -
28.8k speed
  Limited hours             -        2      303      996    1,024    1,130      581        -
  Unlimited                 -        3      399    1,826    2,343    2,842    1,981        -
  % Limited                 -       40       43       35       30       28       23        -
Note: The data set comprises all data published by the data sources listed in the references. The sole exception is the 5/1996 data, which represents a random sample of 710 firms from a total population of ~2,050 firms. The overall results presented in this paper are insensitive to the inclusion or exclusion of this subset of observations.
(a) Some firms disappear from the published data, and others continue to be listed without price plan information. We are not sure of the fate of these firms, though it is likely that the ones that disappear have either been consolidated or failed. Firms that continue to appear without price data provide evidence that Boardwatch did in fact continue to monitor and update the pricing in their listings. This eliminates some bias in the results that would have occurred if the prices were not up to date.
(b) Some firms listed in the data sources did not have price plan information. That is why there are fewer firms represented in the data sample.
(c) Number of observations at each speed by year.
Approximately 21 percent of the observed plans have an hourly limit, and the majority of those are accompanied by a marginal price for usage over that limit. Over time the universe of firms and plans grows, and the speeds offered increase. At the start of the sample, prices are only given for 14.4k connections. By the end of the data, 28.8k and 56k have been introduced, and there are price observations at 64k and 128k integrated services digital network (ISDN) speeds as well as a small number of observations of T1 connection prices.9 For limited plans, the hours included in the plans continue to increase over time. The number of plans with limitations is decreasing over time as a proportion of the sample. The pattern in the mean of monthly prices is not easy to discern.
Greenstein (2000b) uses a more comprehensive source of data with a different format and examines contracting practices for only 1998. In that data, approximately 59 percent of firms quote only one price schedule, approximately 24 percent quote two price schedules, and 17 percent quote three or more. Of the single price quotes, approximately 26 percent are for limited prices. In this data set, 71 percent of the observations are firms quoting only one price, 26 percent quote two prices, and the remainder quote three or more prices. This is also highlighted in table 7.1, where the average is 1.2 price-plan observations per firm. The difference between the data here and in Greenstein (2000b) seems to be that here we have more firms who quote only one plan and fewer firms that quote more than two plans.
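As a quick arithmetic check on the share of plans with an hourly limit, the sketch below recomputes the percentages from the limited and unlimited counts as we read them from table 7.1. The variable and function names are ours and purely illustrative.

```python
# Counts of limited and unlimited plans by period, as read from table 7.1.
PERIODS = ["11/1993", "1/1995", "5/1996", "8/1996", "3/1997", "1/1998", "1/1999"]
LIMITED = [13, 22, 303, 996, 1024, 1130, 581]
UNLIMITED = [12, 25, 980, 1826, 2789, 4529, 4987]


def limited_share(limited, unlimited):
    """Return the percentage of plans that carry an hourly limit."""
    return 100.0 * limited / (limited + unlimited)


by_period = {
    period: limited_share(lim, unl)
    for period, lim, unl in zip(PERIODS, LIMITED, UNLIMITED)
}
overall = limited_share(sum(LIMITED), sum(UNLIMITED))
total_plans = sum(LIMITED) + sum(UNLIMITED)

for period, share in by_period.items():
    print(f"{period}: {share:.0f}% limited")
print(f"all periods: {overall:.1f}% of {total_plans:,} plans")
```

The counts sum to the 19,217 price-plan observations reported for the panel, and the pooled share comes to about 21 percent, in line with the figure quoted in the text.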
We conclude that the data set represents a subset of the plans offered by each provider because the publishing format limited the variety of plans that an ISP could list.
8. Consider the publishing pattern of ISP information in Boardwatch. In 1993–1995, the list of ISPs is relatively short and is included in the magazine, but by 1996 the market is growing rapidly and the listings are published in a separate directory that is updated quarterly. By 1998, changes in the market have slowed enough that the directory is only updated and published semiannually. By 1999, the directory is updated on an annual basis.
9. ISDN stands for integrated services digital network. It is a standard for transferring data over phone lines at 128k and requires both the phone line and the user to upgrade appropriately. Unlike the dial-up connections whose prices we study in this paper, a T1 line refers to a direct and fast connection, one that brings the pipe to the user’s premises, usually to a business.
One of the weaknesses of this data set is the lack of quantity measures of subscribers and usage. Without usage data, there is no way to weight the price observations in the calculation of an ideal price index. At the same time, as noted previously, we would be quite skeptical of an outcome using such weighting. We discuss this further in the following.
We construct our index assuming that most firms were responsive to the same technological trends. We are confident that qualitative change found at one firm spread to others quickly. Another way to say this is as follows: it is as if we are assuming that the measured improvement at the small firms is informative about the unmeasured improvements at the large. In a companion paper, we partly test this assumption by examining the sensitivity of price estimates to the age of the ISP, which proxies for the durability of incumbency and stable market presence. We find it makes sense to do so (see Stranger and Greenstein 2004).
We do not think this assumption makes sense after 2000. After the consolidation of AOL’s leadership and its merger with Time Warner, AOL begins to follow its own path. This is also increasingly true for MSN after the browser wars ended (in 1998) and after the entry of the free ISPs, such as NetZero, whose spectacular growth ceases after 2001. Moreover, the rate of unmeasured improvement in features of dial-up service begins to decline after the dot-com crash in spring of 2000 (though introduction of new features does not end after that, to be sure). As noted, the lack of market share is more problematic for a stable dial-up market, which, arguably, begins to emerge after 1998, and obviously emerges when adoption of the Internet slows at households, as it does by 2001 (NTIA 2001). Thus, we did not collect data after early 1999.
7.4 Elementary Price Indexes
The most elementary price index is displayed in table 7.2. It does not adjust prices for any differences of quality over time. The means of the monthly prices trace a sharp upward path from 11/1993 to 5/1996 with an even sharper fall from 5/1996 to 8/1996, followed by increases through 1/1998 and another steep fall in 1/1999. The medians also decline over time, but the changes are discrete. The fundamental problem with the data presented in table 7.2 is that the observations in each time period reflect very different service goods. For example, the outlying mean of prices in May 1996 is due to the inclusion of high-speed contracts.
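A minimal sketch of this unadjusted calculation follows, assuming each period is simply a list of posted monthly prices. The prices below are hypothetical and chosen only to show how a single high-speed contract moves the mean while leaving the median almost unchanged; they are not the chapter's data.

```python
from statistics import mean, median

# Hypothetical monthly prices (dollars) for two periods; not the chapter's data.
prices_by_period = {
    "period 1": [19.95, 19.95, 22.50, 25.00],
    "period 2": [19.95, 19.95, 22.50, 25.00, 250.00],  # adds one high-speed plan
}


def nominal_summary(prices):
    """Unweighted mean and median of the posted monthly prices."""
    return {"mean": mean(prices), "median": median(prices), "plans": len(prices)}


summary = {period: nominal_summary(p) for period, p in prices_by_period.items()}
for period, row in summary.items():
    print(
        f"{period}: mean={row['mean']:.2f} "
        f"median={row['median']:.2f} plans={row['plans']}"
    )
```

Because no quantity weights are available, every plan counts equally here, so the composition of plans in a period can dominate the mean.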
Table 7.1 shows that more than 581 contracts from
Table 7.2
Nominal price index: Mean and median of monthly price—Full sample Time
Mean
Median
Plans
Nov. 1993 Jan. 1995 May 1996 Aug. 1996 March 1997 Jan. 1998 Jan. 1999
30.84 38.86 71.08 20.02 21.40 39.13 19.29
30.00 30.00 28.00 19.95 19.95 19.95 19.95
25 47 1,275 2,822 3,813 5,659 5,568
Pricing at the On-Ramp to the Internet Table 7.3
211
Nominal price index: Mean and median of monthly price—Speed 28.8k and below Time
Mean
Median
Plans
Nov. 1993 Jan. 1995 May 1996 Aug. 1996 March 1997 Jan. 1998 Jan. 1999
30.84 38.86 22.64 20.02 19.80 19.77 19.01
30.00 30.00 19.95 19.95 19.95 19.95 19.95
25 47 694 2,822 3,367 3,972 2,562
May 1996 are ISDN contracts, which Boardwatch reports in that issue (and then never again). Table 7.3 shows that homogenizing the sample does reduce the variation in the calculated means and medians. The price index based on the means now rises only from 11/1993 to 1/1995 and falls for the remainder of the sample period. This rise is persistent throughout the price indexes in the paper and is discussed in more detail in a later section. The index based on the median falls early in the sample period and then remains steady for the remainder. This is indicative of the growing homogeneity across firms and plans in the later part of the sample.

7.4.1 Alternative Unweighted Matched Models

A matched-model procedure compares products that exist in two adjacent periods. This could be an improvement, but it suffers because it ignores the introduction of new products (at least until they have existed for two periods). This method also ignores the disappearance of older or obsolete products because there is no natural comparison for the product after its last year. If quality is increasing, then matched models will overstate the period-to-period index number, biasing upward the measured price change.

Using the matched observations, it is possible to compute the values of the Dutot, Carli, and Jevons indexes (see table 7.4). Given a number of prices for matching services, represented as $P_{i,t}$,10 these formulas are used for the indexes. More precisely, to construct the matched-model indexes, we matched price plans so that firm $i$, speed $j$ at time $t$ is matched with firm $i$, speed $j$ at time $t-1$. Table 7.5 reports results for this strict matching, where both firms and speeds must match for a plan to be included in the calculation.11 The hypergrowth and turnover of the industry in the first few years

10. The i subscript designates the price plan, and the t subscript designates the time period.
11. Even the strict matching ignores any change in hours. We have ignored situations in which a plan switched between limited and unlimited.
Greg Stranger and Shane Greenstein
Table 7.4 Dutot, Carli, and Jevons indices

Index    Formula                                                                Description
Dutot    $I_{Dutot} = \frac{\sum_i P_{i,t}}{\sum_i P_{i,t-1}}$                   Mean ratio of the prices
Carli    $I_{Carli} = \frac{1}{N} \sum_i \frac{P_{i,t}}{P_{i,t-1}}$              Mean of the price ratios
Jevons   $I_{Jevons} = \prod_i \left( \frac{P_{i,t}}{P_{i,t-1}} \right)^{1/N}$   Geometric mean of price ratios

Table 7.5 Matched model: Strictly matched observations

                                     Indices
Date               No. of matches    Dutot    Carli    Jevons
Nov. 1993                            1.00     1.00     1.00
Jan. 1995          15                1.34     1.72     1.30
May 1996           5                 0.58     0.57     0.53
Aug. 1996          535               0.95     1.06     0.98
March 1997         2,599             0.99     1.03     0.99
Jan. 1998          3,561             0.97     1.01     0.99
Jan. 1999          2,691             0.94     1.02     0.96
Cumulative index                     0.67     1.10     0.64
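As a rough illustration of the formulas in table 7.4, the following Python sketch matches plans strictly on firm and speed and then computes the three indexes. The plan dictionaries, firm names, and prices are hypothetical and are not drawn from the Boardwatch sample; the function names are likewise our own.

```python
import math

def matched_prices(prev_plans, curr_plans):
    """Keep only plans whose (firm, speed) key appears in both periods."""
    keys = sorted(prev_plans.keys() & curr_plans.keys())
    return [prev_plans[k] for k in keys], [curr_plans[k] for k in keys]

def dutot(p_prev, p_curr):
    """Ratio of summed (equivalently, mean) prices of the matched plans."""
    return sum(p_curr) / sum(p_prev)

def carli(p_prev, p_curr):
    """Arithmetic mean of the individual price ratios."""
    ratios = [c / p for p, c in zip(p_prev, p_curr)]
    return sum(ratios) / len(ratios)

def jevons(p_prev, p_curr):
    """Geometric mean of the individual price ratios."""
    logs = [math.log(c / p) for p, c in zip(p_prev, p_curr)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical monthly prices keyed by (firm, speed in kbps).
prev_plans = {("isp_a", 28.8): 20.0, ("isp_b", 28.8): 25.0,
              ("isp_c", 14.4): 30.0, ("isp_d", 28.8): 10.0,
              ("isp_e", 28.8): 22.0}   # isp_e exits: no match
curr_plans = {("isp_a", 28.8): 18.0, ("isp_b", 28.8): 25.0,
              ("isp_c", 14.4): 24.0, ("isp_d", 28.8): 20.0,
              ("isp_f", 56.0): 19.95}  # isp_f enters: no match

prev, curr = matched_prices(prev_plans, curr_plans)
print(round(dutot(prev, curr), 3),
      round(carli(prev, curr), 3),
      round(jevons(prev, curr), 3))
```

In this made-up example one plan doubles in price while two others fall. The Carli index (about 1.175) is pulled up by the single large ratio, while the Jevons index (about 1.095) is not affected as strongly; by the inequality of arithmetic and geometric means, the Carli index can never fall below the Jevons index computed on the same ratios.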
results in relatively few matches in the 1993–1996 period. In 1996, 510 plans12 from 5/1996 are matched into the 8/1996 part of the sample. From 1996 to 1997, a similarly large proportion of plans match. Although the absolute number of matching plans remains high, the proportion of plans that are matched decreases toward the end of the sample.

It has been noted that the Carli index generally overestimates the index level (Diewert 1987), and this seems to be confirmed by the results in table 7.5. This is because a single large or extreme value of $P_1/P_0$ swamps small values of $P_1/P_0$ when averaged. The simplest explanation is that the price ratio is unbounded above (price increases can exceed 100 percent) but bounded below by zero (price decreases cannot exceed 100 percent). The Dutot index is nothing more than a comparison of the mean prices of the matched products. Because it is a simple average, the Dutot index is also susceptible to influence by large outlying data. The Jevons index is quite different. As a geometric average, the Jevons index works very efficiently in a large sample to reduce the impact of outlying observations.

The results suggest that prices are declining throughout the sample period, with especially dramatic changes arising between January 1995 and May 1996, though the sample is quite small for that time period. The notable exception is the Carli index, which shows price increases in nearly every period except May 1996, where the sample is very small. The average annual growth rate (AAGR) for the Jevons and Dutot indexes for the entire period is –7.8 percent. In all cases, the Jevons and Dutot indexes agree on the direction of price change, despite differing on the exact magnitude of the change. These results are intriguing and suggest that more precise quality controls will yield interesting insights.

12. Of the 1,283 total plans in 5/1996, only 702 can possibly match a plan in 8/1996 because the remaining 581 plans are either 64k or 128k plans that are not reported for any firms in the 8/1996 data.

7.4.2 Determinants of Price

Before proceeding to examine hedonic regressions and the associated price indexes, we motivate the selection of the hedonic price model. The speed and duration of the plan are important, as are complementary service offerings. Contract length and setup costs may also be important, but they are not recorded in these data. Firm quality, experience, and the competitive environment are also potential determinants of price.

One of the key developments in ISP service offerings over the 1993–1999 time period is the move from limited and metered plans to largely flat-rate unlimited usage plans. As noted earlier, in 1993, when ISPs began to offer services to consumers, there was little need for unlimited plans. In table 7.6, we show the mean fixed monthly cost of Internet access in this sample of ISPs. In each year, the mean price for limited contracts is below the mean price for unlimited contracts. These differences are all statistically significant, with p-values less than 1 percent.
The table also illustrates the shift away from limited plans over the 1993–1999 time frame. At the outset, the limited plans make up roughly 50 percent of the sample plans. By 1999, limited plans make up just over 10 percent of the plans in the sample. In 1999, limited plans are, on average, $0.91 per month less expensive than unlimited plans.

Table 7.6 Descriptive statistics for prices of limited and unlimited plans

                   Limited                     Unlimited
Date          Mean     SD       N         Mean     SD       N
Nov. 1993     15.15    12.65    13        47.83    25.06    12
Jan. 1995     27.71    15.58    22        48.67    38.73    25
May 1996      19.73    12.72    303       24.90    19.26    391
Aug. 1996     18.36    7.79     996       20.93    6.22     1,826
March 1997    18.29    7.60     1,024     22.54    22.21    2,789
Jan. 1998     18.67    9.19     1,130     21.38    14.59    4,406
Jan. 1999     18.48    5.94     581       19.39    7.46     4,987

Notes: SD = standard deviation. All of the differences between means are significant at p-values of 1 percent or smaller.

In table 7.7, we continue to examine the effect of plan limitations on ISP pricing. The data in the table indicate that for nearly every year, there is a persistent pattern to the mean prices and the hourly limits. The lowest prices are from the contracts that include ten hours or less in the fixed price. As the hourly limits expand, so do the mean prices. This is true across all years (except for 1/1995), and the monotonic relationship is maintained until the limits exceed 100 hours. Hour limitations above 100 hours appear to have no obvious relation to price that is consistent across the observational periods in the sample. Survey data from March 2000 report that 93.4 percent of users have monthly usage of eighty-two hours or less, and 90 percent of users have monthly usage of sixty-five hours or less (Goldfarb 2004). Thus, it is not surprising that limitations higher than 100 hours have little effect on ISP price. Comparing the higher-limitation mean prices with the unlimited plans in table 7.6, it is clear that these high-limitation plans are not priced very differently from the unlimited plans.

Other relevant variables are in table 7.8. Connection speed is another important dimension of Internet access. Over the full sample, there are observations from price plans that range from 14.4k at the low end up to some prices for T1 speeds (1.544 Mbps) at the upper end. As noted earlier, these speeds should be given a broad interpretation. The changing nature of user behavior influenced the marginal returns to faster connections.13

13. Of course, the other argument is that as connection speeds have improved, content providers have begun to offer richer content that uses higher transmission bandwidth.
Table 7.7 Descriptive statistics of nominal prices by hourly limitation
Hourly limitations (hrs)
Prices Nov. 1993 Mean SD N Jan. 1995 Mean SD N May 1996 Mean SD N Aug. 1996 Mean SD N March 1997 Mean SD N Jan. 1998 Mean SD N Jan. 1999 Mean SD N
10
20
35
50
80
100
150
250
>250
11.25 4.79 4
20
16.69 3.25 7
38.74 19.32 4
26.23 5.82 8
47.48 38.93 2
12.59 7.85 70
15.31 5.31 34
20.11 7.03 28
22.43 6.31 39
21.41 9.17 24
22.94 5.72 37
22.86 6.42 32
25.48 5.14 23
30.43 40.29 18
11.28 6.52 163
13.80 5.34 119
17.87 8.71 105
21.13 7.51 122
21.05 6.27 122
22.33 6.89 135
21.02 6.08 122
20.82 5.08 81
20.41 5.62 43
10.44 4.91 141
13.46 5.35 99
17.65 10.48 102
19.52 6.66 109
20.61 6.86 130
21.85 6.64 152
20.82 5.83 130
21.07 4.75 114
19.29 5.41 65
10.15 5.15 123
13.12 5.85 91
15.74 5.28 126
19.33 6.56 110
20.25 6.79 135
22.74 14.73 170
20.95 5.49 152
21.26 4.85 140
20.84 11.06 101
9.65 6.29 30
10.69 2.76 34
16.10 4.77 38
15.97 5.48 33
18.70 4.73 47
21.01 6.37 69
20.11 5.10 112
20.44 4.56 135
19.15 4.45 87
1 33 1
Notes: SD = standard deviation. Survey data from March 2000 in Goldfarb (2004) show that 93.4 percent of users have monthly usage of 81.7 hours or less, and 90 percent of users use 65 hours or less. Limitations at or above 80 hours were therefore probably not binding at all until recently, and then only for a very small percentage of users.
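The hourly-limit columns used here are bins rather than exact limits. A minimal sketch of how a limited plan might be assigned to a bin is given below, following the convention stated in the notes to table 7.9 that each bin includes its upper boundary but not its lower one. The function names are hypothetical, and the sketch applies to limited plans only (an unlimited plan would carry zeros on every indicator).

```python
from bisect import bisect_left

# Upper bounds (hours per month) of the limited-plan bins.
UPPER_BOUNDS = [10, 20, 35, 50, 80, 100, 150, 250]
BIN_NAMES = ["hrs10", "hrs20", "hrs35", "hrs50", "hrs80",
             "hrs100", "hrs150", "hrs250", "hrgt250"]

def hour_bin(limit_hours):
    """Return the bin name for a limited plan's monthly hour allowance."""
    if limit_hours <= 0:
        raise ValueError("hour limit must be positive")
    # bisect_left places a limit equal to an upper bound in that bound's
    # bin, which implements the (lower, upper] convention.
    return BIN_NAMES[bisect_left(UPPER_BOUNDS, limit_hours)]

def indicators(limit_hours):
    """One-hot dictionary of bin indicators for a limited plan."""
    chosen = hour_bin(limit_hours)
    return {name: int(name == chosen) for name in BIN_NAMES}

print(hour_bin(20), hour_bin(20.5), hour_bin(300))
```

A plan with exactly 20 hours falls in hrs20, while a plan with 20.5 hours falls in hrs35 and a plan with 300 hours falls in the open-ended top bin.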
There are a number of other measures in the data set that could signal ISP quality. More specialized types of access services being offered by an ISP could signal the technical expertise of their staff and their reputation for quality and adoption of leading technology. While there are many different ways to proxy for quality, we, for the most part, do not employ them in our hedonic analysis.14 In part, this is due to data limitations. Moreover, as we show in the following, we employ a random-effects estimator that correlates errors at an ISP over time. This will capture a portion of any unobserved quality that is correlated at the same firm.15

14. We explored using such factors as whether the ISP provided national coverage, whether they provided additional services, and some coarse measures of capacity, such as ports or T1 line backbone connections. These largely did not predict as well as the factors we left in the hedonic analysis. In addition, some of these were not available in all time periods, resulting in our using nonnormalized measures of qualitative change over time.
15. In our companion paper (Stranger and Greenstein 2004), we will control for quality with vintage and age effects. For more on measuring quality at ISPs, see Augereau and Greenstein (2001) and Greenstein (2000a,b, 2002).

Table 7.8 Descriptive statistics for hedonic regression explanatory variables: Full sample

Variable    No. of observations    Mean      SD         Min.      Max.
hrs10       19,217                 0.028     0.165      0.000     1.000
hrs20       19,217                 0.020     0.140      0.000     1.000
hrs35       19,217                 0.021     0.144      0.000     1.000
hrs50       19,217                 0.022     0.145      0.000     1.000
hrs80       19,217                 0.024     0.153      0.000     1.000
hrs100      19,217                 0.029     0.169      0.000     1.000
hrs150      19,217                 0.029     0.167      0.000     1.000
hrs250      19,217                 0.026     0.158      0.000     1.000
isdn        11,964                 0.504     0.500      0.000     1.000
limited     19,217                 0.212     0.409      0.000     1.000
price       19,209                 29.163    100.845    0.000     3,200
speed       19,217                 43.392    91.607     14.400    1,544
speed14     19,217                 0.003     0.059      0.000     1.000
speed28     19,217                 0.699     0.459      0.000     1.000
speed56     19,217                 0.261     0.439      0.000     1.000
speed64     19,217                 0.018     0.134      0.000     1.000
speed128    19,217                 0.015     0.120      0.000     1.000
speedT1     19,217                 0.004     0.060      0.000     1.000
yr93        19,217                 0.001     0.036      0.000     1.000
yr95        19,217                 0.002     0.049      0.000     1.000
yr96a       19,217                 0.067     0.250      0.000     1.000
yr96b       19,217                 0.147     0.354      0.000     1.000
yr97        19,217                 0.198     0.399      0.000     1.000
yr98        19,217                 0.294     0.456      0.000     1.000
yr99        19,217                 0.290     0.454      0.000     1.000

7.4.3 Hedonic Price Indexes

Hedonic models can be used to generate predicted prices for any product (i.e., bundle of characteristics) at any given time. The first hedonic model that we will estimate is

(1)  $\ln P_{ijt} = \alpha_0 + \alpha_t \mathit{Year}_{ijt} + \beta_1 \mathit{Limited}_{ijt} + \beta_{2\text{–}9}\, \mathit{dHrly}_{ijt} \times \mathit{Limited}_{ijt} + \gamma_{1\text{–}5}\, \mathit{dSpeed}_{ijt} + \varepsilon_{ijt},$

where the subscripts designate firm i, plan j, at time t. To divide the hourly limitations into indicator variables, we examined the frequency plot of the hourly limits. Those divisions and frequencies are shown in table 7.9. Note that the use of indicator variables provides flexibility for the coefficient estimates.

Table 7.9 Frequency counts for limited hours bins

Variable    Hourly limitation (hrs)    Count
hrs10       0–10                       538
hrs20       10–20                      382
hrs35       20–35                      407
hrs50       35–50                      415
hrs80       50–80                      458
hrs100      80–100                     563
hrs150      100–150                    549
hrs250      150–250                    493
hrgt250     >250                       314

Notes: Each hourly limitation includes the upper boundary but not the lower boundary. The limit “10–20” is the set of hours (10, 20].

The specification in equation (1) was estimated for the whole pooled sample and for each pair of adjacent time periods. Regression results are reported in table 7.10. In all cases, the standard errors are robust standard errors with corrections for clustering. Because of the abundance of data between 1995 and 1999 and because of the similarity of pricing strategies across ISPs in a given year, we expect most of the coefficients to be tightly estimated. In general, we also expect the specifications for adjacent time periods to be superior to the pooled specification.

We observe in the data that over time ISPs offer increasingly fast connection speeds. Unlimited plans have become more prevalent over time, while the hours allowed under limited plans have increased over time. These trends also indicate increases in “quality” over time.

In the adjacent period models, the time indicator variable is only being compared to the previous period. In the pooled models, each coefficient on the time indicator variables represents a difference in price relative to the omitted time period (11/1993). In the pooled model, the coefficients should all be negative, and the coefficients of each succeeding period should be more negative than the previous one because each successive coefficient estimate represents an accumulated price decline.

Limited plans should have a negative impact on prices, but that impact should be decreasing as the number of hours allowed under the plan increases. For the regression, this means that we expect the difference between the coefficients Hrs10 L and Limited to be negative. Each difference should be smaller in absolute value as Limited is compared to higher-level buckets, but the differences should remain negative (or approach zero, indicating that a high-limit plan is really no different from an unlimited plan).
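To make the role of the time indicator variables concrete, the sketch below fits a stripped-down, pooled time-dummy hedonic regression by ordinary least squares on synthetic, noise-free data and reads the price index off the exponentiated period coefficients. Everything in it is an assumption made for illustration: the "true" coefficients, the three periods, and the single limited and speed indicators are invented, and the hand-rolled solver stands in for the robust, clustered estimation reported in the chapter.

```python
import math
from itertools import product

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical log-price effects; period 0 and the slow speed are omitted.
TRUE = {"const": 3.0, "period1": 0.3, "period2": -0.6,
        "limited": -0.2, "speed28": 0.5}

rows, log_prices = [], []
for period, limited, fast in product(range(3), (0, 1), (0, 1)):
    x = [1.0, float(period == 1), float(period == 2),
         float(limited), float(fast)]
    rows.append(x)
    log_prices.append(sum(c * v for c, v in zip(TRUE.values(), x)))

coef = ols(rows, log_prices)
# Quality-adjusted index relative to the omitted base period.
index = [1.0, math.exp(coef[1]), math.exp(coef[2])]
print([round(v, 3) for v in index])
```

Because the synthetic data contain no noise, the fitted coefficients reproduce the invented ones, and the index for the last period equals the exponential of its period coefficient. With real data the same coefficients would be estimated with error.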
19,199 5,575 0.534
19,199 5,575 0.533
71 45 0.402
–0.433∗ omitted
0.866∗∗
–1.039∗∗∗ 0.019 0.746∗ 0.562 1.025
4.044∗∗∗ 0.058
93/95
4,097 2,988 0.548
0.977∗∗∗ 1.513∗∗∗
1.463∗∗∗ 1.999∗∗∗ 1,322 705 0.496
omitted
–0.091 –0.642∗∗∗ –0.356∗∗∗ –0.115 0.071 0.038 0.136∗∗ 0.084 0.110
–0.098∗∗∗
3.104∗∗∗
96a/96b
–0.131 –0.601∗∗∗ –0.275∗ –0.102 0.101 –0.025 0.130 0.116 0.241∗ omitted 0.490∗
–0.968∗∗∗
3.586∗∗∗
95/96a
6,635 3,596 0.233
omitted 0.253∗∗∗
–0.070 –0.664∗∗∗ –0.381∗∗∗ –0.138 0.029 0.038 0.133∗∗∗ 0.077 0.090∗
–0.028∗∗∗
3.005∗∗∗
96b/97
Adjacent period regressions
11,218 5,137 0.593
4.246∗∗∗
4.270∗∗∗ 9,471 4,186 0.576
omitted 0.042∗∗∗ 0.852∗∗∗
–0.073∗∗∗ 0.017 –0.795∗∗∗ –0.526∗∗∗ –0.274∗∗∗ –0.126∗∗∗ –0.053∗ 0.070∗∗ 0.018 0.048∗
2.981∗∗∗
98/99
omitted 0.123∗∗∗ 0.877∗∗∗
–0.010 –0.738∗∗∗ –0.454∗∗∗ –0.229∗∗∗ –0.060 –0.019 0.093∗∗∗ 0.041 0.062∗
–0.035∗∗∗
2.991∗∗∗
97/98
Notes: Robust standard errors were used throughout, including corrections for clustering by firm. Not too much should be made of the R2 measures across regressions. The higher R2s occur in the regressions with the high-speed (64, 128, and 1544) plans where there is the greatest degree of price dispersion. The high R2 is predominantly due to the dichotomous variables on the high-speed plans. ∗∗∗Significant at p-values less than 1 percent. ∗∗Significant at p-values less than 5 percent. ∗Significant at p-values less than 10 percent.
No. of observations Firms R2
3.262∗∗∗ 0.332∗∗ –0.643∗∗ –0.748∗∗ –0.757∗∗ –0.784∗∗∗ –0.863∗∗∗ 0.030∗∗∗ –0.782∗∗∗ –0.499∗∗∗ –0.263∗∗∗ –0.097∗∗∗ –0.057∗∗
3.282∗∗∗ 0.313∗∗ –0.663∗∗ –0.768∗∗∗ –0.776∗∗∗ –0.803∗∗∗ –0.881∗∗∗ –0.036 –0.716∗∗∗ –0.432∗∗∗ –0.196∗∗∗ –0.030 –0.005 0.104∗∗∗ 0.055∗ 0.087∗∗∗ omitted 0.494∗∗ 0.564∗∗ 1.446∗∗∗ 1.998∗∗∗ 4.748∗∗∗
Constant Year95 Year96a Year96b Year97 Year98 Year99 Limited Hrs10 L Hrs20 L Hrs35 L Hrs50 L Hrs80 L Hrs100 L Hrs150 L Hrs250 L Speed14 Speed28 Speed56 Speed64 Speed128 SpeedT1 0.494∗ 0.564∗∗ 1.446∗∗∗ 1.998∗∗∗ 4.749∗∗∗
Restricted
Model full
Regression results from estimation of eq. (1)
Variable
Table 7.10
Pricing at the On-Ramp to the Internet
The expected sign of the estimated coefficients on the speed indicator variables varies depending on which speed variable is omitted, although in all cases we expect that higher-speed plans should have higher (more positive) coefficients than lower-speed plans.

The regression results based on equation (1) appear in table 7.10. The largest hourly limitation buckets have been discarded from the full model, so we focus on the coefficients of the restricted model and the accompanying adjacent period regressions. In the restricted regression (second column), all estimated coefficients are significant, predominantly at the 1 percent or 5 percent level. The coefficients on each of the speed variables confirm the previously given hypothesis. The coefficients for the higher speeds exceed the coefficients for the lower speeds, and the pattern is monotonically increasing. The differences between the hourly limitation variables' coefficients and the coefficient on Limited also confirm the previously given hypothesis. Specifically, plans with a limited number of hours are priced at a discount to unlimited plans, but this discount diminishes as the number of included hours increases.16

The coefficients on the time indicator variables agree largely with the previously given hypotheses. Apart from the period from 11/1993 to 1/1995, the estimated coefficients indicate that quality-adjusted prices were falling, and the coefficients become successively more negative as time passes, consistent with the hypothesis described earlier. There are two anomalies regarding the time indicator variable. First, the difference between the coefficients on year95 and year96a is very large (indicating that 5/1996 prices are 40 percent of the level of 1/1995 prices). This dramatic price decline needs to be investigated further, which we do in the following. Second, prices appear to increase on a quality-adjusted basis from 11/1993 up to 1/1995. This is a recurring pattern through many of the models. It can be explained by the fact that the nature of Internet access changed during the intervening time period. In 11/1993, the connections that were offered were all unix-to-unix copy (UUCP) connections that were capable of exchanging files, newsgroups, and e-mail but had no interactive features. By 1/1995, all of the plans in the data are for serial line Internet protocol (SLIP) access, which is a more highly interactive connection that has all the capabilities of UUCP plus additional features (including multimedia capabilities).17 When the quality increase is the same across all of the sample products, it cannot be identified separately in a hedonic regression from the time period indicator variable. Thus, in 1/1995 prices are higher than in 11/1993, but it is because Internet access technology has fundamentally improved. Because all the ISPs have adopted the new type of access and quality has increased, there is no heterogeneity in the sample and no way to control for the quality change.

The final six columns in table 7.10 display results from the adjacent period regressions. The pooled model is a significant restriction. In the pooled model, intercepts may vary across time, but the slopes with regard to the characteristics are restricted to be equal across periods. The adjacent period models relax this restriction so that the slopes are restricted to be equal only across two periods in any model. In the latter years of the data, this restriction does not affect the estimated coefficients much. The restriction does matter in the early years of the data. The estimates on Limited and some of the specific hour limitations vary from paired-year to paired-year. Because we have an abundance of data for later years but not for early years, we lose degrees of freedom with the adjacent year regressions during the earliest part of the sample, when we most need them.

Although some of the coefficients among the adjacent period models are statistically insignificant, the majority of coefficients confirm the stated hypotheses. The hourly limitations and speeds affect price in the same way as in the pooled model. The price increase in 1/1995 is indiscernible because, although the coefficient has a positive sign, it is not significant. The remaining interperiod indicators are of negative sign, and the steep change in price from 1/1995 to 5/1996 is still present and very significant.

The coefficients from the hedonic regression model in equation (1) lead to a calculation of the estimated price indexes. These estimates are a consequence of the form of the model, plus a correction for the bias from exponentiating estimates of equation (1).18 The models estimated in table 7.10 lead to estimated price indexes in table 7.11, where 11/1993 is our base time period. These are easily reconverted to period-to-period indexes. The models in table 7.10 that consider adjacent time periods also lead directly to estimates of the period-to-period indexes.

16. After testing the coefficients for each of the hourly buckets, all but the lowest four were dropped from the model. Results from hypothesis tests (for example, testing H0: Hrs80 L – Limited = 0) indicated that these coefficients were not significantly different from the coefficient on Limited because they added no more information than the Limited variable. In the unrestricted models (both pooled and adjacent year models), the omitted hourly Limited indicator variable is for all hourly limits above 250 hours. The omitted speed indicator variable is for plans offering 14.4k access. The omitted time period indicator variable (year) is for 11/1993.
17. Looking carefully at the data and the advertisements, it is clear that firms were promoting SLIP accounts as a premium service (as opposed to UUCP). The data seem to indicate that they were charging a premium for it as well. Because there is no heterogeneity among the 1/1995 plan options, it is impossible to identify this effect and separate it from the time period constant.
18. See the discussion in Berndt (1991). The correction involves adding half of the squared standard error of the regression to the simulated price, correcting for the nonzero expectation of an exponential normal error. Sometimes this correction can make a big difference to the estimate for the price index. See Pakes (2002) for such an example. In our case it did not make much difference to the estimated price index.
Table 7.11 Direct price indices calculated from hedonic specification eq. (1)

Regression coefficients
Date          Restricted    Adjacent period (model)
Jan. 1995     0.332         0.058 (93/95)
May 1996      –0.643        –0.968 (95/96a)
Aug. 1996     –0.748        –0.098 (96a/96b)
March 1997    –0.757        –0.028 (96b/97)
Jan. 1998     –0.784        –0.035 (97/98)
Jan. 1999     –0.863        –0.073 (98/99)

Indices
              Restricted                       Adjacent period
Date          Cumulative    Period-to-period   (model)
Nov. 1993     1.000
Jan. 1995     1.394         1.39               1.06 (93/95)
May 1996      0.526         0.38               0.38 (95/96a)
Aug. 1996     0.473         0.90               0.91 (96a/96b)
March 1997    0.469         0.99               0.97 (96b/97)
Jan. 1998     0.457         0.97               0.97 (97/98)
Jan. 1999     0.422         0.92               0.93 (98/99)
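The arithmetic linking the coefficients to the indexes in table 7.11 can be sketched as follows. The coefficient values are copied from the table; the variable and function names are our own, and the sketch leaves out the small bias correction for exponentiating a log-linear estimate, so agreement with the published figures is only up to rounding.

```python
import math

PERIODS = ["Nov. 1993", "Jan. 1995", "May 1996", "Aug. 1996",
           "March 1997", "Jan. 1998", "Jan. 1999"]

# Time-indicator coefficients from the restricted pooled model; the base
# period (Nov. 1993) is the omitted category, so its coefficient is zero.
RESTRICTED = [0.0, 0.332, -0.643, -0.748, -0.757, -0.784, -0.863]

# Coefficient on the later period in each adjacent-period regression.
ADJACENT = [0.058, -0.968, -0.098, -0.028, -0.035, -0.073]

def cumulative_index(coefs):
    """Index level relative to the base period: exp of each coefficient."""
    return [math.exp(c) for c in coefs]

def period_to_period(levels):
    """Ratio of each index level to the level one period earlier."""
    return [later / earlier for earlier, later in zip(levels, levels[1:])]

cumulative = cumulative_index(RESTRICTED)
chained = period_to_period(cumulative)
adjacent = [math.exp(c) for c in ADJACENT]

for name, level in zip(PERIODS, cumulative):
    print(f"{name:<11} {level:.3f}")
```

The pooled model yields cumulative levels that must be divided to obtain period-to-period changes, whereas each adjacent-period coefficient exponentiates directly into a period-to-period index.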
The results are shown in table 7.11. The table shows that the cumulative quality-adjusted index declines 58 percent to 0.422 in 1/1999 when compared to 1.00 in the base period, 11/1993. The individual period-to-period indexes display large variation during the initial periods but then moderate to a 1–10 percent decline per period thereafter.19 The calculations from the adjacent year regressions are largely the same as the results from the restricted model. The exception is the 11/1993 to 1/1995 index, which displays a less extreme rise during the time period under the adjacent years method.

The extreme drop in the index from 1/1995 to 5/1996 is still present and deserves an explanation. Two factors produce this drop. First, there is a large difference in the number of firms. The observations from January 1995 describe a couple dozen ISPs selling connections to the Internet for purposes of using a Mosaic browser, or a beta version of the Netscape browser, and a basic e-mail client. By May of 1996, most of the new entrants are small ISPs selling connections for the Netscape browser and e-mail. Second, by the spring of 1996, AT&T WorldNet has entered home service and the market is heading toward a twenty-dollar price point for basic service at 28k speeds. Even without controlling for quality, table 7.6 shows that prices declined during this period for both unlimited and limited plans. However, table 7.6 does not control for precise levels of limits. With such raw data, it is not a surprise that the estimated price declines by more than half once hedonic estimates control for the same level of limits. This finding is consistent with popular perceptions about the growth in the Internet, usually timed to Netscape’s initial public offering (IPO) in August 1995.

To our surprise, the price declines do not stop after the spring of 1996. We also find a 20 percent decline in price per unit of ISP quality for the thirty-three-month period between spring 1996 and early 1999.

19. It is difficult to compare all of the adjacent period indices. Each time period is of different length, so for accurate and easier comparison, it would be correct to annualize the changes.

7.4.4 Hedonic Price Indexes with Random Effects

The data set covers very few characteristics of each plan or product, and there are undoubtedly unmeasured elements of quality that are missing from the model specified in equation (1). One concern is that unmeasured quality at an ISP is roughly the same across contracts and across years. In other words, if an ISP guarantees high-quality service (e.g., large modem banks that rarely have busy signals) or offers the same enticements to all its customers (e.g., large e-mail accounts), then this quality will be unmeasured for contracts coming from the same firm. Because of the (unbalanced) panel nature of the data set, the firm-specific unmeasured quality can be at least partially corrected using a random-effects model, where the unmeasured error is assumed to be the same for all contracts offered by one firm. In this case, the regression model given above in equation (1) will be changed by adding a firm-specific error term ($\mu_i$).

(2)  $\ln P_{ijt} = \alpha_0 + \alpha_t \mathit{Year}_{ijt} + \beta_1 \mathit{Limited}_{ijt} + \beta_{2\text{–}9}\, \mathit{dHrly}_{ijt} \times \mathit{Limited}_{ijt} + \gamma_{1\text{–}5}\, \mathit{dSpeed}_{ijt} + \mu_i + \varepsilon_{ijt}$

This specification will emphasize within variation in contracts in situations where we observe multiple contracts from the same firm. Because much of the data comes from small firms who do not appear often in the data set, it was difficult to predict whether this specification would change the estimates much. We have estimated both the fixed- and random-effects specifications of model (2) using the standard subroutines in Stata. The regression results are shown in table 7.12. The Breusch-Pagan test indicates that the hypothesis that var($\mu_i$) = 0 can be rejected at better than the 1 percent level. The Hausman specification test indicates that the random-effects specification is preferred.20 We therefore examine the random effects results in further detail.

20. Intuitively speaking, it is easy to see why random effects is preferred in this data set. The fixed-effect model throws out all the observations where an ISP has one contract. In contrast, the random-effects specification employs the variation between these ISPs who offer only one contract, while the fixed-effect specification does not.
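The error structure behind the random-effects specification can be written out as a short derivation. It assumes the textbook random-effects conditions, which the chapter does not spell out: the firm-specific term and the idiosyncratic error each have mean zero and constant variance and are uncorrelated with one another and across observations. The symbol $\mu_i$ for the firm-specific term and the symbols $u_{ijt}$, $\sigma_\mu^2$, $\sigma_\varepsilon^2$, and $\rho$ are notation adopted here for illustration.

```latex
\begin{aligned}
u_{ijt} &= \mu_i + \varepsilon_{ijt}, \\
\operatorname{Var}(u_{ijt}) &= \sigma_\mu^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(u_{ijt}, u_{ikt'}) &= \sigma_\mu^2
  \quad \text{for two distinct contracts of the same firm } i, \\
\rho &= \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under these assumptions, errors for contracts from different firms are uncorrelated, while any two contracts from the same firm share the correlation $\rho$. The hypothesis var($\mu_i$) = 0 is then equivalent to $\rho = 0$, in which case the specification collapses to pooled least squares.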
19,199 5,575 0.532
3.136∗∗∗ 0.299∗∗∗ –0.516∗∗∗ –0.586∗∗∗ –0.590∗∗∗ –0.613∗∗∗ –0.684∗∗∗ –0.038∗∗ –0.682∗∗∗ –0.350∗∗∗ –0.147∗∗∗ –0.044∗ –0.007 0.056∗∗ 0.040∗ 0.079∗∗∗ omitted 0.450∗∗∗ 0.522∗∗∗ 1.389∗∗∗ 1.934∗∗∗ 4.688∗∗∗
3.009∗∗∗ 0.335∗∗∗ –0.428∗∗∗ –0.477∗∗∗ –0.477∗∗∗ –0.499∗∗∗ –0.563∗∗∗ –0.034∗ –0.663∗∗∗ –0.299∗∗∗ –0.094∗∗∗ –0.055∗ –0.017 0.005 0.029 0.074∗∗∗ omitted 0.464∗∗∗ 0.538∗∗∗ 1.401∗∗∗ 1.944∗∗∗ 4.697∗∗∗
19,199 5,575 0.529
Full, RE
Models full, FE
19,199 5,575 0.532
0.044∗∗ omitted 0.450∗∗∗ 0.523∗∗∗ 1.390∗∗∗ 1.934∗∗∗ 4.689∗∗∗
3.125∗∗∗ 0.309∗∗∗ –0.505∗∗∗ –0.575∗∗∗ –0.579∗∗∗ –0.603∗∗∗ –0.674∗∗∗ –0.003 –0.717∗∗∗ –0.385∗∗∗ –0.181∗∗∗ –0.079∗∗∗ –0.034∗
Restricted, RE
Regression results from estimation of eq. (2)
Note: See notes to table 7.10. ∗∗∗Significant at less than 1 percent. ∗∗Significant at less than 5 percent. ∗Significant at less than 10 percent.
No. of observations Firms R2
Constant Year95 Year96a Year96b Year97 Year98 Year99 Limited Hrs10 L Hrs20 L Hrs35 L Hrs50 L Hrs80 L Hrs100 L Hrs150 L Hrs250 L Speed14 Speed28 Speed56 Speed64 Speed128 SpeedT1
Variable
Table 7.12
71 45 0.378
omitted –0.401
–0.761∗∗∗ –0.260 0.597∗ 0.204 0.543
3.964∗∗∗ 0.119
93/95
6,635 3,596 0.233
0.978∗∗∗ 1.514∗∗∗
1.367∗∗∗ 1.897∗∗∗
4,097 2,988 0.547
omitted 0.276∗∗∗
omitted
1,322 705 0.496
0.030
–0.003 –0.711∗∗∗ –0.391∗∗∗ –0.172∗∗∗ –0.043 0.010
–0.023∗∗∗
2.998∗∗∗
96b/97
0.021
–0.002 –0.731∗∗∗ –0.446∗∗∗ –0.205∗∗∗ –0.019 –0.030
–0.097∗∗∗
3.104∗∗∗
96a/96b
0.177 omitted 0.391
–0.031 –0.707∗∗∗ –0.363∗∗∗ –0.173 –0.018 –0.100
–0.824∗∗∗
3.538∗∗∗
95/96a
11,218 5,137 0.593
4.237∗∗∗
4.254∗∗∗ 9,471 4,186 0.574
omitted 0.048∗∗∗ 0.883∗∗∗
0.018
–0.064∗ 0.021∗∗∗ –0.688∗∗∗ –0.416∗∗∗ –0.229∗∗∗ –0.124∗∗∗ –0.056∗∗
2.975∗∗∗
98/99
omitted 0.139∗∗∗ 0.917∗∗∗
0.024
0.016 –0.645∗∗∗ –0.396∗∗∗ –0.215∗∗∗ –0.101∗∗∗ –0.044∗
–0.030∗∗∗
2.986∗∗∗
97/98
Adjacent period regressions—random effects specification
The random-effects regression results do not differ much from the earlier results except in one key place: the drop in prices ascribed to the 1/1995 to 5/1996 period is dampened. The pattern among the time period indicator variables is maintained. The significance and pattern among the plan limitations fit with earlier hypotheses and follow the pattern of the earlier results. The estimated coefficients on the speed indicator variables also follow the pattern outlined in the preceding hypotheses and reconfirm the results from the earlier regression. Table 7.12 also reports the adjacent period regression results. They follow the same pattern as the earlier results with, again, the main difference being a dampened drop in the index from 1/1995 to 5/1996.

Using the regression results from the random-effects restricted model and the random-effects adjacent period model, we have recalculated the cumulative and period-to-period indexes in table 7.13, which are biased predictors under the stochastic specification. Even then, the results are qualitatively similar. The cumulative index drops from 1.00 in 11/1993 to 0.51 in 1/1999, implying that quality-adjusted prices fell by 49 percent over this period. As before, the period-to-period indexes swing wildly in the initial periods but then settle to steady declines of almost 7 percent per year on average. The notable difference between the random-effects model results and the earlier results is shown in the period-to-period index from 1/1995 to 5/1996. Without random effects, the index declined to 0.38 over this single period. Taking other unmeasured elements of firm quality into account dampens this drop in the price index. In table 7.13, the index only drops to 0.44. The index values calculated from the adjacent period models are all nearly the same as the single period indexes derived from the pooled model. The only difference is the 11/1993 to 1/1995 index, but this is an insignificant coefficient in the adjacent period regression.

Table 7.13 Direct price indices calculated from estimation of hedonic eq. (2)

Regression coefficients
Date          Restricted    Adjacent period (model)
Jan. 1995     0.309         0.119 (93/95)
May 1996      –0.505        –0.824 (95/96a)
Aug. 1996     –0.575        –0.097 (96a/96b)
March 1997    –0.579        –0.023 (96b/97)
Jan. 1998     –0.603        –0.03 (97/98)
Jan. 1999     –0.674        –0.064 (98/99)

Indices
              Restricted                       Adjacent period
Date          Cumulative    Period-to-period   (model)
Nov. 1993     1.000
Jan. 1995     1.362         1.36               1.13 (93/95)
May 1996      0.604         0.44               0.44 (95/96a)
Aug. 1996     0.563         0.93               0.91 (96a/96b)
March 1997    0.560         1.00               0.98 (96b/97)
Jan. 1998     0.547         0.98               0.97 (97/98)
Jan. 1999     0.510         0.93               0.94 (98/99)

It appears, therefore, that firm-level random effects slightly alter the quality-adjusted price index, but not by much. The basic reason is that there is so much entry and exit in the sample. With thousands of new firms each year, it is not possible to get a sufficient number of repeat observations on enough firms to measure changing quality. In more recent periods, when the set of firms is more stable, we would expect this correction to have a greater effect, but it does not, because many firms offer only a single contract.

We conclude that accounting for measured and unmeasured quality is a simple and useful addition to the tools for calculating price indexes. It is a further refinement of the standard hedonic techniques, and it is not difficult to implement. To be sure, in this example, it did not yield a large difference in estimates, but it was enough to raise questions about the quality of estimates early in our sample. It is worth exploring further in service industries where quality of service is correlated across all services offered by one firm.

7.4.5 Analysis of Subsample with Speeds below 28.8k

Because change in modem speeds is coincident with the transition to unlimited plans, we were aware of the possibility that the preceding results could be an artifact of change in modem speeds. We assessed this empirically by examining contracts only for 28.8k service. We have repeated the random-effects modeling (from equation [2]) with a subsample of plans that offer connection speeds at or below 28.8k. Table 7.14 presents the regression results from this subsample.
The results shown for the subsample correspond well to the full sample regression. The coefficients display the same pattern as the earlier full sample regressions, supporting the preceding hypotheses. Quality-adjusted prices decline over the sample period, with the coefficient for each time period being more negative than the previous one. The apparent quality-adjusted price rise from 11/1993 to 1/1995 persists, suggesting that this pattern is not an artifact of the higher-speed plans. The plans with hourly limitations reconfirm the pattern of the full sample. Additional limited hours are consistently more valuable, with the highest limited plans nearly indistinguishable from unlimited plans. In the pooled regression for the subsample, speed of a plan is handled using a dichotomous variable indicating whether a plan is 14.4k or 28.8k. In the regression, the 28.8k plan indicator was the omitted category. The only result in this subsample that conflicts with the earlier results is the coefficient on
Table 7.14  Regression results from estimation of eq. (0.2): 28.8k speed plans only
                     Model (0.2)  28.8 (1.2)   Adjacent period regressions
Variable             full sample  subsample    93/95        95/96a       96a/96b      96b/97       97/98        98/99
Constant             3.136∗∗∗     3.110∗∗∗     3.614∗∗∗     3.566∗∗∗     3.088∗∗∗     2.997∗∗∗     2.978∗∗∗     2.968∗∗∗
Year95               0.299∗∗∗     0.305∗∗∗     0.049
Year96a              –0.516∗∗∗    –0.604∗∗∗                 –0.937∗∗∗
Year96b              –0.586∗∗∗    –0.677∗∗∗                              –0.073∗∗∗
Year97               –0.590∗∗∗    –0.697∗∗∗                                           –0.020∗∗∗
Year98               –0.613∗∗∗    –0.704∗∗∗                                                        –0.007∗∗
Year99               –0.684∗∗∗    –0.738∗∗∗                                                                     –0.036∗∗∗
Limited              –0.038∗∗     –0.048∗∗∗                 –0.128       –0.082∗∗     –0.095∗∗∗    0.004        0.004
Hrs10 L              –0.682∗∗∗    –0.725∗∗∗                 –0.603∗∗∗    –0.680∗∗∗    –0.650∗∗∗    –0.743∗∗∗    –0.793∗∗∗
Hrs20 L              –0.350∗∗∗    –0.353∗∗∗                 –0.279∗∗∗    –0.363∗∗∗    –0.296∗∗∗    –0.430∗∗∗    –0.477∗∗∗
Hrs35 L              –0.147∗∗∗    –0.164∗∗∗                 –0.100       –0.134∗∗∗    –0.103∗∗∗    –0.228∗∗∗    –0.252∗∗∗
Hrs50 L              –0.044∗      –0.026                    0.101        0.034        0.061∗       –0.068∗∗     –0.113∗∗∗
Hrs80 L              –0.007       0.002                     –0.027       0.044        0.089∗∗∗     –0.054∗∗     –0.052∗∗
Hrs100 L             0.056∗∗      0.060∗∗∗                  0.129        0.102∗∗      0.153∗∗∗     0.009        0.065∗∗∗
Hrs150 L             0.040∗       0.043∗∗                   0.114        0.066        0.082∗∗      0.028        0.014
Hrs250 L             0.079∗∗∗     0.082∗∗∗                  0.242∗∗      0.044        0.114∗∗∗     0.035        0.052∗∗
Speed14              omitted      0.566∗∗∗     0.414        0.479∗∗∗
Speed28              0.450∗∗∗     omitted      omitted      omitted
Speed56              0.522∗∗∗
Speed64              1.389∗∗∗
Speed128             1.934∗∗∗
SpeedT1              4.688∗∗∗
No. of observations  19,199       13,484       71           741          3,516        6,189        7,339        6,530
Firms                5,575        5,282        45           697          2,981        3,590        4,173        4,835
R2                   0.532        0.533        0.394        0.291a       0.257        0.242        0.2251       0.209
Notes: See notes to table 7.10. For estimates in 1996 and later the only available speeds are 28.8k and higher. Because all high-speed plans are dropped from the data, any speed variable is collinear with the constant term in the regression. In the 93/95 column, the coefficients on the limited-hours variables are, in row order, –0.912∗∗∗, –0.119, 0.742∗∗, 0.364, 0.721, and 0.829; their row positions cannot be determined from the source layout and are therefore not entered above.
a The drop in R2 here and in the following adjacent year regressions is due to the loss of heterogeneity in plan speeds. In this regression, only 42 14.4k plans remain. The balance of the observations are 28.8k speed.
∗∗∗Significant at less than 1 percent.
∗∗Significant at less than 5 percent.
∗Significant at less than 10 percent.
14.4k speed plans. Recall the earlier hypothesis that faster plans should command a price premium. To be consistent with that hypothesis, the coefficient on Speed14 should be negative (because Speed28 is the omitted dichotomous variable). However, in table 7.14, this estimated coefficient is positive and statistically significant.
Coefficient estimates from the adjacent period regressions are also shown in table 7.14. As with the pooled model, these regressions on the subsample largely reconfirm the results from the full sample. Prices decline over time, and larger limits are more valuable. In the 95/96a regression results, a similar positive and significant coefficient appears for Speed14. This again is unexpected and runs contrary to the preceding hypothesis. The remaining adjacent period regressions do not control for speed of plan because only 28.8k speed plans are considered in the remaining part of the subsample.
Using the regression results from the random-effects restricted model and the random-effects adjacent-period model, we have recalculated the cumulative and period-to-period quality-adjusted price indexes; they appear in table 7.15. The results are consistent with the results from the full sample. The cumulative index reveals that prices in this subsample drop from 1.00 in 11/1993 to 0.48 in 1/1999. This implies that the estimated quality-adjusted prices have dropped by 52 percent over the sample period. This index is consistent with the full sample cumulative index, which
Table 7.15  Direct price indices calculated from estimated coefficients in hedonic eq. (0.2): 28.8k speed plans only
Model           Restricted                      93/95    95/96a   96a/96b  96b/97   97/98    98/99
Regression coefficients
  Jan. 1995     0.305                           0.049
  May 1996      –0.604                                   –0.937
  Aug. 1996     –0.677                                            –0.073
  March 1997    –0.697                                                     –0.02
  Jan. 1998     –0.704                                                              –0.007
  Jan. 1999     –0.738                                                                       –0.036
Indices         Cumulative   Period-to-period
  Nov. 1993     1.000
  Jan. 1995     1.357        1.357              1.050
  May 1996      0.547        0.403                       0.392
  Aug. 1996     0.508        0.930                                0.930
  March 1997    0.498        0.980                                         0.980
  Jan. 1998     0.495        0.993                                                  0.993
  Jan. 1999     0.478        0.967                                                           0.965
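The index values in table 7.15 can be reproduced from the reported time-dummy coefficients. The sketch below assumes that the dependent variable of the hedonic regression is the logarithm of price, so that a coefficient b on a period dummy converts to an index value exp(b); this section does not restate that specification, but the published figures match the conversion to three decimals. The function and variable names are illustrative only.

```python
import math

# Time-dummy coefficients copied from table 7.15 (28.8k subsample).
# Restricted (pooled) model: each coefficient is measured against Nov. 1993.
restricted = {
    "Jan. 1995": 0.305,
    "May 1996": -0.604,
    "Aug. 1996": -0.677,
    "March 1997": -0.697,
    "Jan. 1998": -0.704,
    "Jan. 1999": -0.738,
}
# Adjacent-period models: each coefficient is measured against the prior period.
adjacent = {
    "93/95": 0.049,
    "95/96a": -0.937,
    "96a/96b": -0.073,
    "96b/97": -0.020,
    "97/98": -0.007,
    "98/99": -0.036,
}


def cumulative_index(coefs):
    """Index relative to the reference period (= 1.0), assuming log prices."""
    return {period: math.exp(b) for period, b in coefs.items()}


def period_to_period(coefs):
    """Ratio of each cumulative index value to the one before it."""
    out, previous = {}, 0.0  # the reference-period coefficient is zero
    for period, b in coefs.items():
        out[period] = math.exp(b - previous)
        previous = b
    return out


cumulative = cumulative_index(restricted)
chain_links = period_to_period(restricted)
adjacent_index = cumulative_index(adjacent)
decline_pct = 100 * (1 - cumulative["Jan. 1999"])

for period in restricted:
    print(f"{period:<11} {cumulative[period]:.3f} {chain_links[period]:.3f}")
print(f"Cumulative decline: {decline_pct:.0f} percent")
```

Under this assumption the restricted model gives a cumulative index of about 0.478 for Jan. 1999, the 52 percent decline cited in the text, and the 95/96a adjacent-period coefficient of –0.937 gives the single-period index of about 0.392.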
Greg Stranger and Shane Greenstein
dropped by 49 percent over the same period. The adjacent period calculations are consistent with the full sample results. The price index increases between the first two periods, which is followed by a sharp decline and then steady declines of 1–7 percent per period thereafter.
In summary, repeating the random-effects estimation of equation (2) on a subsample of plans with speeds at or below 28.8k yields results consistent with the full sample. This suggests that the treatment of the hourly limitations in the higher-speed plans is not significantly skewing the results for the full sample. It also suggests that although many new entrants appeared over time offering higher-speed plans, the pattern of quality-adjusted prices was not different between the “old” and “new” providers. The most important conclusion is that the unobserved limits for the high-speed plans are not affecting the overall results.
7.4.6 Weighted Hedonic Price Indexes
As noted earlier, we were skeptical of calculating a price index with market shares or revenue shares of the product or service. However, even with such skepticism, we would still prefer to calculate such an index and see what difference, if any, such weighting makes. Unfortunately, the Boardwatch ISP pricing data did not contain such information.
Because the listings are organized by area codes served, we considered using the number of area codes served by each ISP as a coarse market share weighting. However, close inspection of this procedure reveals that it is fraught with problems. First, even in the best of times, it would be a coarse measure because population density is not uniform across area codes and intra-area code market shares are not evenly split across population areas. Moreover, the number of potential area codes in which an ISP can offer service is capped at the maximum number of area codes in the United States, just over 200 (and growing slightly over the years of the sample).
So the number only captures a difference between extremes, such as local and national ISPs. Prior to 1995, all ISPs were local, so the number does not really differentiate among ISPs until after 1995. Second, the interpretation of the area code variable changed when many facilities-based firms initiated programs to rent their backbones and modem banks to others, who could in turn offer access elsewhere in the country. This is especially common in the later years of our sample (1997–1999), rendering the area code variable almost meaningless as a measure of market share. That is, the footprint of area code coverage for many firms became disconnected from ownership of facilities. Hundreds of firms advertised a national footprint in 1998 and 1999, even small ISPs with only a few customers that had just entered service.
We conclude, therefore, that the area codes give equal weight to all ISPs prior to 1995 and a meaningless weight from 1997 onward. We also conclude that the area code variable does not provide a consistent interpretation over time. Hence, we have abandoned the proposal of using number of area codes as a measure of market size or share.
Another (and simpler) alternative is to weight the plans based on the connection speed offered. Such data are available from the Graphics, Visualization, and Usability (GVU) lab WWW surveys (Georgia Institute of Technology 1997). The GVU laboratory at the Georgia Institute of Technology has conducted a WWW users survey semiannually since January 1994. The surveys cover a broad range of topics, but one portion of the survey inquires about online services, Internet usage, and speed of connection to the Internet. The GVU has collected information on Internet connection speeds since 1995. Their data are shown in table 7.16.
The split of plans between 28.8k and 56k in the 1997–1999 periods of the data set roughly mirrors the data in the GVU survey. Taking 28.8k and 33.6k to be equivalent speed plans, the comparative proportions are shown in table 7.17. The only substantial difference between our data and the GVU survey occurs in 1997. It appears that in 1997, the Boardwatch data overrepresent the prevalence of 56k connections by about 2.5 times. The proportions in 1998 and 1999 are roughly the same. Because 1997 was not a year of dramatic change in measured contracting behavior, such as hourly limitations, the impact on a recalculated index is minimal.21
We also considered different schemes that alter the index more directly as a function of whether the ISP is young, old, exiting, or innovating with a new product or service. These issues delve into questions about the interaction between the changing industrial organization of this market and the pricing of firms, a topic on which we focus in our companion paper (Stranger and Greenstein 2004).
7.5 Conclusion
Internet service providers are a necessary component of Internet infrastructure.
They enable businesses and individuals to connect to the Internet. The earliest history for ISPs dates back to late 1992 and early 1993. This paper investigates pricing trends in this nascent industry over the time period from 1993 to 1999, with attempts to incorporate adjustment for quality change. Using a new data set, we have computed a variety of price indexes, ranging in sophistication from very crude averages to quality-adjusted ones based on hedonic models. The results show decisively that ISP prices have been falling rapidly over time. The bulk of the price decline is in the early years of the sample, especially the period between early 1995 and spring of 1996, but a significant and steady decline continues throughout. We conclude that ignoring aspects of quality underestimates the price declines. It also alters the timing of the measured declines.
We view this paper as only one small step in a much larger research enterprise, measuring the economic benefits from improvement in the Internet. During the latter half of the 1990s, the Internet underwent dramatic changes. The quality of what users got from the Internet skyrocketed. Over the next half decade, many users adopted the Internet who had never used it. Constructing a cost-of-living index for the user’s experience would face many challenges. Such an index would have to measure the change in the cost of living arising from the growth of the use of the Web, as well as the economic change in user experience from the rapid infusion of e-mail and browsing into everyday life. Not trivially, no price index could possibly accomplish that goal without accounting for changes in speed, changes in availability, changes in the quality of standard contract features, changes in reliability and other nonprice dimensions of use, changes in the size of the network effects, and other features of user experience.
21. Available from the author upon request.
Table 7.16  GVU WWW survey data: Internet connection speeds
Date        ISP         Unknown  <14.4  14.4   28.8   33.6   56     128  1mb    4mb  10mb  45mb  Total
April 1995  Jan. 1995   517      402    2,930  810           284    83   393    138  806   84    6,447
Oct. 1995               1,514    140    3,407  2,822         397    188  528    234  995   156   10,381
April 1996  May 1996    451      32     1,106  1,749         155    129  541    77   133   29    4,402
Oct. 1996   Aug. 1996   644      32     1,579  4,291         240    232  748    120  150   50    8,086
April 1997  March 1997  1,272    42     1,393  4,584  2,558  362    464  1,541  280  276   112   12,884
Oct. 1997               1,471    17     324    1,368  1,753  377    201  591    102  117   44    6,365
April 1998  Jan. 1998   544      11     243    1,558  1,611  1,242  182  707    124  133   47    6,402
Oct. 1998   Jan. 1999   85       2      37     349    388    760    98   288    71   47    82    2,207
Source: Georgia Institute of Technology (1997).
Note: Data presented is extracted from surveys 3–10 and represents counts of respondents from the United States only.
Table 7.17  Comparison of Boardwatch to GVU plan speeds
Boardwatch ISP data                          GVU survey data
Date        28.8 plans  56k plans  Ratio     Date        28.8–33.6 plans  56k plans  Ratio
March 1997  3,367       446        7.55      April 1997  7,142            362        19.73
Jan. 1998   3,972       1,554      2.56      April 1998  3,169            1,242      2.55
Jan. 1999   2,562       3,006      0.85      Oct. 1998   737              760        0.97
Source: Georgia Institute of Technology (1997).
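The ratios in table 7.17 follow directly from the counts in tables 7.16 and 7.17. The sketch below recomputes them, pooling the 28.8k and 33.6k GVU respondents as the text describes, and adds the relative gap between the two sources. The pairing of Boardwatch dates with GVU survey dates follows table 7.17; the variable names and the "relative gap" measure are illustrative choices, not the authors' code.

```python
# Counts copied from tables 7.16 and 7.17; names are illustrative.
boardwatch = {                      # (28.8k plans, 56k plans)
    "March 1997": (3367, 446),
    "Jan. 1998": (3972, 1554),
    "Jan. 1999": (2562, 3006),
}
gvu_by_speed = {                    # (28.8k, 33.6k, 56k) respondents, table 7.16
    "April 1997": (4584, 2558, 362),
    "April 1998": (1558, 1611, 1242),
    "Oct. 1998": (349, 388, 760),
}
# Boardwatch date matched to the nearest GVU survey, as in table 7.17.
pairs = [
    ("March 1997", "April 1997"),
    ("Jan. 1998", "April 1998"),
    ("Jan. 1999", "Oct. 1998"),
]


def slow_to_fast_ratio(slow, fast):
    """Number of 28.8k-class observations per 56k observation."""
    return slow / fast


rows = []
for bw_date, gvu_date in pairs:
    bw_slow, bw_fast = boardwatch[bw_date]
    k288, k336, k56 = gvu_by_speed[gvu_date]
    bw_ratio = slow_to_fast_ratio(bw_slow, bw_fast)
    gvu_ratio = slow_to_fast_ratio(k288 + k336, k56)   # 28.8k and 33.6k pooled
    rows.append((bw_date, gvu_date, bw_ratio, gvu_ratio, gvu_ratio / bw_ratio))

for bw_date, gvu_date, bw_ratio, gvu_ratio, gap in rows:
    print(f"{bw_date:<11} {bw_ratio:6.2f}   {gvu_date:<11} {gvu_ratio:6.2f}   gap {gap:.2f}")
```

For 1997 the GVU ratio is roughly 2.6 times the Boardwatch ratio, in line with the statement that the Boardwatch data overrepresent 56k connections by about 2.5 times, while the 1998 and 1999 ratios are close to each other.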
References
Augereau, Angelique, and Shane Greenstein. 2001. The need for speed in emerging communications markets: Upgrades to advanced technology at Internet service providers. International Journal of Industrial Organization 19:1085–1102.
Berndt, Ernst R. 1991. The practice of econometrics: Classic and contemporary. Reading, MA: Addison-Wesley.
Berndt, Ernst R., and Zvi Griliches. 1993. Price indexes for microcomputers: An exploratory study. In Price measurements and their uses, ed. M. Foss, M. Manser, and A. Young, 63–93. Studies in Income and Wealth, vol. 57. Chicago: University of Chicago Press.
Berndt, Ernst R., Zvi Griliches, and Neal J. Rappaport. 1995. Econometric estimates of price indexes for personal computers in the 1990s. Journal of Econometrics 68:243–68.
Berndt, Ernst R., and Neal J. Rappaport. 2001. Price and quality of desktop and mobile personal computers: A quarter century historical overview. American Economic Review 91 (2): 268–73.
Boardwatch. 1996–1999. Directory of Internet service providers. Littleton, CO: Boardwatch.
Clemente, Peter C. 1998. The state of the net: The new frontier. New York: McGraw-Hill.
Cusumano, Michael, and David Yoffie. 1998. Competing on Internet time: Lessons from Netscape and its battle with Microsoft. New York: The Free Press.
Diewert, Erwin. 1987. Index numbers. In The new Palgrave: A dictionary of economics. Vol. 2, ed. J. Eatwell, M. Milgate, and P. Newman, 767–80. London: Macmillan.
Downes, Tom, and Shane Greenstein. 2002. Universal access and local commercial Internet markets. Research Policy 31:1035–52.
Georgia Institute of Technology. 1997. GVU’s WWW user survey: Surveys 3–10. Graphics, Visualization and Usability Lab, Georgia Institute of Technology. http://www.cc.gatech.edu/gvu/user_surveys/.
Goldfarb, Avi. 2004. Concentration in advertising-supported online markets: An empirical approach. Economics of Innovation and New Technology 13 (6): 581–94.
Greenstein, Shane. 2000a. Building and developing the virtual world: The commercial Internet access market. Journal of Industrial Economics 48 (December): 373–90.
———. 2000b. Valuing the net: What determines prices for dial-up Internet access? Northwestern University, Working Paper.
———. 2001. Commercialization of the Internet: The interaction of public policy and private actions. In Innovation, policy, and the economy, ed. Adam Jaffe, Josh Lerner, and Scott Stern, 151–86. Cambridge, MA: MIT Press.
———. 2002. Is the price right? The CPI for Internet access. A report for the Bureau of Economic Analysis. Washington, DC: Bureau of Economic Analysis.
Griliches, Zvi. 1961. Hedonic price indexes for automobiles: An econometric analysis of quality change. In The price statistics of the federal government, 173–96. New York: National Bureau of Economic Research.
Krol, Ed. 1992. The whole Internet. Sebastopol, CA: O’Reilly & Associates.
———. 1994. The whole Internet. 2nd ed. Sebastopol, CA: O’Reilly & Associates.
Marine, April, Susan Kilpatrick, Vivian Neou, and Carol Ward. 1993. Internet: Getting started. Indianapolis, IN: Prentice Hall PTR.
Meeker, Mary, and Chris Dupuy. 1996. The Internet report. New York: HarperCollins.
National Telecommunications Information Administration. 2001. A nation online: How Americans are expanding their use of the Internet. http://www.ntia.doc.gov/reports.html.
O’Donnell, Shawn. 2001. Broadband architectures, ISP business plans, and open access. In Communications policy in transition: The Internet and beyond, ed. Benjamin Compaine and Shane Greenstein, 35–58. Cambridge, MA: MIT Press.
Pakes, Ariel. 2002. A reconsideration of hedonic price indices with an application to PCs. NBER Working Paper no. 8715. Cambridge, MA: National Bureau of Economic Research.
Prud’homme, Marc, and Kam Yu. 1999. Towards an elementary price index for Internet services. Statistics Canada. Unpublished Manuscript.
Raff, Daniel M. G., and Manuel Trajtenberg. 1997. Quality-adjusted prices for the American automobile industry: 1906–1940. In The economics of new goods, ed. Timothy Bresnahan and Robert Gordon, 71–108. Chicago: University of Chicago Press.
Schneider, Karen G. 1996. The Internet access cookbook. New York: Neal-Schuman.
Stranger, Greg, and Shane Greenstein. 2004. Pricing in the shadow of firm turnover: ISPs in the 1990s. Northwestern University. Mimeograph.
U.S. Department of Commerce. 2003. Statistical abstract of the United States. 123rd ed. Washington, DC: Government Printing Office.
8 Different Approaches to Estimating Hedonic Indexes Saeed Heravi and Mick Silver
8.1 Introduction
Measurement bias in the U.S. Consumer Price Index (CPI) has been the subject of three major reports: the Stigler Committee (Stigler 1961); the Boskin Commission (Boskin et al. 1996); and the report by the Committee on National Statistics (2002), the Schultze panel. A major concern of all three reports was bias due to an inability of the CPI to incorporate properly the effects of changes in the quality of goods and services consumed. The primary mechanism in CPI methodology for controlling for the effects on price of quality changes is the matched-model method. A sample of items is selected in a price-reference (base) period, their prices are recorded in that period, and items are matched in subsequent periods so that the resulting price changes are untainted by quality changes; like is compared with like.
Two sources of bias may arise with this method. The first is that the matched sample ignores the prices of unmatched varieties, particularly new varieties introduced after the base period, which Triplett (2004) calls out-of-sample bias. As the matched sample becomes increasingly unrepresentative of the universe of varieties, such bias may increase (Silver and Heravi 2005). Griliches (1997) referred to the problem as being “too late”:
[O]nce included in the [U.S.] CPI, a chosen model is not changed until it is rotated out (on average after five years in the sample) or until it disappears and has to be replaced. If old items had the same price history as new ones, this would not matter. But many durable goods, and some service providers whose market share is declining, do not reduce their prices. Rather, they exit. As a result observed price history is not representative of a more inclusive average price history. Also, the current rotation policy will miss a whole generation of items whose turnover is rapid, such as computer models. . . . The big problem is that the new models are rarely compared with the old: Because the CPI does not use hedonics for PCs, it has no way to evaluate and incorporate the implicit price decline due to the appearance, successively, of the 386, 486, and Pentium models. (Griliches 1997, 170)
The second potential source of bias arises from the methods statistical offices use to estimate a continuing series of prices when items from the matched sample are no longer sold in subsequent periods. The price changes can be imputed by assuming they are the same as other goods in their class, or replacement items’ prices may be used with or without an explicit adjustment for any difference in its quality, depending on its perceived comparability. If such assumptions or adjustments are wrong, this gives rise to Triplett’s in-sample bias (Triplett 2004).
Saeed Heravi is a reader in the Quantitative Methods Group at Cardiff University. Mick Silver is a senior economist at the International Monetary Fund. Elements of this study are part of a wider study funded by the U.K. Office for National Statistics (ONS). We are grateful to the ONS for permission to reproduce some of this work in the form of this paper. This paper should not be reported as representing the views of the ONS or the International Monetary Fund (IMF), its executive board, or any of its member governments. The views expressed in this paper are those of the authors alone. We acknowledge the help of David Fenwick (ONS), Adrian Ball (ONS), and Pat Barr (GfK Marketing) in this respect. David Scott-Johnson (Bureau of Labor Statistics), W. Erwin Diewert (University of British Columbia), and Jack Triplett (Brookings) provided particularly helpful advice on earlier drafts as did Ernst Berndt (MIT), who also helped with drafting. We are also grateful for other comments received, including those from Ana Aizcorbe (Bureau of Economic Analysis), Timothy Erickson (Bureau of Labor Statistics), Matthew Shapiro (University of Michigan), and two reviewers.
Hedonic regressions have been considered best suited for quality adjustments by the Stigler, Boskin, and Schultze reports, though a more cautious stance was taken by the last of these:
Hedonic techniques currently offer the most promising approach for explicitly adjusting observed prices to account for changing product quality. But our analysis suggests that there are still substantial unresolved econometric, data, and other measurement issues that need further attention. (Committee on National Statistics 2002, 6)
This paper examines alternative approaches to the use of hedonic indexes for CPI measurement in dynamic markets to explicitly adjust for in-sample and out-of-sample bias when both matched and unmatched data are used. Such indexes are distinguished in section 8.2.2 from hedonic adjustments to noncomparable replacements that only dip into the universe of models when a model is unavailable for matching. Hedonic indexes use a sample of all prices in the periods under comparison.
The need for an evaluation of the methods for hedonic indexes requires emphasis. Many product markets are highly differentiated by brand and characteristics with rapid turnover of models. Monitoring the matched prices of, for example, near obsolete models of personal computers (PCs) while ignoring new models is unsuitable. Hedonic indexes are based on (representative samples of) prices of models in each period, some of which will be matched, but some will reflect the dynamic nature of the market. There are a host of such methods, and this paper contributes to the evidence on their nature and how they differ.
This study examines alternative methods for dealing with situations in which the matched-models method breaks down. The broad nature of hedonic indexes is outlined in section 8.2. Section 8.3 outlines thirty-six alternative methods and discusses their relative merits. These methods fall under three general approaches: (a) hedonic imputation (HI) indexes,1 which rely on parameter instability for the measurement of price changes; (b) dummy time (variable) hedonic (DTH) indexes, which paradoxically constrain parameters between the periods to be the same; and (c) fixed effects model (FE) indexes, which are similar to DTH indexes but use dummy variables for individual models, as opposed to their characteristics, to control for quality changes. The breadth of the empirical work allows us to consider a number of research questions. We comment on the use of chaining, weighting, arithmetic versus geometric aggregation, parameter instability, base-current period spread, and the differences between and relative merits of the three approaches.
Research issues and formulas are summarized in section 8.4. Section 8.5 outlines the data for the study: monthly scanner data for three electrical consumer durables, namely washing machines, vacuum cleaners, and dishwashers. The data include details of prices, sales, and quality characteristics on about 43,000 observations representing over 10 million transactions. Section 8.6 discusses the results from the thirty-six measures for three products over two years, that is, 216 resulting index numbers.
It employs a meta-analysis of this data to better establish the patterns from employing different index number formulations. This extends to an analysis of the spread of base to current-period HI indexes, an issue of interest given a recommendation by Pakes (2003) for the use of “Paasche-type” current period HI indexes that require hedonic estimates in only the base period.
1. Also referred to as “characteristic price indexes” (Triplett 2004).
8.2 The Hedonic Approach
8.2.1 Theory
The hedonic approach involves the estimation of the implicit, shadow prices of the quality characteristics of a product. A set of price-determining characteristics $z_k$ ($k = 1, \ldots, K$) of the models is identified and data over $i = 1, \ldots, N$ models are collected. An hedonic regression equation of the price of model $i$, $p_i$, on its set of quality characteristics $z_{ki}$ is given by
$$\ln p_i = \beta_0 + \sum_{k=1}^{K} \beta_k z_{ki} + \varepsilon_i = h(z_i) + \varepsilon_i. \qquad (1)$$
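As an illustration of how a semilogarithmic hedonic regression of the form in equation (1) can be estimated, the sketch below fits the intercept and the characteristic coefficients by ordinary least squares on synthetic data. Everything in it is an assumption made for illustration: the three characteristics, the "true" coefficients, the error variance, and the helper names `solve`, `ols`, and `h` are hypothetical, and the chapter's own estimates come from scanner data rather than from this code.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


random.seed(0)
true_beta = [4.0, 0.30, -0.10, 0.05]   # beta_0 and three beta_k (made-up values)
n_models = 400

x_rows, ln_price = [], []
for _ in range(n_models):
    z = [random.gauss(0.0, 1.0) for _ in range(3)]   # characteristics z_ki
    row = [1.0] + z                                   # leading 1 carries beta_0
    eps = random.gauss(0.0, 0.05)                     # error term epsilon_i
    x_rows.append(row)
    ln_price.append(sum(b * x for b, x in zip(true_beta, row)) + eps)

beta_hat = ols(x_rows, ln_price)


def h(z, beta):
    """Predicted log price h(z) for a vector of characteristics z."""
    return beta[0] + sum(b * zk for b, zk in zip(beta[1:], z))


# Imputed price for a model with one extra unit of the first characteristic.
predicted_price = math.exp(h([1.0, 0.0, 0.0], beta_hat))
print([round(b, 3) for b in beta_hat], round(predicted_price, 2))
```

Because the dependent variable is in logs, each estimated coefficient is read as the approximate proportional change in price associated with a one-unit change in that characteristic, and `h` can be used to impute a price for a bundle of characteristics that was not observed.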
The $\beta_k$ are estimates of the marginal valuations the data ascribe to each characteristic. Rosen (1974) showed that they can be equated in economic theory to a mapping of the equilibriums in characteristic space of production possibility curves and indifference curves of specific distributions of optimizing consumers and producers with respective varying tastes and technologies. Rosen (1974), Griliches (1988), Triplett (1988), and Pakes (2003)2 have argued that the derivatives of a hedonic regression should not be interpreted as estimates of either willingness-to-pay derivatives or cost derivatives, but arise from equilibrium processes (though see Diewert [2003] for a demand-based framework). Griliches (1988, 120) noted that:
My own view is that what the hedonic approach tries to do is to estimate aspects of the budget constraint facing consumers, allowing thereby the estimation of “missing” prices when quality changes. It is not in the business of estimating utility functions per se, though it can also be useful for these purposes. . . . What is being estimated is the actual locus of intersection of the demand curves of different consumers with varying tastes and the supply curves of different producers with possible varying technologies of production. One is unlikely, therefore to be able to recover the underlying utility and cost functions from such data alone, except in very special circumstances.
2. Pakes (2003) identified the hedonic price function as the sum of the marginal cost function and a function that summarizes the relationship between markups and characteristics. The coefficients can thus change when the characteristics of products or the distribution of consumer preferences change. Two implications arise. First, coefficients may change over time (new products will be directed to parts of characteristic space where markups had been high, driving down the markup), thus being “unstable.” Second, and more contentiously, Pakes argues that there is no reason to expect the coefficients to be positive on desirable characteristics.
Nerlove (2001) commented that Griliches and others would have gotten nowhere if they had paid careful attention to the formidable identification problems. Griliches justified his continued use of hedonic regressions, in the light of the ambiguity in the interpretation of their coefficients, on pragmatic grounds:
Despite the theoretical proofs to the contrary, the Consumer Price Index (CPI) “exists” and is even of some use. It is thus of some value to attempt to improve it even if perfection is unattainable. What the hedonic approach attempted was to provide a tool for estimating “missing” prices, prices of bundles not observed in the original or later periods. It did not pretend to dispose of the question of whether various observed differentials are demand or supply determined, how the observed variety of the models in the market is generated, and whether the resulting indexes have an unambiguous welfare interpretation. Its goals were modest . . . All of this has an air of “measurement without theory” about it, but one should remember the limited aspirations of the hedonic approach and not confuse it with attempts to provide a complete structural explanation of the events in a particular market. (Ohta and Griliches 1975, 326–27)
The application of hedonic regression to automobile prices in Griliches (1961, 1964) and Adelman and Griliches (1961) revived the hedonic approach to the construction of price indexes. Griliches revisited hedonic price index number methodology in Griliches (1971) with some notes on the state of the art. This early paper foreshadowed many of the current issues of concern and the empirical subject matter of this study.3 He recognized the need for weighting in regression estimates:
Most of the analyses have used unweighted data on models, specifications, and prices. But at any point of time some manufacturers may offer models with characteristics in undesirable combinations and at “unrealistic” (from the consumer’s point of view) relative prices. Such models will not sell very well and hence should also not be allowed to influence our argument greatly. There is no good argument except simplicity for the one-vote-per-model approach to regression analysis. (Griliches 1971, 325)
The argument extended to the time dummy approach: “But even here, we should use a weighted regression approach, since we are interested in an estimate of a weighted average of the pure-price change . . .” (Griliches 1971, 326). He noted further the need to investigate the empirical form of the relationship, commenting on the preferred use of semi-logarithmic form. He also drew attention to the relative merits of the two main approaches to hedonic indexes, the hedonic imputation (or characteristic price index) and the dummy time variable index.
The former comes in many forms depending on which period’s basket of characteristics is held constant. Using base- and current-period characteristic baskets may well generate different results. He referred to this as the “Laspeyres-Paasche problem” and advocated the use of chaining to ameliorate such differences.
3. Also considered in the paper were the use of second-hand market prices, a subject of subsequent research in Berndt and Griliches (1993) and Berndt, Griliches, and Rappaport (1995); the importance of utility theory in quality adjustments for price-level measurement; and the identification problem with regard to supply and demand of characteristics.
Griliches (1971) also drew attention to a sample selectivity problem in such hedonic index number construction. By using constant base-period characteristics, “new” models that exist in the current period but not in the base period are excluded. Similarly, by using constant current-period characteristics, “old” models that exist in the base period but not in the current period are excluded.4 He contrasted this index number approach with the time dummy variable hedonic index method and expressed concern about the latter. First, it constrains the estimated parameters on the characteristics to be the same. Second, it is not well articulated with the rest of the index number literature, and, finally, it is subject to the vagaries of sample selection due to a comparability problem with the models available in each of the periods compared.
This study picks up on these self-same issues, albeit nearly forty years later: weighting, arithmetic versus geometric aggregator, current- to base-period spread, chaining, and the time dummy versus HI method. His empirical work over the years did not of course neglect such issues. More recent examples include Berndt and Griliches (1993) and Berndt, Griliches, and Rappaport (1995), who showed how price indexes for computers can be constructed in various ways to give different results. They considered alternative specifications of age, time, and vintage effects in hedonic regressions and the interpretation of their coefficients and showed that estimated quality-adjusted price indexes based on these varied specifications gave different answers. They also experimented with formulations that took account of the information on quantities to estimate Laspeyres, Paasche, and (Törnqvist) Divisia-type indexes. Much empirical evidence was also provided on the instability of parameter estimates, something that, as we argue in the following and in more detail in Silver and Heravi (2007), should influence the choice of method.
This study continues this tradition of experimenting with different formulations of hedonic indexes. It is necessary to show whether choice of method does matter, that is, whether different measures provide substantively different results.
We also note that something can be said about the choice of preferred method: that weighted hedonic indexes are preferred to unweighted ones, that symmetric weights of a superlative form are preferred to asymmetric ones, and that chained hedonic indexes reduce spread and are preferred when price and quantity changes are relatively stable. Issues relating to the choice between the time dummy variable method and the HI method are also considered in the following and more fully in Silver and Heravi (2007). Further discussions of econometric issues and examples of empirical work can be found in Cole et al. (1986), Dulberger (1989), Gordon (1990), Griliches (1990), Triplett (1990), Arguea, Hsiao, and Taylor (1994), Silver and Heravi (2001), Kokoski, Waehrer, and Rozaklis (2001), Diewert (2002), Pakes (2003), and Triplett (2004).

4. However, a (geometric) mean of the two estimates would include all data.

8.2.2 Alternative Methods and the Scope of the Study

Statistical offices use the matched-models method for CPI measurement, whereby price collectors select a sample of models in a price reference period 0 and then continue to collect prices of these same matched models in subsequent periods so that the prices of like are compared with like. When a model is missing because it is obsolete, the price collector may find a replacement of a comparable quality, in which case a direct price comparison may be made. If the replacement model is not directly comparable in quality, then the coefficients (or predicted value) from a hedonic regression may be used to make a quality adjustment so that the old and new (noncomparable) unmatched prices can be compared. Silver and Heravi (2003a) refer to this as patching and Pakes (2003) as hybrid indexes. However, patching can only make use of data outside of the matched sample when an item is missing. It may be that several new varieties are introduced in a month when there are few, if any, replacements. The likely atypical price changes of the new varieties will be ignored with patching. In dynamic markets with a high rate of model turnover, such as personal computers, there is a need to resample the models sold each month if the index is to cover a representative sample of what is purchased. The concern of hedonic indexes is to ensure that changes in the average quality of the models purchased do not taint measured changes in their average price. Of course chain-linked, Divisia, matched indexes would incorporate some of the dynamic changes in the prices of goods of different qualities, but the hedonic approach was “more willing to carry the ‘linking’ idea further, across models that differed significantly in more than one dimension” (Griliches 1990, 191).

8.3 The Methods

In this section we outline thirty-six hedonic index methods. Their purpose is the same: to measure the aggregate change in price between period 0 and period t of models of a product sold, not all of which may be comparable in quality between these periods. Indeed some old models may only exist in period 0 and new ones only in period t.
The hedonic index methods outlined differ in many ways. In sections 8.3.1 to 8.3.5, the methods outlined are hedonic imputation (HI) methods in which estimates of the prices in the two periods of a constant basket of characteristics are used. For example, in the numerator of equation (2) are the estimated (predicted) prices using the coefficients from a hedonic regression estimated in period t, but applied to a period 0 set of characteristics, $h^t(z_i^0)$. In the denominator are the estimated prices using the coefficients from a regression estimated in period 0, but again applied to a period 0 set of characteristics, $h^0(z_i^0)$. All that changes in the comparison is the estimated implicit price coefficients; the bundle of characteristics compared remains the same, namely the constant period 0 ones. Equation (3) is similar to equation (2), but holds period t characteristics constant, and equation (4) is the geometric mean of the two. Equations (1) to (4) use unweighted geometric means of hedonic prices
in the numerator and denominator. In section 8.3.2, their counterparts weighted by expenditure share are outlined. Of course, an alternative approach is to calculate unweighted and weighted hedonic indexes using arithmetic means of price relatives rather than geometric means, and these arithmetic formulations are outlined in sections 8.3.3 and 8.3.4, respectively. Section 8.3.5 outlines HI indexes that, instead of holding either period 0 or period t characteristics constant, hold a mean function of these characteristics constant. Section 8.3.6 outlines a quite different approach, the dummy time hedonic (DTH) index, that uses data for the two time periods compared in a single hedonic regression estimate. It includes a dummy variable for the time period, and the coefficient on the dummy variable is an estimate of the price change untainted by quality changes. In section 8.3.7, a fixed effects estimator is used in which a dummy variable is included for each model, rather than variables on its quality characteristics. The HI, DTH, and fixed effects estimators can take a chained or direct fixed-base form or can be fully constrained. In the following, we outline these methods in further detail.

8.3.1 HI Indexes: Unweighted Geometric Means

The first approach is the HI method, which has the same formulation as equation (1), that is, separate hedonic regressions of the (log of the) price of model i on its quality characteristics $z_{ki}$ are estimated for the base and current periods. The coefficients from these regressions ($h^0$ and $h^t$) are estimates of the implicit prices of the quality characteristics ($z^0$ and $z^t$) for the base and current periods and will then be used to calculate the HI indexes. Four methods are outlined here: base- and current-period direct HI indexes (each requiring hedonic regressions in both periods), a geometric mean of the two, and an indirect current-period hedonic index requiring only a base-period regression. All methods outlined here use geometric means.
The Bureau of Labor Statistics (BLS) uses the geometric mean (the Jevons index) at this elementary level of aggregation for much of the U.S. CPI (Dalton, Greenlees, and Stewart 1998). A semilogarithmic formulation of the DTH method is used in the following, which is consistent with a geometric mean.5 The unweighted geometric (Jevons) hedonic base-period index holds base period 0 characteristics constant under both base- and current-period prices. Consider a semilogarithmic hedonic function $\hat{p}_i^0 = h^0(z_i^0)$, where $\hat{p}_i^0$ are estimated prices (excluding $\varepsilon_i$ in equation [1]) in period 0 with period 0 quality characteristics and $N^0$ observations using equation (1). The resulting unweighted (or, more precisely, equally weighted) Jevons hedonic base-period index, $P_{JHB}$, is given by

$$
P_{JHB} = \frac{\left[\prod_{i=1}^{N^0} h^t(z_i^0)\right]^{1/N^0}}{\left[\prod_{i=1}^{N^0} h^0(z_i^0)\right]^{1/N^0}} = \frac{\left[\prod_{i=1}^{N^0} h^t(z_i^0)\right]^{1/N^0}}{\left[\prod_{i=1}^{N^0} \hat{p}_i^0\right]^{1/N^0}}. \tag{2}
$$

5. Silver (2002) has shown that influence effects in the regression of outliers may distort the representativity of such indexes.
It is a hedonic price comparison because the characteristics are held constant and a base-period one because they are held constant in this period. Some authors refer to this as a Laspeyres or Laspeyres-type index. The terminology is misleading as weights have yet to be applied, and these weights may be current or base period. We refer to base- or current-period HI indexes when the characteristic set being valued is a base- or current-period one. Consider the first term of equation (2). The prices in equation (2) can be considered as those predicted from a period 0 bundle of characteristics using both period t and period 0 hedonic equations and then compared. The denominator is the geometric mean of predicted prices in period 0. The numerator is hypothetical: it is the geometric mean of prices of tied bundles of period 0 characteristics evaluated at the characteristic prices estimated in period t. Of course a utility maximizing consumer in period t would not purchase a period 0 bundle of characteristics, but choose more of those characteristics whose relative prices had fallen. The base-period HI index thus overstates, or is an upper bound on, its true theoretical cost-of-living index (COLI) because, by measuring the cost of a fixed base-period basket of characteristics, it does not allow for consumers substituting toward items or characteristics with below-average price changes.6 Consumers are not going to be worse off under a base-period imputation because they can always substitute away from the base-period bundle of characteristics and may be better off from doing so. The unweighted geometric (Jevons) hedonic current-period index with constant current-period characteristics is given by

$$
P_{JHC} = \frac{\left[\prod_{i=1}^{N^t} h^t(z_i^t)\right]^{1/N^t}}{\left[\prod_{i=1}^{N^t} h^0(z_i^t)\right]^{1/N^t}} = \frac{\left[\prod_{i=1}^{N^t} \hat{p}_i^t\right]^{1/N^t}}{\left[\prod_{i=1}^{N^t} h^0(z_i^t)\right]^{1/N^t}}. \tag{3}
$$
What is apparent from the first terms of equations (2) and (3) is that parameter instability is the essence of quality-adjusted price change measurement using HI indexes. All that changes are the estimated coefficients. It is also apparent from equation (3) that by holding the basket of characteristics constant in the current period t, the hedonic imputation will give too little emphasis to above-average price changes of characteristics. It will understate its theoretical COLI, while equation (2) will overstate it.

6. Hedonic base-period indexes are defined in economic theory as the ratio of the minimum expenditures required to maintain a base-period level of utility when the consumer faces $p^t$ and $p^{t-1}$ prices and tied bundles of quality characteristics $z^t$ and $z^{t-1}$ (Triplett 1988; Feenstra 1995; Diewert 2002).
The geometric mean of the base- and current-period HI indexes is argued by Diewert (2002) to be a suitable symmetric mean in this (and many other) contexts:

$$
P_{JHBC} = \sqrt{P_{JHB}\,P_{JHC}}. \tag{4}
$$
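As a minimal sketch of equations (2) to (4), the following Python code fits a separate semilogarithmic hedonic regression by OLS in each period and forms the base-period, current-period, and geometric-mean Jevons HI indexes. The function names (`fit_semilog`, `predict`, `jevons_hi`) and the synthetic two-characteristic data are illustrative assumptions, not part of the study; the data are generated without an error term so that the regressions recover the coefficients exactly.

```python
import numpy as np

def fit_semilog(z, p):
    """OLS of ln(price) on an intercept and the characteristics z (n x K)."""
    X = np.column_stack([np.ones(len(p)), z])
    beta, *_ = np.linalg.lstsq(X, np.log(p), rcond=None)
    return beta

def predict(beta, z):
    """Predicted price h(z) = exp(b0 + z @ b); the error term is excluded."""
    X = np.column_stack([np.ones(len(z)), z])
    return np.exp(X @ beta)

def geo_mean(x):
    """Unweighted geometric mean, computed in logs."""
    return float(np.exp(np.mean(np.log(x))))

def jevons_hi(z0, p0, zt, pt):
    """Return (P_JHB, P_JHC, P_JHBC) as in equations (2), (3), and (4)."""
    b0, bt = fit_semilog(z0, p0), fit_semilog(zt, pt)
    p_jhb = geo_mean(predict(bt, z0)) / geo_mean(predict(b0, z0))
    p_jhc = geo_mean(predict(bt, zt)) / geo_mean(predict(b0, zt))
    return p_jhb, p_jhc, (p_jhb * p_jhc) ** 0.5

# Hypothetical unmatched samples: 50 models in period 0, 60 in period t.
rng = np.random.default_rng(0)
z0 = rng.uniform(1.0, 3.0, size=(50, 2))
zt = rng.uniform(1.5, 3.5, size=(60, 2))          # average quality rises
p0 = np.exp(5.0 + z0 @ np.array([0.30, 0.10]))    # period 0 hedonic surface
pt = np.exp(4.8 + zt @ np.array([0.25, 0.12]))    # period t hedonic surface

p_jhb, p_jhc, p_jhbc = jevons_hi(z0, p0, zt, pt)
print(p_jhb, p_jhc, p_jhbc)
```

In this sketch the two samples need not contain the same models or even the same number of models. Any gap between `p_jhb` and `p_jhc` comes only from evaluating the same pair of estimated surfaces at different average characteristics.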
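The indirect current-period HI method divides an index measuring the change in price by a Jevons hedonic imputed quantity index to derive a Jevons hedonic current-period indirect price index:

$$
P_{JHCI} = \frac{\left[\prod_{i=1}^{N^t} p_i^t\right]^{1/N^t} \Big/ \left[\prod_{i=1}^{N^0} p_i^0\right]^{1/N^0}}{\left[\prod_{i=1}^{N^t} h^0(z_i^t)\right]^{1/N^t} \Big/ \left[\prod_{i=1}^{N^0} h^0(z_i^0)\right]^{1/N^0}} = \frac{\left[\prod_{i=1}^{N^t} h^t(z_i^t)\right]^{1/N^t}}{\left[\prod_{i=1}^{N^t} h^0(z_i^t)\right]^{1/N^t}} = P_{JHC}. \tag{5}
$$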
Equation (5) is, of course, equivalent to equation (3) and is used by some statistical offices on the assumption that the geometric mean of predicted prices is equal to that of actual ones. It is worth noting that what we are trying to achieve is to bring into the calculation models of different quality; more specifically, old models available in period 0, but not in period t, and new models available in period t, but not in period 0. Assume the sample is matched, $z_i^0 = z_i^t$ and $N^t = N^0 = N$. In this case, for equations (2) and (3), and similarly for other base- and current-period indexes:

$$
\frac{\left[\prod_{i=1}^{N} h^t(z_i^t)\right]^{1/N}}{\left[\prod_{i=1}^{N} h^0(z_i^0)\right]^{1/N}} = \frac{\left[\prod_{i=1}^{N} p_i^t\right]^{1/N}}{\left[\prod_{i=1}^{N} p_i^0\right]^{1/N}} = P_{JHB} = P_{JHC}. \tag{6}
$$
That is, the hedonic base- and current-period indexes for the matched samples of items with identical characteristics require no quality adjustment; they are the ratio of average prices (or average of relatives).7 Our problem arises because samples are not matched.

7. Diewert (2002) establishes similar results for weighted versions of these indexes. It can be argued that there is a bias in the estimator for a geometric mean of predicted prices, and the mean of predicted prices may not be equal to one of actual prices. However, for practical purposes and ease of exposition, we assume here that they cancel.

8.3.2 HI Indexes: Weighted Geometric Means

Equations (2), (3), (4), and (5) are unweighted indexes. In the compilation of a CPI, weights are not used at the lowest level of aggregation, say, for individual models of washing machines, due to lack of data on expenditure shares, though Balk (2002) has argued that they may be implicit in the sample design. However, it is axiomatic that were data on expenditure shares available, they should be used to weight the price changes. Because scanner data provides current- and base-period expenditure share weights ($s_i^t$ and $s_i^0$) and allows regressions to be run on current- and base-period data, the unweighted HI indexes can be compared with their weighted counterparts. Because equations (2) and (3) are ratios of geometric means, their weighted counterparts use a geometric aggregator for consistency so that the effects of weights can be determined without being confused by functional form. The weighted indexes in the base and current periods are geometric Laspeyres and geometric Paasche indexes, and these are applied to the Jevons base- and current-period HI indexes, respectively: the geometric Laspeyres base-period hedonic index,

$$
P_{HB\text{-}GLas} = \frac{\prod_{i=1}^{N^0} h^t(z_i^0)^{s_i^0}}{\prod_{i=1}^{N^0} h^0(z_i^0)^{s_i^0}}; \tag{7}
$$

the geometric Paasche current-period hedonic index,

$$
P_{HC\text{-}GPas} = \frac{\prod_{i=1}^{N^t} h^t(z_i^t)^{s_i^t}}{\prod_{i=1}^{N^t} h^0(z_i^t)^{s_i^t}}; \tag{8}
$$

and the counterpart to equation (4), the Törnqvist HI index,

$$
P_{HBC\text{-}Törnq} = \sqrt{P_{HB\text{-}GLas}\,P_{HC\text{-}GPas}}. \tag{9}
$$
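The weighted geometric formulas in equations (7) to (9) can be sketched in a few lines once the predicted prices are in hand. In the Python example below, the helper name `geo_weighted_hi`, the predicted prices, and the expenditure shares are all hypothetical values chosen for illustration; in practice the predicted prices would come from the period 0 and period t hedonic regressions.

```python
import numpy as np

def geo_weighted_hi(h_t, h_0, shares):
    """prod(h_t ** s) / prod(h_0 ** s), computed in logs for stability."""
    h_t, h_0, s = (np.asarray(a, dtype=float) for a in (h_t, h_0, shares))
    return float(np.exp(np.sum(s * (np.log(h_t) - np.log(h_0)))))

# Hypothetical base-period sample (three models).
ht_z0 = [90.0, 180.0, 285.0]    # h^t(z_i^0): period t surface, period 0 models
h0_z0 = [100.0, 200.0, 300.0]   # h^0(z_i^0): period 0 surface, period 0 models
s0 = [0.5, 0.3, 0.2]            # base-period expenditure shares (sum to 1)

# Hypothetical current-period sample (two models).
ht_zt = [170.0, 270.0]          # h^t(z_i^t)
h0_zt = [200.0, 300.0]          # h^0(z_i^t)
st = [0.4, 0.6]                 # current-period expenditure shares (sum to 1)

p_glas = geo_weighted_hi(ht_z0, h0_z0, s0)   # equation (7)
p_gpas = geo_weighted_hi(ht_zt, h0_zt, st)   # equation (8)
p_tornq = (p_glas * p_gpas) ** 0.5           # equation (9)
print(p_glas, p_gpas, p_tornq)
```

With equal shares the same function reduces to the unweighted Jevons ratios of equations (2) and (3), which is why the geometric aggregator isolates the effect of weighting.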
8.3.3 HI Indexes: Unweighted Arithmetic Means of Relatives

We compare the unweighted and weighted geometric aggregators in equations (2) to (9) with their arithmetic relatives counterparts:8 the unweighted arithmetic (Carli) hedonic base-period index,

$$
P_{CaHB} = \frac{1}{N^0} \sum_{i=1}^{N^0} \frac{h^t(z_i^0)}{h^0(z_i^0)}; \tag{10}
$$

the unweighted arithmetic (Carli) hedonic current-period index,

$$
P_{CaHC} = \frac{1}{N^t} \sum_{i=1}^{N^t} \frac{h^t(z_i^t)}{h^0(z_i^t)}; \tag{11}
$$

and the geometric mean of unweighted Carli hedonic base- and current-period indexes,

$$
P_{CaHBC\text{-}GM} = \left(P_{CaHB}\,P_{CaHC}\right)^{1/2}. \tag{12}
$$

8. There is a further set of arithmetic hedonic indexes based on the ratio of arithmetic averages, that is, Dutot hedonic indexes, as there are other formulations, including harmonic mean hedonic indexes, not considered here. For a Carli hedonic base index, separate (linear) hedonic estimates would be required for each period as $P_{CaHB} = \frac{1}{N^0}\sum_{i=1}^{N^0} h^t(z_i^0)/h^0(z_i^0) = \frac{1}{N^0}\sum_{i=1}^{N^0} h^t(z_i^0)/\hat{p}_i^0 \neq \frac{1}{N^0}\sum_{i=1}^{N^0} h^t(z_i^0)/p_i^0$. Diewert (2002) and Silver and Heravi (2003a) have argued that $\hat{p}_i^0$ should be used and not $p_i^0$ since any misspecification error that removes a price from the hedonic surface would then be included in the numerator, but not in the denominator, thus leading to bias.
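To make the contrast between the arithmetic (Carli) form of equation (10) and the geometric (Jevons) form of equation (2) concrete, the short Python sketch below applies both to the same predicted prices and then reverses the direction of the comparison. The function names and the two-model data are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

def carli(h_t, h_0):
    """Arithmetic mean of predicted price relatives."""
    return float(np.mean(np.asarray(h_t, float) / np.asarray(h_0, float)))

def jevons(h_t, h_0):
    """Geometric mean of predicted price relatives."""
    r = np.asarray(h_t, float) / np.asarray(h_0, float)
    return float(np.exp(np.mean(np.log(r))))

# Hypothetical predicted prices: one relative halves, the other doubles.
h0 = [100.0, 200.0]
ht = [50.0, 400.0]

carli_fwd, carli_back = carli(ht, h0), carli(h0, ht)      # 1.25 and 1.25
jevons_fwd, jevons_back = jevons(ht, h0), jevons(h0, ht)  # 1.0 and 1.0

print(carli_fwd * carli_back)    # 1.5625: forward times backward exceeds 1
print(jevons_fwd * jevons_back)  # 1.0: the geometric mean reverses exactly
```

The Carli index is never below the Jevons index on the same relatives, and its forward and backward comparisons do not multiply to one, which is one way of seeing why the choice of aggregator can matter empirically.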
8.3.4 HI Indexes: Weighted Arithmetic Means of Relatives

We can also compare the weighted geometric indexes in section 8.3.2, equations (7) to (9), to their arithmetic counterparts: the Laspeyres hedonic base-period index,

$$
P_{HB\text{-}Las} = \sum_{i=1}^{N^0} s_i^0 \, \frac{h^t(z_i^0)}{h^0(z_i^0)}; \tag{13}
$$

the Paasche hedonic current-period index,

$$
P_{HC\text{-}Pas} = \sum_{i=1}^{N^t} s_i^t \, \frac{h^t(z_i^t)}{h^0(z_i^t)}; \tag{14}
$$

and the Fisher hedonic index,

$$
P_{HBC\text{-}Fisher} = \sqrt{P_{HB\text{-}Las}\,P_{HC\text{-}Pas}}. \tag{15}
$$
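Equations (13) to (15) can be sketched in the same way as their geometric counterparts. The Python example below follows the formulas as written above, that is, share-weighted arithmetic means of predicted price relatives; the helper name `weighted_arith_hi` and all numbers are hypothetical.

```python
import numpy as np

def weighted_arith_hi(h_t, h_0, shares):
    """sum_i s_i * h_t_i / h_0_i, a share-weighted mean of price relatives."""
    h_t, h_0, s = (np.asarray(a, dtype=float) for a in (h_t, h_0, shares))
    return float(np.sum(s * h_t / h_0))

# Hypothetical base-period sample (three models) and shares.
ht_z0, h0_z0, s0 = [90.0, 180.0, 285.0], [100.0, 200.0, 300.0], [0.5, 0.3, 0.2]
# Hypothetical current-period sample (two models) and shares.
ht_zt, h0_zt, st = [170.0, 270.0], [200.0, 300.0], [0.4, 0.6]

p_las = weighted_arith_hi(ht_z0, h0_z0, s0)   # equation (13)
p_pas = weighted_arith_hi(ht_zt, h0_zt, st)   # equation (14)
p_fisher = (p_las * p_pas) ** 0.5             # equation (15)
print(p_las, p_pas, p_fisher)
```

The gap between `p_las` and `p_pas` is the spread that the symmetric mean in equation (15) averages over.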
As regards the preferred weighting, $s_i^0$ or $s_i^t$, both baskets and indexes are equally justifiable from a conceptual point of view. Laspeyres ($s_i^0$) is widely used for the pragmatic reason that base-period expenditure weights are readily available. Laspeyres is likely to overstate price changes because its fixed base-period weights do not reflect the substitution of items with below-average price increases for those with above-average price increases. Similarly, Paasche understates its theoretical COLI counterpart. However, there exists a class of superlative indexes, to which the Fisher and Törnqvist indexes (equations [15] and [9]) belong, that use symmetric averages of both base- and current-period quantity information (Diewert 1990). Such indexes do not suffer from substitution bias and, moreover, can also be justified from an axiomatic and average fixed-basket approach (Diewert 1997). Following Boskin et al. (1996), the BLS introduced a trailing Fisher index in recognition of its superiority as a measure of a COLI.9 A “trailing” index is one not computed in real time because there is a time lag until the necessary information (current-period weights) is gathered. Yet once computed, it is useful in establishing the magnitude and direction of any difference between it and the index computed in real time.

All of the preceding methods can be used as fixed- or chained-base indexes. A fixed-base Laspeyres HI index, for example, would compare prices in the base period 0 and current period t, while a chained version would form binary links between succeeding periods, combining them using successive multiplication.

9. Note that the Schultze panel could not reach agreement as to whether a COLI or a cost-of-goods index (COGI) should be the preferred target (Committee on National Statistics 2002).
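Chaining by successive multiplication can be illustrated with a very small Python sketch. The function name `chain` and the three monthly link values are hypothetical; each link stands for any of the binary indexes above computed between two adjacent periods.

```python
def chain(links):
    """Cumulate binary period-to-period links into a chained index series."""
    level, series = 1.0, []
    for link in links:
        level *= link          # successive multiplication of the links
        series.append(level)
    return series

# Hypothetical monthly links: January-February, February-March, March-April.
links = [0.98, 0.97, 0.99]
chained = chain(links)         # index levels for Feb, Mar, Apr with Jan = 1
print(chained)
```

A fixed-base index for April would instead be computed once from the January and April data alone, so the two can differ whenever the set of models or the weights change between the intermediate months.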
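8.3.5 Mean Value Function for Hedonic Indexes

A constant-characteristics HI index may be based on a mean value of the base- and current-period characteristics, say $\bar{z}_i = (z_i^0 z_i^t)^{1/2}$. In such a case, equations (2) and (3) would become10

$$
\frac{\left[\prod_{i=1}^{N} h^t(\bar{z}_i)\right]^{1/N}}{\left[\prod_{i=1}^{N} h^0(\bar{z}_i)\right]^{1/N}} = \frac{\left[\prod_{i=1}^{N} p_i^t\right]^{1/N}}{\left[\prod_{i=1}^{N} p_i^0\right]^{1/N}}, \tag{16}
$$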
but this would only hold for matched samples. If models exist in period t but not in period 0, and vice versa, then the left-hand side of equation (16) is a hybrid measure, the matched items being evaluated at $\bar{z}_i$ while the unmatched ones may be at $z_i^0$ or $z_i^t$ in the denominator and numerator, respectively. The equality in equation (16) would then not hold.11

8.3.6 Dummy Time Hedonic (DTH) Indexes

A second approach is the DTH variable method that, as with HI indexes, does not require a matched sample.12 The formulation is similar to equation (1) except that a single regression is estimated on the data in the two time periods compared, $i \in N^t \cup N^0$, the equation also including a dummy variable $D^t$ equal to 1 in period t and zero otherwise:

$$
\ln p_i^t = \beta_0 + \beta_1 D^t + \sum_{k=1}^{K} \beta_k^* z_{ki}^t + \varepsilon_i^t. \tag{17}
$$
The coefficient $\beta_1$ is an estimate of the quality-adjusted price change between period 0 and period t. Specifically, it is an estimate of the change in (the logarithm of) price, having controlled for the effects of variation in quality via $\sum_{k=1}^{K} \beta_k^* z_{ki}^t$. The $\beta_k^*$ coefficients are each constrained to be the same over periods 0 and t. Three versions of equation (17) are considered for both weighted and unweighted indexes. The weighted versions use a weighted least squares (WLS) estimator, with the weights being expenditure shares. Diewert (2002) shows the form the weights should take for the estimates to correspond to particular index numbers, and Silver (2002) shows how observations with undue influence affect the “representativity” of the weights.

10. Similarly defined unweighted arithmetic, weighted geometric, and weighted arithmetic baskets of characteristics can be defined. These can be placed in similarly defined unweighted and weighted HI indexes (akin to Walsh and Marshall-Edgeworth formulas). An HI index is a family of indexes and can be defined for any average basket, such indexes differing from averages of the base- and current-period indexes.

11. We can conceive of a measure that extrapolates $z_i^t$ or $z_i^0$ using $\bar{z}_j / z_j^t$ or $\bar{z}_j / z_j^0$ for j matched items expected to have similar changes in characteristics.

12. See de Haan (2003) for a variant that uses matched data when available and the time dummy only for unmatched data (his double imputation method).
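A minimal Python sketch of the binary DTH regression in equation (17) is given below. It pools the two periods, adds the time dummy, and exponentiates the dummy coefficient to obtain an index; WLS is implemented by scaling each row by the square root of its weight. The function name `dth_index`, the synthetic data, and the randomly drawn expenditure shares are assumptions made for illustration, and no adjustment is made for the bias from exponentiating an estimated coefficient.

```python
import numpy as np

def dth_index(z0, p0, zt, pt, w0=None, wt=None):
    """exp of the time-dummy coefficient from the pooled regression (17).

    If weights are given, WLS is used (rows scaled by sqrt(weight))."""
    n0, nt = len(p0), len(pt)
    d = np.concatenate([np.zeros(n0), np.ones(nt)])             # D^t
    X = np.column_stack([np.ones(n0 + nt), d, np.vstack([z0, zt])])
    y = np.log(np.concatenate([p0, pt]))
    w = np.ones(n0 + nt) if w0 is None else np.concatenate([w0, wt])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return float(np.exp(beta[1]))

# Hypothetical unmatched samples with common slopes and a pure price
# change of -0.1 in logs between period 0 and period t (no error term).
rng = np.random.default_rng(1)
z0 = rng.uniform(1.0, 3.0, size=(40, 2))
zt = rng.uniform(1.5, 3.5, size=(55, 2))
slopes = np.array([0.30, 0.10])
p0 = np.exp(5.0 + z0 @ slopes)
pt = np.exp(4.9 + zt @ slopes)

# Hypothetical expenditure shares within each period.
w0 = rng.uniform(size=40); w0 /= w0.sum()
wt = rng.uniform(size=55); wt /= wt.sum()

p_td = dth_index(z0, p0, zt, pt)             # unweighted (OLS)
p_tdw = dth_index(z0, p0, zt, pt, w0, wt)    # weighted (WLS)
print(p_td, p_tdw)
```

Because the synthetic slopes really are the same in both periods, the weighted and unweighted estimates coincide here; with real data, where slopes differ, the weights determine which models dominate the constrained fit.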
A fixed-base dummy (time) variable hedonic regression comparing January with December, for example, would use data only for these two months, the dummy variable taking a value of 1 in December and 0 in January. A rolling, chained-base dummy (time) variable hedonic regression for a January–December comparison would estimate separate fixed-base dummy variable indexes for the January–February index, the February–March index, the March–April index, . . . , the November–December index, and combine these “links” by successive multiplication. A fully constrained dummy (time) variable hedonic regression is a single constrained regression for, say, January to December with dummy variables for each month, though this is impractical in real time as it requires data on future observations.

The regressions constrain each of the quality coefficients $\beta_k^*$ to be the same across months. In restricting the slopes to be the same, the (log of the) price change between, say, periods 0 and t can be measured at any value of z. Bear in mind that the HI indexes outlined in the preceding sections estimate the difference between price surfaces with different slopes. As such, the estimates have to be conditioned on particular values of z, which gives rise to the two estimates considered: the base-period HI using $z^0$ and the current-period HI using $z^t$. For the DTH method, the very core of the method is to constrain the slope coefficients to be the same, so there is no need to condition on particular values of z. The estimate usefully and implicitly makes symmetric use of base- and current-period data.13

13. It is worth noting that Pakes (2003) is critical of the method on this very ground as he considers a proper index to be one that is an (upper) bound on the true price index, rather than an estimate of it. He argues that the coefficients might be expected to be unstable over time, and, thus, restricting the coefficients to be the same does not provide an estimate that is an (upper) bound. Yet it is well accepted that neither Laspeyres nor Paasche is conceptually superior, and a Fisher or other superlative index is preferable. The Paasche only has an advantage because it requires a single base-period hedonic equation to be estimated. But this is not only conceptually unjustified, it is inconsistent with the base Laspeyres formulation used. A Paasche imputation is neither a fixed-base period COGI, which forms the conceptual basis of many European CPIs, nor a good approximation to a Fisher COLI index, which is the conceptual base for the U.S. CPI.

8.3.7 Fixed Effects (Panel) Estimator

A fixed effects dummy (time) period regression (suggested by Diewert 2003) treats the data as if it were panel data: the observations are on cross sections of models over time. The regression equation effectively has on its right-hand side the usual dummy variables for time, but also dummy variables for each (but one reference) model in any month instead of the quality characteristics, thus allowing us to control more directly for model heterogeneity (see Aizcorbe [2003] for an application). The specification of such a model would require a large number of dummy variables, and, to ease the computation, statistical software employs an equivalent, but much simpler, procedure. Each variable for model i in period t is expressed as a deviation from its mean over all periods. The price deviations $(p_i^t - \bar{p}_i)$ for each model are regressed on the deviations of the explanatory dummy variables $(x_i^t - \bar{x}_i)$ for each model in each period t with an adjustment for degrees of freedom (Davidson and Mackinnon 1993, 323). The fixed effect panel estimator is effectively based on data of deviations of price and deviations of the dummy variables on the models from their respective means, for a model over time.

Fixed-base fixed effects indexes are estimated comparing, for example, January data directly with December for a December index based on January. However, if a model is unmatched in either month, its price deviation $(p_i^t - \bar{p}_i)$, like that of its dummy explanatory variable, is 0. The estimator effectively estimates indexes for only matched data. For an index that compares January with December, a large number of models will not be available in December (January) that were in January (December). As shown by Silver and Heravi (2001), less data is lost in the matching if chained indexes are estimated. The fixed-base, fixed effects estimator is effectively the matched-models estimator. White, Berndt, and Monroe (2004) expand on this in some detail. They call this matched-model econometrics. A chained-base fixed effects index compares January with February, February with March, . . . , November with December, the results being combined by successive multiplication. A chained fixed effect index would not necessarily include all of the data but is likely to include very much more than a fixed-base one.
For example, for a model available from January to March, the chained index for April would include its price change for the January to February and February to March links, but exclude it for the March to April link. The direct fixed-base index for January to April would exclude it (Silver and Heravi 2005). A fully constrained fixed effects index may well utilize more data than the chained version as models may appear and reappear in subsequent periods, allowing $(p_i^t - \bar{p}_i) \neq 0$ slightly more frequently.

8.4 Research Methods and Issues

8.4.1 Methods and Research Questions

Table 8.1 summarizes the formulas used in this study for all three approaches.

Table 8.1  Alternative formulas for hedonic indexes

Index                                                        Direct fixed base                        Chained base
HI indexes
  Unweighted geometric
    Jevons hedonic base imputation                           $P_{JHB}$                                $CP_{JHB}$
    Jevons hedonic current imputation                        $P_{JHC}$                                $CP_{JHC}$
    Geometric mean of above                                  $P_{JHBC} = \sqrt{P_{JHB} P_{JHC}}$      $CP_{JHBC}$
  Weighted geometric
    Geometric-Laspeyres, hedonic base imputation             $P_{HB\text{-}GLas}$                     $CP_{HB\text{-}GLas}$
    Geometric-Paasche, hedonic current imputation            $P_{HC\text{-}GPas}$                     $CP_{HC\text{-}GPas}$
    Törnqvist geo-mean, hedonic base/current imputation      $P_{HBC\text{-}Törnq}$                   $CP_{HBC\text{-}Törnq}$
  Unweighted arithmetic
    Carli hedonic base imputation                            $P_{CaHB}$                               $CP_{CaHB}$
    Carli hedonic current imputation                         $P_{CaHC}$                               $CP_{CaHC}$
    Geometric mean of above                                  $P_{CaHBC\text{-}GM}$                    $CP_{CaHBC\text{-}GM}$
  Weighted arithmetic
    Laspeyres hedonic base imputation                        $P_{HB\text{-}Las}$                      $CP_{HB\text{-}Las}$
    Paasche hedonic current imputation                       $P_{HC\text{-}Pas}$                      $CP_{HC\text{-}Pas}$
    Geometric mean of above (Fisher)                         $P_{HBC\text{-}F}$                       $CP_{HBC\text{-}F}$
Time dummy variable
  Unweighted
    Unweighted binary comparisons                            $P_{TD}$                                 $CP_{TD}$
    Unweighted fully constrained                             $P_{TD\text{-}FC}$
  Weighted
    Weighted binary comparisons                              $P_{TDW}$                                $CP_{TDW}$
    Weighted fully constrained                               $P_{TDW\text{-}FC}$
Fixed effects
  Unweighted
    Unweighted binary comparisons                            $P_{FE}$                                 $CP_{FE}$
    Unweighted fully constrained                             $P_{FE\text{-}FC}$
  Weighted
    Weighted binary comparisons                              $P_{FEW}$                                $CP_{FEW}$
    Weighted fully constrained                               $P_{FEW\text{-}FC}$

The following are the research questions:

1. Is the spread of the base- to current-period HI index (say $P_{JHB}$ to $P_{JHC}$) large? If so, neither a current-period HI index nor a base-period HI index by itself is justifiable,14 and a symmetric average of the two is more appropriate.
2. Does chaining minimize the spread?
3. Does weighting matter?
4. Does the use of a geometric aggregator over an arithmetic one matter?
5. What governs the base- to current-period hedonic spread?
6. Are the results from the DTH approach similar to those of the HI approach?
7. Does weighting for the DTH approach matter?
8. What benefits, if any, are there from using a fixed effects (panel) estimator, and how do the preceding results compare with matched-model indexes?

14. Both current-period and base-period HI indexes are equally justified. The former uses current-period characteristics and the latter base-period characteristics, and neither can be said to be right or wrong. The issue of choice only matters if the spread between the results is large, in which case a symmetric average of the two should be used.

8.4.2 Formula Choice, Changes in the Characteristic Mix, and Parameter Instability

In General

We take it as axiomatic that weighted indexes are preferred to unweighted ones. Indexes that make symmetric use of information are preferred to those that do not (Diewert 1997). So for weighted HI indexes, Fisher (Törnqvist) is preferred to (geometric) Laspeyres and Paasche, and for unweighted hedonic indexes, geometric means of base- and current-period HI indexes are preferred to their constituent elements. It is apparent from equations (2) and (3), and similar such formulas, that such differences are primarily dictated by the extent to which the characteristics change over time, that is, $(z_i^0 - z_i^t)$. But the further hedonic base- and current-period estimates are apart, the less justifiable is the use of an individual estimate and the less faith there is in a compromise geometric mean.15 For unweighted indexes, a geometric mean (Jevons) is preferred to an arithmetic mean (Carli) of price relatives. The latter is upward biased in its failure of the time reversal test, while the former can be justified under the more reasonable assumption of unitary elasticity and sampling with probability proportionate to expenditure shares (Dalton, Greenlees, and Stewart 1998; Balk 2002). Chained-base indexes are preferred to fixed-base ones, especially when samples degrade rapidly and spread is reduced. Some caution is advised when prices “bounce” as chained indexes can drift (Forsyth and Fowler 1981; Szulc 1983).
We consider in the following the relative merits of HI indexes as against DTH indexes, though note here that the equivalence of the fixed (panel) effect method to matched data makes it less desirable compared with HI and DTH indexes that use all the data (Silver and Heravi 2005).

15. As an estimate of a COLI index the spread is irrelevant as the need is to include substitution effects and Fisher meets this need. However, Laspeyres and Paasche answer meaningful questions and act as bounds on models of economic behavior that different consumers might pursue. The Fisher estimate with less dispersion is more satisfactory.

On Parameter Stability and HI Indexes

The issue of parameter stability has been raised as an area of concern to the application of hedonic indexes. There is some empirical evidence of such instability. Berndt and Rappaport (2001) found, for example, that for desktop PCs from 1987 to 1999 the null hypothesis of adjacent-year equality was rejected in all but one case. And for mobile PCs, the null hypothesis of parameter stability was rejected in eight of the twelve adjacent-year comparisons. Stability tests can also be undertaken within product areas (Berndt and Rappaport [2001] compared and found parameter instability between mobile and desktop PCs) and across countries (Heravi, Heston, and Silver [2003] tested and were unable to reject parameter stability for cross-country price). Aizcorbe (2003) showed for a study of Intel’s microprocessor chips the parameters to be unstable over time (annual data 1993–1999) and the use of different periods’ constrained parameters to lead in some periods to quite different indexes, though the parameters used were estimated from data that extended outside of the periods of the price comparisons. This would argue for our only constraining parameters within the sample comparison, unlike the fully constrained model outlined in section 8.3.6.

It should be noted that parameter instability, that is, from $h^0(z_i^0)$ to $h^t(z_i^0)$ in, for example, equation (2), is the essence of the measure of price change; it is not the cause of spread. The cause of spread between equations (2) and (3) is the change in characteristic values. If the coefficients were stable, there would be no price change in either equation (2) or (3). The HI method allows them to be unstable to enable price change measurement. Yet it has entered the debate for a specific reason. Pakes (2003) had as his target index a base-period one and, using quarterly data on PCs between 1995 and 1999, found very slight differences between base- and current-period hedonic indexes.
He concluded that it might be reasonable to use a current-period HI index for initial price index publications by government statistical agencies.16 Parameter instability is thus identified as a problem for one-sided bound estimation on the grounds that were the base-period slope and intercept parameters stable, they could serve as current-period estimates. But were slope and intercept parameters stable, the basis of the measure would have no useful meaning: it would denote no price change.17 Thus, for a target index that uses an average of base- and current-period information, we caution against the use of either estimate alone if the spread is large, something dictated by the change in characteristics.

16. Pakes (2003) also found evidence of severe instability for PCs with the null of equality of coefficients for a general model in which the fourth year, when Pentium II was launched, was constrained to be the same as the preceding three years being rejected with a $\chi^2$ of 61,000 for 18 df.
17. Of course if only slope coefficients were stable, an HI index would be equivalent to a DTH index.

Dummy Time Hedonic Index Compared with HI Indexes

While the change in the slope coefficients is the essence of price measurement for HI indexes, the DTH method paradoxically constrains these slope coefficients from the two periods to be the same. The problem with
HI indexes is that they are conditioned on a given basket of characteristics, say, base or current period, resulting in more than one possible index. An index that is invariant to the choice of basket would be one whose parameters on the characteristics were the same (parallel) over the ranges of z in multivariate space. Because, it can be argued, there is no reason to prefer period 0 estimates of the (marginal) valuations of the characteristics to period t ones, constraining the parameters to be the same, as in equation (17), is not unreasonable. Thus the intuition of averaging baskets, aside from having a physical manifestation, is no less restrictive than one of averaging (constraining to be the same) marginal valuations. Both HI indexes and DTH indexes rely on hedonic regressions for quality adjustment, and both make use of an averaging process, of base and current indexes in the former case and constrained parameters in the latter, to achieve a desired measure. There is, at least in these broad conceptual terms,18 little to choose between the two approaches (though see Silver and Heravi 2006).

The two approaches can be compared based on considerations from economic theory and their satisfaction of axioms. Diewert (2004, chapters 15–18) shows how the economic and axiomatic approaches support the use of the Fisher index number formula, and these same considerations will carry over to support of a Fisher HI index. The approach has the further advantage of giving an insight into the spread of the base- and current-period hedonic imputation estimates so that the reliability of an individual average estimate can be gauged.19 But functional forms for time dummy hedonic regressions, particularly the semilogarithmic, have also been shown (Diewert 2002) to possess good axiomatic properties. Silver (2002) has shown that while HI indexes explicitly incorporate weights, they are implicitly incorporated in the ordinary least squares (OLS) or WLS estimator used for DTH. 
The latter may not be fully representative, being subject to influence effects from observations with high leverage and residuals. Silver (2002) has also shown that adverse leverage effects are generated by observations with unusual characteristics and, again, deficiencies in measures can be attributed to characteristic mix changes. We now turn to empirical evidence on the differences between the formulas summarized in table 8.1.

18. Some care is needed in the specification of the regressions for a correspondence of the approaches. For example, Diewert (2002) shows that for matched data an average of revenue shares over the two periods should be used as weights in a WLS estimator and for unmatched data the square root of the revenue share in the relevant period, if a correspondence with a Törnqvist index is desired.
19. Diewert (2002) points out that the main advantage of HI indexes is that they are more flexible, that is, changes in tastes between periods can readily be accommodated. Yet hedonic imputations are argued to have a disadvantage that two distinct estimates will be generated, and it is somewhat arbitrary how these two estimates are to be averaged to form a single estimate of price change. Yet a Fisher average is generally supported on axiomatic grounds. Diewert (2002) rightly identifies the main advantages of the dummy variable method as being that it conserves degrees of freedom and is less subject to multicollinearity problems. In this study, we are fortunate that degrees of freedom are not an issue given the relatively large sample size. We are careful to make our quality adjustments using predicted values rather than individual coefficients to avoid bias from multicollinearity.

8.5 Preliminary Empirical Analysis

8.5.1 Data: Scope and Coverage

This study uses British scanner data on a monthly basis for the two-year period 1998 and 1999 for three consumer durables: washing machines, vacuum cleaners, and dishwashers. Scanner data are compiled from the scanner (bar-code) readings of retailers. The electronic records of almost all transactions include the transaction price, time of transaction, place of sale, and a model number code for the item sold, which is linked to a file on the characteristics of the model. The transactions are counted and prices aggregated for each model sold in each outlet-type in each month (the data being supplemented by visits to independent outlets without scanners) to yield the volume, total value of sales, and, thus, the unit value or “price” of each model in each month and outlet-type. The observations are for a model of the product in a given month in one of four different outlet types: multiples (chains), mass merchandisers, independents, and catalog. Hedonic regressions are estimated to derive, for each month, coefficients on brands, characteristics, and outlet-types.

The coverage of the data is impressive both in terms of transactions and features. For Great Britain, for example, in 1998, table 8.2 shows the data to cover about 3 million transactions for vacuum cleaners. The coverage of outlets is estimated (by GfK Marketing Services) to be “well over 90%,” with scanner data being supplemented by data from price collectors in outlets that do not possess bar-code readers. The number of observations is given for each product in table 8.2 for 1998 and 1999, there being, for example, 9,043/12, or about 750, models of vacuum cleaners sold in each month on average in 1998. 
However, these figures treat the same model sold in a different outlet-type as a separate observation as their prices may differ. For example, in 1998 there were 9,043 observations on 4,088 models of vacuum cleaners; 7,750 observations on 3,426 models of washing machines; and 4,605 observations on 2,259 models of dishwashers. Each model of vacuum cleaner, washing machine, and dishwasher was, on average, in 2.21, 2.26, and 2.04 outlet-types, respectively. From table 8.2 the data for the three products can be seen to amount to 43,000 such observations representing 10.3 million transactions valued at £2.27 billion. The observations are by model in an outlet-type, so the data clearly delineate which model is sold in each transaction.

Table 8.2  Details of the data, annual, 1998 and 1999

                     No. of transactions    No. of models by outlet     Total sales value
                     (millions)             type (observations)         (£ millions)
                     1998       1999        1998        1999            1998      1999
Dishwashers          0.382      0.436       4,621       4,483           140       140
Vacuum cleaners      3.077      3.174       9,043       9,378           420       420
Washing machines     1.517      1.732       7,750       7,728           550       600

8.5.2 Data: The Variables

The set of performance characteristics naturally varies between products. They are given in the appendix and, in their dummy variable representation, can be seen to be particularly extensive. Common to just about all products is, first, volume, which is the sum of the transactions during the period. Many of the models sold in any month have relatively low sales. Second is price, which is the unit value (value divided by quantity) of a model sold in one of four outlet-types in a month.20

8.5.3 The Hedonic Regressions

The OLS regressions were estimated on a data set that excluded models with sales of thirty or less in any month and a minimal number of models with extreme prices arising from variables not included in the data, such as stainless steel washing machines. The choice of thirty was based on some experimentation.21 The loss in the number of observations was quite severe for washing machines, from 7,750 to 3,957, while the loss in terms of the volume of sales was minimal, from 1.517 million to 1.482 million. The corresponding figures were, for dishwashers, 4,605 to 1,890 observations and 381.2 thousand to 358.5 thousand sales and, for vacuum cleaners, 9,043 to 5,367 observations and 3.077 million to 3.036 million sales. 
As should be apparent from the preceding, many of the models had often only a single transaction, being the end of an old line with relatively low average prices (Silver and Heravi 2005).

20. The definition of unit values benefits from it being defined across outlet types as opposed to all outlets in general (Silver and Webb 2003). There is some potential bias in the least-squares estimates of predicted prices. Suppose the average price of model $k$ in one of the outlet types is $p_k$ for $i = 1, \ldots, m$ transactions. Following footnote 5, let $\ln p_j = a + b x_j + u_j$, then $\ln \hat{p}_j = a + b x_j$ (in general, $p_j = \exp[h(x_j) + u_j]$). Then $p_k = \frac{1}{m_k} \sum_{i}^{m} \exp(a + b x_j + u_j) = \frac{1}{m_k} \sum_{i}^{m} \exp[h(x_j) + u_j] = \frac{1}{m_k} \sum_{i}^{m} \exp[h(x_j)] \exp(u_j)$ and $\ln(p_k) = h(x_j) + \ln\left[\frac{1}{m_k} \sum_{i}^{m} \exp(u_j)\right]$. The second term on the right-hand side need not equal zero, and this is ignored in the hedonic regression. Furthermore, as $m_k$ may be related to $p_k$ the OLS regression may be subject to omitted variable bias. However, we first examined the means of the residuals and found them close to unity (results available from authors) and, second, employed a WLS estimator that took some account of $m_k$. Finally, because our index numbers use ratios of predicted values, any bias may well cancel.
21. The results were qualitatively similar for weaker constraints.

The OLS estimated regressions all fitted well using conventional criteria
such as F-tests rejecting the null hypothesis of all coefficients equaling zero, $R^2$s of around 0.85, and individual coefficients having the expected signs and magnitudes (results available from authors). The details of each of the estimated regression equations in each month are not presented here for reasons of space.22

There is a technical issue to consider. The residuals from estimates from a semilogarithmic regression may be homoskedastic, but distributed lognormally. If so, they are biased and an adjustment of one-half the variance of the residuals is required (Van Garderen and Shah 2002). The effect was found to be minimal in this study, the standard errors being very small. For example, the effect for 1998 estimates using a fixed-base time dummy method was to lower the estimated monthly price fall by 0.001 percentage point.

22. Though they are available from the authors upon request.

8.6 Results

Table 8.3 presents the results for the thirty-six formulas for three products for 1998 and 1999, 216 indexes in all. Choice of formula does matter. The standard deviations of monthly inflation rates for 1998 and 1999 are, respectively, 0.210 and 0.242, about half the size of their respective means of –0.391 and –0.473. Bear in mind that a standard deviation of about one-half its mean value implies, when the observations are normally distributed, that approximately 95 percent of observations will be within the mean plus or minus its own value (that is, within two standard deviations), which is a substantial level of dispersion. The multitude of measures and influences makes it not straightforward to evaluate the results. Table 8.4 presents the results of a meta-analysis from a linear OLS regression of the hedonic indexes on dummy variables of distinguishing factors.

8.6.1 Why Hedonic Indexes Differ

In this section, we explore why the formulas give different results. Such differences are explained in terms of the period, product, use of weights, aggregator (geometric versus arithmetic), method (HI versus DTH versus FE index), and periodicity of the comparison (chain versus fixed base). Underlying these differences are further analytical factors including the proportion of all available observations that are actually used and have leverage on estimates and the correlation between relative (characteristic) price and quantity changes. The latter is considered separately below.

The coefficient on the year 1999 in the first column of table 8.4 shows the compound monthly change in hedonic-adjusted prices for 1999 compared with 1998: a fall in 1999, on average, by a further 0.082 percentage points than for 1998. This decline was after controlling for the different index number formulations. Dishwasher prices fell by on average 0.087
Table 8.3  Results from hedonic index numbers formulas (monthly compound growth rates; %)

Unweighted
                                                                      Washing machines    Dishwashers         Vacuum cleaners
Index                                                    Base         1998      1999      1998      1999      1998      1999
Hedonic imputations
Jevons base hedonic (PJHB, CPJHB)                        Fixed        –0.533    –0.453    –0.640    –0.241    –0.152    –0.387
                                                         Chain        –0.736    –0.409    –0.712    –0.324    –0.249    –0.485
Jevons current hedonic (PJHC, CPJHC)                     Fixed        –0.480    –0.446    –0.519    –0.662    –0.266    –0.490
                                                         Chain        –0.625    –0.429    –0.326    –0.123    –0.248    –0.384
Geo-mean (PJHBC, CPJHBC)                                 Fixed        –0.507    –0.449    –0.579    –0.452    –0.209    –0.438
                                                         Chain        –0.681    –0.419    –0.519    –0.223    –0.249    –0.435
Unweighted base hedonic (PHB–Carli, CPHB–Carli)          Fixed        –0.507    –0.423    –0.569    –0.189    –0.113    –0.343
                                                         Chain        –0.673    –0.329    –0.497    –0.058    –0.096    –0.303
Unweighted Paasche current hedonic (PHC–Carli,           Fixed        –0.455    –0.401    –0.470    –0.549    –0.228    –0.449
  CPHC–Carli)                                            Chain        –0.571    –0.335    –0.074    –0.252    –0.096    –0.217
Fisher (PHBC–Carli, CPHBC–Carli)                         Fixed        –0.481    –0.412    –0.519    –0.369    –0.170    –0.396
                                                         Chain        –0.622    –0.335    –0.286    0.096     –0.096    –0.260
DTH indexes
Binary (PTD, CPTD)                                       Fixed        –0.0531   –0.453    –0.586    –0.281    –0.265    –0.374
                                                         Chain        –0.778    –0.452    –0.637    –0.274    –0.256    –0.394
Fully constrained unweighted (PTD–FC)                    Fixed        –0.632    –0.422    –0.522    –0.198    –0.107    –0.492
Fixed effects (panel) regression indexes
Binary (PFE, CPFE)                                       Fixed        –0.549    –0.531    –0.594    –0.426    –0.426    –0.531
                                                         Chain        –0.864    –0.558    –0.746    –0.514    –0.656    –0.772
Fully constrained unweighted (PFE–PC)                    Fixed        –0.782    –0.558    –0.629    –0.435    –0.558    –0.674

Weighted
                                                                      Washing machines    Dishwashers         Vacuum cleaners
Index                                                    Base         1998      1999      1998      1999      1998      1999
HI indexes: Geometric means
Geo-Laspeyres hedonic base (PHB–GLas, CPHB–GLas)         Fixed        –0.522    –0.210    –0.501    –0.207    –0.305    –0.389
                                                         Chain        –0.689    –0.215    –0.502    –0.361    –0.225    –0.243
Geo-Paasche hedonic current (PHC–GLas, CPHC–GLas)        Fixed        –0.568    –0.302    –0.399    –0.555    –0.236    –0.318
                                                         Chain        –0.710    –0.259    –0.326    –0.198    –0.201    –0.185
Törnqvist (PHBC–Törnq, CPHBC–Törnq)                      Fixed        –0.545    –0.256    –0.450    –0.381    –0.270    –0.353
                                                         Chain        –0.700    –0.238    –0.414    –0.280    –0.213    –0.214
HI indexes: Arithmetic means of relatives
Laspeyres base hedonic (PHB–Las, CPHB–Las)               Fixed        –0.498    –0.401    –0.463    –0.181    –0.290    –0.367
                                                         Chain        –0.650    –0.174    –0.336    –0.187    –0.167    –0.161
Paasche current hedonic (PHC–Pas, CPHC–Pas)              Fixed        –0.539    –0.277    –0.365    –0.477    –0.216    –0.291
                                                         Chain        –0.673    –0.211    –0.890    –0.011    –0.139    –0.103
Fisher (PHBC–Fisher, CPHBC–Fisher)                       Fixed        –0.519    –0.235    –0.414    –0.329    –0.253    –0.329
                                                         Chain        –0.662    –0.193    –0.240    –0.088    –0.153    –0.132
DTH indexes
Binary (PTDW, CPTDW)                                     Fixed        –0.488    –0.564    –0.380    –0.315    –0.228    –0.553
                                                         Chain        –0.702    –0.507    –0.525    –0.385    –0.245    –0.450
Fully constrained (PTDW–FC)                              Fixed        –0.549    –0.484    –0.431    –0.228    –0.255    –0.555
Fixed effects (panel) regression indexes
Binary (PFEW, CPFEW)                                     Fixed        –0.594    –0.865    –0.773    –0.576    –0.254    –0.665
                                                         Chain        –1.345    –0.984    –1.345    –0.865    –0.920    –1.530
Fully constrained (PFEW–FC)                              Fixed        –0.938    –0.855    –0.801    –0.531    –0.210    –0.674

Note: Monthly compound growth rate is calculated for January to December as $r$ in: January index $\times (1 + r)^{12}$ = December index.
Table 8.4  Meta-analysis regression of monthly compound hedonic inflation rates

                                                                                      Time dummy less hedonic
                             Hedonic indexes           Spread (absolute values)       imputation index (absolute values)
Regression of:               Coefficient  t-statistic  Coefficient  t-statistic       Coefficient  t-statistic
Intercept                    –0.252       –6.58∗∗∗     0.123        2.30∗∗            0.001        0.02
1999                         –0.082       –3.63∗∗∗     0.255        3.83∗∗∗           0.003        0.07
Washing machines             –0.186       –6.69∗∗∗     0.048        0.89              0.027        0.76
Dishwashers                  –0.087       –3.12∗∗∗     0.046        0.87              0.003        0.08
Chained                      0.105        4.17∗∗∗      –0.156       –2.31∗∗           0.099        2.21∗∗
Weighted                     –0.008       –0.34        0.007        0.16              0.061        2.11∗∗
Geometric-aggregation        –0.063       –2.28∗∗      –0.085       –1.96∗            0.005        0.19
Fixed effect (FE)            –0.327       –8.26∗∗∗
Time dummy hedonic (TD)      0.041        0.75
TD chained                   –0.160       –2.49∗∗
TD weighted                  –0.002       –0.04
Geo-mean of indexes          0.013        0.37
Current weighted             0.003        0.07
R2                           0.47                      0.38                           0.35
N                            216                       24                             24

∗∗∗Significant at the 1 percent level. ∗∗Significant at the 5 percent level.
percentage points more than vacuum cleaners, and washing machines even further, by 0.186 percentage points more than vacuum cleaners. Chained indexes fell by on average 0.105 percentage points less than fixed-base ones, and weighted ones had no statistically significant difference to unweighted ones, after controlling for other features. We emphasize that these findings are for the overall average effect and that weighting can matter for some products (less so in 1998); for example, for washing machines in 1999, the unweighted geometric mean of geometric base and current period HI indexes fell by 0.449 percent, compared with a weighted Törnqvist index falling by 0.256 percent (table 8.3). The use of geometric aggregation, as opposed to arithmetic aggregation, led to an on average further fall of 0.063 percentage points. The Fisher and Törnqvist hedonic indexes can be seen from table 8.3 to be fairly close in their fixed-base weighted form, but less so in their chained form and even less so in their unweighted formulations.23 The latter is because prices of poorly selling models decrease more rapidly than popular models.

23. A Fisher HI index differs from the Törnqvist HI index not only in the functional form of the weighted aggregator but also in linear as against semilog functional form used for the hedonic imputations.
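The relationship between these weighted aggregators can be illustrated with a small numerical sketch. It is not the authors' code: it assumes textbook definitions (Laspeyres as a base-share arithmetic mean of price relatives, Paasche as a current-share harmonic mean, and their geometric counterparts), and the relatives, shares, and function names are hypothetical. In a hedonic imputation setting the relatives would be ratios of predicted prices from the two periods' hedonic regressions.

```python
import math

def arithmetic_mean(relatives, shares):
    # Share-weighted arithmetic mean of price relatives.
    return sum(s * r for r, s in zip(relatives, shares))

def harmonic_mean(relatives, shares):
    # Share-weighted harmonic mean of price relatives.
    return 1.0 / sum(s / r for r, s in zip(relatives, shares))

def geometric_mean(relatives, shares):
    # Share-weighted geometric mean of price relatives.
    return math.exp(sum(s * math.log(r) for r, s in zip(relatives, shares)))

# Hypothetical price relatives (period t price / period 0 price) and
# expenditure shares for two models; not taken from the study's data.
relatives = [0.9, 0.8]
base_shares = [0.5, 0.5]
current_shares = [0.4, 0.6]

# Arithmetic-type indexes and their Fisher (geometric) average.
laspeyres = arithmetic_mean(relatives, base_shares)
paasche = harmonic_mean(relatives, current_shares)
fisher = math.sqrt(laspeyres * paasche)

# Geometric-type indexes and their Tornqvist-style average.
geo_laspeyres = geometric_mean(relatives, base_shares)
geo_paasche = geometric_mean(relatives, current_shares)
tornqvist = math.sqrt(geo_laspeyres * geo_paasche)

# Spread between the base- and current-weighted arithmetic indexes.
spread = abs(laspeyres - paasche)

print(f"Laspeyres {laspeyres:.4f}  Paasche {paasche:.4f}  Fisher {fisher:.4f}")
print(f"Geo-Laspeyres {geo_laspeyres:.4f}  Geo-Paasche {geo_paasche:.4f}  "
      f"Tornqvist {tornqvist:.4f}")
print(f"Base/current spread {spread:.4f}")
```

In this toy case the two symmetric averages lie between the base- and current-weighted indexes and nearly coincide, in line with the closeness of the fixed-base weighted Fisher and Törnqvist results noted above.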
The FE estimator was argued in the preceding to be implicitly constrained to matched samples, and its use had a more pronounced effect: a further fall of 0.327 percentage points against other approaches. As identified earlier, the FE fixed-base index implicitly only considers matched data between January and December, and the chained-base index is based on only matched successive binary comparisons. The fixed-base FE matched-models estimator ignores, in any comparison between, say, January and December, the unmatched old models, available in January but unavailable in December. It also ignores the unmatched new models, available in December but unavailable in January. It is thus based on a more restrictive sample of data than the other indexes. The restriction is quite severe. Table 8.5 shows that for dishwashers, for example, just over half of the models available in January were no longer available in December 1998 for matching. Moreover, in December 1998 the matched sample had lost about 12 percent of the total January sales value (unmatched old models) and about 30 percent of the December sales value (unmatched new models; Silver and Heravi 2005). The chained-base indexes have clearly fallen more than their fixed-base counterparts, possibly due to the exclusion in the latter of many more unmatched new models with relatively low quality-adjusted prices.

There was no overall statistically significant difference between the DTH and HI indexes; the coefficient on the time dummy (TD) in table 8.4 of 0.041 compared with the benchmark HI was not statistically significant. However, in its chained form, this difference was statistically significant, amounting to a quite substantial 0.16 percentage points. Table 8.3 reports substantial variation in the results for different DTH formulations; the chained form, for example, fell faster, by and large (or at a roughly equivalent rate), than the fixed-base index. The decision to use any of these three DTH formulations, given the discrepancies in results, argues for a clear idea of purpose. If it is to compare prices in a comparative static manner, not influenced by what went on in between, the fixed base is appropriate; otherwise the chained-path dependency or constrained aggregation is preferred.
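The distinction between fixed-base and chained DTH comparisons can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's estimation code: it uses a single hypothetical characteristic z, noise-free made-up data, and the closed form that OLS takes for a pooled semilog regression with one characteristic and one period dummy. The function name and all values are invented for the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def binary_time_dummy_index(z_a, logp_a, z_b, logp_b):
    """Index for period b relative to period a from the pooled regression
    ln p = alpha + beta*z + delta*D_b, where D_b marks period b.
    With one characteristic, OLS gives beta as the pooled within-period
    slope and delta as the gap in mean log prices net of beta times the
    gap in mean characteristics."""
    za, zb = mean(z_a), mean(z_b)
    ya, yb = mean(logp_a), mean(logp_b)
    sxy = (sum((z - za) * (y - ya) for z, y in zip(z_a, logp_a))
           + sum((z - zb) * (y - yb) for z, y in zip(z_b, logp_b)))
    sxx = sum((z - za) ** 2 for z in z_a) + sum((z - zb) ** 2 for z in z_b)
    beta = sxy / sxx
    delta = (yb - ya) - beta * (zb - za)
    return math.exp(delta)

# Hypothetical data: a different set of models each month (unmatched),
# a stable slope on z, and a 1 percent (log) price fall per month.
z_by_month = [[1.0, 2.0, 3.0, 4.0],
              [1.5, 2.5, 4.0, 5.0],
              [2.0, 3.0, 5.0, 6.5]]
slope = 0.5
month_effects = [0.0, -0.01, -0.02]
logp_by_month = [[1.0 + slope * z + d for z in zs]
                 for zs, d in zip(z_by_month, month_effects)]

# Fixed base: compare month 0 directly with month 2.
fixed_base = binary_time_dummy_index(z_by_month[0], logp_by_month[0],
                                     z_by_month[2], logp_by_month[2])

# Chained: multiply the successive binary comparisons 0-1 and 1-2.
chained = 1.0
for a, b in [(0, 1), (1, 2)]:
    chained *= binary_time_dummy_index(z_by_month[a], logp_by_month[a],
                                       z_by_month[b], logp_by_month[b])

print(f"fixed base: {fixed_base:.6f}, chained: {chained:.6f}")
```

With a stable slope and no noise, as assumed here, the two routes agree. When the slope coefficients drift between months, the fixed-base comparison forces a single slope across distant periods while the chained one only constrains adjacent months, which is one reason the two can diverge.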
Table 8.5  Summary of coverage of matched models, 1998

                                Percentage of January’s    Percentage of January’s    Percentage of December’s
                                observations               sales value                sales value
Washing machines, December      53.00                      81.60                      48.20
Vacuum cleaners, December       63.61                      95.32                      72.60
Dishwashers, December           55.86                      87.78                      72.14
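Coverage figures of the kind reported in table 8.5 can be computed along the following lines. This is a hedged sketch rather than the authors' procedure: the function name, the model identifiers, and the sales values are hypothetical, and a model is treated as matched simply if it has sales in both months.

```python
def matched_coverage(jan_sales, dec_sales):
    """Return, in percent, the share of January observations, January sales
    value, and December sales value accounted for by models sold in both
    months. Each input maps a model identifier to its sales value."""
    matched = jan_sales.keys() & dec_sales.keys()
    pct_jan_obs = 100.0 * len(matched) / len(jan_sales)
    pct_jan_value = (100.0 * sum(jan_sales[m] for m in matched)
                     / sum(jan_sales.values()))
    pct_dec_value = (100.0 * sum(dec_sales[m] for m in matched)
                     / sum(dec_sales.values()))
    return pct_jan_obs, pct_jan_value, pct_dec_value

# Hypothetical sales values by model; identifiers are made up.
january = {"A": 500.0, "B": 300.0, "C": 150.0, "D": 50.0}
december = {"A": 400.0, "B": 200.0, "E": 250.0, "F": 150.0}

coverage = matched_coverage(january, december)
print("obs %.1f%%, January value %.1f%%, December value %.1f%%" % coverage)
```

In the made-up data, models C and D are unmatched old models and E and F are unmatched new ones, so a matched-models comparison would ignore part of the sales value in each month.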
It should be borne in mind that some of the HI indexes in the data are geometric means of other indexes, though their impact, other things being equal, is not statistically significant. A more important concern is the difference (coefficient on the dummy) between current-period hedonic imputations as against their base-period counterparts. Other things controlled for, this difference or spread is not statistically significant, but this is on average for a meta-analysis, and thus we consider spread in further detail in the following.

8.6.2 The Spread between Base- and Current-Period HI Indexes

The spread of individual results can be seen in table 8.3 to be quite substantial. We employed a similar meta-analysis to that used for all the index results, but on the (twenty-four absolute values of the) spread between the 48 current- and base-weighted formulas. The mean spread was 0.17, with a standard deviation of 0.027 percentage points. The distribution was highly skewed as differences were more substantial than expected: for dishwashers in 1999, for example, the monthly average fall for the fixed-base HI index was 0.241 percent compared with 0.662 percent for the current-period HI index, although other indexes had relatively small spread: the monthly average fall for washing machines was 0.453, compared with 0.446, in 1999 for base- and current-HI indexes, respectively. A regression (table 8.4) of the twenty-four differences found the (absolute) spread can change over time. In 1999 the spread was, on average, a substantial 0.255 percentage points more than its monthly amount in 1998. A minimal spread in one period should not be expected to hold for the next. Chaining reduced the (absolute) spread by a considerable 0.156 percentage points, on average, and the use of a geometric mean aggregator further reduced such spread by 0.085 percentage points (though this was borderline statistically significant at the 5 percent level). 
An advantage of chaining is that it generally reduces spread as long as prices and quantity movements are smooth (see Forsyth and Fowler 1981; Szulc 1983).

8.6.3 Differences between HI Indexes and Dummy Time Variable Hedonic (DTH) Indexes

The mean and standard deviation of the absolute differences between the two methods were 0.09 and 0.018 percentage points, respectively. Table 8.3 shows that the results from the DTH approach often fell outside of the base- and current-period HI index bounds. Any differences between the approaches were argued in section 8.4.3 to be in part positively associated with spread. Because spread was itself determined via the dummy variables that characterize the formulas (table 8.4), we regressed the difference between the DTH and HI indexes on such characteristics. Chaining and weighting were both found to increase the absolute value of the difference between the DTH results as against the HI ones (table 8.4). Chaining can
be seen to be influential in all the regressions in table 8.4: it helps explain variation in hedonic indexes, spread, and the difference between DTH and HI indexes. There is a preference for chaining for the DTH approach as the restriction on the coefficients to be the same over binary successive periods is not as severe as a fixed-base DTH restriction. There is also a preference for chaining if it reduces the spread in HI indexes. While the results suggest, to its merit, that chaining reduces spread in HI indexes, the increase in the difference between DTH and HI indexes is undesirable. The authors have undertaken research on factors relating to the difference between DTH and HI indexes to help resolve this quandary (Silver and Heravi 2007). The difference arising from weighting is more problematic because on grounds of “representativity” we cannot argue for unweighted indexes. In table 8.4, weighting can be seen to have influence only with regard to the difference between DTH and HI indexes, and this may be due to the manner in which weights are used in the two formulations, in DTH indexes via WLS and explicitly in HI indexes.24

It is, of course, possible to say something about the best estimates. As noted in the preceding, superlative indexes are preferred to nonsuperlative ones and are known to approximate each other, so there is a strong case for Fisher or Törnqvist HI indexes and symmetrically weighted DTH indexes. Chaining is generally preferred to fixed-base indexes as it reduces the spread for HI indexes (as long as prices and quantities move smoothly) and requires less-severe restrictions for the coefficients from DTH regressions. Weighted indexes are generally preferred to unweighted ones.

24. Silver (2002) showed that the weights implicit in a WLS estimator need not correspond to those explicitly used in an HI index because of influence effects.

8.7 Summary

The CPI measures aggregate changes in the prices of matched models of goods. The models are matched over time so that only pure price changes are measured, not those due to changes in quality. However, in many product markets, new models of different quality are introduced and old models discontinued. A CPI based solely on matched models would ignore these new and old models and not properly represent price changes. Hedonic regressions are a mechanism by which a valuation can be put on the quality components of a model of a good. Hedonic indexes allow aggregate price changes to be measured that include models of changing quality, that is, new and old unmatched models. As was argued in section 8.2, the need for hedonic indexes for the measurement of quality-adjusted prices is particularly acute in cases where there are differentiated products subject to a high turnover in models.

But hedonic indexes come in many forms. Further, the different forms
can give different results. The methods used in this study differ with regard to whether a chained formula or fixed-base index is used; a geometric as opposed to arithmetic mean; a fixed effect approach that only takes account of matched models, or one that uses all models; a DTH method as opposed to an HI method; and whether base-period, current-period, or some average of the two periods’ characteristics are held constant. There are many ways of using hedonic indexes. In order to examine the extent of variation between methods and to try to explain such variability, this paper provides results on a meta-analysis of the 216 results (table 8.4) arising from examining thirty-six methods for constructing hedonic indexes for three products over each of two years, 1998 and 1999. A finding of the study is that the choice of method does matter; the standard deviations of monthly inflation rates for different methods for 1998 and 1999 are 0.210 and 0.242, respectively, about half of their respective means of falls of 0.391 and 0.473 percent. The meta-analysis reports smaller overall falls from chaining, and larger falls from geometric aggregation and from the (matched) fixed effect approach. Of particular note are the substantial differences between chained DTH comparisons, which are built up by successive multiplication from regression estimates over linked periods (say January with February, February with March, . . . , November with December), and fixed-base DTH ones that compare January directly with February, and then January directly with March, . . . , December. When regression coefficients change over time, a case can be made for the use of chaining as fixed-base DTH indexes are more restrictive in their assumptions. Also of note is the quite substantial base- and current-period spread found in HI indexes. The extent of such spread is shown to be unstable over time (1998 compared with 1999) but could be reduced by chaining and employing a geometric aggregator. 
The result of substantial differences between HI indexes and DTH indexes is also of interest. The discussion in sections 8.4.2 and 8.4.3 implied there is little to choose between these approaches on theoretical grounds, which is cause for concern given the extent of the differences found. In particular, chaining was found to increase such differences arguing against its use from this standpoint. The results of the paper are limited to British electrical domestic appliances and different patterns may emerge for other countries or product areas. Theory provides some guidelines as to which of these hedonic index formulations are preferred. As noted in the preceding section, there are good theoretical reasons to prefer symmetric averages or superlative formulations to base- or current-period ones, though often data are only available for asymmetric formulations. This study shows that the spread between the asymmetric base- and current-period formulations can be substantial. Chaining also has a good theoretical justification—unless prices and quantities do not move smoothly—and chaining should reduce
the base- and current-period spread. The implicit restriction of the fixed effects estimator to matched data for binary comparisons argues against its use. However, the disparities between the two major approaches, DTH and HI indexes, remain of major concern and research on an analytical framework behind this difference is required (see Silver and Heravi 2007).

As regards further work, insights25 into the difference between index number formulas can be obtained by decompositions of the differences between such formulas. For example,26 the ratio of Paasche to Laspeyres price indexes ($P_P$ and $P_L$) can be decomposed into an expression comprising the weighted coefficients of variation of price changes ($\sigma_p / P_L$) and quantity changes ($\sigma_q / Q_L$) and the weighted correlation coefficient ($r_{p:q}$) between price and quantity changes:

(18)   $\dfrac{P_P}{P_L} = 1 + r_{p:q} \, \dfrac{\sigma_p}{P_L} \, \dfrac{\sigma_q}{Q_L}$

Details are provided in Allen (1975, 62), following a derivation in the 1922 and 1923 works of von Bortkiewicz and an application is in Abel, Berndt, and White (2003). However, first, such analysis is for matched price and quantity relatives, and our concern is with matched and unmatched data. Second, the prices and weights for indexes such as in equations (2) and (3) are prices and quantities of characteristics. Further research might look at how the Bortkiewicz decomposition may be applied to unmatched data and characteristic prices.
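For matched data, the decomposition in equation (18) can be obtained in a few lines. The sketch below is not taken from the chapter: it assumes base-period value shares $w_i$ as the weights and writes $p_i^t$ and $q_i^t$ for the price and quantity of item $i$ in period $t$ (notation introduced here for illustration only). The weighted covariance of the price and quantity relatives is $r_{p:q}\,\sigma_p\,\sigma_q$ by definition of the correlation coefficient.

```latex
\begin{align*}
w_i &= \frac{p_i^0 q_i^0}{\sum_j p_j^0 q_j^0}, \qquad
P_L = \sum_i w_i \frac{p_i^1}{p_i^0}, \qquad
Q_L = \sum_i w_i \frac{q_i^1}{q_i^0}, \\[4pt]
r_{p:q}\,\sigma_p\,\sigma_q
  &= \sum_i w_i \left(\frac{p_i^1}{p_i^0} - P_L\right)\left(\frac{q_i^1}{q_i^0} - Q_L\right)
   = \frac{\sum_i p_i^1 q_i^1}{\sum_i p_i^0 q_i^0} - P_L Q_L, \\[4pt]
P_P &= \frac{\sum_i p_i^1 q_i^1}{\sum_i p_i^0 q_i^1}
     = \frac{1}{Q_L}\,\frac{\sum_i p_i^1 q_i^1}{\sum_i p_i^0 q_i^0}
     = \frac{P_L Q_L + r_{p:q}\,\sigma_p\,\sigma_q}{Q_L}, \\[4pt]
\frac{P_P}{P_L} &= 1 + r_{p:q}\,\frac{\sigma_p}{P_L}\,\frac{\sigma_q}{Q_L}.
\end{align*}
```

Every step relies on each item having a price and a quantity in both periods, which is why the extension to unmatched data is not immediate.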
25. Alternatively the difference between some formulas can be phrased in terms of the dispersion of price relatives and economic theories relating to prices dispersion can be applied to explain such differences (Silver and Heravi 2003b).
26. Allen (1975, 186) also provides details of a decomposition of the difference between chained- and fixed-base Laspeyres.

Appendix

Characteristic Sets Included in Regression Formulations

Washing Machines

(i) Manufacturer (make): dummy variables for about twenty makes; (ii) type of machine (five types): top-loader, twin tub, washing machine (WM), washer dryer (WD) with and without computer, WD with or without condensers; (iii) drying capacity of WD; (iv) height of machines in centimeters; (v) width; (vi) spin speeds (five main): 800rpm, 1,000rpm, 1,100rpm, 1,200rpm and 1,400rpm; (vii) water consumption; (viii) load capacity; (ix) energy consumption (kWh per cycle); (x) free standing, built-under and integrated, built-under not integrated, built-in and integrated; (xi) vintage; (xii) outlet types: multiples, mass merchandisers, independents, multiples; (xiii) vintage is the year in which the first transaction of the model took place.

Dishwashers

(i) Manufacturer (make): dummy variables for about twenty-two makes; (ii) type of machine (four types): built-under, built-under integrated, table top, free standing; (iii) with microchip; (iv) width; (v) height; (vi) kWh per cycle; (vii) number of plates; (viii) number of programs; (ix) partly integrated, fully integrated, nonintegrated switch panel; (x) water consumption; (xi) stainless steel; (xii) vintage; (xiii) outlet types: multiples, mass merchandisers, independents, multiples; (xiv) vintage is the year in which the first transaction of the model took place.

Vacuum Cleaners

(i) Manufacturer (make): dummy variables for about twenty-nine makes; (ii) wattage; (iii) integrated or separate; (iv) remote control; (v) cord rewind; (vi) shampoo; (vii) speed control; (viii) soft or hard box; (ix) type of machine (six types): cylinder, upright, wet or dry, steam, handstick, rechargeable; (x) outlet types: multiples, mass merchandisers, independents, multiples.
References Abel, J. R., E. R. Berndt, and A. G. White. 2003. Price indexes for Microsoft’s personal software products. NBER Working Paper no. 9966. Cambridge, MA: National Bureau of Economic Research, September. Adelman, I., and Z. Griliches. 1961. On an index of quality change. Journal of the American Statistical Association 56:535–48. Aizcorbe, A. 2003. The stability of dummy variable price measures obtained from hedonic regressions. FEDS Working Paper no. 2003-05. Washington, DC: Board of Governors of the Federal Reserve System, February. http://www .federalreserve.gov/pubs/feds/2003/index.html. Allen, R. D. G. 1975. Index numbers in theory and practice. London: Macmillan. Arguea, N. M., C. Hasea, and G. A. Taylor. 1994. Estimating consumer preferences using market data: An application to U.S. automobile demand. Journal of Applied Economics 9:1–18. Balk, B. M. 2002. Price indexes for elementary aggregates: The sampling approach. Statistics Netherlands Research Paper no. 0231. Voorburg, The Netherlands: Statistics Netherlands. Berndt, E. R., and Z. Griliches. 1993. Price indexes for microcomputers: An exploratory study. In Price measurement and their uses, ed. Murray F. Foss, Marilyn E. Manser, and Allan H. Young, 63–93. Studies in Income and Wealth, vol. 57. Chicago: University of Chicago Press. Berndt, E. R., Z. Griliches, and N. J. Rappaport. 1995. Econometric estimates of
price indexes for personal computers in the 1990s. Journal of Econometrics 68:243–68. Berndt, E. R., and N. J. Rappaport. 2001. Price and quality of desktop and mobile personal computers: A quarter-century historical overview. American Economic Review 91 (2): 268–73. Boskin, M. J., E. R. Dulberger, R. J. Gordon, Z. Griliches, and D. W. Jorgenson. 1996. Toward a more accurate measure of the cost of living. Final report of the Advisory Commission to Study the Consumer Price Index. Washington, DC: Government Printing Office. Cole, R., Y. C. Chen, J. A. Barquin-Stolleman, E. Dulberger, N. Helvacian, and J. H. Hodge. 1986. Quality-adjusted price indexes for computer processors and selected peripheral equipment. Survey of Current Businesses 65 (1): 41–50. Committee on National Statistics. 2002. At what price? Conceptualizing and measuring cost-of-living and price indexes. Panel on Conceptual, Measurement and Other Statistical Issues in Developing Cost-of-Living Indexes, ed. Charles Schultze and Chris Mackie. Washington, DC: National Academy Press. Dalton, K. V., J. S. Greenlees, and K. J. Stewart. 1998. Incorporating a geometric mean into the CPI. Monthly Labor Review 120 (10): 1–6. Davidson, J. R., and J. G. Mackinnon. 1993. Estimation and inference in econometrics. New York: Oxford University Press. Diewert, W. E. 1990. The theory of the cost-of-living index and the measurement of welfare change. In Price level measurement, ed. W. E. Diewert, 79–147. Amsterdam: Holland. ———. 1997. Commentary for “Alternative strategies for aggregating prices in the CPI.” Federal Reserve Bank of St. Louis Review 79 (3): 113–25. ———. 2002. Hedonic regressions: A review of some unresolved issues. University of British Columbia, Department of Economics. Mimeograph. ———. 2003. Hedonic regressions: A consumer theory approach. In Scanner data and price indexes, ed. Mathew Shapiro and Rob Feenstra, 317–48. Studies in Income and Wealth, vol. 61. Chicago: University of Chicago Press. ———. 2004. 
Chapters 15 to 18. In Consumer price index manual: Theory and practice, 263–344. Geneva: International Labour Office. http://www.ilo.org/public/ english/bureau/stat/guides/cpi/index.htm. Dulberger, E. 1989. The application of an hedonic method to a quality adjusted price index for computer processors. In Technology and Capital Formation, ed. D. W. Jorgenson and R. Londaus, 37–75. Cambridge, MA: MIT Press. Feenstra, R. C. 1995. Exact hedonic price indexes. Review of Economics and Statistics 77 (4): 634–53. Forsyth, F. G., and R. F. Fowler. 1981. The theory and practice of chain price index numbers. Journal of the Royal Statistical Society A 144 (2): 224–47. Gordon, R. L. 1990. The measurement of durable goods prices. Chicago: University of Chicago Press. Griliches, Z. 1961. Hedonic price indexes for automobiles: An econometric analysis of quality changes. Government Price Statistics: Hearings before the Subcommittee on Economic Statistics of the Joint Economic Committee. 87th Cong., January 24, 1961. ———. 1964. Notes on the measurement of price and quality changes. In Models of income determination, 381–418. NBER Studies in Income and Wealth, vol. 28. Princeton, NJ: Princeton University Press. ———. 1971. Hedonic price indexes revisited: Some notes on the state of the art. In Price indexes and quality change, ed. Z. Griliches, 3–15. Cambridge, MA: Harvard University Press. ———. 1988. Postscript on hedonics. In Technology, education, and productivity, ed. Zvi Griliches, 119–22. New York: Basil Blackwell.
———. 1990. Hedonic price indexes and the measurement of capital and productivity: Some historical reflections. In Fifty years of economic measurement: The jubilee conference of research in income and wealth, ed. E. R. Berndt and J. E. Triplett, 185–202. Studies in Income and Wealth, vol. 54. Chicago: University of Chicago Press. ———. 1997. The commission report on the consumer price index: Commentary. Federal Reserve Bank of St. Louis Review 79 (3): 169–73. Haan, J. de. 2003. Time dummy approaches to hedonic price measurement. Paper presented at the seventh meeting of the International Working Group on Price Indices, Paris. Heravi, S., A. Heston, and M. Silver. 2003. Using scanner data to estimate country price parities: An exploratory study. Review of Income and Wealth 49 (1): 1–22. Kennedy, P. 1998. A guide to econometrics. Oxford, UK: Blackwell. Kokoski, M., K. Waehrer, and P. Rozaklis. 2001. Using hedonic methods for quality adjustment in the CPI: The consumer audio products components. BLS Working Paper no. 344. Washington, DC: Bureau of Labor Statistics. Nerlove, M. 2001. Zvi Griliches, 1930–1999: A critical appreciation. The Economic Journal 111 (472): 442–48. Ohta, M., and Z. Griliches. 1975. Automobile prices revisited: Extensions of the hedonic hypothesis. In Household production and consumption, ed. N. Terleckyj, 325–91. Studies in Income and Wealth, vol. 40. New York: National Bureau of Economic Research. Pakes, A. 2003. A reconsideration of hedonic price indexes with an application to PCs. The American Economic Review 93 (5): 1576–93. Rosen, S. 1974. Hedonic prices and implicit markets: Product differentiation in perfect competition. Journal of Political Economy 82:34–55. Silver, M. 2002. The use of weights in hedonic regressions: The measurement of quality-adjusted price changes. Cardiff University, Cardiff Business School. Mimeograph. Silver, M., and S. Heravi. 2001. Scanner data and the measurement of inflation. 
The Economic Journal 11 (June): 384–405. ———. 2003a. The measurement of quality-adjusted price changes. In Scanner data and price indexes, ed. Mathew Shapiro and Rob Feenstra, 277–317. Studies in Income and Wealth, vol. 61. Chicago: University of Chicago Press. ———. 2003b. Why price index number formulae differ: Economic theory and evidence on price dispersion. In Proceedings of the 7th meeting of the (U.N.) International Working Group on Price Indices (Ottawa Group), ed. Thierry Lacroix, 175–212. Paris: INSEE. http://www.insee.fr/en/nom_def_met/colloques/ottawa/ pdf/paper_silver.pdf. ———. 2005. Why the CPI matched models method may fail us: Results from an hedonic and matched experiment using scanner data. Journal of Business and Economic Statistics 23 (3): 269–81. ———. 2007. The difference between hedonic imputation indexes and time dummy hedonic indexes. Journal of Business and Economic Statistics 25 (2). Silver, M., and B. Webb. 2003. The measurement of inflation: Aggregation at the basic level. Journal of Economic and Social Measurement 28 (1–2): 21–36. Stigler, G. 1961. The price statistics of the federal government. Report to the Office of Statistical Standards, Bureau of the Budget. New York: National Bureau of Economic Research. Szulc, B. J. 1983. Linking price index numbers. In Price level measurement, ed. W. E. Diewert and C. Montmarquette, 537–66. Ottawa: Statistics Canada. Triplett, J. E. 1988. Hedonic functions and hedonic indexes. In The new Palgrave’s dictionary of economics, 630–634. New York: Macmillan. ———. 1990. Hedonic methods in statistical agency environments: An intellectual
biopsy. In Fifty years of economic measurement: The jubilee conference on research in income and wealth, ed. E. R. Berndt and J. E. Triplett, 207–33. Studies in Income and Wealth, vol. 56. Chicago: University of Chicago Press. ———. 2004. Handbook on quality adjustment of price indexes for information and communication technology products. Paris: Organization for Economic Cooperation and Development. Van Garderen, K. J., and C. Shah. 2002. Exact interpretation of dummy variables in semi-logarithmic equations. Econometrics Journal 5:149–59. White, A. G., J. R. Abel, E. R. Berndt, and C. W. Monroe. 2004. Hedonic price indexes for personal computer operating systems and productivity suites. NBER Working Paper no. 10427. Cambridge, MA: National Bureau of Economic Research, April.
9
Price Indexes for Microsoft’s Personal Computer Software Products

Jaison R. Abel, Ernst R. Berndt, and Alan G. White
9.1 Introduction

In this paper, we report on research examining measures of price changes for Microsoft’s personal computer (PC) software products over the time period July 1993 through June 2001. The focus of this paper is on the measurement of price changes for Microsoft’s software products, not on the factors underlying or causing any price changes. As such, this paper adds to a relatively small literature on price indexes for PC software products (summarized in section 9.6 of this paper). That literature for the most part ends in 1994 or earlier, and typically focuses on sales only in the retail or mail-order channels, for full versions of software products.

In the following, we argue that changes in product form and distribution channel since 1994 imply that retail or mail-order sales of full versions of stand-alone software products are increasingly unrepresentative of Microsoft’s transactions. We therefore examine price changes for Microsoft’s software products based on prices received by Microsoft for virtually all its PC software products over the primary channels of distribution through which Microsoft sells.

More specifically, here we report on the measurement of price changes in Microsoft’s PC desktop operating systems and applications over the time period July 1993 through June 2001. The operating systems included in this analysis are MS-DOS, Windows, Windows 95, Windows 98, Windows Millennium Edition, Windows NT Workstation, and Windows 2000 Professional. In terms of applications, we measure price changes for the applications Word and Excel (sold as stand-alone products and in suites such as Office and Works) and Office.1 We collectively refer to these Microsoft operating systems and applications products as “the Microsoft Products.”

Jaison R. Abel is a vice president of Analysis Group. Ernst R. Berndt is the Louis B. Seley Professor of Applied Economics at the Sloan School of Management, Massachusetts Institute of Technology, and director of the Program on Technological Progress and Productivity Measurement at the National Bureau of Economic Research. Alan G. White is a vice president of Analysis Group. This research was originally sponsored by Microsoft Corporation in the context of various antitrust allegations and was conducted under the direction of Ernst Berndt. The opinions expressed herein are those of the authors and not necessarily those of Microsoft Corporation, its legal counsel, MIT, or the NBER. The authors gratefully acknowledge the assistance and input of many on this project, in particular Cory Monroe, Sarita Digumarti, and Eric Korman for numerous hours of careful and thorough research assistance; other staff at Analysis Group; Microsoft counsel; and NBER Conference on Research in Income and Wealth participants, in particular David Johnson and Timothy Bresnahan, for helpful comments and suggestions.

9.2 Background: Significant Changes in the Marketplace for Prepackaged PC Software

Summarizing the pricing behavior of a large multiproduct firm is particularly challenging when diverse product market segments are dynamic, and significant changes occur over time involving channel of distribution mix, product form, and quality improvements. This is clearly the case in the markets for prepackaged PC software that we study.

For Microsoft’s operating systems, between 1993 and 2001, the majority of licenses were sold through the Original Equipment Manufacturer (OEM) channel,2 while full-packaged product (that is, in a shrink-wrapped package) sales declined significantly from 1995 (when Windows 95 was introduced) to 2001. For applications, the share of licenses sold under volume-licensing agreements increased substantially during the 1993 to 2001 time period.3 In fact, volume licensing sales have largely replaced the shrink-wrapped full packaged product sales of the early 1990s. Finally, sales of applications software have grown more rapidly than those of operating systems, with the license share of applications growing from about 20 percent in 1993 to slightly more than a third of sales in 2001.
1. A “stand-alone” version of software is one that is not sold as part of a suite or any other integrated software package.
2. Sales in the OEM channel are primarily to personal computer manufacturers, such as Compaq and Dell. Sales in the finished-goods channel are primarily to distributors and resellers.
3. In these calculations, a suite such as Office is a single license, even if it contains both word processor (Word) and spreadsheet (Excel) components. Volume-licensing programs are pricing agreements targeted toward larger organizations that provide discounts based on the number of desktops for which Microsoft software is licensed. Open and select agreements are two of Microsoft’s most popular volume-licensing programs.

Changes in the product form have also occurred over time for both applications and operating systems. At various times, upgrades to preexisting software versions have comprised a significant percentage of sales, depending on the timing and release of new versions of various operating systems (e.g., Windows 95) and applications. For applications, Enterprise Agreements (described in the following) have constituted an increasing percentage of applications sales, reaching over 25 percent of applications sales in 2001. There has also been a dramatic shift in product form for Word and Excel during the 1990s with sales of Word or Excel as part of the Office suite almost completely replacing stand-alone sales of Word and Excel. For example, for Excel, in 1993 the proportion of licenses sold in stand-alone form was about 35 percent, while the remaining approximately 65 percent of licenses were sold as part of the Office suite; by 2001, these proportions had changed to less than 1 percent and over 99 percent, respectively. For Word, the stand-alone share has fallen from about 50 percent in 1993 to less than 10 percent in 2001.

Prices for Microsoft’s software differ considerably across channel, user type, and product form so that changing compositions have a material impact on aggregate average-price or price index calculations. Such changes need to be accounted for in measuring aggregate price trends over time. For example, the average prices for operating systems sold through the OEM channel are generally lower than those in the finished-goods channel. An overall average price across both channels of distribution would lie somewhere in between the two average prices from the separate channels, depending on the relative sales and price-level differences between these two channels of distribution.

Economists have long recognized that in such a dynamically evolving context, in order to measure aggregate price change, the use of chain-weighted price index procedures is generally preferable to various average-price calculations.
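The composition effect described above can be illustrated with a small numerical sketch. All figures below are hypothetical and are not drawn from the MS Sales data: prices are held constant within each channel while the license mix shifts toward the lower-priced OEM channel. The overall average price falls even though no price has changed, whereas a matched-model index computed channel by channel shows no change.

```python
# Hypothetical illustration: unit prices are constant within each channel,
# but the license mix shifts toward the cheaper OEM channel.
prices = {"OEM": 50.0, "Finished Goods": 100.0}      # assumed, same in both periods
licenses_0 = {"OEM": 600, "Finished Goods": 400}     # assumed base-period licenses
licenses_1 = {"OEM": 900, "Finished Goods": 100}     # assumed current-period licenses


def average_price(p, q):
    """Total revenue divided by total licenses (a unit-value average)."""
    revenue = sum(p[c] * q[c] for c in p)
    return revenue / sum(q.values())


def laspeyres(p0, p1, q0):
    """Matched-model Laspeyres index: base-period basket at new vs. old prices."""
    return sum(p1[c] * q0[c] for c in p0) / sum(p0[c] * q0[c] for c in p0)


avg_0 = average_price(prices, licenses_0)        # average price, base period
avg_1 = average_price(prices, licenses_1)        # average price, current period
index = laspeyres(prices, prices, licenses_0)    # prices unchanged in each channel

print(f"average price: {avg_0:.2f} -> {avg_1:.2f}")
print(f"matched-model Laspeyres index: {index:.3f}")
```

In this made-up case the average price drops purely because of the shift in channel mix, which is the distortion that defining elementary units by channel is meant to remove.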
However, it is also widely believed that use of price index methods, such as the chained matched-model methods, can fail to incorporate fully the quality-change implications of exiting and newly entering goods; see, for example, discussions in Oliner and Sichel (1994) and Grimm and Parker (2000). Indeed, as discussed in the following, this failure to capture fully the quality-adjusted price declines has led the Bureau of Economic Analysis (BEA) to make an explicit additional quality adjustment when constructing and utilizing the Bureau of Labor Statistics’ (BLS) prepackaged software producer price indexes (PPIs) in computing real gross domestic product (GDP) by industry.

9.3 Elementary Units and Aggregate Price Indexes

9.3.1 Elementary Units and Matched-Model Price Indexes

In the matched-model framework of price index measurement, a well-defined product, called an elementary unit, is identified on the basis of the product’s distinct price-determining characteristics. It is this elementary
matched-model unit that is used as the basic building block for making price comparisons over extended time periods. Price changes of elementary units are then weighted to construct aggregate price indexes. In particular, when the BLS collects price data for its monthly price indexes, an effort is made, wherever possible, to compare prices of the same well-defined elementary units over time. By defining the elementary unit in detailed terms and then comparing prices over time only for well-defined matched models, the price index comparison avoids problems caused by comparing prices of different products. Although matched-model price indexes have some limitations (such as the inability to capture fully quality-change effects from newly entering or exiting products), currently, in almost all developed countries, measures of aggregate price inflation are constructed by government statistical agencies using matched-model procedures.

9.3.2 Fixed-Basket Indexes: Laspeyres, Paasche, and Fisher

The fixed-basket approach to measuring price changes is used by the BLS and implicitly by the BEA. By a fixed basket, it is meant that price changes of a fixed set of clearly defined elementary units are compared over time. The weights that are applied to this fixed set of elementary units to calculate an aggregate price index are also fixed over time. In practice, the implementation of a fixed-basket price index raises a number of difficult issues. Products disappear, and new products appear over time. When this occurs, the fixed basket can become unrepresentative or even obsolete. Furthermore, as prices of certain products become relatively more expensive, the fixed-basket approach does not take into account the fact that some consumers will switch to products that are relatively less expensive (which implies that the quantities and quantity weights associated with the relatively more expensive products would become smaller).
Not only does the fixed-basket approach assume fixed quantities of the products whose prices are being measured over time, but in general it also implicitly assumes that the quality of these products is held constant. For many products, product quality has improved over time, and this quality change needs to be taken into account when computing quality-adjusted measures of price change. Over the years, the problems of unrepresentative baskets, exiting and new products, and quality change have been discussed in numerous reports and studies, most recently by Boskin et al. (1996), The Conference Board (1999), and the National Academy of Sciences (2002), all with respect to the U.S. Consumer Price Index (CPI) published by the BLS. Much of index number theory and the academic literature on price indexes focuses on the issue of which index number formula is most appropriate when combining prices of varying products over time into a summary measure of average changes in prices. One characteristic that distinguishes different price index number formulas is the choice of weights
that are applied to the different prices. Some indexes weight all prices equally (called “unweighted” indexes), while others use distinct and unequal weights for different products (called “weighted” indexes).4 The best-known weighted index number formulas used for making price comparisons over time are the Laspeyres, Paasche, and Fisher price indexes. The BEA computes its price indexes using a variant of the chained Fisher Ideal price indexes; these official price indexes are used when the BEA converts nominal GDP and nominal gross product by industry (GPI) into inflation-adjusted real GDP and real GPI.5

4. A discussion of differences in weighting methodologies in price indexes can be found in Diewert (1995) and Balk (1995).
5. The BEA uses price indexes published by the BLS, and occasionally modifies these, and then uses these as inputs when constructing measures of real output. For an account of BEA’s adoption of the Fisher price index, see Triplett (1992).

In this paper, measures of price change in the Microsoft Products over time are constructed based on the chained Fisher Ideal price index, using sequentially updated quantity weights. We also present the Laspeyres and Paasche versions of the price indexes. Laspeyres and Paasche price indexes can differ considerably in situations where weights are changing rapidly. When demand curves are fixed, it is of course well known that in response to a small increase in the price of one good, the measured price increase will be larger for the Laspeyres than the Paasche price index. When demand curves are shifting, however, this need not be the case. On the supply side, firms can increase quantity supplied in response to a price increase, generating a situation in which the measured price increase will be larger for the Paasche instead of the Laspeyres. More generally, the following relationship can be shown to exist between the Laspeyres and Paasche price indexes, when computed over bilateral time periods:

$$\frac{P_P}{P_L} = \frac{Q_P}{Q_L} = 1 + r \left( \frac{\sigma_p}{P_L} \cdot \frac{\sigma_q}{Q_L} \right), \tag{1}$$

where $P_i$, $Q_i$, $i = L, P$, are the Laspeyres (L) and Paasche (P) price and quantity indexes, respectively, $r$ is the weighted correlation coefficient between the price and quantity relatives, and $\sigma_i$, $i = p, q$, are the weighted standard deviations of the price and quantity relatives, respectively. Note that the expression in parentheses in equation (1) is the product of two coefficients of variation, that is, the standard deviations of the price and quantity relatives divided by their respective weighted means. Because the product of the two terms in parentheses is always positive (assuming a nonzero standard deviation), the sign of $r$ is sufficient to determine the direction of the divergence between the Paasche and Laspeyres price indexes, for example, if $r$ is positive (negative) then the Paasche price index value will be greater (less) than the Laspeyres price index value calculated over
the same time period. A derivation of this formula can be found in Allen (1975, 62–63), drawing on earlier work by von Bortkiewicz (1922, 1924, referenced in Allen [1975]). Although relatively uncommon, there are instances in the published literature in which the Paasche price index shows a greater increase or a smaller decline than the Laspeyres price index. In Berndt, Busch, and Frank (2001, table 12.7, 491), for example, between 1991 and 1992, a Laspeyres price index for the treatment of acute phase major depression increased from 1.000 to 1.003, while the Paasche increased from 1.000 to 1.011. The intuition behind this is that nonhomothetic demand shifts were occurring that were larger for the increasingly expensive component treatments. In that context, as physicians learned about the efficacy and increased tolerability of a new class of higher-priced antidepressant drugs (a class known as the selective serotonin reuptake inhibitors [SSRIs]), relative demand shifts occurred favoring the higher-priced treatment. A related interpretation is that the measured prices in that context failed to account properly for quality improvements in the new class of antidepressant drugs, and that had proper quality-adjusted prices been utilized instead, the more common Paasche less than Laspeyres price increase result might instead have resulted. Allen (1975) discusses other contexts in which various inequalities between Paasche and Laspeyres can occur; a related discussion is also found in Danzon and Chao (2000). Because observed price and quantity movements reflect the net outcome of changes in demand and supply, differing inequalities between measured Paasche and Laspeyres price index changes can occur over time, reflecting a variety of underlying shifts in demand and supply. 
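The relationship in equation (1) can be checked numerically. The sketch below uses made-up prices and quantities for three matched products (none of the figures come from the paper) and assumes base-period value shares as the weights, so that the weighted means of the price and quantity relatives equal the Laspeyres price and quantity indexes. It computes the Laspeyres, Paasche, and Fisher indexes and confirms that the sign of the weighted correlation determines which of the Paasche and Laspeyres indexes is larger.

```python
import math

# Hypothetical prices and quantities for three matched products, two periods.
p0 = [10.0, 20.0, 30.0]
p1 = [12.0, 19.0, 33.0]
q0 = [100.0, 50.0, 20.0]
q1 = [90.0, 60.0, 25.0]


def dot(a, b):
    """Sum of element-wise products, e.g. total value of a basket."""
    return sum(x * y for x, y in zip(a, b))


# Bilateral price and quantity indexes.
P_L = dot(p1, q0) / dot(p0, q0)   # Laspeyres price index
P_P = dot(p1, q1) / dot(p0, q1)   # Paasche price index
Q_L = dot(p0, q1) / dot(p0, q0)   # Laspeyres quantity index
Q_P = dot(p1, q1) / dot(p1, q0)   # Paasche quantity index
P_F = math.sqrt(P_L * P_P)        # Fisher Ideal price index

# Base-period value shares (assumed weighting) and the price/quantity relatives.
base_value = dot(p0, q0)
w = [p * q / base_value for p, q in zip(p0, q0)]
price_rel = [b / a for a, b in zip(p0, p1)]
qty_rel = [b / a for a, b in zip(q0, q1)]

# Weighted covariance, standard deviations, and correlation of the relatives.
cov = sum(wi * (x - P_L) * (y - Q_L) for wi, x, y in zip(w, price_rel, qty_rel))
sigma_p = math.sqrt(sum(wi * (x - P_L) ** 2 for wi, x in zip(w, price_rel)))
sigma_q = math.sqrt(sum(wi * (y - Q_L) ** 2 for wi, y in zip(w, qty_rel)))
r = cov / (sigma_p * sigma_q)

# Right-hand side of equation (1).
rhs = 1.0 + r * (sigma_p / P_L) * (sigma_q / Q_L)

print(f"P_L={P_L:.4f}  P_P={P_P:.4f}  P_F={P_F:.4f}  r={r:.4f}")
print(f"P_P/P_L={P_P / P_L:.6f}  Q_P/Q_L={Q_P / Q_L:.6f}  rhs={rhs:.6f}")
```

In this toy example prices and quantities move in opposite directions for the first product, so the correlation is negative and the Paasche index lies below the Laspeyres index, the more common case discussed in the text.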
In the following, we present empirical findings on the divergence between Laspeyres and Paasche price indexes in the context of software price indexes for the Microsoft Products and put forward interpretations of these divergences. We note in passing that substantial differences between Paasche and Laspeyres price indexes have been reported by Prud’homme and Yu (2002), based on matched-model price indexes and scanner data for various prepackaged PC software products sold in Canada between January 1996 and June 2000. Notwithstanding these inequalities in the two components of the Fisher Ideal price index, we emphasize that the literature expresses a strong desire for using the Fisher Ideal price index for measuring price changes over time (see, in particular, Diewert 1992).

9.4 Matched-Model Price Indexes for Microsoft’s Software Products

9.4.1 Data

We now consider implementation of the chained Fisher matched-model price index method to measure price changes for the Microsoft Products.
The data we use for our analyses are from MS Sales, Microsoft’s internal transactions database, and cover the time period July 1993 through June 2001.6 These data contain revenue and license information for the universe of Microsoft’s sales into the first line of distribution, for example, distributors and OEMs. Producer prices and corresponding weights are calculated from these data and are then used in the matched-model price indexes reported in the following. Because the transactions prices reflect prices received by Microsoft at the first point in the distribution chain, they are best interpreted as corresponding to producer rather than consumer price indexes. The products contained in the MS Sales data are organized in a hierarchical fashion at different levels of aggregation. The product-family level of the MS Sales product hierarchy provides the most appropriate grouping of transactions for the purposes of constructing matched-model indexes for the Microsoft Products.7

6. Specifically, the data used in our analyses are taken from the Microsoft “As Shipped” and “As Allocated” perspectives of the MS Sales data.
7. Microsoft defines a product family as “A group of functionally equivalent products that share the same core features, facilities, and public name across multiple operating systems, versions, and languages” (MS Product Attribute Reference Guide, page 17, last updated October 19, 2001). The desktop operating systems product families used in our analyses are “MS-DOS,” “MS-DOS with Enhanced Tools,” “Windows,” “Windows for Workgroups,” “Windows 95,” “WIN95/ISK BUNDLE,” “Windows 98,” “Windows ME,” “Windows NT Workstation,” and “Windows 2000 Professional.” The applications product families used in our analyses are “Word,” “Excel,” “Office,” “Office Professional,” “Office Pro w/VisFoxPro,” “Office Pro/Bookshelf Bundle,” “Office w/Bookshelf,” “Office Small Business,” “Office Pro/Bookshelf/Vfoxpro,” “Office Premium,” “Office Pro w/FrontPage,” “Office Pro Special Edition,” and “Office Pro w/Publisher.”

9.4.2 Identifying the Elementary Unit

Defining the elementary unit for making price comparisons is the first step in constructing matched-model price indexes. We employ two considerations in drawing the boundaries of an elementary unit, that is, in defining “buckets” of distinct elementary units. We placed two products in the same bucket only if two conditions were satisfied. First, we placed two products in the same bucket if substitutability in response to a price change would likely be substantial, but placed them in separate buckets if possibilities for substitutability in response to a price change were likely to be very limited. Thus, because of the substantial costs of changing one’s eligibility, academic sale products were placed in a bucket different from nonacademic sales. Similarly, because eligibility to purchase an upgrade was contingent on first purchasing the full version, upgrades were treated as a separate elementary unit from full versions.

A second criterion was based on functionality. Here the issue is what criteria to use in determining whether two versions of, say, a word processor program, were sufficiently similar or different to merit placing them in the same or in different buckets. Software companies such as Microsoft typically release a new version of a product, for example, moving from version 5.xx to 6.xx after they have made significant changes to the product. Because new versions can contain significant changes to the original product and may be priced differently, new versions of a product are properly viewed as a separate product, that is, as a distinct elementary unit within the context of the matched-model framework. It is also common for software companies to update their products, for example, move from version 3.1 to version 3.2, to correct “bugs” in the source code or to introduce minor changes to the previous product. Because such updates do not constitute significant changes in functionality and typically are offered free to licensees having purchased that version, it is appropriate to treat them as part of the same elementary unit to which the previous version belongs.

Although these boundaries are inherently to some extent subjective, it is worth noting that in computing its producer price indexes for prepackaged software, the BLS generally treats different versions as distinct elementary units, for example, version 5.xx as different from 6.xx, but treats updates as being in the same elementary unit as the original version. This construction of boundaries among versions and updates is also consistent with procedures utilized in Oliner and Sichel (1994) and in the maximum overlap method of Prud’homme and Yu (2002). With these general considerations in mind, for the Microsoft Products we define elementary units along the following four dimensions:
• Channel: Finished Goods, OEM
• User Type: Full Version, Upgrade/Maintenance, Enterprise Agreement
• Academic Status: Academic, Nonacademic
• Product Family Version: For example, Office Professional 6.XX
Price Indexes for Microsoft’s Personal Computer Software Products
Defining an elementary unit in this way ensures that period-to-period price comparisons for a product are not influenced by underlying changes in product form or channel composition. For example, a product sold through the finished-goods channel typically has a higher price level than the same product sold through the OEM channel. If prices within a channel remained constant as relatively more consumers purchased via the finished-goods channel, then a period-to-period comparison of prices over both channels for this product would lead one to conclude erroneously that prices have increased. Therefore, when measuring price changes over time, it is important to control for the channel through which the product is sold.
A similar issue exists regarding the user type and academic status of a product. The prices of products sold as a full version, upgrade or maintenance, or as part of an enterprise agreement can differ quite substantially. The same is true of products sold to academic and nonacademic consumers. Thus, as with channel of distribution, it is necessary to control for the changing underlying composition of the academic status and user type to which the software is ultimately sold.
Maintenance agreements are arrangements typically entered into by volume licensing customers that provide a customer with all upgrades released for a given Microsoft product over a two-year time period. Thus, maintenance agreement licenses are functionally equivalent to upgrades. In our analyses, maintenance agreements have therefore been grouped in the same bucket as the more traditional upgrades. Enterprise agreements, which Microsoft introduced in November 1997, are typically three-year agreements that provide volume discounts for a combination of full-version and upgrade products and contain a built-in maintenance component.8 Because they represent a combination of full versions and upgrades, we treat them as a distinct elementary unit.
Because maintenance and enterprise agreements are not traditional single-user licenses, several adjustments to our data analyses were necessary so that they could be incorporated into the price index calculations. Because Microsoft allows its customers to pay for the software they license over the life of each agreement, we developed a procedure based on historical software trends and life cycles that capitalized the revenue associated with maintenance and enterprise agreements in order to make the price comparable to more traditional software licensing programs. This was accomplished by doubling the revenue attributed to the maintenance category (which is typically a two-year agreement) and tripling the revenue attributed to the annuity (enterprise agreement) category (which is typically a three-year agreement) observed in the year in which the original purchase occurred.
8. Initially, enterprise agreements were offered as a bundle of three products: an Office suite, a Windows desktop operating system, and Back Office (a server-based product), but currently enterprise agreements are offered separately for individual products as well.
In addition, because customers are entitled to automatic upgrades during the life of each agreement, we implemented a procedure based on historical software trends and life cycles that adjusted the number of licenses a typical customer would realize over the course of each agreement. According to the U.S. BLS, the average product life cycle for a successful software product is eighteen months (Bureau of Labor Statistics 2000). This suggests that, on average, a typical maintenance agreement would be associated with 1.33 licenses over two years, while a typical enterprise agreement would be associated with three licenses over three years as a license is obtained at the beginning of the agreement. These “capitalized” revenue and “realized” licenses are then used to compute the per-unit license prices for the elementary units involving maintenance and enterprise agreements.
Once the elementary unit has been defined, one must then identify and obtain two fundamental pieces of information: prices and corresponding weights. Using data from MS Sales, we calculate the average price, by calendar year and product family version, along each of the following eight dimensions:9
• Finished Goods, Full Version, Nonacademic
• Finished Goods, Full Version, Academic
• Finished Goods, Upgrade/Maintenance, Nonacademic
• Finished Goods, Upgrade/Maintenance, Academic
• Finished Goods, Enterprise Agreement, Nonacademic
• Finished Goods, Enterprise Agreement, Academic
• OEM, Full Version, Nonacademic
• OEM, Upgrade/Maintenance, Nonacademic
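The capitalization of revenue and the "realized" license counts described above for maintenance and enterprise agreements can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names and input figures are hypothetical, and reading the enterprise factor of three as one initial license plus 36/18 upgrades is one interpretation of the text.

```python
"""Sketch: per-unit license prices for maintenance and enterprise agreements.

Factors taken from the text: maintenance revenue is doubled (two-year
agreement), enterprise revenue is tripled (three-year agreement), and an
18-month product life cycle implies 24/18 = 1.33 licenses per maintenance
agreement and three licenses per enterprise agreement.
All names and input figures are hypothetical.
"""

PRODUCT_LIFE_CYCLE_MONTHS = 18  # BLS (2000) figure quoted in the text

# agreement type -> (agreement length in months, initial license included?)
# Assumption: the enterprise count of three is 1 initial license + 36/18.
AGREEMENT_TERMS = {
    "maintenance": (24, False),  # upgrades only
    "enterprise": (36, True),    # a license is obtained at the start
}


def capitalized_revenue(kind: str, observed_revenue: float) -> float:
    """Scale purchase-year revenue to the full life of the agreement."""
    months, _ = AGREEMENT_TERMS[kind]
    return observed_revenue * (months / 12)


def realized_licenses(kind: str, agreements: float) -> float:
    """Licenses a typical customer realizes over the agreement."""
    months, has_initial_license = AGREEMENT_TERMS[kind]
    per_agreement = months / PRODUCT_LIFE_CYCLE_MONTHS
    if has_initial_license:
        per_agreement += 1
    return agreements * per_agreement


def unit_price(kind: str, observed_revenue: float, agreements: float) -> float:
    """Capitalized revenue divided by realized licenses."""
    return capitalized_revenue(kind, observed_revenue) / realized_licenses(kind, agreements)


if __name__ == "__main__":
    # Invented figures: revenue observed in the purchase year, agreements sold.
    print(round(unit_price("maintenance", 100_000.0, 1_000), 2))
    print(round(unit_price("enterprise", 300_000.0, 1_000), 2))
```

Keeping the agreement length and the life-cycle assumption as named constants makes it easy to see how sensitive the resulting unit prices are to the eighteen-month figure.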
Calculating prices on an annual basis allows for the effect of returns and credits to be incorporated into the analysis. In some instances, particularly with monthly or quarterly periodicity, the incorporation of returns and credits can produce negative prices, revenues, or licenses for a particular elementary unit in a given period. When this situation arose in the annual context (which was considerably less frequent than with monthly or quarterly time intervals), in order to preserve the match we replaced the negative price with the most recent positive price from a previous time period and assigned this price a weight of zero.
9. Maintenance and enterprise agreement transactions occur only in the finished-goods channel. Within the OEM channel, only nonacademic sales take place.
9.5 Results of Price Changes for Microsoft Products
Prices for the Microsoft Products have in general declined between 1993 and 2001. The extent of price decline varies among products and across different time periods. Trends in aggregate matched-model price indexes for stand-alone Word, stand-alone Excel, Office, and desktop operating systems are displayed in tables 9.1 and 9.2.10 Initially, we discuss price index changes based on the Fisher Ideal price index, and later on we focus on differences between the Paasche and Laspeyres components of the Fisher Ideal. As seen in table 9.1, for stand-alone Word, the cumulative price index decline between July 1993 and June 2001 was 50.16 percent, reflecting an average annual growth rate (AAGR) of –8.34 percent.11 For stand-alone Excel, the corresponding cumulative price decline was 9.12 percent, with an AAGR of –1.19 percent. For Office, the 1993 to 2001 cumulative price decline was 32.40 percent, or –4.78 percent per annum. For Microsoft’s desktop operating systems, the cumulative price decline between 1993 and 2001 was 3.10 percent, or –0.39 percent per annum.
10. Typical numbers of matches for these matched-model indexes range from nine (for stand-alone Excel) to ninety-three (all Microsoft Products), depending on the product index and year.
11. For a price series starting in year 0 and ending in year n, we compute $\mathrm{AAGR} = (P_n/P_0)^{1/n} - 1$. Although we only have six months of data for both 1993 and 2001, for purposes of computing an AAGR we treat these as full years so that the 1993 to 2001 time period represents nine full years.
Table 9.1 Matched-model price indexes
           Word (stand-alone)            Excel (stand-alone)           Office                        Operating systems
Year       Laspeyres   Paasche    Fisher Laspeyres   Paasche    Fisher Laspeyres   Paasche    Fisher Laspeyres   Paasche    Fisher
1993          100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00
1994          107.01    106.35    106.68    100.27     55.12     74.34     93.71     92.72     93.21     94.06    102.07     97.99
1995           99.79     99.61     99.70    159.65     59.99     97.87     79.93     78.65     79.29     89.66     98.05     93.76
1996           81.11     84.23     82.66    158.53     56.44     94.59     67.14     63.50     65.29     84.73     94.74     89.60
1997           93.78    100.70     97.18    182.14     67.53    110.90     61.75     57.22     59.44     89.95     95.85     92.85
1998           92.77    110.51    101.25    184.68     65.17    109.71     64.61     58.51     61.48     91.63     96.67     94.11
1999           67.36     51.35     58.81    170.20     58.72     99.97     67.14     59.13     63.01     89.33     94.06     91.66
2000           58.18     39.87     48.16    151.83     55.70     91.96     63.60     56.33     59.85     84.59     97.62     90.87
2001           59.39     41.83     49.84    150.21     54.98     90.88     71.45     63.95     67.60     90.18    104.12     96.90
AAGR (%)        –6.3     –10.3      –8.3       5.2      –7.2      –1.2      –4.1      –5.4      –4.8      –1.3       0.5      –0.4
Source: MS sales data.
Notes: AAGR = average annual growth rate. The Office price index includes transactions from the Office, Office Professional, Office Pro w/VisFoxPro, Office Pro/Bookshelf Bundle, Office w/Bookshelf, Office Small Business, Office Pro/Bookshelf/VisFoxPro, Office Premium, Office Pro w/FrontPage, Office Pro Special Edition, and Office Pro w/Publisher product families. The Microsoft desktop operating systems price index includes transactions from the MS-DOS, MS-DOS with Enhanced Tools, Windows, Windows for Workgroups, Windows 95, WIN95/ISK BUNDLE, Windows 98, Windows ME, Windows NT Workstation, and Windows 2000 Professional product families.
Table 9.2 Matched-model price indexes
           All Word                      All Excel                     Microsoft products
Year       Laspeyres   Paasche    Fisher Laspeyres   Paasche    Fisher Laspeyres   Paasche    Fisher
1993          100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00    100.00
1994           80.23     87.77     83.92     64.11     86.22     74.35     83.70     94.35     88.86
1995           65.35     79.39     72.03     59.85     80.44     69.39     76.57     88.66     82.40
1996           51.53     61.08     56.10     46.61     61.99     53.75     67.41     77.97     72.50
1997           49.61     57.37     53.35     44.30     54.29     49.04     68.86     75.87     72.28
1998           47.72     52.57     50.09     43.82     51.84     47.66     69.02     73.96     71.45
1999           41.93     46.34     44.08     42.18     51.28     46.51     65.85     71.01     68.38
2000           34.29     38.53     36.35     38.93     49.27     43.80     60.52     69.57     64.89
2001           38.13     43.36     40.66     44.45     57.51     50.56     65.67     75.85     70.57
AAGR (%)       –11.4      –9.9     –10.6      –9.6      –6.7      –8.2      –5.1      –3.4      –4.3
Source: MS sales data.
Notes: AAGR = average annual growth rate. All Word price index includes transactions from the “Word” product family and allocations from the various Office and Works product families. The All Excel price index includes transactions from the “Excel” product family and allocations from the various Office product families. The Microsoft products index includes All Word, All Excel, and Desktop Operating Systems.
A major shift in product form for Word and Excel occurred during the 1990s. This shift involved a substitution away from sales of stand-alone Word and stand-alone Excel toward sales of the integrated Office and Works suites. Both stand-alone and suite product forms of these products were simultaneously available during the entire 1993 to 2001 period. Each license of Office can be considered as consisting of, among other programs, a license for Word and a license for Excel. The shift from stand-alone Word and Excel to the Office suite resulted in an effective price decrease to purchasers of Microsoft software.
Consider the following hypothetical example involving the average prices of full versions of the full-packaged product versions of Excel and Word, and of the Office Standard suite. Assume that customers purchasing both stand-alone Word and stand-alone Excel separately pay on average a total of $200 ($100 for Word and $100 for Excel); assume that the average full-packaged product price of the Office Standard suite (containing not only Word and Excel, but also other software products such as PowerPoint and Outlook) is $150. By purchasing the integrated Office Standard suite instead of stand-alone versions of Excel and Word, the effective price charged by Microsoft is lowered by at least 25 percent (that is, from $200 to $150).12
12. This 25 percent price decline does not account for additional products included in the suite, such as PowerPoint and Outlook.
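The cumulative declines and AAGRs quoted for table 9.1 can be checked from the 2001 index values. The sketch below is illustrative only: it assumes the Fisher Ideal entry is the geometric mean of the Laspeyres and Paasche entries (which the tabulated values satisfy to two decimals) and that n = 8 annual intervals are used in the AAGR, an inference from the reported rates. The function names are hypothetical.

```python
"""Sketch: check Fisher Ideal values, cumulative declines, and AAGRs
against the 2001 entries of table 9.1 (1993 = 100)."""
import math

# 2001 (Laspeyres, Paasche) index values transcribed from table 9.1.
TABLE_9_1_2001 = {
    "Word (stand-alone)": (59.39, 41.83),
    "Excel (stand-alone)": (150.21, 54.98),
    "Office": (71.45, 63.95),
    "Operating systems": (90.18, 104.12),
}

N_INTERVALS = 8  # 1993 -> 2001; assumed, consistent with the reported AAGRs


def fisher(laspeyres: float, paasche: float) -> float:
    """Fisher Ideal index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres * paasche)


def cumulative_change_pct(index_end: float, index_start: float = 100.0) -> float:
    """Percentage change of the index between the start and end years."""
    return (index_end / index_start - 1.0) * 100.0


def aagr_pct(index_end: float, index_start: float = 100.0, n: int = N_INTERVALS) -> float:
    """AAGR = (P_n / P_0)^(1/n) - 1, expressed in percent."""
    return ((index_end / index_start) ** (1.0 / n) - 1.0) * 100.0


if __name__ == "__main__":
    for name, (lasp, paas) in TABLE_9_1_2001.items():
        f = fisher(lasp, paas)
        print(f"{name:20s} Fisher={f:6.2f} "
              f"cumulative={cumulative_change_pct(f):7.2f}% "
              f"AAGR={aagr_pct(f):6.2f}%")
```

The same two helper functions apply unchanged to the All Word, All Excel, and Microsoft Products columns of table 9.2.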
One way of assessing the magnitude of the overall effective price reduction for Word and Excel attributable to the shift to the Office suite is to allocate a portion of Microsoft’s Office revenues to Word and Excel and then to compute new effective Word and Excel prices each averaged over their stand-alone plus allocated sales.13 Using Microsoft’s internal allocations, we compute All Word and All Excel price indexes aggregated over stand-alone and allocated Office and Works sales. As shown in table 9.2, the cumulative price decline for All Word was 59.34 percent, or –10.64 percent per annum. The cumulative decline in the price of All Excel between 1993 and 2001 was 49.44 percent, or –8.17 percent per annum. These 1993 to 2001 cumulative price declines for All Word and All Excel were larger than those for stand-alone sales, that is, –59.34 percent compared to –50.16 percent for Word, and –49.44 percent compared to –9.12 percent for Excel. Finally, aggregated over all the Microsoft Products, the 1993 to 2001 cumulative price change was –29.43 percent, reflecting an AAGR of –4.26 percent (table 9.2).14
We have also performed a number of sensitivity analyses on our results by looking at different configurations of the elementary unit. For example, we have treated stand-alone and “allocated” as separate products (elementary units) and computed price indexes at the product-unit level (a higher level of aggregation than the product family level, in which the various versions of, say, Word, are placed in the same elementary unit). When stand-alone and allocated Word are treated as separate elementary units, the All Word price index declines at an annual rate of 8.38 percent, compared to an annual decline of 10.64 percent per year when they are combined into one elementary unit.
When the Microsoft Products price index is computed at the product-unit level it declines at a rate of 2.16 percent per annum, compared to a decline of 4.26 percent per annum when the elementary unit is defined at the product-family level. When maintenance agreements and upgrades are treated as separate elementary units, the Office price index declines at a rate of 4.90 percent per annum, compared with a price decline of 4.78 percent per annum when they are combined into one elementary unit.
13. Because Word is also sold as part of the Works Suite, an allocation from Works must also be made. To identify the version of Word and Excel sold as components of particular Office and Works suites, we used information from Microsoft’s “As Shipped” data. We then allocated suite revenues to the various versions of Word and Excel, using Microsoft’s internal allocations.
14. The Microsoft Products price index is calculated by combining price changes for All Word, All Excel, and desktop operating systems.
In the context of a constant utility framework with stationary preferences, a well-known result is that the Paasche price index rises less rapidly (or declines more rapidly) than the Laspeyres price index (see Diewert 1993). As shown in equation (1), the Paasche price index value may be higher than that of the Laspeyres price index value when the correlation coefficient between the bilateral price and quantity relatives is positive. To interpret our occasional finding of a slower price decline in the Paasche relative to the Laspeyres, in table 9.3 we present such annual correlation coefficients for the price and quantity relatives of operating systems and various applications, and for the Microsoft Products in aggregate.
Table 9.3 Correlation coefficients between price and quantity relatives
                      Word          Excel         Office      Operating       All Word      All Excel      Microsoft
Year         (stand-alone)  (stand-alone)                       systems                                     products
1993–1994            –0.13          –0.68           0.31          –0.06          –0.14          –0.29          –0.02
1994–1995             0.19          –0.34          –0.34           0.11           0.99           0.92           0.70
1995–1996            –0.12          –0.11           0.05           0.82          –0.06          –0.04           0.81
1996–1997             0.30           0.43          –0.06          –0.15          –0.08          –0.08          –0.06
1997–1998            –0.11           0.32          –0.18           0.09          –0.17          –0.16          –0.14
1998–1999            –0.46          –0.15          –0.01           0.32          –0.35          –0.06          –0.04
1999–2000            –0.39          –0.18           0.02           0.92          –0.13          –0.09           0.66
2000–2001             0.24           0.76          –0.01          –0.26           0.16          –0.24          –0.03
Average              –0.06           0.01          –0.03           0.22           0.03           0.00           0.24
Source: MS sales data.
In order to preserve matches in the index calculations, we replaced any negative price with the most recent positive price from a previous time period and assigned this price a quantity weight of zero. Because of this method, the calculations of the correlation coefficient and the Laspeyres and Paasche price indexes are based on different numbers of observations (when, for example, the quantity is set to zero, resulting in an undefined quantity relative). For this reason it may not be possible to verify the von Bortkiewicz decomposition for every period and every product; in fact, the relationship does not hold for twenty-four of the fifty-six bilateral comparisons in this paper (e.g., for stand-alone Word for 1995–1996 and for 1997–1998).15 Notwithstanding this, for stand-alone Word, stand-alone Excel, Office, and Microsoft Products in aggregate, in five of the eight years the correlation coefficient is negative, while for All Word (All Excel) it is negative in six (seven) of the eight years, suggesting the familiar inequality of the Laspeyres declining less than the Paasche. For operating systems, however, a positive correlation between bilateral price and quantity relatives occurs in five of eight years, and the Laspeyres price index declines more than the Paasche. Over the eight-year time span, the average correlation coefficient between bilateral price and quantity relatives for operating systems is positive. As is seen in figure 9.1, this results in a Paasche price index having a cumulative price change of 4.12 percent, a Laspeyres price index having a smaller cumulative price change of –9.82 percent, and the Fisher Ideal having a cumulative price change in between at –3.10 percent.
15. When the matched-model price indexes are computed after dropping the zero/negative prices and quantities, there is almost no change in the AAGRs, and the range of differences is from 0.00 to 0.35 percentage points.
Fig. 9.1 Matched-model price indexes: Desktop operating systems, 1993–2001
Notes: (a) The desktop operating systems index includes transactions from the “MS-DOS,” “MS-DOS with Enhanced Tools,” “Windows,” “Windows for Workgroups,” “Windows 95,” “WIN95/ISK BUNDLE,” “Windows 98,” “Windows ME,” “Windows NT Workstation,” and “Windows 2000 Professional” product families. (b) For a price series starting in year 0 and ending in year n, $\mathrm{AAGR} = (P_n/P_0)^{1/n} - 1$.
Source: MS sales Microsoft “As Shipped” perspective, July 1993–June 2001.
In a price index study of software based on Canadian scanner data transactions between 1996 and 2000, Prud’homme and Yu (2002) report very different growth rates based on the various price indexes; their AAGRs are –24.9 percent with the Paasche, 18.0 percent with the Laspeyres, and –5.9 percent with the Fisher Ideal. One interpretation of the positive correlations between price and quantity relatives occasionally found in the Microsoft data, particularly in the context of operating systems, is that they reflect the positive feedback on sales from network externalities. A related interpretation is that measured prices do not properly control for quality aspects such as network externalities.16 An analysis of quality-adjusted prices for software is not undertaken in this paper, though future work may investigate this issue.
16. For a discussion of consumption externalities and impacts on demand in the context of antiulcer drugs, see Berndt, Pindyck, and Azoulay (2003). Studies of network effects in the context of software applications can be found in Gandal (1994, 1995) and Brynjolfsson and Kemerer (1995).
9.6 Existing Research on Measuring Prepackaged Software Prices
9.6.1 Studies on Software Price Changes
There have been relatively few research studies to date that report estimates of measures of prepackaged software price changes over time. In addition, the only studies of which we are aware that have reported price indexes for a multiproduct firm operating in an unregulated price context are those by Cocks (1974, 1977) for a pharmaceutical manufacturer.
A comparison of the Microsoft-specific results presented in this paper with existing academic and government studies on the measurement of price changes for prepackaged software products could be informative. In table 9.4, we present a summary of the main findings of the studies of which we are aware. Direct comparisons of results in these studies to the findings in this paper may be problematic for a number of reasons. First, these studies report results that typically employ data that end in the early to mid-1990s. We have computed results that use MS Sales data from mid-1993 through mid-2001. Because the studies cover different time periods, direct comparisons of results in these studies with the findings in this paper may not be appropriate. Second, studies of software price indexes published to date have focused primarily on retail-level transactions. For Microsoft, sales of full-packaged products sold through the finished-goods channel have become an ever smaller and unrepresentative portion of Microsoft’s applications sales over time. Instead, volume-related sales now constitute the majority of Microsoft’s applications sales. Moreover, OEM sales are not tracked by these studies, and OEM sales are particularly important for desktop operating systems. Therefore comparisons between Microsoft’s price changes and those from other studies relying primarily on retail-level transactions may not be appropriate. Third, most of the U.S. software price index studies published to date, with the exception of that by Oliner and Sichel (1994), employ the hedonic price index method to explicitly adjust for quality changes in software products over time.
The matched-model method we have used in this paper attempts to control for quality change by comparing prices only of similar products over time; we have not adjusted the matched-model price indexes further to reflect changes in software product quality over time. However, in the following we discuss adjustments made by the BEA in part to control for bias in the matched-model method due to failure to incorporate fully quality improvements.
The studies summarized in table 9.4 show that prepackaged software prices have been declining over time. Although there are differences between these studies and our analyses, it is worth noting that these declines in software prices (both adjusted and not adjusted for quality change) are largely consistent with the declines in software prices we find using our matched-model price indexes.
Table 9.4 Summary of price index measurement research
Source(s)                        Product(s) considered  Method used                                Years covered  Country  Annual price change (%)
Gandal (1994)                    Spreadsheets           Hedonic                                    1986–1991      U.S.     –15
Gandal (1995)                    Spreadsheets           Hedonic                                    1989–1991      U.S.     –4.4
Gandal (1995)                    Databases              Hedonic                                    1989–1991      U.S.     –1.5
Grohn (n.d.)                     Word processors        Hedonic                                    1985–1995      Germany  –11.3 to –36.9
Brynjolfsson and Kemerer (1995)  Spreadsheets           Hedonic                                    1987–1992      U.S.     –14.8 to –16.5
McCahill (1997)                  Spreadsheets           Hedonic                                    1986–1993      U.S.     –9.0 to –16.9
McCahill (1997)                  Word processors        Hedonic                                    1985–1994      U.S.     –15.1 to –18.5
Harhoff and Moch (1997)          Databases              Hedonic                                    1986–1994      Germany  –7.41
Harhoff and Moch (1997)          Databases              Matched-model                              1986–1994      Germany  –9.25
Oliner and Sichel (1994)         Word processors        Matched-model                              1985–1993      U.S.     –2.6
Oliner and Sichel (1994)         Spreadsheets           Matched-model                              1985–1993      U.S.     –4.5
Oliner and Sichel (1994)         Databases              Matched-model                              1985–1993      U.S.     –4.7
http://www.stats.bls.gov         General                Matched-model                              1997–2002      U.S.     –0.45
Grimm and Parker (2000)          General                Interpolation, Matched-model, and Hedonic  1959–1998      U.S.     –10.9
Prud’homme and Yu (2002)         General                Matched-model                              1996–2000      Canada   –4.4 to –7.9
9.6.2 U.S. Government Producer Price Indexes for Prepackaged Software
The BLS compiles and publishes a large number of consumer and producer price indexes for different products at varying levels of aggregation. As part of its producer price index coverage, the BLS first began publishing a monthly producer price index for prepackaged software in December 1997.17 The BLS prepackaged software price index is based on a survey of producer selling prices, that is, at the first line of distribution, collected from a sample of manufacturers of prepackaged software (not just Microsoft). The BLS collects price quotes from both the OEM and finished-goods channels and for full versions and upgrades. To preserve continuity in the index, the BLS attempts to collect price quotes for comparable products over time. The current methodology of the index is a fixed-basket matched-model Laspeyres price index with plans to update the weights every five to seven years.
17. See http://www.stats.bls.gov, series IDs PCU7372# (prepackaged software).
Figure 9.2 shows the BLS annual aggregate producer price index for prepackaged software, from 1997 through 2001. Given the coverage, scope, and methodology of the BLS producer price index for prepackaged software, comparisons between it and the Microsoft price indexes we describe in this paper can be more meaningful than would be a comparison of the Microsoft matched-model price indexes with those based on the studies cited in table 9.4. Over the common 1997 to 2001 time period, the BLS PPI for prepackaged software increased at a rate of 0.35 percent per year, while the price index for the Microsoft Products decreased at a rate of 0.60 percent per year.
Fig. 9.2 BLS and MS sales price indexes comparison, 1997–2001
Sources: http://www.stats.bls.gov; Series ID: PCU7372#. MS sales Microsoft “As Shipped” perspective, July 1993–June 2001.
9.6.3 Impact of Quality Change and General Inflation
Although the BLS aggregate PPI for prepackaged software and that for the Microsoft Products show reasonably similar trends, both likely understate quality-adjusted price declines. Specifically, with respect to hedonic price index studies for prepackaged software, the existing literature reports that hedonic quality-adjusted prices for spreadsheets and word processors have generally fallen more rapidly than have the corresponding matched-model price indexes. The latter fail to capture fully many quality improvements between different versions and generations of prepackaged software products over time (Oliner and Sichel 1994).
Because of the widely recognized potential understatement of true price declines (or overstatement of true price increases) as measured by matched-model price indexes, in 2000 the U.S. BEA began to make a “bias-adjustment” to the BLS prepackaged software price index.18 The adjustment is based on the following calculation: Grimm and Parker (2000) compare two sets of indexes over the 1985 to 1993 period: (a) the Oliner and Sichel (1994) matched-model price indexes for spreadsheets, word processors, and databases; and (b) a BEA hedonic price index for spreadsheets and word processors.19 The average annual difference between these two sets of price indexes over the 1985 to 1993 time period is –6.3 percent. The BEA calculates its bias adjustment as one-half of this –6.3 percent annual difference, or –3.15 percent. When compiling and publishing the BEA’s quarterly measures of U.S. real GDP and real GPI, the BEA then applies this bias adjustment, converted from annual to quarterly, to the BLS producer price index for prepackaged software.
The use of this adjustment by the BEA to more fully encompass quality-adjusted software price declines than are captured by the BLS matched-model price index suggests that it is reasonable to believe that the matched-model software price indexes computed here for the Microsoft Products also understate quality-adjusted price declines.
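The bias-adjustment arithmetic described above can be illustrated with a short sketch. The halving of the –6.3 percent difference comes from the text; the geometric annual-to-quarterly conversion and the multiplicative way the adjustment is applied to index levels are assumptions, since the text does not specify either. All names and the flat example index are hypothetical.

```python
"""Sketch: a BEA-style bias adjustment applied to a matched-model index.

From the text: the average annual difference between the two sets of
indexes is -6.3 percent, and the adjustment is one-half of it.
Assumed here: geometric conversion from annual to quarterly, applied
cumulatively to index levels.
"""

AVERAGE_ANNUAL_DIFFERENCE = -0.063                        # reported in the text
ANNUAL_BIAS_ADJUSTMENT = AVERAGE_ANNUAL_DIFFERENCE / 2    # -0.0315


def quarterly_rate(annual_rate: float) -> float:
    """Quarterly rate that compounds to the given annual rate (assumed)."""
    return (1.0 + annual_rate) ** 0.25 - 1.0


def adjust_quarterly_index(index_levels, annual_adjustment=ANNUAL_BIAS_ADJUSTMENT):
    """Apply the cumulative quarterly adjustment to each index level.

    index_levels[0] is the base quarter and is left unchanged.
    """
    q = quarterly_rate(annual_adjustment)
    return [level * (1.0 + q) ** t for t, level in enumerate(index_levels)]


if __name__ == "__main__":
    flat_ppi = [100.0] * 5  # hypothetical: unadjusted index flat for four quarters
    print([round(x, 2) for x in adjust_quarterly_index(flat_ppi)])
```

Under these assumptions a flat unadjusted index falls by the full 3.15 percent after four quarters, which is the property the quarterly conversion is meant to preserve.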
18. Grimm and Parker state “[a]n annual bias adjustment is made because it is likely – assuming less than complete market equilibrium – that matched-model indexes understate quality-adjusted price declines; quality improvements, such as enhanced power and performance, tend to be introduced in new versions of software, so they are not captured by the matched-model estimates” (2000, 15). A further discussion of the BEA’s software price estimates is found in Seskin (1999).
19. The BEA hedonic price index is an extension of work done by Gandal (1994), Brynjolfsson and Kemerer (1996), and McCahill (1997).
In addition, the matched-model price indexes computed for the Microsoft Products do not take into account changes in the general inflation level (as measured by the GDP implicit price deflator) during the 1993 to 2001 period. Between 1993 and 2001, economywide prices rose by an AAGR of approximately 1.90 percent per year as measured by the implicit GDP deflator, which exceeds the –4.26 percent annual change in the Microsoft Products price index, based on our matched-model index calculations, by 6.16 percentage points per year.20 Over the entire 1993 to 2001 time period, the cumulative difference becomes 61.3 percent.
20. See http://www.bea.doc.gov, chain-type price index for GDP, final sales, and purchases, published August 2, 2002.
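The comparison with general inflation can be reproduced with simple arithmetic. The sketch below is hedged: the two growth rates are those quoted in the text, while the reading that the 61.3 percent cumulative figure results from compounding the 6.16-point annual gap over eight annual intervals is an inference that happens to match the reported number. The function names are hypothetical.

```python
"""Sketch: gap between general inflation and the Microsoft Products index."""

GDP_DEFLATOR_AAGR = 0.0190   # approximate figure quoted in the text
MS_PRODUCTS_AAGR = -0.0426   # Fisher matched-model index, table 9.2
N_INTERVALS = 8              # 1993 -> 2001; assumed number of annual intervals


def annual_gap(general: float, product: float) -> float:
    """Difference between two annual growth rates, as a fraction."""
    return general - product


def compounded(rate: float, n: int) -> float:
    """Cumulative change from compounding `rate` over n periods."""
    return (1.0 + rate) ** n - 1.0


if __name__ == "__main__":
    gap = annual_gap(GDP_DEFLATOR_AAGR, MS_PRODUCTS_AAGR)
    print(f"annual gap: {gap * 100:.2f} percentage points")
    print(f"compounded over {N_INTERVALS} intervals: "
          f"{compounded(gap, N_INTERVALS) * 100:.1f} percent")
```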
9.7 Conclusions
Although there are differences over time periods and across products, the prices of Microsoft’s desktop operating systems and applications have generally been falling over the time period between July 1993 and June 2001. During this time there have been important changes in license arrangements with the growth of volume licensing programs and changes in product form involving a major shift toward sales of Office suites and away from stand-alone sales of Word and Excel. Prices for the Microsoft Products have declined at a rate of 4.26 percent annually. This compares with an almost 2 percent annual rise in economywide prices as measured by the implicit GDP price deflator.
This decline in the Microsoft Products price indexes likely understates the true price decline, given the improvements in the quality of software products over the 1993 to 2001 time period. Although the research challenges would be considerable, we believe that incorporating quality improvements into the price indexes of these products would result in even greater declines in prices than those reported here. Another set of challenges in the context of price index measurement is the development of theoretical underpinnings for price indexes of bundled goods, and their empirical implementation in the more specific context of computer hardware and software.
References
Allen, R. G. D. 1975. Index numbers in theory and practice. Chicago: Aldine.
Balk, B. E. 1995. Axiomatic price index theory: A survey. International Statistical Review 63:69–93.
Berndt, E. R., S. H. Busch, and R. G. Frank. 2001. Treatment price indexes for acute phase major depression. In Medical care output and productivity, ed. David M. Cutler and Ernst R. Berndt, 463–505. Studies in Income and Wealth, vol. 62. Chicago: University of Chicago Press.
Berndt, E. R., R. S. Pindyck, and P. Azoulay. 2003. Consumption externalities and diffusion in pharmaceutical markets: Antiulcer drugs. Journal of Industrial Economics 51:243–70.
Boskin, M. J., E. R. Dulberger, R. J. Gordon, Z. Griliches, and D. W. Jorgenson. 1996. Toward a more accurate measure of the cost of living. Final Report of the Advisory Commission to Study the Consumer Price Index. Washington, DC: Government Printing Office.
Brynjolfsson, E., and C. F. Kemerer. 1996. Network externalities in microcomputer software: An econometric analysis of the spreadsheet market. Management Science 42:1627–47.
Bureau of Labor Statistics. 2000. Industry synopsis, SIC 7372: Prepackaged software. Washington, DC: Bureau of Labor Statistics.
Cocks, D. L. 1974. The measurement of total factor productivity for a large U.S. manufacturing corporation. Business Economics 9:7.
Cocks, D. L. 1977. Drug-firm productivity, R&D, and public policy. Pharmaceutical Technology 1:21, 46.
Committee on National Statistics. 2002. At what price? Conceptualizing and measuring the cost-of-living and price indexes. Panel on Conceptual, Measurement and Other Statistical Issues in Developing Cost-of-Living Indexes, ed. C. Schultze and C. Mackie. Washington, DC: National Academy Press.
Conference Board, The. 1999. Measuring prices in a dynamic economy: Reexamining the CPI. New York: The Conference Board.
Danzon, P., and L. Chao. 2000. Cross-national price differences for pharmaceuticals: How large and why? Journal of Health Economics 19 (2): 159–95.
Diewert, W. E. 1992. Fisher ideal output, input and productivity indexes revisited. Journal of Productivity Analysis 3:211–48.
Diewert, W. E. 1993. The economic theory of index numbers: A survey. Essays in Index Number Theory. Vol. 1, ed. W. Erwin Diewert and Alice O. Nakamura, 177–221. London: North Holland.
Diewert, W. E. 1995. Axiomatic and economic approaches to elementary price indexes. University of British Columbia, Discussion Paper no. 95-01.
Gandal, N. 1994. Hedonic price indexes for spreadsheets and an empirical test of network externalities. RAND Journal of Economics 25:160–70.
Gandal, N. 1995. Competing compatibility standards and network externalities in the PC software market. The Review of Economics and Statistics 77:599–608.
Grimm, B., and R. Parker. 2000. Software prices and real output: Recent developments at the Bureau of Economic Analysis. Paper presented at the National Bureau of Economic Research Program on Technological Change and Productivity Measurement, Cambridge, MA.
Grohn, A. (n.d.). Network effects in PC software: An empirical analysis. Kiel University. Unpublished Manuscript.
Harhoff, D., and D. Moch. 1997. Price indexes for PC database software and the value of code compatibility. Research Policy 26:509–20.
McCahill, R. J. 1997. A hedonic study of prepackaged software. Master’s thesis, Virginia Polytechnic Institute and State University.
Oliner, S. D., and D. E. Sichel. 1994. Computers and output growth revisited: How big is the puzzle? Brookings Papers on Economic Activity, Issue no. 2:273–330.
Prud’homme, M., and K. Yu. 2002. A price index for (pre-packaged) computer software using scanner data. Statistics Canada. Unpublished Manuscript.
Seskin, E. P. 1999. Improved estimates of the National Income and Products Accounts for 1959 to 1998: Results of the comprehensive revision. Survey of Current Business 79:15–39.
Triplett, J. E. 1992. Economic theory and BEA’s alternative quantity and price indexes. Survey of Current Business 72:49–52.
10 International Comparisons of R&D Expenditure
Does an R&D PPP Make a Difference?
Sean M. Dougherty, Robert Inklaar, Robert H. McGuckin, and Bart van Ark
10.1 Introduction
Concerns with science and technology (S&T) capabilities are widespread in the United States as well as in other developed countries. This is understandable in light of the importance of knowledge and technology in generating long-run growth of productivity, per capita income, and employment. Trends and levels of research and development (R&D) spending and, in particular, ratios of R&D expenditure to gross domestic product (GDP) or national income are often used as a measure of innovativeness as they capture the resources devoted to achieving future technological change.1 In Europe, for example, governments at the Barcelona European Council noted that European R&D expenditures are well below those of the United States and set a target to dramatically increase R&D spending from 1.9 percent of GDP to 3.0 percent by 2010 (European Commission 2002).
Whereas nominal R&D intensity provides a measure of the burden (in monetary terms) on society of R&D activities, it is less informative about the real resources devoted to R&D because it does not take into account differences in relative prices of R&D inputs across countries. For this purpose, R&D-specific purchasing power parities (PPPs) are needed, which measure how much needs to be spent in a country to acquire one U.S. dollar’s worth of R&D inputs.2 Hence R&D expenditures that are converted at R&D PPPs will give a better measure of the differences in actual resources devoted to R&D between countries. In this sense, PPPs are comparable to price deflators that adjust nominal values for price changes to arrive at real, or volume, measures.
When making international comparisons of R&D, PPPs should reflect differences in relative prices. Because R&D output prices cannot be directly measured, we need to focus on the prices of R&D inputs. Most studies and statistics use aggregate proxies, such as the PPP for GDP, but these will generally not suffice. While GDP PPPs reflect relative prices of primary inputs (labor and capital), each input’s representation in GDP does not reflect its importance to R&D, and they are not specific to R&D. Moreover, GDP is based on the concept of final goods and services, rather than the intermediate goods and services that make up a large part of R&D expenditure. Finally, use of GDP PPPs does not capture differences in the industrial composition of R&D across countries. While use of industry-level nominal R&D expenditure can partially address the composition issue, remaining distortions in prices can be a serious problem.
Sean M. Dougherty is an economist at the Organization for Economic Cooperation and Development’s (OECD) Economics Department. Robert Inklaar is an assistant professor at the University of Groningen and a researcher at The Conference Board. Robert H. McGuckin was director of Economic Research at The Conference Board. Bart van Ark is a professor of economics at the University of Groningen and director of International Economic Research at The Conference Board. This research is made possible by grant SRS/SES 00-99594 from the National Science Foundation (McGuckin, van Ark, et al. 2004) and was carried out while the authors were all residing at The Conference Board. Related work was presented at seminars of the National Academy of Sciences in Washington, D.C.; the OECD Science, Technology, and Industry Directorate in Paris; the NBER Conference on Research in Income and Wealth Summer Institute in Cambridge, MA; The Conference Board’s International Innovation Council in Cambridge, U.K.; and the meeting of the Canberra II Group on the Measurement of Non-Financial Assets in Voorburg, The Netherlands. We received particularly useful comments from Andrew Wyckoff and Dominique Guellec (OECD), Jeffrey Bernstein (Carleton University and NBER), and Ernst Berndt (MIT and NBER). We are deeply saddened that our coauthor Robert H. McGuckin passed away in March 2006. His contribution to the economics profession will live on through his extensive published work.
1. Policy discussions must also consider the productivity and composition of these efforts, which are likely to differ across countries (and industries) as well as the magnitudes of the spillovers generated. In examining these issues, it is important to develop R&D capital-stock measures rather than focus only on current expenditures. While these issues are not dealt with in this paper, PPPs and price deflators are basic building blocks for this type of analysis.
2. As rates of equivalence for comparable goods in local currency prices, purchasing power parities (PPPs) have the same units as exchange rates. If PPPs and exchange rates are the same, then there is no difference in relative prices or cost across countries. However, there are many reasons why exchange rates are not good substitutes for PPPs. Of particular relevance to R&D, there is no necessary reason why the relative prices of goods that are not traded internationally should conform to exchange rate values. Exchange rates are also vulnerable to a number of distortions, for example, currency speculation; political events, such as wars and boycotts; and official currency interventions, that have little or nothing to do with the differences in relative R&D prices across economies (National Science Foundation 2002).
Taking the latter point a step further, when focusing on real R&D intensities by industry, not only does the numerator (R&D expenditure) need to be converted using a specific R&D PPP, but the denominator (industry output) also requires an industry-specific output PPP. The use of a GDP PPP to adjust for relative price levels in manufacturing would be equally inappropriate. Recent experience with industry-level PPPs from the International Comparisons of Output and Productivity (ICOP) project suggests
that substantial differences exist between manufacturing output PPPs and GDP PPPs, even for economies at similar levels of development (van Ark 1993; van Ark and Timmer 2001). Therefore PPP adjustments, taking account of differences in the structure of relative prices of R&D inputs and output across economies and industries, may be worth the considerable effort required for their measurement.3
A search of the literature finds relatively little empirical work on R&D price indexes, particularly across countries. In fact, the latest R&D PPP estimates we could find were done in the early 1990s for the year 1985. Typically, the issue is either ignored because detailed price data are not available or a GDP PPP is used in cost comparisons. For comparisons of R&D intensity, nominal values are usually employed. To compare R&D expenditure over time, a GDP deflator is most commonly used.
The lack of good measures in the area of R&D price indexes has not gone unrecognized. Zvi Griliches lamented the lack of good information on the “price” of R&D in his remarks twenty years ago, on the occasion of the National Bureau of Economic Research (NBER) Conference on R&D, Patents, and Productivity (Griliches 1984). Griliches further emphasized the importance of having reliable information on R&D and its price to compare expenditures and intensities in his presidential address to the American Economic Association (Griliches 1994).
This paper brings together a wide range of statistical data to develop relative R&D prices for nineteen manufacturing industries in six Organization for Economic Cooperation and Development (OECD) countries (France, Germany, Japan, the Netherlands, the United Kingdom, and the United States), with the United States as the base country. This exercise is undertaken for two benchmark years, 1997 and 1987, chosen because these are years with information from the U.S. Economic Census, benchmark international PPP studies on industry output, and comprehensive R&D surveys from each of the countries. Industrial census data and collections of international prices are used to compare prices of intermediate goods. Data from national R&D surveys of business enterprises are used to develop R&D-specific prices and quantities.
Interpretation of the data was also guided by information collected in over thirty-five interviews of R&D executives at international affiliates of multinational companies in four of the most R&D-intensive industries: pharmaceuticals, computers, telecommunications equipment, and motor vehicles. The interviews were invaluable in understanding issues of comparability in different countries’ data, due to differences in reporting practices, tax regulations, and interpretations of R&D definitions, among other issues. Moreover, we gleaned important qualitative information that was useful in interpreting the implications of the results.4
In the sections that follow, we first review previous research on R&D PPPs and its limitations. Next we describe our estimates of manufacturing R&D PPPs for 1997 and 1987. These PPPs are used to compare international R&D cost levels and intensity. We then assess differences with current practices. We find that our preferred R&D PPP measure can be simplified without a large impact on the results. This alternative resembles the Griliches-Jaffe R&D deflator and is far easier to construct than our most preferred measure. Both measures differ substantially from the GDP PPP.
10.2 Previous Research on R&D PPPs
This study is not the first to address the problem of estimating PPPs for R&D. Nevertheless, there has been relatively little effort to create R&D PPPs, particularly compared to the volume of work carried out by official statistical agencies in the price index area. While there are many reasons for this state of affairs, an important factor is that R&D expenditures are not yet incorporated into the System of National Accounts.5
A key issue in estimating R&D PPPs is that the output of R&D cannot be easily defined. If R&D were a typical economic activity, like the production of steel or cotton, then standard measurement of quantities and prices could be applied. However, the results of R&D often are ideas and other intangibles that are typically in the hard-to-measure area.6 Moreover, R&D services are often transferred within the firm rather than traded on markets so prices are hard to measure. As a result, measurement of R&D prices has generally focused on constructing input price indexes, which can be used to assess differences in costs. This approach has characterized all the major studies from the 1960s onward.7
3. The Frascati Manual (OECD 1994, 12) states that “[R&D intensity] indicators are fairly accurate but can be biased if there are major differences in the economic structure of the countries being compared.” Arguably, because R&D is not a tradable commodity and one of its major components is labor, whose price exhibits great differences across countries, such differences are probable.
4. The interviews are not described in detail in this paper. More information can be found in McGuckin et al. (2004a,b). Large multinational R&D performers in four high-tech industries in the United States, Japan, and Europe were selected for face-to-face interviews. Even with the small sample, coverage of many countries’ industries is substantial. Interviews involved structured discussions about firms’ R&D organization, composition, and reporting practices. A detailed financial questionnaire on R&D cost items and expenditures was also completed by about one-third of the interviewed firms.
5. See Fraumeni and Okubo (2005) for recent work on developing R&D measures in the framework of the U.S. National Income and Product Accounts.
6. In related work, we have found that although research is for the most part intangible, development is quite different and has physical dimensions that should be relatively easier to measure (McGuckin, Inklaar, et al. 2004).
7. One quite different approach has been applied to pharmaceuticals, where the total cost of an innovation is priced out over its development cycle, including the cost of failures (DiMasi, Hansen, and Grabowski 2003). While this approach has great appeal when assessing the cost of a specific innovation like a drug, it is harder to apply in other industries and says little about the relative cost of performing R&D in different countries.
Given the difficulties
in measuring output even for relatively well-defined high-technology products, such as Information Technology (IT) capital, some caution should be used when interpreting price indexes for R&D.
10.2.1 Overview of Earlier Studies
In most of the literature, the relative cost of R&D across countries is estimated based on prices for a basket of “standard” R&D inputs at the economywide level.8 Freeman and Young (1965) performed the first of these studies. Their work was undertaken for the year 1962, before the first edition of the Frascati Manual (OECD 1963), and they did not benefit from the more comparable survey instruments in use today. Nevertheless, they use expenditure categories similar to those we apply in this study. Freeman and Young estimate a PPP for R&D by breaking up total R&D expenditure into labor costs, materials, other current and capital expenditures. For labor costs they calculate the wage cost per worker in R&D and assume this is also appropriate for other current expenditure. For materials and capital expenditures, they assume the exchange rate is the appropriate price.
Brunner (1967) compares the cost of research projects subcontracted by the U.S. Department of Defense across a number of European countries. For these projects, subcontractors supply budget sheets, which contain data on total costs, including wages, benefits, support, and overhead costs. The cross-country comparability issues are likely to be smaller than in the Freeman and Young study as the Department of Defense imposes similar budget standards on all subcontractors. However, the estimate includes a very specific subset of R&D, and it is unclear if the budgets include all R&D costs (e.g., capital expenditures).
The work by MacDonald (1973) extends the previous two studies to sixteen OECD countries by calculating R&D PPPs relative to the United Kingdom.9 He distinguishes between labor cost, other current cost, and capital expenditure. For the countries included in the Brunner (1967) study, MacDonald uses wage data for scientists and for technicians based on that study. For the other countries, he relies on average wage costs (total labor cost over total number of R&D workers). His estimate of a capital PPP is based on relative prices from trade statistics, weighted using the aggregate expenditure on these products. For other current expenditure, he assumes the exchange rate is applicable.
8. The Conference Board (1976) and Mansfield (1988) directly queried firms about the relative cost of selected R&D inputs, but this approach is difficult to generalize to multiple industries and countries.
9. In table 10.1, we convert these to cost levels relative to the United States to facilitate comparability. This is appropriate as all PPPs are aggregated from individual cost category PPPs using U.K. weights, in effect creating a Laspeyres-type index. Although the Laspeyres index has its weaknesses, it is transitive.
Based on these figures, he finds
Table 10.1 Previous studies of R&D PPPs: R&D price levels (cost relative to the United States)
Country | Freeman & Young (1962) | Brunner (a) (1961–62) | MacDonald (b) (1963–64) | OECD (1970) | Kiba et al. (1985)
France, Germany, Japan, The Netherlands, United Kingdom, United States
66.7 58.8
42.4 28.7
34.0 100.0
73.3 70.6 57.1 68.1 58.8 100.0
76.8 85.4 81.3
52.6 55.6 100.0
60.0 60.0 35.3 66.7 60.0 100.0
68.0 100.0
Sources: Freeman and Young (1965), Brunner (1967), MacDonald (1973), OECD (1979), and Kiba, Sakuma, and Kikuchi (1994).
Notes: R&D price levels are defined as R&D purchasing power parity (PPP) divided by the exchange rate of the country’s currency relative to the U.S. dollar. These levels represent costs relative to the United States.
(a) Refers to research costs only.
(b) Price levels are converted to use the United States as base country (original study used the United Kingdom as base country).
that R&D in the United States is around 40 percent more expensive and Japan 70 percent cheaper than in the United Kingdom (see table 10.1). In 1979 the OECD published a study, presenting calculations for R&D deflators for the 1966 to 1976 period and an R&D PPP for 1970 (OECD 1979). Four cost categories are distinguished in the study: labor, other current costs, land and buildings, and instruments and equipment. The labor PPP is calculated as the average labor cost per R&D worker. A PPP for other current expenditure is proxied as the relative price of current government expenditure other than salaries from International Comparisons Project (ICP) studies. The two capital categories are also ICP-based: for land and buildings the PPP for nonresidential/commercial buildings is used, while for instruments and equipment, the PPP for electrical machinery items is used. The most recent study is by Kiba, Sakuma, and Kikuchi (1994). The countries they cover are France, Germany, Japan, and South Korea, with the United States as the base country. Their breakdown of cost categories is more refined than in previous studies: they distinguish materials spending from other current expenditure, and they break down capital expenditure into machinery and equipment, land and buildings, and other assets. Because such a detailed breakdown was not available for all countries, estimates were made using data from countries where these distinctions could be made. Kiba, Sakuma, and Kikuchi’s (1994) basic approach is to select price parities from GDP final expenditures (ICP studies) to proxy each of the R&D input cost categories. They select their price parities based on the
composition of items in the R&D industry of Japan’s input-output use table. In cases where they cannot identify relevant input price parity headings from ICP, they use the exchange rate as the relative price. This same selection of prices is used for all countries.
Their match between R&D categories and price parities is very rough and is based on only the Japanese structure of R&D inputs. If the input-output tables were sufficiently comparable across countries, use of the input structure for the R&D industry could be very useful. However, our research indicates that the data for the R&D industry are not comparable. The problem is that the inputs allocated to the R&D industry depend on the institutional structure of the country and the related issue of which facilities are designated as R&D labs by data collectors. German R&D firms, for example, obtain a significant share of their intermediate inputs from the education sector, while in other countries, this share is nonexistent. In the United States, only stand-alone laboratories are included in the R&D industry and their inputs are very different from integrated facilities (McGuckin, Inklaar, et al. 2004).
10.2.2 Drawing Lessons
The methodologies used in these studies for calculating R&D PPPs contain several common features. As the OECD (1979) notes, an ideal approach would be to calculate the labor cost per employment occupational category (scientist, technician, or support), but limitations on the disaggregation of labor expenditure prevent this method from being implemented broadly. While Kiba, Sakuma, and Kikuchi (1994) use ICP government and educational labor PPPs as a proxy for an R&D labor PPP, this is likely to be a less appropriate measure than the average labor cost per R&D worker. The latter method is commonly employed in studies on an economywide basis; we adopt the same approach at the industry level for this study.
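The labor measure adopted here, average labor cost per R&D worker compared across countries, can be illustrated with a short Python sketch. The function names and all figures below are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from any of the studies discussed.

```python
def average_labor_cost(labor_expenditure, rd_workers):
    """Average labor cost per R&D worker, in the country's own currency."""
    return labor_expenditure / rd_workers


def labor_ppp(cost_x, workers_x, cost_u, workers_u):
    """Labor PPP: local currency units of country x per U.S. dollar of R&D labor."""
    return average_labor_cost(cost_x, workers_x) / average_labor_cost(cost_u, workers_u)


# Hypothetical figures, for illustration only.
# Country x: 4,000,000 local currency units spent on 50 R&D workers.
# United States: 9,000,000 dollars spent on 90 R&D workers.
ppp = labor_ppp(cost_x=4_000_000, workers_x=50, cost_u=9_000_000, workers_u=90)
print(f"labor PPP: {ppp:.2f} local currency units per U.S. dollar")
```

A ratio of this kind treats an average R&D worker as equivalent across countries, which is why the occupational breakdown noted by the OECD (1979) would be preferable where the data allow it.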
Calculating a PPP for the other current expenditure category is a problem because it is difficult to determine exactly what inputs are in this category. In general, there are two major groups, purchased goods and purchased services. The first would include material costs (raw, nondurable goods) but, depending on statutory tax depreciation provisions, also machinery and instruments. The second, frequently referred to as overhead costs, can include anything from building rent to the purchase of scientific journals. The procedure used by MacDonald (1973) that assigns the market exchange rate for materials, and the labor PPP for overhead is probably too crude. Overhead, for example, includes much more than simply extra labor cost. The OECD (1979) and Kiba, Sakuma, and Kikuchi (1994) take a more promising approach by using product-specific ICP expenditure PPPs to come up with a PPP for this cost category. A further point to note is that
the price consumers pay for final consumption goods or firms for investment goods may not be relevant for intermediate input purchases by R&D labs. MacDonald (1973); the OECD (1979); and Kiba, Sakuma, and Kikuchi (1994) develop capital PPPs using import and export prices. Unfortunately, these prices may not reflect prices paid for similar goods by R&D laboratories. It is probably more appropriate to select one or more PPPs for both land and buildings and instrument and equipment, as is done by the OECD (1979) and Kiba, Sakuma, and Kikuchi (1994) using ICP expenditure PPPs. Finally, the aggregation used in most of these studies could be improved. The earlier studies use a weighted average of the category PPPs to calculate their economywide R&D PPPs. While the MacDonald, OECD, and Kiba et al. studies use a Laspeyres-type aggregation, for many countries they do not have complete expenditure weights. None of the studies calculates a Fisher-type index or some type of multilateral index, which are the preferred methods in PPP studies (Kravis, Heston, and Summers 1982; van Ark and Timmer 2001). Despite the various shortcomings of each study, the studies provide a similar bottom line. Table 10.1 shows that the relative price of R&D of other countries compared to the United States had a strong upward trend between 1962 and 1985.10 The table focuses on those countries that are included in this study. While R&D was initially less expensive outside the United States in every country, the gap narrowed substantially in the twenty years covered by these studies. For example, between 1962 and 1985, the relative cost level of R&D in Germany rose from around 60 percent of the United States in the early 1960s to 85 percent in 1985. These increases partly reflect the large changes in the exchange rates over these years, but changes in real cost play a role as well. 
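The closing observation can be made precise with a short identity. As a sketch, write the R&D price level of a country in year t as P_t, its R&D PPP as PPP_t, and its exchange rate (local currency per U.S. dollar) as e_t; this notation is introduced here for illustration and follows the definition of the price level given in the notes to table 10.1.

```latex
\[
P_t = \frac{PPP_t}{e_t}
\quad\Longrightarrow\quad
\ln\frac{P_1}{P_0} \;=\; \ln\frac{PPP_1}{PPP_0} \;-\; \ln\frac{e_1}{e_0}.
\]
```

Under this identity, a rise in a country's relative R&D price level between two benchmark years can come from a rise in its PPP (local R&D input prices growing faster than U.S. ones), from an appreciation of its currency against the dollar (a fall in e), or from both.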
10.3 R&D PPP Estimation in Manufacturing
This work is motivated by concerns about the appropriateness of the current practice of using GDP PPPs for R&D expenditure and international R&D intensity comparisons based on nominal expenditures and output. Limitations on the availability and comparability of international data are the biggest obstacle to more systematic development of R&D-specific PPPs. While not all problems associated with calculating R&D PPPs can be resolved, there have been a number of improvements in data in recent years, and there are a number of areas for potential improvements. For example, work coordinated by the University of Groningen’s ICOP group has created databases of industry-level PPPs, supplementing the more widely available expenditure PPPs from the ICP programs of the United Nations, the World Bank, the OECD, and Eurostat (see Kravis, Heston, and Summers 1982; van Ark 1993; and van Ark and Timmer 2001). In addition, the comparability of R&D data has improved, in part through the efforts of national statistical agencies guided by the OECD’s Frascati Manual (OECD 1963, 1981, 1994, 2002).
Nonetheless, it is far from clear whether companies in different countries report R&D costs in a similar way. For example, in one country companies may include purchases of new computers under current expenditure, while in others they may be reported as capital expenditure.11 This is one reason for the use of the firm interviews in our work. Still, the problems with comparability should not be overdrawn. The studies surveyed in table 10.1 demonstrate that similar results are found despite large differences in data availability and methodology.
10. Some studies originally used a different base country, but all have been recast to use the United States as the base country to facilitate the comparison.
10.3.1 Methodology and Procedures
We develop estimates of industry-specific R&D PPPs by aggregating individual price parities for major categories of R&D expenditures with expenditure share weights derived from national surveys. On this basis, we obtain R&D PPPs for nineteen manufacturing industries that are then aggregated to the total manufacturing level.
The principal results of these calculations are two measures that we later use in assessing the cross-country differences. The first is an R&D PPP that measures the price of an R&D unit in a particular country relative to the price in the United States, the base country. This measure is in units of local currency per U.S. dollar and can be used to “deflate” R&D expenditures in the spatial dimension. Second, by dividing the R&D PPP by the dollar exchange rate, we obtain the relative cost (price level) of an R&D unit of input compared with the base country.
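As a rough illustration of these two measures, the Python sketch below aggregates category price parities with expenditure-share weights and then divides the result by the exchange rate. It uses the Laspeyres, Paasche, and Fisher forms that this section goes on to define. All function names and numbers are hypothetical; the U.S. cost figures simply mirror the average weights reported in table 10.2, and nothing here reproduces the chapter's estimates.

```python
from math import sqrt

# The four R&D input categories named in the text.
CATEGORIES = ("labor", "materials", "other_current", "capital")


def laspeyres_ppp(cost_u, ppp):
    """Share-weighted PPP using base-country (U.S.) expenditure shares as weights."""
    total = sum(cost_u[c] for c in CATEGORIES)
    return sum(cost_u[c] / total * ppp[c] for c in CATEGORIES)


def paasche_ppp(cost_x, ppp):
    """Share-weighted PPP using country-x spending converted to U.S. dollars as weights."""
    dollars = {c: cost_x[c] / ppp[c] for c in CATEGORIES}
    total = sum(dollars.values())
    return sum(dollars[c] / total * ppp[c] for c in CATEGORIES)


def fisher_ppp(cost_u, cost_x, ppp):
    """Geometric mean of the Laspeyres and Paasche PPPs."""
    return sqrt(laspeyres_ppp(cost_u, ppp) * paasche_ppp(cost_x, ppp))


# Hypothetical inputs, for illustration only.
ppp = {"labor": 0.8, "materials": 1.1, "other_current": 1.0, "capital": 1.2}
cost_u = {"labor": 49.0, "materials": 18.0, "other_current": 24.0, "capital": 9.0}   # U.S. dollars
cost_x = {"labor": 40.0, "materials": 22.0, "other_current": 25.0, "capital": 12.0}  # local currency
exchange_rate = 1.05  # local currency units per U.S. dollar

rd_ppp = fisher_ppp(cost_u, cost_x, ppp)

# First measure: deflate local R&D spending into U.S.-dollar R&D input units.
real_spending = sum(cost_x.values()) / rd_ppp

# Second measure: relative price level of R&D (United States = 1.0).
price_level = rd_ppp / exchange_rate

print(f"Laspeyres PPP: {laspeyres_ppp(cost_u, ppp):.3f}")
print(f"Paasche PPP:   {paasche_ppp(cost_x, ppp):.3f}")
print(f"Fisher PPP:    {rd_ppp:.3f}")
print(f"Price level:   {price_level:.3f}")
print(f"Real spending: {real_spending:.1f}")
```

Because the Fisher index is a geometric mean, it always lies between the Laspeyres and Paasche values, so the choice of weights matters less the closer the two countries' cost shares are.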
The R&D PPP for each individual industry is estimated from an aggregation of relative R&D input prices (price parities or just PPPs) using corresponding R&D expenditure shares as weights. For each industry and country pair, cost weights of the base country u (the United States) are used to create a Laspeyres PPP,

(1) $PPP_L^{x,u} = \sum_i w_i^u \, PPP_i$.

Equation (1) is simply a share-weighted average of the individual PPPs for four input categories, labor, materials, other current costs, and capital expenditure, indexed by i. Weights are based on the share of each category’s expenditure in R&D (of the base country in U.S. dollars):

(2) $w_i^u = \dfrac{C_i^u}{\sum_j C_j^u}$,

where i and j index the cost categories. For the comparison country x, we use that country’s expenditure weights to calculate a Paasche PPP,

(3) $PPP_P^{x,u} = \sum_i w_i^x \, PPP_i$

and

(4) $w_i^x = \dfrac{C_i^x / PPP_i}{\sum_j \left( C_j^x / PPP_j \right)}$,

where $w_i^x$ is the expenditure share of input category i in the comparison country (x) converted into U.S. dollars using the corresponding PPP. Taking a geometric average of equations (1) and (3) yields a Fisher PPP, the measure of the price of R&D in local currency units of country x per U.S. dollar. Dividing these PPPs by the exchange rate provides a unit-free index measure of relative R&D costs compared to the United States, which is the base country in all the calculations. Thus, all of the comparisons are made on a bilateral basis.12 We now turn to the details of the PPP calculations and their sensitivity to various assumptions and data.13
11. See McGuckin, Inklaar, et al. (2004) for more discussion of comparability problems.
12. These bilateral PPPs for each country pair differ from multilateral PPPs as used in the expenditure PPP programs of ICP (see, for example, Kravis, Heston, and Summers 1982). In practice this could mean that some of our pairwise R&D PPP estimates are not transitive. However, given that we only cover six countries in this study with relatively similar cost shares, the gains from using multilateral indices were found to be modest.
13. Note that the industry-level PPPs are aggregated across industries using a Fisher index in order to obtain manufacturing-wide PPPs. This procedure takes account of differences across countries in industry weights.
10.3.2 R&D Input Prices and Weights
Computation of R&D PPPs requires both prices and weights for each category of R&D input. We identify four main categories of R&D input: labor, materials, other current costs (overhead), and capital. Weights for each category are based on each input’s representation in R&D expenditure. This industry-level expenditure information comes from summary data compiled by the OECD based on national R&D surveys in each country. We also use industry-specific R&D input prices for labor and materials and economywide prices for other current costs and capital. The labor PPPs rely most heavily on comparisons of wages for R&D personnel, derived primarily from the national R&D surveys. We develop independent estimates for the price of material inputs, other current expenditures, and
International Comparisons of R&D Expenditure

Table 10.2  R&D PPP input categories and price measures

R&D input category         Input price measure                          Source    Industry specific?  Average weight (%)
1. Labor compensation      Average wages for R&D personnel              NSF/OECD  Yes                 49
2. Materials and supplies  Price of industry’s output adj. for margins  ICOP      Yes                 18
3. Other current costs     Prices of overhead goods and services        ICOP/ICP  No                  24
4. Capital expenditure     Prices of plant and equipment                ICOP/ICP  No                  9
Total R&D^a                                                                       Yes                 100

Sources: National Science Foundation (2002); OECD (2002b, 2003); ICOP 1997 (O’Mahony and van Ark [2003] and Inklaar, Wu, and van Ark [2003]); ICOP 1987 (van Ark [1993]).
Notes: ICOP = International Comparisons of Output and Productivity Project, University of Groningen; NSF = National Science Foundation; ICP = International Comparisons Project (United Nations, World Bank, Eurostat, OECD).
^a Aggregation of R&D input category prices to total R&D uses R&D expenditure weights from national R&D surveys.
capital using the industry-of-origin PPPs from the University of Groningen’s ICOP program, which are based on item-level matches derived from production census and industrial survey data in the United States, European Union, and Japan. We supplement this information with PPPs derived from ICP studies using the expenditure approach (OECD 2002b) after making appropriate adjustments to “peel off” estimated margins for transportation and distribution (see Jorgenson and Kuroda 1992; van Ark and Timmer 2001). The firm interviews, as described in McGuckin, van Ark, et al. (2004), are used to inform the necessary assumptions that are made regarding the structure of R&D expenditure and about how to use the data in a way that approaches a constant quality of input basis.

Table 10.2 provides an overview of measures and sources used for the R&D input prices for the construction of the R&D PPP measure. In the following, we discuss our input price measures and weights in some detail and examine possible variants to our preferred measure. Additional details on the estimates are available in an online appendix (McGuckin, van Ark, et al. 2004).

Labor

Labor is the largest component of R&D cost, averaging about half of total expenditures. Average R&D compensation per R&D employee, based on national R&D survey information, measures the price of R&D labor for R&D performed within business enterprises (intramural). For each country and industry, we calculate the average wage of R&D labor by dividing R&D labor expenditures by the corresponding number of full-time
Sean M. Dougherty, Robert Inklaar, Robert H. McGuckin, Bart van Ark
equivalent R&D personnel. These wages are then divided by the wage of the base country, yielding the relative price (PPP) for R&D labor. This procedure implicitly assumes R&D personnel in different countries are equally productive, ascribing any differences in wages to higher labor cost, not to higher productivity. Data limitations prevent us from grouping employees by function or qualification and comparing their relative wages across countries before they are aggregated to form R&D labor PPPs. However, in the interviews, firm management stated that the biggest differences in compensation are across technical fields, and these variations are likely to be captured by average compensation in each industry. Firm officials also indicated that the skills of R&D personnel in routine development work, which constitutes the bulk of R&D, are quite similar across countries. This suggests that the tacit assumption that workers in each country have comparable qualities or capabilities may not be that far from the reality.14 However, while this assumption may be realistic for the group of (advanced) countries we study here, much more caution would be necessary if countries like Mexico or China were included in a comparison.

14. This assumption is also supported by an insignificant correlation of labor price with the support share of R&D personnel at the industry level. The support share of R&D personnel provides a proxy for (basic) scientific and engineering skills and is the only comparable skill measure available outside the United States.

A major hurdle in developing R&D compensation rates is the coverage of the U.S. R&D survey, which only collects data on the number of research scientists and engineers (RSEs) in its survey of business enterprises. In contrast to all other countries, there is no information on the number of support staff employed.15 In order to determine the number of support personnel in the United States, we examined a wide range of alternative data sources. A careful assessment of this evidence suggests that the support share in an industry’s total employment is a fair representation of its support share in R&D. More detail on this evidence, which was supported by the firm interviews, is described in McGuckin, van Ark, et al. (2004). In addition, our independent estimate of the U.S. share is in the range of that found for the other countries in this study.

15. Information on the number of technicians is also not (explicitly) collected in the United States. However, we have found that most firms appear to make little distinction between RSEs and technicians and tend to include them in reported RSEs.

Because only R&D personnel headcount is collected rather than full-time equivalents in Japan, the Japanese R&D labor price is probably understated. If part-time R&D personnel are counted as full time, then compensation per employee is underestimated. While this distinction may not be important in practice, one study made a large downward adjustment to the personnel count (National Science Foundation 1998). On the other hand, given Japan’s typically higher working hours, the net effect of the part-time/full-time difference on average compensation may not be large.
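As a rough illustration of the labor calculation described in this subsection, the following Python sketch computes an average R&D wage as labor expenditure per full-time-equivalent employee, divides it by the base-country wage to obtain a labor PPP, and then divides by an exchange rate to obtain a price level. The class and function names and all of the numbers are hypothetical; none of them comes from the national R&D surveys.

```python
from dataclasses import dataclass


@dataclass
class RDLabor:
    """R&D labor data for one country-industry cell (hypothetical container)."""
    labor_cost: float     # R&D labor expenditure, in local currency
    fte_personnel: float  # full-time-equivalent R&D personnel


def average_wage(cell: RDLabor) -> float:
    """Average R&D compensation per full-time-equivalent employee."""
    if cell.fte_personnel <= 0:
        raise ValueError("FTE personnel must be positive")
    return cell.labor_cost / cell.fte_personnel


def labor_ppp(comparison: RDLabor, base: RDLabor) -> float:
    """Labor PPP: comparison-country wage per unit of base-country wage."""
    return average_wage(comparison) / average_wage(base)


# Made-up numbers for illustration only; they are not from the chapter.
us = RDLabor(labor_cost=9_000_000.0, fte_personnel=100.0)    # U.S. dollars
other = RDLabor(labor_cost=6_000_000.0, fte_personnel=80.0)  # local currency

ppp = labor_ppp(other, us)  # local currency units per U.S. dollar
exchange_rate = 0.90        # hypothetical local currency units per U.S. dollar
price_level = 100.0 * ppp / exchange_rate  # U.S. = 100
```

A headcount that overstates full-time equivalents, as discussed for Japan, enters this sketch through `fte_personnel` and lowers the computed wage and PPP.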
Other Inputs

Materials and supplies represent about 20 percent of R&D expenditure. The interviews suggest that the majority of expenditure in this category consists of prototypes of new products or, in other words, the products of the industry itself. Therefore, we use own-industry output PPPs, adjusted for margins so that they represent the purchase prices of own-industry goods used as inputs.16 These prices come from industry-of-origin studies of item-level matches of industrial census data for specific industries in each country and are described further in section 10.4.

16. Because output PPPs do not reflect transportation and distribution margins, we add these margins back in using information from input-output tables in order to treat these goods as purchased inputs to the industry.

It was more difficult to identify prices for other current costs, and these are important at 24 percent of R&D expenditure. According to the firms we interviewed, this category includes an array of goods and services typically described as “overhead.” Detailed financial data for about ten firms showed that this category includes such items as communications services, rent, utilities, and noncapital computers and instruments. We were able to identify industry-of-origin (ICOP) and final expenditure (ICP) price parities that matched many of these goods and services.17 However, this information is not industry specific, so we implicitly assume that the relative prices of these overhead goods and services are similar across industries. While most goods purchased for use in R&D programs are obtained in national markets, they may not be used in the same proportions in all industries. Because we do not have any information about the expenditure shares within this category, we use an unweighted average of eleven price “headings.”

17. For ICOP (intermediate) prices, this means that transportation and distribution margins are added back in, and for ICP (final expenditure) prices, tax margins are removed (peeled off). These margins are estimated using input-output tables.

For high-tech inputs such as computers, it is particularly difficult for the PPPs to take full account of quality differences. Because there is a wide spread in the prices of these inputs, the resulting price parity for this category is somewhat sensitive to what prices are included and excluded, especially in the case of Germany and Japan. Yet some simple experiments in which we removed outlying prices suggest that the impact on the aggregate R&D PPP is not that large (see McGuckin, van Ark, et al. 2004).

It was also difficult to develop prices for capital expenditures; they are, however, the smallest category of R&D expenditure, at 9 percent.18 We followed a similar approach to that used for other current costs and selected five ICOP and ICP price parities that correspond to plant and equipment headings appropriate for capital expenditures. Again, because we do not have an industry-level breakdown of capital expenditure, we implicitly assume that the proportions of capital inputs and relative prices of capital inputs used in each industry are similar across countries.

18. In considering capital inputs, a number of additional difficulties arise. Some countries appeared to have quite low capital expenditures. This could be related to a greater tendency to own land, in which case the opportunity costs of owning are not accounted for (some interviews suggested this). To the extent that firms in other countries are more likely to lease land, capital expenditures could be misleading. Moreover, capital service flows based on appropriately valued capital stocks are the appropriate concept, but given data limitations, we can only consider current capital expenditure. Still, capital expenditures were rarely a large share of expenditure on R&D, so the results may not be substantially affected by these problems.

The assumption of common patterns and national markets seems more plausible for the other current and capital costs than for labor or materials. But the lack of systematic weights and potential quality-adjustment problems for the prices of current cost and capital items means that we are less confident about the PPPs for these inputs. Therefore, we explore some alternative R&D PPPs that use different proxies for these input categories. The most interesting of these uses the industry-specific material PPPs for all of the nonlabor inputs, while another uses the GDP PPP. These are described further in section 10.3.4.

Table 10.3  R&D expenditure shares, total manufacturing, 1997
Shares of total manufacturing R&D expenditure (%)

R&D input category         France  Germany  Japan  The Netherlands  United Kingdom  United States  Average from interviews^b
1. Labor compensation      52.8    61.7     42.7   52.1             37.0            46.5           46.7
2. Materials and supplies  16.9^a  13.9^a   20.3   14.7^a           26.1            15.8           19.7
3. Other current costs     23.2    17.5     27.3   23.7             24.8            29.3           24.4
4. Capital expenditure     7.1     6.9      9.7    9.5              12.1            8.4^a          9.2
Total R&D cost             100.0   100.0    100.0  100.0            100.0           100.0          100.0

Source: National R&D surveys, National Science Foundation (2002), OECD (2003). See McGuckin, van Ark, et al. (2004) for more details.
^a See text for description of assumptions made to determine weights.
^b Average of 10 firms’ expenditures that provided detailed financial data for total R&D in interviews.

Weights (Shares)

Weights for each of the four categories of inputs by country are shown in table 10.3. Each of these expenditure shares for total manufacturing is built up from expenditures of nineteen manufacturing industries in the national R&D surveys. As shown in the table, the expenditure shares from national statistics are in a range similar to those we obtained from firm interviews. In fact, if we compare the ten firms’ financial information we obtained in interviews with corresponding industry expenditure shares in
firms’ home countries, their labor shares only differ by about 2 percent, on average.

There were two categories of expenditure where we had to make assumptions about the shares. First, expenditure information on materials and supplies is not collected in France, Germany, and the Netherlands. For these countries, we assigned the average of the United States, United Kingdom, and Japan’s shares of nonlabor, noncapital expenditure. Second, the U.S. R&D survey only collects R&D depreciation, so it is not comparable with the other five countries’ R&D capital expenditures. Moreover, because accounting requirements for R&D (at least in the United States) restrict the capitalization of R&D-specific assets, depreciation is likely to be quite different from even the average expenditure on capital. In fact, the U.S. depreciation share is far lower than the other countries’ capital expenditure shares, at only 1.3 percent compared to the 9 percent average for the other countries. The 9 percent figure is also closer to the typical capital expenditures of the firms we interviewed. We therefore use the industry-specific average of the other five countries’ capital expenditure shares as an estimate of the U.S. share.

More details about the interviews and the basis for our assumptions about the R&D input prices and weights are described in McGuckin, van Ark, et al. (2004).

10.3.3 Discussion of the Results

Table 10.4 provides estimates of the R&D PPP and the price level or cost of R&D for each country. These price levels represent the relative cost of a unit of R&D input in each country compared with the United States. R&D price levels are defined as the R&D PPP divided by the exchange rate of the country’s currency relative to the U.S. dollar. If the PPP is the same as the exchange rate, the price level equals 100. Based on these results for 1997, manufacturing R&D in Germany and
Table 10.4  R&D PPPs and R&D price levels (cost relative to the United States), total manufacturing, 1997

                               France  Germany  Japan  The Netherlands  United Kingdom  United States
                               (€/$)   (€/$)    (¥/$)  (€/$)            (£/$)           ($/$)
R&D PPP (labmatOCcap)          0.86    0.98     138.1  0.80             0.54            1.00
Exchange rates                 0.89    0.88     121.0  0.88             0.61            1.00
R&D price level (U.S. = 100)   96.4    111.0    114.1  90.0             88.8            100.0

Sources: See sources to tables 10.2 and 10.3.
Notes: Exchange rates are year averages (EMU countries converted into Euro equivalents). R&D price levels are defined as R&D purchasing power parity (PPP) divided by the exchange rate of the country’s currency relative to the U.S. dollar. These levels represent costs relative to the United States.
Table 10.5  R&D input price levels (cost relative to the United States), total manufacturing, 1997

Input category             France  Germany  Japan  The Netherlands  United Kingdom  United States
1. Labor compensation      84.9    97.6     93.9   76.4             58.9            100.0
2. Materials and supplies  118.1   129.9    101.0  117.5            149.3           100.0
3. Other current costs     102.0   133.2    161.3  95.0             107.1           100.0
4. Capital expenditure     108.8   119.1    103.3  118.4            105.2           100.0

Sources: See sources to tables 10.2 and 10.3.
Notes: R&D price levels are defined as R&D purchasing power parity (PPP) divided by the exchange rate of the country’s currency relative to the U.S. dollar. These levels represent costs relative to the United States.
Japan is 11 percent to 14 percent more expensive than in the United States, while in France, the Netherlands, and the United Kingdom, R&D is 4 percent to 11 percent less expensive. Because the expenditure weights are relatively similar across countries, these cost differentials are driven by the differences in the relative prices of input categories.

Comparative price levels for each R&D input category are shown in table 10.5 for total manufacturing. Lower prices in France, the Netherlands, and the United Kingdom can be traced to lower R&D labor prices. The higher prices in Germany and Japan are attributable to the high price of other current costs, or overhead expenses. For both countries, wholesale and retail trade and transportation and storage have the highest relative prices (McGuckin, van Ark, et al. 2004). In Japan, insurance is also relatively expensive, while in Germany, electricity, gas, and water are relatively costly.

The approximate magnitude of the price differences that we observe using these newly constructed R&D PPPs is similar in character to that reported in the interviews. In most cases, the cost of performing routine R&D was described as not varying all that much across the countries included in this study. The differences we measure for total manufacturing in the 5–15 percent range are consistent with these observations.

Labor Prices and Interindustry Variation

Because labor represents the largest share of R&D and the data are R&D and industry specific, it is worth examining the labor PPPs more closely. Interviews suggest that R&D labor compensation can vary widely between technical fields and that the mix of technical fields varies greatly from firm to firm and from industry to industry. Labor costs do vary considerably across industries and, particularly, across countries, even within industries. Due to shortage of space, this paper does not show the results for the
nineteen individual manufacturing industries.19 Interindustry variation is illustrated by the coefficients of variation (CV) for price levels of R&D labor relative to the United States. These are especially wide for the Netherlands and the United Kingdom, where the CVs are 0.38 and 0.40, respectively.20 In contrast, France has the narrowest range of relative labor-price levels across industries, with a CV of 0.16.

19. See table B6 in McGuckin, van Ark, et al. (2004).

20. Coefficients of variation are calculated as the standard deviation divided by the unweighted arithmetic mean of the relative price levels of R&D labor by industry.

An important question is whether the differences across industries are larger or smaller than the differences across countries. We performed a two-way analysis of variance (ANOVA) and found significant differences across both industries and countries, with more of the variation coming from across countries than from across industries. One explanation for the importance of the country effect is national policies and union negotiations in most of the European countries. The large differences in R&D labor prices across both countries and industries illustrate the importance of including R&D labor explicitly in R&D PPPs.

Nonlabor Input Prices

The three remaining categories of input prices used for the construction of the R&D PPPs are materials, other current costs, and capital expenditures. Only the materials prices are industry specific. The variation in relative price levels across industries for materials is nearly as large as that for labor. The coefficient of variation across industries for each of the five comparison countries is between 0.20 and 0.42. As with labor, an ANOVA shows that the differences across both industries and countries are statistically significant.

R&D PPPs for 1987

Using the same methods and data sources, we also derive relative prices in 1987 for the same four categories of R&D inputs and aggregate them using R&D expenditure weights. Although for some countries the source material is less extensive and detailed (in particular for the Netherlands), we are able to follow very similar procedures. The results of this exercise at the level of total manufacturing are shown by country in table 10.6. Comparing the relative R&D price levels for 1987, we observe that the United Kingdom is least expensive, 22 percent cheaper than the United States, and France, Germany, and the Netherlands are most expensive, at 6 percent to 16 percent more costly than the United States; Japan is nearly tied with the United States. The lower R&D prices in the United Kingdom are driven most importantly by lower R&D labor prices, while higher
Table 10.6  R&D PPPs and R&D price levels (cost relative to the United States), total manufacturing, 1987

                               France  Germany  Japan  The Netherlands  United Kingdom  United States
                               (€/$)   (€/$)    (¥/$)  (€/$)            (£/$)           ($/$)
R&D PPP (labmatOCcap)          0.99    1.06     141.1  0.97             0.48            1.00
Exchange rates                 0.92    0.92     144.6  0.92             0.61            1.00
R&D price level (U.S. = 100)   107.7   115.9    97.5   105.9            77.7            100.0

Sources: See sources to tables 10.2 and 10.3 and McGuckin, van Ark, et al. (2004).
Notes: See notes to table 10.4.
Table 10.7  R&D input price levels (cost relative to the United States), total manufacturing, 1987

Input category             France  Germany  Japan  The Netherlands  United Kingdom  United States
1. Labor compensation      92.0    110.8    83.1   89.7             48.4            100.0
2. Materials and supplies  128.3   121.9    100.4  112.5            111.2           100.0
3. Other current costs     118.5   121.5    116.2  114.1            104.5           100.0
4. Capital expenditure     129.9   125.2    102.3  141.9            115.3           100.0

Sources: See sources to tables 10.2 and 10.3 and McGuckin, van Ark, et al. (2004).
Notes: See notes to table 10.5.
prices in France, Germany, and the Netherlands can be linked to the high price of capital. The relative price levels for the input categories are shown in table 10.7.

10.3.4 Sensitivity of the R&D PPP: Alternative Measures

An important question for the interpretation of our results is how sensitive the R&D PPPs are to the assumptions we make. In general, our R&D PPPs will be more accurate if the underlying relative prices are well measured, if they refer specifically to R&D in each industry, and if there are industry-specific weights to combine them into a single index.

Of the four R&D input categories, we are most confident in our measure of the price of R&D labor as it is collected specifically for R&D within each industry and country and is nearly comprehensive across countries.21 As mentioned before, though, a drawback is the lack of a breakdown by labor type. The materials inputs are next best as they are industry specific, and the coverage in each industry is high although they are not R&D specific.

21. This discussion abstracts from various issues associated with the R&D survey design. In particular, the collection of expenditure data at the firm level coupled with the classification of a firm into a single industry means that for diversified firms the industry numbers involve a mix of industries.
Table 10.8  Selection of input price measures for alternative versions of R&D PPP, by input category

                           Price measure
R&D input category         Preferred (a)  Alternative (b)         Alternative (c)     Current practice (d)
1. Labor compensation      Labor          Labor price parity      Labor price parity  GDP PPP^a
2. Materials and supplies  Materials      Materials price parity  GDP PPP^a           GDP PPP^a
3. Other current costs     Other current  GDP PPP^a               GDP PPP^a           GDP PPP^a
4. Capital expenditure     Capital        GDP PPP^a               GDP PPP^a           GDP PPP^a
Name of alternative        labmatOCcap    labmatGDP               labGDP              GDP

Notes: Price parity is price of good in comparison country divided by price of same good in base country (United States). The labor price parity and materials price parity are available at the level of specific industries.
^a Categories with the same price measure use the total weight of the merged categories for R&D PPP aggregation.
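To make the construction of these versions concrete, the Python sketch below applies them to France at the level of total manufacturing, using the expenditure shares in table 10.3, the input price levels in table 10.5, and the GDP price level reported in row (d) of table 10.9. It assumes that equation (1) is a Laspeyres index with U.S. expenditure weights and that the shares in table 10.3 are expressed in national currency. Working with price levels rather than PPPs rescales every parity by the same exchange rate and so leaves the weights in equation (4) unchanged. Because the chapter aggregates from nineteen industries while this sketch starts from manufacturing totals, it should approximate but not reproduce the published price levels. All function and variable names are illustrative.

```python
from math import sqrt

CATEGORIES = ("labor", "materials", "other_current", "capital")

# Expenditure shares (%), total manufacturing, 1997 (table 10.3).
US_SHARES = dict(zip(CATEGORIES, (46.5, 15.8, 29.3, 8.4)))
FR_SHARES = dict(zip(CATEGORIES, (52.8, 16.9, 23.2, 7.1)))

# French input price levels, U.S. = 100 (table 10.5).
FR_PRICES = dict(zip(CATEGORIES, (84.9, 118.1, 102.0, 108.8)))
# French GDP price level, U.S. = 100 (table 10.9, row d).
FR_GDP_LEVEL = 111.4


def laspeyres(base_shares, prices):
    """Base-country-weighted mean of category parities (assumed form of eq. 1)."""
    total = sum(base_shares.values())
    return sum(base_shares[c] / total * prices[c] for c in CATEGORIES)


def paasche(own_shares, prices):
    """Equations (3) and (4): own shares re-expressed at base-country prices."""
    deflated = {c: own_shares[c] / prices[c] for c in CATEGORIES}
    total = sum(deflated.values())
    return sum(deflated[c] / total * prices[c] for c in CATEGORIES)


def fisher(base_shares, own_shares, prices):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(base_shares, prices) * paasche(own_shares, prices))


def substitute_gdp(prices, gdp_level, gdp_categories):
    """Use the GDP price level for the listed categories (table 10.8)."""
    return {c: gdp_level if c in gdp_categories else prices[c] for c in CATEGORIES}


# Categories priced with the GDP PPP in each version of table 10.8.
VERSIONS = {
    "a_labmatOCcap": (),
    "b_labmatGDP": ("other_current", "capital"),
    "c_labGDP": ("materials", "other_current", "capital"),
    "d_GDP": CATEGORIES,
}

results = {
    name: fisher(US_SHARES, FR_SHARES,
                 substitute_gdp(FR_PRICES, FR_GDP_LEVEL, cats))
    for name, cats in VERSIONS.items()
}

# Ratio of the Laspeyres to the Paasche index for the preferred version.
spread = laspeyres(US_SHARES, FR_PRICES) / paasche(FR_SHARES, FR_PRICES)

for name, level in results.items():
    print(f"{name}: {level:.1f}")
```

Assigning the same price to several categories is equivalent to merging them and using their combined weight, which is how note a to table 10.8 describes the aggregation.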
As discussed in section 10.3.2, the prices for other current costs and capital costs in the preferred R&D PPP construction are more problematic. Here we have a limited number of individual item prices, some of which could be improved with hedonic quality adjustments, and no weights for the prices that make up the input categories. Although the choices of price proxies were informed by interviews of R&D-intensive firms, we are less confident about these prices because they are not quality-adjusted, there are no weights, and the available price data are relatively sparse.

In many respects the choices we face are simply echoes of the earlier studies. But here we develop several alternative versions of the R&D PPP and use them to ascertain the sensitivity of the resulting R&D PPP. The specific input prices used in developing these alternative R&D PPP estimates are described in table 10.8.22 In addition to our “preferred” R&D PPP discussed previously, labeled (a), we estimate two other versions, labeled (b) and (c), in addition to the current practice labeled (d). The alternatives discussed here use the same industry-specific measure of the price of R&D labor. They also use the same weights for the individual inputs. Only the prices used for the input categories are varied. We compare these different versions of the R&D PPP to understand the sensitivity of the results to the selection of price proxies for the input categories.

22. We estimated several other variants as well, making a variety of different assumptions about the prices used for other current and capital costs. The result of these variants was in each case similar to either alternative (b) or (c), so they are not shown here.

Both alternative R&D PPPs are roughly based on the concept of the Griliches-Jaffe R&D deflator, which combines the price of labor with a
broader measure of economywide price changes (Jaffe 1972; Griliches 1984).23 Alternative (b) uses industry-specific PPPs for materials and supplies to reflect the cost of prototypes and associated goods. For other current costs and capital expenditure, it borrows from the current practice of using the GDP PPP. This approach makes the assumption that the relative price levels of other current and capital R&D costs equal the average relative price level for the aggregate economy. This alternative is referred to as “labmatGDP,” and because it is strongly industry specific, we consider it to be the most conceptually appropriate alternative to our preferred measure.

23. The Griliches-Jaffe deflator originally referred to a proxy R&D price index for the United States that combined the hourly compensation index with a 51 percent weight and the implicit deflator for nonfinancial corporations with a 49 percent weight (Griliches 1984). We analogize this interpretation to spatial comparisons by using PPPs instead of deflators and extend it to use industry-specific R&D labor prices and weights from actual R&D expenditure shares.

Alternative (c) uses the GDP PPP to proxy the price of all nonlabor inputs, including materials and supplies. This alternative is referred to as “labGDP,” and it combines industry-specific R&D labor with economywide GDP final goods prices.

Finally, we compare the results with the current practice alternative (d), which uses the GDP PPP for all R&D inputs and is widely used by statistical agencies and national science authorities for international comparisons of science and technology indicators. As argued in the preceding, use of GDP is particularly problematic as it includes a wide range of products and services not used in R&D, and the concept is based on final expenditure.

The use of these alternatives obviously does not cover the entire range of possible measurement problems. Although we do not have systematic quantitative estimates of potential error, we examine here several simple changes in assumptions within each of the alternative estimates to see if they produce major changes in the resulting R&D PPP. For instance, we have excluded some outliers from the set of prices we use for other current costs in calculating our preferred R&D PPP for Germany and Japan. This results in a drop in the input prices in the range of 6–13 percent relative to the United States. But in such instances, the resulting R&D PPPs are only affected by 1.0–3.5 percent. This result is typical of the tests we have conducted.

When we use the Fisher PPP aggregation formula described in section 10.3.1 to aggregate prices across countries, large differences in the underlying weights in fact imply a wide range of possible outcomes. This range is referred to as the Paasche-Laspeyres spread and is usually large when countries have very different price structures. Because the six countries in this comparison are at a similar level of development, we did not expect that this should be a significant problem, and it is not. The Paasche-Laspeyres
Table 10.9  Comparison of price levels (cost relative to the United States) using preferred R&D PPPs and alternative R&D PPPs, total manufacturing, 1997

R&D PPP version                 France  Germany  Japan  The Netherlands  United Kingdom  United States
Preferred
  (a) R&D (labmatOCcap)         96.4    111.0    114.1  90.0             88.8            100.0
Alternatives
  (b) R&D (labmatGDP)           98.2    106.4    109.0  90.1             88.0            100.0
  (c) R&D (labGDP)              97.3    104.1    114.7  88.0             81.7            100.0
Current practice
  (d) GDP (GDP PPP)             111.4   112.3    134.7  100.9            103.2           100.0
Difference between (a) and (b)  1.8     -4.6     -5.1   0.1              -0.7
Difference between (a) and (c)  0.9     -6.9     0.6    -2.0             -7.1
Difference between (a) and (d)  14.9    1.3      20.6   10.9             14.5

Sources: See sources to tables 10.2 and 10.3.
Note: Alternative R&D PPPs are described in table 10.8.
spread is on the order of 2–3 percent for most comparisons, suggesting that differences in the weights are not large enough to meaningfully affect the comparisons. Moreover, we anticipate that measurement errors in the underlying prices will affect the results more than any differences in the weights, which are R&D and industry specific.24

24. The issue of measurement errors in international pricing programs, such as the ICP program or the ICOP project, is discussed extensively elsewhere. For a discussion of measurement issues related to the expenditure-based ICP program, see the “Castles Report” (OECD 1997). For a review of industry-of-origin studies of PPPs and productivity, see van Ark (1993) and van Ark and Timmer (2001).

Alternative Versions of the R&D PPP at the Country Level, 1997

Table 10.9 reports the different versions of relative price levels based on various R&D PPPs, labeled (a) through (c), and the alternative (d), the GDP PPP that is used in current practice. As discussed previously, these alternatives make different assumptions about what prices to use to represent nonlabor R&D input prices.

The price levels based on the alternative R&D PPPs (b) and (c) are quite similar to the one using our preferred R&D PPP (a). They differ by –7.1 to 1.8 percentage points from the preferred specification (a) for each country. Alternative R&D PPP (b) “labmatGDP” yields results that are within about 5 percentage points of the preferred R&D PPP (a), while alternative (c) “labGDP” yields results within about 7 percentage points of (a). Recall that both alternatives (b) and (c) are based on a Griliches-Jaffe-type R&D PPP. In particular, alternative (c) is relatively straightforward to compute as it only requires a PPP for R&D labor and a GDP PPP.

In sharp contrast, the current practice of using the GDP PPP by itself
yields substantially different results from the preferred measure. Compared to the preferred R&D PPP (a), current practice version (d) varies by 12.4 percentage points on average and by as much as 20.6 percentage points in the case of Japan. Only for Germany are the results within the range of the other alternatives. The size of these differences suggests that the use of an R&D PPP will yield comparative costs and R&D intensities that vary substantially from the current practice of using GDP PPPs, likely increasing the real R&D performance of the comparison countries relative to the United States. Alternative R&D PPPs at the Industry Level, 1997 When comparing the preferred R&D PPP (a) with alternative (b) that uses fully industry-specific input price data at the level of individual industries, the coefficients of variation across industries are about the same for both R&D PPP versions, and we see similar significant differences across industries and countries under an ANOVA analysis. The price levels are significantly determined by the price of R&D labor, which both preferred version (a) and alternative (b) contain in equal proportions. Therefore, it is not surprising that the simple correlation between the two sets of price levels (a) and (b) is 0.83. If we correlate the industry-specific prices with GDP PPPs by themselves, the correlation is only about 0.59. These results suggest that it is important that the R&D PPP be industry specific, but that it is less essential that a full R&D PPP be developed for all input categories in a specific year. Given the current uncertainties in measurement of the R&D PPP for other current cost and capital expenditure, the alternatives (b) and (c) that combine R&D-specific measures of the price of labor (and preferably also material prices) with output prices performs very similarly to a fully developed R&D PPP. 
These results are consistent with analogous findings about the importance of measuring R&D labor prices in the time dimension in studies by Mansfield (1987) and Jankowski (1993).
Alternative Versions of the R&D PPP, 1987
In order to assess how much the 1987 R&D-specific PPPs differ from the current practice of using the GDP PPP as a substitute, we also compared the preferred R&D PPP and several alternatives with the GDP PPP, just as we did for the 1997 PPPs.25 Again, the alternative R&D PPPs are quite similar to the preferred R&D PPP. The alternative PPPs for the European countries differ by no more than 7 percentage points from the preferred PPPs. The gap for Japan is somewhat larger, with R&D PPP alternative (c) “labGDP” differing by 17 percentage points from the preferred PPP. However, for all countries, the GDP PPP (d) yields quite different results
25. See McGuckin, van Ark, et al. (2004), table A4.
International Comparisons of R&D Expenditure
from any of the other measures. As with the 1997 values, there is substantial variation across industries and great similarity in the coefficients of variation between R&D PPP versions (a) and (b). Arguably, alternative (a) is too difficult to calculate systematically on a year-by-year basis, but it is relatively straightforward to obtain (b) and in particular (c), and these alternatives provide R&D price levels that correspond reasonably closely to the preferred ones. We will return to this issue in our closing comments.
10.3.5 Comparing the Distribution of Relative Prices over Time
Having made two benchmark estimates, it is tempting to use them to compare the change in relative price levels of each country vis-à-vis the United States over time. In principle, such a comparison should give more reliable results because, even though the relative price levels are measured with some error in each benchmark year, the errors tend to cancel out for measures of change over time, provided they come from the same sources in each year. Such an argument is often invoked in the context of discussions of productivity growth estimates (for example, Hulten 2001). However, even when basic price and quantity data for the benchmark PPPs and time series are consistent, two index number problems plague a comparison of PPPs for two different benchmark years. The first problem is that for a comparison between two points in time, the weights need to be held fixed. The second problem relates to the fact that the time series are typically based on national weights of each individual country, whereas benchmark estimates are based on a common weighting system for both countries. Both weighting problems are well-known in the price-index number literature and have been called the “tableau effect” by Summers and Heston (1991).26 Despite these difficulties, it is informative to compare the change in our R&D PPPs between 1987 and 1997 to the change in GDP PPPs over the same period.
Because the period considered is relatively short and the levels of development across countries are not too different, comparisons of PPP results from the two benchmark years should lead to only relatively minor inconsistencies, provided that price and quantity structures remain rather stable. In table 10.10 we show the change in the R&D PPPs for total manufacturing for the preferred construction, alternative (b) “labmatGDP” PPPs and the change in the GDP PPPs.27 The table shows that while the sign of the change in the PPPs is the same for each alternative PPP, the magnitudes differ considerably, even between our preferred measure (a) and
26. Conceivably, an appropriate weighting system should exist (something akin to chain-weighted or so-called spanning trees) that could remedy these inconsistencies, but an exploration of this issue is beyond the scope of this paper. See, for example, Hill (2004).
27. We do not show changes in relative price levels here as those include both changes in the PPPs and changes in exchange rates and are therefore more difficult to interpret.
Table 10.10  Changes of R&D PPPs for total manufacturing and GDP PPPs between 1987 and 1997
                    Preferred (a): labmatOCcap      Alternative (b): labmatGDP      Current practice (d): GDP PPP
Country             1987     1997     Change (%)    1987     1997     Change (%)    1987     1997     Change (%)
France              1.02     0.86     –16.8         0.99     0.87     –12.4         1.04     0.99     –4.5
Germany             1.08     0.98     –0.7          1.08     0.94     –14.0         1.13     0.99     –12.6
Japan               159.1    138.1    –14.2         172.8    131.9    –27.0         210.2    163.0    –25.4
The Netherlands     0.95     0.80     –17.4         0.92     0.80     –14.5         1.06     0.89     –17.2
United Kingdom      0.56     0.54     –2.8          0.53     0.54     1.8           0.56     0.63     11.1
United States       1.00     1.00     0.0           1.00     1.00     0.0           1.00     1.00     0.0
Sources: See sources to tables 10.2 and 10.3 and McGuckin, van Ark, et al. (2004).
Notes: Percent changes are log differences. Exchange rates are annual averages.
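The notes to table 10.10 state that the percent changes are log differences. A minimal Python sketch of that calculation is given here for illustration. The function name log_change_pct is ours and does not come from the study, and the inputs are the rounded GDP PPP values printed in the table, so the results can differ from the published changes in the last digit.

```python
import math


def log_change_pct(ppp_start: float, ppp_end: float) -> float:
    """Percent change between two PPP levels, expressed as a log difference."""
    return 100.0 * math.log(ppp_end / ppp_start)


# Rounded GDP PPPs (national currency per U.S. dollar) as printed in table 10.10.
gdp_ppp = {
    "Japan": (210.2, 163.0),
    "The Netherlands": (1.06, 0.89),
}

for country, (ppp_1987, ppp_1997) in gdp_ppp.items():
    change = log_change_pct(ppp_1987, ppp_1997)
    print(f"{country}: {change:.1f} percent")
```

A log difference is symmetric: the change from 1987 to 1997 is exactly the negative of the change from 1997 to 1987, which is not true of ordinary percent changes.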
alternative (b). But on average the difference between the two R&D PPPs is smaller than the difference between the R&D PPPs and the GDP PPPs. These results suggest that the current practice of using GDP PPPs over time will be biased compared to using dedicated R&D PPPs, with the direction of the bias varying by country and industry. In addition, they call into question using GDP deflators to compare R&D expenditure over time as suggested by Jankowski (1993) on the basis of U.S. data that are now over ten years old. These results suggest that the development of dedicated R&D deflators could be worthwhile.
10.4 Real R&D Intensities
As mentioned in the introductory section, the ratios of R&D expenditure to GDP or national income are a key focus of policy discussions across the world and are often used as comparative measures of the intensity of the efforts devoted to innovative activities. Because such comparisons often rely on nominal figures, this is an application where properly adjusting for price differences may have a substantial impact. We therefore examine the effect on R&D intensities of adjusting for differences in R&D prices as well as output prices at the level of total manufacturing and for individual industries. While we cannot directly apply the R&D PPPs we develop in this study to economywide R&D (the nonmanufacturing R&D could be quite different, and almost 36 percent of private U.S. R&D was outside of manufacturing in 1999), the differences between nominal and real R&D intensities that we observe should be indicative of the dangers that may exist with current practice for similar measures covering the aggregate economy.
10.4.1 Adjusting R&D Intensities for Differences in Price Structure
Real R&D intensity measures require that R&D expenditure be deflated by an R&D PPP and output by an appropriate output PPP. In this paper we have developed preferred and alternative R&D PPPs. Here we use the preferred one. For output, the PPPs come from industry-of-origin studies conducted in the ICOP project at the University of Groningen. These output PPPs, or more appropriately unit value ratios (UVRs), are calculated using data on quantities and values of output from production censuses and industrial surveys. Individual products are matched across countries and then weighted to form industry-specific and, after aggregation, manufacturing-wide PPPs.28
Quality Adjustments for Output PPPs
As with all price measurement, adequately taking into account the differences and changes in quality of products is a difficult undertaking. Moreover, as with other price indexes, research on quality adjustment has generally focused on comparing constant-quality prices over time, rather than constant-quality prices across countries.29 Exceptions are the work of Danzon and Chao (2000), Konijn, Moch, and Dalén (2003) and van Mulligen (2003). Danzon and Chao (2000) estimate PPPs for pharmaceuticals, Konijn, Moch, and Dalén (2003) estimate computer PPPs, and van Mulligen (2003) estimates automobile PPPs. Of these studies, only the work of van Mulligen (2003) fits well into the industry-of-origin approach as it compares prices of cars that are produced in a country. The other two studies examine the bundle of goods purchased in that country. For our output PPPs, we therefore only make use of the automobile PPPs constructed by van Mulligen (2003). The main difference from standard UVRs based on the (producer) unit value per average car is that these PPPs take into account the fact that cars produced in the United States generally have more horsepower and are larger than those produced in Europe or Japan.
Van Mulligen uses power and length characteristics of vehicle models to estimate quality-adjusted PPPs using hedonic methods. Unadjusted conversion factors are shown to be biased downward by as much as 50 percent relative to the United States. It is much harder to gauge what the likely effects would be of quality-adjusted PPPs for other products such as computers or telecom equipment. A complicating factor is that many of these high-tech goods are not produced in all countries. In a series of comparisons of productivity for manufacturing industries by the McKinsey Global Institute, quality adjustments were made on an industry-by-industry basis mostly on the basis of proprietary information or by using expert judgments on the quality of comparable products (McKinsey Global Institute 1993; Gersbach and van Ark 1994). Although quality adjustments could be considerable for particular products, we have not used this information as it is only available for a limited number of industries, relating to the early 1990s and covering only Germany, Japan, and the United States.30
28. For a more extensive general description of this method, see van Ark (1993) and van Ark and Timmer (2001). For the European manufacturing UVRs PPPs used in this study, see O’Mahony and van Ark (2003). For the Japan-U.S. PPPs, see Inklaar, Wu, and van Ark (2003). Aggregation follows similar procedures as described earlier in the case of R&D PPPs.
29. For a survey of quality change over time for non-R&D products, see Lebow and Rudd (2003). For literature on cross-country quality measurement, see van Ark (1993) and van Mulligen (2003).
Real R&D Intensities and Ranking
The nominal and real R&D intensities at the level of total manufacturing are shown in table 10.11 for 1987 and 1997. The nominal R&D intensity is in the first column, the real R&D intensity in the second, and the difference between the real and the nominal intensities in the third.31 The real R&D intensity is calculated by using R&D PPPs to deflate nominal R&D expenditure and using output PPPs to convert manufacturing gross output to a common currency. The difference between the real and nominal intensities can therefore be traced to these two adjustments. In table 10.11, the nominal R&D intensity is defined as manufacturing R&D expenditure divided by manufacturing gross output, as gross output is the correct measure for sectoral analysis. Because international comparisons often are made using GDP, we also replicated all the analysis using R&D intensities based on value added as the output measure. Table B21 in McGuckin, van Ark, et al. (2004) shows these results. The magnitude of the value-added-based intensities is roughly three times higher because the value-added measure omits intermediate inputs. The main results of the analysis, reported in the following, are the same irrespective of the output measure used. The U.S.
R&D intensity is highest in all cases, even after the PPP adjustments described in the preceding. The typical adjustment, using R&D and output PPPs, to each of the comparison countries is positive and sizable, yielding R&D intensities that are closer to the U.S. level than under current practice, and this is true for both 1987 and 1997.32 These results
30. For example, for personal computers the different composition of products produced in Germany and the United States led to an upward adjustment of the census-based German-U.S. computer output PPP by 41 percent in 1990. The PPP for audio and video equipment (including telecom equipment) was only adjusted upward by 5.1 percent. At more aggregate levels (e.g., for total manufacturing), these effects are likely to be much smaller as quality adjustments for some other industries may bias the PPP in the opposite direction (Gersbach and van Ark 1994).
31. The results described here are based on the preferred R&D PPP. If the alternative R&D PPP is used instead of the preferred R&D PPP, the difference between the nominal and real R&D intensities is similar, and the changes in rank are identical.
32. Because the United States is the base country, the U.S. intensity does not change with the PPP adjustment.
Table 10.11  Nominal and real R&D intensity for total manufacturing (R&D/gross output) using preferred R&D PPP and output PPPs
                    Current practice    With R&D PPP and output PPP adjustments
Country             Nominal             Real        Difference
Year 1987
France              2.06                2.47        0.42
Germany             2.71                2.87        0.16
Japan               2.24                2.75        0.51
The Netherlands     2.04                2.21        0.17
United Kingdom      2.07                3.09        1.03
United States       3.44                3.44        0.00
Year 1997
France              2.22                2.40        0.18
Germany             2.50                2.47        –0.02
Japan               2.89                2.95        0.06
The Netherlands     1.59                1.74        0.16
United Kingdom      1.92                2.49        0.57
United States       3.12                3.12        0.00
Change from 1987 to 1997
France              0.16                –0.07       –0.24
Germany             –0.21               –0.40       –0.18
Japan               0.65                0.20        –0.45
The Netherlands     –0.46               –0.47       –0.01
United Kingdom      –0.15               –0.61       –0.46
United States       –0.32               –0.32       0.00
Sources: See sources to tables 10.2 and 10.3. Gross output based on OECD (2004). Output PPPs based on O’Mahony and van Ark (2003) and Inklaar, Wu, and van Ark (2003) for 1997, and van Ark (1993) for 1987.
Notes: Adjustments for R&D PPP divide R&D expenditures by the R&D PPP. Adjustments for output PPP divide gross output by an output PPP. Real intensity includes both adjustments.
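The notes to table 10.11 describe the real intensity as R&D expenditure divided by the R&D PPP, set against gross output divided by the output PPP. The Python sketch below writes out that arithmetic. The function names and the numbers in the usage lines are hypothetical, chosen only to illustrate the calculation; they are not values from the study.

```python
def nominal_intensity(rd_expenditure: float, gross_output: float) -> float:
    """R&D expenditure as a percentage of gross output, both in national currency."""
    return 100.0 * rd_expenditure / gross_output


def real_intensity(rd_expenditure: float, gross_output: float,
                   rd_ppp: float, output_ppp: float) -> float:
    """R&D intensity after deflating R&D by the R&D PPP and output by the output PPP."""
    real_rd = rd_expenditure / rd_ppp          # R&D in base-country prices
    real_output = gross_output / output_ppp    # gross output in base-country prices
    return 100.0 * real_rd / real_output


# Hypothetical inputs (not taken from the study): expenditure and output in
# national currency, PPPs in national currency per unit of base currency.
rd, output = 2.0, 100.0
print(nominal_intensity(rd, output))
print(real_intensity(rd, output, rd_ppp=0.5, output_ppp=0.75))
```

Because both adjustments are simple divisions, the real intensity equals the nominal intensity multiplied by the ratio of the output PPP to the R&D PPP. For the base country both PPPs equal 1, so the nominal and real intensities coincide.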
suggest that the efforts devoted to R&D in each country are more similar across countries than is apparent using the nominal R&D intensities that are currently the norm. The effect of the price adjustments on R&D intensity is particularly large for the United Kingdom: Before adjustment (in nominal terms), its R&D intensity is only 2.1 percent in 1987 and 1.9 percent in 1997. After adjustment for relative prices of R&D and gross output, the U.K.’s R&D intensity (in real terms) is much higher at 3.1 percent in 1987 and 2.5 percent in 1997. In 1987, these adjustments shift the rank of the United Kingdom from next to last among the six countries in this study to second place after the United States, displacing Germany and Japan. The R&D PPP contributed about two-thirds of the adjustment in that year.
Not only are the levels of R&D intensity affected by price adjustments, but so are the changes in R&D intensity. From 1987 to 1997, nominal R&D intensity in Germany dropped by 0.2 percentage points. Using real R&D intensity, however, the drop was 0.4 points. In general, R&D PPPs declined less than output PPPs, worsening the trend in R&D intensity between 1987 and 1997.
Real R&D Intensities for Individual Industries
R&D intensities for individual industries are subject to the same interpretation problems as those at more aggregate levels due to the use of nominal values. Using industry-specific R&D PPP and output PPP price adjustments to adjust nominal industry-level R&D intensities gives real R&D intensities for individual industries. As a result of the large variation in (i) the R&D PPPs (because of large R&D labor-price variation), (ii) output PPPs, and (iii) nominal R&D intensities across industries, these adjustments are often larger in percentage points than those at the total manufacturing level. The average difference between real (PPP adjusted) R&D intensities and nominal R&D intensities is 0.7 percentage points at the industry level, while for total manufacturing this is only 0.3 percentage points. A key question for the interpretation of these differences is how important the adjustment for differences in relative R&D and output prices is compared to the differences in nominal R&D intensity. A two-way ANOVA between real and nominal R&D intensity among the six countries and nineteen industries demonstrates that the variation among industries is very large and statistically significant, while differences across countries are relatively small and not statistically significant. The variation among industries is likely attributable to the differences in technologies and R&D production functions and to demand-side opportunities that generate differences in the intensity of R&D efforts across industries.
The smaller differences across countries are most likely a result of internationalization of R&D and increased competitiveness due to globalization.
10.5 Concluding Comments
This paper develops R&D PPPs that are conceptually appropriate in that they are based on relative prices for a basket of R&D inputs. To the extent that current data allow, we have developed R&D-specific prices and weights and aggregated them into R&D PPPs for nineteen individual manufacturing industries covering the years 1997 and 1987. Previous R&D PPP estimates did not utilize such detailed R&D-specific price and weight data as in this study, nor did they use interviews to guide the application of their methodology. Thus the R&D PPPs we developed allow us to better evaluate the importance of having R&D-specific measures of R&D price across countries.
A comparison of our preferred R&D PPPs with GDP PPPs suggests that current procedures for comparing R&D across countries are flawed. While there is some netting of industry differences at the economywide level, the GDP PPPs still differ substantially from R&D PPPs. At the industry level, use of the GDP PPP as a proxy for the R&D PPP is inappropriate. The differences between R&D PPPs and GDP PPPs are large, and a substantial fraction of these differences can be traced to variations in the price of R&D labor across industries. The size of this difference and the relatively complex nature of our preferred R&D PPP have led us to consider two alternatives that can be readily calculated and could easily be adopted by statistical agencies. These relatively easy-to-measure alternative R&D PPPs are based on a Griliches-Jaffe-type index and are relatively close to the preferred R&D PPP in approximating differences in R&D price across countries and industries. The most plausible alternative measure combines industry-specific R&D labor PPPs and industry output PPPs for materials and supplies with the GDP PPP for other inputs. While the most important source of differences at the economywide level is still R&D labor cost, prices of the other inputs to R&D can and do vary across industries. So by advocating that priority needs to be given to develop R&D labor PPPs, we are not suggesting that price measurement for other inputs to R&D should be ignored. For comparisons over time, few substitutes for our preferred R&D PPP are available. While industry-level changes in the preferred R&D PPP over time correlate well with those of the alternative R&D PPPs, differences at the total manufacturing level are large enough to cause significant errors of interpretation in not only R&D expenditures, but also in R&D intensities.
This suggests that periodic benchmark estimates of the preferred R&D PPP would be useful to ensure that an alternative R&D PPP that relies mainly upon variations in R&D labor prices maintains a solid grounding over time. Our results in the interspatial domain also suggest that intertemporal R&D deflator work should be given further attention. We find important differences between changes in the GDP PPP and the R&D PPPs. While one cannot draw direct conclusions regarding the development of relative prices due to different weighting systems at varying points in time, the results suggest that it would be useful to reexamine Jankowski’s (1993) finding of a correlation between the GDP and the R&D deflator. One reason is that his study is now over ten years old, and there have been vast changes in economic structure and measurement of quality change. There is also evidence that this correlation does not hold up as well in other countries.33
33. For instance, Bernstein (1986) found that the GDP price deflator did not correspond well with an input-based R&D price deflator for Canada. Cameron (1996) found a similar result for the United Kingdom.
Moreover, given the lack of strong conceptual roots in using GDP as a measure of R&D price, internationally consistent R&D deflators should be further examined in the time domain. Finally, we consider it vital that research be continued in this area. Our study is the first to examine R&D PPPs at the industry level and the only study that has been able to take advantage of the recently developed measures of comparable prices at the output level from the University of Groningen’s ICOP program. Further improvements in price measurement and ongoing harmonization of R&D statistics and survey instruments could facilitate the construction of future comparisons and render them more reliable. Rapid growth of global R&D activities makes it vital that accurate comparisons be made of R&D, regardless of where it is performed.
References
Bernstein, J. 1986. Price indexes for Canadian industrial research and development expenditures. Montreal, Canada: Statistics Canada.
Brunner, E. D. 1967. The cost of basic scientific research in Europe: Department of Defense experience, 1955–1966. Report for United States Air Force Project. Santa Monica, CA: RAND.
Cameron, G. 1996. On the measurement of real R&D: Divisia price indexes for U.K. business enterprise R&D. Research Evaluation 6 (3): 215–19.
Conference Board, The. 1976. Overseas research and development by United States multinationals, 1965–1975: Estimates of expenditures and a statistical profile. New York: The Conference Board.
Danzon, P. M., and L. W. Chao. 2000. Cross-national price differences for pharmaceuticals: How large, and why? Journal of Health Economics 19:159–95.
DiMasi, J. A., R. W. Hansen, and H. G. Grabowski. 2003. The price of innovation: New estimates of drug development costs. Journal of Health Economics 22:151–85.
European Commission. 2002. Presidency conclusions: Barcelona European Council. http://www.europa.eu/european_council/conclusions/index_en.htm.
Fraumeni, B., and S. Okubo. 2005. R&D in the National Income and Product Accounts: A first look at its effect on GDP. In Measuring capital in the new economy, ed. C. Corrado, J. Haltiwanger, and D. Sichel, 275–316. Studies in Income and Wealth, vol. 65. Chicago: University of Chicago Press.
Freeman, C., and A. Young. 1965. The research and development effort in Western Europe, North America and the Soviet Union: An experimental international comparison of research expenditures and manpower in 1962. Paris: Organization for Economic Cooperation and Development.
Gersbach, H., and B. van Ark. 1994. Micro foundations for international productivity comparisons. Research Memorandum no. GD-11. Groningen, The Netherlands: Groningen Growth and Development Centre.
Griliches, Z. 1984. R&D, patents, and productivity. Chicago: University of Chicago Press.
———. 1994. Productivity, R&D and the data constraint. American Economic Review Papers and Proceedings 84 (1): 1–23.
Hill, R. 2004. Constructing price indexes across space and time: The case of the European Union. The American Economic Review 94 (5): 1379–1410.
Hulten, C. R. 2001. Total factor productivity: A short biography. In New developments in productivity analysis, ed. C. R. Hulten, E. R. Dean, and M. J. Harper, 1–47. Studies in Income and Wealth, vol. 63. Chicago: University of Chicago Press.
Inklaar, R., H. X. Wu, and B. van Ark. 2003. Losing ground: Japanese labour productivity and unit labour cost in manufacturing in comparison to the U.S. Groningen Growth and Development Centre Research Memorandum no. 64. http://www.ggdc.net/pub/gd64.shtml.
Jankowski, J. E. 1993. Do we need a price index for industrial R&D? Research Policy 22:195–205.
Jaffe, S. A. 1972. A price index for deflation of academic R&D expenditures. National Science Foundation Report no. NSF 72-310. Washington, DC: Government Printing Office.
Jorgenson, D., and M. Kuroda. 1992. Productivity and international competitiveness in Japan and the U.S., 1960–1985. Economic Studies Quarterly 43 (2): 313–25.
Kiba, T., I. Sakuma, and J. Kikuchi. 1994. Development of R&D purchasing power parities. NISTEP Report no. 31. Tokyo: National Institute of Science and Technology Policy, March. http://www.nistep.go.jp/achiev/ftx/eng/rep031e/text/rep031e.txt.
Konijn, P., D. Moch, and J. Dalén. 2003. Comparison of hedonic functions for PCs across E.U. countries. Eurostat Discussion Paper. http://www.ifcommittee.org/konijn.pdf.
Kravis, I., A. Heston, and R. Summers. 1982. World product and income: International comparisons of real gross product. Baltimore: Johns Hopkins University Press.
Lebow, D. E., and J. B. Rudd. 2003. Measurement error in the Consumer Price Index: Where do we stand? Journal of Economic Literature 41 (1): 159–201.
MacDonald, A. S. 1973. Exchange rates for national expenditure on research and development. The Economic Journal 83 (330): 476–94.
Mansfield, E. 1987. Price indexes for R&D inputs, 1969–1983. Management Science 33 (1): 124–29.
———. 1988. The speed and cost of industrial innovation in Japan and the U.S.: External vs. internal technology. Management Science 34 (10): 1157–68.
McGuckin, R. H., R. C. Inklaar, B. van Ark, and S. M. Dougherty. 2004(a). The structure of business R&D: Recent trends and measurement implications. Conference Board Working Paper. http://www.conference-board.org/economics/workingpapers.cfm.
McGuckin, R. H., B. van Ark, S. M. Dougherty, and R. C. Inklaar. 2004(b). Appendices to the report on internationally comparable science, technology, and competitiveness indicators (revised). National Science Foundation Report no. SRS00-99594. http://www.conference-board.org/economics/workingpapers.cfm.
McKinsey Global Institute. 1993. Manufacturing productivity. Washington, DC: McKinsey & Company.
Mulligen, P. H. van. 2003. Quality aspects in price indices and international comparisons: Applications of the hedonic method. Voorburg, The Netherlands: Statistics Netherlands.
National Science Foundation. 1998. The science and technology resources of Japan: A comparison with the United States. NSF Special Report no. 96-324. Washington, DC: NSF. http://www.nsf.gov/sbe/srs/nsf97324/start.htm.
———. 2002. Science and engineering indicators. Washington, DC: NSF.
Organization for Economic Cooperation and Development. 1963. Frascati manual: Standard practice for surveys of research and experimental development. The Measurement of Scientific and Technological Activities Series. 1st ed. Paris: OECD.
———. 1979. Price indexes and R&D exchange rates. In Trends in industrial R&D in selected member countries: 1966–1975, 158–69. Paris: OECD.
———. 1981. Frascati manual: Standard practice for surveys of research and experimental development. The Measurement of Scientific and Technological Activities Series. 4th ed. Paris: OECD.
———. 1994. Frascati manual: Standard practice for surveys of research and experimental development. The Measurement of Scientific and Technological Activities Series. 5th ed. Paris: OECD.
———. 1997. Review of the OECD-Eurostat PPP program. Paris: OECD. http://www.oecd.org/std/ppp/ppps.htm.
———. 2002a. Frascati manual: Standard practice for surveys of research and experimental development. The Measurement of Scientific and Technological Activities Series. 6th ed. Paris: OECD.
———. 2002b. PPPs and real expenditures: 1999 benchmark year. Paris: OECD.
———. 2003. Research and development statistics. (2003 ed.). Paris: OECD.
———. 2004. STructural ANalysis (STAN) database. Paris: OECD.
O’Mahony, M., and B. van Ark, eds. 2003. E.U. productivity and competitiveness: An industry perspective: Can Europe resume the catching-up process? http://www.ggdc.net/pub/EU_Productivity_and_Competitiveness.shtml.
Summers, R., and A. Heston. 1991. The Penn World Table (Mark 5): An expanded set of international comparisons, 1950–1988. Quarterly Journal of Economics 106 (2): 327–68.
van Ark, B. 1993. International comparisons of output and productivity. Groningen Growth and Development Centre Monograph Series no. 1. http://www.ggdc.net/pub/arkbook.shtml.
van Ark, B., and M. P. Timmer. 2001. PPPs and international productivity comparisons: Bottlenecks and new directions. Paper presented at the World Bank-OECD seminar, Purchasing Power Parities: Recent Advances, Washington, DC. http://www.oecd.org/dataoecd/24/0/2424747.pdf.
IV
Information Technology and the Acceleration of Productivity Growth
11 Information Technology and the G7 Economies
Dale W. Jorgenson
11.1 Introduction
In this paper I present international comparisons of economic growth among the G7 nations: Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States. These comparisons focus on the impact of investment in information technology (IT) equipment and software over the period 1980 to 2001. In 1998, the G7 nations accounted for nearly 60 percent of world output1 and a much larger proportion of world investment in IT. Economic growth in the G7 has experienced a strong revival since 1995, driven by a powerful surge in IT investment. The resurgence of economic growth in the United States during the 1990s and the crucial role of IT investment has been thoroughly documented and widely discussed.2 Similar trends in the other G7 economies have been more difficult to detect, partly because of discrepancies among official price indexes for IT equipment and software identified by Andrew
Dale W. Jorgenson is the Samuel W. Morris University Professor at Harvard University. The Economic and Social Research Institute provided financial support from its program on international collaboration through the Nomura Research Institute. I am very grateful to Jon Samuels for excellent research assistance. Alessandra Colecchia, Mun S. Ho, Kazuyuki Motohashi, Koji Nomura, Kevin J. Stiroh, Marcel Timmer, and Bart van Ark provided valuable data. The Bureau of Economic Analysis and the Bureau of Labor Statistics assisted with data for the United States, and Statistics Canada contributed the data for Canada. I am grateful to all of them but retain sole responsibility for any remaining deficiencies. An earlier version of this paper was published under the same title in World Economics for December 2003.
1. See Angus Maddison (2001) for 1998 data for world GDP and the GDP of each of the G7 countries.
2. See Dale Jorgenson and Kevin Stiroh (2000), Stephen Oliner and Daniel Sichel (2000), and Jorgenson, Ho, and Stiroh (2005).
Wyckoff (1995).3 Paul Schreyer (2000) has constructed “internationally harmonized” IT prices that eliminate many of these discrepancies.4 Precise measures of computer prices that hold product quality constant were introduced into the U.S. national accounts in 1985 and the U.S. Producer Price Index (PPI) during the 1990s. The national accounts now rely on PPI data. Gregory Chow (1967) had pioneered the use of hedonic techniques for constructing a constant product quality index of computer prices. Hedonic techniques for modeling automobile prices were introduced by Court (1939) and revived by Zvi Griliches (1961). The measurement of IT investment reflects steady progress in meeting the challenges arising from hard-to-measure goods and services, but important gaps remain. In addition to the constant quality computer prices introduced in 1985, the U.S. national accounts incorporated constant quality prices for semiconductor products in 1996. Investment in software was first included in the national accounts in the eleventh comprehensive revision, released on October 27, 1999. Research is still underway on constant quality telecommunications equipment prices and constant quality prices for software.5 Using internationally harmonized prices for France, Germany, Italy, and the United Kingdom, I have analyzed the role of investment and productivity as sources of growth in the G7 countries over the period 1980–2001. I have subdivided the period at 1989 and 1995 in order to focus on the most recent experience. I have decomposed growth of output for each country between growth of input and productivity. Finally, I have allocated the growth of input between investments in tangible assets, especially IT equipment and software, and human capital.6 Growth in IT capital input per capita jumped to double-digit levels in the G7 nations after 1995.
3. See Wyckoff (1995).
4. See Schreyer (2000). Colecchia and Schreyer (2002) have employed these internationally harmonized prices in measuring the impact of IT investment.
5. Further details are provided by Jorgenson, Ho, and Stiroh (2005).
6. Methods for productivity measurement are presented in detail by Jorgenson, Ho, and Stiroh (2005). A more concise exposition is provided by Jorgenson (2005).
7. See Jorgenson (2001).
This can be traced to acceleration in the rate of decline of IT prices, analyzed in my presidential address to the American Economic Association.7 The powerful surge in investment was most pronounced in Canada, but capital input growth in Japan, the United States, and the United Kingdom was only slightly lower. France, Germany, and Italy also experienced double-digit growth, but lagged considerably behind the leaders. During the 1980s, productivity played a minor role as a source of growth for the G7 countries except Japan, where productivity accounted for 30 percent of economic growth. Productivity accounted for only 15 percent of 3. See Wyckoff (1995). 4. See Schreyer (2000). Colecchia and Schreyer (2002) have employed these internationally harmonized prices in measuring the impact of IT investment. 5. Further details are provided by Jorgenson, Ho, and Stiroh (2005). 6. Methods for productivity measurement are presented in detail by Jorgenson, Ho, and Stiroh (2005). A more concise exposition is provided by Jorgenson (2005). 7. See Jorgenson (2001).
Information Technology and the G7 Economies
327
growth in the United States, 13 percent in France and the United Kingdom, and 12 percent in Germany; only 2 percent of growth in Canada was due to productivity, while the decline of productivity retarded growth by 14 percent in Italy. Between 1989 and 1995, productivity growth declined further in the G7 nations, except for Italy and Germany. Productivity declined for France and the United Kingdom but remained positive for the United States, Canada, and Japan. Productivity growth revived in all the G7 countries after 1995, again with the exception of Germany and Italy. The resurgence was most dramatic in Canada, the United Kingdom, and France, partly offsetting years of dismal productivity growth. Japan exhibited the highest growth in output per capita among the G7 nations from 1980 to 1995. Japan’s level of output per capita rose from the lowest in the G7 to the middle of the group in 2001. Although this advance owed more to input per capita than productivity, Japan’s productivity growth far outstripped the other members of the G7. Nonetheless, Japan’s productivity remained the lowest among the G7 nations. The United States led the G7 in output per capita for the period 1989 to 2001. Canada’s edge in output per capita in 1980 had disappeared by 1989. The United States led the G7 countries in input per capita during 1980 to 2001, but U.S. productivity languished below the levels of Canada, France, and Italy. In section 11.2, I outline the methodology for this study, based on my presidential address. I have revised and updated the U.S. data presented there through 2001. Comparable data on investment in IT have been constructed for Canada by Statistics Canada.8 Data on IT for France, Germany, Italy, and the United Kingdom have been developed for the European Commission by Bart van Ark et al. 
(2002).9 Finally, data for Japan have been assembled by myself and Kazuyuki Motohashi for the Research Institute on Economy, Trade, and Industry.10 I have linked these data by means of the Organization for Economic Cooperation and Development’s (OECD’s) purchasing power parities for 1999.11 In section 11.3, I consider the impact of IT investment and the relative importance of investment and productivity in accounting for economic growth among the G7 nations. Investments in human capital and tangible assets, especially IT equipment and software, account for the overwhelming proportion of growth. Differences in the composition of capital and labor inputs are essential for identifying persistent international differences in output and accounting for the impact of IT investment. 8. See John Baldwin and Tarek Harchaoui (2002). 9. See van Ark et al. (2002). 10. See Jorgenson and Motohashi (2005). 11. See OECD (2002). Current data on purchasing power parities are available from the OECD Web site: http://www.sourceoecd.org.
In section 11.4, I consider alternative approaches to international comparisons. The great revival of interest in economic growth among economists dates from Maddison’s (1982) updating and extension of Simon Kuznets’s (1971) long-term estimates of the growth of national product and population for fourteen industrialized countries, including the G7 nations. Maddison (1982, 1991) added Austria and Finland to Kuznets’ list and presented growth rates covering periods beginning as early as 1820 and extending through 1989. Maddison (1987, 1991) also generated growth accounts for major industrialized countries, but did not make level comparisons like those presented in section 11.2. As a consequence, productivity differences were omitted from the canonical formulation of “growth regressions” by William Baumol (1986). This proved to be a fatal flaw in Baumol’s regression model, remedied by Nazrul Islam’s (1995) panel data model. Section 11.5 concludes the paper.
11.2 Investment and Productivity
My papers with Laurits Christensen and Dianne Cummings (Christensen, Cummings, and Jorgenson 1980, 1981) developed growth accounts for the United States and its major trading partners (Canada, France, Germany, Italy, Japan, Korea, the Netherlands, and the United Kingdom) for 1947 to 1973. We employed gross national product (GNP) as a measure of output and incorporated constant quality indexes of capital and labor input for each country. Our 1981 paper compared levels of output, inputs, and productivity for all nine nations.
I have updated the estimates for the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States) through 1995 in earlier work. The updated estimates are presented in my papers with Chrys Dougherty (Dougherty and Jorgenson 1996, 1997) and Eric Yip (Jorgenson and Yip 2000). We have shown that productivity accounted for only 11 percent of economic growth in Canada and the United States over the period 1960 to 1995.
My paper with Yip (Jorgenson and Yip 2000) attributed 47 percent of Japanese economic growth during the period 1960–1995 to productivity growth. The proportion attributable to productivity approximated 40 percent of growth for the four European countries: France (.38), Germany (.42), Italy (.43), and the United Kingdom (.36). Input growth predominated over productivity growth for all the G7 nations.
I have now incorporated new data on investment in IT equipment and software for the G7. I have also employed internationally harmonized prices like those constructed by Schreyer (2000). As a consequence, I have been able to separate the contribution of capital input to economic growth into IT and non-IT components. While IT investment follows similar patterns in all the G7 nations, non-IT investment varies considerably and helps to explain important differences in growth rates among the G7.
11.2.1 Comparisons of Output, Input, and Productivity
My first objective is to extend my estimates for the G7 nations with Christensen, Cummings, Dougherty, and Yip to the year 2001. Following the methodology originally proposed by Jorgenson and Griliches (1967), I have chosen gross domestic product (GDP) as a measure of output. I have included imputations for the services of consumers’ durables as well as land, buildings, and equipment owned by nonprofit institutions. I have also distinguished between investments in IT equipment and software and investments in other forms of tangible assets.
A constant quality index of capital input is based on weights that reflect differences in capital consumption, tax treatment, and the rate of decline of asset prices. I have derived estimates of capital input and property income from national accounting data. Similarly, a constant quality index of labor input is based on weights by age, sex, educational attainment, and employment status. I have constructed estimates of hours worked and labor compensation from labor force surveys for each country.
In table 11.1, I present output per capita for the G7 nations from 1980 to 2001, taking the United States as 100.0 in 2000. Output and population are given separately in tables 11.2 and 11.3. I use 1999 purchasing power parities from the OECD to convert output from domestic prices for each country into U.S. dollars. The United States maintained its lead among the G7
Table 11.1
Levels of output and input per capita and total factor productivity
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Output per capita
1980                 63.9     67.6             45.0     45.9      49.3     45.9     43.6
1989                 79.7     78.8             56.5     54.1      58.6     57.3     58.4
1995                 85.6     79.6             61.4     57.0      65.0     62.1     65.4
2001                100.3     91.9             71.3     64.0      69.2     68.8     70.4
Input per capita
1980                 70.5     64.2             50.2     46.5      61.0     43.1     61.9
1989                 83.9     74.4             61.2     53.3      71.1     55.5     74.8
1995                 88.8     75.2             67.0     57.0      73.7     58.8     78.8
2001                100.8     83.7             73.6     61.7      79.0     67.2     81.1
Total factor productivity
1980                 90.6    105.4             89.5     98.6      80.8    106.6     70.4
1989                 94.9    105.9             92.3    101.5      82.4    103.2     78.0
1995                 96.4    105.9             91.7     99.9      88.1    105.6     83.0
2001                 99.5    109.7             96.9    103.6      87.6    102.5     86.8
Notes: U.S. = 100.0 in 2000. Canada data begins in 1981.
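The three panels of table 11.1 are tied together by the definition of productivity as the ratio of output to input. The following minimal Python sketch recomputes the productivity panel from the other two panels for three of the seven countries. The function names and the rounding to one decimal are illustrative assumptions, not part of the original study, and small gaps against the published figures are to be expected because the table entries are themselves rounded.

```python
# Levels copied from table 11.1 (U.S. = 100.0 in 2000) for 1980, 1989, 1995, 2001.
OUTPUT_PER_CAPITA = {
    "United States": [63.9, 79.7, 85.6, 100.3],
    "Italy": [45.9, 57.3, 62.1, 68.8],
    "Japan": [43.6, 58.4, 65.4, 70.4],
}
INPUT_PER_CAPITA = {
    "United States": [70.5, 83.9, 88.8, 100.8],
    "Italy": [43.1, 55.5, 58.8, 67.2],
    "Japan": [61.9, 74.8, 78.8, 81.1],
}
REPORTED_TFP = {
    "United States": [90.6, 94.9, 96.4, 99.5],
    "Italy": [106.6, 103.2, 105.6, 102.5],
    "Japan": [70.4, 78.0, 83.0, 86.8],
}


def tfp_level(output, inputs):
    """Productivity level as the ratio of output to input, scaled to 100."""
    return [round(100.0 * o / i, 1) for o, i in zip(output, inputs)]


def max_gap():
    """Largest absolute gap between recomputed and reported productivity levels."""
    gaps = []
    for country, reported in REPORTED_TFP.items():
        computed = tfp_level(OUTPUT_PER_CAPITA[country], INPUT_PER_CAPITA[country])
        gaps.extend(abs(c - r) for c, r in zip(computed, reported))
    return max(gaps)


for name in REPORTED_TFP:
    print(name, tfp_level(OUTPUT_PER_CAPITA[name], INPUT_PER_CAPITA[name]))
print("largest gap versus table 11.1:", round(max_gap(), 2))
```

Because output, input, and productivity are all normalized to the United States in 2000, the ratio can be taken directly on the index values without any further rescaling.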
Table 11.2
Growth rate and level of output
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Growth rate (%)
1980–1989            3.38     3.10             2.69     2.38      1.99     2.51     3.83
1989–1995            2.43     1.39             1.62     1.30      2.34     1.52     2.23
1995–2001            3.76     3.34             2.74     2.34      1.18     1.90     1.45
Level (billions of 2000 U.S. dollars)
1980              5,361.2    618.4            934.0    932.0   1,421.7    955.7  1,875.9
1989              7,264.2    792.6          1,190.3  1,154.3   1,700.2  1,197.4  2,648.7
1995              8,403.3    861.4          1,311.8  1,247.8   1,956.3  1,311.5  3,017.1
2001             10,530.4  1,052.3          1,545.9  1,436.0   2,099.8  1,470.1  3,301.3
Level (U.S. = 100.0 in 2000)
1980                 51.6      5.9              9.0      9.0      13.7      9.2     18.0
1989                 69.9      7.6             11.4     11.1      16.3     11.5     25.5
1995                 80.8      8.3             12.6     12.0      18.8     12.6     29.1
2001                101.3     10.1             14.9     13.8      20.2     14.1     31.7
Note: Canada data begins in 1981.
Table 11.3
Growth rate and level in population
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Growth rate (%)
1980–1989            0.92     1.18             0.16     0.54      0.05     0.05     0.59
1989–1995            1.23     1.22             0.24     0.45      0.62     0.18     0.33
1995–2001            1.12     0.95             0.24     0.41      0.14     0.18     0.22
Level (millions)
1980                227.7     24.8             56.3     55.1      78.3     56.4    116.8
1989                247.4     27.3             57.1     57.9      78.7     56.7    123.1
1995                266.3     29.4             58.0     59.4      81.7     57.3    125.6
2001                284.8     31.1             58.8     60.9      82.3     57.9    127.2
Level (U.S. = 100.0 in 2000)
1980                 80.7      8.8             20.0     19.5      27.8     20.0     41.4
1989                 87.7      9.7             20.3     20.5      27.9     20.1     43.6
1995                 94.4     10.4             20.5     21.1      28.9     20.3     44.5
2001                101.0     11.0             20.8     21.6      29.2     20.5     45.1
Note: Canada data begins in 1981.
countries in output per capita after 1989. Canada led the United States in 1980, but fell behind during the 1980s. The U.S.-Canada gap widened considerably during the 1990s. The four major European nations—the United Kingdom, France, Germany, and Italy—had very similar levels of output per capita throughout the period 1980–2001. Japan rose from last place in 1980 to fourth among
Table 11.4
Growth in output and input per capita and total factor productivity (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Output per capita
1980–1989            2.46     1.92             2.54     1.84      1.93     2.46     3.25
1989–1995            1.20     0.17             1.38     0.85      1.72     1.33     1.90
1995–2001            2.64     2.38             2.50     1.93      1.04     1.72     1.23
Input per capita
1980–1989            1.94     1.86             2.20     1.52      1.71     2.82     2.10
1989–1995            0.94     0.17             1.49     1.11      0.60     0.96     0.86
1995–2001            2.10     1.80             1.59     1.33      1.14     2.21     0.48
Total factor productivity
1980–1989            0.52     0.06             0.34     0.32      0.23    –0.36     1.15
1989–1995            0.26     0.00            –0.11    –0.26      1.12     0.37     1.04
1995–2001            0.54     0.58             0.91     0.60     –0.10    –0.49     0.75
Note: Canada data begins in 1981.
the G7 in 2001, lagging considerably behind the United States and Canada, but only slightly behind the United Kingdom. Japan led the G7 in the growth of output per capita from 1980 to 1995, but fell behind the United States, Canada, the United Kingdom, France, and Italy after 1995.
In table 11.1, I present input per capita for the G7 over the period 1980 to 2001, taking the United States as 100.0 in 2000. I express input per capita in U.S. dollars, using purchasing power parities constructed for this study.12 The United States was the leader among the G7 in input per capita throughout the period. In 2001, Canada ranked next to the United States, with Japan third and Germany fourth. France and Italy started at the bottom of the ranking and remained there throughout the period.
In table 11.1, I also present productivity levels for the G7 over the period 1980 to 2001. Productivity is defined as the ratio of output to input, including both capital and labor inputs. Canada was the productivity leader during the period 1989 to 2001, with France and Italy close behind, despite the drop in productivity in Italy. Japan made the most substantial gains in productivity, while there were more modest increases in the United States, Canada, the United Kingdom, France, and Germany.
I summarize growth in output and input per capita and productivity for the G7 nations in table 11.4. I present growth rates of output and population for the period 1980 to 2001 in tables 11.2 and 11.3. Output growth slowed in the G7 after 1989 but revived for all nations except Japan and Germany after 1995. Output per capita followed a similar pattern, with Canada barely expanding during the period 1989 to 1995. Japan led in growth of output per capita through 1995, but fell to the lower echelon of the G7 after 1995. Japan led in productivity growth during 1980 to 1989, Germany led from 1989 to 1995, and the United Kingdom led from 1995 to 2001. For all countries and all time periods, except for Germany during the period 1989 to 1995 and Japan after 1989, the growth of input per capita exceeded growth of productivity by a substantial margin. Productivity growth in the G7 slowed during the period 1989 to 1995, except for Germany and Italy, where productivity slumped after 1995. Italy led the G7 in growth of input per capita for the periods 1980 to 1989 and 1995 to 2001, but relinquished leadership to the United Kingdom for the period 1989 to 1995. Differences among input growth rates are smaller than differences among output growth rates, but there was a slowdown in input growth during 1989 to 1995 throughout the G7. After 1995 growth of input per capita increased in every G7 nation except Japan.
12. The purchasing power parities for outputs are based on OECD (2002). Purchasing power parities for inputs follow the methodology described in detail by Jorgenson and Yip (2001).
11.2.2 Comparisons of Capital and Labor Quality
A constant quality index of capital input weights capital inputs by property compensation per unit of capital. By contrast, an index of capital stock weights different types of capital by asset prices. The ratio of capital input to capital stock measures the average quality of a unit of capital. This represents the difference between the constant quality index of capital input introduced by Jorgenson and Griliches (1967) and the index of capital stock employed, for example, by Kuznets (1971) and Robert Solow (1970).
In table 11.5, I present capital input per capita for the G7 countries over the period 1980 to 2001 relative to the United States in 2000. The United States was the leader in capital input per capita throughout the period, while Japan was the laggard.
Canada led the remaining six countries in 1980, but was overtaken by Germany and Italy in 1995. Italy led the rest of the G7 through 2001, but lagged considerably behind the United States. The picture for capital stock per capita has some similarities to capital input, but there are important differences. Capital stock levels do not accurately reflect the substitutions among capital inputs that accompany investments in tangible assets, especially investments in IT equipment and software. Japan led the G7 in capital stock per capita throughout the period 1980 to 2001. The United Kingdom lagged the remaining countries of the G7 throughout the period. The behavior of capital quality highlights the differences between the constant quality index of capital input and capital stock. There are important changes in capital quality over time and persistent differences among countries so that heterogeneity in capital input must be taken into account in international comparisons of economic performance. Canada was the
Table 11.5
Levels of capital input and capital stock per capita and capital quality
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Capital input per capita
1980                 57.7     56.0             25.8     36.3      44.6     35.6     32.8
1989                 73.7     67.1             37.9     48.3      62.1     62.4     43.5
1995                 81.6     68.3             50.0     52.7      72.3     73.1     50.7
2001                103.9     78.0             56.1     58.1      83.5     89.4     58.3
Capital stock per capita
1980                 76.8     40.7             24.1     36.2      60.2     36.0     93.1
1989                 88.4     48.5             31.2     42.4      67.9     52.4    104.1
1995                 92.2     50.8             35.9     47.0      77.0     62.3    114.8
2001                101.7     55.1             44.5     52.0      85.5     72.3    122.2
Capital quality
1980                 75.1    137.5            107.0    100.1      74.0     98.8     35.2
1989                 83.4    138.2            121.7    114.0      91.5    119.1     41.8
1995                 88.5    134.6            139.3    112.2      94.0    117.4     44.2
2001                102.2    141.5            126.1    111.9      97.7    123.6     47.7
Notes: U.S. = 100.0 in 2000. Canada data begins in 1981.
Table 11.6
Growth in capital input and capital stock per capita and capital quality (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Capital input per capita
1980–1989            2.72     2.26             4.28     3.19      3.70     6.25     3.16
1989–1995            1.70     0.31             4.61     1.46      2.53     2.63     2.55
1995–2001            4.03     2.20             1.92     1.63      2.40     3.35     2.31
Capital stock per capita
1980–1989            1.56     2.19             2.85     1.74      1.34     4.18     1.25
1989–1995            0.70     1.05             2.36     1.74      2.09     2.87     1.63
1995–2001            1.63     1.36             3.57     1.67      1.75     2.49     1.04
Capital quality
1980–1989            1.17     0.07             1.43     1.45      2.36     2.07     1.91
1989–1995            0.99    –0.74             2.25    –0.27      0.44    –0.24     0.92
1995–2001            2.40     0.84            –1.65    –0.04      0.65     0.86     1.26
Note: Canada data begins in 1981.
international leader in capital quality in 1980 and 2001, relinquishing the lead to the United Kingdom in 1995, while Japan ranked at the bottom of the G7 throughout the period. I summarize growth in capital input and capital stock per capita as well as capital quality for the G7 nations in table 11.6. Italy was the international leader in capital input growth from 1980 to 1989, while Canada
was the laggard. The United Kingdom led from 1989 to 1995, while Canada lagged considerably behind the rest of the G7. The United States took the lead after 1995. There was a slowdown in capital input growth throughout the G7 after 1989, except for the United Kingdom, and a revival after 1995 in the United States, Canada, France, and Italy. A constant quality index of labor input weights hours worked for different categories by labor compensation per hour. An index of hours worked fails to take quality differences into account. The ratio of labor input to hours worked measures the average quality of an hour of labor, as reflected in its marginal product. This represents the difference between the constant quality index of labor input used by Jorgenson and Griliches (1967) and the index of hours worked employed, for example, by Kuznets (1971) and Solow (1970). In table 11.7, I present labor input per capita for the G7 nations for the period 1980 to 2001 relative to the United States in 2000. Japan was the international leader throughout the period 1980 to 2001. Labor input in Japan was nearly double that in Italy. The United States led the remaining G7 nations. The United Kingdom ranked third among the G7 through 1995, but fell slightly behind Canada in 2001. Italy and France lagged behind the rest of the G7 for the entire period. The picture for hours worked per capita has some similarities to labor input, but there are important differences. Japan was the international leader
Table 11.7
Levels of labor input and hours worked per capita and labor quality
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Labor input per capita
1980                 81.1     73.0             78.9     63.0      75.4     48.8     94.8
1989                 91.9     82.1             85.4     59.4      78.7     51.0    107.5
1995                 94.2     82.3             82.4     61.7      75.2     50.6    105.5
2001                 98.8     89.3             89.2     65.3      75.9     55.1    100.9
Hours worked per capita
1980                 89.7     91.4             92.0     79.3      82.3     71.4    111.9
1989                 97.1     96.6             97.7     71.2      82.7     72.1    115.6
1995                 95.9     90.9             89.8     67.6      76.4     68.9    109.9
2001                 98.3     96.3             94.2     69.7      75.3     72.3    101.1
Labor quality
1980                 90.4     79.9             85.7     79.5      91.6     68.3     84.7
1989                 94.7     85.0             87.4     83.5      95.2     70.7     93.0
1995                 98.2     90.6             91.7     91.2      98.4     73.5     96.0
2001                100.5     92.7             94.7     93.7     100.9     76.1     99.9
Notes: U.S. = 100.0 in 2000. Canada data begins in 1981.
in hours worked per capita. The United States, Canada, and the United Kingdom moved roughly in parallel. The United Kingdom ranked second in 1980 and 1989, while the United States ranked second in 1995 and 2001. France and Italy lagged the rest of the G7 from 1980 to 2001. The behavior of labor quality highlights the differences between labor input and hours worked. Germany was the leader in labor quality throughout the period 1980 to 2001. The United States ranked second in labor quality, but Canada, France, the United Kingdom, and Japan approached U.S. levels in 2001. Labor quality levels in these four countries moved in parallel throughout the period. Italy was the laggard among the G7 in labor quality as well as hours worked. I summarize growth in labor input and hours worked per capita as well as labor quality for the period 1980 to 2001 in table 11.8. Canada and Japan led the G7 nations in labor input growth during the 1980s; France led from 1989 to 1995 but relinquished its leadership to Italy after 1995. Labor input growth was negative for France during the 1980s, for the United Kingdom, Germany, Italy, and Japan during the period 1989 to 1995, and for Japan after 1995. Hours worked per capita fell continuously through the 1989 to 2001 period for Japan and declined for all the G7 nations during the period 1989 to 1995. Growth in labor quality was positive for the G7 nations in all time periods. Japan was the leader during the 1980s, relinquishing its lead to France during the early 1990s, but regaining its lead in the 1995 to 2001 period. Growth in labor quality and growth in hours worked are equally important as sources of growth in labor input for the G7.
Table 11.8
Growth in labor input and hours worked per capita and labor quality (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Labor input per capita
1980–1989            1.38     1.47             0.88    –0.65      0.48     0.49     1.40
1989–1995            0.41     0.04            –0.59     0.61     –0.78    –0.13    –0.32
1995–2001            0.79     1.35             1.32     0.95      0.17     1.40    –0.73
Hours worked per capita
1980–1989            0.87     0.69             0.67    –1.20      0.06     0.10     0.36
1989–1995           –0.21    –1.02            –1.41    –0.86     –1.33    –0.75    –0.84
1995–2001            0.41     0.98             0.79     0.50     –0.25     0.81    –1.39
Labor quality
1980–1989            0.51     0.78             0.21     0.55      0.42     0.39     1.04
1989–1995            0.61     1.06             0.81     1.47      0.55     0.63     0.52
1995–2001            0.38     0.38             0.53     0.45      0.41     0.60     0.66
Note: Canada data begins in 1981.
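Because labor quality is defined as the ratio of labor input to hours worked, growth in labor input should split into growth in hours worked plus growth in labor quality whenever growth rates are measured as log differences. The Python sketch below checks that additive split against the figures in table 11.8. Treating the published rates as log differences is an assumption made here for illustration, and the names used in the code are hypothetical.

```python
# Growth rates (%) copied from table 11.8 for 1980-1989, 1989-1995, 1995-2001.
LABOR_INPUT = {
    "United States": [1.38, 0.41, 0.79],
    "Canada": [1.47, 0.04, 1.35],
    "United Kingdom": [0.88, -0.59, 1.32],
    "France": [-0.65, 0.61, 0.95],
    "Germany": [0.48, -0.78, 0.17],
    "Italy": [0.49, -0.13, 1.40],
    "Japan": [1.40, -0.32, -0.73],
}
HOURS_WORKED = {
    "United States": [0.87, -0.21, 0.41],
    "Canada": [0.69, -1.02, 0.98],
    "United Kingdom": [0.67, -1.41, 0.79],
    "France": [-1.20, -0.86, 0.50],
    "Germany": [0.06, -1.33, -0.25],
    "Italy": [0.10, -0.75, 0.81],
    "Japan": [0.36, -0.84, -1.39],
}
LABOR_QUALITY = {
    "United States": [0.51, 0.61, 0.38],
    "Canada": [0.78, 1.06, 0.38],
    "United Kingdom": [0.21, 0.81, 0.53],
    "France": [0.55, 1.47, 0.45],
    "Germany": [0.42, 0.55, 0.41],
    "Italy": [0.39, 0.63, 0.60],
    "Japan": [1.04, 0.52, 0.66],
}


def residual(labor_input, hours, quality):
    """Part of labor input growth not explained by hours plus quality."""
    return labor_input - (hours + quality)


def max_residual():
    """Largest absolute residual across all countries and periods."""
    worst = 0.0
    for country, rates in LABOR_INPUT.items():
        for g, h, q in zip(rates, HOURS_WORKED[country], LABOR_QUALITY[country]):
            worst = max(worst, abs(residual(g, h, q)))
    return worst


print("largest residual (percentage points):", round(max_residual(), 3))
```

Under this assumption the residuals stay within rounding error of the two-decimal table entries, so the quality column can be read as the gap between labor input growth and hours growth.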
11.3 Investment in IT
Using data from tables 11.1 and 11.2, I can assess the relative importance of investment and productivity as sources of economic growth for the G7 nations. Investments in tangible assets and human capital greatly predominated over productivity during the period 1980 to 2001. While productivity fell in Italy during this period, the remaining G7 countries had positive productivity growth.
Similarly, using data from table 11.5, I can assess the relative importance of growth in capital stock and capital quality. Capital input growth was positive for all countries for the period 1980 to 2001 and all three subperiods. Capital quality growth was positive for the period as a whole for all G7 countries. Although capital stock predominated in capital input growth, capital quality was also quantitatively significant, especially after 1995.
Finally, using data from table 11.7, I can assess the relative importance of growth in hours worked and labor quality. Hours worked per capita declined for France, Germany, and Japan, while labor quality rose in these nations during the period 1980 to 2001. For the United States, Canada, the United Kingdom, and Italy, both hours worked per capita and labor quality rose. I conclude that labor quality growth is essential to the analysis of growth in labor input.
11.3.1 Investment in IT Equipment and Software
The final step in the comparison of patterns of economic growth among the G7 nations is to analyze the impact of investment in IT equipment and software. In table 11.9, I present levels of IT capital input per capita for the G7 for the period 1980 to 2001, relative to the United States in 2000. The United States overtook Germany in 1989 and remained the leader through 2001. Canada lagged behind the rest of the G7 through 1995, but France fell into last place in 2001. Table 11.9 reveals substantial differences between IT capital stock and IT capital input.
The G7 nations began with very modest stocks of IT equipment and software per capita in 1980. These stocks expanded rapidly during the period 1980 to 2001. The United States led in IT capital stock throughout the period, while Japan moved from the fourth highest level in 1980 to the third highest in 2001. IT capital quality reflects differences in the composition of IT capital input, relative to IT capital stock. A rising level of capital quality indicates a shift toward short-lived assets, such as computers and software. This shift is particularly dramatic for the United States, Canada, and Japan, while the composition of IT capital stock changed relatively less for the United Kingdom, France, Germany, and Italy. Patterns for non-IT capital input, capital stock, and capital quality, presented in table 11.10, largely reflect those for capital as a whole.
Table 11.9
Levels of IT capital input and IT capital stock per capita and IT capital quality
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
IT capital input per capita
1980                  4.5      1.0              3.0      4.2       7.1      6.7      1.7
1989                 19.3      3.9             10.9     11.9      18.7     18.8     10.3
1995                 38.1     11.2             20.9     19.1      31.1     31.2     19.0
2001                115.3     45.6             53.6     38.1      59.7     60.3     46.0
IT capital stock per capita
1980                  9.8      0.8              2.5      3.5       6.1      4.6      3.5
1989                 27.4      3.7              9.6      9.9      15.5     13.1     12.7
1995                 46.8      9.7             19.2     18.0      28.2     23.8     22.9
2001                110.7     31.8             44.9     33.4      49.7     44.1     47.8
IT capital quality
1980                 46.4    118.4            118.5    117.5     117.4    146.8     47.8
1989                 70.4    107.4            112.7    119.7     120.4    143.2     81.1
1995                 81.3    115.0            108.9    106.2     110.1    131.0     83.0
2001                104.1    143.4            119.3    114.1     120.2    136.6     96.1
Notes: U.S. = 100.0 in 2000. Canada data begins in 1981.
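The text reads a rising IT capital quality level as a shift toward short-lived assets such as computers and software. One way to see which countries shifted most is to compare the first and last quality levels in table 11.9. The sketch below does this with a log change in percent, which is a choice made for this illustration rather than a measure used in the chapter; the function names are hypothetical.

```python
import math

# IT capital quality levels from table 11.9: (first observation, 2001).
# The first observation is 1980, except for Canada, whose data begin in 1981.
IT_CAPITAL_QUALITY = {
    "United States": (46.4, 104.1),
    "Canada": (118.4, 143.4),
    "United Kingdom": (118.5, 119.3),
    "France": (117.5, 114.1),
    "Germany": (117.4, 120.2),
    "Italy": (146.8, 136.6),
    "Japan": (47.8, 96.1),
}


def log_change(start, end):
    """Cumulative log change between two index levels, in percent."""
    return 100.0 * math.log(end / start)


def rank_by_quality_shift(levels):
    """Countries sorted from the largest to the smallest change in quality."""
    changes = {country: log_change(a, b) for country, (a, b) in levels.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_quality_shift(IT_CAPITAL_QUALITY)
for country, change in ranking:
    print(f"{country:15s} {change:7.1f}")
```

On these figures the three largest increases belong to the United States, Japan, and Canada, the same three countries the text singles out, while the four European countries show changes close to zero or negative.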
Table 11.10
Levels of non-IT capital input and capital stock per capita and non-IT capital quality
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Non-IT capital input per capita
1980                 73.8     73.1             30.7     41.3      51.9     41.6     39.3
1989                 87.0     83.1             43.4     53.9      70.3     71.3     47.9
1995                 90.7     79.9             55.9     57.9      79.7     81.2     53.9
2001                102.2     84.0             56.4     62.6      87.3     94.7     57.1
Non-IT capital stock per capita
1980                 82.5     44.1             25.7     38.0      63.4     38.2     99.1
1989                 92.5     51.5             32.6     44.0      70.6     54.8    110.0
1995                 94.8     53.0             36.9     48.3      79.3     64.4    120.6
2001                101.4     57.4             44.5     54.1      87.2     75.1    127.1
Non-IT capital quality
1980                 89.5    165.7            119.2    108.5      81.9    109.2     39.6
1989                 94.1    161.2            133.2    122.6      99.5    130.0     43.6
1995                 95.6    150.7            151.5    119.9     100.5    126.0     44.7
2001                100.8    146.5            126.7    115.8     100.1    126.1     44.9
Notes: U.S. = 100.0 in 2000. Canada data begins in 1981.
Table 11.11
Growth in IT capital input and capital stock per capita and IT capital quality (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
IT capital input per capita
1980–1989           16.09    17.66            14.43    11.66     10.71    11.44    20.19
1989–1995           11.35    17.42            10.91     7.92      8.47     8.44    10.22
1995–2001           18.47    23.42            15.69    11.55     10.87    10.98    14.71
IT capital stock per capita
1980–1989           11.47    18.88            14.98    11.46     10.43    11.72    14.32
1989–1995            8.94    16.28            11.50     9.91      9.97     9.94     9.84
1995–2001           14.34    19.73            14.16    10.35      9.40    10.28    12.25
IT capital quality
1980–1989            4.63    –1.22            –0.56     0.20      0.28    –0.27     5.88
1989–1995            2.41     1.14            –0.58    –1.99     –1.50    –1.49     0.38
1995–2001            4.12     3.69             1.53     1.20      1.47     0.70     2.46
Note: Canada data begins in 1981.
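The growth rates in table 11.11 can be approximately recovered from the levels in table 11.9. The sketch below assumes that the reported rates are average annual log differences in percent, which the chapter does not state explicitly at this point, so the formula should be read as an assumption. Gaps of up to about a tenth of a percentage point arise because the published levels are rounded to one decimal. Canada is left out because its series begins in 1981 rather than 1980.

```python
import math

YEARS = [1980, 1989, 1995, 2001]

# IT capital input per capita levels from table 11.9 (U.S. = 100.0 in 2000).
IT_CAPITAL_INPUT_LEVEL = {
    "United States": [4.5, 19.3, 38.1, 115.3],
    "United Kingdom": [3.0, 10.9, 20.9, 53.6],
    "Germany": [7.1, 18.7, 31.1, 59.7],
}

# Growth rates (%) of IT capital input per capita reported in table 11.11.
REPORTED_GROWTH = {
    "United States": [16.09, 11.35, 18.47],
    "United Kingdom": [14.43, 10.91, 15.69],
    "Germany": [10.71, 8.47, 10.87],
}


def average_annual_growth(levels, years):
    """Average annual log growth rate, in percent, between benchmark years."""
    rates = []
    for i in range(1, len(levels)):
        span = years[i] - years[i - 1]
        rates.append(100.0 * math.log(levels[i] / levels[i - 1]) / span)
    return rates


def largest_gap():
    """Largest absolute gap between recomputed and reported growth rates."""
    gaps = []
    for country, reported in REPORTED_GROWTH.items():
        computed = average_annual_growth(IT_CAPITAL_INPUT_LEVEL[country], YEARS)
        gaps.extend(abs(c - r) for c, r in zip(computed, reported))
    return max(gaps)


for country, levels in IT_CAPITAL_INPUT_LEVEL.items():
    rates = [round(r, 2) for r in average_annual_growth(levels, YEARS)]
    print(country, rates)
print("largest gap versus table 11.11:", round(largest_gap(), 2))
```

The gaps are widest in the first subperiod, where the 1980 levels are small and rounding to one decimal matters most.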
I give growth rates for IT capital input per capita, capital stock per capita, and capital quality in table 11.11. The G7 nations have exhibited double-digit growth in IT capital input per capita since 1995. Canada was the international leader during this period, with the United States close behind. Japan was the leader in growth of IT capital input during the 1980s, another period of double-digit growth in the G7. However, Japanese IT growth slowed markedly during 1989 to 1995, and Canada gained the lead. Patterns of growth for IT capital stock per capita are similar to those for IT capital input for the four European countries. Changes in the composition of IT capital stock per capita were important sources of growth of IT capital input per capita for the United States, Canada, and Japan. Information technology capital stock also followed the pattern of IT capital input with substantial growth during the 1980s, followed by a pronounced lull during the period 1989 to 1995. After 1995, the growth rates of IT capital stock surged in all the G7 countries, but exceeded the rates of the 1980s only for the United States and Canada. Finally, growth rates for IT capital quality reflect the rates at which shorter-lived IT assets are substituted for longer-lived assets. Japan led in the growth of capital quality during the 1980s, but relinquished its lead to the United States in 1989. Information technology capital quality growth for the United States, Canada, and Japan outstripped that for the four European countries for most of the period 1980 to 2001. Patterns of growth in non-IT capital input per capita, non-IT capital stock per capita, and non-IT capital quality given in table 11.12 largely reflect those for capital as a whole presented in table 11.6. Table 11.13 and figure 11.1 present the contribution of capital input to
Table 11.12
Growth in non-IT capital input and capital stock per capita and non-IT capital quality (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Non-IT capital input per capita
1980–1989            1.83     1.60             3.85     2.97      3.36     5.97     2.21
1989–1995            0.68    –0.66             4.22     1.20      2.09     2.17     1.95
1995–2001            2.00     0.85             0.15     1.30      1.52     2.57     0.96
Non-IT capital stock per capita
1980–1989            1.27     1.94             2.62     1.61      1.20     4.03     1.16
1989–1995            0.41     0.47             2.07     1.58      1.92     2.68     1.53
1995–2001            1.11     1.32             3.12     1.87      1.59     2.56     0.88
Non-IT capital quality
1980–1989            0.56    –0.35             1.23     1.36      2.16     1.94     1.05
1989–1995            0.27    –1.13             2.15    –0.38      0.17    –0.51     0.42
1995–2001            0.88    –0.47            –2.97    –0.57     –0.06     0.01     0.08
Note: Canada data begins in 1981.
Table 11.13
Contribution of total capital, IT capital and non-IT capital to output growth (%)
Year        United States   Canada   United Kingdom   France   Germany    Italy    Japan
Total capital
1980–1989            1.53     1.71             1.80     2.12      1.44     2.55     1.49
1989–1995            1.19     0.76             1.96     1.12      1.31     1.12     1.19
1995–2001            2.10     1.67             0.94     1.15      1.11     1.47     1.01
IT capital
1980–1989            0.45     0.39             0.24     0.18      0.19     0.24     0.44
1989–1995            0.49     0.49             0.27     0.19      0.26     0.26     0.32
1995–2001            0.99     0.86             0.76     0.42      0.46     0.49     0.58
Non-IT capital
1980–1989            1.08     1.32             1.56     1.94      1.25     2.31     1.05
1989–1995            0.70     0.27             1.69     0.93      1.05     0.86     0.87
1995–2001            1.11     0.81             0.18     0.73      0.65     0.98     0.43
Notes: Contribution is growth rate times value share. Canada data begins in 1981.
economic growth for the G7 nations, divided between IT and non-IT. The powerful surge of IT investment in the United States after 1995 is mirrored in similar jumps in growth rates of the contribution of IT capital throughout the G7. The contribution of IT capital input was similar during the 1980s and the period 1989 to 1995 for all the G7 nations, despite the dip in rates of economic growth after 1989. Japan is an exception to this general pattern with a contribution of IT capital comparable to that of the United
Fig. 11.1 Capital input contribution by country
States during the 1980s, followed by a decline in this contribution from 1989 to 1995, reflecting the sharp downturn in Japanese economic growth. The contribution of non-IT capital input to economic growth after 1995 exceeded that for IT capital input for four of the G7 nations; the exceptions were Canada, the United Kingdom, and Japan. The United States stands out in the magnitude of the contribution of capital input after 1995. Both IT and non-IT capital input contributed to the U.S. economic resurgence of the last half of the 1990s. Despite the strong performance of IT investment in Japan after 1995, the contribution of capital input declined substantially; the pattern for the United Kingdom is similar. 11.3.2 The Relative Importance of Investment and Productivity Table 11.14 and figure 11.2 present contributions to economic growth from productivity, divided between the IT-producing and non-ITproducing industries. The methodology for this division follows Triplett (1996). The contribution of IT-producing industries was positive throughout the period 1980 to 2001 and jumped substantially after 1995. Because the level of productivity in Italy was higher in 1980 than in 2001, it is not surprising that the contribution of productivity growth in the non-IT industries was negative throughout the period. Productivity in these industries declined during the period 1989 to 1995 in Canada and Germany as well as Italy. The decline affected Canada, the United Kingdom, France, and Italy from 1989 to 1995 and became very steep in Germany and Italy from 1995 to 2001.
Table 11.14  Contributions of productivity from IT and non-IT production to output growth (%)

Year        United States   Canada   United Kingdom   France   Germany   Italy   Japan

Productivity
1980–1989            0.52     0.06             0.34     0.32      0.23   –0.36    1.15
1989–1995            0.26     0.00            –0.11    –0.26      1.12    0.37    1.04
1995–2001            0.54     0.58             0.91     0.60     –0.10   –0.49    0.75

Productivity from IT production
1980–1989            0.23     0.14             0.23     0.29      0.28    0.32    0.15
1989–1995            0.23     0.14             0.32     0.29      0.43    0.38    0.20
1995–2001            0.48     0.17             0.82     0.56      0.65    0.68    0.46

Productivity from non-IT production
1980–1989            0.29    –0.08             0.11     0.03     –0.05   –0.68    1.00
1989–1995            0.03    –0.14            –0.43    –0.55      0.69   –0.01    0.84
1995–2001            0.06     0.41             0.09     0.04     –0.75   –1.17    0.29

Note: Canada data begins in 1981.

Fig. 11.2  Sources of total factor productivity growth by country
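The decomposition in table 11.14 can be checked mechanically: in each country and period, the productivity contribution should equal the sum of the IT-producing and non-IT-producing components, up to rounding. The following Python sketch transcribes the table's values and lists the countries whose non-IT contribution was negative in each period. The variable and function names are illustrative assumptions, not part of the original study, and the 0.011 tolerance is an assumed allowance for two-decimal rounding.

```python
# Contributions to output growth (percentage points), transcribed from table 11.14.
# Names below are hypothetical helpers introduced only for this illustration.
PERIODS = ("1980-1989", "1989-1995", "1995-2001")

IT_PRODUCTION = {
    "United States": (0.23, 0.23, 0.48),
    "Canada": (0.14, 0.14, 0.17),
    "United Kingdom": (0.23, 0.32, 0.82),
    "France": (0.29, 0.29, 0.56),
    "Germany": (0.28, 0.43, 0.65),
    "Italy": (0.32, 0.38, 0.68),
    "Japan": (0.15, 0.20, 0.46),
}
NON_IT_PRODUCTION = {
    "United States": (0.29, 0.03, 0.06),
    "Canada": (-0.08, -0.14, 0.41),
    "United Kingdom": (0.11, -0.43, 0.09),
    "France": (0.03, -0.55, 0.04),
    "Germany": (-0.05, 0.69, -0.75),
    "Italy": (-0.68, -0.01, -1.17),
    "Japan": (1.00, 0.84, 0.29),
}
TOTAL_PRODUCTIVITY = {
    "United States": (0.52, 0.26, 0.54),
    "Canada": (0.06, 0.00, 0.58),
    "United Kingdom": (0.34, -0.11, 0.91),
    "France": (0.32, -0.26, 0.60),
    "Germany": (0.23, 1.12, -0.10),
    "Italy": (-0.36, 0.37, -0.49),
    "Japan": (1.15, 1.04, 0.75),
}


def max_additivity_gap():
    """Largest absolute difference between total and (IT + non-IT)."""
    return max(
        abs(TOTAL_PRODUCTIVITY[c][i] - (IT_PRODUCTION[c][i] + NON_IT_PRODUCTION[c][i]))
        for c in TOTAL_PRODUCTIVITY
        for i in range(len(PERIODS))
    )


def negative_non_it(period_index):
    """Countries whose non-IT productivity contribution was negative."""
    return sorted(
        c for c, values in NON_IT_PRODUCTION.items() if values[period_index] < 0
    )


print("largest additivity gap:", round(max_additivity_gap(), 3))
for i, period in enumerate(PERIODS):
    print(period, negative_non_it(i))
```

The lists printed for each period can be compared directly with the countries named in the text of section 11.3.2.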
Table 11.15 and figure 11.3 give a comprehensive view of the sources of economic growth for the G7. The contribution of capital input alone exceeds that of productivity for most nations and most time periods. The contribution of non-IT capital input predominates over IT capital input for most countries and most time periods with Canada in 1989 to 1995 and the United Kingdom and Japan after 1995 as exceptions. This can be attributed to the unusual weakness in the growth of aggregate demand in these countries. The contribution of labor input varies considerably among the G7 nations with negative contributions after 1995 in Japan, during the 1980s in France, and during the period 1989 to 1995 in the United Kingdom and Germany.

Table 11.15  Sources of output growth (%)

Year        United States   Canada   United Kingdom   France   Germany   Italy   Japan

Output
1980–1989            3.38     3.10             2.69     2.38      1.99    2.51    3.83
1989–1995            2.43     1.39             1.62     1.30      2.34    1.52    2.23
1995–2001            3.76     3.34             2.74     2.34      1.18    1.90    1.45

Labor
1980–1989            1.33     1.33             0.56    –0.06      0.32    0.32    1.20
1989–1995            0.98     0.62            –0.24     0.44     –0.09    0.03    0.00
1995–2001            1.12     1.08             0.88     0.59      0.17    0.93   –0.31

IT capital
1980–1989            0.45     0.39             0.24     0.18      0.19    0.24    0.44
1989–1995            0.49     0.49             0.27     0.19      0.26    0.26    0.32
1995–2001            0.99     0.86             0.76     0.42      0.46    0.49    0.58

Non-IT capital
1980–1989            1.08     1.32             1.56     1.94      1.25    2.31    1.05
1989–1995            0.70     0.27             1.69     0.93      1.05    0.86    0.87
1995–2001            1.11     0.81             0.18     0.73      0.65    0.98    0.43

Productivity from IT production
1980–1989            0.23     0.14             0.23     0.29      0.28    0.32    0.15
1989–1995            0.23     0.14             0.32     0.29      0.43    0.38    0.20
1995–2001            0.48     0.17             0.82     0.56      0.65    0.68    0.46

Productivity from non-IT production
1980–1989            0.29    –0.08             0.11     0.03     –0.05   –0.68    1.00
1989–1995            0.03    –0.14            –0.43    –0.55      0.69   –0.01    0.84
1995–2001            0.06     0.41             0.09     0.04     –0.75   –1.17    0.29

Note: Contributions; Canada data begins in 1981.

Fig. 11.3  Sources of economic growth by country

Finally, table 11.16 and figure 11.4 translate the sources of output growth into sources of growth in average labor productivity (ALP). Average labor productivity, defined as output per hour worked, must be carefully distinguished from overall productivity, defined as output per unit of both capital and labor inputs. Output growth is the sum of growth in hours worked and growth in ALP. Average labor productivity growth depends on the contribution of capital deepening, the contribution of growth in labor quality, and productivity growth. Capital deepening is the contribution of growth in capital input per hour worked and predominates over productivity as a source of ALP growth for the G7 nations. Information technology capital deepening predominates over non-IT capital deepening in the United States throughout the period 1980–2001, in Canada after 1989, and in the United Kingdom and France after 1995. Finally, the contribution of labor quality is positive for all the G7 nations throughout the period.

Table 11.16  Sources of labor productivity growth (%)

Year        United States   Canada   United Kingdom   France   Germany   Italy   Japan

Output
1980–1989            3.38     3.10             2.69     2.38      1.99    2.51    3.83
1989–1995            2.43     1.39             1.62     1.30      2.34    1.52    2.23
1995–2001            3.76     3.34             2.74     2.34      1.18    1.90    1.45

Hours
1980–1989            1.79     1.87             0.82    –0.66      0.11    0.15    0.95
1989–1995            1.02     0.20            –1.17    –0.41     –0.71   –0.57   –0.51
1995–2001            1.53     1.93             1.03     0.91     –0.11    0.99   –1.14

Labor productivity
1980–1989            1.58     1.23             1.87     3.04      1.88    2.36    2.89
1989–1995            1.40     1.19             2.79     1.71      3.05    2.09    2.74
1995–2001            2.23     1.41             1.71     1.43      1.29    0.92    2.59

IT capital deepening
1980–1989            0.40     0.35             0.22     0.19      0.19    0.23    0.42
1989–1995            0.44     0.48             0.29     0.20      0.28    0.28    0.33
1995–2001            0.92     0.79             0.71     0.39      0.46    0.45    0.63

Non-IT capital deepening
1980–1989            0.37     0.42             1.20     2.29      1.20    2.25    0.69
1989–1995            0.34     0.16             2.11     1.15      1.33    1.06    1.06
1995–2001            0.55    –0.14            –0.21     0.25      0.70    0.61    0.83

Labor quality
1980–1989            0.30     0.40             0.12     0.24      0.26    0.23    0.63
1989–1995            0.36     0.55             0.49     0.61      0.33    0.38    0.31
1995–2001            0.23     0.18             0.30     0.19      0.23    0.35    0.38

Productivity from IT production
1980–1989            0.23     0.14             0.23     0.29      0.28    0.32    0.15
1989–1995            0.23     0.14             0.32     0.29      0.43    0.38    0.20
1995–2001            0.48     0.17             0.82     0.56      0.65    0.68    0.46

Productivity from non-IT production
1980–1989            0.29    –0.08             0.11     0.03     –0.05   –0.68    1.00
1989–1995            0.03    –0.14            –0.43    –0.55      0.69   –0.01    0.84
1995–2001            0.06     0.41             0.09     0.04     –0.75   –1.17    0.29

Note: Contributions; Canada data begins in 1981.

Fig. 11.4  Sources of labor productivity growth by country

11.4 Alternative Approaches

Edward Denison’s (1967) pathbreaking volume, Why Growth Rates Differ, compared differences in growth rates for national income net of capital consumption per capita for the period 1950 to 1962 with differences of levels in 1960 for eight European countries and the United States. The European countries were characterized by much more rapid growth and a lower level of national income per capita. However, this association did not hold for all comparisons between the individual countries and the United States. Nonetheless, Denison concluded:13

13. See Denison (1967), especially chapter 21, “The Sources of Growth and the Contrast between Europe and the United States,” pages 296–348.
Aside from short-term aberrations Europe should be able to report higher growth rates, at least in national income per person employed, for a long time. Americans should expect this and not be disturbed by it. (1967, 344)

Maddison (1987, 1991) constructed estimates of aggregate output, input, and productivity growth for France, Germany, Japan, the Netherlands, and the United Kingdom for the period 1870 to 1987. Maddison (1995) extended estimates for the United States, the United Kingdom, and Japan backward to 1820 and forward to 1992. He defined output as gross of capital consumption throughout the period and constructed constant quality indexes of labor input for the period 1913 to 1984, but not for 1870 to 1913. Maddison employed capital stock as a measure of the input of capital, ignoring the changes in the composition of capital stock that are such an important source of growth for the G7 nations. This omission is especially critical in assessing the impact of investment in information technology. Finally, he reduced the growth rate of the price index for investment by 1 percent per year for all countries and all time periods to correct for biases like those identified by Wyckoff (1995).

11.4.1 Comparisons without Growth Accounts

Kuznets (1971) provided elaborate comparisons of growth rates for fourteen industrialized countries. Unlike Denison (1967), he did not provide level comparisons. Maddison (1982) filled this lacuna by comparing levels of national product for sixteen countries. These comparisons used estimates of purchasing power parities by Irving Kravis, Alan Heston, and Robert Summers (1978).14 Maddison (1995) extended his long-term estimates of the growth of national product and population to fifty-six countries, covering the period 1820 to 1992. Maddison (2001) updated these estimates to 1998 in his magisterial volume, The World Economy: A Millennial Perspective. He provided estimates for 134 countries, as well as seven regions of the world: Western Europe, Western Offshoots (Australia, Canada, New Zealand, and the United States), Eastern Europe, Former U.S.S.R., Latin America, Asia, and Africa.

Purchasing power parities have been updated by successive versions of the Penn World Table. A complete list of these tables through Mark 5 is given by Summers and Heston (1991). The current version of the Penn World Table is available on the Center for International Comparisons Web site at the University of Pennsylvania (CICUP). This covers 168 countries for the period 1950 to 2000 and represents one of the most significant achievements in economic measurement of the postwar period.15

14. For details, see Maddison (1982, 159–68).
15. See Heston, Summers, and Aten (2002). The CICUP Web site is at http://www.pwt.econ.upenn.edu/aboutpwt.html.
11.4.2 Convergence

Data presented by Kuznets (1971), Maddison (2001), and successive versions of the Penn World Table have made it possible to reconsider the issue of convergence raised by Denison (1967). Moses Abramovitz (1986) was the first to take up the challenge by analyzing convergence of output per capita among Maddison’s sixteen countries. He found that convergence characterized the postwar period, while there was no tendency toward convergence before 1914 and during the interwar period. Baumol (1986) formalized these results by running a regression of the growth rate of GDP per capita over the period 1870 to 1979 on the 1870 level of GDP per capita.16

In a highly innovative paper on “Crazy Explanations for the Productivity Slowdown,” Paul Romer (1987) derived Baumol’s “growth regression” from Solow’s (1970) growth model with a Cobb-Douglas production function. Romer’s empirical contribution was to extend the growth regressions from Maddison’s (1982) sixteen advanced countries to the 115 countries in the Penn World Table (Mark 3). Romer’s key finding was an estimate of the elasticity of output with respect to capital close to three-quarters. The share of capital in GNP implied by Solow’s model was less than half as great.

Gregory Mankiw, David Romer, and David Weil (1992) defended the traditional framework of Kuznets (1971) and Solow (1970). The empirical part of their study is based on data for ninety-eight countries from the Penn World Table (Mark 4). Like Paul Romer (1987), Mankiw, Romer, and Weil derived a growth regression from the Solow (1970) model; however, they augmented this by allowing for investment in human capital. The results of Mankiw, Romer, and Weil (1992) provided empirical support for the augmented Solow model. There was clear evidence of the convergence predicted by the model; in addition, the estimated elasticity of output with respect to capital was in line with the share of capital in the value of output.
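The growth regressions discussed above share a simple form: the average growth rate of output per capita over a period is regressed on the logarithm of its initial level, and a negative slope is read as evidence of convergence. The following Python sketch shows that form only. The three economies and their income levels are made up purely for illustration and are not drawn from Maddison's data or the Penn World Table; the function name is a hypothetical helper, and the bivariate specification is an assumed simplification of the regressions in the cited studies.

```python
import math


def growth_regression(initial_levels, final_levels, years):
    """Ordinary least squares of average annual log growth on log initial level.

    Returns (intercept, slope). A negative slope means that economies starting
    from lower levels grew faster on average.
    """
    x = [math.log(y0) for y0 in initial_levels]
    g = [
        (math.log(y1) - math.log(y0)) / years
        for y0, y1 in zip(initial_levels, final_levels)
    ]
    n = len(x)
    x_bar = sum(x) / n
    g_bar = sum(g) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xg = sum((xi - x_bar) * (gi - g_bar) for xi, gi in zip(x, g))
    slope = s_xg / s_xx
    intercept = g_bar - slope * x_bar
    return intercept, slope


# Hypothetical output per capita at the start and end of a 100-year span.
initial = [1000.0, 2000.0, 4000.0]
final = [8000.0, 10000.0, 12000.0]

intercept, slope = growth_regression(initial, final, years=100)
print(f"slope on log initial level: {slope:.5f}")
```

In this made-up example the poorest economy grows fastest, so the estimated slope is negative. The augmented and panel-data versions discussed in this section add further regressors or country-specific effects to the same basic equation.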
The rate of convergence of output per capita was too slow to be consistent with the 1970 version of the Solow model, but supported the augmented version.

16. Baumol’s (1986) “growth regression” has spawned a vast literature, recently summarized by Steven Durlauf and Danny Quah (1999); Ellen McGrattan and James Schmitz (1999); and Islam (2003). Much of this literature is based on data from successive versions of the Penn World Table.

11.4.3 Modeling Productivity Differences

Finally, Islam (1995) exploited an important feature of the Penn World Table overlooked in prior studies. This panel data set contains benchmark comparisons of levels of the national product at five-year intervals, beginning in 1960. This made it possible to test an assumption maintained in growth regressions. These regressions had assumed identical levels of productivity for all countries included in the Penn World Table. Substantial differences in levels of productivity among countries have been documented by Denison (1967), by my papers with Christensen and Cummings (Christensen, Cummings, and Jorgenson 1981), Dougherty (Dougherty and Jorgenson 1996, 1997), and Yip (Jorgenson and Yip 2000) and in section 11.2.

By introducing econometric methods for panel data, Islam (1995) was able to allow for these differences. He corroborated the finding of Mankiw, Romer, and Weil (1992) that the elasticity of output with respect to capital input coincided with the share of capital in the value of output. In addition, Islam (1995) found that the rate of convergence of output per capita among countries in the Penn World Table substantiated the unaugmented version of the Solow (1970) growth model. In short, “crazy explanations” for the productivity slowdown, like those propounded by Paul Romer (1987, 1994), were unnecessary. Moreover, the model did not require augmentation by endogenous investment in human capital, as proposed by Mankiw, Romer, and Weil (1992).

Islam concluded that differences in technology among countries must be included in econometric models of growth rates. This requires econometric techniques for panel data, like those originated by Gary Chamberlain (1984), rather than the regression methods of Baumol (1986), Paul Romer (1987), and Mankiw, Romer, and Weil (1992). Panel data techniques have now superseded regression methods in modeling differences in output per capita.

11.5 Conclusions

I conclude that a powerful surge in investment in IT and equipment after 1995 characterizes all of the G7 economies. This accounts for a large portion of the resurgence in U.S. economic growth, but contributes substantially to economic growth in the remaining G7 economies as well.
Another significant source of the G7 growth resurgence after 1995 is a jump in productivity growth in IT-producing industries. For Japan, the dramatic upward leap in the impact of IT investment after 1995 was insufficient to overcome downward pressures from deficient growth of aggregate demand. This manifests itself in declining contributions of non-IT capital and labor inputs. Similar downturns are visible in non-IT capital input in France, Germany, and especially the United Kingdom after 1995.

These findings are based on new data and new methodology for analyzing the sources of economic growth. Internationally harmonized prices for information technology equipment and software are essential for capturing differences among the G7 nations. Constant quality indexes of capital and labor inputs are necessary to incorporate the impacts of investments in IT and human capital. Exploiting the new data and methodology, I have been able to show that investment in tangible assets is the most important source of economic growth in the G7 nations. The contribution of capital input exceeds that of productivity for all countries for all periods. The relative importance of productivity growth is far less than suggested by the traditional methodology of Kuznets (1971) and Solow (1970), which is now obsolete.

The conclusion from Islam’s (1995) research is that the Solow (1970) model is appropriate for modeling the endogenous accumulation of tangible assets. It is unnecessary to endogenize human capital accumulation as well. The transition path to balanced growth equilibrium after a change in policies that affects investment in tangible assets requires decades, while the transition after a change affecting investment in human capital requires as much as a century.
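The statement that the contribution of capital input exceeds that of productivity for all countries and all periods can be checked against the totals reported in tables 11.13 and 11.14. The Python sketch below transcribes those totals and reports any exceptions along with the narrowest margin. The names are illustrative assumptions, and margins are rounded to two decimals to match the precision of the published tables.

```python
# Contributions to output growth (percentage points).
# CAPITAL: total capital rows of table 11.13.
# PRODUCTIVITY: total productivity rows of table 11.14.
# All identifiers are hypothetical names chosen for this illustration.
PERIODS = ("1980-1989", "1989-1995", "1995-2001")

CAPITAL = {
    "United States": (1.53, 1.19, 2.10),
    "Canada": (1.71, 0.76, 1.67),
    "United Kingdom": (1.80, 1.96, 0.94),
    "France": (2.12, 1.12, 1.15),
    "Germany": (1.44, 1.31, 1.11),
    "Italy": (2.55, 1.12, 1.47),
    "Japan": (1.49, 1.19, 1.01),
}
PRODUCTIVITY = {
    "United States": (0.52, 0.26, 0.54),
    "Canada": (0.06, 0.00, 0.58),
    "United Kingdom": (0.34, -0.11, 0.91),
    "France": (0.32, -0.26, 0.60),
    "Germany": (0.23, 1.12, -0.10),
    "Italy": (-0.36, 0.37, -0.49),
    "Japan": (1.15, 1.04, 0.75),
}


def margins():
    """Capital contribution minus productivity contribution, by country and period."""
    return {
        (country, period): round(CAPITAL[country][i] - PRODUCTIVITY[country][i], 2)
        for country in CAPITAL
        for i, period in enumerate(PERIODS)
    }


gaps = margins()
exceptions = sorted(key for key, value in gaps.items() if value <= 0)
closest = min(gaps, key=gaps.get)

print("cases where productivity matches or exceeds capital:", exceptions)
print("narrowest margin:", closest, gaps[closest])
```

On these figures no country-period pair is an exception, and the narrowest margin occurs for the United Kingdom in 1995–2001.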
References

Abramovitz, Moses. 1986. Catching up, forging ahead, and falling behind. Journal of Economic History 46 (2): 385–406.
Baldwin, John R., and Tarek M. Harchaoui. 2002. Productivity growth in Canada—2002. Ottawa, Canada: Statistics Canada.
Baumol, William J. 1986. Productivity growth, convergence, and welfare. American Economic Review 76 (5): 1072–85.
Chamberlain, Gary. 1984. Panel data. In Handbook of econometrics. Vol. 2, ed. Zvi Griliches and Michael Intriligator, 1247–1318. Amsterdam: North-Holland.
Chow, Gregory. 1967. Technological change and the demand for computers. American Economic Review 57 (5): 1117–30.
Christensen, Laurits R., Dianne Cummings, and Dale W. Jorgenson. 1980. Economic growth, 1947–1973: An international comparison. In New developments in productivity measurement and analysis, ed. John W. Kendrick and Beatrice Vaccara, 595–698. Chicago: University of Chicago Press.
———. 1981. Relative productivity levels, 1947–1973. European Economic Review 16 (1): 61–94.
Colecchia, Alessandra, and Paul Schreyer. 2002. ICT investment and economic growth in the 1990s: Is the United States a unique case? A comparative study of nine OECD countries. Review of Economic Dynamics 5 (2): 408–42.
Court, Andrew T. 1939. Hedonic price indexes with automotive examples. In The dynamics of automobile demand, 99–117. New York: General Motors Corporation.
Denison, Edward F. 1967. Why growth rates differ. Washington, DC: Brookings Institution.
Dougherty, Chrys, and Dale W. Jorgenson. 1996. International comparisons of the sources of economic growth. American Economic Review 86 (2): 25–29.
———. 1997. There is no silver bullet: Investment and growth in the G7. National Institute Economic Review 162 (1): 57–74.
Durlauf, Steven N., and Danny T. Quah. 1999. The new empirics of economic growth. In Handbook of macroeconomics. Vol. 1A, ed. J. B. Taylor and M. Woodford, 235–310. Amsterdam: North-Holland.
Griliches, Zvi. 1961. Hedonic price indexes for automobiles: An econometric analysis of quality change. In Price statistics of the federal government, 137–96. New York: National Bureau of Economic Research.
Heston, Alan, Robert Summers, and Bettina Aten. 2002. Penn World Table version 6.1. Philadelphia: Center for International Comparisons at the University of Pennsylvania (CICUP).
Islam, Nasrul. 1995. Growth empirics. Quarterly Journal of Economics 110 (4): 1127–70.
———. 2003. What have we learned from the convergence debate? Journal of Economic Surveys 17 (3): 309–62.
Jorgenson, Dale W. 2001. Information technology and the U.S. economy. American Economic Review 91 (1): 1–32.
———. 2003. Information technology and the G7 economies. World Economics 4 (4): 139–70.
———. 2005. Accounting for growth in the information age. In Handbook of economic growth. Vol. 1A, ed. Philippe Aghion and Steven Durlauf, 743–815. Amsterdam: North-Holland.
Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34 (99): 249–80.
Jorgenson, Dale W., Mun S. Ho, and Kevin J. Stiroh. 2005. Information technology and the American growth resurgence. Cambridge, MA: MIT Press.
Jorgenson, Dale W., and Kazuyuki Motohashi. 2005. Information technology and the Japanese economy. Journal of the Japanese and International Economies 19 (4): 460–81.
Jorgenson, Dale W., and Kevin J. Stiroh. 2000. Raising the speed limit: U.S. economic growth in the information age. Brookings Papers on Economic Activity, Issue no. 1:125–211. Washington, DC: Brookings Institution.
Jorgenson, Dale W., and Eric Yip. 2000. Whatever happened to productivity growth? In New developments in productivity analysis, ed. Charles R. Hulten, Edwin R. Dean, and Michael J. Harper, 509–40. Chicago: University of Chicago Press.
Kravis, Irving B., Alan Heston, and Robert Summers. 1978. International comparisons of real product and purchasing power. Baltimore: Johns Hopkins University Press.
Kuznets, Simon. 1971. Economic growth of nations. Cambridge, MA: Harvard University Press.
Maddison, Angus. 1982. Phases of capitalist development. Oxford, UK: Oxford University Press.
———. 1987. Growth and slowdown in advanced capitalist economies: Techniques of quantitative assessment. Journal of Economic Literature 25 (2): 649–98.
———. 1991. Dynamic forces in capitalist development. Oxford, UK: Oxford University Press.
———. 1995. Monitoring the world economy. Paris: Organization for Economic Cooperation and Development.
———. 2001. The world economy: A millennial perspective. Paris: Organization for Economic Cooperation and Development.
Mankiw, N. Gregory, David Romer, and David Weil. 1992. A contribution to the empirics of economic growth. Quarterly Journal of Economics 107 (2): 407–37.
McGrattan, Ellen, and James Schmitz. 1999. Explaining cross-country income differences. In Handbook of macroeconomics, ed. J. B. Taylor and M. Woodford, 669–737. Amsterdam: North-Holland.
Oliner, Stephen D., and Daniel J. Sichel. 2000. The resurgence of growth in the late 1990s: Is information technology the story? Journal of Economic Perspectives 14 (4): 3–22.
Organization for Economic Cooperation and Development. 2002. Purchasing power parities and real expenditures—1999 benchmark year. Paris: OECD.
Romer, Paul. 1987. Crazy explanations for the productivity slowdown. In NBER macroeconomics annual 1986, ed. Stanley Fischer, 163–201. Cambridge, MA: MIT Press.
———. 1994. The origins of endogenous growth. Journal of Economic Perspectives 8 (1): 3–20.
Schreyer, Paul. 2000. The contribution of information and communication technology to output growth: A study of the G7 countries. OECD Working Paper. Paris: Organization for Economic Cooperation and Development. May 23.
Solow, Robert M. 1970. Growth theory: An exposition. New York: Oxford University Press.
Summers, Robert, and Alan Heston. 1991. The Penn World Table (Mark 5): An expanded set of international comparisons, 1950–1988. Quarterly Journal of Economics 106 (2): 327–68.
Triplett, Jack. 1996. High-tech industry productivity and hedonic price indices. In Industry productivity, 119–42. Paris: Organization for Economic Cooperation and Development.
van Ark, Bart, Johanna Melka, Nanno Mulder, Marcel Timmer, and Gerard Ypma. 2002. ICT investment and growth accounts for the European Union, 1980–2000. Brussels, Belgium: European Commission.
Wyckoff, Andrew W. 1995. The impact of computer prices on international comparisons of productivity. Economics of Innovation and New Technology 3 (3–4): 277–93.
12 The Role of Semiconductor Inputs in IT Hardware Price Decline: Computers versus Communications

Ana Aizcorbe, Kenneth Flamm, and Anjum Khurshid
12.1 Introduction

Since at least the mid-1980s, economists have toiled steadily at improving price indexes for high-tech goods and services. The first fruits of this effort were seen in computers.1 The use of quality-adjusted price indexes (primarily hedonic price indexes) for computing equipment has now been institutionalized in the national income accounts of the United States and other industrialized nations and has radically altered our understanding of the macroeconomics of growth and productivity improvement over the last two decades.2

As evidence from these studies accumulated, it also became clear that much of the improvement in computer price performance was based on even more impressive rates of decline in quality-adjusted prices for semiconductors, the major input to computer manufacture.3 Much recent literature now suggests that changes in semiconductor prices have been a major driver of changes in quality-adjusted computer prices and, even more generally, other types of information technology (IT). Moreover, many have linked an observed quickening in the pace of price declines for semiconductors to an upsurge in the price-performance improvement for IT and ultimately to the improvement in U.S. productivity growth that occurred beginning in the mid-1990s.4

Ana Aizcorbe is an economist at the Bureau of Economic Analysis. Kenneth Flamm holds the Dean Rusk Chair in International Affairs at the L. B. J. School of Public Affairs at the University of Texas at Austin. Anjum Khurshid is a PhD candidate at the L. B. J. School of Public Affairs at the University of Texas at Austin. We wish to thank Doug Andrey, Ernst Berndt, Mark Doms, Denis Fandel, Bruce Grimm, Daryl Hatano, Mark Roberts, Jack Triplett, Philip Webre, and participants in the Brookings Institution Workshop on Communications Output and Productivity (February 2001), the International SEMATECH Colloquium (February 2002), and the NBER Conference on Research in Income and Wealth on Hard-to-Measure Goods and Services: Essays in Memory of Zvi Griliches (September 2003) for their helpful comments and assistance. We also thank Dataquest, Inc.; International SEMATECH; Semico Research; and the Semiconductor Industry Association for their assistance in obtaining data used in the construction of the price indexes in this paper. The views expressed in this paper are solely those of the authors and do not necessarily represent those of others at the Bureau of Economic Analysis or the University of Texas at Austin.

1. There are now many studies of quality adjustment in computer prices. For an early synthesis of the literature, see Triplett (1989); for a review of more recent work, see Berndt and Rappaport (2001).
2. See, for example, Jorgenson (2001) for one influential reassessment of the impact of IT on U.S. productivity growth.

Juxtaposed against this backdrop, it is almost startling to discover that in communications equipment, an equally high-tech product and a similarly ravenous consumer of semiconductor inputs, economic studies have documented vastly lower rates of decline in quality-adjusted price over the same periods in which computer prices have been studied closely.

This gap between price declines in computers and communications equipment has been both large and long-lived. The earliest known study of price declines in communications equipment showed quality-adjusted prices for small telephone switches actually increasing over the period 1970 to 1983, prior to the breakup of the old Bell System monopoly.5 By contrast, personal computer (PC) prices over the 1976 to 1983 period have been estimated to have been declining at a rate of about 18 percent per year, and mainframe computers to have fallen at roughly similar rates!6

3. For early calculations suggesting that computer price-performance improvement was due largely to quality-adjusted price changes in electronic components used in computers, see Flamm (1989, 1999). Triplett (1996) constructs an economic framework that, with plausible values, suggests that most of the improvement in computer price performance is due to semiconductors; indeed, he has calculated that multifactor productivity (MFP) for computers is modest, once the contribution of semiconductors has been removed. The first studies of quality-adjusted prices for semiconductor devices were Dulberger (1993), Flamm (1993), and Norsworthy and Jang (1993). More recent work has provided formal modeling and econometric estimation of learning curves (Irwin and Klenow 1994; Flamm 1996) and demand structures (Song 2003) for these devices.
4. For studies suggesting a link between productivity growth and IT quality-adjusted price declines in the productivity speed-up of the 1990s, see Oliner and Sichel (2000), Jorgenson and Stiroh (2000), and Jorgenson (2001). See Flamm (2001) for a detailed analysis of the technical and economic roots of more rapid decline in semiconductor prices as well as an argument that the extraordinary declines in chip prices in the late 1990s must ultimately fall back to a more sustainable pace in the long run. But note that others have expressed some skepticism on the connection between IT price-performance improvement measures and productivity; see Gordon (2000) and Aizcorbe, Oliner, and Sichel (2003).
5. See Flamm (1989).
6. See Berndt and Rappaport (2001) and Cole et al. (1986).

After divestiture and the breakup of the Bell System in the mid-1980s, the pace of innovation in communications equipment seems to have turned up sharply, but still fell far short of developments in computers. Hedonic estimates of quality-adjusted prices for telephone switches using different data sources, product mixes, and time periods show price declines of about 9–12 percent annually (for small rural telephone switches, over the period 1982 to 1985), and 9 percent annually (all telephone switches, over 1985 to 1996).7 This contrasts with an average annual decline in PC prices of somewhere between 22 and 31 percent (1982 to 1988) according to one early study, and 18 percent annually (1983 to 1989) in another.8

In the early 1990s, both computer and communications equipment price declines seem to have accelerated, but a substantial differential appears to have been maintained. Grimm’s (1996) study of telephone switch prices shows that the decline accelerated to an average exceeding 16 percent annually over 1992 to 1996.9 But PC prices ramped up to decline rates of about 30 percent annually (over 1989 to 1992) according to one study and 34 percent (over 1989 to 1994) in another.10

We know of no empirical studies of telephone switch prices after 1996 but observe other evidence suggesting that the gap between communications equipment and computer price declines continued to be substantial. Berndt and Rappaport (2001) show yet a further increase in the pace of price decline in PCs, to about 40 percent annually after 1994.11 Doms (2003) synthesizes fragmentary evidence from a variety of sources to suggest that for communications equipment (including local area network [LAN] equipment, telephone handsets, transmission equipment, and other hardware, in addition to telephone switches), overall, quality-adjusted price declines between 1994 and 2000 were bounded between perhaps 6 percent and 11 percent annually (his “conservative” and “aggressive” assumptions).
This compares with a computer price deflator (including all computing equipment, not just PCs) calculated by the Bureau of Economic Analysis that falls at about 21 percent annually over this same period.12

These continuing, persistent, very large differences in measured rates of price decline for computers and communications equipment over a thirty-year period are difficult to reconcile. Both computers and communications equipment are heavy users of semiconductor devices, yet prices for these two classes of equipment continue to move very differently, even in recent years. Early studies suggested that the lack of “convergence” in quality-adjusted price trends between computers and communications may have been due in large part to regulatory factors.13 But with the breakup of the Bell System and deregulation of large parts of the communications market in the mid-1980s, the expanding boundaries of real competition in communications equipment markets, and the rapid explosion of growth in the largely unregulated data communications and networking market in subsequent years, regulatory regimes seem a less plausible explanation for observed, continuing differences in rates of quality-adjusted price change between computer and communications equipment.

7. See Flamm (1989) and Grimm (1996).
8. See Berndt and Griliches (1993) and Berndt and Rappaport (2001).
9. See Grimm (1996).
10. See Berndt, Griliches, and Rappaport (1995) and Berndt and Rappaport (2001).
11. See Berndt and Rappaport (2001).
12. See Doms (2003). Current values for the BEA’s price index for computers and peripheral equipment are published in table 5.3.4: “Price Indexes for Private Fixed Investment by Type” of the monthly Survey of Current Business. Historical data are available online at http://www.bea.gov/bea/dn/nipaweb/TableView.asp?SelectedTable=127&FirstYear=2002&LastYear=2004&Freq=Qtr.
13. See Flamm (1989) and Gordon (1990).

The other possibility that has been considered is that quality improvement in communications hardware is simply poorly measured. Mismeasurement of communications equipment prices has the same distorting effects on measurement of productivity improvement and economic growth that have been the case with computers.14 But even with the improved measurement of quality-adjusted prices documented in recent studies, large differences between computers and communications remain.15

One possible resolution of this paradox is that the specific types of chips that are used in communications equipment show slower price declines than those used in computers. Semiconductors are actually a broad and diverse group of products. They are intermediate goods used in the production of other goods ranging from PCs to timers on household appliances to automotive ignition systems. The prices associated with the different types of chips used in these distinct types of applications are likely very different. We construct and compare semiconductor input price indexes for the two industries and show that the price index of semiconductor inputs to the communications equipment industry does, indeed, decline at a slower rate than does that for the computer industry. Over the 1992 to 1999 period, input price indexes for the semiconductor devices used in communications equipment and in computers fell at a compound annual growth rate of 15 percent and 32 percent per year, respectively.
Moreover, we find that these differences in input prices can more than explain the observed differences in the rates of decline in output prices. We caution that much is omitted from this analysis. Other factors could have caused large changes in these end-use prices that may have more than offset, or been offset by, changes in semiconductor input prices. Likely candidates include significant differences in the importance of, and price trends for, other inputs to production (for example, disk drives and displays are important inputs to computer systems, but a relatively minor input in communications gear) and differences in the magnitude and impact of technical innovation originating within the industry itself (as opposed to innovation embodied in components purchased from other industries).

14. See Sichel (2001), Crandall (2001), and U.S. Congressional Budget Office (2001).
15. For example, Doms and Forman (2003) find that rates of decline for data communications and networking hardware in the 1990s remained significantly smaller than those for computers over the same period.
This last factor, of course, may also be tied to market structure and competitive conditions in the two sets of industries, another domain in which there may be significant differences. In the next section, we describe the data and methods we used in constructing the input price indexes. In section 12.3, we undertake some illustrative decompositions of the role of semiconductor prices in explaining user industry price trends for computer and communications equipment. We provide concluding comments in section 12.4.
12.2 Construction of the Price Indexes

We construct chained-Fisher indexes of price change for semiconductor devices (denoted i) used in different end uses (denoted e). The familiar formula for a Fisher price index ($I^e_{t,t-1}$) that measures aggregate price change for end-use e over two adjacent periods (t – 1 to t) is

$$ I^e_{t,t-1} = \left[ \frac{\sum_i \omega^e_{i,t-1}\,\bigl(P^e_{i,t}/P^e_{i,t-1}\bigr)}{\sum_i \omega^e_{i,t}\,\bigl(P^e_{i,t-1}/P^e_{i,t}\bigr)} \right]^{1/2}, \tag{1} $$

where the expenditure weights are given by

$$ \omega^e_{i,t} = \frac{P^e_{i,t}\,Q^e_{i,t}}{\sum_i \bigl(P^e_{i,t}\,Q^e_{i,t}\bigr)}, \tag{2} $$

and P and Q denote prices and quantities, respectively. The index is a ratio of weighted averages that weigh the price change for each chip by its relative importance in the end use. While equation (1) measures price change for two adjacent time periods (t – 1 to t), price change over longer periods of time (say, time o to time t) is measured by chaining the indexes for adjacent time periods together:

$$ P^e_{o,t} = \prod_{s=1}^{t} I^e_{s,s-1}. \tag{3} $$
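To make the mechanics of equations (1) through (3) concrete, the following Python sketch computes adjacent-period Fisher links and chains them for two end uses that buy the same two chips in different proportions. It is only an illustration: the function names, the two-chip setup, and every price and quantity below are hypothetical and are not drawn from the WSTS data. Prices are taken to be the same in both end uses, so any difference between the two indexes comes from the weights alone.

```python
from math import sqrt


def expenditure_weights(prices, quantities):
    """Equation (2): each device's share of total spending in one period."""
    spending = [p * q for p, q in zip(prices, quantities)]
    total = sum(spending)
    return [s / total for s in spending]


def fisher_link(p_prev, q_prev, p_cur, q_cur):
    """Equation (1): Fisher index between adjacent periods t-1 and t."""
    w_prev = expenditure_weights(p_prev, q_prev)
    w_cur = expenditure_weights(p_cur, q_cur)
    # Numerator: price relatives weighted by period t-1 shares.
    numerator = sum(w * pc / pp for w, pp, pc in zip(w_prev, p_prev, p_cur))
    # Denominator: inverse price relatives weighted by period t shares.
    denominator = sum(w * pp / pc for w, pp, pc in zip(w_cur, p_prev, p_cur))
    return sqrt(numerator / denominator)


def chained_index(prices, quantities):
    """Equation (3): cumulative product of adjacent-period links, base = 1."""
    level = 1.0
    levels = [level]
    for t in range(1, len(prices)):
        level *= fisher_link(prices[t - 1], quantities[t - 1],
                             prices[t], quantities[t])
        levels.append(level)
    return levels


# Hypothetical data: chip A falls 40 percent a year, chip B about 4 percent.
prices = [[100.0, 50.0], [60.0, 48.0], [36.0, 46.0]]
# Hypothetical quantities: one end use buys mostly chip A, the other mostly B.
computer_qty = [[8.0, 2.0], [9.0, 2.0], [10.0, 2.0]]
comms_qty = [[1.0, 8.0], [1.0, 8.0], [1.0, 9.0]]

computer_index = chained_index(prices, computer_qty)
comms_index = chained_index(prices, comms_qty)
print("computer input index:", [round(x, 3) for x in computer_index])
print("communications input index:", [round(x, 3) for x in comms_index])
```

With these made-up inputs, the index for the end use weighted toward the fast-declining chip falls faster, which is the composition effect the chapter goes on to measure.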
To form these indexes, we need data on nominal shipments (for the weights) and on prices (to form the price relatives, $P^e_{i,t}/P^e_{i,t-1}$). If the following two conditions occur, then input price indexes will vary across end uses: the end uses must use different types of chips, and the prices for those chips must show different rates of price change. As shown in the following, both of these conditions hold, and in a very significant way, in our data.

12.2.1 Nominal Weights

We obtained data on nominal shipments of semiconductor devices broken out by end use from a survey sponsored by the World Semiconductor Trade Statistics (WSTS) program, a cooperative venture sponsored by national semiconductor industry associations around the world. The survey provides data on shipments for twelve aggregate classes of semiconductor devices: five classes of metal oxide semiconductor (MOS) chips (MOS memory, MOS microprocessors, MOS microcontrollers, MOS microperipherals, and other MOS logic); two classes of other types of integrated circuits (analog and bipolar); and five types of single-function “discrete” semiconductors (power transistors, small signal transistors, thyristors and rectifiers, optoelectronics, and diodes and all other discretes). The data for 1999 are summarized in figure 12.1. Note that much of the world chip market is made up of MOS devices: well-known chips like MOS memory chips (e.g., Dynamic Random Access Memory [DRAM] chips) and microprocessors (MPUs, like Pentium chips) and some less-visible MOS devices like microperipherals (MPRs) and microcontrollers (MCUs).16

Fig. 12.1 Value of semiconductors consumed worldwide, by consumption product class, 1999
Source: Semiconductor Industry Association (2002).

16. See Semiconductor Industry Association (2002b) for detailed descriptions of these devices and their capabilities.
Fig. 12.2 Share of leading-edge wafers in total silicon area processed, by product, 1999
Source: Authors’ calculations based on unpublished Semico Research wafer shipment data, obtained from International SEMATECH, Austin, Texas.
One important dimension along which these devices differ is the degree of “high techness.” Researchers at the International SEMATECH research and development (R&D) consortium classify these product categories as “leading edge” or “non-leading edge” according to the manufacturing processes used when they are produced and the percentage of the wafers processed in that category that use the latest leading-edge processes. Figure 12.2 shows the share of total silicon wafer area processed in 1999 for several semiconductor device classes using this indicator. The solid bars correspond to more highly aggregated classes of products, while the striped bars correspond to more disaggregated product categories within the aggregates to their right (and note that the shares are of silicon area processed, not of value of product, within a category). According to this indicator, MOS microprocessors (MPUs) are 90 percent leading edge; MOS memory is a little under half leading edge; and microcontrollers, microperipherals, and other MOS logic, at about 17 percent, are even less dependent on leading-edge manufacturing. Analog, bipolar, and all discrete device categories are entirely produced with more mature technologies that are characterized as non-leading edge.

The analog category, making up 15 percent of world shipments in 1999, is acknowledged within SEMATECH to be poorly characterized within this breakdown and to require further work. It is actually a combination of some very high-tech products produced with leading-edge technology and some relatively mature products produced with relatively old technology. Because analog chips are a major input to communications equipment, this topic is revisited in the following.

For each of these classes of semiconductor devices, nominal shipments are further broken out into the following end-use categories: computer, communications, consumer electronics, industrial, automotive, and government.17 As shown in figure 12.3, the largest end use for semiconductor chips is computers: about half of the value of worldwide shipments in 1999 went to computer manufacturers. The next-largest end uses that year were communications equipment (21 percent) and consumer electronics (14 percent). Together, these three groups of end-use industries accounted for about 7/8 of semiconductor consumption in 1999, while all other categories together accounted for the remaining 1/8 of shipments.

Fig. 12.3 Value of semiconductors consumed worldwide, by end-use sector, 1999
Sources: Semiconductor Industry Association (2002). WSTS End-Use Survey.

17. The definitions for each end use are as follows: the computer category includes mainframes, peripherals, and PCs. Communications includes telecommunications, transmission, two-way, and cellular radio equipment. The remaining categories are fairly diverse. Consumer includes the following types of devices: entertainment, radio, TV, VCR, personal or home appliance, cameras, games, and so on; automotive represents chips used in auto entertainment, engine controls, and all other auto applications; the industrial and instrument category includes lab, test, control, and measurements; and chips used in government end uses include those in military and government special purchases.

The disaggregate data show that the composition of semiconductor devices used in computers is very different from that of communications equipment. As shown in figure 12.4, the bulk (79 percent) of semiconductor shipments to computer makers is made up of MOS devices that are known to have experienced extremely rapid rates of technological change (memory and microcomponents: MPU, MCU, and MPR). These are the largest segments of the overall semiconductor market in volume and value (accounting for 56 percent of global semiconductor sales in 1999) and are the primary products produced using the most technologically advanced, leading-edge fabrication lines (see figure 12.2). The large volumes for these products are used to justify large fixed investments in deploying the most advanced, current manufacturing technology in their production.

In contrast, the composition of semiconductor devices used in communications equipment is much more diverse and more skewed toward devices where quality-adjusted price trends are less well understood. MOS memories and microcomponents make up only 34 percent of the semiconductor inputs to communications equipment; the next two largest classes of inputs are other MOS logic and analog devices, where significant technological change has also taken place. The remaining 15 percent of inputs are from older, more mature devices. Data for other years in the 1990s show a similar pattern. These differences in composition have implications for price measurement when the prices of individual devices change at different rates.
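A toy calculation may help show why composition matters for price measurement. The sketch below uses the two shares quoted above (79 percent and 34 percent for MOS memory and microcomponents) but pairs them with purely hypothetical price changes of 40 percent and 10 percent a year for the fast and slow groups; those two rates, the two-group split, and the function name are assumptions made for illustration and are not estimates from the chapter.

```python
def share_weighted_change(fast_share, fast_change, slow_change):
    """Share-weighted average of two price relatives, as a percent change."""
    relative = (fast_share * (1.0 + fast_change)
                + (1.0 - fast_share) * (1.0 + slow_change))
    return (relative - 1.0) * 100.0


# Hypothetical annual price changes for the two device groups.
FAST_CHANGE = -0.40
SLOW_CHANGE = -0.10

# Shares of MOS memory and microcomponents quoted in the text for 1999.
computers = share_weighted_change(0.79, FAST_CHANGE, SLOW_CHANGE)
communications = share_weighted_change(0.34, FAST_CHANGE, SLOW_CHANGE)

print(f"computers: {computers:.1f} percent")
print(f"communications: {communications:.1f} percent")
```

Even with identical device prices in both sectors, the sector whose purchases are concentrated in fast-declining devices shows the larger drop in its input price index.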
Fig. 12.4 Semiconductors used in the production of computers and communications equipment, by product class, 1999
Sources: Semiconductor Industry Association (2002). WSTS End-Use Survey.

12.2.2 Price Relatives

Relative prices for individual devices ($P^e_{i,t}/P^e_{i,t-1}$) are empirically measured using price indexes. Because price indexes broken out by device and end use are not available, we assume that the measured price change for each device grouping does not vary by end use ($P^e_{i,t}/P^e_{i,t-1} = P_{i,t}/P_{i,t-1}$). This assumption is plausible for semiconductor devices that are largely commodity-like (for example, standard memory, logic, and microprocessor components), but is potentially problematic for devices that are customized for particular end uses.

Most of the price indexes we used are Fisher ideal matched-model indexes either taken from previous studies (Grimm 1998; Aizcorbe 2002; and Aizcorbe, Corrado, and Doms 2000) or recalculated by the authors from the sources used in those studies. For logic chips (accounting for 16 percent of the market in 1999), detailed market share data were unavailable, and geometric means of price relatives for matched models were used instead (see table 12A.3).

One important exception is the index for MPR chips. As detailed in the appendix, we use new data to construct an annual quality-adjusted Fisher price index to better capture the rapid technological improvements reported for these devices. The other notable exception is the price index for analog devices. As mentioned earlier, these devices are important in the production of communications equipment and are thought to have poorly measured price indexes. The appendix details the construction of the hybrid index we use for these devices; while we measure price change for the low-tech devices in this class using average sales prices at the lowest possible level of disaggregation, we assume that the price change for the high-tech devices in this class parallels that of devices in the “Other MOS logic” class of chips. To obtain the hybrid index, we average over the two indexes using Fisher weights.

All told, we have annual price indexes for twelve classes of semiconductor devices, one for each of the semiconductor classes in figure 12.1. Price measures for these devices, given in the first column of table 12.1, decline at substantially different rates over the 1992 to 1999 period. For the most part, differences in the rates of price declines exhibited by the devices are intuitively plausible. Devices normally associated with rapid rates of product innovation and technical change do, indeed, show rapid price declines: MOS microcomponents (MPUs, MCUs, and MPRs), MOS memory chips, and Other MOS logic. Similarly, more mature chips that have not undergone much change in the last decade do not show much price decline, for example, bipolar devices.

The second and third columns of table 12.1 report the nominal shares data associated with each device. As may be seen, prices for semiconductor devices that go into computers tend to fall faster than those that go into communications equipment. Chips whose prices fall more than 30 percent account for about 65 percent of the nominal value of chips that go into computers. Prices of the remaining chips fall at much lower rates (14 percent or less) and have a much heavier weight in communications equipment.

Table 12.1 Constant-quality price change and nominal weights for semiconductors in computer and communication equipment

                                Price change (CAGR),    Nominal shipments weight, 1999 (%)
                                1991–1999               Computers    Communications
MOS MPU                         –52.3                   33.9         2.5
MOS memory                      –30.8                   32.2         11.7
MOS MPR                         –14.0                   10.0         4.0
Other MOS logic                 –13.2                   9.0          30.3
MOS MCU                         –7.5                    2.6          14.8
Thyristors and rectifiers       –7.1                    0.7          0.9
Power transistors               –5.6                    1.1          4.2
Small signal transistors        –5.3                    0.5          2.4
Optoelectronics                 –3.6                    1.6          6.0
Diode and all other discrete    –2.6                    0.5          1.7
Digital bipolar                 0.6                     0.6          0.6
Analog[a]                       1.4 (–9.0)              7.3          20.8

Source: Authors’ calculations.
a. The two price indexes for analog devices are referred to as the World Semiconductor Trade Statistics (WSTS) and Hybrid indexes, respectively.

As shown in the top panel of table 12.2, semiconductor input price indexes differ substantially across end uses.18 For the period 1992 to 1999, input chip prices for automotive end uses decline the most slowly, at about a 12 percent compound annual growth rate (CAGR), while those of computer chips decline the fastest, at about a 32 percent CAGR over the same period. Input prices for communications end uses fell at a 15 percent CAGR over the period, just a bit faster than prices for automobile end uses. The next two columns provide measures of price change for the pre- and post-1995 periods. In all cases, price indexes experience faster price declines after 1995 than in the earlier period. But, in either case, there is always a substantial gap between the computer and communications equipment indexes.

18. The robustness of these estimates to changes in the underlying assumptions is discussed in the appendix. Although the numerical results can be sensitive, the qualitative results are the same.

Table 12.2 Semiconductor input price indexes, by end use, 1992–1999

                     Compound annual growth rate
                     1992–1999    1992–1995    1995–1999
Worldwide
  Auto               –12.35       –3.97        –18.16
  Communication      –15.33       –3.33        –23.34
  Computer           –32.22       –11.30       –44.60
  Consumer           –13.97       –2.27        –21.82
  Government         –17.30       –3.00        –26.62
  Industrial         –15.36       –3.36        –23.38
North America
  Auto               –12.46       –4.64        –17.91
  Communication      –15.58       –3.41        –23.69
  Computer           –34.74       –13.29       –47.26
  Consumer           –15.22       –2.17        –23.85
  Government         –14.74       –3.37        –22.39
  Industrial         –16.11       –4.27        –24.02

Source: Authors’ calculations based on table 12A.8 in the appendix.

The indexes discussed thus far use worldwide end-user consumption of semiconductors as weights. Alternatively, it is possible to use North American consumption of our twelve classes of semiconductor prices by end-user industry to construct input price indexes for specific U.S. industries. The results, shown in the bottom panel of table 12.2, are very close to those shown in the preceding, reflecting the fact that the mix of semiconductors used in U.S. end-use industries is roughly identical to the mix overseas. Economically, this is a consequence of the fact that semiconductors are sold in what is effectively an integrated global market, with transport costs for this very light and compact product too small relative to the value of the product to create shelter for regional differentials in prices that might otherwise lead to substitution among device classes and differences in semiconductor input mix across countries.

12.3 Contribution of Changes in Semiconductor Input Prices to Changes in Output Prices

We have concluded that differences in the composition of semiconductor inputs used in computer and communications equipment account for significant differences in the rate at which the prices of semiconductor inputs used in these two industries fell through the 1990s. We can now examine the importance of semiconductor prices for prices of the end goods produced by the user industries purchasing these inputs. Our first step is to sketch out a simple analytical framework. We shall assume constant returns to scale in the production of electronic goods that make use of semiconductors and allow for imperfect competition and technological change in their using industries.
We approximate short-run marginal cost with a unit variable cost function.19 Conceptually, we have in mind a monopolistic competition model of the market for these electronic products, where every producer makes a unique variation of the basic industry product and therefore faces a downward-sloping demand curve. Profit maximization then yields a price-marginal cost margin that is inversely proportional to the producer’s perceived price elasticity of demand. In the long run, as the effects of entry or exit from the industry and consequent shifts in demand curves are felt, the price-marginal cost margin will adjust so that no economic profits are being earned.

19. See Morrison (1992) for an extended discussion of a decomposition of price change into its component elements based on a variable cost function and Oliner and Sichel (2000) for a similar framework. Note that our assumption of constant returns to scale is inessential; with nonconstant returns to scale, a scale effect must also be incorporated into our decomposition of price change. This decomposition is derived from cost functions and is dual to a productivity growth decomposition derived from a production function. For discussion, see Basu and Fernald (1997).

Adopting these assumptions, we can write

$$ P^e = (1 + \mu)\, g(P_s, P_z; k, t), \tag{4} $$
where $P^e$ is the price of output for some given industry (or end use), and $\mu$ is the markup of price over unit variable cost $g(\cdot)$, reflecting both imperfect competition and subequilibrium (short-run capital per unit of output diverging from the long-run optimum). Costs are a function of the semiconductor input price for that industry, $P_s$; a vector of all other relevant input prices, $P_z$; a vector of fixed (in the short run) capital inputs per unit of output, $k$; and an index representing the possible impact of technological changes and other factors shifting the unit variable cost function over time, $t$. Taking logs on both sides of this equation and differentiating with respect to time, we have

$$ \frac{1}{P^e}\frac{dP^e}{dt} = \frac{1}{1+\mu}\frac{d(1+\mu)}{dt} + \frac{1}{g}\frac{\partial g}{\partial P_s}\frac{dP_s}{dt} + \sum_{i \neq s}\frac{1}{g}\frac{\partial g}{\partial P_{z_i}}\frac{dP_{z_i}}{dt} + \sum_j \frac{1}{g}\frac{\partial g}{\partial k_j}\frac{dk_j}{dt} + \frac{1}{g}\frac{\partial g}{\partial t}. \tag{5} $$

Making use of Shephard’s lemma, and the empirical approximation of (dX/dt)(1/X) by the annual percentage rate of change ($\Delta X$), we then have, approximately,

$$ \Delta P^e \approx \sigma_s\, \Delta P_s + \Bigl[ \Delta(1+\mu) + \sum_{i \neq s} \sigma_{z_i}\, \Delta P_{z_i} + \sum_j \varepsilon_j\, \Delta k_j + \Delta g \Bigr], \tag{6} $$
where $\sigma$ is the variable cost share of an input, $\varepsilon_j$ is the elasticity of variable unit cost with respect to fixed factor $k_j$, and changes in $g$ measure technical change. In effect, we have partitioned the annual percentage change in the price of the output of a semiconductor input-using industry into the effect of semiconductor prices (the first term on the right-hand side) and the sum of all other effects (the terms in brackets). These residual determinants of output price changes not accounted for by semiconductor inputs, we note, are likely to be quite important, reflecting changes in markups over variable cost (which we would expect to be affected by demand swings in these highly cyclical industries, as well as transitory entry and exit by competitors, and secular trends in market structure), other production costs, and changing technology in the user industries.

Our strategy is simply to calculate the first term on the right-hand side of this last equation ($\sigma_s \Delta P_s$) and view it as the contribution of semiconductors to the overall price change for semiconductor-using output ($\Delta P^e$). Changes in the industry-specific price indexes for semiconductor inputs that we have just constructed ($\Delta P_s$) are shown in the first column of table 12.3 for three sectors: consumer audio, computers, and communications. As noted earlier, these estimates (for changes from 1997 to 1998) document that the type of semiconductor chips that went into computers that year experienced more rapid price declines than those that went into the other two end uses.

Table 12.3 Derivation of semiconductor cost share and contribution to output price change (%)

                               Semiconductor cost share
                  Price        Shipment share[b]   Shipments/          Semi inputs/Variable cost   Contribution (percentage points)
                  change[a]    (2)                 Variable cost[c]    (4) = (2) × (3)             (5) = (4) × (1)
                  (1)          Low      High       (3)                 Low       High              Low       High
Consumer audio    –30.4        11       15         125.9               14.0      18.7              –4.3      –5.7
Computers         –52.7        20       30         150.8               30.6      45.1              –16.1     –23.8
Communications    –31.6        11       19         168.2               18.2      31.6              –5.7      –10.0

a. Calculations based on appendix table 12A.8, percent change from 1997–1998.
b. Calculations based on appendix table 12A.7.
c. Calculated as shipments/(shipments – value added + payroll) using data from U.S. Census Annual Survey of Manufactures, 1998, for NAICS 3341 (computer and peripheral equipment manufacturing), NAICS 3342 (communications equipment manufacturing), NAICS 3343 (audio and video equipment manufacturing).

The next three sets of columns indicate how we estimate the semiconductor cost share in variable cost ($\sigma_s$). We estimate this cost share in two steps. First, we gather together industry estimates20 of the share of semiconductor inputs in the value of shipments of each end-use sector’s electronic equipment, measured as $(P_s Q_s)/(P^e Q^e)$. Then, we use data from the Annual Survey of Manufactures (U.S. Bureau of the Census 2000) to translate that share of shipments into a share of unit variable cost. Given the observed data, we actually approximate variable costs as shipments less nonlabor value added (i.e., the ratio of shipments/[shipments – value added + payroll] is multiplied by the semiconductor share of shipments).

20. Measurement of the value of semiconductor input cost in different industries is a notoriously weak link in coverage of statistical agencies of the manufacturing sector (see Triplett [1996] for a more extensive discussion of these problems). Note also that these cost shares are for electronic equipment produced in each end-use sector; thus it is the semiconductor content of automotive electronic equipment, not the entire auto, that is being estimated.

A range of the available estimates for semiconductor content shares is given in the second set of columns of table 12.3; the full set of estimates is given in the appendix. Note that we suspect that estimates of semiconductor cost shares are biased downward: electronic equipment shipments data (the denominator) often double-count sales of semifinished assemblies or rebranded equipment among manufacturers. We show both a low and high estimate here to place rough bounds on the industry estimates. The “high” estimates of semiconductor content represent a conservative choice for reasons just described. In either case, the semiconductor share of shipments is typically twice as large for computers as it is for the other two end uses.

Multiplying this share by the ratio of shipments to variable cost (table 12.3, column [3]) yields an estimate of the semiconductor content in variable cost for these industries (column [4]). Not surprisingly, the estimated shares are substantially higher for computers (30–45 percent) than for the other two end uses. Multiplying this estimate of semiconductor content by the change in the semiconductor input price index (column [1]) gives our estimate of the portion of the price change for each end use that can be attributed to changes in semiconductor input prices (the last column). Using our “high” estimates of semiconductor content, declines in semiconductor input prices pushed down computer and communications prices by about 24 and 10 percentage points, respectively.

But how large is this relative to the declines in end-use prices? That is, how much of the absolute decline in the end-use prices is explained by declines in semiconductor prices? Table 12.4 shows that price declines for
semiconductor devices had a large impact on end-use prices. Column (1) gives estimates of quality-adjusted price change from 1997 to 1998 for three end goods: consumer audio electronics, computers, and communications equipment. The estimated effect of semiconductor prices is expressed in both percentage points (the second set of columns) and as a fraction of total equipment price change (the last set of columns). Our analysis suggests that semiconductors can account for roughly 40 to 59 percent of computer equipment price decline, roughly 27 to 36 percent of price declines for consumer audio, and maybe a little less for communications equipment in that year.

Table 12.4 Contribution of semiconductors to end-use price change in 1998 (%)

                                                      Contribution of semiconductors
                                   End-use price      Percentage points     Share of end-use price
                                   change             (2)                   change (2)/(1)
                                   (1)                Low       High        Low       High
Consumer audio[a]                  –15.8              –4.3      –5.7        26.9      36.0
Computers[b]                       –40.3              –16.1     –23.8       40.1      59.0
Communications
  LAN equipment[c]                 –29.5              –5.7      –10.0       19.5      33.9
  LAN equipment and switches[d]    –33.3              –5.7      –10.0       17.3      30.0

a. Hedonic index with vintage included from Kokoski, Waehrer, and Rozaklis (2000), table 9.
b. Matched model Fisher for all computer systems from Aizcorbe, Corrado, and Doms (2000).
c. Corrado (2001, 139).
d. Estimated as follows: Relative expenditure on switches, LAN equipment from Doms and Forman (2003) used as weights; weighted average of LAN equipment price change and estimated switch price change. Estimated switch price change taken as 1.258 times LAN equipment price change based on historical relationship between LAN and switch price change over 1992–1996 taken from Corrado (2001) and Grimm (1996).

We can now address the puzzle originally posed: how much of the differential in computer and communication equipment price declines can be attributed to the respective differences in the contributions of semiconductors? To do this, we take the difference in the calculated price declines for communications and computers reported in table 12.4 and partition these differences into price change attributable to semiconductors versus the combined impacts of all other factors. Those numbers are reported in the top panel of table 12.5. The first column of table 12.5, for example, reports that quality-adjusted prices for computer equipment fell about 11 percentage points faster than LAN equipment in 1998. The second column documents that essentially all of that difference can be attributed to differences in the semiconductor contribution: the higher semiconductor contribution in computers accounts for between 10 and 14 percentage points of the 10.8 percentage point difference in computer and LAN equipment end-use price change. If one adds in switches to the communications price index (as in the second row of the table), the higher semiconductor contribution in computers more than explains the differences in end-use prices. We conclude that differences in semiconductor input price changes, coupled with differences in semiconductor intensity, can explain almost all of the difference between rates of decline of computer and LAN equipment prices in 1998.

Table 12.5 Estimates of the relative contribution of semiconductors to price change in computers and communications equipment in 1998 (%)

                                   End-use price        Semiconductor            All other factors
Change in computer                 change[a]         =  contribution[b] (2)   +  (1) – (2)
prices less:                       (1)                  Low       High           Low       High
Preferred measures
  LAN equipment                    –10.8                –10.4     –13.8          –0.4      3.0
  LAN equipment and switches       –7.0                 –10.4     –13.8          3.4       6.8
Alternate sales weights
  LAN equipment                    –10.8                –7.3      –9.8           –3.5      –1.0
  LAN equipment and switches       –7.0                 –7.3      –9.8           0.3       2.8
Calculations using natural logs
  LAN equipment                    –22.1                16.0      –21.8          –6.0      –0.3
  LAN equipment and switches       –18.3                16.0      –21.8          –2.2      3.5

a. Calculated using figures in table 12.4, column (1); for the “Natural Logs” case, the calculations are based on an alternative calculation of the figures in table 12.4, column (1) that uses natural logarithms rather than percent changes.
b. Calculated using figures in table 12.4, column (2); for the “Natural Logs” case, the calculations are based on an alternative calculation of the figures in table 12.4, column (2) that uses natural logarithms rather than percent changes.

The remaining panels in table 12.5 report two checks for the robustness of our analysis. First, we could have used a different model of competition in semiconductor-using industries. Although we do not find it particularly plausible in these high-tech sectors, we could have assumed perfect competition. In that case, the markup must equal zero, price would equal long-run marginal cost, and total costs would equal revenues. The analysis of equation (6) would continue to hold in all respects, except that $\sigma_s$ would now represent the share of semiconductor inputs in revenues or sales, rather than variable costs, so we would now be using somewhat smaller weights to translate the impact of semiconductor price changes on computer and communications equipment costs. The second panel of table 12.5 demonstrates that this change would have no substantive impact on our conclusion, with virtually all of the difference in rates of decline in computer and communications equipment prices still explainable as the result of differing rates in semiconductor input price declines.

A second issue is our use of percentage rates of change to approximate the expression (dX/dt)(1/X).
Another equally credible approximation would be first differences in natural logarithms of X. For small changes, the two approximations should be quite close. For large changes, however (and some of our changes, exceeding 40 or 50 percent annually, are large), these two approximations could differ significantly. The bottom panel of table 12.5 shows that reworking our tables using first differences of logs (still expressed as a percentage, i.e., multiplied by 100) in lieu of percentage rates of change again leaves our conclusion unaltered. The difference in computer and communications equipment price declines is still entirely explainable by differing declines in semiconductor input prices. Note, moreover, that the share of equipment price changes explained by semiconductor prices increases when using first differences of log prices in our decomposition. Semiconductors now account for 45 to 66 percent of computer price change from 1997 to 1998; 23 to 41 percent of LAN equipment and 21 to 36 percent of LAN equipment and switches; and 30 to 39 percent of consumer audio price changes.
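The gap between the two approximations can be illustrated with a short calculation. The Python sketch below is not part of the original analysis; the function names and the sample price ratios are illustrative assumptions. It compares the percentage rate of change with the first difference of natural logarithms (multiplied by 100) for progressively larger price declines.

```python
import math

def percent_change(p0, p1):
    """Percentage rate of change from p0 to p1."""
    return 100.0 * (p1 - p0) / p0

def log_difference(p0, p1):
    """First difference of natural logarithms, expressed as a percentage."""
    return 100.0 * math.log(p1 / p0)

if __name__ == "__main__":
    # Hypothetical price ratios (new price / old price), not data from the tables.
    for ratio in (0.98, 0.90, 0.60, 0.50):
        pct = percent_change(1.0, ratio)
        dlog = log_difference(1.0, ratio)
        print(f"ratio {ratio:.2f}: percent change {pct:7.2f}, log difference {dlog:7.2f}")
```

For a 2 percent decline the two measures nearly coincide, whereas for a 50 percent decline the log difference is about –69. Large declines are therefore magnified under the log approximation, which is why the choice matters for annual changes of the size discussed here.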
The Role of Semiconductor Inputs in IT Hardware Price Decline
12.4 Conclusions
This paper documents findings obtained from a first effort at calculating industry-specific semiconductor input price indexes and assessing the proportionate impact of changes in this high-technology input price on the prices and quality improvement in two equally high-tech industries downstream. The quality of data on semiconductor and computer prices is now acceptable for these purposes, but information on semiconductor input expenditures in all sectors and quality-adjusted price indexes in sectors other than semiconductors, computers, and a small fraction of communications equipment remains marginal. Given the available data, we were able to construct a decomposition for the year 1998, the only year for which we felt we had relatively credible data on both semiconductor content and the price indexes for both inputs and end-use outputs.
Given these caveats, this initial analysis led us to two conclusions. First, from 1997 to 1998, changes in semiconductor input prices appear to account for between 20 and 30 percent of price declines in both consumer electronics and LAN equipment and for 40 to 60 percent of price declines in computers. If we were to perform our decomposition using differenced logarithms instead of percentage rates of change in our approximations, the role of semiconductors in accounting for declining product prices would be even greater. Second, in 1998, computer prices fell between 7 and 11 percentage points faster than communications equipment prices, depending on our measurement of communications price changes. Differences in the quantity and composition of semiconductors used in these two sectors alone would have contributed perhaps 10 to 14 percentage points to this differential.
To a first approximation, then (which is all we can reasonably expect given the poor quality of the available data), we conclude that differences in the composition of semiconductor input bundles, coupled with significant differences in the relative importance of semiconductor inputs in cost, can together potentially account for the entire difference in output price declines between the two sectors.
Appendix
Construction of the Semiconductor Input Price Indexes
Nominal Weights
We obtained data on nominal shipments of semiconductor devices broken out by end use from a survey sponsored by the World Semiconductor Trade Statistics (WSTS) program, a cooperative venture sponsored by national semiconductor industry associations around the world. Under their auspices, the U.S.-based Semiconductor Industry Association has conducted an annual semiconductor end-use survey among U.S. users since 1984; since 1992, this survey has effectively covered all major semiconductor producers globally. The survey, administered to semiconductor producers participating in the WSTS program, asks respondents to classify their total worldwide sales by customer end-use market and geographic location. Sales numbers for nonparticipants in the WSTS program are imputed. The data we use cover the period 1991 to 1999 and report nominal shipments to both North American end users and all (worldwide) users. The annual shipments (in thousands of dollars) for the world market are given in table 12A.1.
Nominal Weights for Microcomponents
An unfortunate feature of the data is that before 1995, industry consumption estimates for microprocessors (MPUs), microcontrollers (MCUs), and microperipherals (MPRs) are not reported separately; instead, they are lumped into one category called “MOS Micro.” For this earlier period, we assume the percentage breakdown among these subcategories within user industries of “MOS Micro” prior to 1995 was the same as in 1995.
Our results are not sensitive to this assumption. Table 12A.2 redoes table 12.2 in the paper employing an overall index for MOS Micro price aggregated across all user sectors over 1992 to 1994, in lieu of using a detailed sector-specific breakout of 1995 MOS Micro consumption as an approximation to weights for detailed (MPU, MCU, MPR) MOS Micro input price indexes prior to 1995. In the worldwide indexes, input chip prices for automotive end uses still experience the slowest declines, while computer chips still undergo the fastest: now –14 percent versus –31 percent CAGR over the period. Input prices for communications end uses still lie between the two extremes, falling at a CAGR of –17 percent over the period to 27 percent of their 1992 level by 1999. The North American indexes show a similar pattern.
Interestingly, approximating sector-specific consumption bundles within MOS Micro prior to 1995 substantially widens the price decline gap between computers and other semiconductor-user sectors (table 12.2 in the text). This occurs because the specific type of MOS Micro chip dominating computer use of these chips (MPU) fell in price much faster than other MOS Micro chip types (MCU, MPR) over 1992 to 1995; these other chips dominated consumption of MOS Micro in other sectors. The net effect of crediting MPU price declines mainly to computers, and reducing the weight of MPU declines in price indexes for other sectors, is to leave noncomputer
Table 12A.1
Nominal value of semiconductors consumed worldwide, by product class, 1992–1999 (in thousands of dollars)
| Product class | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 |
|---|---|---|---|---|---|---|---|---|---|
| Diodes and all other discretes | 1,341,463 | 1,290,629 | 1,498,533 | 1,747,473 | 2,465,981 | 2,189,285 | 2,262,636 | 2,144,643 | 2,429,508 |
| Small signal transistors | 1,803,269 | 1,783,320 | 1,979,948 | 2,432,565 | 3,309,019 | 2,884,870 | 2,756,933 | 2,374,300 | 2,752,609 |
| Power transistors | 2,489,270 | 2,629,819 | 3,015,544 | 3,704,908 | 5,181,568 | 4,936,068 | 5,083,619 | 4,616,964 | 5,404,166 |
| Rectifiers and thyristors | 1,912,197 | 1,909,873 | 2,142,621 | 2,596,973 | 3,048,455 | 2,868,492 | 3,061,730 | 2,787,425 | 2,796,614 |
| Optoelectronics | 2,421,766 | 2,297,378 | 2,654,118 | 3,238,387 | 4,343,561 | 4,146,750 | 4,505,929 | 4,617,216 | 5,777,794 |
| Digital bipolar | 3,421,608 | 3,147,449 | 3,149,852 | 2,773,665 | 2,773,878 | 1,925,660 | 1,594,019 | 1,099,712 | 990,300 |
| Analog | 8,335,914 | 8,728,687 | 10,673,019 | 13,585,169 | 16,646,353 | 17,043,805 | 19,788,937 | 19,072,955 | 22,081,701 |
| MOS MPU | 3,565,035 | 5,460,259 | 8,589,686 | 10,995,486 | 14,278,592 | 18,529,996 | 23,466,929 | 24,775,645 | 27,191,405 |
| MOS MCU | 4,851,901 | 5,245,160 | 6,560,368 | 8,276,384 | 10,735,795 | 11,435,438 | 12,622,903 | 12,115,824 | 14,083,190 |
| MOS MPR | 2,971,576 | 3,205,239 | 3,921,409 | 4,548,201 | 8,381,534 | 9,862,276 | 11,676,920 | 10,449,901 | 10,426,667 |
| Other MOS logic | 9,260,355 | 9,331,793 | 11,857,716 | 15,529,061 | 19,781,034 | 20,125,581 | 21,047,471 | 18,564,413 | 23,158,467 |
| MOS memory | 12,233,100 | 14,835,353 | 21,266,867 | 32,450,325 | 53,457,910 | 36,018,211 | 29,335,095 | 22,993,001 | 32,286,130 |
| Total semiconductor | 54,607,454 | 59,864,958 | 77,309,681 | 101,878,593 | 144,403,681 | 131,966,433 | 137,203,120 | 125,611,999 | 149,378,551 |
Source: Semiconductor Industry Association (2002a).
Table 12A.2
Semiconductor input price indexes calculated using aggregate MOS Micro Price Index, by end use, 1992–1999
Compound annual growth rate (percent)
| End use | 1992–1999 | 1992–1995 | 1995–1999 |
|---|---|---|---|
| Worldwide | | | |
| Auto | –13.66 | –7.28 | –18.16 |
| Communication | –16.33 | –5.96 | –23.34 |
| Computer | –31.33 | –8.57 | –44.60 |
| Consumer | –15.14 | –5.33 | –21.82 |
| Government | –17.80 | –4.36 | –26.62 |
| Industrial | –15.61 | –4.01 | –23.38 |
| North America | | | |
| Auto | –13.52 | –6.92 | –18.16 |
| Communication | –16.54 | –5.94 | –23.69 |
| Computer | –33.54 | –9.53 | –47.26 |
| Consumer | –16.72 | –6.15 | –23.85 |
| Government | –14.76 | –3.90 | –22.10 |
| Industrial | –16.21 | –4.52 | –24.02 |
Source: Authors’ calculations.
use semiconductor prices falling much less steeply over 1992 to 1995, while semiconductors used in computers fall even faster.
Price Relatives
Most of the price indexes we used for MOS devices are either taken from previous studies (Grimm [1998], Aizcorbe [2002], and Aizcorbe, Corrado, and Doms [2000]) or recalculated from the sources used in those studies. Where quarterly or monthly indexes (rather than annual ones) are reported in these sources, a variant of a “superlative” procedure suggested by Diewert (2000) is used to aggregate up to an annual price relative.21 Table 12A.3 summarizes features of the underlying price indexes we use for semiconductor devices. In most cases, the price measures are Fisher indexes calculated from highly detailed data. With regard to index construction, Fisher indexes are available for all but 16 percent of the market: price change for subcategories of Other MOS logic chips is measured using
21. Our use of the Törnqvist-Theil index number formula given in Diewert (his formula 26) is to calculate (for annual price of a product in year 1 relative to year 0, based on monthly price data):
$$\ln P^{1}(p^{0}, p^{1}, s^{0}, s^{1}) = \sum_{m} \frac{1}{2}\left(s^{0,m} + s^{1,m}\right) \ln\frac{p^{1,m}}{p^{0,m}},$$
where $s^{i,m}$ is the share of expenditure on the product in question in month m in annual expenditure in year i, and the index m refers to months. We have used this formula to construct annual price index relatives for adjoining years and then chained these to produce an index extending over the 1992 to 1999 period. See Diewert (2000, 9).
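The aggregation in note 21 can be sketched in a few lines of Python. This is an illustrative reading of the formula, not the authors' code: the function names, the argument layout (parallel lists of monthly prices and monthly expenditures), and the sample figures are all assumptions.

```python
import math

def tornqvist_annual_relative(prices_y0, prices_y1, spend_y0, spend_y1):
    """Annual price relative (year 1 over year 0) from monthly data.

    Each month's log price relative is weighted by the average of that
    month's share of annual expenditure in year 0 and in year 1.
    """
    total_y0 = sum(spend_y0)
    total_y1 = sum(spend_y1)
    log_relative = 0.0
    for p0, p1, v0, v1 in zip(prices_y0, prices_y1, spend_y0, spend_y1):
        share_y0 = v0 / total_y0
        share_y1 = v1 / total_y1
        log_relative += 0.5 * (share_y0 + share_y1) * math.log(p1 / p0)
    return math.exp(log_relative)

def chain(relatives, base=1.0):
    """Chain adjoining-year relatives into an index that starts at base."""
    index = [base]
    for relative in relatives:
        index.append(index[-1] * relative)
    return index

if __name__ == "__main__":
    # Hypothetical monthly prices and expenditures for one product.
    p_y0 = [10.0, 9.8, 9.6, 9.4, 9.2, 9.0, 8.8, 8.6, 8.4, 8.2, 8.0, 7.8]
    p_y1 = [7.6, 7.4, 7.2, 7.0, 6.8, 6.6, 6.4, 6.2, 6.0, 5.8, 5.6, 5.4]
    v_y0 = [100.0] * 12
    v_y1 = [120.0] * 12
    relative = tornqvist_annual_relative(p_y0, p_y1, v_y0, v_y1)
    print(chain([relative, relative]))
```

Because the monthly shares sum to one within each year, a uniform proportional price change in every month yields exactly that change as the annual relative, whatever the expenditure pattern.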
Table 12A.3
Price indexes for individual semiconductor devices: Underlying data
| Type of device | 1999 shares (%) | Index source | Price measure | Data frequency | Distinct devices | Time period |
|---|---|---|---|---|---|---|
| MOS memory chips | 21 | 2 | Fisher | Q/Ave | 84 | 1991–99 |
| Microprocessors | 18 | 1,3 | Fisher | Q/Ave | 85 | 1992–99 |
| Microcontrollers | 9 | 4 | Fisher | M/Ave | 5 | 1991–96 |
| Microcontrollers (continued) | | 2,4 | Fisher | M/A/Ave | 53 | 1996–99 |
| Microperipherals | 6 | 4 | Fisher | A/Ave | 5 | 1991–99 |
| Logic chips | 16 | 2 | | | | |
| Logic chips: General purpose logic | | | GeoMeans | A/end | 35 | 1991–99 |
| Logic chips: Gate array | | | GeoMeans | A/Ave | 63 | 1991–99 |
| Logic chips: Standard cell | | | GeoMeans | A/Ave | 56 | 1991–99 |
| Logic chips: Field programmable logic | | | GeoMeans | A/Ave | 14 | 1991–94 |
| Other integrated circuits, optoelectronics, and discrete devices | 36 | 2,4 | Fisher | M/Ave | 43 | 1991–99 |
Sources: 1. Grimm (1998); 2. Aizcorbe (2002); 3. Aizcorbe, Corrado, and Doms (2000); 4. Authors’ calculations.
Table 12A.4
Price indexes for the individual classes of MPR chips
| Component price indexes | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 |
|---|---|---|---|---|---|---|---|---|---|
| Chipsets | 118.1 | 100.0 | 100.4 | 80.9 | 102.4 | 124.4 | 79.5 | 76.6 | 42.3 |
| Comm ICs | 146.8 | 100.0 | 92.7 | 67.1 | 77.2 | 103.5 | 91.9 | 58.4 | 28.0 |
| Graphics ICs | 113.4 | 100.0 | 74.7 | 58.0 | 134.4 | 74.1 | 24.6 | 28.0 | 23.7 |
| Mass storage | 103.7 | 100.0 | 97.4 | 110.4 | 111.2 | 75.0 | 92.9 | 71.8 | 48.0 |
| Voice and other | 99.1 | 100.0 | 83.1 | 72.5 | 35.0 | 44.0 | 43.5 | 35.9 | 22.3 |
| Fisher Ideal Index | 116.8 | 100.0 | 88.8 | 73.0 | 99.6 | 97.9 | 65.8 | 57.5 | 35.0 |
Source: Authors’ calculations.
geometric means of price changes because only price data were available at the subcategory level.22 With regard to the underlying data, quality is not uniform: some indexes, such as that for microprocessors, are built from very detailed data (eighty-five or so types of chips). At the other extreme, about 36 percent of the market (the bottom row of table 12A.3) is measured using only forty-three classes of chips. As is well known, as the data become more coarse, it becomes less likely that the quality of chips in
22. The formula for a geometric mean of price change from time t – 1 to time t is
$$I_{t,t-1} = \prod_{i}\left(\frac{P_{i,t}}{P_{i,t-1}}\right)^{1/2}.$$
each class can be held constant over time, and price declines that signal technical change become confounded with price increases that reflect increases in quality. Similarly, some indexes are built using high-frequency data (monthly or quarterly), while others use annual data. While most measures are averaged over the reported period, the prices for general-purpose logic are year-end prices (the only way these data are reported). For microcontrollers from 1996 through 1999, a synthetic Fisher ideal index based on WSTS unit values for digital signal processors (DSPs) and Aizcorbe’s (2002) index for microcontrollers (excluding DSPs) over this period was constructed.
Adequate measures were not available for two types of devices. We filled in the gaps by comparing price movements for devices with missing periods with price movements in other categories when prices were available, then selecting the closest fit. For field programmable logic chips, adequate indexes are not available for 1995 to 1999, and we assumed that prices of these devices moved like a subindex of Other MOS logic excluding it (i.e., a Fisher index based only on General Purpose Logic, Gate Array, and Standard Cell devices) over 1995 to 1999. Indexes for microcontrollers were not available for the period before 1996. In that case, we used an average sales price available from the WSTS survey (the only available data). Because indexes for MPUs were only available beginning in 1993, estimates in Grimm (1998) were used to extend the microprocessor index back to 1991.
Table 12A.5 provides annual price indexes for all the devices. Two of these product classes required special treatment. We detail the methods and sources for those two indexes next.
Special Index for Microperipherals (MPR)
This index assumes chip quality is proportional to the number of transistors and other electronic components contained in a chip. The index effectively measures the price per two-dimensional feature (e.g., transistor) on a MOS microperipheral (MPR) chip. The starting point was WSTS data on the value of sales and the number of units sold over 1991 to 1999 for five classes of chips included within MOS MPR: chipsets, communications integrated circuits (ICs), graphics ICs, mass storage ICs, and voice and other ICs. Using data from Semico Research, SEMATECH has estimated the average line width per feature etched on each of these different types of chips and the average area of each of these classes of chips. Squaring line width gives an index of the minimum size for an electronic component etched on the surface of a chip, and dividing average chip area by this index yields an estimate of the maximum number of electronic components that fit on a chip with that area. Dividing average sales price per chip by the total number of electronic components then gives us an average price per electronic
Table 12A.5
Annual Fisher Ideal Price Index, by product class, 1992–1999
| Product class | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | CAGR (1991–99) |
|---|---|---|---|---|---|---|---|---|---|---|
| MOS MPU | 1.52 | 1.00 | 0.69 | 0.47 | 0.19 | 0.07 | 0.033 | 0.010 | 0.00 | –52.32 |
| MOS memory | 1.30 | 1.00 | 0.97 | 0.98 | 0.93 | 0.45 | 0.20 | 0.08 | 0.07 | –30.76 |
| MOS MPR | 1.17 | 1.00 | 0.89 | 0.73 | 1.00 | 0.98 | 0.66 | 0.57 | 0.35 | –13.98 |
| Other MOS logic | 1.11 | 1.00 | 0.96 | 0.90 | 0.84 | 0.72 | 0.66 | 0.43 | 0.36 | –13.16 |
| MOS MCU | 0.98 | 1.00 | 1.01 | 0.99 | 1.00 | 0.87 | 0.70 | 0.60 | 0.53 | –7.48 |
| Thyristors and rectifiers | 1.00 | 1.00 | 0.98 | 1.00 | 0.97 | 0.77 | 0.69 | 0.63 | 0.56 | –7.09 |
| Power transistors | 1.07 | 1.00 | 1.00 | 1.03 | 1.04 | 0.88 | 0.74 | 0.66 | 0.67 | –5.65 |
| Small signal transistors | 1.05 | 1.00 | 1.04 | 1.05 | 1.06 | 1.00 | 0.82 | 0.70 | 0.68 | –5.27 |
| Optoelectronics | 0.91 | 1.00 | 1.01 | 1.01 | 1.04 | 0.94 | 1.00 | 0.70 | 0.68 | –3.63 |
| Diode and all other discrete | 0.98 | 1.00 | 0.98 | 1.01 | 1.16 | 1.06 | 0.93 | 0.82 | 0.79 | –2.60 |
| Digital bipolar | 0.87 | 1.00 | 1.08 | 1.12 | 1.08 | 0.93 | 0.73 | 0.71 | 0.92 | 0.57 |
| Analog | 0.95 | 1.00 | 1.07 | 1.16 | 1.23 | 1.27 | 1.18 | 1.09 | 1.06 | 1.40 |
| WSTS: All analog | 0.95 | 1.00 | 1.07 | 1.16 | 1.23 | 1.27 | 1.18 | 1.09 | 1.06 | 1.40 |
| WSTS: Low-tech | 1.00 | 1.00 | 1.07 | 1.21 | 1.23 | 1.20 | 1.09 | 1.04 | 1.05 | 0.63 |
| WSTS: High-tech | 0.92 | 1.00 | 1.07 | 1.13 | 1.24 | 1.30 | 1.18 | 1.07 | 1.02 | 1.22 |
| Hybrid | 1.07 | 1.00 | 1.00 | 1.00 | 0.95 | 0.85 | 0.78 | 0.57 | 0.50 | –8.99 |
| Other MOS logic | 1.11 | 1.00 | 0.96 | 0.90 | 0.84 | 0.72 | 0.66 | 0.43 | 0.36 | –13.16 |
Source: Authors’ calculations.
Note: CAGR = compound annual growth rate.
component on a chip, which we interpret as a quality-adjusted price index within each of our five classes of MPR chips. We then calculate WSTS revenue share data and price relatives for each of these five classes of MPR chips over the 1991 to 1999 period. Construction of a Fisher ideal price index for the MPR chip category is straightforward, using equation (1) in the text. As shown in table 12A.4, the resulting Fisher index falls substantially over this period, to less than one-third of its 1991 value by 1999.
Special Index for Analog Devices
We next detail construction of the hybrid index we use for these devices. While we measure price change for the low-tech devices in this class using the available WSTS unit value data, we assume that the price change for the high-tech devices in this class parallels that of devices in the “Other MOS logic” class of chips and average over the two indexes using Fisher weights to obtain the hybrid index.
Table 12A.6 compares alternative assumptions to measure price change of analog devices. The measure labeled “WSTS” is constructed using the
Table 12A.6
Alternative price indexes for analog devices, 1992–1999
Compound annual growth rate (percent)
| Index | 1991–1999 | 1991–1995 | 1995–1999 |
|---|---|---|---|
| WSTS: All analog | 1.40 | 6.85 | –3.77 |
| WSTS: High tech | 1.22 | 7.67 | –4.83 |
| WSTS: Low tech | 0.63 | 5.36 | –3.88 |
| Hybrid index | –8.99 | –2.86 | –14.73 |
| Other MOS logic | –13.16 | –6.76 | –19.13 |
Source: Authors’ calculations.
very coarse WSTS data: the index is an annual Fisher index derived from monthly average unit sales prices for between five and eleven classes of analog chips, depending on the time period. This can safely be viewed as a conservative estimate of price declines for these devices. At the other extreme, the measure labeled “Other MOS Logic” assumes the deflator for analog devices is equal to the deflator for other MOS logic, a category of MOS semiconductor chip with price declines intermediate between the highest volume, leading-edge technology used in memory and microprocessors and the relatively mature technology used in non-MOS devices and discrete semiconductors.
The hybrid index is a Fisher index of two Fisher indexes. The index for high-tech analog devices uses the Fisher index for other MOS logic to represent price change; the index for low-tech analog devices is a Fisher index of a low-tech subset of WSTS analog product categories (shown in line 3).23 We believe this index is likely to be a better approximation to reality. Annual measures corresponding to the alternative cases are given in table 12A.5.
Calculations for the Relative Importance of Semiconductor Inputs
Recall that we estimate the semiconductor share of variable cost in two steps. First, we gather together industry estimates of the share of semiconductor inputs in the value of shipments of each end-use device. Then we employ data from the Census Bureau’s Annual Survey of Manufactures (U.S. Bureau of the Census [2000]) to translate semiconductors’ share of shipments into their share of unit variable cost.
23. Low-tech analog chips are those included in the WSTS categories for amplifiers, interface, voltage regulators and references, and data conversion circuits; high-tech analog chips are those in the special consumer circuits, comparators, and other linear devices categories.
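The two-stage construction of the hybrid analog index (a Fisher index of two sub-indexes) can be sketched as follows. This is a generic Fisher ideal aggregation written in Python under stated assumptions, not a reproduction of equation (1) in the text, which is not shown in this appendix: the function name is hypothetical, the inputs are year-over-year price relatives for each component with nominal expenditures in the base and current years, and all sample numbers are invented.

```python
import math

def fisher_from_relatives(relatives, base_values, current_values):
    """Fisher ideal price relative for an aggregate of components.

    relatives      -- price relative (current / base) of each component
    base_values    -- nominal expenditure on each component, base year
    current_values -- nominal expenditure on each component, current year
    """
    base_total = sum(base_values)
    current_total = sum(current_values)
    # Laspeyres: base-year expenditure shares weight the price relatives.
    laspeyres = sum(v / base_total * r
                    for v, r in zip(base_values, relatives))
    # Paasche: harmonic mean of relatives with current-year shares.
    paasche = 1.0 / sum(v / current_total / r
                        for v, r in zip(current_values, relatives))
    # Fisher ideal index: geometric mean of the two.
    return math.sqrt(laspeyres * paasche)

if __name__ == "__main__":
    # Hypothetical inputs: a low-tech relative (from unit values) and a
    # high-tech relative (borrowed from another device class).
    hybrid = fisher_from_relatives(
        relatives=[1.05, 0.70],
        base_values=[60.0, 40.0],
        current_values=[65.0, 30.0],
    )
    print(round(hybrid, 4))
```

The result always lies between the smallest and largest component relative, so a hybrid built this way falls between the low-tech and high-tech indexes, as the hybrid row does relative to the alternative measures it is compared with.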
Table 12A.7
Estimates of semiconductor content as percentage of value of product
Automotive DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp Communications DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp Computers DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp Consumer Electronics DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp Government DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp Industrial DQ Cons/DQ Eqp WSTS/EIO WSTS/DQ Eqp
1998
1999
2000
18 19 15
21
16 15
17 13 12
19
26 23 24
30
20 22
13 12 11
15
11 11
4 1 2
5
2 2
9 8 8
0
8 9
11 11
17
16
26
15
2
10
Sources: Semiconductor consumption by user sector: DQ Cons = Dataquest-Gartner Group, Semiconductor Product Trends in 2000, 7/31/2000 (Olsson 2001); WSTS = World Semiconductor Trade Statistics, Semiconductor Industry End-Use Survey (Semiconductor Industry Association 2002c). Value of equipment production by industry: DQ Eqp = Dataquest-Gartner Group, Semiconductor Product Trends in 2000, 7/31/2000 (Olsson 2001); EIO = Electronic Industry Outlook, Fourth Quarter, 1998 (Electronics Outlook Corporation 1998).
Table 12A.7 pulls together a range of estimates of the semiconductor content of computers, communications equipment, and consumer electronics assembled from proprietary industry estimates and the WSTS semiconductor consumption estimates used in constructing our price indexes. The sources are denoted as follows: DQ Cons and DQ Eqp refer to Dataquest-Gartner Group, Semiconductor Product Trends in 2000, July 31, 2000; WSTS refers to the WSTS Semiconductor Industry End-Use Survey, various years; and EIO stands for the Electronic Industry Outlook (Electronics Outlook Corporation 1998). The ratio of shipments to variable cost is based on data reported in the 1998 U.S. Annual Survey of Manufactures. We estimate the markup of shipment price over unit variable cost as shipments divided by shipments less nonlabor value added (i.e., shipments/[shipments – value added + payroll]).
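The two-step translation from a share of shipments to a share of variable cost can be written out as a small calculation. The Python sketch below follows the definition in the preceding paragraph; the function names and the industry figures are hypothetical and are not taken from the Annual Survey of Manufactures.

```python
def markup(shipments, value_added, payroll):
    """Ratio of shipments to variable cost.

    Variable cost is approximated as shipments less nonlabor value added,
    where nonlabor value added = value added - payroll.
    """
    variable_cost = shipments - (value_added - payroll)
    return shipments / variable_cost

def share_of_variable_cost(share_of_shipments, shipments, value_added, payroll):
    """Scale a share of shipments up to a share of variable cost."""
    return share_of_shipments * markup(shipments, value_added, payroll)

if __name__ == "__main__":
    # Hypothetical industry: shipments 100, value added 40, payroll 15.
    print(markup(100.0, 40.0, 15.0))
    print(share_of_variable_cost(0.24, 100.0, 40.0, 15.0))
```

With these invented figures, variable cost is 75, so a semiconductor content of 24 percent of shipments corresponds to roughly 32 percent of variable cost. Because the markup is at least one whenever nonlabor value added is nonnegative, the cost-based share is never smaller than the shipments-based share.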
Table 12A.8
Annual Fisher Ideal Price Index, by end use industry, 1992–1999
Deflator
| End use | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 |
|---|---|---|---|---|---|---|---|---|
| Worldwide | | | | | | | | |
| Auto | 1 | 0.96 | 0.92 | 0.89 | 0.72 | 0.59 | 0.45 | 0.40 |
| Communication | 1 | 0.97 | 0.94 | 0.90 | 0.69 | 0.54 | 0.37 | 0.31 |
| Computer | 1 | 0.91 | 0.83 | 0.70 | 0.39 | 0.22 | 0.10 | 0.07 |
| Consumer | 1 | 0.98 | 0.96 | 0.93 | 0.73 | 0.58 | 0.40 | 0.35 |
| Government | 1 | 0.98 | 0.96 | 0.91 | 0.65 | 0.48 | 0.32 | 0.26 |
| Industrial | 1 | 0.97 | 0.95 | 0.90 | 0.68 | 0.52 | 0.36 | 0.31 |
| North America | | | | | | | | |
| Auto | 1 | 0.96 | 0.91 | 0.87 | 0.71 | 0.58 | 0.44 | 0.39 |
| Communication | 1 | 0.97 | 0.93 | 0.90 | 0.68 | 0.53 | 0.36 | 0.31 |
| Computer | 1 | 0.89 | 0.80 | 0.65 | 0.35 | 0.19 | 0.09 | 0.05 |
| Consumer | 1 | 0.98 | 0.95 | 0.94 | 0.70 | 0.53 | 0.37 | 0.31 |
| Government | 1 | 0.98 | 0.96 | 0.90 | 0.70 | 0.56 | 0.39 | 0.33 |
| Industrial | 1 | 0.96 | 0.93 | 0.88 | 0.67 | 0.51 | 0.35 | 0.29 |
Source: Authors’ calculations.
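Compound annual growth rates of the kind reported in this appendix can be recovered from the endpoint values of a deflator series. The Python sketch below uses the standard CAGR definition; the function name is an assumption, and the inputs are the rounded worldwide computer deflators for 1992 and 1999 from table 12A.8, so the result will not match to the second decimal any rate computed from unrounded data.

```python
def cagr_percent(start_value, end_value, years):
    """Compound annual growth rate, in percent, over the given number of years."""
    return 100.0 * ((end_value / start_value) ** (1.0 / years) - 1.0)

if __name__ == "__main__":
    # Worldwide computer deflator: 1.00 in 1992, 0.07 in 1999 (seven years).
    print(round(cagr_percent(1.00, 0.07, 7), 1))
```

Applied to the rounded computer deflators, the calculation gives a decline of roughly 32 percent per year, several times the pace implied by the automotive series over the same period.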
Data Sources for End-Use Prices
We measured computer prices using the matched-model price indexes in Aizcorbe, Corrado, and Doms (2000). Although computers are relatively well measured now, quality adjustment of prices for communications equipment and consumer electronics is problematic.
For communications equipment, we formed a crude measure of quality-adjusted communications equipment price change in 1998 using the available data. We started with the estimates of quality-adjusted LAN equipment prices for 1992–present that are now available from the Federal Reserve Board. (See also table 12A.8.) For the period prior to 1996, we examined hedonic estimates of digital switch prices reported in Grimm (1996). We then used the historical ratio between quality-adjusted price changes for digital switches and quality-adjusted LAN equipment price changes over 1992 to 1996, multiplied by LAN equipment price changes in 1998, as a crude estimate of switch price changes in 1998. Finally, we averaged switch and LAN equipment price changes using relative expenditure in 1998 as weights and used the resulting calculation as our measure of quality-adjusted communications equipment price change in 1998. (Note, however, that these two categories of equipment accounted for only 30 percent of communications equipment spending in 1998).24
To measure price change for the consumer electronics sector, we found only one study of quality-adjusted prices for consumer electronics with a
24. See Doms and Forman (2003), table 1.
methodology that seems roughly comparable to those for computers and communications.25 That study pertains to consumer audio equipment only, and we can only hope that our consumer electronics prices are roughly comparable.
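The crude communications equipment measure described above (scaling the 1998 LAN price change by a historical switch-to-LAN ratio, then averaging with expenditure weights) can be sketched in Python. The function names are hypothetical and every number below is invented for illustration; none is taken from the Federal Reserve Board, Grimm (1996), or Doms and Forman (2003).

```python
def estimate_switch_change(hist_switch_change, hist_lan_change, lan_change_1998):
    """Scale the 1998 LAN price change by the historical switch/LAN ratio."""
    ratio = hist_switch_change / hist_lan_change
    return ratio * lan_change_1998

def weighted_average_change(changes, expenditures):
    """Expenditure-weighted average of percentage price changes."""
    total = sum(expenditures)
    return sum(c * e for c, e in zip(changes, expenditures)) / total

if __name__ == "__main__":
    # Hypothetical inputs: average annual percent changes over 1992 to 1996,
    # the 1998 LAN change, and 1998 expenditure on LAN equipment and switches.
    lan_1998 = -16.0
    switch_1998 = estimate_switch_change(-10.0, -20.0, lan_1998)
    combined = weighted_average_change([lan_1998, switch_1998], [1.0, 3.0])
    print(switch_1998, combined)
```

In this invented case switches historically fell half as fast as LAN equipment, so the estimated 1998 switch change is –8 percent and the weighted average is –10 percent. The estimate is only as good as the assumption that the historical ratio carried over to 1998.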
25. See Kokoski, Waehrer, and Rozaklis (2000), table 9.
References
Aizcorbe, A. M. 2002. Price measures for semiconductor devices. FEDS Working Paper no. 2002-13. Washington, DC: Federal Reserve Board of Governors, January.
Aizcorbe, A. M., C. Corrado, and M. Doms. 2000. Constructing price and quality indexes for high technology goods. Paper presented at the CRIW-NBER Summer Institute, session on Price, Output, and Productivity Measurement, Cambridge, MA.
Aizcorbe, A. M., S. Oliner, and D. Sichel. 2003. Semiconductor price puzzles. Paper presented at the CRIW-NBER Summer Institute, session on Price, Output, and Productivity Measurement, Cambridge, MA.
Basu, S., and J. G. Fernald. 1997. Returns to scale in U.S. production: Estimates and implications. Journal of Political Economy 105 (2): 249–83.
Berndt, E. R., and Z. Griliches. 1993. Price indexes for microcomputers: An exploratory study. In Price measurements and their uses, ed. M. Foss, M. Manser, and A. Young, 63–89. Chicago: University of Chicago Press.
Berndt, E. R., Z. Griliches, and N. J. Rappaport. 1995. Econometric estimates of price indexes for personal computers in the 1990s. Journal of Econometrics 68 (1): 243–69.
Berndt, E. R., and N. J. Rappaport. 2001. Price and quality of desktop and mobile personal computers: A quarter-century historical overview. American Economic Review 91 (2): 268–73.
Cole, R., Y. C. Chen, J. A. Barquin-Stolleman, E. Dulberger, N. Helvacian, and J. H. Hodge. 1986. Quality-adjusted price indexes for computer processors and selected peripheral equipment. Survey of Current Business 66 (January): 41–50.
Corrado, C. 2001. Industrial production and capacity utilization: The 2000 annual revision. Federal Reserve Bulletin 37 (3): 132–49.
Crandall, R. 2001. Comments on Dan Sichel, “Productivity in communications: An overview of the issues,” presented at Brookings workshop on Communications Output and Productivity, Washington, DC. http://www.brook.edu/es/research/projects/productivity/workshops/20010223/20010223.htm.
Diewert, W. E. 2000. Notes on producing an annual superlative index using monthly price data. University of British Columbia, Department of Economics, Discussion Paper no. 00-08.
Doms, M. 2003. Communications equipment: What has happened to prices? San Francisco: Federal Reserve Bank of San Francisco.
Doms, M., and C. Forman. 2003. Prices for local area network equipment. Working Paper no. 2003-13. San Francisco: Federal Reserve Bank of San Francisco.
Dulberger, E. 1993. Sources of price decline in computer processors. In Price measurements and their uses, ed. M. Foss, M. Manser, and A. Young, 103–24. Chicago: University of Chicago Press.
Electronics Outlook Corporation. 1998. Electronic industry outlook, fourth quarter, 1998. San Francisco: Electronics Outlook Corporation, January.
Flamm, K. 1989. Technological advance and costs: Computers versus communications. In Changing the rules: Technological change, international competition, and regulation in communications, ed. R. W. Crandall and K. Flamm, 13–61. Washington, DC: Brookings Institution.
———. 1993. Measurement of DRAM prices: Technology and market structure. In Price measurements and their uses, ed. M. Foss, M. Manser, and A. Young, 157–97. Chicago: University of Chicago Press.
———. 1996. Mismanaged trade? Strategic policy and the semiconductor industry. Washington, DC: Brookings Institution.
———. 1999. Digital convergence? The set-top box and the network computer. In Competition, innovation and the Microsoft monopoly: Antitrust in the digital marketplace, ed. J. A. Eisenach and T. M. Lenard, 255–90. Boston: Kluwer Academic.
———. 2001. Moore’s Law and the economics of semiconductor price trends. Paper presented at NBER Productivity Program meeting, Cambridge, MA.
Gordon, R. J. 1990. Telephone transmission and switching apparatus. In The measurement of durable goods prices, 395–404. Chicago: University of Chicago Press.
———. 2000. Does the “new economy” measure up to the great inventions of the past? Journal of Economic Perspectives 14 (4): 3–22.
Grimm, B. T. 1996. Quality adjusted price indexes for digital telephone switches. Bureau of Economic Analysis. Mimeograph.
———. 1998. Price indexes for selected semiconductors, 1974–96. Survey of Current Business 78 (February): 8–24.
Irwin, D. A., and P. Klenow. 1994. Learning by doing spillovers in the semiconductor industry. Journal of Political Economy 102 (6): 1200–1227.
Jorgenson, D. W. 2001. Information technology and the U.S. economy. American Economic Review 91 (1): 1–32.
Jorgenson, D. W., and K. J. Stiroh. 2000. Raising the speed limit: U.S. economic growth in the information age. Brookings Papers on Economic Activity, Issue no. 1:71–150. Washington, DC: Brookings Institution.
Kokoski, M., K. Waehrer, and P. Rozaklis. 2000. Hedonic approaches to quality adjustment in the CPI for consumer audio products. Division of Price and Index Number Research Working Paper no. 344. Washington, DC: U.S. Bureau of Labor Statistics.
Morrison, C. J. 1992. Unraveling the productivity growth slowdown in the United States, Canada, and Japan: The effects of sub-equilibrium, scale economies and markups. Review of Economics and Statistics 74 (3): 381–93.
Norsworthy, J. R., and S. L. Jang. 1993. Cost function estimation of quality change in semiconductors. In Price measurements and their uses, ed. M. Foss, M. Manser, and A. Young, 125–55. Chicago: University of Chicago Press.
Oliner, S. D., and D. E. Sichel. 2000. The resurgence of growth in the late 1990s: Is information technology the story? Journal of Economic Perspectives 14 (4): 3–22.
Olsson, M. 2001. Semiconductor product trends in 2000. Dataquest-Gartner Group Report no. SCSI-WW-MT-0104. San Jose, CA: Dataquest-Gartner Group, September.
Semiconductor Industry Association. 2002a. Historic year-end bluebook, 2002. http://www.semichips.org/pre_statistics.cfm.
———. 2002b. Product definitions for billings. Semiconductor Industry Association Mimeograph. http://www.semichips.org/pre_stat.cfm?ID-31.
———. 2002c. WSTS End-Use Survey. http://www.semichips.org/pre_statistics.cfm.
Sichel, D. E. 2001. Productivity in communications: An overview of the issues. Paper presented at Brookings workshop on Communications Output and Productivity, Washington, DC. http://www.brook.edu/es/research/projects/productivity/workshops/20010223/20010223.htm.
Song, M. 2003. Measuring consumer welfare in the CPU market: An application of the pure characteristics demand model. Paper presented at NBER Productivity Program meeting, Cambridge, MA.
Triplett, J. E. 1989. Price and technological change in a capital good: A survey of research on computers. In Technology and capital formation, ed. D. W. Jorgenson and R. Landau, 126–213. Cambridge, MA: MIT Press.
———. 1996. High tech industry productivity and hedonic price indexes. Paper presented at OECD expert workshop, Industry Productivity, International Comparison and Measurement Issues, Paris. http://www.oecd.org/document/33/0,2340,en_2649_201185_1825441_1_1_1_1,00.html.
U.S. Bureau of the Census. 2000. Statistics for industry groups and industries: 1998. Report no. M98(AS)1. Washington, DC: U.S. Department of Commerce, February.
U.S. Bureau of Economic Analysis. 2004. Survey of current business. Washington, DC: U.S. Department of Commerce, May.
U.S. Congressional Budget Office. 2001. The need for better price indices for communications investment. CBO Report. Washington, DC: Congressional Budget Office, June.
13 Computer Input, Computer Networks, and Productivity B. K. Atrostic and Sang Nguyen
B. K. Atrostic is a senior economist at the Center for Economic Studies at the U.S. Census Bureau. Sang Nguyen is a senior economist at the Center for Economic Studies at the U.S. Census Bureau. This paper reports the results of research and analysis undertaken by the authors. It has undergone a more limited review than official publications. Opinions expressed are those of the authors and do not necessarily represent the official position of the U.S. Census Bureau. This report is distributed to inform interested parties of research and to encourage discussion.

We have benefited from comments by Randy Becker and Jack Triplett; Mark Roberts, the editors, and other participants at the NBER Conference on Research in Income and Wealth on Hard-to-Measure Goods and Services: Essays in Memory of Zvi Griliches; seminar participants at the National Institute for Social and Economic Research, London, England, March 2004; and the referees; but all errors are, of course, the authors’ own.

13.1 Introduction

Computer networks may be a new technology that shifts the production function. Our previous research (Atrostic and Nguyen 2005) found a positive and significant relationship between computer networks and labor productivity in U.S. manufacturing, using the first survey data on the presence of computer networks in manufacturing plants, collected in the 1999 Computer Network Use Survey (CNUS). We controlled for other inputs to production, plant characteristics, and the endogeneity of computer networks. However, because no data to proxy for the capital stocks of computers were available, our previous estimate of the relationship between computer networks and plants’ labor productivity may be subject to an omitted variable bias.

This paper extends our previous model to include computer capital as a separate input in the production function. We use new plant-level data on computer investment from the 2000 Annual Survey of Manufactures (ASM) to develop a proxy for computer capital input. An important contribution of this paper is to define the sample for which the measures of computer and conventional physical capital available in the data (computer investment and book value) are good proxies for these inputs. We show that these measures are good proxies only for plants that are new. For new plants, computer investment should equal the value of the plant’s computer stock, and book values of buildings and machinery equal the value of the plant’s physical capital stock. We create a sample of new plants with the best proxies possible with the available data. Using this sample, we find positive and significant relationships between labor productivity and both computer networks and computer capital inputs. Our findings suggest that understanding the relationship between computers and productivity requires measures of how businesses use computers.

13.2 Computers, Computer Networks, and Productivity: Measurement Issues

Estimating plant-level relationships among computers, computer networks, and productivity requires overcoming many empirical challenges. Researchers must address the substantial standard measurement issues that arise in using plant-level data (see Griliches 1994; Griliches and Mairesse 1995). Serious data gaps specific to the quest to understand the economic role of computers, electronic devices, and computer networks plague the resulting empirical literature on computers and productivity (see, for example, Atrostic, Gates, and Jarmin 2000; Haltiwanger and Jarmin 2000). These gaps likely contribute to its divergent findings on data issues (Stiroh 2004). In this section, we focus on three specific measurement issues (measuring capital inputs in general, measuring computer inputs, and defining a sample with good measures of both) and on using that sample to estimate the relationship between computer networks and productivity.

13.2.1 Measuring Capital Input

Our productivity model requires a measure of capital inputs or capital services.
Such a measure, or the data needed to create it, is hard to get directly. Researchers have developed ways to use the information that is typically available to create proxies for capital inputs that are widely used in both time series and cross-section analyses. However, our data lack the information needed to create these standard proxies.

For time series analysis, a measure of capital services can be generated from information on the capital stock. The perpetual inventory method usually builds the capital measure up from data on capital investments, depreciation, and asset prices. That is, $K_t = K_0 + \sum_{\tau=0}^{t-1} \Delta K_\tau$, where $\Delta K_\tau = (I_\tau - D_\tau)/P_\tau$ and $I$, $D$, and $P$ denote capital investments, depreciation, and asset prices. A problem with the perpetual inventory method, especially at the plant level, is the lack of plant-level data on depreciation and asset prices.

An alternative method uses the book value of capital as a proxy for the capital stock. An advantage of this approach is that book values are frequently collected directly from respondents. A major shortcoming of book values is that they are evaluated at the purchase prices, regardless of when the capital good was bought. Book values therefore reflect the true value of capital stocks only for special cases.1 The plant-level study by Baily, Hulten, and Campbell (1992) finds that both perpetual inventory and book value measures lead to similar empirical results for topics such as productivity dispersion. Doms (1996) also finds that book value and service flow measures yield similar results for a specific set of advanced technologies. Because of these empirical regularities, many researchers using plant-level data (e.g., Doms, Dunne, and Troske 1997; McGuckin, Streitwieser, and Doms 1998; Dunne et al. 2000; Greenan, Mairesse, and Topiol-Bensaid 2001) use the book values of the plant’s total capital stock directly as a proxy for service flows. Stiroh’s recent analysis (2004) also finds little empirical difference between the two measures.

For cross-section analysis, it is often impossible to construct a measure of capital services using the perpetual inventory method because the necessary time series of capital investment data are not available. Empirical cross-section studies often use book values of the capital stock as a proxy for capital services. Using book values requires assuming capital input is proportional to book values. This assumption may be correct if all plants in the sample are the same age. But this is not likely to be the case.
Because book values are evaluated at the purchase price and plants in the sample differ in age, book values of capital seriously mismeasure the plant’s capital inputs.

Data gaps for recent years make it more difficult to use either perpetual inventory or book value measures of capital (see table 13A.1). Book values of physical capital (buildings and machinery) are now collected less frequently in U.S. manufacturing and for a smaller group of plants. Book values were collected annually in both the Census of Manufactures (CM) and ASM until 1986. Since then, these data have been collected only in the Economic Census years (e.g., 1987, 1992, and 1997), and only for the plants that are in the ASM sample in those years. The ASM sample is roughly 55,000 plants, far smaller than the roughly 350,000 plants in the 1997 CM (U.S. Census Bureau 2001).
1. To alleviate this problem, researchers often use plant ages and other plant characteristics as controls in their regression models when using book values as a proxy for capital inputs.
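The perpetual inventory calculation described in this section can be illustrated with a minimal Python sketch. The function name, the list-based inputs, and all numbers are hypothetical. The gross book value is computed here simply as initial capital plus cumulated nominal investment, which is an assumption made for illustration and not an official accounting definition.

```python
def perpetual_inventory(k0, investment, depreciation, prices):
    """Return [K_0, ..., K_t], where K_t = K_0 + sum over tau < t of (I_tau - D_tau) / P_tau."""
    stocks = [k0]
    for inv, dep, price in zip(investment, depreciation, prices):
        # Real net addition to the capital stock in this period.
        stocks.append(stocks[-1] + (inv - dep) / price)
    return stocks


# Hypothetical plant: three years of nominal investment, depreciation, and asset prices.
investment = [20.0, 0.0, 30.0]
depreciation = [10.0, 10.0, 12.0]
prices = [1.0, 1.0, 1.5]

stocks = perpetual_inventory(100.0, investment, depreciation, prices)

# Illustrative gross book value: purchases valued at historical prices, no depreciation
# (an assumption for this sketch only).
book_value = 100.0 + sum(investment)

print(stocks)
print(book_value)
```

In this made-up example the book value diverges from the perpetual inventory stock as the plant ages, which is the kind of mismeasurement the text attributes to using book values for plants of different ages.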
13.2.2 Measuring Computer Input

Computers should be treated as a separate input in production and productivity analysis, as suggested by studies such as Jorgenson and Stiroh (2000) and Oliner and Sichel (2000). Computer services are the theoretically appropriate measure of computer input. Computer services, like other capital services, are not observed, and measures approximating this service flow must be constructed. Computer service flows are normally estimated from measures of the computer capital stock in aggregate and industry-level productivity studies (e.g., Jorgenson, Ho, and Stiroh 2005; Triplett and Bosworth 2003). However, book values of computer capital are not collected in government data, so studies using plant-level data often approximate computer service flows with measures of computer investment.

Investment has been used as a measure of the presence of computers, or of computer intensity, or as a measure of the intensity of technology use in many recent plant-level studies. Computer investment is used as a proxy for computer input in the plant in Berman, Bound, and Griliches (1994). Doms, Dunne, and Troske (1997) control for computer investment in their analysis of how adopting various technologies affects a series of plant-level economic outcomes. Dunne et al. (2000) examine the role of computer investment in the dispersion of productivity and wages in U.S. manufacturing. Haltiwanger, Jarmin, and Schank (2003) use computer investment as a factor separate from total equipment investment in estimating productivity.

Computer investment is a good proxy for computer capital stock under the assumption that this investment is equal or proportional to a plant’s stock of computer capital. This assumption allows researchers to use the only measure at hand. However, it may not be correct. Total plant-level investment typically is lumpy, while service flows are not.
Cooper, Haltiwanger, and Power (1999) find that plant-level investment surges are followed by periods of low investment. Becker et al. (2004) look at more recent data and find investment spikes in both firm- and plant-level data for investment in general. Recent research by Wilson (2004) suggests that firm-level investment may be lumpy across specific kinds of investment, including computers and communications equipment. However, this result is based on the single available cross section of detailed investment data, so the lumpiness of investment can only be defined in terms of the share of a firm’s investment in specific kinds of capital goods, rather than variation over time in the amount and kind of investment.

Actual investment to make computers usable in the workplace (coinvention) may be less lumpy than measured computer equipment investment. Coinvention includes expenditures developing and implementing software that engages and connects computers and adapts them to plant-specific
uses (e.g., Bresnahan and Greenstein 1997), as well as changes in workplace organization, management, and other organizational capital that make more effective use of computers, labor, and other inputs (e.g., Brynjolfsson and Hitt 2003). Some of these expenditures may be capitalized, but others may be expensed. Coinvention may continue in periods when there is no investment in computer hardware and software. Because the scale of coinvention over the life of the computer asset can be as much as the original computer equipment investment (Bresnahan and Greenstein 1997), or up to ten times the investment in computer hardware (Brynjolfsson and Hitt 2003), the joint effect may be to smooth or exacerbate investment lumpiness. These unmeasured complementary computer investments may cause estimated returns to measured computer investments to exceed actual returns to measured computer investments, particularly in the long run (e.g., Brynjolfsson and Hitt 2003). However, any effect of coinvention on actual computer investment will not be captured in our measure because only data for investments in computer hardware and peripherals are collected in the 2000 ASM.

Data gaps for recent years also limit the available computer investment data. These data are collected only occasionally and in recent years were not collected at the same time as book values of capital. While computer investment data were collected in the CM for 1977 through 1992, they were not collected at all in 1997 (when book values of physical capital were collected) and were only collected again in the ASM in 2000 and 2001 (when book values of physical capital were not collected). The lumpiness of plant- and firm-level investment means that investment data for a single year are not a good proxy for the plant’s (or firm’s) stock, except for new plants. In a new plant, capital investment would be equal to the value of the plant’s capital stock.
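The argument that single-year investment is a poor proxy for the stock, except in a new plant, can be made concrete with a small simulation. This is only a sketch under stated assumptions: the investment path is invented, and the stock is accumulated with a constant geometric depreciation rate, which is a simplification chosen for illustration and not a method taken from the data sources discussed here.

```python
def stock_path(investment, dep_rate):
    """Accumulate a capital stock from yearly investment, assuming geometric depreciation."""
    stock = 0.0
    path = []
    for inv in investment:
        # Surviving share of last year's stock plus this year's investment.
        stock = (1.0 - dep_rate) * stock + inv
        path.append(stock)
    return path


# Hypothetical lumpy computer investment: a spike when the plant opens, then every third year.
lumpy_investment = [100.0, 0.0, 0.0, 60.0, 0.0, 0.0, 60.0]
stocks = stock_path(lumpy_investment, dep_rate=0.3)

# Ratio of single-year investment to the stock in the same year.
ratios = [inv / k for inv, k in zip(lumpy_investment, stocks)]

for year, (inv, k, r) in enumerate(zip(lumpy_investment, stocks, ratios)):
    print(f"year {year}: investment={inv:6.1f} stock={k:6.1f} ratio={r:.2f}")
```

In the opening year the ratio is exactly one, so investment equals the stock. In later years the ratio swings between zero and values well below one, so a fixed proportionality between one year of investment and the stock would be hard to defend for older plants.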
13.2.3 Developing a Sample with Good Proxies for Computer and Capital Inputs

The data gaps for recent years make it difficult to argue that the data we have available on book values of capital and computer investment provide equally plausible proxies for total capital and computer services for all plants that responded to the CNUS. In this paper, we develop the best sample of CNUS respondents that our measures of computer and total capital allow us to make: a sample of plants that first appeared in the 1997 CM. When a plant is new, the book values of physical capital (buildings and machinery) and computer investment should equal the value of the plant’s capital stock and the plant’s computer capital stock, respectively.

If we were estimating productivity in 1997, it would be straightforward to use the book values of total capital that these new plants report in the 1997 CM as a proxy for their total capital services in 1997, making the standard assumption that capital services are proportional to the value of the
capital stock. That is, for physical capital, $K_{T,1997} = BV_{T,1997}$, where $K$ is the value of the plant’s total physical capital stock, $T$ indexes total capital, and $BV$ is book value.2 However, we estimate productivity in 2000 (rather than 1997) because computer capital input is measured in 2000, and most of the remaining variables, particularly the variable of interest, computer networks, are measured in 1999.3 We therefore use standard capital theory to relate the flow of total capital services in 2000, $S_T(K_{T,2000})$, to total book value in 1997 for our sample of plants new in 1997:

(1) $S_T(K_{T,2000}) \approx \lambda_T BV_{T,1997} + \varepsilon_T$.

The proportionality factor, $\lambda_T$, represents services per unit of total capital. The approximation error, $\varepsilon_T$, increases as 1997 differs from the year for which we wish to measure capital services. That is, $\varepsilon_{T,t} > \varepsilon_{T,\tau}$ when $|t - 1997| > |\tau - 1997|$ for plants observed in year $t$ compared to year $\tau$. For computer capital stock in 2000, we use computer investment in 2000 as a proxy under the assumption that the total computer capital stock is proportional to observed investment:

(2) $K_{C,2000} \approx \delta I_{C,2000}$,

where $K_{C,2000}$ represents the plant’s actual computer capital stock, and $I_{C,2000}$ is the plant’s computer investment in 2000. The proportionality factor, $\delta$, is positive ($\delta > 0$) and assumed to be the same for all plants in our sample because they opened in the same year, 1997. If $\delta = 1$, the plant completely replaces its old computing stock with new computers. We again use standard capital theory to relate the flow of computer capital services in 2000, $S_C(K_{C,2000})$, to our proxy for the computer capital stock:

(3) $S_C(K_{C,2000}) \approx \lambda_C \delta I_{C,2000} + \varepsilon_C$,
2. We first link all observations that have both information on computer networks in the 1999 CNUS and information on computer investment in the 2000 ASM. Because the 1999 CNUS and 2000 ASM samples each are drawn from a sample frame based on the 1997 CM, the probability-proportionate-to-size sampling strategy leads to a high overlap between the two samples, and the 1999–2000 linking rate is high. Haltiwanger, Jarmin, and Schank (2003) find little sample reduction when they link the 1999 CNUS and the 2000 ASM. Their final sizes range from 22,700 to 22,900, depending on specification. Because the data as entered in the CES data storage system do not allow us to distinguish between plants that do not report computer investment and those that report zero, we exclude both. This means that the plants in our sample all have positive computer investment. We find that roughly one-third of the linked plants report positive computer investment. This response pattern is consistent with the historical pattern when this item was collected in 1977, 1982, 1987, and 1992 (e.g., Dunne et al. 2000). From the linked sample we select plants that first appeared in the 1997 CM (that is, they did not appear in the 1992 CM or the 1993 through 1996 ASMs). 3. Because computer networks are major investments and U.S. manufacturing plants have used some form of networks for decades, it seems reasonable to assume that plants with networks in 1999 will continue to have networks in 2000.
where the proportionality factor, $\lambda_C$, represents services per unit of computer capital, and $\varepsilon_C$ is the approximation error from using investment data in 2000 to measure computer service flows in 1999.

Using these proxies yields the best sample that the data will allow us to create. The sample of plants new in 1997 has 849 observations. We address the concern that the sample is small by constructing a second sample based on a broader alternative definition of new that includes plants between three and eight years old. The broader definition includes plants that first appeared in the 1993 through 1996 ASMs and have positive computer investment. These plants are between three and eight years old in 2000, below the ten-year average age of plants in the 1999 CNUS–2000 ASM linked data set.4 The value of the capital approximation errors, $\varepsilon_T$ and $\varepsilon_C$, will be higher for these plants than for plants that are new in 1997, but including them yields a larger sample of 1,755 observations. To test the importance of using the sample of plants for which we have relatively better proxies for total and computer capital stock, we use the linked data to construct a data set containing plants of all ages. Our sample of plants of all ages that report positive computer investment has 12,386 observations.

13.2.4 Estimating the Impact of Computer Networks

Because computers have been in commercial use in the U.S. for fifty years, they might be viewed as just another capital input. If so, we would expect to find that computer and noncomputer capital yield similar contributions to productivity. Computer networks also have been used for decades. But the networks that came into use more recently are thought to be qualitatively different (e.g., Bresnahan and Greenstein 1997). Brynjolfsson and Hitt (2000) argue that the effects of organizational changes caused by the newer computer networks may rival the effects of changes in the production process.
Viewed this way, computer networks are a productivity-enhancing general-purpose technology (Bresnahan and Trajtenberg 1995). The question for productivity and other measures of economic performance may no longer be whether computers matter, but whether it matters how computers are used.

Despite the importance of understanding whether computer networks matter for productivity, information on networks is scarce. The computer network information collected in the 1999 CNUS is the first such collection for a large and representative national sample of plants in U.S. manufacturing. The CNUS asked about the presence of several kinds of networks, including Internet, Intranet, Local Area Networks (LAN), Electronic Data Interchange (EDI), Extranet, and “other.” We create a dummy variable for the presence of computer networks that takes on a value of one if the plant reports having any of these kinds of computer network and zero otherwise. The 1999 CNUS network data, together with the computer investment information collected in the 2000 ASM, allow us for the first time to specify an empirical model of labor productivity with separate measures of the presence of computers (computer investment) and how computers are used (computer networks). Using only the information on how businesses use computers (the presence of computer networks), as in our previous research, may overstate the importance of those uses because it is picking up the importance of having computers.

4. See Haltiwanger, Jarmin, and Schank (2003).

13.3 Empirical Implementation

We focus on estimating whether labor productivity is related both to computer networks and computer inputs by estimating the following Cobb-Douglas production function:

(4) $\log(Q/L) = \beta_0 + \beta_1 CNET + \alpha_{1c} \log(K_c/L) + \alpha_{1nc} \log(K_{nc}/L) + \alpha_2 \log(M/L) + \alpha_3 \log(MIX) + \alpha_4 MULTI + \sum_j \gamma_j SIZE_j + \sum_i \eta_i IND_i + \varepsilon$,

where $Q$, $K_c$, $K_{nc}$, $L$, and $M$ represent output, computer capital input, noncomputer capital input, labor, and materials. CNET denotes computer networks, and $\beta_0 + \beta_1 CNET = A$ is the technological change term, that is, the Solow residual or total factor productivity (TFP). SIZE denotes the size class of the plant. MIX denotes the mix of production and nonproduction workers, and MULTI represents plants that belong to a multiunit firm. IND denotes three-digit NAICS industries. $K_{nc}/L$, noncomputer capital input in 1999, is proxied by $K/L_{97}$, the book value of total capital in 1997, divided by 1997 employment. $K_c/L$, computer capital input in 1999, is proxied by $K_{C,2000}/L$, computer investment in 2000, divided by employment in 1999.

Our model distinguishes between the productive effect of computer input in the plant and a technological shift resulting from using computer networks. Equation (4) directly relates computer networks and computer capital to (log) labor productivity. In this formulation, $\beta_1$ is one of our two parameters of interest. It can be interpreted as measuring the relationship between computer networks and labor productivity, controlling for the intensities of computer and noncomputer capital ($K_c/L$ and $K_{nc}/L$), materials intensity ($M/L$), and other plant characteristics. The second parameter of interest is $\alpha_{1c}$, the coefficient on the intensity of computer capital. This coefficient can be interpreted as a return to the flow of services from the stock of computer capital.

Labor productivity is defined as output per worker ($Q/L$). We use total value of shipments (TVS) as a measure of $Q$. Our measure of labor, $L$, is the total number of employees in the plant. Our model differs from those in most previous related plant-level studies in specifying a four-factor production function in which output is defined as gross output (rather than value added), and materials are incorporated as a separate input in production.

We described earlier how we use the CNUS, ASM, and CM to specify computer networks, computer inputs, and total capital inputs. We use the same empirical specifications of materials, skill mix, size, multiunit plant status, and industry as Atrostic and Nguyen (2005). The CNUS data are part of a Census Bureau measurement initiative to fill some of the data gaps on the growing use of electronic devices and networks in the economy (Mesenbourg 2001). The appendix contains more information on the 1999 CNUS, 2000 ASM, and the 1992 and 1997 CM.

13.4 Empirical Findings

We estimate relationships among computer networks, computer input, and labor productivity using three alternative specifications. The preferred specification includes both computer networks and computer inputs. A specification that parallels our prior research includes computer networks but not computer inputs. The third specification parallels specifications in the literature that include computer inputs but not computer networks. The three specifications are estimated first for the cohort of 849 plants that newly opened their operations in 1997 and had positive computer investment in 2000. We report these results in table 13.1.
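The three specifications just described can be sketched on synthetic data. Everything below is illustrative: the data are simulated, the variable names are hypothetical, the plant-level controls (MIX, MULTI, size, and industry) are omitted for brevity, and the assumed positive correlation between networks and computer intensity is a modeling assumption of the sketch. The coefficient values used to generate the data are borrowed from column (1) of table 13.1 only to give the example familiar magnitudes. OLS is solved from the normal equations in plain Python so the block has no external dependencies.

```python
import random


def ols(rows, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * z for x, z in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(0)
n = 5000
cnet, log_kc, log_knc, log_m, log_ql = [], [], [], [], []
for _ in range(n):
    net = 1.0 if random.random() < 0.5 else 0.0
    kc = net + random.gauss(0.0, 1.0)  # assumption: networked plants are more computer intensive
    knc = random.gauss(3.0, 1.0)
    m = random.gauss(4.0, 1.0)
    q = 3.0 + 0.117 * net + 0.050 * kc + 0.086 * knc + 0.409 * m + random.gauss(0.0, 0.3)
    cnet.append(net)
    log_kc.append(kc)
    log_knc.append(knc)
    log_m.append(m)
    log_ql.append(q)

# Specification with both networks and computer capital.
full = ols([[1.0, a, b, c, d] for a, b, c, d in zip(cnet, log_kc, log_knc, log_m)], log_ql)
# Networks only (computer capital omitted).
net_only = ols([[1.0, a, c, d] for a, c, d in zip(cnet, log_knc, log_m)], log_ql)
# Computer capital only (networks omitted).
kc_only = ols([[1.0, b, c, d] for b, c, d in zip(log_kc, log_knc, log_m)], log_ql)

print("CNET coefficient, full vs. networks only:", round(full[1], 3), round(net_only[1], 3))
print("log(Kc/L) coefficient, full vs. computers only:", round(full[2], 3), round(kc_only[1], 3))
```

Because the simulated network dummy and computer intensity are positively correlated, each coefficient rises when the other variable is dropped. This is the omitted variable mechanism that motivates including both measures, shown here on invented data rather than on the Census microdata.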
To assess whether it matters that we restrict our sample to plants that were new in 1997, we estimate the same three specifications using two other samples. Estimates from the sample of 1,755 relatively new plants that opened between 1993 and 1997 and have positive computer investment in 2000 are reported in table 13.2.5 Estimates from the sample of 12,386 plants of all ages that have positive computer investment in 2000 are also reported in table 13.2. This data set allows us to assess the empirical importance of using proxies for
5. We also create four subsamples of plants that are new in each year between 1992 and 1997. The results are similar to the results for plants new in 1997 and all plants that are new between 1992 and 1997, so we do not report them separately.
Table 13.1 Labor productivity OLS regression results: Plants new in 1997

Plants with positive computer investment in 2000

Independent variable     | (1) New in 1997   | (2) New in 1997   | (3) New in 1997   | (4) All plants
Intercept                | 3.769∗∗∗ (32.63)  | 3.051∗∗∗ (32.36)  | 3.266∗∗∗ (38.00)  | 2.949∗∗∗ (106.03)
CNET                     | .117∗∗ (2.12)     | .136∗∗∗ (2.44)    |                   | .004 (0.25)
Log(Knc/L)               | .086∗∗∗ (6.02)    | .093∗∗∗ (6.42)    | .088∗∗∗ (6.13)    | .098∗∗∗ (26.92)
Log(Kc/L)                | .050∗∗∗ (4.36)    |                   | .052∗∗∗ (4.53)    | .0478∗∗∗ (16.03)
Log(M/L)                 | .409∗∗∗ (28.00)   | .422∗∗∗ (29.15)   | .409∗∗∗ (27.96)   | .478∗∗∗ (121.97)
MIX                      | .040∗ (1.69)      | .061∗∗∗ (2.64)    | .044∗ (1.85)      | 0.04∗∗∗ (7.08)
MULTI                    | .161∗∗∗ (4.81)    | .155∗∗∗ (4.59)    | .167∗∗∗ (5.00)    | .102∗∗∗ (11.45)
Plant size               | Yes               | Yes               | Yes               | Yes
Industry (3-digit NAICS) | Yes               | Yes               | Yes               | Yes
R2                       | .655              | .647              | .653              | .740
No. of plants            | 849               | 849               | 849               | 12,386

Notes: Dependent variable is labor productivity. T-statistics in parentheses. Notation in the table is the same as in the estimating equation (4). Knc/L, noncomputer capital input in 1999, is proxied by K/L97, the book value of total capital in 1997, divided by 1997 employment. Kc/L, computer capital input in 1999, is proxied by Kc2000/L, computer investment in 2000 divided by employment in 1999. All other variables are measured in 1999.
∗∗∗Significant at the 1 percent level.
∗∗Significant at the 5 percent level.
∗Significant at the 10 percent level.
capital services when they are unlikely to be good measures. Because information on computer networks was collected only in 1999, our analyses are all cross-sectional.

Table 13.2 Labor productivity OLS regression results: Plants new between 1992 and 1997

Plants with positive computer investment in 2000

Independent variable     | (1) New 1992–1997 | (2) New 1992–1997 | (3) New 1992–1997 | (4) All plants
Intercept                | 3.009∗∗∗ (39.78)  | 2.916∗∗∗ (39.26)  | 3.117∗∗∗ (47.90)  | 2.949∗∗∗ (106.03)
CNET                     | .126∗∗∗ (2.78)    | .1510∗∗∗ (3.31)   |                   | .004 (0.25)
Log(Knc/L)               | .084∗∗∗ (8.91)    | .088∗∗∗ (9.28)    | .085∗∗∗ (9.01)    | .098∗∗∗ (26.92)
Log(Kc/L)                | .046∗∗∗ (5.42)    |                   | .049∗∗∗ (5.71)    | .0478∗∗∗ (16.03)
Log(M/L)                 | .456∗∗∗ (43.38)   | .466∗∗∗ (44.54)   | .457∗∗∗ (43.34)   | .478∗∗∗ (121.97)
MIX                      | .036∗∗ (2.13)     | .057∗∗∗ (3.51)    | .038∗∗ (2.25)     | 0.04∗∗∗ (7.08)
MULTI                    | .143∗∗∗ (5.71)    | .137∗∗∗ (5.43)    | .149∗∗∗ (5.98)    | .102∗∗∗ (11.45)
Plant size               | Yes               | Yes               | Yes               | Yes
Industry (3-digit NAICS) | Yes               | Yes               | Yes               | Yes
R2                       | .678              | .672              | .665              | .740
No. of plants            | 1,755             | 1,755             | 1,755             | 12,386

Notes: See notes to table 13.1.
∗∗∗Significant at the 1 percent level.
∗∗Significant at the 5 percent level.

6. We report only OLS estimates. Because we use new, or relatively new, plants, we have no good instruments. The two-stage estimates reported in our prior research did not have the expected result of reducing the estimated effect of computer networks. When we estimate OLS specifications on the same sample used in the two-stage estimates, coefficients of variables other than networks and computer investment are stable.
7. The exponential of the coefficient 0.117 is 1.124, or a differential of 12.4 percent. However, because the differences between the exponential and the coefficient are not large, we discuss the coefficient rather than the exponential in the text.
8. The cost shares of Knc and Kc are 0.28 and 0.005, while the ratio Kc/Knc is 0.09. Given the low cost share (0.5 percent) of computer capital, its coefficient of 0.050 indicates that computer capital input is a highly productive input relative to noncomputer input. Indeed, while the cost share of noncomputer input is 27.7 percent, it contributes only 8.6 percent to total output. In contrast, the cost share of computer input is only 0.5 percent, but it contributes 5 percent to total output.

It matters empirically whether data are available to proxy for both computer networks and computer inputs. Each coefficient is higher in the specification that excludes the other measure, suggesting that when each is used alone, it picks up part of the impact of the other.6 Computer input and computer networks both have positive and significant relationships to labor productivity in estimates from our preferred specification, as reported in column (1) of table 13.1. The coefficient on computer networks is 0.117, controlling for computer and other inputs and plant characteristics.7 Noncomputer inputs (Knc/L) and computer inputs (Kc/L) have separate and significant relationships to productivity, with coefficients of 0.086 and 0.050.8 Computer networks are significant when they enter the estimation alone, and the coefficient of 0.136, reported in column (2), is higher than when computer input is included. When computer networks are excluded,
computer intensity is significant, with the slightly higher coefficient of 0.052 as reported in column (3) of table 13.1.9

The coefficient of one other variable, MIX, the ratio of nonproduction to production workers, changes appreciably across these specifications. In our preferred specification that includes both computer input and networks (column [1] of table 13.1), the coefficient of MIX is 0.040, but is not significant. An estimate similar in size, 0.044, and in lack of significance, comes from the specification that includes only computer input (column [3]). By comparison, in the specification that only includes computer networks, the coefficient of MIX increases to 0.061, suggesting that computer inputs may be positively related to the worker mix ratio (column [2]). Other researchers find similar relationships between worker mix and computer investment (e.g., Dunne et al. 2000; Haltiwanger, Jarmin, and Schank 2003). Coefficients of most other inputs, plant characteristics, and R2 change little across the three specifications reported in table 13.1, suggesting that the computer network and computer input measures are independent of other inputs or plant characteristics.

We assess how sensitive these estimates are to the assumption that our proxies for capital and computer service flows are best for new plants by estimating the same three specifications for the larger sample of 1,755 plants that are new between 1992 and 1997. Estimates based on this broader definition of “new” plants yield similar findings. Computer input and computer networks both have positive and significant relationships to labor productivity, as reported in column (1) of table 13.2. The computer network coefficient of 0.126 is significant for relatively new plants with computers, controlling for computer and other inputs and plant characteristics. Computer input has a separate and significant effect, with a coefficient of computer intensity (Kc/L) of 0.046.
When computer networks are excluded, computer intensity remains significant, with a slightly higher coefficient of 0.049, as reported in column (3) of table 13.2. When computer inputs are excluded, computer networks remain significant, with a higher coefficient of 0.1510.

Having good proxies for all forms of capital services is empirically important. Coefficients of both computer networks and computer input are significant in all the estimates based on new plants, as reported in tables 13.1 and 13.2. A very different picture emerges from estimates based on plants of all ages. In these estimates, computer networks and computer inputs do not each have empirically separate relationships with labor productivity. The network coefficient of 0.004, reported in column (4) of table 13.2, is not statistically significant. Computer input, however, is positively and significantly related to productivity, with a coefficient of 0.0478. Results based on this sample of plants of all ages suggest that computer networks are not a technology that shifts the production function, distinct from the productive effect of computer inputs. Instead, computer networks appear simply to be a measure of computer inputs. However, it is for this sample that our proxies for total and computer capital inputs are most problematic.

9. While we calculate coefficients for industry dummies and for size dummies, we do not report them because such coefficients present standard microdata disclosure problems.

13.5 Discussion

Our empirical findings suggest that computer networks may be a new technology that shifts the production function, not just an alternative measure of the presence of computers. The measurement issues we raise about capital inputs have important empirical consequences because those findings hold only when we have good proxies for capital inputs. When we lack good proxies, we would conclude instead that our cross-section estimates of the separate relationships of productivity with computer networks and computer inputs are subject to omitted variable bias and that the new network variable yields no additional information about the relationship between computer use and productivity in U.S. manufacturing.

We assess these findings by comparing them with results we obtained in our previous study using these data, when only information on computer networks was available, and with results of other researchers. The final portion of this section discusses how remaining data gaps may affect our estimates and what our findings imply for priorities in filling them.

13.5.1 Comparison with Prior Research Using These Data

While we do not deal with causality issues, our findings are consistent with our previous research using these data, which showed a significant and positive relationship between computer networks and labor productivity in both ordinary least squares (OLS) and two-stage regressions (Atrostic and Nguyen 2005). Note that our previous and new estimates are not directly comparable because the samples differ in two ways.
The sample we use in this paper is for plants that are new in 1997 and have positive computer investment. Our previous research includes plants of all ages, regardless of whether they had computer investment, because data on computer investment in 2000 were not available.10 In addition, the previous research includes computer networks but not computer inputs. With those differences in mind, we compare the specifications that are most similar in the new and previous research.
10. We also perform parallel sensitivity assessments between the 12,836-observation data set of plants of all ages that we use in this paper and the 10,496-observation 1999 CNUS-only data used in our previous research (Atrostic and Nguyen 2005). Because the same specification estimated on these two data sets yields results similar to those reported here, we do not discuss them separately.
B. K. Atrostic and Sang Nguyen
The appropriate comparison from our previous research is with computer network coefficients from OLS regressions on the 10,496 observations that we also used in the two-stage estimates. Those OLS estimates, repeated here in column (2) of table 13.3, show that labor productivity is 3.9 percent higher in plants with networks.11 For plants new in 1997, the estimated computer network impact is 14.6 percent (the exponential of the coefficient 0.136 in column [2] in table 13.1). This is nearly four times the 3.9 percent impact of networks for plants of all ages. The higher coefficient for new plants might mean that newer plants open with the newest embodied technology.12 We note that new plants in our data have lower average productivity, regardless of whether they have networks. What our research finds is that productivity is higher in newer plants that have computer networks, compared to newer plants without networks. The coefficient of the MIX term, the ratio of nonproduction to production workers, also is higher for the new plants (0.061 versus 0.039). This suggests that newer plants that are more productive have a higher proportion of nonproduction workers. Higher ratios of nonproduction to production workers are frequently taken as proxies for higher levels of skills embodied in the workers. Careful research linking the broad groupings of production and nonproduction workers with reports from the 1990 Decennial Census of actual worker education suggests that there can be such embodiment (Doms, Dunne, and Troske 1997). However, we cannot make such linkages with our data. The broad worker classification in the MIX term makes it difficult to read too much into any estimated difference in this coefficient between groups of plants of different ages. 
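The percentage impacts quoted above follow from the usual conversion for a zero-one dummy in a log-linear regression: a coefficient b implies a proportional difference of exp(b) − 1 in the level of the dependent variable. A minimal Python sketch of that conversion, applied to the two network coefficients cited in the text (the helper name `pct_effect` is illustrative and does not come from the chapter):

```python
import math


def pct_effect(log_coef: float) -> float:
    """Percent difference in the level of the dependent variable when a 0/1
    dummy switches on, given its coefficient in a log-linear regression."""
    return (math.exp(log_coef) - 1.0) * 100.0


# Network coefficients cited in the text.
new_plants = pct_effect(0.136)  # plants new in 1997, column (2) of table 13.1
all_ages = pct_effect(0.038)    # plants of all ages, column (2) of table 13.3

print(f"new plants: {new_plants:.1f} percent")
print(f"all ages:   {all_ages:.1f} percent")
print(f"ratio:      {new_plants / all_ages:.1f}")
```

For small coefficients the conversion changes little (0.038 becomes roughly 3.9 percent), but for the larger new-plant coefficient the gap between the raw coefficient and the implied percentage is about one percentage point.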
11. In contrast to standard findings in estimates from OLS versus two-stage regressions, our previous research shows a positive and significant relationship between computer networks and productivity in both estimates, and the estimate in the two-stage regression, 7.2 percent, exceeds the OLS estimate of 3.9 percent. We obtain the 7.2 percent estimate by evaluating the significant coefficient of the predicted network variable (0.669) at the mean of the network variable.
12. The vintage capital model says that newer plants open with the newest, embodied technology and that plants exit when their productivity becomes too low relative to the new entrants. Consistent with the model are results in the literature suggesting that older plants are more likely to exit, but more productive plants are more likely to continue. However, Baily, Hulten, and Campbell (1992) find little evidence for the vintage capital model in examining transition matrices across years in U.S. manufacturing. They and other researchers find that plants entering an industry have low productivity on average but move within a few years to both the highest and lowest productivity groups. Similarly, Power (1998) finds that productivity increases with plant age, but finds almost no relationship between productivity and the age of investments.

Table 13.3    Labor productivity OLS and two-stage regressions: Plants of all ages

                                                                                        Two-stage estimates
                                      All CNUS plants(a)      OLS estimates(b)        Standard(b)             Corrected errors(b)
Independent variable                  (1)                     (2)                     (3)                     (4)

Intercept                             2.948∗∗∗ (114.95)       2.92∗∗∗ (90.85)         2.362∗∗∗ (17.23)        2.363∗∗∗ (14.68)
CNET                                  0.037∗∗∗ (3.00)         0.038∗∗∗ (2.76)
Pr(CNET)(c)                                                                           0.669∗∗∗ (4.39)         0.669∗∗∗ (3.88)
Log(K/L)                              0.078∗∗∗ (24.19)        0.083∗∗∗ (22.42)        0.082∗∗∗ (22.10)        0.082∗∗∗ (17.52)
Log(M/L)                              0.451∗∗∗ (118.96)       0.458∗∗∗ (105.21)       0.459∗∗∗ (105.39)       0.459∗∗∗ (52.21)
Log(L)                                –0.005∗ (1.78)          –0.004∗∗ (–1.25)        –0.003 (–0.94)          –0.003 (–0.082)
Log(RLP92)                            0.276∗∗∗ (32.80)        0.277∗∗∗ (29.91)        0.289∗∗∗ (29.86)        0.289∗∗∗ (21.47)
Log(MIX)                              0.035∗∗∗ (7.25)         0.032∗∗∗ (5.83)         0.034∗∗∗ (6.27)         0.040∗∗∗ (2.74)
MULTI                                 0.088∗∗∗ (10.04)        0.082 (8.55)            0.04∗∗∗ (2.85)          0.0482∗∗∗ (3.63)
New                                   0.203∗∗∗ (7.15)
New interactions with inputs above    Yes                     Yes                     Yes                     Yes
Industry (3-digit NAICS)              Yes                     Yes                     Yes                     Yes

R²                                    0.8133                  0.7811                  0.7724                  0.7819
No. of plants                         29,840                  10,496                  10,496                  10,496

Notes: Dependent variable is labor productivity. T-statistics in parentheses. L = employment at the plant. RLP92 = the plant’s labor productivity in 1992 (1997 for plants new in 1997), relative to its 4-digit SIC industry. New indicates a zero–one dummy variable equal to one for plants new since 1997. K/L, total capital input in 1999, is proxied by K/L97, the book value of total capital in 1997. Other variables defined as in tables 13.1 and 13.2.
a. All coefficients are reported in Atrostic and Nguyen (2005).
b. The number of observations in columns (2), (3), and (4) is smaller than that in column (1) for several reasons. Estimating the probit in the first stage of the two-stage estimates reported in columns (3) and (4) required variables from prior periods that are not used in the OLS estimates. One of these variables, computer expenditures, is reported by only about half of all plants. Additionally, many plants are new since the prior period, 1992. The OLS regression reported in column (2) uses the same reduced sample that is used in the two-stage estimates.
c. Evaluating the coefficient of the predicted probability at a point consistent with our data yields an estimated network effect of 7.2 percent. This estimated network effect is higher than the OLS estimate of 3.9 percent from the coefficients in column (2).
∗∗∗Significant at the 1 percent level. ∗∗Significant at the 5 percent level. ∗Significant at the 10 percent level.

13.5.2 Comparisons with the Information Technology Literature
Our finding of positive and significant relationships between computers and computer networks and productivity is consistent with the recent empirical literature at the plant and firm level. Previous research using computer investment data for U.S. manufacturing through 1992 found a positive link between computer investment and plant-level productivity, with much variation among industries (Stolarick 1999a,b). Two recent reviews of plant- or firm-level empirical studies of information technology (including but not limited to computers) and economic performance (Dedrick, Gurbaxani, and Kraemer 2003; Stiroh 2004) conclude that the literature shows positive relationships between information technology and productivity. Dedrick, Gurbaxani, and Kraemer (2003) review over 50 articles published between 1985 and 2002, many of which are firm-level studies with productivity as the performance measure. They conclude that firm-level studies show positive relationships, and that gross returns to information technology (IT) investments exceed returns to other investments.13
Stiroh (2004) conducts a meta-analysis of twenty recent empirical studies of the relationship between information technology and the production function. He also estimates a number of specifications used in those studies on a single industry-level database. The meta-analysis of nineteen firm-level studies that use gross output productivity measures yields a mean elasticity of information technology of 0.042, with large variability around that coefficient. His estimates using the single industry-level database yield OLS estimates of computer capital elasticity of 0.047.14 The coefficient estimate, however, is sensitive to econometric specifications that account, for example, for unobserved heterogeneity.
Stiroh’s (2004) meta-analysis and basic OLS regression estimates are close to the coefficient of 0.050 that we report for computer capital elasticity in new plants in our preferred specification in column (1) of table 13.1. His estimates are nearly identical to the coefficient of 0.046 that we report in estimates based on our larger sample of plants that are new between 1992 and 1997. While we are reassured by this empirical regularity, we do not make too much of it. The estimates in Stiroh’s (2004) analysis may not be adjusted for the high obsolescence rate of computers, the well-known continuing decline in computer prices, or coinvention. Our estimates are for the specific sample for which our data have reasonable proxies, plants that are new in 1997.
While we obtain similar results for a larger sample of plants that are new since 1992 (0.117 versus 0.126), we note that both these estimates far exceed the coefficient estimates for the full sample of plants that report computer investment (0.004), or the network coefficient for the full sample of plants, omitting computer investment (0.037). Our estimate also is subject to other biases whose net effect may be of either sign. There is some downward bias because computer prices continue to fall at a roughly 30 percent annual rate, so the plant’s computer investment in 2000 buys much more computer input than the same dollar investment would have bought even in 1999. We assume this price decline affects all plants in the CNUS equally. There is an upward bias in our estimates, as in the estimates in Stiroh (2004), because we do not measure coinvention. Coinvention is estimated to equal roughly the cost of the hardware and peripheral equipment investment over the life of the investment, so omitting it understates computer inputs.
13. They warn against concluding that higher gross returns mean that plants are underinvesting in information technology. Most studies do not adjust for the high obsolescence rate of information technology capital, which lowers net returns. Also, total investment in information technology may be understated because most studies measure only computer hardware, but not related labor or software, or costs of coinvention, such as reengineering business processes to take advantage of the new information technology.
14. Both Dedrick, Gurbaxani, and Kraemer (2003) and Stiroh (2002) attribute the failure of early microdata studies to find a relationship to inadequate data with small sample sizes.
Our findings also are consistent with a relatively new literature in plant- or firm-level research conducted in other countries and summarized in Pilat (2004). Many studies cited there find positive relationships between information technology and productivity. Several of those studies also find positive relationships between using computer networks and productivity (e.g., Baldwin and Sabourin [2001] for Canada; Bartelsman, van Leeuwen, and Nieuwenhuijsen [1996] for the Netherlands; and Clayton et al. [2004] for the United Kingdom). Recent research by Motohashi (2003) finds separate positive effects of computer expenditures and computer networks in Japan during the 1990–2001 period, with larger effects in more recent years, but also with much heterogeneity in those effects over time and across industries.
Many of these new plant- and firm-level studies conclude that computers are not the only factors contributing to productivity. They find important roles for complementary inputs and investments, such as organizational capital, worker skills, and innovation.15 While our coefficient estimates for the computer and total capital variables are consistent with the literature, we note again that our computer and total capital variables are proxies for the desired capital input measures. It is difficult to interpret coefficients of these proxy variables as the theoretically specified marginal products of computer and total capital.
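The price-decline argument in the discussion of biases above can be made concrete with a one-line calculation. The sketch below assumes, purely for illustration, that the quality-adjusted price of computers falls by exactly 30 percent between 1999 and 2000; the variable names are hypothetical and the figure is the rough rate quoted in the text, not an estimate from the data.

```python
# Assumed for illustration: a 30 percent fall in the quality-adjusted
# price of computers between 1999 and 2000 (the rough rate cited in the text).
annual_price_decline = 0.30

# Price of a unit of computer input in 2000, relative to 1999 = 1.0.
relative_price_2000 = 1.0 - annual_price_decline

# Units of computer input bought by one dollar in 2000, relative to 1999.
real_input_per_dollar = 1.0 / relative_price_2000

print(f"A dollar spent in 2000 buys about {real_input_per_dollar:.2f} times "
      "the computer input it bought in 1999")
```

Under that assumption, nominal investment in 2000 understates the real computer input it purchases by a bit more than 40 percent relative to 1999 dollars, which is the sense in which the price decline matters for comparisons across years.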
Stiroh (2004) concludes that information technology matters, but “[r]easonable differences in econometric techniques yield a wide range of estimates,” (23) and “one must be careful about putting too much weight on any given estimate” (24). We agree.
15. Recent research for the United States using detailed firm-level investment in computers, communications equipment, software, and other capital goods finds that many components of investment, including information technology investments, are related to productivity (Wilson 2004).
13.5.3 Parallels to the Growth Accounting Literature
We can use our econometric estimates to quantify the contributions of inputs and computer networks to the change in productivity (both labor and total factor productivity). To get a sense of the contribution of networks, we perform a “productivity change accounting” or decomposition exercise at the microlevel similar to the growth accounting exercise that is often performed to explain the growth of output of goods and services. The conventional “growth accounting” equation can be written as16

$$\Delta Q = \sum_i \alpha_i \, \Delta X_i + \Delta A \tag{5}$$

where $\Delta Q = \ln(Q_t / Q_{t-1})$ is output growth, $\alpha_i$ is the output elasticity of input $i$, $\Delta X_i = \ln(X_{it} / X_{it-1})$ is the growth of input $i$, and $\Delta A = \ln(A_t / A_{t-1})$ is total factor productivity (TFP) change. Within the context of our cross-section productivity model, the preceding equation can be rewritten in terms of moving from the 25th to the 75th percentile of the distributions of outputs and inputs:17

$$\Delta q = \sum_i \alpha_i \, \Delta x_i + \Delta A$$

where $\Delta q = \ln[(Q/L)_{75} / (Q/L)_{25}]$ is the rate of change of output per worker between the 25th and 75th percentiles, $\alpha_i$ is the output elasticity of input $i$, $\Delta x_i = \ln[(X/L)_{i,75} / (X/L)_{i,25}]$ is the rate of change of input $i$ between the 25th and 75th percentiles, and $\Delta A = \ln[A_{75}(CNET) / A_{25}(CNET)]$ is the rate of change of TFP, which in this model is a function of computer networks, or CNET.18
The results of this accounting exercise for U.S. manufacturing plants’ labor productivity are in table 13.4. We perform the exercise for the plants with positive computer investment in 2000. We look separately at plants that are new in 1997, plants that are new between 1992 and 1997, and all plants with positive computer investment. For each group of plants, we estimate the percentage increases in labor productivity due to moving from the 25th to the 75th percentile of the three input variables of interest, log of computer intensity (Kc/L), log of noncomputer capital intensity (Knc/L), and log of materials intensity (M/L). The Solow residual (TFP), A(CNET), is decomposed into the contribution of the presence of a computer network (CNET) and other factors.
16. See, for example, Schreyer (1999) and Jorgenson and Stiroh (2000).
17. Criscuolo, Haskell, and Slaughter (2005) perform a similar exercise for a single period of cross-section microdata, but in a different context. They focus on accounting for how much differences in the sources of innovation that they measure explain differences in firm output.
18. We actually estimate the effect of differences in computer networks at the 10th and 75th percentiles because networks are present in roughly 90 percent of the plants in our sample. To simplify notation and presentation, we use the “25th to 75th percentile” description for all variables in this section of the paper.

Table 13.4    Contributions of factor inputs and computer network to U.S. manufacturing plants’ productivity change

                             Moving from 25th to                  Contribution to change
                             75th percentile(a)                   in productivity(b) (%)
Variables (elasticities)     (1)                                  (2)

Plants new in 1997 (N = 849)(d)
Δq                           0.9098                               100.00
Knc/L (0.086)                [0.086(1.29)]/0.9098 = 0.1220         12.20
Kc/L (0.050)                 [0.050(1.95)]/0.9098 = 0.1072         10.72
M/L (0.409)                  [0.409(1.31)]/0.9098 = 0.5910         59.10
Total input                                                        82.02
Residual(c)                                                        17.98
CNET/Δq                      .117/.9098 = .1286                    12.86
(CNET/Δq)/residual           12.86/17.98 = 0.7152                  71.52

Plants new between 1992 and 1997 (N = 1,775)(e)
Δq                           0.9859                               100.00
Knc/L (0.084)                [0.084(1.41)]/0.9859 = 0.1198         11.98
Kc/L (0.046)                 [0.046(2.01)]/0.9859 = 0.0995          9.95
M/L (0.456)                  [0.456(1.36)]/0.9859 = 0.6282         62.82
Total input                                                        84.75
Residual(c)                                                        15.25
CNET/Δq                      .126/.9859 = .1278                    12.78
(CNET/Δq)/residual           12.78/15.25 = 0.8380                  83.80

Plants of all ages with positive capital investment in 2000 (N = 12,386)(f)
Δq                           0.9181                               100.00
Knc/L (0.098)                [0.098(1.24)]/0.9181 = 0.1323         13.23
Kc/L (0.0478)                [0.048(1.94)]/0.9181 = 0.1073         10.73
M/L (0.478)                  [0.478(1.20)]/0.9181 = 0.6248         62.48
Total input                                                        86.44
Residual(c)                                                        13.56
CNET/Δq                      .004/.9181 = 0.0043                    0.43
(CNET/Δq)/residual           0.43/13.56 = 0.0317                    3.17

a. We estimate the effect of differences in computer networks at the 10th and 75th percentiles because networks are present in roughly 90 percent of the plants in our sample. To simplify notation and presentation, we use the “25th to 75th percentile” description for all variables in this section of the paper.
b. The estimated increases in labor productivity in column (2) are calculated by comparing plants at different points in the distribution of the variables. Specifically, the second number in column (2), the contribution of Knc/L to Δq, is calculated as: [0.086(1.29)]/.9098 = 0.1220 (= 12.20%).
c. The residual in this table includes the share explained by the plant characteristics, such as industry, included in our empirical specification and the share unexplained by any variables included in the regression.
d. The estimated elasticities are taken from column (1), table 13.1.
e. The estimated elasticities are taken from column (1), table 13.2.
f. The estimated elasticities are taken from column (4), table 13.2.

The decomposition results in table 13.4 are based on the estimated coefficients in our preferred specifications of equation (4). Consider the entry for noncomputer capital in column (2) of the first panel for new plants, 12.20 percent. This entry suggests that differences in the intensity of noncomputer capital (Knc/L) contribute about 12 percent to the total change in plant-level labor productivity. Differences in the intensity of computer capital, Kc/L, account for about 11 percent of the productivity differential, and differences in materials intensity, M/L, account for about 59 percent. These three inputs together account for 82 percent of the change in labor productivity as the plant moves from the 25th to the 75th percentile of the distributions of each input. The residual (TFP) contributes about 18 percent of the change in labor productivity. We decompose the residual into the share explained by the presence of computer networks and the share that remains unexplained.19 The table shows that the presence of computer networks explains 71.5 percent of the residual, or about 13 percent of the change in the plant’s labor productivity. Computer networks and computer and noncomputer capital have comparable contributions to labor productivity.
The second panel of table 13.4 repeats this decomposition exercise for plants that were new between 1992 and 1997. The productivity contributions of inputs and computer networks for these plants, shown in column (2), are similar to those for new plants: the presence of a computer network has an impact analogous to an increase in the intensity of either noncomputer or computer capital.
In the third panel of table 13.4, we apply this decomposition to all plants that had positive computer investment in 2000, regardless of plant age. The contribution of either kind of capital intensity ranges from 10 to 14 percent, similar to the increases for the two groups of younger plants. However, unlike the results for new plants, the decomposition for plants of all ages shows no role for computer networks. This lack of productivity impact is analogous to the lack of statistical significance we found for computer networks in the OLS regression for plants of all ages in column (4) of table 13.2.
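The arithmetic behind the decomposition can be sketched in a few lines of Python. The function below is a hypothetical helper, not the authors' code: it takes an elasticity and a 25th-to-75th percentile log gap for each input, divides each product by the productivity gap, and treats the remainder as the residual. The inputs are the rounded figures printed in the first panel of table 13.4, so the recomputed shares can differ slightly from the published ones (most visibly for M/L and therefore for the residual).

```python
def decompose(dq, inputs, cnet_coef):
    """Percent shares of the labor-productivity gap dq attributed to each term.

    dq        -- log gap in labor productivity between the 75th and 25th percentiles
    inputs    -- {name: (elasticity, log gap in that input per worker)}
    cnet_coef -- estimated coefficient of the computer network dummy
    """
    shares = {name: 100.0 * elast * gap / dq
              for name, (elast, gap) in inputs.items()}
    total_input = sum(shares.values())
    residual = 100.0 - total_input   # everything not attributed to measured inputs
    cnet = 100.0 * cnet_coef / dq    # network contribution to the gap
    return {
        "inputs": shares,
        "total_input": total_input,
        "residual": residual,
        "cnet": cnet,
        "cnet_share_of_residual": 100.0 * cnet / residual,
    }


# Rounded figures from the first panel of table 13.4 (plants new in 1997).
panel_new_1997 = decompose(
    dq=0.9098,
    inputs={"Knc/L": (0.086, 1.29), "Kc/L": (0.050, 1.95), "M/L": (0.409, 1.31)},
    cnet_coef=0.117,
)

for name, share in panel_new_1997["inputs"].items():
    print(f"{name:6s} {share:6.2f}")
print(f"CNET   {panel_new_1997['cnet']:6.2f}")
print(f"CNET as share of residual: {panel_new_1997['cnet_share_of_residual']:.1f}")
```

Running the same function on the other two panels only requires swapping in their elasticities, percentile gaps, and network coefficients from the table.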
The lack of significance may reflect an inability of older plants to make effective use of new technologies. However, as we argue above, it also reflects serious measurement errors in both computer and noncomputer capital.
19. Because we estimate our model using detailed microdata, the residual has three components: (a) the share that is explained by our variable of interest, the computer network term CNET; (b) the share explained by the other plant characteristics, such as industry, included in our empirical specification; and (c) the share unexplained by any variables included in the regression. The “unexplained” residual we report in table 13.4 and discuss in this section includes both the second and third of these components. Because our interest is in examining the share of the residual explained by CNET, we do not decompose further the 15 to 20 percent of the residual that is not explained by CNET.
This decomposition exercise gives a sense of the relative importance of standard input variables and the computer network variable in our data. Comparisons with other studies are problematic. Most studies in the standard growth accounting literature use different models, different data, and a different unit of analysis. Standard production function models underlie both our research and growth accounting, but our empirical model focuses on labor productivity, while growth accounting often looks at output growth. Our data are cross-section data at the plant level and limited to manufacturing, while the growth accounting literature is based on time series data aggregated to the industry or economy level and often includes most or all sectors of the economy.20
With these important differences in mind, some broad comparisons show that our empirical results are consistent with those in the growth accounting literature. Jorgenson and Stiroh (2000) calculate the contributions of four inputs (capital, labor, energy, and materials) to U.S. economic growth by industry for the period 1958 through 1996. To illustrate, we use the contributions they report to calculate the share of output growth due to each input, and the unexplained share, for three of these manufacturing industries. The industries we choose (paper products, primary metals, and industrial machinery and equipment) differ in input mix and economic performance. The share of output growth that is due to capital ranges from 11 percent (Primary Metals and Industrial Machinery and Equipment) to 17 percent (for Paper Products). Our estimates of the contribution of noncomputer capital range from 12 to 13 percent, well within the same range. The share of growth unexplained by these inputs in Jorgenson and Stiroh ranges from a low of 14 percent in Paper Products to a high of 31 percent for Industrial Machinery and Equipment. Our estimates of the residual’s share of the change in labor productivity run from 14 to 18 percent, at the low end of their estimates.
A second comparison is with growth accounting exercises by Oliner and Sichel (2000) and Jorgenson and Stiroh (2000) for the entire economy that include IT as a separate element of capital services. These studies use different source data and cover slightly different time periods.
We choose the results for the period nearest our data, 1996–1999 for Oliner and Sichel and 1995–1998 for Jorgenson and Stiroh, as reported in tables 1A and 1B of Bosworth and Triplett (2001). We again calculate the share of each input to the reported growth of output, and the share of growth that is unexplained. The IT share of output growth is 23 percent in Oliner and Sichel and 20 percent for Jorgenson and Stiroh, with the residual accounting for 24 and 21 percent of output growth. The shares of IT are roughly twice as high in both of these studies as in ours, but the residual shares are similar.
20. A series of plant-level studies, beginning with Baily, Hulten, and Campbell (1992) and including studies cited in Schreyer (1999), perform growth decompositions using panel data but do not include computers as a separate input.
13.5.4 Important Data Gaps and Implications for Data Collections
The new network data in the 1999 CNUS and the computer investment data in the 2000 ASM are critical to understanding how IT affects plant-level productivity. Assessments of the data needed to understand the emerging electronic economy, for example, Atrostic, Gates, and Jarmin (2000), and Haltiwanger and Jarmin (2000), identified the lack of information on these variables as critical gaps. Recent data initiatives attempted to fill these gaps by collecting this information for large and nationally representative samples (Mesenbourg 2001).
Early microdata studies lacked large representative national samples collected by official statistical organizations. For example, Dedrick, Gurbaxani, and Kraemer (2003) report that Barua, Kriebel, and Mukhopadhyay (1995) draws on sixty business units in twenty U.S. companies. Similarly, Brynjolfsson and Hitt (2000, 2003) and Brynjolfsson et al. (2002) analyze between 500 and 600 firms for which they combine information from a private database on the firms’ computer capital stock with public information on other inputs and financial variables from Compustat. Larger samples of roughly 38,000 plants became available in the 1988 and 1993 Surveys of Manufacturing Technology (SMT), but were limited to five two-digit Standard Industrial Classification (SIC) industries. Also, while the SMT collected data on the use of a number of technologies, Doms, Dunne, and Troske (1997) stress that they are process and control technologies, and not measures based directly on the use of computers. The computer network and investment data that we use in this paper, by contrast, were asked of the roughly 50,000 plants in the ASM sample, and these plants are distributed among all the NAICS manufacturing industries.
The gap in information on computer capital is being addressed in several ways. Plant-level data on computer investment are collected in the 2001, 2002, 2003, and 2004 ASM. The 2002 Economic Census collected data on both the book values of assets and capital expenditures, with separate information on expenditures on computer equipment and peripherals.
In addition, beginning in 2003, the Annual Capital Expenditures Survey (ACES) collects information on both capitalized and expensed expenditures on information and communications technology structures and equipment, including computer software. Because ACES data are collected at the company level, neither totals nor separate detail for these information technology expenditures will be available at the plant level.
However, our empirical findings suggest that the key new variable in our analysis is computer networks. Computer network information was only collected in 1999. Lacking network information for multiple periods means that we cannot conduct logical next steps in empirical work. Panel data techniques address many standard plant-level measurement issues, including unobserved heterogeneity beyond those input and plant characteristics we control for, such as managerial ability. Nor can we investigate how the presence of computers and computer networks affects the dynamics of plant performance.
The 1999 CNUS shows that filling this gap is not difficult. The network measure we use was constructed from a few pieces of information (U.S. Census Bureau 2002). Our empirical findings show that this measure alone conveys important information about firm heterogeneity in the uses of computers and, in particular, on the newest uses. Room for its components should be eked out of survey instruments and respondent burden calculations.
13.6 Conclusions
We find that it is important empirically to have a separate measure of how businesses use computers. Production function estimates using variables derived from new data on computer networks and computer investment show that both variables have positive and significant relationships with plant-level labor productivity. This finding suggests that computer networks are a new technology that shifts the production function, distinct from the productive effect of computer inputs in the production process.
We also show that it is important empirically to have good proxies for computer networks and computer and total capital inputs. When we lack good proxies, computer networks appear to be just an alternate measure of computers.
New data raise the level in the statistical glass but also raise our expectations for the questions we can answer, without enabling us to address all of them (Griliches 1994). These new data allow us to estimate single-period models, but not panel or other multiperiod models, and we lack any measures of other variables, such as worker or managerial quality, that have been found important in other empirical studies. The statistical glass nevertheless is filled higher for U.S. manufacturing than for other sectors. Data on variables critical to this analysis, such as computer networks, computer input, book value of capital, and other inputs, are rare in official U.S. data collections for sectors outside of manufacturing.
Appendix
Data and Empirical Specification of Variables
Data
The 1999 Annual Survey of Manufactures Computer Network Use Supplement was mailed to the plants in the ASM sample in mid-2000. The supplement asked about the presence of computer networks and the kind of network (EDI, Internet, both). It also collected information about manufacturers’ e-commerce activities and use of e-business processes. The questionnaire asked if the plant allowed online ordering and the percentage of total shipments that were ordered online. It also asked about online purchases. In addition, information was collected about the plant’s current and planned use of about twenty-five business processes conducted over computer networks (“e-business processes” such as procurement, payroll, and inventory) and the extent to which the plant shared information online with vendors, customers, and other plants within the company.
The Annual Survey of Manufactures (ASM) is designed to produce estimates for the manufacturing sector of the economy. The manufacturing universe consists of approximately 365,000 plants. Data are collected annually from a probability sample of approximately 50,000 of the 200,000 manufacturing plants with five or more employees. Data for the remaining 165,000 plants with fewer than five employees are imputed using information obtained from administrative sources. Approximately 83 percent of the plants responded to this supplement. All CNUS data are on the North American Industry Classification System (NAICS) basis. Because the data are only from respondents to the CNUS and are not weighted (see the discussion in http://www.census.gov/estats), our results may apply only to responding plants. We note, however, that the plants responding to the CNUS account for a substantial share of the U.S. manufacturing employment and output (about 50 to 60 percent) represented in the ASM.
Variables
• Capital (KT): Data on capital services are the appropriate measure for production function estimation and productivity analysis. Because such data are not available at the micro level, we use book values of gross capital stocks (including buildings and machinery assets) collected in the 1997 CM as a proxy for K. We use 1997 data on capital intensity (K/L) because data on total capital stock are collected in the 1997 Economic Census but not in the ASM (see table 13A.1). Although these data have limitations as measures of capital services, such problems are widely recognized to be difficult to handle in cross-sectional analysis. We therefore follow many previous studies (e.g., McGuckin, Streitwieser, and Doms 1998; Greenan, Mairesse, and Topiol-Bensaid 2001) and use book values of capital as a proxy for capital input, K. This implies that capital services are proportional to the book value of capital. This assumption is made more reasonable by the controls for plant characteristics in our regressions.
• Computer Investment (IC): This is computer investment as reported in the 2000 ASM.
• Materials (M): These are the sum of values of materials and parts, values of energy consumed (including electricity and fuels), and values of contract work.
• Skill Mix (MIX): This variable is defined as the number of nonproduction workers (OW) divided by total employment (TE) in the plant, as reported on the 1999 ASM. Computer networks require highly skilled workers to develop and maintain them. Productivity might thus be higher at plants with a higher proportion of skilled labor because these workers are able to develop, use, and maintain advanced technologies, including computer networks. But applications such as expert systems may allow a function to be carried out with employees who have lower skill levels or with fewer employees.21
• SIZE: Plant size is specified as a standard series of six dummy variables. About 30 percent of the plants in our core CNUS sample have fewer than 50 employees, 20 percent have between 50 and 99 employees, about 30 percent have between 100 and 250 employees, and the remaining 20 percent are in larger plants.
• Multiunit Firms’ Plants (MULTI): Many manufacturing plants are part of multiunit firms, so employment size alone is an inadequate indicator of available resources, managerial expertise, and scale. We construct a dummy variable, MULTI, that takes on the value of one if the plant is part of a multiunit firm, and equals zero otherwise. Nearly two-thirds of the plants in our sample are part of a multiunit firm.
• Industries (IND): All previous studies of plant-level behavior note substantial heterogeneity among plants within detailed manufacturing industries as well as between detailed industries. There are twenty-one three-digit NAICS manufacturing industry groups in our sample (NAICS codes 311–316, 321–327, and 331–337). Industry dummies (IND) are included in the basic empirical model specifications to capture industry-specific effects on plant-level labor productivity.

21. Occupational detail would be desirable to test the relationship among productivity, networks, and the presence of such skilled occupations as computer programmers and systems support staff (e.g., Greenan, Mairesse, and Topiol-Bensaid [2001] and Motohashi [2001]). However, the ASM only collects information on the total numbers of production and nonproduction workers in the plant, with no further detail by process, function, or worker characteristic. Dunne and Schmitz (1995) found that plants in the 1988 SMT that used advanced technologies had higher ratios of nonproduction to total workers. Doms, Dunne, and Troske (1997) find that plants that adopt new technologies have more skilled workforces both before and after adoption. As with many other plant-level studies, we use this employment ratio to proxy for skill mix in our productivity estimates. Production workers accounted for about one-quarter (27 percent) of employment among CNUS respondents in manufacturing. This share is similar to shares reported for the five two-digit U.S. Standard Industrial Classification (SIC) industries in the 1988 and 1993 SMTs (e.g., McGuckin, Streitwieser, and Doms 1998). However, some production workers are in highly skilled occupations, and some nonproduction workers are in relatively less-skilled jobs such as janitors, and the literature is scarcely unanimous that the nonproduction labor share is a measure of skill (e.g., Dunne, Haltiwanger, and Troske [1997]; Berman, Bound, and Griliches [1994]). We follow Dunne et al. (2000) in both using this measure and being cautious in interpreting it as an indicator of skill.

Table 13A.1 Computer and capital input and computer network data in the Annual Survey of Manufactures and the Census of Manufactures, 1987–2002
Columns: Year; Total capital (a, b); Computer investment; Computer network. Panel heading: Data available when study was conducted.
a. Book value.
b. Annual Survey of Manufactures sample of the Census of Manufactures only.
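The plant-level controls described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `build_controls` is hypothetical, and the employment bounds for the six size classes are assumed, because the text reports only the fewer-than-50, 50–99, 100–250, and larger groupings.

```python
from bisect import bisect_left

# ASSUMPTION: upper bounds (inclusive) of the first five size classes.
# Only the <50, 50-99, and 100-250 groupings come from the text; the
# 499 and 999 bounds are hypothetical placeholders for the larger classes.
SIZE_UPPER_BOUNDS = [49, 99, 250, 499, 999]


def build_controls(nonproduction_workers, total_employment, is_multiunit):
    """Return MIX, MULTI, and six SIZE dummies for a single plant."""
    if total_employment <= 0:
        raise ValueError("total_employment must be positive")
    # MIX = nonproduction workers (OW) / total employment (TE)
    mix = nonproduction_workers / total_employment
    # MULTI = 1 if the plant belongs to a multiunit firm, else 0
    multi = 1 if is_multiunit else 0
    # Index of the size class (0..5) that contains total_employment
    size_class = bisect_left(SIZE_UPPER_BOUNDS, total_employment)
    size_dummies = [1 if i == size_class else 0 for i in range(6)]
    return {"MIX": mix, "MULTI": multi, "SIZE": size_dummies}


# Made-up plant: 30 nonproduction workers out of 120 employees, multiunit firm.
controls = build_controls(nonproduction_workers=30, total_employment=120,
                          is_multiunit=True)
```

In a regression, one of the six SIZE dummies (and one industry dummy) would normally be omitted as the reference category; which one is dropped is a modeling choice that the text does not specify.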
References

Atrostic, B. K., J. Gates, and R. Jarmin. 2000. Measuring the electronic economy: Current status and next step. Center for Economic Studies Discussion Papers in Economics no. CES 00-10. Washington, DC: U.S. Bureau of the Census, June.
Atrostic, B. K., and S. Nguyen. 2005. IT and productivity in U.S. manufacturing: Do computer networks matter? Economic Inquiry 43 (3): 493–506.
Baily, M. N., C. Hulten, and D. Campbell. 1992. Productivity dynamics in manufacturing plants. Brookings Papers on Economic Activity, Microeconomics: 187–267.
Baldwin, J., and D. Sabourin. 2004. Impact of the adoption of advanced information and communication technologies on firm performance in the Canadian manufacturing sector. Analytical Studies Research Paper no. 11F0019MIE20001174. Ottawa, Canada: Statistics Canada.
Bartelsman, E., G. van Leeuwen, and H. R. Nieuwenhuijsen. 1996. Advanced manufacturing technology and firm performance in the Netherlands. Netherlands Official Statistics 11 (autumn): 40–51.
Barua, A., D. J. Kriebel, and T. Mukhopadhyay. 1995. Information technologies and business value: An analytic and empirical investigation. Information Systems Research 6 (1): 3–23.
Becker, R., J. Haltiwanger, R. Jarmin, S. Klimek, and D. Wilson. 2004. Micro and macro data integration: The case of capital. Paper presented at the NBER/CRIW conference on the Architecture of the National Accounts, Washington, DC.
Berman, E., J. Bound, and Z. Griliches. 1994. Changes in the demand for skilled labor within U.S. manufacturing: Evidence from the Annual Survey of Manufacturers. The Quarterly Journal of Economics 109 (2): 367–97.
Bosworth, B., and J. Triplett. 2001. What’s new about the new economy? IT, economic growth and productivity. International Productivity Monitor 2 (spring): 19–30.
Bresnahan, T., and S. Greenstein. 1997. Technical progress and coinvention in computing and the uses of computers. Brookings Papers on Economic Activity, Macroeconomics: 1–78.
Bresnahan, T., and M. Trajtenberg. 1995. General purpose technologies: Engines of growth? Journal of Econometrics 65:83–108.
Brynjolfsson, E., and L. M. Hitt. 2000. Beyond computation: Information technology, organizational transformation and business performance. Journal of Economic Perspectives 14 (fall): 23–48.
Brynjolfsson, E., and L. M. Hitt. 2003. Computing productivity: Firm-level evidence. Review of Economics and Statistics 84 (4): 793–808.
Brynjolfsson, E., L. Hitt, S. Yang, M. N. Baily, and R. Hall. 2002. Intangible assets: Computers and organizational capital/comments and discussion. Brookings Papers on Economic Activity, Issue no. 1:137–99.
Clayton, T., C. Criscuolo, P. Goodridge, and K. Waldron. 2004. Enterprise e-commerce: Measurement and impact. In The economic impact of ICT, ed. D. Pilat, 241–60. Paris: Organization for Economic Cooperation and Development.
Cooper, R., J. Haltiwanger, and L. Power. 1999. Machine replacement and the business cycle: Lumps and bumps. American Economic Review 89 (5): 921–46.
Criscuolo, C., J. E. Haskel, and M. J. Slaughter. 2005. Global engagement and the innovation activities of firms. NBER Working Paper no. 11479. Cambridge, MA: National Bureau of Economic Research.
Dedrick, J., V. Gurbaxani, and K. Kraemer. 2003. Information technology and economic performance: A critical review of the empirical evidence. ACM Computing Surveys 35 (1): 1–28.
Doms, M. 1996. Estimating capital efficiency schedules within production functions. Economic Inquiry 34 (1): 78–92.
Doms, M., T. Dunne, and K. Troske. 1997. Workers, wages, and technology. The Quarterly Journal of Economics 112 (1): 253–90.
Dunne, T., L. Foster, J. Haltiwanger, and K. Troske. 2000. Wage and productivity dispersion in U.S. manufacturing: The role of computer investment. NBER Working Paper no. 7465. Cambridge, MA: National Bureau of Economic Research.
Dunne, T., J. Haltiwanger, and K. R. Troske. 1997. Technology and jobs: Secular changes and cyclical dynamics. Carnegie-Rochester Conference Series on Public Policy 46 (June): 107–48.
Dunne, T., and J. A. Schmitz, Jr. 1995. Wages, employment structure and employer size-wage premia: Their relationship to advanced-technology usage at U.S. manufacturing establishments. Economica 62 (245): 89–107.
Greenan, N., J. Mairesse, and A. Topiol-Bensaid. 2001. Information technology and research and development impacts on productivity and skills: Looking for correlations on French firm level data. NBER Working Paper no. 8075. Cambridge, MA: National Bureau of Economic Research, January.
Griliches, Z. 1994. Productivity, R&D, and the data constraint. American Economic Review 84 (1): 1–23.
Griliches, Z., and J. Mairesse. 1995. Production functions: The search for identification. In Practicing econometrics: Essays in method and application, ed. Zvi Griliches, 383–411. Cheltenham, UK: Edward Elgar, 1998.
Haltiwanger, J., and R. Jarmin. 2000. Measuring the digital economy. In Understanding the digital economy, ed. E. Brynjolfsson and B. Kahin, 13–33. Cambridge, MA: MIT Press.
Haltiwanger, J., R. Jarmin, and T. Schank. 2003. Productivity, investment in ICT and market experimentation: Micro evidence from Germany and the U.S. Center for Economic Studies Discussion Papers in Economics no. CES-03-06. Washington, DC: U.S. Bureau of the Census, February.
Jorgenson, D., M. Ho, and K. Stiroh. 2005. Growth of U.S. industries and investments in information technology and higher education. In Measuring capital in the new economy, ed. C. Corrado, J. Haltiwanger, and D. Sichel, 403–78. Studies in Income and Wealth, vol. 65. Chicago: University of Chicago Press.
Jorgenson, D. W., and K. J. Stiroh. 2000. Industry-level productivity and competitiveness between Canada and the United States. American Economic Review 90 (May): 161–67.
McGuckin, R. H., M. L. Streitwieser, and M. E. Doms. 1998. The effect of technology use on productivity growth. Economic Innovation and New Technology Journal 7 (October): 1–26.
Mesenbourg, T. 2001. Measuring electronic business. http://www.census.gov/estats.
Motohashi, K. 2001. Economic analysis of information network use: Organizational and productivity impacts on Japanese firms. METI Research and Statistics Department Working Paper. Tokyo: METI, January.
Motohashi, K. 2003. Firm-level analysis of information network use and productivity in Japan. Paper presented at Comparative Analysis of Enterprise Data (CAED) conference, London.
Oliner, S. D., and D. E. Sichel. 2000. The resurgence of growth in the late 1990s: Is information technology the story? Journal of Economic Perspectives 14 (4): 3–22.
Pilat, D., ed. 2004. The economic impact of ICT. Paris: Organization for Economic Cooperation and Development.
Power, L. 1998. The missing link: Technology, investment, and productivity. Review of Economics and Statistics 80 (2): 300–313.
Schreyer, P. 1999. OECD manual on productivity measurement: A guide to the measurement of industry-level and aggregate productivity growth. Paris: Organization for Economic Cooperation and Development.
Stiroh, K. J. 2004. Reassessing the impact of IT in the production function: A meta-analysis. New York: Federal Reserve Bank of New York.
Stolarick, K. M. 1999a. Are some firms better at IT? Differing relationships between productivity and IT spending. Center for Economic Studies Working Paper no. 99-13. Washington, DC: U.S. Bureau of the Census.
Stolarick, K. M. 1999b. IT spending and firm productivity: Additional evidence from the manufacturing sector. Center for Economic Studies Working Paper no. 99-10. Washington, DC: U.S. Bureau of the Census.
Triplett, J., and B. Bosworth. 2003. Baumol’s disease has been cured: IT and multifactor productivity in U.S. services industries. Federal Reserve Bank of New York Economic Policy Review 9 (3): 23–33.
U.S. Census Bureau. 2001. General summary: 1997 Economic Census, manufacturing. http://www.census.gov/prod/ec97/97m31s-gs.pdf.
U.S. Census Bureau. 2002. E-stats. http://www.census.gov/estats.
Wilson, D. 2004. Productivity and capital heterogeneity: A firm-level analysis. Federal Reserve Bank of San Francisco. Unpublished Mimeograph.
V
Measuring and Modeling Productivity, Consumption, and Diffusion
14
Services Productivity in the United States: Griliches’s Services Volume Revisited
Barry P. Bosworth and Jack E. Triplett

Barry P. Bosworth is a senior fellow in economic studies (the Robert V. Roosa Chair in International Economics) at the Brookings Institution. Jack E. Triplett is a nonresident senior fellow in economic studies at the Brookings Institution. We are very indebted to David Gunter and Kristin Wilson for their superb research assistance and to Michel Harper, Mun Ho, Larry Rosenbloom, Daniel Sichel, Kevin Stiroh, and Robert Yuskavage for helpful conversations about the data and some of the technical issues.

14.1 Introduction

In the introduction to his Conference on Research in Income and Wealth (CRIW) volume on services, Zvi Griliches (1992) reviewed services-sector productivity trends, as well as issues in measuring services productivity, as these matters stood in the early 1990s (see also his American Economic Association presidential address; Griliches 1994). In this paper, we analyze the rapid post-1995 productivity growth in services industries, which, as we show, have contributed greatly to the strength of U.S. productivity growth in recent years. We also review some of the major measurement issues that Griliches addressed, from the vantage point of roughly a dozen years on.

The contexts of the early 1990s and early 2000s are very different yet, at the same time, similar. Griliches wrote in the context of the post-1973 U.S. productivity slowdown, which was the big puzzle of that day. He pointed out that services were crucial to the post-1973 slowdown because productivity in services industries grew much more slowly than productivity in goods-producing industries. Services, therefore, acted as a brake on U.S. productivity growth, a conclusion that was unsettling because services have represented an increasing share of U.S. economic activity, a pattern that is also evident in Europe and other advanced economies.

The post-1973 puzzle was never resolved, just abandoned by economists when they were confronted with a new problem: the acceleration of U.S.
productivity growth after about 1995. We find, in this paper and in our previous one (Triplett and Bosworth 2006), that accelerating productivity in services industries played a crucial part in post-1995 U.S. productivity growth. Indeed, in recent years services-industry labor productivity has grown as fast as labor productivity in the rest of the economy, which is why we have previously said that “Baumol’s disease has been cured.”1 In this, our findings are a mirror image of the conclusions emphasized by Griliches: both the post-1973 slowdown and the post-1995 acceleration in U.S. productivity growth—both labor productivity and multifactor productivity (MFP)—are located disproportionately, though not entirely, in services. Services MFP growth has not been emphasized in other research on the post-1995 resurgence in productivity, which has perhaps too strongly emphasized high productivity growth in electronics-producing industries. In Griliches’ time and now, services industries are the industries that are the most intensive users of information technology (IT) and communication technology capital equipment. But unlike Griliches, who complained that the IT effect on services productivity was invisible in the data of his day, we find that IT investments now make a substantial contribution to labor productivity growth in services-producing industries. This, of course, is another change from the early 1990s, when lagging services productivity seemed a stifling problem for economic growth. As in most of his writing on productivity, Griliches (1992, 1994) emphasized measurement issues. He was perhaps the foremost of his generation to insist that measurement is part of the science of economics (as it is in all other quantitative sciences) and not just a low-order task to be left to statistical agencies. Data collecting may not itself be part of the science of economics, but specifying what should be gathered and what is needed for economic analysis certainly is. 
In this regard, Griliches noted the inadequate state of U.S. services-productivity statistics around 1990, which included (but was not limited to) a major deficiency in the conceptual design of the Bureau of Economic Analysis (BEA) industry database, as it then existed:

    The double-deflation procedure (the subtraction of deflated intermediate purchases from deflated gross output to arrive at a real value-added concept) is itself troublesome, as is also the GNP by industry construction, which is based on a value-added measure of an industry’s output. . . . For productivity measurement purposes we would be much better off with explicit and separate series on gross output and intermediate inputs in constant prices. (Griliches 1992, 8–9)

1. See Triplett and Bosworth (2006). Baumol’s disease is the presumption, or perhaps the consequence of the presumption, that it is inherently more difficult to increase services productivity than goods-producing productivity; see Baumol (1967).
The measurement of services-sector productivity has advanced hugely since the early 1990s. The best indicator of the improvement that has taken place is displayed in our paper: we calculate MFP for two-digit services industries based on gross output (not value added, as in most previous industry-level studies), using a combination of government databases from the BEA and the Bureau of Labor Statistics (BLS), and implicitly the Census Bureau, as the other two agencies’ compilations rest heavily on data collected by the “economic directorate” part of the Census Bureau. These industry measures incorporate as inputs capital services from different kinds of assets, including separate measures for capital services from IT equipment, and deflated intermediate inputs, exactly along the lines that Griliches recommended.

With the new database, we can compare productivity trends in goods-producing and services-producing industries, we can calculate contributions to growth at the industry level using the well-known Solow framework, and we can aggregate the industry productivity estimates to be consistent with the aggregate productivity estimates that have appeared in “macro” studies such as Oliner and Sichel (2000, 2002), Baily and Lawrence (2001), Gordon (2000, 2002), and Jorgenson and Stiroh (2000). None of this was possible a dozen years ago when Griliches wrote. It has become possible largely because government agencies have implemented some of the recommendations of Griliches and have also taken notice of the substantial contributions to economic accounting of Dale Jorgenson and his collaborators (for example, Jorgenson, Gollop, and Fraumeni 1987). With these great improvements to the government industry database, we can ask and answer questions about post-1995 productivity growth that were nearly impossible to confront for the post-1973 productivity slowdown.
The BEA industry accounts are constructed to be fully consistent with the estimates of aggregate gross domestic product (GDP). They exist for sixty-six industries, published annually, at roughly the two-digit industry level of the old Standard Industrial Classification (SIC) system. After excluding government and the farm sector and combining some industries for which the BLS does not estimate separate information on capital services, we have fifty-four industries (twenty-five in goods-producing, and twenty-nine in service-producing) within the private nonfarm business sector, spanning the period of 1987–2001.2 The new database is described more fully in Triplett and Bosworth (2004). The database improvements are documented in Yuskavage (1996) and in Lum, Moyer, and Yuskavage (2000). An evaluation of the current data set and plans for its extension are outlined in Yuskavage (2001), and more recent updates are Moyer et al. (2004) and Lawson et al. (2006).

2. The BEA-BLS industry data set is an alternative to that developed by Dale Jorgenson and his various coauthors. While they share many of the same sources, the BEA data offer more disaggregation of the service-producing industries. On the other hand, the Jorgenson data are available for a longer time period, and they include measures of labor quality. There are often considerable differences between the two data sets in the growth rates of output at the level of individual industries.

14.2 Summary and Overview

It is now well known that aggregate U.S. labor productivity and MFP accelerated after 1995, with the amount of the acceleration understandably depending on the end period. Using 2002 as the end point, for example, labor productivity rose at a 2.8 percent annual rate after 1995, compared with 2.4 percent over the 1995–2001 period (2001 was a recession year). In order to reduce the sensitivity of our results to these end-point issues, we present mainly least-squares trend rates of change, which give 2.5 percent per year for trend labor productivity growth for the 1995–2001 interval (table 14.1), compared with 1.0 percent for 1987–1995.

It is also well known that the sources of recent U.S. productivity advance include capital deepening from increased investment in IT (information and communications technology) and an acceleration in MFP growth. At the macro level, these results for the United States have been presented in Oliner and Sichel (2000, 2002), Baily and Lawrence (2001), Gordon (2000, 2002), and Jorgenson and Stiroh (2000); O’Mahoney and van Ark (2003) review the international evidence.

We add to the evidence on recent U.S. productivity growth by computing labor productivity, MFP, and a contributions-to-growth model at roughly the two-digit SIC level. Three reasons suggest the value of doing productivity research at the industry level.
Table 14.1 Labor productivity and multifactor productivity growth in goods-producing and service-producing industries (trend rates of growth, value added per worker, BEA industry accounts)

                                  1987–1995   1995–2001   Change
Labor productivity
  Private nonfarm business           1.0         2.5        1.5
  Goods-producing industries         1.8         2.3        0.5
  Service-producing industries       0.7         2.6        1.8
Multifactor productivity
  Private nonfarm business           0.6         1.4        0.9
  Goods-producing industries         1.2         1.3        0.1
  Service-producing industries       0.3         1.5        1.1

Source: Table A2-2 in Triplett and Bosworth (2004). As explained there, the aggregate productivity numbers differ from those published by BLS (see also footnote 5 in section 14.3 of this paper).
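The least-squares trend rates of change used in table 14.1 can be illustrated with a short Python sketch. It assumes the trend rate is the slope of an ordinary least-squares regression of the log of the series on time, which is the usual reading of the term but is not spelled out in the text. The function names and the index values are made up for illustration and are not BEA data.

```python
import math


def trend_growth_rate(levels):
    """Percent-per-period slope from an OLS regression of ln(level) on time."""
    n = len(levels)
    if n < 2:
        raise ValueError("need at least two observations")
    times = list(range(n))
    logs = [math.log(x) for x in levels]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return 100.0 * sxy / sxx


def endpoint_growth_rate(levels):
    """Average percent log growth computed from the first and last values only."""
    return 100.0 * math.log(levels[-1] / levels[0]) / (len(levels) - 1)


# Made-up productivity index: steady 2.5 percent log growth over seven years,
# with a hypothetical recession dip in the final year.
steady = [100.0 * math.exp(0.025 * k) for k in range(7)]
dipped = steady[:-1] + [steady[-1] * 0.98]

trend = trend_growth_rate(dipped)
endpoint = endpoint_growth_rate(dipped)
```

Because the regression uses every observation, a weak final year pulls the trend rate down by less than it pulls down the end-point rate, which is the sensitivity the authors seek to reduce.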
14.2.1 Aggregation

The Solow (1957) productivity paradigm concerns a production function. The empirical application of the production function to any aggregation of producing units always presents problems, but the production function framework fits an industry level of analysis better than the aggregate level. Fisher (2003, 228) summarized extensive results on aggregation theory by stating: “The question of . . . what meaning can be attached to aggregate production functions . . . [is equivalent to asking] whether there is any system of aggregation over diverse firms that results in some measure of efficiently produced aggregate output being a function of a capital aggregate and a labor aggregate.”

As Fisher has documented, the aggregation conditions are stringent. They are undoubtedly not met even for industry data, but they are less violently rejected for industry data. For example, one aggregation condition requires that all firms must produce the same vector of outputs. This is nonsense if aggregation proceeds over barber shops and computer factories, but the output vectors of various barber shops must contain more correspondence than those of barber shops and computer factories, whether or not the aggregation conditions are exactly satisfied at the level of the barber shop industry. From this, it is reasonable to suppose that aggregation at the industry level must perforce do less quantitative damage to the analysis (though it is also true that one cannot prove this proposition).

Against this, one might contend that measurement errors are more severe at the industry level. Grunfeld and Griliches (1960) pointed out long ago that at the aggregate level some measurement errors offset.

14.2.2 Sectoral Sources

A major issue surrounding recent U.S. productivity growth is whether the United States has experienced any productivity growth outside the electronics-manufacturing sector. Gordon (2000, 2002) has promoted the view that most if not all of the U.S. productivity advance originates in computer and semiconductor manufacturing. Obviously, the way to resolve this question is to compute productivity growth at the industry level, which we do.

Our industry productivity growth results show that the “only in electronics manufacturing” contention is false: more than three-quarters of net labor productivity and MFP growth since 1995 is in the services industries. Moreover, most of the acceleration in labor productivity growth after 1995, and all of the acceleration in MFP growth, took place in the services industries. The goods-producing industries made no net contribution to the acceleration of U.S. MFP growth after 1995. Though productivity growth is very rapid in electronics, the unprecedented productivity growth of the services industries (both labor productivity and MFP growth) is the most striking attribute of the recent advance in U.S. productivity.
14.2.3 IT Contribution to Growth

The contribution of IT to aggregate U.S. labor productivity growth is another major research issue that is best approached at the industry level of analysis. We estimate the overall contribution of IT by examining its contribution in the industries and sectors where the IT is located, predominantly in the services industries. We find that 80 percent of the total contribution of IT to aggregate U.S. labor productivity growth after 1995 arises from IT’s contribution in the services industries.

14.3 Method

We construct measures of labor and multifactor productivity for each of the fifty-four industries and various aggregates. Labor productivity is the output index divided by a simple index of the labor input. Multifactor productivity is the ratio of the output index to a weighted average of the inputs, K, L, and M (capital and labor services and intermediate inputs), so the rate of change in gross output MFP is defined:

\[ d \ln MFP = d \ln Q - \left[ (1 - v)\,(s_l \, d \ln L + s_k \, d \ln K) + v \, d \ln M \right], \tag{1} \]

where d ln MFP designates the rate of growth of MFP (and similarly for the other variables). Inputs include combined energy, materials, and purchased services intermediate inputs (M) in addition to labor (L) and capital services (K), v equals the two-period average share of intermediate purchases in gross output, and $s_l$ and $s_k$ are the two-period averages of the shares of labor and capital income in value added. We compute a Törnqvist chain index of the weighted annual changes in the inputs.3 We also estimate growth accounting equations for each of these industries in order to analyze the contributions of capital and materials deepening and MFP to the growth and acceleration of labor productivity:
\[ d \ln LP = w_{K_{IT}} \, d \ln\!\left(\frac{K_{IT}}{L}\right) + w_{K_N} \, d \ln\!\left(\frac{K_N}{L}\right) + w_M \, d \ln\!\left(\frac{M}{L}\right) + d \ln MFP \tag{2} \]

In both equations, we disaggregate capital services, K, into IT capital ($K_{IT}$) and non-IT capital ($K_N$).

14.4 Trends in Labor Productivity and MFP at the Industry Level

In this section, we report our estimates of labor productivity and MFP for the industries in the BEA industry database.4

3. The output data of the BEA are aggregated using Fisher indexes. We switched to Törnqvist indexes only to take advantage of a slightly simpler algorithm.
4. This section and the following one summarize empirical work that is presented more fully in Triplett and Bosworth (2004).
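Equations (1) and (2) of section 14.3 can be sketched for a single industry and a single pair of years as follows. This is a minimal illustration, not the authors' implementation. The text does not define the weights in equation (2); the sketch assumes they are the gross-output shares implied by equation (1), that is, (1 − v) times the relevant capital income share for each type of capital and v for intermediate inputs, with the value-added shares of labor, IT capital, and non-IT capital summing to one. All function names and input values are hypothetical.

```python
import math


def two_period_average(pair):
    """Average of the previous- and current-period values of a share."""
    return 0.5 * (pair[0] + pair[1])


def log_change(pair):
    """Log growth rate between the previous and current period."""
    return math.log(pair[1] / pair[0])


def growth_accounting(q, l, k_it, k_n, m, v, s_l, s_kit, s_kn):
    """Each argument is a (previous, current) pair.

    q, l, k_it, k_n, m : quantity indexes of gross output, labor, IT capital
                         services, non-IT capital services, intermediate inputs
    v                  : share of intermediate purchases in gross output
    s_l, s_kit, s_kn   : income shares in value added (ASSUMED to sum to one)
    """
    d_q, d_l, d_kit, d_kn, d_m = (log_change(x) for x in (q, l, k_it, k_n, m))
    v_bar = two_period_average(v)
    sl_bar = two_period_average(s_l)
    skit_bar = two_period_average(s_kit)
    skn_bar = two_period_average(s_kn)

    # Equation (1): output growth minus share-weighted input growth.
    weighted_inputs = ((1 - v_bar) * (sl_bar * d_l + skit_bar * d_kit
                                      + skn_bar * d_kn) + v_bar * d_m)
    d_mfp = d_q - weighted_inputs

    # Equation (2), with the ASSUMED weights described in the lead-in.
    contributions = {
        "it_capital_deepening": (1 - v_bar) * skit_bar * (d_kit - d_l),
        "non_it_capital_deepening": (1 - v_bar) * skn_bar * (d_kn - d_l),
        "intermediate_deepening": v_bar * (d_m - d_l),
        "mfp": d_mfp,
    }
    d_lp = d_q - d_l  # growth of gross output per unit of labor
    return d_lp, contributions


# Made-up industry data for two adjacent years.
d_lp, contributions = growth_accounting(
    q=(100.0, 104.0), l=(50.0, 50.5), k_it=(10.0, 11.5), k_n=(40.0, 40.8),
    m=(60.0, 62.5), v=(0.45, 0.47),
    s_l=(0.65, 0.63), s_kit=(0.05, 0.07), s_kn=(0.30, 0.30),
)
```

Under these assumed weights the four contributions sum exactly to labor productivity growth, so the decomposition is an accounting identity. Chaining the annual log changes over several years gives the Törnqvist index described in the text.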
Although we emphasize productivity estimates at the detailed level, the outline of our major findings also emerges from direct sector-level estimates, where industry value added and inputs are aggregated to the sector level.5 As table 14.1 demonstrates, services-sector labor productivity advanced at a 2.6 percent trend rate in 1995 to 2001, compared with 2.3 percent per year for the goods-producing sector. The post-1995 acceleration in the services sector (at 1.8 percentage points) also far exceeds the acceleration of labor productivity growth in the goods-producing sector (0.5 points)—see table 14.1. Similarly, MFP growth in the services sector exceeded MFP growth in the goods-producing sector, post-1995 (1.5 percent per year, compared with 1.3 percent). The services-producing sector accounts for all of the acceleration in U.S. MFP growth because there was minimal acceleration in MFP growth in the goods-producing sector, taken as a whole (only 0.1 percentage point). As we said in our previous paper (Triplett and Bosworth 2006, 34), “Baumol’s disease has been cured.” The aggregations conceal much heterogeneity among the industries. We compute industry labor productivity and MFP for twenty-five goodsproducing industries and twenty-nine services-producing industries.6 In both goods-producing and services-producing sectors, some industries experienced very high labor productivity growth, such as electronics in goods-producing and brokerage/finance among services industries. Labor productivity growth in the goods-producing sector is restrained by low productivity growth in mining and negative productivity growth in construction. A number of services sectors also had negative productivity growth. These industries include hotels, entertainment and recreation, and education. 
It is important to recognize that the net change in sector productivity reflects the behavior of productivity in the individual industries within the sector, and within both services- and goods-producing sectors, there are industries with negative as well as positive productivity growth.

Tables 14.2 and 14.3 provide a detailed view of the changes in labor productivity and MFP for the twenty-nine services-producing industries. We focus on the services-producing industries because they play such a dominant role in the post-1995 productivity resurgence, and it is in this sector that the industry analysis offers a different interpretation of the resurgence compared to the macroeconomic analysis. The results for all fifty-four industries are provided in Triplett and Bosworth (2004).

In the services-producing sector, the overall growth in labor productivity and MFP camouflages a wide disparity of trends within the individual two-digit industries. Advancing labor productivity in four large services industries (telephone, wholesale trade, retail trade, and finance, the last covering both brokerage and depository institutions) drove the overall sector improvement. Labor productivity gains in these industries ranged from 3 to over 10 percent per year after 1995, in all cases representing acceleration over the corresponding rate before 1995 (table 14.2). These four industries represent over a quarter of total value added in the private nonfarm business sector.

However, services-sector labor productivity growth is not just a story of a small number of large industries. Of the twenty-nine detailed services industries, twenty-four experienced labor productivity growth after 1995 and, of the positive growth industries, seventeen experienced acceleration.7 In two industries, accelerations or decelerations were marginal (only 0.1 percentage point), so they might better be set as zero acceleration industries. Negative labor productivity growth occurred after 1995 in five industries (two fewer than before 1995), but in one of them (local transit) labor productivity actually accelerated, that is, the negative productivity growth became less negative.

Table 14.2 Growth in labor productivity in 29 service industries, 1987–2001 (annual trend rates of change based on gross output). The last three columns report trend growth in output per worker.

| Industry | Value added weight | 1987–1995 | 1995–2001 | Change |
|---|---|---|---|---|
| Railroad transportation | 0.4 | 6.2 | 2.1 | –4.1 |
| Local and interurban passenger transit | 0.2 | –1.7 | –0.6 | 1.1 |
| Trucking and warehousing | 1.6 | 3.4 | 0.8 | –2.7 |
| Water transportation | 0.2 | 1.7 | 1.0 | –0.7 |
| Transportation by air | 1.1 | 0.0 | 0.4 | 0.4 |
| Pipelines, except natural gas | 0.1 | –0.7 | 1.2 | 1.8 |
| Transportation services | 0.4 | 2.0 | 3.5 | 1.5 |
| Telephone and telegraph | 2.6 | 5.5 | 7.9 | 2.5 |
| Radio and television | 0.7 | 0.0 | 1.8 | 1.8 |
| Electric, gas, and sanitary services | 3.4 | 2.1 | 2.0 | –0.1 |
| Wholesale trade | 8.5 | 3.4 | 4.2 | 0.8 |
| Retail trade | 11.3 | 1.3 | 3.4 | 2.2 |
| Depository institutions | 4.0 | 2.9 | 3.1 | 0.2 |
| Nondepository institutions | 0.6 | 2.4 | 1.9 | –0.6 |
| Security and commodity brokers | 1.4 | 7.2 | 10.3 | 3.2 |
| Insurance carriers | 1.9 | –0.6 | –1.7 | –1.0 |
| Insurance agents, brokers, and service | 0.8 | –3.3 | 2.8 | 6.1 |
| Real estate (excluding owner-occupied housing) | 6.6 | 2.7 | 1.7 | –1.0 |
| Hotels and other lodging places | 1.0 | 1.0 | –0.6 | –1.6 |
| Personal services | 0.8 | 1.0 | 1.5 | 0.5 |
| Business services | 5.2 | 2.9 | 3.6 | 0.7 |
| Auto repair, services, and parking | 1.1 | 0.9 | 1.5 | 0.6 |
| Miscellaneous repair services | 0.4 | 1.9 | 1.8 | –0.1 |
| Motion pictures | 0.4 | 0.1 | 0.3 | 0.1 |
| Amusement and recreation services | 0.9 | 1.6 | –0.4 | –2.0 |
| Health services | 7.1 | –0.7 | 0.9 | 1.6 |
| Legal services | 1.7 | 0.0 | 1.5 | 1.5 |
| Educational services | 0.9 | 0.2 | –1.0 | –1.1 |
| Other services | 4.9 | –0.4 | 2.0 | 2.4 |

Source: Appendix table A-1 (Triplett and Bosworth 2004).

Multifactor productivity growth shows a more mixed picture in services industries (table 14.3). The 2001 recession is not a factor in this as a similar mix was found in our previous paper, for which the post-1995 period ended with 2000. Strong MFP growth in a number of large industries (telephone, retail and wholesale trade, and finance) was sufficient to offset negative productivity growth in other large industries, including hotels, health, education, entertainment/recreation, and the “other services” (which is a combination of several two-digit SICs). Multifactor productivity growth was actually negative in twelve of the twenty-nine industries after 1995 (three marginally so).

More than half of the services industries experienced accelerating MFP after 1995. Acceleration after 1995 is associated with large swings from negative to positive MFP growth in several industries (see, for example, local transit, pipelines, auto repair, and legal services) and strong MFP growth in the big industries of trade and finance. However, the acceleration of MFP growth in medical care (though growth is still negative!) is one area where the result is influenced by a methodological break in the index of real output because new producer price index (PPI) measures of price changes begin in 1991. Methodological breaks also occur in other industries, such as miscellaneous services.

In summary, post-1995 productivity growth in the United States, both labor productivity and MFP, was a product of strong and widespread productivity growth in the services industries. Because services industries by and large did not exhibit strong productivity growth in the previous period, the acceleration of U.S. productivity growth after 1995 is also a product of developments in the services industries.

Table 14.3 Growth in multifactor productivity in 29 service industries, 1987–2001 (annual trend rates of change based on gross output). The last three columns report trend growth in multifactor productivity.

| Industry | Domar weight | 1987–1995 | 1995–2001 | Change |
|---|---|---|---|---|
| Railroad transportation | 0.7 | 3.4 | 1.5 | –1.9 |
| Local and interurban passenger transit | 0.4 | –1.0 | 1.3 | 2.3 |
| Trucking and warehousing | 3.4 | 0.9 | –0.1 | –1.0 |
| Water transportation | 0.6 | 1.6 | 0.2 | –1.4 |
| Transportation by air | 1.9 | 2.5 | –0.5 | –2.9 |
| Pipelines, except natural gas | 0.1 | –2.8 | 1.6 | 4.4 |
| Transportation services | 0.6 | –0.3 | 0.2 | 0.5 |
| Telephone and telegraph | 4.3 | 1.7 | 1.2 | –0.5 |
| Radio and television | 1.2 | 1.6 | –4.5 | –6.2 |
| Electric, gas, and sanitary services | 5.6 | 0.5 | –0.6 | –1.1 |
| Wholesale trade | 12.4 | 1.5 | 3.1 | 1.6 |
| Retail trade | 17.4 | 0.2 | 2.9 | 2.7 |
| Depository institutions | 5.6 | 0.2 | 1.5 | 1.3 |
| Nondepository institutions | 1.4 | –0.2 | 2.1 | 2.4 |
| Security and commodity brokers | 2.4 | 3.1 | 6.6 | 3.5 |
| Insurance carriers | 4.1 | –0.1 | 0.0 | 0.2 |
| Insurance agents, brokers, and service | 1.3 | –3.6 | –0.1 | 3.5 |
| Real estate (excluding owner-occupied housing) | 11.2 | 0.4 | 1.4 | 1.0 |
| Hotels and other lodging places | 1.7 | 0.0 | –1.3 | –1.3 |
| Personal services | 1.4 | –0.9 | 0.4 | 1.3 |
| Business services | 7.8 | 0.9 | –0.6 | –1.5 |
| Auto repair, services, and parking | 1.9 | –1.4 | 1.4 | 2.8 |
| Miscellaneous repair services | 0.7 | –1.1 | –1.6 | –0.5 |
| Motion pictures | 0.9 | –1.2 | 0.2 | 1.4 |
| Amusement and recreation services | 1.6 | 0.1 | –1.1 | –1.2 |
| Health services | 10.7 | –1.7 | –0.5 | 1.2 |
| Legal services | 2.2 | –0.8 | 0.9 | 1.7 |
| Educational services | 1.6 | –0.2 | –0.8 | –0.5 |
| Other services | 8.5 | –0.3 | –0.1 | 0.2 |

Source: Appendix table A-1 (Triplett and Bosworth 2004).

5. As explained in Triplett and Bosworth (2004), these aggregations of BEA industry data do not yield precisely the BLS published productivity numbers. Though the differences arise from a number of respects in which BEA and BLS databases differ, the major cause is the fact that the BEA industry database is consistent with the income side of the accounts, whereas the BLS productivity estimates are based on the expenditure side. This means that the rate of growth in our aggregations is larger than in the BLS published numbers, but this is not a major limitation on our results, partly because in the recently released benchmark revision of GDP, the product side was revised more than the income side.
6. As noted earlier, our productivity estimates use a measure of gross output, rather than value added, as in some past industry-level studies.
7. This contrasts with the goods-producing sector, where post-1995 labor productivity growth was positive in twenty-four out of twenty-five industries, but accelerated in only fourteen of the twenty-four.
14.5 The Aggregation of Industry Productivity Measures

The fifty-four industries in the data set vary widely in size. Thus, while tables 14.2 and 14.3 report changes in labor productivity and MFP at the industry level, those tables do not show which of the industries made the largest contributions to the post-1995 surge of aggregate productivity growth. Additionally, we need to make sure that our industry productivity results are consistent with the macro-level results that have appeared in other studies, such as Oliner and Sichel (2000, 2002) and Jorgenson and Stiroh (2000).8 In this section, we aggregate our industry productivity measures and show the contributions of individual industry productivities to aggregate- and sector-level productivity measures.

We find that the industries within the services sector account for the bulk of U.S. productivity growth after 1995, both labor productivity and MFP. Services industries account for all of the post-1995 acceleration. The goods-producing industries, taken together, make no net contribution to the recent acceleration of U.S. productivity growth.

8. This responds to a point raised in oral discussion of our previous paper, a point that could not be answered until data for all fifty-four industries were analyzed.

14.5.1 Industry and Aggregate Productivity Relations

We presented, in table 14.1, sector-level productivity estimates formed by aggregating industry outputs and inputs and then computing productivity at the aggregate level. We call such measures “direct” aggregate-level productivity measures, or direct sector-level measures, such as the goods-producing and services-producing sectors, manufacturing durables production, and so forth.

Direct aggregate or sector productivity growth is not just the aggregation of productivity changes within the individual industries contained in the sector. Aggregate productivity can also change because of reallocations across industries. As we (and others, including Stiroh [2002] and Jorgenson, Ho, and Stiroh [2006]) show, aggregated industry productivity estimates generally exceed direct aggregate-level productivity change because of reallocation of resources across industries. These reallocation effects are an important and interesting part of the productivity resurgence story that has been overlooked in macro productivity studies.

We rely on Stiroh’s (2002) formula that relates the industry measures of gross output labor productivity to aggregate value added per worker:

$$d \ln LP^V = \left[\sum_i w_i\, d \ln LP_i^Q\right] + \left[\sum_i w_i\, d \ln L_i - d \ln L\right] - \left[\sum_i m_i \left(d \ln M_i - d \ln Q_i\right)\right] \tag{3}$$

where $LP^V$ = aggregate value added per worker, $LP_i^Q$ = gross output per worker in industry $i$, $w_i$ = the two-period average of the share of industry $i$’s nominal value added in aggregate value added, and $m_i$ = the two-period average of the ratio of industry $i$’s nominal purchased inputs to aggregate value added; $K$, $L$, and $M$ are the standard notations for capital, labor, and intermediate inputs. In this formulation, we can think of $d \ln LP^V$ as the direct aggregate-level labor productivity growth discussed earlier and displayed in table 14.1.
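The three terms of equation (3) can be illustrated with a minimal Python sketch. The two industries, their weights, and their growth rates below are hypothetical numbers chosen only for illustration; they are not taken from the BEA data. The sketch assumes the intermediate-input term enters with a negative sign, which is what the value-added identity implies and what the "(–)" label in table 14.4 suggests. The cross-check at the end uses the fact that an industry's contribution to aggregate value-added growth is (w + m) d ln Q minus m d ln M.

```python
# Illustrative sketch of equation (3); every number here is hypothetical.

def decompose_labor_productivity(industries, dln_l_agg):
    """Return the three terms of equation (3) and their sum.

    industries: list of dicts with keys
        w     - industry share of aggregate nominal value added
        m     - industry nominal purchased inputs / aggregate value added
        dln_q - log growth of industry gross output
        dln_l - log growth of industry labor input
        dln_m - log growth of industry intermediate inputs
    dln_l_agg: log growth of aggregate labor input
    """
    # (a) value-added-weighted sum of industry gross-output labor productivity
    within = sum(i["w"] * (i["dln_q"] - i["dln_l"]) for i in industries)
    # (b) labor reallocation term
    labor_realloc = sum(i["w"] * i["dln_l"] for i in industries) - dln_l_agg
    # (c) intermediate-input reallocation term (enters with a minus sign)
    interm_realloc = -sum(i["m"] * (i["dln_m"] - i["dln_q"]) for i in industries)
    return {
        "within": within,
        "labor_realloc": labor_realloc,
        "interm_realloc": interm_realloc,
        "total": within + labor_realloc + interm_realloc,
    }


# Hypothetical two-industry economy (value-added shares sum to one).
industries = [
    {"w": 0.6, "m": 0.5, "dln_q": 0.03, "dln_l": 0.01, "dln_m": 0.04},
    {"w": 0.4, "m": 0.3, "dln_q": 0.02, "dln_l": 0.00, "dln_m": 0.02},
]
DLN_L_AGG = 0.005  # hypothetical aggregate labor growth

result = decompose_labor_productivity(industries, DLN_L_AGG)

# Cross-check: aggregate value-added growth computed directly, minus labor growth.
dln_v = sum((i["w"] + i["m"]) * i["dln_q"] - i["m"] * i["dln_m"] for i in industries)
direct = dln_v - DLN_L_AGG

print(result, direct)
```

In this made-up example industry A deepens its use of intermediates (materials grow faster than output), so the intermediate-input term subtracts from the aggregate even though industry-level labor productivity is rising.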
Equation (3) shows that the direct aggregate-level labor productivity estimate is a combination of (a) an industry productivity effect equal to the weighted sum of the growth in the industry productivities, where the weights are the industry shares of total value added; and (b) two reallocation terms that capture the shift of output among industries with variations in their levels of labor productivity and intermediate input intensity.9

As an intuitive example, suppose industry A contracts out a portion of its activities to industry B. This intermediate deepening ($d \ln M_i > d \ln Q_i$) may raise labor productivity in industry A (presuming that industry A rids itself of labor employed in its own less productive activities), because less labor is required per unit of output in industry A. But contracting out cannot by itself raise aggregate labor productivity; it will only cause aggregate labor productivity to rise if industry B is more productive in the contracted activities than was industry A. The reallocation terms capture this effect. They will be positive when shifts in economic activity go from less-productive to more-productive industries and will be negative in the opposite case.

Domar (1961) expressed the rate of aggregate MFP growth as a weighted average of the industry (gross output) MFP growth rates, with weights equal to the ratios of industry gross output to aggregate value added. That framework was generalized and developed more fully in Hulten (1978) and Gollop (1979). The important point is that productivity improvements at the industry level contribute to the aggregate economy in two ways: first, through direct cost reductions for the industries’ outputs that are part of final demand and, second, through reductions in the cost of intermediate inputs for other industries.

For the aggregation of MFP, we have relied on the generalization of the Domar weights given in Jorgenson, Gollop, and Fraumeni (1987):

$$d \ln MFP^V = \left[\sum_i v_i\, d \ln MFP_i^Q\right] + \left[\sum_i v_i s_i^k\, d \ln K_i - s^k\, d \ln K\right] + \left[\sum_i v_i s_i^l\, d \ln L_i - s^l\, d \ln L\right] \tag{4}$$

where $v_i$ = two-period average of the ratio of industry $i$’s gross output to aggregate value added (Domar weights), and $s_i$ = the two-period average share in industry $i$ of the designated factor’s ($K$ or $L$) income in nominal gross output.

9. This formulation differs from that of Nordhaus (2002) because it uses chain index weights (the $v_i$ terms), and it adds an additional source of reallocation by measuring labor productivity at the industry level with gross output instead of value added.
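A small Python sketch may help make the Domar aggregation in equation (4) concrete. All inputs are hypothetical: two made-up industries, each with gross output equal to twice its value added, so that the Domar weights sum to 2.0 rather than 1.0 (the same property that makes the weights in the tables sum to well over 100). The aggregate factor shares `s_k` and `s_l` are assumed here to be shares of aggregate value added, which the text does not define explicitly.

```python
# Illustrative sketch of equation (4); every number here is hypothetical.

def aggregate_mfp(industries, s_k, s_l, dln_k_agg, dln_l_agg):
    """Return the Domar-weighted MFP sum, the two reallocation terms, and the total.

    industries: list of dicts with keys
        v       - Domar weight (industry gross output / aggregate value added)
        dln_mfp - industry gross-output MFP growth
        s_k,s_l - capital and labor income shares in industry gross output
        dln_k   - log growth of industry capital input
        dln_l   - log growth of industry labor input
    s_k, s_l: aggregate capital and labor shares (assumed: shares of value added)
    dln_k_agg, dln_l_agg: log growth of aggregate capital and labor
    """
    domar_sum = sum(i["v"] * i["dln_mfp"] for i in industries)
    capital_realloc = (
        sum(i["v"] * i["s_k"] * i["dln_k"] for i in industries) - s_k * dln_k_agg
    )
    labor_realloc = (
        sum(i["v"] * i["s_l"] * i["dln_l"] for i in industries) - s_l * dln_l_agg
    )
    return {
        "domar_sum": domar_sum,
        "capital_realloc": capital_realloc,
        "labor_realloc": labor_realloc,
        "total": domar_sum + capital_realloc + labor_realloc,
    }


# Hypothetical industries: intermediate inputs are half of gross output in each.
industries = [
    {"v": 1.2, "dln_mfp": 0.010, "s_k": 0.2, "s_l": 0.3, "dln_k": 0.03, "dln_l": 0.01},
    {"v": 0.8, "dln_mfp": 0.005, "s_k": 0.2, "s_l": 0.3, "dln_k": 0.02, "dln_l": 0.00},
]

domar_weight_total = sum(i["v"] for i in industries)  # exceeds 1 by construction

mfp = aggregate_mfp(industries, s_k=0.4, s_l=0.6, dln_k_agg=0.027, dln_l_agg=0.005)
print(domar_weight_total, mfp)
```

The sketch treats each bracketed term of equation (4) separately so that the size of the input reallocation can be compared with the Domar-weighted industry sum, which is the comparison the text goes on to make.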
Table 14.4 Aggregation of industry contributions to labor and multifactor productivity growth, nonfarm business sector, 1987–2001 (trend growth rates, except where noted)

| | Growth rate 1987–1995 | Growth rate 1995–2001 | Change |
|---|---|---|---|
| Labor productivity | | | |
| Direct aggregate level (a) | 1.01 | 2.46 | 1.45 |
| Intermediate inputs reallocation (–) | –0.48 | 0.14 | 0.62 |
| Labor reallocation | –0.44 | –0.31 | 0.13 |
| Value-added weighted industry aggregate | 1.93 | 2.63 | 0.70 |
| Multifactor productivity | | | |
| Direct aggregate level | 0.56 | 1.44 | 0.88 |
| Input reallocation | –0.09 | –0.14 | –0.04 |
| Domar weighted industry aggregate | 0.66 | 1.58 | 0.92 |

Source: Equations (3) and (4) of text, and appendix tables A-5 and A-6 (Triplett and Bosworth 2004).
(a) Differs from table 14.2 because it is a trend rate of change.
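The adding-up implied by equations (3) and (4) can be checked against the entries of table 14.4 with a few lines of Python. The figures are copied from the table; the function and variable names are illustrative only. The sketch assumes the reallocation rows are reported with their signs already applied, so that the direct aggregate should equal the weighted industry aggregate plus the reallocation rows, up to rounding.

```python
# Check that each column of table 14.4 adds up: direct = industry aggregate + reallocations.

LABOR_PRODUCTIVITY = {
    # period: (direct aggregate, intermediate reallocation, labor reallocation, industry aggregate)
    "1987-1995": (1.01, -0.48, -0.44, 1.93),
    "1995-2001": (2.46, 0.14, -0.31, 2.63),
    "change": (1.45, 0.62, 0.13, 0.70),
}

MULTIFACTOR_PRODUCTIVITY = {
    # period: (direct aggregate, input reallocation, Domar weighted industry aggregate)
    "1987-1995": (0.56, -0.09, 0.66),
    "1995-2001": (1.44, -0.14, 1.58),
    "change": (0.88, -0.04, 0.92),
}


def residual(direct, *components):
    """Direct aggregate minus the sum of its components, rounded to two decimals."""
    return round(direct - sum(components), 2)


lp_residuals = {
    period: residual(direct, interm, labor, industry)
    for period, (direct, interm, labor, industry) in LABOR_PRODUCTIVITY.items()
}
mfp_residuals = {
    period: residual(direct, realloc, industry)
    for period, (direct, realloc, industry) in MULTIFACTOR_PRODUCTIVITY.items()
}

print(lp_residuals)
print(mfp_residuals)
```

Any nonzero residual should be no larger than 0.01 in absolute value, which is consistent with rounding in the published figures.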
Our aggregations of both labor productivity and MFP use Törnqvist chain indexes; that is, the weights are averages of adjacent periods, not single-period or base-period weights.10 The Domar weights (the first element of equation [4]) can best be thought of as the product of two steps in the aggregation: (a) the scaling up of the change in MFP at the industry level by the ratio of gross output to value added at the industry level, and (b) the aggregation using value added weights.11

10. Domar (1961) assumed a Cobb-Douglas function, which implies base-period weights in a logarithmic index.
11. At the level of individual industries, MFP computed from the gross output framework will always be less than MFP computed from the value added data; however, the contribution to the aggregate MFP is the same for both concepts.

14.5.2 Sector Aggregation of Industry Productivity

Using equations (3) and (4), a summary of the industry contributions to the growth in the direct aggregate (value added) measures of labor productivity and MFP is shown in table 14.4. Because the contributions of industry productivity changes are offset by resource reallocations (the among industries effects) that reduce the aggregate gain, the aggregation of industry labor productivity estimates more than accounts for the growth of aggregate productivity in both periods. For example, in 1987 to 1995, the aggregation of industry labor productivity improvements (the within industry effects, shown in italics in table 14.4) yields 1.93 percent growth per year, which is nearly twice as much productivity growth as is recorded at the aggregate level (1.01 percent).

On the other hand, because the reallocation terms have had a less negative influence in recent years, more of post-1995 labor productivity growth within the industries feeds through to the aggregate level: the weighted industry productivity changes (2.63 percent per year) total only 0.17 points higher than the direct aggregate estimate (2.46 percent). Put another way, the aggregate post-1995 acceleration of 1.45 (2.46 – 1.01) percentage points per year in labor productivity growth is boosted by changes in (i.e., less negative) reallocation terms. For this reason, the acceleration (1.45 points) in aggregate productivity growth is roughly twice as large as is evident from a straight aggregation of the fifty-four individual industries (0.70 points per year).12

The lower part of table 14.4 indicates that the reallocation terms are less important in the aggregation of the (gross output) industry measures of MFP growth. The aggregation of industry MFP is formed using Domar weights, as indicated in equation (4). The aggregation of industry MFPs is larger than direct aggregate-level MFP for both periods, but the reallocation term is small (only –0.14, for 1995–2001). Moreover, the acceleration in MFP is the same (about 0.9 points), whether calculated from the direct aggregate or by aggregating industry MFPs.

14.5.3 Industry Contributions to Aggregate Productivity Growth

The contributions of individual industries to aggregate productivity growth are shown, for all twenty-nine services industries and for the major aggregates, in table 14.5. The industry contributions in table 14.5 sum to the totals that are given by the first terms in equations (3) and (4), that is, to the bottom line of table 14.4. This aggregation of the industry productivities is repeated as the top line in table 14.5.
As we have already noted, the total industry productivity contribution is larger than the direct aggregate-level productivity change shown in table 14.4 for the nonfarm business aggregate because the direct industry contributions include reallocation effects. Similarly, the sector aggregations in table 14.5 (indicated in italic type) are the sums of the industry contributions within the sector. Accordingly, one should interpret industry (and sector) contributions in table 14.5 in the following way: they show the contribution of industry i (or the industries in sector j) to the total of all industry contributions to productivity change. For example, table 14.5 shows that the two machinery industries (within which are located computer and semiconductor manufacturing) contribute about 17.5 percent of the total increase in industry labor productivity ([0.15 + 0.31]/2.63) between 1995 and 2001 and 32 percent ([0.22 + 0.29]/1.58) of the total industry MFP growth.

In contrast, the post-1995 resurgence in labor productivity can be traced largely to productivity growth in the services-producing industries. Of the total labor productivity growth of 2.63 percent per year after 1995, services industries account for 73 percent of the total (1.92/2.63), while goods-producing industries account for the rest (27 percent, or 0.71/2.63). Improvements within durables manufacturing are more than offset by slow productivity growth in mining and continued outright declines in construction. Of the fifty-four industries, thirty industries show an increased contribution after 1995, and nineteen of those are in services.

Within services, the largest contributors to post-1995 labor productivity growth are retail and wholesale trade, finance (specifically, brokerage firms), business services, and a miscellaneous grouping of other services.13 Each of the first three of these large services subsectors contributes as much or more to aggregate post-1995 productivity growth as either industrial machinery or electrical machinery, which have received so much attention because of their electronics components. These five services industries represent 70 percentage points of the post-1995 aggregate acceleration in labor productivity (see the “changes” column of table 14.5), and the next ten most important contributors to the acceleration (all of which are in services) add only 30 percentage points.

Many of the industries that made the largest contributions to the resurgence of growth in labor productivity also play a large role in the acceleration of MFP growth. Again, the improvements are dominated by the gains in the services-producing industries, which contribute three quarters (1.20/1.58) of the MFP growth, post-1995, and 0.92 points of the net 0.88 points of acceleration (that is, more than the total).

12. This variation between the aggregate and the industry results is largely due to changes in the relationship between gross output and value added (what we have labeled reallocation of the intermediate inputs). If labor productivity is measured at the industry level using value added, the reallocation term is limited to changes in the distribution of labor among the industries, which does not change very much before and after 1995.

Table 14.5 Industry contributions to labor and multifactor productivity growth, nonfarm business sector, 1987–2001 (trend growth rates). LP = labor productivity; MFP = multifactor productivity. The three columns after each weight are contributions.

| SIC code | Industry name | Aggregate | Value-added weight | LP 1987–95 | LP 1995–2001 | LP change | Domar weight | MFP 1987–95 | MFP 1995–2001 | MFP change |
|---|---|---|---|---|---|---|---|---|---|---|
| | Private nonfarm business | Yes | 100.0 | 1.93 | 2.63 | 0.70 | 186.9 | 0.66 | 1.58 | 0.92 |
| GD | Goods-producing industries | Yes | 29.6 | 0.77 | 0.71 | –0.06 | 73.1 | 0.39 | 0.38 | –0.01 |
| | Agricultural services, forestry, and fishing | No | 0.6 | 0.0 | 0.01 | 0.01 | 0.9 | –0.1 | 0.00 | 0.01 |
| | Mining | Yes | 1.9 | 0.7 | 0.01 | –0.06 | 3.1 | 0.04 | –0.03 | –0.06 |
| 15–17 | Construction | No | 5.3 | –0.01 | –0.06 | –0.05 | 9.3 | 0.02 | –0.05 | –0.07 |
| | Manufacturing | Yes | 21.7 | 0.72 | 0.76 | 0.04 | 59.7 | 0.34 | 0.45 | 0.11 |
| | Durable goods | Yes | 12.3 | 0.58 | 0.60 | 0.02 | 31.7 | 0.35 | 0.59 | 0.23 |
| 35 | Industrial machinery and equipment | No | 2.32 | 0.15 | 0.15 | 0.00 | 5.54 | 0.10 | 0.22 | 0.12 |
| 36, 38 | Electronic equipment and instruments | No | 3.25 | 0.28 | 0.31 | 0.04 | 7.19 | 0.20 | 0.29 | 0.09 |
| | Nondurable goods | Yes | 9.4 | 0.14 | 0.16 | 0.02 | 28.1 | –0.01 | –0.13 | –0.12 |
| SER | Service-producing industries | Yes | 70.4 | 1.16 | 1.92 | 0.76 | 113.8 | 0.27 | 1.20 | 0.93 |
| | Transportation | Yes | 4.0 | 0.08 | 0.04 | –0.04 | 7.8 | 0.10 | 0.01 | –0.10 |
| 48 | Communications | Yes | 3.4 | 0.15 | 0.22 | 0.07 | 5.6 | 0.09 | 0.00 | –0.09 |
| | Telephone and telegraph | No | 2.6 | 0.15 | 0.21 | 0.05 | 4.3 | 0.07 | 0.06 | –0.01 |
| 483–484 | Radio and television | No | 0.7 | 0.00 | 0.02 | 0.01 | 1.2 | 0.02 | –0.06 | –0.08 |
| 49 | Electric, gas, and sanitary services | No | 3.4 | 0.07 | 0.06 | –0.01 | 5.6 | 0.03 | –0.03 | –0.07 |
| 50–51 | Wholesale trade | No | 8.5 | 0.31 | 0.36 | 0.05 | 12.4 | 0.18 | 0.38 | 0.20 |
| 52–59 | Retail trade | No | 11.3 | 0.15 | 0.38 | 0.23 | 17.4 | 0.04 | 0.50 | 0.46 |
| | Finance and insurance | Yes | 8.7 | 0.18 | 0.31 | 0.13 | 14.8 | 0.01 | 0.34 | 0.32 |
| 60 | Depository institutions | No | 4.0 | 0.12 | 0.13 | 0.01 | 5.6 | 0.01 | 0.09 | 0.08 |
| 61 | Nondepository institutions | No | 0.6 | 0.01 | 0.01 | 0.00 | 1.4 | –0.01 | 0.04 | 0.05 |
| 62 | Security and commodity brokers | No | 1.4 | 0.09 | 0.18 | 0.10 | 2.4 | 0.06 | 0.21 | 0.15 |
| 63 | Insurance carriers | No | 1.9 | –0.01 | –0.04 | –0.02 | 4.1 | 0.00 | 0.00 | 0.00 |
| 64 | Insurance agents, brokers, and service | No | 0.8 | –0.02 | 0.02 | 0.05 | 1.3 | –0.04 | 0.00 | 0.04 |
| 65 | Real estate (excluding owner-occupied housing) | No | 6.6 | 0.16 | 0.11 | –0.05 | 11.2 | 0.05 | 0.16 | 0.11 |
| | Other services industries | Yes | 24.6 | 0.05 | 0.43 | 0.38 | 39.1 | –0.23 | –0.15 | 0.08 |
| 70 | Hotels and other lodging places | No | 1.0 | 0.02 | –0.01 | –0.02 | 1.7 | 0.00 | –0.02 | –0.02 |
| 72 | Personal services | No | 0.8 | 0.00 | 0.01 | 0.01 | 1.4 | –0.01 | 0.01 | 0.02 |
| 73 | Business services | No | 5.2 | 0.11 | 0.22 | 0.11 | 7.8 | 0.06 | –0.07 | –0.12 |
| 75 | Auto repair, services, and parking | No | 1.1 | 0.01 | 0.02 | 0.01 | 1.9 | –0.03 | 0.03 | 0.05 |
| 76 | Miscellaneous repair services | No | 0.4 | 0.01 | 0.01 | 0.00 | 0.7 | –0.01 | –0.01 | 0.00 |
| 78 | Motion pictures | No | 0.4 | 0.00 | 0.00 | 0.00 | 0.9 | –0.01 | 0.00 | 0.01 |
| 79 | Amusement and recreation services | No | 0.9 | 0.01 | 0.00 | –0.02 | 1.6 | 0.00 | –0.02 | –0.02 |
| 80 | Health services | No | 7.1 | –0.07 | 0.06 | 0.14 | 10.7 | –0.18 | –0.06 | 0.13 |
| 81 | Legal services | No | 1.7 | –0.01 | 0.02 | 0.03 | 2.2 | –0.02 | 0.02 | 0.04 |
| 82 | Educational services | No | 0.9 | 0.00 | –0.01 | –0.01 | 1.6 | 0.00 | –0.01 | –0.01 |
| 83–87 | Other services | No | 4.9 | –0.02 | 0.10 | 0.13 | 8.5 | –0.03 | –0.01 | 0.01 |

Source: Appendix tables A-5 and A-6 (Triplett and Bosworth 2004).
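The shares quoted in this passage follow directly from the post-1995 contribution columns of table 14.5. A short Python sketch of the arithmetic is given here; the variable names are illustrative, and the inputs are the table entries for the two machinery industries (0.15 and 0.31 for labor productivity, 0.22 and 0.29 for MFP), the goods-producing and service-producing sectors (0.71 and 1.92 for labor productivity, 1.20 for services MFP), and the private nonfarm business totals (2.63 and 1.58).

```python
# Shares of post-1995 industry contributions, using entries from table 14.5.

def share(part, total):
    """Contribution of a subset of industries as a percentage of the total."""
    return 100.0 * part / total


LP_TOTAL = 2.63   # labor productivity, private nonfarm business, 1995-2001
MFP_TOTAL = 1.58  # multifactor productivity, private nonfarm business, 1995-2001

machinery_lp_share = share(0.15 + 0.31, LP_TOTAL)    # industrial machinery + electronic equipment
machinery_mfp_share = share(0.22 + 0.29, MFP_TOTAL)
services_lp_share = share(1.92, LP_TOTAL)             # service-producing industries
goods_lp_share = share(0.71, LP_TOTAL)                # goods-producing industries
services_mfp_share = share(1.20, MFP_TOTAL)

for name, value in [
    ("machinery share of LP growth", machinery_lp_share),
    ("machinery share of MFP growth", machinery_mfp_share),
    ("services share of LP growth", services_lp_share),
    ("goods share of LP growth", goods_lp_share),
    ("services share of MFP growth", services_mfp_share),
]:
    print(f"{name}: {value:.1f} percent")
```

Because the sector rows are sums of the industry rows, the goods and services shares of labor productivity growth add to 100 percent.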
The top contributors to the post-1995 MFP acceleration (retail trade, wholesale trade, brokerage firms, and health) are all in services, closely followed by industrial machinery, which includes computers.14 As shown in the table, the contribution of durable-goods manufacturing to the improvement is large, but it is offset by declines in other goods-producing industries, including nondurables manufacturing. Twenty-seven of the fifty-four industries show a post-1995 acceleration of the trend growth in MFP, and seventeen are services-producing industries.

Despite the similarity of the large contributing industries, the cross-industry correlation between the post-1995 acceleration of labor productivity and MFP is a surprisingly low 0.33. There is also a large change in the role of business services, which was a major source of the rise of labor productivity, but it makes a negative contribution to the improvement in overall MFP growth. Its positive contribution to labor productivity is largely the result of a rapid increase in its weight; labor productivity growth was high but not accelerating after 1995. However, a large increase in purchases of intermediate inputs results in a post-1995 decline in MFP.

13. As mentioned previously, we believe that the productivity improvements recorded in other services are partly due to changes in the methodology for measuring the price deflators for output.
14. The large positive contribution of health arises because the MFP change is less negative after 1995.

14.6 The Role of IT Capital

A number of studies have reported that increasing use of IT capital contributed to the acceleration of labor productivity after 1995, in the standard paradigm of capital deepening, but that non-IT capital per worker did not accelerate after 1995 (see, for example, Oliner and Sichel 2002). Using the labor productivity decomposition in equation (1) and applying it to the nonfarm value added data, we find the same result: overall, increasing IT capital per worker contributed 0.85 points to labor productivity (value added per worker) growth after 1995, and 0.49 percentage points to acceleration (line 1 of table 14.6). Non-IT capital services contributed positively to growth, but only a little less than 0.1 point to acceleration (estimate not shown in the table, but incorporated into appendix table A1 of Triplett and Bosworth [2004]).

Again, as with so many aspects of recent U.S. productivity performance, most of the IT capital deepening effect on U.S. labor productivity growth in recent years originates in the services industries. As shown in the left-hand side of table 14.6, the increased use of IT contributed 0.59 percentage points of labor productivity growth in the services-producing industries after 1995, which was 0.4 points more than the contribution of IT capital in these industries in the previous period (0.23). In contrast, IT contributed less than a tenth of a point (0.07) to labor productivity acceleration in the goods-producing industries. Triplett and Bosworth (2006) show that the service-producing industries are also more intensive users of IT than the goods-producing industries. It is thus not surprising that in such IT-intensive industries as communications, wholesale trade, and finance and insurance, IT contributes substantially to their post-1995 labor productivity growth: 1.29, 1.42, and 1.09 points, respectively (left-hand side of table 14.6).

Table 14.6 Contributions of IT capital to labor productivity growth, by industry, 1987–2001 (trend rates of change). Columns report the contribution to the industry.

| Industry | 1987–1995 | 1995–2001 | Change |
|---|---|---|---|
| Private nonfarm business | 0.36 | 0.85 | 0.49 |
| Goods-producing industries | 0.12 | 0.19 | 0.07 |
| Mining | 0.09 | 0.24 | 0.15 |
| Construction | 0.06 | 0.09 | 0.03 |
| Manufacturing | 0.11 | 0.18 | 0.07 |
| Durable goods | 0.12 | 0.24 | 0.12 |
| Nondurable goods | 0.15 | 0.23 | 0.08 |
| Service-producing industries | 0.23 | 0.59 | 0.37 |
| Transportation | 0.13 | 0.31 | 0.17 |
| Communications | 0.86 | 1.29 | 0.43 |
| Electric, gas, and sanitary services | 0.25 | 0.25 | 0.00 |
| Wholesale trade | 0.49 | 1.42 | 0.93 |
| Retail trade | 0.11 | 0.26 | 0.15 |
| Finance and insurance | 0.62 | 1.09 | 0.48 |
| Real estate (excluding owner-occupied housing) | –0.01 | 0.02 | 0.04 |
| Other service industries | 0.14 | 0.47 | 0.33 |

Source: The direct estimates of the contribution to labor productivity in individual industries are from the gross-output estimates of table A-1, except for the nonfarm aggregates which are value added estimates from table A-2 (Triplett and Bosworth 2004).

Table 14.6 shows the contributions of IT to the change in labor productivity within each industry. In table 14.7, we use the Domar weights to compute the IT contribution in individual industries to the total IT contribution.15 The services-producing industries are responsible for 80 percent (0.62/0.77) of the contribution of IT capital to post-1995 productivity growth in the nonfarm economy. The contributions were particularly large from wholesale trade, finance, and other services (primarily business services and health).

Table 14.7 Contributions of IT capital to aggregate labor productivity growth, nonfarm business sector, 1987–2001 (trend rates of change). The last three columns report the contribution to the aggregate.

| Industry | Domar weight | 1987–1995 | 1995–2001 | Change |
|---|---|---|---|---|
| Private nonfarm business | 186.9 | 0.38 | 0.77 | 0.39 |
| Goods-producing industries | 73.1 | 0.09 | 0.15 | 0.06 |
| Mining | 3.1 | 0.00 | 0.01 | 0.00 |
| Construction | 9.3 | 0.01 | 0.01 | 0.00 |
| Manufacturing | 59.7 | 0.09 | 0.13 | 0.05 |
| Durable goods | 31.7 | 0.04 | 0.08 | 0.04 |
| Nondurable goods | 28.1 | 0.04 | 0.06 | 0.01 |
| Service-producing industries | 113.8 | 0.28 | 0.62 | 0.34 |
| Transportation | 7.8 | 0.01 | 0.02 | 0.01 |
| Communications | 5.6 | 0.05 | 0.07 | 0.02 |
| Electric, gas, and sanitary services | 5.6 | 0.01 | 0.01 | 0.00 |
| Wholesale trade | 12.4 | 0.06 | 0.18 | 0.12 |
| Retail trade | 17.4 | 0.02 | 0.05 | 0.03 |
| Finance and insurance | 14.8 | 0.08 | 0.13 | 0.05 |
| Real estate (excluding owner-occupied housing) | 11.2 | 0.00 | 0.00 | 0.00 |
| Other service industries | 39.1 | 0.05 | 0.16 | 0.11 |

Note: The contributions to the aggregate are computed using Domar weights and gross output at the industry level and aggregating up to the subsector and sector level.

We have shown that the IT contribution to labor productivity growth and also the MFP contribution are both located largely in services industries. It is perhaps tempting to link these two results to infer that in some way acceleration in MFP growth in the services industries is linked to their increased use of IT capital. Both the productivity model and the evidence, however, are inconsistent with this hypothesis.

In the growth accounting framework, MFP is a residual; it shows the productivity growth that is not attributable to growth in inputs (including growth in IT capital inputs). Thus, IT usage should not, in principle, be associated with MFP growth because the influence of IT on productivity growth is already estimated. However, IT and MFP growth could be related in the data for at least three reasons:

1. In the growth accounting framework, one assumes that inputs, including IT inputs, earn normal returns. If IT in fact earns a larger net return than other capital, which is sometimes asserted, then IT’s contribution to output growth would be understated, the error would inappropriately inflate MFP, and IT and (mismeasured) MFP would be correlated.
2. It is often asserted that IT investment involves “coinvestments.” Many of the coinvestments are probably not counted in the national accounts investment data (software is capitalized and included, so software is not a factor in the coinvestment that is omitted). If coinvestments are missed, then the total investment associated with IT is understated by the amount of the uncounted coinvestment, IT’s contribution is also understated, the error inappropriately inflates MFP, and the error is correlated with IT.
3. IT may facilitate entrepreneurial innovation. Investment in innovative resources and innovative activity may not be counted directly in national accounts, so again there is an understatement of inputs, a consequent mismeasurement of MFP, and the measurement error is correlated with IT. We tested the IT-MFP hypothesis in the following manner. First, we constructed measures of IT intensity by industry (Triplett and Bosworth 15. As in other parts of this paper, the aggregation of the industry contributions (shown in table 14.7) does not equal the IT contribution to the direct productivity measure because of reallocation effects discussed earlier. For that reason, the top line of table 14.7 does not equal the top line of table 14.6. Note that reallocation effects reduced the total contribution of IT before 1995, but added to it after 1995.
Services Productivity in the United States
433
2006). Then we regressed the change in MFP post-1995 on the level of IT capital intensity in 1995 (we used the proportion of capital services arising from IT for this purpose). The correlation coefficient was only 0.05 and was not significant.16 From this, we conclude that there is no association between IT investment and MFP growth (which is what the growth accounting model suggests) and no evidence that IT is “special.” Information technology can be analyzed like any other productive input. 14.7 Consistency with Other Studies: IT-producing and Services Industries Studies using macro approaches, including Oliner and Sichel (2000) and Gordon (2000, 2002), find MFP acceleration in the United States after 1995, but also estimate (in somewhat indirect ways) that two-thirds to all of the aggregate acceleration is accounted for by MFP acceleration in the industries that produce IT investment goods. For example, Gordon (2002, 65) concludes: “There has been no acceleration of MFP growth outside of computer production and the rest of durable manufacturing.” The view that all recent MFP growth is in the IT-producing industries suggests that the post-1995 productivity acceleration is fragile because it rests entirely in a single set of goods-producing industries. Additionally, it suggests that recent U.S. productivity performance differs from that of Europe mainly because the United States has a larger IT-producing sector. In contrast, our finding that MFP acceleration is broadly—though not universally—based in services industries leads to the view that something significant did change in the U.S. economy. Moreover, changes in IT-using industries probably explained a good amount of the recent productivity differences between the United States and Western Europe. Thus, reconciling the apparently conflicting findings has considerable importance. Before considering the research results, we address an essential methodological point. 
16. The same correlation on labor productivity produced a significant coefficient, as expected. See Triplett and Bosworth (2004, 31).
14.7.1 A Note on “Exhausting” Total MFP
The macro studies “back off” estimates of MFP in IT-producing industries from the growth of direct aggregate-level MFP. Doing so seems to exhaust or nearly exhaust total MFP growth and to leave little room for MFP growth in the rest of the economy. For example, backing off Oliner and Sichel’s IT MFP estimate (0.77 percent per year) from the trend BLS MFP growth estimate (1.17 percent per year) appears to leave only 0.40 percent per year MFP growth outside the IT-producing industries (see the first column of table 14.8, row 5).17 This calculation is the basis for Gordon’s statement, quoted previously. If one backs off the same IT estimate from the growth in the direct aggregate-level MFP measure from BEA data (which is greater, for the reasons discussed in section 14.4), MFP growth outside IT appears a little greater (refer to the second column of table 14.8).
However, we showed in section 14.5 that the sum of all industries’ MFP growth exceeds growth in the direct MFP measure because of reallocations. If one wants to determine whether non-IT industries contribute to MFP growth, clearly the starting point is the aggregation of industry MFP growth rates, not the direct aggregate-level measure that includes reallocations. As the third column of table 14.8 shows, that backing off exercise leaves more room for non-IT MFP. For illustration, backing off our industry IT MFP measure (0.51) from the net industry MFP change leaves 1.07 percent per year contribution to net MFP growth from industries outside the IT-producing sector, more than twice the amount that originates inside the IT-producing sector.
One might think of column three as the answer to the question: “Has there been any net MFP growth outside the IT sector?” But if one really wants to determine whether there has been any MFP growth outside the IT sector, then the starting point should be the sum of all the industries having positive MFP growth. This is shown in the last column of table 14.8.
17. For comparability, we show trend rates of MFP growth in table 14.8. Using Oliner and Sichel’s (2002) average annual rate of MFP growth to the 2001 recession year (0.99) yields only 0.23 for the non-IT growth rate.

Table 14.8  Alternative “backing out” exercises for comparisons of IT-producing and other industries multifactor productivity (MFP), 1995–2001 (trend rates of change)

                                                          BEA data set
                                                          Direct MFP   Sum (industry   Sum (positive
                                               BLS MFP    estimate     MFPs)           industry MFPs)
1. Nonfarm business MFP (table 14.5)           1.17       1.44         1.58            2.09(a)
Contribution of:
2. Machinery industries MFP (table 14.6)       0.51       0.51         0.51            0.51
3. Oliner and Sichel (2002) IT-industry MFP    0.77       0.77         0.77            0.77
4. Remainder (row 1 – row 2; MFP outside IT)   0.66       0.93         1.07            1.58
5. Remainder (row 1 – row 3; MFP outside IT)   0.40       0.67         0.87            1.32

Source: Authors’ computations as explained in text.
(a) Sum of positive (only) industry MFP growth, from tables 14.2–14.6.
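The backing-off arithmetic behind table 14.8 can be sketched in a few lines of Python. This is only an illustration of the subtraction described in the text, using trend rates quoted there; the function names `back_out` and `ratio_to_it` are hypothetical labels introduced here, not part of any data set or published method.

```python
# Illustrative sketch of the "backing off" arithmetic in table 14.8.
# Figures are trend rates (percent per year) quoted in the text;
# the function names are hypothetical.

def back_out(total_mfp: float, it_mfp: float) -> float:
    """MFP growth left for non-IT industries after subtracting the IT-industry contribution."""
    return round(total_mfp - it_mfp, 2)


def ratio_to_it(total_mfp: float, it_mfp: float) -> float:
    """Non-IT remainder expressed as a multiple of the IT-industry contribution."""
    return round(back_out(total_mfp, it_mfp) / it_mfp, 1)


BLS_DIRECT = 1.17         # BLS nonfarm business MFP
SUM_INDUSTRY = 1.58       # sum of all industry MFPs (BEA data)
SUM_POSITIVE = 2.09       # sum of positive industry MFPs only
MACHINERY_IT = 0.51       # authors' machinery-industries estimate
OLINER_SICHEL_IT = 0.77   # Oliner and Sichel (2002) estimate

if __name__ == "__main__":
    print(back_out(BLS_DIRECT, OLINER_SICHEL_IT))       # macro-style remainder
    print(back_out(SUM_INDUSTRY, MACHINERY_IT))         # net industry remainder
    print(ratio_to_it(SUM_POSITIVE, MACHINERY_IT))      # remainder as a multiple of IT MFP
    print(ratio_to_it(SUM_POSITIVE, OLINER_SICHEL_IT))
```

The choice of starting point drives the result: the same Oliner and Sichel IT estimate leaves 0.40 when subtracted from the BLS direct measure but 1.32 when subtracted from the sum of positive industry MFPs.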
Positive MFP growth in industries outside the IT-producing sector contributes three times as much MFP growth as do the IT-producing industries, using our measure of IT MFP growth, and twice the IT-industry contribution, using Oliner and Sichel’s IT estimate. By any measure of MFP in IT production, MFP growth outside IT production is substantial and greatly exceeds the MFP contribution from IT production.
There is no necessary conflict between our finding of substantial MFP growth in services industries and the finding of high MFP growth in the IT-producing industries. The misinterpretation arises because some researchers, observing a large MFP contribution from the production of IT, have concluded incorrectly that there can be no other similar contributions of equal size from other industries. Jorgenson, Ho, and Stiroh (2005, 462) make the same point: The “conclusion . . . that all productivity growth originates in these two IT-producing industries . . . would be highly misleading, since the sum of the contributions of . . . agriculture and wholesale trade . . . also exhaust productivity growth for the economy as a whole.”
14.7.2 Reconciliation
Our estimates, however, are different from those of other studies. Two major alternatives to our study are the macro study by Oliner and Sichel (2002) and the industry study by Jorgenson, Ho, and Stiroh (2006). With respect to the contributions of IT capital deepening and of MFP in the IT-producing industries, Gordon’s (2002) influential study relies on Oliner and Sichel’s estimates, though he also buttresses them with independent calculations of his own. Accordingly, we focus on the Oliner and Sichel and Jorgenson, Ho, and Stiroh studies. Our study differs from the others in its output measure and its labor input measure, both of which are tied to our use of the BEA industry database.
The Oliner and Sichel (2002) study relies on the BLS output measure (from the expenditure side of the accounts), which means that their output measure grows less rapidly after 1995 than our income-side measure because the statistical discrepancy (the difference between the two sides of the accounts) grew after 1995. Other things equal, the income-side measure we use gives more labor productivity growth after 1995 and more MFP growth. The benchmark revision to GDP that was released in December 2003 raised the product-side estimate more than the income side, which implies that our income-side productivity measure holds up.
Jorgenson, Ho, and Stiroh (2006) use a wider definition of output (it includes both government and the household sectors) than employed in our study or in Oliner and Sichel (2002). The wider measure grows somewhat more slowly, implying less MFP growth, other things equal (partly because the way government output is measured assures low productivity growth in the government sector).
Additionally, the labor input in our study does not include a labor quality adjustment, and it is based on employment rather than hours. When labor quality is growing, this means that we have too much MFP growth because the contribution of the mismeasured input falls into MFP.
All studies estimate capital deepening and distinguish IT capital deepening from improvements in the non-IT capital-labor ratio. Jorgenson, Ho, and Stiroh’s (2006) estimates of capital deepening and IT capital deepening are by far the largest, mainly because of their different output concept (the growth in IT capital services in the household “industry” after 1995 is the largest of any industry). Our estimate of IT capital deepening is slightly smaller (by 0.17 points) than that of Oliner and Sichel (2002). We do not know why, but this is not a major factor in the comparisons. Less capital deepening, of course, increases our aggregate MFP estimate, relative to Oliner and Sichel.
Putting all this together, these three factors (difference in output measure, difference in estimates of IT capital deepening, and our omission of labor quality) cause our aggregate non-IT MFP estimate to exceed that of Oliner and Sichel (2002) by around 0.4–0.5 points after 1995.18 Our MFP estimate is more than twice that of Jorgenson, Ho, and Stiroh (2006), mostly because of the effects of including the household and government sectors in their estimates.
14.8 Measurement Issues
The BEA industry data set has been substantially improved in recent years. The situation has changed significantly since Baily and Gordon (1988) and the Griliches (1992) volume on services drew attention to some of the measurement problems. As discussed previously, the most notable change has been the inclusion of measures of gross output and intermediate purchases in a system that previously relied exclusively on value added (GDP originating) measures of output.19 At the industry level, gross output provides a measure that is much more closely aligned with the microeconomic concept of a production function and imposes fewer restrictions on the nature of the substitutions among factor inputs and technical change.
18. Triplett and Bosworth (2004, chapter 2) consider this matter at greater length, but a definitive answer awaits publication of Capital, Labor, Energy, Materials, and Services (KLEMS) data on the new North American Industry Classification System (NAICS) “Electronics and Computers” sector, which is the appropriate industry grouping for the purpose. Previous analyses have been conducted at the level of the total-machinery industries, which makes it difficult to extract the contribution of electronics manufacturing from the non-high-tech sectors.
19. The expansion was made possible by the increased information on services provided by the Census Bureau surveys and the expansion of the Producer Price Index program of the BLS to cover a larger number of service industries.
At the same time, the expanded usefulness of the data set has highlighted some of the remaining important problems. In the following sections we address four aspects: inconsistent data sources, a comparison of alternative output data sets, negative productivity growth industries, and shortcomings in the labor input data.
14.8.1 Inconsistent Data Sources
At present, the BEA constructs the industry measures of value added and its components from sources that correspond to those used to measure the income side of the national accounts, that is, the Internal Revenue Service (IRS) for profits and the BLS for wages and salaries. Those data that are derived from company reports must be converted to an establishment basis. In contrast, the measures of gross output are constructed from the sources used to construct the input-output (I-O) accounts, primarily the Census Bureau business censuses and surveys, which focus directly on establishments. Intermediate purchases are then estimated residually as gross output minus value added. This contrasts with the I-O accounts that provide direct estimates of both gross output and purchased inputs, with value added being the residual.
The industry estimates of value added (GDP originating) can differ substantially from those of the I-O accounts (see the detailed comparisons in Triplett and Bosworth [2004, chapter 2]). As noted by Yuskavage (2000), the differences are larger at the industry level with some offset within industry groups. Somewhat surprisingly, the percentage differences are larger and more volatile for the goods-producing industries, but that is partially a reflection of the more detailed division of the goods-producing industries.
The quantity (constant price) measures of gross output are computed at the four-digit SIC level largely using price indexes from the BLS price programs and aggregated as chained indexes to the two-digit industry level.
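To make the two residual conventions described above concrete, the following minimal sketch contrasts them. All figures are hypothetical and chosen only for illustration, and the function names are labels invented here; the point is simply that when gross output and value added come from different sources, any inconsistency between the sources lands in whichever item is computed as the residual.

```python
# Hypothetical figures only: they illustrate how a gap between two source
# data sets ends up in whichever item is computed as a residual.

def intermediate_as_residual(gross_output: float, value_added: float) -> float:
    """GDP-by-industry convention: purchased inputs = gross output - value added."""
    return gross_output - value_added


def value_added_as_residual(gross_output: float, purchased_inputs: float) -> float:
    """I-O convention: value added = gross output - purchased inputs."""
    return gross_output - purchased_inputs


if __name__ == "__main__":
    gross_output = 100.0         # assumed, from establishment-based sources
    income_side_va = 62.0        # assumed, from income-side sources
    io_purchased_inputs = 40.0   # assumed, measured directly in the I-O accounts

    residual_inputs = intermediate_as_residual(gross_output, income_side_va)
    io_value_added = value_added_as_residual(gross_output, io_purchased_inputs)

    print(residual_inputs)                        # residual purchased inputs
    print(io_value_added)                         # residual value added
    print(income_side_va - io_value_added)        # gap in value added
    print(residual_inputs - io_purchased_inputs)  # same gap, opposite sign
```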
Information about the composition of purchased inputs is taken from the I-O accounts, but it must be interpolated for non-I-O years. Thus, purchased inputs lack the compositional detail needed to compute high-quality chain indexes. The volume measure of value added is effectively computed as the difference between the quantity values of gross output and purchased inputs.
While one expects measures of labor productivity growth to vary between gross output and value added, the magnitudes are often very large and volatile over time. For our group of fifty-four industries, the standard deviation of the difference between the two growth rates is 3.6 percentage points even though the average growth is 2 percent in each case. It is unlikely that the volatility could result solely from changing patterns of outsourcing. Instead, all of the inconsistencies between the income and I-O data sources are concentrated in the residual calculations of each industry’s intermediate purchases. Purchased inputs matter less for MFP as the computation of MFP using either gross output or value added yields essentially the same estimates of its contribution to aggregate (value added) MFP.
In the long run, the objective is to fully integrate the GDP by industry and the I-O accounts. The integration is currently incomplete because of insufficient source information, and the problem is particularly severe for services. Census Bureau sources cover about 90 percent of gross output but only 30 percent of purchased inputs. The business surveys of the Census Bureau are being expanded to provide more detail, and the BEA is planning to achieve a partial integration of its GDP by industry and the annual I-O accounts over the next several years.
14.8.2 Alternative Data Sets
The BEA is not the only source of industry-level data. Two different programs of the BLS (its productivity program and its employment projections program) also produce industry data that can be used for productivity analysis.
The BLS Productivity Office produces detailed industry output and productivity estimates within manufacturing. The manufacturing-output series of the BLS and the BEA are both gross output, and they both rely on Census Bureau shipments data. However, the BLS constructs its own measures of output and excludes an estimate of intramanufacturing shipments. At the level of two-digit SIC industries, the difference in output growth can be quite substantial, ranging from –0.8 percent to 1.0 percent per year over the 1995 to 2000 period. The differences seem too large to explain by changes in the amount of intramanufacturing shipments, but we do not know the sources. A recent very thorough and enlightening analysis of output trends as measured by the BEA and the BLS Productivity Office data is Fraumeni et al. (2006).
More relevant for our focus on services, the employment projections program of the BLS produces detailed industry measures of output and employment over the period of 1972 to 2000, covering both goods-producing and services-producing industries. This is a basic data source for the productivity studies of Dale Jorgenson and his colleagues. The data set includes output measures for a considerable number of the services-producing industries that we have used in our analysis. Table 14.9 provides a comparison of the output growth rates over the 1987 to 2000 period for twenty-eight of our twenty-nine industries, where it appears that the coverage by SIC codes is the same.20
20. The BLS projections office data set was not, unfortunately, considered in the otherwise valuable paper by Fraumeni et al. (2006); we gather that this was for internal bureaucratic reasons.

Table 14.9  Differences in growth rates of industry output, BEA industry accounts and BLS Office of Employment projections, 1987–2000 (average annual rates of change)

                                              1987–1995            1995–2000                    Change
Industry                                    BEA    BLS  Diff.    BEA    BLS  Diff.    BEA    BLS  Diff.
Railroad transportation                     3.6    1.0   –2.5    0.7   –0.6   –1.3   –2.9   –1.6    1.2
Local and interurban passenger transit      1.5    1.6    0.2    2.4    1.2   –1.1    0.9   –0.4   –1.3
Trucking and warehousing                    5.7    4.0   –1.7    4.1    4.6    0.6   –1.7    0.6    2.2
Water transportation                        2.7    1.4   –1.3    4.7   –0.5   –5.2    2.0   –1.9   –3.9
Transportation by air                       3.6    4.8    1.3    5.4    1.5   –3.9    1.8   –3.3   –5.2
Pipelines, except natural gas              –1.6   –0.4    1.2    0.1   –3.0   –3.2    1.7   –2.6   –4.3
Transportation services                     6.2    5.7   –0.5    6.2    5.7   –0.5    0.0    0.0    0.1
Telephone and telegraph                     5.4    4.5   –0.9   13.4    7.6   –5.8    8.0    3.1   –4.9
Radio and television                        2.3    1.9   –0.4    4.7    2.7   –2.0    2.4    0.7   –1.6
Electric, gas, and sanitary services        2.5    0.4   –2.1    1.5    1.6    0.1   –1.0    1.2    2.2
Wholesale trade                             4.4    4.0   –0.3    6.0    4.3   –1.8    1.7    0.2   –1.5
Retail trade                                2.9    1.9   –1.1    5.5    4.3   –1.3    2.6    2.4   –0.2
Depository institutions                     1.5    3.4    1.9    2.9    6.4    3.5    1.4    3.1    1.6
Nondepository institutions(a)               8.2    1.0   –7.2   12.3    9.5   –2.8    4.1    8.4    4.4
Security and commodity brokers              9.5    7.2   –2.3   23.4   22.0   –1.3   13.8   14.8    1.0
Insurance carriers                          0.7    1.3    0.6   –1.5    0.6    2.2   –2.3   –0.7    1.6
Insurance agents, brokers, and service     –1.8    1.2    3.0    4.1    4.0   –0.1    5.9    2.8   –3.0
Real estate                                 4.0    2.1   –1.8    3.5    2.1   –1.3   –0.5    0.0    0.5
Hotels and other lodging places             2.4    2.0   –0.4    3.2    2.7   –0.5    0.8    0.7   –0.1
Personal services                           3.2    2.8   –0.4    2.7    3.9    1.2   –0.5    1.1    1.6
Business services                           8.6    6.8   –1.8   11.1    9.5   –1.6    2.5    2.7    0.2
Auto repair, services, and parking          3.2    3.8    0.6    4.1    5.2    1.1    0.9    1.4    0.5
Miscellaneous repair services               3.7    2.3   –1.4    1.0    1.7    0.8   –2.7   –0.6    2.2
Motion pictures                             4.4    5.7    1.3    3.0    6.6    3.6   –1.5    0.8    2.3
Amusement and recreation services           7.4    5.6   –1.8    3.9    6.7    2.8   –3.5    1.1    4.6
Health services                             3.2    3.0   –0.1    2.6    2.8    0.2   –0.6   –0.3    0.3
Legal services                              1.4    1.4    0.0    2.9    2.0   –0.9    1.5    0.6   –0.9
Educational services                        3.3    2.2   –1.1    2.7    2.9    0.2   –0.6    0.7    1.3
Value-added weighted sum                    2.5    2.1   –0.5    3.5    3.0   –0.5    1.0    1.0    0.0

Source: Gross output measures from the BEA industry data set and the employment projections program of BLS at http://www.bls.gov/emp/hojme.htm.
(a) The BLS measure includes SIC 67 (Holding and other investment offices).

It is evident from the table that growth rates for individual industries often differ substantially between the BEA- and BLS-projections data sets. The differences are large even for industries, such as transportation, communications, and utilities, where we would expect the quality of the source data to be quite high. For example, the BLS-projections data report a substantial slowdown in airline output growth (comparing 1995–2000 with the previous period), where the BEA data indicate acceleration. The BLS measures also report less growth in the large retail- and wholesale-trade sectors, where we found a large acceleration of growth in both labor productivity and MFP. On the other hand, the BLS-projections data show more output growth acceleration in depository banking, insurance, and the amusement and recreation industry (the latter is one of our negative productivity industries).
Using value-added weights, we find that the BEA data imply a slightly faster growth of output in the services-producing industries as a whole in both 1987–1995 and 1995–2000; but the magnitude of overall post-1995 acceleration is the same. Thus, despite the large differences at the level of individual industries, the two data sets are in surprisingly close agreement about the overall acceleration of output growth in the services-producing sector. As it has been our experience that the two agencies produce very similar employment estimates at the industry level, the BLS projections program’s output measures seem to offer strong support for the finding in the BEA industry data of a large improvement of productivity growth in overall services, even though they conflict greatly with BEA output measures at the detailed level.
We have been surprised by the degree of overlap between the industry programs of the BEA and the BLS Projections Office, yet it appears that there has been very little effort to compare and contrast their sources and methods.
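The comparison in table 14.9 reduces to two subtractions per industry (the change between periods, and the BLS figure minus the BEA figure) plus a weighted average across industries. The sketch below reproduces that arithmetic for three rows copied from the table. It works from the rounded published figures, so a recomputed value can differ from the printed one by 0.1 (for air transportation the recomputed gap is –5.1 against a printed –5.2). The function names are hypothetical, and the weights in the final line are invented for illustration because the value-added weights themselves are not reported in the text.

```python
# Growth rates (percent per year) for three industries, copied from table 14.9:
# (1987-1995, 1995-2000) for each source.
TABLE_14_9 = {
    "Transportation by air": {"BEA": (3.6, 5.4), "BLS": (4.8, 1.5)},
    "Retail trade": {"BEA": (2.9, 5.5), "BLS": (1.9, 4.3)},
    "Depository institutions": {"BEA": (1.5, 2.9), "BLS": (3.4, 6.4)},
}


def acceleration(early: float, late: float) -> float:
    """Change in the average annual growth rate between the two periods."""
    return round(late - early, 1)


def compare_sources(rows: dict) -> dict:
    """Return (BEA acceleration, BLS acceleration, BLS minus BEA) per industry."""
    result = {}
    for industry, sources in rows.items():
        bea = acceleration(*sources["BEA"])
        bls = acceleration(*sources["BLS"])
        result[industry] = (bea, bls, round(bls - bea, 1))
    return result


def weighted_mean(values, weights):
    """Weighted average; with value-added weights this gives a sector-level rate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    for industry, row in compare_sources(TABLE_14_9).items():
        print(industry, row)
    # Hypothetical weights, for illustration only.
    print(weighted_mean([1.8, 2.6, 1.4], [1.0, 6.0, 3.0]))
```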
It seems evident that there would be substantial benefit to tracing down the sources of differences in the alternative output measures. It is confusing for the statistical agencies to publish such contradictory measures, particularly when the sources of variation are not documented. While we are unlikely to see movement toward an integrated U.S. statistical system (where such redundancies would be eliminated by consolidation of these statistical programs, thereby melding resources to improve the data), this is one area where there would be significant gains from greater coordination of research efforts between the two agencies.21
21. Such a comparison is an obvious extension to the work reported in Fraumeni et al. (2006).
14.8.3 The Negative Productivity Growth Industries
Negative productivity growth always attracts skepticism, and well it should. In our estimates, the following industries have negative labor productivity growth over the 1995 to 2001 interval:
• Education                  –0.95 percent
• Amusement and recreation   –0.41 percent
• Hotels                     –0.57 percent
• Insurance carriers         –1.66 percent
• Local transit              –0.61 percent
• Construction               –1.12 percent
Analyses of the negative productivity issue include Corrado and Slifman (1999) and Gullickson and Harper (2002). Both studies set the negative-productivity industries (a larger number in their studies than in our results) equal to zero and recomputed aggregate productivity growth. There is no doubt some value to this procedure as a “what if?” exercise. However, we see little reason for supposing that cutting off the left tail of the distribution of productivity changes improves the estimate of the mean. Instead of mechanical “lopping off the tail” exercises, we believe that the statistical agencies should seek to identify the sources of the negative bias; that is, they should take negative productivity growth as an indicator for allocating resources to improve measurement. From our experience with the Brookings economic measurement workshops and from other information and research, we offer the following hypotheses.
Education
Educational output was the subject of a Brookings workshop in which two conclusions emerged: (a) no agreed-on measure of the output of the educational function itself exists, and (b) universities, and to an extent perhaps secondary education as well, are classic multioutput firms, in the sense that the cost function for their different activities is not separable on the inputs. For universities, joint outputs include, in addition to education of students, research, lodging and meal services, and entertainment (sports). These outputs interact with educational decision making (Ehrenberg [2000] provides numerous examples from his tenure as dean at a major university), but the output of these other activities is not normally included in the “industry’s” output, which is usually deflated only with an index of tuition. Interestingly, two of the joint products of universities (lodging and entertainment) also exhibit negative labor productivity growth (see our table 14.2), even when located in specialized firms.
We suspect that these relationships are neither coincidental nor insignificant, even after allowing for the fact that universities do not pay even the minimum wage to many workers in their entertainment activity. Moreover, at least in some universities, the employment figures may be suspect if faculty members devote an increasing amount of time to outside pursuits that do not directly contribute to the output of their employers.
One concludes from this that there are all kinds of measurement problems in computing productivity of the educational sector, covering the definitions of current price output, the deflators, and counting the labor input. Jorgenson and Fraumeni (1992) and also O’Mahoney and Stevens (2004) estimate the output of education by assessing its contribution to human capital and, therefore, to lifetime earnings streams of graduates. Their estimates are far larger than the output that is presently recorded in national accounts, a result that is consistent with the hypothesis that educational productivity is biased downward because of mismeasurement of educational output.
Amusement and Recreation
We know of no recent research on the output of the amusement and recreation industries.
Hotels
For hotels, the McKinsey Global Institute (2001) found the poor labor productivity performance of hotels consistent with other evidence, including information from McKinsey’s own consulting practice. Some of the quality improvements in hotel services, notably computerized reservation services, are unpriced outputs that have clearly created benefits to the customer but are not captured in the output measures used in national accounts. Thus, properly measured hotel productivity might not have negative growth.
Insurance
We suspect that negative productivity for insurance carriers is the result of an inadequate and unworkable definition of insurance output in the National Income and Product Accounts (NIPA) and the System of National Accounts (SNA). The long international debate on this topic is reviewed in Triplett (2001). In the national accounts, insurance output is defined as “premiums minus claims,” which means that the insurance company is depicted as administering the policy on the behalf of the policy holders and not as absorbing and managing risk. However, Triplett points out that because no contract exists for the “service” of managing the claims pool on behalf of the policy holders, no such service can be priced.
Thus, the concepts underlying the PPI price index for insurance (they price the premiums, which we think makes economic sense; see Sherwood 1999) are inconsistent with the national accounts view of insurance output. Insofar as insurance companies have improved their management of risk (which ought, other things equal, to reduce the margin of premiums minus claims), these improvements are outside the scope of the national accounts’ output measure. As additional evidence on this score, we note the peculiar behavior of the data for insurance carriers and insurance agents, considered together: at least one of them is almost always negative, but it is not always the same one, and improvements in the performance of one (or GDP revisions to one) are usually reflected in deterioration of the measured performance of the other.
In Triplett and Bosworth (2004, chapter 7), we show that the national accounts convention for measuring insurance carrier output accounts for the industry’s negative measured productivity in the BEA industry accounts data. Using BEA data, we reestimate output and labor productivity growth using a premiums-plus-investment-income concept for current-price output, with appropriate adjustments to the deflators. The alternative output concept produces positive labor productivity growth in the insurance industry, not negative growth as does the current BEA output concept. The exercise suggests where at least one measurement problem lies. However, the industry’s rate of productivity growth remains small, on the order of 0.5 percent (half of 1 percent) per year. We speculate that better allowance for insurance companies’ handling of risk and better accounting for the value of their new products would increase the measured rate of insurance productivity even more.
Local Transit
Although we are not sure the data are correct, on its face the negative productivity growth in local transit is consistent with the industry substituting its own internal labor for previously purchased inputs. Presumably, this would be the result of regulation, union contracts, and the general climate under which these services operate. The industry’s multifactor productivity growth in the recent period is quite respectable (1.29 percent per year) and goes in the opposite direction from its labor productivity. It is also possible that the industry is an example of inconsistency in the source data.
Construction
Construction is an industry whose productivity performance has puzzled many economists (see Baily and Gordon 1988; Pieper 1990).
Ours is a paper on services, not on goods-producing industries, but we think that research on measuring the output of construction deserves high priority. Though major parts of construction output are deflated with hedonic price indexes and have been for many years, deflators for other parts are clearly inadequate. We understand that the producer price index program has turned attention to producing deflators for this industry. The BEA is introducing hedonic indexes for commercial construction in the 2003 benchmark revision, but these indexes rise more rapidly than the deflators they replace, which would make the negative productivity in construction even more negative.22
22. Based on conversations with Bruce Grimm of the BEA.
14.8.4 Labor Hours and Input by Industry
The labor input in our study is persons engaged in production, and not hours, which are the labor input in the BLS productivity reports and in Jorgenson, Ho, and Stiroh (2006); in addition, we do not apply a labor-quality adjustment. Neither of these aspects is included in the BEA industry data set, and we lacked the resources to estimate an index of labor quality at the industry level. Our analysis indicates that omission of labor quality creates problems for measuring industry MFP. The reliance on employment, rather than hours, is an equally serious problem.
We have, however, little confidence in the estimates of hours across industries. The major source of industry hours is the BLS monthly establishment survey, known as the “Current Employment Statistics” survey. The objective of this survey can only be described as archaic, for it persists in collecting hours and earnings information only for what it calls “production workers” in manufacturing and “nonsupervisory workers” in the rest of the economy. The BLS productivity program estimates the hours of nonproduction and supervisory workers, using whatever information it can find. Hours of self-employed and salaried workers are obtained from the BLS-Census monthly household survey, the Current Population Survey (CPS).
Why the BLS emphasizes production and nonsupervisory workers for its establishment employment surveys defies understanding. Even on statistical grounds, the decision is questionable. With the huge changes in workplace organization and management in recent years, the boundary between what is a “production” and a “nonproduction” worker has become so blurred that it has lost its meaning. The same statement applies to “supervisory” and “nonsupervisory” workers outside manufacturing, except there the distinction has always been unclear.
This should not be news to a government statistical agency, for the line between what is a supervisory and a nonsupervisory worker within government has also provoked great controversy. But even if the boundaries between what the BLS does and does not collect were sharply defined, devoting the huge amount of resources that are put into the BLS establishment program23 to collecting hours and earnings data on only a fraction of the workers shows a profound disregard for the data that are important for economic analysis. Surely we want to know employment, earnings, and hours for all workers, not just for some fairly arbitrarily defined subset of them.

23. In both budgetary and sample size (so in respondent burden), this is one of the largest collections in the U.S. statistical system. Significantly, other countries seem to collect the same information at far less expense, Canada being one example.

As we understand it, the BLS reasoning behind holding onto the “production worker/nonsupervisory worker” definition for its establishment surveys rests on preserving time series comparability. Although we, too,
value time series continuity, it should not be at the cost of a failure to collect the information that is most relevant for analysis. In any event, blurring of the boundaries means that a constant definition does not produce time series comparability.

When “measurement problems” come up in the analysis of productivity, most economists immediately think about deflators. For industry productivity, the lack of a well-measured labor input is an equally serious problem and more inexplicable because measuring worker hours in services industries is nowhere near as complicated as measuring services industries’ output prices, an area where the BLS (in its PPI program) has made exemplary progress in recent years.

14.9 Conclusions

Using the relatively new BEA-BLS industry database, our study shows that the post-1995 growth in U.S. productivity is largely a story of developments in the services industries. For both labor productivity and MFP, productivity growth has been faster in services-producing industries than in goods-producing industries in recent years. Because this was not the case before 1995, services industries account for most of the acceleration in labor productivity and all of the acceleration of MFP after 1995. We emphasize that this is not just a story of growth in a small number of large services industries: twenty-four of twenty-nine services industries show labor productivity growth after 1995, and in seventeen of them, labor productivity growth accelerated. In more than half of the services industries, MFP accelerated after 1995.

In terms of contributions, services industries account for nearly three-quarters of total industry labor productivity growth and more than three-quarters of total industry MFP growth in the period after 1995. Moreover, capital deepening from IT investment in services industries accounts for 80 percent of the total IT effect in the U.S. nonfarm sector after 1995.
The services productivity story in recent years is a striking change from the past, and strikingly different as well from the stereotypical view that services industries are stagnant and without potential for technical changes or rapid productivity improvement. In recent years, services industries have become an engine of economic growth.
References

Baily, Martin N., and Robert Gordon. 1988. The productivity slowdown, measurement issues, and the explosion of computer power. Brookings Papers on Economic Activity, Issue no. 2:347–420. Washington, DC: Brookings Institution.
Baily, Martin N., and Robert Z. Lawrence. 2001. Do we have a new e-conomy? NBER Working Paper no. 8243. Cambridge, MA: National Bureau of Economic Research.
Baumol, William J. 1967. Macroeconomics of unbalanced growth: The anatomy of urban crises. American Economic Review 57 (3): 415–26.
Corrado, Carol, and Lawrence Slifman. 1999. Decomposition of productivity and unit costs. American Economic Review: Papers and Proceedings 89 (2): 328–32.
Domar, Evsey D. 1961. On the measurement of technological change. The Economic Journal 71 (December): 709–29.
Ehrenberg, Ronald G. 2000. Why can’t colleges control their costs. Paper presented at Brookings workshop on Measuring the Output of the Education Sector, Washington, DC. http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20000407.htm.
Fisher, Franklin. 2003. Comment. Journal of Economic Perspectives 17 (4): 227–35.
Fraumeni, Barbara M., Michael J. Harper, Susan G. Powers, and Robert E. Yuskavage. 2006. An integrated BEA/BLS production account: A first step and theoretical considerations. In A new architecture for the U.S. national accounts, ed. Dale W. Jorgenson, J. Steven Landefeld, and William D. Nordhaus, 355–435. Studies in Income and Wealth, vol. 66. Chicago: University of Chicago Press.
Gollop, Frank. 1979. Accounting for intermediate input: The link between sectoral and aggregate measures of productivity growth. In The meaning and interpretation of productivity, ed. Albert Rees and John Kendrick, 318–33. Washington, DC: National Academy of Sciences.
Gordon, Robert. 2000. Does the “new economy” measure up to the great inventions of the past? Journal of Economic Perspectives 14 (4): 49–74.
———. 2002. The United States. In Technological innovation and economic performance, ed. Benn Steil, David G. Victor, and Richard R. Nelson, 49–72. Princeton, NJ: Princeton University Press.
Griliches, Zvi, ed. 1992. Output measurement in the services sectors. Studies in Income and Wealth, vol. 56. Chicago: University of Chicago Press.
———. 1994. Productivity, R&D, and the data constraint. American Economic Review 84 (1): 1–23.
Grunfeld, Yehuda, and Zvi Griliches. 1960. Is aggregation necessarily bad? Review of Economics and Statistics 42 (1): 1–13.
Gullickson, William, and Michael J. Harper. 2002. Bias in aggregate productivity trends revisited. Monthly Labor Review 125 (3): 32–40.
Hulten, Charles. 1978. Growth accounting with intermediate inputs. Review of Economic Studies 45:511–18.
Jorgenson, Dale W., and Barbara Fraumeni. 1992. Investment in education and U.S. economic growth. Scandinavian Journal of Economics 94 (Supplement): S51–S70.
Jorgenson, Dale W., Frank M. Gollop, and Barbara M. Fraumeni. 1987. Productivity and U.S. economic growth. Cambridge, MA: Harvard University Press.
Jorgenson, Dale W., Mun S. Ho, and Kevin J. Stiroh. 2005. Growth in U.S. industries and investments in information technology and higher education. In Measuring capital in the new economy, ed. Carol Corrado, John Haltiwanger, and Daniel Sichel, 403–78. Studies in Income and Wealth, vol. 65. Chicago: University of Chicago Press.
Jorgenson, Dale W., and Kevin J. Stiroh. 2000. Raising the speed limit: U.S. economic growth in the information age. Brookings Papers on Economic Activity, Issue no. 2:125–211. Washington, DC: Brookings Institution.
Lawson, Ann, Brian Moyer, Sumiye Okubo, and Mark Planting. 2006. Integrating industry and national economic accounts: First steps and future improvements. In A new architecture for the U.S. national accounts, ed. Dale W. Jorgenson, J. Steven Landefeld, and William D. Nordhaus, 215–61. Studies in Income and Wealth, vol. 66. Chicago: University of Chicago Press.
Lum, Sherlene K. S., Brian C. Moyer, and Robert E. Yuskavage. 2000. Improved estimates of gross product by industry for 1947–98. Survey of Current Business 80 (June): 24–54.
McKinsey Global Institute. 2001. U.S. productivity growth, 1995–2000. Washington, DC: McKinsey Global Institute.
Moyer, Brian C., Mark A. Planting, Mahnaz Fahim-Nader, and Sherlene K. S. Lum. Preview of the comprehensive revision of the annual industry accounts. Survey of Current Business 84 (3): 38–51.
Nordhaus, William D. 2002. Productivity growth and the new economy. Brookings Papers on Economic Activity, Issue no. 2:211–65.
Oliner, Stephen D., and Daniel E. Sichel. 2000. The resurgence of growth in the late 1990s: Is information technology the story? Journal of Economic Perspectives 14 (fall): 3–22.
———. 2002. Informational technology and productivity: Where are we now and where are we going? Economic Review 87 (3): 15–44.
O’Mahony, Mary, and Philip Stevens. 2004. International comparisons of performance in the provision of public services: Outcome-based measures for education. London: National Institute of Economic and Social Research, July. http://www.swan.ac.uk/economics/res2004/program/papers/OMahonyStevens.pdf.
O’Mahony, Mary, and Bart van Ark. 2003. EU productivity and competitiveness: An industry perspective. Luxembourg, UK: Office for Official Publications of the European Communities.
Pieper, Paul E. 1990. The measurement of construction prices: Retrospect and prospect. In Fifty years of economic measurement: The jubilee of the Conference on Research in Income and Wealth, ed. Ernst R. Berndt and Jack E. Triplett, 239–68. Chicago: University of Chicago Press.
Sherwood, Mark K. 1999. Output of the property and casualty insurance industry. Canadian Journal of Economics 32 (2): 518–46.
Solow, Robert M. 1957. Technical change and the aggregate production function. Review of Economics and Statistics 39 (August): 312–20.
Stiroh, Kevin. 2002. Information technology and U.S. productivity revival: What do the industry data say? American Economic Review 92 (5): 1559–76.
Triplett, Jack E. 2001. Price, output, and productivity of insurance: A review of the measurement issues. Brookings Institution Discussion Paper. Washington, DC: Brookings Institution.
Triplett, Jack E., and Barry P. Bosworth. 2004. Productivity in the U.S. services sector. Washington, DC: Brookings Institution.
———. 2006. Baumol’s disease has been cured: IT and multifactor productivity in U.S. service industries. In The new economy and beyond: Past, present, and future, ed. Dennis W. Jansen, 34–71. Cheltenham, UK: Edward Elgar.
Yuskavage, Robert E. 1996. Improved estimates of gross product by industry, 1959–94. Survey of Current Business 76 (8): 133–55.
———. 2000. Priorities for industry accounts at BEA. Paper presented at the Bureau of Economic Analysis Advisory Meeting, Washington, DC.
———. 2001. Issues in the measure of transportation output: The perspective of the BEA industry accounts. Paper presented at the Brookings Institution Workshop on Transportation Output and Productivity, Washington, DC. http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20010504.htm.
15 A Consistent Accounting of U.S. Productivity Growth

Eric J. Bartelsman and J. Joseph Beaulieu
Eric J. Bartelsman is a professor of economics at the Vrije Universiteit Amsterdam, and a research fellow of the Tinbergen Institute. J. Joseph Beaulieu is an economist at Brevan Howard, Inc., and was a staff member of the Board of Governors of the Federal Reserve System when this chapter was written. The authors would like to thank Carol Corrado for comments at various stages of the project; Jonathan Eller, Suzanne Polatz, Marcin Przybyla, Brian Rowe, Koen Vermelyen, and Matt Wilson for research assistance; the Bureau of Economic Analysis (BEA) for providing data; comments from Barbara Fraumeni, Edward Prescott, and Daniele Coen-Pirani; participants at the NBER Conference on Research in Income and Wealth Summer Institute; the Federal Reserve Bank of Minneapolis; and the winter meetings of the Econometric Society. This paper was prepared while Beaulieu was an economist at the Board of Governors of the Federal Reserve System.

15.1 Introduction

Zvi Griliches thought so much of the difficulty inherent in building data sets suitable for analysis that he devoted a chapter to the problem in the Handbook of Econometrics (1986). With respect to available data, he wrote:

    There are at least three interrelated and overlapping causes of our difficulties: (1) the theory (model) is incomplete or incorrect; (2) the units are wrong, either at too high a level of aggregation or with no way of allowing for the heterogeneity of responses; and, (3) the data are inaccurate on their own terms, incorrect relative to what they purport to measure. The average applied study has to struggle with all three possibilities. (1468–69)

The problems are especially acute in the study of productivity, where researchers usually have to “find” their data from different sources. These disparate data sources are produced by different government agencies or private outfits and are designed to answer different questions. As such,
changes in the ratio of real outputs to inputs may reflect inconsistencies in the data set, rather than movements in productivity. A basic measure of total factor productivity (TFP) requires data on real output, labor input, capital services, real intermediate inputs, and the distribution of income to factors of production. These data often are assembled from different sources and thus may not measure the activities of the same set of producers—even if the descriptions of the producers’ activities are the same. Across different variables, the underlying microdata of producing units may have been collected in an inconsistent manner. Data sets may be organized using disparate classifications to describe the activities of producers. Even if the data come from one source, the classification system can vary over time. Sometimes different classification schemes reflect more fundamental differences, but the data still may contain exploitable information. Another problem is the estimation of missing data. Data are often published at higher levels of aggregation than desired. Sometimes data are available at a fine level of detail in one dimension but only at a very high level of aggregation in another dimension when detailed data are needed in both dimensions. A practical problem occurs when new data releases first provide totals, followed with detailed data at a considerable lag. In order to conduct research with all the desired detail and to make use of the latest data, procedures are needed to best use all the available information. The purpose of this paper is twofold. First, it describes how our systematic approach to data organization and manipulation helped us in constructing a consistent data set to study productivity. Many of the data hurdles just described had to be overcome in order to line up information on outputs, inputs, and prices from multiple sources in a way that minimized inconsistencies in definitions and coverage of the data. 
Second, the paper presents some simple applications. It reports estimates of TFP growth by industry and by legal form of organization, and it reconsiders these estimates assuming that firms scrapped an unusual amount of capital addressing potential Y2K bugs.

Productivity in U.S. industries and sectors is the focus of the paper and is prominent in the discussion of the data problems. However, the data issues are ubiquitous, and the systematic approach can be applied to other areas of empirical study; can be scaled down for use with microdata; or can be used to handle additional dimensions, such as regions or countries.

At present, the systematic approach is applied in a system using statistical and relational database facilities of the software package SAS.1 Versions of the system presently are in use at the Board of Governors of the Federal Reserve System and by the European Union (EU) 6th Framework research program EUKLEMS.2

1. The SAS Institute is in Cary, NC.
2. See http://www.euklems.org.
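To make the data requirements listed earlier concrete (real output, labor, capital services, real intermediate inputs, and factor income shares), the calculation they feed can be sketched in a few lines. The Python sketch below uses a standard Törnqvist growth-accounting form, which is assumed here for illustration and is not necessarily the exact procedure used later in the chapter; the function name `tfp_growth` and all numbers are hypothetical.

```python
import math

def tfp_growth(output, inputs, shares):
    """Log TFP growth between two periods (Tornqvist form, assumed for illustration).

    output: (y0, y1) real gross output in the two periods
    inputs: {name: (x0, x1)} real input quantities
    shares: {name: (s0, s1)} nominal cost shares, summing to 1 in each period
    """
    y0, y1 = output
    weighted_input_growth = 0.0
    for name, (x0, x1) in inputs.items():
        s0, s1 = shares[name]
        avg_share = 0.5 * (s0 + s1)  # two-period average share
        weighted_input_growth += avg_share * math.log(x1 / x0)
    return math.log(y1 / y0) - weighted_input_growth

# Hypothetical industry: output and every input grow by 2 percent,
# so measured TFP growth should be (numerically) zero.
shares = {"labor": (0.3, 0.3), "capital": (0.2, 0.2), "intermediates": (0.5, 0.5)}
inputs = {"labor": (50.0, 51.0), "capital": (30.0, 30.6), "intermediates": (20.0, 20.4)}
balanced = tfp_growth((100.0, 102.0), inputs, shares)
```

Because TFP is computed as a residual, any mismatch in coverage or definition between the output series and the input series ends up in the measured productivity number, which is the inconsistency the text warns about.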
There are distinct advantages to the systematic approach to data organization and standardization of manipulation techniques. First, much tedious work in mapping relations between data sources is simplified. Next, documentation of any particular application can refer to standardized data manipulation routines. Further, the data organization scheme simplifies management of complex projects and the sharing of data between users. For researchers interested in data quality, the systematic approach provides a laboratory: researchers can easily vary the particular assumptions that they used to create estimates in order to test for their sensitivity. A researcher could go farther and produce confidence bands around such estimates. One could also apply this methodology to already published data that are produced by the statistical agencies. Indeed, some data sets that have been made available by government agencies rely on the same techniques available in our system to produce their estimates. As such, a reconsideration of the assumptions that these agencies employ to produce their estimates may be useful.3 Finally, this approach allows one to consider rigorously counterfactual exercises or to explore the implications of mismeasurement, such as in Jorgenson and Stiroh (2000).

3. With the exception of the recent literature considering mismeasured or biased prices, we are not aware of many papers that directly explore the idea that published data are partially built on assumptions and models where alternatives can be considered. Exceptions include Wilcox (1992) and Miron and Zeldes (1989). There is, however, a developed literature studying the effects of measurement error; see Bell and Wilcox (1993) and references therein. See Weale (1985) for an approach similar to the strategy that could be contemplated with our system.

15.2 Constructing a Consistent Data Set

The systematic approach to data organization and manipulation that we have developed provides a practical method to cope with the data problems. Before describing our approach, we give some brief examples of the types of hurdles faced in building a consistent productivity data set.

15.2.1 Data Hurdles

A potentially difficult problem arises when two aggregates that share a common description or title in statistical publications in fact are defined differently. For example, before the 2003 comprehensive benchmark revision to the National Income and Product Accounts (NIPAs), the Bureau of Labor Statistics (BLS) and the BEA had different definitions of nonfarm business. The BLS (and since 2003, the BEA) excludes two imputations (owner-occupied housing and the rental value of nonprofits’ capital equipment and structures) that the BEA makes to estimate gross domestic product (GDP). Although a careful reading of underlying documentation can trap such differences, only the detailed reconstruction and reaggregation
Table 15.1  Comparison of 2001 compensation and profits data in the GPO and the NIPA data sets

                                                               Profits with inventory
                                          Compensation         valuation adjustment
                                          GPO       NIPA       GPO       NIPA
Manufacturing                             939.2     939.2      52.1      83.4
Transportation and utilities              382.1     382.1      23.8      27.7
Wholesale trade                           379.8     379.8      48.5      44.8
Retail trade                              531.1     531.1      79.3      79.1
Remaining domestic private industries     2586.9    2586.9     320.8     524.4

Note: GPO and NIPA compensation data are collected on an establishment basis. NIPA profits data are collected by firms; the GPO converts these data to an establishment basis.
of the underlying data will allow one to reconcile the differences in outcomes of analysis based on the two output definitions.

A more fundamental problem is related to differences in data collection underlying the aggregates. One well-known example is the firm-establishment problem. United States business data are usually collected at one of two levels: at the establishment level, such as at individual plants, stores, or comparable places of work; or at the firm level (Postner 1984). A problem arises, however, when a firm has multiple establishments that are engaged in different lines of work. General Electric (GE) has extensive operations in manufacturing, finance, and services. Data collected at the establishment level will effectively split GE data among different industries along the different lines of work of the individual establishments. Data collected at the firm level will classify all of GE in one industry based on its major line of work. Currently Compustat assigns GE to the catchall category “miscellaneous,” although a few years ago GE was designated an electrical equipment manufacturer. Researchers manipulating the data need to know how the data were collected.

In putting together the Gross Product Originating (GPO) data set, economists in the Industry Division at the BEA have converted all of the data to an establishment-basis concept.4

4. In this paper we refer to these industry data as the GPO data set. The BEA has published versions of this data set under different names. In 2003, the data set was called the GDP-by-Industry data. In 2004, the BEA significantly changed the method by which it calculates these data, yet it continues to refer to them as GDP-by-Industry data. In order to clearly identify the data that we are using as the data consistent with the 2003 and prior methods, we use the older name, GPO data.

The NIPA, on the other hand, also present some industry data, but the definition of industry is not consistent across different types of income. The NIPA compensation data are collected at the establishment level, and as table 15.1 illustrates, the two
sources match. The NIPA profit data, however, are collected at the firm level from administrative sources, and, therefore, the two databases do not agree on the mix of profits across industries although they do match in the aggregate. A problem that is particularly annoying to researchers is when classification schemes vary over time. Researchers often need long time series, but changes in classification schemes cause breaks in the series. Usually, industry data before 1987 are based on the 1972 Standard Industrial Classification (SIC) System, while later data are organized on the 1987 SIC. Recently statistical agencies have switched to the North American Industry Classification System (NAICS). Input-output tables use their own classification systems, which also have changed over time. Reclassifying the data so that they are all on one system—a procedure called concording—can be difficult when only published data sources are available. Sometimes, two classification systems may be motivated by entirely different concepts. Nevertheless, incorporating information from both systems may be useful. A good deal of NIPA data is presented by legal form of organization (LFO). While these data cannot be simply linked to data split by industry, we know from the economic censuses that the mix of corporate versus noncorporate businesses varies across industries. Manufacturing, mining, and utilities are predominately corporate, while some service industries, such as membership organizations, personal, and legal services have a large fraction of unincorporated firms. As such, the LFO data contain exploitable information on the mix of an aggregate across industries. Another way data can be mismatched to the needs of the researcher is when some data are incomplete or missing. One data set may present manufacturing industry data split at the two-digit SIC level, while another may include only durable and nondurable subaggregates. 
A different example can be found in the NIPA where, at the national level, taxes on production and imports, business transfers, and the surplus of government enterprises less subsidies are presented separately, but for corporations only the sum is listed. A second example of the problems presented by missing data arises when a researcher has data of aggregates in different dimensions but does not have detailed estimates broken out in each dimension. For instance, the GPO contains information on noncorporate net interest paid by various industries. The NIPA provide national totals for net interest paid by partnerships and proprietorships and by other private businesses. No published data, however, exist on net interest paid split both by industry and by this level of legal form of organization. A final way in which data can be incomplete is when aggregate data are updated, but updated disaggregated data are not yet available. For example, the BEA publishes initial data on all of the components of gross domestic income (GDI) for a particular year at the end of March of the following year. Typically, it publishes benchmarked data at the end of July, but the industry data are not finalized until well after. One could imagine that
it would be possible to develop initial estimates for the recently completed year and incorporate the revised national data to quickly update industry estimates in prior years. Indeed, the BEA has developed a program to produce such “accelerated current-dollar estimates” (see Yuskavage 2002); even so, revised data at the more detailed level are only available with the release of the full data set.

15.2.2 Overview of Data Organization and Manipulation

The system we use consists of four interrelated components that provide practical tools to deal with these problems. First, we store economic data, such as the NIPA, GPO, and input-output data, in a relational database, using an appropriate data model. The data model in our application is set up to reflect the conceptual structure of the system of national accounts, with flows between actors tagged by information on their sector, activity (industry), commodity and transaction type, among others.

The second aspect of the data organization is to code information on data definitions and on ways in which the data interrelate, the so-called metadata. In particular, these metadata describe how the detailed data in one dimension, for example, industry, aggregate in a hierarchy, or how detailed industries in two different industry classifications map into each other. The metadata imply linear restrictions across observations that ensure overall consistency of the data set.

Third, the relational database and the metadata make it possible to write standardized routines, or tools, to manipulate the data. These generalized tools fall into one of four categories: aggregating, disaggregating, balancing, and concording data. These four operations help to overcome many of the hurdles that researchers face when using data from different sources.

Finally, the system contains some specialized tools necessary for the study of productivity. These specialized tools allow users to estimate capital stocks, capital services, and TFP employing a variety of assumptions.

15.2.3 Relational Structure of the Productivity Data Set

The main component of the consistent productivity database we have put together is the GPO data set published by the BEA. The GPO data set includes annual data on price deflators, real and nominal measures of gross output, intermediate inputs, and value added. Industries are defined roughly at the two-digit SIC level. The data set also includes nominal income components of value added, such as capital consumption allowances, compensation, and capital income. The data are consistent with the income-side measure of domestic product in the NIPA; the sum across all industries equals gross domestic income.5

5. Gross domestic income equals gross domestic output less the statistical discrepancy (see Parker and Seskin 1997).
Table 15.2  Structure of GPO data set

Date    Sector    Industry        Imputed    Transaction     Units          Value
1987    Total     Total           Total      Value added     Bil. Ch-96$    6,113.2
1995    Total     Farms           Total      Compensation    Bil. $         15.7
1996    Total     Retail trade    Total      Gross output    Deflator       100.0
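Each row of table 15.2 can be read as one record whose identifying characteristics are stored alongside its value. A minimal Python sketch of that idea follows; it is not the authors' SAS implementation, and the class name `Observation` and the helper `select` are hypothetical names chosen for illustration. The three rows are copied from table 15.2.

```python
from typing import NamedTuple

class Observation(NamedTuple):
    """One record: the value plus every dimension that identifies it."""
    date: int
    sector: str
    industry: str
    imputed: str
    transaction: str
    units: str
    value: float

rows = [
    Observation(1987, "Total", "Total", "Total", "Value added", "Bil. Ch-96$", 6113.2),
    Observation(1995, "Total", "Farms", "Total", "Compensation", "Bil. $", 15.7),
    Observation(1996, "Total", "Retail trade", "Total", "Gross output", "Deflator", 100.0),
]

def select(records, **criteria):
    """Return the records whose dimensions match every keyword given."""
    return [r for r in records
            if all(getattr(r, key) == wanted for key, wanted in criteria.items())]

# Look up farm compensation by its characteristics rather than by position.
farm_compensation = select(rows, industry="Farms", transaction="Compensation")
```

Storing data in this long form means that adding a dimension, or a new value within a dimension, does not change the layout of the existing records.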
To bring the GPO data into our system, we defined a conceptual data model to code the information. In a relational database, a particular datum is not simply the numerical value an observation takes, but the set of all the relevant characteristics that identify the value within the data model. The model we chose for the GPO data set looks like the one in table 15.2.

Five columns, or dimensions, of data characterize the observations in this data set. Industry indicates the specific industry (organized on the 1987 SIC); the first observation in the table is for real GDI, and so industry equals the total over all industries.

Transaction describes where the product or input relates in the chain of production. There are two types of transactions, distributive and productive. Distributive transactions are typically income, or income-like items such as compensation, profits, capital consumption, and so on. Productive transactions relate to goods and services produced or consumed as inputs to the production process, such as gross output, intermediate inputs, labor hours, capital services, investment, consumption, and so on.

Date is the particular date of the observation. In this case, the date simply records the year because the GPO data set contains annual data. In other cases, one could code values in this dimension to incorporate information on the frequency of the data (monthly, quarterly, etc.) and other timing attributes, such as average over period, end of period, and so on. One could imagine that in some applications, frequencies and timing attributes would be coded in different dimensions when the data set contains a variety of combinations.

Units describes how the variable is measured and whether it is a nominal variable, a price deflator, or a real variable. Value reports the numerical value of the data of interest.

Because we augment the GPO data set with other information, we have added two additional dimensions to describe the data.
Sector represents the NIPA institutional sectors (business, general government, households, and nonprofit institutions). The business sector is further refined by legal form of organization (corporate, noncorporate, etc.). Imputed accounts for whether the data apply to the two imputed sectors in the NIPA, owner-occupied housing and the rental value of nonprofits’ capital equipment and structures, or not. In the NIPA, a large component of consumption is owner-occupied housing. The BEA accounts for the production of this service by assuming that there is an entity that owns the
stock of owner-occupied housing and rents it back to its owners. The rental value of owner-occupied housing is treated as consumption. To preserve the identity that GDP equals GDI (up to the statistical discrepancy), the BEA imputes income to this entity. This accounting convention also makes GDP and GDI invariant to whether a house is rented or occupied by its owners. The second imputation involves the rental value of nonprofits’ capital equipment and structures. As for owner-occupied housing, the BEA pretends that there is an entity that owns this capital and rents it to nonprofit organizations.6 In some cases, the GPO data provide information on sectors. Capital consumption allowances, inventory valuation adjustments, and net interest paid by industry are split between corporate and noncorporate sectors. Other income items are known to accrue to only one type of sector. Profits only go to corporations; proprietors’ income and rental income involve only noncorporate businesses, and government surpluses accrue only to government enterprises. Other items, such as compensation, indirect taxes, and gross output have no information on sector. We know totals across industries from the NIPA; we allocate these domestic totals to different industries using the techniques described below. Likewise, we appeal to the NIPA table on imputations to calculate estimates of income for owneroccupied housing and the rental value of nonprofits’ capital. These imputations involve one industry (real estate) and therefore require no additional estimation. A complete accounting of the circular flow of goods, services, and income would include a few other dimensions that identify not only who produces the good or service or who pays the income, but also who purchases the good or service or receives the income. In such a way one could fully integrate all of the NIPA data into the system (for example, tables of personal income and government receipts and expenditures). 
Such an analysis would be necessary when studying income dynamics or general equilibrium, but these dimensions are not needed for the present study and are excluded.

The presence of various industrial classification schemes presents a small dilemma. One could imagine having separate dimensions to describe each classification scheme: one for SIC 1972, another for SIC 1987, and a third for NAICS. Under this strategy, observations using one system would be concorded to all of the other relevant classification schemes before they were stored in the database. We do not follow this strategy.

6. To be exact, these are only nonprofit institutions that primarily serve individuals. Nonprofits that primarily serve businesses, such as trade associations, are treated like any other business in the NIPA in that their consumption of nondurable goods is counted as intermediate usage, and their purchases of equipment and structures are counted as investment. The income paid by these institutions to various factors of production is included in the aggregates for corporations.

Usually one is
A Consistent Accounting of U.S. Productivity Growth
not particularly interested in seeing how the different classification systems compare; instead, one just wants to convert all of the data to one particular system. Maintaining new data on an old classification scheme could become burdensome, and the new classification should have some advantages in representing the current structure of the economy. Nonetheless, it would be possible to implement this strategy, and in some cases, such as building a concordance from microdata, it would be the way to go.

15.2.4 Metadata for the Productivity Data Set

The second part of the system involves coding various linear constraints in two types of metadata: hierarchies and concordances. A hierarchy describes how the data add to their total. Knowing the hierarchy is useful for several reasons. It makes the calculation of interesting subaggregates possible, and it makes matching data sets that differ on their level of aggregation easier. One can keep some subtotals in a database and use the hierarchy to then exclude those subaggregates when calculating other subaggregates or totals. It may be important to carry these subaggregates in the database, especially when they are read directly from a data source. Rounding issues make the subaggregates read directly from the data source more accurate than anything that can be calculated, especially for chain-weighted aggregates. Finally, and perhaps most important, the hierarchies code the myriad linear constraints that exist in economic theory as well as in various data sets.

In our data model, we need hierarchies in four main dimensions: industries, transactions, sectors, and imputed indicators. (See table 15.3.) Note that these hierarchies apply at any level of another hierarchy. Value added for the whole economy equals the consumption of fixed capital and income. Likewise, value added for corporate farms equals the consumption of fixed capital and income of corporate farms.
While there are four relevant conceptual dimensions in our data model, in practice the various data sets we work with each have their own classifications for each dimension; for example, some data sources use the 1972 SIC, while others use the 1987 SIC to describe the industry hierarchy.

The second type of metadata, a concordance, describes how two classification schemes relate. The concordance can be as simple as a list of which components in one system map to the components of a second system and vice versa, or it can provide more detail on the relative magnitudes of how much of a component of one system is split among the components of the other system. What distinguishes a concordance with detailed information on relative magnitudes from simply a detailed data set is that the information on magnitudes in a concordance is typically available for only one year. The concordance tool ensures that these relative magnitudes are applied across years, and the discussion of the concordance tool describes concordances in more detail. In particular, it explains how we constructed the concordance to convert GPO data from the 1972 SIC for the years 1977 to 1986 to the 1987 SIC.

Table 15.3    Hierarchies for four main dimensions

Industries
  Domestic total
    Farms
    Nonfarm
      Agricultural services, forestry, fishing
      Mining
        Metal mining
        Coal mining
        Oil and gas extraction
        Mining services
      Construction
      Manufacturing
        Durable manufacturing
          Lumber
          Furniture and fixtures

Sectors
  Domestic total
    Business
      Corporate
        Financial corporations
        Nonfinancial corporations
      Noncorporate business
        Sole proprietors and partnerships
        Other private business
    Households and institutions
      Households
      Institutions
    Government
      General government
        Federal
        State and local
      Government enterprises
        Federal
        State and local

Transactions
  Gross output
    Intermediate inputs
    Value added
      Consumption of fixed capital
      Income
        Compensation
        Taxes on production and imports
        Net operating surplus
          Current transfers
          Proprietors’ income
          Rental income
          Profits
          Inventory valuation adjustment
          Surplus of government enterprises

Imputed
  Domestic total
    Owner-occupied housing
    Rental value of nonprofits’ capital
    Not imputed

15.2.5 Standardized Operations

The third component of the system uses the metadata along with the organization of the data in a relational database to automate regular data operations. For example, if a data set contains information in the dimension industry using the 1987 SIC classification, the aggregation routine refers to the 1987 SIC metadata to find out which aggregates need to be created by summing over which detailed industries.
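The way an aggregation routine can consult hierarchy metadata may be easier to see in a small sketch. The Python fragment below is only an illustration under stated assumptions: the dictionary `HIERARCHY`, the functions `aggregate` and `all_aggregates`, and the dollar values are hypothetical and are not taken from the system described here; only the industry names follow table 15.3.

```python
# Hypothetical hierarchy metadata: each aggregate maps to its direct children.
# The industry names follow table 15.3; the dollar values below are made up.
HIERARCHY = {
    "Domestic total": ["Farms", "Nonfarm"],
    "Nonfarm": ["Mining", "Construction"],
    "Mining": ["Metal mining", "Coal mining"],
}


def aggregate(node, leaf_values, hierarchy):
    """Return the value of `node` by summing over its detailed components."""
    children = hierarchy.get(node)
    if children is None:
        # A node with no children is a detailed (leaf) series.
        return leaf_values[node]
    return sum(aggregate(child, leaf_values, hierarchy) for child in children)


def all_aggregates(leaf_values, hierarchy):
    """Compute every aggregate named in the hierarchy metadata."""
    return {node: aggregate(node, leaf_values, hierarchy) for node in hierarchy}


leaf_values = {
    "Farms": 10.0,
    "Metal mining": 2.0,
    "Coal mining": 3.0,
    "Construction": 20.0,
}
totals = all_aggregates(leaf_values, HIERARCHY)
```

Because the hierarchy is stored as data rather than written into the routine, the same function could in principle be pointed at a different classification (for example, a 1972 SIC hierarchy) without changing the code. This simple summation applies only to series that add, such as nominal dollars or counts.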
Aggregating

The most straightforward operation is aggregation. Nominal dollar and count data, such as hours and employment, are simply added up a defined hierarchy to calculate aggregates at various levels of detail. Other types of aggregation, such as Laspeyres, Paasche, Fisher Ideal, or Divisia chained indexes, involve more complex operations that require additional data on weights. In our particular application, we aggregate over all nonfarm industries and over all business sectors for observations where the imputed dimension has value “not imputed” in order to calculate the aggregate consistent with the definition employed by the BLS in its multifactor productivity program.

Disaggregating

A second operation that is often required is disaggregation, which is the inverse operation of aggregation. Given an aggregate, one can estimate the constituent pieces. For instance, in the GPO data before 1987, industries 36 (electrical machinery) and 38 (instruments) are aggregated together; however, we would like to have historical data for both separately. The difference between aggregation and disaggregation, however, is that the former is a many-to-one operation. No other information besides the constituent pieces, and perhaps corresponding weights in the case of fancier aggregates, is required to calculate the aggregate. On the other hand, disaggregation is usually a one-to-many operation, and, thus, one needs additional information to choose among the infinite possible ways to split a total.7 We refer to this additional information as a pattern. The pattern data need not be consistent with the original data of interest. After all, if the pattern data were to aggregate to the target, one would already have consistent estimates of the pieces. For simple data that add, the procedure scales up or down the pattern data to yield disaggregated pieces that sum to the known total. Let i index the component of an aggregate T.
Denote the observed aggregate $d_T$, and suppose that there are pattern data on the pieces, $p_i$. Then the estimate of the disaggregated pieces is given by $d_i = d_T \, p_i / \sum_{j=1}^{I} p_j$. In the case of Fisher ideal indexes, the procedure does this separately for Paasche and Laspeyres indexes, which do add, and then takes the geometric average of the two. The quality of the result depends on how well the initial pattern reflects the true distribution of the aggregate. Sometimes, only a few scraps of information may be available that provide rough guidance; in the limit, the fallback could be simply to split the aggregate evenly. Other times, some market conditions or other reasonable assumptions may be used to justify a particular pattern.

7. In cases when the aggregate and all but one component are known, the procedure is exact, and no pattern data are needed. This case arises when one wants to exclude one component from a large aggregate; typically, all of the data on both the aggregate and the piece to be excluded are known.

In building our data set, we augmented the GPO database with an estimate of hours by industries. The NIPA contain estimates of hours paid by industry at a fairly aggregated level. We disaggregated this information down to the industry level in our data set using full-time equivalent employees as an indicator. By using employees as an indicator, we are implicitly assuming that average hours per full-time employee are the same across different detailed industries that are part of the aggregate observed in the NIPA data. We then used these estimates to disaggregate the BLS hours measures, which adjust for the difference between hours worked and hours paid, to get a measure of employee hours worked by industry.

The GPO data set also includes data on all persons engaged in production, which equals the number of employees in an industry plus the number of people working alone. The BLS publishes aggregate estimates of the labor hours of the self-employed and an estimate of self-employed compensation. This last measure represents the fraction of proprietors’ income that could be considered labor compensation, as if the proprietor pays a salary to himself or herself. The BLS makes this calculation in order to correctly weight the contribution of labor and capital in production function estimates. We make this same adjustment at a more detailed level; we estimate self-employed hours and compensation by industry controlled to the BLS’s aggregates using the disaggregation procedure. For self-employed hours, we use an estimate of self-employed workers as a pattern indicator; for self-employed compensation, we use employees’ compensation per hour times our estimate of self-employed hours as a pattern indicator.

The automated nature of the tool provides an additional advantage.
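The proportional scaling rule for simple additive data, in which each piece equals the known total times its share of the pattern, can be sketched in a few lines of Python. This is a minimal illustration, not the tool described in the text: the function name `disaggregate` and the numbers are hypothetical, and the even split used when the pattern sums to zero is the limiting fallback mentioned above.

```python
def disaggregate(total, pattern):
    """Scale pattern data so the pieces sum to the known total.

    Implements d_i = d_T * p_i / sum_j p_j for data that add.
    """
    pattern_sum = sum(pattern.values())
    if pattern_sum == 0:
        # With no usable pattern, fall back to an even split.
        return {name: total / len(pattern) for name in pattern}
    return {name: total * p / pattern_sum for name, p in pattern.items()}


# Hypothetical numbers: split an aggregate of 120 (say, hours) between two
# industries, using employment in each industry as the pattern.
pattern = {"electrical machinery": 30.0, "instruments": 10.0}
pieces = disaggregate(120.0, pattern)
```

Here the pattern shares are 3/4 and 1/4, so the pieces are 90 and 30. The pattern itself (30 and 10) does not need to sum to the total, which is the sense in which pattern data need not be consistent with the data of interest.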
By varying the pattern data, such as by adding random noise, one can measure how sensitive the results are to the original pattern. Indeed, with a set of statistical assumptions, one could estimate standard errors around these estimates.

Balancing

A third operation, balancing, allows one to estimate data subject to linear constraints in multiple dimensions. An example of a balancing problem shows up when trying to calculate capital services. To do this, one needs investment by type of equipment and by type of industry, while only data on economywide investment by type of equipment and total investment by industry are typically available. As with disaggregation, there are multiple solutions to the linear constraints; several solutions to the problem of finding a single best set of estimates have been proposed in the literature (Schneider and Zenios 1990). (See table 15.4.)

The first is directly applicable when, as in the preceding investment example, there are linear constraints in two dimensions. In this particular example, one can think of the unknowns as a matrix, where the rows represent different values in one dimension, and the columns represent different values in the second dimension. For instance, the rows can represent different industries, while the columns could represent different asset types. The constraints are represented as restrictions on the sums across the rows and columns.

Table 15.4    Investment flow matrix

                                  Asset types (row controls)
Industries (column controls)      $T_1$                         $T_2$                         …    Totals
$I_1$                             $a_{11}$                      $a_{12}$                      …    $\sum_{j=1}^{J} a_{1j}$
$I_2$                             $a_{21}$                      $a_{22}$                      …    $\sum_{j=1}^{J} a_{2j}$
…                                 …                             …                             …    …
Totals                            $\sum_{i=1}^{I} a_{i1}$       $\sum_{i=1}^{I} a_{i2}$       …    $\sum_{i=1}^{I} \sum_{j=1}^{J} a_{ij}$

Suppose one has an initial guess of the matrix, $A_{k-1}$, which is not consistent with the row and column controls. The first technique, the so-called RAS procedure, estimates $A$ through the following algorithm. One multiplies $A_{k-1}$ by $R_k$ so that $R_k A_{k-1}$ satisfies the column controls. Then one multiplies $R_k A_{k-1}$ by $S_k$ so that $R_k A_{k-1} S_k$ satisfies the row controls. Let $A_k = R_k A_{k-1} S_k$. Repeating the procedure leads to a series of matrices that, under certain conditions, converges, so that $A = RAS$, where $A$ satisfies both row controls and column controls.8 The limiting condition, $A = RAS$, also explains the moniker “RAS” algorithm that has been attributed to Stone (Stone and Brown 1962). The restriction implied by the procedure that the final matrix is a function of only a series of row- and column-scaling factors is also known as the biproportional constraint, and this algorithm is also known as biproportional matrix balancing.

A different strategy is to stack the columns of the matrix into a vector and write $a_i^0 = a_i + \varepsilon_i$ or $a_i^0 = \varepsilon_i a_i$, where $a^0$ is the vector of initial guesses of the true value $a$ and the error term $\varepsilon_i$ has a known distribution. Two commonly used distributions are the normal and log normal distributions. The advantage of this approach is that it can handle multiple dimensions and more general restrictions. We further generalize the problem by allowing the constraints also to be measured with error.

8. Bacharach (1965) provides uniqueness and convergence results. Schneider and Zenios (1990) credit Sinkhorn (1964) for an early result that if the entries of A are strictly positive, then the RAS procedure will converge.
The unknown values are estimated via a maximum likelihood procedure:

$$\min_{a_i, v_k} \; \sum_{i=1}^{N} \frac{1}{\sigma_i} \left( a_i - a_i^0 \right)^2 + \sum_{k=1}^{K} \frac{1}{\sigma_k} \left( v_k - \sum_{i=1}^{N} \phi_{ki} a_i \right)^2$$

or

$$\min_{a_i, v_k} \; \sum_{i=1}^{N} \frac{1}{\sigma_i} \left[ \log(a_i) - \log(a_i^0) \right]^2 + \sum_{k=1}^{K} \frac{1}{\sigma_k} \left( v_k - \sum_{i=1}^{N} \phi_{ki} a_i \right)^2,$$

both subject to the $K$ linear constraints $\sum_{i=1}^{N} \phi_{ki} a_i = v_k$.

If $\sigma_k = 0$, the control is measured exactly, and $\lambda_k$ replaces $1/\sigma_k$ in the minimization problem, where $\lambda_k$ is now an unknown Lagrangian multiplier to be solved for. Stone, Champernowne, and Meade (1942) first proposed a least squares model. In their application, they weighted the observations according to how precise the estimates of the pattern were, but they assumed the controls were measured exactly.

Each method has its own advantages. The advantage of the RAS model is that it is easy to calculate, and under certain circumstances, the biproportional constraint has been given an economic interpretation. In the case of calculating an input-output matrix in year t based on a known input-output matrix in year t - 1, Parikh (1979) interprets the two scaling factors, R and S, as follows:
• A substitution effect that measures the extent to which the output of a sector substitutes or has been substituted by the output of the product of other sectors as an intermediate input
• A fabrication effect that measures the extent to which the ratio of intermediate goods to total gross output has changed in a sector

The benefit from the statistical approach is that it allows one to test a subset of restrictions using either a likelihood ratio test or a Wald test. Weale (1985) uses this insight to test the hypothesis that the U.S. current account was positive in 1982 to 1983 instead of negative, as measured by the BEA.9 Modeling the distribution of the errors as a normal distribution, perhaps with a standard deviation proportional to the observed values of a0, also allows the procedure to choose negative values. In cases where several values are known to be zero, a solution to the problem may require a switch in the signs of the initial guess, and in such a case, the RAS procedure will not converge.10

9. Golan, Judge, and Robinson (1994) develop a generalized version of the RAS model whereby the probabilities over a discretized space of values are estimated via something like the RAS procedure. These estimates also allow one to conduct statistical tests.

10. The RAS procedure can be adapted to allow for negative values (Günlük-Şenesen and Bates 1988), but the procedure will not switch the signs of the initial guesses.
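The two balancing approaches discussed above can be compared on a small made-up example. The sketch below is illustrative only and rests on several assumptions that are not in the text: the function names are hypothetical, the RAS routine simply alternates between rescaling rows and rescaling columns, and the least squares routine covers only the special case of unit weights with controls measured exactly, for which the solution has the closed form a = a0 + Φ′(ΦΦ′)⁻¹(v − Φa0). Here `row_totals` holds the target row sums (the industry controls in the investment example) and `col_totals` the target column sums (the asset-type controls).

```python
def ras_balance(guess, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Biproportional (RAS) balancing: alternately rescale rows and columns."""
    a = [row[:] for row in guess]
    for _ in range(max_iter):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_totals):
            s = sum(a[i])
            if s > 0:
                a[i] = [x * target / s for x in a[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in a)
            if s > 0:
                for row in a:
                    row[j] *= target / s
        # Columns now match; stop once the row sums also agree.
        if all(abs(sum(a[i]) - t) < tol for i, t in enumerate(row_totals)):
            return a
    raise RuntimeError("RAS did not converge")


def solve(m, b):
    """Solve a small linear system m x = b by Gaussian elimination."""
    n = len(b)
    aug = [m[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares_balance(guess, row_totals, col_totals):
    """Minimize sum (a_i - a0_i)^2 subject to exact row and column controls."""
    n_rows, n_cols = len(guess), len(guess[0])
    a0 = [x for row in guess for x in row]  # stack the matrix (row by row here)
    # One constraint per row and per column except the last column, which is
    # implied by the others when both sets of controls share a grand total.
    phi, v = [], []
    for i in range(n_rows):
        phi.append([1.0 if k // n_cols == i else 0.0 for k in range(len(a0))])
        v.append(row_totals[i])
    for j in range(n_cols - 1):
        phi.append([1.0 if k % n_cols == j else 0.0 for k in range(len(a0))])
        v.append(col_totals[j])
    gap = [vk - sum(p * x for p, x in zip(row, a0)) for row, vk in zip(phi, v)]
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in phi] for r1 in phi]
    lam = solve(gram, gap)  # Lagrange multipliers on the exact controls
    a = [x + sum(l * row[k] for l, row in zip(lam, phi)) for k, x in enumerate(a0)]
    return [a[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


# Made-up initial guess and controls; both sets of controls sum to 100.
guess = [[40.0, 10.0], [20.0, 30.0]]
row_totals = [60.0, 40.0]
col_totals = [70.0, 30.0]
ras_result = ras_balance(guess, row_totals, col_totals)
ls_result = least_squares_balance(guess, row_totals, col_totals)
```

Both results satisfy the controls, but they differ: the RAS result keeps the biproportional structure of the initial guess, while the least squares result applies additive adjustments. Neither routine handles controls measured with error, log normal errors, or unequal weights, all of which the text discusses.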
In creating our data set, we employ the balancing procedure several times. We used it to build a consistent concordance between the 1972 and 1987 SICs (described in the next subsection). We also used the procedure to estimate income components by industry and by legal form of organization for those transactions for which the BEA did not already publish such splits.11 For example, we have information on compensation by industry and total compensation by legal form of organization from the NIPA that serve as our controls. As an initial pattern, we use some unpublished out-of-date information on these splits from the BEA, augmented with observations from the 1987 and 1992 censuses.

Splitting the industry output by legal form is useful because it better matches the sources of at least some of the income components. Many of the income data are collected through tax records, and corporations and other businesses file different forms. The data also have to be adjusted for misreporting; the dollar adjustment to proprietors’ income was more than twice as large as to corporate profits in 1996, even though proprietors’ income is a much smaller fraction of national income (Parker and Seskin 1997). This suggests that the measurement of output for the noncorporate sector is subject to larger errors than for the corporate sector. Corrado and Slifman (1999) showed that productivity in the noncorporate business sector was measured to have been declining for over two decades, even though capital income as a share of output was relatively high. They pointed to mismeasured prices as one likely explanation for the confluence of these observations. To the extent that prices are biased upward in industries that have a disproportionate share of noncorporate business, the real output of noncorporate business would be biased down more than for corporate business.
Splitting individual industries by legal form (where presumably the output and input prices to the sectors within an industry are similar) and comparing their relative performances may shed some additional light on the issue.

Concording

The last basic tool concords two data sets whose dimensions are organized on different classification schemes. For example, the GPO data from 1949 to 1987 are organized along the 1972 SIC; from 1987 to 2000 they are organized along the 1987 SIC. Some of these industries map to more than one industry. As suggested by figure 15.1, the problem of concording data organized by the hierarchy on the left to the hierarchy on the right is simply to split the pieces of the left-hand side into parts so that they can be allocated to the different categories on the right-hand side and then added back up.

11. See Seskin and Parker (1998) for definitions of corporations, sole proprietorships and partnerships, and other forms of legal organization as used in the NIPAs.
Fig. 15.1    Concordance mapping (example)
Concording the right-hand side to the left-hand side is the mirror image of this operation. Thus, for the most part, the problem of concording is simply the organized use of aggregating and disaggregating operations. As such, the important part of the implementation is developing weights for the disaggregation. In most cases, information on the relative weights is limited because no data are reported on both bases. As a result, the weights have to be developed using whatever information is available. In concording the input-output tables to the GPO data, a few input-output industries had to be split; to do this, we used a variety of data, such as detailed employment shares and census shipments data (see appendix).

In one important case, data are reported on two bases in a reference year, allowing for a richer concordance: the GPO data for 1987 are available using the 1972 SIC and the 1987 SIC. For example, industries 481,2,9 (telephone, telegraph, and other communications services) and 483–4 (radio and TV broadcasting) on the 1972 basis map to industries 481,2,9 (telephone, telegraph, and other communications services) and 483–4 (radio and TV broadcasting and other TV services) on the 1987 basis. One can think of the problem of developing concordance weights as a balancing problem where the 1972 and 1987 totals are controls. As initial guesses for the pattern for all of the industries, we used the concordance in the NBER Productivity Database (Bartelsman and Gray 1996) for manufacturing, and simply used 1/N for other industries for cells that are nonzero according to an available mapping. See table 15.5 for this simple example. The cells of the matrix are the concordance weights. The advantage of balancing a matrix of weights is that one can concord data both ways in a consistent manner. Concording data from the 1972 SIC to the 1987 SIC and then back again yields the original 1972 data.
Table 15.5    Gross output of communications, 1987

                                SIC 87
SIC 72        Total        481,2,9        483–4
Total                        157.8         42.1
481,2,9       170.1          157.8         12.3
483–4          29.7            0.0         29.7
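The cells of table 15.5 can be used to illustrate how a balanced matrix supports concording in both directions. The Python sketch below is an assumption-laden illustration, not the concordance tool itself: the function names are hypothetical, forward weights are taken to be row shares of the cells and backward weights column shares, and the column totals are computed from the cells (the second column sums to 42.0, which differs slightly from the published 42.1, presumably because of rounding).

```python
# Cells of table 15.5: rows are 1972 SIC industries, columns are 1987 SIC
# industries. Industry labels are kept only for readability.
IND_72 = ["481,2,9", "483-4"]
IND_87 = ["481,2,9", "483-4"]
CELLS = [[157.8, 12.3],
         [0.0, 29.7]]


def concord_forward(x72):
    """Move 1972-basis data to the 1987 basis using row shares of the cells."""
    row_sums = [sum(row) for row in CELLS]
    return [sum(x72[i] * CELLS[i][j] / row_sums[i] for i in range(len(IND_72)))
            for j in range(len(IND_87))]


def concord_backward(x87):
    """Move 1987-basis data back to the 1972 basis using column shares."""
    col_sums = [sum(row[j] for row in CELLS) for j in range(len(IND_87))]
    return [sum(x87[j] * CELLS[i][j] / col_sums[j] for j in range(len(IND_87)))
            for i in range(len(IND_72))]


x72 = [170.1, 29.7]                  # 1987 gross output on the 1972 basis
x87 = concord_forward(x72)           # the same output on the 1987 basis
round_trip = concord_backward(x87)   # back on the 1972 basis
```

In this sketch the round trip reproduces the 1972-basis figures for the reference year, because both sets of weights come from the same matrix. For other years the weights are applied to data that need not share the reference-year proportions, so this fragment should not be read as demonstrating an exact round trip in general.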
Concording provides a means for moving the data between two classification schemes in the same conceptual dimension. Technically analogous is the problem of cross-classification, such as moving data collected at the firm level and published by industry, to match data by industry collected from establishments. The cross-classification table would contain data akin to that in a concordance, showing the amount in a firm-based industry that splits into various establishment-based industries.

15.2.6 Specialized Productivity Tools

We have developed several tools needed specifically to study productivity. One tool accumulates weighted levels of past investment using the so-called perpetual inventory method to estimate stocks of particular assets. The weights are modeled in the same manner as the BLS and Federal Reserve Board (FRB; Mohr and Gilbert 1996) use to account for wear and tear, the average rate of discards, and the effects of obsolescence. A second tool weights these stocks using the standard user cost model of Hall and Jorgenson (1967) in order to estimate capital services. The rate of return can be an ex-ante rate, such as a corporate bond rate, or an ex-post rate, such as property-type income divided by an estimate of the value of the capital stock. A third tool estimates TFP growth by calculating a Divisia index of the inputs using different approaches; the implementation in this paper uses cost shares to weight the inputs.

15.3 Completing the Data Set for the Study of Productivity

15.3.1 Basic Industry Data

Besides the various steps described in the preceding, we had to fill out some of the price data for 1977 to 1986. We concorded the 1982, 1987, and 1992 input-output tables to the GPO data (see appendix) and used the implicit weights in these tables to calculate price deflators for intermediate inputs. Along with available gross product deflators, these gave us gross output deflators.
All told, we have information on nominal and real gross output, intermediate inputs, and value added by industry and by legal form of organization. We use these data, along with estimates from the input-output tables, to estimate the amount of nominal and real intermediate inputs produced and consumed by the same industry. We exclude the consumption of these inputs from other intermediate inputs; at the same time we exclude them from gross output. The resulting measure is known as sectoral output, and it is the suitable measure for the study of productivity (see Domar 1961; Gollop 1979). In addition, we have information on various components of income paid and employee and nonemployee hours worked by industry and by legal form of organization. To complete the information needed to study productivity, we developed estimates of capital services and labor quality.

15.3.2 Investment and Capital Stocks

The investment series that we use are the detailed industry estimates of industry investment by asset type that the BEA made available on its Web site. We refine these data by splitting industry investment between corporate and noncorporate investment for each type of equipment and structure, controlling the total for each legal form to equal the data available in table 4.7 of the Standard Fixed Asset Tables and the residential investment tables of the Detailed Fixed Asset Tables. The nonresidential investment tables report investment in equipment and in structures by legal form, divided among three activity groups (farm, manufacturing, and other). To refine these data by industry and by asset type, we used total industry investment by industry and by asset type as an initial pattern in our balancing routine. A practical problem in working with the data was that the investment figures were rounded to integers. In early years, or for activity-type combinations with low levels of investment, dividing nominal values by reals provided a poor estimate of the deflator. To rectify this, we assumed that these asset prices did not vary by activity and used the deflator calculated from aggregate data.
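The perpetual inventory accumulation introduced in section 15.2.6 can be sketched as a weighted sum of current and past real investment. The Python fragment below is a simplified illustration under stated assumptions, not the authors' implementation: it uses a hyperbolic age-efficiency profile of the form (L − t)/(L − βt), which is one common way to write a beta-decay schedule, with a single deterministic service life L rather than a stochastic distribution of service lives. The function names, the timing convention (new investment counts fully in the year it is made), and all parameter values are hypothetical.

```python
def age_efficiency(age, service_life, beta):
    """Hyperbolic (beta-decay) efficiency of an asset at a given age."""
    if age >= service_life:
        # Assets are assumed to be fully retired at the end of their service life.
        return 0.0
    return (service_life - age) / (service_life - beta * age)


def productive_stock(investment, service_life, beta):
    """Perpetual inventory: efficiency-weighted sum of past real investment."""
    stocks = []
    for t in range(len(investment)):
        stock = sum(investment[t - age] * age_efficiency(age, service_life, beta)
                    for age in range(t + 1))
        stocks.append(stock)
    return stocks


# Hypothetical real investment series and parameters (not BLS values).
investment = [100.0, 100.0, 100.0, 100.0]
stocks = productive_stock(investment, service_life=3, beta=0.5)
```

With these made-up parameters the efficiency weights at ages 0, 1, and 2 are 1.0, 0.8, and 0.5, so a constant investment flow of 100 builds a stock of 100, 180, and then 230, where it levels off. A fuller treatment would average the profile over a distribution of service lives for each asset type.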
Capital stocks are calculated by accumulating the investment data using the standard BLS stochastic mean-service life and beta-decay parameters. We estimate capital services using the Hall-Jorgenson formula with ex-ante returns, and to analyze trends, we separate capital services into three categories: high-tech equipment and software (ICT), other equipment, and structures.12

15.3.3 Labor Services

Analogous to capital, a unit of labor of a certain type may provide a different level of service than a unit of labor of another type. The measure of labor input appropriate for productivity analysis, labor services, is computed as a quality-weighted aggregate of labor hours by type. The weights used to aggregate labor are expenditure shares for each type.

12. ICT capital is defined as computers and peripheral equipment, communications equipment, photocopy and related equipment, instruments, and software.
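One common way to form such a quality-weighted aggregate is a Törnqvist index, in which the log growth of hours for each worker type is weighted by that type's average share of compensation in the two periods. The Python sketch below illustrates the calculation for two worker types; the function name `tornqvist_growth` and all of the hours and compensation figures are made up for illustration.

```python
import math


def tornqvist_growth(hours_prev, hours_curr, comp_prev, comp_curr):
    """Log growth of labor services: share-weighted log growth of hours by type."""
    total_prev = sum(comp_prev.values())
    total_curr = sum(comp_curr.values())
    growth = 0.0
    for worker_type in hours_prev:
        # Tornqvist weight: average of the compensation shares in the two periods.
        share = 0.5 * (comp_prev[worker_type] / total_prev
                       + comp_curr[worker_type] / total_curr)
        growth += share * math.log(hours_curr[worker_type] / hours_prev[worker_type])
    return growth


# Hypothetical data for two worker types (hours and compensation are made up).
hours_prev = {"college plus": 100.0, "high school graduate": 200.0}
hours_curr = {"college plus": 110.0, "high school graduate": 200.0}
comp_prev = {"college plus": 60.0, "high school graduate": 40.0}
comp_curr = {"college plus": 66.0, "high school graduate": 40.0}

services_growth = tornqvist_growth(hours_prev, hours_curr, comp_prev, comp_curr)
hours_growth = math.log(sum(hours_curr.values()) / sum(hours_prev.values()))
quality_growth = services_growth - hours_growth  # growth of the labor quality index
```

In this example all of the additional hours come from the more highly paid worker type, so labor services grow faster than the simple sum of hours and the implied labor quality index rises.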
For each industry and sector, information is thus needed on hours worked and compensation for workers by type. These data are not directly available from firm- or establishment-based data sources. However, the Current Population Survey (CPS) March Supplement from the U.S. Bureau of the Census has information on wages of workers, along with other worker characteristics such as age, gender, occupation, education, and industry. To calculate labor services, we first estimated Mincer’s wage equation of the following form:

$$\log[w(s, x)] = \text{const} + \rho s + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + Z\gamma + \varepsilon,$$

where $w(s, x)$ represents wage earnings of someone with $s$ years of schooling and $x$ years of work experience. In the regression we also included gender, part-time/full-time, and ICT occupation dummies, summarized in $Z$, with coefficient vector $\gamma$. The wage equation was estimated using U.S. Census Bureau CPS March survey data for years 1977–2001. We used the fitted values of this equation to impute wages to all workers in the data set. Using estimated wages and hours of individual workers, hours and imputed compensation are computed by industry and by four types of workers. The four worker types that we use are technology workers and three other worker types based on education attained (high school dropout, high school graduate, and college plus).13 With these estimates from the CPS, we disaggregated the GPO employee hours and compensation paid to obtain these variables by worker type consistent with the aggregates we observe in our augmented GPO data set. We then aggregated hours of the four worker types by industry using Törnqvist compensation weights to obtain labor services. The labor quality index is defined as labor services divided by hours, and so labor services are defined as labor quality times hours.

15.4 Applications

15.4.1 Productivity Growth of Nonfarm Business

As an initial exercise, we estimated TFP by industry and by legal form of organization, aggregated to private nonfarm business.
At the individual industry level, we model the growth rate of TFP as the growth rate of real sectoral output less the share-weighted growth rates of real intermediate inputs, labor input, and capital services. We use data from the input-output tables on the ratio of sectoral output to gross output to estimate own-industry inputs. The data on real gross output, intermediate inputs, and cost-weighted expenditure shares come from our modified GPO data set.

13. For the years 1977–1981, ICT workers are defined as computer programmers, computer systems analysts, computer specialists, not elsewhere classified (n.e.c.), electrical and electronic engineers, and computer and peripheral equipment operators. For the years 1983–2000, ICT workers are defined as electrical and electronic engineers; computer systems analysts and scientists; operations and systems researchers and analysts; supervisors, computer equipment operators; chief communications operators; computer operators; and peripheral equipment operators.

To calculate aggregate TFP growth, we take a weighted sum of the individual components, where the weights are calculated as sketched in Domar (1961).14 We estimate the ratio of sectoral output to gross output in each industry times the ratio of sectoral output to gross output of all private industries excluding farm and owner-occupied housing as measured in the 1982, 1987, and 1992 input-output tables. We interpolate these ratios between years and then multiply them by the ratio of gross output in our data set for each industry to gross output of all private nonfarm industries to obtain annual Domar weights. The contribution of inputs (excluding materials) and TFP to nonfarm business sectoral output growth equals the weighted sum of the contributions to growth of the inputs and TFP to individual industry sectoral output growth, where the weights are the annual Domar weights. The contribution from materials is calculated as the growth rate of sectoral output less the sum of the contributions from the other inputs and TFP. As noted by Domar, the weighted sum of TFP growth rates measures the increase in aggregate output holding the factors of production constant, which is the closest thing to the concept of technical progress that we have.

Table 15.6    Growth accounting, nonfarm business

                           Capital services                    Labor services
Period       Materials     ICT     Equipment   Structures     Hours   Reallocation   Quality     TFP     Sectoral
                                                                       effects                            output
1978–2001      0.16        0.70      0.24        0.35          0.66      0.37          0.17      0.45      3.10
1978–1989     -0.08        0.52      0.22        0.46          0.81      0.54          0.20      0.20      2.88
1990–1995      0.71        0.59      0.15        0.24          0.37     -0.05          0.17      0.54      2.72
1996–2001      0.09        1.17      0.36        0.24          0.63      0.46          0.10      0.88      3.93

Note: ICT = information and communications technologies; TFP = total factor productivity.

Table 15.6 reports the growth rate of aggregate sectoral output for private nonfarm businesses over each of the time periods considered, as well as an estimate of the contributions to growth from the use of materials, capital, labor, and TFP. As described in the table, sectoral output grew, on average, 3.1 percent per year. Capital services contributed 1.3 percentage points per year, and labor hours added 2/3 percentage point.
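The two steps described above, industry TFP growth as output growth less share-weighted input growth and aggregation with Domar weights, can be sketched on a made-up two-industry economy. The Python fragment below is an illustration only: the function names, the growth rates, the cost shares, and the Domar weights are all hypothetical, and the Domar weights are simply supplied rather than derived from input-output ratios as in the text.

```python
def tfp_growth(output_growth, input_growth, cost_shares):
    """Industry TFP growth: output growth less share-weighted input growth."""
    return output_growth - sum(cost_shares[k] * input_growth[k] for k in cost_shares)


def domar_aggregate(industry_tfp, domar_weights):
    """Aggregate TFP growth as the Domar-weighted sum of industry TFP growth."""
    return sum(domar_weights[name] * industry_tfp[name] for name in industry_tfp)


# Hypothetical two-industry economy; all numbers are made up for illustration.
industries = {
    "A": {"output_growth": 0.04,
          "input_growth": {"materials": 0.03, "labor": 0.01, "capital": 0.05},
          "cost_shares": {"materials": 0.5, "labor": 0.3, "capital": 0.2}},
    "B": {"output_growth": 0.02,
          "input_growth": {"materials": 0.02, "labor": 0.00, "capital": 0.03},
          "cost_shares": {"materials": 0.4, "labor": 0.4, "capital": 0.2}},
}
# Domar weights relate industry output to aggregate output, so they can sum
# to more than one when industries sell intermediate inputs to each other.
domar_weights = {"A": 0.8, "B": 0.6}

industry_tfp = {name: tfp_growth(d["output_growth"], d["input_growth"], d["cost_shares"])
                for name, d in industries.items()}
aggregate_tfp = domar_aggregate(industry_tfp, domar_weights)
```

In this example industry TFP growth is 1.2 percent in A and 0.6 percent in B, and the Domar-weighted aggregate is 1.32 percent. The same additive logic underlies table 15.6: for 1978–2001 the reported contributions, 0.16 + 0.70 + 0.24 + 0.35 + 0.66 + 0.37 + 0.17 + 0.45, sum to the 3.10 percent growth of sectoral output.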
14. See also Gollop (1987) and Hulten (1978) for a more detailed discussion of the derivation and interpretation of the Domar weights.

We estimate that increases in the quality of labor contributed a little over 0.5 percentage points to sectoral output growth. The Domar-weighted average across industries of labor quality contributed 0.17 percentage points, while the Domar-weighted average of the contribution of hours less the simple sum of nonfarm-business hours times labor share, which we refer to as reallocation effects, added 0.37 percentage points. These estimates, including the reallocation effects, are a little higher than is implied by the results in Jorgenson, Ho, and Stiroh (2002) and in Aaronson and Sullivan (2001). We estimate that TFP rose on average 0.45 percent per year. Over the 1996 to 2001 period, sectoral output climbed 3.9 percent per year. TFP accelerated to 0.9 percent per year, and the average contribution of high-tech capital services increased to 1.2 percentage points. Tables presenting estimates of output growth and contributions of the various input factors and TFP for the sixty industries are available upon request. As noted elsewhere, important contributors to the TFP acceleration in the late 1990s were the machinery manufacturing industry (which includes computers) and the electrical machinery manufacturing industry (which includes communication equipment and semiconductors). Technical progress also picked up in the trade industries, as did the growth rate of their stock of high-tech equipment. Some other industries, such as depository institutions and business services, also pushed up their rates of investment in high-tech equipment. But TFP growth increased only 0.3 percentage points in depository institutions and fell sharply in business services. Table 15.7 reports TFP growth split between corporate and noncorporate private businesses. At the aggregate level, the acceleration noted in table 15.6 in nonfarm business TFP is due to the sharp improvement among noncorporate businesses. Indeed, among all major components, TFP rose more rapidly among noncorporate businesses than among corporations. This could be an artifact of mismeasured capital services.
As shown in the bottom half of the table, the contribution to growth from capital services was more rapid among corporations than noncorporate businesses.

15.4.2 Y2K

In the late 1990s, businesses spent a large amount of money working to fix potential Y2K bugs. Software that could not recognize that the year represented by the two-digit number “00” was one year larger than the year “99” had to be modified or replaced. Industry reports indicate that some firms regarded the purchase of whole new systems, including hardware, as preventive maintenance. These stories suggest that the rate of depreciation and discards of computers and software was unusually high in advance of the century date change. The models that we employ to estimate capital stocks do not directly measure this rate. Unless augmented, these models assume that the rate is a function of the stock and age of equipment of each vintage. As a small experiment with our system, we adjust the stocks of computers and
Table 15.7  Output contribution from capital services and total factor productivity by legal form of organization

                                            1977–1989                 1990–1995                 1996–2001
                                       Corporate  Noncorporate   Corporate  Noncorporate   Corporate  Noncorporate
Total factor productivity
Nonfarm private business                  0.30       –0.06          0.78       –0.11          0.64        1.77
Agricultural services                     3.11        1.35         –0.92       –0.44         –0.52        0.95
Mining                                   –1.43       –2.64          0.87        4.14         –0.42        0.73
Manufacturing                             0.89        2.24          0.93       –0.41          0.72        2.71
Transportation and utilities              0.18        1.12          0.88        3.52         –0.41        0.83
Wholesale trade                           1.52       –0.07          1.19       –1.07          3.97        3.29
Retail trade                             –0.91       –0.32         –0.87       –1.60          1.76        2.15
Finance, insurance, and real estate      –2.21        1.65          0.25        0.50          1.06        2.28
Services                                  0.76       –1.33          0.64       –0.72         –1.99       –0.14

Capital services
Nonfarm private business                  1.18        1.27          0.99        0.43          1.83        0.85
Agricultural services                    –4.15       –0.89          2.02        0.26          2.58        0.36
Mining                                    1.13        2.00         –0.29       –1.06          0.67       –0.97
Manufacturing                             0.38        0.06          0.40       –0.11          0.59        0.12
Transportation and utilities              1.39        1.31          1.07        0.52          2.06        1.23
Wholesale trade                           0.79       –2.16          0.68        0.81          0.93        0.98
Retail trade                              0.95       –0.29          0.86        0.44          1.72        0.96
Finance, insurance, and real estate       2.24        1.63          1.61        0.53          2.59        0.75
Services                                  1.01        0.44          0.71        0.16          1.39        0.41
software assuming that some share of Y2K spending represented additional scrappage. To parameterize the experiment, we used figures reported by the U.S. Department of Commerce (1999). That report cites a study from International Data Corporation (IDC) that public and private spending from 1995 to 2001 to fix the Y2K problem was roughly $114 billion. It also cites an Office of Management and Budget (OMB) report that the federal government was spending a little over $8 billion and a Federal Reserve study that suggests spending by state and local governments was roughly half of federal spending. The Commerce report also provides some figures developed by Edward Yardeni of the distribution of spending across industries. We used the aggregate estimates to calculate baseline spending on Y2K by the private sector over 1995 to 2001, and we used the Yardeni estimates to split them across broad industry aggregates. We assume that Y2K spending across different types of computer equipment and software was the same as total spending, except that we goosed up the fraction on software by 50 percent based on some IDC figures on the split on spending between hardware and software to redress the Y2K bug.
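The mechanics of the experiment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single asset with geometric depreciation, growth of the capital stock standing in for growth of capital services, and a fixed capital share. The function names and every number below are hypothetical and are not the parameters used in our system.

```python
import math


def capital_stocks(initial_stock, investment, depreciation_rate,
                   extra_scrappage=None):
    """Perpetual inventory: K_t = (1 - d) * K_{t-1} + I_t - S_t,
    where S_t is scrappage beyond normal depreciation."""
    if extra_scrappage is None:
        extra_scrappage = [0.0] * len(investment)
    stocks = [initial_stock]
    for invest, scrap in zip(investment, extra_scrappage):
        stocks.append((1.0 - depreciation_rate) * stocks[-1] + invest - scrap)
    return stocks


def log_growth(levels):
    """Annual log growth rates in percent."""
    return [100.0 * math.log(new / old)
            for old, new in zip(levels, levels[1:])]


# Hypothetical inputs: two years of investment, with extra scrappage
# of older equipment in the first year only.
investment = [30.0, 30.0]
scrappage = [5.0, 0.0]
capital_share = 0.1

baseline = log_growth(capital_stocks(100.0, investment, 0.2))
adjusted = log_growth(capital_stocks(100.0, investment, 0.2, scrappage))

# Output growth is unchanged, so the TFP residual moves by the change in
# capital's contribution (share times growth), with the opposite sign.
tfp_effect = [capital_share * (b - a) for b, a in zip(baseline, adjusted)]
print([round(x, 2) for x in tfp_effect])
```

In this toy example measured TFP growth is higher in the year of the extra scrappage and lower in the following year, because the adjusted stock then grows from a smaller base. That sign pattern is at least consistent with the positive effects before 2000 and negative effects in 2000 and 2001 reported in table 15.8.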
A Consistent Accounting of U.S. Productivity Growth
Two considerations suggest these figures are not precise. The IDC indicates that the lower and upper bounds for spending were plus or minus 50 percent of its estimate. In addition, not all of this Y2K spending necessarily reflects additional spending on investment. Estimates from the IDC indicate that only 27 percent of worldwide spending was on “hardware or software,” whereas the rest was on “internal or external” spending, which may not have been counted as investment. As a lower bound, we assume none of the “internal or external” spending was investment; as an upper bound, we assume all of it was. This leaves a wide range of investment of $14 to $152 billion, which we assume also represents the additional scrappage of older stocks of hardware and software. Table 15.8 reports the change in estimates of TFP by broad aggregates
Table 15.8  Effect of Y2K spending on total factor productivity growth

                                                                  $150 billion                              $50 billion
                                            1995   1996   1997   1998   1999   2000   2001   Cumulative     Cumulative
Nonfarm private business                    0.05   0.17   0.26   0.22   0.10  –0.12  –0.13      0.56           0.14
Forestry, fishing, agricultural services    0.00   0.00   0.00   0.00   0.00  –0.00  –0.00      0.01           0.00
Mining and construction                     0.02   0.05   0.09   0.08   0.04  –0.03  –0.04      0.21           0.05
Manufacturing                               0.02   0.06   0.09   0.08   0.03  –0.04  –0.04      0.21           0.05
Durable goods                               0.02   0.08   0.11   0.10   0.04  –0.05  –0.05      0.25           0.06
Electronic equipment and instruments        0.04   0.14   0.17   0.13   0.05  –0.07  –0.08      0.39           0.10
Motor vehicles and equipment                0.01   0.02   0.02   0.02   0.01  –0.01  –0.01      0.06           0.01
Other manufacturing durables                0.02   0.06   0.10   0.09   0.04  –0.04  –0.04      0.21           0.05
Nondurable goods                            0.01   0.04   0.06   0.05   0.02  –0.02  –0.03      0.13           0.03
Chemical, petroleum, coal                   0.01   0.03   0.04   0.04   0.02  –0.02  –0.02      0.10           0.02
Excluding petrochemicals                    0.01   0.04   0.06   0.05   0.02  –0.02  –0.03      0.13           0.03
Transportation and utilities                0.05   0.17   0.25   0.21   0.11  –0.11  –0.11      0.57           0.14
Communications                              0.11   0.36   0.52   0.41   0.21  –0.23  –0.21      1.16           0.29
Excluding communications                    0.03   0.08   0.13   0.12   0.06  –0.05  –0.06      0.31           0.08
Wholesale and retail trade                  0.04   0.12   0.18   0.16   0.08  –0.08  –0.09      0.41           0.10
Finance, insurance, and real estate         0.05   0.14   0.21   0.17   0.07  –0.09  –0.10      0.45           0.11
Depository and nondepository                0.09   0.32   0.44   0.35   0.14  –0.19  –0.21      0.93           0.23
Other finance and insurance                 0.05   0.09   0.14   0.12   0.05  –0.06  –0.07      0.31           0.08
Real estate                                 0.01   0.04   0.07   0.06   0.03  –0.02  –0.03      0.17           0.04
Services                                    0.06   0.20   0.29   0.24   0.09  –0.15  –0.14      0.59           0.15
Business and other services                 0.08   0.25   0.36   0.29   0.09  –0.20  –0.18      0.69           0.17
Recreation and motion pictures              0.03   0.09   0.13   0.10   0.00  –0.06  –0.06      0.24           0.06
Other services                              0.05   0.16   0.24   0.20   0.09  –0.10  –0.11      0.52           0.13
when one assumes that the upper bound of Y2K spending ($150 billion) went to replacing high-tech equipment and software that was scrapped.15 The largest effect on any aggregate in any year is 0.52 percentage point in the communications industry in 1997. The extra scrappage reduces the growth rate of capital services. Because real output is not changed, the lower contribution from capital services means that TFP must have been higher, in this case by 0.52 percentage point. In a few industries, such as communications, depository and nondepository institutions, and business and miscellaneous professional services, the effect of Y2K scrappage could be important. For the rest, the effect appears to have been small relative to the average year-to-year variation in TFP. In total, if capital services are adjusted along the lines suggested in the preceding, the rate of growth in TFP would be 16 basis points higher in the second half of the 1990s and 13 basis points lower in 2000 and 2001. Assuming a more moderate level of Y2K spending that represents replacement investment ($50 billion) reduces the cumulative effect to one-quarter of the upper-bound effect.

15.5 Conclusion

This paper explicates a general approach to the problem of building a consistent data set for the study of economic issues. Coding observations in a relational database allows us to easily manipulate economic data, while the metadata help us to preserve the numerous linear relations across variables. The tools that we have developed take advantage of the standardized data and metadata in order to build a consistent data set. The system was originally conceived to aid in the study of productivity. To that end, we started with the BEA’s GPO data. We concorded the GPO data before 1987, which are organized using the 1972 SIC, to the more recent data, which use the 1987 SIC to classify industries.
We then supplemented the data set by including estimates of employee and all persons hours from the NIPA and the BLS, as well as estimating some missing pieces of data, such as gross output for some industries before 1987 and some price deflators. We also concorded the BEA’s estimates of investment by industry and by type to the GPO data. To study productivity, we linked data from the input-output tables to calculate Domar weights; we incorporated data from the Current Population Survey, March Supplement 1977–2001, to estimate labor services; and we employed some specialized tools that we developed to estimate capital stocks, capital services, and TFP. Finally, we decomposed all of the data by legal form of organization,
15. For the exercise, the underlying database was first aggregated to the level of detail available for the Y2K spending. For each of these activity groups, TFP was computed using sectoral output and net intermediate use concepts. For higher level aggregates, the TFP was aggregated using appropriate Domar weighting.
controlling the estimates to be consistent with industry totals and aggregate legal-form totals in the NIPA. Our overall estimates of TFP growth by industry generate the same qualitative results seen elsewhere. Total factor productivity accelerated in the latter part of the 1990s and was particularly high in most industries outside the service sector. The contribution to output growth from increased investment in high-tech capital equipment also increased.

We also demonstrated how the system could be employed to reconsider assumptions made in the construction of data and counterfactual exercises. In this small experiment, we took estimates of the amount of spending to remedy the Y2K problem and assumed that some fraction of this estimate was not an increment to the capital stock but instead purely replaced an unusually high amount of capital that was scrapped because it was potentially infected with the Y2K bug. Except for a few industries, the effects on TFP were likely small unless one were to assume that the scrappage associated with the century date change was very large.

A few obvious extensions are possible. Fully incorporating the input-output data, including making them fully consistent with the value-added data in the GPO data set, would open up several research avenues. Immediately, it would allow us to have a fully consistent application of Domar weighting. It would allow us to study various price-markup models and to perform various counterfactuals, such as the effects of different productivity growth rates among intermediate producers on prices and aggregate productivity. If, at the same time, separate estimates of input-output tables at the same level of aggregation controlled to the current expenditure-side estimates of GDP were available, we could study the statistical discrepancy. Extending the input-output tables further back and incorporating auxiliary information on prices will enable us to estimate industry price deflators before 1977.
In putting together our preliminary estimates of capital services, we simply used the BEA’s estimates of investment by industry and by type that it employs to estimate capital consumption and wealth. However, these estimates are based on limited data of investment by industries outside of census years and are not based on any systematic information on investment by both industry and by type in any year (see, for instance, Bonds and Aylor 1998). Indeed, even though the BEA has made these data available to the public on their Web site, they consider them unpublished because they do not rise to their usual standards for statistical reliability. In the future, we plan to examine how sensitive the capital services estimates are to other plausible distributions of investment. Based in part on conversations with our colleagues, we suspect that the distribution of computer investment could matter importantly, but for other types of equipment, the effects may be small. At the same time, we plan to examine how important the depreciation estimates are for estimates of capital services.
Eric J. Bartelsman and J. Joseph Beaulieu
Finally, the system has the tools necessary to start with the most micro-level data sets. Many of the problems of switching classifications and cross-classification would be better approached by working with plant- and firm-level data. For example, a better concordance between the SIC and NAICS could be developed by attaching SIC and NAICS codes to each firm or establishment in a particular year (based on the same logic used to apply the original activity code to a respondent in the survey) and then tabulating a concordance for each relevant variable. Indeed, a joint Federal Reserve–Census project is currently under way to develop such a concordance for manufacturing using the Longitudinal Research Database (Klimek and Bayard 2002). The same method could be used in making a firm-establishment cross-classification by linking enterprise, firm, and establishment codes at the micro level, and then merging and aggregating different data sources to create a cross-classification table.
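The tabulation step for such a micro-level concordance can be sketched as follows. This is a hypothetical illustration rather than the method of the project cited above: the record layout, the placeholder codes ("SIC_A", "NAICS_X", and so on), and the employment figures are all invented for the example.

```python
from collections import defaultdict


def build_concordance(records, variable):
    """Share of each old-classification total that maps to each new code.

    Each record carries both codes for one establishment, so the shares
    are tabulated separately for whichever variable is passed in.
    """
    totals = defaultdict(float)
    cells = defaultdict(float)
    for rec in records:
        totals[rec["sic"]] += rec[variable]
        cells[(rec["sic"], rec["naics"])] += rec[variable]
    return {key: value / totals[key[0]]
            for key, value in cells.items()
            if totals[key[0]] > 0}


# Hypothetical establishment records with placeholder codes.
establishments = [
    {"sic": "SIC_A", "naics": "NAICS_X", "employment": 300.0},
    {"sic": "SIC_A", "naics": "NAICS_Y", "employment": 100.0},
    {"sic": "SIC_B", "naics": "NAICS_Y", "employment": 50.0},
]

concordance = build_concordance(establishments, "employment")
for (sic, naics), share in sorted(concordance.items()):
    print(sic, naics, round(share, 3))
```

Running the same function on shipments or payroll instead of employment would give a different set of shares, which is the sense in which a concordance is tabulated "for each relevant variable."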
Appendix

Concording the Input-Output Tables to the GPO Data

A handful of input-output industries had to be split among two or more GPO industries. The following tables describe how the weights for the concordance were calculated in order to allocate the outputs and inputs of these IO commodities and industries among the GPO industries. The 1982 table was mapped to 1972 GPO industries and then concorded to 1987 industries using the same concordance that was used for gross output in the GPO. In calculating price deflators, the reverse was done, and the 1987 table was concorded to the 1972 SIC. After the concordance, the IO tables were adjusted to account for the new treatment of software in the NIPA. All three published tables (1982, 1987, 1992) treat prepackaged and custom software as intermediate inputs and do not count own-account software as an output. As of the 2000 revision, the BEA began to count software as investment (Parker and Grimm 2000). To adjust the IO tables, we reduced the amount of the use of the commodity “computer and data processing services” by the amount of investment in prepackaged and custom software, and we raised the make of the same commodity by the amount of own-account software investment.16

16. We did not adjust manufacturing in 1992 for custom software because Moylan (2001) indicates that the 1992 and 1997 censuses did not collect information on purchases of services by manufacturers, which we take to mean what is now known as custom software investment.

The first columns of tables 15A.1–15A.3 report the IO code, and the
Table 15A.1  Splitting IO industries to different GPO industries, 1982

IO code   GPO industry  Indicator  GPO indicator  No.    SIC code               Comment
04.0001   01–2          Dir                       .5     0254, 0279pt           No information; split 04.0001 evenly between 01–2 and 07–09
          07–9          Dir                       .5     071–2, 5–6, 085, 092
11.0101   15–17         GP         15–17          .796   15, 16                 Ratio of employees of 15 and 17 to 15–17 in 1982
          65re          GP         65re           .122   6552                   Ratio of employees in 655 to 65 times 1/2 to split 655 between 6552 and 6551
11.0103   15–17         GO         15–17          1.0    15–17
          65re          GO         65re           .122   6552                   Ratio of employees in 655 to 65 in 1982 times 1/2 to split 655 between 6552 and 6551
11.0602   10            GO         1081           1.0    1081
          11–12         GO         1112           1.0    1112
          13            GO         138            1.0    138
          14            GO         1481           1.0    1481
11.0603   10            GO         1081           1.0    1081
          11–12         GO         1112           1.0    1112
          14            GO         1481           1.0    1481
14.1801   20            Sh         2051           1.0    2051
          52–59         GO         542–9          .434   5462                   Ratio of employees in 5462 (Bakeries) to employees in all food stores excluding grocery stores in 1982
18.0400   23            Sh         231–8          1.0    231–8
          39            Dir                       .1     3999pt                 Shipments of 39996 (Furs dressed and dyed) in 1982 Census
38.0400   28            Sh         2819           1.0    2819
          33                       3334           1.0    3334

(continued)
(continued)
GP
61 67
07–09 80
67
84, 89
70.0200
77.0302
77.0504
86
GP
52–59 73 80
69.0200
GO Dir
GP GP
Clc GP Dir
GP
42 47
65.0300
GP GP
40 47
65.0100
86
84, 89
67
07
61 67
Mixed 73
42 47
40 47
GPO GPO IO code industry Indicator indicator
Table 15A.1
.083
.073
.1125
.140
1.0 .888
.01 14.5
1.0 .025
1.0 .05
No.
865, 9
84, 8922
6732
074 8049, 807–9
61 67, excl. 6732
52–7,9 excl. 5462 7396 8042
42 4789pt
40 474, 4789pt
SIC code
Ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733 1/4 of employees in 873 employees in 84 in 1999 (from occupation data) divided by employees in 84, 87, and 89 Ratio of employees in political organizations and membership organizations, n.e.c. to all employees in 86 (from occupation data)
Ratio of employees in 074 to 07 in 1982 Revenue of 8049 and 807–9 from 1982 Census
One minus ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733
Calculated as GP(52 – 9) [1 – GO(548)]/GO(52 – 9) Assumed to be small Revenue of 8042 from 1982 Census
Have to use value added because no gross output data available for 47 Same as with 65.0100
Have to use value added because no gross output data are available for 47 Assume 4741, 4738, 4785, and 4789 are same size and 4789 split evenly between 65.0100 and 65.0300
Comment
01–2 07–09
15–17 65re
10 12 13 14
10 12 14
18 52–59
28 33
40 47
42 47
04.0001
11.0000
11.0602
11.0603
14.1801
38.0400
65.0100
65.0300
GO
G0 G0
Sh
Sh GO
GO GO GO
GO GO GO GO
GO GO
Dir
42 474–8
40 474–8
2819 3334
2051 542–9
1081 1241 1481
1081 1241 138 1481
15T7 653
1.0 .125
1.0 .375
1.0 1.0
1.0 .485
1.0 1.0 1.0
1.0 1.0 1.0 1.0
1.0 .149
.5 .5
No.
Comment
42 4789pt
40 4741, 4789pt
2819 3334
2051 5461
1081 1241 1481
1081 1241 138 1481
15–17 6552
Like 65.0100 (continued )
Assume 4741, 4738, 4785, and 4789 are same size and 4789 split evenly between 65.0100 and 65.0300
Ratio of employees in 5461 (Bakeries) to employees in all food stores excl. grocery stores in 1987
Ratio of employees in 655 to 653 in 1987 times 1/2 to split 655 between 6552 and 6551
0254, 0279pt No information so split 04.0001 evenly between 01–2 and 07–09 071–2, 075–6, 085, 092
SIC code
Splitting IO industries to different GPO industries, 1987
GPO GPO IO code industry Indicator indicator
Table 15A.2
(continued)
GO
GO GO GO
07–09 80 80
67
84 86 86
77.0302
77.0504
GO Dir GO
GO go
61 67
70.0200
GO GO Dir Dir
52–59 52–59 73 80
69.0200
84 865 869
67
807–9
074
61 67
527–9 542–9
GPO GPO IO code industry Indicator indicator
Table 15A.2
1.0 1.0 1.0
.112
1.0 3.6 1.0
1.0 .888
1.0 –.485 .3 3.5
No.
84 865 869
6732
074 8043, 8049 807–9
61 67 excl. 6732
52–7,9 excl. 5462 7396 8042
SIC code
Ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733
Revenue of 8043 and 8049 in 1987 Census
One minus ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733
Have to exclude 5462, as calculated above (14.1801) Revenue of 7396 from 1987 Census Revenue of 8042 from 1982 Census
Comment
GO
GO GO GO
15–17 65re
10 12 13 14
10 12 14
40 47
61 67
67
84 86 86
11.0108
11.0602
11.0603
65.0100
70.0200
77.0504
GO GO
GO Dir
GO GO GO
GO GO GO GO
GO Dir
GO Dir
15–17 63re
11.0101
Dir
01–02 07–09
04.0001
84 865 869
67
61 67
40
1081 1241 1481
1081 1241 138 1481
110108
110101
1.0 1.0 1.0
.112
1.0 .888
1.0 1.9
1.0 1.0 1.0
1.0 1.0 1.0 1.0
1.0 4.6
1.0 4.6
.5 .5
No.
Comment
84 865 869
6732
61 67, excl. 6732
40 474
1081 1241 1481
1081 1241 138 1481
15, 17 6552
15, 17 6552
Ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733.
One minus ratio of employees of 673 in 2000 (from occupation by industry data) to employees in 67 times 1/2 to split between 6732 and 6733.
Revenue of 474 in 1992 Census
Half of revenue of 6552 in 1992 Census
Half of revenue of 6552 in 1992 Census
0254, 0279pt No information so split 04.0001 evenly between 01–2 and 07–09 071–2, 075–6, 085, 092
SIC code
Splitting IO industries to different GPO industries, 1992
GPO GPO IO code industry Indicator indicator
Table 15A.3
second columns indicate to which GPO industries these IO codes map. The next three columns show how, in one of two ways, the weights were calculated. Either the weight was written down directly, or it was set as some fraction of a particular indicator. If the weights were entered directly, the column “Indicator” equals “Directly”; the column “No.” reports the value of the weight in billions of dollars; and the last column reports the source for the weight. Otherwise, the weight equals the value in “No.” times the indicator noted in the columns “Indicator” and “GPO indicator.” The values in the “Indicator” column can equal GO (gross output), GP (gross product), or Sh (manufacturing shipments). The column “GPO indicator” reports the particular industry that is used as an indicator. If “No.” does not equal one, the “Comment” column describes how the fraction was calculated. For instance, the 1982 IO industry 11.0101 had to be split in two. The weight used to calculate the fraction that is part of GPO industry 15–17 was set to 0.796 times the gross product of GPO industry 15–17; the weight used to allocate the rest of 11.0101 to GPO industry 65re (real estate excluding owner-occupied housing) was set equal to 0.122 times the gross product of industry 65re.
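The weighting rule just described can be written out as a short Python sketch using the 1982 split of IO industry 11.0101 as the example. The multipliers 0.796 and 0.122 come from the text; the gross product figures are hypothetical placeholders, and normalizing the weights into shares that sum to one is our assumption about how the weights are applied, not something the tables state.

```python
def split_shares(rows, indicator_values):
    """Turn table rows into allocation shares across GPO industries.

    A row with indicator "Dir" uses the value in "No." directly as its
    weight; otherwise the weight is "No." times the named indicator
    (GO, GP, or Sh) for the listed GPO indicator industry.
    """
    weights = {}
    for row in rows:
        if row["indicator"] == "Dir":
            weight = row["no"]
        else:
            key = (row["indicator"], row["gpo_indicator"])
            weight = row["no"] * indicator_values[key]
        weights[row["gpo_industry"]] = weight
    total = sum(weights.values())
    # Assumption: weights are normalized so the shares sum to one.
    return {industry: weight / total for industry, weight in weights.items()}


# Rows for IO industry 11.0101 in 1982, as described in the text.
rows_110101 = [
    {"gpo_industry": "15-17", "indicator": "GP", "gpo_indicator": "15-17", "no": 0.796},
    {"gpo_industry": "65re", "indicator": "GP", "gpo_indicator": "65re", "no": 0.122},
]

# Hypothetical gross product levels (placeholders, not published figures).
indicator_values = {("GP", "15-17"): 200.0, ("GP", "65re"): 400.0}

shares = split_shares(rows_110101, indicator_values)
print({industry: round(share, 3) for industry, share in shares.items()})
```

With these placeholder levels the two weights are 159.2 and 48.8, so roughly three-quarters of 11.0101 would be allocated to GPO industry 15–17 and the remainder to 65re.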
References Aaronson, Daniel, and Daniel Sullivan. 2001. Growth in worker quality. Federal Reserve Bank of Chicago Economic Perspectives 25 (4): 53–74. Bacharach, Michael. 1965. Estimating nonnegative matrices from marginal data. International Economic Review 6 (3): 294–310. Bartelsman, Eric J., and Wayne Gray. 1996. The NBER manufacturing productivity database. NBER Technical Working Paper no. T0205. Cambridge, MA: National Bureau of Economic Research, October. Bell, William R., and David W. Wilcox. 1993. The effect of sampling error on the time series behavior of consumption. Journal of Econometrics 55 (1–2): 235–65. Bonds, Belinda, and Tim Aylor. 1998. Investments in new structures and equipment in 1997 by using industries. Survey of Current Business 78 (12): 26–51. Corrado, Carol, and Lawrence Slifman. 1999. Decomposition of productivity and unit costs. American Economic Review: Papers and Proceedings 89 (2): 328–32. Domar, Evsey D. 1961. On the measurement of technological change. Economic Journal 71 (284): 709–29. Fraumeni, Barbara M. 1997. The measurement of depreciation in the U.S. national income and product accounts. Survey of Current Business 77 (7): 7–23. Golan, Amos, George Judge, and Sherman Robinson. 1994. Recovering information from incomplete or partial multisectoral economic data. Review of Economics and Statistics 76 (3): 541–49. Gollop, Frank M. 1987. Modeling aggregate productivity growth: The importance of intersectoral transfer prices and international trade. Review of Income and Wealth 33 (2): 211–27.
———. 1979. Accounting for intermediate input: The link between sectoral and aggregate measures of productivity. In Measurement and interpretation of productivity, ed. Albert Rees and John Kendrick, 318–33. Washington, DC: National Academy of Sciences. Griliches, Zvi. 1986. Economic data issues. In Handbook of econometrics. Vol. 3, ed. Zvi Griliches and Michael D. Intriligator, 1465–1514. Oxford, UK: North-Holland. Günlük-Şenesen, Gulay, and John M. Bates. 1988. Some experiments with methods of adjusting unbalanced data matrices. Journal of Royal Statistical Society A 151 (3): 473–90. Hall, Robert E., and Dale W. Jorgenson. 1967. Tax policy and investment behavior. American Economic Review 57 (3): 391–414. Hulten, Charles R. 1978. Growth accounting with intermediate inputs. Review of Economic Studies 45 (3): 511–18. Jorgenson, Dale W., Mun S. Ho, and Kevin J. Stiroh. 2002. Information technology, education, and the sources of economic growth across U.S. industries. Harvard University. Mimeograph. Jorgenson, Dale W., and Kevin J. Stiroh. 2000. Raising the speed limit: U.S. economic growth in the information age. Brookings Papers on Economic Activity, Issue no. 1:125–211. Washington, DC: Brookings Institution. Klimek, Shawn D., and Kimberly N. Bayard. 2002. Classifying the Census of Manufactures from the standard industry classification system, 1963 to 1992. U.S. Census Bureau, Center for Economic Studies. Mimeograph. Miron, Jeffrey A., and Stephen P. Zeldes. 1989. Production, sales, and the change in inventories: An identity that doesn’t add up. Journal of Monetary Economics 24 (1): 31–51. Mohr, Michael F., and Charles E. Gilbert. 1996. Capital stock estimates for manufacturing industries: Methods and data.
Board of Governors of the Federal Reserve System, March. http://www.federalreserve.gov/releases/G17/capital_stock_doc-latest.pdf. Moylan, Carol. 2001. Estimation of software in the U.S. national accounts: New developments. OECD Statistics Directorate Working Paper no. STD/NA(2001) 25. Paris: Organization for Economic Cooperation and Development, September. Parikh, Ashok. 1979. Forecasts of input-output matrices using the R.A.S. method. Review of Economics and Statistics 61 (3): 477–81. Parker, Robert, and Bruce Grimm. 2000. Recognition of business and government expenditures for software as investment: Methodology and quantitative impacts, 1959–98. Bureau of Economic Analysis. Unpublished Manuscript. http://www.bea.gov/bea/papers/software.pdf. Parker, Robert P., and Eugene P. Seskin. 1997. The statistical discrepancy. Survey of Current Business 77 (8): 19. Postner, Harry H. 1984. New developments towards resolving the company-establishment problem. Review of Income and Wealth 30 (4): 429–59. Schneider, Michael H., and Stavros A. Zenios. 1990. A comparative study of algorithms for matrix balancing. Operations Research 38:439–55. Seskin, Eugene P., and Robert P. Parker. 1998. A guide to the NIPAs. Survey of Current Business 78 (3): 26–68. Sinkhorn, Richard. 1964. A relationship between arbitrary positive matrices and doubly stochastic matrices. Annals of Mathematical Statistics 35:876–79.
Stone, Richard and Alan Brown. 1962. A compatible model of economic growth. London: Chapman and Hall. Stone, Richard, David G. Champernowne, and James E. Meade. 1942. The precision of national income estimates. The Review of Economic Studies 9 (2): 111–25. Weale, Martin. 1985. Testing linear hypothesis on national account data. Review of Economics and Statistics 67 (4): 685–89. Wilcox, David W. 1992. The construction of U.S. consumption data: Some facts and their implications for empirical work. American Economic Review 82 (4): 922–41. U.S. Department of Commerce, Economics and Statistics Administration. 1999. The economics of Y2K and the impact on the United States. November 17. Washington, DC: U.S. Department of Commerce. Yuskavage, Robert E. 2002. Gross domestic product by industry: A progress report on accelerated estimates. Survey of Current Business 82 (6): 19–27.
16 Should Exact Index Numbers Have Standard Errors? Theory and Application to Asian Growth

Robert C. Feenstra and Marshall B. Reinsdorf

In practice, therefore, Professor Fisher’s formula may often do no harm. The objection to the formula is . . . that it does not bring home to the computer, as the previous methods inevitably do, the nature and the degree of the error which is involved. —John Maynard Keynes

16.1 Introduction

In the stochastic approach to index numbers, prices are viewed as draws from some distribution, and the price index is viewed as a measure of the trend change in prices, with an estimable standard error. The most comprehensive treatment of this problem is by Selvanathan and Rao (1994), but the idea dates back to Keynes (1909) and earlier writers, such as Jevons and Edgeworth. Keynes points out that the price changes reflect both a common trend (generalized inflation) and commodity-specific trends,1 which make the common trend difficult to identify. Selvanathan and Rao (1994, 61–67) attempt to solve this problem using purely statistical techniques, as we describe in section 16.2, and the standard error of their price index reflects the precision of the estimate of the trend. Keynes does not offer a solution, but elsewhere he observes that changes in the purchasing power of money can occur for three distinct reasons: “The first of these reasons we may classify as a change in tastes, the second as a change in environment, and the third as a change in relative prices” (1930, 85).

Robert C. Feenstra is a professor of economics at the University of California, Davis, and a research associate of the National Bureau of Economic Research. Marshall B. Reinsdorf is a senior research economist at the U.S. Department of Commerce, Bureau of Economic Analysis. We are grateful to Angus Deaton, Jack Triplett, and Charles Hulten for helpful comments.

1. In Keynes’s (1909) essay on “Index Numbers,” section VIII deals with the “Measurement of General Exchange Value by Probabilities,” which is the stochastic approach. He writes: “We may regard price changes, therefore, as partly due to causes arising from the commodities themselves raising some, lowering others, and all different in degree, and, superimposed upon the changes due to these heterogeneous causes, a further change affecting all in the same ratio arising out of change on the side of money. This uniform ratio is the object of our investigations” (from The Collected Writings of John Maynard Keynes, volume XI, 106).
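To make the stochastic approach concrete, the following Python sketch estimates a common trend in log price changes by weighted least squares and reports its standard error. It assumes, for illustration only, that each commodity's log price change equals the common trend plus an error whose variance is inversely proportional to the commodity's expenditure weight; the function name and the data are hypothetical, and the sketch is not a reproduction of the estimator of Selvanathan and Rao (1994).

```python
import math


def stochastic_price_index(log_price_changes, weights):
    """Weighted least squares estimate of the common price trend.

    Model (an assumption for this sketch): dp_i = pi + e_i with
    Var(e_i) = sigma^2 / w_i, where the weights w_i sum to one.
    The estimate of pi is the weighted mean of the dp_i, and its
    variance is sigma^2, estimated from the weighted squared residuals.
    """
    n = len(log_price_changes)
    total = sum(weights)
    w = [x / total for x in weights]  # normalize weights to sum to one
    trend = sum(wi * dp for wi, dp in zip(w, log_price_changes))
    sigma2 = sum(wi * (dp - trend) ** 2
                 for wi, dp in zip(w, log_price_changes)) / (n - 1)
    return trend, math.sqrt(sigma2)


# Hypothetical log price changes and expenditure shares for three goods.
log_price_changes = [0.02, 0.04, 0.06]
expenditure_shares = [0.5, 0.3, 0.2]

trend, std_error = stochastic_price_index(log_price_changes, expenditure_shares)
print(round(trend, 4), round(std_error, 4))
```

If every price rose by the same proportion, the residuals would all be zero and the standard error would vanish, which is the sense in which dispersion in relative prices makes the common trend harder to identify.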
The first factor identified by Keynes—changing tastes—can be expected to affect the weights in a price index and not just the prices. Accordingly, we will derive the standard error of a price index when both prices and tastes (or technology) are treated as stochastic. The rationale for our treatment of stochastic tastes (or technology) comes from the economic approach to index numbers (e.g., Diewert 1976), which shows that certain price indexes, known as exact indexes, equal the ratio of expenditures needed to obtain a fixed level of utility at two different prices. This ratio of expenditures depends on the tastes of the consumer, so if the taste parameters are stochastic, then the exact index number is also. Section 16.3 describes how we allow for both random prices and random tastes, thereby integrating the stochastic and economic approaches to index numbers. We use our integrated approach to stochastic index numbers to derive estimators for index number standard errors for two well-known models of tastes or technology. The first of these is the constant elasticity of substitution (CES) expenditure function (for a consumer) or cost function (for a firm). In section 16.4, we suppose that the CES taste parameters are random and obtain a simple specification for demand that depends on the random parameters and on prices. The estimated error from this demand equation can be used to infer the standard error of the exact price index. Inverting the demand equation, we also obtain a simple specification in which price changes depend on a trend (the price index) and a component (log changes in expenditure shares) that has an average of zero, just as is supposed for the error term in an identical set of equations derived under the stochastic approach. The CES case therefore provides a good comparison to the specification used by Selvanathan and Rao (1994). 
In section 16.5, we extend our treatment of the CES case to deal with both random prices and random tastes, allowing an additional comparison to the Selvanathan and Rao results. We next apply our integrated approach to stochastic index numbers to the translog case, considering the effects of random tastes in section 16.6 and the effects of both random tastes and random prices in section 16.7. The demand equations estimated are the familiar translog expenditure share equations, and, again, the error in this system of regressions is used to infer the standard error of the exact price index. Although the CES system provides a particularly clear comparison with the conventional stochastic approach, a linear relationship between the shares and the taste parameters makes the translog system easier to implement than the nonlinear CES system, and we recommend the translog for future use.
In section 16.8, we provide an application of our results to Asian productivity growth and, in particular, productivity growth in Singapore. The extent to which the East Asian countries are “exceptional” or not in terms of their productivity growth has been a topic of debate between the World Bank (1993) and Young (1992, 1995). Citing the estimates of zero or negative productivity growth in Singapore found by Young and also Kim and Lau (1994), Krugman (1994) popularized the idea that the growth in some East Asian countries is mainly due to capital accumulation and, in that respect, is not much different from that in the former Soviet Union: certainly not a miracle. Recently, however, Hsieh (2002) has reexamined the productivity performance of several East Asian countries using dual measures of total factor productivity (TFP) and for Singapore finds positive productivity growth, contrary to Young. The difference lies in Hsieh’s use of “external” rates of return for capital computed from three different sources, which are then used in a dual calculation of productivity growth; this contrasts with Young’s calculation of primal productivity growth, which implicitly uses an “internal” return on capital. We use Hsieh’s three different rates of return on capital to compute the standard error of that series, and of estimates of TFP, where we also incorporate the error in fitting the translog function.
In the results, we find that TFP growth in Singapore is insignificantly different from zero for any single year in the sample. The same holds true when estimating cumulative TFP growth over any five-year or ten-year period of the sample. For the fifteen-year period, however, we find that cumulative TFP growth in Singapore is significantly positive. Thus, the estimates of Hsieh (2002) are indeed statistically different from those of Young (1992, 1995), provided that cumulative TFP over a long enough time period is considered. Part of the difference between these results is found to be due to the use of different measures of the rate of return for capital.
Drawing on Berndt and Fuss (1986), Hulten (1986) shows that when capital is not at a long-run equilibrium level, TFP measured using ex post returns (the approach of Young [1992, 1995]) differs from TFP measured using ex ante returns (Hsieh’s approach) by an amount that reflects the overutilization or underutilization of capital. We implement a theorem of Hulten to show that part of the difference between Hsieh’s TFP estimates and Young’s TFP estimates arises from the greater ex post returns on capital investment associated with levels of capital that are below the equilibrium level.
16.2 The Stochastic Approach to Index Numbers
An example of the stochastic approach to price indexes is a model where price changes satisfy the equation:
$$\ln\left(\frac{p_{it}}{p_{it-1}}\right) = \pi_t + e_{it}, \qquad i = 1, \ldots, N, \tag{1}$$
where the errors are independent and heteroscedastic, satisfying $E(e_{it}) = 0$ and $\operatorname{var}(e_{it}) = \sigma^2/w_i$, where the $w_i$ are some exogenous values that sum to unity, $\sum_{i=1}^{N} w_i = 1$. Under these conditions, an unbiased and efficient estimate of the trend change in prices $\pi_t$ is
$$\hat{\pi}_t = \sum_{i=1}^{N} w_i \ln\left(\frac{p_{it}}{p_{it-1}}\right), \tag{2}$$
which can be obtained by running weighted least squares (WLS) on equation (1) with the weights $w_i$. An unbiased estimate for the variance of $\hat{\pi}_t$ is
$$s_{\pi}^2 = \frac{s_p^2}{N-1}, \tag{3}$$
where $s_p^2 \equiv \sum_{i=1}^{N} w_i (\Delta \ln p_{it} - \hat{\pi}_t)^2$ and $\Delta \ln p_{it} \equiv \ln(p_{it}/p_{it-1})$.
Diewert (1995) criticizes the stochastic approach and argues that (a) the common trend $\pi_t$ in equation (1) is limiting; (b) the variance assumption $\operatorname{var}(e_{it}) = \sigma^2/w_i$ is unrealistic; and (c) some choices of $w_i$ (such as budget shares) will not be exogenous. In assessing these criticisms, we believe that a distinction should be made between lower-level and higher-level aggregation. At higher levels, these criticisms seem apt, and simple extensions to the model in equation (1) are unable to resolve them in completely satisfactory ways. In particular, to avoid the assumption of a single common trend for all prices, Selvanathan and Rao (1994, 61–67) add commodity-specific trends $\beta_i$ to equation (1):
$$\ln\left(\frac{p_{it}}{p_{it-1}}\right) = \pi_t + \beta_i + e_{it}, \qquad i = 1, \ldots, N, \tag{1$'$}$$
where again the errors are independent and satisfy $E(e_{it}) = 0$ and $\operatorname{var}(e_{it}) = \sigma^2/w_i$, with $\sum_{i=1}^{N} w_i = 1$. For the estimator of the common trend $\pi_t$ to be identified, some assumption is needed on $\beta_i$. Selvanathan and Rao show that the estimator for $\pi_t$ is still given by equation (2) under the assumption that the commodity-specific trends have a weighted average of zero:2
$$\sum_{i=1}^{N} w_i \beta_i = 0. \tag{4}$$
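As a minimal sketch of the estimator in equations (2) and (3), the following Python fragment computes the weighted trend and its variance from two price vectors. The function name, the prices, and the weights are hypothetical illustrations, not taken from the chapter.

```python
import math

def stochastic_index(p_prev, p_curr, weights):
    """WLS trend estimate, equation (2), and its variance, equation (3)."""
    n = len(weights)
    dlog_p = [math.log(pc / pp) for pp, pc in zip(p_prev, p_curr)]
    pi_hat = sum(w * d for w, d in zip(weights, dlog_p))
    # Weighted variance of the log price changes around the estimated trend.
    s2_p = sum(w * (d - pi_hat) ** 2 for w, d in zip(weights, dlog_p))
    return pi_hat, s2_p / (n - 1)

# Hypothetical prices and weights; the weights must sum to one.
p_prev = [1.00, 2.00, 4.00, 5.00]
p_curr = [1.05, 2.20, 4.10, 5.60]
weights = [0.4, 0.3, 0.2, 0.1]
pi_hat, var_pi = stochastic_index(p_prev, p_curr, weights)
print(f"trend = {pi_hat:.4f}, standard error = {math.sqrt(var_pi):.4f}")
```

With equal weights of 1/N the trend reduces to the simple average of the log price changes, and when all prices change in the same proportion the estimated variance is zero.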
The justification for equation (4) is purely statistical, that is, it allows $\pi_t$ to be identified, though we will suggest an economic interpretation in section 16.4.
In contrast, at the lowest level of aggregation, prices from different sellers of the same commodity are typically combined into indexes for individual commodities or for narrow classes of closely related commodities. At this level, the assumption of a common trend, such as $\pi_t$ in equation (1), will often be realistic. The stochastic approach in equations (1)–(3) can then be used to form the elementary indexes that are combined at higher levels of aggregation into indexes for all commodities or for broad classes of commodities. If the expenditure shares needed to compute the weights $w_i$ are unavailable for lower-level aggregates so that $w_i = 1/N$ is used, the stochastic approach in equations (1)–(3) amounts to using a simple average of log changes for prices and its variance.
2. The standard error of $\hat{\pi}_t$ when the commodity effects are used is less than that in equation (3) as the residual error of the pricing equation is reduced.
16.3 Integrating the Economic and Stochastic Approaches to Index Numbers
Despite Keynes’ early contribution to the literature on the stochastic approach to index numbers, he ultimately rejected it. Keynes (1930, 87–88) wrote,3
I conclude, therefore, that the unweighted (or rather randomly weighted) Index-Number of Prices—Edgeworth’s indefinite index number—which shall in some way measure the value of money as such or the amount of influence on general prices exerted by “changes on the side of money” or the “objective mean variation of general prices” as distinguished from the “change in the power of money to purchase advantages,” has no place whatever in a rightly conceived discussion of the problems of Price-Levels.
We also believe that index numbers with weights that reflect expenditure patterns are more interesting and informative than index numbers that have a purely statistical motivation. To motivate the incorporation of expenditure information in our index, we assume that the objective is to estimate an economic index, which, for the consumer problem, is defined as the ratio of the expenditure function evaluated at current period prices to the expenditure function evaluated at reference period prices. Adopting an economic index as the goal of estimation makes the link between the stochastic properties of the data and the stochastic properties of the estimator less straightforward than when the goal is simply to estimate a mean price change. Nevertheless, any kind of index number calculated from stochastic data is itself stochastic.
Moreover, a model of the stochastic processes reflected in the data used to calculate the economic index should allow the derivation of an estimator for its standard error. Our starting point is the stochastic process for the expenditure shares used to calculate the weights in the index. To estimate an economic index requires an assumption about the form of the expenditure function that describes tastes. (For simplicity, our discussion will be in terms of the consumer problem although the approach is equally applicable to the producer problem.) This assumption implies a functional form for the equations relating expenditure shares to prices. Because these equations generally do not fit the data on expenditure shares precisely, they imply the existence of an error term. We interpret changes in expenditure shares not explainable by changes in prices as arising from stochastic tastes. If we were able to take repeated draws from the distribution of the taste parameters in the expenditure function while holding prices constant, we would observe a range of outcomes for expenditure shares: this is one source of variance in an economic index.
A second source of variance in the price index is sampling error in the price data.4 When prices are treated as stochastic, we need to decide whether the expenditure shares are determined by observed prices or by expected prices. Grunfeld and Griliches (1960, 7), for example, recognize that models of consumer behavior may be specified using either expected prices (and income) or observed prices (and income). We shall take the latter approach, and assume that observed prices determine expenditure shares.5 In that case, the error term for prices influences the expenditure shares, so a component of the variance of expenditure shares comes from the variance of prices. In our following translog results, we include components representing the variance of expenditure shares that comes from the variance of prices (see proposition 7). However, for the nonlinear CES model with prices treated as stochastic, we are unable to include these components in our variance estimator. In addition to the indirect effects arising from the equations relating prices to expenditure shares, which are small, price variances have an important direct effect on the index’s variance. We therefore extend our results for both the CES model and for the translog model to take account of the direct effect on the index of sampling error in the measures of prices used to construct the index.
3. Keynes’ (1930) argument against stochastic price models with independent commodity-specific shocks was that linkages between prices in an economy preclude shocks that affect a single price in isolation: “But in the case of prices a movement in the price of one commodity necessarily influences the movement in the prices of other commodities” (86).
We assume that the lower-level aggregates in the index are price indexes for individual commodities or for narrow categories of items that are homogeneous enough to be treated as a single commodity. Each commodity has its own nonstochastic price trend, but rather than summing to zero, as in equation (4), with properly chosen weights these commodity price trends sum up to the true index expressed as a logarithm. As sample estimators of these trends, the lower-level aggregates used to construct the index are subject to sampling error. Equation (1) describes the process generating the changes in individual price quotes that are combined in a lower-level aggregate. If all the quotes have identical weights and variances, the variance of the lower-level aggregate can therefore be estimated by equation (3).
Finally, in addition to sampling error in tastes and in estimates of commodity prices, another source of inaccuracy in an estimate of a cost-of-living index is that the model used to describe tastes might be misspecified. This was thought to be an important problem until Diewert (1976) showed that by using a flexible functional form, an arbitrary expenditure function could be approximated to the second order of precision. We do not explicitly estimate the effect of possible misspecification of the model of tastes on the estimate of the cost-of-living index, but our standard error estimator does give an indication of this effect. An incorrectly specified model of tastes will likely fit the expenditure data poorly, resulting in a high estimate of the component of the index’s variance that comes from the variance of tastes. On the other hand, our estimator will tend to imply a small standard error for the index if the model fits the expenditure data well, suggesting that the specification is correct, or if growth rates of prices are all within a narrow range, in which case misspecification does not matter.
One source of misspecification that can be important is an incorrect assumption of homotheticity. (Diewert’s [1976] approximation result for flexible functional forms does not imply that omitting an important variable, such as income in nonhomothetic cases, is harmless.) In the economic models considered in this paper, homotheticity is assumed for the sake of simplicity. In applications to producers, or in applications to consumers whose income changes by about the same amount as the price index, this assumption is likely to be harmless, but when consumers experience large changes in real income, the effects on their expenditure patterns are likely to be significant.
4. Sampling error in the expenditure shares changes the interpretation and derivation of the estimator of the index standard error, but not the estimator itself.
5. Under the alternative hypothesis that expected prices are the correct explanatory variable, regression equations for expenditure shares using observed prices must be regarded as having mismeasured explanatory variables. Measurement error in the prices used to explain expenditure patterns may imply a bias in the estimates of the contribution of the taste variance to the index variance, but it does not necessarily do so. If sampling error in the price variables reduces their ability to explain changes in expenditure shares, too much of the variation in expenditure shares will be attributed to changes in tastes, and the estimate of the taste variance will be biased upward. Nevertheless, even if such a bias exists, it will usually be negligible.
If a model that assumes homotheticity is used and changes in real income cause substantial variation in expenditure shares, the estimate of the variance of the index’s weights is likely to be elevated because of the lack of fit of the model. Knowing that the estimate of the economic index has a wide range of uncertainty will help to prevent us from having too much confidence in results based on an incorrect assumption, but relaxing the assumption would, of course, be preferable.6
6. Caves, Christensen, and Diewert (1982) find that the translog model that we discuss in the following can be extended to allow the log of income to affect expenditure patterns and that the Törnqvist index still measures the cost-of-living index at an intermediate utility level. However, to model nonhomothetic tastes for index number purposes, we recommend the use of Deaton and Muellbauer’s (1980) “Almost Ideal Demand System (AIDS).” Feenstra and Reinsdorf (2000) show that for the AIDS model, the predicted value of expenditure shares at the average level of logged income and logged prices must be averaged with the Törnqvist index weights (which are averages of observed shares from the two periods being compared) to obtain an exact price index for an intermediate standard of living. However, the variance formulas that we derive in the following for the translog model can still be used to approximate the variance of the AIDS index as long as logged income is included among the explanatory variables in the regression model for expenditure shares.
16.4 The Exact Index for the CES Model with Random Technology or Tastes
16.4.1 Exactness of the Sato-Vartia Index in the Nonrandom Case
Missing from the stochastic approach is an economic justification for the pricing equation in (1) or (1′), as well as the constraint in equation (4). It turns out that this can be obtained by using a CES utility or production function, which is given by
$$f(x_t, a_t) = \left(\sum_{i=1}^{N} a_{it}\, x_{it}^{(\eta-1)/\eta}\right)^{\eta/(\eta-1)},$$
where $x_t = (x_{1t}, \ldots, x_{Nt})$ is the vector of quantities, and $a_t = (a_{1t}, \ldots, a_{Nt})$ are technology or taste parameters that we will allow to vary over time, as described in the following. The elasticity of substitution $\eta > 0$ is assumed to be constant. We will assume that the quantities $x_t$ are optimally chosen to minimize $\sum_{i=1}^{N} p_{it} x_{it}$, subject to achieving $f(x_t, a_t) = 1$. The solution to this optimization problem gives us the CES unit-cost function,
$$c(p_t, b_t) = \left(\sum_{i=1}^{N} b_{it}\, p_{it}^{1-\eta}\right)^{1/(1-\eta)}, \tag{5}$$
where $p_t = (p_{1t}, \ldots, p_{Nt})$ is the vector of prices, and $b_{it} \equiv a_{it}^{\eta} > 0$, with $b_t = (b_{1t}, \ldots, b_{Nt})$. Differentiating (5) provides the expenditure shares $s_{it}$ implied by the taste parameters $b_t$:
$$s_{it} = \frac{\partial \ln c(p_t, b_t)}{\partial \ln p_{it}} = c(p_t, b_t)^{\eta-1}\, b_{it}\, p_{it}^{1-\eta}. \tag{6}$$
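The unit-cost function (5) and the share equation (6) can be illustrated with a short Python sketch. It assumes the elasticity of substitution is written as eta with eta different from one; the function names and all parameter values are hypothetical.

```python
def ces_unit_cost(prices, b, eta):
    """CES unit-cost function of equation (5); assumes eta > 0 and eta != 1."""
    total = sum(bi * p ** (1.0 - eta) for bi, p in zip(b, prices))
    return total ** (1.0 / (1.0 - eta))

def ces_shares(prices, b, eta):
    """Expenditure shares of equation (6)."""
    c = ces_unit_cost(prices, b, eta)
    return [c ** (eta - 1.0) * bi * p ** (1.0 - eta) for bi, p in zip(b, prices)]

# Hypothetical prices and taste parameters.
prices = [1.0, 2.0, 4.0]
b = [0.5, 0.3, 0.2]
eta = 2.0
shares = ces_shares(prices, b, eta)
print(shares, sum(shares))
```

The shares sum to one by construction, and doubling every price doubles the unit cost, which is the linear homogeneity that a unit-cost function must satisfy.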
Diewert (1976) defines a price index formula whose weights are functions of the expenditure shares $s_{it-1}$ and $s_{it}$ as exact if it equals the ratio of unit-costs. For the CES unit-cost function with constant $b_t$, the price index due to Sato (1976) and Vartia (1976) has this property. The Sato-Vartia price index equals the geometric mean of the price ratios with weights $w_i$:
$$\prod_{i=1}^{N} \left(\frac{p_{it}}{p_{it-1}}\right)^{w_i} = \frac{c(p_t, b_t)}{c(p_{t-1}, b_t)}, \tag{7}$$
where the weights $w_i$ are defined as:
$$w_i \equiv \frac{(s_{it} - s_{it-1})/(\ln s_{it} - \ln s_{it-1})}{\sum_{j=1}^{N} (s_{jt} - s_{jt-1})/(\ln s_{jt} - \ln s_{jt-1})}. \tag{8}$$
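The exactness property in equation (7) can be checked numerically. The sketch below builds the weights of equation (8) from logarithmic means of the shares and compares the resulting index with the ratio of CES unit-costs under constant taste parameters. All names and numbers are hypothetical, and the CES helper is included only to generate consistent test data.

```python
import math

def log_mean(a, b):
    """Logarithmic mean of two positive numbers; defined as a when a equals b."""
    if math.isclose(a, b, rel_tol=1e-12):
        return a
    return (a - b) / (math.log(a) - math.log(b))

def sato_vartia_weights(s_prev, s_curr):
    """Weights of equation (8): logarithmic means of the shares, normalized."""
    means = [log_mean(sc, sp) for sp, sc in zip(s_prev, s_curr)]
    total = sum(means)
    return [m / total for m in means]

def sato_vartia_index(p_prev, p_curr, s_prev, s_curr):
    """Left side of equation (7): weighted geometric mean of price ratios."""
    w = sato_vartia_weights(s_prev, s_curr)
    log_index = sum(wi * math.log(pc / pp) for wi, pp, pc in zip(w, p_prev, p_curr))
    return math.exp(log_index)

def ces_cost_and_shares(prices, b, eta):
    """CES unit cost (5) and shares (6), used here only to generate data."""
    terms = [bi * p ** (1.0 - eta) for bi, p in zip(b, prices)]
    total = sum(terms)
    return total ** (1.0 / (1.0 - eta)), [t / total for t in terms]

# Hypothetical constant tastes and two price vectors.
b, eta = [0.5, 0.3, 0.2], 3.0
p_prev, p_curr = [1.0, 1.0, 1.0], [1.2, 0.9, 1.5]
c_prev, s_prev = ces_cost_and_shares(p_prev, b, eta)
c_curr, s_curr = ces_cost_and_shares(p_curr, b, eta)
index = sato_vartia_index(p_prev, p_curr, s_prev, s_curr)
print(index, c_curr / c_prev)
```

The two printed numbers agree up to floating-point rounding, which is the sense in which the index is exact when tastes are constant.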
The weight for the $i$th commodity is proportional to $(s_{it} - s_{it-1})/(\ln s_{it} - \ln s_{it-1})$, the logarithmic mean of $s_{it}$ and $s_{it-1}$, and the weights are normalized to sum to unity.7 Provided that the expenditure shares are computed with constant taste parameters $b_t$, the Sato-Vartia index on the left side of equation (7) equals the ratio of unit-costs on the right, also computed with constant $b_t$. In addition to its ability to measure the change in unit-costs for the CES model, the Sato-Vartia index is noteworthy for its outstanding axiomatic properties, which rival those of the Fisher index (Balk 1995, 87).
7. If $s_{it-1}$ equals $s_{it}$, then the logarithmic mean is defined as $s_{it}$.
16.4.2 Effect of Random Technology or Tastes
As discussed in the preceding, we want to allow for random technology or taste parameters $b_{it}$ and derive the standard error of the exact price index due to this uncertainty. First, we must generalize the concept of an exact price index to allow for the case where the parameters $b_t$ change over time, as follows:8
Proposition 1: Given $b_{t-1} \neq b_t$, let $s_{i\tau}$ denote the optimally chosen shares as in equation (6) for these taste parameters, $\tau = t-1, t$, and define $\bar{b}_{i\tau} \equiv b_{i\tau} / \prod_{j=1}^{N} b_{j\tau}^{w_j}$, using the weights $w_i$ computed as in equation (8). Then there exists $\tilde{b}_i$ between $\bar{b}_{it-1}$ and $\bar{b}_{it}$ such that
$$\prod_{i=1}^{N} \left(\frac{p_{it}}{p_{it-1}}\right)^{w_i} = \frac{c(p_t, \tilde{b})}{c(p_{t-1}, \tilde{b})}. \tag{9}$$
This result shows that the Sato-Vartia index equals the ratio of unit-costs evaluated with parameters $\tilde{b}_i$ that lie between the normalized values of the $b_{it}$ in each period. Therefore, the Sato-Vartia index is exact for this particular value $\tilde{b}$ of the parameter vector, but the index will change as the random parameters $b_{t-1}$ and $b_t$ change. The standard error of the index should reflect this variation. The following proposition shows how the variance of the log Sato-Vartia index is related to the variance of the $\ln b_{it}$, denoted by $\sigma_\beta^2$:
Proposition 2: Suppose that $\ln b_{i\tau}$ are independently and identically distributed with variance $\sigma_\beta^2$ for $i = 1, \ldots, N$ and $\tau = t-1$ or $t$. Using the weights $w_i$ as in equation (8), denote the log Sato-Vartia index by $\pi_{sv} \equiv \sum_{i=1}^{N} w_i \Delta \ln p_{it}$. Then conditional on prices, its variance can be approximated as
$$\operatorname{var} \pi_{sv} \approx \frac{1}{2} \sigma_\beta^2 \sum_{i=1}^{N} w_i^2 (\Delta \ln p_{it} - \pi_{sv})^2 = \frac{1}{2} \sigma_\beta^2\, s_p^2\, \bar{\bar{w}}, \tag{10}$$
where $\bar{\bar{w}} \equiv \sum_{i=1}^{N} w_i^2 (\Delta \ln p_{it} - \pi_{sv})^2 / \sum_{i=1}^{N} w_i (\Delta \ln p_{it} - \pi_{sv})^2$ is a weighted average of the $w_i$ with the weights proportional to $w_i (\Delta \ln p_{it} - \pi_{sv})^2$, and $s_p^2 \equiv \sum_{i=1}^{N} w_i (\Delta \ln p_{it} - \pi_{sv})^2$ is the weighted variance of prices.
Equation (10), which is conditional on the observed prices, shows how the variance of the Sato-Vartia index reflects the underlying randomness of the taste parameters $b_t$ and $b_{t-1}$. In effect, we are computing the variance in the price index from the randomness in the weights $w_i$ in equation (8) rather than from the randomness in prices used in the conventional stochastic approach. To compare equation (10) to equation (3), note that $\bar{\bar{w}}$ equals $1/N$ when the weights $w_i$ also equal $1/N$. In that case, we see that the main difference between our formula (10) for the variance of the price index and formula (3) used in the conventional stochastic approach is the presence of the term $\frac{1}{2}\sigma_\beta^2$. This term reflects the extent to which we are uncertain about the parameters of the underlying CES function that the exact price index is intended to measure. It is entirely absent from the conventional stochastic approach. We next show how to obtain a value for this variance.
8. The proofs of all propositions are in an appendix at: http://www.econ.ucdavis.edu/faculty/fzfeens/papers.html.
16.4.3 Estimators for $\sigma_\beta^2$ and for the Index Variance
To use equation (10) to estimate the variance of a Sato-Vartia price index, we need an estimate of $\sigma_\beta^2$. Changes in expenditure shares not accounted for by the CES model can be used to estimate this variance. After taking logarithms, write equation (6) in first-difference form as
$$\Delta \ln s_{it} = (\eta - 1) \Delta \ln c_t - (\eta - 1) \Delta \ln p_{it} + \Delta \ln b_{it}, \tag{11}$$
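where $\Delta \ln c_t \equiv \ln[c(p_t, b_t)/c(p_{t-1}, b_{t-1})]$. Next, eliminate the term involving $\Delta \ln c_t$ by subtracting the weighted mean (over all the $i$) of each side of equation (11) from that side. Then using the fact that $\sum_{i=1}^{N} w_i \Delta \ln s_{it} = 0$, equation (11) becomes
$$\Delta \ln s_{it} = \beta_t - (\eta - 1) \Delta \ln p_{it} + \varepsilon_{it}, \qquad i = 1, \ldots, N, \tag{12}$$
where
$$\beta_t \equiv (\eta - 1) \sum_{i=1}^{N} w_i \Delta \ln p_{it}, \tag{13}$$
and
$$\varepsilon_{it} \equiv \Delta \ln b_{it} - \sum_{i=1}^{N} w_i \Delta \ln b_{it}. \tag{14}$$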
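Equations (12)–(14) describe a weighted regression of share changes on price changes. A minimal sketch of that regression in plain Python follows, assuming the weights sum to one and writing the elasticity as eta; the function name and the data are hypothetical, and the example data are built with no taste change so that the fit is exact.

```python
def wls_share_regression(dlog_s, dlog_p, w):
    """WLS fit of regression (12): dlog_s = beta - (eta - 1) * dlog_p + error.

    Returns (beta_hat, eta_hat, s2_eps), where s2_eps is the weighted mean
    squared residual. The weights w are assumed to sum to one.
    """
    x_bar = sum(wi * x for wi, x in zip(w, dlog_p))
    y_bar = sum(wi * y for wi, y in zip(w, dlog_s))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, dlog_p))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, dlog_p, dlog_s))
    slope = sxy / sxx                      # estimates -(eta - 1)
    beta_hat = y_bar - slope * x_bar       # intercept, equation (13)
    eta_hat = 1.0 - slope
    resid = [y - beta_hat - slope * x for x, y in zip(dlog_p, dlog_s)]
    s2_eps = sum(wi * e ** 2 for wi, e in zip(w, resid))
    return beta_hat, eta_hat, s2_eps

# Hypothetical data generated with eta = 2.5 and no taste shocks.
w = [0.4, 0.3, 0.2, 0.1]
dlog_p = [0.05, 0.10, 0.02, 0.12]
eta_true = 2.5
beta_true = (eta_true - 1.0) * sum(wi * x for wi, x in zip(w, dlog_p))
dlog_s = [beta_true - (eta_true - 1.0) * x for x in dlog_p]
beta_hat, eta_hat, s2_eps = wls_share_regression(dlog_s, dlog_p, w)
print(beta_hat, eta_hat, s2_eps)
```

With real data the residuals would not vanish, and their weighted mean square is the quantity that carries the information about taste variation.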
Equation (12) may be regarded as a regression of the change in shares on the change in prices, with the intercept given by equation (13) and the errors in equation (14). These errors indicate the extent of taste change and are related to the underlying variance of the Sato-Vartia index. Denote by $\hat{\beta}_t$ and $\hat{\eta}$ the estimated coefficients from running WLS on equation (12) over $i = 1, \ldots, N$, using the weights $w_i$. Unless the supply curve is horizontal, the estimate of $\eta$ may well be biased because of a covariance between the error term (reflecting changes in preferences and shifts in the demand curve) and the log prices. We ignore this in the next result, however, because we treat the prices as nonstochastic so there is no correlation between them and the errors $\varepsilon_{it}$. We return to the issue of stochastic prices at the end of this section and in the next. The weighted mean squared error of regression (12) is useful in computing the variance of the Sato-Vartia index, as shown by the following result:
Proposition 3: Define $\bar{w} \equiv \sum_{i=1}^{N} w_i^2$ as the weighted average of the $w_i$, using $w_i$ from equation (8) as weights, with $\bar{\bar{w}}$ as in proposition 2. Also, denote the mean squared error of regression (12) by $s_\varepsilon^2 \equiv \sum_{i=1}^{N} w_i \hat{\varepsilon}_{it}^2$, with $\hat{\varepsilon}_{it} = \Delta \ln s_{it} - \hat{\beta}_t + (\hat{\eta} - 1) \Delta \ln p_{it}$. Then an unbiased estimator for $\sigma_\beta^2$ is:
$$s_\beta^2 \equiv \frac{s_\varepsilon^2}{2(1 - \bar{w} - \bar{\bar{w}})}. \tag{15}$$
To motivate this result, notice that the regression errors $\varepsilon_{it}$ in equation (14) depend on the changes in the $b_{it}$, minus their weighted mean. We have assumed that the $\ln b_{it}$ are independently and identically distributed with variance $\sigma_\beta^2$. This means that the variance of $\Delta \ln b_{it} = \ln b_{it} - \ln b_{it-1}$ equals $2\sigma_\beta^2$. By extension, the mean squared error of regression (12) is approximately twice the variance of the taste parameters, so the variance of the taste parameters is about one-half of the mean squared error, with the degrees of freedom adjustment in the denominator of (15) coming from the weighting scheme. Substituting equation (15) into equation (10) yields a convenient expression for the variance of the Sato-Vartia index,
$$\operatorname{var} \pi_{sv} \approx \frac{\bar{\bar{w}}\, s_\varepsilon^2\, s_p^2}{4(1 - \bar{w} - \bar{\bar{w}})}. \tag{16}$$
If, for example, each $w_i$ equals $1/N$, the expression for $\operatorname{var} \pi_{sv}$ becomes $s_\varepsilon^2 s_p^2 / [4(N - 2)]$. By comparison, the conventional stochastic approach resulted in the index variance $s_p^2/(N - 1)$ in equation (3), which can be greater or less than that in equation (16). In particular, when $s_\varepsilon^2 < 4(N - 2)/(N - 1)$, the conventional stochastic approach gives a standard error of the index that is too high, as will occur if the fit of the share equation is good. To further compare the conventional stochastic approach with our CES case, let us write regression (12) in reverse form as
$$\Delta \ln p_{it} = \pi_{sv} - (\eta - 1)^{-1} \Delta \ln s_{it} + (\eta - 1)^{-1} \varepsilon_{it}, \tag{12$'$}$$
where
$$\pi_{sv} \equiv \sum_{i=1}^{N} w_i \Delta \ln p_{it}, \tag{13$'$}$$
and the errors are defined as in equation (14). Thus, the change in prices equals a trend (the Sato-Vartia index), plus a commodity-specific term reflecting the change in shares, plus a random error reflecting changing tastes. Notice the similarity between equation (12′) and the specification of the pricing equation in equation (1′), where the change in share is playing the role of the commodity-specific terms $\beta_i$. Indeed, the constraint in equation (4) that the weighted commodity-specific effects sum to zero is automatically satisfied when we use equation (12′) and the Sato-Vartia weights $w_i$ in equation (8) because then $\sum_{i=1}^{N} w_i \Delta \ln s_{it} = 0$. Thus, the CES specification provides an economic justification for the pricing equation (1′) used in the stochastic approach.
If we run WLS on regression (12′) with the Sato-Vartia weights $w_i$, the estimate of the trend term is exactly the log Sato-Vartia index, $\hat{\pi}_{sv} = \sum_{i=1}^{N} w_i \Delta \ln p_{it}$. If this regression is run without the share terms in equation (12′), then the standard error of the trend is given by equation (3) as modified to allow for heteroscedasticity by replacing $N$ with $1/\bar{w}$. The standard error is somewhat lower if the shares are included. Either of these can be used as the standard error of the Sato-Vartia index under the conventional stochastic approach. By comparison, in formula (16) we are using both the mean squared error $s_\varepsilon^2$ of the “direct” regression (12) and the mean squared error $s_p^2$ of the “reverse” regression (12′), run without the share terms. The product of these is used to obtain the standard error of the Sato-Vartia index, as in equation (16).
Recall that we are assuming in this section that the taste parameters are stochastic, but not prices. Then why does the variance of prices enter equation (16)? This occurs because with the weights $w_i$ varying randomly, the Sato-Vartia index $\pi_{sv} = \sum_{i=1}^{N} w_i \ln(p_{it}/p_{it-1})$ will vary if and only if the price ratios $(p_{it}/p_{it-1})$ differ from each other. Thus, the variance of the Sato-Vartia index must depend on the product of the taste variance, estimated from the “direct” regression (12), and the price variance, estimated from the “reverse” regression (12′) without the share terms.
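The way the two mean squared errors combine in equations (15) and (16) can be sketched as follows. The function name and data are hypothetical; the weights are assumed to sum to one, and the mean squared error of the share regression is taken as given rather than estimated here.

```python
def sato_vartia_variance(w, dlog_p, s2_eps):
    """Variance of the log Sato-Vartia index from equations (10), (15), (16).

    w       : Sato-Vartia weights, assumed to sum to one
    dlog_p  : log price changes
    s2_eps  : weighted mean squared error of the share regression (12)
    Returns (pi_sv, s2_p, s2_beta, var_sv). Requires 1 - w_bar - w_dbar > 0
    and some dispersion in the price changes (s2_p > 0).
    """
    pi_sv = sum(wi * d for wi, d in zip(w, dlog_p))
    dev2 = [(d - pi_sv) ** 2 for d in dlog_p]
    s2_p = sum(wi * d2 for wi, d2 in zip(w, dev2))
    w_bar = sum(wi ** 2 for wi in w)                                  # proposition 3
    w_dbar = sum(wi ** 2 * d2 for wi, d2 in zip(w, dev2)) / s2_p      # proposition 2
    s2_beta = s2_eps / (2.0 * (1.0 - w_bar - w_dbar))                 # equation (15)
    var_sv = 0.5 * s2_beta * s2_p * w_dbar                            # equations (10), (16)
    return pi_sv, s2_p, s2_beta, var_sv

# Hypothetical example with equal weights, so the result can be compared
# with the closed form s2_eps * s2_p / (4 * (N - 2)).
n = 5
w = [1.0 / n] * n
dlog_p = [0.02, 0.05, 0.01, 0.08, 0.04]
s2_eps = 0.01
pi_sv, s2_p, s2_beta, var_sv = sato_vartia_variance(w, dlog_p, s2_eps)
print(var_sv, s2_eps * s2_p / (4 * (n - 2)))
```

If all price changes were identical, s2_p would be zero and the index would have no variance from random weights, which matches the argument in the text that the index varies only when price ratios differ.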
We should emphasize that in our discussion so far, regressions (12) or (12′) are run across commodities $i = 1, \ldots, N$, but for a given $t$. In the appendix (available at the Web address given in footnote 8), we show how to generalize proposition 3 to the case where the share regression (12) is estimated across goods $i = 1, \ldots, N$ and time periods $t = 1, \ldots, T$. In that case, the mean squared error that appears in the numerator of equation (15) is formed by taking the weighted sum across goods and periods. But the degrees of freedom adjustment in the denominator of equation (15) is modified to take into account the fact that the weights $w_i$ in (8) are correlated over time (as they depend on the expenditure shares in period $t - 1$ and $t$). With this modification, the variance of the taste parameters is still relatively easy to compute from the mean squared error of regression (12), and this information may be used in equation (10) to obtain the variance of the Sato-Vartia index computed between any two periods.
Another extension of proposition 3 would be to allow for stochastic prices as well as stochastic taste parameters. This assumption is introduced in the next section under the condition that the prices and taste parameters are independent. But what if they are not, as in a supply-demand framework where shocks to the demand curve influence equilibrium prices? How is proposition 3 affected then? Using the mean squared error $s_\varepsilon^2$ of regression (12) in equation (15) to compute the variance of the taste disturbances, or in equation (16) to compute the variance of the index, will give a lower-bound estimate. The reason is that running WLS on regression (12) will result in a downward-biased estimate of the variance of tastes: $E[s_\varepsilon^2 / 2(1 - \bar{w} - \bar{\bar{w}})] < \sigma_\beta^2$ if taste shocks affect prices because the presence of taste information in prices artificially inflates the explanatory power of the regression.
16.5 Variance of the Exact Index for the CES Function with Stochastic Prices
Price indexes are often constructed using sample averages of individual price quotes to represent the price of goods or services in the index basket.9 In these cases, different rates of change of the price quotes for a good or service imply the existence of a sampling variance. That is, for any commodity $i$, $\Delta \ln p_{it}$ will have a variance of $\sigma_i^2$, which can be estimated by $s_i^2$, the sample variance of the rates of change of the various quotes for commodity $i$. The variances of the lower-level price aggregates are another source of variance in the index besides the variances of the weights considered in proposition 3.10
In the special case where the commodities in the index are homogeneous enough to justify the assumption of a common trend for their prices and the log changes in commodity prices all have the same variance, an unbiased estimator for this variance is $s_p^2/(1 - \bar{w})$, where $s_p^2$ is defined in proposition 2.11 This special case yields results that are easily compared with the results from the conventional stochastic approach.
In the more general case, no restrictions are placed on the commodity-specific price trends, but we do assume that the price disturbances are independent of the weight disturbances.
Although allowing $\Delta \ln p_{it}$ to have a nonzero covariance with the $w_i$ would be appealing if positive shocks to $b_{it}$ are thought to raise equilibrium market prices, this would make the expression for $\operatorname{var} \pi_{sv}$ quite complicated. With the independence assumption, we obtain proposition 4.

9. Even if every price quoted for a good is included in the sample, we can still adopt an infinite population perspective and view these prices as realizations from a data-generating process that is the object of our investigations.

10. Estimates of the variance of the Consumer Price Index (CPI) produced by the Bureau of Labor Statistics have long included the effects of sampling error in the price measures used as lower-level aggregates in constructing the CPI. Now they also include the effects of the variances of the weights used to combine these lower-level aggregates, which reflect sampling error in expenditure estimates. See Bureau of Labor Statistics (1997, 196).

11. The denominator is a degrees of freedom correction derived as follows. Assume for simplicity that $E[\Delta \ln p_{it} - E(\Delta \ln p_{it})]^2 = 1$. Then $E(\Delta \ln p_{it} - \pi_t)^2 = E(\Delta \ln p_{it} - \sum_j w_j \Delta \ln p_{jt})^2 = E[(1 - w_i)\Delta \ln p_{it} - \sum_{j \neq i} w_j \Delta \ln p_{jt}]^2 = (1 - w_i)^2 + \sum_{j \neq i} w_j^2 = 1 - 2w_i + \sum_j w_j^2$. The weighted average of these terms, $\sum_i w_i [1 - 2w_i + \sum_j w_j^2]$, is $1 - \sum_i w_i^2 = 1 - \bar{w}$.

Proposition 4: Let prices and weights $w_i$ in equation (8) have independent distributions, and let $s_i^2$ be an estimate of the variance of $\Delta \ln p_{it}$, $i = 1, \ldots, N$. Then the variance of the Sato-Vartia price index can be approximated by

$$\operatorname{var} \pi_{sv} \approx \sum_{i=1}^{N} w_i^2 s_i^2 + \frac{1}{2} s_\beta^2 s_p^2 \bar{w} + \frac{1}{2} s_\beta^2 \sum_{i=1}^{N} s_i^2 w_i^2 (1 - w_i)^2, \tag{17}$$

where $s_\beta^2$ is estimated from the mean squared error of the regression (12) as in equation (15). In the special case when every price variance may be estimated by $s_p^2/(1 - \bar{w})$, equation (17) becomes

$$\operatorname{var} \pi_{sv} \approx \frac{s_p^2 \bar{w}}{1 - \bar{w}} + \frac{1}{2} s_\beta^2 s_p^2 \bar{w} + \frac{1}{2} s_\beta^2 \frac{s_p^2}{1 - \bar{w}} \sum_{i=1}^{N} w_i^2 (1 - w_i)^2. \tag{17$'$}$$

This proposition shows that the approximation for the variance of the price index is the sum of three components: one that reflects the variance of prices, another that reflects the variance of preferences and holds prices constant, and a third that reflects the interaction of the price variance and the taste variance. The first term in equation (17) or (17′) is similar to the variance estimator in the conventional stochastic approach. If each $w_i$ equals $1/N$, the first term in equation (17′) becomes $s_p^2 \bar{w}/(1 - \bar{w}) = s_p^2/(N - 1)$, just as in equation (3). The second term in equation (17) or (17′) is the same as the term that we derived from stochastic tastes in proposition 2. The third term is analogous to the interaction term that appears in the expected value of a product of random variables (Mood, Graybill, and Boes 1974, 180, corollary to theorem 3). The presence of this term means that the interaction of random prices and tastes tends to raise the standard error of the index. If each $w_i$ equals $1/N$, then the second term in equation (17′) becomes $s_\varepsilon^2 s_p^2/4(N - 2)$, and the third term becomes $[s_\varepsilon^2 s_p^2/4(N - 2)](1 - 1/N)$.
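The three components in equation (17) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name, the equal-weight example, and the values chosen for `s2_p` and `s2_beta` are hypothetical and do not come from the chapter. It evaluates each term separately so that the equal-weight special case of the first and third terms can be checked against the closed forms quoted in the text.

```python
import math


def sato_vartia_variance_terms(w, s2_i, s2_p, s2_beta):
    """Return the three terms of equation (17) as a tuple.

    w        -- index weights w_i (assumed to sum to one)
    s2_i     -- estimated variances of the log price changes, one per good
    s2_p     -- weighted variance of price changes (proposition 2)
    s2_beta  -- estimated variance of the taste disturbances (equation 15)
    """
    w_bar = sum(wi ** 2 for wi in w)  # w-bar is the sum of squared weights
    price_term = sum(wi ** 2 * s2 for wi, s2 in zip(w, s2_i))
    taste_term = 0.5 * s2_beta * s2_p * w_bar
    interaction_term = 0.5 * s2_beta * sum(
        s2 * wi ** 2 * (1.0 - wi) ** 2 for wi, s2 in zip(w, s2_i)
    )
    return price_term, taste_term, interaction_term


# Hypothetical equal-weight illustration (all numbers are assumptions).
N = 5
w = [1.0 / N] * N
s2_p = 0.04
s2_beta = 0.5
w_bar = sum(wi ** 2 for wi in w)
s2_i = [s2_p / (1.0 - w_bar)] * N  # special case behind equation (17')

price_term, taste_term, interaction_term = sato_vartia_variance_terms(
    w, s2_i, s2_p, s2_beta
)
std_error = math.sqrt(price_term + taste_term + interaction_term)
```

With equal weights the price term collapses to `s2_p / (N - 1)`, which is the conventional stochastic-approach result, while the other two terms scale with the taste variance.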
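16.6 Translog Function

We next consider a translog unit-cost or expenditure function, which is given by

$$\ln c(p_t, \alpha_t) = \alpha_0 + \sum_{i=1}^{N} \alpha_{it} \ln p_{it} + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \gamma_{ij} \ln p_{it} \ln p_{jt}, \tag{18}$$

where we assume without loss of generality that $\gamma_{ij} = \gamma_{ji}$. In order for this function to be linearly homogeneous in prices, we must have $\sum_{i=1}^{N} \alpha_{it} = 1$ and $\sum_{i=1}^{N} \gamma_{ij} = 0$. The corresponding share equations are

$$s_{it} = \alpha_{it} + \sum_{j=1}^{N} \gamma_{ij} \ln p_{jt}, \qquad i = 1, \ldots, N. \tag{19}$$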
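We will treat the taste or technology parameters $\alpha_{it}$ as random variables but assume that the $\gamma_{ij}$ are fixed. Suppose that $\alpha_{it} = \alpha_i + \varepsilon_{it}$, where the constant coefficients $\alpha_i$ satisfy $\sum_{i=1}^{N} \alpha_i = 1$, while the random errors $\varepsilon_{it}$ satisfy $\sum_{i=1}^{N} \varepsilon_{it} = 0$. Using this specification, the share equations are

$$s_{it} = \alpha_i + \sum_{j=1}^{N} \gamma_{ij} \ln p_{jt} + \varepsilon_{it}, \qquad i = 1, \ldots, N. \tag{20}$$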
We assume that $\varepsilon_{it}$ is identically distributed with $E(\varepsilon_{it}) = 0$ for each equation $i$, though it will be correlated across equations (because the errors sum to zero) and may also be correlated over time. Because the errors sum to zero, the autocorrelation must be identical across equations. We will denote the covariance matrix of the errors by $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \Omega$, and their autocorrelation is then $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_{t-1}') = \rho\Omega$.

With this stochastic specification of preferences, the question again arises as to what a price index should measure. In the economic approach, with $\alpha_{it}$ constant over time, the ratio of unit-costs is measured by a Törnqvist (1936) price index. The following result shows how this generalizes to the case where $\alpha_{it}$ changes:

Proposition 5: Defining $\bar{\alpha}_i \equiv (\alpha_{it-1} + \alpha_{it})/2$ and $w_{it} \equiv (s_{it-1} + s_{it})/2$, where the shares $s_{i\tau}$ are given by equation (19) for the parameters $\alpha_{i\tau}$, $\tau = t - 1, t$, then

$$\prod_{i=1}^{N} \left( \frac{p_{it}}{p_{it-1}} \right)^{w_{it}} = \frac{c(p_t, \bar{\alpha})}{c(p_{t-1}, \bar{\alpha})}. \tag{21}$$
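Proposition 5 can be checked numerically. The following sketch is an illustration only: the three-good parameter values, prices, and helper names are hypothetical and chosen so that the matrix of second-order coefficients is symmetric with rows summing to zero. It compares the log Törnqvist index with the log ratio of translog unit costs evaluated at the averaged first-order parameters.

```python
def translog_log_cost(log_p, alpha, gamma, alpha0=0.0):
    """Log of the translog unit-cost function in equation (18)."""
    n = len(log_p)
    first_order = sum(alpha[i] * log_p[i] for i in range(n))
    second_order = 0.5 * sum(
        gamma[i][j] * log_p[i] * log_p[j] for i in range(n) for j in range(n)
    )
    return alpha0 + first_order + second_order


def translog_shares(log_p, alpha, gamma):
    """Share equations (19): s_i = alpha_i + sum_j gamma_ij * ln p_j."""
    n = len(log_p)
    return [
        alpha[i] + sum(gamma[i][j] * log_p[j] for j in range(n)) for i in range(n)
    ]


# Hypothetical parameters: gamma is symmetric and each row sums to zero.
gamma = [
    [0.10, -0.06, -0.04],
    [-0.06, 0.09, -0.03],
    [-0.04, -0.03, 0.07],
]
alpha_prev = [0.30, 0.45, 0.25]  # first-order parameters in period t-1
alpha_curr = [0.33, 0.40, 0.27]  # first-order parameters in period t
log_p_prev = [0.00, 0.10, -0.05]
log_p_curr = [0.08, 0.12, 0.02]

s_prev = translog_shares(log_p_prev, alpha_prev, gamma)
s_curr = translog_shares(log_p_curr, alpha_curr, gamma)
w = [(a + b) / 2.0 for a, b in zip(s_prev, s_curr)]

# Left side of equation (21) in logs: the Törnqvist index.
log_tornqvist = sum(
    wi * (pc - pp) for wi, pc, pp in zip(w, log_p_curr, log_p_prev)
)

# Right side of equation (21) in logs: unit-cost ratio at averaged parameters.
alpha_bar = [(a + b) / 2.0 for a, b in zip(alpha_prev, alpha_curr)]
log_cost_ratio = translog_log_cost(log_p_curr, alpha_bar, gamma) - translog_log_cost(
    log_p_prev, alpha_bar, gamma
)
```

The two quantities agree up to floating-point rounding even though the first-order parameters differ between the two periods, which is the content of the proposition.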
The expression on the left of equation (21) is the Törnqvist price index, which measures the ratio of unit-costs evaluated at an average value of the taste parameters $\alpha_{it}$. This result is suggested by Caves, Christensen, and Diewert (1982) and shows that the Törnqvist index is still meaningful when the first-order parameters $\alpha_{it}$ of the translog function are changing over time.

The variance of the Törnqvist index can be computed from the right side of equation (21) expressed in logs. Substituting from equation (18), we find that the coefficients $\bar{\alpha}_i$ multiply the log prices. Hence, conditional on prices, the variance of the log change in unit-costs will depend on the variance of $\bar{\alpha}_i = (\alpha_{it-1} + \alpha_{it})/2 = \alpha_i + (\varepsilon_{it-1} + \varepsilon_{it})/2$. The covariance matrix of these taste parameters is $E(\bar{\alpha} - \alpha)(\bar{\alpha} - \alpha)' = (1 + \rho)\Omega/2$. This leads to the next result:

Proposition 6: Let the parameters $\alpha_{it}$ be distributed as $\alpha_{it} = \alpha_i + \varepsilon_{it}$, with $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \Omega$ and $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_{t-1}') = \rho\Omega$, and denote the log Törnqvist index by $\pi_t \equiv \sum_{i=1}^{N} w_{it} \Delta \ln p_{it}$. Then, conditional on prices, the variance of $\pi_t$ is

$$\operatorname{var} \pi_t = \frac{1}{2}(1 + \rho)\, \Delta \ln p_t' \,\Omega\, \Delta \ln p_t. \tag{22}$$
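Because the errors of the share equations in equation (20) sum to zero, the covariance matrix is singular, with $\Omega\iota = 0$, where $\iota$ is an $(N \times 1)$ vector of ones. Thus, the variance of $\pi_t$ is equal to

$$\operatorname{var} \pi_t = \frac{1}{2}(1 + \rho)(\Delta \ln p_t - \pi_t \iota)'\, \Omega\, (\Delta \ln p_t - \pi_t \iota). \tag{23}$$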
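The quadratic form in equations (22) and (23) is straightforward to evaluate. The sketch below uses a hypothetical three-good covariance matrix whose rows sum to zero (so that it is singular, as the share errors require) and hypothetical price changes; none of these numbers come from the chapter. It illustrates that subtracting any common growth rate from the price changes leaves the variance unchanged, and that the variance vanishes when all prices grow at the same rate.

```python
import math


def tornqvist_taste_variance(dlnp, omega, rho):
    """Equation (22): 0.5 * (1 + rho) * dlnp' * Omega * dlnp."""
    n = len(dlnp)
    quad_form = sum(
        dlnp[i] * omega[i][j] * dlnp[j] for i in range(n) for j in range(n)
    )
    return 0.5 * (1.0 + rho) * quad_form


# Hypothetical singular covariance matrix: every row sums to zero.
omega = [
    [2e-4, -1e-4, -1e-4],
    [-1e-4, 2e-4, -1e-4],
    [-1e-4, -1e-4, 2e-4],
]
rho = 0.1
dlnp = [0.05, 0.02, -0.01]

var_raw = tornqvist_taste_variance(dlnp, omega, rho)

# Equation (23): removing a common growth rate does not change the variance.
common = 0.03
var_shifted = tornqvist_taste_variance([x - common for x in dlnp], omega, rho)

# With a common growth rate for all prices the variance is zero.
var_common = tornqvist_taste_variance([0.04, 0.04, 0.04], omega, rho)

std_error = math.sqrt(var_raw)
```

Because the rows of the covariance matrix sum to zero, only relative price movements contribute to the variance of the index.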
The variance of the Törnqvist index will approach zero as the prices approach a common growth rate, and this property also holds for the variance of the Sato-Vartia index in proposition 2 and the conventional stochastic approach in equation (3). But unlike the stochastic approach in equation (3), the variance of the Törnqvist index will depend on the fit of the share equations. Our formula for the variance of the Törnqvist index is more general than the one that we obtained for the Sato-Vartia index because proposition 6 does not assume that the taste disturbances are all independent and identically distributed. The fit of the share equations will depend on how many time periods we pool over, and this brings us to the heart of the distinction between the stochastic and economic approaches. Suppose we estimated equation (20) over just two periods, t – 1 and t. Then it is readily verified that there are enough free parameters i and ij to obtain a perfect fit to the share equations. In other words, the translog system is flexible enough to give a perfect fit for the share equation (20) at two points. As we noted in section 16.3, such flexibility is a virtue in implementing the economic approach to index numbers: indeed, Diewert (1976) defines an index to be superlative if it is exact for an aggregator function that is flexible.12 But from an econometric point of view, we have zero degrees of freedom when estimating the share equations over two periods, so that the covariance matrix cannot be estimated. How are we to resolve this apparent conflict between the economic and stochastic approaches? We believe that a faithful application of the economic approach requires that we pool observations over all available time periods when estimating equation (20). 
In the economic approach, Caves, Christensen, and Diewert (1982) allow the first-order parameters $\alpha_{it}$ of the translog unit-cost function to vary over time (as we do in the preceding) but strictly maintain the assumption that the second-order parameters $\gamma_{ij}$ are constant (as we also assume). Suppose the researcher has data over three (or more) periods. If the share equations are estimated over periods one and two, and then again over periods two and three, this would clearly violate the assumption that the $\gamma_{ij}$ are constant. Because this is an essential assumption of the economic approach, there is every reason to use it in our integrated approach. The way to maintain the constancy of the $\gamma_{ij}$ is to pool over multiple periods, which allows the covariance matrix $\Omega$ to be estimated. Pooling over multiple periods is also recommended for the CES share equations in equation (12), to satisfy the maintained assumption that $\sigma$ is constant, even though in the CES case we do not obtain a perfect fit if equation (12) is estimated over a single cross section (provided that $N > 2$).

12. Diewert (1976) defines an aggregator function to be flexible if it provides a second-order approximation to an arbitrary function at one point, that is, if the parameters can be chosen such that the value of the aggregator function, and its first and second derivatives, equal those of an arbitrary function at one point. We are using a slightly different definition of flexibility: the ratio of the aggregator function and the value of its first and second derivatives equal those of an arbitrary function at two points.

Once we pool the share equations over multiple periods, it makes sense to consider more general specifications of the random parameters $\alpha_{it}$. In particular, we can use $\alpha_{it} = \alpha_i + \beta_i t + \varepsilon_{it}$, where the coefficients $\beta_i$ on the time trend satisfy $\sum_{i=1}^{N} \beta_i = 0$. Then the share equations become

$$s_{it} = \alpha_i + \beta_i t + \sum_{j=1}^{N} \gamma_{ij} \ln p_{jt} + \varepsilon_{it}, \qquad i = 1, \ldots, N;\ t = 1, \ldots, T. \tag{24}$$
We make the same assumptions as before on the errors $\varepsilon_{it}$. Proposition 5 continues to hold as stated, but now the $\bar{\alpha}_i$ are calculated as $\bar{\alpha}_i = (\alpha_{it-1} + \alpha_{it})/2 = \alpha_i + \beta_i (t - 1/2) + (\varepsilon_{it-1} + \varepsilon_{it})/2$. The variance of these is identical to that calculated above, so proposition 6 continues to hold as well. Thus, including time trends in the share equations does not affect the variance of the Törnqvist index.

16.7 Translog Case with Stochastic Prices

As in the CES case, we would like to extend the formula for the standard error of the price index to include randomness in prices as well as taste parameters. For each commodity $i$, we suppose that $\Delta \ln p_{it}$ is random with variance $\sigma_i^2$, which can be estimated by $s_i^2$, the sample variance of the rates of change of the various quotes for item $i$. These lower-level sampling errors are assumed to be independent across commodities and are also independent of the error $\varepsilon_{it}$ in the taste or technology parameters, $\alpha_{it} = \alpha_i + \varepsilon_{it}$. Then the standard error in proposition 6 is extended as proposition 7.

Proposition 7: Let $\Delta \ln p_{it}$, $i = 1, \ldots, N$, be independently distributed with mean 0 and variances estimated by $s_i^2$, and also independent of the parameters $\alpha_{it} = \alpha_i + \varepsilon_{it}$. Using the weights $w_{it} = (s_{it-1} + s_{it})/2$, the variance of the log Törnqvist index is approximated by

$$\begin{aligned}
\operatorname{var}(\pi_t) \approx{} & \sum_i w_{it}^2 s_i^2 + \frac{1}{4} \sum_i \sum_j (\Delta \ln p_{it})(\Delta \ln p_{jt}) \sum_k \hat{\gamma}_{ik} \hat{\gamma}_{jk} s_k^2 \\
& + \frac{1}{2}(1 + \hat{\rho})(\Delta \ln p_t)' \hat{\Omega} (\Delta \ln p_t) + \frac{1}{4} \sum_i s_i^2 \sum_j \hat{\gamma}_{ij}^2 s_j^2 \\
& + \frac{1}{2}(1 + \hat{\rho}) \sum_i \hat{\Omega}_{ii} s_i^2.
\end{aligned} \tag{25}$$
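In the case where all prices have the same trend and variance, we can estimate $\sigma_i^2$ by $s_p^2/(1 - \bar{w})$ for all $i$, and equation (25) becomes

$$\begin{aligned}
\operatorname{var}(\pi_t) \approx{} & \frac{s_p^2 \bar{w}}{1 - \bar{w}} + \frac{1}{4} \left[ \frac{s_p^2}{1 - \bar{w}} \right] \sum_i \sum_j (\Delta \ln p_{it})(\Delta \ln p_{jt}) \sum_k \hat{\gamma}_{ik} \hat{\gamma}_{jk} \\
& + \frac{1}{2}(1 + \hat{\rho})(\Delta \ln p_t)' \hat{\Omega} (\Delta \ln p_t) + \frac{1}{4} \left[ \frac{s_p^2}{1 - \bar{w}} \right]^2 \sum_i \sum_j \hat{\gamma}_{ij}^2 \\
& + \frac{1}{2} \left[ \frac{s_p^2}{1 - \bar{w}} \right] (1 + \hat{\rho}) \sum_i \hat{\Omega}_{ii}.
\end{aligned} \tag{25$'$}$$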
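The five terms of equation (25) can be evaluated one by one, which makes it easy to see how much each source of randomness contributes. The sketch below is a hedged illustration: the function name and the two-good inputs are hypothetical, and the estimated coefficient matrix, covariance matrix, and autocorrelation are simply assumed rather than estimated from data.

```python
def tornqvist_variance_terms(w, dlnp, s2, gamma_hat, omega_hat, rho_hat):
    """Return the five terms of equation (25) in order, as a list."""
    n = len(w)
    idx = range(n)
    # 1. Squared weights times the price variances.
    t1 = sum(w[i] ** 2 * s2[i] for i in idx)
    # 2. Effect of the price variance on the weights.
    t2 = 0.25 * sum(
        dlnp[i] * dlnp[j]
        * sum(gamma_hat[i][k] * gamma_hat[j][k] * s2[k] for k in idx)
        for i in idx for j in idx
    )
    # 3. Weight variance that comes from the taste shocks (proposition 6).
    t3 = 0.5 * (1.0 + rho_hat) * sum(
        dlnp[i] * omega_hat[i][j] * dlnp[j] for i in idx for j in idx
    )
    # 4. Price variance interacted with price-driven weight variance.
    t4 = 0.25 * sum(
        s2[i] * gamma_hat[i][j] ** 2 * s2[j] for i in idx for j in idx
    )
    # 5. Price variance interacted with taste-driven weight variance.
    t5 = 0.5 * (1.0 + rho_hat) * sum(omega_hat[i][i] * s2[i] for i in idx)
    return [t1, t2, t3, t4, t5]


# Hypothetical two-good inputs (assumptions for illustration only).
w = [0.6, 0.4]
dlnp = [0.10, -0.05]
s2 = [0.01, 0.02]
gamma_hat = [[0.1, -0.1], [-0.1, 0.1]]
omega_hat = [[1e-4, -1e-4], [-1e-4, 1e-4]]
rho_hat = 0.1

terms = tornqvist_variance_terms(w, dlnp, s2, gamma_hat, omega_hat, rho_hat)
total_variance = sum(terms)
```

Setting all price variances to zero leaves only the third term, which reproduces the taste-only variance of proposition 6.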
Thus, with stochastic prices the variance of the Törnqvist index includes five terms. The first term in equation (25) contains the product of the squared weights times the price variances, the second term reflects the effect of the price variance on the weights, the third term reflects the variance of the weights that comes from the taste shocks, the fourth term reflects the interaction of the price variance component and the component of the weight variance that comes from the price shocks, and the fifth term reflects the interaction between the price variance and the component of the weight variance that comes from the taste shocks. The expression for the variance of the Sato-Vartia index in equation (17) also includes the analogous expressions for the first, third, and fifth terms in equation (25). The terms in equation (25) reflecting the effect of the price shocks on the weight variance are new, however, and arise because the linearity of the Törnqvist index allows us to include them; they were omitted for the sake of simplicity in the CES case. These terms can be omitted from the estimator of the index variance if the model of consumer behavior is specified with expected prices, rather than realized prices, as explanatory variables.

16.8 Application to Productivity Growth in Singapore

We now consider an application of our results to productivity growth in Singapore. As discussed in the introduction, Hsieh (2002) has recently computed measures of TFP for several East Asian countries and obtains estimates higher than Young (1992, 1995) for Singapore. The question we shall address is whether Hsieh's estimates for productivity growth in Singapore are statistically different from those obtained by Young.

Hsieh (2002) uses three different measures of the rental rate on capital for Singapore. They are all motivated by the Hall and Jorgenson (1967) rental price formula, which Hsieh writes as

$$\frac{r_j}{p} = \frac{p_j^k}{p} (i - \hat{p}_k + \delta_j), \tag{26}$$

where $p_j^k$ is the nominal price of the $j$th type of capital, $p$ is the GDP deflator, $i$ is a nominal interest rate, $\hat{p}_k$ is the overall inflation rate for capital, and $\delta_j$ is the depreciation rate for the $j$th type of capital. For the real interest rate $(i - \hat{p}_k)$, Hsieh uses three different measures: (a) the average nominal lending rate of the commercial banks, less the overall inflation rate for capital $\hat{p}_k$; (b) the earnings-price ratio of firms on the stock market of Singapore; and (c) the return on equity from firm-level records in the Singapore Registry of Companies. These are all plotted in figure 16.1 (reproduced from Hsieh 2002, figure 2), where it can be seen that the three rates are substantially different.

To compute the real rental price, capital depreciation is added to all three series in figure 16.1, after which the calculation in equation (26) is made using the investment price deflators for five kinds of capital for $p_j^k$. Hsieh (2002) weights these five types of capital by their share in payments to obtain an overall rental rate corresponding to each interest rate. We will denote these by $r_t^k$, $k = 1, 2, 3$, depending on the three interest rates used. The plot of the real rental prices (not shown) looks qualitatively similar to figure 16.1. In figure 16.2, we show the percent change in the rental prices (computed as the change in the log of equation [26], times 100), where it is evident that the dip in the commercial bank lending rate in 1974 has a dramatic effect on that rental price. Hsieh (2002, 509) regresses the average growth of rentals on a constant and time trend, and the coefficient of the time trend over each sample, representing the average annual growth of each rental price, is reported in part A of table 16.1.
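Equation (26) and the percent-change convention used for figure 16.2 can be sketched in a few lines. The function names and all input values below are hypothetical and serve only to show the arithmetic; they are not Singapore data.

```python
import math


def real_rental_price(p_capital, p_gdp, nominal_rate, capital_inflation, depreciation):
    """Equation (26): r_j / p = (p_j^k / p) * (i - p_hat_k + delta_j)."""
    return (p_capital / p_gdp) * (nominal_rate - capital_inflation + depreciation)


def percent_log_change(current, previous):
    """Change in the log of a series, times 100 (the figure 16.2 convention)."""
    return 100.0 * (math.log(current) - math.log(previous))


# Hypothetical inputs: capital price 1.2, GDP deflator 1.0, nominal rate 8%,
# capital price inflation 3%, depreciation 10%.
rental = real_rental_price(1.2, 1.0, 0.08, 0.03, 0.10)

# A drop in the real interest rate lowers the rental price sharply.
rental_low_rate = real_rental_price(1.2, 1.0, 0.04, 0.03, 0.10)
change = percent_log_change(rental_low_rate, rental)
```

Because the real interest rate enters additively with depreciation, a swing of a few percentage points in the interest rate produces a large log change in the rental price, which is the sensitivity visible in the bank lending rate series.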
Fig. 16.1 Real interest rates in Singapore (percentage)

Fig. 16.2 Change in the rental prices (percentage)

Table 16.1 Dual total factor productivity (TFP) growth in Singapore

| Real interest rate used | Labor share | Real rental | Real wages | Dual TFP | Primal TFP(a) |
| A. Revised from Hsieh (2002); annual growth rate (%) | | | | | |
| Return on equity (1971–90) | 0.511 | –0.20 | 3.64 | 1.76 | –0.69 |
| Bank lending rate (1968–90) | 0.511 | 1.64 | 2.86 | 2.26 | –0.22 |
| Earnings-price ratio (1973–90) | 0.511 | –0.50 | 4.44 | 2.02 | –0.66 |
| B. Computed with annual data, Törnqvist index; annual growth rate (%) | | | | | |
| Return on equity (1973–92) | 0.418 | –0.85 | 4.33 | 1.24 | |
| Bank lending rate (1973–92) | 0.418 | 2.50 | 4.33 | 3.35 | |
| Earnings-price ratio (1973–92) | 0.418 | 1.62 | 4.33 | 2.85 | |
| Average rental price (1973–92) | 0.418 | 1.09 | 4.33 | 2.48 | |
| Average SD (1973–92) | | (17.4) | | (10.5) | |
| Average rental price (1975–92) | 0.424 | –0.58 | 4.02 | 1.37 | |
| Average SD (1975–92) | | (8.7) | | (5.0) | |
| C. Computed with 15-year changes, Törnqvist index; fifteen-year growth (%) | | | | | |
| Average rental price (end years 1990–92) | 0.422 | –6.82 | 61.0 | 21.8 | |
| Average SD due to error in rentals | | (6.6) | | (3.8) | |
| Average SD due to translog error | | | | (0.6) | |
| SD due to interaction between errors | | | | (0.1) | |
| Total SD(b) | | | | (3.9) | |

(a) Calculated by Hsieh from primal estimates in Young (1995), which depend on the sample period used.
(b) Computed as the square root of the sum of squared standard deviations listed in the preceding.
On the wage side, Hsieh distinguishes eight types of workers, by gender and four educational levels. He uses benchmark estimates for wages and employment in 1966, 1972, 1980, and 1990 and annual data on income and employment from labor market surveys beginning in 1973 to calculate the annual growth rates of wages. The average annual growth of wages over various time periods is shown in part A of table 16.1.¹³ The labor share of 0.511 shown in part A is taken from Young (1995) and is held constant. Then dual TFP growth is computed as the weighted average of the annual growth in the wage and rental price of capital, using the constant labor share as the weight on labor. This results in dual TFP growth ranging from 1.76 percent to 2.26 percent per year, as shown in the second-to-last column of table 16.1. These estimates are comparable to the estimates for other Asian countries, but they contrast with the negative estimates of primal TFP of –0.22 to –0.69 percent per year for Singapore, from Young (1995), as shown in the final column.

The question we wish to address is whether Hsieh's (2002) estimates in the second-to-last column of part A are significantly different from Young's estimates in the last column. Hsieh (table A2, 523) computes confidence intervals on the average growth of each of the real rental prices in part A using the standard errors from the coefficient on each time trend. For two of the three alternative measures, the 95 percent confidence interval includes a decline of nearly 1 percent per year. Hsieh uses the bounds of these confidence intervals for rental price growth to calculate confidence intervals for TFP growth. The confidence intervals for TFP growth all lie above 1 percent per year, so according to his calculations, TFP growth is significantly greater than zero.

We would argue that this procedure fails to convey the true uncertainty associated with the TFP estimates, for two reasons. First, and most important, we should treat each interest rate (and the associated rental prices on capital) as an independent observation on the “true” rate and pool across these to compute the standard error of the rentals. Second, we should distinguish this standard error in any one year from that over the entire sample period. Hsieh's (2002) procedure is to compute average TFP over the entire sample, along with its standard error, but this does not tell us whether TFP growth in any one year (or shorter period) is significantly positive. We now proceed to address both these points.

13. These results are somewhat higher than reported in Hsieh (2002, 509) because we have corrected a slight inconsistency in his calculation.

16.8.1 Error in Annual TFP

Before we can construct our estimate of the standard error of dual TFP, we first need to remeasure productivity using annual data on labor shares, wages, and the rental price. These results are shown in part B of table 16.1.
Annual data for wages and labor shares are available beginning in 1973, and the annual data for all three rentals continue until 1992, so that becomes our sample period. We first aggregate the eight types of labor using a Törnqvist price index and then compute dual TFP growth using a Törnqvist index over the real wage index and real rental price of capital:

$$TFP_t^k \equiv \frac{1}{2}(s_{Lt} + s_{Lt-1})\, \Delta \ln(w_t/p_t) + \frac{1}{2}(s_{Kt} + s_{Kt-1})\, \Delta \ln(r_t^k/p_t), \tag{27}$$
where $s_{Lt}$ is the labor share in period $t$, $s_{Kt}$ is the capital share with $s_{Lt} + s_{Kt} = 1$, $w_t$ is the wage index, and $r_t^k$ is the rental price on capital. The labor shares are computed from Economic and Social Statistics, Singapore, 1960–1982 and from later issues of the Yearbook of Statistics, Singapore. These shares range from 0.36 to 0.47 over 1973 to 1992 and average 0.418, which is less than the labor share shown in part A of table 16.1 and used by Young (1995) and Hsieh (2002). In addition to the average labor share, we report in part B the average growth rates of the rental prices and wage index, as well as the computed dual TFP.

The average rental price growth differs substantially between parts A and B. This reflects the use of different formulas: as noted in the preceding, Hsieh (2002) uses a regression-based method to compute the growth rate, whereas we use the average of the difference in logs of equation (26), times 100. Hsieh states that his method is less sensitive to the initial and end points of the sample period, whereas the average of the difference in log rentals certainly does depend on our sample period. It is evident from figure 16.2 that the rental price computed with the average bank lending rate falls by about 200 percent from 1973 to 1974 and then rises by about 300 percent from 1974 to 1975, and these values are the largest in the sample. If instead of using 1973–1992 as the sample period, we use 1975–1992, then the average growth in the rental price computed with the commercial bank lending rate falls from 2.5 percent per year to –1.4 percent per year! The growth rates of wages reported in parts A and B also differ slightly because of differences in sample periods and in formulas used.¹⁴ Dual TFP based on the Törnqvist index, reported in part B, shows higher growth for two of the rental price measures and lower growth for one measure than dual TFP based on average growth rates, reported in part A.
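The dual TFP calculation in equation (27) is a share-weighted sum of factor price growth. As a minimal sketch (the function name is hypothetical), the code below implements the formula and, holding the labor share fixed at 0.511 as in part A of table 16.1, reproduces the part A dual TFP figures from the reported real wage and real rental growth rates.

```python
def dual_tfp_growth(s_labor_t, s_labor_prev, dln_real_wage, dln_real_rental):
    """Equation (27): Törnqvist-weighted growth of real wages and real rentals."""
    s_capital_t = 1.0 - s_labor_t        # shares sum to one
    s_capital_prev = 1.0 - s_labor_prev
    labor_part = 0.5 * (s_labor_t + s_labor_prev) * dln_real_wage
    capital_part = 0.5 * (s_capital_t + s_capital_prev) * dln_real_rental
    return labor_part + capital_part


# Part A of table 16.1: constant labor share, growth rates in percent per year.
LABOR_SHARE = 0.511
part_a = {
    "Return on equity": (3.64, -0.20),
    "Bank lending rate": (2.86, 1.64),
    "Earnings-price ratio": (4.44, -0.50),
}
dual_tfp = {
    name: dual_tfp_growth(LABOR_SHARE, LABOR_SHARE, wage, rental)
    for name, (wage, rental) in part_a.items()
}
```

With a constant share the Törnqvist weights reduce to the fixed weights used in part A; with annual shares, as in part B, the weights change from year to year.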
Using the mean of the three alternative rental price estimates, the growth of dual TFP based on the Törnqvist index is 2.48 percent per year over 1973 to 1992. Yet this falls dramatically to 1.37 percent per year if 1973–1974 (when one of the rental prices moved erratically) is omitted.

Our goal is to compute the standard error of the Törnqvist index in equation (27), where this error arises from two sources: (a) error in measuring the rental prices of capital, based on the three alternative real interest rates used; (b) error because the annual data will not fit a translog cost function perfectly. Under the hypothesis that the homothetic translog cost function model describes the process generating the data, the Törnqvist price index exactly summarizes the change in the cost function. Thus, we assume that changes in expenditure shares represent responses to changes in wages and rental prices in accordance with the translog model, plus effects of random shocks to expenditures. Ceteris paribus, the greater the variance of the share changes that is unexplainable by the translog model, the greater the variance of the random shocks that affect the weights in the Törnqvist index.

14. As discussed in the preceding, we use a Törnqvist price index constructed over the eight types of labor, whereas Hsieh uses an averaging procedure.

Beginning with error (a), we first construct the mean rental price,

$$\ln \bar{r}_t \equiv \frac{1}{3} \sum_{k=1}^{3} \ln r_t^k,$$

and its change,

$$\ln(\bar{r}_t/\bar{r}_{t-1}) = \frac{1}{3} \sum_{k=1}^{3} \ln(r_t^k/r_{t-1}^k), \tag{28}$$

where $k = 1, 2, 3$ denotes the three rental prices. Then the sample variance of the change in mean rental price, denoted by $s_t^2$, is

$$s_t^2 = \frac{1}{6} \sum_{k=1}^{3} \left[ \ln(r_t^k/r_{t-1}^k) - \ln(\bar{r}_t/\bar{r}_{t-1}) \right]^2. \tag{29}$$

In figure 16.3, we plot mean TFP growth in each year,

$$\overline{TFP}_t \equiv \frac{1}{2}(s_{Lt} + s_{Lt-1})\, \Delta \ln(w_t/p_t) + \frac{1}{2}(s_{Kt} + s_{Kt-1})\, \Delta \ln(\bar{r}_t/p_t), \tag{30}$$

and the 95 percent confidence interval (with 2 degrees of freedom) constructed as $\overline{TFP}_t \pm (2.9/2)(s_{Kt} + s_{Kt-1})\, s_t$. We can see that the confidence interval on mean TFP growth over 1973 to 1992 is extremely wide, but this is not surprising given the erratic data on rentals shown in figure 16.2. In table 16.1, we report in parentheses the average standard deviation of the change in rental prices, and the average standard error of mean TFP growth, over the 1973 to 1992 period. Consistent with figures 16.2 and 16.3, both of these are extremely large. Furthermore, even when we restrict attention to the shorter period of 1975–1992 shown in figure 16.4, the confidence interval of mean TFP growth still includes zero in every year. This can also be seen from table 16.1, where we report the average standard deviations of the change in rental prices and mean TFP growth over 1975 to 1992. The average value of mean TFP growth is 1.4 percent, but it has an average standard error of 5 percent. Accordingly, in every year we cannot reject the hypothesis that mean TFP growth is zero or negative. Thus, on an annual basis, we would be hard pressed to conclude that the positive productivity estimates of Hsieh are significantly different from the negative estimates of Young (1995).

Fig. 16.3 Annual TFP growth, 1974–1992 (percentage)

Fig. 16.4 Annual TFP growth, 1976–1992 (percentage)

16.8.2 Error in Cumulative TFP

Nevertheless, our interest is not in the hypothesis that TFP growth in each year is positive but, rather, that cumulative TFP growth is positive. An erratic movement in a rental price one year might very well be reversed the next year, resulting in a negative autocorrelation that reduces the variance of long-run rental growth. To assess the implications of this, we instead consider longer differences in TFP, such as fifteen-year growth,

$$\overline{TFP}_{t,15} \equiv \frac{1}{2}(s_{Lt} + s_{Lt-15})\left[ \ln(w_t/p_t) - \ln(w_{t-15}/p_{t-15}) \right] + \frac{1}{2}(s_{Kt} + s_{Kt-15})\left[ \ln(\bar{r}_t/p_t) - \ln(\bar{r}_{t-15}/p_{t-15}) \right]. \tag{31}$$
Fig. 16.5 Fifteen-year TFP growth (percentage)
The standard error of this can be measured using the variance of measurement error in the long difference of rental prices,

$$s_{t,15}^2 = \frac{1}{6} \sum_{k=1}^{3} \left[ \ln(r_t^k/r_{t-15}^k) - \ln(\bar{r}_t/\bar{r}_{t-15}) \right]^2. \tag{32}$$
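The variance formulas in equations (29) and (32) are the usual sampling variance of a mean of three observations, and the confidence interval scales the resulting standard deviation by the average capital share. The sketch below is illustrative: the function names are hypothetical, the three log changes in the first example are made-up numbers, and the second example plugs in the part C averages from table 16.1 (a labor share of 0.422, hence an assumed capital share of 0.578, and a rental standard deviation of 6.6 percent).

```python
import math


def mean_rental_variance(dln_rentals):
    """Sampling variance of the mean log rental change (equations 29 and 32).

    For three rental series the divisor k * (k - 1) equals 6.
    """
    k = len(dln_rentals)
    mean_change = sum(dln_rentals) / k
    return sum((x - mean_change) ** 2 for x in dln_rentals) / (k * (k - 1))


def tfp_confidence_interval(tfp, s_capital_t, s_capital_prev, s2_rental, crit=2.9):
    """Interval tfp +/- crit * 0.5 * (s_Kt + s_Kprev) * s, with crit as in the text."""
    half_width = crit * 0.5 * (s_capital_t + s_capital_prev) * math.sqrt(s2_rental)
    return tfp - half_width, tfp + half_width


# Made-up log changes for three rental series (illustration only).
example_variance = mean_rental_variance([0.0, 0.3, -0.3])

# Part C averages from table 16.1, in percent.
capital_share = 1.0 - 0.422
lower, upper = tfp_confidence_interval(21.8, capital_share, capital_share, 6.6 ** 2)
tfp_std_error = 0.5 * (capital_share + capital_share) * 6.6  # about 3.8 percent
```

Under these inputs the standard error of fifteen-year TFP growth attributable to the rentals is about 3.8 percent, matching the figure in part C, and the interval lies well above zero.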
In figure 16.5, we plot the mean fifteen-year TFP growth ending in the years 1988–1992, along with the 95 percent confidence interval $\overline{TFP}_{t,15} \pm (2.9/2)(s_{Kt} + s_{Kt-15})\, s_{t,15}$. We have five observations for the fifteen-year cumulative TFP growth, and in four out of five cases the cumulative growth is significantly greater than zero. The only exception is 1989, where the erratic movement in the mean rental from 1974, and its large standard error, makes that observation on TFP growth insignificantly different from zero. In all other end years, the confidence intervals on cumulative TFP growth exclude zero. This can also be seen from part C of table 16.1, where we report the mean values of the growth in wages, mean rental, and mean TFP growth over the fifteen-year period, along with their standard deviations. Cumulative TFP growth of 21.8 percent (averaged over the end years 1990–1992) vastly exceeds its standard error of 3.8 percent. Notice that this standard deviation is actually smaller than the standard deviation of the annual change in TFP growth in part B, indicating some negative correlation in the measurement error of rental price changes.¹⁵ Accordingly, we cannot reject the hypothesis that fifteen-year mean TFP growth is positive, except in 1989.

15. The standard deviation of the change in rental prices first increases with the lag length and then falls. That is, let $s_{t,T}^2$ denote the variance of the change in rental prices as in equation (32), but with a lag length of $T$. For $T = 1, 2, 3, 5, 10, 15$, the standard deviation $s_{t,T}$ (averaged over end years 1990–1992) equals 5.0, 7.5, 8.7, 18.8, 8.8, 6.6 percent.

16.8.3 Error from Fitting the Translog Function

We still need to check the second source of error in the index, which arises because a translog cost function does not fit the data perfectly. We proceed by estimating the share equations for the translog cost function, using the mean rental price $\bar{r}_t$ and the Törnqvist wage index $w_t$.¹⁶ Dropping one share equation (because shares sum to unity), we are left with estimating the capital share equation:

$$s_{Kt} = \alpha_K + \gamma_{KK} \ln(\bar{r}_t/p_t) + \gamma_{KL} \ln(w_t/p_t) + \varepsilon_{Kt}, \tag{33}$$

where $s_{Kt}$ is the capital share. We allow for first-order autocorrelation $\rho$ in the error $\varepsilon_{Kt}$ when estimating this equation. So, losing one observation to allow for estimation of $\rho$, the sample period becomes $t = 1974, \ldots, 1992$, or $t = 1976, \ldots, 1992$ when we exclude the erratic change in rental prices. Results for both periods are shown in table 16.2.

Table 16.2 Translog estimation

| Sample | Constant | ln(real rental) | ln(real wage) | ρ | SE of regression | R², N |
| 1974–1992 | 0.63 (0.02) | –0.020 (0.011) | –0.14 (0.022) | 0.50 (0.23) | 0.010 | 0.92, 19 |
| Constrained | 0.50 (0.08) | –0.015 (0.014) | 0.015 (0.014) | 0.91 (0.10) | 0.015 | 0.81, 19 |
| 1976–1992 | 0.65 (0.02) | 0.005 (0.048) | –0.14 (0.027) | 0.44 (0.23) | 0.011 | 0.89, 17 |
| Constrained | 0.72 (0.02) | 0.10 (0.011) | –0.10 (0.011) | 0.10 (0.27) | 0.012 | 0.87, 17 |

Notes: Dependent variable = capital share. The constraint used is that the coefficients of the log real rental and real wage should be equal but opposite in sign. This constraint is rejected over the 1974–1992 period, but not rejected over 1976–92. SE = standard error.

Over the 1974 to 1992 period, we obtain significant estimates for both $\gamma_{KK}$ and $\gamma_{KL}$ in the first regression (row) in table 16.2, but with these estimates we strongly reject the homogeneity restriction that $\gamma_{KK} + \gamma_{KL} = 0$. If we go ahead and impose this constraint, then the results, shown in the second regression of table 16.2, are quite poor: $\gamma_{KK} = -\gamma_{KL}$ is insignificant, and most of the explanatory power comes from the autocorrelation $\rho = 0.91$. This is likely caused by the erratic movement in the mean rental price over 1973 to 1974, so instead we consider estimation over 1976 to 1992. In that case, unconstrained estimation in the third regression leads to estimates of $\gamma_{KK}$ and $\gamma_{KL}$ that are opposite in sign, and the homogeneity restriction $\gamma_{KK} = -\gamma_{KL}$ is borderline between being accepted and rejected at the 95 percent level. Using this restriction, we obtain the estimates in the final regression, with $\gamma_{KK} = 0.10$, $\rho = 0.10$, and a standard error of the regression equal to 0.012. We shall use these estimates in the calculations that follow.

16. The second error can be assessed by either fitting a translog unit-cost function to the data on rental prices and the Törnqvist wage index (two factors) or to the data on the rental prices and wages for each type of labor (nine factors of production). For convenience, we have used just two factors.
To construct the standard deviation of fifteen-year TFP growth due to the translog error, we rewrite the term on the right of equation (22) as

$$\frac{1}{2}(1 + \rho^{15}) \left[ \ln(w_t/w_{t-15}) - \ln(\bar{r}_t/\bar{r}_{t-15}) \right]^2 \sigma_{KK}^2, \tag{34}$$

where $\sigma_{KK}$ is the standard error of the capital-share regression. We obtain equation (34) from equation (22) by using the simple structure of the covariance matrix

$$\Omega = \begin{pmatrix} \sigma_{KK}^2 & -\sigma_{KK}^2 \\ -\sigma_{KK}^2 & \sigma_{KK}^2 \end{pmatrix},$$

which follows as the errors in the capital and labor share equations sum to zero. With autocorrelation of $\rho = 0.10$, the term $\rho^{15}$ is negligible. So taking the square root of (34), the standard error of fifteen-year TFP growth becomes $(1/\sqrt{2})\,|\ln(w_t/w_{t-15}) - \ln(\bar{r}_t/\bar{r}_{t-15})|\,\sigma_{KK}$. The fifteen-year rise in the wage-rental ratio is quite large: 68 percent from the values in part C of table 16.1. But then multiplying by the standard error of the capital-share equation, which is 0.012, and dividing by $\sqrt{2}$, we obtain the small standard error of 0.6 percent shown in parentheses in part C. This is about one-sixth the size of the standard error due to measurement error in the rentals, so the imprecision in fitting the translog function does not add very much to the standard error of the productivity index in this case.

Next, we need to check the various interaction terms between the measurement error in the rentals and the error in fitting the translog function; these are the second, fourth, and fifth terms on the right of equation (25). Computing these for the fifteen-year changes in factor prices and using the estimated coefficient $\gamma_{KK} = 0.10$, we obtain an additional standard error of 0.1 percent, also shown in parentheses in part C. Summing the squares of these various sources of error in the TFP index and taking the square root, we obtain the total standard deviation of the fifteen-year TFP growth of 3.9 percent. The 95 percent confidence interval for fifteen-year growth (averaged over 1990 to 1992) is then (10.6 percent, 33.1 percent), which easily excludes zero. Even after taking into account the errors in computing the dual Törnqvist index, the conclusion is still that cumulative productivity growth in Singapore has indeed been significantly greater than zero.
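The arithmetic behind the translog error can be retraced from the rounded figures reported in part C of table 16.1 and table 16.2. The sketch below is a check under those rounded inputs, with a hypothetical function name; because the published components are rounded to one decimal, the combined figure computed here (about 3.85) differs slightly from the 3.9 percent reported in the table.

```python
import math


def translog_std_error(dln_wage, dln_rental, sigma_kk, rho, lag):
    """Square root of equation (34) for a two-factor translog system."""
    variance = 0.5 * (1.0 + rho ** lag) * (dln_wage - dln_rental) ** 2 * sigma_kk ** 2
    return math.sqrt(variance)


# Rounded inputs from the text: fifteen-year log growth of real wages (0.610)
# and of the mean real rental (-0.0682), regression standard error 0.012,
# autocorrelation 0.10.
se_translog = translog_std_error(0.610, -0.0682, 0.012, 0.10, 15)
se_translog_percent = 100.0 * se_translog  # roughly 0.6 percent

# Combine the three rounded standard deviations reported in part C (percent).
components = [3.8, 0.6, 0.1]
total_sd = math.sqrt(sum(c ** 2 for c in components))
```

The rental measurement error dominates the total: dropping the translog and interaction components would change the combined standard deviation only in the second decimal.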
16.8.4 An Explanation for the Conflicting Results of Hsieh and Young

The finding of statistical significance for the difference between the estimate of cumulative productivity growth for Singapore from Hsieh (2002) and the negative estimate of Young (1992, 1995) means that an explanation is needed for this difference. Hsieh's use of a dual approach to measuring TFP is an obvious methodological difference from Young, but not one that would generally affect the results for any systematic reason. Rather, the most critical difference between the studies appears to be in the method for measuring the rate of return on capital. The internal return to capital in Young (1992, 1995) is an ex post return, as in Jorgenson and Griliches (1967): it is the rental computed by subtracting payments to labor from value added in the economy and then dividing by a capital stock. In contrast, some of the rates of return used by Hsieh (2002) are ex ante measures (particularly [a], based on the average nominal lending rate of the commercial banks).

This distinction allows us to apply a theorem due to Hulten (1986), showing how the difference between TFP measured using ex post and ex ante returns is influenced by capital utilization. Specifically, Hulten argues that TFP measured using an ex ante rate of return includes a “capacity utilization” effect because, with gradual adjustment, the capital stock is likely to differ from its long-run equilibrium value, at which average cost would be minimized. If the capital-output ratio is below its equilibrium value, the marginal revenue product of capital exceeds the ex ante return to capital. Under this circumstance, additions to the capital stock will earn quasi-rents, so the capital stock can be expected to have a higher growth rate than output. With a growing capital stock, use of the ex ante return will spuriously attribute some output growth to long-run productivity growth. For Singapore, capital has been growing faster than output for several decades, implying that the capital-output ratio has indeed been below the equilibrium value. Use of an ex ante rate could, therefore, result in an overestimate of the role of productivity in Singapore's long-run growth.

To see whether this hypothesis could explain the discrepant results of Young and Hsieh, we apply a theorem due to Hulten.
Let $\mathrm{TFP}_{\text{ex-post}}$ equal TFP growth estimated with an ex post capital rental price. This measure will reflect long-run productivity growth, such as Hicks-neutral shifts in the production function. Conversely, let $\mathrm{TFP}_{\text{ex-ante}}$ equal TFP growth estimated with an ex ante capital rental price. This measure will reflect both long-run productivity growth and short-run capacity utilization effects. Then Hulten (1986, 46) shows that

(35) $$\mathrm{TFP}_{\text{ex-ante}} = \mathrm{TFP}_{\text{ex-post}} - \gamma\left[\left(\frac{\dot{Q}}{Q} - \frac{\dot{K}}{K}\right) - \mathrm{TFP}_{\text{ex-post}}\right],$$

where $(\dot{Q}/Q)$ is the growth rate of output, $(\dot{K}/K)$ is the growth rate of capital, and $\gamma$ measures the ratio of short-run marginal cost to short-run average cost, less unity. This parameter is related to the utilization of capital: $\gamma > 0$ if capital is overutilized, that is, below its long-run level. What value to assign to $\gamma$ is hard to know, but the other variables appearing in equation (35) can be readily measured for Singapore. A value of $-0.5$ percent per year for long-run productivity growth $\mathrm{TFP}_{\text{ex-post}}$ is consistent with the TFP estimates of Young reported in the final column of table 16.1, which are measured with an ex post rental on capital. From Young
Theory and Application to Asian Growth
(1995, 658), the difference between the growth of output and growth of (weighted) capital for Singapore over 1966 to 1990 is $-2.8$ percent per year, so the term in brackets in equation (35) becomes $-2.8 + 0.5 = -2.3$ percent. Conversely, the growth in $\mathrm{TFP}_{\text{ex-ante}}$, measured with an ex ante rental to capital, averages about 2.0 percent per year from Hsieh (2002), reported in part A of table 16.1. Then equation (35) becomes $2.0 = -0.5 + \gamma(2.3)$, which holds if and only if $\gamma$ is close to unity. Based on Hulten’s (1986, 48–49) geometric interpretation of $\gamma$, a value of unity for $\gamma$ implies that short-run economic profits (computed after paying capital its market rental) are 100 percent of short-run costs (or 50 percent of revenue). This appears rather high. If the value of $\gamma$ is instead one-half, then about one-half of the difference in TFP between Young and Hsieh is explained by capital not being at its long-run level, and similarly if this parameter is one-quarter, then about one-quarter of the difference in TFP is explained. We conclude that some portion of the difference between Young’s results and Hsieh’s results is probably explained by the violation of the assumption underlying Hsieh’s method of a capital-output ratio at its long-run equilibrium value, but not the entire difference.

16.9 Conclusions

The problem of finding a standard error for index numbers is an old one, and in this paper we have proposed what we hope is a useful solution. We have extended the stochastic approach to include both stochastic prices and stochastic tastes. The variance of the taste parameters, which affect the weights in the price index formula, is obtained by estimating a demand system. Our proposed method to obtain the standard error of price indexes therefore involves two steps: estimating the demand system, and using the standard error of that regression (or system), combined with estimates of the sampling error in the price measures themselves, to infer the variance of the price index.
While our methods extend the stochastic approach, they also extend the economic approach to index numbers by integrating the two approaches. It is worth asking why standard errors have not been part of the economic approach to indexes. Consider, for example, the problem of estimating a cost-of-living index. We could estimate the parameters of a model of preferences from data on expenditure patterns and then use these estimates to calculate a cost of living index. Yet, if the data fit the model perfectly, the cost-of-living index calculated from the parameter estimates would have the same value as an exact index formula that uses the data on expenditure patterns directly. Moreover, Diewert’s (1976) paper showed that the types of preferences or technology that can be accommodated using the exact index approach are quite general. As a result, econometric modeling was no longer thought to be necessary to estimate economic index numbers. A consequence of the lack of econometric modeling is that estimates of
economic index numbers are no longer accompanied by standard errors, such as those that appear, for example, in Lawrence (1974). Nevertheless, if the model that underlies an exact index number formula has positive degrees of freedom, an error term will usually need to be appended to the model to get it to fit the data perfectly. This will certainly be the case if the consumption or production model is estimated over a panel data set with multiple commodities and years, which is our presumption. Indeed, we would argue that the assumption of the economic approach that taste parameters are constant between two years, when applied consistently over a time series, means that the parameters are constant over all years of the panel. This will certainly mean that the demand system must have an error appended, and, as a result, the taste parameters and exact price index are also measured with error. We have derived the formula for this error in the CES and translog cases, but our general approach can be applied to any functional form for demand or costs.

In our application to Asian growth, we have contrasted the TFP estimates of Hsieh (2002) with those of Young (1992, 1995). Hsieh argues that the available evidence on returns to capital from financial sources does not show the decline that is implicit in the work of Young. Hsieh considers three different measures of the return to capital and their associated rental prices. While the rental prices differ markedly from each other in some years, the error in measuring the true rental is not enough to offset the underlying fact that their decline is much less than the cumulative rise in real wages: a 6.8 percent cumulative decline in the average rental over fifteen years, as compared to a 61 percent increase in the real wage.
Even when including the additional error from fitting a translog function to the data for Singapore, the standard error on fifteen-year cumulative TFP growth remains low enough so that its confidence interval is entirely positive. The evidence that Singapore has enjoyed positive productivity growth over this long time period is, therefore, strong, in contrast to the conclusions of Young. Finally, although effects of excess returns on investment caused by violations of an equilibrium assumption could plausibly have elevated Hsieh’s estimates, these effects are unlikely to be large enough to account for all of the measured amounts of TFP growth.
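The capacity-utilization argument of equation (35) can be restated as a short calculation. The following is a minimal sketch only: it takes equation (35) in the form TFP(ex ante) = TFP(ex post) - gamma * [(output growth - capital growth) - TFP(ex post)], a sign convention inferred from the arithmetic reported in the text, and uses the rounded figures quoted there for Singapore. The function and variable names are hypothetical.

```python
# Rounded figures quoted in the text, in percent per year.
TFP_EX_POST = -0.5   # long-run productivity growth (ex post rental, Young)
TFP_EX_ANTE = 2.0    # TFP growth with an ex ante rental (Hsieh)
OUTPUT_MINUS_CAPITAL_GROWTH = -2.8  # output growth less weighted capital growth


def bracket_term(output_minus_capital_growth, tfp_ex_post):
    """Term in brackets in equation (35)."""
    return output_minus_capital_growth - tfp_ex_post


def implied_gamma(tfp_ex_ante, tfp_ex_post, output_minus_capital_growth):
    """Value of gamma at which equation (35) holds exactly (assumed sign convention)."""
    bracket = bracket_term(output_minus_capital_growth, tfp_ex_post)
    return (tfp_ex_post - tfp_ex_ante) / bracket


def share_explained(gamma, tfp_ex_ante, tfp_ex_post, output_minus_capital_growth):
    """Share of the ex ante / ex post TFP gap attributed to capacity utilization."""
    bracket = bracket_term(output_minus_capital_growth, tfp_ex_post)
    utilization_effect = -gamma * bracket
    return utilization_effect / (tfp_ex_ante - tfp_ex_post)


gamma_star = implied_gamma(TFP_EX_ANTE, TFP_EX_POST, OUTPUT_MINUS_CAPITAL_GROWTH)
print(f"gamma needed to reconcile the two estimates: {gamma_star:.2f}")
for gamma in (0.5, 0.25):
    share = share_explained(gamma, TFP_EX_ANTE, TFP_EX_POST, OUTPUT_MINUS_CAPITAL_GROWTH)
    print(f"gamma = {gamma}: share of the gap explained = {share:.2f}")
```

With these inputs the reconciling value of gamma is slightly above one, and values of one-half and one-quarter account for a little under one-half and one-quarter of the gap, in line with the rounded statements in the text.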
References

Balk, Bert M. 1995. Axiomatic price theory: A survey. International Statistical Review 63:69–93.
Berndt, Ernst R., and Melvyn Fuss. 1986. Productivity measurement with adjustments for variation in capacity utilization, and other forms of temporary equilibrium. Journal of Econometrics 33:7–29.
Caves, Douglas W., Laurits R. Christensen, and W. Erwin Diewert. 1982. The economic theory of index numbers and the measurement of input, output and productivity. Econometrica 50:1393–1414.
Deaton, Angus, and John Muellbauer. 1980. Almost Ideal Demand System (AIDS). American Economic Review 70 (3): 312–26.
Diewert, W. Erwin. 1976. Exact and superlative index numbers. Journal of Econometrics 4:115–45.
Diewert, W. Erwin. 1995. On the stochastic approach to index numbers. University of British Columbia. Discussion Paper no. DP95-31, September.
Grunfeld, Yehuda, and Zvi Griliches. 1960. Is aggregation necessarily bad? Review of Economics and Statistics 42 (1): 1–13.
Feenstra, Robert C., and Marshall Reinsdorf. 2000. An exact index for the Almost Ideal Demand System. Economics Letters 66:159–62.
Hsieh, Chang-Tai. 2002. What explains the industrial revolution in East Asia? Evidence from the factor markets. American Economic Review 92 (3): 502–26.
Hulten, Charles R. 1986. Productivity change, capacity utilization, and the sources of efficiency growth. Journal of Econometrics 33:31–50.
Hall, Robert E., and Dale W. Jorgenson. 1967. Tax policy and investment behavior. American Economic Review 57:391–414.
Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34:249–82.
Keynes, John Maynard. 1909. Index Numbers. In Collected writings of John Maynard Keynes. Vol. 11, ed. Donald Moggridge, 49–156.
Keynes, John Maynard. 1930. A treatise on money. Vol. 1 of The pure theory of money. New York: Harcourt, Brace and Co.
Kim, Jong-Il, and Lawrence Lau. 1994. The sources of economic growth of the East Asian newly industrialized countries. Journal of the Japanese and International Economies 8 (3): 235–71.
Krugman, Paul R. 1994. The myth of Asia’s miracle. Foreign Affairs 73 (6): 62–78.
Lawrence, Anthony G. 1974. The bias and the asymptotic variance of a computed true cost of living index: The case of the Klein-Rubin Constant Utility Index. BLS Working Paper no. 20. Washington, DC: U.S. Bureau of Labor Statistics, January.
Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes. 1974. Introduction to the theory of statistics. New York: McGraw-Hill.
Sato, Kazuo. 1976. The ideal log-change index number. Review of Economics and Statistics 58 (May): 223–28.
Selvanathan, E. A., and D. S. Prasada Rao. 1994. Index numbers: A stochastic approach. Ann Arbor: University of Michigan Press.
Törnqvist, Leo. 1936. The Bank of Finland’s consumption price index. Bank of Finland Monthly Bulletin 10:1–8.
U.S. Bureau of Labor Statistics. 1997. Consumer Price Index. In Handbook of methods. Washington, DC: Government Printing Office. http://www.bls.gov/opub/hom/pdf/homch17.pdf.
Vartia, Y. O. 1976. Ideal log-change index numbers. Scandinavian Journal of Statistics 3:121–26.
World Bank. 1993. The East Asia miracle. Washington, DC: World Bank.
Young, Alwyn. 1992. A tale of two cities: Factor accumulation and technical change in Hong Kong and Singapore. In NBER macroeconomics annual 1992, ed. Olivier Jean Blanchard and Stanley Fischer, 13–54. Cambridge, MA: MIT Press.
Young, Alwyn. 1995. The tyranny of numbers: Confronting the statistical realities of the East Asian growth experience. Quarterly Journal of Economics 110 (3): 641–80.
17 What Really Happened to Consumption Inequality in the United States?

Orazio Attanasio, Erich Battistin, and Hidehiko Ichimura
Orazio Attanasio is a professor of economics at University College London, a research fellow of the Institute for Fiscal Studies, and a research associate of the National Bureau of Economic Research. Erich Battistin is a researcher in the Department of Statistics at the University of Padova, and a senior research economist at the Institute for Fiscal Studies. Hidehiko Ichimura is a professor in the Graduate School of Public Policy and Graduate School of Economics at the University of Tokyo, and a deputy director of the Centre for Microdata Methods and Practice. The paper benefited from useful discussion with Ernst Berndt, Richard Blundell, David Card, Angus Deaton, David Johnson, Tom MaCurdy, and Luigi Pistaferri; from comments by audiences at the conferences Hard-to-Measure Goods and Services: Essays in Honor of Zvi Griliches in Washington, DC and The Link between Income and Consumption Inequality in Madrid; and from seminars at the Bureau of Labor Statistics, Stanford, and Berkeley.

17.1 Introduction

The dynamics of inequality over the 1980s and 1990s have received an enormous amount of attention, and a voluminous literature studies them. However, most of the existing studies consider either inequality in wages (hourly earnings) or incomes. In the United States (and to a large extent in the United Kingdom), these studies have documented a very large increase during the 1980s, especially during the first half of that decade, followed by some more moderate increases during the last part of the 1980s and the 1990s. Several dimensions of the evolution of inequality have been extensively studied. In particular, many researchers have tried to decompose the observed increase in inequality into increases in inequality between well-defined groups (for instance, based on educational attainment; see also Berman, Bound, and Griliches 1994) and within groups. Others have focused instead on the decomposition of the increase in inequality between
increases in the variance of permanent components of wages and earnings and transitory components.

Very few studies have considered the evolution of inequality in consumption. This is partly due to the paucity of data sources containing individual-level consumption data. One of the first papers to use the Consumer Expenditure Survey (CEX) in the United States to study the evolution of consumption inequality is Cutler and Katz (1991), documenting an increase in consumption inequality that substantially paralleled the increase in wage and income inequality. Slesnick (1993), on the other hand, analyzes the evolution of poverty in the United States and stresses that the picture that emerges when one uses consumption instead of income to measure poverty is very different, both in terms of levels and of dynamics. Attanasio and Davis (1996) focus on differences across education and year of birth cohorts and report that, consistent with the Cutler and Katz (1991) evidence, especially at relatively low frequencies, relative wage changes are largely reflected in relative consumption changes. Slesnick (2001), instead, claims that the evolution of consumption inequality is in sharp contrast to that of income inequality: “the widely reported U-turn in inequality in the United States is an artifact of the inappropriate use of family income as a measure of welfare. When well-being is defined to be a function of per equivalent consumption, inequality either decreased over the sample or remained essentially unchanged depending on the choice of equivalence scale” (154). More recently, Krueger and Perri (2006) discuss results based on the analysis of the CEX until 2001 that are roughly consistent with those reported by Slesnick (2001). In particular, Krueger and Perri (2006) stress that after a modest increase during the first part of the 1980s, consumption inequality is substantially flat.
Attanasio (2003) and Battistin (2003), on the other hand, present evidence, based on both the Interview and the Diary segments of the CEX, that seems to contradict such a view. Blundell, Pistaferri, and Preston (2002) use the Panel Study of Income Dynamics (PSID) until 1992 and show that the inequality of food consumption is increasing in that data set. A fair conclusion that can be drawn from the few studies cited above is that the evidence on the evolution of consumption inequality in the United States is far from clear-cut and that there is not much agreement in the literature.

This state of affairs is particularly unsatisfying because measures of consumption inequality and their evolution can be particularly useful and informative. As Blundell and Preston (1998) stress, under certain conditions, consumption comparisons can be more informative about welfare differences than income comparisons. Well-being is determined by consumption rather than income. Consumption changes will take into account any mechanism that individual households have to buffer income shocks (either because they are transitory or because they are somehow insured).
Deaton and Paxson (1994) spell out some of the implications of the life-cycle model for the evolution of the cross-sectional variance of consumption. Blundell and Preston (1998) show how to use information on the evolution of income and consumption inequality and the insights from the permanent-income model to decompose changes in income variances into changes in the variance of transitory and permanent components. An approach complementary to Blundell and Preston (1998) is that of Attanasio and Davis (1996), who frame their evidence in terms of a test of consumption insurance along the lines proposed by Cochrane (1991), Mace (1991), and Townsend (1994). Essentially, what Attanasio and Davis (1996) label “uninsured relative wage changes” is closely related to Blundell and Preston’s “permanent” shocks, which cannot be self-insured within a life-cycle model.

The current lack of consensus and even the small number of studies that have analyzed consumption inequality in detail are related to the nature of the individual-level data currently available on consumption expenditure. The CEX is a relatively small survey, collected mainly to compute weights of the Consumer Price Index (CPI), rather than to study consumption inequality. Moreover, the survey is affected by other problems. There is now substantial evidence that by aggregating CEX data it is not easy to obtain figures corresponding closely to figures from National Income and Product Accounts (NIPA) Personal Consumption Expenditure (PCE) data for many commodities (see McCarthy et al. 2002). While the differences between CEX and NIPA-PCE data can partly be explained by definitional and coverage differences, and do not necessarily arise from problems with the CEX (see, for instance, Slesnick 1992), the amount by which the CEX underestimates national aggregates is massive (around 35 percent) and compares badly with other surveys such as the Family Expenditure Survey for the United Kingdom.
Moreover, the relationship between the aggregated CEX and NIPA-PCE data has worsened considerably during the second part of the 1990s (see, for example, Battistin 2003).

The main goal of the CEX (that is, the computation of CPI weights) is reflected in the existence of two completely separate surveys, one based on retrospective interviews (Interview Sample [IS]) and one based on weekly diaries (Diary Sample [DS]). The rationale is that some expenditure items (such as large, infrequent items) are better measured by retrospective interviews, while others (such as frequently purchased and small items) are better measured by diaries. Indeed, until 1986, the DS only collected information on frequently purchased items. Since 1986, both surveys are in principle exhaustive, but it is quite clear from the BLS literature and from informal communications that some items are reliably measured in the IS and others in the DS. This data structure does not constitute an important problem if one is only interested in means, but it creates a problem if one needs information on total consumption expenditure for a given household, as is the case if one wants to study consumption inequality across households. Of course, in the absence of reliable data on total consumption for a given household that would allow one to study inequality at the individual level, one could focus on differences in mean consumption across well-defined groups of households. But such an approach would miss an important dimension of inequality, that is, within-group inequality.

One of the most puzzling results that arises from the analysis of CEX consumption inequality data is that the evolution over time of consumption inequality as measured in the IS and DS is very different (see Attanasio 2003; Battistin 2003). In figure 17.1, we plot the standard deviation of the log of per-adult equivalent nondurable consumption from 1982 to 2001 for IS and DS data (diary information is available only after 1986). The figures are based on all households headed by an individual aged twenty-five to sixty. The difference is remarkable: the DS plot shows a substantial increase, amounting to around 10 percentage points between 1986 and 2001. The IS plot, on the other hand, shows a path that is substantially flat. As the IS shows an increase in the variance across education groups (see Attanasio and Davis 1996; Attanasio 2003), a constant overall inequality also constitutes indirect evidence of a decline in inequality within groups.1

Fig. 17.1 Standard deviation of log per capita monthly expenditure

1. This follows as $\mathrm{Var}(y) = \mathrm{Var}(E[y \mid z]) + E[\mathrm{Var}(y \mid z)]$, where groups are denoted by $z$.

Regressing the two
lines on a constant and a linear time trend, one obtains (after 1986) a trend coefficient of 0.04 for the IS and of 0.64 for the DS, with t-statistics of 1.17 and 12.66, respectively. The correlation between the residuals of the two regressions is 0.26. This evidence is particularly puzzling because the difference in mean nondurable consumption between the two surveys is relatively stable over time (as we show in section 17.3).2 Moreover, in many other dimensions, the CEX offers a picture of inequality that is remarkably consistent with that obtained from other (extensively explored) data sets. In what follows we provide some evidence in this respect.

The aim of this paper is twofold. First, we provide additional information on the dynamics of inequality of consumption and its components in the two CEX surveys and relate it to the dynamics of wage inequality, as measured both in the CEX and the Current Population Survey (CPS). Second, we use the information that some items are better measured in the DS and others in the IS and some assumptions on the nature of measurement error in the two surveys to obtain a unified picture of the dynamics of consumption inequality in the United States between 1986 and 2001 (and particularly for the 1990s).

The fact that some items are better measured in one survey than in the other implies that consistent means for total (nondurable) consumption can easily be obtained for any consistently defined group of consumers by combining the two surveys. However, to get an estimate of inequality (say the standard deviation of log consumption or the coefficient of variation of consumption), one needs to deal with the covariance between different consumption items that are well measured in different surveys. One can make use of the measurement-error-ridden measures in both surveys and, under some assumptions we discuss in section 17.4, obtain point estimates of the growth of the coefficient of variation of nondurable consumption.
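The asymmetry between means and dispersion can be illustrated with a small simulation. This is a sketch under strong assumptions, not the estimator developed in section 17.4: the data are synthetic, the two commodity groups (labeled `d_goods` and `r_goods` here, hypothetical names) are observed without error for the same households, and the log-normal parameters are made up. That setting is precisely what the CEX does not offer, and it serves only to show which quantity is lost when each group is well measured in a different sample.

```python
import math
import random

random.seed(0)
N_HOUSEHOLDS = 5000

# Synthetic spending on two commodity groups that share a common
# household-level component, so that they are positively correlated.
d_goods, r_goods = [], []
for _ in range(N_HOUSEHOLDS):
    common = random.gauss(0.0, 1.0)
    d_goods.append(math.exp(5.0 + 0.4 * common + 0.3 * random.gauss(0.0, 1.0)))
    r_goods.append(math.exp(5.5 + 0.4 * common + 0.3 * random.gauss(0.0, 1.0)))


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the population variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


total = [d + r for d, r in zip(d_goods, r_goods)]

# Means add up, so group means from two samples can simply be summed.
mean_from_parts = mean(d_goods) + mean(r_goods)

# Variances do not: Var(D + R) = Var(D) + Var(R) + 2 Cov(D, R).
var_total = cov(total, total)
var_sum_of_parts = cov(d_goods, d_goods) + cov(r_goods, r_goods)
cov_dr = cov(d_goods, r_goods)

print(f"mean of total:            {mean(total):.2f}")
print(f"sum of group means:       {mean_from_parts:.2f}")
print(f"variance of total:        {var_total:.2f}")
print(f"sum of group variances:   {var_sum_of_parts:.2f}")
print(f"missing term 2*Cov(D, R): {2 * cov_dr:.2f}")
```

The covariance term is the piece that cannot be read off two separate samples, which is why additional assumptions on measurement error are needed before the surveys can be combined to measure inequality.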
The remainder of the paper is organized as follows. In section 17.2, we describe the CEX and its components. We describe in detail the nature of the information on consumption available in the DS and the IS, as well as the sample we select for the rest of our analysis. In section 17.3, we discuss the evolution of average consumption over time in the two surveys. We also report some evidence comparing the pattern of wage inequality in the CEX and in the CPS that, complementing that in Attanasio (2003), shows that the two surveys tell similar stories in this dimension. Finally, in this section, we also discuss the puzzle presented in figure 17.1. In particular, we consider and dismiss a few simple explanations for the divergence in inequality between the two surveys. In section 17.4, we write down the basic relationships and assumptions we use to obtain an estimate of the variance of consumption by combining the two surveys we have. In section 17.5, we present the results we obtain using such an approach. Section 17.6 concludes.

2. Attanasio (2003), however, shows that if one conditions on cohorts defined from education and year of birth, the differences in the dynamics of inequality between the two surveys are not as remarkable as in figure 17.1. Whether this is genuinely due to conditioning or to the small sample sizes in the DS once one crosses education and year of birth cohort is, however, debatable.

17.2 The CEX Surveys

The CEX is currently the only micro-level data set reporting comprehensive measures of consumption expenditures for a large cross section of households in the United States. The CEX has a long history: the first survey was collected in 1916 to 1917. More recently, the CEX was collected in 1960 to 1961 and 1972 to 1973. As the main scope of the survey is to compute weights for the CPI, data were collected roughly every ten years. As a consequence, the survey methodology and the questionnaires are not homogeneous across the early surveys and this makes intertemporal comparisons difficult. However, in 1980 it was decided to collect data on a continuous basis with a methodology that was roughly consistent over time. Since then, and especially after 1982, the instruments have changed only marginally and on very few occasions. Therefore, with some important caveats (see the discussion in Battistin 2003), it is reasonable to use the time series of cross sections since 1982 for intertemporal comparisons.

As we mentioned above, the CEX consists of two separate surveys, the IS and the DS. In this section, we summarize the main features of these two components. In particular, section 17.2.1 describes the IS and the DS questionnaires. Section 17.2.2 discusses the extent to which the IS and the DS are comparable with respect to sample designs, population coverage, and information collected. In the same subsection, we also discuss the definition of household total consumption we use in the analysis. Finally, section 17.2.3 presents some evidence on the sample we use in this paper.
The reader interested in more specific details on the survey methodology in the CEX is referred to Battistin (2003) and the Bureau of Labor Statistics (2003).

17.2.1 Diary and Interview Samples

In the CEX, sample consumer units are households (literally, “all members of a particular housing unit who are related by blood, marriage, adoption, or some other legal arrangement”; see Bureau of Labor Statistics [BLS] 2003). The survey consists of two separate and independent samples of households, each of them with its own questionnaire. The IS is a rotating panel including 5,000 units each quarter. The DS consists of repeated cross sections of households (around 4,500 per year) interviewed over a two-week period. Response rates for the two components are reasonably good (around 80 percent). Starting in 1999, the sample size has increased
by about 40 percent. More detailed characteristics of the two surveys are discussed extensively in Battistin (2003).

In the IS, households are interviewed about their expenditures every three months over five consecutive quarters. The first interview, however, is a contact interview on which there is no information in the public database. After the last interview, households are dropped and replaced by a new unit, so that, by design, 20 percent of the sample is rotated out every quarter. Only one person responds for the whole consumer unit, typically the member most knowledgeable about the family’s expenditures. The percentage of households completing all five interviews is about 75 percent.

In the DS, consumer units are asked to self-report their daily purchases over two consecutive one-week periods using product-oriented diaries. Each diary is organized by day of purchase and by broad classifications of goods and services. Respondents are assisted by printed cues and, when needed, by interviewers at pick-up. The percentage of households completing both diaries is about 92 percent.

Crucial to our exercise is that the two samples drawn are random and representative of the same population. The two survey components are in fact based on a common sampling frame: the 1980 Census for those households sampled in the 1980s and the 1990 Census for households sampled in the 1990s. Sample designs differ only in terms of frequency and oversampling of DS households during the peak shopping period of Christmas and New Year holidays.

17.2.2 The Information Collected in the CEX

In this paper, we use twenty years of data from both surveys of the CEX between 1982 and 2001. From 1980 to 1985, the DS only collected information on frequently purchased items, while it became comprehensive in 1986. Because of this, our analysis will focus especially on the 1986–2001 period.
Both the DS and the IS collect detailed information on individual commodities, identified by several hundred Universal Classification Codes (UCC). The information on frequently purchased items, and especially food items, is much more detailed in the DS. In the IS, food consists of only two large components: food at home and food away from home. We perform a first level of aggregation on both surveys. This aggregation is mainly dictated by the categories that form the CPI defined by the BLS. We further aggregate these categories into nondurable consumption and other consumption expenditure.

Throughout the analysis we will be focusing on the expenditure on nondurable goods and services. The expenditure categories considered have been defined so that definitions are comparable and consistent over time and across surveys (see Battistin 2003). Expenditure on nondurables is defined according to the definition in Attanasio and Weber (1995): food and nonalcoholic beverages (both at home and away from home), alcoholic beverages, tobacco, and expenditures on other nondurable goods such as heating fuel, public and private transport (including gasoline), services, and semidurables (defined by clothing and footwear). The categories included in our definition of nondurable consumption are listed in table 17.1. We consider nine expenditure categories, corresponding to roughly 280 and 400 UCC for IS and DS data, respectively.3

Table 17.1 Definitions of expenditure categories

Food and nonalcoholic beverages at home
Food and nonalcoholic beverages away from home
Alcoholic beverages (at home and away from home)
Nondurable goods and services
    Newspapers and magazines
    Nondurable entertainment expenses
    Housekeeping services (DS only)
    Personal care (DS only)
Housing and public services
    Home maintenance services
    Public utilities
    Miscellaneous home services
Tobacco and smoking accessories
Clothing, footwear, and services
    Clothing and footwear
    Services
Heating fuel, light, and power
Transportation (including gasoline)
    Fuel for transportation
    Transportation equipment maintenance and repair
    Public transportation
    Vehicle rental and miscellaneous transportation expenses

While the bulk of the questionnaires and survey methodology were remarkably stable over time, some minor changes did occur. New diaries with more cues were introduced in the DS after 1991; for the IS, the food question changed in 1982 and 1987. Some UCC have changed, mainly reflecting the diffusion of new goods, but our aggregates are not affected substantially by these changes. Battistin (2003) maps UCC into the nine categories in table 17.1 accounting for these changes.

3. As we mentioned above, expenditures referring to “Housing and Public Services” and “Nondurable Services” have been introduced in the DS only after 1986, with the exception of very few items for “Home Maintenance Services” and “Nondurable Entertainment Expenses.” Similarly, information on “Fuel” and “Transportation” expenses is not available from public tapes between 1982 and 1985. As for IS data, the time series of food at home expenditure presents discontinuities introduced by changes in survey design in 1982 and 1987 (see Battistin 2003). A detailed description of the items used to define the categories of nondurable consumption can be downloaded at http://www.stat.unipd.it/~erich/papers.html, separately for IS and DS data.

Both surveys are almost exhaustive. This implies that, for most items, we
have a measure both for the households in the DS and for those in the IS. The only exception for the definition of consumption considered in our analysis is given by some small items (mainly housekeeping services and personal care—these categories are reported in italic in table 17.1 in the “Nondurable Goods and Services” category) for which information is collected in the DS, but not in the IS. We discuss how we tackle this problem in the following. We exclude from our definition of consumption expenditures on durables, health, education, as well as mortgage and rent payments. The main reason to exclude expenditure on durables is that it is not directly linked to consumption. One would like to measure the services provided by the existing stock of durables, rather than the increase in the stock of durables. Similarly, education and health expenditure obviously have an important investment component. Moreover, in the case of health, the CEX only measures out-of-pocket expenditures. Finally, we excluded rent because we do not have a reliable measure of rental equivalent for homeowners and mortgage payments because they are not directly related to the consumption of housing services. While all the exercises for which we report results use the expenditure on nondurable and services as our definition of consumption, we also performed some experiments with total consumption, as defined by the BLS. The results we obtained in these cases were substantially similar to those we report. To estimate average consumption, the BLS follows the standard international procedure of exploiting information from recall questions for more durable and less frequently purchased items purchased in the quarter prior to the interview. Diary-based records of purchases carried out within a two-week period are used for more frequently purchased items such as food. According to the Bureau of Labor Statistics (2003), neither survey is expected to measure accurately all components of consumption. 
In what follows, we label the commodities that the BLS thinks are better measured in the DS as D goods and services and those that are better measured in the IS as R goods and services. In table 17.2, we list which categories belong to the D and R groups according to the BLS.4

4. It is worth noting that, although the level of aggregation considered by the BLS is finer, the classification procedure exploited in what follows broadly reflects the one currently being used in the publication of CEX data. See also the discussion in Battistin (2003), where evidence on the validity of this classification is produced with respect to other expenditure surveys in the world.

17.2.3 The Selected Sample

In this analysis, we focus on households headed by individuals aged at least twenty-five and no more than sixty and not self-employed. The family head is conventionally fixed to be the male in all husband and wife families (representing 56 percent and 53 percent of the whole sample for IS
Orazio Attanasio, Erich Battistin, and Hidehiko Ichimura
Table 17.2 Commodity split

Commodities better measured in the Diary Survey: D goods
    Food and nonalcoholic beverages at home
    Food and nonalcoholic beverages away from home
    Alcoholic beverages (at home and away from home)
    Nondurable goods and services

Commodities better measured in the Interview Survey: R goods
    Housing and public services
    Tobacco and smoking accessories
    Clothing, footwear, and services
    Heating fuel, light, and power
    Transportation (including gasoline)
and DS data, respectively). Battistin (2003) presents a detailed description of the less important selection criteria used to derive the working sample considered in the analysis.

Although the two surveys are designed to be representative of the same population, there are significant differences between the two samples along several dimensions and with a different pattern over time (even using the population weights provided by the BLS). To control for observed compositional differences between the two samples (for instance, the DS sample is slightly more educated than the IS sample), Battistin (2003) weights DS households with the inverse of the probability of being in the IS sample, estimated as a function of characteristics common across the two samples (propensity score weighting; see Battistin, Miniaci, and Weber 2003; and Hirano, Imbens, and Ridder 2003). The specification adopted (see table 1 in Battistin 2003) includes education, race, age, and work-related information of the head, as well as information on household composition (proportion of children and members within certain age bands) and family income. These variables have proved relevant to data quality in previous analyses of CEX data (see Tucker 1992). We use the same procedure here. However, results obtained using BLS population weights or propensity score weights are basically identical.

Table 17.3 reports, for each year, the size of the sample we end up with for the two surveys. As is obvious from the table, sample sizes are not huge, particularly for the DS sample. This represents a real problem if one wants to control for several observable characteristics, such as year of birth and education. The increase in sample size in 1999 we mentioned previously is evident in this table. Monthly expenditure in the DS is defined as 26/12 ≈ 2.17 times the expenditure observed over two weeks, assuming equally complete reporting.
Family consumption is adjusted using the Organization for Economic Cooperation and Development (OECD) adult equivalence scale, which gives
What Really Happened to Consumption Inequality in the United States? Table 17.3
Sample sizes

Diary sample
Year    Born 1960–69    Born 1950–59    Born 1940–49    Born 1930–39    Totals
1986         257             864             675             419         2,215
1987         383             849             633             466         2,331
1988         345             756             515             374         1,990
1989         422             738             603             412         2,175
1990         497             809             578             459         2,343
1991         574             808             571             396         2,349
1992         603             744             555             352         2,254
1993         624             726             527             370         2,247
1994         560             663             476             295         1,994
1995         542             587             444             265         1,838
1996         688             758             504             328         2,278
1997         722             740             579             411         2,452
1998         674             751             543             347         2,315
1999         985             997             716             481         3,179
2000       1,021             931             722             457         3,131
2001       1,044             950             691             418         3,103

Interview sample
Year    Born 1960–69    Born 1950–59    Born 1940–49    Born 1930–39    Totals
1982                       2,881           2,883           1,979         7,743
1983         242           3,226           2,804           1,901         8,173
1984         435           3,059           2,577           1,729         7,800
1985         572           2,765           2,201           1,587         7,125
1986         989           3,498           2,690           2,023         9,200
1987       1,217           3,376           2,609           1,995         9,197
1988       1,436           2,852           2,411           1,611         8,310
1989       1,724           2,768           2,412           1,590         8,494
1990       1,943           2,904           2,309           1,472         8,628
1991       2,030           2,862           2,181           1,568         8,641
1992       2,334           2,869           1,978           1,467         8,648
1993       2,424           2,899           2,159           1,424         8,906
1994       2,380           2,869           2,132           1,384         8,765
1995       2,133           2,526           1,849           1,213         7,721
1996       2,954           3,244           2,380           1,541        10,119
1997       3,088           3,363           2,347           1,585        10,383
1998       3,043           3,221           2,268           1,691        10,223
1999       4,331           4,493           3,147           2,232        14,203
2000       4,393           4,381           3,259           2,216        14,249
2001       4,314           4,099           3,207           1,932        13,552
each adult beyond the first a weight of 0.7 and each child (under eighteen) a weight of 0.5. While such an adult equivalence scale is clearly arbitrary, the results we obtain are only minimally affected by the consideration of alternative scales.5 Real expenditures are obtained using the CPI published by the BLS. Consumption data are also corrected for seasonality, using a simple seasonality model estimated on the whole data set. This simply consisted of regressing monthly expenditures on monthly dummies and removing the monthly effect for each household. To control for the effects of outliers, we trimmed the data at the 1st and 99th percentiles of the expenditure distribution.

5. In previous versions, we have reported results based on different adult equivalence scales. The choice of the scale changes the results very little.

Both surveys collect information on a very large set of household characteristics (demographics and work-related variables) as well as on income and assets (using a twelve-month recall period). The latter information is subject to top-coding in both components of the CEX and is known to be less reliable than the expenditure information: the share of incomplete income reporters is about 13 percent in the two surveys, and missing values are currently not imputed (see McCarthy et al. 2002). Because the percentage of incomplete income reporters is so high, we included all of them in the final sample. In our robustness analysis, we checked whether the exclusion of households with incomplete income responses makes any difference to our main results.

17.3 Evidence on Consumption and Wages

In this section, we present three sets of results. First, we compare expenditure means from the two CEX surveys to aggregate values from national accounts data. We find important differences between IS and DS figures and, crucially, in the ratio of CEX to NIPA figures over time. Second, we present some data on the evolution of wage inequality exploiting CEX and CPS. This evidence shows that the overall picture painted in the two surveys is essentially the same. Finally, we present some additional information on the evolution of consumption inequality from the two survey components of the CEX.
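The sample-preparation steps of section 17.2.3 (scaling two-week diary totals to a month, the OECD equivalence scale, removal of month effects, and trimming) can be sketched as follows. This is a minimal illustration rather than the procedure actually run on the CEX files: the function names are hypothetical, the regression on monthly dummies is implemented through month means (which is what OLS on a full set of dummies fits), and adding back the overall mean after removing the month effect is an assumption.

```python
from collections import defaultdict
from statistics import mean, quantiles


def oecd_scale(n_adults, n_children):
    """1 for the first adult, 0.7 per additional adult, 0.5 per child under 18."""
    if n_adults < 1:
        raise ValueError("expected at least one adult")
    return 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children


def ds_monthly(two_week_expenditure):
    """Scale a two-week diary total to a month: 26 fortnights over 12 months."""
    return two_week_expenditure * 26 / 12


def remove_month_effects(expenditures, months):
    """Subtract each month's mean and add back the overall mean (assumption)."""
    by_month = defaultdict(list)
    for x, m in zip(expenditures, months):
        by_month[m].append(x)
    month_mean = {m: mean(v) for m, v in by_month.items()}
    overall = mean(expenditures)
    return [x - month_mean[m] + overall for x, m in zip(expenditures, months)]


def trim(values, lower_pct=1, upper_pct=99):
    """Keep observations between the chosen percentiles of the distribution."""
    cuts = quantiles(values, n=100)  # 99 cut points; cuts[0] is the 1st percentile
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if low <= v <= high]


# A two-adult, two-child household spending 1,000 over a diary fortnight.
monthly = ds_monthly(1000.0)
print(monthly / oecd_scale(n_adults=2, n_children=2))
```

Whether month effects are removed in levels or in logs is not stated in the text, so the sketch leaves that choice to the caller.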
17.3.1 Consumption Means

In figure 17.2, we compare total nondurable expenditure in published CEX tables to the figures one obtains for a similar category in the NIPA accounts for PCE.6 The CEX aggregates are computed using the population weights provided by the BLS and are based on published information from both the DS and the IS. Two elements are worth stressing from this picture. First, even though there are some important definitional differences, discussed extensively by Slesnick (1992, 2001) among others, one cannot help noticing that the CEX figure massively understates the one from PCE data. While this does not necessarily mean that for every single consumption item the PCE provides superior information (see, for instance, the discussion in McCarthy et al. 2002), this evidence contrasts sharply with similar comparisons for the United Kingdom, where aggregating a time series of individual cross-sectional data, one obtains close to 95 percent of nondurable consumption, as documented in Banks and Johnson (1998). Second, while the divergence between CEX and PCE is roughly constant in the first part of the sample, the difference seems to increase in the second part of the 1990s. This evidence is consistent with that reported by other researchers including Slesnick (1992, 2001) and Sabelhaus (1996).

Fig. 17.2 Nondurable expenditures in 2000 dollars: Consumer Expenditure Survey (CEX) and Personal Consumption Expenditures (PCE)

6. We are grateful to David Johnson at the BLS for making this graph available to us. Nondurables includes food at home, food away, alcohol, apparel and services, maintenance and repairs, utilities, household operations, housekeeping supplies, gasoline and motor fuel, vehicle maintenance, vehicle rental and other, public transportation, fees and admissions, other entertainment supplies, personal care, and tobacco. The contents of this figure are comparable to those of figure 3.2 in Slesnick (2001, 51), although the latter figure looks at total expenditure on durable and nondurable goods.

As we are interested in combining the information from the IS and the DS, it might be interesting to compare the estimates of aggregate nondurable consumption that emerge from the two surveys. In figure 17.3, we plot the time series of average log nondurable consumption computed in the two data sets. While average consumption is consistently higher in the IS than in the DS, we also notice that the relative difference between the two surveys is roughly constant over time and, in particular, after 1991.7

7. This hypothesis can be tested statistically and cannot be rejected at standard significance levels. Battistin (2003) finds that the relationship between mean expenditures in the two surveys varies a great deal when a similar analysis is carried out by expenditure group.
Fig. 17.3 Mean of log monthly expenditure on nondurable goods (2001 dollars)
In figure 17.4, we plot the time series of the average level (rather than log) of monthly nondurable consumption from the two data sets. The two series are very different in the first part of the sample, but they converge remarkably starting in 1991. Of course, figure 17.1, where we plotted the standard deviation of log consumption, and figures 17.3 and 17.4, which plot the average log and level of consumption, are not independent: under the assumption of log normality, one would be able to derive any one from the other two figures. We can summarize the evidence from these pictures by saying that, especially from 1991 onward, the average log and level consumption estimated from the two surveys move very closely to each other, while the standard deviation diverges considerably.

Fig. 17.4 Mean of monthly expenditure on nondurable goods (2001 dollars)

17.3.2 Wage Inequality in the CEX and in the CPS

There is a widespread perception that income figures in the CEX are not particularly reliable, especially relative to other more established and better explored data sets, such as the CPS. Before starting with the analysis of consumption inequality, it is therefore worth reporting some information on how the CEX performs in measuring the level and inequality of wages (hourly earnings) over time relative to the CPS, which has been widely used in the study of wage inequality.

From one of the supplementary CEX files, it is possible to obtain measures of earnings and hours worked for each household member. We compare the figures we obtain using these measures to analogous measures obtained from rotating CPS files.8 Using this information, we compute the average and the standard deviation of log male hourly earnings in both data sets for the twenty years from 1982 to 2001. We perform the exercise both for the whole sample and for cohort and educational attainment groups. Top-coding levels are different in the two data sets and have changed over time. In both data sets, we compute the mean and standard deviation of each cell by fitting (in each cell) a log-normal distribution truncated at the top-coding level. This procedure assumes that the log-normal distribution fits the distribution of wages reasonably well even for the top few percent that are top-coded. This type of correction is particularly important for comparisons over time as top-coding levels change.

We start by computing the mean and standard deviation of log hourly male earnings in the IS and in the CPS for all males living in urban areas aged between twenty-five and sixty. If we correlate the averages, we obtain a correlation coefficient of 0.62. Regressing the IS average on the CPS average, we obtain a coefficient of 0.83 (standard error 0.26), a constant not statistically different from zero, and an R-squared of 38 percent.

8. The CPS is not exempt from problems. There are several changes, both in terms of definitions of various variables (such as relation to the household head) and in top-coding levels. We have used the NBER extracts and suggestions to correct for changes in definitions and variable labels over time. We discuss what we do for top-coding in the following. We thank David Card for providing us with the data for 2000 and 2001 and for some useful suggestions.
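One way to read the top-coding correction described above is as a right-censored normal model for log wages: wages at or above the top-code are only known to exceed it. The sketch below fits the mean and standard deviation of log wages by an EM iteration under that reading. Both the censoring interpretation and the EM algorithm are assumptions made for illustration; the text does not say how the fit was computed, and the function name is hypothetical.

```python
import math
import random


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fit_topcoded_lognormal(wages, top_code, n_iter=200):
    """Mean and s.d. of log wages when values >= top_code are censored."""
    c = math.log(top_code)
    observed = [math.log(w) for w in wages if w < top_code]
    n = len(wages)
    n_cens = n - len(observed)
    sum_x = sum(observed)
    sum_x2 = sum(x * x for x in observed)
    # Start from the moments of the uncensored observations.
    mu = sum_x / len(observed)
    sigma = math.sqrt(sum_x2 / len(observed) - mu ** 2)
    if n_cens == 0:
        return mu, sigma
    for _ in range(n_iter):
        alpha = (c - mu) / sigma
        lam = _phi(alpha) / _upper_tail(alpha)        # inverse Mills ratio
        m1 = mu + sigma * lam                          # E[x | x > c]
        m2 = sigma ** 2 * (1.0 + alpha * lam - lam ** 2) + m1 ** 2  # E[x^2 | x > c]
        mu_new = (sum_x + n_cens * m1) / n
        var_new = (sum_x2 + n_cens * m2) / n - mu_new ** 2
        mu, sigma = mu_new, math.sqrt(var_new)
    return mu, sigma


# Simulated cell: log wages N(2.5, 0.5^2), top-coded at exp(3.2).
random.seed(0)
top = math.exp(3.2)
cell = [min(math.exp(random.gauss(2.5, 0.5)), top) for _ in range(20000)]
print(fit_topcoded_lognormal(cell, top))
```

Because the fitted moments rely on the log-normal shape in the censored tail, the correction is only as good as that distributional assumption, which is the caveat the text itself raises.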
Fig. 17.5 Standard deviation of log wages in the CPS
Correlating the standard deviations of log wages, we obtain a coefficient of 0.5. A regression of the IS standard deviation on the CPS standard deviation yields a coefficient of 0.32 (standard error 0.13), a constant of 0.16 (standard error 0.05), and an R-squared of 26 percent. Relating changes in mean log wages in the two data sets yields a correlation coefficient of 0.44, while changes in the standard deviation in the two data sets yield a correlation coefficient of 0.30. We interpret this evidence as saying that the CEX and the CPS depict similar pictures in terms of the dynamics of inequality in wages.

To conclude this section, in figure 17.5 we plot the standard deviation of log wages in the CPS. This figure shows a consistent increase throughout the sample period. Between 1990 and 2000, this measure of wage inequality increases by about 0.04. This will be a useful point of reference when assessing the size of the increase in consumption inequality.

17.3.3 Consumption Inequality

In figure 17.1, we plotted the standard deviation of log nondurable consumption in the IS and the DS. We have already stressed the difference in the time series pattern of the two measures of consumption. Very similar evidence can be obtained considering other measures of inequality. Battistin (2003), for instance, reports evidence on the Gini coefficient and various measures belonging to the Generalized Entropy family. This difference is particularly puzzling given the substantial stationarity of the difference in mean consumption in the two surveys over time. To make sense of the remarkably different pattern we observe in figure 17.1, we begin by analyzing simple explanations. In particular, we check whether the difference could be explained by (a) changes in questionnaires and survey methodology; (b) changes in the frequency of purchases of commodities; (c) changes in the willingness to answer surveys; and (d) changes in the differences in sample compositions.
• Changes in questionnaires and survey methodology. From official BLS documents, analysis of the questionnaires, and conversations with BLS staff, we could not identify any substantive change that would explain the observed differences. The only substantive change occurs in the IS for the question on food consumed at home, which changed in 1982 and 1988 (see Battistin 2003). The first change, in 1982, is outside our interval, and the second change, in 1988, precedes the point at which the two measures of inequality start to diverge (1991). Moreover, such an explanation would be difficult to square with the absence of changes in the difference of means.

• Increase in the number of zeros in the DS. A potentially attractive explanation is the following. Over time, people shop less frequently and purchase larger quantities in each shopping trip. As the horizon of the two surveys is different (two weeks for the DS, three months for the IS), the DS would result in the same mean (or a stable difference over time) but increasingly larger variances over time because of the increased number of zeros. Table 17.4 (see Battistin 2003) shows that the number of nonzero expenditures for nondurable items varies over time for some groups (particularly, it decreases for "alcohol," "tobacco," and "clothing") but with the same pattern across samples. If anything, the number of zeros seems to increase more in the IS than in the DS (see, for instance, "food" and "alcohol"). We can therefore confidently dismiss such an explanation.

• Over time people have become less willing to answer accurately or answer at all. Obviously, it is difficult to judge the importance of measurement error over time. However, one can check whether attrition rates or the fraction of incomplete income responses have changed substantially over time and differentially so for the two surveys. Anecdotal evidence shows that wealthier individuals are less willing to answer. Survey response rates did not change much over time, remaining between 80 and 90 percent. McCarthy et al. (2002) and Slesnick (2001) report evidence in this respect. The same is also true for the percentage of incomplete income responses. It is therefore unlikely that this could explain the different trend in inequality painted by the two CEX surveys.
Table 17.4 Percentage of zero expenditures

                                                  1986–89   1990–92   1993–95   1996–98   1999–2001
Interview sample
Food and nonalcoholic beverages at home             0.00      0.00      0.00      0.00      0.00
Food and nonalcoholic beverages away                0.10      0.12      0.12      0.13      0.15
Alcoholic beverages (at home and away)              0.37      0.43      0.44      0.49      0.52
Nondurable goods and services                       0.08      0.09      0.10      0.12      0.15
Housing and public services                         0.03      0.02      0.02      0.02      0.02
Tobacco and smoking accessories                     0.56      0.61      0.64      0.66      0.71
Clothing and footwear                               0.13      0.15      0.18      0.22      0.28
Heating fuel, light, and power                      0.11      0.09      0.08      0.07      0.07
Transport (including gasoline)                      0.02      0.02      0.02      0.02      0.03
Diary sample
Food and nonalcoholic beverages at home             0.02      0.01      0.02      0.02      0.02
Food and nonalcoholic beverages away                0.06      0.08      0.11      0.10      0.11
Alcoholic beverages (at home and away)              0.42      0.47      0.50      0.55      0.55
Nondurable goods and services                       0.17      0.18      0.22      0.23      0.26
Housing and public services                         0.34      0.35      0.36      0.37      0.37
Tobacco and smoking accessories                     0.57      0.62      0.67      0.68      0.71
Clothing and footwear                               0.26      0.27      0.30      0.32      0.36
Heating fuel, light, and power                      0.54      0.49      0.54      0.54      0.55
Transport (including gasoline)                      0.04      0.06      0.07      0.06      0.07
• Survey nonresponse. It is widely believed that the very rich and the very poor are overrepresented among households that refuse to take part in surveys. If so, we would possibly underestimate inequality because of underrepresentation of consumer units (CUs) in the upper and lower tail of the distribution of consumption. It might be that a larger number of wealthier individuals are being lost in the IS survey. While such a hypothesis has been suggested (by Sabelhaus [1996] among others), it is unlikely that it could explain the observed different pattern between IS and DS. We control for differences in sample composition by using our propensity score weights9 and do not find that these differences (or other differences in sample composition) can explain the different dynamics of inequality in the two samples.

• Changes in sample compositions. As we have already mentioned, the composition of the IS and DS is different, even after using the BLS weights. For instance, the DS is better educated than the IS. While these differences change marginally over time, we control for them by using propensity score weights and show that these differences cannot explain the different dynamics of inequality.

9. Propensity score weights are based on the specification described in section 17.2.3 and have been computed from the regression results presented in Battistin (2003).

Having discarded some simple explanations of the puzzling patterns observed in figure 17.1, we now look at the dynamics of inequality in the two subsets of goods we have defined as R goods and D goods in table 17.2. This is possible because both sets of goods are observed in both surveys. In figure 17.6, we plot the square of the coefficient of variation of D goods as measured in both surveys, while in figure 17.7, we plot the square of the coefficient of variation of R goods as measured in both surveys. Two features of these pictures are noteworthy. First, the coefficient of variation of D goods seems to be increasing in both surveys. On the other hand, the coefficient of variation of R goods increases slightly in the DS, while it stays constant in the IS after 1990. The drop observed in IS for 1986 and 1987 may be related to the change in the food question as discussed in Battistin (2003). The divergence in the path of measured coefficients of variation between the two surveys is particularly evident in the first part of the sample, until 1990. Second, for both sets of goods, the coefficient of variation is much larger in the DS than in the IS. This feature might be a consequence of the shorter horizon covered by the DS and the larger number of zeros documented in table 17.4.

Fig. 17.6 Squared coefficient of variation for D goods

Fig. 17.7 Squared coefficient of variation for R goods

17.4 Combining Information from Interview and Diary Samples

Rather than pursuing further the attempt to explain the difference between inequality measures observed in the IS and DS, in this section we propose a different approach. The main reason for the existence of two
different samples is that the BLS believes that different methodologies are more appropriate in measuring different commodities. Indeed, the weights for the CPI, as well as aggregate estimates produced by the BLS, ultimately combine information from the IS and the DS, in that some commodities are deemed to be better measured in the IS, while others are in the DS. In table 17.2 we listed which categories, according to the Bureau of Labor Statistics (2003), are better measured by each survey. Let $C^*$ be total expenditure on all nondurable commodities

$$C^* = C^*_D + C^*_R, \tag{1}$$

where $C^*_D$ and $C^*_R$ represent expenditures on items that are better measured in DS and IS data, respectively. Obviously, both surveys being (almost) exhaustive, a measure of "R goods" exists also in the DS and a measure of (most) "D goods" exists also in the IS.10

10. The reason for the qualifiers in parentheses is the existence of a small subset of commodities (mainly personal-care items) on which there is information in the DS and no information in the IS (as pointed out in table 17.1). We discuss the implications of the presence of these goods after we describe our approach.

More accurate estimates of average nondurable consumption can easily be obtained by combining information from the two surveys. Nevertheless, it is worth noting that straightforward pooling cannot be implemented as diary and recall expenditures are not observed for the same survey households. If one is interested in the variance of nondurable consumption, the problem is more complicated. In what follows, we will be interested in the squared coefficient of variation

$$\mathrm{CV}(C^*)^2 = \frac{\mathrm{Var}(C^*_D) + \mathrm{Var}(C^*_R) + 2\,\mathrm{Cov}(C^*_D, C^*_R)}{[E(C^*_D) + E(C^*_R)]^2} = \frac{\mathrm{Var}(C^*)}{E(C^*)^2}. \tag{2}$$

The reason for this choice is twofold. First, if total consumption is log normally distributed, the following relationship holds exactly:

$$\mathrm{Var}(\ln C^*) = \ln\left[1 + \frac{\mathrm{Var}(C^*)}{E(C^*)^2}\right],$$

so that the quantity in equation (2) is informative on the variance of log consumption. Battistin, Blundell, and Lewbel (2006) provide strong empirical evidence to support the fact that in a variety of data sets the cross-sectional distribution of consumption seems to be very well approximated by a log normal. Second, regardless of the distribution of consumption, the squared coefficient of variation provides a first-order approximation to the variance of log consumption and therefore is of some interest as an index of inequality.

Each household is either observed in the DS or in the IS. However, for each household, in both surveys, we observe expenditure on both D and R commodities. In what follows, we will denote by $C^d$ total nondurable expenditure as measured in the DS and by $C^r$ total nondurable expenditure as measured in the IS.11 Observed consumption in the two surveys is then given by

$$C^d = C^d_D + C^d_R, \qquad C^r = C^r_D + C^r_R. \tag{3}$$

As we mentioned in the preceding, the BLS believes that the DS measures accurately commodities in D, while the IS measures well commodities in R. We translate this assertion into the following extreme assumption.

Assumption 1: $C^d_D = C^*_D$, $C^r_R = C^*_R$.

Accordingly, the reporting error in DS and IS figures comes from $C^d_R$ and $C^r_D$, respectively
$$C^d = C^* + C^d_R - C^*_R = C^* + \upsilon^d_R, \tag{4}$$

$$C^r = C^* + C^r_D - C^*_D = C^* + \upsilon^r_D. \tag{5}$$

11. As mentioned in the preceding, for the time being we are ignoring the fact that some D goods are not observed in the IS. We tackle this problem in the following.
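The exact log-normal relationship that motivates equation (2), Var(ln C*) = ln[1 + Var(C*)/E(C*)^2], can be checked numerically from the closed-form moments of a log-normal variable. The sketch below is only an illustration: the parameter values are arbitrary and the function names are hypothetical. It also prints the squared coefficient of variation itself, which approximates the variance of logs to first order and exceeds it under log normality.

```python
import math


def lognormal_mean_var(mu, sigma):
    """Mean and variance of C when ln C is N(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, var


def squared_cv(mean, var):
    """Squared coefficient of variation, as in equation (2)."""
    return var / mean ** 2


def var_log_if_lognormal(cv2):
    """Var(ln C) implied by the squared CV under log normality."""
    return math.log1p(cv2)


for sigma in (0.3, 0.5, 0.7):
    mean, var = lognormal_mean_var(6.0, sigma)
    cv2 = squared_cv(mean, var)
    print(f"Var(ln C)={sigma ** 2:.4f}  "
          f"ln(1+CV^2)={var_log_if_lognormal(cv2):.4f}  CV^2={cv2:.4f}")
```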
If assumption 1 is satisfied, the mean of IS and DS errors is identified by $E(\upsilon^r_D) = E(C^r_D) - E(C^d_D)$ and $E(\upsilon^d_R) = E(C^d_R) - E(C^r_R)$, respectively. By analogy, the mean of total expenditure can be estimated by $E(C^*) = E(C^r_R) + E(C^d_D)$. Figures for IS and DS errors as proportion of total nondurable expenditure are reported in Battistin (2003), where implications on estimated saving rates from the IS are also discussed.12

Note that we do not require measurement error to be of the classical type. Indeed, classical measurement error (in particular, the absence of correlation between the measurement error and the true level of the relevant variable) together with assumption 1 would imply a larger measured variance of "R goods" in the DS and of "D goods" in the IS. This implication is obviously contradicted by figures 17.6 and 17.7.

As we can easily estimate the denominator of equation (2) by combining the two data sets, we will focus here on the estimation of the cross-sectional variance for a given group of individual households. From equation (2), it is clear that to estimate the variance of $C^*$ we lack an estimate of $\mathrm{Cov}(C^*_D, C^*_R)$. Equations (4) and (5) together with assumption 1 imply that observed covariances in the two surveys are informative about such a covariance as

$$\mathrm{Cov}(C^d_D, C^d_R) = \mathrm{Cov}(C^*_D, C^*_R) + \mathrm{Cov}(C^*_D, \upsilon^d_R),$$

$$\mathrm{Cov}(C^r_D, C^r_R) = \mathrm{Cov}(C^*_D, C^*_R) + \mathrm{Cov}(C^*_R, \upsilon^r_D).$$

Clearly, if we assumed that either $\mathrm{Cov}(C^*_D, \upsilon^d_R) = 0$ or $\mathrm{Cov}(C^*_R, \upsilon^r_D) = 0$, it would be possible to identify $\mathrm{Cov}(C^*_D, C^*_R)$, which is what we are interested in. However, notice that if we assumed that $\mathrm{Cov}(C^*_D, \upsilon^d_R) = 0$, we could test whether $\mathrm{Cov}(C^*_R, \upsilon^r_D) = 0$ and vice versa. Clearly, each of these two alternative tests is equivalent to testing whether $\mathrm{Cov}(C^*_D, \upsilon^d_R) = \mathrm{Cov}(C^*_R, \upsilon^r_D)$, as this is equivalent to

$$\mathrm{Cov}(C^r_D, C^r_R) - \mathrm{Cov}(C^d_D, C^d_R) = 0. \tag{6}$$

Equation (6) is a necessary but not sufficient condition for our identification assumption. If $\mathrm{Cov}(C^*_D, \upsilon^d_R) = 0$, we can identify $\mathrm{Cov}(C^*_D, C^*_R)$ from the DS, while if $\mathrm{Cov}(C^*_R, \upsilon^r_D) = 0$, we can identify it from the IS. Unfortunately, the restriction in equation (6) is rejected in our data (see Battistin 2003).

12. While the assumption of no measurement error in D commodities in the DS and of no measurement error in R commodities in the IS is extreme, it is made here only for expositional convenience. As we discuss later, it can be slightly relaxed without changing the substance of our argument.
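The moment calculations implied by assumption 1 are simple enough to write down directly. The following sketch computes the combined mean, the two mean reporting errors, and the left-hand side of equation (6) from household-level expenditures in the two surveys. It is an illustration under strong simplifications: the function and argument names are hypothetical, survey weights are ignored, and no standard error is produced, so it reports the size of the covariance gap rather than a formal test of it.

```python
from statistics import mean


def cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def combine_surveys(ds_D, ds_R, is_D, is_R):
    """ds_* are diary households, is_* are interview households."""
    return {
        # E(C*) = E(C^r_R) + E(C^d_D): each good from its preferred survey
        "mean_total": mean(is_R) + mean(ds_D),
        # E(v^r_D) = E(C^r_D) - E(C^d_D): mean error on D goods in the IS
        "mean_error_is": mean(is_D) - mean(ds_D),
        # E(v^d_R) = E(C^d_R) - E(C^r_R): mean error on R goods in the DS
        "mean_error_ds": mean(ds_R) - mean(is_R),
        # Cov(C^r_D, C^r_R) - Cov(C^d_D, C^d_R): zero under equation (6)
        "cov_gap": cov(is_D, is_R) - cov(ds_D, ds_R),
    }


# Toy data, three households per survey.
print(combine_surveys(ds_D=[1.0, 2.0, 3.0], ds_R=[2.0, 4.0, 6.0],
                      is_D=[2.0, 3.0, 4.0], is_R=[1.0, 2.0, 3.0]))
```

A formal version of the test would need a sampling distribution for the gap, for instance from resampling households within each survey, which is beyond this sketch.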
However, if we are only interested in the changes of the variance (or of the coefficient of variation) of consumption, we can use a weaker and less unappealing identification assumption. Because

$$\Delta \mathrm{Var}(\ln C^*) \approx \Delta\frac{\mathrm{Var}(C^*_R)}{E(C^*)^2} + \Delta\frac{\mathrm{Var}(C^*_D)}{E(C^*)^2} + 2\,\Delta\frac{\mathrm{Cov}(C^*_D, C^*_R)}{E(C^*)^2}, \tag{7}$$

where $\Delta$ denotes a change over time, it is easy to show that the last term on the right-hand side of this expression can be identified if we assume either that $\Delta[\mathrm{Cov}(C^*_D, \upsilon^d_R)/E(C^*)^2] = 0$ or that $\Delta[\mathrm{Cov}(C^*_R, \upsilon^r_D)/E(C^*)^2] = 0$. Once again, assuming the former we can test the latter, and vice versa. Once again, testing either of these assumptions is equivalent to testing the hypothesis that

$$\Delta\frac{\mathrm{Cov}(C^*_D, \upsilon^d_R)}{E(C^*)^2} = \Delta\frac{\mathrm{Cov}(C^*_R, \upsilon^r_D)}{E(C^*)^2},$$

which in turn is equivalent to the following hypothesis in terms of observables:

$$\Delta\frac{\mathrm{Cov}(C^d_D, C^d_R)}{E(C^*)^2} - \Delta\frac{\mathrm{Cov}(C^r_D, C^r_R)}{E(C^*)^2} = 0. \tag{8}$$
Such a hypothesis is a necessary but not sufficient condition for the point identification of $\Delta\mathrm{Var}(\ln C^*)$. A sufficient condition for the point identification of $\Delta\mathrm{Var}(\ln C^*)$ is the following:

Assumption 2: $\Delta\dfrac{\mathrm{Cov}(C^*_D, \upsilon^d_R)}{E(C^*)^2} = \Delta\dfrac{\mathrm{Cov}(C^*_R, \upsilon^r_D)}{E(C^*)^2} = 0.$

A nonrejection of the relationship in equation (8) does not guarantee that assumption 2 is correct: it is possible that the covariances of measurement error with the true value change in both surveys in the same way, while we need such covariances to be constant over time. It should be stressed, however, that the assumption of a homoscedastic measurement error is implicitly made by all studies that analyze changes in inequality over time.

As we mentioned in the preceding, for some commodities, mainly personal-care items, the only available information is that in the DS as the IS does not ask the relevant retrospective questions. This slightly complicates the approach sketched in the preceding. Equation (1) now becomes (9)
$$C^* = C^*_{Dr} + C^*_{Dp} + C^*_R,$$

where $C^*_{Dr}$ are "D goods" also available in the IS and $C^*_{Dp}$ are "D goods" available only in the DS. Equation (3) now becomes $C^d = C^d_{Dr} + C^d_{Dp} + C^d_R$. Equation (9) implies that one cannot obtain a complete estimate of average total nondurable consumption from the IS. Moreover, when one wants to compute the variance, one obtains
$$\mathrm{Var}(C^*) = \mathrm{Var}(C^*_{Dr} + C^*_{Dp}) + \mathrm{Var}(C^*_R) + 2[\mathrm{Cov}(C^*_{Dr}, C^*_R) + \mathrm{Cov}(C^*_{Dp}, C^*_R)]. \tag{10}$$

The first of the two covariances in the last expression can only be estimated under the assumption that $\mathrm{Cov}(C^*_{Dp} + C^*_{Dr}, \upsilon^d_R) = 0$. As before, if we are only interested in the changes in these covariances, we can use the less restrictive assumption that $\Delta\mathrm{Cov}(C^*_{Dp} + C^*_{Dr}, \upsilon^d_R) = 0$. However, we will not be able to test the hypothesis that $\Delta\mathrm{Cov}(C^*_{Dp} + C^*_{Dr}, \upsilon^d_R) = \Delta\mathrm{Cov}(C^*_R, \upsilon^r_D)$. In other words, if all the commodities in the DS were also observable in the IS, we could use either the covariance between D and R goods estimated in the DS or that estimated in the IS to compute the (changes in the) variance of total consumption. Now, at least for $\mathrm{Cov}(C^*_{Dp}, C^*_R)$, we are forced to use the DS. For $\mathrm{Cov}(C^*_{Dr}, C^*_R)$, however, we can use, as before, both surveys.

17.5 Results

In figure 17.8, we report two estimates of the evolution of consumption inequality. Both are derived using equation (10). However, the first is obtained using $\mathrm{Cov}(C^d_{Dr}, C^d_R)$ for $\mathrm{Cov}(C^*_{Dr}, C^*_R)$, while the second uses $\mathrm{Cov}(C^r_{Dr}, C^r_R)$. That is, the first measures the variance using
Fig. 17.8 Inequality growth using observed covariances

Table 17.5 Regressions using observed covariances

                            Whole sample                      1990s
                      Coefficient  Standard error   Coefficient  Standard error
Time                     0.1715        0.0291          0.0870        0.0666
Constant                 0.0112        0.0028          0.0191        0.0066
R2 (%)                         71.15                         17.57
No. of observations              16                            10
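$$\mathrm{Var}(C^{*}) = \mathrm{Var}(C^{d}_{Dr} + C^{d}_{Dp}) + \mathrm{Var}(C^{r}_{R}) + 2[\mathrm{Cov}(C^{r}_{Dr}, C^{r}_{R}) + \mathrm{Cov}(C^{d}_{Dp}, C^{d}_{R})],$$

while the second uses

$$\mathrm{Var}(C^{*}) = \mathrm{Var}(C^{d}_{Dr} + C^{d}_{Dp}) + \mathrm{Var}(C^{r}_{R}) + 2[\mathrm{Cov}(C^{d}_{Dr}, C^{d}_{R}) + \mathrm{Cov}(C^{d}_{Dp}, C^{d}_{R})].$$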
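As a rough illustration of how the two measures differ only in the source of one covariance term, the following sketch computes both from two simulated samples, one standing in for the DS and one for the IS. The data-generating process, the noise levels, and the names `simulate_true` and `variance_measures` are assumptions made for this example; nothing here uses the CEX data or reproduces the authors' code.

```python
import numpy as np

def cov(x, y):
    """Sample covariance of two 1-D arrays."""
    return float(np.cov(x, y, ddof=1)[0, 1])

def variance_measures(ds, is_):
    """Return the two measures of Var(C*): (DS covariance, IS covariance).

    ds  : dict of diary reports with keys 'Dr', 'Dp', 'R'
    is_ : dict of interview reports with keys 'Dr', 'R'
    """
    # Terms shared by both measures.
    common = (
        np.var(ds["Dr"] + ds["Dp"], ddof=1)   # Var(C^d_Dr + C^d_Dp), diary
        + np.var(is_["R"], ddof=1)            # Var(C^r_R), interview
        + 2.0 * cov(ds["Dp"], ds["R"])        # Cov(C^d_Dp, C^d_R), diary only
    )
    using_ds = common + 2.0 * cov(ds["Dr"], ds["R"])
    using_is = common + 2.0 * cov(is_["Dr"], is_["R"])
    return float(using_ds), float(using_is)

def simulate_true(n, rng):
    """Hypothetical true expenditures sharing one common component z."""
    z = rng.normal(size=n)
    c_dr = 1.0 + 0.5 * z + 0.3 * rng.normal(size=n)
    c_dp = 0.5 + 0.2 * z + 0.2 * rng.normal(size=n)
    c_r = 2.0 + 0.8 * z + 0.4 * rng.normal(size=n)
    return c_dr, c_dp, c_r

rng = np.random.default_rng(0)
n = 200_000

# Diary sample: D goods reported exactly, R goods with independent noise.
dr, dp, r = simulate_true(n, rng)
ds = {"Dr": dr, "Dp": dp, "R": r + 0.5 * rng.normal(size=n)}

# Interview sample (different households): R goods exact, Dr goods noisy.
dr, dp, r = simulate_true(n, rng)
is_ = {"Dr": dr + 0.5 * rng.normal(size=n), "R": r}

using_ds, using_is = variance_measures(ds, is_)
# Variance of total consumption implied by the simulation design.
true_var = 1.5 ** 2 + 0.3 ** 2 + 0.2 ** 2 + 0.4 ** 2
print(using_ds, using_is, true_var)
```

Because the simulated reporting errors are independent of true expenditures, both measures settle near the same value. A gap between them would open only if the errors covaried with true consumption differently in the two samples, which is the situation the test in table 17.5 is designed to detect.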
In addition to the raw estimate, we also plot a smoothed version of each measure obtained by a third-order moving average. The necessary condition in equation (8) is equivalent to the hypothesis that the distance between the two lines in equation (7) is constant. In table 17.5, we test this hypothesis by regressing the difference between the two lines on a constant and a linear trend and testing whether the coefficient on the linear trend is zero. Unfortunately, we can reject the null if we use the whole sample (see the left-hand-side panel). However, if we restrict the sample to the 1990s (see the right-hand-side panel), we do not reject the null. The changes in the DS questionnaire in the early 1990s might constitute a justification for restricting the sample to the 1990s.

A possible interpretation of figure 17.8 is that regardless of whether we use the DS- or the IS-based measure of $\mathrm{Cov}(C^{*}_{Dr}, C^{*}_{R})$, the evidence indicates a slight decline of inequality in the late 1980s followed by a sustained increase during the 1990s. The magnitude of the increase, however, is debatable. If we use estimates of $\mathrm{Cov}(C^{*}_{Dr}, C^{*}_{R})$ based on the DS, we obtain a larger increase than if we use estimates from the IS.13

If we decide that assumption 2 is a valid one, the two lines in figure 17.7 give us two measures of the changes in consumption inequality. An issue is therefore how to combine this information to obtain a single efficient measure. Figure 17.9 presents our estimates of the evolution of consumption inequality (from an arbitrary starting point) obtained from the entire CEX sample, as it results from combining efficiently the two lines in figure 17.8. In the figure we also plot confidence intervals (dotted lines).

13. By using stronger assumptions, one can also relax assumption 1 of no measurement error in some components of the survey. Attanasio, Battistin, and Ichimura (2004) spell out which assumptions are necessary to achieve identification in this case. The results we present in the following would be unaffected by this slightly different approach.
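The trend test behind table 17.5 can be sketched in a few lines. The function below regresses a series on a constant and a linear trend and returns the t-ratio on the trend; the example series is invented, and the function name `trend_test` is ours. The last two lines simply divide the coefficients reported in table 17.5 by their standard errors.

```python
import numpy as np

def trend_test(y):
    """OLS of y on a constant and a linear trend.

    Returns (slope, standard error of slope, t-ratio of slope).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)            # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance
    se = float(np.sqrt(cov_beta[1, 1]))
    return float(beta[1]), se, float(beta[1]) / se

# Invented series: a trend of 0.2 per period plus an alternating disturbance.
t = np.arange(16)
y = 1.0 + 0.2 * t + 0.1 * (-1.0) ** t
slope, se, t_ratio = trend_test(y)

# t-ratios implied by the coefficients and standard errors in table 17.5.
t_whole = 0.1715 / 0.0291
t_1990s = 0.0870 / 0.0666
print(round(t_whole, 2), round(t_1990s, 2))
```

The implied t-ratios are roughly 5.9 for the whole sample and 1.3 for the 1990s, which is why the null of a zero trend is rejected in the first case and not in the second.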
Fig. 17.9
Inequality growth using combined information
Denoting with $\hat{\theta}_{R}$ and $\hat{\theta}_{D}$ the estimates of equation (2) obtained using IS and DS covariances, respectively, we obtain the estimates in figure 17.9 as $\alpha\hat{\theta}_{R} + (1 - \alpha)\hat{\theta}_{D}$, where the value of $\alpha$ is chosen so as to minimize the variance of the estimator. The first-order condition gives

$$\alpha^{*} = \frac{\mathrm{Cov}(\hat{\theta}_{D}, \hat{\theta}_{D} - \hat{\theta}_{R})}{\mathrm{Var}(\hat{\theta}_{D} - \hat{\theta}_{R})},$$

which corresponds to the coefficient of an OLS regression of $\hat{\theta}_{D}$ on $\hat{\theta}_{D} - \hat{\theta}_{R}$. We estimate the optimal value of $\alpha$ for each time period t between 1986 and 2001 via resampling methods using 1,000 pseudosamples from the original data set. To improve the readability of figure 17.9, table 17.6 reports observed inequality growth over time from IS and DS data.

As we mentioned in the preceding, the level of inequality is not identified. Strictly speaking, the figure is only informative about changes in inequality over time. We pin down the level so as to place the initial level of inequality between the initial levels of inequality in the DS and IS in 1986. Perhaps not surprisingly, the estimates we obtain for the changes in inequality are in between the paths for the DS and the IS. However, having
Table 17.6 Inequality growth over time

Time        Interview    Diary    Estimated
1995–90        1.10       5.75       3.58
2000–95       –1.16       2.09       1.84
gone through the exercise of using optimally the information coming from the IS and the DS, the interesting question is a quantitative one: by how much does inequality increase according to our estimates? The answer is by a substantial amount. According to our results, inequality rises by about 5.4 percentage points over the 1990s. This result is economically very different from that reported by Krueger and Perri (2006) and from the change observed in the IS of approximately 1.0 percentage point over the same period.

17.6 Conclusions

This paper begins with the puzzle that, when following the evolution of consumption inequality during the late 1980s and the 1990s in the United States in the two available surveys, one obtains very different and contradictory patterns. We use the information that some components of consumption are better measured in the Diary survey while others are better measured in the Interview survey to obtain a new view on the pattern of inequality in the United States. Obviously, as we do not observe the same households in the two surveys, we can obtain a point estimate of the cross-sectional variance of total nondurable consumption only by making some assumptions on the nature of measurement error.

From our analysis, we conclude that consumption inequality has increased substantially more than what is indicated by the path of the standard deviation of log consumption in the IS (and substantially less than what is indicated in the DS). The increase of 5.4 percentage points is economically significant.

Our results are reasonably robust. Moreover, of the two assumptions we have used, we can conceivably relax the one that states there is no measurement error in some commodities in the DS and in other commodities in the IS. By assuming stationarity of the measurement error processes, we would obtain essentially the same results we have presented in this paper.
Assumption 2 is crucial to identify changes over time in consumption inequality, but the assumption does not help identification of its level. Without that assumption, one can try to put bounds on the unknown covariance between expenditure on items in R and D. Battistin (2003) shows that Cauchy-Schwarz bounds on the level of consumption inequality can be derived using assumption 1. By looking at these bounds, Battistin (2003) shows that the effect of reporting errors would have to be massive to overturn the finding of increasing inequality over time, even if assumption 2 were not satisfied. Battistin (2003) performs a sensitivity analysis with respect to the true value of the correlation coefficient between items in R and D, $\rho^{*}$ say. Inequality is found to be statistically increasing over time unless the effect of reporting errors on $\rho^{*}$ is such that observed correlations in IS and DS data and $\rho^{*}$ are more than 30 percent apart (in absolute terms).

This research is only the beginning of a more ambitious research project. First, we have only looked at nondurable consumption. It will be interesting to study the evolution of durable expenditure and, more important, consumption. The IS contains a remarkable amount of information on the stock of vehicles and housing, both of which are important components of consumption. Second, it will be interesting to relate directly wage and consumption inequality and their evolution. Such a study can be informative both about the nature of the shocks faced by individuals (temporary shocks to earnings are less likely to be reflected in consumption than permanent shocks) and about the mechanisms that individuals use to smooth out shocks.
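The efficient combination of the IS- and DS-based estimates used for figure 17.9 can be sketched as follows. Writing the two estimates as `theta_r` and `theta_d` and the weight on the IS-based estimate as `alpha` (labels chosen here), the variance-minimizing weight is the OLS slope of `theta_d` on `theta_d - theta_r`, computed across resampled draws. In this sketch the draws are generated directly from two independent normal distributions with made-up spreads, which stands in for actually resampling the two surveys; the function name `optimal_weight` is also ours.

```python
import numpy as np

def optimal_weight(theta_d_draws, theta_r_draws):
    """Weight alpha on the IS-based estimate that minimizes the variance of
    alpha * theta_r + (1 - alpha) * theta_d across the supplied draws."""
    d = np.asarray(theta_d_draws, dtype=float)
    diff = d - np.asarray(theta_r_draws, dtype=float)
    return float(np.cov(d, diff, ddof=1)[0, 1] / np.var(diff, ddof=1))

rng = np.random.default_rng(0)
n_draws = 1000  # number of pseudosamples

# Hypothetical resampled estimates for one period: the DS-based estimate is
# assumed noisier (sd 0.02) than the IS-based one (sd 0.01), and independent.
theta_d = 0.30 + 0.02 * rng.normal(size=n_draws)
theta_r = 0.30 + 0.01 * rng.normal(size=n_draws)

alpha = optimal_weight(theta_d, theta_r)
combined = alpha * theta_r + (1.0 - alpha) * theta_d

var_d = float(np.var(theta_d, ddof=1))
var_r = float(np.var(theta_r, ddof=1))
var_combined = float(np.var(combined, ddof=1))
print(alpha, var_d, var_r, var_combined)
```

With independent draws the weight approaches Var(theta_d) / [Var(theta_d) + Var(theta_r)], so the less noisy estimate receives the larger weight, and the combined series is never more variable across the draws than either input.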
References

Attanasio, Orazio P. 2003. Consumption inequality: What we know and what we can learn from it. Lecture presented at the annual meeting of the Society of Economic Dynamics, New York.
Attanasio, Orazio P., Erich Battistin, and Hidehiko Ichimura. 2004. What really happened to consumption inequality in the US? NBER Working Paper no. 10338. Cambridge, MA: National Bureau of Economic Research.
Attanasio, Orazio P., and Steven J. Davis. 1996. Relative wage movements and the distribution of consumption. Journal of Political Economy 104:1227–62.
Attanasio, Orazio P., and Guglielmo Weber. 1995. Is consumption growth consistent with intertemporal optimization? Evidence from the Consumer Expenditure Survey. Journal of Political Economy 103:1121–57.
Banks, James, and Paul Johnson, eds. 1998. How reliable is the Family Expenditure Survey? Trends in incomes and expenditures over time. London: Institute for Fiscal Studies.
Battistin, Erich. 2003. Errors in survey reports of consumption expenditures. IFS Working Paper no. W03/07. London: Institute for Fiscal Studies.
Battistin, Erich, Richard Blundell, and Arthur Lewbel. 2006. Puzzles of consumption and income distribution explained: Gibrat's law for permanent income. Institute for Fiscal Studies. Unpublished manuscript.
Battistin, Erich, Raffaele Miniaci, and Guglielmo Weber. 2003. What do we learn from recall consumption data? Journal of Human Resources 38:354–85.
Berman, Eli, John Bound, and Zvi Griliches. 1994. Changes in the demand for skilled labor within U.S. manufacturing: Evidence from the Annual Survey of Manufactures. Quarterly Journal of Economics 109:367–97.
Blundell, Richard, Luigi Pistaferri, and Ian Preston. 2002. Partial insurance, information and consumption dynamics. IFS Working Paper no. W02/16. London: Institute for Fiscal Studies.
Blundell, Richard, and Ian Preston. 1998. Consumption inequality and income uncertainty. Quarterly Journal of Economics 113:603–40.
Bureau of Labor Statistics. 2003. Consumer expenditures and income. In Handbook of methods. http://www.bls.gov/opub/hom/homch16_2.htm.
Cochrane, John H. 1991. A simple test for consumption insurance. Journal of Political Economy 99:957–76.
Cutler, David M., and Lawrence F. Katz. 1991. Macroeconomic performance and the disadvantaged. Brookings Papers on Economic Activity, Issue no. 2:1–74. Washington, DC: Brookings Institution.
Deaton, Angus S., and Cristina H. Paxson. 1994. Intertemporal choices and inequality. Journal of Political Economy 102:437–68.
Hirano, Keisuke, Guido W. Imbens, and Geert Ridder. 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–89.
Krueger, Dirk, and Fabrizio Perri. 2006. Does income inequality lead to consumption inequality? Evidence and theory. Review of Economic Studies 73 (1): 163–93.
Mace, Barbara J. 1991. Full insurance in the presence of aggregate uncertainty. Journal of Political Economy 99:928–56.
McCarthy, Mary E., David Johnson, Thesia I. Garner, and Bill Passero. 2002. Issues in construction and research use of the Consumer Expenditure Survey. Paper presented at the 2002 NBER Summer Institute, Cambridge, MA.
Sabelhaus, John. 1996. Consumer Expenditure Survey: Family-level extracts, 1980:1–1994:1. Washington, DC: Congressional Budget Office. http://www.nber.org.
Slesnick, Daniel T. 1992. Aggregate consumption and saving in the United States. Review of Economics and Statistics 74:585–97.
Slesnick, Daniel T. 1993. Gaining ground: Poverty in the postwar United States. Journal of Political Economy 101:1–38.
Slesnick, Daniel T. 2001. Consumption and social welfare: Living standards and their distribution in the United States. Cambridge, UK: Cambridge University Press.
Townsend, Robert M. 1994. Risk and insurance in village India. Econometrica 62:539–91.
Tucker, Clyde. 1992. The estimation of instrument effects on data quality in the Consumer Expenditure Survey. Journal of Official Statistics 8:41–61.
18 Technology Adoption from Hybrid Corn to Beta-Blockers Jonathan Skinner and Douglas Staiger
Jonathan Skinner is the John French Professor of Economics and professor of community and family medicine at Dartmouth College, and a research associate of the National Bureau of Economic Research. Douglas Staiger is professor of economics at Dartmouth College, and a research associate of the National Bureau of Economic Research.
We are grateful to Martin N. Baily, Ernst Berndt, Bronwyn Hall, Christopher Jencks, and to participants in the NBER Conference on Research in Income and Wealth on Hard-to-Measure Goods and Services: Essays in Memory of Zvi Griliches in September 2003, as well as seminar participants at Duke and Princeton Universities and the University of Michigan. Weiping Zhou and Daniel Gottlieb provided superb data analysis. This research was funded by the National Institute on Aging PO1-AG19783.

18.1 Introduction

The idea that differential adoption of new technology can explain productivity differences across regions has gained acceptance in the economics literature. Most of the variation in income per capita across countries appears to come from differences in total factor productivity (TFP) rather than differences in inputs such as capital or labor (Klenow and Rodriguez-Clare 1997; Caselli and Coleman 2006), and income differences have been directly related to differences across countries in technology adoption (Comin and Hobijn 2004). Economic models such as that proposed by Parente and Prescott (1994) imply that relatively small differences across countries in barriers to technology adoption could explain much of the difference in income levels and growth across countries. In medicine, the recent emphasis on accountability has documented a similarly wide variation in medical practice, with most observers agreeing that improvements in compliance with well-established guidelines could lead to substantial improvements in health outcomes of lagging regions (Berwick 2003; Jencks, Huff, and Cuerdon 2003; Skinner 2003).

The real question is why some countries or regions are successful in adopting efficient technological innovations and why others lag behind. The classic research in economics goes back to the study of state-level hybrid corn diffusion during the 1930s and 1940s by Zvi Griliches (Griliches 1957). For Griliches, the answer was clear: the presence of differential economic incentives and the profitability of innovation determined the wide variations in the pace of diffusion.1 However, this view of the diffusion of technology was not shared universally. During the early 1960s, there was a long and contentious debate between Griliches and sociologists in the pages of Rural Sociology on what caused technology to diffuse. In this exchange, the sociologists, including Everett Rogers, emphasized characteristics of individual decision makers, the structure of networks, and interactions among decision makers in regulating the speed and extent of diffusion (Brandner and Straus 1959; Griliches 1960, 1962; Havens and Rogers 1961; Rogers and Havens 1962; Babcock 1962). While Griliches ultimately acknowledged the potential importance of these sociological factors (Griliches 1962), the literature on diffusion in sociology and the parallel literature in economics has shown little cross-fertilization, at least until quite recently.2

In this paper, we return to the debate between Griliches and the sociologists by considering a number of technological innovations in addition to hybrid corn, with a particular focus on the treatment of heart attacks (acute myocardial infarction [AMI]) in the 1990s. Between 1984 and 1998, mortality among the elderly in the year following a heart attack fell from 41 percent to 32 percent, with most of the gain representing progress in medical technology (Cutler 2004; Cutler and McClellan 2001).
Despite these gains, there remain large and persistent differences across regions in the survival rates of observationally identical heart attack patients, differences that cannot be explained by variations in expenditures or surgical rates of diffusion (Skinner and Staiger 2006; Skinner, Staiger, and Fisher 2006). These patterns suggest that, like hybrid corn, the diffusion of new innovations in the treatment of heart attacks has been uneven, with some regions lagging behind others. In a previous study, we found that early adopter states of hybrid corn in the 1930s and 1940s were also the early adopter states of beta-blockers for the treatment of heart attacks in 2000 to 2001 (Skinner and Staiger 2006). Beta-blockers are a highly effective and inexpensive off-patent drug used among appropriate AMI patients to reduce demands on a weakened heart. Clinical trials indicate that the use of beta-blockers reduces mortality by 25 percent or more (Gottlieb, McCarter, and Vogel 1998; Yusuf, Wittes, and Friedman 1998), and their beneficial effects have been well understood for several decades (Yusuf et al. 1985).

1. Hall (2004) provides an excellent synthesis of both the economic and sociological literature on diffusion.
2. See, for example, Calvó-Armengol and Jackson (2004) and Conley and Udry (2004).

In this paper, we expand our set of technological innovations to include not just hybrid corn and beta-blockers, but also tractors in the first half of the twentieth century, computers in the 1980s and early 1990s, and more general measures of technological progress in health care: changes and levels in one-year mortality and one-year expenditures for AMI patients during 1989 to 2001. Briefly, we find a strong state-level association in the likelihood to innovate in all technologies, ranging from tractors to beta-blockers.

We next use a factor analysis approach to ask what characteristics of the state are most strongly associated with technological innovation in general and which are most closely associated with higher costs for treating heart attacks. Conventional economic explanatory variables include per capita income and population density to reflect potential geographic spillovers. We also include high school attainment in 1928 (Goldin and Katz 1998) and a measure that reflects social networks or social capital more generally (Putnam 2001), a mechanism stressed in the early work on the diffusion of medical technology (Coleman, Katz, and Menzel 1966).

Nearly all of the variables considered in the analysis line up along two latent factors. The first latent factor is highly associated with education (correlation 0.95) and the Putnam social capital index (correlation 0.97). This first factor is also closely associated with the adoption of hybrid corn, tractors, beta-blockers, and both the level and decline in AMI mortality, with somewhat weaker correlation for computer ownership, but is not associated with increased costs for the treatment of heart attacks. The second latent factor is strongly associated with per capita income (correlation 0.84) and population density (correlation 0.89) in the state.
While this second factor is largely unrelated to technological diffusion or quality of health care, it is closely associated with higher health care costs (correlation 0.92).

In sum, there appear to be common factors or barriers within regions governing the adoption of cost-effective technology, whether corn, tractors, computers, or beta-blockers, and these are associated with the presence of education (or more effective social or informational networks) in the region. Yet the adoption of these effective technologies was not associated with either per capita income or money spent treating heart attack patients. While Griliches (1957) developed powerful economic approaches to predict early technology adoption, sociological or network models may hold greater promise in explaining why some regions seem to lag so far behind.

18.2 Griliches versus the Sociologists on Technological Diffusion

Griliches (1957) hypothesized that farmers who stand to gain greater profits from an increase in productivity (per acre) would be more likely to adopt the use of hybrid corn, a type of corn that increased yield by about
20 percent. Noting that the dollar value of investing in hybrid corn was larger for agricultural land that was already more productive prior to adoption, he estimated a strong and significant state-level (and crop reporting area) association between adoption and the initial value of agricultural land (see also Dixon [1980], who found similar results). Griliches acknowledged the potential importance of sociological factors in affecting diffusion but dismissed them as relatively unimportant:

It is my belief that in the long run, and cross-sectionally, these [sociological] variables tend to cancel themselves out, leaving the economic variables as the major determinants of the pattern of technological change. This does not imply that the "sociological" variables are not important if one wants to know which individual will be the first or last to adopt a particular technique, only that these factors do not vary widely cross-sectionally. (522, footnote 45)

Sociologists were not entirely pleased with this assessment of their work. They attempted to challenge the Griliches results on statistical grounds by focusing on time series rather than cross-sectional analysis, but this approach was not entirely successful because, by appropriate correction for inflation, Griliches (1962) reversed their result to one supporting his profitability hypothesis. More successful was their study showing that hybrid sorghum tended to be adopted not so much where adoption was more profitable, but by farmers who had previously adopted hybrid corn, and was thus supportive of the idea of "congruence," that innovation was easier when one has innovated in the past.
The sociologists focused more on the observation that diffusion took place too slowly to conform to a purely economic model:

The acceptance of improved farming practices is determined largely by economic considerations yet, if economic considerations were the only basis of acceptance, improved practices would be adopted as rapidly as their economic advantages were demonstrated. But, not only is there a considerable lapse of time between initial acquaintance and adoption of a practice, but those who would benefit most from improved practices are frequently the last to adopt them. (Wilkening [1953] quoted in Havens and Rogers [1961, 410])

Ultimately, Griliches grudgingly acknowledged the importance of sociological factors in diffusion, observing that "If one broadens my 'profitability' approach to allow for differences in the amount of information available to different individuals, differences in risk preferences, and similar variables, one can bring it as close to the 'sociological' approach as one would want to" (Griliches 1962, 330). While Griliches may have ultimately rejected the either-or approach of the earlier debate, others, such as Schultz ([1964, 164], quoted in Dixon [1980]) viewed his work as demonstrating the primacy of economic over sociological models in development economics.
Economic models of diffusion since then have largely remained firmly within the context of profitability and relative advantage as the most important factors determining the adoption of new technologies, as in the two models developed in Jovanovic and Nyarko (1995, 1996) where the relative advantage of the individual agent, for example, his past investments in specific technologies, determines whether a new technology is adopted. The common feature in these models is the absence of individuals interacting with others except indirectly through market forces. The literature in health economics has generally been more sympathetic to the idea of network effects on physician practice style (Fournier, Prasad, and Burke 2002), but even these are often posed in the form of individual Bayesian learning models (for a review, see Phelps 2000). Exceptions are Foster and Rosenzweig (1995), where farmers learned about their own optimal agricultural inputs by observing what other farmers do (knowledge spillovers), and Berndt, Pindyck, and Azoulay (2003), who allowed for the possibility of network effects in the prescription of antiulcer drugs. More recently, Conley and Udry (2004) were able to establish convincingly the importance of social networks in effecting these knowledge spillovers. They found that in Ghana, pineapple farmers emulated input choices of the more successful farmers inside their social network.

The medical quality-improvement movement has adopted a sociological framework to understand why highly cost-effective treatments such as screening for breast cancer, annual flu shots, and the use of beta-blockers lag behind ideal levels (Berwick 2003). In this literature, physicians are categorized into groups such as early adopters and laggards, and the focus is on individual "change agents" who can shift treatment norms within hospitals (Berwick 2003; see Rogers 1995). Their emphasis on sociological factors is not entirely surprising.
There are few if any economic incentives to prescribe beta-blockers or flu shots, so standard economic models of learning by doing or Bayesian updating could have limited predictive power in explaining diffusion (Skinner and Staiger 2006).

What makes the lag in beta-blocker adoption puzzling is that the clinical benefits have been understood for many years. In 1985, Yusuf et al. concluded in their abstract that "Long term beta blockade for perhaps a year or so following discharge after an MI is now of proven value, and for many such patients mortality reductions of about 25% can be achieved." In 1996, the National Committee for Quality Assurance adopted beta-blocker use as a marker of quality (Berndt 2001), yet median state compliance with appropriate beta-blocker use in 2000 to 2001 was still just 69 percent (Jencks, Huff, and Cuerdon 2003). For the same reason, it would be difficult to develop an economic model explaining why the use of X-rays for fractures lagged for several decades in the first part of the twentieth century (Howell 1995) or why more than a century passed between the discovery that lemons and limes prevented scurvy
and the date when the British Navy mandated their use in sailors' diets (Berwick 2003).

Education is an important variable considered by all empirical studies of diffusion, but the interpretation of how education matters differs between the economists and the sociologists. For much of the economic literature, education is a measure of human capital, and so innovation may be more likely to take place depending on the nature of the production function and the elasticity of substitution between skilled labor and capital (e.g., Caselli and Coleman 2006). The sociological literature, on the other hand, places greater emphasis on education as affecting the ability of "change agents" and early adopters to process and understand the value of the new technology and to be able to make the jump to a new paradigm (Rogers 1995).

While one might expect differences across farmers in their education levels to affect innovation in hybrid corn, it is more difficult to understand how average education levels of the general population would matter for the use of medical procedures, as each physician has completed at least twenty years of formal education.3 Alternatively, education might be associated with more rapid innovation if the demand for innovations were higher among highly educated patients. Explaining the pattern of beta-blocker use in the twenty-four hours following hospital admission as variations in demand by patients would be difficult. The typical Medicare patient is unlikely to be aware of beta-blockers at all, and, among ideal candidates, its use should have been near 100 percent in any case regardless of demand. An alternative hypothesis is that better quality physicians are attracted to regions with higher income and hence greater demand for quality, although we do not find evidence for this in the empirical section below.
The classic work in the diffusion of medical technology is Coleman, Katz, and Menzel (1966), who document the diffusion of a new antibiotic, tetracycline, among physicians in several North Central communities. While the diffusion was relatively rapid, with nearly 90 percent of physicians using the drug within seventeen months, there were distinct differences in rates of adoption. Physicians with more interactions among other physicians, such as affiliation with a hospital or group practice, were more likely to adopt early. Importantly, physicians who reported high social participation in the medical community adopted earlier. Among those with low social participation, even after seventeen months the use of the more effective antibiotic lagged by roughly 15 percent (Coleman, Katz, and Menzel 1966, 49). More recent analysis of the original data suggests a more complex mechanism for diffusion, but "contagion" still appears to have played a role (Strang and Tuma 1993).

3. Specialists have more formal education, but the correlation between cardiologists per capita and beta-blocker use across regions is near zero (Skinner 2003).
These results suggest an important interaction between regional levels of education and the extent of social participation or networks (see Barr 2000). The pathway is more indirect as education is highly predictive of social participation in the community, participation in democratic activities, and other factors conducive to the interactions that can lead to more rapid diffusion among health care providers within a community.4 In practice, we use indexes of social capital and education in the 1990s from Braatz and Putnam (1996). While the idea of social capital has received increasing attention in economics (e.g., Helliwell 1996; Guiso, Sapienza, and Zingales 2004), economists have also been among the most critical; see Durlauf and Fafchamps (2004) for a distinctly lukewarm review of this approach. Our use of social capital, however, avoids their most serious objections in that we need not identify causality, nor need we do more than to establish that some "factor x" associated with high levels of social capital (controlling for income) is also associated with technological innovation. As it turns out, general education levels and even high school graduation rates in 1928 (Goldin and Katz 1998) are highly predictive of these innovations, but reliance upon education alone begs the question, as noted previously, of why beta-blockers were more likely to diffuse in states with higher graduation rates.

We begin the empirical analysis by considering at the state level the association among four technological innovations: hybrid corn, tractors, computers, and beta-blockers. Simple correlations between innovation in hybrid corn and tractors are consistent with either economic or sociological models as the farmers gaining most from hybrid corn adoption would presumably enjoy similar gains from adopting tractors.
A correlation between hybrid corn and computers is less likely, although to be fair, farmers were early adopters of computers to keep track of plot yields and animal medical records.5 Predicting a positive correlation between the adoption of beta-blockers and the adoption of these other technologies is not so obvious, as the educational attainment of physicians does not differ much across states, nor is it easy to develop a rational economic model as to why physicians wouldn't adopt highly effective treatments that save lives but only cost about twenty cents per dose.

To develop a broader picture of technology gains in the treatment of heart attacks, we also consider risk-adjusted one-year mortality rates and risk-adjusted Medicare expenditures by state, whether in levels or in changes during the period of analysis 1989–2000. Mortality rates reflect a more general measure of technological innovation that goes beyond beta-blocker adoption. Because these rates are derived conditional on the index AMI and a variety of covariates are included to control for underlying health differences, they are less likely to be confounded by the general health levels of the state.6 One-year Medicare expenditures capture the total value of inputs to care, given that the Medicare program effectively pays on the basis of inputs (i.e., procedures performed by the physician or subsequent admissions to the hospital) rather than on quality of outcomes. Variations in expenditures capture both differences in total inputs across states and over time as well as differences in reimbursement rates.7

The obvious question is, why do some states lag behind in adoption of these innovations? Broadly, there are two ways that we might answer this question. On the one hand, there are factors that are unique to medical care such as regulation, health insurance, or malpractice laws. Alternatively, in some states there may be broad, general barriers to technology adoption such as discount rates, risk aversion, lack of human capital and information networks, or cultural norms (e.g., Strang and Soule 1998). These types of barriers would not be unique to medical care and would imply that some states would be late to adopt a wide variety of innovations.

4. Coleman, Katz, and Menzel (1966, 43) also find differences in rates of adoption depending on the location of the physician's medical school, with those attending school in the North Central states being more likely to adopt. See also Strang and Tuma (1993).
5. We are grateful to Ernst Berndt for pointing this out. This pattern is consistent with the international data (e.g., Comin and Hobijn 2004); see also Skinner and Staiger (2006).

18.3 The Factor Analysis Model

We adopt a factor analysis approach to quantify the relative importance of state-level factors that could plausibly be associated with technological diffusion. The factor model assumes that all of the state-level variables are linear combinations of a few unobserved factors (representing the state-level characteristics that are barriers to adoption), plus an independent error (Harmon 1967). Thus, the correlations that are observed between our different measures of adoption are assumed to be generated by a few common factors.
Furthermore, we can estimate whether any observed state characteristic is strongly correlated with the common factor—that is, are there characteristics of states that are strongly associated with adoption and can account for the common adoption patterns that we observe across technologies. More specifically, a factor model assumes that the data we observe for each state (Y ) depend on a small-dimensional set of latent factors (F ):
6. In other words, Louisiana may have higher rates of heart disease than Vermont, but this does not imply that adjusted mortality in Louisiana should be higher conditional on the heart attack. For example, in the empirical section, we find no association between adjusted AMI mortality and per capita income. 7. There are also marked differences in Medicare reimbursement rates across regions because of cost-of-living variation and, more important, government programs such as graduate medical education and disproportionate share hospitals programs. Thus, cross-sectional comparisons in expenditures will reflect these permanent differences, while changes in expenditures will reflect only secular changes in reimbursement rates.
Technology Adoption from Hybrid Corn to Beta-Blockers
553
Yj Fj j where Fj is a 1 j vector of factors with dimension less than Y, is a matrix of coefficients (the factor loadings) giving the effect of the factors on the elements of Y, and j is a vector of independent residuals with Var() a diagonal matrix. Without loss of generality we normalize the factors to be independent with Var(F ) I, and normalize the observed data so that each variable has Var(Y ) 1. Thus, the factor loadings () can be interpreted as the increase in Y (in SD units) associated with a one standard deviation increase in each factor. Moreover, with this normalization the factor loadings are equal to the correlation between Y and each factor so that a factor loading near 1 suggests a strong association between Y and a given factor. The factor structure implies that E(YY ) Var(Y ) . Thus, the parameters of the factor model (, ) can be estimated by fitting the parameters of the factor model to the observed covariance matrix. There are a variety of methods for doing this. We assume normality and estimate the parameters by maximum likelihood (other methods yield similar results). The main advantage of the factor model is that it summarizes the observed correlation in the adoption measures with only a few factors and thereby focuses attention on a few simple dimensions on which adoption varies. Moreover, if the unobserved factors are strongly correlated with other state characteristics such as education, income, or social capital, then the factors can be given a natural interpretation. There are also a number of important limitations of the factor model. First, the estimated factor loadings are only unique up to an orthonormal rotation. We report estimates from the varimax rotation, which is commonly used in factor analysis because it maximizes the amount that each outcome is associated with a unique factor, thus helping with interpretation of the factors. A second limitation of the factor model is that the appropriate number of factors is not obvious. 
We choose the number of factors empirically, based on significance of additional factors and the robustness of results.
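
As a rough illustration of what the normalization implies, the sketch below takes the two-factor loadings reported in table 18.2 for three of the variables. Under the stated assumptions (independent factors with unit variance, each observed variable scaled to unit variance), the share of a variable's variance explained by the factors is the sum of its squared loadings, and the model-implied correlation between two different variables is the inner product of their loading vectors. The dictionary keys and function names are hypothetical, chosen only for illustration, and nothing here re-estimates the model.

```python
# Sketch only: arithmetic implied by the normalized factor model,
# assuming Var(F) = I and Var(Y) = 1 for each observed variable.
# Loadings are the (factor 1, factor 2) pairs reported in table 18.2;
# the names below are illustrative, not taken from the chapter.

LOADINGS = {
    "ami_mortality": (-0.82, 0.20),
    "beta_blocker_use": (0.70, 0.20),
    "social_capital": (0.97, -0.14),
}


def communality(loading):
    """Share of a variable's variance explained by the common factors (R2)."""
    return sum(value * value for value in loading)


def implied_correlation(loading_a, loading_b):
    """Model-implied correlation between two distinct variables."""
    return sum(x * y for x, y in zip(loading_a, loading_b))


for name, loading in LOADINGS.items():
    r2 = communality(loading)
    print(f"{name}: R2 = {r2:.2f}, uniqueness = {1 - r2:.2f}")

rho = implied_correlation(LOADINGS["ami_mortality"], LOADINGS["beta_blocker_use"])
print(f"implied corr(mortality, beta-blocker use) = {rho:.2f}")
```

The R2 values computed this way (about 0.71, 0.53, and 0.96) agree with the third column of table 18.2. The implied correlation between mortality and beta-blocker use (about –0.53) is close to the raw correlation of –0.58 reported in section 18.5.2; the two need not coincide, since the factor model is fit jointly to all fourteen variables.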
18.4 Data

Here we describe each of the state-level variables used in the empirical analysis.

18.4.1 Measures of Nonmedical Technology Adoption

We use two state-level measures of technology adoption in agriculture. For hybrid corn adoption, Griliches estimated logistic models for each state of the form $\ln(P_t/(K - P_t)) = a + bt$, where $P_t$ represented the fraction of land planted in hybrid seed in year $t$, $K$ the maximum fraction of land to
be planted in hybrid seed, $b$ the parameter that captures the speed of adoption, and $a$ the intercept. The parameters $K$, $a$, and $b$ were estimated by least squares for each state using annual data on the use of hybrid corn for years ranging from 5 percent to 95 percent adoption (more precisely, $K$ was first chosen by a crude visual method, and then the parameters $a$ and $b$ were estimated by ordinary least squares [OLS]). We use the time (measured relative to 1940) at which the state attained a 10 percent level of hybrid corn; this was given by $(-2.2 - a)/b$, and was reported in table 1 of Griliches (1957) for the thirty-one states on which he had data.

We used an identical methodology to obtain an estimate of the year in which each state attained at least 10 percent levels of use for tractors. Data on the proportion of farms in each state using tractors in the years 1920, 1925, 1930, 1940, 1945, 1949, 1954, and 1959 were taken from Olmstead and Rhode’s (2000) analysis of the diffusion of tractors.8 We assumed that tractors would eventually be adopted by all farms ($K = 1$) and estimated the parameters $a$ and $b$ by OLS for each state.9

Finally, using data from the 1993 Current Population Survey, we estimated the proportion of the population of each state that had a computer in their home. Unlike our industry-specific measures of adoption in medical care and agriculture, computer ownership captured the adoption of a new consumer good in the general population.

18.4.2 Medicare Claims Data on Treatment of Heart Attacks

Acute myocardial infarction makes a good case study to use in studying health care productivity. Nearly every AMI patient is admitted to a hospital, and ambulance drivers are generally instructed to drive to the nearest hospital, so the match between hospital and AMI patient is rarely driven by consumer demand.
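
The logistic adoption curve of section 18.4.1 can be sketched in a few lines. The code below is a minimal illustration, not a reproduction of Griliches's procedure: it holds K fixed in advance (K = 1, as assumed in the text for tractors), fits a and b by OLS on the transformed shares, and solves for the date at which adoption reaches a given share. The function names and the synthetic adoption path are hypothetical.

```python
import math


def logit_share(p, k=1.0):
    """Transformed share ln(P / (K - P)), the left-hand side of the logistic curve."""
    return math.log(p / (k - p))


def fit_logistic(years, shares, k=1.0, base_year=1940):
    """OLS of the transformed share on time (relative to base_year), with K held fixed."""
    t = [year - base_year for year in years]
    z = [logit_share(p, k) for p in shares]
    n = len(t)
    t_bar, z_bar = sum(t) / n, sum(z) / n
    slope_num = sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z))
    slope_den = sum((ti - t_bar) ** 2 for ti in t)
    b = slope_num / slope_den
    a = z_bar - b * t_bar
    return a, b


def year_reaching(share, a, b, k=1.0, base_year=1940):
    """Calendar year at which the fitted curve reaches the given adoption share."""
    return base_year + (logit_share(share, k) - a) / b


# Synthetic adoption path generated from known, purely hypothetical
# parameters a = -1.0 and b = 0.8 (not estimates for any actual state).
years = list(range(1936, 1945))
shares = [1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * (year - 1940)))) for year in years]

a_hat, b_hat = fit_logistic(years, shares)
year_10pct = year_reaching(0.10, a_hat, b_hat)
print(a_hat, b_hat, year_10pct)
```

With K = 1 the 10 percent threshold is ln(0.1/0.9), roughly –2.2, which matches the constant in the expression (–2.2 – a)/b used for the date of 10 percent adoption.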
The outcome, survival, is accurately measured, as are measures of inputs and covariates, whether reflecting regional measures in the use of pharmaceutical treatments or demographic and health information (including the type of heart attack).10

Data on the mortality and cost of treating heart attacks (AMIs) are constructed from the Medicare Claims data from 1989 to 2000; these data are described in greater detail in Skinner and Staiger (2006). They include 20 percent of Medicare admissions for 1989 to 1991 and 100 percent from 1992 to 2000. We use every fee-for-service heart attack admission to create a longitudinal cohort of 2.5 million fee-for-service enrollees age sixty-five or over coded with AMI.11 We created risk-adjusted one-year and inflation-adjusted increases in Medicare expenditures that control for a variety of demographic factors and comorbidities.12 Thus, differences across states and over time in demographic composition, severity of AMI, and health status are accounted for in both mortality and expenditure measures. We assigned patients to states based on their residence (rather than where they were treated) and use linked death certificate data to see whether patients survived a one-year window. Payments by Medicare for the initial and all subsequent inpatient (Part A) expenditures were used to construct measures of one-year hospital costs for each patient. To construct overall measures of risk-adjusted mortality and costs, we averaged the measures for each state across the years 1989–2000. To construct measures of the change in risk-adjusted mortality and costs, we used the change from 1989–1991 to 1999–2000. Overall, in our data, 35 percent of Medicare heart attack victims died within one year of their heart attack, and the average cost of treatment was just under $13,000 in 2000 dollars.

8. The data on tractor use were originally derived from U.S. Bureau of the Census (1962, 214). We thank Paul Rhode for providing these data.
9. Dixon (1980) reestimated the original Griliches (1957) diffusion parameters to account for what ultimately turned out to be a 100 percent adoption level in each state for hybrid corn but found results consistent with the original analysis.
10. Skinner, Staiger, and Fisher (2006) consider in more detail the role of surgical innovations in both improvements in outcomes and increasing costs.

18.4.3 Other Data on the Quality of Medical Care

We obtained from Jencks, Huff, and Cuerdon (2003) additional measures of the quality of medical care in 2000 to 2001 for each state. These measures were derived by the Center for Medicare and Medicaid Services (CMS) from large state-wide random samples of medical records. Jencks, Huff, and Cuerdon reported information for twenty-two quality indicators for each state, ranging from the fraction of the eligible population who receive mammograms or flu immunizations to the proportion of stroke patients who were prescribed a recommended medication at discharge. Thus, these measures were intended to capture the extent to which each state had adopted current medical guidelines across a range of conditions.
We use the overall ranking (based on the average ranking across the twenty-two indicators, with 1 being the best) as a general indicator of how close a state is to the frontier in terms of its adoption of medical guidelines. As a specific measure of innovation in the treatment of heart attacks, we also used the percent of heart attack patients who receive beta-blockers within twenty-four hours of admission; as noted in the preceding, median state-level compliance was just 69 percent.

11. Luft (2003) found modest differences in underlying measurable health status between fee-for-service and the health maintenance organization (HMO) population. However, the survival characteristics of AMI patients in HMOs, conditional on the observed comorbidities in his (and our) data, were similar to those in the fee-for-service programs.
12. These include demographic cells by sex, five-year age groups, and race (black or nonblack), and categorical variables for the type of heart attack (anterior, inferior, subendocardial, other), the presence of vascular disease, pulmonary disorders, dementia, diabetes, liver dysfunction, renal failure, cancer (nonmetastatic), and cancer (metastatic). In this data set, we exclude patients reporting previous heart attacks, but to avoid potential secular biases, we did not exclude people appearing twice in the data set.
18.4.4 State-Level Factors That May Influence the Rate of Adoption

In addition to state-level measures of the rate of adoption of various innovations, we considered four variables that may capture differences across states that influence the rate of adoption. Because income could affect the demand for innovative products, we include per capita income based on the 2000 Census. Many theories would suggest that density of the market could influence the rate of innovation because higher density would be associated with greater potential gains from an innovation. Therefore, we use each state’s population density in 2000 (also from the Bureau of the Census). The third factor, education, was measured using two different approaches. The first is whether a state invested early in high school education. As documented in Goldin and Katz (1998), the “high school movement” occurred much earlier in certain regions of the country, and this early development has had lasting effects on each region. As an indicator of which states were early adopters of high school education, we used the number of public and private high school graduates in each state as a percent of the number of seventeen-year-olds in 1928.13 The second education measure relies on an index developed by Braatz and Putnam (1996). The index is the average of (a) test scores from the National Assessment of Educational Progress, (b) the average Scholastic Assessment Test (SAT) score in the state adjusted for participation, and (c) a measure of the high school dropout rate. Finally, sociological explanations of technology adoption suggest that measures of social capital, which capture the strength of relationships between individuals in the state, should be important factors. We used a social capital index also developed by Braatz and Putnam (1996), which is the average z-score of (a) nonprofit organizations per capita in 1989, (b) newspaper circulation per capita in 1991, (c) voter turnout in the presidential elections of 1988 and 1992, (d) the number of association memberships per capita from the General Social Survey for the years 1974–1994, and (e) the social trust measure from the General Social Survey for 1972–1996.

13. See Goldin (1994) for details on the construction of the 1928 high school graduation rate. We thank Claudia Goldin and Larry Katz for providing these data.

18.5 Empirical Results

Summary statistics on the state-level variables are shown in table 18.1. Data are not available for all states in all years owing primarily to lack of data on agricultural innovations in urban states. (Alaska and Hawaii are obviously missing for the historical data.) There is a substantial degree of variation in all variables.
Table 18.1 Summary statistics of state data

Variable                                           No. of obs.      Mean   Std. dev.      Min.      Max.

1. AMI mortality and cost
  AMI adjusted 1-year mortality                         50        –0.009      0.02      –0.055     0.023
  AMI adjusted 1-year ln(cost)                          50          9.37      0.19        9.09      9.78
  Δ mortality, 1989–91 to 1999–2000                     50        –0.034     0.020      –0.102     0.031
  Δ ln(cost), 1989–91 to 1999–2000                      50          0.63      0.16        0.42      1.22

2. Measures of innovation
  Percentage AMI getting beta blocker within 24hr       50          69.7       7.7          50        86
  Jencks quality of care ranking, 2000                  50          25.8      14.9           1        51
  Year hybrid corn achieved 10%                         31        1940.6       3.5        1936      1948
  Year tractors achieved 10%                            48        1930.6       8.7        1916      1947
  Percentage with computers at home, 1993               50          0.28      0.07        0.14      0.44

3. Factors that influence adoption
  Putnam social capital index                           48          0.01      0.79       –1.37      2.11
  ln(per capita income, 2000)                           50         10.24      0.15        9.95     10.63
  Putnam education index                                48          0.04      0.83       –1.74      1.89
  High school graduation rate, 1928                     48          0.30      0.12        0.12      0.55
  ln(population density in 2000)                        50          4.40      1.42        0.10      7.03
18.5.1 Is Early Adoption Related across Innovations?

We begin by comparing the year in which each state adopted hybrid corn with two other measures of technology adoption in the state: the year tractors achieved 10 percent (figure 18.1) and the rate of computer ownership in 1993 (figure 18.2). There is a close association across all three variables with regard to innovation (the graph comparing computers and tractors, not reported here, is similar). The correlation between tractors and hybrid corn is not entirely surprising, but it is perhaps more surprising that the pattern is so similar in figures 18.1 and 18.2. One might normally expect a much closer association of hybrid corn adoption with tractor adoption than with computer ownership, particularly under the assumption that profitability of farm land (rather than general characteristics of the state) is a key determinant of agricultural adoption. Iowa might be expected to have led in the adoption of hybrid corn and tractors, but it also led in computer ownership.14

Fig. 18.1 Year hybrid corn use attained 10 percent and year tractor use attained 10 percent: By state

Fig. 18.2 Year hybrid corn use attained 10 percent and percent of population living in homes with personal computer in 1993: By state

14. As noted in the preceding, this could also be explained in part by early adoption of computers by farmers in keeping track of livestock.

In figure 18.3, we show the relationship by state between the proportion of ideal patients receiving beta-blockers during 2001 (Jencks, Huff, and Cuerdon 2003) and the year in which each state attained at least 10 percent levels of use for hybrid corn varieties, based on the logistic curves estimated in Griliches (1957). Despite the more than one-half century that separates these adoption measures and the very sharp differences in the nature of the outputs (and the markets for these outputs), states that took longer to adopt hybrid corn also had much lower use of beta-blockers in 2000 to 2001 (correlation –0.57). In other words, there appears to be some common state-level factor that leads to slow adoption of any innovation, whether in the last century or in this century.

Fig. 18.3 Year hybrid corn use attained 10 percent and use of beta-blockers within twenty-four hours of AMI in 2000 to 2001: By state
Source: Skinner and Staiger (2004), using data from Griliches (1957) and Jencks, Huff, and Cuerdon (2003).

18.5.2 Evidence on the Variation in Mortality and Cost of Care for AMI Patients

Cutler and McClellan (2001), Cutler et al. (1998), and Skinner, Staiger, and Fisher (2006) have documented recent trends in mortality and the cost of care for AMI patients. The trends between 1984 and 2000 can be roughly characterized as two distinct periods: from 1984 to 1997, there was a steady decline in one-year mortality rates following the heart attack, accompanied by a steady rise in costs. Since 1997, however, costs have continued to climb while mortality has remained unchanged or even risen slightly. In a previous paper using the same data set (Skinner, Staiger, and Fisher 2006), we have demonstrated a lack of association between the cost of care and patient mortality; this is shown in figure 18.4. States such as California and New York spent 40–60 percent more per patient than states such as Indiana and Oregon, yet their patient outcomes were no better (in fact a bit worse). The time series evidence in that previous study showed a similar relationship: no association between the growth in expenditures and the decline in mortality.

Fig. 18.4 Average risk-adjusted one-year AMI mortality and average risk-adjusted one-year AMI expenditures (1989–2000): By state

This lack of an association between resources being devoted to patients and patient outcomes suggests that the variation in mortality is the result of productivity differences across regions; that is, the diffusion of innovations in the treatment of heart attacks has been uneven, with certain states lagging behind. Figure 18.5 provides some direct evidence suggesting that the high-mortality states have lagged behind in adopting effective treatments for heart attacks. Here we plot the risk-adjusted mortality rate in each state (averaged over the entire 1989–2000 period) against the percentage of heart attack patients receiving beta-blockers within twenty-four hours of their heart attack. There is a strong negative relationship (correlation –0.58) between mortality and a state’s use of beta-blockers. A simple regression of the risk-adjusted mortality rate on the use of beta-blockers suggests that a 10 percentage point increase in beta-blocker use is associated with a 1.2 percentage point decline in mortality. An effect of this magnitude is somewhat larger than what would be expected on the basis of clinical trials, given that not every heart attack patient is appropriate for the use of beta-blockers. There were many other innovations in the treatment of heart attacks; states more likely to use beta-blockers were also more likely to adopt other beneficial innovations in patient treatment.

Fig. 18.5 Use of beta-blockers within twenty-four hours of AMI in 2000 to 2001 and average risk-adjusted one-year AMI mortality (1989–2000): By state

18.5.3 State-Level Factors Related to Adoption of Innovations

In this section, we investigate whether any particular characteristic of a state can explain the strong correlations we observe in the adoption of innovations over time and across technologies. Estimates from the factor model are reported in table 18.2. We used the factor model to explain fourteen state-level variables: four measures of AMI mortality and cost; five other measures of whether the state had adopted various innovations; and five state characteristics that might influence the adoption of innovations. The first two columns report the loadings for the first two factors, that is, the correlation between each variable and the first two latent factors. The third column reports an R2 value, representing how much of the variation in each variable can be explained by these two factors. With the exception of changes in cost (and to a lesser extent changes in mortality), these two factors are able to explain the majority of the variation across states. We
limited the model to two factors because the third factor added little to the explanatory power for any of the variables.15

Table 18.2 Factor model estimates

                                                        Factor loadings
Variable                                              Factor 1   Factor 2   R2 (1 – uniqueness)

1. AMI mortality and cost
  AMI adjusted 1-year mortality                         –0.82       0.20       0.71
  AMI adjusted 1-year ln(cost)                           0.00       0.92       0.85
  Δ mortality, 1989–91 to 1999–2000                     –0.52       0.40       0.43
  Δ ln(cost), 1989–91 to 1999–2000                      –0.33      –0.15       0.13

2. Measures of innovation
  Percentage AMI getting beta blocker within 24hr        0.70       0.20       0.53
  Jencks quality of care ranking, 2000                  –0.77       0.03       0.59
  Year hybrid corn achieved 10%                         –0.71      –0.27       0.58
  Year tractors achieved 10%                            –0.92      –0.16       0.88
  Percentage with computers at home, 1993                0.68       0.45       0.67

3. Factors that influence adoption
  Putnam social capital index                            0.97      –0.14       0.96
  ln(per capita income, 2000)                            0.42       0.84       0.88
  Putnam education index                                 0.95       0.05       0.91
  High school graduation rate, 1928                      0.80      –0.15       0.66
  ln(population density in 2000)                        –0.35       0.89       0.92

Notes: Estimates based on thirty-one states used in Griliches (1957). Factor model estimated by maximum likelihood with varimax rotation.

15. With three factors included, the first factor accounted for 64 percent of the explained variance, the second factor accounted for 28 percent, and the third factor accounted for only 8 percent. The third factor primarily loaded onto the Jencks overall quality rank and beta-blocker use (but not the AMI measures), explaining a bit more of the variation (and correlation) in these two measures but adding little else to the model. Because beta-blocker use was a component of the Jencks ranking, this third factor may have captured a mechanical correlation between these two measures.

These estimates imply that the variation across states in AMI mortality and cost can be adequately explained by two quite distinct factors. The first factor is strongly related to mortality, with a correlation of –0.82. This same factor was also strongly associated with all five measures of innovation: the correlation is about 0.7 with all positive indicators of adoption (rates of beta-blocker use in 2000 and computer ownership in 1993) and between –0.7 and –0.9 with all negative indicators of adoption (quality-of-care ranking in 2000 and year achieved 10 percent in hybrid corn and tractors). Moreover, this first factor is correlated 0.97 with the Putnam social capital index, correlated 0.95 with the Putnam education index, and correlated 0.80 with the high school graduation rate from 1928. Thus, the first factor suggests that high levels of social and human capital in a state are generating early adoption across a wide range of innovations.

The second factor, in contrast, is only weakly correlated with the measures of innovation. Instead, this factor is closely related to the cost of AMI care, to per capita income in 2000, and to the population density in 2000. Thus, the variation across states in the cost of care appears to be associated with very different characteristics of the state.

Overall, this factor model suggests that the differences across states in AMI mortality and in the adoption of a range of innovations can be explained by a single factor that is closely related to the human and social capital of the state that facilitates the adoption of innovations. This can be seen most clearly in the simple two-way graphs that show the association between the social capital index and the adoption of hybrid corn (figure 18.6) and the adoption of beta-blockers (figure 18.7). Similar patterns are apparent when educational attainment is substituted for social capital. More strikingly, figure 18.8 plots Putnam’s social capital index against the level of risk-adjusted AMI mortality. Again, this is not a measure of overall health in the population (as in Putnam’s [2001] own analysis of social capital and mortality rates), but instead the survival rate conditional on an index event and controlling for a wide variety of comorbidities, and the result holds even when controlling for differences across states in income
levels (e.g., table 18.2).

Fig. 18.6 Putnam’s social capital index and year hybrid corn achieved 10 percent: By state

Fig. 18.7 Putnam’s social capital index and use of beta-blockers within twenty-four hours of AMI in 2000 to 2001: By state

Fig. 18.8 Putnam’s social capital index and average risk-adjusted one-year AMI mortality (1989–2000): By state

By contrast, there is little association between the level of per capita income and mortality rates (figure 18.9). Indeed, there appear to be two groups of states, one with a positive association (on the Southwest-Northeast quadrant), from Utah to New Jersey, and the other with a negative association (on the Northwest-Southeast quadrant), from Mississippi to Connecticut.

Fig. 18.9 Income per capita in 2000 and average one-year AMI mortality (1989–2000): By state

As noted in table 18.2, there is also a strong association between overall technological improvements in AMI mortality and social capital. Figure 18.10 shows this significant negative association between the change in AMI mortality and social capital, while figure 18.11 displays the lack of association between the change in AMI mortality and the level of income in 2000. Once again, state-level income is shown to have little association with technological improvement in health care.

Fig. 18.10 Putnam’s social capital index and decline in one-year AMI mortality (1989–2000): By state

Fig. 18.11 Income per capita in 2000 and the decline in one-year AMI mortality (1989–2000): By state

18.6 Conclusion

We have revisited the 1960s debate between Zvi Griliches and the sociologists by comparing the adoption of several important innovations during the twentieth century, ranging from advances at midcentury in hybrid corn and tractors to medical innovations in the treatment of heart attacks at the end of the century. We found first a very strong correlation with regard to the adoption of new and effective technology, and this correlation held across a variety of industries and time periods. These results are suggestive of state-level factors associated with barriers to adoption. One might further expect a priori that these barriers could be related to information or network flows, given that farmers, physicians, and individual computer users conduct their business in often small and isolated groups and therefore are most vulnerable to potential information asymmetries.

Second, we found that measures of social capital or education appear to explain a large fraction of these state-level variations in the diffusion of innovation. We cannot and need not make any claims about causality. However, it is not unreasonable to conjecture that there are systematic differences across states with regard to the frequency and likelihood of informational exchanges through networks or other social activities and that these are systematically related to both average educational attainment and other measures of social capital.

Finally, we view the debate between Griliches and the sociologists as being a particularly useful guide to better understanding what causes technological diffusion. Often differences between the economic and sociological view were simply a matter of labeling and semantics; what was congruent behavior for the sociologists (i.e., that farmers who innovate in one technology are more likely to innovate in another) could potentially be relabeled Bayesian learning-by-doing by the economists, as in Griliches (1962). But there remain more fundamental disagreements uncovered in this debate. For economists, people who innovate first do so because of relative advantage, better access to credit, greater profitability, and so forth. Most economic models assume innovation doesn’t occur because it is more profitable to wait, whether because of second-mover advantage, risk aversion, credit constraints, uncertainty, or other factors.
By contrast, sociologists are more interested in why innovation can be so slow even when it seems highly appropriate to do so, going so far as to label the noninnovators “laggards,” a value-laden term economists typically avoid.16 Both groups agree that economic incentives are very important in affecting the likelihood of adoption, but the disagreements arise over the behavior of people who are slow to innovate; are they slow because they lack (or have rejected) the appropriate decision tools or because they do so optimally?17

16. Economists may be quick to label behavior “inefficient,” but this is typically because of faulty incentives and not a lack of self-interest. The growing literature on behavioral economics, particularly in health care (e.g., Frank 2004), holds promise in bridging this gap between economists and sociologists.
17. In developing countries, it can sometimes be the case that native people who are “laggards” in the adoption of modern agricultural methods do so for very good reasons that may not be apparent to the western agricultural advisors (Rogers 1995).

Despite these caveats, it is important to note the potential importance of economic incentives to get around the problem of slow diffusion. While Coleman, Katz, and Menzel (1966) did not find a strong association between physicians having contact with drug detailers and adoption, it is still the case that the drug had largely diffused within eighteen months, a much better record than the off-patent beta-blockers. Indeed, one reexamination of the Coleman data views this relatively rapid diffusion as being the consequence of marketing efforts rather than social contagion per se (Van den Bulte and Lilien 2001). As well, the critical importance of drug detailers has been noted in the context of antiulcer drugs; Berndt et al. (1997) documented that detailers both shifted market share and expanded the overall diffusion of the drug. Economic factors such as expected profitability are clearly a necessary condition for the adoption of new technologies; the insight of the sociologists is that they are not always sufficient conditions.
References

Babcock, Jarvis M. 1962. Adoption of hybrid corn: A comment. Rural Sociology 27:332–38.
Barr, Abigail. 2000. Social capital and technical information flows in the Ghanaian manufacturing sector. Oxford Economic Papers 52:539–59.
Berndt, Ernst R. 2001. The U.S. pharmaceutical industry: Why major growth in times of cost containment? Health Affairs 20 (March/April): 100–114.
Berndt, Ernst R., Linda Bui, David Reiley, and Glen Urban. 1997. The roles of marketing, product quality and price competition in the growth and composition of the U.S. anti-ulcer drug industry. In The economics of new goods, ed. Timothy Bresnahan and Robert Gordon, 277–322. Chicago: University of Chicago Press.
Berndt, Ernst R., Robert S. Pindyck, and Pierre Azoulay. 2003. Network effects and diffusion in pharmaceutical markets: Antiulcer drugs. Journal of Industrial Economics 51 (June): 243–70.
Berwick, Donald M. 2003. Disseminating innovations in health care. Journal of the American Medical Association 289 (April 16): 1969–75.
Braatz, Jay, and Robert Putnam. 1996. Families, communities, and education in America: Exploring the evidence. Harvard University, Working Paper.
Brandner, Lowell, and Murray A. Straus. 1959. Congruence versus profitability in the diffusion of hybrid sorghum. Rural Sociology 24:381–83.
Calvó-Armengol, Antoni, and Matthew O. Jackson. 2004. The effects of social networks on employment and inequality. American Economic Review 94 (June): 426–54.
Caselli, Francesco, and Wilbur John Coleman II. 2006. The world technology frontier. American Economic Review 96 (3): 499–522.
Coleman, James S., Elihu Katz, and Herbert Menzel. 1966. Medical innovation: A diffusion study. New York: Bobbs-Merrill.
Comin, Diego, and Bart Hobijn. 2004. Cross-country technology adoption: Making the theories face the facts. Journal of Monetary Economics 51:39–83.
Conley, Timothy G., and Christopher R. Udry. 2004. Learning about a new technology: Pineapple in Ghana. Yale University. Mimeograph.
Cutler, David M. 2004. Your money or your life: Strong medicine for America’s health care system. New York: Oxford University Press.
Cutler, David M., and Mark McClellan. 2001. Is technological change in medicine worth it? Health Affairs 20 (5): 11–29.
Dixon, Robert. 1980. Hybrid corn revisited. Econometrica 48 (November): 1451–61.
Durlauf, Steven N., and Marcel Fafchamps. 2004. Social capital. NBER Working Paper no. 10485. Cambridge, MA: National Bureau of Economic Research, May.
Foster, Andrew D., and Mark R. Rosenzweig. 1995. Learning by doing and learning from others: Human capital and technical change in agriculture. Journal of Political Economy 103 (December): 1176–1209.
Fournier, Gary M., Kislaya Prasad, and Mary A. Burke. 2002. Physician social networks and treatment variations in coronary inpatient care. Florida State University. Mimeograph.
Frank, Richard G. 2004. Behavioral economics and health economics. NBER Working Paper no. 10881. Cambridge, MA: National Bureau of Economic Research, November.
Goldin, Claudia. 1994. Appendix to: “How America graduated from high school, 1910 to 1960.” Construction of state-level secondary school data. NBER Working Paper no. H057. Cambridge, MA: National Bureau of Economic Research, June.
Goldin, Claudia, and Lawrence F. Katz. 1998. Human capital and social capital: The rise of secondary schooling in America, 1910 to 1940. NBER Working Paper no. 6439.
Gottlieb, Stephen S., Robert J. McCarter, and Robert A. Vogel. 1998. Effect of beta-blockade on mortality among high-risk and low-risk patients after myocardial infarction. New England Journal of Medicine 339 (August 20): 489–97.
Griliches, Zvi. 1957. Hybrid corn: An exploration in the economics of technological change. Econometrica 25 (October): 501–22.
———. 1960. Congruence versus profitability: A false dichotomy. Rural Sociology 25:354–56.
———. 1962. Profitability versus interaction: Another false dichotomy. Rural Sociology 27:325–30.
Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2004. The role of social capital in financial development. American Economic Review 94 (June): 526–56.
Hall, Bronwyn. 2004. Innovation and diffusion. In The Oxford Handbook of Innovation, ed. Jan Fagerberg, David C. Mowery, and Richard R. Nelson, 459–85. New York: Oxford University Press.
Harmon, Harry H. 1967. Modern factor analysis. 2nd ed. Chicago: University of Chicago Press.
Havens, A. Eugene, and Everett Rogers. 1961. Adoption of hybrid corn: Profitability and the interaction effect. Rural Sociology 26:409–14.
Helliwell, John. 1996. Economic growth and social capital in Asia. In The Asia Pacific region in the global economy: A Canadian perspective, ed. Richard G. Harris, 21–42. Calgary, Canada: University of Calgary Press.
Howell, Joel D. 1995. Technology in the hospital: Transforming patient care in the early twentieth century. Baltimore: Johns Hopkins University Press.
Jencks, Stephen F., Edwin D. Huff, and Timothy Cuerdon. 2003. Change in the quality of care delivered to Medicare beneficiaries, 1998–99 to 2000–2001. Journal of the American Medical Association 289 (January 15): 305–12.
Jovanovic, Boyan, and Yaw Nyarko. 1995. A Bayesian learning model fitted to a variety of empirical learning curves. Brookings Papers on Economic Activity, Macroeconomics: 247–99.
———. 1996. Learning by doing and the choice of technology. Econometrica 64 (November): 1299–1310.
Klenow, Peter J., and Andres Rodriguez-Clare. 1997. Economic growth: A review essay. Journal of Monetary Economics 40 (3): 597–617.
Luft, Harold S. 2003. Variations in patterns of care and outcomes after acute myocardial infarction for Medicare beneficiaries in fee-for-service and HMO settings. Health Services Research 38 (August): 1065–79.
Olmstead, Alan L., and Paul W. Rhode. 2000. The diffusion of the tractor in American agriculture: 1910–60. NBER Working Paper no. 7947. Cambridge, MA: National Bureau of Economic Research, October.
Parente, Stephen L., and Edward C. Prescott. 1994. Barriers to technology adoption and development. Journal of Political Economy 102 (April): 298–321.
Phelps, Charles E. 2000. Information diffusion and best practice adoption. In Handbook of health economics. Vol. 1A, ed. A. J. Culyer and J. P. Newhouse, 223–64. New York: Elsevier Science.
Putnam, Robert D. 2001. Bowling alone. New York: Simon and Schuster.
Rogers, Everett M. 1995. Diffusion of innovations. 4th ed. New York: Free Press.
Rogers, Everett M., and A. Eugene Havens. 1962. Rejoinder to Griliches’ “Another false dichotomy.” Rural Sociology 27:330–32.
Salim, Richard Peto, Jacqueline Lewis, Rory Collins, and Peter Sleight. 1985. Beta blockade during and after myocardial infarction: An overview of the randomized trials. Progress in Cardiovascular Disease 27 (March/April): 335–71.
Schultz, Theodore W. 1964. Transforming traditional agriculture. New Haven, CT: Yale University Press.
Skinner, Jonathan. 2003. Geography and the use of effective care in the United States. Dartmouth College. Mimeograph. http://www.dartmouth.edu/~jskinner.
Skinner, Jonathan, and Douglas Staiger. 2006. The diffusion of health care technology. Dartmouth College. Mimeograph.
Skinner, Jonathan, Douglas Staiger, and Elliott Fisher. 2006. Is medical technology always worth it? The case of acute myocardial infarction. Health Affairs 25 (2): w34–w47.
Strang, David, and Sarah A. Soule. 1998. Diffusion in organizations and social movements: From hybrid corn to poison pills. Annual Review of Sociology 24:265–90.
Strang, David, and Nancy Brandon Tuma. 1993. Spatial and temporal heterogeneity in diffusion. American Journal of Sociology 99 (November): 614–39.
U.S. Bureau of the Census. 1962. Census of agriculture 1959. Vol. 2. Washington, DC: Government Printing Office.
Van den Bulte, Christophe, and Gary Lilien. 2001. Medical innovations revisited: Social contagion versus marketing effort. American Journal of Sociology 106 (5): 1409–35.
Wilkening, Eugene A. 1953. Acceptance of improved farm practices in three coastal plain counties. NCAES Technical Bulletin no. 98. Raleigh, NC: North Carolina Agricultural Experiment Station.
Yusuf, Salim, Janet Wittes, and Lee Friedman. 1998. Overview of results of randomized clinical trials in heart disease I. Treatments following myocardial infarction. Journal of the American Medical Association 260:2088–93.
VI
Epilogue
19 Zvi Griliches’s Contributions to Economic Measurement
Jack E. Triplett
Jack E. Triplett is a nonresident senior fellow in economic studies at the Brookings Institution.

Zvi Griliches was without doubt the foremost economist of his day in contributions to economic measurement. In this memorial, I will of course discuss his research contributions. But his contributions went beyond his own research, formidable as it was, for perhaps what Zvi really did most for economic measurement was to exhort and preach and encourage.

A great example of his preaching (or, perhaps, the word exhortation is better) was his American Economic Association Presidential Address (Griliches 1994, 2): “[O]ur understanding of what is happening in our economy . . . is constrained by the extent and quality of the available data.” That obviously must be right—and rightly should have been obvious. But it was not something that ranked very high among economists’ concerns at the time.

In a similar vein, Griliches suggested that some unresolved empirical problems in economics were simply measurement problems: “I will argue that data and measurement difficulties may in fact be a major source of this failure. This point will be made not to provide us with an alibi, but rather to temper the pretentiousness of some of our pronouncements and to urge us toward the more mundane task of observation and measurement” (Griliches 1994, 10). Again, this was obviously right, but economists were not acting as if it were right, so someone needed to say it, and say it emphatically.

How many economists would take the occasion of a presidential address to preach to the profession that it was neglecting, in its neglect of economic measurement issues, its own interest? Earlier, however, he argued (in the best Chicago tradition) that neglect of data was actually in econometricians’ own interest: “If the data were perfect, collected from well designed randomized experiments, there would be hardly room for a separate field of econometrics. Given that it is the ‘badness’ of the data that provides us with our living, perhaps it is not all that surprising that we have shown little interest in improving it” (Griliches 1986, 1466).

With respect to economic data, economists are still largely hunter-gatherers, working with what Zvi often called “found” data. He urged them to move toward the next stage of civilization, where they would be more like artisan farmers. “[W]e lead a somewhat remote existence from the underlying facts we are trying to explain. We did not observe them directly . . . In this we differ quite a bit from other sciences . . . such as archeology, astrophysics, biology, or even psychology where the ‘facts’ tend to be recorded by the professionals themselves” (Griliches 1986, 1467).

When Zvi was planning the econometrics curriculum for the Moscow New Economic School,¹ he set up the course sequence so that the first course segment focussed on data.² He said that he had no hope of changing the standard U.S. curriculum, but the New School gave him an opportunity to correct the imbalance in U.S. teaching. He believed that measurement was important to economics, and he acted on that belief.

19.1 His Motivation: Zvi’s MFP Mismeasurement Hypothesis

Some economists do research in economic measurement because they believe that measurement is an important part of economics or because they want to develop information for testing economic theories and for explaining the economy. Griliches was motivated by this; there is no doubt about it. But he got into economic measurement because of a particular hypothesis.
In his early papers on agricultural productivity (see, for example, Griliches 1963), he developed the hypothesis that the growth in multifactor productivity (MFP) was just measurement error.³ Recall that MFP, the “residual,” had been discovered as the major source of economic growth in the mid-1950s (see Zvi’s own summary of this and earlier work in Griliches 1996). This discovery led to a huge debate about what MFP was measuring—the contribution of technological change, the contribution of increases in knowledge and similar explanations were paramount. Abramovitz’s (1956, 11) famous characterization of the residual as the “measure of our ignorance” was consistent with the search by economists such as Denison (1962) for missing variables or missing explanations.⁴ Griliches pointed in another direction: “[C]hanges in output are attributable to changes in the quantities and qualities of inputs, and to economies of scale. Conventionally derived residual measures of productivity growth are viewed not as measures of technical change but rather as the result of errors in the measurement” (Griliches 1963, 271).

For agriculture, Griliches corrected the data (a) for mismeasurement of output and (especially) inputs, (b) for error introduced by inappropriately using input shares as measures of output elasticities, and (c) for the failure of others to allow for economies of scale. Without these measurement corrections, measured MFP in agriculture increased 48 percent between 1940 and 1960. With the corrections, it declined 6 percent. This is a powerful role for economic measurement.

1. On the school, see its tenth anniversary at: http://www.nes.ru/english/index.htm.
2. Daniel Hamermesh taught that first course. A product was a study of hedonic indexes for computers, done jointly with the students—see Hamermesh et al. (1994).
3. Apparently he conceived this idea much earlier. Nerlove (2001) records that Griliches’s teacher, Theodore Schultz, published a paper containing the same mismeasurement hypothesis in 1956, but Schultz, in a footnote, gave credit for the idea to his student Griliches.

He later moved away from the strongest form of the MFP mismeasurement hypothesis, at least to a degree. After Jorgenson and Griliches’s great debate with Denison,⁵ a debate that marks a watershed in productivity research, Griliches conceded that his side might have “explained too much,” mainly because Denison showed that their capacity utilization adjustment was faulty. But he did not abandon the mismeasurement hypothesis entirely. He remarked at a still later date that “all of the issues raised in this first [published] paper . . . continued to preoccupy me in the years to come” (Griliches 1988, 6). Moreover, the hypothesis motivated part of Jorgenson’s subsequent productivity research, as he has noted. Griliches was always greatly interested in evidence of bias in price indexes (and the corollary opposite bias in output and input quantity measures) because he thought that, on balance, price measurement error enlarged, inappropriately, measured MFP.
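The arithmetic behind the mismeasurement hypothesis can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Griliches’s (1963) data or procedure: the function name `mfp_growth` and every number below are hypothetical. Measured MFP growth is the residual of output growth over weighted input growth, so input growth that is understated (for example, by ignoring quality change) or weights that ignore economies of scale will both inflate the residual.

```python
def mfp_growth(output_growth, input_growths, weights):
    """Residual MFP growth: output growth minus weighted input growth (log points)."""
    weighted_inputs = sum(w * g for w, g in zip(weights, input_growths))
    return output_growth - weighted_inputs


# All figures are hypothetical and chosen only to illustrate the mechanism.
output_growth = 0.40

cost_shares = [0.60, 0.40]        # labor and capital shares, summing to one
elasticities = [0.66, 0.44]       # output elasticities summing to 1.1 (scale economies)

unadjusted_inputs = [0.05, 0.10]  # input growth as conventionally measured
adjusted_inputs = [0.30, 0.45]    # the same inputs after a quality correction

# Conventional residual: unadjusted inputs weighted by cost shares.
conventional = mfp_growth(output_growth, unadjusted_inputs, cost_shares)

# Correct the input measures only.
quality_corrected = mfp_growth(output_growth, adjusted_inputs, cost_shares)

# Correct the inputs and replace shares with elasticities.
fully_corrected = mfp_growth(output_growth, adjusted_inputs, elasticities)

print(f"conventional residual:      {conventional:.3f}")
print(f"quality-corrected residual: {quality_corrected:.3f}")
print(f"fully corrected residual:   {fully_corrected:.3f}")
```

With these made-up figures the residual shrinks at each correction step, which is the direction of the effect the text describes for agriculture; the magnitudes carry no empirical meaning.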
Griliches did not live to take part in the debate about the sources of the MFP acceleration that began in the United States in 1995. (It began in 1995, but it was only discovered with the release of the Bureau of Economic Analysis’s [BEA’s] benchmark revision of gross domestic product [GDP] in late 1998.) Barry Bosworth and I have shown (Triplett and Bosworth 2006; Bosworth and Triplett, chap. 14 in this volume; Triplett and Bosworth 2004) that three-quarters of recent U.S. MFP growth took place in services industries and that almost all of the acceleration in MFP growth after 1995 was in services industries (because they had very low MFP growth in the years before 1995). For most of these services industries (medical care being an exception), differential changes in the measurement of output are not the sources of the acceleration because the BEA has extended most of the measurement improvements that it has made back to years before 1995. But have we missed crucial inputs? Or have we mismeasured them? These are the questions that Griliches’s legacy suggests that we should ask. Measures of managerial input come to mind, but labor, capital, and intermediate inputs are each suspect in one industry or another (Triplett and Bosworth 2004).

4. Abramovitz’s full sentence reads: “Since we know little about the causes of productivity increase, the indicated importance of this element may be taken to be some sort of measure of our ignorance about the causes of economic growth in the United States and some sort of indication about where we should concentrate our attention” (Abramovitz 1956, 11). Denison, in his day, was as active in criticizing economic measurement as was Griliches. See his massive study of the sources of economic growth (Denison 1962). He was quite cognizant that the measured residual might incorporate measurement error and strove mightily to construct a residual that was stripped of measurement error, to the extent possible. However, Denison, unlike Griliches, was content to describe the reconstructed residual as the contribution of the advance in knowledge and always objected to proposed improvements in measurement that he regarded as improperly incorporating knowledge advances into the inputs, particularly into the capital input. This principle motivated, in part, Denison’s famous mistrust of hedonic indexes.
5. The original article was Jorgenson and Griliches (1967). The debate includes Denison (1969), reprinted in Denison (1972), Jorgenson and Griliches (1972), and the series of exchanges published in the same issue of the Survey of Current Business.

19.2 Impact on Productivity Research

Griliches stayed with measurement issues and measurement questions because he judged the topic important. This was characteristic of his career: he had sound judgment on what was important, and he was not distracted by fads in economics that come and go and occupy so much attention before eventually fading away without making real contributions. His penetrating judgments were displayed in formal and informal discussions, often dazzling his contemporaries. It was not uncommon to see him come into a conference, pick up a copy of a paper, thumb apparently idly through it, then stop on precisely the point of weakness, and fire a question at the astonished author. At NBER productivity workshops, he was notorious in finding a flaw in a paper very early in the author’s presentation—on occasion, in the first sentence! The sound critical judgment that he showed of others’ work kept him on track in his own.
The NBER Productivity Program and the NBER Summer Institute started in 1979. Under Griliches’s leadership, those productivity sessions from the very beginning included papers and discussions on economic measurement issues. As time went on, they also included some of the more interesting (and, truth be told, some of the not-so-interesting) work being done within statistical agencies. Inclusion of economic measurement within the topic of productivity is now so well accepted by productivity researchers in North America that they may not always appreciate how much of this is distinctively a contribution of Griliches, plus others, including Jorgenson in collaboration with Griliches and with others, Denison (1962, 1967, 1974), and Kuznets (see his Nobel Prize address [1971] and Milton Friedman’s recollections of Kuznets’s role in starting the Conference on Research in Income and Wealth in Berndt and Triplett [1990]).

Measurement is not always a part of the productivity research tradition elsewhere. I recall attending a productivity conference on another continent where all the economists used data they obtained from international agencies to study differences between, perhaps, economic growth in Indonesia and Afghanistan. They asked many penetrating questions about the econometrics of the estimates and developed many ingenious (and no doubt fanciful) explanations for the differences in international growth and productivity rates they had estimated. But they gave literally no attention at all to the possibility that the differences in their estimates across countries reflected little more than the differing and generally inadequate measures of the economic variables that those countries’ statistical agencies produced. They “found,” in Griliches’s word, data on international agencies’ Web pages, and that was all they needed to know. The Griliches-influenced North American tradition is a better one.

19.3 Hedonic Indexes—Labor and Product Markets

Among measurement topics to which Griliches contributed, his name is most prominently linked to hedonic indexes. Griliches (1961) was not the first to estimate hedonic price indexes.⁶ However, his work established the topic’s modern standing—see Lipsey’s (1990) historical appraisal on this point. Hedonic indexes were dead before Griliches breathed life into them.

Griliches worked on two kinds of hedonic functions. He was best known for what we normally call a hedonic function and a hedonic price index, where the hedonic function looks like (using computers as an example) the following:

$$\ln p_{it} = a_0 + a_1 \ln \mathrm{speed}_{it} + a_2 \ln \mathrm{memory}_{it} + \cdots + e_{it}$$

This hedonic function says that the price of computer $i$ at time $t$ ($p_{it}$) depends on its speed (megahertz or MHz), amount of memory capacity, and other performance variables. An example of Zvi’s work on computers is Berndt and Griliches (1993).
Griliches’s best known hedonic function—for automobiles—has the prices of automobile models on the left-hand side, in logarithms, and, on the right-hand side, the characteristics weight, horsepower, size, and other measures.

A human capital wage regression is another hedonic function. In this formulation, wages are a function of the human capital variables education, experience, and so forth:

$$\ln w_{jt} = b_0 + b_1 \ln \mathrm{education}_{jt} + b_2 \ln \mathrm{experience}_{jt} + \cdots + u_{jt}$$

6. Court (1939) and Stone (1956) preceded him. See, for additional historical discussion, Berndt (1991, chapter 4) and Triplett (2006, “Historical Note,” appendix A to chapter III).
Not atypically for the computer and human capital literatures, both of the preceding hedonic functions appear in the double-log functional form, but this is mainly for illustration.⁷ Griliches’s automobile hedonic functions were semilog, which became controversial for reasons discussed in the following.

Both price and wage hedonic functions are motivated by the hedonic hypothesis, which holds that complex transactions are aggregations (bundles is the word commonly used in this literature) of quantities of lower order variables that we call characteristics. Computer or automobile characteristics are the true variables that enter consumers’ demand functions and that define the outputs of producers. Education, experience, and so forth are the variables in which workers invest and that employers find productive and for which they are willing to pay. The characteristics have their own prices, often termed implicit prices. The hedonic function unbundles the observed transaction into the variables on which economic agents’ behaviors are based, and it is also used to estimate the implicit prices for these variables.

Actually, there are many hedonic functions in economics, and for many of them the purpose is not to estimate price indexes. Colwell and Dilmore (1999) identify Haas (1922) as estimating an early hedonic function for agricultural land, and the vegetables study by Waugh (1928) is frequently cited; both wanted to help farmers, not to measure farm prices. The hedonic literature on housing is enormous (Sheppard [1999] provides a review), and the literature on the structure of interest rates can also be thought of as hedonic (with characteristics such as liquidity and risk). Because the hedonic hypothesis covers many kinds of complex transactions, it might not be very interesting to discover who first estimated a hedonic function (without necessarily calling it that), but it was undoubtedly an agricultural economist.
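To make the idea of unbundling concrete, the following is a minimal sketch of estimating a double-log computer hedonic function on simulated data. Everything in it is an assumption made for illustration: the parameter values, the sample, the helper names `solve`, `ols`, and `implicit_price_of_speed`, and the use of a period dummy (the time-dummy method, one common way of turning a hedonic function into a quality-adjusted price index, which the chapter itself does not spell out). It is not a reconstruction of any of Griliches’s estimates.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical "true" parameters, used only to simulate data.
A0, A1, A2 = 2.0, 0.6, 0.3   # intercept, speed elasticity, memory elasticity
DELTA = -0.25                # quality-adjusted log price change, period 0 to 1

random.seed(0)
rows, log_prices = [], []
for period in (0, 1):
    scale = 2.0 if period == 1 else 1.0   # period-1 machines are better equipped
    for _ in range(200):
        ln_speed = math.log(scale * random.uniform(100, 3000))
        ln_memory = math.log(scale * random.uniform(16, 1024))
        noise = random.gauss(0, 0.05)
        ln_p = A0 + A1 * ln_speed + A2 * ln_memory + DELTA * period + noise
        rows.append([1.0, ln_speed, ln_memory, float(period)])
        log_prices.append(ln_p)

a0_hat, a1_hat, a2_hat, delta_hat = ols(rows, log_prices)

# The raw change in mean log price mixes pure price change with quality change.
raw_change = sum(log_prices[200:]) / 200 - sum(log_prices[:200]) / 200

# Quality-adjusted price index for period 1 relative to period 0.
hedonic_index = math.exp(delta_hat)


def implicit_price_of_speed(price, speed, elasticity=a1_hat):
    """In a double-log form, dp/d(speed) = elasticity * price / speed."""
    return elasticity * price / speed


print(f"speed elasticity {a1_hat:.3f}, memory elasticity {a2_hat:.3f}")
print(f"raw log price change {raw_change:.3f}, quality-adjusted {delta_hat:.3f}")
print(f"hedonic index {hedonic_index:.3f}")
```

In this simulation average prices rise while the quality-adjusted price falls, because the later machines carry more of each characteristic. In the double-log form the coefficients are elasticities, so the implicit price of a characteristic varies with the price and configuration of the machine at which it is evaluated.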
Griliches’s contributions to hedonic functions (using the term narrowly, as we usually do, to apply to price hedonic functions) were very different from his contributions to human capital or hedonic wage functions. For a number of reasons, some a bit obscure, these two literatures did not develop in a parallel fashion. Griliches’s own work proceeded in parallel with the way the two literatures developed. Of course, the purposes of the two literatures diverged, to an extent, because product market hedonic functions have often been used to estimate hedonic price indexes; human capital wage regressions have not been used to estimate wage indexes. But divergence in purpose does not explain the divergence in research directions.

7. Triplett (2006, chapter V) contains a review of functional forms used for hedonic studies on computers and other products.
19.3.1 Hedonic Functions: The Variables

With respect to the wage hedonic function, Griliches’s major interest was in measuring the variables in the function, particularly the education variable. Fairly early on, economists questioned the interpretation that years of education measured worker productivity. Alternative hypotheses included screening and the contention that education was merely a proxy for ability. Griliches’s response to these objections was to test the efficacy of the education measure in a production function (Griliches 1970; Griliches and Mason 1972). In his own work, he rejected the screening interpretation (though this has been readdressed by others) and the proxy interpretation. He did not deny that the schooling measure could be improved as a measure of labor quality (“such studies tend to treat years of school as the conceptually right and error-free measure of educational attainment, a position that is hardly tenable in light of the extreme diversity of the educational system in the United States” [Griliches and Mason 1972, S97]). He concluded, however, that alternative interpretations of the education variable, or contentions that education was not a measure of labor quality, were rejected by his production function analysis.

The research principle illustrated here is straightforward: just because education is associated with wage differentials, that does not mean it is necessarily a proper measure of labor quality. The measurement hypotheses needed scientific testing, and that was the approach he followed.

Zvi knew that the weight of cars, used as a characteristic in his automobile paper, is not a good measure of the transportation services they provide and discussed better measures in his paper with Ohta (Ohta and Griliches 1976).
Probably he knew that megahertz (MHz) was not a very good measure of computer performance, and certainly, if he didn’t, Cole (1993) brought it to his attention in discussing his first computer paper (Berndt and Griliches 1993). But in contrast to his work on hedonic wage regressions, he spent minimal research effort on asking questions about the variables that appeared in hedonic functions.

There were probably two reasons for that. In the first place, there are long precedents in economics for the idea that education improves labor quality. The idea of human capital goes back to Adam Smith. When Zvi was working on this topic, labor economists were on their way to executing several hundred, if not more, human capital wage regressions. With that abundant background on the supply side, Zvi was actually one of the few to tackle the demand side of the problem. My point is that economists understood where the education and experience variables came from and how they were produced. No mystery outside of economics inhibited understanding of the forces that produced the characteristics that determined labor quality nor the reasons these characteristics commanded prices in labor markets. After all, most research economists are in the business of producing human capital.

On the other hand, there was absolutely no economic precedent for analyzing the variables in product hedonic functions. We understand a great deal about the determinants of the supply of human capital. What are the determinants of the parallel decision on the part of a computer producer to supply more computer characteristics—to put more speed or memory into a computer? This is in the realm of engineering, or so it has sometimes been thought by economists, and certainly not in the traditional territory of economics—quite unlike the situation in human capital, where the behavior of its suppliers was well rooted in the field of labor economics.

Another way of putting it is that analysis of the variables in hedonic functions was harder, and because understanding their supply required engineering knowledge, economists had little relative advantage. Moreover, almost the only available work on the demand for heterogeneous products and their characteristics was the demand model of Lancaster (1971), which most economists regarded, with considerable justification, as nonoperational. Thus, the questions economists asked about hedonic functions were different from the ones they asked about hedonic wage equations. More recent work has shifted the balance, to an extent—for example, Berry, Levinsohn, and Pakes (1996) developed an operational demand for characteristics model, and Stavins (1995) looked at gaps in characteristics space among personal computer models. Significantly, these contributors included Griliches’s students.

19.3.2 Hedonic Functions: Functional Form

There is another half to this story, which is both a consequence of the different approaches to measuring the variables that emerged in the two literatures and a contributor to their divergent developments.
That half story concerns the question of functional form. Because labor economists understood a great deal about the supply of human capital (and a little bit about the demand for it), none of them proposed that the hedonic wage equation should be derived from a production function, that is, from the employers’ demand functions for human capital. Labor economists comprehended, even before Rosen’s (1974) article on hedonic functions, that the wage hedonic function was some sort of combination of demand and supply relations (though the exact relation was not well understood). Others of them possibly did not think much about where the functional form came from; it was simply an empirical construct (which Rosen showed theoretically was the right way to think about it). Rosen showed that the hedonic function is a double envelope to sets of demand functions on one side and sets of supply functions on the other side. As with any envelope, the form of the hedonic function therefore depends on distributions of buyers and sellers, but not on the shapes of buyers’ demand or utility functions or on sellers’ supply or production functions.

In contrast to labor market hedonic functions, functional form became a big issue for product market hedonic functions. Economic critics demanded that the functional form for hedonic functions be derived from behavioral functions on the demand side. In consumer demand analysis of the 1960s and 1970s, functional form was, if not everything, nearly everything. Great ingenuity was expended on deriving sets of demand functions that could be estimated and that were also consistent with utility theory (the culmination of this work from the 1960s and 1970s is Deaton and Muellbauer 1980). Confronted with an estimated hedonic function on goods (automobiles or computers), consumer demand theorists contended that hedonic researchers should have derived the form of the hedonic function from the form of consumers’ utility functions, by analogy with their own work on empirical consumer demand functions. “Measurement without theory” was a devastating epithet in its day (the phrase comes from Koopmans 1947, 161), and the epithet was hurled at Griliches’s work and that of others, mostly in private conversations and in seminars and so forth.

Why was this question asked the way it was, and why was it not asked of hedonic wage equations? Probably, it was merely economists’ familiarity with established paradigms and their reluctance to break out of them.
Economists were familiar with the economic paradigm that determines workers’ decisions to improve labor quality and had no experience whatever with manufacturers’ decisions to improve product quality.8 On the other hand, consumer demand theorists were familiar with the paradigm that governs functional forms of demand equations and had no familiarity at all with a paradigm that determines the variables in consumer demand functions (what, after all, is the “good” for which demand is estimated?).

Into this gap, theorists stepped with “proofs” that the semilog hedonic functional form (in Griliches’s automobile hedonic functions) was not theoretically valid. Before Rosen (1974), that is, theorists thought that hedonic functions were some form of demand function, which should be derived from the utility function. Under this mistaken notion, then, it was easy to show that semilog and linear hedonic functions corresponded to not very good utility function specifications.9

8. I do not imply that no literature on this existed. But mostly, it followed the comparatively unfruitful approach: “Let z be a measure of quality,” which the researcher then inserted into some behavioral function, where it is manipulated for some purpose. Whether “quality” is validly represented as a scalar (rather than by a vector of characteristics) is not usually considered.
9. Some are still engaged in this exercise (Diewert 2003).

Zvi dismissed this. In a not very convincing defense, he contended that the hedonic index was just an attempt to improve price indexes empirically, that the theorists were loading more onto the hedonic function than the empirical economists had in mind.

Despite the theoretical proofs to the contrary, the Consumer Price Index (CPI) ‘exists’ and is even of some use. It is thus of some value to attempt to improve it even if perfection is unattainable. . . . [The hedonic approach] did not pretend to dispose of the question of whether various observed differentials are demand or supply determined . . . and whether the resulting indexes have an unambiguous welfare interpretation. Its goals were modest. (Ohta and Griliches 1976, 326)

It was not a very compelling argument. Possibly because he was attacked so vigorously by some theorists, he remained curiously uninterested in the theory of hedonic functions, even in the contribution of Rosen (1974). He seemed unimpressed by the new theory. I never understood this. Rosen (1974) should have ended the pointless discussions about functional form that Zvi knew were pointless, and so one might have thought that he would have welcomed Rosen’s contribution.10

Pakes (2004) suggests that the issue lay in industrial organization: Rosen’s model is a competitive one, but the world of differentiated products, to which the hedonic function applies, is a world with gaps in the product space and likely market power for producers who succeed in innovating to fill in a gap. That view must have been true in later years (Pakes was a Griliches student and colleague), but I never heard Zvi use this line of reasoning in the 1970s and early 1980s, and it does not figure in his last assessment of hedonic indexes (Griliches 1990).

It remains, nevertheless, curious that dissimilar sets of questions were asked by the profession about two research topics that are themselves quite parallel. For hedonic wage equations, the questions concerned predominantly the validity of the variables in the functions.
The functional form was simply established empirically, without, so far as I am aware, the criticism that the functional form should be derived from the employer’s objective function. I believe that the labor economists were right in their research priorities. On the other hand, for hedonic functions on goods, most of the economic critics’ questions concerned the derivation of the hedonic functional form, and few of them concerned the variables in the hedonic functions. In my view, that was the wrong orientation. Although Rosen’s (1974) straightening out of the theory of hedonic functions was extremely salutary for redirecting research, regrettably research on (product) hedonic functions tailed off in the late 1970s and early 1980s, after the Rosen article was published (hedonic research accelerated in analysis of labor markets and real estate markets, where, significantly, these quasi-theoretic disputes over functional form were mostly ignored).

It is also my view that economists who estimated hedonic functions would have benefited from following the lead of the labor economists and should have asked much more penetrating questions about the variables they inserted into those hedonic functions. With respect to the variables, many researchers took a kitchen sink approach, which no doubt had something to do with the decline of interest in hedonic research. Indeed, in that interval, hedonic function research became “not respectable” in certain quarters, as Griliches once observed in an NBER seminar.

10. Somewhat later, however, he outlined the essence of Rosen’s result and endorsed it as his own view: “What is being estimated is actually the locus of intersections of the demand curves of different consumers with varying tastes and the supply functions of different producers with possibly varying technologies of production . . . [T]heoretical derivations at the individual level [cannot] really provide substantive constraints on the estimation of such ‘market’ relations” (Griliches 1990, 189).

19.4 Other Topics

Griliches’s contributions to economic measurement are also defined by what did not interest him. Except for his famous 1961 paper on hedonic indexes (Griliches 1961), which he himself emphasized he did not invent, he never made major contributions to measurement techniques as we usually think of them.
• He took little interest in index number formulas, despite the tremendous interest in that topic in the 1960s and early 1970s. Diewert (1976) put an end to that as a research question, in my view, by showing that a small number of index numbers (he called them superlative) not only provided good approximations to the unknown true economic indexes but also gave results that corresponded closely, numerically. Hausman (2003) remarked that index number formulas are second-order problems in measuring price indexes, which I suspect summarized Zvi’s views. Once chain superlative index numbers were adopted (included among the measurement recommendations in Jorgenson and Griliches 1967), the index number formula didn’t matter much.11

• Griliches also made no real contributions to national accounts, unlike Simon Kuznets, an economist whom Zvi venerated. Of the economists of his generation, Griliches was probably Kuznets’s closest intellectual descendant. He wrote one paper (Griliches 1973) in which he says he is showing how research and development (R&D) can be integrated into national accounts, but I doubt that any national accountant would recognize it as such. Griliches probably felt that the accounting parts of national accounts were no longer interesting problems, excepting insofar as the national accountants’ focus on the accounting side was getting in the way of making improvements to the measurement.12 It is still true that national accounting conventions sometimes hold back improvements in economic measurements (an example is the measurement conventions for insurance and banking in national accounts; see Triplett and Bosworth [2004], chapters 6 and 7).

• Neither did he jump into the conceptual questions surrounding measuring the output of hard-to-measure industries, with one or two exceptions. However, his “Introduction” to Griliches (1992) is a marvelous and concise statement of the general problem of measuring services output. He pointed out that in many of these industries the transaction (what was being provided and what was being charged for) was not quite clear, and when it was, the transactions were so heterogeneous that they presented enormous quality change problems. Some substantial progress on services data has been made in the interim, but Griliches’s statement of the problem remains relevant and insightful.

• Though he contributed to econometrics and was early in his career thought of as an econometrician, he did not remain primarily a contributor to econometric techniques. Nerlove (2001, F424) judged that, “[A]lthough statistical and econometric methodology were not at the central core of his contribution, he was an empirical economist in the best sense, perhaps the best his generation of economists produced. This is econometrics in its best sense, which blends theory and application.” That does not mean that he did not keep up with advances in econometric methods or that he was unsympathetic to techniques themselves. Examples are his thorough and sympathetic review of techniques for filtering signal from noisy data in his Handbook of Econometrics chapter (Griliches 1986), his work on unobserved variables (Griliches and Chamberlain 1975), and his contributions to panel data methods (Hausman, Hall, and Griliches 1984). This was another case where he made one of his sound judgments about what really mattered: “It is the preparation skill of the econometric chef that catches the professional eye, not the quality of the raw materials in the meal, or the effort that went into procuring them” (Griliches 1994, 14). Griliches’s strength as an empirical economist and his use of that strength on behalf of improving economic measurement are his lasting contribution to the field.

• Finally, though he must have believed in “measurement with theory,” he took little interest in some of the attempts to theorize about economic measurement that took place during his career. The theorists’ writing about hedonic functions provides one example, which I discussed in a previous section. There were exceptions, including the paper by Treadway (1969), which he, almost alone, recognized and promoted.

11. Jorgenson and Griliches in 1967 used a Divisia index justification for chain Törnqvist indexes. This Divisia justification was superseded by Diewert’s superior theoretical justification, which provided Törnqvist (and Fisher) indexes with a grounding in production theory. On the relation between Divisia and economic approaches, see Hulten (1973).
12. As an example of the latter, see the exchange among Griliches (1964), Jaszi (1964), and Gilbert (1961).

19.5 The Statistical Agencies
What was Griliches’s influence on statistical agencies? His focus on measurement should have made him a welcome ally, but for most of his career, that was not the case. One cannot consider this topic without also considering the climate. Attitudes within statistical agencies were for the most part not cordial toward academics. I can still recall a blistering rebuke once administered in an open meeting by one of my Bureau of Labor Statistics (BLS) superiors: “No one wants that except your academic friends!” (The issue was, incredibly, services statistics.) The BLS response to the Stigler committee report (Price Statistics Review Committee 1961) in the hearings held on the report is another example. Agency officials were polite, but dismissed as impractical the committee’s recommendations on both cost-of-living indexes and hedonic indexes (Reinsdorf and Triplett 2004). So ideas from outside seldom received any immediate hearing from inside statistical agencies, and Zvi’s ideas were no more cordially received than those of other academics.

Over time, of course, the personnel changed. With changes in personnel, the intellectual climate also became more receptive, partly because the agencies hired more technically trained staff (particularly in economics), and the younger staff were closer to Zvi’s tradition and were molded in part by his influence, which was enormous and extensive.13 But it took a long time. Fraumeni (2000) lists the hedonic indexes that now contribute to U.S. national accounts and counts them as part of Griliches’s legacy, but most of them were produced in the last decade of his life.

13. Iain Cockburn produced a “Tree of Zvi” showing a huge collaborative network of economists who were associated with him in some research capacity (this is downloadable from http://www.people.business.edu/cockburn/tree_of_zvi.html). But the tree is incomplete in the sense that it was not possible to record all the economists who were touched by his legacy, in the form of fruitful suggestions on their work, an influential model for emulation, and other professional inputs.

Slow acceptance was not confined to the United States. Even as late as 1999, the year of Griliches’s death, there was little interest in hedonic indexes outside North America, with the exceptions of France (Moreau 1996) and, to a lesser extent, Sweden. By 2003, statistical agency views had changed, and interest in hedonic indexes was everywhere. In addition to the United States and Canada in North America and early adopters France (and Sweden) in Europe, hedonic indexes have now been estimated by the United Kingdom, Australian, Dutch, and German statistical agencies; Eurostat established the European Hedonic Centre to examine the feasibility of hedonic indexes for computers and other products in Europe (Konijn, Moch, and Dalen 2003); and the Organization for Economic Cooperation and Development (OECD) commissioned a “handbook” on hedonic indexes (Triplett 2006). Strong interest has also been expressed in other countries.

The state of interest in hedonic indexes today is several magnitudes greater than it was during any part of Zvi Griliches’s lifetime. Unfortunately, from a conversation with him in the last few months of his life, I think that he never expected the increased interest to happen. Timing is sometimes everything.

19.6 Final Remark

I want to end on a personal note. Zvi Griliches refereed my first published article (Triplett 1969). He published his referee’s comments!14 Zvi’s last written professional work was his comment on my “Human Repair and Car Repair” article on medical care accounts (Griliches 2001). His influence, encouragement, and insightful suggestions on my research were constantly present throughout my career. It is impossible to measure how much I benefited, personally and professionally, from knowing him.
14. They make up a short section in Griliches (1971).

References

Abramovitz, Moses. 1956. Resource and output trends in the United States since 1870. American Economic Review 46 (2): 5–23.
Berndt, Ernst R. 1991. The practice of econometrics: Classic and contemporary. Reading, MA: Addison-Wesley.
Berndt, Ernst R., and Zvi Griliches. 1993. Price indexes for microcomputers: An exploratory study. In Price measurements and their uses, ed. Murray F. Foss, Marilyn E. Manser, and Allan H. Young, 63–93. Studies in Income and Wealth, vol. 57. Chicago: University of Chicago Press.
Berndt, Ernst R., and Jack E. Triplett, eds. 1990. Fifty years of economic measurement: The jubilee of the Conference on Research in Income and Wealth. Studies in Income and Wealth, vol. 54. Chicago: University of Chicago Press.
Berry, Steven, James Levinsohn, and Ariel Pakes. 1996. Automobile prices in market equilibrium. Econometrica 63 (4): 841–90.
Cole, Rosanne. 1993. Comment. In Price measurements and their uses, ed. Murray F. Foss, Marilyn E. Manser, and Allan H. Young, 93–99. Studies in Income and Wealth, vol. 57. Chicago: University of Chicago Press.
Colwell, Peter, and Gene Dilmore. 1999. Who was first? An examination of an early hedonic study. Land Economics 75 (4): 620–26.
Court, Andrew T. 1939. Hedonic price indexes with automotive examples. In The dynamics of automobile demand, 99–117. New York: General Motors Corporation.
Deaton, Angus, and John Muellbauer. 1980. Economics and consumer behaviour. Cambridge, UK: Cambridge University Press.
Denison, Edward F. 1962. The sources of economic growth in the United States and the alternative before us. New York: Committee for Economic Development.
Denison, Edward F. 1967. Why growth rates differ: Postwar experience in nine western countries. Washington, DC: Brookings Institution.
Denison, Edward F. 1969. Some major issues in productivity analysis: An examination of estimates by Jorgenson and Griliches. Survey of Current Business 49 (5, pt. II): 1–27.
Denison, Edward F. 1972. Reply to Jorgenson and Griliches. Survey of Current Business 52 (5, pt. II): 37–63.
Denison, Edward F. 1974. Accounting for United States economic growth, 1929–1969. Washington, DC: Brookings Institution.
Diewert, W. Erwin. 1976. Exact and superlative index numbers. Journal of Econometrics 4 (2): 115–45.
Diewert, W. Erwin. 2003. Hedonic regressions: A consumer theory approach. In Scanner data and price indexes, ed. Robert C. Feenstra and Matthew D. Shapiro, 317–48. Studies in Income and Wealth, vol. 64. Chicago: University of Chicago Press.
Fraumeni, Barbara M. 2000. Zvi Griliches and his contributions to economic measurement. Survey of Current Business 80 (1): 15–17.
Gilbert, Milton. 1961. Quality changes and index numbers. Economic Development and Cultural Change 9 (3): 287–94.
Griliches, Zvi. 1961. Hedonic price indexes for automobiles: An econometric analysis of quality change. In The price statistics of the federal government: Review, appraisal, and recommendations, ed. Price Statistics Review Committee, National Bureau of Economic Research, 173–96. New York: National Bureau of Economic Research.
Griliches, Zvi. 1963. The sources of measured productivity growth: U.S. agriculture, 1940–1960. Journal of Political Economy 71 (4): 331–46.
Griliches, Zvi. 1964. Notes on the measurement of price and quality changes. In Models of income determination, 381–418. Studies in Income and Wealth, vol. 28. Princeton, NJ: Princeton University Press.
Griliches, Zvi. 1970. Notes on the role of education in production functions and growth accounting. In Education, income and human capital, ed. L. Lee Hanson, 71–115. Studies in Income and Wealth, vol. 35. New York: Columbia University Press.
Griliches, Zvi. 1971. Introduction: Hedonic price indexes revisited. In Price indexes and quality change: Studies in new methods of measurement, ed. Zvi Griliches, 3–15. Cambridge, MA: Harvard University Press.
Griliches, Zvi. 1973. Research expenditures and growth accounting. In Science and technology in economic growth, ed. Bruce R. Williams, 59–83. New York: Wiley.
Griliches, Zvi. 1986. Economic data issues. In Handbook of econometrics. Vol. 3, ed. Zvi Griliches and Michael D. Intriligator, 1465–1514. Amsterdam: Elsevier Science.
Griliches, Zvi. 1988. Introduction. In Technology, education, and productivity, ed. Zvi Griliches, 1–24. Oxford, UK: Basil Blackwell.
Griliches, Zvi. 1990. Hedonic price indexes and the measurement of capital and productivity: Some historical references. In Fifty years of economic measurement: The jubilee of the Conference on Research in Income and Wealth, ed. Ernst R. Berndt and Jack E. Triplett, 185–202. Studies in Income and Wealth, vol. 54. Chicago: University of Chicago Press.
Griliches, Zvi. 1992. Introduction. In Output measurement in the service sectors, ed. Zvi Griliches, 1–22. Studies in Income and Wealth, vol. 56. Chicago: University of Chicago Press.
Griliches, Zvi. 1994. Productivity, R&D, and the data constraint. American Economic Review 84 (1): 1–23.
Griliches, Zvi. 1996. The discovery of the residual: A historical note. Journal of Economic Literature 34 (September): 1324–30.
Griliches, Zvi. 2001. Comment on “What’s different about health? Human repair and car repair in national accounts and in national health accounts.” In Medical care output and productivity, ed. David M. Cutler and Ernst R. Berndt, 94–95. Studies in Income and Wealth, vol. 62. Chicago: University of Chicago Press.
Griliches, Zvi, and Gary Chamberlain. 1975. Unobservables with a variance-components structure: Ability, schooling and the economic success of brothers. International Economic Review 16 (2): 422–49.
Griliches, Zvi, and William Mason. 1972. Education, income, and ability. Journal of Political Economy 80 (3, pt. II): S74–S103.
Haas, G. C. 1922. Sale prices as a basis for farm land appraisal. Technical Bulletin no. 9. St. Paul: University of Minnesota Agricultural Experiment Station.
Hamermesh, Daniel, Zvi Griliches, with students of the New Economic School. 1994. Hedonic price indexes for personal computers: Intertemporal and interspatial comparisons. Economics Letters 44 (4): 353–57.
Hausman, Jerry A. 2003. Sources of bias and solutions to bias in the Consumer Price Index. Journal of Economic Perspectives 17 (1): 23–44.
Hausman, Jerry A., Bronwyn Hall, and Zvi Griliches. 1984. Econometric models for count data with application to the patents-R&D relationship. Econometrica 52 (4): 909–38.
Hulten, Charles R. 1973. Divisia index numbers. Econometrica 41:1017–25.
Jaszi, George. 1964. Comment. In Models of income determination, 404–9. Studies in Income and Wealth, vol. 28. Princeton, NJ: Princeton University Press.
Jorgenson, Dale W., and Zvi Griliches. 1967. The explanation of productivity change. Review of Economic Studies 34 (3): 249–83.
Jorgenson, Dale W., and Zvi Griliches. 1972. Issues in growth accounting: A reply to Edward F. Denison. Survey of Current Business 52 (5, pt. II): 65–94.
Konijn, Paul, Dietmar Moch, and Jorgen Dalen. 2003. Comparison of hedonic functions for PCs across EU countries. Eurostat discussion paper presented at 54th session of the International Statistics Institute, Berlin. August.
Koopmans, Tjalling. 1947. Measurement without theory. Review of Economics and Statistics 29 (3): 161–72.
Kuznets, Simon. 1971. Modern economic growth: Findings and reflections. In Nobel lectures: Economic sciences, 1969–1980, ed. Assar Lindbeck, 87–102. Singapore: World Scientific, 1992.
Lancaster, Kelvin. 1971. Consumer demand: A new approach. New York: Columbia University Press.
Lipsey, Robert E. 1990. Comment. In Fifty years of economic measurement: The jubilee of the Conference on Research in Income and Wealth, ed. Ernst R. Berndt and Jack E. Triplett, 202–9. Studies in Income and Wealth, vol. 54. Chicago: University of Chicago Press.
Moreau, Antoine. 1996. Methodology of the price index for microcomputers and printers in France. In Industry productivity: International comparison and measurement issues, 99–118. Paris: Organization for Economic Cooperation and Development.
Nerlove, Marc. 2001. Zvi Griliches, 1930–1999: A critical appreciation. Economic Journal 111 (June): F422–F448.
Ohta, Makoto, and Zvi Griliches. 1976. Automobile prices revisited: Extensions of the hedonic hypothesis. In Household production and consumption, ed. Nestor E. Terleckyj, 325–90. Studies in Income and Wealth, vol. 40. New York: National Bureau of Economic Research.
Pakes, Ariel. 2004. Hedonics and the Consumer Price Index. Paper presented at international conference in memory of Zvi Griliches (1930–1999), R&D, Education, and Productivity, Paris.
Price Statistics Review Committee, National Bureau of Economic Research. 1961. The price statistics of the federal government: Review, appraisal, and recommendations. Report to the Office of Statistical Standards, Bureau of the Budget, NBER General Series no. 73. New York: National Bureau of Economic Research.
Reinsdorf, Marshall, and Jack E. Triplett. 2004. A review of reviews: Sixty years of professional thinking about the CPI. Paper presented at the NBER Conference on Research in Income and Wealth on price indexes, Vancouver.
Rosen, Sherwin. 1974. Hedonic prices and implicit markets: Product differentiation in pure competition. Journal of Political Economy 82 (1): 34–55.
Sheppard, Stephen. 1999. Hedonic analysis of housing markets. In Handbook of regional and urban economics, ed. P. Cheshire and E. S. Mills, 1595–1635. Amsterdam: Elsevier Science.
Stavins, Joanna. 1995. Model entry and exit in a differentiated product industry: The personal computer market. Review of Economics and Statistics 77 (4): 571–84.
Stone, Richard. 1956. Quantity and price indexes in national accounts. Paris: Organization for European Economic Cooperation.
Treadway, Arthur B. 1969. What is output? Problems of concept and measurement. In Production and productivity in the service industries, ed. Victor R. Fuchs, 53–84. Studies in Income and Wealth, vol. 34. New York: Columbia University Press.
Triplett, Jack E. 1969. Automobiles and hedonic quality measurement. Journal of Political Economy 77 (3): 408–17.
Triplett, Jack E. 2006. Handbook on hedonic indexes and quality adjustments in price indexes. Paris: Organization for Economic Cooperation and Development.
Triplett, Jack E., and Barry P. Bosworth. 2004. Services productivity in the United States: New sources of economic growth. Washington, DC: Brookings Institution.
Triplett, Jack E., and Barry P. Bosworth. 2006. “Baumol’s disease” has been cured: IT and multifactor productivity in U.S. services industries. In The new economy and beyond: Past, present, and future, ed. Dennis W. Jansen, 34–71. Cheltenham, UK: Edward Elgar.
Waugh, Frederick V. 1928. Quality factors influencing vegetable prices. Journal of Farm Economics 10:185–96.
Contributors
Jaison R. Abel Analysis Group 111 Huntington Avenue, 10th Floor Boston, MA 02199 Ana Aizcorbe Bureau of Economic Analysis 1441 L Street, NW Washington, DC 20230 B. K. Atrostic Center for Economic Studies U.S. Census Bureau 4700 Silver Hill Road, Stop 6300 Washington, DC 20233 Orazio Attanasio Department of Economics University College London Gower Street London WC1E 6BT, England Eric J. Bartelsman Faculty of Economics and Business Administration Vrije Universiteit Amsterdam De Boelelaan 1105 1081 HV Amsterdam, The Netherlands
Erich Battistin Department of Statistics University of Padova Via Cesare Battisti, 241 35123 Padova, Italy J. Joseph Beaulieu Brevan Howard, Inc. Suite 250 1776 Eye Street, NW Washington, DC 20006 Ernst R. Berndt Sloan School of Management, E52-452 Massachusetts Institute of Technology 50 Memorial Drive Cambridge, MA 02142 Barry P. Bosworth The Brookings Institution 1775 Massachusetts Avenue, NW Washington, DC 20036 Sean M. Dougherty OECD Economics Department 2 Rue André Pascal 75775 Paris Cedex 16, France Robert C. Feenstra Department of Economics University of California, Davis Davis, CA 95616
Kenneth Flamm L.B.J. School of Public Affairs University of Texas, Austin Austin, TX 78713 Harley Frazis Bureau of Labor Statistics 2 Massachusetts Avenue, NE Washington, DC 20212-0001 Michael J. Geske Washington University School of Medicine 660 South Euclid Avenue St. Louis, MO 63110 Robert J. Gordon Department of Economics Northwestern University Evanston, IL 60208-2600 Shane Greenstein Kellogg School of Management Northwestern University 2001 Sheridan Road Evanston, IL 60208-2013 Michael J. Harper Division of Productivity Research and Program Development Bureau of Labor Statistics 2 Massachusetts Avenue, NE Washington, DC 20212-0001 Judith K. Hellerstein Department of Economics Tydings Hall University of Maryland College Park, MD 20742 Saeed Heravi Cardiff Business School Cardiff University Colum Drive Cardiff CF10 3EU, Wales Charles R. Hulten Department of Economics Room 3105, Tydings Hall University of Maryland College Park, MD 20742
Hidehiko Ichimura Faculty of Economics University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-0033, Japan Robert Inklaar Faculty of Economics University of Groningen PO Box 900 9700 AV Groningen, The Netherlands Dale W. Jorgenson Department of Economics Harvard University 1805 Cambridge Street Cambridge, MA 02138 Anjum Khurshid L.B.J. School of Public Affairs University of Texas, Austin Austin, TX 78713 Robert H. McGuckin, deceased David Neumark Department of Economics 3151 Social Science Plaza University of California, Irvine Irvine, CA 92697-5100 Sang Nguyen Center for Economic Studies U.S. Census Bureau 4700 Silver Hill Road, Stop 6300 Washington, DC 20233 Valerie A. Ramey Department of Economics, 0508 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0508 Marshall B. Reinsdorf Bureau of Economic Analysis 1441 L Street NW, Mail Stop BE-40 Washington, DC 20230 Matthew D. Shapiro Department of Economics University of Michigan Ann Arbor, MI 48109-1220
Mick Silver Statistics Department International Monetary Fund 1900 Pennsylvania Avenue, NW Washington, DC 20431 Jonathan Skinner Department of Economics 6106 Rockefeller Hall Dartmouth College Hanover, NH 03755 Douglas Staiger Department of Economics HB6106, 301 Rockefeller Hall Dartmouth College Hanover, NH 03755-3514 Jay Stewart Employment Research and Program Development Staff Bureau of Labor Statistics 2 Massachusetts Avenue, NE Washington, DC 20212 Greg Stranger Boston Consulting Group Two Embarcadero Center, Suite 2800 San Francisco, CA 94111
Jack E. Triplett The Brookings Institution 1775 Massachusetts Avenue, NW Washington, DC 20036 Bart van Ark Faculty of Economics University of Groningen P.O. Box 800 9700 AV Groningen, The Netherlands Todd vanGoethem Bain & Company, Inc. 233 South Wacker Drive, Suite 4400 Chicago, IL 60606 Alan G. White Analysis Group 111 Huntington Avenue, 10th Floor Boston, MA 02199
Author Index
Abel, Jaison R., 10 Abowd, John M., 33, 33n3 Abraham, Katharine G., 93 Abramovitz, Moses, 346, 575n4 Adelman, I., 239 Aizcorbe, A., 248, 252, 352n4, 359, 372, 378 Allen, R. G. D., 274 American House Survey (AHS), 157 Andrews, William H., 65 Arguea, N. M., 240 Ashenfelter, Orley C., 33 Aten, Bettina, 345n15 Atrostic, B. K., 11–12, 383, 384, 395, 395n10, 404 Attanasio, Orazio, 13, 416, 517, 518, 519, 519n2 Azoulay, Pierre, 549 Babcock, Jarvis M., 546 Bacharach, Michael, 461 Bailar, Barbara A., 91n20 Baily, Martin N., 385, 396n12, 403n20, 415, 416, 436, 443 Bajari, Patrick, 12 Baldwin, John, 327n8, 399 Balk, B. M., 244, 273n4 Banks, James, 527 Barr, Abigail, 551 Bartelsman, Eric J., 12, 44n14, 399 Barua, A., 404 Baskin, Robert M., 164 Basu, S., 363n19
Bates, John M., 462n10 Battistin, Erich, 13, 516, 517, 518, 527n7, 530, 531, 532, 533, 535, 536 Baumol, William, 328, 346, 346n16, 347, 414n1 Beaulieu, Joseph, 12 Becker, Gary S., 47, 53, 82 Becker, R., 386 Bell, William R., 451n3 Benkard, Lanier, 162 Ben-Porath, Yoram, 47 Berenson, Stephen A., 164 Berman, Eli, 61, 386, 515 Berndt, Ernst R., 10, 101, 107, 110, 112, 122n1, 198, 199, 207n5, 220, 239n3, 240, 249, 251, 252, 274, 352n6, 353, 353n8, 353nn10–11, 485, 549, 551n5, 577, 577n6, 579 Bernstein, J., 319n33 Berry, Steven, 580 Berwick, Donald M., 545, 549, 550 Blank, David M., 157, 158, 166n17, 182, 183 Blundell, Richard, 516, 517, 535 Boskin, Michael, 6, 20, 21, 154n2, 235, 272 Bostrom, Ann, 93 Bosworth, Barry, 12, 386, 403, 414, 414n1, 415, 418n4, 419, 419n5, 420, 431, 433n16, 436n18, 576 Bound, John, 61, 386, 515 Braatz, Jay, 551 Brandner, Lowell, 546
Bresnahan, T., 387, 389 Brown, Clair, 154, 155, 155n7, 156, 158, 163n13, 178 Brunner, E. D., 295 Brynjolfsson, Erik, 116, 287n19, 387, 389, 404 Burke, Mary A., 549 Busch, S. H., 274 Calvó, Armengol Antoni, 546n2 Cameron, G., 319n33 Campbell, D., 385, 396n12, 403n20 Card, David, 33 Case, Karl E., 156 Caselli, Francesco, 545, 550 Census of Manufacturers (CM), 45 Chamberlain, Gary, 347 Chao, L., 274, 315 Chow, Gregory, 326 Christensen, Laurits R., 17, 49n18, 328, 329, 347 Clayton, T., 399 Clemente, Peter C., 198 Cochrane, John H., 517 Cockburn, Ian, 585 Cole, Rosanne, 240, 352n6, 579 Colecchia, Alessandra, 326n4 Coleman, James S., 547, 550, 551n4, 558 Coleman, Wilbur John, II, 545, 550 Colwell, Peter, 578 Comin, Diego, 545, 551n5 Conley, Timothy G., 546n2, 549 Cooper, Russell, 109, 386 Corrado, Carol, 20, 359, 372, 378, 441, 463 Court, Andrew T., 577n6 Crandall, R., 354n14 Criscuolo, C., 400 Crone, Theodore M., 157, 158, 161, 164, 164n15, 165, 166, 168n18, 173, 186, 187, 193 Cuerdon, Timothy, 545, 549, 555 Cummings, Diane, 328, 329, 347 Cutler, David M., 516, 546, 559 Dalén, Jorgen, 315, 586 Dalton, K. V., 242, 251 Danzon, P., 274, 315 Davis, Steven J., 516, 517, 518 Deaton, Angus S., 517, 581 Dedrick, J., 397, 398, 398n14, 404 Denison, Edward F., 17, 344, 344n13, 345, 346, 347, 575, 575nn4–5, 576
Díaz, Antonia, 162 Diewert, W. Erwin, 240, 244, 244n7, 246, 247, 248, 251, 253nn18–19, 273n4, 274, 281, 372, 372n21, 484, 486, 490, 581, 583 Dilmore, Gene, 578 DiMasi, J. A., 294n7 Dixon, Robert, 548, 554n9 Domar, Evsey, 424, 425n10, 468 Doms, Mark, 44n14, 122n1, 126, 142n11, 353, 353n12, 354n15, 359, 372, 378, 378n24, 385, 386, 396, 404 Dougherty, Chrys, 328, 329, 347 Dougherty, Sean M., 10 Doughtery, A., 162 Downes, Tom, 205 Dulberger, E., 240, 352n3 Dunne, T., 385, 386, 388n2, 394, 396, 404 Dupuy, Chris, 201 Durlauf, Steven, 346n16, 551 Ehrenberg, Ronald G., 441 Fafchamps, Marcel, 551 Feenstra, Robert C., 12, 243n6 Fernald, J. G., 363n19 Fisher, Elliott, 546, 554n10, 559 Fisher, Franklin, 19, 102, 112, 114, 417 Flamm, Kenneth, 11, 352n3, 352n5, 353n7, 353n13 Forman, C., 354n15, 378n24 Forsyth, F. G., 251 Foster, Andrew D., 549 Foster, Lucia, 37 Fournier, Gary M., 549 Fowler, R. F., 251 Frank, Richard G., 274, 567n16 Frank, Robert H., 47 Fraumeni, Barbara M., 21n2, 294n5, 415, 438, 438n20, 440n21, 441, 585 Frazis, Harley, 8, 93, 94n25, 95n27 Freeman, C., 295 Friedberg, Leora, 94 Friedman, Lee, 546 Friedman, Milton, 20, 576 Fuss, Melvyn A., 107, 110, 112, 485 Gandal, N., 287n19 Gates, J., 384, 404 Gautheir, Ann, 80 Genesove, David, 161
Gersbach, H., 316, 316n30 Gershuny, Jonathan, 80 Geske, Michael, 9 Gilbert, Charles E., 465, 584n12 Godbey, Geoffrey, 80n8 Golan, Amos, 462n9 Goldfarb, Avi, 198, 214 Goldin, Claudia, 547, 551, 556, 556n13 Gollop, Frank M., 415, 424 Goodman, Jack, 156n10 Gort, Michael, 185 Gottlieb, Stephen S., 546 Gottschalk, Peter, 94 Grabowski, H. G., 294n7 Grebler, Leo, 157, 158, 166n17, 182, 183 Greenan, N., 385 Greenlees, J. S., 242, 251 Greenlees, John, 161 Greenspan, Alan, 5, 19–21 Greenstein, Shane, 9, 199, 200, 202, 204, 205, 207n6, 209, 229, 387, 389 Greenwood, Jeremy, 185 Griliches, Zvi, 3, 4, 5, 11, 16, 17, 18, 19, 31n1, 45, 59, 61, 65, 66, 99, 106, 112, 122n1, 198, 236, 238, 239, 239n3, 240, 241, 310, 326, 353n8, 353n10, 384, 386, 413, 414, 417, 436, 488, 510, 515, 546, 547–48, 554n9, 559, 567, 573, 574, 575, 575n5, 579, 582, 582n10, 583, 583n11, 584, 584n12, 586 Grimm, B., 271, 287, 353, 353n7, 353n9, 359, 372, 378 Gronau, Reuben, 82 Grunfeld, Yehuda, 417, 488 Guiso, Luigi, 551 Gullickson, William, 441 Günlük-Senesen, Gulay, 462n10 Gurbaxani, V., 397, 398, 398n14, 404 Haan, J. de, 247n12 Hall, Bronwyn, 546n1, 584 Hall, Robert E., 99, 102–3, 107, 114, 117, 122, 123, 465, 501 Haltiwanger, John, 37, 109, 384, 386, 388n2, 389n4, 394, 404 Hamermesh, Daniel S., 88, 94n25, 574n2 Hansen, R. W., 294n7 Harchaoui, Tarek, 327n8 Harcourt, G. C., 101, 112 Harmon, Harry H., 552 Harper, Michael J., 9, 441 Harvey, Andrew, 73, 80, 80n8
Haskell, J. E., 400 Hass, G. C., 578 Hausman, Jerry, 584 Havens, A. Eugene, 546, 548 Hellerstein, Judith K., 8, 32, 33n2, 37, 39, 39n7, 44n14, 46, 48, 48n16, 48n17, 50, 54, 54n25, 58 Helliwell, John, 551 Heravi, Saeed, 10, 236, 240, 249, 252, 253, 255 Herz, Diane, 74 Heston, Alan, 252, 298, 299, 300, 345, 345n15 Hill, Peter, 100 Hitt, Lorin M., 116, 387, 389, 404 Ho, Mun S., 326n5, 386, 435, 436, 444 Hobijn, Bart, 545, 551n5 Holloway, Sue, 83 Horrigan, Michael, 74 Howell, Joel D., 549 Hsiao, C., 240 Hsieh, Chang-Tai, 485, 500, 503n13, 510, 512 Huff, Edwin D., 545, 549, 555 Hulten, Charles R., 7, 19, 20, 25, 99, 101, 117, 122, 123, 154, 313, 385, 396n12, 403n20, 424, 485, 583n11 Hutchens, Robert M., 47 Ichimura, Hidehiko, 13 Inklaar, Robert, 10, 294n5, 297, 299n11, 315n28 Irwin, D. A., 352n3 Islam, Nazrul, 328, 346, 346n16, 347 Jackson, Matthew O., 546n2 Jaffe, S. A., 310 Jang, S. L., 352n3 Jarmin, Ron S., 44n14, 384, 386, 388n2, 389n4, 394, 404 Jaszi, George, 584n12 Jencks, Stephen F., 545, 549, 555 Johnson, Paul, 527 Jorgenson, Dale, 4, 8, 11, 17, 49n18, 100, 112, 123, 301, 326n5, 326n7, 327n10, 328, 334, 347, 352n4, 386, 399n16, 403, 415, 416, 422, 435, 436, 438, 441, 444, 451, 465, 501, 510, 575n5, 583, 583n11 Jovanovic, Boyan, 549 Judge, George, 462n9 Juhn, Chinhui, 95
Katz, Elihu, 547, 550, 551n4, 568 Katz, Lawrence F., 516, 547, 551, 556 Kemerer, C. F., 287n19 Keynes, John Maynard, 483, 487, 487n3 Khurshid, Anjum, 11 Kiba, T., 296, 297, 298 Kikuchi, J., 296, 297, 298 Kim, Jong-Il, 485 Klenow, Peter J., 352n3, 545 Kokoski, M., 240 Konijn, Paul, 315, 586 Koopmans, Tjalling, 16, 581 Kraemer, K., 397, 398, 398n14, 404 Kramarz, Francis, 33 Kranier, John, 162 Kravis, Irving, 298, 299, 300, 345 Kriebel, D. J., 404 Krizan, C. J., 37 Krol, Ed, 201, 203 Krueger, Dirk, 516, 541 Krugman, Paul R., 485 Kuroda, M., 301 Kuznets, Simon, 328, 334, 345, 346, 576, 583 Landefeld, J. Steven, 83, 83n10 Lane, Walter F., 164 Lau, Lawrence J., 49n18, 485 Lawrence, Robert Z., 415, 416, 512 Lawson, Ann, 416 Lazear, Edward, 47, 54 Lebow, D. E., 315n29 Lengermann, Paul, 33n3 Leontief, Wassily W., 101 Levinsohn, James, 66, 580 Lewbel, Arthur, 535 Lilien, Gary, 568 Lipsey, Robert E., 577 Loewenstein, George, 47 Lucas, Robert E., Jr., 20, 22 Luengo-Prado, Mari José, 162 Luft, Harold, 555n11 Lum, Sherlene K. S., 416 MacDonald, A. S., 295, 297, 298 Mace, Barbara J., 517 Mackie, Christopher, 131 Maddison, Angus, 328, 345, 346 Mairesse, Jacques, 66, 384, 385 Mandelkern, Gail, 157, 186n27 Mankiw, N. Gregory, 346, 347 Mansfield, E., 295n8 Marine, April, 203
Marschak, Jacob, 65 Mason, William, 579 McCarter, Robert J., 546 McCarthy, Mary E., 517, 526, 527, 531 McClellan, Mark, 546, 559 McCulla, Stephanie H., 83, 83n10 McGrattan, Ellen, 346n16 McGuckin, Robert H., 10, 44n14, 294nn4–5, 297, 299n11, 312n25, 316, 385 McKinney, Kevin, 33n3 Meeker, Mary, 201 Meese, Richard, 165 Menzel, Herbert, 547, 550, 551n4, 568 Mesenbourg, T., 404 Mincer, Jacob, 47 Miranda, Javier, 44n14 Miron, Jeffrey A., 451n3 Moch, Dietmar, 315, 586 Mohr, Michael F., 465 Monroe, C. W., 249 Moreau, Antoine, 585 Morgenstern, Oskar, 4 Morrison, Catherine J., 101, 363n19 Moses, Karin E., 154 Motohashi, Kazuyuki, 327n10, 399 Moulton, Brent R., 154 Moyer, Brian C., 416 Muelbauer, John, 581 Mukhopadhyay, T., 404 Mulligen, P. H. van, 315, 315n29 Nakamura, Leonard I., 157, 158, 161, 164, 164n15, 165, 166, 168n18, 173, 186, 187, 193 Nieuwenhuijsen, H. R., 399 Nerlove, Marc, 238, 584 Neumark, David, 8, 32, 33n2, 37, 39, 39n7, 44n14, 46, 48, 48nn16–17, 50, 54, 54n25, 58 Nguyen, Sang, 11–12, 383, 395, 395n10 Nordhaus, William D., 6, 73, 154, 424n9 Norsworthy, J. R., 352n3 Nyarko, Yaw, 549 O’Donnell, Shawn, 202 Ohta, Makoto, 239, 579 Okubo, Sumiue, 21n2, 294n5 Oliner, Stephen D., 100, 117, 122n1, 123, 271, 276, 287, 352n4, 363n19, 386, 403, 415, 416, 422, 433, 434n17, 435, 436 Olmstead, Alan L., 554 O’Mahony, Mary, 315n28, 416, 441
Pakes, Ariel, 19, 207n5, 220n18, 237, 238, 240, 248n13, 252, 580, 582 Parente, Stephen L., 545 Parker, Robert P., 271, 287, 454, 463, 463n11 Pascoe, George, 44n14 Paxson, Christina H., 517 Perri, Fabrizio, 516, 541 Petrin, Amil, 66 Phelps, Charles E., 549 Pieper, Paul E., 443 Pilat, D., 399 Pindyck, Robert S., 549 Pistaferri, Luigi, 516 Placek, Frank, 164 Postner, Harry H., 452 Power, L., 386, 396n12 Prasad, Kislaya, 549 Prescott, Edward C., 545 Preston, Ian, 516, 517 Prud’homme, Marc, 199, 276, 283 Putnam, Robert D., 547, 551 Quah, Danny, 346n16 Raff, Daniel M. G., 198 Ramey, Valerie, 9, 122, 125n5, 142 Randolph, William C., 157, 164, 166n16 Rao, D. S. Prasada, 483, 484, 486 Rappaport, Neal J., 198, 239n3, 251, 252, 352n6, 353, 353n8, 353nn10–11 Regev, Haim, 65 Reid, Margaret G., 82, 83 Reinsdorf, Marshall B., 12, 585 Rhodes, Paul W., 554, 554n8 Robinson, John P., 80n8, 93 Robinson, Sherman, 462n9 Rodriguez-Clare, Andres, 545 Rogers, Everett M., 546, 548, 550, 567n16 Romer, David, 346, 347 Romer, Paul, 346, 347 Rosen, Sherwin, 238, 580, 581, 582 Rosenzweig, Mark R., 549 Rozaklis, P., 240 Rudd, J. B., 315n29 Ruggles, Steven, 193 Rupert, Peter, 185 Sabelhaus, John, 527, 532 Sabourin, D., 399 St. Croix, Aimee, 73, 80n8 Sakuma, I., 296, 297, 298 Sapienza, Paola, 551
Schank, T., 386, 388n2, 389n4, 394 Schmitz, James, 346n16 Schreyer, Paul, 326, 326n4, 328, 399n16, 403n20 Schultz, Theodore W., 548, 574n3 Schultze, Charles, 131 Schwartz, Lisa K., 77 Selvanathan, E. A., 483, 484, 486 Seskin, Eugene P., 287, 454, 463, 463n11 Shah, C., 256 Shapiro, Matthew D., 9, 21, 122, 125n5, 142 Sheppard, Stephen, 165, 578 Sherwood, Mark K., 441 Shiller, Robert J., 156 Short, Sandra, 83 Sichel, Daniel, 20, 100, 117, 271, 276, 287, 352n4, 354n14, 363n19, 386, 403, 415, 416, 422, 433, 434n17, 435, 436 Sicherman, Nachum, 47 Silver, Mick, 10, 236, 240, 241, 242n5, 247, 249, 252, 253, 255, 255n20 Sinai, Todd, 163 Sinkhorn, Richard, 461 Skinner, Jonathan, 13–14, 545, 546, 549, 550n3, 551n5, 554, 554n10, 559 Slaughter, M. J., 400 Slesnick, Daniel T., 516, 526, 527, 531 Slifman, Lawrence, 441, 463 Smeeding, Timothy M., 94 Solow, Robert, 100–101, 102, 110, 346, 347, 417 Song, M., 352n3 Souleles, Nicholas S., 163 Spletzer, James R., 93 Staiger, Douglas, 13–14, 546, 549, 551n5, 554, 554n10, 559 Stavins, Joanna, 580 Stevens, Philip, 441 Stewart, Jay, 8, 75, 82n9, 93, 94n25, 95, 95n27 Stewart, K. J., 242, 251 Stigler, G., 235 Stiroh, Kevin J., 100, 112, 326n5, 352n4, 384, 386, 397, 398, 399, 399n16, 403, 415, 416, 422, 435, 436, 444, 451 Stolarick, K. M., 397 Stone, Richard, 577n6 Strang, David, 550, 551n4 Stranger, Greg, 9, 210, 229 Straus, Murray A., 546 Streitwieser, M. L., 385 Summers, Robert, 298, 299, 300, 345, 345n15
Syverson, Chad, 67n35 Szulc, B. J., 251
Tamplin, Sarah, 83 Taylor, G. A., 240 Timmer, M. P., 298, 299, 301, 315n28 Topiol-Bensaid, A., 385 Townsend, Robert M., 517 Trajtenberg, Manuel, 198 Treadway, Arthur B., 585 Triplett, Jack E., 7–8, 12, 99, 116, 236, 237n1, 238, 240, 243n6, 273n5, 365n20, 386, 403, 414, 414n1, 415, 418n4, 419, 419n5, 420, 431, 433n16, 436n18, 441, 576, 577, 577n6, 578n7, 585, 586 Troske, Kenneth R., 32, 44n14, 46, 48, 48n17, 50, 54, 54n25, 58, 385, 386, 396, 404 Tuma, Nancy Brandon, 550, 551n4
Udry, Christopher R., 546n2, 549 van Ark, Bart, 10, 298, 299, 301, 312n25, 315n28, 316, 316n30, 327, 327n9, 416 Van den Bulte, Christophe, 568 Van Garderen, K. J., 256 vanGoethem, Todd, 9 van Leeuwen, G., 399 Van Order, R., 162 Vogel, Robert A., 546 Voith, Richard, 157, 158, 161, 164, 164n15, 165, 166, 168n18, 173, 186, 187, 193
Waehrer, K., 240 Wallace, Nancy E., 165 Waugh, Frederick V., 578 Weale, Martin, 451n3 Webb, Anthony, 94, 255n20 Weil, David, 346, 347 Welch, Finis, 95 Weston, Rafael R., 157, 158n11, 175 White, Alan G., 10, 249 Wilcox, David W., 21, 451n3 Wilkening, Eugene A., 548 Wilson, D., 386, 399n15 Winnick, Louis, 157, 158, 166n17, 182, 183 Wittes, Janet, 546 Wu, H. X., 315n28 Wyckoff, Andrew, 325–26, 326n3 Wykoff, Frank C., 116, 122n1, 123
Yip, Eric, 328, 329, 347 Young, Alwyn, 295, 484, 485, 509, 512 Yu, Kam, 199, 276, 283 Yuskavage, Robert E., 415 Yusuf, Salim, 546, 547, 549 Zeldes, Stephen P., 451n3 Zingales, Luigi, 551
Subject Index
Page numbers followed by t or f refer to tables and figures, respectively. 1990 Census of Population: previous matched data using, 34–35 1990 Decennial Employer-Employee Dataset (DEED), 32; estimates from, 50–55; fine-tuning matching for, 38–39, 39n8; introduction to, 33; matching workers and establishments, 37–38; overview of, 35–37; previous work on, 48–50; representativeness of, for manufacturing workers, 40–43, 40t; WECD results and results from, 55–59 1990 Standard Statistical Establishment List (1990 SSEL), 32, 34–35, 34n4 Access providers. See Internet service providers (ISPs) Acute myocardial infarction (AMI). See Heart attacks Advanced Research Projects Agency Network (ARPAnet), 199 Age-related depreciation, 125, 133, 136–38, 141–42 Age-zero depreciation, 125 Aggregate price indexes, elementary units and, 271–74 Aggregation, 417, 459; of industry productivity measures, 422–30 American Housing Survey (AHS), 157, 191–92; hedonic regression estimates of apartment rents from, 165–68;
quality changes and, 168–72; quality issues in, 192–93 American Time Use Survey (ATUS), 13; data collection, 74; demographic information, 75; household production questions and, 82–87; introduction, 73–74; labor force information, 75–76; measuring hours worked questions and, 87–94; vs. other time use surveys, 80–82, 81t; summary questions for, 77–78; time diary of, 76–77; time estimates from data of, 79–80. See also Time use data Annual Survey of Manufacturers (ASM), 45 AOL, 204–5 Apartment rents: Brown’s evidence on quality change and, 178–82; evidence on quality change and, 182–83; hedonic regression estimates, for AHS data, 165–68; hedonic regressions based on Census of Housing data, 172–75; merging prehedonic and hedonic results for century-long perspective of, 158–86; study of, in Evanston, Illinois, 186–89; Weston data (1930–1970) for, 175–78. See also Rents Arrow-Debreu revolution, 15 Assets: machine model to analyze assets of, 107–11; obsolescence and demise of, 117–19
AT&T, 204–5 ATUS. See American Time Use Survey (ATUS) Balancing, 460–63 Baumol effect, 424n9 Baumol’s disease, 414, 414n1, 419 Bias: in CPI, 235–37; in-sample, 236; out-of-sample, 236 Block Numbering Areas (BNAs), 34n4 Bulletin boards, 203 Bureau of Labor Statistics (BLS), 73 Canada. See G7 nations Capital, IT, role of, 430–33 Capital inputs, measuring, 384–85 Capital measurement, 106–7; quality adjustment and, 103 Capital models, 100–103 Capital stock: calculating, 466; homogeneity assumption underlying measures of, 114–16 CES. See Constant elasticity of substitution (CES) CEX. See Consumer Expenditure Survey (CEX) Chained-base fixed effects indexes, 249 Chat rooms, 200 Computer depreciation: age-related estimates of, 136–38, 141–42; data for, 127–30; decomposing, 143–47; estimates of, 136–43; modeling, 131–36; obsolescence and, 133–36; obsolescence estimates for, 138–40; overview of, 121–23; theoretical framework for, 123–27 Computer inputs: empirical findings/discussion for relationships among computer networks, labor productivity and, 391–405; estimating relation of labor productivity to, 390–91; measuring, 386–87 Computer investments, 386–87 Computer networks: estimating impact of, 389–90; estimating relation of labor productivity to, 390–91; production function and, 383–84 Computer semiconductor devices: changes in quality, NIPA and, 99–100; empirical findings/discussion for relationships among computer inputs, labor productivity and, 391–405; hedonic model for new prices of, 130–36. See also
Microsoft’s personal computer software products Computer services, 386 Concordances, 457–58 Concording, 453, 463–65 Conference on Research in Income and Wealth (CRIW), 5, 6–7, 18n1 Constant elasticity of substitution (CES), 484, 490–96 Consumer Expenditure Survey (CEX), 13; comparison of expenditure means from two surveys, 526–28; description of, 520–26; introduction to, 516–20; wage inequality in, 528–30 Consumer Price Index (CPI): belief in upward bias, 153–54; case for downward bias in, for apartment rents, 163–65; circumstantial evidence for downward bias of, 155; costs of bias in, 24; gross rents over century and, 158–61; logical case for downward bias of, 154–55; measurement bias in, 235–37; overstatement of inflation and, 6; sources of bias in, 20–21 Consumer Price Survey (CPS), wage inequality in, 528–30 Consumption, measuring and modeling, 12–14 Consumption inequality, U.S., 530–33; introduction to, 515–16; results of study for, 538–41; studies of, 516–17 Cost-of-goods index (COGI), 24 Cost-of-living adjustments (COLAs), 23 Cost-of-living index (COLI), 24–25 CPI. See Consumer Price Index (CPI) CPS. See Consumer Price Survey (CPS) Current Population Survey (CPS), 74 DEED, 1990. See 1990 Decennial Employer-Employee Dataset (DEED) Depreciation. See Computer depreciation Deterioration, 125, 125n3 Diary Sample (DS), 517–18, 527; combining information from IS and, 533–38 Diffusion: data for study of, 553–56; economic models of, 549; empirical results of study of, 556–65; factor analysis approach to, 552–53; Griliches vs. sociologists on, 547–52; measuring and modeling, 12–14. See also Technology adoption Disaggregation, 459–60
Dummy time (variable) hedonic (DTH) indexes, 237, 247–48; vs. hedonic imputation indexes, 252–54 Econometric analysis, 15 Economic growth, labor and, 4 Economics, post–World War II revolutions in, 15 Economic systems, feedback mechanisms in, 22–23 Economic theory, accurate measurement and, 4 Elementary units, aggregate price indexes and, 271–74 Factor analysis model, 552–53 Feedback mechanisms, 22–23 Fisher hedonic index, 246 Fisher Ideal price index, 274 Fisher price indexes, 272–74 Fixed-base–fixed-effects indexes, 249 Fixed-basket indexes, 272–74 Fixed effects (panel) estimators, 248–49 France. See G7 nations Fully constrained fixed effects index, 249 G7 nations: alternative approaches to measuring economic growth of, 344–47; impact of investment in IT on economic growth of, 325–28; importance of investment and productivity for economic growth of, 336–44; investment/productivity in, 328–35; investment in IT in, 336–44 General equilibrium analysis, 15 Geometry, and hedonic indexes, 244–45 Germany. See G7 nations Griliches, Zvi, 547; contributions of, 15–16, 573–74; hedonic indexes and, 577–83; impact of, on productivity research, 576–77; influence of, on statistical agencies, 585–86; MFT mismeasurement hypothesis of, 574–76; research themes of, 16; topics that didn’t interest him, 583–85 Hall equation, 102–3 Heart attacks: Medicare claims data on treatment of, 554–55; other data on quality of medical care for, 555; treatment of, 546 Hedonic approach, 237–41
Hedonic imputation (HI) indexes, 237; vs. dummy time hedonic indexes, 252–54; unweighted arithmetic means of relatives, 245; unweighted geometric means, 242–44; weighted arithmetic means of relatives, 246; weighted geometric means, 244–45 Hedonic price indexes: concern of, 241; for CPI measurement, 236–37; data for study of, 254–55; Griliches’s contribution to, 577–83; for ISPs, 216–22; mean value function for, 247; methodology for studying, 251–54; methods of, 241–49; regressions for, 255–56; research questions for studying, 249–51; results for study of, 256–62; theory of, 237–41; weighted, 228–29. See also Price indexes Hedonic regressions, 236 Hicks-Samuelson revolution, 15 HI indexes. See Hedonic imputation (HI) indexes Hours worked, measuring, 87–94 Household production: defined, 83; National Income and Products Accounts and, 82–87 Human capital, 4 Hybrid corn, adoption of, 546–47 Hybrid indexes, 241 IBM, 204–5 ICT capital, 466–67, 466n12 ICT workers, 467n13 Indexes. See Price indexes Index numbers: integrating economic and stochastic approaches to, 487–89; introduction to, 483–85; stochastic approach, 485–87 Indirect current-period HI index, 244 Industry productivity: aggregating measures of, 422–30; measurement issues, 436–45 Inflation, overstatement of, CPI and, 6 Information technology (IT), 112; contribution of, to U.S. labor productivity growth, 418; impact of investment in, on G7 nations, 325–28; investment in, 336–44 Information technology (IT) capital, role of, 430–33 Information technology (IT) industries, studies of growth outside of, 433–36
In-sample bias, 236 Internet service providers (ISPs), 200; data set used for studying, 207–10; elementary price indexes for, 210–28; hedonic price indexes for, 216–28; history of, in U.S., 199–207; introduction to, 197–99; market/pricing structure of, 204–6; number of, in U.S., 203; organization of, 201f; price determinants for, 213–16; price indexes of, 206–7; pricing by, 200–204; size of, 202; weighted, 228–29 Interview Sample (IS), 517–18, 527; combining information from DS and, 533–38 Investment, productivity and, 328–35 ISPs. See Internet service providers (ISPs) IT. See Information technology (IT) Italy. See G7 nations Japan. See G7 nations Jevons index, 242 Knowledge capital, measurement of, 20–21 Labor: as component of R&D cost, 301–2; economic growth and, 4; measurement, 31–32; quality of, in production function, 43–48 Labor input, quality of (QL), 45–46 Labor productivity: empirical findings/discussion for relationships among computer inputs, computer networks and, 391–405; services industries and, 414; time series estimates of, 101; trends in, 418–22. See also Services productivity Labor productivity growth, measuring, 418 Laspeyres price indexes, 243, 246, 272–74 Listservs, 200 Longitudinal Research Database (LRD), 35, 44–45, 44n14 Machine model, 103–7; for aggregate capital measures, 111–12; marginal product of machines and, 112–14; nominal earnings of assets described with, 107–11 Marginal product, of machines, 112–14 Matched-Model price indexes: elementary units and, 271–72; for Microsoft’s software products, 274–78 MCI, 204–5 Measurable sectors, 5 Measurement, theory and, 16–19 Microsoft’s personal computer software products: background of, 270–71;
introduction to, 269–70; matched-model price indexes for, 274–78; results of price changes for, 278–83 Multifactor productivity (MFP), 5, 101; in IT-producing industries, 433–35; services industries and, 414; trends in, 418–22. See also Services productivity National Income and Product Accounts (NIPA), 15, 20; household production and, 82–87; quality change in computers and, 99–100 National Science Foundation, 199 Negative productivity growth, 440–43 Negative rents, 110–11 Networks. See Computer networks New Worker-Establishment Characteristics Database (NWECD), 34–35 Obsolescence, 117–19, 125–26, 138–39; of attributes, 139–40; computer depreciation and, 133–36 Out-of-sample bias, 236 Paasche hedonic current-period index, 246 Paasche-Laspeyres spread, 310–11 Paasche price indexes, 272–74 Patching, 241 Personal computers. See under Computers Personal computer software products. See Microsoft’s personal computer software products Personal consumption expenditures (PCE), 153 PPPs. See Purchasing power parities (PPPs); Research and development (R&D) PPP Prepackaged software products. See Software products Price hedonics, 4 Price indexes: aggregate, 271–74; Fisher, 272–74; hybrid, 241; Laspeyres, 272–74; Matched-Model, 271–72; Paasche, 272–74; rental, 161–63; Sato-Vartia, 490–91; for semiconductor devices, 355–63; weighted-hedonic, 228–29. See also Hedonic price indexes Price measurement, 9–11 Production function: accounting for unobservables for, 65–68; importance of heterogeneous labor for estimates of, 59–65; quality of labor input in, 43–48 Production technology, of plant, 43–44
Production with machines, model, 103–7 Productivity: investment and, 328–35; measuring and modeling, 12–14; men vs. women, 32. See also Industry productivity Productivity data sets: hurdles to overcome constructing, 451–54; metadata for, 457–58; organizational overview of, 454; relational structure of, 454–57; standardized operations, 458–65 Productivity Program (National Bureau of Economic Research), 18n1 Purchasing power parities (PPPs), 10–11. See also Research and development (R&D) PPP Quality change: Brown’s evidence on, rents and, 178–82; evidence on, 182–83; measuring, 116–19; quantifying, 183–85 Quality of labor input (QL), 45–46 Quantitative analysis, 15 R&D. See Research and development (R&D) Real capital, measuring, 111–16 Real R&D intensities, 314–18 Rental price indexes, conceptual issues in development of, 161–63. See also Consumer Price Index (CPI) Rental shelter housing: CPI and, 158–61; introduction to, 153–54; as research topic, 156–57 Rents: extraction of, from machines, 107–8; negative, 110–11. See also Apartment rents Research and development (R&D), 10–11; intensities, 314–18 Research and development (R&D) PPP: alternative versions of, at country level, 311–12; alternative versions of, at industry level, 312; alternative versions of, for 1987, 312–13; computation of, 300–305; estimation in manufacturing, 298–308; introduction to, 291–94; for 1987, 307–8; over time, 313–14; previous research on, 294–98; sensitivity of, 308–11 Residual model, Solow’s, 100–101 Sample Edited Detail File (SEDF), 34–35, 34n4 Sato-Vartia price index, 490–91
Scrappage, 126 Semiconductor devices: calculations for relative importance of, 376–77; construction of price indexes for, 355–63, 369–76; introduction to, 351–55; prices of, and prices of end goods, 363–68. See also under Computers Services productivity: introduction to, 413–16; measuring, 415–16. See also Labor productivity; Multifactor productivity (MFP) Singapore: application of study results to productivity in, 500–511; TFP for, 485 Software products: impact of quality change/inflation on, 286–98; studies in price changes of, 283–84; U.S. government producer price indexes for, 284–86. See also Microsoft’s personal computer software products Solow’s residual model, 100–101 Solow vintage capital model, 101–2 SSEL, 1990. See 1990 Standard Statistical Establishment List (1990 SSEL) Statistical agencies, Griliches’s influence on, 585–86 System of National Accounts (SNA), 82–83 Szalai International Study, 73 Technology adoption: introduction to, 545–47; measures of nonmedical, 553–54; state-level factors influencing rate of, 556. See also Diffusion TFP. See Total factor productivity (TFP) Theory, measurement and, 16–19 Time use data: activities of nonworker uses of, 95–96; household production and, 82–87; income/well-being uses of, 94–95; intrahousehold allocation of time uses of, 94; measuring hours worked and, 87–94; uses of, 82–96. See also American Time Use Survey (ATUS) Törnqvist hedonic-imputation index, 246 Total factor productivity (TFP): growth, 101; requirements for measuring, 450–51 Translog function, 496–99; with stochastic prices, 499–500 “Tree of Zvi,” 585n13 United Kingdom. See G7 nations United States. See G7 nations; Services productivity Unmeasurable sectors, 5 Unweighted hedonic indexes, 242–45
U.S. labor productivity: basic industry data for, 465–66; contribution of IT to growth of, 418; investment data for, 465; labor services and, 466–67; methodology for measuring growth of, 418; in nonfarm business, 467–69; outside computer/semiconductor manufacturing, 417; overview of growth of, 416–18; Y2K and, 469–72
Vintage aggregation, introduction to, 99–100 Vintage capital model, Solow, 101–2
Wage inequality, in CEX/CPS, 528–30 Weighted hedonic price indexes, 228–29 Women, productivity of, 32 Worker-Establishment Characteristics Database (WECD), 34–35, 40n9; DEED results and results from, 55–59 World Wide Web (WWW), 200 Write-in file, 36–38 Y2K, 469–72