
Author: Pierre Perron and Richard J. Smith


The Econometrics Journal

Editorial Announcement

The Denis Sargan Econometrics Prize

The Econometrics Journal on behalf of the Royal Economic Society intends to initiate The Denis Sargan Econometrics Prize. The prize will be awarded for the best (unsolicited) article published in The Econometrics Journal in a given year by anyone who is within five years of being awarded their doctorate. An honorarium of £1000 will be awarded to the winning author. The winner of The Denis Sargan Prize will be chosen by The Econometrics Journal Editorial Board (Managing Editor and Co-Editors) and the prize awarded in the year following publication of the winning article. The first award of the prize will be for an article published in The Econometrics Journal during 2011. If an article of sufficient quality is not forthcoming in a given year the prize will not be awarded.

The Denis Sargan Econometrics Prize commemorates the fundamental contributions to and profound influence on econometrics made by (John) Denis Sargan. Denis Sargan, after periods spent at the Universities of Leeds, Minnesota and Chicago, was appointed initially as Reader in Econometrics and then held the Chair of Econometrics at the London School of Economics and Political Science from 1963 until 1984. He was Tooke Professor of Economic Science and Statistics from 1982 until his retirement in 1984 and was President of the Econometric Society in 1980. Denis Sargan died in 1996.

Denis Sargan’s research was prescient, anticipating many of the themes that now dominate the econometrics literature. His 1964 Colston paper ‘Wages and prices in the United Kingdom: A study in econometric methodology’ introduces the error correction mechanism that underpins much macroeconomic empirical research. Modern concerns with near identification are reflected in Denis Sargan’s Presidential Address to the Econometric Society in 1980 published as ‘Identification and lack of identification’ in Econometrica in 1983.
He made a number of seminal and substantive contributions to research on the estimation of linear simultaneous equations and instrumental variables models that also feature centrally in current econometric enquiry. His investigations into exact finite sample properties of econometric estimators and statistics and the development of improved asymptotic approximations to their small sample behaviour reflected his deep concern with the accuracy of econometric inferences.

Interested readers might also like to consult:

1. Obituary written by Meghnad J. Desai, David F. Hendry and Grayham E. Mizon. Published in The Economic Journal (1997), 107, pp. 1121–1125 (available at http://www.jstor.org/stable/2957853?seq=1).

2. Memorial Issue of Econometric Theory dedicated to John Denis Sargan, 1924–1996. In a Memorial article entitled ‘Vision and influence in econometrics: John Denis Sargan’ by Peter C. B. Phillips, Denis Sargan’s intellectual influence in econometrics is discussed and some of his visions for the future of econometrics are considered. Published in Econometric Theory (2003), 3, pp. 416–514.

3. Economic Theory Interviews: Professor J. D. Sargan. By Peter C. B. Phillips. Published in Econometric Theory (1985), 1, pp. 119–139 (available at: http://www.jstor.org/stable/3532078?seq=1).

4. Econometric Theory Memorial Issue (2003), 19, pp. 417–422, Cambridge University Press. This issue of Econometric Theory is a Memorial Issue to commemorate John Denis Sargan. It brings together two of Denis Sargan’s essays on econometrics, a laudation by Antoni Espasa, and three separate memorial essays written by David F. Hendry, Peter M. Robinson, and Peter C. B. Phillips. Also included are some photographs of Denis Sargan and his family that were most kindly given to ET by Mary Sargan for publication in this issue. Available at: http://journals.cambridge.org/action/displayJournal?jid=ECT.

The Econometrics Journal
Econometrics Journal (2011), volume 14, pp. Ci–Ciii. doi: 10.1111/j.1368-423X.2010.00339.x

Royal Economic Society Annual Conference 2009 Special Issue on Factor Models: Theoretical and Applied Perspectives

EDITORIAL

The papers in this Special Issue on Factor Models: Theoretical and Applied Perspectives arise out of the invited presentations given in The Econometrics Journal Special Session on this topic at the Royal Economic Society Annual Conference held 20–22 April 2009 at the University of Surrey. The organization of Special Sessions on subjects of current interest and importance at Royal Economic Society Annual Conferences is an initiative of the Editorial Board of The Econometrics Journal to enhance further the profile and reputation of the journal. The Editorial Board is responsible for the choice of topic and organization of the Special Session. The intention is by judicious choice of topics and speakers to encourage further a higher standard of submissions to The Econometrics Journal. The 2009 Special Session on Factor Models: Theoretical and Applied Perspectives was organized by Pierre Perron and Richard J. Smith, Co-Editor and Managing Editor of The Econometrics Journal respectively, with Pierre Perron overseeing the editorial process for the submitted papers arising from the Special Session. Of course, in such a diverse field and owing to the time constraint imposed by the Special Session, the specific topics considered are necessarily restrictive but hopefully they do provide an impression of a few of the current frontiers pertaining to factor models and their applications.

Though factor models have a long history, they have received considerable attention recently from both theoretical and empirical perspectives. This is in large part due to the increased availability of large data sets, particularly in macroeconomics and finance, and the need to summarize the wealth of information they offer in an efficient manner. In the context of a sensible model and when properly estimated, the factors do indeed provide a summary of the relevant information useful for the stated purpose.
They play an important role in forecasting, but also in regression analysis to deal with cross-sectionally correlated errors and endogeneity bias, as well as in improving the performance of standard vector autoregressive models. Given this potential and the increased availability of large data sets, econometricians have recently tackled estimation and inference problems in both static and dynamic factor models. Important advances have been made related to estimating the number of common factors to be used and the distribution theory allowing cross-sectional dependence especially in large panel data models. Issues related to forecasting and impulse response analyses have also been at the forefront of the recent research agenda. The invited speakers for this special session, Serena Ng, M. Hashem Pesaran and Lucrezia Reichlin, have been leaders in this recent strand of research and we are grateful that they agreed to provide additional contributions. These cover an interesting array of topics: specification and application of hierarchical factor analysis, the modelling of cross-sectional dependence in large panels and their estimation, and the role of factors in improving macroeconomic forecasts.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA, 02148, USA.

The paper by Emanuel Moench and Serena Ng provides a study of the linkages between housing and consumption in the United States; more precisely it offers a quantitative assessment of the dynamic effect of housing shocks on retail sales. The starting point is a factor-augmented vector autoregressive model at both the national level and four subregions. Here the factors intend to summarize the relevant information about housing prices and sales from a large set of indicators commonly used. Some novel features are noteworthy. First, the factor analysis is hierarchical in the sense that variations in economic time series can be idiosyncratic, common to the series within a block (here the geographical regions), or common across blocks (the US in general). Second, special care in designing the estimation procedure is applied to deal with the fact that the data set is unbalanced, the series are not available with the same span nor at the same frequency. Data augmentation techniques are used in the context of a state space model estimated via a Gibbs sampling algorithm. Several interesting findings are obtained which have important policy implications: a drop in housing market activity does lead to a significant decline in consumption, interest rate cuts do stimulate housing activity and, despite large idiosyncratic variations, there is a national and regional component in each of the geographical regions, the latter being more pronounced in the West. The paper by Elena Angelini, Gonzalo Camba-Mendez, Domenico Giannone, Lucrezia Reichlin and Gerhard Rünstler deals with the short-term forecasts of Euro area GDP. The issue of interest is how to use monthly releases of many macroeconomic variables to construct an early estimate of the current quarterly value of GDP, more precisely a problem of now-casting instead of forecasting which is undoubtedly of great interest to central banks. 
The method proposed is to bridge quarterly GDP with monthly data via a regression involving factors extracted from a large panel of monthly series released at different times. The traditional so-called bridge equations are regressions of the variable of interest, here GDP, on a small set of pre-selected monthly indicators. This can be repeated for many sets of such monthly indicators, with the forecast taken as some pooled estimate. An alternative method to exploit the large information in the monthly indicators is to combine the predictors in a few common factors and use those as the regressors. The authors show that the latter indeed provides better now-casts of Euro area GDP.

The paper by Alexander Chudik, M. Hashem Pesaran and Elisa Tosetti is a theoretical work related to estimation and inference in large panels with weak or strong cross-sectional dependence. The starting point is the distinction between strong and weak cross-sectional dependence which is made more general, in particular to allow some form of non-stationarity in the time series structures. The notions introduced are related to those of weak, strong and semi-strong common factors. It is those factors that represent the various forms of cross-sectional dependence. It is shown that such general linear dependence can be modelled in terms of a fixed number of strong factors and a large number of weak factors. A central result is that the Common Correlated Effects (CCE) estimator proposed earlier by M. Hashem Pesaran remains consistent and asymptotically normal under some conditions on the loadings, including cases where methods relying on principal components fail. This theoretical result is further substantiated via simulations that show the method proposed to outperform other popular procedures.
The main reason is that the traditional principal components-based method fails in the presence of weak or semi-strong factors while the CCE estimator, by not aiming at consistent estimation of the factors, effectively deals with the dependence via cross-section averages, thereby allowing consistent estimation of the slope parameters of interest.

We believe that these three papers should provide readers with a view of the importance of factor models for exploiting the information in large data sets that have become increasingly available. The breadth of potential applications is the result of intense and fruitful recent and ongoing research in theoretical econometrics. We would like to take this opportunity to thank all the authors for responding to our request for a contribution to this Special Issue on Factor Models: Theoretical and Applied Perspectives with these three timely papers. Especial appreciation is owed to the referees, listed below, of the three aforementioned papers. Without their assistance, this Special Issue would not have been possible.

C. M. Dahl, D. Giannone, M. Hallin, G. Kapetanios, M. Lippi, V. Solo

Pierre Perron
Co-Editor
The Econometrics Journal
Department of Economics
Boston University
270 Bay State Road
Boston, MA 02458, USA

Richard J. Smith
Managing Editor
The Econometrics Journal
Faculty of Economics
University of Cambridge
Austin Robinson Building
Cambridge CB3 9DD
UK


The Econometrics Journal
Econometrics Journal (2011), volume 14, pp. C1–C24. doi: 10.1111/j.1368-423X.2010.00319.x

A hierarchical factor analysis of U.S. housing market dynamics

Emanuel Moench† and Serena Ng‡

† Federal Reserve Bank of New York, 33 Liberty St., New York, NY 10045, USA. E-mail: [email protected]

‡ Columbia University, 420 W. 118 St., MC 3308, New York, NY 10025, USA. E-mail: [email protected]

First version received: July 2009; final version accepted: March 2010

Summary

This paper studies the linkages between housing and consumption in the United States taking into account regional variation. We estimate national and regional housing factors from a comprehensive set of U.S. price and quantity data available at mixed frequencies and over different time spans. Our housing factors pick up the common components in the data and are less affected by the idiosyncratic noise in individual series. This allows us to get more reliable estimates of the consumption effects of housing market shocks. We find that shocks at the national level have large cumulative effects on retail sales in all regions. Though the effects of regional shocks are smaller, they are also significant. We analyse the driving forces of housing market activity by means of factor-augmented vector autoregressions. Our results show that lowering mortgage rates has a larger effect than a similar reduction of the federal funds rate. Moreover, lower consumer confidence and stock prices can slow the recovery in the housing market.

Keywords: FAVAR, Hierarchical factor models, Mixed sampling frequency, Missing values.

1. INTRODUCTION

This paper provides a quantitative assessment of the dynamic effects of housing shocks on retail sales. The econometric exercise consists of estimating national and regional housing factors from large non-balanced panels of data. The economic analysis consists of studying the dynamic response of national as well as regional retail sales to housing market shocks, and assessing the sensitivity of the housing factor to economic conditions and stimulus. As a by-product, we obtain estimates of ‘house price factors’ that summarize the common information in the different published house price indicators.

Our approach has three features. First, we make a distinction between ‘national’ and ‘regional’ housing markets in recognition of the fact that not all variations in housing are pervasive. Second, we consider shocks to the housing ‘market’ as opposed to shocks to home ‘prices’ only. The analysis thus makes extensive use of housing data rather than relying on a single measure of house prices. Third, we use diverse measures of price and volume to determine the (latent) state of the housing market. The non-balanced panel of data covers series sampled at different frequencies and that are available over different time spans.

Our analysis consists of two steps. We first use a dynamic hierarchical (multi-level) factor model to disentangle information on the housing market into national, regional and series-specific components. For each region, we embed the estimated national and regional housing factors along with other variables that control for the effects of regional business cycles into factor-augmented vector autoregressions (FAVAR). An analysis of the impulse responses then allows us to study the propagation mechanism of regional and national housing shocks.

Several considerations motivate our analysis. First, over the last few years, the U.S. housing market has experienced an extended period of expansion followed by an abrupt and pronounced downturn. Newspaper articles and the media often suggest that a housing boom stimulates, while a housing bust slows non-housing activity, in particular consumption. For example, reporting on a speech made by Federal Reserve Chairman Greenspan, the Los Angeles Times (March 9, 1999) wrote, ‘Capital generated by a booming housing market have probably spurred consumer spending and given a strong boost to the U.S. Economy’. In its October 16, 2006, issue, Newsweek magazine wrote that, ‘if home prices drop too much, the damage to consumer confidence and spending won’t be easily offset’. Although two-thirds of U.S. households are homeowners, evidence on the macroeconomic effects of housing shocks has been limited to point estimates from micro data for certain demographic groups or short panels, neither of which is ideal for studying the dynamic response to housing shocks at the aggregate level. There are few VARs estimated for consumption and housing at the national level and even fewer at the regional level because it is difficult to find housing and consumption data that are available for a long enough period to make estimation of a VAR suitable. We circumvent this problem by not restricting ourselves to house price data alone.
We also do not restrict ourselves to data sampled at the same frequency. Instead, we extract common housing factors at the national and regional level from data on house prices and housing market quantities. We then use FAVARs to trace out the aggregate effects of housing shocks without fully specifying the structure of the housing market.

Second, the notion of a ‘U.S. housing market’ disregards the fact that there is substantial variation in housing activity across markets, a point also raised by Calomiris et al. (2008). Indeed, of the four regions defined by the Census Bureau, the Northeast (including New York and Massachusetts) and the West (including California, Arizona and Nevada) have historically had more active housing markets than the South (including Texas, Florida and Virginia) and the Midwest regions (including Illinois, Ohio and Michigan). Consumers in regions not used to large variations in the housing market might respond differently from those that are accustomed to housing cycles. Consumption responses might also depend on regional business cycle conditions. It is very much an empirical matter whether regional differences in the consumption response to housing shocks exist.

Third, there exist numerous measures of house prices, including the well-known Case–Shiller price index, the indices published by the Federal Housing Finance Agency (FHFA, formerly the Office of Federal Housing Enterprise Oversight or OFHEO), and indices published by the National Association of Realtors (NAR). Some series are available for time spans longer than 30 years, while others are available for a little over a decade. Calomiris et al. (2008) note that many issues surrounding empirical estimates of the wealth effect of housing relate to the definition of the house price. We address this problem by estimating a house price factor that extracts the common variations underlying all indicators of house prices, thereby filtering out idiosyncratic noise.
While price is a key indicator of the state of the housing market, data on the volume of transactions are also available. Leamer (2007) argues that the ‘volume cycle’ rather than the ‘price cycle’ is what makes housing important in U.S. business cycles. Some regional markets may be more prone to high volatility in house prices and to high volume of transactions than others. In order to assess how the activities in the housing ‘market’ (as opposed to house prices) affect consumption, we construct broad-based measures of the state of the housing markets at both the regional and the national level. In our analysis, this is handled using a hierarchical factor model framework.

Our results can be summarized as follows. First, we find that the national and regional factors are of comparable order of importance in three of the four regions, but regional variations are much more important in the West. Second, there is a marked difference between the house price and the housing market factor in the Midwest and the South. Third, retail consumption in all regions responds positively to a national housing shock with the peak occurring about 15 months after the shock. A two-standard-deviation shock in the housing factor can reduce aggregate consumption by as much as 8%, all else equal. However, the consumption responses are largely driven by responses to house price shocks. Shocks to volume have significantly smaller effects. A FAVAR in variables that might affect the housing factor indicates that interest rate cuts will stimulate housing activity, with reductions in mortgage rates potentially having a larger impact than similar cuts in the fed funds rate. Moreover, consumer confidence and stock prices both have a significant effect on housing market activity.
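To make the idea of national, regional and series-specific components concrete, the following is a minimal sketch of a static two-level factor structure and the variance shares it implies for a single series. It is an illustration only: the model estimated in the paper is dynamic, and the function names, the symbols `gamma`, `lam`, `sd_region` and `sd_idio`, and all numerical values are hypothetical choices made for this example.

```python
import numpy as np


def variance_shares(gamma, lam, sd_region, sd_idio):
    """Analytic variance shares of one series in a static two-level factor model.

    Assumed (hypothetical) structure, with a unit-variance national factor F:
        G = lam * F + sd_region * u      (regional factor)
        X = gamma * G + sd_idio * v      (observed series)
    where F, u and v are mutually uncorrelated with unit variance.
    Returns the (national, regional, idiosyncratic) shares of Var(X).
    """
    national = (gamma * lam) ** 2
    regional = (gamma * sd_region) ** 2
    idiosyncratic = sd_idio ** 2
    total = national + regional + idiosyncratic
    return national / total, regional / total, idiosyncratic / total


def simulate_series(gamma, lam, sd_region, sd_idio, n_obs, seed=0):
    """Draw the national factor, the regional factor and one observed series."""
    rng = np.random.default_rng(seed)
    national_factor = rng.standard_normal(n_obs)
    regional_factor = lam * national_factor + sd_region * rng.standard_normal(n_obs)
    series = gamma * regional_factor + sd_idio * rng.standard_normal(n_obs)
    return national_factor, regional_factor, series


if __name__ == "__main__":
    # Equal loadings and unit variances: the three components matter equally.
    print(variance_shares(1.0, 1.0, 1.0, 1.0))
    # A weaker national loading and a noisier regional factor: regional variation dominates.
    print(variance_shares(1.0, 0.5, 1.5, 1.0))
```

With equal loadings and unit variances each component accounts for one third of the variance. Lowering the loading on the national factor while raising the regional shock variance shifts the decomposition towards the regional component, which is the kind of pattern the text reports for the West.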

2. THE DATA

We assemble a data set that covers the four census regions Northeast, Midwest, West and South. We also have aggregate measures of housing activity for the United States, which we will refer to as ‘national’ data. We combine information from various sources, including federal agencies and private institutions. Instead of focusing on house price indicators alone, our data set consists of both price and volume information. This allows us to capture the different dimensions of the housing market. The data are further sampled at different frequencies: some series are monthly, and some are quarterly. Moreover, some series start as early as 1963, while some are not available until 1990. We thus face an unbalanced panel. We take January 1973 to be the beginning of our sample, and the last data entry is May 2008. The key series are listed in Tables 1 and 2.

To motivate the analysis to follow, it helps to have a quick review of the data used. As house price is the key indicator of housing wealth, most studies quite naturally use some measure of house prices to study the effect of housing market shocks on real economic activity. However, there exist several indicators of house prices published by various data providers, each employing different data sources and aggregation methods. While these house price series are correlated, each has some distinctive features. Ideally, one would therefore like to use a genuine measure of price movement. In this paper, this is taken to be variations that are common to all observed price measures.

In general terms, the literature distinguishes between three main ways to measure house prices. As discussed in Rappaport (2007), the simplest method computes the average or median of house prices observed in a period. This ‘simple approach’ can be volatile due to a changing composition of high and low priced units. The ‘repeat sales method’ focuses on houses that have been sold more than once. It provides a price index and does not measure the price level itself. Furthermore, the number of repeat transactions can be small relative to total transactions, and the index is subject to continual revisions. The third is the ‘hedonic approach’ which uses statistical methods to control for differences in quality. As a general matter, price measures based on repeat transactions are often thought to give more precise estimates of house price appreciation. Prices subject to compositional effects are believed to be better at measuring the amount required to purchase housing than at estimating the rate of house price changes.

Table 1. Regional data.

Series                                                Source   Frequency  First obs.  TFCode
Price data
  Median Sales Price of Single-Family Existing Homes  NAR      mly        Jan1968     2
  Single-Family Median Home Sales Price               CENSUS   qly        Q1-1968     2
  Average Existing Home Prices                        NAR      qly        Q1-1970     2
  Average New Home Prices                             NAR      qly        Q1-1970     2
  CMHPI                                               FHLMC    qly        Q1-1970     2
  FHFA Purchase-Only Index                            FHFA     mly        Jan1986     2
  FHFA Home Prices                                    FHFA     qly        Q1-1970     2
Volume data
  New One-Family Houses Sold                          CENSUS   mly        Jan1968     1
  New One-Family Houses For Sale                      CENSUS   mly        Jan1968     1
  Single-Family Housing Units under Construction      CENSUS   mly        Jan1980     1
  Multi-Family Units under Construction               CENSUS   mly        Jan1980     1
  Homeownership Rate                                  CENSUS   qly        Q1-1968     1
  Homeowner Vacancy Rate                              CENSUS   qly        Q1-1968     1
  Rental Vacancy Rate                                 CENSUS   qly        Q1-1968     1

Note: This table reports the regional housing data used in the estimation. All variables are available for each of the four Census regions ‘Northeast’, ‘MidWest’, ‘South’ and ‘West’. The column ‘TFCode’ documents the transformations applied to the raw series prior to estimation. TFCode = 1 refers to annual growth rates, TFCode = 2 refers to annual differences of annual growth rates.

The Federal Housing Finance Agency (FHFA) indices include only homes with mortgages that conform to Freddie Mac and Fannie Mae guidelines. Data are available at the national, regional and state levels, as well as for the major metropolitan areas. They are based on transactions and appraisals, and are then adjusted for appraisal bias. The FHFA also publishes a purchase-only index that excludes refinancing. These indices equally weight prices regardless of the value of the house. The coverage of the indices is broad because Freddie Mac and Fannie Mae provide loans throughout the country. However, the so-called jumbo loans over $417,000 are not included.

The S&P/Case–Shiller home price indices, published by Fiserv Inc., are based on information from county assessor and recorder offices. The index started with data from 10 cities in 1987 but was extended to cover 20 cities in 2000. The Case–Shiller indices do not use data from 13 states and have incomplete coverage for 29 states. Compared to the FHFA, the Case–Shiller indices thus have a narrower geographical coverage. However, homes purchased with subprime and other unconventional loans are included in the indices. As they cover defaults, foreclosures and forced sales, these indices show more volatility than the FHFA indices. Note also that the Case–Shiller indices are value weighted and hence give more weight to higher priced homes.
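The practical difference between equal weighting and value weighting can be seen in a toy calculation. The sketch below is not the FHFA or Case–Shiller methodology, both of which are repeat-sales procedures; it only contrasts the two weighting schemes on a handful of made-up home prices. The function names and all prices are hypothetical.

```python
import numpy as np


def equal_weighted_growth(old_prices, new_prices):
    """Average of individual price relatives: every home counts the same."""
    old = np.asarray(old_prices, dtype=float)
    new = np.asarray(new_prices, dtype=float)
    return float(np.mean(new / old) - 1.0)


def value_weighted_growth(old_prices, new_prices):
    """Growth of total value: homes are implicitly weighted by their initial price."""
    old = np.asarray(old_prices, dtype=float)
    new = np.asarray(new_prices, dtype=float)
    return float(new.sum() / old.sum() - 1.0)


if __name__ == "__main__":
    # Hypothetical prices (in thousands of dollars) for three homes in two periods.
    old = [100.0, 200.0, 1000.0]
    new = [110.0, 210.0, 1000.0]  # cheaper homes appreciate, the expensive one is flat
    print("equal weighted:", equal_weighted_growth(old, new))
    print("value weighted:", value_weighted_growth(old, new))
```

In this example the two cheaper homes appreciate while the most expensive one does not, so the value-weighted measure shows much less appreciation than the equal-weighted one. The ranking would reverse if the appreciation were concentrated in the expensive home.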

Table 2. National data.

Series                                                Source   Frequency  First obs.  TFCode
Price data
  Median Sales Price of Single-Family Existing Homes  NAR      mly        Jan1968     2
  Median Sales Price of Single-Family New Homes       CENSUS   mly        Jan1968     2
  Single-Family Median Home Sales Price               CENSUS   qly        Q1-1968     2
  Average Existing Home Prices                        NAR      mly        Jan1994     2
  Average New Home Prices                             CENSUS   mly        Jan1970     2
  S&P/Case–Shiller Home Price Index                   S&P      qly        Q1-1982     2
  CMHPI                                               FHLMC    qly        Q1-1968     2
  FHFA Purchase-Only Index                            FHFA     mly        Jan1986     2
  FHFA Home Prices                                    FHFA     qly        Q1-1970     2
Volume data
  New One-Family Houses For Sale                      CENSUS   mly        Jan1968     1
  Housing Units Authorized by Permit: One-Unit                 mly        Jan1968     1
  Multi-Family Units under Construction                        mly        Jan1968     1
  Multi-Family Permits United States                           mly        Jan1968     1
  Multi-Family Starts United States                   CENSUS   mly        Jan1968     1
  Multi-Family Completions                            CENSUS   mly        Jan1968     1
  Homeowner Vacancy Rate                                       qly        Q1-1968     1
  Homeownership Rate                                           qly        Q1-1968     1
  Rental Vacancy Rate                                          qly        Q1-1968     1

Note: This table reports the national housing data used in the estimation. The column ‘TFCode’ documents the transformations applied to the raw series prior to estimation. TFCode = 1 refers to annual growth rates, TFCode = 2 refers to annual differences of annual growth rates.

The FHFA and the Case–Shiller indices are both based on repeat sales. In contrast, the NAR reports the mean/median purchase prices of homes directly. The NAR represents real estate professionals and has close to 2000 local associations and boards offering multiple listing services. The NAR surveys a fixed subset of its associations. Based on reported transactions from the sample, the NAR calculates a median price for each of the four Census Bureau regions. The national price is then taken as a weighted average of the regional medians. The NAR price indices can be volatile due to compositional changes. An increase in the difference between high priced relative to low priced units will increase the regional and hence the national median. The NAR indices are, however, available for each region on a monthly basis over a long time period.

The Bureau of Census publishes several house price series. A monthly national series is available since 1963, but the regional data are available only quarterly. The Census also provides an average price of new homes of constant quality from 1977 onwards on a quarterly basis, both for the United States and for the four regions. The indices are based on a monthly survey of residential construction activity for single-family homes. These indices are also subject to compositional effects that might arise from the sales sample rather than any true changes in price. The Census Bureau also publishes an index of one-family homes sold based on the hedonic approach.

The Conventional Mortgage Home Price Index (CMHPI) is provided by Freddie Mac. It is calculated on a quarterly basis at both the national and regional level from 1975 onwards. The index is based on conventional conforming mortgages for single-unit residential houses that were purchased or securitized by Freddie Mac or Fannie Mae. The CMHPI overlaps with the FHFA series to some extent. We are primarily interested in the common variations that underlie these series.

In addition to prices, volume data on transactions and turnovers are also informative about the level of housing activity. Dieleman et al. (2000) found that demographic changes are largely responsible for turnovers in the housing market, three-quarters of which are generated by renters. In contrast, house prices are mainly affected by household income. In a frictionless world, prices adjust and sales occur instantly after a shock. But frictions in the housing market might prevent house prices from adjusting, which slows turnover. Stein (1995) observed a positive contemporaneous correlation between changes in house prices and sales and that there is more intense trading activity in rising markets than in falling markets. He suggests that downpayment and other borrowing constraints might be responsible for market frictions. Case and Shiller (1989) argue that the rational response in a falling market is for a homeowner to hold on to his/her investment in anticipation of higher future returns. Berkovec and Goodman (1996) suggest that transactions might act as a forward indicator of price changes. As Stein (1995) suggests, if an initial shock knocks prices down, the loss on existing homes could undermine the ability of would-be movers to make downpayments on new homes. This lack of demand could further depress prices.
The transactions data used in our factor analysis include new and existing single-family homes under construction, sold and for sale, as well as data on employment in the construction sector. Additionally, the Census bureau publishes data on homeowner vacancy rates, homeownership rates, as well as the rental vacancy rate. These latter indicators are informative about the tightness of the prevailing housing markets. It is also useful to discuss data that are available but are not used in our factor analysis. The BLS publishes data on housing starts, permits as well as rent. Housing starts and permits are informative about the future as opposed to the present market conditions. We do not use these variables in the factor analysis as our model only allows for variables to load on contemporaneous and lagged factor observations. The rent data are based on the Consumer Expenditure Survey of which two-thirds of the sample are homeowners. To the extent that rent captures the capitalized value of housing, it provides a measure of the fundamental instead of the market value of houses. During periods of speculative housing booms, rents and house prices can diverge quite substantially. However, rent is regulated in many areas. We also have data on prices of mobile homes. These data are not used in our analysis, which focuses on single-unit and multi-family housing. Finally, we note that all price variables are deflated by the (all items) CPI to control for increases in the overall price level. The NAR data are not seasonally adjusted. We use the X11 seasonal filter in Eviews to adjust these series. In all cases, we first annualize the data by taking year-to-year differences of the log level of the series. This means that for monthly data, we take the log difference of a series over a 12-month period. For a quarterly series, we take the log difference over four quarters. 
Since many series remain non-stationary, we transform the series into annual differences of the annual growth rates before estimating the factors. The effective sample is thus January 1975 to May 2008.
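The two transformations described above (year-over-year log differences, followed by annual differences of those growth rates) can be sketched as follows. This is a minimal illustration only: the function names and the synthetic price series are hypothetical, and deflation and seasonal adjustment are assumed to have been done beforehand.

```python
import numpy as np

def annual_log_diff(level, periods_per_year):
    """Year-over-year log difference: log(x_t) - log(x_{t-p})."""
    logx = np.log(np.asarray(level, dtype=float))
    out = np.full(logx.shape, np.nan)
    out[periods_per_year:] = logx[periods_per_year:] - logx[:-periods_per_year]
    return out

def annual_diff(series, periods_per_year):
    """Annual difference of a series: z_t - z_{t-p}."""
    z = np.asarray(series, dtype=float)
    out = np.full(z.shape, np.nan)
    out[periods_per_year:] = z[periods_per_year:] - z[:-periods_per_year]
    return out

# Hypothetical monthly real price index growing at a constant 0.5% per month.
months = np.arange(48)
price = 100.0 * 1.005 ** months
growth = annual_log_diff(price, 12)     # annual growth rate (12-month log difference)
transformed = annual_diff(growth, 12)   # annual change in the annual growth rate
```

For a quarterly series the same functions would be called with `periods_per_year=4`. Note that the double differencing costs two years of observations at the start of each series.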



3. ECONOMETRIC METHODOLOGY

Our econometric framework is set up with three issues in mind. First, shocks to the housing sector need not be the same as shocks to house prices. Second, there are substantial variations in housing market conditions across regions. Third, we have non-balanced panels of data. To deal with the aforementioned issues, we use an extension of the ‘Dynamic Hierarchical Factor Model’ framework developed in Moench et al. (2009). In such a model, variations in an economic time series can be idiosyncratic, common to the series within a block, or common across blocks. Here, we treat a block (identified as b) as one of the four major geographical regions, the Northeast (NE), Midwest (M), South (S) and West (W). More precisely, we posit that for each b = NE, M, S and W, we observe $N_b$ housing indicators which have zero mean and unit variance. The data, stacked in the vector $X_{bt}$, have a factor representation given by

$$X_{bt} = \Lambda_{Gb}(L)\,G_{bt} + e_{Xbt}, \tag{3.1}$$

where $\Lambda_{Gb}(L)$ is an $N_b \times k_{Gb}$ matrix polynomial in $L$ of order $s_{Gb}$. According to the model, the housing indicators in a block are driven by a set of $k_{Gb}$ regional factors denoted $G_{bt}$, and idiosyncratic components $e_{Xbt}$. Stacking up $G_{bt}$ across regions yields the $K_G \times 1$ vector $G_t = (G_{1t}'\; G_{2t}'\; \ldots\; G_{Bt}')'$. Observed indicators of the national housing market are stacked into a $K_Y \times 1$ vector $Y_t$. At the national level, we assume that

$$\begin{pmatrix} G_t \\ Y_t \end{pmatrix} = \Lambda_F(L)\,F_t + \begin{pmatrix} e_{Gt} \\ e_{Yt} \end{pmatrix}, \tag{3.2}$$

where $K_F$ factors, collected into the vector $F_t$, capture the comovement common to all regional factors, and where $\Lambda_F(L)$ is a $(K_G + K_Y) \times K_F$ matrix polynomial of order $s_F$. In the housing application under consideration, AR(1) dynamics are assumed throughout. Thus,

$$F_t = \Psi_F F_{t-1} + \varepsilon_{Ft}, \tag{3.3}$$

$$e_{Gbt} = \Psi_{G.b}\, e_{Gbt-1} + \varepsilon_{Gbt}, \tag{3.4}$$

$$e_{Yjt} = \Psi_{Y.j}\, e_{Yjt-1} + \varepsilon_{Yjt}, \tag{3.5}$$

$$e_{Xbt} = \Psi_{X.b}\, e_{Xbt-1} + \varepsilon_{Xbt}, \tag{3.6}$$

where $\Psi_F$ is a diagonal $K_F \times K_F$ matrix, $\Psi_{G.b}$ is a diagonal $k_{Gb} \times k_{Gb}$ matrix, $\Psi_{Y.j}$ is a diagonal $K_Y \times K_Y$ matrix and $\Psi_{X.b}$ is a diagonal $N_b \times N_b$ matrix. Furthermore,

$$\varepsilon_{Fkt} \sim N(0, \sigma^2_{Fk}), \quad k = 1, \ldots, K_F,$$
$$\varepsilon_{Gbjt} \sim N(0, \sigma^2_{Gbj}), \quad j = 1, \ldots, k_{Gb},$$
$$\varepsilon_{Yjt} \sim N(0, \sigma^2_{Yj}), \quad j = 1, \ldots, K_Y,$$
$$\varepsilon_{Xbit} \sim N(0, \sigma^2_{Xbi}), \quad i = 1, \ldots, N_b.$$

Equations (3.1) and (3.2) constitute a three-level factor model: the level-one variations are due to $\varepsilon_{Xbit}$, the level-two variations are due to $\varepsilon_{Gbt}$ and the level-three variations are due to $\varepsilon_{Ft}$. To identify the sign of the factors and loadings separately, we set the upper-left element of $\Lambda_F(0)$ to one, and for $b = 1, \ldots, B$, we also set the upper-left element of $\Lambda_{Gb}(0)$ equal to one.$^1$

Our model can be seen as having two sub-models, each with a state space representation. Specifically, if $G_{bt}$ was observed, (3.2) and (3.3) is a standard dynamic linear model where the latent factor is $F_t$. Then equations (3.1), (3.4) and (3.6) constitute the second dynamic linear model where the latent vector is $G_t$. It is in principle possible to use variables that are only available at the national level to estimate $F_t$. However, to the extent that $G_{bt}$ are correlated across regions, they also convey information about $F_t$ which we exploit in the estimation. The three-level model implies that

$$X_{bt} = \Lambda_{Gb}(L)\,\Lambda_{Fb}(L)\,F_t + \Lambda_{Gb}(L)\,e_{Gbt} + e_{Xbt},$$

where $\Lambda_{Fb}$ is the sub-block of $\Lambda_F$ corresponding to block b. This is in contrast to a two-level factor model that consists of only a common and an idiosyncratic component. Omitting variations at the block level amounts to lumping $\Lambda_{Gb}(L)\,e_{Gbt}$ with $e_{Xbt}$ in the estimation of $F$. This can result in an imprecise estimation of the common factor space if variations in $e_{Gbt}$ are large. Moreover, explicitly specifying the block structure facilitates interpretation of the factors. In our setup, the regional factors contain information about the state of the housing sector at the national level.

A different formulation of regional effects is a model specified as

$$X_{bit} = \Lambda_{bi} F_t + c_{bi}\, e_{Gbt} + e_{bit}.$$

Fu (2007) uses such a model to decompose house prices in 62 U.S. metropolitan areas into national, regional and metro-specific idiosyncratic factors using quarterly FHFA price data from 1980 to 2005. A similar model was also used by Del Negro and Otrok (2005) to estimate the common component of quarterly FHFA price data from 1986 to 2005. Stock and Watson (2008) use a variant of this model to analyse national and regional factors in housing construction. Kose et al.
(2008) use it to study international business cycle comovements. Our model is more restrictive in that the responses to shocks to $F_t$ for all variables in block b can only differ to the extent that their exposure to the block-level factors differs. However, the additional structure we impose makes the model more parsimonious, and it is easy to accommodate observed aggregate indices $Y_t$ in estimating $F_t$.

Numerous methods are available to estimate two-level factor models. For models with a small number of series, maximum likelihood is widely used. For large dimensional factor models, the method of principal components is popular. The factor model used in the present study and introduced in Moench et al. (2009) is a multi-level extension of the simple two-level factor model considered in various previous contributions. We use Markov chain Monte Carlo (MCMC) methods (specifically a Gibbs sampling algorithm) to estimate the posterior distribution of the parameters of interest and the latent factors. Unlike the method of principal components, estimation via Gibbs sampling requires parametric specification of the innovation processes. However, one practical advantage of MCMC is that credible regions can be conveniently computed. In contrast, there exists no inferential theory for multi-level models estimated by the method of principal components.
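To make the three-level structure concrete, the following sketch simulates a small panel from a simplified version of the model. It is an illustration under stated assumptions, not the specification estimated in the paper: there is one factor per level, the loadings are static (lag order zero), there are no national indicators, and all parameter values and names (`ar1`, `lam_F`, `lam_G`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(psi, sigma, T, rng):
    """Simulate a zero-mean AR(1) path x_t = psi * x_{t-1} + eps_t."""
    x = np.zeros(T)
    eps = rng.normal(0.0, sigma, size=T)
    for t in range(1, T):
        x[t] = psi * x[t - 1] + eps[t]
    return x

T, B, N_b = 400, 4, 7            # months, blocks (regions), series per block
F = ar1(0.9, 0.2, T, rng)        # level three: national factor

X_blocks, G_blocks = [], []
for b in range(B):
    # Loading of the regional factor on F; the first one is fixed at one
    # to mimic the sign normalisation described in the text.
    lam_F = 1.0 if b == 0 else rng.uniform(0.5, 1.5)
    e_G = ar1(0.5, 0.3, T, rng)                         # level two: regional component
    G_b = lam_F * F + e_G                               # regional factor
    lam_G = np.r_[1.0, rng.uniform(0.5, 1.5, N_b - 1)]  # first loading fixed at one
    e_X = np.column_stack([ar1(0.3, 0.4, T, rng) for _ in range(N_b)])  # level one
    X_b = np.outer(G_b, lam_G) + e_X                    # observed series in block b
    G_blocks.append(G_b)
    X_blocks.append(X_b)

X = np.hstack(X_blocks)          # T x (B * N_b) panel
```

In such a panel every series inherits variation from all three levels, so treating `e_G` as part of the idiosyncratic error would leave cross-correlated residuals within each block.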

1 In the more general case with multiple factors at both levels, the $k_{Gb}$ regional factors could, for example, be identified by requiring that for each b, $\Lambda_{Gb}(0)$ is a lower triangular matrix with diagonal elements of unity. Similarly, the $K_F$ factors could be identified by requiring that $\Lambda_F(0)$ is lower triangular, again with ones on the diagonal. In such a case of multiple factors, the ordering of the variables might potentially have an impact on the factor estimates.




The problem considered here is non-standard because not every series on the housing market is available on a monthly basis. Aruoba et al. (2008) also consider estimation of latent factors when the data are sampled at mixed frequencies. However, we have the additional problem that not all our data series are available over the same time span. For example, house price data are available from NAR since 1970, from the FHFA at the regional level since 1975, and the Case–Shiller index is available since 1987.

The first problem that not all data are available on a monthly basis is easily handled in the Bayesian framework using data augmentation techniques. Suppose for now that we have data over the entire sample for $X_{bt}$, but it is only available on a quarterly instead of a monthly basis. The monthly value $X_{bt}$ when t does not correspond to a month during which new data are released has conditional mean

$$X_{bt|t-1} = \Lambda_{Gb}(L)\,G_{bt} + e_{Xbt|t-1},$$

where $e_{Xbt|t-1} = \Psi_{X.b}\, e_{Xbt-1}$. A monthly observation of $X_{bt}$ with conditional mean $X_{bt|t-1}$ and variance $\sigma^2_{Xb}$ is obtained by taking a draw from the normal distribution with this property. Similarly, if the mth aggregate indicator is available on a quarterly basis, we make use of the fact that $Y_{t,m} = \Lambda_{Fm}(L)\,F_t + e_{Yt,m}$. Conditional on $F_t$, $\Lambda_{Fm}$ and $\Psi_{Y.m}$, the monthly value of $Y_{t,m}$ when data are not observed at time t can be drawn from the normal distribution with mean $\Lambda_{Fm}(L)\,F_t + e_{Yt,m|t-1}$ and variance $\sigma^2_{Ym}$, where $e_{Yt,m}$ is an autoregressive process with parameters $\Psi_{Ym}$.

As for the second problem that some data are missing for the early part of the sample, assume for the sake of discussion that the data (when they are available) come on a monthly basis. For the sub-sample over which data are not available, we fill the data with the value ‘NaN’. In a state space framework, these values contain no new information and contribute zero to the Kalman gain. To implement this, let $X^+_{bt}$ be the subset of $X_{bt}$ for which data are available at time t. If $X_{bt}$ is $N_b \times 1$, and $X^+_{bt}$ is $N^+_{bt} \times 1$, we work with the measurement equation:

$$X^+_{bt} = \Lambda^+_{Gb}(L)\,G_{bt} + e^+_{Xbt},$$

where var($e^+_{Xbt}$) is an $N^+_{bt} \times N^+_{bt}$ matrix. Equivalently, let $W_t$ be an $N^+_{bt} \times N_b$ selection matrix so that $X^+_{bt} = W_t X_{bt}$ is the $N^+_{bt} \times 1$ vector of variables that contain new information at time t. The measurement equation in terms of $X_{bt}$ (which contains missing values) is

$$W_t X_{bt} = W_t \Lambda_{Gb}(L)\,G_{bt} + W_t\, e_{Xbt}. \tag{3.7}$$

This is equivalent to using the entire $T \times N_b$ matrix $X_b$, which is padded with zeros when missing values are encountered, and then setting the Kalman gain to zero.

The third problem which makes our MCMC algorithm non-standard relates to the fact that $G_{bt}$ conveys information about $F_t$. More precisely,

$$\Psi_{G.b}(L)\,G_{bt} = \alpha_{F.bt} + \varepsilon_{Gbt}, \tag{3.8}$$

where $\alpha_{F.bt} = \Psi_{G.b}(L)\,\Lambda_{Fb}(L)\,F_t$ depends on t. Equation (3.8) follows from applying $\Psi_{G.b}(L) = I - \Psi_{G.b}L$ to the block-b rows of (3.2) and using (3.4). Given a draw of $F_t$, $\alpha_{F.bt}$ can be interpreted as a time-varying intercept that is known for all t. By conditioning on $F_t$, our updating and smoothing equations for $G_t$ explicitly take into account the information carried by $F_t$.

Summarizing, when a data series is unavailable for part of the sample, it is ‘zeroed out’ in the measurement equation. When a series is quarterly instead of monthly, then over the sample for which the data are available, we use data augmentation techniques to draw the monthly values. In Moench et al. (2009), we show that a simple extension of the algorithm in Carter and Kohn (1994) allows estimation of three-level models that takes into account the dependence of $G_b$ on $F$. In this paper, we further modify the algorithm to accommodate the first two problems. Precisely, denote the observed national indicators $Y_t$ and the observed regional indicators $X_{bt}$, $b = 1, \ldots, B$. Let

$$\Sigma_{Xb} = \mathrm{diag}(\sigma^2_{Xb1}, \ldots, \sigma^2_{XbN_b}),$$
$$\Sigma_{Gb} = \mathrm{diag}(\sigma^2_{Gb1}, \ldots, \sigma^2_{Gbk_{Gb}}),$$
$$\Sigma_Y = \mathrm{diag}(\sigma^2_{Y1}, \ldots, \sigma^2_{YK_Y}),$$
$$\Sigma_F = \mathrm{diag}(\sigma^2_{F1}, \ldots, \sigma^2_{FK_F}).$$

These matrices are of dimension $N_b \times N_b$, $k_{Gb} \times k_{Gb}$, $K_Y \times K_Y$ and $K_F \times K_F$, respectively. Collect $\{\Lambda_{G1}, \ldots, \Lambda_{GB}\}$ and $\Lambda_F$ into $\Lambda$; $\{\Psi_{X1}, \ldots, \Psi_{XB}\}$, $\{\Psi_{G1}, \ldots, \Psi_{GB}\}$, $\Psi_Y$ and $\Psi_F$ into $\Psi$; and $\{\Sigma_{X1}, \ldots, \Sigma_{XB}\}$, $\{\Sigma_{G1}, \ldots, \Sigma_{GB}\}$, $\Sigma_Y$ and $\Sigma_F$ into $\Sigma$. We first use data available for the entire sample to construct principal components. These are used to initialize $\{G_{bt}\}$ and $\{F_t\}$. Based on these estimates of the factors, initial values for the parameters in $\Lambda$, $\Psi$ and $\Sigma$ are obtained. Each iteration of the sampler then consists of the following steps:

(1) Conditional on $\Lambda$, $\Psi$, $\Sigma$, $\{G_t\}$ and $\{Y_t\}$, draw $\{F_t\}$.
(2) Conditional on $\{F_t\}$, draw $\Lambda_F$, $\Psi_F$ and $\Sigma_F$.
(3) For each b, conditional on $\Lambda$, $\Psi$, $\Sigma$ and $\{F_t\}$, draw $\{G_{bt}\}$ taking into account time-varying intercepts.
(4) For each b, conditional on $\{G_{bt}\}$ and $\{Y_t\}$, draw $\Psi_{Gb}$ and $\Sigma_{Gb}$.
(5) For each b, conditional on $\{G_{bt}\}$, draw $\Lambda_{Gbi}$. Also draw $\Psi_{Xbi}$ and $\sigma^2_{Xbi}$.
(6) Data augmentation:
(i) For each b and conditional on $\{G_{bt}\}$ and the parameters of the model, sample monthly values for elements of $\{X_{bt}\}$ that are observed at lower frequencies.
(ii) Conditional on $\{F_t\}$ and the parameters of the model, sample monthly values for those $\{Y_t\}$ that are observed at lower frequencies.
(iii) Draw $\Psi_Y$ using the augmented data vector for $\{Y_t\}$.
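The treatment of missing observations through the selection matrix $W_t$ in (3.7) can be sketched as a single Kalman measurement update. This is a simplified illustration rather than the algorithm used in the paper: the function name `kalman_update_missing` is hypothetical, the measurement errors are assumed serially uncorrelated (the model itself has AR(1) errors), and the loading matrix is static.

```python
import numpy as np

def kalman_update_missing(x_t, state_mean, state_cov, H, R):
    """One measurement update that uses only the observed entries of x_t.

    x_t        : (N,) observation vector with np.nan for missing entries
    state_mean : (k,) predicted state mean
    state_cov  : (k, k) predicted state covariance
    H          : (N, k) measurement (loading) matrix
    R          : (N, N) measurement error covariance
    """
    observed = ~np.isnan(x_t)
    if not observed.any():
        # Nothing new at time t: the Kalman gain is zero, so the
        # prediction is carried forward unchanged.
        return state_mean, state_cov
    W = np.eye(len(x_t))[observed]            # selection matrix W_t
    y = W @ np.where(observed, x_t, 0.0)      # W_t X_t, NaNs padded with zeros first
    Hw = W @ H                                # loadings of the observed rows
    Rw = W @ R @ W.T                          # covariance of W_t e_t
    S = Hw @ state_cov @ Hw.T + Rw            # innovation covariance
    K = state_cov @ Hw.T @ np.linalg.inv(S)   # gain computed from observed rows only
    new_mean = state_mean + K @ (y - Hw @ state_mean)
    new_cov = state_cov - K @ Hw @ state_cov
    return new_mean, new_cov
```

Replacing the NaN entries with zeros before multiplying by `W` matters in practice, because `0 * nan` is still `nan` in floating-point arithmetic.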
We assume normal priors centred around zero and with precision equal to 1 for elements of $\Lambda$ and $\Psi$, and inverse gamma priors with parameters 4 and 0.01 for elements of $\Sigma$.$^2$ Given conjugacy, $\Lambda_{Gb}$, $\Lambda_F$, $\Psi_{Xbi}$, $\Psi_{Gb}$ and $\Psi_F$ in steps 4 and 5 are simply draws from the normal distributions whose posterior means and variances are straightforward to compute. Similarly, $\sigma^2_{Gb}$ and $\sigma^2_{Xbi}$ are draws from the inverse chi-square distribution.

Notice that the model for $(G_{bt}, Y_t)$ is linear in $F_t$ and it is Gaussian. We can therefore run the Kalman filter forward to obtain the conditional mean of $F_t$ at time $T$ and the corresponding conditional variance. We then draw $F_T$ from its conditional distribution, which is normal, and proceed backwards to generate draws $F_{t|T}$ for $t = T-1, \ldots, 1$ using the algorithm suggested by Carter and Kohn (1994) and detailed in Kim and Nelson (2000). Draws of $\{G_{bt}\}$ can be obtained in a similar manner, as the model for $X_{bt}$ is linear in $G_{bt}$ and is Gaussian. This basic algorithm is modified to deal with a time-varying intercept in the transition equation for $G_{bt}$ and missing values as discussed earlier.
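The forward-filtering, backward-sampling idea described above can be sketched for the simplest case of a single AR(1) factor. This is an illustrative sketch under simplifying assumptions, not the modified sampler of the paper: the function name `ffbs_ar1` is hypothetical, loadings are static, measurement errors are serially uncorrelated, and there is no time-varying intercept or missing data.

```python
import numpy as np

def ffbs_ar1(y, lam, r, psi, q, rng):
    """Draw a path f_1..f_T for y_t = lam * f_t + e_t, f_t = psi * f_{t-1} + eps_t.

    y : (T, N) data, lam : (N,) loadings, r : (N,) measurement variances,
    psi : AR(1) coefficient with |psi| < 1, q : innovation variance.
    """
    T = y.shape[0]
    m = np.zeros(T)                               # filtered means
    P = np.zeros(T)                               # filtered variances
    m_prev, P_prev = 0.0, q / (1.0 - psi ** 2)    # stationary initial condition
    for t in range(T):
        # Prediction step.
        m_pred = psi * m_prev
        P_pred = psi ** 2 * P_prev + q
        # Update step in information form (scalar state, independent errors).
        prec = 1.0 / P_pred + np.sum(lam ** 2 / r)
        P[t] = 1.0 / prec
        m[t] = P[t] * (m_pred / P_pred + np.sum(lam * y[t] / r))
        m_prev, P_prev = m[t], P[t]
    # Backward sampling: draw f_T, then f_t given f_{t+1} for t = T-1, ..., 1.
    f = np.zeros(T)
    f[-1] = rng.normal(m[-1], np.sqrt(P[-1]))
    for t in range(T - 2, -1, -1):
        var = 1.0 / (1.0 / P[t] + psi ** 2 / q)
        mean = var * (m[t] / P[t] + psi * f[t + 1] / q)
        f[t] = rng.normal(mean, np.sqrt(var))
    return f
```

Each call returns one joint draw of the whole factor path, which is what a Gibbs step for the latent factor requires.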

2 We use the equivalence of the inverse gamma and scale-inverse $\chi^2$ distribution in our procedure and effectively sample variance parameters based on the $\chi^2$ distribution.
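The equivalence mentioned in footnote 2 can be illustrated with a short sketch. It assumes that the prior parameters are the shape and scale of the inverse gamma distribution (the text does not state the parametrization), that the residuals are zero-mean Gaussian, and the function name `draw_variance` is hypothetical.

```python
import numpy as np

def draw_variance(resid, a0, b0, rng):
    """Draw sigma^2 from its Inv-Gamma(a0 + n/2, b0 + SSR/2) posterior.

    Uses the identity: if c ~ chi2(2a), then 2b / c ~ Inv-Gamma(shape=a, scale=b).
    """
    resid = np.asarray(resid, dtype=float)
    a_post = a0 + resid.size / 2.0               # posterior shape
    b_post = b0 + 0.5 * np.sum(resid ** 2)       # posterior scale
    return 2.0 * b_post / rng.chisquare(2.0 * a_post)

# Usage with hypothetical residuals whose true variance is 0.25.
rng = np.random.default_rng(1)
resid = rng.normal(0.0, 0.5, size=2000)
draws = np.array([draw_variance(resid, 4.0, 0.01, rng) for _ in range(500)])
```

With 2000 residuals the prior has little influence, so the draws concentrate near the sample variance of the residuals.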



3.1. Estimates of $F_t$ and $G_{bt}$

Our sample starts in 1975:01 and ends in 2008:05. The base case price factors are estimated from seven series for each of the four regions, plus nine national price series. The base case housing market factors are estimated from 14 price and volume series for each of the four regions, plus 17 national series. The model has a number of parameters that we need to specify. The final model assumes $k_{Gb} = 1$ for all b and $K_F = 1$.$^3$ As discussed earlier, we set the upper-left element of the factor loading matrices $\Lambda_F(0)$ and the four $\Lambda_{Gb}(0)$ equal to one in order to separately identify the signs of the factors and factor loadings. We order the NAR’s ‘Median Sales Price of Single Family Existing Homes’ first both at the national and at the regional level. This implies that our estimated factors will have a positive correlation with the corresponding NAR price indices, respectively. As stated earlier, we assume $e_{Xbi}$, $e_{Gb}$, $e_{Yj}$ and $e_F$ to be AR(1) processes. Moreover, we let $s_{Gb} = s_F = 2$ so that the factors at both the regional and the national level are allowed to have a lagged impact of up to two periods on the respective observed variables. This allows us to accommodate lead–lag relationships between the housing cycles across the different regions and the nation as a whole. We begin with 20,000 burn-in draws. We then save every 50th of the remaining 50,000 draws. These 1000 draws are used to compute posterior means and standard deviations of the factors and parameters.

Table 3. Estimates of Ψ and σ_F: house price model.

              var(Fp)     Ψ_F      s.e.     σ²_F     s.e.
Fp            0.639       0.942    0.054    0.020    0.011

Gp,b          var(Gp,b)   Ψ_Gb     s.e.     σ²_Gb    s.e.
NE            1.012       0.633    0.239    0.079    0.050
W             2.066       0.744    1.058    0.126    1.300
MW            0.524       0.187    0.107    0.280    0.229
South         0.074       −0.018   0.002    0.079    0.000

Decomposition of variance:
              F        s.e.     Gb       s.e.     Xb       s.e.
NE            0.340    0.087    0.239    0.053    0.421    0.051
W             0.247    0.083    0.271    0.060    0.482    0.056
MW            0.341    0.080    0.159    0.058    0.500    0.075
South         0.241    0.070    0.083    0.021    0.677    0.061

Table 3 reports estimates of the dynamic parameters and the variance of shocks to the common, regional and series-specific components. The unconditional variances of $F_p$ and $G_{p,b}$ are denoted var($F_p$) and var($G_{p,b}$), while the variances of $\varepsilon_{Fp}$ and $\varepsilon_{G.b}$ are denoted $\sigma^2_{Fp}$ and $\sigma^2_{Gp.b}$, respectively. The common price factor, $F_p$, is more persistent but has a smaller variance than the regional factors, $G_{p,b}$. Of the four regional factors, the West is the most persistent. Even though

3 We considered $k_{Gb} = K_F = 2$, but the additional factors tend to have little variability and were subsequently dropped.


Figure 1. National and regional house price factors. Panels: Northeast, West, Midwest, South; each panel plots Fp and Gp over 1970–2010.

house price shocks in the Midwest are larger, the house price factor in the West has a larger unconditional variance once persistence is taken into account.

Figure 1 presents the estimated posterior mean of the national ($F_p$) versus the regional house price factors ($G_p$) for each of the four Census regions. The sample period is 1975:01–2008:05. The standard deviation of $F_p$ is 0.636. The national factor (solid line) is notably smoother than the regional factors (dash-dotted line). The Northeast experienced housing busts in the early 1980s and the late 1980s that were much more pronounced than those in the national market. However, throughout the 1990s, the Northeast market is stronger than the national market. The West experienced a sharp decline in house prices in the mid-1970s and again in the early 1990s. These variations are larger than what was recorded for other periods, or in any of the other regions. Because of these two episodes, the $G_p$ for the West has a standard deviation of 2.066, much larger than the 1.012 observed for the Northeast. The Midwest and the South have more tranquil housing markets. The standard deviations of $G_{p,b}$ are 0.515 and 0.078, respectively.

Table 3 also reports a decomposition of variance in the series used to estimate the factors. We find that shocks to the national house price factor, $F_p$, account for 34%, 24.7%, 34.1% and 24.1% of house price variations in the four regions, respectively. Shocks to the regional factor $G_p$ have a share of 23.9% in the Northeast, 27.1% in the West, 15.9% in the Midwest and 8.3%

A hierarchical factor analysis of U.S. housing market dynamics

Figure 2. National house price factor and leading house price indices. Panels: NAR, CMHPI, FHFA, Case–Shiller; each panel plots Fp against the named index over 1970–2010.

in the South. Series-specific shocks account for the remaining 42.1%, 48.2%, 50% and 67.7% of the variation in the regional house price series. Notably, the factor structure is strongest in the Northeast and is weakest in the South.

Figure 2 plots the estimated posterior mean of the national house price factor ($F_p$) versus four leading national house price indices. ‘NAR’ is the Median Sales Price of Single Family Existing Homes from the NAR; ‘CMHPI’ is the CMHPI from Freddie Mac; ‘FHFA’ is the Purchase-Only House Price Index from the Federal Housing Finance Agency (formerly OFHEO); ‘Case–Shiller’ is the S&P/Case–Shiller Home Price Index published by Fiserv Inc. While the NAR series is available at the monthly frequency, the latter three indices are only available quarterly. Our estimated house price factor is a monthly time series. The sample period is 1975:01–2008:05. As expected, $F_p$ is somewhat smoother than the individual price series because $F_p$ is essentially a weighted average of all price indices. The Census monthly price index has a correlation of 0.67 with $F_p$, while the Conventional Mortgage Home (quarterly) price index has a correlation with $F_p$ of 0.93. Notably, both series are more volatile than $F_p$. The monthly FHFA series that is available since 1986 is also highly correlated with $F_p$: computed over the sample for which the two series overlap, the correlation coefficient is 0.99. The correlation between the

Table 4. Estimates of Ψ and σ_F: housing market model.

              var(Fpq)     Ψ_F      s.e.     σ²_F     s.e.
Fpq           0.554        0.896    0.069    0.028    0.020

Gpq,b         var(Gpq,b)   Ψ_G      s.e.     σ²_Gb    s.e.
NE            1.025        0.530    0.300    0.145    0.109
W             2.019        0.766    1.636    0.139    2.341
MW            0.127        0.206    0.009    0.101    0.002
South         0.591        0.504    0.159    0.080    0.169

Decomposition of variance:
              F        s.e.     Gb       s.e.     Xb       s.e.
NE            0.164    0.036    0.147    0.028    0.689    0.028
W             0.055    0.023    0.247    0.048    0.698    0.044
MW            0.114    0.029    0.128    0.019    0.758    0.016
South         0.130    0.032    0.154    0.027    0.717    0.026

Case–Shiller index and $F_p$ is 0.91. Notice, however, that there are significant differences between $F_p$ and the indicators in recent years. The four price indices seem to show sharper declines in house prices than the house price factor which incorporates information from various series.

One of our objectives is to investigate whether the house price cycle differs from the housing cycle, where the latter is defined based on data on prices as well as volume. Table 4 reports the posterior mean of the dynamic parameters and the variance of the shocks for the housing model. To distinguish them from the house price factor, $F_p$, we denote the national housing factor by $F_{pq}$ and the regional housing factors by $G_{pq}$. The common factor is still highly persistent. While a regional factor is not evident in house prices in the South, the data on volume help to isolate this factor. A decomposition of variance of the housing market model reveals that national and regional shocks are equally important in the Northeast, the Midwest and the South, while regional shocks in the West are more important than the national shocks. However, the result that stands out is that idiosyncratic variation in the housing market data in all four regions is relatively more important than the common shocks and dominates the total variation in the data.

Figure 3 shows the estimated posterior mean of the national housing factor, $F_{pq}$, using both price and volume data, and the estimated posterior mean of the national house price factor, $F_p$, exclusively based on home price data. The sample period is 1975:01–2008:05. At the national level, the correlation between the house price factor $F_p$ and the housing market factor $F_{pq}$ is 0.85. In spite of this strong correlation and as seen from Figure 3, there is a notable difference between the two series during peaks and troughs. The discrepancy has been especially pronounced since 2007.
While $F_p$ is −2.006 at the end of our sample in May 2008, almost four standard deviations below the mean, $F_{pq}$ is −0.517, roughly one standard deviation below the mean. The drop in housing activity as indicated by $F_{pq}$, estimated using both house price and quantity information, is thus less severe.

This section has focused on different measures of housing market activity, and several conclusions can be drawn. First, there is substantial regional variation in housing market activity with the regional component playing the largest role in the West. Second, our house price factor

Figure 3. National housing factor and national house price factor. Lines: Fpq and Fp, plotted over 1975–2010.

is highly correlated with each of the four widely used house price indices with the important difference that our $F_p$ is smoother. Third, $G_{pq}$ is generally similar to $G_p$ except in the South. At the national level, $F_p$ and $F_{pq}$ are well synchronized, but the decline of $F_{pq}$ since 2007 is much less pronounced than that of $F_p$. These observations suggest that there are important idiosyncratic movements in observed housing data. A particular house price series will not, in general, be representative of the true level of activity underlying the housing market. The more data we use to estimate the factors, the better we are able to ‘wash out’ the idiosyncratic noise. However, all indicators point to a sharp decline in housing market activity since 2007. This decline is pervasive and occurs at both the regional and national levels. We next investigate whether shocks to the housing market affect consumption.

4. HOUSING AND CONSUMPTION

There exists little work on the regional aspect of housing variations. Using a dynamic Gordon model, Ng and Schaller (1998) find that regional housing bubbles have predictive power for future consumption. Campbell et al. (2008) find that housing premia are variable and forecastable and account for a significant fraction of the variation in the rent–price ratio at the national and regional levels. However, they do not assess the consumption effects of housing. One reason why there are so few estimates of the regional effects of housing is data limitation. Not only is it difficult to find regional housing data over a long time period, but government statistical agencies also do not publish consumption data at the regional level. We use regional retail sales data provided by the Census Bureau until 1997 and continued by the Bank of Tokyo-Mitsubishi (BTM) since then. These series are available monthly from 1970 onwards, both for each of the four Census regions, and also for the United States as a whole. This retail sales (which we will simply refer to as consumption) series is not seasonally adjusted. We run it through the X11 filter in EViews, and deflate by the all-items CPI. We then analyse the log annual difference of this seasonally adjusted, real retail sales series.

4.1. Estimates from FAVAR

We are interested in quantifying the response of retail sales consumption to changes in regional and national housing market conditions. Retail sales, while representing only a sub-category of total consumption, have the advantage of being available for the main Census regions. We can therefore study the effects of housing market shocks on consumption both at the regional and the national level. The foregoing discussion suggests that a strong housing market will increase consumption of some but decrease the consumption of others. As those affected may have different propensities to consume, the aggregate effect of changes in housing market conditions on consumption is an empirical matter.

Our analysis is based on factor-augmented vector autoregressions (FAVAR), a tool for analysing macroeconomic data popularized by Bernanke et al. (2005). While a conventional VAR is an autoregressive model for a vector of observed time series, a FAVAR augments the observed vector of variables by a small set of latent factors, often estimated by the method of principal components. Bai and Ng (2006) showed that if $\sqrt{T}/N \to 0$ as $N, T \to \infty$, the estimated factors that enter the FAVAR can be treated as though they are observed. The method of principal components is not, however, well suited for the present analysis for two reasons. First, the number of series available for analysis is much smaller than the typical large dimensional analysis in which principal components is applied. Second, we have a non-balanced panel with data sampled at mixed frequencies. Both problems are more easily handled by Bayesian estimation. Accordingly, our FAVAR is based on Bayesian estimates of the factors.
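For comparison with the Bayesian approach adopted here, the conventional principal components estimator mentioned above can be sketched in a few lines. This is a generic illustration that assumes a balanced panel with no missing values (precisely the condition that fails in the present application); the function name `pc_factors` and the normalisation are choices made for this sketch.

```python
import numpy as np

def pc_factors(X, k):
    """Estimate k factors from a balanced T x N panel by principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # zero mean, unit variance
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = Z.shape[0]
    factors = np.sqrt(T) * U[:, :k]                   # normalised so F'F / T = I
    loadings = Z.T @ factors / T                      # N x k loading estimates
    return factors, loadings
```

The factors are identified only up to sign and rotation, which is why the model in Section 3 fixes selected loadings at one.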
Our first set of FAVARs consists of five variables each: $F_{pq}$, $G_{pq,b}$, $U$, $U_b$ and $RS_b$, where $U$ is the national unemployment rate, $U_b$ is the unemployment rate for region b, and $RS_b$ is the linearly detrended logarithm of real retail sales for region b. The variables $U_b$ and $U$ allow us to control for regional and aggregate business cycle conditions. We use the housing factors instead of the house price factors as these provide more comprehensive measures of the housing market. The regional unemployment rates are available only from 1976 onwards. Thus, for this exercise, the sample is 1976:01–2008:05. The standard deviations of the estimated factors used in the FAVAR are given in Table 4.

We identify shocks to the housing market factor using a simple recursive identification scheme where the variables are ordered as listed above. This identification implies that the national housing factor does not react to regional housing market shocks, regional and national unemployment shocks as well as regional consumption shocks within the same month. This assumption appears reasonable given that it usually takes at least a few weeks from the time a decision is made to purchase, sell or construct a home before an actual transaction is made. Our identification also implies that regional retail sales can respond within the same month to both national and regional housing shocks as well as national and regional labour market shocks, as would be the case if households can adjust their consumption decisions rather quickly in response to various kinds of economic shocks. As for the particular ordering between the national and regional housing market factors, it also appears plausible to suppose that national housing market shocks may have an immediate impact on regional housing market dynamics, whereas the reverse does not hold.
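The recursive identification scheme described above corresponds to a Cholesky factorisation of the residual covariance matrix of the VAR. The sketch below illustrates this for a generic VAR(p); it assumes equation-by-equation OLS estimation with an intercept (the text does not state how the FAVAR coefficients are estimated), and the function names `fit_var` and `recursive_irf` are hypothetical.

```python
import numpy as np

def fit_var(data, p):
    """OLS estimate of a VAR(p) with intercept. Returns (A, Sigma).

    A has shape (p, n, n), with A[l] the coefficient matrix on lag l + 1;
    Sigma is the residual covariance matrix.
    """
    T, n = data.shape
    Y = data[p:]
    lags = [data[p - l - 1:T - l - 1] for l in range(p)]
    Z = np.hstack([np.ones((T - p, 1))] + lags)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    Sigma = resid.T @ resid / (T - p - Z.shape[1])
    A = np.stack([coef[1 + l * n:1 + (l + 1) * n].T for l in range(p)])
    return A, Sigma

def recursive_irf(A, Sigma, horizon):
    """Responses to one-standard-deviation shocks under a recursive ordering.

    Returns an array of shape (horizon + 1, n, n); entry [h, i, j] is the
    response of variable i at horizon h to the shock ordered j-th.
    """
    p, n, _ = A.shape
    chol = np.linalg.cholesky(Sigma)        # lower triangular impact matrix
    Phi = np.zeros((horizon + 1, n, n))     # reduced-form moving average terms
    Phi[0] = np.eye(n)
    for h in range(1, horizon + 1):
        for l in range(min(h, p)):
            Phi[h] += A[l] @ Phi[h - l - 1]
    return Phi @ chol
```

With the columns of `data` ordered as $F_{pq}$, $G_{pq,b}$, $U$, $U_b$, $RS_b$, the response of regional retail sales to a national housing shock would be `irf[:, 4, 0]`, and its cumulative effect over the horizon would be the sum of that vector. Because the impact matrix is lower triangular, the variable ordered first does not respond within the month to any of the other shocks.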


Figure 4. Impulse responses of regional retail sales to national housing shocks. Panels: Northeast, West, Midwest, South; horizontal axis: months after the shock (0–30); vertical axis in units of 10^-3.

The impulse response functions are obtained as follows. First recall that we saved 1000 of the 50,000 draws of $G_{pq,b}$ and $F_{pq}$ from the Gibbs sampler. For each draw of the regional and national housing factors, we estimate a five-variable FAVAR with two lags for each of the four regions. The estimated FAVARs are then used to obtain impulse responses of regional retail sales to shocks in $F_{pq}$ and $G_{pq,b}$. We also estimate a three-variable VAR in $F_{pq}$, $U$ and $RS$ to study the impulse response of aggregate retail sales to housing market factor shocks. Repeating this for each of the 1000 draws of $F_{pq}$ and $G_{pq,b}$ gives a set of impulse responses from which we can compute the posterior means and percentiles of the posterior distribution.

Figure 4 reports the estimated posterior mean and 90% probability interval of impulse responses to a one-standard-deviation shock in $F_{pq}$ from the FAVARs discussed in Section 4.1. For each region b, these contain the following five variables: $F_{pq}$, $G_{pq,b}$, $U$, $U_b$ and $RS_b$, where $F_{pq}$ and $G_{pq,b}$ denote the national and regional housing factor, $U$ and $U_b$ the national and regional unemployment rate, and $RS_b$ the linearly detrended logarithm of real retail sales for region b. We identify shocks using a recursive identification scheme of the five variables ordered in the way they appear earlier. The FAVAR has two lags. The sample period is 1976:01–2008:05.

The effect of a one-standard-deviation shock to $F_{pq}$ is positive in all four regions. The response of retail sales is hump shaped. The shock triggers a permanent increase in the level of retail sales. The effects, similar across regions, peak about 10 months after the shock. The cumulative effects

Figure 5. Impulse responses of regional retail sales to regional housing shocks. Panels: Northeast, West, Midwest, South.

in the four regions over two and a half years are 0.037, 0.057, 0.043 and 0.037, respectively. Hence, according to our estimates, a shock to the national housing market may result in a long-run increase of regional consumption between 3.7% and 5.7% above its trend level, all else being equal. While the positive consumption effect would be consistent with the idea that homeowners take advantage of a hot housing market, trading down their homes to enjoy realized capital gains, a more likely interpretation is that the positive response is due to the collateral effect brought about by increased home equity.

Figure 5 is constructed similarly to Figure 4 and graphs the impulse response to a one-standard-deviation shock in Gpq. Because Fpq is also in the FAVAR and is ordered first, shocks to Fpq can trigger a contemporaneous response in the regional housing market factor, while the reverse is not true. The results can thus be interpreted as shocks to regional housing market activity that are orthogonal to national housing market shocks. As indicated by wider probability bands, the effects of shocks to Gpq are statistically less well determined than those to Fpq. While the effects at the peak are about the same as the response to a national housing factor shock, the cumulative effects of regional housing shocks are smaller and differ substantially across regions. The long-run effects are 0.028, 0.064, 0.026 and 0.043, respectively, implying an increase of regional consumption due to regional housing market shocks between 2.6% and 6.4% above its trend level, all else equal. Notably, the effects are largest in the West where variations in Gpq are also relatively more important. Even in this region, we find the effects of regional housing shocks on consumption to be less pronounced than the national shocks.

In unreported results, we find that shocks to volume tend to reduce retail sales for a few months after the shock, and have no substantial long-run effects. Furthermore, the consumption response to Gpq also seems to be largely due to the response to Gp. Thus, the consumption responses we observe in Figures 4 and 5 are largely a consequence of shocks to house prices rather than housing volume.

Given the heterogeneity in response across regions, what is the consumption response at the national level? To assess this question, we estimate a FAVAR in three variables: aggregate retail sales, RS, aggregate unemployment, U, and the national housing market factor, Fpq. We again identify shocks to the housing market factor using a recursive identification scheme where the ordering is as the variables appear earlier. The economic reasoning behind this approach follows the discussion earlier.

Figure 6 displays the estimated posterior mean and 90% probability interval of impulse responses from the three-variable FAVAR discussed in Section 4.1. This contains national retail sales, RS, the national unemployment rate, U, and the national housing market factor, Fpq. We identify shocks using a recursive identification scheme of the three variables in the order they appear earlier. The FAVAR has two lags. The sample period is 1976:01–2008:05.

The top panel in Figure 6 shows that the response of aggregate retail sales to a one-standard-deviation shock in Fpq is positive with a maximum effect occurring about 15 months after the shock, and a cumulative effect of 0.04 after 30 months. This implies that a one-standard-deviation shock may push aggregate retail sales 4% above their trend level.
At the same time, unemployment falls by over 10 basis points as housing market activity increases. Figure 6 also shows how Fpq responds to its own shock. The response is gradual, and the half-life of the shock is about 10 months.

We have presented results for counter-factual increases in housing market activity. A policy question of interest is the quantitative consumption effect as the national housing market contracts. Figure 4 implies that consumption is expected to fall immediately in all regions as a result of a housing market shock, all else equal. As noted earlier, the housing factor in May 2008 was −0.571, about one standard deviation below the mean. A one-standard-deviation Fpq shock in the FAVAR is about 0.25. A two-standard-deviation shock to Fpq can thus have a cumulative consumption effect on the West of 2 × 0.057, or about 11%, and about 8% in the other three regions. The consumption effect is much larger if we look at the house price factor, recalling that Fp is estimated to be −2.006 in May 2008, almost four standard deviations below the mean. Our results then suggest that at its worst (about 15 months after the shock), consumption can fall by 1.2% with an even higher cumulative effect than a shock to housing activity. The consumption effect thus depends on whether we think house prices alone reflect the state of the housing market, or whether volume information should be taken into account.

4.2. What affects the housing factor?

The slumping U.S. housing market has been a deep concern for private citizens and policy makers alike. While as of May 2008, the last data point in our sample, our housing factor was only one standard deviation below average, housing market activities have further decelerated since. This raises the question of what might stimulate housing activity.

Figure 6. Impulse responses of national retail sales to national housing shocks. Panels: RS, U, F.

To address this question, we consider a FAVAR with six variables and four lags at the national level. These variables in the order they enter the FAVAR are the unemployment rate (U), the fed funds rate (FF), a 30-year effective mortgage rate (MR), our housing factor (Fpq), the University of Michigan's survey of consumer sentiment (MICH), and the log of the S&P 500 index (SP). The unemployment rate captures aggregate business cycle dynamics. The fed funds rate and the effective mortgage rate measure tightness in the money and loans market. The Michigan survey measures confidence for the economy, and the S&P 500 index is a proxy for changes in financial wealth. Arguably, each of these variables can be thought of as having an effect on the level of housing activity.

We identify shocks in the FAVAR using a recursive ordering of the variables as they appear earlier. This ordering implies that the unemployment rate does not respond within the month to any of the shocks but its own. The federal funds rate is ordered second and hence responds on impact only to unemployment shocks and monetary policy shocks. The 30-year effective mortgage rate is ordered third, which implies that it is assumed to respond on impact to unemployment and monetary policy shocks, but with a one-month lag to shocks to the housing market as measured by our estimated housing factor, consumer confidence and the S&P 500 index. We put the housing factor in fourth position, which implements the assumption that housing market activity cannot respond within the month to shocks to consumer confidence and stock prices. Finally, the fact that consumer confidence and the S&P 500 are ordered last implies

Figure 7. Impulse responses of national housing factor to different shocks. Panels: Shock to Fed Funds Rate, Shock to Unemployment Rate, Shock to US Housing Factor, Shock to Effective Mortgage Rate, Shock to S&P Index, Shock to UMich Consumer Sentiment.

that these two variables can respond within the month to unemployment, interest rate and housing market shocks.

Figure 7 shows the estimated posterior mean and 90% probability interval of impulse responses of Fpq to shocks to each of the six variables from the FAVAR discussed in Section 4.2.⁴ This contains the national unemployment rate, U, the fed funds rate, FF, a 30-year effective mortgage rate from Freddie Mac, MR, our national housing factor, Fpq, the University of Michigan's survey of consumer sentiment, MICH, and the log of the S&P 500 index, SP. We identify shocks using a recursive identification scheme of the variables in the order they appear earlier. The FAVAR has four lags. The sample period is 1976:01–2008:05. We normalize the shocks to the federal funds rate and the effective mortgage rate so that each has a contemporaneous −25 basis point impact on its own variable. All other shocks are one-standard-deviation shocks.

⁴ Unreported results show that changes to the ordering of the six variables do not qualitatively affect our conclusions.

Reductions in both interest rates lead to an increase in housing market activity. The maximum effect of a 25 basis points cut in the fed funds rate on the housing factor is 0.026 and the cumulative effect is 0.35. By contrast, the maximum effect of a 25 basis points reduction in the effective mortgage rate is 0.13 and the cumulative effect after 30 months equals 0.46. Hence, reducing mortgage rates has a larger maximum and cumulative effect on the housing factor than an equivalent cut of the fed funds rate. This result suggests that direct policy interventions in the mortgage market may represent an effective way to revive the housing market.

Interestingly, we find that a positive one-standard-deviation shock to the unemployment rate boosts housing activity. This effect is due to a strong reduction of the federal funds rate following a negative shock to real activity. We further find that a one-standard-deviation increase in stock prices has a positive effect on housing activity both in the short run and in the long run. The maximum effect on Fpq is 0.033, recorded three periods after the shock, and the cumulative effect is 0.51, which is about equal to two standard deviations of the housing factor. A one-standard-deviation increase in consumer confidence boosts the housing factor in the short term, but interestingly has a negative cumulative effect.

An overview of the results suggests that, all else equal, housing market activity can be stimulated by a cut in the federal funds rate and a reduction of mortgage rates, the latter potentially having larger effects. Increases in stock prices will positively affect housing market activity in the short run and in the long run.
Increased consumer confidence is found to stimulate housing market activity in the short run, but may have a negative long-run effect on housing. We also re-estimate the previous FAVAR using three different national house price series available for our full data sample as well as our estimated national house price factor, each standardized to have the same unconditional variance. We do not report these results here, but restrict ourselves to noting that the various indicators imply quite different reactions of house prices to the six shocks. The particular choice of house price measure thus potentially has a large impact on the conclusions one may reach from a quantitative analysis such as the one carried out earlier.
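The recursive identification and the draw-by-draw summary used for Figures 4 to 7 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the data are simulated, the helper names `fit_var` and `recursive_irf` are hypothetical, each "posterior draw" of the factor is mimicked by adding noise to one series, and the cumulative effect is taken to be the sum of the posterior mean responses over the horizon.

```python
import numpy as np

def fit_var(data, lags):
    """OLS estimate of a VAR(lags) with intercept.

    Returns the list of lag matrices [A_1, ..., A_lags] and the residual covariance.
    """
    T, n = data.shape
    # Regressors: a constant, then y_{t-1}, ..., y_{t-lags}.
    X = np.hstack([np.ones((T - lags, 1))]
                  + [data[lags - i:T - i] for i in range(1, lags + 1)])
    Y = data[lags:]
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ coef
    sigma = resid.T @ resid / (len(Y) - X.shape[1])
    A = [coef[1 + i * n:1 + (i + 1) * n].T for i in range(lags)]
    return A, sigma

def recursive_irf(A, sigma, horizon):
    """Responses to one-standard-deviation shocks under a recursive ordering.

    irf[h, i, j] is the response of variable i, h periods after shock j.
    The Cholesky factor is lower triangular, so a variable does not react
    on impact to shocks of variables ordered after it.
    """
    n = sigma.shape[0]
    impact = np.linalg.cholesky(sigma)
    psi = [np.eye(n)]  # reduced-form moving-average coefficients
    for h in range(1, horizon + 1):
        psi.append(sum(A[i] @ psi[h - 1 - i] for i in range(min(len(A), h))))
    return np.array([p @ impact for p in psi])

# Simulated stand-in for (factor, unemployment, retail sales); assumption only.
rng = np.random.default_rng(0)
T, horizon, n_draws = 300, 30, 100
A_true = np.array([[0.7, 0.0, 0.0], [0.1, 0.6, 0.0], [0.2, -0.1, 0.5]])
data = np.zeros((T, 3))
for t in range(1, T):
    data[t] = A_true @ data[t - 1] + rng.standard_normal(3)

responses = []
for _ in range(n_draws):
    draw = data.copy()
    draw[:, 0] += 0.1 * rng.standard_normal(T)  # mimics one draw of the factor
    A, sigma = fit_var(draw, lags=2)
    irf = recursive_irf(A, sigma, horizon)
    responses.append(irf[:, 2, 0])  # last variable's response to the first shock
responses = np.array(responses)

post_mean = responses.mean(axis=0)
lower, upper = np.percentile(responses, [5, 95], axis=0)  # 90% band
cumulative = post_mean.sum()
```

Ordering the factor first and retail sales last mirrors the ordering described in the text; changing the column order changes which contemporaneous responses are restricted to zero.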

5. CONCLUSION

This paper provides three new perspectives on the effects of housing shocks on consumption. First, we distinguish between house price shocks and shocks to general activity in the housing market. Second, we analyse regional as well as national data. Third, our housing shock is not tied to a specific house price series. Instead, we extract house price and housing market factors from a large number of housing indicators. Our results indicate that in spite of large idiosyncratic variations, there is a national and a regional housing component in each of the regions, though the regional component is more important than the national component in the West. The aggregate response of consumption to national housing shocks is hump shaped. According to our estimates, the drop in housing market activity that began in 2006 can lead to a significant decline of consumption, all else being equal. Interest rate cuts can stimulate housing activity, and directly targeting lower mortgage rates may be an effective way to revive the housing market. However, without a boost in consumer confidence and the stock market, the housing market can remain depressed for a prolonged period of time. Our econometric framework permits a block structure and can handle data of mixed frequencies as well as missing data. Latent factors can also co-exist with observed factors. The methodology can be useful in other applications.

ACKNOWLEDGMENTS

This paper was presented at The Econometrics Journal Special Session of the 2009 RES Conference in Surrey, and the 2009 Summer Meeting of the Econometric Society in Boston. The authors are grateful to Chris Otrok, Dan Copper, and seminar participants at Columbia University and the Bundesbank for helpful comments and discussions. Evan LeFlore provided valuable research assistance. The second author would like to acknowledge financial support from the National Science Foundation under grant SES 0549978. The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of New York or the Federal Reserve System.

REFERENCES

Aruoba, B., F. Diebold and C. Scotti (2008). Real-time measurement of business conditions. Working paper, University of Maryland.
Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference with factor-augmented regressions. Econometrica 74, 1133–50.
Berkovec, J. A. and J. L. Goodman (1996). Turnover as a measure of demand for existing homes. Real Estate Economics 24, 421–40.
Bernanke, B., J. Boivin and P. Eliasz (2005). Factor augmented vector autoregressions (FVARs) and the analysis of monetary policy. Quarterly Journal of Economics 120, 387–422.
Calomiris, C., S. Longhofer and W. Miles (2008). The foreclosure–house price nexus: lessons from the 2007–2008 housing turmoil. NBER Working Paper No. 14294, National Bureau of Economic Research.
Campbell, S., M. Davis, J. Gallin and R. Martin (2008). What moves housing markets? A variance decomposition of the rent–price ratio. Working paper, Board of Governors of the Federal Reserve System.
Carter, C. K. and R. Kohn (1994). On Gibbs sampling for state space models. Biometrika 81, 541–53.
Case, K. and R. Shiller (1989). The efficiency of the market for single family homes. American Economic Review 79, 125–37.
Del Negro, M. and C. Otrok (2005). Monetary policy and the house price boom across U.S. states. Working Paper No. 2005-24, Federal Reserve Bank of Atlanta.
Dieleman, F., W. Clark and M. Deurloo (2000). The geography of residential turnover in twenty-seven large US metropolitan housing markets, 1985–1995. Urban Studies 37, 223–45.
Fu, D. (2007). National, regional and metro-specific factors of the U.S. housing market. Working Paper No. 0707, Federal Reserve Bank of Dallas.
Kim, C. and C. Nelson (2000). State Space Models with Regime Switching. Cambridge, MA: MIT Press.
Kose, A., C. Otrok and C. Whiteman (2008). Understanding the evolution of world business cycles. International Economic Review 75, 110–30.
Leamer, E. (2007). Housing is the business cycle. NBER Working Paper No. 13248, National Bureau of Economic Research.
Moench, E., S. Ng and S. Potter (2009). Dynamic hierarchical factor models. Working paper, Columbia University.
Ng, S. and H. Schaller (1998). Do housing bubbles affect consumption? Working paper, Columbia University.
Rappaport, J. (2007). A guide to aggregate house price measures. Economic Review, Federal Reserve Bank of Kansas City 2, 41–71.
Stock, J. H. and M. W. Watson (2008). The evolution of national and regional factors in U.S. housing construction. Working paper, Princeton University.




Short-term forecasts of euro area GDP growth

Elena Angelini†, Gonzalo Camba-Mendez†, Domenico Giannone‡,¶, Lucrezia Reichlin§,¶ and Gerhard Rünstler††

† European Central Bank, Kaiserstrasse 29, 60311 Frankfurt am Main, Germany. E-mail: [email protected], [email protected]

‡ ECARES, Université Libre de Bruxelles, 50 Avenue Roosevelt CP 114, 1050 Bruxelles, Belgium. E-mail: [email protected]

§ London Business School, Regent's Park, London NW1 4SA, UK. E-mail: [email protected]

†† Austrian Institute of Economic Research, 1030 Vienna, Arsenal, Objekt 20, Austria. E-mail: [email protected]

¶ CEPR, 77 Bastwick Street, London, EC1V 3PZ, UK.

First version received: July 2009; final version accepted: July 2010

Summary

This paper evaluates models that exploit timely monthly releases to compute early estimates of current quarter GDP (now-casting) in the euro area. We compare traditional methods used at institutions with a new method proposed by Giannone et al. (2008). The method consists in bridging quarterly GDP with monthly data via a regression on factors extracted from a large panel of monthly series with different publication lags. We show that bridging via factors produces more accurate estimates than traditional bridge equations. We also show that survey data and other 'soft' information are valuable for now-casting.

Keywords: Factor model, Forecasting, Large data sets, Monetary policy, News, Real-time data.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society. Published by Blackwell Publishing Ltd, 9600

1. INTRODUCTION

This paper evaluates different methods to construct early estimates and short-term forecasts of quarterly GDP growth for the euro area, exploiting timely releases of monthly data. GDP data are published with a considerable delay. The first official release is Eurostat's flash estimate of euro area GDP growth, which is published six weeks after the end of the reference quarter. As a consequence, policy decisions have to rely on other information which is released in a more timely manner. Such information is available at monthly or higher frequency and includes 'hard' indicators, like industrial production, which is released six weeks after the end of the reference month, and 'soft' indicators, such as survey data, which are released at the end of, or a few days after, the reference month.

Monthly indicators are routinely used in judgemental forecasting to form a view on current economic conditions before GDP data are made available. Statistical models which can perform this exercise and exploit timely information must deal with mixed frequency (using monthly data to now-cast quarterly GDP) and jagged edges (at the end of the sample, different variables will have missing points corresponding to different dates in accordance with their timeliness). In policy institutions, models that have these characteristics go under the name of 'bridge equations'. These are predictive equations that bridge monthly information with quarterly information. More precisely, bridge equations are regressions of quarterly GDP growth on a small set of preselected key monthly indicators. This simple modelling strategy has been popular among policy institutions, which commonly pool several GDP forecasts from bridge equation models so as to consider information from a large number of predictors. In this paper we will focus on two main implementations of this technique: one approach, implemented at the European Central Bank (ECB), that combines a number of selected bridge equations based on multiple regressors (see Rünstler and Sédillot, 2003, and Diron, 2006) and a second one, which pools forecasts of GDP based on a large number of bridge equations with only one predictor each (see Kitchen and Monaco, 2003).

An alternative way to exploit large information sets to bridge monthly and quarterly variables has been proposed by Giannone et al. (2008), first applied to US data at the Board of Governors of the Federal Reserve and now also regularly implemented at the ECB. This method consists in combining predictors into a few common factors, which are then used as regressors in bridge equations via the Kalman filter.

The first part of the paper presents an out-of-sample evaluation of the three methods, estimated with different specification choices.
The design of the experiment is pseudo-real time in the sense that we replicate the data availability situation which is faced in real-time application of the models and re-estimate the models using only the information available at the time of the forecast. However, our design differs from a perfect real-time evaluation since we use final data vintages and hence ignore revisions to earlier data releases.

Although there is a large literature comparing the forecasting accuracy of factor models and forecast averaging, no paper has focused on now-casting, which implies using factors estimated from monthly data in combination with quarterly data in a bridge equation.¹ Now-casting is an important application of factor analysis since forecasting improvement of GDP with respect to naive models is mainly limited to the current quarter (D'Agostino et al., 2006).²

We evaluate the impact of new data releases on current GDP now-casts throughout the quarter. We update the model twice per month, at the middle and at the end of the month, and measure the accuracy of the forecasts computed using the information available at each date. This allows us to understand the importance of different types of releases, since the end-of-month update incorporates essentially the releases of 'soft' data while the mid-month update incorporates mainly 'hard' data. Moreover, following Banbura and Rünstler (2007), we study the weight the model attaches to 'soft' and 'hard' data to now-cast GDP. Both exercises allow us to quantify the reliability of 'soft' data for the euro area.

In this paper we focus on GDP since this is the variable that best captures aggregate economic conditions and it is therefore closely monitored by central banks. The techniques discussed here, however, are relevant for the now-cast of any other variable available at quarterly frequency and published with lags as, for example, national accounts data or employment for the euro area. Forecasting inflation, on the other hand, involves different problems since price data are available at the monthly level and with little delay.

The paper is structured as follows. Section 2 describes the bridge equations technique. Section 3 briefly reviews the alternative modelling strategy proposed in Giannone et al. (2008), which relies on combining predictors in a few common factors. Section 4 provides an assessment of the empirical performance of these alternative methods. This section further reviews how to retrieve policy relevant information from the predictions of the 'bridging with factors' model, in effect rendering this model less mechanical in nature. Section 5 concludes.

¹ See, for example, Marcellino et al. (2003) for euro area data, Stock and Watson (2006) for the US, Camba-Mendez et al. (2001) for major European countries, Artis et al. (2005) for the UK, Reijer (2005) for the Netherlands, Duarte and Rua (2007) for Portugal, Schumacher (2007) for Germany and Nieuwenhuyze (2005) for Belgium.

² A recent Euro-system project (see Rünstler et al., 2008) provides results from a streamlined version of this paper, but based on different country data sets.
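The 'jagged edge' that the pseudo-real-time design has to replicate can be illustrated with a short sketch. It is only an assumption-laden illustration: the function name `pseudo_real_time_vintage`, the publication lags and the toy panel are hypothetical, and lags are counted in whole months rather than in the mid-month and end-of-month steps used in the paper.

```python
import numpy as np

def pseudo_real_time_vintage(panel, publication_lag, n_months):
    """Return the panel as it would have looked at the end of month n_months.

    panel: (T, n) array of final monthly data, one column per series.
    publication_lag: months that pass before each series is released.
    Observations not yet released at that date are set to NaN, which
    produces the jagged edge at the end of the sample.
    """
    vintage = panel[:n_months].astype(float).copy()
    for i, lag in enumerate(publication_lag):
        if lag > 0:
            vintage[max(n_months - lag, 0):, i] = np.nan
    return vintage

# Toy panel: column 0 behaves like a survey (no lag), column 1 like
# industrial production (two-month lag). Both lags are illustrative.
panel = np.arange(24, dtype=float).reshape(12, 2)
vintage = pseudo_real_time_vintage(panel, publication_lag=[0, 2], n_months=10)
```

Because the vintage is cut from final data, this mimics data availability but not data revisions, which is the distinction the text draws between pseudo-real-time and true real-time evaluation.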

2. BRIDGE EQUATIONS (BE)

A traditional modelling strategy for obtaining an early estimate of quarterly GDP growth by exploiting information in monthly variables is based on the so-called 'bridge equations' (BE). 'Bridging' stands for linking monthly data, typically released early in the quarter, with quarterly data, such as GDP and its components, that are released late and are not available at monthly frequencies. The bridge equations under regular use in several policy institutions, including the ECB, rely on selected indicators, which have been shown to contain some predictive content for quarterly GDP growth (see Kitchen and Monaco, 2003, Rünstler and Sédillot, 2003, Baffigi et al., 2004 and Diron, 2006).

Let us denote quarterly GDP growth as $y_t^Q$ and the vector of $k$ selected stationary monthly indicators, for every bridge equation $j$, as $x_t^j = (x_{1,t}^j, \ldots, x_{k,t}^j)'$, $t = 1, \ldots, T$. The bridge equation is estimated from quarterly aggregates of the monthly data. Predictions of GDP growth are obtained in two steps. In a first step, the monthly indicators are forecasted over the remainder of the quarter to obtain forecasts of their quarterly aggregates, $x_{it}^{jQ}$. The forecasts of the monthly predictors are typically based on univariate time series models. In a second step, the resulting values are used as regressors in the bridge equation to obtain the GDP forecast. We have

$$y_t^Q = \mu + \sum_{i=1}^{k} \beta_i^j(L)\, x_{it}^{jQ} + \varepsilon_t^{jQ}, \tag{2.1}$$

where $\mu$ is an intercept parameter and $\beta_i^j(L) = \beta_{i0}^j + \cdots + \beta_{is_i^j}^j L^{s_i^j}$ denote lag polynomials of length $s_i^j$.

The models are designed to be used in real time and therefore they are adapted to take into account that at each date of the forecast some series, due to publication lags, have missing data at the end of the sample. Moreover, due to the different timing of data releases, the number of missing data differs across series. Missing data are typically forecasted using univariate monthly autoregressive models.

Bridge equations can handle only a limited set of predictors. Information from many predictors is incorporated by combining predictions from many small models (see Kitchen and Monaco, 2003, and Diron, 2006). An alternative route to incorporate large information consists in combining the predictors into a few common factors.
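The two-step logic of a bridge equation can be sketched for a single indicator. This is a minimal illustration, not the ECB implementation: the function names are hypothetical, the data are simulated, the monthly indicator is extended with an AR(1), the quarterly aggregate is assumed to be the average of the three months, and the indicator enters at lag 0 only.

```python
import numpy as np

def ar1_forecast(x, steps):
    """Forecast `steps` values ahead with an AR(1) with intercept fitted by OLS."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    out, last = [], x[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return np.array(out)

def bridge_forecast(gdp_growth, monthly):
    """Two-step bridge forecast for the quarter after the last GDP observation.

    gdp_growth: quarterly GDP growth for the completed quarters.
    monthly: one monthly indicator covering the completed quarters plus
             whichever months of the current quarter have been released
             (at most three).
    """
    n_done = len(gdp_growth)
    missing = 3 * (n_done + 1) - len(monthly)
    # Step 1: fill the remainder of the current quarter with AR(1) forecasts.
    filled = np.concatenate([monthly, ar1_forecast(monthly, missing)])
    xq = filled.reshape(-1, 3).mean(axis=1)  # quarterly aggregates
    # Step 2: regress GDP growth on the quarterly aggregate and predict.
    X = np.column_stack([np.ones(n_done), xq[:n_done]])
    mu, beta = np.linalg.lstsq(X, gdp_growth, rcond=None)[0]
    return mu + beta * xq[n_done]

# Simulated example: 40 completed quarters, one month of the current quarter released.
rng = np.random.default_rng(1)
n_q = 40
monthly = rng.standard_normal(3 * n_q + 1)
xq_hist = monthly[:3 * n_q].reshape(-1, 3).mean(axis=1)
gdp = 0.4 + 0.6 * xq_hist + 0.1 * rng.standard_normal(n_q)
nowcast = bridge_forecast(gdp, monthly)
```

Pooling, as described in the text, would amount to running such a function for several indicators or equations and averaging the resulting forecasts.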


3. BRIDGING WITH FACTORS (BF) In order to exploit information of many timely monthly releases to obtain an early estimate of quarterly GDP growth, an alternative to averaging many bridge equations, is to use estimated common factors as regressors. This idea was first introduced by Giannone et al. (2008) and applied to US data. In a nutshell, the idea of this approach is to compute factors from a large panel of monthly data. Factors are averaged so as to obtain quarterly series which are then used as regressors in the GDP equation, where time aggregation is such that the quarterly series corresponds to the third month of the quarter. Missing observations for the first and second quarters are computed via the Kalman filter. Consider the vector of n stationary monthly series xt = (x 1,t , . . . , x n,t ) , t = 1, . . . , T , which have been standardized to mean zero and variance one. The dynamic factor model considered by Giannone et al. (2008) is then given by the equations xt = ft + ξt , ft =

p

ξt ∼ N(0, ξ ),

Ai ft−i + ζt ,

(3.1)

(3.2)

i=1

ζt = Bηt ,

ηt ∼ N(0, Iq ).

From an n × r matrix of factor loadings , equation (3.1) relates the monthly series xt to an r × 1 vector of latent factors ft = (f 1,t , . . . , f r,t ) plus an idiosyncratic component ξ t = (ξ 1,t , . . . , ξ n,t ) . The latter is assumed to be multivariate white noise with diagonal covariance matrix ξ . Equation (3.2) describes the law of motion for the latent factors ft , which are driven by q-dimensional standardized white noise ηt , where B is an r × q matrix, where q ≤ r. Hence ζt ∼ N(0, BB ). Finally, A1 , . . . , Ap are r × r matrices of parameters and it is further assumed that the stochastic process for ft is stationary. Let us now define quarterly GDP growth as an average of latent monthly observations during the quarter. Precisely, i.e. ytQ = (yt + yt−1 + yt−2 ). This implies that the yt is approximately a three-month growth rate. In order to ensure consistency among monthly indicators and quarterly GDP, all monthly variables are transformed so as to ensure that the corresponding quarterly quantities are given by xitQ ∼ (xit + xit−1 + xit−2 ) measured at the last month of each quarter, i.e. t = 3k and k = 1, . . . , T /3. This implies that series in differences enter the factor model in terms of three-month changes.3 Defining the quarterly factors as ftQ = (ft + ft−1 + ft−2 ), the factors-based bridge equation follows: ytQ = β fˆtQ ,

(3.3)

where β is an r × 1 vector of parameters. In the third month of each quarter, we evaluate the forecast for quarterly GDP growth, ytQ , as the quarterly average of the monthly series ytQ =

1 ( yt + yt−1 + yt−2 ) 3

(3.4)

3 Here we follow the same approach as in Banbura and R¨ unstler (2007). Qualitative results are confirmed if the monthly data are transformed to correspond to a quarterly growth at the end of the quarter, as done in Giannone et al. (2008).



C29

and define the forecast error $\varepsilon_t^Q = y_t^Q - \hat{y}_t^Q$. We assume that $\varepsilon_t^Q \sim N(0, \sigma_\varepsilon^2)$. Innovations $\xi_t$, $\zeta_t$ and $\varepsilon_t^Q$ are assumed to be mutually independent at all leads and lags.

Having specified the bridge equation we now have to deal with the fact that the monthly panel is unbalanced at the end of the sample due to different publication lags of the data. A key feature of the model by Giannone et al. (2008) is that it deals easily with the unbalanced data set problem. The model can be cast into a state space form and hence Kalman filter techniques can be easily applied to deal with missing data. Precisely, to obtain efficient forecasts of GDP growth $y_t^Q$ from the unbalanced data sets, the Kalman filter and smoother recursions are applied to the state space representation of this model. The advantage of this framework over that of the simple bridge equations is that instead of forecasting missing values on the basis of a univariate autoregressive model, we obtain a forecast compatible with the model which exploits multivariate information.

Let us denote $z_t = (x_t, y_t^Q)$ and consider a data set $Z_T = \{z_s\}_{s=1}^{T}$ that has been downloaded on a certain day of the month and might contain missing observations for certain series at the end of the sample. Following Giannone et al. (2008), a model-based measure for the uncertainty of forecasts from any data set $Z_t$ can be easily computed by noting that the variance of the forecast error for $y_{t+h}^Q$ can be decomposed into

$$\operatorname{var}\left(\hat{y}^Q_{t+h|t} - y^Q_{t+h}\right) = \pi_{t+h|t} + \sigma_\varepsilon^2, \tag{3.5}$$

where $\pi_{t+h|t} = \operatorname{var}(\hat{y}^Q_{t+h|t} - \hat{y}^Q_{t+h})$ represents the effect stemming from the uncertainty in forecasts $f_{t+h|t}$ of the latent factors. We denote $\pi_{t+h|t}$ as filter uncertainty, as opposed to residual uncertainty $\sigma_\varepsilon^2$.

Giannone et al. (2008) have pointed out that $\pi_{t+h|t}$ can be used for inspecting the information content of new data releases. Consider two data sets $Z_t^{(\text{date 1})}$ and $Z_t^{(\text{date 2})} \supseteq Z_t^{(\text{date 1})}$, hence $Z_t^{(\text{date 2})}$ is downloaded on a later date. With parameters $\theta$ being estimated from data $Z_t^{(\text{date 1})}$ in both cases, it can be shown that

$$\pi^{(\text{date 1})}_{t+h|t} = \pi^{(\text{date 2})}_{t+h|t} + \operatorname{var}\left(\hat{y}^{Q,(\text{date 1})}_{t+h|t} - \hat{y}^{Q,(\text{date 2})}_{t+h|t}\right). \tag{3.6}$$

Hence, $\pi^{(\text{date 1})}_{t+h|t} \geq \pi^{(\text{date 2})}_{t+h|t}$ and filter uncertainty necessarily increases when information is withdrawn. Empirical results are shown later.
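The ordering of filter uncertainty across data vintages can be illustrated with a minimal sketch. It assumes, purely for illustration, a single latent factor following an AR(1), two monthly indicators with independent measurement errors, a now-cast horizon of h = 0 and a bridge coefficient `beta`; none of these parameter values come from the paper. Because the Kalman variance recursion depends only on which observations are available, the sketch works on availability patterns rather than data values.

```python
def filter_variance(avail, a=0.8, q=1.0, r=(0.5, 2.0), p0=10.0):
    """Filtered variance of the factor after the last period.

    Illustrative model (an assumption, not the paper's specification):
        f_t = a * f_{t-1} + u_t,   var(u_t) = q
        x_it = f_t + e_it,         var(e_it) = r[i], errors independent
    avail[t][i] is True when series i is observed in period t.
    """
    p = p0
    for row in avail:
        p = a * a * p + q                  # prediction step
        for observed, r_i in zip(row, r):
            if observed:                   # update is skipped when data are missing
                p = p * r_i / (p + r_i)    # scalar Kalman update of the variance
    return p


# Date 2 contains everything in date 1 plus the last value of series 2.
periods = 20
date2 = [(True, True)] * periods
date1 = [(True, True)] * (periods - 1) + [(True, False)]

beta = 0.6          # hypothetical bridge coefficient on the factor
sigma_eps2 = 0.04   # hypothetical residual variance

pi_date1 = beta ** 2 * filter_variance(date1)
pi_date2 = beta ** 2 * filter_variance(date2)
total_date1 = pi_date1 + sigma_eps2   # decomposition as in (3.5)
total_date2 = pi_date2 + sigma_eps2
```

In this setting `pi_date1` exceeds `pi_date2`, in line with the inequality implied by (3.6): withdrawing the most recent observation of one series raises filter uncertainty while residual uncertainty is unchanged.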

4. THE MODELS AT WORK

In its regular monitoring of economic activity in the euro area, ECB staff use a set of bridge equations which have gradually been developed in recent years. They also include equations for the forecast of the demand components of the national accounts and for the GDP of euro area member states (see Rünstler and Sédillot, 2003, and Diron, 2006). The GDP forecast is derived as the simple average of the predictions from all the equations. In this paper, we consider a subset of these equations, i.e. 12 bridge equations, which are designed to forecast euro area GDP directly, and derive the GDP forecast as the simple average of the predictions from these equations. They contain, in various combinations, a small set of selected indicators for the euro area: industrial production, industrial production in construction, retail sales, new car registrations, the unemployment rate, money M1, the European Commission business and service confidence indices, and, among composite indicators, the OECD leading indicator, and

Table 1. Bridge equations for euro area GDP growth (BES model).

Columns: Equation 1 to 12.
Rows (explanatory variables): Industrial production (total); Ind. production construction; Retail sales; New car registrations; Service confidence; Unemployment rate; Money M1; Business confidence; EuroCoin (CEPR); OECD leading indicator.
[Table body: ∗ marks the explanatory variables entering each equation.]

Note: With the exception of EuroCoin and the service confidence index, the data have been transformed to represent monthly differences when expressed in rates (unemployment and business survey) and monthly growth rates otherwise. All series appear at lag 0 in the equations, i.e. $\beta_i^j(L) = \beta_{i0}^j$, with the exception of EuroCoin, which appears at lag 1, i.e. $\beta_i^j(L) = \beta_{i1}^j L$, and real money, which appears at lag 2, i.e. $\beta_i^j(L) = \beta_{i2}^j L^2$.

the CEPR–Bank of Italy coincident indicator for the euro area ‘EuroCoin’. The models used in each of the 12 individual equations are listed in Table 1. In what follows we refer to the method as the BES model, standing for Bridge Equations based on Selected predictors.

These indicators are regularly monitored not only by the ECB in its Monthly Bulletin, but also by a majority of euro area market analysts. Some equations are based on simple accounting reasoning. This is the case for the equations based on hard monthly indicators, like industrial production, construction production, retail sales and new car registrations, which are components of GDP. Other equations are instead based on soft/indirect indicators such as surveys and financial variables. Their relationship with GDP is looser, but they convey useful information since they cover some areas of activity for which there are no hard indicators and they are released earlier. For further details, see Rünstler and Sédillot (2003) and Diron (2006).

The factor model is based on a larger information set incorporating a wide range of monthly indicators. We consider n = 85 monthly predictors. Among the official data on euro area economic activity we include 19 series, i.e. components of industrial production (17), retail sales and new passenger car registrations. As for survey data, we use 24 series from the European Commission business, consumer, retail and construction surveys. Financial data comprise 22 series including exchange rates (6), interest rates (7), equity price indices (4) and raw material prices (5). For the international economy we consider 11 series, including key macroeconomic indicators for the United States (7) and extra area trade volumes from the balance of payments statistics (4). In addition, the data set includes five series related to employment and four series on monetary aggregates and loans.
We have transformed series to obtain stationarity. The series and their transformations are described in the Appendix.

We also produce early estimates of GDP growth averaging many bridge equations based on the same information set used for bridging with factors. We assess the forecasting performance of an average of the predictions obtained by the 85 univariate bridge equations. Each equation $j$ predicts quarterly GDP growth from the quarterly aggregate of stationary monthly indicator $x_{1t}^{jQ}$, where

$$y_t^Q = \mu^j + \beta_1^j(L)\, x_{1t}^{jQ} + \varepsilon_t^{jQ}. \tag{4.1}$$
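A minimal sketch of this pooling step follows. It assumes that each indicator has already been aggregated to the quarterly frequency and that the lag polynomial in (4.1) reduces to a single contemporaneous coefficient; the function names and the toy data are illustrative only and are not taken from the paper.

```python
from statistics import fmean


def fit_bridge(x, y):
    """OLS of y on a constant and one regressor x; returns (mu, beta)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def pooled_bridge_forecast(indicators, y, x_new):
    """Fit one univariate bridge equation per indicator and average the forecasts.

    indicators: list of quarterly-aggregated indicator histories (one list each)
    y:          quarterly GDP growth over the same periods
    x_new:      value of each indicator for the quarter to be forecast
    """
    forecasts = []
    for x_hist, x_next in zip(indicators, x_new):
        mu, beta = fit_bridge(x_hist, y)
        forecasts.append(mu + beta * x_next)
    return fmean(forecasts), forecasts


# Toy data: both indicators explain y exactly, so each forecast equals 2.6.
y = [0.6, 1.1, 1.6, 2.1]
indicators = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
average, individual = pooled_bridge_forecast(indicators, y, x_new=[5.0, 10.0])
```

The equal weighting mirrors the simple average of predictions described in the text; any other weighting scheme would be a departure from it.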

The idea of model averaging to combine information from large data sets has been discussed, amongst others, by Hendry and Clements (2004). This method has been used for early estimates of GDP growth by the US Treasury (Kitchen and Monaco, 2003). In what follows we refer to the method as the BEA model, standing for Bridge Equations based on All predictors.

4.1. Design of the simulated pseudo out-of-sample exercise

We evaluate the forecasting accuracy of different methods under realistic informational assumptions. We design the forecasting evaluation exercise by mimicking as closely as possible the real-time flow of information, by replicating the real-time pattern of data availability. The parameters of the model are estimated recursively using only the information available at the time of the forecast. We do not have a real-time database for all the predictors considered, therefore we are not able to take into account the real-time data revisions.

Taking into account the real-time data flow is important for the understanding of the marginal impact of blocks of releases since the latter depends on their order. The order of data arrival and the publication lag are particularly important when data are highly collinear, as is the case for macroeconomic series. In this case, the block that is more timely has a larger information content since, by the time the later release is published, its informational content is already incorporated in the forecast.

In principle, the now-cast from each model can be computed whenever new data are released within the month. The Giannone et al. (2008) factor model implemented at the Fed is updated once a week while at the ECB the same model and the bridge equations are updated twice a month, in relation to data releases at the end of the month and to the release of important ‘hard’ data such as industrial production in the middle of the month.
We replicate the ECB practice and thus conduct two forecasts per month which use only the data that in real time are available at the time of the two monthly updates. More precisely, we use a data set downloaded on 25 February 2008 and combine this with the typical data release calendar to reconstruct data availability at the end of the month (end) and in the middle of the month (mid). The data situation at the end of the month coincides with the release of financial market data and survey data for the previous month while retail trade turnover and monetary aggregates are lagged one month. Around the middle of the month we have the release of the bulk of ‘hard’ data on economic activity, including data on industrial production, external trade and new passenger car registrations. The attribution of releases to each update is described in the last column of the table in the Appendix.⁴

For GDP of a certain quarter, we produce a sequence of forecasts in seven consecutive months prior to the release. Starting from these two data sets, $Z_T^{(i)} = \{x_s\}_{s=1}^{T}$ for $i \in \{\text{end}, \text{mid}\}$, we define a pseudo real-time data set $Z_t^{(i)} = \{x_s\}_{s=1}^{t}$ as the observations from the original data set $Z_T^{(i)}$ up to period $t$, but with observation $x_{i,t-h}$, $h \geq 0$, eliminated if observation $x_{i,T-h}$ is missing in $Z_T^{(i)}$.

⁴ The international data in our data set are published at various dates and we therefore attributed them accordingly.
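The construction of the pseudo real-time data set can be sketched as follows. The representation of the panel as a list of rows with `None` for missing values, and the function name, are assumptions made for illustration; the rule itself (shift the end-of-sample pattern of missing values back to period t) is the one stated above.

```python
def pseudo_real_time(panel, t):
    """Build Z_t from the full data set Z_T.

    panel[s][i] holds series i in period s (0-based), or None when missing.
    The first t periods are kept, and the observation of series i that lies
    h periods before the end of the truncated sample is blanked whenever the
    observation h periods before the end of the full sample is missing.
    """
    T = len(panel)
    cut = [list(row) for row in panel[:t]]   # copy, so the full panel is untouched
    for h in range(t):
        for i, value in enumerate(panel[T - 1 - h]):
            if value is None:
                cut[t - 1 - h][i] = None
    return cut


# Series 0 (e.g. a survey) is complete; series 1 (e.g. industrial production)
# is published with a delay and misses the last two periods.
full_panel = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, None],
    [5.0, None],
]
z3 = pseudo_real_time(full_panel, 3)
# z3 == [[1.0, 10.0], [2.0, None], [3.0, None]]
```

Applying the function for every t in the evaluation sample, once for the ‘end’ and once for the ‘mid’ data set, reproduces the same ragged edge at each forecast origin.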



Notice that we could have updated the model more frequently throughout the month, as done in the evaluation on US data by Giannone et al. (2008). Such a detailed analysis is more difficult with European data since releases are clustered and the relative order of different data releases has changed over the evaluation period. Therefore, for this application, the model is updated only twice a month, in relation, roughly, to the release of ‘soft’ (end of month) and ‘hard’ (mid-month) data, respectively. In this case, the order of the stylized calendar is not far from reality since there are only two large groups and hence changes in the ordering over the evaluation sample are rather limited. Therefore, the mid-of-month and the end-of-month updates reflect the incorporation of ‘hard’ and ‘soft’ data respectively; hence by looking at the evolution of the forecasts and their accuracy it is possible to assess the impact of hard and soft data on GDP.

4.2. Empirical specification of the models

There are several parameters to be specified: the number of lags for the bridge equations; the number of factors r, the number of shocks q and the lag length p of the VAR on the factors.

4.2.1. Bridge equations. For the BES model, the lag length is fixed ex ante using the specification that is actually used at the ECB.⁵ For the BEA model, the specification is based either on information criteria or on the RMSE criterion. The information criterion used is that proposed in Schwarz (1978) and will be referred to as SIC. The lag length is chosen from the SIC for each equation individually at each point in time. We search lags in the range [0, 4] and also consider the averages across this range. Alternative specifications are also chosen on the basis of a recursive RMSE criterion. According to this criterion we choose the parametrization that, at each point in time, produces the minimum RMSE for the forecasts computed up to that point.
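The recursive RMSE criterion can be sketched as below. The sketch assumes that the forecast errors of every candidate specification are already available as lists aligned in time, and that all errors up to the previous period are known when the choice is made; any publication delay in the errors is ignored. The dictionary keys (here lag lengths) and the error values are hypothetical.

```python
from math import sqrt


def recursive_rmse_choice(errors_by_spec):
    """For each period t >= 1, pick the specification with the lowest RMSE
    computed over the forecast errors of periods 0, ..., t-1."""
    specs = list(errors_by_spec)
    n = len(errors_by_spec[specs[0]])
    choices = []
    for t in range(1, n):
        rmse = {
            s: sqrt(sum(e * e for e in errors_by_spec[s][:t]) / t)
            for s in specs
        }
        choices.append(min(specs, key=rmse.get))
    return choices


# Hypothetical forecast errors for two lag lengths over four periods.
errors = {
    0: [0.3, 0.3, 0.3, 0.3],
    1: [0.1, 0.5, 0.5, 0.5],
}
chosen = recursive_rmse_choice(errors)
# chosen == [1, 0, 0]: lag 1 looks best after one error, lag 0 afterwards.
```

Because only past errors enter each choice, the selection uses no information that would have been unavailable at the time of the forecast.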
In addition, we will consider simple forecast averages across a range of specifications. For both BES and BEA, we use the autoregressive model on the monthly growth rates (or monthly differences) to forecast the missing observations for the predictors; the lag length is selected by the SIC.

4.2.2. Bridging with factors. Here use is made of both the RMSE and the information criteria, together with forecast averages across a range of specifications. For the RMSE we consider the lowest average RMSE over the entire forecast horizon. For instance, in 2003Q4, the available forecast errors are from 1999Q1 to 2003Q2 and this sample is used for model selection. As information criteria we use criterion IC2 from Bai and Ng (2002) to determine $r$ within a range of [1, 8]. Given $r$, we estimate equation (3.2) and determine lag length $p$ from the SIC within a range of [1, 4]. Finally, we apply principal components analysis to the estimated residuals $\zeta_t$ and follow Bai and Ng (2007) in selecting $q$ as the smallest value that satisfies the condition $\left(\sum_{i=q+1}^{r} \lambda_i\right) / \left(\sum_{i=1}^{r} \lambda_i\right) < q_{crit}$, where $\lambda_i$ are the ordered eigenvalues from the sample covariance matrix of $\zeta_t$ and $q_{crit}$ is an appropriate critical value. For both information and RMSE criteria, the model is selected from the range r = [1, 8], q = [1, min(5, r)], and p = [1, 4]. Forecast averages are also taken over this range. Standard model specification tests point against extending equation (3.1) with lags of GDP growth.⁶

⁵ The specification is based on the findings in Rünstler and Sédillot (2003) and Diron (2006), and mostly selects lag 0. This of course may bias results against them.

⁶ The Ljung–Box test suggests that the dynamics of GDP growth is well explained by the factors, and therefore there is no need to extend the model by further adding lags of GDP growth. Standard Ljung–Box tests have been carried out on the one-step-ahead prediction errors of GDP growth. These do not allow rejection of the null of zero correlation. These tests have been implemented based on the 6, 8, 10, 12 and 14 first autocorrelations.

4.3. Forecasting performance

The models are evaluated by looking at the out-of-sample forecasting performances during the period 1999Q1 to 2007Q2. For GDP of a certain quarter, a sequence of forecasts in seven consecutive months prior to its release is computed. Furthermore, we conduct two forecasts per month, which replicate the data availability prevailing at the end of the month and in the middle of the month.

Figure 1 shows the forecasts from the BF model and the two implementations of the bridge equation model, BES and BEA respectively, against the GDP numbers. For the BF and the BEA models, the average performance across the different specifications is reported. Values shown are the forecasts conducted at the end of months seven, four and one, respectively, prior to the release of the GDP flash estimate. Results show that the factor model forecast tracks GDP more accurately, in particular during the pronounced slowdown in 2001–03. The BEA produces forecasts which are rather flat. However, the BES starts tracking GDP dynamics one month before the first GDP release.

Figure 1. Euro area GDP growth and model forecasts.

We compute out-of-sample measures for all models. Precisely, we look at the evolution of the RMSE for the now-casts computed after each data release within the quarter when GDP growth is projected on available monthly data series. Results are shown in Figure 2, which reports the RMSE for all models as well as for the naive constant growth forecast. On the x-axis of the figure we write −i end to indicate the forecasts conducted using information available at the end of the month i months ahead of the data release; we use −i mid to indicate the forecast computed using information available in the middle of the month. For the BF and the BEA models, the average performance across the different specifications is reported. The naive forecast predicts GDP growth to be equal to the average of past GDP growth.

For the BF model Figure 2 shows the RMSE derived from the average across specifications while detailed results for all parametrizations are reported in Table 2. From both Figure 2 and Table 2, we can see that, for the BF model, there is a clear decline of the RMSE with the increase in monthly information. For the averages across different specifications, for instance, the RMSE declines steadily from 0.338 to 0.234, an improvement of around 30%. This feature is less clear for the bridge equations, especially the BEA specification whose performance does not improve over the quarter. Notice that the BES model becomes more accurate once there are less than two months to go before the release of GDP. This matches the time when industrial production for the current quarter is included for the first time. The result is not surprising as the BES makes extensive use of ‘hard’ data.

In general, we find that the BF model uniformly outperforms the AR(1) benchmark and the bridge equations across all horizons and independently of the specification selection method.⁷ This is most likely a reflection of the fact that, contrary to the bridge equation model, the factor model exploits the information content of cross-correlations across series.

⁷ Alternative autoregressive processes to the benchmark AR(1) model reported in the tables have also been implemented. However, these provided forecasting results which were broadly in line with the numbers reported for our adopted benchmark and are thus not shown. The alternative AR processes tested were: (a) AR(p) with p selected using the BIC criterion; (b) AR(p) with p selected recursively on the basis of best forecast performance in the previous quarter; (c) AR(p) with p selected using the BIC criterion on the entire sample; and (d) averaging predictions from the AR model estimated with different lags from 1 to 4. In terms of actual forecasting performance, the AR(1) model reported in the tables always performed best for forecasts with one to three ‘months left’. However, the AR(1) model was worse than the other models for forecasts with four to seven ‘months left’. The best RMSEs taken from all AR models implemented ranged between 0.311 for a forecast with one ‘month left’ and 0.324 for a forecast with seven ‘months left’.

The major gains occur for the intermediate horizons, i.e. the forecasts made three to five months ahead of the release of the GDP flash estimate. For these forecasts, the RMSE is about 20% lower compared to the AR(1) benchmark. Differences among specification selection methods are small. This is in line with Rünstler et al. (2008) who found that information and recursive RMSE criteria perform

Figure 2. RMSE from pseudo real-time exercise 1999Q1–2005Q4.

Table 2. Root mean squared error from short-term forecasts (1999Q1–2007Q2).

                     Benchmarks      BF model                    BE models
                                                                        BEA
Quarter  Vintage     Naive  AR(1)    Rec    Avg    IC     Best   BES    Rec    Avg    IC
Next     −7 mid      0.35   0.35     0.34   0.33   0.33   0.32   0.35   0.35   0.35   0.34
Next     −7 end      0.35   0.35     0.32   0.32   0.33   0.30   0.35   0.34   0.34   0.34
Next     −6 mid      0.35   0.34     0.31   0.31   0.32   0.30   0.35   0.34   0.34   0.34
Next     −6 end      0.35   0.34     0.31   0.29   0.31   0.28   0.34   0.33   0.34   0.33
Next     −5 mid      0.35   0.34     0.31   0.29   0.30   0.28   0.35   0.33   0.33   0.33
Next     −5 end      0.35   0.34     0.27   0.26   0.26   0.24   0.35   0.33   0.33   0.33
Curr.    −4 mid      0.35   0.34     0.27   0.26   0.26   0.24   0.34   0.33   0.33   0.32
Curr.    −4 end      0.35   0.34     0.27   0.26   0.27   0.24   0.35   0.33   0.33   0.32
Curr.    −3 mid      0.34   0.29     0.26   0.26   0.26   0.23   0.32   0.32   0.32   0.32
Curr.    −3 end      0.34   0.29     0.26   0.25   0.26   0.24   0.32   0.31   0.32   0.31
Curr.    −2 mid      0.34   0.29     0.22   0.23   0.23   0.21   0.26   0.31   0.31   0.31
Curr.    −2 end      0.34   0.29     0.21   0.22   0.22   0.20   0.25   0.31   0.31   0.31
Prev.    −1 mid      0.34   0.29     0.19   0.21   0.20   0.18   0.23   0.30   0.31   0.30
Prev.    −1 end      0.34   0.29     0.20   0.21   0.21   0.20   0.23   0.30   0.31   0.30

Note: The table reports root mean square forecast errors from different models computed at the end of the month (end) or in the middle of the month (mid) as explained in the main text. The number preceding these terms is used to indicate the number of months pending prior to the release of GDP, e.g. −7 mid. The parametrizations for the BF model and the BEA model have been selected using three criteria. Namely, recursive mean square forecast error (Rec); averaging across all possible parameterizations (Avg); and applying recursively information criteria (IC). Best refers to the best model ex post, i.e. that which gave the lowest RMSE over the whole forecasting sample, which turn out to be the BF model with parameter settings r = 5, p = 3 and q = 1.

Table 3. Adjusted Diebold–Mariano test of predictive accuracy (1999Q1–2007Q2).

                       BF      BES     BEA     BF      BF      BES
Quarter   Vintage      AR(1)   AR(1)   AR(1)   BES     BEA     BEA
Next      −7 mid       0.198   0.274   0.207   0.207   0.225   0.584
Next      −7 end       0.141   0.214   0.165   0.146   0.162   0.660
Next      −6 mid       0.174   0.778   0.653   0.139   0.149   0.776
Next      −6 end       0.091   0.686   0.304   0.089   0.091   0.883
Next      −5 mid       0.089   0.814   0.263   0.087   0.081   0.943
Next      −5 end       0.028   0.751   0.203   0.026   0.030   0.939
Current   −4 mid       0.028   0.603   0.175   0.026   0.037   0.880
Current   −4 end       0.054   0.768   0.168   0.064   0.062   0.935
Current   −3 mid       0.098   0.869   0.945   0.056   0.093   0.049
Current   −3 end       0.119   0.826   0.923   0.078   0.101   0.056
Current   −2 mid       0.038   0.255   0.877   0.032   0.154   0.067
Current   −2 end       0.036   0.177   0.876   0.030   0.130   0.039
Previous  −1 mid       0.012   0.024   0.773   0.015   0.188   0.002
Previous  −1 end       0.012   0.027   0.775   0.015   0.188   0.003

Note: The table reports probability values of the adjusted Diebold–Mariano test of predictive accuracy proposed by Harvey et al. (1997). Values shown correspond to bilateral comparisons between the BF and BE models and also comparisons between these models and a benchmark autoregressive process, AR(1) in the table. Each column heading lists the two models compared, with the first-listed model on the upper line. Probability values are reported for forecasts computed at the end of the month (end) or in the middle of the month (mid) as explained in the main text. The number preceding these terms is used to indicate the number of months pending prior to the release of GDP, e.g. −7 mid.

about equally well across nine data sets. Bridge equations do not uniformly beat the AR(1) and any gains upon the latter are small. Differences between the various specification selection methods are also very small for the bridge equation models.

Let us also remark that the best ex post parametrization over the entire exercise for the BF model is r = 5, q = 3 and p = 1, shown in Table 2 under the heading ‘Best ex post’. Notice that this also corresponds to the parameters chosen at the end of the evaluation sample using the recursive mean square forecast error criterion, i.e. it is the model selected for the last run of the evaluation exercise. We use this parametrization to compute the in-sample measure of uncertainty and also for the measures of contributions from economic indicators to the forecast reported below.

Table 3 shows predictive accuracy tests for bilateral comparisons across the models using the Diebold–Mariano test modified using the small sample correction suggested by Harvey et al. (1997). The values reported are probability values. For a 5% significance level, values smaller than 0.05 imply that the forecasts of the model listed first in the heading of the column are significantly better, and values larger than 0.95 that they are significantly worse. Table 3 confirms the good performance of the BF model. For a 10% significance level the forecasting performance of the BF model is significantly better than the AR(1) model and the BES model for forecasts that go into the next quarter period. The BEA appears to perform significantly worse than the BES model for forecasts over the current and the previous quarter. For forecasts over the next quarter, neither the BES nor the BEA appears to be significantly better than a standard AR(1) process.
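For readers who wish to reproduce this type of comparison, the following is a minimal sketch of the Diebold–Mariano statistic with the small-sample correction of Harvey et al. (1997), under a squared-error loss. The function name and the toy forecast errors are illustrative; the sketch returns only the adjusted statistic and the degrees of freedom of the Student t reference distribution, and leaves the look-up of probability values to a statistical library.

```python
from math import sqrt


def adjusted_dm_statistic(e1, e2, h=1):
    """Diebold-Mariano statistic with the Harvey et al. (1997) correction.

    e1, e2: forecast errors of the two competing models (same length n)
    h:      forecast horizon; autocovariances up to lag h-1 enter the variance
    Returns (statistic, degrees_of_freedom). A negative statistic means the
    first model has the smaller average squared error.
    """
    n = len(e1)
    d = [a * a - b * b for a, b in zip(e1, e2)]   # squared-error loss differential
    d_bar = sum(d) / n

    def gamma(k):
        # sample autocovariance of the loss differential at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    var_d_bar = (gamma(0) + 2 * sum(gamma(k) for k in range(1, h))) / n
    dm = d_bar / sqrt(var_d_bar)   # assumes var_d_bar > 0, which can fail for h > 1
    correction = sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm, n - 1


# Toy errors: the first model is more accurate in every period.
errors_model_1 = [0.1, -0.2, 0.1, 0.3, -0.1, 0.2]
errors_model_2 = [0.4, -0.5, 0.3, 0.6, -0.2, 0.5]
stat, dof = adjusted_dm_statistic(errors_model_1, errors_model_2)
```

The adjusted statistic is compared with a Student t distribution with n − 1 degrees of freedom, as proposed by Harvey et al. (1997).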



Figure 3. Model-based measure of uncertainty (bridging with factors model).

4.4. The marginal impact of data releases

From the factor model estimates, we can compute the marginal impact of data releases on the now-cast in two alternative ways. First, following Giannone et al. (2008), we can compute model-based uncertainty as new data are published using equation (3.5). Results are reported in Figure 3, which illustrates that the general pattern of the out-of-sample measure is confirmed. Once more, −i end on the x-axis indicates the forecasts conducted using information available at the end of the month i months ahead of the data release, and −i mid indicates the forecast computed using information available in the middle of the month. Figure 3 further shows that the major reductions in model-based uncertainty, reflected in the steps shown in the figure, come primarily with the releases of ‘soft’ data.

Secondly, following Banbura and Rünstler (2007), we can compute the contribution (weight) of data releases to the forecast. Since the pattern of data availability changes over time, the weight on each individual series changes throughout the quarter.⁸ By looking at the magnitude of their weight in the forecasts, we can understand which variables are useful and when. Banbura and Rünstler (2007) have proposed to compute the weights of the individual observations in the estimates of the state vector using an algorithm developed by Harvey and Koopman (2003). Again, weights can be calculated for an arbitrary information set with those weights related to missing data being set to zero. This allows expressing forecasts as a weighted sum of available observations in $Z_t$, i.e.

$$\hat{y}^Q_{t+h|t} = \sum_{k=0}^{t-1} \omega_k(h)\, z_{t-k}. \tag{4.2}$$

⁸ Notice that otherwise the filter is stationary and hence if the data set is balanced then the weight of different blocks would not change.



Figure 4. Contributions to GDP forecasts (bridging with factors model).

From this expression the contribution of series $i$ to the forecast can be computed as $c_{k,it} = \sum_{k=0}^{t-1} \omega_{k,i}(h)\, z_{i,t-k}$, where $\omega_{k,i}(h)$ is the $i$th element of $\omega_k(h)$, $i = 1, \ldots, n$. Results are reported in Figure 4, which displays the mean absolute contribution, standardized with the sample standard deviation of GDP growth $\sigma$, namely $\sum_{t=1}^{T} |c_{k,it}|/\sigma$, where time runs from 1999Q1 to 2007Q2, i.e. the evaluation sample. In this figure, ‘Data end’ and ‘Data mid’ are used to refer to data availability at the end and in the middle of the month, respectively. The Appendix provides a precise definition of these two groups of data.

Results show that the ‘soft’ data have most of the weight for earlier estimates. Later, when ‘hard’ data for the quarter are released, the weight on ‘soft’ information decreases in favour of the ‘hard’ data. Note that this is in line with the fact that, as shown by all results in this paper, the accuracy improves in earlier forecasts with the end-of-month update, while for later forecasts it improves mainly with the mid-of-month release.
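The decomposition of a forecast into series contributions can be sketched as follows. The weights are taken as given (in the paper they come from the algorithm of Harvey and Koopman, 2003, which is not reproduced here); the data layout, the function name and the numerical values are assumptions made for illustration.

```python
def forecast_contributions(weights, z):
    """Split a forecast of the form (4.2) into contributions by series.

    weights[k][i]: weight on series i observed k periods before the end of the sample
    z[s][i]:       observation of series i in period s (0-based), None if missing;
                   a missing observation contributes nothing, i.e. its weight is zero
    Returns (forecast, contributions), where contributions[i] is the sum over k
    of weights[k][i] * z[t-1-k][i] and the forecast is the sum of contributions.
    """
    t = len(z)
    n = len(z[0])
    contributions = [0.0] * n
    for k in range(t):
        for i in range(n):
            value = z[t - 1 - k][i]
            if value is not None:
                contributions[i] += weights[k][i] * value
    return sum(contributions), contributions


# Two series over two periods; series 1 is not yet released for the last period.
z = [[1.0, 2.0], [0.5, None]]
weights = [[0.4, 0.3], [0.2, 0.1]]   # hypothetical weights, row k = lag k
forecast, contributions = forecast_contributions(weights, z)
# contributions[0] = 0.4 * 0.5 + 0.2 * 1.0, contributions[1] = 0.1 * 2.0
```

Summing the contributions over the series belonging to a release group (for example all ‘soft’ series) gives the group contribution of the kind plotted in Figure 4.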

5. CONCLUSIONS

This paper evaluates pools of bridge equations and the ‘bridging with factors’ approach proposed by Giannone et al. (2008) for the back-cast, now-cast and short-term forecast of euro area quarterly GDP growth. This model provides a framework which allows exploiting the data flow of monthly information during the quarter for the forecast of quarterly GDP. The model allows estimating the factors and computing missing observations due to publication lags within the same framework via the use of the Kalman filter. In addition, we provide an out-of-sample evaluation of the models in an exercise in which models are updated at different dates of the month in relation to the release of ‘soft’ data and ‘hard’ data, respectively.



Results indicate that the factor model improves upon the pool of bridge equations. In the case of the now-cast, the root mean squared error is lower by 10–15% and we therefore conclude that this is a valid new tool for short-term analysis. We also show that, while the performance of bridge equations is fairly constant over the quarter, the RMSE of the factor model decreases with the arrival of new information. The advantage over bridge equations is particularly pronounced in the middle of the quarter, when the factor model exploits a large number of early releases efficiently.

Early in the quarter, forecast errors decrease in relation to the release of ‘soft’ data since industrial production and ‘hard’ data in general are not yet available. At the end of the quarter, on the other hand, the decrease is marked in relation to the release of ‘hard’ data. This shows that timeliness is important and that, in order to evaluate the marginal improvement of groups of releases, we need to condition on available information. The same point is shown by the fact that the contribution of the ‘soft’ data releases to the forecast is large at the beginning of the quarter and small at the end, while the opposite is true for ‘hard’ data.

Contrary to the bridge equations of the BES, the bridging factor model makes use of a wide range of relevant ‘soft’ data. This translates into a better forecasting performance, in particular at longer horizons, when the availability of ‘hard’ data for the reference quarter is scarce.

ACKNOWLEDGMENTS

We would like to thank Filippo Altissimo and Marie Diron, who participated in the initial stage of this project. We also thank Barbara Roffia for helping to construct the data set and David De Antonio Liedo for valuable research assistance. The opinions in this paper are those of the authors and do not necessarily reflect the views of the European Central Bank.

REFERENCES

Artis, M. J., A. Banerjee and M. Marcellino (2005). Factor forecasts for the UK. Journal of Forecasting 24, 279–98.
Baffigi, A., R. Golinelli and G. Parigi (2004). Bridge models to forecast the euro area GDP. International Journal of Forecasting 20, 447–60.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Banbura, M. and G. Rünstler (2007). A look into the factor model black box - publication lags and the role of hard and soft data in forecasting GDP. Working Paper Series 751, European Central Bank.
Camba-Mendez, G., G. Kapetanios, R. J. Smith and M. R. Weale (2001). An automatic leading indicator of economic activity: forecasting GDP growth for European countries. Econometrics Journal 4, 56–90.
D’Agostino, A., D. Giannone and P. Surico (2006). (Un)predictability and macroeconomic stability. Working Paper Series 605, European Central Bank.
Diron, M. (2006). Short-term forecasts of euro area real GDP growth: an assessment of real-time performance based on vintage data. Working Paper Series 622, European Central Bank.
Duarte, C. and A. Rua (2007). Forecasting inflation through a bottom-up approach: how bottom is bottom? Economic Modelling 24, 941–53.
Giannone, D., L. Reichlin and D. Small (2008). Nowcasting: the real-time informational content of macroeconomic data releases. Journal of Monetary Economics 55, 665–76.
Harvey, A. C. and S. J. Koopman (2003). Computing observation weights for signal extraction and filtering. Journal of Economic Dynamics and Control 27, 1317–33.
Harvey, D. I., S. J. Leybourne and P. Newbold (1997). Testing the equality of prediction mean square errors. International Journal of Forecasting 13, 273–81.
Hendry, D. F. and M. P. Clements (2004). Pooling of forecasts. Econometrics Journal 7, 1–31.
Kitchen, J. and R. M. Monaco (2003). Real-time forecasting in practice: the U.S. Treasury staff’s real-time GDP forecast system. Business Economics 38, 10–19.
Marcellino, M., J. H. Stock and M. W. Watson (2003). Macroeconomic forecasting in the euro area: country specific versus area-wide information. European Economic Review 47, 1–18.
Nieuwenhuyze, C. V. (2005). A generalised dynamic factor model for the Belgian economy: identification of the business cycle and GDP growth forecasts. Journal of Business Cycle Measurement and Analysis 2, 213–47.
Reijer, A. H. J. D. (2005). Forecasting Dutch GDP using large scale factor models. Technical report, Working Paper No. 28, De Nederlandsche Bank.
Rünstler, G., K. Barhoumi, R. Cristadoro, A. D. Reijer, A. Jakaitiene, P. Jelonek, A. Rua, K. Ruth, S. Benk and C. V. Nieuwenhuyze (2008). Short-term forecasting of GDP using large monthly data sets: a pseudo real-time forecast evaluation exercise. Journal of Forecasting 28, 595–611.
Rünstler, G. and F. Sédillot (2003). Short-term estimates of euro area real GDP by means of monthly data. Working Paper Series 276, European Central Bank.
Schumacher, C. (2007). Forecasting German GDP using alternative factor models based on large data sets. Journal of Forecasting 26, 271–302.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–64.
Stock, J. H. and M. W. Watson (2006). Forecasting with many predictors. In G. Elliott, C. W. J. Granger and A. Timmermann (Eds.), Handbook of Economic Forecasting, Volume 1, 515–554. Amsterdam: North Holland.

APPENDIX: DATA APPENDIX

No.  Series                                                          Group    Publication lag  Transformation BF code  Release
1    IP-Total industry                                               IndProd  3                2                       mid
2    IP-Total Industry (excl construction)                           IndProd  2                2                       mid
3    IP-Manufacturing                                                IndProd  2                2                       mid
4    IP-Construction                                                 IndProd  3                2                       mid
5    IP-Total Industry excl construction and MIG Energy              IndProd  2                2                       mid
6    IP-Energy                                                       IndProd  2                2                       mid
7    IP-MIG Capital Goods Industry                                   IndProd  2                2                       mid
8    IP-MIG Durable Consumer Goods Industry                          IndProd  2                2                       mid
9    IP-MIG Energy                                                   IndProd  3                2                       mid
10   IP-MIG Intermediate Goods Industry                              IndProd  2                2                       mid
11   IP-MIG Non-durable Consumer Goods Industry                      IndProd  2                2                       mid
12   IP-Manufacture of basic metals                                  IndProd  2                2                       mid
13   IP-Manufacture of chemicals and chemical products               IndProd  2                2                       mid
14   IP-Manufacture of electrical machinery and apparatus            IndProd  2                2                       mid
15   IP-Manufacture of machinery and equipment                       IndProd  2                2                       mid
16   IP-Manufacture of pulp, paper and paper products                IndProd  2                2                       mid
17   IP-Manufacture of rubber and plastic products                   IndProd  2                2                       mid
18   Retail trade, except of motor vehicles and motorcycles          IndProd  2                2                       mid
19   New passenger car registrations                                 IndProd  1                2                       mid
20   Unemployment rate, total                                        Emp      1                1                       end
21   Index of Employment, Construction                               Emp      3                2                       mid
22   Index of Employment, Manufacturing                              Emp      3                2                       mid
23   Index of Employment, Total Industry                             Emp      3                2                       mid
24   Index of Employment, Total Industry (excluding construction)    Emp      3                2                       mid
25   Industry Survey: Industrial Confidence Indicator                Surveys  0                1                       end
26   Industry Survey: Production trend observed in recent months     Surveys  0                1                       end
27   Industry Survey: Assessment of order-book levels                Surveys  0                1                       end
28   Industry Survey: Assessment of export order-book levels         Surveys  0                1                       end
29   Industry Survey: Assessment of stocks of finished products      Surveys  0                1                       end
30   Industry Survey: Production expectations for the months ahead   Surveys  0                1                       end
31   Industry Survey: Employment expectations for the months ahead   Surveys  0                1                       end


C42

No.

E. Angelini et al.

Series



Release

32

Industry Survey: Selling price

Surveys

0

1

end

33

expectations for the months ahead Consumer Survey: Consumer

Surveys

0

1

end

34

Confidence Indicator Consumer Survey: General

Surveys

0

1

end

35

economic situation over last 12 months Consumer Survey: General

Surveys

0

1

end

economic situation over next 12 months 36

Surveys

0

1

end

37

Consumer Survey: Price trends over last 12 months Consumer Survey: Price trends over

Surveys

0

1

end

38

next 12 months Consumer Survey: Unemployment

Surveys

0

1

end

Surveys

0

1

end

Surveys

0

1

end

Surveys

0

1

end

Surveys

0

1

end

Surveys

0

1

end

Surveys

0

1

end

Surveys

1

1

end

Surveys

1

1

end

Surveys

1

1

end

39 40

41 42

43

44 45 46 47

expectations over next 12 months Construction Survey: Construction Confidence Indicator Construction Survey: Trend of activity compared with preceding months Construction Survey: Assessment of order books Construction Survey: Employment expectations for the months ahead Construction Survey: Selling price expectations for the months ahead Retail Trade Survey: Retail Confidence Indicator Retail Trade Survey: Present business situation Retail Trade Survey: Assessment of stocks Retail Trade Survey: Expected business situation


C43


No.

Series


Transformation BF code Release

48

Retail Trade Survey: Employment

Surveys

1

1

end

49

expectations Total trade - Intra Euro 12 trade, Export Value

Int’l

2

2

end

50

Total trade - Extra Euro 12 trade, Export Value

Int’l

2

2

end

51

Int’l

2

2

end

52

Total trade - Intra Euro 12 trade, Import Value Total trade - Extra Euro 12 trade,

Int’l

2

2

end

53

Import Value US, Unemployment rate

Int’l

1

1

mid

54 55 56

US, IP total excl construction US, Employment, civilian US, Retail trade

Int’l Int’l Int’l

1 1 1

2 2 2

mid mid mid

57

US, Production expectations in manufacturing

Int’l

1

1

end

58 59

US, Consumer expectations index World market prices of raw materials in Euro, total, HWWA

Int’l Int’l

1 1

1 2

end end

60

World market prices of raw materials in Euro, total, excl

Int’l

1

2

end

61

energy, HWWA World market prices, crude oil, USD, HWWA

Int’l

1

2

end

62 63

Gold price, USD, fine ounce Brent Crude, 1 month fwd,

Int’l Int’l

1 1

2 2

end end

64 65

USD/BBL converted in euro ECB Nominal effective exch. rate ECB Real effective exch. rate CPI

Financial Financial

1 1

2 2

end end

66

deflated ECB Real effective exch. rate producer prices deflated

Financial

1

2

end

67 68

Exch. rate: USD/EUR Exch. rate: GBP/EUR

Financial Financial

1 1

2 2

end end

69 70 71

Exch. rate: YEN/EUR Eurostoxx 500 Eurostoxx 325

Financial Financial Financial

1 1 1

2 2 2

end end end

72

US SP 500 composite index

Financial

1

2

end


C44

No.

E. Angelini et al.

Series


Transformation BF code Release

73

US, Dow Jones, industrial average

Financial

1

2

end

74 75

US, Treasury Bill rate, 3-month US Treasury notes & bonds yield, 10 years

Financial Financial

1 1

1 1

end end

76 77

10-year government bond yield 3-month interest rate, Euribor

Financial Financial

1 1

1 1

end end

78 79 80

1-year government bond yield 2-year government bond yield 5-year government bond yield

Financial Financial Financial

1 1 1

1 1 1

end end end

81 82

Index of notional stock - Money M1 Index of notional stock - Money M2

Money Money

2 2

2 2

end end

83 84 85

Index of notional stock - Money M3 Index of Loans Money M2 in the US

Money Money Money

2 2 2

2 2 2

end end end

Note: The ‘Publication lag’ refers to the delay in months for releasing the data for a reference period. The date of release indicates if the release is included in the end or mid update of the model (see main text for further details). As for the transformation, code 1 is used to refer to differences while 2 is used when log differencing has been applied.




Weak and strong cross-section dependence and estimation of large panels

ALEXANDER CHUDIK†,a, M. HASHEM PESARAN‡,b AND ELISA TOSETTI§,a

† European Central Bank, Kaiserstrasse 29, 60311 Frankfurt am Main, Germany. E-mail: [email protected]

‡ University of Cambridge, Faculty of Economics, Austin Robinson Building, Sidgwick Avenue, Cambridge CB3 9DD, UK. E-mail: [email protected]

§ Brunel University, Kingston Lane, Uxbridge, Middlesex UB8 3PH, UK. E-mail: [email protected]

a CIMF, Faculty of Economics, Austin Robinson Building, Sidgwick Avenue, Cambridge CB3 9DD, UK.

b USC, Department of Economics, University of Southern California, 3620 South Vermont Avenue, Kaprielian Hall 300, Los Angeles, CA 90089-0253, USA.

First version received: June 2009; final version accepted: June 2010.

Summary This paper introduces the concepts of time-specific weak and strong cross-section dependence, and investigates how these notions are related to the concepts of weak, strong and semi-strong common factors, frequently used for modelling residual cross-section correlations in panel data models. It then focuses on the problems of estimating slope coefficients in large panels, where cross-section units are subject to a possibly large number of unobserved common factors. It is established that the common correlated effects (CCE) estimator introduced by Pesaran remains asymptotically normal under certain conditions on factor loadings of an infinite factor error structure, including cases where methods relying on principal components fail. The paper concludes with a set of Monte Carlo experiments where the small sample properties of estimators based on principal components and CCE estimators are investigated and compared under various assumptions on the nature of the unobserved common effects.

Keywords: Common correlated effects (CCE) estimator, Panels, Strong and weak cross-section dependence, Weak and strong factors.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society. Published by Blackwell Publishing Ltd, 9600

1. INTRODUCTION

The problem of error cross-section dependence in panel regressions has attracted considerable attention over the past decade. It is increasingly recognized that conditioning on variables specific to the cross-section units alone need not deliver cross-section error independence, and neglecting such dependencies can lead to biased estimates and spurious inference. How best to account for cross-correlation of errors in panels depends on the nature of the cross-dependence, and the size of the time series dimension (T) of the panel relative to its cross-section dimension (N). When N is small relative to T, and the errors are uncorrelated with the regressors, cross-section dependence can be modelled using the seemingly unrelated regression equations (SURE) approach of Zellner (1962). But when N is large relative to T, the SURE procedure is not feasible. In such cases, there are two main approaches to modelling cross-section dependence in panels: (i) spatial processes pioneered by Whittle (1954) and developed further by Anselin (1988), Kelejian and Prucha (1999) and Lee (2004); and (ii) factor models introduced by Hotelling (1933), and first applied in economics by Stone (1947). Factor models have been used extensively in finance (Chamberlain and Rothschild, 1983, Connor and Korajczyk, 1993, Stock and Watson, 2002, and Kapetanios and Pesaran, 2007), and in macroeconomics (Forni and Reichlin, 1998, and Stock and Watson, 2003), as a data shrinkage procedure where correlations across many units or variables are modelled by means of a small number of latent factors.

In this paper, we show that factor models can be employed more generally to characterize other forms of dependence such as dependence across space or social networks. Initially, we introduce the concepts of weak and strong cross-section dependence defined at a point in time and with respect to a given information set. These concepts generalize the notions of weak (or idiosyncratic) and strong cross-section dependence advanced in the literature.
Forni and Lippi (2001), building on Forni and Reichlin (1998), consider a double index process over both dimensions (time and space) simultaneously, and define it as idiosyncratic (or weakly dependent) if the weighted average of the process, computed over both dimensions, converges to zero in quadratic mean for all sets of weights satisfying certain granularity conditions. The double index process is said to be strongly dependent (again over both dimensions) if the weighted averages do not tend to zero.¹ These concepts, which are applicable to dynamic factor models, provide a generalization of the notions of weak and strong dependence developed by Chamberlain (1983) and Chamberlain and Rothschild (1983) for the analysis of static factor models. Our notions of weak and strong cross-section dependence are more widely applicable: they do not require the double index process to be stationary over time, and they allow a finer distinction between strong and semi-strong cross-section dependence. Convergence properties of weighted averages are of great importance for the asymptotic theory of various estimators and tests commonly used in panel data econometrics, as well as for arbitrage pricing theory and portfolio optimization with a large number of assets. It is clear that the underlying time series processes need not be stationary, and concepts of weak and strong dependence that are more generally applicable are needed. We also investigate how weak and strong cross-section dependence are related to the notions of weak, strong and semi-strong common factors, which may be used to represent very general forms of cross-section dependence. We then turn our attention to the second main concern of this paper, namely the estimation of slope coefficients in the context of panel data models with general cross-section error dependence.
¹ For further developments and discussions, see Anderson et al. (2009).

Building on the first part of the paper, we show that general linear error dependence in panels can be modelled in terms of a factor model with a fixed number of strong factors and a large number of non-strong factors. We allow the number of non-strong factors to rise with N, and establish that the common correlated effects (CCE) estimator introduced by Pesaran (2006) remains consistent and asymptotically normal under certain conditions on the loadings of the infinite factor structure, including cases where methods relying on principal components fail.

A Monte Carlo study documents these theoretical findings by investigating the small sample performance of estimators based on principal components (including the recent iterative principal component (PC) procedure proposed by Bai, 2009) and the CCE estimators under alternative assumptions on the nature of unobserved common effects. In particular, we examine and compare the performance of these estimators when the errors are subject to a finite number of unobserved strong factors and an infinite number of weak and/or semi-strong unobserved common factors. As predicted by the theory, the CCE estimator performs well and shows very little size distortion, in contrast with the iterated PC approach of Bai (2009), which exhibits significant size distortions. The latter is partly due to the fact that in the presence of weak or semi-strong factors the PC estimates of factors need not be consistent. This problem does not affect the CCE estimator because it does not aim at consistent estimation of the factors but deals with error cross-section dependence generally by using cross-section averages to mop up such effects. As shown in Pesaran (2006), the CCE estimator continues to be valid even if the number of factors is larger than the number of cross-section averages. The present paper goes one step further and shows that this property holds even if the number of weak factors tends to infinity with N. Note that for variances of the observables to be bounded, the number of strong factors must be fixed and cannot vary with N.

The plan of the remainder of the paper is as follows. Section 2 introduces the concepts of strong and weak cross-section dependence. Section 3 discusses the notions of weak, semi-strong and strong common factors. Section 4 introduces the CCE estimators in the context of panels with an infinite number of common factors. Section 5 describes the Monte Carlo design and discusses the results. Finally, Section 6 provides some concluding remarks. The mathematical details are relegated to the appendices.

Notations. $|\lambda_1(\mathbf{A})| \ge |\lambda_2(\mathbf{A})| \ge \cdots \ge |\lambda_n(\mathbf{A})|$ are the eigenvalues of a matrix $\mathbf{A} \in M_{n \times n}$, where $M_{n \times n}$ is the space of $n \times n$ complex-valued matrices. $\mathbf{A}^{+}$ denotes the Moore–Penrose generalized inverse of $\mathbf{A}$. The column norm of $\mathbf{A} \in M_{n \times n}$ is $\|\mathbf{A}\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|$. The row norm of $\mathbf{A}$ is $\|\mathbf{A}\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$. The spectral norm of $\mathbf{A}$ is $\|\mathbf{A}\| = [\lambda_1(\mathbf{A}\mathbf{A}')]^{1/2}$, and $\|\mathbf{A}\|_2 = [\mathrm{Tr}(\mathbf{A}\mathbf{A}')]^{1/2}$. $K$ is used for a fixed positive constant that does not depend on $N$. Joint convergence of $N$ and $T$ will be denoted by $(N, T) \xrightarrow{j} \infty$. For any random variable $x$, $\|x\|_{L_p} = (E|x|^p)^{1/p}$, for $p > 1$, denotes the $L_p$ norm of $x$. For any $k \times 1$ vector of random variables $\mathbf{x}_k = (x_1, x_2, \ldots, x_k)'$, $\|\mathbf{x}_k\|_{L_p} = \left(\sum_{i=1}^{k} E|x_i|^p\right)^{1/p}$. We use $\xrightarrow{L_p}$ to denote convergence in $L_p$ norm.

2. CROSS-SECTION DEPENDENCE IN LARGE PANELS

Consider the double index process $\{z_{it}, i \in \mathbb{N}, t \in \mathbb{Z}\}$, where $z_{it}$ is defined on a suitable probability space, the index $t$ refers to an ordered set such as time, and $i$ refers to units of an unordered population. Our primary focus is on characterizing the correlation structure of the double index process $\{z_{it}\}$ over the cross-sectional dimension at a given point in time, $t$. To this end, we make the following assumptions:




ASSUMPTION 2.1. Let $\mathcal{I}_t$ be the information set available at time $t$. For each $t \in \mathcal{T}$, $\mathbf{z}_{Nt} = (z_{1t}, \ldots, z_{Nt})'$ has the conditional mean $E(\mathbf{z}_{Nt} \mid \mathcal{I}_{t-1}) = \mathbf{0}$, and the conditional variance $\mathrm{Var}(\mathbf{z}_{Nt} \mid \mathcal{I}_{t-1}) = \boldsymbol{\Sigma}_{Nt}$, where $\boldsymbol{\Sigma}_{Nt}$ is an $N \times N$ symmetric, non-negative definite matrix. The $(i, j)$th element of $\boldsymbol{\Sigma}_{Nt}$, denoted by $\sigma_{N,ijt}$, is bounded such that $0 < \sigma_{N,iit} \le K$, for $i = 1, 2, \ldots, N$, where $K$ is a finite constant independent of $N$.

ASSUMPTION 2.2. Let $\mathbf{w}_{Nt} = (w_{N,1t}, \ldots, w_{N,Nt})'$, for $t \in \mathcal{T} \subseteq \mathbb{Z}$ and $N \in \mathbb{N}$, be a vector of non-stochastic weights. For any $t \in \mathcal{T}$, the sequence of weight vectors $\{\mathbf{w}_{Nt}\}$ of growing dimension ($N \to \infty$) satisfies the ‘granularity’ conditions:

$$\|\mathbf{w}_{Nt}\| = O(N^{-1/2}), \tag{2.1}$$

$$\frac{w_{N,jt}}{\|\mathbf{w}_{Nt}\|} = O(N^{-1/2}) \quad \text{for any } j \in \mathbb{N}. \tag{2.2}$$

Zero conditional mean in Assumption 2.1 can be relaxed to $E(\mathbf{z}_{Nt} \mid \mathcal{I}_{t-1}) = \boldsymbol{\mu}_{N,t-1}$, with $\boldsymbol{\mu}_{N,t-1}$ being a pre-determined function of the elements of $\mathcal{I}_{t-1}$. Assumption 2.2, known in finance as the granularity condition, ensures that the weights $\{w_{N,it}\}$ are not dominated by a few of the cross-section units. Although we have assumed the weights to be non-stochastic, this is done for expositional convenience and can be relaxed by requiring that conditional on the information set, $\mathcal{I}_{t-1}$, the weights, $\mathbf{w}_{Nt}$, are distributed independently of $\mathbf{z}_{Nt}$. To simplify the notations in the rest of the paper we suppress the explicit dependence of $\mathbf{z}_{Nt}$, $\mathbf{w}_{Nt}$ and other vectors and matrices and their elements on $N$. In the following, we describe our notions of weak and strong cross-sectionally dependent processes, and then introduce the related concepts of weak, strong and semi-strong factors.

2.1. Weak and strong cross-section dependence

Consider the weighted averages, $\bar{z}_{wt} = \sum_{i=1}^{N} w_{it} z_{it} = \mathbf{w}_t'\mathbf{z}_t$, for $t \in \mathcal{T}$, where $\mathbf{z}_t$ and $\mathbf{w}_t$ satisfy Assumptions 2.1 and 2.2. We are interested in the limiting behaviour of $\bar{z}_{wt}$ at a given point in time $t \in \mathcal{T}$, as $N \to \infty$.

DEFINITION 2.1 (Weak and strong cross-section dependence). The process $\{z_{it}\}$ is said to be cross-sectionally weakly dependent (CWD) at a given point in time $t \in \mathcal{T}$ conditional on the information set $\mathcal{I}_{t-1}$, if for any sequence of weight vectors $\{\mathbf{w}_t\}$ satisfying the granularity conditions (2.1) and (2.2) we have

$$\lim_{N \to \infty} \mathrm{Var}(\mathbf{w}_t'\mathbf{z}_t \mid \mathcal{I}_{t-1}) = 0. \tag{2.3}$$

$\{z_{it}\}$ is said to be cross-sectionally strongly dependent (CSD) at a given point in time $t \in \mathcal{T}$ conditional on the information set $\mathcal{I}_{t-1}$, if there exists a sequence of weight vectors $\{\mathbf{w}_t\}$ satisfying (2.1) and (2.2) and a constant $K$ independent of $N$ such that for any $N$ sufficiently large (and as $N \to \infty$)

$$\mathrm{Var}(\mathbf{w}_t'\mathbf{z}_t \mid \mathcal{I}_{t-1}) \ge K > 0. \tag{2.4}$$

The concepts of weak and strong cross-section dependence proposed here are defined conditional on a given information set, $\mathcal{I}_{t-1}$, which allows us to consider cross-section dependence properties of $\{z_{it}\}$ without having to limit the time series features of the process.
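The distinction in Definition 2.1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: a single-factor covariance matrix with unit factor variance and unit idiosyncratic variance, equal weights $w_i = 1/N$ (which satisfy the granularity conditions), and two hypothetical loading patterns, $\gamma_i = 1$ and $\gamma_i = i^{-2}$. All function names are invented for this example.

```python
import numpy as np

def factor_cov(loadings, idio_var=1.0):
    """Covariance of z_i = gamma_i * f + eps_i with Var(f) = 1 (assumed design)."""
    loadings = np.asarray(loadings, dtype=float)
    return np.outer(loadings, loadings) + idio_var * np.eye(loadings.size)

def var_weighted_average(cov, weights):
    """Var(w'z) = w' Sigma w."""
    weights = np.asarray(weights, dtype=float)
    return float(weights @ cov @ weights)

def equal_weights(n):
    """w_i = 1/N, so ||w|| = N^{-1/2} and w_j / ||w|| = N^{-1/2}."""
    return np.full(n, 1.0 / n)

def summable_loadings(n):
    """Hypothetical loadings gamma_i = i^{-2}, whose absolute sum is bounded in N."""
    return 1.0 / np.arange(1, n + 1, dtype=float) ** 2

for n in (100, 400, 1600):
    w = equal_weights(n)
    v_strong = var_weighted_average(factor_cov(np.ones(n)), w)
    v_weak = var_weighted_average(factor_cov(summable_loadings(n)), w)
    # v_strong stays near 1 while v_weak shrinks roughly like 1/N
    print(n, round(v_strong, 4), round(v_weak, 6))
```

With unit loadings the variance of the average equals $1 + 1/N$ and stays bounded away from zero, as in (2.4); with summable loadings it is of order $1/N$, as in (2.3).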



Various information sets could be considered in practice, depending on the application under consideration. For dynamic (possibly non-stationary) models the information set could contain all lagged realizations of the process $\{z_{it}\}$, that is $\mathcal{I}_{t-1} = \{\mathbf{z}_{t-1}, \mathbf{z}_{t-2}, \ldots\}$, or only the starting values of the process. For stationary panels, unconditional variances of cross-section averages could be considered. The conditioning information set could also contain contemporaneous realizations, which might be useful in applications where a particular unit has a dominant influence on the rest of the units in the system.

REMARK 2.1. Anderson et al. (2009) propose definitions of weak and strong cross-section dependence for covariance stationary processes, with spectral density $\mathbf{F}_z(\omega)$ (see also Forni and Lippi, 2001). According to their definition, $\{z_{it}\}$ is weakly dependent if the largest eigenvalue of the spectral density matrix, $\lambda_1^z(\omega)$, is uniformly bounded in $\omega$ and $N$. $\{z_{it}\}$ is strongly dependent if the first $m \ge 1$ ($m < K$) eigenvalues $(\lambda_1^z(\omega), \ldots, \lambda_m^z(\omega))$ diverge to infinity as $N \to \infty$, for all frequencies. In contrast to the notions of weak and strong dependence advanced by Forni and Lippi (2001) and Anderson et al. (2009), our concepts of CWD and CSD do not require the underlying processes to be covariance stationary and have spectral density at all frequencies.

REMARK 2.2. A particular form of a CWD process arises when pairwise correlations take non-zero values only across finite subsets of units that do not spread widely as sample size increases. A similar case occurs in spatial processes, where for example local dependency exists only among adjacent observations. However, we note that the notion of weak dependence does not necessarily involve an ordering of the observations or the specification of a distance metric across the observations.
The following proposition establishes the relationship between weak cross-section dependence and the asymptotic behaviour of the spectral radius of $\boldsymbol{\Sigma}_t$ (denoted by $\lambda_1(\boldsymbol{\Sigma}_t)$).

PROPOSITION 2.1. The following statements hold:

(a) The process $\{z_{it}\}$ is CWD at a point in time $t \in \mathcal{T}$, if $\lambda_1(\boldsymbol{\Sigma}_t)$ is bounded in $N$.

(b) The process $\{z_{it}\}$ is CSD at a point in time $t \in \mathcal{T}$, if and only if for any $N$ sufficiently large (and as $N \to \infty$), $N^{-1}\lambda_1(\boldsymbol{\Sigma}_t) \ge K > 0$.

Since $\lambda_1(\boldsymbol{\Sigma}_t) \le \|\boldsymbol{\Sigma}_t\|_1$, it follows from (B.1) in the Appendix that both the spectral radius and the column norm of the covariance matrix of a CSD process are unbounded in $N$.² A similar condition also arises in the case of time series processes with long memory or strong temporal dependence where the autocorrelation coefficients are not absolutely summable (Robinson, 2003).

² See Horn and Johnson (1985, pp. 297–98).

REMARK 2.3. The definition of idiosyncratic process by Forni and Lippi (2001) differs from our definition of CWD in terms of the weights used to construct the weighted averages. While Forni and Lippi assume $\lim_{N \to \infty} \|\mathbf{w}\| = 0$, our granularity conditions (2.1) and (2.2) imply that, for any $t \in \mathcal{T}$, $\lim_{N \to \infty} N^{1/2 - \epsilon}\|\mathbf{w}_t\| = 0$ for any $\epsilon > 0$. This difference in the definition of weights has important implications for the cross-sectional properties of the processes. In particular, under $\lim_{N \to \infty} \|\mathbf{w}_t\| = 0$, it is possible to show that the idiosyncratic process (and hence also the definition of weak dependence à la Anderson et al., 2009) implies bounded eigenvalues of the spectral density matrix. Conversely, under (2.1) and (2.2), it is clear that if $\lambda_1(\boldsymbol{\Sigma}_t) = O(N^{1-\epsilon})$ for any $\epsilon > 0$, then, using (2.5),

$$\lim_{N \to \infty} \mathbf{w}_t'\mathbf{w}_t\,\lambda_1(\boldsymbol{\Sigma}_t) = 0,$$

and the underlying process will be CWD. Hence, the bounded eigenvalue condition is sufficient but not necessary for CWD. According to our definition a process could be CWD even if its maximum eigenvalue is rising with N, so long as its rate of increase is bounded appropriately. One rationale for characterizing processes whose largest eigenvalue increases at a slower pace than N as weakly dependent is that boundedness of the eigenvalues is not a necessary condition for consistent estimation in general, although in some cases, such as the method of principal components, this condition is needed. In Section 4, we consider estimation of slope coefficients in panels with an infinite factor structure, where eigenvalues of the error covariance matrix are allowed to increase at a rate slower than N.
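The order-of-magnitude argument behind Remark 2.3 can be written out in one line. The derivation below is a reader's sketch, not the paper's own display: it assumes the standard Rayleigh-quotient bound for a symmetric non-negative definite matrix and uses (2.1) in the form $\mathbf{w}_t'\mathbf{w}_t = \|\mathbf{w}_t\|^2 = O(N^{-1})$.

```latex
\begin{align*}
\mathrm{Var}(\mathbf{w}_t'\mathbf{z}_t \mid \mathcal{I}_{t-1})
  &= \mathbf{w}_t'\boldsymbol{\Sigma}_t\mathbf{w}_t
   \le (\mathbf{w}_t'\mathbf{w}_t)\,\lambda_1(\boldsymbol{\Sigma}_t) \\
  &= O(N^{-1}) \cdot O(N^{1-\epsilon})
   = O(N^{-\epsilon}) \longrightarrow 0
   \quad \text{as } N \to \infty, \text{ for any } \epsilon > 0 .
\end{align*}
```

The same bound shows why a bounded $\lambda_1(\boldsymbol{\Sigma}_t)$ is sufficient for CWD: the right-hand side is then $O(N^{-1})$.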

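3. COMMON FACTOR MODELS

Consider the following N factor model for $\{z_{it}\}$:

$$z_{it} = \gamma_{i1} f_{1t} + \gamma_{i2} f_{2t} + \cdots + \gamma_{iN} f_{Nt} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \tag{3.1}$$

or in matrix notations

$$\mathbf{z}_t = \boldsymbol{\Gamma}\mathbf{f}_t + \boldsymbol{\varepsilon}_t, \tag{3.2}$$

where $\mathbf{f}_t = (f_{1t}, f_{2t}, \ldots, f_{Nt})'$, $\boldsymbol{\varepsilon}_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$, and the common factors, $\mathbf{f}_t$, and the idiosyncratic errors, $\varepsilon_{it}$, satisfy the following assumptions:

ASSUMPTION 3.1. The $N \times 1$ vector $\mathbf{f}_t$ is a zero mean covariance stationary process, with absolute summable autocovariances, distributed independently of $\varepsilon_{it'}$ for all $i, t, t'$, and such that $E(f_{\ell t}^2 \mid \mathcal{I}_{t-1}) = 1$ and $E(f_{\ell t} f_{pt} \mid \mathcal{I}_{t-1}) = 0$, for $\ell \ne p = 1, 2, \ldots, N$.

ASSUMPTION 3.2. $\mathrm{Var}(\varepsilon_{it} \mid \mathcal{I}_{t-1}) = \sigma_i^2 < K < \infty$, $\varepsilon_{it}$ and $\varepsilon_{jt}$ are independently distributed for all $i \ne j$ and for all $t$. Specifically, $\max_i(\sigma_i^2) = \sigma_{\max}^2 < K < \infty$.

The process $z_{it}$ in (3.1) has conditional variance

$$\mathrm{Var}(z_{it} \mid \mathcal{I}_{t-1}) = \mathrm{Var}\left(\sum_{\ell=1}^{N} \gamma_{i\ell} f_{\ell t} \,\Big|\, \mathcal{I}_{t-1}\right) + \mathrm{Var}(\varepsilon_{it} \mid \mathcal{I}_{t-1}) = \sum_{\ell=1}^{N} \gamma_{i\ell}^2 + \sigma_i^2.$$

For the conditional variance of $z_{it}$ to be bounded in $N$, as required by Assumption 2.1, we must have

$$\sum_{\ell=1}^{N} \gamma_{i\ell}^2 \le K < \infty, \quad \text{for } i = 1, 2, \ldots, N. \tag{3.3}$$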



In what follows we also consider the slightly stronger absolute summability condition

$$\sum_{\ell=1}^{N} |\gamma_{i\ell}| \le K < \infty, \quad \text{for } i = 1, 2, \ldots, N. \tag{3.4}$$

DEFINITION 3.1 (Strong and weak factors). The factor $f_{\ell t}$ is said to be strong if

$$\lim_{N \to \infty} N^{-1} \sum_{i=1}^{N} |\gamma_{i\ell}| = K > 0. \tag{3.5}$$

The factor $f_{\ell t}$ is said to be weak if

$$\lim_{N \to \infty} \sum_{i=1}^{N} |\gamma_{i\ell}| = K < \infty. \tag{3.6}$$

The literature on large factor models has focused on the case where the factors are strong. The case of weak factors has recently been considered by Onatski (2009). It is also possible to consider semi-strong or semi-weak factors. In general, let $\alpha$ be a constant in the range $0 \le \alpha \le 1$ and consider the condition

$$\lim_{N \to \infty} N^{-\alpha} \sum_{i=1}^{N} |\gamma_{i\ell}| = K < \infty. \tag{3.7}$$

The strong and weak factors correspond to the two values $\alpha = 1$ and $\alpha = 0$, respectively. For any other value of $\alpha \in (0, 1)$, the factor $f_{\ell t}$ can be said to be semi-strong or semi-weak. It will prove useful to associate the semi-weak factors with values of $0 < \alpha < 1/2$, and the semi-strong factors with values of $1/2 \le \alpha < 1$. In Section 4, we provide some practical examples where such semi-strong factors may exist. The relationship between the notions of CSD and CWD and the definitions of weak and strong factors is explored in the following theorem.

THEOREM 3.1. Consider the factor model (3.2) and suppose that Assumptions 2.1–3.2 and the absolute summability condition (3.4) hold, and there exists a constant $\alpha$ in the range $0 \le \alpha \le 1$, such that condition (3.7) holds for any $\ell = 1, 2, \ldots, N$. Then the following statements hold:

(a) The process $\{z_{it}\}$ is cross-sectionally weakly dependent at a given point in time $t \in \mathcal{T}$ if $\alpha < 1$, which includes cases of weak, semi-weak or semi-strong factors $f_{\ell t}$, for $\ell = 1, 2, \ldots, N$.

(b) The process $\{z_{it}\}$ is cross-sectionally strongly dependent at a given point in time $t \in \mathcal{T}$ if and only if there exists at least one strong factor.
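Condition (3.7) can be checked numerically for a given loading pattern. The sketch below assumes hypothetical loadings $\gamma_i = i^{\alpha - 1}$ for $0 < \alpha \le 1$, chosen only because their partial sums grow like $N^{\alpha}/\alpha$; the function names are illustrative and not part of the paper.

```python
import numpy as np

def power_loadings(n, alpha):
    """Hypothetical loadings gamma_i = i^(alpha - 1), for 0 < alpha <= 1."""
    i = np.arange(1, n + 1, dtype=float)
    return i ** (alpha - 1.0)

def scaled_loading_sum(loadings, alpha):
    """N^{-alpha} * sum_i |gamma_i|, the quantity whose limit appears in (3.7)."""
    n = len(loadings)
    return float(np.abs(loadings).sum() / n ** alpha)

for n in (10**3, 10**4, 10**5, 10**6):
    g = power_loadings(n, 0.5)
    # Scaled by N^{-1/2} the sum settles down; scaled by N^{-1} it vanishes,
    # so this factor is semi-strong rather than strong.
    print(n, scaled_loading_sum(g, 0.5), scaled_loading_sum(g, 1.0))
```

For $\alpha = 1/2$ the first column approaches $1/\alpha = 2$, while the second tends to zero, so the factor satisfies (3.7) with $\alpha = 1/2$ but not the strong-factor condition (3.5).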

Under (3.5) and (3.6), $z_{it}$ can be decomposed as

$$z_{it} = z_{it}^{s} + z_{it}^{w}, \tag{3.8}$$

where

$$z_{it}^{s} = \sum_{\ell=1}^{m} \gamma_{i\ell} f_{\ell t}; \qquad z_{it}^{w} = \sum_{\ell=m+1}^{N} \gamma_{i\ell} f_{\ell t} + \varepsilon_{it}, \tag{3.9}$$

and $\gamma_{i\ell}$ satisfy conditions (3.5) for $\ell = 1, \ldots, m$, and (3.6) for $\ell = m + 1, \ldots, N$. In the light of Theorem 3.1, it follows that $z_{it}^{s}$ is CSD and $z_{it}^{w}$ is CWD. Also, notice that when $m = 0$, we have a model with no strong factors and potentially an infinite number of weak factors.

REMARK 3.1. Consider the following general spatial process:

$$\mathbf{z}_t = \mathbf{R}\mathbf{v}_t, \tag{3.10}$$

where $\mathbf{R}$ is an $N \times N$ matrix and $\mathbf{v}_t$ is an $N \times 1$ vector of independently distributed random variables. Pesaran and Tosetti (2010) have shown that spatial processes commonly used in the empirical literature, such as the spatial autoregressive (SAR) process, or the spatial moving average (SMA), can be written as special cases of (3.10). Specifically, for an SMA process $\mathbf{R} = \mathbf{I}_N + \delta\mathbf{S}$, where $\delta$ is a scalar parameter ($|\delta| < K$) and $\mathbf{S}$ is an $N \times N$ non-negative matrix that expresses the ordering or network linkages across the units, while in the case of an invertible SAR process, we have $\mathbf{R} = (\mathbf{I}_N - \delta\mathbf{S})^{-1}$. Standard spatial literature assumes that $\mathbf{R}$ has bounded column and row norms. It is easy to see that under these conditions the above process can be represented by a factor process with an infinite number of weak factors (i.e. with $m = 0$), and no idiosyncratic error (i.e. $\varepsilon_{it} = 0$). For example, set $z_{it} = \sum_{\ell=1}^{N} \gamma_{i\ell} f_{\ell t}$, where $\gamma_{i\ell} = r_{i\ell}$ and $f_{\ell t} = v_{\ell t}$, for $i, \ell = 1, \ldots, N$. Under the bounded column and row norms of $\mathbf{R}$, the loadings in the above factor structure satisfy (3.6), and hence $z_{it}$ will be a CWD process.

REMARK 3.2. Consistent estimation of factor models with weak or semi-strong factors may be problematic. To see this, consider the following single factor model with known factor loadings:

$$z_{it} = \gamma_i f_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim IID(0, \sigma^2).$$

The least squares estimator of $f_t$, which is the best linear unbiased estimator, is given by

$$\hat{f}_t = \frac{\sum_{i=1}^{N} \gamma_i z_{it}}{\sum_{i=1}^{N} \gamma_i^2}, \qquad \mathrm{Var}(\hat{f}_t) = \frac{\sigma^2}{\sum_{i=1}^{N} \gamma_i^2}.$$

If, for example, $\sum_{i=1}^{N} \gamma_i^2$ is bounded, as in the case of weak factors, then $\mathrm{Var}(\hat{f}_t)$ does not vanish as $N \to \infty$, for each $t$. See also Onatski (2009).
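The point of Remark 3.2 can be seen by evaluating $\sigma^2 / \sum_i \gamma_i^2$ for growing $N$. The sketch below is a minimal illustration under assumed loading patterns, $\gamma_i = 1$ for a strong factor and $\gamma_i = i^{-2}$ for a weak one; these patterns and the function names are hypothetical choices for the example.

```python
import numpy as np

def weak_loadings(n):
    """Hypothetical weak-factor loadings gamma_i = i^{-2} (absolutely summable)."""
    return 1.0 / np.arange(1, n + 1, dtype=float) ** 2

def ls_factor_estimate(z, loadings):
    """Least squares estimate of f_t from one cross-section z with known loadings."""
    return float(loadings @ z / np.sum(loadings ** 2))

def ls_factor_variance(loadings, sigma2=1.0):
    """Var(f_hat_t) = sigma^2 / sum_i gamma_i^2."""
    return sigma2 / float(np.sum(loadings ** 2))

for n in (10, 100, 1000, 10000):
    v_strong = ls_factor_variance(np.ones(n))       # equals 1/N, vanishes
    v_weak = ls_factor_variance(weak_loadings(n))   # levels off at 90/pi^4
    print(n, v_strong, v_weak)
```

With unit loadings the variance is $\sigma^2/N$, whereas with $\gamma_i = i^{-2}$ the sum $\sum_i \gamma_i^2$ converges to $\pi^4/90$, so the variance stays near $0.92\,\sigma^2$ however large $N$ becomes.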

4. CCE ESTIMATION OF PANEL DATA MODELS WITH AN INFINITE NUMBER OF FACTORS

In this section, we focus on consistent estimation of slopes in panel regression models where the error terms have an infinite order factor structure. Let $y_{it}$ be the observation on the $i$th cross-section unit at time $t$, for $i = 1, 2, \ldots, N$, and $t = 1, 2, \ldots, T$, and suppose that it is generated as

$$y_{it} = \boldsymbol{\alpha}_i'\mathbf{d}_t + \boldsymbol{\beta}_i'\mathbf{x}_{it} + e_{it}, \tag{4.1}$$

where $\mathbf{d}_t = (d_{1t}, d_{2t}, \ldots, d_{m_d t})'$ is an $m_d \times 1$ vector of observed common effects, and $\mathbf{x}_{it}$ is a $k \times 1$ vector of observed individual specific regressors. The parameters of interest are the means of the individual slope coefficients, $\boldsymbol{\beta} = E(\boldsymbol{\beta}_i)$.³ The error term, $e_{it}$, is given by the following general factor structure:

$$e_{it} = \sum_{\ell=1}^{m_f} \gamma_{i\ell} f_{\ell t} + \sum_{\ell=1}^{m_n} \lambda_{i\ell} n_{\ell t} + \varepsilon_{it}, \tag{4.2}$$

where we have distinguished between two types of unobserved common factors, $\mathbf{f}_t = (f_{1t}, f_{2t}, \ldots, f_{m_f t})'$ and $\mathbf{n}_t = (n_{1t}, n_{2t}, \ldots, n_{m_n t})'$. The former are strong factors that are possibly correlated with the regressors $\mathbf{x}_{it}$, while the latter are the weak, semi-weak or semi-strong factors that are assumed to be uncorrelated with the regressors. The associated vectors of factor loadings will be denoted by $\boldsymbol{\gamma}_i = (\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{i m_f})'$ and $\boldsymbol{\lambda}_i = (\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{i m_n})'$, respectively. The cross-section dependence of errors is modelled using the unobserved common factors, $\mathbf{f}_t$ and $\mathbf{n}_t$, and without loss of generality it is assumed that the idiosyncratic errors, $\varepsilon_{it}$, are cross-sectionally uncorrelated (although they can be serially correlated). To model the correlation between the individual specific regressors, $\mathbf{x}_{it}$, and the innovations $e_{it}$, we suppose that $\mathbf{x}_{it}$ can be correlated with any of the strong factors, $\mathbf{f}_t$,

$$\mathbf{x}_{it} = \mathbf{A}_i'\mathbf{d}_t + \boldsymbol{\Gamma}_i'\mathbf{f}_t + \mathbf{v}_{it}, \tag{4.3}$$

where $\mathbf{A}_i$ and $\boldsymbol{\Gamma}_i$ are $m_d \times k$ and $m_f \times k$ factor loading matrices, and $\mathbf{v}_{it}$ is the individual component of $\mathbf{x}_{it}$, assumed to be distributed independently of the innovations $e_{it}$.

Similar panel data models have been analysed by Pesaran (2006), Kapetanios et al. (2010), and Pesaran and Tosetti (2010). Pesaran (2006) introduced CCE estimators in a panel model where $m_f$ is fixed, $m_n = 0$, and $\boldsymbol{\gamma}_i'\mathbf{f}_t$ represents a strong factor structure. Contrary to what Bai (2009, p. 1231) suggests, CCE estimators are valid even in the rank deficient case where $m_f$ could be larger than $k + 1$. Kapetanios et al. (2010) extended the results of Pesaran (2006) by allowing unobserved common factors to follow unit root processes. In both papers, innovations $\{\varepsilon_{it}\}$ are assumed to be cross-sectionally independent although possibly serially correlated. This assumption is relaxed by Pesaran and Tosetti (2010), who assume that $\{\varepsilon_{it}\}$ is a weakly dependent process with bounded row and column norms of its variance matrix, which includes spatial MA or AR processes considered in the literature as special cases.

In this paper, we focus explicitly on cross-correlations modelled by general factor structures: weak, strong or somewhere in between. Our analysis is thus an extension of Pesaran (2006) to the case where there are an infinite number of factors, a fixed number of which are strong and the rest are either weak, semi-weak or semi-strong factors. The special case where both $m_f$ and $m_n$ are fixed has already been analysed in the above cited papers. The case where $f_{1t}, f_{2t}, \ldots, f_{m_f t}$ are strong and $m_f = m_f(N) \to \infty$ as $N \to \infty$ is not that meaningful, as it will lead to unbounded variances as $N \to \infty$. However, it would be possible to let the number of non-strong factors rise with $N$, while keeping the number of strong factors fixed. We show below that the CCE-type estimators continue to be consistent and asymptotically normal under these types of infinite-factor error structures. We use the notation $m_n(N)$ to emphasize the dependence of the number of non-strong factors on $N$ in the remainder of this paper.

³ We assume that individual slope coefficients are drawn from a common distribution with mean $\boldsymbol{\beta}$.




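Equations (4.1) and (4.3) can be written more compactly as

$$z_{it} = \begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix} = B_i' d_t + C_i' f_t + u_{it}, \qquad (4.4)$$

where

$$B_i = (\alpha_i, A_i)\, D_i, \quad C_i = (\gamma_i, \Gamma_i)\, D_i, \quad D_i = \begin{pmatrix} 1 & 0_{1 \times k} \\ \beta_i & I_k \end{pmatrix}, \quad u_{it} = \begin{pmatrix} \lambda_i' n_t + \varepsilon_{it} + \beta_i' v_{it} \\ v_{it} \end{pmatrix}. \qquad (4.5)$$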

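Stacking the $T$ observations for each $i$ we also have

$$y_i = D \alpha_i + X_i \beta_i + e_i, \qquad X_i = G \Pi_i + v_i, \qquad (4.6)$$

$$Z_i = D B_i + F C_i + U_i,$$

where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $D = (d_1, d_2, \ldots, d_T)'$, $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, $G = (D, F)$, $F = (f_1, f_2, \ldots, f_T)'$, $v_i = (v_{i1}, v_{i2}, \ldots, v_{iT})'$, $Z_i = (z_{i1}, z_{i2}, \ldots, z_{iT})'$, $U_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ and $\Pi_i = (A_i', \Gamma_i')'$.

For the development of the CCE estimators we need the cross-section averages of the individual specific variables $z_{it} = (y_{it}, x_{it}')'$, which we denote by $\bar{z}_{wt} = \sum_{i=1}^{N} w_i z_{it}$, where $w = (w_1, w_2, \ldots, w_N)'$ is any vector of weights that satisfy the granularity conditions (2.1) and (2.2). Further, let

$$\bar{M}_w = I_T - \bar{H}_w (\bar{H}_w' \bar{H}_w)^{+} \bar{H}_w', \quad \bar{H}_w = (D, \bar{Z}_w), \quad \bar{Z}_w = (\bar{z}_{w1}, \bar{z}_{w2}, \ldots, \bar{z}_{wT})',$$

$$M_q = I_T - Q (Q'Q)^{+} Q', \quad Q = G \bar{P}_w,$$

$$\underset{(m_d + m_f) \times (m_d + k + 1)}{\bar{P}_w} = \begin{pmatrix} I_{m_d} & \bar{B}_w \\ 0_{m_f \times m_d} & \bar{C}_w \end{pmatrix}, \qquad (4.7)$$

$$\underset{m_d \times (k+1)}{\bar{B}_w} = \sum_{i=1}^{N} w_i B_i \quad \text{and} \quad \underset{m_f \times (k+1)}{\bar{C}_w} = \sum_{i=1}^{N} w_i C_i. \qquad (4.8)$$

Also, define the matrices associated with $M_q$ and $\bar{P}_w$ as $M_g = I_T - G (G'G)^{-1} G'$ and

$$P = \begin{pmatrix} I_{m_d} & B \\ 0_{m_f \times m_d} & C \end{pmatrix}, \qquad (4.9)$$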

where $B = E(B_i)$ and $C = E(C_i)$. As we shall see below, the asymptotic theory of the CCE-type estimators depends on the rank of $\bar{C}_w$ both for a finite $N$, and as $N \to \infty$. We make the following assumptions on the unobserved common factors $f_t$ and $n_t$ and their loadings.

ASSUMPTION 4.1 (Common factors). The $(m_d + m_f) \times 1$ vector $g_t = (d_t', f_t')'$ is a covariance stationary process, with absolute summable autocovariances and finite second-order moments. In particular, $\|\Sigma_g\| < K$ for some constant $K$, where $\Sigma_g = E(g_t g_t')$ is a positive definite matrix. For each $\ell = 1, 2, \ldots, m_n(N)$, common factor $n_{\ell t}$ follows a covariance stationary process with absolute summable autocovariances, zero mean, unit variance and finite fourth-order moment uniformly bounded in $\ell$. $n_{\ell t}$ is distributed independently of $g_{t'}$ and of $n_{\ell' t'}$ for all $\ell \neq \ell'$ and all $t$ and $t'$.

ASSUMPTION 4.2 (Factor loadings). (a) Factor loadings $\gamma_i$ and $\Gamma_i$ are independently and identically distributed across $i$, and of the common factors $g_t$, $n_t$, for all $i$ and $t$, with fixed mean $\gamma$ and $\Gamma$, and uniformly bounded second moments. In particular,

$$\gamma_i = \gamma + \eta_{\gamma i}, \quad \eta_{\gamma i} \sim IID(0, \Omega_\gamma), \quad \text{for } i = 1, 2, \ldots, N,$$

and

$$\mathrm{vec}(\Gamma_i) = \mathrm{vec}(\Gamma) + \eta_{\Gamma i}, \quad \eta_{\Gamma i} \sim IID(0, \Omega_\Gamma), \quad \text{for } i = 1, 2, \ldots, N,$$

where $\Omega_\gamma$ and $\Omega_\Gamma$ are $m_f \times m_f$ and $k m_f \times k m_f$ symmetric non-negative definite matrices, $\|\gamma\| < K$, $\|\Omega_\gamma\| < K$, $\|\Gamma\| < K$ and $\|\Omega_\Gamma\| < K$ for some constant $K$. (b) Factor loadings $\lambda_{i\ell}$, for $i = 1, 2, \ldots, N$ and $\ell = 1, 2, \ldots, m_n(N)$, are non-stochastic. For each $i = 1, 2, \ldots, N$, the factor loadings, $\lambda_{i\ell}$, satisfy the following absolute summability condition:

$$\sum_{\ell=1}^{m_n(N)} |\lambda_{i\ell}| < K. \qquad (4.10)$$

REMARK 4.1. The absolute summability condition (4.10) is sufficient for ensuring bounded variances of $\vartheta_{it} = \lambda_i' n_t = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$ for each $i = 1, 2, \ldots, N$, as $m_n(N) \to \infty$. This condition alone does not, however, rule out strong, semi-strong or semi-weak factor structures. Additional requirements on the sum of absolute values of the loadings $\lambda_{i\ell}$ across $i$ will be postulated in the theorems below.

The following assumptions are similar to Pesaran (2006).

ASSUMPTION 4.3. The individual-specific errors $\varepsilon_{it}$ and $v_{jt}$ are independently distributed across $i$, independently distributed of the common factors $g_t$, $n_t$ and of the factor loadings $\gamma_j$, $\Gamma_j$, for each $i$, $j$ and each $t$. $v_{it}$, for $i = 1, 2, \ldots, N$, follow linear stationary processes with absolute summable autocovariances, zero mean and finite second-order moments uniformly bounded in $i$. For each $i$, $E(v_{it} v_{it}') = \Sigma_{vi}$, where $\Sigma_{vi}$ is a positive definite matrix, such that $\sup_i \|\Sigma_{vi}\| < K$, for some positive constant $K$. Errors $\varepsilon_{it}$, for $i = 1, 2, \ldots, N$, follow a linear stationary process with absolute summable autocovariances, zero mean and finite second-order moments uniformly bounded in $i$.

ASSUMPTION 4.4. Coefficient matrices $B_i$ are independently and identically distributed across $i$, independently distributed of the common factors $g_t$ and $n_t$, of the factor loadings $\gamma_j$ and $\Gamma_j$, and of the errors $\varepsilon_{jt}$ and $v_{jt}$, for all $i$, $j$ and $t$, with fixed mean $B$, and uniformly bounded second moments.


ASSUMPTION 4.5. The slope coefficients follow the random coefficient model

$$\beta_i = \beta + \upsilon_i, \quad \upsilon_i \sim IID(0, \Omega_\beta), \quad \text{for } i = 1, 2, \ldots, N,$$

where $\|\beta\| < K$, $\|\Omega_\beta\| < K$, $\Omega_\beta$ is a symmetric non-negative definite matrix, and the random deviations $\upsilon_i$ are distributed independently of the common factors $g_t$ and $n_t$, of the factor loadings $\gamma_j$ and $\Gamma_j$, of the errors $\varepsilon_{jt}$ and $v_{jt}$, and of the coefficients in $\alpha_j$ and $A_j$ for all $i$, $j$ and $t$.

ASSUMPTION 4.6. (a) The matrix $\lim_{N \to \infty} \sum_{i=1}^{N} w_i \Sigma_{iq} = \Psi^*$ exists and is non-singular, and $\sup_i \|\Sigma_{iq}^{-1}\| < K$, where $\Sigma_{iq} = \Sigma_{vi} + \Pi_i^{*\prime} \Sigma_g \Pi_i^*$, and $\Pi_i^* = [I - P (P'P)^{+} P'] \Pi_i$. (b) Denote the $t$th row of matrix $\tilde{X}_i = M_q X_i$ by $\tilde{x}_{it}' = (\tilde{x}_{i1t}, \tilde{x}_{i2t}, \ldots, \tilde{x}_{ikt})$. Individual elements of the vector $\tilde{x}_{it}$ have uniformly bounded fourth moments, namely there exists a positive constant $K$ such that $E(\tilde{x}_{ist}^4) < K$ for any $t = 1, 2, \ldots, T$, $i = 1, 2, \ldots, N$ and $s = 1, 2, \ldots, k$. Furthermore, fourth moments of $f_{\ell t}$, for $\ell = 1, 2, \ldots, m_f$, are bounded. (c) There exists $T_0$ such that for all $T \geq T_0$, $\left( \sum_{i=1}^{N} w_i X_i' \bar{M}_w X_i / T \right)^{-1}$ exists. (d) There exist $T_0$ and $N_0$ such that for all $T \geq T_0$ and $N \geq N_0$, the $k \times k$ matrices $(X_i' \bar{M}_w X_i / T)^{-1}$ and $(X_i' M_g X_i / T)^{-1}$ exist for all $i$.

The CCE approach is motivated by the fact that, to estimate $\beta$, one does not necessarily need to compute consistent estimates of the unobservable common factors. It is sufficient to account for their effects by including cross-section averages of the observables in the regressions, since such cross-section averages indirectly reflect the overall importance of the factors for the estimation of $\beta$. Two types of CCE estimators are considered. The first is the common correlated effects mean group estimator (CCEMG), which is given by

$$\hat{\beta}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i, \qquad (4.11)$$

where $\hat{\beta}_i = (X_i' \bar{M}_w X_i)^{-1} X_i' \bar{M}_w y_i$. The second is the common correlated effects pooled (CCEP) estimator, which is defined by

$$\hat{\beta}_P = \left( \sum_{i=1}^{N} w_i X_i' \bar{M}_w X_i \right)^{-1} \sum_{i=1}^{N} w_i X_i' \bar{M}_w y_i. \qquad (4.12)$$
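The two estimators in (4.11) and (4.12) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `cce_estimators`, the array layout (`y` of shape N x T, `X` of shape N x T x k) and the default of equal weights $w_i = 1/N$ are assumptions made here, not part of the paper. The Moore-Penrose inverse is used for $(\bar{H}_w' \bar{H}_w)^{+}$, as in the definition of $\bar{M}_w$.

```python
import numpy as np

def cce_estimators(y, X, D=None, w=None):
    """Sketch of the CCEMG (4.11) and CCEP (4.12) estimators.

    y : (N, T) array, X : (N, T, k) array,
    D : (T, m_d) observed common effects or None,
    w : (N,) weights; equal weights 1/N are assumed if omitted.
    Returns (beta_mg, beta_p, individual_slopes).
    """
    N, T, k = X.shape
    w = np.full(N, 1.0 / N) if w is None else np.asarray(w, dtype=float)

    # Cross-section averages z_bar_wt = sum_i w_i (y_it, x_it')'
    Z = np.concatenate([y[:, :, None], X], axis=2)      # (N, T, k+1)
    Z_bar = np.einsum("i,itj->tj", w, Z)                # (T, k+1)

    # H_w = (D, Z_bar_w) and M_w = I - H (H'H)^+ H'
    H = Z_bar if D is None else np.hstack([D, Z_bar])
    M = np.eye(T) - H @ np.linalg.pinv(H.T @ H) @ H.T

    slopes = np.empty((N, k))
    pooled_xx = np.zeros((k, k))
    pooled_xy = np.zeros(k)
    for i in range(N):
        xmx = X[i].T @ M @ X[i]
        xmy = X[i].T @ M @ y[i]
        slopes[i] = np.linalg.solve(xmx, xmy)           # individual beta_i
        pooled_xx += w[i] * xmx
        pooled_xy += w[i] * xmy

    beta_mg = slopes.mean(axis=0)                       # (4.11)
    beta_p = np.linalg.solve(pooled_xx, pooled_xy)      # (4.12)
    return beta_mg, beta_p, slopes
```

Note that no estimate of the factors themselves is ever formed: the factors enter only through the projection onto the observed common effects and the cross-section averages.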

The following theorem establishes consistency of the CCE estimators in the case of panels with (possibly) an infinite number of factors.

THEOREM 4.1 (Consistency of CCE estimators). Consider the panel data models (4.1) and (4.3), and suppose that Assumptions 4.1–4.6 hold, and there exist constants $\alpha$ and $K$ such that $0 \leq \alpha < 1$,

$$\sum_{i=1}^{N} |\lambda_{i\ell}| < K N^{\alpha} \quad \text{for each } \ell = 1, 2, \ldots, m_n(N), \qquad (4.13)$$

and

$$\lim_{N \to \infty} \frac{m_n(N)}{N^{2(1-\alpha)}} \to 0. \qquad (4.14)$$

Then the common correlated effects mean group and pooled estimators, defined by (4.11) and (4.12), respectively, are consistent, that is, as $(N, T) \overset{j}{\to} \infty$ we have

$$\hat{\beta}_{MG} - \beta \overset{p}{\to} 0, \qquad (4.15)$$

and

$$\hat{\beta}_P - \beta \overset{p}{\to} 0. \qquad (4.16)$$

The assumptions of Theorem 4.1 rule out the case where $\vartheta_{it} = \lambda_i' n_t = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$ is a strong factor structure, but allow for the possibility of semi-strong ($1/2 \leq \alpha < 1$), semi-weak ($0 < \alpha < 1/2$) or weak factors ($\alpha = 0$) so long as the number of factors $m_n(N)$ is appropriately bounded. The sufficient bound for $m_n(N)$ is given by condition (4.14). Note that conditions (4.13) and (4.14) and $0 \leq \alpha < 1$ ensure that $\mathrm{Var}(\bar{\vartheta}_{wt}) \to 0$, as $N \to \infty$, and therefore $\vartheta_{it}$ is CWD. The following theorem establishes the asymptotic distribution of the CCE estimators in the case of weak ($\alpha = 0$) and semi-weak ($0 < \alpha < 1/2$) infinite factor structures.

THEOREM 4.2 (Distribution of CCE estimators). Consider the panel data models (4.1) and (4.3), and suppose that Assumptions 4.1–4.6 hold, and there exist constants $\alpha$ and $K$ such that $0 \leq \alpha < 1/2$,

$$\sum_{i=1}^{N} |\lambda_{i\ell}| < K N^{\alpha} \quad \text{for each } \ell = 1, 2, \ldots, m_n(N), \qquad (4.17)$$

and

$$m_n(N) < K N^{1 - 2\alpha}. \qquad (4.18)$$

Then, as $(N, T) \overset{j}{\to} \infty$,

$$\sqrt{N} (\hat{\beta}_{MG} - \beta) \overset{d}{\to} N(0, \Sigma_{MG}), \qquad (4.19)$$

where $\hat{\beta}_{MG}$ is given by (4.11), and $\Sigma_{MG}$ is given by equation (B.30) in the Appendix. Furthermore,

$$\left( \sum_{i=1}^{N} w_i^2 \right)^{-1/2} (\hat{\beta}_P - \beta) \overset{d}{\to} N(0, \Sigma_P), \qquad (4.20)$$

where $\hat{\beta}_P$ is given by (4.12), and $\Sigma_P$ is given by equation (B.24) in the Appendix.

REMARK 4.2. Following Pesaran (2006), it is also possible to provide semi-parametric estimators of the variances of $\hat{\beta}_{MG}$ and $\hat{\beta}_P$. Consistent estimators of $\Sigma_{MG}$ and $\Sigma_P$ are given by equations (58) and (69) of Pesaran (2006), respectively.

REMARK 4.3. As was mentioned earlier, CCE estimators are valid irrespective of whether $\bar{C}_w$ defined by (4.8) has full column rank or is rank deficient, and therefore $m_f$, the number of factors in $f_t$, could be larger than $k + 1$. If the assumption of full column rank of $\bar{C}_w$ (for any $N \in \mathbb{N}$, as well as $N \to \infty$) is satisfied, then Assumption 4.2(a) on factor loadings and Assumption 4.4 on coefficient matrices could be relaxed. In particular, it would be sufficient to assume that the factor loadings $\gamma_i$ and $\Gamma_i$ and the coefficients $\alpha_i$ and $A_i$ are non-stochastic and uniformly bounded.

The current factor literature assumes that the eigenvalues of the spectral density matrix of the underlying double-indexed processes either rise with $N$ at the rate $N$ or are bounded in $N$, while they are not allowed to rise at any rate slower than $N$. As the sources of cross-section dependence are generally unknown (factors are latent and in general not identified), such assumptions seem to have been adopted for technical convenience rather than on grounds of their empirical validity. However, in several empirical applications it seems reasonable to consider cases where the eigenvalues of the spectral density rise at a rate slower than $N$. Semi-strong factors may exist if there is a cross-section unit or an unobserved common factor that affects not all units but only a subset of them, expanding at a rate slower than $N$. One can think of an unobserved common shock that hits only a subset of the population; for example, a new law that affects only large firms. As the number of firms, $N$, increases, one reasonable assumption is that the number of large firms increases at a rate slower than $N$. Similarly, the performance of a medium-sized firm may have an impact only on a subset of firms in the market. If we assume that the range of influence of this firm is proportional to its dimension, then as $N$ increases, the subset of units that is affected by it expands at a rate slower than $N$.
We observe that practical difficulties encountered when estimating the number of factors in large data sets could be related to the presence of semi-strong factors, as existing techniques for determining the number of factors assume that there are no semi-weak (or semi-strong) factors and that all factors under consideration are either weak or strong.
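A small numerical illustration of the "shock that hits only a subset" argument: suppose a single factor loads, with unit loading, on only the first $N^{\alpha}$ units, and that the factor and the idiosyncratic errors have unit variance. These specifics (unit loadings, unit variances, a static covariance matrix rather than a spectral density, and the function name `largest_eigenvalue`) are simplifying assumptions made here for illustration. Under them the largest eigenvalue of the error covariance matrix equals $N^{\alpha} + 1$, so it rises with $N$ but at a rate slower than $N$ whenever $\alpha < 1$.

```python
import numpy as np

def largest_eigenvalue(N, alpha):
    """Largest eigenvalue of Cov(u_t) for u_it = lambda_i f_t + eps_it,
    with Var(f_t) = Var(eps_it) = 1 and the factor loading on only the
    first round(N**alpha) units (illustrative assumption)."""
    lam = np.zeros(N)
    lam[: int(round(N ** alpha))] = 1.0
    cov = np.outer(lam, lam) + np.eye(N)   # lambda lambda' + I_N
    return np.linalg.eigvalsh(cov)[-1]     # eigvalsh sorts ascending

for alpha in (0.0, 0.5, 1.0):
    print(alpha, [round(largest_eigenvalue(N, alpha), 2) for N in (25, 100, 400)])
```

With $\alpha = 1$ the eigenvalue grows like $N$ (the strong case), with $\alpha = 0$ it stays bounded (the weak case), and intermediate values of $\alpha$ give the in-between behaviour discussed above.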

5. MONTE CARLO EXPERIMENTS

We consider the following data-generating process:

$$y_{it} = \alpha_i d_{1t} + \beta_{i1} x_{i1t} + \beta_{i2} x_{i2t} + u_{it}, \qquad (5.1)$$

for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. We assume heterogeneous slopes, and set $\beta_{ij} = \beta_j + \eta_{ij}$, with $\eta_{ij} \sim IIDN(1, 0.04)$, for $i = 1, 2, \ldots, N$ and $j = 1, 2$, varying across replications. The errors, $u_{it}$, are generated as

$$u_{it} = \sum_{\ell=1}^{3} \gamma_{i\ell} f_{\ell t} + \sum_{\ell=1}^{m_n} \lambda_{i\ell} n_{\ell t} + \varepsilon_{it},$$

where $\varepsilon_{it} \sim N(0, \sigma_i^2)$, $\sigma_i^2 \sim IIDU(0.5, 1.5)$, for $i = 1, 2, \ldots, N$ (the MC results will be robust to serial correlation in $\varepsilon_{it}$), and the unobserved common factors are generated as independent AR(1) processes with unit variance:

$$f_{\ell t} = 0.5 f_{\ell, t-1} + v_{f \ell t}, \quad \ell = 1, 2, 3; \quad t = -49, \ldots, 0, 1, \ldots, T,$$

$$v_{f \ell t} \sim IIDN(0, 1 - 0.5^2), \quad f_{\ell, -50} = 0,$$

$$n_{\ell t} = 0.5 n_{\ell, t-1} + v_{n \ell t}, \quad \ell = 1, \ldots, m_n; \quad t = -49, \ldots, 0, 1, \ldots, T,$$

$$v_{n \ell t} \sim IIDN(0, 1 - 0.5^2), \quad n_{\ell, -50} = 0.$$




The first three factors will be assumed to be strong, in the sense that the sum of the absolute values of their loadings is unbounded in $N$, and are generated as

$$\gamma_{i\ell} \sim IIDU(0, 1), \quad \text{for } i = 1, \ldots, N, \; \ell = 1, 2, 3.$$

The following two cases are considered for the remaining $m_n$ factors $n_{\ell t}$:

Experiment A. $\{n_{\ell t}\}$ are weak, with their loadings given by

$$\lambda_{i\ell} = \frac{\eta_{i\ell}}{\sum_{i=1}^{N} \eta_{i\ell}^2}, \quad \eta_{i\ell} \sim IIDU(0, 1), \quad \text{for } \ell = 1, \ldots, m_n \text{ and } i = 1, 2, \ldots, N.$$

It is easily seen that for each $\ell$, $\sum_{i=1}^{N} |\lambda_{i\ell}| = O(1)$ and for each $i$, $\sum_{\ell=1}^{m_n} \lambda_{i\ell}^2 = O(m_n / N^2)$. Therefore, asymptotically as $N \to \infty$, the $R_i^2$ is only affected by the strong factors, even if $m_n \to \infty$.

Experiment B. As an intermediate case we shall also consider semi-strong factors where the loadings are generated by

$$\lambda_{i\ell} = \frac{\eta_{i\ell}}{\sqrt{3 \sum_{i=1}^{N} \eta_{i\ell}^2}}, \quad \text{for } \ell = 1, \ldots, m_n \text{ and } i = 1, 2, \ldots, N.$$

In this case, for each $\ell$, $\sum_{i=1}^{N} |\lambda_{i\ell}| = O(N^{1/2})$, and for each $i$, $\sum_{\ell=1}^{m_n} \lambda_{i\ell}^2 = O(m_n / N)$, and the signal-to-noise ratio of the regressions deteriorates as $m_n$ is increased for any given $N$. In Section 5.1, we will investigate this issue further, to check if the effect of $m_n$ on $R_i^2$ for a given $N$ impacts on the performance of our estimators.

The remaining variables in the panel data model are set out as follows. Regressors $x_{ijt}$ are assumed to be correlated with the strong unobserved common factors and are generated as

$$x_{ijt} = a_{ij1} d_{1t} + a_{ij2} d_{2t} + \sum_{\ell=1}^{3} \gamma_{ij\ell} f_{\ell t} + v_{ijt}, \quad j = 1, 2,$$

where

$$\gamma_{ij\ell} \sim IIDU(0, 1), \quad \text{for } i = 1, \ldots, N, \; \ell = 1, 2, 3; \; j = 1, 2,$$

$$v_{ijt} = \rho_{v ij} v_{ij, t-1} + \vartheta_{ijt}, \quad i = 1, 2, \ldots, N; \quad t = -49, \ldots, 0, 1, \ldots, T,$$

$$\vartheta_{ijt} \sim IDN(0, 1 - \rho_{v ij}^2), \quad v_{ij, -50} = 0, \quad \rho_{v ij} \sim IIDU(0.05, 0.95), \quad \text{for } j = 1, 2.$$

The observed common effects are generated as

$$d_{1t} = 1; \quad d_{2t} = 0.5 d_{2, t-1} + v_{dt}, \quad t = -49, \ldots, 0, 1, \ldots, T,$$

$$v_{dt} \sim IIDN(0, 1 - 0.5^2), \quad d_{2, -50} = 0.$$

When generating $v_{ijt}$ and the common factors $f_{\ell t}$, $n_{\ell t}$ and $d_{2t}$, the first 50 observations have been discarded to reduce the effect of initial values on the estimates. The factor loadings of the observed common effects do not change across replications and are generated as

$$\alpha_i \sim IIDN(1, 1), \quad i = 1, 2, \ldots, N,$$

$$(a_{i11}, a_{i21}, a_{i12}, a_{i22})' \sim IIDN(0.5 \tau_4, 0.5 I_4),$$

where $\tau_4 = (1, 1, 1, 1)'$ and $I_4$ is a $4 \times 4$ identity matrix.
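One replication of this design could be simulated along the following lines. This is a sketch under stated assumptions rather than the code used in the paper: the names `ar1` and `simulate_panel` are hypothetical, slopes are drawn with mean 1 and variance 0.04 (one reading of the slope specification above), the Experiment B loadings use the normalisation $\eta_{i\ell} / \sqrt{3 \sum_i \eta_{i\ell}^2}$, and, for brevity, all loadings are redrawn on every call even though the text holds the loadings of the observed common effects fixed across replications.

```python
import numpy as np

def ar1(rng, n_series, T, rho, burn=50):
    """AR(1) series with unit stationary variance, started at zero;
    the first `burn` observations are discarded. Returns (n_series, T)."""
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (n_series,))
    shocks = rng.standard_normal((n_series, T + burn)) * np.sqrt(1 - rho**2)[:, None]
    x = np.zeros(n_series)
    out = np.empty((n_series, T + burn))
    for t in range(T + burn):
        x = rho * x + shocks[:, t]
        out[:, t] = x
    return out[:, burn:]

def simulate_panel(N, T, m_n, experiment="A", seed=0):
    """Draw one panel (y, X, D, beta, lam) from the Monte Carlo design."""
    rng = np.random.default_rng(seed)
    f = ar1(rng, 3, T, 0.5)                          # strong factors, (3, T)
    D = np.column_stack([np.ones(T), ar1(rng, 1, T, 0.5)[0]])   # (d_1t, d_2t)

    u = rng.uniform(0, 1, (N, 3)) @ f                # strong-factor part of u_it
    lam = np.zeros((N, m_n))
    if m_n > 0:
        n = ar1(rng, m_n, T, 0.5)                    # non-strong factors
        eta = rng.uniform(0, 1, (N, m_n))
        ss = (eta**2).sum(axis=0)                    # sum over i of eta_il^2
        lam = eta / ss if experiment == "A" else eta / np.sqrt(3 * ss)
        u = u + lam @ n
    sigma2 = rng.uniform(0.5, 1.5, N)
    u = u + rng.standard_normal((N, T)) * np.sqrt(sigma2)[:, None]

    a = rng.normal(0.5, np.sqrt(0.5), (N, 2, 2))     # a[i, j, :] loads on (d_1t, d_2t)
    gamma_x = rng.uniform(0, 1, (N, 2, 3))
    rho_v = rng.uniform(0.05, 0.95, N * 2)
    v = ar1(rng, N * 2, T, rho_v).reshape(N, 2, T)
    X = (a @ D.T + gamma_x @ f + v).transpose(0, 2, 1)   # (N, T, 2)

    alpha = rng.normal(1, 1, N)
    beta = rng.normal(1, 0.2, (N, 2))                # assumed: mean 1, variance 0.04
    y = alpha[:, None] + np.einsum("itj,ij->it", X, beta) + u
    return y, X, D, beta, lam
```

For example, `simulate_panel(100, 50, 20, "B")` would correspond to the cell $N = 100$, $T = 50$, $m_n = N/5$ of Experiment B. The returned `lam` makes it easy to check the stated orders of magnitude of the column sums of $|\lambda_{i\ell}|$ for a given $N$.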



Each experiment was replicated 2000 times for all pairs of $N$ and $T$ = 20, 30, 50, 100 and 200. For each $N$ we shall consider $m_n = 0, N/5, 3N/5, N$. For example, for $N = 100$, we consider $m_n$ = 0, 20, 60, 100. We report bias, RMSE, size and power for six estimators: the FE estimator with standard variance, the CCEMG and CCEP estimators given by (4.11) and (4.12), respectively, the MGPC and PPC estimators proposed by Kapetanios and Pesaran (2007), and the PC estimator proposed by Bai (2009). The MGPC and PPC estimators are similar to (4.11) and (4.12) except that the cross-section averages are replaced by estimated common factors obtained by applying the Bai and Ng (2002) procedure to $z_{it} = (y_{it}, x_{it}')'$. Note that the PPC estimator coincides with the factor augmented panel regression proposed by Giannone and Lenza (2010). In the PC iterative estimator by Bai (2009), $(\hat{b}_{PC}, \hat{F})$ is the solution to the following set of non-linear equations:

$$\hat{b}_{PC} = \left( \sum_{i=1}^{N} X_i' M_{\hat{F}} X_i \right)^{-1} \sum_{i=1}^{N} X_i' M_{\hat{F}} y_i, \qquad \left[ \frac{1}{NT} \sum_{i=1}^{N} (y_i - X_i \hat{b}_{PC})(y_i - X_i \hat{b}_{PC})' \right] \hat{F} = \hat{F} \hat{V},$$

where $M_{\hat{F}} = I_T - \hat{F} (\hat{F}' \hat{F})^{-1} \hat{F}'$, and $\hat{V}$ is a diagonal matrix with the $\hat{m}_f$ largest eigenvalues of the matrix $\frac{1}{NT} \sum_{i=1}^{N} (y_i - X_i \hat{b}_{PC})(y_i - X_i \hat{b}_{PC})'$ arranged in decreasing order. The demeaning operator is applied to all variables before they enter the iterative procedure, to get rid of the fixed effects. The variance estimator of $\hat{b}_{PC}$ is

$$\widehat{\mathrm{Var}}(\hat{b}_{PC}) = \frac{1}{NT} D_0^{-1} D_Z D_0^{-1},$$

where $D_0 = (NT)^{-1} \sum_{i=1}^{N} Z_i' Z_i$, $D_Z = N^{-1} \sum_{i=1}^{N} \hat{\sigma}_i^2 \left( T^{-1} \sum_{t=1}^{T} z_{it} z_{it}' \right)$, with $\hat{\sigma}_i^2 = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}_{it}^2$, $Z_i = M_{\hat{F}} X_i - N^{-1} \sum_{k=1}^{N} [\hat{\gamma}_i' (\hat{L}' \hat{L} / N)^{-1} \hat{\gamma}_k] M_{\hat{F}} X_k$, and $\hat{L} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_N)'$ is the matrix of estimated factor loadings. When $T/N \to \rho > 0$, $\hat{b}_{PC}$ is biased and, following Bai (2009), we estimate the bias as

$$\widehat{\mathrm{bias}} = -\frac{1}{N} D_0^{-1} \frac{1}{N} \sum_{i=1}^{N} \frac{(X_i - \hat{V}_i)' \hat{F}}{T} \left( \frac{\hat{L}' \hat{L}}{N} \right)^{-1} \hat{\gamma}_i \hat{\sigma}_i^2,$$

where $\hat{V}_i = N^{-1} \sum_{j=1}^{N} \hat{\gamma}_i' (\hat{L}' \hat{L} / N)^{-1} \hat{\gamma}_j X_j$. The selection of the number of strong common factors ($m_f$) in the Kapetanios and Pesaran (2007) and in the Bai (2009) estimators has been based on the Bai and Ng (2002) $IC_{p1}$ criterion.

5.1. Results

Results on the estimation of the slope parameters for Experiments A and B are summarized in Tables 1–5. In what follows, we focus on the estimation of $\beta_1$; results for $\beta_2$ are very similar and are not reported. Notice that the power of the various tests is computed under the alternative $H_1: \beta_1 = 0.95$. We do not report results for the FE estimator since they show that, as expected, this estimator performs very poorly, is substantially biased, and is subject to large size distortions for all pairs of $N$ and $T$, and for all values of $m_n$. Tables 1 and 2 show the results for the CCE estimators. The bias and RMSE of the CCEP and CCEMG estimators fall steadily with the sample size, and tests of the null hypothesis based on them are correctly sized, regardless of whether the factors, $\{n_{\ell t}, \ell = 1, 2, \ldots, m_n\}$, are weak or semi-strong, and regardless of the choice of $m_n$.
Further, we notice that the power of the tests based on CCE estimators is not affected by $m_n$, the number of weak (or

Table 1. Results for CCEMG estimator. Columns: Bias (×100), RMSE (×100), Size (×100) and Power (×100), each reported for T = 20, 30, 50, 100, 200; rows indexed by $m_n$ and N. Panels: Experiment A ($m_f = 3$ strong factors and $m_n$ weak factors) and Experiment B ($m_f = 3$ strong factors and $m_n$ semi-strong factors). [Table entries not recoverable from the extracted layout.]

C62 A. Chudik, M. H. Pesaran and E. Tosetti


Table 2. Results for CCEP estimator. Columns: Bias (×100), RMSE (×100), Size (×100) and Power (×100), each reported for T = 20, 30, 50, 100, 200; rows indexed by $m_n$ and N. Panels: Experiment A ($m_f = 3$ strong factors and $m_n$ weak factors) and Experiment B ($m_f = 3$ strong factors and $m_n$ semi-strong factors). [Table entries not recoverable from the extracted layout.]



Table 3. Results for MGPC estimator. Columns: Bias (×100), RMSE (×100), Size (×100) and Power (×100), each reported for T = 20, 30, 50, 100, 200; rows indexed by $m_n$ and N. Panels: Experiment A ($m_f = 3$ strong factors and $m_n$ weak factors) and Experiment B ($m_f = 3$ strong factors and $m_n$ semi-strong factors). [Table entries not recoverable from the extracted layout.]



Table 4. Results for PPC estimator. Columns: Bias (×100), RMSE (×100), Size (×100) and Power (×100), each reported for T = 20, 30, 50, 100, 200; rows indexed by $m_n$ and N. Panel: Experiment A ($m_f = 3$ strong factors and $m_n$ weak factors). [Table entries not recoverable from the extracted layout.]



100

200

20

30

50

100

200

20

30

50

100

200

20

30

50

100

6.15 6.75

30 −10.55 −7.89 −6.70 −5.59 −5.04 14.24 10.64 8.79 7.23 6.49 22.20 23.05 25.75 26.15 26.65 11.15 9.00 8.70 6.55 50 −6.39 −5.06 −4.60 −4.07 −3.77 9.55 7.26 6.22 5.28 4.87 17.20 17.60 21.65 21.85 23.45 6.85 5.90 6.10 6.25

18 30

30 −12.15 −8.99 −7.79 −6.82 −6.22 15.46 11.51 9.65 8.23 7.49 26.25 27.25 30.60 32.80 34.70 12.85 10.35 9.35 8.60 50 −7.35 −6.24 −5.95 −5.45 −5.16 10.13 8.25 7.33 6.55 6.07 19.50 23.95 29.65 35.15 38.85 6.60 7.10 7.10 6.85

30 50

100 100 −4.74 −4.55 −4.50 −4.40 −4.13 6.81 5.88 5.42 5.07 4.69 17.85 25.20 33.65 43.85 46.65 6.35 5.75 5.75 6.30 6.80 200 200 −3.61 −3.73 −3.79 −3.68 −3.61 5.00 4.64 4.37 4.11 3.95 21.20 30.60 44.65 55.90 63.05 8.05 9.75 10.75 13.40 14.95

7.45 5.45

20 −18.49 −14.19 −11.79 −9.70 −8.16 22.40 17.04 13.88 11.36 9.58 33.15 37.10 40.85 42.40 37.65 20.50 20.35 19.50 16.15 11.50

20

60 100 −3.86 −3.43 −3.30 −3.01 −2.75 6.06 4.96 4.33 3.85 3.52 13.55 16.70 20.20 25.65 23.90 5.70 7.90 8.35 12.20 17.90 120 200 −2.79 −2.69 −2.61 −2.56 −2.39 4.28 3.74 3.36 3.09 2.84 14.35 19.55 26.00 32.05 32.95 12.30 16.30 23.25 29.95 38.50

6.80

20 −16.80 −12.67 −10.18 −7.81 −6.75 20.80 15.59 12.52 9.74 8.31 30.25 33.30 34.60 30.40 28.05 18.85 16.70 14.90 10.45

12

20 100 −2.57 −2.19 −1.92 −1.67 −1.51 5.30 4.09 3.51 2.89 2.71 9.30 9.60 11.55 10.25 11.60 9.90 13.85 21.65 29.00 37.25 40 200 −1.52 −1.32 −1.27 −1.22 −1.08 3.50 2.82 2.34 2.10 1.91 7.65 8.15 9.25 12.05 11.30 20.00 33.15 47.55 60.90 72.50

6.25

200

30 −9.44 −6.54 −5.16 −4.20 −3.51 13.23 9.55 7.51 6.13 5.31 19.25 18.30 18.20 18.10 14.10 8.90 7.50 7.00 6.00 7.40 50 −4.85 −3.62 −3.08 −2.55 −2.24 8.30 6.25 5.03 4.26 3.85 11.85 12.20 11.70 13.10 13.20 6.00 6.80 7.75 12.20 15.60

50

Power (×100)

6 10

30

Size (×100)

Experiment B: mf = 3 strong factors and mn semi-strong factors 20 −14.67 −10.83 −8.13 −6.12 −4.93 18.93 13.99 10.77 8.31 6.95 24.85 26.45 25.45 23.50 18.50 15.00 12.55 11.25 7.30

20

Table 4. Continued. RMSE (×100)

4

mn N/T

Bias (×100)





Bias (×100) mn

N/T

20

Table 5. Results for Bai estimator. RMSE (×100) Size (×100)

100

20

100

20

Power (×100)

100

20

100

Weak factor structure {λi nt } −0.30 9.78 5.72 37.90

0

20

0.47

48.00

45.60

61.40

0 4 20

100 20 100

−0.01 0.62 0.07

0.02 −0.15 −0.09

3.57 9.80 3.48

2.50 5.83 2.47

21.50 40.10 21.40

47.20 50.50 44.90

58.70 48.30 56.20

91.10 63.20 91.50

20 100

20 100

0.30 0.10

0.09 0.03

9.91 3.47

6.07 2.42

37.90 21.10

52.40 45.30

46.50 59.80

64.20 91.90

Semi-strong factor structure {λi nt } −0.23 9.40 6.08 35.50 −0.17 3.70 2.60 23.60

52.10 46.80

42.70 58.30

65.10 88.70

−0.28 0.03

52.40 44.50

49.40 56.20

60.50 90.20

4 20

20 100

0.45 −0.09

20 100

20 100

1.28 0.02

10.47 3.50

6.27 2.46

41.70 20.90

Note: Experiments A and B: mf = 3 strong factors and mn weak or semi-strong factors.a Based on R = 1000 replications.

Figure 1. Power curves for the CCEP t-tests: N = 100, T = 100.

semi-strong) factors. This is also confirmed by Figure 1, which shows that the power curves of tests based on the CCEP estimator do not change much with $m_n$.⁴ The Monte Carlo results clearly show that augmenting the regression with cross-section averages works well not only in the case of a few strong common factors, but also in the presence of an arbitrary, possibly infinite, number of (semi-) weak factors.

⁴ Similar curves were obtained for CCEMG estimators, which are not reported due to space considerations.

Tables 3 and 4 report the findings for the MGPC and PPC. First notice that these estimators, since they estimate the unobserved common factors by principal components, only work in the case where the factors, $\{n_{\ell t}\}$, represent a set of weak factors, or when $m_n = 0$ (i.e. in Experiment A). In fact, in the case of a semi-strong factor structure the covariance matrix of the idiosyncratic error would not have bounded column norm, a condition required by the principal components analysis for consistent estimation of the factors and their loadings. However, as shown in Table 1, even for Experiment A, these estimators show some size distortions for small values of N (i.e. when N = 20, 30). One possible reason for this result is that the principal components approach requires estimating the number of (strong) factors via a selection criterion, which in turn introduces an additional source of uncertainty into the analysis. Therefore, not surprisingly, tests based on MGPC and PPC estimators are severely oversized when a semi-strong factor structure is considered.

Finally, Table 5 gives the results for the Bai (2009) PC iterative estimator. The bias and RMSE of the Bai estimators are comparable to CCE-type estimators, but tests based on them are grossly over-sized, even when $m_n = 0$. The problem seems to lie with the variance of the Bai estimator, an issue that clearly needs further investigation. In his Monte Carlo experiments, Bai does not provide size and power estimates of tests based on his proposed estimator.
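The four summary statistics reported in the tables (bias, RMSE, size and power, all multiplied by 100) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mc_summary`, the 5% two-sided critical value of 1.96, and the convention that power is the rejection frequency of a t-test of a false null value `beta_alt` are all assumptions made here for illustration, and the replications are artificial normal draws rather than output from the experiments in the paper.

```python
import numpy as np

def mc_summary(estimates, std_errors, beta_true, beta_alt, crit=1.96):
    """Summarise Monte Carlo replications (all figures multiplied by 100).

    estimates, std_errors: one entry per replication (hypothetical inputs).
    beta_true: value used to generate the data.
    beta_alt: a false null value, used here (by assumption) to measure power.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = 100 * np.mean(est - beta_true)
    rmse = 100 * np.sqrt(np.mean((est - beta_true) ** 2))
    # Size: rejection frequency of the true null.
    size = 100 * np.mean(np.abs((est - beta_true) / se) > crit)
    # Power: rejection frequency of the false null beta_alt.
    power = 100 * np.mean(np.abs((est - beta_alt) / se) > crit)
    return {"bias": bias, "rmse": rmse, "size": size, "power": power}

# Artificial replications: an unbiased estimator with a correct standard error.
rng = np.random.default_rng(0)
R = 1000
se = np.full(R, 0.05)
est = 1.0 + se * rng.standard_normal(R)
summary = mc_summary(est, se, beta_true=1.0, beta_alt=0.95)
print(summary)
```

With a correctly sized test the `size` entry should be close to 5; values far above that correspond to the over-rejection discussed above for the principal-components-based estimators.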

6. CONCLUDING REMARKS

Cross-section dependence is a rapidly growing field of study in panel data analysis. In this paper, we have introduced the notions of weak and strong cross-section dependence, and have shown that these are more general and more widely applicable than other characterizations of cross-section dependence provided in the existing econometric literature. We have also investigated how our notions of CWD and CSD relate to the properties of common factor models that are widely used for modelling of contemporaneous correlation in regression models. Finally, we have provided further extensions of the CCE procedure advanced in Pesaran (2006) that allow for a large number of weak or semi-strong factors. Under this framework, we have shown that the CCE method still yields consistent estimates of the mean of the slope coefficients and the asymptotic normal theory continues to be applicable.
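As a reminder of what the CCE procedure does mechanically, the following is a minimal sketch of a mean group CCE estimator for a single regressor. It assumes equal weights $w_i = 1/N$, an intercept as the only observed common effect, and a hypothetical data generating process with one strong factor and no weak factors; the function name `ccemg` and all parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def ccemg(y, x):
    """Mean group CCE sketch for y, x of shape (N, T) with one regressor.

    Each unit's regression is augmented with an intercept and the
    cross-section averages of y and x (equal weights, by assumption).
    """
    N, T = y.shape
    h = np.column_stack([np.ones(T), y.mean(axis=0), x.mean(axis=0)])
    # Residual maker: projects off the space spanned by the columns of h.
    M = np.eye(T) - h @ np.linalg.pinv(h)
    b = np.empty(N)
    for i in range(N):
        x_tilde = M @ x[i]
        b[i] = (x_tilde @ y[i]) / (x_tilde @ x_tilde)
    est = b.mean()
    # Non-parametric standard error based on the dispersion of the b_i.
    se = b.std(ddof=1) / np.sqrt(N)
    return est, se

# Hypothetical design: heterogeneous slopes with mean 1 and one strong factor.
rng = np.random.default_rng(0)
N, T = 50, 50
f = rng.standard_normal(T)
beta = 1.0 + 0.1 * rng.standard_normal(N)
gamma = 1.0 + 0.2 * rng.standard_normal(N)
Gamma = 1.0 + 0.2 * rng.standard_normal(N)
x = Gamma[:, None] * f[None, :] + rng.standard_normal((N, T))
y = beta[:, None] * x + gamma[:, None] * f[None, :] + rng.standard_normal((N, T))
est, se = ccemg(y, x)
print(est, se)
```

The unobserved factor `f` is never estimated directly; the cross-section averages act as proxies for it, which is the feature that the results summarised above extend to settings with many weak or semi-strong factors.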

ACKNOWLEDGMENTS

Elisa Tosetti acknowledges financial support from ESRC (Ref. no. RES-061-25-0317). We are grateful to Takashi Yamagata for helpful comments. A preliminary version of this paper was presented at The Econometrics Journal Special Session, Royal Economic Society Annual Conference, Surrey, April 2009, and at the Institute for Advanced Studies, Vienna, May 2009. We have benefited from comments by Manfred Deistler, Pierre Perron (the Editor), and two anonymous referees. The views expressed in the paper are those of the authors and do not necessarily reflect those of the European Central Bank.

REFERENCES

Anderson, B., M. Deistler, A. Filler and C. Zinner (2009). Generalized linear dynamic factor models - an approach via singular autoregressions. Working paper, Vienna University of Technology.
Andrews, D. W. K. (1987). Asymptotic results for generalized Wald tests. Econometric Theory 3, 348–58.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica 77, 1229–79.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Bernstein, D. S. (2005). Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton, NJ: Princeton University Press.
Chamberlain, G. (1983). Funds, factors, and diversification in arbitrage pricing models. Econometrica 51, 1281–304.
Chamberlain, G. and M. Rothschild (1983). Arbitrage, factor structure and mean–variance analysis in large asset market. Econometrica 51, 1305–24.
Connor, G. and R. A. Korajczyk (1993). A test for the number of factors in an approximate factor structure. Journal of Finance 48, 1263–91.
Davidson, J. (1994). Stochastic Limit Theory. Oxford: Oxford University Press.
Forni, M. and M. Lippi (2001). The generalized factor model: representation theory. Econometric Theory 17, 1113–41.
Forni, M. and L. Reichlin (1998). Let’s get real: a factor analytical approach to disaggregated business cycle dynamics. Review of Economic Studies 65, 453–73.
Giannone, D. and M. Lenza (2010). The Feldstein–Horioka fact. In NBER International Seminar on Macroeconomics 2009, 103–17. Cambridge, MA: National Bureau of Economic Research.
Horn, R. A. and C. R. Johnson (1985). Matrix Analysis. Cambridge: Cambridge University Press.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 417–41 and 498–520.
Kapetanios, G. and M. H. Pesaran (2007). Alternative approaches to estimation and inference in large multifactor panels: small sample results with an application to modelling of asset returns. In G. D. A. Phillips and E. Tzavalis (Eds.), The Refinement of Econometric Estimation and Test Procedures: Finite Sample and Asymptotic Analysis, 282–318. Cambridge: Cambridge University Press.
Kapetanios, G., M. H. Pesaran and T. Yamagata (2010). Panels with non-stationary multifactor error structures. CESifo Working Paper No. 1788, CESifo, Munich (revised).
Kelejian, H. H. and I. Prucha (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509–33.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899–925.
Onatski, A. (2009). Asymptotics of the principal components estimator of large factor models with weak factors. Working paper, Columbia University.
Pesaran, M. H. (2006). Estimation and inference in large heterogenous panels with multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M. H. and E. Tosetti (2010). Large panels with common factors and spatial correlations. CESifo Working Paper No. 2103, CESifo, Munich (revised).
Robinson, P. M. (2003). Long memory time series. In P. M. Robinson (Ed.), Time Series with Long Memory, 4–32. Oxford: Oxford University Press.
Stock, J. H. and M. W. Watson (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–62.
Stock, J. H. and M. W. Watson (2003). Has the business cycle changed and why? In M. Gertler and K. Rogoff (Eds.), NBER Macroeconomics Annual 2002, Volume 17, 159–230. Cambridge, MA: National Bureau of Economic Research.
Stone, R. (1947). On the interdependence of blocks of transactions. Supplement to Journal of the Royal Statistical Society 9, 1–45.
Whittle, P. (1954). On stationary processes on the plane. Biometrika 41, 434–49.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions, and tests for aggregation bias. Journal of the American Statistical Association 57, 348–68.




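APPENDIX A: STATEMENTS AND PROOFS OF LEMMAS

We state and prove a number of lemmas that we shall use in proofs of Theorems 4.1 and 4.2.

LEMMA A.1. Suppose Assumptions 4.1–4.5 hold and $(N,T) \xrightarrow{j} \infty$. Then,

$$\frac{1}{T}\sum_{t=1}^{T} g_t \bar{\vartheta}_{wt} \xrightarrow{L_1} 0, \qquad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} g_t \bar{\varepsilon}_{wt} \xrightarrow{L_1} 0, \tag{A.1}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} g_t \bar{v}_{wt}' \xrightarrow{L_1} 0, \qquad \frac{1}{T}\sum_{t=1}^{T} v_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.2}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} v_{it} \bar{\varepsilon}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \qquad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} v_{it} \bar{v}_{wt}' \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.3}$$

$$\frac{1}{T}\sum_{t=1}^{T} \vartheta_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \qquad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} \vartheta_{it} \bar{\varepsilon}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.4}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} \vartheta_{it} \bar{v}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \qquad \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.5}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} \varepsilon_{it} \bar{\varepsilon}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i \quad\text{and}\quad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} \varepsilon_{it} \bar{v}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.6}$$

where $g_t = (d_t', f_t')'$, $\bar{\vartheta}_{wt} = \sum_{i=1}^{N} w_i \vartheta_{it}$, $\vartheta_{it} = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$, $\bar{\varepsilon}_{wt} = \sum_{i=1}^{N} w_i \varepsilon_{it}$ and $\bar{v}_{wt} = \sum_{i=1}^{N} w_i v_{it}$. If in addition there exist constants $\alpha$ and $K$ such that $0 \le \alpha < 1$ and conditions (4.13) and (4.14) hold, then

$$\frac{1}{T}\sum_{t=1}^{T} \bar{u}_{wt} \bar{u}_{wt}' \xrightarrow{L_1} 0, \tag{A.7}$$

where $\bar{u}_{wt} = \sum_{i=1}^{N} w_i u_{it}$, and $u_{it}$ is defined by (4.5). If conditions (4.17) and (4.18) hold instead of conditions (4.13) and (4.14), $0 \le \alpha < 1/2$, and the remaining assumptions are unchanged, then

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} g_t \bar{\vartheta}_{wt} \xrightarrow{L_1} 0, \qquad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} v_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i, \tag{A.8}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T} \vartheta_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i \quad\text{and}\quad \frac{\sqrt{N}}{T}\sum_{t=1}^{T} \varepsilon_{it} \bar{\vartheta}_{wt} \xrightarrow{L_1} 0 \text{ uniformly in } i. \tag{A.9}$$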

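Proof: We use an $L_1$ mixingale weak law to establish results (A.1)–(A.9). Let $T_N = T(N)$ be such that $T_N \to \infty$ as $N \to \infty$ and let $c_{Nt} = 1/T_N$ for all $N \in \mathbb{N}$ and all $t \in \mathbb{Z}$. To establish the first part of (A.1) define

$$\kappa_{Nt} = \frac{1}{T_N} g_t \bar{\vartheta}_{wt} = \frac{1}{T_N} g_t \sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell} n_{\ell t},$$

where $\bar{\lambda}_{w\ell} = \sum_{i=1}^{N} w_i \lambda_{i\ell}$. We have

$$\left\| E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) \right\| = \|\Sigma_g\| \sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell}^2,$$

where $\|\Sigma_g\| < K$ by Assumption 4.1. Consider the term $\sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell}^2$. Because absolute summability implies square summability, a sufficient condition for the existence of an upper bound for $\sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell}^2$ is the existence of an upper bound for $\sum_{\ell=1}^{m_n(N)} |\bar{\lambda}_{w\ell}|$. But

$$\sum_{\ell=1}^{m_n(N)} |\bar{\lambda}_{w\ell}| = \sum_{\ell=1}^{m_n(N)} \left| \sum_{i=1}^{N} w_i \lambda_{i\ell} \right| \le \sum_{i=1}^{N} |w_i| \sum_{\ell=1}^{m_n(N)} |\lambda_{i\ell}| < K,$$

where $\sum_{\ell=1}^{m_n(N)} |\lambda_{i\ell}| < K$ by condition (4.10) of Assumption 4.2, and $\sum_{i=1}^{N} |w_i|$ is bounded by (2.1) and (2.2). It follows that the array $\{\kappa_{Nt}/c_{Nt}\}$ is uniformly bounded in $L_2$-norm and therefore uniformly integrable.⁵ Furthermore, $g_t$ and $n_{\ell t}$, for $\ell = 1, 2, \ldots, m_n(N)$, are covariance stationary processes with absolute summable autocovariances, and therefore $\|E(g_t \mid \mathcal{I}_{t-s})\|_{L_1} \to 0$ and $\|E(n_{\ell t} \mid \mathcal{I}_{t-s})\|_{L_1} \to 0$, as $s \to \infty$, and the array $\{\kappa_{Nt}\}$ is a uniformly integrable $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$. Since we have that $\lim_{N\to\infty} \sum_{t=1}^{T_N} c_{Nt} = \lim_{N\to\infty} \sum_{t=1}^{T_N} T_N^{-1} = 1 < \infty$, and $\lim_{N\to\infty} \sum_{t=1}^{T_N} c_{Nt}^2 = \lim_{N\to\infty} \sum_{t=1}^{T_N} T_N^{-2} = 0$, a mixingale weak law (Davidson, 1994, Theorem 19.11) can be applied, and we have

$$\sum_{t=1}^{T_N} \kappa_{Nt} = \frac{1}{T_N} \sum_{t=1}^{T_N} g_t \bar{\vartheta}_{wt} \xrightarrow{L_1} 0,$$

as required. Similarly, to establish the first part of (A.8) define

$$\kappa_{Nt} = \frac{\sqrt{N}}{T_N} g_t \bar{\vartheta}_{wt} = \frac{\sqrt{N}}{T_N} g_t \sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell} n_{\ell t},$$

where as before $\bar{\lambda}_{w\ell} = \sum_{i=1}^{N} w_i \lambda_{i\ell}$. Hence

$$\left\| E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) \right\| = N \|\Sigma_g\| \sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell}^2,$$

and note that

$$\sum_{\ell=1}^{m_n(N)} \bar{\lambda}_{w\ell}^2 < K N^{2\alpha-2} m_n(N) = O(N^{-1}),$$

under conditions (4.17) and (4.18). Thus, as before, $\{\kappa_{Nt}\}$ is a uniformly integrable $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$, and applying a mixingale weak law yields

$$\sum_{t=1}^{T_N} \kappa_{Nt} = \frac{\sqrt{N}}{T_N} \sum_{t=1}^{T_N} g_t \bar{\vartheta}_{wt} \xrightarrow{L_1} 0.$$

The remaining results can also be established in a similar way. For example, in order to establish the second part of (A.1) define $\kappa_{Nt} = \frac{\sqrt{N}}{T_N} g_t \bar{\varepsilon}_{wt}$, and note that

$$\left\| E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) \right\| = N \|\Sigma_g\| E\!\left( \bar{\varepsilon}_{wt}^2 \right) = N \|\Sigma_g\| \sum_{i=1}^{N} w_i^2 E\!\left( \varepsilon_{it}^2 \right) < K.$$

⁵ A sufficient condition for uniform integrability is $L_{1+\varepsilon}$ uniform boundedness for any $\varepsilon > 0$.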
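To give a feel for the first parts of (A.1) and (A.8), the sketch below simulates the time average of $g_t \bar{\vartheta}_{wt}$, with and without the $\sqrt{N}$ scaling. It is only an illustration under assumptions made here: equal weights $w_i = 1/N$, a scalar $g_t$, i.i.d. standard normal factors, and a hypothetical loading design $|\lambda_{i\ell}| \le 2^{-\ell} i^{-2}$ chosen so that both the row sums and the column sums of the absolute loadings are bounded. None of these choices comes from the paper, and the function name is hypothetical.

```python
import numpy as np

def cross_moments(N, T, m_n, rng):
    """Time averages of g_t * theta_bar_wt, unscaled and scaled by sqrt(N)."""
    w = np.full(N, 1.0 / N)                       # granular weights
    i = np.arange(1, N + 1)[:, None]
    ell = np.arange(1, m_n + 1)[None, :]
    # Hypothetical loadings with bounded absolute row and column sums.
    lam = rng.uniform(-1.0, 1.0, size=(N, m_n)) * 0.5 ** ell / i ** 2
    n = rng.standard_normal((m_n, T))             # weak factors n_{lt}
    g = rng.standard_normal(T)                    # scalar g_t, independent of n
    theta_bar = (w @ lam) @ n                     # weighted average of theta_it
    avg = np.mean(g * theta_bar)
    return avg, np.sqrt(N) * avg

rng = np.random.default_rng(1)
for N, T in [(20, 50), (100, 200), (400, 800)]:
    print(N, T, cross_moments(N, T, m_n=10, rng=rng))
```

Both averages shrink as $N$ and $T$ grow together, which is the behaviour the mixingale weak law argument establishes in general.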


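The following four results, the second part of (A.3), the first part of (A.6), (A.7), and the first part of (A.9), deserve more attention. In the case of the second part of (A.3), we have

$$\frac{\sqrt{N}}{T_N} \sum_{t=1}^{T_N} v_{it} \bar{v}_{wt}' = \frac{\sqrt{N}}{T_N} \sum_{t=1}^{T_N} w_i v_{it} v_{it}' + \frac{\sqrt{N}}{T_N} \sum_{t=1}^{T_N} v_{it} \sum_{j \ne i} w_j v_{jt}'. \tag{A.10}$$

Note that since $v_{it}$ is ergodic in variance then $T_N^{-1} \sum_{t=1}^{T_N} v_{it} v_{it}' \xrightarrow{L_1} \Sigma_{vi}$, and since $\sqrt{N} w_i \to 0$ as $N \to \infty$ and $\sup_i \|\Sigma_{vi}\| < K$, it follows that $\frac{\sqrt{N}}{T_N} \sum_{t=1}^{T_N} w_i v_{it} v_{it}' \xrightarrow{L_1} 0$. To establish convergence of the second term on the right side of (A.10), define $\kappa_{Nt} = \frac{\sqrt{N}}{T_N} v_{it} \sum_{j \ne i} w_j v_{jt}'$, and note that

$$\left\| E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) \right\| < kN \|\Sigma_{vi}\| \sum_{j \ne i} w_j^2 \|\Sigma_{vj}\| < K.$$

Using now the same arguments as in the proof of the first part of (A.1) we have $\sum_{t=1}^{T_N} \kappa_{Nt} \xrightarrow{L_1} 0$, which completes the proof of the second part of (A.3). The first part of (A.6) can be established along the same lines followed to prove the second part of (A.3). To establish the first part of (A.9), consider

$$\kappa_{Nt} = \frac{\sqrt{N}}{T_N} \vartheta_{it} \bar{\vartheta}_{wt} = \frac{\sqrt{N}}{T_N} \vartheta_{it} \sum_{j=1}^{N} w_j \vartheta_{jt}.$$

We have

$$E\!\left( \frac{\kappa_{Nt}^2}{c_{Nt}^2} \right) = N \sum_{j=1}^{N} \sum_{k=1}^{N} w_j w_k E\!\left( \vartheta_{it}^2 \vartheta_{jt} \vartheta_{kt} \right), \tag{A.11}$$

where

$$E\!\left( \vartheta_{it}^2 \vartheta_{jt} \vartheta_{kt} \right) = \sum_{\ell_1=1}^{m_n(N)} \sum_{\ell_2=1}^{m_n(N)} \sum_{\ell_3=1}^{m_n(N)} \sum_{\ell_4=1}^{m_n(N)} \lambda_{i\ell_1} \lambda_{i\ell_2} \lambda_{j\ell_3} \lambda_{k\ell_4} E\!\left( n_{\ell_1 t} n_{\ell_2 t} n_{\ell_3 t} n_{\ell_4 t} \right),$$

in which $E(n_{\ell_1 t} n_{\ell_2 t} n_{\ell_3 t} n_{\ell_4 t})$ is non-zero only in the following three cases: (i) $\ell_1 = \ell_2 = \ell_3 = \ell_4$, (ii) $\ell_1 = \ell_2$ and $\ell_3 = \ell_4$ and (iii) $\ell_1 = \ell_3$ and $\ell_2 = \ell_4$. It follows that

$$E\!\left( \vartheta_{it}^2 \vartheta_{jt} \vartheta_{kt} \right) = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell}^2 \lambda_{j\ell} \lambda_{k\ell} E\!\left( n_{\ell t}^4 \right) + \sum_{\ell_1=1}^{m_n(N)} \sum_{\ell_3=1}^{m_n(N)} \lambda_{i\ell_1}^2 \lambda_{j\ell_3} \lambda_{k\ell_3} + \sum_{\ell_1=1}^{m_n(N)} \sum_{\ell_2=1}^{m_n(N)} \lambda_{i\ell_1} \lambda_{i\ell_2} \lambda_{j\ell_1} \lambda_{k\ell_2}, \tag{A.12}$$

where $E(n_{\ell t}^2) = 1$, and $E(n_{\ell t}^4) < K$ by Assumption 4.1. Using conditions (4.17) and (4.18), and the absolute summability condition (4.10) of Assumption 4.2, we obtain

$$N \sum_{j=1}^{N} \sum_{k=1}^{N} w_j w_k \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell}^2 \lambda_{j\ell} \lambda_{k\ell} E\!\left( n_{\ell t}^4 \right) = N \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell}^2 \bar{\lambda}_{w\ell}^2 E\!\left( n_{\ell t}^4 \right) < K, \tag{A.13}$$

$$N \sum_{j=1}^{N} \sum_{k=1}^{N} w_j w_k \left( \sum_{\ell_1=1}^{m_n(N)} \sum_{\ell_3=1}^{m_n(N)} \lambda_{i\ell_1}^2 \lambda_{j\ell_3} \lambda_{k\ell_3} \right) = N \left( \sum_{\ell_1=1}^{m_n(N)} \lambda_{i\ell_1}^2 \right) \left( \sum_{\ell_3=1}^{m_n(N)} \bar{\lambda}_{w\ell_3}^2 \right) < K,$$

$$N \sum_{j=1}^{N} \sum_{k=1}^{N} w_j w_k \left( \sum_{\ell_1=1}^{m_n(N)} \sum_{\ell_2=1}^{m_n(N)} \lambda_{i\ell_1} \lambda_{i\ell_2} \lambda_{j\ell_1} \lambda_{k\ell_2} \right) = N \left( \sum_{\ell_1=1}^{m_n(N)} \lambda_{i\ell_1} \bar{\lambda}_{w\ell_1} \right) \cdot \left( \sum_{\ell_2=1}^{m_n(N)} \lambda_{i\ell_2} \bar{\lambda}_{w\ell_2} \right) < K. \tag{A.14}$$

Now substitute (A.12) in (A.11) for $E(\vartheta_{it}^2 \vartheta_{jt} \vartheta_{kt})$, and use (A.13) and (A.14) to obtain

$$E\!\left( \frac{\kappa_{Nt}^2}{c_{Nt}^2} \right) < K. \tag{A.15}$$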

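Using the same arguments as in the proof of the first part of (A.1), $\kappa_{Nt}$ is a uniformly integrable $L_1$-mixingale with respect to the constant array $c_{Nt}$, and applying a mixingale weak law yields $\sum_{t=1}^{T_N} \kappa_{Nt} \xrightarrow{L_1} 0$, as required. In order to establish (A.7) note that

$$\bar{u}_{wt} = \begin{pmatrix} \bar{\vartheta}_{wt} + \bar{\varepsilon}_{wt} + \sum_{i=1}^{N} w_i \beta_i' v_{it} \\ \bar{v}_{wt} \end{pmatrix}.$$

Convergence of $T^{-1} \sum_{t=1}^{T} \bar{\vartheta}_{wt}^2$ can also be established using a mixingale weak law. Let $\kappa_{Nt} = T_N^{-1} \bar{\vartheta}_{wt}^2$, and note that

$$\begin{aligned}
E\!\left( \bar{\vartheta}_{wt}^2 \right) &= E\!\left( \sum_{j=1}^{N} w_j \vartheta_{jt} \right)^{2} = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j E(\vartheta_{it} \vartheta_{jt}) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} \lambda_{j\ell} \\
&\le \sum_{\ell=1}^{m_n(N)} \left( \sum_{i=1}^{N} |w_i \lambda_{i\ell}| \cdot \sum_{j=1}^{N} |w_j \lambda_{j\ell}| \right) \\
&\le K \cdot m_n(N) N^{2\alpha-2} \to 0,
\end{aligned}$$

where $|w_i| < K/N$ under the granularity conditions (2.1) and (2.2), $\sum_{i=1}^{N} |\lambda_{i\ell}| < K N^{\alpha}$ by (4.13), and $m_n(N) N^{2\alpha-2} \to 0$ by (4.14). Similarly as in the proof of the first part of (A.9), it can be shown that $E(\kappa_{Nt}^2 / c_{Nt}^2)$ is bounded and that $\sum_{t=1}^{T_N} \kappa_{Nt} \xrightarrow{L_1} 0$. The convergence of the remaining elements of (A.7) can be established using similar arguments as in Lemma 2 of Pesaran (2006), or by applying a mixingale weak law.

LEMMA A.2.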

Suppose Assumptions 4.1–4.5 hold and $(N,T) \xrightarrow{j} \infty$. Then,

$$\frac{v_i' v_i}{T} \xrightarrow{p} \Sigma_{vi} \text{ uniformly in } i, \qquad \frac{v_i' G}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.16}$$

$$\frac{v_i' Q}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \qquad \frac{G' G}{T} \xrightarrow{p} \Sigma_g, \tag{A.17}$$

$$\frac{Q' G}{T} = O_p(1), \qquad \frac{Q' Q}{T} = O_p(1), \tag{A.18}$$

$$\bar{\Pi}_i^* - \Pi_i^* \xrightarrow{p} 0 \text{ uniformly in } i, \qquad \frac{\bar{H}_w' \bar{H}_w}{T} = O_p(1), \tag{A.19}$$

$$\frac{\bar{H}_w' \vartheta_i}{T} = o_p(1) \text{ uniformly in } i, \qquad \frac{X_i' Q}{T} = O_p(1) \text{ uniformly in } i, \tag{A.20}$$

$$\frac{\bar{H}_w' \varepsilon_i}{T} = o_p(1) \text{ uniformly in } i, \qquad \frac{\bar{H}_w' X_i}{T} = O_p(1) \text{ uniformly in } i \tag{A.21}$$

and

$$\frac{\bar{H}_w' F}{T} = O_p(1), \tag{A.22}$$

where $\Pi_i^* = [I - P(P'P)^{+}P'] \Pi_i$, $\bar{\Pi}_i^* = [I - \bar{P}_w(\bar{P}_w'\bar{P}_w)^{+}\bar{P}_w'] \Pi_i$, $\bar{P}_w$ is defined by (4.7), $P = E(\bar{P}_w)$, $G = (D, F)$, $Q = G\bar{P}_w$, $\bar{H}_w = (D, \bar{Z}_w)$, and $\vartheta_i = (\vartheta_{i1}, \vartheta_{i2}, \ldots, \vartheta_{iT})'$ with $\vartheta_{it} = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$.

Proof: The first part of (A.16) follows directly by observing that the covariance stationary process $v_{it}$ is ergodic in variance. Since $g_t = (d_t', f_t')'$ is also a covariance stationary process with absolute summable autocovariances, it follows that

$$\frac{1}{T} \sum_{t=1}^{T} v_{it} g_t' \xrightarrow{p} E\!\left( v_{it} g_t' \right) = 0,$$

where the convergence is uniform in $i$ since the second moments of $v_{it}$ are uniformly bounded in $i$. This establishes the second part of (A.16). The first part of (A.17) can be established using the same arguments. The second part of (A.17) can be established similarly to the first part of (A.16) by noting that $\Sigma_g = E(g_t g_t')$. In the same spirit,

$$\frac{1}{T} \sum_{t=1}^{T} q_t g_t' = \frac{1}{T} \sum_{t=1}^{T} \bar{P}_w' g_t g_t' \xrightarrow{p} E\!\left( \bar{P}_w' g_t g_t' \right),$$

as $(N,T) \xrightarrow{j} \infty$. But $E(\bar{P}_w' g_t g_t') = P' \Sigma_g$ and $\|P' \Sigma_g\| \le \|P\| \|\Sigma_g\| < K$, where $\|\Sigma_g\| < K$ by Assumption 4.1 and $\|P\| < K$ by Assumptions 4.2, 4.4 and 4.5, which completes the proof of the first part of (A.18). Noting that $Q = G\bar{P}_w$, and that $\bar{P}_w \xrightarrow{p} P$, the second part of (A.17) implies

$$\frac{Q' Q}{T} - P' \Sigma_g P \xrightarrow{p} 0,$$

as $(N,T) \xrightarrow{j} \infty$. But, as before, $\|P' \Sigma_g P\| \le \|P\|^2 \|\Sigma_g\| < K$ and it follows that $Q'Q/T = O_p(1)$, as required. To establish the first part of (A.19) note that $\bar{P}_w - P \xrightarrow{p} 0$ as $(N,T) \xrightarrow{j} \infty$, and

$$\lim_{N \to \infty} \Pr\!\left[ \operatorname{rank}\!\left( \bar{P}_w' \bar{P}_w \right) = \operatorname{rank}\!\left( P' P \right) \right] = 1.$$

It follows, using also Theorem 2 of Andrews (1987), that $\bar{\Pi}_i^* - \Pi_i^* \xrightarrow{p} 0$. The remaining results can be established in a similar way, as results (A.16)–(A.18), using ergodicity in mean and variance of covariance stationary series with absolute summable autocovariances and Lemma A.1.
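The rank condition matters because the Moore–Penrose inverse is not a continuous function of its argument. The sketch below illustrates this with two hypothetical 2×2 sequences that both converge to the same rank-one limit: one changes rank in the limit and one does not. The matrices and the function name are illustrative choices made here, not objects from the paper.

```python
import numpy as np

def pinv_gaps(eps):
    """Distance between pseudo-inverses of two perturbed matrices and the limit."""
    limit = np.diag([1.0, 0.0])                    # rank one
    rank_changing = np.diag([1.0, eps])            # rank two for eps > 0
    rank_preserving = np.diag([1.0 + eps, 0.0])    # rank one for every eps

    def gap(a):
        return np.linalg.norm(np.linalg.pinv(a) - np.linalg.pinv(limit), 2)

    return gap(rank_changing), gap(rank_preserving)

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, pinv_gaps(eps))
```

As `eps` shrinks, both matrices approach the limit, but the pseudo-inverse of the rank-changing sequence moves away from the pseudo-inverse of the limit (the gap equals `1/eps`), while the rank-preserving sequence behaves continuously. This is why convergence of the matrices alone is not enough and a rank condition is imposed in addition.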


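LEMMA A.3. Suppose Assumptions 4.1–4.5 hold, $(N,T) \xrightarrow{j} \infty$, and there exist constants $\alpha$ and $K$ such that $0 \le \alpha < 1$, and conditions (4.13) and (4.14) hold. Then,

$$\frac{X_i'(\bar{H}_w - Q)}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \qquad \frac{(Q - \bar{H}_w)' \vartheta_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.23}$$

$$\left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \xrightarrow{p} 0, \tag{A.24}$$

$$\frac{(Q - \bar{H}_w)' \varepsilon_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \quad\text{and}\quad \frac{(Q - \bar{H}_w)' F}{T} \xrightarrow{p} 0. \tag{A.25}$$

If conditions (4.17) and (4.18) hold instead of conditions (4.13) and (4.14), $0 \le \alpha < 1/2$, and the remaining assumptions are unchanged, then

$$\frac{\sqrt{N} X_i'(\bar{H}_w - Q)}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \qquad \frac{\sqrt{N} (Q - \bar{H}_w)' \vartheta_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.26}$$

$$\sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right] \xrightarrow{p} 0, \tag{A.27}$$

$$\frac{\sqrt{N} (Q - \bar{H}_w)' \varepsilon_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \quad\text{and}\quad \frac{\sqrt{N} (Q - \bar{H}_w)' F}{T} \xrightarrow{p} 0. \tag{A.28}$$

Proof: Using the notations in Section 4, we note that $\bar{H}_w = Q + \bar{U}_w^*$, where $\bar{U}_w^* = (0, \bar{U}_w)$, $\bar{U}_w = \sum_{i=1}^{N} w_i U_i$. Also recall that $X_i = G\Pi_i + v_i$. Hence both parts of (A.23) and (A.25) directly follow from results (A.1)–(A.3) of Lemma A.1. However, because the Moore–Penrose inverse is not a continuous function it is not sufficient that

$$\frac{\bar{H}_w' \bar{H}_w}{T} - \frac{Q' Q}{T} = o_p(1), \tag{A.29}$$

for (A.24) to hold. We establish (A.24) in a similar way as Kapetanios et al. (2010). By Theorem 2 of Andrews (1987), (A.29) is sufficient for (A.24), if additionally, as $(N,T) \xrightarrow{j} \infty$,

$$\lim_{(N,T) \xrightarrow{j} \infty} \Pr\!\left[ \operatorname{rank}\!\left( \frac{\bar{H}_w' \bar{H}_w}{T} \right) = \operatorname{rank}\!\left( \frac{Q' Q}{T} \right) \right] = 1, \tag{A.30}$$

where $\operatorname{rank}(A)$ denotes the rank of $A$. But

$$\frac{\bar{H}_w' \bar{H}_w}{T} = \frac{Q' Q}{T} + \frac{Q' \bar{U}_w^*}{T} + \frac{\bar{U}_w^{*\prime} Q}{T} + \frac{\bar{U}_w^{*\prime} \bar{U}_w^*}{T},$$

where

$$\lim_{(N,T) \xrightarrow{j} \infty} \Pr\!\left( \left\| \frac{Q' \bar{U}_w^*}{T} + \frac{\bar{U}_w^{*\prime} Q}{T} + \frac{\bar{U}_w^{*\prime} \bar{U}_w^*}{T} \right\| > \epsilon \right) = 0$$

for all $\epsilon > 0$. Also

$$\operatorname{rank}\!\left( \frac{Q' Q}{T} \right) = m_d + \operatorname{rank}\!\left( \bar{C}_w \right),$$

for all $N$ and $T$, with $\operatorname{rank}(Q'Q/T) \to m_d + \operatorname{rank}(C) \le m_d + m_f$, as $(N,T) \xrightarrow{j} \infty$. Using these results, it is now easily seen that condition (A.30) in fact holds. Hence, the desired result (A.24) follows. Results (A.26)–(A.28) can be established in a similar way as results (A.23)–(A.25).

LEMMA A.4.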

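Suppose Assumptions 4.1–4.6 hold, and $(N,T) \xrightarrow{j} \infty$. Then,

$$\frac{X_i' M_q \varepsilon_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.31}$$

and

$$\frac{X_i' M_q \vartheta_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.32}$$

where $\vartheta_i = (\vartheta_{i1}, \vartheta_{i2}, \ldots, \vartheta_{iT})'$ and $\vartheta_{it} = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$.

Proof: Consider

$$\frac{X_i' M_q \vartheta_i}{T} = \frac{\tilde{X}_i' \vartheta_i}{T} = \frac{1}{T} \sum_{t=1}^{T} \tilde{x}_{it} \vartheta_{it},$$

where $\tilde{X}_i = M_q X_i$. Let $T_N = T(N)$ be any non-decreasing integer-valued function of $N$ such that $\lim_{N\to\infty} T_N = \infty$ and define

$$\kappa_{Nt} = \frac{1}{T_N} \tilde{x}_{it} \vartheta_{it} = \frac{1}{T_N} \tilde{x}_{it} \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}. \tag{A.33}$$

Let $\{\{c_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ be a two-dimensional array of constants and set $c_{Nt} = 1/T_N$ for all $t \in \mathbb{Z}$ and $N \in \mathbb{N}$. We have

$$E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) = E\!\left( \tilde{x}_{it} \tilde{x}_{it}' \vartheta_{it}^2 \right) = E\!\left( \tilde{x}_{it} \tilde{x}_{it}' \right) E\!\left( \vartheta_{it}^2 \right),$$

where the second equality follows from independence of $\tilde{x}_{it}$ and $\vartheta_{it}$. By Assumption 4.6 there exists a constant $K < \infty$ such that $\sup_i \|E(\tilde{x}_{it} \tilde{x}_{it}')\| < K$. Further, using independence of factors $n_{\ell t}$ and $n_{\ell' t}$ for any $\ell \ne \ell'$ and noting that $E(n_{\ell t}^2) = 1$, we have

$$E\!\left( \vartheta_{it}^2 \right) = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell}^2 < K < \infty.$$

It follows that

$$\left\| E\!\left( \frac{\kappa_{Nt}\kappa_{Nt}'}{c_{Nt}^2} \right) \right\| < K < \infty. \tag{A.34}$$

Result (A.34) establishes that $\{\kappa_{Nt}/c_{Nt}\}$ is uniformly bounded in $L_2$ norm, which implies uniform integrability. Using similar arguments as in the proof of Lemma A.1, $\{\kappa_{Nt}\}$ is an $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$, and applying a mixingale weak law (Davidson, 1994, Theorem 19.11) establishes $\sum_{t=1}^{T_N} \kappa_{Nt} \xrightarrow{L_1} 0$, that is $T^{-1} \sum_{t=1}^{T} \tilde{x}_{it} \vartheta_{it} \xrightarrow{L_1} 0$, as $(N,T) \xrightarrow{j} \infty$. This completes the proof of (A.32). Result (A.31) can be established in a similar way, but this time defining $\kappa_{Nt} = T_N^{-1} \tilde{x}_{it} \varepsilon_{it}$ and noting that $\sup_i E(\varepsilon_{it}^2) < K$ by Assumption 4.3.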


LEMMA A.5. Suppose Assumptions 4.1–4.5 hold and $(N,T) \xrightarrow{j} \infty$. Then

$$\frac{X_i' M_q X_i}{T} \xrightarrow{p} \Sigma_{iq} \text{ uniformly in } i \tag{A.35}$$

and

$$\frac{X_i' M_q F}{T} \xrightarrow{p} Q_{if} \text{ uniformly in } i, \tag{A.36}$$

where $\Sigma_{iq}$ is positive definite and given by

$$\Sigma_{iq} = \Sigma_{vi} + \Pi_i^{*\prime} \Sigma_g \Pi_i^* \tag{A.37}$$

and

$$Q_{if} = \Pi_i^{*\prime} \Sigma_g S_f^*, \tag{A.38}$$

in which

$$\Pi_i^* = [I - P(P'P)^{+}P'] \Pi_i, \qquad S_f^* = [I - P(P'P)^{+}P'] S_f, \tag{A.39}$$

$\Pi_i = (A_i', \Gamma_i')'$, $S_f = (0_{m_f \times m_d}, I_{m_f})'$ and $\Sigma_g = E(g_t g_t')$.

Proof: Since $X_i = G\Pi_i + v_i$ then

$$\frac{X_i' M_q X_i}{T} = \frac{v_i' M_q v_i}{T} + \frac{v_i' M_q G \Pi_i}{T} + \frac{\Pi_i' G' M_q v_i}{T} + \frac{\Pi_i' G' M_q G \Pi_i}{T}. \tag{A.40}$$

Consider the first term and note that

$$\frac{v_i' M_q v_i}{T} = \frac{v_i' v_i}{T} - \frac{v_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{Q' v_i}{T} \xrightarrow{p} \Sigma_{vi} \text{ uniformly in } i, \tag{A.41}$$

where the convergence directly follows from Lemma A.2 (the first part of (A.16), the first part of (A.17), and the second part of (A.18)). Next we examine the second and the third elements (the latter is the transpose of the former). We have

$$\frac{v_i' M_q G \Pi_i}{T} = \frac{v_i' G \Pi_i}{T} - \frac{v_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{Q' G \Pi_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.42}$$

where we have used Lemma A.2, in particular the second part of (A.16), the first part of (A.17), and both parts of (A.18). Finally, we examine the last summand on the right side of (A.40). Let $\operatorname{Col}(\bar{P}_w)$ denote a linear space spanned by the column vectors of $\bar{P}_w$ and consider the following decomposition of matrix $\Pi_i$:

$$\Pi_i = \tilde{\Pi}_i + \bar{\Pi}_i^*, \tag{A.43}$$

where $\tilde{\Pi}_i \in \operatorname{Col}(\bar{P}_w)$, and $\bar{\Pi}_i^*$ belongs to the orthogonal complement of the space spanned by the column vectors in $\bar{P}_w$. The decomposition (A.43) is unique. Note that matrix $M_q$ has the property $M_q G \bar{\Pi}_i^* = G \bar{\Pi}_i^*$ and $M_q G \tilde{\Pi}_i = 0$. It follows that

$$\frac{\Pi_i' G' M_q G \Pi_i}{T} = \frac{\bar{\Pi}_i^{*\prime} G' M_q M_q G \bar{\Pi}_i^*}{T} = \bar{\Pi}_i^{*\prime} \frac{G' G}{T} \bar{\Pi}_i^*.$$

Using now the second part of (A.17) yields

$$\frac{\Pi_i' G' M_q G \Pi_i}{T} - \bar{\Pi}_i^{*\prime} \Sigma_g \bar{\Pi}_i^* \xrightarrow{p} 0 \text{ uniformly in } i.$$

But according to Lemma A.2, the first part of (A.19), $\bar{\Pi}_i^* - \Pi_i^* \xrightarrow{p} 0$, uniformly in $i$, and therefore

$$\frac{\Pi_i' G' M_q G \Pi_i}{T} - \Pi_i^{*\prime} \Sigma_g \Pi_i^* \xrightarrow{p} 0 \text{ uniformly in } i. \tag{A.44}$$

Using (A.41), (A.42) and (A.44) in (A.40) establishes (A.35), as desired. $\Sigma_{vi}$ is positive definite by Assumption 4.3 and matrix $\Sigma_g = E(g_t g_t')$ is non-negative definite. It follows that $\Pi_i^{*\prime} \Sigma_g \Pi_i^*$ is non-negative definite. The sum of a positive definite and a positive semi-definite matrix is a positive definite matrix and therefore $\Sigma_{iq} = \Sigma_{vi} + \Pi_i^{*\prime} \Sigma_g \Pi_i^*$ is positive definite. Similarly to the proof of result (A.35), consider

$$\frac{X_i' M_q F}{T} = \frac{v_i' M_q F}{T} + \frac{\Pi_i' G' M_q F}{T} = \frac{v_i' M_q G S_f}{T} + \frac{\Pi_i' G' M_q G S_f}{T}, \tag{A.45}$$

where $F = G S_f$ and $S_f = (0_{m_f \times m_d}, I_{m_f})'$ is the corresponding selection matrix. Using similar arguments as in (A.42) and (A.44), we obtain

$$\frac{v_i' M_q G S_f}{T} \xrightarrow{p} 0 \text{ uniformly in } i \tag{A.46}$$

and

$$\frac{\Pi_i' G' M_q G S_f}{T} - \Pi_i^{*\prime} \Sigma_g S_f^* \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.47}$$

where $\Pi_i^*$ and $S_f^*$ are defined by (A.39). Using (A.46) and (A.47) in (A.45) completes the proof of (A.36).
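Two algebraic facts used in this proof can be checked numerically: the residual maker $M_q$ annihilates $G\tilde{\Pi}_i$ whenever $\tilde{\Pi}_i$ lies in the column space of the matrix that defines $Q$, and adding a positive semi-definite matrix to a positive definite one cannot lower the smallest eigenvalue. The sketch below uses randomly generated matrices of hypothetical dimensions; the variable names mirror the notation above but the numbers carry no meaning beyond illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T, m, r, k = 40, 3, 2, 2                 # hypothetical dimensions

# Residual maker M_q = I - Q (Q'Q)^+ Q' with Q = G P.
G = rng.standard_normal((T, m))
P = rng.standard_normal((m, r))
Q = G @ P
Mq = np.eye(T) - Q @ np.linalg.pinv(Q.T @ Q) @ Q.T

# Any Pi_tilde in Col(P) gives G Pi_tilde in Col(Q), which M_q annihilates.
Pi_tilde = P @ rng.standard_normal((r, k))
resid = Mq @ G @ Pi_tilde

# Positive definite plus positive semi-definite stays positive definite.
A = rng.standard_normal((k, k))
Sigma_v = A @ A.T + np.eye(k)            # positive definite by construction
b = rng.standard_normal((m, 1))
Sigma_g = b @ b.T                        # rank one, positive semi-definite
Pi_star = rng.standard_normal((m, k))
Sigma_iq = Sigma_v + Pi_star.T @ Sigma_g @ Pi_star
min_eig_v = np.linalg.eigvalsh(Sigma_v).min()
min_eig_iq = np.linalg.eigvalsh(Sigma_iq).min()
print(np.abs(resid).max(), min_eig_v, min_eig_iq)
```

The rank-one choice for `Sigma_g` is deliberate: the second term may be singular, yet the sum remains positive definite because its smallest eigenvalue is bounded below by that of `Sigma_v`.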

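LEMMA A.6. Suppose Assumptions 4.1–4.5 hold, $(N,T) \xrightarrow{j} \infty$, and there exist constants $\alpha$ and $K$ such that $0 \le \alpha < 1/2$ and conditions (4.17) and (4.18) hold. Then,

$$\sqrt{N} \frac{X_i' \bar{M}_w X_i}{T} - \sqrt{N} \frac{X_i' M_q X_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.48}$$

$$\sqrt{N} \frac{X_i' \bar{M}_w \varepsilon_i}{T} - \sqrt{N} \frac{X_i' M_q \varepsilon_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.49}$$

$$\sqrt{N} \frac{X_i' \bar{M}_w F}{T} - \sqrt{N} \frac{X_i' M_q F}{T} \xrightarrow{p} 0 \text{ uniformly in } i \tag{A.50}$$

and

$$\sqrt{N} \frac{X_i' \bar{M}_w \vartheta_i}{T} - \sqrt{N} \frac{X_i' M_q \vartheta_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.51}$$

where $\vartheta_i = (\vartheta_{i1}, \vartheta_{i2}, \ldots, \vartheta_{iT})'$ and $\vartheta_{it} = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$.

Proof: We have

$$\begin{aligned}
\frac{\sqrt{N}}{T} \left( X_i' \bar{M}_w X_i - X_i' M_q X_i \right)
&= -\sqrt{N} \frac{X_i' \bar{H}_w}{T} \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} \frac{\bar{H}_w' X_i}{T} + \sqrt{N} \frac{X_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{Q' X_i}{T} \\
&= -\frac{\sqrt{N} X_i' (\bar{H}_w - Q)}{T} \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} \frac{\bar{H}_w' X_i}{T} \\
&\quad + \frac{X_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{\sqrt{N} (Q - \bar{H}_w)' X_i}{T} \\
&\quad - \frac{X_i' Q}{T} \sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right] \frac{\bar{H}_w' X_i}{T}.
\end{aligned} \tag{A.52}$$

We focus on the individual elements on the right side of (A.52). The second part of (A.19), the second part of (A.21) and the first part of (A.26) imply

$$\underbrace{\frac{\sqrt{N} X_i' (\bar{H}_w - Q)}{T}}_{o_p(1)} \underbrace{\left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+}}_{O_p(1)} \underbrace{\frac{\bar{H}_w' X_i}{T}}_{O_p(1)} \xrightarrow{p} 0 \text{ uniformly in } i.$$

The second part of (A.20), the second part of (A.18) and the first part of (A.26) imply

$$\underbrace{\frac{X_i' Q}{T}}_{O_p(1)} \underbrace{\left( \frac{Q' Q}{T} \right)^{+}}_{O_p(1)} \underbrace{\frac{\sqrt{N} (Q - \bar{H}_w)' X_i}{T}}_{o_p(1)} \xrightarrow{p} 0 \text{ uniformly in } i.$$

Finally, the second part of (A.20), the second part of (A.21) and result (A.27) imply

$$\underbrace{\frac{X_i' Q}{T}}_{O_p(1)} \underbrace{\sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right]}_{o_p(1)} \underbrace{\frac{\bar{H}_w' X_i}{T}}_{O_p(1)} \xrightarrow{p} 0 \text{ uniformly in } i,$$

which completes the proof of (A.48). To establish result (A.49), consider

$$\begin{aligned}
\frac{\sqrt{N}}{T} \left( X_i' \bar{M}_w \varepsilon_i - X_i' M_q \varepsilon_i \right)
&= -\frac{\sqrt{N} X_i' (\bar{H}_w - Q)}{T} \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} \frac{\bar{H}_w' \varepsilon_i}{T} \\
&\quad + \frac{X_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{\sqrt{N} (Q - \bar{H}_w)' \varepsilon_i}{T} \\
&\quad - \frac{X_i' Q}{T} \sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right] \frac{\bar{H}_w' \varepsilon_i}{T} \\
&\xrightarrow{p} 0 \text{ uniformly in } i,
\end{aligned} \tag{A.53}$$

where, similarly to the proof of (A.48), Lemmas A.2 and A.3 can be used repeatedly to establish the convergence of the elements on the right side of (A.53).

Results (A.50) and (A.51) can also be established in a similar way. In particular, Lemmas A.2 and A.3 imply

$$\begin{aligned}
\frac{\sqrt{N}}{T} \left( X_i' \bar{M}_w F - X_i' M_q F \right)
&= -\frac{\sqrt{N} X_i' (\bar{H}_w - Q)}{T} \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} \frac{\bar{H}_w' F}{T} \\
&\quad + \frac{X_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{\sqrt{N} (Q - \bar{H}_w)' F}{T} \\
&\quad - \frac{X_i' Q}{T} \sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right] \frac{\bar{H}_w' F}{T} \\
&\xrightarrow{p} 0 \text{ uniformly in } i,
\end{aligned}$$

and

$$\begin{aligned}
\frac{\sqrt{N}}{T} \left( X_i' \bar{M}_w \vartheta_i - X_i' M_q \vartheta_i \right)
&= -\frac{\sqrt{N} X_i' (\bar{H}_w - Q)}{T} \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} \frac{\bar{H}_w' \vartheta_i}{T} \\
&\quad + \frac{X_i' Q}{T} \left( \frac{Q' Q}{T} \right)^{+} \frac{\sqrt{N} (Q - \bar{H}_w)' \vartheta_i}{T} \\
&\quad - \frac{X_i' Q}{T} \sqrt{N} \left[ \left( \frac{\bar{H}_w' \bar{H}_w}{T} \right)^{+} - \left( \frac{Q' Q}{T} \right)^{+} \right] \frac{\bar{H}_w' \vartheta_i}{T} \\
&\xrightarrow{p} 0 \text{ uniformly in } i.
\end{aligned}$$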
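The three-term decomposition used throughout this proof is a purely algebraic identity, so it can be checked on arbitrary matrices. Writing $A = X'H/T$, $C = X'Q/T$, $B = (H'H/T)^{+}$ and $D = (Q'Q/T)^{+}$, the difference $X'M_wX/T - X'M_qX/T$ equals $-(A - C)BA' + CD(C - A)' - C(B - D)A'$. The sketch below verifies this for random matrices of hypothetical dimensions; the common factor $\sqrt{N}$ is omitted because it multiplies both sides equally, and `H` is simply `Q` plus a small perturbation chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k, m = 30, 2, 3                        # hypothetical dimensions
X = rng.standard_normal((T, k))
Q = rng.standard_normal((T, m))
H = Q + 0.1 * rng.standard_normal((T, m))  # H close to Q, for illustration

def residual_maker(Z):
    """I - Z (Z'Z)^+ Z'."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.pinv(Z.T @ Z) @ Z.T

A = X.T @ H / T
C = X.T @ Q / T
B = np.linalg.pinv(H.T @ H / T)
D = np.linalg.pinv(Q.T @ Q / T)

lhs = (X.T @ residual_maker(H) @ X - X.T @ residual_maker(Q) @ X) / T
rhs = -(A - C) @ B @ A.T + C @ D @ (C - A).T - C @ (B - D) @ A.T
print(np.abs(lhs - rhs).max())
```

Each of the three terms on the right isolates one source of discrepancy between the two projections, which is what allows the orders of magnitude from Lemmas A.2 and A.3 to be applied term by term.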

L EMMA A.7. Suppose Assumptions 4.1–4.6 hold, (N , T ) → ∞, and there exist constants α and K such that 0 ≤ α < 1/2 and conditions (4.17) and (4.18) hold. Then, N 1 Xi Mq εi L1 →0 √ T N i=1

(A.54)

N 1 Xi Mq ϑ i p → 0, √ T N i=1

(A.55)

and

where ϑ i = (ϑi1 , ϑi2 , . . . , ϑiT ) and ϑit =

mn (N) =1

λi nt .

Proof: Proof of Lemma A.7 is similar to the proof of Lemma A.4. Let TN = T (N ) be any non-decreasing integer-valued function of N such that limN→∞ TN = ∞. Consider the following two-dimensional vector array {κ Nt } defined by κ Nt =

1 √

TN N

N

xit εit .

i=1

1 ∞ Let {{cNt }∞ t=−∞ }N=1 be two-dimensional array of constants and set cNt = TN for all t ∈ Z and N ∈ N. Using independence of xit , and εj t for any i, j ∈ N, and independence of εit and εj t for any i = j , we have

E

κ Nt κ Nt 2 cNt

=

N 1 xit E εit2 E xit N i=1



C83

and N 1 E κ Nt κ Nt ≤ sup E x x E εit2 < K, it it 2 N i=1 cNt i∈N

(A.56)

xit ) < K by Assumption 4.6. Result (A.56) where supi E(εit2 ) < K by Assumption 4.3, and supi∈N E( xit implies uniform integrability of {κ Nt /cNt }. Since εit is covariance stationary process with absolute summable autocovariances, it follows that array κ Nt is uniformly integrable L1 -mixingale array with respect to the constant array cNt , and using a mixingale weak law yields TN

TN N

1 √

κ Nt =

TN N

t=1

L1

xit εit → 0.

t=1 i=1

This completes the proof of result (A.54). Result (A.55) is established in a similar way. This time, we define κ Nt =

1 √

N

TN N

xit ϑit .

i=1

We have E

κ Nt κ Nt 2 cNt

=

N N 1 xj t E(ϑit ϑj t ). E xit N i=1 j =1

Noting that supi,j ∈N E( xj t ) < K (by Assumption 4.6), and that ϑit = xit

mn (N) =1

λi nt , we obtain

N m N n (N) E κ Nt κ Nt < K λi λj 2 N i=1 j =1 =1 cNt N 2 mn (N) K < λi . N =1 i=1 Using conditions (4.17) and (4.18), and noting that 0 ≤ α < 1/2 imply E κ Nt κ Nt < KN 2α−1 mn (N ) < K. 2 cNt 2 ) is bounded in N ∈ N. Using now the same arguments as in derivation of (A.54), Hence E(κ Nt κ Nt /cNt we have TN

κ Nt =

t=1

1 √

TN N

TN N

L1

xit ϑit → 0,

t=1 i=1

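which completes the proof of result (A.55).

LEMMA A.8. Suppose Assumptions 4.1–4.6 hold, $(N, T) \xrightarrow{j} \infty$, and there exist constants $\alpha$ and $K$ such that $0 \le \alpha < 1$ and conditions (4.13) and (4.14) hold. Then,

$$\frac{X_i' M_w X_i}{T} - \frac{X_i' M_q X_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.57}$$

$$\frac{X_i' M_w F}{T} - \frac{X_i' M_q F}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.58}$$

$$\frac{X_i' M_w \vartheta_i}{T} - \frac{X_i' M_q \vartheta_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i \tag{A.59}$$

and

$$\frac{X_i' M_w \varepsilon_i}{T} - \frac{X_i' M_q \varepsilon_i}{T} \xrightarrow{p} 0 \text{ uniformly in } i, \tag{A.60}$$

where $\vartheta_i = (\vartheta_{i1}, \vartheta_{i2}, \ldots, \vartheta_{iT})'$ and $\vartheta_{it} = \sum_{\ell=1}^{m_n(N)} \lambda_{i\ell} n_{\ell t}$.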

Proof: Results (A.57)–(A.60) can be established in the same way as results (A.48)–(A.51) of Lemma A.6; that is, Lemmas A.2 and A.3 can be used repeatedly to work out the orders of magnitude in probability of the individual elements in (A.57)–(A.60).

LEMMA A.9. Suppose Assumptions 4.1–4.6 hold, $(N, T) \xrightarrow{j} \infty$, and there exist constants $\alpha$ and $K$ such that $0 \le \alpha < 1$ and conditions (4.13) and (4.14) hold. Then,

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w X_i}{T} \upsilon_i \xrightarrow{p} 0, \tag{A.61}$$

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w F}{T} \eta_i \xrightarrow{p} 0 \tag{A.62}$$

and N 1 −1 Xi Mq F p ηi → 0. N i=1 iq T

(A.63)

Proof: Granularity conditions (2.1) and (2.2) imply |wi |
0 for any N follows that λ1 ( t ) tends to infinity at sufficiently large. Note that λ1 ( t ) ≤ N i=1 σii,t where, under Assumption 2.1, σii,t are finite, λ1 ( t ) cannot diverge to infinity at a rate faster than N. To prove the reverse relation, first note that, from the Rayleigh–Ritz theorem,6 ∗ λ1 ( t ) = max vt t vt = v∗ t t vt . vt vt =1

6

See Horn and Johnson (1985, p. 176).


(B.2)

C86 Let w∗t =

A. Chudik, M. H. Pesaran and E. Tosetti √1 v∗ N t

and notice that w∗t satisfies (2.1) and (2.2). Hence, we can rewrite λ1 ( t ) as λ1 ( t ) = N · Var w∗ t zt |It−1 .

(B.3)

It follows that if N −1 λ1 ( t ) ≥ K > 0, then Var(w∗ t zt |It−1 ) ≥ K > 0, i.e. the process is CSD, which proves (ii).

Proof of Theorem 3.1: Using (3.2), the covariance of zt is given by = + ε . where ε is a diagonal matrix with elements σi2 . Since condition (3.7) holds for = 1, 2, . . . , N then 1 = O(N α ), and noting that 1 = ∞ = O(1) by (3.4) then 2 λ1 () ≤ + ε 1 ≤ 1 1 + σmax = O(N α ).

But using (B.1),

(B.4)

Var w zt |It−1 = w w ≤ (w w)λ1 () ≤ (w w)O(N α ),

and when α < 1, we have,

lim Var w zt |It−1 = 0,

N→∞

for any weights w satisfying condition (2.1). It follows that {zit } is CWD, which establishes result (i). Now 2 < K < ∞, suppose that {zit } is CSD. Then, noting that σmax 2 0 < lim N −1 λ1 () ≤ lim N −1 1 1 + lim N −1 σmax ≤ lim N −1 1 1 . N→∞

N→∞

N→∞

N→∞

Given that, by assumption, 1 is bounded in N, it follows that limN→∞ N −1 1 = K > 0, and there exists at least one strong factor in (3.2). To prove the reverse, assume that there exists at least one strong factor in (3.2) (i.e. limN→∞ N −1 1 = K > 0). Noting that7 1 1/2 1/2 λ1 () ≥ λ1 ( ) ≥ √ , N

(B.5)

it follows that limN→∞ N −1 λ1 () = K > 0 and the process is CSD, which establishes result (ii).
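The distinction between weak and strong factors in Theorem 3.1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not part of the proof: it uses a hypothetical single-factor process z_it = gamma_i f_t + eps_it with Var(f_t) = 1 and independent errors with common variance sigma2, and it checks only equal weights w_i = 1/N, which is one admissible granular weight vector. All function names are made up for this example.

```python
# Variance of the equal-weighted cross-section average of
# z_it = gamma_i * f_t + eps_it, with Var(f_t) = 1 and Var(eps_it) = sigma2.
# The process and every name below are illustrative assumptions.

def var_weighted_average(loadings, sigma2=1.0):
    """Return Var(w'z_t) for equal weights w_i = 1/N."""
    n = len(loadings)
    w = 1.0 / n
    common = (w * sum(loadings)) ** 2      # contribution of the common factor
    idiosyncratic = n * w * w * sigma2     # contribution of the independent errors
    return common + idiosyncratic

def strong_loadings(n):
    """Every unit loads on the factor: sum |gamma_i| = N (alpha = 1)."""
    return [1.0] * n

def weak_loadings(n, alpha=0.4):
    """Only about N**alpha units load on the factor (alpha < 1)."""
    k = max(1, int(n ** alpha))
    return [1.0] * k + [0.0] * (n - k)

for n in (100, 10_000, 1_000_000):
    print(n, var_weighted_average(strong_loadings(n)),
          var_weighted_average(weak_loadings(n)))
```

Under these assumptions the variance of the average stays near 1 as N grows when the factor is strong, and shrinks towards 0 when only about N^alpha units load on it, which is the pattern that results (ii) and (i) describe.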

Proof of Theorem 4.1: We prove the theorem in two parts. First, we establish consistency of the CCEP estimator; in the second part, we establish consistency of the CCEMG estimator. Consider

$$\hat{\beta}_P - \beta = \left( \sum_{i=1}^{N} w_i \frac{X_i' M_w X_i}{T} \right)^{-1} \sum_{i=1}^{N} w_i \frac{X_i' M_w (X_i \upsilon_i + F \gamma_i + \vartheta_i + \varepsilon_i)}{T}. \tag{B.6}$$

We focus on the individual elements on the right side of (B.6) below. Lemma A.10 established

$$\left( \sum_{i=1}^{N} w_i \frac{X_i' M_w X_i}{T} \right)^{-1} = O_p(1). \tag{B.7}$$

7 See Bernstein (2005, p. 368, eq. xiv).



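According to result (A.61) of Lemma A.9, we have

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w X_i}{T} \upsilon_i \xrightarrow{p} 0. \tag{B.8}$$

Noting that $\gamma_i$ can be written as $\gamma_i = \gamma_w + \eta_i - \eta_w$ and that $\sum_{i=1}^{N} w_i X_i' M_w = \bar{X}_w' M_w = 0$, and using result (A.62) of Lemma A.9, we obtain

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w F}{T} \gamma_i = \sum_{i=1}^{N} w_i \frac{X_i' M_w F}{T} \eta_i \xrightarrow{p} 0. \tag{B.9}$$

Result (A.59) of Lemma A.8 and result (A.32) of Lemma A.4 imply

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w \vartheta_i}{T} \xrightarrow{p} 0. \tag{B.10}$$

Similarly, result (A.60) of Lemma A.8 and result (A.31) of Lemma A.4 yield

$$\sum_{i=1}^{N} w_i \frac{X_i' M_w \varepsilon_i}{T} \xrightarrow{p} 0. \tag{B.11}$$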

Using (B.7)–(B.11) in (B.6) establishes (4.16). Next we establish consistency of the CCEMG estimator. Consider N N N N 1 −1 Xi Mw εi 1 1 −1 Xi Mw F 1 −1 Xi Mw ϑ i γi + + , υi +

iT

iT

β MG − β = N i=1 N i=1 T N i=1 T N i=1 iT T (B.12)

iT = T −1 Xi Mw Xi . υ i is identically and independently distributed across i with zero mean and where bounded second moments, and therefore N 1 p υ i → 0. N i=1

(B.13)

Results (A.57) and (A.58) of Lemma A.8 imply N N 1 −1 Xi Mq F 1 −1 Xi Mw F p γi − γ i → 0.

iT N i=1 T N i=1 iq T

But Fγ w belongs to the space spanned by column vectors of Q, and therefore Mq Fγ i = Mq F(γ w + ηi − ηw ) = Mq F(ηi − ηw ), where ηw = Op (N −1/2 ). Now using (A.63) of Lemma A.9 it follows that N 1 −1 Xi Mw F p (B.14)

γ i → 0. N i=1 iT T Results (A.57) and (A.59) of Lemma A.8 and result (A.32) of Lemma A.4 imply N 1 −1 Xi Mw ϑ i p → 0.

N i=1 iT T C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

(B.15)



Similarly, results (A.57) and (A.60) of Lemma A.8, and result (A.31) of Lemma A.4 imply N 1 −1 Xi Mw εi p → 0.

N i=1 iT T

(B.16)

Using (B.13)–(B.16) in (B.12) establish (4.15).

Proof of Theorem 4.2: We prove the theorem in two parts. First, we establish asymptotic distribution of the CCEP estimator and in the second part we establish asymptotic distribution of the CCEMG estimator. Consider N −1/2 −1 N N X Mw Xi 1 Xi Mw (Xi vi + Fγ i + ϑ i + ε i ) i 2 , βP − β = wi wi w i √ T T N i=1 i=1 i=1 (B.17) −1/2 √ N 2 where w i = N wi , and, by granularity conditions (2.1) and (2.2) there exists a real constant i=1 wi K < ∞ (independent of i and N), such that N −1/2 √ < K. | (B.18) wi | = N wi wi2 i=1 We focus on the individual terms on the right side of (B.17) below. Results (A.48) of Lemma A.6 and result (A.35) of Lemma A.5 imply Xi Mw Xi p → iq uniformly in i, T and therefore for any weights {wi } satisfying granularity conditions (2.1) and (2.2) we have N

Xi Mw Xi p − wi iq → 0, T i=1 N

wi

i=1

j ∗ as (N , T ) → ∞. The limit limN→∞ N i=1 wi iq = exists by Assumption 4.6, and furthermore, by the ∗ same assumption, is non-singular. It follows that N i=1

X Mw Xi wi i T

−1 p

→ ∗−1 ,

(B.19)

j

as (N , T ) → ∞. Next we focus on the individual elements in the second summation on the right side of equation (B.17). Noting that γ i can be written as γ i = γ w + ηi − ηw , and that N i=1 wi Xi Mw = Xw Mw = 0, we have N N 1 1 w i Xi Mw Fγ i = √ w i Xi Mw Fηi . √ N i=1 N i=1

(B.20)

Equations (B.18), (B.20) and result (A.50) of Lemma A.6 imply N N 1 Xi Mw F 1 Xi Mq F p γi − √ ηi → 0. w i w i √ T T N i=1 N i=1

(B.21)




Equation (B.18) and result (A.51) of Lemma A.6 imply N N √ Xi Mw ϑ i √ Xi Mq ϑ i 1 1 p w i N w i N → 0, − N i=1 T N i=1 T and, using result (A.55) of Lemma A.7, we have N 1 Xi Mw ϑ i p w i → 0, √ T N i=1

(B.22)

j

as (N , T ) → ∞. Similarly, result (A.49) of Lemma A.6 and result (A.54) of Lemma A.7 establish √ Xi Mw εi p N → 0 uniformly in i, T and therefore (noting that w i is uniformly bounded in i, see (B.18)), N N √ Xi Mw εi 1 1 Xi Mw εi p = w i w i N → 0. √ T N i=1 T N i=1

(B.23)

Using (B.19), (B.21), (B.22), (B.23) and result (A.48) of Lemma A.6 in (B.17), we obtain N

−1/2 wi2

i=1

N d 1 Xi Mq (Xi vi + Fηi ) . β P − β ∼ ∗−1 √ w i T N i=1

Assumption 4.6 is sufficient for the bounded second moments of Xi Mq Xi /T and Xi Mq F/T . In particular, 4 ) < K, for s = 1, 2, . . . , k, is sufficient for the bounded second moment of Xi Mq Xi /T . To condition E( xist see this, note that T 1 Xi Mq Xi = xit xit , T T t=1

and, by Minkowski’s inequality, T 1 xist xipt T t=1

L2

≤

T 1 xist xipt , L2 T t=1

2 2 for any s, p = 1, 2, . . . , k. But by the Cauchy–Schwarz inequality, we have E( xist xipt )≤ 4 4 )E( xipt )]1/2 , and therefore bounded fourth moments of the elements of xit are sufficient for [E( xist the existence of an upper bound for the second moments of Xi Mq Xi /T . Similar arguments can be used to establish that Xi Mq F/T has bounded second moments. It therefore follows from Lemma 4 of Pesaran (2006) and Lemma A.5 that

N

−1/2 wi2

d β P − β → N (0, P ) ,

i=1 j

as (N , T ) → ∞, where P = ∗−1 R∗ ∗−1 , C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

(B.24)



in which

∗ = lim

N→∞

N

wi iq ,

i=1

N 1 2 w i iq β iq + Qif γ Qif , N→∞ N i=1

R∗ = lim

β = Var(β i ), γ = Var(γ i ), iq is defined in Assumption 4.6 and Qif is defined by (A.38). Next, we consider asymptotic distribution of the CCEMG estimator. Consider √

N N N 1 1 −1 Xi Mw F 1 −1 Xi Mw ϑ i γi + √ N β MG − β = √ υi + √

iT

iT T T N i=1 N i=1 N i=1 N 1 −1 Xi Mw εi ,

iT +√ T N i=1

(B.25)

iT = T −1 Xi Mw Xi . It follows from result (A.48) of Lemma A.6 and result (A.35) of Lemma A.5 where that iT − iq = op (N −1/2 ) uniformly in i.

(B.26)

Using (B.26), result (A.51) of Lemma A.6 and result (A.55) of Lemma A.7, we have N 1 −1 Xi Mw ϑ i p → 0.

iT √ T N i=1

(B.27)

Similarly, (B.26), result (A.49) of Lemma A.6 and result (A.54) of Lemma A.7 imply N 1 −1 Xi Mw εi p → 0.

iT √ T N i=1

(B.28)

Noting that Fγ w belongs to the linear space spanned by the column vectors of Q = GPw , we have Mq Fγ w = 0, and Xi Mq Fγ i = Xi Mq F(ηi − ηw ). Using results (A.48) and (A.50) of Lemma A.6 and noting that −1 N Xi Mq F 1 Xi Mq Xi p ηw → 0, √ T T N i=1 yields −1 N N 1 −1 Xi Mw F Xi Mq F 1 Xi Mq Xi p γi − √ ηi → 0.

iT √ T T T N i=1 N i=1

(B.29)

Using (B.27)–(B.29) in (B.25) yields −1 N N d 1 Xi Mq F 1 Xi Mq Xi ηi . N β MG − β ∼ √ υi + √ T T N i=1 N i=1 √ β MG − β) → N (0, MG ), where It now follows that N( N 1 −1 −1 MG = β + lim iq Qif γ Qif iq , N→∞ N i=1 √

(B.30)

in which β = Var(β i ), γ = Var(γ i ), iq is defined in Assumption 4.6 and Qif is defined by (A.38). C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

The Econometrics Journal

Econometrics Journal (2011), volume 14, pp. 1–24. doi: 10.1111/j.1368-423X.2010.00320.x

Quantile regression models with factor-augmented predictors and information criterion

TOMOHIRO ANDO† AND RUEY S. TSAY‡

†Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama-shi, Kanagawa 223-8526, Japan. E-mail: [email protected]

‡Booth School of Business, University of Chicago, 5807 S. Woodlawn Avenue, Chicago, IL 60637, USA. E-mail: [email protected]

First version received: February 2009; final version accepted: April 2010

Summary For situations with a large number of series, N, each with T observations and each containing a certain amount of information for prediction of the variable of interest, we propose a new statistical modelling methodology that first estimates the common factors from a panel of data using principal component analysis and then employs the estimated factors in a standard quantile regression. A crucial step in the model-building process is the selection of a good model among many possible candidates. Taking into account the effect of estimated regressors, we develop an information-theoretic criterion. We also investigate the criterion when there are no estimated regressors. Results of Monte Carlo simulations demonstrate that the proposed criterion performs well in a wide range of situations.

Keywords: Approximate factor models, Generated regressors, Information-theoretic approach, Panel data, Quantiles.

1. INTRODUCTION

Quantile regression of Koenker and Bassett (1978) is a comprehensive statistical methodology for estimating models of conditional quantile functions. By complementing the classical linear regression that focuses on the conditional mean, quantile regression enables us to estimate the effect of covariates not only on the centre but also on the upper or lower tail of the response distribution. Indeed, quantile regression has been widely used in empirical research and its theoretical properties have been extensively investigated in the literature when N is fixed. See, for example, Powell (1986), Koenker and Portnoy (1987), Chaudhuri (1991), Portnoy (1991), Gutenbrunner and Jureckova (1992), Hendricks and Koenker (1992), Donald and Paarsch (1993), Buchinsky (1994), Chamberlain (1994), Chaudhuri et al. (1997), Portnoy and Koenker (1997), Knight (1998), Abrevaya (2001), Koenker and Geling (2001) and Chernozhukov and Hansen (2004, 2006), among others. On the other hand, recent developments in information technology permit the collection of high-dimensional data and provide a data-rich environment. For real applications, the use of many predictors often leads to high dependence among the predictors,

C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society. Published by Blackwell Publishing Ltd, 9600



especially if the number of predictors exceeds the number of observations; that is, N > T . In such cases, dimension reduction techniques are required to adequately capture the main feature of the data structure. If variations in a set of variables are driven by a small number of latent primitive factors, the factor method is an attractive tool for constructing a proxy of the predictors. For further information concerning the factor method, we refer to the literature of dynamic factor models in Geweke (1977), Forni et al. (2000, 2001, 2004), Angelini et al. (2001), Artis et al. (2001), Forni and Lippi (2001), Stock and Watson (2002b), Bernanke and Boivin (2003), Koop and Potter (2004), Boivin and Ng (2005), Banerjee and Marcellino (2006) and the literature of static factor models in Connor and Korajczyk (1986, 1988), Forni and Reichlin (1998), Jones (2001), Bai and Ng, (2002, 2006), Stock and Watson (2002a), Bai (2003) and Boivin and Ng (2003). In situations with a large number of series, N, each with T observations and containing a certain amount of information for prediction of the response variable, we consider a statistical modelling methodology that first estimates the common factors from a panel of data by the principal component analysis and then applies the estimated factors to standard quantile regression. A related model, called the factor-augmented vector autoregression, was considered by Bernanke et al. (2005) for proper identification of monetary transmission mechanisms. A factor-augmented regression model using a least-squares approach was introduced by Stock and Watson (1999). However, as pointed out by Bai and Ng (2006), modelling procedures for factor-augmented regression are not well understood. One of the main reasons is that the factor-augmented regression model involves estimated factors. 
If the estimation error for a factor is not negligible, then it is important to investigate the statistical properties of the factor-augmented regression model by taking into account the effect of estimated regressors. Stock and Watson (2002a) showed the consistency of least-squares estimates obtained from factor-augmented regressions. The asymptotic normality of the estimates was investigated by Bai and Ng (2006), who also presented analytical formulas for prediction intervals that might be useful in forecasting. In the present study we make use of several ideas of the aforementioned articles to propose an information-theoretic criterion for model selection. We allow factors in the factor-augmented regression to have certain dynamic structure. However, we do not allow the dynamic to enter into the panel data directly, so that the relationship between the panel data and the factor is still static (see e.g. Bai and Ng, 2002). A crucial issue in constructing quantile regression models with factor-augmented predictors is the selection of a good model among many possible candidates. In particular, the problem of choosing the optimal number of factors that best explain the response variable becomes important. Taking into account the effect of estimated regressors, we develop an informationtheoretic criterion (Akaike, 1973, 1974) under model misspecification √ for both distributional and structural assumptions. Under the conditions T 5/8 /N →√0 and N /T → 0, we show that the bias term involved in the quantile regression is min{N , T T } consistent. The remainder of the paper is organized as follows. Section 2 proposes the quantile regression model with factor-augmented predictors. The assumptions, model structure and parameter estimation procedures are also described. Section 3 develops the information-theoretic criterion and provides some remarks on the proposed criterion. Section 4 investigates the performance of the proposed methodology using Monte Carlo simulations. 
In Section 5, we consider a real application. Some concluding remarks are given in Section 6. The proof of the main result is given in the Appendix.



2. MODEL AND INFERENCE

2.1. Model and assumptions

Factor models are useful econometric tools that explain the variations in a large number of time series via a small number of common factors. Suppose that a set of T observations $X = \{x_1, \ldots, x_T\}$ is generated by the following r-factor model:

$$x_t = \Lambda f_t + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{2.1}$$

where $x_t = (x_{1t}, \ldots, x_{Nt})'$ is an N-dimensional observable random vector, $f_t = (f_{1t}, \ldots, f_{rt})'$ is the r-dimensional vector of latent factors, and $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})'$ is an N-dimensional vector of random noises with mean 0 and covariance satisfying the condition described below (see Assumption 2.3). The N × r matrix $\Lambda = (\lambda_1, \ldots, \lambda_N)'$ contains the factor loadings. The full set of observations can be expressed in matrix form as $X = F\Lambda' + E$, where $X = (x_1, \ldots, x_T)'$, $F = (f_1, \ldots, f_T)'$ and $E = (\varepsilon_1, \ldots, \varepsilon_T)'$. Let $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ be the usual norm of a matrix A, where tr denotes the trace of a square matrix. Similar to Bai and Ng (2002), we adopt the following assumptions for factor models:

ASSUMPTION 2.1. Common factors: the factors satisfy $E(f_t) = 0$, $E[\|f_t\|^4] < \infty$ and $T^{-1} \sum_{t=1}^{T} f_t f_t' \to \Sigma_F$ as $T \to \infty$, where $\Sigma_F$ is an r × r positive-definite matrix. Here $E[\cdot]$ denotes the expectation operator.

ASSUMPTION 2.2. Factor loadings: the factor loading matrix satisfies $\|\lambda_i\| < \infty$ and $\|N^{-1} \Lambda'\Lambda - \Sigma_\Lambda\| \to 0$ as $N \to \infty$, where $\Sigma_\Lambda$ is an r × r positive-definite matrix.

ASSUMPTION 2.3. Noise term: the noise term $\varepsilon_t$ of the model in equation (2.1) has zero mean, but may have cross-sectional dependence and heteroscedasticity. Furthermore, there exists a positive constant $C < \infty$ such that for all N and T:
(a) $E[|\varepsilon_{it}|^8] < C$ for all i and t;
(b) $N^{-1} E[\varepsilon_s' \varepsilon_t] = \gamma(N, s, t)$, $|\gamma(N, s, s)| < C$ for all s, t (s, t = 1, ..., T) and $T^{-1} \sum_{s=1}^{T} \sum_{t=1}^{T} |\gamma(N, s, t)| < C$;
(c) $E[\varepsilon_{it} \varepsilon_{jt}] = \tau_{ij,t}$ with $|\tau_{ij,t}| \le |\tau_{ij}|$ for some $\tau_{ij}$ for all t (t = 1, ..., T) and $N^{-1} \sum_{i=1}^{N} \sum_{j=1}^{N} |\tau_{ij}| < C$;
(d) $E[\varepsilon_{it} \varepsilon_{js}] = \tau_{ij,ts}$ and $(NT)^{-1} \sum_{s=1}^{T} \sum_{t=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N} |\tau_{ij,ts}| < C$; and
(e) for every (s, t), $E[|N^{-1/2} \sum_{i=1}^{N} (\varepsilon_{is} \varepsilon_{it} - E[\varepsilon_{is} \varepsilon_{it}])|^4] < C$.

ASSUMPTION 2.4. Moments and Central Limit Theorem.
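There exists a $C < \infty$ such that for all N and T:
(a) for each t, $E\|N^{-1/2} T^{-1/2} \sum_{s=1}^{T} \sum_{k=1}^{N} f_s [\varepsilon_{ks} \varepsilon_{kt} - E(\varepsilon_{ks} \varepsilon_{kt})]\|^2 < C$;
(b) the r × r matrix satisfies $E\|N^{-1/2} T^{-1/2} \sum_{t=1}^{T} \sum_{k=1}^{N} f_t \lambda_k' \varepsilon_{kt}\|^2 < C$;
(c) for each t, as $N \to \infty$, $N^{-1/2} \sum_{i=1}^{N} \lambda_i \varepsilon_{it} \to N(0, Q(t))$, where

$$Q(t) = \lim_{N \to \infty} N^{-1} \sum_{i=1}^{N} \sum_{j=1}^{N} E[\lambda_i \lambda_j' \varepsilon_{it} \varepsilon_{jt}];$$

(d) for each i, as $T \to \infty$, $T^{-1/2} \sum_{t=1}^{T} f_t \varepsilon_{it} \to N(0, \Phi(i))$, where

$$\Phi(i) = \operatorname*{plim}_{T \to \infty} T^{-1} \sum_{s=1}^{T} \sum_{t=1}^{T} E[f_t f_s' \varepsilon_{is} \varepsilon_{it}].$$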




ASSUMPTION 2.5. The eigenvalues of the r × r matrix $(\Sigma_\Lambda \cdot \Sigma_F)$ are distinct.

ASSUMPTION 2.6. The factors $f_t$, idiosyncratic errors $\varepsilon_t$ and factor loadings $\lambda_i$ are three mutually independent groups. However, we allow dependence within each group.

Suppose we have a large number of predictors $x_t \in R^N$ and another smaller set of predetermined variables $w_t \in R^p$ available. The predetermined variables can be specified by prior information, past experience or substantive theory. In many applications, they may also include some lagged values of the dependent variable to take care of serial dependence. We consider the regression model:

$$y_t = \alpha_x' x_t + \beta' w_t + e_t,$$

where $\alpha_x$ and $\beta$ are the N- and p-dimensional coefficient parameters, and $e_t$ is the noise term. If we replace $y_t$ and $e_t$ by $y_{t+h}$ and $e_{t+h}$, respectively, where h is a positive integer denoting the forecast horizon, the model becomes the h-step-ahead forecasting model. This regression model often requires strict restrictions on the number of parameters to avoid overfitting, especially when the dimension of $x_t \in R^N$ is large. Since each predictor in $x_t$ contains useful information about $y_t$, using a small subset of $x_t$ runs into the problem of omitted variables. Alternative approaches must be sought to overcome the curse of dimensionality. In the literature, dynamic factor models have been proposed as a possible solution to this forecasting problem (e.g. Stock and Watson, 1999; Forni et al., 2000). The factor-augmented regression model assumes the form:

$$y_t = \alpha' f_t + \beta' w_t + e_t, \tag{2.2}$$

where α and β are the r- and p-dimensional coefficient parameters. The vector f t of common factors is unobservable, but can be estimated by the factor analysis method. To summarize, instead of f t , we observe a panel of data X that contains information about f t . Equations (2.1) and (2.2) constitute the diffusion-index forecasting model of Stock and Watson (2002a). A SSUMPTION 2.7. Noise component and predictors: the noise term et has zero mean (E[et+h |I t ] = 0), given all information It available up to time t. Furthermore, E[et 4 ] < C and et are independent of the idiosyncratic errors εis for all i and s. We also require (a) that z t and T −1 Tt=1 z t z t − zz → 0 as T → ∞; (b) that the quantity T −1/2 Tt=1 z t et is asymptotically normal with mean 0 and variance zz,e , where zz,e = plimT −1 Tt=1 et2 z t z t ; and (c) the predictor wt can be either predetermined or random. If it is a set of random variables, its covariance matrix is w . Assumptions 2.1 and 2.2 together imply r common factors. Assumption 2.3 allows for heteroscedasticity, weak time-series and cross-sectional dependence in the idiosyncratic component. These assumptions are more general than a traditional factor model. As pointed out by Bai (2003), Assumptions 2.4(a) and 2.4(b) are not stringent because the sums in 2.4(a) and 2.4(b) involve zero mean random variables. Also, various mixing processes satisfy 2.4(c) and 2.4(d) (Bai, 2003). Assumption 2.5 ensures that the common components are identifiable. Assumption 2.6 is standard in factor analysis, and Assumption 2.7 is standard for regression analysis. These assumptions are similar to those of Stock and Watson (2002a). In this paper we consider a quantile regression model with factor-augmented predictors. Quantile regression has many important applications in scientific fields. 
Contrast to the classical theory of factor-augmented regression models, where the conditional expectation of the response variable is the main focus, the quantile regression estimates the τ th conditional quantile of yt C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.


given $x_t$ and $w_t$,

$$q_\tau(y_t \mid z_t; \gamma) = \alpha(\tau)' f_t + \beta(\tau)' w_t \equiv \gamma(\tau)' z_t, \qquad t = 1, 2, \ldots, T, \tag{2.3}$$

where $\gamma(\tau)$ is a vector of coefficients that depend on the quantile $\tau$. If we set $\tau = 0.5$, the model reduces to the conditional median regression, which is more robust to outliers than the conditional mean regression.

To motivate the proposed research, we consider a simple example. Risk management has attracted much attention lately because of the recent worldwide financial crisis. Value at Risk (VaR) is a valuable tool for assessing the market risk in risk management. From a statistical viewpoint, VaR is simply an upper quantile of the loss function of a financial position; see, for instance, Tsay (2005, ch. 7). In practice, a tail probability of 1% is commonly used, and the corresponding VaR can be estimated by either the 99th or the 1st percentile of the underlying daily stock returns, depending on whether one holds a short or a long position in the stock. Suppose that we hold a position in stock A, which belongs to the S&P 500 index; that is, Company A is one of the top 500 public companies in the U.S. Our goal then is to estimate the VaR of the stock. In this particular case, $x_t$ may consist of the daily returns of all stocks in the S&P 500 index except stock A, and $w_t$ may consist of the daily returns of the three common factors of the well-known Fama and French model; see Fama and French (1993). More specifically, $w_t$ contains the daily returns of the following three portfolios: (1) the size portfolio, denoted by SMB, (2) the book-to-market portfolio, denoted by HML, and (3) the market portfolio with excess return ER. Here the dimension of $x_t$ is 499, whereas that of $w_t = (\mathrm{SMB}_t, \mathrm{HML}_t, \mathrm{ER}_t)'$ is 3. Of course, $w_t$ may also contain changes in some macroeconomic variables and some past returns of stock A, for example $y_{t-1}$ and $y_{t-2}$. From this discussion, it is clear that in this simple example the dimension of $x_t$ is much larger than that of $w_t$, and it would not be realistic to use $x_t$ directly in the quantile regression model.
Our proposed approach is to construct common factors from x t and use the estimated common factors along with wt in the quantile regression analysis to estimate VaR. 2.2. Inference with estimated factors The inference procedure consists of two stages. In the first stage, common factors are estimated from the panel of data by the method of asymptotic principal component analysis. In the second, the estimated factors are used in the standard quantile regression. To estimate the factor model in (2.1), we adopt the method of asymptotic principal component analysis studied by Connor and Korajczyk (1986), Stock and Watson (1998) and Forni r of F r and r can be obtained by minimizing tr{(X − et al. (2000). The estimates F r and F r r ) (X − F r r )}/NT , where the T × r matrix F r and the N × r matrix r are subject to the normalization condition F r F r /T = I r and r r /N = I r , where I r is the r × r identity matrix. Specifically, the asymptotic principal component estimates are: √ √ r = N U r , r Fr = V (2.4) Dr / N , r and V r are the first r columns of the matrices U and V where the T × r and N × r matrices U in the singular value decomposition of X such that X = U DV , where D is a diagonal matrix consisting of the ordered singular values of X, and columns of U and V satisfy the orthogonality Dr = diag{d1 , d2 , . . . , dr } consists of the first restrictions V r V r = I r and U r U r = I r . In (2.4), r singular values in D satisfying d1 > · · · > dr > 0. C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.
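The first stage described above can be sketched with a singular value decomposition. This is a minimal illustration rather than the authors' implementation: the function name `estimate_factors`, the simulated panel and the scaling are assumptions. The sketch imposes the normalization $F_r' F_r / T = I_r$ stated in the text; the scaling of the loadings follows from that choice and may differ from (2.4) by a constant.

```python
import numpy as np

def estimate_factors(X, r):
    """Principal-component estimates of r factors from a T x N panel X.

    Illustrative scaling: F_hat'F_hat / T = I_r; the loadings absorb the
    singular values so that F_hat @ L_hat.T is the best rank-r fit to X.
    """
    T, N = X.shape
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) Vt
    F_hat = np.sqrt(T) * U[:, :r]                       # T x r estimated factors
    L_hat = Vt[:r, :].T * d[:r] / np.sqrt(T)            # N x r estimated loadings
    return F_hat, L_hat

# Simulated panel from a 2-factor model (purely hypothetical data).
rng = np.random.default_rng(0)
T, N, r = 200, 50, 2
F = rng.standard_normal((T, r))
L = rng.standard_normal((N, r))
X = F @ L.T + 0.1 * rng.standard_normal((T, N))

F_hat, L_hat = estimate_factors(X, r)
rel_err = np.linalg.norm(X - F_hat @ L_hat.T) / np.linalg.norm(X)
print(rel_err)
```

The estimated factors are identified only up to an invertible rotation, so `F_hat` should be compared with the simulated `F` through the fitted common component rather than column by column.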



Putting the estimated factors into the factor-augmented regression model in (2.3) yields:

$$q_\tau(y_t \mid \hat{z}_t; \gamma(\tau)) = \alpha(\tau)' \hat{f}_t + \beta(\tau)' w_t \equiv \gamma(\tau)' \hat{z}_t, \qquad t = 1, 2, \ldots, T, \tag{2.5}$$

where $\gamma(\tau) = (\alpha'(\tau), \beta'(\tau))'$, $\hat{z}_t = (\hat{f}_t', w_t')'$ and $\hat{f}_t$ is the r-dimensional vector of estimated common factors. The unknown parameters $\gamma(\tau)$ are estimated by maximizing the log-likelihood function:

$$\ell_\tau(y_T; \gamma(\tau), \hat{Z}) = \frac{1}{T} \sum_{t=1}^{T} \log\left[ \tau(1-\tau) \exp\left\{ -\rho_\tau\left( y_t - \hat{z}_t' \gamma(\tau) \right) \right\} \right], \tag{2.6}$$

with $\rho_\tau(u) = u(\tau - I(u < 0))$, $y_T = (y_1, \ldots, y_T)'$ and $\hat{Z} = (\hat{z}_1, \ldots, \hat{z}_T)'$. Thus the log-likelihood function is formed by independently distributed asymmetric Laplace densities (Yu and Moyeed, 2001). The estimate $\hat{\gamma}$ is obtained as a solution of $\partial \ell(\gamma)/\partial \gamma = 0$. This is a linear optimization problem. Koenker and Bassett (1978) showed that a solution is

$$\hat{\gamma}(\tau) = \hat{Z}(h_\tau)^{-1} y(h_\tau), \tag{2.7}$$

where $h_\tau$ is a p-element index subset from the set {1, 2, ..., n}, $\hat{Z}(h_\tau)$ refers to the indexed rows in $\hat{Z} = (\hat{z}_1, \ldots, \hat{z}_n)'$, and $y(h_\tau)$ refers to the elements in y selected by $h_\tau$. The indexing notation is also used in Koenker (2005). Replacing the unknown parameter $\gamma(\tau)$ by $\hat{\gamma}(\tau)$, we obtain the estimated quantile regression model with factor-augmented predictors:

$$q_\tau(y_t \mid \hat{z}_t; \hat{\gamma}(\tau)) = \hat{\gamma}(\tau)' \hat{z}_t, \qquad t = 1, 2, \ldots, T.$$

After constructing the model, its goodness of fit should be assessed from a predictive point of view. In the next section we propose an information criterion for this purpose.
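The check function $\rho_\tau(u) = u(\tau - I(u < 0))$ that drives the estimation in this section can be illustrated in the simplest case of a model with only a constant. The sketch below is a toy example under that assumption: the data and the helper names are hypothetical, and the search runs over the sample points only, whereas an actual quantile regression would solve the linear programme mentioned in the text.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def average_check_loss(y, fitted, tau):
    """Average check loss of fitted quantiles against observations."""
    return sum(check_loss(yi - qi, tau) for yi, qi in zip(y, fitted)) / len(y)

def best_constant_quantile(y, tau):
    """Sample point q that minimizes the average check loss of a constant fit."""
    return min(y, key=lambda q: average_check_loss(y, [q] * len(y), tau))

y = [1.0, 2.0, 3.0, 4.0, 100.0]                # toy data with one outlier
median_fit = best_constant_quantile(y, 0.5)
upper_fit = best_constant_quantile(y, 0.9)
print(median_fit, upper_fit)
```

With $\tau = 0.5$ the minimizer is the sample median, which the outlier does not move; with $\tau = 0.9$ the asymmetric weights pull the fit into the upper tail.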

3. MODEL SELECTION

In the previous section, we constructed the quantile function $q_\tau(y_t \mid \hat{z}_t; \hat{\gamma}(\tau))$. In this section, we study the adequacy of $q_\tau(y_t \mid \hat{z}_t; \hat{\gamma})$ as an estimate of the true quantile function $q_\tau(y_t)$ from a predictive point of view. To this end, we propose a new model selection criterion.

3.1. Information-theoretic approach

Suppose that the responses $y_1, \ldots, y_n$ are generated independently from an unknown true distribution $G(y)$ with probability density $g(y)$. We regard $g(y)$ as a target probability function that generates the data. In real applications, it is difficult to obtain precise information on the structure of a system or a process from a finite number of observed data. Therefore, we use a parametric family of distributions with density $f(y_t \mid \hat{z}_t, \gamma)$ to approximate the true model $g(y)$. Here $\gamma$ is estimated by the maximum likelihood method and we denote the estimate by $\hat{\gamma}_{MLE}$. This situation is commonly considered in model selection studies (Konishi and Kitagawa, 1996; Hansen, 2005; Ando, 2007).

Suppose that $u_T = (u_1, \ldots, u_T)'$ are replicates of the response variable $y_T$ drawn from $g(u)$. To assess the closeness of $q_\tau(y_t \mid \hat{z}_t; \hat{\gamma})$ to $q_\tau(y_t)$, the deviation of $q_\tau(y_t \mid \hat{z}_t; \hat{\gamma})$ from the true quantile function $q_\tau(y_t)$ is measured using the expected log-likelihood:

$$\eta_\tau(G; \hat{\gamma}, \hat{Z}) := \int \ell_\tau(u_T; \hat{\gamma}, \hat{Z}) \, dG(u_T), \tag{3.1}$$

where $dG(u)$ is the Lebesgue measure with respect to the true probability density $g(u)$. The best model is chosen by maximizing the expected log-likelihood function in (3.1) among different candidate models. Here the optimal number of factors is defined as the value r that maximizes the expected log-likelihood. We would like to point out that one might consider using other utility functions. However, the maximization of the expected log-likelihood function in (3.1) is a natural approach in the information-theoretic framework (Akaike, 1973). Thus, from a 'predictive' point of view, the optimal number of factors can be determined. A similar approach was employed by Ando and Tsay (2009a) for model selection in generalized linear models with factor-augmented predictors.

Note that the expected log-likelihood depends on the unknown true distribution $G(u)$ and on the observed data $y_1, \ldots, y_T$ taken from $G(y)$. Thus, we must construct an estimator of the expected log-likelihood in (3.1). A natural estimate of the expected log-likelihood is the sample-based log-likelihood

$$\eta_\tau(\hat{G}; \hat{\gamma}, \hat{Z}) = \int \ell_\tau(y; \hat{\gamma}, \hat{Z}) \, d\hat{G}(y) = \log[\tau(1-\tau)] - \frac{1}{T} \sum_{t=1}^{T} \rho_\tau\left( y_t - \hat{z}_t' \hat{\gamma}(\tau) \right), \tag{3.2}$$

which is formally obtained by replacing the unknown distribution $G(u)$ in (3.1) by the empirical distribution $\hat{G}(y)$, putting mass $1/T$ on each observation $y_t$. The log-likelihood $\eta_\tau(\hat{G}; \hat{\gamma})$ generally has a positive bias as an estimator of the expected log-likelihood $\eta_\tau(G; \hat{\gamma})$, because the same data are used both to estimate the parameters of the model and to evaluate the expected log-likelihood. Therefore, a bias correction of the log-likelihood should be considered. The bias $b_\tau(G)$ of the log-likelihood in estimating the expected log-likelihood $\eta_\tau(G; \hat{\gamma})$ is given by

$$
b_\tau(G) := \int \big[\eta_\tau(\hat{G}; \hat{\gamma}, \hat{Z}) - \eta_\tau(G; \hat{\gamma}, \hat{Z})\big]\, dG(\mathbf{y}_T).
$$

If the bias $b_\tau(G)$ can be estimated by an appropriate procedure, the bias-corrected log-likelihood is given by $\eta_\tau(\hat{G}; \hat{\gamma}, \hat{Z}) - \hat{b}_\tau(G)$. Multiplying by $-2$ gives the general form of an information criterion (Akaike, 1974),

$$
IC = -2\{\eta_\tau(\hat{G}; \hat{\gamma}, \hat{Z}) - \hat{b}_\tau(G)\}, \tag{3.3}
$$

where $\hat{b}_\tau(G)$ is an estimator of the bias $b_\tau(G)$. The first term on the right-hand side of (3.3) measures the goodness of fit of the model and the second term is a penalty that measures the complexity of the statistical model. The remaining problem is how to construct an estimator of the bias.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society.
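The sample-based log-likelihood (3.2) and the generic criterion (3.3) can be sketched numerically as follows. This is a minimal illustration, assuming the usual check function $\rho_\tau(u) = u(\tau - 1\{u < 0\})$; the function names, the toy data and the value passed as `bias_hat` are hypothetical and are not taken from the paper.

```python
import math

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_log_likelihood(y, z, gamma, tau):
    """Sample-based log-likelihood in the form of (3.2):
    log[tau(1 - tau)] - (1/T) * sum_t rho_tau(y_t - z_t' gamma)."""
    T = len(y)
    total = 0.0
    for y_t, z_t in zip(y, z):
        fitted = sum(g * v for g, v in zip(gamma, z_t))  # z_t' gamma
        total += check_loss(y_t - fitted, tau)
    return math.log(tau * (1.0 - tau)) - total / T

def information_criterion(eta_hat, bias_hat):
    """Generic bias-corrected criterion of the form (3.3)."""
    return -2.0 * (eta_hat - bias_hat)

# Hypothetical toy data: an intercept plus one predictor.
y = [1.0, 2.0, 0.5, 3.0]
z = [[1.0, 0.2], [1.0, 1.1], [1.0, -0.3], [1.0, 1.9]]
gamma = [0.8, 1.0]

eta = sample_log_likelihood(y, z, gamma, tau=0.5)
ic = information_criterion(eta, bias_hat=0.5)  # bias_hat is a placeholder value
```

A smaller `ic` indicates a better trade-off between fit and the bias penalty; in practice `bias_hat` would come from an estimator of $b_\tau(G)$ rather than a fixed number.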



3.2. Main result

Here we describe the asymptotic normality of $\hat{\gamma}$, which was established by Bai and Ng (2008). The additional assumptions on the model are as follows.

ASSUMPTION 3.1.

(a) Let $\gamma_0(\tau)$ be an interior point of a compact parameter space, and let it maximize $\eta_\tau(G; \gamma, HZ)$. Here $H = \mathrm{diag}(S, I)$ is a block-diagonal matrix with $S = \hat{V}^{-1}(\hat{F}_r' F_r/T)(\Lambda_r'\Lambda_r/N)$, and $\hat{V}$ is the $r \times r$ diagonal matrix consisting of the $r$ largest eigenvalues of $XX'/(TN)$. The log-likelihood function converges uniformly in probability to the expected log-likelihood function on the parameter space.

(b) Hessian: the expected log-likelihood function is twice continuously differentiable on a neighbourhood of $\gamma_0(\tau)$, and the order of the difference between the Hessian matrices of the log-likelihood function and the expected log-likelihood is $o_p(1)$.

(c) $T^{-1/2}\,\partial \ell_\tau(\mathbf{y}_T; \gamma_0(\tau), Z)/\partial\gamma \to N(0, I_\tau(Z, \gamma_0(\tau)))$, with $I_\tau(Z, \gamma(\tau))$ being a positive definite matrix. In our problem, it is given as

$$
I_\tau(Z) = \frac{1}{T}\,\tau(1-\tau)\,Z'Z.
$$

(d) $\sup_\gamma T^{-1}\,\|\partial \ell_\tau(\mathbf{y}_T; \gamma_0(\tau), Z^*)/\partial\gamma\|^2 = O_p(1)$ uniformly in a neighbourhood of $Z$ such that $\max_{1\le t\le T}\|Z^* - Z\| < c_{NT}$ with $c_{NT} \to 0$.

(e) $\partial^2 \ell_\tau(\mathbf{y}_T; \gamma_0(\tau), Z)/\partial\gamma\,\partial f_t$ is uncorrelated with $y_t$ and $E\|\partial^2 \ell_\tau(\mathbf{y}_T; \gamma_0(\tau), Z)/\partial\gamma\,\partial f_t\|^2 < C$ for all $t$.

(f) Let $h_j(y_t, z_t; \gamma) = \partial\rho_\tau(y_t - z_t'\gamma(\tau))/\partial\gamma_j$. Then

$$
T^{-1}\sum_{t=1}^{T}\left\|\frac{\partial^2 h_j(y_t, z_t^*; \gamma^*(\tau))}{\partial f_t\,\partial f_t'}\right\|^2 = O_p(1), \quad
T^{-1}\sum_{t=1}^{T}\left\|\frac{\partial^2 h_j(y_t, z_t^*; \gamma^*(\tau))}{\partial f_t\,\partial\gamma'}\right\|^2 = O_p(1), \quad
T^{-1}\sum_{t=1}^{T}\left\|\frac{\partial^2 h_j(y_t, z_t^*; \gamma^*(\tau))}{\partial\gamma\,\partial\gamma'}\right\|^2 = O_p(1),
$$

where $z_t^*$ and $\gamma^*(\tau)$ are in a neighbourhood of $z_t$ and $\gamma(\tau)$ such that $\max_{1\le t\le T}\|z_t^* - z_t\| < c_{NT}$ and $\|\gamma(\tau) - \gamma^*(\tau)\| < c_{NT}$ with $c_{NT} \to 0$.

Under these conditions $\hat{\gamma}(\tau)$ is asymptotically normal, as shown by Bai and Ng (2008). Under Assumptions 2.1–2.7 and 3.1, one of the contributions of this paper is the following theorem.

THEOREM 3.1. Let $q_\tau(y_t \mid \hat{z}_t; \hat{\gamma})$ be a quantile regression model with factor-augmented predictors estimated by the methods of the previous section. Suppose that the assumed model does not necessarily contain the true model $g(y)$ that generates the data. Then, under Assumptions 2.1–2.7 and 3.1, and with $N^{1/2}/T \to 0$ and $T^{5/8}/N \to 0$ as $T, N \to \infty$, a model selection criterion for evaluating the estimated model is

$$
IC = -2\,\ell_\tau(\mathbf{y}_T; \hat{\gamma}, \hat{Z}) + 2T\,\hat{b}(G), \tag{3.4}
$$



where the bias term $\hat{b}(G)$ is approximately given by

$$
\hat{b}(G) = \frac{1}{T}\,\mathrm{tr}\left[J_\tau^{-1}(HZ)\cdot I_\tau(HZ)\right] + \frac{1}{TN}\sum_{t=1}^{T} g(\xi_t)\,\mathrm{tr}\left[K_\tau(\gamma_0(\tau))\cdot\Sigma_z(t)\right] + O\!\left(\frac{1}{B_{N,T}}\right),
$$

where $B_{N,T} = \min\{N, T\sqrt{T}\}$ and

$$
I_\tau(HZ) = \frac{1}{T}\,\tau(1-\tau)\,(HZ)'(HZ), \qquad
J_\tau(HZ) = \frac{1}{T}\,(HZ)'M(HZ),
$$

$$
K_\tau(\gamma_0(\tau)) = \gamma_0(\tau)\,\gamma_0(\tau)', \qquad
\Sigma_z(t) = \begin{pmatrix} V^{-1} R\, Q_t\, R' V^{-1} & L_{f,w} \\ L_{f,w}' & \Sigma_w \end{pmatrix},
$$

where $M = \mathrm{diag}\{g(\xi_1(\tau)), \ldots, g(\xi_T(\tau))\}$ is the $T$-dimensional diagonal matrix whose $t$-th diagonal element is the density $g$ evaluated at the $\tau$th conditional quantile $\xi_t(\tau) = G^{-1}(\tau \mid H z_t)$, with $P(y_t \le y \mid H z_t) = G(y \mid H z_t)$; $\gamma_0$ maximizes the expected log-likelihood function; $Q_t = N^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N} E[\lambda_i\lambda_j'\varepsilon_{it}\varepsilon_{jt}]$; $V = \mathrm{plim}\,\hat{V}$; and $R = \mathrm{plim}\,\hat{F}_r' F_r/T$. $H = \mathrm{diag}(S, I)$ is a block-diagonal matrix with $S = \hat{V}^{-1}(\hat{F}_r' F_r/T)(\Lambda_r'\Lambda_r/N)$, and $\hat{V}$ is the $r \times r$ diagonal matrix consisting of the $r$ largest eigenvalues of $XX'/(TN)$.

The bias term is $B_{N,T} = \min\{N, T\sqrt{T}\}$ consistent. The first term in the bias estimate is a penalty for model complexity. The second term accounts for the uncertainty in using the estimated regressors $\hat{f}_t$. The order of the second term is $O(N^{-1})$, so it vanishes as $N \to \infty$. Although the theorem imposes restrictions on the relationship between $N$ and $T$, it is not a sequential limit result but a simultaneous one. Moreover, the theorem holds for many combinations of $N$ and $T$, since the restrictions on the relationship between $N$ and $T$ are not strong (Bai, 2003). It can readily be seen that the proposed criterion (3.4) is a consistent estimator of the expected log-likelihood with the order $B_{N,T}$. Therefore, the proposed criterion asymptotically selects the optimal number of factors that maximizes the expected log-likelihood. In fact, the simulation study in Section 4 shows that the accuracy of the proposed criterion for selecting the correct number of factors increases as $N$ and $T$ become large.

3.3. Comments

To construct the bias-corrected log-likelihood, we need to estimate the above unknown quantities. First, the true density $g(\cdot)$ in (3.4) must be estimated. One natural approach is to replace $g(\cdot)$ by a parametric family of densities $f(\cdot)$ estimated by the maximum likelihood method. In this paper, we employed the normal density function for $f(\cdot)$ and estimated the model parameters by maximum likelihood. Even when the true density $g(\cdot)$ is not Gaussian, our simulation study shows that the proposed criterion with a normal density function for $f(\cdot)$ still works. Depending on the data characteristics, one can also use other parametric density functions such as the Student-t and gamma densities.



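We can estimate the matrices $I_\tau(HZ)$, $J_\tau(HZ)$ and $K_\tau(\gamma_0(\tau))$ by

$$
\hat{I}_\tau(\hat{Z}) = \frac{1}{T}\,\tau(1-\tau)\,\hat{Z}'\hat{Z}, \qquad
\hat{J}_\tau(\hat{Z}) = \frac{1}{T}\,\hat{Z}'\hat{M}\hat{Z}, \qquad
K_\tau(\hat{\gamma}(\tau)) = \hat{\gamma}(\tau)\,\hat{\gamma}(\tau)',
$$

where $\hat{M} = \mathrm{diag}\{f(\hat{\xi}_1(\tau)), \ldots, f(\hat{\xi}_T(\tau))\}$ is the $T$-dimensional diagonal matrix, $\hat{\xi}_t(\tau) = F^{-1}(\tau \mid \hat{z}_t)$ and $f(\hat{\xi}_t(\tau))$ is the estimated parametric density function.

With respect to the matrix $\Sigma_z(t)$, the results of Bai and Ng (2006) are available. If there are non-zero cross-sectional correlations and/or some seasonality in the errors, Bai and Ng (2006) showed that $Q_t$ can be estimated by

$$
\hat{Q}_t = \frac{1}{nT}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{t=1}^{T} \hat{\lambda}_i\hat{\lambda}_j'\,\hat{\varepsilon}_{it}\hat{\varepsilon}_{jt},
$$

with $n/\min\{N, T\} \to 0$ and $\hat{\varepsilon}_{it} = x_{it} - \hat{\lambda}_i'\hat{f}_t$. Alternatively, assuming that $\varepsilon_{it}$ is cross-sectionally uncorrelated with $\varepsilon_{jt}$, an estimator of $Q_t$ is

$$
\hat{Q}_t = \frac{1}{N}\sum_{i=1}^{N} \hat{\varepsilon}_{it}^2\,\hat{\lambda}_i\hat{\lambda}_i'.
$$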
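The complexity part of the penalty, $\mathrm{tr}[J_\tau^{-1} I_\tau]$, can be computed directly from the estimated regressor matrix and the density values at the fitted quantiles. The sketch below uses only the Python standard library and a small Gauss-Jordan solver; the function names and the toy inputs are hypothetical, and the density values are supplied by the caller rather than estimated.

```python
def gram(Z, weights):
    """Return (1/T) * Z' diag(weights) Z for a T x p list-of-lists Z."""
    T, p = len(Z), len(Z[0])
    return [[sum(w * row[a] * row[b] for w, row in zip(weights, Z)) / T
             for b in range(p)] for a in range(p)]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def complexity_penalty(Z, density_at_quantile, tau):
    """tr(J^{-1} I) with I = tau(1 - tau) Z'Z / T and J = Z' M Z / T,
    where M is diagonal with the supplied density values."""
    T = len(Z)
    I_hat = [[tau * (1.0 - tau) * v for v in row] for row in gram(Z, [1.0] * T)]
    J_hat = gram(Z, density_at_quantile)
    X = solve(J_hat, I_hat)  # X = J^{-1} I
    return sum(X[k][k] for k in range(len(X)))

# Hypothetical toy regressors (intercept plus one estimated factor).
Z = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
penalty = complexity_penalty(Z, density_at_quantile=[0.4] * 4, tau=0.05)
```

When all density values equal a constant $c$, the trace reduces to $p\,\tau(1-\tau)/c$ for $p$ regressors, which gives a quick sanity check on the computation.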

Bai and Ng (2006) pointed out that the assumption that idiosyncratic errors are cross-sectionally uncorrelated is not especially restrictive, since most of the cross-correlations in the data are presumably captured by the common factors. Therefore, we can estimate the matrix $\Sigma_z(t)$ by

$$
\hat{\Sigma}_z(t) = \begin{pmatrix} \hat{V}^{-1}\hat{Q}_t\hat{V}^{-1} & \hat{L}_{f,w} \\ \hat{L}_{f,w}' & \hat{\Sigma}_w \end{pmatrix}, \tag{3.5}
$$

where $\hat{\Sigma}_w$ and $\hat{L}_{f,w}$ are the usual covariance matrix estimates of $w_t$ and of $f_t$ and $w_t$, respectively. When we assume independence between $f_t$ and $w_t$, we have a block diagonal matrix $\hat{\Sigma}_z(t) = \mathrm{diag}\{\hat{V}^{-1}\hat{Q}_t\hat{V}^{-1}, \hat{\Sigma}_w\}$. Furthermore, if $w_t$ assumes a fixed value, the estimate can be simplified to $\hat{\Sigma}_z(t) = \mathrm{diag}\{\hat{V}^{-1}\hat{Q}_t\hat{V}^{-1}, O\}$. Substituting these estimates into the equation for IC, we can choose the model that minimizes the IC score.

Since IC depends on the combination of predictors and the sampling density, the IC score can be used to select the predictors if the sampling model is fixed. In general, the IC score selects the best combination of predictors and sampling distribution.

If one is only interested in determining the optimal number of factors $r$ in the panel data $X$, suitable criteria are already available in the literature (e.g. Bai and Ng, 2002). The proposed criterion is instead designed to select the factors $f_t$ that most accurately predict the target variable $y_t$. In this paper, we use the proposed criterion to select the number of factors $r$, but it is also applicable to selecting the optimal combination of factors. The proposed criterion can also focus on the predictive ability of a predetermined (or random) variable $w_t$. Therefore, it can be applied to a wide range of problems.

In practice, one can employ standard variable selection procedures, for example exhaustive search and forward/backward search methods, with the information criterion of Theorem 3.1 to perform model selection. For example, if there are five possible factors and three possible predetermined (or random) variables $w_t$, there are $2^8 = 256$ possible combinations. Using a variable selection procedure, one can find an optimal combination of predictors by minimizing the information criterion of Theorem 3.1.
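The exhaustive search mentioned above can be sketched as follows. The eight candidate names and the scoring function `toy_ic` are hypothetical placeholders; in an application the score would be the IC of Theorem 3.1 evaluated for the model fitted with the given subset.

```python
from itertools import combinations

def all_subsets(candidates):
    """Yield every subset of the candidate predictors, including the empty set."""
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            yield subset

def best_subset(candidates, ic_score):
    """Return the subset with the smallest IC score."""
    return min(all_subsets(candidates), key=ic_score)

# Five candidate factors and three candidate predetermined variables.
candidates = ["f1", "f2", "f3", "f4", "f5", "w1", "w2", "w3"]
n_models = sum(1 for _ in all_subsets(candidates))  # 2**8 combinations

def toy_ic(subset):
    """Hypothetical stand-in for the IC score: penalizes omitted useful
    predictors heavily and every included predictor lightly."""
    useful = {"f1", "f3", "w2"}
    missed = len(useful - set(subset))
    return 10.0 * missed + 2.0 * len(subset)

chosen = best_subset(candidates, toy_ic)
```

With many candidates the number of subsets grows as $2^p$, which is why forward or backward search is often preferred to full enumeration.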

4. SIMULATION STUDY

To evaluate the performance of the proposed model-selection criterion in finite samples, we carried out several simulation studies. We also compared the proposed criterion with a procedure suggested by Bai and Ng (2002) to select the number of factors for dynamic factor models.

First, we use the proposed criterion to select a set of variables that contributes to the prediction of quantiles of the dependent variable $y_t$. To this end, we employ four data-generating processes (DGPs) that include cases of heavy tails and conditional heteroscedasticity. We used various configurations of sample size $T$, dimension $N$ and the true number of common factors $r$. The values of $T$ and $N$ are included in the summary tables, and we chose $r = 4$ for the first three DGPs and $r = 7$ for the fourth DGP. The first two DGPs assume the form

$$
x_t = \Lambda f_t + \sigma_\varepsilon \varepsilon_t, \qquad
y_t = \beta_1 f_{1t} + \beta_2 f_{2t} + \beta_3 f_{3t} + \beta_4 f_{4t} + \sigma_e e_t, \tag{4.1}
$$

where the four-dimensional common factor $f_t$ consists of four independent $N(0, 1)$ variables and each element of the factor loading matrix $\Lambda$ also follows $N(0, 1)$. For the first DGP, the $N$-dimensional noise vector $\varepsilon_t$ is multivariate normal with mean 0 and variance $I_N$, and the error term $e_t$ follows $N(0, 1)$. The other parameters are $\sigma_\varepsilon = 1$, $\sigma_e = \sqrt{0.5}$ and $(\beta_1, \ldots, \beta_4) = (2, -1, 5, 1)$. For the second DGP, each element of $\varepsilon_t$ and $e_t$ follows a Student-t distribution with seven degrees of freedom, $\sigma_\varepsilon = 1$ and $\sigma_e = 0.7$. The other parameters are the same as before.

The third DGP has conditional heteroscedasticity and assumes the model

$$
x_t = \Lambda f_t + \sigma_\varepsilon \varepsilon_t, \qquad
y_t = \beta_1 f_{1t} + \beta_2 f_{2t} + \beta_3 f_{3t} + \beta_4 f_{4t} + \Big(\sum_{j=1}^{p} \zeta_j f_{jt}\Big)\sigma_e e_t, \tag{4.2}
$$

where the coefficient vectors are $\beta = (2, -1, 5, 4)'$ and $\zeta = (1, 0.5, 0.5, 0.8)'$, $\sigma_\varepsilon = 1$ and $\sigma_e = 0.7$. The four-dimensional factor $f_t$, the $N$-dimensional noise vector $\varepsilon_t$ and the error term $e_t$ are generated from the same distributions as in the first DGP.

The fourth DGP employs the model

$$
x_t = \Lambda f_t + \sigma_\varepsilon \varepsilon_t, \qquad
y_t = \beta_1 f_{1t} + \beta_2 f_{2t} + \beta_3 f_{3t} + \beta_4 f_{4t} + \beta_5 f_{5t} + \beta_6 f_{6t} + \beta_7 f_{7t} + \sigma_e e_t, \tag{4.3}
$$

where the coefficient vector is $\beta = (2, -1, 5, 4, 3, 2, -3)'$, each element of $\varepsilon_t$ and $e_t$ follows a Student-t distribution with 20 degrees of freedom, $\sigma_\varepsilon = 1$ and $\sigma_e = 0.7$. Each element of the seven-dimensional factor $f_t$ follows a Student-t distribution with five degrees of freedom, and each element of the factor loading matrix follows $N(0, 1)$. Thus, both the common factors and the disturbances have heavy tails.
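A Gaussian data-generating process of the form (4.1) can be simulated with the standard library alone. This is an illustrative sketch, not the authors' simulation code: the function name, the seed handling and the choice to pass the scale parameters as arguments are assumptions made for the example.

```python
import random

def simulate_gaussian_dgp(N, T, beta, sigma_eps, sigma_e, seed=0):
    """Simulate x_t = Lambda f_t + sigma_eps * eps_t and
    y_t = beta' f_t + sigma_e * e_t with standard normal draws."""
    rng = random.Random(seed)
    r = len(beta)
    # N x r loading matrix with N(0, 1) entries.
    loadings = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(N)]
    factors, X, y = [], [], []
    for _ in range(T):
        f_t = [rng.gauss(0.0, 1.0) for _ in range(r)]
        x_t = [sum(l * f for l, f in zip(lam_i, f_t)) + sigma_eps * rng.gauss(0.0, 1.0)
               for lam_i in loadings]
        y_t = sum(b * f for b, f in zip(beta, f_t)) + sigma_e * rng.gauss(0.0, 1.0)
        factors.append(f_t)
        X.append(x_t)
        y.append(y_t)
    return X, y, factors, loadings

# One draw with N = 50 series and T = 100 periods; the scale values are
# passed explicitly and should be set to those of the DGP being studied.
beta = (2.0, -1.0, 5.0, 1.0)
X, y, F, Lam = simulate_gaussian_dgp(N=50, T=100, beta=beta,
                                     sigma_eps=1.0, sigma_e=0.5 ** 0.5, seed=1)
```

The heavy-tailed designs would replace the normal draws by Student-t draws, which the standard library does not provide directly.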



Table 1. Frequencies of under-, correct and over-specification (U, C, O) of the proposed model selection criterion.

                 Data 1          Data 2          Data 3          Data 4
   N     T     U    C    O     U    C    O     U    C    O     U    C    O
  50    50     8   59   33    20   52   28    22   49   29    18   53   29
  50   100     6   78   16     8   85    7     6   57   37     8   70   22
  50   200     6   76   18     2   76   22     0   84   16     1   77   22
  50   400     0   71   29     0   83   17     2   83   15     1   79   20
 100    50     2   81   17     6   71   23     8   65   27    15   55   30
 100   100     2   86   12     2   80   18     4   77   19     0   83   17
 100   200     0   86   14     4   90    6     2   97    1     0   85   15
 100   400     2   60   38     2   83   15     0   97    3     0   85   15
 200    50     0   89   11     0   71   29     2   79   19     4   71   25
 200   100     2   82   16     2   90    8     2   83   15     0   85   15
 200   200     2   65   33     0   88   12     0   99    1     0   84   16
 200   400     2   55   43     0   77   23     0   99    1     0   88   12
 400    50     0   82   18     0   71   29     0   73   27     2   53   45
 400   100     4   80   16     0   77   23     0   80   20     0   81   19
 400   200     4   75   21     0   99    1     0   99    1     0   93    7
 400   400     2   67   31     0   80   20     0  100    0     0   98    2

Note: Data were generated from models (4.1) to (4.3). The results are based on 100 replications for quantile regression with τ = 0.01.

Tables 1 and 2 report the percentages of under-, correct and over-specification of the proposed criterion in selecting the number of common factors when τ = 0.01 and 0.05, respectively. In the simulation, the possible values of r are from 0 to 10 and the results are based on 100 replications. The tables show that the proposed information criterion is capable of selecting the data-generating model even when T and N are small. Also, as expected, the chance of under-specification by the proposed criterion decreases when τ increases from 0.01 to 0.05. Finally, the proposed criterion works well when both N and T are large.

As suggested by a referee, we also compared the proposed criterion with the method of Bai and Ng (2002). The latter was developed for dynamic factor models with least squares regression, not with quantile regression. Strictly speaking, this is not a fair comparison. Our goal here is simply to demonstrate the difference in performance between the two criteria when applied to quantile regression. For the comparison, we considered the following model:

$$
x_t = \Lambda f_t + \varepsilon_t, \qquad y_t = 1 + \beta_1 f_{1t} + e_t, \tag{4.4}
$$

where the four-dimensional factor $f_t$ consists of independent $N(0, 1)$ variables, each element of the factor-loading matrix $\Lambda$ also follows $N(0, 1)$, the $N$-dimensional noise vector $\varepsilon_t$ is multivariate normal with mean 0 and variance $I_N$, and the error term $e_t$ follows $N(0, 4)$. The coefficient is $\beta_1 = -0.15$ and the quantile is $\tau = 0.05$.



Table 2. Frequencies of under-, correct and over-specification (U, C, O) of the proposed model selection criterion.

                 Data 1          Data 2          Data 3          Data 4
   N     T     U    C    O     U    C    O     U    C    O     U    C    O
  50    50     0   55   45     0   48   52     2   61   37     0   65   35
  50   100     2   77   21     0   82   18     0   71   29     0   71   29
  50   200     0   68   32     2   82   16     0   77   23     0   72   28
  50   400     6   61   33     0   77   23     0   82   18     0   75   25
 100    50     0   78   22     0   33   67     0   60   40     0   74   26
 100   100     0   86   14     0   72   28     0   90   10     0   76   24
 100   200     0   77   23     0   87   13     0   88   12     0   85   15
 100   400     0   71   29     2   79   19     0   89   11     0   87   13
 200    50     0   58   42     0   52   48     0   39   61     0   55   45
 200   100     0   84   16     0   64   36     0   76   24     0   67   23
 200   200     0   89   11     0   87   13     0   82   18     0   82   18
 200   400     0   67   33     0   82   18     0   89   11     0   81   19
 400    50     0   48   52     0   25   75     0   39   61     0   56   44
 400   100     0   65   35     0   51   49     0   49   51     0   59   41
 400   200     0   91    9     0   72   28     0   86   14     0   77   23
 400   400     0   93    7     0   88   12     0   93    7     0   80   20

Note: Data were generated from models (4.1) to (4.3). The results are based on 100 replications for quantile regression with τ = 0.05.
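The U, C and O columns of Tables 1 and 2 are simple tallies over replications. A minimal sketch of that bookkeeping, with a hypothetical function name and made-up selections, is:

```python
def specification_frequencies(selected, true_r):
    """Return the percentages of under-, correct and over-specification
    given the number of factors selected in each replication."""
    n = len(selected)
    under = sum(1 for r in selected if r < true_r)
    correct = sum(1 for r in selected if r == true_r)
    over = n - under - correct
    return tuple(100.0 * count / n for count in (under, correct, over))

# Hypothetical selections from five replications with true r = 4.
u, c, o = specification_frequencies([3, 4, 4, 5, 4], true_r=4)
```

By construction the three percentages sum to 100 for every design.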

Table 3 reports the percentages of under-, correct and over-specification of the number of factors by the two criteria. Again, results are based on 100 replications and the possible values for r are from 0 to 10. From the table, the proposed information criterion outperforms its competitor. The table also gives the average and standard deviation of mean squared errors between the true quantile and the estimated quantile when the model was selected by either the proposed criterion or that of Bai and Ng (2002). As shown in Table 3, the proposed model-selection criterion has a smaller average of mean squared errors for all configurations of N and T. Thus, as expected, for quantile regression with augmented regressors the proposed criterion fares better than the method of Bai and Ng (2002).
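For the comparison design (4.4) the true conditional quantile is available in closed form, so the mean squared error between true and estimated quantiles can be evaluated directly. The sketch below assumes that $N(0, 4)$ denotes a variance of 4 (standard deviation 2); the function names and the example values are hypothetical.

```python
from statistics import NormalDist

def true_quantile(f1, tau, intercept=1.0, beta1=-0.15, error_sd=2.0):
    """tau-th conditional quantile of y_t = intercept + beta1 * f1 + e_t,
    where e_t is normal with mean 0 and standard deviation error_sd."""
    return intercept + beta1 * f1 + error_sd * NormalDist().inv_cdf(tau)

def mean_squared_error(true_q, est_q):
    """Average squared difference between true and estimated quantiles."""
    return sum((a - b) ** 2 for a, b in zip(true_q, est_q)) / len(true_q)

# Hypothetical factor values and estimated quantiles for three periods.
f1_values = [0.0, 1.0, -2.0]
q_true = [true_quantile(f, tau=0.05) for f in f1_values]
q_est = [q + 0.1 for q in q_true]  # pretend each estimate is off by 0.1
mse = mean_squared_error(q_true, q_est)
```

Averaging such MSE values over replications, and taking their standard deviation, gives summary measures of the kind reported as AMSE and SD.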

T. Ando and R. S. Tsay

Table 3. Comparison between two criteria.

                  Method 1            Method 2          Method 1         Method 2
   N     T      AMSE      SD       AMSE      SD       U    C    O      U    C    O
  50    50    1130.5   935.6     1942.2  1081.2       0   54   46      0    0  100
  50   100    1087.8   923.6     1605.5   960.7       0   62   38      0    0  100
  50   200    1068.1   905.2     1536.8  1055.9       0   65   35      0    0  100
 100    50     657.0   627.2      920.1   528.7       0   56   44      0    0  100
 100   100     504.1   445.6      906.9   479.9       0   67   33      0    0  100
 100   200     501.5   444.0      938.0   604.9       0   70   30      0    0  100
 200    50     302.0   311.5      459.7   293.2       0   65   35      0    0  100
 200   100     281.0   241.4      407.5   264.7       0   70   30      0    0  100
 200   200     275.7   231.7      534.6   317.0       0   71   29      0    0  100

Note: Method 1 denotes the proposed criterion and Method 2 that of Bai and Ng (2002). The measurements used are frequencies of under-, correct and over-specification (U, C, O) of the number of factors and the average (AMSE) and standard deviation (SD) of mean squared errors between the true quantile and the estimated quantile. Data were generated from model (4.4) and the quantile value is τ = 0.05. The mean squared errors are multiplied by 10^3. The results are based on 100 replications.

5. REAL DATA ANALYSIS

In this section, we apply the proposed method to forecast quantiles of the quarterly growth rate of the gross domestic product (GDP) of Japan. There is a substantial literature on GDP growth prediction. See, for example, Stock and Watson (2002b, 2003, 2004), Zellner and Chen (2002), Kitchen and Monaco (2003), Baffigi et al. (2004), Altissimo et al. (2006), Schumacher and Breitung (2008) and Ando and Tsay (2009b). However, our analysis focuses on quantile estimation.

Stock prices are believed to contain forward-looking information about the economy, so they can be used as a leading indicator of GDP growth. Consequently, to construct the common factors, we employed a panel of returns on 1284 stocks listed on the Tokyo Stock Exchange. The selected companies belong to many industries, including Food, Textile, Apparel, Wood manufacture, Paper, Printing, Chemical, Petroleum products, Ceramic, Steel, Non-ferrous metal, Metal, General machinery, Electronics, Automobile, Transportation products, Precision machinery, Agriculture, Mining, Construction, Electromechanical service, Banking, Insurance, Securities, Other financial service, Gas service, Information and telecommunication, Common carriers, Wholesale and retail, Real estate, Hotel and Services. Thus, the selected stocks provide a good representation of the Japanese stock market.

We consider quantiles of the one-quarter-ahead prediction of the real GDP growth rate $y_t$. To this end, we employ the model

$$
q_\tau(y_{t+1} \mid z_t; \gamma) = \sum_{j=1}^{r} \alpha(\tau) f_{tj} + \beta_0 + \beta_1 SR_t + b_2 M_t + \sum_{j=0}^{2} b_{3+j}\, y_{t-j} + e_t, \tag{5.1}
$$

where $e_t$ follows a normal distribution with mean 0 and variance $\sigma^2$. Besides the lagged values of $y_t$, the other two predetermined predictors are $SR_t$, the quarterly nominal stock return, and $M_t$, the quarterly growth rate of the real monetary base (M2). Following the work of Zellner and Chen (2002), and for simplicity, we included the five variables $SR_t$, $M_t$, $y_t$, $y_{t-1}$, $y_{t-2}$ in the model as $w_t$. If necessary, the proposed criterion can also be used to refine the selection of predetermined variables. The common factors in $f_t$ are obtained by asymptotic principal component analysis of the 1284 stock returns.

Specifically, we consider forecasting the quantiles of the GDP growth rate over the period from the third quarter of 2001 to the third quarter of 2005. To estimate the model, we used the past 50 observations from the forecast origin, that is, $T = 50$. For example, suppose the forecast origin is the first quarter of 2002. Then, the GDP growth rates $y_t$ used in the estimation were from the second quarter of 1989 to the first quarter of 2002, and the $SR_t$ and $M_t$ used were from the fourth quarter of 1989 to the first quarter of 2002. Once the quantile regression $q_\tau(y_{t+1} \mid z_t; \gamma)$ in (5.1) was estimated, we used it to predict the quantiles of $y_t$ for the second quarter of 2002. This estimation-forecasting procedure is repeated for each forecast origin. In this particular application, the estimated common factors and the predictors in $w_t$ may well be correlated, so we considered $f_t$ and $w_t$ as one vector $z_t$ and used the statistic $\hat{\Sigma}_z(t)$ in (3.5) in evaluating the proposed criterion.

Figure 1 shows the next-quarter predictive 5% and 95% quantiles of the real growth rate of Japanese GDP (dashed lines) together with the observed GDP growth rates (solid line) in the forecasting sample. From the plot, the actual GDP growth lies between the predicted 5% and 95% quantiles. The observed GDP growth rate is also close to the predicted 95% quantile from the second quarter of 2003 to the first quarter of 2004.

Figure 1. Real data application. (Vertical axis: GDP growth; horizontal axis: quarter, 2002.3 to 2005.2.)

We found that the numbers of factors used to predict the 5% and 95% quantiles vary over time. Figure 2 shows the selected number of factors at each time point in the forecasting subperiod. The solid line is for the predicted 5% quantile and the dashed line represents the predicted 95% quantile. We can see that the selected number of factors depends on the quantile of interest.

Figure 2. Real data application. (Vertical axis ticks: 0 to 5; horizontal axis: quarter, 2002.3 to 2005.2.)
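The rolling estimation-forecasting scheme described in this section can be sketched with index arithmetic alone. The helper names below are hypothetical, indices are zero-based positions in the quarterly series, and fitting the quantile regression at each origin is left out of the sketch.

```python
def rolling_windows(n_obs, window=50):
    """Yield (start, origin, target) index triples: the model is estimated on
    observations start..origin and used to predict observation target."""
    for origin in range(window - 1, n_obs - 1):
        yield origin - window + 1, origin, origin + 1

def inside_band(actual, lower, upper):
    """Fraction of observations lying between the predicted lower and upper quantiles."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)

# With 52 observations and a 50-quarter window there are two forecast origins.
windows = list(rolling_windows(52, window=50))

# Hypothetical realized values and predicted 5% and 95% quantiles.
coverage = inside_band([0.0, 1.0, 5.0], [-1.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

For a 5% to 95% band one would expect roughly 90% of realized values to fall inside when the predictive quantiles are well calibrated, although a short forecasting sample gives only a rough check.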

6. CONCLUDING REMARKS

For situations with a large number of series, each containing a certain amount of information for predicting a response variable of interest, we propose a statistical modelling methodology that first estimates common factors from a panel of data using principal component analysis and then uses the estimated factors in a standard quantile regression. A crucial issue is evaluating the predictability of an estimated model. Taking into account the effect of estimated regressors, we developed an information-theoretic criterion in this paper. Monte Carlo simulations demonstrated that the proposed criterion performs well in various settings.

To the best of our knowledge, there are no criteria available for selecting a standard quantile regression model, that is,

$$
q_\tau(y_t \mid w_t; \beta) = \beta(\tau)' w_t, \qquad t = 1, 2, \ldots, T,
$$

where $w_t$ denotes the observed predictors. We note that the proposed criterion is applicable to selecting the optimal predictor $w_t$, even though the standard quantile regression model is not the main focus of this paper. Since the quantile function $q_\tau(y_t \mid w_t; \beta)$ does not include the estimated predictor, the IC score (3.4) reduces to

$$
IC = -2\,\ell_\tau(\mathbf{y}_T; \hat{\beta}, W) + 2T\,\hat{b}(G),
$$

where $\hat{\beta}$ is the maximum likelihood estimate, $W = (w_1, \ldots, w_T)'$, and the bias term $\hat{b}(G)$ is approximately given by

$$
\hat{b}(G) = \frac{1}{T}\,\mathrm{tr}\left[J_\tau^{-1}(HZ)\cdot I_\tau(HZ)\right] + O\!\left(\frac{1}{T\sqrt{T}}\right),
$$

where the matrices $I_\tau$ and $J_\tau$ are given in Theorem 3.1. The derivation is easy because there is no randomness in the predictors.

There are many directions for further research. First, for simplicity and as an introduction to our methodology, we only considered linear combinations of the estimated factor $\hat{f}_t$ and other variables $w_t$ in this paper. In fact, we can consider non-linear functions of the regressors using splines, B-splines, wavelets, kernels or other non-parametric techniques. In this case, replacing the matrices $J_\tau(\hat{Z}, \hat{\gamma})$, $\Sigma_\gamma$ and $\Sigma_z(t)$ with the corresponding estimates, the proposed information criterion continues to apply.

Second, when the dimension of $z_t$ is greater than the number of observations $T$ in the quantile regression, the proposed criterion is not applicable, because of its dependence on asymptotic theory. In this case the boosting approach of Bai and Ng (2008) could be useful. Therefore, a possible extension of our methodology is to combine the proposed model with a boosting procedure.

Third, for simplicity we used an approximate factor model to construct the estimated regressors. If dynamic dependence exists among the common factors, then lagged values of observables can be used to construct the estimated regressors. In this case, replacing the matrices $J(\hat{Z}, \hat{\gamma})$, $\Sigma_\gamma$ and $\Sigma_z(t)$ by their corresponding estimates, one can show that the proposed criterion remains applicable.



Fourth, the model considered in the paper looks like a regression model. We note that other model specifications are also possible. Consider h-step-ahead forecasting of the quantile function of $y_t$. One of the commonly used specifications is

$$
q_\tau(y_{t+h} \mid z_t; \gamma(\tau)) = \gamma(\tau)' z_t, \qquad x_t = \Lambda f_t + \varepsilon_t,
$$

for $t = 1, \ldots, T$, where $h$ is the forecast horizon. Therefore, our method can be applied to the general forecasting problem. Finally, the proposed approach could be useful in providing new solutions to the model combination problems discussed in Hansen (2005).
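For the h-step-ahead specification, the only change to the estimation data is the alignment of the response with the predictors. A minimal sketch, with a hypothetical helper name and toy inputs, is:

```python
def h_step_pairs(y, z, h):
    """Pair the predictor vector z_t with the response y_{t+h};
    the last h predictor rows have no matching response and are dropped."""
    if h < 1 or h >= len(y):
        raise ValueError("h must satisfy 1 <= h < len(y)")
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    return [(z[t], y[t + h]) for t in range(len(y) - h)]

# Toy series: four periods, horizon h = 2.
pairs = h_step_pairs([10, 11, 12, 13], ["z0", "z1", "z2", "z3"], h=2)
```

The quantile regression is then fitted to these pairs exactly as in the one-step case, so the effective sample size shrinks from T to T minus h.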

ACKNOWLEDGMENTS

The authors would like to thank the editor and anonymous reviewers for their constructive and helpful comments.

REFERENCES

Abrevaya, J. (2001). The effects of demographics and maternal behavior on the distribution of birth outcomes. Empirical Economics 26, 247–57.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csaki (Eds.), Proceedings of the Second International Symposium on Information Theory, 267–81. Budapest: Akademiai Kiado.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–23.
Altissimo, F., R. Cristadoro, M. Forni, M. Lippi and G. Veronese (2006). New eurocoin: tracking economic growth in real time. Working paper, Centre for Economic Policy Research.
Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94, 443–58.
Ando, T. and R. Tsay (2009a). Model selection for generalized linear models with factor-augmented predictors (with discussion). Applied Stochastic Models in Business and Industry 25, 207–46.
Ando, T. and R. Tsay (2009b). Predictive marginal likelihood for the Bayesian model selection and averaging. Forthcoming in International Journal of Forecasting.
Angelini, E., J. Henry and R. Mestre (2001). Diffusion index-based inflation forecasts for the euro area. Working paper, European Central Bank.
Artis, M. J., A. Banerjee and M. Marcellino (2001). Factor forecasts for the UK. Working paper, Bocconi University.
Baffigi, A., R. Golinelli and G. Parigi (2004). Bridge models to forecast the euro area GDP. International Journal of Forecasting 20, 447–60.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–72.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica 74, 1133–50.


Bai, J. and S. Ng (2008). Extremum estimation when the predictors are estimated from large panels. Annals of Economics and Finance 9, 201–22.
Banerjee, A. and M. Marcellino (2006). Are there any reliable leading indicators for US inflation and GDP growth? International Journal of Forecasting 22, 137–51.
Bernanke, B. S. and J. Boivin (2003). Monetary policy in a data-rich environment. Journal of Monetary Economics 50, 525–46.
Bernanke, B., J. Boivin and P. Eliasz (2005). Factor augmented vector autoregressions (FVARs) and the analysis of monetary policy. Quarterly Journal of Economics 120, 387–422.
Boivin, J. and S. Ng (2003). Are more data always better for factor analysis? Working paper, National Bureau of Economic Research.
Boivin, J. and S. Ng (2005). Understanding and comparing factor-based forecasts. Working paper, National Bureau of Economic Research.
Buchinsky, M. (1994). Changes in the U.S. wage structure 1963–1987: application of quantile regression. Econometrica 62, 405–58.
Chamberlain, G. (1994). Quantile regression, censoring, and the structure of wages. In C. Sims (Ed.), Advances in Econometrics: Sixth World Congress, 171–209. Cambridge: Cambridge University Press.
Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their local Bahadur representation. Annals of Statistics 19, 760–77.
Chaudhuri, P., K. Doksum and A. Samarov (1997). On average derivative quantile regression. Annals of Statistics 25, 715–44.
Chernozhukov, V. and C. Hansen (2004). The effects of 401(k) participation on the wealth distribution: an instrumental quantile regression analysis. Review of Economics and Statistics 86, 735–51.
Chernozhukov, V. and C. Hansen (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics 132, 491–525.
Connor, G. and R. A. Korajczyk (1986). Performance measurement with the arbitrage pricing theory: a new framework for analysis. Journal of Financial Economics 15, 373–94.
Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium APT: application of a new test methodology. Journal of Financial Economics 21, 255–89.
Donald, S. G. and H. J. Paarsch (1993). Piecewise pseudo-maximum likelihood estimation in empirical models of auctions. International Economic Review 34, 121–48.
Fama, E. and K. French (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33, 3–56.
Forni, M., M. Hallin, M. Lippi and L. Reichlin (2000). The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics 82, 540–54.
Forni, M., M. Hallin, M. Lippi and L. Reichlin (2001). Do financial variables help in forecasting inflation and real activity in the euro area? Journal of Monetary Economics 50, 1243–55.
Forni, M., M. Hallin, M. Lippi and L. Reichlin (2004). The generalized factor model: consistency and rates. Journal of Econometrics 119, 231–55.
Forni, M. and M. Lippi (2001). The generalized dynamic factor model: representation theory. Econometric Theory 17, 1113–41.
Forni, M. and L. Reichlin (1998). Let's get real: a factor-analytic approach to disaggregated business cycle dynamics. Review of Economic Studies 65, 453–73.
Geweke, J. (1977). The dynamic factor analysis of economic time series. In D. J. Aigner and A. S. Goldberger (Eds.), Latent Variables in Socio-Economic Models, 365–83. Amsterdam: North-Holland.



Gutenbrunner, C. and J. Jureckova (1992). Regression rank scores and regression quantiles. Annals of Statistics 20, 305–30.
Hansen, B. E. (2005). Challenges for econometric model selection. Econometric Theory 21, 60–68.
Hendricks, W. and R. Koenker (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association 87, 58–68.
Jones, C. S. (2001). Extracting factors from heteroskedastic asset returns. Journal of Financial Economics 62, 293–325.
Kitchen, J. and R. Monaco (2003). The U.S. Treasury staff's real-time GDP forecast system. Business Economics 38, 10–19.
Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics 26, 755–70.
Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.
Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica 46, 33–50.
Koenker, R. and O. Geling (2001). Reappraising medfly longevity: a quantile regression survival analysis. Journal of the American Statistical Association 96, 458–68.
Koenker, R. and S. Portnoy (1987). L-estimation for linear models. Journal of the American Statistical Association 82, 851–57.
Konishi, S. and G. Kitagawa (1996). Generalized information criteria in model selection. Biometrika 83, 875–90.
Koop, G. and S. Potter (2004). Forecasting in dynamic factor models using Bayesian model averaging. Econometrics Journal 7, 550–65.
Portnoy, S. (1991). Asymptotic behavior of regression quantiles in nonstationary, dependent cases. Journal of Multivariate Analysis 38, 100–13.
Portnoy, S. and R. Koenker (1997). The Gaussian hare and the Laplacian tortoise (with discussion). Statistical Science 12, 279–300.
Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics 32, 143–55.
Schumacher, C. and J. Breitung (2008). Real-time forecasting of German GDP based on a large factor model with monthly and quarterly data. International Journal of Forecasting 24, 386–98.
Stock, J. H. and M. W. Watson (1998). Diffusion indexes. Working paper, National Bureau of Economic Research.
Stock, J. H. and M. W. Watson (1999). Forecasting inflation. Journal of Monetary Economics 44, 293–335.
Stock, J. H. and M. W. Watson (2002a). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–79.
Stock, J. H. and M. W. Watson (2002b). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–62.
Stock, J. H. and M. W. Watson (2003). Forecasting output and inflation: the role of asset prices. Journal of Economic Literature 41, 788–829.
Stock, J. H. and M. W. Watson (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting 23, 405–30.
Tsay, R. S. (2005). Analysis of Financial Time Series (2nd ed.). New York: John Wiley.
Yu, K. and R. A. Moyeed (2001). Bayesian quantile regression. Statistics and Probability Letters 54, 437–47.
Zellner, A. and B. Chen (2002). Bayesian modeling of economies and data requirements. Macroeconomic Dynamics 5, 673–700.




APPENDIX: PROOFS OF THEOREM 3.1

We first review the asymptotic properties of $\hat f_t$ and $\hat\gamma(\tau)$. Let $\hat V$ be the $r \times r$ diagonal matrix consisting of the $r$ largest eigenvalues of $XX'/(TN)$, and let $S = \hat V^{-1}(\hat F'F/T)(\Lambda'\Lambda/N)$. Bai (2003) developed the following result.

LEMMA A.1. (Bai, 2003) Suppose Assumptions 2.1–2.6 hold. Then under $\sqrt{N}/T \to 0$, we have

$$\sqrt{N}(\hat f_t - S f_t) \to N(0,\; V^{-1} R\, Q(t)\, R' V^{-1}) \quad \text{as } T, N \to \infty,$$

with $V = \operatorname{plim} \hat V$, $R = \operatorname{plim} \hat F'F/T$ and

$$Q(t) = \lim_{N\to\infty} N^{-1} \sum_{i=1}^{N}\sum_{j=1}^{N} E[\lambda_i \lambda_j' \varepsilon_{it}\varepsilon_{jt}].$$

Proof: See Proof of Theorem 3.1 in Bai (2003) and also Bai and Ng (2006, p. 1138).

LEMMA A.2. (Bai and Ng, 2008) Suppose Assumptions 2.1–2.7 hold. Then under $T^{5/8}/N \to 0$, we have

$$\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau)) \to N\big(0,\; J_\tau^{-1}(HZ)\cdot I_\tau(HZ)\cdot J_\tau^{-1}(HZ)\big) \quad \text{as } T, N \to \infty,$$

where

$$I_\tau(HZ) = \frac{1}{T}\,\tau(1-\tau)(HZ)'(HZ), \qquad J_\tau(HZ) = \frac{1}{T}(HZ)'M(HZ),$$

and $M = \operatorname{diag}\{g(\xi_1(\tau)), \ldots, g(\xi_T(\tau))\}$ is the $T$-dimensional diagonal matrix whose $t$th element is the density $g$ evaluated at the $\tau$th quantile $\xi_t(\tau)$, with $\xi_t(\tau) = G^{-1}(\tau \mid Hz_t)$ and $P(y_t \le y \mid Hz_t) = G(y \mid Hz_t)$.

Proof: See Proof of Theorem 3.1 in Bai and Ng (2008).

Proof of Theorem 3.1: Under Assumptions 2.1–2.7 and 3.1 and using Lemmas A.1 and A.2, we prove Theorem 3.1. We decompose the bias $b(G)$ as $b(G) = B_1 + B_2 + B_3 + B_4 + B_5$, where

$$B_1 = \int \big[\ell_\tau(y_T; \hat\gamma(\tau), \hat Z) - \ell_\tau(y_T; \hat\gamma(\tau), HZ)\big]\, dG(y_T),$$

$$B_2 = \int \big[\ell_\tau(y_T; \hat\gamma(\tau), HZ) - \ell_\tau(y_T; \gamma_0(\tau), HZ)\big]\, dG(y_T),$$

$$B_3 = \int \Big[\ell_\tau(y_T; \gamma_0(\tau), HZ) - \int \ell_\tau(u_T; \gamma_0(\tau), HZ)\, dG(u_T)\Big]\, dG(y_T),$$

$$B_4 = \int \Big[\int \ell_\tau(u_T; \gamma_0(\tau), HZ)\, dG(u_T) - \int \ell_\tau(u_T; \gamma_0(\tau), \hat Z)\, dG(u_T)\Big]\, dG(y_T),$$

$$B_5 = \int \Big[\int \ell_\tau(u_T; \gamma_0(\tau), \hat Z)\, dG(u_T) - \int \ell_\tau(u_T; \hat\gamma(\tau), \hat Z)\, dG(u_T)\Big]\, dG(y_T),$$

where $H$ is a diagonal matrix, $H = \operatorname{diag}(S, I)$. Note that, again, all expectations are taken with respect to the true distribution $G$.

We first evaluate $B_1$:

$$\begin{aligned}
T \times B_1 &= \sum_{t=1}^{T} \big[\rho_\tau(y_t - \hat z_t'\hat\gamma(\tau)) - \rho_\tau(y_t - (Hz_t)'\hat\gamma(\tau))\big] \\
&= \sum_{t=1}^{T} \big[\rho_\tau(y_t - (Hz_t)'\hat\gamma(\tau) - (\hat z_t - Hz_t)'\hat\gamma(\tau)) - \rho_\tau(y_t - (Hz_t)'\hat\gamma(\tau))\big] \\
&= -\frac{1}{\sqrt{N}} \sum_{t=1}^{T} \sqrt{N}(\hat z_t - Hz_t)'\hat\gamma(\tau)\, \psi_\tau(y_t - (Hz_t)'\hat\gamma(\tau)) \\
&\quad + \sum_{t=1}^{T} \int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds.
\end{aligned}$$

From the second line to the third line, we used Knight's identity,

$$\rho_\tau(u - \nu) - \rho_\tau(u) = -\nu\psi_\tau(u) + \int_0^{\nu} \big(I(u \le s) - I(u \le 0)\big)\, ds,$$

with $\psi_\tau(u) = \tau - I(u \le 0)$. We now evaluate these two terms. Note that $\sqrt{N}(Hz_t - \hat z_t)$ and $\hat\gamma(\tau)$ are asymptotically independent because the limit of $\sqrt{N}(Hz_t - \hat z_t)$ is determined by $\varepsilon_t$ and that of $\hat\gamma(\tau)$ is determined by the randomness of $y_t$. Thus

$$-\frac{1}{\sqrt{N}} \sum_{t=1}^{T} \sqrt{N}(\hat z_t - Hz_t)'\hat\gamma(\tau)\, \psi_\tau(y_t - (Hz_t)'\hat\gamma(\tau))$$

can be evaluated as zero from Lemma A.1. Next, we evaluate the term

$$\sum_{t=1}^{T} \int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds.$$

This term can be modified as

$$\begin{aligned}
&\sum_{t=1}^{T} \int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds \\
&\quad = \sum_{t=1}^{T} E\Big[\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds\Big] \\
&\qquad + \sum_{t=1}^{T} \Big[\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds \\
&\qquad\qquad - E\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds\Big] \\
&\quad = E_1 + E_2.
\end{aligned}$$

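This modification is commonly used in standard quantile regression to investigate the asymptotic properties of the parameter estimator (see e.g. Koenker, 2005, pp. 121–22). We have

$$\begin{aligned}
E_1 &= \sum_{t=1}^{T} E\Big[\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds\Big] \\
&= \sum_{t=1}^{T} E\Big[\int_0^{(\hat z_t - Hz_t)'\gamma_0(\tau)} \big(G(\xi_t + s) - G(\xi_t)\big)\, ds\Big] \\
&= \frac{1}{\sqrt{N}} \sum_{t=1}^{T} E\Big[\int_0^{\sqrt{N}(\hat z_t - Hz_t)'\gamma_0(\tau)} \big(G(\xi_t + w/\sqrt{N}) - G(\xi_t)\big)\, dw\Big] \\
&= \frac{1}{N} \sum_{t=1}^{T} E\Big[\int_0^{\sqrt{N}(\hat z_t - Hz_t)'\gamma_0(\tau)} \sqrt{N}\big(G(\xi_t + w/\sqrt{N}) - G(\xi_t)\big)\, dw\Big] \\
&= \frac{1}{N} \sum_{t=1}^{T} E\Big[\int_0^{\sqrt{N}(\hat z_t - Hz_t)'\gamma_0(\tau)} g(\xi_t)\, w\, dw\Big] + o_p(1) \\
&= \frac{1}{2N} \sum_{t=1}^{T} g(\xi_t)\, \sqrt{N}(\hat z_t - Hz_t)' K(\gamma_0(\tau)) \sqrt{N}(\hat z_t - Hz_t) + o_p(1),
\end{aligned}$$

where $\xi_t(\tau) = G^{-1}(\tau \mid Hz_t)$, with $P(y_t \le y \mid Hz_t) = G(y \mid Hz_t)$, is defined by the true quantile structure of $y_t$, and $K(\gamma(\tau)) = \gamma(\tau)\gamma(\tau)'$. Also, we have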

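$$\begin{aligned}
E_2 &= \sum_{t=1}^{T} \Big[\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds \\
&\qquad - E\int_0^{(\hat z_t - Hz_t)'\hat\gamma(\tau)} \big(I(y_t - (Hz_t)'\hat\gamma(\tau) \le s) - I(y_t - (Hz_t)'\hat\gamma(\tau) \le 0)\big)\, ds\Big] \to 0
\end{aligned}$$

in probability (see e.g. Koenker, 2005, p. 122). Thus, noting that $\sqrt{N}(\hat z_t - Hz_t)\sqrt{N}(\hat z_t - Hz_t)'$ is the covariance matrix of $\hat z_t$, we finally have:

$$B_1 = \frac{1}{2TN} \sum_{t=1}^{T} g(\xi_t)\, \operatorname{tr}\big[K(\hat\gamma(\tau))\, \Sigma_z(t)\big] + O\Big(\frac{1}{N}\Big),$$

where

$$\Sigma_z(t) = \begin{pmatrix} V^{-1} R\, Q_t\, R' V^{-1} & L_{f,w} \\ L_{f,w}' & \Sigma_w \end{pmatrix}$$

is the covariance matrix of $\hat z_t$. If $N, T \to \infty$, then $Q_t$ converges to $Q(t)$ in Lemma A.1.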

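We next consider evaluation of the term $B_2$:

$$\begin{aligned}
T \times B_2 &= \sum_{t=1}^{T} \big[\rho_\tau(y_t - (Hz_t)'\gamma_0(\tau)) - \rho_\tau(y_t - (Hz_t)'\hat\gamma(\tau))\big] \\
&= \sum_{t=1}^{T} \big[\rho_\tau(y_t - (Hz_t)'\gamma_0(\tau)) - \rho_\tau(y_t - (Hz_t)'\gamma_0(\tau) - (Hz_t)'(\hat\gamma(\tau) - \gamma_0(\tau)))\big] \\
&= -\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)\, \psi_\tau(y_t - (Hz_t)'\gamma_0(\tau)) \\
&\quad + \sum_{t=1}^{T} \int_0^{(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \big(I(y_t - (Hz_t)'\gamma_0(\tau) \le s) - I(y_t - (Hz_t)'\gamma_0(\tau) \le 0)\big)\, ds.
\end{aligned}$$

It follows from the Lindeberg–Feller Central Limit Theorem that

$$-\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)\, \psi_\tau(y_t - (Hz_t)'\gamma_0(\tau))$$

converges to $-\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'a$ with $a \sim N(0, I_\tau(HZ))$ (see e.g. Koenker, 2005). Thus, this term can be evaluated as zero from Lemma A.1. Next, first note that

$$\begin{aligned}
&E\Big[\sum_{t=1}^{T} \int_0^{(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \big(I(y_t - (Hz_t)'\gamma_0(\tau) \le s) - I(y_t - (Hz_t)'\gamma_0(\tau) \le 0)\big)\, ds\Big] \\
&\quad = \sum_{t=1}^{T} \int_0^{(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \big(G(\xi_t + s) - G(\xi_t)\big)\, ds \\
&\quad = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \int_0^{\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \big(G(\xi_t + w/\sqrt{T}) - G(\xi_t)\big)\, dw \\
&\quad = \frac{1}{T} \sum_{t=1}^{T} \int_0^{\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \sqrt{T}\big(G(\xi_t + w/\sqrt{T}) - G(\xi_t)\big)\, dw \\
&\quad = \frac{1}{T} \sum_{t=1}^{T} \int_0^{\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} g(\xi_t)\, w\, dw + o(1).
\end{aligned}$$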

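Thus, we can evaluate

$$\begin{aligned}
&\sum_{t=1}^{T} \int_0^{(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)} \big(I(y_t - (Hz_t)'\gamma_0(\tau) \le s) - I(y_t - (Hz_t)'\gamma_0(\tau) \le 0)\big)\, ds \\
&\quad \approx \frac{1}{2T} \sum_{t=1}^{T} g(\xi_t)\, \sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))'(Hz_t)(Hz_t)'\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau)) + o_p(1) \\
&\quad \approx \frac{1}{2} \operatorname{tr}\big\{\operatorname{Var}\big(\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))\big)\, J_\tau(HZ)\big\} + o_p(1).
\end{aligned}$$

From Lemma A.2, the variance matrix of $\sqrt{T}(\hat\gamma(\tau) - \gamma_0(\tau))$ is asymptotically given by $J_\tau^{-1}(HZ)\, I_\tau(HZ)\, J_\tau^{-1}(HZ)$. Thus, the term $B_2$ can be evaluated as

$$B_2 = \frac{1}{2T} \operatorname{tr}\big\{J_\tau^{-1}(HZ) \cdot I_\tau(HZ)\big\} + O\Big(\frac{1}{T\sqrt{T}}\Big).$$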


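The term $B_3$ can be evaluated as zero, because

$$\int \ell_\tau(y_T; \gamma_0(\tau), HZ)\, dG(y_T) = \int \ell_\tau(u_T; \gamma_0(\tau), HZ)\, dG(u_T).$$

Finally, we consider the evaluations of $B_4$ and $B_5$. Using an argument similar to that used for the evaluation of the terms $B_1$ and $B_2$, we have:

$$B_4 = \frac{1}{2TN} \sum_{t=1}^{T} g(\xi_t)\, \operatorname{tr}\big[K(\gamma_0(\tau))\, \Sigma_z(t)\big] + O\Big(\frac{1}{N}\Big),$$

$$B_5 = \frac{1}{2T} \operatorname{tr}\big\{J_\tau^{-1}(HZ) \cdot I_\tau(HZ)\big\} + O\Big(\frac{1}{T\sqrt{T}}\Big).$$

Combining these results, we obtain the asymptotic bias as:

$$b(G) = \frac{1}{T} \operatorname{tr}\big\{J_\tau^{-1}(HZ) \cdot I_\tau(HZ)\big\} + \frac{1}{TN} \sum_{t=1}^{T} g(\xi_t)\, \operatorname{tr}\big[K(\gamma_0(\tau))\, \Sigma_z(t)\big] + O\Big(\frac{1}{B_{N,T}}\Big),$$

with $B_{N,T} = \min\{T\sqrt{T}, N\}$. This is the required result.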



Testing for sphericity in a fixed effects panel data model

BADI H. BALTAGI†, QU FENG‡ AND CHIHWA KAO†

† Center for Policy Research, 426 Eggers Hall, Syracuse University, Syracuse, NY 13244-1020, USA. E-mail: [email protected], [email protected]

‡ Division of Economics, Nanyang Technological University, 14 Nanyang Drive, Singapore 637332. E-mail: [email protected]

First version received: July 2009; final version accepted: March 2010.

Summary This paper proposes a test for the null of sphericity in a fixed effects panel data model. It uses the Random Matrix Theory based approach of Ledoit and Wolf to test for the null of sphericity of the error terms in a fixed effects panel model with a large number of cross-sectional units and time series observations. Because the errors are unobservable, the residuals from the fixed effects regression are used. The limiting distribution of the proposed test statistic is derived. In addition, its finite sample properties are examined using Monte Carlo simulations.

Keywords: Cross-sectional dependence, John test, Panel data, Sphericity.

1. INTRODUCTION

This paper proposes a new test for the null of sphericity of the remainder disturbances in a fixed effects panel data regression model with large n and T based on a random matrix theory (RMT) approach. This is important for applied panel data work given the numerous applications that report fixed effects estimates ignoring cross-section dependence or heteroscedasticity. We offer this test as a diagnostic.¹ One can always report robust HAC type options for the fixed effects estimates perhaps corrected for finite samples; see the simulation results in Long and Ervin (2000) in case of heteroscedasticity and the recommended finite sample correction by MacKinnon and White (1985). The popular method for robustifying the fixed effects estimator to account for heteroscedasticity in panels is based on Arellano (1987). This is programmed by Stata; see also Hansen (2007) and Stock and Watson (2008). For spatial HAC corrections, see Driscoll and Kraay (1998), Conley (1999) and Kelejian and Prucha (2007). More recently, the HAC suggested by Bester et al. (2009) using dependent data in time series, spatial and panel data, seems to be promising.² Of course, if one is willing to put a more formal structure on the form of cross-section dependence or heteroscedasticity, one can re-estimate the model using this structure. Conley and Molinari (2007) investigated the impact of location/distance measurement errors upon the accuracy of parametric and non-parametric estimators of asymptotic variances. They also suggested a specification test based on a parametric bootstrap that has good power properties for the types of measurement error they considered.

The asymptotic results for our test statistic require large dimensional panels and these are becoming more available, especially in finance and marketing; for example, scanner data on thousands of customers over a long period of time in marketing research, and stock purchases for thousands of firms over a long period of time in finance. We base our test on the statistical literature which assumes normality and typically focuses on the raw data. In order not to impose any structure on the covariance matrix using the raw data, statisticians have based their tests for the null of sphericity on the n × n sample covariance matrix, denoted by S. However, with the availability of more data, the dimension of the sample covariance matrix increases, and the researcher is soon faced with the 'curse of dimensionality'. Also, when n exceeds T, the sample covariance matrix S becomes singular. Even when n/T is smaller than 1, but n is still large, the n × n sample covariance matrix S will be ill-conditioned. These, in turn, cast doubt on any test involving S including the likelihood ratio test (Ledoit and Wolf, 2004). More importantly, the RMT literature shows that the sample covariance matrix is not a consistent estimator of the population covariance matrix when n is large and comparable with T. Because S is a random matrix, not a random variable or a random vector, a different concept of consistency is applied to account for the change of dimensionality. The spectral norm and Frobenius norm of a matrix are often used in this literature. In fact, Geman (1980) shows that when the sample is from an i.i.d. normal distribution with zero mean and an identity variance–covariance matrix, the spectral norm of S does not converge to that of the identity matrix.³ In addition, the largest eigenvalue of the sample covariance matrix follows a Tracy–Widom distribution asymptotically; for example, Johnstone (2001). These results are very different from the textbook results in multivariate statistics with fixed n and large T. In the latter case, the eigenvalues of S are consistent estimators of the population eigenvalues (Theorem 13.5.1 in Anderson, 2003). In addition, Ledoit and Wolf (2004) show that the scaled Frobenius norm of the sample covariance matrix does not converge to that of the population covariance matrix. The intuition behind these results is straightforward. In large n and T panels, the noise contained in each element of S is the same as in the case of fixed n, and its magnitude is 1/T. However, in a setup with comparably large n and T, the noise involving all the elements of S accumulates with increasing n. This cannot be smoothed away by a large T as in the case with fixed n.

To account for the 'curse of dimensionality' in testing for the null of sphericity with large panels, an RMT approach is introduced. The RMT literature provides useful asymptotic results in this setting with comparably large n and T. Based on the work of John (1971) and Ledoit and Wolf (2002), this paper proposes a test for the null of sphericity of the disturbances of a fixed effects panel data model with comparably large n and T. We call it the John test. Its limiting distribution under the null is derived and its finite sample properties are studied using Monte Carlo experiments.

The organization of the paper is as follows. The next section briefly discusses the fixed effects panel regression model and the hypothesis to be tested. Section 3 studies the proposed John test, whereas Section 4 derives its limiting distribution. Section 5 discusses the finite sample bias of this test. Section 6 compares the size and power of the proposed test as well as the traditional tests for cross-sectional dependence using Monte Carlo experiments. We include these tests because under homoscedasticity, testing the null of sphericity is equivalent to testing the null of no cross-section dependence. Section 7 concludes. The Appendix contains all the proofs and the technical details.

Notation: The Frobenius norm of a matrix A is denoted as $\|A\|_F = (\operatorname{tr}(A'A))^{1/2}$, where $\operatorname{tr}(A)$ denotes the trace of A. $\xrightarrow{d}$ denotes convergence in distribution and $\xrightarrow{p}$ denotes convergence in probability.

¹ Another diagnostic test, in the same spirit, is given by Hsiao et al. (2009) who test for cross-sectional independence in non-linear panel data models.

² One important warning however. Unlike heteroscedasticity, certain types of cross-sectional dependence in the data may result in the inconsistency of OLS; see Andrews (2005) for a factor model example, and Lee (2002) for a spatial autoregressive model example. See also, the more recent paper by Bai (2009) who considers a panel data model where the cross-section dependence is generated by interactive fixed effects. For this model, the usual fixed effects estimator would be inconsistent, and hence a HAC based on standard fixed effects residuals may lead to misleading inference. The point is that in these cases, a HAC estimate may not be sufficient to correct the effect of cross-sectional dependence.

³ If $n/T$ converges to a non-zero constant c as (n, T) → ∞ then the eigenvalues of the sample covariance matrix will spread from $(1 - \sqrt{c})^2$ to $(1 + \sqrt{c})^2$, while all the eigenvalues of the population covariance matrix (identity matrix) are 1. See Bai (1999) for a survey. This paper follows the concept of asymptotics in Ledoit and Wolf (2002): n and T go to infinity jointly with comparable convergence rate. Please see Assumption 2.3.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society. Published by Blackwell Publishing Ltd.
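The eigenvalue spread described in the Introduction can be illustrated with a small simulation. This is a sketch under assumed settings (n = 200, T = 400, standard normal data, fixed seed) that are not taken from the paper; the helper name `sample_cov_eigs` is hypothetical.

```python
import numpy as np

def sample_cov_eigs(n, T, seed=0):
    """Eigenvalues of S = (1/T) * sum_t v_t v_t' for i.i.d. N(0, I_n) vectors v_t."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((T, n))   # row t holds v_t'
    S = V.T @ V / T                   # n x n sample covariance (mean known to be zero)
    return np.linalg.eigvalsh(S)      # eigenvalues in ascending order

n, T = 200, 400                       # assumed illustration sizes, c = n / T = 0.5
c = n / T
eigs = sample_cov_eigs(n, T)

lower = (1 - np.sqrt(c)) ** 2         # limiting lower edge of the spectrum
upper = (1 + np.sqrt(c)) ** 2         # limiting upper edge of the spectrum
second_moment = np.mean(eigs ** 2)    # equals tr(S^2) / n

print("smallest / largest eigenvalue:", eigs[0], eigs[-1])
print("limiting edges:", lower, upper)
print("tr(S)/n:", eigs.mean(), " tr(S^2)/n:", second_moment, " 1 + c:", 1 + c)
```

Although every population eigenvalue equals 1, the sample eigenvalues spread over roughly the interval between the two edges, while their average stays close to 1 and the scaled trace of $S^2$ is close to $1 + c$ rather than 1.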

2. THE MODEL AND ASSUMPTIONS

Consider a fixed effects panel regression model:

$$y_{it} = \alpha + x_{it}'\beta + \mu_i + v_{it}, \quad \text{for } i = 1, \ldots, n;\ t = 1, \ldots, T, \tag{2.1}$$

where i indexes the cross-sectional units and t the time series observations. $y_{it}$ is the dependent variable and $x_{it}$ denotes the exogenous regressors of dimension k × 1, β denotes the corresponding slope parameters. $\mu_i$ denotes the time-invariant individual effects. The individual effects are allowed to be fixed or random and they could be correlated with the regressors. For a detailed discussion, see Baltagi (2008). The idiosyncratic error $v_{it}$ is assumed to be not serially correlated over time and independent of the $x_{it}$, but otherwise is allowed to have a general variance–covariance matrix across the individual units. Let $v_t = (v_{1t}, \ldots, v_{nt})'$.

ASSUMPTION 2.1. The n × 1 vectors $v_1, v_2, \ldots, v_T$ are assumed to be i.i.d. $N(0, \Sigma_n)$.

Here, the n × n population variance–covariance matrix $\Sigma_n$ allows for heteroscedasticity as well as a general form of cross-sectional dependence structure which is assumed to be stable over time. The null hypothesis of sphericity is given by

$$H_0: \Sigma_n = \sigma_v^2 I_n. \tag{2.2}$$

The alternative hypothesis $H_a: \Sigma_n \neq \sigma_v^2 I_n$ implies cross-sectional dependence or heteroscedasticity or both. For panel data with fixed n and large T, classical multivariate statistics shows that $\Sigma_n$ can be consistently estimated by the n × n sample covariance matrix S. It follows that S reveals information on cross-sectional dependence. Hence, tests can be constructed based on S or its sample correlation coefficients matrix counterpart; see Breusch and Pagan (1980) and Ng (2006). In the statistics literature, (2.2) is usually tested using the raw data and not in a regression context; see John (1971). Ledoit and Wolf (2002) extend John's (1971) work and study a test for the null of sphericity of large-dimensional covariance matrices. They show that with large n and T, the test statistic proposed by John (1971) still works but follows a different limiting distribution. This paper extends their work to test for the null of sphericity of the disturbances of a fixed effects panel data regression model.⁴ Because $v_{it}$ is unobservable, the test statistic is based on consistent estimates of the regression residuals. We use the within estimator for the slope parameter β:

$$\tilde\beta = \Big(\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}'\Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} \tilde x_{it}\tilde y_{it}, \tag{2.3}$$

where $\tilde x_{it} = x_{it} - \bar x_{i\cdot}$ and $\bar x_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} x_{it}$. The variable $\tilde y_{it}$ is defined similarly. It is well established that under $H_0$, $\tilde\beta$ is consistent. The within estimator wipes out the time-invariant variables whether observed or not. In this sense, this estimator is robust to the omission of time-invariant variables from (2.1) that are unobserved. Simultaneously, this estimator guards against possible endogeneity between time-invariant regressors and the error term. The residuals $\hat v_{it}$ can be obtained as follows:

$$\hat v_{it} = \tilde y_{it} - \tilde x_{it}'\tilde\beta. \tag{2.4}$$

Having $\hat v_{it}$, the residual-based sample covariance matrix can be obtained as $\hat S = \frac{1}{T}\sum_{t=1}^{T} \hat v_t \hat v_t'$, where $\hat v_t = (\hat v_{1t}, \ldots, \hat v_{nt})'$ for t = 1, . . . , T. Unlike the multivariate analysis and RMT literature setting, this paper considers a panel fixed effects regression model. Consequently, the effect of replacing the error $v_t$ with the residual $\hat v_t$ on the asymptotics is examined. The other assumptions needed to derive the asymptotics are given below.

ASSUMPTION 2.2. The regressors $\{x_{it}, i = 1, \ldots, n, t = 1, \ldots, T\}$ and the idiosyncratic disturbances $\{v_{it}, i = 1, \ldots, n, t = 1, \ldots, T\}$ are independent. The regressors $\{x_{it}\}$ have finite fourth moments, $E[\|x_{it}\|^4] \le K < \infty$, where K is a positive constant.

The normality assumption may be strict but it is standard in this literature. Assumption 2.2 is required for the consistency of the fixed effects estimator.

ASSUMPTION 2.3. $n/T \to c \in [0, \infty)$ as (n, T) → ∞.

We consider an asymptotic framework employed by Ledoit and Wolf (2004). Unlike the standard asymptotics, where only T increases, or n increases, the framework (n, T) → ∞ considered here regards n as a sequence indexed by T, denoted as $n_T$. As T goes to infinity, $n_T/T$ approaches a constant c. For simplicity, we suppress the subscript T of n in the rest of the paper.
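The within estimator (2.3), the residuals (2.4) and the residual-based covariance matrix $\hat S$ can be sketched as follows. The function names `within_residuals` and `residual_cov`, the array layout and the simulated design (n = 50, T = 40, k = 2, individual effects correlated with the regressors) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def within_residuals(y, X):
    """Within (fixed effects) estimator and residuals.

    y : array of shape (n, T); X : array of shape (n, T, k).
    Returns (beta_tilde, v_hat) with v_hat of shape (n, T).
    """
    y_dm = y - y.mean(axis=1, keepdims=True)      # y_it minus its time average
    X_dm = X - X.mean(axis=1, keepdims=True)      # x_it minus its time average
    k = X.shape[2]
    Xf = X_dm.reshape(-1, k)                      # stack all (i, t) observations
    yf = y_dm.reshape(-1)
    beta = np.linalg.solve(Xf.T @ Xf, Xf.T @ yf)  # equation (2.3)
    v_hat = y_dm - X_dm @ beta                    # equation (2.4)
    return beta, v_hat

def residual_cov(v_hat):
    """S_hat = (1/T) * sum_t v_hat_t v_hat_t', an n x n matrix."""
    T = v_hat.shape[1]
    return v_hat @ v_hat.T / T

# Simulated panel under the null of sphericity (assumed design).
rng = np.random.default_rng(0)
n, T, k = 50, 40, 2
beta_true = np.array([1.0, -0.5])
mu = rng.standard_normal(n)                            # individual effects
X = rng.standard_normal((n, T, k)) + mu[:, None, None] # regressors correlated with mu
v = rng.standard_normal((n, T))                        # spherical errors
y = 2.0 + X @ beta_true + mu[:, None] + v

beta_tilde, v_hat = within_residuals(y, X)
S_hat = residual_cov(v_hat)
print("within estimate:", beta_tilde)
```

Because the data are demeaned unit by unit, the intercept and the individual effects drop out, and each row of `v_hat` has zero time average by construction.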

3. JOHN TEST

We propose the following test statistic for testing the null of sphericity described in (2.2) based on the sample covariance matrix of the fixed effects residuals:

$$J = \frac{T\big(\frac{1}{n}\operatorname{tr}\hat S\big)^{-2}\, \frac{1}{n}\operatorname{tr}(\hat S^2) - T - n}{2} - \frac{1}{2} - \frac{n}{2(T-1)}, \tag{3.1}$$

where $\hat S = \frac{1}{T}\sum_{t=1}^{T} \hat v_t \hat v_t'$ is the n × n sample covariance matrix computed using the within residuals $\hat v_t$. Because this test is based upon the raw data test proposed by John (1971), it is referred to as the John test in this paper and (3.1) is thus called the John statistic. The limiting distribution of the John statistic (3.1) is standard normal under the null, as shown in the next section.

The discussion begins with the simple raw data case, and then extends it to the fixed effects model (2.1). Under the normality assumption, the n × n sample covariance matrix $S = \frac{1}{T}\sum_{t=1}^{T} v_t v_t'$ follows a Wishart distribution with T degrees of freedom. In the traditional case with fixed dimension n and T → ∞, the most commonly used test statistic to test the null (2.2) is the likelihood ratio test. However, when n > T, S is singular and the likelihood ratio test is not feasible. John (1971) proposes a test for the null hypothesis of sphericity described in (2.2) in the case of fixed n and large T. In this context, this test statistic is given by

$$U = \frac{1}{n}\operatorname{tr}\Big[\Big(\Big(\frac{1}{n}\operatorname{tr}S\Big)^{-1} S - I_n\Big)^2\Big] = \Big(\frac{1}{n}\operatorname{tr}S\Big)^{-2}\, \frac{1}{n}\operatorname{tr}(S^2) - 1, \tag{3.2}$$

where S is the n × n sample covariance matrix and $I_n$ is the n × n identity matrix. Note that $\frac{1}{n}\operatorname{tr}S$ is the average of the eigenvalues of S, and as shown later, $\frac{1}{n}\operatorname{tr}S$ converges to $\sigma_v^2$ under the null. Thus, $(\frac{1}{n}\operatorname{tr}S)^{-1}S$ can be regarded as a sample version of $(\sigma_v^2)^{-1}\Sigma_n$. Because the null hypothesis (2.2) can be written as $(\sigma_v^2)^{-1}\Sigma_n - I_n = 0$, using the matrix norm notation introduced above, $U = \frac{1}{n}\|(\frac{1}{n}\operatorname{tr}S)^{-1}S - I_n\|_F^2$ can be regarded as the squared Frobenius norm of a sample version of $(\sigma_v^2)^{-1}\Sigma_n - I_n$, measuring the deviation of the scaled S from the identity matrix. A large value of U indicates a significant deviation of $\Sigma_n$ from sphericity. John (1972) shows that under the null, and using the normality assumption,

$$\frac{nT}{2}\, U \xrightarrow{d} \chi^2_{n(n+1)/2-1} \quad \text{for fixed } n, \text{ as } T \to \infty. \tag{3.3}$$

However, as n → ∞, $\frac{nT}{2}U$ diverges. Ledoit and Wolf (2002) derive the limiting distribution of a modified test statistic under the null, as (n, T) → ∞ with n/T → c ∈ (0, ∞):⁵

$$TU - n \xrightarrow{d} N(1, 4). \tag{3.4}$$

Define the statistic

$$J_0 = \frac{TU - n}{2} - \frac{1}{2},$$

then⁶

$$J_0 \xrightarrow{d} N(0, 1). \tag{3.5}$$

A similar test was proposed by Srivastava (2005).⁷

In the fixed effects panel data model, we replace the raw data S by its counterpart $\hat S = \frac{1}{T}\sum_{t=1}^{T} \hat v_t \hat v_t'$, where $\hat v_t$ is the within residual in (2.4). Thus, the residual-based $\hat U$ is

$$\hat U = \Big(\frac{1}{n}\operatorname{tr}\hat S\Big)^{-2}\, \frac{1}{n}\operatorname{tr}(\hat S^2) - 1 \tag{3.6}$$

and the residual-based statistic of $J_0$ is defined as

$$\hat J_0 = \frac{T\hat U - n}{2} - \frac{1}{2}. \tag{3.7}$$

However, $\hat J_0$ cannot be used directly to test for the null of sphericity in the fixed effects panel model (2.1). As shown in the next section, a bias occurs using the residual-based statistic $\hat J_0$ by replacing $v_t$ with the within residuals $\hat v_t$. Consequently, we propose the John statistic J. Comparing (3.1) and (3.7), we see that the John statistic is just a bias-corrected $\hat J_0$:

$$J = \hat J_0 - \frac{n}{2(T-1)}. \tag{3.8}$$

⁴ Kapetanios (2004) relaxes the assumption of homoscedasticity in Ledoit and Wolf (2002). Unlike the raw data setup in Kapetanios (2004), this paper considers a fixed effects panel data model, and applies the sphericity test to fixed effects residuals.

⁵ Birke and Dette (2005, Theorem 3.7) point out that (3.4) holds in cases of n/T → 0 and n/T → ∞. Therefore, the case of c = 0 is included in Assumption 2.3.

⁶ Equation (3.5) can be regarded as a normalized version of (3.3) by subtracting its asymptotic mean and dividing by its asymptotic standard deviation. For large n,

$$\frac{1}{\sqrt{n(n+1) - 2}}\Big(\frac{nT}{2}\, U - \Big(\frac{n(n+1)}{2} - 1\Big)\Big) \approx J_0.$$

⁷ The test statistic proposed by Srivastava (2005) for the null (2.2) is defined as $W = \frac{T}{2}(\hat\gamma - 1)$, where

$$\hat\gamma = \frac{T^2}{(T-1)(T+2)} \Big(\frac{1}{n}\operatorname{tr}(S)\Big)^{-2} \frac{1}{n}\Big[\operatorname{tr}(S^2) - \frac{1}{T}(\operatorname{tr}(S))^2\Big].$$

Srivastava (2005) showed that under the null $W \xrightarrow{d} N(0, 1)$ for $T = O(n^\delta)$ with 0 < δ ≤ 1. In fact, this test statistic is asymptotically equivalent to John test (3.2). We can write W as

$$W = \frac{T^2}{(T-1)(T+2)} \cdot \frac{TU - n}{2} - \frac{1}{2} \cdot \frac{T(T-2)}{(T-1)(T+2)}.$$

Hence, $W \approx \frac{TU - n}{2} - \frac{1}{2} = J_0$ for large T, implying the equivalence of the statistic W and that in (3.5).

4. ASYMPTOTICS OF THE JOHN TEST

This section shows that for the fixed effects regression model (2.1) the John statistic J in (3.1) follows asymptotically a standard normal distribution under the null. Equation (3.8) shows that the John statistic J is a bias-corrected $\hat J_0$. J can be written as the sum of three terms

$$J = \hat J_0 - \frac{n}{2(T-1)} = J_0 + (\hat J_0 - J_0) - \frac{n}{2(T-1)} = J_0 + \frac{T(\hat U - U)}{2} - \frac{n}{2(T-1)}. \tag{4.1}$$

The first term $J_0$ is asymptotically standard normal. The second term $\hat J_0 - J_0$ is just the scaled difference between the residual-based $\hat U$ and the true U. We will show that the sum of the second and the third terms vanishes asymptotically. Therefore, the John statistic J has the same limiting distribution as $J_0$ which is standard normal. By (3.2) and (3.6), the bias term $(\hat J_0 - J_0) = T(\hat U - U)/2$ can be written as

$$\frac{T(\hat U - U)}{2} = \frac{T}{2} \cdot \frac{\big(\frac{1}{n}\operatorname{tr}S\big)^{2}\, \frac{1}{n}\operatorname{tr}(\hat S^2) - \big(\frac{1}{n}\operatorname{tr}\hat S\big)^{2}\, \frac{1}{n}\operatorname{tr}(S^2)}{\big(\frac{1}{n}\operatorname{tr}\hat S\big)^{2} \big(\frac{1}{n}\operatorname{tr}S\big)^{2}}. \tag{4.2}$$


It is clear that this bias term depends on the following differences: $\frac{1}{n}\operatorname{tr}\hat S - \frac{1}{n}\operatorname{tr}S$ and $\frac{1}{n}\operatorname{tr}\hat S^2 - \frac{1}{n}\operatorname{tr}(S^2)$, which are studied in the next two propositions.

PROPOSITION 4.1. Under Assumptions 2.1 and 2.2,

$$\frac{1}{n}\operatorname{tr}\hat S - \frac{1}{n}\operatorname{tr}S = O_p\Big(\frac{1}{T}\Big). \tag{4.3}$$

PROPOSITION 4.2. Under Assumptions 2.1 and 2.2,

$$\frac{1}{n}\operatorname{tr}\hat S^2 - \frac{1}{n}\operatorname{tr}(S^2) = O_p\Big(\frac{1}{T}\Big) + O_p\Big(\frac{n}{T^2}\Big). \tag{4.4}$$

Due to the fact that the trace is equal to the sum of the eigenvalues, Proposition 4.1 gives the magnitude of the distance between the average of the sample eigenvalues and that of the true eigenvalues. In RMT, the average of the eigenvalues is the first moment of the empirical spectral distribution (ESD) of S (e.g. Bai and Silverstein, 2006, p. 9), so Proposition 4.1 provides the consequence, on the first moment, of replacing the unobservable sample covariance matrix with the residual-based one. Similarly, Proposition 4.2 shows the consequence of replacing S with $\hat S$ on the second moment of the ESD. Under Assumption 2.3, n/T → c ∈ [0, ∞), the distance between $\hat S$ and S is of order 1/T. For the case of fixed n, it is easy to show that the distance is of $O_p(1/T)$. Propositions 4.1 and 4.2 show that as T → ∞, the difference of the first and second moments of eigenvalues between the residuals and the unobservable disturbances vanishes. Proposition 1 of Ledoit and Wolf (2002) shows that $\frac{1}{n}\operatorname{tr}S$ and $\frac{1}{n}\operatorname{tr}(S^2)$ converge to $\sigma_v^2$ and $(1 + c)\sigma_v^4$, respectively, as (n, T) → ∞ with n/T → c ∈ (0, ∞).⁸ Consequently, it is straightforward to obtain the limits of $\frac{1}{n}\operatorname{tr}\hat S$ and $\frac{1}{n}\operatorname{tr}\hat S^2$.

COROLLARY 4.1. Under Assumptions 2.1 and 2.2, as (n, T) → ∞,

$$\frac{1}{n}\operatorname{tr}\hat S \xrightarrow{p} \sigma_v^2.$$

The limit of $\frac{1}{n}\operatorname{tr}\hat S$ requires no restriction on the relationship between n and T. However, this is not the case for $\frac{1}{n}\operatorname{tr}\hat S^2$. Assumption 2.3 is needed.

COROLLARY 4.2. Under Assumptions 2.1, 2.2 and 2.3, as (n, T) → ∞,

$$\frac{1}{n}\operatorname{tr}\hat S^2 - \Big(1 + \frac{n}{T}\Big)\sigma_v^4 \xrightarrow{p} 0.$$

The earlier results show that the scaled traces of $\hat S$ and $\hat S^2$ are bounded. The limit of the trace of $\hat S^2$ is related to the ratio n/T. When n increases with T, the noise accumulates and the result is affected by the ratio of n/T. Corollary 4.2 follows directly from the proof of Proposition 4.2. With the results on the difference of traces between $\hat S$ and S in Propositions 4.1 and 4.2, it is straightforward to calculate the probability limit of $\frac{T(\hat U - U)}{2}$. This is summarized in the following proposition.

⁸ Proposition 1 of Ledoit and Wolf (2002, p. 1083) presents a slightly different form. These results with higher-order terms can be found in Lemma A.2 in the Appendix.


Figure 1. The histogram of $\hat J_0$ under the null of sphericity in the fixed effects model. (Plot legend: Histogram; N(0,1).)

PROPOSITION 4.3. Under Assumptions 2.1, 2.2 and 2.3,

$$\frac{T(\hat U - U)}{2} - \frac{n}{2(T-1)} = O_p\Big(\frac{1}{n}\Big). \tag{4.5}$$

Proposition 4.3 indicates that in the presence of fixed effects, the bias term $\hat J_0 - J_0 = T(\hat U - U)/2$ does not vanish. In fact, it converges to c/2, where n/T → c ∈ [0, ∞) as (n, T) → ∞. Hence, for the fixed effects model (2.1), the residual-based statistic $\hat J_0$ is biased. The histogram of $\hat J_0$ is illustrated in Figure 1. The design for Figure 1 is described in Section 6. The number of replications is 2000, n = 50, T = 10. Proposition 4.3 suggests a bias adjustment term, $\frac{n}{2(T-1)}$, for the residual-based statistic $\hat J_0$ in a fixed effects panel regression model.⁹ It follows that under the null, the John statistic in the fixed effects panel regression model (2.1), $J = J_0 + (\hat J_0 - J_0) - \frac{n}{2(T-1)}$, has the same limiting distribution as $J_0$, which is standard normal.

THEOREM 4.1. Under Assumptions 2.1, 2.2 and 2.3, in the fixed effects regression model (2.1), as (n, T) → ∞,

$$J \xrightarrow{d} N(0, 1). \tag{4.6}$$

⁹ This paper derives the bias-term for the standard fixed individual effects model. If time effects are also included, a similar bias term can be derived. The derivation is tedious, but straightforward.

5. FINITE SAMPLE BIAS ADJUSTMENT

For large T the bias term $\frac{n}{2(T-1)}$ in Proposition 4.3 is equivalent to $\frac{n}{2T}$ under the assumption n/T → c ∈ [0, ∞) as (n, T) → ∞.
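The shift of $\hat J_0$ and its removal by the adjustment $n/(2(T-1))$ can be illustrated by simulation. The sketch below is not the design of Section 6: it assumes a model with individual effects only (no regressors), so that the within residuals are simply the time-demeaned errors, and it uses 500 replications with n = 50 and T = 10. The function names `john_statistics` and `simulate` are hypothetical.

```python
import numpy as np

def john_statistics(v_hat):
    """Return (J0_hat, J) from an (n, T) array of within residuals.

    U_hat  = (tr(S_hat)/n)^(-2) * tr(S_hat^2)/n - 1      (3.6)
    J0_hat = (T * U_hat - n) / 2 - 1/2                    (3.7)
    J      = J0_hat - n / (2 * (T - 1))                   (3.8)
    """
    n, T = v_hat.shape
    S_hat = v_hat @ v_hat.T / T
    m1 = np.trace(S_hat) / n
    m2 = np.sum(S_hat * S_hat) / n        # tr(S_hat^2)/n, since S_hat is symmetric
    U_hat = m2 / m1 ** 2 - 1.0
    J0_hat = (T * U_hat - n) / 2.0 - 0.5
    J = J0_hat - n / (2.0 * (T - 1))
    return J0_hat, J

def simulate(n=50, T=10, reps=500, seed=0):
    """Draw spherical errors, demean over time (assumed: no regressors), compute both statistics."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, 2))
    for r in range(reps):
        v = rng.standard_normal((n, T))
        v_hat = v - v.mean(axis=1, keepdims=True)   # within residuals without regressors
        out[r] = john_statistics(v_hat)
    return out

n, T = 50, 10
draws = simulate(n=n, T=T)
mean_J0_hat, mean_J = draws.mean(axis=0)
print("mean of J0_hat:", mean_J0_hat, " adjustment n/(2(T-1)):", n / (2 * (T - 1)))
print("mean of J:", mean_J)
```

In this simplified setting the average of $\hat J_0$ sits well above zero, close to the adjustment term, while the average of J is near zero, in line with Proposition 4.3 and Theorem 4.1.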



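COROLLARY 5.1. Under Assumptions 2.1, 2.2 and 2.3, in the fixed effects model (2.1),

$$\hat J_0 - J_0 = \frac{T(\hat U - U)}{2} = \frac{\frac{n}{T}\sigma_v^8 - \frac{n}{T^2}\sigma_v^8 + O_p\big(\frac{\sqrt{n}}{T}\big) + O_p\big(\frac{1}{\sqrt{n}}\big) + O_p\big(\frac{1}{\sqrt{T}}\big)}{\frac{2(T-1)^2}{T^2}\sigma_v^8 + O_p\big(\frac{1}{\sqrt{nT}}\big)}. \tag{5.1}$$

Corollary 5.1 follows from (A.8) in the proof of Proposition 4.3 in the Appendix. When $\sqrt{n}/T \to 0$, a good approximation for (5.1) becomes

$$\frac{\frac{n}{T}\sigma_v^8 - \frac{n}{T^2}\sigma_v^8}{\frac{2(T-1)^2}{T^2}\sigma_v^8} = \frac{n}{2(T-1)}. \tag{5.2}$$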
Obviously, $\sqrt{n}/T \to 0$ can be regarded as a special case of n/T → c ∈ [0, ∞). The Monte Carlo experiments in the next section show this approximation is also good for even relatively large n and small T. Consequently, we propose the John statistic for the fixed effects panel regression model in (4.1).

The within residuals in the fixed effects model are consistent as T → ∞, and the convergence rate depends on T. When T is not very large compared with n, within residuals are inaccurate. It is this inaccuracy that accumulates in the bias term $\hat J_0 - J_0$. Consequently, the bias term $\hat J_0 - J_0$ lingers in the fixed effects model. Specifically, consider the fixed effects panel regression model in (2.1), $y_{it} = \alpha + x_{it}'\beta + \mu_i + v_{it}$. The distance between the within residuals $\hat v_{it}$ and the idiosyncratic error $v_{it}$ is

$$\hat v_{it} - v_{it} = -\bar v_{i\cdot} - \tilde x_{it}'(\tilde\beta - \beta) = O_p(T^{-1/2}) + O_p((nT)^{-1/2}) = O_p(T^{-1/2}), \tag{5.3}$$

where $\tilde\beta$ is the within estimate, $\bar v_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} v_{it}$ and $\tilde x_{it} = x_{it} - \frac{1}{T}\sum_{t=1}^{T} x_{it}$. In (4.2), because the denominator is always bounded, the magnitude of this distance is determined by the numerator, which in turn depends on differences involving first and second moments of $\hat S$ and S:

$$\hat S - S = \frac{1}{T}\sum_{t=1}^{T} \hat v_t \hat v_t' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t',$$

where the $\hat v_t$'s are within residuals. After some algebra, we get

$$\hat S - S = -\frac{1}{T}\sum_{t=1}^{T} \tilde x_t(\tilde\beta - \beta)\tilde v_t' - \frac{1}{T}\sum_{t=1}^{T} \tilde v_t(\tilde\beta - \beta)'\tilde x_t' + \frac{1}{T}\sum_{t=1}^{T} \tilde x_t(\tilde\beta - \beta)(\tilde\beta - \beta)'\tilde x_t' - \bar v_\cdot \bar v_\cdot'. \tag{5.4}$$

Here $\tilde x_t$ is the within transformation on $x_t$, and $\tilde v_t$ is similarly defined. Equation (5.4) has an additional term $-\bar v_\cdot \bar v_\cdot'$, which comes from the within transformation. As shown in the Appendix,

$$\frac{1}{n}\operatorname{tr}\Big[-\frac{1}{T}\sum_{t=1}^{T} \tilde x_t(\tilde\beta - \beta)\tilde v_t' - \frac{1}{T}\sum_{t=1}^{T} \tilde v_t(\tilde\beta - \beta)'\tilde x_t' + \frac{1}{T}\sum_{t=1}^{T} \tilde x_t(\tilde\beta - \beta)(\tilde\beta - \beta)'\tilde x_t'\Big] = O_p\Big(\frac{1}{nT}\Big). \tag{5.5}$$

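However, the scaled trace of the extra term from the within transformation has a bigger magnitude,

\[ \frac{1}{n}\mathrm{tr}(-\bar{v}_{\cdot}\bar{v}_{\cdot}') = -\frac{1}{n}\sum_{i=1}^{n} \bar{v}_{i\cdot}^2 = -\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T} v_{it}\right)^2 = O_p\!\left(\frac{1}{T}\right), \]

which dominates $O_p\!\left(\frac{1}{nT}\right)$ for comparably large n and T. Therefore,

\[ \frac{1}{n}\mathrm{tr}(\hat{S} - S) = O_p\!\left(\frac{1}{nT}\right) + O_p\!\left(\frac{1}{T}\right) = O_p\!\left(\frac{1}{T}\right). \]

Consequently, the distance of the first and second moments (between $\hat{S}$ and $S$) is so large that the bias term $\hat{J}_0 - J_0$ cannot be ignored asymptotically as shown in (4.5). As a result, the residual-based statistic $\hat{J}_0$ exhibits a shift and is subject to asymptotic bias.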
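The role of the extra term $-\bar{v}_{\cdot}\bar{v}_{\cdot}'$ can be checked numerically. The sketch below leaves out regressors, so that the within residuals are just the demeaned errors; that simplification, the seed and the panel dimensions are assumptions made for illustration. It verifies that demeaning changes the sample covariance matrix by exactly $-\bar{v}_{\cdot}\bar{v}_{\cdot}'$ and that the scaled trace of this term is close to $-\sigma_v^2/T$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sigma2 = 500, 10, 0.5
v = rng.normal(scale=np.sqrt(sigma2), size=(n, T))   # column t is the vector v_t
v_bar = v.mean(axis=1, keepdims=True)                # n x 1 vector of time means
v_tilde = v - v_bar                                  # within-transformed errors

s = v @ v.T / T                                      # S
s_tilde = v_tilde @ v_tilde.T / T                    # covariance of demeaned errors
h = s_tilde - s                                      # change caused by demeaning

identity_gap = np.max(np.abs(h + v_bar @ v_bar.T))   # zero up to rounding error
scaled_trace = np.trace(h) / n                       # close to -sigma2 / T
print(identity_gap, scaled_trace, -sigma2 / T)
```

The scaled trace is of order 1/T no matter how large n is, which is why the shift in the residual-based statistic does not disappear as the cross-section grows.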

6. MONTE CARLO SIMULATIONS

In this section, we conduct Monte Carlo experiments to assess the empirical size and power of the John test proposed in this paper. Throughout the experiments, we assume homoscedasticity on the remainder error term. This means that testing the null of sphericity is equivalent to testing the null of no cross-section dependence. Hence, we include in our experiments Pesaran's (2004) CD test and the bias-adjusted LM test derived by Pesaran et al. (2008), denoted as PUY's LM test. The latter tests are based on sample correlation coefficients and test for zero cross-section dependence. The John test is given by (3.1), whereas PUY's LM test is defined as

\[ \text{PUY's LM} = \sqrt{\frac{2}{n(n-1)}} \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{(T-k)\hat{\rho}_{ij}^2 - \mu_{Tij}}{v_{Tij}}, \]

where

\[ \mu_{Tij} = \frac{1}{T-k}\mathrm{tr}(M_i M_j) \]

and

\[ v_{Tij}^2 = [\mathrm{tr}(M_i M_j)]^2 a_{1T} + 2\,\mathrm{tr}[(M_i M_j)^2] a_{2T}, \]

with

\[ a_{1T} = a_{2T} - \frac{1}{(T-k)^2}, \qquad a_{2T} = 3\left[\frac{(T-k-8)(T-k+2) + 24}{(T-k+2)(T-k-2)(T-k-4)}\right]^2. \]

Note that $M_i$ is the residual maker matrix of the individual regression i, whereas k is the number of the regressors.10 Pesaran's CD test statistic is defined as:

\[ \text{Pesaran's CD} = \sqrt{\frac{2T}{n(n-1)}} \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \hat{\rho}_{ij}, \]

where $\hat{\rho}_{ij}$ is the sample correlation of the residuals $\hat{v}_{it}$. Specifically,

\[ \hat{\rho}_{ij} = \left(\sum_{t=1}^{T} \hat{v}_{it}^2\right)^{-1/2} \left(\sum_{t=1}^{T} \hat{v}_{jt}^2\right)^{-1/2} \sum_{t=1}^{T} \hat{v}_{it}\hat{v}_{jt}. \]
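The CD statistic defined above is simple to compute from a matrix of residuals. The following sketch uses the uncentred pairwise correlation exactly as written in the definition of $\hat{\rho}_{ij}$; the function name `pesaran_cd` and the (n, T) array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def pesaran_cd(resid):
    """Pesaran's CD statistic from an (n, T) array of residuals."""
    n, T = resid.shape
    norms = np.sqrt((resid**2).sum(axis=1))            # root sum of squares per unit
    rho = (resid @ resid.T) / np.outer(norms, norms)   # pairwise correlations
    upper = rho[np.triu_indices(n, k=1)]               # keep pairs with i < j
    return np.sqrt(2.0 * T / (n * (n - 1))) * upper.sum()

# Degenerate check: identical residual series give rho_ij = 1 for every pair.
same = np.tile(np.arange(1.0, 6.0), (4, 1))
print(pesaran_cd(same))
```

Because positive and negative pairwise correlations cancel in the sum, the statistic can stay small when the dependence has mixed signs, which is consistent with the low power against the factor alternative reported in Section 6.2.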

We do not include the traditional Breusch and Pagan (1980) LM test statistic since it is not applicable when $n \to \infty$.

6.1. Experiment design

The experiments use the following data-generating process:

\[ y_{it} = \alpha + \beta x_{it} + \mu_i + v_{it}, \qquad i = 1, \ldots, n;\ t = 1, \ldots, T, \tag{6.1} \]

\[ x_{it} = \rho x_{i,t-1} + \mu_i + \eta_{it}, \tag{6.2} \]

where $\mu_i$ is the fixed effect. The regressor $x_{it}$ is generated in a similar way to that of Im et al. (1999). Because $x_{it}$ is correlated with the $\mu_i$'s, it is endogenous. $\eta_{it}$ is assumed to be i.i.d. $N(\phi_\eta, \sigma_\eta^2)$, and $v_{it}$ is the idiosyncratic error. Under the null, $v_{it}$ is assumed to be i.i.d. $N(0, \sigma_v^2)$ across individuals and over time. To calculate the power of the tests considered, two different models of the cross-sectional dependence in the $v_{it}$'s are used: a factor model and a spatial model. In the former, see Pesaran (2004) and Pesaran and Tosetti (2008) to mention a few, it is assumed that

\[ v_{it} = \gamma_i f_t + \varepsilon_{it}, \tag{6.3} \]

where $f_t$ (t = 1, ..., T) are the factors and $\gamma_i$ (i = 1, ..., n) are the loadings. In a spatial model, see Anselin and Bera (1998) and Baltagi et al. (2003), to mention a few, we consider both a first-order spatial autocorrelation (SAR(1)) and a spatial moving average (SMA(1)) model as follows:

\[ v_{it} = \delta(0.5 v_{i-1,t} + 0.5 v_{i+1,t}) + \varepsilon_{it}, \tag{6.4} \]

\[ v_{it} = \delta(0.5 \varepsilon_{i-1,t} + 0.5 \varepsilon_{i+1,t}) + \varepsilon_{it}. \tag{6.5} \]

For the specifications (6.3), (6.4) and (6.5), $\varepsilon_{it}$ is assumed to be i.i.d. $N(0, \sigma_\varepsilon^2)$ across individuals and over time. The null can be regarded as a special case of $\gamma_i = 0$ in the factor model (6.3) and $\delta = 0$ in the spatial models (6.4) and (6.5). The parameters $\alpha$ and $\beta$ are set arbitrarily to 1 and 2, respectively. The $\mu_i$'s are assumed to be i.i.d. $N(\phi_\mu, \sigma_\mu^2)$ for i = 1, ..., n. We set $\phi_\mu = 0$ and $\sigma_\mu^2 = 0.25$. For the regressor in (6.2), $\rho = 0.7$, $\phi_\eta = 0$ and $\sigma_\eta^2 = 1$. $v_{it}$ (under the null) and $\varepsilon_{it}$ (under the alternative) are from $N(0, \sigma_v^2)$ and $N(0, \sigma_\varepsilon^2)$ with $\sigma_v^2 = \sigma_\varepsilon^2 = 0.5$. For the factor model in (6.3), $\gamma_i \sim$ i.i.d. $U(-0.5, 0.55)$ and $f_t \sim$ i.i.d. $N(0, 1)$. For the spatial model, $\delta = 0.4$ in (6.4) and (6.5). The Monte Carlo experiments are conducted for n = 5, 10, 20, 30, 50, 100, 200 and T = 10, 20, 30, 50. For each replication, we compute the John, CD and PUY's LM test statistics. Two thousand replications are performed. To obtain the empirical size, the John test is conducted at the two-sided 5% nominal significance level.

10 $M_i = I_T - X_i(X_i'X_i)^{-1}X_i'$, where $X_i = (x_{i1}, \ldots, x_{iT})'$ contains T observations of the regressors in the individual regression i.

6.2. Results

Table 1 gives the empirical size of the John test under the null of sphericity. Hence, in these experiments we assume that there is no cross-section dependence of the factor model or spatial correlation type. By and large, the size of the John test is close to 5% for comparably large n and T and for small n and large T. This is consistent with the theoretical results derived in Theorem 4.1 of Section 4. For large n and small T, the John test is slightly oversized. Similarly, the size of PUY's LM test is close to 5% except for large n and small T, whereas the CD test has the correct size for all combinations of n and T. Table 2 presents the size adjusted power of these tests under the alternative specification of a factor model. Note that the size adjusted power of the John and PUY's LM tests increases with n and T. However, the CD test lacks power in this case. The power of the CD test is less than 23% even for n = 200 and T = 50. Tables 3 and 4 show the size adjusted power of these tests under the alternative specification of SAR(1) and SMA(1), respectively. Note that all the tests have low power for small T = 10, but this improves considerably when T = 20. In these cases, the size adjusted power of the John test is slightly better than that of PUY's LM test for all combinations of n and T. The size adjusted power of the CD test is much better than in the case of a factor model.11

11 See Pesaran and Tosetti (2008) who distinguish between factor models and spatial models in terms of time-specific weak versus strong cross-sectional dependence.

Table 1. Size of tests.

Test             T     n=5   n=10   n=20   n=30   n=50  n=100  n=200
John            10     5.5    7.3    7.5    8.3    8.2    7.3    9.7
                20     5.6    6.8    5.5    5.6    7.1    7.2    6.2
                30     6.0    6.1    5.6    7.0    5.2    6.3    6.4
                50     4.9    6.2    5.1    5.9    6.8    5.1    5.2
PUY's LM        10     5.0    5.7    3.7    6.4    5.5    8.3   24.8
                20     4.3    4.7    4.2    4.1    5.7    5.8    6.5
                30     4.6    4.1    3.9    6.3    4.8    5.4    6.2
                50     4.4    5.0    5.5    4.8    6.1    4.3    4.6
Pesaran's CD    10     5.4    5.2    5.9    5.7    4.4    5.0    6.2
                20     5.4    4.2    4.9    5.3    5.0    6.1    5.8
                30     4.4    4.9    5.3    4.7    6.5    5.4    5.0
                50     4.6    5.4    4.8    5.5    4.1    4.9    4.3

Note: In a fixed effects panel data model specified in Section 6, this table reports the size of the John test, Pesaran et al. (2008) (PUY) LM test and Pesaran's (2004) CD test. Homoscedasticity and normality of the idiosyncratic errors are assumed. These tests are conducted at the two-sided 5% nominal significance level.

Table 2. Size adjusted power of tests: factor model.

Test             T     n=5   n=10   n=20   n=30   n=50  n=100  n=200
John            10     9.4   14.8   37.8   47.3   71.7   88.0   94.8
                20    18.7   35.1   67.4   86.7   95.5   99.8  100.0
                30    24.6   55.3   88.1   94.7   99.6  100.0  100.0
                50    45.4   80.1   97.8   99.8  100.0  100.0  100.0
PUY's LM        10     7.1   12.2   32.2   37.6   62.9   84.2   92.0
                20    17.7   32.7   61.3   80.2   93.1   99.5  100.0
                30    23.3   51.3   84.6   93.5   99.4  100.0  100.0
                50    39.6   75.0   96.3   99.5   99.9  100.0  100.0
Pesaran's CD    10     4.9    6.9    5.2    6.8    6.6    7.0    9.8
                20     8.2    9.7    8.3    9.5   10.5   11.8   13.5
                30    11.0    8.6    9.9   12.6    9.0   12.1   15.8
                50    14.0   12.2   13.4   14.4   16.4   18.5   22.6

Note: In order to calculate the size adjusted power, a factor structure model is assumed to allow for cross-sectional dependence in the errors; see Section 6.

Table 3. Size adjusted power of tests: SAR(1) model.

Test             T     n=5   n=10   n=20   n=30   n=50  n=100  n=200
John            10    33.9   34.6   40.2   34.4   42.0   44.5   45.0
                20    72.6   77.1   87.2   89.9   88.0   91.2   94.5
                30    91.5   96.9   99.6   98.9   99.1   99.6   99.8
                50    99.9  100.0  100.0  100.0  100.0  100.0  100.0
PUY's LM        10    26.6   22.5   29.4   20.0   23.9   27.4   24.6
                20    69.5   70.2   78.0   79.4   74.6   75.1   79.9
                30    91.7   95.8   98.1   97.3   97.4   98.6   99.8
                50    99.8  100.0  100.0  100.0  100.0  100.0  100.0
Pesaran's CD    10    58.9   50.4   46.2   47.8   45.3   41.5   38.9
                20    84.3   78.5   72.3   72.3   68.1   66.8   67.6
                30    96.3   87.5   87.7   85.3   85.0   82.9   82.3
                50    99.6   98.7   98.0   97.4   97.1   96.0   97.3

Note: In order to calculate the size adjusted power, a SAR(1) model is assumed to allow for cross-sectional dependence in the errors; see Section 6.


Table 4. Size adjusted power of tests: SMA(1) model.

Test             T     n=5   n=10   n=20   n=30   n=50  n=100  n=200
John            10    16.6   19.0   22.4   16.9   22.9   24.1   25.6
                20    45.0   50.6   63.8   65.5   66.8   66.3   65.6
                30    75.2   84.2   90.2   92.1   94.6   95.2   94.8
                50    98.6   99.9  100.0  100.0  100.0  100.0  100.0
PUY's LM        10    15.9   16.6   21.2   12.6   16.2   20.6   15.4
                20    52.3   52.5   61.5   62.1   61.1   59.3   63.5
                30    85.6   85.7   90.0   90.5   91.0   91.0   89.4
                50    99.6   99.9  100.0  100.0  100.0  100.0  100.0
Pesaran's CD    10    38.2   32.2   33.6   32.3   31.7   30.6   26.8
                20    62.5   54.7   54.1   53.0   45.5   47.1   47.7
                30    83.9   70.4   66.6   67.7   68.5   64.7   67.6
                50    96.5   91.2   88.9   88.7   85.5   85.1   87.2

Note: In order to calculate the size adjusted power, an SMA(1) model is assumed to allow for cross-sectional dependence in the errors; see Section 6.

Table 5. Size of tests with non-normal errors.

                  N(0, 0.5)            U[−2, 2]             χ²(1)                t(5)
Test     T     n=20  n=50  n=100    n=20  n=50  n=100    n=20  n=50  n=100    n=20  n=50  n=100
John    10      7.5   8.2   7.3      9.8  11.1   7.7     65.0  77.9  81.2     24.9  32.6  34.2
        30      5.6   5.2   6.3      8.9   9.3   7.7     83.8  92.5  94.8     33.2  41.7  45.8
LM      10      3.7   5.5   8.3      4.8   6.5   4.0      6.8   6.4  12.3      5.4   6.0   9.1
        30      3.9   4.8   5.4      5.0   4.6   4.0      8.7   8.5   8.8      5.6   4.9   4.2
CD      10      5.9   4.4   5.0      5.1   5.9   4.3      4.8   5.1   5.3      4.6   5.5   5.4
        30      5.3   6.5   5.4      6.1   5.4   4.5      5.4   4.8   4.3      4.1   4.1   6.0

Note: In order to check the sensitivity of the tests to non-normal disturbances, the uniform distribution U[−2, 2], the chi-square distribution with one degree of freedom, χ²(1), and the t-distribution with five degrees of freedom, t(5), are considered. The normal case is also presented for comparison.

Table 5 checks the sensitivity of our tests to non-normal disturbances. The CD and PUY tests seem robust to non-normality. However, the proposed John test seems to be very sensitive to non-normality. For the uniform distribution U[−2, 2], the John test has a size similar to that reported for the normal distribution. However, it suffers from large size distortions when the error distribution is chi-square or a t-distribution. In this sense, the normality assumption (Assumption 2.1) is crucial, and our test is not robust to non-normality.
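The design of Section 6.1 can be sketched as a small simulator. This is an illustration under stated assumptions rather than the authors' code: the function name `simulate_panel`, the burn-in used to start the AR(1) regressor, and the circular treatment of the first and last cross-sectional units in the spatial weights are not specified in the text and are choices made here.

```python
import numpy as np

def simulate_panel(n, T, rng, alternative=None, delta=0.4, burn=50):
    """Draw one sample from the design in (6.1)-(6.5) (illustrative sketch)."""
    alpha, beta, rho = 1.0, 2.0, 0.7
    mu = rng.normal(0.0, np.sqrt(0.25), size=n)         # fixed effects
    eps = rng.normal(0.0, np.sqrt(0.5), size=(n, T))    # i.i.d. normal shocks
    if alternative is None:                             # null: spherical errors
        v = eps
    elif alternative == "factor":                       # factor model (6.3)
        gamma = rng.uniform(-0.5, 0.55, size=(n, 1))
        f = rng.normal(size=(1, T))
        v = gamma * f + eps
    elif alternative in ("sar", "sma"):
        # Weights of 0.5 on the previous and next unit; wrapping around at
        # the ends of the cross-section is an assumption.
        w = np.zeros((n, n))
        idx = np.arange(n)
        w[idx, (idx - 1) % n] += 0.5
        w[idx, (idx + 1) % n] += 0.5
        if alternative == "sar":                        # SAR(1) model (6.4)
            v = np.linalg.solve(np.eye(n) - delta * w, eps)
        else:                                           # SMA(1) model (6.5)
            v = eps + delta * (w @ eps)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    # Regressor (6.2): AR(1) over time, correlated with mu; burn-in is assumed.
    x_prev = np.zeros(n)
    x = np.empty((n, T))
    for t in range(burn + T):
        x_prev = rho * x_prev + mu + rng.normal(size=n)
        if t >= burn:
            x[:, t - burn] = x_prev
    y = alpha + beta * x + mu[:, None] + v
    return y, x
```

Empirical size would then be estimated by drawing many samples with `alternative=None`, computing each test statistic, and recording the rejection frequency at the 5% level. Size adjusted power replaces the nominal critical values with the empirical quantiles obtained under the null.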

7. CONCLUSION

This paper proposes a new test for the null of sphericity of the disturbances of a fixed effects panel data regression model. Under homoscedasticity of the disturbances, this is equivalent to testing for no cross-sectional dependence. There are several econometric methods that model cross-section dependence, including the popular spatial correlation models and the factor models. To avoid the ad hoc specifications imposed on the covariance matrix, a test based on the sample covariance matrix of a fixed effects panel data regression model is proposed. Following Ledoit and Wolf (2002), we propose the John test using the fixed effects residuals. The limiting distribution of the proposed test is derived and its finite sample properties are examined using Monte Carlo experiments. The simulation results show that the John test performs better than PUY's LM test and Pesaran's CD test and can be applied in empirical panel data studies using fixed effects residuals. However, the John test remains oversized for panels with large n and small T. One limitation of our proposed test is that it is sensitive to non-normality of the disturbances, and its derivation and asymptotic distribution rely heavily on the normality assumption. Another limitation of our test statistic is the assumption that there is no time series dependence and full independence between time-varying regressors and time-varying unobservables. The reason we impose these restrictive assumptions is that we rely heavily on results from random matrix theory, which to our knowledge have not been extended to deal with non-normality, serial correlation or endogeneity.

ACKNOWLEDGMENTS The authors thank the editor Jaap Abbring and two anonymous referees for their useful comments and suggestions.

REFERENCES

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: John Wiley.
Andrews, D. W. K. (2005). Cross-section regression with common shocks. Econometrica 73, 1551–85.
Anselin, L. and A. K. Bera (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In A. Ullah and D. Giles (Eds.), Handbook of Applied Economic Statistics, 237–89. New York: Marcel Dekker.
Arellano, M. (1987). Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics 49, 431–34.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica 77, 1229–79.
Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica 9, 611–77.
Bai, Z. D. and J. Silverstein (2006). Spectral Analysis of Large Dimensional Random Matrices. Beijing: Science Press.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th ed.). New York: John Wiley.
Baltagi, B. H., Q. Feng and C. Kao (2009). Testing for sphericity in a fixed effects panel data model: supplementary appendix. Center for Policy Research Working Paper No. 112, Syracuse University, New York.
Baltagi, B. H., S. H. Song and W. Koh (2003). Testing panel data regression models with spatial error correlation. Journal of Econometrics 117, 123–50.
Bester, C. A., T. G. Conley and C. B. Hansen (2009). Inference with dependent data using cluster covariance estimators. Working paper, Graduate School of Business, University of Chicago.
Birke, M. and H. Dette (2005). A note on testing the covariance matrix for large dimension. Statistics and Probability Letters 74, 281–89.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its application to model specification in econometrics. Review of Economic Studies 47, 239–54.
Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.
Conley, T. G. and F. Molinari (2007). Spatial correlation robust inference with errors in location or distance. Journal of Econometrics 140, 76–96.
Driscoll, J. C. and A. C. Kraay (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics 80, 549–60.
Geman, S. (1980). A limit theorem for the norm of random matrices. Annals of Probability 8, 252–61.
Hansen, C. B. (2007). Asymptotic properties of a robust variance matrix estimator for panel data when T is large. Journal of Econometrics 141, 597–620.
Hsiao, C., M. H. Pesaran and A. Pick (2009). Diagnostic tests of cross section independence for nonlinear panel data models. Working paper, University of Cambridge.
Im, K. S., S. C. Ahn, P. Schmidt and J. M. Wooldridge (1999). Efficient estimation of panel data models with strictly exogenous explanatory variables. Journal of Econometrics 93, 177–201.
John, S. (1971). Some optimal multivariate tests. Biometrika 58, 123–27.
John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika 59, 169–73.
Johnstone, I. (2001). On the distribution of the largest principal component. Annals of Statistics 29, 295–327.
Kapetanios, G. (2004). On testing for diagonality of large dimensional covariance matrices. Working paper, Queen Mary University of London.
Kelejian, H. H. and I. Prucha (2007). HAC estimation in a spatial framework. Journal of Econometrics 140, 131–54.
Ledoit, O. and M. Wolf (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Annals of Statistics 30, 1081–102.
Ledoit, O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88, 365–411.
Lee, L. (2002). Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econometric Theory 18, 252–77.
Long, J. S. and H. Ervin (2000). Using heteroskedasticity consistent standard errors in the linear regression model. American Statistician 54, 217–24.
MacKinnon, J. G. and H. White (1985). Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29, 53–57.
Ng, S. (2006). Testing cross-section correlation in panel data using spacings. Journal of Business and Economic Statistics 24, 12–23.
Pesaran, M. H. (2004). General diagnostic test for cross section dependence in panels. Working paper, University of Cambridge.
Pesaran, M. H. and E. Tosetti (2008). Large panels with common factors and spatial correlations. Working paper, University of Cambridge.
Pesaran, M. H., A. Ullah and T. Yamagata (2008). A bias-adjusted LM test of error cross section independence. Econometrics Journal 11, 105–27.
Srivastava, M. S. (2005). Some test concerning the covariance matrix in high dimensional data. Journal of the Japanese Statistical Society 35, 251–72.
Stock, J. H. and M. W. Watson (2008). Heteroskedasticity-robust standard errors for fixed effects panel data regression. Econometrica 76, 155–74.

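APPENDIX: PROOFS OF PROPOSITIONS

The following lemma is frequently used in the proofs:

LEMMA A.1. For a random sequence $\{Z_n\}$, if $E Z_n^2 = O(n^{\nu})$, where $\nu$ is a constant, then $Z_n = O_p(n^{\nu/2})$.

In the fixed effects model, $y_{it} = x_{it}'\beta + \mu_i + v_{it}$, $\tilde{\beta}$ is the within estimator and the within residuals are given by $\hat{v}_{it} = \tilde{y}_{it} - \tilde{x}_{it}'\tilde{\beta}$, where $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}$ and $\tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot}$. Define $\tilde{v}_{it} = v_{it} - \bar{v}_{i\cdot}$; then the residuals are $\hat{v}_{it} = \tilde{v}_{it} - \tilde{x}_{it}'(\tilde{\beta} - \beta)$ and in vector form we have $\hat{v}_t = \tilde{v}_t - \tilde{x}_t(\tilde{\beta} - \beta)$. To obtain $T(\hat{U} - U)$ in (4.2) we need to calculate four terms: $\frac{1}{n}\mathrm{tr}S$, $\frac{1}{n}\mathrm{tr}(S^2)$, $\frac{1}{n}\mathrm{tr}\hat{S}$ and $\frac{1}{n}\mathrm{tr}(\hat{S}^2)$. The following lemma is needed to prove Theorem 4.1. It verifies Proposition 1 in Ledoit and Wolf (2002). Moreover, its results provide the order of the higher-order terms, which are used in the calculation of the asymptotic bias of the John statistic.

LEMMA A.2. Under Assumptions 2.1 and 2.2, (a) $\frac{1}{n}\mathrm{tr}S = \sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)$; (b) $\frac{1}{n}\mathrm{tr}(S^2) = \left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right)$.

Proof: Consider (a),

\[ \frac{1}{n}\mathrm{tr}S = \frac{1}{n}\mathrm{tr}\left(\frac{1}{T}\sum_{t=1}^{T} v_t v_t'\right) = \frac{1}{nT}\sum_{t=1}^{T}\mathrm{tr}(v_t v_t') = \frac{1}{nT}\sum_{t=1}^{T} v_t'v_t = \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n} v_{it}^2 \]
\[ = \sigma_v^2 + \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n}\left(v_{it}^2 - \sigma_v^2\right) = \sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right), \]

since $\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\sum_{i=1}^{n}(v_{it}^2 - \sigma_v^2) = O_p(1)$. This proves (a). For (b),

\[ \frac{1}{n}\mathrm{tr}(S^2) = \frac{1}{n}\mathrm{tr}\left(\frac{1}{T}\sum_{t=1}^{T} v_t v_t' \cdot \frac{1}{T}\sum_{s=1}^{T} v_s v_s'\right) = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathrm{tr}(v_t v_t' v_s v_s') = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s=1}^{T} v_t'v_s v_s'v_t \]
\[ = \frac{1}{nT^2}\sum_{t=1}^{T} v_t'v_t v_t'v_t + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t} v_t'v_s v_s'v_t \]
\[ = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{j=1}^{n} v_{it}^2 v_{jt}^2 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n}\sum_{j=1}^{n} v_{it}v_{is}v_{js}v_{jt} \]
\[ = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n} v_{it}^4 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i} v_{it}^2 v_{jt}^2 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n} v_{it}^2 v_{is}^2 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n}\sum_{j \neq i} v_{it}v_{is}v_{js}v_{jt} \]
\[ = I + II + III + IV. \]

It is easy to show that

\[ I = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n} v_{it}^4 = \frac{E(v_{it}^4)}{T} + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n}\left(v_{it}^4 - E(v_{it}^4)\right) = O_p\!\left(\frac{1}{T}\right), \]

\[ II = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i} v_{it}^2 v_{jt}^2 = \frac{n-1}{T}\sigma_v^4 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i}\left(v_{it}^2 v_{jt}^2 - \sigma_v^4\right) = \frac{n-1}{T}\sigma_v^4 + O_p\!\left(\frac{1}{T\sqrt{T}}\right), \]

\[ III = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n} v_{it}^2 v_{is}^2 = \frac{T-1}{T}\sigma_v^4 + \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n}\left(v_{it}^2 v_{is}^2 - \sigma_v^4\right) = \frac{T-1}{T}\sigma_v^4 + O_p\!\left(\frac{1}{T\sqrt{n}}\right) \]

and

\[ IV = \frac{1}{nT^2}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n}\sum_{j \neq i} v_{it}v_{is}v_{js}v_{jt} = O_p\!\left(\frac{1}{T}\right), \]

using

\[ \frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\sum_{i=1}^{n}\left(v_{it}^4 - E(v_{it}^4)\right) = O_p(1), \qquad \frac{1}{n\sqrt{T}}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i}\left(v_{it}^2 v_{jt}^2 - \sigma_v^4\right) = O_p(1) \]

and

\[ \frac{1}{\sqrt{n}\,T}\sum_{t=1}^{T}\sum_{s \neq t}\sum_{i=1}^{n}\left(v_{it}^2 v_{is}^2 - \sigma_v^4\right) = O_p(1). \]

Hence,

\[ \frac{1}{n}\mathrm{tr}(S^2) = O_p\!\left(\frac{1}{T}\right) + \frac{n-1}{T}\sigma_v^4 + O_p\!\left(\frac{1}{T\sqrt{T}}\right) + \frac{T-1}{T}\sigma_v^4 + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T}\right) = \left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right) \]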

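as required.

Proof of Proposition 4.1: Recall $\tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{v}_{it}$ and

\[ \tilde{\beta} - \beta = \left(\sum_{t=1}^{T}\sum_{i=1}^{n} \tilde{x}_{it}\tilde{x}_{it}'\right)^{-1} \sum_{t=1}^{T}\sum_{i=1}^{n} \tilde{x}_{it}\tilde{v}_{it}. \]

It is easy to show that

\[ \tilde{\beta} - \beta = O_p\!\left(\frac{1}{\sqrt{nT}}\right). \]

Now $\hat{v}_{it} = \tilde{y}_{it} - \tilde{x}_{it}'\tilde{\beta} = \tilde{x}_{it}'\beta + \tilde{v}_{it} - \tilde{x}_{it}'\tilde{\beta} = \tilde{v}_{it} - \tilde{x}_{it}'(\tilde{\beta} - \beta)$. In vector form $\hat{v}_t = \tilde{v}_t - \tilde{x}_t(\tilde{\beta} - \beta)$, where $\tilde{v}_t = v_t - \bar{v}_{\cdot}$. It follows that

\[ \hat{S} - S = \frac{1}{T}\sum_{t=1}^{T} \hat{v}_t\hat{v}_t' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' = \frac{1}{T}\sum_{t=1}^{T} [\tilde{v}_t - \tilde{x}_t(\tilde{\beta} - \beta)][\tilde{v}_t - \tilde{x}_t(\tilde{\beta} - \beta)]' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' = H + G, \]

where

\[ H = \frac{1}{T}\sum_{t=1}^{T} \tilde{v}_t\tilde{v}_t' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' \tag{A.1} \]

and

\[ G = -\frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t(\tilde{\beta} - \beta)\tilde{v}_t' - \frac{1}{T}\sum_{t=1}^{T} \tilde{v}_t(\tilde{\beta} - \beta)'\tilde{x}_t' + \frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'\tilde{x}_t'. \tag{A.2} \]

Hence,

\[ \frac{1}{n}\mathrm{tr}\hat{S} - \frac{1}{n}\mathrm{tr}S = \frac{1}{n}\mathrm{tr}(\hat{S} - S) = \frac{1}{n}\mathrm{tr}(H) + \frac{1}{n}\mathrm{tr}(G). \tag{A.3} \]

It is easy to verify that $\sum_{t=1}^{T}\sum_{i=1}^{n} \tilde{x}_{it}\tilde{x}_{it}' = O_p(nT)$ and $\sum_{t=1}^{T}\sum_{i=1}^{n} \tilde{x}_{it}\tilde{v}_{it} = O_p(\sqrt{nT})$. Then,

\[ \frac{1}{n}\mathrm{tr}(G) = -\frac{1}{nT}(\tilde{\beta} - \beta)'\sum_{t=1}^{T} \tilde{x}_t'\tilde{v}_t - \frac{1}{nT}\sum_{t=1}^{T} \tilde{v}_t'\tilde{x}_t(\tilde{\beta} - \beta) + \frac{1}{nT}(\tilde{\beta} - \beta)'\sum_{t=1}^{T} \tilde{x}_t'\tilde{x}_t(\tilde{\beta} - \beta) = O_p\!\left(\frac{1}{nT}\right). \tag{A.4} \]

We need to compute the first term of (A.3). Using the fact that

\[ H = \frac{1}{T}\sum_{t=1}^{T} \tilde{v}_t\tilde{v}_t' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' = \frac{1}{T}\sum_{t=1}^{T} (v_t - \bar{v}_{\cdot})(v_t - \bar{v}_{\cdot})' - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' \]
\[ = \frac{1}{T}\sum_{t=1}^{T} \left(v_t v_t' - v_t\bar{v}_{\cdot}' - \bar{v}_{\cdot}v_t' + \bar{v}_{\cdot}\bar{v}_{\cdot}'\right) - \frac{1}{T}\sum_{t=1}^{T} v_t v_t' = -\bar{v}_{\cdot}\bar{v}_{\cdot}', \]

it follows that

\[ \frac{1}{n}\mathrm{tr}(H) = \frac{1}{n}\mathrm{tr}(-\bar{v}_{\cdot}\bar{v}_{\cdot}') = -\frac{1}{n}\bar{v}_{\cdot}'\bar{v}_{\cdot} = -\frac{1}{n}\sum_{i=1}^{n} \bar{v}_{i\cdot}^2 = -\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T} v_{it}\right)^2 = -\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{s=1}^{T}\sum_{t=1}^{T} v_{is}v_{it} \]
\[ = -\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2 - \frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s \neq t} v_{is}v_{it} = -\frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right) = O_p\!\left(\frac{1}{T}\right). \tag{A.5} \]

Collecting (A.3), (A.4) and (A.5), we obtain

\[ \frac{1}{n}\mathrm{tr}\hat{S} - \frac{1}{n}\mathrm{tr}S = O_p\!\left(\frac{1}{T}\right). \]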
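A.2. Proof of Proposition 4.2

We define

\[ A_0 = \bar{v}_{\cdot}\bar{v}_{\cdot}', \qquad A_1 = \frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t(\tilde{\beta} - \beta)\tilde{v}_t', \qquad A_2 = A_1' = \frac{1}{T}\sum_{t=1}^{T} \tilde{v}_t(\tilde{\beta} - \beta)'\tilde{x}_t' \]

and

\[ A_3 = \frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'\tilde{x}_t'. \]

Thus, $\hat{S} - S = -A_0 - A_1 - A_2 + A_3$. To prove Proposition 4.2, we need the following lemma.

LEMMA A.3. Under Assumptions 2.1 and 2.2,
(a) $\frac{1}{n}\mathrm{tr}(SA_1) = O_p\!\left(\frac{1}{T^2}\right) + O_p\!\left(\frac{1}{nT}\right) + O_p\!\left(\frac{1}{T\sqrt{nT}}\right)$;
(b) $\frac{1}{n}\mathrm{tr}(SA_3) = O_p\!\left(\frac{1}{nT}\right)$;
(c) $\frac{1}{n}\mathrm{tr}(A_1^2) = O_p\!\left(\frac{1}{nT^2}\right)$;
(d) $\frac{1}{n}\mathrm{tr}(A_1A_2) = O_p\!\left(\frac{1}{T^2}\right)$;
(e) $\frac{1}{n}\mathrm{tr}(A_1A_3) = O_p\!\left(\frac{1}{nT^2}\right)$;
(f) $\frac{1}{n}\mathrm{tr}(A_3^2) = O_p\!\left(\frac{1}{nT^2}\right)$;
(g) $\frac{1}{n}\mathrm{tr}(SA_0) = O_p\!\left(\frac{1}{T}\right) + O_p\!\left(\frac{n}{T^2}\right)$;
(h) $\frac{1}{n}\mathrm{tr}(A_0^2) = O_p\!\left(\frac{n}{T^2}\right)$;
(i) $\frac{1}{n}\mathrm{tr}(A_0A_1) = O_p\!\left(\frac{1}{T^2}\right)$;
(j) $\frac{1}{n}\mathrm{tr}(A_0A_3) = O_p\!\left(\frac{1}{nT^2}\right)$.

The proofs are included in Baltagi et al. (2009).

Proof of Proposition 4.2: Using $2S(\hat{S} - S) + (\hat{S} - S)^2 = \hat{S}^2 - \hat{S}S + S\hat{S} - S^2$ and $\mathrm{tr}(S\hat{S}) = \mathrm{tr}(\hat{S}S)$, we have

\[ \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \frac{1}{n}\mathrm{tr}(S^2) = \frac{1}{n}\mathrm{tr}(\hat{S}^2 - S^2) = \frac{2}{n}\mathrm{tr}[S(\hat{S} - S)] + \frac{1}{n}\mathrm{tr}(\hat{S} - S)^2. \]

Using the notation above,

\[ \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \frac{1}{n}\mathrm{tr}(S^2) = \frac{2}{n}\mathrm{tr}(S(-A_0 - A_1 - A_2 + A_3)) + \frac{1}{n}\mathrm{tr}((-A_0 - A_1 - A_2 + A_3)(-A_0 - A_1 - A_2 + A_3)) \]
\[ = \frac{2}{n}\mathrm{tr}(-SA_0 - SA_1 - SA_2 + SA_3) + \frac{1}{n}\mathrm{tr}\big(A_0^2 + A_0A_1 + A_0A_2 - A_0A_3 + A_1A_0 + A_1^2 + A_1A_2 - A_1A_3 \]
\[ \qquad + A_2A_0 + A_2A_1 + A_2^2 - A_2A_3 - A_3A_0 - A_3A_1 - A_3A_2 + A_3^2\big). \]

We have $\mathrm{tr}(A_0A_1) = \mathrm{tr}(A_1A_0) = \mathrm{tr}(A_0A_2) = \mathrm{tr}(A_2A_0)$, $\mathrm{tr}(A_1A_2) = \mathrm{tr}(A_2A_1)$, $\mathrm{tr}(A_3A_1) = \mathrm{tr}(A_1A_3) = \mathrm{tr}(A_3A_2) = \mathrm{tr}(A_2A_3)$, $\mathrm{tr}(A_1^2) = \mathrm{tr}(A_2^2)$ and $\mathrm{tr}(SA_2) = \mathrm{tr}(SA_1)$. Hence,

\[ \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \frac{1}{n}\mathrm{tr}(S^2) = -\frac{4}{n}\mathrm{tr}(SA_1) + \frac{2}{n}\mathrm{tr}(SA_3) + \frac{2}{n}\mathrm{tr}(A_1^2) + \frac{2}{n}\mathrm{tr}(A_1A_2) - \frac{4}{n}\mathrm{tr}(A_1A_3) + \frac{1}{n}\mathrm{tr}(A_3^2) \]
\[ \qquad - \frac{2}{n}\mathrm{tr}(SA_0) + \frac{1}{n}\mathrm{tr}(A_0^2) + \frac{4}{n}\mathrm{tr}(A_0A_1) - \frac{2}{n}\mathrm{tr}(A_0A_3). \]

Given these results, the proposition follows directly from Lemma A.3.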


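Define $W_1 = \frac{1}{n}\mathrm{tr}\hat{S} - \frac{1}{n}\mathrm{tr}S$ and $W_2 = \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \frac{1}{n}\mathrm{tr}(S^2)$. To derive the asymptotics of $T(\hat{U} - U)$, we need to calculate the magnitudes of $W_1$ and $W_2$, which requires the following lemma.

LEMMA A.4. Under Assumptions 2.1 and 2.2,
(a) $\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2 = \frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{nT}}\right)$;
(b) $\frac{1}{nT^3}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s \neq t} v_{it}^2 v_{is}^2 = \frac{T-1}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{T^2\sqrt{n}}\right)$;
(c) $\frac{1}{nT^3}\sum_{s=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i} v_{is}^2 v_{js}^2 = \frac{n-1}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{T^2\sqrt{T}}\right)$;
(d) $n\left(\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2\right)^2 = \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2\sqrt{T}}\right)$.

Proof: Consider (a).

\[ \frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2 = \frac{\sigma_v^2}{T} + \frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(v_{it}^2 - \sigma_v^2\right) = \frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{nT}}\right). \]

For (b),

\[ \frac{1}{nT^3}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s \neq t} v_{it}^2 v_{is}^2 = \frac{T-1}{T^2}\sigma_v^4 + \frac{1}{nT^3}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s \neq t}\left(v_{it}^2 v_{is}^2 - \sigma_v^4\right) = \frac{T-1}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{T^2\sqrt{n}}\right). \]

For (c),

\[ \frac{1}{nT^3}\sum_{s=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i} v_{is}^2 v_{js}^2 = \frac{n-1}{T^2}\sigma_v^4 + \frac{1}{nT^3}\sum_{s=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i}\left(v_{is}^2 v_{js}^2 - \sigma_v^4\right) = \frac{n-1}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{T^2\sqrt{T}}\right). \]

Finally, we consider (d).

\[ n\left(\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2\right)^2 = \left(\frac{1}{T^2\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2\right)^2 = \left(\frac{\sqrt{n}}{T}\sigma_v^2 + \frac{1}{T^2\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(v_{it}^2 - \sigma_v^2\right)\right)^2 \]
\[ = \left(\frac{\sqrt{n}}{T}\sigma_v^2 + O_p\!\left(\frac{1}{T\sqrt{T}}\right)\right)^2 = \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2\sqrt{T}}\right). \]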
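This proves the lemma.

The magnitudes and high order terms of $W_1$ and $W_2$ are summarized in the following lemma.

LEMMA A.5. Under Assumptions 2.1 and 2.2, (a) $W_1 = -\frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)$; (b) $W_2 = -\frac{2}{T}\sigma_v^4 - \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right)$.

Proof: From the proof of Proposition 4.1 and Lemma A.4, we obtain $W_1 = -\frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)$. By Lemma A.3, the leading terms of $W_2$ are from parts (g) and (h) of Lemma A.3,

\[ W_2 = -\frac{2}{n}\mathrm{tr}(SA_0) + \frac{1}{n}\mathrm{tr}(A_0^2) + O_p\!\left(\frac{1}{nT}\right) + O_p\!\left(\frac{1}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{nT}}\right). \]

Baltagi et al. (2009) show that

\[ \frac{1}{n}\mathrm{tr}(SA_0) = \frac{1}{nT^3}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s \neq t} v_{it}^2 v_{is}^2 + \frac{1}{nT^3}\sum_{s=1}^{T}\sum_{i=1}^{n}\sum_{j \neq i} v_{is}^2 v_{js}^2 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right) \]

and

\[ \frac{1}{n}\mathrm{tr}(A_0^2) = n\left(\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} v_{it}^2\right)^2 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right). \]

From Lemma A.4, we obtain

\[ \frac{1}{n}\mathrm{tr}(SA_0) = \frac{1}{T}\sigma_v^4 + \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right); \]

\[ \frac{1}{n}\mathrm{tr}(A_0^2) = \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right). \]

It follows that

\[ W_2 = \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \frac{1}{n}\mathrm{tr}(S^2) = -2 \cdot \frac{1}{n}\mathrm{tr}(SA_0) + \frac{1}{n}\mathrm{tr}(A_0^2) + O_p\!\left(\frac{1}{nT}\right) + O_p\!\left(\frac{1}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{nT}}\right) \tag{A.6} \]
\[ = -2\left[\frac{1}{T}\sigma_v^4 + \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right)\right] \]
\[ \qquad + \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{nT}\right) + O_p\!\left(\frac{1}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{nT}}\right) \]
\[ = -\frac{2}{T}\sigma_v^4 - \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right). \tag{A.7} \]

Having the results on $\frac{1}{n}\mathrm{tr}S$, $\frac{1}{n}\mathrm{tr}(S^2)$, $W_1$ and $W_2$, we are in a good position to prove Proposition 4.3.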

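Proof of Proposition 4.3: Plugging the results in Lemma A.2 and Lemma A.5 in (4.2), we obtain

\[ T(\hat{U} - U) = \frac{T\left[\left(\frac{1}{n}\mathrm{tr}S\right)^2 \frac{1}{n}\mathrm{tr}(\hat{S}^2) - \left(\frac{1}{n}\mathrm{tr}\hat{S}\right)^2 \frac{1}{n}\mathrm{tr}(S^2)\right]}{\left(\frac{1}{n}\mathrm{tr}\hat{S}\right)^2\left(\frac{1}{n}\mathrm{tr}S\right)^2} \]
\[ = \frac{T\left[\left(\frac{1}{n}\mathrm{tr}S\right)^2\left(\frac{1}{n}\mathrm{tr}(S^2) + W_2\right) - \left(\frac{1}{n}\mathrm{tr}S + W_1\right)^2 \frac{1}{n}\mathrm{tr}(S^2)\right]}{\left(\frac{1}{n}\mathrm{tr}\hat{S}\right)^2\left(\frac{1}{n}\mathrm{tr}S\right)^2} \]
\[ = \frac{TW_2\left(\frac{1}{n}\mathrm{tr}S\right)^2 - 2TW_1\,\frac{1}{n}\mathrm{tr}S\,\frac{1}{n}\mathrm{tr}(S^2) - TW_1^2\,\frac{1}{n}\mathrm{tr}(S^2)}{\left(\frac{1}{n}\mathrm{tr}\hat{S}\right)^2\left(\frac{1}{n}\mathrm{tr}S\right)^2}. \]

Consider the numerator,

\[ TW_2\left(\frac{1}{n}\mathrm{tr}S\right)^2 - 2TW_1\,\frac{1}{n}\mathrm{tr}S\,\frac{1}{n}\mathrm{tr}(S^2) - TW_1^2\,\frac{1}{n}\mathrm{tr}(S^2) \]
\[ = T\left[-\frac{2}{T}\sigma_v^4 - \frac{n}{T^2}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) + O_p\!\left(\frac{1}{T\sqrt{T}}\right)\right]\left[\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right]^2 \]
\[ \quad - 2T\left[-\frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)\right]\left[\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right]\left[\left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right)\right] \]
\[ \quad - T\left[-\frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)\right]^2\left[\left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right)\right] \]
\[ = \left[-2\sigma_v^4 - \frac{n}{T}\sigma_v^4 + O_p\!\left(\frac{\sqrt{n}}{T}\right) + O_p\!\left(\frac{1}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\sqrt{T}}\right)\right]\left[\sigma_v^4 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right] \]
\[ \quad + \left[2\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{n}}\right)\right]\left[\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right]\left[\left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right)\right] \]
\[ \quad + \left[-\frac{\sigma_v^4}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)\right]\left[\left(\frac{n}{T} + 1\right)\sigma_v^4 + O_p\!\left(\frac{1}{T}\right)\right] \]
\[ = -2\sigma_v^8 - \frac{n}{T}\sigma_v^8 + O_p\!\left(\frac{\sqrt{n}}{T}\right) + O_p\!\left(\frac{1}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\sqrt{T}}\right) \]
\[ \quad + 2\left(\frac{n}{T} + 1\right)\sigma_v^8 + O_p\!\left(\frac{\sqrt{n}}{T}\right) + O_p\!\left(\frac{1}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{T}\right) \]
\[ \quad - \frac{n}{T^2}\sigma_v^8 + O_p\!\left(\frac{1}{T}\right) + O_p\!\left(\frac{\sqrt{n}}{T^2}\right) + O_p\!\left(\frac{1}{T\sqrt{n}}\right) \]
\[ = \frac{n}{T}\sigma_v^8 - \frac{n}{T^2}\sigma_v^8 + O_p\!\left(\frac{\sqrt{n}}{T}\right) + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\sqrt{n}}\right). \]

Similarly, the denominator is

\[ \left(\frac{1}{n}\mathrm{tr}\hat{S}\right)^2\left(\frac{1}{n}\mathrm{tr}S\right)^2 = \left(\frac{1}{n}\mathrm{tr}S + W_1\right)^2\left(\frac{1}{n}\mathrm{tr}S\right)^2 \]
\[ = \left[\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right) - \frac{\sigma_v^2}{T} + O_p\!\left(\frac{1}{T\sqrt{n}}\right)\right]^2\left[\sigma_v^2 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right]^2 \]
\[ = \left[\frac{(T-1)^2}{T^2}\sigma_v^4 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right]\left[\sigma_v^4 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)\right] = \frac{(T-1)^2}{T^2}\sigma_v^8 + O_p\!\left(\frac{1}{\sqrt{nT}}\right). \]

It follows directly that

\[ T(\hat{U} - U) = \frac{\frac{n}{T}\sigma_v^8 - \frac{n}{T^2}\sigma_v^8 + O_p\!\left(\frac{\sqrt{n}}{T}\right) + O_p\!\left(\frac{1}{\sqrt{T}}\right) + O_p\!\left(\frac{1}{\sqrt{n}}\right)}{\frac{(T-1)^2}{T^2}\sigma_v^8 + O_p\!\left(\frac{1}{\sqrt{nT}}\right)}. \tag{A.8} \]

Obviously, $\frac{T(\hat{U} - U)}{2} - \frac{n}{2(T-1)} \to 0$ as $(n, T) \to \infty$ with $n/T \to c \in [0, \infty)$.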
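The size of the bias implied by (A.8) can be illustrated by simulation. The sketch below makes simplifying assumptions: there are no regressors, so the within residuals are the demeaned errors, and John's $U$ is taken to be $\frac{1}{n}\mathrm{tr}(S^2)/(\frac{1}{n}\mathrm{tr}S)^2 - 1$. It averages $T(\hat{U} - U)/2$ over replications for n = 50 and T = 10 and compares the average with $n/(2(T-1))$. The helper name `john_u`, the seed and the number of replications are illustrative.

```python
import numpy as np

def john_u(resid):
    """John's U from an (n, T) array whose columns are the vectors v_t."""
    n, T = resid.shape
    s = resid @ resid.T / T
    return (np.trace(s @ s) / n) / (np.trace(s) / n) ** 2 - 1.0

rng = np.random.default_rng(4)
n, T, reps = 50, 10, 500
gaps = np.empty(reps)
for r in range(reps):
    v = rng.normal(scale=np.sqrt(0.5), size=(n, T))
    v_hat = v - v.mean(axis=1, keepdims=True)     # within residuals, no regressors
    gaps[r] = T * (john_u(v_hat) - john_u(v)) / 2.0

print(gaps.mean(), n / (2.0 * (T - 1)))
```

The average gap should lie close to $n/(2(T-1)) \approx 2.78$, which is the correction subtracted in the bias-adjusted statistic.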



The Hausman test in a Cliff and Ord panel model

JAN MUTL† AND MICHAEL PFAFFERMAYR‡,§,¶

† Institute for Advanced Studies, Stumpergasse 56, 1060 Vienna, Austria. E-mail: [email protected]
‡ Department of Economics, University of Innsbruck, Universitaetsstrasse 15, 6020 Innsbruck, Austria. E-mail: [email protected]
§ Austrian Institute of Economic Research, P.O. Box 91, A-1103 Vienna, Austria.
¶ CESifo, Poschingerstr. 5, 81679 Munich, Germany.

First version received: August 2009; final version accepted: June 2010

Summary This paper studies the random effects model and the fixed effects model for spatial panel data. The model includes a Cliff and Ord type spatial lag of the dependent variable as well as a spatially lagged one-way error component structure, accounting for both heterogeneity and spatial correlation across units. We discuss instrumental variable estimation under both the fixed and the random effects specifications and propose a spatial Hausman test which compares these two models accounting for spatial autocorrelation in the disturbances. We derive the large sample properties of our estimation procedures and show that the test statistic is asymptotically chi-square distributed. A small Monte Carlo study demonstrates that this test works well even in small panels. Keywords: Hausman test, Panel data, Random effects estimator, Spatial econometrics, Within estimator.

1. INTRODUCTION The panel literature offers the random effects and the fixed effects models to account for heterogeneity across units. Although the random effects estimator is more efficient than the fixed effects estimator, in many non-spatial empirical applications the random effects model is rejected in favour of the fixed effects model. Often there are plausible arguments for the explanatory variables to be correlated with unit-specific effects. For example, in earnings equations the unobserved ability of individuals may be reflected in both the unit-specific effects and the explanatory variables such as the years of schooling. The estimation of gravity equations to model bilateral trade flows is another important example where the assumptions of the random effects model are often found to be violated.1 It is perfectly sensible that this issue also comes up in spatial panel models.

1 See also the papers cited in Baltagi (2008) for more examples.

This paper contributes to the literature by introducing a spatial generalized method of moments estimator for panel data models with Cliff and Ord type spatial autocorrelation and one-way error components. Our work complements the seminal paper of Kapoor et al. (2007), who provide a spatial generalized least squares (spatial GLS) estimator for the spatial random effects model. Extending their work, our model allows for an endogenous spatial lag of the dependent variable. We discuss the proper instrumentation of the endogenous spatial lag and suggest an instrumental variable (IV) procedure for both the spatial within estimator and the spatial GLS estimator. To discriminate between the two spatial panel models, we also propose a Hausman test that accounts for spatially autocorrelated disturbances. Specifically, we derive the joint asymptotic distribution of the spatial GLS and the spatial within estimators, as well as the asymptotic distribution of the spatial Hausman test for random versus fixed effects. This test should enable applied researchers to choose between these two models when spatial correlation of the endogenous variable and/or the disturbances is present. Our paper is not the first that considers spatial within or fixed effects estimators. Case (1991) seems to be among the first to estimate spatial random and fixed effects models. Korniotis (2010) introduces a bias-corrected estimator for a spatial dynamic panel model with fixed effects. Lee and Yu (2010) establish the asymptotic properties of quasi-maximum likelihood estimators for fixed effects spatial autoregressive (SAR) panel data models with SAR disturbances, where the time periods and/or the number of spatial units can be finite or large in all combinations except that both are finite (see also Yu et al., 2007, 2008). In the next section, we specify our model and spell out the maintained assumptions.
Section 3 defines the two estimators under consideration and derives their joint asymptotic distribution. Section 4 introduces the feasible counterparts of the considered estimators based on an initial IV estimator. We show that the initial estimator is consistent and asymptotically normal and derive its asymptotic distribution. We also demonstrate that the true and feasible estimators have the same asymptotic distribution. Section 5 defines the spatial Hausman test that allows one to discriminate between the two spatial panel models. It provides its asymptotic distribution under the null and also shows that the test statistic diverges in probability under the alternative hypothesis. In Section 6, we report the results of Monte Carlo experiments that assess both the size and the power of the proposed spatial Hausman test in finite samples. Finally, the last section concludes. Proofs of results are given in the Appendix.

2. THE SPATIAL PANEL MODEL

Consider the following spatial panel model:

$$y_{it,N} = \lambda \sum_{j=1}^{N} w_{ij,N}\, y_{jt,N} + x_{it,N}\beta + d_{i,N}\gamma + u_{it,N}. \tag{2.1}$$

The index $i = 1, \ldots, N$ denotes the cross-sectional dimension of the panel, while the index $t = 1, \ldots, T$ refers to the time series dimension. Throughout we assume that $T$ is fixed; that is, our asymptotic analysis refers to large $N$. $y_{it,N}$ is the (scalar) dependent variable and $\sum_{j=1}^{N} w_{ij,N}\, y_{jt,N}$ denotes the spatial lag of the dependent variable, with $w_{ij,N}$ being observable non-stochastic spatial weights. $\lambda$ is the associated scalar parameter. $x_{it,N}$ denotes a $1 \times (K-1)$ vector of time-varying exogenous variables and $\beta$ is the corresponding $(K-1) \times 1$ parameter vector. $d_{i,N}$ is a $1 \times L$ vector of time-invariant variables, including the constant, with $L \times 1$ parameter vector $\gamma$. Finally, $u_{it,N}$ is the overall disturbance term. We allow for cross-sectional correlation of the disturbances and, in particular, we assume that the disturbances follow a Cliff and Ord type spatial autocorrelation (SAR(1) in the terminology of Anselin, 1988) as proposed by Kapoor et al. (2007):

$$u_{it,N} = \rho \sum_{j=1}^{N} m_{ij,N}\, u_{jt,N} + \varepsilon_{it,N}, \tag{2.2}$$

where $\rho$ is a scalar parameter and $m_{ij,N}$ are observable spatial weights (possibly the same as the weights $w_{ij,N}$). The innovations $\varepsilon_{it,N}$ have the following one-way error component structure:

$$\varepsilon_{it,N} = \mu_{i,N} + \nu_{it,N}, \tag{2.3}$$

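where $\nu_{it,N}$ are independent innovations and $\mu_{i,N}$ are individual effects, which can be either fixed or random. We index all variables by the sample size $N$, because they form triangular arrays. This is necessary because the model involves inverses of matrices whose size depends on $N$, and, hence, their elements must change with $N$. Thus, at the minimum $y_{it,N}$ and $u_{it,N}$ are triangular arrays in the present specification. We sort the data so that the fast index is $i$ and the slow index is $t$. Stacking the model over the $N$ cross-sections for a single period $t$ yields

$$\begin{aligned}
y_{t,N} &= \lambda W_N y_{t,N} + X_{t,N}\beta + D_N\gamma + u_{t,N}, \\
u_{t,N} &= \rho M_N u_{t,N} + \varepsilon_{t,N}, \\
\varepsilon_{t,N} &= \mu_N + \nu_{t,N},
\end{aligned} \tag{2.4}$$

where

$$\begin{gathered}
y_{t,N} = \begin{pmatrix} y_{1t,N} \\ \vdots \\ y_{Nt,N} \end{pmatrix}, \quad
X_{t,N} = \begin{pmatrix} x_{1t,N} \\ \vdots \\ x_{Nt,N} \end{pmatrix}, \quad
D_N = \begin{pmatrix} d_{1,N} \\ \vdots \\ d_{N,N} \end{pmatrix}, \quad
u_{t,N} = \begin{pmatrix} u_{1t,N} \\ \vdots \\ u_{Nt,N} \end{pmatrix}, \\
\varepsilon_{t,N} = \begin{pmatrix} \varepsilon_{1t,N} \\ \vdots \\ \varepsilon_{Nt,N} \end{pmatrix}, \quad
\nu_{t,N} = \begin{pmatrix} \nu_{1t,N} \\ \vdots \\ \nu_{Nt,N} \end{pmatrix}, \quad
\mu_N = \begin{pmatrix} \mu_{1,N} \\ \vdots \\ \mu_{N,N} \end{pmatrix}, \\
W_N = \begin{pmatrix} w_{11,N} & \cdots & w_{1N,N} \\ \vdots & \ddots & \vdots \\ w_{N1,N} & \cdots & w_{NN,N} \end{pmatrix}, \quad
M_N = \begin{pmatrix} m_{11,N} & \cdots & m_{1N,N} \\ \vdots & \ddots & \vdots \\ m_{N1,N} & \cdots & m_{NN,N} \end{pmatrix}.
\end{gathered} \tag{2.5}$$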

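Stacking over time periods, we write our model compactly as

$$\begin{aligned}
y_N &= \lambda \mathbf{W}_N y_N + X_N\beta + \mathbf{D}_N\gamma + u_N = Z_N\delta + u_N, \\
u_N &= \rho \mathbf{M}_N u_N + \varepsilon_N, \\
\varepsilon_N &= (\iota_T \otimes I_N)\mu_N + \nu_N,
\end{aligned} \tag{2.6}$$

where $\mathbf{W}_N = (I_T \otimes W_N)$, $\mathbf{M}_N = (I_T \otimes M_N)$, $\mathbf{D}_N = (\iota_T \otimes D_N)$, $Z_N = (\mathbf{D}_N, \mathbf{W}_N y_N, X_N)$, $\delta = (\gamma', \lambda, \beta')'$ and

$$
y_N = \begin{pmatrix} y_{1,N} \\ \vdots \\ y_{T,N} \end{pmatrix}, \quad
X_N = \begin{pmatrix} X_{1,N} \\ \vdots \\ X_{T,N} \end{pmatrix}, \quad
u_N = \begin{pmatrix} u_{1,N} \\ \vdots \\ u_{T,N} \end{pmatrix}, \quad
\varepsilon_N = \begin{pmatrix} \varepsilon_{1,N} \\ \vdots \\ \varepsilon_{T,N} \end{pmatrix}, \quad
\nu_N = \begin{pmatrix} \nu_{1,N} \\ \vdots \\ \nu_{T,N} \end{pmatrix}. \tag{2.7}
$$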
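The stacking in (2.6) can be illustrated numerically. The following is a minimal sketch, not the authors' code: the "ring" weights matrix, the panel dimensions and all parameter values are hypothetical choices made only to show how the Kronecker products line up with the ordering "unit index fast, time index slow".

```python
import numpy as np

N, T = 4, 3                      # tiny illustrative panel (assumed sizes)
rng = np.random.default_rng(0)

# Hypothetical row-normalized "ring" weights: each unit has two neighbours.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
M = W.copy()                     # same weights for the lag and the error process

iota_T = np.ones((T, 1))
W_big = np.kron(np.eye(T), W)    # I_T kron W_N
M_big = np.kron(np.eye(T), M)    # I_T kron M_N
D = np.ones((N, 1))              # only a constant as time-invariant regressor
D_big = np.kron(iota_T, D)       # iota_T kron D_N

lam, rho = 0.4, 0.3              # assumed parameter values
beta, gamma = np.array([0.5]), np.array([5.0])
X = rng.normal(size=(N * T, 1))  # stacked X_N
mu = rng.normal(size=N)
nu = rng.normal(size=N * T)

eps = np.kron(iota_T, np.eye(N)) @ mu + nu              # third line of (2.6)
u = np.linalg.solve(np.eye(N * T) - rho * M_big, eps)   # second line of (2.6)
y = np.linalg.solve(np.eye(N * T) - lam * W_big,
                    X @ beta + D_big @ gamma + u)       # reduced form of the first line

Z = np.column_stack([D_big, W_big @ y, X])
delta = np.concatenate([gamma, [lam], beta])
resid_gap = np.max(np.abs(y - (Z @ delta + u)))         # y = Z delta + u should hold

# The stacked spatial lag equals the period-by-period lag of (2.4).
lag_by_period = np.concatenate([W @ y[t * N:(t + 1) * N] for t in range(T)])
lag_gap = np.max(np.abs(W_big @ y - lag_by_period))
print(resid_gap, lag_gap)
```

Both gaps should be zero up to floating-point error, which confirms that the compact form and the period-by-period form describe the same model.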

Throughout, we maintain the following basic assumptions, which follow closely those postulated in Kapoor et al. (2007). Note that we reserve the symbols $\sigma_\nu^2$, $\sigma_\mu^2$, $\lambda$ and $\rho$ for the true parameter values.

ASSUMPTION 2.1. The elements of $\nu_N$ are independently and identically distributed over $i$ and $t$ with finite absolute $4 + \delta_\nu$ moments for some $\delta_\nu > 0$. Furthermore, $E(\nu_{it,N}) = 0$ and $E(\nu_{it,N}^2) = \sigma_\nu^2 > 0$.

ASSUMPTION 2.2. The spatial weights collected in $M_N$ and $W_N$ are non-stochastic and
(a) $m_{ii,N} = 0$ and $w_{ii,N} = 0$.
(b) The matrices $(I_N - \rho M_N)$ and $(I_N - \lambda W_N)$ are non-singular.
(c) The row and column sums of the matrices $M_N$, $W_N$, $(I_N - \rho M_N)^{-1}$, $(I_N - \lambda W_N)^{-1}$ are uniformly bounded in absolute value; that is, $\sup_j \sum_{i=1}^{N} |a_{ij,N}| \le k < \infty$, where $k$ does not depend on $N$ (but may depend on parameters of the model; that is, on $\rho$ or $\lambda$, respectively) and $a_{ij,N}$ denotes elements of the above matrices.

ASSUMPTION 2.3. The exogenous variables collected in $X_N$ and $D_N$ are non-stochastic. Their elements are uniformly bounded.

Assumption 2.1 is a restriction on the higher moments of the disturbances required for asymptotic results. Assumption 2.2(a) is a typical normalization (but is not necessary for our asymptotic results). Assumptions 2.2(b) and (c) are regularity conditions. They hold, for example, if the spatial weight matrices $W_N$ and $M_N$ are (maximum)-row normalized and $|\lambda| \le k_\lambda < 1$ and $|\rho| \le k_\rho < 1$, respectively. From Assumption 2.2(c) it follows that $|\rho| \le k_\rho < 1/\lambda_{\max}(M_N)$ and $|\lambda| \le k_\lambda < 1/\lambda_{\max}(W_N)$, where $\lambda_{\max}(\cdot)$ denotes the largest absolute eigenvalue of a matrix and $k_\rho$ and $k_\lambda$ are constants.$^2$ Assumptions like 2.2(b)–2.2(c) are typically maintained in spatial models (see Kelejian and Prucha, 1999) and restrict the extent of spatial dependence among cross-section units. They will be satisfied if the spatial weighting matrix is sparse so that each unit possesses a limited number of neighbours, or if the spatial weights decline sufficiently fast in distance.
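The conditions in Assumption 2.2 are easy to inspect for a given weights matrix. The sketch below is illustrative only: it builds a hypothetical row-normalized rook-contiguity matrix on a 6 × 6 lattice (the function name `rook_weights` and the value of `rho` are assumptions) and reports the quantities the assumption refers to.

```python
import numpy as np

def rook_weights(side):
    """Row-normalized rook-contiguity weights on a side x side lattice."""
    n = side * side
    A = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    A[i, rr * side + cc] = 1.0
    return A / A.sum(axis=1, keepdims=True)

M = rook_weights(6)
rho = 0.8                                            # assumed value with |rho| < 1
S = np.linalg.inv(np.eye(M.shape[0]) - rho * M)      # exists if (I - rho M) is non-singular

zero_diag = np.allclose(np.diag(M), 0.0)             # Assumption 2.2(a)
spec_radius = np.max(np.abs(np.linalg.eigvals(M)))   # largest absolute eigenvalue of M
max_row_sum_inv = np.abs(S).sum(axis=1).max()        # row sums of (I - rho M)^{-1}
max_col_sum_inv = np.abs(S).sum(axis=0).max()        # column sums of (I - rho M)^{-1}
print(zero_diag, spec_radius, max_row_sum_inv, max_col_sum_inv)
```

For a row-normalized matrix with non-negative entries the largest absolute eigenvalue is one, so any $|\rho| < 1$ keeps $(I_N - \rho M_N)$ invertible, and the row sums of the inverse cannot exceed $1/(1 - |\rho|)$.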

2 This follows from Corollary 5.6.16 in Horn and Johnson (1985) using their Lemma 5.6.10.

3. THE ESTIMATION OF SPATIAL PANEL MODELS

In their seminal paper, Kapoor et al. (2007) concentrate on the random effects model, assuming that the explanatory variables and the unit-specific error terms are independent. Yet, in applied work this assumption often does not hold and a fixed effects specification is employed instead. Examples in a non-spatial setting, where the unit-specific effects and the explanatory variables may be correlated, include earnings equations. In this setting, the unobserved ability of an individual is typically correlated with the years of schooling, which enter the earnings equation as an explanatory variable (see e.g. Baltagi, 2008, p. 79). As another example, in models explaining bilateral trade flows the random effects model is typically rejected in favour of the fixed effects model (see Egger, 2000). First, we analyse the spatial random effects estimator for this general spatial panel model.

3.1. The spatial random effects estimator

Under the random effects specification, the unit-specific effects $\mu_{i,N}$ are assumed to be random and the following standard assumption is maintained.

ASSUMPTION 3.1 (RE). The elements of $\mu_N$ are independently and identically distributed with finite absolute $4 + \delta_\mu$ moments for some $\delta_\mu > 0$ and (i) $E(\mu_{i,N}^2) = \sigma_\mu^2 > 0$. (ii) Furthermore, the elements of $\mu_N$ are independent of the process for $\nu_{it,N}$ and $E(\mu_{i,N}) = 0$ for all $i$.

The random effects assumptions maintain that the individual effects exhibit homoscedastic variances and that they are orthogonal to each of the explanatory variables. The latter will always hold if the explanatory variables are non-stochastic and $E[\mu_N] = 0$ (see Wooldridge, 2002, pp. 257 and 259). The disturbances are generated as

$$u_N = (I_{NT} - \rho\mathbf{M}_N)^{-1}\varepsilon_N. \tag{3.1}$$

Under Assumptions 2.1 and 3.1 (RE), the variance–covariance matrix of the disturbances is given by

$$E(u_N u_N') = (I_{NT} - \rho\mathbf{M}_N)^{-1}\left[\sigma_\mu^2(\iota_T\iota_T' \otimes I_N) + \sigma_\nu^2 I_{NT}\right](I_{NT} - \rho\mathbf{M}_N')^{-1} = \sigma_\nu^2\,\Omega_{u,N}, \tag{3.2}$$

where $\Omega_{u,N} = (I_{NT} - \rho\mathbf{M}_N)^{-1}\left[\frac{\sigma_\mu^2}{\sigma_\nu^2}(\iota_T\iota_T' \otimes I_N) + I_{NT}\right](I_{NT} - \rho\mathbf{M}_N')^{-1}$. It proves to be useful to apply the notation $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$ and define the following standard within and between transformation matrices $Q_{i,N}$ ($i = 0, 1$):

$$Q_{0,N} = \left(I_T - \frac{1}{T}J_T\right) \otimes I_N, \qquad Q_{1,N} = \frac{1}{T}J_T \otimes I_N, \tag{3.3}$$

where $J_T$ is a $T \times T$ matrix of unit elements. The matrices $Q_{i,N}$ are the standard transformation matrices utilized in the error component literature but adjusted for the different stacking of the data (compare Kapoor et al., 2007, and Baltagi, 2008). The matrices $Q_{i,N}$ are symmetric, idempotent and mutually orthogonal. The variance–covariance matrix of the disturbances can then be written as (see Baltagi, 2008, p. 18):

$$\Omega_{u,N} = (I_{NT} - \rho\mathbf{M}_N)^{-1}\,\Omega_{\varepsilon,N}\,(I_{NT} - \rho\mathbf{M}_N')^{-1}, \tag{3.4}$$

where

$$\Omega_{\varepsilon,N} = \frac{1}{\sigma_\nu^2}E(\varepsilon_N\varepsilon_N') = \frac{\sigma_1^2}{\sigma_\nu^2}Q_{1,N} + Q_{0,N}. \tag{3.5}$$

Furthermore, the inverse of $\Omega_{u,N}$ can then be expressed as

$$\Omega_{u,N}^{-1} = (I_{NT} - \rho\mathbf{M}_N')\,\Omega_{\varepsilon,N}^{-1}\,(I_{NT} - \rho\mathbf{M}_N), \tag{3.6}$$

where

$$\Omega_{\varepsilon,N}^{-1} = \left(\frac{\sigma_1}{\sigma_\nu}\right)^{-2}Q_{1,N} + Q_{0,N}. \tag{3.7}$$

If the parameter values $\rho$, $\sigma_\nu^2$ and $\sigma_\mu^2$ (and, therefore, $\sigma_1^2$) are known, the efficient GLS estimation procedure transforms the model by the square root of $\Omega_{u,N}^{-1}$ given by

$$\Omega_{u,N}^{-1/2} = \Omega_{\varepsilon,N}^{-1/2}(I_{NT} - \rho\mathbf{M}_N). \tag{3.8}$$

This is equivalent to first applying the spatial counterpart of the Cochrane–Orcutt transformation $(I_{NT} - \rho\mathbf{M}_N)$, which eliminates the spatial correlation from the disturbances, and then the familiar panel GLS transformation $\Omega_{\varepsilon,N}^{-1/2}$, which accounts for the variance–covariance structure of the innovations induced by the random effects. To simplify the exposition, we collect the parameters of the variance–covariance matrix in a vector $\vartheta = (\rho, \sigma_\nu^2, \sigma_1^2)'$ and use the notation $\Omega_{u,N}^{-1/2}(\vartheta)$ to explicitly note the dependence of the GLS transformation on these parameters. Observe that in a balanced panel the order in which the transformations are applied is irrelevant (see also Remark A1 in Kapoor et al., 2007).

Because the spatial lag $\mathbf{W}_N y_N$ is endogenous in the (transformed) model, we adapt the IV procedure described in Kelejian and Prucha (1998).$^3$ Specifically, we first eliminate the spatial correlation from the error term using the Cochrane–Orcutt transformation. Then we apply the IV procedure for random effects models suggested by Baltagi and Li (1992) and Cornwell et al. (1992) and surveyed by Baltagi (2008). In a non-spatial setting, these authors show that the optimal set of instruments for a random effects model with endogenous variables is composed of $[Q_{0,N}X_N, Q_{1,N}X_N, Q_{1,N}\mathbf{D}_N]$.

3 It is possible to use other sets of instruments that are similar to those proposed in Lee (2003) or Kelejian et al. (2004).

Observe that

$$E[y_N] = (I_{NT} - \lambda\mathbf{W}_N)^{-1}(X_N\beta + \mathbf{D}_N\gamma) = \sum_{k=0}^{\infty}\lambda^k\mathbf{W}_N^k(X_N\beta + \mathbf{D}_N\gamma), \tag{3.9}$$

where $\mathbf{W}_N^0 = I_{NT}$. Hence, under the present assumptions, the ideal set of instruments is based on

$$\begin{aligned}
\Omega_{u,N}^{-1/2}(\vartheta)\mathbf{W}_N E[y_N] &= \sum_{k=0}^{\infty}\lambda^k\,\Omega_{u,N}^{-1/2}(\vartheta)\mathbf{W}_N^{k+1}(X_N\beta + \mathbf{D}_N\gamma) \\
&= \sum_{k=0}^{\infty}\lambda^k\left(\frac{\sigma_\nu}{\sigma_1}Q_{1,N} + Q_{0,N}\right)\mathbf{W}_N^{k+1}(X_N\beta + \mathbf{D}_N\gamma) \\
&\quad - \rho\sum_{k=0}^{\infty}\lambda^k\left(\frac{\sigma_\nu}{\sigma_1}Q_{1,N} + Q_{0,N}\right)\mathbf{M}_N\mathbf{W}_N^{k+1}(X_N\beta + \mathbf{D}_N\gamma).
\end{aligned} \tag{3.10}$$

Therefore, the transformed endogenous variable $\Omega_{u,N}^{-1/2}(\vartheta)\mathbf{W}_N y_N$ is best instrumented by $H_{R,N} = [H_{Q,N}, H_{P,N}] = [Q_{0,N}G_{0,N}, Q_{1,N}G_{1,N}]$, where $G_{0,N}$ contains a subset of the linearly independent columns of

$$[X_N, \mathbf{W}_N X_N, \mathbf{W}_N^2 X_N, \ldots, \mathbf{M}_N X_N, \mathbf{M}_N\mathbf{W}_N X_N, \mathbf{M}_N\mathbf{W}_N^2 X_N, \ldots]$$

and $G_{1,N}$ contains a subset of the linearly independent columns of

$$[G_{0,N}, \mathbf{D}_N, \mathbf{W}_N\mathbf{D}_N, \mathbf{W}_N^2\mathbf{D}_N, \ldots, \mathbf{M}_N\mathbf{D}_N, \mathbf{M}_N\mathbf{W}_N\mathbf{D}_N, \mathbf{M}_N\mathbf{W}_N^2\mathbf{D}_N, \ldots].$$

The columns in $G_{0,N}$ and $G_{1,N}$ must be chosen so that the columns of $H_{R,N}$ are linearly independent. We do not include the constant and the other time-invariant variables in $G_{0,N}$, because $Q_{0,N}\mathbf{W}_N\mathbf{D}_N = \left[\left(I_T - \frac{1}{T}J_T\right)\iota_T\right] \otimes W_N D_N = 0$. In the special case where $W_N = M_N$, the set of instruments is based on $G_{0,N} = [X_N, \mathbf{W}_N X_N, \mathbf{W}_N^2 X_N, \ldots]$ and $G_{1,N} = [G_{0,N}, \mathbf{D}_N, \mathbf{W}_N\mathbf{D}_N, \ldots]$. If the spatial weighting matrices are row normalized, the set of instruments in $G_{1,N}$ excludes spatial lags of the time-invariant variables, since in this case $\mathbf{M}_N\iota_{NT} = \mathbf{W}_N\iota_{NT} = \iota_{NT}$, where $\iota_{NT}$ is an $NT \times 1$ vector of ones. With regard to the choice of the instruments in practical applications, note that it is usually not advisable to use powers of the spatial weight matrices higher than two; see also the discussion of the choice of the instruments in Kelejian and Prucha (1998). In the following we assume that the $NT \times p$ matrix of instruments denoted by $H_{R,N}$ is of the form described earlier. To derive asymptotic properties of the considered estimators, we maintain the following additional assumption for the matrix of instruments and the explanatory variables of the model that are collected in $Z_N = (\mathbf{D}_N, \mathbf{W}_N y_N, X_N)$:$^4$

ASSUMPTION 3.2. Let $\tilde{Z}_N(\vartheta) = \Omega_{u,N}^{-1/2}(\vartheta)Z_N$. The matrix of instruments $H_{R,N}$ has full column rank and consists of a subset of linearly independent columns of $[Q_{0,N}G_{0,N}, Q_{1,N}G_{1,N}]$. Furthermore, it satisfies the following conditions:
(a) $M_{H_R H_R} = \lim_{N\to\infty}(NT)^{-1}H_{R,N}'H_{R,N}$ exists and is finite and non-singular;
(b) $M_{H_R Z} = \operatorname{plim}_{N\to\infty}(NT)^{-1}H_{R,N}'\tilde{Z}_N(\vartheta)$ exists and is finite with full column rank.

The spatial random effects estimator of $\delta = (\gamma', \lambda, \beta')'$ is then defined as

$$\hat{\delta}_{GLS,N} = \left[\hat{Z}_N(\vartheta)'\hat{Z}_N(\vartheta)\right]^{-1}\hat{Z}_N(\vartheta)'\,\tilde{y}_N(\vartheta), \tag{3.11}$$

with $\hat{Z}_N(\vartheta) = P_{H_R,N}\tilde{Z}_N(\vartheta)$, $\tilde{Z}_N(\vartheta) = \Omega_{u,N}^{-1/2}(\vartheta)Z_N$ and $\tilde{y}_N(\vartheta) = \Omega_{u,N}^{-1/2}(\vartheta)y_N$. $P_{H_R,N} = H_{R,N}(H_{R,N}'H_{R,N})^{-1}H_{R,N}'$ is the projection matrix based on the instruments $H_{R,N}$.

4 Since $\tilde{Z}_N$ defined below depends on the parameter vector $\vartheta$, the next assumption is meant to apply to the true parameter values.
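The transformation in (3.8) and the IV estimator in (3.11) can be sketched numerically. This is an illustration under stated assumptions rather than the authors' implementation: the ring weights, the function names `gls_transform` and `spatial_re_iv`, and all parameter values are hypothetical, $W_N = M_N$ is imposed, and the outcome is generated without disturbances so that the estimator must reproduce $\delta$ exactly.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a circle."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def gls_transform(M, T, rho, sig_v2, sig_12):
    """Return ((sig_v/sig_1) Q1 + Q0)(I - rho (I_T kron M)) as in (3.8), plus Q0 and Q1."""
    N = M.shape[0]
    J = np.ones((T, T)) / T
    Q0 = np.kron(np.eye(T) - J, np.eye(N))
    Q1 = np.kron(J, np.eye(N))
    cochrane_orcutt = np.eye(N * T) - rho * np.kron(np.eye(T), M)
    return (np.sqrt(sig_v2 / sig_12) * Q1 + Q0) @ cochrane_orcutt, Q0, Q1

def spatial_re_iv(y, Z, H, P_half):
    """IV estimator (3.11) on the transformed model, using instruments H."""
    Z_tilde, y_tilde = P_half @ Z, P_half @ y
    Z_hat = H @ np.linalg.lstsq(H, Z_tilde, rcond=None)[0]   # projection of Z_tilde on H
    return np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y_tilde)

N, T = 12, 4
rng = np.random.default_rng(1)
W = ring_weights(N)
M = W                                        # special case W_N = M_N
Wb = np.kron(np.eye(T), W)
X = rng.normal(size=(N * T, 1))
D = np.ones((N * T, 1))                      # stacked constant
delta = np.array([5.0, 0.4, 0.5])            # assumed (gamma, lambda, beta)

# Noise-free outcome from the reduced form, so y = Z delta holds exactly.
y = np.linalg.solve(np.eye(N * T) - delta[1] * Wb,
                    D[:, 0] * delta[0] + X[:, 0] * delta[2])
Z = np.column_stack([D, Wb @ y, X])

P_half, Q0, Q1 = gls_transform(M, T, rho=0.3, sig_v2=1.0, sig_12=6.0)
G0 = np.column_stack([X, Wb @ X, Wb @ Wb @ X])   # powers of W up to two
G1 = np.column_stack([G0, D])                    # no spatial lag of the constant (row-normalized W)
H = np.column_stack([Q0 @ G0, Q1 @ G1])
delta_hat = spatial_re_iv(y, Z, H, P_half)
print(delta_hat)
```

With disturbances added the estimate would of course only be close to $\delta$; the exact recovery here is a check on the algebra, not a statement about finite-sample performance.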



The joint asymptotic distribution of the spatial random effects estimator and the spatial within estimator under a known nuisance parameter vector $\vartheta$ is given in Theorem 3.1. This theorem is based on the random effects Assumption 3.1 (RE) and forms the basis of the Hausman test. The asymptotic properties of this test and its feasible counterparts are given in Theorem 5.1 in Section 5.

3.2. The spatial within estimator

In situations where the random effects assumption might be violated, one can use the spatial within estimator, which remains unaffected by possible correlation between the unit-specific effects and the explanatory variables and, hence, by a violation of Assumption 3.1 (RE). One can apply the within transformation $Q_{0,N}$ to wipe out the individual effects (see e.g. Mundlak, 1978, and Baltagi, 2008). Using $Q_{0,N}(I_{NT} - \rho\mathbf{M}_N) = (I_{NT} - \rho\mathbf{M}_N)Q_{0,N}$, one obtains

$$\begin{aligned}
Q_{0,N}u_N &= (E_T \otimes I_N)\left[\rho(I_T \otimes M_N)u_N + (\iota_T \otimes I_N)\mu_N + \nu_N\right] \\
&= \rho(I_T \otimes M_N)(E_T \otimes I_N)u_N + (E_T \otimes I_N)\nu_N \\
&= \rho(I_T \otimes M_N)Q_{0,N}u_N + Q_{0,N}\nu_N
\end{aligned} \tag{3.12}$$

or $Q_{0,N}u_N = (I_{NT} - \rho\mathbf{M}_N)^{-1}Q_{0,N}\nu_N$, where $E_T = I_T - \frac{1}{T}J_T$. Hence, one can apply the Cochrane–Orcutt type transformation to the within transformed model to obtain the fixed effects generalized least squares (FEGLS) estimator. Note that the parameters of the time-invariant variables, collected in the vector $\gamma$, remain unidentified with this approach. More importantly, one can base the method of moments estimator of $(\rho, \sigma_\nu^2)$ on the within-transformed residuals of the initial within estimator, which estimate $Q_{0,N}u_N$ consistently even if the unit-specific effects depend on $X_N$. Obviously, the set of instruments denoted by $H_{Q,N}$ now comprises the linearly independent columns of $Q_{0,N}G_{0,N}$. Because the constant and the time-invariant variables are wiped out in the spatial within estimator, we define the $(NT \times K)$ matrix $Z_{Q,N} = Q_{0,N}[\mathbf{W}_N y_N, X_N]$ with the corresponding $(K \times 1)$ parameter vector $\theta = (\lambda, \beta')'$. To derive the asymptotic properties of the spatial within estimator for $\theta$, we maintain the following additional assumptions for the matrix of instruments used in the spatial within model.

ASSUMPTION 3.3. Let $Z_{Q,N} = Q_{0,N}[\mathbf{W}_N y_N, X_N]$. The matrix of instruments $H_{Q,N}$ has full column rank and consists of a subset of linearly independent columns of $Q_{0,N}G_{0,N}$. Furthermore, it satisfies the following conditions:
(a) $M_{H_Q H_Q} = \lim_{N\to\infty}(NT)^{-1}H_{Q,N}'H_{Q,N}$ is finite and non-singular.
(b) $M_{H_Q Z^*} = \operatorname{plim}_{N\to\infty}(NT)^{-1}H_{Q,N}'(I_{NT} - \rho\mathbf{M}_N)Z_{Q,N}$ exists and is finite with full column rank.

Again, treating $\rho$ as known, we apply the Cochrane–Orcutt type transformation to the within transformed model, yielding

$$\begin{aligned}
Q_{0,N}(I_{NT} - \rho\mathbf{M}_N)y_N &= (I_{NT} - \rho\mathbf{M}_N)Q_{0,N}y_N = (I_{NT} - \rho\mathbf{M}_N)Q_{0,N}Z_{Q,N}\theta + Q_{0,N}\nu_N, \\
y_N^*(\rho) &= Z_N^*(\rho)\theta + \nu_N^*,
\end{aligned} \tag{3.13}$$

where $y_N^*(\rho) = (I_{NT} - \rho\mathbf{M}_N)Q_{0,N}y_N$, $Z_N^*(\rho) = (I_{NT} - \rho\mathbf{M}_N)Z_{Q,N}$ and $\nu_N^* = Q_{0,N}\nu_N$. The spatial within estimator for $\theta$ is then obtained by applying IV to the transformed model:

$$\hat{\theta}_{W,N} = \left[\hat{Z}_N^*(\rho)'\hat{Z}_N^*(\rho)\right]^{-1}\hat{Z}_N^*(\rho)'\,y_N^*(\rho), \tag{3.14}$$

with $\hat{Z}_N^*(\rho) = P_{H_Q,N}Z_N^*(\rho) = P_{H_Q,N}(I_{NT} - \rho\mathbf{M}_N)Z_{Q,N}$. $P_{H_Q,N} = H_{Q,N}(H_{Q,N}'H_{Q,N})^{-1}H_{Q,N}'$ is the projection matrix based on the instruments $H_{Q,N}$.

The following theorem establishes our main asymptotic result concerning the joint asymptotic distribution of the spatial random effects and the spatial within estimators under the random effects Assumption 3.1 (RE). The Hausman test for spatial panels derived below will be based on this result. Because the random effects estimator includes time-invariant variables, including the constant, we define $\hat{\delta}_{GLS,N} = (\hat{\gamma}_{GLS,N}', \hat{\theta}_{GLS,N}')' = (\hat{\gamma}_{GLS,N}', \hat{\lambda}_{GLS,N}, \hat{\beta}_{GLS,N}')'$.

THEOREM 3.1. Let Assumptions 2.1–3.3 hold. Then

$$\sqrt{NT}\begin{pmatrix}\hat{\theta}_{GLS,N} - \theta \\ \hat{\theta}_{W,N} - \theta\end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix}\Sigma_{GLS} & \Sigma_{GLS} \\ \Sigma_{GLS} & \Sigma_W\end{pmatrix}\right),$$

where $\Sigma_W = \sigma_\nu^2\left(M_{H_Q Z^*}'M_{H_Q H_Q}^{-1}M_{H_Q Z^*}\right)^{-1}$ and $\Sigma_{GLS}$ is the lower-right $K \times K$ block of the matrix $\sigma_\nu^2\left(M_{H_R Z}'M_{H_R H_R}^{-1}M_{H_R Z}\right)^{-1}$.

This theorem forms the basis for the spatial Hausman test derived later under the null hypothesis that Assumption 3.1 (RE) is true.
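The key property of the spatial within estimator in (3.13) and (3.14), namely that it is unaffected by unit effects correlated with the regressors, can be checked in a stylized example. The sketch below is illustrative only: the ring weights and parameter values are hypothetical, $W_N = M_N$ and a single regressor are assumed, $\rho$ is treated as known, and the idiosyncratic innovations $\nu$ are switched off so that the algebra yields exact recovery of $\theta$.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a circle."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

N, T = 12, 4
rng = np.random.default_rng(2)
W = ring_weights(N)
M = W
I_NT = np.eye(N * T)
Wb = np.kron(np.eye(T), W)
Mb = np.kron(np.eye(T), M)
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))

lam, beta, rho = 0.4, 0.5, 0.3                 # assumed true values
x = rng.normal(size=N * T)                     # unit index fast, time index slow
x_bar = x.reshape(T, N).mean(axis=0)           # time averages per unit
mu = 2.0 * x_bar + rng.normal(size=N)          # effects correlated with x (violates RE)
eps = np.tile(mu, T)                           # (iota_T kron I_N) mu, with nu set to zero
u = np.linalg.solve(I_NT - rho * Mb, eps)
y = np.linalg.solve(I_NT - lam * Wb, beta * x + u)

X = x[:, None]
ZQ = Q0 @ np.column_stack([Wb @ y, x])                   # Z_Q = Q0 [W y, X]
HQ = Q0 @ np.column_stack([X, Wb @ X, Wb @ Wb @ X])      # H_Q = Q0 G0
co = I_NT - rho * Mb                                     # Cochrane-Orcutt type transformation
y_star, Z_star = co @ Q0 @ y, co @ ZQ
Z_hat = HQ @ np.linalg.lstsq(HQ, Z_star, rcond=None)[0]  # projection of Z_star on H_Q
theta_hat = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y_star)

commute_gap = np.max(np.abs(Q0 @ co - co @ Q0))          # Q0 and (I - rho M) commute
print(theta_hat, commute_gap)
```

Because $Q_{0,N}$ annihilates $(\iota_T \otimes I_N)\mu_N$, the estimate equals $(\lambda, \beta)$ up to rounding even though $\mu_N$ depends on the time averages of $x$.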

4. FEASIBLE ESTIMATION

The spatial GLS and spatial within estimators defined earlier are based on the unknown parameters $\rho$, $\sigma_\nu^2$ and $\sigma_\mu^2$, which have to be estimated. The feasible estimation procedure starts by estimating the within transformed model using the instruments $H_{Q,N} = Q_{0,N}G_{0,N}$ as described earlier to obtain initial within IV estimates. This initial estimator is consistent (see Baltagi, 2008) and it can be written as

$$\hat{\theta}_{I,N} = \left[\hat{Z}_{Q,N}'Q_{0,N}\hat{Z}_{Q,N}\right]^{-1}\hat{Z}_{Q,N}'Q_{0,N}\,y_N, \tag{4.1}$$

where $\hat{Z}_{Q,N} = P_{H_Q,N}Q_{0,N}Z_{Q,N}$. The following proposition gives the asymptotic distribution of the initial estimator.

PROPOSITION 4.1. Let the limit $M_{H_Q Z} = \operatorname{plim}_{N\to\infty}(NT)^{-1}H_{Q,N}'Z_{Q,N}$ exist and be finite with full column rank. Let Assumptions 2.1–2.3 and 3.3 hold. Then

$$\sqrt{NT}\left(\hat{\theta}_{I,N} - \theta\right) \xrightarrow{d} N(0, \Sigma_I),$$

where $\Sigma_I = \sigma_\nu^2\left(M_{H_Q Z}'M_{H_Q H_Q}^{-1}M_{H_Q Z}\right)^{-1}$.



The projected residuals then give consistent initial estimates of $Q_{0,N}u_N$, which can be used in the spatial generalized moments (GM) estimator as suggested by Kapoor et al. (2007). These authors use OLS residuals, which are only consistent under the random effects Assumption 3.1 (RE). The spatial GM estimator for $\rho$ and $\sigma_\nu^2$ can then be based on the first three moment conditions given in Kapoor et al. (2007). Using $Q_{i,N}\mathbf{M}_N = \mathbf{M}_N Q_{i,N}$ and the notation $\bar{\varepsilon}_N = \mathbf{M}_N\varepsilon_N$, we can formulate the first three moment conditions in terms of $Q_{0,N}u_N$ as

$$E\begin{bmatrix}
\frac{1}{N(T-1)}\varepsilon_N'Q_{0,N}\varepsilon_N \\
\frac{1}{N(T-1)}\bar{\varepsilon}_N'Q_{0,N}\bar{\varepsilon}_N \\
\frac{1}{N(T-1)}\bar{\varepsilon}_N'Q_{0,N}\varepsilon_N
\end{bmatrix}
= E\begin{bmatrix}
\frac{1}{N(T-1)}u_N'Q_{0,N}(I_{NT} - \rho\mathbf{M}_N')(I_{NT} - \rho\mathbf{M}_N)Q_{0,N}u_N \\
\frac{1}{N(T-1)}u_N'Q_{0,N}(I_{NT} - \rho\mathbf{M}_N')\mathbf{M}_N'\mathbf{M}_N(I_{NT} - \rho\mathbf{M}_N)Q_{0,N}u_N \\
\frac{1}{N(T-1)}u_N'Q_{0,N}(I_{NT} - \rho\mathbf{M}_N')\mathbf{M}_N'(I_{NT} - \rho\mathbf{M}_N)Q_{0,N}u_N
\end{bmatrix}
= \begin{bmatrix}
\sigma_\nu^2 \\
\sigma_\nu^2\frac{1}{N}\operatorname{tr}(M_N'M_N) \\
0
\end{bmatrix}. \tag{4.2}$$

Under the random effects model, Assumption 3.1 (RE), we add a fourth moment condition:

$$E\left[\frac{1}{N}\varepsilon_N'Q_{1,N}\varepsilon_N\right] = E\left[\frac{1}{N}u_N'Q_{1,N}(I_{NT} - \rho\mathbf{M}_N')(I_{NT} - \rho\mathbf{M}_N)Q_{1,N}u_N\right] = \sigma_1^2. \tag{4.3}$$

With the solution of the first three moment conditions at hand, one can solve the fourth moment condition to obtain an estimate of $\sigma_1^2$. Theorem 1 in Kapoor et al. (2007) shows that the estimators for $\rho$, $\sigma_\nu^2$ and $\sigma_1^2$ based on these moment conditions and some additional assumptions (see their Assumption 5) are consistent as long as the initial estimator $\hat{\theta}_{I,N}$ is consistent. Proposition 4.2 demonstrates that the parameters $\rho$, $\sigma_\nu^2$ and $\sigma_1^2$ are nuisance parameters and that the feasible spatial random effects and the feasible spatial within estimators have the same asymptotic distribution as their counterparts based on the true values of $\rho$, $\sigma_\nu^2$ and $\sigma_1^2$.

PROPOSITION 4.2. Let the feasible estimators $\hat{\theta}_{FGLS,N}$ and $\hat{\theta}_{FW,N}$ be based on consistent estimators of $\rho$, $\sigma_\nu^2$ and $\sigma_1^2$. Then under Assumptions 2.1–3.3 we have

$$\sqrt{NT}\left(\hat{\theta}_{FGLS,N} - \hat{\theta}_{GLS,N}\right) \xrightarrow{p} 0, \qquad \sqrt{NT}\left(\hat{\theta}_{FW,N} - \hat{\theta}_{W,N}\right) \xrightarrow{p} 0.$$
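The moment conditions (4.2) and (4.3) translate into a small estimation routine. The following is a rough sketch under several assumptions, not the procedure of Kapoor et al. (2007) verbatim: it uses the true disturbances in place of first-stage residuals, hypothetical ring weights, and a plain grid search over $\rho$ (with $\sigma_\nu^2$ concentrated out by least squares) instead of a nonlinear least squares solver. The function names are made up for this example.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a circle."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def within(v, N, T):
    """Apply Q0: subtract each unit's time mean (unit index fast, time index slow)."""
    V = v.reshape(T, N)
    return (V - V.mean(axis=0)).ravel()

def lag(v, M, N, T):
    """Apply I_T kron M period by period."""
    return (v.reshape(T, N) @ M.T).ravel()

def gm_estimate(u, M, N, T, grid=np.linspace(-0.99, 0.99, 397)):
    e = within(u, N, T)
    e1 = lag(e, M, N, T)
    e2 = lag(e1, M, N, T)
    c = np.array([1.0, np.trace(M.T @ M) / N, 0.0])   # right-hand side of (4.2) divided by sig_v2
    best = None
    for rho in grid:
        a, b = e - rho * e1, e1 - rho * e2            # Q0 eps and Q0 M eps implied by rho
        g = np.array([a @ a, b @ b, a @ b]) / (N * (T - 1))
        s2 = (c @ g) / (c @ c)                        # least-squares value of sig_v2 given rho
        loss = np.sum((g - s2 * c) ** 2)
        if best is None or loss < best[0]:
            best = (loss, rho, s2)
    _, rho_hat, s2_hat = best
    eps = u - rho_hat * lag(u, M, N, T)
    between = np.tile(eps.reshape(T, N).mean(axis=0), T)   # Q1 eps
    sig12_hat = between @ between / N                      # sample analogue of (4.3)
    return rho_hat, s2_hat, sig12_hat

N, T = 144, 5
rng = np.random.default_rng(3)
M = ring_weights(N)
rho, sig_v2, sig_mu2 = 0.4, 1.0, 1.0                       # assumed true values
E = rng.normal(scale=np.sqrt(sig_mu2), size=N)[None, :] \
    + rng.normal(scale=np.sqrt(sig_v2), size=(T, N))       # eps_t = mu + nu_t
u = np.linalg.solve(np.eye(N) - rho * M, E.T).T.ravel()    # u_t = (I - rho M)^{-1} eps_t
rho_hat, s2_hat, sig12_hat = gm_estimate(u, M, N, T)
print(rho_hat, s2_hat, sig12_hat)
```

In an application, `u` would be replaced by the residuals of the initial within IV estimator; only their within-transformed part enters the first three conditions, while the fourth condition uses the between variation and therefore relies on the random effects assumption.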

5. HAUSMAN SPECIFICATION TEST

The spatial within estimator is consistent because it wipes out the unit-specific effects by applying the within transformation. The critical assumption for the validity of the spatial random effects model is that $E[\mu_N] = 0$, implying that the spatial random effects estimator is inconsistent if the random effects Assumption 3.1 (RE) does not hold. The Hausman test (Hausman, 1978) suggests comparing these two estimators and testing whether the random effects assumption holds true. The spatial GLS estimator of the random effects model is more efficient than the spatial within estimator under the random effects Assumption 3.1 (RE). Moreover, under the null hypothesis both considered estimators are consistent, while under the fixed effects assumption, the spatial random effects estimator is inconsistent, whereas the spatial within estimator is consistent. We summarize these properties in the following lemma.$^5$

LEMMA 5.1. Under Assumptions 2.1–3.3 we have:
(a) $\sqrt{NT}\left(\hat{\theta}_{GLS,N} - \hat{\theta}_{W,N}\right) \xrightarrow{d} N(0, \Sigma_W - \Sigma_{GLS})$, where $\Sigma_W - \Sigma_{GLS}$ is positive definite at the true parameter values $\vartheta$.
(b) Furthermore, $\hat{\Sigma}_{W,N} - \hat{\Sigma}_{GLS,N} \xrightarrow{p} \Sigma_W - \Sigma_{GLS}$, where

$$\hat{\Sigma}_{W,N} = \hat{\sigma}_{\nu,N}^2\,NT\left[Z_{Q,N}'(I_{NT} - \hat{\rho}_N\mathbf{M}_N')P_{H_Q,N}(I_{NT} - \hat{\rho}_N\mathbf{M}_N)Z_{Q,N}\right]^{-1}$$

and

$$\hat{\Sigma}_{GLS,N} = \hat{\sigma}_{\nu,N}^2\,NT\left[Z_N'\,\Omega_{u,N}^{-1/2}(\hat{\vartheta}_N)'\,P_{H_R,N}\,\Omega_{u,N}^{-1/2}(\hat{\vartheta}_N)Z_N\right]^{-1},$$

with $\hat{\vartheta}_N = (\hat{\rho}_N, \hat{\sigma}_{\nu,N}^2, \hat{\sigma}_{1,N}^2)'$ some consistent estimator of $\vartheta$.

The theorem below now defines the Hausman test statistic for spatial panels and provides its asymptotic distribution under the null.

THEOREM 5.1. Let Assumptions 2.1–3.3 hold and define

$$\hat{H}_N = NT\left(\hat{\theta}_{FGLS,N} - \hat{\theta}_{FW,N}\right)'\left(\hat{\Sigma}_{W,N} - \hat{\Sigma}_{GLS,N}\right)^{-1}\left(\hat{\theta}_{FGLS,N} - \hat{\theta}_{FW,N}\right)$$

and

$$H_N = NT\left(\hat{\theta}_{GLS,N} - \hat{\theta}_{W,N}\right)'\left(\Sigma_W - \Sigma_{GLS}\right)^{-1}\left(\hat{\theta}_{GLS,N} - \hat{\theta}_{W,N}\right).$$

Then $\hat{H}_N - H_N \xrightarrow{p} 0$, where $H_N$ is asymptotically $\chi^2$ distributed with $K$ degrees of freedom.
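Once the two estimates and their estimated asymptotic covariance matrices are available, the statistic of Theorem 5.1 is a simple quadratic form. The sketch below uses made-up numbers for $K = 2$ purely for illustration; the function name `hausman_stat` and the small lookup table of 5% critical values are conveniences introduced here, not part of the paper.

```python
import numpy as np

# 5% critical values of the chi-square distribution for small K (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def hausman_stat(theta_gls, theta_w, sigma_gls, sigma_w, NT):
    """NT * d' (Sigma_W - Sigma_GLS)^{-1} d with d = theta_GLS - theta_W."""
    d = np.asarray(theta_gls, dtype=float) - np.asarray(theta_w, dtype=float)
    V = np.asarray(sigma_w, dtype=float) - np.asarray(sigma_gls, dtype=float)
    return NT * d @ np.linalg.solve(V, d)

# Hypothetical inputs for theta = (lambda, beta)'.
theta_gls = [0.41, 0.52]
theta_w = [0.40, 0.50]
sigma_gls = np.array([[0.8, 0.1], [0.1, 0.5]])   # covariance of sqrt(NT)-scaled GLS estimator
sigma_w = np.array([[1.2, 0.1], [0.1, 0.9]])     # covariance of sqrt(NT)-scaled within estimator

H = hausman_stat(theta_gls, theta_w, sigma_gls, sigma_w, NT=720)
reject = H > CHI2_CRIT_05[2]
print(H, reject)
```

The covariance matrices are those of the $\sqrt{NT}$-scaled estimators, which is why the quadratic form is multiplied by $NT$. In finite samples the estimated difference $\hat{\Sigma}_{W,N} - \hat{\Sigma}_{GLS,N}$ need not be positive definite, so a practical implementation would want to check this before solving.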

5 To operationalize this lemma, we need to provide a consistent estimator $\hat{\vartheta}_N$ of $\vartheta$. Our suggestion (see the summary of our estimation procedure below) is to construct both $\hat{\Sigma}_{GLS,N}$ and $\hat{\Sigma}_{W,N}$ using $\hat{\vartheta}_N = (\hat{\rho}_N, \hat{\sigma}_{\nu,N}^2, \hat{\sigma}_{1,N}^2)'$, where $\hat{\rho}_N$ and $\hat{\sigma}_{\nu,N}^2$ are estimated from three moment conditions using the within estimates (see 4.2), while $\hat{\sigma}_{1,N}^2$ is derived from a fourth moment condition that exploits the between variation.

Given Theorem 2 in Kapoor et al. (2007) and the results in this paper, a feasible estimation and testing procedure can be summarized as follows:
(1) Calculate a consistent initial IV within estimator $\hat{\theta}_{I,N}$, which wipes out the individual effects using the within transformation. This estimator ignores the spatial correlation in the disturbances.
(2) Use the resulting estimated disturbances of the within transformed model in a spatial GM procedure as described in Kapoor et al. (2007) and obtain a (consistent) estimator $\hat{\vartheta}_N = (\hat{\rho}_N, \hat{\sigma}_{\nu,N}^2, \hat{\sigma}_{1,N}^2)'$.
(3) Transform the model by the spatial Cochrane–Orcutt transformation and then use either the GLS or the within transformation to obtain the feasible spatial GLS and spatial within estimators $\hat{\theta}_{FGLS,N}$ and $\hat{\theta}_{FW,N}$, respectively.
(4) Calculate the Hausman test statistic $\hat{H}_N$ to decide whether the random or the fixed effects specification is more appropriate.

Note that, once the variables are spatially Cochrane–Orcutt transformed, standard econometric software can calculate the spatial Hausman test.

We now investigate the power properties of the test statistic; that is, its behaviour when the null hypothesis is violated. To do so, we need to specify a particular alternative. Note that all results obtained up to this point were independent of a particular choice of the alternative hypothesis. We now follow Mundlak (1978) and assume an alternative (fixed effects) model in which the explanatory variables and the unit-specific effects are related in the following way:$^6$

ASSUMPTION 5.1 (FE). The vector of individual effects is given by

$$\mu_N = \bar{X}_N\pi + \xi_N,$$

where $\pi \neq 0_{(K-1)\times 1}$, and the $N \times (K-1)$ matrix $\bar{X}_N$ contains the time averages of the explanatory variables and is given by

$$\bar{X}_N = \frac{1}{T}\left(\iota_T' \otimes I_N\right)X_N.$$

Finally, the elements of the $N \times 1$ random vector $\xi_N$ satisfy Assumption 3.1 (RE).

6 There are other specifications of the unit effects possible. In the present case, unit effects are decomposed in a spatially correlated error term similar to that defined in Assumption 3.1 (RE) and a systematic component without spatial effects.

Note that it would be trivial to generalize our assumption to an alternative model that has been defined by Chamberlain (1982), where $\mu_{i,N} = \sum_{t=1}^{T}x_{it,N}\pi_t + \xi_{i,N}$ or $\mu_N = \sum_{t=1}^{T}X_{t,N}\pi_t + \xi_N$. Clearly, setting $\pi_t = \frac{1}{T}\pi$ yields Mundlak's formulation. Under the FE assumption, we have

$$\begin{aligned}
\varepsilon_N &= (\iota_T \otimes I_N)\left[\frac{1}{T}\left(\iota_T' \otimes I_N\right)X_N\pi + \xi_N\right] + \nu_N \\
&= Q_{1,N}X_N\pi + (\iota_T \otimes I_N)\xi_N + \nu_N, \\
u_N &= (I_{NT} - \rho\mathbf{M}_N)^{-1}\left[Q_{1,N}X_N\pi + (\iota_T \otimes I_N)\xi_N + \nu_N\right].
\end{aligned} \tag{5.1}$$

The individual effects do not depend on the explanatory variables if and only if $\pi = 0$, and the random effects model defined under Assumption 3.1 (RE) arises as a special case of Assumption 5.1 (FE). Observe that under Assumption 5.1 (FE) with $\pi \neq 0$ the spatial GLS estimator is biased, as one obtains

$$\begin{aligned}
\hat{\delta}_{GLS,N} &= \left[\hat{Z}_N(\vartheta)'\hat{Z}_N(\vartheta)\right]^{-1}\hat{Z}_N(\vartheta)'\,\tilde{y}_N(\vartheta) \\
&= \delta + \left[\hat{Z}_N(\vartheta)'\hat{Z}_N(\vartheta)\right]^{-1}\hat{Z}_N(\vartheta)'\,\Omega_{u,N}^{-1/2}\left[Q_{1,N}X_N\pi + (\iota_T \otimes I_N)\xi_N + \nu_N\right].
\end{aligned} \tag{5.2}$$

On the other hand, the within transformation wipes out the individual effects and, hence, the within estimator is the same as under $H_0: \pi = 0$. The following proposition shows that the Hausman test is consistent; that is, the power of the test approaches unity as $N \to \infty$ for an arbitrary significance level of the test.

PROPOSITION 5.1. Let Assumptions 2.1–2.3, 5.1 (FE), 3.2 and 3.3 hold and assume that

$$M_{H_R\bar{X}} = \lim_{N\to\infty}(NT)^{-1}H_{R,N}'\,\Omega_{u,N}^{-1/2}\,Q_{1,N}X_N$$

exists and is finite with full column rank. Let $h > 0$ be some positive constant. Then $\lim_{N\to\infty}P(H_N > h) = 1$.

6. MONTE CARLO EVIDENCE The Monte Carlo analysis investigates the small sample properties of the proposed spatial Hausman test. For this we use a simple spatial panel model that includes one explanatory variable and a constant yit,N = λ

N j =1

wij ,N yj t,N + βxit,N + α + uit,N ,

i = 1, . . . , N

and

t = 1, . . . , T . (6.1)

We set β = 0.5 and α = 5. The explanatory variable is generated as xit = ζ i + zit with ζ i ∼ i.i.d. U [−7.5, 7.5] and zit ∼ i.i.d. U [−7.5, 7.5] with U [a, b] denoting the uniform distribution on the interval [a, b]. In accordance with Assumption 2.3, xit is treated as a non-stochastic variable and it is held fixed in repeated samples. The individual-specific effects are allowed to be correlated with x i , setting μi = μi0 + π x i , where μi0 is drawn from a normal distribution; that is, μi0 ∼ i.i.d. N(0, 10φ) and π is a constant parameter. This mimics the fixed effects Assumption 5.1 with π = 0 . At π = 0, the random effects Assumption 3.1 holds and it forms the null for the spatial σ2

μ Hausman test. We normalize μi so that its mean is 0 and its variance 10φ, where φ = σ 2 +σ 2, 0 < μ ε φ < 1, denotes the proportion of the total variance due to the presence of the individual-specific effects. For the remainder error, we assume εit ∼ i.i.d. N (0, 10(1 − φ)). This implies that the overall variance of the disturbances is σμ2 + σε2 = 10. The row normalized spatial weighting matrix uses a regular lattice with 144 and 324 cells, respectively, containing one observation each. The spatial weighting scheme is based on a rook design, where every unit is surrounded by four neighbours. The corresponding spatial weighting matrix is maximum-row normalized following Kelejian and Prucha (2007). We use the same spatial weighting matrix to generate both the endogenous spatial lag and the spatial lag of the error term. The spatial parameters λ and ρ vary over the set {−0.8, −0.4, 0, 0.4, 0.8}. The parameter π takes its values in {−0.2, −0.1, 0, 0.1, 0.2}. Based on the discussion earlier



61

we use the instruments HQ,N = [Q0,N xN , Q0,N WN xN ,Q0,N W2N xN ], while HR,N is composed of [HQ,N , Q1,N xN , Q1,N WN xN , Q1,N W2N xN , ιNT , Q1,N WN ιNT ]. In each experiment, we calculate the size of the Hausman test, which is given by the share of rejections at π = 0. The power of the spatial Hausman test is given by the share of rejections at π = 0. The baseline scenario is reported in Table 1 setting N = 144, T = 5 and φ = 0.5. The results show that the proposed spatial IVGLS estimators work well and that the spatial Hausman test exhibits a good performance for almost all considered parameter configurations. In the experiments reported in Table 1, the spatial Hausman test comes close to the nominal size of 0.05 in most of the cases. Exceptions are only observed for high values of λ, where the test is slightly oversized. For example, at λ = 0.8 and ρ = 0.8 the size of the spatial Hausman test is 0.092. At negative values of ρ, this phenomenon is not observed. The power of the test by and large remains unaffected by variations of ρ and λ, although it seems somewhat lower at high absolute values of ρ or high absolute values of λ. A larger cross-section (N = 324, N T = 1620) improves both the size and the power of the test as expected (Table 2), especially the size distortion at high positive values of λ is now reduced. In Table 3, we extend the time series dimension and set T = 11 (N T = 1584). The size distortion at high values of λ becomes smaller as T increases and this effect seems more pronounced than in an extended cross-section as analysed in Table 2. However, the improvement in power is much smaller than that observed when extending the cross-section dimension. In Table 4, we set N = 144, T = 5 and φ = 0.8, so that σμ2 = 8 and σν2 is 2. With a larger weight of the variance of the unit-specific effects, we observe a better performance of the spatial Hausman test in terms of its size and the size distortion observed in the baseline scenario now vanishes. 
Also, the power of the test is significantly higher. We performed a series of robustness checks and the full set of tables with these results is available upon request. In particular, we considered the case where serial correlation in the disturbances is present, but is ignored. Our results show that the size of the Hausman test remains nearly unaffected. However, the spatial Hausman test has lower power especially if the serial correlation is high. In addition, we investigated the performance of the proposed Hausman test when the exogenous variable has weak effects on the dependent variable. In our simulation experiment the variance of the generated explanatory variable is reduced by one-half, which results in weaker instruments as compared to the baseline experiment. The results show that at high positive values of both λ and ρ the spatial Hausman test tends to over-reject, if the explanatory variables are weak. Also, the power of the test is lower compared to the baseline experiment. The tables corresponding to these additional experiments are available upon request from the authors. We also investigated the root mean square error (RMSE) and the bias of the estimators of β, λ and ρ for the basic case with N = 144, T = 5 and θ = 0.5.7 Following Kapoor et al. (2007) we define the bias as the difference between the respective median of the parameter estimate and

its true counterpart, while $\mathrm{RMSE} = \sqrt{\mathrm{bias}^2 + \left(\frac{IQ}{1.35}\right)^2}$, where $IQ$ is defined as the interquantile range, that is, the difference between the 0.75 and 0.25 quantiles of the simulated parameter distribution. Under a normal distribution the median and the mean coincide and $IQ/1.35$ corresponds to the standard deviation (up to a rounding error).

7 The tables corresponding to these simulation experiments are available upon request from the authors.
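The median-based bias and RMSE measures defined above can be illustrated with a short Python sketch. The function names and the linear-interpolation quantile rule are assumptions made for this illustration; the text does not state which empirical quantile convention was used in the simulations.

```python
import math
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Empirical quantile by linear interpolation between order statistics.

    The interpolation rule is an assumption of this sketch.
    """
    position = (len(sorted_values) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def median_bias_and_rmse(estimates: Sequence[float], true_value: float) -> Tuple[float, float]:
    """Return (bias, RMSE) with bias = median - truth and
    RMSE = sqrt(bias^2 + (IQ / 1.35)^2), IQ being the 0.75 minus the 0.25 quantile."""
    ordered: List[float] = sorted(estimates)
    bias = quantile(ordered, 0.5) - true_value
    iq = quantile(ordered, 0.75) - quantile(ordered, 0.25)
    rmse = math.sqrt(bias ** 2 + (iq / 1.35) ** 2)
    return bias, rmse


# Hypothetical simulated estimates of a parameter whose true value is 3.0.
bias, rmse = median_bias_and_rmse([1.0, 2.0, 3.0, 4.0, 5.0], true_value=3.0)
```

Using the median and the interquantile range rather than the mean and the standard deviation keeps both measures well defined even when the simulated distribution has heavy tails.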


Table 1. Size and power of the spatial Hausman test.

                        λ
   ρ      π      −0.8    −0.4     0      0.4     0.8
 −0.8   −0.3    0.999   1.000   1.000   1.000   1.000
        −0.2    0.976   0.980   0.984   0.984   0.978
        −0.1    0.501   0.529   0.532   0.541   0.475
         0.0    0.041   0.052   0.057   0.058   0.050
         0.1    0.479   0.521   0.518   0.515   0.454
         0.2    0.983   0.989   0.988   0.985   0.979
         0.3    1.000   1.000   1.000   1.000   1.000
 −0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.987   0.993   0.989   0.991   0.990
        −0.1    0.540   0.568   0.558   0.563   0.540
         0.0    0.041   0.035   0.043   0.048   0.041
         0.1    0.505   0.508   0.543   0.534   0.520
         0.2    0.992   0.987   0.985   0.996   0.985
         0.3    1.000   1.000   1.000   1.000   1.000
  0.0   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.988   0.988   0.990   0.995   0.994
        −0.1    0.556   0.563   0.578   0.573   0.582
         0.0    0.034   0.036   0.040   0.053   0.074
         0.1    0.543   0.543   0.536   0.541   0.512
         0.2    0.993   0.993   0.993   0.992   0.988
         0.3    1.000   1.000   1.000   1.000   1.000
  0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.989   0.991   0.998   0.993   0.989
        −0.1    0.532   0.550   0.570   0.572   0.581
         0.0    0.042   0.041   0.041   0.051   0.079
         0.1    0.536   0.564   0.530   0.498   0.530
         0.2    0.996   0.990   0.990   0.989   0.988
         0.3    1.000   1.000   1.000   1.000   1.000
  0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.987   0.986   0.988   0.989   0.985
        −0.1    0.493   0.500   0.500   0.523   0.521
         0.0    0.056   0.059   0.054   0.058   0.090
         0.1    0.540   0.545   0.530   0.513   0.480
         0.2    0.992   0.993   0.992   0.984   0.970
         0.3    1.000   1.000   1.000   1.000   1.000

Note: N = 144, T = 5, θ = 0.5, normal disturbances, 2000 replications.
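The entries of Table 1 are shares of rejections across Monte Carlo replications: the size at π = 0 and the power at π ≠ 0. A minimal Python sketch of that calculation follows. The function name, the example statistics and the critical value are all hypothetical; the appropriate chi-square degrees of freedom depend on the dimension of the tested parameter vector, which this sketch does not fix.

```python
from typing import Sequence


def rejection_share(statistics: Sequence[float], critical_value: float) -> float:
    """Share of replications in which the test statistic exceeds the critical value."""
    rejections = sum(1 for value in statistics if value > critical_value)
    return rejections / len(statistics)


# Hypothetical test statistics from four replications.
# 3.841 is the 5% critical value of a chi-square(1) distribution, used here
# purely as an illustrative assumption.
example_statistics = [0.7, 4.2, 1.9, 5.5]
share = rejection_share(example_statistics, critical_value=3.841)
```

Applied to statistics simulated under π = 0 the function returns the empirical size; applied to statistics simulated under π ≠ 0 it returns the empirical power.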


Table 2. Size and power of the spatial Hausman test.

                        λ
   ρ      π      −0.8    −0.4     0      0.4     0.8
 −0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   0.999   0.999   1.000   1.000
        −0.1    0.872   0.865   0.854   0.851   0.849
         0.0    0.048   0.051   0.054   0.047   0.054
         0.1    0.845   0.857   0.849   0.856   0.838
         0.2    1.000   1.000   1.000   1.000   1.000
         0.3    1.000   1.000   1.000   1.000   1.000
 −0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   1.000   1.000   1.000   1.000
        −0.1    0.899   0.896   0.896   0.900   0.893
         0.0    0.049   0.051   0.042   0.045   0.042
         0.1    0.887   0.902   0.879   0.878   0.886
         0.2    1.000   1.000   1.000   1.000   1.000
         0.3    1.000   1.000   1.000   1.000   1.000
  0.0   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   1.000   1.000   1.000   1.000
        −0.1    0.900   0.906   0.905   0.896   0.909
         0.0    0.044   0.047   0.047   0.046   0.048
         0.1    0.893   0.896   0.898   0.898   0.896
         0.2    1.000   1.000   1.000   1.000   1.000
         0.3    1.000   1.000   1.000   1.000   1.000
  0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   1.000   1.000   1.000   1.000
        −0.1    0.892   0.900   0.888   0.888   0.891
         0.0    0.048   0.043   0.047   0.047   0.059
         0.1    0.908   0.896   0.899   0.891   0.886
         0.2    1.000   1.000   1.000   1.000   1.000
         0.3    1.000   1.000   1.000   1.000   1.000
  0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   1.000   1.000   1.000   1.000
        −0.1    0.861   0.868   0.859   0.858   0.851
         0.0    0.043   0.049   0.049   0.058   0.075
         0.1    0.876   0.908   0.891   0.880   0.854
         0.2    1.000   1.000   1.000   1.000   1.000
         0.3    1.000   1.000   1.000   1.000   1.000



Table 3. Size and power of the spatial Hausman test.

                        λ
   ρ      π      −0.8    −0.4     0      0.4     0.8
 −0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.992   0.991   0.993   0.992   0.993
        −0.1    0.583   0.564   0.573   0.571   0.529
         0.0    0.049   0.038   0.047   0.053   0.038
         0.1    0.561   0.566   0.572   0.577   0.535
         0.2    0.994   0.994   0.997   0.996   0.993
         0.3    1.000   1.000   1.000   1.000   1.000
 −0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.997   0.996   0.998   0.997   0.995
        −0.1    0.627   0.624   0.617   0.616   0.614
         0.0    0.041   0.045   0.047   0.045   0.043
         0.1    0.572   0.589   0.619   0.594   0.615
         0.2    0.998   0.994   0.998   0.998   0.996
         0.3    1.000   1.000   1.000   1.000   1.000
  0.0   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.998   0.998   0.998   0.997   0.997
        −0.1    0.625   0.637   0.638   0.650   0.640
         0.0    0.043   0.045   0.045   0.047   0.051
         0.1    0.613   0.618   0.618   0.608   0.609
         0.2    0.998   0.994   0.997   0.998   0.996
         0.3    1.000   1.000   1.000   1.000   1.000
  0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.997   0.999   0.995   0.998   0.997
        −0.1    0.623   0.594   0.631   0.628   0.602
         0.0    0.056   0.047   0.042   0.041   0.060
         0.1    0.601   0.640   0.632   0.607   0.577
         0.2    0.998   0.998   0.997   0.996   0.996
         0.3    1.000   1.000   1.000   1.000   1.000
  0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.993   0.994   0.997   0.993   0.993
        −0.1    0.551   0.580   0.572   0.570   0.589
         0.0    0.054   0.051   0.048   0.058   0.058
         0.1    0.603   0.608   0.614   0.604   0.564
         0.2    0.997   0.994   0.996   0.997   0.992
         0.3    1.000   1.000   1.000   1.000   1.000



Table 4. Size and power of the spatial Hausman test.

                        λ
   ρ      π      −0.8    −0.4     0      0.4     0.8
 −0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.997   0.998   0.998   0.999   0.998
        −0.1    0.669   0.680   0.654   0.680   0.634
         0.0    0.045   0.052   0.045   0.046   0.047
         0.1    0.683   0.677   0.680   0.650   0.618
         0.2    0.999   0.999   0.998   0.999   0.996
         0.3    1.000   1.000   1.000   1.000   1.000
 −0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.999   1.000   0.999   0.999   0.999
        −0.1    0.701   0.728   0.716   0.714   0.721
         0.0    0.051   0.047   0.052   0.043   0.050
         0.1    0.674   0.683   0.693   0.688   0.692
         0.2    1.000   0.999   0.999   0.999   1.000
         0.3    1.000   1.000   1.000   1.000   1.000
  0.0   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    0.999   0.999   0.999   0.999   0.999
        −0.1    0.720   0.721   0.717   0.715   0.720
         0.0    0.041   0.040   0.043   0.049   0.045
         0.1    0.701   0.701   0.727   0.710   0.695
         0.2    1.000   0.999   1.000   1.000   0.999
         0.3    1.000   1.000   1.000   1.000   1.000
  0.4   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   1.000   0.999   0.999   0.999
        −0.1    0.709   0.707   0.688   0.703   0.711
         0.0    0.045   0.045   0.048   0.045   0.041
         0.1    0.743   0.729   0.711   0.702   0.672
         0.2    0.999   1.000   0.998   1.000   0.999
         0.3    1.000   1.000   1.000   1.000   1.000
  0.8   −0.3    1.000   1.000   1.000   1.000   1.000
        −0.2    1.000   0.998   0.998   0.999   0.998
        −0.1    0.695   0.673   0.669   0.674   0.674
         0.0    0.048   0.052   0.054   0.051   0.052
         0.1    0.703   0.716   0.710   0.694   0.632
         0.2    0.998   0.999   1.000   0.999   0.998
         0.3    1.000   1.000   1.000   1.000   1.000





The simulation exercises reveal a negligible bias for β and a somewhat higher efficiency of the random effects estimator under $H_0$. The gain in efficiency is especially large at high positive values of λ and at high absolute values of ρ. A similar pattern can be found for the RMSE of λ, although the efficiency loss of the spatial within estimator is much higher than that for β. Under $H_1$ the random effects estimator is inconsistent, leading to large biases in both β and λ. The bias of the slope parameter β is hardly affected by different degrees of spatial dependence as represented by the values of λ. However, the bias is negative at low and negative values of ρ and turns positive as ρ becomes large. With respect to the estimates of λ, we find that the bias is negative if λ or ρ take on negative values, but that it declines in λ and/or ρ. At λ = 0.8 or ρ = 0.8 the bias nearly vanishes. The estimates of ρ remain unaffected by deviations from $H_0$, as expected. These estimates are based on the spatial within estimator, which is consistent under both $H_0$ and $H_1$. We also assess the performance of the spatial Hausman test for non-normal disturbances against the baseline case in Table 1.$^{8}$ In particular, we follow Kelejian and Prucha (1999) and assume lognormal remainder disturbances with $\varepsilon_{it} = (e^{\xi_{it}} - e^{0.5})/\sqrt{e^{2} - e^{1}}$, where $\xi_{it} \sim$ i.i.d. $N(0, 1)$. Alternatively, we assume that the distribution of the remainder error exhibits fatter tails than the normal, with $\varepsilon_{it} \sim$ i.i.d. $t(5)$. In both cases, the performance of the spatial Hausman test is comparable to that under normal disturbances. However, under the t(5) error distribution the power of the test is smaller. To summarize, the small Monte Carlo study shows that the proposed spatial Hausman test works well even in small panels.
In this spatial setting, the test is able to detect deviations from the assumption that unobserved unit effects and the explanatory variables are uncorrelated, which is critical for the validity of spatial random effects models.
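The two non-normal disturbance designs described above can be sketched in Python using only the standard library. The function names and the seed are hypothetical. The lognormal draw is standardized with the mean $e^{0.5}$ and variance $e^{2} - e$ of a standard lognormal variable; whether the t(5) draws were rescaled to unit variance in the original experiments is not stated in the text, so they are left unscaled here.

```python
import math
import random


def standardized_lognormal(xi: float) -> float:
    """Map a standard normal draw xi to (exp(xi) - e^0.5) / sqrt(e^2 - e),
    which has mean zero and unit variance."""
    return (math.exp(xi) - math.exp(0.5)) / math.sqrt(math.exp(2.0) - math.exp(1.0))


def student_t(rng: random.Random, df: int) -> float:
    """Draw from Student's t(df) as Z / sqrt(chi2_df / df), Z standard normal."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square(df) variable is Gamma distributed with shape df/2 and scale 2.
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


rng = random.Random(12345)  # arbitrary seed, chosen only for reproducibility
lognormal_draws = [standardized_lognormal(rng.gauss(0.0, 1.0)) for _ in range(5)]
t5_draws = [student_t(rng, 5) for _ in range(5)]
```

Both designs keep the innovations i.i.d. with mean zero while departing from normality, through skewness in the lognormal case and through fat tails in the t(5) case.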

7. CONCLUSIONS

In this paper, we study spatial random effects and spatial fixed effects models. We note that in many non-spatial applications the critical assumption maintained under the random effects specification, namely that unit-specific effects and explanatory variables are uncorrelated, does not hold. This also seems a possibility in a spatial setting and should be tested, since the estimates of spatial random effects models are inconsistent if this assumption fails to hold. Using a spatial Cliff and Ord type model as analysed in Kapoor et al. (2007), but augmented by an endogenous spatial lag, we introduce (feasible) IV estimators for both the spatial random effects model and the spatial fixed effects model. We derive the asymptotic distributions of these estimators as well as those of their feasible counterparts. In addition, we propose a spatial Hausman test to compare these two models, accounting for spatial autocorrelation in the disturbances. A small Monte Carlo study shows that this test works well even in small panels.

ACKNOWLEDGMENTS

We would like to thank Robert Kunst, Ingmar Prucha, the editor and two anonymous referees for helpful comments and suggestions.

8 The corresponding tables are available from the authors upon request.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society.



REFERENCES

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston, MA: Kluwer.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th ed.). Chichester: John Wiley.
Baltagi, B. H. and Q. Li (1992). A note on the estimation of simultaneous equations with error components. Econometric Theory 8, 113–19.
Case, A. C. (1991). Spatial patterns in household demand. Econometrica 59, 953–65.
Chamberlain, G. (1982). Multivariate regression models for panel data. Journal of Econometrics 18, 5–46.
Cornwell, C., P. Schmidt and D. Wyhowski (1992). Simultaneous equations and panel data. Journal of Econometrics 51, 151–81.
Egger, P. (2000). A note on the proper econometric specification of the gravity equation. Economics Letters 66, 25–31.
Greene, W. (2008). Econometric Analysis (6th ed.). Englewood Cliffs, NJ: Prentice Hall.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46, 1251–71.
Horn, R. A. and C. R. Johnson (1985). Matrix Analysis. Cambridge: Cambridge University Press.
Kapoor, M., H. H. Kelejian and I. R. Prucha (2007). Panel data models with spatially correlated error components. Journal of Econometrics 140, 97–130.
Kelejian, H. H. and I. R. Prucha (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 99–121.
Kelejian, H. H. and I. R. Prucha (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509–33.
Kelejian, H. H. and I. R. Prucha (2007). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67.
Kelejian, H. H., I. R. Prucha and Y. Yuzefovich (2004). Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: large and small sample results. In J. LeSage and R. K. Pace (Eds.), Advances in Econometrics, 18, 163–98. New York: Elsevier.
Korniotis, G. M. (2010). Estimating panel models with internal and external habit formation. Journal of Business and Economic Statistics 28, 145–58.
Lee, L. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22, 307–35.
Lee, L. and J. R. Yu (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics 154, 165–85.
Mundlak, Y. (1978). On pooling time series and cross section data. Econometrica 46, 69–85.
Mutl, J. (2006). Dynamic panel data models with spatially correlated disturbances. Ph.D. dissertation, University of Maryland.
Schmidt, P. (1976). Econometrics. New York: Marcel Dekker.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Yu, J. R., R. de Jong and L. Lee (2007). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large: a nonstationary case. Working paper, Ohio State University.
Yu, J. R., R. de Jong and L. Lee (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118–34.




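APPENDIX

Proof of Theorem 3.1: We denote the $(T + 1)N \times 1$ vector of i.i.d. (0, 1) innovations as $\zeta_N = (\mu_N'/\sigma_\mu,\ \nu_N'/\sigma_\nu)'$ and we write the stacked estimators as$^{9}$

$$\begin{pmatrix} \hat\delta_{GLS,N} - \delta \\ \hat\theta_{W,N} - \theta \end{pmatrix} = P_N F_N \zeta_N,$$

where

$$P_N = \begin{pmatrix} P_{R,N} & 0 \\ 0 & P_{Q,N} \end{pmatrix}, \qquad F_N = \begin{pmatrix} F_{R1,N} & F_{R2,N} \\ 0 & F_{Q,N} \end{pmatrix},$$

with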

−1

−1 PR,N = ZN HR,N HR,N HR,N HR,N ZN

−1 · ZN HR,N HR,N HR,N ,

−1

−1 ∗ HQ,N Z∗Q,N PQ,N = ZQ,N HQ,N HQ,N HQ,N

−1 · Z∗ , Q,N HQ,N HQ,N HQ,N and −1/2

FR1,N = σμ HR,N ε,N (ιT ⊗ IN ) σν = σμ HR,N Q1,N + Q0,N (ιT ⊗ IN ), σ1 −1/2

FR2,N = σν HR,N ε,N σν = σν HR,N Q1,N + Q0,N , σ1 FQ,N = σν HQ,N Q0,N . By Assumptions 3.2 and 3.3 it follows that the sequences of the stochastic matrices (NT) PR,N and (NT) PQ,N converge in probability; that is,

−1 p. (NT) PR,N → MHR ZM−1 MHR ZM−1 HR HR MHR Z HR HR ,

p. −1 (NT) PQ,N → MHQ Z∗ M−1 MHQ Z∗ M−1 HQ HQ MHQ Z ∗ HQ HQ . Next we apply the central limit theorem for vectors of triangular arrays given in Theorem A1 in Mutl (2006) 1 to (NT)− 2 FN ζ N . By Assumptions 2.1 and 3.1, the vector of random variables ζ N satisfies the assumptions of the central limit theorem. Observe that the matrix FN is non-stochastic and that Assumptions 2.2, 2.3 and 3.2 imply that the row and column sums of FN are bounded in absolute value. Hence, it remains to be demonstrated that the matrix (NT)−1 FN FN has eigenvalues uniformly bounded away from zero.

9 We use the properties of the $Q_{0,N}$ and $Q_{1,N}$ transformation matrices (see e.g. Kapoor et al., 2007, Remark A1, and Baltagi, 2008). In particular, we have $\Omega_{\varepsilon,N}^{-1/2} = \sigma_1^{-1} Q_{1,N} + \sigma_\nu^{-1} Q_{0,N}$.

One can show that$^{10}$

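$$\underset{(p+2q)\times(p+2q)}{F_N F_N'} = \sigma_\nu^2 \begin{pmatrix} H_{R,N}'H_{R,N} & H_{R,N}'H_{Q,N} \\ H_{Q,N}'H_{R,N} & H_{Q,N}'H_{Q,N} \end{pmatrix} = \sigma_\nu^2 \begin{pmatrix} H_{R,N}' & 0 \\ 0 & H_{Q,N}' \end{pmatrix} \begin{pmatrix} I_{NT} & I_{NT} \\ I_{NT} & I_{NT} \end{pmatrix} \begin{pmatrix} H_{R,N} & 0 \\ 0 & H_{Q,N} \end{pmatrix},$$

and, hence,

$$(NT)^{-1}\lambda_{\min}(F_N F_N') \ge \min\left\{(NT)^{-1}\lambda_{\min}(H_{R,N}'H_{R,N}),\ (NT)^{-1}\lambda_{\min}(H_{Q,N}'H_{Q,N})\right\} \cdot \lambda_{\min}\begin{pmatrix} I_{NT} & I_{NT} \\ I_{NT} & I_{NT} \end{pmatrix}.$$

Observe that $(NT)^{-1}H_{R,N}'H_{R,N}$ and $(NT)^{-1}H_{Q,N}'H_{Q,N}$ and $\begin{pmatrix} I_{NT} & I_{NT} \\ I_{NT} & I_{NT} \end{pmatrix}$ are symmetric. By Assumptions 3.2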

and 3.3 the first two matrices have full rank p + q and q, respectively. Note that the third matrix has trivially full rank as well. Hence, (N T )−1 · λmin (FN FN ) is uniformly bounded away from zero. Therefore, by the central limit theorem it follows that11 1 d. FN FN . (NT)−1/2 FN ζ N → N 0, lim N→∞ N T From Assumptions 3.2 and 3.3 we also have δ GLS,N − δ d. 1 (NT) 2 → N (0, ), θ W ,N − θ PN FN FN where = p limN→∞ (NT) PN . Fairly straightforward calculation shows that 11 12 , = 12 22 with 11 = 12 22

σν2

−1 MHR ZM−1 = HR HR MHR Z

γ ,GLS γ θ,GLS

γ θ,GLS

GLS

,

γ θ,GLS

−1 0L×K −1 MHR ZMHR HR MHR Z = = IK (K+L)×K GLS

−1 = σν2 MHQ Z∗ M−1 = W . HQ HQ MHQ Z ∗ σν2

10 Recall that the $(NT \times q)$ matrix of within transformed instruments $H_{Q,N} = Q_{0,N}G_{0,N}$ has full column rank $q$ and that $Q_{0,N}H_{Q,N} = H_{Q,N}$. Furthermore, $H_{R,N} = [H_{P,N}, H_{Q,N}]$, where $H_{P,N} = Q_{1,N}G_{1,N}$ with dimension $(NT \times p)$ and full column rank $p$. Because $Q_{0,N}Q_{1,N} = 0$, it follows that

$$H_{R,N}'Q_{0,N}H_{Q,N} = H_{R,N}'H_{Q,N} = \begin{pmatrix} G_{1,N}'Q_{1,N} \\ G_{0,N}'Q_{0,N} \end{pmatrix} Q_{0,N}H_{Q,N} = \begin{pmatrix} 0_{p\times q} \\ H_{Q,N}'H_{Q,N} \end{pmatrix}.$$

11 Note that it can be demonstrated that $\lim_{N\to\infty}(NT)^{-1}F_N F_N'$ exists.

We have ordered the elements of the vector of parameters δ such that the first elements correspond to the time-invariant variables so that the asymptotic variance–covariance matrix of the stacked estimators becomes ⎤ ⎡ γ ,GLS γ θ,GLS γ θ,GLS ⎥ ⎢ GLS ⎦ , = ⎣ γ θ,GLS GLS γ θ,GLS

GLS

W

and, hence, √

⎞ ⎛ γ GLS,N − γ ⎟ d. ⎜ θ GLS,N − θ ⎠ → N (0, ) . NT ⎝ θ W ,N − θ

Proof of Proposition 4.1: Note that we have

−1

−1 HQ,N ZQ,N θ I ,N − θ = ZQ,N HQ,N HQ,N HQ,N

−1 · ZQ,N HQ,N HQ,N HQ,N HQ,N Q0,N ν N = PQ,N HQ,N Q0,N ν N . Given Assumption 3.3 and the condition in the proposition, it follows that p. −1 −1 (NT) PQ,N → [MHQ Z M−1 HQ HQ MHQ Z ] MHQ Z MHQ HQ

and

(NT)λmin HQ,N Q0,N Q0,N HQ,N = (NT)λmin HQ,N HQ,N , which is uniformly bounded away from zero. Given Assumption 2.1, the conditions of Theorem A1 in Mutl (2006) are satisfied and in light of Assumption 3.3(a), we have the desired result. Proof of Proposition 4.2: The proof follows closely that of Theorem 4, part 2, in Kapoor et al. (2007) and Theorem 3 in Mutl (2006). In particular, it will be sufficient to show that (see e.g. Schmidt, 1976):

p ZN ( ϑ) ϑ) − ZN (ϑ) ZN ( ZN (ϑ) → 0, G1,N = (NT)−1 ∗

p W 1,N = (NT)−1 ZN ( ρ ) Z∗N ( ρ) − Z∗N (ρ) Z∗N (ρ) → 0, and p

G2,N = (NT)−1/2 ϑ) uN ( ϑ) − ZN (ϑ) uN (ϑ) → 0, ZN (

∗ p W 2,N = (NT)−1/2 ZN ( ρ ) u∗ ( ρ) − Z∗N (ρ) u∗ (ρ) → 0,

ϑ) = PHR,N ϑ) and Z∗N ( ϑ) = ZN ( where ϑ and ρ are (any) consistent estimators of ϑ and ρ. Note that ZN ( ∗ PHQ,N ZN (ϑ), where −1/2 ZN ( ϑ) = MN )ZN ε,N (INT − ρ σν = Q1,N + Q0,N (INT − ρ MN )ZN , σ1 C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

and $Z_N^{*}(\hat\vartheta) = Q_{0,N}(I_{NT} - \hat\rho M_N)Z_N$. Hence,

ρ−ρ σν σν ZN MN Q1,N + Q0,N PHR,N Q1,N + Q0,N ZN NT σ1 σ1 ρ−ρ σν σν + ZN Q1,N + Q0,N PHR,N Q1,N + Q0,N MN ZN NT σ1 σ1 ρ 2 − ρ 2 σν σν ZN MN Q1,N + Q0,N PHR,N Q1,N + Q0,N MN ZN + NT σ1 σ1 σν σν − σ σ1

ZN INT − ρMN Q0,N PHR,N Q1,N (INT − ρMN ) ZN + 1 NT σν σν − σ1 σ1

ZN INT − ρMN Q1,N PHR,N Q0,N (INT − ρMN ) ZN + NT σν2 σ2 − ν2 2 σ σ1

ZN INT − ρMN Q0,N PHR,N Q0,N (INT − ρMN ) ZN + 1 NT ρ−ρ σν σν σ2 ZN AG1,N ZN + ZN AG2,N ZN + ZN AG3,N ZN + ν2 ZN AG4,N ZN = NT σ1 σ1 σ1 ρ−ρ σν σν σ2 ZN AG1,N ZN + ZN AG2,N ZN + ZN AG3,N ZN + ν2 ZN AG4,N ZN + NT σ1 σ1 σ1 ρ 2 − ρ 2 σ σ σ2 ν ν ZN AG5,N ZN + ZN AG6,N ZN + ZN AG7,N ZN + ν2 ZN AG8,N ZN + NT σ1 σ1 σ1 σν σν − σ σ1 ZN AG9,N ZN + ZN AG9,N ZN + 1 NT σν2 σν2 − σ2 σ12 ZN AG10,N ZN , + 1 NT

G1,N =

where AG1,N = MN Q1,N PHR,N Q1,N , AG2,N = MN Q1,N PHR,N Q0,N , AG3,N = MN Q0,N PHR,N Q1,N , AG5,N = AG6,N =

AG4,N = MN Q0,N PHR,N Q0,N , MN Q1,N PHR,N Q1,N MN , MN Q1,N PHR,N Q0,N MN ,

AG7,N = MN Q0,N PHR,N Q1,N MN , AG8,N = MN Q0,N PHR,N Q0,N MN ,

AG9,N = INT − ρMN Q0,N PHR,N Q1,N (INT − ρMN ),

AG10,N = INT − ρMN Q0,N PHR,N Q0,N (INT − ρMN ).



Analogously,

$$W_{1,N} = (\rho - \hat\rho)(NT)^{-1} Z_N' M_N' P_{H_{Q,N}} Z_N + (\rho - \hat\rho)(NT)^{-1} Z_N' P_{H_{Q,N}} M_N Z_N + (\hat\rho^2 - \rho^2)(NT)^{-1} Z_N' M_N' P_{H_{Q,N}} M_N Z_N$$

$$= (\rho - \hat\rho)(NT)^{-1}\left[Z_N' A_{W1,N} Z_N + Z_N' A_{W1,N}' Z_N\right] + (\hat\rho^2 - \rho^2)(NT)^{-1} Z_N' A_{W2,N} Z_N,$$

where AW 1,N = MN PHQ,N , AW 2,N = MN PHQ,N MN . In light of Assumptions 2.2, 3.2 and 3.3, the row and column sums of the matrices AGi,N and AWj ,N are uniformly bounded in absolute value. By Lemma C3 in Mutl (2006) it follows that (NT)−1 XN AGi,N XN , (NT)−1 XN AW i,N XN , (NT)−1 DN AGi,N DN and (NT)−1 DN AW i,N DN have elements uniformly bounded in absolute value. Furthermore, by Lemma B2 in Mutl (2006), using Assumptions 2.1, 2.2 and 3.1, the elements of uN have uniformly bounded fourth moments. Therefore, the variance of (NT)−1 uN AGi,N uN is uniformly bounded in absolute value. Recall that ZN = (DN , WN yN , XN ), where the solution of the model yields yN = (INT − λWN )−1 (XN β + DN γ + uN ). Thus, the elements of

(NT)−2 E ZN AGi,N ZN ZN AGi,N ZN and

(NT)−2 E ZN AW i,N ZN ZN AW i,N ZN are uniformly bounded in absolute value. Because ρ , σv and σ1 are consistent estimators, it follows that p. p. G1,N → 0 and W 1,N → 0 as N → ∞. Next we use similar derivations to obtain G2,N =

ρ−ρ 1/2 (NT) +

ZN AG1,N uN +

ρ−ρ (NT)1/2

σν σν σ2 ZN AG2,N uN + ZN AG3,N uN + ν2 ZN AG4,N uN σ1 σ1 σ1

ZN AG1,N uN +

σν σν σ2 ZN AG2,N uN + ZN AG3,N uN + ν2 ZN AG4,N uN σ1 σ1 σ1

ρ 2 − ρ 2 σν σν σν2 Z A u + Z A u + Z A u + Z A u G6,N N G7,N N G8,N N N G5,N N (NT)1/2 σ1 N σ1 N σ12 N σν σν − σ1 σ1 ZN AG9,N uN + ZN AG9,N uN + (NT)1/2 σν2 σν2 − σ2 σ2 + 1 1/21 ZN AG10,N uN (NT) +

and W 2,N =

ρ 2 − ρ 2 ρ−ρ ZN AW 1,N uN + ZN AW 1,N uN + Z AW 2,N uN , 1/2 (NT) (NT)1/2 N C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.



where ZN = (DN , WN yN , XN ), with yN = (INT − λWN )−1 (XN β + DN γ + uN ). Observe that the matrices AGi,N and AWj ,N , as well as the vectors XN and DN , are non-stochastic. We have that

E (NT)−1/2 β XN AGi,N uN

E (NT)−1/2 γ DN AGi,N uN

−1

E (NT)−1/2 β XN INT − λWN WN AGi,N uN

−1

E (NT)−1/2 γ DN INT − λWN WN AGi,N uN

E (NT)−1/2 β XN AWj ,N uN

E (NT)−1/2 γ DN AWj ,N uN

−1

WN AWj ,N uN E (NT)−1/2 β XN INT − λWN

−1

AWj ,N WN uN E (NT)−1/2 γ DN INT − λWN

= 0, = 0, = 0, = 0, = 0, = 0, = 0, = 0.

and

−1

E (NT)−1/2 uN INT − λWN WN AGi,N uN −1

−1

σν2 tr INT − λWN WN AGi,N (INT − λMN )−1 Q0,N INT − λMN 1/2 (NT) −1

−1

σ12 , tr INT − λWN WN AGi,N (INT − λMN )−1 Q1,N INT − λMN + (NT)1/2

−1

E (NT)−1/2 uN INT − λWN WN AWj ,N uN =

=

−1

−1

σν2 (NT)−1/2 tr INT − λWN WN AWj ,N (INT − λMN )−1 Q0,N INT − λMN 1/2 (NT) −1

−1

σ12 , (NT)−1/2 tr INT − λWN WN AWj ,N (INT − λMN )−1 Q1,N INT − λMN + (NT)1/2

which by Assumptions 2.2, 3.2 and 3.3 are uniformly bounded in absolute value. Therefore, the elements of E[(NT)−1/2 ZN AGj,N uN ] and E[(NT)−1/2 ZN AWj ,N uN ] are uniformly bounded in absolute value. Next consider the corresponding variance–covariance matrices (NT)−1 β XN AGi,N u,N AGi,N XN β, (NT)−1 γ DN AGi,N u,N AGi,N DN γ ,

−1 WN AGi,N u,N AGi,N WN (INT − λWN )−1 XN β, (NT)−1 β XN INT − λWN

−1 WN AGi,N u,N AGi,N WN (INT − λWN )−1 DN γ , (NT)−1 γ DN INT − λWN and (NT)−1 β XN AWj ,N u,N AWj ,N XN β, (NT)−1 γ DN AWj ,N u,N AWj ,N DN γ ,

−1 (NT)−1 β XN INT − λWN WN AWj ,N u,N AWj ,N WN (INT − λWN )−1 XN β,

−1 (NT)−1 γ DN INT − λWN WN AWj ,N u,N AWj ,N WN (INT − λWN )−1 DN γ . C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.



Finally, the two remaining scalar variances are given by −1

−1

−1

WN AGi,N WN (INT − λWN )−1 uN uN INT − λWN WN AGi,N WN INT − λWN uN (NT)−1 E uN INT − λWN

−1 −1 −1 (INT − λWN ) WN AGi,N = (NT) E tr INT − λMN

−1

−1 −1 · WN (INT − λWN ) (INT − λMN )−1 εN εN INT − λMN WN INT − λWN

−1 −1 · AGi,N WN (INT − λWN ) (INT − λMN ) εN εN

= (NT)−1 E tr B1,Gi,N εN εN B2,Gi,N εN εN , and

−1

−1

(NT)−1 E uN INT − λWN WN AWj ,N WN (INT − λWN )−1 uN uN INT − λWN WN AWj ,N WN (INT − λWN )−1 uN

= (NT)−1 E tr B1,Wj ,N εN εN B2,Wj ,N εN εN , where the B matrices are products of A, WN , WN , (INT − λWN )−1 and (INT − λWN )−1 matrices and by Assumptions 2.2, 3.2 and 3.3 have row and column sums uniformly bounded in absolute value. Hence, the row and column sums of the variance–covariance matrices are uniformly bounded in absolute value p. and, therefore, (NT)−1/2 ZN · AGj,N uN = OP (1) and (NT)−1/2 ZN AWj ,N uN = OP (1). Thus, G2,N → 0 and p. , σν and σ1 are consistent estimators. W 2,N → 0 as N → ∞, because ρ √ d Proof of Lemma 5.1: Part (a): From Theorem 3.1 it follows that NT( θ GLS,N − θ W ,N ) → N (0, ), where GLS GLS IK = (IK , −IK ) = W − GLS . GLS W −IK To show that W − GLS is positive definite, recall that

∗ −1 W = σν2 lim N T Z∗ , N PHQ,N ZN N→∞

−1 ZN PHR,N · (0K×L , IK . GLS = σν2 (0K×L , IK ) lim N T ZN N→∞

Because the instrument sets are given by HQ,N = Q0,N G0,N and HR,N = HQ,N , HP ,N , one can show that ZN PHR,N ZN , we have PHR,N = PHP ,N + PHQ,N and, hence, denoting SR,N = SR,N,11 SR,N,21 SR,N = , SR,N,12 SR,N,22 where

SR,N,11 = DN AN PHQ,N + φ 2 PHP ,N AN DN ,

SR,N,12 = SR,N,21 = (WN yN , XN ) AN PHQ,N + φ 2 PHP ,N AN DN ,

SR,N,22 = (WN yN , XN ) AN PHQ,N + φ 2 PHP ,N AN (WN yN , XN ),

with AN = INT − ρMN and φ = σσν1 . We are interested in the lower-right K × K block of the inverse of SR,N , which we denote by S22 R,N . Using the formula for partitioned inverses (see e.g. Section 0.7.3 in Horn and Johnson, 1985) after some manipulation yields −1

−1 S22 R,N = SR,N,22 − SR,N,21 SR,N,11 SR,N,12

# = (WN yN , XN ) AN PHQ,N + φ 2 PHP ,N AN (WN yN , XN )

−1 $−1 DN AN · PHP ,N AN (WN yN , XN ) . + φ 2 (WN yN , XN ) AN PHP ,N INT − AN DN DN AN AN DN C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

Defining $S_{Q,N} = Z_N^{*\prime} P_{H_{Q,N}} Z_N^{*}$ we have

SQ,N = (WN yN , XN ) AN PHQ,N AN (WN yN , XN ) , so that

−1 22 −1 , W − GLS = σν2 lim N T S−1 Q,N − SR,N = lim N T CN − (CN + DN ) N→∞

N→∞

where CN = (WN yN , XN ) AN PHQ,N AN (WN yN , XN )

−1 DN = φ 2 (WN yN , XN ) AN PHP ,N INT − AN DN DN AN AN DN DN AN · PHP ,N AN (WN yN , XN ) . It follows that (Greene, 2008, p. 822)

$$(C_N + D_N)^{-1} = C_N^{-1} - C_N^{-1}\left(D_N^{-1} + C_N^{-1}\right)^{-1}C_N^{-1}$$

and, hence,

−1 −1 −1 −1 CN , W − GLS = σν2 lim C−1 N DN + CN N→∞

which is clearly positive definite. 2 2 σν,N , σ1,N ) is a consistent estimator, it follows that Part (b): Because ϑ = ( ρN ,

p. N MN HQ,N − ZQ,N INT − ρMN HQ,N → 0, ZQ,N INT − ρ p. −1/2 −1/2 ϑ N )HR,N − ZN u,N (ϑ)HR,N → 0. ZN u,N (

The claim in Lemma 5.1 then follows from Assumptions 3.2 and 3.3. p.
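The matrix identity invoked in Part (a) of this proof, $(C + D)^{-1} = C^{-1} - C^{-1}(D^{-1} + C^{-1})^{-1}C^{-1}$, can be checked numerically. The sketch below does so for two hypothetical symmetric positive definite 2 × 2 matrices; the matrices and helper functions are assumptions chosen for illustration and are unrelated to the $C_N$ and $D_N$ of the model.

```python
from typing import List

Matrix = List[List[float]]


def inverse_2x2(a: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def multiply(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def is_positive_definite(a: Matrix) -> bool:
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return a[0][0] > 0 and a[0][0] * a[1][1] - a[0][1] * a[1][0] > 0


# Hypothetical symmetric positive definite matrices.
C = [[2.0, 0.5], [0.5, 1.0]]
D = [[1.0, 0.2], [0.2, 3.0]]

C_inv = inverse_2x2(C)
# Left-hand side: C^{-1} - (C + D)^{-1}
lhs = subtract(C_inv, inverse_2x2(add(C, D)))
# Right-hand side: C^{-1} (D^{-1} + C^{-1})^{-1} C^{-1}
rhs = multiply(multiply(C_inv, inverse_2x2(add(inverse_2x2(D), C_inv))), C_inv)

max_abs_difference = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The right-hand side is a congruence transformation of the positive definite matrix $(D^{-1} + C^{-1})^{-1}$ by the nonsingular $C^{-1}$, which is why the difference $C^{-1} - (C + D)^{-1}$ is positive definite whenever $C$ and $D$ are.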

N − HN → 0 follows directly from Lemma 5.1 and Proposition 4.2. Given Proof of Theorem 5.1: H Theorem 3.1, the (true) Hausman test statistics HN is asymptotically distributed as a quadratic form of normally distributed random variables and, hence, it has an asymptotic χ 2 distribution. 0 1 θ GLS,N the algebraic expressions for the GLS estimators Proof of Proposition 5.1: We denote by θ GLS,N and under the null and the alternative hypothesis, respectively; that is, when uN is given by either Assumption 3.1 (RE) or Assumption 5.1 (FE). Analogically, we denote the algebraic expressions for the Hausman test statistics under the two hypotheses by HN0 and HN1 . We now have

−1 1 −1/2 θ W ,N = (0K×L , IK ) ZN (ϑ) ZN (ϑ)u,N [Q1,N XN π + (ιT ⊗ IN )ξ N + ν N ] − θ W ,N ZN (ϑ) · θ GLS,N − = θ GLS,N − θ W ,N + CN π , 0

where the non-stochastic matrix CN is given by

−1 −1/2 ZN (ϑ)u,N Q1,N XN . CN = (0K×L , IK ) ZN (ϑ) ZN (ϑ) Hence, the test statistics under the alternative can be written as

1

1 θ GLS,N − θ GLS,N − θ W ,N ( W − GLS )−1 θ W ,N HN1 = N T

0 = HN0 + N T π CN ( W − GLS )−1 CN π − 2 θ GLS,N − θ W ,N . Observe that Assumption 5.1 contains the random effects assumption for the independent component 0 d. of the individual effects. Therefore, by Theorem 5.1, we have HN0 → χ 2 (K) and (NT)−1/2 ( θ − GLS,N




d. θ W ,N ) → N (0, W − GLS ), where by Lemma 3.1, ( W − GLS ) is a positive definite matrix. Thus, also 0 θ GLS,N − θ W ,N ) = 0. Furthermore, it follows from Assumption 3.2 and the condition stipulated p limN→∞ ( in Proposition 5.1 that the limit of CN exists and is given as

−1 −1/2 ZN (ϑ)u,N Q1,N XN ZN (ϑ) ZN (ϑ) C = lim CN = lim (0K×L , IK ) N→∞ N→∞ −1

= (0K×L , IK ) MHR ZM−1 MHR ZM−1 , HR HR MHR Z HR HR MHR X

where the matrices MHR Z and MHR X have full column rank. As a result, the quadratic form defined by the matrix C ( W − GLS )−1 C is positive definite. Observe now that

0 HN1 = HN0 + (NT)π CN ( W − GLS )−1 CN π − 2N T π C N ( W − GLS )−1 θ GLS,N − θ W ,N √ = HN0 + (NT)π CN ( W − GLS )−1 CN π − 2 NT[π C + o(1)]( W − GLS )−1 op (1) √ = op (1) + (NT)πC N ( W − GLS )−1 CN π − NTo(1)op (1) √ = (NT)π CN ( W − GLS )−1 CN π + op ( NT), where lim CN ( W − GLS )−1 CN = C ( W − GLS )−1 C,

N→∞

is a positive definite matrix. Therefore, for any γ > 0, we have

$$\lim_{N\to\infty} P\left(H_N^{1} > h\right) = 1.$$
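The test statistic analysed in these proofs is the quadratic form $NT\,(\hat\theta_{GLS} - \hat\theta_{W})'(\Sigma_W - \Sigma_{GLS})^{-1}(\hat\theta_{GLS} - \hat\theta_{W})$. A minimal Python sketch of this calculation is given below. The function names and all numerical inputs are hypothetical, and the symbols $\Sigma_W$ and $\Sigma_{GLS}$ are used here as labels for the asymptotic variance matrices of the within and GLS estimators; in practice these would be replaced by consistent estimates.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def hausman_statistic(
    theta_gls: Sequence[float],
    theta_within: Sequence[float],
    sigma_gls: Sequence[Sequence[float]],
    sigma_within: Sequence[Sequence[float]],
    n_obs: int,
) -> float:
    """Compute n_obs * d' (sigma_within - sigma_gls)^{-1} d, d = theta_gls - theta_within."""
    k = len(theta_gls)
    d = [theta_gls[i] - theta_within[i] for i in range(k)]
    v = [[sigma_within[i][j] - sigma_gls[i][j] for j in range(k)] for i in range(k)]
    v_inv_d = solve_linear_system(v, d)
    return n_obs * sum(d[i] * v_inv_d[i] for i in range(k))


# Hypothetical inputs; n_obs = 720 corresponds to N = 144 and T = 5.
h = hausman_statistic(
    theta_gls=[0.50, 1.00],
    theta_within=[0.52, 0.97],
    sigma_gls=[[1.0, 0.0], [0.0, 1.0]],
    sigma_within=[[2.0, 0.0], [0.0, 3.0]],
    n_obs=720,
)
```

Solving the linear system instead of forming the inverse explicitly is a numerical convenience of the sketch, not something prescribed by the text.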


Fully modified narrow-band least squares estimation of weak fractional cointegration

MORTEN ØRREGAARD NIELSEN† AND PER FREDERIKSEN‡

†Department of Economics, Dunning Hall, Queen’s University, Kingston, Ontario, Canada K7L 3N6. E-mail: [email protected]

‡Nordea Markets, Strandgade 3, 0900 Copenhagen C, Denmark. E-mail: [email protected]

First version received: June 2008; final version accepted: May 2010

Summary We consider estimation of the cointegrating relation in the weak fractional cointegration model, where the strength of the cointegrating relation (difference in memory parameters) is less than one-half. A special case is the stationary fractional cointegration model, which has found important applications recently, especially in financial economics. Previous research on this model has considered a semi-parametric narrow-band least squares (NBLS) estimator in the frequency domain, but in the stationary case its asymptotic distribution has been derived only under a condition of non-coherence between regressors and errors at the zero frequency. We show that in the absence of this condition, the NBLS estimator is asymptotically biased, and also that the bias can be consistently estimated. Consequently, we introduce a fully modified NBLS estimator which eliminates the bias, and indeed enjoys a faster rate of convergence than NBLS in general. We also show that local Whittle estimation of the integration order of the errors can be conducted consistently based on NBLS residuals, but the estimator has the same asymptotic distribution as if the errors were observed only under the condition of non-coherence. Furthermore, compared to much previous research, the development of the asymptotic distribution theory is based on a different spectral density representation, which is relevant for multivariate fractionally integrated processes, and the use of this representation is shown to result in lower asymptotic bias and variance of the narrow-band estimators. We present simulation evidence and a series of empirical illustrations to demonstrate the feasibility and empirical relevance of our methodology. Keywords: Fractional cointegration, Frequency domain, Fully modified estimation, Long memory, Semi-parametric.

© 2011 The Author(s). The Econometrics Journal © 2011 Royal Economic Society. Published by Blackwell Publishing Ltd, 9600

1. INTRODUCTION

Recently, the concept of fractional cointegration has attracted increasing attention from both theoretical and empirical researchers in economics and finance. In this theory, a p-vector time series $z_t$ is said to be cointegrated if the elements of $z_t$ are integrated of order d, denoted I(d), but there exists a linear combination that is I(d − δ) with δ > 0. Originally, the concept of cointegration did not restrict d and δ to be integers (see, for example, Granger, 1981), but estimation methods have been developed mostly for the so-called I(1) − I(0) cointegration, where it is assumed that d = δ = 1. For a precise statement, a covariance stationary time series $x_t \in I(d)$, $d < 1/2$, if $\Delta^d x_t = (1 - L)^d x_t = v_t$, where $v_t \in I(0)$; that is, $v_t$ has continuous spectral density that is bounded and bounded away from zero at all frequencies. In this case, $\{x_t\}$ has spectral density

$f(\lambda) \sim g\lambda^{-2d}$ as $\lambda \to 0^{+}$,    (1.1)

where $g \in (0, \infty)$ is a constant and the symbol ‘∼’ means that the ratio of the left- and right-hand sides tends to one in the limit. The parameter d determines the memory of the process: if $d \in (0, 1/2)$ the process is covariance stationary with long memory, but if d = 0 the process has only weak dependence. For surveys see, for example, Baillie (1996) and Robinson (2003). We consider estimation of the single-equation cointegrating regression

$y_t = \alpha + \beta' x_t + u_t, \quad t = 1, \ldots, T,$    (1.2)

where both the regressors and the errors have long memory, and can in fact be non-stationary (d > 1/2), but the errors have less memory than the regressors; that is, where $x_t \in I(d_x)$ and $u_t \in I(d_u)$ with $d_x > d_u \ge 0$. In particular, we consider the model with $d_x - d_u < 1/2$, which is termed weak fractional cointegration in Hualde and Robinson (2010). To accommodate potential non-stationarity, we let γ ≥ 0 and consider$^{1}$

$\Delta^{\gamma} y_t = \alpha + \beta' \Delta^{\gamma} x_t + \Delta^{\gamma} u_t, \quad t = 1, \ldots, T.$    (1.3)

Now $\Delta^\gamma x_t \in I(d_x - \gamma)$ and $\Delta^\gamma u_t \in I(d_u - \gamma)$, and γ is any real number which transforms a potentially non-stationary model (like (1.2)) into one with stationary regressors (so that $d_x - \gamma < 1/2$), where, additionally, the cointegrating error in the transformed model has non-negative memory (so that $d_u - \gamma \ge 0$). The interpretation is that γ is a user-chosen number whose choice affects the estimation procedure, and it is connected to the model (1.2) which generates the data only through the two requirements $1/2 > d_x - \gamma > d_u - \gamma \ge 0$. That way $x_t$ and possibly also $u_t$ may be non-stationary.² Different choices of γ lead to different estimators, but one choice in particular leads to the best estimator in a GLS sense. This is $\gamma = d_u$ which, given that we are under weak cointegration, is always an appropriate choice in the sense that $d_x - \gamma < 1/2$ and $d_u - \gamma \ge 0$. Of course, $d_u$ is unknown, but a feasible version of the estimator with $\gamma = d_u$ may be implemented by replacing $d_u$ with a suitable estimator, $\hat d_u$. Another interesting special case is γ = 0, in which case we require $d_x < 1/2$, termed stationary fractional cointegration by Robinson (1994).

After appropriate differencing our model is stationary, so a comparison with the standard time-series regression model with weakly dependent regressors is natural. It is well known that, in the standard case, under a wide variety of regularity conditions, the ordinary least squares (OLS) estimator of β in (1.2) is asymptotically normal; see, for example, Hannan (1979). The new complication is that, as pointed out by Robinson (1994) and Robinson and Hidalgo (1997), when the regressors and the errors both have long memory and are possibly non-orthogonal, the OLS estimator in (1.2) is in general no longer consistent.

¹ There is a slight abuse of notation in (1.3) since $\Delta^\gamma \alpha$ is a constant which we also call α. However, because our estimators are functions of the periodogram at non-zero frequencies only, this is irrelevant.

² When defining non-stationary fractionally integrated processes there is a choice of type I or type II variants. In our model, either choice will lead to identical first-order asymptotic results; see Robinson (2005). Therefore, we do not consider this issue further, and below state our assumptions in terms of $\Delta^\gamma x_t$ and $\Delta^\gamma u_t$, which are stationary.

To deal with this issue, Robinson

Fully modified narrow-band least squares


(1994) proposed a semi-parametric narrow-band least squares (NBLS) estimator in the frequency domain (as opposed to a fixed band estimator as considered by, for example, Phillips (1991) in a cointegration context). The NBLS estimator assumes only a multivariate version of (1.1), and essentially performs OLS on a degenerating band of frequencies around the origin. The consistency of the estimator in the stationary case was proved by Robinson (1994). Christensen and Nielsen (2006) showed that its asymptotic distribution is normal when the collective memory of the regressors and the error term is less than 1/2, that is when dx + du < 1/2, and under the condition that the regressors and the errors have zero coherence at the origin. In contrast, Robinson and Marinucci (2003) consider several cases where the regressors are non-stationary fractionally integrated and the limiting distributions for the NBLS estimator involve fractional Brownian motion, and Chen and Hurvich (2003) add deterministic trends. We follow a semi-parametric approach characterized by assuming only a local model such as (1.1) for the spectral density, and using a degenerating part of the periodogram around the origin for estimation. This approach has the advantage of being invariant to short-term dynamics (and mean terms since the zero frequency is usually left out). Specifically, a local Whittle estimator of the memory parameter d based on maximization of a local Whittle approximation to the likelihood using (1.1) has been developed by Künsch (1987) and Robinson (1995). Of course, a fully parametric estimator would be more efficient, but is inconsistent if the parametric model is misspecified. The methods described earlier are combined by Marinucci and Robinson (2001b) and Christensen and Nielsen (2006), who suggest conducting a (stationary) fractional cointegration analysis in several steps. First, the integration orders of the observed data are estimated by the local Whittle estimator. 
Secondly, the NBLS estimator of the cointegrating vector is calculated, and finally the integration order of the residuals is estimated assuming that the local Whittle approach is equally valid when based on residuals. Hypothesis testing is then conducted on du as if ut were observed, and on β as if du (which enters in the limiting distribution of the NBLS estimator) were known. Moreover, the distribution theory for the NBLS estimator developed by Christensen and Nielsen (2006) assumes that the long-run (zero frequency) coherence between the regressors and the errors is zero. In this paper, we extend the stationary setting of Marinucci and Robinson (2001b) and Christensen and Nielsen (2006) to that of weak fractional cointegration. We develop the asymptotic distribution theory based on a different spectral density representation, which is relevant for multivariate fractionally integrated processes, and the use of this representation is shown to result in lower asymptotic bias and variance of the narrow-band estimators. We show that in the non-zero coherence case a bias term appears in the mean of the asymptotic normal distribution of the NBLS estimator. The bias term is proportional to the square-root of the bandwidth, with factor of proportionality depending on the integration orders and the coherence at frequency zero. However, we show that the bias can be estimated and hence removed by a fully modified type procedure in the spirit of Phillips and Hansen (1990). The result is a fully modified NBLS (FMNBLS) estimator, which has no asymptotic bias and the same asymptotic variance as the NBLS estimator. However, the FMNBLS estimator will have a better rate of convergence in general; that is, it will have the same rate as the NBLS estimator under non-coherence as in Christensen and Nielsen (2006). 
We also consider inference on the integration order of the error term in the cointegrating relation, and show that in the case of stationary errors it can be consistently estimated by the local Whittle estimator based on the residuals from an NBLS regression. However, the local Whittle estimator converges at a slower rate than if the errors were observed except if there is no



long-run coherence between regressors and errors. In the latter case, the asymptotic distribution theory for the local Whittle estimator is unaffected by the fact that the estimator is based on residuals. Extensions of the well-known fully modified least squares procedure of Phillips and Hansen (1990) to non-stationary fractional cointegration have been examined by Dolado and Marmol (1996), Kim and Phillips (2001) and Davidson (2004) in parametric frameworks. An alternative fully modified procedure for the I (1) − I (0) model was suggested by Marinucci and Robinson (2001a), who considered the estimator of Phillips and Hansen (1990) based on NBLS residuals rather than OLS residuals. However, the approach taken in this paper is more direct. We derive an expression for the asymptotic bias term, which depends on the integration orders of the regressors and the errors and also on the coherence matrix at the zero frequency. We show that under appropriate conditions on the bandwidth parameters, the bias term can be estimated consistently, for example by running an auxiliary NBLS regression, and this can be used to modify the initial NBLS estimator to eliminate the bias. In a simulation study, we document the finite sample feasibility of the proposed FMNBLS estimator. The simulations demonstrate the superiority in terms of bias of FMNBLS relative to NBLS in the presence of non-zero long-run coherence between the regressor and the error, which comes at the cost of an increased finite sample variance. In terms of RMSE, FMNBLS also clearly outperforms NBLS in most cases with non-zero long-run coherence. To demonstrate the empirical relevance of our methodology, we include several brief empirical illustrations. We first revisit the long-run unbiasedness question in the implied–realized volatility relation. We then consider the relation between inflation rates of European Union countries, exemplified by the harmonized consumer price indices of France and Spain. 
Finally, we investigate the relationship between the volatilities of the General Electric stock and two stock indices. The remainder of the paper is laid out as follows. Next, we describe NBLS estimation of (1.2) and (1.3) and derive the relevant asymptotic distribution theory. We also discuss inference using the local Whittle estimator of the integration order of the errors when the errors are not observed and residuals are used instead. In Section 3, we consider the FMNBLS modification to the NBLS estimator. Sections 4 and 5 present simulation evidence and empirical illustrations, respectively, and Section 6 offers some concluding remarks. All proofs are gathered in the appendices.

2. NARROW-BAND LEAST SQUARES ESTIMATION

We begin with some remarks about the spectral representation of multivariate long memory models. Suppose the spectral density of the covariance stationary process $w_t = (\Delta^\gamma x_t', \Delta^\gamma u_t)'$ is

$$f(\lambda) \sim \Lambda(\lambda)^{-1} G \bar{\Lambda}(\lambda)^{-1} \quad \text{as } \lambda \to 0^+, \tag{2.1}$$

where the bar denotes complex conjugation, $\Lambda(\lambda) = \mathrm{diag}(e^{-i\pi d_1/2}\lambda^{d_1}, \ldots, e^{-i\pi d_p/2}\lambda^{d_p})$, and G is a real, symmetric, positive-definite matrix. The spectral density representation (2.1) is



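motivated by the multivariate stationary fractionally integrated model with $d_a \in (-1/2, 1/2)$, $a = 1, \ldots, p$:

$$\begin{bmatrix} (1 - L)^{d_1} & & 0 \\ & \ddots & \\ 0 & & (1 - L)^{d_p} \end{bmatrix} \begin{bmatrix} w_{1t} - E w_{1t} \\ \vdots \\ w_{pt} - E w_{pt} \end{bmatrix} = \begin{bmatrix} v_{1t} \\ \vdots \\ v_{pt} \end{bmatrix}, \quad t = 1, \ldots, T, \tag{2.2}$$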

where $v_t = (v_{1t}, \ldots, v_{pt})'$ is a covariance stationary process with spectral density matrix that is finite and bounded away from zero (in the sense of positive-definite matrices) at all frequencies; that is, $v_t$ is I(0). When $v_t$ is an ARMA model, $w_t$ is a multivariate fractional ARIMA model. This class of models is very popular in both theoretical and applied time-series analysis. Since $(1 - e^{i\lambda})^d = \lambda^d e^{-i\pi d/2}(1 + O(\lambda))$ as $\lambda \to 0$, the representation (2.1) follows by defining $G = \lim_{\lambda \to 0} f_v(\lambda)$. A typical element of (2.1) is

$$f_{ab}(\lambda) \sim G_{ab}\lambda^{-d_a - d_b} e^{i\pi(d_a - d_b)/2} \quad \text{as } \lambda \to 0^+, \quad a, b = 1, \ldots, p,$$

where $d_a$ and $d_b$ appear both in the power decay and in the phase shift. Note that $f_{ab}(\lambda)$ differs from the simpler representation

$$h_{ab}(\lambda) \sim G_{ab}\lambda^{-d_a - d_b} \quad \text{as } \lambda \to 0^+, \quad a, b = 1, \ldots, p, \tag{2.3}$$

applied, for example, by Robinson (1995b) and Lobato and Robinson (1998) for inference on the integration orders and by Robinson and Marinucci (2003) and Christensen and Nielsen (2006) in the context of stationary fractional cointegration. The most important difference is that $f(\lambda)$ has a non-zero complex part even at the origin unless $d_a = d$ for all $a = 1, \ldots, p$, and neglecting the complex part is a source of misspecification. For a detailed comparison of $f(\lambda)$ and $h(\lambda)$, see Shimotsu (2007) and Robinson (2008), who derive multivariate local Whittle estimators based on (2.1).

We remark here that the assumptions of Christensen and Nielsen (2006) (and hence also those of, for example, Lobato and Robinson (1998) and Lobato (1999)) and much subsequent research, unfortunately, appear incompatible. The reason is that the real-valued cross-spectral density (2.3) imposed in their Assumption A implies that cross-autocorrelations are symmetric with respect to time, which implies a two-sided moving average with equal lead and lag weights and not a one-sided moving average as imposed in their Assumption B. The assumptions of Christensen and Nielsen (2006) (and subsequent research on narrow-band estimation of stationary fractional cointegration) can be made compatible, however, in light of their condition that $G_{ap} = G_{pa} = 0$, by assuming that the integration orders of the regressors are all equal; that is, that $d_a = d_x$ for $a = 1, \ldots, p - 1$ and $d_x > d_p$. In that special case, the representations (2.1) and (2.3) are equivalent and their results correct.

To consider frequency domain least squares inference on β in the cointegrating relation (1.2) or the pre-differenced regression (1.3), we define the cross-periodogram matrix between the observed vectors $\{\Delta^\gamma q_t, t = 1, \ldots, T\}$ and $\{\Delta^\gamma r_t, t = 1, \ldots, T\}$,

$$I_{qr}(\gamma, \lambda) = \frac{1}{2\pi T} \sum_{t=1}^{T} \sum_{s=1}^{T} (\Delta^\gamma q_t)(\Delta^\gamma r_s)' e^{-i(t-s)\lambda}. \tag{2.4}$$
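As a rough illustration of (2.4), the following Python sketch fractionally differences a series and evaluates the cross-periodogram at a single frequency. The function names `frac_diff` and `cross_periodogram` are ours, and the truncated filter with zero pre-sample values (a type II convention) is an assumption made for simplicity rather than something the text prescribes.

```python
import numpy as np


def frac_diff(z, gamma):
    """Apply the truncated filter (1 - L)^gamma to z (rows index time).

    Assumption: pre-sample values are set to zero (type II convention).
    """
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    T = z.shape[0]
    # Binomial weights: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - gamma) / k.
    pi = np.ones(T)
    for k in range(1, T):
        pi[k] = pi[k - 1] * (k - 1 - gamma) / k
    # Filter each column and keep the first T terms.
    cols = [np.convolve(z[:, j], pi)[:T] for j in range(z.shape[1])]
    return np.column_stack(cols)


def cross_periodogram(q, r, lam):
    """I_qr(lam) = (2 pi T)^{-1} sum_t sum_s q_t r_s' exp(-i (t - s) lam).

    q and r are assumed to be already differenced by Delta^gamma.
    """
    q = np.asarray(q, dtype=float).reshape(len(q), -1)
    r = np.asarray(r, dtype=float).reshape(len(r), -1)
    T = q.shape[0]
    t = np.arange(1, T + 1)
    wq = (q * np.exp(-1j * t * lam)[:, None]).sum(axis=0)  # sum_t q_t e^{-i t lam}
    wr = (r * np.exp(1j * t * lam)[:, None]).sum(axis=0)   # sum_s r_s e^{+i s lam}
    return np.outer(wq, wr) / (2 * np.pi * T)
```

The double sum in (2.4) factorizes into the product of two single sums, which is what the function exploits; in practice one would typically evaluate all Fourier frequencies at once with a fast Fourier transform.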



We then form the discretely averaged co-periodogram

$$\hat F_{qr}(\gamma, k, l) = \frac{2\pi}{T} \sum_{j=k}^{l} \mathrm{Re}(I_{qr}(\gamma, \lambda_j)), \quad 0 \le k \le l \le T - 1, \tag{2.5}$$

for $\lambda_j = 2\pi j/T$. By setting $k \ge 1$ and thus excluding the zero frequency, the estimator becomes invariant to non-zero means; that is, invariant to α in (1.2) and (1.3). With $\hat F$ defined in (2.5), we consider the frequency domain least squares estimator

$$\hat\beta_m(\gamma) = \hat F_{xx}^{-1}(\gamma, 1, m) \hat F_{xy}(\gamma, 1, m) \tag{2.6}$$

of β in the regression (1.3). Notice that, by this definition, $\hat\beta_{T-1}(0)$ is algebraically identical to the usual OLS estimator of β in (1.2) with allowance for a non-zero mean. On the other hand, if

$$\frac{1}{m} + \frac{m}{T} \to 0 \quad \text{as } T \to \infty, \tag{2.7}$$

then $\hat\beta_m(\gamma)$ is an NBLS estimator using only a degenerating band of frequencies near the origin. We need m to tend to infinity to gather information, but we also need to remain in a neighbourhood of zero where we have assumed knowledge about the spectral density, so m/T must tend to zero. When γ = 0, the NBLS estimator $\hat\beta_m(0)$ in (2.6) is the estimator defined by Robinson (1994). On the other hand, with $\gamma = d_u$, $\hat\beta_m(d_u)$ is a GLS-type estimator similar to the one discussed in Nielsen (2005), who also shows that the latter in fact also corresponds to a local Whittle quasi-maximum likelihood estimator of β. Of course $\hat\beta_m(d_u)$ is infeasible, but a feasible version will be discussed in the next section.

To prove our main results we assume, with obvious implications for $\Delta^\gamma y_t$, the following conditions on $w_t = (\Delta^\gamma x_t', \Delta^\gamma u_t)'$ and the bandwidth parameter. Here and throughout, the memory parameters $d_a$, $d_b$ and $d_p$ are used to refer to $w_{at}$, $w_{bt}$ and $w_{pt}$, respectively. That is, the memory parameters $d_a$, $a = 1, \ldots, p$, belong to the transformed regression (1.3) and are related to the original memory parameters $d_{x,a}$, $d_u$, and the pre-differencing parameter γ by $d_a = d_{x,a} - \gamma$, $a = 1, \ldots, p - 1$, and $d_p = d_u - \gamma$.

ASSUMPTION 2.1. The spectral density matrix of $w_t$ given in (2.1) with typical element $f_{ab}(\lambda)$, the cross-spectral density between $w_{at}$ and $w_{bt}$, satisfies

$$f_{ab}(\lambda) - G_{ab}\lambda^{-d_a - d_b} e^{i(\pi - \lambda)(d_a - d_b)/2} = O(\lambda^{\phi - d_a - d_b}) \quad \text{as } \lambda \to 0^+, \quad a, b = 1, \ldots, p, \tag{2.8}$$

for some $\phi \in (0, 2]$. The matrix G is positive definite.

ASSUMPTION 2.2. The memory parameters satisfy $0 \le d_a < 1/2$ for $a = 1, \ldots, p$, $d_a + d_p < 1/2$ for $a = 1, \ldots, p - 1$, and $\min_{1 \le a \le p-1} d_a - d_p = \min_{1 \le a \le p-1} d_{x,a} - d_u = \delta_{\min} > 0$.

ASSUMPTION 2.3. $w_t$ is a linear process, $w_t = \mu + \sum_{j=0}^{\infty} A_j \varepsilon_{t-j}$, with square summable coefficient matrices, $\sum_{j=0}^{\infty} \|A_j\|^2 < \infty$. The innovations $\varepsilon_t$ satisfy $E(\varepsilon_t | F_{t-1}) = 0$, $E(\varepsilon_t \varepsilon_t' | F_{t-1}) = I_p$, $E(\varepsilon_t \otimes \varepsilon_t \varepsilon_t' | F_{t-1}) = \mu_3$ and $E(\varepsilon_t \varepsilon_t' \otimes \varepsilon_t \varepsilon_t' | F_{t-1}) = \mu_4$, almost surely, where $\mu_3$ and $\mu_4$ are non-stochastic, finite and do not depend on t, and $F_t = \sigma(\{\varepsilon_s, s \le t\})$.



ASSUMPTION 2.4. Let $A_a(\lambda)$ denote the ath row of $A(\lambda) = \sum_{j=0}^{\infty} A_j e^{ij\lambda}$. Then, as $\lambda \to 0^+$,

$$\frac{\partial A_a(\lambda)}{\partial \lambda} = O(\lambda^{-1} \|A_a(\lambda)\|), \quad a = 1, \ldots, p.$$

ASSUMPTION 2.5. The bandwidth parameter $m_0 = m_0(T)$ satisfies

$$\frac{1}{m_0} + \frac{m_0^{1 + 2\min(1,\phi)}}{T^{2\min(1,\phi)}} \to 0 \quad \text{as } T \to \infty.$$

Our assumptions are a multivariate generalization of those in Robinson (1994, 1995a); see also Lobato (1999) and Christensen and Nielsen (2006). Since our assumptions are semi-parametric in nature they naturally differ from those employed by, for example, Robinson and Hidalgo (1997) in their parametric setup, and are at least in some respects weaker than standard parametric assumptions. In particular, we avoid standard assumptions (from stationary time-series regression) of independence or uncorrelatedness between $x_t$ and $u_t$ as well as complete and correct specification of $f(\lambda)$.

The first part of Assumption 2.1 specializes (2.1) by imposing smoothness conditions on the spectral density matrix of $w_t$ commonly employed in the literature. They are satisfied with φ = 2 if, for instance, $w_t$ is a vector fractional ARIMA process. The more precise approximation offered by Assumption 2.1 relative to (2.1) reflects the approximation $(1 - e^{i\lambda})^d = |2\sin(\lambda/2)|^d e^{-i(\pi - \lambda)d/2} = \lambda^d e^{-i(\pi - \lambda)d/2}(1 + O(\lambda^2))$ as $\lambda \to 0$; see Shimotsu (2007). The positive-definiteness condition on G is a no-multicollinearity or no-cointegration condition within the components of $x_t$, which is typical in single-equation cointegration models and in regression models.

The single-equation cointegrating regression model (1.2) is similar to the usual cointegrating regression model in the I(1) − I(0) case, and the nature of the regression setup is subject to the same advantages and disadvantages. Two important issues, given a set of more than two variables, are to justify the single-equation regression and to justify the choice of the left-hand-side variable. For the latter issue, it is likely that economic theory can be used as guidance and in any case this should be done on a case-by-case basis.
For the former issue, since cointegration among the regressors is ruled out by Assumption 2.1 (as is standard in cointegrating regression models), in practice one would have to establish that only one cointegrating relationship exists among the given set of variables. This could be done, for example, by the approach of Robinson and Yajima (2002) as in the empirical application in Section 5.3.

Much of the previous literature on semi-parametric frequency domain inference in the fractional cointegration model distinguishes (either explicitly or implicitly) between cases of coherence and non-coherence between the regressors and the error process at the zero frequency; for example, Robinson and Marinucci (2003), Christensen and Nielsen (2006) and Robinson (2008). In the present notation this condition is $G_{ap} = G_{pa} = 0$, for $a = 1, \ldots, p - 1$. Indeed, in the stationary case, asymptotic distribution theory for the NBLS estimator is only available in the case with non-coherence at the zero frequency; see Christensen and Nielsen (2006). Our assumptions avoid the non-coherence condition and thus allow correlation between the errors and regressors at any frequency.
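To make the estimator (2.6) concrete, the following Python sketch computes the averaged co-periodogram (2.5) with a fast Fourier transform and then the NBLS coefficient. It is a minimal sketch under stated assumptions: the names `avg_coperiodogram` and `nbls` are ours, the inputs are assumed to have been differenced by $\Delta^\gamma$ beforehand, and the FFT's time index starting at zero (rather than one) only introduces a phase factor that cancels in the periodogram.

```python
import numpy as np


def avg_coperiodogram(q, r, k, l):
    """F_hat_qr(k, l) = (2 pi / T) * sum_{j=k}^{l} Re I_qr(lambda_j), lambda_j = 2 pi j / T."""
    q = np.asarray(q, dtype=float).reshape(len(q), -1)
    r = np.asarray(r, dtype=float).reshape(len(r), -1)
    T = q.shape[0]
    wq = np.fft.fft(q, axis=0)[k:l + 1]   # DFTs at frequencies j = k..l
    wr = np.fft.fft(r, axis=0)[k:l + 1]
    # I_qr(lambda_j) = w_q(lambda_j) conj(w_r(lambda_j))' / (2 pi T)
    I = np.einsum("ja,jb->jab", wq, wr.conj()) / (2 * np.pi * T)
    return (2 * np.pi / T) * I.real.sum(axis=0)


def nbls(y, x, m):
    """Narrow-band least squares coefficient using frequencies 1..m, as in (2.6)."""
    Fxx = avg_coperiodogram(x, x, 1, m)
    Fxy = avg_coperiodogram(x, y, 1, m)
    return np.linalg.solve(Fxx, Fxy).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 512
    x = rng.standard_normal((T, 1))
    y = 0.5 + 2.0 * x[:, 0] + rng.standard_normal(T)
    # Simulated data with weakly dependent series, for illustration only.
    print(nbls(y, x, m=int(T ** 0.5)))
```

Setting `m = T - 1` uses all non-zero Fourier frequencies and reproduces the OLS slope with an intercept, which is the algebraic identity noted below (2.6) and a convenient sanity check for an implementation.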



The conditions on $d_a$ in Assumption 2.2 translate into conditions on γ given $d_{x,a}$ and $d_u$. In particular, γ satisfies

$$\frac{1}{2}\left(\max_{1 \le a \le p-1} d_{x,a} + d_u - 1/2\right) < \gamma \le d_u. \tag{2.9}$$

For instance, we find that γ = 0 is permitted only in the stationary case when also $d_{x,a} + d_u < 1/2$, $a = 1, \ldots, p - 1$. In the important GLS case with $\gamma = d_u$, Assumption 2.2 reduces to

$$0 < \delta_{\min} = \min_{1 \le a \le p-1} d_{x,a} - d_u \le \max_{1 \le a \le p-1} d_{x,a} - d_u < 1/2, \tag{2.10}$$

which is exactly the weak fractional cointegration assumption such that the condition $d_a + d_p < 1/2$ is redundant in that case because $d_p = 0$.

In view of the results from, for example, Fox and Taqqu (1986, Prop. 1) and Lobato and Robinson (1996), showing that quadratic forms of long memory processes with square-summable autocovariances (2d < 1/2) are asymptotically Gaussian, we assume that $d_a + d_p < 1/2$ in Assumption 2.2. The last condition of Assumption 2.2 is the essential assumption of cointegration, with $\delta_{\min}$ denoting the strength of the cointegrating relation.

Assumptions 2.3 and 2.4 follow Robinson (1995a) and Lobato (1999) in imposing a linear structure on $w_t$ with square summable coefficients and martingale difference innovations with finite fourth moments. The assumption of constant conditional variance for the innovations could presumably be relaxed by assuming boundedness of higher moments as in Robinson and Henry (1999). Under Assumption 2.3 we can write the spectral density matrix of $w_t$ as

$$f(\lambda) = \frac{1}{2\pi} A(\lambda) A^*(\lambda), \tag{2.11}$$

where the asterisk denotes transposed complex conjugation. Assumption 2.4 is a smoothness condition imposing differentiability of the spectral density near the origin, analogous to those imposed on the spectral density at any frequency in parametric frameworks; see, for example, Fox and Taqqu (1986). The condition is satisfied, for example, by fractional ARIMA models.

The statements of Assumptions 2.1 and 2.4 are made in the frequency domain, whereas the statement of Assumption 2.3 is in the time domain, which follows the tradition in the literature on semi-parametric estimation in long memory models. Clearly, the assumptions are closely related, and in particular the matrix G in Assumption 2.1 is a function of the lag weights $\{A_j, j \ge 0\}$ from Assumption 2.3. The connection between the representations (2.1) and (2.3) (or Assumption 2.1) and the lag weights in the linear process (Assumption 2.3) is explored in Theorems 1 and 2 of Robinson (2008). In particular, it is shown there that our Assumptions 2.1 and 2.3 are compatible.³

Finally, Assumption 2.5 restricts the expansion rate of the bandwidth parameter $m_0 = m_0(T)$. The bandwidth is required to tend to infinity for consistency, but at a slower rate than T to remain in a neighbourhood of the origin, where we have assumed some knowledge of the form of the spectral density. When φ is high, (2.8) is a better approximation to (2.11) as $\lambda \to 0^+$, and hence (by the second term of Assumption 2.5) a higher expansion rate of the bandwidth can be chosen. The weakest constraint is implied by $\phi \ge 1$, in which case the condition is $m_0 = o(T^{2/3})$.

³ Note that we could alternatively write our Assumptions 2.1–2.4 in terms of the model (2.2) and the errors $v_t$, as in, for example, Shimotsu and Phillips (2005).




A slightly weaker bandwidth condition was employed by Christensen and Nielsen (2006) due to their assumption of real-valued spectral density at the origin.

We next derive the distribution of the NBLS estimator in the fractional cointegration model (1.2)–(1.3). This generalizes the consistency (with rates) result of Robinson and Marinucci (2003) (when γ = 0) and the asymptotic normality results of Nielsen (2005) and Christensen and Nielsen (2006) (who assumed non-coherence at the origin and a different spectral density model).

THEOREM 2.1. Let Assumptions 2.1–2.5 be satisfied. Then the NBLS estimator $\hat\beta_{m_0}(\gamma)$ in (2.6) satisfies

$$\sqrt{m_0}\left(\lambda_{m_0}^{d_u} \Lambda_{m_0}^{-1}(\hat\beta_{m_0}(\gamma) - \beta) - K(\gamma)^{-1} H(\gamma)\right) \xrightarrow{d} N\left(0, K(\gamma)^{-1} J(\gamma) K(\gamma)^{-1}\right) \quad \text{as } T \to \infty, \tag{2.12}$$

where $\Lambda_m = \mathrm{diag}(\lambda_m^{d_{x,1}}, \ldots, \lambda_m^{d_{x,p-1}})$ and, for $a, b = 1, \ldots, p - 1$, $K(\gamma) = (K_{ab}(\gamma))$, $H(\gamma) = (H_a(\gamma))$ and $J(\gamma) = (J_{ab}(\gamma))$ are given by

$$K_{ab}(\gamma) = \frac{G_{ab}}{1 - d_{x,a} - d_{x,b} + 2\gamma} \cos\left(\frac{\pi}{2}(d_{x,a} - d_{x,b})\right),$$

$$H_a(\gamma) = \frac{G_{ap}}{1 - d_{x,a} - d_u + 2\gamma} \cos\left(\frac{\pi}{2}(d_{x,a} - d_u)\right),$$

$$J_{ab}(\gamma) = \frac{G_{ap} G_{bp}}{2(1 - d_{x,a} - d_{x,b} - 2d_u + 4\gamma)} \cos\left(\frac{\pi}{2}(d_{x,a} + d_{x,b} - 2d_u)\right) + \frac{G_{ab} G_{pp}}{2(1 - d_{x,a} - d_{x,b} - 2d_u + 4\gamma)} \cos\left(\frac{\pi}{2}(d_{x,a} - d_{x,b})\right).$$

Theorem 2.1 refines the results of Nielsen (2005) and Christensen and Nielsen (2006) in three ways: first, our result uses the representation (2.1) of the multivariate spectral density; secondly, we allow for non-zero coherence at the origin; and thirdly, we generalize the result to the weak fractional cointegration model. The cosine terms in the asymptotic distribution are a result of using the representation (2.1) rather than the simpler (2.3), in which case these terms would not be present. In the absence of any coherence between the regressors and the errors at the origin, the distribution theory follows from the above results by setting $G_{ap} = G_{pa} = 0$ for $a = 1, \ldots, p - 1$. Also note that the theorem presents a simple and closed-form expression for the asymptotic bias term $K(\gamma)^{-1} H(\gamma)$. In the next section we show that $K(\gamma)^{-1} H(\gamma)$ can be estimated consistently with a sufficient rate such that the bias can be removed and a centred distribution can be obtained.

To illustrate the distribution theory and the developments leading to the FMNBLS estimator below, we briefly consider an example. Consider the two-variable stationary case; that is, the regression (1.2) or (1.3) with only one regressor. Denote the integration orders $d_x$ and $d_u$ and the spectral density matrix at the origin $G = (G_{ab})$ with $a, b = x, u$. In this case (2.12) reduces to

$$\sqrt{m_0}\left(\lambda_{m_0}^{d_u - d_x}(\hat\beta_{m_0}(\gamma) - \beta) - \eta(\gamma)\right) \xrightarrow{d} N(0, \omega(\gamma)^2), \tag{2.13}$$



where the asymptotic bias and variance terms are given by

$$\eta(\gamma) = \frac{G_{xu}(1 - 2d_x + 2\gamma)}{G_{xx}(1 - d_x - d_u + 2\gamma)} \cos\left(\frac{\pi}{2}(d_x - d_u)\right),$$

$$\omega(\gamma)^2 = \frac{(1 - 2d_x + 2\gamma)^2}{2(1 - 2d_x - 2d_u + 4\gamma)} \left(\frac{G_{uu}}{G_{xx}} + \frac{G_{ux}^2}{G_{xx}^2} \cos(\pi(d_x - d_u))\right).$$

Note that, if the spectral representation (2.3) were used instead of (2.1), the cosine terms in both η(γ) and ω(γ) would be replaced by unity, their upper bound. The increased variance obtained using (2.3) when the true model is (2.1) is a consequence of the misspecification of the spectral density at the origin since the non-zero complex part in (2.1) is ignored in (2.3). Hence, using the correct representation (2.1) results in a distribution theory that is more precise, both in terms of bias and variance, as shown in Theorem 2.1.

Both the bias η(γ) and the variance $\omega(\gamma)^2$ depend on γ, and it is easily verified that η(γ) is increasing in γ, whereas $\omega(\gamma)^2$ is decreasing in γ for $\gamma \in [0, d_u]$. Thus, as expected, the minimum variance is attained for the GLS estimator. The fact that the bias is increasing in γ will be inconsequential since our FMNBLS estimator eliminates the bias term. In addition to the bias, the absence of the zero coherence condition results in an additive variance inflation of

$$\frac{G_{ux}^2 (1 - 2d_x + 2\gamma)^2}{2(1 - 2d_x - 2d_u + 4\gamma) G_{xx}^2} \cos(\pi(d_x - d_u)) \ge 0.$$

However, consistency of the estimator is not affected by the presence of non-zero coherence between the regressors and the errors at the zero frequency, and the rate result established by Robinson (1994) and Robinson and Marinucci (2003) (for γ = 0) is, in fact, sharp in this case as conjectured by Robinson and Marinucci (2003). This is easily seen from (2.13), where

$$\hat\beta_{m_0}(\gamma) - \beta = \frac{\lambda_{m_0}^{d_x - d_u}}{\sqrt{m_0}} \omega(\gamma) Z + \lambda_{m_0}^{d_x - d_u} \eta(\gamma) + o_P\left(m_0^{-1/2} \lambda_{m_0}^{d_x - d_u}\right),$$

for $Z \sim N(0, 1)$. The consistency and rate of $\hat\beta_{m_0}(\gamma)$ follow immediately, and in particular,

$$\lambda_{m_0}^{d_u - d_x}(\hat\beta_{m_0}(\gamma) - \beta) \xrightarrow{P} \eta(\gamma).$$

That is, when normalized as in Robinson (1994) and Robinson and Marinucci (2003), the NBLS estimator converges to a degenerate distribution (a constant) in the case of non-zero coherence between the regressors and the errors at the origin. However, in the absence of coherence between the regressors and the errors at the origin and normalized by an additional $\sqrt{m_0}$, the NBLS estimator has an asymptotic normal distribution.

To estimate the asymptotic bias term η(γ) and to feasibly implement the GLS version of the FMNBLS estimator $\hat\beta_{m_0}(d_u)$ we would need an estimate of the memory parameter $d_u$ of the error term. To that end, we next consider the local Whittle estimator $\hat d_u$ based on NBLS residuals with γ = 0. We will give the results for the stationary case where $d_u < d_{x,a} < 1/2$. Similar results have been derived by, for example, Velasco (2003) for non-stationary fractional cointegration. Thus, suppose $d_u$ is estimated by

$$\hat d_u = \arg\min_{d \in \Theta} \hat R(d), \quad \hat R(d) = \log \hat G(d) - \frac{2d}{m_1} \sum_{j=1}^{m_1} \log \lambda_j, \quad \hat G(d) = \frac{1}{m_1} \sum_{j=1}^{m_1} \lambda_j^{2d} I_{\hat u \hat u}(0, \lambda_j), \tag{2.14}$$




where $\Theta = [0, \Delta_2]$, $0 < \Delta_2 < 1/2$, is the parameter space and

$$I_{\hat u \hat u}(\gamma, \lambda_j) = I_{uu}(\gamma, \lambda_j) + (\beta - \hat\beta_m(\gamma))' \mathrm{Re}(I_{xx}(\gamma, \lambda_j))(\beta - \hat\beta_m(\gamma)) + 2(\beta - \hat\beta_m(\gamma))' \mathrm{Re}(I_{xu}(\gamma, \lambda_j)) \tag{2.15}$$

is the periodogram of the differenced residual series $\Delta^\gamma \hat u_t = \Delta^\gamma y_t - \hat\beta_m(\gamma)' \Delta^\gamma x_t = \Delta^\gamma u_t + (\beta - \hat\beta_m(\gamma))' \Delta^\gamma x_t$. The lower bound of the parameter space reflects prior information that $d_u \ge 0$, which seems reasonable from a practical/empirical point of view. This condition could be relaxed at the cost of a longer proof of the following theorem.

We introduce the following condition on the expansion rate of the bandwidth parameter $m_1 = m_1(T)$ used for the local Whittle estimator of $d_u$:

$$\left(\frac{m_0}{m_1}\right)^{\delta_{\min}} (\log T)^4 (\log m_1) + \frac{m_1^{1+2\phi} (\log m_1)^2}{T^{2\phi}} \to 0 \quad \text{as } T \to \infty, \tag{2.16}$$

where $m_0$ is the bandwidth parameter for $\hat\beta$ from Assumption 2.5 and φ is the smoothness parameter from Assumption 2.1. The first part of (2.16) is essentially satisfied if $m_1$ diverges to infinity at a faster rate than $m_0$. The second part is the standard assumption on the bandwidth parameter for local Whittle estimation; for example, Robinson (1995a).

THEOREM 2.2. Let Assumptions 2.1 and 2.3–2.5 be satisfied with γ = 0 and suppose $\hat d_u$ is given by (2.14) using bandwidth $m_1$ satisfying (2.16) and based on residuals $\hat u_t = y_t - \hat\beta_{m_0}(0)' x_t$, where $\hat\beta_{m_0}(0)$ is the NBLS estimator (2.6). Suppose $d_u$ belongs to the interior of Θ and $d_u < \min_{1 \le a \le p-1} d_{x,a} \le \max_{1 \le a \le p-1} d_{x,a} < 1/2$. Then, as $T \to \infty$,

$$\hat d_u - d_u = O_P\left((\log m_1)(m_0/m_1)^{\delta_{\min}}\right) \xrightarrow{p} 0. \tag{2.17}$$

If, in addition, $G_{ap} = G_{pa} = 0$ for $a = 1, \ldots, p - 1$ and $(m_0/m_1)^{2\delta_{\min}} \sqrt{m_1}/m_0 \to 0$, then

$$\sqrt{m_1}(\hat d_u - d_u) \xrightarrow{D} N(0, 1/4) \quad \text{as } T \to \infty.$$

The second part of Theorem 2.2 shows that, in the absence of long-run coherence between regressors and errors and under an additional (weak) restriction on the bandwidth, the local Whittle estimator of the integration order of the errors is unaffected by the fact that it is based on NBLS residuals. In the general case, the local Whittle estimator remains consistent, although it converges at a slower rate. Moreover, the result in Theorem 2.2 shows that in fact the three-step procedure employed by Marinucci and Robinson (2001b) and Christensen and Nielsen (2006) is valid only when there is no long-run coherence, as assumed in Christensen and Nielsen (2006). That is, in their setup inference on $d_u$ may be conducted based on our distributional result in Theorem 2.2 and is equivalent to disregarding the fact that the estimator is based on residuals, as long as the bandwidth parameter is chosen according to our assumptions.

To conclude this section, we make the following assumption regarding estimation of $d_u$.

ASSUMPTION 2.6. The memory parameter $d_u$ of the error term is estimated semiparametrically based on NBLS residuals $\hat u_t = y_t - \hat\beta_{m_0}(0)' x_t$ and using bandwidth parameter $m_1$ satisfying (2.16). The resulting estimator $\hat d_u$ satisfies (2.17).



Theorem 2.2 implies that Assumption 2.6 is satisfied when $d_u < \min_{1 \le a \le p-1} d_{x,a} \le \max_{1 \le a \le p-1} d_{x,a} < 1/2$. Similar results for the non-stationary case are provided by, for example, Velasco (2003).
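The local Whittle objective in (2.14) can be sketched in Python as follows. This is an illustrative sketch only: the function names are ours, the minimization over $\Theta = [0, \Delta_2]$ is done by a simple grid search (an assumption made to keep the example dependency-free, not the optimizer used in the paper), and the default upper bound 0.49 stands in for $\Delta_2$.

```python
import numpy as np


def local_whittle_from_periodogram(lam, I, upper=0.49, grid_size=491):
    """Minimize R(d) = log G(d) - 2 d mean(log lam) over a grid on [0, upper],
    where G(d) = mean(lam^{2d} I). `upper` plays the role of Delta_2 < 1/2."""
    lam = np.asarray(lam, dtype=float)
    I = np.asarray(I, dtype=float)
    mean_log_lam = np.mean(np.log(lam))
    grid = np.linspace(0.0, upper, grid_size)
    values = np.array([
        np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * mean_log_lam
        for d in grid
    ])
    return grid[np.argmin(values)]


def local_whittle(u_hat, m1, upper=0.49):
    """Local Whittle estimate of d_u from a (residual) series, using frequencies 1..m1."""
    u_hat = np.asarray(u_hat, dtype=float).ravel()
    T = u_hat.shape[0]
    lam = 2 * np.pi * np.arange(1, m1 + 1) / T
    # Periodogram of the residuals at the first m1 non-zero Fourier frequencies.
    I = np.abs(np.fft.fft(u_hat)[1:m1 + 1]) ** 2 / (2 * np.pi * T)
    return local_whittle_from_periodogram(lam, I, upper=upper)
```

In the setting of Theorem 2.2, `u_hat` would be the residuals $y_t - \hat\beta_{m_0}(0)' x_t$ from an initial NBLS regression and `m1` would be chosen to grow faster than $m_0$, in line with (2.16).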

3. FULLY MODIFIED NBLS ESTIMATION

We next consider estimation of the bias in NBLS from Theorem 2.1; that is, estimation of $K(\gamma)^{-1} H(\gamma)$. From the definitions of K(γ) and H(γ) in Theorem 2.1 and its proof, we can equivalently write

$$K(\gamma) = \lambda_m^{-1} \Lambda_m F_{xx}(\gamma, \lambda_m) \Lambda_m, \quad H(\gamma) = \lambda_m^{d_p - 1} \Lambda_m F_{xu}(\gamma, \lambda_m), \tag{3.1}$$

where $F_{qr}(\gamma, \lambda) = \int_0^{\lambda} \mathrm{Re}(f_{qr}(\gamma, \theta))\,d\theta$ and $f_{qr}(\gamma, \theta)$ is the cross-spectral density between $\Delta^\gamma q_t$ and $\Delta^\gamma r_t$. Thus, K(γ) is the (scaled) integrated co-spectrum of $\Delta^\gamma x_t$ and H(γ) is the (scaled) integrated co-spectrum between $\Delta^\gamma x_t$ and $\Delta^\gamma u_t$. By rewriting K(γ) and H(γ) in this way, the bias term $K(\gamma)^{-1} H(\gamma)$ is recognized to be the (scaled) population equivalent to the coefficient estimator in a regression of the errors from (1.3) on the regressors. This mimics the corresponding well-known result from ordinary least squares when the errors and regressors are correlated. However, in our weak fractional cointegration setup the bias term can be estimated and hence eliminated. It follows that a natural estimator of the bias can be based on

$$\Gamma_{m_2}(\gamma) = \hat F_{xx}^{-1}(\gamma, 1, m_2) \hat F_{xu}(\gamma, 1, m_2),$$

using bandwidth parameter $m_2 = m_2(T)$. However, the estimator $\Gamma_{m_2}(\gamma)$ is infeasible since the errors $u_t$ are unobserved. Instead, the residuals from an initial NBLS regression, $\hat u_t$, may be used. Defining $\hat F_{x\hat u}(\gamma, l, m) = \frac{2\pi}{T} \sum_{j=l}^{m} \mathrm{Re}(I_{x\hat u}(\gamma, \lambda_j))$ and noting that $\hat F_{x\hat u}(\gamma, 1, m_0) = 0$ from the first-order condition for $\hat\beta_{m_0}(\gamma)$, yields the feasible estimator

$$\hat\Gamma_{m_2}(\gamma) = \hat F_{xx}^{-1}(\gamma, m_0 + 1, m_2) \hat F_{x\hat u}(\gamma, m_0 + 1, m_2). \tag{3.2}$$

Thus, estimation of K(γ )−1 H (γ ) can be based on simply calculating the coefficient estimator in an auxiliary NBLS regression of the (differenced) residuals from the initial NBLS regression on the same set of regressors, γ xt ; that is, on NBLS estimation of the auxiliary regression γ uˆ t = κ + γ xt + γ vt ,

t = 1, . . . , T .

(3.3)

The discussion of the representations (2.1), (2.3) and (2.8) suggests that l 2π Re(eiλj (dq −dr )/2 Iqr (γ , λj )), F˜qr (γ , k, l) = T j =k

0 ≤ k ≤ l ≤ T − 1,

should more precisely approximate the integrated co-spectrum Fqr (γ , λ); cf. Assumption 2.1. Thus, we also consider the estimator −1 (γ , m0 + 1, m2 )F˜x uˆ (γ , m0 + 1, m2 ). ˜ m2 (γ ) = F˜xx

(3.4)
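The construction of the feasible bias estimator (3.2) can be illustrated with a minimal Python sketch. It assumes a single regressor, $\gamma = 0$ (no differencing) and Fourier frequencies $\lambda_j = 2\pi j/T$; the function names `avg_coperiodogram`, `nbls` and `bias_estimate` are hypothetical and chosen only for this illustration. The phase-adjusted variant behind (3.4) is not covered.

```python
import numpy as np

def avg_coperiodogram(q, r, lo, hi):
    """(2*pi/T) * sum_{j=lo}^{hi} Re I_qr(lambda_j), with lambda_j = 2*pi*j/T."""
    T = len(q)
    wq = np.fft.fft(q) / np.sqrt(2 * np.pi * T)  # discrete Fourier transform of q
    wr = np.fft.fft(r) / np.sqrt(2 * np.pi * T)  # discrete Fourier transform of r
    cross = wq * np.conj(wr)                     # cross-periodogram at j = 0..T-1
    return 2 * np.pi / T * np.real(cross[lo:hi + 1]).sum()

def nbls(x, y, m):
    """NBLS slope for one regressor and gamma = 0, using frequencies j = 1..m."""
    return avg_coperiodogram(x, y, 1, m) / avg_coperiodogram(x, x, 1, m)

def bias_estimate(x, u_hat, m0, m2):
    """Auxiliary narrow-band regression of residuals on x over j = m0+1..m2."""
    return (avg_coperiodogram(x, u_hat, m0 + 1, m2)
            / avg_coperiodogram(x, x, m0 + 1, m2))
```

Because the real part of the cross-periodogram is linear in its second argument, residuals from `nbls(x, y, m0)` have a zero averaged co-periodogram with `x` over $j = 1, \ldots, m_0$, which is the first-order condition noted in the text and the reason the sketch starts the auxiliary sum at $m_0 + 1$.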

For the estimation of the bias term, we need the following condition on the bandwidth $m_2$.



ASSUMPTION 3.1. The bandwidth parameter $m_2 = m_2(T)$ satisfies

$$\frac{m_0}{m_2} + \frac{m_2}{T} \to 0 \quad \text{as } T \to \infty,$$

where $m_0$ is the bandwidth from Assumption 2.5.

The first term in Assumption 3.1 ensures that (3.2) is based on an increasing number of periodogram ordinates, $m_2 - m_0$. The second term ensures that estimation is conducted in a neighbourhood of the origin, which is sufficient for consistent NBLS estimation. We can now state the following result regarding the estimation of the NBLS bias term.

THEOREM 3.1. Let Assumptions 2.1–2.5 and 3.1 be satisfied and assume that $\hat{\Gamma}_{m_2}(\gamma)$ in (3.2) and $\tilde{\Gamma}_{m_2}(\gamma)$ in (3.4) are based on residuals $\hat{u}_t = y_t - \hat{\beta}_{m_0}(\gamma)' x_t$, where $\hat{\beta}_{m_0}(\gamma)$ is the NBLS estimator (2.6). Then, as $T \to \infty$,

$$\lambda_{m_2}^{d_u} \Lambda_{m_2}^{-1} \hat{\Gamma}_{m_2}(\gamma) - K(\gamma)^{-1} H(\gamma) = O_P\left(\left(\frac{m_0}{m_2}\right)^{\delta_{\min}} + m_0^{-1/2} (\log T)^{-1} + \left(\frac{m_2}{T}\right)^{\min(1,\phi)}\right) \to_p 0,$$

$$\lambda_{m_2}^{d_u} \Lambda_{m_2}^{-1} \tilde{\Gamma}_{m_2}(\gamma) - K(\gamma)^{-1} H(\gamma) = O_P\left(\left(\frac{m_0}{m_2}\right)^{\delta_{\min}} + m_0^{-1/2} (\log T)^{-1} + \left(\frac{m_2}{T}\right)^{\phi}\right) \to_p 0.$$

This result implies that $\lambda_{m_2}^{d_u} \Lambda_{m_2}^{-1} \hat{\Gamma}_{m_2}(\gamma)$ and $\lambda_{m_2}^{d_u} \Lambda_{m_2}^{-1} \tilde{\Gamma}_{m_2}(\gamma)$ based on residuals are both consistent estimators of $K(\gamma)^{-1} H(\gamma)$. The theorem also implies, in conjunction with Theorem 2.2, that the bias $\lambda_{m_0}^{-d_u} \Lambda_{m_0} K(\gamma)^{-1} H(\gamma)$ of the NBLS estimator in Theorem 2.1 can be consistently estimated. It is even possible, based on Theorems 2.2 and 3.1, to obtain a rate result for the bias estimator, which we shall apply in the derivation of the fully modified estimator.

The FMNBLS estimator is based on a new bandwidth parameter $m_3 = m_3(T)$. In particular,

$$\tilde{\beta}_{m_3}(\gamma) = \hat{\beta}_{m_3}(\gamma) - \lambda_{m_3}^{-\hat{d}_u} \hat{\Lambda}_{m_3} \lambda_{m_2}^{\hat{d}_u} \hat{\Lambda}_{m_2}^{-1} \tilde{\Gamma}_{m_2}(\gamma), \tag{3.5}$$

where $\hat{\Lambda}_m = \mathrm{diag}(\lambda_m^{\hat{d}_{x,1}}, \ldots, \lambda_m^{\hat{d}_{x,p-1}})$. That is, the fully modified estimator $\tilde{\beta}_{m_3}(\gamma)$ is simply the NBLS estimator $\hat{\beta}_{m_3}(\gamma)$ corrected for the asymptotic bias. All the estimators of the integration orders are based on the bandwidth $m_1$. The bias correction term $\tilde{\Gamma}_{m_2}(\gamma)$ is estimated using bandwidth $m_2$ for (3.4) and bandwidth $m_0$ is needed to obtain the residuals upon which both (3.4) and $\hat{d}_u$ are based. We could also have used $\hat{\Gamma}_{m_2}(\gamma)$ in (3.5), but Theorem 3.1 shows that $\tilde{\Gamma}_{m_2}(\gamma)$ converges at a faster rate than $\hat{\Gamma}_{m_2}(\gamma)$.

Note that in Theorem 3.1 the estimator of $K(\gamma)^{-1} H(\gamma)$ is based on periodograms integrated over $\lambda_{m_0+1}, \ldots, \lambda_{m_2}$ and therefore truncates the first $m_0$ Fourier frequencies, which may introduce variance inflation in finite samples. For example, Hurvich et al. (1998) report Monte Carlo variance inflation from trimming the lowest frequencies in log-periodogram regression, even though theoretically trimming the lowest frequencies has no detrimental effect. However, as noted earlier, we cannot use the lowest $m_0$ frequencies due to the first-order condition for the initial NBLS estimator. This differs from the fully modified estimator in Phillips and Hansen (1990), which uses the frequencies closest to the origin to estimate the bias term.

For the bandwidth $m_3 = m_3(T)$ of the FMNBLS estimator, we need the following condition.

ASSUMPTION 3.2. The bandwidth parameter $m_3 = m_3(T)$ satisfies

$$\frac{1}{m_3} + \frac{m_3^{1+2\min(1,\phi)}}{T^{2\min(1,\phi)}} + m_3 \left(\frac{m_0}{m_2}\right)^{2\delta_{\min}} + m_3 \left(\frac{m_2}{T}\right)^{2\phi} + \frac{m_3}{m_0} (\log T)^{-2} + (\log T)^2 (\log m_1)^2\, m_3 \left(\frac{m_0}{m_1}\right)^{2\delta_{\min}} \to 0 \tag{3.6}$$

as $T \to \infty$, where $m_0$, $m_1$ and $m_2$ are the bandwidth parameters from Assumptions 2.5, 2.6 and 3.1, and $\phi$ is the smoothness parameter from Assumption 2.1.

The condition on $m_3$ is in some ways complicated and in others quite mild and simple. The first two terms state that $m_3$ has to satisfy the NBLS assumption for the bandwidth; cf. Assumption 2.5. At the same time, $m_3$ must diverge to infinity at a rate no faster than that of $m_0$ (fifth term on the left-hand side) and at a slower rate than $m_1$ and $m_2$ (sixth and third terms on the left-hand side). Note that if $m_1$ and $m_2$ diverge to infinity at much faster rates than $m_0$ and the cointegrating strength, $\delta_{\min}$, is large, Assumption 3.2 is less restrictive. In fact, Assumption 3.2 is simple and easily satisfied because it is always feasible to choose $m_3 = m_0$, in which case there is no need to obtain a new NBLS estimator upon which to base the FMNBLS estimator (3.5). In that case, the condition simplifies significantly, and in particular the relevant assumption then becomes

$$\frac{m_0^{1+2\delta_{\min}}}{m_2^{2\delta_{\min}}} + m_0 \left(\frac{m_2}{T}\right)^{2\phi} + (\log T)^2 (\log m_1)^2\, \frac{m_0^{1+2\delta_{\min}}}{m_1^{2\delta_{\min}}} \to 0 \quad \text{as } T \to \infty, \tag{3.7}$$

in addition to Assumptions 2.5, 2.6 and 3.1 already placed on $m_0$, $m_1$ and $m_2$.

To illustrate the restriction placed on the bandwidths by (3.7), suppose $\phi \ge 1$ and that we are in the empirically relevant (Section 5) situation $\delta_{\min} = 0.4$. Then choosing $m_0 = m_3 = T^{0.3}$ is feasible if at the same time $m_1 = T^{\psi_1}$ and $m_2 = T^{\psi_2}$ for any $\psi_1 > 0.675$ and $\psi_2 \in (0.675, 0.85)$. On the other hand, if $m_1 = m_2 = T^{0.8}$ then choosing $m_0 = m_3 = T^{\psi_0}$ for any $\psi_0 \in (0, 32/90)$ is feasible, which is only slightly restrictive in light of Assumption 2.5 on $m_0$. Also note that it is in fact feasible in some cases to choose $m_2$ to diverge faster than $T^{0.8}$, which is even faster than the rate allowed for the asymptotically normal NBLS estimation; cf. Assumption 2.5.

In any case, the rate of convergence of $\tilde{\beta}_{m_3}(\gamma)$ in the following theorem is mostly affected by the cointegration strength $\delta_{\min}$ and not so much by the choice of $m_0 = m_3$. For example, if $\delta_{\min} = 0.4$ and $m_0 = m_3 = T^{0.3}$, the rate of convergence of $\tilde{\beta}_{m_3}(\gamma)$ in (3.8) is $T^{0.43}$, which is close to the usual $\sqrt{T}$-convergence in spite of the low bandwidth rate for $m_3$. In general, when $m_0 = m_3 = T^{\zeta}$, the rate of convergence of $\tilde{\beta}_{m_3}(\gamma)$ is $T^{\zeta(0.5 - \delta_{\min}) + \delta_{\min}}$. Therefore, for any $\zeta$, when $\delta_{\min} \to 1/2$ the rate of convergence of $\tilde{\beta}_{m_3}(\gamma)$ approaches $\sqrt{T}$, which is the best rate attainable for fully parametric estimators based on the complete and correct specification of the spectral density at all frequencies.
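The bandwidth arithmetic in the preceding illustration can be checked with a short Python sketch. It assumes power-law bandwidths $m_i = T^{\psi_i}$ and ignores the logarithmic factors in (3.7), which cannot overturn a strictly negative power of $T$; the function names are hypothetical.

```python
def exponents_3_7(psi0, psi1, psi2, delta_min, phi):
    """Powers of T in the three terms of (3.7) when m_i = T**psi_i (logs ignored)."""
    first = (1 + 2 * delta_min) * psi0 - 2 * delta_min * psi2
    second = psi0 + 2 * phi * (psi2 - 1)
    third = (1 + 2 * delta_min) * psi0 - 2 * delta_min * psi1
    return first, second, third

def satisfies_3_7(psi0, psi1, psi2, delta_min, phi):
    """True when every term of (3.7) carries a strictly negative power of T."""
    return all(e < 0 for e in exponents_3_7(psi0, psi1, psi2, delta_min, phi))

def rate_exponent(zeta, delta_min):
    """Power of T in sqrt(m3) * lambda_{m3}**(-delta_min) when m3 = T**zeta."""
    return zeta * (0.5 - delta_min) + delta_min
```

With $\delta_{\min} = 0.4$ and $\phi = 1$, `satisfies_3_7(0.3, 0.7, 0.7, 0.4, 1.0)` holds, while moving $\psi_1$ below 0.675 or $\psi_2$ above 0.85 violates one of the three terms, in line with the ranges quoted above. The Assumption 2.5, 2.6 and 3.1 restrictions are not checked by this sketch.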



THEOREM 3.2. Let Assumptions 2.1–2.6 and 3.1, as well as either Assumption 3.2 or (3.7) be satisfied and let $\tilde{\beta}_{m_3}(\gamma)$ be the FMNBLS estimator (3.5). Then

$$\sqrt{m_3}\, \lambda_{m_3}^{d_u} \Lambda_{m_3}^{-1} (\tilde{\beta}_{m_3}(\gamma) - \beta) \to_d N(0, K(\gamma)^{-1} J(\gamma) K(\gamma)^{-1}) \quad \text{as } T \to \infty, \tag{3.8}$$

where $K(\gamma)$ and $J(\gamma)$ are defined in Theorem 2.1.

Finally, we prove in the next theorem that a feasible version of the GLS-version of the FMNBLS estimator can be implemented in the weak fractional cointegration model using an estimate of the memory of the error term; for example, from Theorem 2.2. That is, we show that $\tilde{\beta}_{m_3}(\hat{d}_u)$ has the same asymptotic distribution as $\tilde{\beta}_{m_3}(d_u)$, as long as $\hat{d}_u$ satisfies Assumption 2.6.

THEOREM 3.3. Let Assumptions 2.1, 2.3–2.6 and 3.1, as well as either Assumption 3.2 or (3.7) be satisfied, let $\tilde{\beta}_{m_3}(\gamma)$ be the FMNBLS estimator (3.5), and suppose the memory parameters satisfy (2.10). Then

$$\sqrt{m_3}\, \lambda_{m_3}^{d_u} \Lambda_{m_3}^{-1} (\tilde{\beta}_{m_3}(\hat{d}_u) - \beta) \to_d N(0, K(d_u)^{-1} J(d_u) K(d_u)^{-1}) \quad \text{as } T \to \infty, \tag{3.9}$$

where $K(d_u)$ and $J(d_u)$ are defined in Theorem 2.1.

The results in Theorems 3.2 and 3.3 demonstrate that it is possible to obtain an asymptotically unbiased estimator of the cointegration vector in the weak fractional cointegration model (1.2), where the memory parameters satisfy (2.10), even in the presence of long-run coherence. The feasible estimator $\tilde{\beta}_{m_3}(\hat{d}_u)$ in Theorem 3.3 has the minimum variance among the estimators $\tilde{\beta}_{m_3}(\gamma)$ in Theorem 3.2 for $\gamma \in [0, d_u]$; cf. the discussion following Theorem 2.1 in Section 2. In the stationary case, Theorems 3.2 and 3.3 prove that it is possible to consistently estimate (with a centred asymptotic distribution) the relation between stationary time series even when the regressors and the errors are correlated at any frequency. A necessary condition is that the time series in question are fractionally cointegrated.

Results similar to Theorems 3.2 and 3.3 are obtained by Hualde and Robinson (2010) who derive the asymptotic distribution theory for a related inverse spectral density weighted estimator; see also Nielsen (2005). In a different setup, Robinson (2008) developed joint multiple local Whittle (MLW) estimation of the memory parameters, the cointegration coefficient, and a phase parameter in a bivariate stationary fractionally cointegrated system. The MLW estimator of $\beta$ also has a centred asymptotic distribution and converges at the same rate as our FMNBLS estimator. The multivariate method enjoys the usual advantages of a systems approach, but being based on numerical optimization of a multi-parameter objective function it is computationally more demanding than our regression approach and the objective function may have multiple local optima. Finite sample performance of the MLW estimator of $\beta$ and our FMNBLS estimator is compared in Section 4.

Compared to the NBLS estimator of Theorem 2.1, the fully modified estimator incurs no asymptotic variance inflation, only bias correction. Indeed, the FMNBLS estimator enjoys a faster rate of convergence than the NBLS estimator in the general case with non-zero coherence between the regressors and the errors at the origin. In particular, in the notation of the example following Theorem 2.1, the asymptotic mean squared errors of the two estimators are related as

$$\mathrm{AMSE}(\hat{\beta}_{m_3}(\gamma)) = m_3 \lambda_{m_3}^{2d_u - 2d_x} E(\hat{\beta}_{m_3}(\gamma) - \beta)^2 = \omega(\gamma)^2 + m_3 \eta(\gamma)^2 = \mathrm{AMSE}(\tilde{\beta}_{m_3}(\gamma)) + m_3 \eta(\gamma)^2.$$

Thus, FMNBLS with the asymptotic distribution theory of Theorems 3.2 and 3.3 constitutes a much more useful inferential tool for the weak fractional cointegration model than the NBLS estimator, which is commonly used in previous work and applied especially in financial economics. Furthermore, Theorem 3.3 shows that the FMNBLS estimator is in fact applicable in the more general weak fractional cointegration model, and not just in the stationary cointegration case.

Consistent estimation of the parameters appearing in the variance of the limiting distribution in (3.8) can be based on Theorem 2.2 in conjunction with the estimator

$$\hat{G}_{ab}(\beta(\gamma), d) = \frac{1}{m_2} \sum_{j=1}^{m_2} \mathrm{Re}\left(\lambda_j^{d_{x,a} + d_{x,b} - 2\gamma} e^{i(\lambda_j - \pi)(d_{x,a} - d_{x,b})/2} I_{ab}(\lambda_j)\right),$$

where $d = (d_{x,1}, \ldots, d_{x,p-1}, d_u)'$, $I_{ab}(\lambda_j)$ is the $(a, b)$th element of $I_{ww}(\lambda_j) = I_{ww}(0, \lambda_j)$, and $I_{ww}(0, \lambda_j)$ is the periodogram matrix of $w_t = (\Delta^{\gamma} x_t', \Delta^{\gamma} u_t)'$. Note that $\beta$ enters in $I_{ab}(\lambda_j)$ if $a = p$ and/or $b = p$. Specifically, if $\tilde{d}_u$ is the local Whittle estimator of $d_u$ based on $\tilde{u}_t$ and $\tilde{I}(\tilde{d}_u, \lambda_j)$ is the periodogram matrix of $(\Delta^{\tilde{d}_u} x_t', \Delta^{\tilde{d}_u} \tilde{u}_t)'$, where $\tilde{u}_t$ denotes FMNBLS residuals $\tilde{u}_t = y_t - \tilde{\beta}_{m_3}(\hat{d}_u)' x_t$, we have

$$\hat{G}_{ab}(\tilde{\beta}_{m_3}(\hat{d}_u), \hat{d}) = \frac{1}{m_2} \sum_{j=1}^{m_2} \mathrm{Re}\left(\lambda_j^{\hat{d}_{x,a} + \hat{d}_{x,b} - 2\tilde{d}_u} e^{i(\lambda_j - \pi)(\hat{d}_{x,a} - \hat{d}_{x,b})/2} \tilde{I}_{ab}(\tilde{d}_u, \lambda_j)\right) \to_p G_{ab}$$

as $T \to \infty$. The proof of this statement is omitted since it follows as in Propositions 2 and 3 of Robinson and Yajima (2002) by noting that $\tilde{\beta}_{a,m_3}(\hat{d}_u) - \beta_a = O_P(m_3^{-1/2} \lambda_{m_3}^{d_{x,a} - d_u})$.⁴
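For a single regressor, the correction in (3.5) reduces to rescaling the bias estimate from bandwidth $m_2$ to bandwidth $m_3$ and subtracting it from the NBLS estimate. A minimal Python sketch of that special case follows; it assumes $\lambda_m = 2\pi m/T$ and takes the NBLS estimate, the bias estimate and the memory estimates as given inputs. The function name and argument names are hypothetical.

```python
import math

def fmnbls_scalar(beta_hat_m3, gamma_m2, d_x_hat, d_u_hat, m2, m3, T):
    """Scalar version of (3.5): NBLS estimate minus the rescaled bias estimate."""
    lam_m2 = 2 * math.pi * m2 / T  # Fourier frequency at bandwidth m2
    lam_m3 = 2 * math.pi * m3 / T  # Fourier frequency at bandwidth m3
    # lambda_{m3}^{-d_u} * Lambda_{m3} * lambda_{m2}^{d_u} * Lambda_{m2}^{-1} for p - 1 = 1
    scale = lam_m3 ** (d_x_hat - d_u_hat) * lam_m2 ** (d_u_hat - d_x_hat)
    return beta_hat_m3 - scale * gamma_m2
```

In this scalar case the scale factor equals $(m_3/m_2)^{\hat{d}_x - \hat{d}_u}$, so with $m_3 < m_2$ and $\hat{d}_x > \hat{d}_u$ the bias estimate is shrunk before it is subtracted.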

4. SIMULATION EVIDENCE

In this section, we investigate the finite sample behaviour of the GLS-version of the FMNBLS estimator $\tilde{\beta}_{m_3}(\hat{d}_u)$ introduced in Theorem 3.3 and compare with the performance of the NBLS estimator $\hat{\beta}_{m_3}(0)$ and the MLW estimator of Robinson (2008).⁵ We consider the following three two-dimensional generating mechanisms for $x_t$ and $u_t$ in the cointegrating relation (1.2):

Model A: $x_t = (1 - L)^{-d_x} \varepsilon_{1t}$, $u_t = (1 - L)^{-d_u} \varepsilon_{2t}$,

Model B: $x_t = (1 - L)^{-d_x} v_{1t}$, $u_t = (1 - L)^{-d_u} \varepsilon_{2t}$, $v_{1t} = a_1 v_{1,t-1} + \varepsilon_{1t}$,

Model C: $x_t = (1 - L)^{-d_x} \varepsilon_{1t}$, $u_t = (1 - L)^{-d_u} v_{2t}$, $v_{2t} = a_2 v_{2,t-1} + \varepsilon_{2t}$,

⁴ Also note that, as in Theorem 2.2, local Whittle estimation of the integration order of the errors based on FMNBLS residuals is consistent and, if $m_0 = m_3$, then $\tilde{d}_u - d_u = O_P(m_0^{-1/2} (\log m_1) (m_0/m_1)^{\delta_{\min}})$, which converges faster than when based on NBLS residuals.

⁵ We thank the editor and an anonymous referee for suggesting the comparison with the MLW estimator.



where $\varepsilon_t = [\varepsilon_{1t}, \varepsilon_{2t}]'$ is independently and identically $N(0, \Sigma)$ distributed with

$$\Sigma = \begin{bmatrix} \xi & \rho \xi^{1/2} \\ \rho \xi^{1/2} & 1 \end{bmatrix}.$$

Thus, $\xi = \mathrm{var}(\varepsilon_{1t})/\mathrm{var}(\varepsilon_{2t})$ is the signal-to-noise ratio and $\rho = \mathrm{corr}(\varepsilon_{1t}, \varepsilon_{2t})$ is the contemporaneous correlation between the innovations $\varepsilon_{1t}$ and $\varepsilon_{2t}$. Based on the pair $(x_t, u_t)$ we generate $y_t$ from (1.2) with $\alpha = 0$ and $\beta = 1$. For all the simulations we generate the data with $(d_x, d_u) = (0.4, 0)$, which is close to what is expected in many practical situations concerning, for example, financial volatility series. This choice is made to facilitate comparison with the MLW estimator, and is also supported by the empirical applications below where we find estimates very close to these values in almost all cases. Unreported simulations reveal that the bias in NBLS is more severe when the integration orders are closer, for example, $(d_x, d_u) = (0.3, 0.1)$, which also reduces the effectiveness of the bias correction procedure. However, the bias reduction in FMNBLS relative to NBLS remains noteworthy in that case, and for larger sample sizes the bias reduction works as well as with $(d_x, d_u) = (0.4, 0)$.

Models A, B and C satisfy all the assumptions of the model, and are increasing in complexity with Model A having no short-run dynamics, whereas Models B and C include short-run dynamics. Model B adds short-run dynamics to the regressor and thus disturbs the signal due to the contamination of the low frequencies of $x_t$ from the higher frequencies, which are dominated by the short-run dynamics. In Model C short-run dynamics is present in $u_t$ instead of $x_t$. Note that

$$G = \begin{bmatrix} \xi (1 - a_1)^{-2} & \rho \xi^{1/2} (1 - a_1)^{-1} (1 - a_2)^{-1} \\ \rho \xi^{1/2} (1 - a_1)^{-1} (1 - a_2)^{-1} & (1 - a_2)^{-2} \end{bmatrix} \tag{4.1}$$

such that when $\rho \neq 0$ the $G$ matrix is not diagonal and the distribution theory for NBLS from Christensen and Nielsen (2006) no longer applies; see Theorem 2.1. However, the NBLS estimator is still consistent when $\rho \neq 0$.
On the other hand, FMNBLS should be able to handle the presence of the long-run endogeneity that is due to $\rho \neq 0$, as shown in Theorems 3.2 and 3.3.

We also consider the three-dimensional generating mechanism

Model D: $x_{1t} = (1 - L)^{-d_{x,1}} \varepsilon_{1t}$, $x_{2t} = (1 - L)^{-d_{x,2}} \varepsilon_{2t}$, $u_t = (1 - L)^{-d_u} \varepsilon_{3t}$,

where $y_t$ is generated by (1.2) with $\alpha = 0$ and $\beta = [1, 0]'$, $d_u = 0$ and $\varepsilon_t = [\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t}]'$ is independently and identically $N(0, \Sigma)$ distributed with

$$\Sigma = \begin{bmatrix} 2 & 0 & -0.75 \\ 0 & 2 & -0.75 \\ -0.75 & -0.75 & 1 \end{bmatrix}.$$

Note that in Model D the cointegrating regression (1.2) is $y_t = x_{1t} + u_t$; that is, $x_{1t} \in I(d_{x,1})$, $x_{2t} \in I(d_{x,2})$, $u_t \in I(0)$ and $y_t \in I(d_{x,1})$. Hence, this is a three-dimensional model where the integration orders of the regressors are not necessarily the same, but all assumptions are satisfied because one of the regressors is not included in the DGP for $y_t$; in particular, there is no cointegration among the regressors. The model illustrates a situation where one of the included regressor variables is in fact not part of the cointegrating regression, and demonstrates how the estimation of the associated coefficient (with true value equal to zero) depends on the parameters of the model, in particular on the memory parameters, bandwidth parameters and sample size.

For each model we use 10,000 replications for sample sizes $T = 128$ and $512$, which are close to what is found in practical applications (see also the following section), although many applications in finance will have much larger sample sizes. The bandwidth parameters chosen for the simulation study are $m_i = \lfloor T^{\psi_i} \rfloor$, $i = 0, 1, 2, 3$, where $\psi_0 \in \{0.4, 0.5\}$, $\psi_1 \in \{0.6, 0.7\}$, $\psi_2 = 0.8$, $\psi_3 = \psi_0$ and $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.

Tables 1–3 present the Monte Carlo bias and root mean squared error (RMSE) results for Models A–C. As expected from (2.12) and (4.1), we find that changing the sign of the contemporaneous correlation $\rho$ only causes the bias to change sign but does not change the size of the bias or RMSE, so we only report results for $\rho \le 0$. For comparison, we also report the corresponding results for the MLW estimator of Robinson (2008) with bandwidth $m_1$ and using as starting values the NBLS, $d_x$, and $d_u$ estimates also applied in FMNBLS; see Robinson (2008, Remark 3). For the MLW estimator the phase parameter is set as $(d_x - d_u)\pi/2$ (that is, fractional integration), and the MLW objective function is optimized over the three-dimensional parameter $(\beta, d_x, d_u)$ by the BFGS algorithm and terminated when the convergence criterion $\varepsilon = 10^{-6}$ is satisfied or after 100 iterations.⁶

Table 1 presents the results for Model A. A general finding is that increasing the signal-to-noise ratio $\xi$ from 1 to 2 reduces the bias of NBLS and also improves the bias-reducing ability of the FMNBLS procedure. This is due to the fact that the contemporaneous covariance between $\varepsilon_{1t}$ and $\varepsilon_{2t}$ is reduced when $\xi$ increases from 1 to 2.
Furthermore, estimating $K(\gamma)^{-1} H(\gamma)$ in (2.12) when it is in fact zero because $\rho = 0$ inflates the variance (and hence the RMSE) of FMNBLS relative to that of NBLS, but the fully modified procedure still yields unbiased estimates of $\beta$. For $\rho = -0.75$ (and $\rho = 0.75$), the FMNBLS procedure bias-corrects NBLS although this comes at the expense of an increase in the finite sample standard error of up to 50%. However, the RMSE of FMNBLS in that case is (much) lower than that of NBLS. For the larger sample size, $T = 512$, FMNBLS yields almost unbiased estimates for all bandwidths with RMSEs much smaller than those of NBLS, except when $\rho = 0$. Even though the bias of NBLS increases (and becomes fairly large) for larger $m_0$, the fully modified procedure is still able to correct this, and indeed the bias of FMNBLS is smaller when $m_0$ ($= m_3$) is larger. Since there is no short-run dynamics, the choice of $m_1$ appears less important. The MLW estimator performs quite poorly compared to both NBLS and FMNBLS, especially for $T = 128$. Interestingly, the sign of the bias of MLW is opposite that of NBLS.

Table 2 presents the simulation results for Model B with autoregressive coefficients $a_1 = -1/2$ or $a_1 = 1/2$.⁷ Now, (2.1) is a worse approximation to (2.11) when moving only a short distance away from the origin, due to the contamination from higher frequencies (short-run dynamics), and we therefore expect the bias of NBLS (and possibly also of FMNBLS) to be larger than for the case of no short-run dynamics. Interestingly, for Model B it appears that the biases and RMSEs of NBLS and FMNBLS are lower than for Model A when $a_1 = 1/2$ and higher than for Model A when $a_1 = -1/2$. In Model B, the MLW estimator is equal to or better than FMNBLS in terms of RMSE in some cases with $m_1 = T^{0.7}$. In general, though,

⁶ In the case of non-convergence after 100 iterations, the replication in question was not included in the calculation of bias and RMSE for the MLW estimator. Increasing the number of iterations required before termination of the numerical optimization substantially worsens the results for the MLW estimator.

⁷ For Models B and C we report the simulation results for $\xi = 2$ only. The results for $\xi = 1$ are qualitatively similar; see also Table 1.


Table 1. Simulation results for Model A.

Layout: Panel A ($T = 128$) and Panel B ($T = 512$); rows indexed by $\xi \in \{1, 2\}$ and the bandwidths $m_0 \in \{T^{0.4}, T^{0.5}\}$, $m_1 \in \{T^{0.6}, T^{0.7}\}$; columns report Bias and RMSE for NBLS, FMNBLS and MLW, for $\rho = -0.75$ and $\rho = 0$.

Column entries, $\rho = -0.75$:
NBLS Bias: −0.144, −0.120, −0.203, −0.169, −0.209, −0.175, −0.296, −0.248
NBLS RMSE: 0.147, 0.125, 0.208, 0.176, 0.218, 0.188, 0.309, 0.267
FMNBLS Bias: 0.002, −0.000, −0.016, −0.017, −0.024, 0.003, 0.000, −0.022, −0.040, −0.021, −0.023, −0.032, −0.037, −0.053, −0.056, −0.030
FMNBLS RMSE: 0.047, 0.047, 0.051, 0.051, 0.071, 0.066, 0.066, 0.072, 0.111, 0.103, 0.102, 0.145, 0.114, 0.160, 0.156, 0.146
MLW Bias: 0.036, 0.030, 0.036, 0.030, 0.045, 0.058, 0.045, 0.059, 0.137, 0.141, 0.136, 0.193, 0.157, 0.212, 0.195, 0.185
MLW RMSE: 0.148, 0.086, 0.149, 0.086, 0.137, 0.231, 0.124, 0.258, 0.587, 0.727, 0.579, 0.810, 0.692, 0.943, 0.772, 1.118

Column entries, $\rho = 0$:
NBLS Bias: −0.000, −0.000, 0.000, 0.000, 0.001, 0.001, 0.001, −0.000
NBLS RMSE: 0.040, 0.047, 0.057, 0.064, 0.081, 0.097, 0.114, 0.136
FMNBLS Bias: −0.001, −0.001, −0.000, −0.000, 0.001, 0.001, 0.001, 0.000, 0.001, 0.001, 0.001, 0.002, 0.002, −0.001, −0.000, 0.003
FMNBLS RMSE: 0.063, 0.063, 0.067, 0.066, 0.091, 0.090, 0.089, 0.092, 0.147, 0.139, 0.136, 0.193, 0.151, 0.211, 0.205, 0.198
MLW Bias: −0.001, −0.001, −0.001, −0.001, 0.001, −0.002, 0.001, 0.000, 0.000, −0.002, 0.004, 0.001, −0.001, 0.004, −0.006, 0.007
MLW RMSE: 0.103, 0.051, 0.101, 0.051, 0.072, 0.154, 0.072, 0.208, 0.384, 0.514, 0.527, 0.349, 0.715, 0.927, 0.576, 0.925

Note: The simulations are based on 10,000 replications under the empirically relevant scenario $(d_x, d_u) = (0.4, 0)$, with bandwidths $m_2 = T^{0.8}$ and $m_3 = m_0$.


Table 2. Simulation results for Model B.

Layout: Panel A ($T = 128$) and Panel B ($T = 512$); rows indexed by $a_1 \in \{-1/2, 1/2\}$ and the bandwidths $m_0 \in \{T^{0.4}, T^{0.5}\}$, $m_1 \in \{T^{0.6}, T^{0.7}\}$; columns report Bias and RMSE for NBLS, FMNBLS and MLW, for $\rho = -0.75$ and $\rho = 0$.

Column entries, $\rho = -0.75$:
NBLS Bias: −0.066, −0.057, −0.221, −0.182, −0.089, −0.078, −0.328, −0.271
NBLS RMSE: 0.068, 0.059, 0.227, 0.190, 0.095, 0.086, 0.342, 0.291
FMNBLS Bias: 0.003, −0.006, −0.030, −0.016, −0.023, −0.034, −0.035, −0.039, −0.027, 0.037, −0.188, −0.178, −0.017, −0.134, −0.124, 0.024
FMNBLS RMSE: 0.024, 0.023, 0.078, 0.028, 0.032, 0.080, 0.079, 0.083, 0.054, 0.054, 0.066, 0.246, 0.241, 0.053, 0.211, 0.204
MLW Bias: −0.005, −0.019, 0.086, −0.005, −0.019, 0.086, 0.092, 0.090, −0.027, −0.027, −0.006, 0.243, 0.333, −0.007, 0.297, 0.342
MLW RMSE: 0.032, 0.028, 0.235, 0.032, 0.028, 0.235, 0.376, 0.354, 0.058, 0.058, 0.127, 1.203*, 1.546*, 0.144, 1.345*, 1.528*

Column entries, $\rho = 0$:
NBLS Bias: −0.000, −0.000, 0.000, 0.000, 0.000, 0.000, 0.001, −0.000
NBLS RMSE: 0.020, 0.023, 0.060, 0.068, 0.043, 0.050, 0.120, 0.143
FMNBLS Bias: −0.000, −0.000, 0.001, −0.000, −0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.001, 0.002, 0.002, 0.001, −0.001, −0.000
FMNBLS RMSE: 0.032, 0.030, 0.095, 0.033, 0.030, 0.099, 0.094, 0.098, 0.066, 0.065, 0.072, 0.205, 0.206, 0.071, 0.225, 0.226
MLW Bias: 0.000, −0.000, 0.001, 0.000, −0.000, 0.001, −0.001, −0.001, −0.000, −0.000, 0.000, 0.019, 0.008, 0.002, 0.008, 0.001
MLW RMSE: 0.030, 0.021, 0.080, 0.030, 0.021, 0.080, 0.206, 0.224, 0.050, 0.050, 0.145, 1.131, 0.855, 0.181, 1.066, 0.765

Note: The simulations are based on 10,000 replications under the empirically relevant scenario $(d_x, d_u) = (0.4, 0)$, with bandwidths $m_2 = T^{0.8}$ and $m_3 = m_0$. An asterisk indicates that MLW did not converge for 5–10% of the replications.

96 M. Ø. Nielsen and P. Frederiksen


Table 3. Simulation results for Model C.

Layout: Panel A ($T = 128$) and Panel B ($T = 512$); rows indexed by $a_2 \in \{-1/2, 1/2\}$ and the bandwidths $m_0 \in \{T^{0.4}, T^{0.5}\}$, $m_1 \in \{T^{0.6}, T^{0.7}\}$; columns report Bias and RMSE for NBLS, FMNBLS and MLW, for $\rho = -0.75$ and $\rho = 0$.

Column entries, $\rho = -0.75$:
NBLS Bias: −0.301, −0.249, −0.093, −0.079, −0.430, −0.369, −0.133, −0.113
NBLS RMSE: 0.308, 0.258, 0.096, 0.082, 0.446, 0.394, 0.139, 0.122
FMNBLS Bias: −0.177, −0.183, 0.005, −0.123, −0.108, −0.012, 0.007, −0.009, −0.289, −0.421, 0.036, 0.032, −0.293, −0.003, −0.010, −0.443
FMNBLS RMSE: 0.204, 0.212, 0.032, 0.158, 0.150, 0.034, 0.032, 0.034, 0.503, 0.393, 0.477, 0.070, 0.074, 0.384, 0.072, 0.069
MLW Bias: 0.231, 0.360, 0.003, 0.239, 0.389, 0.003, 0.016, 0.016, 0.093, 0.005, 0.076, 0.047, 0.012, 0.198, 0.055, 0.012
MLW RMSE: 0.735, 1.056, 0.040, 0.794, 1.282, 0.040, 0.085, 0.085, 1.717**, 1.911**, 1.865**, 0.337, 0.154, 1.928**, 0.349, 0.145

Column entries, $\rho = 0$:
NBLS Bias: −0.001, −0.000, 0.000, 0.000, 0.001, 0.002, 0.000, −0.000
NBLS RMSE: 0.080, 0.093, 0.027, 0.030, 0.152, 0.187, 0.055, 0.064
FMNBLS Bias: −0.001, −0.002, 0.000, −0.000, 0.000, 0.000, 0.000, 0.000, −0.001, 0.000, 0.001, 0.001, 0.001, 0.002, −0.000, −0.000
FMNBLS RMSE: 0.128, 0.139, 0.042, 0.139, 0.152, 0.042, 0.043, 0.044, 0.302, 0.366, 0.281, 0.097, 0.091, 0.339, 0.100, 0.093
MLW Bias: −0.004, −0.005, 0.000, −0.005, 0.002, 0.000, −0.000, −0.001, −0.001, −0.001, −0.005, 0.001, 0.002, −0.048, 0.005, 0.001
MLW RMSE: 0.279, 0.502, 0.032, 0.376, 0.528, 0.032, 0.064, 0.109, 1.958*, 1.951*, 1.539, 0.325, 0.142, 2.479*, 0.451, 0.106

Note: The simulations are based on 10,000 replications under the empirically relevant scenario $(d_x, d_u) = (0.4, 0)$, with bandwidths $m_2 = T^{0.8}$ and $m_3 = m_0$. One and two asterisks indicate that MLW did not converge for 5–10% of the replications and 10–25% of the replications, respectively.



Table 4. Simulation results for Model D.

Layout: Panel A ($T = 128$) and Panel B ($T = 512$); rows indexed by $(d_{x,1}, d_{x,2}) \in \{(0.25, 0.40), (0.40, 0.25), (0.40, 0.40)\}$ and the bandwidths $m_0 \in \{T^{0.4}, T^{0.5}\}$, $m_1 \in \{T^{0.6}, T^{0.7}\}$; columns report Bias1, Bias2, RMSE1 and RMSE2 for NBLS and FMNBLS.

Column entries, Panel A: $T = 128$:
NBLS Bias1: −0.151, −0.127, −0.147, −0.124, −0.233, −0.207
NBLS Bias2: −0.150, −0.127, −0.235, −0.211, −0.149, −0.125
NBLS RMSE1: 0.166, 0.152, 0.160, 0.145, 0.248, 0.235
NBLS RMSE2: 0.166, 0.152, 0.250, 0.238, 0.162, 0.146
FMNBLS Bias1: −0.037, −0.021, −0.022, −0.017, −0.036, −0.033, −0.032, −0.016, −0.126, −0.127, −0.118, −0.118
FMNBLS Bias2: −0.036, −0.020, −0.021, −0.132, −0.034, −0.126, −0.125, −0.132, −0.020, −0.021, −0.034, −0.034
FMNBLS RMSE1: 0.124, 0.114, 0.112, 0.103, 0.127, 0.116, 0.113, 0.105, 0.199, 0.197, 0.220, 0.216
FMNBLS RMSE2: 0.125, 0.115, 0.113, 0.201, 0.126, 0.221, 0.217, 0.204, 0.105, 0.104, 0.117, 0.115

Column entries, Panel B: $T = 512$:
NBLS Bias1: −0.103, −0.087, −0.103, −0.086, −0.187, −0.165
NBLS Bias2: −0.103, −0.086, −0.187, −0.165, −0.102, −0.085
NBLS RMSE1: 0.109, 0.095, 0.108, 0.093, 0.193, 0.176
NBLS RMSE2: 0.109, 0.095, 0.193, 0.176, 0.107, 0.092
FMNBLS Bias1: −0.002, −0.003, −0.016, −0.016, −0.015, −0.002, −0.003, −0.088, −0.015, −0.081, −0.082, −0.087
FMNBLS Bias2: −0.002, −0.003, −0.015, −0.015, −0.081, −0.087, −0.087, −0.002, −0.081, −0.015, −0.015, −0.001
FMNBLS RMSE1: 0.052, 0.051, 0.057, 0.056, 0.052, 0.048, 0.048, 0.118, 0.053, 0.122, 0.121, 0.118
FMNBLS RMSE2: 0.052, 0.052, 0.057, 0.056, 0.122, 0.119, 0.118, 0.047, 0.123, 0.051, 0.050, 0.047

Note: The simulations are based on 10,000 replications with $d_3 = 0$ and bandwidths $m_2 = T^{0.8}$ and $m_3 = m_0$.




it does not perform as well as FMNBLS, and in some cases it even has convergence problems, marked by asterisks in the table.

Next, we turn to Model C. Table 3 presents the simulation results, which are quite different for $a_2 = -1/2$ and $a_2 = 1/2$. Compared to the results of Model A, the NBLS estimator is actually less biased in this setup when $a_2 = -1/2$. This suggests that negative autocorrelation in $u_t$ offsets some of the bias in the NBLS estimator introduced by contemporaneous covariance between $x_t$ and $u_t$; see (4.1). Consequently, the FMNBLS procedure works very well and generally yields almost unbiased estimates and also large reductions in RMSEs when $a_2 = -1/2$. When $a_2 = 1/2$, Model C results in extremely large biases for NBLS. For the small sample size the NBLS biases when $\rho = -0.75$ range from 0.37 to 0.43 in absolute value, and for the large sample size the biases are still about two-thirds of the bias for the smaller sample size. For the small sample size, this yields an imprecise estimate of $K(\gamma)^{-1} H(\gamma)$, and as a result FMNBLS is still biased, although the fully modified procedure generally still manages to reduce the bias quite considerably and has a smaller RMSE than NBLS. For $T = 512$, FMNBLS has low bias and the RMSE is again (much) smaller than that of NBLS. The performance of MLW is similar to that in Table 2 with convergence problems for the small sample size when $a_2 = 1/2$, and performance equal to or better than that of FMNBLS only when $a_2 = -1/2$ and at the same time $m_1 = T^{0.7}$ and $T = 512$.

Finally, in Table 4 we turn to Model D with two regressors with memory parameters $(d_{x,1}, d_{x,2})$. In this case, as in Model A, the bandwidth $m_1$ has no significant effect since there is no short-run dynamics. Increasing the bandwidth $m_0$ appears to worsen the results for NBLS but improve those for FMNBLS, both in terms of bias and RMSE. The most interesting aspect of Model D is the comparison across different values of $(d_{x,1}, d_{x,2})$. In this respect we find for both NBLS and FMNBLS that bias and RMSE are higher for the coefficient on the variable with the lowest memory parameter. This finding is in line with theory and with unreported simulations of Models A–C with $(d_x, d_u) = (0.3, 0.1)$. The results appear symmetric with respect to the variable that is included in the cointegrating regression ($x_{1t}$) and that which is excluded ($x_{2t}$).

Overall, the simulations clearly demonstrate the superiority (in terms of both bias and RMSE) of the fully modified estimator relative to NBLS in the presence of non-zero long-run coherence between the regressor and the error. In all models, the bias-reduction of FMNBLS relative to NBLS is considerable, and for the larger sample size the bias practically disappears. The cost of this bias correction is an increase in the finite sample standard deviation of approximately 30–50% for the models considered here. However, the results indicate that this is more than offset by the large bias reduction when $\rho \neq 0$, thus yielding reductions in the RMSE. The simulations also suggest that the GLS-version of the FMNBLS estimator is superior to the MLW estimator in many circumstances. This could possibly be due to the extra flexibility of the FMNBLS estimator from using separate bandwidths for estimation of the cointegration vector, the integration orders and the asymptotic bias term.
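The Monte Carlo design of this section can be sketched in Python for Model A and the NBLS estimator with $\gamma = 0$. This is an illustrative sketch, not the authors' simulation code: it assumes the fractional filter $(1 - L)^{-d}$ is applied with pre-sample innovations set to zero (a truncated filter), uses far fewer replications than the 10,000 in the text, and all function names are hypothetical.

```python
import math
import numpy as np

def frac_integrate(eps, d):
    """Apply (1 - L)^(-d) to eps, with pre-sample values set to zero (assumption)."""
    T = len(eps)
    psi = np.ones(T)
    for k in range(1, T):
        psi[k] = psi[k - 1] * (k - 1 + d) / k  # binomial expansion weights
    return np.convolve(eps, psi)[:T]

def simulate_model_a(T, d_x, d_u, xi, rho, rng, beta=1.0):
    """Draw (x_t, y_t) from Model A with alpha = 0."""
    cov = rho * math.sqrt(xi)
    sigma = np.array([[xi, cov], [cov, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), sigma, size=T)
    x = frac_integrate(eps[:, 0], d_x)
    u = frac_integrate(eps[:, 1], d_u)
    return x, beta * x + u

def nbls(x, y, m):
    """NBLS slope with gamma = 0: ratio of real co-periodogram sums over j = 1..m."""
    wx, wy = np.fft.fft(x), np.fft.fft(y)
    num = np.real(wx[1:m + 1] * np.conj(wy[1:m + 1])).sum()
    den = np.real(wx[1:m + 1] * np.conj(wx[1:m + 1])).sum()
    return num / den

def bias_rmse(estimates, true_value):
    """Monte Carlo bias and root mean squared error of a set of estimates."""
    err = np.asarray(estimates, dtype=float) - true_value
    return err.mean(), math.sqrt((err ** 2).mean())

def monte_carlo_nbls(T=128, psi0=0.4, reps=200, rho=-0.75, xi=1.0, seed=0):
    """Bias and RMSE of NBLS in Model A with (d_x, d_u) = (0.4, 0) and beta = 1."""
    rng = np.random.default_rng(seed)
    m0 = math.floor(T ** psi0)  # bandwidth m_0 = floor(T^psi_0)
    est = [nbls(*simulate_model_a(T, 0.4, 0.0, xi, rho, rng), m0)
           for _ in range(reps)]
    return bias_rmse(est, 1.0)
```

Calling `monte_carlo_nbls()` returns a `(bias, rmse)` pair for one design point; the numbers will not reproduce the tables exactly because of the truncated filter, the smaller number of replications and the random seed.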

5. EMPIRICAL ILLUSTRATIONS

We apply NBLS and FMNBLS to three different empirically relevant examples.⁸

⁸ Henry and Zaffaroni (2003) survey empirical applications of fractional integration and long memory in macroeconomics and financial economics.




5.1. The implied-realized volatility relation Recent contributions by, for example, Comte and Renault (1998), Bandi and Perron (2006), Christensen and Nielsen (2006) and Berger et al. (2009) including empirical evidence, have pointed towards viewing the predictive regression between implied volatility (IV) and realized volatility (RV) as one of stationary fractional cointegration. However, the possible existence of a volatility risk premium that is correlated with IV can bias the NBLS estimator in a regression of RV on IV, which ultimately can lead to a wrongful rejection of the long-run unbiasedness hypothesis; see Bandi and Perron (2006). Furthermore, the existence of an unobserved risk premium can imply a negative intercept in the regression, and thus long-run unbiasedness is typically upheld if the cointegrating coefficient is β = 1 regardless of the presence of the intercept. We sample S&P500 index options (SPX) data from the Berkeley options database covering the period January 1988 to December 1995 and calculate T = 412 weekly Black–Scholes implied volatilities and the corresponding S&P500 realized volatilities; see Christensen and Nielsen (2006) for details. In particular, Christensen and Nielsen (2006) find that the logvolatilities are stationary, with insignificantly different long memory estimates, and that NBLS regression yields a cointegrating coefficient β ranging from 0.84 to 0.89 for different bandwidth choices. Panel A of Table 5 shows the memory estimates for the two log-volatility series. As found by Christensen and Nielsen (2006), the series are stationary (d < 1/2) and exhibit long memory. In Panel B of the table, we show estimates (with asymptotic standard errors in parentheses) of the (stationary) fractional cointegration relation between the two log-volatility series, IV and RV, for a variety of bandwidth parameters: m0 = m3 ∈ {T 0.4 , T 0.5 }, m1 ∈ {T 0.6 , T 0.7 , T 0.8 } and m2 = T 0.8 . 
The NBLS estimates are of course in line with the results of Christensen and Nielsen (2006), with the parameter of interest, $\beta$, estimated to be 0.81−0.84. For $m_0 = T^{0.5}$, $m_1 = T^{0.8}$ it is significantly less than unity when applying the asymptotic distribution theory in Theorem 2.1. Note that in two cases $\hat{d}_x + \hat{d}_{\hat{u}} \geq 1/2$, so that Theorem 2.1 does not apply to the NBLS estimator and the asymptotic standard error is denoted by (−). The FMNBLS procedure corrects for the possible correlation between the regressor and the error term; those estimates are displayed in the final columns. We obtain point estimates of $\beta$ that are now above unity, but insignificantly different from unity except when $m_1 = T^{0.6}$. Thus, our estimates generally tend to support the long-run unbiasedness hypothesis, $\beta = 1$. Finally, we notice that with $m_1 = T^{0.8}$ both the NBLS and FMNBLS estimates support an $I(d) - I(0)$ relation with $d$ around 0.35−0.4, although, cf. Theorem 2.2, the usual asymptotic distribution may not apply for $\hat{d}_{\hat{u}}$ and $\tilde{d}_u$ ($\hat{d}_{\hat{u}}$ denotes the estimate based on NBLS residuals $\hat{u}_t$ and $\tilde{d}_u$ denotes the estimate based on FMNBLS residuals $\tilde{u}_t$).

5.2. Inflation rate harmonization in the European Union

We also examine consumer price indices of France and Spain. Methods for calculating the consumer price index vary across countries, which makes international comparison more difficult. We therefore use the harmonized index of consumer prices (HICP), developed within the European Union on the basis of a coordinated methodology. Since the differentials between the inflation rates of individual member countries of the European Union are constrained, we expect that there exists a stable relationship between the inflation rates. Furthermore, based on evidence of long memory in inflation rates in Doornik and



Table 5. Implied–realized volatility application.

Panel A: Long memory estimates, $\hat{d}$

| Bandwidth | Realized volatility, $y_t = \log \sigma_{RV,t}$ | Implied volatility, $x_t = \log \sigma_{IV,t}$ |
| --- | --- | --- |
| $m_1 = T^{0.6}$ | 0.4476 (0.0822) | 0.4527 (0.0822) |
| $m_1 = T^{0.7}$ | 0.4162 (0.0606) | 0.3503 (0.0606) |
| $m_1 = T^{0.8}$ | 0.4180 (0.0449) | 0.2801 (0.0449) |

Panel B: Cointegration analysis

| Bandwidths | NBLS $\hat{\alpha}_{m_3}(0)$ | NBLS $\hat{\beta}_{m_3}(0)$ | $\hat{d}_{\hat{u}}$ | FMNBLS $\tilde{\alpha}_{m_3}(\hat{d}_{\hat{u}})$ | FMNBLS $\tilde{\beta}_{m_3}(\hat{d}_{\hat{u}})$ | $\tilde{d}_u$ |
| --- | --- | --- | --- | --- | --- | --- |
| $m_0 = T^{0.4}$, $m_1 = T^{0.6}$ | −0.9403 | 0.8364 (−) | 0.1046 (0.0822) | −0.0390 | 1.2792 (0.1305) | 0.1778 (0.0822) |
| $m_0 = T^{0.4}$, $m_1 = T^{0.7}$ | −0.9403 | 0.8364 (0.1325) | 0.0987 (0.0606) | 0.0289 | 1.3126 (0.1746) | 0.1341 (0.0606) |
| $m_0 = T^{0.4}$, $m_1 = T^{0.8}$ | −0.9403 | 0.8364 (0.1227) | 0.0718 (0.0449) | −0.2893 | 1.1562 (0.1554) | 0.0616 (0.0449) |
| $m_0 = T^{0.5}$, $m_1 = T^{0.6}$ | −0.9990 | 0.8076 (−) | 0.1145 (0.0822) | −0.2018 | 1.1992 (0.0976) | 0.1525 (0.0822) |
| $m_0 = T^{0.5}$, $m_1 = T^{0.7}$ | −0.9990 | 0.8076 (0.1242) | 0.1079 (0.0606) | −0.1359 | 1.2316 (0.1317) | 0.1181 (0.0606) |
| $m_0 = T^{0.5}$, $m_1 = T^{0.8}$ | −0.9990 | 0.8076 (0.1044) | 0.0805 (0.0449) | −0.3470 | 1.1279 (0.1264) | 0.0582 (0.0449) |

Note: Panel A reports local Whittle estimates of the fractional integration orders as described in Robinson (1995a). Numbers in parentheses are asymptotic standard errors using $\sqrt{m_1}(\hat{d} - d) \to_d N(0, 1/4)$. Panel B reports NBLS and FMNBLS estimates with $m_2 = T^{0.8}$ and $m_3 = m_0$. The asymptotic standard errors for the NBLS and FMNBLS estimates are based on (2.12) and (3.8), respectively. Standard errors for $\hat{d}_{\hat{u}}$ and $\tilde{d}_u$ are based on the same asymptotic distribution as $\hat{d}$, and should be used with caution; see Theorem 2.2.

Ooms (2004) we expect that relationship to be one of stationary fractional cointegration. We calculate $T = 159$ monthly inflation rates based on the HICP of France and Spain. These data were obtained from Eurostat and cover the period January 1992 to April 2005.

Panel A of Table 6 shows that the memory estimates decrease as the bandwidth increases. This may be due to an added noise perturbation or, more likely, to the distinct seasonal patterns in inflation series, possibly reflecting seasonal long memory; see Doornik and Ooms (2004). Instead of filtering this out by ad hoc procedures, we focus on the results for the lowest bandwidth, $m_1 = T^{0.5}$, which should be less sensitive to contamination from higher (e.g. seasonal) frequencies. For this bandwidth, the memory estimates for both inflation rates imply that the series are stationary.

Panel B of Table 6 again supports the notion of $I(d) - I(0)$ cointegration with $d$ around 0.35. Here, the FMNBLS estimates are much higher than the NBLS estimates. In particular, the FMNBLS estimates of the cointegration coefficient are significantly higher than unity at the 1% level in both cases, implying that the long-run rate of inflation in Spain is higher than that in France (by over 80% according to the point estimates). In addition, the estimates of $d$ for the residuals are lower for FMNBLS than for NBLS, although all appear insignificantly different from zero (again, the usual asymptotic distribution may not apply; see Theorem 2.2).
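The significance statement for the FMNBLS slope can be retraced with a simple normal-approximation calculation. The sketch below takes the FMNBLS point estimates and asymptotic standard errors reported in Panel B of Table 6 and forms the statistic $(\tilde{\beta} - 1)/\mathrm{se}$. It assumes that a standard normal reference distribution is appropriate, and the helper names are hypothetical; it is an illustration of the arithmetic, not the authors' code.

```python
import math


def z_statistic(estimate, std_error, null_value=1.0):
    """Standardized distance of an estimate from the hypothesized value."""
    return (estimate - null_value) / std_error


def upper_tail_p_value(z):
    """P(Z > z) for a standard normal Z, computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# FMNBLS slope estimates and standard errors from Panel B of Table 6,
# keyed by the m0 bandwidth exponent (m1 = T^0.5 in both cases).
fmnbls = {0.3: (1.8619, 0.2797), 0.4: (1.8272, 0.2470)}

z_values = {a: z_statistic(b, se) for a, (b, se) in fmnbls.items()}
p_values = {a: upper_tail_p_value(z) for a, z in z_values.items()}

for a in fmnbls:
    print(f"m0 = T^{a}: z = {z_values[a]:.2f}, one-sided p = {p_values[a]:.4f}")
```

Both statistics exceed 3, so the hypothesis $\beta = 1$ is rejected at the 1% level whether a one-sided or a two-sided normal critical value is used.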


Table 6. Inflation rate harmonization application.

Panel A: Long memory estimates, $\hat{d}$

| Bandwidth | Spain, $y_t = \pi_{S,t}$ | France, $x_t = \pi_{F,t}$ |
| --- | --- | --- |
| $m_1 = T^{0.5}$ | 0.4007 (0.1443) | 0.3048 (0.1443) |
| $m_1 = T^{0.6}$ | 0.0990 (0.1118) | −0.0690 (0.1118) |
| $m_1 = T^{0.7}$ | −0.1847 (0.0857) | −0.1377 (0.0857) |

Panel B: Cointegration analysis

| Bandwidths | NBLS $\hat{\alpha}_{m_3}(0)$ | NBLS $\hat{\beta}_{m_3}(0)$ | $\hat{d}_{\hat{u}}$ | FMNBLS $\tilde{\alpha}_{m_3}(\hat{d}_{\hat{u}})$ | FMNBLS $\tilde{\beta}_{m_3}(\hat{d}_{\hat{u}})$ | $\tilde{d}_u$ |
| --- | --- | --- | --- | --- | --- | --- |
| $m_0 = T^{0.3}$, $m_1 = T^{0.5}$ | 0.0011 | 1.1395 (0.3139) | 0.0852 (0.1443) | 0.0001 | 1.8619 (0.2797) | 0.0100 (0.1443) |
| $m_0 = T^{0.4}$, $m_1 = T^{0.5}$ | 0.0012 | 1.0577 (0.2965) | 0.1048 (0.1443) | 0.0011 | 1.8272 (0.2470) | 0.0099 (0.1443) |

Note: Panel A reports local Whittle estimates of the fractional integration orders as described in Robinson (1995a). Numbers in parentheses are asymptotic standard errors using $\sqrt{m_1}(\hat{d} - d) \to_d N(0, 1/4)$. Panel B reports NBLS and FMNBLS estimates with $m_2 = T^{0.8}$ and $m_3 = m_0$. The asymptotic standard errors for the NBLS and FMNBLS estimates are based on (2.12) and (3.8), respectively. Standard errors for $\hat{d}_{\hat{u}}$ and $\tilde{d}_u$ are based on the same asymptotic distribution as $\hat{d}$, and should be used with caution; see Theorem 2.2.
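The table notes refer to local Whittle estimates of the memory parameter. As a rough illustration of what such an estimate involves, the sketch below minimizes the objective $R(d) = \log\bigl(m^{-1}\sum_{j=1}^{m} \lambda_j^{2d} I(\lambda_j)\bigr) - 2d\, m^{-1}\sum_{j=1}^{m}\log\lambda_j$ over a grid. The grid search, the grid limits, the function name and the simulated white-noise input are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def local_whittle(x, m, grid=None):
    """Grid-search local Whittle estimate of d from the first m Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    T = x.size
    lam = 2.0 * np.pi * np.arange(1, m + 1) / T      # Fourier frequencies
    w = np.fft.fft(x)[1:m + 1]                        # DFT at lambda_1, ..., lambda_m
    periodogram = np.abs(w) ** 2 / (2.0 * np.pi * T)
    if grid is None:
        grid = np.linspace(-0.49, 0.99, 149)          # assumed search interval
    mean_log_lam = np.mean(np.log(lam))
    objective = [
        np.log(np.mean(lam ** (2.0 * d) * periodogram)) - 2.0 * d * mean_log_lam
        for d in grid
    ]
    return float(grid[int(np.argmin(objective))])


# Illustration on simulated white noise, for which the true memory parameter is 0.
rng = np.random.default_rng(0)
T = 4096
m = int(T ** 0.6)
d_hat = local_whittle(rng.standard_normal(T), m)
se = 1.0 / (2.0 * np.sqrt(m))
print(f"m = {m}, d_hat = {d_hat:.2f}, asymptotic se = {se:.4f}")
```

A grid search is used here only to keep the sketch dependency-free; a numerical optimizer over the same objective would serve the same purpose.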

5.3. Realized volatility relations

Finally, we analyse the relation between the realized volatility of the General Electric (GE) stock and those of the Dow Jones Industrial Average (DJIA) and NASDAQ 100 indices; that is, there are three variables in this application. The realized volatilities are monthly and are constructed from daily returns calculated as the difference between the log-open and log-close prices. The sample covers January 1990 to December 2008; that is, $T = 228$.

Panel A of Table 7 shows that the memory estimates of the three realized volatilities are very similar and stable across bandwidths with point estimates around 0.4, except for the middle bandwidth where point estimates are higher and suggest non-stationarity. A test of the hypothesis that all memory parameters are equal, see Robinson and Yajima (2002, section 3), is insignificant at conventional levels for all bandwidth choices in the table.

In Panel B of Table 7, we present cointegration rank statistics from Robinson and Yajima (2002) using bandwidth $m_0$ for rank statistics and $m_1$ to estimate memory parameters. In the remainder of the table we ignore bandwidth $m_1 = T^{0.7}$ to be able to apply their results. In particular, using the notation of Robinson and Yajima (2002, section 3), Panel B presents the eigenvalues of the correlation-type matrix $P$, and the value of the model determination function $L(u)$ using $v(T) = m_0^{-0.4}$. The rank can be determined by $\arg\min L(u)$, which suggests that the rank is one. Thus, we conclude that a regression approach is appropriate in this multivariate system.

In Panel C, we report estimates of the stationary fractional cointegration relation between the realized volatilities of GE and the DJIA and NASDAQ indices. Clearly, the volatility of GE should be related to the broader market volatility, so it seems reasonable to assume that the volatility of GE enters in the cointegrating regression with a non-zero coefficient. Moreover, we are interested in analysing how the volatility of GE depends on the volatilities of the two indices,



Table 7. Realized volatility relations application.

Panel A: Long memory estimates, $\hat{d}$

| Bandwidth | GE, $y_t = \sigma^2_{GE,t}$ | Dow Jones, $x_{1t} = \sigma^2_{DJ,t}$ | NASDAQ, $x_{2t} = \sigma^2_{ND,t}$ |
| --- | --- | --- | --- |
| $m_1 = T^{0.6}$ | 0.4350 (0.1000) | 0.3526 (0.1000) | 0.4383 (0.1000) |
| $m_1 = T^{0.7}$ | 0.5041 (0.0762) | 0.4080 (0.0762) | 0.5980 (0.0762) |
| $m_1 = T^{0.8}$ | 0.3958 (0.0585) | 0.4277 (0.0585) | 0.4026 (0.0585) |

Panel B: Cointegration rank analysis

| Bandwidths | Eigenvalue 1 of $P$ | Eigenvalue 2 of $P$ | Eigenvalue 3 of $P$ | $L(u)$, $u = 0$ | $L(u)$, $u = 1$ | $L(u)$, $u = 2$ |
| --- | --- | --- | --- | --- | --- | --- |
| $m_0 = T^{0.4}$, $m_1 = T^{0.6}$ | 2.3523 | 0.5889 | 0.0588 | −1.6942 | −2.0706 | −1.9170 |
| $m_0 = T^{0.4}$, $m_1 = T^{0.8}$ | 2.3522 | 0.5889 | 0.0588 | −1.6942 | −2.0706 | −1.9170 |
| $m_0 = T^{0.5}$, $m_1 = T^{0.6}$ | 2.4420 | 0.4764 | 0.0816 | −1.9561 | −2.2224 | −2.0940 |
| $m_0 = T^{0.5}$, $m_1 = T^{0.8}$ | 2.4419 | 0.4764 | 0.0816 | −1.9561 | −2.2224 | −2.0940 |

Panel C: Cointegration regression analysis

| Bandwidths | NBLS $\hat{\alpha}_{m_3}(0)$ | NBLS $\hat{\beta}_{m_3}(0)$ on $x_{1t}$ | NBLS $\hat{\beta}_{m_3}(0)$ on $x_{2t}$ | $\hat{d}_{\hat{u}}$ | FMNBLS $\tilde{\alpha}_{m_3}(\hat{d}_{\hat{u}})$ | FMNBLS $\tilde{\beta}_{m_3}(\hat{d}_{\hat{u}})$ on $x_{1t}$ | FMNBLS $\tilde{\beta}_{m_3}(\hat{d}_{\hat{u}})$ on $x_{2t}$ | $\tilde{d}_u$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $m_0 = T^{0.4}$, $m_1 = T^{0.6}$ | 0.0001 | 1.6478 (0.1321) | 0.1825 (0.0211) | 0.0192 (0.1000) | −0.0003 | 1.8591 (0.1327) | 0.1828 (0.0182) | −0.0004 (0.1000) |
| $m_0 = T^{0.4}$, $m_1 = T^{0.8}$ | 0.0001 | 1.6478 (0.1318) | 0.1825 (0.0332) | 0.0409 (0.0585) | −0.0002 | 1.8349 (0.1281) | 0.1837 (0.0348) | 0.0561 (0.0585) |
| $m_0 = T^{0.5}$, $m_1 = T^{0.6}$ | 0.0003 | 1.4828 (0.1203) | 0.2061 (0.0257) | 0.0362 (0.1000) | −0.0001 | 1.6043 (0.1134) | 0.2311 (0.0166) | −0.0023 (0.1000) |
| $m_0 = T^{0.5}$, $m_1 = T^{0.8}$ | 0.0003 | 1.4828 (0.1188) | 0.2061 (0.0300) | 0.0402 (0.0585) | −0.0001 | 1.5824 (0.1069) | 0.2344 (0.0296) | 0.0442 (0.0585) |

Note: Panel A reports local Whittle estimates of the fractional integration orders as described in Robinson (1995a). Numbers in parentheses are asymptotic standard errors using $\sqrt{m_1}(\hat{d} - d) \to_d N(0, 1/4)$. Panel B reports rank statistics from Robinson and Yajima (2002) and Panel C reports NBLS and FMNBLS estimates with $m_2 = T^{0.8}$ and $m_3 = m_0$. The asymptotic standard errors for the NBLS and FMNBLS estimates are based on (2.12) and (3.8), respectively. Standard errors for $\hat{d}_{\hat{u}}$ and $\tilde{d}_u$ are based on the same asymptotic distribution as $\hat{d}$, and should be used with caution; see Theorem 2.2.

so we choose $y_t$ to be the realized volatility of GE. From the results, it appears that the NBLS estimator underestimates the slope coefficient on DJIA ($x_{1t}$) in the cointegrating relation. Both the NBLS and the FMNBLS results indicate that the volatility of GE most strongly follows that of the DJIA.
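The rank conclusion drawn from Panel B of Table 7 can be retraced mechanically: for each bandwidth pair, the selected value is the $u$ that minimizes the reported $L(u)$. The sketch below transcribes the $L(u)$ values from the table; the dictionary layout and the helper name are hypothetical and serve only as an illustration of the $\arg\min$ rule described in the text.

```python
# L(u) for u = 0, 1, 2, transcribed from Panel B of Table 7.
# Keys are (m0 exponent, m1 exponent).
model_determination = {
    (0.4, 0.6): (-1.6942, -2.0706, -1.9170),
    (0.4, 0.8): (-1.6942, -2.0706, -1.9170),
    (0.5, 0.6): (-1.9561, -2.2224, -2.0940),
    (0.5, 0.8): (-1.9561, -2.2224, -2.0940),
}


def arg_min(values):
    """Index of the smallest entry in a sequence."""
    return min(range(len(values)), key=lambda u: values[u])


selected = {bw: arg_min(values) for bw, values in model_determination.items()}

for (a0, a1), u in selected.items():
    print(f"m0 = T^{a0}, m1 = T^{a1}: arg min L(u) = {u}")
```

The minimizer equals one for every bandwidth combination in the table, which is the basis for the single-equation regression approach used in Panel C.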

6. CONCLUDING REMARKS

We have considered estimation of the cointegration vector under weak fractional cointegration. A special case is the stationary fractional cointegration model which has found important



application recently, especially in financial economics. Previous research has considered Robinson’s (1994) semi-parametric frequency domain narrow-band least squares (NBLS) estimator, for which a condition of non-coherence between regressors and errors at the zero frequency has sometimes been imposed; see, for example, Christensen and Nielsen (2006). We have shown that in the absence of such a condition, NBLS suffers from asymptotic bias, although it remains consistent as proven by Robinson (1994). We also showed that the bias can be consistently estimated, and consequently we introduced a fully modified NBLS (FMNBLS) estimator which eliminates the bias but has the same asymptotic variance as NBLS. Indeed, FMNBLS enjoys a faster rate of convergence than NBLS in general.

We also conducted a simulation study of the proposed FMNBLS estimator, which clearly demonstrated the superiority with respect to bias of the fully modified estimator relative to NBLS in the presence of non-zero long-run coherence between regressors and errors. Although this comes at the cost of increased finite sample variance, FMNBLS is superior in terms of RMSE in simulations with long-run coherence between regressors and errors. The simulations also indicate that the bias correction method works well in the presence of short-run dynamics in regressors and errors. The empirical relevance of our methodology was demonstrated through a series of brief empirical illustrations, all of which support the notion of a stationary fractional cointegration relation.

ACKNOWLEDGMENTS

We are grateful to the editor Pierre Perron, two anonymous referees, Javier Hualde, Lealand Morin, Benoit Perron, Katsumi Shimotsu and participants at the 2006 Summer Econometrics Workshop in Aarhus and the 2006 European Meeting of the Econometric Society in Vienna for useful and constructive comments, and to the Social Sciences and Humanities Research Council of Canada (SSHRC grant no. 410-2009-0183), the Danish Social Sciences Research Council (FSE grant no. 275-05-0220) and the Centre for Research in Econometric Analysis of Time Series (CREATES, funded by the Danish National Research Foundation) for financial support. A previous version of this paper was circulated under the title ‘Fully modified narrow-band least squares estimation of stationary fractional cointegration’.

REFERENCES

Andrews, D. W. K. and Y. Sun (2004). Adaptive local polynomial Whittle estimation of long-range dependence. Econometrica 72, 569–614.
Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics. Journal of Econometrics 73, 5–59.
Bandi, F. M. and B. Perron (2006). Long memory and the relation between implied and realized volatility. Journal of Financial Econometrics 4, 636–70.
Berger, D., A. Chaboud and E. Hjalmarsson (2009). What drives volatility persistence in the foreign exchange market? Journal of Financial Economics 94, 192–213.
Brown, B. M. (1971). Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.
Chen, W. W. and C. M. Hurvich (2003). Estimating fractional cointegration in the presence of polynomial trends. Journal of Econometrics 117, 95–121.




Christensen, B. J. and M. Ø. Nielsen (2006). Asymptotic normality of narrow-band least squares in the stationary fractional cointegration model and volatility forecasting. Journal of Econometrics 133, 343–71.
Comte, F. and E. Renault (1998). Long-memory in continuous-time stochastic volatility models. Mathematical Finance 8, 291–323.
Davidson, J. (2004). Convergence to stochastic integrals with fractionally integrated integrator processes: theory and applications to fractional cointegration. Working paper, University of Exeter.
Dolado, J. J. and F. Marmol (1996). Efficient estimation of cointegrating relationships among higher order and fractionally integrated processes. Working Paper 9617, Banco de España.
Doornik, J. A. and M. Ooms (2004). Inference and forecasting for ARFIMA models with an application to US and UK inflation. Studies in Nonlinear Dynamics and Econometrics 8, Issue 2, Article 14.
Fox, R. and M. S. Taqqu (1986). Large-sample properties of parameter estimates for strongly dependent stationary Gaussian series. Annals of Statistics 14, 517–32.
Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics 16, 121–30.
Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and its Application. New York: Academic Press.
Hannan, E. J. (1979). The central limit theorem for time series regression. Stochastic Processes and their Applications 9, 281–89.
Henry, M. and P. Zaffaroni (2003). The long range dependence paradigm for macroeconomics and finance. In P. Doukhan, G. Oppenheim and M. S. Taqqu (Eds.), Theory and Applications of Long-Range Dependence, 417–438. Boston: Birkhäuser.
Hualde, J. and P. M. Robinson (2010). Semiparametric inference in multivariate fractionally cointegrated systems. Journal of Econometrics 157, 492–511.
Hurvich, C. M., R. S. Deo and J. Brodsky (1998). The mean squared error of Geweke and Porter-Hudak’s estimator of the memory parameter of a long memory time series. Journal of Time Series Analysis 19, 19–46.
Kim, C. S. and P. C. B. Phillips (2001). Fully modified estimation of fractional cointegration models. Working paper, Yale University.
Künsch, H. R. (1987). Statistical aspects of self-similar processes. In Y. Prokhorov and V. V. Sazanov (Eds.), Proceedings of the First World Congress of the Bernoulli Society, 67–74. Utrecht: VNU Science Press.
Lobato, I. N. (1999). A semiparametric two-step estimator in a multivariate long memory model. Journal of Econometrics 90, 129–53.
Lobato, I. N. and P. M. Robinson (1996). Averaged periodogram estimation of long memory. Journal of Econometrics 73, 303–24.
Lobato, I. N. and P. M. Robinson (1998). A nonparametric test for I(0). Review of Economic Studies 65, 475–95.
Marinucci, D. and P. M. Robinson (2001a). Finite-sample improvements in statistical inference with I(1) processes. Journal of Applied Econometrics 16, 431–44.
Marinucci, D. and P. M. Robinson (2001b). Semiparametric fractional cointegration analysis. Journal of Econometrics 105, 225–47.
Nielsen, M. Ø. (2005). Semiparametric estimation in time-series regression with long-range dependence. Journal of Time Series Analysis 26, 279–304.
Phillips, P. C. B. (1991). Spectral regression for cointegrated time series. In W. A. Barnett, J. Powell and G. E. Tauchen (Eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, 413–35. Cambridge: Cambridge University Press.



Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental variables regression with I(1) variables. Review of Economic Studies 57, 99–125.
Robinson, P. M. (1994). Semiparametric analysis of long-memory time series. Annals of Statistics 22, 515–39.
Robinson, P. M. (1995a). Gaussian semiparametric estimation of long range dependence. Annals of Statistics 23, 1630–61.
Robinson, P. M. (1995b). Log-periodogram regression of time series with long range dependence. Annals of Statistics 23, 1048–72.
Robinson, P. M. (1997). Large-sample inference for nonparametric regression with dependent errors. Annals of Statistics 25, 2054–83.
Robinson, P. M. (2003). Long-memory time series. In P. M. Robinson (Ed.), Time Series with Long Memory, 4–32. Oxford: Oxford University Press.
Robinson, P. M. (2005). The distance between rival nonstationary fractional processes. Journal of Econometrics 128, 283–300.
Robinson, P. M. (2008). Multiple local Whittle estimation in stationary systems. Annals of Statistics 36, 2508–30.
Robinson, P. M. and M. Henry (1999). Long and short memory conditional heteroscedasticity in estimating the memory parameter of levels. Econometric Theory 15, 299–336.
Robinson, P. M. and F. J. Hidalgo (1997). Time series regression with long-range dependence. Annals of Statistics 25, 77–104.
Robinson, P. M. and D. Marinucci (2003). Semiparametric frequency domain analysis of fractional cointegration. In P. M. Robinson (Ed.), Time Series with Long Memory, 334–73. Oxford: Oxford University Press.
Robinson, P. M. and Y. Yajima (2002). Determination of cointegrating rank in fractional systems. Journal of Econometrics 106, 217–41.
Shimotsu, K. (2007). Gaussian semiparametric estimation of multivariate fractionally integrated processes. Journal of Econometrics 137, 277–310.
Shimotsu, K. and P. C. B. Phillips (2005). Exact local Whittle estimation of fractional integration. Annals of Statistics 33, 1890–933.
Velasco, C. (2003). Gaussian semi-parametric estimation of fractional cointegration. Journal of Time Series Analysis 24, 345–78.

APPENDIX A: PROOF OF THEOREMS

Proof of Theorem 2.1: First write $\sqrt{m_0}\,\lambda_{m_0}^{d_u}\Lambda_{m_0}^{-1}(\hat{\beta}_{m_0}(\gamma)-\beta)$ as
\[
\Bigl(\Lambda_{m_0}\lambda_{m_0}^{-1-2\gamma}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{xx}(\gamma,\lambda_j))\Lambda_{m_0}\Bigr)^{-1}\sqrt{m_0}\,\lambda_{m_0}^{d_u-1-2\gamma}\Lambda_{m_0}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{xu}(\gamma,\lambda_j)).
\]
Let $I_{ab}(\lambda_j)$ denote the $(a,b)$th element of $I_{ww}(0,\lambda_j)$, the periodogram matrix of $w_t=(\Delta^{\gamma}x_t',\Delta^{\gamma}u_t)'$. Then the $(a,b)$th element of $\Lambda_{m_0}\lambda_{m_0}^{-1-2\gamma}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{xx}(\gamma,\lambda_j))\Lambda_{m_0}$ is $\lambda_{m_0}^{d_a+d_b-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{ab}(\lambda_j))$ and converges in probability to $K_{ab}(\gamma)$ by Lemma B1(c). Note that $G$, and thus the leading $(p-1)\times(p-1)$ submatrix of $G$ and therefore $K(\gamma)$, is invertible by Assumption 2.1.
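The decomposition at the start of the proof rests on the NBLS estimator being a ratio of averaged (cross-)periodograms over the first $m_0$ Fourier frequencies. The following sketch shows that computation for the case $\gamma = 0$, where the estimator is $\hat{F}_{xx}^{-1}\hat{F}_{xy}$ with $\hat{F}$ the real part of the summed cross-periodogram. The function name `nbls` and the simulated AR(1) design are assumptions chosen for illustration; normalizing constants are omitted because they cancel in the ratio.

```python
import numpy as np


def nbls(y, x, m):
    """Narrow-band least squares slope(s) using Fourier frequencies j = 1, ..., m.

    Frequency zero is excluded, so an intercept does not affect the estimate.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    wy = np.fft.fft(y)[1:m + 1]             # DFT of y at lambda_1, ..., lambda_m
    wx = np.fft.fft(x, axis=0)[1:m + 1]     # DFTs of each regressor, shape (m, k)
    f_xx = np.real(wx.T @ np.conj(wx))      # summed Re I_xx, shape (k, k)
    f_xy = np.real(wx.T @ np.conj(wy))      # summed Re I_xy, shape (k,)
    return np.linalg.solve(f_xx, f_xy)


# Simulated illustration: persistent regressor, small noise, true slope 2.
rng = np.random.default_rng(1)
T = 4096
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + e[t]
y = 0.5 + 2.0 * x + 0.1 * rng.standard_normal(T)

beta_hat = nbls(y, x, m=int(T ** 0.5))
print(f"NBLS slope estimate: {beta_hat[0]:.3f}")
```

In this design the regressor and the error are independent, so no bias correction is needed; the fully modified estimator discussed in the paper targets the case where they are coherent at frequency zero.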


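For the second term we show that
\[
\sqrt{m_0}\Bigl(\Lambda_{m_0}\lambda_{m_0}^{d_u-1-2\gamma}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{xu}(\gamma,\lambda_j))-H(\gamma)\Bigr)\xrightarrow{D}N(0,J(\gamma)).
\]
By the Cramér–Wold device, for any $(p-1)$-vector $\eta$, we need to examine
\begin{align}
\eta'\sqrt{m_0}\bigl(\lambda_{m_0}^{d_u-1-2\gamma}\Lambda_{m_0}\hat{F}_{xu}(\gamma,1,m_0)-H(\gamma)\bigr)
&=\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\Bigl(\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(I_{ap}(\lambda_j))-H_a(\gamma)\Bigr)\notag\\
&=\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\,\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}\bigl(I_{ap}(\lambda_j)-A_a(\lambda_j)I_{\varepsilon\varepsilon}(0,\lambda_j)A_p^{*}(\lambda_j)\bigr)\tag{A.1}\\
&\quad+\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\,\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}\bigl(A_a(\lambda_j)I_{\varepsilon\varepsilon}(0,\lambda_j)A_p^{*}(\lambda_j)-f_{ap}(\lambda_j)\bigr)\tag{A.2}\\
&\quad+\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\Bigl(\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}(f_{ap}(\lambda_j))-H_a(\gamma)\Bigr),\tag{A.3}
\end{align}
where $I_{\varepsilon\varepsilon}(0,\lambda_j)$ is the periodogram matrix of $\varepsilon_t$ from Assumption 2.3.

By Lemma B1(a) it follows that (A.1) is $O_P(m_0^{-1/6}(\log m_0)^{2/3}+m_0^{-1/2}(\log m_0)+T^{-1/4})$, and by Lemma B1(b) that (A.3) is $O(m_0^{\min(1,\phi)+1/2}T^{-\min(1,\phi)})$. Thus, both are $o_P(1)$ by Assumption 2.5. Equation (A.2) is
\begin{align}
&\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\,\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}\Bigl(A_a(\lambda_j)\frac{1}{2\pi}\Bigl(T^{-1}\sum_{t=1}^{T}\varepsilon_t\varepsilon_t'-I_p\Bigr)A_p^{*}(\lambda_j)\Bigr)\tag{A.4}\\
&\quad+\sum_{a=1}^{p-1}\eta_a\sqrt{m_0}\,\lambda_{m_0}^{d_a+d_p-1}\frac{2\pi}{T}\sum_{j=1}^{m_0}\mathrm{Re}\Bigl(A_a(\lambda_j)\frac{1}{2\pi T}\sum_{t=1}^{T}\sum_{s\neq t}\varepsilon_t\varepsilon_s'e^{-i(t-s)\lambda_j}A_p^{*}(\lambda_j)\Bigr).\tag{A.5}
\end{align}
Note that $D=T^{-1}\sum_{t=1}^{T}\varepsilon_t\varepsilon_t'-I_p$ satisfies $\|D\|=O_P(T^{-1/2})$ since $\varepsilon_t\varepsilon_t'-I_p$ is a martingale difference sequence with finite second moments. Then, by the Cauchy–Schwarz inequality,
\begin{align*}
\text{(A.4)}&=O_P\Bigl(\max_{1\le a\le p-1}\lambda_{m_0}^{d_a+d_p}m_0^{-1/2}\|D\|\Bigl(\sum_{j=1}^{m_0}\|A_a(\lambda_j)\|^2\Bigr)^{1/2}\Bigl(\sum_{j=1}^{m_0}\|A_p(\lambda_j)\|^2\Bigr)^{1/2}\Bigr)\\
&=O_P\Bigl(\max_{1\le a\le p-1}\lambda_{m_0}^{d_a+d_p}m_0^{-1/2}T^{-1/2}\Bigl(\sum_{j=1}^{m_0}f_{aa}(\lambda_j)\Bigr)^{1/2}\Bigl(\sum_{j=1}^{m_0}f_{pp}(\lambda_j)\Bigr)^{1/2}\Bigr),
\end{align*}
where the second equality follows since $\|A_a(\lambda)\|=O(\sqrt{f_{aa}(\lambda)})$. Thus, (A.4) is $O_P(\lambda_{m_0}^{1/2})$.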


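Next, the term inside the parenthesis in (A.5) can be rewritten as
\begin{align*}
&\frac{1}{2\pi T}A_a(\lambda_j)\Bigl(\sum_{t=2}^{T}\sum_{s=1}^{t-1}\varepsilon_t\varepsilon_s'e^{-i(t-s)\lambda_j}+\sum_{t=1}^{T-1}\sum_{s=t+1}^{T}\varepsilon_t\varepsilon_s'e^{-i(t-s)\lambda_j}\Bigr)A_p^{*}(\lambda_j)\\
&\quad=\frac{1}{2\pi T}A_a(\lambda_j)\sum_{t=2}^{T}\sum_{s=1}^{t-1}\bigl(\varepsilon_t\varepsilon_s'e^{-i(t-s)\lambda_j}+\varepsilon_s\varepsilon_t'e^{i(t-s)\lambda_j}\bigr)A_p^{*}(\lambda_j),
\end{align*}
so that
\begin{align*}
\text{(A.5)}&=\sum_{a=1}^{p-1}\eta_a\frac{\sqrt{m_0}}{T^2}\lambda_{m_0}^{d_a+d_p-1}\sum_{j=1}^{m_0}\sum_{t=2}^{T}\varepsilon_t'\sum_{s=1}^{t-1}\mathrm{Re}\bigl(A_a'(\lambda_j)\bar{A}_p(\lambda_j)e^{-i(t-s)\lambda_j}+A_p^{*}(\lambda_j)A_a(\lambda_j)e^{i(t-s)\lambda_j}\bigr)\varepsilon_s\\
&=\sum_{t=2}^{T}\varepsilon_t'\sum_{s=1}^{t-1}c_{t-s}\varepsilon_s,
\end{align*}
where
\begin{align*}
c_{t-s}&=\frac{1}{2\pi T\sqrt{m_0}}\sum_{j=1}^{m_0}\theta_j,\\
\theta_j&=\mathrm{Re}\Bigl(\sum_{a=1}^{p-1}\eta_a\lambda_{m_0}^{d_a+d_p}A_a'(\lambda_j)\bar{A}_p(\lambda_j)e^{-i(t-s)\lambda_j}\Bigr)+\mathrm{Re}\Bigl(\sum_{a=1}^{p-1}\eta_a\lambda_{m_0}^{d_a+d_p}A_p^{*}(\lambda_j)A_a(\lambda_j)e^{i(t-s)\lambda_j}\Bigr)\\
&=\mathrm{Re}\bigl(\omega_je^{-i(t-s)\lambda_j}+\omega_j'e^{i(t-s)\lambda_j}\bigr)\\
&=(\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')\cos((t-s)\lambda_j)+(\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')\sin((t-s)\lambda_j),
\end{align*}
and we have defined $\omega_j=\sum_{a=1}^{p-1}\eta_a\lambda_{m_0}^{d_a+d_p}A_a'(\lambda_j)\bar{A}_p(\lambda_j)$. By defining the triangular array (subscript $T$ is omitted for brevity) $z_1=0$ and $z_t=\varepsilon_t'\sum_{s=1}^{t-1}c_{t-s}\varepsilon_s$, $t=2,\ldots,T$, we can apply the martingale difference central limit theorem of Brown (1971) and Hall and Heyde (1980, ch. 3.2) if
\begin{align}
\sum_{t=1}^{T}E(z_t^2\mid\mathcal{F}_{t-1})-\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_bJ_{ab}(\gamma)&\xrightarrow{P}0,\tag{A.6}\\
\sum_{t=1}^{T}E(z_t^4)&\to0,\tag{A.7}
\end{align}
since $z_t$ is a martingale difference array with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{Z}}$, $\mathcal{F}_t=\sigma(\{\varepsilon_s,s\le t\})$.

We first show (A.6). The first term on the left-hand side is
\begin{equation}
\sum_{t=2}^{T}E\Bigl(\sum_{s=1}^{t-1}\sum_{r=1}^{t-1}\varepsilon_s'c_{t-s}'\varepsilon_t\varepsilon_t'c_{t-r}\varepsilon_r\Bigm|\mathcal{F}_{t-1}\Bigr)=\sum_{t=2}^{T}\sum_{s=1}^{t-1}\varepsilon_s'c_{t-s}'c_{t-s}\varepsilon_s+\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{r\neq s}\varepsilon_s'c_{t-s}'c_{t-r}\varepsilon_r.\tag{A.8}
\end{equation}
By slight modification of Lemma 4 of Nielsen (2005) the second term on the right-hand side of (A.8) is $o_P(1)$. Following the method of Robinson (1995a), we need to show that the mean of the first term on the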



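right-hand side of (A.8) is asymptotically equal to $\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_bJ_{ab}(\gamma)$. Thus,
\begin{align}
\sum_{t=2}^{T}\sum_{s=1}^{t-1}E\,\mathrm{tr}(c_{t-s}'c_{t-s}\varepsilon_s\varepsilon_s')&=\sum_{t=2}^{T}\sum_{s=1}^{t-1}\mathrm{tr}(c_{t-s}'c_{t-s})\notag\\
&=\frac{1}{4\pi^2T^2m_0}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{j=1}^{m_0}\mathrm{tr}(\theta_j'\theta_j)\tag{A.9}\\
&\quad+\frac{1}{4\pi^2T^2m_0}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{j=1}^{m_0}\sum_{k\neq j}\mathrm{tr}(\theta_j'\theta_k).\tag{A.10}
\end{align}
Note that, from standard trigonometric identities, see also Lemma 3 of Shimotsu (2007),
\begin{align*}
\sum_{t=1}^{T-1}\sum_{s=1}^{T-t}\cos(s\lambda_j)\cos(s\lambda_k)&=O(T),&&j\neq k,\\
\sum_{t=1}^{T-1}\sum_{s=1}^{T-t}\sin(s\lambda_j)\sin(s\lambda_k)&=O(T),&&j\neq k,\\
\sum_{t=1}^{T-1}\sum_{s=1}^{T-t}\cos(s\lambda_j)\sin(s\lambda_k)&=O(T^2(j+k)^{-1}+T^2|j-k|^{-1}),&&j\neq k,\\
\sum_{t=1}^{T-1}\sum_{s=1}^{T-t}\cos^2(s\lambda_j)&=\frac{T^2}{4}+o(T^2),&&j=1,\ldots,m,\\
\sum_{t=1}^{T-1}\sum_{s=1}^{T-t}\sin^2(s\lambda_j)&=\frac{T^2}{4}+o(T^2),&&j=1,\ldots,m.
\end{align*}
It is thus easily seen that (A.10) is of smaller order than (A.9), so we focus on (A.9) for which
\begin{align*}
\mathrm{tr}(\theta_j'\theta_j)&=\mathrm{tr}\bigl((\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')'(\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')\bigr)\cos^2((t-s)\lambda_j)\\
&\quad+\mathrm{tr}\bigl((\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')'(\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')\bigr)\sin^2((t-s)\lambda_j)\\
&\quad+\mathrm{tr}\bigl((\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')'(\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')\bigr)\cos((t-s)\lambda_j)\sin((t-s)\lambda_j)\\
&\quad+\mathrm{tr}\bigl((\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')'(\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')\bigr)\cos((t-s)\lambda_j)\sin((t-s)\lambda_j).
\end{align*}
The last two terms cancel and the sum of the first two terms can be written as
\begin{align*}
\mathrm{tr}(\theta_j'\theta_j)&=\mathrm{tr}\bigl((\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')^2\bigr)\cos^2((t-s)\lambda_j)-\mathrm{tr}\bigl((\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')^2\bigr)\sin^2((t-s)\lambda_j)\\
&=\mathrm{tr}\bigl((\mathrm{Re}\,\omega_j+\mathrm{Re}\,\omega_j')^2-(\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')^2\bigr)\cos^2((t-s)\lambda_j)\\
&\quad-\mathrm{tr}\bigl((\mathrm{Im}\,\omega_j-\mathrm{Im}\,\omega_j')^2\bigr)\bigl(\sin^2((t-s)\lambda_j)-\cos^2((t-s)\lambda_j)\bigr),
\end{align*}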



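where the second term is of smaller order by the trigonometric relations above. Using that $2\mathrm{Re}(X)=X+\bar{X}$ and $2i\,\mathrm{Im}(X)=X-\bar{X}$ for any complex matrix $X$, the first term is
\begin{align*}
&\mathrm{tr}\bigl(2^{-2}(\omega_j+\bar{\omega}_j+\omega_j'+\omega_j^{*})^2-(2i)^{-2}(\omega_j-\bar{\omega}_j-\omega_j'+\omega_j^{*})^2\bigr)\cos^2((t-s)\lambda_j)\\
&\quad=\frac{1}{2}\mathrm{tr}\bigl(\omega_j^2+\omega_j^{*2}+\omega_j\omega_j^{*}+\omega_j^{*}\omega_j+\bar{\omega}_j^2+\omega_j'^2+\bar{\omega}_j\omega_j'+\omega_j'\bar{\omega}_j\bigr)\cos^2((t-s)\lambda_j)\\
&\quad=\mathrm{tr}\bigl(\omega_j^2+\omega_j^{*2}+\omega_j\omega_j^{*}+\omega_j^{*}\omega_j\bigr)\cos^2((t-s)\lambda_j).
\end{align*}
Hence, we have found that (A.9) is asymptotically negligibly different from
\begin{align*}
&\frac{1}{4\pi^2T^2m_0}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{j=1}^{m_0}\mathrm{tr}\bigl(\omega_j^2+\omega_j^{*2}+\omega_j\omega_j^{*}+\omega_j^{*}\omega_j\bigr)\cos^2((t-s)\lambda_j)\\
&\quad=\frac{1}{4\pi^2T^2m_0}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\eta_a\eta_b\lambda_{m_0}^{d_a+d_b+2d_p}\sum_{j=1}^{m_0}4\pi^2\cos^2((t-s)\lambda_j)\\
&\qquad\times\bigl(f_{pa}(\lambda_j)f_{pb}(\lambda_j)+f_{ap}(\lambda_j)f_{bp}(\lambda_j)+f_{pp}(\lambda_j)f_{ba}(\lambda_j)+f_{pp}(\lambda_j)f_{ab}(\lambda_j)\bigr)\\
&\quad=\frac{1}{4m_0}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_b\lambda_{m_0}^{d_a+d_b+2d_p}\\
&\qquad\times\sum_{j=1}^{m_0}\bigl(f_{pa}(\lambda_j)f_{pb}(\lambda_j)+f_{ap}(\lambda_j)f_{bp}(\lambda_j)+f_{pp}(\lambda_j)f_{ba}(\lambda_j)+f_{pp}(\lambda_j)f_{ab}(\lambda_j)\bigr),
\end{align*}
where the equalities follow from (2.11) and the trigonometric identities earlier. Approximating the sum over $j$ by an integral, applying Assumptions 2.1, 2.2, and that $\cos(x)=(e^{ix}+e^{-ix})/2$, this equals
\begin{align*}
&\frac{1}{4}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_b\frac{G_{ap}G_{bp}}{1-d_a-d_b-2d_p}\bigl(e^{i\pi(d_a+d_b-2d_p)/2}+e^{-i\pi(d_a+d_b-2d_p)/2}\bigr)\\
&\qquad+\frac{1}{4}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_b\frac{G_{ab}G_{pp}}{1-d_a-d_b-2d_p}\bigl(e^{i\pi(d_a-d_b)/2}+e^{-i\pi(d_a-d_b)/2}\bigr)+o(1)\\
&\quad=\frac{1}{2}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_b\frac{G_{ap}G_{bp}}{1-d_a-d_b-2d_p}\cos\bigl(\pi(d_a+d_b-2d_p)/2\bigr)\\
&\qquad+\frac{1}{2}\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_b\frac{G_{ab}G_{pp}}{1-d_a-d_b-2d_p}\cos\bigl(\pi(d_a-d_b)/2\bigr)+o(1)\\
&\quad=\sum_{a=1}^{p-1}\sum_{b=1}^{p-1}\eta_a\eta_bJ_{ab}(\gamma)+o(1).
\end{align*}
Finally, to show (A.7),
\begin{align*}
\sum_{t=1}^{T}E(z_t^4)&=E\Bigl(\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{r=1}^{t-1}\sum_{p=1}^{t-1}\sum_{q=1}^{t-1}\varepsilon_s'c_{t-s}'\varepsilon_t\varepsilon_t'c_{t-r}\varepsilon_r\varepsilon_p'c_{t-p}'\varepsilon_t\varepsilon_t'c_{t-q}\varepsilon_q\Bigr)\\
&\le C\Bigl(\sum_{t=2}^{T}\sum_{s=1}^{t-1}\mathrm{tr}(c_{t-s}'c_{t-s}c_{t-s}'c_{t-s})+\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{r=1}^{t-1}\mathrm{tr}(c_{t-s}'c_{t-r}c_{t-r}'c_{t-s})\Bigr)
\end{align*}
for some constant $C>0$ by Assumption 2.3. Using the arguments of Lemma 4 of Nielsen (2005), this expression can be bounded by $O(T(\sum_{t=1}^{T}\|c_t^2\|)^2)=O(T^{-1})$, which completes the proof.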
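The trigonometric sums used in the proof of Theorem 2.1 are easy to check numerically. The sketch below evaluates the double sums $\sum_{t=1}^{T-1}\sum_{s=1}^{T-t} g(s)$ by rewriting them as $\sum_{s=1}^{T-1}(T-s)g(s)$, since each lag $s$ appears for $t = 1, \ldots, T-s$. The sample size $T = 512$, the frequency indices 3 and 7, and the function names are arbitrary choices made for illustration.

```python
import numpy as np


def double_sum(T, g):
    """sum_{t=1}^{T-1} sum_{s=1}^{T-t} g(s), computed as sum_{s=1}^{T-1} (T - s) g(s)."""
    s = np.arange(1, T)
    return float(np.sum((T - s) * g(s)))


T = 512


def lam(j):
    """Fourier frequency 2*pi*j/T."""
    return 2.0 * np.pi * j / T


cos2 = double_sum(T, lambda s: np.cos(s * lam(3)) ** 2)
sin2 = double_sum(T, lambda s: np.sin(s * lam(3)) ** 2)
coscos = double_sum(T, lambda s: np.cos(s * lam(3)) * np.cos(s * lam(7)))

print(f"cos^2 sum / (T^2/4) = {cos2 / (T ** 2 / 4):.4f}")
print(f"sin^2 sum / (T^2/4) = {sin2 / (T ** 2 / 4):.4f}")
print(f"cos-cos cross sum / T = {coscos / T:.4f}")
```

The squared terms are close to $T^2/4$ while the cross term with $j \neq k$ is only of order $T$, which is why (A.10) is of smaller order than (A.9).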



Proof of Theorem 2.2: First we show that $(\log T)(\hat{d}_{\hat{u}}-d_u)\xrightarrow{P}0$. Since $\gamma=0$ it holds that $w_t=(x_t',u_t)'$ such that $d_{x,a}=d_a$ and $d_u=d_p$. Rewriting equations (A.1)–(A.4), (A.24), (A.25) and (A.30) from the proof of Theorem 3 of Robinson (1997) it suffices to show that
\begin{align}
m_1^{2(d_u-\Delta_1)-1}\sum_{j=1}^{m_1}j^{2(\Delta_1-d_u)}h_j&\xrightarrow{P}0\quad\text{for }0\le\Delta_1<d_u,\tag{A.11}\\
(\log T)^2m_1^{2\tau-1}\sum_{j=1}^{m_1}j^{-2\tau}h_j&\xrightarrow{P}0\quad\text{for some }\tau>0,\tag{A.12}\\
\frac{(\log T)^2}{m_1}\sum_{j=1}^{m_1}h_j&\xrightarrow{P}0,\tag{A.13}\\
\frac{1}{m_1}\sum_{j=1}^{m_1}\bigl((j/\ell)^{2(\Delta_1-d_u)}-1\bigr)h_j&\xrightarrow{P}0,\tag{A.14}
\end{align}
where $\ell=\exp(m_1^{-1}\sum_{j=1}^{m_1}\log j)$ and
\begin{equation}
h_j=\frac{I_{\hat{u}\hat{u}}(0,\lambda_j)-I_{uu}(0,\lambda_j)}{G_{pp}\lambda_j^{-2d_u}}\tag{A.15}
\end{equation}
measures the impact of using the periodogram of residuals (from NBLS with $\gamma=0$) instead of that of the errors. Our assumption $d_u\ge0$ allows a simplification of conditions (A.11)–(A.14) compared to their counterparts in Robinson (1997), and could be relaxed at the expense of a longer proof.

From our Theorem 2.1 and Robinson and Marinucci (2003, Theorem 3.1) it holds that $\hat{\beta}_{a,m_0}(0)-\beta_a=O_P(\lambda_{m_0}^{d_{x,a}-d_u})$ when $d_u<\min_{1\le a\le p-1}d_{x,a}\le\max_{1\le a\le p-1}d_{x,a}<1/2$. Using that result along with Assumption 2.1, (2.15) and the proof of Theorem 2 of Robinson (1995b), $h_j$ satisfies
\begin{equation}
h_j=O_P\bigl((j/m_0)^{-\delta_{\min}}+(j/m_0)^{-2\delta_{\min}}\bigr).\tag{A.16}
\end{equation}
Applying (A.16) and the fact that
\begin{equation}
\sup_{-1\le\alpha\le C}m^{-\alpha-1}(\log m)^{-1}\sum_{j=1}^{m}j^{\alpha}=O(1)\quad\text{for }C\in(1,\infty),\tag{A.17}
\end{equation}
which is used without special reference in what follows, it is easy to show that
\[
\text{(A.11)}=O_P\Bigl(m_1^{2(d_u-\Delta_1)-1}\sum_{j=1}^{m_1}j^{2(\Delta_1-d_u)-\delta_{\min}}m_0^{\delta_{\min}}\bigl(1+j^{-\delta_{\min}}m_0^{\delta_{\min}}\bigr)\Bigr)=O_P\Bigl((\log m_1)\Bigl(\frac{m_0}{m_1}\Bigr)^{\delta_{\min}}\Bigr),
\]
and similarly (A.12) and (A.13) are both $O_P((\log T)^2(\log m_1)(m_0/m_1)^{\delta_{\min}})$. Using the fact that $\ell\sim m_1/e$ ($e=2.71\ldots$) as $T\to\infty$, the left-hand side of (A.14) is bounded, for large $T$, by
\[
\frac{1}{m_1}\sum_{j=1}^{m_1}\Bigl(\frac{ej}{m_1}\Bigr)^{2(\Delta_1-d_u)}h_j+\frac{1}{m_1}\sum_{j=1}^{m_1}h_j,
\]
which is negligible by (A.11) and (A.13).
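The approximation used above, that the geometric mean of $1, \ldots, m_1$, namely $\exp(m_1^{-1}\sum_{j=1}^{m_1}\log j) = (m_1!)^{1/m_1}$, behaves like $m_1/e$ for large $m_1$, follows from Stirling's formula and can be checked numerically. The sketch below uses the log-gamma function to avoid overflow; the function name and the bandwidth values are arbitrary choices for illustration.

```python
import math


def geometric_mean_of_indices(m):
    """exp(mean of log j for j = 1..m), i.e. (m!)^(1/m), via log-gamma."""
    return math.exp(math.lgamma(m + 1) / m)


ratios = {m: geometric_mean_of_indices(m) / (m / math.e) for m in (10, 100, 10_000)}

for m, ratio in ratios.items():
    print(f"m = {m}: geometric mean / (m / e) = {ratio:.4f}")
```

The ratio decreases towards one as the bandwidth grows, consistent with the statement being an asymptotic equivalence rather than an exact identity.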



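Thus, we have shown $(\log T)$-consistency of $\hat{d}_{\hat{u}}$ and proceed to prove the rate and asymptotic distribution results. With probability approaching one as $T\to\infty$, $\hat{d}_{\hat{u}}$ satisfies
\[
0=\frac{\partial\hat{R}(\hat{d}_{\hat{u}})}{\partial d}=\frac{\partial\hat{R}(d_u)}{\partial d}+\frac{\partial^2\hat{R}(\bar{d}_u)}{\partial d^2}(\hat{d}_{\hat{u}}-d_u),
\]
where $|\bar{d}_u-d_u|\le|\hat{d}_{\hat{u}}-d_u|$. Following Robinson (1995a, pp. 1641–44) we have that
\[
\frac{\partial^2\hat{R}(d)}{\partial d^2}=\frac{4(\tilde{G}_{0,\hat{u}}(d)\tilde{G}_{2,\hat{u}}(d)-\tilde{G}_{1,\hat{u}}(d)^2)}{\tilde{G}_{0,\hat{u}}(d)^2}=\frac{4(\tilde{F}_{0,\hat{u}}(d)\tilde{F}_{2,\hat{u}}(d)-\tilde{F}_{1,\hat{u}}(d)^2)}{\tilde{F}_{0,\hat{u}}(d)^2},
\]
where
\[
\tilde{G}_{k,q}(d)=\frac{1}{m_1}\sum_{j=1}^{m_1}(\log\lambda_j)^k\lambda_j^{2d}I_{qq}(0,\lambda_j)\quad\text{and}\quad\tilde{F}_{k,q}(d)=\frac{1}{m_1}\sum_{j=1}^{m_1}(\log j)^k\lambda_j^{2d}I_{qq}(0,\lambda_j).
\]
If we show that
\begin{align}
\sup_{d\in\Theta\cap N_\zeta}\biggl|\frac{\tilde{G}_{0,\hat{u}}(d)-\tilde{G}_{0,u}(d)}{\bar{G}(d)}\biggr|&=o_P((\log m_1)^{-10}),\tag{A.18}\\
\bigl|\tilde{F}_{k,\hat{u}}(d_u)-\tilde{F}_{k,u}(d_u)\bigr|&\xrightarrow{P}0,\quad k=0,1,2,\tag{A.19}
\end{align}
with $\bar{G}(d)=G_{pp}\frac{1}{m_1}\sum_{j=1}^{m_1}\lambda_j^{2(d-d_u)}$ and $N_\zeta=\{d:|d_u-d|<\zeta\}$ for $0<\zeta<1/2$, then
\begin{equation}
\frac{\partial^2\hat{R}(\bar{d}_u)}{\partial d^2}\xrightarrow{P}4.\tag{A.20}
\end{equation}
Note that, following Andrews and Sun (2004, p. 600), in our (A.18) we use $(\log m_1)^{-10}$ rather than $(\log m_1)^{-6}$ as in Robinson’s (1995a) equation (4.6). By (4.7) in Robinson (1995a), (A.18) follows if
\[
(\log m_1)^{10}\sum_{j=1}^{m_1}\Bigl(\frac{j}{m_1}\Bigr)^{1-2\tau}j^{-2}\biggl|\sum_{k=1}^{j}h_k\biggr|\xrightarrow{P}0\quad\text{for some }\tau>0,
\]
which holds by (A.12) and (A.13) above. The left-hand side of (A.19) is bounded by
\[
\frac{G_{pp}}{m_1}\sum_{j=1}^{m_1}(\log j)^k|h_j|\le\frac{G_{pp}(\log m_1)^k}{m_1}\sum_{j=1}^{m_1}|h_j|=O_P\Bigl((\log m_1)^{k+1}\Bigl(\frac{m_0}{m_1}\Bigr)^{\delta_{\min}}\Bigr)
\]
by the same arguments as applied to (A.13) above. This proves (A.20).

Having established (A.20), and solving the mean value expansion above for $\hat{d}_{\hat{u}}-d_u$, it follows that
\begin{equation}
\sqrt{m_1}(\hat{d}_{\hat{u}}-d_u)=-(4+o_P(1))^{-1}\sqrt{m_1}\frac{\partial\hat{R}(d_u)}{\partial d},\tag{A.21}
\end{equation}
and the first statement of the theorem will follow below by examining the right-hand side of (A.21). To prove the second statement of the theorem we have to show that
\begin{equation}
\sqrt{m_1}\biggl|\frac{\partial R_{\hat{u}}(d_u)}{\partial d}-\frac{\partial R_u(d_u)}{\partial d}\biggr|\xrightarrow{P}0,\tag{A.22}
\end{equation}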



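where
\[
\frac{\partial R_q(d)}{\partial d}=2\frac{\tilde{G}_{1,q}(d)}{\tilde{G}_{0,q}(d)}-\frac{2}{m_1}\sum_{j=1}^{m_1}\log\lambda_j=2\frac{\tilde{H}_q(d)}{\tilde{G}_{0,q}(d)},\qquad\tilde{H}_q(d)=\frac{1}{m_1}\sum_{j=1}^{m_1}\nu_j\lambda_j^{2d}I_{qq}(0,\lambda_j),
\]
and $\nu_j=\log j-m_1^{-1}\sum_{j=1}^{m_1}\log j$. Now we write the left-hand side of (A.22) as
\begin{align}
&2\sqrt{m_1}\biggl|\frac{\tilde{H}_{\hat{u}}(d_u)-\tilde{H}_u(d_u)}{\tilde{G}_{0,\hat{u}}(d_u)}-\frac{\tilde{H}_u(d_u)}{\tilde{G}_{0,u}(d_u)}\frac{(\tilde{G}_{0,\hat{u}}(d_u)-\tilde{G}_{0,u}(d_u))}{\tilde{G}_{0,\hat{u}}(d_u)}\biggr|\notag\\
&\quad\le2\sqrt{m_1}\bigl|\tilde{G}_{0,\hat{u}}(d_u)^{-1}\bigr|\bigl|\tilde{H}_{\hat{u}}(d_u)-\tilde{H}_u(d_u)\bigr|\tag{A.23}\\
&\qquad+2\sqrt{m_1}\bigl|\tilde{G}_{0,\hat{u}}(d_u)^{-1}\bigr|\bigl|\tilde{G}_{0,\hat{u}}(d_u)-\tilde{G}_{0,u}(d_u)\bigr|\biggl|\frac{\tilde{H}_u(d_u)}{\tilde{G}_{0,u}(d_u)}\biggr|.\tag{A.24}
\end{align}
To show that (A.24) is $o_P(1)$ note that $\tilde{H}_u(d_u)/\tilde{G}_{0,u}(d_u)=\frac{1}{2}\partial R_u(d_u)/\partial d$, that is, the score for the estimation problem with observed series, such that $\tilde{H}_u(d_u)/\tilde{G}_{0,u}(d_u)=O_P(m_1^{-1/2})$ as in Robinson (1995a, p. 1644). Furthermore, based on the previous results we get
\[
\bigl|\tilde{G}_{0,\hat{u}}(d_u)-\tilde{G}_{0,u}(d_u)\bigr|\le\frac{1}{m_1}\sum_{j=1}^{m_1}\lambda_j^{2d_u}\bigl|I_{\hat{u}\hat{u}}(0,\lambda_j)-I_{uu}(0,\lambda_j)\bigr|\le\frac{|G_{pp}|}{m_1}\sum_{j=1}^{m_1}|h_j|=O_P\bigl((m_0/m_1)^{\delta_{\min}}\bigr),
\]
which is $o_P(1)$ by Assumption 2.6. Since $\tilde{G}_{0,\hat{u}}(d_u)=G_{pp}+o_P(1)$ by (A.19) with $k=0$ and Robinson (1995a), we have established that (A.24) $=O_P((m_0/m_1)^{\delta_{\min}})=o_P(1)$. It also follows that (A.23) is of the same order as $\sqrt{m_1}|\tilde{H}_{\hat{u}}(d_u)-\tilde{H}_u(d_u)|$ which is equal to
\[
\frac{G_{pp}}{\sqrt{m_1}}\biggl|\sum_{j=1}^{m_1}\nu_jh_j\biggr|=O_P\Bigl(\frac{(\log m_1)}{\sqrt{m_1}}\sum_{j=1}^{m_1}|h_j|\Bigr)=O_P\bigl((\log m_1)\sqrt{m_1}(m_0/m_1)^{\delta_{\min}}\bigr).
\]
Hence, (A.22) is $O_P((\log m_1)\sqrt{m_1}(m_0/m_1)^{\delta_{\min}})$ in general. By (A.21) it then follows that $\sqrt{m_1}(\hat{d}_{\hat{u}}-d_u)=O_P((\log m_1)\sqrt{m_1}(m_0/m_1)^{\delta_{\min}})$ which proves the first statement of the theorem.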



To prove the second statement of the theorem, we need to show that if Gap = Gpa = 0 for a = √ √ P m1 |H˜ uˆ (du ) − H˜ u (du ) | → 0. Thus, m1 |H˜ uˆ (du ) − H˜ u (du ) | is equal to

1, . . . , p − 1, then in fact

m1 ν h j j j =1 m

Gpp 1 u ≤ √ νj λ2d (β − βˆm0 (0)) Re Ixx 0, λj (β − βˆm0 (0))/2 j m1 j =1

ˆ + (β − βm0 (0)) Re Ixu 0, λj p−1 p−1 m

Gpp 1 2du ˆ ˆ ≤ √ νj λj (βa − βa,m0 (0))(βb − βb,m0 (0))Re Iab λj 2 m1 j =1 a=1 b=1

Gpp √ m1

Gpp +√ m1

p−1 m1

2du . ˆ λ ν λ (β − β (0))Re I j a a,m ap j j 0 j =1 a=1

(A.25)

(A.26)

First, using summation by parts, m1

m1

u u = νm1 νj λ2d λ2d j Re Iap λj j Re Iap λj

j =1

j =1 m1 −1

−

νj +1 − νj

j =1

j

u λ2d k Re Iap (λk ) ,

k=1

and for νj we know that νm1 = O(1) and |νj +1 − νj | = O(j −1 ) uniformly in j (by a mean value expansion). In the present case with Gap = Gpa = 0 for a = 1, . . . , p − 1 we know from Theorem 2.1 that βâ,m0 (0) − −1/2 d −du βa = OP (m0 λmx,a ). This implies, in conjunction with Lemma B1(c) with Gap = Gpa = 0 for a = 0 1, . . . , p − 1, that (A.26) is

p−1 d −du 1 λmx,a 1/2 1+min(1,φ) − min(1,φ) 0 du −dx,a m1 λ T + m1 (log m1 ) OP √ √ m1 a=1 m0 m1 ⎞ ⎛ p−1 dx,a −du m1 −1 1 λ m0 du −dx,a 1+min(1,φ) − min(1,φ) −1 1/2 j j λj T + j (log j ) ⎠ + OP ⎝ √ √ m1 a=1 m0 j =1 δmin 1 m0 1/2+min(1,φ) − min(1,φ) m1 T + (log m1 ) , = OP √ m0 m1 which is negligible by Assumption 2.6. Similarly, we get that (A.25) is OP

√

p−1 p−1 dx,a +dx,b −2du λm 2d −d −d 0 m1 λm1u x,a x,b m 0 a=1 b=1

= OP

which is also negligible. C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.

m0 m1

2δmin √

m1 m0

,



−1 ˆ Proof of Theorem 3.1: To derive the asymptotic order of magnitude of λdmu2 −1 m2 m2 (γ ) − K(γ ) H (γ ), we du −1 ˆ first write λm2 m2 m2 (γ ) as

⎞−1 m2 m2

1 1 ⎝λ−2γ Re Ixx γ , λj m2 ⎠ λdmu2−2γ m2 Re Ix uˆ γ , λj . m2 m2 m2 − m0 j =m +1 m2 − m0 j =m +1 ⎛

0

0

We then show that λ−2γ m2 m2

m2

1 Re Ixx γ , λj m2 − K(γ ) = OP (l(m0 , m2 )) , m2 − m0 j =m +1

(A.27)

0

m2 λdmu2−2γ

m2

1 Re(Ix uˆ γ , λj ) − H (γ ) = OP (l(m0 , m2 ) + (m0 /m2 )δmin ), m2 − m0 j =m +1

(A.28)

0

where l(m0 , m2 ) =

m min(1,φ) 2

T

m0 + m2

−1/2

+ m2

(log m2 ) 1−2(max1≤a≤p−1 dx,a −γ )

m0 min(1,φ) −1/2 + m0 (log m0 ) , T

which is sufficient to prove the desired result since K(γ )−1 (1 + OP (l(m0 , m2 )))−1 H (γ )(1 + OP (l(m0 , m2 ) + (m0 /m2 )δmin )) = K(γ )−1 H (γ ) (1 + OP (l(m0 , m2 ))) (1 + OP (l(m0 , m2 ) + (m0 /m2 )δmin )) = K(γ )−1 H (γ )(1 + OP (l(m0 , m2 ) + (m0 /m2 )δmin )) and since max1≤a≤p−1 dx,a − γ < 1/2 implies (m0 /m2 )1−2(max1≤a≤p−1 dx,a −γ ) (log m0 ) = O((log T )−1 ). The (a, b)th element of the left-hand side of (A.27) is m2

λdma2+db m2 − m0

Re Iab λj − Kab (γ )

j =m0 +1

⎞ ⎛ m2

1 ⎝λdma +db Re Iab λj − Kab (γ )⎠ = 2 m2 − m0 j =1 ⎞ ⎛ m0

1 ⎝λdma +db − Re Iab λj − Kab (γ )⎠ 2 m2 − m0 j =1 −1/2 min(1,φ) = OP (m2 /T ) + m2 (log m2 ) −1/2 +OP (m0 /m2 )1−da −db ((m0 /T )min(1,φ) + m0 (log m0 ))

by application of Lemma B1(c). C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.


Fully modified narrow-band least squares To prove (A.28) we write the ath element of the left-hand side as m2 m2 d +d d +d

λma2 p λma2 p Re(Iâp λj ) − Ha (γ ) = Re Iâp λj − Iap λj m2 − m0 j =m +1 m2 − m0 j =m +1

(A.29)

m2 d +d

λma2 p Re Iap λj − Ha (γ ), m2 − m0 j =m +1

(A.30)

0

0

+

0

where Iâp (λj ) is the cross-periodogram between wat and wˆ pt = γ uˆ t . Since Iâp (λj ) = Iap (λj ) +

p−1 dx,b −du ˆ ˆ )= b=1 Iab (λj )(βb − βb,m0 (γ )), equation (A.29) depends on βb − βb,m0 (γ ) which is OP (λm0 db −dp OP (λm0 ) by Theorem 2.1. Thus, m2 d +d

λma2 p (βb − βˆb,m0 (γ )) Re Iab λj m − m 2 0 j =m +1 b=1 0 p−1 da +dp db −dp m0 δmin −da −db λm 2 λm 0 λm 2 = OP = OP . m2 b=1 p−1

(A.29) =

Finally, the term (A.30) is OP (l(m0 , m2 )) by the same argument as for (A.27). The same proof can be applied for ˜ m2 , although Lemma B1(c) must be modified as λrda +db −c−1

Re eiλ(da −db )/2 λc fab (λ) dλ

λr

0

= λrda +db −c−1

λr

Gab λc−da −db Re(eiπ (da −db )/2 ) 1 + O λφ dλ

0

=

λrda +db −c−1

λr

Gab λc−da −db cos (π (da − db ) /2) (1 + O(λφ )) dλ

0

=

(1 − da − db ) Kab (γ )(1 + O(λφr )). (1 + c − da − db )

Proof of Theorem 3.2: The result follows by application of the previous theorems. From (3.5) and (3.8), √

˜ m3 λdmu3 −1 m3 (βm3 (γ ) − β) = = =

√ √ √

dû ˆ −1 ˜ −dû ˆ ˆ m3 λdmu3 −1 m3 βm3 (γ ) − λm3 m3 λm2 m2 m2 (γ ) − β ˆ m3 λdmu3 −1 m3 (βm3 (γ ) − β) −

√

δmin ˜ m3 λdmu2 −1 )) m2 m2 (γ )(1 + OP ((log T ) (log m1 )(m0 /m1 )

√ ˆ ˜ m3 λdmu3 −1 m3 λdmu2 −1 m3 (βm3 (γ ) − β) − m2 m2 (γ ) + oP (1) ,

(A.31)

where the third equality is by Assumption 3.2 (or (3.7) if m3 = m0 ) and the second follows from dˆ

λmx,a 3

−dx,a

−1/2

= 1 + OP ((log T ) m1

),

a = 1, . . . , p − 1,

λdmu2−du = 1 + OP ((log T ) (log m1 )(m0 /m1 )δmin ), ˆ


(A.32)

(A.33)



which are consequences of Robinson (1995a) and Assumption 2.6 (Theorem 2.2). From Theorem 3.1 it follows that √ √ ˜ m3 λdmu2 −1 m3 K(γ )−1 H (γ ) m2 m2 (γ ) = m φ √ m0 δmin 2 −1/2 −1 + m0 (log T ) + + m3 OP m2 T √ −1 = m3 K(γ ) H (γ ) + oP (1) by Assumption 3.2 (or (3.7) if m3 = m0 ). The desired result now follows from Theorem 2.1. Proof of Theorem 3.3: We need to show that follows if √ √

√

˜ ˆ ˜ m3 λdmu3 −1 m3 (βm3 (d u ) − βm3 (du )) = oP (1), which from (A.31)

ˆ ˆ ˆ m3 λdmu3 −1 m3 (βm3 (d u ) − βm3 (du )) = oP (1),

(A.34)

ˆ ˜ ˜ m3 λdmu2 −1 m2 (m2 (d u ) − m2 (du )) = oP (1).

(A.35)

First note that by the mean value theorem, ∂ Iqr (dû , λ) = Iqr (du , λ) + (dû − du ) Iqr (d¯u , λ), ∂d where ∂d∂ Iqr (d¯u , λ) = ∂d∂ Iqr (d, λ)|d=d¯u and d¯u is an intermediate value satisfying |d¯u − du | ≤ |dû − du |. Setting θ = d¯u − du we have that T T 1 ∂ (log(1 − L)θ du qt )(θ du rs ) e−i(t−s)λ ; Iqr (d¯u , λ) = ∂d 2π T t=1 s=1

see also Shimotsu and Phillips (2005, p. 1912). Adapting the last displayed equation on p. 1914 of Shimotsu and Phillips (2005) to our notation, using their equation (59) and the fact that their function JT (eiλj ) = O(log T ), we find that uniformly for θ ∈ M = {θ : |θ| ≤ (log T )−4 } it holds that ⎞ ⎛ m3 m3 1 Re ∂ Iqr (d¯u , λj ) = OP ⎝ (log T ) Re Iqr (du , λj ) ⎠ . m ∂d m 3 j =1

3

u 2π Hence, uniformly for θ ∈ M, m3 λ−1−2d m3 T element ⎛

m3

j =1

j =1

Re(Ixx (dû , λj ) − Ixx (du , λj ))m0 has (a, b)th

⎞ m3 (log T ) d +dx,b −2du −d −d +2d OP (|dû − du |)OP ⎝ Re λj x,a x,b u ⎠ λmx,a 3 m3 j =1

= OP |dû − du |(log T )

m3 √ ˆ and m3 λdmu3−1−2du m3 2π j =1 Re(Ixu (d u , λj ) − Ixu (du , λj )) has ath element T ⎞ m3

√ (log T ) −d +d −du m3 |dû − du |(log T ) . OP (|dû − du |)OP ⎝ √ Re λj x,a u ⎠ = OP λdmx,a 3 m3 j =1 ⎛

Both terms are negligible by Assumptions 2.6 and 3.2, and thus (A.34) holds uniformly for θ ∈ M. C 2011 The Author(s). The Econometrics Journal C 2011 Royal Economic Society.



m2 1 u ˆ Similarly we find that uniformly in θ ∈ M, λ−2d m2 m2 m2 −m0 j =m0 +1 Re(Ixx (d u , λj ) − Ixx (du , λj ))m2 has (a, b)th element ⎞ ⎛ m2 (log T ) d +dx,b −2du −d −d +2d OP (|dû − du |)OP ⎝ Re λj x,a x,b u ⎠ λmx,a 2 m2 − m0 j =m +1 0

= OP |dû − du |(log T ) and m2 λdmu2−2du

m2 1 Re(Ix uˆ (dû , λj ) − Ix uˆ (du , λj )) m2 − m0 j =m +1 0

u = m2 λ−d m2

1 m2 − m0

u +m2 λ−d m2

m2

Re(Ixu (dû , λj ) − Ixu (du , λj ))

j =m0 +1

m2 1 Re(Ixx (dû , λj ) − Ixx (du , λj ))(β − βˆm0 (dû )) m2 − m0 j =m +1 0

has ath element

⎞ m2 (log T ) −d +d −du OP (|dû − du |)OP ⎝ Re λj x,a u ⎠ λdmx,a 2 m2 − m0 j =m +1 0 ⎞ ⎛ p−1 m2 (log T ) −d −d +2d d −du −du ⎠ + λdmx,a OP (|dû − du |)OP ⎝ Re λj x,a x,b u λmx,b 0 2 m2 − m0 j =m +1 b=1 0

= OP |dû − du |(log T ) + OP |dû − du |(log T ) (m0 /m2 )δmin , ⎛

and again both are negligible by Assumptions 2.6 and 3.1, which proves (A.35) uniformly for θ ∈ M. From Assumption 2.6 (Theorem 2.2) it holds that θ ∈ M with probability tending to one. Therefore the above results also hold with probability tending to one, which proves the result.

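APPENDIX B: TECHNICAL LEMMA

LEMMA B1. Under Assumptions 2.1–2.4, as $T \to \infty$, for $1 \le r \le m$ and $0 \le c \le d_a + d_b$,

(a)
$$\max_{a,b}\left|\lambda_r^{d_a+d_b-c}\sum_{j=1}^{r}\mathrm{Re}\,\lambda_j^{c}\bigl[I_{ab}(\lambda_j) - A_a(\lambda_j)\,I_{\varepsilon\varepsilon}(0,\lambda_j)\,A_b^{*}(\lambda_j)\bigr]\right| = O_P\bigl(r^{1/3}(\log r)^{2/3} + (\log r) + r^{1/2}T^{-1/4}\bigr),$$

(b)
$$\max_{a,b}\left|\lambda_r^{d_a+d_b-c}\sum_{j=1}^{r}\mathrm{Re}\left[\lambda_j^{c} f_{ab}(\lambda_j) - \lambda_r^{c-d_a-d_b}\,\frac{(1-d_a-d_b)}{(1+c-d_a-d_b)}\,K_{ab}(\gamma)\right]\right| = O_P\bigl(r^{1+\min(1,\phi)}\,T^{-\min(1,\phi)}\bigr),$$

(c)
$$\max_{a,b}\left|\lambda_r^{d_a+d_b-c}\sum_{j=1}^{r}\mathrm{Re}\left[\lambda_j^{c} I_{ab}(\lambda_j) - \lambda_r^{c-d_a-d_b}\,\frac{(1-d_a-d_b)}{(1+c-d_a-d_b)}\,K_{ab}(\gamma)\right]\right| = O_P\bigl(r^{1+\min(1,\phi)}\,T^{-\min(1,\phi)} + r^{1/2}(\log r)\bigr),$$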



where Iεε 0, λj is the periodogram matrix of εt from Assumption 2.3 and Iab (λj ) is the (a, b)th element of Iww (0, λj ); the periodogram matrix of wt = (γ xt , γ ut ) . Proof: Decompose the terms inside the real operator as

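$$H_{1j} = \lambda_j^{c}\bigl[I_{ab}(\lambda_j) - A_a(\lambda_j)\,I_{\varepsilon\varepsilon}(0,\lambda_j)\,A_b^{*}(\lambda_j)\bigr],$$

$$H_{2j} = \lambda_j^{c}\bigl[A_a(\lambda_j)\,I_{\varepsilon\varepsilon}(0,\lambda_j)\,A_b^{*}(\lambda_j) - f_{ab}(\lambda_j)\bigr],$$

$$H_{3j} = \lambda_j^{c} f_{ab}(\lambda_j) - \lambda_r^{c-d_a-d_b}\,\frac{(1-d_a-d_b)}{(1+c-d_a-d_b)}\,K_{ab}(\gamma).$$

The proof of Lemma 1(b) in Shimotsu (2007) applies also to our terms $H_{1j}$ and $H_{2j}$, which shows that (a) holds and that $\max_{a,b}|\sum_{j=1}^{r} H_{2j}| = O_P(r^{1/2}(\log r))$. For $H_{3j}$ we use Assumptions 2.1 and 2.2 and the fact that $\mathrm{Re}(e^{i\lambda z}) = 1 + O(\lambda^2)$, $\mathrm{Im}(e^{i\lambda z}) = O(\lambda)$ as $\lambda \to 0$ for any $z \in \mathbb{R}$, which imply

$$\mathrm{Re}\bigl(e^{i(\pi-\lambda)(d_a-d_b)/2}\bigr) = \mathrm{Re}\bigl(e^{i\pi(d_a-d_b)/2}\bigr)\mathrm{Re}\bigl(e^{-i\lambda(d_a-d_b)/2}\bigr) - \mathrm{Im}\bigl(e^{i\pi(d_a-d_b)/2}\bigr)\mathrm{Im}\bigl(e^{-i\lambda(d_a-d_b)/2}\bigr)$$

$$= \cos\bigl(\pi(d_a-d_b)/2\bigr)\bigl(1+O(\lambda^2)\bigr) - \sin\bigl(\pi(d_a-d_b)/2\bigr)\,O(\lambda),$$

such that

$$\lambda_r^{d_a+d_b-c}\, r^{-1}\sum_{j=1}^{r}\mathrm{Re}\,\lambda_j^{c} f_{ab}(\lambda_j) = \lambda_r^{d_a+d_b-c-1}\int_0^{\lambda_r}\mathrm{Re}\bigl(\lambda^{c} f_{ab}(\lambda)\bigr)\,d\lambda + R_T$$

$$= \lambda_r^{d_a+d_b-c-1}\int_0^{\lambda_r} G_{ab}\,\lambda^{c-d_a-d_b}\,\mathrm{Re}\bigl(e^{i(\pi-\lambda)(d_a-d_b)/2}\bigr)\bigl(1+O(\lambda^{\phi})\bigr)\,d\lambda + R_T$$

$$= \lambda_r^{d_a+d_b-c-1}\int_0^{\lambda_r} G_{ab}\,\lambda^{c-d_a-d_b}\cos\bigl(\pi(d_a-d_b)/2\bigr)\bigl(1+O(\lambda^{\min(1,\phi)})\bigr)\,d\lambda + R_T$$

$$= \frac{(1-d_a-d_b)}{(1+c-d_a-d_b)}\,K_{ab}(\gamma)\bigl(1+O(\lambda_r^{\min(1,\phi)})\bigr) + R_T.$$

The approximation error $R_T$ is $O(T^{c-d_a-d_b-1}(\log r))$ uniformly in $r$.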




Corrigendum to ‘Likelihood-based cointegration tests in heterogeneous panels’ (Larsson R., J. Lyhagen and M. Löthgren, Econometrics Journal, 4, 2001, 109–142)

DENIZ DILAN KARAMAN ÖRSAL† AND BERND DROGE†,‡

† Institute for Statistics and Econometrics, School of Business and Economics, Humboldt-Universität zu Berlin, Spandauer Str. 1, 10099 Berlin, Germany. E-mail: [email protected], [email protected]

‡ CASE—Center for Applied Statistics and Economics, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, Spandauer Str. 1, 10178 Berlin, Germany.

First version received: January 2010; final version accepted: June 2010

We correct the proof of Lemma 3 in Larsson, Lyhagen and Löthgren (2001), hereafter LLL. LLL presented a test for the cointegrating rank, $r$, in heterogeneous panels, which is based on the likelihood ratio (LR) test statistic developed by Johansen (1995) for $p$-dimensional autoregressive models. Under the null hypothesis and as the time dimension, $T$, approaches infinity, the LR statistic converges weakly to the asymptotic trace statistic,

$$Z_k = \operatorname{tr}\left\{\left(\int_0^1 W(s)\,dW(s)'\right)'\left(\int_0^1 W(s)W(s)'\,ds\right)^{-1}\int_0^1 W(s)\,dW(s)'\right\}, \tag{1.1}$$

where $W(s)$ is a $k$-dimensional standard Brownian motion with $k = p - r$. Since the first two moments of $Z_k$ are used by LLL for standardizing the average of the individual LR test statistics, their procedure relies crucially on the fact that these moments exist and, moreover, may be obtained as limits of the corresponding moments of the statistic

$$Z_{T,k} = \operatorname{tr}\left[\frac{1}{T}\sum_{t=1}^{T}\varepsilon_t X_{t-1}'\left(\frac{1}{T^2}\sum_{t=1}^{T}X_{t-1}X_{t-1}'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T}X_{t-1}\varepsilon_t'\right], \tag{1.2}$$

where $\varepsilon_t \sim N_k(0, I_k)$ i.i.d. and $X_t = \sum_{i=1}^{t}\varepsilon_i$ for $t = 1, \ldots, T$. However, the proof of the basic Lemma 3 in LLL is incorrect. Therefore we reformulate the result in the following Lemma and provide a corrected version of the proof.

LEMMA 1. There exist some constants $a$ and $b$ such that, for all $T > k$,
(a) $E(Z_{T,k}^2) < a$;
(b) $E(Z_{T,k}^4) < b$;
(c) furthermore, $E(Z_k^2) < \infty$ and $\lim_{T\to\infty} E(Z_{T,k}^q) = E(Z_k^q)$ for $q = 1, 2$.



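Proof: The assumption $T > k$ ensures that the inverted matrix appearing in (1.2) is nonsingular with probability one. As LLL, we introduce the $(T \times k)$ matrices $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T)'$ and $X = (X_1, X_2, \ldots, X_T)'$ as well as the $(T \times T)$ matrices

$$A = \begin{pmatrix} 1 & 0 & \cdots & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \vdots & \vdots & & \ddots & 0 \\ 1 & 1 & \cdots & \cdots & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & \cdots & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}.$$

Then, $X = A\varepsilon$ and the $(k \times k)$ matrices appearing in (1.2) can be rewritten as

$$C_T := \frac{1}{T^2}\sum_{t=1}^{T}X_{t-1}X_{t-1}' = \frac{1}{T^2}\,\varepsilon' A' B' B A\,\varepsilon, \qquad D_T := \frac{1}{T}\sum_{t=1}^{T}X_{t-1}\varepsilon_t' = \frac{1}{T}\,\varepsilon' A' B'\varepsilon. \tag{1.3}$$

Defining $M = BA$ and $Y = M\varepsilon$, we therefore obtain

$$Z_{T,k} = \operatorname{tr}\bigl(D_T' C_T^{-1} D_T\bigr) = \operatorname{tr}\bigl[\varepsilon' M\varepsilon\,(\varepsilon' M' M\varepsilon)^{-1}\varepsilon' M'\varepsilon\bigr] = \operatorname{tr}(\varepsilon' P_Y\varepsilon) \le \operatorname{tr}(\varepsilon'\varepsilon), \tag{1.4}$$

where $P_Y = Y(Y'Y)^{-1}Y'$ is a projection matrix. The assumption $\varepsilon_t \sim N_k(0, I_k)$ i.i.d. now implies $\operatorname{tr}(\varepsilon'\varepsilon) = \sum_{t=1}^{T}\varepsilon_t'\varepsilon_t \sim \chi^2_{Tk}$, which shows that all moments of $Z_{T,k}$ exist. However, inequality (1.4) cannot be used to bound the moments of $Z_{T,k}$ uniformly in $T$, because the moments of a $\chi^2$-distributed random variable depend on the degrees of freedom. Using an inequality of Coope (1994), we get, on account of (1.4),

$$Z_{T,k} = \operatorname{tr}\bigl(C_T^{-1} D_T D_T'\bigr) \le \operatorname{tr}\bigl(C_T^{-1}\bigr)\operatorname{tr}\bigl(D_T D_T'\bigr), \tag{1.5}$$

since $C_T^{-1}$ and $D_T D_T'$ are symmetric and non-negative definite matrices of the same order. To deal with $C_T$, let $\lambda_1 \ge \cdots \ge \lambda_{T-1} \ge \lambda_T \ge 0$ and $v_1, \ldots, v_T$ be the eigenvalues and the associated orthonormal eigenvectors, respectively, of the symmetric and non-negative definite $(T \times T)$ matrix $F = M'M$. Then, for any $m \in \{1, \ldots, T-1\}$,

$$F = \sum_{t=1}^{T}\lambda_t v_t v_t' \succeq \lambda_m\sum_{t=1}^{m}v_t v_t' =: F_m, \tag{1.6}$$

where $\succeq$ denotes the Löwner partial ordering for symmetric matrices. Because of the orthonormality of the matrix $V = (v_1, \ldots, v_T)$, $V'\varepsilon$ has the same distribution as $\varepsilon$; that is, with the notation of Muirhead (1982), $V'\varepsilon \sim N(0, I_T \otimes I_k)$. This implies

$$\varepsilon' F_m\varepsilon = \lambda_m U, \quad\text{with}\quad U := \sum_{t=1}^{m}\varepsilon' v_t v_t'\varepsilon \sim W_k(m, I_k),$$

and thus, in view of (1.3) and (1.6),

$$C_T = \frac{1}{T^2}\,\varepsilon' F\varepsilon \succeq \frac{1}{T^2}\,\varepsilon' F_m\varepsilon = \frac{\lambda_m}{T^2}\,U =: C_{T,m}. \tag{1.7}$$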



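Clearly, $C_{T,m}$ is almost surely positive definite if $m \ge k$. Then (1.7) leads to $C_{T,m}^{-1} \succeq C_T^{-1}$, so that we arrive at

$$\operatorname{tr}\bigl(C_T^{-1}\bigr) \le \operatorname{tr}\bigl(C_{T,m}^{-1}\bigr) = \frac{T^2}{\lambda_m}\operatorname{tr}\bigl(U^{-1}\bigr). \tag{1.8}$$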
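The chain of bounds in (1.4), (1.5) and (1.8) can be checked numerically on a single simulated draw. The following is only an illustrative sketch, not part of the original proof: the function name `trace_bounds`, the default values of `T`, `k`, `m` and the seed are assumptions chosen for the example.

```python
import numpy as np


def trace_bounds(T=40, k=2, m=10, seed=0):
    """Return Z_{T,k} and the bounds from (1.4), (1.5) and (1.8) for one draw."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T, k))      # rows are eps_t'
    M = np.tril(np.ones((T, T)), k=-1)     # M = BA: strictly lower-triangular ones
    Y = M @ eps                            # rows are X_{t-1}', with X_0 = 0
    C = Y.T @ Y / T**2                     # C_T in (1.3)
    D = Y.T @ eps / T                      # D_T in (1.3)
    C_inv = np.linalg.inv(C)
    Z = np.trace(D.T @ C_inv @ D)          # Z_{T,k} = tr(D_T' C_T^{-1} D_T)

    lam, V = np.linalg.eigh(M.T @ M)       # eigenvalues of F, ascending order
    lam, V = lam[::-1], V[:, ::-1]         # reorder so that lam[0] >= lam[1] >= ...
    P = V[:, :m].T @ eps                   # (m x k) matrix with rows v_t' eps
    U = P.T @ P                            # U = sum_{t<=m} eps' v_t v_t' eps
    return {
        "Z": Z,
        "chi2_bound": np.trace(eps.T @ eps),                      # right side of (1.4)
        "coope_bound": np.trace(C_inv) * np.trace(D @ D.T),       # right side of (1.5)
        "tr_C_inv": np.trace(C_inv),
        "wishart_bound": T**2 / lam[m - 1] * np.trace(np.linalg.inv(U)),  # (1.8)
    }


if __name__ == "__main__":
    for name, value in trace_bounds().items():
        print(f"{name}: {value:.4f}")
```

The inequalities hold for every draw with a nonsingular $C_T$, so the sketch only illustrates the algebra; it says nothing about the moment bounds themselves, which require the distributional argument that follows.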

Observing

$$M = \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 & 0 \end{pmatrix}, \qquad F = \begin{pmatrix} T-1 & T-2 & T-3 & \cdots & 1 & 0 \\ T-2 & T-2 & T-3 & \cdots & 1 & 0 \\ T-3 & T-3 & T-3 & \cdots & 1 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & 1 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix},$$

it follows $\lambda_T = 0$, and $\lambda_1, \ldots, \lambda_{T-1}$ are the eigenvalues of the positive definite matrix $\tilde F$ obtained from $F$ by deleting the last column and the last row. This matrix can be represented as the inverse of a tridiagonal Minkowski matrix; that is,

$$\tilde F = \begin{pmatrix} T-1 & T-2 & T-3 & \cdots & 2 & 1 \\ T-2 & T-2 & T-3 & \cdots & 2 & 1 \\ T-3 & T-3 & T-3 & \cdots & 2 & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 2 & 2 & 2 & \cdots & 2 & 1 \\ 1 & 1 & 1 & \cdots & 1 & 1 \end{pmatrix} = (-1)\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 1 & -2 & 1 & \cdots & 0 & 0 \\ 0 & 1 & -2 & \ddots & & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & & \ddots & -2 & 1 \\ 0 & 0 & 0 & \cdots & 1 & -2 \end{pmatrix}^{-1}.$$

By a result of Rutherford (1946), the positive (ordered) eigenvalues of $F$ are given by

$$\lambda_t = \frac{1}{2\left[1 - \cos\dfrac{(2t-1)\pi}{2T-1}\right]} \quad\text{for } t = 1, \ldots, T-1.$$

The series expansion of the cosine function provides, for a fixed $m$ and as $T \to \infty$,

$$1 - \cos\frac{(2m-1)\pi}{2T-1} = \frac{(2m-1)^2\pi^2}{2(2T-1)^2} + o(T^{-3})$$

and therefore

$$\frac{T^2}{\lambda_m} \underset{T\to\infty}{\longrightarrow} \frac{(2m-1)^2\pi^2}{4} =: c_1 < \infty. \tag{1.9}$$
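The closed-form eigenvalues and the limit in (1.9) are easy to confirm numerically. The sketch below is an illustration only; the function names `eig_F`, `rutherford` and `c1` are assumptions introduced for this example and do not appear in the corrigendum.

```python
import numpy as np


def eig_F(T):
    """Eigenvalues of F = M'M in descending order, with M strictly lower-triangular ones."""
    M = np.tril(np.ones((T, T)), k=-1)
    return np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]


def rutherford(T):
    """Closed-form positive eigenvalues lambda_1 >= ... >= lambda_{T-1} of F."""
    t = np.arange(1, T)
    return 1.0 / (2.0 * (1.0 - np.cos((2 * t - 1) * np.pi / (2 * T - 1))))


def c1(m):
    """Limit of T^2 / lambda_m in (1.9)."""
    return (2 * m - 1) ** 2 * np.pi ** 2 / 4


if __name__ == "__main__":
    T, m = 50, 3
    numeric = eig_F(T)
    print("max abs gap:", np.max(np.abs(numeric[:-1] - rutherford(T))))
    print("smallest eigenvalue:", numeric[-1])
    for T_big in (100, 1000, 10000):
        print(T_big, T_big**2 / rutherford(T_big)[m - 1], c1(m))
```

For finite $T$ the ratio $T^2/\lambda_m$ exceeds $c_1$ by a relative amount of roughly $1/T$, which comes from replacing $(2T-1)^2$ by $4T^2$ in the expansion.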

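With the notation $\varepsilon_t = (\varepsilon_{t1}, \ldots, \varepsilon_{tk})'$, the last term in inequality (1.5) may be written as

$$\operatorname{tr}\bigl(D_T D_T'\bigr) = \frac{1}{T^2}\operatorname{tr}\bigl(\varepsilon' M'\varepsilon\,\varepsilon' M\varepsilon\bigr) = \sum_{i=1}^{k}\sum_{j=1}^{k}\alpha_{ij}^2, \quad\text{where}\quad \alpha_{ij} = \frac{1}{T}\sum_{s=1}^{T}\sum_{t=s+1}^{T}\varepsilon_{sj}\varepsilon_{ti}. \tag{1.10}$$

To prove (a), we first apply the Cauchy–Schwarz inequality to the second power in (1.5):

$$E\bigl(Z_{T,k}^2\bigr) \le E\Bigl\{\bigl[\operatorname{tr}\bigl(C_T^{-1}\bigr)\bigr]^2\bigl[\operatorname{tr}\bigl(D_T D_T'\bigr)\bigr]^2\Bigr\} \le \Bigl\{E\bigl[\operatorname{tr}\bigl(C_T^{-1}\bigr)\bigr]^4\,E\bigl[\operatorname{tr}\bigl(D_T D_T'\bigr)\bigr]^4\Bigr\}^{1/2}. \tag{1.11}$$

Consequently, it suffices to verify that both expectations on the right-hand side of inequality (1.11) are uniformly bounded in $T > k$. In view of (1.10), $E[\operatorname{tr}(D_T D_T')]^4$ is uniformly bounded in $T$ if, for $i, j \in \{1, \ldots, k\}$, $\sup_T E(\alpha_{ij}^8) < \infty$. But this follows from $\varepsilon_t \sim N(0, I_k)$ i.i.d. and

$$E\bigl(\alpha_{ij}^8\bigr) = \frac{1}{T^8}\left[\sum_{s_1=1}^{T}\cdots\sum_{s_8=1}^{T}\;\sum_{t_1=s_1+1}^{T}\cdots\sum_{t_8=s_8+1}^{T} E\bigl(\varepsilon_{s_1 j}\cdots\varepsilon_{s_8 j}\,\varepsilon_{t_1 i}\cdots\varepsilon_{t_8 i}\bigr)\right],$$

because $E(\varepsilon_{s_1 j}\cdots\varepsilon_{s_8 j}\,\varepsilon_{t_1 i}\cdots\varepsilon_{t_8 i}) = 0$ if more than eight of the subscripts $s_1, \ldots, s_8, t_1, \ldots, t_8$ differ. Finally, to bound $E[\operatorname{tr}(C_T^{-1})]^4$ uniformly, we recall $U \sim W_k(m, I_k)$ and use the fact that the $q$th moments of $U^{-1}$ exist if $m - k - 2q + 1 > 0$ (von Rosen, 1994). Consequently,

$$E\bigl[\operatorname{tr}\bigl(U^{-1}\bigr)\bigr]^4 \le c_2 < \infty \quad\text{for } m \ge k + 8, \tag{1.12}$$

so that an application of inequality (1.8) for $m = k + 8$ (assuming $T > m$) together with (1.9), (1.12) and the existence of all moments of $Z_{T,k}$ yields the desired result.

The proof of (b) is analogous to that of (a) and thus only sketched. First, the Cauchy–Schwarz inequality provides, using (1.5),

$$E\bigl(Z_{T,k}^4\bigr) \le E\Bigl\{\bigl[\operatorname{tr}\bigl(C_T^{-1}\bigr)\bigr]^4\bigl[\operatorname{tr}\bigl(D_T D_T'\bigr)\bigr]^4\Bigr\} \le \Bigl\{E\bigl[\operatorname{tr}\bigl(C_T^{-1}\bigr)\bigr]^8\,E\bigl[\operatorname{tr}\bigl(D_T D_T'\bigr)\bigr]^8\Bigr\}^{1/2}.$$

It is easy to see that $E[\operatorname{tr}(D_T D_T')]^8$ is uniformly bounded in $T$, since $\sup_T E(\alpha_{ij}^{16}) < \infty$. Finally, $E[\operatorname{tr}(C_T^{-1})]^8$ is uniformly bounded by choosing $m = k + 16$ and applying (1.8) together with (1.9), because then $E[\operatorname{tr}(U^{-1})]^8 \le c_3 < \infty$.

To prove (c), recall that $Z_{T,k}$ converges weakly to the asymptotic trace statistic $Z_k$ (Johansen, 1995). Thus the result follows if $\{Z_{T,k}^2\}$ is uniformly integrable (see Theorem A on p. 14 in Serfling, 1980), for which a sufficient condition is given by $\sup_T E|Z_{T,k}|^{2+\delta} < \infty$ for some $\delta > 0$. But this is an immediate consequence of part (b), completing the proof.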

ACKNOWLEDGMENTS

This research was partly supported by the Deutsche Forschungsgemeinschaft through the SFB 649 “Economic Risk”.

REFERENCES

Coope, I. D. (1994). On matrix trace inequalities and related topics for products of Hermitian matrices. Journal of Mathematical Analysis and Applications 188, 999–1001.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Larsson, R., J. Lyhagen and M. Löthgren (2001). Likelihood-based cointegration tests in heterogeneous panels. Econometrics Journal 4, 109–42.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. New York: John Wiley.
Rutherford, D. E. (1946). Some continuant determinants arising in physics and chemistry. Proceedings of the Royal Society of Edinburgh, Section A, 62, 229–36.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley.
von Rosen, D. (1994). On moments of the inverted Wishart distribution. Statistics 30, 259–78.




Corrigendum to ‘A Gaussian approach for continuous time models of short-term interest rates’ (Yu, J. and P. C. B. Phillips, Econometrics Journal, 4, 210–24)

PETER C. B. PHILLIPS†,‡,§,¶ AND JUN YU††

† Yale University, PO Box 208281, New Haven, CT 06520-8281, USA. E-mail: [email protected]

‡ University of Auckland, Private Bag 92019, Auckland 1142, New Zealand.

§ University of Southampton, University Road, Southampton, Hampshire SO17 1BJ, UK.

¶ Singapore Management University, 90 Stamford Road, Singapore 178903.

†† School of Economics and Sim Kee Boon Institute for Financial Economics, Singapore Management University, 90 Stamford Road, Singapore 178903. E-mail: [email protected]

First version received: May 2010; final version accepted: July 2010

This note corrects an error in Yu and Phillips (2001, hereafter YP) where a time transformation was used to induce Gaussian disturbances in the discrete time version of a continuous time model. The error occurs in equations (3.7)–(3.10) of YP where the Dambis–Dubins–Schwarz (hereafter DDS) theorem was applied to the quadratic variation of the error term in equation (3.6), $[M]_h$, in order to induce a sequence of stopping time points $\{t_j\}$ for which the disturbance term in (3.10) follows a normal distribution, facilitating Gaussian estimation. To apply the DDS theorem, the original error process, $M(h)$, needs to be a continuous martingale with finite quadratic variation. In YP, it was assumed that $M(h)$ was a continuous martingale. This note shows that the assumption is generally not warranted and so the DDS theorem does not induce a Brownian motion. However, a simple decomposition splits the error process into a trend component and a continuous martingale process. The DDS theorem can then be applied to the detrended error process, generating a Brownian motion residual. With the presence of the time-varying trend component, the discrete time model is heteroscedastic and the regressor is endogenous. The endogeneity is addressed using an instrumental variable procedure for parameter estimation. In addition, we show that the new stopping time sequence differs from that in YP by a term of $O(a^2)$, where $a$ is the pre-specified normalized timing constant. In the case where $a$ is often chosen to be the average variance whose value is small, the difference between the two stopping time sequences is likely small.

The discrete time model of the following (non-linear) continuous time model

$$dr(t) = (\alpha + \beta r(t))\,dt + \sigma r^{\gamma}(t)\,dB(t), \tag{1.1}$$

has the form

$$r(t+h) = \frac{\alpha}{\beta}\bigl(e^{\beta h} - 1\bigr) + e^{\beta h} r(t) + \int_0^h \sigma e^{\beta(h-\tau)} r^{\gamma}(t+\tau)\,dB(\tau), \tag{1.2}$$


where $B$ is standard Brownian motion. Let $M(h) = \sigma\int_0^h e^{\beta(h-\tau)} r^{\gamma}(t+\tau)\,dB(\tau)$. YP assumed that $M(h)$ is a continuous martingale with ‘quadratic variation’:

$$[M]^{*}_h = \sigma^2\int_0^h e^{2\beta(h-\tau)} r^{2\gamma}(t+\tau)\,d\tau. \tag{1.3}$$

Under this assumption, YP used the DDS theorem (see Revuz and Yor, 1999) to induce a Brownian motion to represent the process $M(h)$. That is, for any fixed ‘timing’ constant $a > 0$, YP set

$$h_{j+1} = \inf\bigl\{s \mid [M_j]^{*}_s \ge a\bigr\} = \inf\left\{s \;\Big|\; \sigma^2\int_0^s e^{2\beta(s-\tau)} r^{2\gamma}(t_j+\tau)\,d\tau \ge a\right\}, \tag{1.4}$$

and constructed a sequence of time points $\{t_j\}$ using the iterations $t_{j+1} = t_j + h_{j+1}$, leading to the following version of (1.2) evaluated at $\{t_j\}$:

$$r(t_{j+1}) = \frac{\alpha}{\beta}\bigl(e^{\beta h_{j+1}} - 1\bigr) + e^{\beta h_{j+1}} r(t_j) + M(h_{j+1}). \tag{1.5}$$

If the DDS theorem were applicable, then $M(h_{j+1}) = B(a) \sim N(0, a)$. Unfortunately, in general, $M(h)$ is NOT a continuous martingale. There is a trend factor in $M(h)$ and the quadratic variation calculation (1.3) in YP fails to take account of this factor. $M(h)$ is not a continuous martingale even when $\gamma = 0$. In this simple case, we have

$$M(h) = \sigma e^{\beta h}\int_0^h e^{-\beta s}\,dB(s),$$

which is an Ornstein–Uhlenbeck (OU) process satisfying $dM(h) = \beta M(h)\,dh + \sigma\,dB(h)$, whose quadratic variation process is

$$[M]_h = h\sigma^2 \neq \sigma^2 e^{2\beta h}\int_0^h e^{-2\beta s}\,ds.$$
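To see how far apart the two expressions are in the $\gamma = 0$ case, the integral in (1.3) can be evaluated in closed form and expanded for small $h$. This is a short worked illustration added here, assuming $\beta \neq 0$; it is not part of the original note.

```latex
\begin{align*}
[M]^{*}_h &= \sigma^2 e^{2\beta h}\int_0^h e^{-2\beta s}\,ds
           = \sigma^2\,\frac{e^{2\beta h}-1}{2\beta}
           = \sigma^2 h + \beta\sigma^2 h^2 + O(h^3), \\
[M]_h     &= \sigma^2 h, \\
[M]^{*}_h - [M]_h &= \beta\sigma^2 h^2 + O(h^3).
\end{align*}
```

The two quantities agree to first order in $h$ and differ at second order, with the sign of the gap determined by $\beta$.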

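To adjust for the drift in the residual of (1.2), let

$$M(h) = \sigma\int_0^h e^{\beta(h-s)} r^{\gamma}(s)\,dB(s) = e^{\beta h}\,\sigma\int_0^h e^{-\beta s} r^{\gamma}(s)\,dB(s) = e^{\beta h} H(h),$$

where $H(h) := \sigma\int_0^h e^{-\beta s} r^{\gamma}(s)\,dB(s)$ is a continuous martingale. Then $M(t)$ follows the process

$$dM(t) = \beta M(t)\,dt + e^{\beta t}\,dH(t) = \beta M(t)\,dt + \sigma r^{\gamma}(t)\,dB(t),$$

with

$$d[H]_t = \sigma^2 e^{-2\beta t} r^{2\gamma}(t)\,dt \quad\text{and}\quad d[M]_t = \sigma^2 r^{2\gamma}(t)\,dt.$$

Hence, instead of (1.3), the actual quadratic variation of $M$ is

$$[M]_h = \sigma^2\int_0^h r^{2\gamma}(t+s)\,ds.$$

The equation of interest is

$$r(t) = \left[r(0) + \frac{\alpha}{\beta}\right]e^{\beta t} - \frac{\alpha}{\beta} + e^{\beta t} H(t),$$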

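so that

$$\begin{aligned}
r(t+h) &= \left[r(0) + \frac{\alpha}{\beta}\right]e^{\beta(t+h)} - \frac{\alpha}{\beta} + e^{\beta(t+h)} H(t+h) \\
&= e^{\beta h} r(t) + \frac{\alpha}{\beta}\bigl(e^{\beta h} - 1\bigr) + e^{\beta(t+h)} H(t+h) - e^{\beta(t+h)} H(t) \\
&= e^{\beta h} r(t) + \frac{\alpha}{\beta}\bigl(e^{\beta h} - 1\bigr) + e^{\beta(t+h)}\bigl(H(t+h) - H(t)\bigr) \\
&= e^{\beta h} r(t) + \frac{\alpha}{\beta}\bigl(e^{\beta h} - 1\bigr) + e^{\beta(t+h)}\,\sigma\int_t^{t+h} e^{-\beta s} r^{\gamma}(s)\,dB(s) \\
&= e^{\beta h} r(t) + \frac{\alpha}{\beta}\bigl(e^{\beta h} - 1\bigr) + e^{\beta h}\,\sigma\int_0^{h} e^{-\beta p} r^{\gamma}(t+p)\,dB(t+p).
\end{aligned}$$

Now

$$Q_t(h) = \sigma\int_0^h e^{-\beta p} r^{\gamma}(t+p)\,dB(t+p)$$

is a continuous martingale with $dQ_t(h) = e^{-\beta h}\sigma r^{\gamma}(t+h)\,dB(t+h)$ and $d[Q_t]_h = e^{-2\beta h}\sigma^2 r^{2\gamma}(t+h)\,dh$. Applying the DDS theorem to $Q_t$ with timing constant $a$ so that

$$h_{j+1} = \inf\bigl\{s : [Q_{t_j}]_s \ge a\bigr\} = \inf\left\{s : \sigma^2\int_0^s e^{-2\beta p} r^{2\gamma}(t_j+p)\,dp \ge a\right\}, \tag{1.6}$$

we have

$$r(t_{j+1}) = e^{\beta h_{j+1}} r(t_j) + \frac{\alpha}{\beta}\bigl(e^{\beta h_{j+1}} - 1\bigr) + e^{\beta h_{j+1}} Q_{t_j}(h_{j+1}),$$

which has Gaussian $N(0, a)$ innovations and where $t_{j+1} = t_j + h_{j+1}$. However, the step size and stopping times $h_{j+1}$ are endogenous. As a result, the ordinary least squares or weighted least squares procedures are inconsistent. To consistently estimate $\alpha$ and $\beta$, we note that $(1, r(t_j))$ is a valid instrument. The estimating equations are

$$\sum_j\left[e^{-\beta h_{j+1}} r(t_{j+1}) - \frac{\alpha}{\beta}\bigl(1 - e^{-\beta h_{j+1}}\bigr) - r(t_j)\right] r(t_j) = 0 \tag{1.7}$$

and

$$\sum_j\left[e^{-\beta h_{j+1}} r(t_{j+1}) - \frac{\alpha}{\beta}\bigl(1 - e^{-\beta h_{j+1}}\bigr) - r(t_j)\right] = 0. \tag{1.8}$$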
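The step sizes entering (1.7) and (1.8) come from the stopping rule (1.6), which can be contrasted with the rule (1.4) on a discretized path. The sketch below is only an illustration under stated assumptions: the function name `stopping_step`, the left Riemann sum on an equally spaced grid of width `dt`, and the `rule` switch are all choices made for this example, not something specified in the note.

```python
import numpy as np


def stopping_step(r_path, dt, sigma, beta, gamma, a, rule="corrected"):
    """First grid time s at which the accumulated variation reaches a.

    r_path[l] is assumed to hold r(t_j + l*dt). Returns None if the level a
    is not reached within the supplied path.
    """
    r_path = np.asarray(r_path, dtype=float)
    p = np.arange(len(r_path)) * dt                  # left end points of the grid cells
    weighted = sigma**2 * r_path ** (2 * gamma) * np.exp(-2 * beta * p)
    integral = np.cumsum(weighted) * dt              # int_0^s e^{-2 beta p} sigma^2 r^{2 gamma} dp
    s = p + dt                                       # right end point matching each partial sum
    if rule == "corrected":
        level = integral                             # [Q_{t_j}]_s as in (1.6)
    elif rule == "yp":
        level = np.exp(2 * beta * s) * integral      # [M_j]*_s as in (1.4)
    else:
        raise ValueError("rule must be 'corrected' or 'yp'")
    hit = np.nonzero(level >= a)[0]
    return None if hit.size == 0 else float(s[hit[0]])


if __name__ == "__main__":
    # Constant path r = 1, so both rules have closed-form solutions.
    sigma, beta, gamma, a, dt = 1.0, -0.5, 0.5, 0.01, 1e-5
    path = np.ones(2000)
    h_new = stopping_step(path, dt, sigma, beta, gamma, a, rule="corrected")
    h_yp = stopping_step(path, dt, sigma, beta, gamma, a, rule="yp")
    print(h_new, -np.log(1 - 2 * beta * a / sigma**2) / (2 * beta))
    print(h_yp, np.log(1 + 2 * beta * a / sigma**2) / (2 * beta))
    print(h_new - h_yp, 2 * beta * a**2 / sigma**4)
```

On a constant path the two step sizes differ by approximately $2\beta a^2/\sigma^4$, which is consistent with the $O(a^2)$ difference between the two stopping time sequences stated at the start of the note.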

Solving these two equations for $\alpha$ and $\beta$ yields the instrumental variable (IV) estimators of $(\alpha, \beta)$, which we denote as $(\hat\alpha, \hat\beta)$. The analytic expression for $\hat\alpha$ is

$$\hat\alpha = \hat\beta\;\frac{\sum_j\left[e^{-\hat\beta\hat h_{j+1}}\, r(\hat t_{j+1}) - r(\hat t_j)\right]}{\sum_j\left(1 - e^{-\hat\beta\hat h_{j+1}}\right)},$$

and $\hat\beta$ is obtained by numerically solving the following equation:

$$\sum_j\left(1 - e^{-\hat\beta\hat h_{j+1}}\right)\sum_j\left[e^{-\hat\beta\hat h_{j+1}}\, r(\hat t_{j+1}) - r(\hat t_j)\right] r(\hat t_j) - \sum_j\left(1 - e^{-\hat\beta\hat h_{j+1}}\right) r(\hat t_j)\sum_j\left[e^{-\hat\beta\hat h_{j+1}}\, r(\hat t_{j+1}) - r(\hat t_j)\right] = 0,$$

where $\hat h = h(\hat\beta)$ and $\hat t_j = t_j(\hat\beta)$.
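The two IV equations reduce to one scalar equation in $\beta$ once $\alpha/\beta$ is concentrated out, which is what the expressions above encode. A minimal sketch of the two pieces follows. It makes a simplifying assumption that is not in the note: the step sizes `h` and the sampled values `r0` (for $r(t_j)$) and `r1` (for $r(t_{j+1})$) are treated as fixed arrays, whereas in the procedure described above they depend on $\beta$ through (1.6) and would have to be recomputed for each trial value. The function names are hypothetical.

```python
import numpy as np


def iv_moment(beta, r0, r1, h):
    """Concentrated IV estimating function in beta; zero at the IV estimate.

    r0[j] = r(t_j), r1[j] = r(t_{j+1}), h[j] = h_{j+1}, all held fixed here.
    """
    u = 1.0 - np.exp(-beta * h)            # 1 - exp(-beta h_{j+1})
    w = np.exp(-beta * h) * r1 - r0        # exp(-beta h_{j+1}) r(t_{j+1}) - r(t_j)
    return u.sum() * (w * r0).sum() - (u * r0).sum() * w.sum()


def alpha_given_beta(beta, r0, r1, h):
    """Analytic alpha implied by (1.8) for a given beta."""
    u = 1.0 - np.exp(-beta * h)
    w = np.exp(-beta * h) * r1 - r0
    return beta * w.sum() / u.sum()


if __name__ == "__main__":
    # Noise-free data generated from the discrete model, so the true
    # parameters satisfy both estimating equations exactly.
    alpha, beta = 0.3, -0.5
    h = np.array([0.10, 0.20, 0.15, 0.30])
    r0 = np.array([1.0, 0.8, 1.2, 0.9])
    r1 = np.exp(beta * h) * r0 + alpha / beta * (np.exp(beta * h) - 1.0)
    print(iv_moment(beta, r0, r1, h))          # approximately zero
    print(alpha_given_beta(beta, r0, r1, h))   # approximately alpha
```

Note that `iv_moment` also vanishes trivially at `beta = 0`, so any scalar root-finder applied to it should be started from a bracket that excludes zero.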

ACKNOWLEDGMENTS

We would like to thank Joon Park for bringing to our attention an error in Yu and Phillips (2001) and for helpful discussion on the same issue, and Minchul Shin, the editor and a referee for helpful comments. Phillips gratefully acknowledges support from the NSF under Grant Nos. SES 06-47086 and SES 09-56687.

REFERENCES

Revuz, D. and M. Yor (1999). Continuous Martingales and Brownian Motion. New York: Springer.
Yu, J. and P. C. B. Phillips (2001). A Gaussian approach for continuous time models of short-term interest rates. The Econometrics Journal 4, 210–24.

