Econometrics Journal (2009), volume 12, pp. Ci–Ciii. doi: 10.1111/j.1368-423X.2009.00298.x
Royal Economic Society Annual Conference 2008 Special Issue on Financial Econometrics
EDITORIAL

The papers in this Special Issue on Financial Econometrics arise out of the invited presentations given in The Econometrics Journal Special Session on this topic at the Royal Economic Society Annual Conference held 17–19 March 2008 at the University of Warwick. The organization of Special Sessions on subjects of current interest and importance at Royal Economic Society Annual Conferences is an initiative of the Editorial Board of The Econometrics Journal to enhance further the profile and reputation of the journal. The Editorial Board is responsible for the choice of topic and organization of the Special Session. The intention is, by judicious choice of topics and speakers, to encourage further a higher standard of submissions to The Econometrics Journal. The 2008 Special Session on Financial Econometrics was organized by Jianqing Fan and Richard J. Smith, Co-Editor and Managing Editor of The Econometrics Journal respectively, with Jianqing Fan overseeing the editorial process for the submitted papers arising from the Special Session. Of course, in a field as diverse as financial econometrics, and owing to the time constraint imposed by the Special Session, the specific topics considered are necessarily restrictive, but hopefully they provide an impression of a few of the current frontiers of financial econometrics.

From the original submissions in May 2008 to the completion of the editorial process for the papers in July 2009, financial markets have undergone the worst crisis since the Great Depression. The worldwide economy has experienced an unprecedented credit and liquidity crunch and market volatility, which have led to a global recession and a surge in unemployment rates. A number of important questions naturally arise. How do asset prices behave and correlate under extremely stressful market conditions? How can extreme events be better modelled and understood? How are portfolio risks more realistically assessed and managed? How should existing financial econometric models be revised to better capture financial risks? This unfortunate crisis provides financial econometricians with great opportunities but also with significant challenges. It provides an enormous amount of data on financial markets under crisis that would otherwise be impossible to collect. It is our hope that those data, together with financial econometric techniques, will enable us to gain a better understanding of the structural problems underlying the current crisis, to improve the financial system and to build liquid and sound capital markets. We certainly hope that the publication of these three invited articles contributes to these efforts.

Innovation in information technology allows us to collect high-frequency data for a host of financial instruments, at scales as fine as individual bids to buy and sell and the full distributions of such bids. Challenges for the analysis of high-frequency data include handling microstructure noise and non-synchronized trading. In the paper by Ole Barndorff-Nielsen, Peter Hansen, Asger Lunde and Neil Shephard, realized kernels are introduced to estimate the daily volatility of the prices of an individual asset. The emphasis is on the practical utility of such
methods. The details of practical implementation of such methods, including procedures for cleaning high-frequency data, handling edge effects and choosing tuning parameters, are clearly and carefully provided. Insights are given into the choice of methods as well as their associated tuning parameters. The sensitivity of the estimates to various choices of parameters is carefully studied. The estimates based on trade and quote data are found to display a remarkable level of agreement. The features that are challenging for the use of realized kernels are identified and explained. The article gives a very comprehensive view of the current state of the art in the analysis of high-frequency data for estimating daily volatility. High-frequency data have many other important applications, such as detecting and locating jumps in stock prices, estimating volatility matrices, understanding market microstructure, estimating the probability of extreme events, portfolio selection and risk assessment, among others. Many of these cannot be covered in this paper, but references are provided for the interested reader.

Understanding the dynamics of the term structure is of paramount importance to investments in fixed income securities and to decisions at central banks. The Svensson generalization of the three-factor Nelson–Siegel model of the term structure has been widely used in practice, but is unfortunately not arbitrage-free in its dynamic form. An important contribution of the paper by Jens Christensen, Francis Diebold and Glenn Rudebusch is to propose an arbitrage-free extension of the Svensson four-factor model. To achieve this, the authors first observe, insightfully and importantly, that the Nelson–Siegel model can be derived from a specific form of the three-factor affine model, which guarantees that it is arbitrage-free. This motivates an elegant derivation of a term structure model, called the arbitrage-free generalized Nelson–Siegel model, from a specific form of the five-factor affine model. The resulting term structure model is an elegant generalization of the Svensson model and, critically, as noted above, it is arbitrage-free. The five-factor model and associated derivations are remarkable contributions of the paper. Given three models of the term structure (Nelson–Siegel, Svensson and its arbitrage-free generalization), a simple way to obtain a dynamic model is to assume that the factors independently follow an autoregressive model of order 1. For the Nelson–Siegel and the arbitrage-free generalized Nelson–Siegel models, it is also viable to use dynamic models derived from their respective affine models. Taken together, these give a total of five dynamic models for the term structure. Statistical techniques are carefully outlined for estimating the unknown parameters in these five dynamic term structure models, all of which are very tractable. The models and the fitting techniques are illustrated and compared in a careful and extensive study of 16 U.S. Treasury security yields from 1987 to December 2002. It is concluded convincingly that the Nelson–Siegel curve and its associated dynamic model have trouble fitting long-maturity yields, that the popular Svensson extension improves the long-maturity fit but admits arbitrage opportunities, and that its natural generalization, the five-factor dynamic arbitrage-free generalized Nelson–Siegel model, is arbitrage-free, fits the term structure well at all maturities and is easily tractable.
Ever since Markowitz laid down the landmark theory of the mean–variance optimal portfolio, the idea and technique have been widely adapted and used in practice, and it has become the cornerstone of modern investment theory. Is a given portfolio efficient? Do we benefit from diversification? Are the celebrated Capital Asset Pricing Model and Arbitrage Pricing Theory consistent with market data? These questions have attracted financial econometricians and practitioners for nearly 40 years, with many exciting developments and insights emerging over the course of the area's development. The article by Enrique Sentana provides a comprehensive and elegant overview of the econometrics of mean–variance efficiency tests. Starting with the classical Wald, Wilks and F-tests based on a Gaussian assumption on returns, and their robust versions based on the generalized method of moments, the effect of the number
of assets and the composition of portfolios on the power of such tests is critically analyzed. The paper then introduces asymptotically equivalent tests based on portfolio weights, investigates the trade-offs between the efficiency of tests and the robustness of parametric and semi-parametric likelihood procedures, and reviews the results for exact finite-sample tests. The paper concludes nicely with a thorough discussion of mean–variance–skewness efficiency and spanning tests and with an outline of a number of topics not addressed in the paper. The survey focuses primarily on statistical techniques rather than the empirical findings in the literature, and provides financial econometricians with a comprehensive overview and extensive survey of the literature on the subject.

We would like to take this opportunity to thank all the authors for responding to our request for a contribution to this Special Issue with these three timely papers. Special appreciation is owed to the referees of the three aforementioned papers, listed below. Without their assistance, this Special Issue would not have been possible.

F. Bandi
R. Kan
R. Kimmel
H. Li
Y. Li
E. Renault
Jianqing Fan (Co-Editor)
The Econometrics Journal
Bendheim Center for Finance
Princeton University
Princeton, NJ 08544
United States

Richard J. Smith (Managing Editor)
The Econometrics Journal
Faculty of Economics
University of Cambridge
Austin Robinson Building
Cambridge CB3 9DD
United Kingdom
Econometrics Journal (2009), volume 12, pp. C1–C32. doi: 10.1111/j.1368-423X.2008.00275.x
Realized kernels in practice: trades and quotes

O. E. BARNDORFF-NIELSEN†, P. REINHARD HANSEN‡, A. LUNDE§ AND N. SHEPHARD¶

†The T.N. Thiele Centre for Mathematics in Natural Science, Department of Mathematical Sciences, and CREATES, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark
E-mail: [email protected]
‡Department of Economics, Stanford University, Landau Economics Building, 579 Serra Mall, Stanford, CA 94305-6072, USA
E-mail: [email protected]
§Department of Marketing and Statistics, Aarhus School of Business, and CREATES, University of Aarhus, Bartholins Allé 10, DK-8000 Aarhus C, Denmark
E-mail: [email protected]
¶Oxford-Man Institute, and Department of Economics, University of Oxford, Eagle House, Walton Well Road, Oxford OX2 6ED, UK
E-mail: [email protected]

First version received: May 2008; final version accepted: November 2008
Summary  Realized kernels use high-frequency data to estimate the daily volatility of individual stock prices. They can be applied to either trade or quote data. Here we provide the details of how we suggest implementing them in practice. We compare the estimates based on trade and quote data for the same stock and find a remarkable level of agreement. We identify some features of the high-frequency data which are challenging for realized kernels: local trends in the data, over periods of around 10 minutes, during which prices and quotes are driven up or down. These can be associated with high volumes. One explanation for this is that they are due to non-trivial liquidity effects.

Keywords: HAC estimator, Long run variance estimator, Market frictions, Quadratic variation, Realized variance.
1. INTRODUCTION

The class of realized kernel estimators, introduced by Barndorff-Nielsen et al. (2008a), can be used to estimate the quadratic variation of an underlying efficient price process from high-frequency noisy data. This method, together with alternative techniques such as subsampling and pre-averaging, extends the influential realized variance literature, which has recently been shown to significantly improve our understanding of time-varying volatility and our ability to predict future volatility—see Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002) and the reviews of that literature by, for example, Andersen et al. (2008) and Barndorff-Nielsen
and Shephard (2007).¹ In this paper, we detail the implementation of our recommended realized kernel estimator in practice, focusing on end effects, bandwidth selection and data cleaning across different types of financial databases. We place emphasis on methods that deliver similar estimates of volatility when applied to either quote data or trade data. This is difficult, as the two have very different microstructure properties. We show that realized kernels perform well on this test. We identify a feature of some data sets which causes these methods difficulties: gradual jumps. These are rare in financial markets; they occur when prices exhibit strong linear trends for periods of quite a few minutes. We discuss this issue at some length.

In order to focus on the core issue, we represent the period over which we wish to measure the variation of asset prices as the single interval $[0, T]$. We consider the case where $Y$ is a Brownian semimartingale plus jump process (BSMJ) given by

$$Y_t = \int_0^t a_u \, du + \int_0^t \sigma_u \, dW_u + J_t, \qquad (1.1)$$

where $J_t = \sum_{i=1}^{N_t} C_i$ is a finite-activity jump process (meaning it has a finite number of jumps in any bounded interval of time). So $N_t$ counts the number of jumps that have occurred in the interval $[0, t]$, and $N_t < \infty$ for any $t$. We assume that $a$ is a predictable locally bounded drift, $\sigma$ is a càdlàg volatility process and $W$ is a Brownian motion, all adapted to some filtration $\mathcal{F}$. For reviews of the econometrics of processes of the type $Y$ see, for example, Shephard (2005).

Our object of interest is the quadratic variation of $Y$,
$$[Y] = \int_0^T \sigma_u^2 \, du + \sum_{i=1}^{N_T} C_i^2,$$

where $\int_0^T \sigma_u^2 \, du$ is the integrated variance. We estimate it from the observations $X_{\tau_0}, \ldots, X_{\tau_n}$, recorded at times $0 = \tau_0 < \tau_1 < \cdots < \tau_n = T$, where $X_{\tau_j}$ is a noisy observation of $Y_{\tau_j}$:

$$X_{\tau_j} = Y_{\tau_j} + U_{\tau_j}.$$

We initially think of $U$ as noise and assume $E(U_{\tau_j}) = 0$ and $\mathrm{Var}(U_{\tau_j}) = \omega^2$. It can be due to, for example, liquidity effects, bid/ask bounce and misrecording. Specific models for $U$ have been suggested in this context by, for example, Zhou (1996), Hansen and Lunde (2006), Li and Mykland (2007), and Diebold and Strasser (2007). We will write $U \in \mathcal{WN}$ to denote the case where $(U_{\tau_0}, \ldots, U_{\tau_n})$ are mutually independent and jointly independent of $Y$. There has been substantial recent interest in learning about the integrated variance and the quadratic variation in the presence of noise. Leading references include Zhou (1996), Andersen et al. (2000), Bandi and Russell (2008), Hansen and Lunde (2006), Zhang et al. (2005), Zhang (2006), Kalnina and Linton (2008), Jacod et al. (2007), Fan and Wang (2007), and Barndorff-Nielsen et al. (2008a).
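To fix ideas, the following minimal simulation sketch (our illustration, not from the paper; all parameter values are arbitrary) generates a discretized path of the BSMJ model (1.1), contaminates it with white noise satisfying $U \in \mathcal{WN}$, and reports the quadratic variation $[Y]$ that the estimators below target.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 23400, 1.0                      # one 6.5-hour day of 1-second observations
dt = T / n
t = np.linspace(0.0, T, n + 1)

sigma = 0.2 * (1.0 + 0.5 * np.cos(2 * np.pi * t[:-1]))   # illustrative volatility path
a = 0.05                                                  # constant (locally bounded) drift

# Efficient price Y: dY = a dt + sigma dW + jumps (finite-activity compound Poisson)
dW = rng.normal(0.0, np.sqrt(dt), n)
jump_flags = rng.random(n) < (5.0 / n)                    # ~5 jumps expected per day
jumps = jump_flags * rng.normal(0.0, 0.01, n)
Y = np.concatenate(([0.0], np.cumsum(a * dt + sigma * dW + jumps)))

# Observed price X = Y + U, with U i.i.d. noise of variance omega^2 (U in WN)
omega2 = 1e-6
X = Y + rng.normal(0.0, np.sqrt(omega2), n + 1)

# The estimand: quadratic variation = integrated variance + sum of squared jumps
QV = np.sum(sigma**2 * dt) + np.sum(jumps**2)
print(f"[Y] = {QV:.6f}")
```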
¹ Leading references on this include Zhang et al. (2005), Zhang (2006) and Jacod et al. (2007).
Our recommended way of carrying out estimation based on realized kernels is spelt out in Barndorff-Nielsen et al. (2008b). Their non-negative estimator takes the following form:

$$K(X) = \sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right)\gamma_h, \qquad \gamma_h = \sum_{j=|h|+1}^{n} x_j x_{j-|h|}, \qquad (1.2)$$

where $k(x)$ is a kernel weight function. We focus on the Parzen kernel, because it satisfies the smoothness conditions $k'(0) = k'(1) = 0$ and is guaranteed to produce a non-negative estimate.² The Parzen kernel function is given by

$$k(x) = \begin{cases} 1 - 6x^2 + 6x^3, & 0 \le x \le 1/2, \\ 2(1-x)^3, & 1/2 \le x \le 1, \\ 0, & x > 1. \end{cases}$$

Here $x_j$ is the $j$th high-frequency return, calculated over the interval $\tau_{j-1}$–$\tau_j$ in a way that is detailed in Section 2.2. The method by which these returns are calculated is not trivial, for the accuracy and depth of data cleaning is important, as is the influence of end conditions. This realized kernel has broadly the same form as a standard heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator familiar in econometrics (e.g. Andrews, 1991) but, unlike them, the statistic is not normalized by the sample size. This makes its analysis more subtle and the influence of end effects theoretically important.
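For concreteness, here is a short sketch (our own illustration, not the authors' code) of the estimator in (1.2) with the Parzen weight function; `returns` is assumed to be the vector of cleaned high-frequency returns $x_1, \ldots, x_n$ constructed as in Section 2.2.

```python
import numpy as np

def parzen(x: float) -> float:
    """Parzen kernel weight; smooth, with k'(0) = k'(1) = 0."""
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x)**3
    return 0.0

def realized_kernel(returns: np.ndarray, H: int) -> float:
    """Non-negative realized kernel K(X) of eq. (1.2)."""
    x = np.asarray(returns, dtype=float)
    n = x.size

    def gamma(h: int) -> float:
        # gamma_h = sum_{j=|h|+1}^n x_j x_{j-|h|}; gamma_{-h} = gamma_h by symmetry
        h = abs(h)
        return float(np.dot(x[h:], x[:n - h]))

    return sum(parzen(abs(h) / (H + 1.0)) * gamma(h) for h in range(-H, H + 1))
```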
Barndorff-Nielsen et al. (2008b) show that, as $n \to \infty$, if $K(U) \xrightarrow{p} 0$ and $K(Y) \xrightarrow{p} [Y]$, then

$$K(X) \xrightarrow{p} [Y] = \int_0^T \sigma_u^2 \, du + \sum_{i=1}^{N_T} C_i^2.$$

The dependence between $U$ and $Y$ is asymptotically irrelevant. They need $H$ to increase with $n$ in order to eliminate the noise in such a way that $K(U) \xrightarrow{p} 0$. With $H \propto n^{\eta}$, we will need $\eta > 1/3$ to eliminate the variance and $\eta > 1/2$ to eliminate the bias of $K(U)$ when $U \in \mathcal{WN}$.³ For $K(Y) \xrightarrow{p} [Y]$, we simply need $\eta < 1$. Barndorff-Nielsen et al. (2008b) show that $H \propto n^{3/5}$ is the best trade-off between asymptotic bias and variance.⁴ Their preferred choice of bandwidth is

$$H^* = c^* \xi^{4/5} n^{3/5}, \qquad \text{with } c^* = \left(\frac{k''(0)^2}{k_{\bullet}^{0,0}}\right)^{1/5} \text{ and } \xi^2 = \frac{\omega^2}{\sqrt{T \int_0^T \sigma_u^4 \, du}}, \qquad (1.3)$$

² The more famous Bartlett kernel has $k(x) = 1 - |x|$ for $|x| \le 1$. This kernel is used in the Newey and West (1987) estimator. The Bartlett kernel will not produce a consistent estimator in the present context. The reason is that we need both $k(0) - k(1/H) = o(1)$ and $H/n = o(1)$, which is not possible with the Bartlett kernel.
³ This assumes a smooth kernel, such as the Parzen kernel. If we use a 'kinked' kernel, such as the Bartlett kernel, then we need $\eta > 1/2$ to eliminate the variance and the impractical requirement that $H/n \to \infty$ in order to eliminate the bias. Flat-top realized kernels are unbiased and converge at a faster rate, but are not guaranteed to be non-negative. The latter point is crucial in the multivariate case. In the univariate case, having a non-negative estimator is attractive, but the flat-top kernel is only rarely negative with modern data. However, if $[Y]$ is very small and $\omega^2$ very large, which we saw on slow days on the NYSE when the tick size was $1/8, then it can happen quite often when the flat-top realized kernel is used. Of course, our non-negative realized kernels do not have this problem. We are grateful to Kevin Sheppard for pointing out these 'negative' days.
⁴ This means that $K(X) \xrightarrow{p} [Y]$ at rate $n^{1/5}$, which is not the optimal rate obtained by Barndorff-Nielsen et al. (2008a) and Zhang (2006), but has the virtue of $K(X)$ being non-negative with probability one, which is generally not the case for the other estimators available in the literature.
where $c^* = (12^2/0.269)^{1/5} = 3.5134$ for the Parzen kernel. The bandwidth $H^*$ depends on the unknown quantities $\omega^2$ and $\int_0^T \sigma_u^4 \, du$, the latter of which is called the integrated quarticity. In the next section, we define an estimator of $\xi$, which leads to a bandwidth $\hat H^* = c^* \hat\xi^{4/5} n^{3/5}$ that can be implemented in practice.

Although the assumption that $U \in \mathcal{WN}$ is a strong one, it is not needed for consistency. Previously, $K(U) \xrightarrow{p} 0$ has been shown under quite wide conditions, allowing, for example, $U$ to be a weakly dependent covariance stationary process. The realized kernel estimator in (1.2) is robust to serial dependence in $U$ and can therefore be applied to the entire database of high-frequency prices. In comparison, Barndorff-Nielsen et al. (2008a) applied the flat-top realized kernel to prices sampled approximately once per minute, in order not to be in obvious violation of $U \in \mathcal{WN}$—an assumption on which the flat-top realized kernel estimator is based.

The structure of the paper is as follows. In Section 2, we discuss the selection of the bandwidth $H$ and the important role of end effects for these statistics. This is followed by Section 3, which describes the data used in our analysis and the data cleaning we employed. We then present our data analysis in Section 4, suggesting there are some days where our methods are really challenged, while on most days the analysis is quite successful. Overall, we produce the empirically important result that realized kernels applied to quote and trade data produce very similar results; hence applied workers can use these methods on either type of data source with some comfort. The paper closes with a conclusion in Section 5.
2. PRACTICAL IMPLEMENTATION

2.1. Bandwidth selection in practice

Initially, Barndorff-Nielsen et al. (2008a) studied flat-top, unbiased realized kernels, but their flat-top estimator is not guaranteed to be non-negative. This work has been extended to the non-negative realized kernels (1.2) by Barndorff-Nielsen et al. (2008b), and it is their results we use here. Their optimal bandwidth depends on the unknown parameters $\omega^2$ and $\int_0^T \sigma_u^4 \, du$ through $\xi$, as spelt out in (1.3). We estimate $\xi$ very simply by

$$\hat\xi^2 = \hat\omega^2 \big/ \widehat{IV},$$

where $\hat\omega^2$ is an estimator of $\omega^2$ and $\widehat{IV}$ is a preliminary estimate of $IV = \int_0^T \sigma_u^2 \, du$. The latter is motivated by the fact that it is not essential to use a consistent estimator of $\xi$: $IV^2 \approx T \int_0^T \sigma_u^4 \, du$ when $\sigma_u^2$ does not vary much over the interval $[0, T]$, and it is far easier to obtain a precise estimate of $IV$ than of $T \int_0^T \sigma_u^4 \, du$.⁵

⁵ Consider, for instance, the simple case without noise and $T = 1$, where $\sum_j y_j^2$ is consistent for $IV$ and $\frac{n}{3}\sum_i y_i^4$ is consistent for $\int \sigma_u^4 \, du$. With constant volatility, the asymptotic variances of these two estimators are $2\sigma^4$ and $\frac{8}{3}\sigma^4$, respectively. Further, the latter estimator is more sensitive to noise.

In our implementation we use $\widehat{IV} = RV_{\text{sparse}}$, which is a subsampled realized variance based on 20-minute returns. More precisely, we compute a total of 1200 realized variances by shifting the time of the first observation in 1-second
increments. $RV_{\text{sparse}}$ is simply the average of these estimators.⁶ This is a reasonable starting point, because market microstructure effects have negligible impact on the realized variance at this frequency.⁷

To estimate $\omega^2$, we compute the realized variance using every $q$th trade or quote. By varying the starting point, we obtain $q$ distinct realized variances, $RV^{(1)}_{\text{dense}}, \ldots, RV^{(q)}_{\text{dense}}$, say. Next we compute

$$\hat\omega^2_{(i)} = \frac{RV^{(i)}_{\text{dense}}}{2 n_{(i)}}, \qquad i = 1, \ldots, q,$$

where $n_{(i)}$ is the number of non-zero returns that were used to compute $RV^{(i)}_{\text{dense}}$. Finally, our estimate of $\omega^2$ is the average of these $q$ estimates,

$$\hat\omega^2 = \frac{1}{q} \sum_{i=1}^{q} \hat\omega^2_{(i)}.$$

For the case $q = 1$, this estimator was first proposed by Bandi and Russell (2008) and Zhang et al. (2005). The reason we choose $q > 1$ is robustness. For $\hat\omega^2_{(i)}$ to be a sensible estimator of $E(U_\tau^2)$, it is important that $E(U_{\tau_j} U_{\tau_{j+q}}) = 0$. There is overwhelming evidence against this assumption when $q = 1$, particularly for quote data; see Hansen and Lunde (2006) and the figures presented later in this paper. So we choose $q$ such that every $q$th observation is, on average, 2 minutes apart. On a typical day in our empirical analysis in Section 4, we have $q \approx 25$ for transaction data and $q \approx 70$ for mid-quote data. These values for $q$ are deemed sufficient for $E(U_{\tau_j} U_{\tau_{j+q}}) = 0$ to be a sensible assumption.

Another issue in using $RV^{(i)}_{\text{dense}}/(2 n_{(i)})$ as an estimator of $\omega^2$ is the implicit assumption that $\omega^2$ is large relative to $[Y]/(2 n_{(i)})$. This problem was first emphasized by Hansen and Lunde (2006), who showed that the variance of the noise is very small after decimalisation, in particular for actively traded assets, where they found $\omega^2 \approx 0.001 \cdot [Y]$. The main reason is that decimalisation has reduced some of the main sources of the noise, $U$, such as the magnitude of 'rounding errors' in the observed prices and the bid–ask bounces in transaction prices. So our estimator $\hat\omega^2$ is likely to be upward biased, which results in a conservative choice of bandwidth parameter. But there are a couple of advantages to using a conservative value of $H$. One is that too small a value of $H$ will, in theory, cause more harm than too large a value; another is that a larger value of $H$ increases the robustness of the realized kernel to serial dependence in $U_\tau$. So, in our empirical analysis we use the expression $\hat H = 3.5134 \, \hat\xi^{4/5} n^{3/5}$ to choose the bandwidth parameter for the realized kernel estimator based on the Parzen kernel function.

⁶ The initial two-scale estimator of Zhang et al. (2005) takes this type of average RV statistic and subtracts a positive multiple of a non-negative estimator of $\omega^2$, to try to bias-adjust for the presence of noise (assuming $Y \perp\!\!\!\perp U$). Hence this two-scale estimator must be below the average RV statistic. This makes it unsuitable, by construction, for mid-quote data, where RV is typically below the integrated variance due to its particular form of noise. Their bias-corrected two-scale estimator is re-normalized and so may be useful in this context.
⁷ $RV_{\text{sparse}}$ was suggested by Zhang et al. (2005); it has a smaller sampling variance than a single RV statistic and is more objective, for it does not depend upon the arbitrary choice of where to start computing the returns.
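The bandwidth selection just described can be condensed into a few lines. The sketch below is a simplified illustration under our reading of the text (the function and variable names are ours): `prices` holds one day of cleaned log-prices and `rv_sparse` the subsampled 20-minute realized variance used as $\widehat{IV}$.

```python
import numpy as np

def bandwidth_parzen(prices: np.ndarray, q: int, rv_sparse: float) -> int:
    """Plug-in bandwidth H^ = c* xihat^{4/5} n^{3/5} for the Parzen realized kernel.

    prices    : cleaned tick-by-tick log-prices for one day
    q         : skip so that every q-th observation is ~2 minutes apart
    rv_sparse : subsampled 20-minute realized variance (the preliminary IV estimate)
    """
    c_star = (12**2 / 0.269) ** 0.2            # = 3.5134 for the Parzen kernel
    # omega^2: average over the q offset grids of RV_dense / (2 n_i),
    # with n_i the number of non-zero returns on grid i
    estimates = []
    for i in range(q):
        r = np.diff(prices[i::q])
        n_i = np.count_nonzero(r)
        if n_i > 0:
            estimates.append(np.sum(r**2) / (2.0 * n_i))
    omega2 = float(np.mean(estimates))
    xi2 = omega2 / rv_sparse                    # xihat^2 = omegahat^2 / IVhat
    n = np.count_nonzero(np.diff(prices))       # number of non-zero tick returns
    return max(1, int(np.ceil(c_star * xi2**0.4 * n**0.6)))
```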
It should be emphasized that our bandwidth choice is optimal in an asymptotic MSE sense. Alternative selection methods that seek to optimize the finite-sample properties of estimators (under the assumption that $U \in \mathcal{WN}$ and $Y \perp\!\!\!\perp U$) have been proposed in Bandi and Russell (2006b). They focus on flat-top realized kernels (and related estimators), but their approach can be adapted to the class of non-flat-top realized kernels defined by (1.2).

2.2. End effects

In this section, we discuss end effects. From a theoretical angle, we explain why they show up in this estimation problem, why they are important, and how they are eliminated in the computation of the realized kernel. From an empirical perspective, we then argue that they can largely be ignored in practice.

The realized autocovariances $\gamma_h$, $h = 0, 1, \ldots, H$, are not divided by the sample size. This means that the realized kernel is influenced by the noise components of the first and last observations in the sample, $U_0$ and $U_T$, respectively. The problem is that $K(U) \xrightarrow{p} U_0^2 + U_T^2 \neq 0$ as $n \to \infty$. The important theoretical implication is that $K(X)$ would be inconsistent if applied to raw price observations. Fortunately, this end-effect problem is easily resolved by replacing the first and last observations by local averages. The implication is that $K(U) = \bar U_0^2 + \bar U_T^2 + o_p(1)$, where $\bar U_0$ and $\bar U_T$ are both averages of $m$, say, observations. If $U_t$ is ergodic with $E(U_t) = 0$, then it follows that $K(U) \xrightarrow{p} 0$ as $m \to \infty$. So the local averaging at the two end-points eliminates the end effects.

While the contribution from end effects is dampened by the local averaging (jittering), a drawback of increasing $m$ is that fewer observations are available for computing the realized kernel, since $2m$ observations are used up by the two local averages. This trade-off defines a mean-squared optimal choice for $m$. In practice, the optimal choice is often $m = 1$, as shown in Barndorff-Nielsen et al. (2008b). This is the reason that end effects can safely be ignored in practice, despite their important theoretical implications for the asymptotic properties of the realized kernel estimator. To quantify this empirically, we computed the realized kernels for $m = 1, \ldots, 4$ for Alcoa Inc. and found that they led to almost identical estimates: across our sample period, the absolute difference was on average less than 0.5%. Loosely speaking, end effects can safely be ignored whenever the quadratic variation, $[Y]$, is thought to dominate the size of $U_0^2 + U_T^2$. This is the case for actively traded equities. However, for less liquid assets this could be a problem, e.g. on days where the squared spread is, say, 5% of the daily variance of returns.

In any case, we now discuss how this local averaging is carried out in practice, for the case $m = 2$, which is the value we use in our empirical analysis. Write the times at which the log-price process, $X$, is recorded as $0 = \tau_0 \le \cdots \le \tau_N = T$. When the recording is carried out regularly in time, we have $\tau_j - \tau_{j-1} = T/N$ for $j = 1, \ldots, N$, but in practice we typically have irregularly spaced observations. Define the discrete-time observations $X_0, X_1, \ldots, X_n$, where
$$X_0 = \tfrac{1}{2}\left(X_{\tau_0} + X_{\tau_1}\right), \qquad X_j = X_{\tau_{j+1}}, \quad j = 1, 2, \ldots, n-1,$$

and

$$X_n = \tfrac{1}{2}\left(X_{\tau_{N-1}} + X_{\tau_N}\right).$$
Thus, the end points, $X_0$ and $X_n$, are local averages of two available prices over a small interval of time. These prices allow us to define the high-frequency returns $x_j = X_j - X_{j-1}$, for $j = 1, 2, \ldots, n$, that are used in (1.2).
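As a concrete illustration of this $m = 2$ jittering (a minimal sketch of our own, not the authors' code), assume `p` is the vector of raw log-prices $X_{\tau_0}, \ldots, X_{\tau_N}$:

```python
import numpy as np

def jitter_end_points(p: np.ndarray) -> np.ndarray:
    """Replace the first and last observation by local averages (m = 2),
    so that K(U) is not contaminated by U_0^2 and U_T^2."""
    X = np.empty(p.size - 1)
    X[0] = 0.5 * (p[0] + p[1])          # X_0 = (X_{tau_0} + X_{tau_1}) / 2
    X[1:-1] = p[2:-1]                   # X_j = X_{tau_{j+1}}, j = 1, ..., n-1
    X[-1] = 0.5 * (p[-2] + p[-1])       # X_n = (X_{tau_{N-1}} + X_{tau_N}) / 2
    return X

# High-frequency returns for eq. (1.2):
# x = np.diff(jitter_end_points(log_prices))
```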
3. PROCEDURE FOR CLEANING THE HIGH-FREQUENCY DATA

Careful data cleaning is one of the most important aspects of volatility estimation from high-frequency data. The cleaning of high-frequency data has been given special attention in, e.g., Dacorogna et al. (2001, chapter 4), Falkenberry (2001), Hansen and Lunde (2006), and Brownlees and Gallo (2006). Specifically, Hansen and Lunde (2006) show that tossing out a large number of observations can in fact improve volatility estimators. This result may seem counterintuitive at first, but the reasoning is fairly simple. An estimator that makes optimal use of all data will typically put high weight on accurate data and be less influenced by the least accurate observations. The generalized least-squares (GLS) estimator in the classical regression model is a good analogy. On the other hand, the precision of the standard least-squares estimator can deteriorate when relatively noisy observations are included in the estimation. So the inclusion of poor-quality observations can cause more harm than good to the least-squares estimator, and this is the relevant comparison to the present situation. The realized kernel and related estimators 'treat all observations equally', and a few outliers can severely influence these estimators.

3.1. Step-by-step cleaning procedure

In our empirical analysis, we use trade and quote data from the NYSE Trade and Quote (TAQ) database, with the objective of estimating the quadratic variation for the period between 9:30 am and 4:00 pm. The cleaning of the TAQ high-frequency data was carried out in the following steps. P1–P3 were applied to both trade and quote data; T1–T4 are only applicable to trade data, while Q1–Q4 are only applicable to quotation data.

All data
P1. Delete entries with a time stamp outside the 9:30 am–4 pm window when the exchange is open.
P2. Delete entries with a bid, ask or transaction price equal to zero.
P3. Retain entries originating from a single exchange (NYSE in our application). Delete other entries.

Quote data only
Q1. When multiple quotes have the same time stamp, replace them all with a single entry with the median bid and median ask price.
Q2. Delete entries for which the spread is negative.
Q3. Delete entries for which the spread is more than 50 times the median spread on that day.
Q4. Delete entries for which the mid-quote deviated by more than 10 mean absolute deviations from a rolling centred median (excluding the observation under consideration) of 50 observations (25 observations before and 25 after).

Trade data only
T1. Delete entries with corrected trades (trades with a Correction Indicator, CORR ≠ 0).
T2. Delete entries with an abnormal Sale Condition (trades where COND has a letter code, except for 'E' and 'F'). See the TAQ 3 User's Guide for additional details about sale conditions.
T3. If multiple transactions have the same time stamp, use the median price.
T4. Delete entries with prices that are above the ask plus the bid–ask spread. Similarly for entries with prices below the bid minus the bid–ask spread.

3.2. Discussion of filter rules

The first step, P1, identifies the entries that are relevant for our analysis, which focuses on volatility in the 9:30 am–4 pm interval. Steps P2 and T1 remove very serious errors in the database, such as misrecordings of prices (e.g. zero prices or a misplaced decimal point) and time stamps that may be way off. T2 rules out data points that the TAQ database flags up as a problem. Table 1 gives a summary of the counts of data deleted or aggregated using these filter rules for the database used in Section 4, which analyses the Alcoa share price. By far the most important rules here are P3, T3 and Q1. In our empirical work, we will see the impact of suspending P3; it is used to reduce the impact of time-delays in the reporting of trades and quote updates. Some form of T3 and Q1 rule seems inevitable here, and it is these rules which lead to the largest deletion of data. We use Q4 to catch the outliers that are missed by Q3. By basing the window on observation counts, we have it expanding and contracting in clock time depending on the trading intensity. The choice of 50 observations for the window is ad hoc, but validated through extensive experimentation. T4 is an attractive rule, as it disciplines the trade data using quotes. However, it has the disadvantage that it cannot be applied when quote data are not available.⁸ We see from Table 1 that it is rarely activated in practice, while later results, which we will discuss in Table 2 on realized kernels, demonstrate that the RK estimator (unlike the RV statistic) is not very sensitive to the use of T4.

It is interesting to compare some of our filtering rules to those advocated by Falkenberry (2001) and Brownlees and Gallo (2006). In such a comparison, it is mainly the rules designed to purge outliers/misrecordings that could be controversial. Among our rules, Q4 and T4 are the relevant ones. Q4 is very closely related to the procedure Brownlees and Gallo (2006, p. 2237) advocate for removing outliers. They remove observation $i$ if the condition

$$|p_i - \bar p_i(k)| > 3 s_i(k) + \gamma$$

is true. Here $\bar p_i(k)$ and $s_i(k)$ denote, respectively, the $\delta$-trimmed sample mean and sample standard deviation of a neighbourhood of $k$ observations around $i$, and $\gamma$ is a granularity parameter. We use the median in place of the trimmed sample mean, $\bar p_i(k)$, and the mean absolute deviation from the median in place of $s_i(k)$. By not using the sample standard deviation, we become less sensitive to runs of outliers. Falkenberry (2001) also uses a threshold approach to determine whether a certain observation is an outlier. But instead of using a 'search and purge' approach, he applies a 'search and modify' methodology: prices that deviate by a certain amount from a moving filter of all prices are modified to the filter value. For transactions, this has the advantage of maintaining the volume of a trade even if the associated price is bad. Finally, we note that our approach of disciplining the trade data using quotes, T4, has previously been applied only in Hansen and Lunde (2006), Barndorff-Nielsen et al. (2006), and Barndorff-Nielsen et al. (2008a).
⁸ When quote data are not available, Q4 can be applied in place of T4, replacing the word mid-quote with price.
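To illustrate, here is a minimal pandas sketch of two of the quote rules above (the DataFrame layout, with columns `bid` and `ask`, and the simplification noted in the comment are our assumptions, not TAQ conventions):

```python
import pandas as pd

def clean_quotes_q2(quotes: pd.DataFrame) -> pd.DataFrame:
    """Q2: delete entries for which the spread is negative."""
    return quotes[quotes["ask"] >= quotes["bid"]]

def clean_quotes_q4(quotes: pd.DataFrame, window: int = 51, k: float = 10.0) -> pd.DataFrame:
    """Q4 (simplified): delete entries whose mid-quote deviates by more than k
    mean absolute deviations from a rolling centred median of ~50 observations.
    Unlike the rule in the text, this sketch does not exclude the observation
    under consideration from its own window."""
    mid = 0.5 * (quotes["bid"] + quotes["ask"])
    med = mid.rolling(window, center=True, min_periods=1).median()
    mad = (mid - med).abs().rolling(window, center=True, min_periods=1).mean()
    return quotes[(mid - med).abs() <= k * mad]
```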
Table 1. Summary statistics for the cleaning and aggregation procedures when applied to Alcoa Inc. (AA) data from different exchanges.

[Table body not reproduced here: for each exchange (NYSE, PACIF, NASD, NASDAQ, Other) and each of the days January 24, January 26, May 4 and May 8, 2007, the table reports the numbers of trade and quote entries deleted or aggregated by each of the rules P2, T1–T4 and Q1–Q4.]

Notes: The first column gives the number of observations observed between 9:30 am and 4:00 pm (P1). Subsequent columns state the reductions in the number of observations due to each of the cleaning/aggregation rules. A blank entry means that the filter was not applied in the particular case. NYSE(N): New York Stock Exchange; PACIF(P): Pacific Exchange; NASD(D): National Association of Security Dealers; NASDAQ(T): National Association of Security Dealers Automated Quotations. In each case the letter in parentheses is the TAQ identifier.
4. DATA ANALYSIS

We analyse high-frequency stock prices for Alcoa Inc., which has the ticker symbol AA. It is the leading producer of aluminium, and its stock is currently part of the Dow Jones Industrial Average (DJIA). We have estimated daily volatility for each of the 123 days in the six-month period from January 3 to June 29, 2007. Much of our discussion will focus on four days that
Table 2. Sensitivity of RV and RK to our filtering rules P2, T3 and T4 for trade data from Alcoa Inc. (AA) on three specific days and averaged across the full sample.

[Table body not reproduced here: for each exchange and for the pooled data (All), the table reports observation counts after P2, T3 and T4, together with the realized variance computed after P2, T3.E and T4.E and the realized kernel computed after P2, each of T3.A–E and T4.E, for January 24, 2007, January 26, 2007 (excluding 12:13 to 12:21 pm), May 8, 2007, and averages over the full sample.]

Notes: Analysis based on data from the common exchanges (NYSE, PACIF, NASD and NASDAQ) and all exchanges (denoted All). T3.A–E vary how multiple data on single seconds are aggregated; our preferred method is T3.E, which takes the median price. The first three columns report the observation count at each stage. T3.• signifies that T3.A–E all result in the same number of observations.
highlight some challenging empirical issues. The data are transaction prices and quotations from the NYSE, and all data are from the TAQ database, extracted from the Wharton Research Data Services (WRDS). We present empirical results for both transaction and mid-quote prices observed between 9:30 am and 4:00 pm. We first present results for a regular day, by which we mean a day where the high-frequency returns are such that it is straightforward to compute the realized kernel. Then we present empirical results on the use of realized kernels over the entire sample of 123 separate days, indicating that the realized kernels behave very well, and better than any available realized variance statistic. Then we turn our attention to days where the high-frequency data have some unusual and puzzling features that could potentially be harmful for the realized kernel.
4.1. Sensitivity to data cleaning methods

In Table 2, we give a summary of the various effects of aggregating and excluding observations in different manners. We have carried out the analysis along two dimensions. First, we have separated data from different exchanges. Specifically, we consider trades on NYSE, PACIF, NASD and NASDAQ in isolation. We also investigate the performance of the estimator when all exchanges are considered simultaneously, which is the same as dropping P3 entirely. This defines the first dimension, displayed in the rows of Table 2, for three of the four days we give special attention and averaged over the full sample for AA. Our second dimension is the amount of cleaning, aggregation and filtering that we apply to the data. With reference to the cleaning and filtering steps in Section 3.1, the columns of Table 2 contain the following information.

P2: This is the data with a time stamp inside the 9:30 am–4 pm window, when most of the exchanges are open. We have deleted entries with a bid, ask or transaction price equal to zero. So this is basically the raw data, with the only purged observations being clearly nonsensical ones.

T3.A–E: This is what is left after step T3. The different letters represent five different ways of aggregating transactions that have the same time stamp (a code sketch of these five schemes is given at the end of this subsection):
A. First single out unique prices and aggregate volume. Then use the price that has the largest volume.
B. First single out unique prices and aggregate volume. Then use the volume-weighted average price.
C. First single out unique prices and aggregate volume. Then use the log(volume)-weighted average price.
D. First single out unique prices and aggregate volume. Then use the average price weighted by the number of trades.
E. Use the median price. This is the method that we use in the paper.

T4.E: This is what is left after applying step T4 to the data left after T3.E.

In Table 2, we present observation counts, realized variances and realized kernels. Two things are particularly conspicuous. On January 24 at PACIF, only one observation was filtered out by T4.E, yet both the realized variance and the realized kernel are quite sensitive to whether this observation is excluded; it is the only day and exchange where this is the case. In the left-hand panel of Figure 1, we display the data around this observation, and it is clear that it is out of line with the rest. Also, on May 8 at NASD, only one observation was filtered out by T4.E; here only the realized variance is quite sensitive to whether this observation is excluded. In the right-hand panel of Figure 1, we display the data around this observation, and again it is clear that it is out of line with the rest. Hence we conclude that T4 is useful when it can be applied, but it does not usually make very much difference in practice when RK estimators are used.

A noteworthy feature of Table 2 is how badly RV does when we aggregate data across exchanges and only apply P2—basically only implementing trivial cleaning. The upward bias we see for RV when based on trade-by-trade data is dramatically magnified. Some of this is even picked up by the RK statistic, which benefits significantly from the application of T3. It is clear from this table that if one wanted to use information across exchanges, it would be better to carry out RK on each exchange separately and then average the answers across the exchanges, rather than treat all the data as if they were from a single source.
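The five aggregation schemes can be expressed compactly. This sketch is our paraphrase of the descriptions above (the column names `price` and `volume` are assumptions); `trades` holds the transactions sharing one time stamp.

```python
import numpy as np
import pandas as pd

def aggregate_same_second(trades: pd.DataFrame, method: str = "E") -> float:
    """Five ways (T3.A-E) of aggregating transactions with the same time stamp."""
    if method == "E":                       # median price (used in the paper)
        return float(trades["price"].median())
    # Single out unique prices and aggregate volume
    vol = trades.groupby("price")["volume"].sum()
    if method == "A":                       # price with the largest total volume
        return float(vol.idxmax())
    if method == "B":                       # volume-weighted average price
        return float(np.average(vol.index, weights=vol.values))
    if method == "C":                       # log(volume)-weighted average (volumes > 1)
        return float(np.average(vol.index, weights=np.log(vol.values)))
    if method == "D":                       # weighted by the number of trades at each price
        counts = trades.groupby("price").size()
        return float(np.average(counts.index, weights=counts.values))
    raise ValueError(f"unknown method {method!r}")
```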
Figure 1. Transaction prices for Alcoa Inc. over a period of 5 minutes surrounding one observation deleted by T4.E. The left-hand panel displays January 24 on PACIF (Pacific Exchange), and the right-hand panel shows May 8 on NASD (National Association of Security Dealers).
4.2. A regular day: May 4, 2007

Figure 2 shows the prices that were observed in our database after being cleaned. They are based on the irregularly spaced time series of transaction (left-hand panels) and mid-quote (right-hand panels) prices on May 4, 2007. The two upper plots show the actual tick-by-tick series, comprising 5246 transactions and 14,631 quotations recorded on distinct seconds. Hence for transaction data we have a new observation on average every 5 seconds, while for mid-quotes it is more often than every couple of seconds. In the middle panels, the corresponding price changes are displayed; changes above 5 cents and below minus 5 cents are marked by a large (red) star and are truncated (in the picture) at ±5 cents. May 4 was a quite tranquil day, with only a couple of changes outside the range of the plot. The lower panels give the autocorrelation function of the log-returns. The acf(1) is omitted from the plot, but its value is given in the subtext. For the transaction series, the acf(1) is about −0.24, which is fundamentally different from the one found for the mid-quote series, which equals 0.088. This difference is typical for NYSE data, as first noted in Hansen and Lunde (2006). It is caused by the smoother character of most mid-quote series, which induces a negative correlation between the innovations in Y and the innovations in U. The negative correlation results in a smaller, possibly negative, bias for the RV, and this feature of mid-quote data will be evident from Figure 5, which we discuss in the next subsection. The negative bias of the RV is less common when mid-quotes are constructed from multiple exchanges; see, e.g., Bandi and Russell (2006a). A possible explanation for this phenomenon was given in Hansen and Lunde (2006, pp. 212–214), who showed that pooling mid-quotes from multiple exchanges can induce additional noise that overshadows the endogenous noise found in single-exchange mid-quotes.

May 4, 2007 is an exemplary day. The upper panels of Figure 3 present volatility signature plots for the irregularly spaced time series of transaction prices (left-hand panels) and mid-quote prices (right-hand panels).⁹ The dark line is the Parzen kernel with $H = c^* \xi^{4/5} n^{3/5}$, and the light line is the simple realized variance.

⁹ These pictures extend the important volatility signature plots for realized volatility introduced by Andersen et al. (2000). To construct the plots, we use activity-fixed tick time, where the sampling frequency is chosen such that we get approximately the same number of observations each day. To explain it, assume that the first trade on the $i$th day occurred at time $t_{i0}$ and the last trade on the $i$th day occurred at time $t_{i n_i}$. Approximate '60-second' sampling is then constructed as follows: we take the tick-time sampling frequency on day $i$ to be $1 + n_i \cdot 60/(t_{i n_i} - t_{i 0})$. In this way, there will be approximately 60 seconds between observations when one takes the intraday average over the sampled intratrade durations. The actual sampled durations will in general be more or less widely dispersed.
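A tiny sketch of the activity-fixed tick-time rule in footnote 9 (the function name and signature are our own):

```python
def tick_skip(n_trades: int, t_first: float, t_last: float, target: float = 60.0) -> int:
    """Sample every s-th tick so that the sampled durations average ~`target`
    seconds over the day: s = 1 + n * target / (t_last - t_first)."""
    return 1 + int(n_trades * target / (t_last - t_first))
```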
Figure 2. High-frequency prices and returns for Alcoa Inc. (AA) on May 4, 2007, and the first 100 autocorrelations for tick-by-tick returns. Left-hand panels are for transaction prices and right-hand panels are for mid-quote prices. Returns larger than 5 cents in absolute value are marked by red dots in the middle panels. The largest and smallest (most negative) returns are reported below the middle panels. Lower panels display the autocorrelations for tick-by-tick returns, starting with the second-order autocorrelation. The numerical value of the first-order autocorrelation is given below these plots. A log-scale is used for the x-axis so that the values for lower-order autocorrelations are easier to read.
Figure 3. Signature plots for the realized kernel and realized variance on May 4, 2007 for Alcoa Inc. Those based on transaction prices are plotted in left-hand panels and those based on mid-quote prices in right-hand panels. The horizontal line in these plots is the subsampled realized variance based on 20-minute returns. The thicker dark line in the upper panels represents the realized kernels using the bandwidth $\hat H^* = c^* \hat\xi^{4/5} n^{3/5}$, and the thin line is the usual realized variance. The lower panels plot the point estimates of the realized kernel as a function of the bandwidth, H, where the sampling frequency is the same (tick-by-tick returns) for all realized kernels. The estimate of the optimal bandwidth is highlighted in the lower panels.
The lower panels of Figure 3 present a kernel signature plot, where the realized kernel computed on tick-by-tick data is plotted against increasing values of H. In these plots, we have indicated the optimal choices of H. In both plots, the horizontal line is an average of simple realized variances based on 20-minute returns sampled with different offsets. The shaded areas denote the 95% confidence interval based on 20-minute returns using the feasible realized variance inference method of Barndorff-Nielsen and Shephard (2002). We characterize May 4, 2007 as
an exemplary day because the signature plots are almost horizontal. This shows that the realized kernel is insensitive to the choice of sampling frequency. An erratic signature plot indicates potential data issues, although pure chance is also a possible explanation.

4.3. General features of results across many days

Transaction prices and mid-quote prices are both noisy measures of the latent 'efficient price', polluted by market microstructure effects. Thus, a good estimator is one that produces almost the same estimate with transaction data and mid-quote data. This is challenging, as we have seen that the noise has very different characteristics in these two series.

Figure 4 presents scatter plots where estimates based on transaction data are plotted against the corresponding estimates based on mid-quote data. The upper two panels are scatter plots for the realized kernel: the upper left-hand plot uses tick-by-tick data and the upper right-hand plot is based on 1-minute returns. Both scatter plots are very close to the 45° line, suggesting that the realized kernel produces accurate estimates at these sampling frequencies, with little difference between the two graphs. The lower four panels are scatter plots for the realized variance using different sampling frequencies: tick-by-tick returns (middle left-hand panel), 1-minute returns (middle right-hand panel), 5-minute returns (lower left-hand panel) and 20-minute returns (lower right-hand panel). These plots strongly suggest that the realized variance is substantially less precise than the realized kernel. The realized variance based on tick-by-tick returns is strongly influenced by market microstructure noise. But the characteristics of market microstructure noise in transaction prices are very different from those of mid-quote prices. Thus, as already indicated, trade data cause the realized variance to be upward biased, while for quote data it is typically downward biased. This explains why the scatter plot for tick-by-tick data (middle left-hand panel) is shifted away from the 45° line.

Table 3 reports a measure of the disagreement between the estimates based on transaction prices and mid-quote prices. The statistics computed in its first row are the average Euclidean distance from the pair of estimators to the 45° line. To be precise, let $V_{T,t}$ and $V_{Q,t}$ be estimators based on transaction data and quotation data, respectively, on day $t$, and let $\bar V_t$ be the average of the two. The distance from $(V_{T,t}, V_{Q,t})$ to the 45° line is given by

$$\sqrt{(V_{T,t} - \bar V_t)^2 + (V_{Q,t} - \bar V_t)^2} = \left|V_{T,t} - V_{Q,t}\right| \big/ \sqrt{2}.$$
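Computed over a sample of days, the measure is a one-liner (a sketch of our own, with the array names being our assumption):

```python
import numpy as np

def disagreement(v_trans: np.ndarray, v_quote: np.ndarray) -> float:
    """Average Euclidean distance from (V_T, V_Q) to the 45-degree line."""
    return float(np.mean(np.abs(v_trans - v_quote)) / np.sqrt(2.0))
```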
The first row of Table 3 reports the average of this distance computed over the 123 days in our sample. The distance is substantially smaller for the realized kernels than for any of the realized variances, and our preferred estimator, the realized kernel based on tick-by-tick returns, has the least disagreement between estimates based on transaction data and those based on quote data. The relative distances are reported in the second row of Table 3, and we note that the disagreement for any of the realized variance estimators is more than twice that of the realized kernel.

Table 4 contains summary statistics for realized kernel and realized variance estimators for the Alcoa Inc. data over our 123 distinct days. The estimators are computed with transaction prices and mid-quote prices using different sampling frequencies. The sample average and standard deviation are given for each of the estimators, and the fourth column has the empirical correlations between each of the estimators and the realized kernel based on
Figure 4. Scatter plots of estimates based on transaction prices plotted against the estimates based on mid-quote prices for Alcoa Inc. Regression lines and regression statistics are included with the 45° line.
tick-by-tick transaction prices. The table confirms the high level of agreement between the realized kernel estimators based on transaction data and mid-quote data. They have the same sample mean, and the sample correlation is nearly one. The time-series standard deviation of the daily mid-quote-based realized kernel is marginally lower than that for the transaction-
Table 3. Statistics that measure the disagreement between the daily estimates based on transaction prices and those based on mid-quote prices.

                                         Realized kernel        Simple realized variance
                                         tick      1 min        tick     1 min    5 min    20 min
Alcoa Inc. (AA)
  Distance                               0.089     0.105        1.119    0.170    0.312    0.406
  Relative distance                      1.000     1.182        12.62    1.922    3.523    4.575
American International Group, Inc. (AIG)
  Distance                               0.020     0.038        0.458    0.061    0.088    0.132
  Relative distance                      1.000     1.892        22.75    3.035    4.382    6.558
American Express (AXP)
  Distance                               0.079     0.060        0.578    0.133    0.166    0.248
  Relative distance                      1.000     0.755        7.277    1.669    2.095    3.117
Boeing Company (BA)
  Distance                               0.047     0.051        0.564    0.106    0.121    0.242
  Relative distance                      1.000     1.083        11.96    2.246    2.567    5.132
Bank of America Corporation (BAC)
  Distance                               0.028     0.070        0.620    0.050    0.084    0.345
  Relative distance                      1.000     2.509        22.21    1.775    3.004    12.35
Citigroup (C)
  Distance                               0.033     0.052        0.722    0.080    0.139    0.250
  Relative distance                      1.000     1.604        22.12    2.467    4.270    7.664
The table also shows the familiar upward bias of the tick-by-tick trade-based RV and the downward bias of the mid-quote version. Low-frequency RV statistics have more variation than the tick-by-tick RK, while the RK statistic behaves much like the 1-minute mid-quote RV.

Figure 5 contains histograms that illustrate the dispersion (across the 123 days in our sample) of various summary statistics. In a moment we will provide a detailed analysis of three other days, and we have marked the position of these days in each of the histograms. As in most figures in this paper, the left-hand panels correspond to transaction data and the right-hand panels to mid-quote data. The first row of panels presents the log-difference between the realized kernel computed with tick-by-tick returns and the realized kernel based on 5-minute returns. The day we analysed in greater detail in the previous subsection, May 4, is fairly close to the median in all of these dimensions. The three other days, May 8, January 24 and January 26, are our examples of ‘challenging days’. January 24 and January 26 fall in the two tails of the histogram related to the variation in the realized kernel. The three other dimensions for which we provide histograms are: (second row) the log-difference between the realized variance computed with tick-by-tick returns and that computed with 5-minute returns; (third row) the distribution of the estimated first-order autocorrelation; and (fourth row) the sum of the next nine autocorrelations (acf(2) to acf(10)).
Table 4. Summary statistics for realized kernel and realized variance estimators, applied to transaction prices or mid-quote prices at different sampling frequencies for Alcoa Inc. (AA).

                     Mean (HAC)      Std.    ρ([Y],K)   acf(1)   acf(2)   acf(5)   acf(10)
Realized kernels based on transaction prices
  1 tick             2.401 (0.268)   1.750   1.000      0.50     0.29     -0.08    0.10
  1 minute           2.329 (0.290)   1.931   0.952      0.44     0.23     -0.08    0.10
RV based on transaction prices
  1 tick             3.210 (0.232)   1.670   0.916      0.44     0.25     -0.12    0.10
  1 minute           2.489 (0.225)   1.555   0.969      0.46     0.28     -0.12    0.10
  5 minute           2.458 (0.293)   2.001   0.953      0.40     0.26     -0.08    0.06
  20 minute          2.315 (0.262)   1.745   0.878      0.30     0.22     -0.04    0.10
Realized kernels based on mid-quotes
  1 tick             2.402 (0.258)   1.720   0.997      0.49     0.29     -0.09    0.09
  1 minute           2.299 (0.281)   1.877   0.944      0.42     0.22     -0.08    0.12
RV based on mid-quotes
  1 tick             1.897 (0.173)   1.209   0.910      0.41     0.26     -0.09    0.11
  1 minute           2.398 (0.234)   1.529   0.973      0.50     0.31     -0.09    0.10
  5 minute           2.464 (0.317)   2.138   0.966      0.45     0.23     -0.08    0.08
  20 minute          2.286 (0.298)   2.061   0.884      0.34     0.19     -0.03    0.06

Notes: The empirical correlations between the realized kernel based on tick-by-tick transaction prices and each of the estimators are given in column 4, and some empirical autocorrelations are given in columns 5-8.
Note the bias features of the realized variance shown in the second row of histograms. For transaction data the tick-by-tick realized variance tends to be larger than the realized variance sampled at lower frequencies, whereas the opposite is true for mid-quote data. Next we turn to three potentially harder days with features that are challenging for the realized kernel. These days were selected to reflect important empirical issues we have encountered when computing realized kernels across a variety of datasets.

4.4. A heteroskedastic day: May 8, 2007

We now look in detail at a rather different day, May 8, 2007. Figure 6 suggests that this day has a lot of heteroskedasticity, with a spike in volatility at the end of the day. The day is also characterized by several large changes in the price: the transaction price changed by as much as 25 cents from one trade to the next, and the mid-quote price by as much as 19 cents over a single quote update. Informally, this is suggestive of jumps in the process. Although jumps can alter the optimal choice of H, they do not cause inconsistency in the realized kernel estimator. The middle panels of Figure 6 visualise the different behaviour of the price throughout the day. The jump in volatility around 2:30 pm is quite clear from these plots. In spite of the jump in volatility, and possibly jumps in the price process, Figure 7 offers little to be concerned about in terms of the realized kernel estimator. Again the volatility signature plot is reasonably stable for both transaction prices and mid-quote prices, and so one can have quite some confidence in the estimate.
[Figure 5. Histograms for various characteristics of the 102 days in our sample. Left-hand panels are for transaction prices, right-hand panels are for mid-quote prices. The two upper panels are histograms for the difference between the realized kernel based on 1-tick returns and that based on five-minute returns. The panels in the second row are the corresponding plots for the realized variance. Histograms of the first-order autocorrelation are displayed in the panels in the third row. The fourth row of panels are histograms for the sum of the 2nd to the 10th autocorrelation. The 4 days for which detailed results are provided are identified in each of the histograms.]
[Figure 6. High-frequency prices and returns for Alcoa Inc. on May 8, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.]
[Figure 7. Signature plots for the realized kernel and realized variance for Alcoa Inc. on May 8, 2007. For details see Figure 3.]
4.5. A ‘gradual jump’: January 26, 2007

The high-frequency prices for January 26 are plotted in Figure 8. On this day, the price increases by nearly 1.5% between 12:13 and 12:20. The interesting aspect of this price change is the gradual and almost linear manner in which the price increases, in a large number of smaller increments. Such a pattern is highly unlikely to be produced by a semi-martingale adapted to the natural filtration. The gradual jump produces rather disturbing volatility signature plots in Figure 9, which show that the realized kernel is highly sensitive to the bandwidth parameter. This is certainly a challenging day. We zoom in on the gradual jump in Figure 10. The upper left-hand panel has 96 upticks and 43 downticks. The lower plot shows that the volume of transactions in the period of the price change is not negligible; in fact, the largest-volume trades on January 26 occur in this period.
[Figure 8. High-frequency prices and returns for Alcoa Inc. on January 26, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.]
[Figure 9. Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 26, 2007. For details see Figure 3.]
One possible explanation is that one or more large funds wished to increase their holding of Alcoa (perhaps based on private information), and as they bought the shares they consumed the immediately available liquidity: they could not buy more at the prevailing price because the instantaneous liquidity did not exist, and their demand could only be met by waiting for the order book to refill. Had that liquidity existed, the price might have shot up in a single move. An explanation of such a scenario can be based on market microstructure theory (see, e.g., the surveys by O'Hara, 1995, or Hasbrouck, 2007). Dating back to Kyle (1985) and Admati and Pfleiderer (1988a,b, 1989), the idea is to model the trading environment as comprising three kinds of traders: risk-neutral insiders, random noise traders and risk-neutral market makers. The noise traders are also known as liquidity traders because they trade for reasons that are not directly related to the expected value of the asset. As such they provide liquidity, and it is their presence that explains what we encounter in Figure 10. An implication of the theory is that without these noise traders, there would be no one willing to sell the asset on the way up to the new price
[Figure 10. The ‘gradual’ jump on January 26, 2007. Prices and returns in the period from 12:12 pm to 12:22 pm are shown in the two upper panels. The lower panel shows the prices and volume (vertical bars) between 11:45 am and 1:00 pm.]
level at 12:25. So, without the noise traders, we would have seen a genuine jump in the price. Naturally, this line of thinking is speculative, and it abstracts from the fact that some market makers, including those at the NYSE, are obliged to provide some liquidity. This ‘compulsory’ liquidity will also tend to erase genuine jumps in the observed prices. Mathematically, we can think of a gradual jump in the following way. The efficient price jumps at time τ_j by ΔY_{τ_j}, but the observed price barely moves, ΔX_{τ_j} ≃ 0, which means that

$$U_{\tau_j} \simeq U_{\tau_{j-1}} - \Delta Y_{\tau_j}.$$
Hence the noise process is now far from zero. As trade or quote time evolves, the noise trends back to zero, revealing the impact of the jump on X, but this takes a considerable number of new observations if the jump is quite big. This framework suggests the simple model

$$U_{\tau_j} = V_{\tau_j} + \varepsilon_{\tau_j}, \qquad V_{\tau_j} = \rho V_{\tau_{j-1}} - \theta_{\tau_j}\,\Delta Y_{\tau_j}, \qquad \rho \in [0, 1),$$

where ε_{τ_j} is covariance stationary and θ_{τ_j} is one for gradual jumps.
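To make the mechanics of this model concrete, here is a minimal simulation sketch; the parameter values (ρ, the jump size and the noise scales) are purely illustrative and are not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, jump_at = 500, 250
rho, jump, eps_sd = 0.98, 0.30, 0.01   # illustrative values only

# Efficient price Y: a random walk with one jump of size `jump`.
dY = 0.005 * rng.standard_normal(n)
dY[jump_at] += jump
Y = np.cumsum(dY)

# Noise U = V + eps, with V_t = rho * V_{t-1} - theta_t * dY_t and
# theta_t = 1 at the (gradual) jump time, 0 otherwise.
theta = np.zeros(n)
theta[jump_at] = 1.0
V = np.zeros(n)
for t in range(1, n):
    V[t] = rho * V[t - 1] - theta[t] * dY[t]
U = V + eps_sd * rng.standard_normal(n)

# Observed price X barely moves at the jump time and then absorbs the
# jump gradually as V mean-reverts to zero.
X = Y + U
```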
Obviously, this could induce very significant correlation between the noise and the price process. Of course, not all jumps will have this characteristic. When public announcements are made whose timing is known a priori, jumps tend to be absorbed immediately in the price process; in those cases θ_{τ_j} = 0. These tend to be the economically most important jumps, as they are difficult to diversify. This line of thinking encouraged us to remove the gradual jump and replace it by a single jump. This is shown in Figure 11, while the corresponding results for the realized kernels are given in Figure 12, which should be compared with Figure 9. This seems to deliver very satisfactory results. The autocorrelations are now very different after removing the observations between 12:13 and 12:21; compare with Figure 8. Hence ‘gradual jumps’ seem important in practice and challenging for this method. We do not currently have a method for automatically detecting gradual jumps and removing them from the database.

4.6. A puzzling day: January 24, 2007

The feature we want to emphasize with this day is related to the spiky price changes. The upper panel of Figure 13 shows this jittery variation in the price, in particular towards the end of the day, where the price moves a lot within a narrow band. We believe this variation is true volatility rather than noise because the bid-ask spread continues to be narrow in this period, about 2 cents most of the time. January 24, 2007 is a day on which the realized kernel is sensitive to the sampling frequency and to the choice of bandwidth parameter, H, as is evident from Figure 14. This may partly be attributed to pure chance, but we do not think that chance is the whole story here. Chance plays a role because the standard error of the realized kernel estimator depends on both the sampling frequency and the bandwidth parameter. Rather, the problem is that too large an H, or too low a sampling frequency, will overlook some of the volatility on this day, a problem that is even more pronounced for the low-frequency realized variance. We return to this issue in Figure 15. Figure 14 also reveals a rather unusual volatility signature plot for the realized variance based on mid-quote prices. Usually the RV based on tick-by-tick returns is smaller than that based on moderate sampling frequencies, such as 20 minutes, but this is not the case here. Figure 15 shows the prices that are extracted at different sampling frequencies. The interesting aspect of these plots is that the realized variance, sampled at moderate and low frequencies, largely overlooks the intense volatility seen towards the end of the day. Returns based on 20 minutes, say, will tend to be large in absolute value during periods where the volatility is high. However, there is a chance that the price will stay within a relatively narrow band over a 20-minute period, despite the volatility being high during this period. This appears to be the case toward the end of the trading day on January 24, 2007. The reason we believe the rapid changes in the price are volatility rather than noise is that the bid-ask spread is narrow in this period; both bid and ask prices jointly move rapidly up and down
[Figure 11. High-frequency prices and returns for Alcoa Inc. on January 26, 2007, and the first 100 autocorrelations for tick-by-tick returns, after prices between 12:13 and 12:21 are removed from the sample. For details see Figure 2.]
[Figure 12. Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 26, 2007, after deleting the prices between 12:13 pm and 12:21 pm. For details see Figure 3.]
during this period. Naturally, when prices are measured over long intervals and returns happen to be small even though volatility is high, the sparsely sampled realized variance will underestimate the volatility, for the simple reason that the intraday returns do not reflect the actual volatility. This seems to be the case on this day, as illustrated in the two lower panels of Figure 15. The two sparsely sampled RVs cannot capture this variation in full, because the intense volatility cannot fully be unearthed by 20-minute intraday returns. Because the realized kernel can be applied to tick-by-tick returns, it does not suffer from this problem to the same extent. Utilizing tick-by-tick data gives the realized kernel a microscopic ability to detect and measure volatility that would otherwise be hidden at lower frequencies (due to chance). The ‘strength’ of this ‘microscope’ is controlled by the bandwidth parameter, and the realized kernel gradually loses its ability to detect volatility at the local level as H is increased. However, H must be chosen sufficiently large to alleviate the problems caused by noise.
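As an illustration of the sparse sampling discussed above, a realized variance under previous-tick sampling can be computed as in the following sketch (the previous-tick rule is one common convention; the function and variable names are our own, and the paper's exact sampling scheme may differ):

```python
import numpy as np

def realized_variance(times, logprice, step_seconds):
    """RV from previous-tick sampling on a regular grid (illustrative).

    times: seconds since the open, strictly increasing; logprice: log prices.
    """
    times = np.asarray(times, dtype=float)
    logprice = np.asarray(logprice, dtype=float)
    grid = np.arange(times[0], times[-1], step_seconds)
    # Index of the last observation at or before each grid point.
    idx = np.searchsorted(times, grid, side="right") - 1
    returns = np.diff(logprice[idx])
    return np.sum(returns ** 2)

# A crude signature plot: RV against the sampling interval.
# for step in (1, 60, 300, 1200):   # roughly tick, 1, 5 and 20 minutes
#     print(step, realized_variance(times, logprice, step))
```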
[Figure 13. High-frequency prices and returns for Alcoa Inc. on January 24, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.]
[Figure 14. Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 24, 2007. For details see Figure 3.]
On January 24, 2007, we believe that the realized kernel estimate, K(X) = 0.90, is a better estimate of volatility than the subsampled realized variance based on 20-minute returns, whose point estimate is nearly half that of our preferred estimator.
5. CONCLUSIONS

In this paper, we have tried to be precise about how to implement our preferred realized kernel on a wide range of data. Based on a non-negative form of the realized kernel, which uses a Parzen weight function, we implement it using an averaging of the data at the end conditions.
[Figure 15. Transaction prices for Alcoa Inc. on January 24, 2007 at different sampling frequencies (tick, 1-minute, 5-minute and 20-minute panels). The lower panel presents the tick-by-tick return on transaction data (dots), and the spread as it varied throughout the day (vertical lines).]
The realized kernel is sensitive to its bandwidth choice, and we detail how to choose the bandwidth in practice. A key feature of estimating volatility in the presence of noise is data cleaning. There is very little discussion of this in the literature, and so we provide quite a sustained discussion of the interaction between cleaning and the properties of realized kernels. This is important in practice, for in some application areas it is hard to clean the data extensively (e.g. quote data may not be available), while in other areas (such as when one has available trades and quotes from the TAQ database) extensive and rather accurate cleaning is possible.
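For readers who wish to experiment, the following is a compact sketch of a realized kernel with Parzen weights. It omits the end-point averaging (jittering) used in the paper, and the bandwidth constant in the final comment is quoted from memory, so treat this as an illustration of the estimator's form rather than as the authors' implementation:

```python
import numpy as np

def parzen(x):
    """Parzen weight function on [0, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    if x <= 1.0:
        return 2 * (1 - x)**3
    return 0.0

def realized_kernel(returns, H):
    """K(X) = sum over |h| <= H of k(h/(H+1)) * gamma_h."""
    x = np.asarray(returns, dtype=float)
    n = len(x)
    rk = np.dot(x, x)                      # gamma_0
    for h in range(1, min(H, n - 1) + 1):
        gamma_h = np.dot(x[h:], x[:-h])    # realized autocovariance at lag h
        rk += 2 * parzen(h / (H + 1)) * gamma_h
    return rk

# Bandwidth rule of thumb of the form H* = c* xi^(4/5) n^(3/5), with
# c* approximately 3.51 for the Parzen kernel and xi^2 a noise-to-signal
# ratio estimate; see the bandwidth discussion earlier in the paper.
```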
We provide an analysis of the properties of the realized kernel applied simultaneously to trade and quote data. We would expect the estimation of [Y] to deliver similar answers on the two data sets, and it does, indicating the strength of these methods. Finally, we identify an unsolved problem for realized kernels when they are applied over relatively short periods. We call these ‘challenging days’. They are characterized by lengthy strong trends in the data, which are not compatible with standard models of market microstructure noise.
ACKNOWLEDGMENTS

This paper was presented at the Econometrics Journal invited session on Financial Econometrics at the Royal Economic Society's Annual Meeting. We thank Richard Smith for his invitation to give it, and the co-editor, Jianqing Fan, and two anonymous referees for valuable comments that improved this manuscript. We also thank Roel Oomen, Marius Ooms and Kevin Sheppard for helpful comments. The second and fourth authors are also affiliated with CREATES, a research centre funded by the Danish National Research Foundation.
REFERENCES

Admati, A. R. and P. Pfleiderer (1988a). Selling and trading on information in financial markets. American Economic Review 78, 96-103.
Admati, A. R. and P. Pfleiderer (1988b). A theory of intraday patterns: volume and price variability. Review of Financial Studies 1, 3-40.
Admati, A. R. and P. Pfleiderer (1989). Divide and conquer: a theory of intraday and day-of-the-week mean effects. Review of Financial Studies 2, 189-223.
Andersen, T. G., T. Bollerslev and F. X. Diebold (2008). Parametric and nonparametric measurement of volatility. Forthcoming in Y. Aït-Sahalia and L. P. Hansen (Eds.), Handbook of Financial Econometrics. Amsterdam: North Holland.
Andersen, T. G., T. Bollerslev, F. X. Diebold and P. Labys (2000). Great realizations. Risk 13, 105-8.
Andersen, T. G., T. Bollerslev, F. X. Diebold and P. Labys (2001). The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42-55. (Correction (2003) Journal of the American Statistical Association 98, 501.)
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-58.
Bandi, F. M. and J. R. Russell (2006a). Comment on Hansen and Lunde (2006). Journal of Business & Economic Statistics 24, 167-73.
Bandi, F. M. and J. R. Russell (2006b). Market microstructure noise, integrated variance estimators, and the limitations of asymptotic approximations: a solution. Working paper, Graduate School of Business, University of Chicago.
Bandi, F. M. and J. R. Russell (2008). Microstructure noise, realized variance, and optimal sampling. Review of Economic Studies 75, 339-69.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde and N. Shephard (2006). Subsampling realized kernels. Working paper, Nuffield College, University of Oxford.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde and N. Shephard (2008a). Designing realized kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481-536.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde and N. Shephard (2008b). Multivariate realized kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Working paper, Oxford-Man Institute, University of Oxford.
Barndorff-Nielsen, O. E. and N. Shephard (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253-80.
Barndorff-Nielsen, O. E. and N. Shephard (2007). Variation, jumps and high frequency data in financial econometrics. In R. Blundell, T. Persson and W. K. Newey (Eds.), Advances in Economics and Econometrics. Theory and Applications, Ninth World Congress, Econometric Society Monographs, 328-72. Cambridge: Cambridge University Press.
Brownlees, C. T. and G. M. Gallo (2006). Financial econometric analysis at ultra-high frequency: data handling concerns. Computational Statistics & Data Analysis 51, 2232-45.
Dacorogna, M. M., R. Gençay, U. A. Müller, R. B. Olsen and O. V. Pictet (2001). An Introduction to High-Frequency Finance. San Diego: Academic Press.
Diebold, F. and G. Strasser (2007). On the correlation structure of microstructure noise in theory and practice. Working paper, Department of Economics, University of Pennsylvania.
Falkenberry, T. N. (2001). High frequency data filtering. Technical report, Tick Data.
Fan, J. and Y. Wang (2007). Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association 102, 1349-62.
Hansen, P. R. and A. Lunde (2006). Realized variance and market microstructure noise (with discussion). Journal of Business and Economic Statistics 24, 127-218.
Hasbrouck, J. (2007). Empirical Market Microstructure: Economic and Statistical Perspectives on the Dynamics of Trade in Securities Markets. New York: Oxford University Press.
Jacod, J., Y. Li, P. A. Mykland, M. Podolskij and M. Vetter (2007). Microstructure noise in the continuous case: the pre-averaging approach. Working paper, Department of Statistics, University of Chicago.
Kalnina, I. and O. Linton (2008). Estimating quadratic variation consistently in the presence of correlated measurement error. Forthcoming in Journal of Econometrics.
Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica 53, 1315-35.
Li, Y. and P. Mykland (2007). Are volatility estimators robust to modelling assumptions? Bernoulli 13, 601-22.
Newey, W. K. and K. D. West (1987). A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-8.
O'Hara, M. (1995). Market Microstructure Theory. Cambridge: Blackwell.
Shephard, N. (2005). Stochastic Volatility: Selected Readings. Oxford: Oxford University Press.
Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019-43.
Zhang, L., P. A. Mykland and Y. Aït-Sahalia (2005). A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394-411.
Zhou, B. (1996). High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45-52.
Econometrics Journal (2009), volume 12, pp. C33–C64. doi: 10.1111/j.1368-423X.2008.00267.x
An arbitrage-free generalized Nelson–Siegel term structure model

Jens H. E. Christensen†, Francis X. Diebold‡,§ and Glenn D. Rudebusch†

†Federal Reserve Bank of San Francisco, 101 Market Street, San Francisco, CA 94105-1579, USA
E-mails: [email protected], [email protected]
‡University of Pennsylvania, 3451 Walnut Street, Philadelphia, PA 19104, USA
§NBER, 1050 Massachusetts Avenue, Cambridge, MA 02138, USA
E-mail: [email protected]

First version received: May 2008; final version accepted: September 2008
Summary The Svensson generalization of the popular Nelson–Siegel term structure model is widely used by practitioners and central banks. Unfortunately, like the original Nelson–Siegel specification, this generalization, in its dynamic form, does not enforce arbitrage-free consistency over time. Indeed, we show that the factor loadings of the Svensson generalization cannot be obtained in a standard finance arbitrage-free affine term structure representation. Therefore, we introduce a closely related generalized Nelson–Siegel model on which the no-arbitrage condition can be imposed. We estimate this new AFGNS model and demonstrate its tractability and good in-sample fit.

Keywords: Arbitrage-free, Nelson–Siegel, Svensson extension, Yield curve.
1. INTRODUCTION

To investigate yield-curve dynamics, researchers have produced a vast literature with a wide variety of models. Many of these models assume that at observed bond prices there are no remaining unexploited opportunities for riskless arbitrage. This theoretical assumption is consistent with the observation that bonds of various maturities all trade simultaneously in deep and liquid markets. Rational traders in such markets should enforce a consistency in the yields of various bonds across different maturities, the yield curve at any point in time, and the expected path of those yields over time, the dynamic evolution of the yield curve. Indeed, the assumption that there are no remaining arbitrage opportunities is central to the enormous finance literature devoted to the empirical analysis of bond pricing. Unfortunately, as noted by Duffee (2002), the associated arbitrage-free (AF) models can demonstrate disappointing empirical performance, especially with regard to out-of-sample forecasting. In addition, the estimation of these models is problematic, in large part because of the existence of numerous model likelihood maxima that
have essentially identical fit to the data but very different implications for economic behavior (Kim and Orphanides, 2005).1

In contrast to the popular finance AF models, many other researchers have employed representations that are empirically appealing but not well grounded in theory. Most notably, the Nelson and Siegel (1987) curve provides a remarkably good fit to the cross section of yields in many countries and has become a widely used specification among financial market practitioners and central banks. Moreover, Diebold and Li (2006) develop a dynamic model based on this curve and show that it corresponds exactly to a modern factor model, with yields that are affine in three latent factors, which have a standard interpretation of level, slope and curvature. Such a dynamic Nelson–Siegel (DNS) model is easy to estimate and forecasts the yield curve quite well. Despite its good empirical performance, however, the DNS model does not impose the presumably desirable theoretical restriction of absence of arbitrage (e.g. Filipović, 1999, and Diebold et al., 2005). In Christensen et al. (2007), henceforth CDR, we show how to reconcile the Nelson–Siegel model with the absence of arbitrage by deriving an affine AF model that maintains the Nelson–Siegel factor loading structure for the yield curve. This arbitrage-free Nelson–Siegel (AFNS) model combines the best of both yield-curve modeling traditions. Although it maintains the theoretical restrictions of the affine AF modeling tradition, the Nelson–Siegel structure helps identify the latent yield-curve factors, so the AFNS model can be easily and robustly estimated. Furthermore, our results show that the AFNS model exhibits superior empirical forecasting performance.

In this paper, we consider some important generalizations of the Nelson–Siegel yield curve that are also widely used in central banks and industry (e.g. De Pooter, 2007).2 Foremost among these is the Svensson (1995) extension to the Nelson–Siegel curve, which is used at the Federal Reserve Board (see Gürkaynak et al., 2007, 2008), the European Central Bank (see Coroneo et al., 2008) and many other central banks (see Söderlind and Svensson, 1997, and Bank for International Settlements, 2005). The Svensson extension adds a second curvature term, which allows for a better fit at long maturities. Following Diebold and Li (2006), we first introduce a dynamic version of this model, which corresponds to a modern four-factor term structure model. Unfortunately, we show that it is not possible to obtain an arbitrage-free ‘approximation’ to this model in the sense of obtaining analytically identical factor loadings for the four factors. Intuitively, such an approximation requires that each curvature factor must be paired with a slope factor that has the same mean-reversion rate. This pairing is simply not possible for the Svensson extension, which has one slope factor and two curvature factors. Therefore, to obtain an arbitrage-free generalization of the Nelson–Siegel curve, we add a second slope factor to pair with the second curvature factor. The simple dynamic version of this model is a generalized version of the DNS model. We also show that the result in CDR can be extended to obtain an arbitrage-free approximation to that five-factor model, which we refer to as the arbitrage-free generalized Nelson–Siegel (AFGNS) model.
Finally, we show that this new AFGNS model of the yield curve not only displays theoretical consistency but also retains the important properties of empirical tractability and fit.

1 A further failing is that the affine AF finance models offer little insight into the economic nature of the underlying forces that drive movements in interest rates. This issue has been addressed by a burgeoning macro-finance literature, which is described in Rudebusch and Wu (2007, 2008).
2 Alternative flexible parameterizations of the yield curve include the use of Legendre polynomials (as in Almeida and Vicente, 2008) and natural cubic splines (as in Bowsher and Meeks, 2008).
We estimate the independent-factor versions of the four-factor and five-factor non-AF models and the independent-factor version of the five-factor arbitrage-free AFGNS model. We compare the results to those obtained by CDR for the DNS and AFNS models and find good in-sample fit for the AFGNS model. The remainder of the paper is structured as follows. Section 2 briefly describes the DNS model and its arbitrage-free equivalent as derived in CDR. Section 3 contains the description of the AFGNS model. Section 4 describes the five specific models that we analyze, while Section 5 describes the data, estimation method and estimation results. Section 6 concludes the paper, and an Appendix contains some additional technical details.
2. NELSON–SIEGEL TERM STRUCTURE MODELS

In this section, we review the DNS and AFNS models that maintain the Nelson–Siegel factor loading structure.

2.1. The dynamic Nelson–Siegel model

The Nelson–Siegel curve fits the term structure of interest rates at any point in time with the simple functional form

$$y(\tau) = \beta_0 + \beta_1\,\frac{1 - e^{-\lambda\tau}}{\lambda\tau} + \beta_2\left(\frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right), \tag{2.1}$$
where y(τ) is the zero-coupon yield with τ denoting the time to maturity, and β₀, β₁, β₂ and λ are model parameters.3 As many have noted, this representation is able to provide a good fit to the cross section of yields at a given point in time, and this is a key reason for its popularity with financial market practitioners. Still, to understand the evolution of the bond market over time, a dynamic representation is required. Diebold and Li (2006) supply such a model by replacing the parameters with time-varying factors:

$$y_t(\tau) = L_t + S_t\,\frac{1 - e^{-\lambda\tau}}{\lambda\tau} + C_t\left(\frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right). \tag{2.2}$$
Given their associated Nelson–Siegel factor loadings, Diebold and Li show that L_t, S_t and C_t can be interpreted as level, slope and curvature factors. Furthermore, once the model is viewed as a factor model, a dynamic structure can be postulated for the three factors, which yields a DNS model. Despite its good empirical performance, however, the DNS model does not impose absence of arbitrage (e.g. Filipović, 1999, and Diebold et al., 2005). This problem was solved in CDR, where we derived the affine arbitrage-free class of DNS term structure models, referred to as the AFNS model in the remainder of this paper.
3 This is equation (2) in Nelson and Siegel (1987).
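As a small illustration of equation (2.2), the three factor loadings can be evaluated as follows (a sketch with our own naming; `lam` is the decay parameter λ):

```python
import numpy as np

def dns_loadings(tau, lam):
    """Level, slope and curvature loadings of equation (2.2)."""
    tau = np.asarray(tau, dtype=float)
    slope = (1 - np.exp(-lam * tau)) / (lam * tau)
    curvature = slope - np.exp(-lam * tau)
    level = np.ones_like(tau)
    return level, slope, curvature

# y_t(tau) = L_t * level + S_t * slope + C_t * curvature, e.g.
# dns_loadings(np.array([0.25, 1.0, 5.0, 10.0, 30.0]), lam=0.5)
```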
2.2. The arbitrage-free Nelson–Siegel model

The derivation in CDR of the class of AFNS models starts from the standard continuous-time affine arbitrage-free term structure model. In this framework, we consider a three-factor model with a constant volatility matrix, i.e. in the terminology of the canonical characterization of affine term structure models provided by Dai and Singleton (2000), we start with the A₀(3) class of term structure models. Within the A₀(3) class, CDR prove the following proposition.

PROPOSITION 2.1. Assume that the instantaneous risk-free rate is defined by r_t = X_t¹ + X_t². In addition, assume that the state variables X_t = (X_t¹, X_t², X_t³) are described by the following system of stochastic differential equations (SDEs) under the risk-neutral Q-measure:

$$\begin{pmatrix} dX_t^1 \\ dX_t^2 \\ dX_t^3 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}\left[\begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \end{pmatrix} - \begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \end{pmatrix}\right]dt + \Sigma\begin{pmatrix} dW_t^{1,Q} \\ dW_t^{2,Q} \\ dW_t^{3,Q} \end{pmatrix}, \qquad \lambda > 0.$$

Then, zero-coupon bond prices are given by

$$P(t,T) = E_t^Q\left[\exp\left(-\int_t^T r_u\,du\right)\right] = \exp\big(B^1(t,T)X_t^1 + B^2(t,T)X_t^2 + B^3(t,T)X_t^3 + C(t,T)\big),$$

where B¹(t,T), B²(t,T), B³(t,T) and C(t,T) are the unique solutions to the following system of ordinary differential equations (ODEs):

$$\begin{pmatrix} \frac{dB^1(t,T)}{dt} \\ \frac{dB^2(t,T)}{dt} \\ \frac{dB^3(t,T)}{dt} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & -\lambda & \lambda \end{pmatrix}\begin{pmatrix} B^1(t,T) \\ B^2(t,T) \\ B^3(t,T) \end{pmatrix} \tag{2.3}$$

and

$$\frac{dC(t,T)}{dt} = -B(t,T)'K^Q\theta^Q - \frac{1}{2}\sum_{j=1}^{3}\big(\Sigma' B(t,T)B(t,T)'\Sigma\big)_{j,j}, \tag{2.4}$$

with boundary conditions B¹(T,T) = B²(T,T) = B³(T,T) = C(T,T) = 0. The unique solution to this system of ODEs is

$$B^1(t,T) = -(T-t), \qquad B^2(t,T) = -\frac{1 - e^{-\lambda(T-t)}}{\lambda}, \qquad B^3(t,T) = (T-t)e^{-\lambda(T-t)} - \frac{1 - e^{-\lambda(T-t)}}{\lambda},$$
and

$$C(t,T) = (K^Q\theta^Q)_2\int_t^T B^2(s,T)\,ds + (K^Q\theta^Q)_3\int_t^T B^3(s,T)\,ds + \frac{1}{2}\sum_{j=1}^{3}\int_t^T \big(\Sigma' B(s,T)B(s,T)'\Sigma\big)_{j,j}\,ds.$$
Given a general volatility matrix ⎛
σ11
⎜ = ⎝ σ21 σ31
σ12
σ13
⎞
σ22
⎟ σ23 ⎠ ,
σ32
σ33
the yield-adjustment term can be derived in analytical form as

$$\begin{aligned}
\frac{C(t,T)}{T-t} &= \frac{1}{2}\,\frac{1}{T-t}\int_t^T\sum_{j=1}^{3}\big(\Sigma' B(s,T)B(s,T)'\Sigma\big)_{j,j}\,ds \\
&= A\,\frac{(T-t)^2}{6} \\
&\quad + B\left[\frac{1}{2\lambda^2} - \frac{1}{\lambda^3}\,\frac{1-e^{-\lambda(T-t)}}{T-t} + \frac{1}{4\lambda^3}\,\frac{1-e^{-2\lambda(T-t)}}{T-t}\right] \\
&\quad + C\left[\frac{1}{2\lambda^2} + \frac{1}{\lambda^2}e^{-\lambda(T-t)} - \frac{1}{4\lambda}(T-t)e^{-2\lambda(T-t)} - \frac{3}{4\lambda^2}e^{-2\lambda(T-t)} - \frac{2}{\lambda^3}\,\frac{1-e^{-\lambda(T-t)}}{T-t} + \frac{5}{8\lambda^3}\,\frac{1-e^{-2\lambda(T-t)}}{T-t}\right] \\
&\quad + D\left[\frac{1}{2\lambda}(T-t) + \frac{1}{\lambda^2}e^{-\lambda(T-t)} - \frac{1}{\lambda^3}\,\frac{1-e^{-\lambda(T-t)}}{T-t}\right] \\
&\quad + E\left[\frac{3}{\lambda^2}e^{-\lambda(T-t)} + \frac{1}{2\lambda}(T-t) + \frac{1}{\lambda}(T-t)e^{-\lambda(T-t)} - \frac{3}{\lambda^3}\,\frac{1-e^{-\lambda(T-t)}}{T-t}\right] \\
&\quad + F\left[\frac{1}{\lambda^2} + \frac{1}{\lambda^2}e^{-\lambda(T-t)} - \frac{1}{2\lambda^2}e^{-2\lambda(T-t)} - \frac{3}{\lambda^3}\,\frac{1-e^{-\lambda(T-t)}}{T-t} + \frac{3}{4\lambda^3}\,\frac{1-e^{-2\lambda(T-t)}}{T-t}\right],
\end{aligned}$$

where
• A = σ₁₁² + σ₁₂² + σ₁₃²,
• B = σ₂₁² + σ₂₂² + σ₂₃²,
• C = σ₃₁² + σ₃₂² + σ₃₃²,
• D = σ₁₁σ₂₁ + σ₁₂σ₂₂ + σ₁₃σ₂₃,
• E = σ₁₁σ₃₁ + σ₁₂σ₃₂ + σ₁₃σ₃₃,
• F = σ₂₁σ₃₁ + σ₂₂σ₃₂ + σ₂₃σ₃₃.

4 As explained in CDR, this form of the yield-adjustment term is obtained by fixing the mean parameters of the state variables under the Q-measure at zero, i.e. θ^Q = 0, which implies no loss of generality.
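Since the yield-adjustment term is available in closed form, it is cheap to evaluate numerically. The following sketch transcribes the expression above directly; `Sigma` is the 3×3 volatility matrix, `lam` the decay parameter λ and `m = T − t` (all names are our own):

```python
import numpy as np

def yield_adjustment(Sigma, lam, m):
    """C(t,T)/(T-t) from the closed-form expression; the model yield
    subtracts this quantity. m = T - t, in years."""
    S = np.asarray(Sigma, dtype=float)
    A, B, C = (S[0] ** 2).sum(), (S[1] ** 2).sum(), (S[2] ** 2).sum()
    D, E, F = S[0] @ S[1], S[0] @ S[2], S[1] @ S[2]
    e1, e2 = np.exp(-lam * m), np.exp(-2 * lam * m)
    g1, g2 = (1 - e1) / m, (1 - e2) / m   # (1 - e^{-k lam m}) / (T - t)
    out = A * m**2 / 6
    out += B * (1 / (2 * lam**2) - g1 / lam**3 + g2 / (4 * lam**3))
    out += C * (1 / (2 * lam**2) + e1 / lam**2 - m * e2 / (4 * lam)
                - 3 * e2 / (4 * lam**2) - 2 * g1 / lam**3
                + 5 * g2 / (8 * lam**3))
    out += D * (m / (2 * lam) + e1 / lam**2 - g1 / lam**3)
    out += E * (3 * e1 / lam**2 + m / (2 * lam) + m * e1 / lam
                - 3 * g1 / lam**3)
    out += F * (1 / lam**2 + e1 / lam**2 - e2 / (2 * lam**2)
                - 3 * g1 / lam**3 + 3 * g2 / (4 * lam**3))
    return out
```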
This result has two implications. First, the fact that zero-coupon bond yields in the AFNS class of models are given by an analytical formula greatly facilitates empirical implementation of these models. Second, the nine underlying volatility parameters are not identified. Indeed, only the six terms A, B, C, D, E and F can be identified; thus, the maximally flexible AFNS specification that can be identified has a triangular volatility matrix given by5

$$\Sigma = \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}.$$
3. EXTENSIONS OF THE NELSON–SIEGEL MODEL

The main in-sample problem with the regular Nelson–Siegel yield curve is that, for reasonable choices of λ (which are empirically in the range from 0.5 to 1 for U.S. Treasury yield data), the factor loadings for the slope and the curvature factor decay rapidly to zero as a function of maturity. Thus, only the level factor is available to fit yields with maturities of ten years or longer. In empirical estimation, this limitation shows up as a lack of fit of the long-term yields, as described in CDR. To address this problem in fitting the cross section of yields, Svensson (1995) introduced an extended version of the Nelson–Siegel yield curve with an additional curvature factor,

$$y(\tau) = \beta_1 + \beta_2\,\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} + \beta_3\left(\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau}\right) + \beta_4\left(\frac{1-e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-\lambda_2\tau}\right).$$

Just as Diebold and Li (2006) replaced the three β coefficients with dynamic factors in the regular Nelson–Siegel model, we can replace the four β coefficients in the Svensson model with dynamic
5 The choice of upper or lower triangular is irrelevant for the fit of the model.
[Figure 1. Factor loadings in the yield functions of the DNSS and DGNS models. (a) Factor loadings in the DNSS model: level, slope, curvature no. 1 and curvature no. 2. (b) Factor loadings in the DGNS model: level, slope no. 1, slope no. 2, curvature no. 1 and curvature no. 2. Horizontal axes: time to maturity in years (0 to 30); vertical axes: factor loading.]
processes (L_t, S_t, C_t¹, C_t²) interpreted as a level, a slope and two curvature factors, respectively. Thus, the dynamic factor model representation of the Svensson yield curve, which we label the DNSS model, is given by

$$y_t(\tau) = L_t + S_t\,\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} + C_t^1\left(\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau}\right) + C_t^2\left(\frac{1-e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-\lambda_2\tau}\right),$$

along with the processes describing factor dynamics. The factor loadings of the four state variables in the yield function of the DNSS model are illustrated in Figure 1(a), with λ₁ and λ₂ set equal to our estimates described in Section 5, namely 0.8379 and 0.09653, respectively. The critique raised by Filipović (1999) against the dynamic version of the Nelson–Siegel model also applies to the dynamic version of the Svensson model introduced in this paper. Thus, this model is not consistent with the concept of absence of arbitrage. Ideally, we would like to repeat the work in CDR and derive an arbitrage-free approximation to the DNSS model. However, from the mechanics of Proposition 2.1 for the arbitrage-free approximation of the regular Nelson–Siegel model, it is clear that we can only obtain the Nelson–Siegel factor loading structure for the slope and curvature factors under two specific conditions. First, each pair of slope and curvature factors must have identical own mean-reversion rates. Second, the impact of deviations of the curvature factor from its mean on the slope factor must be scaled by a factor equal to that own mean-reversion rate (λ). Thus, it is impossible in an arbitrage-free model to generate the factor loading structure of two curvature factors with only one slope factor. Consequently, it is impossible to create an arbitrage-free version of the Svensson extension to the Nelson–Siegel model that has factor loadings analytically identical to the ones in the DNSS model. However, this discussion suggests that we can create a generalized AF Nelson–Siegel model by including a fifth factor in the form of a second slope factor. The yield function of this model
takes the form

$$y_t(\tau) = L_t + S_t^1\,\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} + S_t^2\,\frac{1-e^{-\lambda_2\tau}}{\lambda_2\tau} + C_t^1\left(\frac{1-e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau}\right) + C_t^2\left(\frac{1-e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-\lambda_2\tau}\right).$$

This dynamic generalized Nelson–Siegel model, which we denote as the DGNS model, is a five-factor model with one level factor, two slope factors and two curvature factors. (Note that we impose the restriction that λ₁ > λ₂, which is non-binding due to symmetry.6) The factor loadings of the five state variables in the yield function of the DGNS model are illustrated in Figure 1(b), with λ₁ and λ₂ set equal to our estimates in Section 5, namely 1.190 and 0.1021, respectively. These λᵢ values equal the estimated values obtained below, and they require maturity to be measured in years. A straightforward extension of Proposition 2.1 delivers the arbitrage-free approximation of this model, which we denote as the AFGNS model.

PROPOSITION 3.1. Assume that the instantaneous risk-free rate is defined by r_t = X_t¹ + X_t² + X_t³. In addition, assume that the state variables X_t = (X_t¹, X_t², X_t³, X_t⁴, X_t⁵) are described by the following system of SDEs under the risk-neutral Q-measure:

$$\begin{pmatrix} dX_t^1 \\ dX_t^2 \\ dX_t^3 \\ dX_t^4 \\ dX_t^5 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & -\lambda_1 & 0 \\ 0 & 0 & \lambda_2 & 0 & -\lambda_2 \\ 0 & 0 & 0 & \lambda_1 & 0 \\ 0 & 0 & 0 & 0 & \lambda_2 \end{pmatrix}\left[\begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \\ \theta_4^Q \\ \theta_5^Q \end{pmatrix} - \begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \\ X_t^4 \\ X_t^5 \end{pmatrix}\right]dt + \Sigma\begin{pmatrix} dW_t^{1,Q} \\ dW_t^{2,Q} \\ dW_t^{3,Q} \\ dW_t^{4,Q} \\ dW_t^{5,Q} \end{pmatrix},$$

where λ₁ > λ₂ > 0. Then, zero-coupon bond prices are given by

$$P(t,T) = E_t^Q\left[\exp\left(-\int_t^T r_u\,du\right)\right] = \exp\big(B^1(t,T)X_t^1 + B^2(t,T)X_t^2 + B^3(t,T)X_t^3 + B^4(t,T)X_t^4 + B^5(t,T)X_t^5 + C(t,T)\big),$$
6 Björk and Christensen (1999) introduce a related extension of the Nelson–Siegel model with one level factor, two slope factors and a single curvature factor with the restriction that λ₁ = 2λ₂.
where B¹(t,T), B²(t,T), B³(t,T), B⁴(t,T), B⁵(t,T) and C(t,T) are the unique solutions to the following system of ODEs:

$$\begin{pmatrix} \frac{dB^1(t,T)}{dt} \\ \frac{dB^2(t,T)}{dt} \\ \frac{dB^3(t,T)}{dt} \\ \frac{dB^4(t,T)}{dt} \\ \frac{dB^5(t,T)}{dt} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 & 0 \\ 0 & 0 & \lambda_2 & 0 & 0 \\ 0 & -\lambda_1 & 0 & \lambda_1 & 0 \\ 0 & 0 & -\lambda_2 & 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} B^1(t,T) \\ B^2(t,T) \\ B^3(t,T) \\ B^4(t,T) \\ B^5(t,T) \end{pmatrix} \tag{3.1}$$

and

$$\frac{dC(t,T)}{dt} = -B(t,T)'K^Q\theta^Q - \frac{1}{2}\sum_{j=1}^{5}\big(\Sigma' B(t,T)B(t,T)'\Sigma\big)_{j,j}, \tag{3.2}$$

with boundary conditions B¹(T,T) = B²(T,T) = B³(T,T) = B⁴(T,T) = B⁵(T,T) = C(T,T) = 0. The unique solution to this system of ODEs is

$$B^1(t,T) = -(T-t), \qquad B^2(t,T) = -\frac{1-e^{-\lambda_1(T-t)}}{\lambda_1}, \qquad B^3(t,T) = -\frac{1-e^{-\lambda_2(T-t)}}{\lambda_2},$$
$$B^4(t,T) = (T-t)e^{-\lambda_1(T-t)} - \frac{1-e^{-\lambda_1(T-t)}}{\lambda_1}, \qquad B^5(t,T) = (T-t)e^{-\lambda_2(T-t)} - \frac{1-e^{-\lambda_2(T-t)}}{\lambda_2},$$

and

$$C(t,T) = (K^Q\theta^Q)_2\int_t^T B^2(s,T)\,ds + (K^Q\theta^Q)_3\int_t^T B^3(s,T)\,ds + (K^Q\theta^Q)_4\int_t^T B^4(s,T)\,ds + (K^Q\theta^Q)_5\int_t^T B^5(s,T)\,ds + \frac{1}{2}\sum_{j=1}^{5}\int_t^T\big(\Sigma' B(s,T)B(s,T)'\Sigma\big)_{j,j}\,ds.$$

Finally, zero-coupon bond yields are given by

$$y(t,T) = X_t^1 + \frac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)}\,X_t^2 + \frac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)}\,X_t^3 + \left(\frac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)} - e^{-\lambda_1(T-t)}\right)X_t^4 + \left(\frac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)} - e^{-\lambda_2(T-t)}\right)X_t^5 - \frac{C(t,T)}{T-t}.$$

The proof is a straightforward extension of CDR.
Similar to the AFNS class of models, the yield-adjustment term will have the following form:7

$$-\frac{C(t,T)}{T-t} = -\frac{1}{2}\,\frac{1}{T-t}\sum_{j=1}^{5}\int_t^T\big(\Sigma' B(s,T)B(s,T)'\Sigma\big)_{j,j}\,ds.$$
Following arguments similar to the ones provided for the AFNS class of models in the previous section, the maximally flexible specification of the volatility matrix that can be identified in estimation is given by a triangular matrix

$$\Sigma = \begin{pmatrix} \sigma_{11} & 0 & 0 & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 & 0 & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & 0 & 0 \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & 0 \\ \sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_{55} \end{pmatrix}.$$
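For illustration, the five factor loadings in the AFGNS yield function of Proposition 3.1 can be evaluated as follows (a sketch with our own naming; the yield-adjustment term C(t,T)/(T−t) is omitted here):

```python
import numpy as np

def afgns_loadings(m, lam1, lam2):
    """Loadings on (X1,...,X5) in the AFGNS yield function; m = T - t."""
    m = np.asarray(m, dtype=float)
    s1 = (1 - np.exp(-lam1 * m)) / (lam1 * m)   # slope no. 1
    s2 = (1 - np.exp(-lam2 * m)) / (lam2 * m)   # slope no. 2
    c1 = s1 - np.exp(-lam1 * m)                 # curvature no. 1
    c2 = s2 - np.exp(-lam2 * m)                 # curvature no. 2
    level = np.ones_like(m)
    return level, s1, s2, c1, c2

# y(t,T) = X1 + s1*X2 + s2*X3 + c1*X4 + c2*X5 - C(t,T)/(T-t)
```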
4. FIVE SPECIFIC NELSON–SIEGEL MODELS

In general, all the models considered in this paper are silent about the P-dynamics, and an infinite number of possible specifications could be used to match the data. However, for continuity with the existing literature, our econometric analysis focuses on independent-factor versions of the five different models we have described. These models include the DNS and AFNS models from CDR and the generalized DNSS, DGNS and AFGNS models introduced in Section 3. In the independent-factor DNS model, all three state variables are assumed to be independent first-order autoregressions, as in Diebold and Li (2006). Using their notation, the state equation is given by

$$\begin{pmatrix} L_t - \mu_L \\ S_t - \mu_S \\ C_t - \mu_C \end{pmatrix} = \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}\begin{pmatrix} L_{t-1} - \mu_L \\ S_{t-1} - \mu_S \\ C_{t-1} - \mu_C \end{pmatrix} + \begin{pmatrix} \eta_t(L) \\ \eta_t(S) \\ \eta_t(C) \end{pmatrix},$$
where the error terms η_t(L), η_t(S) and η_t(C) have a conditional covariance matrix given by

$$Q = \begin{pmatrix} q_{11}^2 & 0 & 0 \\ 0 & q_{22}^2 & 0 \\ 0 & 0 & q_{33}^2 \end{pmatrix}.$$
7 The analytical formula for the yield-adjustment term in the AFGNS model is provided in the Appendix. As was the case for Proposition 2.1, Proposition 3.1 is also silent about the P-dynamics of the state variables, so to identify the model, we follow CDR and fix the mean under the Q-measure at zero, i.e. θ^Q = 0.
In this model, the measurement equation takes the form

$$\begin{pmatrix} y_t(\tau_1) \\ y_t(\tau_2) \\ \vdots \\ y_t(\tau_N) \end{pmatrix} = \begin{pmatrix} 1 & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} \\ 1 & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} - e^{-\lambda\tau_2} \\ \vdots & \vdots & \vdots \\ 1 & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N} \end{pmatrix}\begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} + \begin{pmatrix} \varepsilon_t(\tau_1) \\ \varepsilon_t(\tau_2) \\ \vdots \\ \varepsilon_t(\tau_N) \end{pmatrix},$$
where the measurement errors $\varepsilon_t(\tau_i)$ are assumed to be independently and identically distributed (i.i.d.) white noise.
The corresponding AFNS model is formulated in continuous time and the relationship between the real-world dynamics under the P-measure and the risk-neutral dynamics under the Q-measure is given by the measure change $dW_t^Q = dW_t^P + \Gamma_t\,dt$, where $\Gamma_t$ represents the risk premium specification. To preserve affine dynamics under the P-measure, we limit our focus to essentially affine risk premium specifications (see Duffee, 2002). Thus, $\Gamma_t$ will take the form
\[
\Gamma_t = \begin{pmatrix} \gamma_1^0\\ \gamma_2^0\\ \gamma_3^0 \end{pmatrix}
+ \begin{pmatrix} \gamma_{11}^1 & \gamma_{12}^1 & \gamma_{13}^1\\ \gamma_{21}^1 & \gamma_{22}^1 & \gamma_{23}^1\\ \gamma_{31}^1 & \gamma_{32}^1 & \gamma_{33}^1 \end{pmatrix}
\begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3 \end{pmatrix}.
\]
With this specification, the SDE for the state variables under the P-measure,
\[
dX_t = K^P[\theta^P - X_t]\,dt + \Sigma\,dW_t^P,
\tag{4.1}
\]
remains affine. Due to the flexible specification of $\Gamma_t$, we are free to choose any mean vector $\theta^P$ and mean-reversion matrix $K^P$ under the P-measure and still preserve the required Q-dynamic structure described in Proposition 2.1. Therefore, we focus on the independent-factor AFNS model, which corresponds to the specific DNS model from earlier in this section and assumes all three factors are independent under the P-measure:
\[
\begin{pmatrix} dX_t^1\\ dX_t^2\\ dX_t^3 \end{pmatrix}
= \begin{pmatrix} \kappa_{11}^P & 0 & 0\\ 0 & \kappa_{22}^P & 0\\ 0 & 0 & \kappa_{33}^P \end{pmatrix}
\left[\begin{pmatrix} \theta_1^P\\ \theta_2^P\\ \theta_3^P \end{pmatrix} - \begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3 \end{pmatrix}\right]dt
+ \begin{pmatrix} \sigma_1 & 0 & 0\\ 0 & \sigma_2 & 0\\ 0 & 0 & \sigma_3 \end{pmatrix}
\begin{pmatrix} dW_t^{1,P}\\ dW_t^{2,P}\\ dW_t^{3,P} \end{pmatrix}.
\]
In this case, the measurement equation takes the form
\[
\begin{pmatrix} y_t(\tau_1)\\ y_t(\tau_2)\\ \vdots\\ y_t(\tau_N) \end{pmatrix}
= \begin{pmatrix}
1 & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1}\\
\vdots & \vdots & \vdots\\
1 & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N}
\end{pmatrix}
\begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3 \end{pmatrix}
- \begin{pmatrix} \frac{C(\tau_1)}{\tau_1}\\ \vdots\\ \frac{C(\tau_N)}{\tau_N} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_t(\tau_1)\\ \vdots\\ \varepsilon_t(\tau_N) \end{pmatrix},
\]
where, again, the measurement errors $\varepsilon_t(\tau_i)$ are assumed to be i.i.d. white noise.
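Since the DNS and AFNS models share the same loading matrix $B$, a helper along the following lines (a sketch with invented names, shown only for illustration) can build it for a vector of maturities; the generalized models below simply append the analogous $\lambda_2$-columns.

```python
import numpy as np

def ns_measurement_matrix(taus, lam):
    """N x 3 matrix with level, slope and curvature loadings at maturities taus."""
    taus = np.asarray(taus, dtype=float)
    slope = (1 - np.exp(-lam * taus)) / (lam * taus)
    curvature = slope - np.exp(-lam * taus)
    return np.column_stack([np.ones_like(taus), slope, curvature])
```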
We now turn to the three generalized Nelson–Siegel models. In the independent-factor DNSS model, all four state variables are assumed to be independent first-order autoregressions, as in Diebold and Li (2006). Using their notation, the state equation is given by
\[
\begin{pmatrix} L_t - \mu_L\\ S_t - \mu_S\\ C_t^1 - \mu_{C^1}\\ C_t^2 - \mu_{C^2} \end{pmatrix}
= \begin{pmatrix} a_{11} & 0 & 0 & 0\\ 0 & a_{22} & 0 & 0\\ 0 & 0 & a_{33} & 0\\ 0 & 0 & 0 & a_{44} \end{pmatrix}
\begin{pmatrix} L_{t-1} - \mu_L\\ S_{t-1} - \mu_S\\ C_{t-1}^1 - \mu_{C^1}\\ C_{t-1}^2 - \mu_{C^2} \end{pmatrix}
+ \begin{pmatrix} \eta_t(L)\\ \eta_t(S)\\ \eta_t(C^1)\\ \eta_t(C^2) \end{pmatrix},
\]
where the error terms $\eta_t(L)$, $\eta_t(S)$, $\eta_t(C^1)$ and $\eta_t(C^2)$ have a conditional covariance matrix given by
\[
Q = \begin{pmatrix} q_{11}^2 & 0 & 0 & 0\\ 0 & q_{22}^2 & 0 & 0\\ 0 & 0 & q_{33}^2 & 0\\ 0 & 0 & 0 & q_{44}^2 \end{pmatrix}.
\]
In the DNSS model, the measurement equation takes the form
\[
\begin{pmatrix} y_t(\tau_1)\\ \vdots\\ y_t(\tau_N) \end{pmatrix}
= \begin{pmatrix}
1 & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} - e^{-\lambda_1\tau_1} & \frac{1-e^{-\lambda_2\tau_1}}{\lambda_2\tau_1} - e^{-\lambda_2\tau_1}\\
\vdots & \vdots & \vdots & \vdots\\
1 & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} - e^{-\lambda_1\tau_N} & \frac{1-e^{-\lambda_2\tau_N}}{\lambda_2\tau_N} - e^{-\lambda_2\tau_N}
\end{pmatrix}
\begin{pmatrix} L_t\\ S_t\\ C_t^1\\ C_t^2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_t(\tau_1)\\ \vdots\\ \varepsilon_t(\tau_N) \end{pmatrix},
\]
where the measurement errors $\varepsilon_t(\tau_i)$ are assumed to be i.i.d. white noise.
In the independent-factor DGNS model, all five state variables are assumed to be independent first-order autoregressions, and the state equation is given by
\[
\begin{pmatrix} L_t - \mu_L\\ S_t^1 - \mu_{S^1}\\ S_t^2 - \mu_{S^2}\\ C_t^1 - \mu_{C^1}\\ C_t^2 - \mu_{C^2} \end{pmatrix}
= \begin{pmatrix} a_{11} & 0 & 0 & 0 & 0\\ 0 & a_{22} & 0 & 0 & 0\\ 0 & 0 & a_{33} & 0 & 0\\ 0 & 0 & 0 & a_{44} & 0\\ 0 & 0 & 0 & 0 & a_{55} \end{pmatrix}
\begin{pmatrix} L_{t-1} - \mu_L\\ S_{t-1}^1 - \mu_{S^1}\\ S_{t-1}^2 - \mu_{S^2}\\ C_{t-1}^1 - \mu_{C^1}\\ C_{t-1}^2 - \mu_{C^2} \end{pmatrix}
+ \begin{pmatrix} \eta_t(L)\\ \eta_t(S^1)\\ \eta_t(S^2)\\ \eta_t(C^1)\\ \eta_t(C^2) \end{pmatrix},
\]
where the error terms $\eta_t(L)$, $\eta_t(S^1)$, $\eta_t(S^2)$, $\eta_t(C^1)$ and $\eta_t(C^2)$ have a conditional covariance matrix given by
\[
Q = \begin{pmatrix} q_{11}^2 & 0 & 0 & 0 & 0\\ 0 & q_{22}^2 & 0 & 0 & 0\\ 0 & 0 & q_{33}^2 & 0 & 0\\ 0 & 0 & 0 & q_{44}^2 & 0\\ 0 & 0 & 0 & 0 & q_{55}^2 \end{pmatrix}.
\]
In the DGNS model, the measurement equation takes the form
\[
\begin{pmatrix} y_t(\tau_1)\\ \vdots\\ y_t(\tau_N) \end{pmatrix}
= \begin{pmatrix}
1 & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} & \frac{1-e^{-\lambda_2\tau_1}}{\lambda_2\tau_1} & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} - e^{-\lambda_1\tau_1} & \frac{1-e^{-\lambda_2\tau_1}}{\lambda_2\tau_1} - e^{-\lambda_2\tau_1}\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
1 & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} & \frac{1-e^{-\lambda_2\tau_N}}{\lambda_2\tau_N} & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} - e^{-\lambda_1\tau_N} & \frac{1-e^{-\lambda_2\tau_N}}{\lambda_2\tau_N} - e^{-\lambda_2\tau_N}
\end{pmatrix}
\begin{pmatrix} L_t\\ S_t^1\\ S_t^2\\ C_t^1\\ C_t^2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_t(\tau_1)\\ \vdots\\ \varepsilon_t(\tau_N) \end{pmatrix},
\]
where the measurement errors $\varepsilon_t(\tau_i)$ are assumed to be i.i.d. white noise.
Finally, as for the AFNS model, the AFGNS model is formulated in continuous time and the relationship between the real-world dynamics under the P-measure and the risk-neutral dynamics under the Q-measure is given by the measure change $dW_t^Q = dW_t^P + \Gamma_t\,dt$, where $\Gamma_t$ represents the risk premium specification. Again, to preserve affine dynamics under the P-measure, we limit our focus to essentially affine risk premium specifications (see Duffee, 2002). Thus, $\Gamma_t$ takes the form
\[
\Gamma_t = \begin{pmatrix} \gamma_1^0\\ \gamma_2^0\\ \gamma_3^0\\ \gamma_4^0\\ \gamma_5^0 \end{pmatrix}
+ \begin{pmatrix}
\gamma_{11}^1 & \gamma_{12}^1 & \gamma_{13}^1 & \gamma_{14}^1 & \gamma_{15}^1\\
\gamma_{21}^1 & \gamma_{22}^1 & \gamma_{23}^1 & \gamma_{24}^1 & \gamma_{25}^1\\
\gamma_{31}^1 & \gamma_{32}^1 & \gamma_{33}^1 & \gamma_{34}^1 & \gamma_{35}^1\\
\gamma_{41}^1 & \gamma_{42}^1 & \gamma_{43}^1 & \gamma_{44}^1 & \gamma_{45}^1\\
\gamma_{51}^1 & \gamma_{52}^1 & \gamma_{53}^1 & \gamma_{54}^1 & \gamma_{55}^1
\end{pmatrix}
\begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3\\ X_t^4\\ X_t^5 \end{pmatrix}.
\]
With this specification, the SDE for the state variables under the P-measure,
\[
dX_t = K^P[\theta^P - X_t]\,dt + \Sigma\,dW_t^P,
\tag{4.2}
\]
remains affine. Due to the flexible specification of $\Gamma_t$, we are free to choose any mean vector $\theta^P$ and mean-reversion matrix $K^P$ under the P-measure and still preserve the required structure for the Q-dynamics described in Proposition 3.1. Therefore, we focus on the AFGNS model that corresponds to the specific DGNS model we have described earlier. In this independent-factor AFGNS model, all five factors are assumed to be independent under the P-measure:
\[
\begin{pmatrix} dX_t^1\\ dX_t^2\\ dX_t^3\\ dX_t^4\\ dX_t^5 \end{pmatrix}
= \begin{pmatrix}
\kappa_{11}^P & 0 & 0 & 0 & 0\\
0 & \kappa_{22}^P & 0 & 0 & 0\\
0 & 0 & \kappa_{33}^P & 0 & 0\\
0 & 0 & 0 & \kappa_{44}^P & 0\\
0 & 0 & 0 & 0 & \kappa_{55}^P
\end{pmatrix}
\left[\begin{pmatrix} \theta_1^P\\ \theta_2^P\\ \theta_3^P\\ \theta_4^P\\ \theta_5^P \end{pmatrix} - \begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3\\ X_t^4\\ X_t^5 \end{pmatrix}\right]dt
+ \begin{pmatrix}
\sigma_1 & 0 & 0 & 0 & 0\\
0 & \sigma_2 & 0 & 0 & 0\\
0 & 0 & \sigma_3 & 0 & 0\\
0 & 0 & 0 & \sigma_4 & 0\\
0 & 0 & 0 & 0 & \sigma_5
\end{pmatrix}
\begin{pmatrix} dW_t^{1,P}\\ dW_t^{2,P}\\ dW_t^{3,P}\\ dW_t^{4,P}\\ dW_t^{5,P} \end{pmatrix}.
\]
For the AFGNS model, the measurement equation takes the form
\[
\begin{pmatrix} y_t(\tau_1)\\ \vdots\\ y_t(\tau_N) \end{pmatrix}
= \begin{pmatrix}
1 & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} & \frac{1-e^{-\lambda_2\tau_1}}{\lambda_2\tau_1} & \frac{1-e^{-\lambda_1\tau_1}}{\lambda_1\tau_1} - e^{-\lambda_1\tau_1} & \frac{1-e^{-\lambda_2\tau_1}}{\lambda_2\tau_1} - e^{-\lambda_2\tau_1}\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
1 & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} & \frac{1-e^{-\lambda_2\tau_N}}{\lambda_2\tau_N} & \frac{1-e^{-\lambda_1\tau_N}}{\lambda_1\tau_N} - e^{-\lambda_1\tau_N} & \frac{1-e^{-\lambda_2\tau_N}}{\lambda_2\tau_N} - e^{-\lambda_2\tau_N}
\end{pmatrix}
\begin{pmatrix} X_t^1\\ X_t^2\\ X_t^3\\ X_t^4\\ X_t^5 \end{pmatrix}
- \begin{pmatrix} \frac{C(\tau_1)}{\tau_1}\\ \vdots\\ \frac{C(\tau_N)}{\tau_N} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_t(\tau_1)\\ \vdots\\ \varepsilon_t(\tau_N) \end{pmatrix},
\]
where, again, the measurement errors $\varepsilon_t(\tau_i)$ are assumed to be i.i.d. white noise.
5. ESTIMATION OF THE MODELS

In this section, we will first describe the interest rate data to be used and the estimation method. Next, we examine estimation results and in-sample fit for the DNS, AFNS, DNSS, DGNS and AFGNS models.

5.1. Data

Our data are monthly observations on U.S. Treasury security yields covering the period from January 1987 to December 2002 (and also used in CDR). The data are end-of-month, unsmoothed Fama-Bliss zero-coupon yields for 16 different maturities that range from three months to 30 years. Summary statistics of the yields are provided in Table 1, which lists the 16 maturities. Figure 2 displays the time series for the 3-month, 2-year and 10-year yields.

5.2. Estimation method

All 16 maturities are used throughout. Since the five models are affine Gaussian, we estimate them by maximizing the likelihood function in the standard Kalman filter algorithm, which is an efficient and consistent estimator in this setting (see Harvey, 1989). A separate advantage of the Kalman filter is that it lets the data speak on which maturities are fitted best by each model. Thus, we avoid identifying the factors of the models by assuming a corresponding number of yields are observed without error, as is done, e.g. in Duffee (2002). This is important for our analysis because we are comparing models with a varying number of factors and focus on the in-sample fit of the entire yield curve.
For the DNS, DNSS and DGNS models, the state equation is
\[
X_t = (I - A)\mu + A X_{t-1} + \eta_t, \qquad \eta_t \sim N(0, Q),
\]
Table 1. Summary statistics for U.S. Treasury Yields.

Maturity (months)   Mean     St.dev.   Skewness   Kurtosis
3                   0.0509   0.0174    −0.0598    2.8199
6                   0.0522   0.0175    −0.1400    2.7892
9                   0.0533   0.0176    −0.1681    2.7474
12                  0.0548   0.0177    −0.1960    2.7663
18                  0.0570   0.0173    −0.1951    2.7605
24                  0.0581   0.0166    −0.1797    2.7415
36                  0.0606   0.0155    −0.1160    2.6952
48                  0.0626   0.0148    −0.0829    2.5919
60                  0.0636   0.0144    −0.0196    2.4418
84                  0.0660   0.0138     0.0465    2.2071
96                  0.0670   0.0136     0.0610    2.1290
108                 0.0674   0.0136     0.0638    2.0617
120                 0.0674   0.0135     0.0618    1.9843
180                 0.0716   0.0123     0.2130    1.8874
240                 0.0725   0.0113     0.0760    1.7757
360                 0.0677   0.0121     0.0589    1.7428

Note: The summary statistics for our sample of monthly observed unsmoothed Fama-Bliss zero-coupon Treasury bond yields, which covers the period from January 1987 to December 2002.
Figure 2. Time series of U.S. Treasury Yields. Illustration of the observed Treasury zero-coupon bond yields covering the period from January 1987 to December 2002. The yields shown have 3-month, 2-year and 10-year maturities.
where $X_t = (L_t, S_t, C_t)'$, $X_t = (L_t, S_t, C_t^1, C_t^2)'$ and $X_t = (L_t, S_t^1, S_t^2, C_t^1, C_t^2)'$, respectively, while the measurement equation is given by
\[
y_t = B X_t + \varepsilon_t.
\]
Following Diebold et al. (2006), we start the algorithm at the unconditional mean and variance of the state variables. This assumes the state variables are stationary, which is imposed with the constraint that the eigenvalues of $A$ are smaller than 1.
For the continuous-time AFNS and AFGNS models, the conditional mean vector and the conditional covariance matrix are given by
\[
E^P[X_T \mid \mathcal{F}_t] = (I - \exp(-K^P \Delta t))\theta^P + \exp(-K^P \Delta t)X_t,
\qquad
V^P[X_T \mid \mathcal{F}_t] = \int_0^{\Delta t} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds,
\]
where $\Delta t = T - t$. By discretizing the continuous dynamics under the P-measure, we obtain the state equation
\[
X_i = (I - \exp(-K^P \Delta t_i))\theta^P + \exp(-K^P \Delta t_i)X_{i-1} + \eta_t,
\]
where $\Delta t_i = t_i - t_{i-1}$ is the time between observations. The conditional covariance matrix for the shock terms is given by
\[
Q = \int_0^{\Delta t_i} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds.
\]
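As a rough illustration of this exact discretization (the function below and its inputs are assumptions, not the authors' code), the transition matrix can be computed with a matrix exponential and the covariance integral with a simple quadrature:

```python
import numpy as np
from scipy.linalg import expm

def discretize(K, theta, Sigma, dt, n_grid=200):
    """Return (Phi, c, Q) with X_i = c + Phi @ X_{i-1} + eta_i, eta_i ~ N(0, Q)."""
    Phi = expm(-K * dt)
    c = (np.eye(K.shape[0]) - Phi) @ theta
    # Q = int_0^dt exp(-K s) Sigma Sigma' exp(-K' s) ds, by the trapezoidal rule.
    s = np.linspace(0.0, dt, n_grid)
    vals = np.array([expm(-K * si) @ Sigma @ Sigma.T @ expm(-K.T * si) for si in s])
    Q = (0.5 * (vals[0] + vals[-1]) + vals[1:-1].sum(axis=0)) * (s[1] - s[0])
    return Phi, c, Q
```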
Stationarity of the system under the P-measure is imposed by restricting the real component of each eigenvalue of $K^P$ to be positive. The Kalman filter for these models is also started at the unconditional mean and covariance$^{8}$
\[
\hat{X}_0 = \theta^P \qquad\text{and}\qquad \hat{\Sigma}_0 = \int_0^{\infty} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds.
\]
Finally, the AFNS and AFGNS measurement equation is given by
\[
y_t = A + B X_t + \varepsilon_t.
\]
For all five models, the error structure is
\[
\begin{pmatrix} \eta_t\\ \varepsilon_t \end{pmatrix} \sim N\left[\begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} Q & 0\\ 0 & H \end{pmatrix}\right],
\]
where $H$ is a diagonal matrix for the measurement errors of the 16 maturities used in estimation:
\[
H = \begin{pmatrix} \sigma^2(\tau_1) & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \sigma^2(\tau_{16}) \end{pmatrix}.
\]
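The likelihood that is maximized follows the usual prediction-error decomposition; the minimal recursion below (an illustration under the state-space form above, not the authors' implementation) makes the steps explicit:

```python
import numpy as np

def kalman_loglik(Y, A, B, c, Phi, Q, H, x0, P0):
    """Gaussian log likelihood of y_t = A + B x_t + eps_t, x_t = c + Phi x_{t-1} + eta_t."""
    x, P, loglik = x0, P0, 0.0
    for y in Y:                               # Y: iterable of N-vectors
        x = c + Phi @ x                       # predicted state
        P = Phi @ P @ Phi.T + Q
        v = y - (A + B @ x)                   # prediction error
        F = B @ P @ B.T + H
        Finv = np.linalg.inv(F)
        loglik -= 0.5 * (len(y) * np.log(2 * np.pi)
                         + np.linalg.slogdet(F)[1] + v @ Finv @ v)
        G = P @ B.T @ Finv                    # Kalman gain
        x, P = x + G @ v, P - G @ B @ P       # updated state and covariance
    return loglik
```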
$^{8}$ In the estimation, $\int_0^{\infty} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds$ is approximated by $\int_0^{10} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds$.
Table 2. Estimated dynamic parameters in the DNSS model.

A        L_{t−1}     S_{t−1}     C1_{t−1}    C2_{t−1}    μ            q
L_t      0.9839      0           0           0            0.04907     0.001835
         (0.0145)                                         (0.0112)    (0.000280)
S_t      0           0.9889      0           0           −0.006021    0.002728
                     (0.0126)                             (0.0208)    (0.000216)
C1_t     0           0           0.9565      0            0.003424    0.007988
                                 (0.0221)                 (0.0169)    (0.000448)
C2_t     0           0           0           0.9864       0.06082     0.006355
                                             (0.0146)     (0.0422)    (0.000682)

Note: This table reports the estimated A matrix and μ vector along with the estimated parameters of the Q matrix in the independent-factor DNSS model for the sample period from January 1987 to December 2002. The maximum log likelihood value is 16658.40. The estimated value of λ1 is 0.8379 (0.0117), while the estimated value of λ2 is 0.09653 (0.0163). The numbers in parentheses are the estimated standard deviations of the parameter estimates.
The linear least-squares optimality of the Kalman filter requires that the transition and measurement errors be orthogonal to the initial state, i.e.
\[
E[f_0 \eta_t'] = 0, \qquad E[f_0 \varepsilon_t'] = 0.
\]
Finally, parameter standard deviations are calculated as
\[
\hat{\Sigma}(\hat{\psi}) = \frac{1}{T}\left[\frac{1}{T}\sum_{t=1}^{T} \frac{\partial \log l_t(\hat{\psi})}{\partial \psi}\,\frac{\partial \log l_t(\hat{\psi})}{\partial \psi'}\right]^{-1},
\]
where $\hat{\psi}$ denotes the estimated model parameter set.

5.3. DNSS model estimation results

Table 2 presents the estimated mean-reversion matrix A and the estimated vector of mean parameters μ, along with the estimated parameters of the conditional covariance matrix Q obtained for the DNSS model. The results reveal that the slope factor is the most persistent factor. Also, the relatively large standard deviations of the estimated mean parameters suggest some difficulty in pinning down their values under the P-measure, which is likely related to the fairly high persistence of the state variables (e.g. Kim and Orphanides, 2005).
The λ1 parameter is estimated at 0.838, which implies a factor loading for the first curvature factor that peaks near the 2-year maturity. The estimated value of λ2 is 0.097, so the factor loading of the second curvature factor reaches its maximum near the 19-year maturity. (These loadings are illustrated in Figure 1(a).) Clearly, the two curvature factors take on very different roles in the fit of the model.
Volatility parameters across the various models are most easily compared by focusing on the 1-month conditional covariance matrix that they generate.
Figure 3. Level, slope and first curvature factors in the DNSS model. (a) Estimated level factor Lt; (b) estimated slope factor St; (c) estimated first curvature factor C1t. Each panel compares the DNSS estimate with the corresponding factor from the DNS model.
For the independent-factor DNSS model, the estimated matrix is given by
\[
Q^{\mathrm{DNSS}}_{\mathrm{indep}} = qq' = \begin{pmatrix}
3.37\times10^{-6} & 0 & 0 & 0\\
0 & 7.44\times10^{-6} & 0 & 0\\
0 & 0 & 6.38\times10^{-5} & 0\\
0 & 0 & 0 & 4.04\times10^{-5}
\end{pmatrix}.
\tag{5.1}
\]
The level factor has the smallest volatility, and the two curvature factors are the most volatile, similar to the CDR results for the DNS model.
In Figure 3, we compare the estimated level, slope and first curvature factors in the DNSS model to the corresponding factors estimated by CDR for the independent-factor DNS model. The correlations for these three factors across the two models are 0.553, 0.844 and 0.899, respectively. Thus, only the level factor changes notably when the second curvature factor is added to the model.
Figure 4. Second curvature factor in the DNSS model. Estimated path of C2t from the independent-factor DNSS model.
Intuitively, without the second curvature factor, only the level factor is able to fit the long-term yields. However, the second curvature factor can fit yields with maturities in the 10–30-year range, so when it is included, the level factor is allowed to fit other areas of the yield curve.
Figure 4 shows the second curvature factor. The estimated path of the second curvature factor from the independent-factor DNSS model is shown with the 10-year yield for comparison. The purpose of this factor is to improve the fit of long-term yields, and there is a clear relationship between it and the 10-year yield (with a correlation coefficient of 0.793). The second curvature factor also inherits the downward trend observed in long-term yields over this sample period, while the DNSS level factor starts to look more stationary.
Table 3 reports summary statistics for the fitted errors of all five models. With its additional flexibility, the DNSS model does show some improvement in fit over the DNS model, especially in the maturity range from 3 months to 8 years. There is also a slightly better DNSS model fit with long-term yields, which is consistent with the second curvature factor operating at long maturities.
Figure 5 displays the fitted yield curves from the independent-factor DNS, AFNS, DNSS, DGNS and AFGNS models estimated over the full sample from January 1987 to December 2002 on four specific dates (June 30, 1989; November 30, 1995; August 31, 1998; September 29, 2000). Observed yields are indicated with plus signs on these same dates. Figure 5 shows that at times the DNSS model still does not fit the long end of the yield curve very well.$^{9}$ Indeed, since the factor loading of the second curvature factor is practically flat in the 10–30-year maturity range, it can only provide a level difference between the shorter end of the yield curve and the very long end of the curve, but it cannot fit deviations between the 10-, 15-, 20- and 30-year yields.
The fitted errors reported in Table 3 for the DNSS model can be compared loosely to the errors reported by Gürkaynak et al. (2007), who use the Svensson yield curve to fit bond yields. Importantly, they fit the curve separately for each business day with no regard for the time series
$^{9}$ These four dates provide examples of the variety of yield curve shapes observed over this sample period and were selected by De Pooter (2007).
Table 3. Summary statistics of in-sample fit.

              DNS             AFNS            DNSS            DGNS            AFGNS
              indep.-factor   indep.-factor   indep.-factor   indep.-factor   indep.-factor
Mat. in mos.  Mean    RMSE    Mean    RMSE    Mean    RMSE    Mean    RMSE    Mean    RMSE
3             −1.64   12.26   −2.85   18.54    2.53   10.65    2.36    9.07    0.03    9.52
6             −0.24    1.09   −1.19    7.12    0.01    0.60   −0.06    1.05    0.01    0.86
9             −0.54    7.13   −1.24    3.44   −2.73    6.82   −2.64    6.15   −1.58    5.94
12             4.04   11.19    3.58    9.60    0.53    8.16    0.77    6.84    1.99    7.62
18             7.22   10.76    7.15   10.44    3.19    5.87    3.60    5.56    4.12    6.11
24             1.18    5.83    1.37    5.94   −1.82    4.11   −1.44    3.61   −1.76    3.80
36            −0.07    1.51    0.31    1.98    0.07    2.68    0.03    2.57   −0.62    2.65
48            −0.67    3.92   −0.39    3.72    1.69    3.78    1.20    3.12    1.56    3.47
60            −5.33    7.13   −5.27    6.82   −2.32    5.24   −2.99    5.15   −1.56    4.71
84            −1.22    4.25   −1.50    4.29   −0.26    4.04   −0.36    3.73    0.65    3.92
96             1.31    2.10    1.02    2.11    0.47    0.85    0.99    1.80    0.31    0.77
108            0.03    2.94   −0.11    3.02   −2.67    4.49   −1.41    3.27   −4.56    6.08
120           −5.11    8.51   −4.96    8.23   −9.51   12.13   −7.46    9.73  −13.60   15.47
180           24.11   29.44   27.86   32.66   16.37   24.94   21.97   28.16   −0.04   12.03
240           25.61   34.99   35.95   42.61   23.12   34.62   30.72   36.43    1.51    6.67
360          −29.62   37.61    1.37   22.04   −8.65   24.45   −0.96    6.81   −2.65   24.62
Mean           1.19   11.29    3.82   11.41    1.25    9.59    2.77    8.32   −1.01    7.14
Median        −0.16    7.13    0.10    6.97    0.04    5.56   −0.02    5.36   −0.01    6.01

Note: The means and the root mean squared errors for 16 different maturities. All numbers are measured in basis points.
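For readers reproducing Table 3, the reported statistics are simply the mean and root mean squared fitted error per maturity; a minimal sketch (the `errors` array, holding the T x 16 fitted errors in decimal yields, is an assumption):

```python
import numpy as np

def fit_summary(errors):
    """Return (mean, RMSE) per maturity, both in basis points."""
    bp = 1e4 * np.asarray(errors)                 # decimals -> basis points
    return bp.mean(axis=0), np.sqrt((bp ** 2).mean(axis=0))
```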
behavior of the extracted factors, which show dramatic variation over time. Their estimation will always produce a better fit on any given day than ours, but the fit of the DNSS model is quite comparable to theirs over the maturity range from 6 months to 9 years.

5.4. DGNS model estimation results

Table 4 presents the estimated mean-reversion matrix A and the estimated vector of mean parameters μ along with the estimated parameters of the conditional covariance matrix Q for the independent-factor DGNS model. Relative to the independent-factor DNSS model reported in the previous section, the level factor and the two curvature factors preserve their relatively high rate of persistence after the inclusion of the second slope factor. However, for the two slope factors, we see a significant change in the estimated mean-reversion rates after this addition. Overall, though, all the factors have become less persistent than what we observed in the DNSS model. For the estimated mean parameters, we find little change after adding the second slope factor to the model. If anything, the uncertainty about these parameters appears to have declined notably. This ties in well with the fact that the factors have become less persistent, which allows the estimation to determine their means more precisely.
Figure 5. Fitted yield curves for four specific dates: (a) June 30, 1989; (b) November 30, 1995; (c) August 31, 1998; (d) September 29, 2000. Each panel plots yield against time to maturity in years for the fitted DNS, AFNS, DNSS, DGNS and AFGNS curves.
For the independent-factor DGNS model, the estimated q-parameters translate into a 1-month conditional covariance matrix given by
\[
Q^{\mathrm{DGNS}}_{\mathrm{indep}} = qq' = \begin{pmatrix}
3.99\times10^{-6} & 0 & 0 & 0 & 0\\
0 & 1.86\times10^{-5} & 0 & 0 & 0\\
0 & 0 & 1.20\times10^{-5} & 0 & 0\\
0 & 0 & 0 & 3.37\times10^{-5} & 0\\
0 & 0 & 0 & 0 & 2.73\times10^{-5}
\end{pmatrix}.
\tag{5.2}
\]
This matrix shows that for the level factor and the two curvature factors the estimated volatilities are now smaller than the ones reported in equation (5.1) for the DNSS model.
Table 4. Estimated dynamic parameters in the DGNS model.

A        L_{t−1}     S1_{t−1}    S2_{t−1}    C1_{t−1}    C2_{t−1}    μ             q
L_t      0.9758      0           0           0           0            0.05140      0.001998
         (0.0239)                                                     (0.0104)     (0.000268)
S1_t     0           0.9235      0           0           0           −0.007039     0.004309
                     (0.0295)                                         (0.00718)    (0.000371)
S2_t     0           0           0.9306      0           0            0.0006993    0.003462
                                 (0.0341)                             (0.00686)    (0.000363)
C1_t     0           0           0           0.9543      0           −0.0006114    0.005807
                                             (0.0223)                 (0.0194)     (0.000405)
C2_t     0           0           0           0           0.9782       0.05536      0.005223
                                                         (0.0109)     (0.0207)     (0.000756)

Note: This table reports the estimated A matrix and μ vector along with the estimated parameters of the Q matrix in the DGNS model with independent factors for the sample period from January 1987 to December 2002. The maximum log likelihood value is 16816.08. The estimated value of λ1 is 1.190 (0.0350), while the estimated value of λ2 is 0.1021 (0.00863). The numbers in parentheses are the estimated standard deviations of the parameter estimates.
In contrast, the estimated volatilities of the two slope factors are notably higher than the one reported for the single slope factor in the DNSS model.
The estimated values of λ1 and λ2, which are 1.19 and 0.102, respectively, are also of interest. The estimated value of λ1 is higher than the estimate of 0.838 obtained for the DNSS model, which implies that the factor loadings of the first slope and curvature factors decay to zero at a more rapid pace. Thus, as illustrated in Figure 1(b), movements in these two factors will have a limited impact on yields beyond the five-year maturity. However, that lack of influence is made up for by the second slope factor. The low estimate of λ2 implies that this factor has a loading that decays very slowly. Therefore, this factor can affect the important intermediate range of maturities from 5 to 15 years.
In Figure 6, we compare the estimated level, first slope and first curvature factors from the Kalman filter estimation of the DGNS model with independent factors to the corresponding factors obtained for the DNS model (from CDR) and the DNSS model (described earlier in this section). For ease of comparison, the estimated paths from the independent-factor DNS and DNSS models have been included. In all three cases, the data used are unsmoothed Fama-Bliss yields covering the period from January 1987 to December 2002. The correlations of these three factors across the DNS and DGNS models are 0.730, 0.804 and 0.793, respectively. For the DNS and DNSS models, the correlations are 0.549, 0.821 and 0.949, respectively. Thus, while the level factor is affected by the addition of a second curvature factor, as in the DNSS model, the impact of a second slope factor, as in the DGNS model, is more limited. Also, the first slope and curvature factors have very similar sample paths across all three models. Given the fairly large estimated values of λ1 in all three models, the factor loadings of these two factors decay towards zero relatively rapidly as a function of maturity, so their roles in fitting the shorter end of the yield curve are well defined.
Figure 7 shows the estimated paths of the second slope and curvature factors of the independent-factor DGNS model. The estimated path of the second curvature factor from the independent-factor DNSS model has been included for comparison. There is a clear correlation between the curvature factor and the ten-year yield, as in the DNSS model.
Figure 6. Estimated paths of the level, first slope and first curvature factor in the DGNS model. (a) The estimated level factor Lt; (b) the estimated first slope factor S1t; (c) the estimated first curvature factor C1t. Each panel also shows the corresponding DNS and DNSS factor paths.
The second slope factor appears to be a stationary process with a fairly high rate of mean-reversion, but the intuition behind it is not obvious.
If we focus on the fit of the DGNS model in Table 3, we see fairly uniform improvement in the fit in the maturity range from 3 months to 10 years and a dramatic improvement in the fit of the 30-year yield. The improved fit for the long yield in the DGNS model relative to the DNSS model reflects the presence of the second slope factor and is also visible in Figure 5. However, there is still no improvement for the 15- or 20-year yields, a deficiency that can perhaps be alleviated by imposing the AF restrictions.

5.5. AFGNS model estimation results

Table 5 presents the estimated parameters for the mean-reversion matrix $K^P$, the mean vector $\theta^P$, and the volatility matrix $\Sigma$ for the AFGNS model with independent factors.
Figure 7. Second slope and second curvature factors in the DGNS model. (a) The estimated second slope factor S2t; (b) the estimated second curvature factor C2t, shown together with the DNSS second curvature factor.
Table 5. Estimated dynamic parameters in the AFGNS model.

K^P      K_{·,1}    K_{·,2}    K_{·,3}    K_{·,4}    K_{·,5}    θ^P           Σ
K_{1,·}  1.012      0          0          0          0           0.1165       0.01057
         (0.716)                                                 (0.00651)    (0.000262)
K_{2,·}  0          0.2685     0          0          0          −0.04551      0.01975
                    (0.497)                                      (0.0493)     (0.00255)
K_{3,·}  0          0          0.3812     0          0          −0.02912      0.01773
                               (0.603)                           (0.0322)     (0.00225)
K_{4,·}  0          0          0          1.409      0          −0.02398      0.05049
                                          (0.970)                (0.0227)     (0.00304)
K_{5,·}  0          0          0          0          0.8940     −0.09662      0.04304109
                                                     (0.927)     (0.0338)     (0.00305)

Note: This table reports the estimated K^P matrix and θ^P mean vector along with the estimated parameters of the volatility matrix Σ in the AFGNS model with independent factors for the sample period from January 1987 to December 2002. The maximum log likelihood value is 16982.52. λ1 is estimated at 1.005 (0.0246) and λ2 is estimated at 0.2343 (0.00922). The numbers in parentheses are the estimated standard deviations of the parameter estimates.
To compare the estimated mean-reversion parameters in this model to the results reported for the previous models, we calculate the 1-month conditional discrete-time mean-reversion matrix, which is given by
\[
\exp\left(-\frac{1}{12}\hat{K}^P\right) = \begin{pmatrix}
0.9191 & 0 & 0 & 0 & 0\\
0 & 0.9779 & 0 & 0 & 0\\
0 & 0 & 0.9687 & 0 & 0\\
0 & 0 & 0 & 0.8892 & 0\\
0 & 0 & 0 & 0 & 0.9282
\end{pmatrix}.
\tag{5.3}
\]
Compared to the estimated A matrix reported for the DGNS model in Table 4, this shows that by imposing an absence of arbitrage on that model, the level and two curvature factors become notably less persistent, while the two slope factors become more persistent. Based on the estimated volatility parameters, the 1-month conditional covariance matrix in the AFGNS model is given by
\[
Q^{\mathrm{AFGNS}}_{\mathrm{indep}} = \int_0^{\frac{1}{12}} e^{-K^P s}\Sigma\Sigma' e^{-(K^P)'s}\,ds
= \begin{pmatrix}
8.52\times10^{-6} & 0 & 0 & 0 & 0\\
0 & 3.17\times10^{-5} & 0 & 0 & 0\\
0 & 0 & 2.53\times10^{-5} & 0 & 0\\
0 & 0 & 0 & 1.88\times10^{-4} & 0\\
0 & 0 & 0 & 0 & 1.43\times10^{-4}
\end{pmatrix}.
\tag{5.4}
\]
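Because $K^P$ and $\Sigma$ are diagonal here, each diagonal entry of this integral has the closed form $\sigma_i^2(1 - e^{-2\kappa_{ii}^P/12})/(2\kappa_{ii}^P)$, so (5.4) can be checked from the rounded Table 5 estimates (a sketch; the match is only up to rounding of the published parameters):

```python
import numpy as np

kappa = np.array([1.012, 0.2685, 0.3812, 1.409, 0.8940])
sigma = np.array([0.01057, 0.01975, 0.01773, 0.05049, 0.04304])
print(sigma ** 2 * (1 - np.exp(-2 * kappa / 12)) / (2 * kappa))
# -> approximately [8.6e-06 3.2e-05 2.5e-05 1.9e-04 1.4e-04], close to (5.4)
```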
Across the board, the volatility of each factor is notably higher in the AFGNS model than in the corresponding non-AF DGNS model.
The estimated AFGNS values of λ1 and λ2 are 1.01 and 0.234, compared with the DGNS values of 1.19 and 0.1016. The lower value of λ1 implies that the factor loadings of the first slope and curvature factors decay towards zero somewhat more slowly than in the DGNS model, while the higher value of λ2 indicates that the model is using the additional yield-adjustment term to get the level of the long-term yields right, which eases the tension on the second curvature factor. This shows up as a much larger estimate for λ2.
Figure 8 displays the estimated level, first slope and first curvature factors in the independent-factor AFGNS model with the estimated paths from the DNS, DNSS and DGNS models for comparison. The correlations for these three factors across the AFGNS and DGNS models are 0.692, 0.668 and 0.952, respectively. Thus, for the level and first slope factors, the imposition of an absence of arbitrage leads to some changes.$^{10}$
Figure 9 illustrates the estimated second slope and curvature factors for the independent-factor AFGNS model and the effect of the increase in the estimated value of λ2. The corresponding estimated paths from the DNSS and DGNS models are shown for comparison. Figure 9(a) shows that there is a notable change in the path of the second slope factor in the AFGNS model relative to the DGNS model, and the two paths show a correlation of only 0.046. There is greater correlation between the AFGNS and DGNS second curvature factors (of 0.696), as depicted in Figure 9(b).
Focusing on the fit of the AFGNS model in Table 3, it is clear that the AFGNS model provides a more balanced fit across maturities than the DNSS model. Indeed, only the 30-year yield does not really benefit from adding the second slope factor or the AF restrictions. There are also benefits relative to the DGNS model, especially on the four specific dates studied in Figure 5, where the improvement in the fit of the 15- and 20-year yields obtained with the AFGNS model is quite apparent. The increase in the maximum log likelihood value from 16816.08 to 16982.52 upon the imposition of the AF restrictions also indicates that the overall fit of the model has improved notably.
$^{10}$ Note that, with the inclusion of the yield-adjustment term in the yield function of the AFGNS model, the estimated values of all five factors are rescaled relative to the estimated values obtained in the DGNS model (and reflected in the mean parameter estimates as well).
Figure 8. Level, first slope and first curvature factors in the AFGNS model. (a) The estimated level factor Lt; (b) the estimated first slope factor S1t; (c) the estimated first curvature factor C1t. Each panel also shows the corresponding paths from the DNS, DNSS and DGNS models.
The only difference between the DGNS and the AFGNS models is tied to the yield-adjustment term, $-C(\tau)/\tau$, which is a maturity-dependent function that appears in the yield function as a result of the imposition of absence of arbitrage and is a consequence of convexity effects. Figure 10 displays the AFNS yield-adjustment term from CDR (and its three subcomponents) and the AFGNS yield-adjustment term (and its five subcomponents).$^{11}$ These two yield adjustments have similar shapes but a somewhat different scale. In the AFNS model, the yield-adjustment term stays below 50 basis points even at the 30-year maturity, while in the AFGNS model it reaches a full 3 percentage points at that same maturity.
$^{11}$ As long as we only consider models with diagonal volatility matrices, the yield-adjustment term will be a negative, monotonically decreasing function of maturity that will eventually converge to $-\infty$ due to the level factor imposed in the Nelson–Siegel model.
Figure 9. Second slope and second curvature factors in the AFGNS model. (a) The estimated second slope factor S2t, shown with its DGNS counterpart; (b) the estimated second curvature factor C2t (shifted by +0.15 for visibility), shown with its DGNS and DNSS counterparts.
Figure 10. Yield-adjustment term for the AFNS and AFGNS models. (a) Yield-adjustment in the AFNS model, with its level-only, slope-only and curvature-only subcomponents; (b) yield-adjustment in the AFGNS model, with its level-only, slope No. 1, slope No. 2, curvature No. 1 and curvature No. 2 subcomponents. Both panels plot against maturity in years.
The AFGNS model uses the large negative values of the yield adjustment at long maturities to generate the second hump of the yield curve and thereby deliver a reasonable fit to the 15–30-year yields.
6. CONCLUSION

The Nelson and Siegel (1987) curve and the associated dynamic DNS model of Diebold and Li (2006) both have trouble fitting long-maturity yields, in large part because of convexity effects.
In this paper, we solve that problem while simultaneously imposing an absence of arbitrage. We argue that although the popular Svensson (1995) extension of the Nelson–Siegel curve may improve long-maturity fit, there does not exist an arbitrage-free yield-curve model that matches its factor loadings. However, we show that there is a natural five-factor generalization, which adds a second slope factor to join the additional curvature factor in the Svensson extension, that does achieve freedom from arbitrage. Finally, we show that the estimation of this new AFGNS model is tractable and provides a good fit to the yield curve. The empirical tractability is especially important because, as noted in the introduction, it would be very difficult to estimate the maximally flexible five-factor affine arbitrage-free term structure model.
Going forward, the AFGNS model may be a useful addition to the tool kit of central banks and practitioners who now use the non-AF Svensson extension of the Nelson–Siegel yield curve. Furthermore, we envision much future research that employs the underlying arbitrage-free Nelson–Siegel structure. In particular, given its tractable estimation, the basic AFNS model can be easily extended to incorporate other elements, such as stochastic volatility, inflation-indexed bond yields, or interbank lending rates (Christensen et al., 2008a, b, c). These extensions would be difficult to include in an estimated maximally flexible affine model but may help illuminate various important issues.
ACKNOWLEDGMENTS We thank Richard Smith for organizing the Special Session on Financial Econometrics at the 2008 meeting of the Royal Economic Society, at which we first presented this paper. We also thank our discussant, Alessio Sancetta. The views expressed are those of the authors and do not necessarily reflect the views of others at the Federal Reserve Bank of San Francisco.
REFERENCES

Almeida, C. and J. Vicente (2008). The role of no-arbitrage in forecasting: lessons from a parametric term structure model. Forthcoming in Journal of Banking and Finance.
Bank for International Settlements (2005). Zero-Coupon Yield Curves: Technical Documentation. BIS Papers No. 25, Bank for International Settlements.
Björk, T. and B. J. Christensen (1999). Interest rate dynamics and consistent forward rate curves. Mathematical Finance 9, 323–48.
Bowsher, C. G. and R. Meeks (2008). The dynamics of economic functions: modelling and forecasting the yield curve. Forthcoming in Journal of the American Statistical Association.
Christensen, J. H., F. X. Diebold and G. D. Rudebusch (2007). The affine arbitrage-free class of Nelson–Siegel term structure models. NBER Working Paper No. 13611, National Bureau of Economic Research.
Christensen, J. H., J. A. Lopez and G. D. Rudebusch (2008a). Inflation expectations and risk premiums in an arbitrage-free model of nominal and real bond yields. Working Paper, Federal Reserve Bank of San Francisco.
Christensen, J. H., J. A. Lopez and G. D. Rudebusch (2008b). Do central bank liquidity facilities affect interbank lending rates? Working Paper, Federal Reserve Bank of San Francisco.
Christensen, J. H., J. A. Lopez and G. D. Rudebusch (2008c). Stochastic volatility in arbitrage-free Nelson-Siegel models of the term structure. Working Paper, Federal Reserve Bank of San Francisco.
Coroneo, L., K. Nyholm and R. Vidova-Koleva (2008). How arbitrage-free is the Nelson-Siegel model? Working Paper Series #874, European Central Bank.
Dai, Q. and K. Singleton (2000). Specification analysis of affine term structure models. Journal of Finance 55, 1943–78.
De Pooter, M. (2007). Examining the Nelson-Siegel class of term structure models. Discussion Paper No. 43, Tinbergen Institute.
Diebold, F. X. and C. Li (2006). Forecasting the term structure of government bond yields. Journal of Econometrics 130, 337–64.
Diebold, F. X., M. Piazzesi and G. D. Rudebusch (2005). Modeling bond yields in finance and macroeconomics. American Economic Review 95, 415–20.
Diebold, F. X., G. D. Rudebusch and S. B. Aruoba (2006). The macroeconomy and the yield curve: a dynamic latent factor approach. Journal of Econometrics 131, 309–38.
Duffee, G. (2002). Term premia and interest rate forecasts in affine models. Journal of Finance 57, 405–43.
Filipović, D. (1999). A note on the Nelson-Siegel family. Mathematical Finance 9, 349–59.
Gürkaynak, R., B. Sack and J. H. Wright (2007). The U.S. treasury yield curve: 1961 to the present. Journal of Monetary Economics 54, 2291–304.
Gürkaynak, R., B. Sack and J. H. Wright (2008). The TIPS yield curve and inflation compensation. Finance and Economics Discussion Series No. 2008-05, Board of Governors of the Federal Reserve System.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Kim, D. H. and A. Orphanides (2005). Term structure estimation with survey data on interest rate forecasts. Finance and Economics Discussion Series No. 2005-48, Board of Governors of the Federal Reserve System.
Nelson, C. R. and A. F. Siegel (1987). Parsimonious modeling of yield curves. Journal of Business 60, 473–89.
Rudebusch, G. D. and T. Wu (2007). Accounting for a shift in term structure behavior with no-arbitrage and macro-finance models. Journal of Money, Credit, and Banking 39, 395–422.
Rudebusch, G. D. and T. Wu (2008). A macro-finance model of the term structure, monetary policy, and the economy. Economic Journal 118, 906–26.
Söderlind, P. and L. E. O. Svensson (1997). New techniques to extract market expectations from financial instruments. Journal of Monetary Economics 40, 383–429.
Svensson, L. E. O. (1995). Estimating forward interest rates with the extended Nelson-Siegel method. Quarterly Review 1995:3, Sveriges Riksbank, 13–26.
APPENDIX: YIELD-ADJUSTMENT TERM IN THE AFGNS MODEL

Given a general volatility matrix
\[
\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15}\\
\sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} & \sigma_{25}\\
\sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} & \sigma_{35}\\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & \sigma_{45}\\
\sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_{55}
\end{pmatrix},
\]
the analytical AFGNS yield-adjustment term, via calculations available from the authors, is
\begin{align*}
\frac{C(t,T)}{T-t} &= \frac{1}{2}\,\frac{1}{T-t}\sum_{j=1}^{5}\int_t^T \left(\Sigma' B(s,T)B(s,T)'\Sigma\right)_{j,j}ds\\
&= A\,\frac{(T-t)^2}{6}\\
&\quad + B\left[\frac{1}{2\lambda_1^2} - \frac{1}{\lambda_1^3}\frac{1-e^{-\lambda_1(T-t)}}{T-t} + \frac{1}{4\lambda_1^3}\frac{1-e^{-2\lambda_1(T-t)}}{T-t}\right]\\
&\quad + C\left[\frac{1}{2\lambda_2^2} - \frac{1}{\lambda_2^3}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{1}{4\lambda_2^3}\frac{1-e^{-2\lambda_2(T-t)}}{T-t}\right]\\
&\quad + D\left[\frac{1}{2\lambda_1^2} + \frac{1}{\lambda_1^2}e^{-\lambda_1(T-t)} - \frac{1}{4\lambda_1}(T-t)e^{-2\lambda_1(T-t)} - \frac{3}{4\lambda_1^2}e^{-2\lambda_1(T-t)} - \frac{2}{\lambda_1^3}\frac{1-e^{-\lambda_1(T-t)}}{T-t} + \frac{5}{8\lambda_1^3}\frac{1-e^{-2\lambda_1(T-t)}}{T-t}\right]\\
&\quad + E\left[\frac{1}{2\lambda_2^2} + \frac{1}{\lambda_2^2}e^{-\lambda_2(T-t)} - \frac{1}{4\lambda_2}(T-t)e^{-2\lambda_2(T-t)} - \frac{3}{4\lambda_2^2}e^{-2\lambda_2(T-t)} - \frac{2}{\lambda_2^3}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{5}{8\lambda_2^3}\frac{1-e^{-2\lambda_2(T-t)}}{T-t}\right]\\
&\quad + F\left[\frac{1}{2\lambda_1}(T-t) + \frac{1}{\lambda_1^2}e^{-\lambda_1(T-t)} - \frac{1}{\lambda_1^3}\frac{1-e^{-\lambda_1(T-t)}}{T-t}\right]\\
&\quad + G\left[\frac{1}{2\lambda_2}(T-t) + \frac{1}{\lambda_2^2}e^{-\lambda_2(T-t)} - \frac{1}{\lambda_2^3}\frac{1-e^{-\lambda_2(T-t)}}{T-t}\right]\\
&\quad + H\left[\frac{3}{\lambda_1^2}e^{-\lambda_1(T-t)} + \frac{1}{2\lambda_1}(T-t) + \frac{1}{\lambda_1}(T-t)e^{-\lambda_1(T-t)} - \frac{3}{\lambda_1^3}\frac{1-e^{-\lambda_1(T-t)}}{T-t}\right]\\
&\quad + I\left[\frac{3}{\lambda_2^2}e^{-\lambda_2(T-t)} + \frac{1}{2\lambda_2}(T-t) + \frac{1}{\lambda_2}(T-t)e^{-\lambda_2(T-t)} - \frac{3}{\lambda_2^3}\frac{1-e^{-\lambda_2(T-t)}}{T-t}\right]\\
&\quad + J\left[\frac{1}{\lambda_1\lambda_2} - \frac{1}{\lambda_1^2\lambda_2}\frac{1-e^{-\lambda_1(T-t)}}{T-t} - \frac{1}{\lambda_1\lambda_2^2}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{1}{\lambda_1\lambda_2(\lambda_1+\lambda_2)}\frac{1-e^{-(\lambda_1+\lambda_2)(T-t)}}{T-t}\right]\\
&\quad + K\left[\frac{1}{\lambda_1^2} + \frac{1}{\lambda_1^2}e^{-\lambda_1(T-t)} - \frac{1}{2\lambda_1^2}e^{-2\lambda_1(T-t)} - \frac{3}{\lambda_1^3}\frac{1-e^{-\lambda_1(T-t)}}{T-t} + \frac{3}{4\lambda_1^3}\frac{1-e^{-2\lambda_1(T-t)}}{T-t}\right]\\
&\quad + L\left[\frac{1}{\lambda_1\lambda_2} + \frac{1}{\lambda_1\lambda_2}e^{-\lambda_2(T-t)} - \frac{1}{\lambda_1(\lambda_1+\lambda_2)}e^{-(\lambda_1+\lambda_2)(T-t)} - \frac{1}{\lambda_1^2\lambda_2}\frac{1-e^{-\lambda_1(T-t)}}{T-t} - \frac{2}{\lambda_1\lambda_2^2}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{\lambda_1+2\lambda_2}{\lambda_1\lambda_2(\lambda_1+\lambda_2)^2}\frac{1-e^{-(\lambda_1+\lambda_2)(T-t)}}{T-t}\right]\\
&\quad + M\left[\frac{1}{\lambda_1\lambda_2} + \frac{1}{\lambda_1\lambda_2}e^{-\lambda_1(T-t)} - \frac{1}{\lambda_2(\lambda_1+\lambda_2)}e^{-(\lambda_1+\lambda_2)(T-t)} - \frac{2}{\lambda_1^2\lambda_2}\frac{1-e^{-\lambda_1(T-t)}}{T-t} - \frac{1}{\lambda_1\lambda_2^2}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{\lambda_2+2\lambda_1}{\lambda_1\lambda_2(\lambda_1+\lambda_2)^2}\frac{1-e^{-(\lambda_1+\lambda_2)(T-t)}}{T-t}\right]\\
&\quad + N\left[\frac{1}{\lambda_2^2} + \frac{1}{\lambda_2^2}e^{-\lambda_2(T-t)} - \frac{1}{2\lambda_2^2}e^{-2\lambda_2(T-t)} - \frac{3}{\lambda_2^3}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \frac{3}{4\lambda_2^3}\frac{1-e^{-2\lambda_2(T-t)}}{T-t}\right]\\
&\quad + O\left[\frac{1}{\lambda_1\lambda_2} + \frac{1}{\lambda_1\lambda_2}e^{-\lambda_1(T-t)} + \frac{1}{\lambda_1\lambda_2}e^{-\lambda_2(T-t)} - \left(\frac{1}{\lambda_1\lambda_2} + \frac{2}{(\lambda_1+\lambda_2)^2}\right)e^{-(\lambda_1+\lambda_2)(T-t)} - \frac{1}{\lambda_1+\lambda_2}(T-t)e^{-(\lambda_1+\lambda_2)(T-t)}\right.\\
&\qquad\qquad\left. - \frac{2}{\lambda_1^2\lambda_2}\frac{1-e^{-\lambda_1(T-t)}}{T-t} - \frac{2}{\lambda_1\lambda_2^2}\frac{1-e^{-\lambda_2(T-t)}}{T-t} + \left(\frac{2}{(\lambda_1+\lambda_2)^3} + \frac{2}{\lambda_1\lambda_2(\lambda_1+\lambda_2)}\right)\frac{1-e^{-(\lambda_1+\lambda_2)(T-t)}}{T-t}\right],
\end{align*}
where
• A = σ11² + σ12² + σ13² + σ14² + σ15²,
• B = σ21² + σ22² + σ23² + σ24² + σ25²,
• C = σ31² + σ32² + σ33² + σ34² + σ35²,
• D = σ41² + σ42² + σ43² + σ44² + σ45²,
• E = σ51² + σ52² + σ53² + σ54² + σ55²,
• F = σ11σ21 + σ12σ22 + σ13σ23 + σ14σ24 + σ15σ25,
• G = σ11σ31 + σ12σ32 + σ13σ33 + σ14σ34 + σ15σ35,
• H = σ11σ41 + σ12σ42 + σ13σ43 + σ14σ44 + σ15σ45,
• I = σ11σ51 + σ12σ52 + σ13σ53 + σ14σ54 + σ15σ55,
• J = σ21σ31 + σ22σ32 + σ23σ33 + σ24σ34 + σ25σ35,
• K = σ21σ41 + σ22σ42 + σ23σ43 + σ24σ44 + σ25σ45,
• L = σ21σ51 + σ22σ52 + σ23σ53 + σ24σ54 + σ25σ55,
• M = σ31σ41 + σ32σ42 + σ33σ43 + σ34σ44 + σ35σ45,
• N = σ31σ51 + σ32σ52 + σ33σ53 + σ34σ54 + σ35σ55,
• O = σ41σ51 + σ42σ52 + σ43σ53 + σ44σ54 + σ45σ55.
Empirically, we can only identify the 15 terms (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O). Thus, not all 25 volatility parameters can be identified. This implies that the maximally flexible specification that is well identified has a volatility matrix given by a triangular volatility matrix$^{12}$
\[
\Sigma = \begin{pmatrix}
\sigma_{11} & 0 & 0 & 0 & 0\\
\sigma_{21} & \sigma_{22} & 0 & 0 & 0\\
\sigma_{31} & \sigma_{32} & \sigma_{33} & 0 & 0\\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & 0\\
\sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_{55}
\end{pmatrix}.
\]
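As an independent check of the closed form above, the yield-adjustment term can also be obtained by direct numerical quadrature of its defining integral; the sketch below (my own, with assumed inputs) uses the $B(s,T)$ solutions from Proposition 3.1 and the identity $\sum_j(\Sigma'BB'\Sigma)_{j,j} = B'\Sigma\Sigma'B$:

```python
import numpy as np

def B_vec(u, lam1, lam2):
    """B(s, T) from Proposition 3.1, evaluated at u = T - s."""
    b2 = -(1 - np.exp(-lam1 * u)) / lam1
    b3 = -(1 - np.exp(-lam2 * u)) / lam2
    return np.array([-u, b2, b3,
                     u * np.exp(-lam1 * u) + b2,
                     u * np.exp(-lam2 * u) + b3])

def yield_adjustment(tau, Sigma, lam1, lam2, n_grid=2000):
    """-C(t, T)/(T - t) by trapezoidal quadrature, with tau = T - t."""
    u = np.linspace(0.0, tau, n_grid)
    vals = np.array([B_vec(ui, lam1, lam2) @ Sigma @ Sigma.T @ B_vec(ui, lam1, lam2)
                     for ui in u])
    integral = (0.5 * (vals[0] + vals[-1]) + vals[1:-1].sum()) * (u[1] - u[0])
    return -0.5 * integral / tau
```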
$^{12}$ Note that it can be either upper or lower triangular. The choice is irrelevant for the fit of the model.
Econometrics Journal (2009), volume 12, pp. C65–C101. doi: 10.1111/j.1368-423X.2009.00295.x
The econometrics of mean-variance efficiency tests: a survey

ENRIQUE SENTANA†

†CEMFI, Casado del Alisal 5, E-28014 Madrid, Spain
E-mail: [email protected]

First version received: May 2008; final version accepted: June 2009
Summary  This paper provides a comprehensive survey of the econometrics of mean-variance efficiency tests. Starting with the classic F-test of Gibbons et al. (1989) and its generalized method of moments version, I analyse the effects of the number of assets and portfolio composition on test power. I then discuss asymptotically equivalent tests based on portfolio weights, and study the trade-offs between efficiency and robustness of using parametric and semi-parametric likelihood procedures that assume either elliptical innovations or elliptical returns. After reviewing finite sample tests, I conclude with a discussion of mean-variance-skewness efficiency and spanning tests, and other interesting extensions.

Keywords: Elliptical distributions, Exogeneity, Financial returns, Generalized method of moments, Linear factor pricing, Maximum likelihood, Portfolio choice, Stochastic discount factor.
1. INTRODUCTION

Mean-variance analysis is widely regarded as the cornerstone of modern investment theory. Despite its simplicity, and the fact that more than five and a half decades have elapsed since Markowitz published his seminal work on the theory of portfolio allocation under uncertainty (Markowitz, 1952), it remains the most widely used asset allocation method. There are several reasons for its popularity. First, it provides a very intuitive assessment of the relative merits of alternative portfolios, as their risk and expected return characteristics can be compared in a two-dimensional graph. Second, mean-variance frontiers are spanned by only two funds, a property that simplifies their calculation and interpretation, and that also led to the derivation of the Capital Asset Pricing Model (CAPM) by Sharpe (1964), Lintner (1965) and Mossin (1966). Finally, mean-variance analysis becomes the natural approach if we assume Gaussian or elliptical distributions for asset returns, because in that case it is fully compatible with expected utility maximization regardless of investor preferences (see e.g. Chamberlain, 1983a, Owen and Rabinovitch, 1983, and Berk, 1997; see also Ross, 1978, for a related discussion).
A portfolio with excess returns $r_{1t}$ is mean-variance efficient with respect to a given set of $N_2$ assets with excess returns $r_{2t}$ if it is not possible to form another portfolio of those assets and $r_{1t}$ with the same expected return as $r_{1t}$ but a lower variance, or more appropriately, with the same variance but a higher expected return. Despite the simplicity of the definition, testing for mean-variance efficiency is of paramount importance in many practical situations, such as mutual fund performance evaluation (see De Roon and Nijman, 2001, for a recent survey), gains from portfolio diversification (Errunza et al., 1999), or tests of linear factor asset pricing models,
including the CAPM and APT, which imply that certain portfolios must be mean-variance efficient (see e.g. Campbell et al., 1996, or Cochrane, 2001, for advanced textbook treatments).
If the first two moments of returns were known, then it would be straightforward to confirm or disprove the mean-variance efficiency of $r_{1t}$ by simply checking whether it lay on the portfolio frontier spanned by $r_t = (r_{1t}, r_{2t}')'$. In practice, of course, the mean and variance of portfolio returns are unknown, and the sample mean and standard deviation of $r_{1t}$ will lie inside the estimated mean-variance frontier with probability one. Therefore, a statistical hypothesis test provides a rather natural decision method in this context because it explicitly takes into account the sampling variability in the estimation of the first two moments of returns. Ignoring such variability would be misleading because the inclusion of additional assets systematically leads to the expansion of the sample frontiers irrespective of whether the theoretical frontier is affected, in the same way as the inclusion of additional regressors systematically leads to increments in sample $R^2$'s regardless of whether their theoretical regression coefficients are 0.
To emphasize the importance of sampling uncertainty in this context, I have conducted the following simulation experiment. I have assumed that investors have access to a reference asset with excess returns $r_{1t}$ and three additional assets, whose excess returns $r_{it}$, $r_{jt}$ and $r_{kt}$ are i.i.d. with an annual mean of 0%, uncorrelated among themselves and with the original asset, so that the true maximum Sharpe ratio (i.e. the ratio of the expected excess return on a portfolio to its standard deviation) does not increase. Then I simulate two years of daily data many times, and compute the original and augmented mean-variance frontiers, as well as the incremental one, which is based on the differences between $r_{it}$, $r_{jt}$ and $r_{kt}$ and their best tracking portfolios based on $r_{1t}$. Figure 1 presents part of the ensemble of incremental frontiers, while Figure 2 contains the sampling distribution of the GMM estimator of the incremental Sharpe ratio.
Figure 1. Incremental mean-variance frontiers.
Figure 2. GMM estimator of incremental Sharpe ratio (ν = 8).
As can be seen from both pictures, if one did not take into account sampling uncertainty, then one would always conclude that there are clear gains from also investing in $r_{it}$, $r_{jt}$ and $r_{kt}$ when in reality there are none.
In fact, the sampling uncertainty surrounding expected returns is so large that several authors have forcefully raised some doubts about the usual practice of applying mean-variance investment rules replacing expected returns, variances and covariances by their sampling counterparts. In this sense, there are several solutions that explicitly take into account sampling uncertainty in making portfolio decisions in practice. These include not only Bayesian approaches but also classical ones. For instance, the modifications of the plug-in rule suggested by ter Horst et al. (2006) or Antoine (2008) from a classical perspective, as well as the Bayesian solution proposed by Bawa et al. (1979) and others, amount to levering up, or more likely down, the usual mean-variance portfolio rule by effectively changing the risk aversion parameter of the investor. However, the maximum Sharpe ratio attainable remains the same. Hence, an investor who currently applies one of those alternative rules to a vector of $N_1$ excess returns $r_{1t}$, say, but who is considering whether or not to diversify her investments into $r_{2t}$, should still be interested in conducting a mean-variance efficiency test.
The purpose of this paper is to survey mean-variance efficiency tests, with an emphasis on methodology rather than empirical findings, and paying more attention to some recent contributions and their econometric subtleties. In this sense, it complements previous surveys by Shanken (1996), Campbell et al. (1997) and Cochrane (2001). In order to accommodate most of the literature, in what follows I shall often work with the vector $r_{1t}$, so that the null hypothesis should be understood as saying that some portfolio of the $N_1$ elements in $r_{1t}$ lies on the efficient part of the mean-variance frontier spanned by $r_{1t}$ and $r_{2t}$.$^1$
$^1$ In this sense, it is important to note that in the case in which $r_{1t}$ contains a single asset, the null hypothesis only says that $r_{1t}$ spans the mean-variance frontier, so in principle it could lie on its inefficient part (see GRS).
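The simulation experiment described above is straightforward to replicate in outline; the following sketch (with assumed parameter values — a 4% annual mean and 15% volatility for the reference asset — and my own variable names) shows why the estimated incremental Sharpe ratio is always positive in finite samples:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2 * 252                                    # two years of daily data
mu1, vol = 0.04 / 252, 0.15 / np.sqrt(252)     # only the reference asset has a premium
gains = []
for _ in range(1000):
    r = rng.normal(0.0, vol, size=(T, 4))      # r1 plus three zero-mean assets
    r[:, 0] += mu1
    mu_hat = r.mean(axis=0)
    S_hat = np.cov(r, rowvar=False)
    sr2_all = mu_hat @ np.linalg.solve(S_hat, mu_hat)   # max squared Sharpe, 4 assets
    sr2_ref = mu_hat[0] ** 2 / S_hat[0, 0]              # squared Sharpe of r1 alone
    gains.append(sr2_all - sr2_ref)            # incremental squared Sharpe, >= 0
print(min(gains) > 0)                          # True: sample frontiers always expand
```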
The rest of the paper is organized as follows. I introduce the theoretical set-up in Section 2, review the original tests in Section 3, and analyse the effects of the number of assets and portfolio composition on test power in Section 4. Then I discuss asymptotically equivalent tests based on portfolio weights in Section 5, and study the trade-offs between efficiency and robustness of using parametric and semi-parametric likelihood procedures that assume either elliptical innovations or elliptical returns in Section 6. After reviewing finite sample tests in Section 7, I conclude with a discussion of mean-variance-skewness efficiency and spanning tests in Section 8. Finally, I mention some related topics and suggestions for future work in Section 9. Proofs of the few formal results that I present can be found in the original references.
2. MEAN-VARIANCE PORTFOLIO FRONTIERS

Consider a world with one riskless asset, and a finite number $N$ of risky assets. Let $R_0$ denote the gross return on the safe asset (that is, the total payoff per unit invested, which includes capital gains plus any cash flows received), and $R = (R_1, R_2, \ldots, R_N)'$ the vector of gross returns on the $N$ remaining assets, with vector of means and matrix of variances and covariances $\nu$ and $\Sigma$, respectively, which I assume bounded. Let $p = w_0 R_0 + w_1 R_1 + \cdots + w_N R_N$ denote the payoffs to a portfolio of the $N+1$ primitive assets with weights given by $w_0$ and the vector $w = (w_1, w_2, \ldots, w_N)'$. Importantly, I assume that there are no transaction costs or other impediments to trade, and in particular, that short-sales are allowed. I also assume that the wealth of any particular investor is such that her individual behaviour does not alter the distribution of returns.
There are at least three characteristics of portfolios in which investors are usually interested: their cost, the expected value of their payoffs, and their variance, given by $C(p) = w_0 + w'\iota_N$, $E(p) = w_0 R_0 + w'\nu$ and $V(p) = w'\Sigma w$, respectively, where $\iota_N$ is a vector of $N$ ones. Let $P$ be the set of payoffs from all possible portfolios of the $N+1$ original assets, i.e. the linear span of $(R_0, R')'$, $\langle R_0, R \rangle$. Within this set, several subsets deserve special attention. For instance, it is worth considering all unit cost portfolios $R = \{p \in P : C(p) = 1\}$, whose payoffs can be directly understood as returns per unit invested; and also all zero cost, or arbitrage, portfolios $A = \{p \in P : C(p) = 0\}$. In this sense, note that any non-arbitrage portfolio can be transformed into a unit-cost portfolio by simply scaling its weights by its cost. Similarly, if $r = R - R_0\iota_N$ denotes the vector of returns on the $N$ primitive risky assets in excess of the riskless asset, it is clear that $A$ coincides with the linear span of $r$, $\langle r \rangle$. The main advantage of working with excess returns is that their expected values $\mu = \nu - R_0\iota_N$ directly give us the risk premia of $R$, without altering their covariance structure.
On the other hand, one must distinguish between riskless portfolios, $S = \{p \in P : V(p) = 0\}$, and the rest. In what follows, I shall impose restrictions on the elements of $S$ so that there are no riskless 'arbitrage' opportunities. In particular, I shall assume that $\Sigma$ is regular, so that $S$ is limited to the linear span of $R_0$, and the law of one price holds (i.e. portfolios with the same payoffs have the same cost). I shall also assume that $R_0$ is strictly positive (in practice, $R_0 \geq 1$ for nominal returns).
A simple, yet generally incomplete, method of describing the choice set of an agent is in terms of the mean and variance of all the portfolios that she can afford. Let us consider initially the case of an agent who has no wealth whatsoever, which means that she can only choose portfolios in $A$. In this context, frontier arbitrage portfolios, in the usual mean-variance sense, will be those that solve the program $\min V(p)$ subject to the restrictions $C(p) = 0$ and $E(p) = \bar{\mu}$, with $\bar{\mu}$ real. Given that $C(p) = 0$ is equivalent to $p = w'r$, I can re-write this problem as $\min_w w'\Sigma w$ subject to $w'\mu = \bar{\mu}$. There are two possibilities: (i) $\mu = 0$, when the frontier can only be defined for
$\bar\mu = 0$; or (ii) $\mu \neq 0$, in which case the solution for each $\bar\mu$ is $w^*(\bar\mu) = \bar\mu(\mu'\Sigma^{-1}\mu)^{-1}\Sigma^{-1}\mu$. As a consequence, the arbitrage portfolio $r_p = (\mu'\Sigma^{-1}\mu)^{-1}\mu'\Sigma^{-1}r$ generates the whole zero-cost frontier, in what can be called one-fund spanning. Moreover, given that the variance of the frontier portfolios with mean $\bar\mu$ will be $\bar\mu^2(\mu'\Sigma^{-1}\mu)^{-1}$, in mean-standard deviation space the frontier is a straight line reflected in the origin whose efficient section has slope $\sqrt{\mu'\Sigma^{-1}\mu}$. Therefore, this slope fully characterizes in mean-variance terms the investment opportunity set of an investor with no wealth, as it implicitly measures the trade-off between risk and return that the available assets allow at the aggregate level.

Traditionally, however, the frontier is usually obtained for unit-cost portfolios, and not for arbitrage portfolios. Nevertheless, given that the payoffs of any portfolio in R can be replicated by means of a unit of the safe asset and a portfolio in A, in mean-standard deviation space the frontier for R is simply the frontier for A shifted upwards in parallel by the amount $R_0$. And although now we will have two-fund spanning, for a given safe rate, the slope $\sqrt{\mu'\Sigma^{-1}\mu}$ continues to fully characterize the investment opportunity set of an agent with positive wealth.

An alternative graphical interpretation of the same result would be as follows. The trade-off between risk and return of any unit-cost portfolio in R is usually measured as the ratio of its risk premium to its standard deviation. More formally, if $R_u \in R$, then $s(r_u) = \mu_u/\sigma_u$, where $\mu_u = E(r_u)$, $\sigma_u^2 = V(r_u)$ and $r_u = R_u - R_0$. This expression, known as the Sharpe ratio of the portfolio after Sharpe (1966, 1994), remains constant for any portfolio whose mean excess return and standard deviation lie along the ray which, starting at the origin, passes through the point $(\sigma_u, \mu_u)$, because the Sharpe ratio coincides with the slope of this ray. As a result, the steeper (flatter) a ray is (i.e. the closer to the y (x) axis), the higher (lower) the corresponding Sharpe ratio. Then, since $\mu_p = 1$ and $\sigma_p^2 = (\mu'\Sigma^{-1}\mu)^{-1}$, the slope $s(r_p) = \mu_p/\sigma_p = \sqrt{\mu'\Sigma^{-1}\mu}$ will give us the Sharpe ratio of $R_p(w_{r_p}) = R_0 + w_{r_p}r_p$ for any $w_{r_p} > 0$, which is the highest attainable. Therefore, in mean excess return-standard deviation space, all $R_p(w_{r_p})$ lie on a positively sloped straight line that starts from the origin. Assuming that $\iota_N'\Sigma^{-1}\mu > 0$, as the investor moves away from the origin, where she is holding all her wealth in the safe asset, the net total position invested in the riskless asset is steadily decreasing, and eventually becomes zero. Beyond that point, she begins to borrow in the money market to lever up her position in the financial markets.² The main point to remember, though, is that a portfolio will span the mean-variance frontier if and only if its square Sharpe ratio is maximum. As we shall see below, this equivalence relationship underlies most mean-variance efficiency tests.

2 The portfolio at which the net position in the riskless asset is exactly 0 is known as the 'tangency' portfolio, because when $\iota_N'\Sigma^{-1}\mu \neq 0$, there is tangency at that particular point between the mean-variance frontier without a riskless asset and the analogous frontier with a riskless asset in expected return-standard deviation space. The exact expression for its weights is
$$w^*(\mu'\Sigma^{-1}\mu/\iota_N'\Sigma^{-1}\mu) = (\iota_N'\Sigma^{-1}\mu)^{-1}\Sigma^{-1}\mu.$$
If $\iota_N'\Sigma^{-1}\mu < 0\ (> 0)$, the expected excess return of this portfolio will be negative (positive), which means that tangency will take place along the inefficient (efficient) section of the mean-variance frontier for excess returns (see e.g. Maller and Turkington, 2002). If $\iota_N'\Sigma^{-1}\mu = 0$, though, the frontier with a riskless asset coincides with the asymptotes of the frontier without a riskless asset, so strictly speaking no tangency portfolio exists.
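To make the frontier algebra concrete, here is a minimal numerical sketch (not from the paper) of the zero-cost frontier weights $w^*(\bar\mu)$, the maximum Sharpe ratio and the tangency portfolio; the premia and covariance matrix are arbitrary illustrative values rather than estimates from any data set.

```python
import numpy as np

# Illustrative inputs: risk premia mu and covariance matrix Sigma of N = 3 assets.
mu = np.array([0.04, 0.06, 0.03])
Sigma = np.array([[0.040, 0.010, 0.008],
                  [0.010, 0.090, 0.012],
                  [0.008, 0.012, 0.030]])

Sigma_inv = np.linalg.inv(Sigma)
b = mu @ Sigma_inv @ mu                        # mu' Sigma^{-1} mu

def frontier_weights(mu_bar):
    """Zero-cost frontier weights w*(mu_bar) = mu_bar (mu'S^{-1}mu)^{-1} S^{-1} mu."""
    return (mu_bar / b) * (Sigma_inv @ mu)

w = frontier_weights(0.05)
print(w @ Sigma @ w, 0.05**2 / b)              # frontier variance equals mu_bar^2 / b
print(np.sqrt(b))                              # slope of the efficient section

iota = np.ones(3)
w_tan = Sigma_inv @ mu / (iota @ Sigma_inv @ mu)   # tangency portfolio weights
print(w_tan, w_tan.sum())                      # risky-asset weights sum to 1
```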
For our purposes, it is useful to relate the maximum Sharpe ratio to the Sharpe ratios of the N underlying assets. Proposition 3 in Sentana (2005) gives the required expression.

PROPOSITION 2.1. The Sharpe ratio of the optimal portfolio (in the unconditional mean-variance sense), $s(r_p)$, only depends on the vector of Sharpe ratios of the N underlying assets, $s(r)$, and their correlation matrix, $\rho_{rr} = \mathrm{dg}^{-1/2}(\Sigma)\,\Sigma\,\mathrm{dg}^{-1/2}(\Sigma)$, through the following quadratic form:
$$s^2(r_p) = s'(r)\rho_{rr}^{-1}s(r), \qquad (2.1)$$
where $\mathrm{dg}(\Sigma)$ is a matrix containing the diagonal elements of $\Sigma$ and zeros elsewhere.

The above expression, which for the case of N = 2 adopts the particularly simple form
$$s^2(r_p) = \frac{s^2(r_1) + s^2(r_2) - 2\rho_{r_1r_2}s(r_1)s(r_2)}{1 - \rho_{r_1r_2}^2}, \qquad (2.2)$$
where $\rho_{r_1r_2} = \mathrm{cor}(r_1, r_2)$, turns out to be remarkably similar to the formula that relates the $R^2$ of the multiple regression of r on (a constant and) x with the correlations of the simple regressions. Specifically,
$$R^2 = \rho_{xr}'\rho_{xx}^{-1}\rho_{xr}. \qquad (2.3)$$
The similarity is not merely coincidental. From the mathematics of the mean-variance frontier, we know that $E(r_j) = \mathrm{cov}(r_j, r_p)E(r_p)/V(r_p)$, and therefore that $s(r_j) = \mathrm{cor}(r_j, r_p)s(r_p)$. In other words, the correlation coefficient between $r_j$ and $r_p$ is $s(r_j)/s(r_p)$, i.e. the ratio of their Sharpe ratios. Hence, the result in Proposition 2.1 follows from (2.3) and the fact that the coefficient of determination in the multiple regression of $r_p$ on r will be 1 because $r_p$ is a linear combination of this vector.

We can use the partitioned inverse formula to alternatively write expression (2.1) in the following convenient form:
$$s^2(r_p) = s'(r_1)\rho_{r_1r_1}^{-1}s(r_1) + s'(z_2)\rho_{zz}^{-1}s(z_2) = s^2(r_{p1}) + a'\Omega^{-1}a, \qquad (2.4)$$
where $s(r_{p1})$ is the Sharpe ratio of the tangency portfolio obtained from $r_1$ alone, $r_{p1} = \mu_1'\Sigma_{11}^{-1}r_1$, the vector $z_2 = r_2 - \Sigma_{21}\Sigma_{11}^{-1}r_1$ contains the components of $r_2$ whose risk has been fully hedged against the risk of $r_1$, $a = E(z_2) = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1$, $\rho_{zz} = \mathrm{dg}^{-1/2}(\Omega)\,\Omega\,\mathrm{dg}^{-1/2}(\Omega)$ and $\Omega = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. Given that we can interpret a as the intercepts in the theoretical least-squares projection of $r_2$ on a constant and $r_1$, it trivially follows from (2.4) that $r_1$ will be mean-variance efficient if and only if $a = 0$ (see Black et al., 1972, Jobson and Korkie, 1982, 1985, Huberman and Kandel, 1987, and Gibbons et al., 1989, hereinafter GRS). In the bivariate case, equation (2.4) reduces to $s^2(r_p) = s^2(r_1) + s^2(z_2)$, where
$$s(z_2) = \frac{a_2}{\omega_2} = \frac{\mu_2 - (\sigma_{12}/\sigma_1^2)\mu_1}{\sqrt{\sigma_2^2 - \sigma_{12}^2/\sigma_1^2}} = \frac{s(r_2) - \rho_{12}s(r_1)}{\sqrt{1 - \rho_{12}^2}}$$
is the Sharpe ratio of $z_2 = r_2 - (\sigma_{12}/\sigma_1^2)r_1$. When $r_1$ is regarded as a benchmark portfolio, $s(z_2)$ is often known as the information (or appraisal) ratio of $r_2$.
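Both identities are easy to check numerically. The following sketch, again with made-up inputs, verifies Proposition 2.1 and the decomposition (2.4):

```python
import numpy as np

rng = np.random.default_rng(0)
N, N1 = 4, 2
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)          # well-conditioned covariance matrix
mu = rng.normal(0.05, 0.02, size=N)      # illustrative risk premia

# Proposition 2.1: s^2(rp) = s(r)' rho^{-1} s(r)
d = np.sqrt(np.diag(Sigma))
s = mu / d                               # vector of Sharpe ratios
rho = Sigma / np.outer(d, d)             # correlation matrix
lhs = s @ np.linalg.solve(rho, s)
rhs = mu @ np.linalg.solve(Sigma, mu)
print(np.isclose(lhs, rhs))              # True

# Decomposition (2.4): s^2(rp) = s^2(rp1) + a' Omega^{-1} a
S11, S12 = Sigma[:N1, :N1], Sigma[:N1, N1:]
S21, S22 = Sigma[N1:, :N1], Sigma[N1:, N1:]
mu1, mu2 = mu[:N1], mu[N1:]
a = mu2 - S21 @ np.linalg.solve(S11, mu1)          # intercepts ('alphas')
Omega = S22 - S21 @ np.linalg.solve(S11, S12)      # residual covariance
s2_rp1 = mu1 @ np.linalg.solve(S11, mu1)
print(np.isclose(rhs, s2_rp1 + a @ np.linalg.solve(Omega, a)))  # True
```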
Corollary 1 in Shanken (1987a) provides the following alternative expression for the maximum Sharpe ratio of $z_2$ in terms of the Sharpe ratio of $r_{p1}$ and the correlation between this portfolio and $r_p$:
$$s'(z_2)\rho_{zz}^{-1}s(z_2) = s^2(r_{p1})\left[\frac{1}{\mathrm{cor}^2(r_{p1}, r_p)} - 1\right].$$
This result exploits the previously mentioned fact that $\mathrm{cor}(r_{p1}, r_p) = s(r_{p1})/s(r_p)$ (see also Kandel and Stambaugh, 1987, and Meloso and Bossaerts, 2006). Intuitively, the incremental Sharpe ratio will reach its minimum value of 0 when $r_{p1} = r_p$, but it will increase as the correlation between those two portfolios decreases.
3. THE ORIGINAL TESTS

The framework described in the previous section has an implicit time dimension that corresponds to the investment horizon of the agents. To make it econometrically operational for a panel data set of excess returns on $N_1 + N_2 = N$ assets over T periods whose length supposedly coincides with the relevant investment horizon, GRS considered the following multivariate, conditionally homoscedastic, linear regression model:
$$r_{2t} = a + Br_{1t} + u_t = a + Br_{1t} + \Omega^{1/2}\varepsilon_t^*, \qquad (3.1)$$
where a is the $N_2 \times 1$ vector of intercepts, B is an $N_2 \times N_1$ matrix of regression coefficients, $\Omega^{1/2}$ is an $N_2 \times N_2$ 'square root' matrix such that $\Omega^{1/2}\Omega^{1/2\prime} = \Omega$, $\varepsilon_t^*$ is an $N_2$-dimensional standardized vector martingale difference sequence satisfying $E(\varepsilon_t^*|r_{1t}, I_{t-1}; \gamma_0, \omega_0) = 0$ and $V(\varepsilon_t^*|r_{1t}, I_{t-1}; \gamma_0, \omega_0) = I_{N_2}$, $\gamma = (a', b')'$, $b = \mathrm{vec}(B)$, $\omega = \mathrm{vech}(\Omega)$, the subscript 0 refers to the true values of the parameters, and $I_{t-1}$ denotes the information set available at $t - 1$, which contains at least past values of $r_{1t}$ and $r_{2t}$. Crucially, GRS assumed that conditional on $r_{1t}$ and $I_{t-1}$, $\varepsilon_t^*$ is independent and identically distributed as a spherical Gaussian random vector, or $\varepsilon_t^*|r_{1t}, I_{t-1}; \gamma_0, \omega_0 \sim$ i.i.d. $N(0, I_{N_2})$ for short.

Given the structure of the model, the unrestricted Gaussian ML estimators of a and B coincide with the equation-by-equation OLS estimators in the regression of each element of $r_{2t}$ on a constant and $r_{1t}$. Consequently,
$$\hat a = \hat\mu_2 - \hat B\hat\mu_1, \qquad (3.2)$$
$$\hat B = \hat\Sigma_{21}\hat\Sigma_{11}^{-1}, \qquad (3.3)$$
$$\hat\Omega = \hat\Sigma_{22} - \hat\Sigma_{21}\hat\Sigma_{11}^{-1}\hat\Sigma_{12},$$
where
$$\hat\mu = \begin{pmatrix}\hat\mu_1\\ \hat\mu_2\end{pmatrix} = \frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}r_{1t}\\ r_{2t}\end{pmatrix}, \qquad \hat\Gamma = \begin{pmatrix}\hat\Gamma_{11} & \hat\Gamma_{12}\\ \hat\Gamma_{21} & \hat\Gamma_{22}\end{pmatrix} = \frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}r_{1t}r_{1t}' & r_{1t}r_{2t}'\\ r_{2t}r_{1t}' & r_{2t}r_{2t}'\end{pmatrix}$$
and $\hat\Sigma = \hat\Gamma - \hat\mu\hat\mu'$, with $\hat\Sigma$ partitioned analogously into $\hat\Sigma_{11}$, $\hat\Sigma_{12}$, $\hat\Sigma_{21}$ and $\hat\Sigma_{22}$.
In fact, $\hat a$ and $\hat B$ would continue to be the Gaussian ML estimators if the matrix $\Omega_0$ were known. In those circumstances, the results in Breusch (1979) would imply that the Wald ($W_T$), LR ($LR_T$) and LM ($LM_T$) test statistics for the null hypothesis $H_0: a = 0$ would all be numerically identical to
$$T \cdot \frac{\hat a'\Omega_0^{-1}\hat a}{1 + \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1},$$
whose finite sample distribution conditional on the sufficient statistics $\hat\mu_1$ and $\hat\Sigma_{11}$ would be that of a non-central $\chi^2$ with $N_2$ degrees of freedom and non-centrality parameter $T \cdot a_0'\Omega_0^{-1}a_0/(1 + \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1)$.³ The reason is that the finite sample distribution of $\hat a$, conditional on $\hat\mu_1$ and $\hat\Sigma_{11}$, is multivariate normal with mean $a_0$ and covariance matrix $T^{-1}(1 + \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1)\Omega_0$.

In practice, of course, $\Omega_0$ is unknown, and has to be estimated along with the other parameters. But then, the Wald, LM and LR tests no longer coincide. However, for fixed $N_2$ and large T all three tests will be asymptotically distributed as the same non-central $\chi^2$ with $N_2$ degrees of freedom and non-centrality parameter
$$\frac{\tilde a'\Omega^{-1}\tilde a}{1 + \mu_1'\Sigma_{11}^{-1}\mu_1}$$
under the Pitman sequence of local alternatives $H_{lT}: a = \tilde a/\sqrt{T}$ (see Newey and McFadden, 1994). In contrast, they will separately diverge to infinity for fixed alternatives of the form $H_f: a = \dot a$, which makes them consistent tests. In the case of the Wald test, in particular, we can use Theorem 1 in Geweke (1981) to show that
$$\mathrm{plim}\,\frac{1}{T}W_T = \frac{\dot a'\Omega^{-1}\dot a}{1 + \mu_1'\Sigma_{11}^{-1}\mu_1}$$
coincides with Bahadur's (1960) definition of the approximate slope of the Wald test.⁴

In finite samples, though, the test statistics satisfy the following inequalities:
$$W_T \geq LR_T \geq LM_T,$$
which may lead to the conflict among criteria for testing hypotheses pointed out by Berndt and Savin (1977). In effect, the above inequalities reflect the fact that the finite sample distribution of the three tests is not well approximated by their asymptotic distribution, especially when $N_2$ is moderately large. For that reason, Jobson and Korkie (1982) proposed a Bartlett (1937) correction that scales the usual $LR_T$ statistic by $1 - (N_2 + N_1 + 3)/2T$ to improve the finite sample reliability of its asymptotic distribution.

In this context, the novel contribution of GRS was to exploit results from classic multivariate regression analysis to show that, conditional on the sufficient statistics $\hat\mu_1$ and $\hat\Sigma_{11}$, the test statistic
$$F_T = \frac{T - N_2 - N_1}{N_2} \cdot \frac{\hat a'\hat\Omega^{-1}\hat a}{1 + \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1}$$

3 Consequently, the distribution under the null $H_0: a = 0$ is effectively unconditional. In contrast, the unconditional distribution under the alternative is unknown.
4 Although in general approximate slopes differ from non-centrality parameters for local alternatives, in this case both expressions coincide because the asymptotic variance of $\hat a$ is the same under the null and the alternative.
will be distributed in finite samples as a non-central F with $N_2$ and $T - N_1 - N_2$ degrees of freedom and non-centrality parameter
$$\frac{T \cdot a_0'\Omega_0^{-1}a_0}{1 + \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1}.$$
Importantly, for $N_2 = 1$ this F-test coincides with the square of the t-test proposed by Black et al. (1972). The Wald, LM or LR statistics mentioned before can be written as monotonic transformations of this F-test. For instance,
$$F_T = \frac{T - N_2 - N_1}{N_2}\left[\exp(LR_T/T) - 1\right].$$

GRS also showed that
$$\hat a'\hat\Omega^{-1}\hat a = \hat\mu'\hat\Sigma^{-1}\hat\mu - \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1 = \hat s^2(\hat r_p) - \hat s^2(\hat r_{p1}),$$
where $\hat s^2(\hat r_p) = \hat\mu'\hat\Sigma^{-1}\hat\mu$ is the (square) sample Sharpe ratio of the ex post tangency portfolio that combines $r_1$ and $r_2$, while $\hat s^2(\hat r_{p1}) = \hat\mu_1'\hat\Sigma_{11}^{-1}\hat\mu_1$ is the (square) sample Sharpe ratio of the ex post tangency portfolio that uses data on $r_1$ only.⁵ In view of expression (2.4), an alternative interpretation is that $\hat a'\hat\Omega^{-1}\hat a$ is the maximum ex post square Sharpe ratio obtained by combining $\hat z_2$, which are the components of $r_2$ that have been fully hedged in the sample relative to $r_1$. The corresponding portfolio, $\hat a'\hat\Omega^{-1}\hat z_2 = \hat a'\hat\Omega^{-1}(r_2 - \hat Br_1)$, is sometimes known as the (ex post) optimal orthogonal portfolio (see MacKinlay, 1995).

Strictly speaking, GRS considered an incomplete (conditional) model that left unspecified the marginal distribution of $r_{1t}$. But they would have obtained exactly the same test had they considered the complete (joint) model $r_t|I_{t-1}; \rho \sim$ i.i.d. $N[\mu(\rho), \Sigma(\rho)]$, where
$$\mu(\rho) = \begin{pmatrix}\mu_1\\ a + B\mu_1\end{pmatrix}, \qquad (3.4)$$
$$\Sigma(\rho) = \begin{pmatrix}\Sigma_{11} & \Sigma_{11}B'\\ B\Sigma_{11} & B\Sigma_{11}B' + \Omega\end{pmatrix}, \qquad (3.5)$$
and $\rho = (a', b', \omega', \mu_1', \sigma_{11}')'$, where $\sigma_{11} = \mathrm{vech}(\Sigma_{11})$. The reason is that under this assumption the joint log-likelihood function of $r_t$ conditional on $I_{t-1}$ can be written as the sum of the conditional log-likelihood function of $r_{2t}$ given $r_{1t}$ (and the past), which depends on a, B and $\Omega$ only, plus the marginal log-likelihood function of $r_{1t}$ (conditional on the past), which just depends on $\mu_1$ and $\Sigma_{11}$. Given that $\theta = (a', b', \omega')'$ and $(\mu_1', \sigma_{11}')'$ are variation free, we have thus performed a sequential cut of the joint log-likelihood function that makes $r_{1t}$ weakly exogenous for $(a, b, \omega)$, which in turn guarantees the efficiency of the GRS procedure (see Engle et al., 1983). In addition, the i.i.d. assumption implies that $r_{1t}$ would in fact be strictly exogenous, which justifies finite sample inferences.

5 Kandel and Stambaugh (1989) provide an alternative graphical interpretation of the GRS test in sample mean-variance space.
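As a concrete illustration of the GRS statistic, the sketch below computes $F_T$ and its p-value on simulated data in which the null $a = 0$ holds by construction; all sample sizes and parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, N1, N2 = 240, 1, 5
r1 = rng.normal(0.005, 0.04, size=(T, N1))            # reference portfolio(s)
r2 = 0.8 * r1 + rng.normal(0.0, 0.02, size=(T, N2))   # test assets; a = 0 holds

X = np.column_stack([np.ones(T), r1])                 # constant and r1
coef = np.linalg.lstsq(X, r2, rcond=None)[0]          # equation-by-equation OLS
a_hat = coef[0]                                       # estimated intercepts
resid = r2 - X @ coef
Omega_hat = resid.T @ resid / T                       # ML residual covariance

mu1_hat = r1.mean(axis=0)
r1c = r1 - mu1_hat
S11_hat = r1c.T @ r1c / T                             # ML covariance of r1
s2_p1 = mu1_hat @ np.linalg.solve(S11_hat, mu1_hat)   # squared Sharpe ratio of r1

F = (T - N1 - N2) / N2 * (a_hat @ np.linalg.solve(Omega_hat, a_hat)) / (1 + s2_p1)
print(F, stats.f.sf(F, N2, T - N1 - N2))              # statistic and p-value
```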
Although the existence of finite sample results is very attractive, particularly when $N_2$ is moderately large, many empirical studies with financial time-series data indicate that the distribution of asset returns is usually rather leptokurtic. For that reason, MacKinlay and Richardson (1991) developed a robust test of mean-variance efficiency by using Hansen's (1982) GMM methodology (see also Harvey and Zhou, 1991). The orthogonality conditions that they considered are
$$E[m_R(r_t; \gamma)] = 0, \qquad m_R(r_t; \gamma) = \begin{pmatrix}1\\ r_{1t}\end{pmatrix}\otimes\varepsilon_t(\gamma), \qquad \varepsilon_t(\gamma) = r_{2t} - a - Br_{1t}. \qquad (3.6)$$
The advantage of working within a GMM framework is that under fairly weak regularity conditions inference can be made robust to departures from the assumptions of normality, conditional homoscedasticity, serial independence or identity of distribution. But since the above moment conditions exactly identify $\gamma$, the unrestricted GMM estimators coincide with the Gaussian pseudo ML estimators in (3.2) and (3.3).⁶,⁷ An alternative way of reaching the same conclusion is by noticing that the influence function $m_R(r_t; \gamma)$ is a full-rank linear transformation with time-invariant weights of the Gaussian pseudo-score with respect to $\gamma$:
$$s_{\gamma t}(\theta, 0) = \begin{pmatrix}1\\ r_{1t}\end{pmatrix}\otimes\Omega^{-1}\varepsilon_t(\gamma). \qquad (3.7)$$
Not surprisingly, GMM asymptotic theory yields the same answer as standard Gaussian PML results for multivariate regression models.

PROPOSITION 3.1. Under appropriate regularity conditions
$$\sqrt{T}(\hat\gamma_{GMM} - \gamma_0) \to N[0, C_{\gamma\gamma}(\phi_0)], \qquad (3.8)$$
where
$$C_{\gamma\gamma}(\phi) = A_{\gamma\gamma}^{-1}(\phi)B_{\gamma\gamma}(\phi)A_{\gamma\gamma}^{-1}(\phi),$$
$$A_{\gamma\gamma}(\phi) = -E[h_{\gamma\gamma t}(\theta, 0)|\phi] = E[A_{\gamma\gamma t}(\phi)|\phi],$$
$$A_{\gamma\gamma t}(\phi) = -E[h_{\gamma\gamma t}(\theta; 0)|r_{1t}, I_{t-1}; \phi] = \begin{pmatrix}1 & r_{1t}'\\ r_{1t} & r_{1t}r_{1t}'\end{pmatrix}\otimes\Omega^{-1},$$
$$B_{\gamma\gamma}(\phi) = \lim_{T\to\infty} V[\sqrt{T}\,\bar s_{\gamma T}(\theta, 0)|\phi],$$
where $h_{\gamma\gamma t}(\theta; 0)$ is the block of the component of the Gaussian Hessian matrix corresponding to $\gamma$ attributable to the t-th observation, $\bar s_{\gamma T}(\theta, 0)$ is the sample mean of the Gaussian scores, and $\phi = (\theta', \eta')'$ contains the $2N_2 + N_2(N_2 + 1)/2 + q$ parameters of the model, which include some q additional parameters $\eta$ that determine the shape of the distribution of $\varepsilon_t^*$ conditional on $r_{1t}$ and $I_{t-1}$.

From here, it is straightforward to obtain robust, efficient versions of the Wald and LM tests, which will continue to be asymptotically equivalent to each other under the null and sequences

6 In this paper, I use 'pseudo ML' estimator in the same way as Gouriéroux et al. (1984). In contrast, White (1982) uses the term 'quasi ML' for the same concept.
7 The obvious GMM estimator of $\omega$ is $\mathrm{vech}(\hat\Omega)$, where $\hat\Omega$ is the sample analogue to the residual covariance matrix.
of local alternatives (see Property 18.2 in Gouriéroux and Monfort, 1995). However, the LR test will not be asymptotically valid unless $\varepsilon_t(\gamma_0)$ is i.i.d. conditional on $r_{1t}$ and $I_{t-1}$. But it is possible to define an LR analogue as the difference in the GMM criterion functions under the null and the alternative. This 'distance metric' test will have an asymptotic $\chi^2$ distribution only if the GMM weighting matrix is optimally chosen, in which case it will be asymptotically equivalent to the optimal GMM versions of the $W_T$ and $LM_T$ tests under the null and sequences of local alternatives (see e.g. Theorem 9.2 in Newey and McFadden, 1994). Importantly, the optimal distance metric test will coincide with the usual overidentification test since the moment conditions (3.6) exactly identify $\gamma$ under the alternative. In addition, given that the influence functions (3.6) are linear in the parameters $\gamma$, the results in Newey and West (1987) imply that regardless of whether we use the Wald, Lagrange multiplier or distance metric tests, there will be only two numerically distinct test statistics: those that use the optimal GMM weighting matrix computed under the null, and those based on the optimal weighting matrix computed under the alternative.
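To fix ideas, here is a hedged sketch of a distributionally robust Wald test of $H_0: a = 0$ based on the exactly identified moment conditions (3.6), with the sandwich covariance of Proposition 3.1 estimated by a White-style sample analogue; the simulated design, which features conditional heteroscedasticity, is purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, N1, N2 = 240, 1, 5
r1 = rng.normal(0.005, 0.04, size=(T, N1))
scale = 1.0 + (r1**2).sum(axis=1, keepdims=True) / 0.04**2 / 50  # heteroscedasticity
r2 = 0.8 * r1 + np.sqrt(scale) * rng.normal(0.0, 0.02, size=(T, N2))

X = np.column_stack([np.ones(T), r1])            # k = 1 + N1 regressors
k = X.shape[1]
C = np.linalg.lstsq(X, r2, rcond=None)[0]        # OLS = exactly identified GMM
E = r2 - X @ C                                   # residuals

# Sandwich covariance of theta = vec(C): A^{-1} B A^{-1} / T
A = np.kron(np.eye(N2), X.T @ X / T)             # minus expected Jacobian of moments
B = sum(np.kron(np.outer(E[t], E[t]), np.outer(X[t], X[t])) for t in range(T)) / T
V = np.linalg.inv(A) @ B @ np.linalg.inv(A) / T

idx = np.arange(N2) * k                          # positions of the intercepts in vec(C)
a_hat, Va = C[0], V[np.ix_(idx, idx)]
W = a_hat @ np.linalg.solve(Va, a_hat)           # robust Wald statistic
print(W, stats.chi2.sf(W, N2))
```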
4. THE EFFECTS OF THE NUMBER OF ASSETS AND PORTFOLIO COMPOSITION ON TEST POWER

Although at first sight this section may only seem interesting for theoretically inclined econometricians, arguably it is also relevant for applied researchers because in practice the substantive conclusions about the mean-variance efficiency of a candidate portfolio can be rather sensitive to the way in which tests are implemented.

Let us start by considering a very simple practical situation. As we mentioned in the previous section, Black et al. (1972) proposed the use of the t-ratio of $a_i$ in the regression of $r_2$ on a constant and $r_1$ to test the mean-variance efficiency of $r_1$. However, when $r_2$ contains more than one element, it seems natural to follow GRS and conduct a joint test of $H_0: a = 0$ in order to increase the probability of rejecting the null hypothesis when $r_1$ is not mean-variance efficient. Somewhat surprisingly, the answer is not so straightforward.

For simplicity, let us initially assume that there are only two assets in $r_2$, $r_i$ and $r_j$, say. According to (2.4), the incremental Sharpe ratio that one can attain by combining $r_{1t}$, $r_{it}$ and $r_{jt}$ is given by $a'\Omega^{-1}a$, which is the maximum (square) Sharpe ratio that one can achieve by combining the components of $r_i$ and $r_j$ that are fully hedged with respect to $r_1$, $z_i = a_i + \varepsilon_i$ and $z_j = a_j + \varepsilon_j$. But if we apply (2.4) to $z_i$ and $z_j$ we get
$$a'\Omega^{-1}a = \frac{a_i^2}{\omega_i^2} + \frac{[s(z_j) - \rho_{z_iz_j}s(z_i)]^2}{1 - \rho_{z_iz_j}^2},$$
where $\rho_{z_iz_j}$ is the correlation between $z_i$ and $z_j$. An alternative way to interpret this expression is to think of the second summand as the (square) Sharpe ratio of $u_j = z_j - (\omega_{ij}/\omega_i^2)z_i$, which is the component of $r_j$ that is fully hedged with respect to both $r_{1t}$ and $r_{it}$.⁸ Therefore, when we add $r_j$ to $r_i$ for the purpose of testing the mean-variance efficiency of $r_1$ we must consider three effects:
8 It is important to remember that as the correlation between $z_i$ and $z_j$ increases, the law of one price guarantees that $s^2(u_j) = 0$ in the limit $\rho_{z_iz_j}^2 = 1$.
(1) The increase in the so-called non-centrality parameter of the test statistic, which is proportional to $s^2(u_j)$ and ceteris paribus increases power.
(2) The increase in the number of degrees of freedom of the numerator, which ceteris paribus decreases power.
(3) The decrease in the number of degrees of freedom of the denominator resulting from the fact that there are additional parameters to be estimated, which ceteris paribus decreases power too, although not by much if T is reasonably large.
The net effect is studied in detail by Rada and Sentana (1997). For a given value of $\hat s^2(r_1)$ and different values of T, these authors obtain isopower lines, defined as the locus of points in $s^2(z_i)$, $s^2(u_j)$ space for which the power of the univariate test is exactly the same as the power of the bivariate test.

GRS also present some evidence on the effects of increasing the number of assets on power under the assumption that the innovations are cross-sectionally homoscedastic and equicorrelated, so that
$$\Omega = \omega[(1 - \rho)I_{N_2} + \rho\iota_{N_2}\iota_{N_2}'], \qquad (4.1)$$
where $\omega$ and $\rho$ are two scalars. Given that the F-test estimates a fully unrestricted $\Omega$, it is not surprising that their results suggest that one should not use a large $N_2$ (see also MacKinlay, 1987). In fact, the F-test can no longer be computed if $N_2 \geq T - N_1$.⁹

The answer to the previous practical question leads to another practical question. If we want to increase the chances of rejecting the null hypothesis when $r_{1t}$ is not mean-variance efficient, should we group $r_{it}$ and $r_{jt}$ into a portfolio and carry out a single individual t-test, or should we consider them separately? Rada and Sentana (1997) study this question in a multivariate context. For simplicity, I will only discuss the situation in which $\Omega$ is assumed to be a known diagonal matrix, in which case one could work with the vector of re-scaled excess returns $r_2^* = \mathrm{dg}^{-1/2}(\Omega)r_2$, which are such that $r_2^* = a^* + B^*r_1 + \varepsilon^*$, where $a^* = \mathrm{dg}^{-1/2}(\Omega)a$, $B^* = \mathrm{dg}^{-1/2}(\Omega)B$ and $V(\varepsilon^*|r_1) = I_{N_2}$. Note that the i-th element of $a^*$, $a_i^* = a_i/\omega_i$, coincides with the 'information ratio' of $r_i$ introduced at the end of Section 2, since it reflects the Sharpe ratio of $z_i = r_i - \sigma_{i1}'\Sigma_{11}^{-1}r_1$, which is the component of $r_i$ that cannot be hedged against $r_1$. In this simplified context, Rada and Sentana (1997) express the non-centrality parameter of the joint Wald test of $H_0: a^* = 0$ as the sum of the non-centrality parameters of a Wald test whose null is that all information ratios are equal ($H_0: a^* = \bar a^*\iota_{N_2}$) and a Wald test whose null is that the average information ratio is 0 ($H_0: \bar a^* = 0$). Their result is based on a standard analysis of variance argument applied to the ML estimator of $a^*$. Specifically, they exploit the fact that
$$\sum_{i=1}^{N_2}\hat a_i^{*2} = N_2(\hat{\bar a}^{*2} + \hat\delta), \qquad (4.2)$$
9 Affleck-Graves and McDonald (1990) proposed a maximum entropy statistic that ensures the non-singularity of the estimated residual covariance matrix even if $N_2 > T$. Unfortunately, the finite sample distribution of their test statistic is generally unknown even under normality, and can only be assessed by simulation. In addition, it is not clear what its limiting behaviour will be when both $N_2$ and T go to infinity at the same rate.
where
$$\hat{\bar a}^* = N_2^{-1}\sum_{i=1}^{N_2}\hat a_i^*, \qquad \hat\delta = N_2^{-1}\sum_{i=1}^{N_2}(\hat a_i^* - \hat{\bar a}^*)^2.$$
It is then easy to see that under their maintained distributional assumptions, $\hat{\bar a}^{*2}$ is proportional to a non-central chi-square with one degree of freedom, while $\hat\delta$ is proportional to an independent non-central chi-square with $N_2 - 1$ degrees of freedom. Not surprisingly, Rada and Sentana (1997) show that the contribution of each of those two components to the power of the test depends exclusively on the relative values of the cross-sectional mean of the information ratios, $\bar a^* = N_2^{-1}\sum_{i=1}^{N_2}a_i^*$, and their cross-sectional variance, $\delta = N_2^{-1}\sum_{i=1}^{N_2}(a_i^* - \bar a^*)^2$. In particular, if there were no cross-sectional variability in the information ratios because $\delta = 0$, then one should simply apply the Black et al. (1972) test to the equally weighted portfolio of $r_2$. In contrast, if $\bar a^*$ were 0, such a test would have no power to reject the mean-variance efficiency of $r_1$ regardless of how big $\delta$ could be.

Finally, Rada and Sentana (1997) extend their analysis to the case in which one forms L equally weighted portfolios of M different assets from the $N_2$ elements of $r_2^*$, where $M = N_2/L$. In that case, an analysis of variance decomposes the test into three components: a test that the overall mean of the information ratios is zero, as in the previous case, a test that the between-group variance in information ratios is 0, and finally a test that their within-group variance is 0. More specifically, if we denote by $\hat{\bar a}_l^*$ the average value of $\hat a_i^*$ for those assets that belong to the l-th group, so that $\hat{\bar a}^* = L^{-1}\sum_{l=1}^{L}\hat{\bar a}_l^*$, then we will have that
$$\hat\delta = \frac{1}{L}\sum_{l=1}^{L}(\hat{\bar a}_l^* - \hat{\bar a}^*)^2 + \frac{1}{L}\sum_{l=1}^{L}\frac{L}{N_2}\sum_{i \in l}(\hat a_i^* - \hat{\bar a}_l^*)^2. \qquad (4.3)$$
Note that the first summand is proportional to a non-central chi-square with $L - 1$ degrees of freedom, while the second one is proportional to an independent non-central chi-square with $N_2 - L$ degrees of freedom. In this context, Rada and Sentana (1997) provide isopower lines in the space of within-group and between-group variances. Their analysis suggests that randomly chosen portfolios will have very little power over and above a test that the overall mean is zero, since the between-group variance is likely to be close to 0 for large M. In contrast, if we could form portfolios that reduce the within-group variance in information ratios but increase their between-group variance, then we would have substantially more power in the portfolio tests than in the test that considers the individual assets. The above results provide a formal justification for the usual practice of grouping returns according to the ranked values of certain observable characteristics that are likely to yield disperse information ratios, such as size or book-to-market value, as opposed to grouping them by industry, which is likely to produce very similar information ratios. Nevertheless, it is important to realise that such procedures may introduce some data-snooping size distortions, as illustrated by Lo and MacKinlay (1990).

Another fact that is worth remembering in this context is that the maximum Sharpe ratio attainable for any particular $N_2$ will be bounded from above by the limiting maximum Sharpe ratio, $s_\infty$, which is also bounded if we rule out arbitrage opportunities as $N_2 \to \infty$ (see Ross, 1976, and Chamberlain, 1983b). This is important because an increasing number of assets cannot result in an unbounded Sharpe ratio and, consequently, an unbounded non-centrality parameter, as explained by MacKinlay (1987, 1995). In other words, $N_2(\bar a^{*2} + \delta)$ must remain bounded as $N_2$ goes to infinity, which requires that $(\bar a^{*2} + \delta) = O(N_2^{-1})$.
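Before turning to the large-$N_2$ asymptotics, note that the two decompositions (4.2) and (4.3) are purely algebraic identities, which the following sketch verifies on a made-up vector of information ratios.

```python
import numpy as np

rng = np.random.default_rng(3)
L, M = 4, 5                        # L groups of M assets, N2 = L * M
N2 = L * M
a_star = rng.normal(0.02, 0.05, size=N2)   # hypothetical scaled intercepts a_i*

abar = a_star.mean()
delta = a_star.var()               # cross-sectional variance (divisor N2)
print(np.isclose((a_star**2).sum(), N2 * (abar**2 + delta)))   # identity (4.2)

groups = a_star.reshape(L, M)      # L equally weighted portfolios of M assets
gmeans = groups.mean(axis=1)
between = ((gmeans - abar)**2).mean()               # between-group variance
within = ((groups - gmeans[:, None])**2).mean()     # average within-group variance
print(np.isclose(delta, between + within))          # identity (4.3)
```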
To see the effects of this restriction, let us obtain the asymptotic distribution of the mean-variance efficiency test when $N_2 \to \infty$ in the case in which $\Omega$ is diagonal but unknown and the distribution of returns is i.i.d. multivariate normal. Conditional on $\hat s^2(r_1)$, the squared t-ratio of the intercept of the i-th asset,
$$\tilde t_i^{*2} = \frac{T - N_1 - 1}{1 + \hat s^2(r_{p1})}\cdot\frac{\hat a_i^2}{\hat\omega_{ii}},$$
will be distributed independently of the t-ratios of the intercepts of the other assets as a non-central F distribution with 1 and $T - N_1 - 1$ degrees of freedom and non-centrality parameter $Ta_i^{*2}[1 + \hat s^2(r_{p1})]^{-1}$. Hence, its mean will be
$$\pi_i = \frac{T - N_1 - 1}{T - N_1 - 3}\left[1 + \frac{T}{1 + \hat s^2(r_{p1})}a_i^{*2}\right] \qquad (4.4)$$
and its variance
$$\lambda_i^2 = \frac{2(T - N_1 - 1)^2}{(T - N_1 - 3)^2(T - N_1 - 5)}\left\{\left[1 + \frac{T}{1 + \hat s^2(r_{p1})}a_i^{*2}\right]^2 + (T - N_1 - 3)\left[1 + \frac{2T}{1 + \hat s^2(r_{p1})}a_i^{*2}\right]\right\}.$$
Given that the mean-variance efficiency test that exploits the diagonality of $\Omega$ will be proportional to $\sum_{i=1}^{N_2}\tilde t_i^{*2}$, we can use the Lindeberg-Feller central limit theorem for independent but heterogeneously distributed random variables to obtain the asymptotic distribution of the joint test for fixed T but large $N_2$, which under the null will be given by¹⁰
$$\frac{1}{\sqrt{N_2}}\sum_{i=1}^{N_2}\left[\tilde t_i^{*2} - \frac{T - N_1 - 1}{T - N_1 - 3}\right] \to N(0, 2).$$
In contrast, the mean under the alternative will be proportional to $\bar a^{*2} + \delta$ in view of (4.4). But since we saw before that $\bar a^{*2} + \delta = O(N_2^{-1})$ in order to rule out limiting arbitrage opportunities, one cannot even allow for local alternatives of the form $(\bar a^{*2} + \bar\delta)/\sqrt{N_2}$, and therefore the mean-variance efficiency test is likely to have negligible asymptotic power in those circumstances.¹¹

Affleck-Graves and McDonald (1990) suggest using the statistic $\sum_{i=1}^{N_2}\tilde t_i^{*2}$ even when $\Omega$ is not diagonal. Part of their motivation is that in this way there is no longer any need to form portfolios for the purposes of avoiding a singular estimated covariance matrix. The problem is that the distribution of such a statistic is non-standard if $\Omega$ is not diagonal, although in samples in which $N_2$ is small but T is large, we could use Imhof's (1961) results (see also Farebrother,

10 As is well known, this central limit theorem says that
$$\frac{\sum_{i=1}^{N_2}\tilde t_i^{*2} - \sum_{i=1}^{N_2}\pi_i}{\sqrt{\sum_{i=1}^{N_2}\lambda_i^2}} \to N(0, 1)$$
as long as the Lindeberg condition is satisfied, which we are implicitly assuming. This condition guarantees that the individual variances $\lambda_i^2$ are small compared to their sum, in the sense that for given $\epsilon$ and for all sufficiently large $N_2$, $\lambda_i^2/\sum_{j=1}^{N_2}\lambda_j^2 < \epsilon$ for $i = 1, \ldots, N_2$ (see Feller, 1971, p. 256).
11 Rada and Sentana (1997) also combine the decompositions of $\sum_{i=1}^{N_2}\hat a_i^{*2}$ in (4.2) and (4.3) with this asymptotic approximation to obtain the asymptotic distribution of the components of the mean-variance efficiency test attributable to the overall mean of the information ratios, their between-group variance and the within-group one.
1990) to approximate the distribution of the statistic $\sum_{i=1}^{N_2}\tilde t_i^{*2}$, replacing the matrix $\Omega$ by its unrestricted sample counterpart $\hat\Omega$ in computing the weights of the associated quadratic form in normal variables. Alternatively, we could impose structure on the cross-sectional distribution of the asset returns. Bossaerts and Hillion (1995) take a first step in this direction and derive the asymptotic distribution of $\sum_{i=1}^{N_2}(r_{it} - \sum_{j=1}^{N_1}\tilde b_{ij}r_{jt})^2$ for large $N_2$ but fixed T, where $\tilde b_{ij}$ is the restricted OLS estimator of $b_{ij}$ that imposes the null hypothesis $a_i = 0$, under the assumptions that (i) the conditional distribution of $\varepsilon_t$ given $r_{1t}$ is exchangeable (see e.g. Kingman, 1978), which among other things requires that $\Omega$ can be written as in (4.1), and (ii) $\Omega$ has an approximate zero factor structure as $N_2$ grows (see Chamberlain and Rothschild, 1983), which requires that $\rho = O(1/N_2)$ so that the largest eigenvalue of $\Omega$ in (4.1) is bounded. Bossaerts and Hillion (1995) show that their test, which is effectively focusing on $H_0: \bar a^* = 0$, is not consistent for fixed T if we rule out limiting arbitrage opportunities, but at least has non-trivial power against admissible alternatives of the form $a = (\bar a^*/\sqrt{N_2})\iota_{N_2}$. As expected, though, their test becomes consistent as $T \to \infty$. However, the application of mean-variance efficiency tests in situations in which $N_2/T$ is not negligible would require not only a different asymptotic theory in which the object of interest is the cross-sectional limit of $a'\Omega^{-1}a$, but also the imposition of plausible restrictions on the matrix $\Omega$, with exact or approximate factor structures being the most natural candidates.
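A quick Monte Carlo sketch of the fixed-T, large-$N_2$ approximation discussed above: under the null each $\tilde t_i^{*2}$ is (conditionally) a draw from a central $F(1, T - N_1 - 1)$, so the recentred and scaled average should be roughly normal with variance close to 2 once T is also reasonably large; the sample sizes below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, N1, N2, reps = 600, 1, 400, 2000
m = T - N1 - 1                     # denominator degrees of freedom
z = np.empty(reps)
for r in range(reps):
    t2 = stats.f.rvs(1, m, size=N2, random_state=rng)  # under H0 each ~ F(1, m)
    z[r] = np.sqrt(N2) * (t2.mean() - m / (m - 2))     # recentred, scaled average
print(z.mean(), z.var())           # approximately 0 and 2
```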
5. ASYMPTOTICALLY EQUIVALENT TESTS

Both Jobson and Korkie (1983) and Britten-Jones (1999) suggested testing the mean-variance efficiency of a given portfolio by regressing 1 on $r_t$. The rationale is that the coefficients of such a projection, $(\Sigma + \mu\mu')^{-1}\mu$, are proportional to the weights of the tangency portfolio, $\Sigma^{-1}\mu$, by virtue of the Woodbury formula. In a GMM framework, the moment conditions and parametric restrictions of their proposed test are
$$E(r_tr_t'\phi^+ - r_t) = E[m_U(r_t; \phi^+)] = 0 \qquad (5.1)$$
and $H_0: \phi_2^+ = 0$, respectively. This test is essentially identical to the GMM test of the moment conditions $E[r_t(\kappa + \psi_1'r_{1t})] = 0$ studied by Cochrane (2001) as a test of linear factor pricing models, since in the case of excess returns the choice of $\kappa$ is arbitrary. Intuitively, Cochrane's moment conditions can be understood as simply saying that under the null there is a stochastic discount factor (SDF) generated from $(1, r_{1t}')'$ alone that prices correctly all N assets under consideration.

Peñaranda and Sentana (2004) provide a third interpretation of (5.1) by using the fact that the arbitrage (i.e. zero-cost) mean-variance frontier (AMVF) can be written as
$$r^{MV}(\bar\mu) = \bar\mu\,\frac{1 + \mu'\Sigma^{-1}\mu}{\mu'\Sigma^{-1}\mu}\,p^+, \qquad (5.2)$$
where $p^+$ is the (uncentred) mean representing portfolio for arbitrage portfolios, i.e. the arbitrage portfolio that satisfies $E(r\,p^+) = \mu$.
Specifically, they show that the test of $H_0: \phi_2^+ = 0$ based on (5.1) can be understood as checking that $A_{N_1} = \langle r_1\rangle$ and $A_N = \langle r\rangle$ share the same mean representing portfolio (see also Sentana, 2005). In this context, we can once more apply the trinity of asymptotic GMM tests, which will again have a limiting chi-square distribution with $N_2$ degrees of freedom under the null. But since the moment conditions defining $\phi^*$ and $\phi^+$ are exactly identified, the distance metric test will coincide with the overidentifying restrictions test. In addition, all the tests can be made numerically identical by using a common estimator of the asymptotic covariance matrix of $\sqrt{T}\,\bar m_{UT}(\phi_0)$, because both the moment conditions and the restrictions to test are linear in the parameters (see Newey and West, 1987).

Peñaranda and Sentana (2004) also consider an alternative approach based on the centred mean representing portfolio, which satisfies $\mathrm{Cov}(r, p^{++}) = \mu$, and which leads to the moment conditions
$$E\begin{pmatrix}m_M(r_t; \mu)\\ m_C(r_t; \varphi^+, \mu)\end{pmatrix} = E\begin{pmatrix}r_t - \mu\\ (r_t - \mu)(r_t - \mu)'\varphi^+ - r_t\end{pmatrix} = E[m_E(r_t; \varphi^+, \mu)] = 0 \qquad (5.3)$$
to test $H_0: \varphi_2^+ = 0$. The advantage of working with centred moments is that $\varphi^+ = \Sigma^{-1}\mu$, which means that their test can also be regarded as a test based on the most frequent presentation of the weights of the tangency portfolio. In this sense, $\varphi_2^+ = 0$ means that the tangency portfolio does not involve any asset in $r_{2t}$. In addition, their test is entirely analogous to the one considered by De Santis (1995) and Bekaert and Urias (1996). Although these authors were interested in assessing the gains to US investors from internationally diversifying their portfolios, they exploited the duality between return mean-variance frontiers and Hansen and Jagannathan (1991) frontiers by basing their tests on the SDF moment conditions
$$E\{r_t[c + (r_{1t} - \mu_1)'\beta_1]\} = 0,$$
in which the choice of c is arbitrary. In this context, sequential GMM can be successfully applied to (5.3), and it retains the computational advantage of linearity in $\varphi^+$ (see Ogaki, 1993). In addition, since $E[m_M(r_t; \mu)] = 0$ exactly identifies the nuisance parameter $\mu$, Peñaranda and Sentana (2004) show that SGMM entails no asymptotic efficiency loss.

Therefore, we have three different ways to test for the mean-variance efficiency of $r_{1t}$: centred and uncentred representing portfolios (or portfolio weights), and the GRS regression version. The equivalence between their respective parametric restrictions can be easily proved by showing that a is a full-rank linear transformation of $\phi_2^+$, which in turn is proportional to $\varphi_2^+$. However, the fact that the restrictions to test are equivalent does not necessarily imply that the corresponding GMM-based test statistics will be equivalent too. This is particularly true in the case of the regression version of the test, in which the number of moments and parameters involved is different, although the number of degrees of freedom is the same. It turns out, however, that those three families of mean-variance efficiency tests are asymptotically equivalent under the null and sequences of local alternatives, as shown by Peñaranda and Sentana (2004). Therefore, there is no basis to prefer one test to the others from this perspective because all three statistics converge to exactly the same random variable. In this respect, note that this equivalence result is valid as long as the asymptotic distributions of the different tests are standard, which happens under fairly weak assumptions on the distribution of asset returns.

However, such an equivalence is lost under fixed alternatives. But by strengthening the distributional assumptions, Peñaranda and Sentana (2004) prove that if $r_t$ are independently and
identically distributed as an elliptical random vector with mean $\mu$, covariance matrix $\Sigma$, and bounded fourth moments, then the approximate slope of the Wald version of the regression test is at least as large as the approximate slope of the Wald version of the centred RP test. In contrast, it is fairly easy to find parametric configurations for which the approximate slope of the uncentred RP test is either bigger or smaller than the approximate slope of the GMM version of the GRS test. In particular, Peñaranda and Sentana (2004) prove that the uncentred RP test is more powerful than the regression test under normality regardless of the parameter values. Although these results are fairly specific, they can rationalise Monte Carlo results obtained under commonly made assumptions, since the elliptical distributions nest both the multivariate normal and Student t.

Finally, it is worth mentioning that the moment conditions (5.1) and (5.3), as well as the ones used by MacKinlay and Richardson (1991) (see equation (3.6)), are exactly identified under the alternative, so that the weighting matrix is asymptotically irrelevant for the unrestricted estimators. Under the null, though, those systems of moment conditions are overidentified, so we may need an initial estimate of the optimal weighting matrix based on a consistent estimator of the parameters. Although the choice of preliminary estimator does not affect the asymptotic distribution of two-step GMM estimators up to $O_p(T^{-1/2})$ terms, there is some Monte Carlo evidence suggesting that their finite sample properties can be negatively affected by an arbitrary choice of initial weighting matrix such as the identity (see e.g. Kan and Zhou, 2001). For that reason, Peñaranda and Sentana (2004) provide the following useful expressions for first-step, consistent restricted estimators, which are optimal under the assumption that $r_t$ is independently and identically distributed as an elliptical random vector with mean $\mu$, covariance matrix $\Sigma$, and bounded coefficient of multivariate excess kurtosis $\kappa$ (see Mardia, 1970).¹²
The linear combinations of the moment conditions in (5.1) that provide the most efficient + estimators of φ + 1 under H0 : φ 2 = 0 will be given by E(r1t r1t φ + 1 − r1t ) = 0,
(2)
−1 + so that φ¯ 1 = ˆ 11 μˆ 1 . The linear combinations of the moment conditions (5.3) that provide the most efficient + estimators of ϕ + 1 under H0 : ϕ 2 = 0 will be given by r1t − μ1 E = 0, (r1t − μ1 )(r1t − μ1 ) ϕ + 1 − r1t −1
(3)
ˆ ˆ 1 , and so that μ¯ 1T = μˆ 1 and ϕ¯ + 1 = 11 μ The linear combinations of the moment conditions (3.6) that provide the most efficient estimators of b under H 0 : a = 0 will be given by E[(r1t + κμ1 ) ⊗ (r2t − Br1t )] = 0.
+ In this respect, note that since −1 μ = (1 + μ −1 μ)−1 −1 μ, φ¯ 1 and ϕ¯ + 1 will be proportional + + ˆ to each other, and the same applies to φ and ϕˆ . However, since the factor of proportionality 12 See Renault (1997) for a result analogous to part 3 in the special case in which the payoffs of the arbitrage portfolios are i.i.d. Gaussian.
C The Author(s). Journal compilation C Royal Economic Society 2009.
depends on the data, the Wald tests of $H_0: \phi_2^+ = 0$ and $H_0: \varphi_2^+ = 0$ cannot be made numerically identical.
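The uncentred version is particularly easy to implement, because the sample analogue of (5.1) is just an OLS regression of a vector of ones on all the excess returns. The sketch below computes the corresponding Wald test of $H_0: \phi_2^+ = 0$ with a sandwich covariance matrix on illustrative simulated data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T, N1, N2 = 240, 1, 5
r1 = rng.normal(0.005, 0.04, size=(T, N1))
r2 = 0.8 * r1 + rng.normal(0.0, 0.02, size=(T, N2))
R = np.column_stack([r1, r2])                    # all N = N1 + N2 excess returns

phi = np.linalg.lstsq(R, np.ones(T), rcond=None)[0]   # sample analogue of (5.1)
e = 1.0 - R @ phi                                     # 'pricing errors'

# Robust (sandwich) covariance of phi_hat from the linear moment conditions
A = R.T @ R / T
B = (R * e[:, None]).T @ (R * e[:, None]) / T
V = np.linalg.inv(A) @ B @ np.linalg.inv(A) / T

phi2, V22 = phi[N1:], V[N1:, N1:]
W = phi2 @ np.linalg.solve(V22, phi2)            # Wald test of H0: phi2+ = 0
print(W, stats.chi2.sf(W, N2))
```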
6. MORE EFFICIENT TESTS

6.1. Tests based on the distribution of $r_{2t}$ conditional on $r_{1t}$

The GMM tests discussed in previous sections provide asymptotically valid inferences under fairly weak assumptions on the distribution of returns. However, this robustness may come at the cost of a power loss. In this sense, Hodgson et al. (2002; hereinafter HLV) developed a semi-parametric estimation and testing methodology that enabled them to obtain optimal mean-variance efficiency tests under the assumption that the distribution of $r_{2t}$ conditional on $r_{1t}$ (and their past) is elliptically symmetric. Specifically, HLV showed that their proposed estimators of a and b are adaptive under the aforementioned assumptions of linear conditional mean and constant conditional variance, which means that they are as efficient as infeasible maximum likelihood estimators that use the correct parametric elliptical density with full knowledge of its shape parameters. The main advantage of elliptical distributions in this context is that they generalize the multivariate normal distribution, but at the same time they retain its analytical tractability irrespective of the number of assets.

Before discussing their test, though, it is pedagogically convenient to introduce a parametric version, which will be based on the assumption that conditional on $r_{1t}$ and $I_{t-1}$, $\varepsilon_t^*$ is independent and identically distributed as a spherical random vector with a well-defined density, or $\varepsilon_t^*|r_{1t}, I_{t-1}; \gamma_0, \omega_0, \eta_0 \sim$ i.i.d. $s(0, I_{N_2}, \eta_0)$ for short, where $\eta$ is the $q \times 1$ vector of shape parameters that determine the distribution of $\varsigma_t = \varepsilon_t^{*\prime}\varepsilon_t^*$. Apart from the normal distribution, another popular and more empirically realistic example is a standardized multivariate t with $\nu_0$ degrees of freedom, or i.i.d. $t(0, I_{N_2}, \nu_0)$ for short. As is well known, the multivariate Student t approaches the multivariate normal as $\nu_0 \to \infty$, but has generally fatter tails. Zhou (1993) and Amengual and Sentana (2009) consider two other illustrative examples: the original Kotz (1975) distribution and a discrete scale mixture of normals.

Let $\phi = (\gamma', \omega', \eta')' \equiv (\theta', \eta')'$ denote the $2N_2 + N_2(N_2 + 1)/2 + q$ parameters of interest, which we assume variation free. The log-likelihood function of a sample of size T based on a particular parametric spherical assumption will take the form $L_T(\phi) = \sum_{t=1}^{T}l_t(\phi)$, with $l_t(\phi) = d_t(\theta) + c(\eta) + g[\varsigma_t(\theta), \eta]$, where $d_t(\theta) = -\frac{1}{2}\ln|\Omega|$ corresponds to the Jacobian, $c(\eta)$ to the constant of integration of the assumed density, and $g[\varsigma_t(\theta), \eta]$ to its kernel, where $\varsigma_t(\theta) = \varepsilon_t^{*\prime}(\theta)\varepsilon_t^*(\theta)$, $\varepsilon_t^*(\theta) = \Omega^{-1/2}\varepsilon_t(\theta)$ and $\varepsilon_t(\theta) = r_{2t} - a - Br_{1t}$.¹³

Let $s_t(\phi)$ denote the score function $\partial l_t(\phi)/\partial\phi$, and partition it into three blocks, $s_{\gamma t}(\phi)$, $s_{\omega t}(\phi)$ and $s_{\eta t}(\phi)$, whose dimensions conform to those of $\gamma$, $\omega$ and $\eta$, respectively. A straightforward application of expression (2) in Fiorentini and Sentana (2007) implies that
$$s_{\gamma t}(\phi) = \begin{pmatrix}1\\ r_{1t}\end{pmatrix}\otimes\delta[\varsigma_t(\theta), \eta]\,\Omega^{-1}\varepsilon_t(\theta), \qquad (6.1)$$

13 Fiorentini et al. (2003) provide expressions for $c(\eta)$ and $g[\varsigma_t(\theta), \eta]$ in the multivariate t case, which under normality collapse to $-(N_2/2)\log(2\pi)$ and $-\frac{1}{2}\varsigma_t(\theta)$, respectively.
where $\delta[\varsigma_t(\theta), \eta] = -2\partial g[\varsigma_t(\theta), \eta]/\partial\varsigma$, which reduces to 1 under Gaussianity (cf. equation (3.7)).

Given correct specification, the results in Crowder (1976) imply that the score vector $s_t(\phi)$ evaluated at the true parameter values has the martingale difference property. His results also imply that, under suitable regularity conditions, which typically require that both $r_{1t}$ and $\mathrm{vech}(r_{1t}r_{1t}')$ are strictly stationary processes with absolutely summable autocovariances, the asymptotic distribution of the feasible ML estimator will be given by the following expression:
$$\sqrt{T}(\hat\phi_{ML} - \phi_0) \xrightarrow{d} N[0, \mathcal{I}^{-1}(\phi_0)],$$
where $\mathcal{I}(\phi_0) = E[\mathcal{I}_t(\phi_0)|\phi_0]$,
$$\mathcal{I}_t(\phi) = V[s_t(\phi)|r_{1t}, I_{t-1}; \phi] = -E[h_t(\phi)|r_{1t}, I_{t-1}; \phi],$$
and $h_t(\phi)$ denotes the Hessian function $\partial s_t(\phi)/\partial\phi' = \partial^2 l_t(\phi)/\partial\phi\partial\phi'$. On this basis, Amengual and Sentana (2009) prove the following result.

PROPOSITION 6.1. If $\varepsilon_t^*|r_{1t}, I_{t-1}; \phi_0$ in (3.1) is i.i.d. $s(0, I_{N_2}, \eta_0)$ with density $\exp[c(\eta) + g(\varsigma_t, \eta)]$ such that $\mathrm{M}_{ll}(\eta_0) < \infty$, and both $r_{1t}$ and $\mathrm{vech}(r_{1t}r_{1t}')$ are strictly stationary processes with absolutely summable autocovariances, then
$$\sqrt{T}(\hat a_{ML} - a_0) \xrightarrow{d} N[0, \mathcal{I}^{aa}(\phi_0)], \qquad (6.2)$$
where
$$\mathcal{I}^{aa}(\phi) = [\mathcal{I}_{aa}(\phi) - \mathcal{I}_{ab}(\phi)\mathcal{I}_{bb}^{-1}(\phi)\mathcal{I}_{ab}'(\phi)]^{-1} = \frac{1}{\mathrm{M}_{ll}(\eta)}[1 + s^2(r_{p1})]\,\Omega,$$
$$\mathrm{M}_{ll}(\eta) = E\left[\delta^2[\varsigma_t(\theta), \eta]\frac{\varsigma_t(\theta)}{N_2}\bigg|\phi\right] = E\left[\frac{2\,\partial\delta[\varsigma_t(\theta), \eta]}{\partial\varsigma}\frac{\varsigma_t(\theta)}{N_2} + \delta[\varsigma_t(\theta), \eta]\bigg|\phi\right],$$
$\mu_1 = E(r_{1t}|\phi)$ and $\Sigma_{11} = V(r_{1t}|\phi)$, so that $s^2(r_{p1}) = \mu_1'\Sigma_{11}^{-1}\mu_1$ is the maximum square Sharpe ratio attainable with the reference portfolios.

Importantly, expression (6.2) is valid regardless of whether or not the shape parameters $\eta$ are fixed to their true values $\eta_0$, as in an infeasible ML estimator, $\hat a_{IML}$ say, or jointly estimated with $\theta$, as in an unrestricted one, $\hat a_{UML}$ say. The reason is that the scores corresponding to the mean parameters, $s_{\gamma t}(\phi_0)$, and the scores corresponding to variance and shape parameters, $s_{\omega t}(\phi_0)$ and $s_{\eta t}(\phi_0)$, respectively, are asymptotically uncorrelated under the sphericity assumption.

The usual asymptotic efficiency properties of maximum likelihood estimators and associated test procedures imply that mean-variance efficiency tests based on this elliptical assumption will be more efficient than those based on the assumption of normality. Specifically, it is easy to see that
$$C_{aa}(\phi_0) = [1 + s^2(r_{p1})]\,\Omega_0, \qquad (6.3)$$
which does not depend on the specific distribution for the innovations that we are considering, regardless of whether or not the conditional distribution of $\varepsilon_t^*$ is spherical, as long as it is i.i.d. Since $\mathrm{M}_{ll}(\eta) \geq 1$, with equality if and only if $\varepsilon_t^*$ is normal, it is clear that the parametric procedure is more efficient than the GMM one.
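To get a feel for the size of this efficiency gain, the following sketch evaluates $\mathrm{M}_{ll}(\eta)$ by simulation for a standardized multivariate Student t, whose weighting function is $\delta(\varsigma_t, \eta) = (\nu + N_2)/(\nu - 2 + \varsigma_t)$ (a standard result, see e.g. Fiorentini and Sentana, 2007); the values of $\nu$ and $N_2$ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N2, nu, draws = 5, 8.0, 200_000

# Standardized multivariate t innovations: eps* = z * sqrt((nu - 2) / chi2_nu)
z = rng.standard_normal((draws, N2))
chi = rng.chisquare(nu, size=draws)
eps = z * np.sqrt((nu - 2) / chi)[:, None]
vs = (eps**2).sum(axis=1)                      # varsigma_t = eps*' eps*

delta = (nu + N2) / (nu - 2 + vs)              # downweights outlying observations
M_ll = np.mean(delta**2 * vs / N2)
print(M_ll)   # exceeds 1, so the t-based test beats the Gaussian/GMM one here
```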
However, unless one is careful, the elliptically symmetric parametric approach may provide misleading inference if the relevant conditional distribution does not coincide with the assumed one, even if both are elliptical. Nevertheless, Amengual and Sentana (2009) show that the parametric pseudo ML estimator of $\gamma$ that makes the wrong distributional assumption remains consistent in that case. In contrast, the ML estimator of $\Omega$ is only consistent up to scale, in the sense that if we re-parametrise $\Omega$ as $\tau\Upsilon(\upsilon)$, where $\upsilon$ are $N_2(N_2 + 1)/2 - 1$ parameters that ensure that $|\Upsilon(\upsilon)| = 1\ \forall\upsilon$, then $\upsilon$ will be consistently estimated but $\tau$ will not. They illustrate their results when the pseudo log-likelihood function is based on the multivariate t, in which case the correct asymptotic distribution for the pseudo t-based ML estimator of a is given by the following expression:

PROPOSITION 6.2. If $\varepsilon_t^*|r_{1t}, I_{t-1}; \varphi_0$ is i.i.d. $s(0, I_{N_2}, \eta_0)$ but not Student t, with $\kappa_0 > 0$, where $\varphi_0 = (\gamma_0', \upsilon_0', \tau_0, \eta_0')'$, then:
$$\sqrt{T}(\hat a_{UML} - a_0) \xrightarrow{d} N\left[0, \frac{\mathrm{M}_{ll}^O(\phi_\infty; \varphi_0)}{\lambda_\infty^2\left[\mathrm{M}_{ll}^H(\phi_\infty; \varphi_0)\right]^2}\cdot C_{aa}(\varphi_0)\right], \qquad (6.4)$$
where
$$\mathrm{M}_{ll}^O(\phi; \varphi) = E\left\{\delta^2[\varsigma_t(\vartheta), \eta]\cdot[\varsigma_t(\vartheta)/N_2]\,\big|\,\varphi\right\},$$
$$\mathrm{M}_{ll}^H(\phi; \varphi) = E\left\{2\,\partial\delta[\varsigma_t(\vartheta), \eta]/\partial\varsigma\cdot[\varsigma_t(\vartheta)/N_2] + \delta[\varsigma_t(\vartheta), \eta]\,\big|\,\varphi\right\},$$
$\lambda_\infty = \tau_0/\tau_\infty$, and $\tau_\infty$ is the pseudo-true value of $\tau$.

The analysis of a restricted t-based PML estimator which fixes $\eta$ to some value $\bar\eta$ is entirely analogous, except for the fact that the pseudo-true value of $\tau$ becomes $\tau_\infty(\bar\eta)$, as opposed to $\tau_\infty = \tau_\infty(\eta_\infty)$.¹⁴

A natural question in this context is a comparison of the efficiency of the t-based pseudo ML estimator and the GMM estimator when the distribution is elliptical but not t. Amengual and Sentana (2009) answer this question by assuming that the conditional distribution is either normal, Kotz, or the two-component scale mixture of normals previously discussed, for which they obtain analytical expressions for the inefficiency ratio $\mathrm{M}_{ll}^O(\phi_\infty; \varphi_0)/\{\lambda_\infty^2[\mathrm{M}_{ll}^H(\phi_\infty; \varphi_0)]^2\}$. Trivially, they find that if the true conditional distribution is Gaussian, then the restricted ML estimator that makes the erroneous assumption that it is a Student t with $\bar\eta^{-1}$ degrees of freedom is inefficient relative to the GMM estimator, the more so the larger the value of $\bar\eta$. Nevertheless, this inefficiency becomes smaller and less sensitive to $\bar\eta$ as the number of assets increases. But of course $\eta_\infty = 0$ in this case, which suggests that estimating $\eta$ is clearly beneficial under misspecification. They also find that the restricted t-based PML estimator seems to be strictly more efficient than the GMM one when the true conditional distribution is leptokurtic. And again, they find that as $N_2$ increases the restricted t-based PML estimator tends to achieve the full efficiency of the ML estimator for any $\bar\eta > 0$.

As we mentioned before, HLV proposed a semi-parametric estimator of multivariate linear regression models that updates $\hat\theta_{GMM}$ (or any other root-T consistent estimator) by means of a single scoring iteration without line searches. The crucial ingredient of their method is the so-called elliptically symmetric semi-parametric efficient score (see Proposition 7 in Fiorentini and
C The Author(s). Journal compilation C Royal Economic Society 2009.
The econometrics of mean-variance efficiency tests
C85
Sentana, 2007): sθt (φ 0 ) = sθt (φ 0 ) − Ws (φ 0 )
δ[ςt (θ 0 ), η0 ]
2 ςt (θ 0 ) ςt (θ 0 ) −1 − −1 , N (N + 2)κ0 + 2 N
where Ws (φ) = [0, 0, 12 vec (−1 )DN2 ] and DN2 the duplication matrix of order N 2 (see Magnus and Neudecker, 1988). In fact, the special structure of Ws (φ) implies that we can update the GMM estimator of γ by means of the following simple BHHH correction: T
−1 sγ t (φ 0 )sγ t (φ 0 )
t=1
T
sγ t (φ 0 ),
(6.5)
t=1
which does not require the computation of s˚ωt (φ 0 ). In practice, of course, sγ t (φ 0 ) has to be replaced by a semi-parametric estimate obtained from the joint density of ε∗t . However, the elliptical symmetry assumption allows one to obtain such an estimate from a non-parametric estimate of the univariate density of ςt , h(ςt ; η), avoiding in this way the curse of dimensionality (see HLV and Appendix B1 in Fiorentini and Sentana, 2007, for details). Proposition 7 in Fiorentini and Sentana (2007) shows that the elliptically symmetric semiparametric efficiency bound will satisfy S˚γ γ (φ 0 ) = Iγ γ (φ 0 ) in view of the structure of Ws (φ 0 ). This result confirms that the HLV estimator of γ is adaptive. 15 Unfortunately, the HLV approach may also lead to erroneous inferences if the true conditional distribution is asymmetric, and the same is true of the parametric procedure. Amengual and Sentana (2009) illustrate the problem for the case in which ε∗t is distributed as an i.i.d. multivariate asymmetric t (see Menc´ıa and Sentana, 2009b). In that context, they show that the unrestricted t-based PMLE of a will be inconsistent. In contrast, B will be consistently estimated precisely because the estimator of a will fully mop up the bias in the mean. Unfortunately, meanvariance efficiency tests are based on a, not B. For analogous reasons, the HLV estimator of a also becomes inconsistent under asymmetry. Intuitively, the problem is that it will not be true any more that the N 2 -dimensional density of ∗ ε∗t could be written as a function of ςt = ε∗ t ε t alone. Therefore, a semi-parametric estimator of sγ t (φ 0 ) that combines the elliptical symmetry assumption with a non-parametric specification for δ[ςt (θ), η] will be contaminated by the skewness of the data. In contrast, the GMM estimator always yields a consistent estimator of a, on the basis of which we can develop a GMM-based Wald test with the correct asymptotic size because (3.8) remains valid under asymmetry. Another problem that the semi-parametric procedures could have is that their finite sample performance may not be well approximated by the first-order asymptotic theory that justifies them. In this respect, the Monte Carlo evidence presented in Amengual and Sentana (2009) suggests that HLV-based joint and individual tests have systematically the largest size distortions. In contrast, GMM tests have finite sample sizes that are close to the asymptotic levels. As for the tests that use the unrestricted t-based PML estimator, they find that both the robust and non-robust versions are well behaved.
15 HLV also consider alternative estimators that iterate the semi-parametric adjustment (6.5) until it becomes negligible. However, since they have the same first-order asymptotic distribution, we shall not discuss them separately.
C The Author(s). Journal compilation C Royal Economic Society 2009.
C86
E. Sentana
6.2. Tests based on the joint distribution of r 1t and r 2t In this section we explicitly study the framework analysed by MacKinlay and Richardson (1991) and Kan and Zhou (2006), who considered a joint distribution of excess returns for the N assets in r t . Such an assumption is particularly relevant in this context because in the presence of a safe asset a sufficient condition for mean-variance analysis applied to r t to be compatible with expected utility maximization is that the joint distribution of r t is elliptical (see e.g. Chamberlain, 1983a, Owen and Rabinovitch, 1983, and Berk, 1997). As we mentioned before, when the joint distribution of r t is i.i.d. Gaussian, the distribution of r 2t conditional on r 1t must also be normal, with a mean a + Br 1t that is a linear function of r 1t , and a covariance matrix that does not depend on r 1t . However, while the linearity of the conditional mean will be preserved when r t is elliptically distributed but non-Gaussian, the conditional covariance matrix will no longer be independent of r 1t . For instance, if we assume that −1/2 (ρ)[rt − μ(ρ)] ∼ i.i.d. t(0, IN , ν), where μ(ρ) and (ρ) are defined in (3.4) and (3.5), then E [r2t |r1t ; ρ, ν] = a + Br1t , 1 ν−2 −1 1+ V [r2t |r1t ; ρ, ν] = (r1t − μ1 ) 11 (r1t − μ1 ) , ν + N1 − 2 (ν − 2) which means that model (3.1) will be misspecified due to contemporaneous, conditionally heteroscedastic innovations. In other words, the variances and covariances of the regression residuals will be a function of the regressor. In addition, note that we can no longer operate the sequential cut of the joint log-likelihood function discussed in Section 3, which invalidates the exogeneity of r 1t . As MacKinlay and Richardson (1991) pointed out, the GMM estimator of γ remains consistent in this case. In fact, Kan and Zhou (2001) and Amengual and Sentana (2009) show that if r t is independently and identically distributed as an elliptical random vector with mean μ(ρ), covariance matrix (ρ), and bounded fourth moments, then V (ˆaGMM ) = 1 + s 2 (rp1 ) (1 + κ0 ) 0 . (6.6) In this sense, note that the only difference with respect to (3.8) is that the maximum (square) Sharpe ratio of the reference portfolios s 2 (rp1 ) is multiplied by the factor (1 + κ 0 ). In practice, we could estimate V (ˆaGMM ) by using heteroscedastic robust standard errors a` la White (1980). At the other extreme of the efficiency range, we can use Proposition 6 in Amengual and Sentana (2009) to show that 1 1 2 + s (rp1 ) , (6.7) V (ˆaJML ) = M ll (η0 ) M ss (η0 ) where aˆ JML denotes the joint ML estimator that makes the correct assumption that ∗t (ρ) = −1/2 (ρ)[rt − μ(ρ)] ∼ i.i.d. s(0, IN , η), and both Mll (η) and
2∂δ[ςt (θ), η] ςt2 (θ)
N ςt
φ +1 1 + V δ[ςt (θ), η] φ = E M ss (η) = N +2 N ∂ς N (N + 2) correspond to the N-dimensional joint distribution of r t . This estimator has been proposed by Kan and Zhou (2006) for the case of the multivariate t. Amengual and Sentana (2009) also prove the consistency of the t-based estimators of γ which make the erroneous assumption that V [r2t |r1t ] = τ ϒ(υ), where τ = ||1/N2 and ϒ(υ) = C The Author(s). Journal compilation C Royal Economic Society 2009.
The econometrics of mean-variance efficiency tests
C87
/||1/N2 , and provide expressions for the conditional variance of the score and expected Hessian matrix under such misspecification. Specifically, they show that a sandwich formula analogous to the one in (6.4) can still be applied to obtain the asymptotic variance of the unrestricted ML estimator. They also quantify the efficiency of the GMM and conditional ML estimator relative to the full information ML estimator when r t is distributed as a multivariate t. Their results indicate that the restricted t-based PML estimator of γ is more efficient than the GMM estimator for all values of η, ¯ the more so the larger N 2 is. Furthermore, the unrestricted t-based PML estimator that also estimates η gets close to achieving the full efficiency of the joint ML estimator, especially for large N 2 . In principle, their results will continue to hold if we replace the t-based ML estimator by any other estimator based on a specific i.i.d. elliptical distribution for r 2t |r 1t , I t−1 . But since the HLV estimator is asymptotically equivalent to a parametric estimator that uses a flexible elliptical distribution as we increase the number of parameters, their results suggest that the HLV estimator of γ will continue to be consistent. In fact, an argument analogous to the one made by Hodgson (2000) in a closely related univariate context would imply that the HLV estimator is as efficient as the parametric estimator that used the true unconditional distribution of the innovations εt = r2t − a0 − B0 r1t . Nevertheless, inferences about a and B would have to be adjusted to reflect the contemporaneous conditional heteroscedasticity of εt , which is not straightforward.
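The direction of the misspecification in (3.1) under joint ellipticity is easy to check by simulation. The sketch below draws i.i.d. multivariate t returns and verifies that the dispersion of the OLS residuals increases with the Mahalanobis distance of $r_{1t}$, in line with the conditional variance formula above; the parameter values, the zero-mean normalization and the variable names are illustrative assumptions of mine, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N1, N2, nu = 100_000, 2, 1, 6.0

# joint zero-mean multivariate t for r_t = (r1_t', r2_t')' with covariance Sigma
Sigma = np.array([[1.0, 0.3, 0.4],
                  [0.3, 1.0, 0.5],
                  [0.4, 0.5, 1.0]])
z = rng.standard_normal((T, N1 + N2)) @ np.linalg.cholesky(Sigma).T
r = z * np.sqrt((nu - 2.0) / rng.chisquare(nu, size=(T, 1)))

r1, r2 = r[:, :N1], r[:, N1:]
X = np.column_stack([np.ones(T), r1])
b = np.linalg.lstsq(X, r2, rcond=None)[0]       # OLS estimates of (a, B)
u = r2 - X @ b                                  # regression residuals

# Mahalanobis distance of r1 and the theoretical conditional variance factor
q1 = np.einsum('ti,ij,tj->t', r1, np.linalg.inv(Sigma[:N1, :N1]), r1)
factor = (nu - 2.0) / (nu + N1 - 2.0) * (1.0 + q1 / (nu - 2.0))

# residual dispersion rises with q1, exactly as the factor predicts
lo, hi = q1 < np.median(q1), q1 >= np.median(q1)
print(u[lo].var(), u[hi].var())      # the second number is visibly larger
print(factor[lo].mean(), factor[hi].mean())
```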
7. FINITE SAMPLE TESTS

As we discussed in Section 3, one of the nicest features of the GRS test is that it allows us to make exact finite sample inferences conditional on the observations of $r_{1t}$ for t = 1, . . . , T under the assumption of conditional normality and homoscedasticity. But since their distributional assumption turns out to be empirically implausible, several studies have analysed the finite sample properties of their tests in more realistic circumstances. In particular, Affleck-Graves and McDonald (1989) found that while the nominal size and power of the GRS test can be seriously misleading if the non-normalities are severe, they are reasonably robust to minor departures from normality (see also MacKinlay, 1987, and Zhou, 1993, who shows that the finite sample results differ depending on whether the non-normality affects the conditional distribution of $r_{2t}$ given $r_{1t}$, or the joint distribution of $r_{1t}$ and $r_{2t}$, which is not surprising in view of the discussion in the previous section). Given that elliptical distributions are natural alternatives to multivariate normality in this context, Zhou (1993) proposed simulation-based p-values for the GRS statistic for a few fully specified elliptical distributions, including the multivariate t, the Kotz distribution and discrete scale mixtures of normals (see also Harvey and Zhou, 1991). Similarly, Geczy (2001) suggested an adjustment to the F version of the GRS test that has approximately the correct size under the same distributional assumptions. More recently, Beaulieu et al. (2007a) have developed a method to obtain the exact distribution of the Gaussian-based Wald, LR, LM and F versions of the mean-variance efficiency tests described at the beginning of Section 3 when the innovations are i.i.d. but not necessarily Gaussian or elliptical. For the sake of clarity, let us discuss first the case in which the distribution of the innovations is fully specified, including the nuisance parameters $\eta$. Their approach relies on the fact that in classical multivariate regression models such as (3.1) the numerical values of the LR, W and LM tests of a = 0 depend exclusively on the realizations of the regressors $r_{1t}$ and
innovations $\varepsilon^*_t$ over the full sample t = 1, . . . , T. Consequently, tests of linear hypotheses on the regression coefficients a are pivotal with respect to the parameters b and $\omega$ for any finite T. On this basis, one can simulate to any desired degree of accuracy the finite sample distribution of the trinity of classical tests conditional on the full sample realization of $r_{1t}$ by generating artificial sample paths of the standardized disturbances $\varepsilon^*_t$ according to some specific i.i.d. distribution, such as a multivariate t with some fixed degrees of freedom $\nu_0$. 16 Interestingly, their procedure could also be trivially applied to the Wald, LM and DM versions of the MacKinlay and Richardson (1991) test, as long as one exploits the i.i.d. assumption in computing the efficient GMM weighting matrix according to expression (6.3). To handle the more realistic situation in which the distribution of the innovations depends on some unknown parameters $\eta$, Beaulieu et al. (2007a) exploit the fact that the sample values of the multivariate skewness and kurtosis measures underlying Mardia's (1970) multivariate normality tests are also pivotal with respect to b and $\omega$ conditional on the full sample realisation of $r_{1t}$ (see Zhou, 1993, and Dufour et al., 2003). On this basis, they manage to construct an exact $1-\alpha_1$ confidence set for the nuisance parameters by 'inverting' a simulated moment-based distributional goodness of fit test, which they construct by comparing the aforementioned skewness and kurtosis components with their finite sample expectations computed by simulation under the assumed i.i.d. distribution for the innovations. 17 Then, they repeat the procedure described in the previous paragraph at a confidence level $\alpha_2$ for all values of $\eta$ in the $1-\alpha_1$ confidence set, and report the maximum p-value. Somewhat remarkably, they show that the resulting maximized Monte Carlo p-value has exact level $\alpha_1+\alpha_2$, in the sense that the probability of rejecting the null hypothesis of mean-variance efficiency is not greater than $\alpha_1+\alpha_2$ for any data-generating process compatible with the null (see Lehmann, 1986, ch. 3). As in the original GRS test, the sampling framework of their tests is one in which the full sample path of the excess returns on the candidate portfolio $r_{1t}$ is 'fixed in repeated samples'. Except in the i.i.d. normal case, though, it is not clear whether the null distribution of the Beaulieu et al. (2007a) tests is in fact independent of the values of the regressors in finite samples. Although it may seem a contradiction in terms, it is interesting to analyse the asymptotic behaviour of their finite sample procedures in order to relate them to the analysis in Section 6. Although the exact confidence set for $\eta$ that they construct should become more and more concentrated around the true value $\eta_0$ as $T\to\infty$, let us consider for simplicity the case in which a researcher specifies that the distribution of the innovations is i.i.d. t with $\nu_0$ degrees of freedom. Given that the multivariate regression Wald test numerically coincides with a GMM version that exploits the i.i.d. assumption in computing the efficient GMM weighting matrix, the asymptotic size and power properties of the Beaulieu et al. (2007a) procedure are identical to those of the GMM tests discussed in Section 6.1 as long as the distribution of the innovations is i.i.d., regardless of whether or not they really follow a t with $\nu_0$ degrees of freedom.
However, their test will have asymptotically the wrong size if the conditional distribution of the innovations is not i.i.d., and the same is obviously true in finite samples. As we saw in Section 6.2, a potentially relevant example would be one in which the joint distribution of $r_{1t}$ and $r_{2t}$ were elliptical.
16 In fact, if one is only interested in finding the exact p-value for a given value of, say, the LR statistic, as opposed to the exact critical values at some pre-specified level $\alpha$, the Beaulieu et al. (2007a) procedure provides the answer with a finite number of simulations.
17 That is, their $1-\alpha_1$ confidence level set for $\eta$ is made up of all the values of this parameter for which their distributional goodness of fit test has an exact Monte Carlo p-value greater than $\alpha_1$.
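The logic of the known-$\nu_0$ case can be condensed into a few lines of code. The sketch below is a minimal illustration of a Monte Carlo p-value for the Gaussian-based Wald statistic conditional on the path of $r_{1t}$; the function names are mine, and it is the pivotality with respect to b and $\omega$ stressed above that licenses simulating under b = 0 and an identity covariance matrix.

```python
import numpy as np

def wald_a(r1, r2):
    """Gaussian-based Wald statistic for H0: a = 0 in the multivariate
    regression r2_t = a + B r1_t + eps_t (MLE of the residual covariance)."""
    T = r1.shape[0]
    X = np.column_stack([np.ones(T), r1])
    B = np.linalg.lstsq(X, r2, rcond=None)[0]
    U = r2 - X @ B
    Omega = U.T @ U / T                         # MLE of the residual covariance
    a_hat = B[0]                                # first row: the intercepts
    V_a = np.linalg.inv(X.T @ X)[0, 0] * Omega  # Var(a_hat) given the r1 path
    return float(a_hat @ np.linalg.solve(V_a, a_hat))

def mc_pvalue(r1, r2, nu0, S=999, rng=None):
    """Monte Carlo p-value conditional on the path of r1, drawing i.i.d.
    multivariate t(nu0) innovations under the null a = 0."""
    rng = rng or np.random.default_rng(0)
    T, N2 = r2.shape
    w_obs = wald_a(r1, r2)
    exceed = 0
    for _ in range(S):
        z = rng.standard_normal((T, N2))
        eps = z * np.sqrt((nu0 - 2.0) / rng.chisquare(nu0, size=(T, 1)))
        exceed += wald_a(r1, eps) >= w_obs      # b = 0, Omega = I is innocuous
    return (1 + exceed) / (S + 1)
```

Handling unknown $\eta$ along the lines of Beaulieu et al. (2007a) would wrap this routine in a search over the simulated confidence set for the nuisance parameters and report the maximized p-value.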
Obviously, standard simulation techniques, such as bootstrap and subsampling methods, can in principle be applied to any of the tests that we have previously discussed, although once again it would be important to distinguish the situation in which r 1t is treated as if it were ‘fixed in repeated samples’ from the more realistic situation in which the relevant sampling framework involves all assets in r t . In this sense, it is worth remembering that the same exogeneity considerations apply to Bayesian testing methods, such as the ones considered by Shanken (1987b), Harvey and Zhou (1990), Kandel et al. (1995) or Cremers (2006), which can also be regarded as finite sample methods.
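As a complement, the following sketch shows one simple way to implement a pairs (i.i.d.) bootstrap of such a statistic under the null, which treats $r_{1t}$ as random rather than 'fixed in repeated samples'. The recentring device and all names are my own choices, and `wald_a` from the previous sketch can serve as the `stat` argument.

```python
import numpy as np

def pairs_bootstrap_pvalue(r1, r2, stat, S=999, rng=None):
    """Pairs (i.i.d.) bootstrap p-value for a mean-variance efficiency
    statistic.  Resampling (r1_t, r2_t) jointly treats the reference
    returns as random, i.e. the sampling framework involves all assets in
    r_t; resampling residuals with r1 held fixed would instead mimic the
    'fixed in repeated samples' framework of the GRS test."""
    rng = rng or np.random.default_rng(0)
    T = r1.shape[0]
    # recentre r2 so that the null hypothesis a = 0 holds in the bootstrap world
    X = np.column_stack([np.ones(T), r1])
    B = np.linalg.lstsq(X, r2, rcond=None)[0]
    r2_null = r2 - B[0]                   # subtract the estimated intercepts
    s_obs, exceed = stat(r1, r2), 0
    for _ in range(S):
        idx = rng.integers(0, T, size=T)  # draw T pairs with replacement
        exceed += stat(r1[idx], r2_null[idx]) >= s_obs
    return (1 + exceed) / (S + 1)
```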
8. MEAN-VARIANCE-SKEWNESS EFFICIENCY AND SPANNING TESTS

Despite its popularity, mean-variance analysis also suffers from important limitations. Specifically, it neglects the effect of higher-order moments on asset allocation. In particular, it ignores the third central moment of returns, which, as a measure of skewness, is undoubtedly a crucial ingredient in analysing derivative assets, games of chance and insurance contracts. In this sense, Patton (2004) uses a bivariate copula model to show the empirical importance of asymmetries in asset allocation. Further empirical evidence has been provided by Harvey et al. (2002) and Jondeau and Rockinger (2006). From the theoretical point of view, Athayde and Flôres (2004) derive several useful properties of mean-variance-skewness frontiers, and obtain their shape for some examples by simulation techniques. Similarly, Briec et al. (2007) propose an optimization algorithm that, starting from a specific portfolio, obtains the mean-variance-skewness efficient portfolio along a given direction that reflects investors' relative preferences for those three moments. From an econometric point of view, it is important to distinguish between testing the mean-variance-skewness efficiency of a particular portfolio, and testing spanning of the mean-variance-skewness frontier. Let us start with the first test. Using a variational argument, Kraus and Litzenberger (1976) showed that the risk premia of any portfolio could be expressed as a linear combination of its covariance and co-skewness with any mean-variance-skewness efficient portfolio (see also Barone-Adesi, 1985, Ingersoll, 1987, and Lim, 1989). Specifically, they showed that 18
$$\mu_i = \tau_r\sigma_{i1} + \tau_s\phi_{i11} \quad \forall i, \qquad (8.1)$$
where $\sigma_{ij} = \mathrm{cov}(r_i, r_j)$, $\phi_{ijk} = E[(r_i-\mu_i)(r_j-\mu_j)(r_k-\mu_k)]$,
18 Strictly speaking, Kraus and Litzenberger (1976) derived a 'beta' version of (8.1), in which $\sigma_{i1}$ is divided by $\sigma_{11}$ and $\phi_{i11}$ by $\phi_{111}$, with the appropriate adjustments to $\tau_r$ and $\tau_s$. An advantage of the formulation in (8.1) relative to the original one is that it does not require the reference portfolio to be asymmetric.
and the coefficients $\tau_r$ and $\tau_s$ are common across assets. These restrictions were cast in a GMM framework by Sánchez-Torres and Sentana (1998) as follows (a code sketch of this moment system appears after equation (8.3) below):
$$E(r_{1t} - \tau_r\sigma_{11} - \tau_s\phi_{111}) = 0,$$
$$E[(r_{1t} - \tau_r\sigma_{11} - \tau_s\phi_{111})^2 - \sigma_{11}] = 0,$$
$$E[(r_{1t} - \tau_r\sigma_{11} - \tau_s\phi_{111})^3 - \phi_{111}] = 0,$$
$$E(r_{it} - \tau_r\sigma_{i1} - \tau_s\phi_{i11}) = 0,$$
$$E[(r_{it} - \tau_r\sigma_{i1} - \tau_s\phi_{i11})(r_{1t} - \tau_r\sigma_{11} - \tau_s\phi_{111}) - \sigma_{i1}] = 0,$$
$$E[(r_{it} - \tau_r\sigma_{i1} - \tau_s\phi_{i11})(r_{1t} - \tau_r\sigma_{11} - \tau_s\phi_{111})^2 - \phi_{i11}] = 0.$$
Note that for each asset other than the reference portfolio there are three restrictions but only two parameters, while for the reference portfolio there are four parameters but only three restrictions. All in all, there are $3(N_2+1)$ moment restrictions with $2(N_2+1)+2$ parameters ($\tau_r$, $\tau_s$, $\sigma_{i1}$, $\phi_{i11}$). Therefore, the corresponding overidentification test has $N_2-1$ degrees of freedom under the null hypothesis of mean-variance-skewness efficiency of $r_1$, the loss of one degree of freedom relative to the MacKinlay and Richardson (1991) test being due to the addition of the parameter $\tau_s$. As in the case of mean-variance frontiers, the overidentifying restrictions test can be made robust to departures from the assumptions of normality, conditional homoscedasticity, serial independence or identity of distribution. Given that (8.1) would also arise from an asset pricing model in which the SDF were proportional to
$$1 - \tau_r(r_{1t}-\mu_1) - \tau_s\left[r_{1t}^2 - (\mu_1^2+\sigma_{11})\right], \qquad (8.2)$$
we could always interpret a test of $H_0: \tau_s = 0$ as a test that (co-)skewness with $r_{1t}$ is not priced. 19 This interpretation also suggests that an alternative test of the mean-variance-skewness efficiency of $r_{1t}$ could be obtained from the SDF-type restrictions:
$$E\left\{r_{it}\left[1 - \tau_r(r_{1t}-\mu_1) - \tau_s\left(r_{1t}^2 - (\mu_1^2+\sigma_{11})\right)\right]\right\} = 0 \quad \forall i.$$
An econometric problem that arises in this set-up is that $\sigma_{i1}$ and $\phi_{i11}$ are highly cross-sectionally collinear in practice (see Barone-Adesi et al., 2004), which makes the separate identification of $\tau_r$ and $\tau_s$ problematic (see Kan and Zhang, 1999a,b, or Kleibergen, 2007, for related discussions in more general contexts). Given the well-known relationship between beta pricing and SDF pricing, Barone-Adesi et al. (2004) proposed a 'quadratic' regression version of the above problem. Specifically, they showed that if the SDF is a linear combination of $r_{1t}$ and $(R_{1t}^2 - R_{0t})$, then the intercept of the following multivariate regression
$$r_{2t} = \alpha + \beta r_{1t} + \gamma\left(R_{1t}^2 - R_{0t}\right) + v_t$$
must satisfy the restriction
$$\alpha = \tau_g\gamma, \qquad (8.3)$$
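As promised above, here is a compact encoding of the Sánchez-Torres and Sentana (1998) moment system. It is only a sketch: the parameter packing, the names and the closing remark about the J-test are my own, and a real implementation would still need a GMM optimizer and an efficient weighting matrix.

```python
import numpy as np

def mvs_moments(theta, r1, R2):
    """T x 3(N2+1) matrix of the moment conditions displayed above.
    theta packs (tau_r, tau_s, sigma_11, phi_111) followed by the N2
    covariances sigma_i1 and the N2 co-skewnesses phi_i11; r1 is (T,)
    and R2 is (T, N2)."""
    T, N2 = R2.shape
    tau_r, tau_s, s11, p111 = theta[:4]
    s_i1, p_i11 = theta[4:4 + N2], theta[4 + N2:]
    u1 = r1 - tau_r * s11 - tau_s * p111          # reference pricing error
    g = [u1, u1 ** 2 - s11, u1 ** 3 - p111]
    for i in range(N2):
        ui = R2[:, i] - tau_r * s_i1[i] - tau_s * p_i11[i]
        g += [ui, ui * u1 - s_i1[i], ui * u1 ** 2 - p_i11[i]]
    return np.column_stack(g)

# With theta_hat from an efficient GMM step, gbar = mvs_moments(...).mean(0)
# and W an efficient weighting matrix, the J statistic T * gbar' W gbar is
# asymptotically chi-square with N2 - 1 degrees of freedom under the null.
```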
19 Chabi-Yo et al. (2007) extend the infinitesimal risk analysis of Samuelson (1970) to provide a justification for an SDF specification such as (8.2). They also provide an alternative representation of the SDF in terms of $r_{1t}$ and a skewness-representing portfolio, which is the least-squares projection of $r_{1t}^2$ on a constant and $r_t$.
where $\tau_g$ is a scalar parameter (see also Barone-Adesi, 1985). However, it is necessary to bear in mind that unless $r_{1t}$ is symmetric, $\gamma_i$ will not be exactly proportional to the co-skewness of asset i with $r_1$ even if one makes the additional assumptions that $E(v_{it}\mid r_{1t}, I_{t-1})$ is 0 and both $R_{0t}$ and $V(v_{it}\mid r_{1t}, I_{t-1})$ are constant, because
$$\phi_{i11} = \mathrm{cov}\left(r_{it}, r_{1t}^2\right) = \gamma_i V\left(r_{1t}^2\right) + \beta_i\,\mathrm{cov}\left(r_{1t}, r_{1t}^2\right).$$
As a result, one has to be careful in testing whether co-skewness with $r_{1t}$ is priced (see also Chabi-Yo et al., 2007). Nevertheless, Barone-Adesi et al. (2004) argue that the difference between $\gamma_i$ and $\phi_{i11}/V(r_{1t}^2)$ is likely to be fairly small in practice when $r_{1t}$ is a well-diversified portfolio, since the distribution of such portfolios is strongly leptokurtic but only mildly asymmetric, if at all. 20 More recently, Beaulieu et al. (2008) have explained how to obtain by simulation the finite sample size of the Wald and LR tests of the non-linear restriction (8.3) under the assumption that the distribution of $\varepsilon_t$ conditional on $I_{t-1}$ and the past, present and future of $r_{1t}$ is i.i.d. $(0, \Omega, \rho)$. 21 Notice, though, that, as in the case of the mean-variance frontier without a riskless asset, the fact that a portfolio is mean-variance-skewness efficient does not imply that any particular agent would be interested in investing in it. An obvious example is the usual mean-variance tangency portfolio. The properties of the mean-variance frontier imply that such a portfolio will trivially satisfy (8.1) with $\tau_s = 0$. However, only those agents who do not care about skewness will choose it. Therefore, from an investor's point of view it may be more interesting to consider mean-variance-skewness spanning tests. The problem with those tests is that in general the mean-variance-skewness frontier is not generated by any finite number of assets. Nevertheless, Mencía and Sentana (2009a) make mean-variance-skewness analysis fully operational by working with a rather flexible family of multivariate asymmetric distributions, known as location-scale mixtures of normals (LSMN), which nest as particular cases several important elliptically symmetric distributions, such as the Gaussian or the Student t, and also some well-known asymmetric distributions like the Generalized Hyperbolic (GH) introduced by Barndorff-Nielsen (1977). The GH distribution in turn nests many other well-known and empirically relevant special cases, such as symmetric and asymmetric versions of the Hyperbolic (Chen et al., 2008), Normal Gamma (Madan and Milne, 1991), Normal Inverse Gaussian (Aas et al., 2005) and Multivariate Laplace (Cajigas and Urga, 2007). In addition, LSMN nest other interesting examples, such as finite mixtures of normals, which have been shown to be a flexible and empirically plausible device to introduce non-Gaussian features in high dimensional multivariate distributions (see e.g. Kon, 1984), but which at the same time remain analytically tractable. Formally, a random vector r of dimension N follows an LSMN if it can be generated as
$$r = \upsilon + \xi^{-1}\Upsilon\delta + \xi^{-1/2}\Upsilon^{1/2}\varepsilon^o, \qquad (8.4)$$
where $\upsilon$ and $\delta$ are N-dimensional vectors, $\Upsilon$ is a positive definite matrix of order N, $\varepsilon^o \sim N(0, I_N)$, and $\xi$ is an independent positive mixing variable whose distribution function depends on a vector of q shape parameters. Since r given $\xi$ is Gaussian with conditional mean
20 Sánchez-Torres and Sentana (1998) proposed a moment test of the restriction $E(r_{1t}-\mu_1)^3 = 0$ to assess the asymmetry of the distribution of $r_{1t}$. The advantage of their test relative to the skewness component of the usual Jarque and Bera (1980) test is that it can be made robust to non-normality, heteroscedasticity and serial correlation (see also Bai and Ng, 2005, and Bontemps and Meddahi, 2005, for closely related approaches).
21 In addition, they explicitly consider the more general case in which a riskless asset is not available.
$\upsilon + \Upsilon\delta\xi^{-1}$ and covariance matrix $\Upsilon\xi^{-1}$, it is clear that $\upsilon$ and $\Upsilon$ play the roles of location vector and dispersion matrix, respectively. The shape parameters allow for flexible tail modelling, while the vector $\delta$ introduces skewness in this distribution. For ease of interpretation, Mencía and Sentana (2009a) re-write the data generation process for returns as
$$r = \mu + \Sigma^{1/2}\varepsilon^*, \qquad (8.5)$$
where $\varepsilon^*$ is a standardized LSMN vector that is obtained from (8.4) by choosing $\upsilon$ and $\Upsilon$ appropriately. In addition, they choose
$$\delta = \Sigma^{-1/2}d \qquad (8.6)$$
in order to make the distribution of r independent of the particular factorization of $\Sigma$ in (8.5). In terms of portfolio allocation, Mencía and Sentana (2009a) show that if the distribution of asset returns can be expressed as an LSMN, then the distribution of any portfolio that combines those assets will be uniquely characterized by its mean, variance and skewness parameter $w'd$. This implies that, from an investor's point of view, the relative attractiveness of any two portfolios can always be explained in terms of those three quantities, because all higher-order moments depend on the lower ones and the common tail parameters. Hence, one only needs to characterize the investment opportunity set in terms of these moments to fully describe the investor's available strategies. Furthermore, Mencía and Sentana (2009a) show that the efficient part of this frontier can be spanned by three funds: the safe asset, the fund that together with the safe asset generates the usual mean-variance frontier, whose weights are proportional to $\varphi^+ = \Sigma^{-1}\mu$, plus an additional fund whose weights are given by the vector d in (8.6). This second vector can be interpreted as an asymmetry-variance efficient portfolio, because one can maximize asymmetry for a given standard deviation by considering portfolios with weights proportional to d. Consequently, any portfolio in the efficient part of the mean-variance-skewness frontier will be of the type $w_r\varphi^+ + w_s d$, where $w_r$ and $w_s$ are two scalars. 22 On this basis, Mencía and Sentana (2009a) develop a mean-variance-skewness spanning test that jointly assesses whether $\varphi^+_2 = 0$ and $d_2 = 0$. Given that they work within a fully parametric framework, their test is based on the asymptotic distribution of the ML estimator of the parameters of the LSMN model. In this regard, they provide analytical expressions for the score by means of the EM algorithm, and explain how to reliably evaluate the information matrix. 23
9. CONCLUSIONS

This paper provides a survey of the econometrics of mean-variance efficiency tests. Starting with the classic F-test of Gibbons et al. (1989) and its generalized method of moments version, I analyse the effects of the number of assets and portfolio composition on test power. I then discuss asymptotically equivalent tests based on portfolio weights, and study the trade-offs between
22 There are other asymmetric distributions that satisfy this property. Specifically, Simaan (1993) studies portfolio allocation when excess returns are the sum of an elliptical random vector and an independent scalar asymmetric variable times a constant vector. Similarly, Mencía and Sentana (2009b) consider a multivariate Hermite expansion of a multivariate normal vector in which asymmetry is a common feature.
23 In principle, one could exploit the non-elliptical nature of the distribution of returns for the sole purpose of obtaining more efficient parameter estimates of the mean vector and covariance matrix of returns, as in Section 6. As we have just seen, though, mean-variance analysis is generally suboptimal for asymmetric return distributions.
efficiency and robustness of using parametric and semi-parametric likelihood procedures that assume either elliptical innovations or elliptical returns. After reviewing finite sample tests, I conclude with a discussion of mean-variance-skewness efficiency and spanning tests. A unifying theme of this survey is that empirical researchers must decide how much a priori knowledge about the degree of inefficiency of the candidate portfolio, its exogeneity, the pattern of the residual covariance matrix or the conditional distribution of asset returns they want to use in order to obtain tests that are either more powerful or have more reliable finite sample distributions. As usual, if they make the wrong a priori assumptions they may inadvertently introduce biases in their conclusions. In this sense, it is important that they are aware of and understand those biases, so that they can robustify their inferences. However, it does not necessarily follow that they should systematically rely on 'asymptotically robust' procedures whose main justification is based on first-order limiting results if those results provide a poor approximation in finite samples.

In any case, there are many important issues that I have unfortunately not considered in the interest of space. In particular, I have not looked at mean-variance efficiency tests when a riskless asset is not available (as in e.g. Gibbons, 1982, Kandel, 1984, Shanken, 1985, 1986, Zhou, 1991, Velu and Zhou, 1999, and more recently Beaulieu et al., 2007b), in which case the regression should be run in terms of returns instead of excess returns, and the null hypothesis should become $H_0: \alpha_i = \lambda\left(1 - \sum_{j=1}^{N_1} b_{ij}\right)\ \forall i$, where $\lambda$ is a scalar parameter representing the expected return of the so-called zero-beta portfolio. As I mentioned before, in those circumstances it is important to distinguish between mean-variance efficiency tests on the one hand, and spanning tests on the other (see Huberman and Kandel, 1987, and De Roon and Nijman, 2001, for a recent survey), in which the null hypothesis involves restrictions on both intercepts and slopes of the multivariate regression model (3.1) (see Peñaranda and Sentana, 2008a, for a comparison of alternative GMM procedures). Moreover, I have ignored the effects of transaction costs and short sale constraints on testing for mean-variance analysis, which are discussed in detail by De Roon et al. (2001). Short sale and additivity constraints are particularly relevant in style analysis, which is often used in practice (see Sharpe, 1992, for a definition, and De Roon et al., 2004, for a discussion of the econometric issues). I have also disregarded the effects of using proxies for the true benchmark portfolios $r_{1t}$, which is particularly relevant in asset pricing applications in view of the so-called Roll (1977) critique (see Kandel and Stambaugh, 1987, and Shanken, 1987a). There is also an extensive body of literature that looks at the two-pass procedures of Fama and MacBeth (1973), which continue to attract substantial attention from practitioners (see Shanken, 1992, Lewellen et al., 2006, and Shanken and Zhou, 2007, and also Cochrane, 2001, p. 247, for a re-interpretation of their procedure in cross-sectional and pooled regression contexts in which the estimated regression coefficients $\hat B$ are held constant over the full sample period).

Similarly, there is a growing literature that discusses portfolio selection and its pricing implications taking into account either fourth-order moments of the distribution of returns, through expansions of general expected utility von Neumann–Morgenstern preferences (see e.g. Dittmar, 2002, Jondeau and Rockinger, 2006, Chabi-Yo et al., 2008, and Guidolin and Timmermann, 2008), or a specific parametric class of utility functions (see Gouriéroux and Monfort, 2005). Relatedly, Jurczenko et al. (2006) extend the dual approach in Briec et al. (2007) to obtain the portfolio frontier for fourth-order moments. Finally, a very important issue that I have ignored is the fact that nowadays it is widely accepted that asset returns are predictable, if not in mean at least in variance, and that
investors can exploit this fact to their advantage by using conditional distributions as opposed to unconditional ones in deciding their portfolio strategies. 24 For instance, an investor can not only choose a passive 'buy and hold' portfolio strategy whose weights are fixed over time, but can also define a dynamic trading strategy as a function of the volatility level of the stock market, as measured by the VIX, say. Frontiers for such active strategies were introduced by Hansen and Richard (1987), and have been recently revisited by Ferson and Siegel (2001), Abhyankar et al. (2007) and Peñaranda and Sentana (2008b). Hansen and Richard (1987) carefully distinguish conditional mean-variance frontiers, which refer to conditional moments of active strategies, from unconditional mean-variance frontiers, which bound the first two unconditional moments of all conceivable actively managed portfolios. In turn, these unconditional frontiers should not be confused with unconditional mean-variance frontiers for passive portfolios, where by passive we mean portfolios whose weights do not depend on the information available at the time of trading. 25 In line with most of the existing literature on mean-variance efficiency tests, though, the information that is available at the time of trading has played no explicit role in this paper. In this strict sense, therefore, one could regard the procedures that I have surveyed as tests of passive mean-variance efficiency, although the underlying assets could be portfolios managed according to some specific dynamic strategy. At first sight, it may seem irrelevant to study passive strategies in the presence of conditioning information. However, following Hansen and Richard (1987) and many others, empirical work on unconditional mean-variance frontiers typically relies on passive strategies of managed portfolios such as $r_t \otimes x_{t-1}$, where $x_{t-1}$ is a vector of predictor variables known at time t − 1, as a way of approximating the complexity of active strategies without running the risk of misspecifying the conditional distribution of asset returns (see chapter 8 in Cochrane, 2001, for a justification). Still, other authors prefer to impose functional form restrictions on the conditional distribution of $r_t$ given $x_{t-1}$. In some cases, those restrictions amount to assuming that the conditional analogues of the multivariate regression slopes and intercepts in (3.1) depend linearly on $x_{t-1}$ while the residual covariance matrix remains constant (see Beaulieu et al., 2007a, or Morales, 2009, for recent examples). Alternatively, the conditional regression coefficients and residual covariance matrix may be kept constant, but the conditional means, variances and covariances of $r_{1t}$ are allowed to change over time (as in Gouriéroux et al., 1991). A third possibility is to assume that the conditional mean of $r_t$ is linear in $x_{t-1}$ but the corresponding conditional covariance is constant (see e.g. Ferson and Siegel, 2009). Such parametric restrictions typically imply that some of the procedures surveyed in the previous sections can be easily adapted. For instance, Beaulieu et al. (2007a) test conditional mean-variance efficiency by checking that the coefficients of $x_{t-1}$ in the regression of $r_{2t}$ on $x_{t-1}$ and $r_{1t} \otimes x_{t-1}$ are simultaneously 0. Similarly, Property 17 in Gouriéroux et al. (1991) implies that under their assumptions $a'\Omega^{-1}a$ also reflects the time-invariant incremental Sharpe ratio that separates the conditional mean-variance frontier generated from $r_{1t}$ alone from the one generated from both $r_{1t}$ and $r_{2t}$, even though the unconditional means of the corresponding maximum
24 See Cochrane (2001) for a summary of the empirical evidence on mean predictability, and Sentana (2005) for a recent example of the link between regression forecasts and optimal portfolios.
25 Peñaranda and Sentana (2008b) also discuss extended mean-variance frontiers, which correspond to actively managed portfolios whose cost is one on average, but not necessarily one for every possible value of the variables in the information set. In addition, there is a non-trivial connection between mean-variance preferences and return frontiers when investors rely on active strategies. Peñaranda (2008) studies such a connection, showing that different mean-variance preferences lead to different interpretations of the results of portfolio efficiency tests.
The econometrics of mean-variance efficiency tests
C95
conditional Sharpe ratios do not coincide with the maximum Sharpe ratios of constant-weight portfolios discussed in Section 2. As a result, a test of $H_0: a = 0$ is relevant for both conditional and passive mean-variance frontiers. Finally, Ferson and Siegel (2009) compare the maximum Sharpe ratio of the unconditional Hansen and Richard (1987) frontier for arbitrage portfolios constructed from $r_{1t}$ alone with the one generated from both $r_{1t}$ and $r_{2t}$. To do so, they exploit a result in Ferson and Siegel (2001) which indicates that the arbitrage portfolio with maximum unconditional Sharpe ratio is given by
$$\mu'(x_{t-1})\left[\mu(x_{t-1})\mu'(x_{t-1}) + \Sigma(x_{t-1})\right]^{-1} r_t,$$
where $\mu(x_{t-1})$ and $\Sigma(x_{t-1})$ are the mean vector and covariance matrix of the distribution of $r_t$ given $x_{t-1}$. In principle, the procedures described in the earlier sections could also be modified to test for conditional mean-variance efficiency for a specific value that the conditioning variables may take at the time of trading. Non-parametric procedures can be developed by localizing either with respect to state (as in Wang, 2002, 2003, and Kayahan and Stengos, 2007) or with respect to time (as in Lewellen and Nagel, 2006; see also Fan et al., 2007, for a combined approach). All these issues constitute interesting avenues for further research.
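For completeness, the Ferson and Siegel (2001) weights in the preceding display are straightforward to compute; the helper below is a hypothetical illustration of mine, not code from either paper.

```python
import numpy as np

def fs_weights(mu_x, Sigma_x):
    """Ferson and Siegel (2001) weights of the arbitrage portfolio with the
    maximum unconditional Sharpe ratio, for a given value x of the
    conditioning variables:  w(x) = [mu(x) mu(x)' + Sigma(x)]^{-1} mu(x).
    mu_x: (N,) conditional mean; Sigma_x: (N, N) conditional covariance."""
    return np.linalg.solve(np.outer(mu_x, mu_x) + Sigma_x, mu_x)
```

In one dimension the formula reduces to $w = \mu/(\mu^2+\sigma^2)$, so the weights shrink towards zero when the conditional mean is extreme, unlike the naive choice $w = \mu/\sigma^2$; this conservative behaviour is what keeps the unconditional moments of the resulting strategy well behaved.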
ACKNOWLEDGMENTS

This paper was prepared for the Econometrics Journal Session on Financial Econometrics that took place during the 2008 Royal Economic Society Annual Conference at the University of Warwick. Some of the material is partly based on a series of Master theses that I supervised at CEMFI between 1994 and 2004, which became joint work with Pedro L. Sánchez-Torres, María Rada, Francisco Peñaranda, Javier Mencía and Dante Amengual, to whom I am grateful. I am also grateful to the last three for their comments on this draft, as well as to Peter Bossaerts, Raymond Kan, Richard Luger, Theo Nijman, Eric Renault, Alessio Sancetta, Jay Shanken, Giovanni Urga and Guofu Zhou for their feedback. The comments of the editor and two anonymous referees have also substantially improved the presentation. Of course, the usual caveat applies. Financial support from the Spanish Ministry of Science and Innovation through grant ECO 2008-00280 is gratefully acknowledged.
REFERENCES

Aas, K., X. K. Dimakos and I. H. Haff (2005). Risk estimation using the multivariate normal inverse Gaussian distribution. Journal of Risk 8, 39–60.
Abhyankar, A., D. Basu and A. Stremme (2007). Portfolio efficiency and discount factor bounds with conditioning information: an empirical study. Journal of Banking and Finance 31, 419–37.
Affleck-Graves, J. and B. McDonald (1989). Nonnormalities and tests of asset pricing theories. Journal of Finance 44, 889–908.
Affleck-Graves, J. and B. McDonald (1990). Multivariate tests of asset pricing: the comparative power of alternative statistics. Journal of Financial and Quantitative Analysis 25, 163–85.
Amengual, D. and E. Sentana (2009). A comparison of mean-variance efficiency tests. Forthcoming in Journal of Econometrics. Available at http://dx.doi.org/10.1016/j.jeconom.2009.06.006.
Antoine, B. (2008). Portfolio selection with estimation risk: a test-based approach. Working paper, Simon Fraser University.
Athayde, G. M. de and R. G. Flôres (2004). Finding a maximum skewness portfolio—a general solution to three-moments portfolio choice. Journal of Economic Dynamics and Control 28, 1335–52.
Bahadur, R. (1960). Stochastic comparison of tests. Annals of Mathematical Statistics 31, 276–95.
Bai, J. and S. Ng (2005). Tests for skewness, kurtosis, and normality for time series data. Journal of Business and Economic Statistics 23, 49–60.
Barndorff-Nielsen, O. E. (1977). Exponentially decreasing distributions for the logarithm of particle size. Proceedings of the Royal Society 353, 401–19.
Barone-Adesi, G. (1985). Arbitrage equilibrium with skewed asset returns. Journal of Financial and Quantitative Analysis 20, 299–313.
Barone-Adesi, G., P. Gagliardini and G. Urga (2004). Testing asset pricing models with co-skewness. Journal of Business and Economic Statistics 22, 474–85.
Bartlett, M. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society Series A 160, 268–82.
Bawa, V. S., S. J. Brown and R. W. Klein (1979). Estimation Risk and Optimal Portfolio Choice. Amsterdam: North-Holland.
Beaulieu, M. C., J. M. Dufour and L. Khalaf (2007a). Testing mean-variance efficiency in CAPM with possibly non-Gaussian errors: an exact simulation-based approach. Journal of Business and Economic Statistics 25, 398–410.
Beaulieu, M. C., J. M. Dufour and L. Khalaf (2007b). Testing Black's CAPM with possibly non-Gaussian errors: an exact identification-robust simulation-based approach. Working paper, CIRANO and CIREQ, University of Montréal.
Beaulieu, M. C., J. M. Dufour and L. Khalaf (2008). Finite-sample multivariate tests of asset pricing models with coskewness. Working paper, McGill University.
Bekaert, G. and M. S. Urias (1996). Diversification, integration and emerging market closed-end funds. Journal of Finance 51, 835–69.
Berk, J. (1997). Necessary conditions for the CAPM. Journal of Economic Theory 73, 245–57.
Berndt, E. R. and N. E. Savin (1977). Conflict among criteria for testing hypotheses in the multivariate linear regression model. Econometrica 45, 1263–78.
Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: some empirical tests. In M. C. Jensen (Ed.), Studies in the Theory of Capital Markets, 79–121. New York: Praeger.
Bontemps, C. and N. Meddahi (2005). Testing normality: a GMM approach. Journal of Econometrics 124, 149–86.
Bossaerts, P. and P. Hillion (1995). Testing the mean-variance efficiency of well-diversified portfolios in large cross-sections. Annales d'Économie et de Statistique 40, 93–124.
Breusch, T. S. (1979). Conflict among criteria for testing hypotheses: extensions and comments. Econometrica 47, 203–07.
Briec, W., K. Kerstens and O. Jokung (2007). Mean-variance-skewness portfolio performance gauging: a general shortage function and dual approach. Management Science 53, 135–49.
Britten-Jones, M. (1999). The sampling error in estimates of mean-variance efficient portfolio weights. Journal of Finance 54, 655–71.
Cajigas, J. and G. Urga (2007). Dynamic conditional correlation models with asymmetric multivariate Laplace innovations. Working paper 08-2007, Cass Business School Centre for Econometric Analysis.
Campbell, J., W. Lo and A. C. MacKinlay (1996). The Econometrics of Financial Markets. Princeton: Princeton University Press.
Chabi-Yo, F., E. Ghysels and E. Renault (2008). On portfolio separation theorems with heterogeneous beliefs and attitudes towards risk. Working paper, University of North Carolina.
Chabi-Yo, F., D. Leisen and E. Renault (2007). Implications of asymmetry risk for portfolio analysis and asset pricing. Working Paper 2007-47, Bank of Canada.
Chamberlain, G. (1983a). A characterization of the distributions that imply mean-variance utility functions. Journal of Economic Theory 29, 185–201.
Chamberlain, G. (1983b). Funds, factors, and diversification in arbitrage pricing models. Econometrica 51, 1305–23.
Chamberlain, G. and M. Rothschild (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51, 1281–304.
Chen, Y., W. Härdle and S. Jeong (2008). Nonparametric risk management with generalized hyperbolic distributions. Journal of the American Statistical Association 103, 910–23.
Cochrane, J. (2001). Asset Pricing. Princeton: Princeton University Press.
Cremers, K. J. M. (2006). Multifactor efficiency and Bayesian inference. Journal of Business 79, 2951–98.
Crowder, M. J. (1976). Maximum likelihood estimation for dependent observations. Journal of the Royal Statistical Society, Series B 38, 45–53.
De Roon, F. A. and T. E. Nijman (2001). Testing for mean-variance spanning: a survey. Journal of Empirical Finance 8, 111–56.
De Roon, F. A., T. E. Nijman and J. R. ter Horst (2004). Evaluating style analysis. Journal of Empirical Finance 11, 29–53.
De Roon, F. A., T. E. Nijman and B. J. M. Werker (2001). Testing for mean-variance spanning with short sales constraints and transaction costs: the case of emerging markets. Journal of Finance 56, 723–44.
De Santis, G. (1995). Volatility bounds for stochastic discount factors: tests and implications from international financial markets. Working paper, University of Southern California.
Dittmar, R. F. (2002). Non-linear pricing kernels, kurtosis preference and evidence from the cross section of equity returns. Journal of Finance 57, 368–403.
Dufour, J.-M., L. Khalaf and M. C. Beaulieu (2003). Exact skewness-kurtosis tests for multivariate normality and goodness-of-fit in multivariate regressions with application to asset pricing models. Oxford Bulletin of Economics and Statistics 65, 891–906.
Engle, R. F., D. F. Hendry and J. F. Richard (1983). Exogeneity. Econometrica 51, 277–304.
Errunza, V., K. Hogan and M. Hung (1999). Can the gains from international diversification be achieved without trading abroad? Journal of Finance 54, 2075–107.
Fama, E. and J. MacBeth (1973). Risk, return and equilibrium: empirical tests. Journal of Political Economy 81, 607–36.
Fan, J., Y. Fan and J. Jiang (2007). Dynamic integration of time- and state-domain methods for volatility estimation. Journal of the American Statistical Association 102, 618–31.
Farebrother, R. (1990). The distribution of a quadratic form in normal variables (Algorithm AS 256.3). Applied Statistics 39, 294–309.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Volume 2 (3rd ed.). New York: Wiley.
Ferson, W. A. and A. F. Siegel (2001). The efficient use of conditioning information in portfolios. Journal of Finance 56, 967–82.
Ferson, W. A. and A. F. Siegel (2009). Testing portfolio efficiency with conditioning information. Review of Financial Studies 22, 2735–58.
Fiorentini, G. and E. Sentana (2007). On the efficiency and consistency of likelihood estimation in multivariate conditionally heteroskedastic dynamic regression models. Working Paper 0713, CEMFI.
Fiorentini, G., E. Sentana and G. Calzolari (2003). Maximum likelihood estimation and inference on multivariate conditionally heteroscedastic dynamic regression models with Student t innovations. Journal of Business and Economic Statistics 24, 532–46.
Geczy, C. C. (2001). Some generalized tests of mean-variance efficiency and multifactor model performance. Working paper, Wharton School.
Geweke, J. (1981). The approximate slopes of econometric tests. Econometrica 49, 1427–42.
Gibbons, M. (1982). Multivariate tests of financial models: a new approach. Journal of Financial Economics 10, 3–27.
Gibbons, M., S. Ross and J. Shanken (1989). A test of the efficiency of a given portfolio. Econometrica 57, 1121–52.
Gouriéroux, C. and A. Monfort (1995). Statistics and Econometric Models. Volumes 1 and 2. Cambridge: Cambridge University Press.
Gouriéroux, C. and A. Monfort (2005). The econometrics of efficient portfolios. Journal of Empirical Finance 12, 1–41.
Gouriéroux, C., A. Monfort and E. Renault (1991). A general framework for factor models. Working paper, INSEE.
Gouriéroux, C., A. Monfort and A. Trognon (1984). Pseudo maximum likelihood methods: theory. Econometrica 52, 681–700.
Guidolin, M. and A. Timmermann (2008). International asset allocation under regime switching, skew and kurtosis preferences. Review of Financial Studies 21, 889–935.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–54.
Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. Journal of Political Economy 99, 225–62.
Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, 587–613.
Harvey, C. R., J. C. Liechty, M. W. Liechty and P. Müller (2002). Portfolio selection with higher moments. Working paper, Duke University.
Harvey, C. R. and G. Zhou (1990). Bayesian inference in asset pricing tests. Journal of Financial Economics 26, 221–54.
Harvey, C. R. and G. Zhou (1991). International asset pricing with alternative distributional specifications. Journal of Empirical Finance 1, 107–31.
Hodgson, D. J. (2000). Unconditional pseudo-maximum likelihood and adaptive estimation in the presence of conditional heterogeneity of unknown form. Econometric Reviews 19, 175–206.
Hodgson, D., O. Linton and K. Vorkink (2002). Testing the capital asset pricing model efficiently under elliptical symmetry: a semiparametric approach. Journal of Applied Econometrics 17, 617–39.
Huberman, G. and S. Kandel (1987). Mean-variance spanning. Journal of Finance 42, 873–88.
Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika 48, 419–26.
Ingersoll, J. (1987). Theory of Financial Decision Making. Lanham, MD: Rowman & Littlefield.
Jarque, C. M. and A. Bera (1980). Efficient tests for normality, heteroskedasticity, and serial independence of regression residuals. Economics Letters 6, 255–59.
Jobson, J. D. and B. Korkie (1982). Potential performance and tests of portfolio efficiency. Journal of Financial Economics 10, 433–66.
Jobson, J. D. and B. Korkie (1983). Statistical inference in two-parameter portfolio theory with multiple regression software. Journal of Financial and Quantitative Analysis 18, 189–97.
Jobson, J. D. and B. Korkie (1985). Some tests of linear asset pricing with multivariate normality. Canadian Journal of Administrative Science 2, 114–38.
Jondeau, E. and M. Rockinger (2006). Optimal portfolio allocation under higher moments. European Financial Management 12, 29–55.
Jurczenko, E., B. Maillet and P. Merlin (2006). Hedge funds portfolio selection with higher order moments: a non-parametric mean-variance-skewness-kurtosis efficient frontier. In E. Jurczenko and B. Maillet (Eds.), Multi-Moment Asset Allocation and Pricing Models, 51–66. New York: Wiley.
Kan, R. and C. Zhang (1999a). Two-pass tests of asset pricing models with useless factors. Journal of Finance 54, 204–35.
Kan, R. and C. Zhang (1999b). GMM tests of stochastic discount factor models with useless factors. Journal of Financial Economics 54, 103–27.
Kan, R. and G. Zhou (2001). Tests of mean-variance spanning. Working paper, J. M. Olin School of Business, Washington University in St. Louis.
Kan, R. and G. Zhou (2006). Modeling non-normality using multivariate t: implications for asset pricing. Working paper, J. M. Olin School of Business, Washington University in St. Louis.
Kandel, S. (1984). The likelihood ratio test statistic of mean-variance efficiency without a riskless asset. Journal of Financial Economics 13, 575–92.
Kandel, S. and R. Stambaugh (1987). On correlations and inferences about mean-variance efficiency. Journal of Financial Economics 18, 61–90.
Kandel, S. and R. Stambaugh (1989). A mean-variance framework for tests of asset pricing models. Review of Financial Studies 2, 125–56.
Kandel, S., R. McCulloch and R. Stambaugh (1995). Bayesian inference and portfolio efficiency. Review of Financial Studies 8, 1–53.
Kayahan, B. and T. Stengos (2007). Testing the capital asset pricing model with local maximum likelihood methods. Mathematical and Computer Modelling 46, 138–50.
Kingman, J. F. C. (1978). Uses of exchangeability. Annals of Probability 6, 183–97.
Kleibergen, F. (2007). Test of risk premia in linear factor models. Working paper, Brown University.
Kon, S. J. (1984). Models of stock returns—a comparison. Journal of Finance 39, 147–65.
Kotz, S. (1975). Multivariate distributions at a cross-road. In G. P. Patil, S. Kotz and J. K. Ord (Eds.), Statistical Distributions in Scientific Work, Volume I, 247–70. Dordrecht: Reidel.
Kraus, A. and R. H. Litzenberger (1976). Skewness preference and the valuation of risky assets. Journal of Finance 31, 1085–100.
Lehmann, E. L. (1986). Testing Statistical Hypotheses (2nd ed.). New York: Wiley.
Lewellen, J. and S. Nagel (2006). The conditional CAPM does not explain asset-pricing anomalies. Journal of Financial Economics 82, 289–314.
Lewellen, J., S. Nagel and J. Shanken (2006). A skeptical appraisal of asset-pricing tests. NBER Working Paper 12360. Forthcoming in Journal of Financial Economics.
Lim, K. G. (1989). A new test of the three-moment capital asset pricing model. Journal of Financial and Quantitative Analysis 24, 205–16.
Lintner, J. (1965). The valuation of risky assets and the selection of risky investments in portfolio selection and capital budgets. Review of Economics and Statistics 47, 13–37.
Lo, A. W. and A. C. MacKinlay (1990). Data-snooping biases in tests of financial asset pricing models. Review of Financial Studies 3, 431–67.
MacKinlay, A. C. (1987). On multivariate tests of the Capital Asset Pricing Model. Journal of Financial Economics 18, 341–72.
MacKinlay, A. C. (1995). Multifactor models do not explain deviations from the Capital Asset Pricing Model. Journal of Financial Economics 38, 3–28.
MacKinlay, A. C. and M. Richardson (1991). Using generalized method of moments to test mean-variance efficiency. Journal of Finance 46, 511–27.
Madan, D. B. and F. Milne (1991). Option pricing with VG martingale components. Mathematical Finance 1, 39–55.
Magnus, J. R. and H. Neudecker (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: Wiley.
Maller, R. A. and D. A. Turkington (2002). New light on the portfolio allocation problem. Mathematical Methods of Operations Research 56, 501–11.
Mardia, K. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–30.
Markowitz, H. (1952). Portfolio selection. Journal of Finance 7, 77–91.
Meloso, D. and P. Bossaerts (2006). Portfolio correlation and the power of portfolio efficiency tests. Working paper, Bocconi University.
Mencía, J. and E. Sentana (2009a). Multivariate location-scale mixtures of normals and mean-variance-skewness portfolio allocation. Forthcoming in Journal of Econometrics. Available at http://dx.doi.org/10.1016/j.jeconom.2009.05.001.
Mencía, J. and E. Sentana (2009b). Distributional tests in multivariate dynamic models with normal and Student t innovations. Working paper, CEMFI.
Morales, L. (2009). Mean-variance efficiency tests with conditioning information: a comparison. Unpublished Master Thesis 0902, CEMFI.
Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica 34, 768–83.
Newey, W. K. and D. L. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle (Ed.), Handbook of Econometrics, Volume IV, 2111–245. Amsterdam: Elsevier.
Newey, W. K. and K. D. West (1987). Hypothesis testing with efficient method of moments estimation. International Economic Review 28, 777–87.
Ogaki, M. (1993). Generalized method of moments: econometric applications. In G. S. Maddala, C. R. Rao and H. D. Vinod (Eds.), Handbook of Statistics, Volume 11, 455–88. Amsterdam: Elsevier.
Owen, J. and R. Rabinovitch (1983). On the class of elliptical distributions and their applications to the theory of portfolio choice. Journal of Finance 38, 745–52.
Patton, A. J. (2004). On the out-of-sample importance of skewness and asymmetric dependence for asset allocation. Journal of Financial Econometrics 2, 130–68.
Peñaranda, F. (2008). Understanding portfolio efficiency with conditioning information. Financial Markets Group Discussion Paper 626, London School of Economics.
Peñaranda, F. and E. Sentana (2004). Tangency tests in return and stochastic discount factor mean-variance frontiers: a unifying approach. Working paper, CEMFI.
Peñaranda, F. and E. Sentana (2008a). Spanning tests in return and stochastic discount factor mean-variance frontiers: a unifying approach. Working paper, CEMFI.
Peñaranda, F. and E. Sentana (2008b). Duality in mean-variance frontiers with conditioning information. Working paper, CEMFI.
Rada, M. and E. Sentana (1997). The power of mean-variance efficiency tests: portfolio aggregation considerations. Working paper, CEMFI.
Renault, E. (1997). Économétrie de la finance: la méthode des moments généralisés. In Y. Simon (Ed.), Encyclopédie des Marchés Financiers, 330–407. Paris: Economica.
Roll, R. A. (1977). A critique of the asset pricing theory's tests. Part I: on past and potential testability of the theory. Journal of Financial Economics 4, 129–76.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–60.
Ross, S. A. (1978). Mutual fund separation in financial theory—the separating distributions. Journal of Economic Theory 17, 254–86.
Samuelson, P. (1970). The fundamental approximation theorem of portfolio analysis in terms of mean, variances and higher moments. Review of Economic Studies 36, 537–41.
Sánchez-Torres, P. L. and E. Sentana (1998). Mean-variance-skewness analysis: an application to risk premia in the Spanish stock market. Investigaciones Económicas 22, 5–17.
Sentana, E. (2005). Least squares predictions and mean-variance analysis. Journal of Financial Econometrics 3, 56–78.
Shanken, J. (1985). Multivariate tests of the zero-beta CAPM. Journal of Financial Economics 14, 327–48.
Shanken, J. (1986). Testing portfolio efficiency when the zero-beta rate is unknown: a note. Journal of Finance 41, 269–76.
Shanken, J. (1987a). Multivariate proxies and asset pricing relations: living with Roll's critique. Journal of Financial Economics 18, 91–110.
Shanken, J. (1987b). A Bayesian approach to testing portfolio efficiency. Journal of Financial Economics 19, 195–215.
Shanken, J. (1992). On the estimation of beta pricing models. Review of Financial Studies 5, 1–33.
Shanken, J. (1996). Statistical methods in tests of portfolio efficiency: a synthesis. In G. S. Maddala and C. R. Rao (Eds.), Handbook of Statistics, Volume 14, 693–711. Amsterdam: Elsevier.
Shanken, J. and G. Zhou (2007). Estimating and testing beta pricing models: alternative methods and their performance in simulations. Journal of Financial Economics 84, 40–86.
Sharpe, W. F. (1964). Capital asset prices: a theory of capital market equilibrium under conditions of risk. Journal of Finance 19, 425–42.
Sharpe, W. F. (1966). Mutual fund performance. Journal of Business 39, 119–38.
Sharpe, W. F. (1992). Asset allocation: management style and performance measurement. Journal of Portfolio Management, Winter, 7–19.
Sharpe, W. F. (1994). The Sharpe ratio. Journal of Portfolio Management 21, 49–58.
Simaan, Y. (1993). Portfolio selection and asset pricing—three parameter framework. Management Science 39, 568–77.
ter Horst, J. R., F. A. de Roon and B. J. M. Werker (2006). An alternative approach to estimation risk. In L. Renneboog (Ed.), Advances in Corporate Finance and Asset Pricing, 449–72. Amsterdam: Elsevier.
Velu, R. and G. Zhou (1999). Testing multi-beta pricing models. Journal of Empirical Finance 6, 219–41.
Wang, K. Q. (2002). Nonparametric tests of conditional mean-variance efficiency of a benchmark portfolio. Journal of Empirical Finance 9, 133–69.
Wang, K. Q. (2003). Asset pricing with conditioning information: a new test. Journal of Finance 58, 161–96.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–38.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
Zhou, G. (1991). Small sample tests of portfolio efficiency. Journal of Financial Economics 30, 165–91.
Zhou, G. (1993). Asset-pricing tests under alternative distributions. Journal of Finance 48, 1927–42.
Econometrics Journal (2009), volume 12, pp. 397–413. doi: 10.1111/j.1368-423X.2009.00296.x
Identification of peer effects using group size variation

LAURENT DAVEZIES†, XAVIER D'HAULTFOEUILLE† AND DENIS FOUGÈRE†,‡,§,¶

†CREST-INSEE, 18 boulevard Gabriel Péri, 92245 Malakoff Cedex, France
E-mails: [email protected], [email protected], [email protected]
‡CNRS, Délégation Paris A, 27 Rue Paul Bert, 94204 Ivry-sur-Seine Cedex, France
§CEPR, 53-56 Great Sutton Street, London EC1V 0DG, UK
¶IZA, Schaumburg-Lippe-Straße 5-9, D-53113 Bonn, Germany
First version received: November 2007; final version accepted: July 2009
Summary This paper studies the econometric properties of a linear-in-means model of social interactions. Under a slightly more restrictive framework than Lee (2007), we show that this model is generally identified when at least three different sizes of peer groups are observed in the sample at hand. While unnecessary in general, homoscedasticity may be required in special cases, for instance when endogenous and exogenous peer effects cancel each other out. We extend this analysis to the case where only binary outcomes are observed. Once more, most parameters are semiparametrically identified under weak conditions. However, identifying all of them requires more stringent assumptions, including a homoscedasticity condition. We also develop a parametric estimator for the binary case, which relies on the Geweke–Hajivassiliou–Keane (GHK) simulator. Monte Carlo simulations illustrate the influence of group sizes on the accuracy of the estimation, in line with the results obtained by Lee (2007).

Keywords: Linear-in-means model, Semiparametric identification, Social interactions.
1. INTRODUCTION In a seminal paper, Manski (1993) showed that in a linear-in-expectations model with social interactions, endogenous and exogenous peer effects cannot be separately identified. Only a function of these two types of effects can be identified under some strong exogeneity conditions. In the context of pupil achievement for instance, Hoxby (2000) and Ammermueller and Pischke (2006) reach identification by assuming that variations in time or across classrooms within the same school are random. 1 However, Lee (2007) has recently proposed a modified version of the social interaction model, which corresponds to a linear-in-means model, and which is shown to be identifiable without any of the previous restrictive assumptions, thanks to group size variation.
1 Subsequently, we will often consider the example of peer effects in schools, although the model could also be applied to other topics, like smoking (see e.g. Krauth, 2006), productivity in teams (see Rees, 2003) or retirement (Duflo and Saez, 2003). C The Author(s). Journal compilation C Royal Economic Society 2009. Published by Blackwell Publishing Ltd, 9600 Garsington Road,
Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA, 02148, USA.
398
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
The aim of our paper is three-fold. First, we re-examine the identification of this linear-inmeans model when group sizes do not depend on the sample size. 2 We believe that, in practice, such an assumption is virtually always satisfied. For instance, there is no reason why the mean classroom size should depend on the size of the sample. Moreover, this extra assumption enables us to clarify the sources of identification in this model. 3 More precisely, we show that in the linear-in-means model, the crucial assumptions for identification are (1) the knowledge of the group sizes and (2) the fact that group sizes take at least three different values. Parametric assumptions on the error term are not required. In general, homoscedasticity is not required either. This last assumption is useful however when both types of peer effects cancel each other, since in this case identification is lost without such a restriction. Secondly, we extend these results to a model where only binary outcomes are observed. Identification of discrete outcome models with social interactions has already been studied by e.g. Brock (2001, 2007) and Krauth (2006). Our model is slightly different, though, as we assume that social interactions may affect individuals through peers’ latent variables rather than through their observable outcomes. This is convenient when only binary outcomes are observable because of data limitation. This model is close to spatial discrete choice models (see e.g. Case, 1992, McMillen, 1992, Pinkse, 1998, Beron, 2004, or Klier, 2008). The difference is that we allow here for exogenous peer effects and for fixed group effects simultaneously. The attractive feature of our result is that it does not rely on any functional assumption concerning the errors. Once more, the exogenous peer effects can be identified through group size variation. On the other hand, due to the loss of information, endogenous peer effects cannot be identified without further restrictions. We show that a homoscedasticity condition is sufficient for this purpose. Thirdly, we develop a parametric estimation of the binary model, complementing the methods proposed by Lee (2007) for the model with a continuous outcome. We show that under a normality assumption on the residuals and a linear specification a` la Mundlak (1961) on the fixed effect, a simulated maximum likelihood estimator can be implemented by using the GHK algorithm (Geweke, 1989, Keane, 1994, and Hajivassiliou et al., 1996). Thus, this estimator is close to Beron and Vijverberg’s (2004) one on spatial probit models. We investigate its finite sample properties through Monte Carlo simulations. The results stress the determining effect of average group size for the accuracy of the inference, in line with Lee’s (2007) result concerning the linear model. The paper is organized as follows. In the next section, we present the theoretical model of social interactions. In Section 3, we study the identification of the model, both for the continuous and the discrete cases. The fourth section discusses the parametric estimation method of the discrete model. Section 5 displays Monte Carlo simulations. Section 6 concludes. Proofs are given in the Appendix.
2. A THEORETICAL MODEL OF SOCIAL INTERACTIONS We consider the issue of individual choices in the presence of social interactions within groups. Let ei denote the continuous choice variable of an individual i who belongs to a group of size
2
This is approximately the scenario with small group interactions proposed by Lee (2007). Under his more general setting, Lee (2007) provides sufficient conditions for identification, but they are rather difficult to interpret (see his assumptions 6.1 and 6.2). 3
C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation
399
m, xi be her exogenous covariates, and εi her (random) individual-specific characteristic. We suppose that her utility when choosing ei , while the other persons in the group choose (ej )j =i , takes the following form: ⎛ ⎡ ⎞ m 1 ej ⎠ λ0 Ui (ei , (ej )j =i ) = ei ⎣xi β10 + ⎝ m − 1 j =1, j =i ⎛ ⎞ ⎤ m 1 1 + ⎝ xj ⎠ β20 + α + εi ⎦ − ei2 . m − 1 j =1, j =i 2 In this framework, the marginal returns of individual i depend on her own characteristics xi , her peers’ choices (ej )j =i , their observable (exogenous) characteristics (xj )j =i and a group fixed effect α. In a classroom, for instance, the utility of a student depends on her effort ei and on the efforts of others because of spillovers in the learning process. Like Cooley (2007) and Calv´oArmengol et al. (2008), among others, we also allow the utility of each individual to depend on the observable characteristics of her peers. Indeed, there is some empirical evidence about the influence of peers’ race, gender or parental education on student achievement (see e.g. Hoxby, 2000, or Cooley, 2007). A plausible explanation is that the marginal effect of ei on achievement (which is positively correlated with the student’s utility) depends on these characteristics. 4 Lastly, the outcome may depend on a classroom-specific effect, because of the teacher’s quality, for instance. This model is close to the one considered by Calv´o-Armengol et al. (2008) who study the effect of peers on individual achievement at school. An important difference is that they consider the network of friends, whereas our model is better suited when all classmates potentially affect the student’s achievement. Assuming that α and the (xi , εi )1≤i≤m are observed by all the individuals in the group, the Nash equilibrium of the game (y˜1 , . . . , y˜m ) satisfies ⎛ ⎛ ⎞ ⎞ m m 1 1 y˜j ⎠ λ0 + ⎝ xj ⎠ β20 + α + εi . (2.1) y˜i = xi β10 + ⎝ m − 1 j =1,j =i m − 1 j =1,j =i This model is identical to Lee’s (2007) model of social interactions. Following the terminology introduced by Manski (1993), the second term in the right-hand side corresponds to the endogenous peer effect, the third refers to the exogenous peer effect and α is a contextual (group-specific) effect. This model departs from the one considered by Manski (1993) or by Graham and Hahn (2005) by replacing, on the right-hand side, the expectations relative to the whole group by the means of outcomes and covariates in the group of peers. 5 Interestingly, one can show that Manski’s model is actually the Bayesian Nash equilibrium of the game when player i does not observe the characteristics (xj , εj )j =i of her peers, the (εi )1≤i≤m being mutually independent and independent of (x1 , . . . , xm , α, m). This framework seems more realistic in large groups, whereas the hypothesis that the characteristics of other persons in the group are observed is likely to hold in small ones.
4 For instance, girls may less disrupt classrooms than boys, all things being equal (see Lazear, 2001). In this case, the marginal effect of effort increases for everyone in the classroom. 5 Graham and Hahn (2005) make the further restriction that β = 0, i.e. that there are no exogenous peer effects. 20 C The Author(s). Journal compilation C Royal Economic Society 2009.
400
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
3. IDENTIFICATION We now turn to the identification of model (2.1). First, as a benchmark, we suppose that the outcomes y˜i are directly observed. This case corresponds to Lee’s (2007) framework, but we investigate it under a slightly different approach in Section 3.1. In Section 3.2, we study the situation where only rough measures of the outcomes, namely yi = 1{y˜i ≥ 0}, are available. In both cases, we implicitly assume that the econometrician knows the group of interactions for each individual. In the previous example, this assumption is mild if students really interact within the classroom, since the classroom identifier is usually known. It can be restrictive otherwise, but to our best knowledge, this assumption is also maintained in all papers studying identification of peer effects, including those by Manski (1993), Brock and Durlauf (2001), Lee (2007), Graham (2008) and Bramoull´e et al. (2009). This stems from the fact that, in Manski’s model at least, very little can be inferred from the data and from the model if the peer group is not known (see Manski, 1993, subsection 2.5).
3.1. The benchmark: the linear model In this section, we clarify the results obtained by Lee (2007), in the case where the size m of the group does not depend on the size of the sample. 6 We believe that such an assumption is virtually always satisfied in practice. For instance, there is no reason why the mean classroom size should depend on the size of the sample. Moreover, this restriction enables us to show what is identified from the usual exogeneity condition (see Assumption 3.4 below) and when homoscedasticity is required (see Theorem 3.2 below). It is quite common to observe some but not all members in each group, and we take this into account for identification. On the other hand, we maintain the assumption that the size of the group is observed. 7 Let n denote the number of sampled individuals in the group (n ≤ m). We denote by Y˜ (respectively, X) the vector of outcomes y˜i (respectively, of covariates) of the individuals sampled in the group. Let Fm,n denote the distribution function of (m, n) and FY˜ ,X | m,n denote the conditional distribution of (Y˜ , X) given (m, n). Lastly, we denote by Supp(T ) the support of a random variable T. We rely on the following definition of identification. D EFINITION 3.1. (β10 , β20 , λ0 ) is identified if there exists a function ϕ such that
(β10 , β20 , λ0 ) = ϕ FY˜ ,X | m=m∗ ,n=n∗ (m∗ ,n∗ )∈Supp(m,n) , Fm,n . This definition states that the structural parameters are identified if they can be obtained through the distribution of the data. Implicit in the definition is the fact that our asymptotic is in the number of groups, as is the case in standard panel data models. 8 Now, the key point for identification of the parameters when the y˜i are observed is to focus on the within-group equation,
6
This is approximately the scenario with small group interactions considered by Lee (2007). This assumption is realistic in our leading example. In French panels of students, for instance, classroom sizes are observed while only a fraction of pupils within classrooms is sampled. 8 Indeed, when the number of groups tend to infinity, we are able to estimate consistently (FY˜ ,X | m=m∗ ,n=n∗ )(m∗ ,n∗ )∈Supp(m,n) as well as Fm,n . 7
C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation
which may be written as: Wn Y˜ = Wn X
(m − 1)β10 − β20 m − 1 + λ0
+ Wn
U , 1 + λ0 /(m − 1)
401
(3.1)
where U is the vector of unobserved residuals ε for individuals sampled in the group, and Wn denotes the within-group matrix of size n, that is to say the matrix with (1 − 1/n) on the diagonal and (−1/n) elsewhere. To identify the structural parameters, we use the variation in the slope coefficient β(m) = ((m − 1)β10 − β20 )/(m − 1 + λ0 ). For this purpose, we make the following assumptions: A SSUMPTION 3.1. Pr(n ≥ 2) > 0. A SSUMPTION 3.2. Supp(m) contains at least three values. A SSUMPTION 3.3. For all 1 ≤ i, j ≤ m, E[xi εj | m, n] = 0. A SSUMPTION 3.4. E[X Wn X | m, n] is almost surely non-singular. A SSUMPTION 3.5. 1 > λ0 > 1 − min(Supp(m)). Assumption 3.1 simply states that the within-group approach is feasible. Assumption 3.2, which is the cornerstone of our approach, ensures that there is sufficient variation in group sizes. Assumptions 3.1, 3.3 and 3.4 are standard in linear panel data models, except that conditional expectations depend here both on the number of observed individuals in each group and on the group size. Conditioning by n does not cause any trouble if, for instance, the observed individuals are drawn randomly in each group. Finally, Assumption 3.5 ensures that β(m∗ ) exists for all m∗ ∈ Supp(m). 9 T HEOREM 3.1. Under Assumptions 3.1–3.5, β10 is identified. Moreover, (a) if β20 = −λ0 β10 , then λ0 and β20 are identified; (b) if β20 = −λ0 β10 , then λ0 is not identified and β20 is identified only up to scale. Theorem 3.1 states that all parameters are generally identified, provided that there is sufficient variation in the group sizes. As a notable exception, identification is lost in the absence of endogenous and exogenous peer effects, since then β20 = −λ0 β10 = 0. One can always = −λ0 β10 . Using the first conditional moment rationalize such a model with any λ0 = 0 and β20 ˜ of Y alone, one cannot distinguish the case with both exogenous and endogenous peer effects (which cancel out in this case) from the case with no peer effects. Below, we provide a method which yields identification in this case, but it relies on a stronger assumption of homoscedasticity. In any case, one can check whether identification is lost or not, since this amounts to test whether β(·) is constant or not. Contrary to the reduced form approach, we do not need to know the mean (x¯r )1≤r≤R in each group to identify the parameters. Thus, the problem of measurement error on x¯r , which appears when some individuals in the group are unobserved, does not arise in our framework. Here, the crucial assumption is the knowledge of the group size. If it is unknown but can be estimated, the 9 Theorem 3.1 would remain valid if Assumption 3.5 were replaced by the weaker condition λ ∈ 0 / −Supp(m − 1). However, Assumption 3.5 is required under this form in Theorems 3.2, 3.3, 3.4 and in Lemma 3.1.
C The Author(s). Journal compilation C Royal Economic Society 2009.
402
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
measurement error problem comes back in a non-linear way. The issue of identification in this case is left for future research. 10 The nature of the group size effect provides another identifying assumption. Indeed, m may be correlated with α in a general way, but we cannot add interaction terms between the indicators 1{m = m∗ } (with m∗ ∈ Supp(m)) and the covariates to the list of regressors, since then Assumption 3.4 would fail. To see this, let us remark that, if β10 and β20 depend on m in an unspecified way, then we can still identify β(m) but not the structural parameters. On the other hand, identification of these structural parameters can be achieved if the dependence of β10 and β20 with respect to m takes a parametric form. 11 Of course, in this case, identification requires that m takes more than three different values. This also implies that the basic model where β10 , β20 and λ0 are constant across group sizes is overidentified as soon as we observe at least four different group sizes. A simple way to test this restriction is to estimate β(·) by using a within-group estimator for each group size, and then to implement the overidentification test for minimum distance estimators (see e.g. Wooldridge, 2002, p. 444). If β20 = −λ0 β10 , then λ0 and β20 cannot be identified without further restriction. To recover them, one can use the residual variance variation, under a homoscedasticity condition (see our Assumption 3.6 below). More precisely, the conditional variance of the residuals should not depend on the group size. This hypothesis is quite weak since it does not restrict the relationship between the residuals εri and the covariates xri . Moreover, under Assumption 3.6, one needs less variation across group sizes than previously, and we can replace Assumption 3.2 by Assumption 3.2 . A SSUMPTION 3.2 . Supp(m) contains at least two values. A SSUMPTION 3.6. V (U | n, m) = σ 2 In where In is the identity matrix of size n. T HEOREM 3.2. Under Assumptions 3.1, 3.2 and 3.3–3.6, (β10 , λ0 , β20 ) are identified. The idea of using second-order moments to identify peer effects has already been exploited by Glaeser et al. (1996) and Graham (2008). In particular, Graham (2008) develops a framework where composite peer effects can be identified through such a restriction. In his model, however, endogenous peer effects are not identified. 3.2. The binary model We now investigate whether the parameters are still identified when one cannot observe directly the outcome variable y˜i but only a rough binary measure of it, namely yi = 1{y˜i ≥ 0}. 12 For instance, when studying peer effects in the classroom, the analyst could observe only grade retention decisions rather than students’ efforts. Similarly, in criminal studies, the violence level chosen by an individual may depend on the violence level chosen by her peers. The level chosen in equilibrium is a continuous variable. However, the econometrician may only be able to observe 10 Following Schennach (2004), the model would still be identified if two independent measures of m were available. The remaining issue is whether the model is identified with only one measure, as it is in a linear model (see e.g. Lewbel, 1997). 11 For instance, we can write these parameters as affine transformations of m. This is equivalent to adding interaction terms between X and m. 
12 The definition of identification that we use here is similar to the one introduced in Definition 3.1, except that Y˜ has to be replaced by Y, the vector of outcomes yi observed for the individuals sampled in the group. C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation
403
a rough measure of this violence level, through criminal acts. This fits within our framework as long as doing criminal acts corresponds to being above a given threshold of violence. The binary model we consider is not a discrete choice model but rather a continuous choice model with imperfect observations of the choice. In discrete choice models, the econometrician observes the choice y˜i ∈ {1, . . . , p} of i. This choice depends on (y˜j )j =i , as in equation (2.1), but in a non-linear way. Such models have been studied by Brock and Durlauf (2001, 2007), Tamer (2003), Krauth (2006) and Bayer and Timmins (2007). The main challenge when making inference on these models is that, in general, multiple equilibria arise. This is not a concern here, as y˜i is uniquely defined by equation (2.1). When the outcome is a binary variable, the reduced-form equation (3.1) is useless for identification since Wn Y˜ has no observational counterpart. Instead, we rely on equation (3.2) below. L EMMA 3.1. Suppose that yi = 1{y˜i ≥ 0}, where y˜i satisfies equation (2.1), and that Assumption 3.5 holds. Then the model is observationally equivalent to the model generated by the following equation: m β20 β10 + β20 + x¯ β20 + yi = 1 xi β10 − λ0 + α(1 + λ0 (m)) m−1 m−1 1 − λ0 + ε¯ λ0 (m) + εi ≥ 0 , (3.2) where λ0 (m) = mλ0 /((m − 1)(1 − λ0 )). The term in brackets is a fixed group effect. Thus, we are led back to a binary model for panel data. Identification of such a model has been considered, among others, by Manski (1987), and our analysis relies on his paper. In the following, we denote by xjk the kth covariate of individual j. The following assumptions are needed for identification. A SSUMPTION 3.7. (ε1 , . . . , εm ) are exchangeable conditional on (m, x1 , . . . , xm , α). The support of ε1 + λ0 (m)¯ε conditional on (m, x1 , . . . , xm , α) is R, almost surely. A SSUMPTION 3.8. Let z = x2 − x1 . 13 The support of z is not contained in any proper linear subspace of RK , where K denotes the dimension of xi . A SSUMPTION 3.9. There exists k0 such that zk0 has everywhere a positive Lebesgue conditional k0 = 1. Without loss of generality, density given (m, z1 , . . . , zk0 −1 , zk0 +1 , . . . , zK ) and such that β10 we set k0 = 1. The first part of Assumption 3.7 holds for instance if, conditional on m and α, the residuals (εi )1≤i≤m are exchangeable and independent of the covariates (xi )1≤i≤m . In particular, Assumption 3.7 is satisfied if the (εi )1≤i≤m are i.i.d. and independent of (x1 , . . . , xm , m, α). The second part of Assumption 3.7 is a technical condition, which is identical to the second part of assumption 1 set forth by Manski (1987). Assumption 3.8 ensures that z varies enough within a group. As usually in binary models, one parameter must be normalized, and this is the purpose of Assumption 3.9. However, a small difficulty arises here, because the reduced form does not allow
13
Without loss of generality, we assume here that individuals 1 and 2 are observed.
C The Author(s). Journal compilation C Royal Economic Society 2009.
404
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
us to identify the sign of the structural parameters. A sufficient condition is to fix one parameter 1 = 1. 14 (and not only its absolute value): thus we set β10 T HEOREM 3.3. Suppose that Assumptions 3.1–3.2, 3.5 and 3.7–3.9 hold. Then β10 is identified. Moreover, 1 (a) if β20 = β20 β10 , then β20 is identified, k 1 1 1 β10 , β20 is not identified and the other parameters β20 are identified up to β20 . (b) if β20 = β20
On the other hand, λ0 is not identified. If fewer parameters (i.e. fewer than those included in model (2.1)) are identified, Theorem 3.3 shows that the main attractive features of the method remain. Without any exclusion restriction and even if only two members of the groups are observed, β10 and β20 are generally identified. Similarly to the result set forth in Theorem 3.1, identification of β20 is lost when there is no 1 β10 = 0. The non-identifiability of λ0 is exogenous peer effect, because in this case β20 = β20 not surprising since this parameter only appears in the fixed effect and in the residuals (see equation (3.2)). Heuristically, without any assumption imposed on these terms, any λ0 can be rationalized by changing accordingly α and the residuals (εi )1≤i≤m . Thus, stronger assumptions are needed for identifying λ0 . One possibility is to observe x¯ and to restrict the dependence between the residuals and the covariates through the following assumptions: A SSUMPTION 3.2 . The support of m given x¯ has at least three elements with positive probability. A SSUMPTION 3.10. x¯ is observed. ¯ A SSUMPTION 3.11. (ε1 , . . . , εm , α) ⊥⊥ (x1 , . . . , xm ) | m, x. ¯ m 0 V (ε1 | x)I ¯ m) = A SSUMPTION 3.12. V (ε1 , . . . , εm , α | x, . ¯ 0 V (α | x) β20 β20 ¯ m), the support of x1 β10 − A SSUMPTION 3.13. Given (x, , x2 β10 − is m−1 m−1 R2 . Assumption 3.2 is slightly more restrictive than Assumption 3.2, but should hold most of the time. For instance, it is satisfied for a multinomial logit (or probit) model generating the ¯ As mentioned above, Assumption 3.10 is a restrictive conditional distribution of m given x. condition as it imposes either to observe all individuals in the group or to consider only covariates whose means are known. Assumption 3.11 is in the same spirit as Assumption 3.7. It restricts the dependence between α and the covariates to a dependence on the mean. Assumption 3.12 is the assumption of homoscedasticity in m; it is very similar to Assumption 3.6. The difference between both assumptions stems from the identifying equation we use in both cases. In the discrete model, α remains in equation (3.2), and thus its variance must be modelled as well as its covariance with the residuals (εi )1≤i≤m . 15 Finally, Assumption 3.13 is a condition of large 1 = −1. Obviously, Theorem 3.3 also holds with β10 The assumption of no covariance is not restrictive. Indeed, if there is a correlation between εi and α which does not depend on i, one can always reparametrize the model in order to make them uncorrelated. 14 15
C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation
405
support. In particular, it implies that m ≥ 3. Otherwise, indeed, the two variables belong to a line in R2 . 1 β10 , λ0 is also T HEOREM 3.4. Under Assumptions 3.1, 3.2 , 3.5 and 3.7–3.13, and if β20 = β20 identified.
4. ESTIMATION In this section, we restrict the analysis to the case where only 1{y˜i ≥ 0} is observed, since the continuous case is analysed in full detail by Lee (2007). We also restrict ourselves to a parametric setting with homoscedasticity that is characterized by the following assumptions: A SSUMPTION 4.1. The residuals (εi )1≤i≤m are i.i.d. and εi ∼ N (0, 1). ¯ m ∼ N (γ0 (m) + δ0 (m)x, ¯ σ02 ). A SSUMPTION 4.2. α | x, Assumption 4.1 imposes the normality of the residuals. This assumption is also imposed by Lee (2007) when he develops his conditional maximum likelihood estimator, or by McMillen (1992) and Beron and Vijverberg (2004), among others, when studying spatially dependent discrete choice models. Contrary to the previous section, we adopt here the usual normalization by supposing that the variance of the residuals is equal to one. Assumption 4.2 has two consequences. First, it strengthens Assumptions 3.11 and 3.12 by introducing a linear ¯ conditional on m. Note that the dependence dependence a` la Mundlak (1961) between α and x, between α and m remains very flexible. Secondly, Assumption 4.2 imposes the normality of the residual term, in a similar way to the standard random effect probit. Under these conditions, the model is fully identified, as in Theorem 3.4 but in a more direct way. Indeed, β10 and β20 can be identified through group size variations. Moreover, the model can be written in this case as β20 + x¯ δ0 (m) − vi ≥ 0 , (4.1) yi = 1 γ0 (m) + xi β10 − m−1 where γ0 (m) and δ0 (m) depend on γ0 (m), δ0 (m) and on the parameters of the model, the error ¯ Conditional on term vi being a combination of (εi )1≤i≤m with the residual α − γ0 (m) − δ0 (m)x. m, the vector (vi )1≤i≤m is normally distributed and exchangeable, with 1 2 2 , V (vi | m) = 1 + σ0 + λ0 (m)(2 + λ0 (m)) σ0 + m 1 Cov(vi , vj | m) = σ02 + λ0 (m)(2 + λ0 (m)) σ02 + , ∀ i = j . m One can show that when m varies, it is possible to separate λ0 from σ02 in the covariances (or in the variance). Now, let us suppose that we observe a sample of R groups where, for the sake of simplicity, ¯ Hence, for group r, all members in each group are observed (even if we only need to observe x). we observe its size mr , the vector of outcomes Yr = (yr1 , . . . , yrmr ) and the vector of covariates Xr = (xr1 , . . . , xrmr ). We suppose that the sizes (mr )1≤r≤R are i.i.d., and that (Xr , αr , Vr )1≤r≤R are independent and distributed according to FX,α,V | m,n , where V is the vector of unobserved C The Author(s). Journal compilation C Royal Economic Society 2009.
406
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
shocks (v1 , . . . , vm ). In the previous example of peer effects in the classroom, this condition imposes that there is no spillover between classrooms. Let θ = (β1 , β2 , λ, σ 2 , (γ (m∗ ), δ (m∗ ))m∗ ∈Supp(m) ) denote the vector of all parameters. Under the previous i.i.d. assumption, the likelihood of the whole sample satisfies L(Y1 , . . . , YR | m1 , . . . , mR , X1 , . . . XR , θ ) =
R
L(Yr | mr , Xr , θ ),
r=1
where L(Yr | mr , Xr , θ ) denotes the likelihood for group r. Moreover, by using (4.1), we can write this likelihood as: L(Yr | mr , Xr , θ ) β2 + x¯r δ (mr ) , . . . , = Pr (2yr1 − 1)vr1 ≤ (2yr1 − 1) γ (mr ) + xr1 β1 − mr − 1 β2 (2yrmr − 1)vrmr ≤ (2yrmr − 1) γ (mr ) + xrmr β1 − + x¯r δ (mr ) . mr − 1 This is the probability that a multivariate normal vector belongs to a hyper-rectangle in Rmr . Such a probability can be estimated, for instance, by the GHK algorithm (Geweke, 1989, Keane, 1994, and Hajivassiliou et al., 1996). Thus, the model can be estimated by simulated maximum likelihood.
5. MONTE CARLO SIMULATIONS In this section, we investigate the finite sample performance of our estimator. The sample data are generated with one regressor xri ∼ N (0, 4), the (xri )r,i being independent for all r and i. The true parameters are β10 = 1, β20 = 1, λ = 0.2, σ02 = 0.5, γ (m) = 0 for all m, and δ(m) = 0.1 for all m. As Lee (2007), we consider a case where the average size group is small, and another where it is relatively large. In the first case, the group sizes vary from 3 to 8, the number of groups of each size being the same. In the relatively large case, they range from 15 to 25. The first case could be realistic for groups of good friends or roommates for instance, whereas the second one could correspond to groups of students in a classroom. In each case, we consider different sample sizes from 330 to 21,120. In the GHK algorithm, we use Halton sequences instead of standard uniform random numbers as they improve, on average, the accuracy of the integral estimation (see e.g. S´andor and Andr´as, 2004). In the small group case where the dimension of the integral is low, we rely on 25 replications, whereas we utilize 50 replications in the large group case. Table 1 displays our results. The first striking point is that sample sizes must be quite large to obtain satisfactory results. If we compare the results of our small groups scenario with the one considered by Lee (see Lee, 2007, Table 1, Model SG-SX), it seems that, observing a binary measure of y˜i instead of y˜i itself leads to rather large biases for even moderately large sample sizes. 16 In particular, the bias on λ0 is systematically negative for small and moderately large sample sizes. The second striking result is the influence of the group sizes. The accuracy of the 16 Note that it is difficult to compare our large group scenario with the one studied by Lee, since he considers a model 1 = 0), while x affects y with two independent covariates x1i and x2i such that x1i has only a direct effect on yi (i.e. β20 2i i 2 = 0). only through exogenous peer effects (so that β10 C The Author(s). Journal compilation C Royal Economic Society 2009.
407
Identification of peer effects using group size variation
Sample size
Table 1. Results of the Monte Carlo simulations. Small groups
Large groups
Parameter
Mean
Std. err.
Mean
Std. err.
660
β10 β20 λ0
0.9975 0.8956 −0.0304
0.2254 0.8877 0.5688
1.0128 1.4445 −0.3801
0.1658 2.7601 0.6600
1320
β10 β20
1.0029 0.9823
0.1198 0.4885
1.0025 0.9780
0.0865 1.4712
2640
λ0 β10 β20
0.1158 0.9936 0.9378
0.3458 0.0951 0.3739
−0.0026 0.9978 1.0761
0.3093 0.0678 0.8625
5280
λ0 β10 β20
0.1831 0.9904 0.9744
0.1405 0.0664 0.2425
0.1247 1.0001 1.0264
0.1833 0.0419 0.5747
10,560
λ0 β10
0.1927 0.9914
0.0678 0.0451
0.1620 1.0014
0.1167 0.0285
21,120
β20 λ0 β10
0.9708 0.2000 0.9911
0.1690 0.0389 0.0295
1.0240 0.1788 0.9984
0.4303 0.0513 0.0180
β20 λ0
0.9872 0.1897
0.1065 0.0284
0.9777 0.1950
0.2847 0.0311
Notes: The small groups scenario corresponds to a sample composed of groups whose size goes from 3 to 8, the number of groups of different sizes being equal. The large groups scenario corresponds to a sample of groups whose size goes from 15 to 25, the number of groups of different sizes being still equal.
estimator of β20 in large groups is approximately the same as the one in small groups, but with a sample four times larger. This is not surprising, since identification of peer effects becomes weak as the sample size increases (see Lee, 2007). The parameter λ0 is also better estimated with small groups, but the difference between the two designs seems to decrease when the sample size grows. On the other hand, and quite surprisingly, the estimator of β10 is more precise in large groups.
6. CONCLUSION This paper considers identification and estimation of social interaction models using group size variation. Provided that the sizes of the groups are known and vary sufficiently, endogenous and exogenous peer effects can be identified without any exclusion restriction in the linear-in-means model. The result can be extended to a binary outcome model. In this case, exogenous peer effects are also identified under weak assumptions. Identification of endogenous peer effects is more stringent, as it requires a homoscedasticity condition and restrictions on the dependence between fixed group effects and covariates. Our paper has two main limitations. First, the size of each group is assumed to be known. However, as emphasized by Manski (2000), it is often difficult to define groups a priori. This C The Author(s). Journal compilation C Royal Economic Society 2009.
408
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
criticism is common to all models of social interactions, but may be especially problematic here. Indeed, ignoring the boundaries of the group leads (among other difficulties) to measurement errors on the group size, which could prevent identification. Secondly, we do not consider a fully non-parametric regression. The issue of whether group size variation has an identifying power in this general case should be examined in future research.
ACKNOWLEDGMENTS We would like to thank the editor, two anonymous referees, St´ephane Gr´egoir, Steve Machin, Amine Ouazad, as well as the participants in the CEPR Summer School on ‘The Economics of Education and Education Policy in Europe’ (Padova) and in the CREST seminar (Paris) for their useful comments.
REFERENCES Ammermueller, A. and J. S. Pischke (2006). Peer effects in European primary schools: evidence from PIRLS. IZA Discussion Paper No. 2077, Institute for the Study of Labor (IZA). Bayer, P. and C. Timmins (2007). Estimating equilibrium models of sorting across location. Economic Journal 117, 353–74. Beron, K. J. and W. P. M. Vijverberg (2004). Probit in a spatial context: a Monte Carlo analysis. In L. Anselin, R. J. G. M. Florax and S. J. Rey (Eds.), Advances in Spatial Econometrics: Methodology, Tools and Applications, 169–96. Berlin: Springer-Verlag. Bramoull´e, Y., H. Djebbari and B. Fortin (2009). Identification of peer effects through social networks. Journal of Econometrics 50, 41–55. Brock, W. A. and S. M. Durlauf (2001). Discrete choice with social interactions. Review of Economic Studies 68, 235–60. Brock, W. A. and S. M. Durlauf (2007). Identification of binary choice models with social interactions. Journal of Econometrics 140, 52–75. Calv´o-Armengol, J., E. Pattacchini and Y. Zenou (2008). Peer effects and social networks in education. Forthcoming in Review of Economic Studies. Case, A. (1992). Neighborhood influence and technological change. Regional Science and Urban Economics 22, 491–508. Cooley, J. (2007). Desegregation and the achievement gap: do diverse peers help? Working paper, Duke University. Duflo, E. and E. Saez (2003). The role of information and social interactions in retirement plan decisions: evidence from a randomized experiment. Quarterly Journal of Economics 118, 815–42. Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–39. Graham, B. S. (2008). Identifying social interactions through excess variance contrasts. Econometrica 76, 643–60. Graham, B. S. and J. Hahn (2005). Identification and estimation of the linear-in-means model of social interactions. Economics Letters 88, 1–6. Hajivassiliou, V. A., D. McFadden and P. Ruud (1996). Simulation of multivariate normal rectangle probabilities and their derivatives: theoretical and computational results. Journal of Econometrics 72, 85–134. C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation
409
Hoxby, C. (2000). Peer effects in the classroom: learning from gender and race variation. NBER Working Paper No. 7867, National Bureau of Economic Research. Keane, M. (1994). A computationally practical simulation estimator for panel data. Econometrica 62, 95– 116. Klier, T. and D. P. McMillen (2008). Clustering of auto supplier plants in the United States: generalized method of moments spatial logit for large samples. Journal of Business and Economic Statistics 26, 460–71. Krauth, B. V. (2006). Simulation-based estimation of peer effects. Journal of Econometrics 133, 243–71. Lazear, E. (2001). Educational production. Quarterly Journal of Economics 116, 777–803. Lee, L. F. (2007). Identification and estimation of econometric models with group interactions, contextual factors and fixed effects. Journal of Econometrics 140, 333–74. Lewbel, A. (1997). Constructing instruments for regressions with measurement error when no additional data are available, with an application to patents and R&D. Econometrica 65, 1201–13. Manski, C. F. (1987). Semiparametric analysis of random effects linear models from binary panel data. Econometrica 55, 357–62. Manski, C. F. (1993). Identification of endogenous social effects: the reflection problem. Review of Economic Studies 60, 531–42. Manski, C. F. (2000). Economic analysis of social interactions. Journal of Economic Perspectives 14, 115– 36. McMillen, D. P. (1992). Probit with spatial autocorrelation. Journal of Regional Science 32, 335–48. Mundlak, Y. (1961). Empirical production function free of management bias. Journal of Farm Economics 43, 44–56. Pinkse, J. and M. E. Slade (1998). Contracting in space: an application of spatial statistics to discrete-choice models. Journal of Econometrics 85, 125–54. Rees, D. I., J. S. Zax and J. Herries (2003). Interdependence in worker productivity. Journal of Applied Econometrics 18, 585–604. S´andor, Z. and P. Andr´as (2004). Alternative sampling methods for estimating multivariate normal probabilities. Journal of Econometrics 120, 207–34. Schennach, S. M. (2004). Estimation of non-linear models with measurement error. Econometrica 72, 33– 75. Tamer, E. (2003). Incomplete simultaneous discrete response models with multiple equilibria. Review of Economic Studies 70, 147–65. Wooldridge, J. M. (2002). The Econometrics of Cross-Section and Panel Data. Cambridge, MA: MIT Press.
APPENDIX A: PROOFS In all the proofs, t ∗ denotes a possible value of the random variable t. Proof of Theorem 3.1: First, under Assumption 3.3, E(X Wn U | n, m) = 0. Thus, by Assumption 3.4, β(m∗ ) is identified for all m∗ ∈ Supp(m). We now prove that the knowledge of m∗ → β(m∗ ) allows in general to identify the structural parameters. Let (m∗1 , m∗2 ) ∈ Supp(m)2 . Then ∗
∗ m1 − 1 β10 − β20 m2 − 1 β10 − β20 = m∗1 − 1 + λ0 m∗2 − 1 + λ0 is equivalent to
(β10 λ0 + β20 ) m∗1 − m∗2 = 0. C The Author(s). Journal compilation C Royal Economic Society 2009.
410
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
Hence, if β20 = −λ0 β10 , β(·) is constant. In the opposite case, β(·) is a one-to-one mapping. In the first case, β(m∗ ) = β10 for all m∗ . Thus β10 is identified, but λ0 cannot be identified by β(·). Since β20 = −λ0 β10 , β20 is identified up to a constant. Now suppose that β20 = −λ0 β10 . Let (m∗0 , m∗1 , m∗2 ) be three different values in Supp(m). We will prove that the knowledge of β(m∗0 ), β(m∗1 ) and β(m∗2 ) allows to identify (β10 , λ0 , β20 ). This amounts to show that the system
⎧ ∗ β m λ − m∗ − 1 β10 + β20 = −β m∗0 m∗0 − 1 ⎪ ⎨ 0 0 0
β m∗1 λ0 − m∗1 − 1 β10 + β20 = −β m∗1 m∗1 − 1 ⎪
⎩ ∗ β m2 λ0 − m∗2 − 1 β10 + β20 = −β m∗2 m∗2 − 1 has a unique solution. Using the matrix form, we can rewrite the system as Aζ0 = B where ζ0 = (λ0 , β10 , β20 ) . If det(A) = 0, ζ0 is identified. Suppose that det(A) = 0. Then com(A) B = 0 where com(A) denotes the comatrix of A. By using the first line of this equation and the expression of det(A), we get: ⎧
⎨ m∗2 − m∗1 β m∗0 + m∗0 − m∗2 β m∗1 + m∗1 − m∗0 β m∗2 = 0 ⎩ m∗ − 1 m∗ − m∗ β m∗ + m∗ − 1 m∗ − m∗ β m∗ + m∗ − 1 m∗ − m∗ β m∗ = 0. 0 2 1 0 1 0 2 1 2 1 0 2 Hence,
m∗2 − m∗1 β m∗0 + m∗0 − m∗2 β m∗1 + m∗1 − m∗0 β m∗2 = 0
m∗0 m∗2 − m∗1 β m∗0 + m∗1 m∗0 − m∗2 β m∗1 + m∗2 m∗1 − m∗0 β m∗2 = 0.
Thus, ∗
m2 − m∗1 β m∗0 + m∗0 − m∗2 β m∗1 + m∗1 − m∗0 β m∗2 = 0 ∗
m0 − m∗2 m∗2 − m∗1 β m∗0 + m∗0 − m∗2 m∗1 − m∗2 β m∗1 = 0. Because m∗1 = m∗2 and m∗0 = m∗2 , this implies that β(m∗1 ) = β(m∗0 ), which is in contradiction with the fact that β(·) is a one-to-one mapping. Thus, det(A) = 0 and ζ0 is identified. Wn U ∗ ∗ is known. Thus, under Proof of Theorem 3.2: Because m → β(m ) is identified, V λ0 n, m 1+ m−1 Assumption 3.6, Wn U σ2 n, m = V
2 Wn . λ0 1 + m−1 1 + λ0 m−1
Hence, for all
m∗1
=
m∗2
∈ Supp(m) , 2
1+ C≡ 1+
λ0 m∗1 −1 λ0 m∗2 −1
2 2
is identified. Under Assumption 3.5, 1 + λ0 /(m∗ − 1) > 0 for all m∗ ∈ Supp(m). Thus, √ √ 1 C − λ0 = 1 − C. m∗1 − 1 m∗2 − 1 √
C − It is clear that ( m∗ −1 1 identified.
1 ) m∗2 −1
= 0. Otherwise, C = 1 and m∗1 = m∗2 , which is a contradiction. Thus, λ0 is
C The Author(s). Journal compilation C Royal Economic Society 2009.
411
Identification of peer effects using group size variation
Then, because m∗ → β(m∗ ) is identified, β10 − β20 /(m∗ − 1) is known for all m∗ ∈ Supp(m). Taking two different values for m∗ allows to identify β20 and thus β10 . Proof of Lemma 3.1: Taking the mean in both sides of equation (2.1) leads to α ε¯ β10 + β20 + + , y¯˜ = x¯ 1 − λ0 1 − λ0 1 − λ0 since 1/(1 − λ0 ) exists, according to Assumption 3.5. Because j =i y˜j = my¯˜ − y˜i and xi , equation (2.1) is then equivalent to m λ0 β20 β10 + β20 = xi β10 − + x¯ β20 + y˜i 1 + λ0 m−1 m−1 m−1 1 − λ0 λ0 λ0 m m + ε¯ + εi . +α 1 + m − 1 1 − λ0 m − 1 1 − λ0
j =i
xj = mx¯ −
Now, under Assumption 3.5, 1 + λ0 /(m − 1) > 0, so that y˜i ≥ 0 if and only if y˜i (1 + λ0 /(m − 1)) ≥ 0. Thus, under Assumption 3.5, yi = 1{y˜i ≥ 0}, where y˜i satisfies equation (2.1), is observationally equivalent to yi satisfying equation (3.2). Proof of Theorem 3.3: Assumption 3.7 implies that the conditional distribution of εi + λ0 (m)¯ε is identical for every i. Thus assumption 1 in Manski (1987) is satisfied and, using our Assumptions 3.8 and 3.9, we can 1 |. The first term of the vector, apply directly Manski’s result to identify ((m − 1)β10 − β20 )/|m − 1 − β20 1 1 1 ((m − 1)β10 − β20 )/|m − 1 − β20 |, is also identified. By Assumption 3.9, (m−1)β10 −β20
1 (m − 1)β10 − β20 |m−1−β20 | ˜ = , β(m) ≡ 1 1 1 (m−1)β10 −β20 m − 1 − β20 1 |m−1−β20 |
˜ so that β(m) is identified as the ratio of two known terms. The rest of the proof for the identification of 1 . (β10 , β20 ) follows the same development as the one used for Theorem 3.1, λ0 being replaced by −β20 However, λ0 cannot be identified. Indeed, let λ0 = λ0 , and define εi = εi + ε¯
m(λ0 − λ0 ) . (m − 1 + λ0 )(1 − λ0 )
Finally, let α =
¯ 10 + β20 )(λ0 − λ0 ) + α(m − 1 + λ0 )(1 − λ0 ) mx(β . (m − 1 + λ0 )(1 − λ0 )
Then the parameters (λ0 , α , ε1 , . . . , εm ) are observationally equivalent to those characterizing the initial model. Indeed, we can check that they lead to equation (3.2) as well. Moreover, conditioning on (m, x1 , . . . , xm , α ) is equivalent to conditioning on (m, x1 , . . . , xm , α), and conditional exchangeability of (ε1 , . . . , εm ) implies conditional exchangeability of (ε1 , . . . , εm ). Furthermore, letting λ0 (m) = mλ0 /((m − 1)(1 − λ0 )), we get Fε1 +ε¯ λ0 (m) | m=m∗ ,x1 =x1∗ ,...,xm =xm∗ ,α =α ∗ = Fε1 +¯ελ0 (m) | m=m∗ ,x1 =x1∗ ,...,xm =xm∗ ,α=α∗ , where α∗ =
(m − 1 + λ0 )(1 − λ0 )α ∗ − mx¯ ∗ (β10 + β20 )(λ0 − λ0 ) (m − 1 + λ0 )(1 − λ0 )
C The Author(s). Journal compilation C Royal Economic Society 2009.
412
L. Davezies, X. D’Haultfoeuille and D. Foug`ere
and x¯ ∗ = (1/m) i xi∗ . Thus, the second part of Assumption 3.7 also holds with (λ0 , α , ε1 , . . . , εm ). This shows that λ0 is not identified. Proof of Theorem 3.4: Let θ0 = λ0 /(1 − λ0 ) and m m m νi = x¯ [β20 + θ0 (β10 + β20 )] + α 1 + θ0 + ε¯ θ0 + εi . m−1 m−1 m−1 Note that Fν1 ,...,νm | x1 ,...,xm ,m = Fν1 ,...,νm | x,m ¯ . Indeed ∗
Fν1 ,...,νm | x1 ,...,xm ,m ν1 , . . . , νm∗ | x1∗ , . . . , xm∗ , m∗ !
= Fν1 ,...,νm | x1 ,...,xm ,m,α ν1∗ , . . . , νm∗ | x1∗ , . . . , xm∗ , m∗ , α ∗ dFα | x1 ,...,xm ,m α ∗ | x1∗ , . . . , xm∗ , m∗ ! ∗
∗ ¯ ∗ , m) ν1 , . . . , νm∗ | x¯ ∗ , α ∗ , m∗ dFα | x,m = Fν1 ,...,νm | x,α,m ¯ ¯ (α | x ∗
ν1 , . . . , νm∗ | x¯ ∗ , m∗ , = Fν1 ,...,νm | x,m ¯ where the third line is derived from Assumption 3.11 and the fact that, given (x1 , . . . , xm , m, α), (ν1 , . . . , νm ) is a deterministic function of (ε1 , . . . , εm ). Moreover,
Pr y1 = 0, y2 = 0 | x1 = x1∗ , x2 = x2∗ , x¯ = x, m = m∗ β20 β20 x1 = x ∗ , x2 = x ∗ , x¯ = x, m = m∗ , ν2 ≤ −x2∗ β10 − = Pr ν1 ≤ −x1∗ β10 − 1 2 m−1 m−1 β20 β20 x, m∗ . −x1∗ β10 − ∗ , −x2∗ β10 − ∗ = Fν1 ,ν2 | x,m ¯ m −1 m −1 Since Theorem 3.3 implies that (β10 , β20 ) is identified, x1∗ (β10 − β20 /(m∗ − 1)) and x2∗ (β10 − β20 /(m∗ − 1)) are known. Moreover, x¯ is observed so that the first term is identified on the whole support of (x1 , x2 ) ¯ m). Thus, by Assumption 3.13, making (x1 , x2 ) vary allows us to identify the whole conditional on (x, conditional distribution of (ν1 , ν2 ) given x¯ and m. Now, by Assumption 3.12, m ¯ m) = Cov ε¯ ¯ m = V (ε1 | x), ¯ θ0 + ε1 , ε1 − ε2 | x, Cov(ν1 , ν1 − ν2 | x, m−1 so that the right-hand-side term is identified. Moreover, a little algebra shows that # " ¯ m) = m2 (1 + θ0 )2 V (α | x) ¯ + m [−2(1 + θ0 )V (α | x) ¯ (m − 1)2 Cov(ν1 , ν2 | x, ¯ + [V (α | x) ¯ − 2θ0 V (ε1 | x)] ¯ . + θ0 (2 + θ0 )V (ε1 | x)] ¯ this is a regression of the (known) left term on (m2 , m, 1). By Assumption 3.2 , there Conditional on x, exists a set A of positive probability such that m can take three different values with positive probability, given that x¯ = x ∗ for all x ∗ ∈ A. Thus, the coefficients (a, b, c) of this regression can be identified. These coefficients depend on x¯ but, for the sake of simplicity, this dependence is let implicit in the following. We show now that the knowledge of these coefficients implies that θ0 is identified. The conclusion follows since θ0 is in a one-to-one relationship with λ0 . ¯ ¯ We also define a = a/V (ε1 | x), ¯ b = (ε1 | x). First, we set φ0 = 1 + θ0 and ρ0 = V (α | x)/V ¯ + 1 and c = c/V (ε1 | x) ¯ − 2. Then a , b and c are identified, and b/V (ε1 | x) ⎧ φ02 ρ0 = a ⎪ ⎨ −2φ0 ρ0 + φ02 = b ⎪ ⎩ ρ0 − 2φ0 = c . C The Author(s). Journal compilation C Royal Economic Society 2009.
Identification of peer effects using group size variation Replacing ρ0 by c + 2φ0 in the first and second equations leads to ⎧ ⎪ φ 3 + c2 φ02 − a2 = 0 ⎪ ⎨ 0 φ02 + 2c3 φ0 + b3 = 0 ⎪ ⎪ ⎩ ρ − 2φ = c . 0
413
(A.1)
0
This system admits at most two solutions in terms of (ρ, φ). Suppose that there exist two different solutions, and let (ρ1 , φ1 ) denote the second one. Then we can write the polynomial of the first equation as a product in which one factor is the polynomial of the second equation. Hence, there exists x such that, for all φ ∈ R, c a 2c 2 b = φ2 + φ + (φ + x). φ3 + φ2 − 2 2 3 3 Thus, x = −c /6 and 2c x = −b , which implies that c2 = 3b . Replacing b and c by their values in terms of φi and ρi (i ∈ {0, 1}), we obtain, for i ∈ {0, 1},
3 −2φi ρi + φi2 − (ρi − 2φi )2 = 0. This equation is equivalent to φi + ρi = 0. Replacing ρi by −φi in c yields φ0 = φ1 = −c /3 and thus also ρ0 = ρ1 . This contradicts our assumption that (ρ0 , φ0 ) = (ρ1 , φ1 ). Thus φ0 is identified by system (A.1), and the conclusion follows.
C The Author(s). Journal compilation C Royal Economic Society 2009.
Econometrics Journal (2009), volume 12, pp. 414–435. doi: 10.1111/j.1368-423X.2009.00297.x
Testing for the cointegrating rank of a vector autoregressive process with uncertain deterministic trend term ‡ ¨ M ATEI D EMETRESCU † , H ELMUT L UTKEPOHL AND P ENTTI S AIKKONEN § †
Applied Econometrics, Goethe University Frankfurt, Gr¨uneburgplatz 1 (Postfach RuW 49), D-60629 Frankfurt, Germany E-mail:
[email protected] ‡
Department of Economics, European University Institute, Via della Piazzola 43, I-50133 Firenze, Italy E-mail:
[email protected] §
Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68 (Gustaf H¨allstr¨omin katu 2b), FIN-00014 University of Helsinki, Finland E-mail:
[email protected] First version received: November 2008; final version accepted: July 2009
Summary When applying Johansen’s procedure for determining the cointegrating rank to systems of variables with linear deterministic trends, there are two possible tests to choose from. One test allows for a trend in the cointegration relations and the other one restricts the trend to being orthogonal to the cointegration relations. The first test is known to have reduced power relative to the second one if there is in fact no trend in the cointegration relations, whereas the second one is based on a misspecified model if the linear trend is not orthogonal to the cointegration relations. Hence, the treatment of the linear trend term is crucial for the outcome of the rank determination procedure. We compare three alternative procedures, which are applicable if there is uncertainty regarding the proper trend specification. In the first one a specific cointegrating rank is rejected if one of the two tests rejects, in the second one the trend term is decided upon by a pretest and in the third procedure only tests which allow for an unrestricted trend term are used. We provide theoretical asymptotic and small sample simulation results, which show that the first strategy is preferable in applied work. Keywords: Cointegration analysis, Likelihood ratio test, Vector autoregressive model, Vector error correction model.
1. INTRODUCTION In a vector autoregressive (VAR) analysis with integrated variables determining the cointegrating rank is central for setting up a well-specified model. The most popular method used for this purpose is the Johansen (1995) sequence of cointegrating rank tests which are based on the likelihood ratio (LR) principle. It is well known that the asymptotic distributions of these tests depend on the deterministic term, which is present in the data-generation process. Moreover, it is C The Author(s). Journal compilation C Royal Economic Society 2009. Published by Blackwell Publishing Ltd, 9600 Garsington Road,
Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA, 02148, USA.
Testing for the cointegrating rank of a VAR process with uncertain trend
415
also known that the power of the tests depends on the deterministic term allowed for in the model. More precisely, if the deterministic term is over-specified the power may suffer substantially (Doornik et al., 1998, Saikkonen and L¨utkepohl, 1999, 2000); if a linear trend is allowed for while a constant is sufficient to capture the data properties, a more powerful version of the cointegrating rank test can be obtained by allowing only for a constant and no linear trend. Johansen (1995) also proposes tests that can help in choosing the deterministic term. Hence, given a null hypothesis of a specific cointegrating rank, one may test for the deterministic term first and then use the cointegrating rank test with the deterministic term suggested by the pretest. Pretesting has in fact been reported in the literature, e.g. by Crowder and Hoffman (1996) and Peytrignet and Stahel (1998). On the other hand, practitioners often proceed in a different way if there is uncertainty regarding the deterministic term. They perform tests based on models with different possible deterministic terms and then decide on the cointegrating rank in some way taking into account all the test results (e.g. Hubrich, 2001). In this study, we will formalize this procedure and compare it to the aforementioned pretest procedure. Moreover, we will compare these two procedures to one which includes an unrestricted trend term in all tests to be on the safe side and to avoid misspecification due to under-specifying the linear trend. This procedure is included in the comparison to determine possible losses or gains incurred by accounting for the possibility of a more restrictive trend term. In this context, the three most popular model versions in applied work are: (i) a model for variables without linear trend, (ii) a model where at least one of the variables has a linear trend but the cointegration relations are trend-free and (iii) a model with a general linear trend which may also be part of the cointegration relations. In practice many economic variables are known to have a deterministic time trend. Moreover, it can be checked by univariate tests whether some of the variables are well modelled by including a linear trend. Perron and Yabu (2009), for instance, have proposed relevant tests which are valid for both stationary and unit root processes. If a deterministic linear trend is found, the choice between (ii) and (iii) becomes relevant. The types of cointegration accommodated by models (ii) and (iii) were labelled ‘deterministic cointegration’ and ‘stochastic cointegration’, respectively, by Perron and Campbell (1993). For the practitioner, the main problem in the multivariate case becomes thus the choice between (ii) and (iii). Hence, we will focus on this case in the following. Clearly, focusing on a decision between (ii) and (iii) assumes that some pretesting for a linear trend has been done on the univariate series. Such pretesting gives rise to additional questions regarding the properties of the overall procedure. These questions are quite delicate and challenging as is known from Harvey et al. (2009), for example. Still allowing for uncertainty regarding the deterministic linear trend term in the model in testing for the cointegrating rank is a relevant problem and this is the subject of the present paper. It appears that many applied economists have a preference for (ii) based on a priori grounds. 
If a cointegration relation is interpreted as an equilibrium relation, a linear trend in that relation may not be very plausible. For example, Ericsson and Sharma (1998), Coenen and Vega (2001), Lettau and Ludvigson (2001), Funke and Rahn (2005), Ribba (2006) and Stephan (2006) apply cointegration tests which allow for a linear trend in the variables but not in the cointegration relations in various contexts. On the other hand, Hubrich (2001) applies both tests with and without allowing for a linear trend in the cointegration relations and she checks the robustness of her results. Crowder and Hoffman (1996) perform a test for the correct trend specification and, based on its outcome, eliminate the trend from the cointegration relations. In contrast, Br¨uggemann (2003) and Breitung et al. (2004) maintain a linear trend C The Author(s). Journal compilation C Royal Economic Society 2009.
416
M. Demetrescu, H. L¨utkepohl and P. Saikkonen
in the cointegration relations in their rank tests. Clearly, if the individual variables are well modelled with a linear trend, it is useful to allow for a linear trend in the cointegration relations as long as the number and type of cointegration relations are still unknown. In that situation, where the cointegration relations are still unknown, it is difficult to argue that the economic relations captured by the cointegration relations should not have trends and, hence, trends are excluded from the cointegration relations. Therefore, one could make a case for always including trends in the cointegration relations at the stage where the cointegrating rank is tested. In this paper, we extend and generalize ideas put forward by Perron (1988) and Harvey et al. (2009) in the context of testing for unit roots. Harvey et al. (2009) show that if there is uncertainty regarding a linear trend term, applying both tests with and without such a term is a good strategy. They obtain this result because the unit root test without trend is conservative if there is actually a deterministic trend. Perron (1988), on the other hand, shows that unit root tests have no power under the alternative if a relevant time trend is omitted. We shall show in the following that a testing sequence for the cointegrating rank of a VAR process based on the LR test which assumes a trend orthogonal to the cointegration relations is asymptotically likely to end up with a cointegrating rank smaller than the true one if the linear trend is in fact also in the cointegration relations. This result suggests that, similarly to the case considered by Harvey et al. (2009), if there is uncertainty in applied work with respect to the correct trend specification, one may perform both tests, with and without trend in the cointegration relations, and reject a given cointegrating rank if one of the tests rejects. This procedure will be shown to work well relative to a procedure based on pretesting for the correct trend specification and one that always uses a general linear trend. The structure of this study is as follows. In the next section, the general model set-up is presented. In Section 3, the procedures for determining the cointegrating rank are discussed and in Section 4 the results of a small sample comparison of these procedures are reported. Section 5 concludes. Finally, the Appendix contains the derivation of the limiting distribution of the cointegrating rank test applied to a misspecified model which does not allow for a linear time trend in the cointegration relations, although there is one. Throughout the paper, we use the following abbreviations: ML for maximum likelihood, LR for likelihood ratio, DGP for data-generation process, VAR for vector autoregressive and VECM for vector error correction model. Moreover, the differencing operator is signified by , that is, for a stochastic process xt , xt = xt − xt−1 . A stationary (short memory) or asymptotically stationary process will sometimes be referred to as an I (0) process and a nonstationary process which becomes stationary after differencing once is called I (1) process. A normal (Gaussian) distribution with mean μ and variance (covariance matrix) is denoted by N (μ, ). Furthermore, R stands for the set of real numbers. For a matrix A, rk(A) denotes its rank and A⊥ denotes an orthogonal complement.
2. THE MODEL SET-UP

We consider a K-dimensional system of I(1) variables $y_t = (y_{1t}, \ldots, y_{Kt})'$ with deterministic term $\mu_t$ such that

$$y_t = \mu_t + x_t, \qquad (2.1)$$
where $\mu_t = \mu_0 + \mu_1 t$ is a K-dimensional linear trend term and $x_t$ is a K-dimensional zero mean VAR(p) process with VECM representation:

$$\Delta x_t = \Pi x_{t-1} + \Gamma_1 \Delta x_{t-1} + \cdots + \Gamma_{p-1} \Delta x_{t-p+1} + u_t. \qquad (2.2)$$
The $(K \times K)$ matrix $\Pi$ is assumed to have rank r, which is the cointegrating rank of $x_t$ and, hence, of $y_t$. The $\Gamma_j$'s $(j = 1, \ldots, p-1)$ are $(K \times K)$ coefficient matrices and the error term $u_t$ is an independently, identically distributed white noise process with zero mean and non-singular covariance matrix $E(u_t u_t') = \Sigma_u$. For simplicity, we also assume that $u_t$ is Gaussian, so that our tests are proper LR tests. This assumption is not essential for our arguments and our results hold under more general assumptions, as usual. In fact, our results are valid whenever the cointegrating rank tests to be discussed in the following have their usual asymptotic properties. For the deterministic term, we consider the following alternative possibilities:

(1) $\mu_1 \neq 0$ and $\Pi\mu_1 = 0$, that is, there is a deterministic linear trend in the variables which is, however, orthogonal to the cointegration relations;
(2) $\Pi\mu_1 \neq 0$, that is, the linear trend is fully general and, hence, it is also part of the cointegration relations.
Notice that $\mathrm{rk}(\Pi) = r$ implies that $\Pi = \alpha\beta'$ for suitable $(K \times r)$ matrices $\alpha$ and $\beta$ of rank r, and $\beta' y_t$ represents the cointegration relations. Hence, $\Pi\mu_1 = 0$ is equivalent to $\beta'\mu_1 = 0$, which shows that $\Pi\mu_1 = 0$ is just another way of stating that the linear trend is orthogonal to the cointegration relations. For both linear trend specifications, we can write the generation process of the observed variables $y_t$ in VECM form as

$$\Delta y_t = \nu + \Pi^{(i)} y_{t-1}^{(i)} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + u_t, \qquad (2.3)$$
where $\nu$ is an intercept term and the superscript i refers to the two cases of deterministic terms. Hence,

$$\Pi^{(i)} = \begin{cases} \Pi & (K \times K), & \text{for } i = 1, \\ \Pi^* & (K \times (K+1)), & \text{for } i = 2. \end{cases} \qquad (2.4)$$

Here the first K columns of $\Pi^*$ are equal to $\Pi$. Accordingly,

$$y_{t-1}^{(i)} = \begin{cases} y_{t-1}, & \text{for } i = 1, \\ (y_{t-1}', t-1)', & \text{for } i = 2 \end{cases} \qquad (2.5)$$
(see e.g. Lütkepohl, 2005, Section 6.4, for details). For given cointegrating rank r, the relevant model can be estimated by Johansen's reduced rank regression method in both cases. Under our Gaussian assumptions, this method delivers ML estimators. Since the cointegrating rank r is usually unknown, testing procedures for determining r will be discussed in the next section.
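To make the estimation step concrete, the following R sketch (R being the language used for the simulations reported in Section 4) implements Johansen's reduced rank regression for a given rank r in case i = 1. It is our own illustrative code, not the authors' implementation; the function and variable names are ours, and case i = 2 is obtained by appending t − 1 to the lagged levels as in (2.5).

rrr_beta <- function(y, p, r) {
  # Johansen reduced rank regression, case i = 1 (intercept only).
  # y: (T x K) data matrix, p: VAR order, r: assumed cointegrating rank.
  Tn <- nrow(y) - p
  dy  <- diff(y)
  d0  <- dy[p:nrow(dy), , drop = FALSE]          # Delta y_t
  lvl <- y[p:(nrow(y) - 1), , drop = FALSE]      # y_{t-1}
  Z <- matrix(1, Tn, 1)                          # intercept
  if (p > 1) for (j in 1:(p - 1))                # lagged differences
    Z <- cbind(Z, dy[(p - j):(nrow(dy) - j), ])
  R0 <- qr.resid(qr(Z), d0)                      # concentrate out Z
  R1 <- qr.resid(qr(Z), lvl)
  S00 <- crossprod(R0) / Tn
  S11 <- crossprod(R1) / Tn
  S01 <- crossprod(R0, R1) / Tn
  # generalized eigenvalue problem det(lambda*S11 - S10 S00^{-1} S01) = 0,
  # solved via the Cholesky factor of S11 (S11 = t(L) %*% L)
  L <- chol(S11)
  M <- solve(t(L), t(S01)) %*% solve(S00, S01) %*% solve(L)
  ev <- eigen((M + t(M)) / 2)                    # symmetrize for stability
  # beta-hat: eigenvectors for the r largest eigenvalues,
  # normalized so that t(beta) %*% S11 %*% beta = I
  solve(L, ev$vectors[, 1:r, drop = FALSE])
}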
3. TESTING FOR THE COINTEGRATING RANK

In the context of the model set-up presented in the previous section, we are interested in finding the cointegrating rank r. This quantity is typically chosen by testing a sequence of hypotheses

$$H_0(r_0): \mathrm{rk}(\Pi) = r_0 \quad \text{versus} \quad H_1(r_0): \mathrm{rk}(\Pi) > r_0 \qquad (3.1)$$
for $r_0 = 0, 1, \ldots, K-1$. The first rank $r_0$ for which the null hypothesis cannot be rejected is then chosen as an estimate for r. Alternatively, one may consider tests of $H_0: \mathrm{rk}(\Pi) = r_0$ versus $H_1: \mathrm{rk}(\Pi) = r_0 + 1$. This choice would result in a completely analogous discussion and is therefore not treated here in order to save space. Because Gaussian ML estimation is straightforward, LR tests can readily be used for testing (3.1) (Johansen, 1995). In the following, we denote by $LR(r_0)$ the LR statistic based on a model with intercept only,

$$\Delta y_t = \nu + \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + u_t, \qquad (3.2)$$

and we use $LR^*(r_0)$ for the LR statistic based on the model with linear trend term in the cointegration relations. Using this notation, the asymptotic null distributions and the asymptotic distributions under local alternatives are known for both test statistics if the deterministic term is specified properly (see Johansen, 1995, and Saikkonen and Lütkepohl, 1999, 2000). Since we are interested in analysing the properties of $LR(r_0)$ more closely, it is useful to provide a more explicit expression for this statistic. Let $R_0$ and $R_1$ be $(K \times T)$ matrices containing the vectors of residuals of a regression of $\Delta y_t$ and $y_{t-1}$, respectively, on $1, \Delta y_{t-1}, \ldots, \Delta y_{t-p+1}$, and define $S_{ij} = T^{-1} R_i R_j'$, $i, j = 0, 1$. Moreover, let $\lambda_1 \geq \cdots \geq \lambda_K \geq 0$ be the ordered eigenvalues of the matrix $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$. Then

$$LR(r_0) = -T \sum_{k=r_0+1}^{K} \log(1 - \lambda_k). \qquad (3.3)$$
In other words, the test statistic is made up of the $K - r_0$ smallest eigenvalues of $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$. If the true cointegrating rank $r = r_0$, this matrix converges in probability to a matrix of rank $r_0$ which has $K - r_0$ zero eigenvalues, as the sample size $T \to \infty$. Hence, the limiting values of the $K - r_0$ smallest eigenvalues are zero. If the true cointegrating rank is greater than $r_0$, at least one of the eigenvalues ($\lambda_{r_0+1}$) in the test statistic in (3.3) will be non-zero asymptotically and both $-T\log(1 - \lambda_{r_0+1})$ and $LR(r_0)$ will diverge to infinity as the sample size gets large. Hence, the test is consistent. The following proposition shows that the number of zero eigenvalues increases by one if the DGP contains a linear trend in the cointegration relations which is not accounted for in $LR(r_0)$. The proposition may be viewed as a multivariate extension of the aforementioned results of Perron (1988) and Harvey et al. (2009). It will be used to motivate one of the procedures for choosing the cointegrating rank when the actual trending properties are unknown.

PROPOSITION 3.1. If $r = \mathrm{rk}(\Pi) > 0$ and $\Pi\mu_1 \neq 0$, then $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$ converges in probability to a matrix with exactly $K - r + 1$ zero eigenvalues, as $T \to \infty$.
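As an illustration of (3.3), the following R sketch computes LR(r0) from a data matrix; with the trend regressor appended as in (2.5) it yields a version of LR*(r0). All names are our own and the code is schematic rather than the authors' implementation. In the restricted-trend case the (K + 1)-dimensional eigenvalue problem has one structural zero eigenvalue, which contributes nothing to the sum.

trace_stat <- function(y, p, r0, trend_in_ci = FALSE) {
  # Trace statistic (3.3); y: (T x K) data matrix, p: VAR order.
  Tn <- nrow(y) - p
  dy  <- diff(y)
  d0  <- dy[p:nrow(dy), , drop = FALSE]          # Delta y_t
  lvl <- y[p:(nrow(y) - 1), , drop = FALSE]      # y_{t-1}
  if (trend_in_ci) lvl <- cbind(lvl, 1:Tn)       # append trend, cf. (2.5)
  Z <- matrix(1, Tn, 1)                          # intercept
  if (p > 1) for (j in 1:(p - 1))                # lagged differences
    Z <- cbind(Z, dy[(p - j):(nrow(dy) - j), ])
  R0 <- qr.resid(qr(Z), d0)                      # residual matrices R_0, R_1
  R1 <- qr.resid(qr(Z), lvl)
  S00 <- crossprod(R0) / Tn
  S11 <- crossprod(R1) / Tn
  S01 <- crossprod(R0, R1) / Tn
  # ordered eigenvalues of S11^{-1} S10 S00^{-1} S01
  lam <- Re(eigen(solve(S11) %*% t(S01) %*% solve(S00) %*% S01)$values)
  lam <- sort(pmin(pmax(lam, 0), 1 - 1e-12), decreasing = TRUE)
  -Tn * sum(log(1 - lam[(r0 + 1):length(lam)])) # trace statistic
}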
In the Appendix we will derive the limiting null distribution of LR(r) under the conditions of Proposition 3.1, that is, for the case where the rank test is applied to a model with misspecified trend term. As a by-product, we will also prove Proposition 3.1. An immediate implication of the asymptotic distribution presented in Proposition A.1 in the Appendix is that the LR test is not C The Author(s). Journal compilation C Royal Economic Society 2009.
consistent if there is a trend in the cointegration relations. If a null hypothesis $H_0: \mathrm{rk}(\Pi) = r_0$ is tested while the true cointegrating rank is $r_0 + 1$, then Proposition A.1 shows that $LR(r_0)$ has a regular asymptotic distribution and does not converge to infinity. Hence, the test is not consistent if the trend is under-specified. This is the multivariate analogue of the aforementioned result due to Perron (1988); see the discussion following equation (A.1) in the Appendix. Unfortunately, under the conditions of Proposition 3.1, the limiting distribution of LR(r) given in Proposition A.1 in the Appendix depends in a complicated way on nuisance parameters and is therefore not directly useful for devising a rank test. In fact, it does not even allow us to conclude that the test is generally conservative if the trend is under-specified. The derivation of the limiting distribution of LR(r) is based on writing the DGP as

$$\begin{aligned}
\Delta y_t &= \nu + \alpha\beta'(y_{t-1} - \mu_1(t-1)) + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1}\Delta y_{t-p+1} + u_t \\
&= \nu + \alpha_2\beta_2'\mu_0 + \alpha_1\beta_1' y_{t-1} + \alpha_2\beta_2'(y_{t-1} - \mu_0 - \mu_1(t-1)) + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1}\Delta y_{t-p+1} + u_t \\
&= \nu^* + \alpha_1\beta_1' y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1}\Delta y_{t-p+1} + e_t,
\end{aligned} \qquad (3.4)$$
where $\alpha_1$ $(K \times (r-1))$ and $\alpha_2$ $(K \times 1)$ are such that $\alpha = [\alpha_1 : \alpha_2]$. Furthermore, the $(K \times r)$ cointegration matrix $\beta$ is chosen to have orthogonal columns and is such that $\beta = [\beta_1 : \beta_2]$, where $\beta_1$ $(K \times (r-1))$ and $\beta_2$ $(K \times 1)$ have the properties $\beta_1'\mu_1 = 0$ and $\beta_2'\mu_1 \neq 0$; moreover, $\nu^* = \nu + \alpha_2\beta_2'\mu_0$ and $e_t = u_t + \alpha_2\beta_2' x_{t-1}$. The representation in (3.4) suggests that the test procedure in effect tests the null hypothesis that there are $r-1$ stationary linear combinations of $y_t$, given by $\beta_1' y_{t-1}$, and $K - r + 1$ non-stationary linear combinations, of which one, $\beta_2' y_t$, is trend stationary and the others, $\beta_\perp' y_t$, are I(1). Here $\beta_\perp$ denotes an orthogonal complement of $\beta$. The main reason why the limiting distribution of the test becomes complicated is that the error term of the relevant model, $e_t$, is autocorrelated (although stationary). Consequently, the resulting limiting distribution suffers from problems similar to those previously encountered in unit root tests with autocorrelated errors (see e.g. Phillips, 1987, or Phillips and Perron, 1988). In particular, the limiting distribution involves 'second-order bias' terms and complications resulting from the fact that the covariance matrix of the error term differs from the long-run covariance matrix. Although the residual autocorrelation may be taken care of if data-dependent lag order selection procedures are used, as is often the case in applied work, this will not fully eliminate the dependence of the limiting distribution on nuisance parameters because the lagged differences of $y_t$ cannot fully capture the autocorrelation in $\alpha_2\beta_2' x_{t-1}$.

Proposition 3.1 implies, however, that a test based on a model with a misspecified (or better, under-specified) deterministic term is likely to terminate a testing sequence for the cointegrating rank too early and, hence, to choose the rank too small, because even for large T there is a positive probability of not rejecting a rank $r_0 = r - 1$. Given this result, the procedure used by some practitioners when they do not know the precise deterministic term may not be implausible: they perform tests for both alternative trend specifications and reject a cointegrating rank if one of the tests rejects. If $LR(r_0)$ is applied although there is a trend in the cointegration relations, the test tends to terminate too early, whereas in this case $LR^*(r_0)$ will find the true cointegrating rank r, or even overestimate r, at least asymptotically. Notice that, in contrast to $LR(r_0)$, $LR^*(r_0)$ is a consistent test and, hence, it rejects all false null hypotheses asymptotically with probability one, while the true null hypothesis will be rejected with a probability corresponding to the significance level of the test. On the other hand, if there is no trend in the cointegration relations, $LR^*(r_0)$ will have reduced power in small samples and will hence have a tendency to choose too small a cointegrating rank, while in this case $LR(r_0)$ has its usual properties and, in particular, the
associated test is consistent, so that it will reject all cointegrating ranks below the true one at least asymptotically. The procedure which decides on the basis of the outcome of both tests can be compared formally to a pretest procedure which also tests the deterministic term. As mentioned earlier, pretesting is, for instance, reported by Crowder and Hoffman (1996) and Peytrignet and Stahel (1998). Thus, the following two procedures for choosing an estimate $\hat r$ of the true cointegrating rank r will be considered in the following.

PROCEDURE 3.1. For a given $r_0$, starting with $r_0 = 0$, use both $LR(r_0)$ and $LR^*(r_0)$ to test $H_0(r_0)$. Choose $\hat r = r_0$ if neither of the tests rejects. Otherwise proceed to testing $r_0 + 1$ etc. until a given rank is not rejected by both tests.

PROCEDURE 3.2. Choose $\hat r = 0$ if neither of the tests rejects $H_0(0)$. Otherwise proceed with $r_0 = 1$. For a given $r_0 > 0$, test $H_0: \beta'\mu_1 = 0$ versus $H_1: \beta'\mu_1 \neq 0$. If $H_0$ is not rejected, use $LR(r_0)$ to test $H_0(r_0)$. If $H_0$ is rejected, use $LR^*(r_0)$. Choose $\hat r = r_0$ if the appropriate test does not reject $H_0(r_0)$. Otherwise proceed to rank $r_0 + 1$ etc. until a given rank is not rejected.

If $r_0 = 0$, a pretest is not possible in Procedure 3.2 because there are no cointegration relations under the null hypothesis. Still, $LR(0)$ and $LR^*(0)$ differ because they are based on different models. The null hypothesis $r_0 = 0$ is rejected if one of the tests rejects, as in Procedure 3.1. Thus, the two procedures differ only for $r_0 > 0$. One could in fact argue that $LR(K-1)$ should not be used in these procedures because it would result in a contradiction under the alternative hypothesis, which implies a full rank of $\Pi$. Notice that $\mathrm{rk}(\Pi) = K$ implies that the process is stationary and, hence, an intercept will not generate variables with a linear trend. Thus, a stationary process with an intercept as its only deterministic component would be in contradiction to the assumption of a linear trend in at least some of the variables. It is not fully clear how practitioners proceed in this case. In the pretest procedure, the null hypothesis $H_0: \beta'\mu_1 = 0$ can be checked by an LR test (e.g. Johansen, 1995). The relevant test statistic is

$$LR = T \sum_{k=1}^{r_0} \log\left[(1 - \lambda_k)/(1 - \lambda_k^*)\right], \qquad (3.5)$$
where the $\lambda_k$ are the eigenvalues based on model (3.2) without linear trend term in the cointegration relations, and the $\lambda_k^*$ are the corresponding eigenvalues from a reduced rank regression of a model with a general linear trend term. The test statistic has a standard $\chi^2$ limiting distribution with $r_0$ degrees of freedom if the null hypothesis holds. The fact that $\lambda_{r_0}$ converges to zero if the trend is under-specified contributes to the power of the test. However, in small samples the power of the test may be less than one. Hence, $H_0: \beta'\mu_1 = 0$ will not be rejected with some positive probability even if the trend is not orthogonal to the cointegration relations. In other words, there may be a positive probability of using $LR(r_0)$ in a situation where the corresponding test is expected to have low power according to Proposition 3.1.

In the next section, we will report the results of a Monte Carlo study to explore the small sample properties of the two aforementioned procedures for choosing the cointegrating rank. For comparison purposes, we will also include a third procedure which only applies $LR^*(r_0)$, so as to be on the safe side if there is uncertainty regarding the deterministic trend term. In other words, we compare Procedures 3.1 and 3.2 to the following approach.
PROCEDURE 3.3. For a given $r_0$, starting with $r_0 = 0$, use $LR^*(r_0)$ to test $H_0(r_0)$. Choose $\hat r = r_0$ if the test does not reject. Otherwise proceed to testing $r_0 + 1$ etc. until a given rank is not rejected.

Clearly, Procedure 3.3 is expected to do well if there is actually a trend in the cointegration relations because $LR^*(r_0)$ is constructed for precisely this case. On the other hand, as mentioned earlier, a test based on $LR^*(r_0)$ is known to have poor power if the trend is orthogonal to the cointegration relations. Still, it may be of interest to see how serious the problem is for the processes considered in our small sample study.

A related procedure, which has been considered for choosing both the cointegrating rank and the deterministic term simultaneously, is based on the so-called Pantula principle (Pantula, 1989). The procedure was suggested by Johansen (1992) for use in the context of cointegrating rank testing, although not precisely for the case of interest here. It has been studied in a more general framework which includes the present one by Hjelm and Johansson (2005), and it is included in the CATS in RATS software package (see Hansen and Juselius, 1994). The idea is to apply all the LR tests related to the relevant deterministic terms, starting from the most restricted model and stopping when a null hypothesis cannot be rejected. In other words, $H_0(r_0 + 1)$ is considered only if all tests reject $H_0(r_0)$. For our case, where only two model types are considered, this means that a model with intercept only and cointegrating rank $\hat r = r_0$ is chosen if the test based on $LR(r_0)$ does not reject $H_0(r_0)$. If, however, $LR(r_0)$ rejects and $LR^*(r_0)$ does not reject $H_0(r_0)$, a model with $\hat r = r_0$ and a general linear trend is selected. In their simulation study, Hjelm and Johansson (2005) found that this procedure has a strong tendency to end up with a model with intercept only if there is in fact a linear trend in the cointegration relations. Clearly, our Proposition 3.1 explains this result because it shows that $LR(r_0)$ has a tendency not to reject even before the true rank is reached. Since the procedure based on the Pantula principle was found not to work in the study by Hjelm and Johansson (2005), we do not include it in our comparison. We also emphasize that this procedure is really meant to find both the cointegrating rank and the correct specification of the deterministic term. In contrast, in the present study we focus on the more modest task of choosing the cointegrating rank only when there is uncertainty regarding the deterministic trend specification. In fact, in order to fix the problem with the procedure mentioned earlier, Hjelm and Johansson (2005) propose a pretesting procedure which can be seen as a mixture of our Procedures 3.1 and 3.2. We will not elaborate on that procedure because we have the more modest objective of selecting the cointegrating rank only.
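The decision rules of Procedures 3.1-3.3, and the Pantula-principle variant just described, are easy to state in code. The following R sketch assumes hypothetical wrapper functions reject_lr(), reject_lr_star() and reject_pretest() that compare LR(r0), LR*(r0) and the pretest statistic (3.5) with their 5% critical values; these helpers, like all names here, are our own constructions for illustration.

# Procedure 3.1: reject H0(r0) if either test rejects.
choose_rank_31 <- function(y, p, K) {
  for (r0 in 0:(K - 1))
    if (!reject_lr(y, p, r0) && !reject_lr_star(y, p, r0)) return(r0)
  K
}

# Procedure 3.2: pretest beta' mu_1 = 0 for r0 > 0, then use one test.
choose_rank_32 <- function(y, p, K) {
  for (r0 in 0:(K - 1)) {
    rej <- if (r0 == 0) reject_lr(y, p, 0) || reject_lr_star(y, p, 0)
           else if (reject_pretest(y, p, r0)) reject_lr_star(y, p, r0)
           else reject_lr(y, p, r0)
    if (!rej) return(r0)
  }
  K
}

# Procedure 3.3: use only the test allowing for a general linear trend.
choose_rank_33 <- function(y, p, K) {
  for (r0 in 0:(K - 1)) if (!reject_lr_star(y, p, r0)) return(r0)
  K
}

# Pantula-principle variant: most restricted model first; the first
# non-rejection fixes both the rank and the deterministic term.
choose_rank_pantula <- function(y, p, K) {
  for (r0 in 0:(K - 1)) {
    if (!reject_lr(y, p, r0))      return(c(rank = r0, trend = 1))  # intercept only
    if (!reject_lr_star(y, p, r0)) return(c(rank = r0, trend = 2))  # general trend
  }
  c(rank = K, trend = NA)
}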
4. SIMULATION STUDY

In this section, we investigate the empirical small sample properties of the two tests and of the procedures for choosing the cointegrating rank if there is uncertainty about the correct trend specification. We consider both types of DGPs, with and without trend in the cointegration relations. All simulations are done with R programs.

4.1. Monte Carlo set-up

Time series from DGPs with linear trend in the cointegration relations are generated as

$$y_t = \mu_0 + \mu_1 t + x_t, \quad t = 1, \ldots, T, \qquad (4.1)$$
with $\mu_0 = 0$ and

$$\mu_1 = c\,\iota_K, \quad c = 0.1, 0.5,$$

where $\iota_K$ is a $(K \times 1)$ vector of ones, and

$$x_t = \begin{pmatrix} \psi I_r & 0 \\ 0 & I_{K-r} \end{pmatrix} x_{t-1} + u_t, \quad x_0 = 0, \quad u_t = \phi u_{t-1} + \varepsilon_t. \qquad (4.2)$$

Moreover,

$$\varepsilon_t \sim N(0, \Sigma_\varepsilon), \quad \Sigma_\varepsilon = \begin{pmatrix} I_r & \Theta \\ \Theta' & I_{K-r} \end{pmatrix}, \qquad (4.3)$$
is Gaussian white noise. Here, the parameter $\psi$ satisfies $|\psi| < 1$ and $\Theta$ is an $(r \times (K-r))$ matrix. For the five-dimensional processes considered later, we use either $\Theta = 0$ or

$$\Theta = \begin{cases} (0.4,\ 0.2,\ 0.4,\ 0.2) & \text{for } r = 1, \\[4pt] \begin{pmatrix} 0.4 & 0.2 & 0.4 \\ 0.2 & 0.4 & 0.2 \end{pmatrix} & \text{for } r = 2, \\[4pt] \begin{pmatrix} 0.4 & 0.2 \\ 0.2 & 0.4 \\ 0.4 & 0.2 \end{pmatrix} & \text{for } r = 3, \\[4pt] (0.4,\ 0.2,\ 0.4,\ 0.2)' & \text{for } r = 4. \end{cases}$$

Notice that these choices of $\Theta$ imply that the asymptotically non-zero eigenvalues which enter the test statistics tend to be larger for $\Theta \neq 0$. Thus, the tests may be expected to be more powerful for non-zero $\Theta$ matrices. This feature is worth recalling when the simulation results are evaluated. The error process $u_t$ is a VAR(1) with scalar parameter $\phi$, $|\phi| < 1$, for which we have used different values. Equivalently, we could have written $x_t$ as a VAR(2) process. In (4.2), this process is expressed such that the unit root and short-term properties are easy to disentangle. For $\phi = 0$, $x_t$ is a VAR(1), of course. This type of VAR(1) process was also used by Toda (1994) and subsequently in a number of other simulation studies where properties of cointegrating rank tests were explored (see e.g. Hubrich et al., 2001). Toda argues that this process is useful for investigating the properties of LR tests for the cointegrating rank because other VAR(1) processes can be obtained from it by linear transformations, which leave the tests invariant. Thus, this process allows us to explore the properties of the tests for a wide range of DGPs. We have also used VAR(2) processes because the short-term dynamics play a role in the asymptotic distribution of the $LR(r_0)$ tests if the trend is under-specified. In the following, we will primarily present results for five-dimensional DGPs.

Time series from DGPs with trend orthogonal to the cointegration relations are generated as

$$y_t = \begin{pmatrix} 0 \\ c\,\iota_{K-r} \end{pmatrix} + \begin{pmatrix} \psi I_r & 0 \\ 0 & I_{K-r} \end{pmatrix} y_{t-1} + u_t, \quad y_0 = 0, \qquad (4.4)$$

with $u_t$ as in (4.2). For both types of DGPs, we generated 50 presample values to reduce the effects of initial values. As many presample values are used in the estimation, however, as are required for lagged values, so that T is the net sample size.
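For concreteness, the following R sketch generates one sample path from DGP (4.1)-(4.3) with K = 5. The function name, argument defaults and the simple burn-in handling are our own illustrative choices and not the authors' simulation code; DGP (4.4) differs only in that the drift enters the random-walk components directly.

sim_dgp_trend <- function(Tn, r, psi = 0.5, phi = -0.8, cc = 0.1,
                          Theta = matrix(0, r, 5 - r), burn = 50) {
  K <- 5
  Sig <- rbind(cbind(diag(r), Theta),                 # Sigma_eps of (4.3)
               cbind(t(Theta), diag(K - r)))
  A <- diag(c(rep(psi, r), rep(1, K - r)))            # coefficient matrix of (4.2)
  n <- Tn + burn
  eps <- matrix(rnorm(n * K), n, K) %*% chol(Sig)     # N(0, Sigma_eps) draws
  u <- matrix(0, n, K)
  x <- matrix(0, n, K)                                # x_0 = 0
  for (t in 2:n) {
    u[t, ] <- phi * u[t - 1, ] + eps[t, ]             # VAR(1) errors
    x[t, ] <- A %*% x[t - 1, ] + u[t, ]
  }
  x <- x[(burn + 1):n, ]                              # drop 50 presample values
  x + cc * outer(1:Tn, rep(1, K))                     # y_t = mu_1 t + x_t, mu_0 = 0
}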
4.2. Monte Carlo results

We have generated three- and five-dimensional time series from the two types of DGPs specified in (4.1)-(4.3) and (4.4) with a range of different values for the parameters $\psi$, c, $\phi$, $\Theta$ and cointegrating rank r. We have also used different sample sizes and data-driven VAR order selection in some of our simulations. A small selection of results for five-dimensional processes is presented in Tables 1-3. We only present results with data-driven VAR order selection because in practice the VAR order is unknown. In this context, the AIC is a popular criterion for choosing the lag order and we therefore use it here. The nominal significance level for all tests is 5% because this is the leading case considered in practice. We present only relative frequencies of the ranks selected by the three different procedures. As discussed in Section 3, a practitioner may not consider testing $H_0(K-1)$ if s/he believes that the trend is orthogonal to the cointegration relations, because this would be incompatible with the alternative hypothesis $\mathrm{rk}(\Pi) = K$: the process is then stationary under the alternative hypothesis and cannot have a linear trend in model (3.2) with intercept only, which would contradict the assumption that there is a linear trend in the variables. Therefore, we present results for cointegrating ranks r = 1, 2 and 3 only in Tables 1-3. Note that by construction Procedures 3.1 and 3.2 always lead to identical results for $r_0 = 0$. This property is obviously reflected in the tables. Moreover, in all tables the results of Procedure 3.3 depend very little on the values of the trend slope c. This feature is a consequence of the fact that $LR^*(r_0)$ is similar with respect to the trend slope parameters (see Nielsen and Rahbek, 2000). Thus, the rejection frequencies for c = 0.1 and c = 0.5 are identical apart from sampling variability and data-dependent VAR order selection. In contrast, tests based on $LR(r_0)$ are not similar and, hence, the rank frequencies for Procedures 3.1 and 3.2 may vary for different values of c even if the same VAR orders are used, because both procedures are based partly on $LR(r_0)$. It turns out, however, that the impact of c on Procedures 3.1 and 3.2 is also quite limited in the tables.

The DGP underlying Table 1 is a five-dimensional VAR(2) of the type (4.1)-(4.3) with $\phi = -0.8$ and $\psi = 0.5$. The sample size is T = 100 and the VAR order is chosen by AIC using a maximum order of four.¹ Order selection is based on a VAR model in levels with an intercept term because this appears to be a common approach in practice. We have also used VAR order selection based on trend-adjusted data in other experiments and found qualitatively similar results. For the DGP underlying Table 1, the trend is under-specified in the $LR(r_0)$ tests. Thus, the true cointegrating rank r has to be at least one because otherwise there cannot be a trend in the cointegration relations. Clearly, in this situation Procedure 3.3 is expected to give the best results because it assumes a general trend which is actually present here. On the other hand, Procedures 3.1 and 3.2 allow for uncertainty regarding the deterministic trend. While Procedure 3.3 is indeed superior to Procedure 3.2 for r > 1, a substantial advantage of Procedure 3.3 over Procedure 3.1 cannot be observed for the present settings. In fact, for $\Theta = 0$ and true cointegrating rank r = 1, Procedure 3.3 finds the true cointegrating rank slightly less often than the other two procedures. In this situation, Procedure 3.2 is most successful. On the other hand, Procedure 3.1 leads to a substantially higher success rate than Procedure 3.2 if the true cointegrating rank is 2 or 3.

¹ Generally, we have chosen the maximum VAR order as the integer part of $4(T/100)^{1/4}$, as recommended in some of the related literature (e.g. Schwert, 1989, Demetrescu et al., 2008a). This choice leads, for example, to $p_{max} = 4$ for T = 100 and $p_{max} = 5$ for T = 250.
Table 1. Relative frequencies of cointegrating ranks selected.

                                   Θ = 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0     32.0      32.0      33.6       32.7      32.7      33.5
             1     57.7      61.4      57.3       57.7      60.8      57.4
             2      9.0       5.7       8.0        8.3       5.6       7.9
             3      1.0       0.6       0.8        0.9       0.5       0.8
             4      0.1       0.0       0.1        0.1       0.1       0.1
r = 2        0      2.5       2.5       2.7        2.4       2.4       2.6
             1     40.9      61.4      41.3       42.4      63.7      42.4
             2     50.3      31.7      50.0       48.8      29.3      48.8
             3      5.5       3.8       5.3        5.6       3.9       5.4
             4      0.5       0.3       0.4        0.5       0.3       0.5
r = 3        0      0.1       0.1       0.1        0.1       0.1       0.1
             1      3.4      16.9       3.5        3.7      18.9       3.8
             2     38.6      43.4      38.6       38.8      42.2      38.7
             3     52.3      35.3      52.2       52.2      34.6      52.2
             4      5.0       3.8       4.9        4.6       3.6       4.5

                                   Θ ≠ 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0      8.1       8.1       8.4        8.5       8.5       8.6
             1     77.8      81.1      78.8       77.2      79.9      77.8
             2     12.3       9.4      11.2       12.5      10.2      11.9
             3      1.4       1.0       1.2        1.4       1.1       1.3
             4      0.1       0.1       0.1        0.1       0.1       0.1
r = 2        0      0.9       0.9       0.9        0.9       0.9       1.0
             1     24.9      34.6      25.3       26.1      36.8      26.1
             2     64.6      56.0      64.5       63.5      53.7      63.5
             3      8.4       7.4       8.2        8.3       7.4       8.3
             4      0.8       0.8       0.8        0.8       0.8       0.8
r = 3        0      0.1       0.1       0.1        0.1       0.1       0.1
             1      4.1      13.6       4.1        4.4      15.2       4.5
             2     33.2      29.2      33.3       33.5      28.8      33.5
             3     56.5      51.3      56.4       55.8      50.3      55.8
             4      5.5       5.2       5.4        5.3       4.9       5.3

Notes: 25,000 replications based on time series from DGP (4.1)-(4.3) with K = 5, ψ = 0.5, φ = −0.8, varying c, Θ, r and T = 100 (nominal significance level of 5%, VAR order selected by AIC).
Table 2. Relative frequencies of cointegrating ranks selected.

                                   Θ = 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0      0.1       0.1       0.1        0.2       0.2       0.2
             1     91.2      92.1      92.0       91.5      92.1      92.1
             2      7.8       7.0       7.0        7.6       7.1       7.1
             3      0.6       0.6       0.6        0.5       0.5       0.5
             4      0.0       0.0       0.0        0.0       0.0       0.0
r = 2        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.3      30.7       0.3        0.2      31.7       0.2
             2     92.2      62.2      92.4       92.5      61.4      92.6
             3      6.7       6.4       6.6        6.6       6.3       6.5
             4      0.5       0.5       0.5        0.4       0.4       0.4
r = 3        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.0       0.0       0.0        0.0       0.0       0.0
             2      0.1      21.5       0.1        0.1      22.1       0.1
             3     93.5      72.2      93.5       93.7      71.8      93.7
             4      5.9       5.7       5.9        5.7       5.5       5.7

                                   Θ ≠ 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0      0.1       0.1       0.1        0.1       0.1       0.1
             1     89.0      89.5      89.5       89.0      89.4      89.4
             2      9.9       9.4       9.4        9.7       9.4       9.4
             3      0.8       0.8       0.8        0.9       0.8       0.8
             4      0.0       0.0       0.0        0.0       0.0       0.0
r = 2        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.7       8.6       0.7        0.7       9.0       0.7
             2     90.3      82.7      90.3       90.2      82.2      90.2
             3      8.1       7.8       8.0        8.1       7.7       8.1
             4      0.7       0.7       0.7        0.8       0.8       0.8
r = 3        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.0       0.3       0.0        0.0       0.3       0.0
             2      0.7       2.8       0.7        0.6       2.8       0.6
             3     92.3      89.9      92.3       92.3      89.8      92.3
             4      6.4       6.3       6.4        6.5       6.4       6.5

Notes: 25,000 replications based on time series from DGP (4.1)-(4.3) with K = 5, ψ = 0.5, φ = −0.8, varying c, Θ, r and T = 250 (nominal significance level of 5%, VAR order selected by AIC).
Table 3. Relative frequencies of cointegrating ranks selected.

                                   Θ = 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0     20.5      20.5      33.3       20.1      20.1      32.1
             1     67.1      70.8      58.1       67.5      71.3      59.1
             2     10.7       7.5       7.5       10.8       7.5       7.7
             3      1.3       0.8       0.7        1.1       0.7       0.7
             4      0.1       0.0       0.1        0.1       0.1       0.1
r = 2        0      0.3       0.3       2.1        0.4       0.4       2.1
             1     23.7      30.1      41.5       24.7      31.2      41.2
             2     66.1      62.7      50.6       65.8      62.2      50.9
             3      8.4       6.0       5.1        8.1       5.6       5.1
             4      0.7       0.4       0.4        0.6       0.3       0.4
r = 3        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.2       0.5       1.9        0.3       0.6       1.8
             2     13.4      19.5      37.1       14.7      20.9      37.5
             3     76.0      73.0      55.9       76.5      73.0      55.7
             4      7.5       5.1       4.5        6.9       4.4       4.3

                                   Θ ≠ 0
                          c = 0.1                        c = 0.5
True rank   r0   Proc 3.1  Proc 3.2  Proc 3.3   Proc 3.1  Proc 3.2  Proc 3.3
r = 1        0      2.4       2.4       6.8        2.5       2.5       6.6
             1     81.4      86.2      81.5       81.9      86.7      81.7
             2     13.9       9.9      10.4       13.6       9.6      10.3
             3      1.7       1.0       1.0        1.6       0.9       1.0
             4      0.2       0.0       0.1        0.1       0.1       0.1
r = 2        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      8.3      11.5      20.2        9.3      12.7      20.6
             2     79.1      79.9      72.1       78.8      79.2      71.6
             3     10.9       7.5       6.9       10.4       7.2       6.9
             4      0.9       0.6       0.5        0.9       0.5       0.6
r = 3        0      0.0       0.0       0.0        0.0       0.0       0.0
             1      0.0       0.1       0.4        0.0       0.1       0.4
             2      6.2       9.7      22.6        7.2      10.7      23.4
             3     82.2      82.1      71.0       82.7      82.6      70.2
             4      8.3       5.8       5.4        8.1       5.3       5.4

Notes: 25,000 replications based on time series from DGP (4.4) with K = 5, ψ = 0.5, φ = −0.8, varying c, Θ, r and T = 100 (the nominal significance level of the tests is 5%, VAR order selected by AIC).
For instance, for r = 2 and $\Theta = 0$ the success rates of Procedures 3.1 and 3.2 are 50.3% versus 31.7% for c = 0.1, and 48.8% versus 29.3% if c = 0.5. Because the non-zero asymptotic eigenvalues for processes with $\Theta \neq 0$ tend to be larger than in the $\Theta = 0$ case, one would expect that the tests have more power and, hence, find larger cointegrating ranks more easily. This is in fact the case, as can be seen in the lower half of Table 1. Apart from that, the results are qualitatively the same as in the $\Theta = 0$ case.

In Table 2, results for the same DGPs but with sample size T = 250 are reported, and a substantial improvement regarding the correct choice of the cointegrating rank can be noticed. Obviously, Procedure 3.2 is less successful than the other two procedures in this respect for r > 1. Thus, our results suggest that Procedures 3.1 and 3.3 have a clear advantage in the presently considered situation where a trend is present in the cointegration relations. Both procedures produce very similar success rates in finding the correct cointegrating rank. As mentioned earlier, we have also done simulations with a range of other DGPs of the type (4.1)-(4.3), with and without data-driven VAR order selection. More results are given in the working paper version of this paper (Demetrescu et al., 2008b). They largely confirm the findings reported in Tables 1 and 2. Not surprisingly, if the VAR order is assumed known, all procedures tend to find the true rank more easily. The main finding that Procedure 3.3 is not much better in this respect than Procedure 3.1, if at all, is however maintained. Moreover, Procedure 3.1 may have a substantial lead over Procedure 3.2. If the latter procedure dominates, it is only by a small margin.

Of course, the question arises how the tests behave for processes which in fact have no trend in the cointegration relations. This question is considered next by analysing results obtained for DGP (4.4). Some results based on five-dimensional versions of DGP (4.4) are presented in Table 3. Now both tests (based on $LR(r_0)$ and $LR^*(r_0)$) are in principle applicable and should have their usual asymptotic null distributions because both of them are based on properly specified models under the present conditions. Looking at the frequencies of the ranks chosen in Table 3, it is seen that there is not much difference between Procedures 3.1 and 3.2 in any of the cases. In fact, sometimes Procedure 3.1 is slightly more successful in finding the true cointegrating rank and in other cases the reverse is true. On the other hand, Procedure 3.3 falls substantially behind in some cases, in particular for r > 1 and $\Theta = 0$. This illustrates the loss in power when the deterministic trend term is over-specified. Again, the procedures find the true cointegrating ranks more often when $\Theta \neq 0$, as expected. We have also considered other DGPs without a trend in the cointegration relations. The overall conclusion from looking at processes with trend in the variables but not in the cointegration relations is that using Procedure 3.1 or 3.2 does not result in substantial gains or losses relative to the other one, whereas Procedure 3.3 is now substantially less successful in finding the true cointegrating rank.
Summarizing the results from all of the experiments, the overall conclusion is that there are DGPs for which Procedure 3.1, which is based on the outcome of both tests, finds the true cointegrating rank much more often than the pretest procedure (Procedure 3.2) and Procedure 3.3, which always allows for a general linear trend, whereas in other cases the procedures perform in a very similar way. Thus, a practitioner who has based the decision regarding the cointegrating rank on the outcome of both tests may in fact have done the right thing. In some cases, however, a better decision might have been possible by applying a pretest procedure.
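Schematically, the relative frequencies reported in Tables 1-3 can be produced by a loop of the following form, reusing the hypothetical helpers sketched earlier together with an assumed AIC order-selection helper select_order_aic(); this is our own reconstruction of the exercise, not the authors' code.

rank_frequencies <- function(n_rep, Tn, r, K = 5, ...) {
  # Relative frequencies (in %) of the ranks chosen by Procedure 3.1.
  p_max <- floor(4 * (Tn / 100)^0.25)     # maximum VAR order, cf. footnote 1
  picks <- replicate(n_rep, {
    y <- sim_dgp_trend(Tn, r, ...)
    p <- select_order_aic(y, p_max)       # assumed AIC helper
    choose_rank_31(y, p, K)
  })
  100 * table(factor(picks, levels = 0:K)) / n_rep
}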
5. CONCLUSIONS

In this study, we have compared three procedures for choosing the cointegrating rank of a VECM when the variables have a deterministic linear time trend of unknown form. In that case, there is a choice of two LR tests for the cointegrating rank: the first one allows for a trend not only in the variables but also in the cointegration relations, whereas the second one assumes that the linear trend is orthogonal to the cointegration relations. If there is no linear trend in the cointegration relations (i.e. the linear trend is orthogonal to the cointegration relations), then the second test is preferable because it may be more powerful than the first one. We have derived the asymptotic distribution of the second test if there is actually a linear trend in the cointegration relations and, hence, the test is based on a misspecified model. Unfortunately, in this case the limiting distribution depends in a complicated way on nuisance parameters. It turns out, however, that if the deterministic trend term is under-specified the test tends to be conservative.

Taking into account the asymptotic properties of the tests, three procedures for choosing the cointegrating rank of a VAR process when there is uncertainty regarding the type of linear trend suggest themselves and have been used in practice: (1) apply both tests and reject any rank for which one of the tests rejects the null hypothesis; (2) perform a pretest for the deterministic trend and choose the test for the cointegrating rank on the basis of the outcome of the pretest; and (3) use only the test which allows for a general linear trend. Although it is not always fully clear which procedure has actually been used in a particular study, all three possibilities appear to have been used in the literature. Given our theoretical results regarding the properties of the test which ignores a trend in the cointegration relations and, hence, may be applied to a misspecified model, all three procedures have an asymptotic justification.

We have performed a Monte Carlo study to investigate the small sample properties of the three procedures. In our simulations, the first procedure, which is based on the outcome of both tests, is overall preferable. It tends to find the true cointegrating rank much more often than the pretest procedure and the procedure based only on the general trend specification for some of the processes we have considered. Moreover, in those cases where the pretest procedure or the procedure which always allows for a general trend dominates, they usually have only a small lead over the first procedure. Therefore, based on our simulation results, the first procedure can be recommended.

Unfortunately, the LR tests for the cointegrating rank are known to have poor power for processes with large dimension and/or order. Therefore, the three procedures may not find the true cointegrating rank very often in such extreme situations, which do arise in practice, however. Without a reasonably large sample size, finding the true cointegrating rank of a large VAR process cannot be expected. Considering also lower-dimensional subsystems and building up a higher-dimensional model by taking into account the cointegration relations from the lower-dimensional analysis may be worthwhile in this case. This type of specific-to-general specification procedure may in fact be a good strategy more generally when cointegrated variables are considered (e.g. Lütkepohl, 2007).
ACKNOWLEDGMENTS

The research for this paper was carried out while the first author was a Max Weber Fellow and the third author was a Fernand Braudel Fellow at the European University Institute in Florence. The third author also acknowledges financial support from the Academy of Finland and the Okobank Group Research
Foundation. Earlier versions of this paper were presented at various conferences and seminars. We are grateful to the participants of these events and, in particular, to Uwe Hassler, two referees and Pierre Perron, the editor, for useful comments on earlier versions of this paper.
REFERENCES

Breitung, J., R. Brüggemann and H. Lütkepohl (2004). Structural vector autoregressive modeling and impulse responses. In H. Lütkepohl and M. Krätzig (Eds.), Applied Time Series Econometrics, 159–96. Cambridge: Cambridge University Press.
Brüggemann, I. (2003). Measuring monetary policy in Germany: a structural vector error correction approach. German Economic Review 4, 307–39.
Coenen, G. and J. L. Vega (2001). The demand for M3 in the euro area. Journal of Applied Econometrics 16, 727–48.
Crowder, W. J. and D. L. Hoffman (1996). The long-run relationship between nominal interest rates and inflation: the Fisher equation revisited. Journal of Money, Credit and Banking 28, 102–18.
Demetrescu, M., V. Kuzin and U. Hassler (2008a). Long memory testing in the time domain. Econometric Theory 24, 176–215.
Demetrescu, M., H. Lütkepohl and P. Saikkonen (2008b). Testing for the cointegrating rank of a vector autoregressive process with uncertain deterministic trend term. Working Paper ECO 2008/24, European University Institute.
Doornik, J. A., D. F. Hendry and B. Nielsen (1998). Inference in cointegrating models: UK M1 revisited. Journal of Economic Surveys 12, 533–72.
Ericsson, N. R. and S. Sharma (1998). Broad money demand and financial liberalization in Greece. Empirical Economics 23, 417–36.
Funke, M. and J. Rahn (2005). Just how undervalued is the Chinese renminbi? The World Economy 28, 465–89.
Hansen, H. and K. Juselius (1994). CATS for RATS4: Manual to Cointegration Analysis to Time Series. Evanston, IL: Estima.
Harvey, D. I., S. J. Leybourne and A. M. R. Taylor (2009). Unit root testing in practice: dealing with uncertainty over the trend and initial condition. Econometric Theory 25, 587–636.
Hjelm, G. and M. W. Johansson (2005). A Monte Carlo study on the pitfalls in determining deterministic components in cointegrating models. Journal of Macroeconomics 27, 691–703.
Hubrich, K. (2001). Cointegration Analysis in a German Monetary System. Heidelberg: Physica-Verlag.
Hubrich, K., H. Lütkepohl and P. Saikkonen (2001). A review of systems cointegration tests. Econometric Reviews 20, 247–318.
Johansen, S. (1992). Determination of cointegration rank in the presence of a linear trend. Oxford Bulletin of Economics and Statistics 54, 383–97.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Lettau, M. and S. Ludvigson (2001). Consumption, aggregate wealth, and expected stock returns. Journal of Finance 56, 815–49.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag.
Lütkepohl, H. (2007). General-to-specific or specific-to-general modelling? An opinion on current econometric terminology. Journal of Econometrics 136, 319–24.
Nielsen, B. and A. Rahbek (2000). Similarity issues in cointegration analysis. Oxford Bulletin of Economics and Statistics 62, 5–22.
Pantula, S. G. (1989). Testing for unit roots in time series data. Econometric Theory 5, 256–71.
Perron, P. (1988). Trends and random walks in macroeconomic time series: further evidence from a new approach. Journal of Economic Dynamics and Control 12, 297–332.
Perron, P. and J. Y. Campbell (1993). A note on Johansen's cointegration procedure when trends are present. Empirical Economics 18, 777–89.
Perron, P. and T. Yabu (2009). Estimating deterministic trends with an integrated or stationary noise component. Journal of Econometrics 151, 56–69.
Peytrignet, M. and C. Stahel (1998). Stability of money demand in Switzerland: a comparison of the M2 and M3 cases. Empirical Economics 23, 437–54.
Phillips, P. C. B. (1987). Time series regression with a unit root. Econometrica 55, 277–301.
Phillips, P. C. B. and P. Perron (1988). Testing for a unit root in time series regression. Biometrika 75, 335–46.
Ribba, A. (2006). The joint dynamics of inflation, unemployment and interest rate in the United States since 1980. Empirical Economics 31, 497–511.
Saikkonen, P. and H. Lütkepohl (1999). Local power of likelihood ratio tests for the cointegrating rank of a VAR process. Econometric Theory 15, 50–78.
Saikkonen, P. and H. Lütkepohl (2000). Testing for the cointegrating rank of a VAR process with an intercept. Econometric Theory 16, 373–406.
Schwert, G. W. (1989). Tests for unit roots: a Monte Carlo investigation. Journal of Business and Economic Statistics 7, 147–59.
Stephan, S. (2006). German exports to the euro area. Empirical Economics 31, 871–82.
Toda, H. Y. (1994). Finite sample properties of likelihood ratio tests for cointegrating ranks when linear trends are present. Review of Economics and Statistics 76, 66–79.
APPENDIX: THE LIMITING DISTRIBUTION OF LR(r)

We derive here the asymptotic distribution of LR(r) when there is actually a trend in the cointegration relations and, hence, the model underlying LR(r) is misspecified. As a by-product, we also prove Proposition 3.1. We use the notation and model set-up of Section 2. Moreover, the space of right-continuous functions on the interval [0, 1] which have left limits is denoted by D[0, 1], and weak convergence on D[0, 1] with respect to the uniform topology is denoted by $\Rightarrow$. Convergence in probability is signified by $\xrightarrow{p}$. Furthermore, $o_p(\cdot)$ and $O_p(\cdot)$ are the usual symbols for stochastic sequences which converge to zero or are bounded, respectively.

The test statistic $LR(r_0)$ is made up of the ordered eigenvalues $\lambda_1 \geq \cdots \geq \lambda_K \geq 0$ of the matrix $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$. Let $S(\lambda)$ abbreviate $\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}$; as in Johansen (1995), the eigenvalues can alternatively be computed as solutions to the determinant equation

$$\det(S(\lambda)) = \det\left(\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}\right) = 0. \qquad (A.1)$$

In the remainder of the Appendix, we assume that $r_0 = r$; thereby we exclude the case of a stationary $x_t$, i.e. r = K. In that case, the observed series $y_t$ would have exactly one non-stationary component, which is trend stationary. In the univariate stationary case, Perron (1988) showed that the corresponding unit root test statistic collapses to 0 asymptotically. By extending his arguments to our multivariate setting, one could conclude that not only would the smallest eigenvalue $\lambda_K$ converge to 0, but the test statistic LR(K − 1) itself would vanish asymptotically. We do not give a rigorous discussion of the stationary case here, to save space.
To obtain the limiting distribution of the test statistic LR(r) under the conditions of Proposition 3.1, we follow the pattern in Johansen (1995, pp. 158–60) with appropriate modifications. First, we have to transform the matrix $S(\lambda)$ in a suitable manner. To this end, recall from equation (2.1) that $y_t = \mu_0 + \mu_1 t + x_t$, where $x_t$ is a zero mean cointegrated VAR(p) process with cointegrating rank r. Let $\beta$ $(K \times r)$ be a matrix of cointegrating vectors with orthogonal columns. Proposition 3.1 assumes that $\beta'\mu_1 \neq 0$ and, postmultiplying $\beta$ by a suitable orthogonal matrix, we can transform this matrix to the form $[\beta_1 : \beta_2]$, where $\beta_1$ $(K \times (r-1))$ and $\beta_2$ $(K \times 1)$ have the properties $\beta_1'\mu_1 = 0$ and $\beta_2'\mu_1 \neq 0$, which will henceforth be assumed. Define the matrix $\eta = [\beta_\perp : \beta_2]$ $(K \times (K-r+1))$. The columns of the matrix $\eta$ are orthogonal and we can find a non-singular matrix $\xi$ such that $\eta\xi = [\gamma : \beta_2]$, where $\gamma$ $(K \times (K-r))$ satisfies $\gamma'\mu_1 = 0$. The last column of $\xi$ is a vector with last component unity and all other components zero, and the first $K - r$ columns of $\xi$ can be taken as $(\eta'\mu_1)_\perp$. It is straightforward to check that the matrix $\gamma'\beta_\perp$ is non-singular.

Now consider weak convergence of the process $T^{-1/2} y_{[Ts]}$, $s \in [0, 1]$, in the directions of the matrices $\gamma$ and $\beta_2$. We use the notation $\bar\gamma = \gamma(\gamma'\gamma)^{-1}$ (similarly for any matrix of full column rank). By Granger's representation theorem (e.g. Johansen, 1995, Theorem 4.2), $x_t$ can be expressed as

$$x_t = C \sum_{j=1}^{t} u_j + \Psi(L) u_t + A, \qquad (A.2)$$

where $C = \beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}\alpha_\perp'$ with $\Gamma = I_K - \Gamma_1 - \cdots - \Gamma_{p-1}$, $\Psi(L) = \sum_{j=0}^{\infty}\Psi_j L^j$ with L the lag operator and the coefficient matrices $\Psi_j$ decaying to zero exponentially fast, and A depends on initial values and satisfies $\beta' A = 0$. Thus, because $\gamma'\mu_1 = 0$, equations (2.1) and (A.2) yield

$$T^{-1/2}\bar\gamma' y_{[Ts]} = T^{-1/2}\bar\gamma' C \sum_{j=1}^{[Ts]} u_j + o_p(1), \qquad (A.3)$$

where the latter term on the right-hand side is $o_p(1)$ in D[0, 1]. Denoting $\tau = (\beta_2'\mu_1)^{-1}\beta_2$ and observing that $\tau'\mu_1 = 1$ and $\tau' C = 0$, we can write

$$T^{-1}\tau' y_{[Ts]} = \frac{[Ts]}{T} + o_p(1), \qquad (A.4)$$

with the latter term on the right-hand side again $o_p(1)$ in D[0, 1]. These results can be justified in the same way as their counterparts in the proof of Lemma 10.2 of Johansen (1995) (or by using Theorem B.13 of the same reference). In the same way as in Johansen's (1995) Lemma 10.2, we find that

$$T^{-1/2}\bar\gamma' y_{[Ts]} \Rightarrow \bar\gamma' C W(s) \quad \text{and} \quad T^{-1}\tau' y_{[Ts]} \Rightarrow s,$$

where W(s) is a Brownian motion with covariance matrix $\Sigma_u$. Using the matrix $B_T = [\bar\gamma : T^{-1/2}\tau]$, we can proceed as in Johansen's (1995) Lemma 10.2 and obtain

$$T^{-1/2} B_T'(y_{[Ts]} - \bar y) \Rightarrow G(s) \overset{def}{=} G_0(s) - \bar G_0 = \begin{pmatrix} \bar\gamma' C(W(s) - \bar W) \\ s - \frac{1}{2} \end{pmatrix}, \qquad (A.5)$$

where $\bar G_0 = \int_0^1 G_0(s)\,ds$ and analogously for $\bar W$. Similarly to Johansen (1995, p. 158), we now introduce the transformation matrix $A_T = [\beta_1 : T^{-1/2} B_T]$ and transform the generalized eigenvalue problem (A.1) used to obtain the test statistic LR(r). Instead of $\det(S(\lambda))$, we can consider $\det(A_T' S(\lambda) A_T)$ and its weak limit. To this end, we need some notation. As in Johansen (1995, p. 141), we define

$$\mathrm{Cov}\left(\begin{pmatrix} \Delta x_t \\ \beta' x_{t-1} \end{pmatrix} \,\Big|\, \Delta x_{t-1}, \ldots, \Delta x_{t-p+1}\right) = \begin{pmatrix} \Sigma_{00} & \Sigma_{0\beta} \\ \Sigma_{\beta 0} & \Sigma_{\beta\beta} \end{pmatrix}.$$

By $\Sigma_{0\beta_1}$ $(K \times (r-1))$ we denote the submatrix of $\Sigma_{0\beta}$ $(K \times r)$ obtained by deleting the last column, so that $\Sigma_{0\beta_1}$ is the conditional covariance matrix between $\Delta x_t$ and $\beta_1' x_{t-1}$, given $\Delta x_{t-1}, \ldots, \Delta x_{t-p+1}$.
Similarly, $\Sigma_{\beta\beta_1}$ $(r \times (r-1))$ is used for the matrix obtained by deleting the last column from $\Sigma_{\beta\beta}$ $(r \times r)$, and $\Sigma_{\beta_1\beta_1}$ $((r-1) \times (r-1))$ signifies the matrix obtained by deleting the last row and last column from $\Sigma_{\beta\beta}$. Properties of these matrices are given in the following lemma.

LEMMA A.1.

$$\Sigma_{0\beta_1} = \alpha\Sigma_{\beta\beta_1}, \qquad (A.6)$$

$$\Sigma_{00} = \alpha\Sigma_{\beta\beta}\alpha' + \Sigma_u \qquad (A.7)$$

and

$$\Sigma_{00}^{-1} - \Sigma_{00}^{-1}\Sigma_{0\beta_1}\left(\Sigma_{\beta_1 0}\Sigma_{00}^{-1}\Sigma_{0\beta_1}\right)^{-1}\Sigma_{\beta_1 0}\Sigma_{00}^{-1} = a(a'\Sigma_{00}a)^{-1}a', \qquad (A.8)$$

where $a = (\alpha\Sigma_{\beta\beta_1})_\perp$ $(K \times (K-r+1))$.
Proof: By straightforward extension of Lemma 10.1 of Johansen (1995).
A more detailed proof of Lemma A.1, as well as of Lemma A.2 below (containing further intermediate results), can be found in the working paper version of this paper (Demetrescu et al., 2008b).

LEMMA A.2. Under the conditions of Proposition 3.1,

$$S_{00} \xrightarrow{p} \Sigma_{00}, \qquad (A.9)$$

$$\beta_1' S_{11}\beta_1 \xrightarrow{p} \Sigma_{\beta_1\beta_1}, \qquad (A.10)$$

$$\beta_1' S_{10} \xrightarrow{p} \Sigma_{\beta_1 0}, \qquad (A.11)$$

$$T^{-1} B_T' S_{11} B_T \Rightarrow \int_0^1 GG'\,ds, \qquad (A.12)$$

$$B_T'(S_{10} - S_{11}\beta\alpha') \Rightarrow \int_0^1 G\,dW', \qquad (A.13)$$

$$B_T' S_{11}\beta_1 = O_p(1), \qquad (A.14)$$

$$B_T' S_{10} = O_p(1). \qquad (A.15)$$
Proof: By straightforward extension of Lemma 10.3 of Johansen (1995).

Now consider the determinant

$$\det(A_T' S(\lambda) A_T) \Rightarrow \det\begin{pmatrix} \lambda\Sigma_{\beta_1\beta_1} - \Sigma_{\beta_1 0}\Sigma_{00}^{-1}\Sigma_{0\beta_1} & 0 \\ 0 & \lambda\int_0^1 GG'\,ds \end{pmatrix} = \det\left(\lambda\Sigma_{\beta_1\beta_1} - \Sigma_{\beta_1 0}\Sigma_{00}^{-1}\Sigma_{0\beta_1}\right)\det\left(\lambda\int_0^1 GG'\,ds\right).$$
The weak convergence can be justified by using the definitions and (A.9)–(A.12) (cf. (11.16) in Johansen, 1995, p. 158). Setting the limit equal to zero, it is seen that there are K − r + 1 zero roots and r − 1 positive
roots given by the solutions of

$$\det\left(\lambda\Sigma_{\beta_1\beta_1} - \Sigma_{\beta_1 0}\Sigma_{00}^{-1}\Sigma_{0\beta_1}\right) = 0.$$

Thus, the r − 1 largest roots of (A.1) converge weakly to the roots of this equation and the rest converge weakly to zero. This proves Proposition 3.1.

To derive an explicit expression for the limiting distribution of LR(r), we can now follow arguments entirely similar to those starting at the top of p. 159 of Johansen (1995). First consider the decomposition

$$\det\left([\beta_1 : B_T]' S(\lambda) [\beta_1 : B_T]\right) = \det\left(\beta_1' S(\lambda)\beta_1\right)\det\left(B_T'\left\{S(\lambda) - S(\lambda)\beta_1\left(\beta_1' S(\lambda)\beta_1\right)^{-1}\beta_1' S(\lambda)\right\} B_T\right) \qquad (A.16)$$

and let $T \to \infty$ and $\lambda \to 0$ in such a way that $\rho = T\lambda$ is fixed. Using (A.9)–(A.11) from Lemma A.2 and arguments similar to those in Johansen (1995, p. 159), it can be seen that in order to derive the asymptotic distribution of the $\rho$'s it suffices to consider the second factor on the right-hand side of (A.16). For this factor, (A.9), (A.11), (A.14) and (A.15) yield

$$B_T' S(\lambda)\beta_1 = \rho T^{-1} B_T' S_{11}\beta_1 - B_T' S_{10} S_{00}^{-1} S_{01}\beta_1 = -B_T' S_{10}\Sigma_{00}^{-1}\Sigma_{0\beta_1} + o_p(1)$$

and

$$B_T' S(\lambda) B_T = \rho T^{-1} B_T' S_{11} B_T - B_T' S_{10} S_{00}^{-1} S_{01} B_T = \rho T^{-1} B_T' S_{11} B_T - B_T' S_{10}\Sigma_{00}^{-1} S_{01} B_T + o_p(1).$$

Using these results, we find that (cf. Johansen, 1995, p. 159)

$$B_T'\left\{S(\lambda) - S(\lambda)\beta_1\left(\beta_1' S(\lambda)\beta_1\right)^{-1}\beta_1' S(\lambda)\right\} B_T = \rho T^{-1} B_T' S_{11} B_T - B_T' S_{10} N_1 S_{01} B_T + o_p(1),$$

where

$$N_1 = \Sigma_{00}^{-1} - \Sigma_{00}^{-1}\Sigma_{0\beta_1}\left(\Sigma_{\beta_1 0}\Sigma_{00}^{-1}\Sigma_{0\beta_1}\right)^{-1}\Sigma_{\beta_1 0}\Sigma_{00}^{-1}.$$

From (A.8), it is known that this matrix can be expressed as $N_1 = a(a'\Sigma_{00}a)^{-1}a'$, where $a$ $(K \times (K-r+1))$ is an orthogonal complement of $\alpha\Sigma_{\beta\beta_1}$ $(K \times (r-1))$. It is easy to see that $a = [\alpha_\perp : \bar\alpha\kappa]$, where $\kappa$ $(r \times 1)$ is an orthogonal complement of $\Sigma_{\beta\beta_1}$.

Thus, we have reduced the problem to investigating the weak limit of the roots of

$$\det\left(\rho T^{-1} B_T' S_{11} B_T - B_T' S_{10}\,a(a'\Sigma_{00}a)^{-1}a'\,S_{01} B_T\right) = 0. \qquad (A.17)$$

First, we use (A.12) to conclude that $T^{-1} B_T' S_{11} B_T \xrightarrow{w} \int_0^1 GG'\,ds$. Next, we consider the matrix

$$B_T' S_{10}\,a = B_T'(S_{1u} + S_{11}\beta\alpha')a \overset{def}{=} B_T' S_{1v}\,a,$$

where $S_{1v}$ is the sample moment matrix between $y_{t-1}$ and the stationary process $v_t = u_t + \alpha\beta' x_{t-1}$, corrected for $(1, \Delta y_{t-1}, \ldots, \Delta y_{t-p+1})$. The reason why we can define $v_t$ by using $x_{t-1}$ instead of $y_{t-1}$ is that $v_t$ can be obtained from the error correction form (2.3) by writing $\Pi^{(2)} y_{t-1}^{(2)} = \alpha\beta'(y_{t-1} - \mu_0 - \mu_1(t-1)) + \alpha\beta'\mu_0 = \alpha\beta' x_{t-1} + \alpha\beta'\mu_0$ and including $\alpha\beta'\mu_0$ in the intercept term. Because mean-corrected series are used, the change in the intercept term has no effect.
To derive the weak limit of $B_T' S_{10}a = B_T' S_{1v}a$, we conclude from (A.2) that the process $v_t$ has the linear representation

$$v_t = u_t + \alpha\beta'\Psi(L)u_{t-1} \overset{def}{=} u_t + w_t,$$

so that, with obvious notation,

$$B_T' S_{1v}\,a = B_T' S_{1u}\,a + B_T' S_{1w}\,a. \qquad (A.18)$$

Since $B_T'(S_{10} - S_{11}\beta\alpha') = B_T' S_{1u}$, we find from (A.13) that

$$B_T' S_{1u}\,a \Rightarrow \int_0^1 G\,dW'\,a. \qquad (A.19)$$
For examining $B_T' S_{1w}a$, the other component of $B_T' S_{1v}a$, we denote by $z_t$ the vector $(\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}')'$; let also $S_{a_{-i}b_{-j}}$ stand for the sample covariance matrix of any two time series $a_{t-i}$ and $b_{t-j}$. Then,

$$B_T' S_{1w}\,a = B_T' S_{y_{-1}w}\,a - B_T' S_{y_{-1}z} S_{zz}^{-1} S_{zw}\,a. \qquad (A.20)$$
For the first term on the right-hand side, we can use the definition of $w_t$ and Theorem B.13 of Johansen (1995) to obtain

$$B_T' S_{y_{-1}w}\,a \Rightarrow \int_0^1 G\,dW'\,\Psi(1)'\beta\alpha'\,a + \begin{bmatrix} \sum_{k=1}^{\infty}\mathrm{Cov}(\Delta y_t, w_{t+k}') \\ 0 \end{bmatrix} a \overset{def}{=} \int_0^1 G\,dW'\,\Psi(1)'\beta\alpha'\,a + \Sigma_{yw}\,a, \qquad (A.21)$$

where the zero is $(1 \times K)$ and $\mathrm{Cov}(\Delta y_t, w_{t+k}')$ on the right-hand side can be expressed by using the parameters in the linear representations of the processes $\Delta y_t$ and $w_t$. A complication in the second term on the right-hand side of (A.20) is that the covariance matrix $S_{zw}$ is not of order $O_p(T^{-1/2})$, as it is when we have $u_t$ in place of $w_t$ (cf. Johansen, 1995, p. 148). Therefore, the second term does not vanish. Because $z_t$ and $w_t$ are jointly stationary and ergodic processes, a law of large numbers and the fact that $B_T' S_{y_{-1}z} = O_p(1)$ (to be justified shortly) give
(A.22)
where zz = Cov(zt ) and zw = Cov(zt , wt ). Regarding the matrix BT Sy−1 z , consider BT Sy−1 y−j and conclude from (2.1), (A.2) and Theorem B.13 of Johansen (1995) that ⎡ ∞ ⎤ 1 Cov(y , y ) t t−j +k k=1 ⎦ GdW C + ⎣ BT Sy−1 y−j ⇒ 0 0 1 def = GdW C + yy−j , 0
where the zero is (1 × K). Thus, it follows that 1 GdW C + yy−1 : · · · : BT Sy−1 z ⇒ 0 1
0
GdW C J + yz ,
=
1
GdW C + yy−p+1 (A.23)
0 C The Author(s). Journal compilation C Royal Economic Society 2009.
where $J = [I_K : \cdots : I_K]$ $(K \times K(p-1))$ and $\Sigma_{yz} = [\Sigma_{yy_{-1}} : \cdots : \Sigma_{yy_{-p+1}}]$ $(K \times K(p-1))$. Note that the last row of $\Sigma_{yz}$ is zero and the autocovariances in $\Sigma_{yz}$ can again be expressed by using the parameters in the linear representation of the process $\Delta y_t$. Combining (A.20), (A.21), (A.22) and (A.23), we get

$$B_T' S_{1w}\,a \Rightarrow \int_0^1 G\,dW'\,\Psi(1)'\beta\alpha'\,a + \Sigma_{yw}\,a - \left(\int_0^1 G\,dW'\,C'J' + \Sigma_{yz}\right)\Sigma_{zz}^{-1}\Sigma_{zw}\,a,$$

which in conjunction with (A.18) and (A.19) gives

$$B_T' S_{1v}\,a \Rightarrow \int_0^1 G\,dW'\,a + \int_0^1 G\,dW'\,\Psi(1)'\beta\alpha'\,a + \Sigma_{yw}\,a - \left(\int_0^1 G\,dW'\,C'J' + \Sigma_{yz}\right)\Sigma_{zz}^{-1}\Sigma_{zw}\,a \overset{def}{=} \int_0^1 G\,dW'\,a + \Phi(G, \Sigma_{yw}, \Sigma_{yz}). \qquad (A.24)$$

Now recall that we need to study the weak limit of the roots of (A.17). By (A.12), (A.24) and the identity $B_T' S_{10}\,a = B_T' S_{1v}\,a$, we get

$$\rho T^{-1} B_T' S_{11} B_T - B_T' S_{10}\,a(a'\Sigma_{00}a)^{-1}a'\,S_{01} B_T \Rightarrow \rho\int_0^1 GG'\,ds - \left(\int_0^1 G\,dW'\,a + \Phi\right)(a'\Sigma_{00}a)^{-1}\left(\int_0^1 G\,dW'\,a + \Phi\right)', \qquad (A.25)$$

where we have written $\Phi$ for $\Phi(G, \Sigma_{yw}, \Sigma_{yz})$. The limit is a square matrix of order K − r + 1. Set the determinant of the limit equal to zero and let $\rho_1 \geq \cdots \geq \rho_{K-r+1} \geq 0$ be the ordered roots. Then we get the following result.

PROPOSITION A.1. Under the conditions of Proposition 3.1,

$$LR(r) \Rightarrow \sum_{i=1}^{K-r+1} \rho_i.$$
It is seen that the limiting distribution depends on a number of nuisance parameters and, although some simplifications may be achieved, this dependence appears complicated. For instance, the Brownian motion $\bar\gamma' C W(s)$ in the definition of G(s) can be transformed to the standard Brownian motion $B_1(s) = (\bar\gamma' C\Sigma_u C'\bar\gamma)^{-1/2}\bar\gamma' C W(s)$ without changing the limiting distribution of $LR(r_0)$ (cf. Johansen, 1995, p. 160). However, since this transformation changes the matrices $\Sigma_{yw}$ and $\Sigma_{yz}$ by premultiplying their first K − r rows by $(\bar\gamma' C\Sigma_u C'\bar\gamma)^{-1/2}$, the resulting simplification (if any) may not be great. Making an analogous transformation from W(s) to $B_2(s)$ to deal with the term $dW'$ in (A.25) (see Johansen, 1995, p. 160) does not work in our case because in place of the matrix $a'\Sigma_u a$ we have $a'\Sigma_{00}a$. Also, since $a = [\alpha_\perp : \bar\alpha\kappa]$, we can write $\Psi(1)'\beta\alpha' a = \Psi(1)'\beta[0 : \kappa]$ and (potentially) achieve a small simplification in the definition of $\Phi$. However, it seems that any major simplifications are not possible because it is, for instance, unlikely that the effect of the complicated 'second-order bias' terms $\Sigma_{yw}$ and $\Sigma_{yz}$ could be totally eliminated. Finally, note that the limiting distribution could be derived without using the decomposition of $B_T' S_{10}a = B_T' S_{1v}a$ given in (A.18). The given derivation shows better, however, how and why the resulting limiting distribution differs from its counterparts obtained for the corresponding correctly specified models.
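As a numerical illustration of the practical content of these results (namely, that LR(r0) remains bounded when the true rank is r0 + 1 and the trend enters the cointegration relations, while LR*(r0) diverges), one can combine the hypothetical sketches given after Sections 2-4; the output format and parameter values are ours.

# With r = 2 and a trend in the cointegration relations, LR(1) should stay
# bounded as T grows while LR*(1) diverges; illustrative only, using the
# assumed helpers sim_dgp_trend() and trace_stat() from earlier sketches.
set.seed(1)
for (Tn in c(100, 400, 1600)) {
  y <- sim_dgp_trend(Tn, r = 2, cc = 0.5)
  cat(sprintf("T = %4d   LR(1) = %8.1f   LR*(1) = %8.1f\n",
              Tn, trace_stat(y, p = 2, r0 = 1),
              trace_stat(y, p = 2, r0 = 1, trend_in_ci = TRUE)))
}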
Econometrics Journal (2009), volume 12, pp. 436–446. doi: 10.1111/j.1368-423X.2009.00294.x
Stationarity of a family of GARCH processes

JI-CHUN LIU†

†School of Mathematical Science, Xiamen University, Xiamen, 361005, PR China
E-mail: [email protected]

First version received: December 2006; final version accepted: June 2009

Summary  A natural generalization of the first-order GARCH processes family introduced in 1999, allowing for an influence of higher-order past errors and conditional variances on the current conditional variance equation, is proposed. This new family of GARCH processes includes many well-known GARCH processes. A sufficient and necessary condition for the existence of a stationary solution of the family of GARCH processes is given. In particular, we prove the stationarity of the so-called family of IGARCH processes.

Keywords: GARCH processes family, Strict stationarity, Integrated GARCH.
1. INTRODUCTION He and Ter¨asvirta (1999) defined the following general class of the GARCH(1,1) model: εt = zt ht ,
hδt = g1 (zt−1 ) + c1 (zt−1 )hδt−1 ,
(1.1)
where δ > 0, {z t } is a sequence of i.i.d. non-degenerate random variables with mean zero and it is assumed that z t is independent of ε t−1 , ε t−2 , . . . , and that g 1 (x) is a positive function and c 1 (x) is a non-negative function. As indicated by these authors, this family of GARCH processes includes the GARCH(1,1) model of Bollerslev (1986), the absolute value GARCH(1,1) model of Taylor (1986) and of Schwert (1989), the asymmetric GJR-GARCH(1,1) model of Ding et al. (1993), the non-linear GARCH(1,1) model of Engle (1990), the volatility switching GARCH(1,1) model of Fornari and Mele (1997), the threshold GARCH(1,1) model of Zako¨ıan (1994), the 4NLGMACH(1,1) model of Yang and Bewley (1995), the generalized quadratic ARCH(1,1) model of Sentana (1995), etc. Some of the structural properties of the family of GARCH processes are discussed in He and Ter¨asvirta (1999), Ling and McAleer (2002) and Liu (2006). He and Ter¨asvirta (1999) gave the conditions for the existence of mδ-order moments of the family of first-order GARCH processes, where m is a positive integer and δ = 1 or 2. A sufficient condition for the strict stationarity of the family of first-order GARCH processes with finite αδ-order moment is established in Ling and McAleer (2002), where α ∈ (0, 1]. Liu (2006) gave a sufficient and necessary condition for the strict stationarity of the family of GARCH processes and also derived some simple conditions for the existence of the moments of the family of GARCH processes. Moreover, Liu (2006) described the tail of the marginal distribution of the family of GARCH processes. C The Author(s). Journal compilation C Royal Economic Society 2009. Published by Blackwell Publishing Ltd, 9600 Garsington Road,
To derive general results, we introduce a natural generalization of the first-order GARCH processes family defined by equation (1.1) that allows for an influence of higher-order past errors and conditional variances on the current conditional variance. The data-generating process can be written as
$$\varepsilon_t = z_t h_t, \qquad h_t^\delta = g(z_{t-1}, \ldots, z_{t-s}) + \sum_{k=1}^{r} c_k(z_{t-k})\, h_{t-k}^\delta, \qquad (1.2)$$
where $g(z,t,s) = g(z_{t-1}, \ldots, z_{t-s})$ is a strictly positive function and $c_k(x)$, $k = 1, \ldots, r$, are all non-negative functions. This new family of GARCH processes includes the following special cases (a simulation sketch follows at the end of this section):

• The GARCH(p,q) model (Bollerslev, 1986) for δ = 2, $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + \alpha_k z_{t-k}^2$, k = 1, …, r, where r = max{p, q}, $\alpha_i = 0$ for i > p and $\beta_j = 0$ for j > q.

• The absolute value GARCH(p,q) (AVGARCH(p,q)) model of Taylor (1986, pp. 78–79) and Schwert (1989) for δ = 1, $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + \alpha_k |z_{t-k}|$, k = 1, …, r, with r, $\alpha_i$ and $\beta_j$ as above.

• The volatility switching GARCH (VS-GARCH) model (Fornari and Mele, 1997) for δ = 2, $g(z,t,s) = \alpha_0 + \sum_{k=1}^{s} \gamma_k\, \mathrm{sgn}(z_{t-k})$ and $c_k(z_{t-k}) = \beta_k + \alpha_k z_{t-k}^2$, k = 1, …, r, with r, $\alpha_i$ and $\beta_j$ as above.

• The GJR-GARCH(p,q) model (Glosten et al., 1993) for δ = 2, $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + (\alpha_k + \omega_k I(z_{t-k})) z_{t-k}^2$, where $I(z_{t-k}) = 1$ if $z_{t-k} < 0$ and $I(z_{t-k}) = 0$ otherwise, k = 1, …, r, with r, $\alpha_i$ and $\beta_j$ as above.

• The non-linear GARCH(p,q) (NLGARCH(p,q,δ)) model (Engle, 1990), with r, $\alpha_i$ and $\beta_j$ as above. If δ = 2, then $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + \alpha_k (1 - 2\eta\,\mathrm{sgn}(z_{t-k}) + \eta^2) z_{t-k}^2$; if δ = 1, then $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + \alpha_k (1 - 2\eta\,\mathrm{sgn}(z_{t-k}) + \eta^2)|z_{t-k}|$. Similarly, the asymmetric power ARCH(p,q) (A-PARCH(p,q)) model (Ding et al., 1993) obtains for δ > 0, $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + \alpha_k (1 - 2\eta\,\mathrm{sgn}(z_{t-k}) + \eta^2)|z_{t-k}|^\delta$.

• The threshold GARCH(p,q) model for δ > 0, $g(z,t,s) \equiv \alpha_0$ and $c_k(z_{t-k}) = \beta_k + (\alpha_{1k}(1 - I(z_{t-k})) + \alpha_{2k} I(z_{t-k}))|z_{t-k}|^\delta$, k = 1, …, r, with r, $\alpha_i$ and $\beta_j$ as above. This is a further generalization of the models introduced by Zakoïan (1994), Hwang and Woo (2001) and Hwang and Basawa (2004).

• The fourth-order non-linear generalized moving-average conditional heteroscedasticity (4NLGMACH) model for δ = 2, $c_k(z_{t-k}) = \beta_k$, k = 1, …, r, and $g(z,t,s) = \alpha_0 + \sum_{k=1}^{s} \big[\alpha_{1k}(z_{t-k} - d_k)^2 + \alpha_{2k}(z_{t-k} - d_k)^4\big]$. This is a generalization of the family of moving-average conditional heteroscedasticity (MACH) models introduced by Yang and Bewley (1995).

The main purpose of this paper is to investigate the stationarity properties of this new family of GARCH processes. In particular, we discuss the stationarity of the so-called family of integrated GARCH (IGARCH) processes. The organization of this paper is as follows. In Section 2, a sufficient and necessary condition for the existence of a stationary solution of the family of GARCH processes defined by equation (1.2) is given. We also present a sufficient and necessary condition for the existence of a stationary solution with a finite δ-order moment. Section 3 is devoted to the stationarity of the family of IGARCH processes. Section 4 concludes. Proofs of the results in this paper are given in the Appendix.
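To make the recursion (1.2) concrete, the following minimal sketch simulates the family for the GJR-GARCH(2,2) member listed above. It assumes i.i.d. standard normal innovations; the helper name `simulate_family` and all parameter values are illustrative, not part of the paper. Any other member of the family is obtained by swapping in the corresponding g and c_k.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_family(g, c_funcs, delta, n, burn=500):
    """Simulate eps_t = z_t h_t from (1.2) with i.i.d. N(0,1) innovations z_t."""
    r = len(c_funcs)
    z = rng.standard_normal(n + burn + r)
    h_delta = np.ones(n + burn + r)            # h_t^delta, initialized at 1
    for t in range(r, n + burn + r):
        past = z[t - r:t][::-1]                # (z_{t-1}, ..., z_{t-r})
        h_delta[t] = g(past) + sum(c(past[k]) * h_delta[t - k - 1]
                                   for k, c in enumerate(c_funcs))
    h = h_delta[burn + r:] ** (1.0 / delta)
    return z[burn + r:] * h

# GJR-GARCH(2,2): delta = 2, g == alpha_0, c_k(z) = beta_k + (alpha_k + omega_k 1{z<0}) z^2.
alpha0, alpha, beta, omega = 0.05, [0.04, 0.03], [0.55, 0.25], [0.05, 0.02]
c_funcs = [lambda x, k=k: beta[k] + (alpha[k] + omega[k] * (x < 0)) * x ** 2
           for k in range(2)]
eps = simulate_family(lambda past: alpha0, c_funcs, delta=2, n=10_000)
print("sample variance of eps_t:", eps.var())
```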
2. STATIONARITY

Define
$$X_t = \big(h_t^\delta, h_{t-1}^\delta, \ldots, h_{t-r+1}^\delta\big)^\tau$$
and
$$A_t = \begin{bmatrix} c_1(z_{t-1}) & \cdots & c_{r-1}(z_{t-r+1}) & c_r(z_{t-r}) \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}, \qquad B_t = \big(g(z,t,s), 0, \ldots, 0\big)^\tau, \qquad (2.1)$$
where $\tau$ denotes the transpose of the matrix. Then the second equation of (1.2) can be rewritten in vector notation as
$$X_t = A_t X_{t-1} + B_t. \qquad (2.2)$$
Let us recall the definition of the top Lyapunov exponent. Consider the linear space $\mathbb{R}^r$ endowed with the scalar product $\langle \cdot, \cdot \rangle$ and the associated norm
$$\|x\| = \sum_{i=1}^{r} |\langle x, e_i \rangle|, \qquad x \in \mathbb{R}^r,$$
where $\{e_i, i = 1, \ldots, r\}$ denotes the canonical basis, which is orthonormal for this scalar product. For an $r \times r$ matrix $A$ with real entries, we define the norm of $A$ by
$$\|A\| = \sup\{\|Ax\| : x \in \mathbb{R}^r, \|x\| = 1\}.$$
Then the top Lyapunov exponent associated with a sequence $\{A_t\}$ of strictly stationary and ergodic random matrices is defined by
$$\gamma = \inf\Big\{ E\Big[\tfrac{1}{n+1}\log\|A_0 A_{-1}\cdots A_{-n}\|\Big], \; n \in \mathbb{N} \Big\}, \qquad (2.3)$$
when $E[\log^+\|A_0\|]$ is finite. Moreover, it is known that, a.s.,
$$\gamma = \lim_{n\to\infty} \frac{1}{n}\log\|A_0 A_{-1}\cdots A_{-n}\| \qquad (2.4)$$
(see Theorem 6 in Kingman, 1973). This shows that $\gamma$ is independent of the chosen norm.

THEOREM 2.1. Assume that $E[\log^+(g(z,t,s))]$ and $E[\log^+\|A_0\|]$ are finite, where $\{A_t\}$ is defined by equation (2.1). Then the family of GARCH processes defined by equation (1.2) has a unique strictly stationary and ergodic solution if and only if the top Lyapunov exponent $\gamma$ associated with the random matrices $\{A_t\}$ is strictly negative. Moreover, this stationary solution is explicitly expressed as
$$\varepsilon_t = z_t\Bigg\{ g(z,t,s) + \sum_{m=1}^{\infty} \sum_{k_1=1}^{r} \cdots \sum_{k_m=1}^{r} \Bigg(\prod_{j=1}^{m} c_{k_j}\Big(z_{t-\sum_{i=1}^{j} k_i}\Big)\Bigg)\, g\Big(z,\, t - \sum_{i=1}^{m} k_i,\, s\Big) \Bigg\}^{1/\delta}. \qquad (2.5)$$
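The criterion of Theorem 2.1 is easy to check numerically. The sketch below estimates γ through the almost-sure limit (2.4), renormalizing the running matrix-vector product to avoid overflow; the GARCH(2,2) coefficients and Monte Carlo settings are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = np.array([0.08, 0.05]), np.array([0.5, 0.3])  # illustrative GARCH(2,2)
r, n = 2, 100_000

def A(z_lags):
    """Companion matrix (2.1) with c_k(z) = beta_k + alpha_k z^2 (delta = 2)."""
    M = np.zeros((r, r))
    M[0, :] = beta + alpha * z_lags ** 2   # c_1(z_{t-1}), ..., c_r(z_{t-r})
    M[1:, :-1] = np.eye(r - 1)
    return M

z = rng.standard_normal(n + r)
x, log_norm = np.ones(r), 0.0
for t in range(n):
    x = A(z[t:t + r][::-1]) @ x            # consecutive A_t share the proper lags
    s = np.abs(x).sum()                    # the l1 norm used in the text
    log_norm += np.log(s)
    x /= s                                 # renormalize to avoid overflow
print("estimated gamma:", log_norm / n)    # negative => strict stationarity
```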
When r = 1, the top Lyapunov exponent is easily seen to be $\gamma = E[\log(c_1(z_t))]$. Therefore, according to Theorem 2.1, we obtain the following corollary.

COROLLARY 2.1 (Theorem 2.1 in Liu, 2006). Assume that $E[\log^+(g_1(z_t))] < \infty$ and $E[\log^+(c_1(z_t))] < \infty$. Then the family of GARCH processes defined by equation (1.1) has a unique strictly stationary and ergodic solution if and only if $E[\log(c_1(z_t))] < 0$. Moreover, this stationary solution is explicitly expressed as
$$\varepsilon_t = z_t\Bigg\{ g_1(z_{t-1}) + \sum_{n=1}^{\infty} g_1(z_{t-n-1}) \prod_{j=1}^{n} c_1(z_{t-j}) \Bigg\}^{1/\delta}.$$

As an application of Theorem 2.1, a sufficient and necessary condition for the existence of a strictly stationary solution of the family of GARCH processes defined by equation (1.2) with a finite δ-order moment can be given.

THEOREM 2.2. Assume that $E[|z_t|^\delta] < \infty$, $E[g(z,t,s)] < \infty$ and $E[c_k(z_t)] < \infty$, $k = 1, \ldots, r$. Then the family of GARCH processes defined by equation (1.2) has a unique strictly stationary solution with a finite δ-order moment if and only if $\sum_{k=1}^{r} E[c_k(z_t)] < 1$.
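For the standard GARCH(p,q) member with unit-variance innovations, the condition of Theorem 2.2 reduces to the familiar $\sum_k \alpha_k + \sum_k \beta_k < 1$, since $E[c_k(z_t)] = \beta_k + \alpha_k E[z_t^2]$. A short Monte Carlo sketch of the check, with illustrative parameter values and assumed N(0,1) innovations:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)         # i.i.d. N(0,1) innovations (assumed)

alpha, beta = [0.08, 0.05], [0.5, 0.3]     # illustrative GARCH(2,2) parameters
c_means = [b + a * np.mean(z ** 2) for a, b in zip(alpha, beta)]
print("GARCH   sum E[c_k(z)] ~", sum(c_means))   # ~0.93 < 1: finite delta-moment

# The AVGARCH case (delta = 1): c_k(z) = beta_k + alpha_k |z|.
c_means_av = [b + a * np.mean(np.abs(z)) for a, b in zip(alpha, beta)]
print("AVGARCH sum E[c_k(z)] ~", sum(c_means_av))
```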
3. FAMILY OF IGARCH PROCESSES

In this section, we consider the stationarity of another important sub-class of the GARCH process family defined by equation (1.2), namely the IGARCH processes. We need additional, stronger hypotheses, but start by introducing a new definition.

DEFINITION 3.1. The function family $\{c_k(x), k = 1, \ldots, r\}$ is of the same type if there are a non-negative function $c(x)$ and non-negative real numbers $a_k$ and $b_k$ such that
$$c_k(x) = a_k c(x) + b_k, \qquad k = 1, \ldots, r. \qquad (3.1)$$
REMARK 3.1. $c(x) = x^2$ for the GARCH(p,q) model (Bollerslev, 1986) and the VS-GARCH model (Fornari and Mele, 1997); $c(x) = |x|$ for the AVGARCH(p,q) model of Taylor (1986, pp. 78–79) and Schwert (1989); $c(x) = (1 - 2\eta\,\mathrm{sgn}(x) + \eta^2)x^2$ or $c(x) = (1 - 2\eta\,\mathrm{sgn}(x) + \eta^2)|x|$ for the non-linear GARCH(p,q) model (Engle, 1990); $c(x) = (1 - 2\eta\,\mathrm{sgn}(x) + \eta^2)|x|^\delta$ with $\delta > 0$ for the asymmetric power ARCH(p,q) model (Ding et al., 1993); $c(x) \equiv 0$ for the 4NLGMACH model (Yang and Bewley, 1995). In addition, $c(x) = (1 + \omega I(x))x^2$ with $\omega > 0$ for a special form of the GJR-GARCH(p,q) model (Glosten et al., 1993), and $c(x) = (\omega_1(1 - I(x)) + \omega_2 I(x))|x|^\delta$ with $\delta > 0$ and $\omega_i \geq 0$, $i = 1, 2$, for a special form of the threshold GARCH(p,q) model (a further generalization of the models introduced by Zakoïan, 1994, Hwang and Woo, 2001, and Hwang and Basawa, 2004).

THEOREM 3.1. Assume that the function family $\{c_k(x), k = 1, \ldots, r\}$ is of the same type, $\sum_{k=1}^{r} E[c_k(z_t)] = 1$, $c_k(z_t) = a_k c(z_t) + b_k > 0$ a.s., $k = 1, \ldots, r$, and $E[|\log c(z_t)|] < \infty$. Then the family of GARCH processes defined by equation (1.2) has a unique strictly stationary solution.
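A canonical member covered by Theorem 3.1 is the IGARCH(1,1) process, where $c_1(z) = \alpha z^2 + \beta$ with $\alpha + \beta = 1$: here $E[c_1(z_t)] = 1$, so Theorem 2.2 rules out a finite δ-order moment, yet $\gamma = E[\log c_1(z_t)] < \log E[c_1(z_t)] = 0$ by Jensen's inequality (strictly, because $z_t$ is non-degenerate), so a strictly stationary solution still exists. A numerical sketch, with illustrative α and β and assumed N(0,1) innovations:

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(1_000_000)

alpha, beta = 0.3, 0.7                      # alpha + beta = 1: integrated GARCH(1,1)
c1 = alpha * z ** 2 + beta
print("E[c_1(z)]     ~", c1.mean())         # ~1.0: the moment condition fails
print("E[log c_1(z)] ~", np.log(c1).mean()) # < 0: strictly stationary nonetheless
```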
4. CONCLUSION

A natural generalization of the first-order GARCH processes family introduced by He and Teräsvirta (1999), allowing higher-order past errors and conditional variances to enter the current conditional variance equation, has been proposed. This new family of GARCH processes includes many well-known GARCH processes and can be regarded as a class of non-parametric GARCH processes. This article offers explicit results on the strict stationarity of the family of GARCH processes; in particular, it proves, via a new approach, the strict stationarity of the family of so-called integrated GARCH processes. Bougerol and Picard (1992a) discussed the conditions for the existence of a stationary solution of the GARCH process of Bollerslev (1986), but the stationarity of the processes listed in Section 1 has seldom been investigated. The results obtained in this paper fill this gap.
ACKNOWLEDGMENTS

The author would like to thank Stéphane Grégoir and two anonymous referees for their helpful comments and suggestions on earlier versions of this paper.
REFERENCES

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–27.
Bougerol, P. and N. Picard (1992a). Stationarity of GARCH processes and of some nonnegative time series. Journal of Econometrics 52, 115–27.
Bougerol, P. and N. Picard (1992b). Strict stationarity of generalized autoregressive processes. Annals of Probability 20, 1714–30.
Ding, Z., C. W. J. Granger and R. F. Engle (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance 1, 83–106.
Engle, R. F. (1990). Discussion: stock market volatility and the crash of '87. Review of Financial Studies 3, 103–06.
Fornari, F. and A. Mele (1997). Sign- and volatility-switching ARCH models: theory and applications to international stock markets. Journal of Applied Econometrics 12, 49–65.
Glosten, L. R., R. Jagannathan and D. E. Runkle (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48, 1779–801.
He, C. and T. Teräsvirta (1999). Properties of moments of a family of GARCH processes. Journal of Econometrics 92, 173–92.
Hennion, H. (1997). Limit theorems for products of positive random matrices. Annals of Probability 25, 1545–87.
Horn, R. A. and C. R. Johnson (1985). Matrix Analysis. Cambridge: Cambridge University Press.
Hwang, S. Y. and I. V. Basawa (2004). Stationarity and moment structure for Box–Cox transformed threshold GARCH(1,1) processes. Statistics and Probability Letters 68, 209–20.
Hwang, S. Y. and M.-J. Woo (2001). Threshold ARCH(1) processes: asymptotic inference. Statistics and Probability Letters 53, 11–20.
Kingman, J. F. C. (1973). Subadditive ergodic theory. Annals of Probability 1, 883–99.
Ling, S. and M. McAleer (2002). Stationarity and the existence of moments of a family of GARCH processes. Journal of Econometrics 106, 109–17.
Liu, J.-C. (2006). On the tail behaviors of a family of GARCH processes. Econometric Theory 22, 852–62.
Schwert, G. W. (1989). Why does stock market volatility change over time? Journal of Finance 44, 1115–53.
Sentana, E. (1995). Quadratic ARCH models. Review of Economic Studies 62, 639–61.
Taylor, S. J. (1986). Modelling Financial Time Series. New York: Wiley.
Yang, M. and R. Bewley (1995). Moving average conditional heteroskedastic processes. Economics Letters 49, 367–72.
Zakoïan, J.-M. (1994). Threshold heteroskedastic models. Journal of Economic Dynamics and Control 18, 931–55.
APPENDIX: PROOFS OF RESULTS

Proof of Theorem 2.1: Suppose that γ is strictly negative. Since $\{z_t\}$ is a sequence of i.i.d. non-degenerate random variables with zero mean, then, by the definitions of $\{A_t\}$ and $\{B_t\}$, we recognize that $\{A_t, B_t\}$ is a sequence of strictly stationary and ergodic random matrices. From $\|B_0\| = g(z, 0, s)$, we see that $E[\log^+\|B_0\|]$ is finite. According to Theorem 1.1 in Bougerol and Picard (1992b), equation (2.2) has a unique strictly stationary and ergodic solution, which is explicitly expressed as
$$X_t = B_t + \sum_{n=1}^{\infty} \Bigg(\prod_{j=0}^{n-1} A_{t-j}\Bigg) B_{t-n}. \qquad (A.1)$$
Let $\varepsilon_t = z_t (e_1^\tau X_t)^{1/\delta}$, where $X_t$ is defined by equation (A.1). Then $\{\varepsilon_t\}$ is a unique strictly stationary and ergodic solution of the family of GARCH processes and equation (2.5) holds.

Conversely, we assume that a strictly stationary solution $\{\varepsilon_t\}$ of equation (1.2) exists. By iterating equation (2.2), we can establish that
$$X_0 = B_0 + \sum_{n=1}^{m} \Bigg(\prod_{j=0}^{n-1} A_{-j}\Bigg) B_{-n} + \Bigg(\prod_{j=0}^{m} A_{-j}\Bigg) X_{-m-1}.$$
Notice that all entries of $X_n$, $A_n$ and $B_n$ are non-negative. Therefore, whenever m > 0,
$$\sum_{n=1}^{m} \Bigg(\prod_{j=0}^{n-1} A_{-j}\Bigg) B_{-n} \leq X_0, \quad \text{a.s.}$$
This shows that the series $\sum_{n=1}^{m} \big(\prod_{j=0}^{n-1} A_{-j}\big) B_{-n}$ converges a.s. Thus, we can see that
$$\lim_{n\to\infty} \Bigg(\prod_{j=0}^{n} A_{-j}\Bigg) B_{-n-1} = 0, \quad \text{a.s.} \qquad (A.2)$$
Now, we will prove that, for 1 ≤ i ≤ r,
$$\lim_{n\to\infty} \Bigg(\prod_{j=0}^{n} A_{-j}\Bigg) e_i = 0, \quad \text{a.s.} \qquad (A.3)$$
Since $B_{-n-1} = g(z, -n-1, s)\, e_1$ and the law of $[g(z, -n-1, s)]^{-1}$ is independent of n, equation (A.2) implies
$$\lim_{n\to\infty} \prod_{j=0}^{n} A_{-j}\, e_1 = \lim_{n\to\infty} [g(z, -n-1, s)]^{-1} \prod_{j=0}^{n} A_{-j}\, B_{-n-1} = 0, \quad \text{a.s.}$$
Because $A_{-n} e_r = c_r(z_{-n-r})\, e_1$ and the law of $c_r(z_{-n-r})$ is independent of n, we have
$$\lim_{n\to\infty} \prod_{j=0}^{n} A_{-j}\, e_r = \lim_{n\to\infty} c_r(z_{-n-r}) \prod_{j=0}^{n-1} A_{-j}\, e_1 = 0, \quad \text{a.s.}$$
If equation (A.3) holds for some 1 < i ≤ r, then, using $A_{-n} e_{i-1} = c_{i-1}(z_{-n-i+1})\, e_1 + e_i$, we get
$$\lim_{n\to\infty} \prod_{j=0}^{n} A_{-j}\, e_{i-1} = \lim_{n\to\infty} \prod_{j=0}^{n-1} A_{-j}\big(c_{i-1}(z_{-n-i+1})\, e_1 + e_i\big) = 0, \quad \text{a.s.}$$
Thus, through a backward recursion, equation (A.3) is true for 1 ≤ i ≤ r. Therefore,
$$\lim_{n\to\infty} \prod_{j=0}^{n} A_{-j} = 0, \quad \text{a.s.} \qquad (A.4)$$
Hence, by Lemma 3.4 in Bougerol and Picard (1992b), equation (A.4) implies that γ < 0.
To obtain Theorem 2.2, we first state a lemma.

LEMMA A.1. Let
$$M_t^{(n)} = \prod_{j=0}^{n-1} A_{t-j}, \qquad n \in \mathbb{N},$$
where $A_t$ is defined as in equation (2.1). Then, for each n ≥ 1,
$$e_k^\tau M_t^{(n)} \in \mathcal{F}_{t-k}, \qquad k = 1, \ldots, r, \qquad (A.5)$$
where $\mathcal{F}_t = \sigma\{z_t, z_{t-1}, \ldots\}$. Moreover, if $E[c_k(z_t)] < \infty$, $k = 1, \ldots, r$, then
$$E\Bigg[\prod_{j=0}^{n} A_{t-j}\Bigg] = \prod_{j=0}^{n} E[A_{t-j}]. \qquad (A.6)$$
Proof: When n = 1, $M_t^{(1)} = A_t$, and it is evident that equation (A.5) is true. Now, assume that equation (A.5) holds for n = m. Notice that
$$M_t^{(m+1)} = A_t M_{t-1}^{(m)} = \Bigg( \Big(\sum_{k=1}^{r} c_k(z_{t-k})\, e_k^\tau M_{t-1}^{(m)}\Big)^{\!\tau}, \big(e_1^\tau M_{t-1}^{(m)}\big)^{\!\tau}, \ldots, \big(e_{r-1}^\tau M_{t-1}^{(m)}\big)^{\!\tau} \Bigg)^{\!\tau}.$$
This shows that equation (A.5) holds for m + 1. We then turn to the proof of equation (A.6). From equation (A.5), we get
$$E\Bigg[\prod_{j=0}^{n} A_{t-j}\Bigg] = E\big[A_t M_{t-1}^{(n)}\big] = \Bigg( \Big(\sum_{k=1}^{r} E[c_k(z_{t-k})]\, E\big[e_k^\tau M_{t-1}^{(n)}\big]\Big)^{\!\tau}, \big(E\big[e_1^\tau M_{t-1}^{(n)}\big]\big)^{\!\tau}, \ldots, \big(E\big[e_{r-1}^\tau M_{t-1}^{(n)}\big]\big)^{\!\tau} \Bigg)^{\!\tau}$$
$$= E[A_t]\, E\big[M_{t-1}^{(n)}\big] = \cdots = \prod_{j=0}^{n} E[A_{t-j}].$$
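Equation (A.6) is not a routine independence argument: consecutive matrices $A_t$ and $A_{t-1}$ share innovation lags, and the factorization rests on the measurability property (A.5). A Monte Carlo sketch of (A.6), with an illustrative GARCH(2,2)-type specification and assumed N(0,1) innovations:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta = np.array([0.08, 0.05]), np.array([0.5, 0.3])  # illustrative values
r, n_prod, n_mc = 2, 3, 50_000

def A(z_lags):
    """Companion matrix (2.1); z_lags = (z_{t-1}, ..., z_{t-r})."""
    M = np.zeros((r, r))
    M[0, :] = beta + alpha * z_lags ** 2
    M[1:, :-1] = np.eye(r - 1)
    return M

acc = np.zeros((r, r))
for _ in range(n_mc):
    z = rng.standard_normal(n_prod + r - 1)   # shared lags across A_t, ..., A_{t-2}
    P = np.eye(r)
    for j in range(n_prod):
        P = P @ A(z[j:j + r])                 # z[j] plays the role of z_{t-j-1}
    acc += P

E_A = np.zeros((r, r))
E_A[0, :] = beta + alpha                      # E[c_k(z_t)] = beta_k + alpha_k
E_A[1:, :-1] = np.eye(r - 1)
print("Monte Carlo E[A_t A_{t-1} A_{t-2}]:\n", acc / n_mc)
print("(E[A_t])^3:\n", np.linalg.matrix_power(E_A, n_prod))
```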
Proof of Theorem 2.2: Suppose that $\sum_{k=1}^{r} E[c_k(z_t)] < 1$ and let ρ(A) denote the spectral radius of the matrix $A = (A(i,j))_{i,j=1,\ldots,r} = E(A_t)$. It is easy to check that ρ(A) < 1. Take ε > 0 satisfying ρ(A) + ε < 1. Then, it is known that there exists a constant $c_0 > 0$ such that
$$\sum_{i,j} A^{n+1}(i,j) \leq c_0 (\rho(A) + \varepsilon)^{n+1} \qquad (A.7)$$
for all $n \in \mathbb{N}$ (see Horn and Johnson, 1985, p. 299). By equation (2.3), we have, for all $n \in \mathbb{N}$,
$$\gamma \leq E\Big[\frac{1}{n+1}\log\|A_0 A_{-1}\cdots A_{-n}\|\Big] \leq \frac{1}{n+1}\log E\|A_0 A_{-1}\cdots A_{-n}\|.$$
Moreover, equations (A.6) and (A.7) together suggest that
$$E\|A_0 A_{-1}\cdots A_{-n}\| \leq \sum_{i,j} E\Bigg[\prod_{k=0}^{n} A_{-k}\Bigg](i,j) = \sum_{i,j} A^{n+1}(i,j) \leq c_0(\rho(A) + \varepsilon)^{n+1}$$
for all $n \in \mathbb{N}$. It therefore follows that
$$\gamma \leq \frac{1}{n+1}\log c_0 + \log(\rho(A) + \varepsilon), \qquad n \in \mathbb{N}.$$
Taking n → ∞ in the above equation, we can conclude that γ ≤ log(ρ(A) + ε); that is, γ < 0. Therefore, Theorem 2.1 implies that the family of GARCH processes defined by equation (1.2) has a unique strictly stationary solution. Moreover, by noting the independence between $\prod_{j=1}^{m} c_{k_j}(z_{t-\sum_{i=1}^{j} k_i})$ and $g(z,\, t - \sum_{i=1}^{m} k_i,\, s)$, by equation (2.5) we have that
$$E\big[|\varepsilon_t|^\delta\big] = E\big[|z_t|^\delta\big]\, E[g(z,t,s)] \Bigg\{ 1 + \sum_{m=1}^{\infty} \sum_{k_1=1}^{r} \cdots \sum_{k_m=1}^{r} \prod_{j=1}^{m} E\Big[c_{k_j}\Big(z_{t-\sum_{i=1}^{j} k_i}\Big)\Big] \Bigg\}$$
$$= E\big[|z_t|^\delta\big]\, E[g(z,t,s)] \sum_{m=0}^{\infty} \Bigg( \sum_{k=1}^{r} E[c_k(z_t)] \Bigg)^{m}. \qquad (A.8)$$
Therefore, according to equation (A.8), $E[|\varepsilon_t|^\delta] < \infty$ follows from the fact that $E[|z_t|^\delta] < \infty$, $E[g(z,t,s)] < \infty$ and $\sum_{k=1}^{r} E[c_k(z_t)] < 1$.
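The spectral-radius device at the start of this proof can also be checked directly: for the companion matrix $A = E[A_t]$ built from the moments $m_k = E[c_k(z_t)]$, one has ρ(A) < 1 exactly when $\sum_k m_k < 1$. A sketch with illustrative moment values:

```python
import numpy as np

def rho(m):
    """Spectral radius of the companion matrix with first row m = (m_1, ..., m_r)."""
    r = len(m)
    A = np.zeros((r, r))
    A[0, :] = m
    A[1:, :-1] = np.eye(r - 1)
    return max(abs(np.linalg.eigvals(A)))

print(rho([0.5, 0.3, 0.13]))   # sum = 0.93 < 1  ->  rho < 1
print(rho([0.5, 0.3, 0.20]))   # sum = 1.00     ->  rho = 1
print(rho([0.5, 0.3, 0.25]))   # sum = 1.05 > 1 ->  rho > 1
```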
Conversely, if we assume that there is a strictly stationary solution $\{\varepsilon_t\}$ of the family of GARCH processes with a finite δ-order moment, then equation (A.8), together with $E[|z_t|^\delta] > 0$, $E[g(z,t,s)] > 0$ and $E[|\varepsilon_t|^\delta] < \infty$, implies $\sum_{k=1}^{r} E[c_k(z_t)] < 1$.

For the proof of Theorem 3.1, we use additional notation introduced in Hennion (1997). We set
$$C = \{x \in \mathbb{R}^r : \langle x, e_i\rangle > 0,\; i = 1, \ldots, r\}, \qquad \overline{C} = \{x \in \mathbb{R}^r : \langle x, e_i\rangle \geq 0,\; i = 1, \ldots, r\},$$
$$B = C \cap \{x \in \mathbb{R}^r : \|x\| = 1\}, \qquad \overline{B} = \overline{C} \cap \{x \in \mathbb{R}^r : \|x\| = 1\},$$
and define
$$A \cdot x = \frac{Ax}{\|Ax\|},$$
where A is an r × r matrix and $x \in \mathbb{R}^r$. On the other hand, for $y \in B$ and $k, n \in \mathbb{Z}$, $k \leq n$, set
$$Z_{n+1,n}^{y} = y, \qquad Z_{k,n}^{y} = (Y_k \cdots Y_n) \cdot y, \qquad (A.9)$$
where $Y_n = A_n^\tau$ and $A_n$ is defined as in equation (2.1) for $n \in \mathbb{Z}$.

LEMMA A.2. Assume that the function family $\{c_k(x), k = 1, \ldots, r\}$ is of the same type and $\sum_{k=1}^{r} E[c_k(z_t)] = 1$. Then $E[\|Y_1 Z_{2,n}^{y}\|] \leq 1$, where $Y_1$ and $Z_{2,n}^{y}$ are defined by equations (A.9).

Proof: Let ${}_m^n Y = Y_m Y_{m+1} \cdots Y_n$ and ${}_m^n y_i = \langle {}_m^n Y y, e_i\rangle$, $i = 1, \ldots, r$, $y \in B$. Using Lemma A.1 (equation (A.5)), we can easily check that ${}_m^n y_i \in \mathcal{F}^{m-i} = \sigma(z_{m-i}, z_{m-i+1}, \ldots)$, $i = 1, \ldots, r$. Hence, ${}_m^n y_i$ is independent of $z_t$ for $t < m - i$. Now we can write
$$\|Y_2 \cdots Y_n y\| = \sum_{i=1}^{r} {}_2^n y_i = \sum_{i=1}^{r} \sum_{j=1}^{r} {}_2^{1+r-m} Y(i,j)\; {}_{2+r-m}^{n} y_j \qquad (A.10)$$
and
$$\|Y_1 Y_2 \cdots Y_n y\| = \sum_{k=1}^{r} c_k(z_{1-k})\; {}_2^n y_1 + \sum_{k=2}^{r} {}_2^n y_k. \qquad (A.11)$$
We will show that, for 1 ≤ m ≤ r,
$$E\big[\|Y_2 \cdots Y_n y\|^{-1} c_m(z_{1-m})\; {}_2^n y_1\big] \leq E[c_m(z_{1-m})]\, E\big[\|Y_2 \cdots Y_n y\|^{-1}\; {}_2^n y_1\big]. \qquad (A.12)$$
When m = r, it is evident that $c_r(z_{1-r})$ is independent of $\|Y_2 \cdots Y_n y\|^{-1}\; {}_2^n y_1$; thus, the equality in equation (A.12) holds. What remains is the case 1 ≤ m < r. In terms of equation (3.1), equation (A.10) can be rewritten as
$$\|Y_2 \cdots Y_n y\| = \sum_{i=1}^{r} \sum_{j=1}^{r} {}_2^{1+r-m} Y(i,j)\; {}_{2+r-m}^{n} y_j = U_m\, c(z_{1-m}) + V_m,$$
where $U_m$ and $V_m$ are independent of $z_{1-m}$.
For convenience, let $\xi_m = U_m c(z_{1-m}) + V_m$. By the Schwarz inequality together with ${}_2^n y_1 \in \mathcal{F}^1$, we get, for $m = 1, \ldots, r-1$,
$$E\big[{}_2^n y_1\, U_m^{-1}\big] = E\Big[{}_2^n y_1\, U_m^{-1} \big(E\big[\xi_m^{1/2}\, \xi_m^{-1/2} \,\big|\, \mathcal{F}_{1-m}^{*}\big]\big)^2\Big] \leq E\Big[{}_2^n y_1\, U_m^{-1}\, E\big[\xi_m \,\big|\, \mathcal{F}_{1-m}^{*}\big]\, E\big[\xi_m^{-1} \,\big|\, \mathcal{F}_{1-m}^{*}\big]\Big]$$
$$= E[c(z_{1-m})]\, E\big[\xi_m^{-1}\; {}_2^n y_1\big] + E\big[\xi_m^{-1} U_m^{-1}\; {}_2^n y_1\, V_m\big],$$
where $\mathcal{F}_{1-m}^{*} = \sigma(z_{1-r}, \ldots, z_{-m}, z_{2-m}, \ldots)$. Then
$$E\big[\|Y_2 \cdots Y_n y\|^{-1} c_m(z_{1-m})\; {}_2^n y_1\big] = E\bigg[\frac{(a_m c(z_{1-m}) + b_m)\; {}_2^n y_1}{U_m c(z_{1-m}) + V_m}\bigg] = E\bigg[\frac{a_m c(z_{1-m})\; {}_2^n y_1}{U_m c(z_{1-m}) + V_m}\bigg] + E\bigg[\frac{b_m\; {}_2^n y_1}{U_m c(z_{1-m}) + V_m}\bigg]$$
$$= a_m E\big[{}_2^n y_1\, U_m^{-1}\big] - a_m E\big[\xi_m^{-1} U_m^{-1}\; {}_2^n y_1\, V_m\big] + E\big[\xi_m^{-1} b_m\; {}_2^n y_1\big]$$
$$\leq a_m E[c(z_{1-m})]\, E\big[\xi_m^{-1}\; {}_2^n y_1\big] + E\big[\xi_m^{-1} b_m\; {}_2^n y_1\big] = E[c_m(z_{1-m})]\, E\big[\|Y_2 \cdots Y_n y\|^{-1}\; {}_2^n y_1\big],$$
namely, equation (A.12), which holds for 1 ≤ m < r. Together with equations (A.10) and (A.11), equation (A.12) implies
$$E\big[\|Y_1 Z_{2,n}^{y}\|\big] = E\big[\|Y_2 \cdots Y_n y\|^{-1}\, \|Y_1 Y_2 \cdots Y_n y\|\big] = E\Bigg[\|Y_2 \cdots Y_n y\|^{-1}\Bigg(\sum_{k=1}^{r} c_k(z_{1-k})\; {}_2^n y_1 + \sum_{k=2}^{r} {}_2^n y_k\Bigg)\Bigg]$$
$$\leq \sum_{k=1}^{r} E[c_k(z_{1-k})]\, E\big[\|Y_2 \cdots Y_n y\|^{-1}\; {}_2^n y_1\big] + \sum_{k=2}^{r} E\big[\|Y_2 \cdots Y_n y\|^{-1}\; {}_2^n y_k\big] = 1.$$
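Lemma A.2 can also be illustrated by simulation. The sketch below takes an integrated, same-type specification ($\sum_k E[c_k(z_t)] = 1$; the coefficients are illustrative) and estimates $E\|Y_1 Z_{2,n}^{y}\|$ using the ℓ¹ norm and the projective action of (A.9):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta = np.array([0.2, 0.1]), np.array([0.4, 0.3])   # sum E[c_k(z)] = 1
r, n, n_mc = 2, 6, 50_000

def Y(z_lags):
    """Y_k = A_k^tau with c_k(z) = a_k z^2 + b_k; z_lags = (z_{k-1}, ..., z_{k-r})."""
    A = np.zeros((r, r))
    A[0, :] = beta + alpha * z_lags ** 2
    A[1:, :-1] = np.eye(r - 1)
    return A.T

y = np.ones(r) / r          # a point of B (strictly positive, l1-normalized)
acc = 0.0
for _ in range(n_mc):
    z = rng.standard_normal(n + r)      # z[i] stands for z_{i-r} (lag bookkeeping)
    v = y
    for k in range(n, 1, -1):           # Z^y_{2,n} = Y_2 . (Y_3 . ( ... Y_n . y))
        v = Y(z[k:k + r][::-1]) @ v
        v /= np.abs(v).sum()            # projective action A . x = Ax / ||Ax||
    acc += np.abs(Y(z[1:1 + r][::-1]) @ v).sum()   # ||Y_1 Z^y_{2,n}||
print("E||Y_1 Z^y_{2,n}|| ~", acc / n_mc)          # <= 1, as Lemma A.2 states
```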
Proof of Theorem 3.1: First, note that, if $\sum_{k=1}^{r} c_k(z_{t-k}) = 1$ a.s., the conclusion is trivial. Hence, in the following, we assume that $P(\sum_{k=1}^{r} c_k(z_{t-k}) = 1) < 1$. Since $\{A_t, t \in \mathbb{Z}\}$ is a sequence of strictly stationary and ergodic random matrices, by equation (2.4),
$$\lim_{n\to\infty} \frac{1}{n} \log\|A_n A_{n-1} \cdots A_1\| = \lim_{n\to\infty} \frac{1}{n} \log\|A_{-1} A_{-2} \cdots A_{-n}\| = \gamma.$$
Since $c_k(z_t) > 0$ a.s., $k = 1, \ldots, r$, it is easy to check that, for each n ≥ r, $P(X(n) \in S^\circ) = 1$, where $X(n) = A_n A_{n-1} \cdots A_1$ and $S^\circ$ denotes the set of r × r matrices with strictly positive entries. This ensures that condition (C) in Hennion (1997) is satisfied and that $1_{[T \leq n]} = 1$ a.s. for n ≥ r, where the stopping time $T = \inf\{n : n \geq 1, X(n) \in S^\circ\}$. Moreover, from $E[c(z_t)] < \infty$ and $E[|\log c(z_t)|] < \infty$, it is easy to confirm that $m_1 < \infty$, where $m_1$ is defined as in Hennion (1997).
Also, let $Y^{(n)} = Y_1 Y_2 \cdots Y_n$. According to Lemma 3.3 in Hennion (1997), there exists a stationary and ergodic sequence $\{Z_k, k \in \mathbb{Z}\}$ of random elements of $B$ such that $Z_k \in \mathcal{F}^{k-r}$,
$$Z_1 = \lim_{n\to\infty} (Y_1 \cdots Y_n) \cdot y \quad \text{and} \quad Z_k = \lim_{n\to\infty} (Y_k \cdots Y_n) \cdot y = Y_k \cdot Z_{k+1}, \qquad (A.13)$$
where $y \in B$. Therefore, equation (A.13) implies that the sequence $\{Z_{k+1,n}^{y}\}$ converges to $Z_{k+1}$ with probability 1 as $n \to \infty$, where $\{Z_{k+1,n}^{y}\}$ is defined as in equation (A.9). We can write
$$\log\|Y^{(n)} y\| = \sum_{k=1}^{n} \log\|Y_k Z_{k+1,n}^{y}\|.$$
Therefore, by Theorem 2 and Lemma 5.1 in Hennion (1997),
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \log\|Y_k Z_{k+1,n}^{y}\| = \lim_{n\to\infty} \frac{1}{n} \log\|Y^{(n)} y\| = \gamma.$$
Thus,
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \log\|Y_k Z_{k+1}\| = E\big[\log\|Y_1 Z_2\|\big] = \gamma. \qquad (A.14)$$
But $E[\|Y_1 Z_2\|] < \infty$ follows from $\|Y_1 Z_2\| \leq \|Y_1\|\,\|Z_2\| = \|Y_1\|$. Applying the dominated convergence theorem in conjunction with Lemma A.2, we obtain
$$E[\|Y_1 Z_2\|] = \lim_{n\to\infty} E\big[\|Y_1 Z_{2,n}^{y}\|\big] \leq 1.$$
If $E[\|Y_1 Z_2\|] < 1$, according to equation (A.14), it is obvious that γ < 0. If $E[\|Y_1 Z_2\|] = 1$, we can conclude that $P(\|Y_1 Z_2\| = 1) < 1$. Otherwise, if we assume that $\|Y_1 Z_2\| = 1$ a.s., then, since $\{\|Y_k Z_{k+1}\|, k \in \mathbb{N}\}$ is a sequence of strictly stationary and ergodic random variables, we know that $\|Y_k Z_{k+1}\| = 1$ a.s., $k = 1, 2, \ldots$. It then follows that
$$\|{}_m^n Y Z_{n+1}\| = 1, \quad \text{a.s.}, \qquad m, n = 1, 2, \ldots \qquad (A.15)$$
Note that $Z_{r+2} \in \mathcal{F}^2$; as in the arguments of Lemma A.1, we easily prove that $e_i^\tau\; {}_m^n Y \in \mathcal{F}^{m-i}$ and $e_i^\tau\; {}_2^{r+1} Y Z_{r+2} \in \mathcal{F}^{2-i}$, $i = 1, 2, \ldots, r$. In equation (A.15), with n = r + 1, and m = 2 and m = 1, respectively, we have, a.s.,
$$\sum_{i=1}^{r} e_i^\tau\; {}_2^{r+1} Y Z_{r+2} = 1, \qquad \sum_{i=1}^{r} c_i(z_{1-i})\, e_1^\tau\; {}_2^{r+1} Y Z_{r+2} + \sum_{i=2}^{r} e_i^\tau\; {}_2^{r+1} Y Z_{r+2} = 1.$$
It follows that, a.s.,
$$\sum_{i=1}^{r} c_i(z_{1-i})\, e_1^\tau\; {}_2^{r+1} Y Z_{r+2} = e_1^\tau\; {}_2^{r+1} Y Z_{r+2} \in \mathcal{F}^1.$$
This shows that $\sum_{i=1}^{r} c_i(z_{1-i}) = 1$, contradicting the assumption at the beginning of the proof. Therefore, $P(\|Y_1 Z_2\| = 1) < 1$. Thus, $\gamma = E[\log\|Y_1 Z_2\|] < \log E[\|Y_1 Z_2\|] \leq 0$. By Theorem 2.1, the family of GARCH processes has a unique stationary solution.
Econometrics Journal (2009), volume 12, pp. 447–447. doi: 10.1111/j.1368-423X.2009.00304.x
Errata
In Smith (2009), the following errors were published on page Si. In the third line of the third paragraph, the text reads: 'The particular focus of the paper by Federico Bandi, Peter Hall, Joel Horowitz and George Newman is. . .' This was incorrect and should have read: 'The particular focus of the paper by Federico Bugni, Peter Hall, Joel Horowitz and George Neumann is. . .' In Coudin and Dufour (2009), the following errors were published on page S19. In the postal address of Elise Coudin, the street name was given as, '15 Boulevard Gabriel Perl'. This was incorrect and should have read, '15 Boulevard Gabriel Peri'. In the first postal address for Jean-Marie Dufour, the university affiliation was given as 'McBill University'. This was incorrect and should have read, 'McGill University'. The correct contact details are given in full below:

Élise Coudin, Centre de Recherche en Économie et Statistique, Institut National de la Statistique et des Études Économiques, 15 Boulevard Gabriel Peri, 92245, Malakoff Cedex, France.

Jean-Marie Dufour, Department of Economics, McGill University, 855 Sherbrooke Street West, Montréal, Quebec H3A 2T7, Canada.
We apologize for these errors.
REFERENCES

Coudin, E. and J.-M. Dufour (2009). Finite-sample distribution-free inference in linear median regressions under heteroscedasticity and non-linear dependence of unknown form. The Econometrics Journal 12, S19–S49.
Smith, R. J. (2009). Editorial. The Econometrics Journal 12, Si–Sv.
Econometrics Journal (2009), volume 12, pp. 449–450. doi: 10.1111/j.1368-423X.2009.00305.x
Index to The Econometrics Journal Volume 12
ORIGINAL ARTICLES

Ardia, D., Bayesian estimation of a Markov-switching threshold asymmetric GARCH model with Student-t innovations, 105
Asai, M. and M. McAleer, Multivariate stochastic volatility, leverage and news impact surfaces, 292
Bao, Y. and A. Ullah, On skewness and kurtosis of econometric estimators, 232
Bravo, F., Blockwise generalized empirical likelihood inference for non-linear dynamic moment conditions models, 208
Čížek, P., W. Härdle and V. Spokoiny, Adaptive pointwise estimation in time-inhomogeneous conditional heteroscedasticity models, 248
Davezies, L., X. D'Haultfoeuille and D. Fougère, Identification of peer effects using group size variation, 397
Demetrescu, M., H. Lütkepohl and P. Saikkonen, Testing for the cointegrating rank of a vector autoregressive process with uncertain deterministic trend term, 414
Grigoletto, M. and F. Lisi, Looking for skewness in financial time series, 310
Gu, Y., D. G. Fiebig, E. Cripps and R. Kohn, Bayesian estimation of a random effects heteroscedastic probit model, 324
Hafner, C. M., Causality and forecasting in temporally aggregated multivariate GARCH processes, 127
Hoderlein, S. and E. Mammen, Identification and estimation of local average derivatives in non-separable models without monotonicity, 1
Kawakatsu, H. and A. G. Largey, EM algorithms for ordered probit models with endogenous regressors, 164
Kring, S., S. T. Rachev, M. Höchstötter, F. J. Fabozzi and M. L. Bianchi, Multi-tail generalized elliptical distributions for asset returns, 272
Li, Q. and J. Pan, Determining the number of factors in a multivariate error correction–volatility factor model, 45
Linton, O., J. Perch Nielsen and S. Feodor Nielsen, Non-parametric regression with a latent time series, 187
Nakatani, T. and T. Teräsvirta, Testing for volatility interactions in the Constant Conditional Correlation GARCH model, 147
Poskitt, D. S. and C. L. Skeels, Assessing the magnitude of the concentration parameter in a simultaneous equations model, 26
Sarafidis, V. and D. Robertson, On the impact of error cross-sectional dependence in short dynamic panel estimation, 62
de Silva, S., K. Hadri and A. R. Tremayne, Panel unit root tests in the presence of cross-sectional dependence: finite sample performance and an application, 340
Wilhelmsson, A., Value at Risk with time varying variance, skewness and kurtosis: the NIG-ACD model, 82
SPECIAL ISSUE ARTICLES

Andrews, D. W. K. and S. Han, Invalidity of the bootstrap and the m out of n bootstrap for confidence interval endpoints defined by moment inequalities, S172
Antoine, B. and E. Renault, Efficient GMM with nearly-weak instruments, S135
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde and N. Shephard, Realized kernels in practice: trades and quotes, C1
Bugni, F. A., P. Hall, J. L. Horowitz and G. R. Neumann, Goodness-of-fit tests for functional data, S1
Coudin, E. and J.-M. Dufour, Finite-sample distribution-free inference in linear median regressions under heteroscedasticity and non-linear dependence of unknown form, S19
Chen, X., R. Koenker and Z. Xiao, Copula-based nonlinear quantile autoregression, S50
Cheng, X. and P. C. B. Phillips, Semiparametric cointegrating rank selection, S83
Christensen, J. H. E., F. X. Diebold and G. D. Rudebusch, An arbitrage-free generalized Nelson–Siegel term structure model, C33
Delgado, M. A., J. Hidalgo and C. Velasco, Distribution-free specification tests for dynamic linear models, S105
Heckman, J. J. and P. E. Todd, A note on adapting propensity score matching and selection models to choice based samples, S230
Manski, C. F. and J. V. Pepper, More on monotone instrumental variables, S200
Newey, W. K., Two-step series estimation of sample selection models, S217
Robinson, P. M., Large-sample inference on spatial dependence, S68
Sentana, E., The econometrics of mean-variance efficiency tests: a survey, C65
NOTES

Engler, E. and B. Nielsen, The empirical process of autoregressive residuals, 367
Liu, J.-C., Stationarity of a family of GARCH processes, 436
Sperlich, S., A note on non-parametric estimation with predicted variables, 382